EnvMine: a text-mining system for the automatic extraction of contextual information.

Javier Tamames, Victor de Lorenzo.

Abstract

BACKGROUND: For ecological studies, it is crucial to have adequate descriptions of the environments and samples being studied. Such descriptions must be given in terms of physicochemical characteristics, allowing a direct comparison between different environments that would otherwise be difficult. The characterization must also include the precise geographical location, to make possible the study of geographical distributions and biogeographical patterns. Currently, there is no schema for annotating these environmental features, so these data have to be extracted from textual sources (published articles). So far, this has had to be done by manual inspection of the corresponding documents. To facilitate this task, we have developed EnvMine, a set of text-mining tools devoted to retrieving contextual information (physicochemical variables and geographical locations) from textual sources of any kind.
RESULTS: EnvMine is capable of retrieving the physicochemical variables cited in the text by accurately identifying their associated units of measurement. In this task, the system achieves a recall (percentage of relevant items retrieved) of 92% with less than 1% error. A Bayesian classifier was also tested for distinguishing parts of the text describing environmental characteristics from others dealing with, for instance, experimental settings. Regarding the identification of geographical locations, the system takes advantage of existing databases such as GeoNames to achieve 86% recall with 92% precision. Identifying a location also includes determining its exact coordinates (latitude and longitude), thus allowing the calculation of distances between individual locations.
CONCLUSION: EnvMine is a very efficient method for extracting contextual information from different text sources, such as published articles or web pages. This tool can help in determining the precise location and physicochemical variables of sampling sites, thus facilitating ecological analyses. EnvMine can also help in the development of standards for the annotation of environmental features.
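
The two retrieval steps summarized in the abstract can be illustrated with a minimal sketch. It is not EnvMine's implementation: the unit table `UNIT_TO_VARIABLE`, the function names `extract_measurements` and `haversine_km`, and the regular expression are hypothetical, and the abstract does not state which distance formula the system uses, so the haversine great-circle formula is an assumption here. The first function finds numeric values by looking for a known unit of measurement next to them; the second computes the distance between two resolved coordinate pairs.

```python
import math
import re

# Hypothetical, deliberately tiny unit table (an assumption for illustration,
# not the unit list used by EnvMine).
UNIT_TO_VARIABLE = {
    "°C": "temperature",
    "mg/L": "concentration",
    "psu": "salinity",
    "m": "depth or length",
}

# Try longer units first so that "mg/L" is matched before the bare "m".
_UNITS = "|".join(
    re.escape(u) for u in sorted(UNIT_TO_VARIABLE, key=len, reverse=True)
)
# A number, optional whitespace, a known unit, and no letter or slash right
# after the unit (so "2010 members" is not read as "2010 m").
_MEASUREMENT = re.compile(rf"(-?\d+(?:\.\d+)?)\s*({_UNITS})(?![A-Za-z/])")


def extract_measurements(text):
    """Return (value, unit, variable) triples for each number followed by a known unit."""
    return [
        (float(value), unit, UNIT_TO_VARIABLE[unit])
        for value, unit in _MEASUREMENT.findall(text)
    ]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    sentence = "Water temperature was 12.5 °C at 30 m depth, with nitrate at 0.8 mg/L."
    for value, unit, variable in extract_measurements(sentence):
        print(value, unit, variable)
    # Distance between two illustrative coordinate pairs.
    print(round(haversine_km(40.4, -3.7, 41.4, 2.2), 1))
```

A real system would need a far larger unit inventory, handling of ranges such as "10-20 m", and disambiguation of place names against a gazetteer such as GeoNames before any coordinates are available; none of that is attempted in this sketch.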

Year:  2010        PMID: 20515448      PMCID: PMC2901371          DOI: 10.1186/1471-2105-11-294

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


