Lynette Hirschman, Gully A P C Burns, Martin Krallinger, Cecilia Arighi, K Bretonnel Cohen, Alfonso Valencia, Cathy H Wu, Andrew Chatr-Aryamontri, Karen G Dowell, Eva Huala, Anália Lourenço, Robert Nash, Anne-Lise Veuthey, Thomas Wiegers, Andrew G Winter.
Abstract
Molecular biology has become heavily dependent on biological knowledge encoded in expert-curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-)automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on 'Text Mining for the BioCuration Workflow' at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also varied considerably. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.
Year: 2012 PMID: 22513129 PMCID: PMC3328793 DOI: 10.1093/database/bas020
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Text mining and the biocuration workflow: main tasks of a canonical annotation workflow, including (A) triage, (B) bio-entity identification and normalization, (C) annotation event detection, (D) evidential qualifier association and (E) database record completion.
Partial list of text mining tools and capabilities in the BioCuration Workflow supporting: Triage, bio-entity identification and normalization, annotation relation and event detection and evidential qualifier association
A dark cell indicates that the tool is applicable to the task; a light-colored cell indicates that it is not applicable. Tools are linked to their associated websites.
Biological databases represented in the surveys: biocurators from databases in *bold were interviewed for the initial biocuration workflow study
| Database | Description |
|---|---|
| Protein–protein interaction | |
| Model Organism Databases | |
| RGD | Rat Genome Database |
| Dictybase | |
| MaizeGDB | Maize Genome Database |
| WormBase | Database of the biology and genome of |
| FlyBase | Database of |
| SoyBase | Resource for soybean researchers |
| UniProt | Protein Database |
| Pathway and reactions | |
| Reactome | Signaling and metabolic pathways, focused on human |
| SABIO-RK | SABIO-Reaction Kinetics Database Genome |
| JGI | Joint Genome Institute genome portal |
| AgBase | Resource for functional analysis of agricultural plant and animal gene products |
| @NoteWiki | Genome-scale metabolic reconstruction and regulatory network analysis |
| Cardiovascular Gene Ontology | Gene Ontology annotations for the cardiovascular system |
| modENCODE | Model organism ENCyclopedia Of DNA Elements project |
| BioWisdom | Healthcare intelligent system |