Tanya Z Berardini, Donghui Li, Robert Muller, Raymond Chetty, Larry Ploetz, Shanker Singh, April Wensel, Eva Huala.
Abstract
As the scientific literature grows, leading to an increasing volume of published experimental data, so does the need to access and analyze this data using computational tools. The most commonly used method to convert published experimental data on gene function into controlled vocabulary annotations relies on a professional curator, employed by a model organism database or a more general resource such as UniProt, to read published articles and compose annotation statements based on the articles' contents. A more cost-effective and scalable approach capable of capturing gene function data across the whole range of biological research organisms in computable form is urgently needed. We have analyzed a set of ontology annotations generated through collaborations between the Arabidopsis Information Resource and several plant science journals. Analysis of the submissions entered using the online submission tool shows that most community annotations were well supported and the ontology terms chosen were at an appropriate level of specificity. Of the 503 individual annotations that were submitted, 97% were approved and community submissions captured 72% of all possible annotations. This new method for capturing experimental results in a computable form provides a cost-effective way to greatly increase the available body of annotations without sacrificing annotation quality. Database URL: www.arabidopsis.org.
Year: 2012 PMID: 22859749 PMCID: PMC3410254 DOI: 10.1093/database/bas030
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. The TOAST interface. (A) Initial page that requests stable article identifiers and locus identifiers. Users can then add annotations in six different areas, five of which are controlled vocabularies. (B) The subcellular localization data entry form. Submissions are aided by an auto-complete function that suggests terms matching the user's entry. Once a term is selected, the appropriate stable ID for the ontology term is also captured but not displayed to the submitter. Users can also enter terms not in the suggestion list. (C) Form with data ready for submission. At this stage the user may add additional loci or annotations, or complete the submission process by saving to the curation database.
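The auto-complete behavior described in panel (B) can be illustrated with a minimal sketch. It is not the TOAST implementation: the function names `suggest` and `capture`, the case-insensitive substring matching rule, and the small `TERMS` lookup are assumptions made for illustration. The term labels and IDs in `TERMS` are taken from the specificity table in this record.

```python
# Hypothetical lookup of term label -> stable ontology ID.
# Labels and IDs come from the specificity table; the structure is an assumption.
TERMS = {
    "mitochondrion": "GO:0005739",
    "cytokinesis": "GO:0000910",
    "cell plate assembly": "GO:0000919",
    "leaf": "PO:0025034",
    "vascular leaf": "PO:0009025",
}


def suggest(entry, terms=TERMS, limit=10):
    """Return (label, stable_id) pairs whose label contains the entry.

    Matching is case-insensitive substring matching (an assumed rule).
    Labels that start with the entry are listed first, then alphabetical order.
    """
    needle = entry.strip().lower()
    if not needle:
        return []
    hits = [(label, term_id) for label, term_id in terms.items() if needle in label]
    hits.sort(key=lambda hit: (not hit[0].startswith(needle), hit[0]))
    return hits[:limit]


def capture(entry, terms=TERMS):
    """Build the stored record for a submitted term.

    A known term carries its stable ID; free text that is not in the
    suggestion list is kept with term_id set to None.
    """
    label = entry.strip().lower()
    return {"label": label, "term_id": terms.get(label)}
```

Calling `suggest("leaf")` would offer both "leaf" and "vascular leaf", while `capture` stores the stable ID alongside the label so that the submitter never has to see or type it.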
Figure 2. Literature-based annotation at TAIR (2000–2010). The total number of research articles containing Arabidopsis gene-related information in the TAIR database is represented in blue. In green and orange are the number of articles used for controlled vocabulary annotations by either TAIR or the community, respectively.
Figure 3. Distribution of community annotation counts. The bins group articles by number of associated community annotations.
Figure 4. Analysis of community annotations. (A) Completeness of community annotations. The 50 articles analyzed are shown on the X-axis, and the total number of curator and community annotations per paper is shown on the Y-axis. The number of community annotations is shown in blue, and the number of added curator annotations in orange. (B) Experimental support for community annotations. Supported community annotations are shown in blue, unsupported community annotations in orange, and out-of-scope annotations in green. (C) Level of specificity of community annotations. Papers are shown on the X-axis, and the total number of community annotations per paper is shown on the Y-axis. Community annotations with the same specificity as curator annotations are shown in blue, more specific community annotations in orange, and more specific curator annotations in green.
Specificity of community versus curator annotations
| Article | Term submitted by author | Term matched by submission software | Term selected by curator | Annotation count | Greater specificity | Min. number of steps between two terms |
|---|---|---|---|---|---|---|
| A | Mitochondrion | Mitochondrion (GO:0005739) | Mitochondrial respiratory chain complex I (GO:0005747) | 1 | Curator | 6 |
| A | Mitochondrion | Mitochondrion (GO:0005739) | Mitochondrial respiratory chain complex III (GO:0005750) | 1 | Curator | 6 |
| A | Mitochondrion | Mitochondrion (GO:0005739) | Mitochondrial proton-transporting ATP synthase complex (GO:0005753) | 1 | Curator | 5 |
| B | Ammonium transmembrane transporter activity | Ammonium transmembrane transporter activity (GO:0008519) | High affinity secondary active ammonium transmembrane transporter activity (GO:0015398) | 1 | Curator | 1 |
| C | Leaf | Leaf (PO:0025034) | Vascular leaf (PO:0009025) | 1 | Curator | 1 |
| D | Cytokinesis | Cytokinesis (GO:0000910) | Cell plate assembly (GO:0000919) | 1 | Curator | 2 |
| E | Response to ABA | Response to abscisic acid stimulus (GO:0009737) | Negative regulation of abscisic acid-mediated signaling pathway (GO:0009788) | 1 | Curator | 4 |
| F | Cytokinesis | Cytokinesis (GO:0000910) | Cell plate assembly (GO:0000919) | 1 | Curator | 2 |
| G | Leaf | Leaf (PO:0025034) | Vascular leaf (PO:0009025) | 2 | Curator | 1 |
| H | Protein binding | Protein binding (GO:0005515) | Protein self-association (GO:0043621) | 1 | Curator | 1 |
| H | Protein binding | Protein binding (GO:0005515) | Protein heterodimerization activity (GO:0046982) | 1 | Curator | 2 |
| I | Seed maturation | Seed maturation (GO:0010431) | Negative regulation of seed maturation (GO:2000692) | 3 | Curator | 2 |
| I | Seed maturation | Seed maturation (GO:0010431) | Regulation of seed maturation (GO:2000034) | 1 | Curator | 1 |
| J | Transcription activator activity | Transcription activator activity (GO:0003710) | Positive regulation of transcription, DNA-dependent (GO:0045893) | 1 | Curator | 1 |
| K | Leaf formation | Leaf formation (GO:0010338) | Leaf morphogenesis (GO:0009965) | 1 | Author | 1 |
| L | Brassinosteroid-mediated signaling pathway | Brassinosteroid-mediated signaling pathway (GO:0009742) | Response to brassinosteroid stimulus (GO:0009741) | 1 | Author | 2 |
Based on GO ontology files as of 23 August 2011.
a New GO term added.
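The "Min. number of steps between two terms" column can be read as a shortest-path length in the ontology graph. The sketch below shows one way such a count could be computed, using breadth-first search from the more specific term upward through its parents. The record does not state how the authors computed the values, so the method is an assumption, and the `PARENTS` map uses made-up term names and edges rather than real GO or PO content.

```python
from collections import deque

# Toy child -> parents map. Term names and edges are hypothetical,
# chosen only to show a term with more than one route to an ancestor.
PARENTS = {
    "term_d": ["term_b", "term_c"],
    "term_c": ["term_b"],
    "term_b": ["term_a"],
    "term_a": [],
}


def min_steps(specific, general, parents=PARENTS):
    """Return the fewest parent edges from `specific` up to `general`.

    Uses breadth-first search, so the first time `general` is reached
    the distance is minimal. Returns None if `general` is not an
    ancestor of (or equal to) `specific`.
    """
    queue = deque([(specific, 0)])
    seen = {specific}
    while queue:
        term, dist = queue.popleft()
        if term == general:
            return dist
        for parent in parents.get(term, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None
```

In the toy graph, `term_d` reaches `term_a` by two routes (via `term_b` directly, or via `term_c` then `term_b`); the search reports the shorter one, which mirrors the "minimum" in the column header.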
Figure 5. TAIR annotation detail page showing attribution to community member.