Stefan Götz, Roland Arnold, Patricia Sebastián-León, Samuel Martín-Rodríguez, Patrick Tischler, Marc-André Jehl, Joaquín Dopazo, Thomas Rattei, Ana Conesa.
Abstract
MOTIVATION: Functional genomics research has expanded enormously in the last decade thanks to the falling cost of high-throughput technologies and the development of computational tools that generate, standardize and share information on gene and protein function, such as the Gene Ontology (GO). Nevertheless, many biologists, especially those working with non-model organisms, still suffer from missing or low-coverage functional annotation, or simply struggle with retrieving, summarizing and querying these data.
Year: 2011 PMID: 21335611 PMCID: PMC3065692 DOI: 10.1093/bioinformatics/btr059
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The B2G-FAR annotation pipeline. The scheme shows how the different data sources are related and how they contribute to the generation of the B2G-FAR annotations, all of which pass through the Blast2GO annotation algorithm.
Simap2GO annotation coverage: the table shows the number of Blast2GO-annotated sequences relative to the whole SIMAP dataset (May 2010) and the number of GO sequences used as the annotation source (reference dataset).
| Data source | Unique sequences |
|---|---|
| Whole SIMAP | 29 906 548 |
| SIMAP without metagenomes | 25 099 929 |
| SIMAP protein sequences annotated by Blast2GO | 14 175 984 |
| Sequences whose annotations do not surpass the annotation threshold | 1 938 862 |
| Sequences without sequence alignment | 8 985 083 |
| GO annotation source sequences (only sequences with non-electronic annotations) | 465 677 |
Only sequences with at least one non-electronic (non-IEA) annotation were used (GO-Lite dataset). The table also gives the number of sequences that could not be annotated, i.e. sequences without sequence alignments and sequences whose annotations did not surpass the annotation threshold.
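The counts in this table can be cross-checked with a few lines of arithmetic. The minimal sketch below, which uses only the numbers reported above, confirms that annotated, below-threshold and unaligned sequences together account for all 25 099 929 non-metagenome SIMAP sequences, and expresses each class as a share of that set (roughly 56.5 % annotated). The constant and function names are illustrative, not part of B2G-FAR.

```python
# Unique-sequence counts copied from the Simap2GO coverage table (SIMAP, May 2010).
# Names are hypothetical, chosen for this illustration only.
SIMAP_NO_METAGENOMES = 25_099_929
ANNOTATED = 14_175_984        # annotated by Blast2GO
BELOW_THRESHOLD = 1_938_862   # aligned, but no annotation passed the threshold
NO_ALIGNMENT = 8_985_083      # no sequence alignment at all


def coverage_summary() -> dict[str, float]:
    """Check that the three outcome classes partition the non-metagenome set
    and return each class as a percentage of that set."""
    parts = {
        "annotated": ANNOTATED,
        "below_threshold": BELOW_THRESHOLD,
        "no_alignment": NO_ALIGNMENT,
    }
    total = sum(parts.values())
    if total != SIMAP_NO_METAGENOMES:
        raise ValueError(f"classes sum to {total}, expected {SIMAP_NO_METAGENOMES}")
    return {name: 100 * n / total for name, n in parts.items()}


if __name__ == "__main__":
    for name, pct in coverage_summary().items():
        print(f"{name:>16}: {pct:5.1f} %")
```

Raising an error rather than silently normalizing makes a transcription mistake in any of the four counts visible immediately.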
Functional annotations of 10 000 random sequences in GO and B2G-FAR, compared against each other (annotation score ≥ 70, E-value ≤ 1 × 10⁻¹⁰, GO weight (GOw) = 5, 5 BLAST hits)
| Comparison* | GO versus FAR | FAR versus GO |
|---|---|---|
| Compared terms | 46 414 (GO) | 61 176 (B2G-FAR) |
| Exact GO term match | 29 446 | 29 446 |
| More specific GO terms | 510 | 7960 |
| More general GO terms | 13 457 | 156 |
| Other GO branch | 1126 | 16 193 |
| Other GO category | 1875 | 7421 |
*Comparisons are given as reference database versus compared database; all counts refer to terms of the reference database.
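In both directions the five categories add up exactly to the number of compared terms (46 414 for GO, 61 176 for B2G-FAR), so each row can be read as a share of the reference terms. The sketch below, built solely from the table's counts, checks that partition and converts each category to a percentage; for example, exact matches make up about 63.4 % of the GO terms but only about 48.1 % of the B2G-FAR terms. The data-structure and function names are illustrative assumptions.

```python
# Term counts copied from the GO versus B2G-FAR comparison table
# (10 000 random sequences; counts refer to the reference database).
# Names below are hypothetical, used only for this illustration.
CATEGORIES = (
    "Exact GO term match",
    "More specific GO terms",
    "More general GO terms",
    "Other GO branch",
    "Other GO category",
)
COMPARISONS = {
    "GO versus FAR": (46_414, (29_446, 510, 13_457, 1_126, 1_875)),
    "FAR versus GO": (61_176, (29_446, 7_960, 156, 16_193, 7_421)),
}


def category_shares(total: int, counts: tuple[int, ...]) -> dict[str, float]:
    """Verify that the categories account for every compared term and
    return each category as a percentage of the compared terms."""
    if sum(counts) != total:
        raise ValueError(f"categories sum to {sum(counts)}, expected {total}")
    return {cat: 100 * n / total for cat, n in zip(CATEGORIES, counts)}


if __name__ == "__main__":
    for label, (total, counts) in COMPARISONS.items():
        print(label)
        for cat, pct in category_shares(total, counts).items():
            print(f"  {cat:<24}{pct:5.1f} %")
```

Normalizing by each direction's own total matters here: the exact-match count is identical (29 446) in both columns, but because B2G-FAR contributes more terms overall, the same count corresponds to a smaller share.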
Fig. 2. Comparison between NetAffx and Blast2GO-generated annotations for the GeneChips contained in B2G-FAR.