Zhe Liu, Hongwu Ma, Igor Goryanin.
Abstract
BACKGROUND: Several genome annotation services have been developed in recent years and are widely used. However, the functional annotations produced by different services often disagree, so a scheme for obtaining consensus functional annotations by integrating the different results is needed.
Year: 2013 PMID: 23725374 PMCID: PMC3680241 DOI: 10.1186/1471-2105-14-172
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flowchart of the gene annotation comparison procedure. The boxes represent the processes used to compare the annotations, and the figures in each box give the number and rate of annotation comparisons.
Examples of three types of term difference
| Text variant | cAMP-binding proteins - catabolite gene activator and regulatory subunit of cAMP-dependent protein kinases | putative cAMP-binding protein - catabolite protein activator and regulatory subunit of cAMP-dependent protein kinase |
| Synonym and abbreviation | RIP metalloprotease | Membrane-associated zinc metalloprotease |
| Functional expression variant | phosphoribosyl-ATP pyrophosphatase/phosphoribosyl-AMP cyclohydrolase | phosphoribosyl-ATP diphosphatase |
Figure 2. Flowchart of similarity cut-off determination for the vector space model.
Paired comparison results between automated annotation services
| Baseline | 62% | 51% | 39% | 46% | 40% | 32% |
| One HP another non-HP | 1% | 1% | 1% | 2% | 2% | 1% |
| Annotation and database term | 23% | 20% | 11% | 27% | 14% | 30% |
| Gene symbol entry | 1% | 2% | 5% | 1% | 4% | 5% |
| Pfam entry | 3% | 7% | 11% | 6% | 10% | 8% |
| Orthologue | 1% | 2% | 7% | 3% | 7% | 6% |
| Vector space model | 1% | 2% | 3% | 1% | 2% | 2% |
| Matching | 1% | 3% | 6% | 1% | 4% | 1% |
| Overall result | 93% | 88% | 83% | 87% | 83% | 85% |
Baseline: baseline annotation comparison result; One HP another non-HP: one annotation is a ‘hypothetical protein’ and the other has a characterised function; Annotation and database term: the two annotations have the same annotation text or database terms; Gene symbol entry: annotations are derived from the same gene symbol entry; Pfam entry: annotations are derived from the same Pfam entry; Orthologue: annotations are derived from the same OGA; Vector space model: vector space model comparison; Matching: matching between comparison results; Overall result: total number of gene annotations compared.
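The vector space model step compares annotation texts by similarity against a cut-off. The record does not state how terms are weighted or what cut-off was chosen, so the following is only a minimal sketch under assumptions: plain term-frequency vectors, cosine similarity, and a hypothetical cut-off of 0.5. The function names `tokenize`, `cosine_similarity` and `same_function` are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase an annotation and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the term-frequency vectors of two annotations."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Dot product over the terms the two annotations share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty annotation is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical cut-off; the value used in the paper is not given in this record.
CUTOFF = 0.5


def same_function(a, b, cutoff=CUTOFF):
    """Treat two annotations as equivalent when their similarity reaches the cut-off."""
    return cosine_similarity(a, b) >= cutoff


if __name__ == "__main__":
    # The "text variant" pair from the table of term differences.
    first = ("cAMP-binding proteins - catabolite gene activator and "
             "regulatory subunit of cAMP-dependent protein kinases")
    second = ("putative cAMP-binding protein - catabolite protein activator and "
              "regulatory subunit of cAMP-dependent protein kinase")
    print(round(cosine_similarity(first, second), 3), same_function(first, second))
```

A bag-of-words comparison of this kind would be expected to catch text variants, which share most of their words, but not synonym pairs such as "RIP metalloprotease" and "Membrane-associated zinc metalloprotease", which share few words. That limitation is presumably why the procedure also draws on database terms, gene symbols, Pfam entries and orthologues.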
Figure 3. Flowchart of the gene annotation determination procedure.
Discrepant annotations between automated annotation services and consensus annotation
| IGS | 74 |
| IMG | 122 |
| JCVI | 133 |
| RAST | 134 |
Examples of discrepant annotations between automated annotation services and consensus annotation
| 76 | putative membrane protein | hypothetical protein | conserved hypothetical protein | hypothetical protein | Putative membrane protein |
| 2119 | hypothetical protein | ABC-type Co2+ transport system | | | ABC-type Co2+ transport system, periplasmic component |
| 547 | hypothetical protein | hypothetical protein | prepilin-type N-terminal cleavage/methylation domain protein | hypothetical protein | prepilin-type N-terminal cleavage/methylation domain protein |
| 2176 | hypothetical protein | hypothetical protein | conserved hypothetical protein | FAD/FMN-containing dehydrogenases | FAD/FMN-containing dehydrogenases |
Discrepant annotations between RAST and consensus annotation
| 44 | hypothetical protein | membrane transport family protein | consensus | Pfam, TIGRfam, BLAST, TMHMM |
| 1333 | protein of unknown function DUF481 | Putative salt-induced outer membrane protein | consensus | BLAST |
| 77 | hypothetical protein | tat (twin-arginine translocation) pathway signal sequence domain protein | neither | NADH dehydrogenase, FAD-containing subunit (BLAST) |
| 2081 | conserved hypothetical protein | Methyltransferase domain. | neither | Tellurite resistance protein TehB (Pfam) |
| 982 | histidine kinase | bacterial extracellular solute-binding proteins, family 3 family protein | RAST | two-component sensor histidine kinase (BLAST) |
| 1900 | hypothetical protein | N- methylation | RAST | hypothetical protein (BLAST) |
| 183 | conserved hypothetical protein | putative membrane protein | not enough evidence | |
| 212 | conserved hypothetical protein | transcriptional regulator, Spx/MgsR family | not enough evidence | |
Figure 4. Automated annotation comparison results for six genomes. eco, ctr, hpy, mge, mtu and rpr denote the automated genome annotation comparison results for Escherichia coli K-12 MG1655, Chlamydia trachomatis D/UW-3/CX, Helicobacter pylori 26695, Mycoplasma genitalium G37, Mycobacterium tuberculosis H37Rv and Rickettsia prowazekii Madrid E, respectively. These abbreviations are also used in the following sections.
Genome annotation comparison results for six genomes
| eco | IGS vs IMG | 52% | 86% | 34% |
| eco | IGS vs RAST | 37% | 68% | 31% |
| eco | IMG vs RAST | 32% | 66% | 34% |
| ctr | IGS vs IMG | 55% | 79% | 24% |
| ctr | IGS vs RAST | 40% | 61% | 21% |
| ctr | IMG vs RAST | 50% | 73% | 23% |
| hpy | IGS vs IMG | 49% | 79% | 30% |
| hpy | IGS vs RAST | 38% | 64% | 26% |
| hpy | IMG vs RAST | 46% | 72% | 26% |
| mge | IGS vs IMG | 81% | 93% | 12% |
| mge | IGS vs RAST | 82% | 91% | 9% |
| mge | IMG vs RAST | 82% | 90% | 8% |
| mtu | IGS vs IMG | 43% | 72% | 29% |
| mtu | IGS vs RAST | 35% | 56% | 21% |
| mtu | IMG vs RAST | 39% | 62% | 23% |
| rpr | IGS vs IMG | 54% | 77% | 23% |
| rpr | IGS vs RAST | 32% | 59% | 27% |
| rpr | IMG vs RAST | 40% | 67% | 27% |
Figure 5. Automated annotation determination results for six genomes.