Vasilis J Promponas, Ioannis Iliopoulos, Christos A Ouzounis.
Abstract
The function annotation process in computational biology has increasingly shifted from the traditional characterization of individual biochemical roles of protein molecules to the system-wide detection of entire metabolic pathways and genomic structures. The so-called genome-aware methods broaden misannotation inconsistencies in genome sequences beyond protein function assignments, encompassing phylogenetic anomalies and artifactual genomic regions. We outline three categories of error propagation in databases by providing striking examples, at various levels of appreciation by the community from traditional to emerging, thus raising awareness for future solutions.
Keywords: Error propagation; Genome evolution; Genome structure; Genome-aware methods; Genome-wide annotation; Mis-annotation modeling; Next-generation sequencing; Protein function prediction
Year: 2015 PMID: 26594309 PMCID: PMC4653902 DOI: 10.1186/s40793-015-0101-2
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Fig. 1 Depiction of the relationships across eight families of 62 “putative” proteins. Network view of sequence similarities detected by BlastP [21], generated with BioLayout [22]. Six of the eight displayed families originate from a single genome project [23]
Eight select cases of similarity-based mis-assignment
| # | GI # | Accession # | Description | Species |
|---|---|---|---|---|
| 1 | 19698819 | gb\|AAL91145.1 | putative protein {Nup85} | |
| 2 | 7573329 | emb\|CAB87799.1 | putative protein {Sec16} | |
| 3 | 296819643 | ref\|XP_002849880.1 | protein kinase domain-containing protein {+Nic96} | |
| 4 | 557867390 | gb\|ESS70565.1 | unspecified product {Sec16} | |
| 5 | 316978722 | gb\|EFV61666.1 | putative ATP synthase F1, delta subunit {Nup98-96} | |
| 6 | 308809856 | ref\|XP_003082237.1 | ATP-dependent RNA helicase (ISS) {Sec16} | |
| 7 | 255574074 | ref\|XP_002527953.1 | nucleotide binding protein, putative {Sec16} | |
| 8 | 443916862 | gb\|ELU37796.1 | DUF1479 domain-containing protein {+Nup85} | |
Column names: #, case number; GI #, gene identifier number; Accession #, database and accession number; Description, description line; Species, species name (and strain type where available). Curly brackets in the Description field list the corresponding protein domains: the nucleoporins Nup85, Nup98-96 and Nic96, and ancestral coatomer element 1 Sec16 (ACE1-Sec16-like). A + sign marks a partially correct annotation that misses the indicated domain (two cases)
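The notation used in the table can be read mechanically. The following is a minimal sketch, not part of the original article, of how one row's Accession # and Description fields might be split into a database tag, an accession, the free-text description, the bracketed domain, and a flag for the + sign. The names `Case` and `parse_case` are hypothetical and chosen only for illustration.

```python
import re
from typing import NamedTuple


class Case(NamedTuple):
    """One table row, as an illustrative (hypothetical) record type."""
    database: str            # source database tag, e.g. "gb", "emb", "ref"
    accession: str           # accession with version, e.g. "AAL91145.1"
    description: str         # description line without the bracketed domain
    domain: str              # domain listed in curly brackets, e.g. "Nup85"
    partially_correct: bool  # True when the domain carries a leading "+"


# Free text, then "{Domain}" or "{+Domain}" at the end of the field.
_DOMAIN = re.compile(r"^(?P<text>.*?)\s*\{(?P<plus>\+?)(?P<domain>[^{}]+)\}\s*$")


def parse_case(accession_field: str, description_field: str) -> Case:
    """Split the Accession # and Description fields of one table row."""
    # "gb|AAL91145.1" -> ("gb", "AAL91145.1")
    database, _, accession = accession_field.partition("|")
    match = _DOMAIN.match(description_field)
    if match is None:
        raise ValueError(f"no curly-bracket domain in: {description_field!r}")
    return Case(
        database=database,
        accession=accession,
        description=match["text"],
        domain=match["domain"],
        partially_correct=bool(match["plus"]),
    )


if __name__ == "__main__":
    # Cases 1 and 3 from the table above.
    print(parse_case("gb|AAL91145.1", "putative protein {Nup85}"))
    print(parse_case("ref|XP_002849880.1",
                     "protein kinase domain-containing protein {+Nic96}"))
```

Keeping the + sign as a separate boolean, rather than as part of the domain name, makes it straightforward to separate the two partially correct annotations from the six cases whose description line does not mention the domain at all.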
Fig. 2 Phylogenetic distribution of nucleoporin Nup160 domains in Pfam. The collapsed eukaryotic tree with the distribution of 336 members is shown, along with the bacterial branch containing two unexpected entries with 3 members (underlined by a purple oval box). These phylogenetic anomalies are present both in Pfam (PF11715) [24] and in the corresponding UniProt entries [14]. The presence of other domains is also shown
Fig. 3 Domain organization for two unique instances of multi-domain architectures for Y-Nups. The arginase-Nup133 (Nucleoporin_C) fusion is accompanied by a Nup170-like domain in the middle (green) [top]. The aconitase-Nup75 (Nup85) fusion also contains a number of other regions of interest [bottom]. For details, refer to the corresponding UniProt/Pfam entries; see main text for identifiers