Kevin Galens, Joshua Orvis, Sean Daugherty, Heather H Creasy, Sam Angiuoli, Owen White, Jennifer Wortman, Anup Mahurkar, Michelle Gwinn Giglio.
Abstract
The Institute for Genome Sciences (IGS) has developed a prokaryotic annotation pipeline that is used for coding gene/RNA prediction and functional annotation of Bacteria and Archaea. The fully automated pipeline accepts one or many genomic sequences as input and produces output in a variety of standard formats. Functional annotation is primarily based on similarity searches and motif finding, combined with a hierarchical rule-based annotation system. The output annotations can also be loaded into a relational database and accessed through visualization tools.
Keywords: BER; Ergatis; Glimmer; HMM; IGS Annotation Engine; Institute for Genome Sciences; Manatee; annotation pipeline; functional annotation; microbial genomics; pFunc; prokaryotic genomics; structural annotation
Year: 2011 PMID: 21677861 PMCID: PMC3111993 DOI: 10.4056/sigs.1223234
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Figure 1. Flow of data and logic for the IGS automated microbial annotation pipeline. Protein-coding genes and RNAs are predicted from the nucleotide sequence, then structurally curated and assigned a function.
BER annotation hierarchy
| Trusted match | | | Rank | Name modification | GO terms / TIGR role |
|---|---|---|---|---|---|
| Yes | Full | Full | 1 | None | copied from match |
| Yes | Full | Partial | 2 | … domain protein | GO root terms/TIGR unknown |
| Yes | Partial | Full | 2 | … domain protein | copied from match |
| No | Full | Full | 3 | possible … | copied from match |
| No | Partial | Full | 4 | possible … domain protein | GO root/TIGR unknown |
| No | Full | Partial | 4 | possible … domain protein | GO root/TIGR unknown |
| with ambiguous term | Full/Partial | Full/Partial | 5 | “conserved hypothetical protein” | GO root/TIGR conserved hypothetical |
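
The BER table above reads naturally as a lookup from match properties to a rank, a name modification, and a source for GO terms and TIGR roles. The following is a minimal sketch of that lookup, not the pipeline's own code. The names `BerRule`, `BER_RULES`, `ber_rule`, and `apply_name` are hypothetical, the keys are the first three table cells exactly as printed, and the assumption that "…" stands for the name taken from the match is ours.

```python
from typing import NamedTuple


class BerRule(NamedTuple):
    rank: int
    name_template: str   # "{}" stands in for the "…" placeholder in the table
    go_role_source: str  # last column of the table, copied verbatim


# Keys are (column 1, column 2, column 3) exactly as printed in the table.
BER_RULES = {
    ("Yes", "Full", "Full"): BerRule(1, "{}", "copied from match"),
    ("Yes", "Full", "Partial"): BerRule(2, "{} domain protein", "GO root terms/TIGR unknown"),
    ("Yes", "Partial", "Full"): BerRule(2, "{} domain protein", "copied from match"),
    ("No", "Full", "Full"): BerRule(3, "possible {}", "copied from match"),
    ("No", "Partial", "Full"): BerRule(4, "possible {} domain protein", "GO root/TIGR unknown"),
    ("No", "Full", "Partial"): BerRule(4, "possible {} domain protein", "GO root/TIGR unknown"),
}

# The last table row applies to any Full/Partial combination.
AMBIGUOUS_RULE = BerRule(
    5, "conserved hypothetical protein", "GO root/TIGR conserved hypothetical"
)


def ber_rule(first_col: str, col2: str, col3: str) -> BerRule:
    """Return the table row for a match; raises KeyError for unlisted combinations."""
    if first_col == "with ambiguous term":
        return AMBIGUOUS_RULE
    return BER_RULES[(first_col, col2, col3)]


def apply_name(rule: BerRule, match_name: str) -> str:
    """Fill the "…" placeholder with the name of the match (assumed meaning)."""
    return rule.name_template.format(match_name)
```

For example, `apply_name(ber_rule("No", "Partial", "Full"), "kinase")` yields `possible kinase domain protein` under these assumptions. Combinations the table does not list, such as two partial values without an ambiguous term, are deliberately left as errors rather than guessed.
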
HMM annotation hierarchy.*
| HMM type | Rank | Name modification |
|---|---|---|
| Equivalog | 1 | None |
| Equivalog Domain | 2 | None |
| Subfamily | 3 | … family protein |
| Superfamily | 4 | … family protein |
| Subfamily Domain | 5 | … domain protein |
| Domain | 6 | … domain protein |
| Pfam | 7 | … family protein |
| Hypothetical Equivalog | 7 | None |
*In all cases, GO terms and TIGR roles are copied from the HMM.
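
As a rough illustration of the HMM table, the sketch below maps an HMM type to its rank and applies the listed name modification. It assumes that "…" stands for the name carried by the HMM and that the modification is appended as a suffix; `HMM_RULES` and `hmm_product_name` are hypothetical names, not part of the pipeline.

```python
from typing import Optional, Tuple

# HMM type -> (rank, suffix appended to the HMM name, or None for no change).
# Values are transcribed from the table above.
HMM_RULES = {
    "Equivalog": (1, None),
    "Equivalog Domain": (2, None),
    "Subfamily": (3, "family protein"),
    "Superfamily": (4, "family protein"),
    "Subfamily Domain": (5, "domain protein"),
    "Domain": (6, "domain protein"),
    "Pfam": (7, "family protein"),
    "Hypothetical Equivalog": (7, None),
}


def hmm_product_name(hmm_type: str, hmm_name: str) -> Tuple[int, str]:
    """Return (rank, product name) for a hit to an HMM of the given type."""
    rank, suffix = HMM_RULES[hmm_type]
    name = hmm_name if suffix is None else f"{hmm_name} {suffix}"
    return rank, name
```

Per the table footnote, GO terms and TIGR roles would be copied from the HMM in every case, so the sketch does not model them separately.
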
Final annotation hierarchy
| Evidence source | Evidence type | | | Rank |
|---|---|---|---|---|
| HMM | Equivalog | N/A | N/A | 1 |
| BER | Trusted | Full | Full | 2 |
| HMM | Equivalog Domain | Full | Full | 3 |
| BER | Trusted | Partial | Full | 4 |
| HMM | Subfamily | N/A | N/A | 5 |
| HMM | Superfamily | N/A | N/A | 6 |
| HMM | Subfamily Domain | N/A | N/A | 7 |
| HMM | Domain | Partial | Full | 8 |
| HMM | Pfam | Full | Full | 9 |
| BER | Trusted | Full | Partial | 10 |
| TMHMM | > 5 membrane spans | N/A | N/A | 11 |
| LipoP | Presence of prediction | N/A | N/A | 12 |
| HMM | Hypothetical Equivalog | N/A | N/A | 13 |
| BER | Not trusted | Full | Full | 14 |
| BER | Not trusted | Partial | Full | 15 |
| BER | Not trusted | Full | Partial | 16 |
| BER | With ambiguous term | Full/Partial | Full/Partial | 17 |
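
One way to read the final hierarchy is as a priority list: collect all evidence for a gene, look up each item's rank, and keep the item with the lowest rank. The sketch below illustrates that reading only; it is an assumption about how the table is applied, and `FINAL_HIERARCHY`, `final_rank`, and `pick_best` are hypothetical names. Rows are matched on the four leading cells exactly as printed, with "Full/Partial" accepting either value.

```python
from typing import List, Optional, Tuple

Evidence = Tuple[str, str, str, str]  # the four leading cells of a table row

# Rows in table order; the rank is the 1-based position (1 to 17).
FINAL_HIERARCHY: List[Evidence] = [
    ("HMM", "Equivalog", "N/A", "N/A"),
    ("BER", "Trusted", "Full", "Full"),
    ("HMM", "Equivalog Domain", "Full", "Full"),
    ("BER", "Trusted", "Partial", "Full"),
    ("HMM", "Subfamily", "N/A", "N/A"),
    ("HMM", "Superfamily", "N/A", "N/A"),
    ("HMM", "Subfamily Domain", "N/A", "N/A"),
    ("HMM", "Domain", "Partial", "Full"),
    ("HMM", "Pfam", "Full", "Full"),
    ("BER", "Trusted", "Full", "Partial"),
    ("TMHMM", "> 5 membrane spans", "N/A", "N/A"),
    ("LipoP", "Presence of prediction", "N/A", "N/A"),
    ("HMM", "Hypothetical Equivalog", "N/A", "N/A"),
    ("BER", "Not trusted", "Full", "Full"),
    ("BER", "Not trusted", "Partial", "Full"),
    ("BER", "Not trusted", "Full", "Partial"),
    ("BER", "With ambiguous term", "Full/Partial", "Full/Partial"),
]


def _cell_matches(table_cell: str, value: str) -> bool:
    """A "Full/Partial" cell accepts either value; other cells must match exactly."""
    if table_cell == "Full/Partial":
        return value in ("Full", "Partial")
    return table_cell == value


def final_rank(source: str, kind: str, col3: str = "N/A", col4: str = "N/A") -> Optional[int]:
    """Return the rank of the first matching row, or None if no row matches."""
    for rank, (src, typ, c3, c4) in enumerate(FINAL_HIERARCHY, start=1):
        if (src, typ) == (source, kind) and _cell_matches(c3, col3) and _cell_matches(c4, col4):
            return rank
    return None


def pick_best(evidence: List[Evidence]) -> Optional[Evidence]:
    """Return the evidence item with the lowest rank, ignoring unlisted items."""
    ranked = [(final_rank(*ev), ev) for ev in evidence]
    ranked = [(r, ev) for r, ev in ranked if r is not None]
    return min(ranked, key=lambda pair: pair[0])[1] if ranked else None
```

Given a trusted full/full BER match, a Subfamily HMM hit, and a TMHMM prediction for the same gene, `pick_best` would return the BER match, since its row comes first in the table.
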
Software versions and parameters
| Annotation type | Software | Version | Parameters |
|---|---|---|---|
| Structural annotation | tRNAscan-SE | 1.23 | -q -b -B |
| | RNAmmer | 1.2 | -S bac -m lsu,tsu,ssu, -xml -gff |
| | Glimmer3 | 3.02 | -o50 -g110 -t30 -z11 -l -X |
| Functional annotation | blastall -p blastx | 2.2.17 | -e 1e-5 -F T -b 150 -v 150 -M BLOSUM62 |
| | HMMer | 2.3.2 | -acc |
| | SignalP | 3.0b | -m 'nn+hmm' -trnc 1000 -graphics 'gif+eps' |
| | TMHMM | 2.0c | --libdir TMHMM/lib |
| | LipoP | 1.0a | -short -cutoff -3 |
| | Prosite (ps_scan) | 1.34 | -s |
| | RPS-blast | 2.2.17 | -e 1e-5 -F T -b 150 -v 150 |
| | Blastp | 2.2.17 | -e 1e-5 -F T -b 150 -v 150 -M BLOSUM62 |