Satoshi Tamaki, Kazuharu Arakawa, Nobuaki Kono, Masaru Tomita.
Abstract
Annotations of complete genome sequences submitted directly from sequencing projects differ widely in annotation strategy and update frequency. These inconsistencies make comparative studies difficult. To prepare data rapidly for a large number of complete genomes, genome re-annotation must be automated and fast. Here we introduce an open-source rapid genome re-annotation software system, Restauro-G, specialized for bacterial genomes. Restauro-G re-annotates a genome by similarity searches using the BLAST-Like Alignment Tool (BLAT), referring to protein databases such as UniProt KB, NCBI nr, NCBI COGs, Pfam, and PSORTb. Re-annotation by Restauro-G achieved over 98% accuracy for most bacterial chromosomes in comparison with the original manually curated annotation of EMBL releases. Restauro-G was developed in the generic bioinformatics workbench G-language Genome Analysis Environment and is distributed at http://restauro-g.iab.keio.ac.jp/ under the GNU General Public License.
Year: 2007 PMID: 17572364 PMCID: PMC5054091 DOI: 10.1016/S1672-0229(07)60014-X
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
The five reliability levels for the BLAT search
| Level | E-value | (Match/Subject length) and (Match/Query length) |
|---|---|---|
| Level 1 | ≤ 1E-70 | ≥ 98% |
| Level 2 | ≤ 1E-50 | ≥ 95% |
| Level 3 | ≤ 1E-30 | ≥ 90% |
| Level 4 | ≤ 1E-10 | ≥ 80% |
| Level 5 | None of the above | |
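The thresholds in the table above can be read as a simple cascade: a hit gets the strictest level whose E-value cutoff and both coverage ratios it satisfies, and falls to Level 5 otherwise. The following is a minimal sketch of that reading, not Restauro-G's actual code. The `Hit` class, its field names, and the `reliability_level` function are hypothetical, and it assumes "Match" means the aligned length reported for the hit.

```python
from dataclasses import dataclass

# (level, maximum E-value, minimum coverage fraction), strictest first,
# taken from the reliability table above.
LEVELS = [
    (1, 1e-70, 0.98),
    (2, 1e-50, 0.95),
    (3, 1e-30, 0.90),
    (4, 1e-10, 0.80),
]


@dataclass
class Hit:
    """Hypothetical container for one similarity-search hit."""
    evalue: float
    match_length: int    # assumed: aligned length of the hit
    query_length: int
    subject_length: int


def reliability_level(hit: Hit) -> int:
    """Return the strictest level (1-4) the hit satisfies, else 5."""
    query_cov = hit.match_length / hit.query_length
    subject_cov = hit.match_length / hit.subject_length
    for level, max_evalue, min_cov in LEVELS:
        # Both coverage ratios must meet the threshold, per the table header.
        if (hit.evalue <= max_evalue
                and query_cov >= min_cov
                and subject_cov >= min_cov):
            return level
    return 5  # "None of the above"
```

Under this reading, a hit with a very low E-value can still land at a lower level if it covers too little of either the query or the subject sequence.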
Types of annotations and information included in Restauro-G annotation
| Type | Database | Annotation |
|---|---|---|
| Similarity | UniProt KB/Swiss-Prot | ID/gene name/description/database cross-reference/E-value/level/comments/feature table |
| | NCBI nr | ID/gene name/description/E-value/level |
| Orthologous | NCBI COGs | ID/gene name/description/COG family/E-value/level |
| Domain | HMMPfam | ID/gene name/description/domain information/E-value |
| Protein location | PSORTb | Protein location information |
Validation of Restauro-G annotation accuracy
| Genome | No. of coding sequences | Annotation in EMBL | Restauro-G prediction | Matches with EMBL | Matches with Genome Reviews (%) | Time (s) |
|---|---|---|---|---|---|---|
| B. subtilis | 4,106 | 4,106 | 4,105 | 4,100 (99.85%) | 99.70% | 1,110 |
| E. coli | 4,331 | 4,259 | 4,301 | 4,202 (98.66%) | 99.46% | 510 |
| M. genitalium | 476 | 476 | 476 | 476 (100.00%) | 100.00% | 212 |
| M. tuberculosis | 4,189 | 4,186 | 4,188 | 4,127 (98.59%) | 98.50% | 2,430 |
| P. furiosus | 2,065 | 2,057 | 2,062 | 2,043 (99.31%) | 99.27% | 1,195 |
Genome versions: B. subtilis—EMBL: AL009126 07-JUL-2003 (rel. 76, ver. 3); E. coli—EMBL: U00096 13-AUG-2006 (rel. 88, ver. 6); M. genitalium—EMBL: L43967 14-JAN-2006 (rel. 86, ver. 2); M. tuberculosis—EMBL: AE000516 14-APR-2005 (rel. 83, ver. 2); P. furiosus—EMBL: AE009950 22-JAN-2004 (rel. 78, ver. 2).
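The percentages in the "Matches with EMBL" column appear to be the number of matches divided by the number of coding sequences annotated in EMBL. The record does not state the denominator, so this is an assumption; the sketch below only checks that the reported figures are consistent with it. The names `ROWS` and `match_rate` are illustrative.

```python
# (annotated in EMBL, matches with EMBL, reported %), one tuple per table row,
# in the order the rows appear in the validation table.
ROWS = [
    (4106, 4100, 99.85),
    (4259, 4202, 98.66),
    (476, 476, 100.00),
    (4186, 4127, 98.59),
    (2057, 2043, 99.31),
]


def match_rate(annotated: int, matches: int) -> float:
    """Percentage of EMBL-annotated coding sequences that were matched."""
    return 100.0 * matches / annotated


for annotated, matches, reported in ROWS:
    computed = match_rate(annotated, matches)
    # Compare the recomputed rate with the value printed in the table.
    print(f"{matches}/{annotated}: computed {computed:.4f}%, reported {reported:.2f}%")
```

With this denominator every recomputed rate falls within 0.01 percentage points of the reported value; the last row computes to about 99.319%, which suggests the published figure was truncated rather than rounded.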