Eric Altermann, Jingli Lu, Alan McCulloch.
Abstract
Expert-curated annotation remains one of the critical steps in achieving a reliable, biologically relevant annotation. Here we announce the release of GAMOLA2, a user-friendly and comprehensive software package to process, annotate, and curate draft and complete bacterial, archaeal, and viral genomes. GAMOLA2 is a wrapper tool that combines gene model determination and functional Blast, COG, Pfam, and TIGRfam analyses with structural predictions, including detection of tRNAs, rRNA genes, non-coding RNAs, signal peptide cleavage sites, transmembrane helices, CRISPR repeats, and vector sequence contamination. GAMOLA2 has already been validated on a wide range of bacterial and archaeal genomes, and its modular design allows further functionality to be added easily in future releases. A modified and adapted version of the Artemis Genome Viewer (Sanger Institute) has been developed to leverage the additional features and underlying information provided by the GAMOLA2 analysis, and is part of the software distribution. In addition to genome annotation, GAMOLA2 includes supplemental modules that assist, among other tasks, in the creation of custom Blast databases, the transfer of annotations between genome versions, and the preparation of Genbank files for submission via the NCBI Sequin tool. GAMOLA2 is intended to be run in a Linux environment, whereas the subsequent visualization and manual curation in Artemis are portable and platform independent. The development of GAMOLA2 is ongoing and community driven. New functionality can easily be added in response to user requests, ensuring that GAMOLA2 provides information relevant to microbiologists. The software is available free of charge for academic use.
Keywords: Artemis genome viewer; expert curation; genome annotation; genome visualization; microbial; sequence analysis; stand-alone software
Year: 2017 PMID: 28386247 PMCID: PMC5362640 DOI: 10.3389/fmicb.2017.00346
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. GAMOLA2 annotation workflow. A schematic representation of the GAMOLA2 core annotation workflow. Input FASTA and Genbank sequences can be concatenated and/or clustered before being submitted to gene model prediction and to functional and structural analyses. The final output comprises an annotated Genbank file and associated data that can be viewed in Artemis or other suitable software. For convenience, the results may be compressed into a single archive file. Individual input or output files are shown in red (each analysis also generates text output files, which are stored in their respective directories; not shown); programs are shown in green; available Blast flavors are shown in dark green; databases used are indicated in blue.
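The concatenation step in the workflow above can be illustrated with a minimal Python sketch. This is not GAMOLA2's own code: the function names `parse_fasta` and `concatenate_contigs`, the spacer of 100 `N` characters, and the 1-based coordinate layout are all assumptions made for illustration. The sketch joins the contigs of a multi-sequence FASTA file into one sequence and records where each contig starts and ends, which is the kind of information needed to keep a contig order file and to recognize gene models that would span a contig boundary.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Use the first word of the header as the contig name.
            name = fields[0] if fields else "unnamed"
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def concatenate_contigs(
    records: Iterable[Tuple[str, str]],
    spacer: str = "N" * 100,  # hypothetical spacer, not a GAMOLA2 default
) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Join contigs with a spacer.

    Returns the concatenated sequence and a layout list of
    (contig name, 1-based start, 1-based inclusive end).
    """
    parts: List[str] = []
    layout: List[Tuple[str, int, int]] = []
    position = 0  # number of bases written so far
    for index, (name, seq) in enumerate(records):
        if index > 0:
            parts.append(spacer)
            position += len(spacer)
        layout.append((name, position + 1, position + len(seq)))
        parts.append(seq)
        position += len(seq)
    return "".join(parts), layout


if __name__ == "__main__":
    example = ">contig1 first\nACGT\nAC\n>contig2\nggg\n"
    sequence, layout = concatenate_contigs(parse_fasta(example), spacer="NNN")
    print(sequence)
    for name, start, end in layout:
        print(name, start, end)
```

A predicted gene whose coordinates overlap a spacer interval in `layout` would cross a contig boundary, so a layout of this kind gives a simple way to flag such predictions for review.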
Figure 2. File structure of the GAMOLA2 output. Screenshot of the GAMOLA2 file and directory arrangement. Upon completion of an annotation run, GAMOLA2 can sort the results of individual entries into separate directories that contain the main annotated Genbank file, the underlying FASTA sequence file and, where appropriate, the contig order of the concatenated sequence. In addition, the full datasets for the entire genome are available in their respective folders and can easily be retrieved for more detailed analysis and background information.
Figure 3. Genome visualization in Artemis. Screenshot of the modified Artemis genome browser displaying a GAMOLA2-annotated sequence. The Artemis genome browser is a Java-based, platform-independent application that, once the Genbank file is loaded, can traverse the genome and display information for individual genes in real time. Annotations for individual genes are presented in feature blocks that always begin with the “gene” and “CDS” features (gray boxes). Additional features are shown according to their genome location. Each feature has a defined color code, creating a consistent user experience. Gene annotations are changed by modifying the “gene” qualifier in the “gene” and “CDS” features, whereby “gene” features display a short gene name and “CDS” features a verbose description (Supplemental Figure 9A). Names of functional domains are often cryptic and do not directly help to decipher the biological role of a given gene. Each feature in a GAMOLA2 annotation therefore contains additional information that explains the respective biological role (where known) or provides additional qualitative details (Supplemental Figure 9B). Genes that lack a closely related characterized homolog or well-known domains often remain annotated as “conserved hypotheticals.” Investigating all functional and structural information above the selected thresholds often reveals common biological themes that lead to a putative annotation. The modified Artemis genome browser can retrieve the underlying full results for Blast, COG, Pfam, and TIGRfam for each gene as long as the original file and folder structure is maintained (Supplemental Figure 9C).
Feature comparison between GAMOLA2 and .
| GUI | ||||
| Visualization of annotation | ||||
| Local PC | ||||
| Server-client/terminal | ||||
| Cloud | ||||
| Web-based | ||||
| Multi-threaded | ||||
| Off-line capability | ||||
| Re-use previous results | ||||
| Filter Blast results | ||||
| Glimmer 2/3 | ||||
| Prodigal | ||||
| Critica | ||||
| GeneMark | ||||
| RAST | ||||
| Intergenic Blast | ||||
| Blast Homology based gene prediction | ||||
| RBS | ||||
| External gene model | ||||
| Additive prediction | ||||
| Rule based prediction | ||||
| Blast/Blast-Plus | ||||
| Multiple Blast flavors | ||||
| Custom Blast databases | ||||
| COG | ||||
| Multiple COG databases | ||||
| eggnog | ||||
| Pfam (HMMER2/3) | ||||
| TIGRfam | ||||
| FIGfam | ||||
| Selection of multiple databases | ||||
| Gene Ontology descriptor | ||||
| InterPro descriptor | ||||
| InterProScan | ||||
| EC number | ||||
| KEGG | ||||
| Metabolic reconstruction | ||||
| tRNA | ||||
| rRNA | ||||
| Non-coding RNAs | ||||
| Transmembrane helices | ||||
| Signal Peptide Cleavage Sites | ||||
| Rho-independent terminators | ||||
| CRISPRs | ||||
| Vector screen | ||||
| FASTA | ||||
| msFASTA | ||||
| Genbank | ||||
| msGenbank | ||||
| Concatenate sequences | ||||
| Create concatenated sequence clusters | ||||
| Prevent gene model bleeding across contigs | ||||
| Update Genbank files | ||||
| Genbank | ||||
| GFF tracks | ||||
| EMBL | ||||
| All features displayed | ||||
| Embedded feature descriptors | ||||
| Feature csv/tsv file | ||||
| Feature Excel file | ||||
| Log and Error files | ||||
| Statistic file | ||||
| Create custom Blast databases | ||||
| Rotate Genbank files | ||||
| Prepare for Sequin submission | ||||
| Annotation transfer | ||||
| Functional Metagenome analysis | ||||
Via enhanced Artemis genome viewer;
static HTML sites to KEGG and KO;
custom browser;
custom filter;
automatic;
default;
internal and GFF format;
blastp only;
NCBI nr only;
COG2003, COG2008, COG2014, arCOG, arCOG2014, POG2013;
planned in next release;
for Pfam and TIGRfam;
partial feature only;
available as separate software, integration in the next release update;
GAMOLA2 creates verbose error logs.
Download URLs to the GAMOLA2 distribution.
| Readme | 1 | |
| GAMOLA2 Manual (PDF) | 3,947 | |
| GAMOLA2 distribution | 1,376,676 | |
| Customized Artemis 16 | 24,116 | |
| GAMOLA2 tutorial dataset | 5,265 | |
| Databases | 1,201,277 |