K Bryson, V Loux, R Bossy, P Nicolas, S Chaillou, M van de Guchte, S Penaud, E Maguin, M Hoebeke, P Bessières, J-F Gibrat.
Abstract
We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license.
Year: 2006 PMID: 16855290 PMCID: PMC1524909 DOI: 10.1093/nar/gkl471
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Characteristics of some annotation platforms
| Method | Reference | Organisms | Graphic interface | Automatic processing of data | Manual annotation | Collaborative annotation | ‘Reasoning’ capabilities | Annotation assessment | Availability |
|---|---|---|---|---|---|---|---|---|---|
| MAGPIE | ( | Prokaryote | No | Yes | Limited | No | Yes | No | Code available |
| GENOTATOR | ( | Eukaryote | Yes | Yes | Possible | No | No | No | Code available |
| GAIA | ( | Eukaryote | Yes | Yes | Limited | No | No | No | Web use |
| IMAGENE | ( | Prokaryote | Yes | Possible | Yes | No | No | No | Available (b) |
| GENEQUIZ | ( | Both (a) | No | Yes | No | No | Yes | Yes | Web use |
| ARTEMIS | ( | Both | Yes | No | Yes | No | No | No | GPL (c) |
| M-AGENTS | ( | Virus | No | Yes | No | No | No | No | Web use |
| PEDANT | ( | Both | Yes | Yes | Possible | No | No | No | Web use |
| ENSEMBL | ( | Both | No (d) | Yes | No (e) | Yes | No | No | GPL |
| APOLLO | ( | Both | Yes | No | Limited | Yes | No | No | GPL |
| OTTER | ( | Both | Yes | No | Yes | Yes | No | No | Code available |
| RICEGAAS | ( | Eukaryote | Yes | Yes | Limited | No | No | No | Web use |
| GENQUIRE | ( | Both | Yes | No | Yes | No | No | No | GPL |
| ATUGC | ( | Prokaryote | No | Yes | No | No | Yes | No | No |
| GENDB | ( | Prokaryote | Yes | Yes | Yes | No | Yes | No | GPL |
| ASAP | ( | Both | Yes | No | Yes | Yes | No | No | Web use |
| SABIA | ( | Prokaryote | No | Yes | No | No | No | No | Code available |
| MANATEE | (see note f) | Both, virus | Yes | Yes | Yes | No | No | No | GPL |
| MAGE | ( | Prokaryote | Yes | Yes | Yes | Yes | No | No | Web use |
| AGMIAL | | Prokaryotes | Yes | Yes | Yes | Yes | No | No | GPL |
See the text for a detailed definition of the column headings.
a. GeneQuiz carries out protein sequence analysis only. It has been mostly used to re-annotate prokaryotic genomes.
b. Requires ILOG licensed libraries.
c. GNU General Public License.
d. Provided by APOLLO.
e. Performed with OTTER.
f.
Bioinformatics tools integrated into the CAM
| Method | Description | Reference |
|---|---|---|
| SHOW | Gene detection using hidden Markov models | |
| tRNAScan | Detection of tRNA sequences | ( |
| rRNAScan | Detection of rRNA sequences | |
| PETRIN | Detection of terminator sequences | ( |
Figure 1. Views of the DNA sequence at different scales. The upper part of the figure represents an atlas view of the genome obtained with CGView. One can zoom in on a particular region of this map, for instance on the area containing the mgsA gene, which encodes a methylglyoxal synthase that belongs to the glycolytic pathway. Clicking on this gene opens the MuGeN interface showing its genomic context (genome map frame). It is possible to zoom in on this representation to see the DNA sequence and its translation in the six reading frames (sequence frame). The green symbol represents the RBS; the gene sequence is colored as in the previous view. The navigation window allows one to move along the genome by entering a range of base numbers, by looking for a feature with a particular qualifier, or by specifying a DNA or protein motif to be searched for in the current window or in the complete genome. The window at the lower left of the figure shows the gene editor. Most fields are filled in automatically, in particular the gene annotation qualifiers, since CDS annotation is generally performed in PAM and then updated in CAM (see Figure 4). Clicking on the link to PAM, indicated by the magnifying glass, leads the annotator to the PAM interface shown in Figure 2. For clarity, the Artemis interface is not shown in the figure.
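The motif search offered by the navigation window can be illustrated with a short sketch. The source does not describe how AGMIAL implements this search, so everything below is an assumption made for illustration: the function names `reverse_complement` and `find_motif` are hypothetical, the motif is treated as a regular expression over A, C, G and T, and both strands are scanned. The `start` and `end` arguments stand in for the choice between the current window and the complete genome.

```python
import re

# Complement table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(genome, motif, start=0, end=None):
    """Return sorted (position, strand) hits for motif in genome[start:end].

    Positions are 0-based offsets into the full genome and always point at
    the leftmost base of the hit on the forward strand.
    """
    window = genome[start:end].upper()
    # A lookahead with a capture group also reports overlapping occurrences.
    pattern = re.compile(f"(?=({motif}))")
    hits = [(start + m.start(1), "+") for m in pattern.finditer(window)]
    # Scan the reverse strand, then map each hit back to forward coordinates.
    for m in pattern.finditer(reverse_complement(window)):
        hits.append((start + len(window) - m.end(1), "-"))
    return sorted(hits)


genome = "AAGGAGGTTTCCTCCAA"
print(find_motif(genome, "GGAGG"))           # whole sequence: [(2, '+'), (10, '-')]
print(find_motif(genome, "GGAGG", start=5))  # window from base 5: [(10, '-')]
```

Restricting the scan to a slice keeps the window search and the whole-genome search in one code path; a protein motif search would additionally require translating the six reading frames first, which this sketch leaves out.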
Bioinformatics tools integrated into the PAM
| Method | Description | Reference |
|---|---|---|
| Methods to determine sequence intrinsic properties | | |
| pI | Isoelectric point and molecular mass | |
| SEG | Detection of low-complexity regions | ( |
| COIL | Detection of coiled-coil structures | ( |
| SIGSEQ | Detection of signal peptides | ( |
| MEMSAT | Transmembrane segment prediction | ( |
| Homology search methods | | |
| RPS-BLAST | Reverse position specific BLAST | |
| PSI-BLAST | Sequence-based homology search | ( |
| FROST | Fold recognition method for detecting remote homologues | ( |
| Miscellaneous methods | | |
| PSORTb | Prediction of subcellular localization | ( |
| Jalview | Multiple sequence alignment editor | ( |
| InterProScan | Integrated protein motif detection, including the following software: | ( |
| Motif and functional signature detection methods | | |
| ProfileScan | Find motifs using profiles | ( |
| ScanRegExp | Find motifs using regular expressions | ( |
| FPrintScan | Find multiple motifs | ( |
| Domain detection using HMM methods | | |
| HMMSmart | Genetically mobile domains from SMART | ( |
| HMMPfam | Protein family domains from PFAM | ( |
| HMMTigr | Protein families from TIGR institute | ( |
| HMMPIR | Protein families from PIR | ( |
| HMMPanther | Protein families subdivided into functionally related subfamilies | ( |
| SuperFamily | Proteins of known 3D structure | ( |
| Gene3D | Protein families in complete genomes | ( |
| Method to split protein sequences into domains | | |
| BlastProDom | Defines domains in protein sequences | ( |
| Methods to find sequence intrinsic properties | | |
| SignalPHMM | Prediction of signal peptide | ( |
| TMHMM | Prediction of transmembrane helices in proteins | ( |
Figure 2. The right part of the figure shows the results of the different bioinformatic methods applied to the sequence of MgsA. Not all result sections are shown here. In the ‘homology’ section, checking the boxes to the left of the homologous sequences and then clicking on the link to Jalview, below, shows the multiple alignment of the selected sequences and the corresponding phylogenetic tree. The left part of the figure shows the annotation window where the protein annotation is performed. Information entered in this section is forwarded to CAM; the system always ensures that both managers are synchronized. The bottom of this window shows the annotation history. The link to CAM at the top leads the annotator back to the CAM interface (MuGeN interface, see Figure 1). The link to PAREO (our relational version of the KEGG database) near the ‘EC number’ box leads the user to the KEGG interface shown in Figure 3.
Databases used by the PAM tools
| Name | Description | Reference |
|---|---|---|
| UNIPROT | Protein sequences and functions | ( |
| KEGG | Molecular interaction networks | ( |
| CDD | Conserved domain database | ( |
| PDB | 3D structures database | ( |
| SCOP | Protein 3D domain database | ( |
| PROSITE | Functional motifs and profiles derived from SWISSPROT | ( |
| PRINTS | Manually derived functional motifs | ( |
| SMART | Motifs of genetically mobile domains | ( |
| PFAM | Protein family domains | ( |
| TIGRFAMs | Similar to PFAM | ( |
| SUPERFAMILY | Families of proteins of known 3D structures | ( |
| GENE3D | Protein families and domain architectures in complete genomes | ( |
| PANTHER | Protein families subdivided into functionally related subfamilies | ( |
| PRODOM | Automatically generated protein domain families | ( |
Figure 3. This figure shows both the glycolysis and pyruvate metabolism pathways for Lactobacillus plantarum and L. sakei. As indicated in the legend at the top of the figure, enzymes found only in L. plantarum or only in L. sakei are colored red and purple, respectively. Enzymes found in both organisms are colored green. The magnifying glasses indicate the role of MgsA in these pathways. This enzyme appears to be involved in a methylglyoxal bypass (reversible reaction) of glycolysis in L. sakei. The figure clearly illustrates the major difference in glycolysis between L. plantarum and L. sakei. The bottom right box shows the product of the reaction catalyzed by MgsA. A detailed account of the L. sakei energy production pathways contributing to meat adaptation can be found in ref. (38).
Figure 4. Dashed arrows represent automatic processes between managers; solid arrows represent human interaction with the managers. Graphic interfaces are described in Figures 1 and 2.
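The automatic update from PAM to CAM mentioned in the captions of Figures 1 and 2 can be pictured with a minimal sketch. It is not AGMIAL code: the source states that the managers communicate through a Web services framework, whereas this example uses plain in-process calls, and the class names `Cam` and `Pam`, their methods and their fields are all hypothetical. It only shows the idea that an annotation entered on the protein side is recorded in a history and forwarded so that both managers hold the same values.

```python
from dataclasses import dataclass, field


@dataclass
class Cam:
    """Hypothetical stand-in for CAM: stores qualifiers for each CDS."""
    qualifiers: dict = field(default_factory=dict)

    def update_cds(self, gene_id, fields):
        # Merge the forwarded fields into the qualifiers kept for this CDS.
        self.qualifiers.setdefault(gene_id, {}).update(fields)


@dataclass
class Pam:
    """Hypothetical stand-in for PAM: records protein annotations."""
    cam: Cam
    annotations: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def annotate(self, gene_id, annotator, **fields):
        # Store the annotation, log who changed what, then forward the change
        # (the automatic step between managers) so both sides stay in sync.
        self.annotations.setdefault(gene_id, {}).update(fields)
        self.history.append((gene_id, annotator, dict(fields)))
        self.cam.update_cds(gene_id, fields)


cam = Cam()
pam = Pam(cam)
pam.annotate("mgsA", "annotator1", product="methylglyoxal synthase")
print(cam.qualifiers["mgsA"] == pam.annotations["mgsA"])  # True
```

Forwarding inside `annotate` means there is no separate synchronization step for an annotator to forget; a service-based design would replace the direct call with a request to the other manager.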