Federico Abascal, Rafael Zardoya, David Posada.
Abstract
Although most organisms use the same genetic code to translate DNA, several variants have been described in a wide range of organisms, in both nuclear and organellar systems, many of them corresponding to metazoan mitochondria. These variants are usually found by comparative sequence analyses, conducted either manually or by computer. Basically, when a particular codon in a query species is linked to positions at which a specific amino acid is consistently found in other species, that codon is expected to translate as that amino acid. Importantly, and despite the simplicity of this approach, no tools are available to help predict the genetic code of an organism. We present here GenDecoder, a web server for the characterization and prediction of mitochondrial genetic codes in animals. The analysis of automatic predictions for 681 metazoans allowed us to study some properties of the comparative method, in particular the relationship among sequence conservation, taxonomic sampling and reliability of assignments. Overall, the method is highly precise (99%), although highly divergent organisms such as platyhelminths are more problematic. The GenDecoder web server is freely available from http://darwin.uvigo.es/software/gendecoder.html.
Year: 2006 PMID: 16845034 PMCID: PMC1538875 DOI: 10.1093/nar/gkl044
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Scheme of GenDecoder's workflow. The example is based on the UCU codon. A similar pipeline is executed for every other codon, using the whole set of 13 mitochondrial protein-coding genes.
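The comparative idea behind this workflow can be illustrated with a minimal Python sketch. It is not GenDecoder's implementation: the function name `predict_codon_meanings`, the input layout (one query codon per alignment column, plus the amino acids seen in other species at that column) and the simple majority vote are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def predict_codon_meanings(query_codons, aligned_columns):
    """Assign each query codon the amino acid most often aligned to it.

    query_codons    -- hypothetical input: the query species' codon at each
                       alignment column, or None where the query has a gap
    aligned_columns -- hypothetical input: for each column, a string of the
                       amino acids found in the other species ('-' = gap)
    """
    tallies = defaultdict(Counter)
    for codon, column in zip(query_codons, aligned_columns):
        if codon is None:
            continue  # the query has no codon at this column
        # Count every non-gap amino acid observed opposite this codon.
        tallies[codon].update(aa for aa in column if aa != "-")
    # Majority vote per codon (an assumption, not the published scoring).
    return {
        codon: counts.most_common(1)[0][0]
        for codon, counts in tallies.items()
        if counts
    }


# Toy data: UCU sits opposite serine almost everywhere, AUA opposite methionine.
example = predict_codon_meanings(
    ["UCU", "UCU", "AUA", None],
    ["SSSS", "SSTS", "MMMI", "LLLL"],
)
```

In this toy input, `example` maps UCU to S and AUA to M. A real analysis would first restrict the vote to conserved columns, as the entropy and gap thresholds mentioned in the figure captions suggest.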
Figure 2. Performance of GenDecoder under different entropy thresholds, using the sampling-balanced alignments. The graph summarizes the accuracy under different parameters for 41 042 codon assignments corresponding to 681 species. In every case, columns with >20% gaps were ignored. Comparison of this figure with the one appearing in (3) indicates that the use of taxonomically balanced alignments displaces the optimal point towards less restrictive entropy thresholds.
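The column filter implied by these thresholds can be sketched as follows. The sketch assumes that the entropy is Shannon entropy in bits computed over non-gap residues, which the text here does not state explicitly; the function names and the inclusive comparisons at the thresholds are likewise assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def gap_fraction(column):
    """Fraction of gap characters in the column (1.0 for an empty column)."""
    if not column:
        return 1.0
    return column.count("-") / len(column)


def is_conserved(column, max_entropy=2.0, max_gaps=0.20):
    """Keep a column only if it is neither too variable nor too gappy.

    The defaults mirror the thresholds quoted in the text (entropy 2.0,
    20% gaps); whether the boundaries are inclusive is an assumption.
    """
    return gap_fraction(column) <= max_gaps and column_entropy(column) <= max_entropy
```

Lowering `max_entropy` keeps fewer, more conserved columns, which is the trade-off between the number of assignments and their reliability that the figure explores.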
Performance of GenDecoder and the importance of using an appropriate taxonomic sampling
| Taxon | Number of species | 54-taxa multiple alignments: #Concordant/total | 54-taxa multiple alignments: FP/TP (%) | All-metazoans multiple alignments: #Concordant/total | All-metazoans multiple alignments: FP/TP (%) |
|---|---|---|---|---|---|
| Annelida | 4 | 244/247 | 1.2 | 244/248 | 1.6 |
| Arthropoda | 87 | 5116/5222 | 2.1 | 5048/5265 | 4.3 |
| Brachiopoda | 2 | 122/123 | 0.8 | 118/124 | 5.1 |
| Cephalochordata | 5 | 303/303 | 0.0 | 305/306 | 0.3 |
| Cnidaria | 4 | 246/248 | 0.8 | 242/248 | 2.5 |
| Echinodermata | 11 | 671/676 | 0.7 | 672/678 | 0.9 |
| Hemichordata | 1 | 60/60 | 0.0 | 60/60 | 0.0 |
| Mollusca | 15 | 911/924 | 1.4 | 895/926 | 3.5 |
| Nematoda | 12 | 634/690 | 8.8 | 600/703 | 17.2 |
| Platyhelminthes | 10 | 525/598 | 13.9 | 475/601 | 26.5 |
| Porifera | 3 | 176/178 | 1.1 | 176/178 | 1.1 |
| Vertebrata | 461 | 27 288/27 375 | 0.3 | 27 547/27 498 | 0.2 |
Note: discrepancies in the number of assignments between the two experiments arise because the conservation threshold behaves differently with different alignments (e.g. there were 598 and 601 assignments for platyhelminths in the two experiments).
#Concordant/total, number of assignments concordant with GenBank/total number of assignments. Unassigned codons, i.e. codons that either are not used or do not appear at conserved positions (in this case entropy > 2.0; gaps > 20%), are not considered in this table.
FP/TP, false-positive rate: non-concordant/concordant assignments × 100.
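As a worked check of this definition, the short Python sketch below recomputes the FP/TP percentage from a #Concordant/total pair. The helper name `fp_tp_percent` is hypothetical; the input values are taken from the table.

```python
def fp_tp_percent(concordant, total):
    """False-positive rate as defined in the table footnote:
    non-concordant / concordant assignments x 100."""
    non_concordant = total - concordant
    return 100.0 * non_concordant / concordant


# Rows from the 54-taxa columns of the table, rounded to one decimal.
annelida = round(fp_tp_percent(244, 247), 1)
nematoda = round(fp_tp_percent(634, 690), 1)
platyhelminthes = round(fp_tp_percent(525, 598), 1)
```

These reproduce the tabulated 1.2, 8.8 and 13.9. Note that the denominator is the number of concordant assignments, not the total, so the value is slightly higher than a plain error fraction.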
Figure 3. GenDecoder output for the acanthocephalan L. thecatus. Codon-imp, number of codons at conserved positions (in this case S < 2.0, gaps < 20%); Codon-num, number of codons in the mt-genome; Freq-aa, first decimal of the frequency of the most frequent amino acid; Diff-freq, difference between the frequencies of the predicted and expected amino acids (first decimal).