Federico Abascal, Rafael Zardoya, Maximilian J Telford.
Abstract
We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
Year: 2010 PMID: 20435676 PMCID: PMC2896173 DOI: 10.1093/nar/gkq291
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
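The back-translation idea described in the abstract can be illustrated with a minimal Python sketch. This is not TranslatorX's code. It assumes the standard genetic code only (NCBI translation table 1), in-frame ungapped input, and a hand-written protein alignment standing in for the output of an aligner such as Muscle. The names `translate`, `back_translate` and `codon_position` are hypothetical, and ambiguous codons are simply translated as `X` rather than resolved.

```python
# Standard genetic code (NCBI translation table 1) in the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(nt):
    """Split an in-frame nucleotide sequence into complete codons."""
    usable = len(nt) - len(nt) % 3
    return [nt[i:i + 3] for i in range(0, usable, 3)]


def translate(nt):
    """Translate an ungapped sequence; unrecognised codons become 'X'."""
    nt = nt.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(codon, "X") for codon in split_codons(nt))


def back_translate(aligned_protein, nt):
    """Thread the codons of `nt` onto one gapped row of a protein alignment."""
    if translate(nt) != aligned_protein.replace("-", ""):
        raise ValueError("protein row does not match the translation of nt")
    codons = iter(split_codons(nt.upper().replace("U", "T")))
    # Each amino acid gap becomes a whole-codon gap of three characters.
    return "".join("---" if aa == "-" else next(codons) for aa in aligned_protein)


def codon_position(aligned_nt, position):
    """Extract the columns for codon position 1, 2 or 3 of a codon alignment row."""
    return aligned_nt[position - 1::3]


# Toy input: two coding sequences and a hand-made protein alignment
# (an assumption standing in for real aligner output).
nt_seqs = ["ATGGCTAAA", "ATGAAA"]
protein_alignment = ["MAK", "M-K"]

codon_alignment = [
    back_translate(row, nt) for row, nt in zip(protein_alignment, nt_seqs)
]
third_positions = [codon_position(row, 3) for row in codon_alignment]
```

Because gaps are inserted only as whole codons, every gap segment in the resulting nucleotide alignment has a length that is a multiple of three and the reading frame is preserved.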
Figure 1. Example illustrating the different performance of the direct and back-translated nucleotide alignments (multiple alignments were built with Muscle with default parameters).
Figure 2. Screen capture of a fragment of the results of TranslatorX. The nucleotide back-translated alignment and the corresponding amino acid alignment are shown with Jalview.
Length of each alignment and number of positions whose alignments differ between each pair of methods
| | Length | TrX + ClustalW | TrX + Muscle | TrX + Mafft | TrX + Tcoffee | ClustalW | Muscle | Mafft | Tcoffee |
|---|---|---|---|---|---|---|---|---|---|
| TrX + ClustalW | 11 514 | 0 | 780 | 816 | 684 | 1580 | 1260 | 1491 | 2520 |
| TrX + Muscle | 11 553 | 819 | 0 | 501 | 633 | 1582 | 1246 | 1455 | 2543 |
| TrX + Mafft | 11 562 | 864 | 510 | 0 | 693 | 1545 | 1240 | 1448 | 2520 |
| TrX + Tcoffee | 11 526 | 696 | 606 | 657 | 0 | 1552 | 1201 | 1450 | 2505 |
| ClustalW | 11 562 | 1628 | 1591 | 1545 | 1588 | 0 | 1338 | 1380 | 2434 |
| Muscle | 11 604 | 1350 | 1297 | 1282 | 1279 | 1380 | 0 | 1342 | 2487 |
| Mafft | 11 679 | 1656 | 1581 | 1565 | 1603 | 1497 | 1417 | 0 | 2574 |
| Tcoffee | 13 771 | 4777 | 4761 | 4729 | 4750 | 4643 | 4654 | 4666 | 0 |
TrX, TranslatorX (the back-translation approach).
Number of gaps, gap segments and types of gap arrangements for the different alignments
| | ClustalW | Muscle | Mafft | T-coffee | SD |
|---|---|---|---|---|---|
| Alignment length | 11 562 | 11 604 | 11 679 | 13 771 | 1079.09 |
| Total gaps | 1803 | 2181 | 2856 | 21 684 | 9711.77 |
| Gap segments | 536 | 407 | 431 | 2414 | 979.60 |
| One gap | 236 | 133 | 94 | 1179 | 515.82 |
| Two gaps | 146 | 94 | 82 | 579 | 237.46 |
| Three gaps | 425 | 620 | 866 | 6449 | 2911.60 |

| | TrX + ClustalW | TrX + Muscle | TrX + Mafft | TrX + Tcoffee | SD |
|---|---|---|---|---|---|
| Alignment length | 11 514 | 11 553 | 11 562 | 11 526 | 22.50 |
| Total gaps | 1371 | 1722 | 1803 | 1479 | 202.50 |
| Gap segments | 166 | 213 | 232 | 206 | 27.77 |
| One gap | 0 | 0 | 0 | 0 | 0.00 |
| Two gaps | 0 | 0 | 0 | 0 | 0.00 |
| Three gaps | 457 | 574 | 601 | 493 | 67.50 |
TrX, TranslatorX (the back-translation approach).
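The SD column and the gap rows of the table above can be checked with a few lines of Python. The sketch below rests on two assumptions that the record does not state: that SD is the sample standard deviation (n - 1 denominator) across the four programs, and that the "One gap", "Two gaps" and "Three gaps" rows count units of one, two and three gap characters respectively. The reported numbers are consistent with both readings, but the variable names and the interpretation are illustrative only.

```python
import statistics

# Alignment lengths copied from the table (direct vs. back-translated).
direct_lengths = [11562, 11604, 11679, 13771]
trx_lengths = [11514, 11553, 11562, 11526]

# Sample standard deviation (n - 1 denominator), an assumed definition of "SD".
sd_direct = statistics.stdev(direct_lengths)
sd_trx = statistics.stdev(trx_lengths)

# Per-program gap rows: (total gaps, one gap, two gaps, three gaps).
gap_rows = {
    "ClustalW": (1803, 236, 146, 425),
    "Muscle": (2181, 133, 94, 620),
    "Mafft": (2856, 94, 82, 866),
    "T-coffee": (21684, 1179, 579, 6449),
    "TrX + ClustalW": (1371, 0, 0, 457),
    "TrX + Muscle": (1722, 0, 0, 574),
    "TrX + Mafft": (1803, 0, 0, 601),
    "TrX + Tcoffee": (1479, 0, 0, 493),
}

# Under the assumed reading, total gaps = 1*one + 2*two + 3*three.
gap_check = {
    name: total == one + 2 * two + 3 * three
    for name, (total, one, two, three) in gap_rows.items()
}

print(f"SD of direct alignment lengths: {sd_direct:.2f}")
print(f"SD of back-translated alignment lengths: {sd_trx:.2f}")
print("Gap decomposition holds for every program:", all(gap_check.values()))
```

The zero counts for one-gap and two-gap units in the back-translated alignments follow directly from inserting gaps only as whole codons.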
Figure 3. Comparison of the phylogenetic trees inferred from the sub-alignments that comprise positions whose alignment differed between the back-translated and direct Mafft (A, B) and Muscle (C, D) alignments.
Statistics for the different alignments obtained using the Muscle alignment program
| | Length | Average %id | Min %id | Max %id | GC (%) | Gaps (%) |
|---|---|---|---|---|---|---|
| TranslatorX | | | | | | |
| Complete alignment | 11 553 | 67 | 63 | 71 | 40.36 | 1.65 |
| Gblocks accepted | 10 083 | 69 | 66 | 73 | 40.83 | 0 |
| Gblocks discarded | 1470 | 46 | 39 | 57 | 36.67 | 13.02 |
| TranslatorX versus direct Muscle | | | | | | |
| Consensus (coinciding) | 10 237 | 69 | 65 | 73 | 40.68 | 0.06 |
| Different in TranslatorX | 1316 | 45 | 36 | 58 | 37.47 | 14.08 |
| Different in Muscle | 1367 | 49 | 43 | 59 | 37.47 | 17.29 |
The first set of rows compares TranslatorX alignments before and after the GBlocks cleaning. The second set of rows compares the TranslatorX alignment with the direct nucleotide alignment. Average %id, Min %id and Max %id: average/minimum/maximum percentage of identity between aligned sequences; GC (%): GC-content percentage; Gaps (%): percentage of gaps in the multiple alignment.
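The statistics defined in this legend can be sketched in Python as follows. The exact formulas used by TranslatorX are not given here, so the sketch makes its own assumptions: pairwise identity is computed over columns where neither sequence has a gap, GC content is computed over non-gap characters, and the gap percentage is taken over all cells of the alignment. The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def gc_percent(alignment):
    """Percentage of G or C among all non-gap characters."""
    residues = [c for row in alignment for c in row.upper() if c != "-"]
    return 100.0 * sum(c in "GC" for c in residues) / len(residues)


def gap_percent(alignment):
    """Percentage of gap characters among all cells of the alignment."""
    cells = sum(len(row) for row in alignment)
    return 100.0 * sum(row.count("-") for row in alignment) / cells


def pairwise_identity(a, b):
    """Percent identity over columns where both sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_summary(alignment):
    """Return (average, minimum, maximum) identity over all sequence pairs."""
    ids = [pairwise_identity(a, b) for a, b in combinations(alignment, 2)]
    return sum(ids) / len(ids), min(ids), max(ids)


# Toy codon alignment (made up for illustration, not data from the paper).
toy_alignment = ["ATGGCTAAA", "ATG---AAA", "ATGGCCAAG"]

avg_id, min_id, max_id = identity_summary(toy_alignment)
toy_gc = gc_percent(toy_alignment)
toy_gaps = gap_percent(toy_alignment)

print(f"Average %id {avg_id:.2f}, Min %id {min_id:.2f}, Max %id {max_id:.2f}")
print(f"GC (%) {toy_gc:.2f}, Gaps (%) {toy_gaps:.2f}")
```

Other conventions for the identity denominator (for example, the full alignment length) would give different values, so results from this sketch should not be expected to reproduce the table exactly.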