Amarendran R Subramanian, Michael Kaufmann, Burkhard Morgenstern.
Abstract
BACKGROUND: DIALIGN-T is a reimplementation of the multiple-alignment program DIALIGN. Due to several algorithmic improvements, it produces significantly better alignments on locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the program, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise sequence similarities. Such greedy approaches may be vulnerable to spurious random similarities and can therefore lead to suboptimal results. In this paper, we present DIALIGN-TX, a substantial improvement of DIALIGN-T that combines our previous greedy algorithm with a progressive alignment approach.
Year: 2008 PMID: 18505568 PMCID: PMC2430965 DOI: 10.1186/1748-7188-3-6
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Figure 1. High-level description of our algorithm to calculate a multiple alignment of a set of input sequences s1, . . ., s. The algorithm calculates a first alignment A0 using our novel progressive approach and a second alignment A1 with the greedy method previously used in DIALIGN; the alignment with the higher numerical score is returned. For the progressive method, fragments (local gap-free pairwise alignments) are taken from the respective optimal pairwise alignments. Fragments whose weight score exceeds the average fragment score are processed first, following a guide tree as described in the main text. Lower-scoring fragments are added later, provided they are consistent with the previously included high-scoring fragments. Note that the output of the subroutine PAIRWISE_ALIGNMENT is a chain of fragments, which is equivalent to a pairwise alignment in the sense of DIALIGN.
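The fragment-handling steps in Figure 1 can be illustrated with a minimal Python sketch. It is restricted to a single pair of sequences, so the guide tree plays no role and consistency reduces to the requirement that two fragments neither overlap nor cross; the actual program has to keep fragments consistent across all sequences of the multiple alignment, which this sketch does not attempt. The names `Fragment`, `two_phase_chain` and `pick_higher_scoring` are hypothetical, and processing each group heaviest-first and breaking score ties in favor of A0 are assumptions, not details taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Fragment:
    """Gap-free local pairwise alignment (0-based, hypothetical representation):
    positions start1 .. start1+length-1 of the first sequence are aligned to
    start2 .. start2+length-1 of the second."""
    start1: int
    start2: int
    length: int
    weight: float


def split_by_average_weight(
    fragments: Sequence[Fragment],
) -> Tuple[List[Fragment], List[Fragment]]:
    """Split fragments into those above the average weight and the rest."""
    if not fragments:
        return [], []
    average = sum(f.weight for f in fragments) / len(fragments)
    high = [f for f in fragments if f.weight > average]
    low = [f for f in fragments if f.weight <= average]
    return high, low


def consistent(a: Fragment, b: Fragment) -> bool:
    """True if one fragment lies entirely before the other in BOTH sequences,
    i.e. the two fragments neither overlap nor cross."""
    a_before_b = a.start1 + a.length <= b.start1 and a.start2 + a.length <= b.start2
    b_before_a = b.start1 + b.length <= a.start1 and b.start2 + b.length <= a.start2
    return a_before_b or b_before_a


def add_if_consistent(
    chain: List[Fragment], candidates: Sequence[Fragment]
) -> List[Fragment]:
    """Add candidates (heaviest first: an assumption of this sketch) that are
    consistent with every fragment already in the chain."""
    result = list(chain)
    for f in sorted(candidates, key=lambda f: f.weight, reverse=True):
        if all(consistent(f, g) for g in result):
            result.append(f)
    return result


def two_phase_chain(fragments: Sequence[Fragment]) -> List[Fragment]:
    """Above-average fragments first, then lower-weight ones if consistent."""
    high, low = split_by_average_weight(fragments)
    chain = add_if_consistent([], high)
    return add_if_consistent(chain, low)


def pick_higher_scoring(a0: T, a1: T, score: Callable[[T], float]) -> T:
    """Return the candidate alignment with the higher score (A0 on a tie,
    which is an assumption; the caption does not say how ties are broken)."""
    return a0 if score(a0) >= score(a1) else a1


frags = [Fragment(0, 0, 5, 10.0), Fragment(10, 10, 5, 8.0),
         Fragment(3, 20, 4, 2.0), Fragment(6, 6, 3, 1.0)]
chain = two_phase_chain(frags)
print([f.weight for f in chain])  # [10.0, 8.0, 1.0]
```

In the example the average weight is 5.25, so the fragments of weight 10 and 8 are placed first. The fragment of weight 2 overlaps the first one in the first sequence and is rejected, while the fragment of weight 1 fits between the two high-scoring fragments and is kept.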
Sum-of-pairs scores of various alignment programs on the benchmark database IRMBASE 2
| Method (Protein) | REF1 | REF2 | REF3 | REF4 | Total |
| DIALIGN-TX | 89.42 | 93.75 | 93.64 | 92.93 | |
| DIALIGN-T 0.2.2 | 89.67 0 | 94.19 0 | 93.12 0 | 92.73 0 | |
| DIALIGN 2.2 | 90.43 0 | 93.40 - | 91.78 -- | 92.98 - | 92.15 -- |
| CLUSTAL W2 | 7.13 -- | 10.63 -- | 19.87 -- | 26.17 -- | 15.95 -- |
| T-COFFEE 5.56 | 72.67 -- | 77.80 -- | 83.03 -- | 83.48 - | 79.24 -- |
| POA V2 | 87.56 - | 49.57 -- | 41.90 -- | 37.56 -- | 54.15 -- |
| MAFFT 6.240 L-INSi | 82.78 0 | 84.29 - | 84.15 -- | 82.42 -- | 84.41 -- |
| MAFFT 6.240 E-INSi | 94.37 0 | 93.11 0 | | | |
| MUSCLE 3.7 | 32.67 -- | 34.82 -- | 54.19 -- | 57.84 -- | 44.88 -- |
| PROBCONS 1.12 | 78.78 -- | 86.82 -- | 87.29 - | 87.69 -- | 85.15 -- |
Average sum-of-pairs scores (SPS) of the benchmarked programs on the core blocks (given by the implanted conserved motifs) of IRMBASE 2. A minus symbol denotes that the respective method is significantly inferior to DIALIGN-TX, a plus symbol that it is significantly superior, and 0 that the difference from DIALIGN-TX is not statistically significant. Single symbols denote significance according to the Wilcoxon matched-pairs signed-rank test with p ≤ 0.05; double symbols denote significance with p ≤ 0.001.
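The two ingredients of this caption can be sketched in Python: a sum-of-pairs score, read here as the percentage of residue pairs aligned in the reference core blocks that the test alignment also aligns, and the mapping from a p-value to the table symbols. The function names and the `core_columns` parameter (indices of reference columns belonging to core blocks) are hypothetical; the sketch assumes both alignments list the same sequences in the same order and use `-` for gaps. Benchmark suites come with their own scoring programs, which may differ in details. The p-values would come from a paired test such as `scipy.stats.wilcoxon(x, y)` on matched per-test-case scores of a method and of DIALIGN-TX; that step is not reproduced here.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Sequence, Set, Tuple

# (sequence i, residue index in i, sequence j, residue index in j), with i < j
ResiduePair = Tuple[int, Optional[int], int, Optional[int]]


def column_residues(alignment: Sequence[str]) -> List[List[Optional[int]]]:
    """For every column, the 0-based residue index of each sequence (None = gap)."""
    counters = [0] * len(alignment)
    columns = []
    for c in range(len(alignment[0])):
        column: List[Optional[int]] = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(column)
    return columns


def aligned_pairs(column: List[Optional[int]]) -> Set[ResiduePair]:
    """All residue pairs that a single column aligns with each other."""
    return {
        (i, column[i], j, column[j])
        for i, j in combinations(range(len(column)), 2)
        if column[i] is not None and column[j] is not None
    }


def sum_of_pairs_score(
    test: Sequence[str], reference: Sequence[str], core_columns: Iterable[int]
) -> float:
    """Percentage of reference core-block residue pairs also aligned in `test`."""
    ref_columns = column_residues(reference)
    ref_pairs: Set[ResiduePair] = set()
    for c in core_columns:
        ref_pairs |= aligned_pairs(ref_columns[c])
    test_pairs: Set[ResiduePair] = set()
    for column in column_residues(test):
        test_pairs |= aligned_pairs(column)
    if not ref_pairs:
        return 0.0
    return 100.0 * len(ref_pairs & test_pairs) / len(ref_pairs)


def significance_marker(p_value: float, method_better: bool) -> str:
    """Table symbol for a method compared with DIALIGN-TX."""
    if p_value <= 0.001:
        return "++" if method_better else "--"
    if p_value <= 0.05:
        return "+" if method_better else "-"
    return "0"


print(sum_of_pairs_score(["AC-GT", "ACG-T"], ["ACGT", "ACGT"], range(4)))  # 75.0
```

In the example, the test alignment splits the `G` residues into separate columns, so three of the four reference pairs are recovered.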
Column scores of different programs on IRMBASE 2
| Method (Protein) | REF1 | REF2 | REF3 | REF4 | Total |
| DIALIGN-TX | 64.17 | 70.30 | |||
| DIALIGN-T 0.2.2 | 67.04 0 | 75.81 0 | 70.40 0 | 70.93 0 | |
| DIALIGN 2.2 | 73.32 - | 65.34 - | 69.50 - | 69.17 -- | |
| CLUSTAL W2 | 0.00 -- | 0.00 -- | 0.11 -- | 2.86 -- | 0.74 -- |
| T-COFFEE 5.56 | 34.84 -- | 40.87 -- | 43.62 -- | 49.56 -- | 42.22 -- |
| POA V2 | 50.99 - | 16.95 -- | 11.79 -- | 10.18 -- | 22.47 -- |
| MAFFT 6.240 L-INSi | 37.81 -- | 39.54 -- | 32.79 -- | 38.75 -- | 32.22 -- |
| MAFFT 6.240 E-INSi | 45.70 - | 52.37 -- | 43.11 -- | 54.82 -- | 49.00 -- |
| MUSCLE 3.7 | 4.65 -- | 6.87 -- | 14.80 -- | 19.65 -- | 11.49 -- |
| PROBCONS 1.12 | 36.77 -- | 43.47 -- | 41.89 -- | 43.56 -- | 41.42 -- |
Average column scores (CS) of the benchmarked programs on the core blocks of IRMBASE 2. The symbols have the same meaning as in Table 1.
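The column score can be sketched in a similar way, assuming it is the percentage of reference core-block columns that appear unchanged in the test alignment. The helper names and the `core_columns` parameter are hypothetical, and both alignments are assumed to list the same sequences in the same order with `-` for gaps. Because a column only counts if every sequence in it is placed correctly, this is a much stricter criterion than the sum-of-pairs score, which is consistent with the CS values in these tables being generally lower than the corresponding SPS values.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

# Residue index of each sequence in one column (None = gap)
Column = Tuple[Optional[int], ...]


def column_residues(alignment: Sequence[str]) -> List[Column]:
    """For every column, the 0-based residue index of each sequence."""
    counters = [0] * len(alignment)
    columns: List[Column] = []
    for c in range(len(alignment[0])):
        column: List[Optional[int]] = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(tuple(column))
    return columns


def column_score(
    test: Sequence[str], reference: Sequence[str], core_columns: Iterable[int]
) -> float:
    """Percentage of reference core columns reproduced exactly in `test`."""
    test_columns = set(column_residues(test))
    ref_columns = column_residues(reference)
    core = list(core_columns)
    if not core:
        return 0.0
    hits = sum(1 for c in core if ref_columns[c] in test_columns)
    return 100.0 * hits / len(core)


reference = ["ACGT", "ACGT", "ACGT"]
shifted = ["ACGT-", "ACGT-", "-ACGT"]
print(column_score(shifted, reference, range(4)))  # 0.0
```

In the example, shifting the third sequence by one position leaves every pair between the first two sequences correct, yet no reference column survives intact, so the column score drops to zero.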
Sum-of-pairs scores on DIRMBASE 1
| Method (DNA) | REF1 | REF2 | REF3 | REF4 | Total |
| DIALIGN-TX | |||||
| DIALIGN-T 0.2.2 | 64.00 -- | 61.22 -- | 64.96 -- | 65.24 -- | 63.85 -- |
| DIALIGN 2.2 | 92.61 - | 91.10 - | 94.62 - | 94.13 - | 93.12 -- |
| CLUSTAL W2 | 6.79 -- | 8.27 -- | 18.51 -- | 29.09 -- | 15.66 -- |
| T-COFFEE 5.56 | 14.71 -- | 18.88 -- | 32.08 -- | 43.39 -- | 27.62 -- |
| POA V2 | 32.03 -- | 27.40 -- | 28.78 -- | 32.18 -- | 30.10 -- |
| MAFFT 6.240 L-INSi | 52.40 -- | 48.81 -- | 49.77 -- | 57.47 -- | 52.36 -- |
| MAFFT 6.240 E-INSi | 92.42 0 | 84.15 -- | 87.91 - | 89.36 - | 88.46 -- |
| MUSCLE 3.7 | 48.17 -- | 54.40 -- | 56.57 -- | 60.24 -- | 56.84 -- |
| PROBCONSRNA 1.10 | 13.00 -- | 12.94 -- | 20.28 -- | 32.56 -- | 19.69 -- |
Average sum-of-pairs scores (SPS) of the benchmarked programs on the core blocks of DIRMBASE 1. The symbols have the same meaning as in Table 1.
Column scores on DIRMBASE 1
| Method (DNA) | REF1 | REF2 | REF3 | REF4 | Total |
| DIALIGN-TX | |||||
| DIALIGN-T 0.2.2 | 29.60 -- | 28.63 -- | 35.51 -- | 35.85 -- | 32.40 -- |
| DIALIGN 2.2 | 69.95 0 | 68.19 0 | 71.25 0 | 72.48 0 | 70.47 - |
| CLUSTAL W2 | 0.00 -- | 0.00 -- | 2.19 -- | 4.99 -- | 1.80 -- |
| T-COFFEE 5.56 | 0.00 -- | 0.18 -- | 4.01 -- | 8.44 -- | 3.16 -- |
| POA V2 | 5.63 -- | 7.32 -- | 4.12 -- | 6.81 -- | 5.97 -- |
| MAFFT 6.240 L-INSi | 21.45 -- | 11.93 -- | 16.02 -- | 22.30 -- | 17.93 -- |
| MAFFT 6.240 E-INSi | 40.28 -- | 41.99 -- | 45.77 -- | 51.01 -- | 44.76 -- |
| MUSCLE 3.7 | 14.18 -- | 16.18 -- | 19.62 -- | 30.43 -- | 20.10 -- |
| PROBCONSRNA 1.10 | 0.73 -- | 0.05 -- | 1.34 -- | 4.31 -- | 1.61 -- |
Average column scores (CS) of the benchmarked programs on the core blocks of DIRMBASE 1. The symbols have the same meaning as in Table 1.
Program run time on IRMBASE 2 and DIRMBASE 1
| Method | Average runtime on IRMBASE 2 (s) | Average runtime on DIRMBASE 1 (s) |
| DIALIGN-TX 1.0 | 4.47 | 9.84 |
| DIALIGN-T 0.2.2 | 2.73 | 2.31 |
| DIALIGN 2.2 | 4.98 | 4.82 |
| CLUSTAL W2 | 1.86 | 1.36 |
| T-COFFEE 5.56 | 26.41 | 365.88 |
| POA V2 | 1.81 | 1.20 |
| MAFFT 6.240 L-INSi | 8.47 | 5.33 |
| MAFFT 6.240 E-INSi | 15.35 | 8.39 |
| MUSCLE 3.7 | 6.34 | 4.87 |
| PROBCONS(RNA) 1.12(1.10) | 28.27 | 18.54 |
Average running time (in seconds) per multiple alignment for sequence families on IRMBASE 2 and DIRMBASE 1. Program runs were performed on a Linux workstation with a 3.2 GHz Pentium 4 processor and 2 GB RAM.
Sum-of-pairs scores on BALIBASE 3
| Method (Protein) | RV11 | RV12 | RV20 | RV30 | RV40 | RV50 | Total |
| DIALIGN-TX | 51.52 | 89.18 | 87.87 | 76.18 | 83.65 | 82.28 | 78.83 |
| DIALIGN-T 0.2.2 | 49.30 - | 88.76 0 | 86.29 0 | 74.66 0 | 81.95 - | 80.14 - | 77.31 -- |
| DIALIGN 2.2 | 50.73 0 | 86.66 - | 86.91 0 | 74.05 0 | 83.31 0 | 80.69 0 | 77.52 -- |
| CLUSTAL W2 | 50.06 0 | 86.43 0 | 85.16 0 | 72.50 - | 78.93 0 | 74.24 - | 75.36 -- |
| T-COFFEE 5.56 | 58.22 ++ | 92.27 ++ | 90.92 ++ | 79.09 + | 86.03 + | 86.09 + | 82.41 ++ |
| POA V2 | 37.96 -- | 83.19 -- | 85.28 - | 71.93 - | 78.22 -- | 71.49 -- | 72.17 -- |
| MAFFT 6.240 L-INSi | 93.63 ++ | 85.55 ++ | | | | | |
| MAFFT 6.240 E-INSi | 66.00 ++ | 93.61 ++ | 92.64 ++ | 91.46 ++ | 89.91 ++ | 86.83 ++ | |
| MUSCLE 3.7 | 57.90 + | 91.67 ++ | 89.17 + | 80.60 + | 87.26 + | 83.39 0 | 82.19 ++ |
| PROBCONS 1.12 | 66.99 ++ | 91.68 ++ | 84.61 ++ | 90.24 ++ | 89.28 ++ | 86.40 ++ |
Average sum-of-pairs scores (SPS) of the benchmarked programs on the core blocks of BALIBASE 3. The symbols have the same meaning as in Table 1.
Column scores on BALIBASE 3
| Method (Protein) | RV11 | RV12 | RV20 | RV30 | RV40 | RV50 | Total |
| DIALIGN-TX 1.0 | 26.53 | 75.23 | 30.49 | 38.53 | 44.82 | 46.56 | 44.34 |
| DIALIGN-T 0.2.2 | 25.32 0 | 72.55 0 | 29.20 0 | 34.90 - | 45.23 0 | 44.25 0 | 42.76 - |
| DIALIGN 2.2 | 26.50 0 | 69.55 - | 29.22 0 | 31.23 - | 44.12 0 | 42.50 - | 41.49 -- |
| CLUSTAL W2 | 22.74 0 | 71.59 0 | 21.98 0 | 27.23 - | 39.55 0 | 30.75 - | 37.35 -- |
| T-COFFEE 5.56 | 31.34 0 | 81.18 ++ | 37.81 + | 36.57 0 | 48.20 0 | 50.63 0 | 48.54 ++ |
| POA V2 | 15.26 -- | 63.84 -- | 23.34 - | 28.23 - | 33.67 -- | 27.00 -- | 33.37 -- |
| MAFFT 6.240 L-INSi | 83.75 ++ | 56.93 ++ | 56.19 + | | | | |
| MAFFT 6.240 E-INSi | 43.71 ++ | 83.43 ++ | 44.63 ++ | 58.33 ++ | 58.37 ++ | | |
| MUSCLE 3.7 | 33.03 + | 80.46 ++ | 35.22 0 | 38.77 0 | 45.96 0 | 44.94 0 | 47.58 ++ |
| PROBCONS 1.12 | 41.68 ++ | 40.49 ++ | 54.37 ++ | 52.90 ++ | 56.50 ++ | 55.66 ++ |
Average column scores (CS) of the benchmarked programs on the core blocks of BALIBASE 3. The symbols have the same meaning as in Table 1.
Sum-of-pairs scores on BRAliBase II
| Method (DNA) | G2In | rRNA | SRP | tRNA | U5 | Total |
| DIALIGN-TX 1.0 | 72.08 | 91.69 | 82.92 | 78.53 | 77.80 | 80.42 |
| DIALIGN-T 0.2.2 | 54.68 -- | 69.13 -- | 60.81 -- | 64.44 -- | 67.87 -- | 63.53 -- |
| DIALIGN 2.2 | 71.72 0 | 89.89 -- | 81.47 -- | 78.57 0 | 76.16 -- | 79.37 -- |
| CLUSTAL W2 | 72.68 0 | 93.25 + | 87.40 ++ | 86.96 ++ | 79.56 + | 83.80 ++ |
| T-COFFEE 5.56 | 73.79 0 | 90.94 + | 83.90 0 | 81.65 0 | 79.13 + | 81.73 + |
| POA V2 | 67.22 -- | 88.92 -- | 85.47 ++ | 76.91 - | 77.28 0 | 79.02 -- |
| MAFFT 6.240 L-INSi | 78.93 ++ | 93.85 + | 87.46 ++ | 91.79 ++ | 82.80 ++ | 86.84 ++ |
| MAFFT 6.240 E-INSi | 77.39 ++ | 93.80 + | 87.24 ++ | 90.60 ++ | 80.46 ++ | 85.71 ++ |
| MUSCLE 3.7 | 76.42 ++ | 94.04 + | 87.06 ++ | 87.27 ++ | 79.71 + | 84.69 ++ |
| PROBCONSRNA 1.10 | | | | | | |
Average sum-of-pairs scores (SPS) of the benchmarked programs on BRAliBase II. The symbols have the same meaning as in Table 1.
Column scores on BRAliBase II
| Method (DNA) | G2In | rRNA | SRP | tRNA | U5 | Total |
| DIALIGN-TX 1.0 | 60.85 | 84.33 | 70.95 | 68.05 | 62.71 | 69.03 |
| DIALIGN-T 0.2.2 | 36.51 -- | 50.00 -- | 42.34 -- | 52.01 -- | 50.34 -- | 46.43 -- |
| DIALIGN 2.2 | 60.90 0 | 81.08 -- | 68.53 -- | 67.59 0 | 60.11 - | 67.29 -- |
| CLUSTAL W2 | 61.24 0 | 86.72 0 | 76.61 ++ | 76.20 ++ | 65.11 + | 72.85 ++ |
| T-COFFEE 5.56 | 60.24 0 | 82.56 - | 71.63 0 | 69.23 0 | 62.93 0 | 69.01 0 |
| POA V2 | 55.21 -- | 80.38 -- | 73.77 ++ | 66.03 0 | 61.63 0 | 67.12 -- |
| MAFFT 6.240 L-INSi | 65.23 + | 87.49 + | 76.75 ++ | 84.59 ++ | 68.46 ++ | 76.25 ++ |
| MAFFT 6.240 E-INSi | 63.84 + | 87.34 + | 76.59 ++ | 83.29 ++ | 65.71 ++ | 75.04 ++ |
| MUSCLE 3.7 | 63.20 0 | 87.97 + | 76.57 ++ | 78.01 ++ | 64.34 + | 73.64 ++ |
| PROBCONSRNA 1.10 | | | | | | |
Average column scores (CS) of the benchmarked programs on BRAliBase II. The symbols have the same meaning as in Table 1.
Run time on BALIBASE 3 and BRAliBase II
| Method | Average runtime on BALIBASE 3 (s) | Average runtime on BRAliBase II (s) |
| DIALIGN-TX 1.0 | 33.37 | 0.15 |
| DIALIGN-T 0.2.2 | 27.79 | 0.08 |
| DIALIGN 2.2 | 45.41 | 0.09 |
| CLUSTAL W2 | 8.72 | 0.07 |
| T-COFFEE 5.56 | 315.78 | 1.95 |
| POA V2 | 8.07 | 0.04 |
| MAFFT 6.240 L-INSi | 19.51 | 0.26 |
| MAFFT 6.240 E-INSi | 28.26 | 0.27 |
| MUSCLE 3.7 | 10.49 | 0.05 |
| PROBCONS(RNA) 1.12(1.10) | 168.65 | 0.24 |
Average running time (in seconds) per multiple alignment for sequence families on BALIBASE 3 and BRAliBase II. Program runs were performed on a Linux workstation with a 3.2 GHz Pentium 4 processor and 2 GB RAM.