M-Coffee: combining multiple sequence alignment methods with T-Coffee
Iain M Wallace, Orla O'Sullivan, Desmond G Higgins, Cedric Notredame.
Abstract
We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSA) by combining the output of several individual methods into a single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performance can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and BaliBase. We also show that, on a case-by-case basis, M-Coffee is twice as likely as any individual method to deliver the best alignment. Given a collection of pre-computed MSAs, M-Coffee has similar CPU requirements to the original T-Coffee. M-Coffee is a freeware open-source package available from http://www.tcoffee.org/.
Year: 2006 PMID: 16556910 PMCID: PMC1410914 DOI: 10.1093/nar/gkl091
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
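To make the combination idea from the abstract concrete, the sketch below is a simplified, hypothetical illustration (not the actual T-Coffee code) of how a weighted library of aligned residue pairs can be built from several precomputed MSAs and used to score how consistent each MSA is with the consensus signal. All function names and the toy alignments are invented for illustration; in M-Coffee proper, the library is fed into T-Coffee's progressive alignment rather than used to rank the inputs.

```python
# Simplified, hypothetical illustration of the consistency idea behind
# M-Coffee (NOT the actual T-Coffee algorithm). It counts how often each
# pair of residues is placed in the same column across several input MSAs,
# then scores every MSA against that weighted "library" of residue pairs.
from collections import defaultdict
from itertools import combinations

def residue_pairs(msa):
    """Yield ((name_a, res_index_a), (name_b, res_index_b)) for every pair of
    residues sharing a column in an MSA given as {name: aligned_string}."""
    names = sorted(msa)
    cursors = {n: -1 for n in names}          # residue index seen so far
    for col in range(len(next(iter(msa.values())))):
        placed = []
        for n in names:
            if msa[n][col] != '-':
                cursors[n] += 1
                placed.append((n, cursors[n]))
        for a, b in combinations(placed, 2):
            yield a, b

def build_library(msas, weights=None):
    """Sum (optionally method-weighted) support for every aligned residue pair."""
    weights = weights or [1.0] * len(msas)
    library = defaultdict(float)
    for msa, w in zip(msas, weights):
        for pair in residue_pairs(msa):
            library[pair] += w
    return library

def consistency_score(msa, library):
    """Average library support of the residue pairs realised by one MSA."""
    pairs = list(residue_pairs(msa))
    return sum(library[p] for p in pairs) / len(pairs) if pairs else 0.0

if __name__ == "__main__":
    # Two toy alignments of the same three sequences (invented data).
    msa_a = {"s1": "AC-GT", "s2": "ACAGT", "s3": "AC-GT"}
    msa_b = {"s1": "ACG-T", "s2": "ACAGT", "s3": "ACG-T"}
    library = build_library([msa_a, msa_b])
    for label, msa in (("msa_a", msa_a), ("msa_b", msa_b)):
        print(label, round(consistency_score(msa, library), 2))
```

In practice M-Coffee is run through the t_coffee executable distributed at tcoffee.org; the exact command-line options for listing the constituent methods depend on the release, so the package documentation should be consulted.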
Figure 1. Methods tree. A UPGMA tree showing the clustering of all the multiple alignment methods. Pairwise distances are calculated on the HOMSTRAD benchmark by computing the SP differences between the alignments produced by the individual methods.
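A tree of this kind can in principle be rebuilt from any matrix of pairwise SP differences, since UPGMA is simply average-linkage hierarchical clustering. A minimal sketch follows; the method list and distance values are placeholders, not the paper's data.

```python
# UPGMA (average-linkage) clustering of alignment methods from a matrix of
# pairwise SP differences, as in Figure 1. The distance values and method
# list below are placeholders, not the paper's data.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

methods = ["ClustalW", "MAFFT", "Muscle", "ProbCons", "T-Coffee"]
sp_diff = np.array([           # symmetric SP-difference matrix (dummy values)
    [0.00, 0.28, 0.30, 0.35, 0.33],
    [0.28, 0.00, 0.15, 0.21, 0.19],
    [0.30, 0.15, 0.00, 0.22, 0.20],
    [0.35, 0.21, 0.22, 0.00, 0.12],
    [0.33, 0.19, 0.20, 0.12, 0.00],
])

# scipy's 'average' linkage on a condensed distance matrix is UPGMA
tree = linkage(squareform(sp_diff), method="average")
print(dendrogram(tree, labels=methods, no_plot=True)["ivl"])  # leaf order
```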
Figure 2. CS after combining multiple alignment methods with T-Coffee. Alignments are added in order of decreasing performance as single methods (as determined on HOMSTRAD), from left to right. The peak of 67.59 is achieved using a combination of six methods; it is significantly better than ProbCons, the best single method (Wilcoxon signed rank test, P < 0.001), whose performance is shown by the straight line.
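The significance claim above is a paired comparison of per-alignment CS scores. A quick sketch of such a test with SciPy, on invented score vectors, is shown below.

```python
# Paired Wilcoxon signed-rank test on per-alignment CS scores, the kind of
# comparison behind the Figure 2 significance claim. Scores are invented.
from scipy.stats import wilcoxon

mcoffee_cs  = [67.2, 71.0, 55.4, 80.1, 62.3, 69.8, 74.5, 58.9]
probcons_cs = [66.4, 70.2, 56.0, 79.0, 60.9, 69.1, 73.8, 57.5]

stat, p_value = wilcoxon(mcoffee_cs, probcons_cs)
print(f"W = {stat:.1f}, P = {p_value:.3f}")
```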
The first column lists the individual methods used.
| Alignment method | Default %CS | VarCov weight | THG weight | ALT weight | ACC weight | No weight |
|---|---|---|---|---|---|---|
| DIALIGN | 55.71 | 1.31 | 1.33 | 1.83 | 0.92 | 1.00 |
| FFTNS1 | 58.27 | 0.81 | 0.74 | 0.64 | 0.96 | 1.00 |
| FFTNS2 | 60.47 | 0.40 | 0.64 | 0.44 | 1.00 | 1.00 |
| FFTNSI | 63.07 | 0.17 | 0.64 | 0.44 | 1.04 | 1.00 |
| GINSI | 63.43 | 0.50 | 0.74 | 0.38 | 1.04 | 1.00 |
| Muscle v3.52 | 64.49 | 1.02 | 0.85 | 0.54 | 1.06 | 1.00 |
| POA-local | 49.28 | 1.42 | 1.87 | 2.51 | 0.81 | 1.00 |
| %CS for M-Coffee15 | | 67.33 | 66.96 | 65.79 | 67.16 | 67.11 |
Methods marked with an asterisk (shown in boldface) are part of the M-Coffee8 selection of methods. Column 2 indicates the average performance of each individual method on HOMSTRAD. Columns 3–7 are the weights for each method as calculated by the indicated weighting schemes. The last rows show the average score of M-Coffee15 and M-Coffee8 using all the indicated weighting schemes.
Figure 3. M-Coffee8. The top line (closed diamonds) is the CS on the HOMSTRAD benchmark after combining multiple alignments using only one method per developer. The bottom line (closed squares) is the default performance for each method on the benchmark.
Figure 4. Effect of adding 1, 2 or 3 extra ClustalW alignments to M-Coffee8. The average accuracy of ClustalW is shown by the solid line.
The CS accuracy of M-Coffee8 and various individual methods on the HOMSTRAD, Prefab and BaliBase references
| Dataset | ClustalW | Dialign-T | FINSI | Muscle 6 | PCMA | POA | ProbCons | T-Coffee | M-Coffee8 |
|---|---|---|---|---|---|---|---|---|---|
| Homstrad | 61.15 | 57.92 | 64.22 | 66.04 | 63.73 | 51.9 | 66.41 | 65.37 | |
| Prefab <10% | 18.25 | 15.51 | 24.86 | 24.14 | 25.53 | 9.09 | 24.81 | 23.41 | |
| Prefab 10 to <20% | 43.27 | 44.11 | 58.76 | 54.76 | 55.96 | 32.26 | 56.21 | 55.28 | |
| Prefab 20 to <30% | 74.79 | 75.28 | 83.76 | 82.09 | 81.47 | 64.42 | 82.85 | 82.39 | |
| Prefab 30 to <40% | 87.27 | 85.62 | 91.81 | 90.42 | 89.84 | 79.96 | 91.68 | 91.51 | |
| Prefab 40 to <100% | 94.91 | 96.07 | 96.92 | 96.17 | 95.03 | 94.30 | 96.20 | 96.68 | |
| Prefab total | 61.68 | 62.05 | 72.01 | 69.56 | 69.76 | 52.61 | 70.54 | 69.97 | |
| BaliBase Set: 11 | 22.68 | 25.32 | 38.95 | 34.37 | 37.45 | 11.18 | 39.55 | 32.68 | |
| BaliBase Set: 12 | 71.43 | 72.57 | 82.68 | 84.80 | 82.61 | 51.05 | 84.80 | 83.00 | |
| BaliBase Set: 20 | 43.12 | 21.68 | 29.20 | 36.49 | 44.83 | 13.37 | 37.78 | 39.68 | |
| BaliBase Set: 30 | 25.48 | 35.19 | 57.59 | 41.04 | 58.15 | 7.89 | 47.26 | 47.48 | |
| BaliBase Set: 40 | 58.17 | 39.04 | 44.75 | 48.42 | 53.83 | 14.42 | 51.25 | 55.58 | |
| BaliBase Set: 50 | 59.81 | 33.69 | 44.25 | 57.69 | 50.56 | 21.63 | 55.25 | 57.31 | |
| BaliBase Set: S11 | 40.76 | 33.34 | 50.63 | 59.37 | 44.76 | 31.37 | 58.45 | 47.61 | |
| BaliBase Set: S12 | 86.59 | 79.05 | 76.20 | 84.02 | 86.95 | 82.91 | 68.14 | 83.75 | |
| BaliBase Set: S2 | 44.37 | 36.90 | 53.85 | 55.78 | 51.85 | 35.24 | 54.46 | 49.78 | |
| BaliBase Set: S3 | 49.69 | 47.31 | 63.83 | 63.14 | 64.10 | 36.14 | 65.03 | 64.45 | |
| BaliBase Set: S5 | 43.27 | 45.47 | 57.73 | 60.33 | 56.73 | 28.47 | 59.80 | 55.67 | |
| BaliBase total | 42.83 | 44.59 | 59.34 | 56.47 | 57.92 | 29.00 | 58.24 | 56.10 |
HOMSTRAD was evaluated with aln_compare, Prefab with Qscore and BaliBase with BaliScore. Methods significantly better (P < 0.05) than the next best are marked with an asterisk. The highest score in each benchmark is highlighted in bold.
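The CS values in the table above are column scores: the fraction of reference-alignment columns reproduced exactly in the test alignment. The sketch below shows the basic metric only; it is a hedged illustration, not the aln_compare, Qscore or BaliScore implementation, and it omits refinements such as restricting the score to annotated core blocks.

```python
# Basic column score (CS): the fraction of reference columns whose residue
# content is reproduced as an identical column in the test alignment.
# Real scorers (aln_compare, Qscore, bali_score) refine this, e.g. by
# restricting it to annotated core blocks; that refinement is omitted here.

def columns_as_residue_sets(msa):
    """Return one frozenset of (sequence_name, residue_index) per column of an
    MSA given as {name: aligned_string}, skipping all-gap columns."""
    names = sorted(msa)
    cursors = {n: -1 for n in names}
    columns = []
    for i in range(len(next(iter(msa.values())))):
        col = set()
        for n in names:
            if msa[n][i] != '-':
                cursors[n] += 1
                col.add((n, cursors[n]))
        if col:
            columns.append(frozenset(col))
    return columns

def column_score(test_msa, reference_msa):
    """Percentage of reference columns found unchanged in the test alignment."""
    test_columns = set(columns_as_residue_sets(test_msa))
    reference_columns = columns_as_residue_sets(reference_msa)
    hits = sum(col in test_columns for col in reference_columns)
    return 100.0 * hits / len(reference_columns)

if __name__ == "__main__":
    reference = {"s1": "AC-GT", "s2": "ACAGT", "s3": "AC-GT"}   # toy reference
    test      = {"s1": "ACG-T", "s2": "ACAGT", "s3": "ACG-T"}   # toy test MSA
    print(f"CS = {column_score(test, reference):.1f}%")
```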
Individual dataset analysis
| Dataset | M-Coffee8 better | M-Coffee8 worse | P (Wilcoxon signed rank) | Best single method |
|---|---|---|---|---|
| Homstrad | 139 | 65 | 0.000 | ProbCons |
| Prefab <10% | 49 | 37 | 0.16 | PCMA |
| Prefab 10 to <20% | 326 | 226 | 0.000 | Finsi |
| Prefab 20 to <30% | 278 | 132 | 0.000 | Finsi |
| Prefab 30 to <40% | 64 | 35 | 0.003 | ProbCons |
| Prefab 40 to <100% | 62 | 25 | 0.002 | Finsi |
| Prefab total | 779 | 455 | 0.000 | / |
| BaliBase Set: 11 | 19 | 5 | 0.002 | ProbCons |
| BaliBase Set: 12 | 26 | 7 | 0.008 | ProbCons |
| BaliBase Set: 20 | 16 | 14 | 0.967 | Finsi |
| BaliBase Set: 30 | 16 | 5 | 0.013 | PCMA |
| BaliBase Set: 40 | 24 | 10 | 0.333 | Finsi |
| BaliBase Set: 50 | 12 | 4 | 0.078 | PCMA |
| BaliBase Set: S11 | 12 | 15 | 0.793 | Muscle 6 |
| BaliBase Set: S12 | 13 | 11 | 0.437 | ProbCons |
| BaliBase Set: S2 | 21 | 13 | 0.397 | Muscle 6 |
| BaliBase Set: S3 | 19 | 6 | 0.024 | ProbCons |
| BaliBase Set: S5 | 8 | 5 | 0.623 | Muscle 6 |
| BaliBase total | 186 | 95 | 0.002 | / |
| Total | 1104 | 615 | / | |
| Total versus ProbCons | 1249 | 615 | | ProbCons |
The data are the same as in Table 2. On each subset, M-Coffee8 is compared with the best performing method. Columns 2 and 3 indicate the number of times M-Coffee8 is better or worse than the best single method on that subset. The last two rows give the total for the table (Total) and the result of a comparison against ProbCons, the best individual method.