Gearóid Fox, Fabian Sievers, Desmond G Higgins.
Abstract
MOTIVATION: Multiple sequence alignments (MSAs) with large numbers of sequences are now commonplace. However, current multiple alignment benchmarks are ill-suited for testing these types of alignments, as test cases either contain a very small number of sequences or are based purely on simulation rather than empirical data.
Year: 2015 PMID: 26568625 PMCID: PMC5939968 DOI: 10.1093/bioinformatics/btv592
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Flowchart of the benchmark process for one test case. Sequences from Pfam are aligned with the method of interest, and the resulting MSA is used to predict residue–residue contacts for one of the proteins in the alignment. The 3D coordinates of this target protein are used to calculate the true residue–residue contacts. The two lists of contacts are compared to calculate a score for the alignment.
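The "true contacts" step of this flowchart can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 8 Å distance cutoff, the use of one representative atom per residue, and the `min_separation` parameter are all assumptions introduced here, and `true_contacts` is a hypothetical helper name.

```python
import math
from itertools import combinations


def true_contacts(coords, threshold=8.0, min_separation=1):
    """Return the set of residue index pairs (i, j), i < j, in contact.

    coords: one (x, y, z) tuple per residue (assumed: a single
            representative atom per residue, e.g. C-beta).
    threshold: distance cutoff in angstroms (assumed value).
    min_separation: minimum |i - j| along the sequence (assumed parameter).
    """
    contacts = set()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if j - i >= min_separation and math.dist(a, b) < threshold:
            contacts.add((i, j))
    return contacts


# Toy example: four residues spaced 3.8 angstroms apart on a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_contacts = true_contacts(toy_coords, min_separation=2)
```

In the toy example only the pairs two positions apart fall under the cutoff, so `toy_contacts` holds `(0, 2)` and `(1, 3)`.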
Benchmark scores for alignments from Pfam and seven MSA packages
| Alignment method | Mean PSICOV precision | Mean EVfold-mfDCA precision | ContTest score |
|---|---|---|---|
| hmmt | 0.526 | 0.590 | 0.551 |
| Kalign 1 (fast) | 0.527 | 0.581 | 0.550 |
| Pfam | 0.535 | 0.577 | 0.545 |
| Kalign 2 | 0.507 | 0.533 | 0.513 |
| Clustal Omega | 0.497 | 0.390 | 0.428 |
| MAFFT (default) | 0.433 | 0.379 | 0.396 |
| Clustal W2 | 0.413 | 0.373 | 0.381 |
| MUSCLE (2 iterations) | 0.445 | 0.329 | 0.371 |
Mean PSICOV precision and mean EVfold-mfDCA precision are the mean precisions of the top L/5 long-range contacts predicted by PSICOV and EVfold-mfDCA, respectively. ContTest score is the geometric mean of the PSICOV and EVfold-mfDCA precisions. Statistical significances are indicated for the differences in consecutive pairs of ContTest scores. *P < 0.007; **P < 0.0014; ***P < 0.00014 (0.05, 0.01 and 0.001 with Bonferroni correction for seven tests, respectively); NS, not significant.
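The scoring described in this caption can be sketched for a single test case as below. This is a hedged illustration only: the function names are hypothetical, the `min_separation` value used to define "long-range" is an assumption, and the way per-case values are averaged into the tabulated means is not specified here.

```python
import math


def top_l5_precision(scored_pairs, true_contacts, seq_len, min_separation=24):
    """Precision of the top L/5 long-range predicted contacts.

    scored_pairs: iterable of (i, j, score) predictions.
    true_contacts: set of (i, j) pairs with i < j.
    seq_len: target protein length L.
    min_separation: minimum |i - j| counted as long-range (assumed value).
    """
    long_range = [p for p in scored_pairs if abs(p[0] - p[1]) >= min_separation]
    long_range.sort(key=lambda p: p[2], reverse=True)
    top = long_range[: max(1, seq_len // 5)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


def geometric_mean(a, b):
    """Geometric mean of two precisions, as used for the ContTest score."""
    return math.sqrt(a * b)


# Toy example with L = 10, so the top L/5 = 2 long-range predictions are scored.
predictions = [(0, 30, 0.9), (1, 40, 0.8), (2, 5, 0.99), (3, 50, 0.1)]
precision = top_l5_precision(predictions, {(0, 30)}, seq_len=10)
```

The short-range pair `(2, 5)` is filtered out despite its high score; of the two remaining top predictions, one is a true contact, giving a precision of 0.5.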
Fig. 2. Comparison of Kalign 2 and Clustal Omega guide tree imbalance. The Sackin score (sum of distances from leaves to root) produced by each program is plotted against the number of sequences in the alignment for each test case. Values for fully chained and balanced trees and expected values under the Equal Rates Markov and Proportional to Distinguishable Arrangements models of tree growth are indicated with lines.
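The Sackin score defined in this caption can be illustrated with a small sketch that compares a fully chained tree with a balanced one. The nested-tuple tree representation and the helper names are assumptions made for illustration; distances are counted in edges, ignoring branch lengths.

```python
def sackin(tree):
    """Sum of leaf depths (edges from root). Leaves are any non-tuple values."""
    total = 0
    stack = [(tree, 0)]  # iterative traversal, safe for very deep chained trees
    while stack:
        node, depth = stack.pop()
        if isinstance(node, tuple):
            stack.extend((child, depth + 1) for child in node)
        else:
            total += depth
    return total


def chained_tree(leaves):
    """Fully chained (caterpillar) tree: each new leaf joins the previous subtree."""
    tree = leaves[0]
    for leaf in leaves[1:]:
        tree = (tree, leaf)
    return tree


def balanced_tree(leaves):
    """Balanced tree built by recursively halving the leaf list."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return (balanced_tree(leaves[:mid]), balanced_tree(leaves[mid:]))


leaves = list(range(8))
chained_score = sackin(chained_tree(leaves))    # 35
balanced_score = sackin(balanced_tree(leaves))  # 24
```

For n leaves the chained tree scores (n - 1)(n + 2)/2, which grows quadratically, while a balanced tree with n a power of two scores n log2 n; these are the two extremes between which the plotted guide trees fall.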
Benchmark scores for a variety of parameter sets of MAFFT, MUSCLE and Clustal Omega
| Alignment method | Mean PSICOV precision | Mean EVfold-mfDCA precision | ContTest score |
|---|---|---|---|
| Clustal Omega (chained) | 0.501 | 0.540 | 0.513 |
| Clustal Omega (Pfam HMM) | 0.527 | 0.420 | 0.459 |
| Clustal Omega (3 iterations) | 0.510 | 0.400 | 0.441 NS |
| Clustal Omega (2 iterations) | 0.516 | 0.398 | 0.441 NS |
| Clustal Omega (default) | 0.497 | 0.390 | 0.428 |
| MAFFT (chained) | 0.509 | 0.530 | 0.517 |
| MAFFT (default) | 0.433 | 0.379 | 0.396 |
| MAFFT NW-NS-PartTree-1 | 0.469 | 0.334 | 0.389 NS |
| MUSCLE (chained, 2 iterations) | 0.508 | 0.553 | 0.526 |
| MUSCLE (2 iterations) | 0.445 | 0.329 | 0.371 |
| MUSCLE (chained, 1 iteration) | 0.499 | 0.541 | 0.516 |
| MUSCLE (1 iteration) | 0.415 | 0.314 | 0.354 |
The statistical significance of the difference in ContTest score between the default and variant parameters of each package is indicated. Clustal Omega with chained guide trees, with an external HMM, and with two and three iterations is compared with default Clustal Omega. MAFFT PartTree and MAFFT with chained guide trees are compared with default MAFFT. MUSCLE with two iterations and a starting chained guide tree is compared with two iterations of MUSCLE. MUSCLE with a chained guide tree and one iteration is compared with MUSCLE with one iteration.
***P < 0.001.
Fig. 3. Introducing random misalignments decreases the benchmark score. Each boxplot represents 20 replicates where a different random subset of sequences is misaligned. There is a strong correlation between more errors and decreasing benchmark score.
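One way to introduce such errors is sketched below. The caption does not say how sequences were misaligned, so the shifting scheme here (offsetting the chosen rows by a fixed number of columns) is purely an assumption, and `misalign` is a hypothetical helper.

```python
import random


def misalign(alignment, n_errors, shift=1, seed=None):
    """Shift a random subset of rows right by `shift` columns.

    alignment: list of equal-length gapped strings ('-' is the gap character).
    n_errors: number of rows to misalign.
    Chosen rows get leading gaps; all other rows get trailing gaps, so every
    row keeps its residues and the alignment stays rectangular.
    """
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(alignment)), n_errors))
    pad = "-" * shift
    return [pad + row if idx in chosen else row + pad
            for idx, row in enumerate(alignment)]


original = ["ACDE", "ACDE", "ACDE", "ACDE"]
# One replicate: a different seed selects a different random subset.
replicate = misalign(original, n_errors=2, seed=0)
```

Repeating the call with different seeds would give replicates with the same number of errors but different affected sequences, matching the design of the boxplots.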
Sum-of-pairs scores and ContTest scores for 80 test cases with embedded HOMSTRAD sequences
| Alignment method | Mean SP score | ContTest score |
|---|---|---|
| Clustal Omega (default) | 0.722 | 0.410 |
| Kalign 1 (fast) | 0.722 | 0.548 |
| hmmt | 0.718 | 0.543 |
| Kalign 2 | 0.709 | 0.507 |
| MUSCLE (2 iterations, chained) | 0.696 | 0.519 |
| MUSCLE (1 iteration, chained) | 0.676 | 0.509 |
| MAFFT FFT-NS-2 | 0.610 | 0.376 |
| MUSCLE (2 iterations) | 0.588 | 0.355 |
| MAFFT NW-NS-PartTree-1 | 0.567 | 0.389 |
| MUSCLE (1 iteration) | 0.529 | 0.331 |
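For readers unfamiliar with the sum-of-pairs (SP) score in this table, the sketch below uses its usual definition: the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. The function names are hypothetical, and the sketch assumes both alignments list the same sequences in the same order; it is not the scoring program used by the authors.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of (seq_a, residue_a, seq_b, residue_b) tuples sharing a column."""
    pairs = set()
    for (a, row_a), (b, row_b) in combinations(enumerate(alignment), 2):
        res_a = res_b = 0  # running residue indices, gaps excluded
        for char_a, char_b in zip(row_a, row_b):
            if char_a != "-" and char_b != "-":
                pairs.add((a, res_a, b, res_b))
            if char_a != "-":
                res_a += 1
            if char_b != "-":
                res_b += 1
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


reference = ["AC-D", "ACED"]
test = ["ACD-", "ACED"]
score = sp_score(test, reference)
```

Here the reference aligns three residue pairs and the test alignment recovers two of them, so `score` is 2/3. Because pairs are keyed on residue indices rather than columns, the same function works on rows extracted from a much larger alignment.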