Yongtao Ye, Tak-Wah Lam, Hing-Fung Ting.
Abstract
BACKGROUND: This paper describes a new MSA tool called PnpProbs, which constructs better multiple sequence alignments through better handling of guide trees. It classifies input sequences into two types: normally related and distantly related. For normally related sequences, it uses an adaptive approach to construct the guide tree needed for progressive alignment; it first estimates the discrepancy of the input by computing the standard deviation of their pairwise percent identities, and based on this estimate, it chooses the better method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree and instead uses a non-progressive alignment method to generate the alignment.
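The adaptive strategy described in the abstract can be sketched as follows. This is a hypothetical illustration: the cutoff values, helper names, and the WPGMA/UPGMA choice (suggested by the GLProbs-WPGMA vs. GLProbs-UPGMA comparison in Fig. 2) are assumptions, not PnpProbs' actual parameters.

```python
# Hypothetical sketch of the adaptive guide-tree strategy described above.
# The cutoffs, helper names, and the WPGMA/UPGMA mapping are illustrative
# assumptions, not PnpProbs' actual parameters.
from statistics import mean, pstdev

def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, pre-aligned sequences."""
    matches = sum(x == y and x != '-' for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), 1)

def choose_strategy(aligned_pairs, identity_cutoff=25.0, sd_cutoff=15.0):
    """Classify the input and pick how (or whether) to build a guide tree."""
    ids = [percent_identity(a, b) for a, b in aligned_pairs]
    if mean(ids) < identity_cutoff:
        return "non-progressive"   # distantly related: abandon the guide tree
    # normally related: choose the guide-tree method by the discrepancy
    # (standard deviation) of the pairwise percent identities
    return "WPGMA" if pstdev(ids) > sd_cutoff else "UPGMA"
```

Whether high discrepancy maps to WPGMA or to UPGMA is not stated in this record, so the mapping above is only a placeholder.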
Keywords: Guide trees; Multiple sequence alignment; Phylogenetic trees
Year: 2016 PMID: 27585754 PMCID: PMC5009527 DOI: 10.1186/s12859-016-1121-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 A structural property used in handling normally related sequences
Fig. 2 Accumulated TC score difference between GLProbs-WPGMA and GLProbs-UPGMA
Fig. 3 Accumulated TC score difference between PicXAA-AD and GLProbs
Average SP and TC scores on OXBench
| Tool | SP (ALL, 0–100 %) | TC (ALL) | SP (0–20 %) | TC (0–20 %) | SP (20–50 %) | TC (20–50 %) | SP (50–100 %) | TC (50–100 %) | Time (mm:ss) |
|---|---|---|---|---|---|---|---|---|---|
| PnpProbs |  |  |  |  | 83.47 ∗ |  |  |  | 2:58 |
| GLProbs | 90.38 ∗ | 82.14 ∗ | 47.29 ∗ | 22.95 ∗ |  | 68.65 ∗ |  |  | 3:15 |
| MSAProbs | 90.07 | 81.75 | 44.83 | 22.08 | 82.77 | 67.74 | 98.01 | 95.08 | 4:04 |
| Probalign | 89.97 | 81.68 | 43.58 | 20.51 | 82.53 | 67.46 |  |  | 2:10 |
| CONTRAlign | 89.34 | 79.87 | 44.76 | 17.83 | 81.56 | 64.75 | 97.55 | 94.10 | 10:19 |
| ProbCons | 89.68 | 80.86 | 44.15 | 20.30 | 82.06 | 66.33 | 97.84 | 94.61 | 1:48 |
| MUSCLE | 89.50 | 80.67 | 45.64 | 21.90 | 81.75 | 66.15 | 97.63 | 94.28 | 0:19 |
| MAFFT | 88.00 | 77.96 | 37.82 | 13.27 | 78.99 | 60.86 | 97.41 | 93.68 | 0:19 |
| T-Coffee | 89.52 | 80.50 | 43.99 | 19.11 | 81.82 | 65.85 | 97.75 | 94.38 | 15:05 |
| Clustal | 88.91 | 79.99 | 39.09 | 16.38 | 80.71 | 64.49 | 97.76 | 94.58 | 0:12 |
| ClustalW | 89.43 | 80.16 | 42.94 | 18.23 | 81.67 | 65.01 | 97.76 | 94.40 | 0:22 |
| PicXAA | 89.64 | 80.74 | 45.11 | 22.04 | 81.86 | 65.91 | 97.84 | 94.55 | 4:26 |
| DIALIGN | 83.97 | 72.41 | 26.03 | 8.07 | 72.67 | 52.57 | 95.21 | 89.54 | 3:17 |
| Align-m | 86.95 | 76.06 | 28.36 | 12.74 | 76.35 | 57.54 | 96.95 | 92.60 | 21:14 |
The table shows the average SP and average TC scores (multiplied by 100). The best and second-best results in each column are marked in bold and with ∗, respectively. The last column shows the running time using a single CPU thread. Default parameters were used for all tools.
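For reference, the SP (sum-of-pairs) and TC (total-column) metrics reported in these tables can be computed as sketched below. This follows the standard benchmark definitions (as in BAliBASE's bali_score) and is not necessarily the exact scorer used in the paper; alignments are lists of equal-length gapped strings with '-' denoting a gap.

```python
# Simplified sketch of the SP and TC scores reported in the tables; a
# reading of the standard definitions, not the paper's own scorer.

def _columns(alignment):
    """Each column as a tuple of (sequence, residue-index) for non-gap cells."""
    counters = [0] * len(alignment)
    cols = []
    for col in zip(*alignment):
        entry = []
        for s, ch in enumerate(col):
            if ch != '-':
                entry.append((s, counters[s]))
                counters[s] += 1
        if entry:
            cols.append(tuple(entry))
    return cols

def sp_tc(test, ref):
    """SP: fraction of reference residue pairs recovered by the test alignment.
    TC: fraction of reference columns reproduced exactly."""
    def pairs(cols):
        return {(a, b) for c in cols for i, a in enumerate(c) for b in c[i + 1:]}
    ref_cols, test_cols = _columns(ref), _columns(test)
    sp = len(pairs(test_cols) & pairs(ref_cols)) / len(pairs(ref_cols))
    tc = sum(c in set(test_cols) for c in ref_cols) / len(ref_cols)
    return sp, tc
```

A perfect alignment scores 1.0 on both metrics; the tables report these values multiplied by 100.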
Fig. 4 Mean TC score on OXBench (0–20 % similarity)
Average SP and TC scores on SABmark
| Tool | SP (ALL) | TC (ALL) | SP (Twilight Zone) | TC (Twilight Zone) | SP (Superfamily) | TC (Superfamily) | Time (mm:ss) |
|---|---|---|---|---|---|---|---|
| PnpProbs | 61.37 ∗ |  |  |  | 67.19 ∗ |  | 3:00 |
| GLProbs |  | 41.36 ∗ | 44.35 ∗ | 24.30 ∗ |  | 47.21 ∗ | 3:20 |
| MSAProbs | 60.27 | 40.02 | 42.97 | 22.88 | 66.20 | 45.90 | 1:58 |
| Probalign | 59.53 | 38.63 | 42.42 | 22.64 | 65.39 | 44.11 | 1:01 |
| CONTRAlign | 57.45 | 35.59 | 39.01 | 17.69 | 63.77 | 41.73 | 4:56 |
| ProbCons | 59.69 | 39.17 | 42.81 | 22.78 | 65.47 | 44.79 | 1:12 |
| MUSCLE | 54.51 | 33.47 | 34.69 | 16.96 | 61.29 | 39.13 | 0:46 |
| MAFFT | 52.63 | 32.57 | 31.72 | 15.17 | 59.79 | 38.53 | 0:22 |
| T-Coffee | 59.14 | 39.53 | 41.66 | 23.29 | 65.13 | 45.10 | 4:36 |
| Clustal | 55.02 | 35.47 | 35.55 | 18.10 | 61.69 | 41.42 | 0:18 |
| ClustalW | 51.92 | 31.37 | 31.45 | 15.09 | 58.93 | 36.95 | 0:14 |
| PicXAA | 59.37 | 39.11 | 41.05 | 21.51 | 65.65 | 45.14 | 3:29 |
| DIALIGN | 47.09 | 27.11 | 27.85 | 12.73 | 53.69 | 32.05 | 1:03 |
| Align-m | 46.19 | 31.07 | 25.72 | 16.28 | 53.21 | 36.14 | 5:32 |
The best and second-best results in each column are marked in bold and with ∗, respectively.
Average SP and TC scores on BAliBASE
| Tool | SP (ALL) | TC (ALL) | SP (RV11) | TC (RV11) | SP (RV12) | TC (RV12) | Time (mm:ss) |
|---|---|---|---|---|---|---|---|
| PnpProbs | 82.80 ∗ | 68.91 |  |  | 94.79 ∗ | 87.23 ∗ | 3:22 |
| GLProbs |  | 67.59 ∗ |  | 44.68 |  |  | 4:05 |
| MSAProbs | 82.35 | 66.83 | 68.13 | 44.02 | 94.63 | 86.52 | 3:02 |
| Probalign | 82.53 | 67.27 | 69.50 ∗ | 45.34 ∗ | 94.63 | 86.20 | 1:47 |
| CONTRAlign | 77.59 | 58.10 | 61.78 | 35.60 | 91.23 | 77.52 | 6:37 |
| ProbCons | 81.55 | 65.22 | 66.99 | 41.68 | 94.12 | 85.54 | 1:41 |
| MUSCLE | 75.60 | 58.27 | 57.15 | 32.06 | 91.53 | 80.89 | 0:37 |
| MAFFT | 72.46 | 52.58 | 52.96 | 26.19 | 89.30 | 75.38 | 0:14 |
| T-Coffee | 80.82 | 64.93 | 65.63 | 41.36 | 93.94 | 85.29 | 5:18 |
| Clustal | 75.96 | 59.38 | 59.01 | 36.21 | 90.60 | 79.38 | 0:21 |
| ClustalW | 69.63 | 49.21 | 50.06 | 22.99 | 86.52 | 71.84 | 0:21 |
| PicXAA | 81.33 | 66.08 | 66.56 | 44.06 | 93.47 | 84.19 | 3:26 |
| DIALIGN | 68.63 | 48.22 | 49.72 | 26.81 | 84.18 | 65.81 | 1:34 |
| Align-m | 71.45 | 56.04 | 51.88 | 33.06 | 88.36 | 75.88 | 7:09 |
The best and second-best results in each column are marked in bold and with ∗, respectively.
Fig. 5 Running time of PnpProbs
Fig. 6 Similarity between hypothesized trees and model trees for simulated data
Fig. 7 RF distance difference between RefAln and other hypothesized trees
Fig. 8 Similarity between hypothesized trees and model trees for empirical data