Qing Zhan, Nan Wang, Shuilin Jin, Renjie Tan, Qinghua Jiang, Yadong Wang.
Abstract
BACKGROUND: In multiple sequence alignment, the substitution scores obtained from pairwise alignment are essential. To compute adaptive alignment scores, researchers usually use a hidden Markov model (HMM) or a probabilistic consistency method such as the partition function. Recent studies show that optimizing the parameters of the HMM, as well as integrating the HMM with the partition function, can raise alignment accuracy. However, these studies did not consider combining the partition function with an optimized HMM, which could further improve alignment accuracy.
Keywords: Hidden Markov Model; Multiple sequence alignment; Particle swarm optimization; Partition function
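To make the partition-function idea from the abstract concrete, the following is a minimal sketch of how posterior match probabilities for one pair of sequences could be computed. It is not the ProbPFP implementation: the function name `match_posteriors`, the toy match/mismatch scores, the linear gap penalty, and the `temperature` value are all assumptions chosen for illustration (published partition-function aligners typically use affine gaps and a substitution matrix).

```python
import math

def match_posteriors(x, y, match=1.0, mismatch=-1.0, gap=-2.0, temperature=1.0):
    """Posterior probability that x[i] is aligned to y[j] under a Boltzmann
    distribution over global alignments (toy scores, linear gaps: assumptions)."""
    n, m = len(x), len(y)
    w_gap = math.exp(gap / temperature)

    def w(i, j):
        # Boltzmann weight of aligning x[i] with y[j] (0-based indices)
        return math.exp((match if x[i] == y[j] else mismatch) / temperature)

    # fwd[i][j]: summed weight of all alignments of x[:i] with y[:j]
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fwd[i][j] += fwd[i - 1][j - 1] * w(i - 1, j - 1)
            if i > 0:
                fwd[i][j] += fwd[i - 1][j] * w_gap
            if j > 0:
                fwd[i][j] += fwd[i][j - 1] * w_gap

    # bwd[i][j]: summed weight of all alignments of x[i:] with y[j:]
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                bwd[i][j] += w(i, j) * bwd[i + 1][j + 1]
            if i < n:
                bwd[i][j] += w_gap * bwd[i + 1][j]
            if j < m:
                bwd[i][j] += w_gap * bwd[i][j + 1]

    z = fwd[n][m]  # partition function: total weight of all alignments
    return [[fwd[i][j] * w(i, j) * bwd[i + 1][j + 1] / z for j in range(m)]
            for i in range(n)]
```

Each row of the returned matrix sums to at most 1, with the remainder being the probability that the residue is aligned to a gap. Consistency-based aligners use matrices of this kind as adaptive substitution scores.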
Year: 2019; PMID: 31760933; PMCID: PMC6876095; DOI: 10.1186/s12859-019-3132-7
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Fig. 1 Framework of the ProbPFP algorithm
Fig. 2 Comparison of mean SoP scores for different numbers of particles and iterations
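Fig. 2 varies the number of particles and the number of iterations, the two budget parameters of particle swarm optimization (listed in the keywords). The sketch below shows a generic swarm maximizer in which both appear explicitly. It is an illustration under stated assumptions, not the authors' code: `particle_swarm_maximize`, the coefficient defaults, the clamping of parameters to [0, 1], and the toy objective `toy_score` are all hypothetical. In the setting the abstract describes, the objective would instead score alignments produced with candidate HMM parameters.

```python
import random

def toy_score(params):
    # Stand-in objective (assumption): peaks at 0 when every parameter is 0.3.
    return -sum((p - 0.3) ** 2 for p in params)

def particle_swarm_maximize(objective, dim, n_particles=10, n_iterations=30,
                            inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Maximize `objective` over [0, 1]^dim with a basic particle swarm."""
    rng = random.Random(seed)
    positions = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]            # per-particle best position
    best_val = [objective(p) for p in positions]    # per-particle best value
    g = max(range(n_particles), key=best_val.__getitem__)
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iterations):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                velocities[k][d] = (inertia * velocities[k][d]
                                    + c_personal * r1 * (best_pos[k][d] - positions[k][d])
                                    + c_global * r2 * (global_pos[d] - positions[k][d]))
                # Clamp so every parameter stays inside [0, 1].
                positions[k][d] = min(1.0, max(0.0, positions[k][d] + velocities[k][d]))
            value = objective(positions[k])
            if value > best_val[k]:
                best_pos[k], best_val[k] = positions[k][:], value
                if value > global_val:
                    global_pos, global_val = positions[k][:], value
    return global_pos, global_val

best_params, best_score = particle_swarm_maximize(toy_score, dim=2)
```

Raising `n_particles` or `n_iterations` increases the number of objective evaluations, which is the trade-off a comparison like the one in Fig. 2 explores.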
Mean TC and SP Scores for 14 Aligners on OXBench
| Aligner | Mean TC score | Mean SP score |
|---|---|---|
| ProbPFP | | |
| Probalign | *81.68* | *89.97* |
| ProbCons | 80.86 | 89.68 |
| T-Coffee | 80.50 | 89.52 |
| PicXAA | 80.74 | 89.64 |
| CONTRAlign | 79.87 | 89.34 |
| COBALT | 79.73 | 88.96 |
| Clustal | 79.99 | 88.91 |
| MUSCLE | 80.67 | 89.50 |
| KALIGN | 78.88 | 88.39 |
| MAFFT | 77.96 | 88.00 |
| ClustalW | 80.16 | 89.43 |
| Align-m | 76.06 | 86.95 |
| DIALIGN | 72.14 | 83.97 |
The scores in this table are multiplied by 100. In each column, the maximum score is highlighted in bold, while the second highest score is displayed between two asterisks
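As a reading aid for the TC and SP columns, here is a minimal sketch of how the two scores are commonly defined: SP (sum of pairs) is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment, and TC (total column) is the fraction of reference columns reproduced exactly. The helper names are hypothetical, and the sketch assumes that both alignments list the same sequences in the same order, that `-` marks a gap, and that only reference columns with at least two residues are scored. Benchmark tools may restrict scoring further, for example to core blocks.

```python
from itertools import combinations

def columns(alignment):
    """Turn gapped rows into columns of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for chars in zip(*alignment):
        col = []
        for s, ch in enumerate(chars):
            if ch == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols

def aligned_pairs(cols):
    """Set of residue pairs ((seq, index), (seq, index)) sharing a column."""
    pairs = set()
    for col in cols:
        residues = [(s, i) for s, i in enumerate(col) if i is not None]
        pairs.update(combinations(residues, 2))
    return pairs

def sp_tc_scores(test, reference):
    """Return (SP, TC) of `test` measured against `reference`, each in [0, 1]."""
    test_cols, ref_cols = columns(test), columns(reference)
    ref_pairs = aligned_pairs(ref_cols)
    sp = len(ref_pairs & aligned_pairs(test_cols)) / len(ref_pairs)
    # Score only reference columns that align at least two residues.
    scored = [c for c in ref_cols if sum(i is not None for i in c) >= 2]
    test_set = set(test_cols)
    tc = sum(c in test_set for c in scored) / len(scored)
    return sp, tc

reference = ["ACGT", "A-GT"]
test = ["ACGT", "AG-T"]
sp, tc = sp_tc_scores(test, reference)  # both equal 2/3 for this toy pair
```

Multiplying the returned values by 100 gives numbers on the same scale as the table.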
Mean TC and SP Scores for 14 Aligners on BAliBASE
| Aligner | Mean TC score | Mean SP score |
|---|---|---|
| ProbPFP | *67.03* | *82.50* |
| Probalign | | |
| ProbCons | 65.22 | 81.55 |
| T-Coffee | 64.93 | 80.82 |
| PicXAA | 66.08 | 81.33 |
| CONTRAlign | 58.10 | 77.59 |
| COBALT | 57.49 | 76.08 |
| Clustal | 59.38 | 75.96 |
| MUSCLE | 58.27 | 75.60 |
| KALIGN | 59.66 | 76.99 |
| MAFFT | 52.58 | 72.46 |
| ClustalW | 49.21 | 69.63 |
| Align-m | 56.04 | 71.45 |
| DIALIGN | 48.22 | 68.63 |
The scores in this table are multiplied by 100. In each column, the maximum score is highlighted in bold, while the second highest score is displayed between two asterisks
Mean TC and SP Scores for 14 Aligners on SABmark
| Aligner | Mean TC score | Mean SP score |
|---|---|---|
| ProbPFP | | |
| Probalign | 38.63 | 59.53 |
| ProbCons | 39.17 | *59.69* |
| T-Coffee | *39.53* | 59.14 |
| PicXAA | 39.11 | 59.37 |
| CONTRAlign | 35.59 | 57.54 |
| COBALT | 36.00 | 56.71 |
| Clustal | 35.47 | 55.02 |
| MUSCLE | 33.47 | 54.51 |
| KALIGN | 33.22 | 52.13 |
| MAFFT | 32.57 | 52.63 |
| ClustalW | 31.37 | 51.92 |
| Align-m | 31.07 | 46.19 |
| DIALIGN | 27.11 | 47.09 |
The scores in this table are multiplied by 100. In each column, the maximum score is highlighted in bold, while the second highest score is displayed between two asterisks
Robinson-Foulds Distances between the Inferred Phylogenetic Trees and the Reference Tree
| TreeFam ID | ProbPFP | MUSCLE | MSAProbs | Clustal | T-Coffee |
|---|---|---|---|---|---|
| TF101116 (104) | 0.97 | 0.98 | 0.90 | | |
| TF105063 (133) | 0.83 | 0.85 | 0.84 | 0.84 | |
| TF105629 (88) | 0.66 | 0.67 | 0.68 | 0.65 | |
| TF105895 (89) | 0.53 | 0.53 | 0.56 | 0.51 | |
| TF106377 (26) | 0.48 | 0.48 | 0.48 | 0.43 | |
| TF101222 (48) | 0.71 | 0.78 | 0.76 | | |
For each family, the number in parentheses after the ID is the number of sequences in the family. In each row, the smallest distance is highlighted in bold.
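The Robinson-Foulds distance counts the bipartitions (splits) of the leaf set that appear in one tree but not the other. The values in the table lie between 0 and 1, which suggests a normalized variant, but the exact normalization used is not stated here. The sketch below therefore assumes division by the total number of non-trivial splits in both trees. Trees are written as nested tuples of leaf names and compared as unrooted trees over the same leaf set. The function names are hypothetical.

```python
def leaf_sets(tree, out):
    """Return the leaf set under `tree`, appending each clade's leaf set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves

def splits(tree):
    """Non-trivial bipartitions, each stored as the side without the anchor leaf."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    anchor = min(all_leaves)  # fixed leaf used to pick a canonical side
    result = set()
    for clade in clades:
        side = clade if anchor not in clade else all_leaves - clade
        # Skip trivial splits that separate fewer than two leaves.
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
    return result

def normalized_rf(tree_a, tree_b):
    """Symmetric-difference split count divided by the total split count."""
    a, b = splits(tree_a), splits(tree_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0

reference_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
distance = normalized_rf(reference_tree, inferred_tree)  # 0.5: one of two splits is shared
```

A distance of 0 means the two trees imply the same splits, and 1 means they share none, so smaller values in the table indicate trees closer to the reference.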