Abstract
BACKGROUND: Sequence alignment has become an indispensable tool in modern molecular biology research, and probabilistic sequence alignment models have been shown to provide an effective framework for building accurate sequence alignment tools. One such example is the pair hidden Markov model (pair-HMM), which has been especially popular in comparative sequence analysis for several reasons, including its effectiveness in modeling and detecting sequence homology, its simplicity, and the existence of efficient algorithms for applying it to sequence alignment problems. However, despite these advantages, pair-HMMs also have a number of practical limitations that may degrade their alignment performance or render them unsuitable for certain alignment tasks.
Year: 2014 PMID: 24564436 PMCID: PMC4046711 DOI: 10.1186/1471-2164-15-S1-S14
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Pair hidden Markov models. (A) The state transition diagram of a widely used pair-HMM. (B) An alternative pair-HMM implementation that does not allow transitions between the two insertion states $I_x$ and $I_y$. (C) An example of a sequence pair (x, y) generated by a pair-HMM.
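To make the role of the states concrete, here is a minimal Viterbi sketch for a three-state pair-HMM with one match state and two insertion states (named `M`, `X`, and `Y` in the code). It assumes hypothetical toy parameters: the gap-open probability `delta`, the gap-extension probability `eps`, the insertion-to-insertion probability `switch`, and the emission tables are illustrative only and are not the values used in the study. With `switch=0.0` the model matches the variant in panel (B); a positive `switch` adds direct transitions between the two insertion states, as in panel (A).

```python
import math

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    # log(0) becomes -inf, so forbidden transitions can never win a max().
    return math.log(p) if p > 0 else NEG_INF


def match_emission(a: str, b: str) -> float:
    # Hypothetical joint emission p(a, b) of the match state over A, C, G, T:
    # 4 identical pairs * 0.22 + 12 mismatched pairs * 0.01 = 1.0.
    return 0.22 if a == b else 0.01


def gap_emission(a: str) -> float:
    # Hypothetical emission q(a) of an insertion state: uniform background.
    return 0.25


def viterbi_align(x: str, y: str, delta=0.1, eps=0.3, switch=0.0):
    """Return (top, bottom, log_prob) for the most probable alignment of x and y.

    State M emits an aligned pair (x_i, y_j), X emits x_i against a gap,
    and Y emits y_j against a gap. No explicit end state is modelled.
    """
    t = {
        ("M", "M"): 1 - 2 * delta, ("M", "X"): delta, ("M", "Y"): delta,
        ("X", "M"): 1 - eps - switch, ("X", "X"): eps, ("X", "Y"): switch,
        ("Y", "M"): 1 - eps - switch, ("Y", "Y"): eps, ("Y", "X"): switch,
    }
    log_t = {k: safe_log(v) for k, v in t.items()}
    states = ("M", "X", "Y")
    step = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}  # symbols consumed per state
    n, m = len(x), len(y)
    # V[s][i][j]: best log-probability of emitting x[:i], y[:j] and ending in s.
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in states}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in states}
    V["M"][0][0] = 0.0  # paths start in M before any symbol is emitted

    for i in range(n + 1):
        for j in range(m + 1):
            for s in states:
                pi, pj = i - step[s][0], j - step[s][1]
                if pi < 0 or pj < 0:
                    continue
                if s == "M":
                    emit = safe_log(match_emission(x[i - 1], y[j - 1]))
                elif s == "X":
                    emit = safe_log(gap_emission(x[i - 1]))
                else:
                    emit = safe_log(gap_emission(y[j - 1]))
                best, prev = max((V[p][pi][pj] + log_t[(p, s)], p) for p in states)
                if best > NEG_INF:
                    V[s][i][j] = best + emit
                    back[s][i][j] = prev

    # Trace back from the best final state to recover the gapped strings.
    s = max(states, key=lambda st: V[st][n][m])
    score = V[s][n][m]
    i, j, top, bottom = n, m, [], []
    while (i, j) != (0, 0):
        di, dj = step[s]
        top.append(x[i - 1] if di else "-")
        bottom.append(y[j - 1] if dj else "-")
        s, i, j = back[s][i][j], i - di, j - dj
    return "".join(reversed(top)), "".join(reversed(bottom)), score
```

For example, `viterbi_align("ACGT", "AGT")` places a single gap opposite the `C`, giving `ACGT` over `A-GT`, because one gap plus three identical matches is far more probable under these toy parameters than any path with a mismatch.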
Figure 2. Illustration of the proposed message-passing scheme. At iteration $n$, the alignment confidence score $c^{(n)}(i, j)$ of the symbol pair $(x_i, y_j)$ is updated based on the messages received from its two neighboring symbol pairs and on the joint occurrence probability $P(x_i, y_j)$ of the symbols $x_i$ and $y_j$. Solid lines indicate the messages used to update $c^{(n)}(i, j)$; dashed lines correspond to messages used to update the alignment confidence scores of other symbol pairs.
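The caption describes the update only qualitatively, so the following is a hypothetical illustration of the general idea, not the authors' update rule. It assumes that the two neighbors of $(x_i, y_j)$ are the diagonal pairs $(x_{i-1}, y_{j-1})$ and $(x_{i+1}, y_{j+1})$, that a new score is the product of $P(x_i, y_j)$ and the sum of a constant `prior` and the two incoming messages, and that scores are normalized over each row $i$. The names `pair_prob`, `prior`, and `n_iter`, and the probability values, are illustrative assumptions.

```python
def pair_prob(a: str, b: str) -> float:
    # Hypothetical joint occurrence probability P(a, b) of two DNA symbols.
    return 0.22 if a == b else 0.01


def normalize_rows(mat):
    # Rescale each row i so that its scores over j sum to 1.
    out = []
    for row in mat:
        total = sum(row)
        out.append([v / total for v in row])
    return out


def message_passing_scores(x: str, y: str, n_iter: int = 5, prior: float = 0.5):
    """Iteratively refine confidence scores c(i, j) for aligning x_i with y_j."""
    n, m = len(x), len(y)
    P = [[pair_prob(a, b) for b in y] for a in x]
    c = normalize_rows(P)  # c^(0): P(x_i, y_j) normalized over j
    for _ in range(n_iter):
        new = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Assumed neighbours: the diagonal pairs (i-1, j-1) and (i+1, j+1);
                # messages from outside the matrix are treated as 0.
                prev_msg = c[i - 1][j - 1] if i > 0 and j > 0 else 0.0
                next_msg = c[i + 1][j + 1] if i + 1 < n and j + 1 < m else 0.0
                new[i][j] = P[i][j] * (prior + prev_msg + next_msg)
        # Synchronous update: every c^(n) entry is computed from c^(n-1).
        c = normalize_rows(new)
    return c
```

For x = `CAT` and y = `AAT`, the `A` in x initially ties between the two `A`s in y; after the updates, its highest score moves to the second `A` (column index 1), whose diagonal neighbor is the strongly supported T/T pair. This shows, under the stated assumptions, how neighboring context can resolve ambiguities that $P(x_i, y_j)$ alone cannot.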
Pairwise sequence alignment performance evaluated on the BAliBASE 3.0 benchmark.
| Ref | Pair-HMM | | | Message-Passing | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| RV11 | 0.048 | 0.106 | 1.934 | 0.123 | 0.149 | 0.155 | 0.175 | 1.465 | | 3.675 |
| RV12 | 0.213 | 0.414 | 2.707 | 0.399 | 0.468 | 0.475 | 0.523 | 2.145 | | 5.200 |
| RV20 | 0.276 | 0.476 | 2.725 | 0.504 | 0.568 | 0.570 | 0.613 | 2.251 | | 5.531 |
| RV30 | 0.168 | 0.300 | 2.656 | 0.324 | 0.369 | 0.372 | 0.402 | 2.160 | | 5.432 |
| RV40 | 0.153 | 0.271 | 4.084 | 0.250 | 0.284 | 0.300 | 0.323 | 3.234 | | 7.970 |
| RV50 | 0.140 | 0.254 | 4.969 | 0.248 | 0.278 | 0.294 | 0.312 | 3.967 | | 9.856 |
The average sensitivity (SN), positive predictive value (PPV), and CPU time (in seconds) on each reference set are shown for each sequence alignment scheme. All experiments were performed in Matlab on a Mac Pro workstation with 2 × 2.8 GHz Quad-Core Intel Xeon processors and 32 GB of memory.
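SN and PPV for pairwise alignment are commonly computed over aligned residue pairs: SN = TP / (TP + FN) is the fraction of residue pairs in the reference alignment that the predicted alignment recovers, and PPV = TP / (TP + FP) is the fraction of predicted residue pairs that also appear in the reference. The sketch below assumes these usual definitions, since the exact scoring procedure is not spelled out here; `aligned_pairs` and `sn_ppv` are hypothetical helper names, and alignments are given as pairs of equal-length gapped strings with `-` as the gap symbol.

```python
def aligned_pairs(top: str, bottom: str) -> set:
    """Return the residue index pairs (i, j) that share a column in a gapped alignment."""
    pairs, i, j = set(), 0, 0
    for a, b in zip(top, bottom):
        if a != "-" and b != "-":
            pairs.add((i, j))
        # Advance the residue counter of every row that is not a gap here.
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def sn_ppv(predicted, reference):
    """SN = TP / |reference pairs|, PPV = TP / |predicted pairs|."""
    pred = aligned_pairs(*predicted)
    ref = aligned_pairs(*reference)
    tp = len(pred & ref)  # residue pairs aligned in both alignments
    sn = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sn, ppv
```

Against the reference `ACGT` / `A-GT`, the prediction `ACGT-` / `A--GT` recovers one of the three reference pairs (SN = 1/3), and one of its two predicted pairs is correct (PPV = 1/2). Averaging such per-pair values over a reference set gives summary figures of the kind reported in the table.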
Figure 3. Performance comparison between the proposed message-passing scheme and the traditional pair-HMM approach. The plots in the left column show the distributions of the sensitivity difference, SN(message passing) − SN(pair-HMM), between the two schemes. The plots in the right column show the corresponding distributions of the positive predictive value difference, PPV(message passing) − PPV(pair-HMM). Each row shows the results for one of the six reference sets in BAliBASE 3.0.
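Each panel summarizes a distribution of per-alignment differences. The sketch below shows one way such a summary could be computed from paired scores; the function name, the bin width, and the example numbers are hypothetical and made up for illustration, not results from the study.

```python
import math
from collections import Counter


def summarize_differences(mp_scores, hmm_scores, bin_width=0.05):
    """Summarize per-alignment differences (message passing minus pair-HMM).

    mp_scores[k] and hmm_scores[k] must refer to the same sequence pair.
    """
    if len(mp_scores) != len(hmm_scores):
        raise ValueError("score lists must be paired one-to-one")
    diffs = [a - b for a, b in zip(mp_scores, hmm_scores)]
    # Bin index k covers the half-open interval [k * bin_width, (k + 1) * bin_width).
    bins = Counter(math.floor(d / bin_width) for d in diffs)
    return {
        "improved": sum(d > 0 for d in diffs),
        "tied": sum(d == 0 for d in diffs),
        "worse": sum(d < 0 for d in diffs),
        # Keys are bin lower edges, rounded to hide floating-point noise.
        "histogram": {round(k * bin_width, 10): bins[k] for k in sorted(bins)},
    }
```

With made-up SN values `[0.5, 0.4, 0.9]` (message passing) and `[0.3, 0.4, 1.0]` (pair-HMM), one pair improves, one is unchanged, and one gets worse; plotting the `histogram` counts separately for each reference set would give panels of the same kind as those described above.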