Thomas M Poulsen, Martin Frith.
Abstract
BACKGROUND: Genome sequencing provides a powerful tool for pathogen detection and can help resolve outbreaks that pose public safety and health risks. Mapping DNA reads to genomes plays a fundamental role in this approach, where accurate alignment and classification of sequencing data are crucial. Standard mapping methods crudely treat bases as independent of their neighbors. Accuracy might be improved by using higher-order paired hidden Markov models (HMMs), which model neighbor effects but introduce design and implementation issues that have typically made them impractical for read mapping applications. We present a variable-order paired HMM, which we term VarHMM, that addresses central issues involved with higher-order modeling for sequence alignment.
Keywords: HMM; Higher order; Ion Torrent; Pathogen detection; Sequence alignment
Year: 2017 PMID: 28606054 PMCID: PMC5469136 DOI: 10.1186/s12859-017-1710-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Paired HMM: a States and connections: B = begin and E = end states, M = match state, and X, Y = output states (B to E connection not shown). b Example of an alignment between two sequences (the state that was entered in the paired HMM is shown for each position of the alignment)
Fig. 2 Finding the alignment using Viterbi: a For each state M, X, Y we calculate an alignment matrix: each matrix position contains the alignment probability $V(i,j)$ up to position $i$ in Sequence1 and $j$ in Sequence2 (shown in Fig. 1). b Example of how the alignment matrices are traversed using the alignment from Fig. 1. c To find the alignment with the highest probability, $V(i,j)$ is calculated for each of the states: $V^M(i,j)$ for M, $V^X(i,j)$ for X, and $V^Y(i,j)$ for Y. The probability of transitioning between two states is given by $a$ (e.g. from X to M is $a_{XM}$), and is multiplied by the alignment probability $V$ of the state and position we are coming from (for example $V^M(i,j) = a_{XM} \cdot V^X(i-1,j-1)$ if going from state X to M). We then transition from the previous state and position that maximizes $V^M(i,j)$ for alignment matrix M, and $V^X(i,j)$, $V^Y(i,j)$ for the X and Y matrices. We perform this for each $(i,j)$ position and always store which previous state was transitioned from. The alignment with the highest probability $V$ can then be found using traceback: given a Sequence1 of length $n$ and a Sequence2 of length $m$, traceback starts from the state given by $V = \max\{a_{ME} \cdot V^M(n,m),\ a_{XE} \cdot V^X(n,m),\ a_{YE} \cdot V^Y(n,m)\}$, where $a_{ME}$, $a_{XE}$, and $a_{YE}$ are the transition probabilities from M, X, and Y respectively to E
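The Viterbi recursion described in the Fig. 2 caption can be sketched in a few lines of Python. This is a minimal first-order illustration, not the VarHMM implementation: the function name `viterbi_pair_hmm`, the dictionary layout of the parameters, and every probability value below are assumptions made for the example. It also multiplies in emission probabilities (the standard paired-HMM form, which the caption leaves out) and works in log space to avoid numerical underflow.

```python
import math

NEG_INF = float("-inf")
STATES = ("M", "X", "Y")
# How far each state advances in (Sequence1, Sequence2).
STEP = {"M": (1, 1), "X": (1, 0), "Y": (0, 1)}


def _log(p):
    return math.log(p) if p > 0 else NEG_INF


def viterbi_pair_hmm(seq1, seq2, trans, emit_match, emit_gap):
    """Return (state_path, log_probability) of the best alignment.

    trans[(s, t)]     : transition probability from state s to state t
    emit_match[(a, b)]: probability that M emits the base pair (a, b)
    emit_gap[a]       : probability that X or Y emits the single base a
    All three tables are hypothetical inputs for this sketch.
    """
    n, m = len(seq1), len(seq2)
    V = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    back = {s: [[None] * (m + 1) for _ in range(n + 1)] for s in STATES}

    for i in range(n + 1):
        for j in range(m + 1):
            for s in STATES:
                di, dj = STEP[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if s == "M":
                    e = _log(emit_match.get((seq1[i - 1], seq2[j - 1]), 0.0))
                elif s == "X":
                    e = _log(emit_gap.get(seq1[i - 1], 0.0))
                else:
                    e = _log(emit_gap.get(seq2[j - 1], 0.0))
                best, arg = NEG_INF, None
                if pi == 0 and pj == 0:  # first emitting state comes from B
                    best, arg = _log(trans.get(("B", s), 0.0)), "B"
                for p in STATES:
                    cand = V[p][pi][pj] + _log(trans.get((p, s), 0.0))
                    if cand > best:
                        best, arg = cand, p
                V[s][i][j] = best + e
                back[s][i][j] = arg  # remember where we came from

    # Termination: pick the state with the best transition into E.
    final = {s: V[s][n][m] + _log(trans.get((s, "E"), 0.0)) for s in STATES}
    state = max(final, key=final.get)
    score = final[state]
    if score == NEG_INF:
        raise ValueError("no alignment has non-zero probability")

    # Traceback from (n, m) to the begin state.
    path, i, j = [], n, m
    while state != "B":
        path.append(state)
        prev = back[state][i][j]
        di, dj = STEP[state]
        i, j = i - di, j - dj
        state = prev
    path.reverse()
    return path, score


# Illustrative parameters only; none of these values come from the paper.
bases = "ACGT"
emit_match = {(a, b): (0.22 if a == b else 0.01) for a in bases for b in bases}
emit_gap = {a: 0.25 for a in bases}
trans = {
    ("B", "M"): 0.90, ("B", "X"): 0.05, ("B", "Y"): 0.05,
    ("M", "M"): 0.80, ("M", "X"): 0.08, ("M", "Y"): 0.08, ("M", "E"): 0.04,
    ("X", "M"): 0.60, ("X", "X"): 0.30, ("X", "E"): 0.10,
    ("Y", "M"): 0.60, ("Y", "Y"): 0.30, ("Y", "E"): 0.10,
}
path, score = viterbi_pair_hmm("TAATCG", "TAATG", trans, emit_match, emit_gap)
```

With these illustrative parameters, the sequences from the Fig. 1 example yield the state path M M M M X M. A higher order model would condition the emission and transition tables on preceding bases, which this first-order sketch does not attempt.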
Fig. 3 a The Viterbi equations, extended for the higher order paired HMMs. In addition to the $X_c$ and $Y_c$ counts, a higher order HMM is also restricted by its order, which we term $n$ (see Fig. 2 for definition of other variables). The order used to calculate $V^M(i,j)$, $V^X(i,j)$, and $V^Y(i,j)$ is then taken as the minimum between the $X_c$, $Y_c$ counts and $n$ as shown above. b Update of count matrices, where the x and y values are taken from the Viterbi path chosen in the (a) equations (see text for details). c The order used and the $X_c$ and $Y_c$ values updated at each alignment position for a 2nd order paired HMM (using Fig. 2 example). The order used at each position is based on the $X_c$ and $Y_c$ counts from the previous position and the equations in (a). For example when A aligns to A at $V^M(3,4)$, the previous alignment position was at (2,3) in state X, and so the order used to calculate $V^M(3,4)$ is the minimum of [$X_c(2,3)$, $Y_c(2,3)$, $n$], which is equal to zero as
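The order-selection rule in the Fig. 3 caption (the minimum of the two counts and the model order) can be written as a one-line helper. The sketch below covers only that rule; the function name `effective_order` and the count values in the usage lines are hypothetical, and the count-update equations of panel (b) are not reproduced here because the caption does not spell them out.

```python
def effective_order(x_count, y_count, max_order):
    """Order used at an alignment position: min of X_c, Y_c and the model order n.

    x_count and y_count stand for the X_c and Y_c counts at the previous
    alignment position; max_order is the order n of the paired HMM.
    """
    if min(x_count, y_count, max_order) < 0:
        raise ValueError("counts and order must be non-negative")
    return min(x_count, y_count, max_order)


# Hypothetical counts for a 2nd order model (not taken from the figure).
order_after_zero_count = effective_order(0, 2, 2)   # one count is zero
order_capped_by_model = effective_order(3, 5, 2)    # both counts exceed n
```

Whenever either count is zero the model falls back to order zero at that position, and the order never exceeds $n$ however large the counts become.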
Fig. 4 Comparison of different alignment methods: a R200 reads aligned to DH10B and MG1655, b R400 reads aligned to DH10B and MG1655, c R200 reads aligned to DH10B and C3026, d R400 reads aligned to DH10B and C3026
Run times of the different alignment methods when aligning the R200 reads to DH10B and MG1655
| Aligner | Run time (minutes) |
|---|---|
| Bowtie2 | 1.10 |
| Last | 1.85 |
| TMAP | 2.10 |
| Segemehl | 48.24 |
| VarHMM | 210.45 |
| 1) | Sequence1 | T | A | A | T | C | G |
|---|---|---|---|---|---|---|---|
| | Sequence2 | T | A | A | T | - | G |
| | States | M | M | M | M | X | M |
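The States row of the example can be derived mechanically from the two gapped sequences. The sketch below assumes, consistent with the example, that X marks a base in Sequence1 aligned to a gap in Sequence2; treating Y as the mirror case is an assumption by symmetry, and the function name `state_path` is hypothetical.

```python
def state_path(aligned1, aligned2, gap="-"):
    """Label each alignment column with the paired-HMM state that emits it."""
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have equal length")
    states = []
    for a, b in zip(aligned1, aligned2):
        if a == gap and b == gap:
            raise ValueError("a column cannot be a gap in both sequences")
        if b == gap:
            states.append("X")   # base only in Sequence1
        elif a == gap:
            states.append("Y")   # base only in Sequence2 (assumed mirror case)
        else:
            states.append("M")   # both sequences contribute a base
    return states


example_states = state_path("TAATCG", "TAAT-G")
```

For the alignment shown in the table, `example_states` is `['M', 'M', 'M', 'M', 'X', 'M']`, matching the States row.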