Abstract
MOTIVATION: Next-generation sequencing (NGS) provides a great opportunity to investigate genome-wide variation at nucleotide resolution. Because of the huge volume of data, NGS applications require very fast and accurate alignment algorithms. Most existing read-mapping algorithms adopt a seed-and-extend strategy, which is sequential in nature and takes much longer on longer reads.
Year: 2017 PMID: 28379292 PMCID: PMC5860120 DOI: 10.1093/bioinformatics/btx189
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The algorithm to explore all LMEMs with length ≥ k. BWT_search is the function that searches the given BWT array for occurrences of the maximal exact match for R[start, stop]. It returns the desired LMEMs together with their occurrences on the reference genome
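The idea in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the published algorithm: a naive substring search (`find_occurrences`) stands in for BWT_search, and the names `longest_match_from` and `explore_lmems`, as well as the policy of advancing by one base after a short match, are assumptions made for illustration only.

```python
def find_occurrences(genome, pattern):
    """All start positions of pattern in genome.

    Naive stand-in for a BWT/FM-index lookup (assumption, for illustration).
    """
    hits = []
    pos = genome.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(pattern, pos + 1)
    return hits


def longest_match_from(genome, read, start):
    """Greedily extend read[start:] one base at a time while it still occurs."""
    length = 0
    hits = []
    while start + length < len(read):
        candidate = find_occurrences(genome, read[start:start + length + 1])
        if not candidate:
            break
        length += 1
        hits = candidate
    return length, hits


def explore_lmems(genome, read, k):
    """Collect exact matches of length >= k, scanning the read left to right.

    Returns a list of (read_start, length, genome_positions).
    """
    lmems = []
    start = 0
    while start < len(read):
        length, hits = longest_match_from(genome, read, start)
        if length >= k:
            lmems.append((start, length, hits))
            start += length   # continue after the reported match
        else:
            start += 1        # hypothetical step policy: skip one base
    return lmems


genome = "AAACCCGGGTTT"
read = "CCCGGGATTT"
print(explore_lmems(genome, read, k=3))
```

In this toy example the read contains one base (`A` at read position 6) that breaks the match, so the scan reports one match on each side of it. A real implementation would use an index over the genome rather than repeated substring searches.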
Fig. 2. Simple pair A overlaps with simple pair B. Kart removes the overlap by shrinking the size of the smaller simple pair
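The overlap removal described in the Fig. 2 caption might look roughly like the following sketch. The caption only states that the smaller simple pair is shrunk; the `SimplePair` record, the choice of which end to trim, and the tie-breaking rule (trim the earlier pair when lengths are equal) are assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class SimplePair:
    read_pos: int    # 0-based start on the read
    genome_pos: int  # 0-based start on the genome
    length: int      # number of identical bases


def remove_overlap(a, b):
    """Return the two pairs in read order, with any overlap trimmed
    from the smaller pair (hypothetical helper, not Kart's code)."""
    first, second = sorted((a, b), key=lambda p: p.read_pos)
    # Overlap can appear on the read, on the genome, or on both.
    overlap_read = first.read_pos + first.length - second.read_pos
    overlap_genome = first.genome_pos + first.length - second.genome_pos
    overlap = max(overlap_read, overlap_genome, 0)
    if overlap == 0:
        return first, second
    if first.length <= second.length:
        # Trim the tail of the earlier (smaller) pair.
        first = SimplePair(first.read_pos, first.genome_pos,
                           max(first.length - overlap, 0))
    else:
        # Trim the head of the later (smaller) pair.
        second = SimplePair(second.read_pos + overlap,
                            second.genome_pos + overlap,
                            max(second.length - overlap, 0))
    return first, second


a = SimplePair(read_pos=0, genome_pos=100, length=50)
b = SimplePair(read_pos=40, genome_pos=140, length=20)
print(remove_overlap(a, b))
```

Here B is the smaller pair and overlaps A by 10 bases, so B is shortened to 10 bases and shifted to start where A ends.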
Fig. 3. Simple pairs and normal pairs. A read sequence can be decomposed into different parts according to the alignment with the genome sequence. A simple pair represents a pair of identical sequence fragments; a normal pair represents a pair of sequence fragments which contains some sequence variations in the alignment
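The decomposition in the Fig. 3 caption can be sketched as follows, assuming the simple pairs are already sorted and non-overlapping. The function name `fill_normal_pairs`, the tuple layout, and the use of half-open `(start, end)` intervals are illustrative assumptions; the sketch only marks the regions between consecutive simple pairs as normal pairs and does not align them.

```python
def fill_normal_pairs(simple_pairs):
    """Interleave simple pairs with the normal pairs that lie between them.

    simple_pairs: list of (read_pos, genome_pos, length), sorted by position
    and non-overlapping. Returns a list of
    (kind, (read_start, read_end), (genome_start, genome_end)) with
    half-open intervals.
    """
    segments = []
    prev_read_end = prev_genome_end = None
    for read_pos, genome_pos, length in simple_pairs:
        has_gap = prev_read_end is not None and (
            read_pos > prev_read_end or genome_pos > prev_genome_end
        )
        if has_gap:
            # The unmatched stretch between two simple pairs is a normal pair.
            segments.append(("normal",
                             (prev_read_end, read_pos),
                             (prev_genome_end, genome_pos)))
        segments.append(("simple",
                         (read_pos, read_pos + length),
                         (genome_pos, genome_pos + length)))
        prev_read_end = read_pos + length
        prev_genome_end = genome_pos + length
    return segments


for segment in fill_normal_pairs([(0, 100, 10), (12, 112, 8), (20, 125, 5)]):
    print(segment)
```

In this example the second normal pair has an empty read interval and a 5-base genome interval, which is how a gap on the read side would show up in this representation.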
The average sizes of simple and normal pairs after two rounds of sequence partition on Illumina datasets (NP-clips were excluded from the percentage calculation)
| Dataset | LMEM-seed | 8-LMEM-seed | NP-gap free | NP-indels | NP-NW |
|---|---|---|---|---|---|
| SRR622458 | 73.0 (96.5%) | 11.4 (0.9%) | 3.9 (0.7%) | 1.8 (0%) | 17.5 (1.9%) |
| SRR826460 | 112.7 (97.9%) | 13.7 (0.5%) | 4.5 (0.7%) | 1.9 (0%) | 19.5 (0.9%) |
| SRR826471 | 104.2 (84.9%) | 12.4 (3.8%) | 7.5 (2.5%) | 1.9 (0%) | 22.8 (8.8%) |
| M130929 | 21.3 (13.7%) | 12.4 (39.7%) | 10.8 (0.1%) | 1.4 (2.6%) | 21.3 (44%) |
Illumina and PacBio-like simulated data. Ten million paired-end reads of 100 bp, 150 bp and 300 bp and one million single-end reads of 7000 bp were simulated from the human genome (Hg19) with the wgsim simulator
| Synthetic datasets | Aligner | Precision | Recall | Runtime |
|---|---|---|---|---|
| Hg19_L100_E02 | Kart | 97.8 | 97.8 | 53 |
| | Bowtie2 | 96.3 | 95.8 | 149 |
| | BWA-MEM | 98.6 | 98.6 | 403 |
| | Cushaw3 | 98.2 | 98.2 | 1412 |
| | HPG-Aligner | 97.7 | 97.5 | 146 |
| | HISAT2 | 95.3 | 92.7 | 78 |
| | Subread | 98.5 | 93.4 | 353 |
| Hg19_L150_E02 | Kart | 98.4 | 98.4 | 66 |
| | Bowtie2 | 96.2 | 96.2 | 266 |
| | BWA-MEM | 98.9 | 98.9 | 581 |
| | Cushaw3 | 98.6 | 98.6 | 1278 |
| | HPG-Aligner | 98.5 | 98.5 | 315 |
| | HISAT2 | 92.3 | 89.4 | 91 |
| | Subread | 98.0 | 96.9 | 474 |
| Hg19_L300_E02 | Kart | 99.0 | 99.0 | 113 |
| | Bowtie2 | 96.1 | 96.1 | 718 |
| | BWA-MEM | 99.2 | 99.2 | 1096 |
| | Cushaw3 | 99.1 | 99.1 | 3085 |
| | HPG-Aligner | 99.1 | 99.1 | 317 |
| | HISAT2 | 70.5 | 54.6 | 155 |
| | Subread | 98.8 | 98.8 | 774 |
| Hg19_L7000_E15 | Kart | 99.6 | 99.6 | 733 |
| | BWA-MEM | 99.8 | 99.8 | 4614 |
| | LAST | 99.9 | 99.4 | 78432 |
| | Minimap | 83.4 | 83.4 | 288 |
| | BLASR | 99.8 | 99.8 | 9185 |
Experimental results on the four real datasets with different read lengths
| Real datasets | Aligner | Sensitivity | Identical base pairs | MEM (Gb) | Runtime |
|---|---|---|---|---|---|
| SRR622458 Illumina-101 bp | Kart | 98.6 | 99 | 12 | 158 |
| | Bowtie2 | 97.4 | 99 | 4.5 | 458 |
| | BWA-MEM | 98.8 | 97 | 8.5 | 1157 |
| | Cushaw3 | 99.1 | 98 | 4.8 | 9063 |
| | HPG-Aligner | NA | NA | 31.2 | NA |
| | HISAT2 | 86.0 | 99 | 5.5 | 298 |
| | Subread | 91.2 | 97 | 18.4 | 1362 |
| SRR826460 Illumina-150 bp | Kart | 99.3 | 149 | 12 | 186 |
| | Bowtie2 | 98.4 | 149 | 4.5 | 769 |
| | BWA-MEM | 99.3 | 147 | 8.5 | 1374 |
| | Cushaw3 | 99.3 | 148 | 4.8 | 10736 |
| | HPG-Aligner | 98.3 | 147 | 31.2 | 1204 |
| | HISAT2 | 91.9 | 149 | 5.5 | 371 |
| | Subread | 97.5 | 147 | 18.4 | 2694 |
| SRR826471 Illumina-250 bp | Kart | 98.6 | 237 | 12 | 395 |
| | Bowtie2 | 94.7 | 237 | 4.5 | 1729 |
| | BWA-MEM | 98.6 | 220 | 8.5 | 3027 |
| | Cushaw3 | 98.4 | 232 | 4.6 | 37689 |
| | HPG-Aligner | NA | NA | 27.9 | NA |
| | HISAT2 | 43.9 | 247 | 5.5 | 461 |
| | Subread | NA | NA | 18.4 | NA |
| M130929 PacBio-7118 bp | Kart | 100.0 | 5152 | 13 | 1811 |
| | BWA-MEM | 90.7 | 2953 | 9 | 7338 |
| | LAST | 97.2 | 5022 | 15 | 31295 |
| | BLASR | 97.8 | 5389 | 28.9 | 18682 |