Haoyu Cheng, Huaipan Jiang, Jiaoyun Yang, Yun Xu, Yi Shang.
Abstract
BACKGROUND: As next-generation sequencing (NGS) technologies produce hundreds of millions of reads every day, mapping NGS reads to a given reference genome efficiently is a tremendous computational challenge. Existing all-mappers, which aim to find all mapping locations of each read, are very time consuming. Most existing all-mappers consist of two main parts, filtration and verification. This work significantly reduces verification time, which is the dominant part of the running time.
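The two-stage structure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pigeonhole seed filter (split the read into k+1 pieces, at least one of which must match exactly), the plain dynamic-programming verifier, and all function names (`build_index`, `end_distances`, `map_read`) are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_index(reference, q):
    """Map each q-gram of the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - q + 1):
        index[reference[i:i + q]].append(i)
    return index


def end_distances(read, window):
    """Edit distance of the read for every end position in the window.

    The top row is all zeros, so an alignment may start anywhere in the
    window. Entry j is the best distance for an alignment ending at
    window[:j] (exclusive end).
    """
    prev = [0] * (len(window) + 1)
    for i, rc in enumerate(read, 1):
        cur = [i]
        for j, wc in enumerate(window, 1):
            cur.append(min(prev[j - 1] + (rc != wc),  # match or substitution
                           prev[j] + 1,               # read base unmatched
                           cur[j - 1] + 1))           # window base unmatched
        prev = cur
    return prev


def map_read(read, reference, index, q, k):
    """Return {exclusive end position in reference: edit distance <= k}.

    Assumes q * (k + 1) <= len(read) (hypothetical parameter choice).
    """
    m = len(read)
    # Filtration: with at most k edits, one of k+1 disjoint pieces is exact.
    diagonals = set()
    for piece in range(k + 1):
        off = piece * q
        for p in index.get(read[off:off + q], ()):
            diagonals.add(p - off)
    # Verification: align the read inside a window around each candidate.
    hits = {}
    for d in diagonals:
        lo = max(0, d - k)
        hi = min(len(reference), d + m + k)
        for j, e in enumerate(end_distances(read, reference[lo:hi])):
            if e <= k:
                hits[lo + j] = min(e, hits.get(lo + j, e))
    return hits
```

In this sketch the verification loop runs a full dynamic-programming pass per candidate, which is the kind of cost that dominates running time when filtration produces many candidates.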
Year: 2015 PMID: 26063651 PMCID: PMC4462005 DOI: 10.1186/s12859-015-0626-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The verification window in the reference genome. It includes the candidate mapping locations of the read with edit distance up to 3
Fig. 2 The computing area in the dynamic programming matrix with the edit distance threshold k = 2. The initial cells should be set to 0 because the start locations of potential matches cannot be known in advance
Fig. 3 The data structure for the variables containing several bit vectors. Each pattern needs (2k+2) bits to represent itself
Fig. 4 Performance of the vectorized Gene Myers’ bit-vector algorithm for different values of n
Fig. 5 Two vectorized verification schemes for the vectorized Gene Myers’ bit-vector algorithm. a A location in a reference genome corresponds to four reads. b A read corresponds to two locations in the reference genome
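The captions above refer to Gene Myers' bit-vector algorithm with a zero-initialized top row. The sketch below shows the plain, single-word form of that algorithm for approximate matching in Python. It is only an illustration under stated assumptions: it is neither banded (the (2k+2)-bit layout of Fig. 3) nor vectorized as in Fig. 5, and the function name `myers_search` is hypothetical.

```python
def myers_search(pattern, text, k):
    """Report (exclusive end position, edit distance) for every end position
    in text where the pattern matches with edit distance <= k.

    Bit i of each vector describes row i+1 of the current DP column.
    """
    m = len(pattern)
    mask = (1 << m) - 1
    high = 1 << (m - 1)
    # peq[c] has bit i set when pattern[i] == c.
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)

    pv, mv = mask, 0   # vertical deltas: +1 everywhere in column 0
    score = m          # value of the bottom cell of column 0
    hits = []
    for j, c in enumerate(text):
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = (((eq & pv) + pv) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)   # horizontal delta +1
        mh = pv & xh                    # horizontal delta -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shifting in 0 (not 1) keeps the top row at zero, so a match
        # may start at any text position.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
        if score <= k:
            hits.append((j + 1, score))
    return hits
```

Each text character costs a constant number of word operations regardless of the pattern length (here Python integers stand in for machine words), which is what makes the bit-parallel form attractive for verification.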
Rabema benchmark results (normalized found interval) for 100 k simulated reads
| Mapper | Time [min:sec] | All [%] | | | | All-best [%] | | | | Any-best [%] | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bowtie2 a | 0:18 | 90.18 | 97.68 | 96.60 | 92.25 | 95.87 | 96.46 | 96.14 | 94.48 | 99.26 | 100.00 | 99.49 | 97.51 |
| | | | 78.95 | 52.71 | 21.13 | | 93.69 | 92.76 | 92.27 | | 96.63 | 96.27 | 95.38 |
| BWA b | 0:49 | 92.28 | 100.00 | 99.81 | 96.75 | 98.84 | 100.00 | 99.80 | 99.40 | 98.89 | 100.00 | 99.86 | 99.50 |
| | | | 79.47 | 44.91 | 16.65 | | 93.61 | 78.42 | 70.67 | | 93.70 | 78.61 | 71.17 |
| GEM c | 0:14 | 92.75 | 98.25 | 97.65 | 95.69 | 98.15 | 98.27 | 98.21 | 97.97 | 99.36 | 99.42 | 99.42 | 99.24 |
| | | | 88.44 | 67.06 | 33.44 | | 98.11 | 96.87 | 95.87 | | 99.42 | 98.74 | 97.33 |
| Bowtie 2 | — | 99.73 | 100.00 | 100.00 | 100.00 | 99.97 | 100.00 | 100.00 | 100.00 | 99.97 | 100.00 | 100.00 | 100.00 |
| | | | 99.96 | 99.53 | 95.45 | | 99.95 | 99.67 | 97.89 | | 99.95 | 99.67 | 97.89 |
| BWA | 40:32 | 97.80 | 100.00 | 99.97 | 99.62 | 98.95 | 100.00 | 99.97 | 99.62 | 98.95 | 100.00 | 99.97 | 99.62 |
| | | | 94.26 | 83.53 | 75.30 | | 93.82 | 79.03 | 70.93 | | 93.82 | 79.16 | 71.17 |
| GEM | 3:15 | 96.02 | 98.25 | 98.24 | 98.09 | 98.15 | 98.27 | 98.18 | 98.01 | 99.35 | 99.42 | 99.41 | 99.24 |
| | | | 95.92 | 87.02 | 66.35 | | 98.17 | 97.02 | 95.94 | | 99.44 | 98.68 | 97.17 |
| Masai | 17:43 | 99.86 | 100.00 | 100.00 | 100.00 | 99.96 | 100.00 | 100.00 | 100.00 | 99.97 | 100.00 | 100.00 | 100.00 |
| | | | 99.87 | 99.54 | 97.87 | | 99.85 | 99.29 | 98.59 | | 99.85 | 98.34 | 98.70 |
| Hobbes 2 | 7:51 | 99.82 | 99.98 | 99.97 | 99.97 | 99.97 | 99.98 | 99.97 | 99.97 | 99.99 | 100.00 | 99.99 | 99.98 |
| | | | 99.98 | 99.87 | 97.20 | | 99.97 | 99.98 | 99.80 | | 99.97 | 100.00 | 99.92 |
| mrFAST | 12:32 | 99.32 | 100.00 | 100.00 | 100.00 | 99.42 | 100.00 | 100.00 | 100.00 | 99.43 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 99.96 | 87.51 | | 100.00 | 100.00 | 53.69 | | 100.00 | 100.00 | 54.09 |
| RazerS 3 d | 41:33 | 99.92 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 99.84 | 98.62 | | 100.00 | 99.95 | 99.92 | | 100.00 | 99.95 | 99.92 |
| RazerS 3 | 54:59 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 |
| Yara | 3:06 | — | — | — | — | — | — | — | — | — | — | — | — |
| | | | — | — | — | | — | — | — | | — | — | — |
| BitMapper | 2:57 | 99.99 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 99.99 | 99.98 | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 |
Bowtie2 a, BWA b, and GEM c represent the results in default sensitivity mode, while Bowtie2, BWA, and GEM represent the results in high sensitivity mode. RazerS 3 d and RazerS 3 represent the results of RazerS 3 in default and full sensitivity mode, respectively. Note that in default mode, RazerS 3 is designed to find 99% of mapping locations, while Bowtie2, BWA, and GEM are designed to find the best mapping locations for each read
Rabema benchmark results (normalized found interval) for 1 million 100 bp real human reads
| Mapper | Time [min:sec] | All [%] | | | | All-best [%] | | | | Any-best [%] | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Masai | 42:50 | 99.94 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 99.97 | 98.93 | | 100.00 | 99.99 | 99.70 | | 100.00 | 99.99 | 99.80 |
| Hobbes 2 | 60:05 | 99.89 | 99.99 | 99.97 | 99.97 | 99.98 | 99.98 | 99.98 | 99.99 | 99.99 | 99.99 | 99.99 | 100.00 |
| | | | 99.96 | 99.91 | 98.43 | | 99.99 | 99.99 | 99.91 | | 100.00 | 100.00 | 99.99 |
| mrFAST | 98:06 | 99.79 | 100.00 | 100.00 | 100.00 | 99.91 | 100.00 | 100.00 | 100.00 | 99.92 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 99.97 | 96.45 | | 100.00 | 99.96 | 93.61 | | 100.00 | 99.97 | 93.88 |
| RazerS 3 a | 372:21 | 99.90 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 99.80 | 98.45 | | 100.00 | 99.90 | 99.47 | | 100.00 | 99.91 | 99.70 |
| RazerS 3 | 512:46 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 |
| Yara | 29:25 | — | — | — | — | — | — | — | — | — | — | — | — |
| | | | — | — | — | | — | — | — | | — | — | — |
| BitMapper | 17:03 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 99.99 | 99.98 | | 100.00 | 100.00 | 99.98 | | 100.00 | 100.00 | 99.99 |
RazerS 3 a: the result of RazerS 3 in default sensitivity mode (i.e., finding 99% of mapping locations); RazerS 3: the result of RazerS 3 in full sensitivity mode (i.e., finding 100% of mapping locations)
Rabema benchmark results (normalized found interval) for 1 million 100 bp real reads of Caenorhabditis elegans
| Mapper | Time [min:sec] | All [%] | | | | All-best [%] | | | | Any-best [%] | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Masai | 3:02 | 99.93 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 99.92 | 99.67 | 98.14 | | 99.98 | 99.93 | 99.78 | | 99.99 | 99.95 | 99.89 |
| Hobbes 2 | 2:01 | 99.94 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 99.99 | 99.86 | 98.18 | | 99.99 | 99.96 | 99.72 | | 99.99 | 99.99 | 99.97 |
| mrFAST | 3:40 | 98.89 | 100.00 | 100.00 | 100.00 | 99.95 | 100.00 | 100.00 | 100.00 | 99.96 | 100.00 | 100.00 | 100.00 |
| | | | 99.99 | 99.99 | 96.50 | | 99.99 | 100.00 | 93.43 | | 99.99 | 100.00 | 93.89 |
| RazerS 3 a | 7:18 | 99.95 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 99.99 | 99.79 | 98.64 | | 99.99 | 99.89 | 99.61 | | 99.99 | 99.95 | 99.84 |
| RazerS 3 | 7:53 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 |
| Yara | 1:25 | — | — | — | — | — | — | — | — | — | — | — | — |
| | | | — | — | — | | — | — | — | | — | — | — |
| BitMapper | 0:32 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 99.99 | 99.98 | | 100.00 | 99.99 | 99.98 | | 100.00 | 99.99 | 99.98 |
RazerS 3 a: the result of RazerS 3 in default sensitivity mode (i.e., finding 99% of mapping locations); RazerS 3: the result of RazerS 3 in full sensitivity mode (i.e., finding 100% of mapping locations)
Rabema benchmark results (normalized found interval) for 1 million 101 bp real reads of Arabidopsis thaliana
| Mapper | Time [min:sec] | All [%] | | | | All-best [%] | | | | Any-best [%] | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Masai | 3:05 | 99.96 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 99.97 | 99.96 | 99.25 | | 99.98 | 99.96 | 99.27 | | 99.98 | 99.96 | 99.53 |
| Hobbes 2 | 1:52 | 99.92 | 100.00 | 100.00 | 99.99 | 99.99 | 100.00 | 100.00 | 99.99 | 99.99 | 100.00 | 100.00 | 99.99 |
| | | | 99.97 | 99.88 | 98.72 | | 99.98 | 99.99 | 99.64 | | 99.99 | 100.00 | 99.98 |
| mrFAST | 2:30 | 99.88 | 100.00 | 100.00 | 100.00 | 99.98 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 99.99 | 99.99 | 97.94 | | 99.99 | 100.00 | 97.70 | | 100.00 | 100.00 | 98.50 |
| RazerS 3 a | 8:30 | 99.88 | 100.00 | 100.00 | 100.00 | 99.98 | 100.00 | 100.00 | 100.00 | 99.99 | 100.00 | 100.00 | 100.00 |
| | | | 99.99 | 99.70 | 98.12 | | 99.98 | 99.70 | 98.63 | | 99.99 | 99.84 | 99.27 |
| RazerS 3 | 9:06 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 | | 100.00 | 100.00 | 100.00 |
| Yara | 1:25 | — | — | — | — | — | — | — | — | — | — | — | — |
| | | | — | — | — | | — | — | — | | — | — | — |
| BitMapper | 0:32 | 99.99 | 100.00 | 100.00 | 99.99 | 99.99 | 100.00 | 100.00 | 99.99 | 99.99 | 100.00 | 100.00 | 99.99 |
| | | | 99.98 | 100.00 | 99.99 | | 99.99 | 100.00 | 99.99 | | 99.99 | 100.00 | 99.99 |
RazerS 3 a: the result of RazerS 3 in default sensitivity mode (i.e., finding 99% of mapping locations); RazerS 3: the result of RazerS 3 in full sensitivity mode (i.e., finding 100% of mapping locations)
Results for mapping 10 million 100 bp and 151 bp single-end reads against the human genome
| Mapper | 100 bp: time, 1 thr [min:sec] | 100 bp: time, 8 thr [min:sec] | 100 bp: peak memory | 100 bp: mapping sites [million] | 100 bp: mapped reads [%] | 151 bp: time, 1 thr [min:sec] | 151 bp: time, 8 thr [min:sec] | 151 bp: peak memory | 151 bp: mapping sites [million] | 151 bp: mapped reads [%] |
|---|---|---|---|---|---|---|---|---|---|---|
| Masai | 361:35 | — | 20.1GB | 1371.18 | 92.2736 | 602:06 | — | 21.3GB | 939.89 | 93.8483 |
| Hobbes 2 | 587:04 | 135:10 | 14.0GB | 1368.86 | 92.2767 | 694:53 | 151:52 | 14.5GB | 936.42 | 93.8481 |
| mrFAST | 921:46 | — | 4.9GB | 1374.76 | 92.2572 | 795:59 | — | 6.5GB | 939.48 | 93.7376 |
| RazerS 3 a | — | — | >24GB | — | — | — | — | >24GB | — | — |
| RazerS 3 | — | — | >24GB | — | — | — | — | >24GB | — | — |
| Yara | 278:09 | 78:10 | 9.0GB | 1367.42 | 92.2658 | 389:56 | 93:15 | 9.3GB | 939.44 | 93.8480 |
| BitMapper | 158:57 | 32:59 | 17.9GB | 1375.68 | 92.2771 | 135:06 | 27:56 | 19.2GB | 940.16 | 93.8487 |
RazerS 3 a: the result of RazerS 3 in default sensitivity mode (i.e., finding 99% of mapping locations); RazerS 3: the result of RazerS 3 in full sensitivity mode (i.e., finding 100% of mapping locations)
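To compare the running times in the table above, the min:sec values can be converted to seconds and expressed relative to one mapper. The sketch below does this for the single-thread 100 bp column. The times are copied from the table; the helper names (`to_seconds`, `speedups`) and the choice of BitMapper as the baseline are assumptions made for illustration.

```python
def to_seconds(min_sec):
    """Convert a 'min:sec' string such as '158:57' to seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + int(seconds)


# Single-thread times for 100 bp reads, copied from the table above.
# RazerS 3 is omitted because the table reports no time for it.
times_100bp = {
    "Masai": "361:35",
    "Hobbes 2": "587:04",
    "mrFAST": "921:46",
    "Yara": "278:09",
    "BitMapper": "158:57",
}


def speedups(times, baseline="BitMapper"):
    """Running time of each mapper divided by the baseline's running time."""
    base = to_seconds(times[baseline])
    return {name: to_seconds(t) / base for name, t in times.items()}


for name, ratio in sorted(speedups(times_100bp).items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {ratio:.2f}x")
```

A ratio above 1 means the mapper took that many times longer than the baseline on this data set; the ratios say nothing about sensitivity, which the mapped-reads column reports separately.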
Results for mapping 10 million 100 bp single-end reads against Caenorhabditis elegans and Arabidopsis thaliana
| Mapper | C. elegans: time, 1 thr [min:sec] | C. elegans: time, 8 thr [min:sec] | C. elegans: peak memory | C. elegans: mapping sites [million] | C. elegans: mapped reads [%] | A. thaliana: time, 1 thr [min:sec] | A. thaliana: time, 8 thr [min:sec] | A. thaliana: peak memory | A. thaliana: mapping sites [million] | A. thaliana: mapped reads [%] |
|---|---|---|---|---|---|---|---|---|---|---|
| Masai | 22:28 | — | 3.2GB | 54.61 | 90.4140 | 21:06 | — | 3.3GB | 57.83 | 98.2578 |
| Hobbes 2 | 16:51 | 4:42 | 0.9GB | 55.40 | 90.4150 | 16:05 | 3:42 | 1.0GB | 57.76 | 98.2616 |
| mrFAST | 35:15 | — | 4.2GB | 55.60 | 90.4119 | 23:12 | — | 4.3GB | 57.94 | 98.2609 |
| RazerS 3 a | 69:24 | 59:39 | 12.0GB | 55.24 | 90.4118 | 86:39 | 72:31 | 10.3GB | 57.49 | 98.2551 |
| RazerS 3 | 75:27 | 61:40 | 12.6GB | 55.61 | 90.4154 | 89:31 | 75:26 | 10.4GB | 57.96 | 98.2622 |
| Yara | 13:42 | 3:37 | 1.1GB | 54.65 | 90.4150 | 15:13 | 4:01 | 1.2GB | 57.87 | 98.2608 |
| BitMapper | 5:08 | 1:25 | 4.5GB | 55.63 | 90.4159 | 5:24 | 1:30 | 4.5GB | 57.94 | 98.2631 |
RazerS 3 a: the result of RazerS 3 in default sensitivity mode (i.e., finding 99% of mapping locations); RazerS 3: the result of RazerS 3 in full sensitivity mode (i.e., finding 100% of mapping locations)
Results for mapping 10 million 300 bp single-end reads against Caenorhabditis elegans and Arabidopsis thaliana
| Mapper | C. elegans: time, 1 thr [min:sec] | C. elegans: time, 8 thr [min:sec] | C. elegans: peak memory | C. elegans: mapping sites [million] | C. elegans: mapped reads [%] | A. thaliana: time, 1 thr [min:sec] | A. thaliana: time, 8 thr [min:sec] | A. thaliana: peak memory | A. thaliana: mapping sites [million] | A. thaliana: mapped reads [%] |
|---|---|---|---|---|---|---|---|---|---|---|
| Masai | 48:54 | — | 11.5GB | 17.44 | 99.9894 | 46:26 | — | 11.8GB | 14.83 | 99.9884 |
| Hobbes 2 | 66:38 | 13:12 | 0.9GB | 2.14 | 0.5327 | 64:25 | 12:50 | 1.0GB | 0.01 | 0.0219 |
| mrFAST | 80:56 | — | 9.9GB | 16.71 | 96.1888 | 47:00 | — | 10.0GB | 14.27 | 96.2356 |
| RazerS 3 a | 195:07 | 182:34 | 11.9GB | 17.43 | 99.9894 | 172:21 | 155:48 | 12.0GB | 14.82 | 99.9884 |
| RazerS 3 | 209:30 | 185:29 | 12.6GB | 17.44 | 99.9894 | 185:05 | 160:02 | 12.0GB | 14.83 | 99.9884 |
| Yara | 34:20 | 7:43 | 2.1GB | 17.33 | 99.9894 | 29:44 | 6:34 | 2.1GB | 14.72 | 99.9884 |
| BitMapper | 12:26 | 4:57 | 10.1GB | 17.43 | 99.9894 | 12:10 | 4:55 | 10.2GB | 14.83 | 99.9884 |
RazerS 3 a: the result of RazerS 3 in default sensitivity mode (i.e., finding 99% of mapping locations); RazerS 3: the result of RazerS 3 in full sensitivity mode (i.e., finding 100% of mapping locations)
Results for mapping 10 million paired-end reads
| Mapper | Human: time, 1 thr [min:sec] | Human: time, 8 thr [min:sec] | Human: peak memory | Human: mapped pairs [%] | C. elegans: time, 1 thr [min:sec] | C. elegans: time, 8 thr [min:sec] | C. elegans: peak memory | C. elegans: mapped pairs [%] | A. thaliana: time, 1 thr [min:sec] | A. thaliana: time, 8 thr [min:sec] | A. thaliana: peak memory | A. thaliana: mapped pairs [%] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Masai | 464:07 | — | 16.8GB | 84.8984 | 31:07 | — | 11.3GB | 65.8674 | 29:40 | — | 11.6GB | 64.9149 |
| Hobbes 2 | 439:05 | 105:29 | 14.6GB | 87.3945 | 80:04 | 22:41 | 0.9GB | 67.1739 | 23:59 | 6:21 | 1.0GB | 68.1224 |
| RazerS 3 a | — | — | >24GB | — | 61:15 | 47:11 | 16.4GB | 67.1841 | 51:25 | 41:31 | 14.9GB | 68.1250 |
| RazerS 3 | — | — | >24GB | — | 66:28 | 50:13 | 17.4GB | 67.1894 | 55:33 | 42:38 | 17.1GB | 68.1473 |
| Yara | 489:58 | 117:40 | 13.2GB | 87.1614 | 23:43 | 5:47 | 2.0GB | 67.1058 | 28:09 | 6:52 | 2.2GB | 66.8150 |
| BitMapper | 177:47 | 39:39 | 21.5GB | 87.4233 | 11:16 | 3:15 | 8.0GB | 67.1883 | 6:47 | 2:20 | 8.1GB | 68.1500 |
RazerS 3 a: the result of RazerS 3 in default sensitivity mode (i.e., finding 99% of mapping locations); RazerS 3: the result of RazerS 3 in full sensitivity mode (i.e., finding 100% of mapping locations)