Jongik Kim, Chen Li, Xiaohui Xie.
Abstract
BACKGROUND: Next-generation sequencing (NGS) enables rapid production of billions of bases at a relatively low cost. Mapping reads from next-generation sequencers to a given reference genome is an important first step in many sequencing applications. Popular read mappers, such as Bowtie and BWA, are optimized to return the top one or a few candidate locations for each read. However, identifying all mapping locations of each read, rather than just one or a few, is also important in some sequencing applications, such as ChIP-seq for discovering binding sites in repeat regions and RNA-seq for transcript abundance estimation.
Year: 2014 PMID: 24499321 PMCID: PMC3927682 DOI: 10.1186/1471-2105-15-42
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Excerpt of a reference sequence and a portion of its 5-gram inverted index. The inverted lists of the 5-grams ACGGT, CGGTC, and ACCCT are shown, each containing a sorted list of locations in the reference sequence where the respective 5-gram appears.
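The structure described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes 0-based positions and an in-memory dictionary. The function name `build_qgram_index` and the toy sequence are hypothetical and are not taken from the paper or from the Hobbes2 code.

```python
from collections import defaultdict


def build_qgram_index(reference: str, q: int) -> dict[str, list[int]]:
    """Map each q-gram to the sorted list of 0-based start positions."""
    index = defaultdict(list)
    # Scanning left to right appends positions in increasing order,
    # so every inverted list is already sorted.
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return dict(index)


# Toy sequence (an assumption, not the sequence drawn in Figure 1).
reference = "ACGGTCACGGTACCCT"
index = build_qgram_index(reference, q=5)

for gram in ("ACGGT", "CGGTC", "ACCCT"):
    print(gram, index[gram])
```

A production index for a whole genome would use a far more compact layout than a Python dictionary; the sketch only shows what an inverted list contains.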
Figure 2. Filtering effect of an additional prefix q-gram. Gray-shaded areas indicate candidates. (a) An additional prefix q-gram g3 plays an important role in filtering out a number of false positives in E and F. (b) If we use k+1 = 2 q-grams, g1 and g2, many more candidates are generated.
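The effect shown in Figure 2 follows from the pigeonhole principle: a read with at most k mismatches must match at least one of k+1 non-overlapping q-grams exactly, and at least two of k+2. The sketch below compares the two schemes for k = 1. It is a simplified illustration that assumes substitutions only (no indels) and uses hypothetical helper names and toy data; it is not the filtering algorithm as implemented in Hobbes2.

```python
from collections import Counter, defaultdict


def build_index(reference, q):
    """q-gram -> list of 0-based start positions in the reference."""
    index = defaultdict(list)
    for pos in range(len(reference) - q + 1):
        index[reference[pos:pos + q]].append(pos)
    return index


def candidates(read, index, q, num_grams, min_hits):
    """Read start positions supported by at least `min_hits` of the first
    `num_grams` non-overlapping q-grams (substitution-only assumption)."""
    votes = Counter()
    for i in range(num_grams):
        offset = i * q
        gram = read[offset:offset + q]
        for pos in index.get(gram, []):
            start = pos - offset          # implied start of the read
            if start >= 0:
                votes[start] += 1
    return sorted(s for s, v in votes.items() if v >= min_hits)


# Toy data (assumed): the read occurs exactly at position 0, while
# positions 13 and 26 share only a single q-gram with it.
ref = "ACGTTTGCA" + "CCCC" + "ACGCCCCCC" + "GGGG" + "AAATTTAAA"
read = "ACGTTTGCA"
q, k = 3, 1
idx = build_index(ref, q)

loose = candidates(read, idx, q, num_grams=k + 1, min_hits=1)
strict = candidates(read, idx, q, num_grams=k + 2, min_hits=2)
print(loose)   # [0, 13, 26]
print(strict)  # [0]
```

Requiring two matching q-grams discards the two false positives before any alignment is computed, at the cost of looking up one extra inverted list.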
Figure 3. Problems caused by indels. (a) Indels occurring between two matched q-grams. (b) Deletions occurring before any matched q-grams. (c) Insertions occurring before any matched q-grams. (d) Verification windows of a semi-global alignment algorithm.
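Because indels can shift the true start and end of a mapping, candidates are typically verified against a window slightly larger than the read, as panel (d) suggests. A minimal sketch of such a verification is the classic semi-global dynamic program, in which the read must be aligned in full but may start and end anywhere in the window. The function name is hypothetical, and this is a textbook formulation rather than the verification routine used by any of the mappers in this record.

```python
def semi_global_edit_distance(read: str, window: str) -> int:
    """Smallest edit distance between `read` and any substring of `window`."""
    # Row 0 is all zeros: the alignment may start at any window position.
    prev = [0] * (len(window) + 1)
    for i, r in enumerate(read, start=1):
        # Column 0: the first i read characters aligned to nothing.
        curr = [i] + [0] * len(window)
        for j, w in enumerate(window, start=1):
            curr[j] = min(
                prev[j - 1] + (r != w),  # match or substitution
                prev[j] + 1,             # read character unmatched
                curr[j - 1] + 1,         # window character unmatched
            )
        prev = curr
    # The alignment may end at any window position.
    return min(prev)


print(semi_global_edit_distance("ACGT", "TTACGTTT"))  # 0
print(semi_global_edit_distance("ACGT", "TTACTTT"))   # 1
```

For an edit distance threshold k, one plausible choice is a window that extends k bases beyond each end of the candidate location; the candidate is accepted when the returned distance is at most k.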
Rabema benchmark results of mapping simulated 100k reads of length 100bp against HG18
| | | |||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hobbes2 | 9:43 | 1:33 | | 99.85 | 100.0 | 100.0 | 100.0 | 99.99 | 100.0 | 100.0 | 100.0 | 99.99 | 100.0 | 100.0 | 100.0 | 98.97 | 100.0 | 99.90 | 99.68 | 14.6 GB |
| | | | | | 99.99 | 99.94 | 97.48 | | 100.0 | 100.0 | 99.84 | | 100.0 | 100.0 | 99.84 | | 99.34 | 99.04 | 99.77 | |
| Hobbes | 19:36 | 3:35 | | 98.34 | 99.29 | 99.28 | 98.93 | 98.67 | 98.86 | 99.02 | 99.00 | 98.99 | 99.19 | 99.31 | 99.34 | 96.91 | 98.66 | 97.99 | 96.68 | 20.7 GB |
| | | | | | 97.40 | 93.78 | 87.84 | | 98.19 | 92.85 | 89.21 | | 98.55 | 93.25 | 90.14 | | 95.33 | 91.68 | 90.17 | |
| Masai | 18:11 | − | | 99.83 | 100.0 | 100.0 | 100.0 | 99.94 | 100.0 | 100.0 | 100.0 | 99.94 | 100.0 | 100.0 | 100.0 | 99.03 | 100.0 | 100.0 | 100.0 | 16.9 GB |
| | | | | | 99.73 | 99.18 | 97.69 | | 99.69 | 98.73 | 98.52 | | 99.69 | 98.73 | 98.52 | | 99.71 | 98.77 | 98.56 | |
| RazerS3 | 60:06 | 42:07 | | 99.90 | 100.0 | 100.0 | 100.0 | 99.99 | 100.0 | 100.0 | 100.0 | 99.99 | 100.0 | 100.0 | 100.0 | 99.09 | 100.0 | 100.0 | 100.0 | 4.5 GB |
| | | | | | 100.0 | 99.86 | 98.44 | | 100.0 | 100.0 | 99.92 | | 100.0 | 100.0 | 99.92 | | 100.0 | 100.0 | 99.92 | |
| Bowtie2 | − | 266:21 | | 99.74 | 100.0 | 100.0 | 100.0 | 99.97 | 100.0 | 100.0 | 100.0 | 99.97 | 100.0 | 100.0 | 100.0 | 98.80 | 100.0 | 99.70 | 99.40 | 37.7 GB |
| | | | | | 100.0 | 99.55 | 95.75 | | 100.0 | 99.70 | 98.35 | | 100.0 | 99.72 | 98.45 | | 99.10 | 98.70 | 98.50 | |
| BWA | 75:04 | 12:20 | | 97.73 | 100.0 | 99.98 | 99.64 | 98.89 | 100.0 | 99.98 | 99.61 | 98.90 | 100.0 | 99.98 | 99.61 | 97.91 | 100.0 | 99.98 | 99.45 | 4.8 GB |
| | | | | | 93.47 | 82.91 | 75.15 | | 93.03 | 78.87 | 70.57 | | 93.03 | 78.98 | 70.73 | | 92.47 | 78.55 | 71.18 | |
| GEM | 5:19 | 2:56 | | 97.74 | 100.0 | 99.99 | 99.84 | 99.86 | 100.0 | 99.88 | 99.81 | 99.92 | 100.0 | 99.96 | 99.93 | 98.66 | 100.0 | 99.42 | 99.12 | 4.3 GB |
| | | | | | 97.36 | 88.78 | 68.31 | | 99.47 | 99.28 | 97.34 | | 99.69 | 99.61 | 97.67 | | 98.17 | 98.29 | 98.64 | |
| Bowtie2* | 0:31 | 0:32 | | 91.34 | 98.87 | 97.75 | 93.55 | 97.08 | 97.65 | 97.33 | 95.69 | 99.29 | 100.0 | 99.45 | 97.65 | 95.96 | 97.75 | 96.88 | 95.00 | 3.2 GB |
| | | | | | 81.07 | 53.90 | 21.95 | | 95.38 | 93.98 | 93.74 | | 97.41 | 96.24 | 95.89 | | 94.60 | 93.33 | 93.95 | |
| BWA* | 2:08 | 0:25 | | 92.27 | 100.0 | 99.82 | 96.90 | 98.79 | 100.0 | 99.83 | 99.41 | 98.83 | 100.0 | 99.89 | 99.49 | 97.31 | 100.0 | 99.17 | 97.76 | 4.5 GB |
| | | | | | 79.11 | 45.49 | 16.99 | | 92.57 | 78.26 | 70.34 | | 92.70 | 78.60 | 70.73 | | 90.39 | 77.11 | 70.35 | |
| GEM* | 0:31 | 0:13 | | 94.48 | 100.0 | 99.38 | 97.61 | 99.86 | 100.0 | 99.88 | 99.81 | 99.92 | 100.0 | 99.95 | 99.92 | 98.62 | 100.0 | 99.28 | 99.06 | 4.3 GB |
| | | | | | 90.10 | 69.11 | 35.34 | | 99.41 | 99.17 | 97.37 | | 99.72 | 99.61 | 97.75 | | 98.24 | 98.35 | 98.94 | |
all: all mappings within the given edit distance threshold; all-best: all best mappings (i.e., all mappings with lowest edit distances); any-best: any best mappings (i.e., any mapping with lowest edit distances).
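The three categories in the legend can be made concrete with a short sketch. It assumes each mapping is a `(location, edit_distance)` pair and reads "any-best" as "reporting one of the best mappings suffices"; the function name and the sample values are hypothetical.

```python
def classify_mappings(mappings, k):
    """Split one read's mappings into the all / all-best / any-best sets."""
    within = [(loc, d) for loc, d in mappings if d <= k]
    if not within:
        return {"all": [], "all_best": [], "any_best": []}
    best = min(d for _, d in within)
    all_best = [(loc, d) for loc, d in within if d == best]
    return {
        "all": within,              # every mapping within the threshold
        "all_best": all_best,       # every mapping at the lowest distance
        "any_best": all_best[:1],   # one representative best mapping
    }


# Hypothetical mappings for a single read, with threshold k = 3.
result = classify_mappings([(100, 2), (250, 0), (900, 0), (1200, 5)], k=3)
print(result["all"])
print(result["all_best"])
print(result["any_best"])
```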
Results of mapping 500k and 1 million single end reads of length 100bp against HG18
| | 500k reads | | | | | | 1 million reads | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hobbes2 | 91.476% | 66.34 | 44:54 | 05:17 | 14.7 GB | | 91.558% | 132.87 | 87:27 | 09:04 | 14.7 GB |
| Hobbes | 91.449% | 66.93 | 84:38 | 13:10 | 21.5 GB | | 91.533% | 134.14 | 169:50 | 26:33 | 22.8 GB |
| Masai | 91.473% | 66.44 | 47:38 | − | 17.1 GB | | 91.555% | 133.09 | 82:46 | − | 17.3 GB |
| RazerS3 | 91.472% | 66.10 | 276:00 | 193:19 | 10.8 GB | | 91.554% | 132.45 | 540:35 | 378:18 | 18.8 GB |
Results of mapping 1 million single end reads of length 100bp against C. elegans and D. melanogaster
| | C. elegans | | | | | | D. melanogaster | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hobbes2 | 91.003% | 5.71 | 03:09 | 00:41 | 0.8 GB | | 95.470% | 438.33 | 79:35 | 28:46 | 1.2 GB |
| Hobbes | 90.994% | 5.84 | 04:26 | 01:06 | 1.4 GB | | 95.436% | 453.43 | 90:11 | 57:01 | 29.1 GB |
| Masai | 91.002% | 5.68 | 05:19 | − | 0.9 GB | | 95.466% | 446.98 | 131:11 | − | 1.3 GB |
| RazerS3 | 91.002% | 5.69 | 13:28 | 12:35 | 1.4 GB | | − | − | − | − | 96.5 GB |
Filtration of 500k reads of length 100bp on HG18
| | | |
|---|---|---|
| Hobbes2 | 04:14 | 1,161,828,591 |
| Hobbes | 01:45 | 3,833,554,010 |
| Masai | 09:48 | 1,190,600,997 |
| RazerS3 | 15:01 | 7,007,527,711 |
Results of mapping 1 million × 2 paired end reads of length 100bp against HG18
| | ||||
|---|---|---|---|---|
| Hobbes2 | 86.66% | 59:40 | 11:12 | 14.9 GB |
| Hobbes | 86.52% | 61:54 | 24:43 | 20.4 GB |
| Masai | 84.07% | 68:46 | − | 17.3 GB |
| RazerS3 | 86.68% | 420:07 | 342:14 | 17.5 GB |
| Bowtie2* | 82.12% | 8:40 | 0:52 | 3.6 GB |