Haowen Zhang, Yuandong Chan, Kaichao Fan, Bertil Schmidt, Weiguo Liu
Abstract
BACKGROUND: Various indexing techniques have been applied in next-generation sequencing (NGS) read mapping tools. The choice of a particular data structure is a trade-off between memory consumption, mapping throughput, and construction time.
Keywords: Hash index; Next-generation sequencing; Read mapping; Seed selection
Year: 2018 PMID: 29523083 PMCID: PMC5845352 DOI: 10.1186/s12859-018-2094-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Workflow of FEM
Fig. 2 Example of seeding for the construction of the succinct hash index with indexing step size l = 7
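The sampling idea behind Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming (from the Fig. 5 caption) that the succinct hash index stores a k-mer only at reference positions that are multiples of the indexing step size l. The function name `build_sampled_index`, the toy reference, and the k-mer length are hypothetical and are not FEM's actual code or parameters.

```python
from collections import defaultdict


def build_sampled_index(reference: str, k: int, step: int) -> dict[str, list[int]]:
    """Hypothetical sketch: index only the k-mers that start at multiples of `step`.

    Returns a mapping k-mer -> list of sampled start positions in `reference`.
    """
    index: dict[str, list[int]] = defaultdict(list)
    # Visit start positions 0, step, 2*step, ... that still leave room for a full k-mer.
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


ref = "ACGTACGTTGCAACGTACGTAA"  # toy reference, not real data
idx = build_sampled_index(ref, k=4, step=7)
print(idx)  # {'ACGT': [0], 'TTGC': [7], 'GTAC': [14]}
```

In this sketch only every `step`-th start position is stored, so the table holds roughly 1/`step` of the reference k-mers instead of all of them.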
Fig. 3 Consider a mapping position p ∈ P₂ of a read R in a reference genome sequence. We distinguish useful and useless seeds in R for searching the mapping position p, using l = 4
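A small sketch of the useful-seed test follows. It assumes that position groups are defined by residues, P_r = {p : p mod l = r}, and that a seed starting at read offset i is useful for mapping position p exactly when its reference start p + i is a sampled position (a multiple of l). Both the group convention and the helper `useful_seed_offsets` are assumptions made for illustration, not definitions taken from FEM.

```python
def useful_seed_offsets(p: int, read_len: int, k: int, step: int) -> list[int]:
    """Read offsets i whose k-mer starts at a sampled reference position p + i.

    Assumes (hypothetically) that the index samples positions that are multiples
    of `step` and that p belongs to group P_r with r = p % step.
    """
    r = p % step
    first = (-r) % step  # smallest i >= 0 with (p + i) % step == 0
    # Every step-th offset after `first` also lands on a sampled position.
    return list(range(first, read_len - k + 1, step))


# Mapping position p = 10 with l = 4 lies in P_2 under this convention; only
# offsets 2, 6, 10 and 14 of a 20-bp read (k = 4) can hit the sampled index.
print(useful_seed_offsets(10, read_len=20, k=4, step=4))  # [2, 6, 10, 14]
```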
Fig. 4 An illustration of retrieving all locations of a read with the group seeding algorithm, using l = 4
Fig. 5 The seed has length 9 and each of its sub-seeds has length 6. Since l = 4, four sub-seeds of the seed are generated. The occurrence position of the seed can belong to any of the four position groups, as shown in the four cases. In every case, the occurrence can be retrieved with one of its sub-seeds from the succinct hash index
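The retrieval argument in Fig. 5 can be checked with a short Python sketch. It assumes that the succinct hash index stores k-mers only at reference positions that are multiples of l, and that a seed has length k + l - 1, so it contains exactly l sub-seeds of length k (9 = 6 + 4 - 1 in Fig. 5). Whatever the seed's position group, one of its sub-seeds starts at a multiple of l; looking up all l sub-seeds and shifting each hit back by its offset therefore recovers every occurrence. The names `build_sampled_index` and `locate_seed` are hypothetical, and candidates are verified here by exact comparison only, not by any approximate-match verification a real mapper would perform.

```python
from collections import defaultdict


def build_sampled_index(reference: str, k: int, step: int) -> dict[str, list[int]]:
    """Hypothetical sampled index: k-mers starting at multiples of `step` only."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def locate_seed(seed: str, reference: str, index: dict[str, list[int]],
                k: int, step: int) -> list[int]:
    """Find all exact occurrences of a (k + step - 1)-long seed via its sub-seeds."""
    if len(seed) != k + step - 1:
        raise ValueError("seed length must be k + step - 1")
    hits: set[int] = set()
    for j in range(step):                      # sub-seed at offset j
        for q in index.get(seed[j:j + k], ()):  # sampled positions of that sub-seed
            start = q - j                      # implied start of the whole seed
            if start >= 0 and reference[start:start + len(seed)] == seed:
                hits.add(start)
    return sorted(hits)


ref = "ACGTACGTTGCAACGTACGTAA"  # toy reference, not real data
idx = build_sampled_index(ref, k=6, step=4)
print(locate_seed("ACGTACGTT", ref, idx, k=6, step=4))  # [0]
```

The sub-seed `ACGTAC` has two sampled hits (0 and 12), but only the candidate at 0 survives verification, which mirrors how a seed's candidate list is filtered after lookup.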
Index construction times (C-Time) and index sizes for hg19
| Mapper | C-Time, 1 thread (s) | C-Time, 32 threads (s) | Index size |
|---|---|---|---|
| FEM (…) | 202.7 | 52.9 | 5.3 GB |
| FEM (…) | 133.6 | 28.1 | 3.5 GB |
| Hobbes3 | 558.6 | 249.9 | 11 GB |
| BitMapper | 627.6 | - | 15 GB |
| BitMapper2 | 519.8 | - | 4.9 GB |
Rabema benchmarking results for mapping 100k simulated 100-bp reads to hg19
| Mapper | Mapped reads | All [%]: overall | All [%]: by error level | All-best [%]: overall | All-best [%]: by error level | Any-best [%]: overall | Any-best [%]: by error level |
|---|---|---|---|---|---|---|---|
| FEM-vl | 99997 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 |
| FEM-g | 99997 | 99.9705 | 100.0 / 100.0 / 100.0 / 99.99 / 99.25 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 |
| Hobbes3 | 99997 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 |
| BitMapper | 99997 | 99.9999 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 |
| BitMapper2 | 99997 | 99.9998 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.00 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 | 100.0 | 100.0 / 100.0 / 100.0 / 100.0 / 100.0 |
| Masai | 99995 | 99.9493 | 100.0 / 100.0 / 100.0 / 99.97 / 98.73 | 99.998 | 100.0 / 100.0 / 100.0 / 100.0 / 97.33 | 99.998 | 100.0 / 100.0 / 100.0 / 100.0 / 97.33 |
| GEM | 99994 | 98.6008 | 100.0 / 99.99 / 99.80 / 89.97 / 70.24 | 99.994 | 100.0 / 99.99 / 99.98 / 100.0 / 96.00 | 99.997 | 100.0 / 100.0 / 100.0 / 100.0 / 96.00 |
| BWA | 99990 | 92.2433 | 98.66 / 96.79 / 85.80 / 36.11 / 1.83 | 97.5195 | 97.46 / 97.70 / 97.35 / 98.03 / 96.15 | 99.985 | 100.0 / 99.97 / 99.93 / 99.74 / 97.33 |
Results for mapping 5 million real 150-bp reads to hg19 (edit distance threshold 4)
| Mapper | 1-thread time (s) | 8-thread time (s) | 16-thread time (s) | 32-thread time (s) | Mapped reads (#) |
|---|---|---|---|---|---|
| FEM-vl | 1370 | 195 | 101 | 78 | 4615727 |
| FEM-g | 1224 | 176 | 91 | 71 | 4615701 |
| Hobbes3 | 2965 | 405 | 213 | 171 | 4615730 |
| BitMapper | 1123 | 137 | 83 | 82 | 4615730 |
| BitMapper2 | 3405 | 467 | 250 | 232 | 4615829 |
| Masai | 6556 | - | - | - | 4615694 |
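The table reports raw wall-clock times; thread scaling is easier to compare as speedup S_n = T_1 / T_n and parallel efficiency E_n = S_n / n. The sketch below applies these two formulas to the 1-thread and 32-thread columns of the 150-bp table (values copied from it; the helper names are illustrative). The hardware, including the number of physical cores, is not stated in this section, so the efficiencies are best read as relative rather than absolute.

```python
# Wall-clock times (s) for 1 and 32 threads, copied from the 150-bp (ED 4) table.
# Masai is omitted because only a single-thread time is reported for it.
TIMES = {
    "FEM-vl": (1370, 78),
    "FEM-g": (1224, 71),
    "Hobbes3": (2965, 171),
    "BitMapper": (1123, 82),
    "BitMapper2": (3405, 232),
}


def speedup(t1: float, tn: float) -> float:
    """Parallel speedup S_n = T_1 / T_n."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency E_n = S_n / n."""
    return speedup(t1, tn) / n


for mapper, (t1, t32) in TIMES.items():
    print(f"{mapper:<11} speedup {speedup(t1, t32):5.1f}x  "
          f"efficiency {efficiency(t1, t32, 32):.0%}")
```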
Results for mapping 5 million real 250-bp reads to hg19 (edit distance threshold 7)
| Mapper | 1-thread time (s) | 8-thread time (s) | 16-thread time (s) | 32-thread time (s) | Mapped reads (#) |
|---|---|---|---|---|---|
| FEM-vl | 1085 | 150 | 78 | 56 | 4014477 |
| FEM-g | 893 | 126 | 64 | 48 | 4014476 |
| Hobbes3 | 8442 | 1160 | 594 | 529 | 4014477 |
| BitMapper | 1089 | 149 | 134 | 135 | 4014477 |
| BitMapper2 | 4957 | 680 | 353 | 310 | 4014474 |
| Masai | 11363 | - | - | - | 4014476 |
Results for mapping 20 million real 150-bp reads to hg19 (edit distance threshold 4)
| Mapper | FEM-g | FEM-vl | BitMapper2 | BitMapper | Hobbes3 |
|---|---|---|---|---|---|
| 32-thread time (s) | 467 | 520 | 770 | 722 | 1189 |
Fig. 6 Percentage of runtime spent in the different stages of FEM. (a, b) Mapping the 150-bp real read dataset with FEM-g (a) and FEM-vl (b). (c, d) Mapping the 250-bp real read dataset with FEM-g (c) and FEM-vl (d)
Fig. 7 Mapping time of FEM-g and FEM-vl for different indexing step sizes l
Fig. 8 Number of candidate locations for FEM and Hobbes2 under different error thresholds