Hongzhe Guo, Bo Liu, Dengfeng Guan, Yilei Fu, Yadong Wang.
Abstract
BACKGROUND: As experimental costs have fallen, sequencing projects have reported many genetic variants. Compared with the current typical paradigm, read mapping that incorporates existing variants can improve the performance of subsequent analysis. Such a method is expected to map sequencing reads efficiently to a graph-based index built from a reference genome and known variation, increasing alignment quality and variant-calling accuracy. However, storing and indexing the various types of variation requires a large amount of RAM.
Keywords: Landau-Vishkin algorithm; Seed-and-extension alignment; Variation-aware read alignment
Year: 2019 PMID: 31856811 PMCID: PMC6921400 DOI: 10.1186/s12911-019-0960-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Flowchart of the variation-aware alignment
Fig. 2 A schematic illustration of the variation tree construction
Fig. 3 A schematic illustration of the edit distance matrix. a The edit distance matrix of the Landau-Vishkin algorithm. b The three-dimensional edit distance matrix
Fig. 4 A schematic illustration of the processing performed by the VARA algorithm
Fig. 5 A schematic illustration of the type analysis used in CIGAR restoration
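To make the edit distance matrix in Fig. 3a more concrete, the following is a minimal sketch of the textbook Landau-Vishkin idea: for each error count, track the furthest pattern row reached on every diagonal, then slide along matches for free. It is not the authors' VARA implementation and does not model the three-dimensional, variation-aware matrix of Fig. 3b. The function name `landau_vishkin`, its parameters, and the choice to align the pattern against a prefix of the text are assumptions made for illustration.

```python
from typing import Optional


def landau_vishkin(pattern: str, text: str, k: int) -> Optional[int]:
    """Return the smallest edit distance (<= k) between `pattern` and
    some prefix of `text`, or None if more than k edits are needed.

    Hypothetical helper: a plain two-dimensional Landau-Vishkin sketch.
    """
    m, n = len(pattern), len(text)

    def slide(row: int, diag: int) -> int:
        # Follow matching characters along one diagonal at no cost.
        col = row + diag
        while row < m and col < n and pattern[row] == text[col]:
            row += 1
            col += 1
        return row

    # prev[diag] = furthest pattern row reached on `diag` with e - 1 errors,
    # where diag = text position - pattern position.
    prev = {0: slide(0, 0)}
    if prev[0] == m:
        return 0

    for errors in range(1, k + 1):
        cur = {}
        for diag in range(-errors, errors + 1):
            best = -1
            r = prev.get(diag)
            if r is not None and r + diag < n:
                best = max(best, r + 1)      # mismatch: consume both characters
            r = prev.get(diag - 1)
            if r is not None and r + diag - 1 < n:
                best = max(best, r)          # extra text character
            r = prev.get(diag + 1)
            if r is not None:
                best = max(best, r + 1)      # extra pattern character
            if best < 0:
                continue
            best = slide(best, diag)
            if best == m:                    # whole pattern consumed
                return errors
            cur[diag] = best
        prev = cur
    return None
```

Because only the furthest row per diagonal is stored for each error level, the work is bounded by roughly k diagonals per level plus the sliding steps, rather than by the full pattern-by-text matrix.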
Statistics on simulated human datasets
| Dataset | Aligner | Accuracy % a | Unmapped # b | Soft # c | Time-t1 (s) | Time-t8 (s) |
|---|---|---|---|---|---|---|
| Sim-i100 | deBGA-VARA | 99.9 | 517 | 1221 | 114 | 41 |
| Sim-i100 | deBGA | 99.9 | 526 | 4418 | 84 | 39 |
| Sim-i100 | BWA-MEM | 99.9 | 0 | 40368 | 435 | 114 |
| Sim-i100 | BWBBLE n = 2 | 86.6 | 972083 | 0 | 958 | 295 |
| Sim-i100 | BWBBLE n = 6 | 95.2 | 92406 | 0 | 7378 | 1074 |
| Sim-i100 | vg | 99.9 | 276 | 23878 | 10737 | 1986 |
| Sim-i250 | deBGA-VARA | 99.9 | 32 | 291 | 212 | 70 |
| Sim-i250 | deBGA | 99.9 | 38 | 1421 | 184 | 61 |
| Sim-i250 | BWA-MEM | 99.9 | 0 | 37830 | 924 | 182 |
| Sim-i250 | BWBBLE n = 6 | 80.0 | 398745 | 0 | 16483 | 2387 |
| Sim-i250 | BWBBLE n = 10 | 84.4 | 310725 | 0 | 53126 | 8897 |
| Sim-i250 | vg | 99.9 | 12 | 25451 | 19164 | 3761 |
a The mapping accuracy rate.
b Number of unmapped reads.
c Number of soft-clipped reads.
Fig. 6 A schematic illustration of read alignments on the HTS dataset and the Sim-i100 dataset. a Results on the simulated dataset. b Results on the HTS dataset. c Results on the MHC region
Statistics on HTS human datasets
| Dataset | Aligner | Mapped % a | Unmapped # b | Soft # c | Time-t1 (s) | Time-t8 (s) |
|---|---|---|---|---|---|---|
| ERR174324 | deBGA-VARA | 98.1 | 37354 | 72301 | 565 | 79 |
| ERR174324 | deBGA | 98.1 | 37479 | 87653 | 230 | 70 |
| ERR174324 | BWA-MEM | 99.1 | 13725 | 116316 | 590 | 78 |
| ERR174324 | BWBBLE n = 2 | 84.5 | 309005 | 0 | 892 | 219 |
| ERR174324 | BWBBLE n = 6 | 88.8 | 223736 | 0 | 6192 | 1125 |
| ERR174324 | vg | 98.1 | 28286 | 91585 | 13347 | 2092 |
a The mapping sensitivity rate.
b Number of unmapped reads.
c Number of soft-clipped reads.
Statistics on MHC region of simulation and HTS datasets
| Aligner | Sim-i100 # a | Soft Sim-i100 # b | Sim-i250 # | Soft Sim-i250 # | HTS # | Soft HTS # |
|---|---|---|---|---|---|---|
| deBGA-VARA | 3855 | 0 | 3716 | 0 | 3114 | 0 |
| deBGA | 3750 | 40 | 3613 | 26 | 3025 | 96 |
| BWA-MEM | 3735 | 113 | 3618 | 93 | 3058 | 179 |
| BWBBLE n = 2 | 3070 | 0 | 2742 | 0 | 2785 | 0 |
| BWBBLE n = 6 | 3616 | 0 | 3157 | 0 | 3005 | 0 |
| vg | 3765 | 53 | 3620 | 61 | 3022 | 119 |
a Number of correct alignments in the MHC region on the Sim-i100 dataset.
b Number of soft-clipped alignments in the MHC region on the Sim-i100 dataset.