Hojun Lee1, Ki-Wook Lee1,2, Taeseob Lee1, Donghyun Park1, Jongsuk Chung1,3, Chung Lee1,4, Woong-Yang Park1,3,4, Dae-Soon Son1.
Abstract
Alongside the rapid advancement of Next-Generation Sequencing (NGS) technology, clinical panel sequencing is increasingly used in clinical studies and tests. However, the performance of the tools used in NGS data analysis has not been comparatively evaluated for panel sequencing. This study aimed to evaluate tools for the alignment process, the first procedure in bioinformatics analysis, by comparing widely used tools with recently introduced ones. Variants detected in accumulated panel sequencing data were cataloged and inserted into simulated reads produced from the reference genome (hg19). The numbers of unmapped and misaligned reads, the mapping quality distribution, and the runtime were measured as criteria for comparison. The most widely used tools, Bowtie 2 and BWA-MEM, performed well, with AUCs of 0.9984 and 0.9970, respectively. Kart, which combined a superior runtime with fewer misaligned reads, also achieved a similarly high AUC (0.9723). This approach to selecting and optimizing tools for panel sequencing can be applied in fields that require error minimization, such as clinical applications and liquid biopsy studies.
Keywords: Alignment tool; CancerSCAN; Clinical panel sequencing; Next-Generation Sequencing; Read mapping
Year: 2017 PMID: 29568413 PMCID: PMC5846869 DOI: 10.1007/s13258-017-0621-9
Source DB: PubMed Journal: Genes Genomics ISSN: 1976-9571 Impact factor: 1.839
Number of SNVs and InDels inserted in simulated FASTQ sets
| Variant type | Set 1 | Set 2 | Set 3 | Set 4 | Set 5 | Set 6 | Set 7 | Set 8 |
|---|---|---|---|---|---|---|---|---|
| SNV | 0 | 20 | 40 | 80 | 160 | 230 | 300 | 4111 |
| Insertion | 0 | 1 | 5 | 10 | 20 | 35 | 48 | 48 |
| Deletion | 0 | 2 | 5 | 10 | 20 | 35 | 53 | 53 |
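The abstract states that cataloged variants were inserted into simulated reads, but the record does not describe the procedure. The following is a minimal sketch of one way to apply SNVs, insertions, and deletions to a reference sequence before reads are simulated from it. The function name `apply_variants`, the `(pos, ref, alt)` tuple layout, and the 0-based coordinates are assumptions made for illustration, not the authors' method, and overlapping variants are not handled.

```python
def apply_variants(reference, variants):
    """Return a copy of `reference` with all variants applied.

    Each variant is a hypothetical (pos, ref, alt) tuple with a 0-based position:
      SNV:       ref and alt are single bases, e.g. (1, "C", "T")
      insertion: ref is empty, alt is inserted before pos, e.g. (4, "", "GG")
      deletion:  alt is empty, ref is removed, e.g. (6, "G", "")
    Variants are assumed not to overlap.
    """
    seq = reference
    # Apply from right to left so that earlier coordinates stay valid
    # after insertions and deletions change the sequence length.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference allele mismatch at position {pos}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq


if __name__ == "__main__":
    mutated = apply_variants("ACGTACGT", [(1, "C", "T"), (4, "", "GG"), (6, "G", "")])
    print(mutated)
```

Checking the reference allele before each substitution is a cheap guard against coordinate mistakes, which matter here because any off-by-one error would silently change which reads count as misaligned later.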
List of alignment tools and their features
| Alignment tool | Version (a) | Citations (b) | Published year | Citations/year | Mismatch | InDels | Gaps | MQ range |
|---|---|---|---|---|---|---|---|---|
| BatAlign | v1 | 3 | 2015 | 1.5 | 5 | Y | 200 | 0–60 |
| Bowtie 2 | v2.3.2 | 7227 | 2009 | 843.6 | Score | Score | Y | 0–42 |
| BWA-MEM | v0.7.15 | 8494 | 2013 | 1035.5 | Y | 8 | Y | 0–60 |
| BWA-PSSM | v0.7.8 | 15 | 2014 | 4.3 | Y | 8 | Y | 0–200 |
| CUSHAW3 | v3.0.3 | 177 | 2014 | 48.1 | Y | Y | Y | 0–250 |
| Kart | v2.2.1 | 0 | 2017 | 0 | Y | 5 | 5 | 0–60 |
| NextGenMap | v0.5.0 | 55 | 2013 | 14.1 | Score | Score | Y | 0–60 |
| NovoAlign (c) | v3.08.00 | – | 2014 | – | 8 | 7 | Y | 0–70 |
The Mismatch and InDels columns show the number of mismatches and InDels allowed in an alignment by default. Score indicates that the mapper uses a scoring function. The Gaps column shows whether consecutive InDels are permitted in an alignment and, where available, the gap length in base pairs. Y stands for Yes.
(a) Version: the tool versions used were the latest versions as of August 23, 2017.
(b) Citations: the number of citations of tool publications was obtained from Web of Science on September 28, 2017.
(c) NovoAlign: NovoAlign is not published and can be accessed through http://www.novocraft.com. The published year for NovoAlign is the year of its first version.
Fig. 1. Number of misaligned reads for each simulated FASTQ set. The average number of misaligned reads was obtained by comparing the alignment result with the original position of the reads. The detailed averages and standard deviations are listed in Supplementary Table 4.
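The comparison between aligned and original read positions can be sketched as follows. This is a minimal illustration, assuming that the simulator records the true `(chromosome, position)` of each read by name and that the aligner output has been parsed into the same shape. The function name `count_misaligned`, the dictionary layout, and the `tolerance` parameter are hypothetical; the record does not state how the study matched positions.

```python
def count_misaligned(truth, aligned, tolerance=0):
    """Count misaligned and unmapped reads.

    truth:   dict mapping read name -> (chromosome, position) from the simulator
    aligned: dict mapping read name -> (chromosome, position) from the aligner;
             reads the aligner left unmapped are absent from this dict
    tolerance: allowed distance in bases between true and aligned position
    """
    misaligned = 0
    unmapped = 0
    for name, (chrom, pos) in truth.items():
        hit = aligned.get(name)
        if hit is None:
            unmapped += 1
        elif hit[0] != chrom or abs(hit[1] - pos) > tolerance:
            misaligned += 1
    return misaligned, unmapped


if __name__ == "__main__":
    truth = {"r1": ("chr1", 100), "r2": ("chr1", 200), "r3": ("chr2", 50)}
    aligned = {"r1": ("chr1", 100), "r2": ("chr1", 250)}
    print(count_misaligned(truth, aligned))
```

Keeping unmapped reads separate from misaligned reads mirrors the abstract, which lists the two as distinct measures.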
Fig. 2. Mapping quality distribution for aligned and misaligned reads. Mapping quality was calculated for aligned and misaligned reads, and the reads were grouped into six categories according to their scores. The solid bars indicate properly aligned reads and the dashed boxes indicate misaligned reads. The mapping quality range of every tool was rescaled to 0–60 using a linear transformation.
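A linear transformation of this kind can be sketched as below. The sketch assumes that every tool's mapping quality range starts at 0, as the MQ range column of the tool table indicates, so that only the upper bound differs. The function name `rescale_mapq` and its parameters are hypothetical, and the exact formula used in the study is not given in this record.

```python
def rescale_mapq(mapq, tool_max, target_max=60):
    """Linearly map a mapping quality from [0, tool_max] onto [0, target_max]."""
    if tool_max <= 0:
        raise ValueError("tool_max must be positive")
    if not 0 <= mapq <= tool_max:
        raise ValueError("mapq lies outside the tool's range")
    return mapq * target_max / tool_max


if __name__ == "__main__":
    # Upper bounds taken from the MQ range column of the tool table.
    print(rescale_mapq(42, tool_max=42))    # Bowtie 2 maximum
    print(rescale_mapq(125, tool_max=250))  # CUSHAW3 midpoint
```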
Fig. 3. ROC analysis of read mapping quality in Set 8. The ROC curve and the corresponding AUC are shown for the mapping qualities in each tool's alignment result for Set 8.
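To illustrate what an AUC over mapping quality measures, the sketch below treats mapping quality as a score that should rank correctly aligned reads above misaligned ones. It uses the standard pairwise (Mann-Whitney) formulation of AUC, in which ties count as half. The function name `mapq_auc` is hypothetical, and the record does not say which ROC implementation the authors used.

```python
def mapq_auc(scores, correct):
    """Area under the ROC curve for mapping quality as a classifier.

    scores:  mapping quality of each read
    correct: True if the read was aligned to its original position, else False

    Returns the probability that a randomly chosen correctly aligned read has
    a higher mapping quality than a randomly chosen misaligned read, with
    ties counted as 0.5. This pairwise loop is quadratic and meant only for
    small examples.
    """
    pos = [s for s, c in zip(scores, correct) if c]
    neg = [s for s, c in zip(scores, correct) if not c]
    if not pos or not neg:
        raise ValueError("need at least one correct and one misaligned read")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(mapq_auc([60, 60, 40, 0], [True, True, False, False]))
```

An AUC near 1 means the tool assigns low mapping quality to the reads it misplaces, so downstream filtering on mapping quality can remove most errors.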
Fig. 4. Runtime of tools for different simulated FASTQ sets. Each set was aligned in three repetitions using four threads. NovoAlign was the only tool that used a single thread.
Characteristics of misaligned reads in Set 1
| Tool (Set 1) | Total misaligned (count) | Identical sequence (count) | Different sequence (count) | Total misaligned (%) | Identical sequence (%) | Different sequence (%) |
|---|---|---|---|---|---|---|
| BatAlign | 154,232 ± 348.7 | 65,041 ± 304.3 | 89,191 ± 47.1 | 100.00 | 42.2 ± 0.1 | 57.8 ± 0.1 |
| Bowtie 2 | 112,071 ± 278.9 | 65,237 ± 361.1 | 46,834 ± 177.6 | 100.00 | 58.2 ± 0.2 | 41.8 ± 0.2 |
| BWA-MEM | 110,809 ± 374.6 | 64,787 ± 124.8 | 46,022 ± 331.5 | 100.00 | 58.5 ± 0.2 | 41.5 ± 0.2 |
| BWA-PSSM | 186,931 ± 737.5 | 34,672 ± 157.6 | 152,259 ± 813.7 | 100.00 | 18.5 ± 0.1 | 81.5 ± 0.1 |
| CUSHAW3 | 2,099,439 ± 3718.2 | 139,615 ± 44.6 | 1,959,824 ± 3732.9 | 100.00 | 6.7 ± 0.0 | 93.3 ± 0.0 |
| Kart | 103,813 ± 46 | 65,261 ± 113.1 | 38,552 ± 137.6 | 100.00 | 62.9 ± 0.1 | 37.1 ± 0.1 |
| NextGenMap | 277,783 ± 315.2 | 63,498 ± 80.4 | 214,285 ± 306.3 | 100.00 | 22.9 ± 0.0 | 77.1 ± 0.0 |
| NovoAlign | 32,455 ± 149.8 | 104 ± 19 | 32,351 ± 158 | 100.00 | 0.3 ± 0.1 | 99.7 ± 0.1 |
A misaligned read is classified as identical sequence when the read sequence is identical to the reference sequence at the aligned position. If the two sequences differ, the misaligned read is classified as different sequence.
Misaligned reads were counted in three repetitions, and the data show the mean ± standard deviation.
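The identical/different classification described in the table notes can be sketched as a direct string comparison. This is a simplified illustration: it assumes a forward-strand, ungapped alignment with a 0-based position, and it ignores soft clipping and InDels, which a real implementation would have to read from the alignment record. The function name `classify_misaligned` is hypothetical.

```python
def classify_misaligned(read_seq, reference, aligned_pos):
    """Classify a misaligned read by comparing it with the reference.

    read_seq:    the read sequence
    reference:   the reference sequence of the chromosome the read aligned to
    aligned_pos: 0-based start of the alignment on `reference`

    Returns "identical" when the read matches the reference at the aligned
    position exactly, otherwise "different".
    """
    window = reference[aligned_pos:aligned_pos + len(read_seq)]
    return "identical" if window == read_seq else "different"


if __name__ == "__main__":
    reference = "ACGTACGTTT"
    print(classify_misaligned("ACGT", reference, 4))
    print(classify_misaligned("ACGA", reference, 4))
```

A misaligned read in the identical class matches the reference exactly at the wrong location, which suggests a repeated region where the sequence alone cannot identify the original position.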