Zhen Xuan Yeo, Joshua Chee Leong Wong, Steven G Rozen, Ann Siew Gek Lee.
Abstract
BACKGROUND: The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM's reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing.
Year: 2014 PMID: 24962530 PMCID: PMC4079958 DOI: 10.1186/1471-2164-15-516
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Examples of base-calling errors associated with homopolymers in PGM sequencing. A: A homopolymer indel error illustrated with a PGM ionogram. An ionogram is a graphical representation of how PGM sequencing output is converted to read sequences. The x-axis indicates the nucleotides along the read sequence. The y-axis indicates the number of consecutive identical nucleotides. One peak in the ionogram (arrowed) has a height between three and four 'C' bases, which suggests that the read sequence in this region could be ‘CCC’ or ‘CCCC’. During read alignment, if the reference sequence has four 'C' bases in this region, reads with three 'C' bases might generate a deletion call. B: The top panel is an IGV snapshot showing the read alignment, generated by SOLiD sequencing, of a DNA region with no indel. The bottom panel shows a “deletion” detected by PGM resequencing of the same region.
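The ambiguity described in panel A can be illustrated with a toy model. The sketch below assumes that a homopolymer length is called by rounding a normalized flow signal to the nearest integer; this is a simplification for illustration only, not the actual Torrent Suite base caller, and the function names and signal values are hypothetical.

```python
def call_homopolymer_length(signal: float) -> int:
    """Toy base caller (assumption): round a normalized flow signal
    to the nearest whole number of identical bases."""
    return max(0, int(signal + 0.5))


def length_difference(signal: float, reference_length: int) -> int:
    """Called length minus reference length.
    Negative values correspond to an apparent deletion."""
    return call_homopolymer_length(signal) - reference_length


# Hypothetical signals for a peak lying between three and four 'C' bases,
# aligned against a reference that has four 'C' bases.
for signal in (3.4, 3.6):
    called = call_homopolymer_length(signal)
    diff = length_difference(signal, reference_length=4)
    label = "apparent deletion" if diff < 0 else "matches reference"
    print(f"signal={signal}: read has {'C' * called} ({label})")
```

In this toy model, a small shift in signal around the midpoint between three and four bases is enough to turn a correct read into one that supports a spurious 1-bp deletion.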
Comparison of indel calling in the 6 training samples using different variant calling workflows, without subsequent filtering
| Read mapper | Variant caller | FP a | FN a | TP a | TN a | Sensitivity [95% CI] | Specificity [95% CI] | FDR [95% CI] |
|---|---|---|---|---|---|---|---|---|
| TMAP-TS2.0 | TSVC2.0 | 0 | 2 | 1 | 96135 | 33.33% [3.87, 82.33] | 100% [100, 100] | 0% [0, 77.15] |
| TMAP-TS2.2 | TSVC2.2 | 0 | 2 | 1 | 96135 | 33.33% [3.87, 82.33] | 100% [100, 100] | 0% [0, 77.15] |
| TMAP-TS3.4 | TSVC3.4 | 8 | 1 | 2 | 96127 | 66.67% [17.67, 96.13] | 99.99% [99.98, 100] | 80% [49.72, 95.59] |
| TMAP-TS2.0 | GATK | 4 | 1 | 2 | 96131 | 66.67% [17.67, 96.13] | 99.99% [99.99, 100] | 66.67% [28.64, 92.32] |
| TMAP-TS2.2 | GATK | 9 | 1 | 2 | 96126 | 66.67% [17.67, 96.13] | 99.99% [99.98, 100] | 81.82% [53.28, 96.02] |
| *TMAP-TS3.4 | GATK | 5 | 0 | 3 | 96130 | 100% [55.59, 100] | 99.99% [99.99, 100] | 62.5% [29.48, 88.1] |
| TMAP-TS2.0 | SAMtools | 0 | 3 | 0 | 96135 | 100% [55.59, 100] | 99.97% [99.96, 99.98] | 90.62% [77.05, 97.29] |
| TMAP-TS2.2 | SAMtools | 39 | 3 | 0 | 96096 | 100% [55.59, 100] | 99.99% [99.98, 99.99] | 81.25% [57.92, 94.42] |
| *TMAP-TS3.4 | SAMtools | 17 | 0 | 3 | 96118 | 100% [55.59, 100] | 99.98% [99.97, 99.99] | 85% [65.14, 95.59] |
| *BWA | GATK | 1 | 0 | 3 | 96134 | 100% [55.59, 100] | 99.99% [99.99, 100] | 25% [2.85, 71.62] |
| *BWA | SAMtools | 20 | 0 | 3 | 96115 | 100% [55.59, 100] | 99.98% [99.97, 99.99] | 86.96% [69.13, 96.19] |
We considered all bases in coding exons. Across the 6 samples the total number of bases considered was 96,138.
a FP = False Positives; FN = False Negatives; TP = True Positives; TN = True Negatives.
*Workflow with 100% sensitivity.
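The point estimates in the table follow from the four counts in each row. A minimal sketch, assuming the standard definitions sensitivity = TP / (TP + FN), specificity = TN / (TN + FP) and FDR = FP / (FP + TP); the function name is hypothetical, and the confidence intervals are not reproduced because the method used to compute them is not stated here.

```python
from typing import Optional


def indel_metrics(fp: int, fn: int, tp: int, tn: int) -> dict[str, Optional[float]]:
    """Compute point estimates from confusion-matrix counts.
    A metric is None when its denominator is zero."""
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
        "fdr": fp / (fp + tp) if (fp + tp) else None,
    }


# Counts taken from two rows of the table above.
bwa_gatk = indel_metrics(fp=1, fn=0, tp=3, tn=96134)
tsvc34 = indel_metrics(fp=8, fn=1, tp=2, tn=96127)

print(f"BWA + GATK FDR: {bwa_gatk['fdr']:.2%}")
print(f"TMAP-TS3.4 + TSVC3.4 sensitivity: {tsvc34['sensitivity']:.2%}")
```

Because only three true indels are present in the training samples, a single additional false positive or false negative changes the sensitivity and FDR substantially, which is consistent with the wide confidence intervals reported.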
Figure 2. Characteristics of true (T) and false (F) positive indels. Four panels show boxplot distributions of BAF, QUAL, QD and VARW for true (blue) and false (red) positive indels detected by the indel calling workflows indicated at the top of the panels. In workflows using GATK as the variant caller, false positive indels show higher average BAF and QUAL than true positive indels. Only QD and VARW showed a consistent trend across all workflows, with true positive indels having a higher average QD and a lower average VARW than false positive indels.
Comparison of indel calling in the 6 training samples using different workflows with QD and VARW filters
| Read mapper | Variant caller | QD th | VARW th | FP a | FN a | TP a | TN a | Sensitivity [95% CI] | Specificity [95% CI] | FDR [95% CI] |
|---|---|---|---|---|---|---|---|---|---|---|
| TMAP-TS3.4 | GATK | 2.5 | 0 | 1 | 0 | 3 | 96134 | 100% [55.59, 100] | 99.99% [99.99, 100] | 25% [2.85, 71.62] |
| TMAP-TS3.4 | SAMtools | 1 | 0 | 0 | 0 | 3 | 96135 | 100% [55.59, 100] | 100% [100, 100] | 0% [0, 44.41] |
| BWA | GATK | 2.5 | 0 | 0 | 0 | 3 | 96135 | 100% [55.59, 100] | 100% [100, 100] | 0% [0, 44.41] |
| BWA | SAMtools | 1 | 0 | 0 | 0 | 3 | 96135 | 100% [55.59, 100] | 100% [100, 100] | 0% [0, 44.41] |
We considered all bases in coding exons. Across the 6 samples the total number of bases considered was 96,138.
a FP = False Positives; FN = False Negatives; TP = True Positives; TN = True Negatives.
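The post-calling filter can be sketched as a simple threshold test. This sketch assumes that an indel is kept when its QD is at least QDth and its VARW is at most VARWth, which matches the reported trend (true positives have higher QD and lower VARW) but is an assumption about the exact inequalities. The thresholds are the SAMtools values from the table above; the candidate indels, their field names and their values are hypothetical.

```python
def passes_filters(qd: float, varw: float, qd_th: float, varw_th: float) -> bool:
    """Keep an indel call only if QD is high enough and VARW is low enough.
    The direction and inclusiveness of both comparisons are assumptions."""
    return qd >= qd_th and varw <= varw_th


# Thresholds for the SAMtools workflows, as listed in the table.
QD_TH = 1.0
VARW_TH = 0.0

# Hypothetical candidate calls for illustration only.
candidates = [
    {"id": "indel_A", "qd": 3.2, "varw": 0.0},
    {"id": "indel_B", "qd": 0.4, "varw": 0.0},  # fails the QD threshold
    {"id": "indel_C", "qd": 5.0, "varw": 2.0},  # fails the VARW threshold
]

kept = [c["id"] for c in candidates
        if passes_filters(c["qd"], c["varw"], QD_TH, VARW_TH)]
print(kept)
```

For the GATK workflows the same function would be called with `qd_th=2.5`, the value given in the table.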
Comparison of indel calling in the 17 additional test samples using different workflows with QD and VARW filters
| Read mapper | Variant caller | QD th | VARW th | FP a | FN a | TP a | TN a | Sensitivity [95% CI] | Specificity [95% CI] | FDR [95% CI] |
|---|---|---|---|---|---|---|---|---|---|---|
| TMAP-TS3.4 | GATK | 2.5 | 0 | 25 | 0 | 2 | 272364 | 100% [43.07, 100] | 99.99% [99.99, 99.99] | 92.59% [78.3, 98.43] |
| TMAP-TS3.4 | SAMtools | 1 | 0 | 2 | 0 | 2 | 272387 | 100% [43.07, 100] | 99.99% [99.99, 100] | 50% [12.28, 87.72] |
| BWA | GATK | 2.5 | 0 | 14 | 0 | 2 | 272375 | 100% [43.07, 100] | 99.99% [99.99, 100] | 87.5% [65.58, 97.31] |
| BWA | SAMtools | 1 | 0 | 4 | 0 | 2 | 272385 | 100% [43.07, 100] | 99.99% [99.99, 100] | 66.67% [28.64, 92.32] |
We considered all bases in coding exons. Across the 17 samples the total number of bases considered was 272,391.
a FP = False Positives; FN = False Negatives; TP = True Positives; TN = True Negatives.
Figure 3. Proposed workflows for highly sensitive and specific indel detection from PGM data. BAM files were generated by aligning PGM sequencing reads with either TMAP-TS3.4 (blue) or BWA (red). SAMtools was used to call indels, followed by post-calling filtering using QDth and VARWth. Called indels were independently confirmed by Sanger sequencing. The number of indels called at each step is indicated.