Sarah Sandmann, Aniek O de Graaf, Bert A van der Reijden, Joop H Jansen, Martin Dugas.
Abstract
BACKGROUND: There are various next-generation sequencing techniques, all of them striving to replace Sanger sequencing as the gold standard. However, false positive calls of single nucleotide variants and especially indels are a widely known problem of basically all sequencing platforms.
Year: 2017 PMID: 28222155 PMCID: PMC5319672 DOI: 10.1371/journal.pone.0171983
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overview of the variant calling pipeline (steps marked by dashed frames are performed only in the variant calling pipeline with GLM).
Overview of the parameters investigated for the variant calling pipeline with GLM.
| Category | Parameter | Origin | SNVs | Indels |
|---|---|---|---|---|
| Quality and depth | | vcf file (QUAL) | x | x |
| Quality and depth | | vcf file (DP) | x | x |
| Quality and depth | | vcf file (QD) | x | x |
| Coverage | | calculated | x | x |
| Coverage | | calculated | x | x |
| Coverage | | vcf file (AD: #ref+#alt) | x | x |
| Allele frequency | | calculated | x | x |
| Allele frequency | | calculated | x | x |
| Allele frequency | | vcf file (AD: #alt/(#ref+#alt)) | x | x |
| Strand bias | | calculated | x | x |
| Strand bias | | vcf file (FS) | x | x |
| Strand bias | | vcf file (SOR) | x | x |
| Variant position | | calculated | x | x |
| Variant position | | vcf file (ReadPosRankSum) | x | x |
| Base quality | | calculated | x | |
| Base quality | | vcf file (BaseQRankSum) | x | x |
| Mapping quality | | vcf file (MQ) | x | x |
| Mapping quality | | vcf file (MQRankSum) | x | x |
| Homopolymer length | | calculated | | x |
| Homopolymer length | | calculated | | x |
| Homopolymer length | | calculated | | x |
| Indel width | | calculated ([ | | x |
| Indel width | | calculated | | x |
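The two AD-derived entries in the parameter table, coverage as #ref+#alt and allele frequency as #alt/(#ref+#alt), are simple arithmetic on the reference and alternate read counts. A minimal sketch, assuming the AD field has already been parsed into two integers. The function names are hypothetical and not taken from the authors' pipeline, and applying the 20-read minimum used in the result tables to the AD sum (rather than to DP) is an assumption:

```python
from typing import Tuple


def ad_derived_parameters(ref_reads: int, alt_reads: int) -> Tuple[int, float]:
    """Return (coverage, allele frequency) computed from the two AD counts."""
    coverage = ref_reads + alt_reads            # AD: #ref + #alt
    if coverage == 0:
        raise ValueError("no reads support either allele")
    allele_frequency = alt_reads / coverage     # AD: #alt / (#ref + #alt)
    return coverage, allele_frequency


def passes_min_coverage(coverage: int, min_reads: int = 20) -> bool:
    """Keep only variants covered by at least `min_reads` reads (assumed to use the AD sum)."""
    return coverage >= min_reads


# Hypothetical variant: 30 reference reads and 10 alternate reads.
coverage, allele_frequency = ad_derived_parameters(30, 10)
print(coverage, allele_frequency, passes_min_coverage(coverage))
```

A variant with 12 reference and 5 alternate reads would give a coverage of 17 and would be dropped by the 20-read filter under this assumption.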
Overview of the subjects sequenced on 454, Ion Torrent and Illumina NextSeq (comparison set marked with a c, re-sequencing set marked with an r).
| Sample | 454 | Ion Torrent | Illumina | |||
|---|---|---|---|---|---|---|
| 1st set | 2nd set | 1st set | 2nd set | 1st set | 2nd set | |
| UPN001 | x | x | x | x | ||
| UPN002 | x | x | x | x | ||
| UPN003 | x | x | x | x | ||
| UPN004 | x | x | x | x | ||
| UPN005 | x | x | x | x | ||
| UPN006 | x | x | x | |||
| UPN007 | x | x | x | |||
| UPN008 | x | x | x | |||
| UPN009 | x | x | x | x | ||
| UPN010 | x | x | ||||
| UPN011 | x | x | ||||
| UPN012 | x | x | ||||
| UPN013 | x | x | ||||
| UPN014 | x | x | ||||
| UPN015 | x | x | ||||
| UPN016 | x | x | ||||
| UPN017 | x | x | ||||
| UPN018 | x | x | ||||
| UPN019 | x | x | ||||
| UPN020 | x | x | ||||
Fig 2. Median coverage of the genes in the intersecting target region for 454 (black), Ion Torrent (red) and Illumina (green), considering the comparison data set.
True- and false positive SNV calls, sensitivity and PPV considering the comparison subset (n = 9), the re-sequencing subset (n = 5) and all data (454 and Illumina: n = 15, Ion Torrent: n = 10), using the standard analysis pipeline (without GLM). Only variants covered by at least 20 reads are considered.
| Dataset | Platform | Set | SNVs | False Positives | Sensitivity | PPV |
|---|---|---|---|---|---|---|
| Comparison | 454 | | 11 | 3 | 0.84 | 0.79 |
| Comparison | Ion Torrent | | 12 | 6 | 0.92 | 0.67 |
| Comparison | Illumina NextSeq | | 11 | 1 | 0.84 | 0.92 |
| Re-sequencing | 454 | Set1 | 14 | 1 | 1.00 | 0.93 |
| Re-sequencing | 454 | Set2 | 12 | 0 | 0.86 | 1.00 |
| Re-sequencing | 454 | Overlap | 12 | 0 | 0.86 | 1.00 |
| Re-sequencing | Ion Torrent | Set1 | 4 | 6 | 0.80 | 0.40 |
| Re-sequencing | Ion Torrent | Set2 | 4 | 1 | 0.80 | 0.80 |
| Re-sequencing | Ion Torrent | Overlap | 4 | 0 | 0.80 | 0.80 |
| Re-sequencing | Illumina NextSeq | Set1 | 8 | 2 | 0.89 | 0.80 |
| Re-sequencing | Illumina NextSeq | Set2 | 9 | 0 | 1.00 | 1.00 |
| Re-sequencing | Illumina NextSeq | Overlap | 8 | 0 | 0.89 | 1.00 |
| Altogether | 454 | | 40 | 4 | 0.91 | 0.91 |
| Altogether | Ion Torrent | | 21 | 7 | 0.91 | 0.75 |
| Altogether | Illumina NextSeq | | 29 | 3 | 0.88 | 0.91 |
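The PPV column can be reproduced from the two count columns. A minimal sketch, assuming the "SNVs" column holds the true positive calls (as the caption suggests) and using the standard definition PPV = TP / (TP + FP); the function name is hypothetical. Sensitivity is not recomputed here because the number of missed variants is not listed in the table:

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: the share of reported calls that are true."""
    calls = true_positives + false_positives
    if calls == 0:
        raise ValueError("PPV is undefined when no variants are called")
    return true_positives / calls


# (true positives, false positives) for the comparison subset, SNVs without GLM.
comparison_snvs = {
    "454": (11, 3),
    "Ion Torrent": (12, 6),
    "Illumina NextSeq": (11, 1),
}

for platform, (tp, fp) in comparison_snvs.items():
    print(f"{platform}: PPV = {ppv(tp, fp):.2f}")
# 454: PPV = 0.79
# Ion Torrent: PPV = 0.67
# Illumina NextSeq: PPV = 0.92
```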
Normalized relative variable importance for all parameters characterizing SNVs, considering 454, Ion Torrent and Illumina NextSeq sequencing data.
| Parameter | 454 | Ion Torrent | Illumina |
|---|---|---|---|
| 0.60 | |||
| 0.58 | 0.61 | 0.29 | |
| 1.73 | 0.81 | 0 | |
| 1.01 | 1.00 | ||
| 1.01 | 1.00 | 1.14 | |
| 0.81 | 0.55 | 0.32 | |
| 0.71 | 0.78 | 0 | |
| 0.74 | 0.78 | 0 | |
| 1.00 | 0.69 | 0 | |
| 0.38 | 0.17 | 0.44 | |
| 0.85 | 0.70 | 2.12 | |
| 0.80 | 1.26 | 0.56 | |
| 0.67 | 0.63 | 0.41 | |
| 0.73 | |||
| 0.84 | 0.73 | 0.66 | |
| 2.91 | 0.80 | 1.08 | |
| 1.07 | 0.76 | 1.20 | |
| 0.45 | 0.84 | 0.30 |
True- and false positive SNV calls, sensitivity (sens) and PPV considering the training subset (454 and Illumina: n = 12, Ion Torrent: n = 7) and the test subset (n = 3), comparing the standard analysis pipeline (without GLM) and the optimized analysis pipeline (with GLM). Only variants covered by at least 20 reads are considered.
| Dataset | Platform | SNVs (without GLM) | False Positives (without GLM) | Sens (without GLM) | PPV (without GLM) | SNVs (with GLM) | False Positives (with GLM) | Sens (with GLM) | PPV (with GLM) |
|---|---|---|---|---|---|---|---|---|---|
| Training | 454 | 36 | 3 | 0.92 | 0.92 | 36 | 1 | 0.92 | 0.97 |
| Training | Ion Torrent | 15 | 7 | 0.88 | 0.68 | 15 | 1 | 0.88 | 0.94 |
| Training | Illumina NextSeq | 27 | 3 | 0.90 | 0.90 | 27 | 1 | 0.90 | 0.96 |
| Test | 454 | 4 | 1 | 0.80 | 0.80 | 4 | 0 | 0.80 | 1.00 |
| Test | Ion Torrent | 6 | 0 | 1.00 | 1.00 | 6 | 0 | 1.00 | 1.00 |
| Test | Illumina NextSeq | 2 | 0 | 0.67 | 1.00 | 2 | 0 | 0.67 | 1.00 |
True- and false positive indel calls, sensitivity and PPV considering the comparison subset (n = 9), the re-sequencing subset (n = 5) and all data (454 and Illumina: n = 15, Ion Torrent: n = 10), using the standard analysis pipeline (without GLM). Only variants covered by at least 20 reads are considered.
| Dataset | Platform | Set | Indels | False Positives | Sensitivity | PPV |
|---|---|---|---|---|---|---|
| Comparison | 454 | | 3 | 77 | 0.60 | 0.04 |
| Comparison | Ion Torrent | | 5 | 422 | 1.00 | 0.01 |
| Comparison | Illumina NextSeq | | 3 | 17 | 0.60 | 0.15 |
| Re-sequencing | 454 | Set1 | 0 | 26 | / | / |
| Re-sequencing | 454 | Set2 | 0 | 75 | / | / |
| Re-sequencing | 454 | Overlap | 0 | 17 | / | / |
| Re-sequencing | Ion Torrent | Set1 | 2 | 235 | 1.00 | 0.01 |
| Re-sequencing | Ion Torrent | Set2 | 2 | 297 | 1.00 | 0.01 |
| Re-sequencing | Ion Torrent | Overlap | 2 | 123 | 1.00 | 0.02 |
| Re-sequencing | Illumina NextSeq | Set1 | 4 | 6 | 0.67 | 0.40 |
| Re-sequencing | Illumina NextSeq | Set2 | 5 | 11 | 1.00 | 0.31 |
| Re-sequencing | Illumina NextSeq | Overlap | 4 | 4 | 0.67 | 0.50 |
| Altogether | 454 | | 6 | 186 | 0.75 | 0.03 |
| Altogether | Ion Torrent | | 8 | 800 | 1.00 | 0.01 |
| Altogether | Illumina NextSeq | | 13 | 37 | 0.76 | 0.26 |
Normalized relative variable importance for all parameters characterizing indels, considering 454, Ion Torrent and Illumina NextSeq sequencing data.
| Parameter | 454 | Ion Torrent | Illumina |
|---|---|---|---|
| 0.54 | 0.74 | ||
| 0.28 | 0.46 | 2.00 | |
| 2.06 | 0.42 | ||
| 0.32 | 0.63 | ||
| 0.35 | 0.99 | 0.49 | |
| 0.44 | 0.61 | ||
| 2.90 | 0.73 | 0.95 | |
| 1.42 | 0.61 | 0.58 | |
| 0.22 | 1.70 | 0.24 | |
| 0.27 | 0.00 | 1.02 | |
| 0.12 | 0.02 | 0.79 | |
| 0.63 | 0.58 | ||
| 0.19 | 1.30 | 1.74 | |
| 0.36 | 0.49 | 0.34 | |
| 0.41 | 0.44 | ||
| 0.35 | 0.61 | ||
| 0.39 | 0.70 | 0.76 | |
| 1.24 | |||
| 0.10 | 0.24 | ||
| 0.69 | 0.90 | 0.74 | |
| 0.35 | 0.81 | 0.60 | |
| 0.74 | 0.56 | 1.87 |
True- and false positive indel calls, sensitivity (sens) and PPV considering the training subset (454 and Illumina: n = 12, Ion Torrent: n = 7) and the test subset (n = 3), comparing the standard analysis pipeline (without GLM) and the optimized analysis pipeline (with GLM). Only variants covered by at least 20 reads are considered.
| Dataset | Platform | Indels (without GLM) | False Positives (without GLM) | Sens (without GLM) | PPV (without GLM) | Indels (with GLM) | False Positives (with GLM) | Sens (with GLM) | PPV (with GLM) |
|---|---|---|---|---|---|---|---|---|---|
| Training | 454 | 3 | 158 | 0.60 | 0.02 | 3 | 4 | 0.60 | 0.43 |
| Training | Ion Torrent | 7 | 644 | 1.00 | 0.01 | 7 | 4 | 1.00 | 0.64 |
| Training | Illumina NextSeq | 12 | 31 | 0.80 | 0.28 | 12 | 2 | 0.80 | 0.86 |
| Test | 454 | 3 | 28 | 1.00 | 0.10 | 1 | 1 | 0.33 | 0.50 |
| Test | Ion Torrent | 1 | 156 | 1.00 | 0.01 | 1 | 3 | 1.00 | 0.25 |
| Test | Illumina NextSeq | 1 | 6 | 0.50 | 0.14 | 1 | 0 | 0.50 | 1.00 |