Aditya Deshpande, Wenhua Lang, Tina McDowell, Smruthy Sivakumar, Jiexin Zhang, Jing Wang, F Anthony San Lucas, Jerry Fowler, Humam Kadara, Paul Scheet.
Abstract
BACKGROUND: 'Next-generation' sequencing (NGS) has wide application in medical genetics, including the detection of somatic variation in cancer. The Ion Torrent-based (IONT) platform is among the NGS technologies employed in clinical, research and diagnostic settings. However, identifying mutations from IONT deep sequencing with high confidence has remained a challenge. We compared various computational variant-calling methods to derive a variant identification pipeline that may improve the molecular diagnostic and research utility of IONT.
Keywords: Ion Reporter; Ion torrent; MuTect; Next-generation sequencing; Variant calling strategies; Varscan2
Year: 2018 PMID: 29301485 PMCID: PMC5753459 DOI: 10.1186/s12859-017-1991-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Sample description
| Sample | Tumor section | Tumor CNB | Normal |
|---|---|---|---|
| NSCLC 1 | 1 | 6 | Blood |
| NSCLC 2 | 1 | 8 | Nasal |
| NSCLC 3 | 1 | 7 | Blood |
| NSCLC 4 | 1 | 6 | Blood |
Software versions used
| Software | Label | Version | Notes |
|---|---|---|---|
| Ion Reporter | IR | IonReporter Version 0.1.2 | IonReporter VCF employs TVC 4.2 |
| Poorman’s | PM | Torrent Variant Caller 4.2 | Subtract variants found in normal |
| Varscan | VS | varscan2.3.5 | reject |
| MuTect | MU | muTect-1.1.7.jar | |
| MuTect-Gaps | MG | muTect-1.1.7.jar | nearby_gap_events added (see “ |
Unmodified MuTect (MU) was not used in the analyses described because it produced so few variants.
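The note for the Poorman's (PM) strategy in the table above, "Subtract variants found in normal", amounts to a set difference between tumor and matched-normal calls. The following is a minimal sketch of that idea, not the authors' implementation. It assumes variants are matched on a (chromosome, position, ref, alt) key, which the source does not specify; the function name `subtract_normal` and all coordinates are hypothetical.

```python
def subtract_normal(tumor_calls, normal_calls):
    """Keep tumor variants that are absent from the matched normal sample."""
    return set(tumor_calls) - set(normal_calls)


# Made-up calls keyed by (chromosome, position, ref, alt); not real data.
tumor = {
    ("chr1", 1000, "A", "G"),
    ("chr2", 2000, "C", "T"),
    ("chr3", 3000, "G", "C"),
}
normal = {
    ("chr3", 3000, "G", "C"),  # also seen in normal, so treated as germline
}

candidates = subtract_normal(tumor, normal)
print(sorted(candidates))
```

Matching on the full key is a deliberate choice in this sketch: a different alternate allele at the same position in the normal sample would not remove the tumor call.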
Fig. 1 Variant Inspection Workflow. We used the depicted workflow to organize the variants called by the four software packages. We determined validity of variants via visual inspection after having established the categorization criteria described in the lower three boxes by consensus (Review and Decision, following independent inspections by individuals 1-4).
Performance of variant calling strategies
| Strategy | Calls (C) | Valid (v) | PPV ∗ | (SE, n) | Sensitivity ∗ | (SE, |
|---|---|---|---|---|---|---|
| IR | 127 | 64 | 0.50 | (0.044, 127) | 0.83 | (0.025) |
| MG | 106 | 41 ∗ | 0.39 | (0.056, 76) | 0.53 | (0.033) |
| PM | 199 | 71 ∗ | 0.36 | (0.043, 124) | 0.92 | (0.018) |
| VS | 381 | 26 ∗ | 0.07 | (0.030, 72) | 0.34 | (0.031) |
| Any one | 637 | 77 ∗ | 0.12 | (0.022, 223) | 1.00 | (NA) |
| IR ∩ MG | 42 | 40 | 0.95 | (NA, 42) | 0.52 | (0.033) |
| IR ∩ PM | 94 | 58 | 0.62 | (0.050, 94) | 0.75 | (0.029) |
| IR ∩ VS | 27 | 26 | 0.96 | (NA, 27) | 0.34 | (0.031) |
| MG ∩ PM | 40 | 39 | 0.97 | (NA, 40) | 0.51 | (0.033) |
| MG ∩ VS | 31 | 22 | 0.71 | (0.082, 31) | 0.29 | (0.030) |
| PM ∩ VS | 29 | 26 | 0.90 | (NA, 29) | 0.34 | (0.031) |
| Any two | 111 | 61 | 0.55 | (0.047, 111) | 0.79 | (0.027) |
| IR ∩ MG ∩ PM | 38 | 38 | 1.00 | (NA, 38) | 0.49 | (0.033) |
| IR ∩ MG ∩ VS | 22 | 22 | 1.00 | (NA, 22) | 0.29 | (0.030) |
| IR ∩ PM ∩ VS | 27 | 26 | 0.96 | (NA, 27) | 0.34 | (0.031) |
| MG ∩ PM ∩ VS | 22 | 22 | 1.00 | (NA, 22) | 0.29 | (0.030) |
| Any three | 43 | 42 | 0.98 | (NA, 43) | 0.55 | (0.033) |
| IR ∩ MG ∩ PM ∩ VS | 22 | 22 | 1.00 | (NA, 22) | 0.29 | (0.030) |
| IR ∪ PM | 232 | 77 ∗ | 0.33 | (0.026, 157) | 1.00 | (NA) |
| (IR ∪ PM) ∩ (MG ∪ VS) | 51 | 45 | 0.88 | (0.046, 51) | 0.58 | (0.033) |
Variant calls made and calls validated by visual inspection are given for each software package and for their intersections, excluding variants identified by ANNOVAR as intronic or intergenic, which leaves 223 relevant SNPs collectively identified. The column labelled PPV shows positive predictive value. Sensitivity and PPV were calculated as Sensitivity = v/V and PPV = v/C, respectively, where the numerator v is the number of valid variants correctly called by a given caller or set of callers, the denominator V is the extrapolated set of 77 valid variants (68 examined variants that we classified as valid plus 9 additional “valid” variants estimated via extrapolation), and the denominator C is the total number of variants called by that caller or set of callers. The strategies described as “Any one,” “Any two,” and “Any three” each require the variant site to be identified by at least that many packages, which is effectively a union of the sites identified by all the combinations in that section of the table. Asterisks (∗) indicate values that were extrapolated. Standard error (SE) for PPV and sensitivity is provided when n × min(p, 1−p) ≥ 5, where n is the number of variants examined rather than the number called, because the latter is based on an extrapolation and does not contribute to estimated precision.
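The definitions in the caption can be checked against individual table rows with a short sketch. The caption gives PPV = v/C, Sensitivity = v/V with V = 77, and the reporting rule n × min(p, 1−p) ≥ 5. It does not state the SE formula, so the binomial standard error sqrt(p(1−p)/n) used here is an assumption; it happens to reproduce the tabulated PPV standard errors for the rows tested. The function names are illustrative.

```python
import math

TOTAL_VALID = 77  # V: extrapolated number of valid variants, from the caption


def ppv(valid, calls):
    """Positive predictive value, v / C."""
    return valid / calls


def sensitivity(valid, total_valid=TOTAL_VALID):
    """Sensitivity, v / V."""
    return valid / total_valid


def binomial_se(p, n, min_count=5):
    """Assumed SE formula sqrt(p(1-p)/n); None when n*min(p, 1-p) < min_count."""
    if n * min(p, 1 - p) < min_count:
        return None  # reported as NA in the table
    return math.sqrt(p * (1 - p) / n)


# IR row: 127 calls, 64 valid, n = 127 examined.
p_ir = ppv(64, 127)
print(round(p_ir, 2), round(binomial_se(p_ir, 127), 3))  # 0.5 0.044
print(round(sensitivity(64), 2))                         # 0.83

# IR ∩ MG row: 42 calls, 40 valid; 42 * (1 - 40/42) = 2 < 5, so no SE.
print(binomial_se(ppv(40, 42), 42))                      # None
```

For rows marked with an asterisk, the valid count is extrapolated and n is smaller than the number of calls, so the PPV SE should be computed with the examined n from the "(SE, n)" column rather than with C.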
Fig. 2 Venn Diagram of Number of Variants Grouped by Method Combinations. Listed under each method name are that method’s total called variants. (Note, areas are not scaled according to set size.)
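The method combinations shown in the Venn diagram, and the "Any one", "Any two" and "Any three" strategies in the performance table, can be expressed with ordinary set operations. The sketch below uses made-up site identifiers and is not the authors' code. It illustrates that "called by at least k packages" equals the union of all k-way intersections, and shows the (IR ∪ PM) ∩ (MG ∪ VS) strategy from the table. The helper names are hypothetical.

```python
from itertools import combinations

# Made-up call sets per caller label; site identifiers are illustrative only.
calls = {
    "IR": {"chr1:100", "chr2:200", "chr3:300"},
    "MG": {"chr1:100", "chr4:400"},
    "PM": {"chr1:100", "chr2:200", "chr5:500"},
    "VS": {"chr1:100", "chr3:300"},
}


def called_by_at_least(calls, k):
    """Sites identified by at least k callers."""
    counts = {}
    for sites in calls.values():
        for site in sites:
            counts[site] = counts.get(site, 0) + 1
    return {site for site, count in counts.items() if count >= k}


def union_of_k_way_intersections(calls, k):
    """Union of the intersections over every combination of k callers."""
    result = set()
    for combo in combinations(calls, k):
        result |= set.intersection(*(calls[name] for name in combo))
    return result


any_two = called_by_at_least(calls, 2)
print(sorted(any_two))
print(any_two == union_of_k_way_intersections(calls, 2))

# (IR ∪ PM) ∩ (MG ∪ VS)
combined = (calls["IR"] | calls["PM"]) & (calls["MG"] | calls["VS"])
print(sorted(combined))
```

Counting callers per site scales linearly with the number of calls, whereas enumerating every k-way intersection grows with the number of caller combinations; with four callers either approach is cheap.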