Chang Xu
Abstract
Detection of somatic mutations holds great potential in cancer treatment and has been a very active research field in the past few years, especially since the breakthrough of next-generation sequencing technology. A collection of variant calling pipelines has been developed with different underlying models, filters, input data requirements, and targeted applications. This review aims to enumerate these unique features of the state-of-the-art variant callers, in the hope of providing a practical guide for selecting the appropriate pipeline for specific applications. We will focus on the detection of somatic single nucleotide variants, ranging from traditional variant callers based on whole genome or exome sequencing of paired tumor-normal samples to recent low-frequency variant callers designed for targeted sequencing protocols with unique molecular identifiers. The variant callers have been extensively benchmarked, with inconsistent performance across the benchmarking studies. We will review the reference materials, datasets, and performance metrics that have been used in these studies. Finally, we will discuss emerging trends and future directions of variant calling algorithms.
Keywords: Benchmarking; Low-frequency mutation; Somatic mutation; Unique molecular identifier; Variant calling
Year: 2018 PMID: 29552334 PMCID: PMC5852328 DOI: 10.1016/j.csbj.2018.01.003
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
List of tumor-normal somatic SNV callers sorted in alphabetical order. For each variant caller, the types of variants that are reported (column 2), whether single-sample input is allowed (column 3), and a high-level summary of the core algorithm (column 4) are provided. The variant callers and their core algorithms are explained in detail in Section 3.
| Variant caller | Type of variant | Single-sample mode | Type of core algorithm |
|---|---|---|---|
| BAYSIC | SNV | No | Machine learning (ensemble caller) |
| CaVEMan | SNV | No | Joint genotype analysis |
| deepSNV | SNV | No | Allele frequency analysis |
| EBCall | SNV, indel | No | Allele frequency analysis |
| FaSD-somatic | SNV | Yes | Joint genotype analysis |
| FreeBayes | SNV, indel | Yes | Haplotype analysis |
| HapMuC | SNV, indel | Yes | Haplotype analysis |
| JointSNVMix2 | SNV | No | Joint genotype analysis |
| LocHap | SNV, indel | No | Haplotype analysis |
| LoFreq | SNV, indel | Yes | Allele frequency analysis |
| LoLoPicker | SNV | No | Allele frequency analysis |
| MutationSeq | SNV | No | Machine learning |
| MuSE | SNV | No | Markov chain model |
| MuTect | SNV | Yes | Allele frequency analysis |
| SAMtools | SNV, indel | Yes | Joint genotype analysis |
| Platypus | SNV, indel, SV | Yes | Haplotype analysis |
| qSNP | SNV | No | Heuristic threshold |
| RADIA | SNV | No | Heuristic threshold |
| Seurat | SNV, indel, SV | No | Joint genotype analysis |
| Shimmer | SNV, indel | No | Heuristic threshold |
| SNooPer | SNV, indel | Yes | Machine learning |
| SNVSniffer | SNV, indel | Yes | Joint genotype analysis |
| SOAPsnv | SNV | No | Heuristic threshold |
| SomaticSeq | SNV | No | Machine learning (ensemble caller) |
| SomaticSniper | SNV | No | Joint genotype analysis |
| Strelka | SNV, indel | No | Allele frequency analysis |
| TVC | SNV, indel, SV | Yes | Ion Torrent specific |
| VarDict | SNV, indel, SV | Yes | Heuristic threshold |
| VarScan2 | SNV, indel | Yes | Heuristic threshold |
| Virmid | SNV | No | Joint genotype analysis |
List of single-sample somatic and germline SNV callers sorted in alphabetical order. For each variant caller, the types of variants that are reported (column 2), whether somatic variants are distinguished from germline variants (column 3), applications reported in the original publication (column 4), and a high-level summary of the core algorithm (column 5) are presented. The variant callers and their core algorithms are explained in detail in Section 4.
| Variant caller | Type of variant | Somatic-germline classification | Reported application | Type of core algorithm |
|---|---|---|---|---|
| ISOWN | SNV | Yes | Deep sequencing, FFPE samples | Supervised learning |
| OutLyzer | SNV | No | Deep sequencing, FFPE samples | Noise level estimation |
| Pisces | SNV, indel | Yes | Deep sequencing | Poisson model on read count |
| PoreSeq | SNV, indel | No | Low-coverage nanopore data | Nanopore specific |
| Shearwater | SNV | No | Deep sequencing | Noise level estimation |
| SiNVICT | SNV, indel | No | Deep sequencing; cfDNA | Poisson model on read count |
| SNVer | SNV, indel | No | Deep sequencing | Allele frequency analysis |
| SNVMix2 | SNV | No | WGS, WES | Genotype analysis |
| SomVarIUS | SNV, indel | Yes | WES; FFPE samples | Noise level estimation |
| SPLINTER | SNV, indel | No | Deep sequencing | Noise level estimation |
Fig. 1. (a) Building a consensus read from a UMI group. Errors (blue cross) are corrected and real mutations (green circle) are preserved. Yellow segment indicates UMI sequence. (b) Reducing amplification bias by counting UMIs instead of reads.
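The two ideas in Fig. 1 can be illustrated with a minimal Python sketch. It is not taken from any of the callers reviewed here: it assumes exact UMI matching, equal-length pre-aligned reads, and a simple per-position majority vote, and the names `group_by_umi`, `consensus_read`, and the toy reads are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_umi(tagged_reads):
    """Group read sequences by their UMI tag (exact match only, an assumption)."""
    groups = defaultdict(list)
    for umi, seq in tagged_reads:
        groups[umi].append(seq)
    return dict(groups)


def consensus_read(reads):
    """Per-position majority vote over equal-length reads from one UMI group."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


# Hypothetical toy data: (UMI, read sequence). Index 6 carries a G>C variant.
tagged_reads = [
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGAACGT"),  # isolated error at index 3, outvoted in the consensus
    ("TTGC", "ACGTACCT"),  # variant present in every read of this UMI group
    ("TTGC", "ACGTACCT"),
]

groups = group_by_umi(tagged_reads)
consensus = {umi: consensus_read(seqs) for umi, seqs in groups.items()}

# (b) Allele fraction counted over reads versus over UMIs (original molecules).
variant_index, variant_base = 6, "C"
read_fraction = sum(
    seq[variant_index] == variant_base for _, seq in tagged_reads
) / len(tagged_reads)
umi_fraction = sum(
    seq[variant_index] == variant_base for seq in consensus.values()
) / len(consensus)

print(consensus)
print(read_fraction, umi_fraction)
```

In this toy case the error in the `AACG` group is removed by the vote, while the variant shared by all reads of the `TTGC` group survives. The read-level and UMI-level allele fractions differ because the two molecules were amplified to different depths.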
List of UMI-based somatic and germline SNV callers sorted in alphabetical order. For each variant caller, the types of variants that are reported (column 2), whether a complete workflow including UMI handling (extraction, consensus, clustering), read processing, and mapping/alignment is provided (column 3), whether duplex sequencing data are supported (column 4), the companion library preparation and sequencing protocol (column 5), and the limit of detection reported in the original publication (column 6) are presented. The variant callers and their core algorithms are explained in detail in Section 5.
| Variant caller | Type of variant | Complete workflow | Duplex sequencing data | Companion protocol | Detection limit (original paper) |
|---|---|---|---|---|---|
| DeepSNVMiner | SNV, indel | Yes | No | Unspecified | 0.1% |
| iDES | SNV, indel | Yes | Yes | CARP-Seq | 0.00025–0.025% |
| MAGERI | SNV, indel | Yes | Yes | Multiple protocols | 0.1% |
| smCounter | SNV, indel | No | No | QIAseq targeted DNA-seq | 1% |
List of RNA-seq somatic and germline SNV callers sorted in alphabetical order. For each variant caller, the types of variants that are reported (column 2), whether DNA-RNA integrated analysis is performed (column 3), whether the tool is exclusively for RNA-seq variant calling (column 4), and whether a complete workflow including RNA-seq read mapping, variant calling, and filtering is provided (column 5) are presented. The variant callers and their core algorithms are explained in detail in Section 6.
| Variant caller | Type of variant | Integrated analysis | Dedicated to RNA-seq | Complete workflow |
|---|---|---|---|---|
| eSNV-detect | SNV | No | Yes | No |
| RADIA | SNV | Yes | No | No |
| Seurat | SNV, indel | Yes | No | No |
| SNPiR | SNV | No | Yes | Yes |
| VarDict | SNV, indel, SV | No | No | No |
| VarScan2 | SNV, indel | No | No | No |
Definition of variant calling performance metrics. TP, TN, FP, FN are true positive, true negative, false positive, false negative respectively.
| Metric | Synonym | Formula | Relation with other metrics |
|---|---|---|---|
| Sensitivity | Recall | TP / (TP + FN) | |
| Specificity | | TN / (TN + FP) | |
| False positive rate (FPR) | | FP / (FP + TN) | 1 - specificity |
| Positive predictive value (PPV) | Precision | TP / (TP + FP) | |
| False discovery rate (FDR) | | FP / (TP + FP) | 1 - PPV |
| F-score | | 2TP / (2TP + FP + FN) | Harmonic mean of sensitivity and PPV |
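As a worked illustration, the metrics in the table can be computed from the four confusion-matrix counts using their standard definitions. The function name `calling_metrics` and the counts below are hypothetical, chosen only to make the relations between the metrics visible.

```python
def calling_metrics(tp, tn, fp, fn):
    """Standard variant calling performance metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),           # recall
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),                   # equals 1 - specificity
        "ppv": tp / (tp + fp),                   # precision
        "fdr": fp / (tp + fp),                   # equals 1 - PPV
        "f_score": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of sensitivity and PPV
    }


# Hypothetical benchmark: 100 true variants, 90 of them called, 10 false calls,
# and 9900 reference positions correctly left uncalled.
m = calling_metrics(tp=90, tn=9900, fp=10, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because true negatives vastly outnumber true variants in most sequencing targets, specificity stays close to 1 even when the PPV is modest. The FDR and F-score do not depend on TN at all.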
Fig. 2. Illustration of a complex variant at position 101: TACA > TAATGTCTATCAGA being represented in two combinations of simple SNV and indels. Representation one: insertion at 101: T > TAATGTCTATC and SNV at 103: C > G. Representation two: insertions at 102: A > AATGT and 103: C > CTATCAG.
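One way to check that different representations describe the same complex variant is to apply each set of calls to the reference and compare the resulting haplotypes. The sketch below is illustrative and not part of any benchmarking tool: `apply_variants` is a hypothetical helper that assumes non-overlapping variants given as (position, reference allele, alternate allele). The SNV at 103 is encoded as C > G because the reference base at that position in TACA is C.

```python
def apply_variants(ref, ref_start, variants):
    """Apply non-overlapping (pos, ref_allele, alt_allele) calls to a reference string."""
    pieces = []
    cursor = 0  # offset in `ref` of the next base not yet copied
    for pos, ref_allele, alt_allele in sorted(variants):
        offset = pos - ref_start
        if offset < cursor:
            raise ValueError("overlapping variants are not supported in this sketch")
        if ref[offset:offset + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        pieces.append(ref[cursor:offset])  # unchanged bases before the variant
        pieces.append(alt_allele)
        cursor = offset + len(ref_allele)
    pieces.append(ref[cursor:])  # unchanged tail
    return "".join(pieces)


REF, REF_START = "TACA", 101  # reference bases at positions 101-104

complex_call = [(101, "TACA", "TAATGTCTATCAGA")]
representation_one = [(101, "T", "TAATGTCTATC"), (103, "C", "G")]
representation_two = [(102, "A", "AATGT"), (103, "C", "CTATCAG")]

haplotypes = {
    name: apply_variants(REF, REF_START, calls)
    for name, calls in [
        ("complex", complex_call),
        ("one", representation_one),
        ("two", representation_two),
    ]
}
print(haplotypes)
```

All three call sets yield the same haplotype, so a benchmark that compares calls record by record, rather than by the haplotype they imply, would score equivalent representations as mismatches.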