Lei Cai, Wei Yuan, Zhou Zhang, Lin He, Kuo-Chen Chou.
Abstract
Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were evaluated on real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools showed poor consensus on candidates: only 20% of calls were reported by more than one caller. For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rates of dbSNP presence and high-alternative-alleles-in-control (HAAIC) calls, demonstrating their superior sensitivity and accuracy. Combining different callers does increase the reliability of candidates, but narrows the list down to a very limited range of tumor read depth and variant allele frequency (VAF). Calling SNVs on UDT-Seq data, which had much higher read depth, discovered additional true-positive variants, at the cost of an even larger increase in false-positive predictions. Our findings not only provide a valuable benchmark for state-of-the-art SNV calling methods, but also point toward more accurate SNV identification in the future.
Year: 2016 PMID: 27874022 PMCID: PMC5118795 DOI: 10.1038/srep36540
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Counts of mutations detected by four sSNV callers in different patients based on WES (A) and UDT-Seq (B) data.
Figure 2. The relationship of mutations detected by four sSNV callers.
Venn diagrams of mutations based on WES (A) and UDT-Seq (B) data illustrate the overlaps between mutation candidate sets. Scatter plots of mutation candidate sets from each sample based on WES (C) and UDT-Seq (D) data show the correlation between the numbers of mutations detected by different callers: each dot in the upper panels represents the number of mutations detected in one tumor-normal tissue pair, and the lower panels display the Pearson correlation coefficients between pairs of callers, with font size proportional to the correlation.
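The two quantities summarized in Figure 2 (overlap between call sets, and correlation between per-sample call counts) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `multi_caller_fraction` and `pearson`, the variant key `(chrom, pos, ref, alt)`, and the toy calls are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def multi_caller_fraction(call_sets):
    """Fraction of distinct candidates reported by at least two callers.

    call_sets maps a caller name to an iterable of hashable variant keys.
    """
    support = Counter()
    for variants in call_sets.values():
        support.update(set(variants))  # one vote per caller per variant
    if not support:
        return 0.0
    shared = sum(1 for n in support.values() if n >= 2)
    return shared / len(support)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy data, invented for illustration only (not taken from the study).
calls = {
    "Varscan": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "MuTect2": {("chr2", 200, "G", "C"), ("chr3", 300, "C", "T")},
    "Strelka": {("chr4", 400, "T", "G")},
}
fraction = multi_caller_fraction(calls)  # 1 of 4 distinct candidates is shared
```

In practice, `pearson` would be applied to the per-sample mutation counts of two callers, one value per tumor-normal pair.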
Multiple features used for gauging call sets produced by four sSNV callers.
| Method | Varscan | SomaticSniper | Strelka | MuTect2 |
|---|---|---|---|---|
| Whole exome sequencing, WES (32 samples) | | | | |
| Total variants | 7957 | 9826 | 6441 | 4952 |
| HAAIC variants | 2156 (27.1%) | 2319 (23.6%) | 380 (5.9%) | 228 (4.6%) |
| Variants with strand bias | 286 (3.6%) | 20 (0.2%) | 39 (0.6%) | 10 (0.2%) |
| Variants with >2 observations in controls | 4042 (50.8%) | 7006 (71.3%) | 715 (11.1%) | 560 (11.3%) |
| Variants present in dbSNP | 3724 (46.8%) | 6976 (71.0%) | 1024 (15.9%) | 738 (14.9%) |
| Variants present in COSMIC | 374 (4.7%) | 452 (4.6%) | 309 (4.8%) | 352 (7.1%) |
| Targeted sequencing, UDT-Seq (51 samples) | | | | |
| Total variants | 971 | 1699 | 2128 | 1295 |
| HAAIC variants | 179 (18.4%) | 227 (13.4%) | 298 (14.0%) | 191 (14.7%) |
| Variants with strand bias | 43 (4.4%) | 12 (0.7%) | 62 (2.9%) | 13 (1.0%) |
| Variants with >2 observations in controls | 415 (42.7%) | 1139 (67.0%) | 811 (38.1%) | 480 (37.1%) |
| Variants present in dbSNP | 396 (40.8%) | 1151 (67.7%) | 864 (40.6%) | 497 (38.4%) |
| Variants present in COSMIC | 58 (6.0%) | 91 (5.4%) | 174 (8.2%) | 171 (13.2%) |
(a) Phred-scaled p-value > 30.
(b) Variants present in both COSMIC and dbSNP are counted only as COSMIC entries, since dbSNP holds only a small fraction of somatic mutations.
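The database-precedence rule in the footnote, and the way each percentage in the table relates to the caller's total, can be sketched as below. The function names and the "unannotated" label are assumptions for illustration; the source does not describe its implementation.

```python
def database_label(in_cosmic, in_dbsnp):
    """Assign one database label per variant, giving COSMIC precedence.

    A variant found in both databases is counted as COSMIC only,
    following the footnote rule. "unannotated" is a hypothetical label
    for variants found in neither database.
    """
    if in_cosmic:
        return "COSMIC"
    if in_dbsnp:
        return "dbSNP"
    return "unannotated"


def percent_of_total(count, total):
    """Share of a caller's total variants, as a percentage with one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 1)


# Reproduces the MuTect2 WES COSMIC entry from the table: 352 of 4952 variants.
mutect2_wes_cosmic = percent_of_total(352, 4952)
```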
Figure 3. Depth-VAF scatter plot of SNV candidates in WES (A) and UDT-Seq (B). The Y-axis of each graph shows the read depth of a variant in the tumor tissue, and the X-axis shows the VAF of each SNV candidate. Triangles mark SNV candidates with HAAIC. Colors indicate the caller that detected each candidate.
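For readers unfamiliar with the two axes, the following sketch shows how a (depth, VAF) point could be derived from tumor read counts. It assumes depth is the sum of reference-supporting and alternate-supporting reads, which ignores reads carrying other alleles; the function name is hypothetical and the callers may count reads differently.

```python
def depth_and_vaf(ref_reads, alt_reads):
    """Return (read depth, variant allele frequency) for one tumor site.

    Assumption: depth = ref_reads + alt_reads, and VAF = alt_reads / depth.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("VAF is undefined at zero depth")
    return depth, alt_reads / depth


# Toy site, invented for illustration: 40 reference reads, 10 alternate reads.
point = depth_and_vaf(40, 10)
```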
Sanger sequencing validation for SNV candidates.
| Item | Varscan | SomaticSniper | Strelka | MuTect2 | SNVs for Sanger resequencing |
|---|---|---|---|---|---|
| WES | 23 (35.4%) | 21 (32.3%) | 47 (72.3%) | 58 (89.2%) | 65 |
| Targeted sequencing | 34 (50.0%) | 31 (45.6%) | 60 (88.2%) | 64 (94.1%) | 68 |
Comparison between candidates identified by the four callers and those by Cake.
| Method | Total variants | COSMIC* | dbSNP | HAAIC | Mean depth | Mean VAF |
|---|---|---|---|---|---|---|
| WES | 22052 | 1125 (5.1%) | 11004 (49.9%) | 4896 (22.2%) | 53.21 | 0.33 |
| WES after Cake | 2691 | 180 (6.7%) | 280 (10.4%) | 105 (3.9%) | 35.16 | 0.28 |
| Targeted sequencing | 4642 | 281 (6.1%) | 2505 (54.0%) | 842 (18.1%) | 192.77 | 0.29 |
| Targeted sequencing after Cake | 227 | 50 (22.0%) | 44 (19.4%) | 9 (4.0%) | 367.06 | 0.28 |
Figure 4. The intersection of SNVs obtained from WES and UDT-Seq data.
The dots represent SNV candidates from the 23 tumor-normal tissue pairs that underwent both WES and UDT-Seq. sUDT-Seq: SNVs detected in UDT-Seq data that are shared with WES; sWES: SNVs detected in WES data that are shared with UDT-Seq; UDT-Seq: SNVs detected only in UDT-Seq data; WES: SNVs detected only in WES data.
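The grouping used in Figure 4 amounts to a set partition of the calls from the two platforms. A minimal sketch, assuming each SNV is identified by a hashable key such as `(chrom, pos, ref, alt)` and that both call sets come from the same sample; the function name and toy calls are hypothetical.

```python
def split_by_platform(wes_calls, udt_calls):
    """Partition SNV keys into shared, WES-only and UDT-Seq-only groups.

    The "shared" group corresponds to both sWES and sUDT-Seq in the figure,
    which describe the same sites viewed from each platform.
    """
    wes, udt = set(wes_calls), set(udt_calls)
    return {
        "shared": wes & udt,
        "WES": wes - udt,
        "UDT-Seq": udt - wes,
    }


# Toy calls, invented for illustration only.
wes_example = [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")]
udt_example = [("chr2", 200, "G", "C"), ("chr3", 300, "C", "T")]
groups = split_by_platform(wes_example, udt_example)
```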
SNV candidates detected by four callers based on WES and UDT-Seq data in 23 samples.
| Methods | Total sites | COSMIC | dbSNP | SNVs with >2 observations in controls | HAAIC |
|---|---|---|---|---|---|
| WES | 182 | 10 (5.6%) | 43 (23.6%) | 60 (33.0%) | 27 (15.1%) |
| Targeted sequencing | 2175 | 680 (31.3%) | 1067 (49.1%) | 1081 (49.7%) | 441 (20.3%) |
(a) WES SNV candidates are restricted to gene regions covered by UDT-Seq.
(b) Variants present in both COSMIC and dbSNP are counted only as COSMIC entries.
(c) HAAIC is the abbreviation for High-Alternate-Allele-In-Control.