Zixi Chen1, Yuchen Yuan1, Xiaoshi Chen1, Jiayun Chen1, Shudai Lin1, Xingsong Li1, Hongli Du2.
Abstract
In the past decade, tumor treatment has made remarkable progress, notably through the successful clinical application of targeted therapies. Targeted therapies today rely primarily on the detection of mutations, and next-generation sequencing (NGS) plays an important role in the relevant clinical research. Low mutation frequency is a major challenge in tumor mutation detection, and increasing sequencing depth is a widely used way to improve mutation calling performance. It is therefore necessary to evaluate the effects of sequencing depth, mutation frequency, and the choice of mutation calling tool. In this study, Strelka2 and Mutect2 were evaluated on 30 combinations of sequencing depth and mutation frequency. Precision remained above 95% in most of the samples. In general, for higher mutation frequencies (≥20%), a sequencing depth of ≥200X is sufficient to call 95% of mutations; for lower mutation frequencies (≤10%), we recommend improving the experimental method rather than increasing sequencing depth. Although Strelka2 and Mutect2 performed similarly, Strelka2 performed slightly better at higher mutation frequencies (≥20%), while Mutect2 performed better when the mutation frequency was lower than 10%. In addition, Strelka2 was 17 to 22 times faster than Mutect2 on average. Through a systematic performance comparison across sequencing depths and mutation frequencies, this work provides a useful and comprehensive guideline for clinical genomic research on somatic mutation identification.
Year: 2020 PMID: 32103116 PMCID: PMC7044309 DOI: 10.1038/s41598-020-60559-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Workflow of the experimental design. Two DNA samples were sequenced and the reads were mapped to the hg19 reference genome. A germline mutation calling pipeline was then run to obtain the true mutation set; the high-depth data were downsampled and mixed to simulate tumor samples, which were then run through the somatic mutation calling pipeline. The somatic calls were compared with the true mutation set and visualized for further discussion.
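The comparison of somatic calls against the true mutation set can be illustrated with a minimal sketch. It assumes that each variant is identified by a `(chrom, pos, ref, alt)` tuple and that the F-score is the balanced F1 score; neither detail is stated in the text here, and the function name `evaluate_calls` is hypothetical.

```python
def evaluate_calls(called, truth):
    """Return (precision, recall, f_score) for a call set against a truth set.

    Assumptions (not from the source): variants are hashable keys such as
    (chrom, pos, ref, alt), and the F-score is the balanced F1 score.
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but not called

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example with made-up variants: three of four calls are true,
# and three of four true variants are recovered.
truth = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
         ("chr2", 300, "C", "A"), ("chr2", 400, "T", "G")]
called = [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
          ("chr2", 300, "C", "A"), ("chr3", 500, "G", "A")]
precision, recall, f_score = evaluate_calls(called, truth)
```

Exact-match comparison on the variant tuple is the simplest choice; a real evaluation may need to normalize variant representation first.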
Detailed information on the sequencing data.
| Sample | Total reads (M) | Clean reads (M) | Total bases (G) | Q20 bases (G) | Q30 bases (G) | Q20 (%) | Q30 (%) | GC content | Duplication rate | Average depth (duplicates removed) |
|---|---|---|---|---|---|---|---|---|---|---|
| NA12878 | 635.17 | 614.43 | 95.28 | 91.12 | 86.89 | 95.64 | 91.20 | 0.50 | 0.19 | 819.96 |
| YH-1 | 298.25 | 292.09 | 44.74 | 42.76 | 41.63 | 95.57 | 93.05 | 0.49 | 0.16 | 411.1 |
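
The average depths in the table set the limits for downsampling and mixing. The sketch below shows how per-sample read-retention fractions could be worked out for a target total depth. It assumes a simple linear model in which retaining a fraction of reads scales depth by the same fraction; the function name, the roles of "sample A" and "sample B", and the share parameter are hypothetical and do not describe the authors' actual procedure.

```python
def mixing_fractions(target_depth, share_a, depth_a, depth_b):
    """Fractions of reads to keep from two samples to reach target_depth.

    share_a is the proportion of the final depth contributed by sample A;
    sample B contributes the remainder. Assumes depth scales linearly with
    the fraction of reads retained (an illustrative simplification).
    """
    if not 0.0 <= share_a <= 1.0:
        raise ValueError("share_a must be between 0 and 1")
    frac_a = target_depth * share_a / depth_a
    frac_b = target_depth * (1.0 - share_a) / depth_b
    if frac_a > 1.0 or frac_b > 1.0:
        raise ValueError("target depth exceeds the available depth")
    return frac_a, frac_b


# Average depths taken from the table above; which sample plays which
# role is an assumption made only for this example.
DEPTH_NA12878 = 819.96
DEPTH_YH1 = 411.1

frac_a, frac_b = mixing_fractions(200, 0.2, DEPTH_NA12878, DEPTH_YH1)
```

The resulting fractions could then be passed to any read-subsampling tool before merging the two downsampled files.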
Figure 2. Precision-recall (P-R) curves of Strelka2 and Mutect2 for different mutation frequencies and sequencing depths, shown for replicate group 1. Colors represent different sequencing depths; dotted lines represent Strelka2 and solid lines represent Mutect2.
Figure 3. F-score box-scatter plot. Colors represent different mutation frequencies (a) and sequencing depths (b).
Figure 4. Software running time. Running time of Strelka2 and Mutect2 for each sample.