Yangyang Hao, Xiaoling Xuei, Lang Li, Harikrishna Nakshatri, Howard J Edenberg, Yunlong Liu.
Abstract
Accurate identification of low-frequency somatic point mutations in tumor samples has important clinical utility. Although high-throughput sequencing technology can capture such variants when sequencing primary tumor samples, our ability to detect them accurately is compromised when the variant frequency is close to the sequencer error rate. Most current experimental and bioinformatic strategies target mutations with ≥5% allele frequency, which limits our ability to understand cancer etiology and tumor evolution. We present an experimental and computational modeling framework, RareVar, to reliably identify low-frequency single-nucleotide variants from high-throughput sequencing data under standard experimental protocols. The RareVar protocol includes a benchmark design that pools DNA from already sequenced individuals at various concentrations to target variants at desired frequencies (0.5%-3% in our case). By applying a position-specific error model based on a generalized linear model, followed by machine-learning-based variant calibration, our approach outperforms existing methods. Our method can be applied on most capture and sequencing platforms without modifying the experimental protocol.
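The pooling idea in the benchmark design can be illustrated with simple arithmetic: if each individual's genotype is known and each contributes a known share of the pooled DNA, the expected allele frequency of a variant in the pool follows directly. The sketch below is only an illustration under those assumptions; the function name `expected_vaf`, the pool fractions, and the genotypes are hypothetical and are not taken from the RareVar protocol.

```python
def expected_vaf(pool_fractions, genotypes):
    """Expected variant allele frequency at one site in a DNA pool.

    pool_fractions: share of the pooled DNA contributed by each
        individual (assumed to sum to 1).
    genotypes: number of variant alleles carried by each diploid
        individual at this site (0, 1, or 2).
    """
    if len(pool_fractions) != len(genotypes):
        raise ValueError("need one genotype per pooled individual")
    if abs(sum(pool_fractions) - 1.0) > 1e-9:
        raise ValueError("pool fractions must sum to 1")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1, or 2 variant alleles")
    # A diploid individual with g variant alleles contributes g/2 of its
    # DNA share as variant-carrying copies.
    return sum(f * g / 2 for f, g in zip(pool_fractions, genotypes))


# Hypothetical pool: a background individual at 97%, a heterozygous
# carrier at 2%, and a homozygous carrier at 1%.
fractions = [0.97, 0.02, 0.01]
genotypes = [0, 1, 2]
print(f"expected VAF: {expected_vaf(fractions, genotypes):.3%}")
```

Under this simplification, a heterozygous variant private to an individual who makes up 1% of the pool would be expected at 0.5%, the lower end of the range the abstract targets.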
Keywords: low frequency SNVs; machine learning; next-generation sequencing; sequencing error modeling; somatic mutation.
Year: 2017 PMID: 28541743 PMCID: PMC5510701 DOI: 10.1089/cmb.2017.0057
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479