
A novel machine learning approach (svmSomatic) to distinguish somatic and germline mutations using next-generation sequencing data.

Yu-Fang Mao, Xi-Guo Yuan, Yu-Peng Cun.

Abstract

Somatic mutations are a large category of genetic variation and play an essential role in tumorigenesis. Detecting somatic single nucleotide variants (SNVs) can facilitate downstream analysis of tumorigenesis. Many computational methods have been developed to detect SNVs, but most require matched normal samples to differentiate somatic SNVs from the normal state, and such samples can be difficult to obtain. Therefore, developing new approaches for detecting somatic SNVs without matched samples is crucial. In this work, we detected somatic mutations from individual tumor samples using a novel machine learning approach, svmSomatic, applied to next-generation sequencing (NGS) data. In addition, because somatic SNV detection can be affected by other co-occurring mutations, with germline mutations and co-occurring copy number variations (CNVs) being common in organisms, we used the approach to distinguish somatic from germline mutations based on NGS data from individual tumor samples. In summary, svmSomatic (1) considers the influence of CNV co-occurrence in detecting somatic mutations and (2) trains a support vector machine algorithm to distinguish between somatic and germline mutations without requiring matched normal samples. We further tested svmSomatic and compared it with other common methods. Results showed that svmSomatic's performance, as measured by F1-score, was significantly better than that of the other methods on both simulated and real NGS data.
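
To illustrate why co-occurring CNVs matter when separating somatic from germline variants, and how an F1-score is computed, here is a minimal Python sketch. It is not the svmSomatic implementation. The function names (`expected_vaf`, `f1_score`) and the diploid-normal, heterozygous-germline model are assumptions made for illustration; the abstract does not specify which features svmSomatic uses.

```python
def expected_vaf(purity, tumor_cn, mutant_copies, germline):
    """Expected variant allele frequency (VAF) in an impure tumor sample.

    Illustrative assumptions (not taken from the paper):
      - normal cells are diploid (copy number 2);
      - a germline variant is heterozygous, so each normal cell
        carries exactly one variant copy;
      - a somatic variant is absent from normal cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= mutant_copies <= tumor_cn:
        raise ValueError("mutant_copies must be between 0 and tumor_cn")
    # Average number of allele copies per cell at this locus.
    total_copies = purity * tumor_cn + (1.0 - purity) * 2
    if total_copies == 0:
        raise ValueError("locus is fully deleted; VAF is undefined")
    # Variant copies contributed by tumor cells.
    variant_copies = purity * mutant_copies
    if germline:
        # Normal cells also carry one variant copy.
        variant_copies += (1.0 - purity) * 1
    return variant_copies / total_copies


def f1_score(truth, predicted):
    """F1-score for binary labels (True/1 = somatic, False/0 = germline)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Copy-neutral locus at 60% purity: the two classes separate clearly.
    print(expected_vaf(0.6, 2, 1, germline=False))
    print(expected_vaf(0.6, 2, 1, germline=True))
    # Single-copy gain: a somatic variant on the gained allele and a
    # germline variant on the other allele have much closer VAFs.
    print(expected_vaf(0.6, 3, 2, germline=False))
    print(expected_vaf(0.6, 3, 1, germline=True))
```

Under these assumptions, a copy-number gain moves the expected somatic and germline frequencies closer together, which is one plausible reason a classifier would benefit from copy-number information in addition to allele frequency.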

Keywords:  Copy number variants; Germline mutation; Next-generation sequencing; Single nucleotide variations; Somatic mutation; Support vector machine

Year:  2021        PMID: 33709636      PMCID: PMC7995270          DOI: 10.24272/j.issn.2095-8137.2021.014

Source DB:  PubMed          Journal:  Zool Res        ISSN: 2095-8137


