
A support vector machine for identification of single-nucleotide polymorphisms from next-generation sequencing data.

Brendan D O'Fallon, Whitney Wooderchak-Donahue, David K Crockett.

Abstract

MOTIVATION: Accurate determination of single-nucleotide polymorphisms (SNPs) from next-generation sequencing data is a significant challenge facing bioinformatics researchers. Most current methods use mechanistic models that assume nucleotides aligning to a given reference position are sampled from a binomial distribution. While such methods are sensitive, they are often unable to discriminate errors resulting from misaligned reads, sequencing errors or platform artifacts from true variants.
RESULTS: To enable more accurate SNP calling, we developed an algorithm that uses a trained support vector machine (SVM) to determine variants from .BAM or .SAM formatted alignments of sequence reads. Our SVM-based implementation determines SNPs with significantly greater sensitivity and specificity than alternative platforms, including the UnifiedGenotyper included with the Genome Analysis Toolkit, samtools and FreeBayes. In addition, the quality scores produced by our implementation more accurately reflect the likelihood that a variant is real when compared with those produced by the Genome Analysis Toolkit. While results depend on the model used, the implementation includes tools to easily build new models and refine existing models with additional training data.
AVAILABILITY: Source code and executables are available from github.com/brendanofallon/SNPSVM/
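
As an illustration of the binomial assumption mentioned under MOTIVATION, the sketch below scores three diploid genotypes from the allele counts at one reference position. It is a minimal, generic example and not the method of this paper or of any tool named above; the function names and the per-base error rate of 0.01 are assumptions chosen for illustration.

```python
from math import comb


def genotype_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Binomial likelihood of the observed allele counts under three genotypes.

    error_rate is an assumed per-base error probability (hypothetical value).
    """
    depth = ref_count + alt_count
    # Probability that a single read shows the alternate allele, per genotype.
    alt_prob = {
        "hom_ref": error_rate,
        "het": 0.5,
        "hom_alt": 1.0 - error_rate,
    }
    return {
        genotype: comb(depth, alt_count) * p ** alt_count * (1.0 - p) ** ref_count
        for genotype, p in alt_prob.items()
    }


def call_genotype(ref_count, alt_count, error_rate=0.01):
    """Return the genotype with the highest binomial likelihood."""
    likelihoods = genotype_likelihoods(ref_count, alt_count, error_rate)
    return max(likelihoods, key=likelihoods.get)


if __name__ == "__main__":
    for ref, alt in [(20, 0), (10, 10), (0, 20)]:
        print(ref, alt, call_genotype(ref, alt))
```

A model of this kind sees only the two counts, so a few mismatches contributed by misaligned reads are scored exactly like the same number of mismatches from a real variant. That limitation is the one the abstract attributes to mechanistic callers.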

Year:  2013        PMID: 23620357     DOI: 10.1093/bioinformatics/btt172

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  14 in total

1.  A gradient-boosting approach for filtering de novo mutations in parent-offspring trios.

Authors:  Yongzhuang Liu; Bingshan Li; Renjie Tan; Xiaolin Zhu; Yadong Wang
Journal:  Bioinformatics       Date:  2014-03-10       Impact factor: 6.937

2.  Genome-Wide Noninvasive Prenatal Diagnosis of De Novo Mutations.

Authors:  Ravit Peretz-Machluf; Tom Rabinowitz; Noam Shomron
Journal:  Methods Mol Biol       Date:  2021

3.  Characterization of the complete mitochondrial genome of flower-breeding Drosophila incompta (Diptera, Drosophilidae).

Authors:  F C De Ré; G L Wallau; L J Robe; E L S Loreto
Journal:  Genetica       Date:  2014-11-22       Impact factor: 1.082

4.  Next Generation Sequencing and Bioinformatics Analysis of Family Genetic Inheritance. (Review)

Authors:  Aquillah M Kanzi; James Emmanuel San; Benjamin Chimukangara; Eduan Wilkinson; Maryam Fish; Veron Ramsuran; Tulio de Oliveira
Journal:  Front Genet       Date:  2020-10-23       Impact factor: 4.599

5.  VariantMetaCaller: automated fusion of variant calling pipelines for quantitative, precision-based filtering.

Authors:  András Gézsi; Bence Bolgár; Péter Marx; Peter Sarkozy; Csaba Szalai; Péter Antal
Journal:  BMC Genomics       Date:  2015-10-28       Impact factor: 3.969

6.  Sequence-based identification of recombination spots using pseudo nucleic acid representation and recursive feature extraction by linear kernel SVM.

Authors:  Liqi Li; Sanjiu Yu; Weidong Xiao; Yongsheng Li; Lan Huang; Xiaoqi Zheng; Shiwen Zhou; Hua Yang
Journal:  BMC Bioinformatics       Date:  2014-11-20       Impact factor: 3.169

7.  Comprehensive variation discovery in single human genomes.

Authors:  Neil I Weisenfeld; Shuangye Yin; Ted Sharpe; Bayo Lau; Ryan Hegarty; Laurie Holmes; Brian Sogoloff; Diana Tabbaa; Louise Williams; Carsten Russ; Chad Nusbaum; Eric S Lander; Iain MacCallum; David B Jaffe
Journal:  Nat Genet       Date:  2014-10-19       Impact factor: 38.330

8.  Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine.

Authors:  Rui Mao; Praveen Kumar Raj Kumar; Cheng Guo; Yang Zhang; Chun Liang
Journal:  PLoS One       Date:  2014-08-11       Impact factor: 3.240

9.  Protein sub-nuclear localization prediction using SVM and Pfam domain information.

Authors:  Ravindra Kumar; Sohni Jain; Bandana Kumari; Manish Kumar
Journal:  PLoS One       Date:  2014-06-04       Impact factor: 3.240

10.  A Comparison of Variant Calling Pipelines Using Genome in a Bottle as a Reference.

Authors:  Adam Cornish; Chittibabu Guda
Journal:  Biomed Res Int       Date:  2015-10-11       Impact factor: 3.411
