A study on fast calling variants from next-generation sequencing data using decision tree.

Zhentang Li, Yi Wang, Fei Wang.

Abstract

BACKGROUND: Rapid advances in next-generation sequencing (NGS) technology continue to raise the throughput of sequencing data. However, analyzing NGS data, especially low-coverage data, remains challenging for lack of a tool that is both fast and accurate.
RESULTS: We propose a decision-tree based variant calling algorithm. Experiments on a set of real data indicate that the algorithm achieves high accuracy and sensitivity for SNVs and indels and adapts well to low-coverage data. In particular, it was clearly faster than 3 widely used tools in our experiments.
CONCLUSIONS: We implemented the algorithm in a software tool named Fuwa and applied it, together with 4 well-known variant callers (Platypus, GATK-UnifiedGenotyper, GATK-HaplotypeCaller and SAMtools), to three sequencing data sets from the well-studied sample NA12878, produced by whole-genome, whole-exome and low-coverage whole-genome sequencing, respectively. We also conducted additional experiments on whole-genome sequencing (WGS) data from 4 newly released samples that have not been used to populate dbSNP.
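
The abstract does not describe the tree itself. Purely as an illustration of what decision-tree based calling can look like, the sketch below walks a small hand-written tree over three per-site features. The `Site` fields, the `call_site` function, the labels and every threshold are assumptions invented for this example; none of them are taken from Fuwa.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Summary of the reads covering one genomic position (hypothetical features)."""
    depth: int             # number of reads covering the site
    alt_count: int         # reads supporting a non-reference allele
    mean_base_qual: float  # mean Phred quality of the alt-supporting bases


def call_site(site: Site) -> str:
    """Classify a site by walking a fixed decision tree.

    Every threshold here is an illustrative assumption, not a value
    from the paper or from the Fuwa software.
    """
    if site.depth == 0:
        return "no-call"   # no reads, nothing to decide
    if site.alt_count < 2 or site.mean_base_qual < 20.0:
        return "ref"       # too little trustworthy evidence for a variant
    alt_fraction = site.alt_count / site.depth
    if alt_fraction < 0.2:
        return "ref"       # alt reads are likely sequencing noise
    if alt_fraction < 0.8:
        return "het"       # roughly balanced ref/alt support
    return "hom-alt"       # nearly all reads support the alt allele


if __name__ == "__main__":
    # 14 of 30 reads support the alt allele: fraction is about 0.47
    print(call_site(Site(depth=30, alt_count=14, mean_base_qual=35.0)))
```

Each call costs only a few comparisons per site, which hints at why a tree-structured caller can be fast; a real caller would learn its splits from training data rather than hard-code them.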

Keywords:  Decision tree; Next-generation sequencing; Variant calling

Year:  2018        PMID: 29673316      PMCID: PMC5907718          DOI: 10.1186/s12859-018-2147-9

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  2 in total

1.  Adapting Genotyping-by-Sequencing and Variant Calling for Heterogeneous Stock Rats.

Authors:  Alexander F Gileta; Jianjun Gao; Apurva S Chitre; Hannah V Bimschleger; Celine L St Pierre; Shyam Gopalakrishnan; Abraham A Palmer
Journal:  G3 (Bethesda)       Date:  2020-07-07       Impact factor: 3.154

2.  BITS2019: the sixteenth annual meeting of the Italian society of bioinformatics.

Authors:  Alfonso Urso; Antonino Fiannaca; Massimo La Rosa; Laura La Paglia; Giosue' Lo Bosco; Riccardo Rizzo
Journal:  BMC Bioinformatics       Date:  2020-09-16       Impact factor: 3.169
