
Consensus Genotyper for Exome Sequencing (CGES): improving the quality of exome variant genotypes.

Vassily Trubetskoy, Alex Rodriguez, Uptal Dave, Nicholas Campbell, Emily L Crawford, Edwin H Cook, James S Sutcliffe, Ian Foster, Ravi Madduri, Nancy J Cox, Lea K Davis.

Abstract

MOTIVATION: The development of cost-effective next-generation sequencing methods has spurred the development of high-throughput bioinformatics tools for detection of sequence variation. With many disparate variant-calling algorithms available, investigators must ask, 'Which method is best for my data?' Machine learning research has shown that so-called ensemble methods that combine the output of multiple models can dramatically improve classifier performance. Here we describe a novel variant-calling approach based on an ensemble of variant-calling algorithms, which we term the Consensus Genotyper for Exome Sequencing (CGES). CGES uses a two-stage voting scheme among four algorithm implementations. While our ensemble method can accept variants generated by any variant-calling algorithm, we used GATK2.8, SAMtools, FreeBayes and Atlas-SNP2 in building CGES because of their performance, widespread adoption and diverse but complementary algorithms.
RESULTS: We apply CGES to 132 samples sequenced at the Hudson Alpha Institute for Biotechnology (HAIB, Huntsville, AL) using the Nimblegen Exome Capture and Illumina sequencing technology. Our sample set consisted of 40 complete trios, two families of four, one parent-child duo and two unrelated individuals. CGES yielded the fewest total variant calls (N(CGES) = 139,897), the highest Ts/Tv ratio (3.02), the lowest Mendelian error rate across all genotypes (0.028%), the highest rediscovery rate from the Exome Variant Server (EVS; 89.3%) and 1000 Genomes (1KG; 84.1%) and the highest positive predictive value (PPV; 96.1%) for a random sample of previously validated de novo variants. We describe these and other quality control (QC) metrics from consensus data and explain how the CGES pipeline can be used to generate call sets of varying quality stringency, including consensus calls present across all four algorithms, calls that are consistent across any three out of four algorithms, calls that are consistent across any two out of four algorithms or a more liberal set of all calls made by any algorithm.
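The tiered call sets described above (four of four, three of four, two of four, or any caller) can be illustrated with a simple vote count. The following is a minimal sketch and not the CGES implementation: the abstract does not specify the details of the two-stage voting scheme, so the function name `consensus_calls`, the caller labels, the call-key layout and the example genotypes are all hypothetical.

```python
from collections import Counter

def consensus_calls(callsets, min_agree):
    """Return the (site, genotype) pairs reported by at least `min_agree` callers.

    `callsets` maps a caller label to a dict of {site_key: genotype}.
    Hypothetical layout: site_key = (sample, chrom, pos, ref, alt).
    """
    votes = Counter()
    for calls in callsets.values():
        # Each caller contributes one vote per (site, genotype) pair it reports.
        votes.update(calls.items())
    return {call for call, n in votes.items() if n >= min_agree}

# Illustrative (made-up) calls from four placeholder callers.
site_100 = ("s1", "chr1", 100, "A", "G")
site_200 = ("s1", "chr1", 200, "C", "T")
site_300 = ("s1", "chr1", 300, "G", "A")
callsets = {
    "caller_a": {site_100: "0/1", site_200: "0/1"},
    "caller_b": {site_100: "0/1", site_200: "1/1"},
    "caller_c": {site_100: "0/1"},
    "caller_d": {site_100: "0/1", site_300: "0/1"},
}

strict = consensus_calls(callsets, min_agree=4)   # all four callers agree
liberal = consensus_calls(callsets, min_agree=1)  # any caller
```

Here only the call at position 100 survives the strictest threshold, while the liberal set retains every reported pair, including the two discordant genotypes at position 200. Returning (site, genotype) pairs rather than one genotype per site avoids having to break ties in this sketch.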
AVAILABILITY AND IMPLEMENTATION: To enable accessible, efficient and reproducible analysis, we implement CGES both as a stand-alone command line tool available for download from GitHub and as a set of Galaxy tools and workflows configured to execute on parallel computers.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

Year:  2014        PMID: 25270638      PMCID: PMC4287941          DOI: 10.1093/bioinformatics/btu591

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  20 in total

1.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

2.  VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing.

Authors:  Daniel C Koboldt; Qunyuan Zhang; David E Larson; Dong Shen; Michael D McLellan; Ling Lin; Christopher A Miller; Elaine R Mardis; Li Ding; Richard K Wilson
Journal:  Genome Res       Date:  2012-02-02       Impact factor: 9.043

3.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

4.  Patterns and rates of exonic de novo mutations in autism spectrum disorders.

Authors:  Benjamin M Neale; Yan Kou; Li Liu; Avi Ma'ayan; Kaitlin E Samocha; Aniko Sabo; Chiao-Feng Lin; Christine Stevens; Li-San Wang; Vladimir Makarov; Paz Polak; Seungtai Yoon; Jared Maguire; Emily L Crawford; Nicholas G Campbell; Evan T Geller; Otto Valladares; Chad Schafer; Han Liu; Tuo Zhao; Guiqing Cai; Jayon Lihm; Ruth Dannenfelser; Omar Jabado; Zuleyma Peralta; Uma Nagaswamy; Donna Muzny; Jeffrey G Reid; Irene Newsham; Yuanqing Wu; Lora Lewis; Yi Han; Benjamin F Voight; Elaine Lim; Elizabeth Rossin; Andrew Kirby; Jason Flannick; Menachem Fromer; Khalid Shakir; Tim Fennell; Kiran Garimella; Eric Banks; Ryan Poplin; Stacey Gabriel; Mark DePristo; Jack R Wimbish; Braden E Boone; Shawn E Levy; Catalina Betancur; Shamil Sunyaev; Eric Boerwinkle; Joseph D Buxbaum; Edwin H Cook; Bernie Devlin; Richard A Gibbs; Kathryn Roeder; Gerard D Schellenberg; James S Sutcliffe; Mark J Daly
Journal:  Nature       Date:  2012-04-04       Impact factor: 49.962

5.  A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Authors:  Mark A DePristo; Eric Banks; Ryan Poplin; Kiran V Garimella; Jared R Maguire; Christopher Hartl; Anthony A Philippakis; Guillermo del Angel; Manuel A Rivas; Matt Hanna; Aaron McKenna; Tim J Fennell; Andrew M Kernytsky; Andrey Y Sivachenko; Kristian Cibulskis; Stacey B Gabriel; David Altshuler; Mark J Daly
Journal:  Nat Genet       Date:  2011-04-10       Impact factor: 38.330

6.  Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences.

Authors:  Jeremy Goecks; Anton Nekrutenko; James Taylor
Journal:  Genome Biol       Date:  2010-08-25       Impact factor: 13.583

7.  The variant call format and VCFtools.

Authors:  Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal:  Bioinformatics       Date:  2011-06-07       Impact factor: 6.937

8.  Analysis of rare, exonic variation amongst subjects with autism spectrum disorders and population controls.

Authors:  Li Liu; Aniko Sabo; Benjamin M Neale; Uma Nagaswamy; Christine Stevens; Elaine Lim; Corneliu A Bodea; Donna Muzny; Jeffrey G Reid; Eric Banks; Hillary Coon; Mark Depristo; Huyen Dinh; Tim Fennel; Jason Flannick; Stacey Gabriel; Kiran Garimella; Shannon Gross; Alicia Hawes; Lora Lewis; Vladimir Makarov; Jared Maguire; Irene Newsham; Ryan Poplin; Stephan Ripke; Khalid Shakir; Kaitlin E Samocha; Yuanqing Wu; Eric Boerwinkle; Joseph D Buxbaum; Edwin H Cook; Bernie Devlin; Gerard D Schellenberg; James S Sutcliffe; Mark J Daly; Richard A Gibbs; Kathryn Roeder
Journal:  PLoS Genet       Date:  2013-04-11       Impact factor: 5.917

9.  Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing.

Authors:  Jason O'Rawe; Tao Jiang; Guangqing Sun; Yiyang Wu; Wei Wang; Jingchu Hu; Paul Bodily; Lifeng Tian; Hakon Hakonarson; W Evan Johnson; Zhi Wei; Kai Wang; Gholson J Lyon
Journal:  Genome Med       Date:  2013-03-27       Impact factor: 11.117

10.  Comparing a few SNP calling algorithms using low-coverage sequencing data.

Authors:  Xiaoqing Yu; Shuying Sun
Journal:  BMC Bioinformatics       Date:  2013-09-17       Impact factor: 3.169

  8 in total

1.  Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project.

Authors:  Adam C Naj; Honghuang Lin; Badri N Vardarajan; Simon White; Daniel Lancour; Yiyi Ma; Michael Schmidt; Fangui Sun; Mariusz Butkiewicz; William S Bush; Brian W Kunkle; John Malamon; Najaf Amin; Seung Hoan Choi; Kara L Hamilton-Nelson; Sven J van der Lee; Namrata Gupta; Daniel C Koboldt; Mohamad Saad; Bowen Wang; Alejandro Q Nato; Harkirat K Sohi; Amanda Kuzma; Li-San Wang; L Adrienne Cupples; Cornelia van Duijn; Sudha Seshadri; Gerard D Schellenberg; Eric Boerwinkle; Joshua C Bis; Josée Dupuis; William J Salerno; Ellen M Wijsman; Eden R Martin; Anita L DeStefano
Journal:  Genomics       Date:  2018-05-29       Impact factor: 5.736

2.  A case study for cloud based high throughput analysis of NGS data using the globus genomics system.

Authors:  Krithika Bhuvaneshwar; Dinanath Sulakhe; Robinder Gauba; Alex Rodriguez; Ravi Madduri; Utpal Dave; Lukasz Lacinski; Ian Foster; Yuriy Gusev; Subha Madhavan
Journal:  Comput Struct Biotechnol J       Date:  2014-11-07       Impact factor: 7.271

3.  Rare genetic variants in the endocannabinoid system genes CNR1 and DAGLA are associated with neurological phenotypes in humans.

Authors:  Douglas R Smith; Christine M Stanley; Theodore Foss; Richard G Boles; Kevin McKernan
Journal:  PLoS One       Date:  2017-11-16       Impact factor: 3.240

4.  CoVaCS: a consensus variant calling system.

Authors:  Matteo Chiara; Silvia Gioiosa; Giovanni Chillemi; Mattia D'Antonio; Tiziano Flati; Ernesto Picardi; Federico Zambelli; David Stephen Horner; Graziano Pesole; Tiziana Castrignanò
Journal:  BMC Genomics       Date:  2018-02-05       Impact factor: 3.969

5.  AMLVaran: a software approach to implement variant analysis of targeted NGS sequencing data in an oncological care setting.

Authors:  Christian Wünsch; Henrik Banck; Carsten Müller-Tidow; Martin Dugas
Journal:  BMC Med Genomics       Date:  2020-02-04       Impact factor: 3.063

6.  Molecular genetic testing strategies used in diagnostic flow for hereditary endocrine tumour syndromes. (Review)

Authors:  Henriett Butz; Jo Blair; Attila Patócs
Journal:  Endocrine       Date:  2021-02-11       Impact factor: 3.633

7.  UPS-indel: a Universal Positioning System for Indels.

Authors:  Mohammad Shabbir Hasan; Xiaowei Wu; Layne T Watson; Liqing Zhang
Journal:  Sci Rep       Date:  2017-10-26       Impact factor: 4.379

8.  Molecular genetic diagnostics of hypogonadotropic hypogonadism: from panel design towards result interpretation in clinical practice. (Review)

Authors:  Henriett Butz; Gábor Nyírő; Petra Anna Kurucz; István Likó; Attila Patócs
Journal:  Hum Genet       Date:  2020-03-28       Impact factor: 4.132
