
Reducing false-positive incidental findings with ensemble genotyping and logistic regression based variant filtering methods.

Kyu-Baek Hwang, In-Hee Lee, Jin-Ho Park, Tina Hambuch, Yongjoon Choe, MinHyeok Kim, Kyungjoon Lee, Taemin Song, Matthew B Neu, Neha Gupta, Isaac S Kohane, Robert C Green, Sek Won Kong.

Abstract

As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false-positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but doing so is costly. Here, we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. With LR-based or ensemble genotyping-based filtering, false-negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rate (5.4% for heterozygous and 4.5% for homozygous single nucleotide variants (SNVs); 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared with filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery in NA12878, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false-positive DNM candidates.
© 2014 WILEY PERIODICALS, INC.
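
The DNM counts quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper name `percent` and the variable names are hypothetical, and the "share of candidates that are false positives" figures assume that the 107,167 false positives and 937 true positives together make up the full candidate set, which the abstract does not state explicitly.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Counts reported in the abstract for DNM discovery in NA12878.
fp_excluded, fp_total = 105_080, 107_167   # false positives removed / all false positives
tp_retained, tp_total = 897, 937           # true positives kept / all true positives

pct_fp_excluded = percent(fp_excluded, fp_total)   # the abstract reports "> 98%"
pct_tp_retained = percent(tp_retained, tp_total)   # the abstract reports "> 95%"

# False positives that survive the filter.
fp_remaining = fp_total - fp_excluded

# ASSUMPTION: candidates = false positives + true positives, before and after filtering.
fp_share_before = percent(fp_total, fp_total + tp_total)
fp_share_after = percent(fp_remaining, fp_remaining + tp_retained)

print(f"false positives excluded: {pct_fp_excluded:.2f}%")
print(f"true positives retained:  {pct_tp_retained:.2f}%")
print(f"false positives remaining: {fp_remaining}")
print(f"false-positive share of candidates: {fp_share_before:.2f}% -> {fp_share_after:.2f}%")
```

Under that assumption, the filter leaves a much smaller candidate list, but a sizable share of the remaining candidates would still be false positives.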

Keywords:  ensemble genotyping; false positive; incidental finding; logistic regression; whole genome sequencing

Year:  2014        PMID: 24829188      PMCID: PMC4112476          DOI: 10.1002/humu.22587

Source DB:  PubMed          Journal:  Hum Mutat        ISSN: 1059-7794            Impact factor:   4.878


