Daniel Quang, Yifei Chen, Xiaohui Xie. Department of Computer Science and Center for Complex Biological Systems, University of California, Irvine, CA 92697, USA.
Abstract
Annotating genetic variants, especially non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, a linear SVM cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large number of samples and features. We exploit Compute Unified Device Architecture (CUDA)-compatible graphics processing units (GPUs) and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative reduction in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodology. AVAILABILITY AND IMPLEMENTATION: All data and source code are available at https://cbcl.ics.uci.edu/public_data/DANN/.
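The abstract names dropout and momentum training without describing them. The following is a minimal NumPy sketch of those two techniques in their common textbook forms, not DANN's actual implementation. The function names (`dropout`, `momentum_step`, `forward`), the single tanh hidden layer, and the values for the dropout rate, learning rate, and momentum coefficient are all assumptions chosen for illustration; the abstract does not state the network architecture or hyperparameters.

```python
import numpy as np


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`,
    then rescale survivors so the expected activation is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep


def momentum_step(weights, velocity, grad, lr=0.1, mu=0.9):
    """Classical momentum update: v <- mu * v - lr * grad; w <- w + v.
    lr and mu are illustrative defaults, not values from the paper."""
    velocity = mu * velocity - lr * grad
    return weights + velocity, velocity


def forward(x, params, rate, rng, train=True):
    """Hypothetical one-hidden-layer network: tanh hidden units with
    dropout (training only), followed by a sigmoid output unit."""
    w1, b1, w2, b2 = params
    hidden = np.tanh(x @ w1 + b1)  # the non-linearity a linear SVM lacks
    if train:
        hidden = dropout(hidden, rate, rng)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))


# Toy check of the momentum update on f(w) = 0.5 * w**2, whose gradient is w.
w, v = np.array([1.0]), np.zeros(1)
for _ in range(200):
    w, v = momentum_step(w, v, grad=w)
```

In this sketch, dropout is applied only when `train=True`, so predictions at evaluation time are deterministic. The toy loop at the end simply drives `w` toward the minimum of a quadratic and is unrelated to variant annotation.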