Philipp Rentzsch (1,2), Max Schubach (1,2), Jay Shendure (3,4), Martin Kircher (5,6)
1. Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany.
2. Berlin Institute of Health (BIH), 10178, Berlin, Germany.
3. Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA.
4. Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA.
5. Charité - Universitätsmedizin Berlin, 10117, Berlin, Germany. martin.kircher@bihealth.de.
6. Berlin Institute of Health (BIH), 10178, Berlin, Germany. martin.kircher@bihealth.de.
Abstract
BACKGROUND: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.
METHODS: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.
RESULTS: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.
CONCLUSIONS: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.