Ali Torkamani (1), Nicholas J Schork.
(1) Department of Molecular and Experimental Medicine, Scripps Genomic Medicine and the Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA.
Abstract
MOTIVATION: Limited availability of data has hindered the development of algorithms that can identify functionally meaningful regulatory single nucleotide polymorphisms (rSNPs). Given the large number of common polymorphisms known to reside in the human genome, identifying functional rSNPs via laboratory assays will be costly and time-consuming. Therefore, appropriate bioinformatics strategies for predicting functional rSNPs are necessary. Recent data from the Encyclopedia of DNA Elements (ENCODE) Project have significantly expanded the amount of available functional information relevant to non-coding regions of the genome and, importantly, led to the conclusion that many functional elements in the human genome are not conserved. RESULTS: In this article we describe how ENCODE data can be leveraged to probabilistically determine the functional and phenotypic significance of non-coding SNPs (ncSNPs). The method achieves excellent sensitivity (approximately 80%) and specificity (approximately 99%) on a set of known phenotypically relevant and non-functional SNPs. In addition, cross-validation analyses show that our method is not overtrained. AVAILABILITY: The software platforms used in our analyses are freely available (http://www.cs.waikato.ac.nz/ml/weka/). In addition, we provide the training dataset (Supplementary Table 3) and our predictions (Supplementary Table 6) in the Supplementary Material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
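For readers unfamiliar with the two performance measures quoted in the abstract, the following is a minimal sketch of how sensitivity and specificity are computed from binary predictions. It is not the authors' code: the function name and the toy labels are hypothetical and chosen only for illustration, with 1 standing for a phenotypically relevant SNP and 0 for a non-functional one.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumption for this sketch: 1 = phenotypically relevant SNP,
    0 = non-functional SNP.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives

    # Sensitivity: fraction of truly relevant SNPs that were predicted relevant.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of non-functional SNPs that were predicted non-functional.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy data, not taken from the paper.
toy_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_sens, toy_spec = sensitivity_specificity(toy_true, toy_pred)
```

In a cross-validation setting, the same two quantities would be computed on each held-out fold, so that the reported figures reflect SNPs the classifier was not trained on.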