Minghui Wang (1), Xiang Chen, Heping Zhang. (1) Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520-8034, USA.
Abstract
MOTIVATION: High-dimensional data are frequently generated in genome-wide association studies (GWAS) and other studies. It is important to identify features, such as single nucleotide polymorphisms (SNPs) in GWAS, that are associated with a disease. Random forests are a useful approach for this purpose because they rank features by a variable importance score. However, this importance score has several shortcomings, and we propose an alternative importance measure to overcome them.
RESULTS: We characterized the effect of multiple SNPs under various models using our proposed importance measure in random forests, which uses the maximal conditional chi-square (MCC) as a measure of association between a SNP and the trait conditional on other SNPs. Based on this importance measure, we employed a permutation test to estimate empirical P-values of SNPs. Our method was compared with a univariate test and with permutation tests based on the Gini and permutation importance measures. In simulations, the proposed method consistently outperformed the other methods in identifying risk SNPs. In a GWAS of age-related macular degeneration, the proposed method confirmed two significant SNPs (at the genome-wide adjusted level of 0.05). Further analysis showed that these two SNPs conformed to a heterogeneity model. Compared with the existing importance measures, the MCC importance measure is more sensitive to complex effects of risk SNPs because it uses conditional information on different SNPs. The permutation test with the MCC importance measure provides an efficient way to identify candidate SNPs in GWAS and facilitates understanding of how genetic variants contribute to the etiology of complex diseases.
CONTACT: heping.zhang@yale.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
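To make the idea of a permutation test on a conditional chi-square statistic concrete, the following is a minimal Python sketch. It is not the authors' random-forest MCC implementation, which the abstract does not specify in detail. As a simplifying assumption, "conditional on other SNPs" is approximated here by stratifying the sample on the genotype of one other SNP at a time and taking the largest chi-square of the target SNP against the trait over all strata. The function names (chi_square, max_conditional_chi_square, empirical_p_value), the unconditional baseline, and the add-one P-value correction are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def chi_square(genotypes, trait):
    """Pearson chi-square statistic for a genotype-by-trait contingency table."""
    n = len(trait)
    if n == 0:
        return 0.0
    cells = Counter(zip(genotypes, trait))
    row_totals = Counter(genotypes)
    col_totals = Counter(trait)
    stat = 0.0
    for g, row_n in row_totals.items():
        for t, col_n in col_totals.items():
            expected = row_n * col_n / n  # always > 0: only observed levels are used
            stat += (cells[(g, t)] - expected) ** 2 / expected
    return stat


def max_conditional_chi_square(target, others, trait):
    """Largest chi-square of `target` vs. trait within strata of each other SNP.

    Assumption of this sketch: strata are the genotype levels of a single
    other SNP, and the whole-sample statistic is included as a baseline.
    """
    best = chi_square(target, trait)
    for other in others:
        for level in set(other):
            idx = [i for i, g in enumerate(other) if g == level]
            stat = chi_square([target[i] for i in idx], [trait[i] for i in idx])
            best = max(best, stat)
    return best


def empirical_p_value(target, others, trait, n_perm=200, seed=0):
    """Permutation P-value: shuffle the trait and recompute the statistic."""
    rng = random.Random(seed)
    observed = max_conditional_chi_square(target, others, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_conditional_chi_square(target, others, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: the trait copies the target SNP, so the association is perfect.
    target_snp = [0, 1] * 50
    other_snp = [0, 0, 1, 1] * 25
    trait = list(target_snp)
    print(empirical_p_value(target_snp, [other_snp], trait))
```

Shuffling the trait labels breaks any association with the genotypes while leaving the genotype structure intact, so the proportion of permuted statistics at least as large as the observed one estimates the empirical P-value. A real GWAS analysis would also have to correct for the number of SNPs tested, which this sketch does not attempt.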