
Maximal conditional chi-square importance in random forests.

Minghui Wang, Xiang Chen, Heping Zhang.

Abstract

MOTIVATION: High-dimensional data are frequently generated in genome-wide association studies (GWAS) and other studies. An important goal is to identify features, such as single nucleotide polymorphisms (SNPs) in GWAS, that are associated with a disease. Random forests are a useful approach for this purpose because they rank features by a variable importance score. However, this importance score has several shortcomings, and we propose an alternative importance measure to overcome them.
RESULTS: Using our proposed importance measure for random forests, which takes the maximal conditional chi-square (MCC) as a measure of association between a SNP and the trait conditional on other SNPs, we characterized the effects of multiple SNPs under various models. Based on this importance measure, we employed a permutation test to estimate empirical P-values of SNPs. We compared our method with a univariate test and with permutation tests based on the Gini and permutation importance measures. In simulations, the proposed method consistently outperformed the other methods in identifying risk SNPs. In a GWAS of age-related macular degeneration, the proposed method confirmed two significant SNPs (at the genome-wide adjusted level of 0.05). Further analysis showed that these two SNPs conformed to a heterogeneity model. Compared with the existing importance measures, the MCC importance measure is more sensitive to complex effects of risk SNPs because it uses conditional information on different SNPs. The permutation test with the MCC importance measure provides an efficient way to identify candidate SNPs in GWAS and helps clarify the etiological links between genetic variants and complex diseases.

CONTACT: heping.zhang@yale.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
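
The abstract does not spell out how the MCC importance is computed, so the following is only a minimal sketch under stated assumptions. It assumes that each node defines a subgroup of subjects by genotypes of *other* SNPs, that a SNP's conditional association in a node is a Pearson chi-square on a 2×3 trait-by-genotype table, and that the MCC importance is the maximum of these statistics over nodes. To stay short and self-contained, the hypothetical helper `depth_one_nodes` replaces random-forest nodes with the root plus the genotype strata of each single other SNP; it is not the authors' tree-growing procedure. The permutation test shuffles trait labels and reports an add-one empirical P-value. Because the node set here does not depend on the trait, shuffling alone is valid; a forest-based version would presumably regrow the trees for each permuted trait.

```python
import random
from typing import Dict, List, Sequence

# Genotypes are coded 0/1/2; the trait is coded 0 = control, 1 = case.
Matrix = List[List[int]]  # rows = subjects, columns = SNPs
Node = Dict[int, int]     # hypothetical node: {other SNP index: required genotype}


def chi_square_2x3(y: Sequence[int], g: Sequence[int]) -> float:
    """Pearson chi-square for the 2 (trait) x 3 (genotype) contingency table."""
    n = len(y)
    if n == 0:
        return 0.0
    counts = [[0, 0, 0], [0, 0, 0]]
    for yi, gi in zip(y, g):
        counts[yi][gi] += 1
    row = [sum(r) for r in counts]
    col = [counts[0][k] + counts[1][k] for k in range(3)]
    stat = 0.0
    for i in range(2):
        for k in range(3):
            expected = row[i] * col[k] / n
            if expected > 0:  # empty rows/columns contribute nothing
                stat += (counts[i][k] - expected) ** 2 / expected
    return stat


def depth_one_nodes(n_snps: int) -> List[Node]:
    """Hypothetical stand-in for forest nodes: the root plus each genotype stratum of each SNP."""
    nodes: List[Node] = [{}]  # root node: no conditioning, i.e. the marginal test
    for s in range(n_snps):
        for value in (0, 1, 2):
            nodes.append({s: value})
    return nodes


def mcc_importance(X: Matrix, y: Sequence[int], j: int,
                   nodes: List[Node], min_size: int = 10) -> float:
    """Maximum over nodes of the chi-square between SNP j and the trait within the node."""
    best = 0.0
    for node in nodes:
        if j in node:
            continue  # a node defined by SNP j itself cannot test SNP j
        idx = [i for i, row in enumerate(X)
               if all(row[s] == v for s, v in node.items())]
        if len(idx) < min_size:
            continue  # skip subgroups too small for a stable chi-square
        stat = chi_square_2x3([y[i] for i in idx], [X[i][j] for i in idx])
        best = max(best, stat)
    return best


def permutation_p_value(X: Matrix, y: Sequence[int], j: int, nodes: List[Node],
                        n_perm: int = 200, seed: int = 0) -> float:
    """Empirical P-value: share of label permutations whose MCC reaches the observed MCC."""
    rng = random.Random(seed)
    observed = mcc_importance(X, y, j, nodes)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any SNP-trait association
        if mcc_importance(X, y_perm, j, nodes) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one correction keeps P > 0
```

As a usage example, take 60 subjects with `y = [i % 2 for i in range(60)]` and `X = [[2 * y[i], i % 3] for i in range(60)]`, so SNP 0 tracks the trait and SNP 1 is balanced across trait groups. With `nodes = depth_one_nodes(2)`, `mcc_importance` returns 60.0 for SNP 0 and 0.0 for SNP 1, and the permutation P-value for SNP 1 is 1.0. Taking the maximum over nodes is what lets a SNP whose effect appears only within a subgroup still receive a large score; the price is that the null distribution must come from permutation rather than a chi-square table.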

Year:  2010        PMID: 20130032      PMCID: PMC2832825          DOI: 10.1093/bioinformatics/btq038

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


