
Evaluating the ability of tree-based methods and logistic regression for the detection of SNP-SNP interaction.

Manuel García-Magariños, Iñaki López-de-Ullibarri, Ricardo Cao, Antonio Salas.

Abstract

Most common human diseases are likely to have complex etiologies. Methods of analysis that allow for the phenomenon of epistasis are of growing interest in the genetic dissection of complex diseases. By allowing for epistatic interactions between potential disease loci, we may succeed in identifying genetic variants that might otherwise have remained undetected. Here we aimed to analyze the ability of logistic regression (LR) and two tree-based supervised learning methods, classification and regression trees (CART) and random forest (RF), to detect epistasis. Multifactor-dimensionality reduction (MDR) was also used for comparison. Our approach involves first the simulation of datasets of autosomal biallelic unphased and unlinked single nucleotide polymorphisms (SNPs), each containing a two-loci interaction (causal SNPs) and 98 'noise' SNPs. We modelled interactions under different scenarios of sample size, missing data, minor allele frequencies (MAF) and several penetrance models: three involving both (indistinguishable) marginal effects and interaction, and two simulating pure interaction effects. In total, we have simulated 99 different scenarios. Although CART, RF, and LR yield similar results in terms of detection of true association, CART and RF perform better than LR with respect to classification error. MAF, penetrance model, and sample size are greater determining factors than percentage of missing data in the ability of the different techniques to detect true association. In pure interaction models, only RF detects association. In conclusion, tree-based methods and LR are important statistical tools for the detection of unknown interactions among true risk-associated SNPs with marginal effects and in the presence of a significant number of noise SNPs. In pure interaction models, RF performs reasonably well in the presence of large sample sizes and low percentages of missing data. 
However, when the study design is suboptimal (unfavourable to detect interaction in terms of e.g. sample size and MAF) there is a high chance of detecting false, spurious associations.
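
The simulation design described in the abstract (two causal SNPs plus 98 noise SNPs, a penetrance model, a chosen MAF and a percentage of missing data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the `PURE_INTERACTION` penetrance table (a textbook-style pure epistasis model at MAF 0.5), Hardy-Weinberg genotype sampling, and cohort-style sampling rather than a balanced case-control design.

```python
import random

# Hypothetical penetrance table: PURE_INTERACTION[a][b] is P(disease | genotype a
# at causal SNP A, genotype b at causal SNP B), genotypes coded as the number of
# minor alleles (0, 1, 2). Illustrative values only, not the paper's models.
PURE_INTERACTION = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]


def hwe_genotype_freqs(maf):
    """Frequencies of genotypes 0, 1, 2 under Hardy-Weinberg equilibrium."""
    q = maf
    p = 1.0 - q
    return [p * p, 2 * p * q, q * q]


def marginal_penetrance(penetrance, maf_other):
    """Penetrance of each genotype at SNP A, averaged over genotypes at SNP B."""
    freqs = hwe_genotype_freqs(maf_other)
    return [sum(f * row[b] for b, f in enumerate(freqs)) for row in penetrance]


def simulate_dataset(n, maf, penetrance, n_noise=98, missing_rate=0.0, seed=0):
    """Simulate n individuals as (genotypes, status) pairs.

    Columns 0 and 1 are the causal SNPs; the remaining n_noise columns are
    unlinked noise SNPs. Missing genotypes are stored as None.
    """
    rng = random.Random(seed)
    freqs = hwe_genotype_freqs(maf)
    rows = []
    for _ in range(n):
        # All SNPs are drawn independently (unlinked) with the same MAF.
        genotypes = rng.choices([0, 1, 2], weights=freqs, k=n_noise + 2)
        # Disease status depends only on the two causal SNPs.
        risk = penetrance[genotypes[0]][genotypes[1]]
        status = int(rng.random() < risk)
        # Mask genotypes at random to mimic missing data.
        observed = [None if rng.random() < missing_rate else g for g in genotypes]
        rows.append((observed, status))
    return rows


if __name__ == "__main__":
    # Equal marginal penetrances mean neither SNP shows an effect on its own.
    print(marginal_penetrance(PURE_INTERACTION, 0.5))
    data = simulate_dataset(n=400, maf=0.5, penetrance=PURE_INTERACTION,
                            missing_rate=0.05, seed=1)
    print(len(data), len(data[0][0]), sum(status for _, status in data))
```

Under this assumed table, every genotype at one causal SNP has the same marginal penetrance once the other SNP is averaged out. That is why a single-SNP test has nothing to detect in a pure interaction model and only methods that consider the SNPs jointly can recover the signal.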

Year:  2009        PMID: 19291098     DOI: 10.1111/j.1469-1809.2009.00511.x

Source DB:  PubMed          Journal:  Ann Hum Genet        ISSN: 0003-4800            Impact factor:   1.670

