
A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction.

Digna R Velez, Bill C White, Alison A Motsinger, William S Bush, Marylyn D Ritchie, Scott M Williams, Jason H Moore.

Abstract

Multifactor dimensionality reduction (MDR) was developed as a method for detecting statistical patterns of epistasis. The overall goal of MDR is to change the representation space of the data to make interactions easier to detect. It is well known that machine learning methods may not provide robust models when the class variable (e.g. case-control status) is imbalanced and accuracy is used as the fitness measure. This is because most methods learn patterns that are relevant for the larger of the two classes. The goal of this study was to evaluate three different strategies for improving the power of MDR to detect epistasis in imbalanced datasets. The methods evaluated were: (1) over-sampling that resamples with replacement the smaller class until the data are balanced, (2) under-sampling that randomly removes subjects from the larger class until the data are balanced, and (3) balanced accuracy [(sensitivity+specificity)/2] as the fitness function with and without an adjusted threshold. These three methods were compared using simulated data with two-locus epistatic interactions of varying heritability (0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4) and minor allele frequency (0.2, 0.4) that were embedded in 100 replicate datasets of varying sample sizes (400, 800, 1600). Each dataset was generated with different ratios of cases to controls (1 : 1, 1 : 2, 1 : 4). We found that the balanced accuracy function with an adjusted threshold significantly outperformed both over-sampling and under-sampling and fully recovered the power. These results suggest that balanced accuracy should be used instead of accuracy for the MDR analysis of epistasis in imbalanced datasets. (c) 2007 Wiley-Liss, Inc.
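As an illustration of the three strategies named in the abstract, the following is a minimal Python sketch, not the authors' MDR implementation. The function names, the confusion-matrix arguments (tp, fn, tn, fp), and the choice to keep the original smaller-class subjects while adding resampled copies are assumptions made for this example. It shows why plain accuracy can mislead at a 1:4 case-control ratio while balanced accuracy, (sensitivity+specificity)/2, does not.

```python
import random


def accuracy(tp, fn, tn, fp):
    """Fraction of all subjects classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """(sensitivity + specificity) / 2, as defined in the abstract."""
    sensitivity = tp / (tp + fn)  # proportion of cases called cases
    specificity = tn / (tn + fp)  # proportion of controls called controls
    return (sensitivity + specificity) / 2


def oversample(smaller, larger, rng):
    """Resample the smaller class with replacement until sizes match.

    Assumption for this sketch: the original smaller-class subjects are
    kept and only the shortfall is drawn with replacement.
    """
    extra = rng.choices(smaller, k=len(larger) - len(smaller))
    return smaller + extra, larger


def undersample(smaller, larger, rng):
    """Randomly remove subjects from the larger class until sizes match."""
    return smaller, rng.sample(larger, len(smaller))


# Hypothetical 1:4 dataset: 100 cases and 400 controls.
cases = [("case", i) for i in range(100)]
controls = [("control", i) for i in range(400)]

# A trivial classifier that labels every subject a control:
# tp=0, fn=100, tn=400, fp=0.
print(accuracy(0, 100, 400, 0))           # 0.8
print(balanced_accuracy(0, 100, 400, 0))  # 0.5

rng = random.Random(0)  # fixed seed so the resampling is repeatable
over_cases, over_controls = oversample(cases, controls, rng)
under_cases, under_controls = undersample(cases, controls, rng)
print(len(over_cases), len(over_controls))    # 400 400
print(len(under_cases), len(under_controls))  # 100 100
```

In this toy case the classifier that ignores the smaller class scores 0.8 on accuracy but only 0.5 on balanced accuracy, which is the behaviour that motivates using balanced accuracy as the fitness measure. The adjusted threshold mentioned in the abstract is not modelled here.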


Year:  2007        PMID: 17323372     DOI: 10.1002/gepi.20211

Source DB:  PubMed          Journal:  Genet Epidemiol        ISSN: 0741-0395            Impact factor:   2.135


