Fentaw Abegaz1, François Van Lishout2, Jestinah M Mahachie John2, Kridsadakorn Chiachoompu2, Archana Bhardwaj2, Diane Duroux2, Elena S Gusareva2, Zhi Wei3, Hakon Hakonarson4,5, Kristel Van Steen2,6. 1. GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium. fentawabegaz@yahoo.com. 2. GIGA-R, Medical Genomics - BIO3, University of Liège, Liège, Belgium. 3. Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA. 4. Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA. 5. Department of Pediatrics, Division of Human Genetics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. 6. WELBIO (Walloon Excellence in Lifesciences and Biotechnology), University of Liège, Liège, Belgium.
Abstract
BACKGROUND: In genome-wide association studies, the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed to protect associations against confounding, the most popular of which is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. METHODS: To identify causal variants acting in synergy, and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction (MB-MDR) approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. RESULTS: Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG consistently control the type I error rate at the nominal level better than MBMDR-GC. Moreover, all three proposed methods of population structure correction outperform MDR-SP in terms of statistical power. CONCLUSION: Through extensive simulation studies, we demonstrate the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.
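The abstract names Principal Component Analysis as the most popular correction for population structure, and MBMDR-PC as one of the three proposed strategies, without spelling out the mechanics. The following is a minimal sketch of generic PC-based trait adjustment. It assumes a quantitative trait, additively coded genotypes (0/1/2), and that correction means regressing the trait on an intercept plus the top-k principal components and keeping the residuals. The function names `top_principal_components` and `pc_adjusted_trait`, the simulated two-population data, and k=2 are hypothetical illustrations, not the authors' MBMDR-PC implementation, whose exact procedure may differ.

```python
import numpy as np


def top_principal_components(genotypes: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: top-k PC scores of an (n_samples x n_snps) 0/1/2 matrix."""
    g = genotypes.astype(float)
    # Center and scale each SNP; guard monomorphic SNPs against division by zero.
    mean = g.mean(axis=0)
    std = g.std(axis=0)
    std[std == 0] = 1.0
    z = (g - mean) / std
    # Thin SVD: left singular vectors scaled by singular values give sample PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def pc_adjusted_trait(trait: np.ndarray, genotypes: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper: regress the trait on intercept + top-k PCs, return residuals."""
    pcs = top_principal_components(genotypes, k)
    design = np.column_stack([np.ones(len(trait)), pcs])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return trait - design @ coef


# Toy simulation (assumed data): two subpopulations with different allele
# frequencies and a trait mean shift driven only by population membership.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 200, 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_snps)),
])
pop = np.repeat([0, 1], n_per_pop)
trait = 1.5 * pop + rng.normal(0.0, 1.0, 2 * n_per_pop)

adjusted = pc_adjusted_trait(trait, geno, k=2)
raw_corr = np.corrcoef(trait, pop)[0, 1]
adj_corr = np.corrcoef(adjusted, pop)[0, 1]
print(f"corr(trait, pop) = {raw_corr:.2f}; corr(adjusted, pop) = {adj_corr:.2f}")
```

In a residual-based scheme like this, the adjusted trait would replace the raw trait as input to the epistasis screen. In this toy simulation the raw trait is strongly correlated with population membership, while the residual trait is nearly uncorrelated with it, because the first PC captures the population split.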
Keywords:
Confounding; Epistasis; GWAIS; GWAS; Gene-gene interaction; MB-MDR; Population stratification; Population structure; Principal components