Haohan Wang [1], Fen Pei [2], Michael M Vanyukov [3], Ivet Bahar [2], Wei Wu [4], Eric P Xing [5]
1. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
2. Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.
3. Department of Pharmaceutical Sciences, Departments of Psychiatry, and Human Genetics, University of Pittsburgh, Pittsburgh, PA, USA.
4. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. weiwu2@cs.cmu.edu.
5. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. epxing@cs.cmu.edu.
Abstract
BACKGROUND: In the last decade, genome-wide association studies (GWASs) have contributed to decoding the human genome by uncovering many genetic variations associated with various diseases. Many follow-up investigations involve joint analysis of multiple independently generated GWAS data sets. While most computational approaches developed for joint analysis are based on summary statistics, joint analysis based on individual-level data that accounts for confounding factors remains a challenge.
RESULTS: In this study, we propose a method, called the Coupled Mixed Model (CMM), that enables joint GWAS analysis of two independently collected GWAS data sets with different phenotypes. CMM does not require the data sets to have the same phenotypes because it infers the unknown phenotypes using a set of multivariate sparse mixed models. Moreover, CMM addresses confounding due to population stratification, family structure, and cryptic relatedness, as well as confounders arising during data collection, such as the batch effects that frequently appear in joint genetic studies. We evaluate the performance of CMM using simulation experiments. In real data analysis, we illustrate the utility of CMM by applying it to evaluate common genetic associations for Alzheimer's disease and substance use disorder, using data sets independently collected for the two complex human disorders. Comparison of the results with those from previous experiments and analyses supports the utility of our method and provides new insights into the diseases. The software is available at https://github.com/HaohanWang/CMM.
Keywords:
Deconfounding; Joint analysis; Mixed model