Yi Yang1,2, Xingjie Shi2,3, Yuling Jiao4, Jian Huang5, Min Chen6, Xiang Zhou7, Lei Sun8, Xinyi Lin2,9,10, Can Yang11, Jin Liu2. 1. School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China. 2. Centre for Quantitative Medicine, Program in Health Services & Systems Research, Duke-NUS Medical School, 169857, Singapore. 3. Department of Statistics, Nanjing University of Finance and Economics, Nanjing 210046, China. 4. School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China. 5. Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA. 6. Academy of Mathematics and Systems Science, The Chinese Academy of Sciences, Beijing 100190, China. 7. Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA. 8. Cardiovascular and Metabolic Disorders Program, Duke-NUS Medical School, 169857, Singapore. 9. Singapore Clinical Research Institute, 138669, Singapore. 10. Singapore Institute for Clinical Sciences, A*STAR, 117609, Singapore. 11. Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong 999077, China.
Abstract
MOTIVATION: Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links through which genetic variants influence complex traits remain elusive. To clarify these links, various consortia have collected large volumes of genomic data that make it possible to investigate the role genetic variants play in regulating gene expression. Recently, a collaborative mixed model (CoMM) was proposed to jointly interrogate the genome for complex traits by integrating a GWAS dataset with an expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Statistically efficient methods that leverage transcriptome information using only GWAS summary statistics are therefore required. RESULTS: In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, using only GWAS summary statistics instead of individual-level GWAS data. Like CoMM, CoMM-S2 combines two models: the first examines the relationship between gene expression and genotype, while the second examines the relationship between the phenotype and the gene expression predicted by the first model. Unlike CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that, even though CoMM-S2 uses only GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data. AVAILABILITY AND IMPLEMENTATION: The implementation of CoMM-S2 is included in the CoMM package, which can be downloaded from https://github.com/gordonliu810822/CoMM.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
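The two-model structure described in the abstract can be illustrated with a toy, single-variant simulation. The sketch below is not CoMM or CoMM-S2: it swaps in a simple ratio of marginal effects as a stand-in and, unlike CoMM, ignores the uncertainty in the eQTL estimate. All names and parameter values (`marginal_effect`, `simulate`, `GAMMA`, `ALPHA`, `MAF`, the sample sizes) are hypothetical and chosen only for illustration. Its purpose is to show that the GWAS side of the calculation needs only a per-variant effect estimate and its standard error, which is the kind of information GWAS summary statistics typically report.

```python
import random

def marginal_effect(x, y):
    """Least-squares slope of y on x (with intercept) and its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    resid = [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    return slope, (sigma2 / sxx) ** 0.5

def simulate(rng, n, maf, slope):
    """Simulate genotypes (0/1/2 allele counts) and an outcome with unit-variance noise."""
    g = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    y = [slope * gi + rng.gauss(0.0, 1.0) for gi in g]
    return g, y

rng = random.Random(0)
GAMMA = 0.5   # assumed variant -> expression effect (hypothetical value)
ALPHA = 0.4   # assumed expression -> trait effect (hypothetical value)
MAF = 0.3     # assumed minor allele frequency (hypothetical value)

# Model 1: gene expression regressed on genotype in the eQTL sample.
g_eqtl, expression = simulate(rng, 2000, MAF, GAMMA)
gamma_hat, gamma_se = marginal_effect(g_eqtl, expression)

# GWAS sample: the trait depends on genotype only through expression,
# so the marginal variant -> trait effect is ALPHA * GAMMA.
g_gwas, trait = simulate(rng, 20000, MAF, ALPHA * GAMMA)
beta_hat, beta_se = marginal_effect(g_gwas, trait)

# From here on only the summary statistics (beta_hat, beta_se) are used;
# the individual-level GWAS data are no longer needed.
alpha_hat = beta_hat / gamma_hat

print(f"eQTL effect:  {gamma_hat:.3f} (SE {gamma_se:.3f})")
print(f"GWAS effect:  {beta_hat:.3f} (SE {beta_se:.3f})")
print(f"ratio estimate of expression -> trait effect: {alpha_hat:.3f}")
```

In this toy setting the ratio recovers a value close to the simulated expression-to-trait effect. A full method would additionally handle multiple correlated variants per gene and propagate the eQTL estimation uncertainty, which the abstract identifies as a strength of CoMM.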