
LEP: A Statistical Method Integrating Individual-Level and Summary-Level Data of the Same Trait From Different Populations.

Mingwei Dai, Jin Liu, Can Yang.

Abstract

Statistical approaches for integrating multiple data sets in genome-wide association studies (GWASs) are increasingly important. Proper use of more relevant information is expected to improve the statistical efficiency of the analysis. Among these approaches, LEP was proposed for the joint analysis of individual-level and summary-level data from the same population by leveraging pleiotropy. The key idea of LEP is to exploit the correlation of association status among different data sets while accounting for heterogeneity. In this commentary, we show that LEP can also integrate individual-level and summary-level data of the same trait from different populations, providing new insights into the genetic architecture of different populations.
© The Author(s) 2019.


Keywords:  Genome-wide association study; heterogeneity; integrative analysis; pleiotropy; polygenicity

Year:  2019        PMID: 31666794      PMCID: PMC6798161          DOI: 10.1177/1178222619881624

Source DB:  PubMed          Journal:  Biomed Inform Insights        ISSN: 1178-2226


Comment on: Dai M, Wan X, Peng H, et al. Joint analysis of individual-level and summary-level GWAS data by leveraging pleiotropy. Bioinformatics. 2019;35(10):1729-1736. doi:10.1093/bioinformatics/bty870. PubMed PMID: 30307540. https://www.ncbi.nlm.nih.gov/pubmed/30307540.

The flourishing growth of genome-wide association studies (GWASs) has provided a comprehensive understanding of the genetic determinants of disease susceptibility,[1,2] shedding light on better prevention and treatment of diseases. GWAS results suggest the existence of “polygenicity” for complex diseases: a complex disease is often affected by many variants, each with a small effect. Because of polygenicity, a single GWAS with a limited sample size often has relatively low statistical power to identify associations and poor predictive ability. To address this, many methods have been proposed to improve statistical efficiency by combining multiple data sets.[3,4] These methods may take different types of data as input, and integrating different sources of data is often feasible by leveraging pleiotropy.[5,6]

Recently, we proposed a statistical method named LEP[7] to integrate individual-level genotype data and summary statistics in GWASs. LEP and other statistical methods that integrate individual-level and summary-level data are becoming increasingly important, because individual-level data are often limited (usually a few thousand samples at hand), whereas summary-level data can be accessed through many public gateways. Working with a limited number of individual-level samples may lead to great uncertainty in the estimation of genetic effects on a complex trait. Fortunately, genome-wide summary-level data carry additional information about genetic effects on the trait, and LEP exploits this information in the joint analysis of individual-level and summary-level data.
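The point above, that a single GWAS of a few thousand samples has low power for small-effect variants, can be made concrete with a standard back-of-the-envelope calculation. The sketch below is not part of LEP; it assumes a quantitative trait and a SNP explaining a fraction `h2_snp` of the trait variance, in which case the single-SNP test statistic is approximately Normal(sqrt(n × h2_snp), 1). The function name, the 0.1% effect size and the two sample sizes are illustrative assumptions; the threshold of 5 × 10⁻⁸ is the conventional genome-wide significance level.

```python
from statistics import NormalDist


def gwas_power(n: int, h2_snp: float, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided single-SNP association test.

    Illustrative assumption: quantitative trait, SNP explaining a fraction
    h2_snp of the trait variance, so the z-statistic is ~ Normal(sqrt(n*h2_snp), 1).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-sided genome-wide threshold
    mean_z = (n * h2_snp) ** 0.5                 # expected z-score under association
    # Probability that |Z| exceeds the threshold when Z ~ Normal(mean_z, 1)
    return std_normal.cdf(mean_z - z_crit) + std_normal.cdf(-mean_z - z_crit)


if __name__ == "__main__":
    # Hypothetical sample sizes: a single study vs. a pooled analysis
    for n in (5_000, 50_000):
        print(n, round(gwas_power(n, 0.001), 4))
```

Under these assumptions, a variant explaining 0.1% of the variance is detected with probability well below 1% at n = 5,000 but roughly 95% at n = 50,000, which is why pooling information across data sets helps.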
Originally, LEP was designed to integrate multiple traits of the same population by exploiting pleiotropy among them. Here, pleiotropy means that a variant can affect multiple seemingly unrelated traits. LEP integrates the individual-level data and the summary-level data by modeling their pleiotropic relationship. For each variant j, it introduces a pair of binary indicators of whether the jth variant is associated with the trait in the individual-level data and with the trait in the summary-level data, respectively, and characterizes the pleiotropic relationship between the two traits through a joint probabilistic model of these indicators [equation (1)]. Comprehensive simulation studies and real-data analyses demonstrated that LEP effectively leverages pleiotropy in the presence of heterogeneity between the individual-level and summary-level data.

For a given trait or disease, GWASs have been conducted in different populations. In fact, many GWASs have been conducted in populations of European ancestry. Because allele frequencies and linkage disequilibrium (LD) patterns can differ substantially across populations,[6,8,9] heterogeneity of genetic effects is widespread, and discoveries in one population cannot be directly transferred to another. Approaches for dealing with heterogeneous genetic effects across populations are therefore attracting increasing attention. Although LEP was designed to exploit pleiotropy among different traits, its essential idea is to make use of the correlation of association status among multiple GWASs while accounting for heterogeneity. Consequently, the probabilistic model in equation (1) can account for heterogeneity whether the correlation arises from pleiotropy or from correlated genetic effects of the same trait in different populations.
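Why correlated association status helps can be illustrated with a deliberately simple two-indicator model. The sketch below is a generic Bayes-rule illustration, not LEP's equation (1) or its (u, v) parameterization: it assumes hypothetical probabilities `prior`, `p_shared` and `p_specific`, and computes how much evidence of association in the summary-level data should raise the probability that the same variant is associated in the individual-level data.

```python
def posterior_individual_association(prior: float,
                                     p_shared: float,
                                     p_specific: float) -> float:
    """P(associated in individual-level data | associated in summary-level data).

    Hypothetical parameters for illustration only:
      prior      : P(associated in the individual-level data)
      p_shared   : P(associated in summary data | associated in individual data)
      p_specific : P(associated in summary data | not associated in individual data)
    """
    joint_assoc = prior * p_shared             # both indicators equal 1
    joint_null = (1 - prior) * p_specific      # summary-only association
    return joint_assoc / (joint_assoc + joint_null)


if __name__ == "__main__":
    # Correlated association status: summary-level evidence raises the prior
    print(posterior_individual_association(0.01, 0.8, 0.05))
    # Independent association status: summary-level data carry no information
    print(posterior_individual_association(0.01, 0.3, 0.3))
```

When `p_shared` equals `p_specific`, the two indicators are independent and the summary-level data leave the prior unchanged. A model that estimates the strength of this dependence from the data can therefore borrow information when association status is shared and borrow little when the data sets are heterogeneous.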
The pair of parameters (u, v) measures the extent to which the genetic determinants of disease risk are likely to be shared by or specific to populations. As an illustrative example, we applied LEP to GWAS data of Crohn’s disease (CD) from several different populations. The individual-level data are from the Wellcome Trust Case Control Consortium (WTCCC).[10] The summary-level CD data are from the study by Franke et al,[11] and consist of the P-values of 7 GWASs in total. These data sets are summarized in Table 1 (detailed information can be found in the study by Dai et al[12]). We first applied Bayesian variable selection regression[13] to the individual-level data alone to obtain a baseline prediction accuracy, measured by the area under the curve (AUC). We then applied LEP to incorporate each summary-level data set, which improved the accuracy, as shown in Table 2. The corresponding estimated parameters, also given in Table 2, indicate that LEP successfully accounts for heterogeneity.
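Prediction accuracy here is reported as the AUC. A minimal sketch of how an AUC can be computed from predicted risk scores for cases (label 1) and controls (label 0) is shown below, using the Mann-Whitney formulation: the AUC is the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This is an illustrative implementation with hypothetical names, not the evaluation code behind Table 2.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of case and control scores (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0      # case ranked above control
            elif c == k:
                wins += 0.5      # tie counts as half a win
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy example: 3 of the 4 case-control pairs are correctly ordered
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))
```

The double loop is quadratic in sample size; for thousands of cases and controls a rank-based computation, or a library routine such as scikit-learn's `roc_auc_score`, is preferable.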
Table 1.

Information on the GWAS data for Crohn’s disease from different populations.

GWAS | Cases | Controls | No. of SNPs | Ancestry | Type
WTCCC | 2,005 | 3,004 | 308,950 | England | Individual-level
Belgium | 537 | 913 | 953,242 | Belgium | Summary-level
Cedars-Sinai | 925 | 2,882 | 953,242 | USA | Summary-level
Early Onset | 1,689 | 6,197 | 953,242 | USA, Italy, etc. | Summary-level
NIDDK | 956 | 982 | 953,242 | USA | Summary-level
German | 479 | 1,145 | 953,242 | Germany | Summary-level
Total (summary-level studies) | 4,586 | 12,119 | | |

Abbreviations: GWAS, genome-wide association studies; SNP, single-nucleotide polymorphism; WTCCC, Wellcome Trust Case Control Consortium.

After extracting the SNPs shared by the individual-level data (after quality control) and the summary statistics, we obtained the individual-level data for all samples and a P-value matrix, both restricted to these overlapping SNPs. The samples from the Cedars-Sinai Medical Center were divided into 2 studies (Cedar 1 and Cedar 2), and the samples from NIDDK were divided into a Jewish study (NiddkJ) and a non-Jewish study (NiddkNJ).
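The SNP-matching step described above can be sketched as follows. The function name, the input layout (a list of SNP IDs for the quality-controlled individual-level data and one dictionary mapping SNP ID to P-value per summary-level study) and the orientation of the output matrix (one row per shared SNP, one column per study) are assumptions made for illustration; the sketch also assumes that SNP identifiers are directly comparable across data sets.

```python
from typing import Dict, List, Tuple


def build_pvalue_matrix(
    genotype_snps: List[str],
    summary_studies: Dict[str, Dict[str, float]],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Align summary-level P-values to the SNPs of the individual-level data.

    genotype_snps   : SNP IDs of the individual-level data after QC, in column order
    summary_studies : study name -> {SNP ID -> P-value}  (hypothetical layout)
    Returns (shared SNP IDs in genotype order, study names, P-value matrix).
    """
    studies = list(summary_studies)
    # Keep only SNPs present in the genotype data and in every summary-level study
    shared = [snp for snp in genotype_snps
              if all(snp in summary_studies[s] for s in studies)]
    # One row per shared SNP, one column per study
    matrix = [[summary_studies[s][snp] for s in studies] for snp in shared]
    return shared, studies, matrix


if __name__ == "__main__":
    # Toy SNP IDs and P-values, for illustration only
    snps = ["rs1", "rs2", "rs3"]
    summary = {
        "StudyA": {"rs1": 0.01, "rs3": 0.5, "rs9": 0.2},
        "StudyB": {"rs1": 0.03, "rs2": 0.7, "rs3": 0.04},
    }
    print(build_pvalue_matrix(snps, summary))
```

The returned list of shared SNP IDs can then be used to subset the columns of the individual-level genotype matrix so that both inputs refer to the same SNPs in the same order.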

Table 2.

Estimated parameters u and v for each summary-level GWAS analyzed jointly with the WTCCC data.

Summary-level GWAS | û | v̂ | Accuracy (AUC)
Belge | 1 | 0.713 | 63.85% ± 0.54%
Cedar 2 | 1 | 0.5698 | 63.50% ± 0.59%
Early Onset | 1 | 0.9704 | 66.30% ± 0.54%
Cedar 1 | 1 | 0.8954 | 64.26% ± 0.52%
NiddkJ | 1 | 0.8953 | 63.54% ± 0.40%
German | 1 | 0.9168 | 64.08% ± 0.55%
NiddkNJ | 1 | 0.8834 | 64.09% ± 0.43%

Abbreviation: GWAS, genome-wide association studies.

Accuracy is calculated from 10 replications.

In summary, LEP can effectively account for heterogeneity when integrating individual-level and summary-level GWAS data. As a result, LEP can not only leverage pleiotropy in the analysis of multiple traits in the same population but also serve as an effective tool for analyzing the same trait across different populations.

1.  One hundred years of pleiotropy: a retrospective.

Authors:  Frank W Stearns
Journal:  Genetics; Date: 2010-11

2.  Joint analysis of individual-level and summary-level GWAS data by leveraging pleiotropy.

Authors:  Mingwei Dai; Xiang Wan; Hao Peng; Yao Wang; Yue Liu; Jin Liu; Zongben Xu; Can Yang
Journal:  Bioinformatics; Date: 2019-05-15

3.  Clinical use of current polygenic risk scores may exacerbate health disparities.

Authors:  Alicia R Martin; Masahiro Kanai; Yoichiro Kamatani; Yukinori Okada; Benjamin M Neale; Mark J Daly
Journal:  Nat Genet; Date: 2019-03-29

4.  Type 2 diabetes: genetic data sharing to advance complex disease research.

Authors:  Jason Flannick; Jose C Florez
Journal:  Nat Rev Genet; Date: 2016-07-11

5.  Genomics of disease risk in globally diverse populations.

Authors:  Deepti Gurdasani; Inês Barroso; Eleftheria Zeggini; Manjinder S Sandhu
Journal:  Nat Rev Genet; Date: 2019-06-24

6.  Genetic correlations of polygenic disease traits: from theory to practice.

Authors:  Wouter van Rheenen; Wouter J Peyrot; Andrew J Schork; S Hong Lee; Naomi R Wray
Journal:  Nat Rev Genet; Date: 2019-10

7.  Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci.

Authors:  Andre Franke; Dermot P B McGovern; Jeffrey C Barrett; Kai Wang; Graham L Radford-Smith; Tariq Ahmad; Charlie W Lees; Tobias Balschun; James Lee; Rebecca Roberts; Carl A Anderson; Joshua C Bis; Suzanne Bumpstead; David Ellinghaus; Eleonora M Festen; Michel Georges; Todd Green; Talin Haritunians; Luke Jostins; Anna Latiano; Christopher G Mathew; Grant W Montgomery; Natalie J Prescott; Soumya Raychaudhuri; Jerome I Rotter; Philip Schumm; Yashoda Sharma; Lisa A Simms; Kent D Taylor; David Whiteman; Cisca Wijmenga; Robert N Baldassano; Murray Barclay; Theodore M Bayless; Stephan Brand; Carsten Büning; Albert Cohen; Jean-Frederick Colombel; Mario Cottone; Laura Stronati; Ted Denson; Martine De Vos; Renata D'Inca; Marla Dubinsky; Cathryn Edwards; Tim Florin; Denis Franchimont; Richard Gearry; Jürgen Glas; Andre Van Gossum; Stephen L Guthery; Jonas Halfvarson; Hein W Verspaget; Jean-Pierre Hugot; Amir Karban; Debby Laukens; Ian Lawrance; Marc Lemann; Arie Levine; Cecile Libioulle; Edouard Louis; Craig Mowat; William Newman; Julián Panés; Anne Phillips; Deborah D Proctor; Miguel Regueiro; Richard Russell; Paul Rutgeerts; Jeremy Sanderson; Miquel Sans; Frank Seibold; A Hillary Steinhart; Pieter C F Stokkers; Leif Torkvist; Gerd Kullak-Ublick; David Wilson; Thomas Walters; Stephan R Targan; Steven R Brant; John D Rioux; Mauro D'Amato; Rinse K Weersma; Subra Kugathasan; Anne M Griffiths; John C Mansfield; Severine Vermeire; Richard H Duerr; Mark S Silverberg; Jack Satsangi; Stefan Schreiber; Judy H Cho; Vito Annese; Hakon Hakonarson; Mark J Daly; Miles Parkes
Journal:  Nat Genet; Date: 2010-12

8.  IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

Authors:  Mingwei Dai; Jingsi Ming; Mingxuan Cai; Jin Liu; Can Yang; Xiang Wan; Zongben Xu
Journal:  Bioinformatics; Date: 2017-09-15

9.  10 Years of GWAS Discovery: Biology, Function, and Translation.

Authors:  Peter M Visscher; Naomi R Wray; Qian Zhang; Pamela Sklar; Mark I McCarthy; Matthew A Brown; Jian Yang
Journal:  Am J Hum Genet; Date: 2017-07-06

10.  Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls.

Authors:  Wellcome Trust Case Control Consortium
Journal:  Nature; Date: 2007-06-07
