Mingwei Dai, Jin Liu, Can Yang.
Abstract
Statistical approaches for integrating multiple data sets in genome-wide association studies (GWASs) are increasingly important. Making proper use of additional relevant information is expected to improve the statistical efficiency of the analysis. Among these approaches, LEP was proposed for the joint analysis of individual-level data and summary-level data from the same population by leveraging pleiotropy. The key idea of LEP is to exploit the correlation of association status among different data sets while accounting for heterogeneity. In this commentary, we show that LEP can also be applied to integrate individual-level data and summary-level data of the same trait from different populations, providing new insights into the genetic architecture of different populations.
Keywords: Genome-wide association study; heterogeneity; integrative analysis; pleiotropy; polygenicity
Year: 2019 PMID: 31666794 PMCID: PMC6798161 DOI: 10.1177/1178222619881624
Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226
Summary of the GWAS data sets for Crohn’s disease from different populations.
| GWAS | Cases | Controls | nSNP | Ancestry | Type |
|---|---|---|---|---|---|
| WTCCC | 2,005 | 3,004 | 308,950 | England | Individual-level |
| Belgium | 537 | 913 | 953,242 | Belgium | Summary-level |
| Cedars-Sinai | 925 | 2,882 | 953,242 | USA | Summary-level |
| Early Onset | 1,689 | 6,197 | 953,242 | USA, Italy etc. | Summary-level |
| NIDDK | 956 | 982 | 953,242 | USA | Summary-level |
| German | 479 | 1,145 | 953,242 | Germany | Summary-level |
| Total (summary-level studies) | 4,586 | 12,119 | | | |
Abbreviations: GWAS, genome-wide association studies; SNP, single-nucleotide polymorphism; WTCCC, Wellcome Trust Case Control Consortium.
After extracting the SNPs shared by the individual-level data (after quality control) and the summary statistics, we had the individual-level data and a P-value matrix; their dimensions are set by the number of samples and the number of overlapping SNPs between the individual-level and summary-level data. The samples from the Cedars-Sinai Medical Center were divided into 2 studies (Cedar 1 and Cedar 2), and the samples from NIDDK were divided into a Jewish study (NiddkJ) and a non-Jewish study (NiddkNJ).
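The overlap-extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_pvalue_matrix`, the SNP IDs, the study names, and the P-values are all hypothetical, and the layout of one row per SNP and one column per summary-level study is an assumption.

```python
# Hypothetical sketch: align summary-level P-values to the SNPs of the
# individual-level data. All names and values here are illustrative only.

def build_pvalue_matrix(individual_snps, summary_studies):
    """Return (overlapping SNP IDs, P-value matrix as a list of rows).

    individual_snps: SNP IDs kept in the individual-level data after QC.
    summary_studies: dict mapping study name -> {SNP ID: P-value}.
    """
    # A SNP is kept only if every summary-level study reports it.
    shared = set(individual_snps)
    for pvalues in summary_studies.values():
        shared &= set(pvalues)

    # Preserve the SNP order of the individual-level data.
    overlap = [snp for snp in individual_snps if snp in shared]

    # Assumed layout: one row per overlapping SNP, one column per study.
    studies = list(summary_studies)
    matrix = [[summary_studies[s][snp] for s in studies] for snp in overlap]
    return overlap, matrix


# Toy input (made-up SNP IDs and P-values, not data from the studies above).
individual_snps = ["rs1", "rs2", "rs3", "rs4"]
summary_studies = {
    "study_A": {"rs1": 0.20, "rs2": 0.001, "rs4": 0.50},
    "study_B": {"rs2": 0.03, "rs4": 0.70, "rs5": 0.90},
}

overlap, matrix = build_pvalue_matrix(individual_snps, summary_studies)
print(overlap)
print(matrix)
```

In this toy input only `rs2` and `rs4` appear in the individual-level data and in both summary-level studies, so only those two SNPs are kept. In practice the genotype columns would be subset to the same SNP list so that both inputs stay aligned.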
Estimated parameters u and v when each GWAS is analyzed jointly with the WTCCC data.
| | Belge | Cedar 2 | Early Onset | Cedar 1 | NiddkJ | German | NiddkNJ |
|---|---|---|---|---|---|---|---|
| u | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| v | 0.713 | 0.5698 | 0.9704 | 0.8954 | 0.8953 | 0.9168 | 0.8834 |
| Accuracy | 63.85% ± 0.54% | 63.50% ± 0.59% | 66.30% ± 0.54% | 64.26% ± 0.52% | 63.54% ± 0.40% | 64.08% ± 0.55% | 64.09% ± 0.43% |
Abbreviation: GWAS, genome-wide association studies.
Accuracy is calculated from 10 replications.