Ravi Mathur, Fang Fang, Nathan Gaddis, Dana B Hancock, Michael H Cho, John E Hokanson, Laura J Bierut, Sharon M Lutz, Kendra Young, Albert V Smith, Edwin K Silverman, Grier P Page, Eric O Johnson.
Abstract
Genome-wide association studies (GWAS) have made impactful discoveries for complex diseases, often by amassing very large sample sizes. Yet, GWAS of many diseases remain underpowered, especially for non-European ancestries. One cost-effective approach to increase sample size is to combine existing cohorts, which may have limited sample size or be case-only, with public controls, but this approach is limited by the need for a large overlap in variants across genotyping arrays and the scarcity of non-European controls. We developed and validated a protocol, Genotyping Array-WGS Merge (GAWMerge), for combining genotypes from arrays and whole-genome sequencing, ensuring complete variant overlap, and allowing for diverse samples like Trans-Omics for Precision Medicine to be used. Our protocol involves phasing, imputation, and filtering. We illustrated its ability to control technology-driven artifacts and type-I error, as well as recover known disease-associated signals across technologies, independent datasets, and ancestries in smoking-related cohorts. GAWMerge enables genetic studies to leverage existing cohorts to validly increase sample size and enhance discovery for understudied traits and ancestries.
Year: 2022 PMID: 35953715 PMCID: PMC9372058 DOI: 10.1038/s42003-022-03738-6
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Fig. 1 Overview of the protocol to use whole-genome sequencing (WGS) data as public control in GWAS.
*The quality control (QC) of the case and public control data is conducted independently according to the steps outlined in the methods.
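The filtering step that ends with complete variant overlap can be pictured with a minimal sketch. It assumes each dataset has already been imputed and that every variant carries an imputation-quality score. The function name, the variant IDs, and the 0.8 cutoff are hypothetical and are not taken from the protocol itself.

```python
def shared_high_quality_variants(cases_r2, controls_r2, min_r2=0.8):
    """Return variant IDs present in both datasets that pass the
    (hypothetical) imputation-quality cutoff in each of them.

    cases_r2, controls_r2: dict mapping variant ID -> quality score.
    """
    # Intersect the two variant sets so the merged data overlap completely.
    shared = cases_r2.keys() & controls_r2.keys()
    # Keep a variant only if it is well imputed in both datasets.
    return sorted(
        v for v in shared
        if cases_r2[v] >= min_r2 and controls_r2[v] >= min_r2
    )


# Toy input: made-up variant IDs and scores, for illustration only.
array_cases = {"var1": 0.95, "var2": 0.60, "var3": 0.99}
wgs_controls = {"var1": 0.99, "var2": 0.97, "var4": 0.90}
kept = shared_high_quality_variants(array_cases, wgs_controls)
```

In this toy input only `var1` survives: `var2` is poorly imputed in the cases, and `var3` and `var4` each appear in only one dataset.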
Dataset characteristics.
| Data | Characteristic | Group | COGEND | COPDGene | ECLIPSE |
|---|---|---|---|---|---|
| Array type | | | Illumina HumanOmni2.5 | Illumina HumanOmni1-Quad_v1-0_B | Illumina HumanHap550v3.0 |
| Array-genotyped data | N, SNPs | | 2,443,179 | 1,051,295 | 561,466 |
| | Participants, total N | | 2,673 | 9,962 | 2,159 |
| | Ancestry group, N (%) | European | 1,961 (73%) | 6,664 (67%) | 2,159 (100%) |
| | | African American | 712 (27%) | 3,298 (33%) | NA |
| | Sex, N (%) | Males | 1,019 (38%) | 5,333 (54%) | 1,367 (63%) |
| | | Females | 1,654 (62%) | 4,629 (46%) | 792 (37%) |
| | COPD diagnosis, N (%) | Yes | NA | 4,280 (43%) | 1,764 (82%) |
| | | No | NA | 3,632 (36%) | 395 (14%) |
| | Age (mean ± SD) | | 36.6 ± 5.6 | 59.6 ± 9.0 | 62.2 ± 8.2 |
| WGS-genotyped data [a] | Participants, total N | | NA | 9,737 | 1,484 |
| | Ancestry group, N (%) | European | NA | 6,502 (67%) | 1,461 (98%) |
| | | African American | NA | 3,235 (33%) | 23 [b] (2%) |
| | Sex, N (%) | Males | NA | 5,213 (54%) | 933 (64%) |
| | | Females | NA | 4,524 (46%) | 528 (36%) |
| | COPD diagnosis, N (%) | Yes | NA | 4,186 (43%) | 1,271 (87%) |
| | | No | NA | 3,549 (36%) | 190 (13%) |
| | Age (mean ± SD) | | NA | 59.6 ± 9.0 | 62.7 ± 7.7 |

[a] All WGS-genotyped data are from TOPMed freeze 6a.
[b] The number of African American participants in ECLIPSE was too small, so they were excluded from the subsequent analyses.
Fig. 2 Evaluation design for GAWMerge.
Evaluation design for (a) technical comparison, (b) type-I error assessment, and (c) known GWAS hits. *The samples with European ancestry in COPDGene were divided into two subsets. EA1 includes all COPD cases plus enough COPD controls to match the COPD prevalence in ECLIPSE. EA2 contains all remaining COPD-free samples.
Fig. 3 Meta-analysis results from evaluation for type-I error.
The Manhattan plot (a) shows no signal, as expected, while the QQ-plot (b) shows no inflation.
Fig. 4 Meta-analysis results for replication of GWAS hits for COPD.
The Manhattan plot (a) shows the replicated signals, while the QQ-plot (b) shows inflation due to the true signal.
Recovery of GWAS-identified variants, following application of our protocol to each of 3 GWAS and their meta-analysis, compared to published risk loci for COPD with combined data from COPDGene, ECLIPSE, NETT/NAS, and GenKOLS (Norway)[24].
| SNP | Position | Risk allele | Related gene | Reported OR | Reported P | Current meta-analysis OR | Direction | Current meta-analysis P |
|---|---|---|---|---|---|---|---|---|
| rs12914385 | chr15:78898723 | T | | 1.36 | 2.70E-16 | 1.28 | +++ | 3.35E-16 |
| rs4416442 | chr4:89866713 | C | | 1.36 | 9.44E-15 | 1.21 | +++ | 2.66E-10 |
| rs7937 | chr19:41302706 | C | | 0.74 | 2.88E-09 | 0.84 | --- | 1.91E-08 |
| rs4846480 | chr1:218598469 | A | | 1.26 | 1.25E-07 | 1.19 | +++ | 9.37E-08 |
| rs13141641 | chr4:145506456 | T | | 1.39 | 3.66E-15 | 1.23 | ?++* | 2.64E-07 |
| rs754388 | chr14:93115410 | C | | 1.33 | 6.69E-08 | 1.12 | ?++* | 0.020 |
| rs626750 | chr11:102720945 | G | | 1.36 | 5.35E-09 | 1.14 | ?++* | 0.005 |
*A question mark ("?") indicates that the SNP was missing from the first analysis, which may reduce power in the final meta-analysis.
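To see why a missing study lowers power, consider a minimal sketch of a fixed-effect, inverse-variance meta-analysis that also builds a direction string like the one in the table. The record does not state which meta-analysis method or software was used, so this scheme is an assumption. The function name and the input effect sizes are hypothetical. Effects are log odds ratios, so OR = exp(beta).

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_meta(studies):
    """Pool per-study effects with inverse-variance weights.

    studies: list of (beta, se) tuples, or None where the SNP is missing.
    Returns (pooled_beta, pooled_se, two_sided_p, direction).
    """
    direction = ""
    weight_sum = 0.0
    weighted_beta = 0.0
    for study in studies:
        if study is None:
            direction += "?"  # SNP absent from this study
            continue
        beta, se = study
        weight = 1.0 / se ** 2  # more precise studies count more
        weight_sum += weight
        weighted_beta += weight * beta
        direction += "+" if beta > 0 else "-" if beta < 0 else "0"
    if weight_sum == 0.0:
        raise ValueError("SNP is missing from every study")
    pooled_beta = weighted_beta / weight_sum
    pooled_se = sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided normal p-value
    return pooled_beta, pooled_se, p, direction


# Made-up effects: the SNP is missing from the first of three analyses.
partial = fixed_effect_meta([None, (0.2, 0.05), (0.3, 0.1)])
# Same SNP with a (made-up) estimate available in all three analyses.
full = fixed_effect_meta([(0.22, 0.1), (0.2, 0.05), (0.3, 0.1)])
```

With one study missing, the summed weight is smaller, so the pooled standard error is larger and the same true effect yields a weaker p-value. That is the power loss the footnote describes.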