George Kanoungi, Michael Nothnagel, Tim Becker, Dmitriy Drichel.
Abstract
Region-based genome-wide scans are usually performed by use of a priori chosen analysis regions. Such an approach will likely miss the region comprising the strongest signal and, thus, may result in increased type II error rates and decreased power. Here, we propose a genomic exhaustive scan approach that analyzes all possible subsequences and does not rely on a prior definition of the analysis regions. As a prime instance, we present a computationally ultraefficient implementation using the rare-variant collapsing test for phenotypic association, the genomic exhaustive collapsing scan (GECS). Our implementation allows for the identification of regions comprising the strongest signals in large, genome-wide rare-variant association studies while controlling the family-wise error rate via permutation. Application of GECS to two genomic data sets revealed several novel significantly associated regions for age-related macular degeneration and for schizophrenia. Our approach also offers a high potential to improve genome-wide scans for selection, methylation, and other analyses.
Year: 2020 PMID: 32415273 PMCID: PMC7608423 DOI: 10.1038/s41431-020-0639-3
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
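The exhaustive scan described in the abstract can be illustrated with a deliberately naive sketch. It enumerates every contiguous window of rare variants, collapses each window into a carrier/non-carrier indicator, and tests that indicator against case status with a 1-df Pearson chi-square test. This is not the authors' GECS implementation, which the abstract describes as computationally ultraefficient; the function names, the choice of test statistic, and the toy data are assumptions made purely for illustration.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def collapsing_pvalue(carrier, is_case):
    """Pearson chi-square p-value for the 2x2 table carrier status x case status."""
    n = len(carrier)
    a = sum(1 for c, y in zip(carrier, is_case) if c and y)          # carrier cases
    b = sum(1 for c, y in zip(carrier, is_case) if c and not y)      # carrier controls
    c_ = sum(1 for c, y in zip(carrier, is_case) if not c and y)     # non-carrier cases
    d = n - a - b - c_                                               # non-carrier controls
    margins = (a + b, c_ + d, a + c_, b + d)
    if 0 in margins:
        return 1.0  # degenerate table: no evidence either way
    stat = n * (a * d - b * c_) ** 2 / math.prod(margins)
    return chi2_sf_1df(stat)


def exhaustive_scan(genotypes, is_case):
    """Test every contiguous variant window [start, end] and return the best one.

    genotypes[i][j] is the minor-allele count of individual i at rare variant j.
    Returns (smallest p-value, start index, end index).
    """
    n_ind, n_var = len(genotypes), len(genotypes[0])
    best = (1.0, None, None)
    for start in range(n_var):
        carrier = [False] * n_ind
        for end in range(start, n_var):
            # Extending the window by one variant only adds carriers.
            for i in range(n_ind):
                if genotypes[i][end] > 0:
                    carrier[i] = True
            p = collapsing_pvalue(carrier, is_case)
            if p < best[0]:
                best = (p, start, end)
    return best


# Toy data (hypothetical): 4 cases, 4 controls, 4 rare variants.
toy_genotypes = [
    [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0],  # cases
    [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0],  # controls
]
toy_is_case = [True] * 4 + [False] * 4
best_p, best_start, best_end = exhaustive_scan(toy_genotypes, toy_is_case)
```

In the toy data the two middle variants are carried only by cases, so the scan selects the window spanning exactly those two variants. A fixed, a priori window covering all four variants would dilute that signal with the two variants seen only in controls, which is the motivation the abstract gives for scanning all subsequences.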
Fig. 1. Workflow chart of the entire simulation study illustrating the major steps and procedures. See main text for details.
Empirical, sample-size dependent significance thresholds (α, with control of the FWER at 5%) for simulated genome-wide studies.
| Sample size | Number of replications | SMA | GECS, 3 MAF | GECS, MAF | GECS, MAF | GECS, MAF |
|---|---|---|---|---|---|---|
| 1000 | 1000 | 2.95 × 10⁻⁸ | 7.35 × 10⁻¹⁰ | 3.61 × 10⁻⁹ | 1.73 × 10⁻⁹ | 1.60 × 10⁻⁹ |
| 5000 | 1000 | 1.86 × 10⁻⁸ | 3.31 × 10⁻¹⁰ | 1.26 × 10⁻⁹ | 8.92 × 10⁻¹⁰ | 8.49 × 10⁻¹⁰ |
| 10,000 | 1000 | 1.27 × 10⁻⁸ | 2.81 × 10⁻¹⁰ | 1.05 × 10⁻⁹ | 7.13 × 10⁻¹⁰ | 6.91 × 10⁻¹⁰ |
| 20,000 | 500 | 1.15 × 10⁻⁸ | 2.59 × 10⁻¹⁰ | 9.28 × 10⁻¹⁰ | 6.36 × 10⁻¹⁰ | 6.01 × 10⁻¹⁰ |
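Empirical thresholds of this kind are commonly derived with a min-p procedure: for each null replicate (for example, a permutation of case/control labels), record the smallest p-value of the whole scan, then take the α-quantile of those minima as the significance threshold. The sketch below shows that idea; the exact quantile rule used by the authors is not stated here, and `scan` is a hypothetical callable standing in for any genome-wide scan that returns its smallest p-value.

```python
import random


def permutation_min_pvalues(scan, genotypes, is_case, n_perm, seed=0):
    """Smallest scan p-value for each of n_perm random label permutations.

    `scan(genotypes, labels)` is a hypothetical callable that runs the full
    scan and returns its minimum p-value.
    """
    rng = random.Random(seed)
    labels = list(is_case)  # copy so the caller's labels stay untouched
    minima = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any genotype-phenotype association
        minima.append(scan(genotypes, labels))
    return minima


def fwer_threshold(null_minima, alpha=0.05):
    """Largest observed null minimum that keeps the empirical FWER at or below alpha."""
    ordered = sorted(null_minima)
    k = int(alpha * len(ordered))  # replicates allowed at or below the threshold
    if k == 0:
        raise ValueError("too few null replicates for the requested alpha")
    return ordered[k - 1]
```

With 1000 replicates and α = 0.05, `fwer_threshold` returns the 50th smallest null minimum, so (ignoring ties) exactly 5% of null replicates contain at least one p-value at or below the threshold. Because an exhaustive scan evaluates far more overlapping windows than a fixed-window analysis, its null minima are smaller and the resulting threshold is stricter, consistent with the GECS columns lying below the SMA column in the table.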
Fig. 2. Comparative power analysis for a rare disease (prevalence K = 0.01) and small sample size (N = 1000).
Results are given for studies with proportion of neutral rare variants (PNV) = 0.3, different simulated window sizes (x-axis), and different proportions of detrimental rare variants (PDV) (y-axis). Black lines: GECS; red lines: SMA. In each grid cell, the power is presented on the y-axis and OR intervals on the x-axis. For an overview see Table S20.
Fig. 3. Comparative power analysis for a rare disease (prevalence K = 0.01) and moderate sample size (N = 10,000).
Results are given for studies with proportion of neutral rare variants (PNV) = 0.3, different simulated window sizes (x-axis), and different proportions of detrimental rare variants (PDV) (y-axis). Black lines: GECS; red lines: SMA. In each grid cell, the power is presented on the y-axis and OR intervals on the x-axis. For an overview see Table S20.
Significance thresholds (α, with control of the FWER at 5%) for the whole-genome, imputed AAMD data set, and the whole-exome SCZD data set.
| Data set | SMA | GECS, 3 MAF | GECS, MAF | GECS, MAF | GECS, MAF |
|---|---|---|---|---|---|
| AAMD | 1.81 × 10⁻⁸ | 1.43 × 10⁻⁹ | 7.42 × 10⁻⁹ | 2.80 × 10⁻⁹ | 2.54 × 10⁻⁹ |
| SCZD | 8.32 × 10⁻⁷ | 1.87 × 10⁻⁸ | 4.35 × 10⁻⁸ | 3.59 × 10⁻⁸ | 2.84 × 10⁻⁸ |
A selection of bins with the locally most significant association signals in AAMD and SCZD data sets, detected by GECS and verified by SKAT.
| Data set | Chr. | Bin start (hg19) | Bin end (hg19) | Gene | MAF | #RVs | OR [95% CI] | p | p′ |
|---|---|---|---|---|---|---|---|---|---|
| AAMD | 6 | 31,935,392 | 31,937,762 | | 0.03 | 28 | 0.55 [0.52, 0.59] | 6.24 × 10⁻⁷⁶ | 4.74 × 10⁻⁸¹ |
| | 6 | 31,878,006 | 31,878,721 | | 0.05 | 5 | 0.53 [0.50, 0.57] | 3.78 × 10⁻⁸⁰ | 1.23 × 10⁻⁷⁰ |
| | 10 | 124,226,492 | 124,249,185 | | 0.05 | 64 | 0.62 [0.59, 0.65] | 2.09 × 10⁻⁸⁴ | 2.96 × 10⁻³⁰ |
| | 19 | 6,718,146 | 6,718,155 | | 0.03 | 2 | 2.98 [2.42, 3.69] | 7.87 × 10⁻²⁷ | 6.30 × 10⁻²⁸ |
| | 6 | 31,473,707 | 31,474,883 | | 0.05 | 6 | 1.27 [1.19, 1.38] | 1.71 × 10⁻¹⁰ | 2.08 × 10⁻¹³ |
| | 6 | 31,323,455 | 31,323,745 | | 0.05 | 12 | 1.18 [1.12, 1.24] | 3.48 × 10⁻¹⁰ | 2.76 × 10⁻¹¹ |
| | 4 | 110,685,721 | 110,685,820 | | 0.01 | 5 | 3.42 [2.34, 5.04] | 2.15 × 10⁻¹¹ | 7.03 × 10⁻¹⁰ |
| | 6 | 31,373,445 | 31,373,957 | | 0.05 | 9 | 1.29 [1.20, 1.39] | 5.74 × 10⁻¹² | 1.03 × 10⁻⁹ |
| | 5 | 39,199,134 | 39,199,134 | | 0.03 | 1 | 1.75 [1.47, 2.08] | 2.40 × 10⁻¹⁰ | 1.70 × 10⁻¹⁰ |
| | 5 | 39,327,884 | 39,327,888 | | 0.03 | 2 | 1.75 [1.48, 2.03] | 4.58 × 10⁻¹² | 1.28 × 10⁻¹¹ |
| | 9 | 33,796,672 | 33,798,630 | | 0.05 | 20 | 1.37 [1.24, 1.52] | 5.07 × 10⁻¹⁰ | 3.89 × 10⁻¹¹ |
| | 15 | 73,044,829 | 73,044,833 | | 0.03 | 2 | 0.42 [0.36, 0.52] | 1.17 × 10⁻¹⁹ | 4.30 × 10⁻²⁰ |
| | 17 | 49,239,143 | 49,239,143 | | 0.01 | 1 | 0.13 [0.06, 0.29] | 2.72 × 10⁻⁹ | 5.58 × 10⁻¹¹ |
| | 19 | 8,999,386 | 9,028,410 | | 0.03 | 62 | 1.29 [1.19, 1.40] | 2.59 × 10⁻⁹ | 3.10 × 10⁻¹⁰ |
| | 22 | 17,687,954 | 17,688,129 | | 0.01 | 9 | 0.27 [0.19, 0.40] | 3.79 × 10⁻¹³ | 6.77 × 10⁻¹⁴ |
Each bin is the most significant signal in the block of all overlapping significant bins detected by GECS. These bins are verified by SKAT, adjusted for sex, age, ten principal components, and common variants in physical proximity, if available (p′ values). For verification with SKAT, we set the threshold at 5 × 10⁻⁸ for AAMD and 2 × 10⁻⁶ for SCZD. See Supplementary Material for more comprehensive results.
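For readers who want to relate the OR [95% CI] column to raw counts, the following sketch computes an odds ratio with a Wald confidence interval from a 2×2 table of collapsed carrier status by case status. It is a generic textbook estimator shown under assumptions: the record does not state which estimator the authors used, and the counts in the usage line are hypothetical, not taken from the study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: carrier cases, b: carrier controls,
    c: non-carrier cases, d: non-carrier controls.
    The default z is the standard normal quantile for a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for one bin: 20 carrier cases, 10 carrier controls,
# 80 non-carrier cases, 90 non-carrier controls.
example_or, example_lower, example_upper = odds_ratio_wald_ci(20, 10, 80, 90)
```

An interval that excludes 1 indicates that carrying at least one rare variant in the bin is associated with increased (OR > 1) or decreased (OR < 1) disease odds at the chosen confidence level.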