Yu Jiang, Sai Chen, Xingyan Wang, Mengzhen Liu, William G Iacono, John K Hewitt, John E Hokanson, Kenneth Krauter, Markku Laakso, Kevin W Li, Sharon M Lutz, Matthew McGue, Anita Pandit, Gregory J M Zajac, Michael Boehnke, Goncalo R Abecasis, Scott I Vrieze, Bibo Jiang, Xiaowei Zhan, Dajiang J Liu.
Abstract
There is great interest in understanding the impact of rare variants in human diseases using large sequence datasets. In deep sequence datasets of >10,000 samples, ~10% of the variant sites are observed to be multi-allelic. Many of the multi-allelic variants have been shown to be functional and disease-relevant. Proper analysis of multi-allelic variants is critical to the success of a sequencing study, but existing methods do not properly handle multi-allelic variants and can produce highly misleading association results. We discuss practical issues and methods to encode multi-allelic sites, conduct single-variant and gene-level association analyses, and perform meta-analysis for multi-allelic variants. We evaluated these methods through extensive simulations and the study of a large meta-analysis of ~18,000 samples on the cigarettes-per-day phenotype. We showed that our joint modeling approach provided an unbiased estimate of genetic effects, greatly improved the power of single-variant association tests among methods that can properly estimate allele effects, and enhanced gene-level tests over existing approaches. Software packages implementing these methods are available online.
Keywords: GWAS; meta-analysis; multi-allelic variants; smoking
Year: 2020 PMID: 32466134 PMCID: PMC7288273 DOI: 10.3390/genes11050586
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Genotype coding for multi-allelic variant with two different alternative alleles.
| Genotypes | Genotype Coding |
|---|---|
| | (0,0) |
| | (1,0) |
| | (2,0) |
| | (1,1) |
| | (0,2) |
| | (0,1) |
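The coding in the table above can be read as a pair of dosages, one per alternative allele. The sketch below illustrates that reading under stated assumptions: genotypes are written VCF-style (`0` for the reference allele, `1` and `2` for the two alternative alleles), and each coding entry counts the copies of the corresponding alternative allele. The function name `encode_genotype` is hypothetical and is not taken from the authors' software.

```python
from collections import Counter
from typing import Tuple


def encode_genotype(gt: str, n_alt: int = 2) -> Tuple[int, ...]:
    """Count the copies of each alternative allele in one genotype string.

    `gt` is assumed to look like a VCF GT field, e.g. "0/1" or "1|2",
    where 0 is the reference allele and 1..n_alt are alternative alleles.
    """
    alleles = [int(a) for a in gt.replace("|", "/").split("/")]
    if any(a < 0 or a > n_alt for a in alleles):
        raise ValueError(f"allele index out of range for {n_alt} alt alleles")
    counts = Counter(alleles)
    # One dosage per alternative allele; the reference allele stays implicit.
    return tuple(counts.get(k, 0) for k in range(1, n_alt + 1))


# The six unordered diploid genotypes at a site with two alternative alleles.
for gt in ["0/0", "0/1", "1/1", "1/2", "2/2", "0/2"]:
    print(gt, "->", encode_genotype(gt))
```

Under this assumed reading, the six genotypes map onto the six coding pairs listed in the table, in the same order.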
The power for single-variant association analysis. We compared the power of single allelic analysis and joint multi-allelic analysis for detecting associations with each alternative allele. The power was evaluated under the threshold of , adjusting for the increased multiple testing burden for analyzing multiple alleles.
| Sample Size | Genetic Effects | Single Allelic Analysis | Multi-Allelic Analysis |
|---|---|---|---|
| | | | |
| 10,000 | 0 | 4.7 × 10⁻⁸ | 4.2 × 10⁻⁸ |
| 10,000 | 0.1 | 0.24 | 0.25 |
| 10,000 | 0.25 | 0.57 | 0.57 |
| 10,000 | 0.5 | 0.75 | 0.76 |
| 20,000 | 0 | 4.6 × 10⁻⁸ | 4.2 × 10⁻⁸ |
| 20,000 | 0.1 | 0.36 | 0.37 |
| 20,000 | 0.25 | 0.67 | 0.68 |
| 20,000 | 0.5 | 0.82 | 0.82 |
| | | | |
| 10,000 | 0 | 4.1 × 10⁻⁸ | 4.8 × 10⁻⁸ |
| 10,000 | 0.1 | 0.037 | 0.056 |
| 10,000 | 0.25 | 0.24 | 0.3 |
| 10,000 | 0.5 | 0.48 | 0.55 |
| 20,000 | 0 | 4.9 × 10⁻⁸ | 4.3 × 10⁻⁸ |
| 20,000 | 0.1 | 0.087 | 0.12 |
| 20,000 | 0.25 | 0.36 | 0.43 |
| 20,000 | 0.5 | 0.6 | 0.66 |
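To make the contrast between single-allelic and joint multi-allelic analysis concrete, here is a minimal sketch on a toy, noise-free dataset. It is an illustration under assumptions, not the authors' implementation: the joint model is taken to be a linear regression of the phenotype on the dosages of both alternative alleles, and the single-allelic analysis is taken to regress on one allele's dosage alone, so carriers of the other alternative allele are lumped with the reference. The helper names `solve` and `ols`, the toy cohort, and the effect sizes are all hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    p = len(x[0])
    xtx = [[sum(r[j] * r[k] for r in x) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(x, y)) for j in range(p)]
    return solve(xtx, xty)


# Toy cohort: (dosage of alt allele 1, dosage of alt allele 2) per sample.
dosages = [(0, 0)] * 4 + [(1, 0)] * 3 + [(2, 0), (1, 1), (0, 2)] + [(0, 1)] * 3
g1 = [d[0] for d in dosages]
g2 = [d[1] for d in dosages]
# Hypothetical phenotype: allele 1 has no effect, allele 2 has effect 0.5.
y = [0.0 * a + 0.5 * b for a, b in dosages]

joint = ols([g1, g2], y)  # intercept, beta_1, beta_2
single = ols([g1], y)     # intercept, marginal slope for allele 1

print("joint  beta_1, beta_2:", round(joint[1], 6), round(joint[2], 6))
print("single beta_1        :", round(single[1], 6))
```

In this toy setting the joint fit recovers both effects, while the marginal slope for allele 1 comes out negative even though its true effect is zero, because the two dosages are negatively correlated within a site.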
The Type I Error and Power for Gene-level Association Tests. We compared the power for simple burden, SKAT, and VT tests for the joint multi-allelic analysis and the analysis that discards multi-allelic sites. The power and type I error were assessed under a threshold of using 1,000,000 replicates.
| MAF Cutoff | Pct of Causal Variants | Joint Multi-allelic Analysis (Burden/SKAT/VT) | Discard Multi-allelic Sites (Burden/SKAT/VT) |
|---|---|---|---|
| Type I Error | | | |
| 0.01 | 0% | 2.6 × 10⁻⁶/2.1 × 10⁻⁶/3.0 × 10⁻⁶ | 2.5 × 10⁻⁶/3.1 × 10⁻⁶/2.6 × 10⁻⁶ |
| 0.05 | 0% | 2.5 × 10⁻⁶/2.3 × 10⁻⁶/2.3 × 10⁻⁶ | 3.0 × 10⁻⁶/2.1 × 10⁻⁶/2.7 × 10⁻⁶ |
| Power: Causal Variants Have Uni-directional Effects | | | |
| 0.01 | 20% | 0.50/0.39/0.68 | 0.42/0.35/0.61 |
| 0.01 | 50% | 0.93/0.79/0.99 | 0.90/0.77/0.98 |
| 0.05 | 20% | 0.42/0.39/0.71 | 0.37/0.36/0.64 |
| 0.05 | 50% | 0.88/0.80/0.99 | 0.87/0.79/0.99 |
| Power: Causal Variants Have Bi-directional Effects | | | |
| 0.01 | 20% | 0.06/0.16/0.13 | 0.05/0.13/0.11 |
| 0.01 | 50% | 0.14/0.44/0.31 | 0.14/0.40/0.29 |
| 0.05 | 20% | 0.05/0.17/0.14 | 0.05/0.15/0.12 |
| 0.05 | 50% | 0.12/0.42/0.30 | 0.12/0.40/0.30 |
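The two strategies compared in the table above differ in which alleles enter the gene-level statistic. The sketch below shows that difference for the simplest case, a burden score that counts rare alternative alleles per sample. It is a hedged illustration only: the data layout, the function names `allele_freq` and `burden_scores`, and the use of alternative allele frequency as the rarity filter are assumptions, and the toy cutoff of 0.1 is chosen only to suit the tiny sample.

```python
def allele_freq(dosages):
    """Alternative allele frequency from diploid dosages (0, 1 or 2 per sample)."""
    return sum(dosages) / (2 * len(dosages))


def burden_scores(sites, maf_cutoff, keep_multiallelic=True):
    """Per-sample count of rare alternative alleles across the sites of a gene.

    `sites` is a list of sites; each site is a list of dosage vectors,
    one per alternative allele, so a multi-allelic site has more than one.
    """
    n = len(sites[0][0])
    scores = [0] * n
    for alt_dosages in sites:
        if len(alt_dosages) > 1 and not keep_multiallelic:
            continue  # mimic the "discard multi-allelic sites" strategy
        for dosages in alt_dosages:
            # Each alternative allele is filtered on its own frequency.
            if allele_freq(dosages) < maf_cutoff:
                scores = [s + d for s, d in zip(scores, dosages)]
    return scores


# Toy gene with one bi-allelic site and one multi-allelic site (8 samples).
site_a = [[0, 1, 0, 0, 0, 0, 0, 0]]                            # one rare allele
site_b = [[1, 0, 2, 1, 0, 1, 1, 0], [0, 0, 0, 0, 1, 0, 0, 0]]  # common + rare
gene = [site_a, site_b]

joint_scores = burden_scores(gene, maf_cutoff=0.1, keep_multiallelic=True)
discard_scores = burden_scores(gene, maf_cutoff=0.1, keep_multiallelic=False)
print("joint  :", joint_scores)
print("discard:", discard_scores)
```

Keeping the multi-allelic site lets its rare allele contribute to the score even though the other allele at the same position is common; discarding the site loses that carrier. A burden test would then regress the phenotype on these scores.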
Top Single-Variant Association Signals for the Cigarettes-Per-Day Phenotype Using Multi-Allelic Analysis. Results are shown for variants with p-values less than 5 × 10⁻⁸. We report the p-values and genetic effect estimates for each alternative allele at multi-allelic sites. As a comparison, we also report the p-values and test statistics from single-allelic analysis, as well as the omnibus test that collapses multiple alleles.
| Position | Ref Allele | Alt Allele | Alt Allele Freq | P | β | β SD | N | Direction of Effects* | Anno | Stat Single-Allelic Analysis | P Single-Allelic Analysis | P Omnibus Test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15:78915370 | CT | C | 0.41 | 1.6 × 10⁻¹¹ | 0.078 | 0.012 | 17,512 | -+++++++ | Intergenic | 44.67 | 2.3 × 10⁻¹¹ | 1.0 × 10⁻¹⁰ |
| 15:78915370 | CT | CTTT | 0.019 | 0.61 | 0.022 | 0.044 | 17,512 | -++++-+- | Intergenic | 0.34 | 0.55 | |
| 15:78859605 | AAAAAG | A | 0.33 | 2.3 × 10⁻¹¹ | 0.079 | 0.012 | 17,512 | -+++++++ | Deletion; CHRNA5 | 43.74 | 3.8 × 10⁻¹¹ | 5.1 × 10⁻¹⁰ |
| 15:78859605 | A | G | 0.00077 | 0.38 | 0.29 | 0.33 | 17,512 | +++-+-+- | Intron; CHRNA5 | 0.091 | 0.76 | |
| 15:78913353 | CGCGGGCGG | C | 0.47 | 2.4 × 10⁻⁹ | 0.072 | 0.012 | 17,512 | -+++++++ | Deletion; CHRNA3 | 33.31 | 7.8 × 10⁻⁹ | 2.8 × 10⁻⁷ |
| 15:78913353 | CGCGGGCGG | CGCGGGCGGGCGG | 0.033 | 0.10 | −0.057 | 0.035 | 17,512 | +--+---- | Insertion; CHRNA3 | 0.75 | 0.38 | |
| 15:78785944 | AT | ATT | 0.29 | 7.7 × 10⁻⁹ | 0.079 | 0.014 | 17,512 | -+++++++ | Insertion; IREB2 | 31.81 | 1.7 × 10⁻⁸ | 1.5 × 10⁻⁴ |
| 15:78785944 | AT | A | 0.18 | 0.71 | 0.0056 | 0.016 | 17,512 | +------+ | Deletion; IREB2 | 0.00057 | 0.98 | |
| 15:78871382 | CT | CTT | 0.40 | 1.4 × 10⁻⁸ | 0.080 | 0.014 | 13,723 | XX++++++ | Insertion; CHRNA5 | 27.81 | 1.3 × 10⁻⁷ | 6.2 × 10⁻⁷ |
| 15:78871382 | CT | C | 0.070 | 0.37 | 0.022 | 0.025 | 17,512 | -+-----+ | Deletion; CHRNA5 | 0.29 | 0.58 | |
| 15:78751667 | G | GTTTTTTGTTTGTTTGT | 0.29 | 1.6 × 10⁻⁸ | 0.071 | 0.013 | 17,512 | -+++++++ | Insertion; IREB2 | 22.25 | 2.4 × 10⁻⁶ | 1.1 × 10⁻⁷ |
| 15:78751667 | G | GTTTTTTTGTTTGTTTG | 0.0019 | 0.97 | 0.0048 | 0.14 | 17,512 | ++--+--- | Insertion; IREB2 | 0.065 | 0.80 | |
*: “+” and “-” denote the direction of the relationship between the alternative allele and the cigarettes-per-day phenotype.
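As a sanity check when reading the table above, the single-allelic test statistics can be converted to p-values. The sketch below assumes those statistics follow a chi-square distribution with one degree of freedom under the null; that is an assumption consistent with the reported numbers, not something stated in the table. The function name `chi2_1df_pvalue` is hypothetical.

```python
import math


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    # For Z ~ N(0, 1): P(Z**2 > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Single-allelic statistics taken from three rows of the table.
for stat in (44.67, 33.31, 22.25):
    print(f"{stat:6.2f} -> {chi2_1df_pvalue(stat):.1e}")
```

Using `math.erfc` rather than `1 - cdf` keeps precision in the far tail, where genome-wide significant p-values live. Small differences from the tabulated p-values are expected because the statistics are rounded to two decimals.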
Top Gene-level Association Signals for Genes with Multi-allelic Sites. We performed simple burden, SKAT, and VT tests under the two different minor allele frequency cutoffs, 0.01 and 0.05. No results were significant under the threshold . For each rare-variant test performed, we show the test statistics, p-values, the number of variant sites, and the number of multi-allelic variant sites for the top 3 signals.
| Gene | Statistic | P-value | Number of Variant Sites | Number of Multi-allelic Sites | Number of Multi-allelic Sites with Rare Variant | Gene | Statistic | P-value | Number of Variant Sites | Number of Multi-allelic Sites | Number of Multi-allelic Sites with Rare Variant |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Burden Test with MAF < 1% | | | | | | Burden Test with MAF < 5% | | | | | |
| MLKL | 16.81 | 4.1 × 10⁻⁵ | 16 | 8 | 4 | PTPN22 | 13.49 | 0.00024 | 19 | 10 | 3 |
| DMBX1 | 13.11 | 0.00029 | 4 | 2 | 2 | CROCC | 11.14 | 0.00085 | 80 | 11 | 8 |
| BRD3 | 10.73 | 0.0011 | 5 | 6 | 4 | HLA-DQA1 | 10.25 | 0.0014 | 6 | 15 | 0 |
| SKAT Test with MAF < 1% | | | | | | SKAT Test with MAF < 5% | | | | | |
| ABTB1 | 1,654,137.32 | 5.5 × 10⁻⁵ | 12 | 6 | 5 | ABTB1 | 1,834,395.76 | 0.00015 | 12 | 6 | 5 |
| SEMA7A | 1,056,004.78 | 0.00032 | 5 | 2 | 1 | DTNBP1 | 4,098,263.16 | 0.00049 | 7 | 6 | 5 |
| METTL8 | 1,075,541.99 | 0.00036 | 6 | 15 | 8 | NRBF2 | 1,454,883.06 | 0.00074 | 3 | 4 | 2 |
| VT Test with MAF < 1% | | | | | | VT Test with MAF < 5% | | | | | |
| TTC15 | 21.98 | 1.9 × 10⁻⁵ | 17 | 23 | 9 | TTC15 | 21.98 | 2.46 × 10⁻⁵ | 17 | 23 | 9 |
| MLKL | 16.81 | 0.00031 | 16 | 8 | 4 | WNK1 | 16.11 | 0.00049 | 41 | 25 | 8 |
| ARHGEF40 | 15.08 | 0.00078 | 26 | 4 | 2 | MLKL | 15.71 | 0.00068 | 16 | 8 | 4 |