Yingjie Guo, Honghong Cheng, Zhian Yuan, Zhen Liang, Yang Wang, Debing Du.
Abstract
Unexplained genetic variation that causes complex diseases is often induced by gene-gene interactions (GGIs). Gene-based methods are among the current statistical approaches for discovering GGIs in case-control genome-wide association studies; they are both statistically powerful and biologically interpretable. However, most approaches make assumptions about the form of GGIs, which results in poor statistical performance. We therefore propose a gene-based test built on the maximal neighborhood coefficient (MNC), called gene-based gene-gene interaction through a maximal neighborhood coefficient (GBMNC). MNC is a metric for capturing a wide range of relationships between two random vectors with arbitrary, not necessarily equal, dimensions. We established a statistic that uses the difference in MNC between case and control samples as an indication of the existence of GGIs, based on the assumption that the joint distribution of two genes in cases and controls should not be substantially different if there is no interaction between them. We then used a permutation-based statistical test to evaluate this statistic and calculate a p-value representing the significance of the interaction. Experimental results using both simulated and real data showed that our approach outperformed earlier methods for detecting GGIs.
Keywords: gene-based testing; gene-gene interactions; genome-wide association studies; maximal neighborhood coefficient; qualitative traits
Year: 2021 PMID: 34956337 PMCID: PMC8693929 DOI: 10.3389/fgene.2021.801261
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. Illustration of the gene-based gene-gene interaction through a maximal neighborhood coefficient (GBMNC) workflow for detecting gene-based gene-gene interactions.
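The workflow described in the abstract (a case-versus-control difference in a dependence measure, assessed by permuting the case-control labels) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `dependence` function is a simple stand-in (mean squared cross-correlation between SNP columns), not the MNC, and the function names, the absolute-difference form of the statistic, and the add-one p-value correction are all assumptions made for the example.

```python
import numpy as np


def dependence(x, y):
    """Stand-in dependence score between two genotype matrices.

    NOT the MNC: this is the mean squared Pearson correlation between
    every column of x and every column of y (an assumption for the sketch).
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    sx = xc.std(axis=0)
    sy = yc.std(axis=0)
    sx[sx == 0] = 1.0  # guard against monomorphic SNP columns
    sy[sy == 0] = 1.0
    corr = (xc / sx).T @ (yc / sy) / x.shape[0]
    return float(np.mean(corr ** 2))


def interaction_statistic(g1, g2, labels):
    """Absolute difference of the dependence score in cases vs. controls."""
    cases = labels == 1
    return abs(dependence(g1[cases], g2[cases])
               - dependence(g1[~cases], g2[~cases]))


def permutation_p_value(g1, g2, labels, n_perm=200, seed=0):
    """Permute case-control labels to build a null distribution."""
    rng = np.random.default_rng(seed)
    observed = interaction_statistic(g1, g2, labels)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(labels)
        if interaction_statistic(g1, g2, shuffled) >= observed:
            exceed += 1
    # Add-one correction (an assumption): the p-value is never exactly 0.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    gene1 = rng.integers(0, 3, size=(400, 5)).astype(float)  # 5 SNPs
    gene2 = rng.integers(0, 3, size=(400, 8)).astype(float)  # 8 SNPs
    status = np.array([0] * 200 + [1] * 200)                 # 0 = control
    print(permutation_p_value(gene1, gene2, status))
```

The two genes may have different numbers of SNPs, which mirrors the requirement that the dependence measure accept vectors of unequal dimension. Swapping in a real MNC implementation would only require replacing `dependence`.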
Table of odds for the no effect model without interaction between a pair of SNPs.
| | AA | Aa | aa |
|---|---|---|---|
Type-I error for KCCU, GBIGM, AGGrEGATOr, and GBMNC when varying the sample size from 1,000 to 5,000.
| Methods | 1,000 | 2,000 | 3,000 | 4,000 | 5,000 |
|---|---|---|---|---|---|
| | 0.02 | 0.02 | 0.01 | 0.05 | 0.07 |
| | 0.13 | 0.06 | 0.07 | 0.07 | 0.07 |
| | 0.05 | 0.06 | 0.07 | 0.04 | 0.02 |
| | 0.02 | 0.05 | 0.07 | 0.05 | 0.05 |
The statistical power of simulation studies for GBMNC, AGGrEGATOr, KCCU, and GBIGM under 10 heritability-MAF combinations, with heritability $\in \{0.01, 0.025, 0.05, 0.1, 0.2\}$ and MAF $\in \{0.2, 0.4\}$. Each heritability-MAF combination has five models. Bold font indicates the method that performed best under each model.
| MAF | Heritability | Method | M1 | M2 | M3 | M4 | M5 |
|---|---|---|---|---|---|---|---|
| 0.2 | 0.01 | GBMNC | 0.13 | 0.40 | 0.68 | 0.72 | 0.89 |
| | | AGGrEGATOr | 0.12 | 0.12 | 0.89 | 0.89 | 1 |
| | | KCCU | 0.15 | 0.09 | 0.29 | 0.43 | 0.62 |
| | | GBIGM | 0.09 | 0.11 | 0.13 | 0.11 | 0.08 |
| | 0.025 | GBMNC | 0.95 | 0.75 | 1 | 0.96 | 1 |
| | | AGGrEGATOr | 1 | 0.27 | 1 | 0.37 | 1 |
| | | KCCU | 0.58 | 0.09 | 0.74 | 0.24 | 0.8 |
| | | GBIGM | 0.08 | 0.07 | 0.11 | 0.13 | 0.2 |
| | 0.05 | GBMNC | 0.68 | 0.83 | 0.94 | 1 | 1 |
| | | AGGrEGATOr | 0.09 | 0.59 | 0.89 | 1 | 1 |
| | | KCCU | 0.13 | 0.57 | 0.65 | 0.84 | 0.85 |
| | | GBIGM | 0.18 | 0.08 | 0.22 | 0.17 | 0.19 |
| | 0.1 | GBMNC | 1 | 1 | 1 | 1 | 1 |
| | | AGGrEGATOr | 1 | 1 | 1 | 1 | 1 |
| | | KCCU | 0.81 | 0.93 | 0.9 | 0.86 | 0.91 |
| | | GBIGM | 0.15 | 0.14 | 0.23 | 0.16 | 0.16 |
| | 0.2 | GBMNC | 1 | 1 | 1 | 1 | 1 |
| | | AGGrEGATOr | 1 | 1 | 1 | 1 | 1 |
| | | KCCU | 0.89 | 0.97 | 0.94 | 0.89 | 0.97 |
| | | GBIGM | 0.19 | 0.31 | 0.18 | 0.22 | 0.21 |
| 0.4 | 0.01 | GBMNC | 0.75 | 0.66 | 0.82 | 0.90 | 0.96 |
| | | AGGrEGATOr | 0.71 | 0.09 | 0.1 | 0.94 | 0.96 |
| | | KCCU | 0.34 | 0.05 | 0.08 | 0.77 | 0.29 |
| | | GBIGM | 0.09 | 0.08 | 0.1 | 0.11 | 0.07 |
| | 0.025 | GBMNC | 1 | 0.73 | 0.85 | 0.93 | 0.80 |
| | | AGGrEGATOr | 0.99 | 0.56 | 0.12 | 0.91 | 0.26 |
| | | KCCU | 0.58 | 0.24 | 0.08 | 0.24 | 0.11 |
| | | GBIGM | 0.15 | 0.12 | 0.14 | 0.11 | 0.09 |
| | 0.05 | GBMNC | 1 | 1 | 1 | 0.68 | 0.86 |
| | | AGGrEGATOr | 1 | 0.97 | 0.91 | 0.35 | 0.42 |
| | | KCCU | 0.86 | 0.9 | 0.95 | 0.41 | 0.37 |
| | | GBIGM | 0.11 | 0.12 | 0.09 | 0.08 | 0.10 |
| | 0.1 | GBMNC | 1 | 1 | 1 | 0.63 | 1 |
| | | AGGrEGATOr | 0.98 | 1 | 0.96 | 0.27 | 1 |
| | | KCCU | 0.62 | 1 | 0.95 | 0.41 | 1 |
| | | GBIGM | 0.12 | 0.19 | 0.18 | 0.26 | 0.20 |
| | 0.2 | GBMNC | 1 | 1 | 1 | 1 | 1 |
| | | AGGrEGATOr | 0.93 | 1 | 0.99 | 1 | 0.80 |
| | | KCCU | 0.28 | 1 | 0.83 | 1 | 0.76 |
| | | GBIGM | 0.19 | 0.25 | 0.31 | 0.13 | 0.26 |
Average power for GBMNC, AGGrEGATOr, KCCU, and GBIGM under 10 heritability-MAF combinations, with heritability $\in \{0.01, 0.025, 0.05, 0.1, 0.2\}$ and MAF $\in \{0.2, 0.4\}$.
| MAF | Heritability | GBMNC | AGGrEGATOr | KCCU | GBIGM |
|---|---|---|---|---|---|
| 0.2 | 0.01 | 0.564 | 0.604 | 0.316 | 0.104 |
| | 0.025 | 0.932 | 0.728 | 0.490 | 0.118 |
| | 0.05 | 0.890 | 0.714 | 0.608 | 0.168 |
| | 0.1 | 1 | 1 | 0.882 | 0.168 |
| | 0.2 | 1 | 1 | 0.932 | 0.222 |
| 0.4 | 0.01 | 0.818 | 0.560 | 0.306 | 0.090 |
| | 0.025 | 0.862 | 0.568 | 0.250 | 0.122 |
| | 0.05 | 0.908 | 0.730 | 0.698 | 0.100 |
| | 0.1 | 0.926 | 0.842 | 0.796 | 0.190 |
| | 0.2 | 1 | 0.944 | 0.774 | 0.228 |
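The averages in this table appear to be the arithmetic mean of the M1 to M5 power values reported per method in the preceding table. The short check below reproduces the first row (MAF 0.2, heritability 0.01) from those per-model values; the variable names are illustrative only.

```python
from statistics import mean

# Per-model power (M1..M5) at MAF = 0.2, heritability = 0.01,
# copied from the per-model power table.
power_by_model = {
    "GBMNC": [0.13, 0.40, 0.68, 0.72, 0.89],
    "AGGrEGATOr": [0.12, 0.12, 0.89, 0.89, 1.0],
    "KCCU": [0.15, 0.09, 0.29, 0.43, 0.62],
    "GBIGM": [0.09, 0.11, 0.13, 0.11, 0.08],
}

# Mean over the five models, rounded to three decimals as in the table.
average_power = {
    method: round(mean(values), 3)
    for method, values in power_by_model.items()
}

print(average_power)
# {'GBMNC': 0.564, 'AGGrEGATOr': 0.604, 'KCCU': 0.316, 'GBIGM': 0.104}
```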
FIGURE 2. Illustration of the distribution of power of each method in each heritability-MAF combination, with heritability $\in \{0.01, 0.025, 0.05, 0.1, 0.2\}$ and MAF $\in \{0.2, 0.4\}$.
The statistical power of simulation studies for GBMNC, AGGrEGATOr, KCCU, and GBIGM under models with …, MAF $\in \{0.2, 0.4\}$, and sample sizes that varied from 1,000 to 5,000.
| MAF | Sample size | GBMNC | AGGrEGATOr | KCCU | GBIGM |
|---|---|---|---|---|---|
| 0.2 | 1,000 | 0.67 | 0.15 | 0.11 | 0.2 |
| | 2,000 | 0.83 | 0.18 | 0.38 | 0.16 |
| | 3,000 | 1 | 0.20 | 0.55 | 0.23 |
| | 4,000 | 1 | 0.31 | 0.76 | 0.21 |
| | 5,000 | 1 | 0.29 | 0.87 | 0.12 |
| 0.4 | 1,000 | 0.68 | 0.16 | 0.13 | 0 |
| | 2,000 | 0.97 | 0.20 | 0.11 | 0.04 |
| | 3,000 | 1 | 0.35 | 0.2 | 0.11 |
| | 4,000 | 1 | 0.54 | 0.37 | 0.11 |
| | 5,000 | 1 | 0.65 | 0.58 | 0.05 |
The calculated p-value for the 20 gene pairs using GBMNC and AGGrEGATOr. p-values in bold font indicate that they are significant. The “Chr” column indicates the chromosome number of the human genome where the gene is located.
| Gene1 | Chr | Gene2 | Chr | p-value (GBMNC) | p-value (AGGrEGATOr) |
|---|---|---|---|---|---|
| TGF- | 1 | CXCL8 | 4 | 0.0 | 1 |
| CTLA4 | 2 | GM-CSF | 5 | 0.0 | 0.327 |
| CD80 | 3 | HLA-classII | 6 | 0.0 | 0.37 |
| GM-CSF | 5 | TRAP | 19 | 0.0 | 0.01 |
| TLR-4 | 9 | FLT-1 | 13 | 0.0 | 0.069 |
| IL-17 | 6 | TNFSF13B | 13 | 0.0 | 0.185 |
| CXCL6 | 4 | ICAM1 | 19 | 0.0 | 1 |
| CD28 | 2 | CXCL6 | 4 | 0.0 | 0.512 |
| CTLA4 | 2 | CXCL6 | 4 | 0.0 | 0.849 |
| MMP-3 | 11 | FLT-1 | 13 | 0.0 | 0.089 |
| CD80 | 3 | April | 17 | 0.99 | 0.0007 |
| CTSK | 1 | TNFSF13B | 13 | 0.615 | 0.0008 |
| JUN | 1 | IL-6 | 7 | 0.445 | 0.0019 |
| CD80 | 3 | CTSL | 25 | 0.0 | 0.002 |
| CXCL6 | 4 | FLT-1 | 13 | 0.297 | 0.0021 |
| CTLA4 | 2 | FOS | 37 | 0.727 | 0.0022 |
| FLT-1 | 13 | LFA-1 | 39 | 0.815 | 0.0033 |
| CCL3 | 17 | TRAP | 19 | 0.564 | 0.0034 |
| IL-18 | 11 | TGF- | 14 | 0.693 | 0.004 |
| IL-1 | 2 | CXCL12 | 10 | 0.081 | 0.004 |