Anja C Gumpinger, Bastian Rieck, Dominik G Grimm, Karsten Borgwardt.
Abstract
MOTIVATION: Correlating genetic loci with a disease phenotype is a common approach to improve our understanding of the genetics underlying complex diseases. Standard analyses mostly ignore two aspects, namely genetic heterogeneity and interactions between loci. Genetic heterogeneity, the phenomenon that genetic variants at different loci lead to the same phenotype, promises to increase statistical power by aggregating low-signal variants. Incorporating interactions between loci results in a computational and statistical bottleneck due to the vast number of candidate interactions.
Year: 2021 PMID: 32573681 PMCID: PMC8034561 DOI: 10.1093/bioinformatics/btaa581
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) The binary dataset D. Rows correspond to samples, columns correspond to binary genetic variants. Phenotype vector y and categorical covariate vector c. (B) Network. Nodes correspond to genes, and each edge represents an interaction between two genes. (C) Mapping of genetic variants to genes in the network. All variants that overlap with a gene’s introns and exons (dark orange) are mapped to the gene. Optionally, variants that lie in a fixed-size window around the gene (light orange) are mapped to the gene as well. (D) Analysis of an interaction. Both adjacent genes are represented by their overlapping genetic markers, highlighted in blue and green in the data matrix. All possible segments are considered in both genes (illustrated by horizontal colored bars, where segments of length 1, i.e. single SNPs, are omitted). Each segment from the first gene is tested against each segment from the second gene. (E) Heterogeneity encoding (allelic heterogeneity) of two segments in the blue and green genes, highlighted with black boxes in A. In white, heterogeneity encoding (locus heterogeneity) w of their interaction. The max function is the elementwise maximum over the two vectors.
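The encoding described in panels D and E can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the toy matrices `gene_a` and `gene_b` are hypothetical, and each gene is assumed to be given as a list of per-sample rows of binary variants.

```python
from itertools import product


def all_segments(n_snps):
    """All contiguous (start, end) index pairs, single SNPs included."""
    return [(s, e) for s in range(n_snps) for e in range(s, n_snps)]


def segment_encoding(rows, start, end):
    """Per-sample maximum over the SNP columns start..end (inclusive)."""
    return [max(row[start:end + 1]) for row in rows]


def interaction_encodings(gene_a, gene_b):
    """Yield ((seg_a, seg_b), w) for every pair of segments of two genes."""
    n_a, n_b = len(gene_a[0]), len(gene_b[0])
    for seg_a, seg_b in product(all_segments(n_a), all_segments(n_b)):
        enc_a = segment_encoding(gene_a, *seg_a)
        enc_b = segment_encoding(gene_b, *seg_b)
        # Elementwise maximum of the two segment encodings.
        w = [max(a, b) for a, b in zip(enc_a, enc_b)]
        yield (seg_a, seg_b), w


# Toy data (hypothetical): 3 samples, 2 variants in gene A, 3 in gene B.
gene_a = [[0, 1], [0, 0], [1, 0]]
gene_b = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

encodings = dict(interaction_encodings(gene_a, gene_b))
print(len(encodings))               # 3 segments x 6 segments = 18 pairs
print(encodings[((0, 1), (0, 2))])  # [1, 1, 1]
```

Enumerating every pair this way is quadratic in the number of segments per gene, which gives a feel for the size of the search space that the caption refers to.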
A 2 × 2 contingency table to test the binary encoding w of a segment interaction for its association with the binary phenotype y
| Class label | | | Row totals |
|---|---|---|---|
| | | | |
| | | | |
| Col. totals | | | |
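As a rough illustration of how such a table can be filled and tested, the sketch below counts the four cells from w and y and computes a two-sided Fisher's exact test. Fisher's test is used here only as a simple stand-in that ignores the covariate vector c; the test actually used by the method is not stated in this excerpt. All function and variable names are hypothetical.

```python
import math


def contingency_table(w, y):
    """Counts ((a, b), (c, d)): rows are y=1 and y=0, columns are w=1 and w=0."""
    a = sum(1 for wi, yi in zip(w, y) if yi == 1 and wi == 1)
    b = sum(1 for wi, yi in zip(w, y) if yi == 1 and wi == 0)
    c = sum(1 for wi, yi in zip(w, y) if yi == 0 and wi == 1)
    d = sum(1 for wi, yi in zip(w, y) if yi == 0 and wi == 0)
    return (a, b), (c, d)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2 x 2 table with fixed margins."""
    (a, b), (c, d) = table
    n1, n0, x = a + b, c + d, a + c  # row totals and first column total
    n = n1 + n0

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell.
        return math.comb(n1, k) * math.comb(n0, x - k) / math.comb(n, x)

    k_min, k_max = max(0, x - n0), min(n1, x)
    p_obs = prob(a)
    # Sum over all tables that are at most as likely as the observed one.
    total = sum(prob(k) for k in range(k_min, k_max + 1)
                if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Toy data (hypothetical): 8 samples.
w = [1, 1, 1, 0, 0, 0, 0, 0]
y = [1, 1, 1, 1, 0, 0, 0, 0]

table = contingency_table(w, y)
p_value = fisher_exact_two_sided(table)
print(table)    # ((3, 1), (0, 4))
print(p_value)  # 8/56, roughly 0.143
```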
Overview of comparison partners
| | SNPs | Genes | Edges | Segments | SNP-interactions | Segment-interactions |
|---|---|---|---|---|---|---|
| Tarone | | | | FastCMH | edgeEpi (WY) | SiNIMin (WY) |
| FastLMM | FastLMM-singleSNP | FastLMM-genes | FastLMM-edge | FastLMM-segment | | FastLMM-interact |
| SKAT-O | | SKATO-genes | SKATO-edge | SKATO-segment | | SKATO-interact |
| PLINK | | | | | PLINK-epistasis, PLINK-fast-epistasis | |
Note: Columns indicate the features or feature sets that were tested; rows indicate the method classes, which fall into the FastLMM framework (Lippert et al., 2014), the SKAT-O framework (Lee et al.), the Tarone framework (Llinares-López et al., 2017) or the PLINK framework (Chang et al.) (see Supplementary Section S2.2 for details).
Fig. 2. Power (1 − type II error) and type I error analysis of the simulation study. We compared SiNIMin and SiNIMin-WY to (A) approaches that consider interactions between sets of genetic variants and (B) approaches that test sets of genetic variants (no interactions) (see Table 2 for comparison partners). Panels A and B show the power to detect the truly significant interaction for varying association strengths p. (C) Empirical FWERs for varying association strengths p. The black horizontal line indicates the target FWER.
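For readers unfamiliar with the two quantities plotted in Fig. 2, the following sketch shows how empirical power and empirical FWER are commonly estimated from repeated simulations. It assumes that each simulated dataset yields a flag for whether the true interaction was detected and a count of false positives; the input lists are made-up numbers, not results from the study.

```python
def empirical_power(true_hit_detected):
    """Fraction of simulated datasets in which the true interaction was found."""
    return sum(1 for hit in true_hit_detected if hit) / len(true_hit_detected)


def empirical_fwer(false_positive_counts):
    """Fraction of simulated datasets with at least one false positive."""
    return sum(1 for k in false_positive_counts if k > 0) / len(false_positive_counts)


# Made-up simulation outcomes (hypothetical), one entry per simulated dataset.
detected = [True, True, False, True]
false_positives = [0, 0, 1, 0, 2, 0, 0, 0, 0, 0]

power = empirical_power(detected)
fwer = empirical_fwer(false_positives)
print(power)  # 0.75
print(fwer)   # 0.2
```

A method controls the FWER at a given target level if the second quantity stays at or below that level across association strengths.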
Novel hits for A. thaliana
| Phenotype | Gene–gene interaction | p-value | SiNIMin-WY | SiNIMin | edgeEpi-WY | edgeEpi |
|---|---|---|---|---|---|---|
| avrB | AT1G15750–AT1G17380 | 6.76e-08 | | 0 | 0 | 0 |
| avrRpm1 | AT1G15750–AT1G17380 | 1.15e-07 | | 0 | 0 | 0 |
| Chlorosis22 | AT1G74490–AT2G41100 | 9.47e-08 | | 0 | 0 | 0 |
| Hiks1 | AT1G15760–AT5G43460 | 2.19e-07 | | 0 | 0 | 0 |
| Hiks1 | AT4G16845–AT5G57380 | 4.26e-08 | | 0 | 0 | 0 |
| Hiks1 | AT4G19030–AT5G43460 | 2.19e-07 | | 0 | 0 | 0 |
| Leafroll16 | AT5G25150–AT5G45600 | 3.89e-09 | | 10 | 0 | 0 |
| Leafroll22 | AT3G18490–AT5G42980 | 5.24e-09 | | 16 | 0 | 0 |
| Noco2 | AT2G01950–AT3G43850 | 1.75e-07 | | 0 | 0 | 0 |
| avrRpt2 | AT3G15660–AT4G15730 | 1.15e-07 | 0 | 0 | | 1 |
| Leafroll10 | AT2G04630–AT3G56270 | 3.59e-07 | 0 | 0 | | 0 |
Note: The last four columns contain the number of significant gene-segment interactions (SiNIMin, SiNIMin-WY) and SNP interactions (edgeEpi, edgeEpi-WY) in the novel hit. We report the lowest p-value for any pair of segments within the novel hit and highlight in bold the method for which this p-value is obtained. For more details on the significant segments, see Supplementary Table S6.
Novel hits for migraine cohorts
| Dataset | Gene-interaction | p-value | SiNIMin-WY | SiNIMin | edgeEpi-WY | edgeEpi |
|---|---|---|---|---|---|---|
| dMaMo | EPHA6–TIAM1 | 2.64e-08 | | 0 | | 1 |
| gMaMo | BMP4–BMPR1B | 1.50e-07 | 0 | 0 | | 0 |
| gMaMo | HAO1–VDAC3 | 1.33e-07 | 0 | 0 | | 0 |
Note: The last four columns contain the number of significant gene-segment interactions (SiNIMin, SiNIMin-WY) and SNP interactions (edgeEpi, edgeEpi-WY) in the novel hit. We report the lowest p-value for any pair of segments within the novel hit and highlight in bold the method for which this p-value is obtained. For more details on the significant segments, see Supplementary Table S10.