Tobias Egli1,2, Vanja Vukojevic1,2,3, Thierry Sengstag4,5, Martin Jacquot4, Rubén Cabezón4, David Coynel2,6, Virginie Freytag1,2, Angela Heck1,2,7, Christian Vogler1,2,7, Dominique J-F de Quervain2,7,6, Andreas Papassotiropoulos1,2,7,3, Annette Milnik8,9,10.
Abstract
Studies assessing the existence and magnitude of epistatic effects on complex human traits provide inconclusive results. The study of such effects is complicated by considerable increase in computational burden, model complexity, and model uncertainty, which in concert decrease model stability. An additional source introducing significant uncertainty with regard to the detection of robust epistasis is the biological distance between the genetic variation and the trait under study. Here we studied CpG methylation, a genetically complex molecular trait that is particularly close to genomic variation, and performed an exhaustive search for two-locus epistatic effects on the CpG-methylation signal in two cohorts of healthy young subjects. We detected robust epistatic effects for a small number of CpGs (N = 404). Our results indicate that epistatic effects explain only a minor part of variation in DNA-CpG methylation. Interestingly, these CpGs were more likely to be associated with gene-expression of nearby genes, as also shown by their overrepresentation in DNase I hypersensitivity sites and underrepresentation in CpG islands. Finally, gene ontology analysis showed a significant enrichment of these CpGs in pathways related to HPV-infection and cancer.
Year: 2017 PMID: 29057891 PMCID: PMC5651902 DOI: 10.1038/s41598-017-13256-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Sample description.
| | Discovery sample: all subjects (data freeze 2013-08) | Discovery sample: selected subjects | Replication sample: all subjects (data freeze 2014-04) | Replication sample: selected subjects |
|---|---|---|---|---|
| Sample size N | 1174 | 533 | 1935 | 319 |
| Sex female | 59.8% | 58.3% | 66.1% | 69.6% |
| Blood sampled | 63.7% | 100% | 36.1% | 100% |
| Affymetrix 6.0 data | 84.3% | 100% | 89.9% | 100% |
| Genetic outlier | 6.5% | 0% | 7.8% | 0% |
| Age at main investigation | 22 (18–35) | 22 (18–35) | 23 (18–35) | 23 (18–35) |
| Age at blood sampling | 23 (18–36) | 23 (18–36) | 24 (18–39) | 24 (18–36) |
| Days between main investigation and blood drawing | 336 (1–1392) | 350 (2–1385) | 642 (1–1992) | 380 (1–954) |
| Smoking behavior at main investigation | 1.6 (1–5) | 1.6 (1–5) | 1.8 (1–5) | 1.7 (1–5) |
Phenotypic information was collected at the time-point of the main investigation (see Methods). Subjects were later re-invited for an additional blood sampling to investigate e.g. blood-related methylation and expression values. Reported are the numbers from the data freezes used to select subjects for the blood-DNA-methylation study (discovery sample 2013-08; replication sample 2014-04). Quantitative variables are reported as mean (min - max). Smoking behavior was measured on a 5-point Likert-scale ranging from 1 (never) up to 5 (20 cigarettes per day).
Figure 1. Power analysis for the exhaustive search for epistatic effects. In (a) we adjusted alpha to reach genome-wide and methylome-wide Bonferroni correction (discovery phase, p = 6.8 × 10⁻¹⁸). In (b) we adjusted alpha to reach a per-CpG Bonferroni correction threshold (replication phase, p = 3.8 × 10⁻⁶). The legends depict the percentage of variance that can be explained for different effect sizes (e.g., r = 0.03 corresponds to 0.1%; r = 0.55 corresponds to 30%). The vertical gray bars correspond to a sample size of N = 533 (discovery sample) and N = 319 (replication sample).
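The relationship between sample size, effect size and the two alpha levels in the caption can be illustrated with a short calculation. The sketch below is not the authors' power analysis: it assumes a two-sided test of a single correlation coefficient and uses the Fisher z approximation. The function name `correlation_power` and the intermediate effect size r = 0.30 are illustrative choices, not taken from the source.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha):
    """Approximate two-sided power to detect a correlation r with n samples.

    Assumption: Fisher z approximation, not necessarily the method used
    for Figure 1.
    """
    std_normal = NormalDist()
    # Work in the lower tail: 1 - alpha/2 rounds to 1.0 for alpha near 1e-18.
    z_crit = -std_normal.inv_cdf(alpha / 2)
    # Under the alternative, atanh(r_hat) * sqrt(n - 3) is roughly N(shift, 1).
    shift = atanh(r) * sqrt(n - 3)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Sample sizes and alpha levels as given in the caption; r = 0.30 is illustrative.
for label, n, alpha in [("discovery", 533, 6.8e-18), ("replication", 319, 3.8e-6)]:
    for r in (0.03, 0.30, 0.55):
        power = correlation_power(r, n, alpha)
        print(f"{label}: N={n}, r={r:.2f}, r^2={r * r:.1%}, power={power:.3f}")
```

Under these assumptions, power at the discovery threshold is negligible for the smallest effect size and close to 1 for the largest, which is the qualitative pattern the figure describes.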
Main results exhaustive search for epistatic effects.
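| | N CpGs | Interaction effects per CpG: average | Interaction effects per CpG: max | N interaction effects | Both SNPs in cis | One SNP in cis | Both SNPs in trans |
|---|---|---|---|---|---|---|---|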
| Before replication | 13,112 | 657 | 46,314 | 8,608,567 | 0.03% | 0.20% | 99.78% |
| After replication | 1,477 | 3 | 131 | 4,816 | 43.60% | 10.36% | 46.03% |
| +Permutation and sign-test | 802 | 3 | 49 | 2,262 | 90.45% | 3.98% | 5.57% |
| Per-CpG model approach | 174 | 1 | 5 | 239 | 88.28% | 4.18% | 7.53% |
| +Exclusion of LD-block associated effects | 47 | 1 | 3 | 55 | 63.64% | 18.18% | 18.18% |
The results refer to the significant interaction effects remaining after each analytical step. Here, cis is defined as 500 kb around the CpG.
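The cis/trans breakdown can be made concrete with a small classifier. This is a minimal sketch under stated assumptions: "500 kb around the CpG" is read here as plus or minus 500 kb on the same chromosome, which the note does not spell out, and the names `is_cis`, `classify_pair` and all coordinates are hypothetical.

```python
CIS_WINDOW_BP = 500_000  # assumption: +/- 500 kb on the CpG's chromosome


def is_cis(snp, cpg, window=CIS_WINDOW_BP):
    """Return True if the SNP lies within the cis window of the CpG.

    Both arguments are (chromosome, position) tuples.
    """
    snp_chrom, snp_pos = snp
    cpg_chrom, cpg_pos = cpg
    return snp_chrom == cpg_chrom and abs(snp_pos - cpg_pos) <= window


def classify_pair(snp_a, snp_b, cpg):
    """Label a SNP pair by how many of its SNPs are in cis to the CpG."""
    n_cis = int(is_cis(snp_a, cpg)) + int(is_cis(snp_b, cpg))
    return ("both trans", "one cis", "both cis")[n_cis]


# Hypothetical coordinates, for illustration only.
cpg = ("chr11", 64_000_000)
print(classify_pair(("chr11", 64_100_000), ("chr11", 63_800_000), cpg))
print(classify_pair(("chr11", 64_100_000), ("chr2", 1_000_000), cpg))
print(classify_pair(("chr11", 70_000_000), ("chr2", 1_000_000), cpg))
```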
Figure 2. Example of a main effect of a SNP causing a spurious significant interaction effect between two other SNPs. Data are shown for cg00022866 from the discovery sample. (a) rs11231741 shows a strong main effect (p = 4.5 × 10⁻¹¹²). (b) This causes a spurious significant interaction (p = 3.3 × 10⁻¹⁸) because rs11231741 is in LD with both interacting SNPs (rs11231740: r² = 0.55; rs2236648: r² = 0.25). Of note, the two interacting SNPs are in only low LD with each other (r² = 0.024). (c) Dependencies between the 9 SNP-groups built from rs11231740 and rs2236648 and the 3 SNP-groups from rs11231741 (color-coded in black, red and green; jitter has been added to the data): the 9 SNP-groups of the interacting SNPs mimic the three SNP-groups of the main effect, with 5 of the 9 groups mainly corresponding to the homozygous common allele carriers (black), 3 of the 9 groups mainly corresponding to the heterozygous group (red) and 1 group mainly corresponding to the homozygous rare allele carriers (green). (d) The same data as in (b), now color-coded by the three SNP-groups from rs11231741.
Exhaustive search average variance explained by main effects and interaction effects.
| | Discovery sample: average variance explained | Replication sample: average variance explained |
|---|---|---|
| All significant main effects | 57.1% | 57.8% |
| - Most-significant main effect | 44.9% | 45.4% |
| All significant interaction effects | 8.2% | 8.3% |
| - Most-significant LD-block associated effect | 4.4% | 4.5% |
| - Most-significant epistatic effect | 7.8% | 7.4% |
| All significant main effects and interaction effects | 65.2% | 66% |
The results are based on the N = 174 CpGs that showed at least one significant interaction effect when main effects were also taken into account. For only 7 of the 174 CpGs (4%), no significant main effect of a SNP was detectable. 12 of the 174 CpGs (6.9%) showed both an LD-block associated effect and an epistatic effect. All significant main effects: average variance explained by all main effects that were kept in the final model. Most-significant main effect: average variance explained by the main effect that exhibited the smallest p-value. All significant interaction effects: average variance explained by all interaction effects that were kept in the final model; these were further separated into LD-block associated effects, with SNP pairs showing r² > 0.021, and epistatic effects (r² ≤ 0.021). Where more than one such effect was kept in the final model, most-significant refers to the effect with the smallest p-value. All significant main effects and interaction effects: average variance explained by all main effects and interaction effects that were kept in the final model.
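The split of interaction effects by the r² cut-off of 0.021 can be sketched as follows. The source does not state how r² between the two SNPs was estimated; this example assumes the squared Pearson correlation of genotype dosages coded 0/1/2, a common approximation. The function names and genotype vectors are hypothetical.

```python
from math import fsum

LD_R2_THRESHOLD = 0.021  # cut-off quoted in the table note


def genotype_r2(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Assumption: genotype correlation stands in for the LD estimate;
    the estimator actually used is not given in the source.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    n = len(g1)
    m1, m2 = fsum(g1) / n, fsum(g2) / n
    cov = fsum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = fsum((a - m1) ** 2 for a in g1)
    var2 = fsum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return cov * cov / (var1 * var2)


def label_interaction(g1, g2, threshold=LD_R2_THRESHOLD):
    """Label a SNP pair as LD-block associated (r^2 > threshold) or epistatic."""
    return "LD-block associated" if genotype_r2(g1, g2) > threshold else "epistatic"


# Hypothetical genotypes: the first pair is uncorrelated, the second identical.
snp_a = [0, 1, 2, 0, 1, 2, 0, 1, 2]
snp_b = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(label_interaction(snp_a, snp_b))
print(label_interaction(snp_a, snp_a))
```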
Search for epistatic effects based on SNPs exhibiting main-effects.
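| Per-CpG model in 3.5 MB window | N unique CpGs | Discovery sample, average variance explained by SNPs: most-signif. main effect | Discovery sample, average variance explained by SNPs: all signif. main effects | Discovery sample, average variance explained by SNPs: all signif. main effects and interaction effects | Replication sample, average variance explained by SNPs: most-signif. main effect | Replication sample, average variance explained by SNPs: all signif. main effects | Replication sample, average variance explained by SNPs: all signif. main effects and interaction effects |
|---|---|---|---|---|---|---|---|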
| - CpGs showing at least one significant main effect | 59,134 | 16% | 17.7% | 17.7% | 16% | 18.1% | 18.1% |
| - CpGs showing at least two significant main effects | 17,938 | 22.6% | 28.2% | 28.3% | 22.3% | 29% | 29.1% |
| - CpGs showing significant interaction effects | 281 | 31.2% | 41.9% | 46.8% | 31.2% | 43.3% | 49.1% |
Average variance explained by main effects of SNPs or interaction effects of SNP-pairs. Results are shown for three different filtering steps, which were based on the number of significant main effects or interaction effects per CpG, identified with a forward-linear regression approach. Most-signif. main effect: average variance explained by the main effect that exhibited the smallest p-value. All signif. main effects: average variance explained by all main effects that were kept in the final model. All signif. main effects and interaction effects: average variance explained by all main effects and interaction effects that were kept in the final model. Signif.: significant.
Enrichment analyses.
| | Expected | Observed | p |
|---|---|---|---|
| CpG Island | 32.9% | 20.6% | 1.1 × 10⁻⁶ |
| TFBS | 63.3% | 66.9% | 0.19 |
| DNase I | 70.4% | 76% | 0.025 |
| Gene expression | 10.8% | 50.9% | 1.2 × 10⁻¹²⁷ |
For N = 404 CpGs we identified a significant interaction between SNPs. These 404 CpGs could be assigned to N = 350 clusters. For each cluster we randomly assigned one CpG as representative. For these 350 CpGs we compared the observed percentage of being located in CpG-dense regions (CpG Island), transcription factor binding sites (TFBS), DNase I hypersensitivity sites (DNase I) or being associated with gene expression against the expected numbers that are based on all remaining CpGs (N = 395,027), by using χ² tests.
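The comparison described in this note can be sketched as a 2 × 2 Pearson chi-square test, shown here for the CpG Island row. The counts are reconstructed from the rounded percentages in the table, which is an assumption, and the test is computed without continuity correction, which may differ from the authors' settings. The resulting p-value therefore only approximates the reported one. The function name `chi2_2x2` is illustrative.

```python
from math import erfc, sqrt


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )
    # For 1 degree of freedom the chi-square tail equals erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Counts reconstructed from rounded percentages (assumption).
n_selected, n_background = 350, 395_027
selected_in_island = round(0.206 * n_selected)
background_in_island = round(0.329 * n_background)

chi2, p = chi2_2x2(
    selected_in_island,
    n_selected - selected_in_island,
    background_in_island,
    n_background - background_in_island,
)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```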
Results for the gene-set enrichment analysis.
| Term | Pathway | Genes in set | Genes in overlap | p | Adjusted p |
|---|---|---|---|---|---|
| hsa05165 | Human papillomavirus infection | 303 | 10 | 1.7 × 10⁻⁷ | 0.0014 |
| hsa05200 | Pathways in cancer | 384 | 10 | 1.1 × 10⁻⁶ | 0.0036 |
| hsa05224 | Breast cancer | 141 | 7 | 1.5 × 10⁻⁶ | 0.0036 |
| hsa01100 | Metabolic pathways | 1190 | 14 | 1.7 × 10⁻⁶ | 0.0036 |
| hsa04014 | Ras signaling pathway | 217 | 7 | 9.9 × 10⁻⁶ | 0.017 |
| hsa04151 | PI3K-Akt signaling pathway | 317 | 8 | 1.2 × 10⁻⁵ | 0.018 |
| hsa05203 | Viral carcinogenesis | 191 | 6 | 2.9 × 10⁻⁵ | 0.036 |
Significant gene-sets (p < 0.05) are reported.