Chengwen Luo, Botao Fa, Yuting Yan, Yang Wang, Yiwang Zhou, Yue Zhang, Zhangsheng Yu.
Abstract
Mediation analysis with high-dimensional DNA methylation markers is important for identifying epigenetic pathways between environmental exposures and health outcomes. Several methods have been developed for mediation analysis with high-dimensional mediators; however, high-dimensional mediation analysis methods for time-to-event outcomes have yet to be developed. To address this gap, we propose a new high-dimensional mediation analysis procedure for survival models that uses sure independence screening (SIS) and the minimax concave penalty (MCP) for variable selection, and the Sobel test and the joint significance test to assess indirect effects. Simulation studies show that the proposed procedure performs well in identifying the true mediators, controls the false discovery rate, and yields small estimation bias. We also apply this approach to study the causal pathway from smoking to overall survival among lung cancer patients, potentially mediated by 365,307 DNA methylation sites, in the TCGA lung cancer cohort. Mediation analysis using a Cox proportional hazards model estimates that a heavy smoking history increases the risk of death among lung cancer patients through methylation markers including cg21926276, cg27042065, and cg26387355, with significant hazard ratios of 1.2497 (95% CI: 1.1121, 1.4045), 1.0920 (95% CI: 1.0170, 1.1726), and 1.1489 (95% CI: 1.0518, 1.2550), respectively. These three methylation sites are located in three genes (H19, CDCA3, and LOC338797) that have been shown to be associated with lung cancer occurrence or overall survival. However, the three CpG sites themselves have not been reported previously and are newly identified here as potential epigenetic markers linking smoking and survival in lung cancer patients. Collectively, the proposed high-dimensional mediation analysis procedure performs well in mediator selection and indirect effect estimation.
Year: 2020 PMID: 32302299 PMCID: PMC7190184 DOI: 10.1371/journal.pcbi.1007768
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Accuracy of mediator selection (p = 10000, with 500 replications).
| Censoring | Sample size | Sobel TPR | Sobel FP | Sobel FDP | Joint TPR | Joint FP | Joint FDP |
|---|---|---|---|---|---|---|---|
| 15% | 300 | 0.7860 | 0.0080 | 0.0019 | 0.8400 | 0.2360 | 0.0519 |
| 15% | 500 | 0.9865 | 0.0060 | 0.0012 | 0.9900 | 0.0340 | 0.0069 |
| 15% | 1000 | 1 | 0.0220 | 0.0044 | 1 | 0.0360 | 0.0072 |
| 25% | 300 | 0.7650 | 0.0100 | 0.0025 | 0.8355 | 0.2460 | 0.0581 |
| 25% | 500 | 0.9840 | 0.0060 | 0.0012 | 0.9880 | 0.0360 | 0.0074 |
| 25% | 1000 | 1 | 0.0200 | 0.0040 | 1 | 0.0280 | 0.0056 |
| 35% | 300 | 0.7435 | 0.0080 | 0.0019 | 0.8270 | 0.2480 | 0.0584 |
| 35% | 500 | 0.9850 | 0.0080 | 0.0016 | 0.9880 | 0.0500 | 0.0099 |
| 35% | 1000 | 1 | 0.0220 | 0.0044 | 1 | 0.0300 | 0.0060 |
*TPR: average true positive rate; FP: average number of false positives; FDP: false discovery proportion (= V/R, where V is the number of false discoveries and R is the total number of discoveries). TPR, FP, and FDP are averaged over 500 replications.
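To make the footnote's definitions concrete, here is a minimal Python sketch of how TPR, FP, and FDP could be computed for one replicate and then averaged. It assumes mediators are identified by integer indices and that FDP is taken as 0 when nothing is selected; that convention is an assumption, not something the table states. The function names are hypothetical.

```python
from statistics import mean


def selection_metrics(selected, truth):
    """Return (TPR, FP, FDP) for one simulation replicate.

    selected: indices of mediators declared significant.
    truth: indices of the true mediators.
    """
    selected, truth = set(selected), set(truth)
    true_pos = len(selected & truth)
    false_pos = len(selected - truth)   # V, the number of false discoveries
    n_discoveries = len(selected)       # R, the total number of discoveries
    tpr = true_pos / len(truth) if truth else float("nan")
    # Assumption: FDP = 0 when R = 0, so empty selections do not break the average.
    fdp = false_pos / n_discoveries if n_discoveries else 0.0
    return tpr, false_pos, fdp


def average_metrics(replicates, truth):
    """Average TPR, FP, and FDP over replicates, as described in the footnote."""
    rows = [selection_metrics(sel, truth) for sel in replicates]
    return tuple(mean(col) for col in zip(*rows))


if __name__ == "__main__":
    truth = {0, 1, 2, 3}  # hypothetical set of true mediators
    replicates = [{0, 1, 2, 3}, {0, 1, 2, 17}, {0, 1, 2, 3, 42}]
    print(average_metrics(replicates, truth))
```

Note that FDP is averaged per replicate (the mean of V/R), so it is not the same as average FP divided by the average number of discoveries.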
Estimation of log-hazard indirect effects (αβ).
| (αₖ, βₖ) = αₖβₖ | Statistic | Cen = 15%, n = 300 | Cen = 15%, n = 500 | Cen = 15%, n = 1000 | Cen = 25%, n = 300 | Cen = 25%, n = 500 | Cen = 25%, n = 1000 | Cen = 35%, n = 300 | Cen = 35%, n = 500 | Cen = 35%, n = 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| (0.5, 0.55) = 0.275 | Est. | 0.2952 | 0.2794 | 0.2753 | 0.2954 | 0.2794 | 0.2758 | 0.2956 | 0.2801 | 0.2764 |
| (0.45, 0.6) = 0.27 | Est. | 0.2916 | 0.2806 | 0.2716 | 0.2930 | 0.2821 | 0.2717 | 0.2930 | 0.2827 | 0.2719 |
| (0.5, 0.65) = 0.325 | Est. | 0.3443 | 0.3343 | 0.3300 | 0.3445 | 0.3341 | 0.3305 | 0.3436 | 0.3347 | 0.3311 |
| (0.4, 0.7) = 0.28 | Est. | 0.2975 | 0.2889 | 0.2813 | 0.2978 | 0.2897 | 0.2814 | 0.2982 | 0.2896 | 0.2815 |
| (0.45, 0) = 0 | Est. | 0.0078 | - | - | 0.0167 | 0.0514 | - | 0.0098 | 0.0143 | - |
| (0.45, 0) = 0 | Est. | 0.0327 | -0.0478 | - | 0.0393 | 0.0860 | - | 0.0128 | -0.0485 | - |
| (0, 0.5) = 0 | Est. | -0.0043 | 0.0024 | 0.0009 | -0.0051 | 0.0024 | 0.0009 | -0.0043 | 0.0027 | 0.0009 |
| (0, 0.5) = 0 | Est. | -0.0013 | 0.0011 | 0.0035 | -0.0013 | 0.0012 | 0.0035 | -0.0006 | 0.0008 | 0.0035 |
| (0, 0) = 0 | Est. | 0.0382 | - | - | 0.0930 | - | - | 0.0621 | - | - |
| (0, 0) = 0 | Est. | -0.0027 | - | - | -0.0136 | - | - | -0.0737 | - | - |
*Est.: mean of the estimates; CP: coverage probability, the proportion of replicates in which the 95% confidence interval covers the true value; Emp. SE: empirical standard error, the sample standard deviation of the estimates across replicates; Est. SE: average of the estimated standard errors across replicates; -: not available.
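The footnote's summary statistics can be sketched in Python as follows. This sketch assumes a normal-approximation (Wald) 95% interval, estimate ± 1.96 × SE, when computing coverage; the source does not say how its intervals were formed. Replicates in which the mediator was not selected are simply absent from the inputs, and an empty input returns None, mirroring the "-" cells. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize_replicates(estimates, std_errors, true_value, level=0.95):
    """Summarize one indirect-effect parameter across simulation replicates.

    estimates / std_errors: per-replicate estimates and their standard errors,
    collected only from replicates where the mediator was selected.
    Returns None when no replicate selected the mediator (shown as "-").
    """
    if not estimates:
        return None
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    # Assumption: Wald interval est +/- z * se for the coverage check.
    covered = [
        est - z * se <= true_value <= est + z * se
        for est, se in zip(estimates, std_errors)
    ]
    return {
        "Est.": mean(estimates),
        "Bias": mean(estimates) - true_value,
        "Emp. SE": stdev(estimates) if len(estimates) > 1 else math.nan,
        "Est. SE": mean(std_errors),
        "CP": sum(covered) / len(covered),
    }


if __name__ == "__main__":
    # Hypothetical replicate output for a mediator with true alpha*beta = 0.275
    print(summarize_replicates([0.29, 0.27, 0.31, 0.26], [0.04, 0.05, 0.04, 0.05], 0.275))
```

Comparing "Emp. SE" with "Est. SE" checks whether the standard-error formula is well calibrated; CP near 0.95 indicates the intervals have roughly nominal coverage.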
Summary of selected CpGs, with estimates and P-values for the significant mediators.
| CpG | Chromosome | Gene | αβ | P(Sobel) | P(Joint) |
|---|---|---|---|---|---|
| cg21926276 | chr11 | H19 | 0.2229 | 1.266e-03 | 1.662e-06 |
| cg27042065 | chr12 | CDCA3 | 0.0880 | 1.071e-01 | 4.409e-02 |
| cg26387355 | chr12 | LOC338797 | 0.1388 | 1.449e-02 | 1.558e-03 |
| cg15292688 | chr18 | ZNF519 | -0.2301 | 4.844e-02 | 1.084e-02 |
| cg24200525 | chr12 | SBF1 | -0.1127 | 3.018e-02 | 5.535e-03 |
| cg07690349 | chr11 | MUC5B | -0.1403 | 1.217e-02 | 9.126e-04 |
*CpGs are DNA methylation sites; Chromosome and Gene give where each CpG is located. αβ is the estimated log-hazard indirect effect. P(Sobel) and P(Joint) are the Sobel test and joint test p-values, respectively, adjusted using the Bonferroni method.
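The two tests named in the footnote can be sketched in Python as below, assuming α̂ and β̂ are the estimated exposure-to-mediator and mediator-to-hazard coefficients, with standard errors taken from the fitted models. The Sobel test divides α̂β̂ by its first-order delta-method standard error; the joint test requires both paths to be significant, so its p-value is the larger of the two path p-values. The number of tests m used in the Bonferroni correction is not stated here; treat it as an assumption (for example, the number of mediators carried into testing). All function names are hypothetical.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_p(alpha, se_alpha, beta, se_beta):
    """Sobel test of H0: alpha * beta = 0 using the delta-method standard error."""
    se_ab = math.sqrt(alpha**2 * se_beta**2 + beta**2 * se_alpha**2)
    return two_sided_p(alpha * beta / se_ab)


def joint_p(alpha, se_alpha, beta, se_beta):
    """Joint significance test: both paths must be non-zero, so take the larger p-value."""
    return max(two_sided_p(alpha / se_alpha), two_sided_p(beta / se_beta))


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m tests, capped at 1."""
    return min(1.0, p * m)


if __name__ == "__main__":
    # Hypothetical path estimates for one mediator; m = 6 is an assumed test count.
    a, se_a, b, se_b, m = 0.40, 0.10, 0.30, 0.08, 6
    print("Sobel:", bonferroni(sobel_p(a, se_a, b, se_b), m))
    print("Joint:", bonferroni(joint_p(a, se_a, b, se_b), m))
```

sobel_p is undefined when α̂ and β̂ are both exactly zero, because the standard error in the denominator is then zero. Since the joint test ignores the product's variance, it is typically less conservative than the Sobel test, which is consistent with the smaller P(Joint) values in the table.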
Path-specific effects (effect scale: hazard ratio) of tobacco smoking on overall survival of lung cancer patients (only CpGs with are included).
| Path | Effect estimate | 95% confidence interval |
|---|---|---|
| X→Y(Direct effect) | 1.4309 | (1.0810, 1.9074) |
| X→cg21926276→Y | 1.2497 | (1.1121, 1.4045) |
| X→cg27042065→Y | 1.0920 | (1.0170, 1.1726) |
| X→cg26387355→Y | 1.1489 | (1.0518, 1.2550) |
| Total effect | 1.3248 | (1.0220, 1.7170) |
*Effect estimate: the estimated effect on the hazard ratio scale.
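The CpG summary table and the path-specific table report the same indirect effects on different scales, and a short Python sketch can link them. Exponentiating the log-hazard indirect effects 0.2229, 0.0880, and 0.1388 reproduces, to four decimal places, the hazard ratios 1.2497, 1.0920, and 1.1489 above. The second helper recovers the log-scale midpoint and standard error implied by a reported interval, assuming it is a symmetric Wald interval on the log-hazard scale; for all three paths, the recovered midpoints agree closely with the log-hazard estimates. Helper names are hypothetical.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def hazard_ratio(log_hr, se=None, z=Z95):
    """Map a log-hazard effect (and optional SE) to the hazard ratio scale."""
    hr = math.exp(log_hr)
    if se is None:
        return hr, None
    # Wald interval formed on the log scale, then exponentiated.
    return hr, (math.exp(log_hr - z * se), math.exp(log_hr + z * se))


def log_scale_from_ci(lower, upper, z=Z95):
    """Recover the log-scale midpoint and SE implied by a symmetric Wald CI."""
    lo, hi = math.log(lower), math.log(upper)
    return (lo + hi) / 2, (hi - lo) / (2 * z)


# Log-hazard indirect effects from the CpG summary table
for cpg, ab in [("cg21926276", 0.2229), ("cg27042065", 0.0880), ("cg26387355", 0.1388)]:
    print(cpg, round(hazard_ratio(ab)[0], 4))

# Midpoint and SE implied by the interval reported for X -> cg21926276 -> Y
print(log_scale_from_ci(1.1121, 1.4045))
```

Forming the interval on the log scale and exponentiating keeps the lower bound positive and explains why the hazard ratio intervals are asymmetric around their point estimates.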
Fig 1. (Left) Directed acyclic graph of high-dimensional mediation with p mediators, assumed to be uncorrelated with one another. (Right) The three-variable path diagram representing the standard mediation framework.
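The DAG in Fig 1 can be turned into a small data generator, which may help when reproducing simulations like those summarized in the tables. This Python sketch is only illustrative: it borrows the (αₖ, βₖ) pairs listed in the log-hazard estimation table, but it assumes a standard normal exposure, a constant (exponential) baseline hazard, independent exponential censoring, and a direct effect γ = 0.5, none of which are specified here. Independent error terms make the mediators mutually uncorrelated conditional on X, which is one reading of the caption's assumption. The function name simulate is hypothetical.

```python
import math
import random

# (alpha_k, beta_k) pairs taken from the rows of the estimation table;
# in a p = 10000 design every other mediator would have alpha = beta = 0.
PAIRS = [(0.5, 0.55), (0.45, 0.6), (0.5, 0.65), (0.4, 0.7), (0.45, 0.0),
         (0.45, 0.0), (0.0, 0.5), (0.0, 0.5), (0.0, 0.0), (0.0, 0.0)]


def simulate(n, pairs=PAIRS, gamma=0.5, base_rate=0.1, censor_rate=0.05, seed=2020):
    """Draw n subjects from X -> M_k -> T plus a direct X -> T path.

    M_k = alpha_k * X + e_k with independent N(0, 1) errors, and the event time
    follows a Cox model with hazard base_rate * exp(gamma*X + sum_k beta_k*M_k).
    All numeric defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # hypothetical continuous exposure
        m = [a * x + rng.gauss(0.0, 1.0) for a, _ in pairs]
        eta = gamma * x + sum(b * mk for (_, b), mk in zip(pairs, m))
        t_event = rng.expovariate(base_rate * math.exp(eta))  # constant baseline hazard
        t_cens = rng.expovariate(censor_rate)  # independent censoring time
        rows.append({"x": x, "m": m, "time": min(t_event, t_cens),
                     "status": int(t_event <= t_cens)})
    return rows


if __name__ == "__main__":
    data = simulate(300)
    censored = 1 - sum(r["status"] for r in data) / len(data)
    print(f"{len(data)} subjects, censoring fraction {censored:.2f}")
```

The censoring fraction is controlled through censor_rate relative to base_rate, so it can be tuned toward target levels such as those in the tables by trial and error.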
Fig 2. Overall workflow for high-dimensional mediation analysis.
The workflow consists of three main steps: (a) preliminary screening with sure independence screening (SIS); (b) variable selection with the minimax concave penalty (MCP); (c) testing for mediation effects.
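Steps (a) and (b) can be sketched in Python. For (a), this sketch ranks mediators by |corr(X, Mⱼ)| and keeps the top d. That screening statistic is a simplified stand-in, and the statistic the paper uses for survival outcomes may differ. For (b), it implements only the MCP penalty and its univariate thresholding rule for a standardized covariate, which is the update used inside coordinate-descent MCP solvers. A full penalized Cox fit is not shown; one real option in R is ncvsurv from the ncvreg package. The screening size d, the concavity γ = 3, and all function names are illustrative assumptions.

```python
import math


def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0


def sis_screen(x, mediators, d):
    """Step (a): keep the d mediators with the largest |corr(X, M_j)|.

    mediators is a list of columns; returns their indices, strongest first.
    """
    scores = [abs(pearson(x, col)) for col in mediators]
    ranked = sorted(range(len(mediators)), key=lambda j: scores[j], reverse=True)
    return ranked[:d]


def mcp_penalty(t, lam, gamma=3.0):
    """MCP: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, constant gamma*lam^2/2 beyond."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return gamma * lam * lam / 2.0


def mcp_threshold(z, lam, gamma=3.0):
    """Univariate MCP solution for a standardized covariate (requires gamma > 1)."""
    if abs(z) > gamma * lam:
        return z  # large signals are left unpenalized
    soft = math.copysign(max(abs(z) - lam, 0.0), z)  # soft-thresholding
    return soft / (1.0 - 1.0 / gamma)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    cols = [[2, 4, 6, 8, 10], [3, 1, 4, 1, 5], [5, 4, 3, 2, 2]]
    print(sis_screen(x, cols, 2))
    print([mcp_threshold(z, 1.0) for z in (0.5, 2.0, 4.0)])
```

Unlike the lasso, the MCP thresholding rule returns large coefficients unchanged, which is why MCP-based selection tends to have less estimation bias for strong signals.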