Sebastian Rauschert, Phillip E Melton, Anni Heiskala, Ville Karhunen, Graham Burdge, Jeffrey M Craig, Keith M Godfrey, Karen Lillycrop, Trevor A Mori, Lawrence J Beilin, Wendy H Oddy, Craig Pennell, Marjo-Riitta Järvelin, Sylvain Sebert, Rae-Chi Huang.
Abstract
BACKGROUND: Fetal exposure to maternal smoking during pregnancy is associated with the development of noncommunicable diseases in the offspring. Maternal smoking may induce such long-term effects through persistent changes in the DNA methylome, which therefore hold the potential to be used as a biomarker of this early life exposure. With declining costs for measuring DNA methylation, we aimed to develop a DNA methylation score that can be used on adolescent DNA methylation data and thereby generate a score for in utero cigarette smoke exposure.
Year: 2020 PMID: 32930613 PMCID: PMC7491641 DOI: 10.1289/EHP6076
Source DB: PubMed Journal: Environ Health Perspect ISSN: 0091-6765 Impact factor: 9.031
Figure 1. Flow chart for the modeling steps. This includes details of the steps undertaken in the training, testing, and validation phases, as well as the data used per step.
Characteristics of the Raine Study training and test data subsets and of the Northern Finland Birth Cohorts 1986 (NFBC1986) and 1966 (NFBC1966).
| Characteristic | Raine Study: testing | Raine Study: training | p | NFBC1986 | NFBC1966 |
|---|---|---|---|---|---|
| n | 198 | 797 | | 478 | 602 |
| Age (y) [mean (SD)] | 17.20 (0.49) | 17.27 (0.61) | 0.132 | 16.06 (0.36) | 31.01 (0.34) |
| Sex (%) | | | 0.975 | | |
| Male | 99 (50.0) | 402 (50.4) | | 221 (46.2) | 261 (43.4) |
| Female | 99 (50.0) | 395 (49.6) | | 257 (53.8) | 341 (56.6) |
| Adolescent smoking (%) | | | 0.401 | | |
| Non-smoker | 108 (54.5) | 392 (49.2) | | 288 (60.3) | 283 (47.0) |
| Ever-smoker | 42 (21.2) | 191 (24.0) | | 164 (34.3) | 313 (52.0) |
| Missing | 48 (24.2) | 214 (26.9) | | 26 (5.4) | 6 (1.0) |
| Maternal smoking during pregnancy (%) | | | 1 | | |
| Exposed | 59 (29.8) | 237 (29.7) | | 95 (19.9) | 130 (21.6) |
| Not exposed | 139 (70.2) | 560 (70.3) | | 383 (78.7) | 472 (78.4) |
Note: p is the p-value for the Wilcoxon-Mann-Whitney U-test and chi-square test between the Raine Study training and test sets.
SD: Standard deviation.
Wilcoxon-Mann-Whitney U-test.
Chi-square test.
Adolescent smoking status was defined as ever smoked during the lifetime vs. never smoked as based on questionnaires.
Wilcoxon Mann-Whitney U-test.
Maternal smoking was defined as any smoking during pregnancy.
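The chi-square comparisons between the Raine Study training and test sets can be illustrated with a short sketch. The function name `chi_square_2x2` is hypothetical, and the use of Yates' continuity correction is an assumption: the source does not say whether it was applied. The counts are taken from the sex and maternal smoking rows of the table above.

```python
import math

def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table (1 degree of freedom).

    table: [[a, b], [c, d]] of observed counts.
    yates: apply Yates' continuity correction (an assumption, see lead-in).
    Returns (chi2 statistic, p-value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Shrink each deviation by 0.5, but never below zero.
                diff = max(0.0, diff - 0.5)
            chi2 += diff ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: test set, training set. Columns: the two categories.
sex_counts = [[99, 99], [402, 395]]          # male, female
maternal_counts = [[59, 139], [237, 560]]    # exposed, not exposed

chi2_sex, p_sex = chi_square_2x2(sex_counts)
chi2_mat, p_mat = chi_square_2x2(maternal_counts)
print(f"sex: p = {p_sex:.3f}")
print(f"maternal smoking: p = {p_mat:.3f}")
```

Under these assumptions the sketch gives p values of about 0.975 for sex and 1 for maternal smoking, in line with the values reported in the table.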
Model quality measures (sensitivity, specificity, Cohen’s κ, accuracy, AUC, and Brier score) for the elastic net machine learning model; the Reese et al. cord blood, Richmond et al. 568 CpG, and Richmond et al. 19 CpG scores; and the gradient boosting machine, random forest, and support vector machine models, which were among the four best-performing models in our analysis. Results in this table are based on the Raine Study test data, NFBC1986, and NFBC1966.
| Model | Sensitivity | Specificity | Cohen’s κ | Accuracy | AUC | Brier score | # CpGs required |
|---|---|---|---|---|---|---|---|
| Raine Study test data set | |||||||
| Elastic net score | 0.91 | 0.76 | 0.68 | 0.83 | 0.87 | 0.13 | 204 |
| Gradient boosting machine | 0.91 | 0.82 | 0.72 | 0.88 | 0.88 | 0.1 | 1,511 |
| Random forest | 0.87 | 0.73 | 0.58 | 0.83 | 0.83 | 0.17 | 1,511 |
| Support vector machine | 0.87 | 0.73 | 0.6 | 0.83 | 0.85 | 0.13 | 1,511 |
| Reese score | 0.88 | 0.72 | 0.6 | 0.83 | 0.85 | 0.21 | 28 |
| Richmond score 568 CpGs | 0.7 | 0.68 | 0.34 | 0.69 | 0.72 | 0.22 | 568 |
| Richmond score 19 CpGs | 0.79 | 0.58 | 0.37 | 0.72 | 0.73 | 0.22 | 19 |
| NFBC1986 | |||||||
| Elastic net score | 0.87 | 0.75 | 0.56 | 0.84 | 0.85 | 0.13 | 204 |
| Gradient boosting machine | 0.95 | 0.29 | 0.19 | 0.54 | 0.74 | 0.39 | 1,511 |
| Random forest | 0.79 | 0.16 | 0.06 | 0.64 | 0.54 | 0.24 | 1,511 |
| Support vector machine | 0.87 | 0.44 | 0.33 | 0.77 | 0.79 | 0.16 | 1,511 |
| Reese score | 0.87 | 0.61 | 0.46 | 0.82 | 0.8 | 0.18 | 28 |
| Richmond score 568 CpGs | 0.65 | 0.76 | 0.34 | 0.74 | 0.71 | 0.22 | 568 |
| Richmond score 19 CpGs | 0.65 | 0.77 | 0.31 | 0.68 | 0.73 | 0.22 | 19 |
| NFBC1966 | |||||||
| Elastic net score | 0.72 | 0.78 | 0.39 | 0.73 | 0.8 | 0.19 | 204 |
| Gradient boosting machine | 0.88 | 0.26 | 0.1 | 0.45 | 0.68 | 0.48 | 1,511 |
| Random forest | 0.77 | 0.18 | 0.05 | 0.64 | 0.48 | 0.24 | 1,511 |
| Support vector machine | 0.88 | 0.45 | 0.33 | 0.76 | 0.75 | 0.2 | 1,511 |
| Reese score | 0.72 | 0.7 | 0.32 | 0.71 | 0.73 | 0.18 | 28 |
| Richmond score 568 CpGs | 0.66 | 0.63 | 0.22 | 0.69 | 0.72 | 0.22 | 568 |
| Richmond score 19 CpGs | 0.61 | 0.72 | 0.23 | 0.63 | 0.73 | 0.22 | 19 |
Note: AUC, area under the receiver operating characteristic curve.
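For readers unfamiliar with the quality measures in this table, the following sketch shows how each can be computed from binary labels and predicted probabilities. It is an illustration only, not the authors' code: the function name `quality_measures`, the 0.5 classification threshold, and the toy data are all assumptions.

```python
def quality_measures(y_true, y_prob, threshold=0.5):
    """Compute the table's quality measures for a binary classifier.

    y_true: list of 0/1 labels (1 = exposed).
    y_prob: list of predicted probabilities of class 1.
    threshold: cut-off for calling a sample exposed (assumed, not from the source).
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    n = tp + fp + tn + fn

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)

    # Brier score: mean squared error of the predicted probabilities.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n

    # AUC as the probability that a random positive outranks a random
    # negative (ties count one half).
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
        "accuracy": accuracy,
        "auc": auc,
        "brier": brier,
    }

# Toy data (invented for illustration).
labels = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
measures = quality_measures(labels, probs)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity, specificity, accuracy, and κ depend on the chosen threshold, whereas AUC and the Brier score use the predicted probabilities directly.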
Figure 2. ROC curves for the four model scores tested: elastic net regression, the Reese et al. methylation score, and the Richmond et al. 568 CpG and 19 CpG scores. The AUC is given for every score, applied to the Raine Study test set, NFBC1986, and NFBC1966. Note: AUC, area under the ROC curve; ROC, receiver operating characteristic.
DeLong Test for significant difference between all ROC curves in Figure 2.
| Comparison (DeLong test) | Raine Study | NFBC1986 | NFBC1966 |
|---|---|---|---|
| Elastic net vs. Reese score | 0.49 | 0.12 | 0.03 |
| Elastic net vs. Richmond young | 0.00058 | 0.008 | 0.006 |
| Elastic net vs. Richmond old | 0.01 | 0.04 | 0.004 |
| Reese vs. Richmond young | 0.23 | 0.43 | |
| Reese vs. Richmond old | 0.002 | 0.67 | 0.47 |
| Richmond young vs. Richmond old | 0.001 | 0.41 | 0.91 |
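The DeLong test compares two correlated AUCs estimated on the same samples. A minimal sketch of the paired test is given below, assuming binary labels and two score vectors; it is not the authors' implementation, and the function names and toy data are hypothetical.

```python
import math

def _psi(x, y):
    """Kernel of the Mann-Whitney statistic: 1 if x > y, 0.5 on ties, else 0."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0

def _components(pos, neg):
    """Return the AUC and DeLong's structural components for one score."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01

def _cov(u, v):
    """Sample covariance of two equally long lists."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)

def delong_test(y_true, score_a, score_b):
    """Two-sided paired DeLong test. Returns (auc_a, auc_b, p_value)."""
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos_idx), len(neg_idx)

    auc_a, v10_a, v01_a = _components([score_a[i] for i in pos_idx],
                                      [score_a[i] for i in neg_idx])
    auc_b, v10_b, v01_b = _components([score_b[i] for i in pos_idx],
                                      [score_b[i] for i in neg_idx])

    # Variance of the AUC difference from the component covariances.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)

    if var <= 0:
        # Degenerate case: the variance cannot be estimated.
        return auc_a, auc_b, (1.0 if auc_a == auc_b else float("nan"))

    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, p_value

# Toy data (invented for illustration): 4 exposed and 4 unexposed samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
scores_b = [0.6, 0.3, 0.5, 0.4, 0.5, 0.2, 0.7, 0.1]

auc_a, auc_b, p = delong_test(labels, scores_a, scores_b)
print(f"AUC A = {auc_a:.4f}, AUC B = {auc_b:.4f}, p = {p:.4f}")
```

Because both scores are evaluated on the same individuals, the covariance terms matter; treating the two AUCs as independent would typically overstate the variance of their difference.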