Thi Mai Nguyen, Hoang Long Le, Kyu-Baek Hwang, Yun-Chul Hong, Jin Hee Kim.
Abstract
DNA methylation modification plays a vital role in the pathophysiology of high blood pressure (BP). Herein, we applied three machine learning (ML) algorithms, namely deep learning (DL), support vector machine (SVM), and random forest (RF), to detect high BP using DNA methylome data. Peripheral blood samples of 50 elderly individuals were collected at three visits for DNA methylome profiling. Participants who had a history of hypertension and/or a current high BP measurement were considered to have high BP. The whole dataset was randomly divided to conduct a nested five-group cross-validation for assessing prediction performance. Data in each outer training set were independently normalized using a min-max scaler, reduced in dimensionality using principal component analysis (PCA), and then fed into the three predictive algorithms. Of the three ML algorithms, DL achieved the best performance (AUPRC = 0.65, AUROC = 0.73, accuracy = 0.69, and F1-score = 0.73). To confirm the reliability of using the DNA methylome as a biomarker for high BP, we constructed mixed-effects models and found that 61,694 methylation sites located in 15,523 intragenic regions and 16,754 intergenic regions were significantly associated with BP measures. Our proposed models pioneered the methodology of applying ML to DNA methylome data for early detection of high BP in clinical practice.
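The preprocessing order described in the abstract (fit the min-max scaler on each outer training set only, then reuse it on the held-out data) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `k_fold_indices`, `fit_min_max`, and `apply_min_max`, the seeds, and the toy data are all hypothetical, and the PCA and classifier steps are omitted.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_min_max(rows):
    """Learn the per-feature minimum and maximum from training rows only."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def apply_min_max(rows, mins, maxs):
    """Scale rows with previously fitted bounds; constant features map to 0."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Toy data (hypothetical values): 150 samples with 3 features each.
rng = random.Random(42)
data = [[rng.random() * 10 for _ in range(3)] for _ in range(150)]

outer_folds = k_fold_indices(len(data), k=5, seed=1)
for held_out in outer_folds:
    held_out_set = set(held_out)
    train_idx = [i for i in range(len(data)) if i not in held_out_set]
    train_rows = [data[i] for i in train_idx]
    test_rows = [data[i] for i in held_out]

    # Fit the scaler on the outer training set only, then reuse it on test data.
    mins, maxs = fit_min_max(train_rows)
    train_scaled = apply_min_max(train_rows, mins, maxs)
    test_scaled = apply_min_max(test_rows, mins, maxs)

    # Inner folds (e.g., for hyperparameter tuning) index positions within
    # train_idx, so the held-out outer fold never enters model selection.
    inner_folds = k_fold_indices(len(train_idx), k=5, seed=2)
```

Held-out values scaled this way may fall slightly outside [0, 1], which is expected when the bounds come from the training data alone.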
Keywords: DNA methylome; deep learning; high blood pressure; machine learning
Year: 2022 PMID: 35740428 PMCID: PMC9220060 DOI: 10.3390/biomedicines10061406
Source DB: PubMed Journal: Biomedicines ISSN: 2227-9059
Information about study participants.
| Variables | Parameter (n = 50) |
|---|---|
| Age, year, mean ± SD | 72.5 ± 3.5 |
| Sex, n (%) | |
| Male | 3 (6.0) |
| Female | 47 (94.0) |
| BMI, kg/m², n (%) | |
| 18.5–<23.0 | 11 (22.0) |
| 23.0–<25.0 | 14 (28.0) |
| 25.0–<30.0 | 22 (44.0) |
| 30.0–<35.0 | 3 (6.0) |
| Drinker, n (%) | |
| Yes | 12 (24.0) |
| No | 38 (76.0) |
| Smoker, n (%) | |
| Yes | 3 (6.0) |
| No | 47 (94.0) |
| History of hypertension, n (%) | |
| Yes | 26 (52.0) |
| No | 24 (48.0) |
| History of diabetes, n (%) | |
| Yes | 42 (84.0) |
| No | 8 (16.0) |
| Current SBP, mmHg, mean ± SD | 128.5 ± 66.5 |
| <130, n (%) | 71 (47.3) |
| 130–139, n (%) | 48 (32.0) |
| ≥140, n (%) | 31 (20.7) |
| Current DBP, mmHg, mean ± SD | 77.3 ± 38.5 |
| <85, n (%) | 125 (83.3) |
| 85–89, n (%) | 16 (10.7) |
| ≥90, n (%) | 9 (6.0) |
| Current high BP status, n (%) | |
| Yes | 87 (58.0) |
| No | 63 (42.0) |
SD, standard deviation; BMI, body mass index; BP, blood pressure; SBP, systolic blood pressure; DBP, diastolic blood pressure. The current BP measure of a participant was considered high if it met at least one of the following criteria: (1) the participant had a diagnosed history of hypertension, (2) SBP ≥ 140 mmHg, or (3) DBP ≥ 90 mmHg.
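The labeling rule in the footnote above can be written as a one-line predicate. The function name `is_high_bp` and the example readings are hypothetical and are not taken from the study data.

```python
def is_high_bp(history_of_hypertension, sbp_mmhg, dbp_mmhg):
    """Return True if at least one of the three footnote criteria holds."""
    return bool(history_of_hypertension) or sbp_mmhg >= 140 or dbp_mmhg >= 90

# Hypothetical example visits (not study data).
print(is_high_bp(False, 128, 77))  # False: no criterion met
print(is_high_bp(False, 142, 80))  # True: SBP >= 140 mmHg
print(is_high_bp(True, 120, 70))   # True: history of hypertension
```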
Figure 1. Box plots of the performance of the three proposed models for high BP prediction. PCA, principal component analysis; DL, deep learning; RF, random forest; SVM, support vector machine; AUPRC, area under the precision-recall curve; AUROC, area under the receiver operating characteristic curve. An asterisk (*) denotes a significant difference with p-value < 0.05. A double asterisk (**) denotes a significant difference with p-value < 0.005.
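As a reminder of what some of the metrics in Figure 1 measure, the sketch below computes AUROC, accuracy, and F1-score from scratch for a toy example. The function names, the 0.5 decision threshold, and the labels and scores are illustrative assumptions, not values from the study; AUPRC is omitted for brevity.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_and_f1(labels, predictions):
    """Accuracy and F1-score from binary labels and binary predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 0.0
    return accuracy, f1

# Toy labels and predicted probabilities (hypothetical).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc_value = auroc(labels, scores)
acc_value, f1_value = accuracy_and_f1(labels, predictions)
```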
Figure 2. Disease classes related to 9154 target genes mapped with CpG sites significantly associated with high BP. The percentages were calculated as the number of target genes that regulated the corresponding disease class divided by the 9154 target genes that regulated all related disease classes.
Figure 3. Target genes as significant biomarkers for high BP-related diseases, identified using a curated database in the DisGeNet platform. Grey lines link gene expression biomarkers to the target disease, while the blue line links genetic variation to the target disease.
Estimated associations between BP measures and the most significant CpG sites mapped with biomarker genes for high BP.
| CpG | Chr | Position | UCSC Gene | SBP Estimate | SBP SE | SBP p-value | DBP Estimate | DBP SE | DBP p-value |
|---|---|---|---|---|---|---|---|---|---|
| cg20203971 | 2 | 240171099 | | 443.9 | 101.0 | <0.001 | 205.5 | 56.0 | <0.001 |
| cg03573792 | 4 | 148465429 | | 99.1 | 57.7 | 0.088 | 64.4 | 31.3 | 0.041 |
| cg04956913 | 6 | 30712436 | | –413.3 | 161.2 | 0.011 | –202.1 | 88.1 | 0.023 |
| cg13224213 | 7 | 150689881 | | –86.3 | 41.7 | 0.040 | –57.7 | 22.5 | 0.012 |
| cg18899064 | 8 | 42066228 | | 58.2 | 70.4 | 0.410 | 91.3 | 37.6 | 0.016 |
| cg07528661 | 10 | 78647708 | | 152.6 | 65.7 | 0.022 | 71.9 | 35.9 | 0.047 |
| cg06976598 | 10 | 53639124 | | –37.3 | 13.8 | 0.008 | –19.6 | 7.5 | 0.010 |
| cg18248586 | 11 | 113329026 | | 64.4 | 33.4 | 0.056 | 47.2 | 18.0 | 0.010 |
| cg03793270 | 11 | 89224684 | | 43.4 | 20.9 | 0.040 | 26.0 | 11.3 | 0.023 |
| cg16655193 | 12 | 102802953 | | –115.9 | 52.5 | 0.029 | –52.5 | 28.7 | 0.070 |
| cg07109046 | 13 | 41204388 | | 50.8 | 15.1 | 0.001 | 23.1 | 8.3 | 0.006 |
| cg10821964 | 15 | 40269214 | | 177.0 | 70.2 | 0.013 | 102.4 | 38.1 | 0.008 |
| cg09094674 | 16 | 23194733 | | 78.7 | 30.3 | 0.010 | 64.0 | 16.0 | <0.001 |
| cg20019489 | 20 | 57414351 | | –47.0 | 16.6 | 0.005 | –21.1 | 9.1 | 0.022 |
| cg09640960 | 20 | 60794676 | | 162.0 | 48.5 | 0.001 | 80.9 | 26.6 | 0.003 |

UCSC, University of California, Santa Cruz; SBP, systolic blood pressure; DBP, diastolic blood pressure; SE, standard error.
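For orientation, a linear mixed-effects model for repeated measurements is often written in the form below. This is a generic sketch only: the source does not state the exact model specification, so the random intercept per participant, the choice of BP as the outcome, the covariate vector, and all symbols are assumptions made for illustration.

```latex
\mathrm{BP}_{ij} = \beta_0 + \beta_1 M_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $i$ indexes participants and $j$ indexes visits, $M_{ij}$ is the methylation level at a given CpG site, $\mathbf{x}_{ij}$ is a hypothetical vector of adjustment covariates, and $u_i$ is a participant-level random intercept that accounts for the three correlated measurements per person. Under this assumed reading, $\beta_1$ would play the role of the values in the Estimate columns of the table above.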