Chang Shu, Amy C Justice, Xinyu Zhang, Vincent C Marconi, Dana B Hancock, Eric O Johnson, Ke Xu.
Abstract
Background: With the improved life expectancy of people living with HIV (PLWH), identifying vulnerable subpopulations at high mortality risk is important. Previous evidence shows that DNA methylation (DNAm) is associated with mortality in non-HIV populations. Here, we established a panel of DNAm biomarkers that can predict mortality risk among PLWH.
Keywords: DNA methylation; HIV; machine learning prediction; mortality risk
Year: 2020 PMID: 33092459 PMCID: PMC8216205 DOI: 10.1080/15592294.2020.1824097
Source DB: PubMed Journal: Epigenetics ISSN: 1559-2294 Impact factor: 4.528
Figure 1. Flowchart of analytical procedures for selecting CpG sites in the peripheral blood methylome, machine learning prediction models to predict high and low mortality risk groups, survival analysis, Gene Ontology enrichment analysis, and epigenome-wide association analysis
Study sample characteristics
| | Training set (N = 460) | Validation set (N = 114) | Testing set (N = 507) | p value* |
|---|---|---|---|---|
| Age (year, mean ± sd) | 52.56 (7.54) | 51.21 (8.09) | 50.86 (7.67) | 0.002 |
| Female (%) | 6 (1.3) | 1 (0.9) | 11 (2.2) | 0.452 |
| Race (%) | | | | |
| Caucasian | 45 (9.8) | 9 (7.9) | 46 (9.1) | 0.003 |
| African Americans | 392 (85.2) | 103 (90.4) | 409 (80.7) | |
| Other | 23 (5.0) | 2 (1.8) | 52 (10.3) | |
| Smokers (%) | 360 (59.4) | 309 (58.4) | 294 (58.0) | 0.719 |
| HIV treatment adherence (%) | 362 (78.7) | 85 (74.6) | 382 (75.3) | 0.399 |
| CD4 count | 432.68 (291.43) | 411.82 (281.55) | 450.61 (280.05) | 0.287 |
| log 10 HIV-1 viral load | 2.76 (1.22) | 2.68 (1.17) | 2.83 (1.22) | 0.399 |
| VACS index (mean ± sd) | 30.62 (20.35) | 35.50 (21.83) | 35.46 (22.35) | 0.001 |
| 10-year mortality (%) | 123 (26.7) | 32 (28.1) | 121 (23.9) | 0.477 |
*ANOVA was used for continuous variables and the chi-square test for categorical variables
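The chi-square comparison named in the footnote can be retraced from the race counts printed in the table. The sketch below is a plain-Python Pearson chi-square test of independence; the helper names are illustrative and it is not the authors' code. The closed-form tail probability it uses holds only for an even number of degrees of freedom, which is the case here (4).

```python
import math

# Race counts copied from the table above
# (rows: Caucasian, African American, Other;
#  columns: training, validation, testing).
observed = [
    [45, 9, 46],
    [392, 103, 409],
    [23, 2, 52],
]

def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for Pearson's chi-square test."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_upper_tail_even_dof(x, dof):
    """Upper-tail chi-square probability; this closed form needs even dof."""
    if dof % 2 != 0:
        raise ValueError("closed form only valid for even degrees of freedom")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(dof // 2))
    return math.exp(-half) * terms

stat, dof = chi_square_independence(observed)
p_value = chi2_upper_tail_even_dof(stat, dof)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p_value:.4f}")
```

Run on these counts, the p-value rounds to the 0.003 reported for race, which suggests the table entry comes from a test of this form.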
Figure 2. Variable importance ranking of predictive machine learning CpG sites. Variable importance is a score between 0 and 100, as calculated by elastic-net-regularized generalized linear models (GLMNET). We obtained variable importance scores from 100 bootstraps. The top 20 ranked CpG sites and 20 bootstraps are shown
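As a rough illustration of how per-bootstrap importance scores might be combined into a ranking, the sketch below rescales absolute model coefficients to a 0 to 100 range within each bootstrap and then averages across bootstraps. The min-max rescaling, the averaging step, and the probe names and coefficients are all assumptions made for illustration; the source states only that scores lie between 0 and 100 and were obtained from GLMNET over 100 bootstraps.

```python
from statistics import mean

def scale_importance(coefs):
    """Min-max rescale absolute coefficients to 0-100 (assumed convention)."""
    mags = {site: abs(value) for site, value in coefs.items()}
    lo, hi = min(mags.values()), max(mags.values())
    if hi == lo:
        return {site: 0.0 for site in mags}
    return {site: 100.0 * (m - lo) / (hi - lo) for site, m in mags.items()}

def rank_across_bootstraps(boot_coefs):
    """Average the scaled importance over bootstraps, highest first."""
    scaled = [scale_importance(c) for c in boot_coefs]
    sites = list(scaled[0])
    averaged = {s: mean(run[s] for run in scaled) for s in sites}
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical coefficients from two bootstrap fits (not study data).
boot_coefs = [
    {"cpg_a": -2.0, "cpg_b": 1.0, "cpg_c": 0.0},
    {"cpg_a": 3.0, "cpg_b": -1.5, "cpg_c": 0.3},
]
ranking = rank_across_bootstraps(boot_coefs)
for site, score in ranking:
    print(f"{site}: {score:.1f}")
```

Averaging is only one way to summarize bootstraps; counting how often a site is selected would be another reasonable choice.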
Figure 5. Receiver operating characteristic (ROC) curve in the testing set
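The area under an ROC curve equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half. The sketch below computes it that way in plain Python; the labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above non-case
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = died within follow-up, 0 = survived.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in sample size, which is fine for a few hundred samples; a rank-based formula is the usual choice for larger sets.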
Figure 6. Kaplan-Meier curves of predicted high and low mortality risk groups among people living with HIV
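For readers unfamiliar with how a Kaplan-Meier curve is built, the sketch below implements the product-limit estimator in plain Python: at each distinct death time, the running survival probability is multiplied by one minus the fraction of at-risk individuals who died. The follow-up times and event flags are hypothetical, and a curve like those in the figure would be computed separately for each predicted risk group.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  -- follow-up time for each person
    events -- 1 if the person died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up in years (not study data).
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```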
CpG sites that were both selected by machine learning and epigenome-wide significant for mortality risk among people living with HIV
| probe | Chr | Position | Nearest gene | Variable Importance | Meta Effect (SE) | Meta P | Refgene group | Relation to CpG island |
|---|---|---|---|---|---|---|---|---|
| cg01971407 | 11 | 313,624 | | 9.9 | −0.0399 (0.0039) | 8.05E-25 | TSS1500 | N_Shelf |
| cg22930808 | 3 | 122,281,881 | | 13.7 | −0.1057 (0.0105) | 7.24E-24 | 5UTR;TSS1500 | N_Shore |
| cg23570810 | 11 | 315,102 | | 13.0 | −0.0638 (0.0065) | 1.15E-22 | Body | N_Shore |
| cg14864167 | 8 | 66,751,182 | | 12.0 | −0.0832 (0.0085) | 1.54E-22 | Body | N_Shelf |
| cg01190666 | 20 | 62,204,908 | | 19.4 | −0.0343 (0.0035) | 1.79E-22 | 5UTR | N_Shore |
| cg11702942 | 8 | 144,102,584 | | 7.8 | −0.0382 (0.004) | 7.52E-22 | Body | S_Shore |
| cg03607951 | 1 | 79,085,586 | | 23.0 | −0.0683 (0.0072) | 3.58E-21 | TSS1500 | |
| cg03848588 | 9 | 32,525,008 | | 14.4 | −0.0274 (0.0029) | 6.02E-21 | Body | N_Shore |
| cg04582010 | 11 | 313,120 | | 22.1 | −0.0455 (0.0052) | 2.19E-18 | TSS1500 | S_Shore |
| cg18394552 | 5 | 159,428,643 | | 24.4 | 0.0342 (0.0043) | 1.57E-15 | | |
| cg03753191 | 13 | 43,566,902 | | 4.1 | −0.018 (0.0023) | 2.43E-15 | TSS1500 | S_Shore |
| cg17267239 | 1 | 173,640,200 | | 21.8 | −0.0186 (0.0025) | 1.22E-13 | TSS1500 | S_Shore |
| cg12461141 | 11 | 5,710,654 | | 37.6 | −0.0274 (0.0037) | 1.37E-13 | TSS1500 | |
| cg09251764 | 17 | 6,659,070 | | 13.8 | −0.0142 (0.002) | 3.93E-13 | TSS200 | |
| cg05626226 | 4 | 106,515,450 | | 11.5 | 0.0279 (0.004) | 2.45E-12 | Body | |
| cg22959742 | 10 | 13,913,931 | | 16.4 | 0.0289 (0.0041) | 3.08E-12 | Body | |
| cg16936953 | 17 | 57,915,665 | | 11.3 | −0.0379 (0.0056) | 1.95E-11 | Body | |
| cg18181703 | 17 | 76,354,621 | | 32.6 | −0.0215 (0.0033) | 1.15E-10 | Body | N_Shore |
| cg07107453 | 1 | 79,114,976 | | 4.8 | −0.0292 (0.0046) | 2.26E-10 | TSS1500 | |
| cg25114611 | 6 | 35,696,870 | | 12.8 | −0.0125 (0.002) | 4.78E-10 | TSS1500;Body | S_Shore |
| cg01059398 | 3 | 172,235,808 | | 26.7 | −0.0289 (0.0047) | 7.85E-10 | Body | |
| cg19459791 | 15 | 65,363,022 | | 15.1 | 0.0159 (0.0026) | 1.20E-09 | | S_Shelf |
| cg26282236 | 12 | 1,025,755 | | 30.2 | 0.0243 (0.0041) | 3.63E-09 | Body | |
| cg06357748 | 12 | 1,025,529 | | 10.7 | 0.0277 (0.0047) | 4.12E-09 | Body | |
| cg04442417 | 20 | 62,191,507 | | 26.6 | 0.0201 (0.0035) | 1.22E-08 | Body | Island |
| cg14602222 | 12 | 1,025,663 | | 13.2 | 0.0241 (0.0042) | 1.42E-08 | Body | |
| cg26724018 | 11 | 5,716,255 | | 11.9 | −0.0171 (0.003) | 1.49E-08 | 5UTR | |
| cg03084350 | 3 | 38,065,265 | | 18.0 | 0.0114 (0.002) | 2.05E-08 | Body | N_Shore |
| cg00569896 | 4 | 204,382 | | 0.2 | 0.0254 (0.0047) | 5.78E-08 | | N_Shore |
| cg12126344 | 1 | 12,207,564 | | 8.4 | −0.011 (0.002) | 7.36E-08 | | |
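The effect, standard error, and P columns in this table can be related to one another with a two-sided Wald z-test. The sketch below assumes that relationship, which the source does not state, and the function name is illustrative. Because the printed effects and standard errors are rounded, the computed values can only approximate the reported Meta P.

```python
import math

def wald_two_sided_p(effect, se):
    """Two-sided p-value for z = effect / se under a standard normal."""
    z = effect / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function,
    # which stays accurate for very small tail probabilities.
    return math.erfc(abs(z) / math.sqrt(2.0))

# (probe, meta effect, SE) copied from three rows of the table above.
rows = [
    ("cg01971407", -0.0399, 0.0039),
    ("cg19459791", 0.0159, 0.0026),
    ("cg12126344", -0.011, 0.002),
]
p_values = {probe: wald_two_sided_p(effect, se) for probe, effect, se in rows}
for probe, p in p_values.items():
    print(f"{probe}: p ~ {p:.2e}")
```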
Gene ontology term enrichment analysis of the 393 selected CpG sites that predict mortality risk in the HIV-positive population
| Term | Total genes | Predictive Genes | P value |
|---|---|---|---|
| tumor necrosis factor receptor superfamily binding | 46 | 7 | 8.33E-06 |
| response to virus | 303 | 15 | 4.26E-05 |
| defence response | 1505 | 39 | 1.29E-04 |
| mitochondrial DNA metabolic process | 18 | 4 | 1.43E-04 |
| cytokine receptor binding | 267 | 12 | 1.48E-04 |
| regulation of response to interferon-gamma | 25 | 4 | 4.15E-04 |
| regulation of interferon-gamma-mediated signaling pathway | 25 | 4 | 4.15E-04 |
| cell-cell adhesion mediator activity | 50 | 6 | 4.43E-04 |
| intrinsic component of the cytoplasmic side of the plasma membrane | 7 | 3 | 4.76E-04 |
| immune response | 1896 | 45 | 4.97E-04 |
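Enrichment tables of this kind are commonly built from a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of predictive genes in a term by chance. The sketch below shows that calculation under stated assumptions: the source does not say which test was used, and the background size and the total number of predictive genes are hypothetical placeholders, so the result is not expected to match the table. Tools written for methylation arrays often also adjust for the number of probes per gene, which this sketch ignores.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the term."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# First row of the table: 46 genes in the term, 7 of them predictive.
term_genes = 46
predictive_in_term = 7
# Hypothetical placeholders, not given in the source:
background_genes = 20000   # assumed size of the gene universe
predictive_genes = 300     # assumed number of genes mapped from the CpG sites

p_example = hypergeom_upper_tail(
    predictive_in_term, term_genes, predictive_genes, background_genes
)
print(f"illustrative enrichment p = {p_example:.2e}")
```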