Jiajin He1, Jinhua Li2, Siqing Jiang1, Wei Cheng3, Jun Jiang3, Yun Xu3, Jiezhe Yang3, Xin Zhou3, Chengliang Chai3, Chao Wu4.
Abstract
Background: The continuously growing HIV incidence among men who have sex with men (MSM), together with the low rate of HIV testing among MSM in China, demonstrates a need for innovative strategies to improve the implementation of HIV prevention. Machine learning algorithms are increasingly used for disease diagnosis and prediction. We aimed to develop and validate machine learning models for predicting HIV infection among MSM that can identify individuals at increased risk of HIV acquisition for transmission-reduction interventions.
Keywords: HIV; MSM; machine learning; models; prediction
Year: 2022 PMID: 36091522 PMCID: PMC9452878 DOI: 10.3389/fpubh.2022.967681
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
The confusion matrix.
| | Actual positive | Actual negative |
|---|---|---|
| Positive prediction | True positive (TP) | False positive (FP) |
| Negative prediction | False negative (FN) | True negative (TN) |
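The four cells of the confusion matrix determine the usual classification metrics. The following is a minimal sketch, assuming the standard definitions of accuracy, precision, recall, and F1; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (no positive predictions or no positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

Recall here is the share of truly positive individuals that the model flags, which is the quantity of interest when the goal is to find people at increased risk.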
Figure 1. Flow chart of model development and prospective validation. LR, logistic regression; DT, decision tree; SVM, support vector machines; RF, random forest.
Basic characteristics of variables in both 2018–2019 and 2020 MSM and univariate associations of potential predictors with HIV infection in 2018–2019 MSM.
| Variable | HIV-negative, 2018–2019 (n) | HIV-positive, 2018–2019 (n) | OR (95% CI) | P value | HIV-negative, 2020 (n) | HIV-positive, 2020 (n) |
|---|---|---|---|---|---|---|
| Age (years) | | | | | | |
| <18 | 66 | 6 | Ref. | | 27 | 2 |
| 18–40 | 4,720 | 273 | 0.64 (0.27, 1.48) | 0.294 | 2,470 | 108 |
| 41–65 | 1,161 | 84 | 0.80 (0.34, 1.89) | 0.605 | 698 | 34 |
| >65 | 27 | 9 | 3.67 (1.19, 11.30) | 0.024 | 24 | 1 |
| Marital status | | | | | | |
| Unmarried | 3,574 | 247 | Ref. | | 1,974 | 92 |
| Married | 1,990 | 102 | 0.74 (0.59, 0.94) | 0.013 | 992 | 44 |
| Cohabiting | 49 | 2 | 0.59 (0.14, 2.44) | 0.467 | 21 | 0 |
| Divorced or widowed | 361 | 21 | 0.84 (0.53, 1.33) | 0.461 | 232 | 9 |
| | | | | | | |
| No | 2,289 | 221 | Ref. | | 1,260 | 86 |
| Yes | 3,685 | 151 | 0.42 (0.34, 0.53) | <0.001 | 1,929 | 59 |
| Ethnicity | | | | | | |
| Han | 5,878 | 349 | Ref. | | 3,161 | 139 |
| Others | 96 | 23 | 4.04 (2.53, 6.44) | <0.001 | 58 | 6 |
| | | | | | | |
| <3 months | 280 | 28 | Ref. | | 155 | 10 |
| 3–6 months | 269 | 16 | 0.59 (0.31, 1.12) | 0.110 | 107 | 9 |
| 7–12 months | 467 | 24 | 0.51 (0.29, 0.90) | 0.021 | 197 | 4 |
| 1–2 years | 849 | 51 | 0.60 (0.37, 0.97) | 0.037 | 633 | 21 |
| >2 years | 4,109 | 253 | 0.62 (0.41, 0.93) | 0.020 | 2,127 | 101 |
| Education level | | | | | | |
| Illiteracy | 35 | 4 | Ref. | | 6 | 0 |
| Primary school | 250 | 18 | 0.63 (0.20, 1.97) | 0.427 | 107 | 8 |
| Junior high school | 1,553 | 121 | 0.68 (0.24, 1.95) | 0.475 | 838 | 41 |
| Senior high school | 2,047 | 111 | 0.47 (0.16, 1.36) | 0.165 | 1,153 | 38 |
| College degree or above | 2,089 | 118 | 0.49 (0.17, 1.41) | 0.189 | 1,115 | 58 |
| Sexual orientation | | | | | | |
| Homosexuality | 3,994 | 249 | Ref. | | 2,289 | 102 |
| Heterosexuality | 48 | 2 | 0.67 (0.16, 2.76) | 0.578 | 36 | 1 |
| Bisexuality | 1,743 | 109 | 1.00 (0.79, 1.27) | 0.979 | 774 | 40 |
| Unascertained | 189 | 12 | 1.02 (0.17, 1.41) | 0.952 | 120 | 2 |
| | | | | | | |
| Bar/dance hall | 339 | 13 | Ref. | | 260 | 0 |
| Tearoom/clubhouse | 157 | 10 | 1.66 (0.71, 3.87) | 0.240 | 143 | 8 |
| Public bath | 329 | 20 | 1.58 (0.78, 3.24) | 0.206 | 175 | 5 |
| Park | 257 | 6 | 0.61 (0.23, 1.62) | 0.321 | 72 | 1 |
| Internet | 4,761 | 315 | 1.73 (0.98, 3.04) | 0.059 | 2,527 | 179 |
| Others | 131 | 8 | 1.59 (0.65, 3.93) | 0.313 | 42 | 2 |
| | | | | | | |
| No | 2,529 | 203 | Ref. | | 1,143 | 71 |
| Yes | 3,445 | 169 | 0.61 (0.50, 0.75) | <0.001 | 2,076 | 74 |
| | | | | | | |
| No | 3,062 | 209 | Ref. | | 1,630 | 91 |
| Yes | 2,912 | 163 | 0.82 (0.66, 1.01) | 0.065 | 1,589 | 54 |
| | | | | | | |
| No | 1,072 | 158 | Ref. | | 363 | 39 |
| Yes | 4,902 | 214 | 0.30 (0.24, 0.37) | <0.001 | 2,856 | 106 |
| | | | | | | |
| Never | 249 | 39 | Ref. | | 105 | 10 |
| Sometimes | 2,430 | 233 | 0.61 (0.43, 0.88) | <0.001 | 817 | 82 |
| Every time | 3,295 | 100 | 0.19 (0.13, 0.29) | <0.001 | 2,297 | 53 |
| | | | | | | |
| No | 5,730 | 362 | Ref. | | 3,105 | 139 |
| Yes | 244 | 10 | 0.65 (0.34, 1.23) | 0.186 | 114 | 6 |
| | | | | | | |
| No | 4,602 | 298 | Ref. | | 2,636 | 122 |
| Yes | 1,372 | 74 | 0.83 (0.64, 1.08) | 0.171 | 583 | 23 |
| | | | | | | |
| No | 5,912 | 366 | Ref. | | 3,213 | 144 |
| Yes | 62 | 6 | 1.56 (0.67, 3.64) | 0.300 | 6 | 1 |
| | | | | | | |
| No | 5,725 | 350 | Ref. | | 3,116 | 137 |
| Yes | 249 | 22 | 1.45 (1.02, 2.26) | 0.008 | 103 | 8 |
| | | | | | | |
| No | 1,406 | 124 | Ref. | | 572 | 45 |
| Yes | 4,514 | 248 | 0.65 (0.52, 0.81) | <0.001 | 2,647 | 100 |
| | | | | | | |
| No | 5,585 | 344 | Ref. | | 3,038 | 138 |
| Yes | 389 | 28 | 1.17 (0.78, 1.74) | 0.444 | 181 | 7 |
| | | | | | | |
| No | 3,059 | 225 | Ref. | | 1,697 | 77 |
| Yes | 2,915 | 147 | 0.69 (0.55, 0.85) | 0.001 | 1,522 | 68 |
| | | | | | | |
| No | 2,691 | 212 | Ref. | | 1,278 | 67 |
| Yes | 3,283 | 160 | 0.62 (0.50, 0.76) | <0.001 | 1,941 | 78 |
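The univariate odds ratios in this table can be checked directly from the counts. The sketch below assumes that the first two count columns are HIV-negative and HIV-positive participants in the 2018–2019 data, and that the intervals are Wald 95% confidence intervals; both are assumptions, and the function name `odds_ratio_ci` is hypothetical.

```python
import math

def odds_ratio_ci(pos_exposed, neg_exposed, pos_ref, neg_ref, z=1.96):
    """Crude odds ratio versus a reference group, with a Wald confidence interval."""
    odds_ratio = (pos_exposed * neg_ref) / (neg_exposed * pos_ref)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / pos_exposed + 1 / neg_exposed
                          + 1 / pos_ref + 1 / neg_ref)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts from the 18–40 row versus the <18 reference row (2018–2019 data),
# assuming the count columns are HIV-negative then HIV-positive.
or_, lower, upper = odds_ratio_ci(pos_exposed=273, neg_exposed=4720,
                                  pos_ref=6, neg_ref=66)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the result agrees with the tabulated 0.64 (0.27, 1.48) for the 18–40 age group.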
Description of original data and SMOTE-processed data.
| Dataset | HIV-positive | HIV-negative | Total |
|---|---|---|---|
| Training | 260 | 4,182 | 4,442 |
| Training-smote | 4,182 | 4,182 | 8,364 |
| Testing | 112 | 1,792 | 1,904 |
| Testing-smote | 1,792 | 1,792 | 3,584 |
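SMOTE balances the classes by adding synthetic minority-class samples; in the training row above, going from 260 to 4,182 samples in the smaller class implies 3,922 synthetic records. The following is a simplified sketch of the interpolation idea behind SMOTE, not the implementation used in the study: the function `smote_like`, its parameters, and the toy data are all hypothetical.

```python
import math
import random

def smote_like(minority, n_synthetic, k=5, seed=0):
    """Create synthetic points on segments between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbor)))
    return synthetic

# Hypothetical 2-D toy data, not taken from the study.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority_count = 10
new_points = smote_like(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority), majority_count)
```

Each synthetic point lies between two existing minority samples, so the method fills in the minority region rather than duplicating records.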
Results of classification models in original unbalanced data.
| Model | Accuracy | Precision | Recall | F1 score | AUC |
|---|---|---|---|---|---|
| LR | 0.941 | 0.500 | 0.009 | 0.018 | 0.764 |
| DT | 0.934 | 0.208 | 0.045 | 0.074 | 0.549 |
| SVM | 0.935 | 0.071 | 0.009 | 0.016 | 0.632 |
| RF | 0.934 | 0.118 | 0.018 | 0.031 | 0.667 |
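The combination of high first-column values and very low values in the following columns is what class imbalance typically produces. As a hedged illustration, assuming the first metric column is accuracy and that the testing set contains 112 HIV-positive and 1,792 HIV-negative participants (as the SMOTE description table suggests), a hypothetical classifier that labels everyone HIV-negative already reaches the same accuracy:

```python
# Testing-set counts from the SMOTE description table, assuming the
# columns there are HIV-positive, HIV-negative, total.
n_positive, n_negative = 112, 1792

# A hypothetical classifier that labels every participant HIV-negative.
baseline_accuracy = n_negative / (n_positive + n_negative)
baseline_recall = 0 / n_positive  # no positive case is ever detected

print(round(baseline_accuracy, 3), baseline_recall)
```

This prints 0.941 and 0.0, so accuracy alone says little about how well HIV-positive individuals are identified in the unbalanced data.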
Results of classification models in SMOTE-processed data.
| Model | Accuracy | Precision | Recall | F1 score | AUC |
|---|---|---|---|---|---|
| LR | 0.702 | 0.690 | 0.733 | 0.711 | 0.778 |
| DT | 0.852 | 0.954 | 0.741 | 0.834 | 0.853 |
| SVM | 0.811 | 0.906 | 0.695 | 0.787 | 0.887 |
| RF | 0.871 | 0.960 | 0.775 | 0.858 | 0.942 |
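The values in this table are internally consistent if the third to fifth columns are read as precision, recall, and F1 score (an assumption about the lost column headers). A short check, using the harmonic-mean definition of F1 and a hypothetical helper `f1_score`:

```python
# (precision, recall, reported F1) per model, copied from the table above,
# assuming the third to fifth columns are precision, recall and F1.
reported = {
    "LR": (0.690, 0.733, 0.711),
    "DT": (0.954, 0.741, 0.834),
    "SVM": (0.906, 0.695, 0.787),
    "RF": (0.960, 0.775, 0.858),
}

def f1_score(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recomputed = {model: round(f1_score(p, r), 3)
              for model, (p, r, _) in reported.items()}
print(recomputed)
```

The recomputed values match the reported F1 column for all four models to three decimal places.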
Figure 2. Receiver operating characteristic (ROC) curves of the four models for the prediction of HIV in the original unbalanced data (A) and the SMOTE-processed data (B). LR, logistic regression; DT, decision tree; SVM, support vector machines; RF, random forest.
Figure 3. Receiver operating characteristic (ROC) curves of the four models for the prediction of HIV in the prospective validation data. LR, logistic regression; DT, decision tree; SVM, support vector machines; RF, random forest.
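The area under a ROC curve (AUC) can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; the function `auc_from_scores` and the toy labels and scores are hypothetical and unrelated to the study data.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = positive) and predicted scores.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = auc_from_scores(labels, scores)
print(auc)
```

In this toy example five of the six positive/negative pairs are ordered correctly, giving an AUC of 5/6. The pairwise loop is quadratic in the sample size, so it is meant for illustration rather than large datasets.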