Yinqiao Dong, Shangbin Liu, Danni Xia, Chen Xu, Xiaoyue Yu, Hui Chen, Rongxi Wang, Yujie Liu, Jingwen Dong, Fan Hu, Yong Cai, Ying Wang.
Abstract
The role of psychosocial factors in increasing the risk of HIV infection among men who have sex with men (MSM) has attracted growing attention. We aimed to develop and validate an integrated prediction model, incorporating emerging psychosocial variables in particular, for predicting the risk of HIV infection among MSM. We surveyed 547 MSM in China and collected sociodemographic, psychosocial, and behavioral information. The participants were split into a training set and a testing set in a nominal 3:1 ratio. The prediction model was constructed by selecting important variables with least absolute shrinkage and selection operator (LASSO) regression, fitting a multivariate logistic regression on the selected variables, and presenting the resulting risk of HIV infection visually as a nomogram. Receiver operating characteristic (ROC) curves, the Kolmogorov-Smirnov test, calibration plots, the Hosmer-Lemeshow test, and the population stability index (PSI) were used to assess the validity and stability of the model. Four of the 15 candidate variables (unprotected anal intercourse, multiple sexual partners, involuntary subordination, and drug use before sex) were included in the prediction model. The results indicated that the comprehensive prediction model had relatively good predictive performance and stability in identifying MSM at high risk of HIV infection, and could therefore help target interventions to high-risk MSM.
Keywords: HIV infection; involuntary subordination; machine learning; men who have sex with men; model validation; nomogram; psychosocial factors
Year: 2022 PMID: 35055826 PMCID: PMC8776241 DOI: 10.3390/ijerph19021010
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Flow diagram of study design.
Demographic, psychosocial and behavioral characteristics of the 547 MSM enrolled in the study according to randomization to the training and testing sets.
| Characteristic | Total Population (n = 547) | Training Set (n = 410) | Testing Set (n = 137) | p-Value |
|---|---|---|---|---|
| Age (years) | 28.0 (25.0, 33.0) | 28.0 (25.0, 33.0) | 27.0 (24.5, 34.0) | 0.933 |
| Employment status | | | | 0.991 |
| Employed | 447 (81.7) | 335 (81.7) | 112 (81.8) | |
| Unemployed | 100 (18.3) | 75 (18.3) | 25 (18.2) | |
| Highest education level | | | | 0.858 |
| Senior high school or less | 157 (28.7) | 119 (29.0) | 38 (27.7) | |
| College degree or above | 390 (71.3) | 291 (71.0) | 99 (72.3) | |
| Current marital status | | | | 0.195 |
| Single | 434 (79.3) | 329 (80.2) | 105 (76.6) | |
| Married ¹ | 82 (15.0) | 62 (15.1) | 20 (14.6) | |
| Divorced or widowed | 31 (5.7) | 19 (4.7) | 12 (8.8) | |
| Income (CNY) | | | | 0.332 |
| <3000 ² | 133 (24.3) | 99 (24.1) | 34 (24.8) | |
| 3000–6000 | 211 (38.6) | 152 (37.1) | 59 (43.1) | |
| >6000 | 203 (37.1) | 159 (38.8) | 44 (32.1) | |
| Residence status | | | | 0.079 |
| Local | 389 (71.1) | 127 (31.0) | 31 (22.6) | |
| Non-local | 158 (28.9) | 283 (69.0) | 106 (77.4) | |
| Sexual orientation | | | | 0.714 |
| Non-homosexual | 157 (28.7) | 116 (28.3) | 41 (29.9) | |
| Gay/homosexual | 390 (71.3) | 294 (71.7) | 96 (70.1) | |
| Have had a VCT | | | | 0.822 |
| No | 251 (45.9) | 187 (45.6) | 64 (46.7) | |
| Yes | 296 (54.1) | 223 (54.4) | 73 (53.3) | |
| Alcohol use before having sex | | | | 0.415 |
| No | 278 (50.8) | 213 (52.0) | 65 (47.4) | |
| Yes | 269 (49.2) | 197 (48.0) | 72 (52.6) | |
| Drug use before having sex | | | | 1.000 |
| No | 530 (96.9) | 397 (96.8) | 133 (97.1) | |
| Yes | 17 (3.1) | 13 (3.2) | 4 (2.9) | |
| MSP | | | | 0.502 |
| No | 250 (45.7) | 184 (44.9) | 66 (48.2) | |
| Yes | 297 (54.3) | 226 (55.1) | 71 (51.8) | |
| UAI | | | | 0.188 |
| No | 249 (45.5) | 180 (43.9) | 69 (50.4) | |
| Yes | 298 (54.5) | 230 (56.1) | 68 (49.6) | |
| Involuntary subordination | 80.52 ± 18.06 | 80.31 ± 18.23 | 81.12 ± 17.57 | 0.649 |
| Social support | 61.0 (51.0, 70.0) | 62.0 (51.0, 70.0) | 59.0 (50.5, 69.0) | 0.448 |
| Sexual compulsivity | 23.0 (19.0, 26.0) | 23.0 (19.0, 26.0) | 23.0 (20.0, 26.0) | 0.906 |
¹ Marital status refers only to heterosexual marriage; homosexual marriage is not yet legal in Mainland China. ² CNY 3000 is equivalent to USD 450; CNY 6000 is equivalent to USD 900. Note: VCT, voluntary HIV counseling and testing; MSP, multiple sexual partners; UAI, unprotected anal intercourse. Data are mean ± SD, median (interquartile range), or n (%).
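The group sizes in the table (410 training, 137 testing) are consistent with a random 3:1 split of the 547 participants. The sketch below only illustrates how such a split could be produced; the function name, the seed, and the use of integer participant IDs are assumptions, not details from the study.

```python
import random


def split_train_test(ids, train_fraction=0.75, seed=0):
    """Shuffle participant IDs and split them into training and testing lists.

    The seed and the truncation rule for the training size are illustrative
    assumptions; the study does not report how the split was implemented.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # 547 * 0.75 = 410.25 -> 410
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = split_train_test(range(547))
print(len(train_ids), len(test_ids))  # 410 137
```

Because 547 is not divisible by 4, the realized ratio is close to, but not exactly, 3:1.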
Figure 2. Predictor selection using the least absolute shrinkage and selection operator (LASSO) binary logistic regression model: (a) Coefficient profiles plotted against log(lambda). Four variables with nonzero coefficients were selected at the optimal lambda; (b) The optimal lambda in the LASSO model was selected from the 10-fold cross-validation error curve using the 1-standard-error criterion (1-SE criterion). The partial likelihood deviance (binomial deviance) curve was plotted against log(lambda). The dotted lines mark the optimal values under the minimum criterion and the 1-SE criterion.
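The 1-SE criterion in panel (b) can be sketched as follows: find the lambda with the lowest mean cross-validated deviance, then choose the largest lambda whose mean deviance is still within one standard error of that minimum. The function name and the numbers in the usage example are hypothetical; they are not the study's cross-validation results.

```python
def select_lambda_1se(lambdas, mean_errors, std_errors):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambdas      -- candidate penalty values
    mean_errors  -- mean CV deviance for each lambda
    std_errors   -- standard error of the CV deviance for each lambda
    """
    best = min(range(len(lambdas)), key=lambda i: mean_errors[i])
    threshold = mean_errors[best] + std_errors[best]
    # Larger lambda means a stronger penalty, hence a sparser model.
    within_one_se = [lam for lam, err in zip(lambdas, mean_errors) if err <= threshold]
    return lambdas[best], max(within_one_se)


# Hypothetical CV results, for illustration only.
lam_min, lam_1se = select_lambda_1se(
    lambdas=[0.001, 0.01, 0.03, 0.1],
    mean_errors=[0.52, 0.50, 0.51, 0.60],
    std_errors=[0.02, 0.02, 0.02, 0.03],
)
print(lam_min, lam_1se)  # 0.01 0.03
```

Choosing the 1-SE lambda rather than the minimum trades a small amount of cross-validated fit for fewer retained predictors.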
Logistic regression analysis of the predictors in the prediction model for the risk of HIV infection among MSM.
| Intercept and Variables | Estimate | Wald Value | Odds Ratio (95% CI) | p-Value |
|---|---|---|---|---|
| Intercept | −6.862 | 51.835 | 0.000 (0.000, 0.006) | <0.001 |
| UAI | 0.931 | 5.698 | 2.536 (1.219, 5.703) | 0.017 |
| MSP | 1.043 | 6.720 | 2.837 (1.338, 6.591) | 0.010 |
| Drug use before sex | 1.522 | 5.323 | 4.579 (1.184, 16.437) | 0.021 |
| Involuntary subordination | 0.040 | 17.161 | 1.041 (1.022, 1.061) | <0.001 |
Note: CI, confidence interval.
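To make the table concrete, the sketch below plugs the reported estimates into the standard logistic formula, risk = 1 / (1 + exp(-(intercept + sum of coefficient × value))). It assumes the three behavioral variables are coded 0/1 and that involuntary subordination enters as the raw scale score; the function name and the example participant are hypothetical.

```python
import math

# Estimates copied from the logistic regression table.
INTERCEPT = -6.862
COEFFICIENTS = {
    "uai": 0.931,                        # unprotected anal intercourse (0/1, assumed coding)
    "msp": 1.043,                        # multiple sexual partners (0/1, assumed coding)
    "drug_use": 1.522,                   # drug use before sex (0/1, assumed coding)
    "involuntary_subordination": 0.040,  # raw scale score (assumed)
}


def predicted_risk(uai, msp, drug_use, involuntary_subordination):
    """Predicted probability of HIV infection under the fitted logistic model."""
    logit = (
        INTERCEPT
        + COEFFICIENTS["uai"] * uai
        + COEFFICIENTS["msp"] * msp
        + COEFFICIENTS["drug_use"] * drug_use
        + COEFFICIENTS["involuntary_subordination"] * involuntary_subordination
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical participant: UAI and MSP reported, no drug use, score of 80.
risk = predicted_risk(uai=1, msp=1, drug_use=0, involuntary_subordination=80)
print(round(risk, 3))  # 0.156

# Each odds ratio in the table is exp(estimate), up to rounding of the estimate.
print(round(math.exp(COEFFICIENTS["msp"]), 2))  # 2.84
```

The nomogram in Figure 3 is a graphical form of the same calculation.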
Figure 3. Nomogram developed to assess the risk of HIV infection among MSM.
Figure 4. Receiver operating characteristic (ROC) validation of the HIV infection risk nomogram in the training set and testing set: (a) The area under the ROC curve (AUC) represents the discrimination performance of the model in the training set; (b) The AUC represents the discrimination performance of the model in the testing set.
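The AUC in both panels can be read as the probability that a randomly chosen HIV-positive participant receives a higher predicted risk than a randomly chosen HIV-negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name and toy data are illustrative, and labels are assumed to be coded 1 (positive) and 0 (negative).

```python
def auc_pairwise(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy data: three of the four positive/negative pairs are ranked correctly.
print(auc_pairwise([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in sample size, which is acceptable for a few hundred participants; rank-based formulas give the same value faster.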
Figure 5. Kolmogorov–Smirnov (KS) test curve of the HIV risk prediction model in the training set and in the testing set. The horizontal axis is the "threshold" (the overall sample is divided into 10 equal parts in order of predicted probability), and the vertical axis is the TPR (true positive rate) or FPR (false positive rate), ranging from 0 to 1. The red and blue solid lines indicate the HIV-positive and HIV-negative cases in the training set, respectively. The maximum vertical distance between these two curves (gray dashed line) is the KS test value, and the corresponding horizontal coordinate is the threshold at which the model classifies best. The orange and green dashed lines indicate the HIV-positive and HIV-negative cases in the testing set, respectively.
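The KS value described in the caption is the largest gap between TPR and FPR over all thresholds. The sketch below evaluates every distinct predicted score as a threshold, whereas the figure uses 10 equal-sized bins, so it is an approximation of the plotted procedure rather than a reproduction of it. The function name and toy data are illustrative; labels are assumed to be 1 (positive) and 0 (negative).

```python
def ks_statistic(y_true, scores):
    """Return (max TPR - FPR gap, threshold at which it occurs)."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("KS needs at least one positive and one negative case")
    best_gap, best_threshold = 0.0, None
    # Classify as positive every case whose score is >= the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        gap = tp / n_pos - fp / n_neg
        if gap > best_gap:
            best_gap, best_threshold = gap, threshold
    return best_gap, best_threshold


print(ks_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # (0.5, 0.8)
```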
Figure 6. Calibration curves for the HIV infection risk nomogram. The y-axis represents the actual risk of HIV infection, and the x-axis represents the predicted risk of HIV infection. The diagonal dotted line represents perfect prediction by an ideal model, and the solid line represents the performance of the model in the training set (a) and the testing set (b); the closer the solid line lies to the diagonal dotted line, the better the prediction.
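The Hosmer-Lemeshow test named in the abstract summarizes the same comparison of observed and predicted risk numerically. A minimal sketch of the statistic follows, assuming participants are sorted by predicted risk and cut into equal-sized groups; the grouping rule, function name, and toy data are assumptions, and the p-value (a chi-square tail probability) is omitted.

```python
def hosmer_lemeshow_statistic(y_true, probs, n_groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * p_bar * (1 - p_bar))."""
    pairs = sorted(zip(probs, y_true))  # order cases by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue
        size = len(group)
        observed = sum(y for _, y in group)  # observed number of events
        expected = sum(p for p, _ in group)  # expected number of events
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


# Toy data: observed event counts match the predicted risks in both groups.
probs = [0.25] * 4 + [0.75] * 4
print(hosmer_lemeshow_statistic([0, 0, 0, 1, 1, 1, 1, 0], probs, n_groups=2))  # 0.0
```

A statistic near zero indicates close agreement between observed and expected events; larger values indicate poorer calibration.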
Figure 7. Population distribution plot of the HIV infection risk prediction model in the training and testing sets. The horizontal axis indicates the 10 binning intervals of the overall distribution, and the vertical axis indicates the percentage of the population. The red and dark green squares indicate the actual population distribution in each binning interval for the training and testing sets, respectively. The green solid line and the orange dashed line indicate the expected population distribution in each binning interval for the training and testing sets, respectively.
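The population stability index (PSI) condenses the comparison in this plot into one number: for each bin, the difference between the actual and expected population share is multiplied by the log of their ratio, and the products are summed. The sketch below assumes 10 equal-width bins over predicted probabilities in [0, 1] and a small floor to avoid empty bins; the study's exact binning is not stated here, so the function name, bin edges, and floor value are all assumptions.

```python
import math


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI = sum over bins of (actual_share - expected_share) * ln(actual_share / expected_share)."""

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            index = min(int(v * n_bins), n_bins - 1)  # v == 1.0 goes in the last bin
            counts[index] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    e_shares = bin_shares(expected)
    a_shares = bin_shares(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_shares, e_shares))


# Hypothetical predicted probabilities, for illustration only.
baseline = [i / 100 for i in range(100)]
shifted = [min(v + 0.3, 0.99) for v in baseline]
print(population_stability_index(baseline, baseline))  # 0.0
print(population_stability_index(baseline, shifted) > 0.25)  # True
```

A commonly cited rule of thumb from credit scoring treats a PSI below 0.1 as a stable population and a PSI above 0.25 as a substantial shift; whether the study applied these cut-offs is not stated in this record.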