Lei Zhang, Jason J Ong, Xianglong Xu, Christopher K Fairley, Eric P F Chow, David Lee, Ei T Aung.
Abstract
Timely and regular testing for HIV and sexually transmitted infections (STIs) is important for controlling HIV and STIs (HIV/STI) among men who have sex with men (MSM). We established multiple machine learning models (e.g., logistic regression, lasso regression, ridge regression, elastic net regression, support vector machine, k-nearest neighbour, naïve Bayes, random forest, gradient boosting machine, XGBoost, and multi-layer perceptron) to predict timely (i.e., within 30 days) clinic attendance and HIV/STI testing uptake after receiving a reminder message via short message service (SMS) or email. Our study used 3044 clinic consultations among MSM within 12 months after receiving an email or SMS reminder at the Melbourne Sexual Health Centre between April 11, 2019, and April 30, 2020. About 29.5% [899/3044] of consultations were timely clinic attendances after a reminder message, and 84.6% [761/899] of these included HIV/STI testing. The XGBoost model performed best in predicting timely clinic attendance [mean (SD) AUC 62.8% (3.2%); F1 score 70.8% (1.2%)]. The elastic net regression model performed best in predicting HIV/STI testing within 30 days [AUC 82.7% (6.3%); F1 score 85.3% (1.8%)]. The machine learning approach is helpful in predicting timely clinic attendance and HIV/STI re-testing. Our predictive models could be incorporated into clinic websites to inform sexual health care or follow-up services.
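The proportions quoted in the abstract follow directly from the reported counts, and the F1 score is the harmonic mean of precision and recall. The sketch below is illustrative only: the helper names `proportion_pct` and `f1_score` are hypothetical, and the confusion-matrix counts passed to `f1_score` are made up for demonstration, not taken from the study.

```python
def proportion_pct(numerator, denominator):
    """Return a proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Counts reported in the abstract.
timely_attendance = proportion_pct(899, 3044)  # consultations within 30 days
testing_uptake = proportion_pct(761, 899)      # tested among timely attendances

# Hypothetical confusion-matrix counts, for illustration only.
example_f1 = f1_score(tp=80, fp=20, fn=20)

print(timely_attendance, testing_uptake, round(example_f1, 3))
```

With the abstract's counts, the two proportions come out at 29.5% and 84.6%, matching the reported figures.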
Year: 2022 PMID: 35610227 PMCID: PMC9128330 DOI: 10.1038/s41598-022-12033-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Characteristics of MSM stratified by their timing of clinic attendance after receiving a reminder message and HIV/STI testing post clinic reminder message.
| Variables | Clinic attendance: > 30 days (n = 2145) | Clinic attendance: ≤ 30 days (n = 899) | Uptake of HIV/STI testing: No (n = 138) | Uptake of HIV/STI testing: Yes (n = 761) | Uptake of HIV/STI testing: No (n = 347) | Uptake of HIV/STI testing: Yes (n = 2697) |
|---|---|---|---|---|---|---|
| Age [years, median (IQR)] | 33.0 (27.0–39.5) | 31.0 (26.0–37.0) | 31.0 (27.0–39.8) | 30.0 (26.0–37.0) | 33.0 (27.0–39.5) | 31.0 (26.0–37.0) |
| Email and SMS | 398 (18.6%) | 205 (22.8%) | 30 (21.7%) | 175 (23.0%) | 44 (12.7%) | 559 (20.7%) |
| Email only | 1038 (48.4%) | 471 (52.4%) | 63 (45.7%) | 408 (53.6%) | 105 (30.3%) | 1404 (52.1%) |
| SMS only | 709 (33.1%) | 223 (24.8%) | 45 (32.6%) | 178 (23.4%) | 198 (57.1%) | 734 (27.2%) |
| 12 monthly | 26 (1.2%) | 9 (1.0%) | 116 (84.1%) | 658 (86.5%) | 4 (1.2%) | 31 (1.1%) |
| 3 monthly | 1754 (81.8%) | 774 (86.1%) | 22 (15.9%) | 94 (12.4%) | 271 (78.1%) | 2257 (83.7%) |
| 6 monthly | 365 (17.0%) | 116 (12.9%) | 0 (0%) | 9 (1.2%) | 72 (20.7%) | 409 (15.2%) |
| No | 1654 (77.1%) | 660 (73.4%) | 137 (99.3%) | 523 (68.7%) | 343 (98.8%) | 1971 (73.1%) |
| Yes | 491 (22.9%) | 239 (26.6%) | 1 (0.7%) | 238 (31.3%) | 4 (1.2%) | 726 (26.9%) |
| No | 1712 (79.8%) | 750 (83.4%) | 129 (93.5%) | 621 (81.6%) | 322 (92.8%) | 2140 (79.3%) |
| Yes | 433 (20.2%) | 149 (16.6%) | 9 (6.5%) | 140 (18.4%) | 25 (7.2%) | 557 (20.7%) |
| No | 1987 (92.6%) | 850 (94.5%) | 137 (99.3%) | 713 (93.7%) | 343 (98.8%) | 2494 (92.5%) |
| Yes | 158 (7.4%) | 49 (5.5%) | 1 (0.7%) | 48 (6.3%) | 4 (1.2%) | 203 (7.5%) |
IQR, interquartile range.
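The percentages in the table are column percentages: each count is divided by the column total. As a rough check, the sketch below recomputes them for the three reminder-channel rows (Email and SMS, Email only, SMS only) in the two clinic-attendance columns. The counts are copied from the table; the dictionary layout and the helper name `column_percentages` are assumptions made for illustration.

```python
# Reminder-channel counts for the two clinic-attendance columns,
# copied from the table above.
counts = {
    "> 30 days": {"Email and SMS": 398, "Email only": 1038, "SMS only": 709},
    "<= 30 days": {"Email and SMS": 205, "Email only": 471, "SMS only": 223},
}


def column_percentages(column):
    """Return the column total and each row's share of it, in percent."""
    total = sum(column.values())
    shares = {row: round(100 * n / total, 1) for row, n in column.items()}
    return total, shares


results = {name: column_percentages(column) for name, column in counts.items()}

for name, (total, shares) in results.items():
    print(name, total, shares)
```

The column totals recover the group sizes in the header (2145 and 899), which confirms that these three rows partition each column.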
Figure 1. AUC-ROC curves for different machine learning algorithms. (a) AUC-ROC curves for different machine learning algorithms on the prediction of clinic attendance within 30 days after receiving a reminder message. (b) AUC-ROC curves for different machine learning algorithms on timely HIV/STI testing. Abbreviations: ROC, receiver operating characteristic; AUC, area under the ROC curve; LR, logistic regression; Lasso, LASSO regression; Ridge, ridge regression; Elastic Net, elastic net regression; GBM, gradient boosting machine; RF, random forest; NB, naïve Bayes; MLP with two hidden layers, multi-layer perceptron neural network with two hidden layers; XGBoost, Extreme Gradient Boosting; Bayesian GLM, Bayesian generalized linear model; KNN, k-nearest neighbour; SVM (Linear), linear support vector machine (without kernel extensions); Kernel SVM (Polynomial), SVM using a polynomial basis kernel; Kernel SVM (RBF), SVM using a radial basis function kernel.
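For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way and then summarises several AUC values as mean (SD). The labels, scores, and per-fold AUC values are hypothetical, and reading the reported SD as variation across repeated evaluation folds is an assumption, since the record does not describe the resampling scheme.

```python
import statistics


def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = attended within 30 days) and predicted scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)

# Hypothetical per-fold AUC values, summarised in the "mean (SD)" style.
fold_aucs = [0.60, 0.66, 0.63]
summary = f"{statistics.mean(fold_aucs):.1%} ({statistics.stdev(fold_aucs):.1%})"

print(round(auc, 3), summary)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a few thousand consultations; a rank-based formula would be the usual choice for larger data.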
Figure 2. Variable importance in the prediction of timely clinic attendance after receiving a clinic reminder message by XGBoost.
Figure 3. Variable importance in the prediction of timely HIV/STI testing after receiving a clinic reminder message by elastic net regression.