Mhairi Maskew1, Kieran Sharpey-Schafer2, Lucien De Voux2, Thomas Crompton3, Jacob Bor4,5,6, Marcus Rennick3, Admire Chirowodza3, Jacqui Miot4, Seithati Molefi3, Chuka Onaga3, Pappie Majuba3, Ian Sanne4,3, Pedro Pisa3,7.
Abstract
HIV treatment programs face challenges in identifying patients at risk for loss-to-follow-up and uncontrolled viremia. We applied predictive machine learning algorithms to anonymised, patient-level HIV programmatic data from two districts in South Africa, 2016-2018. We developed patient risk scores for two outcomes: (1) visit attendance within 28 days of the next scheduled clinic visit and (2) suppression of the next HIV viral load (VL). Demographic, clinical, behavioral and laboratory data were investigated in multiple models as predictor variables of attending the next scheduled visit and VL results at the next test. Three classification algorithms (logistic regression, random forest and AdaBoost) were evaluated for building predictive models. Data were randomly sampled on a 70/30 split into a training and test set. The training set included a balanced set of positive and negative examples from which the classification algorithm could learn. The predictor variable data from the unseen test set were given to the model, and each predicted outcome was scored against known outcomes. Finally, we estimated performance metrics for each model in terms of sensitivity, specificity, positive and negative predictive value and area under the curve (AUC). In total, 445,636 patients were included in the retention model and 363,977 in the VL model. The predictive metric (AUC) ranged from 0.69 for attendance at the next scheduled visit to 0.76 for VL suppression, suggesting that the model correctly classified whether a scheduled visit would be attended in 2 of 3 patients and whether the VL result at the next test would be suppressed in approximately 3 of 4 patients. Variables that were important predictors of both outcomes included prior late visits, number of prior VL tests, time since the last visit, number of visits on the current regimen, age, and treatment duration.
For retention, the number of visits at the current facility and the details of the next appointment date were also predictors, while for VL suppression, other predictors included the range of the previous VL value. Machine learning can identify HIV patients at risk for disengagement and unsuppressed VL. Predictive modeling can improve the targeting of interventions through differentiated models of care before patients disengage from treatment programmes, increasing cost-effectiveness and improving patient outcomes.
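The split-and-balance step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say how the balanced training set was obtained, so random undersampling of the majority class is an assumption, as are the function name `split_and_balance`, the `missed` label key, and the synthetic records.

```python
import random


def split_and_balance(records, label_key="missed", test_frac=0.30, seed=0):
    """Random 70/30 split, then balance the training part 50:50.

    Balancing by undersampling the majority class is an assumption;
    the source only states that the training set was balanced.
    """
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)

    # Hold out the test set first so it keeps the natural class mix.
    n_test = round(len(shuffled) * test_frac)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Undersample whichever class is larger in the training part.
    pos = [r for r in train if r[label_key] == 1]
    neg = [r for r in train if r[label_key] == 0]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced, test


# Synthetic records (hypothetical): roughly 10% positives.
records = [{"id": i, "missed": 1 if i % 10 == 0 else 0} for i in range(1000)]
train, test = split_and_balance(records)
print(len(train), len(test))
```

Only the training set is rebalanced here; the test set keeps its original class proportions, which is consistent with the low positive predictive values reported in the tables.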
Year: 2022 PMID: 35882962 PMCID: PMC9325703 DOI: 10.1038/s41598-022-16062-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Flow chart of data source inclusion. Notes: All VL tests immediately following a high test (≥ 1000 copies/mL) and subsequent tests done < 6 months after the first test were removed. All visits with a ‘next appointment date’ that was either missing or more than 99 days away were removed.
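The visit exclusion rule in the Figure 1 note can be sketched as a simple filter. This is a hedged illustration only: the record layout, the `keep_visit` helper, and the choice to measure the 99 days from the visit date to the next appointment date are assumptions not stated in the source.

```python
from datetime import date

MAX_GAP_DAYS = 99  # threshold taken from the Figure 1 note


def keep_visit(visit):
    """Return True if the visit has a usable next appointment date.

    Assumption: the 99-day limit is the gap between the visit date
    and the scheduled next appointment.
    """
    nxt = visit.get("next_appointment")
    if nxt is None:
        return False
    return (nxt - visit["visit_date"]).days <= MAX_GAP_DAYS


# Hypothetical visit records.
visits = [
    {"id": 1, "visit_date": date(2017, 1, 10), "next_appointment": date(2017, 2, 7)},
    {"id": 2, "visit_date": date(2017, 1, 10), "next_appointment": None},
    {"id": 3, "visit_date": date(2017, 1, 10), "next_appointment": date(2017, 6, 1)},
]
kept_ids = [v["id"] for v in visits if keep_visit(v)]
print(kept_ids)
```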
Input features for the (A) retention model and (B) unsuppressed VL model.
| Input feature | Description |
|---|---|
| 3 days late ratio | Proportion of the patient’s historical visits at which they were 3+ days late |
| 28 days late count | Number of previous visits at which the patient was 28+ days late |
| Visits at this facility | Total number of visits the patient has had at this facility |
| Number of VL tests | Total number of VL tests the patient has had |
| Months since first visit | Months since the patient’s earliest recorded visit in the patient record |
| Months since last visit | Months since the patient’s most recent recorded visit in the patient record |
| Current age | Patient’s age at the current visit |
| Day of month next appointment | Day of the month (1–31) that the next appointment is scheduled for |
| Last VL value | Patient’s last VL test value in copies/mL |
| Day of week next appointment | Day of the week (1–7) that the next appointment is scheduled for |
| # Visits on current regimen | The number of recorded sequential visits the patient has had on the current ART treatment regimen |
| Is male | Gender of the patient encoded as 1 for Male, and 0 for Female |
| #Missed Months | The number of times the patient has had a whole month with no recorded clinical visit |
| Age started ART | Patient’s age on starting ART |
| Last VL value | Patient’s last VL test value in copies/mL |
| Duration on ART | Total number of months on ART treatment ever |
| Month of test | Calendar Month of Test |
| #Visits on current regimen | The number of recorded sequential visits the patient has had on the current ART treatment regimen |
| # of visits ever | The number of recorded visits the patient has had ever |
| Visits miss ratio | The proportion of patient’s historical visits which they have missed completely |
| Months since last VL test | Months since the last VL test was taken |
| Months since last visit | Months since the patient was last recorded to have visited a facility |
| Year of test | The year of the test |
| # Missed visits ever | The number of appointments the patient has missed ever |
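To make the lateness features concrete, the sketch below derives "3 days late ratio" and "28 days late count" from a patient's visit history. It is an illustration under assumptions: the `(scheduled, attended)` tuple layout, the `lateness_features` helper, and the example dates are hypothetical, and the source does not describe how the authors computed these features.

```python
from datetime import date


def lateness_features(visits):
    """Compute two lateness features from past visits.

    visits: list of (scheduled_date, attended_date) tuples (assumed layout).
    """
    days_late = [(attended - scheduled).days for scheduled, attended in visits]
    n = len(days_late)
    return {
        # Share of historical visits that were 3 or more days late.
        "late_3_days_ratio": sum(d >= 3 for d in days_late) / n if n else 0.0,
        # Number of historical visits that were 28 or more days late.
        "late_28_days_count": sum(d >= 28 for d in days_late),
    }


# Hypothetical history for one patient.
history = [
    (date(2017, 1, 10), date(2017, 1, 10)),  # on time
    (date(2017, 3, 7), date(2017, 3, 12)),   # 5 days late
    (date(2017, 5, 2), date(2017, 6, 5)),    # 34 days late
    (date(2017, 7, 31), date(2017, 8, 1)),   # 1 day late
]
features = lateness_features(history)
print(features)
```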
Late visit model metrics based on (A) balanced and (B) unbalanced training sets.
| | Missed visit observed | Missed visit not observed | Total observations | Se/Sp metrics | F1-score |
|---|---|---|---|---|---|
| (A) Balanced training set | | | | | |
| Missed visit predicted | 89,140 | 414,590 | 503,730 | Se = 61% | 0.27 |
| Missed visit not predicted | 57,741 | 837,674 | 895,415 | Sp = 67% | 0.78 |
| Total | 146,881 | 1,252,264 | 1,399,145 | | |
| PPV | 18% | | | | |
| NPV | 94% | | | | |
| AUC | 0.688 | | | | |
| Accuracy | 66.2% | | | | |
| (B) Unbalanced training set | | | | | |
| Missed visit predicted | 59,739 | 211,764 | 271,503 | Se = 41% | 0.29 |
| Missed visit not predicted | 87,040 | 1,040,602 | 1,127,642 | Sp = 83% | 0.87 |
| Total | 146,779 | 1,252,366 | 1,399,145 | | |
| PPV | 22% | | | | |
| NPV | 92% | | | | |
| AUC | 0.688 | | | | |
| Accuracy | 78.6% | | | | |
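The F1 score is the harmonic mean of precision and recall, reported per class: PPV and sensitivity for the predicted-positive row, NPV and specificity for the predicted-negative row. The best possible score is 1.0.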
Accuracy is the number of correct predictions out of the total number of predictions over the test set observations.
Se = sensitivity; Sp = specificity; PPV = positive predictive value; NPV = negative predictive value.
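The reported metrics follow directly from the confusion-matrix counts. The sketch below recomputes them for the first set of counts in the late visit table (89,140 / 414,590 / 57,741 / 837,674); the helper name `confusion_metrics` is illustrative. The tabulated F1 values are reproduced when F1 is taken as the harmonic mean of PPV and sensitivity for the positive class, and of NPV and specificity for the negative class.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    se = tp / (tp + fn)    # sensitivity (recall of the positive class)
    sp = tn / (tn + fp)    # specificity (recall of the negative class)
    ppv = tp / (tp + fp)   # positive predictive value (precision)
    npv = tn / (tn + fn)   # negative predictive value
    return {
        "sensitivity": se,
        "specificity": sp,
        "ppv": ppv,
        "npv": npv,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1_positive": 2 * ppv * se / (ppv + se),
        "f1_negative": 2 * npv * sp / (npv + sp),
    }


# Counts from the first block of the late visit table.
late_visit = confusion_metrics(tp=89_140, fp=414_590, fn=57_741, tn=837_674)
for name, value in late_visit.items():
    print(f"{name}: {value:.3f}")
```

With roughly 10% of test observations being missed visits, a sensitivity of 61% and specificity of 67% still yield a PPV of only about 18%, because false positives from the much larger negative class outnumber true positives.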
Figure 2. ROC curve of (A) 50:50 balanced late visit classifier, (B) 60:40 unbalanced late visit classifier and (C) 50:50 balanced unsuppressed VL classifier.
Unsuppressed VL model metrics based on balanced (50:50) training sets.
| | Unsuppressed VL observed | Unsuppressed VL not observed | Total observations | Se/Sp metrics | F1-score |
|---|---|---|---|---|---|
| Unsuppressed VL predicted | 14,225 | 50,678 | 64,903 | Se = 66% | 0.33 |
| Unsuppressed VL not predicted | 7,454 | 138,958 | 146,412 | Sp = 73% | 0.83 |
| Total | 21,679 | 189,636 | 211,315 | | |
| PPV | 22% | | | | |
| NPV | 95% | | | | |
| AUC | 0.758 | | | | |
| Accuracy | 72.5% | | | | |
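The F1 score is the harmonic mean of precision and recall, reported per class: PPV and sensitivity for the predicted-positive row, NPV and specificity for the predicted-negative row. The best possible score is 1.0.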
Accuracy is the number of correct predictions out of the total number of predictions over the test set observations.
Se = sensitivity; Sp = specificity; PPV = positive predictive value; NPV = negative predictive value.
Figure 3. (A) Final input features included in late visit model ranked by importance. (B) Final input features included in unsuppressed VL model ranked by importance.