| Literature DB >> 32286333 |
Arthi Ramachandran1,2, Avishek Kumar1, Hannes Koenig1, Adolfo De Unanue1, Christina Sung1, Joe Walsh1, John Schneider2, Rayid Ghani1, Jessica P Ridgway3.
Abstract
Consistent medical care among people living with HIV is essential for both individual and public health. HIV-positive individuals who are 'retained in care' are more likely to be prescribed antiretroviral medication and achieve HIV viral suppression, effectively eliminating the risk of transmitting HIV to others. However, in the United States, less than half of HIV-positive individuals are retained in care. Interventions to improve retention in care are resource intensive, and there is currently no systematic way to identify patients at risk for falling out of care who would benefit from these interventions. We developed a machine learning model to identify patients at risk for dropping out of care in an urban HIV care clinic using electronic medical records and geospatial data. The machine learning model has a mean positive predictive value of 34.6% [SD: 0.15] for flagging the top 10% highest risk patients as needing interventions, performing better than the previous state-of-the-art logistic regression model (PPV of 17% [SD: 0.06]) and the baseline rate of 11.1% [SD: 0.02]. Machine learning methods can improve the prediction ability in HIV care clinics to proactively identify patients at risk for not returning to medical care.
Year: 2020 PMID: 32286333 PMCID: PMC7156693 DOI: 10.1038/s41598-020-62729-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Characteristics of Study Population of 713 University of Chicago HIV Clinic Patients from January 1, 2008 through May 31, 2015.
| Characteristics | Value |
|---|---|
| Age at first visit in study period, Mean (SD) | 47.3 (13.6) |
| Female, No. (%) | 314 (44%) |
| African American, No. (%) | 585 (82%) |
| White, No. (%) | 93 (13%) |
| Other, No. (%) | 35 (5%) |
| Insurance: Private, No. (%) | 312 (44%) |
| Insurance: Medicaid, No. (%) | 309 (43%) |
| Insurance: Medicare, No. (%) | 85 (12%) |
| Number of attended appointments, Mean (SD) | 19.5 (17) |
| Fraction in zipcode that are African American, Mean (SD) | 0.81 (0.29) |
| Fraction in zipcode with less than $10 k income, Mean (SD) | 0.07 (0.04) |
| Fraction in zipcode with income between $10 k and $15 k, Mean (SD) | 0.03 (0.02) |
| Fraction in zipcode on SNAP, Mean (SD) | 0.14 (0.08) |
| Fraction in zipcode with high school education, Mean (SD) | 0.38 (0.14) |
| Fraction in zipcode with some college, Mean (SD) | 0.19 (0.06) |
| Fraction in zipcode with bachelors, Mean (SD) | 0.13 (0.09) |
| Distance from residence to clinic (miles), Mean (SD) | 4.88 (3.65) |
| Travel time in minutes (public transit) from residence to clinic, Mean (SD) | 43.5 (19.4) |
| Travel time in minutes (car) from residence to clinic, Mean (SD) | 18.8 (10.7) |
| Average crime rate on route from residence to clinic, Mean (SD) | 0.11 (0.03) |
Figure 1. Positive Predictive Value of highest 10% risk scores for Retention in Care (top) and Access to Care (bottom) across model space: Positive Predictive Value (PPV) measures how many appointments were correctly predicted to have no follow-up (as defined by the HRSA HAB definition of retention) among the top 10% of appointments. The 10% threshold was chosen to match the resources the clinic has for launching an intensive intervention. The machine learning models shown are the best performing model (blue) and the best performing model of an alternate model type (for retention in care, a decision tree; for access to care in 6 months, a logistic regression).
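The metric in this figure, PPV among the top 10% highest risk scores, is also known as precision@k. A minimal sketch of how it can be computed (illustrative only; the function name, toy scores, and labels are hypothetical, not the authors' code or data):

```python
def ppv_at_top_k(risk_scores, labels, k_frac=0.10):
    """Positive predictive value (precision) among the k_frac highest-risk
    examples: the fraction of flagged patients who truly had the outcome."""
    k = max(1, int(len(risk_scores) * k_frac))
    # rank examples by risk score, highest first, and take the top k
    ranked = sorted(zip(risk_scores, labels), key=lambda t: t[0], reverse=True)
    flagged = ranked[:k]
    return sum(y for _, y in flagged) / k

# toy example: 10 patients, flag the top 10% (1 patient)
scores = [0.9, 0.2, 0.4, 0.1, 0.8, 0.3, 0.7, 0.05, 0.6, 0.5]
labels = [1,   0,   0,   0,   1,   0,   0,   0,    1,   0]
print(ppv_at_top_k(scores, labels))  # highest-risk patient (0.9) had the outcome -> 1.0
```

The 10% cutoff mirrors the clinic's intervention capacity: only as many patients are flagged as the intensive intervention can actually reach.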
Figure 2. Features Learned by Machine Learning Models for (top) Retention in Care and (bottom) Access to Care: The feature importance of the random forest is the mean of the gain in purity of each of the underlying decision trees and is similar in interpretation to logistic regression coefficients. The maximum importance within each class of predictor variables shows that the most important predictors for the model are based on the history of retention and the previous infectious disease clinic visits.
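The "gain in purity" that random forest feature importances average over trees and splits is the weighted decrease in Gini impurity at each split. A minimal, self-contained sketch of that quantity (function names and toy labels are illustrative, not from the study):

```python
def gini(labels):
    """Gini impurity of a set of binary labels: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def purity_gain(labels, left_mask):
    """Decrease in Gini impurity from splitting `labels` into a left group
    (mask True) and a right group (mask False), weighted by group size."""
    left = [y for y, m in zip(labels, left_mask) if m]
    right = [y for y, m in zip(labels, left_mask) if not m]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

# toy example: a split that separates the classes perfectly
# recovers the parent's full impurity as gain
labels = [1, 1, 0, 0]
mask = [True, True, False, False]
print(purity_gain(labels, mask))  # 0.5 - 0.0 = 0.5
```

A feature's importance in a random forest is, roughly, this gain summed over every split that uses the feature and averaged across all trees, which is why features that repeatedly produce clean splits (here, retention history) dominate the ranking.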
Figure 3. Trade-off of performance vs. fairness in models for retention in care: (top) There is a trade-off between choosing models with high performance (x-axis) and minimal bias (y-axis). The circles show the average PPV and FOR. The lines show the distribution of both the PPV and the FOR ratio over the different time periods. The thick lines show the first and third quartiles; the thin lines show the 5th and 95th percentiles. The purple band marks minimal disparity in FOR, i.e., the ratio of the FOR for Black vs. White patients is within [0.9, 1.1]. (bottom) Over time, the disparity in FOR for both of our best-performing machine learning models decreases. The machine learning model selected for best stable performance (blue) outperforms the previous state-of-the-art model (red). The best decision tree model (orange) has slightly lower performance and similar FOR ratios. The remaining models (black) were chosen for minimal disparity.
Figure 4. Trade-off of performance vs. fairness in models for accessing care: (top) There is a trade-off between choosing models with high performance (x-axis) and minimal bias (y-axis). The circles show the average PPV and FOR. The lines show the distribution of both the PPV and the FOR ratio over the different time periods. The thick lines show the first and third quartiles; the thin lines show the 5th and 95th percentiles. The purple band marks minimal disparity in FOR, i.e., the ratio of the FOR for Black vs. White patients is within [0.9, 1.1]. Note that the x-axis runs from 0 to 0.4 to highlight the performance of the models. (bottom) Over time, the disparity in FOR for both of our best-performing machine learning models decreases. The machine learning model selected for best stable performance (blue) outperforms the previous state-of-the-art model (red). The best logistic regression model (green) has slightly lower performance and similar FOR ratios. The remaining models (black) were chosen for minimal disparity.
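The fairness metric in Figures 3 and 4 is the false omission rate (FOR): among patients the model does not flag, the fraction who actually drop out of care, compared across groups as a ratio. A minimal sketch of that computation (function names, group labels, and toy data are illustrative, not the study's):

```python
def false_omission_rate(y_true, y_pred):
    """Among examples predicted negative (not flagged), the fraction
    that were actually positive: FN / (FN + TN)."""
    neg_pred = [t for t, p in zip(y_true, y_pred) if p == 0]
    if not neg_pred:
        return 0.0
    return sum(neg_pred) / len(neg_pred)

def for_ratio(y_true, y_pred, group, a="black", b="white"):
    """Ratio of FOR between two groups; values within [0.9, 1.1]
    would fall inside the figures' band of minimal disparity."""
    def subset(g):
        idx = [i for i, x in enumerate(group) if x == g]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]
    return false_omission_rate(*subset(a)) / false_omission_rate(*subset(b))

# toy data: identical outcomes and predictions in both groups -> ratio 1.0
y_true = [1, 0, 0, 1, 1, 0, 0, 1]
y_pred = [0, 0, 1, 1, 0, 0, 1, 1]
group  = ["black"] * 4 + ["white"] * 4
print(for_ratio(y_true, y_pred, group))  # 0.5 / 0.5 = 1.0
```

FOR is the natural equity metric here: a group with a higher FOR is being missed by the model more often, i.e., its members drop out of care without ever being offered the intervention.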