
Prospective Evaluation of a Machine-Learning Prediction Model for Missed Radiology Appointments.

Steven Rothenberg1,2, Bill Bame3, Ed Herskovitz4.   

Abstract

The term "no-show" refers to scheduled appointments that a patient misses or for which they arrive too late to utilize medical resources. Accurately predicting no-shows creates opportunities to intervene, ensuring that patients receive needed medical resources. A machine-learning (ML) model can accurately identify individuals at high no-show risk, to facilitate strategic and targeted interventions. We used 4,546,104 non-same-day scheduled appointments in our medical system from 1/1/2017 through 1/1/2020 as training data, including 631,386 no-shows. We applied eight ML techniques, which yielded cross-validation AUCs of 0.77-0.93. We then prospectively tested the best performing model, Gradient Boosted Regression Trees, over a 6-week period at a single outpatient location. We observed 123 no-shows. The model accurately identified likely no-show patients retrospectively (AUC 0.93) and prospectively (AUC 0.73, p < 0.0005). Individuals in the highest-risk category were three times more likely to no-show than the average of all other patients. No-show prediction modeling based on machine learning has the potential to identify patients for targeted interventions to improve their access to medical resources, reduce waste in the medical system, and improve overall operational efficiency. Caution is advised, due to the potential for bias to decrease the quality of service for patients based on race, zip code, and gender.
© 2022. The Author(s) under exclusive licence to Society for Imaging Informatics in Medicine.


Year:  2022        PMID: 35768754      PMCID: PMC9243788          DOI: 10.1007/s10278-022-00670-3

Source DB:  PubMed          Journal:  J Digit Imaging        ISSN: 0897-1889            Impact factor:   4.903


Introduction

The term “no-show” refers to scheduled appointments that a patient misses or arrives too late to utilize medical resources. No-shows can waste provider time, underutilize limited medical resources, disrupt scheduler workflows, cause a loss of revenue for the radiology department, and, most important, deprive patients of medical tests their physicians ordered. No-shows do not include cancellations, modifications, or other situations where prior notice was given. Accurately predicting no-shows creates opportunities to intervene, ensuring that patients receive needed medical resources. We hypothesized that a machine-learning prediction model can accurately identify individuals at high no-show risk, facilitating strategic and targeted interventions. The objectives of this study were to train a prediction model using data readily available in our electronic medical record (EMR) and to validate it prospectively at a single outpatient imaging center. If successful, this algorithm could be used to implement interventions to reduce no-shows. A systematic literature review published in 2020 found that 82% of articles on predicting missed appointments had appeared during the preceding 10 years, with logistic regression being the most common algorithm used. Of the 50 studies included in that review, 26% used the same data for training and validation, 62% conducted a single validation with split training and validation data, and only 12% performed repeat or k-fold validation [1]. Given the currently available literature, prospective validation is needed to evaluate model performance.

Methods

This study was determined to be exempt from institutional review board review, as it posed less than minimal risk to human subjects.

Model Creation

Thirty-nine features from our EMR were selected that had potential for no-show prediction (Table 1). 4,546,104 non-same-day scheduled appointments in our medical system from 1/1/2017 through 1/1/2020 were selected as training data, including 631,386 no-shows. No-shows are recorded in the EMR by technologists after a missed appointment as part of routine practice. We applied eight ML techniques retrospectively, which yielded cross-validation AUCs of 0.77–0.93 (Table 2). Two separate Gradient Boosted Regression Trees models were created using xgboost (https://xgboost.ai/) and catboost (https://catboost.ai/); xgboost outperformed catboost, so the catboost results were dropped. All code was written in Python, and no formal statistical package was used.
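A minimal sketch of this model-selection loop is shown below. It uses scikit-learn's GradientBoostingClassifier as a stand-in for the xgboost model, and the feature names and synthetic data are illustrative only, not our EMR schema:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic appointment data; in practice these would be EMR extracts.
rng = np.random.default_rng(0)
n = 4000
visits = pd.DataFrame({
    "lead_days": rng.integers(1, 90, n),       # days between scheduling and appointment
    "prior_no_shows": rng.poisson(0.3, n),     # previous no-show count
    "age": rng.integers(18, 90, n),
})
# Synthetic label loosely tied to prior no-shows, for demonstration only.
logit = -2.0 + 0.9 * visits["prior_no_shows"]
visits["no_show"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X, y = visits.drop(columns="no_show"), visits["no_show"]
model = GradientBoostingClassifier(n_estimators=100, max_depth=3)

# 5-fold cross-validated AUC, mirroring the paper's evaluation metric.
aucs = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(round(aucs.mean(), 2))
```

Each candidate technique would be scored this way on the same folds, and the model with the highest mean cross-validation AUC carried forward to prospective testing.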
Table 1

List of features and relative contributions from Gradient Boosted Regression Trees model. It is important to note that this model incorporates up to 4-way interactions between features, so although the metrics above estimate overall feature-by-feature contributions, those contributions should not be thought of as “weights” or considered independently of other features. The four metrics therefore summarize or approximate the impact a feature had on the model. See comments on SHAP values in discussion

Description | % Incr | % Decr | Min | Max
Body Mass Index (Most Recent) | 0.58 | 0.42 | -5.62 | 1.22
Appointment Day-of-Year | 0.50 | 0.50 | -4.65 | 1.91
Tobacco User Category | 0.65 | 0.35 | -4.22 | 0.36
Joint Appointment Flag | 0.98 | 0.02 | -3.90 | 1.69
Appointment No Show Count (Previous) | 0.29 | 0.71 | -1.21 | 3.71
Advance Directives | 0.43 | 0.57 | -0.49 | 3.67
Appointment Scheduled From (Epic Module) | 0.67 | 0.33 | -2.96 | 3.56
Appointment Day-of-Week | 0.46 | 0.54 | -1.23 | 3.24
Appointment Department | 0.43 | 0.57 | -1.19 | 3.19
Appointment Department Specialty | 0.49 | 0.51 | -2.14 | 3.07
Appointment LWBS Count (Previous) | 0.92 | 0.08 | -2.89 | 1.69
Provider Type Category | 0.40 | 0.60 | -1.97 | 2.62
Appointment Change Count | 0.89 | 0.11 | -2.60 | 0.88
Referral Flag | 0.54 | 0.46 | -0.75 | 2.45
Appointment Lead Days | 0.53 | 0.47 | -1.45 | 2.36
Appointment Procedure Type | 0.51 | 0.49 | -1.26 | 2.35
Appointment Length | 0.57 | 0.43 | -1.64 | 2.23
Appointment Center (Location) | 0.44 | 0.56 | -1.68 | 2.18
Referral Requested Flag | 0.22 | 0.78 | -0.68 | 2.15
Appointment Normal Status Count (Previous) | 0.62 | 0.38 | -2.06 | 1.55
Zip Code (Patient Permanent Address) | 0.42 | 0.58 | -0.85 | 1.94
Appointment Hour-of-Day | 0.48 | 0.52 | -1.68 | 1.94
Appointment No-Show Ratio | 0.71 | 0.29 | -0.78 | 1.89
Patient Religion Category | 0.50 | 0.50 | -1.89 | 1.00
Appointment Block Category | 0.57 | 0.43 | -1.60 | 1.60
Patient Financial Class | 0.31 | 0.69 | -1.11 | 1.57
Number of Calls (Reminders etc.) | 0.77 | 0.23 | -1.51 | 1.00
Number of Canceled Appointments (Previous) | 0.51 | 0.49 | -0.83 | 1.42
Appointment Confirmation Status | 0.59 | 0.41 | -1.35 | 1.21
Patient Language | 0.44 | 0.56 | -1.31 | 1.01
Patient Ethnic Group | 0.25 | 0.75 | -0.71 | 1.30
Age (on Appointment Date) | 0.45 | 0.55 | -1.00 | 1.26
Homeless Flag | 0.00 | 1.00 | -0.08 | 1.22
Employment Status | 0.39 | 0.61 | -0.76 | 1.21
Veteran Status | 0.26 | 0.74 | -0.45 | 0.91
Marital Status | 0.56 | 0.44 | -0.66 | 0.83
Interpreter Needed Flag | 0.68 | 0.32 | -0.80 | 0.63
Appointment Month | 0.50 | 0.50 | -0.52 | 0.44
Patient Sex | 0.44 | 0.56 | -0.23 | 0.27
Table 2

List of machine learning techniques applied to retrospective data with respective performance as measured by AUC

Machine Learning Technique | AUC
Epic No-Show Model (Logistic Regression) | 0.77
Ochsner Model 1 (Logistic Regression) | 0.81
Ochsner Model 2 (Neural Network) | 0.82
Ridge Regression | 0.85
Support Vector Regression | 0.88
Random Forest | 0.92
Deep Feedforward Neural Network (i.e. Deep Learning) | 0.93
Gradient Boosted Regression Trees | 0.93

Prospective Validation

The best-performing model, Gradient Boosted Regression Trees, was tested prospectively over a 6-week period by calculating a no-show risk score two weeks before each outpatient’s scheduled appointment at a single outpatient location. Outcomes for all visits were derived from the EMR after the scheduled appointment. We binned risk scores in 0.05 intervals and used Microsoft Excel (Redmond, WA) to calculate the AUC.
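The binned AUC calculation can also be done outside Excel with a trapezoidal ROC estimate over the bins. The sketch below uses the merged bins reported in Table 3 (highest risk first), so it gives a coarser estimate (about 0.66) than the 0.73 computed from the full set of 0.05-interval bins, presumably because Table 3 merges several intervals:

```python
# Trapezoidal AUC from binned risk scores; counts are the Table 3 bins,
# ordered from highest to lowest risk score.
bins = [  # (shows, no_shows)
    (28, 6), (62, 12), (78, 12), (93, 6), (151, 10),
    (219, 22), (349, 20), (626, 19), (535, 16),
]
total_shows = sum(s for s, _ in bins)       # 2141
total_no_shows = sum(ns for _, ns in bins)  # 123

auc, tpr_prev, fpr_prev, tp, fp = 0.0, 0.0, 0.0, 0, 0
for shows, no_shows in bins:
    tp += no_shows  # no-shows flagged at this threshold (true positives)
    fp += shows     # shows incorrectly flagged (false positives)
    tpr, fpr = tp / total_no_shows, fp / total_shows
    auc += (fpr - fpr_prev) * (tpr + tpr_prev) / 2  # trapezoid rule
    tpr_prev, fpr_prev = tpr, fpr
print(round(auc, 2))
```

Each bin boundary acts as a classification threshold, tracing one point on the ROC curve; finer bins give a less pessimistic area estimate.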

Statistical Analysis

Statistical significance was calculated using a t-test comparing the risk scores of the two groups of patients (show vs. no-show) with the Microsoft Excel data analysis package. Subgroup analysis to determine the relative risk between groups of participants in specific risk-score bins was performed using MedCalc’s Relative Risk Calculator.
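The relative-risk calculation (performed here with MedCalc) can be sketched with the standard Katz log-method confidence interval. The counts in the example call are hypothetical, for illustration only:

```python
import math

def relative_risk(events_exposed, n_exposed, events_control, n_control, z=1.96):
    """Relative risk of the exposed (high-risk) group vs. control,
    with a Katz log-method 95% confidence interval."""
    rr = (events_exposed / n_exposed) / (events_control / n_control)
    # Standard error of ln(RR) under the Katz approximation.
    se_log = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_control - 1 / n_control
    )
    lo, hi = (rr * math.exp(s * se_log) for s in (-z, z))
    return rr, lo, hi

# Hypothetical bins: 12/100 no-shows in the high-risk group vs. 3/100 in the low-risk group.
rr, lo, hi = relative_risk(12, 100, 3, 100)
print(round(rr, 2), round(lo, 2), round(hi, 2))
```

For the t-test on risk scores, scipy.stats.ttest_ind on the show and no-show score arrays would be the usual programmatic equivalent of the Excel analysis package.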

Results

123 no-shows were observed among 2,264 total scheduled exams (5.4%). ROC analysis yielded an AUC of 0.73 (p < 0.0005) (Table 3).
Table 3

Binned analysis of no-show results based on risk score

Risk Score | # of Shows | # of No-Shows | Rate of No-Show
Less than 0.05 | 535 | 16 | 2.9%
0.05–0.10 | 626 | 19 | 2.9%
0.10–0.15 | 349 | 20 | 5.4%
0.15–0.20 | 219 | 22 | 9.1%
0.20–0.25 | 151 | 10 | 6.2%
0.25–0.30 | 93 | 6 | 6.1%
0.30–0.35 | 78 | 12 | 13.3%
0.35–0.45 | 62 | 12 | 16.2%
Above 0.45 | 28 | 6 | 17.6%
Total | 2141 | 123 | 5.4%

Analysis

Sub-group analysis of risk scores above 0.30 demonstrated a 13.3–17.6% rate of no-show; this high-risk subgroup was three times more likely to no-show than average. Its relative risk compared to the low-risk subgroup (risk score ≤ 0.1) was 6.08 (95% CI 4.7 to 12.1, p < 0.0001).

Features that most increased the no-show risk score were: joint appointment flag, appointment left-without-being-seen count, appointment normal status count, appointment no-show ratio, and interpreter needed flag. Features that most decreased the no-show risk score were: referral requested flag, homeless flag, patient ethnic group, veteran status, and appointment no-show count (previous). Features with the largest effect size for increasing the risk score were: appointment no-show count, advance directives, appointment day-of-week, appointment department specialty, and appointment lead days. Features with the largest effect size for decreasing the risk score were: BMI, appointment day-of-year, tobacco user category, joint appointment flag, and appointment scheduled from (Epic module).

Discussion

Machine learning has the potential to identify patients who are at risk for missing their radiology appointments. Although our risk model achieved statistically significant results (AUC = 0.73, p < 0.0005) for prospective prediction, prospective performance was worse than that of all of the retrospective models we trained (AUC range 0.77–0.93). Our best-performing retrospective model was comparable to a top-performing retrospective model in the literature (Kurasawa et al. achieved AUC = 0.958 for missed diabetes appointments) [2]. The difference in performance between retrospective and prospective implementation reinforces the critical need for prospective validation of machine-learning models [3].

Several of the machine-learning models performed well, including Random Forest, Deep Feedforward Neural Network, and Gradient Boosted Regression Trees. We are not sure why Gradient Boosted Regression Trees performed best on our retrospective cohort; our strategy was to explore several machine-learning techniques and empirically select the best-performing model to test prospectively.

The features of the best-performing model (Table 1) were evaluated using SHAP (SHapley Additive exPlanations) values [4] aggregated over all 5 cross-validation sets, where:
% Incr = percent of the time this feature increases the risk score
% Decr = percent of the time this feature decreases the risk score
Min/Max = range of contributions for this feature (pre-logistic-transformation)

The patient population studied comprised outpatients scheduled for radiology examinations at a single imaging center owned by a large academic medical center located in west Baltimore. As in other studies in the literature, among the most important features for determining no-show risk were the appointment no-show ratio and appointment no-show count. Algorithm actionability is important for implementation of a model into clinical practice.
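How the four summary metrics in Table 1 follow from per-appointment SHAP values can be sketched as below. Here `phi` is a synthetic stand-in for one feature's SHAP column (log-odds contributions, pre-logistic-transformation), concatenated across the five cross-validation sets:

```python
import numpy as np

# Synthetic per-appointment SHAP contributions for one feature,
# one array per cross-validation fold, concatenated for aggregation.
rng = np.random.default_rng(1)
phi = np.concatenate([rng.normal(0.2, 1.0, 1000) for _ in range(5)])

pct_incr = np.mean(phi > 0)    # % Incr: share of appointments where the feature raises risk
pct_decr = np.mean(phi < 0)    # % Decr: share where it lowers risk
lo, hi = phi.min(), phi.max()  # Min/Max: range of contributions
print(round(pct_incr, 2), round(pct_decr, 2))
```

In practice `phi` would come from a tree explainer (e.g. shap.TreeExplainer applied to the fitted gradient-boosted model) rather than a random draw; the aggregation step is the same.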
The highest-risk group (risk score > 0.3) had a no-show rate of 15.1%, compared to 2.9% for the lowest-risk group (risk score < 0.1). Although the highest-risk group was five times more likely to miss an appointment, the low absolute incidence of no-show events may limit the return on investment for cost-intensive interventions (e.g., providing ride-sharing services). Algorithm bias is also a concern in radiology [5]. Underlying racial and socioeconomic disparities may be relevant to our results, as our algorithm used features of religion, race, and zip code. Interventions for high-risk groups, such as appointment double-booking, may therefore lead to worse experiences for disadvantaged patients in the form of longer outpatient imaging wait times. We advise caution to those considering double-booking high-risk no-show patients.

Room for Improvement

We trained our models using appointment data from before the COVID-19 pandemic. Our initial attempts to implement the model were delayed due to prolonged closure of the outpatient-imaging center starting in March of 2020. When we finally evaluated the model, a shift in patient behavior as a result of the pandemic may have contributed to our observed degraded performance relative to retrospective evaluation results. During prospective implementation, our no-show rate decreased relative to the rate from the training data, which may have been due to flexible working conditions for patients with a societal shift to remote work, among other factors. Training on post-pandemic data may improve model performance.

Conclusion

Machine learning can be used to identify patients at risk for missing their radiology appointments. Our model performed worse on prospective than on retrospective data, but results were still statistically significant with respect to no-show prediction. Our results highlight the importance of, and need for, prospective evaluation of machine-learning models before they can be used for clinical decision-making.

References

1. Carreras-García D, Delgado-Gómez D, Llorente-Fernández F, Arribas-Gil A. Patient No-Show Prediction: A Systematic Literature Review. Entropy (Basel). 2020.
2. Kurasawa H, Hayashi K, Fujino A, Takasugi K, Haga T, Waki K, Noguchi T, Ohe K. Machine-Learning-Based Prediction of a Missed Scheduled Clinical Appointment by Patients With Diabetes. J Diabetes Sci Technol. 2016.
3. Kim DW, Jang HY, Kim KW, Shin Y, Park SH. Design Characteristics of Studies Reporting the Performance of Artificial Intelligence Algorithms for Diagnostic Analysis of Medical Images: Results from Recently Published Papers. Korean J Radiol. 2019.
5. Allen B, Dreyer K. The Role of the ACR Data Science Institute in Advancing Health Equity in Radiology. J Am Coll Radiol. 2019.

