An Electronic Medical Record-Based Discharge Disposition Tool Gets Bundle Busted: Decaying Relevance of Clinical Data Accuracy in Machine Learning.

Alexander S Greenstein¹, Jack Teitel², David J Mitten², Benjamin F Ricciardi³, Thomas G Myers³.

Abstract

BACKGROUND: Determining discharge disposition after total joint arthroplasty (TJA) has been a challenge. Advances in machine learning (ML) have produced computer models that learn by example to generate predictions on future events. We hypothesized that a trained ML algorithm's diagnostic accuracy would be better than that of current predictive tools for predicting discharge disposition after primary TJA.
METHODS: This study was a retrospective cohort study from a single, tertiary referral center for primary TJA. We trained and validated an artificial neural network (ANN) based on 4368 distinct surgical encounters between 1/1/2013 and 6/28/2016. The ANN's ability to identify discharge disposition was then tested on 1452 distinct surgical encounters between 1/3/17 and 11/30/17.
RESULTS: The area under the curve and accuracy achieved during model validation were 0.973 and 91.7%, respectively, with 25% of patients being discharged to skilled nursing facilities (SNFs). Within our testing data set, 6.7% of patients went to SNFs. The performance in the testing set included an area under the curve of 0.804, accuracy of 61.3%, sensitivity of 28.9%, and specificity of 93.8%.
CONCLUSIONS: This is the first prediction tool using an electronic medical record-integrated ANN to predict discharge disposition after TJA based on locally generated data. A dramatic reduction in the number of patients discharged to SNFs after implementation of a bundled payment model led to poor recall in the testing data set. This model serves as a proof of concept for developing an ML prediction tool using a relatively small data set and subsequent integration into the electronic medical record.

Keywords: Arthroplasty; Artificial intelligence; Discharge; Machine learning

Year: 2020 PMID: 33088883 PMCID: PMC7567055 DOI: 10.1016/j.artd.2020.08.007

Source DB: PubMed Journal: Arthroplast Today ISSN: 2352-3441

Introduction

American health care is currently being transformed from a system focused on quantity to one concentrated on value, defined as health outcomes achieved per dollar spent [1]. With this paradigm shift, a greater percentage of the financial risk is being transferred to health-care systems as part of new payment models. One such example is the Centers for Medicare & Medicaid Services’ Bundled Payments for Care Improvement initiative [2]. Bundled reimbursement models provide a fixed payment for an episode of care (eg, total joint arthroplasty [TJA]) regardless of the resources used. Furthermore, certain reimbursement models being considered penalize providers for a hospital readmission [2]. Bundled payments are becoming more common across payers for TJAs [3], so the need to ensure high-quality care at the lowest possible cost has never been greater. Health-care systems would benefit from understanding which patients are more likely to require a larger amount of resources. Skilled nursing facilities (SNFs) are one example of a high-cost intervention for TJAs. Prior research has demonstrated that patients discharged to an SNF after TJA have increased odds of readmission [4]. In addition, in many bundled arrangements, the use of SNFs postoperatively for patients undergoing TJA causes the health-care system to exceed the predetermined dollar threshold for the bundled episode of care [5]. Therefore, reducing the number of patients undergoing TJA who are discharged to SNFs may decrease health-care costs, thereby increasing overall value. Although traditional statistical approaches are valued in evidence-based medicine, the application of machine learning (ML) to clinical orthopaedic care may improve the identification of patterns within complex data sets.
For example, recent work by Navarro et al showed that an ML algorithm using the New York State administrative database could accurately predict length of stay and costs before primary total knee arthroplasty (TKA) [6]. Previous studies have all used large administrative databases working off the premise that these large administrative data sets are necessary to leverage the power of ML. However, previous authors outside of orthopaedics were able to show that prioritizing small amounts of recent (accurate) data is more effective than using larger amounts of older data toward future clinical predictions [7]. We sought to explore these concepts in TJA to see if an ML algorithm could be trained on local hospital-level data to accurately predict the discharge disposition of patients undergoing TJA and integrate such a prediction tool into our electronic medical record (EMR). Our hypothesis was that using hospital data generated from clinically relevant local practice patterns would accurately predict discharge disposition after primary TJA.

Material and methods

The study was approved by the University of Rochester Institutional Review Board.

Data sources and study population

Local data were collected retrospectively through the electronic health record, which encompasses 2 hospitals within the same health system: a large level one academic center and a smaller community hospital. Data were collected through an SQL query of the Epic Clarity database. Patients were included in our cohort based on Current Procedural Terminology (CPT) billing codes for primary total hip arthroplasty (THA) (CPT 27130) or primary TKA (CPT 27447). Missing variables were replaced with null values and included within our cohort. Exclusion criteria were limited to incomplete or absent variables necessary to train the ML algorithm. For our training and validation cohort, we collected data on patients who had undergone surgery between 1/1/2013 and 6/28/2016. A total of 4370 distinct surgical encounters were identified from 3887 unique patients. Our testing cohort was based on data from patients who had undergone surgery between 1/1/2017 and 11/30/2017. During this period, we identified 1467 distinct surgical encounters among 1395 unique patients, with some patients receiving more than one TJA.

Developing the ML algorithm

The model we used was a fully connected feed-forward artificial neural network (ANN) with a single 100-node hidden layer. An ANN works by taking the input variables, multiplying them by weights (amount of impact), and using a nonlinear function to scale this information within a range. This process occurs at each node within the neural network to optimize its ability to determine the output variable. The model was trained on Microsoft Azure Machine Learning Studio [8]. Initial selection of variables to be examined was determined through literature review and careful consideration, which led to a set of 38 variables for the model (Appendix). Once we had this initial set, we refined the predictive power of each variable. We sequentially retrained the model, excluding one variable at a time. If the predictive value of the model improved or did not change with the removal of that variable, we knew that the variable was not important. We then removed all nonimportant variables from our model. Eleven variables were ultimately selected based on predictive power: age, race, ethnicity, height, weight, gender, previous SNF admission, solitary living arrangement (lives alone), laterality of the procedure, the procedure performed (THA or TKA), and whether or not the procedure was performed using the anterior approach (THA only) (Table 1).
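The node-level computation described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the model built in Azure Machine Learning Studio: the sigmoid activation, the random weights, and all function and variable names are assumptions made for the example, because the text does not specify the activation function or the trained parameters.

```python
import math
import random

def sigmoid(z):
    """Nonlinear function that scales any real number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a fully connected network with one hidden layer."""
    # Each hidden node multiplies the inputs by its weights, adds a bias,
    # and passes the sum through the nonlinear function.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # The output node repeats the same step on the hidden-layer activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Assumed sizes: 11 input variables and a single 100-node hidden layer.
rng = random.Random(0)
n_inputs, n_hidden = 11, 100
hidden_weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
output_weights = [rng.uniform(-1, 1) for _ in range(n_hidden)]
output_bias = 0.0

# Random placeholder inputs standing in for one scaled patient record.
example_inputs = [rng.random() for _ in range(n_inputs)]
probability = forward(example_inputs, hidden_weights, hidden_biases,
                      output_weights, output_bias)
print(f"Untrained network output: {probability:.3f}")
```

Training would adjust the weights and biases so that the output tracks the observed discharge disposition; that step is omitted here.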

Appendix

Initial variables chosen for model development.

Medical record number	Comorbidities
Service date	Asthma
Service location (hospital in system)	Atrial fibrillation
Gender	Coronary artery disease
Address	Congestive heart failure
City	Chronic obstructive pulmonary disease
State	Diabetes
Zip code	Hypertension
Length of stay	Obesity
Ethnic group	Chronic kidney disease
Race	Depression
Lives alone	Osteoporosis
Height	Chronic liver disease
Weight	Sickle cell
Previous skilled nursing facility admission	Hyperlipidemia
Previous surgery
Age
Insurance 1
Insurance 2
Diagnosis code 1
Diagnosis code 2
Diagnosis code 3
Provider
Procedure

Table 1

Demographic distribution of training, validation, and testing cohorts.

Joint (#)	Training cohort		Validation cohort		Testing cohort
Joint (#)	TKA (936)	THA (755)	TKA (170)	THA (180)	TKA (615)	THA (837)
Age (years)	69 (42-90)	67 (20-90)	69 (47-88)	66.5 (30-90)	69 (47-90)	65 (21-90)
Gender (% male)	35.58%	40.66%	35.29%	35.00%	43.25%	46.12%
Race % white	89.64%	90.20%	85.29%	92.78%	90.89%	92.59%
Race % black	7.80%	8.21%	10%	5.56%	6.99%	6.45%
Race % other	2.56%	1.59%	4.71%	1.66%	2.12%	0.96%
Ethnicity (% Hispanic)	1.50%	1.59%	1.18%	1.11%	1.30%	0.84%
Height (inches)	65 (54-78.5)	66 (54-78)	65 (55.5-77)	65.98 (57-75)	65.98 (57-77)	66.5 (54-77)
Weight (oz)	3040 (1392-5392)	2864 (1360-5600)	3128 (1760-5168)	2848 (1504-4843.2)	3200 (1712-5040)	2928 (1456-5288)
Previous SNF admission (% yes)	37.67%	25.03%	36.47%	21.67%	7.48%	4.78%
Laterality (% right)	14.85%	15.23%	15.29%	15.00%	52.52%	54.84%
Laterality (% left)	12.39%	16.03%	14.71%	12.22%	47.15%	44.44%
Lives alone (% yes)	10.35%	10.20%	7.65%	10.00%	6.99%	7.65%

The outcome for our model was the patient’s discharge disposition, which was converted into a binary variable with 1 being a discharge to an SNF and 0 being a discharge elsewhere. The SNF discharge rate in the training data set was 25%. This class imbalance is undesirable for ML because it can lead to a biased model. Therefore, to control this potential bias, we randomly selected one-third of our data that had an outcome of 0 (elsewhere), combined that with all the data that had an outcome of 1 (SNF discharge), and used this as our effective training data set [9]. To allow for both training and validation of our model, we randomly split our effective data set into 2 data sets: 80% of the effective data went into the training set and 20% into a validation set.
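The downsampling and 80/20 split described above can be sketched as follows. This is an illustration on synthetic records, not the study pipeline: the record layout, the `snf` field name, the function name, and the random seed are all assumptions. With a 25% SNF rate, keeping one-third of the non-SNF encounters leaves the two classes at roughly equal size.

```python
import random

def balance_and_split(records, seed=0, keep_negative_fraction=1 / 3, train_fraction=0.8):
    """Downsample outcome-0 records, keep all outcome-1 records, then split 80/20."""
    rng = random.Random(seed)
    positives = [r for r in records if r["snf"] == 1]
    negatives = [r for r in records if r["snf"] == 0]
    # Randomly keep one-third of the majority (non-SNF) class.
    kept_negatives = rng.sample(negatives, round(len(negatives) * keep_negative_fraction))
    effective = positives + kept_negatives
    rng.shuffle(effective)
    cut = round(len(effective) * train_fraction)
    return effective[:cut], effective[cut:]

# Synthetic stand-in: 4368 encounters, exactly one in four discharged to an SNF.
records = [{"id": i, "snf": 1 if i % 4 == 0 else 0} for i in range(4368)]
train, validation = balance_and_split(records)
print(len(train), len(validation))
```

The counts produced by this synthetic example are not expected to match the cohort sizes in Table 1, because the real data set is not exactly 25% SNF and exclusions were applied.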

Evaluating the algorithm

We evaluated our algorithm using 2 metrics: overall accuracy and area under the receiver operating characteristic (ROC) curve. Accuracy is the ratio of correct predictions to the whole pool of subjects. Area under the curve (AUC) is commonly used in classification analysis, including ML, to evaluate how well a model predicts classes. It is computed from the ROC curve and represents the probability that the algorithm will rank a randomly chosen positive example higher than a randomly chosen negative example. As a model’s ability to rank positive examples above negative ones improves, the AUC approaches 1 [10]. Generally, an AUC of greater than 0.70 suggests a strong model [10]. We evaluated our algorithm on 2 separate sets of data. Initially, we used the validation set to evaluate our model throughout the training phase where we developed the model and tuned the model architecture and hyperparameters. Thereafter, we applied the ANN to our testing set, consisting of the 2017 data, to evaluate the generalizability of our model. The model was only evaluated on the testing set after it had been successfully trained and validated with the validation set.
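Both metrics can be computed directly from their definitions. The sketch below uses a small invented set of labels and scores, and the 0.5 decision threshold is an assumption for the example. The AUC is computed as the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as one half, which is the ranking interpretation given above.

```python
def accuracy(labels, predictions):
    """Share of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Invented example: 1 = discharged to an SNF, 0 = discharged elsewhere.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy = {accuracy(labels, predicted):.3f}")
print(f"AUC      = {auc(labels, scores):.3f}")
```

The pairwise loop is quadratic in the number of examples; it is written this way for clarity, not for use on large data sets.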

Integration into the EMR

After developing our model, we integrated it directly into the Epic EMR. Within the chart of each patient who underwent TJA, we created a new button labeled “joint analytics.” When this button was pressed by the provider, it would automatically pull the necessary data from the patient chart. It would then deidentify these data and send them to a cloud service on which our model was running. Our model would process these data and send the results back to the chart. We would then graphically display the model results directly within the patient chart for the provider. We also displayed the values of all 11 data points used within the model so the provider could see the variables used in the analysis. In addition, we allowed providers to change the value of these variables and rerun our model. We felt this was quite useful for intervention planning, as both the provider and patient would be able to immediately see the projected results of any intervention. When the provider changed a variable, it would not affect the value of the corresponding field within the patient’s chart, but only the value that the model sees; thus, the patient chart would remain unaffected. An example of this would be if the provider wanted to see what the patient’s outcome would look like if they had assistance at home. For this, they would change the variable “lives alone” from 0 to 1, rerun the model, and see how their SNF admission risk changes (Fig. 1).
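The rerun behavior, in which a provider changes a value for the model without altering the chart, amounts to scoring a modified copy of the inputs. The sketch below illustrates that design under stated assumptions: `toy_model` is a hypothetical stand-in with invented coefficients, the field names are made up, and nothing here reflects the Epic integration or the deployed cloud service. The override mirrors the “lives alone” example in the text.

```python
def run_what_if(chart_values, overrides, model):
    """Score a scenario on a copy of the inputs so the chart values stay untouched."""
    model_inputs = dict(chart_values)  # copy; the original dict is never modified
    model_inputs.update(overrides)     # apply the provider's hypothetical changes
    return model(model_inputs)

def toy_model(inputs):
    """Hypothetical stand-in for the deployed ANN; the coefficients are invented
    and carry no information about the direction or size of real effects."""
    risk = 0.10 + 0.30 * inputs["previous_snf_admission"] + 0.20 * inputs["lives_alone"]
    return min(risk, 1.0)

chart = {"previous_snf_admission": 0, "lives_alone": 0}
baseline_risk = toy_model(chart)
scenario_risk = run_what_if(chart, {"lives_alone": 1}, toy_model)

print(f"baseline: {baseline_risk:.2f}, scenario: {scenario_risk:.2f}")
print(f"chart value after rerun: {chart['lives_alone']}")
```

Working on a copy keeps the scenario separate from the record of truth, so repeated reruns cannot accumulate edits in the patient chart.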

Figure 1

EMR-integrated discharge dashboard.

Results

Our training and validation cohort included a total of 4368 distinct surgical encounters identified from 3886 unique patients who underwent surgery between 1/1/2013 and 6/30/2016. The testing cohort, which included patients who had undergone surgery between 1/1/2017 and 11/30/2017, comprised 1452 distinct surgical encounters from 1385 unique patients. The AUC we achieved when validating our model was 0.973. The overall accuracy within the validation cohort was 91.7%. The testing cohort demonstrated similar predictive power with an overall AUC of 0.819 and an overall accuracy of 91%. Of note, however, only 6.7% of patients were discharged to SNFs within our testing data set compared with 25% in the training and validation cohorts. Because of this imbalance, which resulted from the change in practice pattern after implementation of a bundled payment model, we reanalyzed our testing cohort after selecting an equal number of SNF and non-SNF discharged patients. When testing the model with this revised data set, we found an AUC of 0.804 with an accuracy of 61.3% (Fig. 2). In the balanced testing set, the model achieved 28.9% sensitivity, correctly identifying 28 of 97 patients who went to an SNF, and 93.8% specificity.
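The balanced-set figures are mutually consistent, which can be checked with a short calculation. The sketch assumes that the balanced set contained 97 non-SNF encounters to match the 97 SNF encounters ("an equal number"); the count of correctly classified non-SNF encounters is back-calculated from the reported specificity and is not stated in the text.

```python
snf_total = 97            # SNF discharges in the balanced testing set (from the text)
snf_correct = 28          # SNF discharges the model identified (from the text)
non_snf_total = 97        # assumption: equal number of non-SNF encounters
reported_specificity = 0.938

sensitivity = snf_correct / snf_total
# Back-calculated count of non-SNF encounters classified correctly.
non_snf_correct = round(reported_specificity * non_snf_total)
accuracy = (snf_correct + non_snf_correct) / (snf_total + non_snf_total)

print(f"sensitivity = {sensitivity:.1%}")
print(f"non-SNF correct = {non_snf_correct} of {non_snf_total}")
print(f"accuracy = {accuracy:.1%}")
```

In a set with equal class sizes, accuracy is the mean of sensitivity and specificity, which is why low sensitivity pulls accuracy down to about 61% despite specificity near 94%.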

Figure 2

Balanced ROC curve: the true-positive rate (patients correctly predicted to be discharged to an SNF) vs the false-positive rate (algorithmically selected SNF discharge patients who were in fact discharged home).

Discussion

An accurate method to predict outcomes in real time after TJA is currently lacking. Prediction tools integrated into commercially available EMRs are currently unavailable. The existence of such tools would assist physician-patient interactions and help members of the arthroplasty support team (eg, nurse navigators) better target at-risk patients. Identification of such patients may help patient optimization efforts and cost-saving measures, depending on the identified risk factor. In this proof-of-concept study, the risk factor identified was discharge to an SNF. To our knowledge, this is the first use of artificial intelligence on local data with a prediction tool built into the EMR. The validated model performed with excellent combined sensitivity and specificity as well as excellent accuracy, with an AUC and accuracy of 0.973 and 91.7%, respectively. However, owing to the change in practice patterns, specifically bundle-driven decreased SNF utilization over the study period, our model performed poorly in our testing cohort with a resulting 31% sensitivity and 95.6% specificity. Previous authors have studied various discharge prediction tools derived from several different sources. Menendez et al. investigated an Activity Measure for Post-Acute Care “6-Click” model and found an AUC of 0.78 [11]. Gholson et al. created a discharge prediction tool using the American College of Surgeons National Surgical Quality Improvement Program database with an AUC of 0.70 [12]. While the work evaluated the ability of the National Surgical Quality Improvement Program data to generate a discharge prediction tool, the authors acknowledged that their model lacked validation [12]. Probably the most cited discharge prediction tool in TJA is the Risk Assessment and Prediction Tool (RAPT) (Table 2) [13,14,15,16,17,18].
The RAPT has been applied prospectively with intermediate success within the literature, including within the setting of the bundled care for TJA [16,17]. Overall, current discharge prediction tools, including the RAPT, are better at predicting who will discharge home compared with who will discharge to an SNF or an SNF equivalent.

Table 2

Risk Assessment and Prediction Tool.

Item	Value	Score
Age group(y)	50-65	2
	66-75	1
	>75	0
Gender	Male	2
	Female	1
Ambulation (block = 200 m)	Two blocks or more	2
	1-2 blocks	1
	Housebound	0
Walking aids	None	2
	Single-point stick	1
	Crutches/frame	0
Use of community support (home help, home nurse, meals on wheels)	None or 1 per week	1
	Two or more per week	0
Postoperative caregiver	Yes	3
	No	0
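The scoring in Table 2 can be expressed as a lookup table with one entry per item. The sketch below transcribes the point values from the table; the dictionary keys, the function name, and the example patient are assumptions for illustration, and no risk thresholds are applied because the table does not give any.

```python
# Point values transcribed from Table 2; key names are illustrative.
RAPT_POINTS = {
    "age_group": {"50-65": 2, "66-75": 1, ">75": 0},
    "gender": {"male": 2, "female": 1},
    "ambulation": {"two blocks or more": 2, "1-2 blocks": 1, "housebound": 0},
    "walking_aids": {"none": 2, "single-point stick": 1, "crutches/frame": 0},
    "community_support": {"none or 1 per week": 1, "two or more per week": 0},
    "postoperative_caregiver": {"yes": 3, "no": 0},
}

def rapt_score(answers):
    """Sum the points for one answer per item; raises KeyError on unknown answers."""
    return sum(RAPT_POINTS[item][answers[item]] for item in RAPT_POINTS)

# Invented example patient.
example = {
    "age_group": "66-75",
    "gender": "female",
    "ambulation": "1-2 blocks",
    "walking_aids": "single-point stick",
    "community_support": "none or 1 per week",
    "postoperative_caregiver": "yes",
}
max_score = sum(max(points.values()) for points in RAPT_POINTS.values())
print(rapt_score(example), "of a possible", max_score)
```

A higher total corresponds to more favorable answers on each item, with the presence of a postoperative caregiver carrying the largest single weight.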

The most recent study, and the one most analogous to ours, was performed by Ramkumar et al [19]. The authors trained an ANN using the Nationwide Inpatient Sample database and externally validated it using a local prospective institutional database (Orthopaedic Minimal Data Set Episode of Care). In the training cohort, the model achieved an AUC of 76.1% and an accuracy of 69.4%. After external validation of their model with a local prospective database, the AUC and accuracy were 69.2% and 64.4%, respectively. The model in the present study trained on local institutional data performed similarly to the model of Ramkumar et al [19] trained on Nationwide Inpatient Sample data in their respective validation cohorts (Table 3). This is an interesting finding because it is commonly accepted that more data lead to better ML performance. However, this may not be true if the data do not accurately reflect local or current practice patterns. The locality of data may partly, but not completely, explain the differences between these 2 studies. Other factors include how the primary outcome variable was defined. In our study, the primary outcome of interest was “SNF” or “not-SNF,” whereas in the study by Ramkumar et al., it was “home” or “not home.” We included all patients receiving primary TJA, whereas Ramkumar et al. included only patients with Medicare older than 65 years. Probably the most important difference between the studies was the set of variables used to train the ANN. Ramkumar et al. included variables related to the admission (the type of admission and day of the week) and patient (age, gender, ethnicity, race, all patient refined risk of mortality and severity of illness, etc.). It is possible that the differences in variable selection between the 2 studies ultimately reflect the availability of variables within an administrative database vs local data.
However, given the results of our model, smaller amounts of highly accurate data can likely work as well as, if not better than, a prediction tool based on broad national data.

Table 3

Summary of recent studies with discharge prediction tools.

Study lead author	Discharge tool	Cohort	Accuracy	AUC	% Of cohort discharge to SNFs
The present study	ANN	Retrospective Institutional	91.7% in the validation cohort	0.97 in the validation cohort	25% in the training cohort
The present study	ANN	Retrospective Institutional	61.3% in the testing cohort	0.80 in the testing cohort	6.7% in the testing cohort
Ramkumar et al. [19]	ANN	NIS—training	69.4% in validation cohort	0.76 in the validation cohort	45.6%
Ramkumar et al. [19]	ANN	OME—validation	64.4% in testing cohort	0.69 in the testing cohort	47.1%
Gholson et al. [12]	ACS-NSQIP	Retrospective multicenter	n/a	0.70 (not validated)	n/a
Menendez et al. [11]	AM-PAC	Retrospective institutional	n/a	0.78	39%
Dibra et al. [15]	RAPT	Retrospective institutional	88%, however 52.2-78.7% in intermediate risk patients	n/a	14%
Hansen et al. [16]	RAPT	Prospective institutional	78%, however 65.2% in intermediate risk patients	n/a	15%
Slover et al. [17]	RAPT	Prospective institutional	n/a; however, 28% of high-risk and 76% intermediate-risk patients discharge to home	n/a	29.5%

NIS, Nationwide Inpatient Sample; OME, orthopaedic minimal data set episode of care.

Predictive models, especially those assuming a static relationship between the input (eg, patient factors) and output variables (eg, discharge to SNF), are subject to poor and degrading performance. Changes in underlying data occur because of changing practice patterns, changes in the population, or the complex nature of the health-care environment. In ML, these unexpected changes are referred to as concept drift [20,21,22]. This was shown by Chen et al., who looked at an ML algorithm trained to predict and then recommend future clinical orders based on EMR data [7]. They concluded that prioritizing small amounts of recent data is more effective than using larger amounts of older data toward future clinical predictions [7]. For this reason, we tested our model on a temporally adjacent data set. As practice patterns change, an ML model would need to be periodically refit and updated to counter the inevitable concept drift, be it systemic or local. The single best explanation for the low sensitivity and high specificity of our model in the testing cohort was the relatively low SNF utilization rate in that cohort. Our institution adopted bundled payments for TJA around the year 2014. The subsequent years saw SNF utilization drop from 25% to 6.7%. This change in practice pattern occurred during the years that were used to train the algorithm (2013-2016) and therefore would directly impact the model’s accuracy on any testing cohort with such low SNF utilization. For example, if the model had predicted that every single patient in our cohort would be discharged home, it would still have achieved a high accuracy of 93.3%.
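The all-home baseline can be verified with the cohort counts. The sketch below is a worked check, not an analysis from the study: it uses the 1452 testing encounters and the 97 SNF discharges reported in the Results, and the variable names are illustrative.

```python
total_encounters = 1452   # testing cohort size (from the Results)
snf_discharges = 97       # SNF discharges in the testing cohort (from the Results)

# A trivial classifier that predicts "not SNF" for every patient.
true_negatives = total_encounters - snf_discharges
true_positives = 0

baseline_accuracy = (true_positives + true_negatives) / total_encounters
baseline_sensitivity = true_positives / snf_discharges
baseline_specificity = true_negatives / (total_encounters - snf_discharges)

print(f"accuracy    = {baseline_accuracy:.1%}")
print(f"sensitivity = {baseline_sensitivity:.1%}")
print(f"specificity = {baseline_specificity:.1%}")
```

The baseline never identifies an SNF discharge, so accuracy alone is a weak measure at low prevalence, and sensitivity and specificity need to be reported alongside it.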
The validation phase in our study showed a much higher accuracy (91.7%) and AUC (0.973) than the testing phase, likely reflecting the higher historical SNF usage during the model’s training years. There are limitations to our study. Institutional practices are an important variable in predicting patient discharge, and these were noted to be in flux during the training of the ANN [23,24]. Although the exclusive use of local data optimized our results, it almost certainly limits the generalizability of our tool. However, the lack of generalizability should be an expectation and not necessarily a limitation when dealing with ML predictions involving practice patterns.

Conclusions

This is the first prediction tool using an EMR-integrated ANN to preoperatively predict the likelihood a patient will be discharged to an SNF after TJA based exclusively on locally generated data. Our model is a proof of concept to preoperatively counsel patients and define expectations for patients, families, clinicians, surgeons, and hospitals regarding postoperative care after arthroplasty. Future studies will further refine the ANN and evaluate the effect of current practice patterns on discharge prediction models.

Conflict of interests

T.G. Myers is a member of the editorial/governing board of the Journal of Arthroplasty, and B.F. Ricciardi is a board/committee member of Clinical Orthopaedics and Related Research and Arthroplasty Today; all other authors declare no potential conflicts of interest.

17 in total

1. Geographic variation in the use of post-acute care.

Authors: Robert L Kane; Wen-Chieh Lin; Lynn A Blewett
Journal: Health Serv Res Date: 2002-06 Impact factor: 3.402

2. A strategy for health care reform--toward a value-based system.

Authors: Michael E Porter
Journal: N Engl J Med Date: 2009-06-03 Impact factor: 91.245

3. Skilled Nursing Facilities After Total Knee Arthroplasty: The Time for Selective Partnerships Is Now!

Authors: Sean P Ryan; Daniel E Goltz; Claire B Howell; David E Attarian; Michael P Bolognesi; Thorsten M Seyler
Journal: J Arthroplasty Date: 2018-08-18 Impact factor: 4.757

4. Does "6-Clicks" Day 1 Postoperative Mobility Score Predict Discharge Disposition After Total Hip and Knee Arthroplasties?

Authors: Mariano E Menendez; Charles S Schumacher; David Ring; Andrew A Freiberg; Harry E Rubash; Young-Min Kwon
Journal: J Arthroplasty Date: 2016-02-17 Impact factor: 4.757

5. Predicting discharge outcomes after total knee replacement using the Risk Assessment and Predictor Tool.

Authors: C Tan; G Loo; Y H Pua; H C Chong; W Yeo; P H Ong; N N Lo; G Allison
Journal: Physiotherapy Date: 2013-07-02 Impact factor: 3.358

6. Decaying relevance of clinical data towards future decisions in data-driven inpatient clinical order sets.

Authors: Jonathan H Chen; Muthuraman Alagappan; Mary K Goldstein; Steven M Asch; Russ B Altman
Journal: Int J Med Inform Date: 2017-03-18 Impact factor: 4.046

7. Transcultural validation of the Risk Assessment and Predictor Tool (RAPT) to predict discharge outcomes after total hip replacement.

Authors: E Coudeyre; B Eschalier; S Descamps; A Claeys; S Boisgard; C Noirfalize; L Gerbaud
Journal: Ann Phys Rehabil Med Date: 2014-03-24