Use of electronic medical records in development and validation of risk prediction models of hospital readmission: systematic review.

Elham Mahmoudi^1,2, Neil Kamdar^2,3,4,5, Noa Kim^6, Gabriella Gonzales^7,6, Karandeep Singh^8,9, Akbar K Waljee^10,11,12.

Abstract

OBJECTIVE: To provide focused evaluation of predictive modeling of electronic medical record (EMR) data to predict 30 day hospital readmission.
DESIGN: Systematic review. DATA SOURCE: Ovid Medline, Ovid Embase, CINAHL, Web of Science, and Scopus from January 2015 to January 2019. ELIGIBILITY CRITERIA FOR SELECTING STUDIES: All studies of predictive models for 28 day or 30 day hospital readmission that used EMR data. OUTCOME MEASURES: Characteristics of included studies, methods of prediction, predictive features, and performance of predictive models.
RESULTS: Of 4442 citations reviewed, 41 studies met the inclusion criteria. Seventeen models predicted risk of readmission for all patients and 24 developed predictions for patient specific populations, with 13 of those being developed for patients with heart conditions. Except for two studies from the UK and Israel, all were from the US. The total sample size for each model ranged between 349 and 1 195 640. Twenty five models used a split sample validation technique. Seventeen of 41 studies reported C statistics of 0.75 or greater. Fifteen models used calibration techniques to further refine the model. Using EMR data enabled final predictive models to use a wide variety of clinical measures such as laboratory results and vital signs; however, use of socioeconomic features or functional status was rare. Using natural language processing, three models were able to extract relevant psychosocial features, which substantially improved their predictions. Twenty six studies used logistic or Cox regression models, and the rest used machine learning methods. No statistically significant difference (difference 0.03, 95% confidence interval -0.0 to 0.07) was found between average C statistics of models developed using regression methods (0.71, 0.68 to 0.73) and machine learning (0.74, 0.71 to 0.77).
CONCLUSIONS: On average, prediction models using EMR data have better predictive performance than those using administrative data. However, this improvement remains modest. Most of the studies examined lacked inclusion of socioeconomic features, failed to calibrate the models, neglected to conduct rigorous diagnostic testing, and did not discuss clinical impact. © Author(s) (or their employer(s)) 2019. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Year: 2020 PMID: 32269037 PMCID: PMC7249246 DOI: 10.1136/bmj.m958

Source DB: PubMed Journal: BMJ ISSN: 0959-8138

Introduction

Hospitals across the US continue to be under scrutiny to reduce their 30 day readmission rates (hereafter readmission), as a measure of both hospital quality and cost reduction. The Hospital Readmissions Reduction Program is a Medicare value based program that has reduced payments to hospitals with excess readmissions since October 2012.1 Between 2007 and 2015, readmission rates for specific conditions dropped from 21.5% to 17.5%.2 This has been largely attributed to investments by hospitals to enhance their discharge processes,2 which include providing better medication reconciliation, educating patients and their care givers regarding continuity of care, and implementing follow-up processes for discharged patients. However, implementation of an effective discharge process in hospitals is time consuming and expensive. The development of readmission risk tools has increased sharply in recent years to enable precise identification of patients at high risk and inform a more efficient use of post-discharge care coordination. However, because of the complexity of inpatient care and discharge processes, achieving a high sensitivity and specificity in predicting who is at risk of readmission and why is still a work in progress. The accuracy and reliability of risk models largely depend on predictors and methods of development, validation, calibration, and clinical utility.3 In the context of choosing an appropriate set of predictors, administrative data are inherently limited, primarily due to the lack of clinical specificity for conditions and laboratory results. With recent multibillion dollar investments in electronic medical records (EMRs) and their increasing use and application in healthcare systems,4 the use of machine learning methods in medicine has also expanded. Thus, the past few years have seen a surge in the development of highly sophisticated predictive models using EMRs.
Two previously published systematic reviews of predictive models of readmission—regardless of the data source used or whether the model was validated—assessed predictive models up to 2015.5 6 Gaps exist in the knowledge about predictive models of readmission that leverage the use of EMRs and new methods of prediction. This study focuses on validated predictive models of readmission that specifically use EMR data. We adopted the systematic review guide for evaluation of prediction model performance.7 The objectives of this study were to evaluate the variation in predicting readmission for all patients versus patient specific populations, to examine the properties of the EMR based candidate features, to assess differences in performance between traditional regression and machine learning models, and to assess the quality of the studies.

Methods

Information sources and search

We searched Ovid Medline, Ovid Embase, CINAHL, Web of Science, and Scopus by using an inclusive combination of exploded MeSH subject headings, keywords, and title, abstract, and full text keywords, with and without adjacencies when available, with a publication date range of 1 January 2015 to 1 January 2019. The last electronic database search took place in April 2019. We imported all citations into electronic citation management software (EndNote X9). Supplementary tables A-C provide detailed information on inclusion and exclusion criteria and on our search strategy.

Eligibility criteria

Studies eligible for inclusion were peer reviewed and published between 1 January 2015 and 1 January 2019. We included only studies that developed and validated a predictive model of hospital readmission within 28 or 30 days after initial discharge. We excluded studies that did not use EMR data in the development or validation of the model, studies published before 2015 owing to overlap with previous reviews,5 6 studies not published in English, and conference abstract only references (supplementary table A). We did not do an extensive hand search for this systematic review.

Study selection

After de-duplication, two authors (EM and GG) screened our initial 3506 citations for title and abstract relevance. We excluded 3206 records and accessed 300 resulting citations in their full text form. Two authors evaluated each article independently by using the inclusion and exclusion criteria shown in figure 1. Discrepancies between reviewers were resolved through additional review during group discussions.

Fig 1

Schematic flow diagram of selected studies

Data extraction

Two authors (EM and GG) extracted data from the final included studies to profile each model’s population (table 1 and table 2), candidate features (supplementary table D), model description (supplementary table E), and quality assessment (supplementary table F). To ease the cross linkage between tables 1 and 2 and the supplements, we organized all supplementary tables similarly. Firstly, we separated the included studies into two general categories: all patient populations and specific patient populations. We then listed studies in each group alphabetically according to lead author’s last name.

Table 1

Characteristics of patients and hospitals in studies that included all patient populations

Study	Study population	Hospital type	Multicenter	Derivation sample size	Validation sample size	Observed readmission rate (%)
Amarasingham et al, 20158	Adults 18+	Non-academic/large community	Yes	19 831	19 773	12.7
Brindise et al, 20189	All patients	Non-academic/large community	Yes	8814	4407	23
Chen et al, 201610	NA	Academic	No	15 629	1897	8.3
Damery et al, 201711	18+	Non-academic/large community	No	51 747	51 747	7.7
Escobar et al, 201512	18+	Non-academic/large community	Yes	179 978	180 058	Any: 14.5; non-elective: 12.5
Greenwood et al, 201813	18+	Non-academic/large community	Yes	39 155	NA	11.1
Hao et al, 201514	All patients	Non-academic/large community	Yes	24 810	Retrospective: 24 857; prospective: 118 951	Retrospective: 13.2; prospective: 14.7
Jamei et al, 201715	All patients	Non-academic/large community	Yes	268 652	67 163	9.7
Logue et al, 201616	18+	Academic	No	958	Bootstrap cross validation	14
Morris et al, 201617	18+	VA surgical quality improvement	Yes	213 697	23 744	11.1
Nguyen et al, 201618	All patients	Academic and non-academic	Yes	16 492	16 430	12.7
Rajkomar et al, 201819	18+	Academic	Yes	194 470	21 751	Hospital A: 10.5; hospital B: 15.1
Shadmi et al, 201520	18+	Academic and non-academic	Yes	22 406	11 233	15.2
Tabak et al, 201721	18+	Academic and non-academic	Yes	836 992	358 648	11.9
Tong et al, 201622	All patients	Non-academic/large community	Yes	80 000	80 000*	11.5
Walsh et al, 201723	All patients	Academic	No	92 530	27 470	All cause: 13.4
Wang et al, 201824	NA	Non-academic/large community	No	41 503/700	60:15:25 split for training, validation, and testing	Hospital data: 6; operating room data: 17.7

NA=not available; VA=Veterans Affairs.

*Different sample sizes of 2500, 5000, 20 000, and 80 000 for derivation and validation of four different models were considered. Results shown are for sample size of 80 000.

Table 2

Characteristics of patients and hospitals in studies that included specific patient populations

Study	Study population	Hospital type	Multicenter	Derivation sample size	Validation sample size	Observed readmission rate (%)
Asche et al, 201625	18+ AMI patients	Academic and non-academic	Yes	3058	Fivefold cross validation	8.9
Benuzillo et al, 201826	18+ CABG patients	Academic and non-academic	Yes	1693	896	9.15
Cheung et al, 201827	18+ heart failure patients	Non-academic/large community	No	4711	2019 (validation and testing)	13
Eby et al, 201528	18+ type 2 diabetes patients	Academic and non-academic	Yes	52 070	Bootstrap resampling with 500 iterations	10
Flythe et al, 201629	18+ hemodialysis patients	Academic	No	349	Bootstrap resampling with 1000 iterations	32.1
Golas et al, 201830	18+ heart failure patients	Academic and non-academic	Yes	11 510	Bootstrap 10-fold cross validation	23
Hatipoglu et al, 201831	18+ pneumonia patients	Academic	No	1295	393	25
Horne et al, 201632	18+ heart failure patients	Academic and non-academic	Yes	Total: 6079	Total: 2663	14.1
				Female: 3013	Female: 1318	12.5
				Male: 3066	Male: 1334	16.5
					External total: 5162	15.6
					External female: 2537	14.6
					External male: 2625	18.6
Karunakaran et al, 201833	18+ type 2 diabetes patients	Academic	No	44 203	Split sample	20.4
Mahajan et al, 201834	18+ heart failure patients	VA health center	Yes	1210	Bootstrap 10-fold cross validation	21.7
Makam et al, 201735	18+ pneumonia patients	Academic and non-academic	Yes	1463	Fivefold cross validation	13.6
McGirt et al, 201536	18+ low back surgery patients	Academic	No	1803	361	5.9
Nguyen et al, 201837	18+ AMI patients	Academic and non-academic	Yes	826	Fivefold cross validation	13
Padhukasahasram et al, 201538	18+ heart failure patients	Non-academic/large community	No	789	10-fold cross validation	54.4
Reddy et al, 201839	18+ lupus patients	Academic and non-academic	Yes	9457	70:30 split	17.2
Rubin et al, 201640	18+ type 2 diabetes patients	Academic	No	26 522	17 681	20.4
Rubin et al, 201741	18+ type 2 diabetes patients admitted to hospital for cardiovascular conditions	Academic	No	4950	3219	20
Rumshisky et al, 201642	18+ psychiatric patients	Academic	No	3281	1406	22
Shameer et al, 201643	18+ heart failure patients	Non-academic/large community	No	748	320	16.6
Taber et al, 201544	18+ kidney transplant patients	Academic	No	1147	Bootstrap cross validation	11
Wang et al, 201624	18+ heart failure patients	Academic	No	4548	Bootstrap cross validation	33
Xiao et al, 201845	18+ heart failure patients	NA	NA	3000	67:16:16 split for training, validation, and testing	NA
Zheng et al, 201546	18+ heart failure patients	NA	NA	1641	Fivefold cross validation	19.3
Zolbanian et al, 201847	18+ heart failure patients	Academic and non-academic	Yes	Heart failure: 32 350; COPD: 31 070	70:30 10-fold cross validation	NA

AMI=acute myocardial infarction; CABG=coronary artery bypass graft; COPD=chronic obstructive pulmonary disease; NA=not available.

Data synthesis

The wide heterogeneity of the included models did not permit a quantitative meta-analysis of their performance; however, we provide a qualitative review and synthesis of population studied and model characteristics. To analyze the differences between studies that used machine learning methods and those that used traditional regression or between those for all patient populations and those developed for specific populations, we assumed that every study was weighted equally regardless of the number of patients and/or methods used. If more than one model was used in a study, we chose the C statistic representing the maximum for that study. We report validation C statistics in this review; however, when the study was ambiguous about the C statistic being from either the development or validation dataset, we assumed they were being reported from the validation cohort. Finally, on further analysis, we calculated 95% confidence intervals for the C statistics of different study groups. We also calculated the 95% confidence intervals for the difference in the mean C statistic between the two study groups to ascertain potentially significant differences in concordance.
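The confidence interval calculations described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a normal approximation (z = 1.96) with equally weighted studies, and it uses the summary values reported in this review (mean C statistic 0.74, standard deviation 0.06, 15 machine learning studies; mean 0.71, standard deviation 0.07, 26 regression studies). The function names are hypothetical, and small differences from the published intervals may reflect rounding of these inputs.

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a group mean.

    Assumption: each study contributes one C statistic with equal weight.
    """
    se = sd / math.sqrt(n)  # standard error of the mean
    return mean - z * se, mean + z * se

def diff_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Confidence interval for the difference between two independent group means."""
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # unpooled standard error
    return diff, diff - z * se, diff + z * se

# Summary values as reported in the review (machine learning v regression).
ml_low, ml_high = mean_ci(0.74, 0.06, 15)
diff, diff_low, diff_high = diff_ci(0.74, 0.06, 15, 0.71, 0.07, 26)

print(f"Machine learning: {ml_low:.2f} to {ml_high:.2f}")
print(f"Difference: {diff:.2f} ({diff_low:.2f} to {diff_high:.2f})")
```

An interval for the difference that includes zero is what the review describes as no statistically significant difference between the two groups of models.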

Results

From 3506 titles and abstracts (after removing 937 duplicates), we selected 300 articles for complete text review. Our final set included 41 studies that met our inclusion criteria (fig 1). We divided these studies on the basis of their population cohort into all patient populations (n=17, including one intensive care unit and one emergency department readmission) and patient specific populations (n=24). Most patient specific models were for heart conditions (n=13).24 25 26 27 30 32 34 37 38 43 45 46 47 The remainder were based on readmission among patients with diabetes (n=4),28 33 40 41 kidney transplantation (1),44 hemodialysis (1),29 low back surgery (1),36 pneumonia (2),31 35 lupus (1),39 and psychiatric conditions (1).42 Thirty nine studies were based on data from US hospitals, and two were from other developed countries (the UK11 and Israel20). The total sample size in each model ranged from 349 to 1 195 640.21 29 All validations were done internally; most were conducted through retrospective validation (n=37) and used split sample (n=24) or cross validation (n=11) methods. The C statistics ranged between 0.52 and 0.90,23 24 with 17 studies reporting a C statistic of 0.75 or greater.11 12 14 15 16 17 19 23 29 33 34 36 37 42 43 46 47

Characteristics of patients and hospitals

Table 1 and table 2 show characteristics of patient populations and hospitals. Seventeen studies developed predictive models of readmission for all patient populations (table 1). Most studies included adults 18 years or older (n=9). Twelve studies used data from multiple hospitals. Included centers were non-academic (n=9),8 9 11 12 13 14 15 22 24 academic (n=4),10 16 19 23 or a combination of both (n=3).18 20 21 Observed readmission rates for these sets of models were between 6% and 23%.9 24 Twenty four studies developed predictive models of readmission among specific patient populations (table 2). Most (13/24 studies) of these models were developed for patients admitted with heart conditions.24 25 26 27 30 32 34 37 38 43 45 46 47 All patient specific studies included adults 18 years or older. Ten studies used data from multiple hospitals.25 26 28 30 32 34 35 37 39 47 Data came from academic centers (n=9),24 29 31 33 36 40 41 42 44 non-academic centers (n=3),27 38 43 or a combination of both (n=10).25 26 28 30 32 35 37 39 47 Observed readmission rates for patient specific readmissions ranged between 5.9% and 54%.36 38

Candidate features and predictors

Supplementary table D summarizes the features used in the predictive models. We categorized the features into five groups: clinical data, demographics, healthcare encounter history, functional status, and socioeconomic status. Using EMR data, detailed clinical and healthcare encounter data such as admission type and discharge location, primary and additional diagnoses, morbidities, laboratory results, vital signs, type and number of drugs, and basic demographics such as age, sex, race and ethnicity, and insurance type were readily available and thus examined in most of the predictive models. Additionally, Escobar et al and Morris et al used length of operating room stay in hours as a proxy for complexity of surgical procedure if the inpatient hospital stay included any surgical procedure.12 17 Being admitted to an intensive care unit and number of procedures during the index hospital stay have also been used as proxies for the complexity of a patient’s condition.15 18 19 22 25 27 28 30 38 43 A few studies used composite clinical scores that are not readily available to account for severity of conditions for patients admitted to hospital. For example, Tong et al used the Braden Score to indicate risk of pressure ulcers,22 and Escobar et al used the Comorbidity Point Score or COPS2,12 which uses 45 of the 70 possible Hierarchical Condition Categories originally developed by the Centers for Medicare and Medicaid Services to measure the severity of a patient’s comorbidity. 
Other noteworthy composite scores included severity of illness on the day of admission and discharge based on the Laboratory Acute Physiology Score,12 Acute Laboratory Risk of Mortality Score,21 polypharmacy (more than six medicines),16 22 surgical complications,41 number of laboratory results marked as “high,” “low,” or “abnormal,”9 use of specific drugs among patients admitted for heart failure,38 or use of 10 or more drugs at the time of admission among hemodialysis patients.29 Functional status is usually not recorded in structured EMR data. Only seven studies included measures of disability or limitations on activities of daily living in their models.13 16 17 20 34 36 38 Shadmi et al used a disability indicator, which is routinely collected in Clalit Health Services data in Israel,20 and this proved to be a top predictor for readmission. Morris et al used functional status available via Veterans Affairs data and nurses’ notes,17 and McGirt et al used questionnaire data in addition to EMRs to collect this information.36 Sixteen studies considered various proxies for socioeconomic status as candidate predictors for readmission.9 13 14 15 16 17 18 20 21 28 30 33 35 38 40 41 44 These studies used nurses’ notes, self-reported patient questionnaires, or census 2010 block level or zip code level aggregate data to include features such as income and education.
Despite the cited importance of care giver availability, only Greenwood et al used availability of a support person after discharge.13 A couple of studies showed that proxies for socioeconomic status (not having a high school degree, being enrolled in Medicaid, and living in a poor neighborhood) were strong predictors of readmission.35 44 The top predictors among all models mostly included healthcare encounter history (previous emergency or inpatient visits within three to six months before index hospital admission)11 14 15 and a variety of clinical data indicating the severity of the patient’s condition during the index admission (low level of albumin or using a variety of constructed severity scores).12 21 22 As stated above, a few studies also showed that disability/functional status measures and socioeconomic status were strong predictors of readmission.8 17 20 28 33 44

Predictive models

Supplementary table E summarizes characteristics of predictive models used in the included studies. Timing of the prediction is of paramount importance for institutions to operationalize these risk assessment tools for readmission. Most studies (n=23) predicted readmission right before or at discharge. Ten studies did not report the timing of their predictions; the rest reported it as within 24 hours after admission (n=3),10 23 32 before admission (n=1),20 or after discharge (n=3).30 33 34 Most of the studies (n=24) examined more than one predictive model and chose the model with the highest C statistic and fewest predictors. Although all models are presented, for the ease of representation we chose the model with the highest C statistic for each study. Out of 41 included studies, 26 used multivariable Cox or logistic regression models. Different feature selection techniques such as stepwise variable selection (forward, backward, or backward-forward methods),22 univariate binary regression, and LASSO (least absolute shrinkage and selection operator) were used.23 Fifteen studies used various machine learning methods, including Bayesian conditional probability,10 43 random forest,14 15 38 46 47 neural network,15 19 27 39 deep learning,41 AdaBoost,22 gradient boosting,30 47 natural language processing,30 36 42 and others.24 45 46 The most popular machine learning methods used were random forest and neural networks. Shrinkage methods (for both traditional and machine learning models) such as LASSO or machine learning algorithms such as AdaBoost were used to limit the number of features. On average, the C statistics for machine learning and traditional regression models were 0.74 (standard deviation 0.06; 95% confidence interval 0.71 to 0.77) and 0.71 (0.07; 0.68 to 0.73), respectively. Although the mean C statistic was higher for machine learning models, the difference was not statistically significant (difference 0.03, 95% confidence interval −0.0 to 0.07).
Furthermore, we did not find a significant difference between the C statistics for all patient (0.76, 0.72 to 0.79) and patient specific (0.72, 0.70 to 0.75) models (difference 0.03, –0.01 to 0.07). A few studies used other comprehensive methods of model evaluation, such as the integrated discrimination index and net reclassification index. For example, by calculating the clinical utility of a predictive model for a given threshold, Walsh et al also used their findings to develop a model of clinical usefulness to evaluate the potential cost of mis-calibration and to measure the value of interventions aimed at reducing readmission.23 Of all studies, eight (20%) reported sensitivity and specificity of the developed models,15 20 24 25 30 39 42 46 and seven (17%) reported positive and negative predictive values.15 20 24 25 30 32 44 Finally, five (12%) reported being implemented in the EMR system.8 13 14 24 32 Three studies used natural language processing to extract additional psychosocial information such as suicidality or excessive alcohol consumption that otherwise were not available via structured EMR data.30 36 42 For example, Rumshisky et al used 1000 informative words to extract additional data from related clinical notes, which improved the C statistic from 0.75 in the base model to 0.78.42 In addition to structured EMRs, Golas et al used natural language processing for two types of unstructured data—physicians’ notes and discharge summaries—to analyze data related to patients’ social history and treatment during admission (allergic reactions, history of illness, intolerances and sensitivities).30
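The sensitivity, specificity, and positive and negative predictive values mentioned above all derive from the same two by two table of predicted versus observed readmission. The sketch below shows one way to compute them. It assumes binary labels (1 = readmitted within 30 days) and predictions already dichotomized at some risk threshold. The function name and the example data are hypothetical and are not taken from any included study.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, and NPV for binary labels (1 = readmitted)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return NaN rather than raising when a cell total is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }

# Hypothetical example: 10 discharges, 3 of which ended in readmission.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
flagged = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(diagnostic_metrics(observed, flagged))
```

Unlike the C statistic, all four values depend on the chosen threshold, and the predictive values also depend on the readmission rate of the population.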

Quality assessment

We assessed the quality and risk of bias of studies by using six variables, including accounting for missing values, validation method, type of validation (internal versus external and prospective versus retrospective), calibration (yes/no), and scope of readmission assessment (only at studied hospitals or in a larger geographic area) (supplementary table F). The included studies used a few techniques to deal with missing values in EMRs: removing data with missing values from the analytic sample,10 11 creating a separate category for them,8 imputing their values,14 and considering missing laboratory results to be normal.21 As we included only validated models, most were of high quality with a relatively low risk of bias. However, only a few expanded the assessment of their models beyond basic C statistics to evaluate the clinical usefulness of the models.24 46 Fifteen (37%) studies calibrated their models.8 12 18 19 21 23 26 31 34 35 36 37 41 Calibration techniques such as Hosmer-Lemeshow, Platt scaling, and prevalence adjustment were used to make the model probabilities more similar to the probabilities of the population studied. However, most of these studies failed to report the number of patients in each risk group, so we were unable to estimate the average predicted readmission rate and observed-to-expected ratios for each model. Furthermore, most models measured readmission only among included hospitals instead of using a broader (regional) scope for readmission. Finally, all validations were done internally. Thus, we could not assess the generalizability and practical utility of the developed readmission risk assessment tools.
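The observed-to-expected ratios mentioned above require the number of patients and readmissions in each risk group. The sketch below shows how such ratios could be computed when patient level predictions are available. It assumes groups of roughly equal size formed by ranking patients on predicted risk, which is one common choice rather than a method reported by the included studies. The function name and example data are hypothetical.

```python
def observed_to_expected(y_true, y_prob, n_groups=10):
    """Observed and expected readmissions per risk group.

    Patients are ranked by predicted risk and split into n_groups groups of
    roughly equal size (tenths by default). Expected readmissions are the sum
    of predicted probabilities in each group.
    """
    ranked = sorted(zip(y_prob, y_true))
    n = len(ranked)
    rows = []
    for g in range(n_groups):
        group = ranked[g * n // n_groups:(g + 1) * n // n_groups]
        if not group:
            continue  # fewer patients than groups
        expected = sum(prob for prob, _ in group)
        observed = sum(outcome for _, outcome in group)
        rows.append({
            "n": len(group),
            "observed": observed,
            "expected": expected,
            "o_to_e": observed / expected if expected > 0 else float("nan"),
        })
    return rows

# Hypothetical example: a low risk and a high risk group of 10 patients each.
risk = [0.1] * 10 + [0.5] * 10
readmitted = [1] + [0] * 9 + [1] * 5 + [0] * 5
for row in observed_to_expected(readmitted, risk, n_groups=2):
    print(row)
```

A ratio close to 1 in every group suggests that predicted risks match observed readmission rates across the risk spectrum; ratios well above or below 1 point to under-prediction or over-prediction in that group.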

Discussion

In this systematic review, we reviewed 41 studies of the development and validation of predictive models of 30 day hospital readmission using electronic medical records. These models were developed to identify patients at high risk of readmission for whom coordinated discharge care might reduce the chance of early readmission. On average, the predictive ability of risk readmission models based on EMR data compared with that of previously published models using all other available datasets (administrative or survey data) has improved, from 0.67 to 0.74.5

Comparisons with other studies

Over the past few years, despite increasing use of “big data” and rich clinical information available via EMRs,4 48 and application of sophisticated machine learning methods, predicting risk of early hospital readmission with reliable accuracy has remained elusive. Hospital readmission is a complex and multidimensional problem, demanding to be better understood. Although inclusion of essential clinical data available in EMRs (such as vital signs, laboratory results, or complexity of the surgical procedure) increased the predictive ability of the models, some important clinical data were still not readily available in EMRs. For example, composite measures of severity such as the Braden Score (risk indicator for developing a pressure ulcer),22 Comorbidity Point Score (risk indicator of multimorbid severity),12 and Laboratory Acute Physiology Score (risk indicator of illness severity)12 21 have rarely been examined. Furthermore, functional status or frailty at the time of discharge, known to be an important risk factor for readmission,49 is not routinely collected and used in EMR based predictive models.50 51 Most notably, despite a large body of literature showing significant links between social and environmental factors and risk of readmission or other adverse health events,52 53 health systems are still not systematically collecting these data. Including selected social and environmental factors,54 55 such as care giver availability or housing instability,56 57 58 59 could likely substantially improve the predictive accuracy of the risk readmission models.55 To fill this void, alternative approaches have been examined. For instance, Census Bureau zip code or block level socioeconomic data have been merged with EMRs.30 33 35 Perhaps because of the imprecise nature of these aggregate data, however, many of them did not show significant difference in discriminatory power when examined in predictive models. 
Many models started using natural language processing to extract key social and environmental data from unstructured data such as physicians’ notes.60 Although physicians’ or nurses’ notes are unsystematically recorded, meaning that what is recorded by one physician may not be recorded by another, natural language processing has shown promising results for improving the accuracy of predictive models. Natural language processing can also be used to collect other salient information that is usually missing from structured EMRs, such as psychosocial or sensory statuses.30 36 42 Additionally, the quality and integrity of EMR data are of concern and have implications for leveraging these data to develop accurate and precise risk assessment.61 This systematic review emphasized that nowhere in the literature was appropriate validation of EMR data performed. Furthermore, for missing data elements, standard approaches to identify the missingness mechanism and to appropriately deal with it without compromising the data elements used for further modeling are absent. Also, as previously described, EMRs lack certain salient data elements informed by the literature for risk assessment. We hypothesize that overcoming these data quality and integrity problems would not only improve the C statistics but would also introduce significant and impactful features that are missing from currently reviewed models. Furthermore, with the emergence of big data and sophisticated machine learning methods in healthcare, the number of predictive models of hospital readmission has increased over the past few years.62 The clinical utility of machine learning methods, however, needs further attention. For example, sophisticated machine learning methods such as neural networks work like a black box, lacking transparency in selection of features. 
Thus, their relative contribution, usefulness, and interpretability in medicine need to be investigated.63 For example, class imbalance—when dichotomized outcomes are substantially different in probabilities—might cause biased predictions when machine learning is used, unless certain adjustments are made to correct it.64 Otherwise, models that are developed using imbalanced training data would intrinsically provide more accurate predictions for the class with a higher number of occurrences. Despite extreme class imbalance in hospital readmission, only a small number of reviewed predictive models adjusted for it. EMRs encompass a large repository of multidimensional data. Although traditional regression is easy to use and implement, it may not take advantage of the volume of data elements available in the EMR; however, machine learning methods are capable of using the exhaustive set of data elements for consideration. Despite the growing literature supporting machine learning methods as an alternative, coupled with potential benefits in their use for predicting readmissions, three important remaining criteria have been highlighted, which our systematic review attempts to tackle. Firstly, feature selection remains an important criterion that is predicated on having an exhaustive and diverse set of data elements available, such as socioeconomic and functional status. Subsequent studies should consider implementing sufficiently granular data elements via text mining, merging this with smaller geographic units of analysis (census tract or neighborhood level), or encouraging health systems to collect these salient attributes. Secondly, machine learning methods struggle to achieve parsimony owing to the selection of several hundred to thousands of features to predict an outcome. 
The use of machine learning methods, although fashionable and offering a potential academic exercise, fails to answer important clinical questions about the implementation and interpretability of the results.

Thirdly, machine learning methods vary substantially in their interpretability, creating barriers to clinical buy-in and to their implementation across health systems. Although interpretable machine learning methods were absent from the studies in this systematic review, the evolution of the field requires development and implementation of interpretable machine learning methods to establish clinical usefulness and inspire potential changes in practice patterns. The paucity of studies (5%) that provide information on the implementation and clinical utility of these models in the hospital setting leaves a substantial gap in understanding how these models can improve care coordination and discharge planning across readmission risk strata. Implementing interpretable models that enhance clinician “buy-in” would support identification of patients who need care coordination, so that limited resources can be allocated efficiently. Furthermore, these models would help hospitals tailor discharge protocols to patients across readmission risk groups.

Regardless of the method used for prediction, careful diagnostic tests such as C statistics, sensitivity and specificity, positive and negative predictive values, integrated discrimination index, and net reclassification index should be calculated and discussed to ensure not only the accuracy of a model but also its clinical usefulness.65 The C statistic is a measure of “discrimination” because it measures whether a model can discriminate patients at higher risk from those at lower risk. Beyond C statistics, most of the reviewed studies failed to calculate and interpret a reasonable array of other diagnostic tests, or even to assess the clinical usefulness of the models developed.
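The discrimination and threshold-based measures named above can be computed directly from their definitions. The following is a minimal sketch in Python, assuming outcomes coded 1 (readmitted) and 0 (not readmitted) and predicted risks between 0 and 1. The function names, the toy data, and the 0.5 risk threshold are illustrative assumptions and are not taken from any of the reviewed studies.

```python
from itertools import product


def c_statistic(y_true, y_score):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied pairs count as half. Equivalent to the area under
    the ROC curve for a binary outcome."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e, n in product(events, non_events)
    )
    return concordant / (len(events) * len(non_events))


def threshold_metrics(y_true, y_score, threshold):
    """Sensitivity, specificity, PPV and NPV when patients with predicted
    risk >= threshold are flagged as high risk (threshold is an assumption)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif not flagged and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Toy data for illustration only (not from any study).
readmitted = [1, 0, 1, 0, 0, 1, 0, 0]
predicted_risk = [0.8, 0.3, 0.6, 0.4, 0.2, 0.35, 0.1, 0.7]
print(c_statistic(readmitted, predicted_risk))
print(threshold_metrics(readmitted, predicted_risk, threshold=0.5))
```

In the toy data, 12 of the 15 event/non-event pairs are concordant, so the C statistic is 0.80. The same model can yield quite different sensitivity and predictive values depending on the threshold chosen, which is one reason why reporting the C statistic alone says little about clinical usefulness.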
Finally, a prediction model, regardless of how it was developed, needs to be well calibrated so that its predicted probabilities closely match the observed probabilities in the target population.23 Most of the models we reviewed either did not discuss calibration or simply used goodness-of-fit tests such as Hosmer-Lemeshow in place of full calibration. Categorizing predicted and observed counts into tenths can be difficult if discrimination is poor.66
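To make the idea of comparing predicted and observed risk by tenths concrete, the following minimal Python sketch sorts patients by predicted risk, splits them into groups of nearly equal size, and reports the mean predicted risk next to the observed readmission rate in each group. It also computes the overall gap between the observed rate and the mean predicted risk. The function names and the toy data are illustrative assumptions. This is a descriptive calibration table only, not the Hosmer-Lemeshow test statistic and not a full calibration assessment.

```python
def calibration_table(y_true, y_prob, n_groups=10):
    """Return one (mean predicted risk, observed event rate, group size)
    tuple per risk group, with groups ordered from lowest to highest risk."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = sorted(zip(y_prob, y_true))  # sort patients by predicted risk
    n = len(pairs)
    n_groups = min(n_groups, n)  # never create empty groups
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, obs_rate, len(chunk)))
    return rows


def observed_minus_predicted(y_true, y_prob):
    """Observed event rate minus mean predicted risk; 0 means the model is
    right on average, positive values mean risk is underestimated."""
    return sum(y_true) / len(y_true) - sum(y_prob) / len(y_prob)


# Toy data for illustration only (not from any study); two groups are used
# here because eight patients are too few for tenths.
readmitted = [0, 0, 0, 1, 1, 1, 1, 0]
predicted_risk = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
for mean_pred, obs_rate, size in calibration_table(readmitted, predicted_risk, n_groups=2):
    print(f"n={size} predicted={mean_pred:.2f} observed={obs_rate:.2f}")
print(observed_minus_predicted(readmitted, predicted_risk))
```

In this toy example the model is right on average but overconfident at both ends: the low risk group is predicted at 0.10 and observed at 0.25, while the high risk group is predicted at 0.90 and observed at 0.75. A single summary measure would miss this pattern, which is why a group-wise comparison or a calibration plot is more informative than a goodness-of-fit test alone.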

Limitations of study

Our study had a few limitations. Most probably, the definition and classification of variables (both predictors and outcomes) varied among models. Although improving, EMR data are not yet standardized in the way that administrative claims or survey data are. As these studies used center specific EMRs, predictive models were developed for particular hospital settings and are not generalizable at a national level; therefore, the broader clinical and practice benefit may be localized and most likely applicable to institutional quality improvement. However, if disparate EMR data can be recoded in a manner that harmonizes across different EMRs, then comparisons can be made. Data coordinating centers could plausibly guide hospitals to do this, and this should be a future direction. We attempted to synthesize the findings to the best of our ability, so future studies may benefit from testing the variables and methods that were found to be most promising. Furthermore, few rigorous studies of interventions targeting reduction in readmissions have actually shown a decrease in readmissions.67 68 Finally, as discussed above, most of the reviewed models neither included other recommended diagnostic tests besides C statistics nor discussed the clinical usefulness of their findings. To minimize bias, we chose only the highest quality models by including the ones that explicitly validated their findings.

Conclusions and policy implications

In short, despite notable progress in the development and accuracy of the models, predicting and reducing readmission remains a complex process.67 68 Most of the models developed to date have moderate predictive ability. No well accepted threshold of what constitutes an accurate C statistic exists because model discrimination is also a measure of how predictable a given outcome is. For an outcome such as 30 day readmission that is considered to be difficult to predict, a C statistic of 0.75 may be adequate for the model to be useful. In contrast, for an outcome that is readily predictable by clinical experts, even a model with a C statistic of 0.90 may not be useful. Use of EMR data and machine learning methods has created an enormous opportunity for further refinement of risk prediction tools for readmission, making them a pragmatic way for hospitals to better identify patients who are at higher risk of readmission. Continued development and tuning of these models to optimize their performance may lead toward institutional quality improvement and readmission reduction.
The development of tools to predict the risk of 30 day hospital readmission and thus enable identification of patients at high risk has increased sharply in recent years.
However, achieving a high sensitivity and specificity in predicting who is at risk of readmission and why is still a work in progress.
The accuracy and reliability of risk prediction models largely depend on predictors and methods of development, validation, calibration, and clinical utility.
On average, risk prediction models using electronic medical records have better predictive performance than those using administrative data, but this improvement remains modest.
The quality and integrity of electronic medical records are concerning and pose significant barriers to effectively leveraging these data to develop accurate and precise risk assessment tools.
Most studies did not account for salient socioeconomic features, failed to calibrate their models, and lacked careful assessment of the clinical utilities and implementation of the developed tools.
