
Development and internal validation of prediction models for future hospital care utilization by patients with multimorbidity using electronic health record data.

Marlies Verhoeff1,2, Janke de Groot2, Jako S Burgers3, Barbara C van Munster1.   

Abstract

OBJECTIVE: To develop and internally validate prediction models for future hospital care utilization in patients with multiple chronic conditions.
DESIGN: Retrospective cohort study.
SETTING: A teaching hospital in the Netherlands (542 beds). PARTICIPANTS: All adult patients (n = 18,180) who received care at the outpatient clinic in 2017 for two or more chronic diagnoses (including oncological diagnoses) and who returned for hospital care or outpatient clinical care in 2018. Development and validation were performed on a stratified random split-sample (n = 12,120 for development, n = 6,060 for internal validation). OUTCOMES: ≥2 emergency department visits in 2018, ≥1 hospitalization in 2018 and ≥12 outpatient visits in 2018. STATISTICAL ANALYSIS: Multivariable logistic regression with forward selection.
RESULTS: Evaluation of the models' performance showed c-statistics of 0.70 (95% CI 0.69-0.72) for the hospitalization model, 0.72 (95% CI 0.70-0.74) for the ED visits model and 0.76 (95% CI 0.74-0.77) for the outpatient visits model. With regard to calibration, predicted and observed probabilities agreed well at lower predicted probabilities for all models, but the models overestimated the probability for patients with higher predicted probabilities.
CONCLUSIONS: These models showed promising results for further development of prediction models for future healthcare utilization using data from local electronic health records. This could be the first step in developing automated alert systems in electronic health records for identifying patients with multimorbidity with higher risk for high healthcare utilization, who might benefit from a more integrated care approach.


Year:  2022        PMID: 35298467      PMCID: PMC8929569          DOI: 10.1371/journal.pone.0260829

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The prevalence of multimorbidity (defined as having two or more chronic conditions) is increasing [1]. Kingston et al. (2018) predicted that by 2035, 67.8% of UK adults aged over 65 years will be living with multimorbidity [2]. An increasing prevalence of multimorbidity puts pressure on current healthcare systems, as hospital organizations mostly provide disease-specific care that is generally delivered by separate disciplines or medical specialties [3,4]. Compared to patients with a single chronic condition, patients with multimorbidity have a higher risk of experiencing fragmented care, possibly resulting in suboptimal outcomes [4-8]. Fragmentation of care, especially with a lack of care coordination, can lead to adverse outcomes such as over- or undertreatment, unnecessary diagnostics and medication interactions [9-12]. If undetected, these consequences can result in unnecessary and potentially preventable healthcare utilization, such as emergency department (ED) visits, hospitalizations and outpatient visits [13]. Several (inter)national healthcare organizations suggest that quality of care for patients with multimorbidity might improve with a more integrated care approach, for example through better coordination and more tailored care [14]. This approach might also reduce the risk and related costs of adverse outcomes and decrease preventable future healthcare utilization, such as emergency department visits, acute hospitalizations and unnecessary outpatient visits [14-16]. Nevertheless, to allocate healthcare resources in a way that is both feasible and sustainable, healthcare professionals should identify the patients with multimorbidity who might benefit most from a more integrated care approach, such as those with a high modifiable risk for adverse outcomes or a high risk of frequent or acute healthcare utilization [14,17].
Several studies found that healthcare utilization as well as high costs are associated with numerous disease-related, patient-related and healthcare-related factors [18-23]. Because of this multifactorial association, it is difficult for individual healthcare professionals to quickly recognize patients with multimorbidity at high risk for future frequent or acute healthcare utilization that potentially could (partially) be prevented with a more integrated care approach. A risk screening tool might aid healthcare professionals in identifying patients who might benefit most from an integrated care approach. In other fields, several risk screening tools are available, e.g. in cardiovascular risk management and the diagnostic pathway of deep-vein thrombosis, that combine several patient-related or disease-related factors to support healthcare professionals’ decisions when dealing with individual patients [24,25]. Normally, the healthcare professional collects data on risk factors to calculate the risk for the individual patient and tailors the treatment strategy based on this risk. The registration of data in the Electronic Health Record (EHR) offers opportunities to develop, integrate and automate the data collection and calculation of an individual patient’s risk for specific outcomes, such as future healthcare utilization, using the registered individual patient data [26-28]. Therefore, the aim of this study was to develop, validate and evaluate the performance of prediction models for (1) ≥2 emergency department visits, (2) ≥1 acute hospitalization and (3) ≥12 outpatient visits in patients with multimorbidity, based on administrative EHR data.

Methods

Our study is a retrospective cohort study of a large hospital population of patients with multimorbidity. We used data on the population’s demographics and healthcare utilization in 2017 to develop and internally validate three prediction models for healthcare utilization outcomes in 2018. We followed the recommendations of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement for this article [29].

Source of data and population

The data used for this study were administrative EHR data on all adult patients with multiple chronic conditions who visited the outpatient clinic of Gelre hospital in Apeldoorn, a mid-sized teaching hospital in the Netherlands, in 2017 and 2018. We included all patients who:

■ were aged 18 years or older;
■ had received outpatient clinical care for multimorbidity, defined as at least two chronic conditions, in 2017 (both chronic and oncologic diagnoses were considered chronic conditions);
■ had received hospital care for at least one diagnosis in 2018.

The local institutional review board approved the anonymous use of these data for research purposes and granted a waiver of consent (local ethics committee Gelre ziekenhuizen (Gelre LTC), number 2019_02). In the Netherlands, hospital care is coded and billed using billing codes that include diagnosis and treatment combinations (DTCs). These DTCs contain information about the diagnosis, including an International Classification of Diseases and Related Health Problems, 10th revision (ICD-10) code. The DTC data also contain information about the location, time, type and number of care activities linked to the specific diagnoses [30]. The diagnoses were classified into 259 clinically relevant diagnosis groups using the Clinical Classifications Software (CCS) for ICD-10-PCS, developed by the Agency for Healthcare Research and Quality (AHRQ) [31]. The diagnoses from the CCS classification were categorized by Dutch Hospital Data into acute, chronic, elective, oncological and other diagnoses.

Outcomes

We included three types of healthcare utilization: acute hospitalization(s), multiple emergency department (ED) visits, and a high number of outpatient visits. Acute hospitalization(s) was defined as one or more acute hospitalizations in 2018; acute hospitalization is a major potential adverse event that, when caused by the consequences of care fragmentation, might be preventable. Multiple ED visits were defined as two or more ED visits in 2018, which is consistent with other definitions of frequent ED visits [32]. One ED visit can happen to anyone, but more than one could suggest a more chronic cause for acute care utilization; if care fragmentation and the consequences described above are present in these patients, some of the ED visits might be preventable. A high number of outpatient visits was defined as twelve or more outpatient visits in 2018, which is on average one outpatient visit per month. A recent study that developed prediction models for high care need in patients with multimorbidity from the primary care perspective also used twelve or more contacts with the general practitioner as the cut-off point [33].
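The three cut-offs above translate directly into binary outcome flags. As a minimal illustration (the study's analysis was done in R; this Python sketch uses hypothetical utilization counts):

```python
# Hypothetical 2018 utilization counts for two patients; the three binary
# outcomes are derived with the cut-offs defined above (>=1 acute
# hospitalization, >=2 ED visits, >=12 outpatient visits).
patients = [
    {"hospitalizations": 0, "ed_visits": 3, "outpatient_visits": 5},
    {"hospitalizations": 1, "ed_visits": 0, "outpatient_visits": 14},
]

outcomes = [
    {
        "acute_hospitalization": p["hospitalizations"] >= 1,
        "frequent_ed": p["ed_visits"] >= 2,
        "high_outpatient": p["outpatient_visits"] >= 12,
    }
    for p in patients
]
```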

Predictors

We selected candidate predictor variables based on existing literature and clinical expertise. The demographic characteristics consisted of age, sex, and socio-economic status [18-23,34]. Socio-economic status was based on ZIP code and classified as ‘low’, ‘medium’ and ‘high’ based on information from the Central Bureau of Statistics Netherlands [35]. The healthcare utilization characteristics included number of chronic diagnoses, number of acute diagnoses and number of medical specialties involved in the patients’ hospital care in 2017. Additionally, we calculated and included the number of outpatient visits, number of acute hospitalizations, number of inpatient days, number of ICU days and number of emergency department visits, using the information on care activities in 2017. We imputed missing values with the mean of non-missing cases [36].
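The mean-imputation step can be sketched as follows (a minimal illustration with invented socio-economic scores standing in for the patients with unknown ZIP-code status; the authors worked in R):

```python
import numpy as np

# Hypothetical socio-economic scores with missing values (np.nan).
ses = np.array([1.0, 2.0, np.nan, 3.0, 2.0, np.nan])

# Replace each missing value with the mean of the non-missing cases.
mean_ses = np.nanmean(ses)  # mean over observed values only
ses_imputed = np.where(np.isnan(ses), mean_ses, ses)
```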

Statistical analysis

We used R 3.6.1, RStudio 1.2.5001 and the pROC package (v1.17.0.1) for the statistical analysis, and the ggplot2 package (v3.3.3) for visualization [37-40]. For continuous and discrete numerical variables, we checked the linearity of the log odds by categorizing the variable into n groups and analyzing the association with the outcome using (n-1) dummy variables, reported in tables and plots. If the association was approximately linear, the variable was included in the model as a continuous variable. If the association was non-linear, the variable was split into groups and included in the model as a categorical variable. The number of categories for a variable was determined in the following steps:
1. The dataset was divided into quantiles (starting with quartiles, followed by either terciles or quintiles/sextiles, depending on the percentile distribution of the variable).
2. We assessed the quantiles' cut-off points. If these cut-off points were deemed clinically relevant, they were used. Cut-off points were considered clinically relevant if they had been used in prior research or if the researchers judged them meaningful for clinical practice. Where possible, quantile cut-off points were rounded to clinically relevant values, staying as close as possible to the quantile distribution of patients.
3. If the groups and cut-off points produced by the quantiles were not deemed clinically relevant, the cut-off points and number of groups were determined by the researchers based on clinical practicality and meaningfulness.
A minimum of 50 events per group was required. Moreover, the overlap of the confidence intervals of the groups' odds ratios was assessed; where possible, a solution with no overlap in confidence intervals between groups was preferred.
For the number of acute diagnoses and the number of emergency department visits, a linear relationship with the log odds of the outcome was found to be a good approximation after assessment using the steps described above. For age, number of chronic/oncologic diagnoses, number of specialties, number of outpatient visits, number of acute hospitalizations, number of inpatient days, number of ICU days, and number of therapeutic care activities, groups were prepared as described above, with the following cut-off points:
- Age: 55, 65 and 75 years;
- Number of chronic/oncologic diagnoses in 2017: 2, 3, 4, 5 and 6;
- Number of specialties involved in 2017: 2, 3, 4, 5 and 6;
- Number of outpatient visits in 2017: 5 and 8;
- Number of acute hospitalizations in 2017: 1 and 2;
- Number of inpatient days in 2017: 1, 4 and 8.
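The linearity check described above — grouping a numerical predictor into quantiles and inspecting the log odds per group — can be sketched like this (an illustration on simulated data, not the study's R code; the predictor, effect size and sample size are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a count predictor (e.g. prior outpatient visits) and a
# binary outcome whose log odds rise linearly with the predictor.
x = rng.integers(0, 12, size=5000)
p = 1 / (1 + np.exp(-(-2.0 + 0.25 * x)))  # true linear log odds
y = rng.random(5000) < p

# Step 1: split the predictor into quartile groups.
edges = np.quantile(x, [0.25, 0.5, 0.75])
group = np.digitize(x, edges)  # group labels 0..3

# Step 2: empirical log odds per group; roughly equal spacing between
# consecutive groups suggests the predictor can stay continuous.
log_odds = []
for g in range(4):
    rate = y[group == g].mean()
    log_odds.append(np.log(rate / (1 - rate)))
```

A plot of `log_odds` against group index (as the authors did with tables and plots) would show whether the trend is approximately linear.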

Building the prediction models

We used multivariable logistic regression with forward selection. At each step, we added the candidate predictor with the lowest p-value to the model, and we stopped adding variables when no remaining candidate predictor had a p-value < 0.05. We assessed internal validity with a weighted split-sample procedure. We randomly split the sample into development sets (two-thirds of the sample) and validation sets (one-third of the sample), aiming for a minimum of 50 cases per predictor group in the development sets [41]. We ensured the same outcome distribution in both sets by first grouping the data by outcome. To evaluate the performance of each model, we examined discrimination with a ROC curve, calculated the c-statistic (AUC) using the pROC package, and examined calibration by plotting the calibration curve.
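A minimal sketch of the outcome-stratified split and the rank-based c-statistic (AUC) described above, on simulated data (the study used R and the pROC package; this Python version is illustrative only, with invented scores and event rate):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical risk scores and outcomes (~12% event rate; events score higher).
y = rng.random(3000) < 0.12
score = rng.normal(0, 1, 3000) + 0.9 * y

# Stratified two-thirds / one-third split: shuffle events and non-events
# separately so both sets keep (almost) the same outcome rate.
def stratified_split(y, frac_dev=2 / 3, rng=rng):
    dev_idx, val_idx = [], []
    for cls in (False, True):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        cut = int(len(idx) * frac_dev)
        dev_idx.extend(idx[:cut])
        val_idx.extend(idx[cut:])
    return np.array(dev_idx), np.array(val_idx)

dev, val = stratified_split(y)

# c-statistic (AUC) via the rank formulation: the probability that a
# randomly chosen event outranks a randomly chosen non-event.
def c_statistic(y, score):
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

auc_val = c_statistic(y[val], score[val])
```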

Results

Population characteristics

Overall, 18,180 patients were included (S1 Fig). Table 1 shows the general, disease and care characteristics in 2017. The median age of the population was 68.0 years (IQR 48.1–87.8 years). Of the included patients, 61.6% had two diagnoses, 24.4% had three, and the remaining patients had four or more chronic and/or oncologic diagnoses for which they had used hospital care. With regard to the outcomes in 2018, 2257 patients (12.4%) had at least one hospitalization, 1258 (6.9%) had two or more ED visits and 1293 (7.1%) had at least 12 outpatient visits. After checking for linearity between the log odds of the outcomes and the candidate predictors, only the number of acute diagnoses in 2017 and the number of ED visits in 2017 were included without grouping; all other candidate predictors showed a non-linear relationship to the outcomes and were split into groups. The characteristics of the validation datasets were similar to those of the derivation datasets (S1 Table). The estimated socio-economic status had missing values (n = 47) because socio-economic status information was unavailable for some ZIP codes; these missing values were imputed with the mean of non-missing cases.
Table 1

Population characteristics in 2017.

Variable — Total population (n = 18,180)

General characteristics
  Age, median (IQR): 68.0 (48.1–87.8)
  Age groups, n (%):
    18–54 years: 4132 (22.7)
    55–64 years: 3585 (19.7)
    65–74 years: 5369 (29.5)
    75–98 years: 5094 (28.0)
  Sex, female, n (%): 10289 (56.6)
  Socioeconomic status, n (%):
    Low: 7243 (39.8)
    Middle: 7244 (39.9)
    High: 3693 (20.3)

Disease characteristics
  Chronic/oncologic diagnoses, median (IQR): 2 (2–3)
  Chronic/oncologic diagnosis groups, n (%):
    2 diagnoses: 11203 (61.6)
    3 diagnoses: 4433 (24.4)
    4 diagnoses: 1653 (9.1)
    5 diagnoses: 575 (3.2)
    ≥6 diagnoses: 316 (1.7)
  Acute diagnoses, median (IQR): 0 (0–1)

Hospital care characteristics
  Medical specialties involved, median (IQR): 3 (2–5)
  Medical specialty groups, n (%):
    2 specialties: 5559 (30.6)
    3 specialties: 5817 (32.0)
    4 specialties: 3662 (20.1)
    5 specialties: 1797 (9.9)
    ≥6 specialties: 1345 (7.4)
  Outpatient visits, median (IQR): 6 (2–10)
  Outpatient visit groups, n (%):
    2–4 visits: 6337 (34.9)
    5–7 visits: 6176 (34.0)
    ≥8 visits: 5667 (31.2)
  Acute hospitalizations, median (IQR): 0 (0–0)
  Acute hospitalization groups, n (%):
    0 hospitalizations: 14847 (81.7)
    1 hospitalization: 2479 (13.6)
    ≥2 hospitalizations: 854 (4.7)
  Inpatient days, median (IQR): 0 (0–2)
  Inpatient day groups, n (%):
    0 days: 13402 (73.7)
    1–3 days: 1504 (8.3)
    4–7 days: 1531 (8.4)
    ≥8 days: 1743 (9.5)
  ICU days, median (IQR): 0 (0–0)
  Patients with at least 1 ICU admission, n (%): 265 (1.4)
  Emergency department visits, median (IQR): 0 (0–1)

Model 1: Predicting at least one hospitalization in 2018 (Table 2)

A higher age in 2017 was associated with at least one hospitalization in 2018. With age group 18–54 years as reference, the OR increased per age group, from 1.53 (95% CI 1.25–1.87) for ages 55–64 years to 2.54 (95% CI 2.13–3.04) for ages 75 years or older. In the univariable analysis, the number of chronic/oncologic diagnoses was significantly associated with at least one hospitalization in 2018, but in the multivariable analysis these associations disappeared, except for the group with six or more chronic/oncologic diagnoses (OR 1.80, 95% CI 1.29–2.50). Moreover, variables of acute care utilization in 2017 were significant predictors in the multivariable model for hospitalization in 2018. Patients with two or more hospitalizations in 2017 had 1.55 times higher odds (95% CI 1.11–2.16) of at least one hospitalization in 2018 than patients without a hospitalization in 2017, and every ED visit in 2017 increased the odds of at least one hospitalization in 2018 by a factor of 1.23 (95% CI 1.15–1.33). Compared to patients with no inpatient days, patients with eight or more inpatient days had 1.47 times higher odds (95% CI 1.13–1.89) of at least one hospitalization in 2018.

Model 2: Predicting 2 or more ED visits in 2018 (Table 3)

Age in 2017, the number of outpatient visits in 2017 and variables of acute healthcare utilization in 2017 were predictors of 2 or more ED visits in 2018. The association of the number of chronic/oncologic diagnoses in 2017 with 2 or more ED visits in 2018 in the univariable analysis disappeared in the multivariable analysis, with the exception of the group with six or more chronic diagnoses. In the multivariable model, patients with 4–7 or ≥8 inpatient days had 1.22 (95% CI 0.99–1.51) and 1.72 (95% CI 1.37–2.17) times higher odds, respectively, of visiting the ED 2 or more times in 2018 than patients with no inpatient days. Every ED visit in 2017 increased the odds of visiting the ED 2 or more times in 2018 by a factor of 1.49 (95% CI 1.39–1.61). Moreover, patients with eight or more outpatient visits in 2017 had an OR of 1.80 (95% CI 1.29–2.50) for visiting the ED twice or more in 2018.

Model 3: Predicting 12 or more outpatient visits in 2018 (Table 4)

Age, the number of chronic/oncologic diagnoses, a higher number of involved medical specialties, and the numbers of outpatient and ED visits were significant predictors in the multivariable model for 12 or more outpatient visits in 2018. The number of outpatient visits in 2017 was the strongest predictor of 12 or more outpatient visits in 2018: the OR was 2.13 (95% CI 1.52–2.61) for five to seven outpatient visits and 4.76 (95% CI 3.63–6.29) for eight or more outpatient visits, compared to patients with two to four outpatient visits.

Performance and internal validation of the models

Evaluation of the models' performance showed c-statistics of 0.70 (95% CI 0.69–0.72) for the hospitalization model, 0.72 (95% CI 0.70–0.74) for the ED visits model and 0.75 (95% CI 0.73–0.76) for the outpatient visits model. The c-statistics in the validation sets were similar to those in the development sets: 0.69 (95% CI 0.67–0.71) for the hospitalization model and 0.75 (95% CI 0.73–0.78) for the outpatient visits model. The model predicting two or more ED visits performed worse in the validation set, with a c-statistic of 0.67 (95% CI 0.64–0.70). The full prognostic models, including intercepts and model performance measures for the development and validation sets, are included in the supplementary tables (see S2–S4 Tables). The models' calibration curves (Fig 1) show that predicted and observed probabilities agreed at lower predicted probabilities for the hospitalization and ED visit models, but that these models overestimated the probability for patients with higher predicted probabilities. For the outpatient visit model there was good agreement overall, with a slight underestimation of the probability for patients with intermediate predicted probabilities and an overestimation for patients with higher predicted probabilities.
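A calibration curve of the kind shown in Fig 1 can be computed by binning patients into deciles of predicted probability and comparing the mean predicted probability with the observed event rate per decile. A sketch on simulated data (the miscalibration pattern here is constructed to mimic, not reproduce, the reported one; all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical predicted probabilities that systematically overestimate the
# true risk at the high end, mimicking the pattern reported for the acute
# care models.
p_pred = rng.uniform(0.01, 0.8, 4000)
p_true = np.minimum(p_pred, 0.35)  # true risk capped: overestimation above 0.35
y = rng.random(4000) < p_true

# Calibration curve: decile bins of predicted probability, then mean
# predicted probability vs observed event rate per bin.
edges = np.quantile(p_pred, np.linspace(0, 1, 11))
bins = np.clip(np.digitize(p_pred, edges[1:-1]), 0, 9)
pred_mean = np.array([p_pred[bins == b].mean() for b in range(10)])
obs_rate = np.array([y[bins == b].mean() for b in range(10)])
```

Plotting `obs_rate` against `pred_mean` (with the diagonal as reference) gives the calibration curve: the low deciles sit near the diagonal, while the top deciles fall below it (overestimation).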
Fig 1

Calibration curves for the three prediction models.

The models for hospitalization and ED visits overestimate the risk in patients with a higher predicted risk. The model for outpatient visits has a reasonable agreement between predicted and actual risk.


Integrating prediction models into the EHR

An individual patient's risk can be calculated using the regression coefficients in the supplementary tables (see S2–S4 Tables). Fig 2 shows an example of how the calculated predicted risk, including the probability percentile (top-X% risk group, see S2 Fig), for the three outcomes could be reported to an individual healthcare professional for new, fictitious patients.
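The risk calculation from a fitted logistic model works as follows: sum the intercept and the coefficients of the predictors that apply to the patient, then transform the linear predictor with the logistic function. A sketch with invented placeholder coefficients (the real values are in S2–S4 Tables; the predictor names here are hypothetical):

```python
import math

# Hypothetical intercept and log-odds-ratio coefficients standing in for
# one of the published models (the actual values are in S2-S4 Tables).
intercept = -2.6
coefs = {
    "age_75_plus": 0.93,
    "chronic_dx_6_plus": 0.59,
    "inpatient_days_8_plus": 0.39,
}
per_ed_visit = 0.21  # coefficient per ED visit in 2017 (continuous predictor)

def predicted_risk(patient):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum of terms)))."""
    lp = intercept
    lp += sum(coefs[k] for k, v in patient.items() if k in coefs and v)
    lp += per_ed_visit * patient.get("ed_visits_2017", 0)
    return 1 / (1 + math.exp(-lp))

# A 78-year-old patient with two ED visits in 2017 (fictitious example).
risk = predicted_risk({"age_75_plus": True, "ed_visits_2017": 2})
```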
Fig 2

Predicted probabilities for two fictive patients using the developed models and indicating the probability percentile per outcome.

Discussion

The aim of this study was to develop and validate prediction models for future (1) ≥2 emergency department visits, (2) ≥1 acute hospitalization and (3) ≥12 outpatient visits in patients with multimorbidity, using existing administrative EHR data. Our results suggest that local administrative EHR data can be used to locally develop and validate reasonably performing prediction models for these outcomes. All prediction models also performed reasonably well in the validation sets (see S2–S4 Tables). The predicted and actual probabilities showed good agreement in each model, with a tendency to overestimate the actual probability in the higher risk groups for ≥1 hospitalization and ≥2 ED visits. In line with other research, our study shows that administrative EHR data can be used to develop reasonable prediction models for healthcare utilization [42]. In a systematic review by Wallace et al. (2014), the best performing models to predict acute hospitalization using administrative or clinical record data had similar c-statistics in development studies, ranging from 0.68 to 0.83 [42]. Hudon et al. (2020) developed prediction models to predict four or more ED visits and reported c-statistics of 0.76 and 0.79 [43]. Compared to these models, our acute unplanned care models scored well. However, we were unable to compare calibration, because these studies did not show calibration curves for their data. To our knowledge, no studies have developed prediction models for high numbers of outpatient visits, but models using administrative data from primary care to predict persistent frequent attendance and ≥12 general practitioner visits reported c-statistics of 0.67 and 0.83, respectively, which is consistent with the c-statistic of 0.75 in our validation set [33,44].
In all three of our models, age, ≥6 chronic/oncologic diagnoses, the number of ED visits, and a higher number of outpatient visits in the prior year were significant predictors of healthcare utilization one year later. This is consistent with the most accurate acute care models described by Wallace et al. (2014), in which age, prior healthcare utilization and a multimorbidity measure were among the most important predictors [42]. However, we had expected the number of chronic diagnoses to be a stronger predictor, based on the association between multimorbidity and healthcare utilization and healthcare costs reported in prior research [21,23]. Consistent with Heins et al. (2020), we found that a higher number of outpatient visits in the prior year was the strongest predictor of a high number of outpatient visits one year later, and that it also predicted hospitalization and ED visits one year later [33]. These findings suggest that it is feasible to include measures of age, multimorbidity and prior healthcare utilization in models to predict future healthcare utilization. A strength of our study is that we developed reasonably performing prediction models using local administrative EHR data. Prediction models based on national or regional data might perform worse in a local population and be less applicable due to local variation [42]. Our results suggest that administrative data from the local EHR are sufficient to develop reasonably performing prediction models. Another strength of our study is the inclusion of a large, general hospital population with multimorbidity to develop these prediction models. This population matches the general definition of multimorbidity [1]. The prediction models can aid healthcare professionals in the hospital in differentiating between patients with multimorbidity in the general hospital population.
The combination of variables such as age, number of chronic diagnoses and prior healthcare utilization, and the associated risk for adverse outcomes, can be used in addition to the clinical assessment. Another strength of our study is the interpretability of the models. Compared to black-box models, which tend to have the best performance, models such as multivariable logistic regression generally have lower accuracy but are more interpretable, which is beneficial for use in the clinical setting [45]. Models with high interpretability can offer insight into the relative importance of each predictor and can help form hypotheses about how and why the model predicts a high probability for certain patients. Further research using newer techniques with regularization and sample size calculation for the required events per candidate predictor might improve the accuracy of the models without losing interpretability [46,47]. Moreover, other interpretable models, such as classification trees and random forests, may perform better if there are important interaction effects between predictor variables. A limitation of our prediction models is their overestimation of the actual probability in the higher risk groups, especially for the acute care models. However, if a higher predicted risk for healthcare utilization is considered an indication of need for support, this overestimation could be acceptable as long as the models are used in combination with a clinical assessment of the need for support. Digitalization of health records and data generation open up the possibility of integrating prediction models into the EHR and using recent EHR data and machine learning for automatic stratification of patients with multimorbidity at risk for adverse outcomes [26,48,49].
However, future impact studies should evaluate whether patients identified as high risk are indeed patients with a high modifiable risk for these outcomes and whether they would benefit from a more integrated care approach. Moreover, factors like perceived health status, coordination of care, health literacy and the reason for healthcare utilization are not included as variables in these models and are not a standard part of EHR registration. In the future, further development and use of artificial intelligence solutions could aid in retrieving information that is not routinely registered in the EHR. Including these factors in the models or in the identification process might add valuable information about a patient's need for more integrated care [6,50]. The models' performance and the identification process could also be improved by adding variables from other (local) data sources, e.g. mortality from the municipal database or the number of general practitioner visits from local general practitioners' databases [26]. Adding a prediction model for mortality to the identification process could be valuable, as the majority of patients approaching the end of life are not appropriately identified as such, and they might also benefit from a more integrated care approach to enhance adequate advance care planning [51]. Our prediction models can be considered a useful example of how local prediction models could support individual healthcare professionals in identifying a high risk of hospitalization, emergency department visits and outpatient visits in patients with multimorbidity. Hospitals could use their own administrative data, together with our predictors for hospitalizations, emergency department visits and outpatient visits in patients with multimorbidity. By locally developing and validating these models, local variation in the hospital population is taken into account.
The development and internal validation of local prediction models could be the first step in developing an automated alert system in EHRs for identifying patients with multimorbidity who might benefit from an integrated care approach.

Supporting information

S1 Fig. Flow chart of patients included in the final dataset. (TIF)

S2 Fig. Distribution of predicted probabilities in the development datasets based on the developed models, with cut-off values for top 5% and top 10% probabilities. (TIF)

S1 Table. Comparison of development and validation data. The total dataset (n = 18,180) was split randomly three times, weighted for every outcome. (PDF)

S2 Table. Full prognostic model, including intercept and model performance measures for the derivation and validation sets, for the outcome '≥1 hospitalization(s) in 2018'. (PDF)

S3 Table. Full prognostic model, including intercept and model performance measures for the derivation and validation sets, for the outcome '≥2 ED visits in 2018'. (PDF)

S4 Table. Full prognostic model, including intercept and model performance measures for the derivation and validation sets, for the outcome '≥12 outpatient visits in 2018'. (PDF)
(PDF) Click here for additional data file. 23 Aug 2021 PONE-D-21-09724 Predicting future hospital care utilization by patients with multimorbidity using electronic health record data PLOS ONE Dear Dr. Verhoeff, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Oct 07 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. 
For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Ram Chandra Bajpai, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (if provided):
- As this seems to be a model development study, the authors should include this in the title and clearly mention it in the methods section.
- Forward stepwise variable selection is not recommended when building a prediction model, as regression coefficients may be unstable and some important combinations of predictors may be missed. The authors should use the recommendations by Heinze et al 2018 for variable selection (https://onlinelibrary.wiley.com/doi/full/10.1002/bimj.201700067).
- The patient selection process would be better represented by a flow diagram, so the authors should consider one.
- Why did the authors not formally calculate power for each outcome of interest? The authors should use Riley et al 2020 (https://www.bmj.com/content/368/bmj.m441) to demonstrate the appropriateness of study power for each outcome.
- The authors have used a weighted split-sample procedure for internal validation. This statement requires a proper citation.
- The authors should also consider some additional model calibration measures, such as the Brier score and the expected/observed event ratio.
- The model algorithm (or final equation) for each outcome must be presented in the manuscript so others will know how to calculate the risk for a given patient.
- The included figures are not clear and readable. Kindly add high-resolution figures.

Journal Requirements: When submitting your revision, we need you to address these additional requirements.

1.
Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for including your ethics statement: "The local institutional review board approved the anonymous use of these data for research purposes and a waiver of consent (Gelre LTC number 2019_02)". Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study. Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the "Ethics Statement" field of the submission form (via "Edit Submission"). For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research.

3. Please correct your reference to "p=0.000" to "p<0.001" or as similarly appropriate, as p values cannot equal zero.

4. In your Methods section, please ensure that sufficient information to make the study reproducible is provided (for example, by describing the models and equations used, and describing parameters and assumptions applied).

5. Please upload a new copy of Figure 1, as the detail is not clear. Please follow the link for more information: https://blogs.plos.org/plos/2019/06/looking-good-tips-for-creating-your-plos-figures-graphics/

Reviewers' comments · Reviewer's Responses to Questions

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes. Reviewer #2: Yes.

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes. Reviewer #2: Yes.

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data, e.g. participant privacy or use of data from a third party, those must be specified.
Reviewer #1: Yes. Reviewer #2: Yes.

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes. Reviewer #2: Yes.

5. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for the opportunity to review the submission of the manuscript entitled "Predicting future hospital care utilization by patients with multimorbidity using electronic health record data". The authors have written a very succinct and interesting manuscript, which will contribute to the knowledge base of multimorbidity. I have included suggested revisions below and would be happy to review a resubmission.
- Could the authors please further clarify why these separate groups of patients were created: "who had received outpatient clinical [care?] for two or more chronic diagnoses, two or more oncological diagnoses or at least one chronic and one oncological diagnosis in 2017 and who had received hospital care for at least one diagnosis in 2018"?
- Please change "multimorbid patients" to "patients with multimorbidity".
- Please change "probability for patient" as it is not clear what this sentence is describing.
- Could the authors please clarify whether data about the reason(s) for hospitalizations, emergency department visits or outpatient visits were available? If these data were not available, what potential insight would they provide, and if they were available, why were they not included in the analyses?

Reviewer #2: This paper seeks to establish whether data from electronic health records can be used to predict future healthcare utilization as measured by ≥1 acute hospital stays, ≥2 ED visits, or ≥12 outpatient visits. Numerous studies have been conducted on this topic, but this one is unique in that it is focused on the population with multimorbidity, defined as two or more chronic conditions. The results show that with basic information from the EHR, future health utilization can be predicted with mild accuracy (c-statistic around 0.70). Overall, the manuscript is technically sound, and the results presented are appropriate. The sample size is large enough for the methods used. One strength is that the authors used the train/testing approach to build their models on two-thirds of the data and validate on the remaining one-third. The results presented in the manuscript and supplementary material provide a good deal of transparency, and the methods are described well enough to replicate the study.

Although the methods used are valid, they are not state-of-the-art when it comes to building models where the main goal is prediction (as opposed to inference). Linear regularization methods like LASSO and Elastic-Net generally perform better at prediction than stepwise logistic models. Further, methods like classification trees and Random Forest may do better if there are important interaction effects between predictor variables. The authors should address this in the limitations. Further, they may want to highlight some of the strengths of the model they used, namely that it is interpretable, compared to many other "black box" methods.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No. Reviewer #2: No.

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/.
PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

4 Oct 2021 · Response to the Editor

Thank you very much for your time and for considering our manuscript for publication. Moreover, thank you for your additional comments and suggestions, which we have addressed point by point below and in the Response to Reviewers file.

- As this seems to be a model development study, the authors should include this in the title and clearly mention it in the methods section.
Thank you for your suggestion; we have changed the title of our manuscript and added this specification to our methods (p1, line 4; p4, line 94).

- Forward stepwise variable selection is not recommended when building a prediction model, as regression coefficients may be unstable and some important combinations of predictors may be missed. The authors should use the recommendations by Heinze et al 2018 for variable selection (https://onlinelibrary.wiley.com/doi/full/10.1002/bimj.201700067).
Thank you for your input and the provided literature. After careful consideration of the provided literature, it seems that there are two issues:
1. We realized that the term "stepwise" in some papers means automated selection using an algorithm that determines at each step which variable is added to the model next. However, we did not use an automated algorithm for this selection, but evaluated the effect of each additional variable ourselves. Therefore, it might be less confusing to use the term "forward selection" instead of "forward stepwise selection". We have changed this term in our manuscript (p8, line 185).
2. The choice of forward selection instead of backward selection. As we understand it, Heinze et al (2018) prefer backward elimination over forward selection. The reference that Heinze et al (2018) used to support their preference is Mantel (1970), but in the same journal Baele (1970) already critically commented on Mantel's ideas. There is still an ongoing debate about the best method for developing prediction models. To our knowledge, both forward selection (FS) and backward elimination (BE) have limitations: for example, FS can miss an important combination of predictors, whereas BE might discard an important predictor due to a "nonsense" correlation between variables (Baele (1970)). We decided on forward selection mainly because our sample size was large enough and it provided us with the opportunity to see the additional effect of each new variable added to the model. This also gave us the opportunity to carefully evaluate whether collinearity was present, causing coefficients and standard errors to blow up.

- The patient selection process would be better represented by a flow diagram, so the authors should consider one.
We understand that 'plain text' representation of the inclusion criteria might not be ideal. We have considered presenting the data in a flow diagram, but feel that the information might be more readable as a list of bullets, which we added to the methods section. We hope you will agree. However, we have also added a flow diagram as extra supplementary material (named S2_fig) to consider for publication (we have not yet added this figure to the manuscript); we leave the decision to the Editor (p5, line 101-105).

- Why did the authors not formally calculate power for each outcome of interest? The authors should use Riley et al 2020 (https://www.bmj.com/content/368/bmj.m441) to demonstrate the appropriateness of study power for each outcome.
Thank you for your suggestion.
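As an aside, the manual forward-selection procedure described above (adding one candidate predictor at a time, keeping the one that most improves discrimination, and stopping when the gain becomes negligible) can be sketched in Python. This is a hypothetical illustration on simulated data; the variables, the simple split sample, and the 0.005 stopping threshold are assumptions, not the authors' actual choices or code:

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Plain Newton (IRLS) logistic regression with an intercept."""
    Xb = np.column_stack([np.ones(len(y)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        H = Xb.T @ (Xb * (p * (1 - p))[:, None]) + 1e-8 * np.eye(Xb.shape[1])
        w += np.linalg.solve(H, Xb.T @ (y - p))
    return w

def c_statistic(scores, y):
    """c-statistic (AUC) via the Mann-Whitney rank formula."""
    ranks = np.empty(len(y))
    ranks[np.argsort(scores)] = np.arange(1, len(y) + 1)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n0 * n1)

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 5))                  # 5 simulated candidate predictors
true_logit = 1.2 * X[:, 0] - 0.8 * X[:, 2]   # only predictors 0 and 2 matter
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_logit)))
tr, te = slice(0, 3000), slice(3000, None)   # simple split sample

selected, best_auc = [], 0.5
remaining = list(range(X.shape[1]))
while remaining:
    aucs = {}
    for j in remaining:                      # try each remaining candidate
        cols = selected + [j]
        w = fit_logistic(X[tr][:, cols], y[tr])
        score = np.column_stack([np.ones(1000), X[te][:, cols]]) @ w
        aucs[j] = c_statistic(score, y[te])
    j_best = max(aucs, key=aucs.get)
    if aucs[j_best] - best_auc < 0.005:      # arbitrary minimum-gain threshold
        break
    selected.append(j_best)
    best_auc = aucs[j_best]
    remaining.remove(j_best)
```

With this setup the loop recovers the two informative predictors and stops once the noise variables add no held-out discrimination, which mirrors the "evaluate the effect of each additional variable" procedure the response describes.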
We understand that Riley et al (2020) offer new insights into the events per variable or events per candidate predictor parameter (EPP) and supply a new method to calculate the EPP. When we designed this study, this article was not yet available, so we used the commonly applied rule of thumb (at least 10-15 events per variable, as mentioned by Heinze et al 2018) with a large margin (the lowest number of EPV was 27). We have added a remark on sample size calculation to the discussion section with reference to Riley et al (2020) (p20, line 332).

- The authors have used a weighted split-sample procedure for internal validation. This statement requires a proper citation.
Thank you for your suggestion; we have added the required reference (p8, line 190).

- The authors should also consider some additional model calibration measures, such as the Brier score and the expected/observed event ratio.
Thank you for your suggestions. We chose calibration curves because we feel that they give the best insight into the agreement between observed and expected probability. Our idea for the use of local prediction models for multimorbidity is to identify (groups of) patients with higher risks for the outcome compared to other patients. We did not expect to be able to perfectly predict whether or not patients will have the outcome, as both multimorbidity and the causes of the outcomes are complex and influenced by many factors that could not be included using EHR data. Therefore, we feel that it is most important to show the agreement between observed and predicted probability. This enables users to assess how trustworthy the relatively higher risk is.

- The model algorithm (or final equation) for each outcome must be presented in the manuscript so others will know how to calculate the risk for a given patient.
Thank you for your suggestion. The aim of our research was to develop, validate and evaluate the performance of these prediction models based on EHR data.
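The calibration discussion above (calibration curves as the primary check, with the Brier score as a suggested addition) can be illustrated with a minimal simulated sketch. The data and the 1.4 overestimation factor below are invented purely to mimic the reported pattern of overestimation at high predicted probabilities; this is not the authors' data or code:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
p_true = rng.uniform(0.01, 0.6, size=n)      # simulated true event probabilities
y = rng.binomial(1, p_true)                  # observed binary outcomes
p_hat = np.clip(1.4 * p_true, 0.0, 1.0)      # model that overestimates high risks

# Brier score: mean squared difference between prediction and outcome
brier = np.mean((p_hat - y) ** 2)            # lower is better

# Decile-based calibration table: group patients by predicted-risk decile and
# compare mean predicted probability with the observed event rate per group.
edges = np.quantile(p_hat, np.linspace(0.1, 0.9, 9))
decile = np.digitize(p_hat, edges)
pred_mean = np.array([p_hat[decile == d].mean() for d in range(10)])
obs_rate = np.array([y[decile == d].mean() for d in range(10)])
# Plotting pred_mean against obs_rate gives the calibration curve; here the
# low deciles agree while the top decile's predictions exceed the observed rate.
```

The per-decile comparison is the tabular counterpart of a calibration curve, and the Brier score summarizes overall accuracy of the predicted probabilities in a single number.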
We show that it was possible to develop prediction models with mild accuracy using EHR data. It was not our primary aim to develop prediction models that will be used by others, but we do understand that others might want to use the models and/or check their external validity, for example. For this purpose, we have provided the full models in the supplementary material, with the regression coefficients that can be used to calculate the risk for a given patient. The information for calculation available in the supplementary material could also be published in the main text; we leave this decision to the Editor.

- The included figures are not clear and readable. Kindly add high-resolution figures.
Thank you for your comment. We have added figures with higher resolution, used PACE to check the requirements, and hope they are clear and readable now.

Journal requirements

- Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.
Thank you for your comment. We aimed to follow all of PLOS ONE's style requirements, but apologize if we missed any. Please let us know if any requirements are not met in the submission of the revised manuscript.

- Please amend your current ethics statement to include the full name of the ethics committee/institutional review board(s) that approved your specific study, and add the same text to the "Ethics Statement" field of the submission form (via "Edit Submission").
Thank you for your comment; we have made the requested amendments (p5, line 107).

- Please correct your reference to "p=0.000" to "p<0.001" or as similarly appropriate, as p values cannot equal zero.
Thank you for your comment; we have corrected this in all tables in the manuscript.

- In your Methods section, please ensure that sufficient information to make the study reproducible is provided (for example, by describing the models and equations used, and describing parameters and assumptions applied).
Thank you for your comment. As reviewer #2 states that the methods are described well enough to replicate the study, we are uncertain about what information is missing. The full models, with regression coefficients that can be used to calculate the individual risk for a patient, are added as supplementary tables.

- Please upload a new copy of Figure 1 as the detail is not clear.
Thank you for your comment. We have re-added Figure 1 with higher resolution and hope it is clear and readable now.

Reviewer #1

- Thank you for the opportunity to review the submission of the manuscript entitled "Predicting future hospital care utilization by patients with multimorbidity using electronic health record data". The authors have written a very succinct and interesting manuscript, which will contribute to the knowledge base of multimorbidity. I have included suggested revisions below and would be happy to review a resubmission.
Thank you very much for your time and positive feedback on our manuscript. Moreover, thank you for your comments and suggestions, which we address point by point below and in the Response to Reviewers file.

- Could the authors please further clarify why these separate groups of patients were created: "who had received outpatient clinical [care?] for two or more chronic diagnoses, two or more oncological diagnoses or at least one chronic and one oncological diagnosis in 2017 and who had received hospital care for at least one diagnosis in 2018"?
Thank you for your question. When designing this study, we aimed to study a hospital population with multimorbidity. As described, multimorbidity is generally defined as two or more chronic conditions.
In the Netherlands, the Clinical Classification Software diagnoses are categorized into acute, chronic, elective, oncological and other diagnoses. We discussed the oncological diagnoses, because the clinicians in our research group felt that much oncological care has, over the years, turned into a specific type of chronic care. That is why we decided to include oncological diagnoses as chronic conditions. We have changed the description of this inclusion criterion in the methods to further clarify the inclusion. Furthermore, we only included patients who had received hospital care for at least one diagnosis in 2018, because we were interested in those patients at whom we might aim an intervention to coordinate and tailor hospital care (p5, line 103-105).

- Please change "multimorbid patients" to "patients with multimorbidity".
Thank you for your suggestion; we have changed this text (p6, line 132).

- Please change "probability for patient" as it is not clear what this sentence is describing.
Thank you for your careful reading. It appears that an 's' was missing; the sentence should have read "probability for patients with ..". We have added the 's' and hope the sentence is clear with this adjustment (p17, line 264).

- Could the authors please clarify whether data about the reason(s) for hospitalizations, emergency department visits or outpatient visits were available? If these data were not available, what potential insight would they provide, and if they were available, why were they not included in the analyses?
Thank you for your questions. The data we used for the development of the models are registered in the EHR for both medical record purposes and financial claims; they are part of the so-called diagnosis and treatment combinations, which makes them easily retrievable from the EHR. Reason(s) for hospitalizations, ED visits or outpatient visits are not required for financial claims to insurance. In general, data on reasons for healthcare use are registered as open text and thus more often contain missing data. However, we do agree that reasons for visiting the hospital could give valuable insight, such as the relationship to the presence of certain conditions or the severity of present diseases (e.g. many hospitalizations or outpatient visits for a specific condition might suggest unstable disease). Further development and use of artificial intelligence solutions could aid in retrieving the reason(s) for healthcare utilization in the future. We have added a remark on the possible future use of AI to include information that is not a standard part of EHR registration to the discussion section (p20, line 347).

Reviewer #2

- This paper seeks to establish whether data from electronic health records can be used to predict future healthcare utilization as measured by ≥1 acute hospital stays, ≥2 ED visits, or ≥12 outpatient visits. Numerous studies have been conducted on this topic, but this one is unique in that it is focused on the population with multimorbidity, defined as two or more chronic conditions. The results show that with basic information from the EHR, future health utilization can be predicted with mild accuracy (c-statistic around 0.70). Overall, the manuscript is technically sound, and the results presented are appropriate. The sample size is large enough for the methods used. One strength is that the authors used the train/testing approach to build their models on two-thirds of the data and validate on the remaining one-third. The results presented in the manuscript and supplementary material provide a good deal of transparency, and the methods are described well enough to replicate the study.
Thank you very much for your time and positive feedback. Moreover, thank you for your comments and suggestions, which we address point by point below and in the Response to Reviewers file.
- Although the methods used are valid, they are not state-of-the-art when it comes to building models where the main goal is prediction (as opposed to inference). Linear regularization methods like LASSO and Elastic-Net generally perform better at prediction than stepwise logistic models. Further, methods like classification trees and Random Forest may do better if there are important interaction effects between predictor variables. The authors should address this in the limitations. Further, they may want to highlight some of the strengths of the model they used, namely that it is interpretable, compared to many other "black box" methods.
Thank you for your comment and the useful suggestions. We have added a paragraph to the discussion to address the points that you raised (p19-20, line 325-333).

Submitted filename: Response to reviewers.docx

18 Nov 2021 · PONE-D-21-09724R1
Development and internal validation of prediction models for future hospital care utilization by patients with multimorbidity using electronic health record data

Dear Dr. Verhoeff,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
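As a hedged illustration of the regularization methods the reviewer raised above: an L1-penalized (LASSO) logistic regression performs shrinkage and variable selection in a single fit, zeroing out uninformative coefficients. The sketch below is a bare-bones proximal-gradient (ISTA) implementation on simulated data; it is not the authors' method or data, and the penalty strength `lam=0.05` is an arbitrary choice for illustration:

```python
import numpy as np

def lasso_logistic(X, y, lam, lr=0.5, iters=3000):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient
    (ISTA). No intercept, which is fine here since E[y] is 0.5 by design."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (prob - y) / n          # gradient of the mean log-loss
        w = w - lr * grad                    # gradient step ...
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # ... then soft-threshold
    return w

rng = np.random.default_rng(1)
n, p = 3000, 10
X = rng.normal(size=(n, p))
logit = 1.0 * X[:, 0] - 1.0 * X[:, 1]        # only 2 of 10 predictors matter
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

w = lasso_logistic(X, y, lam=0.05)
n_zero = int(np.sum(w == 0))                 # exact zeros produced by the L1 penalty
```

The informative coefficients survive (shrunk toward zero by the penalty) while the noise predictors are set exactly to zero, which is the selection behavior the reviewer contrasts with stepwise logistic models; in practice one would tune `lam` by cross-validation.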
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Ram Chandra Bajpai, Ph.D.
Academic Editor
PLOS ONE

Reviewers' comments (second round):

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here.
Reviewer #1: All comments have been addressed. Reviewer #2: All comments have been addressed.

2. Is the manuscript technically sound, and do the data support the conclusions?
Reviewer #1: Yes. Reviewer #2: Yes.

3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes. Reviewer #2: Yes.

4. Have the authors made all data underlying the findings in their manuscript fully available?
Reviewer #1: Yes. Reviewer #2: No.

5. Is the manuscript presented in an intelligible fashion and written in standard English?
Reviewer #1: Yes. Reviewer #2: Yes.

6. Review Comments to the Author.
Reviewer #1: Thanks very much for your responses and revisions based on reviewer feedback -- I believe that this manuscript is acceptable for publication. Reviewer #2: (No Response)

7. Do you want your identity to be public for this peer review?
Reviewer #1: No. Reviewer #2: No.

29 Nov 2021 · PONE-D-21-09724R1
Development and internal validation of prediction models for future hospital care utilization by patients with multimorbidity using electronic health record data

Dear Dr. Verhoeff:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Ram Chandra Bajpai
Academic Editor
PLOS ONE
Table 2

Results (development data) for outcome ‘≥1 hospitalization(s) in 2018’.

| Variable | ≥1 hospitalization in 2018 (n = 1,505) | No hospitalization in 2018 (n = 10,615) | Univariable OR (95% CI) | Multivariable OR (95% CI) | p-value |
|---|---|---|---|---|---|
| General characteristics (2017) | | | | | |
| Age group, n (%): 18–54 years | 192 (12.8) | 2603 (24.5) | 1 (ref) | 1 (ref) | |
| 55–64 years | 264 (17.5) | 2143 (20.2) | 1.67 (1.38–2.03) | 1.53 (1.25–1.87) | <0.0001 |
| 65–74 years | 434 (28.8) | 3135 (29.5) | 1.88 (1.57–2.25) | 1.67 (1.39–2.02) | <0.0001 |
| 75–98 years | 615 (40.9) | 2734 (25.8) | 3.05 (2.58–3.63) | 2.54 (2.13–3.04) | <0.0001 |
| Sex, female, n (%) | 746 (49.6) | 6143 (57.9) | 0.72 (0.64–0.80) | 0.80 (0.71–0.89) | 0.0001 |
| Socioeconomic status, n (%): Low | 630 (41.9) | 4185 (39.4) | 1 (ref) | 1 (ref) | |
| Middle | 595 (39.5) | 4244 (40.0) | 0.93 (0.83–1.05) | 0.89 (0.78–1.01) | 0.0610 |
| High | 280 (18.6) | 2186 (20.6) | 0.85 (0.73–0.99) | 0.84 (0.71–0.98) | 0.0245 |
| Disease characteristics (2017) | | | | | |
| Chronic/oncologic diagnoses, n (%): 2 | 739 (49.1) | 6748 (63.6) | 1 (ref) | 1 (ref) | |
| 3 | 404 (26.8) | 2560 (24.1) | 1.44 (1.27–1.64) | 1.12 (0.97–1.28) | 0.1228 |
| 4 | 201 (13.4) | 862 (8.1) | 2.13 (1.79–2.52) | 1.34 (1.10–1.62) | 0.0028 |
| 5 | 82 (5.5) | 309 (2.9) | 2.42 (1.87–3.11) | 1.13 (0.85–1.49) | 0.4017 |
| ≥6 | 79 (5.3) | 136 (1.3) | 5.30 (3.97–7.05) | 1.80 (1.29–2.50) | 0.0005 |
| Acute diagnoses, median (IQR) | 0 (0–1) | 0 (0–1) | 1.67 (1.57–1.78) | 1.09 (1.01–1.18) | 0.0296 |
| Hospital care characteristics (2017) | | | | | |
| Medical specialties involved, n (%): 2 | 312 (20.7) | 3440 (32.4) | 1 (ref) | – | |
| 3 | 424 (28.2) | 3458 (32.6) | 1.35 (1.16–1.58) | – | |
| 4 | 327 (21.7) | 2069 (19.5) | 1.74 (1.48–2.05) | – | |
| 5 | 231 (15.4) | 978 (9.2) | 2.60 (2.16–3.13) | – | |
| ≥6 | 211 (14.0) | 670 (6.3) | 3.47 (2.86–4.21) | – | |
| Outpatient visits, n (%): 2–4 | 366 (24.3) | 3884 (36.6) | 1 (ref) | 1 (ref) | |
| 5–7 | 442 (29.4) | 3685 (34.7) | 1.27 (1.10–1.47) | 1.02 (0.87–1.18) | 0.8441 |
| ≥8 | 697 (46.3) | 3046 (28.7) | 2.43 (2.12–2.78) | 1.26 (1.07–1.50) | 0.0065 |
| Acute hospitalizations, n (%): none | 943 (62.7) | 8908 (83.9) | 1 (ref) | 1 (ref) | |
| 1 | 345 (22.9) | 1359 (12.8) | 2.40 (2.09–2.74) | 1.23 (0.99–1.55) | 0.0679 |
| ≥2 | 217 (14.4) | 348 (3.3) | 5.89 (4.90–7.06) | 1.55 (1.11–2.16) | 0.0100 |
| Inpatient days, n (%): none | 821 (54.6) | 8085 (76.2) | 1 (ref) | 1 (ref) | |
| 1–3 | 142 (9.4) | 870 (8.2) | 1.61 (1.32–1.94) | 1.09 (0.87–1.37) | 0.4469 |
| 4–7 | 188 (12.5) | 844 (8.0) | 2.19 (1.84–2.60) | 1.20 (0.94–1.51) | 0.1423 |
| ≥8 | 354 (23.5) | 816 (7.7) | 4.27 (3.70–4.93) | 1.47 (1.13–1.89) | 0.0033 |
| ≥1 ICU admission, n (%) | 27 (1.8) | 139 (1.3) | 1.38 (0.89–2.05) | – | |
| Emergency department visits, median (IQR) | 1 (0–2) | 0 (0–1) | 1.62 (1.54–1.70) | 1.23 (1.15–1.33) | <0.0001 |

– = not retained in the multivariable model after forward selection.
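The univariable and multivariable odds ratios in the tables are obtained by exponentiating logistic-regression coefficients. As a minimal, stdlib-only sketch (not the authors' code; the function name is illustrative), the conversion from a coefficient and its standard error to an OR with a 95% Wald confidence interval looks like:

```python
import math

def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a 95% Wald confidence interval."""
    or_ = math.exp(beta)
    lower = math.exp(beta - z * se)   # lower CI bound on the OR scale
    upper = math.exp(beta + z * se)   # upper CI bound on the OR scale
    return or_, lower, upper

# Example: a coefficient of ln(2) with SE 0.09 corresponds to an OR of 2.0
or_, lo, hi = odds_ratio_ci(math.log(2), 0.09)
```

A coefficient of 0 (OR = 1) marks the reference category, which is why each "1 (ref)" row carries no confidence interval or p-value.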
Table 3

Results (development data) for outcome ‘≥2 ED visits in 2018’.

| Variable | ≥2 ED visits in 2018 (n = 839) | No or 1 ED visit in 2018 (n = 11,282) | Univariable OR (95% CI) | Multivariable OR (95% CI) | p-value |
|---|---|---|---|---|---|
| General characteristics (2017) | | | | | |
| Age group, n (%): 18–54 years | 139 (16.6) | 2547 (22.6) | 1 (ref) | 1 (ref) | |
| 55–64 years | 142 (16.9) | 2186 (19.4) | 1.19 (0.94–1.51) | 1.11 (0.86–1.42) | 0.4293 |
| 65–74 years | 264 (31.5) | 3443 (30.5) | 1.41 (1.14–1.74) | 1.29 (1.03–1.61) | 0.0252 |
| 75–98 years | 294 (35.0) | 3106 (27.5) | 1.73 (1.41–2.14) | 1.42 (1.14–1.77) | 0.0019 |
| Sex, female, n (%) | 443 (52.8) | 6373 (56.5) | 0.86 (0.75–0.99) | – | |
| Socioeconomic status, n (%): Low | 351 (41.8) | 4449 (39.4) | 1 (ref) | – | |
| Middle | 337 (40.2) | 4554 (40.4) | 0.94 (0.80–1.10) | – | |
| High | 151 (18.0) | 2279 (20.2) | 0.84 (0.69–1.02) | – | |
| Disease characteristics (2017) | | | | | |
| Chronic/oncologic diagnoses, n (%): 2 | 389 (46.4) | 7005 (62.1) | 1 (ref) | 1 (ref) | |
| 3 | 222 (26.5) | 2780 (24.6) | 1.44 (1.21–1.70) | 1.05 (0.88–1.26) | 0.5799 |
| 4 | 116 (13.8) | 988 (8.8) | 2.11 (1.69–2.62) | 1.16 (0.91–1.47) | 0.2324 |
| 5 | 53 (6.3) | 346 (3.1) | 2.76 (2.01–3.72) | 1.09 (0.76–1.52) | 0.6349 |
| ≥6 | 59 (7.0) | 163 (1.4) | 6.52 (4.72–8.88) | 1.90 (1.31–2.72) | 0.0006 |
| Acute diagnoses, median (IQR) | 1 (0–2) | 0 (0–1) | 1.79 (1.66–1.92) | 1.11 (1.01–1.22) | 0.0271 |
| Hospital care characteristics (2017) | | | | | |
| Medical specialties involved, n (%): 2 | 164 (19.5) | 3504 (31.1) | 1 (ref) | – | |
| 3 | 215 (25.6) | 3646 (32.3) | 1.26 (1.02–1.55) | – | |
| 4 | 166 (19.8) | 2285 (20.2) | 1.55 (1.24–1.94) | – | |
| 5 | 134 (16.0) | 1083 (9.6) | 2.64 (2.08–3.35) | – | |
| ≥6 | 160 (19.1) | 767 (6.8) | 4.46 (3.54–5.62) | – | |
| Outpatient visits, n (%): 2–4 | 150 (17.9) | 4042 (35.8) | 1 (ref) | 1 (ref) | |
| 5–7 | 239 (28.5) | 3835 (34.0) | 1.68 (1.36–2.07) | 1.36 (1.09–1.69) | 0.0056 |
| ≥8 | 450 (53.6) | 3405 (30.2) | 3.56 (2.95–4.32) | 1.72 (1.37–2.17) | <0.0001 |
| Acute hospitalizations, n (%): none | 495 (59.0) | 9350 (82.9) | 1 (ref) | – | |
| 1 | 192 (22.9) | 1505 (13.3) | 2.41 (2.02–2.87) | – | |
| ≥2 | 152 (18.1) | 427 (3.8) | 6.72 (5.46–8.25) | – | |
| Inpatient days, n (%): none | 427 (50.9) | 8427 (74.7) | 1 (ref) | 1 (ref) | |
| 1–3 | 75 (8.9) | 942 (8.3) | 1.57 (1.21–2.01) | 1.00 (0.76–1.29) | 0.9862 |
| 4–7 | 112 (13.3) | 926 (8.2) | 2.39 (1.91–2.96) | 1.24 (0.97–1.57) | 0.0748 |
| ≥8 | 225 (26.8) | 987 (8.8) | 4.50 (3.77–5.35) | 1.37 (1.08–1.73) | 0.0086 |
| ≥1 ICU admission, n (%) | 29 (3.5) | 152 (1.3) | 2.62 (1.72–3.86) | – | |
| Emergency department visits, median (IQR) | 1 (1–3) | 0 (0–1) | 1.81 (1.72–1.91) | 1.49 (1.39–1.61) | <0.0001 |

– = not retained in the multivariable model after forward selection.
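The calibration pattern reported for these models (agreement at low predicted risk, overestimation at high predicted risk) is typically assessed by sorting patients into risk groups and comparing mean predicted probability with the observed event rate per group. A minimal stdlib sketch, assuming binary outcomes coded 0/1 (the function name and decile grouping are illustrative, not the authors' exact procedure):

```python
def calibration_groups(probs, outcomes, n_groups=10):
    """Split patients into equal-sized risk groups by predicted
    probability and return (mean predicted, observed rate) per group."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    size = len(order) // n_groups
    rows = []
    for g in range(n_groups):
        # last group absorbs any remainder so every patient is counted
        idx = order[g * size:(g + 1) * size] if g < n_groups - 1 else order[g * size:]
        mean_pred = sum(probs[i] for i in idx) / len(idx)
        observed = sum(outcomes[i] for i in idx) / len(idx)
        rows.append((mean_pred, observed))
    return rows

# Toy example with two risk groups: a well-calibrated model would show
# mean predicted probability close to the observed event rate in each group.
rows = calibration_groups([0.1] * 5 + [0.9] * 5, [0] * 5 + [1] * 5, n_groups=2)
```

Overestimation at high risk, as described in the abstract, would appear here as high-risk groups whose mean predicted probability exceeds their observed rate.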
Table 4

Results (development data) for outcome ‘≥12 outpatient visits in 2018’.

| Variable | ≥12 outpatient visits in 2018 (n = 862) | <12 outpatient visits in 2018 (n = 11,258) | Univariable OR (95% CI) | Multivariable OR (95% CI) | p-value |
|---|---|---|---|---|---|
| General characteristics (2017) | | | | | |
| Age group, n (%): 18–54 years | 127 (14.7) | 2691 (23.9) | 1 (ref) | 1 (ref) | |
| 55–64 years | 173 (20.1) | 2194 (19.5) | 1.67 (1.32–2.12) | 1.44 (1.13–1.83) | 0.0034 |
| 65–74 years | 299 (34.7) | 3258 (28.9) | 1.94 (1.57–2.42) | 1.58 (1.27–1.98) | <0.0001 |
| 75–98 years | 263 (30.5) | 3115 (27.7) | 1.79 (1.44–2.23) | 1.34 (1.07–1.69) | 0.0109 |
| Sex, female, n (%) | 451 (52.3) | 6426 (57.1) | 0.83 (0.72–0.95) | – | |
| Socioeconomic status, n (%): Low | 361 (41.9) | 4530 (40.2) | 1 (ref) | – | |
| Middle | 338 (39.2) | 4489 (40.0) | 0.94 (0.81–1.10) | – | |
| High | 163 (18.9) | 2239 (19.9) | 0.91 (0.75–1.10) | – | |
| Disease characteristics (2017) | | | | | |
| Chronic/oncologic diagnoses, n (%): 2 | 321 (37.2) | 7185 (63.8) | 1 (ref) | 1 (ref) | |
| 3 | 262 (30.4) | 2663 (23.7) | 2.20 (1.86–2.61) | 1.34 (1.12–1.62) | 0.0018 |
| 4 | 147 (17.1) | 957 (8.5) | 3.44 (2.79–4.22) | 1.43 (1.12–1.81) | 0.0035 |
| 5 | 73 (8.5) | 310 (2.8) | 5.27 (3.97–6.93) | 1.57 (1.13–2.15) | 0.0063 |
| ≥6 | 59 (6.8) | 143 (1.3) | 9.24 (6.64–12.70) | 2.17 (1.48–3.17) | 0.0001 |
| Acute diagnoses, median (IQR) | 0 (0–1) | 0 (0–1) | 1.52 (1.41–1.63) | – | |
| Hospital care characteristics (2017) | | | | | |
| Medical specialties involved, n (%): 2 | 123 (14.3) | 3644 (32.4) | 1 (ref) | 1 (ref) | |
| 3 | 188 (21.8) | 3641 (32.3) | 1.53 (1.21–1.93) | 0.93 (0.72–1.19) | 0.5489 |
| 4 | 198 (23.0) | 2238 (19.9) | 2.62 (2.08–3.31) | 1.03 (0.79–1.35) | 0.8433 |
| 5 | 168 (19.5) | 1041 (9.2) | 4.78 (3.76–6.10) | 1.34 (1.00–1.80) | 0.0546 |
| ≥6 | 185 (21.5) | 694 (6.2) | 7.90 (6.21–10.08) | 1.59 (1.15–2.20) | 0.0047 |
| Outpatient visits, n (%): 2–4 | 90 (10.4) | 4180 (37.1) | 1 (ref) | 1 (ref) | |
| 5–7 | 212 (24.6) | 3904 (34.7) | 2.52 (1.97–3.25) | 2.13 (1.52–2.61) | <0.0001 |
| ≥8 | 560 (65.0) | 3174 (28.2) | 8.19 (6.56–10.35) | 4.76 (3.63–6.29) | <0.0001 |
| Acute hospitalizations, n (%): none | 586 (68.0) | 9328 (82.9) | 1 (ref) | – | |
| 1 | 168 (19.5) | 1456 (12.9) | 1.84 (1.53–2.19) | – | |
| ≥2 | 108 (12.5) | 474 (4.2) | 3.63 (2.88–4.53) | – | |
| Inpatient days, n (%): none | 486 (56.4) | 8490 (75.4) | 1 (ref) | – | |
| 1–3 | 90 (10.4) | 900 (8.0) | 1.75 (1.37–2.20) | – | |
| 4–7 | 115 (13.3) | 906 (8.0) | 2.22 (1.78–2.74) | – | |
| ≥8 | 171 (19.8) | 962 (8.5) | 3.11 (2.57–3.73) | – | |
| ≥1 ICU admission, n (%) | 28 (3.2) | 149 (1.3) | 2.50 (1.63–3.71) | – | |
| Emergency department visits, median (IQR) | 0 (0–1) | 0 (0–1) | 1.45 (1.37–1.53) | 1.13 (1.06–1.20) | <0.0001 |

– = not retained in the multivariable model after forward selection.
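The c-statistics of 0.70–0.76 reported for these models measure discrimination: the probability that a randomly chosen patient with the outcome received a higher predicted risk than a randomly chosen patient without it. A small self-contained sketch of the concordance calculation, assuming binary outcomes coded 0/1 (this is the generic pairwise definition, not the authors' software):

```python
def c_statistic(probs, outcomes):
    """Concordance (c-statistic / AUC): the fraction of event/non-event
    pairs in which the event patient received the higher predicted risk;
    tied predictions count as half-concordant."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))
```

On this scale, 0.5 is chance-level discrimination and 1.0 is perfect; the 0.70–0.76 range reported here sits in what is conventionally described as acceptable-to-good discrimination. The O(n²) pairwise loop is fine for a sketch; for cohorts of this size (n ≈ 12,000) a rank-based implementation would be used in practice.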