
Applying Supervised Machine Learning to Identify Which Patient Characteristics Identify the Highest Rates of Mortality Post-Interhospital Transfer.

Andrew P Reimer¹,², Nicholas K Schiltz¹, Vanessa P Ho³, Elizabeth A Madigan⁴, Siran M Koroukian⁵.

Abstract

OBJECTIVE: To demonstrate the usefulness of applying supervised machine-learning analyses to identify specific groups of patients that experience high levels of mortality post-interhospital transfer.
METHODS: This was a cross-sectional analysis of data from the Healthcare Cost and Utilization Project 2013 National Inpatient Sample that applied two supervised machine-learning approaches: (1) classification and regression tree analysis, to identify mutually exclusive groups of patients experiencing the highest levels of mortality and their associated characteristics, and (2) random forest, to identify the relative importance of each characteristic's contribution to post-transfer mortality.
RESULTS: A total of 21 independent groups of patients were identified, with 13 of those groups exhibiting at least double the national average rate of mortality post-transfer. Patient characteristics identified as influencing post-transfer mortality the most included: diagnosis of a circulatory disorder, comorbidity of coagulopathy, diagnosis of cancer, and age.
CONCLUSIONS: Employing supervised machine-learning analyses enabled the computational feasibility to assess all potential combinations of available patient characteristics to identify groups of patients experiencing the highest rates of mortality post-interhospital transfer, providing potentially useful data to support developing clinical decision support systems in future work.


Keywords: Transportation of patients; patient outcome assessment; supervised machine learning

Year: 2019 PMID: 30911219 PMCID: PMC6425528 DOI: 10.1177/1178222619835548

Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226

Approximately 1.6 million patients undergo interhospital transfer annually.[1] Patients undergoing interhospital transfer experience up to three times higher mortality,[2,3] use double the resources, and have twice the length of stay of those not transferred from another hospital.[1] Interhospital transfers consist of two primary patient types: those experiencing an immediately life-threatening condition (e.g. myocardial infarction, trauma) and those who are not. Transfer for patients experiencing an immediately life-threatening condition has been shown to be a life-saving measure, with reductions in mortality for trauma[4-8] and heart attack patients,[9] but has yielded conflicting results for stroke[10,11] and minimally injured trauma patients.[12-14] Decisions to transfer patients from lower to higher levels of care for an immediately life-threatening condition are common and often supported by referral networks established within local regions, such as trauma and stroke networks. For those patients not experiencing an immediately life-threatening condition, the decision to transfer is complicated and is based on individual provider judgment, family request, or other factors. Currently, no national guidelines exist[15] to guide interhospital transfer; furthermore, there is limited understanding of who does and does not benefit from being transferred and exactly when those transfers should occur. The overall poor outcomes that interhospital transfer patients experience, and the mixed outcomes for patients who are immediately transferred for time-sensitive conditions, suggest that we do not have a good understanding of immediately life-threatening conditions. Outside of patients who are transferred for an intervention that must be performed immediately upon arrival at the receiving hospital (e.g. cardiac catheterization and surgical procedure), our recognition of what constitutes a patient experiencing an immediately life-threatening condition needs to be reconceptualized. Reconceptualizing the types of transfer patients requires the focus to move beyond the currently used broad categories (e.g. trauma and stroke) to categories built on patient-specific characteristics that identify those who should be considered for transfer. Therefore, to begin moving toward a more patient-centric approach, the purpose of this study was to identify specific groups of patients, and their associated characteristics, that experience high levels of mortality post-transfer.

Methods

Data source

We used the 2013 National Inpatient Sample (NIS). The NIS is part of the Healthcare Cost and Utilization Project (HCUP) from the Agency for Healthcare Research and Quality (AHRQ) and is the largest all-payer inpatient database in the United States, with a nationally representative sample of approximately 8 million inpatient discharges each year.[16] We identified all adult patients aged 19 years or older who were transferred from one acute care hospital to another to compose an interhospital transfer cohort.

Measures

Our main outcome measure is in-hospital mortality, as recorded in the discharge status on the hospital billing record. We incorporated only covariates that were available in the data set and are clinically meaningful, that is, useful in guiding clinical decision-making or practice. Patient-level covariates included the following: age (continuous), gender, payer type, race, comorbidity, and primary diagnosis. To include the primary diagnosis while keeping the analysis computationally feasible, we accounted for the primary diagnosis via the Clinical Classifications Software (CCS) for the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM),[17,18] using the multi-level diagnosis category labels (a total of 17 categories). The multi-level CCS category is a standard, established method to collapse over 14 000 diagnosis codes and 3900 procedure codes into clinically meaningful categories.[18] Refer to Supplementary Material Table 1 for a listing of the covariates and CCS categories used in the model. We measured the presence of comorbid conditions using the Elixhauser comorbidity index list. The Elixhauser index contains 30 comorbid conditions defined through secondary ICD-9-CM diagnosis codes and Diagnosis Related Group (DRG) codes.[19-21] We excluded both the arthritis and the fluid and electrolyte disorder comorbidities. Many patients have arthritis, and for the purposes of this study, it was not considered a factor that differentiates patients for transfer. In addition, most patients hospitalized and undergoing interhospital transfer experience some form of abnormal laboratory value, making fluid and electrolyte disorders not clinically useful for identifying discrete subgroups of patients that would provide new insight to enable reconceptualizing patient categories for transfer.
To describe the severity of the patient population and to enable comparison between the data subsets used in the analysis, we used the All Patient Refined Diagnosis Related Groups (APR-DRG) Risk Mortality covariate provided by HCUP. The APR-DRGs are assigned using proprietary software developed by 3M Health Information Systems and include the base APR-DRG, the severity of illness subclass, and the risk of mortality subclass within each base APR-DRG.[16] We used this variable only to describe the study samples and did not include it in model development and analyses because it is a combination of other covariates already included in the model (e.g. age, gender, and diagnosis) and also includes proprietary calculations that are not available within the electronic medical record (EMR) and thus would not be usable in decision-support tools or other patient care activities relying on primary data. System-level covariates in the analysis included the following: admission month, admission on a weekend, hospital bed size, hospital teaching status, hospital region, hospital control/ownership, and patient location before hospitalization. We also accounted for whether patients received a major operating room procedure, either diagnostic or therapeutic, post-transfer. The University Hospitals Case Medical Center Institutional Review Board determined that this study meets the exemption criteria for human subject research (IRB #em-14-30).

Statistical analysis

Frequency counts and percentages were tabulated for the categorical outcome, mortality. For descriptive analysis, we used discharge-level survey weights provided in the NIS that accounted for complex survey design effects. The final sample for this study is a nationally representative sample generated via the weighting variable provided with the data set. However, the classification and regression tree (CART) analysis does not apply the sample weights, which leads to smaller samples in the terminal nodes. We excluded cases where the mortality variable was missing. We did not exclude any observations with missing values for the independent variables, specifically because a robust feature of the CART algorithm is that it handles missing data using the surrogate split method, which finds an alternative variable that is highly correlated with the missing variable to determine the split.[22] While there are other methods for handling missing data in CART analysis,[23] the default setting in CART packages is to skip missing variables to streamline the analysis.[24] In this analysis, we employed the surrogate split method, which identifies and substitutes a surrogate variable.
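As an illustration of the surrogate split idea described above, the following minimal Python sketch picks, among candidate splits, the one that most often sends records the same way as the primary split, and then uses it to route records whose primary variable is missing. It is a simplified teaching example, not the rpart implementation: the function names, variable names, and toy data are hypothetical, and rpart's additional checks (for example, comparing each surrogate against a simple majority rule) are omitted.

```python
from typing import Dict, List, Optional, Sequence

Side = Optional[bool]  # True = left branch, False = right branch, None = missing


def agreement(primary: Sequence[Side], candidate: Sequence[Side]) -> float:
    """Share of records sent the same way by both splits,
    counted only where both variables are observed."""
    pairs = [(p, c) for p, c in zip(primary, candidate)
             if p is not None and c is not None]
    if not pairs:
        return 0.0
    return sum(p == c for p, c in pairs) / len(pairs)


def best_surrogate(primary: Sequence[Side],
                   candidates: Dict[str, Sequence[Side]]) -> str:
    """Name of the candidate split that best mimics the primary split."""
    return max(candidates, key=lambda name: agreement(primary, candidates[name]))


def route(primary_side: Side, surrogate_side: Side) -> Side:
    """Use the primary split when observed; otherwise fall back to the surrogate."""
    return primary_side if primary_side is not None else surrogate_side


# Hypothetical toy data (not from the study).
primary_split: List[Side] = [True, True, False, False, None, True, False, None]
candidate_splits: Dict[str, List[Side]] = {
    "renal_failure": [True, True, False, True, True, True, False, False],
    "weekend_admission": [False, True, True, False, True, False, False, True],
}

surrogate = best_surrogate(primary_split, candidate_splits)
routed = [route(p, s) for p, s in zip(primary_split, candidate_splits[surrogate])]
print(surrogate, routed)
```

In this toy example, the two records with a missing primary value are assigned to a branch by the chosen surrogate rather than being dropped.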

Supervised machine-learning approaches

We used CART analysis to identify combinations of predictors associated with post-transfer mortality. CART is a tree-building technique in which the choice of “splitting” variables is based on an exhaustive search of all possibilities, using a recursive partitioning algorithm, resulting in mutually exclusive groups that are the most different with respect to the dependent variable.[25] The tree-building process leads to terminal nodes (or leaves), at which point the nodes cannot be divided anymore; the tree then needs to be pruned to avoid over-fitting and increase efficiency.[26] First, CART recursively partitions the patients into smaller and smaller homogeneously distributed groups, in this case based on the presence of specific combinations of clinical conditions. The purpose is to reduce variation within each group and to improve the fit as much as possible. Next, CART uses these groups to predict post-transfer mortality. We used the following stopping criteria (based on model tuning described below): a maximum tree depth of 10 splits, a minimum node size of 50 subjects, and a complexity parameter of 0.001, which requires each split to improve the overall fit by at least that amount; node splits were determined using the information impurity index. To build our model, we partitioned the study data into a training data set (70% of the data) and a testing data set (the remaining 30%) using random sampling within each class of the outcome variable. We used 10-fold cross-validation repeated three times on the training data set to build the CART models. Since mortality is highly unbalanced, we weighted the “cost” of a false negative to be higher than that of a false positive to improve sensitivity and produce a more meaningful model. We then tested the accuracy of our models on the testing data set using a confusion matrix and by calculating the area under the curve.
We also used the Matthews correlation coefficient, a measure of accuracy that accounts for imbalanced outcomes.[27] We chose our final model for the outcome based on accuracy and interpretability. In addition, we compared our final models with those from a random forest model to see whether they agreed on which variables are the most important predictors. Random forest is a bootstrap aggregation method that creates multiple decision trees using random variable selection. Breiman et al[22] provide a detailed description of random forest. We used SAS software version 9.4[28] for data management; for our statistical analyses, we used R version 3.3.1 and RStudio 1.0.136[29] and the “rpart” (CART), “partykit” (tree graphics), “randomForest” (random forest), and “caret” (model tuning and cross-validation) packages.
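The 70/30 partition with random sampling within each outcome class can be sketched as follows. This is a minimal, standard-library Python illustration rather than the authors' R code; the function name, the seed value, and the toy outcome vector are assumptions made for the example. Fixing the seed keeps the same records in the training and testing sets on every run.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(labels: Sequence[int],
                     train_fraction: float = 0.7,
                     seed: int = 2013) -> Tuple[List[int], List[int]]:
    """Split record indices into train/test sets, sampling within each
    outcome class so both sets keep roughly the same event rate."""
    rng = random.Random(seed)  # fixed seed: the split is reproducible
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


# Hypothetical, highly unbalanced outcome: 10 deaths among 200 records.
labels = [1] * 10 + [0] * 190
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx), sum(labels[i] for i in train_idx))
```

Sampling within each class matters for a rare outcome such as mortality: a simple random split could, by chance, leave very few deaths in one of the two sets.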

Results

In 2013, approximately 1 456 422 adult patients underwent interhospital transfer; 52% were male, 66% White, 11% Black, and 7% Hispanic. The primary payers for the interhospital transfer were Medicare (44%), Medicaid (19%), and private insurance (26%). Further demographic characteristics of the nationally weighted sample are provided in Table 1, and the frequency of the primary diagnosis categorized by the multi-level diagnosis category of the CCS is provided in Table 2. As expected, circulatory disease was the most frequent diagnosis in the older age groups (45 and older), whereas mental health was the most frequent in the youngest age groups (19-44). The frequency of comorbidities across age cohorts is presented in Table 3. The distribution of patient characteristics across the total study population and between the training and testing data sets is available in Supplementary Material Table 2.

Table 1.

Sample characteristics*.

	Age (19-34)		Age (35-44)		Age (45-54)		Age (55-64)		Age (65-74)		Age (75-84)		Age over 85
	n	%	n	%	n	%	n	%	n	%	n	%	n	%
Total subjects	179 895		125 525		212 665		275 530		290 259		245 504		127 044
Sex
Male	80 530	44.8	63 445	50.5	119 255	56.0	156 235	56.7	154 425	53.2	119 249	48.6	50 685	39.9
Female	99 360	55.2	62 075	49.5	93 400	43.9	119 295	43.3	135 830	46.8	126 239	51.4	76 344	60.1
Race
White	104 065	57.8	77 490	61.7	137 925	64.9	188 155	68.3	212 470	73.2	183 954	74.9	98 324	77.4
Black	27 440	15.3	17 425	13.9	29 795	14.0	32 750	11.9	23 950	8.3	15 515	6.3	6495	5.1
Hispanic	16 675	9.3	10 155	8.1	14 070	6.6	14 535	5.3	13 620	4.7	11 710	4.8	5050	4.0
Other	31 715	17.6	20 455	16.3	30 875	14.5	40 089	14.6	40 219	13.9	34 324	14.0	17 175	13.5
Admission status
Non-elective admission	150 180	83.5	105 820	84.3	176 975	83.2	225 710	81.9	229 175	79.0	185 905	75.7	92 630	72.9
Elective admission	29 015	16.1	19 230	15.3	35 055	16.5	48 955	17.8	60 104	20.7	58 684	23.9	33 859	26.7
Mortality risk
Minor	133 945	74.5	77 125	61.4	106 010	49.8	109 945	39.9	70 790	24.4	33 840	13.8	13 135	10.3
Moderate	22 550	12.5	23 660	18.8	50 490	23.7	75 700	27.5	92 070	31.7	89 624	36.5	50 949	40.1
Major	13 390	7.4	14 515	11.6	34 065	16.0	54 995	20.0	80 220	27.6	82 605	33.6	45 805	36.1
Extreme	9935	5.5	10 175	8.1	22 020	10.4	34 785	12.6	47 095	16.2	39 305	16.0	17 105	13.5
Payer type
Medicare	15 125	8.4	22 785	18.2	49 135	23.1	81 740	29.7	246 009	84.8	224 184	91.3	116 769	91.9
Medicaid	58 895	32.7	32 775	26.1	45 755	21.5	41 995	15.2	4040	1.4	2245	0.9	1045	0.8
Private	62 255	34.6	43 020	34.3	77 200	36.3	112 865	41.0	30 780	10.6	13 220	5.4	6015	4.7
Self-pay	25 765	14.3	16 385	13.1	24 620	11.6	19 480	7.1	1455	0.5	925	0.4	670	0.5
Other	14 305	8.0	8300	6.6	12 700	6.0	16 725	6.1	7265	2.5	4405	1.8	2335	1.8
Hospital teaching status
Nonteaching	44 000	24.5	30 335	24.2	51 405	24.2	65 450	23.8	74 775	25.8	70 215	28.6	39 500	31.1
Teaching	124 789	69.4	87 409	69.6	149 349	70.2	193 974	70.4	195 469	67.3	154 374	62.9	74 619	58.7
Hospital region
Northeast	25 900	14.4	18 500	14.7	32 320	15.2	40 530	14.7	42 525	14.7	36 855	15.0	21 085	16.6
Midwest	54 565	30.3	36 895	29.4	59 160	27.8	77 994	28.3	80 814	27.8	72 529	29.5	37 829	29.8
South	69 915	38.9	49 685	39.6	88 215	41.5	110 055	39.9	115 365	39.7	90 185	36.7	42 040	33.1
West	29 515	16.4	20 445	16.3	32 970	15.5	46 950	17.0	51 554	17.8	45 934	18.7	26 090	20.5

*Total subjects and % represent the data set weighted to reflect national estimates.

Table 2.

Clinical classification frequencies*.

	Age (19-34)		Age (35-44)		Age (45-54)		Age (55-64)		Age (65-74)		Age (75-84)		Age over 85
	n	%	n	%	n	%	n	%	n	%	n	%	n	%
Total subjects	179 895		125 525		212 665		275 530		290 259		245 504		127 044
Clinical Classification
Infection	5905	3.3	5740	4.6	11 615	5.5	16 780	6.1	16 685	5.7	12 785	5.2	6520	5.1
Cancer	2755	1.5	3525	2.8	8570	4.0	13 755	5.0	14 020	4.8	9745	4.0	3230	2.5
Hematologic	2340	1.2	1365	1.0	1595	0.7	1995	0.7	2510	0.8	1675	0.7	2400	1.1
Metabolic	3825	2.1	2870	2.3	4925	2.3	5470	2.0	4770	1.6	3200	1.3	1515	1.2
Mental	56 350	31.3	29 970	23.9	36 970	17.4	22 565	8.2	11 510	4.0	7445	3.0	4000	3.1
Nervous	7495	4.2	5395	4.3	7470	3.5	8235	3.0	7055	2.4	5175	2.1	1810	1.4
Circulatory	10 695	5.9	21 480	17.1	56 110	26.4	88 615	32.2	100 700	34.7	83 435	34.0	37 355	29.4
Respiratory	6165	3.4	5550	4.4	11 770	5.5	18 825	6.8	21 510	7.4	17 525	7.1	8980	7.1
Digestive	12 735	7.1	11 840	9.4	19 380	9.1	23 360	8.5	23 195	8.0	18 685	7.6	10 295	8.1
Genitourinary	3710	2.1	3780	3.0	5535	2.6	7425	2.7	8730	3.0	8035	3.3	4650	3.7
Pregnancy	28 545	15.9	4745	3.8	80	0.0	0	0.0	0	0.0	0	0.0	0	0.0
Skin	2690	1.5	2085	1.7	2970	1.4	2765	1.0	2025	0.7	1535	0.6	795	0.6
Musculoskeletal	2510	1.4	2520	2.0	3990	1.9	5000	1.8	5255	1.8	3870	1.6	1755	1.4
Congenital	390	0.2	265	0.2	310	0.1	260	0.1	190	0.1	155	0.1	60	0.0
Injury/poison	26 765	14.9	17 285	13.8	25 065	11.8	30 500	11.1	31 215	10.8	29 255	11.9	20 105	15.8

*Total subjects and % represent the data set weighted to reflect national estimates.

Table 3.

Comorbidity Frequencies*.

	Age (19-34)		Age (35-44)		Age (45-54)		Age (55-64)		Age (65-74)		Age (75-84)		Age over 85
	n	%	n	%	n	%	n	%	n	%	n	%	n	%
Total subjects	179 895		125 525		212 665		275 530		290 259		245 504		127 044
Comorbidities
AIDS	345	0.2	465	0.4	910	0.4	480	0.2	170	0.1	45	0.0	0	0.0
Alcohol abuse	18 230	10.1	15 170	12.1	30 825	14.5	24 085	8.7	13 035	4.5	4735	1.9	750	0.6
Deficiency anemias	19 540	10.9	17 770	14.2	34 995	16.5	53 595	19.5	64 060	22.1	60 210	24.5	31 720	25.0
Rheumatoid arthritis	2410	1.3	2860	2.3	5255	2.5	8480	3.1	10 625	3.7	9390	3.8	3885	3.1
Long-term blood loss anemia	5845	3.2	1895	1.5	1935	0.9	2995	1.1	3430	1.2	3205	1.3	1790	1.4
Congestive heart failure	2455	1.4	3800	3.0	11 745	5.5	24 835	9.0	36 610	12.6	40 205	16.4	26 365	20.8
Long-term pulmonary disease	18 770	10.4	16 435	13.1	40 600	19.1	66 610	24.2	78 965	27.2	61 535	25.1	24 755	19.5
Coagulopathy	7920	4.4	7515	6.0	16 735	7.9	23 745	8.6	23 920	8.2	19 760	8.0	8630	6.8
Depression	15 880	8.8	15 520	12.4	28 215	13.3	37 830	13.7	36 185	12.5	26 925	11.0	12 500	9.8
Diabetes—uncomplicated	8810	4.9	16 465	13.1	43 640	20.5	73 310	26.6	89 275	30.8	68 140	27.8	25 630	20.2
Diabetes—long-term complications	2305	1.3	4990	4.0	12 420	5.8	21 450	7.8	24 250	8.4	16 895	6.9	5065	4.0
Drug abuse	37 305	20.7	18 360	14.6	23 970	11.3	13 605	4.9	3840	1.3	1020	0.4	260	0.2
Hypertension—combine	22 860	12.7	43 875	35.0	110 650	52.0	174 475	63.3	205 790	70.9	182 144	74.2	94 364	74.3
Hypothyroidism	6270	3.5	8785	7.0	18 920	8.9	30 870	11.2	42 890	14.8	44 915	18.3	28 720	22.6
Liver disease	3780	2.1	5240	4.2	13 805	6.5	17 185	6.2	10 215	3.5	4255	1.7	970	0.8
Lymphoma	400	0.2	590	0.5	1295	0.6	2440	0.9	3970	1.4	3175	1.3	1105	0.9
Fluid and electrolyte disorders	27 630	15.4	26 510	21.1	54 190	25.5	78 815	28.6	86 100	29.7	74 350	30.3	39 365	31.0
Metastatic cancer	735	0.4	1350	1.1	4615	2.2	8390	3.0	9355	3.2	6090	2.5	1680	1.3
Other neurological disorders	12 535	7.0	10 550	8.4	17 865	8.4	23 235	8.4	26 305	9.1	26 040	10.6	15 065	11.9
Obesity	18 115	10.1	21 465	17.1	37 150	17.5	49 950	18.1	45 995	15.8	22 575	9.2	4235	3.3
Paralysis	5980	3.3	5340	4.3	10 660	5.0	16 150	5.9	18 350	6.3	14 115	5.7	6345	5.0
Peripheral vascular disorders	1695	0.9	2760	2.2	9880	4.6	22 780	8.3	34 320	11.8	31 790	12.9	14 300	11.3
Psychoses	11 940	6.6	9460	7.5	15 065	7.1	16 145	5.9	12 305	4.2	7985	3.3	3690	2.9
Pulmonary circulation disorders	2340	1.3	2265	1.8	4965	2.3	8500	3.1	10 390	3.6	10 420	4.2	5645	4.4
Renal failure	4615	2.6	7660	6.1	19 875	9.3	38 770	14.1	55 325	19.1	56 865	23.2	31 255	24.6
Solid tumor without metastasis	850	0.5	1045	0.8	3515	1.7	7005	2.5	9750	3.4	8040	3.3	3345	2.6
Peptic ulcer disease	15	0.0	40	0.0	65	0.0	90	0.0	120	0.0	115	0.0	15	0.0
Valvular disease	2095	1.2	2205	1.8	4635	2.2	8630	3.1	13 850	4.8	17 940	7.3	12 610	9.9
Weight loss	7105	3.9	6520	5.2	14 575	6.9	23 025	8.4	26 140	9.0	23 010	9.4	12 095	9.5

*Total subjects and % represent the data set weighted to reflect national estimates.

The final CART identified 21 discrete subgroups of patients (Figure 1). Trees from the training holdout data set and the testing holdout data set contained the same splits and terminal nodes. Of the 21 subgroups, 12 were for patients with a primary cardiac diagnosis (n = 16 798 patients), eight were for patients with a primary diagnosis of cancer (n = 35 030 patients), and the remaining subgroup had neither cardiac nor cancer as a primary diagnosis (n = 151 464).

Figure 1.

Classification and regression tree.

The classification and regression tree, with variables identified within the ovals and the value of each variable at the split defined by each connecting line to the next variable and split. Each bar at the bottom identifies a distinct clinical group with the total number of subjects contained in that group (n); the bar represents the % mortality for patients in that group.

Subgroups with a primary cardiac diagnosis (Figure 1, right side) experiencing the highest rates of post-transfer mortality included (1) patients greater than 40 years old with either coagulopathy (30% mortality) or with metastasis (~35%), (2) patients greater than 52 years old with cardiac arrhythmia and either liver failure (~35%) or pulmonary circulatory comorbidity (30%), and (3) patients greater than 72 years without Medicare (35%). The payer mix of the patients in the subgroup that was greater than 72 years and without Medicare consisted of 10% on Medicaid, 56% private insurance, 10% self-pay, and 24% not specified. Alternatively, patients that were less than 40 years (5% mortality) or greater than 40 years and underwent an operating room procedure (5% mortality) experienced the highest rates of survival. Subgroups of patients that had cancer as the primary diagnosis (Figure 1, left side) that experienced the highest rates of mortality post-transfer included (1) those greater than 83 years old (35% mortality), (2) those >68 years with either hypertension (15% mortality) or on Medicare (10% mortality), and (3) those <68 years old with coagulopathy and either arrhythmia (25% mortality) or pulmonary circulatory comorbidity (35% mortality). The results from the random forest analysis are presented in Figure 2. Variables identified as being important via random forest, but not included in any of the CART pathways, include weight loss, congestive heart failure, and genitourinary conditions.

Figure 2.

Random forest results.

Abbreviations: Dx, diagnosis; CM, comorbidity.

Variables identified as contributing the most to post-transfer mortality are displayed with the most important starting at the top and descending to least important. The highlighted box contains the variables with the highest importance. Variables with an * are not included in the classification and regression tree.


Model performance

We tested the performance of our model on a holdout data set. The area under the curve was 0.69, and the Matthews correlation coefficient was 0.198. The model had a positive predictive value (PPV) of 0.291 and a negative predictive value (NPV) of 0.960. The sensitivity was 0.18 and the specificity was 0.98. As we further describe below, the aim of this model was to identify clinically meaningful groups of patients rather than to most accurately predict mortality post-transfer.
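For readers who want to see how these measures relate to a confusion matrix, the sketch below computes sensitivity, specificity, PPV, NPV, and the Matthews correlation coefficient in Python. The counts are hypothetical and chosen only for illustration; they are not the study's confusion matrix, which is not reported here.

```python
import math
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard confusion-matrix summaries for a binary outcome."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),  # deaths correctly flagged
        "specificity": tn / (tn + fp),  # survivors correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who died
        "npv": tn / (tn + fn),          # cleared patients who survived
        # MCC uses all four cells, so it stays informative when deaths are rare.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts for a rare outcome (100 deaths among 2000 records).
metrics = classification_metrics(tp=20, fp=50, fn=80, tn=1850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a rare outcome, specificity and NPV can be high even when few deaths are detected, which is why a measure such as the Matthews correlation coefficient is reported alongside them.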

Discussion

This analysis identified 21 distinct groups of patients, 13 of which experienced mortality rates more than double the national average for post-transfer mortality, which ranges from 4.7% to 5.2%.[1] In 2013, the national mortality for all-cause hospital admissions was 2%. This analysis included all patients, even patients who underwent transfer for routine procedures such as orthopedic cases or appendectomies, who were accounted for in the far left of the tree in the lowest mortality group (n = 151 464). The other lowest mortality group consisted of those with a circulatory diagnosis who were younger than 40.5 years. The left side of the tree, or the non-cardiac side, was dominated by patients with cancer, composing the second largest group of patients undergoing transfer (n = 35 020), with the highest mortality experienced by those with coagulopathy as a comorbid condition. Coagulopathy is also represented on the right side as a significant contributor to increased mortality post-transfer. Of note, comorbid conditions in the AHRQ NIS are not directly related to the primary diagnosis or necessarily the main reason for admission, likely having originated before the current hospitalization, thus representing a pre-existing condition.[16] The finding that coagulopathy is a significant predictor of post-transfer mortality was surprising, but its significance is reinforced by the random forest analysis (Figure 2) and our other work looking at surgical populations.[30] Coagulopathy typically manifests as a secondary physiologic response to a primary disturbance such as cancer or trauma and has been found to be an independent predictor of in-hospital mortality, regardless of transfer status.[31,32] This study reinforces including coagulopathy, whether it is a comorbidity or a condition on the active problem list for the current hospitalization, as a covariate in future modeling efforts.
This study identified that patients with a cardiac diagnosis who were younger than 40 years, or who were older than 40 years and received an operating room procedure, experienced the highest survival rates post-transfer. While we cannot ascertain the specific operating room procedures performed, the high survival rates for this clinical group receiving a major therapeutic or diagnostic operating room procedure support the role that transfer plays in improving mortality. Likely, these patients without concomitant comorbidity or other significant clinical characteristics represent those experiencing a myocardial infarction or other time-sensitive condition that benefits from rapid transport and subsequent intervention. While the primary focus of this study was not to predict patient mortality, the methods employed identified groups of patients that experience mortality at rates two to three times higher than the expected rate of post-transfer mortality of 5% and thus provide specific groups of patients that warrant focused inquiry. Current efforts to leverage EMR data to support developing clinical decision-support systems (e.g. health system transfer command centers)[33] can benefit by initially focusing on high-risk target populations like those identified in this analysis. The random forest model identified several important variables not included in the individual tree, those being weight loss, congestive heart failure, and genitourinary conditions. The variable importance results reported in the random forest are the average results of many individual trees; many trees included the three omitted variables while others did not. Given that the CART tree represents an individual tree and sample (in this case, sample 789 out of 10 000), it is possible that variables identified in the random forest analysis are not represented in this specific tree.
Omission of these variables in the individual tree can be due to the greedy splitting procedure that identifies the best split at that particular point in the tree without considering the impact on the full model. Therefore, depending on the random sample chosen to run the CART, the tree for each sample can include different variables and split points. During the analytic process, we randomly selected the samples and “froze” them; otherwise, we would get a different training and testing sample each time the analysis was performed. The omission of the variables underlines the importance of running complementary or additional analyses when using atheoretical approaches. Our model had an area under the curve (AUC) of 0.69, which is reasonable performance for rare and difficult-to-predict events like mortality. This AUC is in line with other studies that have used the Elixhauser or Charlson comorbidity indices to predict mortality, in which AUCs ranged between 0.65 and 0.80.[34,35] It is difficult to compare AUC performance across studies that assess different patient populations, and to our knowledge, this is the first model to predict mortality among transferred patients of all diagnoses. Finally, employing the supervised machine-learning techniques provides distinct analytical advantages over traditional modeling techniques that we have used in past analyses. The primary advantage is the ability to assess all available covariates in every possible combination. Rather than identifying the influence of a given covariate while the others are held constant, the supervised machine-learning techniques employed allow us to test every possible combination of the covariates to identify clinically meaningful combinations and report those combinations in mutually exclusive groups capable of being easily incorporated into decision-support modeling or other approaches such as developing more precise clinical nomograms.
In addition, the mutually exclusive groups provide easily recognizable patient characteristics in specific combinations that are more descriptive than the odds of change in one variable while the others are held constant. For example, our past work employing regression identified that the odds of death increased with age, with age being included in the regression via seven categories.[1] In CART, by contrast, we are able to include age as a continuous variable and let the technique determine what the significant splits in age are for a given combination of characteristics. In Figure 1, for instance, age is split five different times in the tree, with each split signifying a significant difference in outcome for those patients above or below that age threshold. Attempting to identify these age categories via other approaches would be burdensome, if achievable at all.
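How a tree can choose an age cut-point from a continuous variable can be sketched as an exhaustive search over candidate thresholds, scored by information gain. The Python below is a simplified illustration under stated assumptions: the toy ages and outcomes are invented, the default minimum node size is 2 rather than the 50 used in this study, and only one variable and one split are considered.

```python
import math
from typing import List, Optional, Tuple


def entropy(deaths: int, n: int) -> float:
    """Binary entropy (in bits) of a node with `deaths` events among `n` records."""
    if n == 0 or deaths == 0 or deaths == n:
        return 0.0
    p = deaths / n
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_threshold(ages: List[float], died: List[int],
                   min_node_size: int = 2) -> Optional[Tuple[float, float]]:
    """Try every midpoint between consecutive distinct ages and return the
    (threshold, information gain) pair with the largest gain, or None."""
    rows = sorted(zip(ages, died))
    n, total = len(rows), sum(died)
    parent = entropy(total, n)
    best: Optional[Tuple[float, float]] = None
    left_deaths = 0
    for i in range(1, n):
        left_deaths += rows[i - 1][1]  # deaths among the i youngest records
        if rows[i][0] == rows[i - 1][0]:
            continue  # no threshold fits between identical ages
        if i < min_node_size or n - i < min_node_size:
            continue  # split would create a node that is too small
        children = (i / n) * entropy(left_deaths, i) \
            + ((n - i) / n) * entropy(total - left_deaths, n - i)
        gain = parent - children
        if best is None or gain > best[1]:
            best = ((rows[i - 1][0] + rows[i][0]) / 2, gain)
    return best


# Hypothetical toy data (not from the study): age and in-hospital death (1/0).
ages = [25, 31, 38, 44, 57, 63, 70, 78, 81, 88]
died = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
threshold, gain = best_threshold(ages, died)
print(threshold, round(gain, 3))
```

Applying the same search again within each resulting group is what lets a single tree split on age several times at different thresholds, in contrast to fixing age categories in advance.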

Limitations

Secondary analyses of existing databases present several limitations. First, we were only able to include basic demographic characteristics, the Elixhauser comorbidities, primary diagnosis via the CCS, and basic hospital descriptors. While the data are nationally representative, the lack of rich clinical descriptors limits the depth of the analyses and applicability of the findings. Second, primary diagnosis determination is complex and is influenced by the clinical course of care as well as coding for payment. This well-known limitation has been identified by others. Third, we included all patients that were transferred between hospitals, including groups of patients that on one end would not impact overall transfer mortality rates (e.g. mental health) and, on the other end, patients who exceeded the level of care available at their current hospital (i.e. community hospital) and had to be transferred to a tertiary center. Fourth, variables such as operating room procedure are only broad indicators of care and do not provide specificity in differentiating between normal and unexpected rates of mortality. However, the inclusion of operating room procedure across the models highlights the need to conduct further in-depth investigations into specifically which transfers and corresponding procedures impart improved morbidity and mortality, highlighting a strength of this broad approach to focus future inquiry. Finally, we do not know why each patient was transferred or the elements contributing to the decision. This will be the subject of future work.

Conclusions

This study analyzed a nationally representative sample of hospital discharges to identify groups of patients who experience increased mortality after undergoing interhospital transfer. The supervised machine-learning approach implemented identified 13 distinct groups of patients who experience post-transfer mortality more than double the national average mortality of post-transfer patients. Of the 13 groups, 10 experience mortality rates of 20% or greater, identifying specific groups of patients that may benefit from being transferred sooner based on their individual characteristics. The individual characteristics identified do not necessarily fall into the currently used categories of transfer patients, supporting the reconceptualization of which patient groups should be considered for immediate transfer to another hospital.

Supplemental material: Supplementary_Material_Table_1_xyz1364584bb4390 and Supplementary_Material_Table_2_xyz13645265ff583 for Applying Supervised Machine Learning to Identify Which Patient Characteristics Identify the Highest Rates of Mortality Post-Interhospital Transfer by Andrew P Reimer, Nicholas K Schiltz, Vanessa P Ho, Elizabeth A Madigan and Siran M Koroukian in Biomedical Informatics Insights.

26 in total

1. Classification and regression tree analysis in public health: methodological review and comparison with logistic regression. (Review)

Authors: Stephenie C Lemon; Jason Roy; Melissa A Clark; Peter D Friedmann; William Rakowski
Journal: Ann Behav Med Date: 2003-12

2. Comparison of the Elixhauser and Charlson/Deyo methods of comorbidity measurement in administrative data.

Authors: Danielle A Southern; Hude Quan; William A Ghali
Journal: Med Care Date: 2004-04 Impact factor: 2.983

3. Interhospital transfer of critically ill patients: demographic and outcomes comparison with nontransferred intensive care unit patients.

Authors: Andrea D Hill; Evelyn Vingilis; Claudio M Martin; Kathleen Hartford; Kathy N Speechley
Journal: J Crit Care Date: 2007-12 Impact factor: 3.425

4. A modification of the Elixhauser comorbidity measures into a point system for hospital death using administrative data.

Authors: Carl van Walraven; Peter C Austin; Alison Jennings; Hude Quan; Alan J Forster
Journal: Med Care Date: 2009-06 Impact factor: 2.983

5. Risk factors and clinical significance of trauma-induced coagulopathy in ICU patients with severe trauma.

Authors: Shan-Xiang Xu; Lian Wang; Guang-Ju Zhou; Mao Zhang; Jian-Xin Gan
Journal: Eur J Emerg Med Date: 2013-08 Impact factor: 2.799

6. Helicopter emergency medical services for adults with major trauma. (Review)

Authors: Samuel M Galvagno; Stephen Thomas; Christopher Stephens; Elliott R Haut; Jon M Hirshon; Douglas Floccare; Peter Pronovost
Journal: Cochrane Database Syst Rev Date: 2013-03-28

7. Does helicopter emergency medical service transfer offer benefit to patients with stroke?

Authors: Michael D Olson; Alejandro A Rabinstein
Journal: Stroke Date: 2011-12-08 Impact factor: 7.914

8. Impact of prehospital mode of transport after severe injury: a multicenter evaluation from the Resuscitation Outcomes Consortium.

Authors: Eileen M Bulger; Danielle Guffey; Francis X Guyette; Russell D MacDonald; Karen Brasel; Jeffery D Kerby; Joseph P Minei; Craig Warden; Sandro Rizoli; Laurie J Morrison; Graham Nichol
Journal: J Trauma Acute Care Surg Date: 2012-03 Impact factor: 3.313
