Samir E AbdelRahman, Mingyuan Zhang, Bruce E Bray, Kensaku Kawamoto.
Abstract
BACKGROUND: The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time.
Year: 2014 PMID: 24886637 PMCID: PMC4074427 DOI: 10.1186/1472-6947-14-41
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Independent explanatory variables
| Variable | Values |
|---|---|
| Gender | Female, Male |
| Religion | Assembly Of God, Atheist, Baptist, Buddhist, Catholic, Christian, Church Of Christ, Episcopalian, Greek Orthodox, Islamic, Jehovah's Witness, Jewish, Latter Day Saints, Lutheran, Methodist, Missing, Muslim, Native, No spiritual preference/needs, Non-Denominational, Not Verified, Pentecostal, Presbyterian, Protestant, Seventh Day Adventist, Spiritual, Not Religious, Unable To Answer, Unitarian, Other |
| Marital status | Divorced, Legally Separated, Life/Domestic Partner, Married, Single, Unknown, Widowed |
| Race | African American, American Indian-Alaska, Asian, Hawaiian-Pacific Islander, Missing, Others, Patient Refused, Unknown, White Or Caucasian |
| Hospital service | Bone Marrow Transplant, Cardiology, Cardiothoracic Surgery, Emergency, General Surgery, Hematology/Oncology, Internal Medicine, Nephrology, Observation, Pulmonology, Rheumatology, Transplant Study, Vascular Surgery |
| Discharge disposition | Dsch/Xfer Court/Law Enforcement, Expired, Federal Hospital, Home Health Care Svc, Home Or Self Care, Hospice/Homem, Hospice/Medical Facility, Intermediate Care Facility, Left Against Medical Advice, Long Term Care, Psychiatric Hospital, Rehab Facility, Skilled Nursing Facility |
| Insurance/finance class | Agencies, Champus, Commercial, Facility, Grants&Studies, Medicaid, Medicare, Pehp, Self Pay, UT Misc Government, UT Workers Comp, Healthcare Network |
| Laboratory tests, CCS categories, CharlsonIndexF, vital signs, age at admission, zip code, LOS, and the prior 6-month emergency department variables | Numeric values |
Explanatory variables are noted in bold and their values in normal font. Variable acronyms are listed in bold parentheses. Each laboratory variable is presented as name (acronym)-unit of measure. The table includes gender, religion, marital status, race, hospital service, discharge disposition, and insurance/finance class. It also includes the laboratory tests, Clinical Classifications Software (CCS) categories for ICD-9-CM diagnoses, Charlson Index frequency (CharlsonIndexF), vital signs, age at admission, zip code, length of stay (LOS), and three prior 6-month variables: emergency department frequency, mean Charlson Index frequency, and mean length of stay in the emergency department.
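The three prior 6-month variables are lookback features computed relative to each index admission. The sketch below shows one way they could be derived from encounter records; it is a minimal illustration, not the authors' code. The `EDVisit` record, the 182-day approximation of "6 months", the zero default for patients without prior emergency department visits, and the names `PriorEDFreq` and `PriorMeanCharlson` are all assumptions (only `PriorEDLOS` appears in the source).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import mean


@dataclass
class EDVisit:
    """Hypothetical ED encounter record; field names are illustrative only."""
    visit_date: date
    los_days: float        # ED length of stay (unit assumed to be days)
    charlson_index: float  # Charlson Index recorded at that visit


# Assumption: "prior 6 months" is approximated as a 182-day window.
LOOKBACK = timedelta(days=182)


def prior_ed_features(admit_date: date, visits: list[EDVisit]) -> dict[str, float]:
    """Summarize ED visits in the lookback window that ends just before admission."""
    window = [v for v in visits if admit_date - LOOKBACK <= v.visit_date < admit_date]
    if not window:
        # Assumption: patients with no prior ED visits get zeros, not missing values.
        return {"PriorEDFreq": 0, "PriorMeanCharlson": 0.0, "PriorEDLOS": 0.0}
    return {
        "PriorEDFreq": len(window),
        "PriorMeanCharlson": mean(v.charlson_index for v in window),
        "PriorEDLOS": mean(v.los_days for v in window),
    }


# Hypothetical visits for one patient (not study data).
visits = [
    EDVisit(date(2012, 1, 10), 0.5, 2),
    EDVisit(date(2012, 3, 1), 1.5, 4),
    EDVisit(date(2011, 1, 1), 2.0, 6),  # outside the window, so it is ignored
]
features = prior_ed_features(date(2012, 4, 1), visits)
```

Restricting the window to visits strictly before the admission date keeps information from the index stay itself out of the "prior" features.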
Figure 1. Diagram of the proposed general approach and its implementation in the congestive heart failure (CHF) readmission case. The upper part of the figure shows the steps of the proposed general approach, and the lower part shows their CHF readmission counterparts. The three steps of the approach (pre-processing, systematic model development, and risk factor analysis) are shown in bold italics and are bounded by red and blue lines with shaded backgrounds for the proposed general approach and the CHF readmission case study, respectively.
Figure 2. Frequencies of the values of gender, emergency department, marital status, and race. The datasets cover 2003–2012, 2008–2012, 2009–2012, and 2013 (6 months). The X axis shows the values/days of the variable and the Y axis shows the corresponding frequencies.
Figure 3. Frequencies of the values of hospital service, insurance/finance class, discharge disposition, and religion. The datasets cover 2003–2012, 2008–2012, 2009–2012, and 2013 (6 months). The X axis shows the values of the variable and the Y axis shows the corresponding frequencies.
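Figures 2 and 3 compare value frequencies across periods of very different length (multi-year derivation datasets versus a 6-month 2013 dataset), so raw counts are not directly comparable. A minimal sketch of comparing per-period proportions instead, as one way to flag category values whose distribution changes over time; the functions and the example values are hypothetical and are not taken from the study.

```python
from collections import Counter


def value_proportions(values: list[str]) -> dict[str, float]:
    """Relative frequency of each category value within one dataset."""
    if not values:
        return {}
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def max_proportion_shift(periods: dict[str, list[str]]) -> dict[str, float]:
    """Largest difference in a value's proportion across the given dataset periods.

    A value absent from a period counts as proportion 0 in that period.
    """
    props = [value_proportions(vals) for vals in periods.values()]
    all_values = set().union(*(p.keys() for p in props))
    return {
        v: max(p.get(v, 0.0) for p in props) - min(p.get(v, 0.0) for p in props)
        for v in all_values
    }


# Hypothetical marital-status values for two periods (not study data).
periods = {
    "2009-2012": ["Married", "Single", "Married", "Widowed"],
    "2013": ["Married", "Single", "Single", "Single"],
}
shift = max_proportion_shift(periods)
```

Values with a large shift (here "Single") are the ones whose relationship to readmission is most likely to differ between a derivation period and a later validation period.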
Correlation of derivation and validation datasets
| 0.099 | 0.135 | 0.175 |
| ≤0.001 | ≤0.001 | 0.002 |
| 0.406 | 0.499 | 0.632 |
| ≤0.001 | ≤0.001 | ≤0.001 |
| ≤0.001 | ≤0.001 | 0.009 |
| 0.023 | ≤0.001 | ≤0.001 |
| (64.39/15.8) | (64.38/15.89) | (63.79/15.82) |
| 0.011 | ≤0.001 | ≤0.001 |
| (93.2/205.64) | (93.65/206.78) | (95.73/196.36) |
| 0.011 | ≤0.001 | ≤0.001 |
| (65650/19673.84) | (65630/19599.2) | (65580/19612) |
| 0.02 | 0.01 | 0.01 |
| 0.19 | 0.236 | 0.43 |
| ≤0.001 | ≤0.001 | ≤0.001 |
| (7.46/9.16) | (7.56/9.33) | (7.73/196.36) |
| ≤0.001 | ≤0.001 | ≤0.001 |
| (4.87/2.84) | (4.87/2.83) | (4.92/2.89) |
| ≤0.001 | ≤0.001 | ≤0.001 |
| (0.062-4.94)/(0.270-2.55) | (0.065-5.01)/(0.277-2.55) | (0.069-5.03)/(0.287-2.6) |
| ≤0.001 | ≤0.001 | ≤0.001 |
| (0.025-1430)/(0.047-130.57) | (0.026-1456)/(0.047-130.41) | (0.026-1446)/(0.049-132.82) |
| ≤0.001 | ≤0.001 | ≤0.001 |
| (110.5/21.65) | (110.7/21.4) | (110.7/21.4) |
| (79.71/15.92) | (79.88/15.98) | (79.77/15.91) |
| (92.57/31.34) | (92.57/31.57) | (93.18/31.35) |
| (89.93/42.47) | (90.1/43.23) | (90.88/44.57) |
| | | |
| ≤0.001 | ≤0.001 | ≤0.001 |
| (2.99/7.5) | (3.04/7.615) | (3.08/7.7) |
| ≤0.001 | ≤0.001 | ≤0.001 |
| ≤0.001 | ≤0.001 | ≤0.001 |
| (0.16/0.17) | (0.15/0.13) | (0.15/0.13) |
| 608/141 | 597/132 | 471/121 |
| 0.0004 | 0.0002 | 0.0001 |
For each numeric variable, the p-value is given together with the mean and standard deviation, written as (mean/standard deviation). For the laboratory tests and the 18 CCS categories, the ranges of the means and the ranges of their associated standard deviations are listed instead. All variable acronyms are from Table 1.
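The source does not state which statistical test produced these p-values. As one illustration of how a difference between two datasets could be tested from (mean/standard deviation) summaries alone, here is a minimal sketch of a large-sample Welch test. The choice of test, the normal approximation in place of the t distribution, and the function name are assumptions; the sample sizes must be supplied separately because they are not part of the summaries.

```python
from math import sqrt
from statistics import NormalDist


def welch_z_pvalue(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> float:
    """Two-sided p-value for a difference in means, from summary statistics.

    Uses the Welch (unequal-variance) standard error with a large-sample
    normal approximation instead of the t distribution. Assumes at least one
    standard deviation is positive so the standard error is nonzero.
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical summaries (not study data): a small shift in a large sample
# can still give a very small p-value.
p = welch_z_pvalue(10.0, 1.0, 100, 9.0, 1.0, 100)
```

Because the standard error shrinks with sample size, small p-values in large operational datasets can reflect differences that are statistically detectable but modest in size.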
Performance characteristics of final models
| | | | | | | |
| 80.3 | 87.23 | 43.8 | 91.3 | 31.82 | 94.58 | |
| 83 | 85.64 | 43.8 | 89.5 | 28 | 94.48 | |
| | | | | | | |
| | | | | | | |
| 83.5 | 86.7 | 62.5 | 89 | 38.46 | 96.23 | |
| 85 | 86.17 | 56.3 | 89 | 32.14 | 95.63 |
The models were developed using the strategy outlined in Figure 1. Results are shown for the three derivation datasets (2003–2012, 2008–2012, and 2009–2012), each without and with discretization. Sens. = sensitivity, Spec. = specificity, PPV = positive predictive value, NPV = negative predictive value. Bold rows indicate the best results.
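Sensitivity, specificity, PPV, and NPV are standard functions of a classifier's confusion matrix at a fixed decision threshold. A minimal sketch of those definitions, expressed as percentages; the `ConfusionCounts` type and the example counts are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # readmitted and predicted readmitted
    fp: int  # not readmitted but predicted readmitted
    tn: int  # not readmitted and predicted not readmitted
    fn: int  # readmitted but predicted not readmitted


def _pct(num: int, den: int) -> float:
    """Percentage, or NaN when the denominator is zero."""
    return 100.0 * num / den if den else float("nan")


def performance(c: ConfusionCounts) -> dict[str, float]:
    """Threshold-based performance metrics expressed as percentages."""
    return {
        "accuracy": _pct(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": _pct(c.tp, c.tp + c.fn),
        "specificity": _pct(c.tn, c.tn + c.fp),
        "ppv": _pct(c.tp, c.tp + c.fp),
        "npv": _pct(c.tn, c.tn + c.fn),
    }


# Hypothetical counts (not study data).
metrics = performance(ConfusionCounts(tp=8, fp=12, tn=160, fn=8))
```

When readmissions are much rarer than non-readmissions, PPV can remain low even at high specificity, because false positives are drawn from the much larger non-readmitted group, while NPV tends to be high.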
Figure 4. AUC (area under the receiver operating characteristic curve) of the highest-performing model and its component classifiers. The derivation dataset covered 2008–2012, and 47 independent variables were used with discretization.
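AUC equals the probability that a randomly chosen readmitted case receives a higher score than a randomly chosen non-readmitted case. A minimal sketch of that pairwise definition, plus a hypothetical ensemble that averages its component classifiers' scores; the averaging rule is an assumption, since the caption does not say how the components are combined.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_scores(*component_scores: list[float]) -> list[float]:
    """Hypothetical ensemble: mean of the component classifiers' scores per case."""
    return [sum(case) / len(case) for case in zip(*component_scores)]


# Hypothetical scores for five cases (not study data).
labels = [1, 1, 0, 0, 0]
component_a = [0.9, 0.4, 0.5, 0.2, 0.1]
component_b = [0.6, 0.7, 0.3, 0.4, 0.2]
ensemble = average_scores(component_a, component_b)
```

The nested loop is O(P·N) and is meant only to make the definition explicit; for large datasets a rank-based (Mann–Whitney) computation gives the same value more efficiently.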
Final status of categorical and accumulated discretized explanatory variables
| 0.073 | White Or Caucasian | 401 | 95 | | | |
| | Others | 156 | 37 | | | |
| 0.85 | Latter Day Saints | 201 | 51 | | | |
| | Others | 356 | 81 | | | |
| ≤0.001 | Home Or Self Care | 348 | 77 | 2 | 1 | 1 |
| | Others | 209 | 55 | | | |
| ≤0.001 | Cardiology | 366 | 72 | 8 | 2 | 7 |
| | Others | 191 | 60 | | | |
| 0.12 | Medicare | 347 | 73 | | | |
| | Others | 210 | 59 | | | |
| 0.001 | 1 | 250 | 41 | 13 | 6 | 12 |
| | Others | 307 | 91 | | | |
| ≤0.001 | ≤ 83.5 | 475 | 129 | 3 | 8 | 5 |
| | >83.5 | 82 | 3 | | | |
| ≤0.001 | ≤ 5 | 330 | 68 | | | |
| | >5 | 227 | 64 | | | |
| ≤0.001 | ≤ 44 | 407 | 86 | | | |
| | >44 | 150 | 46 | | | |
| ≤0.001 | | | | | | |
| ≤0.001 | ≤0.235 | 382 | 110 | 11 | 13 | 13 |
| | >0.235 | 175 | 22 | | | |
Variables included as risk factors in the final model are shown in non-italic font; variables excluded from the model are shown in italics. The dataset used is the 2008–2012 derivation dataset. The count under Yes is the frequency of the variable value when readmission occurred, and the count under No is its frequency when readmission did not occur. G/W, Info, and Sym are the relative ranks of the most significant variables (relative weight ≥ 0.001) under the GainRatioAttributeEval/Wrapper, InfoGainAttributeEval, and SymmetricalUncertAttributeEval ranking strategies, respectively. G/W combines two strategies because they produced identical relative ranks.
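The Info, G/W (gain ratio), and Sym rankings score each attribute A against the class C using entropy: information gain IG = H(C) − H(C|A), gain ratio = IG / H(A), and symmetrical uncertainty SU = 2·IG / (H(C) + H(A)). A minimal sketch of these three formulas for one categorical attribute, applied to two Yes/No count pairs from the table above; the function names are hypothetical, and Weka's evaluators additionally discretize numeric attributes and handle missing values, which this sketch does not.

```python
from math import log2


def entropy(counts: list[int]) -> float:
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def attribute_scores(table: list[list[int]]) -> dict[str, float]:
    """Score one categorical attribute against the class.

    table[i][j] = number of cases with attribute value i and class j.
    """
    n = sum(sum(row) for row in table)
    h_class = entropy([sum(col) for col in zip(*table)])
    h_attr = entropy([sum(row) for row in table])
    h_class_given_attr = sum(sum(row) / n * entropy(row) for row in table)
    info_gain = h_class - h_class_given_attr
    return {
        "info_gain": info_gain,
        "gain_ratio": info_gain / h_attr if h_attr > 0 else 0.0,
        "sym_uncert": 2 * info_gain / (h_class + h_attr) if h_class + h_attr > 0 else 0.0,
    }


# Count pairs from the table above (2008-2012 derivation dataset). The scores
# do not depend on which class column is Yes and which is No.
discharge = attribute_scores([[348, 77], [209, 55]])  # Home Or Self Care vs Others
race = attribute_scores([[401, 95], [156, 37]])        # White Or Caucasian vs Others
```

Gain ratio and symmetrical uncertainty both normalize information gain by the attribute's own entropy, which reduces the bias of raw information gain toward attributes with many distinct values.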
Final status of numeric explanatory variables: laboratory variables
| <0.0001 | 15.86 | 2.26 | 1 | 9 | 4 | |
| | | | ||||
| | | | ||||
| <0.0001 | 136.2 | 4.54 | | | | |
| <0.0001 | 1.52 | 1.05 | | | | |
| <0.0001 | 4.13 | 0.52 | | | | |
| <0.0001 | 11.73 | 2.42 | 5 | 3 | 2 | |
| <0.0001 | 8.91 | 1.08 | | | | |
| <0.0001 | 250 | 114.28 | | | | |
| <0.0001 | 29.23 | 3 | | | | |
| <0.0001 | 89.61 | 7.27 | | | | |
| <0.0001 | 4.04 | 0.85 | 5 | 7 | 8 | |
| <0.0001 | 36 | 7.28 | 6 | 4 | 3 | |
| | | | ||||
| <0.0001 | 32.6 | 1.52 | | | | |
| | | | ||||
| <0.0001 | 0.44 | 0.31 | 4 | 6 | 7 | |
| <0.0001 | 2.38 | 2.4 | | | | |
| <0.0001 | 18.11 | 9.01 | | | | |
| <0.0001 | 6.5 | 2.14 | | | | |
| | | | ||||
| <0.0001 | 0.02 | 0.04 | | | | |
| <0.0001 | 1.42 | 0.74 | | | | |
| <0.0001 | 0.53 | 0.23 | | | | |
| <0.0001 | 0.18 | 0.19 | | | | |
| <0.0001 | 6.73 | 0.83 | | | | |
| <0.0001 | 1.09 | 2.19 | | | | |
| <0.0001 | 3.49 | 0.55 | | | | |
| | | | ||||
| | | | ||||
| | | | ||||
| | | | ||||
| <0.0001 | 4.66 | 2.64 | 9 | 11 | 9 | |
| | | | ||||
| <0.0001 | 0.35 | 0.69 | | | | |
| <0.0001 | 0.35 | 0.66 | 12 | 10 | 11 | |
| <0.0001 | 110.8 | 21.08 | | | | |
| <0.0001 | 80.16 | 15.78 | | | | |
| <0.0001 | 91.36 | 31.92 | | | | |
| <0.0001 | 88.87 | 46.21 | | | | |
| <0.0001 | 3.34 | 8.06 | 10 | 12 | 10 |
Variables included as risk factors in the final model are shown in non-italic font; variables excluded from the model are shown in italics. The dataset used is the 2008–2012 derivation dataset. G/W, Info, and Sym are the relative ranks of the most significant variables (relative weight ≥ 0.001) under the GainRatioAttributeEval/Wrapper, InfoGainAttributeEval, and SymmetricalUncertAttributeEval ranking strategies, respectively. G/W combines two strategies because they produced identical relative ranks. All ranked variables are listed with their ranges.
Figure 5. Readmissions stratified by discretized mean family income and proximity. The X axis shows the intervals of the variable and the Y axis shows the corresponding frequencies.
Figure 6. Readmissions stratified by discretized age, length of stay (LOS), and prior 6-month mean emergency department length of stay (PriorEDLOS). The X axis shows the values/intervals of the variable and the Y axis shows the corresponding frequencies.
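The discretized variables use binary intervals such as ≤ 83.5 / > 83.5 and ≤ 0.235 / > 0.235, but this record does not say which discretizer produced the cut points. A minimal sketch of one common supervised approach, choosing the single threshold that maximizes information gain with respect to readmission, with candidate thresholds at midpoints between consecutive distinct values. This is a simplification: Weka's supervised Discretize filter uses the Fayyad–Irani MDL method by default, which applies such splits recursively and uses an MDL criterion to decide when to stop. The function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Optional


def label_entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_binary_cut(values: list[float], labels: list[int]) -> tuple[Optional[float], float]:
    """Return (threshold, information gain) of the best split "<= t" versus "> t".

    Candidate thresholds are midpoints between consecutive distinct sorted values;
    (None, 0.0) means no split improves on the unsplit labels.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = label_entropy([y for _, y in pairs])
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lower, upper = pairs[i - 1][0], pairs[i][0]
        if lower == upper:
            continue  # no threshold can separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        remainder = len(left) / n * label_entropy(left) + len(right) / n * label_entropy(right)
        gain = base - remainder
        if gain > best_gain:
            best_t, best_gain = (lower + upper) / 2, gain
    return best_t, best_gain


# Hypothetical ages and outcomes (not study data).
threshold, gain = best_binary_cut([80, 82, 83, 84, 86], [1, 1, 1, 0, 0])
```

With integer-valued inputs such as age in years, midpoint candidates naturally produce half-integer thresholds like 83.5.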