Tianzhong Yang1,2, Yang Yang3, Yugang Jia1, Xiao Li1,2.
Abstract
BACKGROUND: Congestive heart failure is one of the most common reasons that people aged 65 and over are hospitalized in the United States, and it causes a considerable economic burden. Precise prediction of near-future hospitalization caused by congestive heart failure could help prevent hospitalization, optimize medical resources, and better meet the healthcare needs of patients.
Keywords: Claim data; Congestive heart failure; Dynamic prediction; Hospitalization; Random survival forest; Sliding window; Survival analysis
Year: 2019 PMID: 30700290 PMCID: PMC6354329 DOI: 10.1186/s12911-019-0734-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Descriptive Statistics of Baseline Features of CHF cohort with an index date of Feb 1, 2014
| | Without CHF hospitalization | Encountered CHF hospitalization by the end of study^a | Distribution significance (p-value)^b |
|---|---|---|---|
| # of members (% of total) | 4910 (94.7%) | 273 (5.3%) | |
| Demographic Status | | | |
| Gender (% of Male in total) | 43.50% | 47.60% | 0.199 |
| Ethnicity (# of members in 0/1/2/3/4/5/6 categories)^c | 22/4470/261/51/65/38/3 | 0/245/20/4/2/2/0 | 0.62 |
| Age (mean age) | 80.8 | 82.6 | 0.004 |
| Beneficiary Medicare Status Code (# of members in 10/11/20/21/31/NA categories)^d | 4372/141/351/31/4/11 | 236/20/13/2/1/1 | 0.0015 |
| Beneficiary Dual Status Code (# of members in 01/02/03/04/06/08/NA categories)^e | 21/261/25/59/28/366/4150 | 1/16/1/3/0/13/239 | 0.62 |
| Co-morbidity Status | | | |
| Has Hypertension % | 94.10% | 96.00% | 0.252 |
| Has Pulmonary circulation disorders % | 20.30% | 40.30% | 7.39 × 10^-15 |
| Has Chronic pulmonary disease % | 49.60% | 69.60% | 1.86 × 10^-10 |
| Has Diabetes % | 48.60% | 54.20% | 0.082 |
| Has Rheumatic arthritis/collagen vascular diseases % | 13.10% | 9.90% | 0.15 |
| Has Renal failure % | 38.00% | 57.50% | 1.76 × 10^-10 |
| Has Liver disease % | 11.00% | 11.00% | 1 |
| Has Psychoses % | 12.80% | 12.80% | 1 |
| Has Depression % | 32.50% | 33.70% | 0.73 |
| Has Obesity % | 27.00% | 34.10% | 0.013 |
| # of non-cardiac co-morbidity (mean #) | 8.9 | 10.4 | 2.16 × 10^-10 |
| Charlson Index Score (mean score) | 4.89 | 5.85 | 2.88 × 10^-07 |
| Other Status | | | |
| Distance-to-closest-healthcare-facility (mean distance in miles) | 8.53 | 10.37 | 7.85 × 10^-12 |
| Past 12 months total medical charge (mean in dollars) | 23,361.56 | 32,248.25 | 0.00016 |
^a If a member had more than one admission during the total study window, only the earliest event is counted
^b T-test for continuous variables; chi-square or Fisher's exact test for categorical variables
^c Ethnicity code values: 0 = Unknown; 1 = White; 2 = Black; 3 = Other; 4 = Asian; 5 = Hispanic; 6 = North American Native
^d Reason for a beneficiary's entitlement to Medicare benefits as of a particular date, in the following categories: 10 = Aged without End Stage Renal Disease (ESRD); 11 = Aged with ESRD; 20 = Disabled without ESRD; 21 = Disabled with ESRD; 31 = ESRD only; NA = Not Available
^e Identifies the most recent entitlement status of beneficiaries eligible for a program(s) in addition to Medicare (e.g., Medicaid). See the Dual Status Codes at https://www.resdac.org/cms-data/variables/medicare-medicaid-dual-eligibility-code-january
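As an illustration of the categorical test named in the footnotes, the following minimal sketch runs a Pearson chi-square test on a 2x2 table for one comorbidity (pulmonary circulation disorders). The counts are rebuilt from the rounded percentages in the table, and no continuity correction is applied; both are assumptions, so the result should only roughly agree with the reported p-value. The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Group sizes come from the table; counts with the condition are approximations
# reconstructed from the rounded percentages (20.3% and 40.3%).
no_hosp_total, hosp_total = 4910, 273
no_hosp_with = round(no_hosp_total * 0.203)
hosp_with = round(hosp_total * 0.403)

stat, p_value = chi_square_2x2(
    no_hosp_with, no_hosp_total - no_hosp_with,
    hosp_with, hosp_total - hosp_with,
)
print(f"chi-square = {stat:.1f}, p = {p_value:.2e}")
```

For sparse categories (such as some ethnicity or status codes), Fisher's exact test would be the more appropriate choice, as the footnote indicates.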
Fig. 1 Illustration of feature extraction. Features are extracted at the index date or during the 3 months, 6 months, 12 months, or more prior to it. In our Dynamic Random Survival Forest (DRSF) model, the index date corresponds to the beginning date of each prediction window
Features pool used in predictive modeling
| Feature Category | Features |
|---|---|
| Demographics | Age, Gender, Race. |
| Socioeconomics | Medicare status code, Beneficiary Dual Status code |
| Chronic condition^a | Any selected chronic conditions^1; Count of selected chronic conditions^1; Charlson Index Score [ |
| Health care service^b | Count of a specific health care service utilization, including ED visit^3, inpatient admission^3, SNF stay, HHA stay and outpatient physician visit. |
| Acute exacerbation record^b | Count of ED visit or inpatient admission with selected exacerbation conditions^2. |
| DME utilization^b | Any DME usage; any oxygen-related DME usage. |
| Disease-specific procedure and service^c | Any cardio echo test; any spirometry test; any general pulmonary function test. |
| Medication^d | Count of unique prescription. |
| Location^e | Most recent care location prior to admission, including home, HHA, SNF, Inpatient and Outpatient |
| Cost^c | Total Expenditure |
^1 See Additional file 1: Table S1 for chronic conditions used in CHF predictive models
^2 See Additional file 1: Table S2 for exacerbation conditions used in CHF predictive models
^3 We included both the all-cause and disease-specific ED visit/inpatient admission
^a Such features were collected during the 36-month window prior to the index date
^b Such features were collected during the 6-month window prior to the index date
^c Such features were collected during the 12-month window prior to the index date
^d Such features were collected during the 3-month window prior to the index date
^e Such features were collected during the 1-month window prior to the index date
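The look-back windows in these footnotes can be sketched as a simple count of dated events before an index date. This is a minimal illustration, not the authors' extraction code: the names `LOOKBACK_MONTHS`, `months_before`, and `count_in_window` are hypothetical, the visit dates are made up, and treating the window as half-open (start inclusive, index date exclusive) is an assumption.

```python
import calendar
from datetime import date

# Look-back lengths in months per feature category, taken from the footnotes.
LOOKBACK_MONTHS = {
    "chronic_condition": 36,
    "health_care_service": 6,
    "acute_exacerbation_record": 6,
    "dme_utilization": 6,
    "disease_specific_procedure": 12,
    "cost": 12,
    "medication": 3,
    "location": 1,
}


def months_before(d, months):
    """Return the date `months` calendar months before `d` (day clamped to month length)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def count_in_window(event_dates, index_date, months):
    """Count events with window_start <= event date < index_date (assumed half-open)."""
    start = months_before(index_date, months)
    return sum(1 for e in event_dates if start <= e < index_date)


# Made-up ED visit dates for one hypothetical member.
ed_visits = [date(2013, 7, 15), date(2013, 9, 3), date(2014, 1, 20), date(2014, 2, 1)]
index_date = date(2014, 2, 1)

ed_count_6m = count_in_window(ed_visits, index_date, LOOKBACK_MONTHS["health_care_service"])
print(ed_count_6m)  # visits on 2013-09-03 and 2014-01-20 fall inside the window
```

Because the window is anchored to the index date, re-running the same function with a later index date yields the updated features for the next sliding window.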
Fig. 2 Illustration of the Random Survival Forest (RSF) model. Features and samples are selected at random for each tree. Log-rank splitting is used to grow the tree. At the end of each branch, a cumulative hazard function is calculated for the selected individuals. Finally, the ensemble estimate of the cumulative hazard function is calculated by averaging over all the trees
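The last two steps in the caption (a cumulative hazard function per terminal node, then averaging over trees) can be sketched as follows. This is a toy illustration under stated assumptions: tree growing and log-rank splitting are omitted, the Nelson-Aalen estimator is assumed for the node-level cumulative hazard (the usual choice in RSF), and the two "terminal nodes" contain made-up data. The function names are hypothetical.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard for one terminal node.

    Returns a list of (time, H(time)) pairs at the distinct event times.
    `events` holds 1 for an observed event and 0 for censoring.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    cum_hazard = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects that share the same observed time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            cum_hazard += n_events / n_at_risk
            curve.append((t, cum_hazard))
        n_at_risk -= n_leaving
    return curve


def chf_at(curve, t):
    """Evaluate the step function H at time t (0 before the first event)."""
    value = 0.0
    for time, h in curve:
        if time > t:
            break
        value = h
    return value


def ensemble_chf(curves, t):
    """Average the per-tree cumulative hazards at time t."""
    return sum(chf_at(c, t) for c in curves) / len(curves)


# Terminal nodes that one subject falls into in two hypothetical trees.
node_tree_1 = nelson_aalen([30, 60, 90, 180], [1, 0, 1, 0])
node_tree_2 = nelson_aalen([45, 120, 180], [1, 1, 0])

h_100 = ensemble_chf([node_tree_1, node_tree_2], 100)
print(round(h_100, 4))
```

In a real forest each subject is dropped down every tree, and the curve of the terminal node it reaches is what enters the average.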
Fig. 3 Demonstration of the Dynamic Random Survival Forest (DRSF) model with sliding windows. The black lines represent different subjects at risk. The red triangles represent the onset of an adverse event, e.g. hospital admission. The red box represents the prediction window of interest, while the blue box represents the corresponding prediction window used for model training and validation. At the beginning of the red and blue windows, i.e., the index dates, features are auto-extracted for model building and prediction (see section “Features and Outcome”). The black and green boxes are historical sliding windows, trained and predicted in the same way as the blue and red boxes but earlier on the time axis. The green boxes and the red box are then combined for the ensemble estimation of the hazard function and survival function at different time points within the red box
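The window layout described in the caption can be sketched as a schedule of paired index dates. This is a minimal sketch with assumptions: index dates fall on the first of a month, windows slide by one month, and each evaluation window starts six months after its training/testing window (as the paired index dates reported for the evaluation suggest). The names `shift_months` and `sliding_windows` are hypothetical, and the way predictions from several windows are combined is not reproduced here.

```python
from datetime import date


def shift_months(first_of_month, months):
    """Shift a first-of-month date by a number of calendar months."""
    if first_of_month.day != 1:
        raise ValueError("index dates are assumed to fall on the first of a month")
    total = first_of_month.year * 12 + (first_of_month.month - 1) + months
    year, month0 = divmod(total, 12)
    return date(year, month0 + 1, 1)


def sliding_windows(first_training_index, n_windows, horizon_months=6, step_months=1):
    """Return (training index date, evaluation index date) pairs for each window."""
    windows = []
    for k in range(n_windows):
        train_index = shift_months(first_training_index, k * step_months)
        eval_index = shift_months(train_index, horizon_months)
        windows.append((train_index, eval_index))
    return windows


windows = sliding_windows(date(2014, 2, 1), n_windows=8)
for train_index, eval_index in windows:
    print(train_index.isoformat(), "->", eval_index.isoformat())
```

A model would be fit at each training index date with features re-extracted at that date, so that the most recent windows reflect the most recent claims.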
AUC and C-statistics for DRSF model with the most recent window
| Training/Testing window index date | Evaluation window index date | # of subjects in evaluation | # of events in evaluation | Mean time to event | AUC at the 60th day | AUC at the 180th day | Harrell’s C statistic | # of covariates^1 |
|---|---|---|---|---|---|---|---|---|
| Feb 1, 2014 | Aug 1, 2014 | 5175 | 263 | 175.17 | 0.67^a (0.61,0.73)^b | 0.64 (0.60,0.67) | 0.66 (0.63,0.70) | 14 |
| Mar 1, 2014 | Sep 1, 2014 | 5143 | 250 | 175.14 | 0.67 (0.61,0.72) | 0.67 (0.64,0.71) | 0.68 (0.65,0.72) | 17 |
| Apr 1, 2014 | Oct 1, 2014 | 5069 | 247 | 175.07 | 0.70 (0.65,0.75) | 0.68 (0.65,0.71) | 0.68 (0.65,0.71) | 14 |
| May 1, 2014 | Nov 1, 2014 | 4988 | 230 | 175.31 | 0.65 (0.60,0.71) | 0.66 (0.63,0.70) | 0.69 (0.66,0.72) | 20 |
| Jun 1, 2014 | Dec 1, 2014 | 4958 | 225 | 175.30 | 0.66 (0.61,0.71) | 0.65 (0.62,0.70) | 0.69 (0.65,0.72) | 24 |
| Jul 1, 2014 | Jan 1, 2015 | 4261 | 201 | 175.20 | 0.74 (0.68,0.79) | 0.69 (0.66,0.73) | 0.71 (0.67,0.74) | 16 |
| Aug 1, 2014 | Feb 1, 2015 | 4233 | 176 | 175.70 | 0.71 (0.65,0.77) | 0.69 (0.65,0.72) | 0.71 (0.67,0.74) | 24 |
| Sep 1, 2014 | Mar 1, 2015 | 4222 | 172 | 175.85 | 0.61 (0.55,0.67) | 0.65 (0.61,0.69) | 0.68 (0.64,0.71) | 14 |
A larger AUC or C statistic indicates better model prediction performance
^1 Number of covariates selected by the RSF model
^a Score obtained with the original dataset
^b 95% confidence interval of the score obtained with 500 bootstrapped datasets
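To make the reported quantities concrete, the sketch below computes Harrell's C statistic on the original data and a 95% interval from 500 bootstrap resamples. It is an illustration only: the data are made up, the percentile method for the interval is an assumption (the footnotes do not say which bootstrap interval was used), and the function names are hypothetical.

```python
import random


def harrell_c(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when subject i had an observed event and
    times[i] < times[j]. Ties in risk count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def bootstrap_ci(times, events, risks, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Harrell's C (assumed interval type)."""
    rng = random.Random(seed)
    n = len(times)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = harrell_c([times[k] for k in idx],
                      [events[k] for k in idx],
                      [risks[k] for k in idx])
        if c == c:  # skip resamples with no comparable pair (NaN)
            scores.append(c)
    scores.sort()
    lower = scores[int((alpha / 2) * len(scores))]
    upper = scores[min(len(scores) - 1, int((1 - alpha / 2) * len(scores)))]
    return lower, upper


# Made-up follow-up times (days), event indicators, and predicted risk scores.
times = [20, 50, 80, 120, 180, 180]
events = [1, 1, 0, 1, 0, 0]
risks = [0.9, 0.4, 0.5, 0.6, 0.2, 0.1]

c_stat = harrell_c(times, events, risks)
ci_low, ci_high = bootstrap_ci(times, events, risks)
print(f"C = {c_stat:.2f} ({ci_low:.2f},{ci_high:.2f})")
```

The time-dependent AUC at day 60 or day 180 needs an additional estimator for censored outcomes and is not covered by this sketch.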
Fig. 4 Comparison of the prediction power of the DRSF model with 5 windows, the penalized Cox regression model with the most recent window (1 window), and the batch-mode RSF model, using the testing window from March 1st, 2015 to August 31st, 2015. The black line is the AUC curve of DRSF with 5 windows, the red line is the AUC curve of penalized Cox regression with 1 window, and the green line is the AUC curve of the batch-mode RSF model. The x-axis represents the number of days since March 1st, 2015. The y-axis represents the Area-Under-the-ROC-Curve (AUC)
AUC and C-statistics for DRSF models with different numbers of ensemble windows and benchmark models on the testing window covering days from March 1st, 2015 to August 31st, 2015
| Models | AUC at the 60th day | AUC at the 180th day | Harrell’s C statistics |
|---|---|---|---|
| Batch-mode RSF | 0.67^a (0.61,0.72)^b | 0.66 (0.63,0.70) | 0.67 (0.63,0.71) |
| Batch-mode Cox^1 | 0.72 (0.65,0.76) | 0.72 (0.69,0.76) | 0.72 (0.69,0.76) |
| Cox^1 with 1 window | 0.63 (0.63,0.74) | 0.70 (0.66,0.74) | 0.71 (0.67,0.74) |
| PLS with 1 window^2 | NA | NA | 0.71 (0.67,0.74) |
| DRSF with 1 window | 0.61 (0.55,0.67) | 0.65 (0.61,0.69) | 0.68 (0.64,0.71) |
| DRSF with 2 windows | 0.67 (0.62,0.71) | 0.68 (0.65,0.71) | 0.70 (0.67,0.72) |
| DRSF with 3 windows | 0.69 (0.66,0.72) | 0.68 (0.63,0.72) | 0.70 (0.67,0.73) |
| DRSF with 4 windows | 0.67 (0.63,0.72) | 0.70 (0.67,0.73) | 0.70 (0.67,0.73) |
| DRSF with 5 windows | 0.71 (0.66,0.75) | 0.71 (0.68,0.74) | 0.71 (0.68,0.74) |
Note: A larger AUC or C statistic indicates better model prediction performance
^1 Cox: a penalized Cox proportional hazards model was used here
^2 PLS with 1 window: the cumulative time-dependent AUC at a specific time point is not available for the penalized logistic regression model, so NA (Not Available) is reported
^a Score obtained with the original dataset
^b 95% confidence interval of the score obtained with 500 bootstrapped datasets
Fig. 5 Variables selected in different testing windows, with the index date ranging from August 1st, 2014 to March 1st, 2015 in increments of one month. A highlighted block indicates that a specific variable (variable name annotated along the rightmost y-axis) was selected at a specific window (window index date annotated along the x-axis) by the DRSF model with 3 windows. The leftmost y-axis represents how often a variable was chosen among the eight available windows. The description of each variable is listed in Additional file 1: Table S3
Fig. 6 Dynamic monitoring of the risk of hospitalization due to congestive heart failure. The red dashed line represents the instantaneous risk (ensemble hazard rate) of a randomly selected subject who actually had an admission event on the 120th day (marked as the red triangle on the horizontal line); the grey solid line represents the instantaneous risk of another randomly selected subject who did not have such an event during the prediction window