Abstract
Purpose: Hospital readmission prediction uses historical patient visit data to train machine learning models that predict the risk of a patient being readmitted after discharge. The data used to train these models, such as patient demographics, disease types, and localized distributions, play a significant role in model performance. Many methods exist for hospital readmission prediction, but several important questions remain open. For example, how do demographics such as gender, age, and geographic location affect readmission prediction? Do patients suffering from different diseases vary significantly in their readmission rates? What are the characteristics of nationwide hospital admission data? And how do hospital specialty, ownership, and location affect readmission rates? In this study, we carry out systematic investigations to answer these questions and propose a predictive modeling framework for disease-specific 30-day hospital readmission prediction.
Keywords: Classification; Disease-specific hospital readmission prediction; Ensemble learning; Nationwide readmissions database (NRD)
Year: 2022 PMID: 36065327 PMCID: PMC9439279 DOI: 10.1007/s13755-022-00195-7
Source DB: PubMed Journal: Health Inf Sci Syst ISSN: 2047-2501
Fig. 1 ICD-10-CM code structure. For example, the code S06.0X1A means “Concussion with loss of consciousness of 30 minutes or less, initial encounter”
Comparison between ICD-9-CM and ICD-10-CM Diagnosis Code Sets
| ICD-9-CM | ICD-10-CM |
|---|---|
| 14,025 codes | 69,823 codes |
| 3–5 characters | 3–7 characters |
| First character is alpha or numeric | First character is alpha, second character is numeric |
| Characters 2–5 are numeric | Characters 3–7 can be alpha or numeric |
| Decimal placed after the first three characters | Decimal placed after the first three characters |
| Lacks detail and laterality | Very specific and has laterality |
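The structural rules in the ICD-10-CM column can be expressed as a small shape check. The following is a minimal sketch, not an official validator: the function names `split_icd10cm` and `format_icd10cm` are hypothetical, and the pattern encodes only the rules listed in the table (letter, digit, 3 to 7 characters, decimal after the third character). It does not check whether a code exists in the ICD-10-CM code set.

```python
import re

# Shape taken from the comparison table: a letter, a digit, one more
# alphanumeric character, then an optional decimal point followed by
# 1-4 alphanumeric characters (3-7 characters in total, not counting the dot).
_ICD10CM_SHAPE = re.compile(r"([A-Z][0-9][A-Z0-9])(?:\.?([A-Z0-9]{1,4}))?")


def split_icd10cm(code):
    """Return (category, extension) if the code has a valid shape, else None."""
    match = _ICD10CM_SHAPE.fullmatch(code.strip().upper())
    if match is None:
        return None
    return match.group(1), match.group(2) or ""


def format_icd10cm(code):
    """Return the code with the decimal point after the third character."""
    parts = split_icd10cm(code)
    if parts is None:
        return None
    category, extension = parts
    return f"{category}.{extension}" if extension else category
```

For instance, `split_icd10cm("S06.0X1A")` yields the three-character category `S06` and the extension `0X1A`, while an ICD-9-CM style code such as `250.00` is rejected because its first character is numeric.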
Example of labeling patient visits
| Patient (NRD_VisitLink) | Visit | NRD_DaysToEvent | LOS (days) | Readmission label |
|---|---|---|---|---|
| 863245 | 1 | 1034 | 3 | 1 |
| 863245 | 2 | 1053 | 2 | 0 |
| 863245 | 3 | 1097 | 4 | 0 |
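The labels in this example are consistent with measuring the gap from the discharge day of one visit (days-to-event plus length of stay) to the admission day of the same patient's next visit. The sketch below reproduces the three rows under that assumption. The function name `label_readmissions`, the tuple layout, and the inclusive 30-day boundary are illustrative choices rather than details confirmed by the source.

```python
from collections import defaultdict


def label_readmissions(visits, window_days=30):
    """Label a visit 1 if the same patient is admitted again within the window.

    `visits` is a list of (patient_id, days_to_event, los) tuples. The gap is
    measured from discharge (days_to_event + los) to the next admission day.
    Returns a dict mapping (patient_id, days_to_event) to 0 or 1.
    """
    by_patient = defaultdict(list)
    for patient_id, days_to_event, los in visits:
        by_patient[patient_id].append((days_to_event, los))

    labels = {}
    for patient_id, stays in by_patient.items():
        stays.sort()  # chronological order by admission day
        for current, following in zip(stays, stays[1:]):
            discharge_day = current[0] + current[1]
            gap = following[0] - discharge_day
            # Assumption: a gap of exactly `window_days` still counts.
            labels[(patient_id, current[0])] = int(0 <= gap <= window_days)
        # The last recorded visit has no later admission to count.
        labels[(patient_id, stays[-1][0])] = 0
    return labels


example = [(863245, 1034, 3), (863245, 1053, 2), (863245, 1097, 4)]
labels = label_readmissions(example)
```

Visit 1 is discharged on day 1037 and followed by an admission on day 1053 (a 16-day gap), so it is labeled 1. Visit 2 is discharged on day 1055 and followed by an admission on day 1097 (a 42-day gap), so it is labeled 0.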
A summary of NRD patient admission
| Categories | Number (%) |
|---|---|
| Effective admission total | 15,722,444 |
| 30-Day readmission | 1,834,786 (11.67%) |
| Not 30-day readmission | 13,887,658 (88.33%) |
| Unique patient total | 11,691,620 |
| Patient with single visit | 9,335,277 (79.85%) |
| Patient with multiple visits | 2,356,343 (20.15%) |
| Patient visit total | 15,722,444 |
| Male patient visits | 6,630,005 (42.17%) |
| Female patient visits | 9,092,439 (57.83%) |
Fig. 2 Gender readmission rate difference with respect to different age groups
Fig. 3 Readmission rate comparison with respect to different payment methods
Fig. 4 Total annual hospital discharge
Fig. 5 Percentage of admissions with LOS > 5 days
Fig. 6 Average number of ICD-10-CM codes in each visit
Fig. 7 Average number of ICD-10-PCS codes in each visit
Readmission distributions for the top 10 APRDRG in NRD
| Admission reason | Readmission rate (%) | Revisit rate (%) |
|---|---|---|
| Vaginal delivery | 0.048 | 0.168 |
| Septicemia & disseminated infections | 3.983 | 9.184 |
| Neonate birthwt > 2499 g, normal newborn or neonate w other problem | 0.848 | 0.847 |
| Cesarean delivery | 0.013 | 0.062 |
| Heart disease | 8.696 | 19.500 |
| Knee joint replacement | 0.392 | 5.775 |
| Other pneumonia | 1.800 | 4.654 |
| Chronic obstructive pulmonary disease (COPD) | 6.990 | 16.684 |
| Hip joint replacement | 1.088 | 5.222 |
| Cardiac arrhythmia & conduction disorders | 3.662 | 7.868 |
Readmission distributions for the top seven leading diseases of death
| Leading diseases | Readmission rate (%) | Revisit rate (%) |
|---|---|---|
| Heart disease | 8.092 | 17.873 |
| Stroke | 2.448 | 3.770 |
| Pneumonia | 1.832 | 4.738 |
| COPD | 6.990 | 16.684 |
| Cancer | 6.823 | 12.275 |
| Diabetes | 8.761 | 14.372 |
| Nephritis & nephrosis | 7.019 | 10.595 |
Fig. 8 Readmission rate for leading diseases of death with respect to median household income (ZIP 1 to 4 denote increasing income levels)
Factors of interest analyzed in NRD database
| Aspect | Factors of interest |
|---|---|
| Demographic | Gender; age; payment (insurance) |
| Hospital | Bed size; ownership |
| Disease | Disease type; ZIP code (household income) |
Features created for disease-specific hospital readmission prediction
| Feature type | Feature | Description | Feature size and domain |
|---|---|---|---|
| Demographics feature | AGE | Patient’s age | |
| | FEMALE | Patient’s gender (binary, ‘1’ is female) | |
| | PAY1 | Payment method | |
| | PL_NCHS | Patient’s location (based on NCHS Urban-Rural Code) | |
| | ZIPINC_QRL | Estimated median household income in the patient’s ZIP code | |
| | RESIDENT | Patient’s location (‘1’: the patient is from the same state as the hospital) | |
| Admission and discharge feature | AWEEKEND | Admission day (‘1’: the admission day is a weekend) | |
| | MONTH | Patient’s discharge month | |
| | QUARTER | Patient’s discharge quarter | |
| | DISPUNIFORM | Disposition of patients | |
| | LOS | Length of the hospital stay | |
| | ELECTIVE | Binary, ‘1’ represents elective admission | |
| | REHAB | Binary, ‘1’ is rehab transfer | |
| | WEIGHT | Weight to discharges in AHA universe | |
| | CHARGES | Patient’s total inpatient charges | |
| | 1st VISIT | Binary, ‘1’ means the first hospital visit | |
| Clinical feature | CCSR Code | Clinical categories | |
| Disease feature | APR-DRG | Patient admission reason | |
| | RISK | The mortality risk | |
| | SEVERITY | The severity of illness | |
| Hospital feature | BEDSIZE | Hospital bed size | |
| | CONTROL | Hospital ownership | |
| | URU | Hospital urban-rural designation | |
| | AVE_CHARGE | Average charge amount per patient visit at the hospital | |
| | AVE_CM | Average number of ICD-CM codes per patient visit at the hospital | |
| | AVE_PCS | Average number of ICD-PCS codes per patient visit at the hospital | |
| | PER_LOS | Percentage of admissions with LOS longer than 5 days | |
| | DIS/UNI | Sample discharges/Universe discharges in NRD_STRATUM | |
| | DIS/BED | Total hospital discharges/hospital bed size | |
Fig. 9 CCSR (Clinical Classifications Software Refined) code structure. For example, the code INJ008 indicates “Traumatic brain injury (TBI); concussion, initial encounter”
An example of ICD-10-CM to CCSR mapping
| ICD-10-CM code | ICD-10-CM code description | CCSR category | CCSR description |
|---|---|---|---|
| S42022D | Displaced fracture of shaft of left clavicle, subsequent encounter for fracture with routine healing | INJ041 | Fracture of the upper limb, subsequent encounter |
| S42022G | Displaced fracture of shaft of left clavicle, subsequent encounter for fracture with delayed healing | INJ041 | Fracture of the upper limb, subsequent encounter |
| S42022K | Displaced fracture of shaft of left clavicle, subsequent encounter for fracture with nonunion | INJ041 | Fracture of the upper limb, subsequent encounter |
| S42022P | Displaced fracture of shaft of left clavicle, subsequent encounter for fracture with malunion | INJ041 | Fracture of the upper limb, subsequent encounter |
Correspondence between ICD-10-CM and CCSR Categories by Body System
| ICD-10-CM | Body system description | CCSR |
|---|---|---|
| A, B | Infectious and parasitic diseases | INF |
| C | Neoplasms | NEO |
| D | Neoplasms, blood, blood-forming organs | BLD |
| E | Endocrine, nutritional, metabolic | END |
| F | Mental and behavioral disorders | MBD |
| G | Nervous system | NVS |
| H | Eye and adnexa, ear and mastoid process | EYE/EAR |
| I | Circulatory system | CIR |
| J | Respiratory system | RSP |
| K | Digestive system | DIG |
| L | Skin and subcutaneous tissue | SKN |
| M | Musculoskeletal and connective tissue | MUS |
| N | Genitourinary system | GEN |
| O | Pregnancy, childbirth and the puerperium | PRG |
| P | Certain conditions originating in the perinatal period | PNL |
| Q | Congenital malformations, deformations and chromosomal abnormalities | MAL |
| R | Symptoms, signs and abnormal clinical and lab findings | SYM |
| S/T | Injury, poisoning, certain other consequences of external causes | INJ |
| U | No codes listed; will be used for emergency code additions | |
| V, W, X, Y | External causes of morbidity (home care will only have to code how patient was hurt; other settings will also code where injury occurred, what activity patient was doing) | EXT |
| Z | Factors influencing health status and contact with health services (similar to current “V-codes”) | FAC |
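The chapter-level correspondence in this table can be written as a lookup from the first letter of an ICD-10-CM code to a CCSR body-system prefix. This is only a sketch of the table as printed: the names `CCSR_PREFIX_BY_LETTER` and `body_system_prefix` are hypothetical, and the letter-level lookup is an assumption for illustration. Assigning a full CCSR category such as INJ041 requires a per-code mapping, which a first-letter lookup cannot provide.

```python
# First letter of an ICD-10-CM code -> CCSR body-system prefix, copied from
# the table above. "H" is listed with two prefixes (EYE/EAR) and "U" has no
# prefix listed, so "U" is deliberately absent from the dictionary.
CCSR_PREFIX_BY_LETTER = {
    **dict.fromkeys("AB", "INF"),
    "C": "NEO", "D": "BLD", "E": "END", "F": "MBD", "G": "NVS",
    "H": "EYE/EAR", "I": "CIR", "J": "RSP", "K": "DIG", "L": "SKN",
    "M": "MUS", "N": "GEN", "O": "PRG", "P": "PNL", "Q": "MAL",
    "R": "SYM",
    **dict.fromkeys("ST", "INJ"),
    **dict.fromkeys("VWXY", "EXT"),
    "Z": "FAC",
}


def body_system_prefix(icd10cm_code):
    """Return the body-system prefix for a code, or None if none is listed."""
    code = icd10cm_code.strip().upper()
    if not code:
        return None
    return CCSR_PREFIX_BY_LETTER.get(code[0])
```

With this lookup, `body_system_prefix("S42022D")` returns `"INJ"`, which agrees with the INJ041 category shown in the mapping example for that code.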
Fig. 10 (a) Distribution of ICD-10-CM codes across all pneumonia patient visits. The x-axis shows the ICD-10-CM codes ranked in descending order of frequency. The y-axis shows the frequency of each code on a log scale. (b) Distribution of the CCSR codes converted from the ICD-10-CM codes in (a). The x-axis shows the CCSR codes ranked in descending order of frequency. The y-axis shows the frequency on a log scale
APR-DRG codes selected for the six studied diseases
| Disease | Components | APR-DRG | Feature |
|---|---|---|---|
| Heart disease | Heart &/or lung transplant | 2 | 10 |
| | Major cardiothoracic repair of heart anomaly | 160 | 11 |
| | Cardiac defibrillator & heart assist implant | 161 | 12 |
| | Permanent cardiac pacemaker implant w AMI, heart failure or shock | 170 | 13 |
| | Perm cardiac pacemaker implant w/o AMI, heart failure or shock | 171 | 14 |
| | Heart failure | 194 | 15 |
| Cancer | Nervous system malignancy | 41 | 20 |
| | Respiratory malignancy | 136 | 21 |
| | Digestive malignancy | 240 | 22 |
| | Malignancy of hepatobiliary system & pancreas | 281 | 23 |
| | Musculoskeletal malignancy & pathol fracture d/t muscskel malig | 343 | 24 |
| | Kidney & urinary tract malignancy | 461 | 25 |
| | Malignancy, male reproductive system | 500 | 26 |
| | Uterine & adnexa procedures for ovarian & adnexal malignancy | 511 | 27 |
| | Female reproductive system malignancy | 530 | 28 |
| Stroke | Intracranial hemorrhage | 44 | 44 |
| | CVA & precerebral occlusion w infarct | 45 | 45 |
| | Nonspecific CVA & precerebral occlusion w/o infarct | 46 | 46 |
| Pneumonia | Bronchiolitis & RSV pneumonia | 138 | 138 |
| | Other pneumonia | 139 | 139 |
| Diabetes | Diabetes | 420 | 420 |
| COPD | COPD | 130 | 30 |
Total sample number and sample ratio in six disease datasets
| Datasets | Total sample number | Negative:positive sample ratio |
|---|---|---|
| COPD | 327,269 | 10.88 |
| Heart disease | 582,058 | 10.16 |
| Cancer | 171,495 | 12.3 |
| Diabetes | 183,726 | 10.4 |
| Pneumonia | 358,001 | 7.38 |
| Stroke | 273,395 | 45 |
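The captions in this record refer to sampling ratios such as 1.1:1 without describing the sampling procedure. One common way to reach a target negative:positive ratio on imbalanced data like this is random undersampling of the negative class. The sketch below assumes that reading. The function name `undersample_negatives`, the interpretation of the ratio as negative:positive, and the fixed seed are all assumptions, not the authors' confirmed method.

```python
import random


def undersample_negatives(labels, ratio=1.1, seed=0):
    """Keep every positive and about `ratio` negatives per positive.

    `labels` is a sequence of 0/1 readmission labels. Returns the sorted
    indices of the retained samples.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than are available.
    n_keep = min(len(negatives), round(ratio * len(positives)))
    kept_negatives = rng.sample(negatives, n_keep)
    return sorted(positives + kept_negatives)


# Toy data with a 10:1 negative:positive ratio.
toy_labels = [1] * 10 + [0] * 100
subset = undersample_negatives(toy_labels, ratio=1.1)
```

On the toy data the subset keeps all 10 positives and 11 negatives. Drawing several subsets with different seeds would give each member of an ensemble its own balanced training set, which is one plausible use of such a ratio.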
Fig. 11 Hard voting vs. soft voting performance on all six disease-specific datasets and 12 sampling ratios. Points are color coded by classifier and shape coded by dataset. Points above the diagonal line denote hard voting outperforming soft voting, and vice versa
Fig. 12 Performance comparisons using soft voting and different sampling ratios. Points are color coded by dataset and shape coded by classifier. Each curve denotes one classifier’s performance on a specific dataset, using different sampling ratios
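The difference between the two voting schemes compared in these figures can be shown on a single sample. In this minimal sketch, each ensemble member outputs a probability of readmission. Hard voting thresholds each member and takes the majority label, whereas soft voting averages the probabilities and thresholds once. The function names, the 0.5 threshold, and the tie rule are illustrative assumptions.

```python
def soft_vote(probabilities, threshold=0.5):
    """Average the positive-class probabilities, then threshold once."""
    mean_probability = sum(probabilities) / len(probabilities)
    return int(mean_probability >= threshold)


def hard_vote(probabilities, threshold=0.5):
    """Threshold each member first, then return the majority label.

    Assumption: a tie (possible with an even number of members) goes to 0.
    """
    positive_votes = sum(p >= threshold for p in probabilities)
    return int(2 * positive_votes > len(probabilities))


# One confident member and two mildly negative members (made-up values).
member_probabilities = [0.9, 0.4, 0.4]
soft_label = soft_vote(member_probabilities)
hard_label = hard_vote(member_probabilities)
```

Here the two schemes disagree: the mean probability is about 0.57, so soft voting predicts readmission, while only one of three members votes positive, so hard voting does not. Soft voting lets a confident member outweigh weakly held opposing votes.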
Readmission prediction performance comparisons using all samples (using soft voting and 1.1:1 sampling ratio)
| Measure | Disease | Decision tree | Random forest | Logistic regression | Gradient boosting |
|---|---|---|---|---|---|
| Accuracy | COPD | 0.4659 | 0.7301 | 0.7300 | |
| | Cancer | 0.5260 | 0.7509 | 0.7536 | |
| | Diabetes | 0.6898 | 0.8163 | 0.8070 | |
| | Heart Disease | 0.4631 | 0.6983 | 0.7025 | |
| | Pneumonia | 0.5705 | 0.6964 | 0.7192 | |
| | Stroke | 0.6261 | 0.8244 | 0.8263 | |
| F1 score | COPD | 0.1791 | 0.2414 | 0.2376 | |
| | Cancer | 0.1866 | 0.2700 | 0.2814 | |
| | Diabetes | 0.3173 | 0.4119 | 0.4152 | |
| | Heart Disease | 0.1889 | 0.2350 | 0.2209 | |
| | Pneumonia | 0.2941 | 0.3607 | 0.3585 | |
| | Stroke | 0.0828 | 0.1482 | 0.1558 | |
| AUC | COPD | 0.5957 | 0.6767 | 0.6604 | |
| | Cancer | 0.6568 | 0.7527 | 0.7596 | |
| | Diabetes | 0.8113 | 0.8753 | 0.8543 | |
| | Heart Disease | 0.5958 | 0.6732 | 0.6406 | |
| | Pneumonia | 0.6919 | 0.7542 | 0.7645 | |
| | Stroke | 0.7594 | 0.8597 | 0.8484 | |
| Balanced Accuracy | COPD | 0.5687 | 0.6303 | 0.6250 | |
| | Cancer | 0.6176 | 0.6882 | 0.6979 | |
| | Diabetes | 0.7500 | 0.7906 | 0.7682 | |
| | Heart Disease | 0.5691 | 0.6168 | 0.5954 | |
| | Pneumonia | 0.6481 | 0.6894 | 0.7003 | |
| | Stroke | 0.7023 | 0.7808 | 0.7672 | |
Bold text denotes the best performance for each measure-disease combination (i.e., each row)
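Accuracy, F1 score, and balanced accuracy can diverge sharply on imbalanced data, which is why the table reports them side by side. The sketch below computes all three from confusion-matrix counts using their standard definitions. The counts in the example are invented for illustration (a 10:1 negative:positive split) and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1 and balanced accuracy from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "f1": f1,
        # Balanced accuracy is the mean of sensitivity and specificity.
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Made-up counts: 100 readmitted and 1000 non-readmitted visits.
metrics = binary_metrics(tp=60, fp=300, tn=700, fn=40)
```

With these invented counts, accuracy is about 0.69 and balanced accuracy is 0.65, yet F1 is only about 0.26 because the 300 false positives dominate precision. Low F1 next to moderate accuracy is the expected pattern when the positive class is rare.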
Fig. 13 Critical difference diagram of classifiers on the six disease-specific hospital readmission prediction tasks (based on results from Table 12). All plots use the same significance level. The two numbers inside the parentheses denote the test statistic and p value for each plot. Classifiers that are not significantly different (i.e., their average ranks do not differ by CD) are grouped together with a horizontal bar
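A critical difference (CD) diagram compares the average ranks of classifiers across datasets. The sketch below assumes the widely used Nemenyi post-hoc form, CD = q_alpha * sqrt(k(k+1)/(6N)), for k classifiers and N datasets. The caption does not state which test or significance level was used, so the formula choice and the value `q_alpha = 2.569` (the tabulated Nemenyi value for four classifiers at a 0.05 level) are assumptions. The scores in the example are invented.

```python
import math


def average_ranks(score_rows):
    """Average rank per classifier (1 = best); tied scores share a mean rank.

    `score_rows` holds one list of scores per dataset, higher being better.
    """
    n = len(score_rows[0])
    totals = [0.0] * n
    for row in score_rows:
        order = sorted(range(n), key=lambda j: -row[j])
        pos = 0
        while pos < n:
            end = pos
            while end + 1 < n and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared_rank = (pos + end) / 2 + 1  # mean of ranks pos+1..end+1
            for j in order[pos:end + 1]:
                totals[j] += shared_rank
            pos = end + 1
    return [total / len(score_rows) for total in totals]


def nemenyi_critical_difference(q_alpha, n_classifiers, n_datasets):
    """CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    k = n_classifiers
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


# Assumed setting: four classifiers, six datasets, q_alpha = 2.569.
cd = nemenyi_critical_difference(2.569, n_classifiers=4, n_datasets=6)
ranks = average_ranks([[0.60, 0.70, 0.65], [0.50, 0.80, 0.80]])
```

Under these assumptions the CD is about 1.91, so two classifiers would be grouped under one bar when their average ranks differ by less than that amount. With only six datasets the threshold is wide, which makes it hard to separate classifiers.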