Zhen Zhang, Hang Qiu, Weihao Li, Yucheng Chen.
Abstract
BACKGROUND: Acute myocardial infarction (AMI) is a serious cardiovascular disease that is followed by a high readmission rate within 30 days of discharge. Accurate prediction of AMI readmission is a crucial way to identify high-risk patients and optimize the distribution of medical resources.
Keywords: Acute myocardial infarction; Clinical data; Hospital readmission; Machine learning; Self-adaptive; Stacking-based model learning
Year: 2020 PMID: 33317534 PMCID: PMC7734833 DOI: 10.1186/s12911-020-01358-w
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Process flow diagram of the proposed stacking model
Fig. 2 Flow diagram of the selection process
Description of clinical variables
| Category | Number of variables | Examples |
|---|---|---|
| Demographics | 8 | Sex, Age, Ethnicity, Occupation, Marital status, Address, Physical condition of mother, Physical condition of father |
| Hospitalization information | 5 | Length of stay, The month of discharge, Payment Method, Admission condition, Admission pathway |
| Medical history | 5 | History of infection, History of trauma, History of surgery, History of allergy, History of blood transfusion |
| Past hospitalization history | 5 | Frequency of hospitalizations in the past 1 week, Frequency of hospitalizations in the past 1 month, Frequency of hospitalizations in the past 3 months, Frequency of hospitalizations in the past 6 months, Frequency of hospitalizations in the past 1 year |
| Comorbidities (ICD-10) | 25 | e.g., Hypertensive diseases (I10-I15), Diabetes mellitus (E10-E14), Renal failure (N17-N19), Malignant neoplasms (C00-C97), Diseases of liver (K70-K77), The total number of comorbidities |
| Physical examinations | 14 | e.g., Heart rate, Respiratory rate, Body temperature, Pulse, Edema, Cardiac Murmurs |
| Procedures (ICD-9-CM-3) | 11 | e.g., Procedures on blood vessels (00.6), Angiocardiography using contrast material (88.5), Intravascular imaging of blood vessels (00.2), Puncture of vessel (38.9) |
| Cost information | 17 | e.g., Total expenses, Treatment expenses, Western medicine expenses, Bed expenses, Board expenses, Surgery expenses |
| Ultrasonic examinations | 19 | e.g., Ejection Fraction, Interventricular septal thickness, Stroke volume |
| Laboratory tests | 168 | e.g., Calcium max, Calcium min, Calcium median, Hemoglobin max, Hemoglobin min, Hemoglobin median |
| Medications | 16 | e.g., β-receptor blockers, Calcium channel blockers, Angiotensin converting enzyme inhibitors, Angiotensin receptor blockers, Statins, Diuretics |
ICD-10 the 10th revision of the International Statistical Classification of Diseases, ICD-9-CM-3 the International Classification of Diseases, 9th revision, Clinical Modification (operations and procedures)
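The laboratory-test category above expands each analyte into max, min, and median aggregates (168 variables in total). A minimal pandas sketch of that aggregation, assuming hypothetical long-format columns (patient_id, test_name, value) that are not taken from the paper:

```python
# Sketch: derive per-admission "Calcium max/min/median"-style features
# from raw long-format lab records. Column names are illustrative.
import pandas as pd

labs = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2],
    "test_name":  ["calcium", "calcium", "hemoglobin", "calcium", "hemoglobin"],
    "value":      [2.1, 2.4, 131.0, 2.2, 118.0],
})

# Pivot to one row per admission with max/min/median per test,
# mirroring the 168 aggregated laboratory variables in the table.
features = (
    labs.groupby(["patient_id", "test_name"])["value"]
        .agg(["max", "min", "median"])
        .unstack("test_name")
)
features.columns = [f"{test}_{stat}" for stat, test in features.columns]
print(features)
```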
Fig. 3 a NCR treatment when the sample to be analyzed belongs to the majority subset; b NCR treatment when the sample to be analyzed belongs to the minority subset. Green balls represent majority samples; red balls represent minority samples; the green and red triangles represent the majority and minority samples under analysis, respectively; the samples inside the dotted ellipse are the sample to be analyzed and its three closest neighbors
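The NCR step in Fig. 3 is the Neighbourhood Cleaning Rule under-sampler. A minimal sketch using imbalanced-learn, assuming n_neighbors=3 to match the three closest neighbors shown in the figure, on a synthetic cohort standing in for the AMI data:

```python
# Sketch of the NCR under-sampling step from Fig. 3 using
# imbalanced-learn; n_neighbors=3 matches the three closest
# neighbours in the figure, all other settings are defaults.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import NeighbourhoodCleaningRule

# Synthetic imbalanced data standing in for the AMI cohort
# (the paper reports an imbalance ratio of roughly 1:6.7).
X, y = make_classification(n_samples=3283, weights=[0.87, 0.13],
                           random_state=0)

ncr = NeighbourhoodCleaningRule(n_neighbors=3)
X_res, y_res = ncr.fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))
```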
The parameters of the eight candidate models
| Model | Parameters |
|---|---|
| DT | |
| SVM | |
| RF | |
| ET | |
| GBDT | |
| ADB | |
| Bagging | |
| XGB |
DT Decision tree, SVM Support vector machine, RF Random forest, ET Extra trees, GBDT Gradient boosting decision tree, ADB AdaBoost, Bagging Bootstrap aggregating, XGB Extreme gradient boosting
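The eight candidates map directly onto scikit-learn and XGBoost estimators. Because the tuned parameter values did not survive in this record, the sketch below instantiates them with library defaults; probability=True on SVC is an addition needed for AUC scoring and stacking:

```python
# The eight candidate learners from the table, instantiated with
# scikit-learn / XGBoost. The paper's tuned hyperparameters are not
# recoverable from this record, so library defaults are shown.
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, AdaBoostClassifier,
                              BaggingClassifier)
from xgboost import XGBClassifier

candidates = {
    "DT":      DecisionTreeClassifier(),
    "SVM":     SVC(probability=True),  # probabilities needed for AUC/stacking
    "RF":      RandomForestClassifier(),
    "ET":      ExtraTreesClassifier(),
    "GBDT":    GradientBoostingClassifier(),
    "ADB":     AdaBoostClassifier(),
    "Bagging": BaggingClassifier(),
    "XGB":     XGBClassifier(eval_metric="logloss"),
}
```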
Fig. 4 a Framework of the stacking-based model; b Five-fold stacking process of classifier M
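The five-fold scheme in Fig. 4 can be approximated with scikit-learn's StackingClassifier, where cv=5 produces the out-of-fold meta-features. The logistic-regression meta-learner is an assumption (this record does not name the final estimator), `candidates`, `X_res`, and `y_res` come from the sketches above, and the paper additionally selects base models self-adaptively rather than stacking all eight:

```python
# Sketch of the five-fold stacking scheme in Fig. 4. cv=5 reproduces
# the five-fold generation of out-of-fold meta-features; the
# logistic-regression meta-learner is an assumed choice.
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# `candidates` is the dict of base models from the previous sketch;
# the paper self-adaptively keeps only the better-performing ones.
stack = StackingClassifier(
    estimators=list(candidates.items()),
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
    stack_method="predict_proba",
)
auc = cross_val_score(stack, X_res, y_res, cv=5, scoring="roc_auc")
print(auc.mean())
```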
Results for the eight candidate models before and after NCR treatment
| Model | AUC (before) | AUC (after) | Sensitivity (before) | Sensitivity (after) |
|---|---|---|---|---|
| DT | 0.664 ± 0.02 | | 0.287 ± 0.09 | |
| SVM | 0.621 ± 0.01 | | 0.161 ± 0.02 | |
| RF | 0.700 ± 0.02 | | 0.338 ± 0.04 | |
| ET | 0.705 ± 0.02 | | 0.289 ± 0.04 | |
| GBDT | 0.698 ± 0.02 | | 0.338 ± 0.04 | |
| ADB | 0.680 ± 0.03 | | 0.351 ± 0.04 | |
| Bagging | 0.700 ± 0.02 | | 0.399 ± 0.04 | |
| XGB | 0.702 ± 0.02 | | 0.371 ± 0.04 | |
Bold font: the better values; *: statistically significant difference between before and after NCR treatment (p-value < 0.05). DT Decision tree, SVM Support vector machine, RF Random forest, ET Extra trees, GBDT Gradient boosting decision tree, ADB AdaBoost, Bagging Bootstrap aggregating, XGB Extreme gradient boosting
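The asterisk marks a paired comparison of per-fold scores at p < 0.05. The record does not name the test; one conventional choice is a Wilcoxon signed-rank test over repeated cross-validation AUCs, sketched here with synthetic fold scores:

```python
# Sketch of a before/after significance check. The arrays below are
# synthetic per-fold AUCs, not the paper's data, and the Wilcoxon
# signed-rank test is an assumed (not confirmed) choice of test.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
auc_before = rng.normal(0.664, 0.02, size=10)  # hypothetical fold AUCs
auc_after = rng.normal(0.700, 0.02, size=10)   # hypothetical fold AUCs

stat, p = wilcoxon(auc_before, auc_after)
print(f"p = {p:.3f}  significant at 0.05: {p < 0.05}")
```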
Fig. 5 Box plot of the AUC for the eight candidate models before and after SFM. °: outliers of the box plot; *: statistically significant difference between before and after SFM (p-value < 0.05). DT: decision tree; SVM: support vector machine; RF: random forest; ET: extra trees; GBDT: gradient boosting decision tree; ADB: AdaBoost; Bagging: bootstrap aggregating; XGB: extreme gradient boosting
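SFM is scikit-learn's SelectFromModel, which keeps the features whose importance in a fitted estimator exceeds a threshold. A sketch reusing `X_res` and `y_res` from the NCR sketch; the random-forest ranking estimator is an assumption:

```python
# Sketch of the SFM feature-selection step behind Fig. 5. Any fitted
# estimator exposing feature_importances_ or coef_ works; the default
# threshold keeps features above the mean importance.
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier

sfm = SelectFromModel(RandomForestClassifier(random_state=0))
X_sel = sfm.fit_transform(X_res, y_res)
print(X_res.shape[1], "->", X_sel.shape[1], "features kept")
```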
Performance comparisons of our stacking model and the eight candidate models
| Model | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| DT | 0.681 ± 0.02 | 0.768 ± 0.03 | 0.487 ± 0.06 | |
| SVM | 0.707 ± 0.03 | 0.765 ± 0.01 | 0.480 ± 0.03 | 0.808 ± 0.01 |
| RF | 0.701 ± 0.02 | 0.768 ± 0.01 | 0.502 ± 0.05 | 0.807 ± 0.01 |
| ET | 0.709 ± 0.02 | 0.760 ± 0.02 | 0.500 ± 0.03 | 0.798 ± 0.02 |
| GBDT | 0.710 ± 0.02 | 0.764 ± 0.02 | 0.501 ± 0.04 | 0.803 ± 0.02 |
| ADB | 0.702 ± 0.02 | 0.769 ± 0.03 | 0.502 ± 0.03 | 0.809 ± 0.03 |
| Bagging | 0.704 ± 0.02 | 0.769 ± 0.01 | 0.512 ± 0.03 | 0.808 ± 0.01 |
| XGB | 0.713 ± 0.02 | 0.768 ± 0.02 | 0.513 ± 0.03 | 0.806 ± 0.02 |
Bold font: the optimal values. DT Decision tree, SVM Support vector machine, RF Random forest, ET Extra trees, GBDT Gradient boosting decision tree, ADB AdaBoost, Bagging Bootstrap aggregating, XGB Extreme gradient boosting
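For reference, the four reported metrics follow from the ROC curve and the confusion matrix: sensitivity is recall on the readmitted class and specificity is recall on the non-readmitted class. A sketch with placeholder labels and scores:

```python
# How the four reported metrics can be computed for one model.
# y_true/y_pred/y_score are placeholder values, not study data.
from sklearn.metrics import (roc_auc_score, accuracy_score,
                             confusion_matrix)

y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.1, 0.2, 0.6, 0.9, 0.4, 0.3, 0.8, 0.2]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("AUC        ", roc_auc_score(y_true, y_score))
print("Accuracy   ", accuracy_score(y_true, y_pred))
print("Sensitivity", tp / (tp + fn))  # recall on readmitted class
print("Specificity", tn / (tn + fp))  # recall on non-readmitted class
```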
Comparison of our study and previous works
| Author | Balance method | Feature selection | Samples | Variables | IR | Method | AUC |
|---|---|---|---|---|---|---|---|
| Krumholz et al. | no | Stepwise logistic regression | 200,750 | 103 | 1:4.29 | LR | 0.630 |
| Yu et al. | no | no | 844 | unknown | 1:3.76 | linear-SVM | 0.660 |
| Gupta et al. | no | no | 7018 | 192 | 1:5.12 | GBDT | 0.641 |
| Ours | NCR | SFM | 3283 | 293 | 1:6.72 | Stacking | 0.720 |
IR Imbalance ratio, NCR Neighborhood cleaning rule, SFM SelectFromModel, LR Logistic regression, linear-SVM Linear support vector machine, GBDT Gradient boosting decision tree
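Putting the stages together, imbalanced-learn's Pipeline applies the NCR sampler only while fitting, so cross-validated AUC is estimated without resampling the test folds. A hedged end-to-end sketch reusing `stack`, `X`, and `y` from the earlier sketches; every setting beyond those shown is a default, not the paper's tuned value:

```python
# End-to-end sketch of the NCR -> SFM -> stacking pipeline.
# imblearn's Pipeline applies the sampler during fit only, keeping
# cross-validation test folds at their original imbalance.
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import NeighbourhoodCleaningRule
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

pipe = Pipeline([
    ("ncr", NeighbourhoodCleaningRule(n_neighbors=3)),
    ("sfm", SelectFromModel(RandomForestClassifier(random_state=0))),
    ("stack", stack),  # StackingClassifier from the earlier sketch
])
print(cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())
```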