Yujuan Shang, Kui Jiang, Lei Wang, Zheqing Zhang, Siwei Zhou, Yun Liu, Jiancheng Dong, Huiqun Wu
Abstract
BACKGROUND AND OBJECTIVES: Diabetes mellitus is a major chronic disease, and poor disease control leads to hospital readmissions. Here we established and compared machine learning (ML)-based methods for predicting the readmission risk of diabetic patients.
Keywords: Diabetes; Machine learning; Prediction model; Readmission
Year: 2021 PMID: 34330267 PMCID: PMC8323261 DOI: 10.1186/s12911-021-01423-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Distribution of gender (top) and race (bottom) in the dataset
Fig. 2 Basic characteristics of the numerical attributes in the records
Fig. 3 Medication use by gender
Fig. 4 Length of hospital stay by patient age
Fig. 5 Length of hospital stay by patient race
Missing attribute values in the dataset
| Attribute | Type | Description | Missing rate (%) |
|---|---|---|---|
| Race | Nominal | Ethnicity, including Caucasian, Asian, African American, Hispanic, and others | 2 |
| Weight | Numeric | Weight (pounds) | 97 |
| Payer code | Nominal | Integer identifier with 23 distinct values | 52 |
| Medical specialty | Nominal | Physician specialty, such as internal medicine, surgery, and family medicine | 53 |
| Diagnosis 1 | Nominal | Initial diagnosis (first three digits of the ICD-9 code), 848 distinct values | 0.5 |
| Diagnosis 2 | Nominal | Secondary diagnosis (first three digits of the ICD-9 code), 923 distinct values | 0.5 |
| Diagnosis 3 | Nominal | Additional secondary diagnosis (first three digits of the ICD-9 code), 954 distinct values | 1 |
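One common way to act on missing rates like those above is to measure how often each attribute is absent and drop attributes that are mostly empty (Weight, for example, is missing in 97% of records). The sketch below is a minimal illustration under stated assumptions, not the authors' preprocessing: the `?` missing-value marker, the column names (`race`, `weight`, `diag_1`) and the 50% cutoff are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Assumed missing-value markers; "?" is a hypothetical placeholder for unknown entries.
MISSING = {"?", "", None}


def missing_rates(records: List[Record]) -> Dict[str, float]:
    """Percentage of records in which each attribute is missing."""
    if not records:
        return {}
    n = len(records)
    return {
        col: 100.0 * sum(1 for r in records if r.get(col) in MISSING) / n
        for col in records[0].keys()
    }


def drop_sparse_columns(
    records: List[Record], threshold_pct: float = 50.0
) -> Tuple[List[Record], Dict[str, float]]:
    """Drop attributes whose missing rate exceeds threshold_pct (hypothetical cutoff)."""
    rates = missing_rates(records)
    keep = [col for col, rate in rates.items() if rate <= threshold_pct]
    return [{col: r.get(col) for col in keep} for r in records], rates


if __name__ == "__main__":
    rows = [
        {"race": "Caucasian", "weight": "?", "diag_1": "250"},
        {"race": "?", "weight": "?", "diag_1": "428"},
        {"race": "AfricanAmerican", "weight": "[75-100)", "diag_1": "414"},
        {"race": "Caucasian", "weight": "?", "diag_1": "250"},
    ]
    cleaned, rates = drop_sparse_columns(rows)
    print(rates)            # {'race': 25.0, 'weight': 75.0, 'diag_1': 0.0}
    print(list(cleaned[0]))  # ['race', 'diag_1']
```

With a 50% cutoff, the three attributes in the table whose missing rate exceeds 50% (Weight, Payer code, Medical specialty) would be dropped; whether the study did so is not stated here.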
Readmission distribution in the dataset
| Readmission (days) | Description | Count | Percentage (%) |
|---|---|---|---|
| < 30 | Within 30 days after discharge | 11,250 | 11.22 |
| > 30 | More than 30 days after discharge | 35,173 | 35.09 |
| No | No readmission | 53,821 | 53.69 |
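The percentages in the readmission table follow directly from the counts (100,244 encounters in total), and the three-valued outcome can be collapsed into the binary targets used for the 30-day and future-readmission models. The sketch below assumes raw label strings `"<30"`, `">30"` and `"NO"`, and assumes that "future readmission" means any readmission; neither detail is confirmed by the source.

```python
# Counts copied from the readmission-distribution table; the label strings
# "<30", ">30" and "NO" are assumed raw codes, not taken from the paper.
COUNTS = {"<30": 11_250, ">30": 35_173, "NO": 53_821}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of each readmission category in percent, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


def label_30_day(readmitted: str) -> int:
    """30-day task: positive only if readmitted within 30 days of discharge."""
    return 1 if readmitted == "<30" else 0


def label_future(readmitted: str) -> int:
    """Assumed 'future readmission' task: any readmission counts as positive."""
    return 0 if readmitted == "NO" else 1


if __name__ == "__main__":
    print(sum(COUNTS.values()))  # 100244
    print(percentages(COUNTS))   # {'<30': 11.22, '>30': 35.09, 'NO': 53.69}
```

Under the 30-day labelling only about 11% of encounters are positive, which is the kind of imbalance that over-sampling and down-sampling are meant to address.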
Fig. 6 Workflow for building prediction models in KNIME
Performance of the prediction models on readmission of patients with type 2 diabetes (T2D)
| Group | Model | Avg. AUC |
|---|---|---|
| 30-day readmission (over-sampling) | Random Forest | 0.64 |
| | Naive Bayes | 0.619 |
| | Tree Ensemble | 0.634 |
| 30-day readmission (down-sampling) | Random Forest | 0.661 |
| | Naive Bayes | 0.633 |
| | Tree Ensemble | 0.659 |
| Future readmission | Random Forest | 0.686 |
| | Naive Bayes | 0.652 |
| | Tree Ensemble | 0.685 |
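The over-sampling and down-sampling groups in the table differ only in how the imbalanced 30-day classes were balanced before training. The source does not say which KNIME nodes or class ratios were used, so the sketch below shows plain random over-sampling (minority rows drawn with replacement) and random down-sampling (majority rows drawn without replacement) to a 1:1 ratio as an assumed, simplified stand-in.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[object, int]  # (feature vector, class label)


def _group_by_label(samples: List[Sample]) -> Dict[int, List[Sample]]:
    groups: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[sample[1]].append(sample)
    return groups


def random_oversample(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Keep every row, then add rows drawn with replacement from each smaller
    class until every class matches the size of the largest one."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = max(len(rows) for rows in groups.values())
    balanced: List[Sample] = []
    for rows in groups.values():
        balanced.extend(rows)
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    rng.shuffle(balanced)
    return balanced


def random_downsample(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Draw, without replacement, as many rows from each class as the smallest class has."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = min(len(rows) for rows in groups.values())
    balanced: List[Sample] = []
    for rows in groups.values():
        balanced.extend(rng.sample(rows, target))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    data = [((i,), 1) for i in range(2)] + [((i,), 0) for i in range(8)]
    print(len(random_oversample(data)))  # 16
    print(len(random_downsample(data)))  # 4
```

At a 1:1 ratio, over-sampling the 30-day data would grow the 11,250 positive encounters to match the 88,994 negatives, while down-sampling would keep 11,250 of each. Resampling is normally applied to the training split only, so that the AUC is measured on the original class distribution.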
Fig. 7 AUC plot of the future-readmission risk models based on the random forest (RF), naive Bayes (NB), and tree ensemble (TE) algorithms
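The AUC values in Fig. 7 and the performance table can be read as the probability that a model scores a randomly chosen readmitted patient above a randomly chosen non-readmitted one. The sketch below computes AUC with the rank-sum (Mann-Whitney U) identity, giving tied scores their average rank. The table's "Avg." label suggests averaging over several validation runs, but the number of runs is not given here, so `mean_auc` is a hypothetical helper.

```python
from statistics import mean
from typing import Iterable, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for 0/1 labels via the rank-sum identity; ties get average ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of tied scores that starts at sorted position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_auc(folds: Iterable[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average AUC over (labels, scores) pairs, e.g. one pair per validation fold."""
    return mean(roc_auc(y, s) for y, s in folds)


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, so the reported values between 0.619 and 0.686 indicate modest discrimination.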
Fig. 8 Importance of the features included in the future-readmission prediction models
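The source does not state how the feature importances in Fig. 8 were computed. As one model-agnostic illustration (a swapped-in technique, not necessarily the authors' method), permutation importance shuffles one feature column at a time and records how much the AUC drops. In the sketch below, `predict` is any hypothetical scoring function that maps rows to risk scores, and the repeat count and seed are arbitrary choices.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    X: List[Row],
    y: Sequence[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUC after shuffling each feature column; a larger drop means
    the model relies more on that feature."""
    rng = random.Random(seed)
    baseline = pairwise_auc(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with only column j replaced by the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - pairwise_auc(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy model that only looks at feature 0, so feature 1 should score zero.
    X = [[float(i % 2), float(i)] for i in range(20)]
    y = [i % 2 for i in range(20)]
    print(permutation_importance(lambda rows: [r[0] for r in rows], X, y))
```

Unlike impurity-based importances built into tree models, this measure is computed from held-out predictions, so it can be compared across RF, NB and TE models.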