Zeeshan Qureshi¹, Ayesha Maqbool², Alina Mirza³, Muhammad Zubair Iqbal⁴, Farkhanda Afzal⁵, Deborah Dormah Kanubala⁶, Tauseef Rana¹, Mir Yasir Umair³, Abdul Wakeel³, Said Khalid Shah⁷.
Abstract
Public health and its related facilities are crucial for thriving cities and societies. Optimal use of health resources saves money and time but, above all, saves precious lives. This has become even more evident during the present pandemic, which has overstretched existing medical resources. In patient appointment scheduling specifically, a casual attitude toward medical appointments that leads to missed appointments (no-shows) can seriously harm a patient's health. In this paper, we use machine learning to analyze more than six million patient appointment records and predict patients' appointment behavior with ten different machine learning algorithms. We first extracted meaningful features from the raw data through data cleaning. We then applied the Synthetic Minority Oversampling Technique (SMOTE), the Adaptive Synthetic Sampling Method (ADASYN), and random undersampling (RUS) to balance the data. After balancing, we applied ten machine learning algorithms: random forest, decision tree, logistic regression, XGBoost, gradient boosting, AdaBoost, naive Bayes, stochastic gradient descent, multilayer perceptron, and support vector machine. We evaluated the results with six metrics: recall, accuracy, precision, F1-score, area under the curve (AUC), and mean squared error (MSE). Our study achieved 94% recall, 86% accuracy, 83% precision, 87% F1-score, 92% AUC, and a minimum MSE of 0.106. The effectiveness of the presented data cleaning and feature selection is confirmed by better results across all training algorithms. Notably, recall is greater than 75%, accuracy greater than 73%, F1-score greater than 75%, MSE less than 0.26, and AUC greater than 74%. The research shows that combining different features, rather than relying on individual features, leads to better predictions of a patient's appointment status.
Year: 2021 PMID: 34721656 PMCID: PMC8556091 DOI: 10.1155/2021/2376391
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Comparison with existing research.
| Study | Data size | Algorithms | Evaluation method | Performance | Remarks |
|---|---|---|---|---|---|
| Denney et al. [ | 7 million | Ada, LR, SVM, NB, SGD, ET, DT, XG, RF | Average recall | 68% recall | Existing model with results |
| AlMuhaideb et al. [ | 1.1 million | JRip, Hoeffding trees, LR, MP, NB | Accuracy, AUC | 77.13% accuracy, 0.86 AUC | |
| Mohammadi et al. [ | 74 thousand | LR, MP, NB | Accuracy, AUC | 82% accuracy, 0.86 AUC | |
| Daghistani et al. [ | 201 million | RF, GB, LR, SVM, MP | Accuracy, precision, recall, F1 measure, AUC | 79% accuracy, 77% precision, 79% recall, 76% F1 | |
| Our model | 6 million | Ada, LR, SVM, NB, SGD, XG, DT, GB, RF, MP | Accuracy, precision, recall, F1 measure, AUC, MSE | 86.5% accuracy, 83% precision, 94% recall, 87% F1 | Proposed model with results |
Useful features.
| Name | Type | Range | Description |
|---|---|---|---|
| Date of birth | Input | mm/dd/yyyy | Patient's date of birth |
| Race | Input | E.g., Asian, African, White | Patient's race |
| Sex | Input | Male/female/other | Patient's sex |
| Civil status | Input | Single, married, divorced, separated, widowed | Patient's civil status |
| Admit | Input | Text | Reason for admission |
| Date of appointment | Input | mm/dd/yyyy | Date of the appointment |
| Status of appointment | Input | Pending, closed, canceled | Status of the appointment |
| Cancel date | Input | mm/dd/yyyy | Date the appointment was canceled |
| Cancel reason | Input | No-show-up, death, rescheduled, out of city | Reason the appointment was canceled |
| Create time | Input | Timestamp | Time at which the database record was inserted |
| Modified time | Input | Timestamp | Time at which the database record was last modified |
| Procedure type | Input | E.g., ultrasound, office visit (see table) | Procedure for which the patient booked the appointment |
| Patient age | Generated | Numeric | Derived from date of birth |
| Age range | Generated | Age category | Derived from the patient age feature |
| Create Appt difference | Generated | No. of days | Difference between the appointment date and the create time |
| Appointment season | Generated | Month of year | Derived from the appointment date |
| Cancel difference | Generated | No. of days | Difference between the cancel date and the appointment date |
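The generated features in the table can be derived from the input columns roughly as follows. This is a minimal Python sketch under stated assumptions: the record field names (`date_of_birth`, `appointment_date`, `create_time`, `cancel_date`) and the age bins in `AGE_BINS` are hypothetical, since the source gives neither its column names nor its age-range boundaries. Age is measured on the appointment date, and the cancel difference is taken as days from cancellation to the appointment; both are assumptions.

```python
from datetime import date, datetime

# Hypothetical age bins: the source does not list its age-range boundaries.
AGE_BINS = [(0, 17, "0-17"), (18, 39, "18-39"), (40, 64, "40-64"), (65, 150, "65+")]


def patient_age(dob: date, on: date) -> int:
    """Age in whole years on the date `on`."""
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


def age_range(age: int) -> str:
    """Map a numeric age to one of the hypothetical AGE_BINS labels."""
    for low, high, label in AGE_BINS:
        if low <= age <= high:
            return label
    raise ValueError(f"age outside all bins: {age}")


def derive_features(record: dict) -> dict:
    """Compute the generated features for one raw appointment record.

    Field names are hypothetical; age is measured on the appointment date.
    """
    appt = record["appointment_date"]
    age = patient_age(record["date_of_birth"], appt)
    cancel = record.get("cancel_date")
    return {
        "patient_age": age,
        "age_range": age_range(age),
        # Days from record creation to the appointment.
        "create_appt_difference": (appt - record["create_time"].date()).days,
        # Month of the appointment (1-12), used as its season.
        "appointment_season": appt.month,
        # Days from cancellation to the appointment; None if never canceled.
        "cancel_difference": (appt - cancel).days if cancel is not None else None,
    }


# Hypothetical example record.
rec = {
    "date_of_birth": date(1980, 6, 15),
    "appointment_date": date(2021, 3, 10),
    "create_time": datetime(2021, 2, 24, 9, 30),
    "cancel_date": date(2021, 3, 8),
}
print(derive_features(rec))
```

Working from plain `date` arithmetic keeps each derived value a simple integer or category, which suits the categorical treatment of features in the information-gain ranking.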
List of relevant features to predict appointment status.
| Relevance | Feature name | Information gain values |
|---|---|---|
| 1 | Procedure type | 0.0736 |
| 2 | Race category | 0.04305 |
| 3 | Civil status | 0.019836 |
| 4 | Create difference category | 0.019534 |
| 5 | Age range | 0.000890 |
| 6 | Appointment season | 0.000375 |
| 7 | Sex | 0.0001323 |
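The ranking above orders features by their information gain with respect to appointment status. A minimal sketch of that computation for categorical features follows. It assumes base-2 logarithms (bits), which the source does not specify, and uses hypothetical toy data, so its absolute values are not meant to match the table.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over values v of P(X = v) * H(Y | X = v).

    `feature` holds one categorical value per record and `labels` the
    matching appointment status (e.g. "show" / "no-show").
    """
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: rank two features by information gain.
status = ["show", "no-show", "show", "show", "no-show", "show"]
features = {
    "procedure_type": ["office", "ultrasound", "office", "office", "ultrasound", "office"],
    "sex": ["F", "F", "M", "F", "M", "M"],
}
ranking = sorted(features, key=lambda f: information_gain(features[f], status), reverse=True)
print(ranking)
```

A feature that splits the records into groups with the same show/no-show mix as the whole dataset (like `sex` here) has zero gain, which is why very small values such as those for sex and appointment season indicate weak individual predictors.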
Figure 1. Comparison of show and no-show appointments.
Figure 2. Age-wise comparison of show and no-show appointments.
Figure 3. Gender-wise comparison of show and no-show appointments.
Figure 4. Procedure-wise distribution of the data.
Figure 5. Percentage-wise show and no-show distribution of the data.
Figure 6. Percentage-wise show and no-show distribution after applying SMOTE.
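SMOTE, whose effect on the class balance Figure 6 illustrates, creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority neighbors. The pure-Python sketch below shows only that interpolation rule. The function name `smote`, the default `k=5`, and the use of squared Euclidean distance are assumptions for illustration; the source does not say which implementation it used (libraries such as imbalanced-learn provide a `SMOTE` class).

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points, each on the segment between a minority
    sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of `base` (squared Euclidean distance), excluding itself.
        neighbors = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


# Hypothetical 2-D minority (no-show) samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote(minority, 5, k=2, seed=1))
```

Because every new point lies between two existing minority points, SMOTE fills in the minority region rather than duplicating records, which is how it shifts the class distribution toward balance.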
Show and no-show appointment prediction results with SMOTE, evaluated by threshold discriminative metrics.
| Algorithm | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Random forest | 84.96% | 80% | 93% | 86.25% |
| Decision tree | 85.18% | 82% | 90% | 86.42% |
| Logistic regression | 85.35% | 83% | 89% | 86.18% |
| XGBoost | 76.69% | 80% | 83% | 81.44% |
| Gradient boosting | 73.53% | 76% | 77% | 77.32% |
| AdaBoost | 70.46% | 74% | 72% | 73.17% |
| SVM | 67.09% | 69% | 74% | 72.15% |
| Naive Bayes | 63.98% | 66% | 70% | 68.44% |
| SGD | 67.14% | 60% | 84% | 70.18% |
| Multilayer perceptron | 80.93% | 64% | 77% | 70.52% |
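The threshold discriminative metrics in these tables all derive from the confusion matrix of predicted versus actual labels. A minimal sketch, assuming no-show is encoded as the positive class `1` (the source does not state which class it treated as positive); the helper names are hypothetical:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 from hard class predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = no-show (assumed positive class), 0 = show.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]
print(threshold_metrics(y_true, y_pred))
```

Because F1 is the harmonic mean of precision and recall, the table can be spot-checked: random forest's 80% precision and 93% recall give 2 × 0.80 × 0.93 / 1.73 ≈ 86.0%, near the reported 86.25%; the small gap may come from rounding of the displayed precision and recall.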
Show and no-show appointment prediction results with SMOTE, evaluated by mean squared error and AUC.
| Algorithm | MSE | AUC |
|---|---|---|
| Random forest | 0.1069 | 92.09% |
| Decision tree | 0.1285 | 87.13% |
| Logistic regression | 0.1273 | 87.25% |
| XGBoost | 0.1906 | 80.92% |
| Gradient boosting | 0.2330 | 76.69% |
| AdaBoost | 0.2646 | 73.53% |
| SVM | 0.2953 | 70.45% |
| Naive Bayes | 0.3277 | 67.21% |
| SGD | 17.71 | 55.47% |
| Multilayer perceptron | 0.3282 | 67.14% |
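AUC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties count one half), and MSE is the mean squared difference between labels and predictions. The sketch below implements both directly. The source does not say whether its AUC and MSE were computed from predicted probabilities or from hard labels, so both kinds of input are accepted; the function names and example values are hypothetical. The quadratic pairwise loop is for clarity only; on millions of records a sort-based O(n log n) method is preferable.

```python
def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2.

    `scores` may be predicted probabilities or hard 0/1 labels.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between true labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted P(no-show)
print(roc_auc(y_true, probabilities), mean_squared_error(y_true, probabilities))
```

With hard 0/1 predictions, this AUC equals the mean of the true positive rate and the true negative rate, so it carries less information than an AUC computed from probabilities.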
Show and no-show appointment prediction results with ADASYN, evaluated by threshold discriminative metrics.
| Algorithm | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Random forest | 85.26% | 78% | 93% | 85% |
| Decision tree | 86.50% | 81% | 90% | 86% |
| Logistic regression | 64.77% | 66% | 66% | 66% |
| XGBoost | 79.12% | 77% | 81% | 79% |
| Gradient boosting | 75.69% | 74% | 75% | 75% |
| AdaBoost | 71.90% | 72% | 70% | 71% |
| Naive Bayes | 65.37% | 65% | 68% | 67% |
| SGD | 60.96% | 59% | 83% | 69% |
| Multilayer perceptron | 63.61% | 65% | 69% | 67% |
| SVM | 68.58% | 67% | 72% | 70% |
Show and no-show appointment prediction results with ADASYN, evaluated by mean squared error and AUC.
| Algorithm | MSE | AUC |
|---|---|---|
| Random forest | 0.16 | 83.26% |
| Decision tree | 0.15 | 85.03% |
| Logistic regression | 0.334 | 66.51% |
| XGBoost | 0.21 | 78.43% |
| Gradient boosting | 0.255 | 74.46% |
| AdaBoost | 0.279 | 71.98% |
| Naive Bayes | 0.338 | 66.20% |
| SGD | 0.339 | 63.49% |
| Multilayer perceptron | 0.3408 | 65.94% |
| SVM | 0.28 | 68.90% |
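ADASYN differs from SMOTE mainly in how many synthetic samples each minority point receives: points whose k nearest neighbors (searched over the whole dataset) contain more majority samples get proportionally more. The sketch below computes only that allocation step; generation then interpolates toward minority neighbors as in SMOTE. The function name, `k`, and the balance parameter `beta` are illustrative assumptions, not settings taken from the source.

```python
import math


def adasyn_allocation(minority, majority, k=5, beta=1.0):
    """How many synthetic samples ADASYN assigns to each minority point.

    beta = 1.0 asks for a fully balanced result. Both k and beta are
    illustrative defaults, not values from the source.
    """
    total_needed = (len(majority) - len(minority)) * beta
    # Label 0 = minority, 1 = majority; minority points come first in `data`.
    data = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    k = min(k, len(data) - 1)
    ratios = []
    for i, x in enumerate(minority):
        others = [item for j, item in enumerate(data) if j != i]
        others.sort(key=lambda item: math.dist(item[0], x))
        # Fraction of the k nearest neighbors (whole dataset) that are majority.
        ratios.append(sum(label for _, label in others[:k]) / k)
    total_ratio = sum(ratios)
    if total_ratio == 0:
        raise ValueError("no minority point has majority neighbors")
    # Normalize ratios into a distribution and scale by the samples needed;
    # rounding means the counts may not sum exactly to total_needed.
    return [round(r / total_ratio * total_needed) for r in ratios]


# Hypothetical 2-D data: only the minority point near the majority cluster gets samples.
minority = [(0, 0), (0, 1), (5, 5)]
majority = [(5, 6), (6, 5), (6, 6), (10, 10), (11, 11)]
print(adasyn_allocation(minority, majority, k=2))
```

Concentrating new samples near the class boundary is the design choice that distinguishes ADASYN from SMOTE's uniform choice of base points.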
Show and no-show appointment prediction results with RUS, evaluated by threshold discriminative metrics.
| Algorithm | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Random forest | 85.26% | 80% | 94% | 86% |
| Decision tree | 86.50% | 83% | 92% | 87% |
| Logistic regression | 64.77% | 65% | 65% | 65% |
| XGBoost | 79.12% | 78% | 81% | 79% |
| Gradient boosting | 75.69% | 76% | 76% | 76% |
| AdaBoost | 71.90% | 74% | 67% | 70% |
| Naive Bayes | 65.37% | 65% | 68% | 66% |
| SGD | 60.96% | 58% | 83% | 68.18% |
| Multilayer perceptron | 63.61% | 59% | 88% | 71% |
| SVM | 68.58% | 68% | 71% | 69% |
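Random undersampling balances the classes in the opposite direction: instead of synthesizing minority samples, it randomly discards samples from the larger classes until every class has as many samples as the smallest one. A minimal sketch; the function name, fixed seed, and toy data are assumptions for illustration:

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly keep n_min samples per class, where n_min is the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(samples) for samples in by_class.values())
    X_out, y_out = [], []
    for label, samples in by_class.items():
        # Sample without replacement; the smallest class is kept in full.
        for xi in rng.sample(samples, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


# Hypothetical data: 7 show (0) records and 3 no-show (1) records.
X = list(range(10))
y = [0] * 7 + [1] * 3
print(random_undersample(X, y, seed=3))
```

Unlike SMOTE and ADASYN, RUS adds no synthetic records, at the cost of discarding real majority-class data.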
Show and no-show appointment prediction results with RUS, evaluated by mean squared error and AUC.
| Algorithm | MSE | AUC |
|---|---|---|
| Random forest | 0.1473 | 85.26% |
| Decision tree | 0.135 | 86.50% |
| Logistic regression | 0.352 | 64.77% |
| XGBoost | 0.208 | 79.12% |
| Gradient boosting | 0.243 | 75.68% |
| AdaBoost | 0.281 | 71.90% |
| Naive Bayes | 0.346 | 65.37% |
| SGD | 0.3546 | 60.96% |
| Multilayer perceptron | 0.3648 | 63.62% |
| SVM | 0.26 | 68.58% |
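If MSE is computed on hard class labels ($y_i, \hat{y}_i \in \{0, 1\}$) rather than on predicted probabilities, each squared error is either 0 or 1, so MSE reduces to the misclassification rate. The source does not state which form it used, so this identity is offered only as a reading aid:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{\lvert\{\, i : y_i \neq \hat{y}_i \,\}\rvert}{n} = 1 - \mathrm{Accuracy}.
$$

For example, under this assumption the random forest RUS accuracy of 85.26% corresponds to an MSE of 1 − 0.8526 = 0.1474, close to the reported 0.1473.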
Comparison of random forest (RF) and decision tree (DT) results across the three balancing methods (SM = SMOTE, AD = ADASYN, RUS = random undersampling).
| Measure | RF (SM) | RF (AD) | RF (RUS) | DT (SM) | DT (AD) | DT (RUS) |
|---|---|---|---|---|---|---|
| Accuracy (%) | 84.96 | 83.17 | 85.26 | 85.18 | 85.03 | 86.50 |
| Precision (%) | 80 | 78 | 80 | 82 | 81 | 83 |
| Recall (%) | 93 | 93 | 94 | 90 | 90 | 92 |
| F1-score (%) | 86.25 | 85 | 86 | 86.42 | 86 | 87 |
| MSE | 0.1069 | 0.16 | 0.1473 | 0.1285 | 0.15 | 0.135 |
| AUC (%) | 92.09 | 83.26 | 85.26 | 87.13 | 85.03 | 86.50 |