Jin-Hyeok Park, Jeong-Heum Baek, Sun Jin Sym, Kang Yoon Lee, Youngho Lee.
Abstract
BACKGROUND: Clinical Decision Support Systems (CDSSs) have recently attracted attention as a method for minimizing medical errors. Existing CDSSs are limited in that they do not reflect actual data. To overcome this limitation, we propose a CDSS based on deep learning.
Keywords: Chemotherapy recommendation; Colorectal Cancer; Deep learning; Knowledge-based clinical decision support system (CDSS)
Year: 2020 PMID: 32962726 PMCID: PMC7510149 DOI: 10.1186/s12911-020-01265-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Dataset description
| Input Variables | |
|---|---|
| Demographics | Age, Sex, ASA, BMI, Smoking History |
| Disease Characteristics | DM History, Pulmonary Disease, Liver Disease, Heart Disease, Kidney Disease |
| Cancer Characteristics | Prior Cancer Diagnosis, Initial CEA, Perforation, Obstruction, Emergency, Lymphovascular Invasion, Perineural Invasion, Distal Resection Margin, Radial Margin, Radiotherapy, Harvested Lymph Node, Positive Lymph Node, Early Complication |
| Tumor Characteristics | Hereditary Colorectal Tumor, Tumor Location (Pathology), Histologic Type, TNM Stage (Pathology) |
| Genetic Characteristics | K-ras, N-ras, BRAF |
| Treatment Characteristics | Postoperative Chemotherapy |
| Oncologic Outcomes | Overall Survival, Recurrence |
| Chemotherapy | Postoperative Chemotherapy Regimen (5-FU/LV, XELODA, FOLFOX, FOLFIRI, Surveillance) |
Fig. 1 Data preprocessing and data oversampling
Fig. 2 Process of the bootstrap-based oversampling algorithm. Blue nodes represent the majority class and red nodes represent the minority class of the target variable. Oversampling is limited to the minority class, i.e., the oversampled data is added only to the minority class.
Fig. 3 Structure of the deep learning model for chemotherapy recommendation
Fig. 4 Process of model evaluation and verification
Fig. 5 Example of a CRC treatment protocol: Colon Cancer M0 Treatment Protocol. The protocol is the algorithm used when administering chemotherapy to patients with colorectal cancer at the Gachon Gil Medical Center. Its recommendations are divided by cancer site (rectal or colon) and by metastasis status (M0, without distant metastasis, or M1, with distant metastasis).
Dataset changes due to chart review and data preprocessing
| Process | Step | Variables (+ Target Classes) | Patients (n) |
|---|---|---|---|
| First CRC Dataset | | 142 (+ 1) | 1511 |
| Chart Review | 1) Check extraction method and location | 142 (+ 1) | 1508 |
| | 2) Check for inappropriate data | 142 (+ 1) | 1496 |
| | 3) Select priority variables (First Processed CRC Dataset) | 40 (+ 1) | 1496 |
| Data Preprocessing | 1) Drop redundant variables | 37 (+ 1) | 1496 |
| | 2) Drop variables including 90% ↑ missing values | 32 (+ 1) | 1496 |
| | 3) Drop instances containing missing values | 32 (+ 1) | 1169 |
| | 4) One-hot encoding (Final CRC Dataset) | 54 (+ 5) | 1169 |
| Data Split | 1) Data split (training/testing) | 54 (+ 5) | 935 / 234 |
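
The preprocessing and split steps in the table above can be sketched in plain Python. This is a minimal illustration, not the authors' code. The function names, the use of `None` to mark missing values, the reading of "90% ↑" as "at least 90%", and the 80/20 split fraction (inferred from 935 / 234 of 1169 patients) are all assumptions.

```python
import random

def drop_sparse_columns(rows, threshold=0.9):
    """Drop columns whose share of missing (None) values is >= threshold (assumed reading of '90% ↑')."""
    columns = list(rows[0].keys())
    keep = [c for c in columns
            if sum(r[c] is None for r in rows) / len(rows) < threshold]
    return [{c: r[c] for c in keep} for r in rows]

def drop_incomplete_rows(rows):
    """Drop every instance that still contains at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def one_hot(rows, column):
    """Replace one categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the data and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy records with hypothetical column names, for illustration only.
toy = [
    {"age": 60, "sex": "M", "rare_marker": None},
    {"age": None, "sex": "F", "rare_marker": None},
    {"age": 71, "sex": "F", "rare_marker": None},
]
clean = one_hot(drop_incomplete_rows(drop_sparse_columns(toy)), "sex")
train, test = train_test_split(list(range(1169)))
```

With 1169 instances, an 80% training fraction reproduces the 935 / 234 split reported in the table, which is why that fraction is assumed here.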
Results of oversampling of the minor classes
| Method | Total | 5-FU/LV | XELODA | FOLFOX | FOLFIRI | Surveillance |
|---|---|---|---|---|---|---|
| Original | 1169 | 398 | 42 | 323 | 35 | 371 |
| After Oversampling | | 398 | | 323 | | 371 |
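
As a rough illustration of the idea in the Fig. 2 caption, the sketch below resamples only the minority classes with replacement and leaves the majority classes untouched. It is not the authors' implementation. The function name `bootstrap_oversample` and the `target_size` parameter are hypothetical, and the value 300 in the example is arbitrary because the oversampled counts are not recoverable from the table.

```python
import random
from collections import Counter

def bootstrap_oversample(samples, labels, minority_classes, target_size, seed=0):
    """Append bootstrap draws (sampling with replacement) to each minority class.

    Majority classes are copied unchanged; each minority class grows until it
    holds target_size samples (hypothetical parameter).
    """
    rng = random.Random(seed)
    out_samples, out_labels = list(samples), list(labels)
    for cls in minority_classes:
        pool = [s for s, y in zip(samples, labels) if y == cls]
        if not pool:
            continue  # nothing to resample from
        for _ in range(max(target_size - len(pool), 0)):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels

# Class sizes from the "Original" row; the samples themselves are placeholders.
original_counts = {"5-FU/LV": 398, "XELODA": 42, "FOLFOX": 323,
                   "FOLFIRI": 35, "Surveillance": 371}
labels = [cls for cls, n in original_counts.items() for _ in range(n)]
samples = list(range(len(labels)))

new_samples, new_labels = bootstrap_oversample(
    samples, labels, minority_classes=["XELODA", "FOLFIRI"], target_size=300)
new_counts = Counter(new_labels)
```

Because every added sample is drawn from the existing minority pool, no new feature values are invented. The minority classes only gain repeated copies of real patients.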
Performance of the proposed model for each chemotherapy method
| Class | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|
| 5-FU/LV | 0.99 | 0.96 | 0.97 | 0.97 |
| XELODA | 0.80 | 1.00 | 0.89 | 0.99 |
| FOLFOX | 0.95 | 0.94 | 0.95 | 0.96 |
| FOLFIRI | 0.89 | 1.00 | 0.94 | 0.99 |
| Surveillance | 1.00 | 1.00 | 1.00 | 1.00 |
| Total | 0.92 | 0.98 | 0.95 | 0.98 |
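
The F1-score column can be cross-checked against the precision and recall columns using the standard definition, F1 = 2PR / (P + R). The sketch below is only a consistency check on the rounded table values. The helper name `f1_score` is an assumption, not something from the source.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) pairs copied from the table above.
reported = {
    "5-FU/LV": (0.99, 0.96),
    "XELODA": (0.80, 1.00),
    "FOLFOX": (0.95, 0.94),
    "FOLFIRI": (0.89, 1.00),
    "Surveillance": (1.00, 1.00),
}
recomputed = {cls: round(f1_score(p, r), 2) for cls, (p, r) in reported.items()}
```

The recomputed values match the table for every class except FOLFOX, where the rounded inputs give 0.94 instead of the reported 0.95. A difference of that size is plausibly due to the precision and recall values themselves having been rounded.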
Fig. 6 ROC curve and confusion matrix for evaluation of the proposed model
Comparison of the performance of the proposed model and various machine learning algorithms
| Method | Precision | Recall | F1-score | AUC |
|---|---|---|---|---|
| Proposed | | | | |
| SVM | 0.80 | 0.90 | 0.89 | 0.85 |
| Decision Tree | 0.90 | 0.94 | 0.93 | 0.93 |
| K-NN | 0.82 | 0.83 | 0.80 | 0.82 |
| Random Forest | 0.91 | 0.93 | 0.92 | 0.92 |
Comparison of the Top-1 and Top-2 Accuracy between the proposed model and the GCCTP and NCCN guidelines
| Guideline | Class | | Top-1 Accuracy (%) | Top-2 Accuracy (%) |
|---|---|---|---|---|
| GCCTP | 5-FU/LV | 55 | 23.63 | 78.18 |
| | XELODA | 5 | 80.00 | 80.00 |
| | FOLFOX | 37 | 83.78 | 91.89 |
| | FOLFIRI | 4 | 0 | 75.00 |
| | Surveillance | 76 | 71.05 | 71.05 |
| NCCN | 5-FU/LV | 59 | 47.45 | 83.05 |
| | XELODA | 6 | 50.00 | 100.00 |
| | FOLFOX | 50 | 92.00 | 94.00 |
| | FOLFIRI | 5 | 100.00 | 100.00 |
| | Surveillance | 80 | 80.00 | 80.00 |
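
Top-k accuracy is commonly defined as the share of cases in which the reference label appears among the model's k highest-scoring classes. The sketch below assumes that definition. The source does not spell out how the figures above were computed, and the function name and the toy scores are hypothetical.

```python
def top_k_accuracy(score_rows, reference_labels, k):
    """Percentage of cases whose reference label is among the k top-scoring classes.

    score_rows: one dict per case, mapping class name -> model score.
    """
    hits = 0
    for scores, reference in zip(score_rows, reference_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)
        if reference in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(reference_labels)

# Two made-up cases: the first is a Top-1 hit, the second only a Top-2 hit.
toy_scores = [
    {"FOLFOX": 0.6, "5-FU/LV": 0.3, "Surveillance": 0.1},
    {"FOLFOX": 0.5, "5-FU/LV": 0.4, "Surveillance": 0.1},
]
toy_reference = ["FOLFOX", "5-FU/LV"]

top1 = top_k_accuracy(toy_scores, toy_reference, k=1)
top2 = top_k_accuracy(toy_scores, toy_reference, k=2)
```

Under this definition Top-2 accuracy can never fall below Top-1 accuracy, which is consistent with every row of the table.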
Fig. 7 Overview of the variables most important to the model