Shamsul Masum¹, Adrian Hopgood², Samuel Stefan³, Karen Flashman³, Jim Khan³,⁴
Abstract
Data analytics and artificial intelligence (AI) were used to predict patient outcomes after colorectal cancer surgery. A prospectively maintained colorectal cancer database was used, covering 4336 patients who underwent colorectal cancer surgery between 2003 and 2019. The 47 patient parameters included demographics, peri- and post-operative outcomes, surgical approaches, complications, and mortality. Data analytics were used to compare the importance of each variable, and AI prediction models were built for length of stay (LOS), readmission, and mortality. Accuracies of at least 80% were achieved. The significant predictors of LOS were age, ASA grade, operative time, presence or absence of a stoma, robotic or laparoscopic approach to surgery, and complications. A support vector regression (SVR) model predicted LOS with an accuracy of 83% and a mean absolute error (MAE) of 9.69 days. The significant predictors of readmission were age, laparoscopic procedure, stoma formation, preoperative nodal (N) stage, operation time, operation mode, previous surgery type, LOS, and the specific procedure. A bidirectional long short-term memory (BI-LSTM) model predicted readmission with 87.5% accuracy, 84% sensitivity, and 90% specificity. The significant predictors of mortality were age, ASA grade, BMI, stoma formation, preoperative TNM staging, neoadjuvant chemotherapy, curative resection, and LOS. Classification models predicted three colorectal cancer mortality measures (overall, 31-day, and 91-day mortality) with 80–96% accuracy, 84–93% sensitivity, and 75–100% specificity. A model using all variables performed only slightly better than one that used only the most significant variables.
Keywords: Artificial intelligence; Colorectal cancer; Colorectal surgery; Data analytics; Length of stay; Machine learning; Mortality; Prediction; Predictor variables; Readmission
Year: 2022 PMID: 35226196 PMCID: PMC8885960 DOI: 10.1007/s12672-022-00472-7
Source DB: PubMed Journal: Discov Oncol ISSN: 2730-6011
Available variables in the dataset
| Variables | | |
|---|---|---|
| Sex | Complication (y/n) | pN stage |
| ASA | Additional procedures | Resection margin |
| Age | Stoma formation | Radiotherapy |
| BMI | Robotic (y/n) | Chemotherapy |
| Cancer site | Surgical approach | misc_info |
| TumICD10 | Laparoscopic type | Mortality |
| Preoperative T stage | Curative surgery | Death_date |
| Preoperative nodal stage | Operation time | Death_check |
| Preoperative M stage | Blood loss | Private_pt |
| Previous abdominal surgery | LOS | Local recurrence date |
| Previous surgery type | Readmit < 31 days | Distant recurrence date |
| Operation mode | REOP < 31 days | Distant recurrence days |
| Resection (y/n) | Complication misc | Local recurrence days |
| Procedure type (4 classes) | LN harvest | Mortality < 31 days |
| OPCS4 | LN positive | Mortality < 91 days |
| Specific procedure (17 classes) | pT stage | |
Classes associated with variables in the dataset
| Variable | Class |
|---|---|
| Procedure type | (1) Closed without procedure |
| | (2) Stoma only |
| | (3) Bypass/stent |
| | (4) Excision |
| Specific procedure | (1) Right hemicolectomy |
| | (2) Extended right hemicolectomy |
| | (3) Transverse colectomy |
| | (4) Left hemicolectomy |
| | (5) Sigmoid colectomy |
| | (6) Anterior resection |
| | (7) APER (abdominoperineal excision of the rectum) |
| | (8) Hartmann’s procedure |
| | (9) Transanal resection of tumour |
| | (10) TEMS (transanal endoscopic microsurgery) |
| | (11) Stent |
| | (12) Polypectomy |
| | (13) Laparotomy only |
| | (14) Laparoscopy only |
| | (15) Stoma only |
| | (16) Other |
| | (17) Bypass |
| Laparoscopic type | (1) Laparoscopy completed |
| | (2) Laparoscopy converted to open |
| Surgical approach | (1) Open operation |
| | (2) Laparoscopic operation |
| Robotic | (1) Yes |
| | (2) No |
| Operation mode | (1) Elective |
| | (2) Emergency |
| Complication | (1) Yes |
| | (2) No |
Descriptive statistics of selected variables in the dataset
| Statistic | Age (years) | Operation time (min) | Blood loss (mL) | BMI (kg/m²) | LOS (days) | Mortality days |
|---|---|---|---|---|---|---|
| Mean | 70.24 | 181.18 | 66.16 | 26.85 | 11.26 | 1131.32 |
| Standard deviation | 11.61 | 93.15 | 78.89 | 4.26 | 12.12 | 1152.21 |
| Minimum value | 24.00 | 0.00 | 0.00 | 13.50 | 0.00 | 0.00 |
| 25th percentile | 63.00 | 160.00 | 50.00 | 25.00 | 5.00 | 245.25 |
| 50th percentile | 72.00 | 181.18 | 66.16 | 26.85 | 8.00 | 724.50 |
| 75th percentile | 79.00 | 215.00 | 66.16 | 28.00 | 13.00 | 1654.00 |
| Maximum value | 97.00 | 690.00 | 1200.00 | 78.50 | 252.00 | 5542.00 |
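Each row of the table above is a standard summary statistic. The sketch below shows one way to produce such a row with Python's `statistics` module. The LOS values are invented for illustration, and two conventions are assumptions because the source does not state them: the sample (not population) standard deviation, and the inclusive (linear-interpolation) percentile rule.

```python
import statistics

def describe(values):
    """Summarise a numeric column in the layout of the descriptive-statistics table.

    Assumptions: sample standard deviation and inclusive percentiles;
    other conventions give slightly different percentile values.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "p25": q1,
        "p50": q2,
        "p75": q3,
        "max": max(values),
    }

# Hypothetical length-of-stay values in days (illustration only, not study data).
los_days = [3, 5, 5, 6, 8, 8, 9, 12, 13, 21]
for name, value in describe(los_days).items():
    print(f"{name}: {value:.2f}")
```

On this toy sample the mean (9.00) sits above the median (8.00), the same right skew the LOS column shows in the table.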
Alternative techniques, informed by medical domain knowledge, used to fill missing values
| Variable | Number of missing values | Imputation method |
|---|---|---|
| Sex | 0 | – |
| ASA | 593 | Mode |
| LOS | 179 | Mean |
| Operation mode | 79 | Minimum |
| Curative surgery | 265 | Minimum |
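A minimal sketch of column-wise imputation that follows the rules in the table (mode for ASA, mean for LOS, minimum for operation mode and curative surgery). The record layout, the column names (`asa`, `los`, `operation_mode`, `curative_surgery`), and the use of `None` for a missing entry are hypothetical. Under the class coding in the classes table, the minimum of operation mode is (1) Elective.

```python
import statistics

# Imputation rule per column, following the missing-value table.
# Column names and the None-for-missing convention are hypothetical.
RULES = {
    "asa": "mode",
    "los": "mean",
    "operation_mode": "min",
    "curative_surgery": "min",
}

def fill_value(observed, rule):
    """Compute the replacement value for one column from its observed entries."""
    if rule == "mode":
        return statistics.mode(observed)
    if rule == "mean":
        return statistics.mean(observed)
    if rule == "min":
        return min(observed)
    raise ValueError(f"unknown rule: {rule}")

def impute(records, rules):
    """Return new records with missing (None) values filled per column; inputs are not modified."""
    fills = {}
    for column, rule in rules.items():
        observed = [r[column] for r in records if r[column] is not None]
        fills[column] = fill_value(observed, rule)
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in records
    ]

# Hypothetical patient records (illustration only).
patients = [
    {"asa": 2, "los": 5, "operation_mode": 1, "curative_surgery": 1},
    {"asa": None, "los": 8, "operation_mode": None, "curative_surgery": 2},
    {"asa": 2, "los": None, "operation_mode": 2, "curative_surgery": None},
    {"asa": 3, "los": 11, "operation_mode": 1, "curative_surgery": 1},
]
filled = impute(patients, RULES)
for row in filled:
    print(row)
```

One side effect of mean imputation is that, when many entries in a column are replaced by its mean, the median and even other percentiles can land exactly on that mean. In the descriptive-statistics table, operation time, blood loss, and BMI report a median equal to the mean, which may reflect imputation of this kind, although the missing-value table does not list those columns.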
Fig. 1 Comparison of algorithms for modelling the data
Hyperparameter tuning of the SVR algorithm
| Parameter | Values searched | Best value |
|---|---|---|
| kernel | ['linear', 'poly', 'rbf'] | rbf |
| C | [1, 5, 10, 15] | 10 |
| degree | [1, 2, 3] | 1 |
| gamma | ['scale', 'auto'] | scale |
| coef0 | [0, 0.01, 0.1] | 0.01 |
| epsilon | [0.1, 0.5, 0.9] | 0.9 |
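The tuning table describes an exhaustive grid search. Its parameter names match scikit-learn's `sklearn.svm.SVR`, so the sketch below assumes that class; that the study used it is an assumption. Using only the standard library, it enumerates the full grid and counts how many configurations actually differ, given that scikit-learn's SVR uses `degree` only for the polynomial kernel, ignores `gamma` for the linear kernel, and uses `coef0` only for the polynomial and sigmoid kernels.

```python
from itertools import product

# Search ranges from the SVR tuning table (names as in scikit-learn's SVR).
PARAM_GRID = {
    "kernel": ["linear", "poly", "rbf"],
    "C": [1, 5, 10, 15],
    "degree": [1, 2, 3],
    "gamma": ["scale", "auto"],
    "coef0": [0, 0.01, 0.1],
    "epsilon": [0.1, 0.5, 0.9],
}

# Parameters each kernel actually uses, per scikit-learn's SVR documentation;
# C and epsilon apply to every kernel.
USED_BY_KERNEL = {
    "linear": {"C", "epsilon"},
    "poly": {"C", "epsilon", "degree", "gamma", "coef0"},
    "rbf": {"C", "epsilon", "gamma"},
}

def full_grid(grid):
    """Yield every combination of parameter values as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def effective_config(params):
    """Keep only the parameters that influence the model for this kernel."""
    used = USED_BY_KERNEL[params["kernel"]]
    return tuple(sorted((k, v) for k, v in params.items() if k == "kernel" or k in used))

combos = list(full_grid(PARAM_GRID))
distinct = {effective_config(p) for p in combos}
print(len(combos), len(distinct))  # → 648 252
```

The same dictionary can be passed as `param_grid` to scikit-learn's `GridSearchCV`, which would fit all 648 combinations even though only 252 are distinct. Because the selected kernel is `rbf`, the reported best `degree` (1) and `coef0` (0.01) would have no effect on that model under this assumption.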
Fig. 2 LOS prediction: (a) actual versus predicted LOS using all variables; (b) actual versus predicted LOS using only the selected variables
Fig. 3 Data analysis of LOS
Prediction of readmission with different algorithms
| Algorithm | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| Random Forest | 0.768 | 0.753 | 0.781 |
| KNN | 0.675 | 0.619 | 0.728 |
| SVM | 0.681 | 0.482 | 0.878 |
| MLP | 0.740 | 0.725 | 0.757 |
| BI-LSTM | 0.875 | 0.841 | 0.909 |
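Sensitivity and specificity in these tables are the true-positive rate and the true-negative rate. The sketch below computes them, together with accuracy, from binary labels; the labels are invented, and the coding 1 = readmitted, 0 = not readmitted is an assumption. Because the toy example has equally sized positive and negative groups, its accuracy is exactly the mean of sensitivity and specificity.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate).

    Labels are assumed to be 1 = event (e.g. readmitted) and 0 = no event.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical labels for ten patients (illustration only).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Here three of five positives and four of five negatives are classified correctly, giving sensitivity 0.6, specificity 0.8, and accuracy 0.7.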
Fig. 4 Readmission prediction: (a) ROC curve using all variables; (b) ROC curve using only the selected variables
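An ROC curve such as those in Fig. 4 plots the true-positive rate (sensitivity) against the false-positive rate (one minus specificity) as the decision threshold on a model's predicted score is lowered. The sketch below traces such a curve and integrates it with the trapezoidal rule to obtain the area under the curve (AUC); the scores and labels are invented, and it requires at least one positive and one negative case.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by sweeping the threshold from high to low scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Visit samples in order of decreasing score; tied scores form a single step.
    pairs = sorted(zip(scores, y_true), reverse=True)
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical readmission labels and model scores (illustration only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))  # → 0.8889
```

The result, 8/9, equals the fraction of positive/negative pairs in which the positive case receives the higher score, which is the usual probabilistic reading of AUC.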
Fig. 5 Data analysis of readmission
Prediction of overall, 31-day, and 91-day mortality with different algorithms
| Algorithm | Overall mortality: accuracy | Overall mortality: sensitivity | Overall mortality: specificity | 31-day mortality: accuracy | 31-day mortality: sensitivity | 31-day mortality: specificity | 91-day mortality: accuracy | 91-day mortality: sensitivity | 91-day mortality: specificity |
|---|---|---|---|---|---|---|---|---|---|
| Random Forest | 0.701 | 0.590 | 0.812 | 0.989 | 0.979 | 1 | 0.879 | 0.894 | 0.864 |
| KNN | 0.657 | 0.595 | 0.719 | 0.818 | 0.877 | 0.760 | 0.798 | 0.852 | 0.744 |
| SVM | 0.635 | 0.679 | 0.590 | 0.760 | 0.918 | 0.699 | 0.743 | 0.898 | 0.589 |
| MLP | 0.740 | 0.778 | 0.757 | 0.900 | 0.911 | 0.910 | 0.852 | 0.871 | 0.833 |
| BI-LSTM | 0.800 | 0.843 | 0.753 | 0.966 | 0.937 | 1 | 0.942 | 0.914 | 0.963 |
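Accuracy is a weighted average of sensitivity and specificity, with weights equal to the shares of positive and negative cases, so it must lie between those two values. The helper below applies that check to the overall-mortality rows of the table above; the tolerance of 0.001 for three-decimal rounding is an assumption.

```python
def accuracy_is_consistent(accuracy, sensitivity, specificity, tol=0.001):
    """Check that accuracy lies between sensitivity and specificity.

    accuracy = w * sensitivity + (1 - w) * specificity, where w is the share of
    positive cases, so it cannot fall outside [min, max] of the two rates.
    `tol` allows for values rounded to three decimals (an assumption).
    """
    low, high = sorted((sensitivity, specificity))
    return low - tol <= accuracy <= high + tol

# (accuracy, sensitivity, specificity) copied from the overall-mortality columns.
rows = {
    "Random Forest": (0.701, 0.590, 0.812),
    "KNN": (0.657, 0.595, 0.719),
    "SVM": (0.635, 0.679, 0.590),
    "MLP": (0.740, 0.778, 0.757),
    "BI-LSTM": (0.800, 0.843, 0.753),
}
for name, (acc, sens, spec) in rows.items():
    print(name, accuracy_is_consistent(acc, sens, spec))
```

Under this check, the MLP row fails for overall mortality (accuracy 0.740 against sensitivity 0.778 and specificity 0.757), as does its 31-day row (0.900 against 0.911 and 0.910), which suggests that at least one figure in each of those rows is misreported; every other row in the table passes.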
Fig. 6 Prediction of 91-day mortality: (a) ROC curve using all variables; (b) ROC curve using only the selected variables
Fig. 7 Data analysis of mortality days
Comparison of all variables versus selected variables as features for LOS prediction
| Evaluation metric | All variables (27 features) | Selected variables (10 features) |
|---|---|---|
| RMSE (days) | 12.52 | 12.79 |
| MAE (days) | 9.69 | 10.32 |
| Accuracy (%) | 83.21 | 82.61 |
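RMSE and MAE are both expressed in the units of LOS (days). RMSE is never smaller than MAE, and the gap widens when a few predictions are far off, as can happen with very long stays (the maximum LOS in the dataset is 252 days). The sketch below computes both; the values are invented, and it omits the accuracy row because the source does not define how accuracy was computed for a regression target.

```python
import math

def regression_errors(actual, predicted):
    """Root mean squared error and mean absolute error, in the target's units (days for LOS)."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return rmse, mae

# Hypothetical observed and predicted LOS in days (illustration only).
actual_los = [4, 7, 10, 30]
predicted_los = [6, 7, 8, 18]
rmse, mae = regression_errors(actual_los, predicted_los)
print(f"RMSE = {rmse:.2f} days, MAE = {mae:.2f} days")  # → RMSE = 6.16 days, MAE = 4.00 days
```

The single 12-day miss on the longest stay contributes most of the squared error, which is why RMSE exceeds MAE here.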
Comparison of all variables versus selected variables as features for readmission prediction
| Evaluation metric | All variables (28 features) | Selected variables (9 features) |
|---|---|---|
| Accuracy (%) | 87.5 | 83.7 |
| Sensitivity (%) | 84.1 | 76.1 |
| Specificity (%) | 90.9 | 90.9 |
Comparison of all variables versus selected variables as features for overall mortality prediction
| Evaluation metric | All variables (29 features) | Selected variables (10 features) |
|---|---|---|
| Accuracy (%) | 80.0 | 78.4 |
| Sensitivity (%) | 84.3 | 79.0 |
| Specificity (%) | 75.3 | 77.8 |
Comparison of all variables versus selected variables as features for 31-day mortality prediction
| Evaluation metric | All variables (29 features) | Selected variables (10 features) |
|---|---|---|
| Accuracy (%) | 96.6 | 96.6 |
| Sensitivity (%) | 93.7 | 93.9 |
| Specificity (%) | 100.0 | 100.0 |
Comparison of all variables versus selected variables as features for 91-day mortality prediction
| Evaluation metric | All variables (29 features) | Selected variables (10 features) |
|---|---|---|
| Accuracy (%) | 94.2 | 94.1 |
| Sensitivity (%) | 91.4 | 87.2 |
| Specificity (%) | 96.3 | 100.0 |