| Literature DB >> 35503455 |
Robert T van Kooten1, Renu R Bahadoer1, Boudewijn Ter Buurkes de Vries2, Michel W J M Wouters2,3, Rob A E M Tollenaar1, Henk H Hartgrink1, Hein Putter2, Johan L Dikken1.
Abstract
BACKGROUND AND OBJECTIVES: With the current advanced data-driven approach to health care, machine learning is gaining more interest. The current study investigates the added value of machine learning to linear regression in predicting anastomotic leakage and pulmonary complications after upper gastrointestinal cancer surgery.
Keywords: cancer; complications; machine learning; mortality; risk factors; upper gastrointestinal surgery
Year: 2022 PMID: 35503455 PMCID: PMC9544929 DOI: 10.1002/jso.26910
Source DB: PubMed Journal: J Surg Oncol ISSN: 0022-4790 Impact factor: 2.885
Explanation of the models used.
| Model | Description |
|---|---|
| Logistic regression | Describes the relationship between a discrete binary outcome and one or several predictor variables. The outcome is expressed as the log odds of one class over the other, which can be transformed into odds or probabilities. |
| Lasso regression | Differs from the logistic regression model in that the Lasso model can exclude coefficients that carry little weight in the solution. This may increase interpretability. |
| k‐Nearest Neighbors (KNN) | Predicts new instances of a class by looking at the classes of the k nearest (most similar) instances in the training data. |
| Neural Networks (NNs) | The inspiration for NNs comes from the architecture of the human brain: artificial neurons send the next neuron a signal based on the input they receive. A network of artificial neurons is called an NN. An NN consists of layers: first an input layer (the predictor variables), followed by one or more hidden layers (the artificial neurons), and finally an output layer (the prediction). For each outcome in the data, an NN is fit. |
| Support Vector Machine (SVM) | A classification (and regression) algorithm that can classify nonlinearly separable data by constructing a hyperplane (or a set of hyperplanes) in high-dimensional space. An SVM tries to find the hyperplane that best separates two groups, that is, the hyperplane whose distance to the nearest element of each class is largest. For data that are not linearly separable, the kernel trick is used: a method of adding dimensions to the data while keeping the calculations feasible. For each outcome, an SVM is trained. |
| Random Forest | An ensemble of decision trees. The model is trained with a technique called bootstrap aggregation (bagging), which reduces variance and avoids overfitting in ensemble methods. Many bootstrap samples are taken and a decision tree is trained on each sample; the outcomes of all trees are averaged, which gives the final prediction. For each outcome, a Random Forest is trained. |
| Adaboost | Boosting is similar to a Random Forest; the main differences are that the trees are built sequentially and the results are combined along the way. Boosting is an ensemble method that combines weak classifiers into a single strong predicted response, and is considered an improvement over Random Forests on some occasions. For each outcome, an Adaboost.m1 model is trained. |
| Super Learner | Finds an optimally weighted combination of candidate learners, which can be any prediction algorithm; the Super Learner is itself a prediction algorithm. The performance of the candidate learners is assessed by cross‐validation. For each outcome, a Super Learner model is trained; the candidate learners consist of all models mentioned above, with the exception of Adaboost.m1, which is replaced by XGBoost (an alternative boosting algorithm). |
Abbreviation: Lasso, Least Absolute Shrinkage and Selection Operator.
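The sketches below illustrate each of these models in Python with scikit-learn on synthetic stand-in data; they are assumptions for illustration, not the authors' code, data, or tooling. First, logistic regression: the fitted coefficients are log odds, which `predict_proba` transforms into probabilities.

```python
# Illustrative sketch (not the authors' code): logistic regression for a
# binary outcome, fit on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for predictors and a binary complication.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# coef_ holds the log odds per predictor; predict_proba gives probabilities.
print("log odds per predictor:", model.coef_[0])
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```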
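A sketch of the Lasso variant, assuming scikit-learn's L1-penalised logistic regression: with a sufficiently strong penalty, coefficients that carry little weight are shrunk exactly to zero and drop out of the model.

```python
# Illustrative sketch: an L1 (Lasso) penalty zeroes out weak coefficients.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 predictors, only 5 of which actually carry signal.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
kept = np.flatnonzero(lasso.coef_)
print(f"{kept.size} of {X.shape[1]} coefficients remain nonzero:", kept)
```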
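A sketch of k-Nearest Neighbors: a new case is classified by the outcomes of the k most similar cases in the training data. Standardising the predictors first is an assumption worth making explicit, since "nearest" is measured by distance.

```python
# Illustrative sketch: KNN classifies a new case from its k nearest neighbours.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardise first: distances are not comparable across unscaled units.
knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15))
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```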
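A sketch of a small feed-forward neural network, assuming scikit-learn's MLPClassifier: an input layer of predictors, one hidden layer of artificial neurons, and an output layer producing the prediction.

```python
# Illustrative sketch: input layer -> one hidden layer -> output layer.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

nn = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
nn.fit(X, y)
print("predicted probabilities, first 3 cases:", nn.predict_proba(X[:3])[:, 1])
```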
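A sketch of an SVM with an RBF kernel, where the kernel trick handles data that are not linearly separable; `probability=True` is an assumption needed to obtain probability outputs from scikit-learn's SVC.

```python
# Illustrative sketch: the RBF kernel implicitly adds dimensions so a
# separating hyperplane can be found for nonlinearly separable data.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

svm = make_pipeline(StandardScaler(),
                    SVC(kernel="rbf", probability=True, random_state=0))
svm.fit(X, y)
print("P(outcome) for the first case:", svm.predict_proba(X[:1])[0, 1])
```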
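A sketch of a Random Forest: each tree is trained on a bootstrap sample (bagging), and the per-tree estimates are averaged into the final prediction.

```python
# Illustrative sketch: bagging many decision trees and averaging them.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X, y)
# predict_proba is the average of the individual trees' class estimates.
print("P(outcome), first 3 cases:", forest.predict_proba(X[:3])[:, 1])
```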
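A sketch of boosting: in contrast to the Random Forest's parallel bagging, the weak trees are built sequentially, each reweighting the cases the previous ones misclassified. scikit-learn's AdaBoostClassifier is used here as a close analogue of the paper's Adaboost.m1.

```python
# Illustrative sketch: sequentially built weak trees combined into one
# strong classifier (a close analogue of Adaboost.m1).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# The default weak learner is a depth-1 decision tree (a "stump").
boost = AdaBoostClassifier(n_estimators=200, random_state=0)
boost.fit(X, y)
print("training accuracy:", boost.score(X, y))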
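Finally, a sketch of the Super Learner idea using scikit-learn's StackingClassifier as an analogue (the authors' exact tooling is not given here): candidate learners are fit with cross-validation and a meta-model learns how to weight their predictions.

```python
# Illustrative analogue of a Super Learner: cross-validated stacking of
# several candidate learners under a logistic-regression meta-model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

candidates = [
    ("lasso", LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
    ("svm", SVC(probability=True, random_state=0)),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
]
# cv=5: each candidate's out-of-fold predictions feed the meta-model,
# which learns how to weight the candidates.
super_learner = StackingClassifier(estimators=candidates,
                                   final_estimator=LogisticRegression(),
                                   cv=5)
super_learner.fit(X, y)
print("P(outcome), first case:", super_learner.predict_proba(X[:1])[0, 1])
```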
Figure 1: Patient selection.
Clinical characteristics.
| | Esophageal cancer resection (n = 4228) | Gastric cancer resection (n = 2199) |
|---|---|---|
| Age (years), median (IQR) [range] | 66 (59–71) [19–89] | 70 (62–77) [22–92] |
| Sex | | |
| Male | 3272 (77%) | 1366 (62%) |
| Female | 956 (23%) | 833 (38%) |
| BMI (kg/m²) | | |
| <20 | 276 (7%) | 170 (8%) |
| 20–24 | 1614 (38%) | 949 (43%) |
| 25–29 | 1663 (39%) | 789 (36%) |
| ≥30 | 675 (16%) | 291 (13%) |
| Comorbidity | | |
| None | 998 (24%) | 433 (20%) |
| Yes | 3229 (76%) | 1764 (80%) |
| | 978 (30%) | 676 (31%) |
| | 639 (20%) | 375 (17%) |
| | 769 (18%) | 352 (16%) |
| | 173 (5%) | 154 (7%) |
| | 1 (<1%) | 2 (<1%) |
| Weight loss | | |
| None | 1138 (27%) | 639 (29%) |
| 1–5 kg | 1184 (28%) | 543 (25%) |
| 6–10 kg | 891 (21%) | 473 (22%) |
| 11–15 kg | 279 (7%) | 146 (7%) |
| 16–20 kg | 106 (3%) | 55 (3%) |
| 21–35 kg | 56 (1%) | 24 (1%) |
| Unknown | 574 (14%) | 319 (15%) |
| Previous surgery^a | | |
| No | 2943 (70%) | 1313 (60%) |
| Yes | 1276 (30%) | 882 (40%) |
| Unknown | 9 | 4 |
| Histology | | |
| Adenocarcinoma | 3383 (80%) | 2195 (>99%) |
| Squamous cell carcinoma | 845 (20%) | 4 (<1%) |
| Surgical procedure | | |
| Transhiatal esophagectomy | 1395 (33%) | ‐ |
| Transthoracic esophagectomy | 2833 (67%) | ‐ |
| | 1353 (48%) | ‐ |
| Subtotal gastrectomy | ‐ | 1275 (58%) |
| Total gastrectomy | ‐ | 924 (42%) |
| Tumor stage | | |
| Stage 0 | 6 (<1%) | 16 (1%) |
| Stage I | 566 (13%) | 465 (21%) |
| Stage II | 1116 (26%) | 842 (38%) |
| Stage III | 2155 (51%) | 185 (8%) |
| Stage IV | 40 (1%) | 39 (2%) |
| Stage X | 345 (8%) | 652 (30%) |
| Neoadjuvant therapy | | |
| None | 314 (7%) | 848 (39%) |
| Chemotherapy | 286 (7%) | 1316 (60%) |
| Chemoradiotherapy | 3628 (86%) | 35 (2%) |
| ASA classification | | |
| I | 712 (17%) | 305 (14%) |
| II | 2592 (61%) | 1237 (56%) |
| III | 908 (22%) | 639 (29%) |
| IV | 16 (<1%) | 18 (1%) |
| | | |
| No | 4093 (97%) | 2118 (96%) |
| Yes | 107 (3%) | 46 (2%) |
| Unknown | 28 (1%) | 35 (2%) |

^a Thoracic and/or abdominal surgeries.
Figure 2: Type of complications after (A) esophagectomy and (B) gastrectomy.
Figure 3: (A) Anastomotic leakage after esophageal cancer resection. (B) Pulmonary complications after esophageal cancer resection. AUC, area under the curve; CI, confidence interval; KNN, k‐Nearest Neighbors; Lasso, Least Absolute Shrinkage and Selection Operator; SVM, Support Vector Machine.
Figure 4: Anastomotic leakage after gastric cancer resection. AUC, area under the curve; CI, confidence interval; KNN, k‐Nearest Neighbors; Lasso, Least Absolute Shrinkage and Selection Operator; SVM, Support Vector Machine.