Hae-Ryong Yun1, Cheal Wung Huh1, Da Hyun Jung2, Gyubok Lee3, Nak-Hoon Son4, Jie-Hyun Kim5, Young Hoon Youn5, Jun Chul Park2, Sung Kwan Shin2, Sang Kil Lee2, Yong Chan Lee2.
Abstract
Non-curative resection (NCR) of early gastric cancer (EGC) after endoscopic submucosal dissection (ESD) can increase the burden of additional treatment and medical expenses. We aimed to develop a machine-learning (ML)-based NCR prediction model for EGC prior to ESD. We obtained data from 4927 patients with EGC who underwent ESD between January 2006 and February 2020. Ten clinicopathological characteristics were selected using extreme gradient boosting (XGBoost) and were used to develop an ML-based model. The dataset was divided into training and internal validation sets, and the model was verified using an external validation set. Sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) were evaluated. The performance of each model was compared using the DeLong test. A total of 1100 (22.1%) patients were identified as having been treated non-curatively with ESD. Seven ML-based NCR prediction models were developed. NCR prediction performance was highest for the XGBoost model (AUROC, 0.851; 95% confidence interval, 0.837-0.864). When prediction performance was compared using the DeLong test, the XGBoost (p = 0.02) and support vector machine (p = 0.02) models showed significantly higher performance among the NCR prediction models. We developed an ML model capable of accurately predicting the NCR of EGC before ESD. This ML model can provide useful information for decision-making regarding the appropriate treatment of EGC before ESD.
Keywords: early gastric cancer; machine learning; non-curative resection; prediction
Year: 2022 PMID: 35954406 PMCID: PMC9367410 DOI: 10.3390/cancers14153742
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. Flow diagram of study design. Abbreviations: EGC, early gastric cancer; ESD, endoscopic submucosal dissection; LGD, low-grade dysplasia; HGD, high-grade dysplasia; SET, subepithelial tumor.
Clinical variables used to build machine-learning models.
| Category | Subcategory | Variables |
|---|---|---|
| Demographics | | Sex, Age, Antithrombotics |
| Endoscopy | Appearance | Elevated, Flat, Depressed |
| | Finding | Ulcer, Fold, Erythema, Exudate, Whitish or atrophy, Nodularity or elevated, Spontaneous bleeding |
| Tumor | Size | Size (mm) |
| | Number | 1, 2, or >2 |
| | Location (long axis) | Upper, Middle, Lower |
| | Location (short axis) | Anterior wall, Posterior wall, |
| | Histology | Adenocarcinoma well-differentiated, |
Figure 2. Development of the prediction model for non-curative resection after endoscopic submucosal dissection.
Baseline characteristics of the internal and external data set.
| Variables | Overall | Internal Data Set | External Data Set | p-Value |
|---|---|---|---|---|
| Demographics | ||||
| Age, years | 64.4 ± 10.2 | 64.7 ± 10.1 | 62.4 ± 11.2 | <0.001 |
| Male | 3620 (73.5) | 3240 (73.7) | 151 (71.6) | 0.29 |
| Medications | ||||
| Antithrombotics | 923 (18.7) | 830 (18.9) | 93 (17.5) | 0.45 |
| Histology | ||||
| AWD | 1565 (31.8) | 1425 (32.4) | 140 (26.4) | <0.001 |
| AMD | 1228 (24.9) | 1109 (25.2) | 119 (22.4) | 0.16 |
| APD | 145 (2.9) | 113 (2.6) | 32 (6.0) | <0.001 |
| SRC | 250 (5.1) | 210 (4.8) | 40 (7.5) | <0.001 |
| Other | 1739 (35.2) | 1539 (35.0) | 200 (37.7) | 0.23 |
| Multiple lesions | ||||
| 1 | 4280 (86.9) | 3785 (86.1) | 495 (93.2) | 0.01 |
| 2 | 561 (11.4) | 532 (12.1) | 29 (5.5) | <0.001 |
| >2 | 647 (13.1) | 611 (13.9) | 36 (6.8) | <0.001 |
| Tumor location (long axis) | ||||
| Upper | 495 (10.0) | 441 (10.0) | 54 (10.2) | 0.92 |
| Mid | 1681 (34.1) | 1463 (33.3) | 218 (41.1) | <0.001 |
| Lower | 2635 (53.5) | 2369 (53.9) | 266 (50.1) | 0.10 |
| Tumor location (short axis) | ||||
| AW | 1050 (21.3) | 927 (21.1) | 123 (23.2) | 0.27 |
| PW | 1189 (24.1) | 1070 (24.3) | 119 (22.4) | 0.33 |
| LC | 1838 (37.3) | 1607 (36.6) | 231 (43.5) | <0.001 |
| GC | 1023 (20.8) | 904 (20.6) | 119 (22.4) | 0.32 |
| Tumor size (mm) | 13.1 ± 9.2 | 12.4 ± 8.4 | 20.0 ± 12.7 | <0.001 |
| Endoscopic appearance | ||||
| Elevated | 3253 (66.0) | 2976 (67.7) | 277 (52.2) | <0.001 |
| Flat | 1341 (27.2) | 1124 (25.6) | 217 (40.9) | <0.001 |
| Depressed | 2302 (46.7) | 2029 (46.2) | 273 (51.4) | 0.02 |
| Endoscopic finding | ||||
| Ulcer | 274 (5.6) | 225 (5.1) | 49 (9.2) | <0.001 |
| Fusion of fold, interruption, or smooth tapering of fold | 104 (2.1) | 70 (1.6) | 34 (6.4) | <0.001 |
| Erythema | 795 (16.1) | 534 (12.1) | 261 (49.2) | <0.001 |
| Exudate | 210 (4.3) | 102 (2.3) | 107 (20.2) | <0.001 |
| Whitish scar or atrophy | 269 (5.5) | 225 (5.1) | 44 (8.3) | 0.002 |
| Nodularity or elevated | 863 (17.5) | 596 (13.6) | 267 (50.3) | <0.001 |
| Spontaneous bleeding | 60 (1.2) | 38 (0.9) | 22 (4.1) | <0.001 |
Note: Values for categorical variables are given as number (percentage); values for continuous variables are given as mean ± standard deviation. Abbreviations: AMD, adenocarcinoma moderate-differentiated; AWD, adenocarcinoma well-differentiated; APD, adenocarcinoma poorly differentiated; SRC, signet-ring cell; CIS, carcinoma in situ; SCC, squamous cell carcinoma; AW, anterior wall; PW, posterior wall; LC, lesser curvature; GC, greater curvature.
Figure 3. Area under the receiver operating characteristic curve for prediction of non-curative resection after endoscopic submucosal dissection. (A) Internal data set, (B) External data set. Abbreviations: RSS, risk-scoring system; LR, logistic regression; SVM, support vector machine; KNN, k-nearest neighbors; NB, naive Bayes; XGB, extreme gradient boosting; RF, random forest; MLP, multilayer perceptron.
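As a minimal sketch of how the three reported metrics are defined, the following pure-Python example computes sensitivity, specificity, and AUROC on a small made-up set of labels and scores. The data, the 0.5 decision threshold, and the function names are illustrative assumptions and are not taken from the study; the AUROC is computed with the rank-based (Mann-Whitney) formulation rather than any particular library routine.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = non-curative resection, 0 = curative resection.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

# Assumed decision threshold of 0.5, chosen only for illustration.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUROC summarizes ranking quality across all thresholds, which is why the two kinds of metric can diverge for the same model.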
Performance of the non-curative resection prediction model for the seven machine-learning models used in this study.
| Model | Precision | F1 Score | AUPRC (95%CI) | Sensitivity | Specificity | AUROC (95%CI) | p-Value a |
|---|---|---|---|---|---|---|---|
| Internal data | |||||||
| RSS | 0.636 | 0.777 | 0.463 | 0.998 | 0.008 | 0.701 | |
| LR | 0.735 | 0.547 | 0.691 | 0.788 | 0.721 | 0.840 | <0.001 |
| SVM | 0.700 | 0.460 | 0.596 | 0.827 | 0.618 | 0.667 | <0.001 |
| KNN | 0.835 | 0.436 | 0.652 | 0.771 | 0.665 | 0.807 | <0.001 |
| NB | 0.696 | 0.492 | 0.633 | 0.946 | 0.380 | 0.799 | <0.001 |
| XGB | 0.749 | 0.576 | 0.699 | 0.785 | 0.732 | 0.851 | <0.001 |
| RF | 0.925 | 0.326 | 0.647 | 0.713 | 0.757 | 0.812 | <0.001 |
| MLP | 0.718 | 0.527 | 0.676 | 0.722 | 0.752 | 0.837 | <0.001 |
| External data | |||||||
| RSS | 0.200 | 0.333 | 0.174 | 0.977 | 0.147 | 0.616 | |
| LR | 0.122 | 0.193 | 0.104 | 0.561 | 0.794 | 0.693 | 0.09 |
| SVM | 0.099 | 0.133 | 0.113 | 0.563 | 0.794 | 0.693 | 0.02 |
| KNN | 0.202 | 0.148 | 0.169 | 0.829 | 0.470 | 0.645 | 0.69 |
| NB | 0.096 | 0.147 | 0.151 | 0.776 | 0.411 | 0.631 | 0.74 |
| XGB | 0.187 | 0.274 | 0.125 | 0.587 | 0.735 | 0.710 | 0.02 |
| RF | 0.031 | 0.030 | 0.099 | 0.394 | 0.911 | 0.688 | 0.12 |
| MLP | 0.126 | 0.188 | 0.105 | 0.551 | 0.823 | 0.691 | 0.06 |
a Compared with the area under the receiver operating characteristic curve of the score-based non-curative resection prediction model, using the DeLong test. Abbreviations: AUPRC, area under the precision-recall curve; AUROC, area under the receiver operating characteristic curve; CI, confidence interval; RSS, risk-scoring system; LR, logistic regression; SVM, support vector machine; KNN, k-nearest neighbors; NB, naive Bayes; XGB, extreme gradient boosting; RF, random forest; MLP, multilayer perceptron.
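The F1 score in the table is conventionally the harmonic mean of precision and sensitivity (recall). The short sketch below, which assumes that standard definition and uses a hypothetical helper name `f1_score`, recomputes the value for the RSS row of the internal data from its tabulated precision and sensitivity. Because the inputs are rounded to three decimals, recomputed values for other rows need not match the table exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# RSS, internal data: precision 0.636 and sensitivity 0.998 from the table.
rss_internal_f1 = f1_score(0.636, 0.998)
print(round(rss_internal_f1, 3))  # 0.777
```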