Ji-Eun Na1,2, Yeong-Chan Lee3, Tae-Jun Kim1, Hyuk Lee1, Hong-Hee Won3, Yang-Won Min1, Byung-Hoon Min1, Jun-Haeng Lee1, Poong-Lyul Rhee1, Jae J Kim1.
Abstract
Stratification of the risk of lymph node metastasis (LNM) in patients with non-curative resection after endoscopic resection (ER) for early gastric cancer (EGC) is crucial in determining additional treatment strategies and preventing unnecessary surgery. Hence, we developed a machine learning (ML) model and validated its performance for the stratification of LNM risk in patients with EGC. We enrolled patients who underwent primary surgery or additional surgery after ER for EGC between May 2005 and March 2021. Additionally, patients who underwent ER alone for EGC between May 2005 and March 2016 and were followed up for at least 5 years were included. The ML model was built on a development set (70%) using logistic regression, random forest (RF), and support vector machine (SVM) analyses and assessed in a validation set (30%). In the validation set, LNM was found in 337 of 4428 patients (7.6%). Among all patients, the area under the receiver operating characteristic curve (AUROC) for predicting LNM risk was 0.86 in the logistic regression, 0.85 in RF, and 0.86 in SVM analyses; in patients with initial ER, the AUROC for predicting LNM risk was 0.90 in the logistic regression, 0.88 in RF, and 0.89 in SVM analyses. The ML model could stratify the LNM risk into very low (<1%), low (<3%), intermediate (<7%), and high (≥7%) risk categories, which were comparable with the actual LNM rates. We demonstrate that the ML model can be used to identify LNM risk. However, this tool requires further validation in EGC patients with non-curative resection after ER before actual application.
Keywords: early gastric cancer; lymph node metastasis; machine learning model; risk stratification
Year: 2022 PMID: 35267429 PMCID: PMC8909118 DOI: 10.3390/cancers14051121
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
Figure 1. Diagram of patient selection.
Baseline characteristics of the development set and validation set.
| Variable | Development | Validation | p-value |
|---|---|---|---|
| Age | 58 ± 11 | 58 ± 11 | 0.413 |
| Gender | | | 0.789 |
| Male | 6697 (65) | 2881 (65) | |
| Female | 3635 (35) | 1547 (35) | |
| tumors | 512 (5) | 201 (5) | |
| Location | | | 0.013 |
| Upper | 1083 (11) | 483 (11) | |
| Middle | 4773 (46) | 1929 (44) | |
| Lower | 4476 (43) | 2016 (45) | |
| Size (mm) | 27 ± 18 | 27 ± 18 | 0.645 |
| Gross type | | | 0.823 |
| Non-depressed | 2568 (25) | 1109 (25) | |
| Depressed | 7764 (75) | 3319 (75) | |
| Differentiation | | | 0.999 |
| Well | 1214 (12) | 523 (12) | |
| Moderate | 4053 (39) | 1741 (39) | |
| Signet | 2315 (22) | 989 (22) | |
| Poorly | 2750 (27) | 1175 (27) | |
| Histologic type by Lauren | | | 0.122 |
| Intestinal | 5198 (50) | 2271 (51) | |
| Diffuse | 3867 (38) | 1666 (38) | |
| Mixed | 1267 (12) | 491 (11) | |
| Depth of invasion | | | 0.983 |
| Lamina propria | 2568 (25) | 1114 (25) | |
| Muscularis mucosa | 3767 (37) | 1612 (37) | |
| SM1 | 1069 (10) | 455 (10) | |
| SM2/3 | 2928 (28) | 1247 (28) | |
| Lymphatic invasion, present | 1571 (15) | 682 (15) | 0.780 |
| Venous invasion, present | 154 (2) | 72 (2) | 0.588 |
| Perineural invasion, present | 232 (2) | 96 (2) | 0.817 |
Continuous variables are presented as mean ± standard deviation; all other values are expressed as n (%) unless otherwise specified. p-values were calculated on the overall data using Student’s t-test for continuous variables and Pearson’s chi-square test for categorical variables. SM1: submucosal invasion <500 µm from the muscularis mucosa; SM2/3: submucosal invasion ≥500 µm from the muscularis mucosa.
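As an illustration of the categorical comparison described in the footnote, the following is a minimal sketch of a Pearson chi-square test on the gender rows of the baseline table, written with the Python standard library only. The function name `chi_square_2x2` and the `yates` flag are hypothetical, and whether the authors applied a continuity correction is not stated in the source; it is treated here as an assumption.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the
    chi-square survival function reduces to erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                # Continuity correction: an assumption, the source does not
                # say whether one was applied.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Gender counts from the baseline table:
# rows = male / female, columns = development / validation.
stat, p = chi_square_2x2(6697, 2881, 3635, 1547)
print(f"chi2 = {stat:.4f}, p = {p:.3f}")
```

Under these assumptions the continuity-corrected p-value comes out close to the 0.789 reported for gender; calling the function with `yates=False` gives a slightly smaller value.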
Figure 2. AUROC of the ML model for the prediction of LNM in the development set (total number = 10,332, number of patients with initial ER = 2320).
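For readers unfamiliar with the metric, the AUROC can be read as the probability that a randomly chosen patient with LNM receives a higher predicted risk than a randomly chosen patient without LNM. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study data.

```python
def auroc(labels, scores):
    """AUROC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = LNM present, 0 = LNM absent.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.01, 0.02, 0.05, 0.10, 0.08, 0.03, 0.20, 0.30]
print(round(auroc(toy_labels, toy_scores), 3))
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; a rank-based formulation would be preferable for cohorts of the size reported here.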
Determination of the cutoff for stratification of LNM risk based on the predictive value of the ML model and actual LNM rate in the development set. (A) Total patients. (B) Patients with initial ER.
(A) Total patients

| Model | Patients (n) | LNM (n) | Rate (%) | Risk probability | Risk category |
|---|---|---|---|---|---|
| Logistic regression | 1863 | 3 | 0.2 | <1% | Very low |
| Logistic regression | 3105 | 42 | 1.4 | ≥1% to <3% | Low |
| Logistic regression | 1656 | 67 | 4.1 | ≥3% to <7% | Intermediate |
| Logistic regression | 3708 | 682 | 18.4 | ≥7% | High |
| Random forest | 5589 | 2 | <0.1 | <1% | Very low |
| Random forest | 1859 | 24 | 1.3 | ≥1% to <3% | Low |
| Random forest | 412 | 18 | 4.4 | ≥3% to <7% | Intermediate |
| Random forest | 2472 | 750 | 30.3 | ≥7% | High |
| Support vector machine | 2277 | 5 | 0.2 | <1% | Very low |
| Support vector machine | 2691 | 35 | 1.3 | ≥1% to <3% | Low |
| Support vector machine | 1656 | 65 | 3.9 | ≥3% to <7% | Intermediate |
| Support vector machine | 3708 | 689 | 18.6 | ≥7% | High |

(B) Patients with initial ER

| Model | Patients (n) | LNM (n) | Rate (%) | Risk probability | Risk category |
|---|---|---|---|---|---|
| Logistic regression | 1492 | 1 | 0.1 | <1% | Very low |
| Logistic regression | 368 | 5 | 1.4 | ≥1% to <3% | Low |
| Logistic regression | 92 | 3 | 3.3 | ≥3% to <7% | Intermediate |
| Logistic regression | 368 | 33 | 9.0 | ≥7% | High |
| Random forest | 1722 | 0 | 0 | <1% | Very low |
| Random forest | 322 | 4 | 1.2 | ≥1% to <3% | Low |
| Random forest | 46 | 2 | 4.4 | ≥3% to <7% | Intermediate |
| Random forest | 230 | 36 | 15.7 | ≥7% | High |
| Support vector machine | 1491 | 1 | 0.1 | <1% | Very low |
| Support vector machine | 136 | 2 | 1.5 | ≥1% to <3% | Low |
| Support vector machine | 445 | 15 | 3.3 | ≥3% to <7% | Intermediate |
| Support vector machine | 206 | 24 | 10.4 | ≥7% | High |
LNM, lymph node metastasis.
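The cutoffs in the table above (1%, 3%, and 7%) can be applied to a predicted probability as follows. This is a minimal sketch, not the authors' implementation: the names `risk_category` and `tabulate` are hypothetical, and the example predictions are made-up values used only to show the bookkeeping.

```python
from bisect import bisect_right

# Cutoffs and labels as given in the table: <1%, >=1% to <3%, >=3% to <7%, >=7%.
CUTOFFS = (0.01, 0.03, 0.07)
CATEGORIES = ("Very low", "Low", "Intermediate", "High")


def risk_category(prob):
    """Map a predicted LNM probability (0 to 1) to its risk category."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    # bisect_right places a value equal to a cutoff in the higher category,
    # which matches the ">=" lower bounds in the table.
    return CATEGORIES[bisect_right(CUTOFFS, prob)]


def tabulate(predictions):
    """Count patients and LNM cases per category.

    predictions: iterable of (predicted_probability, has_lnm) pairs.
    Returns {category: (patients, lnm_cases, rate_percent_or_None)}.
    """
    counts = {name: [0, 0] for name in CATEGORIES}
    for prob, has_lnm in predictions:
        row = counts[risk_category(prob)]
        row[0] += 1
        row[1] += int(has_lnm)
    return {
        name: (n, k, round(100.0 * k / n, 1) if n else None)
        for name, (n, k) in counts.items()
    }


# Made-up predictions for illustration only.
example = [(0.004, False), (0.02, False), (0.02, True),
           (0.05, False), (0.30, True), (0.07, False)]
for name, row in tabulate(example).items():
    print(name, row)
```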
Figure 3. AUROC of the ML model for the prediction of LNM in the validation set (total number = 4428, number with initial ER = 1016).
Risk stratification of LNM by the ML model and the actual rate in the validation set. (A) Total patients. (B) Patients with initial ER.
(A) Total patients

| Model | Risk probability | Risk category | Patients (n) | LNM (n) | Rate (%) |
|---|---|---|---|---|---|
| Logistic regression | <1% | Very low | 801 | 1 | 0.1 |
| Logistic regression | ≥1% to <3% | Low | 1335 | 21 | 1.6 |
| Logistic regression | ≥3% to <7% | Intermediate | 708 | 34 | 4.8 |
| Logistic regression | ≥7% | High | 1584 | 281 | 17.7 |
| Random forest | <1% | Very low | 2403 | 30 | 1.3 |
| Random forest | ≥1% to <3% | Low | 793 | 50 | 6.3 |
| Random forest | ≥3% to <7% | Intermediate | 176 | 13 | 7.4 |
| Random forest | ≥7% | High | 1056 | 244 | 23.1 |
| Support vector machine | <1% | Very low | 978 | 1 | 0.1 |
| Support vector machine | ≥1% to <3% | Low | 1138 | 19 | 1.6 |
| Support vector machine | ≥3% to <7% | Intermediate | 678 | 30 | 4.2 |
| Support vector machine | ≥7% | High | 1297 | 287 | 18.1 |

(B) Patients with initial ER

| Model | Risk probability | Risk category | Patients (n) | LNM (n) | Rate (%) |
|---|---|---|---|---|---|
| Logistic regression | <1% | Very low | 656 | 1 | 0.2 |
| Logistic regression | ≥1% to <3% | Low | 160 | 4 | 2.5 |
| Logistic regression | ≥3% to <7% | Intermediate | 40 | 0 | 0 |
| Logistic regression | ≥7% | High | 160 | 19 | 11.9 |
| Random forest | <1% | Very low | 756 | 3 | 0.4 |
| Random forest | ≥1% to <3% | Low | 140 | 7 | 5.0 |
| Random forest | ≥3% to <7% | Intermediate | 20 | 2 | 10.0 |
| Random forest | ≥7% | High | 100 | 12 | 12.0 |
| Support vector machine | <1% | Very low | 655 | 1 | 0.2 |
| Support vector machine | ≥1% to <3% | Low | 59 | 1 | 1.7 |
| Support vector machine | ≥3% to <7% | Intermediate | 191 | 9 | 4.5 |
| Support vector machine | ≥7% | High | 87 | 13 | 13.0 |
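One way to read "comparable with actual LNM rates" is to check whether the observed rate in each category falls inside the predicted risk band. The sketch below does this for the logistic regression rows of panel (A), total patients, in the validation set. The names `BANDS`, `LR_VALIDATION`, `observed_rate`, and `within_band` are hypothetical, and this check is an illustration rather than the authors' stated procedure.

```python
# Predicted risk bands in percent: lower bound inclusive, upper bound exclusive.
BANDS = {
    "Very low": (0.0, 1.0),
    "Low": (1.0, 3.0),
    "Intermediate": (3.0, 7.0),
    "High": (7.0, float("inf")),
}

# (category, patients, LNM cases) copied from the logistic regression rows
# of panel (A), total patients, validation set.
LR_VALIDATION = [
    ("Very low", 801, 1),
    ("Low", 1335, 21),
    ("Intermediate", 708, 34),
    ("High", 1584, 281),
]


def observed_rate(patients, lnm_cases):
    """Observed LNM rate in percent."""
    return 100.0 * lnm_cases / patients


def within_band(category, patients, lnm_cases):
    """True if the observed rate lies inside the predicted band."""
    low, high = BANDS[category]
    return low <= observed_rate(patients, lnm_cases) < high


for category, patients, lnm_cases in LR_VALIDATION:
    rate = observed_rate(patients, lnm_cases)
    print(f"{category}: {rate:.1f}% in band: "
          f"{within_band(category, patients, lnm_cases)}")
```

The row totals also serve as a consistency check: the four logistic regression rows add up to the 4428 validation patients and 337 LNM cases given in the abstract.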
Figure 4. Identification of patients with negligible risk of lymph node metastasis at the high-sensitivity cutoff in the validation set.
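The cutoff used in Figure 4 is not stated in this record. As a rough illustration only, the sketch below assumes that the low-risk group is the "<1%" band of the logistic regression model in the validation set (801 patients, 1 with LNM, out of 4428 patients and 337 LNM cases) and derives sensitivity, specificity, and negative predictive value from those counts. The function name `cutoff_metrics` and the choice of cutoff are assumptions.

```python
def cutoff_metrics(total, total_lnm, below_n, below_lnm):
    """Diagnostic metrics when patients below a cutoff are called negative.

    total, total_lnm: all patients and all LNM cases in the cohort.
    below_n, below_lnm: patients and LNM cases falling below the cutoff.
    """
    false_neg = below_lnm
    true_neg = below_n - below_lnm
    true_pos = total_lnm - below_lnm
    false_pos = (total - total_lnm) - true_neg
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Assumed cutoff: the <1% band of the logistic regression model,
# total patients, validation set.
metrics = cutoff_metrics(total=4428, total_lnm=337, below_n=801, below_lnm=1)
for name, value in metrics.items():
    print(f"{name}: {100.0 * value:.1f}%")
```

With this assumed cutoff, only one LNM case falls in the low-risk group, so sensitivity and negative predictive value are both high while specificity is low, which is the usual trade-off for a cutoff chosen to avoid missing metastasis.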