Xin Wei, Xue-Jiao Yan, Yu-Yan Guo, Jie Zhang, Guo-Rong Wang, Arsalan Fayyaz, Jiao Yu.
Abstract
BACKGROUND: The most important consideration in determining treatment strategies for undifferentiated early gastric cancer (UEGC) is the risk of lymph node metastasis (LNM). Identifying a biomarker that predicts LNM would therefore help guide treatment decisions. AIM: To develop a machine learning (ML)-based integrated procedure for constructing an LNM prediction model from gray-level co-occurrence matrix (GLCM) features.
Keywords: Feature selection; Gray-level co-occurrence matrix; Lymph node metastasis; Machine learning; Prediction; Undifferentiated early gastric cancer
Year: 2022 PMID: 36185632 PMCID: PMC9521518 DOI: 10.3748/wjg.v28.i36.5338
Source DB: PubMed Journal: World J Gastroenterol ISSN: 1007-9327 Impact factor: 5.374
Figure 1 Flowchart of patient selection and data processing. UEGC: Undifferentiated early gastric cancer; EMR: Endoscopic mucosal resection; ESD: Endoscopic submucosal dissection; RFC: Random forest classifier; SVM: Support vector machine; DT: Decision tree; ANN: Artificial neural network; XGboost: Extreme gradient boosting; ROC: Receiver operating characteristic; DCA: Decision curve analysis; CIC: Clinical impact curve; LNM: Lymph node metastasis.
Baseline patient characteristics and image indices
| Variable | Training set, overall (n = 368) | Training set, LNM (n = 62) | Training set, non-LNM (n = 306) | P value | Test set, overall (n = 158) | Test set, LNM (n = 29) | Test set, non-LNM (n = 129) | P value |
|---|---|---|---|---|---|---|---|---|
| Age (median, IQR), yr | 51.00 (40.75, 61.00) | 52.50 (40.25, 64.50) | 51.00 (41.00, 60.00) | 0.185 | 47.00 (37.25, 58.75) | 53.00 (39.00, 63.00) | 46.00 (37.00, 57.00) | 0.215 |
| Sex (%) | | | | | | | | |
| Male | 266 (72.3) | 48 (77.4) | 218 (71.2) | 0.404 | 123 (77.8) | 21 (72.4) | 102 (79.1) | 0.594 |
| Female | 102 (27.7) | 14 (22.6) | 88 (28.8) | | 35 (22.2) | 8 (27.6) | 27 (20.9) | |
| Site (%) | | | | | | | | |
| Proximal 1/3 | 81 (22.0) | 10 (16.1) | 71 (23.2) | 0.384 | 37 (23.4) | 9 (31.0) | 28 (21.7) | 0.138 |
| Middle 1/3 | 73 (19.8) | 15 (24.2) | 58 (19.0) | | 23 (14.6) | 1 (3.4) | 22 (17.1) | |
| Distal 1/3 | 214 (58.2) | 37 (59.7) | 177 (57.8) | | 98 (62.0) | 19 (65.5) | 79 (61.2) | |
| Ulcer (%) | | | | | | | | |
| Yes | 103 (28.0) | 21 (33.9) | 82 (26.8) | 0.329 | 45 (28.5) | 11 (37.9) | 34 (26.4) | 0.308 |
| No | 265 (72.0) | 41 (66.1) | 224 (73.2) | | 113 (71.5) | 18 (62.1) | 95 (73.6) | |
| Gross type (%) | | | | | | | | |
| Uplift | 98 (26.6) | 14 (22.6) | 84 (27.5) | 0.587 | 56 (35.4) | 8 (27.6) | 48 (37.2) | 0.227 |
| Flat | 72 (19.6) | 11 (17.7) | 61 (19.9) | | 27 (17.1) | 8 (27.6) | 19 (14.7) | |
| Sunken | 198 (53.8) | 37 (59.7) | 161 (52.6) | | 75 (47.5) | 13 (44.8) | 62 (48.1) | |
| Tumor size (%) | | | | | | | | |
| ≤ 2 cm | 296 (80.4) | 18 (29.0) | 278 (90.8) | < 0.001 | 120 (75.9) | 6 (20.7) | 114 (88.4) | < 0.001 |
| > 2 cm | 72 (19.6) | 44 (71.0) | 28 (9.2) | | 38 (24.1) | 23 (79.3) | 15 (11.6) | |
| Infiltration depth (%) | | | | | | | | |
| Mucosal layer | 267 (72.6) | 12 (19.4) | 255 (83.3) | < 0.001 | 113 (71.5) | 9 (31.0) | 104 (80.6) | < 0.001 |
| Submucosa | 101 (27.4) | 50 (80.6) | 51 (16.7) | | 45 (28.5) | 20 (69.0) | 25 (19.4) | |
| Vascular invasion (%) | | | | | | | | |
| Yes | 124 (33.7) | 42 (67.7) | 82 (26.8) | < 0.001 | 56 (35.4) | 21 (72.4) | 35 (27.1) | < 0.001 |
| No | 244 (66.3) | 20 (32.3) | 224 (73.2) | | 102 (64.6) | 8 (27.6) | 94 (72.9) | |
| VTT (%) | | | | | | | | |
| Yes | 124 (33.7) | 45 (72.6) | 79 (25.8) | < 0.001 | 44 (27.8) | 23 (79.3) | 21 (16.3) | < 0.001 |
| No | 244 (66.3) | 17 (27.4) | 227 (74.2) | | 114 (72.2) | 6 (20.7) | 108 (83.7) | |
| TF (median, IQR) | 3.78 (3.56, 3.99) | 4.13 (3.97, 4.27) | 3.70 (3.51, 3.90) | < 0.001 | 3.79 (3.52, 4.01) | 4.16 (4.00, 4.31) | 3.70 (3.49, 3.93) | < 0.001 |
| EV (median, IQR) | 0.88 (0.64, 1.12) | 0.60 (0.49, 0.68) | 0.98 (0.72, 1.20) | < 0.001 | 0.85 (0.65, 1.09) | 0.64 (0.54, 0.70) | 0.92 (0.72, 1.16) | < 0.001 |
| Entropy (median, IQR) | 8.68 (8.37, 8.98) | 10.51 (10.07, 10.88) | 8.57 (8.33, 8.83) | < 0.001 | 8.79 (8.43, 9.02) | 10.44 (10.16, 10.96) | 8.65 (8.38, 8.89) | < 0.001 |
| IG_all (median, IQR) | 2.16 (1.76, 2.47) | 3.04 (2.64, 3.62) | 2.03 (1.69, 2.30) | < 0.001 | 2.12 (1.75, 2.47) | 2.94 (2.70, 3.54) | 1.97 (1.64, 2.31) | < 0.001 |
| IG_0 (median, IQR) | 2.26 (1.75, 2.66) | 3.34 (2.72, 3.80) | 2.10 (1.69, 2.48) | < 0.001 | 2.41 (1.90, 2.79) | 3.70 (3.16, 4.18) | 2.22 (1.77, 2.62) | < 0.001 |
| IG_45 (median, IQR) | 1.88 (1.54, 2.18) | 2.85 (2.32, 3.26) | 1.78 (1.47, 2.04) | < 0.001 | 1.85 (1.48, 2.18) | 2.73 (2.31, 3.11) | 1.74 (1.40, 2.03) | < 0.001 |
| IG_90 (median, IQR) | 2.34 (1.85, 2.85) | 3.36 (2.89, 3.84) | 2.20 (1.75, 2.63) | < 0.001 | 2.42 (1.94, 2.78) | 3.27 (3.03, 3.61) | 2.25 (1.75, 2.61) | < 0.001 |
| IV_all (median, IQR) | 176.90 (148.98, 207.25) | 134.80 (109.30, 163.02) | 182.00 (156.00, 210.75) | < 0.001 | 175.50 (143.25, 200.75) | 133.50 (105.80, 155.70) | 183.00 (154.00, 206.00) | < 0.001 |
| IV_all_SD (median, IQR) | 4584.00 (3148.00, 6602.50) | 2166.50 (1340.50, 3535.00) | 5025.00 (3747.00, 7011.75) | < 0.001 | 4940.50 (2987.25, 6682.00) | 2849.00 (1841.00, 3428.00) | 5618.00 (3813.00, 6897.00) | < 0.001 |
| IV_0 (median, IQR) | 149.85 (122.75, 186.75) | 96.40 (78.95, 125.82) | 163.20 (134.00, 195.65) | < 0.001 | 146.70 (112.78, 185.57) | 74.10 (65.60, 90.60) | 158.40 (131.40, 196.20) | < 0.001 |
| IV_45 (median, IQR) | 239.55 (201.40, 284.75) | 164.40 (123.83, 188.62) | 254.30 (220.67, 290.60) | < 0.001 | 226.25 (201.25, 266.67) | 157.40 (133.90, 193.80) | 243.90 (214.30, 273.50) | < 0.001 |
| IV_90 (median, IQR) | 129.00 (103.00, 154.00) | 101.00 (77.75, 119.00) | 134.00 (109.25, 159.00) | < 0.001 | 124.50 (109.00, 150.75) | 105.00 (77.00, 118.00) | 133.00 (117.00, 156.00) | < 0.001 |
| Haralick_all (median, IQR) | 0.10 (0.09, 0.10) | 0.12 (0.11, 0.13) | 0.09 (0.09, 0.10) | < 0.001 | 0.10 (0.09, 0.10) | 0.12 (0.12, 0.14) | 0.09 (0.09, 0.10) | < 0.001 |
| Haralick_30 (median, IQR) | 0.10 (0.09, 0.11) | 0.14 (0.12, 0.15) | 0.10 (0.09, 0.11) | < 0.001 | 0.10 (0.09, 0.11) | 0.14 (0.13, 0.15) | 0.10 (0.09, 0.11) | < 0.001 |
| Haralick_45 (median, IQR) | 0.09 (0.08, 0.10) | 0.11 (0.10, 0.12) | 0.09 (0.08, 0.10) | < 0.001 | 0.09 (0.08, 0.10) | 0.11 (0.10, 0.13) | 0.09 (0.08, 0.10) | < 0.001 |
| Haralick_90 (median, IQR) | 0.11 (0.10, 0.13) | 0.14 (0.12, 0.16) | 0.11 (0.09, 0.12) | < 0.001 | 0.12 (0.10, 0.13) | 0.15 (0.12, 0.16) | 0.11 (0.09, 0.13) | < 0.001 |
| CSV (median, IQR) | 106.00 (102.00, 111.00) | 108.00 (105.00, 111.00) | 106.00 (101.00, 111.00) | 0.001 | 107.00 (102.25, 111.00) | 109.00 (105.00, 113.00) | 107.00 (102.00, 111.00) | 0.007 |
| CP (median, IQR) | 65.50 (60.00, 70.00) | 68.00 (66.00, 71.00) | 64.00 (59.00, 70.00) | < 0.001 | 64.00 (60.00, 68.00) | 67.00 (64.00, 68.00) | 63.00 (59.00, 68.00) | 0.002 |
IQR: Interquartile range; TF: Total frequency; EV: Energy value; IV_0: Inertia value 0°; IV_45: Inertia value 45°; IV_90: Inertia value 90°; IG_0: Inverse gap 0°; IG_45: Inverse gap 45°; IG_90: Inverse gap 90°; IG_all: Inverse gap full angle; Haralick_30: Haralick 30°; Haralick_45: Haralick 45°; Haralick_90: Haralick 90°; Haralick_all: Haralick full angle; CSV: Cluster shadow value; CP: Cluster prominence.
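The texture indices abbreviated above are derived from a gray-level co-occurrence matrix. The record does not give the exact formulas or scaling used, so the following is only a minimal sketch under the assumption that energy (EV), entropy, inertia (IV) and inverse gap (IG) follow the standard Haralick definitions of energy, entropy, contrast and inverse difference moment. The function names `glcm` and `glcm_features` and the toy image are hypothetical, and the resulting values are not on the same scale as those in the table.

```python
import math

def glcm(image, dx, dy, levels):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pixel pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Standard Haralick-style statistics (assumed definitions, see lead-in)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "inertia": sum((i - j) ** 2 * v for i, j, v in cells),
        "inverse_gap": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }

# Toy 4x4 image with 4 gray levels (illustrative values only, not study data).
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p0 = glcm(toy, dx=1, dy=0, levels=4)    # 0 degrees: right-hand neighbor
p90 = glcm(toy, dx=0, dy=-1, levels=4)  # 90 degrees: neighbor above
features_0 = glcm_features(p0)
features_90 = glcm_features(p90)
```

Computing the same statistics for several offsets, as done here for 0° and 90°, is one plausible way to obtain direction-specific indices such as IV_0 and IV_90; a "full angle" value could then be an average over directions, although the source does not say how it was obtained.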
Figure 2 Variable screening and weight allocation. A: Correlation matrix analysis of candidate features; B: Weight distribution of candidate variables for each ML-based model. RFC: Random forest classifier; SVM: Support vector machine; DT: Decision tree; ANN: Artificial neural network; XGboost: Extreme gradient boosting.
Figure 3 Visualization of prediction models based on machine learning algorithms. A: Random forest classifier model; B: Decision tree model. Candidate factors associated with lymph node metastasis are identified by the random forest classifier algorithm, and prediction nodes and weights are assigned by the decision tree algorithm.
Figure 4 Visualization of prediction models based on the artificial neural network algorithm. A: Artificial neural network model; B: Importance of variables using connection weights. Candidate factors associated with lymph node metastasis are ordered by the artificial neural network (ANN) algorithm, and prediction nodes and weights are assigned by the ANN algorithm. IV_0: Inertia value 0°; IV_45: Inertia value 45°; IG_0: Inverse gap 0°; IG_45: Inverse gap 45°; IG_all: Inverse gap full angle; Haralick_30: Haralick 30°; Haralick_all: Haralick full angle.
Receiver operating characteristic curve analysis of lymph node metastasis in each ML-based model
| Model | AUC (training set) | 95%CI | AUC (test set) | 95%CI | Variables |
|---|---|---|---|---|---|
| RFC | 0.925 | 0.378-1.472 | 0.912 | 0.355-1.469 | Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, IV_45 |
| ANN | 0.887 | 0.340-1.434 | 0.837 | 0.280-1.394 | Entropy, IG_all, IG_0, IG_45, IG_90, IV_all, IV_all_SD, IV_0, IV_45, Haralick_all, Haralick_30 |
| DT | 0.856 | 0.309-1.403 | 0.813 | 0.256-1.370 | Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, IV_45 |
| XGboost | 0.814 | 0.267-1.361 | 0.807 | 0.250-1.364 | Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, IV_45, IG_90 |
| SVM | 0.805 | 0.258-1.352 | 0.794 | 0.237-1.351 | Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, IV_45 |
| GLM | 0.796 | 0.229-1.362 | 0.799 | 0.233-1.365 | Entropy, Haralick_all, Haralick_30, IG_all, IG_45, IG_0, IV_45 |
| Radiologist | 0.789 | 0.242-1.336 | 0.801 | 0.254-1.348 | - |
Variables are included in the model.
RFC: Random forest classifier; SVM: Support vector machine; DT: Decision tree; ANN: Artificial neural network; XGboost: Extreme gradient boosting; GLM: Generalized linear model; AUC: Area under the receiver operating characteristic curve; 95%CI: 95% confidence interval; IV_0: Inertia value 0°; IV_45: Inertia value 45°; IV_90: Inertia value 90°; IG_0: Inverse gap 0°; IG_45: Inverse gap 45°; IG_90: Inverse gap 90°; IG_all: Inverse gap full angle; Haralick_30: Haralick 30°.
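The record does not state how the AUC values or their confidence intervals were calculated. As an illustration only, the sketch below computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and estimates a 95%CI with a percentile bootstrap. The labels and scores are hypothetical, as are the function names `auc` and `bootstrap_ci`; the bootstrap is a swapped-in technique, not necessarily the one used in the study.

```python
import random

def auc(labels, scores):
    """AUC as P(positive score > negative score), with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling cases with replacement)."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # a resample needs both classes
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical outcomes (1 = LNM) and model scores, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.2, 0.1]
point_estimate = auc(labels, scores)
ci_low, ci_high = bootstrap_ci(labels, scores)
```

Because every bootstrap replicate is itself an AUC, an interval built this way stays within 0 and 1 by construction.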
Figure 5 Predictive performance of candidate models based on machine learning algorithms. A: Decision curve analysis (DCA) for five ML-based models in training sets; B: DCA for five ML-based models in test sets. RFC: Random forest classifier; SVM: Support vector machine; DT: Decision tree; ANN: Artificial neural network; XGboost: Extreme gradient boosting.
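Decision curve analysis compares models by their net benefit across threshold probabilities. The following is a minimal sketch of the standard net benefit formula, true positives per patient minus false positives per patient weighted by the odds of the threshold, together with the "treat all" reference strategy. The data and the function names `net_benefit` and `net_benefit_treat_all` are hypothetical and are not taken from the study.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Net benefit of treating every patient regardless of predicted risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = LNM) and predicted risks, for illustration only.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.2, 0.1]

# One point per threshold gives the decision curve for the model and for "treat all".
thresholds = [0.1, 0.2, 0.3, 0.4, 0.5]
curve_model = [net_benefit(labels, probs, t) for t in thresholds]
curve_all = [net_benefit_treat_all(labels, t) for t in thresholds]
```

A model is generally considered useful at a given threshold when its net benefit exceeds both the "treat all" curve and zero, which is the net benefit of treating no one.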