Han Chen, Xiaoying Zhou, Xinyu Tang, Shuo Li, Guoxin Zhang.
Abstract
Keywords: lymph node metastasis; machine learning; neural network; superficial esophageal squamous cell carcinoma
Year: 2020 PMID: 33273861 PMCID: PMC7707435 DOI: 10.2147/CMAR.S270316
Source DB: PubMed Journal: Cancer Manag Res ISSN: 1179-1322 Impact factor: 3.989
Characteristics of Patients According to Lymph Node Status
| Variables | LNM(+) % (n) | LNM(−) % (n) | Univariate OR | Univariate 95% CI | Univariate P-value | Multivariate B | Multivariate OR | Multivariate 95% CI | Multivariate P-value |
|---|---|---|---|---|---|---|---|---|---|
| Sex | | | | | 0.413 | – | – | – | – |
| Male | 18.9% (97) | 81.1% (416) | 1.192 | 0.783–1.814 | | | | | |
| Female | 16.4% (36) | 83.6% (184) | Ref | | | | | | |
| Age | | | | | 0.933 | – | – | – | – |
| ≥60 | 18.2% (79) | 81.8% (354) | 1.017 | 0.694–1.490 | | | | | |
| <60 | 18.0% (54) | 82.0% (246) | Ref | | | | | | |
| Tumor Location | | | | | 0.196 | – | – | – | – |
| Upper | 20.8% (20) | 79.2% (76) | Ref | | | | | | |
| Middle | 15.6% (57) | 84.4% (309) | 0.701 | 0.397–1.237 | | | | | |
| Lower | 20.7% (56) | 79.3% (215) | 0.990 | 0.558–1.757 | | | | | |
| Alcohol | | | | | 0.008* | | | | 0.019* |
| Yes | 24.4% (48) | 75.6% (149) | 1.709 | 1.146–2.548 | | 0.594 | 1.812 | 1.105–2.971 | |
| No | 15.9% (85) | 84.1% (451) | Ref | | | | | | |
| Smoking | | | | | 0.870 | – | – | – | – |
| Yes | 18.4% (56) | 81.6% (248) | 1.032 | 0.705–1.510 | | | | | |
| No | 17.9% (77) | 82.1% (352) | Ref | | | | | | |
| Family History of Tumor | | | | | 0.105 | – | – | – | – |
| Yes | 13.0% (16) | 87.0% (107) | 0.630 | 0.359–1.106 | | | | | |
| No | 19.2% (117) | 80.8% (493) | Ref | | | | | | |
| Tumor Size | | | | | <0.001* | | | | <0.001* |
| ≥1.85 cm | 30.0% (98) | 70.0% (229) | 4.536 | 2.982–6.901 | | 1.327 | 3.768 | 2.302–6.166 | |
| <1.85 cm | 8.6% (35) | 91.4% (371) | Ref | | | | | | |
| Histologic grade | | | | | <0.001* | | | | <0.001* |
| Well and Moderately | 11.0% (56) | 89.0% (451) | Ref | | | | | | |
| Poorly | 34.1% (77) | 65.9% (149) | 4.162 | 2.815–6.152 | | 0.956 | 2.600 | 1.633–4.141 | |
| Invasion Depth | | | – | – | <0.001* | | | | <0.001* |
| EP/LPM | 0% (0) | 100% (128) | Ref | | | | | | |
| MM | 5.4% (6) | 94.7% (108) | 1.056 | 1.011–1.102 | | | | | |
| SM1 | 15.7% (60) | 84.3% (322) | 1.186 | 1.136–1.239 | | 2.061 | 7.856 | 3.358–18.381 | |
| SM2 or deeper | 61.5% (67) | 38.5% (42) | 2.595 | 2.047–3.290 | | | | | |
| LV Invasion | | | | | <0.001* | | | | <0.001* |
| Positive | 86.1% (31) | 13.9% (5) | 36.17 | 13.74–95.18 | | 3.216 | 24.938 | 8.349–74.495 | |
| Negative | 14.6% (102) | 85.4% (595) | Ref | | | | | | |
| Pathological type | | | | | 0.129 | – | – | – | – |
| Protruding | 21.7% (43) | 78.3% (155) | Ref | | | | | | |
| Superficial type | 16.4% (37) | 83.6% (189) | 0.706 | 0.433–1.150 | | | | | |
| Ulcerative and localized | 19.6% (40) | 80.4% (164) | 0.879 | 0.542–1.426 | | | | | |
| Infiltrative | 16.0% (12) | 84.0% (63) | 0.687 | 0.340–1.388 | | | | | |
| Diffusely infiltrative | 3.3% (1) | 96.7% (29) | 0.124 | 0.016–1.039 | | | | | |
| CT Results | | | | | <0.001* | | | | <0.001* |
| Positive | 27.5% (69) | 72.5% (182) | 2.476 | 1.690–3.628 | | 1.107 | 3.026 | 1.901–4.818 | |
| Negative | 13.3% (64) | 86.7% (418) | Ref | | | | | | |
Note: *Statistically significant with a p-value less than 0.05.
Abbreviations: LNM, lymph node metastasis; LV, lymphovascular; EP, epithelium; LPM, lamina propria mucosa; MM, muscularis mucosa; SM, submucosal; Ref, reference.
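The univariate odds ratios for the binary rows can be checked directly from the tabulated counts. The sketch below assumes a plain 2×2 odds ratio with a Wald (log-scale) 95% interval; the source does not state which method was used, and the function name `odds_ratio_wald` is illustrative. It also shows the usual relation between a logistic regression coefficient B and its odds ratio, OR = exp(B).

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: LNM(+) and LNM(-) counts in the exposed group
    c, d: LNM(+) and LNM(-) counts in the reference group
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR), assumed Wald/Woolf method
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Alcohol row: Yes = 48 LNM(+), 149 LNM(-); No (reference) = 85 LNM(+), 451 LNM(-)
or_, lo, hi = odds_ratio_wald(48, 149, 85, 451)
print(f"OR = {or_:.3f}, 95% CI {lo:.3f}-{hi:.3f}")

# Multivariate column: the odds ratio is exp(B); B = 0.594 for alcohol
or_from_b = math.exp(0.594)
print(f"exp(B) = {or_from_b:.3f}")
```

Under these assumptions the alcohol row comes out close to the tabulated 1.709 (1.146–2.548), and exp(0.594) agrees with the tabulated multivariate OR of 1.812 up to rounding of B.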
Figure 1 Selection of candidate predictors by LASSO regression. (A) Identification of the tuning parameter (λ) by 10-fold cross-validation on the basis of the minimum criteria. Binomial deviance was plotted as a function of log(λ) from the cross-validation procedure. The y-axis represents the binomial deviance, and the lower x-axis represents log(λ). The numbers on the upper x-axis indicate the number of selected candidate predictors at each λ value. The red dots show the average deviance of each model at a given λ value, and the vertical bars through the red dots show the upper and lower values of the deviance. The black dotted lines mark the optimal λ values according to the minimum criteria and the 1 standard error of the minimum criteria (the 1-SE criteria). The optimal λ value of 0.023, with log(λ) = −3.63, was finally selected. (B) LASSO coefficient profiles of the twelve candidate predictors. The black dotted vertical line is drawn at the value selected using 10-fold cross-validation in Figure 1A. The optimal λ value yielded six candidate features with nonzero coefficients.
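The two selection rules named in the caption (minimum criteria and 1-SE criteria) can be sketched as follows. The cross-validation numbers are hypothetical and only illustrate the rule; they are not taken from the paper, and `select_lambda` is an illustrative name.

```python
def select_lambda(lambdas, mean_dev, se_dev):
    """Return (lambda_min, lambda_1se) from cross-validation summaries."""
    # Minimum criteria: lambda with the lowest mean cross-validated deviance
    i_min = min(range(len(lambdas)), key=lambda i: mean_dev[i])
    threshold = mean_dev[i_min] + se_dev[i_min]
    # 1-SE criteria: largest lambda whose mean deviance is within
    # one standard error of the minimum
    lam_1se = max(lam for lam, dev in zip(lambdas, mean_dev) if dev <= threshold)
    return lambdas[i_min], lam_1se

# Hypothetical CV summaries, for illustration only
lambdas = [0.2, 0.1, 0.05, 0.02, 0.01]
mean_dev = [0.90, 0.78, 0.72, 0.70, 0.71]
se_dev = [0.03, 0.03, 0.03, 0.03, 0.03]

lam_min, lam_1se = select_lambda(lambdas, mean_dev, se_dev)
print(lam_min, lam_1se)
```

The 1-SE value is never smaller than the minimum-criteria value, so it gives an equally or more heavily penalized model with no more predictors.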
Figure 2 Establishment of an Artificial Neural Network Model. A pattern recognition ANN model was generated. This non-parametric model consists of 3 layers. The input layer consists of six parameters: tumor size, invasion depth, a past habit of alcohol use, histologic grade, lymphovascular invasion, and preoperative CT results. The hidden layer consists of 20 neurons. The output layer has two nodes, representing positive LN metastasis and negative LN metastasis, respectively.
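A forward pass through a network of this shape (6 inputs, 20 hidden neurons, 2 outputs) might look like the sketch below. The tanh and softmax activations, the random untrained weights, and the input encoding are all assumptions made for illustration; the source does not specify them.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 6, 20, 2

def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(x, weights, biases):
    """Affine map: one weighted sum plus bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    """Convert raw outputs to probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, hidden, output):
    h = [math.tanh(v) for v in dense(x, *hidden)]  # assumed hidden activation
    return softmax(dense(h, *output))              # assumed output activation

rng = random.Random(0)
hidden = init_layer(N_INPUT, N_HIDDEN, rng)
output = init_layer(N_HIDDEN, N_OUTPUT, rng)

# Hypothetical encoded patient: size, depth, alcohol, grade, LV invasion, CT
x = [1, 2, 0, 1, 0, 1]
p_pos, p_neg = forward(x, hidden, output)
print(p_pos, p_neg)
```

With untrained weights the two probabilities carry no clinical meaning; the point is only the 6-20-2 data flow and the paired positive/negative outputs.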
Figure 3 ROC curves of the established models. The blue curve and the green dotted curve represent the development data of the ANN model and the LR model, respectively. Panels A–D show the C-index for the training group, the validation group, the testing group, and the whole group, respectively.
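For a binary outcome the C-index equals the area under the ROC curve: the fraction of (positive, negative) patient pairs in which the positive patient receives the higher predicted score, with ties counted as one half. A minimal sketch on made-up scores (not study data):

```python
def c_index(labels, scores):
    """C-index / AUC by direct pair counting; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))

# Toy example: 2 positives, 3 negatives, so 6 pairs in total
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = c_index(labels, scores)  # 5 of the 6 pairs are concordant
print(auc)
```

Pair counting is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; a rank-based formula gives the same value faster.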
The Classification of the Established Models
| Models | Predicted Results | Pathological Diagnosis + | Pathological Diagnosis – | Percent Correct |
|---|---|---|---|---|
| ANN | + | 74 | 11 | 87.06% |
| ANN | – | 57 | 591 | 91.20% |
| ANN | Percent Correct | 56.49% | 98.17% | 90.72% |
| LR | + | 109 | 22 | 83.21% |
| LR | – | 165 | 437 | 72.59% |
| LR | Percent Correct | 39.78% | 95.21% | 74.49% |
Abbreviations: ANN, artificial neural network; LR, logistic regression.
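The "Percent Correct" margins follow from the four counts in each 2×2 block: each row value is the diagonal count divided by its row total, each column value is the diagonal count divided by its column total, and the corner value is the overall accuracy. The sketch below reproduces them from the counts as printed; the function name `margins` is illustrative.

```python
def margins(table):
    """Row-wise, column-wise, and overall percent correct for a 2x2 table."""
    (a, b), (c, d) = table
    row_pct = [a / (a + b), d / (c + d)]   # last column of the table
    col_pct = [a / (a + c), d / (b + d)]   # "Percent Correct" row
    overall = (a + d) / (a + b + c + d)    # bottom-right corner
    return row_pct, col_pct, overall

ann = [[74, 11], [57, 591]]
lr = [[109, 22], [165, 437]]

for name, table in (("ANN", ann), ("LR", lr)):
    row_pct, col_pct, overall = margins(table)
    print(name,
          [f"{100 * v:.2f}%" for v in row_pct],
          [f"{100 * v:.2f}%" for v in col_pct],
          f"{100 * overall:.2f}%")
```

The row-wise values are the figures reported as sensitivity and specificity in the comparison of the two models, and the column-wise values are the figures reported as PPV and NPV.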
Comparison of ANN Model and LR Model for Predicting LN Metastasis
| Diagnostic Index | ANN Model (%, 95% CI) | LR Model (%, 95% CI) | P-value |
|---|---|---|---|
| Sensitivity | 87.06% | 83.21% | 0.764 |
| Specificity | 91.20% | 72.59% | 0.006* |
| PPV | 56.49% | 39.78% | 0.020* |
| NPV | 98.17% | 95.21% | 0.627 |
| Accuracy | 90.72% | 74.49% | <0.001* |
| AUC | 0.915 | 0.868 | <0.001* |
| NRI | −1.1%, z=−0.222 | | 0.824 |
| IDI | 23.3%, z=4.338 | | <0.001* |
Note: *Statistically significant with a p-value less than 0.05.
Abbreviations: ANN, artificial neural network; LR, logistic regression; PPV, positive predictive value; NPV, negative predictive value; NRI, net reclassification improvement; IDI, integrated discrimination improvement.
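NRI and IDI both compare the predicted probabilities of two models on the same patients. The sketch below uses the standard definitions with a single 0.5 cutoff for the categorical NRI; the source does not state which NRI variant or cutoff was used, so both are assumptions, and the probabilities are toy values rather than study data.

```python
def mean(values):
    return sum(values) / len(values)

def idi(y, p_old, p_new):
    """Integrated discrimination improvement of p_new over p_old."""
    ev_gain = mean([n - o for yi, o, n in zip(y, p_old, p_new) if yi == 1])
    ne_gain = mean([n - o for yi, o, n in zip(y, p_old, p_new) if yi == 0])
    # Improvement = rise in mean risk for events minus rise for non-events
    return ev_gain - ne_gain

def nri(y, p_old, p_new, cutoff=0.5):
    """Two-category net reclassification improvement (assumed cutoff)."""
    ev = ne = 0
    n_ev = sum(1 for yi in y if yi == 1)
    n_ne = len(y) - n_ev
    for yi, o, n in zip(y, p_old, p_new):
        move = int(n >= cutoff) - int(o >= cutoff)  # +1 up, -1 down, 0 same
        if yi == 1:
            ev += move      # events should move up
        else:
            ne -= move      # non-events should move down
    return ev / n_ev + ne / n_ne

# Toy data: 2 events, 2 non-events
y = [1, 1, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.2]
p_new = [0.8, 0.6, 0.3, 0.1]

idi_value = idi(y, p_old, p_new)
nri_value = nri(y, p_old, p_new)
print(idi_value, nri_value)
```

IDI uses the full predicted probabilities while the categorical NRI only registers patients who cross the cutoff, so the two measures can disagree, as in a result where IDI is significant and NRI is not.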
Summary of Model Performance Measures
| Aspect | Model Performance Measures | ANN | LR |
|---|---|---|---|
| Diagnostic Test | Accuracy | √ | |
| Diagnostic Test | Sensitivity | Comparable | Comparable |
| Diagnostic Test | Specificity | √ | |
| Diagnostic Test | PPV | √ | |
| Diagnostic Test | NPV | Comparable | Comparable |
| Discrimination | C-index | √ | |
| Reclassification | IDI | √ | |
| Reclassification | NRI | Comparable | Comparable |
Note: √ indicates the better-performing model.
Abbreviations: ANN, artificial neural network; LR, logistic regression; PPV, positive predictive value; NPV, negative predictive value; NRI, net reclassification improvement; IDI, integrated discrimination improvement.