Ji Hyun Ahn, Min Seob Kwak, Hun Hee Lee, Jae Myung Cha, Hyun Phil Shin, Jung Won Jeon, Jin Young Yoon.
Abstract
BACKGROUND: Identification of a simplified prediction model for lymph node metastasis (LNM) for patients with early colorectal cancer (CRC) is urgently needed to determine treatment and follow-up strategies. Therefore, in this study, we aimed to develop an accurate predictive model for LNM in early CRC.
Keywords: colorectal cancer; machine learning; metastasis; model; prediction
Year: 2021 PMID: 33842317 PMCID: PMC8029977 DOI: 10.3389/fonc.2021.614398
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. The workflow of the development process.
Baseline characteristics.
| Variables | LNM (-) (N = 24190) | LNM (+) (N = 2543) | P value |
|---|---|---|---|
| Age at diagnosis, n (%) | | | <0.001 |
| 0-9 | 0 (0.0) | 1 (0.0) | |
| 10-19 | 5 (0.0) | 1 (0.0) | |
| 20-29 | 73 (0.3) | 10 (0.4) | |
| 30-39 | 339 (1.4) | 61 (2.4) | |
| 40-49 | 1511 (6.3) | 241 (9.5) | |
| 50-59 | 5684 (23.5) | 730 (28.7) | |
| 60-69 | 6775 (28.0) | 683 (26.9) | |
| 70-79 | 5952 (24.6) | 544 (21.4) | |
| 80-89 | 3410 (14.1) | 245 (9.6) | |
| 90-99 | 441 (1.8) | 27 (1.1) | |
| Sex, n (%) | | | <0.001 |
| M | 12864 (53.2) | 1254 (49.3) | |
| F | 11326 (46.8) | 1289 (50.7) | |
| Primary site, n (%) | | | <0.001 |
| Cecum | 3355 (13.9) | 381 (15.0) | |
| Appendix | 119 (0.5) | 4 (0.2) | |
| Ascending colon | 3493 (14.4) | 300 (11.8) | |
| Hepatic flexure of colon | 665 (2.7) | 61 (2.4) | |
| Transverse colon | 1545 (6.4) | 119 (4.7) | |
| Splenic flexure of colon | 381 (1.6) | 39 (1.5) | |
| Descending colon | 1009 (4.2) | 97 (3.8) | |
| Sigmoid colon | 6193 (25.6) | 773 (30.4) | |
| Overlapping lesion of colon | 78 (0.3) | 5 (0.2) | |
| Colon, NOS | 111 (0.5) | 7 (0.3) | |
| Rectosigmoid junction | 1737 (7.2) | 268 (10.5) | |
| Rectum, NOS | 5504 (22.7) | 489 (19.2) | |
| Tumor grade, n (%) | | | <0.001 |
| Well differentiated | 5284 (21.8) | 288 (11.3) | |
| Moderately differentiated | 17173 (71.0) | 1853 (72.9) | |
| Poorly differentiated | 1538 (6.4) | 364 (14.3) | |
| Undifferentiated | 195 (0.8) | 38 (1.5) | |
| Race, n (%) | | | <0.001 |
| Hispanic | 2186 (9.1) | 228 (9.0) | |
| Non-Hispanic American Indian/Alaska Native | 129 (0.5) | 10 (0.4) | |
| Non-Hispanic Asian or Pacific Islander | 2099 (8.7) | 270 (10.6) | |
| Non-Hispanic Black | 2837 (11.7) | 354 (13.9) | |
| Non-Hispanic White | 16939 (70.0) | 1681 (66.1) | |
| Tumor type, n (%) | | | <0.001 |
| Carcinoma, NOS | 40 (0.2) | 6 (0.2) | |
| Carcinoma, undifferentiated, NOS | 1 (0.0) | 1 (0.0) | |
| Adenocarcinoma, NOS | 9657 (39.9) | 1148 (45.1) | |
| Adenocarcinoma, intestinal type | 2 (0.0) | 2 (0.1) | |
| Adenocarcinoma in adenomatous polyp | 5943 (24.6) | 513 (20.2) | |
| Tubular adenocarcinoma | 47 (0.2) | 2 (0.1) | |
| Adenocarcinoma with mixed subtypes | 20 (0.1) | 4 (0.2) | |
| Adenocarcinoma in villous adenoma | 1378 (5.7) | 130 (5.1) | |
| Villous adenocarcinoma | 27 (0.1) | 1 (0.0) | |
| Adenocarcinoma in tubulovillous adenoma | 6420 (26.5) | 614 (24.2) | |
| Cystadenocarcinoma, NOS | 1 (0.0) | 0 (0.0) | |
| Mucinous cystadenocarcinoma, NOS | 11 (0.1) | 0 (0.0) | |
| Mucinous adenocarcinoma | 487 (2.0) | 82 (3.2) | |
| Mucin-producing adenocarcinoma | 107 (0.4) | 20 (0.8) | |
| Signet ring cell carcinoma | 49 (0.2) | 20 (0.8) | |
| Tumor size, mm, mean (SD) | 20.6 (25.4) | 22.8 (20.9) | <0.001 |
SD, standard deviation.
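
The table reports P < 0.001 for each categorical variable, but this excerpt does not name the statistical test. As a plausibility check only, the sketch below assumes a Pearson chi-square test without continuity correction (an assumption, not something the source states) and applies it to the sex-by-LNM counts copied from the table. The function name `chi_square_2x2` is hypothetical.

```python
import math

# Sex-by-LNM counts copied from the baseline table.
# Rows: M, F. Columns: LNM (-), LNM (+).
observed = [
    [12864, 1254],
    [11326, 1289],
]


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    Assumption: no continuity correction; the source does not say
    which test produced the reported P values.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected

    # With 1 degree of freedom the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = chi_square_2x2(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

Under that assumption the p-value falls below 0.001, which is consistent with the value listed for sex in the table.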
Figure 2. Evaluation of the predictive models. (A) Average ROC curves of seven models. (B) Average PR curves, indicating the tradeoff between precision and recall.
Confusion matrices of developed models.
| Model | Actual | Predicted LNM (-) | Predicted LNM (+) |
|---|---|---|---|
| LR | LNM (-) | 1903 | 516 |
| | LNM (+) | 1240 | 696 |
| XGB | LNM (-) | 2163 | 256 |
| | LNM (+) | 1468 | 468 |
| kNN | LNM (-) | 1907 | 512 |
| | LNM (+) | 18 | 1918 |
| CART | LNM (-) | 1907 | 512 |
| | LNM (+) | 18 | 1918 |
| SVM | LNM (-) | 1898 | 521 |
| | LNM (+) | 1053 | 883 |
| NN | LNM (-) | 1995 | 424 |
| | LNM (+) | 304 | 1632 |
| RF | LNM (-) | 2248 | 171 |
| | LNM (+) | 5 | 1931 |
LR, logistic regression; XGB, XGBoost; kNN, k-nearest neighbor; CART, classification and regression trees model; SVM, support vector machine; NN, neural network; RF, random forest.
Performance of developed models.
| Models | AUC | Sensitivity | Specificity | Precision | NPV | FDR | Accuracy | AP | F1 Score | Matthews correlation coefficient |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.623 | 0.360 | 0.787 | 0.574 | 0.606 | 0.426 | 0.597 | 0.666 | 0.442 | 0.162 |
| XGB | 0.659 | 0.242 | 0.894 | 0.646 | 0.596 | 0.354 | 0.604 | 0.700 | 0.352 | 0.181 |
| kNN | 0.933 | 0.991 | 0.788 | 0.789 | 0.991 | 0.211 | 0.878 | 0.966 | 0.879 | 0.780 |
| CART | 0.944 | 0.991 | 0.788 | 0.789 | 0.991 | 0.211 | 0.878 | 0.972 | 0.879 | 0.780 |
| SVM | 0.682 | 0.456 | 0.785 | 0.629 | 0.643 | 0.371 | 0.639 | 0.717 | 0.529 | 0.256 |
| NN | 0.910 | 0.843 | 0.825 | 0.794 | 0.868 | 0.206 | 0.833 | 0.841 | 0.818 | 0.665 |
| RF | 0.991 | 0.997 | 0.929 | 0.919 | 0.998 | 0.081 | 0.960 | 0.995 | 0.956 | 0.922 |
AUC, area under curve; NPV, negative predictive value; FDR, false discovery rate; AP, average precision; LR, logistic regression; XGB, XGBoost; kNN, k-nearest neighbor; CART, classification and regression trees model; SVM, support vector machine; NN, neural network; RF, random forest.
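
Most columns of the performance table can be recomputed from the confusion matrices, taking LNM (+) as the positive class, rows as actual labels, and columns as predictions. The sketch below uses the standard textbook definitions; the helper name `derive_metrics` is hypothetical, and AUC and AP are omitted because they depend on the full score distribution rather than on a single confusion matrix.

```python
import math

# Counts copied from the confusion-matrix table, as (TN, FP, FN, TP).
# Rows are actual classes, columns are predicted classes.
CONFUSION = {
    "LR":   (1903, 516, 1240, 696),
    "XGB":  (2163, 256, 1468, 468),
    "kNN":  (1907, 512, 18, 1918),
    "CART": (1907, 512, 18, 1918),
    "SVM":  (1898, 521, 1053, 883),
    "NN":   (1995, 424, 304, 1632),
    "RF":   (2248, 171, 5, 1931),
}


def derive_metrics(tn, fp, fn, tp):
    """Threshold-based metrics from one binary confusion matrix."""
    sensitivity = tp / (tp + fn)          # recall for LNM (+)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # positive predictive value
    npv = tn / (tn + fn)
    fdr = fp / (tp + fp)                  # equals 1 - precision
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "fdr": fdr,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


RESULTS = {name: derive_metrics(*counts) for name, counts in CONFUSION.items()}

for name, m in RESULTS.items():
    print(name, {key: round(value, 3) for key, value in m.items()})
```

Recomputed this way, the values agree with the reported table to within rounding (about 0.001), which supports reading the confusion matrices with actual classes in rows and predictions in columns.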
Figure 3. Factor importance of the developed models. (A–F) Bar graphs showing the proportion of importance of the different predictors in the models.