Tenghui Han, Jun Zhu, Xiaoping Chen, Rujie Chen, Yu Jiang, Shuai Wang, Dong Xu, Gang Shen, Jianyong Zheng, Chunsheng Xu.
Abstract
BACKGROUND: The liver is the most common metastatic site of colorectal cancer (CRC), and liver metastasis (LM) determines subsequent treatment and prognosis, especially in T1 patients. T1 CRC patients with LM are recommended to receive surgery and systemic treatment rather than endoscopic therapy alone. Nevertheless, there is still no effective model to predict the risk of LM in T1 CRC patients. Hence, we aim to construct an accurate predictive model and an easy-to-use clinical tool.
Keywords: Artificial intelligence; Liver metastasis; Machine learning; Real-world research; T1 colorectal cancer
Year: 2022 PMID: 35033083 PMCID: PMC8761313 DOI: 10.1186/s12935-021-02424-7
Source DB: PubMed Journal: Cancer Cell Int ISSN: 1475-2867 Impact factor: 5.722
Fig. 1 The workflow of the selection procedure for colorectal cancer patients
Clinical baseline features of SEER and Xijing hospital database
| Variables | SEER database: training set | SEER database: testing set | Xijing CRC cohort: outer validation set |
|---|---|---|---|
| Age at diagnosis, n (%) | |||
| 0–9 | 14 (0.1) | 0 (0) | 0 (0) |
| 10–19 | 128 (1.0) | 31 (0.9) | 0 (0) |
| 20–29 | 265 (2.0) | 67 (2.0) | 4 (1.2) |
| 30–39 | 390 (2.9) | 79 (2.4) | 5 (1.5) |
| 40–49 | 1084 (8.1) | 251 (7.5) | 35 (10.7) |
| 50–59 | 3632 (27.0) | 945 (28.2) | 104 (31.9) |
| 60–69 | 3649 (27.2) | 911 (27.1) | 92 (28.2) |
| 70–79 | 2659 (19.8) | 670 (20.0) | 65 (19.9) |
| 80–89 | 1403 (10.4) | 354 (10.5) | 21 (6.4) |
| 90–99 | 204 (1.5) | 49 (1.5) | 0 (0) |
| Gender, n (%) | |||
| Female | 6982 (52.0) | 1695 (50.5) | 189 (58.0) |
| Male | 6446 (48.0) | 1662 (49.5) | 137 (42.0) |
| Race, n (%) | |||
| White | 10,226 (76.2) | 2552 (76.0) | 0 (0) |
| Black | 1754 (13.1) | 466 (13.9) | 0 (0) |
| Asian or Pacific Islander | 1354 (10.1) | 319 (9.5) | 326 (100.0) |
| American Indian/Alaska Native | 94 (0.7) | 20 (0.6) | 0 (0) |
| Marital status at diagnosis, n (%) | |||
| Married and separated | 7615 (56.7) | 1855 (55.2) | 322 (98.8) |
| Divorced | 1207 (9.0) | 293 (8.7) | 2 (0.6) |
| Unmarried | 2219 (16.5) | 559 (16.7) | 2 (0.6) |
| Other | 2387 (17.8) | 650 (19.3) | 0 (0) |
| LM, n (%) | |||
| Yes | 607 (4.5) | 155 (4.6) | 8 (2.5) |
| No | 12,821 (95.5) | 3202 (95.4) | 318 (97.5) |
| Primary site, n (%) | |||
| Rectum, NOS | 3786 (28.2) | 955 (28.4) | 228 (69.9) |
| Sigmoid colon | 2925 (21.8) | 777 (23.1) | 35 (10.7) |
| Ascending colon | 1646 (12.3) | 413 (12.3) | 24 (7.4) |
| Cecum | 1586 (11.8) | 393 (11.7) | 6 (1.8) |
| Appendix | 868 (6.5) | 216 (6.4) | 0 (0) |
| Rectosigmoid junction | 846 (6.3) | 215 (6.4) | 7 (2.1) |
| Transverse colon | 723 (5.4) | 166 (4.9) | 9 (2.8) |
| Descending colon | 481 (3.6) | 106 (3.2) | 1 (0.3) |
| Hepatic flexure of colon | 303 (2.3) | 70 (2.1) | 7 (2.1) |
| Splenic flexure of colon | 172 (1.3) | 31 (0.9) | 3 (0.9) |
| Colon, NOS | 50 (0.4) | 9 (0.3) | 4 (1.2) |
| Overlapping lesion of colon | 42 (0.3) | 6 (0.2) | 2 (0.6) |
| Tumor size, mm, mean (SD) | 19.16 (25.1) | 18.82 (22.3) | 24.6 (14.0) |
| Tumor grade, n (%) | |||
| Well differentiated; Grade I | 4171 (31.1) | 1015 (30.2) | 69 (21.2) |
| Moderately differentiated; Grade II | 8306 (61.9) | 2114 (63.0) | 240 (73.6) |
| Poorly differentiated; Grade III | 827 (6.2) | 191 (5.7) | 15 (4.6) |
| Undifferentiated; anaplastic; Grade IV | 124 (0.9) | 37 (1.1) | 2 (0.6) |
| Tumor type, n (%) | |||
| Adenocarcinoma, NOS | 4368 (32.5) | 1099 (32.7) | 91 (27.9) |
| Adenocarcinoma in tubulovillous adenoma | 2969 (22.1) | 743 (22.1) | 76 (23.3) |
| Adenocarcinoma in adenomatous polyp | 2827 (21.1) | 708 (21.1) | 125 (38.3) |
| Carcinoid tumor, NOS | 1837 (13.7) | 454 (13.5) | 0 (0) |
| Adenocarcinoma in villous adenoma | 483 (3.6) | 126 (3.8) | 9 (2.8) |
| Neuroendocrine carcinoma, NOS | 409 (3.0) | 93 (2.8) | 0 (0) |
| Mucinous adenocarcinoma | 238 (1.8) | 61 (1.8) | 7 (2.1) |
| Squamous cell carcinoma, NOS | 52 (0.4) | 8 (0.2) | 0 (0) |
| Atypical carcinoid tumor | 38 (0.3) | 11 (0.3) | 0 (0) |
| Signet ring cell carcinoma | 28 (0.2) | 6 (0.2) | 0 (0) |
| Mucin-producing adenocarcinoma | 26 (0.2) | 6 (0.2) | 0 (0) |
| Tubular adenocarcinoma | 22 (0.2) | 8 (0.2) | 18 (5.5) |
| Gastrointestinal stromal sarcoma | 17 (0.1) | 0 (0) | 0 (0) |
| Carcinoma, NOS | 14 (0.1) | 5 (0.1) | 0 (0) |
| Villous adenocarcinoma | 10 (0.1) | 2 (0.1) | 0 (0) |
| Other | 90 (0.7) | 27 (0.8) | 0 (0) |
| N, n (%) | |||
| N0 | 12,142 (90.4) | 3031 (90.3) | 295 (90.5) |
| N1 | 1150 (8.6) | 296 (8.8) | 30 (9.2) |
| N2 | 136 (1.0) | 30 (0.9) | 1 (0.3) |
| CEA, n (%) | |||
| Positive | 1223 (9.1) | 300 (8.9) | 110 (33.7) |
| Borderline | 25 (0.2) | 6 (0.2) | 0 (0) |
| Negative | 3974 (29.6) | 993 (29.6) | 200 (61.3) |
| Unknown | 8206 (61.1) | 2058 (61.3) | 16 (4.9) |
| Tumor deposits, n (%) | |||
| No tumor deposits | 8777 (65.4) | 2213 (65.9) | 325 (99.7) |
| Tumor Deposits identified | 95 (0.7) | 27 (0.8) | 1 (0.3) |
| Unknown | 4556 (33.9) | 1117 (33.3) | 0 (0) |
| Perineural invasion, n (%) | |||
| Yes | 9104 (67.8) | 2246 (66.9) | 169 (51.8) |
| No | 105 (0.8) | 48 (1.4) | 157 (48.2) |
| Unknown | 4219 (31.4) | 1063 (31.7) | 0 (0) |
SEER: Surveillance, Epidemiology, and End Results; CRC: colorectal cancer; LM: liver metastasis; NOS: not otherwise specified; SD: standard deviation; CEA: carcinoembryonic antigen
Distributions of clinicopathological characteristics in two groups
| Variables | LM (−) (N = 16,023) | LM (+) (N = 762) | P value |
|---|---|---|---|
| Age at diagnosis, n (%) | | | < 0.001 |
| 0–9 | 14 (0.1) | 0 (0.0) | |
| 10–19 | 158 (1.0) | 1 (0.1) | |
| 20–29 | 324 (2.0) | 8 (1.0) | |
| 30–39 | 447 (2.8) | 22 (2.9) | |
| 40–49 | 1238 (7.7) | 97 (12.7) | |
| 50–59 | 4372 (27.3) | 205 (26.9) | |
| 60–69 | 4363 (27.2) | 197 (25.9) | |
| 70–79 | 3185 (19.9) | 144 (18.9) | |
| 80–89 | 1679 (10.5) | 78 (10.2) | |
| 90–99 | 243 (1.5) | 10 (1.3) | |
| Gender, n (%) | | | 0.001 |
| Female | 7784 (48.6) | 324 (42.5) | |
| Male | 8239 (51.4) | 438 (57.5) | |
| Race, n (%) | | | 0.215 |
| White | 12,213 (76.2) | 565 (74.1) | |
| Black | 2100 (13.1) | 120 (15.7) | |
| Asian or Pacific Islander | 1601 (10.0) | 72 (9.4) | |
| American Indian/Alaska Native | 109 (0.7) | 5 (0.7) | |
| Marital status at diagnosis, n (%) | | | < 0.001 |
| Married | 8918 (55.7) | 376 (49.3) | |
| Single | 2611 (16.3) | 167 (21.9) | |
| Widowed | 1740 (10.9) | 90 (11.8) | |
| Divorced | 1417 (8.8) | 83 (10.9) | |
| Unknown | 1131 (7.1) | 36 (4.7) | |
| Separated | 166 (1.0) | 10 (1.3) | |
| Unmarried or Domestic Partner | 40 (0.2) | 0 (0.0) | |
| Primary site, n (%) | | | < 0.001 |
| Rectum, NOS | 4502 (28.1) | 239 (31.4) | |
| Sigmoid colon | 3540 (22.1) | 162 (21.3) | |
| Ascending colon | 1969 (12.3) | 90 (11.8) | |
| Cecum | 1884 (11.8) | 95 (12.5) | |
| Appendix | 1081 (6.7) | 3 (0.4) | |
| Rectosigmoid junction | 967 (6.0) | 94 (12.3) | |
| Transverse colon | 863 (5.4) | 26 (3.4) | |
| Descending colon | 569 (3.6) | 18 (2.4) | |
| Hepatic flexure of colon | 356 (2.2) | 17 (2.2) | |
| Splenic flexure of colon | 194 (1.2) | 9 (1.2) | |
| Colon, NOS | 53 (0.3) | 6 (0.8) | |
| Overlapping lesion of colon | 45 (0.3) | 3 (0.4) | |
| Tumor size, mm, mean (SD) | 17.5 (22.5) | 52.1 (39.2) | < 0.001 |
| Tumor grade, n (%) | | | < 0.001 |
| Well differentiated; Grade I | 5131 (32.0) | 55 (7.2) | |
| Moderately differentiated; Grade II | 9853 (61.5) | 567 (74.4) | |
| Poorly differentiated; Grade III | 891 (5.6) | 127 (16.7) | |
| Undifferentiated; anaplastic; Grade IV | 148 (0.9) | 13 (1.7) | |
| Tumor type, n (%) | | | < 0.001 |
| Adenocarcinoma, NOS | 4859 (30.3) | 608 (79.8) | |
| Adenocarcinoma in tubulovillous adenoma | 3669 (22.9) | 43 (5.6) | |
| Adenocarcinoma in adenomatous polyp | 3495 (21.8) | 40 (5.2) | |
| Carcinoid tumor, NOS | 2287 (14.3) | 4 (0.5) | |
| Adenocarcinoma in villous adenoma | 596 (3.7) | 13 (1.7) | |
| Neuroendocrine carcinoma, NOS | 495 (3.1) | 7 (0.9) | |
| Mucinous adenocarcinoma | 281 (1.8) | 18 (2.4) | |
| Squamous cell carcinoma, NOS | 59 (0.4) | 1 (0.1) | |
| Atypical carcinoid tumor | 49 (0.3) | 0 (0.0) | |
| Signet ring cell carcinoma | 32 (0.2) | 2 (0.3) | |
| Mucin-producing adenocarcinoma | 30 (0.2) | 2 (0.3) | |
| Tubular adenocarcinoma | 30 (0.2) | 0 (0.0) | |
| Gastrointestinal stromal sarcoma | 17 (0.1) | 0 (0.0) | |
| Villous adenocarcinoma | 12 (0.1) | 0 (0.0) | |
| Carcinoma, NOS | 11 (0.1) | 8 (1.0) | |
| Other | 101 (0.6) | 16 (2.1) | |
| N, n (%) | | | < 0.001 |
| N0 | 14,711 (91.8) | 462 (60.6) | |
| N1 | 1179 (7.4) | 267 (35.0) | |
| N2 | 133 (0.8) | 33 (4.3) | |
| CEA, n (%) | | | < 0.001 |
| Positive | 999 (6.2) | 524 (68.8) | |
| Negative | 4899 (30.6) | 68 (8.9) | |
| Borderline | 28 (0.2) | 3 (0.4) | |
| Unknown | 10,097 (63.0) | 167 (21.9) | |
| Tumor deposits, n (%) | | | < 0.001 |
| No tumor deposits | 10,867 (67.8) | 123 (16.1) | |
| Tumor Deposits identified | 111 (0.7) | 11 (1.4) | |
| Unknown | 5045 (31.5) | 628 (82.4) | |
| Perineural invasion, n (%) | | | < 0.001 |
| No | 11,040 (68.9) | 310 (40.7) | |
| Yes | 143 (0.9) | 10 (1.3) | |
| Unknown | 4840 (30.2) | 442 (58.0) |
LM: liver metastasis; NOS: not otherwise specified; SD: standard deviation; CEA: carcinoembryonic antigen
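The record does not say which statistical test produced the P values in the table above. As a minimal sketch, assuming a Pearson chi-square test without continuity correction (an assumption, not the authors' stated method), the gender comparison can be checked from the reported counts alone. The function name `chi_square_2x2` is illustrative.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, two-sided p-value, df = 1)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts from the gender rows: Female (LM-, LM+) and Male (LM-, LM+).
chi2, p_value = chi_square_2x2(7784, 324, 8239, 438)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Under this assumption the p-value comes out at roughly 0.001, in line with the value reported for gender. Variables with more than two categories would need the general r x c form of the test.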
Fig. 2 Predictive value of overall models after optimization. a Inner validation in the SEER database: ROC curves of seven individual models and the stacking model. b Outer validation in our Chinese cohort: ROC curves of seven individual models and the stacking model. SEER: Surveillance, Epidemiology, and End Results; ROC: receiver operating characteristic
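ROC curves such as those in Fig. 2 are commonly summarized by the area under the curve (AUC). The sketch below computes the AUC as the probability that a randomly chosen LM (+) case receives a higher score than a randomly chosen LM (−) case, counting ties as one half. The labels and scores are hypothetical toy values, not data from the study, and `roc_auc` is an illustrative name.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 (1 = LM positive); scores: predicted risk.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical toy example (NOT study data).
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration. A rank-based formulation would be preferable for cohorts of the size described here.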
Confusion matrices of developed models
| Model | Actual | Inner validation: predicted LM (−) | Inner validation: predicted LM (+) | Outer validation: predicted LM (−) | Outer validation: predicted LM (+) |
|---|---|---|---|---|---|
| LGBM | LM (+) | 42 | 113 | 4 | 4 |
| | LM (−) | 3123 | 79 | 317 | 1 |
| RF | LM (+) | 46 | 109 | 3 | 5 |
| | LM (−) | 3136 | 66 | 318 | 0 |
| GNB | LM (+) | 32 | 123 | 0 | 8 |
| | LM (−) | 3051 | 151 | 313 | 5 |
| KNN | LM (+) | 49 | 106 | 4 | 4 |
| | LM (−) | 3111 | 91 | 316 | 2 |
| MLP | LM (+) | 64 | 91 | 5 | 3 |
| | LM (−) | 3131 | 71 | 303 | 15 |
| CART | LM (+) | 41 | 114 | 3 | 5 |
| | LM (−) | 3100 | 102 | 313 | 5 |
| SVM | LM (+) | 35 | 120 | 0 | 8 |
| | LM (−) | 3059 | 143 | 293 | 25 |
| Stacking | LM (+) | 26 | 129 | 0 | 8 |
| | LM (−) | 3062 | 140 | 303 | 15 |
LM: liver metastasis; LGBM: Light Gradient Boosting Machine; RF: Random Forest; GNB: Gaussian Naive Bayes; KNN: K-Nearest Neighbor; MLP: Multilayer Perceptron; CART: Classification and Regression Trees; SVM: Support Vector Machine
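The confusion matrices above can be converted into the usual diagnostic metrics. The sketch below does this for the stacking model, using the cell counts from the table. The helper name `diagnostic_metrics` is illustrative, and the derived values are arithmetic on the reported counts rather than figures quoted from the paper.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Derive standard metrics from confusion-matrix counts.

    tp: actual LM (+) predicted LM (+); fn: actual LM (+) predicted LM (-);
    tn: actual LM (-) predicted LM (-); fp: actual LM (-) predicted LM (+).
    """
    def ratio(num, den):
        # Guard against empty denominators (e.g. a model with no positive calls).
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
    }

# Stacking model counts taken from the confusion-matrix table.
inner = diagnostic_metrics(tp=129, fn=26, tn=3062, fp=140)
outer = diagnostic_metrics(tp=8, fn=0, tn=303, fp=15)

for name, m in (("inner", inner), ("outer", outer)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Because LM is rare in both cohorts, positive predictive value stays modest even when sensitivity and specificity are high, which is worth keeping in mind when reading the outer validation column with only eight LM (+) cases.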
Fig. 3 Estimation of the models' discriminant capability for T1 CRC patients with different tumor sizes. a Restricted cubic spline of tumor size. b ROC curves of seven individual models and the stacking model for patients with different tumor sizes (1–50 mm and > 50 mm). CRC: colorectal cancer; ROC: receiver operating characteristic
Fig. 4 Decision tree tool to discriminate liver metastasis in T1 colorectal cancer patients