Xijie Chen, Wenhui Wang, Junguo Chen, Liang Xu, Xiaosheng He, Ping Lan, Jiancong Hu, Lei Lian.
Abstract
PURPOSE: The watch-and-wait strategy is a safe and effective alternative to surgery in patients with locally advanced rectal cancer (LARC) who have achieved pathological complete response (pCR) after neoadjuvant therapy (NAT); however, present restaging methods do not meet clinical needs. This study aimed to construct a machine learning (ML) model to predict pCR preoperatively.
Keywords: Complete response; Machine learning; Neoadjuvant therapy; Rectal cancer
Year: 2022 PMID: 35704090 PMCID: PMC9262764 DOI: 10.1007/s00384-022-04157-z
Source DB: PubMed Journal: Int J Colorectal Dis ISSN: 0179-1958 Impact factor: 2.796
Fig. 1 Flow chart of the study design
Demographic characteristics of LARC patients who received neoadjuvant therapy
| Variables | All | Training set | Tuning set | P value |
|---|---|---|---|---|
| Gender | | | | 0.936 |
| Man | 691 (71.8%) | 499 (71.9%) | 192 (71.6%) | |
| Woman | 271 (28.2%) | 195 (28.1%) | 76 (28.4%) | |
| Age | 57 (47.0, 64.0) | 56 (46, 64) | 57 (48.0, 65.0) | 0.211 |
| BMI | | | | 0.098 |
| < 18.5 | 83 (8.6%) | 64 (9.2%) | 19 (7.1%) | |
| 18.5 ≤ X < 24 | 575 (59.8%) | 424 (61.1%) | 151 (56.3%) | |
| ≥ 24 | 304 (31.6%) | 206 (29.7%) | 98 (36.6%) | |
| Family history of cancer | | | | 0.178 |
| Yes | 15 (1.6%) | 8 (1.2%) | 7 (2.6%) | |
| No | 947 (98.4%) | 686 (98.8%) | 261 (97.4%) | |
| History of cancer | | | | 0.665 |
| Yes | 3 (0.3%) | 3 (0.4%) | 0 (0%) | |
| No | 959 (99.7%) | 691 (99.6%) | 268 (100%) | |
| Differentiation | | | | 0.902 |
| Well | 260 (27.0%) | 189 (27.2%) | 71 (26.5%) | |
| Medium | 605 (63.0%) | 438 (63.1%) | 167 (62.3%) | |
| Poorly | 63 (6.5%) | 43 (6.2%) | 20 (7.5%) | |
| Undifferentiated | 34 (3.5%) | 24 (3.5%) | 10 (3.7%) | |
| Depth of tumor invasion | | | | < 0.0001 |
| 1–2 | 72 (7.5%) | 72 (10.4%) | 0 (0%) | |
| 3–4 | 890 (92.5%) | 622 (89.6%) | 268 (100%) | |
| Distance of tumor from the anal verge | 5.5 (3.9, 7.6) | 5.4 (3.8, 7.5) | 6.0 (4.0, 8.0) | 0.062 |
| Tumor location | | | | < 0.0001 |
| Upper | 71 (7.4%) | 37 (5.3%) | 34 (12.7%) | |
| Middle | 439 (45.6%) | 334 (48.1%) | 105 (39.2%) | |
| Lower | 452 (47.0%) | 323 (46.6%) | 129 (48.1%) | |
| Tumor size by MRI | 3.0 (2.2, 4.0) | 3.0 (2.2, 4.0) | 3.0 (2.3, 3.7) | 0.19 |
| Recurrent tumor | | | | 1 |
| Yes | 9 (0.9%) | 6 (0.9%) | 3 (1.1%) | |
| No | 953 (99.1%) | 688 (99.1%) | 265 (98.9%) | |
| Tumor size by colonoscopy | 0.5 (0.33, 1.0) | 0.5 (0.33, 0.81) | 0.5 (0.4, 1.0) | 0.069 |
| Initial CA125 | 9.4 (6.6, 13.0) | 9.2 (6.5, 12.9) | 9.8 (6.63, 13.2) | 0.526 |
| Initial CEA | 4.1 (2.1, 9.9) | 4.0 (2.1, 9.1) | 4.8 (2.2, 10.5) | 0.157 |
| Initial CA199 | 9.2 (4.2, 21.9) | 9.2 (4.1, 21.3) | 9.2 (4.4, 23.1) | 0.449 |
| Initial CA153 | 7.5 (5.8, 11.3) | 7.5 (5.5, 11.1) | 7.4 (5.7, 11.5) | 0.626 |
| Initial AFP | 2.6 (2.0, 3.7) | 2.6 (1.9, 3.7) | 2.6 (2.0, 3.7) | 0.984 |
| Preoperative CA125 | 10.6 (7.6, 14.5) | 10.4 (7.4, 14.6) | 10.8 (7.7, 14.2) | 0.736 |
| Preoperative CEA | 2.9 (1.7, 5.0) | 2.5 (1.6, 4.5) | 3.5 (2.2, 6.0) | < 0.0001 |
| Preoperative CA199 | 7.3 (3.5, 15.5) | 7.5 (3.5, 15.2) | 7.2 (3.6, 16.2) | 0.660 |
| Preoperative CA153 | 10.2 (7.5, 15.0) | 10.2 (7.5, 15.0) | 10.3 (7.6, 15.1) | 0.583 |
| Preoperative AFP | 3.4 (2.5, 4.9) | 3.5 (2.5, 4.9) | 3.3 (2.5, 4.8) | 0.497 |
| CEA difference | −0.8 (−5.1, 0.3) | −0.9 (−5.1, 0.1) | −0.4 (−4.9, 1.1) | 0.001 |
| Ratio of CEA difference | −0.3 (−6.2, 0.1) | −0.3 (−0.6, 0.1) | −0.1 (−0.6, 0.4) | < 0.0001 |
| CA199 difference | −0.3 (−0.6, 0.1) | −0.3 (−6.7, 1.4) | −0.2 (−7.3, 1.2) | 0.802 |
| Ratio of CA199 difference | −0.1 (−0.5, 0.2) | −0.1 (−0.5, 0.2) | −0.1 (−0.5, 0.2) | 0.847 |
| Radiotherapy | | | | < 0.0001 |
| Yes | 442 (45.9%) | 358 (51.6%) | 84 (31.3%) | |
| No | 520 (54.1%) | 336 (48.4%) | 184 (68.7%) | |
| Chemotherapy | | | | < 0.0001 |
| Single-agent | 145 (15.1%) | 119 (17.1%) | 26 (9.7%) | |
| Double-agent | 651 (67.7%) | 462 (66.6%) | 189 (70.5%) | |
| Triple-agent | 159 (16.5%) | 113 (16.3%) | 46 (17.2%) | |
| Unknown | 7 (0.7%) | 0 (0%) | 7 (2.6%) | |
| pCR | | | | 0.849 |
| Yes | 147 (15.3%) | 107 (15.4%) | 40 (14.9%) | |
| No | 815 (84.7%) | 587 (84.6%) | 228 (85.1%) | |
LARC, locally advanced rectal cancer; BMI, body mass index; pCR, pathologic complete response; CA199, carbohydrate antigen 199; CA125, carbohydrate antigen 125; AFP, alpha-fetoprotein; CEA, carcinoembryonic antigen
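The P values in the table above compare the training and tuning sets. The source does not state which test was used for the categorical rows, so the following is only a sketch under the assumption of a Pearson chi-square test without continuity correction, applied to the gender and pCR counts from the table. The function name `pearson_chi2_2x2` is hypothetical.

```python
from math import erfc, sqrt


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Assumption: this is one plausible test for the categorical rows;
    the source does not name the test it used.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: training set, tuning set. Columns: the two levels of the variable.
_, p_pcr = pearson_chi2_2x2(107, 587, 40, 228)    # pCR yes / no
_, p_gender = pearson_chi2_2x2(499, 195, 192, 76)  # man / woman
print(round(p_pcr, 3), round(p_gender, 3))
```

Under this assumption the two results round to the values listed for pCR and gender in the table, which suggests (but does not prove) that a test of this kind was used.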
Fig. 2 ROC curves for assessing the clinical performance of the ML model. (A) ROC curve generated by five-fold cross-validation in the training set. (B) ROC curve in the tuning set. ROC, receiver operating characteristic; ML, machine learning
Fig. 3 SHAP value distribution of each sample across variables, and feature importance rankings for the model's predictions. (A) Nonlinear distribution of each feature in the training set: the higher the absolute SHAP value, the stronger the effect on the outcome. (B) Feature importance rankings in the training set. The horizontal axis represents the relationship between each feature and the probability of pCR. The vertical axis shows the variable names. Features are ranked in descending order of their average SHAP values. Color encodes the SHAP value of the feature: high values are coded in dark purple (positive impact) and dark yellow (negative impact), and low values in light purple and light yellow; the darker the color, the stronger the contribution to the prediction. (C) Nonlinear distribution of each feature in the tuning set. (D) Feature importance rankings in the tuning set. pCR, pathological complete response; SHAP, SHapley Additive exPlanations
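To make the ranking described in the caption concrete, here is a minimal sketch of exact Shapley values for a toy three-feature model, followed by a ranking by mean absolute value. Everything in it is an assumption for illustration: the model `toy_model`, the feature names, the samples, and the zero baseline are hypothetical. The study's own classifier and SHAP implementation are not described in this excerpt.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


def toy_model(v):
    # Hypothetical score with one interaction term (not the study's model).
    return 0.5 * v[0] - 0.3 * v[1] + 0.2 * v[0] * v[2]


feature_names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names
samples = [[1.0, 2.0, 0.5], [0.2, 1.0, 3.0], [2.0, 0.0, 1.0]]
baseline = [0.0, 0.0, 0.0]

per_sample = [shapley_values(toy_model, s, baseline) for s in samples]
# Rank features by the mean absolute Shapley value across samples.
mean_abs = [sum(abs(row[i]) for row in per_sample) / len(per_sample) for i in range(3)]
ranking = sorted(zip(feature_names, mean_abs), key=lambda t: t[1], reverse=True)
print(ranking)
```

For each sample the values sum to the difference between the model output and the baseline output, which is the additivity property that gives SHAP its name.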
Univariate and multivariate analyses for identifying risk factors associated with binary tumor response in LARC patients who received neoadjuvant therapy
| Variables | Non-pCR | pCR | Univariate P value | Multivariate OR (95% CI) | Multivariate P value |
|---|---|---|---|---|---|
| Gender | | | 0.492 | | |
| Man | 425 (72.4%) | 74 (69.2%) | | | |
| Woman | 162 (27.6%) | 33 (30.8%) | | | |
| Age | | | 0.351 | | |
| < 50 | 187 (31.9%) | 39 (36.4%) | | | |
| ≥ 50 | 400 (68.1%) | 68 (63.6%) | | | |
| BMI | | | 0.209 | | |
| < 18.5 | 59 (7.7%) | 5 (4.7%) | | | |
| 18.5 ≤ X < 24 | 355 (62.8%) | 69 (64.5%) | | | |
| ≥ 24 | 173 (29.5%) | 33 (30.8%) | | | |
| Family history of cancer | | | 0.470 | | |
| Yes | 8 (1.4%) | 0 (0%) | | | |
| No | 579 (98.6%) | 107 (100%) | | | |
| History of cancer | | | 1 | | |
| Yes | 3 (0.5%) | 0 (0%) | | | |
| No | 584 (99.5%) | 107 (100%) | | | |
| Differentiation | | | 0.823 | | |
| Well | 161 (27.4%) | 28 (26.1%) | | | |
| Medium | 372 (63.4%) | 66 (61.7%) | | | |
| Poorly | 35 (6.0%) | 8 (7.5%) | | | |
| Undifferentiated | 19 (3.2%) | 5 (4.7%) | | | |
| Depth of tumor invasion | | | < 0.001 | | |
| 1–2 | 47 (8.0%) | 25 (23.4%) | | Reference | |
| 3–4 | 540 (92.0%) | 82 (76.6%) | | 0.281 (0.159–0.498) | < 0.001 |
| Distance of tumor from the anal verge | | | 0.178 | | |
| < 5 | 249 (42.4%) | 55 (51.4%) | | | |
| 5 ≤ X < 10 | 293 (49.9%) | 47 (43.9%) | | | |
| ≥ 10 | 45 (7.7%) | 5 (4.7%) | | | |
| Tumor location | | | 0.201 | | |
| Upper | 34 (5.8%) | 3 (2.8%) | | | |
| Middle | 287 (48.9%) | 47 (43.9%) | | | |
| Lower | 266 (45.3%) | 57 (53.3%) | | | |
| Tumor size by MRI | | | < 0.001 | | |
| < 2.5 | 168 (28.6%) | 49 (45.8%) | | | |
| ≥ 2.5 | 419 (71.4%) | 58 (54.2%) | | | |
| Recurrent tumor | | | 0.629 | | |
| Yes | 6 (1.0%) | 0 (0%) | | | |
| No | 581 (99.0%) | 107 (100%) | | | |
| Tumor size by colonoscopy | | | 0.006 | | |
| ≤ 0.5 | 328 (55.9%) | 75 (70.1%) | | | |
| > 0.5 | 259 (44.1%) | 32 (29.9%) | | | |
| Initial CA125 | | | 0.279 | | |
| ≤ 7.5 | 213 (36.3%) | 33 (30.8%) | | | |
| > 7.5 | 374 (63.7%) | 74 (69.2%) | | | |
| Initial CEA | | | 0.011 | | |
| ≤ 2.0 | 145 (24.7%) | 39 (36.4%) | | | |
| > 2.0 | 440 (75.3%) | 68 (63.6%) | | | |
| Initial CA199 | | | 0.053 | | |
| ≤ 12.5 | 342 (58.3%) | 73 (68.2%) | | | |
| > 12.5 | 244 (41.7%) | 34 (31.8%) | | | |
| Initial CA153 | | | 0.945 | | |
| ≤ 6.0 | 183 (31.2%) | 33 (30.8%) | | | |
| > 6.0 | 404 (68.8%) | 74 (69.2%) | | | |
| Initial AFP | | | 0.318 | | |
| ≤ 5.0 | 529 (90.1%) | 93 (86.9%) | | | |
| > 5.0 | 58 (9.9%) | 14 (13.1%) | | | |
| Preoperative CA125 | | | 0.023 | | |
| ≤ 7.5 | 160 (27.3%) | 18 (16.8%) | | Reference | |
| > 7.5 | 427 (72.7%) | 89 (83.2%) | | 0.425 (0.243–0.745) | 0.003 |
| Preoperative CEA | | | 0.002 | | |
| ≤ 2.0 | 200 (34.1%) | 53 (49.5%) | | Reference | |
| > 2.0 | 387 (65.9%) | 54 (50.5%) | | 0.591 (0.380–0.920) | 0.02 |
| Preoperative CA199 | | | 0.011 | | |
| ≤ 12.5 | 394 (67.1%) | 85 (79.4%) | | Reference | |
| > 12.5 | 193 (32.9%) | 22 (20.6%) | | 0.519 (0.307–0.877) | 0.014 |
| Preoperative CA153 | | | 0.12 | | |
| ≤ 6.0 | 75 (12.8%) | 8 (7.5%) | | | |
| > 6.0 | 512 (87.2%) | 99 (92.5%) | | | |
| Preoperative AFP | | | 0.438 | | |
| ≤ 5.0 | 446 (76.0%) | 85 (79.4%) | | | |
| > 5.0 | 141 (24.0%) | 22 (20.6%) | | | |
| CEA difference | | | 0.251 | | |
| ≤ 0 | 432 (73.6%) | 73 (68.2%) | | | |
| > 0 | 155 (26.4%) | 34 (31.8%) | | | |
| Ratio of CEA difference | | | 0.251 | | |
| ≤ 0 | 432 (73.6%) | 73 (68.2%) | | | |
| > 0 | 155 (26.4%) | 34 (31.8%) | | | |
| CA199 difference | | | 0.436 | | |
| ≤ 0 | 385 (65.6%) | 66 (61.7%) | | | |
| > 0 | 202 (34.4%) | 41 (38.3%) | | | |
| Ratio of CA199 difference | | | 0.436 | | |
| ≤ 0 | 385 (65.6%) | 66 (61.7%) | | | |
| > 0 | 202 (34.4%) | 41 (38.3%) | | | |
| Neoadjuvant radiotherapy | | | < 0.0001 | | |
| Yes | 280 (47.7%) | 78 (72.9%) | | Reference | |
| No | 307 (52.3%) | 29 (27.1%) | | 0.356 (0.222–0.571) | < 0.001 |
| Chemotherapy | | | 0.687 | | |
| Single-agent | 102 (17.4%) | 17 (15.9%) | | | |
| Double-agent | 387 (65.9%) | 75 (70.1%) | | | |
| Triple-agent | 98 (16.7%) | 15 (14.0%) | | | |
LARC, locally advanced rectal cancer; BMI, body mass index; pCR, pathologic complete response; CA199, carbohydrate antigen 199; CA125, carbohydrate antigen 125; AFP, alpha-fetoprotein; CEA, carcinoembryonic antigen
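As a reading aid for the odds ratios above, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from the depth-of-invasion counts in the table. This is not the multivariate model from the source, which adjusts for the other retained variables, so the result is expected to be close to, but not identical to, the reported adjusted OR. The function name `crude_odds_ratio` is hypothetical.

```python
from math import exp, log, sqrt


def crude_odds_ratio(events_exposed, nonevents_exposed, events_ref, nonevents_ref, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (events_exposed / nonevents_exposed) / (events_ref / nonevents_ref)
    se_log_or = sqrt(
        1 / events_exposed + 1 / nonevents_exposed + 1 / events_ref + 1 / nonevents_ref
    )
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Depth of tumor invasion 3–4 versus 1–2 (reference); event = pCR.
# Counts from the table: 3–4 has 82 pCR / 540 non-pCR, 1–2 has 25 pCR / 47 non-pCR.
or_depth, ci_low, ci_high = crude_odds_ratio(82, 540, 25, 47)
print(f"crude OR {or_depth:.3f} ({ci_low:.3f} to {ci_high:.3f})")
```

The crude estimate comes out at roughly 0.29 (about 0.17 to 0.49), in the same range as the adjusted value of 0.281 (0.159–0.498) reported in the table.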
Fig. 4 Nomogram construction and validation in both training and tuning sets. (A) The total points are calculated by adding the point value of each variable, which is read by drawing a straight line up to the point axis. The probability of pCR is determined by drawing a straight line down from the total point axis. (B) ROC curve evaluating the performance for predicting pCR in the training set. (C) The calibration curve of the training set shows the agreement between predicted and actual events. The 45° dotted line represents the ideal status with 100% accuracy. The apparent line represents the predictive ability of the model; the closer the apparent line is to the ideal line, the more precise the model. (D) ROC curve evaluating the performance for predicting pCR in the tuning set. ROC, receiver operating characteristic; pCR, pathologic complete response
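The point-adding procedure in panel A follows from a logistic regression model: each variable's points are proportional to its regression coefficient. The sketch below applies the usual scaling convention (largest absolute coefficient set to 100 points) to the adjusted odds ratios from the multivariate analysis. It is an assumption that the published nomogram was scaled this way, and the dictionary keys are hypothetical labels. The model intercept is not given in this excerpt, so the mapping from total points to pCR probability cannot be reproduced here.

```python
from math import log

# Adjusted odds ratios for the non-reference level of each variable
# (values from the multivariate analysis table).
odds_ratios = {
    "depth_of_invasion": 0.281,         # 3–4 vs 1–2 (reference)
    "preoperative_ca125": 0.425,        # > 7.5 vs ≤ 7.5 (reference)
    "preoperative_cea": 0.591,          # > 2.0 vs ≤ 2.0 (reference)
    "preoperative_ca199": 0.519,        # > 12.5 vs ≤ 12.5 (reference)
    "neoadjuvant_radiotherapy": 0.356,  # no vs yes (reference)
}

# Logistic regression coefficients are the natural logs of the odds ratios.
betas = {name: log(value) for name, value in odds_ratios.items()}
max_abs_beta = max(abs(b) for b in betas.values())

# All coefficients are negative, so the non-reference level scores 0 points
# and the reference (more favourable) level scores the points below.
points = {name: round(100 * abs(b) / max_abs_beta, 1) for name, b in betas.items()}

for name, value in sorted(points.items(), key=lambda t: t[1], reverse=True):
    print(f"{name}: {value} points for the reference level")
```

Under this convention depth of invasion anchors the scale at 100 points, with neoadjuvant radiotherapy next at about 81 points.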
Model performance for predicting pCR
| Outcome | ML classifier, training set | ML classifier, tuning set | Nomogram model, training set | Nomogram model, tuning set |
|---|---|---|---|---|
| AUROC | 0.95 | 0.73 | 0.72 | 0.69 |
| Sensitivity | 82.2% | 71.9% | 43.0% | 55.0% |
| Specificity | 91.6% | 70.0% | 87.1% | 78.5% |
pCR, pathologic complete response; ML, machine learning; AUROC, area under the receiver operating characteristic curve
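Sensitivity and specificity alone do not say how often a predicted pCR is a true pCR. The sketch below applies Bayes' rule to the tuning-set figures in the table, using the tuning-set pCR rate (40 of 268) as the prevalence. The resulting predictive values are illustrative arithmetic only and are not results reported by the source. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Tuning-set pCR rate from the demographic table: 40 of 268 patients.
prevalence = 40 / 268

# Tuning-set sensitivity and specificity from the performance table.
ppv_ml, npv_ml = predictive_values(0.719, 0.700, prevalence)
ppv_nomogram, npv_nomogram = predictive_values(0.550, 0.785, prevalence)

print(f"ML classifier: PPV {ppv_ml:.2f}, NPV {npv_ml:.2f}")
print(f"Nomogram:      PPV {ppv_nomogram:.2f}, NPV {npv_nomogram:.2f}")
```

Because pCR is relatively uncommon in this cohort, the positive predictive value stays well below the sensitivity for both models, which matters when a positive prediction would be used to defer surgery.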