Dan Jiang, Junhua Liao, Haihan Duan, Qingbin Wu, Gemma Owen, Chang Shu, Liangyin Chen, Yanjun He, Ziqian Wu, Du He, Wenyan Zhang, Ziqiang Wang.
Abstract
Few biomarkers have been identified as prognostic predictors for stage III colon cancer. To address this shortfall, we developed a computer-aided approach that combines a convolutional neural network with a machine classifier to predict the prognosis of stage III colon cancer from routine haematoxylin and eosin (H&E) stained tissue slides. We trained the model using 101 cancers from West China Hospital (WCH). The predictive performance of the model was validated using 67 cancers from WCH and 47 cancers from The Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) database. In multivariate Cox proportional hazards analysis, the selected model (Gradient Boosting-Colon) gave a hazard ratio (HR) for high- vs. low-risk recurrence of 8.976 (95% confidence interval (CI), 2.824-28.528; P < 0.001) and 10.273 (95% CI, 2.177-48.472; P = 0.003) in the two test groups. For the poor vs. good prognosis groups, it gave HRs of 10.687 (95% CI, 2.908-39.272; P = 0.001) and 5.033 (95% CI, 1.792-14.132; P = 0.002). Gradient Boosting-Colon is an independent machine prognostic predictor that stratifies stage III colon cancer into high- and low-risk recurrence groups, and into poor and good prognosis groups, directly from H&E tissue slides. Our findings could provide crucial information to aid treatment planning for stage III colon cancer.
Year: 2020 PMID: 32587295 PMCID: PMC7316723 DOI: 10.1038/s41598-020-67178-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Prognostic prediction results for Image Set B and Image Set C. (A,B) Kaplan-Meier plots for the Gradient Boosting-Colon machine classifier with disease-free survival as the endpoint. (C,D) Kaplan-Meier plots for the Gradient Boosting-Colon machine classifier with overall survival as the endpoint. (A,C) show the cases from the Image Set B testing set; (B,D) show the cases from the TCGA dataset. The number of cases in each category is indicated in the plots.
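The Kaplan-Meier curves referred to in Figure 1 are product-limit estimates of the survival function. The following is a minimal sketch of that estimator, not the authors' code; the function name `kaplan_meier` and the follow-up data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per patient
    events : 1 if the event (recurrence or death) was observed, 0 if censored
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t are removed afterwards.
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data (months); not taken from the study.
times = [5, 8, 8, 12, 20, 24]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
```

Running the estimator separately on the high- and low-risk groups would give the two curves compared in each panel.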
Univariate and multivariate Cox proportional hazards model based on disease-free survival (DFS) in Image Set B testing set.
| Variable | Subtype | Univariate P | Univariate HR | Univariate 95% CI | Multivariate P | Multivariate HR | Multivariate 95% CI |
|---|---|---|---|---|---|---|---|
| AI DFS status | High- | ||||||
| Age(y) | >50 | 0.241 | 2.418 | 0.553–10.579 | 0.081 | 4.417 | 0.832–23.437 |
| Gender | Male | 0.982 | 1.011 | 0.391–2.611 | 0.395 | 0.580 | 0.165–2.037 |
| Tumor site | Left | 0.132 | 0.482 | 0.186–1.247 | 0.115 | 0.420 | 0.143–1.235 |
| Tumor size | <5 cm | 0.924 | 1.046 | 0.412–2.655 | 0.276 | 0.496 | 0.141–1.752 |
| Histologic type | Muc + Sig | 0.119 | 2.692 | 0.774–9.336 | 0.817 | 1.260 | 0.178–8.921 |
| Histologic grade | G3 | 0.184 | 1.945 | 0.728–5.192 | 0.088 | 2.813 | 0.857–9.231 |
| pT | T4 | 0.056 | 2.540 | 0.977–6.602 | |||
| pN | N2 | 0.123 | 2.173 | 0.811–5.820 | 0.286 | 0.428 | 0.090–2.036 |
| TNM stage | IIIC | ||||||
Abbreviations: HR, hazard ratio; CI, confidence interval; AI, artificial intelligence; Muc, mucinous adenocarcinoma; Sig, signet ring cell adenocarcinoma; Ade, adenocarcinoma; G3, grade 3 (poorly differentiated); G1, grade 1 (well differentiated); G2, grade 2 (moderately differentiated); pT, pathological primary tumor stage; pN, pathological lymph node stage; TNM, tumor, lymph node, and metastasis stage.
Univariate and multivariate Cox proportional hazards model based on the disease-free survival (DFS) in the 47 patients from TCGA-COAD.
| Variable | Subtype | DFS univariate P | DFS univariate HR | DFS univariate 95% CI | DFS multivariate P | DFS multivariate HR | DFS multivariate 95% CI |
|---|---|---|---|---|---|---|---|
| AI DFS status | High- | ||||||
| Age(y) | >50 | 0.887 | 0.921 | 0.297–2.861 | 0.967 | 1.044 | 0.140–7.770 |
| Gender | Male | 0.133 | 0.468 | 0.174–1.261 | 0.538 | 0.646 | 0.161–2.596 |
| Tumor site | Left | 0.152 | 0.428 | 0.134–1.368 | 0.740 | 0.789 | 0.195–3.199 |
| Histologic type | Muc + Sig | 0.237 | 2.000 | 0.634–6.310 | 0.662 | 1.455 | 0.271–7.802 |
| pT | T4 | 0.068 | 3.277 | 0.915–11.731 | |||
| pN | N2 | 0.210 | 1.899 | 0.697–5.176 | |||
| TNM stage | IIIC | 0.716 | 1.215 | 0.425–3.478 | 0.054 | 0.108 | 0.011–1.042 |
Abbreviations: HR, hazard ratio; CI, confidence interval; AI, artificial intelligence; Muc, mucinous adenocarcinoma; Sig, signet ring cell adenocarcinoma; Ade, adenocarcinoma; pT, pathological primary tumor stage; pN, pathological lymph node stage; TNM, tumor, lymph node, and metastasis stage.
Univariate and multivariate Cox proportional hazards model based on overall survival (OS) in Image Set B testing set.
| Variable | Subtype | Univariate P | Univariate HR | Univariate 95% CI | Multivariate P | Multivariate HR | Multivariate 95% CI |
|---|---|---|---|---|---|---|---|
| AI OS status | Poor | ||||||
| Age(y) | >50 | 0.209 | 3.673 | 0.483–27.937 | 0.233 | 3.693 | 0.431–31.662 |
| Gender | Male | 0.665 | 1.256 | 0.447–3.533 | 0.756 | 1.209 | 0.365–4.002 |
| Tumor site | Left | 0.351 | 0.611 | 0.217–1.719 | 0.110 | 0.416 | 0.141–1.222 |
| Tumor size | <5 cm | 0.921 | 1.052 | 0.382–2.903 | 0.847 | 0.898 | 0.299–2.699 |
| Histologic type | Muc + Sig | 0.253 | 2.092 | 0.590–7.417 | 0.856 | 0.862 | 0.174–4.274 |
| Histologic grade | G3 vs. G1 + G2 | 0.060 | 3.004 | 0.956–9.437 | 0.206 | 2.443 | 0.613–9.739 |
| pT | T4 | ||||||
| pN | N2 | 0.131 | 2.218 | 0.789–6.237 | 0.377 | 0.476 | 0.092–2.470 |
| TNM stage | IIIC | ||||||
Abbreviations: HR, hazard ratio; CI, confidence interval; AI, artificial intelligence; Muc, mucinous adenocarcinoma; Sig, signet ring cell adenocarcinoma; Ade, adenocarcinoma; G3, grade 3 (poorly differentiated); G1, grade 1 (well differentiated); G2, grade 2 (moderately differentiated); pT, pathological primary tumor stage; pN, pathological lymph node stage; TNM, tumor, lymph node, and metastasis stage.
Univariate and multivariate Cox proportional hazards model based on the overall survival (OS) in the 47 patients from TCGA-COAD.
| Variable | Subtype | OS univariate P | OS univariate HR | OS univariate 95% CI | OS multivariate P | OS multivariate HR | OS multivariate 95% CI |
|---|---|---|---|---|---|---|---|
| AI OS status | Poor | ||||||
| Age(y) | >50 | 0.110 | 2.661 | 0.801–8.836 | 0.183 | 2.982 | 0.598–14.878 |
| Gender | Male | 0.569 | 0.806 | 0.382–1.697 | 0.608 | 1.296 | 0.481–3.492 |
| Tumor site | Left | 0.219 | 0.597 | 0.262–1.359 | 0.516 | 1.414 | 0.496–4.029 |
| Histologic type | Muc + Sig | 0.306 | 1.613 | 0.645–4.034 | 0.238 | 2.075 | 0.618–6.968 |
| pT | T4 | ||||||
| pN | N2 | ||||||
| TNM stage | IIIC | 0.132 | 1.887 | 0.826–4.311 | 0.166 | 0.373 | 0.092–1.508 |
Abbreviations: HR, hazard ratio; CI, confidence interval; AI, artificial intelligence; Muc, mucinous adenocarcinoma; Sig, signet ring cell adenocarcinoma; Ade, adenocarcinoma; pT, pathological primary tumor stage; pN, pathological lymph node stage; TNM, tumor, lymph node, and metastasis stage.
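The P value, HR, and 95% CI in each row of the Cox tables are linked, which allows a simple consistency check. The sketch below assumes that the intervals are Wald intervals on the log-hazard scale (the record does not state how they were computed); the helper name `wald_p_from_ci` is hypothetical. It recovers an approximate two-sided P value from a reported HR and its interval, here using the multivariate Age(y) row of the TCGA-COAD overall survival table.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_p_from_ci(hr, lower, upper):
    """Approximate two-sided Wald P value from an HR and its 95% CI."""
    # Standard error of log(HR), recovered from the width of the log-scale CI.
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    return math.erfc(abs(z) / math.sqrt(2))


# Multivariate Age(y) row, TCGA-COAD OS table: HR 2.982, 95% CI 0.598-14.878.
p_age = wald_p_from_ci(2.982, 0.598, 14.878)
```

Under this assumption the recovered value is close to the reported 0.183; a large gap between a recovered and a reported P value would point to a transcription problem in that row.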
P values of correlation tests between the 45 morphological parameters and the predicted cancer recurrence risk and prognosis risk.
| Parameter | Predicted recurrence risk: ratio | Predicted recurrence risk: mean | Predicted recurrence risk: median | Predicted prognosis risk: ratio | Predicted prognosis risk: mean | Predicted prognosis risk: median |
|---|---|---|---|---|---|---|
| DEB | 0.735 | 0.886 | 0.854 | 0.025 | 0.042 | |
| LYM | 0.753 | 0.568 | 0.430 | 0.284 | 0.711 | 0.137 |
| MUC | 0.274 | 0.654 | 0.430 | 0.135 | 0.128 | 0.092 |
| STR | 0.883 | 0.940 | 0.793 | 0.899 | 0.733 | 0.821 |
| TUM | 0.060 | 0.103 | 0.066 | 0.105 | 0.074 | |
| DEB/LYM | 0.512 | 0.092 | 0.793 | 0.312 | 0.504 | 0.113 |
| DEB/MUC | 0.694 | 0.777 | 0.189 | 0.101 | ||
| DEB/STR | 0.952 | 0.149 | 0.854 | |||
| DEB/TUM | 0.264 | 0.338 | 0.189 | |||
| LYM/MUC | 0.142 | 0.314 | 0.189 | 0.630 | 0.078 | |
| LYM/STR | 0.538 | 0.753 | 0.430 | 0.216 | 0.541 | 0.113 |
| LYM/TUM | 0.946 | 0.583 | 0.430 | 0.123 | 0.624 | 0.053 |
| MUC/STR | 0.467 | 0.301 | 0.733 | 0.357 | 0.307 | 0.497 |
| MUC/TUM | 0.163 | 0.205 | 0.066 | 0.540 | 0.884 | 0.258 |
| STR/TUM | 0.512 | 0.265 | 0.386 | 0.750 | 0.328 | 0.497 |
Abbreviations: DEB, debris; LYM, lymphocyte; MUC, mucus; STR, stroma; TUM, tumor.
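The count of 45 morphological parameters follows from the table layout: five tissue categories plus their ten pairwise combinations give 15 features, each summarised by three statistics (ratio, mean, median). The short sketch below only enumerates those names; how each statistic was computed is not described in this record, and the variable names are illustrative.

```python
from itertools import combinations

categories = ["DEB", "LYM", "MUC", "STR", "TUM"]
statistics = ["ratio", "mean", "median"]

# Five single categories followed by the ten pairwise combinations,
# in the same order as the table rows.
features = categories + [f"{a}/{b}" for a, b in combinations(categories, 2)]

# One parameter per (feature, statistic) pair.
parameters = [(feature, stat) for feature in features for stat in statistics]

print(len(features), len(parameters))
```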
Figure 2. Flowchart of this study. Briefly, Image Set A (image patches annotated into nine categories in tissue slides from colorectal cancer, downloaded from a published database) was used as the training set to train multiple convolutional neural networks (CNNs). InceptionResNetV2 was locked down after category-recognition training because it had the highest accuracy; it was then used to recognize the image patches from Image Set B and to calculate the proportions of each tissue category in each whole slide (pie charts), after discarding Background. Image Set B was separated into a training set (60%) and a test set (40%), and the proportions of the eight tissue categories in the training set were fed into multiple machine classifiers to construct the predictive models. The test set was used to test the accuracy of each predictive model, and the performance of each model was then validated using Image Set C. Finally, the Gradient Boosting Decision Tree was chosen as our predictive model.
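The step that turns patch-level CNN predictions into per-slide inputs for the machine classifiers can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `tissue_proportions` and the example labels are hypothetical, and the category codes are those listed for the nine-class CNN (Background is coded `BACK`).

```python
from collections import Counter

# The eight tissue categories that remain after Background is discarded.
TISSUE_CLASSES = ["ADI", "DEB", "LYM", "MUC", "MUS", "NORM", "STR", "TUM"]


def tissue_proportions(patch_labels):
    """Return the fraction of tissue patches in each of the eight categories."""
    # Keeping only known tissue labels drops Background ("BACK") patches.
    counts = Counter(label for label in patch_labels if label in TISSUE_CLASSES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("slide contains no tissue patches")
    return [counts.get(name, 0) / total for name in TISSUE_CLASSES]


# Hypothetical predicted labels for the patches of one slide.
labels = ["BACK"] * 4 + ["TUM"] * 3 + ["STR"] * 2 + ["LYM"]
props = tissue_proportions(labels)
```

Each slide then contributes one eight-element vector, which is the kind of input a gradient boosting classifier could be trained on.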
Figure 3. A convolutional neural network (CNN) segmented and restored the H&E whole tissue slides. The CNN, InceptionResNetV2, was used to recognize the nine categories (ADI, adipose tissue; BACK, background; DEB, debris; LYM, lymphocyte; MUC, mucus; MUS, muscle; NORM, normal mucosa; STR, stroma; TUM, tumor) in each whole tissue slide from Image Sets B and C. The left panels show the original H&E stained tissue slides, the right panels show the classification maps restored by the CNN, and the pie charts show the proportions of each tissue category. (A) Typical adenocarcinoma and (B) mucinous adenocarcinoma, both from Image Set B. (C,D) Slides from Image Set C. (C) shows problems caused by manual slide preparation, such as tissue folds and hollowing; (D) shows visualization problems caused by uneven fixation and covering of the slides. Despite these imperfections in the whole tissue slides, the trained CNN could still accurately recognize the different tissue categories.