Dejun Zhou, Fei Tian, Xiangdong Tian, Lin Sun, Xianghui Huang, Feng Zhao, Nan Zhou, Zuoyu Chen, Qiang Zhang, Meng Yang, Yichen Yang, Xuexi Guo, Zhibin Li, Jia Liu, Jiefu Wang, Junfeng Wang, Bangmao Wang, Guoliang Zhang, Baocun Sun, Wei Zhang, Dalu Kong, Kexin Chen, Xiangchun Li.
Abstract
Colonoscopy is commonly used to screen for colorectal cancer (CRC). We develop a deep learning model called CRCNet for optical diagnosis of CRC by training on 464,105 images from 12,179 patients and test its performance on 2263 patients from three independent datasets. At the patient level, CRCNet achieves an area under the precision-recall curve (AUPRC) of 0.882 (95% CI: 0.828-0.931), 0.874 (0.820-0.926) and 0.867 (0.795-0.923). CRCNet exceeds the average endoscopist performance on recall rate in two test sets (91.3% versus 83.8%; two-sided t-test, p < 0.001 and 96.5% versus 90.3%; p = 0.006) and on precision in one test set (93.7% versus 83.8%; p = 0.02), while achieving a comparable recall rate on one test set and comparable precision on the other two. At the image level, CRCNet achieves an AUPRC of 0.990 (0.987-0.993), 0.991 (0.987-0.995), and 0.997 (0.995-0.999). Our study warrants further investigation of CRCNet by prospective clinical trials.
Year: 2020 PMID: 32528084 PMCID: PMC7289893 DOI: 10.1038/s41467-020-16777-6
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Baseline characteristics of training set and three test sets.
| | TCH training set | | TCH test set | | TFCH test set | | TGH test set | |
|---|---|---|---|---|---|---|---|---|
| | CRC | Non-CRC | CRC | Non-CRC | CRC | Non-CRC | CRC | Non-CRC |
| Patients | 3176 | 9003 | 146 | 217 | 90 | 340 | 71 | 1399 |
| Images | 28,071 | 436,034 | 7485 | 13,298 | 2576 | 12,865 | 1614 | 46,777 |
| Male sex | 1985 | 4909 | 86 | 132 | 59 | 220 | 35 | 780 |
| Images (male) | 17,697 | 245,936 | 4599 | 8151 | 1618 | 8351 | 760 | 26,510 |
| Female sex | 1191 | 4094 | 60 | 85 | 31 | 120 | 36 | 619 |
| Images (female) | 10,734 | 190,071 | 2886 | 5147 | 958 | 4514 | 854 | 20,267 |
| Age (years) | 60 (53–67) | 57 (49–64) | 61 (53–66) | 59 (52–66) | 63 (53.3–72) | 58.5 (50–65) | 66 (59–74) | 58 (47.5–65) |
| Age ≤ 60 years male | 1037 (32.7%) | 3004 (33.4%) | 42 (28.8%) | 71 (32.7%) | 20 (22.2%) | 138 (40.6%) | 9 (25.7%) | 481 (61.7%) |
| Age > 60 years male | 948 (29.8%) | 1905 (21.2%) | 44 (30.1%) | 61 (28.1%) | 39 (43.3%) | 82 (24.1%) | 26 (74.3%) | 299 (38.3%) |
| Age ≤ 60 years female | 604 (19%) | 2672 (29.7%) | 28 (19.2%) | 53 (24.4%) | 15 (16.7%) | 52 (15.3%) | 11 (30.6%) | 342 (55.3%) |
| Age > 60 years female | 587 (18.5%) | 1422 (15.8%) | 32 (21.9%) | 32 (14.7%) | 16 (17.8%) | 68 (20%) | 25 (69.4%) | 277 (44.7%) |
| Tumor siteᵃ | | | | | | | | |
| Ascending colon | 631 (19.9%) | | 15 (10.3%) | | 20 (22.2%) | | 19 (26.8%) | |
| Transverse colon | 143 (4.5%) | | 17 (11.6%) | | 8 (8.9%) | | 9 (12.7%) | |
| Descending colon | 178 (5.6%) | | 11 (7.5%) | | 0 | | 7 (9.9%) | |
| Sigmoid colon | 561 (17.7%) | | 36 (24.7%) | | 29 (32.2%) | | 19 (26.8%) | |
| Rectum | 1663 (52.4%) | | 67 (45.9%) | | 33 (36.7%) | | 17 (23.9%) | |
| TNM stagingᵃ | | | | | | | | |
| I | 362 (11.4%) | | 15 (10.3%) | | 9 (10.0%) | | 5 (7%) | |
| II | 1407 (44.3%) | | 24 (16.4%) | | 9 (10.0%) | | 15 (21.1%) | |
| III | 231 (7.3%) | | 34 (23.3%) | | 12 (13.3%) | | 9 (12.7%) | |
| IV | 62 (2%) | | 3 (2.1%) | | 1 (1.1%) | | 1 (1.4%) | |
| Biopsy pathology | 1114 (35.7%) | | 70 (47.9%) | | 59 (65.6%) | | 41 (57.7%) | |
ᵃ Tumor site and TNM staging were reported for CRC patients only.
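The patient and image counts in the table can be cross-checked against the totals quoted in the abstract. The following is a minimal sketch of that check, with counts transcribed from the Patients and Images rows; the dictionary layout and variable names are illustrative and do not come from the paper.

```python
# (CRC, non-CRC) patient counts transcribed from the "Patients" row.
patients = {
    "TCH training": (3176, 9003),
    "TCH test": (146, 217),
    "TFCH test": (90, 340),
    "TGH test": (71, 1399),
}
# (CRC, non-CRC) image counts for the training set, from the first "Images" row.
training_images = (28071, 436034)

# Total patients per dataset.
totals = {name: crc + non_crc for name, (crc, non_crc) in patients.items()}
# Patients across the three test sets only.
test_total = sum(n for name, n in totals.items() if name != "TCH training")
# Fraction of patients with CRC in each dataset.
prevalence = {name: crc / totals[name] for name, (crc, _) in patients.items()}

print(totals["TCH training"])  # 12179 training patients
print(test_total)              # 2263 test patients
print(sum(training_images))    # 464105 training images
for name, p in prevalence.items():
    print(f"{name}: {p:.1%} CRC")
```

The three totals agree with the abstract. The share of CRC patients differs considerably between datasets, which is worth keeping in mind when comparing precision and negative predictive value across test sets, since both depend on prevalence.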
Fig. 1 Flowchart depicting the development and evaluation of CRCNet.
a Model development consisted of data curation and CRCNet training. b Evaluation of CRCNet on three test sets. c Comparison between CRCNet and five endoscopists on a subset of randomly selected cases. All CRC patients and 5050 control patients in the training set, and all patients in the three test sets, have a surgical specimen or biopsy for pathological evaluation.
Classification metrics of CRCNet at the patient level across three test sets.
| Performance metrics | Tianjin Cancer Hospital ( | Tianjin First Central Hospital ( | Tianjin General Hospital ( |
|---|---|---|---|
| Accuracy (95% CI) | 0.873 (0.835–0.906) | 0.916 (0.886–0.941) | 0.980 (0.972–0.987) |
| Recall rate (95% CI) | 0.904 (0.844–0.947) | 0.789 (0.690–0.868) | 0.746 (0.629–0.842) |
| Specificity (95% CI) | 0.853 (0.798–0.897) | 0.950 (0.921–0.971) | 0.992 (0.986–0.996) |
| Precision (95% CI) | 0.805 (0.736–0.863) | 0.807 (0.709–0.883) | 0.828 (0.713–0.911) |
| Negative predictive value (95% CI) | 0.930 (0.885–0.961) | 0.944 (0.915–0.966) | 0.987 (0.980–0.992) |
| Kappaᵃ | 0.742 | 0.745 | 0.775 |
| F1ᵇ | 0.852 | 0.798 | 0.785 |
ᵃ Measures the agreement between the predicted classification and the pathological report.
ᵇ Harmonic mean of the precision and recall rate.
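All of the point estimates in this table follow from a 2×2 confusion matrix. The sketch below shows the standard formulas; it assumes the kappa reported is Cohen's kappa, which the source does not state explicitly. The counts used for the Tianjin Cancer Hospital column (132 true positives, 14 false negatives, 185 true negatives, 32 false positives) are not reported in the source: they are back-calculated here from the 146 CRC and 217 non-CRC patients in the baseline table and the reported recall and specificity, so treat them as an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Point estimates from a 2x2 confusion matrix (positive class = CRC)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)         # positive predictive value
    npv = tn / (tn + fn)               # negative predictive value
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "kappa": kappa,
        "f1": f1,
    }


# Assumed counts (back-calculated, not reported in the source).
metrics = classification_metrics(tp=132, fp=32, fn=14, tn=185)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, every value rounds to the corresponding entry in the Tianjin Cancer Hospital column. The confidence intervals in the table are not reproduced here, because the source excerpt does not say which interval method was used.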
Fig. 2 Performance of CRCNet versus endoscopists in identifying CRC.
Precision–recall curves of CRCNet on the a TCH test set, b TFCH test set, and c TGH test set. The area under the precision–recall curve and its 95% confidence interval are included. Blue stars mark the precision and recall rate of each individual endoscopist; green stars mark the average performance of the five endoscopists.
Classification metrics of endoscopists versus CRCNet.
| | Tianjin Cancer Hospital ( | | Tianjin First Central Hospital ( | | Tianjin General Hospital ( | |
|---|---|---|---|---|---|---|
| Performance metrics | Endoscopistᵃ | CRCNet | Endoscopistᵃ | CRCNet | Endoscopistᵃ | CRCNet |
| Accuracy (95% CI) | 0.824 (0.781–0.861) | 0.873 (0.835–0.906) | 0.928 (0.891–0.955) | 0.903 (0.863–0.935) | 0.934 (0.897–0.960) | 0.963 (0.933–0.982) |
| Recall rate (95% CI) | 0.849 (0.781–0.903) | 0.904 (0.844–0.947) | 0.867 (0.779–0.929) | 0.933 (0.861–0.975) | 0.900 (0.805–0.959) | 0.914 (0.823–0.968) |
| Specificity (95% CI) | 0.912 (0.867–0.946) | 0.853 (0.798–0.897) | 0.920 (0.873–0.954) | 0.890 (0.838–0.930) | 0.940 (0.898–0.969) | 0.980 (0.950–0.995) |
| Precision (95% CI) | 0.842 (0.764–0.902) | 0.805 (0.736–0.863) | 0.838 (0.751–0.905) | 0.792 (0.703–0.865) | 0.844 (0.744–0.917) | 0.941 (0.856–0.984) |
| Negative predictive value (95% CI) | 0.880 (0.825–0.924) | 0.930 (0.885–0.961) | 0.941 (0.899–0.969) | 0.967 (0.930–0.988) | 0.964 (0.928–0.986) | 0.970 (0.937–0.989) |
| Kappaᵇ | 0.622 | 0.742 | 0.82 | 0.785 | 0.828 | 0.903 |
| F1ᶜ | 0.768 | 0.852 | 0.878 | 0.857 | 0.873 | 0.928 |
ᵃ The median value of five endoscopists.
ᵇ Measures the agreement between the predicted classification and the pathological report.
ᶜ Harmonic mean of the precision and recall rate.
Fig. 3 Performance of CRCNet in identifying malignant colonoscopic images.
Precision–recall curves of CRCNet on the a TCH test set, b TFCH test set, and c TGH test set. The area under the precision–recall curve and its 95% confidence interval are included.
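For readers unfamiliar with the area under the precision–recall curve, the sketch below computes average precision, one common step-wise estimator of that area. It is not necessarily the estimator used in the paper, and the labels and scores are made-up toy values, not data from the study.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    labels: iterable of 0/1 ground-truth values (1 = malignant).
    scores: model scores, higher meaning more likely positive.
    Tied scores are ranked in input order; no special tie handling.
    """
    labels = list(labels)
    scores = list(scores)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: positives are ranked 1st and 3rd.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(toy_labels, toy_scores)
print(f"{ap:.4f}")  # (1/1 + 2/3) / 2 = 0.8333
```

A perfect ranking, with every positive scored above every negative, gives a value of 1.0. For a random ranking the expected value is approximately the fraction of positives, so the score is best read against the class balance of each test set.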
Fig. 4 Example class activation maps.
a Raw colonoscopic images. b Gradient-weighted class activation map. c Guided gradient-weighted class activation map. d Haematoxylin–eosin staining images with scale bars. The length of each scale bar is shown above the bar. e Tumor location and TNM stage.