Huan Yang1, Lili Chen2, Zhiqiang Cheng3, Minglei Yang1, Jianbo Wang1, Chenghao Lin1, Yuefeng Wang2, Leilei Huang2, Yangshan Chen2, Sui Peng4,5, Zunfu Ke6,7,8, Weizhong Li9,10,11.
Abstract
BACKGROUND: Targeted therapy and immunotherapy put forward higher demands for accurate lung cancer classification, as well as benign versus malignant disease discrimination. Digital whole slide images (WSIs) witnessed the transition from traditional histopathology to computational approaches, arousing a hype of deep learning methods for histopathological analysis. We aimed at exploring the potential of deep learning models in the identification of lung cancer subtypes and cancer mimics from WSIs.Entities:
Keywords: Cancer mimic; Deep learning; Histopathological classification; Lung cancer; Whole slide image
Year: 2021 PMID: 33775248 PMCID: PMC8006383 DOI: 10.1186/s12916-021-01953-2
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 8.775
A glance at deep learning-based lung cancer histological classification algorithms and general slide image analysing tools
| Research | Year | Objective | Cohort | AUC | Architecture | Framework | Language |
|---|---|---|---|---|---|---|---|
| Coudray et al. | 2018 | Classification between LUAD, LUSC, and NL; mutation prediction (STK11, EGFR, FAT1, SETBP1, KRAS, and TP53) | TCGA (1634 slides); NYU (340 slides) | 0.970 (classification) 0.733–0.856 (mutation) | Inception-V3 | TensorFlow | Python |
| Yu et al. | 2020 | Identification of histological types and gene expression subtypes of NSCLC | ICGC (87 LUAD patients, 38 LUSC patients); TCGA (427 LUAD patients, 457 LUSC patients) | 0.726–0.864 | AlexNet; GoogLeNet; VGGNet-16; ResNet-50 | Caffe | Python |
| Gertych et al. | 2019 | Histologic subclassification of LUAD (5 types) | CSMC (50 cases); MIMW (38 cases); TCGA (27 cases) | Accuracy, 0.892 (patch-level) | GoogLeNet; ResNet-50; AlexNet | Caffe | MATLAB |
| Wei et al. | 2019 | Histologic subclassification of LUAD (6 types) | DHMC (422 LUAD slides) | 0.986 (patch-level) | ResNet-18 | PyTorch | Python |
| Kriegsmann et al. | 2020 | Classification between LUAD, LUSC, SCLC, and NL | 80 LUAD, 80 LUSC, 80 SCLC, and 30 controls from NCT | 1.000 (after strict QC) | Inception-V3 | Keras (TensorFlow) | R |
| Wang et al. | 2020 | Classification between LUAD, LUSC, SCLC, and NL | SUCC (390 LUAD; 361 LUSC; 120 SCLC; and 68 NL slides); TCGA (250 LUAD and 250 LUSC slides in good quality) | 0.856 (for TCGA cohort) | Modified VGG-16 | TensorFlow | Python |
| QuPath | 2017 | Tumour identification, biomarker evaluation, batch-processing, and scripting | Specimens of 660 stage II/III colon adenocarcinoma patients from NIB | / | / | / | Java |
| DeepFocus | 2018 | Detection of out-of-focus regions in WSIs | 24 slides from OSU | / | CNN | TensorFlow | Python |
| ConvPath | 2019 | Cell type classification and TME analysis | TCGA (LUAD); NLST; SPORE; CHCAMS | / | CNN | / | MATLAB; R |
| HistoQC | 2019 | Quality control of digitized tissue slides | TCGA (450 slides) | / | / | / | HTML5 |
| ACD model | 2015 | Colour normalization for H&E-stained WSIs | Camelyon-16 (400 slides); Camelyon-17 (1000 slides); Motic-cervix (47 slides); and Motic-lung (39 slides) | 0.914 (for classification) | ACD | TensorFlow | Python |
Abbreviations: LUAD, lung adenocarcinoma; LUSC, lung squamous cell cancer; NL, normal lung; TCGA, the Cancer Genome Atlas; NYU, New York University; ICGC, International Cancer Genome Consortium; CSMC, Cedars-Sinai Medical Center; MIMW, Military Institute of Medicine in Warsaw; DHMC, Dartmouth-Hitchcock Medical Center; NCT, National Center for Tumor Diseases; QC, quality control; SUCC, Sun Yat-sen University Cancer Center; NIB, Northern Ireland Biobank; OSU, Ohio State University; NLST, National Lung Screening Trial; SPORE, Special Program of Research Excellence; CHCAMS, Cancer Hospital of Chinese Academy of Medical Sciences; H&E, haematoxylin and eosin; WSIs, whole slide images; ACD, adaptive colour deconvolution
Fig. 1 The data analysis workflow in detail. ROIs of the H&E-stained slides were extracted by masking the annotated regions and were cropped into 256 × 256-pixel tiles to train the EfficientNet-B5 networks. Tile-level predictions were aggregated to infer the slide-level diagnoses. Tile numbers are given in parentheses, and n is the number of slides
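The tile-to-slide aggregation step can be sketched as follows. This is a minimal illustration only: it assumes the network emits one softmax vector per tile and that slide-level inference averages them before taking the argmax (the aggregation rule, class order, and probability values here are hypothetical, not the authors' exact procedure).

```python
# Hypothetical sketch: aggregate tile-level softmax outputs into one
# slide-level diagnosis by averaging, then taking the argmax class.
CLASSES = ["LUAD", "LUSC", "SCLC", "PTB", "OP", "NL"]

def slide_diagnosis(tile_probs):
    """Average tile-level softmax vectors; return (class label, mean vector)."""
    n = len(tile_probs)
    mean = [sum(p[i] for p in tile_probs) / n for i in range(len(CLASSES))]
    return CLASSES[max(range(len(CLASSES)), key=mean.__getitem__)], mean

# Three made-up tiles whose probabilities mostly favour LUAD.
tiles = [
    [0.70, 0.10, 0.05, 0.05, 0.05, 0.05],
    [0.55, 0.25, 0.05, 0.05, 0.05, 0.05],
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],
]
label, mean_probs = slide_diagnosis(tiles)
```

Other aggregation rules (majority vote over tile argmaxes, or thresholding the fraction of tumour tiles) are common alternatives; averaging is simply the easiest to illustrate.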
Details of the SYSU1 dataset for the development of the six-type classifier. Counts are given as number of slides (tiles)

| Subsets | LUAD | LUSC | SCLC | PTB | OP | NL | SUM |
|---|---|---|---|---|---|---|---|
| Training | 210 (179,402) | 77 (51,949) | 65 (17,342) | 43 (22,617) | 46 (17,987) | 70 (65,143) | 511 (354,440) |
| Validation | 45 (43,153) | 18 (14,552) | 16 (1077) | 11 (3047) | 10 (4170) | 15 (12,526) | 115 (78,525) |
| Testing | 43 | 16 | 22 | 10 | 10 | 14 | 115 (276,247) |
| SUM | 298 | 111 | 103 | 64 | 66 | 99 | 741 (709,212) |
Multi-centre cohorts collected for model validation
| Cohorts | LUAD | LUSC | SCLC | PTB | OP | NL | SUM |
|---|---|---|---|---|---|---|---|
| SYSU2 | 56 | 64 | 52 | 30 | 25 | 91 | 318 |
| SZPH | 60 | 75 | 43 | 0 | 0 | 34 | 212 |
| TCGA | 141 | 134 | 0 | 0 | 0 | 147 | 422 |
Fig. 2 High AUCs achieved across multiple cohorts. AUC was used to measure model performance on the different testing cohorts: a the testing subset of the initial cohort SYSU1, b an independent internal cohort SYSU2, c an external cohort from Shenzhen People’s Hospital (SZPH) containing four types of lung tissue, and d a public ‘TCGA’ cohort, a subset of slides randomly selected from the TCGA-LUAD and TCGA-LUSC projects. Blind tests were conducted on all cohorts by four pathologists of three levels (Pathologist1 is a senior attending, Pathologist2 and Pathologist3 are junior attendings, and Pathologist4 is a junior pathologist); the performance of each pathologist on each cohort is depicted as a star in a–d, respectively
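The per-class AUCs in Fig. 2 are one-vs-rest areas under the ROC curve. A useful reading of AUC is probabilistic: it is the chance that a randomly chosen positive slide is scored higher than a randomly chosen negative one (ties counting one half). A minimal pure-Python sketch of that rank-based computation, with made-up labels and scores:

```python
def auc_ovr(labels, scores, positive):
    """One-vs-rest AUC: fraction of (positive, negative) pairs in which
    the positive slide receives the higher score; ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == positive]
    neg = [s for l, s in zip(labels, scores) if l != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: two LUAD and two non-LUAD slides with invented
# LUAD scores; 3 of the 4 positive-negative pairs are ranked correctly.
labels = ["LUAD", "LUAD", "NL", "PTB"]
scores = [0.9, 0.6, 0.4, 0.7]
auc = auc_ovr(labels, scores, "LUAD")  # -> 0.75
```

In practice one would use a library routine (e.g. scikit-learn's `roc_auc_score`) over the model's continuous scores; the sketch is only to make the quantity concrete.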
Model performances across SYSU1, SYSU2, SZPH, and TCGA testing sets
| Metrics | LUAD | LUSC | SCLC | PTB | OP | NL | Macro-avg |
|---|---|---|---|---|---|---|---|
| Cohorts | |||||||
| | 0.80 | 0.75 | 0.91 | ||||
| | 0.85 | 0.79 | 0.80 | 0.88 | 0.96 | 0.86 | |
| | 0.84 | 0.94 | – | – | |||
| | 0.82 | 0.70 | – | – | – | 0.84 | |
| | 0.86 | 0.79 | 0.91 | 0.85 | 0.94 | 0.99 | 0.89 |
| | 0.75 | 0.77 | 0.80 | 0.60 | 0.93 | 0.81 | |
| | 0.84 | 0.72 | |||||
| | 0.93 | 0.67 | – | – | 0.91 | ||
| | 0.68 | 0.94 | – | – | – | 0.78 | 0.80 |
| | 0.86 | 0.85 | 0.79 | 0.87 | 0.72 | 0.89 | 0.84 |
| | 0.89 | 0.75 | 0.84 | 0.75 | 0.84 | ||
| | 0.85 | 0.79 | 0.86 | 0.95 | 0.86 | ||
| | 0.78 | – | – | 0.95 | |||
| | 0.74 | 0.80 | – | – | – | 0.88 | 0.80 |
| | 0.86 | 0.81 | 0.84 | 0.85 | 0.81 | 0.94 | 0.85 |
aFor the SZPH dataset, no PTB or OP WSIs were available
bFor TCGA dataset, only LUAD, LUSC, and NL WSIs were available
*Maximum Macro-avg value across the datasets of different diseases
Bold font: Maximum value of specific metrics across different data cohorts
EfficientNet-B5 outperformed ResNet-50 across four testing cohorts
| Cohort | Model | Micro-AUC | Macro-AUC | Accuracy | Weighted-F1-score |
|---|---|---|---|---|---|
| SYSU1 | ResNet-50 | 0.966 | 0.985 | 0.860 | 0.860 |
| SYSU1 | EfficientNet-B5 | 0.970 | 0.988 | 0.860 | 0.860 |
| SYSU2 | ResNet-50 | 0.887 | 0.953 | 0.780 | 0.770 |
| SYSU2 | EfficientNet-B5 | 0.918 | 0.968 | 0.870 | 0.870 |
| SZPH | ResNet-50 | 0.713 | 0.733 | 0.540 | 0.520 |
| SZPH | EfficientNet-B5 | 0.963 | 0.971 | 0.890 | 0.900 |
| TCGA | ResNet-50 | 0.967 | 0.973 | 0.690 | 0.680 |
| TCGA | EfficientNet-B5 | 0.978 | 0.962 | 0.800 | 0.810 |
Fig. 3 Visualization heatmaps of tissue predictions for LUAD, LUSC, SCLC, PTB, OP, and NL, from left to right. The top row shows the raw slides with closed blue curves delineating the ROIs annotated by expert pathologists, and the bottom row shows the corresponding heatmaps
High ICCs between the model and pathologists across four independent testing cohorts indicate high consistency and comparable performance
| Raters | Six-type classification model (ICCa with 95% CIb) | |||
|---|---|---|---|---|
| SYSU1 | SYSU2 | SZPH | TCGA | |
| Ground truth | 0.941 (0.691, 0.991) | 0.959 (0.776, 0.994) | 0.927 (0.453, 0.995) | |
| Pathologist1+++c | 0.938 (0.677, 0.991) | 0.957 (0.767, 0.994) | 0.878 (0.215, 0.991) | 0.918 (0.592, 0.988) |
| Pathologist2++c | 0.873 (0.422, 0.981) | 0.909 (0.356, 0.994) | 0.928 (0.633, 0.989) | |
| Pathologist3++c | 0.945 (0.709, 0.992) | 0.922 (0.608, 0.988) | ||
| Pathologist4+c | 0.944 (0.707, 0.992) | 0.800 (0.200, 0.969) | 0.905 (0.538, 0.986) | 0.754 (0.086, 0.961) |
| P value | < 0.05 | < 0.05 | < 0.05 | < 0.05 |
aICCs were computed with the ‘irr’ package for R v3.6.1 using the ‘oneway’ model to measure the reliability and consistency of diagnoses among raters
bCIs were given by bootstrapping the samples 10,000 times
c‘+’ symbols indicate the levels of pathologists, + means junior, ++ means junior attending, and +++ means senior attending
dICC ranges from 0 to 1, and a high ICC suggests good consistency. Conventionally, ICC > 0.75 with P < 0.05 indicates high reliability, repeatability, and consistency
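The one-way ICC used above has a simple closed form from a one-way ANOVA: ICC(1,1) = (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB and MSW are the between-subject and within-subject mean squares and k is the number of raters. A sketch mirroring the 'irr' package's 'oneway' single-rater model, with a percentile bootstrap CI over subjects (the ratings are invented, and the paper resampled 10,000 times rather than the 1000 used here):

```python
import random

def icc_oneway(ratings):
    """ICC(1,1), one-way random-effects, single rater.
    `ratings` is a list of subjects, each a list of k numeric ratings."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for r, m in zip(ratings, row_means) for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def bootstrap_ci(ratings, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling subjects with replacement."""
    rng = random.Random(seed)
    stats = sorted(icc_oneway([rng.choice(ratings) for _ in ratings])
                   for _ in range(n_boot))
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical ratings: four subjects, two raters each.
data = [[9, 10], [6, 7], [8, 7], [4, 5]]
icc = icc_oneway(data)       # (26/3 - 0.5) / (26/3 + 0.5) = 49/55
lo, hi = bootstrap_ci(data)
```

With only a handful of subjects, as in the per-cohort comparisons here, bootstrap CIs are wide, which is consistent with the broad intervals reported in the table.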
Fig. 4 Sankey diagram illustrating the differences among the ground truth, the best pathologist, and our six-type classifier. From left to right: the predictions by the best pathologist, the ground truth, and the predictions by our six-type classifier
Misjudgements by pathologists were corrected by the six-type classifier
| Cohorts | SYSU1 | SYSU2 | SZPH | TCGA |
|---|---|---|---|---|
| Errorsa | 31 | 84 | 21 | 120 |
| Correctionsb | 22 | 59 | 18 | 90 |
aErrors denote the number of slides misjudged by at least one of the pathologists
bCorrections denote the number of those misjudged slides corrected by our six-type classifier