Fahdi Kanavati, Gouji Toyokawa, Seiya Momosaki, Hiroaki Takeoka, Masaki Okamoto, Koji Yamazaki, Sadanori Takeo, Osamu Iizuka, Masayuki Tsuneki.
Abstract
The differentiation between major histological types of lung cancer, such as adenocarcinoma (ADC), squamous cell carcinoma (SCC), and small-cell lung cancer (SCLC), is of crucial importance for determining optimum cancer treatment. Hematoxylin and eosin (H&E)-stained slides of small transbronchial lung biopsy (TBLB) specimens are one of the primary sources for making a diagnosis; however, a subset of cases is challenging for pathologists to diagnose from H&E-stained slides alone, and these either require further immunohistochemistry (IHC) or are deferred to surgical resection for definitive diagnosis. We trained a deep learning model to classify H&E-stained whole-slide images (WSIs) of TBLB specimens into ADC, SCC, SCLC, and non-neoplastic using a training set of 579 WSIs. On an independent test set of 83 challenging indeterminate cases, the trained model achieved a receiver operating characteristic (ROC) area under the curve (AUC) of 0.99. We further evaluated the model on four independent test sets (one TBLB and three surgical, with a combined total of 2407 WSIs), demonstrating highly promising results with AUCs ranging from 0.94 to 0.99.
Year: 2021 PMID: 33854137 PMCID: PMC8046816 DOI: 10.1038/s41598-021-87644-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) shows representative WSIs from the training set for each of the four labels: ADC, SCC, SCLC, and non-neoplastic. (b) shows a high-level overview of the training method: tiles are randomly sampled, in a balanced manner, from the training WSIs and provided as input to train the convolutional neural network (CNN) model. The CNN model is then used as a feature extractor, with the outputs of its penultimate layer passed as input to the recurrent neural network (RNN) model. All the tiles from a given WSI are fed into the RNN to produce a final WSI diagnosis.
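The balanced tile sampling mentioned in the caption could look roughly like the sketch below. The caption does not say how balancing was implemented, so the details here are assumptions: an equal number of tiles is drawn per label, and labels with too few tiles are sampled with replacement. The function name `sample_balanced_tiles` and the toy tile identifiers are hypothetical.

```python
import random

LABELS = ["ADC", "SCC", "SCLC", "non-neoplastic"]


def sample_balanced_tiles(tiles_by_label, n_per_label, seed=0):
    """Draw n_per_label tiles for every label (assumed balancing scheme).

    Labels with fewer tiles than requested are sampled with replacement,
    so rare classes such as SCLC still contribute equally to the batch.
    """
    rng = random.Random(seed)
    batch = []
    for label in LABELS:
        pool = tiles_by_label.get(label, [])
        if not pool:
            continue  # no tiles available for this label
        if len(pool) >= n_per_label:
            chosen = rng.sample(pool, n_per_label)  # without replacement
        else:
            chosen = [rng.choice(pool) for _ in range(n_per_label)]
        batch.extend((tile, label) for tile in chosen)
    rng.shuffle(batch)  # avoid label-ordered batches
    return batch


# Toy tile identifiers, made up for illustration only.
toy_tiles = {
    "ADC": [f"adc_tile_{i}" for i in range(10)],
    "SCC": [f"scc_tile_{i}" for i in range(6)],
    "SCLC": [f"sclc_tile_{i}" for i in range(2)],
    "non-neoplastic": [f"normal_tile_{i}" for i in range(50)],
}
batch = sample_balanced_tiles(toy_tiles, n_per_label=4)
```

Each label contributes four tiles to `batch` even though the toy pools differ in size by more than an order of magnitude, which is the point of balancing when non-neoplastic tissue dominates the training WSIs.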
Distribution of subtype labels in the training set and the five test sets.
| Split | Source | ADC | SCC | SCLC | Non-neoplastic | Total |
|---|---|---|---|---|---|---|
| Training | Kyushu Medical Centre (TBLB) | 76 | 48 | 17 | 393 | 534 |
| Training | Kyushu Medical Centre (TBLB-indeterminate) | 28 | 17 | 0 | 0 | 45 |
| Validation | Kyushu Medical Centre (TBLB) | 18 | 7 | 4 | 30 | 59 |
| Test | Kyushu Medical Centre (TBLB) | 180 | 85 | 57 | 180 | 502 |
| Test | Kyushu Medical Centre (TBLB-indeterminate) | 64 | 19 | 0 | 0 | 83 |
| Test | Kyushu Medical Centre (surgical) | 110 | 37 | 4 | 349 | 500 |
| Test | Mita Hospital (surgical) | 152 | 36 | 4 | 308 | 500 |
| Test | TCGA (surgical) | 433 | 226 | 0 | 246 | 905 |
| Total | | 1061 | 475 | 86 | 1506 | 3128 |
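As a quick consistency check, the per-set counts in the table can be summed to confirm the figures quoted in the abstract (579 training WSIs, and 2407 WSIs across the four test sets other than the indeterminate one). The counts below are copied from the table; the short source keys are just labels chosen for this sketch.

```python
# WSI counts per set, copied from the table: (ADC, SCC, SCLC, non-neoplastic).
counts = {
    ("Training", "Kyushu TBLB"): (76, 48, 17, 393),
    ("Training", "Kyushu TBLB-indeterminate"): (28, 17, 0, 0),
    ("Validation", "Kyushu TBLB"): (18, 7, 4, 30),
    ("Test", "Kyushu TBLB"): (180, 85, 57, 180),
    ("Test", "Kyushu TBLB-indeterminate"): (64, 19, 0, 0),
    ("Test", "Kyushu surgical"): (110, 37, 4, 349),
    ("Test", "Mita surgical"): (152, 36, 4, 308),
    ("Test", "TCGA surgical"): (433, 226, 0, 246),
}

# Training set size quoted in the abstract.
training_total = sum(
    sum(row) for (split, _), row in counts.items() if split == "Training"
)

# The four test sets evaluated besides the indeterminate TBLB set.
other_tests_total = sum(
    sum(row)
    for (split, source), row in counts.items()
    if split == "Test" and "indeterminate" not in source
)

# Column totals in table order: ADC, SCC, SCLC, non-neoplastic.
column_totals = tuple(sum(col) for col in zip(*counts.values()))
grand_total = sum(column_totals)

print(training_total)     # 579
print(other_tests_total)  # 2407
print(column_totals)      # (1061, 475, 86, 1506)
print(grand_total)        # 3128
```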
ROC AUCs for ADC, SCC, and SCLC computed on the test sets in which they are present, with the WSI diagnosis obtained with either the RNN model or max-pooling. The ROC AUCs were also computed for the neoplastic label by grouping ADC, SCC, and SCLC.
| Label | Test set | ROC AUC w/ RNN | ROC AUC w/ max-pool |
|---|---|---|---|
| ADC | Kyushu Medical Centre (TBLB) | 0.964 (0.942–0.978) | 0.922 (0.901–0.944) |
| ADC | Kyushu Medical Centre (TBLB-indeterminate) | 0.993 (0.971–1.0) | 0.814 (0.684–0.891) |
| ADC | Kyushu Medical Centre (surgical) | 0.975 (0.95–0.995) | 0.97 (0.954–0.984) |
| ADC | Mita Hospital (surgical) | 0.974 (0.951–0.993) | 0.987 (0.978–0.995) |
| ADC | TCGA (surgical) | 0.94 (0.923–0.952) | 0.822 (0.798–0.848) |
| SCC | Kyushu Medical Centre (TBLB) | 0.968 (0.941–0.99) | 0.974 (0.959–0.987) |
| SCC | Kyushu Medical Centre (TBLB-indeterminate) | 0.996 (0.981–1.0) | 0.989 (0.957–1.0) |
| SCC | Kyushu Medical Centre (surgical) | 0.974 (0.937–0.994) | 0.985 (0.975–0.994) |
| SCC | Mita Hospital (surgical) | 0.981 (0.966–0.993) | 0.979 (0.965–0.994) |
| SCC | TCGA (surgical) | 0.961 (0.944–0.976) | 0.959 (0.944–0.97) |
| SCLC | Kyushu Medical Centre (TBLB) | 0.995 (0.99–0.999) | 0.994 (0.998–0.999) |
| SCLC | Kyushu Medical Centre (surgical) | 0.996 (0.991–1.0) | 0.995 (0.991–1.0) |
| SCLC | Mita Hospital (surgical) | 0.999 (0.993–1.0) | 0.999 (0.992–1.0) |
| Neoplastic | Kyushu Medical Centre (TBLB) | 0.979 (0.968–0.988) | 0.992 (0.987–0.997) |
| Neoplastic | Kyushu Medical Centre (surgical) | 0.978 (0.967–0.989) | 0.988 (0.979–0.995) |
| Neoplastic | Mita Hospital (surgical) | 0.983 (0.974–0.99) | 0.995 (0.991–0.999) |
| Neoplastic | TCGA (surgical) | 0.963 (0.947–0.975) | 0.983 (0.976–0.99) |
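To make the max-pooling column concrete, the sketch below aggregates tile-level probabilities into one WSI-level score by taking the maximum, then computes a ROC AUC with an interval. Several details are assumptions rather than the authors' stated procedure: this excerpt does not say how the intervals in parentheses were obtained, so a percentile bootstrap over WSIs is used here, and the tile probabilities are toy values made up for illustration. The function names are hypothetical.

```python
import random


def max_pool_wsi_score(tile_probs):
    """WSI-level score for one label: the highest tile probability."""
    return max(tile_probs)


def roc_auc(labels, scores):
    """ROC AUC via the rank (Mann-Whitney) formulation; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample WSIs
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    low = aucs[int((alpha / 2) * n_boot)]
    high = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return low, high


# Toy tile probabilities for the ADC label, one inner list per WSI.
tile_probs_per_wsi = [
    [0.1, 0.9, 0.3],
    [0.2, 0.1],
    [0.7, 0.6],
    [0.05, 0.75],
    [0.4, 0.8],
    [0.1, 0.2],
]
is_adc = [1, 0, 1, 0, 1, 0]  # 1 = WSI diagnosed as ADC

wsi_scores = [max_pool_wsi_score(t) for t in tile_probs_per_wsi]
auc = roc_auc(is_adc, wsi_scores)
ci_low, ci_high = bootstrap_ci(is_adc, wsi_scores, n_boot=200)
```

Because max-pooling lets a single high-scoring tile decide the WSI score, one spurious tile in a negative slide (the fourth toy WSI here) is enough to lower the AUC. That sensitivity may be one reason an aggregator that reads all tiles, such as the RNN, can behave differently on some test sets.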
Figure 2. ROC curves for the five test sets for each output label: (a) ADC, (b) SCC, (c) SCLC. The neoplastic label (d) is a grouping of ADC, SCC, and SCLC and effectively evaluates the classification of carcinoma regardless of subtype.
Detailed IHC results, surgical diagnoses, and AI predictions for the 83 cases in the indeterminate test set.
| Case No. | Immunohistochemistry (IHC) | Surgical specimen diagnosis | TBLB-final diagnosis | TBLB-AI prediction |
|---|---|---|---|---|
| ADC-001 | | ADC | ADC | ADC |
| ADC-002 | | ADC | ADC | ADC |
| ADC-003 | TTF1 (+), Napsin-A (-), p40 (-), CK5/6 (-) | ADC | ADC | ADC |
| ADC-004 | | ADC | ADC | ADC |
| ADC-005 | TTF1 (+) | No surgery | ADC | ADC |
| ADC-006 | | ADC | ADC | ADC |
| ADC-007 | | ADC | ADC | ADC |
| ADC-008 | | ADC | ADC | ADC |
| ADC-009 | TTF1 (+), p40 (-) | No surgery | ADC | |
| ADC-010 | | ADC | ADC | ADC |
| ADC-011 | | ADC | ADC | ADC |
| ADC-012 | | ADC | ADC | ADC |
| ADC-013 | | ADC | ADC | ADC |
| ADC-014 | TTF1 (+) | No surgery | ADC | ADC |
| ADC-015 | | ADC | ADC | ADC |
| ADC-016 | TTF1 (+), Napsin-A (+), p40 (-), CK5/6 (-) | No surgery | ADC | ADC |
| ADC-017 | | ADC | ADC | ADC |
| ADC-018 | TTF1 (+), p40 (-) | No surgery | ADC | ADC |
| ADC-019 | | ADC | ADC | ADC |
| ADC-020 | TTF1 (+), p40 (-), CK5/6 (-) | No surgery | ADC | ADC |
| ADC-021 | TTF1 (+), CK20 (-), p63 (-), Uroplakin II (-), Thrombomodulin (-) | ADC | ADC | ADC |
| ADC-022 | | ADC | ADC | ADC |
| ADC-023 | | ADC | ADC | ADC |
| ADC-024 | | ADC | ADC | ADC |
| ADC-025 | TTF1 (+), Napsin-A (+), p40 (-), CK5/6 (-) | No surgery | ADC | ADC |
| ADC-026 | | ADC | ADC | ADC |
| ADC-027 | | ADC | ADC | ADC |
| ADC-028 | TTF1 (+), Napsin-A (+), p40 (-), CK5/6 (-) | No surgery | ADC | ADC |
| ADC-029 | TTF1 (+), SP-A (-), p40 (-), CK5/6 (-) | No surgery | ADC | |
| ADC-030 | | ADC | ADC | ADC |
| ADC-031 | | ADC | ADC | ADC |
| ADC-032 | | ADC | ADC | ADC |
| ADC-033 | | ADC | ADC | ADC |
| ADC-034 | | ADC | ADC | ADC |
| ADC-035 | | ADC | ADC | ADC |
| ADC-036 | TTF1 (+), CEA (+), SP-A (-), p40 (-), CK5/6 (-), p63 (-) | No surgery | ADC | ADC |
| ADC-037 | | ADC | ADC | ADC |
| ADC-038 | | ADC | ADC | ADC |
| ADC-039 | | ADC | ADC | ADC |
| ADC-040 | | ADC | ADC | ADC |
| ADC-041 | | ADC | ADC | ADC |
| ADC-042 | | ADC | ADC | ADC |
| ADC-043 | | ADC | ADC | ADC |
| ADC-044 | TTF1 (+), Napsin-A (+) | No surgery | ADC | ADC |
| ADC-045 | TTF1 (+), SP-A (+) | No surgery | ADC | ADC |
| ADC-046 | TTF1 (+), SP-A (+), CEA (+), CK5/6 (-), p40 (-), p63 (-) | No surgery | ADC | ADC |
| ADC-047 | TTF1 (+), SP-A (+), CEA (+), CK5/6 (-), p40 (-), p63 (-) | ADC | ADC | ADC |
| ADC-048 | | ADC | ADC | ADC |
| ADC-049 | CEA (+), CK5/6 (-), p40 (-), p63 (-) | No surgery | ADC | ADC |
| ADC-050 | | ADC | ADC | ADC |
| ADC-051 | CEA (+), CK5/6 (-), p40 (-), p63 (-) | ADC | ADC | ADC |
| ADC-052 | | ADC | ADC | ADC |
| ADC-053 | CK7 (+), TTF1 (+), SP-A (+), MUC1 (+) | ADC | ADC | ADC |
| ADC-054 | TTF1 (+), MUC1 (+), MUC2 (-), MUC5AC (+), MUC6 (+) | ADC | ADC | ADC |
| ADC-055 | | ADC | ADC | ADC |
| ADC-056 | | ADC | ADC | ADC |
| ADC-057 | | ADC | ADC | ADC |
| ADC-058 | | ADC | ADC | ADC |
| ADC-059 | AE1/AE3 (+), TTF1 (+), Vimentin (-), LCA (-) | ADC | ADC | ADC |
| ADC-060 | | ADC | ADC | ADC |
| ADC-061 | | ADC | ADC | ADC |
| ADC-062 | | ADC | ADC | ADC |
| ADC-063 | | ADC | ADC | ADC |
| ADC-064 | | ADC | ADC | ADC |
| SCC-001 | TTF1 (-), SP-A (-), CK5/6 (+), p63 (+), p40 (+), CEA (-) | SCC | SCC | SCC |
| SCC-002 | TTF1 (-), SP-A (-), CK5/6 (+), p63 (+), p40 (+), CEA (+), Involucrin (+) | No surgery | SCC | SCC |
| SCC-003 | | SCC | SCC | SCC |
| SCC-004 | | SCC | SCC | SCC |
| SCC-005 | | SCC | SCC | SCC |
| SCC-006 | CK5/6 (+), p63 (+), p40 (+), CEA (-), CD56 (-), Synaptophysin (-), Chromogranin A (-) | No surgery | SCC | SCC |
| SCC-007 | CK5/6 (+), CK7 (+), p63 (+), TTF1 (-), SP-A (-) | No surgery | SCC | SCC |
| SCC-008 | | SCC | SCC | SCC |
| SCC-009 | CK5/6 (+), p63 (+), CEA (+), Involucrin (+), TTF1 (-) | No surgery | SCC | SCC |
| SCC-010 | | SCC | SCC | SCC |
| SCC-011 | | SCC | SCC | SCC |
| SCC-012 | CK14 (+), CK7 (+), CK5/6 (+), p63 (+), TTF1 (-), SP-A (-), ER (-), PgR (-) | No surgery | SCC | SCC |
| SCC-013 | | SCC | SCC | SCC |
| SCC-014 | | SCC | SCC | SCC |
| SCC-015 | TTF1 (-), SP-A (-), p63 (+), CK7 (-) | SCC | SCC | SCC |
| SCC-016 | | SCC | SCC | SCC |
| SCC-017 | | SCC | SCC | SCC |
| SCC-018 | TTF1 (-) | No surgery | SCC | SCC |
| SCC-019 | TTF1 (-) | No surgery | SCC | SCC |
Figure 3. (A) shows a true positive ADC case (ADC-046) from the indeterminate TBLB test set. Heatmap images (a) and (c) show true positive predictions of ADC cells and correspond to (b) and (d), respectively. The high-magnification subimages (e) and (f) show spindle-shaped and poorly differentiated morphology. Pathologists found it challenging to distinguish between ADC and SCC based on H&E histology alone. (B) shows a case (ADC-009) that was predicted as indeterminate, with the model showing strong predictions for both ADC and SCC. The areas in (g) and (j) almost overlap; the tissue there is poorly differentiated, and it is impossible to decide between ADC and SCC based on the histology. (k) and (l) have morphologies similar to poorly differentiated SCC, and (h) and (i) have morphologies similar to poorly differentiated ADC; the model strongly predicted them as such. In the heatmap colour spectrum, red indicates high probability and blue indicates low probability.
Figure 4. The true diagnosis of this case (ADC-029) is ADC; however, it was predicted as SCC. (a) shows the probability heatmap for SCC. The ADC cells highlighted in (d) and (e), taken from fragment (b), float as single cells within necrotic tissue, which is potentially the source of confusion for the model. (c) shows non-neoplastic necrotic tissue without any cancer cells. In the heatmap colour spectrum, red indicates high probability and blue indicates low probability.
Figure 5. (a–c) show representative surgical serial sections for ADC (#1–#9), SCC (#1–#8), and SCLC (#1–#8), respectively, and their associated diagnosis (D) and prediction (P) by our model. (a) #5 is a false negative prediction, while all the rest are true positives. (d–f) show representative true positive probability heatmaps for ADC (d), SCC (e), and SCLC (f), respectively. Histopathologically, all the detected areas correspond to cancer cells. In the heatmap colour spectrum, red indicates high probability and blue indicates low probability.