Pegah Khosravi¹, Ehsan Kazemi², Marcin Imielinski³, Olivier Elemento⁴, Iman Hajirasouliha⁴.
Abstract
Pathological evaluation of tumor tissue is pivotal for diagnosis in cancer patients, and automated image analysis approaches have great potential to increase the precision of diagnosis and help reduce human error. In this study, we utilize several computational methods based on convolutional neural networks (CNN) and build a stand-alone pipeline to effectively classify different histopathology images across different types of cancer. In particular, we demonstrate the utility of our pipeline to discriminate between two subtypes of lung cancer, four biomarkers of bladder cancer, and five biomarkers of breast cancer. In addition, we apply our pipeline to discriminate among four immunohistochemistry (IHC) staining scores of bladder and breast cancers. Our classification pipeline includes a basic CNN architecture, Google's Inception architectures with three training strategies, and an ensemble of two state-of-the-art algorithms, Inception and ResNet. The training strategies are training only the last layer of Google's Inception networks, training the network from scratch, and fine-tuning the parameters for our data using two pre-trained versions of Google's Inception architectures, Inception-V1 and Inception-V3. We demonstrate the power of deep learning approaches for identifying cancer subtypes, and the robustness of Google's Inception networks even in the presence of extensive tumor heterogeneity. On average, our pipeline achieved accuracies of 100%, 92%, 95%, and 69% for discrimination of various cancer tissues, subtypes, biomarkers, and scores, respectively. Our pipeline and related documentation are freely available at https://github.com/ih-_lab/CNN_Smoothie.
Keywords: Biomarkers; Classification; Convolutional Neural Network; Deep learning; Digital pathology imaging; Tumor heterogeneity
Year: 2017 PMID: 29292031 PMCID: PMC5828543 DOI: 10.1016/j.ebiom.2017.12.026
Source DB: PubMed Journal: EBioMedicine ISSN: 2352-3964 Impact factor: 8.143
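The abstract distinguishes three ways of training the Inception networks. They differ in where the initial weights come from and which layers are updated. The sketch below illustrates only that distinction; the helper `plan_training`, the `TrainingPlan` type, and the strategy names are hypothetical and are not CNN_Smoothie's API, and nothing is actually trained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingPlan:
    init_weights: str        # "pretrained" or "random"
    trainable_layers: tuple  # indices of layers updated during training


def plan_training(strategy: str, n_layers: int) -> TrainingPlan:
    """Hypothetical helper: describe one training strategy for an n_layers network.

    Layer n_layers - 1 is assumed to be the final classification layer.
    """
    all_layers = tuple(range(n_layers))
    if strategy == "last_layer":
        # Pretrained feature extractor stays frozen; only the classifier is retrained.
        return TrainingPlan("pretrained", (n_layers - 1,))
    if strategy == "fine_tune":
        # Start from pretrained weights and let every layer adapt to the new images.
        return TrainingPlan("pretrained", all_layers)
    if strategy == "scratch":
        # Random initialisation; all weights are learned from the new data alone.
        return TrainingPlan("random", all_layers)
    raise ValueError(f"unknown strategy: {strategy!r}")


for name in ("last_layer", "fine_tune", "scratch"):
    print(name, plan_training(name, 5))
```

In a real deep learning framework, the "last layer" plan corresponds to freezing the pretrained feature extractor and fitting a new classifier on top, while "fine-tune" unfreezes all layers, typically with a small learning rate.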
Eight datasets are selected to assess the performance of the pipeline across different conditions.
| No. | Dataset | Description | Classification task | Dataset size | Class sizes |
|---|---|---|---|---|---|
| 1 | BladderBreastLung | H&E-stained images for bladder, breast and lung cancers | Discrimination of different tissues of cancer (bladder, breast, and lung) | 3 classes and 1918 images | Bladder: 543, breast: 962, lung: 413 |
| 2 | BladderBiomarkers | IHC-stained images of cancer biomarkers comprising GATA3, CK14, S100P, and S0084 in bladder cancer | Discrimination of different types of biomarkers (GATA3, CK14, S100P, and S0084) | 4 classes and 2139 images | GATA3: 542, CK14: 514, S100P: 544, S0084: 539 |
| 3 | BreastBiomarkers | IHC-stained images of cancer biomarkers including ER, CK17, CK5/6, EGFR, and HER2 in breast cancer | Discrimination of different types of biomarkers (ER, CK17, CK5/6, EGFR, and HER2) | 5 classes and 2542 images | ER: 637, CK17: 639, CK5/6: 635, EGFR: 307, HER2: 324 |
| 4 | TMAD-InterHeterogeneity | H&E- and IHC-stained whole-slide images of adenocarcinoma and squamous cell lung cancers | Discrimination of different subtypes of cancer (adenocarcinoma vs. squamous cell lung tumors) for TMAD images | 2 classes and 860 images (H&E: 572, IHC: 288) | Adenocarcinoma: 637, squamous cell: 223 |
| 5 | TCGA-IntraHeterogeneity | H&E-stained high-resolution image patches of adenocarcinoma and squamous cell lung tissues | Discrimination of different subtypes of cancer (adenocarcinoma vs. squamous cell lung tumors) within high-resolution image patches of TCGA images | 2 classes and 1629 images | Adenocarcinoma: 845, squamous cell: 784 |
| 6 | TCGA-InterHeterogeneity | H&E-stained whole-slide images of adenocarcinoma and squamous cell lung tissues | Discrimination of different subtypes of cancer (adenocarcinoma vs. squamous cell lung tumors) within whole-slide images of TCGA images | 2 classes and 1520 images | Adenocarcinoma: 761, squamous cell: 759 |
| 7 | BladderScores | IHC-stained images with various staining scores comprising Score 0, Score 1, Score 2, and Score 3 in bladder cancer | Discrimination of different staining scores (Score 0, Score 1, Score 2, and Score 3) of biomarkers | 4 classes and 2137 images | Score 0: 680, Score 1: 235, Score 2: 284, Score 3: 938 |
| 8 | BreastScores | IHC-stained images with various staining scores including Score 0, Score 1, Score 2, and Score 3 in breast cancer | Discrimination of different staining scores (Score 0, Score 1, Score 2, and Score 3) of biomarkers | 4 classes and 2543 images | Score 0: 1817, Score 1: 263, Score 2: 184, Score 3: 279 |
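Class sizes vary considerably across these datasets; BreastScores, for example, is dominated by Score 0. A simple reference point for interpreting accuracies is the accuracy of a classifier that always predicts the largest class. The sketch below computes it over the full datasets, using class sizes copied from the table; the paper's test splits may have different class proportions, so these are only rough reference values, and `majority_baseline` is an illustrative name.

```python
# Class sizes copied from the dataset table (four of the eight datasets).
CLASS_SIZES = {
    "BladderBreastLung": {"Bladder": 543, "Breast": 962, "Lung": 413},
    "TMAD-InterHeterogeneity": {"Adenocarcinoma": 637, "Squamous cell": 223},
    "BladderScores": {"Score 0": 680, "Score 1": 235, "Score 2": 284, "Score 3": 938},
    "BreastScores": {"Score 0": 1817, "Score 1": 263, "Score 2": 184, "Score 3": 279},
}


def majority_baseline(sizes):
    """Return (largest class, accuracy of always predicting it) over the whole dataset."""
    total = sum(sizes.values())
    label = max(sizes, key=sizes.get)
    return label, sizes[label] / total


for name, sizes in CLASS_SIZES.items():
    label, acc = majority_baseline(sizes)
    print(f"{name}: always predict {label!r} -> {acc:.1%}")
# Last line printed: BreastScores: always predict 'Score 0' -> 71.5%
```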
Fig. 1. This flowchart shows the pipeline, which includes data extraction, training and evaluation of CNN algorithms, and prediction of the various classes. a: tumor image preparation from biopsy samples; b: extraction of biopsy-derived tissue slides from the TMA and TCGA databases; c: analysis of images using CNN_Smoothie; d: evaluation of the algorithms' performance and annotation of the output results.
Results of six deep learning architectures on the various datasets using ARC. The numbers are measured based on TNu and FNu and represent accuracy percentages. Bold font indicates the best classification accuracy on each dataset.
| Algorithms | Tumor tissue discrimination | Bladder biomarker discrimination | Breast biomarker discrimination | Lung tumor subtype discrimination (TMAD images) |
|---|---|---|---|---|
| CNN-basic | 100% | 71.5% | 79.2% | 73% |
| Inception V3-Last layer-4000 steps | 99.3% | 86% | 75.6% | 80% |
| Inception V3-Last layer-12000 steps | 99.3% | 87.5% | 76.8% | 78% |
| Inception V1-Fine tune | 90% | | | |
| Inception-ResNet V2-Last layer | 96.6% | 85.5% | 78.4% | 75% |
| Inception V3-Fine tune | 98% | 90% | | |

| Algorithms | Lung tumor subtype discrimination (TCGA intra-images) | Lung tumor subtype discrimination (TCGA inter-images) | Score discrimination in bladder | Score discrimination in breast |
|---|---|---|---|---|
| CNN-basic | 68% | 61% | 47% | 40.5% |
| Inception V3-Last layer-4000 steps | 84% | 70% | 64.5% | |
| Inception V3-Last layer-12000 steps | 80% | 70% | 64.5% | 59.5% |
| Inception V1-Fine tune | 56% | | | |
| Inception-ResNet V2-Last layer | 84% | 66% | 58% | 45.5% |
| Inception V3-Fine tune | 79% | 76% | 56% | |
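The caption says the accuracies are measured from TNu and FNu but does not define them. Assuming TNu counts the correctly classified test images and FNu the misclassified ones (an interpretation, not confirmed by the source), each entry would be accuracy = TNu / (TNu + FNu) × 100. A minimal sketch with invented counts; the function name is illustrative:

```python
def accuracy_percent(tnu: int, fnu: int) -> float:
    """Accuracy in percent, assuming TNu = correctly and FNu = incorrectly classified images."""
    total = tnu + fnu
    if total == 0:
        raise ValueError("empty test set")
    return 100.0 * tnu / total


# Hypothetical counts for a 200-image test set (not taken from the paper).
print(accuracy_percent(179, 21))  # 89.5
```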
Fig. 2. Low accuracy may be associated with human errors in the labeling of IHC scores. For example, images a, b, c, and d were labeled score 3 by pathologists, while the algorithm (Inception-V1) classified them as scores 3, 3, 1, and 0, respectively. In particular, images e and g were both labeled score 0 by pathologists; however, the algorithm correctly classified them as scores 0 and 1, respectively. Finally, images f and h were labeled score 2 by pathologists, while the algorithm classified them as scores 2 and 3, respectively. Closer manual inspection of the images indicates that the algorithm's results are indeed more reliable. Probability scores highlighted in green and orange indicate concordance and discordance, respectively, between the algorithm's classification and the pathologist's labeling.
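Assigning a score from the network's output and flagging agreement with the pathologist amounts to taking the class with the highest probability and comparing it with the label. A sketch under that assumption; the probability vectors are made up (not the values shown in the figure), and `classify` and `concordance` are illustrative names.

```python
SCORES = ("Score 0", "Score 1", "Score 2", "Score 3")


def classify(probs):
    """Return the IHC score whose predicted probability is highest."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return SCORES[best]


def concordance(probs, pathologist_label):
    """Return (predicted score, True if it agrees with the pathologist's label)."""
    predicted = classify(probs)
    return predicted, predicted == pathologist_label


# Hypothetical softmax outputs for two images labeled "Score 3" by a pathologist.
print(concordance([0.05, 0.10, 0.15, 0.70], "Score 3"))  # ('Score 3', True)
print(concordance([0.60, 0.25, 0.10, 0.05], "Score 3"))  # ('Score 0', False)
```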
Fig. 3. Intra- and inter-tumor heterogeneity. The figure shows squamous cell lung cancer on the left (A) and lung adenocarcinoma on the right (B). The top images (A and B) are whole-slide images, and the bottom images are the high-resolution patches extracted from the TCGA datasets. Red boxes mark the patches used to train the algorithms, and blue boxes mark the patches in the test set.
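In the figure, the patches cut from a slide are divided into a training set (red) and a test set (blue). A hypothetical sketch of that idea: tile a slide into non-overlapping square patches, then randomly assign a fraction of them to the test set. The image size, patch size, test fraction, seed, and function names are arbitrary assumptions, not the extraction settings used for the TCGA data.

```python
import random


def tile_coords(width, height, patch):
    """Top-left (x, y) of every full, non-overlapping patch x patch tile.

    Partial tiles at the right and bottom borders are dropped.
    """
    return [(x, y)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]


def split_patches(coords, test_fraction=0.25, seed=0):
    """Shuffle tile coordinates reproducibly and split them into (train, test) lists."""
    shuffled = list(coords)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


coords = tile_coords(2048, 1024, 512)  # 4 x 2 = 8 tiles
train, test = split_patches(coords)
print(len(coords), len(train), len(test))  # 8 6 2
```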
Fig. 4. Optimal number of training steps for Inception-ResNet (last-layer training), Inception-V1 (fine-tuning all layers), and Inception-V3 (fine-tuning all layers) to reach the highest accuracy on the BladderBiomarkers dataset.
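Choosing the "optimal number of steps" amounts to evaluating checkpoints saved at several step counts and keeping the one with the highest accuracy. Ideally that choice is made on a validation split separate from the final test set, so that the reported accuracy is not optimistically biased. A sketch with hypothetical checkpoint accuracies (not the values plotted in the figure); `best_step` is an illustrative name.

```python
def best_step(history):
    """Return the (step, accuracy) pair with the highest accuracy; the earliest step wins ties."""
    return max(history, key=lambda item: (item[1], -item[0]))


# Hypothetical (training step, accuracy) checkpoints.
history = [(1000, 0.78), (2000, 0.84), (4000, 0.86), (8000, 0.86), (12000, 0.85)]
print(best_step(history))  # (4000, 0.86)
```

Preferring the earlier step on ties keeps the shorter, cheaper training run when extra steps bring no gain.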
Fig. 5. Inception-V1 with three different training strategies (last-layer training, fine-tuning the parameters of all layers, and training from scratch) on the BreastBiomarkers dataset.
Fig. 6. Receiver operating characteristic (ROC) curve for the TCGA-InterHeterogeneity dataset.
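An ROC curve plots the true positive rate against the false positive rate as the decision threshold on the predicted probability is lowered. The AUC is the area under that curve, and the Youden index J reported alongside it in the table below is the largest TPR − FPR over all thresholds. A minimal pure-Python sketch, assuming binary labels (1 = positive class, for example LUSC) and invented scores; it is not the paper's evaluation code.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points, lowering the threshold through each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def youden_index(points):
    """Youden's J: the maximum of TPR - FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
pts = roc_points(labels, scores)
print(round(auc(pts), 4), round(youden_index(pts), 4))  # 0.8889 0.6667
```

Grouping tied scores under a single threshold makes the curve step diagonally across ties, which the trapezoidal rule integrates correctly.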
Fig. 7. Precision versus recall for the TCGA-InterHeterogeneity dataset. The 4000-step version of Inception-V3 (last-layer training) is used.
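A precision-recall curve is traced by the same threshold sweep, recording precision = TP / (TP + FP) and recall = TP / (TP + FN) at each threshold. A sketch on invented binary labels and scores; the function name is illustrative.

```python
def precision_recall_curve(labels, scores):
    """(recall, precision) pairs, one per distinct score used as the threshold."""
    pos = sum(labels)
    curve = []
    for t in sorted(set(scores), reverse=True):
        # True labels of the images predicted positive at this threshold.
        predicted = [y for y, s in zip(labels, scores) if s >= t]
        tp = sum(predicted)
        curve.append((tp / pos, tp / len(predicted)))
    return curve


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
for recall, precision in precision_recall_curve(labels, scores):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```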
Fig. 8. CNN_Smoothie successfully identifies tumor subtypes (LUAD vs. LUSC) and discriminates between them consistently and robustly across images with different color spectra. Probability scores highlighted in green indicate the classification output of Inception-V1.
Results on the TMAD-InterHeterogeneity and TCGA-InterHeterogeneity datasets using various statistical measures. Numbers in parentheses are the Youden index. Bold font indicates the better classification performance for each measure.
| Algorithms | AUC (TMA) | AUC (TCGA) | Precision (TMA) | Precision (TCGA) | Recall (TMA) | Recall (TCGA) | Cohen's kappa (TMA) | Cohen's kappa (TCGA) | Jaccard coefficient (TMA) | Jaccard coefficient (TCGA) | Log-loss (TMA) | Log-loss (TCGA) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CNN-basic | 0.64 (0.27) | 0.61 (0.22) | 0.71 | 0.62 | 0.73 | 0.61 | 0.30 | 0.22 | 0.73 | 0.61 | 1.34 | 1.4 |
| Inception-V3 Last-layer 4000-steps | 0.79 (0.59) | 0.70 (0.40) | 0.81 | 0.71 | 0.80 | 0.70 | 0.56 | 0.4 | 0.80 | 0.70 | 0.45 | |
| Inception-V3 Last-layer 12000-steps | 0.76 (0.52) | 0.70 (0.40) | 0.79 | 0.70 | 0.78 | 0.70 | 0.50 | 0.4 | 0.78 | 0.70 | 0.55 | 0.64 |
| Inception-V1 Fine-tune | 0.39 | 0.66 | | | | | | | | | | |
| Inception-ResNet-V2 Last-layer | 0.68 (0.35) | 0.66 (0.32) | 0.74 | 0.68 | 0.75 | 0.66 | 0.38 | 0.32 | 0.75 | 0.66 | 0.48 | 0.63 |
| Inception-V3 Fine-tune | 0.87 (0.75) | 0.79 (0.58) | 0.90 | 0.83 | 0.90 | 0.79 | 0.76 | 0.58 | 0.90 | 0.79 | 1.16 | |
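The remaining measures in this table can be computed from per-image labels, hard predictions, and predicted probabilities. Below is a minimal pure-Python sketch of three of them, assuming binary labels (1 = positive class) and invented toy data. The source does not say which implementation produced the table; libraries such as scikit-learn offer equivalents (`cohen_kappa_score`, `jaccard_score`, `log_loss` in `sklearn.metrics`).

```python
import math


def cohens_kappa(y_true, y_pred):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_chance = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    return (p_observed - p_chance) / (1 - p_chance)  # undefined if p_chance == 1


def jaccard(y_true, y_pred, positive=1):
    """|true and predicted positives| / |true or predicted positives| for one class."""
    inter = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    union = sum(t == positive or p == positive for t, p in zip(y_true, y_pred))
    return inter / union


def log_loss(y_true, p_positive, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_positive):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 1, 0, 1, 0, 0]
p_pos = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
y_pred = [int(p >= 0.65) for p in p_pos]  # hypothetical threshold -> [1, 1, 1, 0, 0, 0]
print(round(cohens_kappa(y_true, y_pred), 4), jaccard(y_true, y_pred),
      round(log_loss(y_true, p_pos), 4))  # 0.3333 0.5 0.4629
```

Unlike the other measures, log-loss uses the predicted probabilities rather than hard labels, and lower values are better.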