Michael Nalisnik, Mohamed Amgad, Sanghoon Lee, Sameer H Halani, Jose Enrique Velazquez Vega, Daniel J Brat, David A Gutman, Lee A D Cooper.
Abstract
Whole-slide imaging of histologic sections captures tissue microenvironments and cytologic details in expansive high-resolution images. These images can be mined to extract quantitative features that describe tissues, yielding measurements for hundreds of millions of histologic objects. A central challenge in utilizing this data is enabling investigators to train and evaluate classification rules for identifying objects related to processes like angiogenesis or immune response. In this paper we describe HistomicsML, an interactive machine-learning system for digital pathology imaging datasets. This framework uses active learning to direct user feedback, making classifier training efficient and scalable in datasets containing 10^8+ histologic objects. We demonstrate how this system can be used to phenotype microvascular structures in gliomas to predict survival, and to explore the molecular pathways associated with these phenotypes. Our approach enables researchers to unlock phenotypic information from digital pathology datasets to investigate prognostic image biomarkers and genotype-phenotype associations.
Year: 2017 PMID: 29109450 PMCID: PMC5674015 DOI: 10.1038/s41598-017-15092-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. An interactive machine-learning framework for phenotyping histology images. Digitized whole-slide images of tissue sections can be analyzed to extract features describing the shape, texture and staining characteristics of histologic structures like cell nuclei. We created a software framework that enables experts to identify important histologic elements like tumor infiltrating lymphocytes or vascular endothelial cells in these images through interactive training of machine learning classifiers. A browser-based interface provides point-and-click interaction with datasets containing 10^8+ objects for training classification rules. A multi-CPU server manages the images and boundary and feature data and provides the computational power for visualization and analysis. Classifications generated with this framework can be used to describe the phenotypes associated with cancer-related processes like angiogenesis and lymphocytic infiltration, and to investigate phenotype-genotype associations and phenotypic prognostic biomarkers.
Figure 2. Active learning for efficient classification rule training. (A) (Left) A classification rule aims to learn an unknown decision boundary (black) that separates classes of objects in feature space. A margin (gray) surrounding this boundary contains objects with low prediction confidence that are difficult for the rule to classify. (Center) Instance-based learning presents unlabeled low-confidence objects to the user for labeling. (Right) Retraining the classification rule with these labels shrinks the margin towards the decision boundary, improving classification accuracy. (B) Heatmap-based learning directs users to image regions that are enriched with low-confidence objects for labeling. (Top) Correcting prediction errors (yellow) in low-confidence regions (red) and retraining reduces the number of low-confidence objects. (Bottom) Classification rule specificity is improved by retraining. Here the heatmaps indicate the density of cells positively classified as lymphocytes before and after retraining. (C) Active learning is an iterative process: the user first labels objects guided by active learning, then the classification rule is retrained and applied to the entire dataset, and lastly new instances and heatmaps are generated.
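The instance-based selection step in panel (A) can be illustrated with a minimal sketch. It assumes a binary classifier that outputs a probability per object and treats "low confidence" as a probability close to 0.5. The function name `select_low_confidence` and the example values are hypothetical and are not taken from HistomicsML.

```python
def select_low_confidence(probabilities, k):
    """Return the indices of the k objects whose predicted probability
    is closest to 0.5, i.e. the objects nearest the decision boundary.

    Assumption: 'probabilities' holds the positive-class probability
    of each unlabeled object under the current classification rule.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Made-up probabilities for five unlabeled objects.
probs = [0.95, 0.52, 0.10, 0.45, 0.70]
queries = select_low_confidence(probs, 2)
print(queries)  # indices of the two least confident objects
```

In an iterative loop like the one in panel (C), the selected objects would be shown to the user, their labels added to the training set, and the rule retrained before the next selection.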
Figure 3. Classifying vascular endothelial cells in gliomas. (A) We used active learning to train a classification rule to identify vascular endothelial cell nuclei in lower-grade gliomas (highlighted in green). (B) Prediction rule accuracy was evaluated using area-under-curve (AUC) analysis. (C) AUC was evaluated at each training iteration to measure improvement in prediction accuracy. (D) For additional validation, we correlated the percentage of positively classified endothelial cells in each sample with mRNA expression levels of the endothelial marker PECAM1 using measurements from TCGA frozen specimens (image analysis measurements were performed in images of formalin-fixed paraffin embedded sections from the same specimens).
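As a rough illustration of the AUC analysis in panel (B), the sketch below computes the area under the ROC curve from its pairwise definition: the fraction of (positive, negative) pairs in which the positive object receives the higher score, with ties counted as one half. The labels and scores are invented for the example, and this is not necessarily how the authors computed AUC.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive objects, 0 for negative objects.
    scores: classifier scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up validation labels and scores.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise form is quadratic in the number of objects, so it is suitable only for small validation sets. A rank-based implementation gives the same value more efficiently.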
Figure 4. Quantitative phenotyping of microvasculature in gliomas. Microvascular structures undergo visually apparent changes in response to signaling within the tumor microenvironment. (A) We measured nuclear hypertrophy using a nonlinear curve to model the continuum of vascular endothelial cell nuclei (VECN) morphologies. A hypertrophy index (HI) was calculated for each patient to measure the extremity of VECN nuclear hypertrophy score values. (B) We validated nuclear scores using nuclei that were manually labeled as hypertrophic/non-hypertrophic. (C) Examples of cell nuclei used in validation. (D) We implemented a clustering index (CI) to measure the spatial clustering of VECN as a readout of hyperplasia. CI measures the average number of VECN within a 50-micron radius of each VECN in a sample. (E) CI was compared to manual assessments of hyperplasia as multi-layered/not layered (red circles indicate the examples shown in F). (F) Example microvascular structures from two of the slides used in validating CI.
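The clustering index described in panel (D) can be sketched as follows. The sketch assumes nuclear centroids are given as (x, y) coordinates in microns, uses Euclidean distance, and does not count a nucleus as its own neighbor. These details, the function name `clustering_index`, and the example coordinates are assumptions and are not specified in the caption.

```python
import math


def clustering_index(centroids, radius=50.0):
    """Average number of other nuclei within 'radius' microns of each nucleus.

    centroids: list of (x, y) positions in microns (assumed units).
    A nucleus is not counted as its own neighbor (an assumption).
    """
    n = len(centroids)
    if n == 0:
        return 0.0
    total_neighbors = 0
    for i, (xi, yi) in enumerate(centroids):
        for j, (xj, yj) in enumerate(centroids):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                total_neighbors += 1
    return total_neighbors / n


# Three nuclei spaced 30 microns apart plus one isolated nucleus.
points = [(0.0, 0.0), (30.0, 0.0), (60.0, 0.0), (500.0, 500.0)]
print(clustering_index(points))  # 1.0
```

The double loop is quadratic in the number of nuclei. For whole-slide data a spatial index such as a k-d tree would be the more practical choice.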
Figure 5. Predicting survival of glioma patients with microvasculature phenotypes. (A) HI and CI were compared with important clinical metrics including WHO Grade and molecular subtype. (B) We trained Cox proportional hazards models using combinations of phenotypic and clinical predictors to assess their prognostic relevance and independence. Models were trained and evaluated using 100 randomizations of samples to training/testing sets. The dashed line represents the c-index corresponding to molecular subtype in this cohort. (C) We compared the accuracy of models based on HI and CI generated using a classifier trained with active learning (red) with HI and CI generated using a standard classifier trained without active learning (purple).
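The c-index used to score the survival models in panel (B) can be sketched with Harrell's pairwise definition. A pair of patients is comparable when the patient with the shorter follow-up time had an observed event, and it is concordant when that patient also has the higher predicted risk. The data below are invented, and the caption does not state which c-index variant the authors used.

```python
def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if censored.
    risks:  predicted risk per patient, higher meaning worse prognosis.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Made-up cohort of four patients, the third one censored.
c = concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1])
print(c)  # 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.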
Molecular pathways enriched with phenotype-correlated transcripts.
| Pathway Group | Pathway name | Leading-edge genes | Subtype/metric (directionality) | Nominal p-value (FDR q-value) |
|---|---|---|---|---|
| Classical angiogenesis pathways | *HIF1-alpha transcription factor | | Oligodendroglioma/HI (+) | 0.033 (0.179) |
| | | | Oligodendroglioma/CI (−) | <0.001 (0.116) |
| | HIF2-alpha transcription factor | | IDHwt-astrocytoma/CI (+) | 0.004 (0.017) |
| | | | Oligodendroglioma/CI (+) | 0.024 (0.116) |
| | VEGFR1/2 mediated signaling | | IDHwt-astrocytoma/HI (+) | 0.012 (0.144) |
| | | | Oligodendroglioma/CI (+) | 0.009 (0.12) |
| | *VEGFR1 specific signals | | IDHwt-astrocytoma/HI (+) | 0.007 (0.19) |
| | Angiopoietin receptor TIE-2 mediated signaling | | IDHwt-astrocytoma/HI (+) | 0.014 (0.147) |
| | | | IDHwt-astrocytoma/CI (+) | 0.015 (0.063) |
| | | | Oligodendroglioma/CI (+) | 0.009 (0.087) |
| | *PDGFRA signaling | | Oligodendroglioma/HI (+) | 0.014 (0.08) |
| Developmental signaling pathways | Notch signaling network | | IDHwt-astrocytoma/CI (+) | <0.001 (<0.001) |
| | | | Oligodendroglioma/CI (+) | 0.02 (0.146) |
| | *Notch mediated HES/HEY network | | IDHwt-astrocytoma/HI (+) | 0.007 (0.143) |
| | | | IDHwt-astrocytoma/CI (+) | <0.001 (<0.001) |
| | *WNT signaling | | Oligodendroglioma/CI (+) | 0.021 (0.085) |
| | *Regulation of nuclear beta catenin signaling | | Oligodendroglioma/CI (+) | 0.004 (0.054) |
| | *GLI-mediated Hedgehog signaling | | IDHwt-astrocytoma/CI (+) | 0.015 (0.063) |
| Other pathways | *Regulation of SMAD2/SMAD3 signaling | | IDHwt-astrocytoma/HI (+) | 0.002 (0.008) |
| | | | IDHwt-astrocytoma/CI (+) | 0.024 (0.139) |
| | *SMAD2/SMAD3 nuclear signaling | | IDHwt-astrocytoma/CI (+) | <0.001 (<0.001) |
| | *FOXM1 transcription factor network | | Oligodendroglioma/CI (+) | <0.001 (<0.001) |
Gene set enrichment analysis of the correlations between HI/CI and gene expression identified multiple pathways associated with gliomas and vascularization. Many of the significantly enriched pathways are specific to one molecular glioma subtype. Extended results are presented in Table S2.
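To make the enrichment analysis more concrete, the sketch below computes a running-sum enrichment score for one gene set over a list of genes ranked by their correlation with a phenotype metric such as HI or CI. It uses the simple unweighted (Kolmogorov-Smirnov style) form of the statistic, which is an assumption; the weighting, permutation testing, and FDR correction behind the p- and q-values in the table are not shown. Gene names and correlations are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    ranked_genes: genes ordered from most positively to most negatively
                  correlated with the phenotype metric.
    gene_set:     set of genes belonging to the pathway.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up on pathway genes, step down on all other genes.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder correlations between a phenotype metric and six genes.
correlations = {"g1": 0.8, "g2": 0.6, "g3": 0.1, "g4": -0.2, "g5": -0.5, "g6": -0.7}
ranked = sorted(correlations, key=correlations.get, reverse=True)
print(enrichment_score(ranked, {"g1", "g2"}))  # 1.0
```

A positive score indicates that the pathway genes concentrate among positively correlated transcripts and a negative score that they concentrate among negatively correlated ones, which corresponds to the (+) and (−) directionality column.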