Primož Godec, Matjaž Pančur, Nejc Ilenič, Andrej Čopar, Martin Stražar, Aleš Erjavec, Ajda Pretnar, Janez Demšar, Anže Starič, Marko Toplak, Lan Žagar, Jan Hartman, Hamilton Wang, Riccardo Bellazzi, Uroš Petrovič, Silvia Garagna, Maurizio Zuccotti, Dongsu Park, Gad Shaulsky, Blaž Zupan.
Abstract
Analysis of biomedical images requires computational expertise that is uncommon among biomedical scientists. Deep learning approaches for image analysis provide an opportunity to develop user-friendly tools for exploratory data analysis. Here, we use the visual programming toolbox Orange (http://orange.biolab.si) to simplify image analysis by integrating deep-learning embedding, machine learning procedures, and data visualization. Orange supports the construction of data analysis workflows by assembling components for data preprocessing, visualization, and modeling. We equipped Orange with components that use pre-trained deep convolutional networks to profile images with vectors of features. These vectors are used in image clustering and classification in a framework that enables mining of image sets for both novice and experienced users. We demonstrate the utility of the tool in image analysis of progenitor cells in mouse bone healing, identification of developmental competence in mouse oocytes, subcellular protein localization in yeast, and developmental morphology of social amoebae.
Year: 2019 PMID: 31591416 PMCID: PMC6779910 DOI: 10.1038/s41467-019-12397-x
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
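The central step described in the abstract, profiling each image with a feature vector from a pre-trained deep convolutional network, can be approximated outside Orange as well. Below is a minimal sketch using Keras's bundled InceptionV3; Orange's own Image Embedder computes embeddings on a remote server, so this local stand-in is an illustrative assumption, not the paper's exact pipeline.

import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

# Pre-trained InceptionV3 without its classification head; global average
# pooling turns the last convolutional maps into a 2048-dimensional vector.
model = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    """Return a 2048-dimensional embedding for one image file."""
    img = image.load_img(path, target_size=(299, 299))  # InceptionV3 input size
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

Such vectors, one row per image, are the input to all downstream clustering and classification in the workflows of the figures below.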
Fig. 1 Unsupervised analysis of bone healing images. a The data analysis workflow starts with importing 37 images from a local folder. The images can be viewed in the Image Viewer widget (not shown) and are passed to the Image Embedder, which was set to use Google's InceptionV3 deep network. We computed the distances between the embedded images and presented them as a dendrogram (b) with the Hierarchical Clustering widget. The clusters corresponded well to the time (days) post injury (D7 and D14), with a few exceptions. One such exception was a branch of two images highlighted in the dendrogram (b) and shown in the Image Viewer (2) (c). Image distances were also passed to the multi-dimensional scaling (MDS) widget, which likewise shows separation between bone healing samples at different times, depicted in different colors. Three representative MDS points from D7 and D14 were selected manually (data points with orange boundaries) (d) and the corresponding images are shown in the Image Viewer (1) (e). The two images highlighted in the dendrogram (b) were also passed to the MDS widget as a data subset. They are visualized as filled dots in this data projection (d) and appear close to each other because of their similarity. This figure illustrates how a biologist may explore the data after clustering: first focusing on the misclustered samples and looking at the images, and then selecting some of the best-clustered images as a point of reference for further exploration.
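A hedged sketch of the same unsupervised pipeline with SciPy and scikit-learn, assuming embed() from the sketch above and a hypothetical folder bone_healing/ whose file names encode the day post injury; the cosine metric and average linkage are illustrative choices, not necessarily the widget defaults.

import glob
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.manifold import MDS

paths = sorted(glob.glob("bone_healing/*.png"))        # hypothetical image folder
days = ["D7" if "D7" in p else "D14" for p in paths]   # assumed naming convention
vectors = np.stack([embed(p) for p in paths])          # one embedding row per image

dist = pdist(vectors, metric="cosine")                 # pairwise image distances
dendrogram(linkage(dist, method="average"), labels=days)  # analogue of panel b
plt.show()

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
xy = mds.fit_transform(squareform(dist))               # analogue of panel d
for day in ("D7", "D14"):
    mask = np.array(days) == day
    plt.scatter(xy[mask, 0], xy[mask, 1], label=day)
plt.legend()
plt.show()

Selecting points of interest in the dendrogram or the MDS plot, as the caption describes, then amounts to indexing paths with the chosen rows.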
Fig. 2 Example images considered in our pilot study encompass diverse fields in biomedicine. a Bone-fracture repair involves skeletal stem cells. The images in this example are from mice that were the progeny of a cross between mice carrying Mx1/Tomato (red), a skeletal stem/progenitor cell marker, and mice carrying αSMA-GFP or Nestin-GFP (green), which are mesenchymal cell markers. The bones were injured and images were taken in vivo at 7 and 14 days after injury, when critical events in the early repair process occur. b Chromatin organization (Hoechst staining) in the nucleus of fully grown mouse antral oocytes. Depending on their chromatin organization, oocytes are classified as surrounded nucleolus (SN) oocytes, with a ring of heterochromatin surrounding the nucleolus, or not surrounded nucleolus (NSN) oocytes, with more dispersed chromatin that does not surround the nucleolus. SN oocytes are developmentally competent, whereas NSN oocytes are incompetent [18]. c Protein localization in budding yeast: fluorescence micrographs of GFP-fusion proteins localized to the cytoplasm, endosome, or endoplasmic reticulum (er), as indicated. d Images of Dictyostelium discoideum cells at different developmental stages: streaming (STR), loose aggregate (LAG), and tight aggregate (TAG). Scale bars are 100 µm (a), 10 µm (b), 5 µm (c), or 1 mm (d). See Supplementary Note 1 for a detailed description of the image sets.
Fig. 3 Supervised data analysis of 131 mouse oocyte images with surrounded (SN) or not surrounded (NSN) chromatin organization. a The data analysis workflow first imports the data from the local directory where images are stored in respective subdirectories named SN and NSN. The Image Embedder passes the resulting data matrix to a cross-validation widget (Test and Score) that accepts a machine learning method (logistic regression) as an additional input. The Test and Score widget displays the cross-validated accuracy (area under the ROC curve (AUC), classification accuracy (CA), and the F1 score, the harmonic mean of precision and recall) (b) and sends the evaluation results to the Confusion Matrix widget (c). The Confusion Matrix widget provides information on misclassification. In this example, 65 of the 69 SN oocytes were classified correctly. Selecting this particular cell in the Confusion Matrix sends these images and their descriptors further down the workflow to an Image Viewer (d) and, as a subset of data points, to the MDS widget that performs multi-dimensional scaling (e). As in Fig. 1, the MDS widget shows a planar projection of the data points (images) and highlights, in this case, the image points selected in the Confusion Matrix. Altogether, the components of this workflow quantitatively evaluate the expected performance of machine learning models through cross-validation and support further exploration of correctly and incorrectly classified images.
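The supervised workflow of Fig. 3 maps onto a few lines of scikit-learn. The sketch below again assumes embed() from the first sketch and hypothetical SN/ and NSN/ subdirectories; the 10-fold split and scorer names are illustrative assumptions, since the caption does not state the Test and Score widget's exact settings.

import glob
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, cross_val_predict
from sklearn.metrics import confusion_matrix

paths = sorted(glob.glob("oocytes/SN/*.png")) + sorted(glob.glob("oocytes/NSN/*.png"))
y = np.array([1 if "/SN/" in p else 0 for p in paths])   # 1 = SN, 0 = NSN
X = np.stack([embed(p) for p in paths])                  # embedding matrix

clf = LogisticRegression(max_iter=1000)
scores = cross_validate(clf, X, y, cv=10,
                        scoring=["roc_auc", "accuracy", "f1"])  # AUC, CA, F1
print({k: round(v.mean(), 3) for k, v in scores.items() if k.startswith("test_")})

# Cross-validated predictions feed a confusion matrix, as in panel c.
pred = cross_val_predict(clf, X, y, cv=10)
print(confusion_matrix(y, pred))   # rows: true NSN/SN; columns: predicted

Indexing paths where y != pred then recovers the misclassified images for visual inspection, mirroring the Confusion Matrix selection described in the caption.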