Manuel Muñoz-Aguirre, Vasilis F Ntasis, Santiago Rojas, Roderic Guigó.
Abstract
The development of increasingly sophisticated methods to acquire high-resolution images has led to the generation of large collections of biomedical imaging data, including images of tissues and organs. Many of the current machine learning methods that aim to extract biological knowledge from histopathological images require several data preprocessing stages, creating an overhead before the proper analysis. Here we present PyHIST (https://github.com/manuel-munoz-aguirre/PyHIST), an easy-to-use, open-source whole slide histological image tissue segmentation and preprocessing command-line tool aimed at tile generation for machine learning applications. From a given input image, the PyHIST pipeline i) optionally rescales the image to a different resolution, ii) produces a mask for the input image which separates the background from the tissue, and iii) generates individual image tiles with tissue content.
Year: 2020 PMID: 33075075 PMCID: PMC7647117 DOI: 10.1371/journal.pcbi.1008349
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. PyHIST pipeline.
(a) The input to the pipeline is a Whole Slide Image (WSI). Within PyHIST, the user can decide to scale down the image to perform the segmentation and tile extraction at lower resolutions. The WSI shown is of a skin tissue sample (GTEX-1117F-0126) from the Genotype-Tissue Expression (GTEx) project [20]. (b) An alternative version of the input image is generated, where the tissue edges are highlighted using a Canny edge detector. A graph segmentation algorithm is employed over this image in order to generate the mask shown in (c). PyHIST extracts tiles of specific dimensions from the masked regions, and provides an overview image to inspect the output of the segmentation and masking procedure, as shown in (d), where the red lines indicate the grid generated by tiling the image at user-specified tile dimensions, while the blue crosses indicate the selected tiles meeting a certain user-specified threshold of tissue content with respect to the total area of the tile. In (e), examples of selected tiles are shown.
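The tile selection step described in (d) can be illustrated with a minimal sketch. This is not PyHIST's own code: the function name `select_tiles`, the parameter names, the binary-mask representation (1 = tissue, 0 = background), the choice to skip partial tiles at the image borders, and the use of `>=` for the threshold comparison are all assumptions made for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]  # hypothetical representation: 1 = tissue, 0 = background


def select_tiles(
    mask: Mask, tile_size: int, min_tissue_fraction: float
) -> List[Tuple[int, int]]:
    """Return (row, col) grid indices of tiles whose tissue fraction meets the threshold."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    selected = []
    # Assumption: only complete tiles are considered; partial border tiles are skipped.
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            # Count tissue pixels inside the current tile.
            tissue = sum(
                mask[y][x]
                for y in range(top, top + tile_size)
                for x in range(left, left + tile_size)
            )
            fraction = tissue / (tile_size * tile_size)
            if fraction >= min_tissue_fraction:
                selected.append((top // tile_size, left // tile_size))
    return selected


# Toy 4x4 mask (made-up data) tiled into a 2x2 grid of 2x2-pixel tiles.
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
print(select_tiles(toy_mask, tile_size=2, min_tissue_fraction=0.5))  # [(0, 0)]
```

Lowering `min_tissue_fraction` to 0.25 in this toy example would also keep the two bottom tiles, each of which contains a single tissue pixel.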
Fig 2. TCGA use case.
(a) Examples of the top 5 most accurately predicted tiles per cancer-affected tissue (rows) from the TCGA use case test set. The label above each tile shows the predicted cancer-affected tissue type (GB: glioblastoma, DC: infiltrating ductal carcinoma, AC: adenocarcinoma, CC: clear cell carcinoma, HC: hepatocellular carcinoma, MM: malignant melanoma), followed by the probability of the ground truth label. All of these tiles were correctly classified. (b) Dimensionality reduction of TCGA tiles. t-SNE was performed on the per-tile feature vectors derived from the deep learning classifier model. Each dot corresponds to an image tile.
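One plausible reading of the selection in (a) is: keep only correctly classified tiles, then rank them within each tissue type by the probability assigned to the ground truth label. The sketch below follows that reading. The function name `top_tiles_per_label`, the tuple layout, and the tile identifiers and probabilities are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout:
# (tile_id, true_label, predicted_label, probability assigned to the true label)
Prediction = Tuple[str, str, str, float]


def top_tiles_per_label(
    predictions: List[Prediction], k: int = 5
) -> Dict[str, List[str]]:
    """For each label, return up to k correctly classified tile ids, most confident first."""
    by_label = defaultdict(list)
    for tile_id, true_label, predicted_label, prob_true in predictions:
        # Keep only tiles whose prediction matches the ground truth.
        if predicted_label == true_label:
            by_label[true_label].append((prob_true, tile_id))
    return {
        label: [
            tile_id
            for _, tile_id in sorted(items, key=lambda item: item[0], reverse=True)[:k]
        ]
        for label, items in by_label.items()
    }


# Made-up predictions for illustration only.
example = [
    ("t1", "GB", "GB", 0.99),
    ("t2", "GB", "GB", 0.80),
    ("t3", "GB", "DC", 0.40),  # misclassified, so it is excluded
    ("t4", "DC", "DC", 0.95),
    ("t5", "GB", "GB", 0.90),
]
print(top_tiles_per_label(example, k=2))  # {'GB': ['t1', 't5'], 'DC': ['t4']}
```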