PyHIST: A Histological Image Segmentation Tool.

Manuel Muñoz-Aguirre^1,2, Vasilis F Ntasis^1, Santiago Rojas^3, Roderic Guigó^1,4.

Abstract

The development of increasingly sophisticated methods to acquire high-resolution images has led to the generation of large collections of biomedical imaging data, including images of tissues and organs. Many of the current machine learning methods that aim to extract biological knowledge from histopathological images require several data preprocessing stages, creating an overhead before the proper analysis. Here we present PyHIST (https://github.com/manuel-munoz-aguirre/PyHIST), an easy-to-use, open source whole slide histological image tissue segmentation and preprocessing command-line tool aimed at tile generation for machine learning applications. From a given input image, the PyHIST pipeline i) optionally rescales the image to a different resolution, ii) produces a mask for the input image which separates the background from the tissue, and iii) generates individual image tiles with tissue content.

Year: 2020 PMID: 33075075 PMCID: PMC7647117 DOI: 10.1371/journal.pcbi.1008349

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

This is a PLOS Computational Biology Software paper.

Introduction

In histopathology, Whole Slide Images (WSI) are high-resolution images of tissue sections obtained by scanning conventional glass slides [1]. Currently, these glass slides of fixed tissue samples are the preferred method in pathology laboratories around the world to make clinical diagnoses [2], notably in cancer [3]. However, the increasing automation of WSI acquisition has led to the development of computational methods to process the images with the goal of helping clinicians and pathologists in diagnosis and disease classification [4]. As an increasing number of larger WSI datasets have become available, methods have been developed for a wide array of tasks, such as the classification of breast cancer metastases, Gleason scoring for prostate cancer, tumor segmentation, nuclei detection and segmentation, bladder cancer diagnosis, and mutated gene prediction, among others [5-10]. Besides being important diagnostic tools, histopathological images capture endophenotypes (of organs and tissues) that, when correlated with molecular and cellular data on the one hand, and higher-order phenotypic traits on the other, can provide crucial information on the biological pathways that mediate between the sequence of the genome and the biological traits of the organisms (including diseases) [11]. Because of the complexity of the information typically contained in WSIs, Machine Learning (ML) methods that can infer, without prior assumptions, the relevant features that they encode are becoming the preferred analytical tools [12]. These features may be clinically relevant but challenging to spot even for expert pathologists, and thus, ML methods can prove valuable in healthcare decision-making [13]. In most ML tasks, data preprocessing remains a fundamental step.
Indeed, in the domain of histological images, there are several issues when preprocessing the data before an analysis: due to the large dimensions of WSIs, many deep learning applications have to break them down into smaller-sized square pieces called tiles [14]. Furthermore, a significant fraction of the area in a WSI is often uninformative background that is not meaningful for the majority of downstream analyses. To circumvent this, some applications apply a series of image transformations to distinguish the foreground from the background (see, for example, [15]), and perform relevant operations only over regions with tissue content. However, this process is not standardized, and customized scripts frequently have to be developed to deal with data preparation stages (see, for example, [10,15]). This is cumbersome and may introduce dataset-specific biases, which can prevent integration across multiple datasets. Currently available tools for WSI processing focus mostly on the analysis of human-interpretable features by means of nuclei segmentation, object quantification and region-of-interest annotation [16-18]; but WSI preparation into tiles for external ML applications has not yet been directly addressed. To systematize the WSI preprocessing procedure for these applications, and to streamline the data preparation stage at the initial phase of an ML project by avoiding the need to create custom image preprocessing scripts, we developed PyHIST, a command-line based pipeline to segment the regions of a histological image into tiles with relevant tissue content (foreground) with little user intervention. PyHIST was developed to process Aperio SVS/TIFF WSIs because this format is supported by large slide databases such as The Cancer Genome Atlas (TCGA), which has approximately 31,000 WSIs [19], and The Genotype-Tissue Expression Project (GTEx), with approximately 25,000 WSIs [20]. PyHIST currently has experimental support for other image formats (see S1 Text).

Design and implementation

PyHIST is a command-line Python tool based on OpenSlide [21], a library to read high-resolution histological images in a memory-efficient way. PyHIST's input is a WSI encoded in SVS format (Fig 1A), and the main output is a series of image tiles retrieved from regions with tissue content (Fig 1E).

Fig 1

PyHIST pipeline.

(a) The input to the pipeline is a Whole Slide Image (WSI). Within PyHIST, the user can decide to scale down the image to perform the segmentation and tile extraction at lower resolutions. The WSI shown is of a skin tissue sample (GTEX-1117F-0126) from the Genotype-Tissue Expression (GTEx) project [20]. (b) An alternative version of the input image is generated, where the tissue edges are highlighted using a Canny edge detector. A graph segmentation algorithm is employed over this image in order to generate the mask shown in (c). PyHIST extracts tiles of specific dimensions from the masked regions, and provides an overview image to inspect the output of the segmentation and masking procedure, as shown in (d), where the red lines indicate the grid generated by tiling the image at user-specified tile dimensions, while the blue crosses indicate the selected tiles meeting a certain user-specified threshold of tissue content with respect to the total area of the tile. In (e), examples of selected tiles are shown.

The PyHIST pipeline involves three main steps: 1) produce a mask for the input WSI that differentiates the tissue from the background, 2) create a grid of tiles on top of the mask, evaluate each tile to see if it meets the minimum content threshold to be considered as foreground and 3) extract the selected tiles from the input WSI at the requested resolution. By default, PyHIST uses a graph-based segmentation method to produce the mask. In this method, first, tissue edges inside the WSI are identified using a Canny edge detector (Fig 1B), generating an alternative version of the image with diminished noise and an enhanced distinction between the background and the tissue foreground. Second, these edges are processed by a graph-based segmentation algorithm [22], which is used here to identify tissue content.
In short, this step evaluates the boundaries between different regions of an image as defined by the edges; different parts of the image are represented as connected components of a graph, and the "within" and "in-between" variations of neighboring components are assessed in order to decide if the examined image regions should be merged or not into a single component. From this, a mask is obtained in which the background and the different tissue slices are separated and marked as distinct objects using different colors (Fig 1C). Finally, the mask is divided into a tile grid with a user-specified tile size. These tiles are then assessed to see if they meet a minimum foreground (tissue) threshold with respect to the total area of the tile, in which case they are kept; otherwise they are discarded. Optionally, the user can also decide to save all the tiles in the image. Of note, tile generation can be performed at the native resolution of the WSI, but downsampling factors can also be specified to generate tiles at lower resolutions. Additionally, edge detection and mask generation can also be performed on downsampled versions of WSIs, reducing segmentation runtimes (S1 Fig, S1 Text). A segmentation overview image is generated at the end of the segmentation procedure for the user to visually inspect the selected tiles (Fig 1D). With the set of parameters available in PyHIST (S2 Text), the user can specify regions to ignore when performing the masking and segmentation (S2 Fig), and have fine-grained control over specific use-cases. By default, PyHIST uses the graph-based segmentation method described previously due to its robustness in detecting tissue foreground in WSIs that do not have a homogeneous composition. However, alternative tile-generation methods based on thresholding that tend to work well on homogeneous WSIs are also implemented (S3–S5 Figs, see S1 Text for details and benchmarking information).
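The tile selection step described above can be illustrated with a minimal sketch. It assumes the mask is already available as a binary grid (1 for tissue, 0 for background); the function name `select_tiles` and its parameters are hypothetical and chosen for illustration, and this is not PyHIST's actual implementation.

```python
from typing import List, Tuple

Mask = List[List[int]]  # assumed encoding: 1 = tissue (foreground), 0 = background


def select_tiles(mask: Mask, tile_size: int,
                 min_tissue_fraction: float) -> List[Tuple[int, int]]:
    """Return (row, col) grid indices of tiles whose tissue fraction meets the threshold."""
    height, width = len(mask), len(mask[0])
    selected = []
    # Only complete tiles are evaluated; partial tiles at the right and
    # bottom edges are skipped in this sketch.
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tissue_pixels = sum(
                mask[y][x]
                for y in range(top, top + tile_size)
                for x in range(left, left + tile_size)
            )
            fraction = tissue_pixels / (tile_size * tile_size)
            if fraction >= min_tissue_fraction:
                selected.append((top // tile_size, left // tile_size))
    return selected


# Toy 4x4 mask with tissue concentrated in the upper-left corner.
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
kept = select_tiles(toy_mask, tile_size=2, min_tissue_fraction=0.5)
```

Lowering `min_tissue_fraction` keeps more tiles that contain mostly background, which is the trade-off the user-specified threshold controls.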
PyHIST also has a random tile sampling mode for those applications that do not necessarily need to distinguish the background from the foreground. In this mode, tiles at a user-specified size and resolution will be extracted from random starting positions in the WSI.
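As a rough illustration of the random sampling mode, the sketch below draws tile positions uniformly at random so that each tile lies fully inside the image. The function `sample_random_tiles` and its arguments are assumptions made for this example and do not reflect PyHIST's internal code or command-line options.

```python
import random
from typing import List, Optional, Tuple


def sample_random_tiles(width: int, height: int, tile_size: int, n_tiles: int,
                        seed: Optional[int] = None) -> List[Tuple[int, int, int, int]]:
    """Return n_tiles boxes (left, top, tile_width, tile_height) inside the image."""
    if tile_size > width or tile_size > height:
        raise ValueError("tile_size must not exceed the image dimensions")
    rng = random.Random(seed)  # a local generator keeps the sampling reproducible
    tiles = []
    for _ in range(n_tiles):
        # randint is inclusive on both ends, so the tile never crosses the border.
        left = rng.randint(0, width - tile_size)
        top = rng.randint(0, height - tile_size)
        tiles.append((left, top, tile_size, tile_size))
    return tiles


boxes = sample_random_tiles(width=4096, height=2048, tile_size=512, n_tiles=10, seed=42)
```

In this sketch tiles may overlap, and no mask is consulted, so some sampled tiles may contain only background.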

Results

To demonstrate how PyHIST can be used to preprocess WSIs for use in an ML application, we generated a use case example with the goal of building a classifier at the tile-level that allows us to determine the cancer-affected tissue of origin based on the histological patterns encoded in these tiles. To this end, we first retrieved a total of 36 publicly available WSIs, six from each of the following human tissues hosted in The Cancer Genome Atlas (TCGA) [23]: Brain (glioblastoma), Breast (infiltrating ductal carcinoma), Colon (adenocarcinoma), Kidney (clear cell carcinoma), Liver (hepatocellular carcinoma), and Skin (malignant melanoma). Slides within each tissue have the same cancer primary diagnosis as established by TCGA. Second, these WSIs were preprocessed with PyHIST, generating a total of 7163 tiles with dimensions 512x512. These tiles were then partitioned into training and test sets (constraining all the tiles of a given WSI to be in only one of the two sets), and we then fit a deep learning convolutional neural network model over these tiles with weighted sampling at training time (S6 Fig), achieving a classification accuracy of 95% (Fig 2A, S1 Table, S2 Table, see S3 Text for data preparation and model details, and a detailed assessment of Fig 2A).
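The constraint that all tiles of a given WSI fall in only one of the two sets can be sketched as a slide-level split. The helper `split_by_slide`, the tile file names and the slide identifiers below are hypothetical; the actual partitioning used for the use case is described in S3 Text and may differ, for example by also balancing tissues across the two sets.

```python
import random
from typing import Dict, List, Tuple


def split_by_slide(tile_to_slide: Dict[str, str], test_fraction: float,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Split tiles into train and test so that no slide contributes to both sets."""
    slides = sorted(set(tile_to_slide.values()))
    rng = random.Random(seed)
    rng.shuffle(slides)
    # Hold out whole slides, never individual tiles.
    n_test = max(1, round(len(slides) * test_fraction))
    test_slides = set(slides[:n_test])
    train = [t for t, s in tile_to_slide.items() if s not in test_slides]
    test = [t for t, s in tile_to_slide.items() if s in test_slides]
    return train, test


# Hypothetical tile names mapped to the slide they were extracted from.
tiles = {f"slide{s}_tile{t}.png": f"slide{s}" for s in range(6) for t in range(4)}
train_tiles, test_tiles = split_by_slide(tiles, test_fraction=0.33)
```

Splitting at the slide level avoids leaking slide-specific staining or scanning characteristics from the training set into the test set.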

Fig 2

TCGA use case.

(a) Examples of the top 5 most accurately predicted tiles per cancer-affected tissue (rows) from the TCGA use case test set. The label above each tile shows the predicted cancer-affected tissue type (GB: glioblastoma, DC: infiltrating ductal carcinoma, AC: adenocarcinoma, CC: clear cell carcinoma, HC: hepatocellular carcinoma, MM: malignant melanoma), followed by the probability of the ground truth label. All of these tiles were correctly classified. (b) Dimensionality reduction of TCGA tiles. t-SNE performed with the feature vectors of each tile that were derived from the deep learning classifier model. Each dot corresponds to an image tile.

We also inspected the feature vectors generated by the deep learning model: for each tile, we retrieved the features corresponding to the linear layer of the last (fully connected) sequential container of the model, and performed dimensionality reduction (t-SNE) over the stacked matrix of these vectors. From here, we infer that the learned features recapitulate tissue morphology since tile clusters corresponding to each tissue are formed (Fig 2B, S7 Fig). We note that this classifier is only an exercise to show end-users how to quickly prepare WSI data using PyHIST to generate tiles, reducing the overhead to start performing downstream analyses: further tuning of the model with more data is desirable to ensure that the classifier is robust enough to generalize to different types of unseen WSIs for a real application.

Availability and future directions

The example use case described above is documented and fully available at https://pyhist.readthedocs.io/en/latest/testcase/, and divided into three Jupyter notebooks: 1) Data preprocessing with PyHIST, 2) Constructing a deep learning tissue classifier, and 3) Dimensionality reduction. The TCGA WSIs in the use case were downloaded from the Genomic Data Commons (GDC) repository (https://gdc.cancer.gov/) using the GDC Data-transfer tool (https://gdc.cancer.gov/access-data/gdc-data-transfer-tool). PyHIST is a generic tool to segment histological images automatically: it allows for easy and rapid WSI cleaning and preprocessing with minimal effort to generate image tiles geared towards usage in ML analyses. The tool is available at https://github.com/manuel-munoz-aguirre/PyHIST and released under a GPL license. Updated documentation and a tutorial can be found at https://pyhist.readthedocs.io/. PyHIST is highly customizable, enabling the user to tune the segmentation process in order to suit the needs of any particular application that relies on histological image tiles. The software and all of its dependencies have been packaged in a Docker image, ensuring portability across different systems. PyHIST can also be used locally within a regular computing environment with minimal requirements. Future directions and improvements include adding support for more histological image formats and features to save tiles into specialized data structures, as well as the inclusion of a graphical user interface to ease the learning curve for users who are new to the field of image processing for ML analyses. Finally, PyHIST is open source software: all the code and reproducible notebooks for the example use case are available on GitHub and will continue to be improved based on user feedback.

PyHIST overview.

General description of the pipeline: supported file formats, tile generation methods, and execution times. (PDF)

Parameter description.

Description of supported arguments in PyHIST. (PDF)

TCGA tissue classification use case.

Description of data preprocessing, model training and analysis for the TCGA tissue classification use case. (PDF)

WSI scaling steps in PyHIST.

(a) WSI at its original resolution (1x). (b) The mask can be generated and processed at a given downsampling factor. A smaller resolution will lead to a faster segmentation. (c) The output can be requested at a given downsampling factor. (d) The segmentation overview image can also be generated at a given downsampling factor. The dimensions in all steps are matched to ensure that the tile sizes and grid are consistent. The downsampling choices for all the steps are independent of each other. (PNG)
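One way to see how the tile grid can stay consistent across independently chosen downsampling factors is to express every tile in native-resolution pixels and rescale from there. The sketch below is an assumption about how such matching could be done; the function names are hypothetical and the arithmetic is not taken from PyHIST's source.

```python
from typing import Tuple


def scaled_dimensions(width: int, height: int, downsample: int) -> Tuple[int, int]:
    """Image dimensions after downsampling by an integer factor."""
    return width // downsample, height // downsample


def tile_size_at(tile_size_output: int, output_downsample: int,
                 target_downsample: int) -> float:
    """Side length, in target-level pixels, of a tile defined at the output level."""
    # The tile footprint in native (1x) pixels is the common reference.
    native_side = tile_size_output * output_downsample
    return native_side / target_downsample


# Example: a 4096x2048 WSI, 512-pixel output tiles at 1x, mask computed at 16x.
native_w, native_h = 4096, 2048
mask_w, mask_h = scaled_dimensions(native_w, native_h, downsample=16)
mask_tile = tile_size_at(512, output_downsample=1, target_downsample=16)

grid_at_output = (native_w // 512, native_h // 512)
grid_at_mask = (int(mask_w // mask_tile), int(mask_h // mask_tile))
```

Both grids have the same number of columns and rows, so a tile selected on the small mask corresponds to exactly one tile at the output resolution.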

Image in graph-based segmentation test mode.

Test mode allows the user to see how the image mask will be with the chosen segmentation parameters and tile dimension configuration, before proceeding to generate the individual tile files. The black border defines the region of exclusion for tissue content placed within the edges of the slide (see the --borders and --corners arguments, and section 2.2 in S2 Text). (PNG)

Comparison of mask generation methods.

(a) Adipose tissue WSI from the GTEx project, from sample GTEX-111CU-1826. Thresholding-based masks (b-d) are generated by first converting (a) into grayscale and then applying the corresponding thresholding method. Note that simple thresholding is shown here for completeness but only Otsu and adaptive are implemented in PyHIST due to their overall better performance when compared to simple thresholding. In the graph-based method, an image with highlighted edges is first generated through a Canny edge detector (e, left) and then the connected components are labeled through graph-based segmentation (e, right). (PNG)
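To make the Otsu option concrete, the following is a minimal pure-Python sketch of Otsu's method on 8-bit grayscale values: it picks the threshold that maximizes the between-class variance of the two resulting pixel groups. It is written from the standard definition of the method and is not PyHIST's code; treating dark pixels as tissue is an assumption that fits bright-background slides.

```python
from typing import Iterable, List


def otsu_threshold(pixels: Iterable[int]) -> int:
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        if weight_high == 0:
            break  # all pixels are already in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between > best_between:
            best_t, best_between = t, between
    return best_t


# Toy grayscale data: dark "tissue" values and bright "background" values.
gray: List[int] = [40] * 50 + [50] * 50 + [220] * 60 + [230] * 40
t = otsu_threshold(gray)
tissue_mask = [1 if p <= t else 0 for p in gray]  # assumption: tissue is darker
```

A single global threshold like this assumes a roughly bimodal intensity distribution, which is consistent with thresholding working best on slides of homogeneous composition.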

Runtime benchmarks for random sampling and graph-based segmentation.

(a) Execution time to perform random sampling (y-axis) of a varying number of tiles (x-axis) at different downsampling factors for the WSI shown in S1 Fig. For each combination of number of tiles and downsampling factor, the sampling was repeated 30 times. Each dot represents the average running time across the 30 runs, while the interval shows the range between the maximal and minimal running time. (b) Execution time to perform random sampling of 1000 tiles (y-axis) at different tile dimensions (x-axis) at different downsampling factors for the same WSI in (a). Each combination was repeated 50 times, with each dot showing the average runtime. (c) Segmentation runtime of 50 Stomach WSIs from the GTEx project, at different downsampling factors, at a tile size of 256x256. Each dot represents the average execution time. Each interval shows the range between the fastest and slowest segmentations, while the labels show the dimensions of the corresponding WSIs. (d) Segmentation runtime (y-axis) at 1x resolution for the 50 Stomach WSIs, with respect to the number of pixels in the WSI (x-axis). (PNG)

Runtime comparison of mask-generating methods.

Tile extraction was evaluated for the three different methods at four different settings of tile size. Each method + tile size combination was repeated ten times to show runtime variability. (PNG)

Tile distribution per class in a training epoch in the TCGA example use case.

Within each training epoch, weighted random sampling is performed to create batches with a fair distribution of tiles among the classes. Even if the sample sizes in the training dataset are different among the classes, the balance in the number of tiles per epoch is obtained through data augmentation. (PNG)
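Weighted random sampling of this kind is commonly implemented by giving each tile a weight inversely proportional to the size of its class and then drawing with replacement. The sketch below shows that idea with the standard library only; the function names and the toy class labels are assumptions, and the sampler actually used in the use case is described in S3 Text.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def inverse_frequency_weights(labels: Sequence[str]) -> List[float]:
    """Weight each item by 1 / (size of its class), so every class has total weight 1."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_epoch(labels: Sequence[str], n_samples: int,
                 seed: Optional[int] = None) -> List[int]:
    """Draw item indices with replacement, with classes equally likely in expectation."""
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=n_samples)


# Toy imbalanced training set: 90 tiles of one class, 10 of another.
labels = ["brain"] * 90 + ["skin"] * 10
weights = inverse_frequency_weights(labels)
epoch_indices = sample_epoch(labels, n_samples=1000, seed=0)
```

Because the minority class is drawn repeatedly, the same tile appears several times per epoch; random augmentation at load time is what makes those repeated draws differ.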

Correlation matrix of TCGA tiles based on their feature vectors.

Heatmap of Pearson’s correlation matrix between the feature vectors obtained for each TCGA tile. Rows and columns are reordered with hierarchical agglomerative clustering. (PNG)

Tile distribution across classes in the TCGA use case training and test sets.

(PNG)

Confusion matrix for the tiles in the test set of the TCGA use case.

(PNG)

17 Jul 2020

Dear Muñoz-Aguirre,

Thank you very much for submitting your manuscript "PyHIST: A Histological Image Segmentation Tool" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts. Thank you again for your submission.
We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,
Dina Schneidman-Duhovny
Software Editor
PLOS Computational Biology

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The paper introduces a simple and easy to use open-source tool for slide tiling, which is useful for histopathology image analysis. I have to acknowledge there are currently inadequate tools available that make tiling of digital slides simple, and I had to write my own scripts to tile my slides when analyzing whole slide images. Overall, my experience with the application was positive; I was able to create tiles from svs and TIFF files in a short amount of time, with a short learning curve. The setup and installation was relatively easy; the Docker version of the software worked without problems on Ubuntu and Windows 10 based machines. A few problems were encountered when the program was installed through Anaconda using the program’s accompanying installation instructions: in the Ubuntu machine, ‘cv2’ was reported to be missing; in Windows 10, several libraries were missing. I was able to get the program running in Ubuntu and Anaconda after manually installing OpenCV. I did not attempt to manually fix the Anaconda installation in Windows due to the large number of missing libraries. I am not sure if this is due to the unique setup of my computer, or the program was not tested with Windows 10 and Anaconda.

Suggested Correction: Lines 22 - 23 states “Histopathological images are routinely used in the diagnosis of many diseases, notably cancer.” This can be misinterpreted as saying that pathologists make their diagnoses predominantly through whole slide images (WSIs).
Although WSI is becoming more widespread in pathology departments, most pathologists still render their diagnoses by examining glass slides under a microscope. This statement has to be corrected/modified to reflect that whole slide images are still not being used by majority of pathologists to sign out their cases, although there is an increasing adoption of whole slide scanning technologies in pathology departments.

Future direction: There is more potential in this software, which can accommodate additional features in the future while retaining its simplicity. Aside from adding new features, I believe adding a Graphical user interface (GUI) version of the program would increase the application’s user base, and be helpful for those who are less computer savvy and have no experience in using the command line.

Reviewer #2: The manuscript submitted by Muñoz-Aguirre and colleagues aims to describe the development of PyHIST which is a histological image segmentation tool. Overall, this manuscript presents results that would be of interest to the community of scientists and computational biologists concerned with this problem. However, there are major issues in this manuscript that prevent us from recommending that this manuscript be accepted in its current state.

Major:

1) Abstract: highlights that preprocessing enabled by PyHIST involves image scaling, segmentation, and eventually tile extraction to clearly mention the utility of PyHIST.

2) Introduction: The paper correctly addresses the need for standardization of the tiling and patch-creating pipeline for researchers working in this area to prevent dataset-specific biases. Although, as far as saving research time is concerned, currently, WSI preprocessing requires developing custom scripts, but once a process is established researchers can typically use similar code for subsequent tiling for all projects. Therefore, PyHIST may only save a significant amount of time at the initial phase.
3) Facts have been mentioned without references – we have mentioned a few examples but urge the authors to add extensive references:
- lines 22 (citation for WSI obtaining process required),
- 23 (citation for use in cancer),
- 25 (citation to support the claim of development of computational methods for disease diagnosis and classification), and
- 33 (cite literature to support histopathological images capturing endophenotypes that provide crucial information when correlated with molecular and cellular data).
- In a similar way, kindly provide references at lines 37, 46, 50

4) Design and Implementation:
- It’s not clear why the authors are interested in highlighting edges within tissue fragments rather than outlining the entire fragment. Figure 1b resembles a grayscaled WSI. A similar result as Figure 1b can be reached with less computation by just binarizing the WSI using a threshold to separate background from foreground. Does edge detection provide any unique benefits over binarizing the WSI?
- The graph-based segmentation algorithm can perform unsupervised segmentation on complex images, but in this case the algorithm just needs to detect the connected objects. If the input image is a binary mask (foreground and background), there are many simple functions to label contiguous/interconnected objects and produce an output similar to Figure 1c. Is graph-based segmentation used because it works well with edge detection inputs? How does it compare computationally to other connected-component labeling techniques such as Python Skimage’s measure.label function?
- Why are steps (b) and (c) needed in the PyHIST pipeline in Figure 1? Red gridlines still appear to tile the entire WSI and then some tiles are not stored based on a background threshold. How are the tissue fragment labels from (c) used?

5) Results:
- Details of the deep learning model have not been provided – patches detected correctly have vague histology that is shown in Figure 2 A (explained below).
We suggest a pathologist review of the deep learning model results. Additionally, the connection between a better model accuracy on the dataset and the validity of the pre-processing steps has not been made.
- The partitioning of training and test sets can be the most time-consuming pre-processing steps of the ML process. Tiles from the same WSI should be constrained to the training or test sets. It is difficult to satisfy this constraint, while also managing the percentage of tiles in the test set and class imbalances. This process is not a built-in feature of PyHIST and it is unclear in the paper if PyHIST assists with this aspect of the ML pipeline at all.
- The deep learning results are an example that tiles processed using PyHIST can achieve high prediction performance, but it doesn’t necessarily prove that it is better than other baseline or competing approaches. WSIs from different part of the body can be quite distinguishable, so many different tiling approaches could produce similar results. The Results section could include comparisons of performance and computation time for several tiling methodologies. How does PyHIST stack up against other techniques?

6) Availability and future directions:
- The SVS limitation is mentioned here but should also be addressed earlier in the Intro or Design sections. For example, “PyHIST is currently limited to only SVS format due to/because…”.

7) Figures:
- Figure 2 A: histology is ambiguous since the top panel for ‘T-brain’ shows artefactual tissue rather than brain tissue with cell bodies of neurons or glia etc. This is repeated for 3rd, 4th and 5th (from left) T-breast, and 1st (from left) T-colon.

8) Supplementary Materials:
- Section S3: cropping the image tiles is mentioned – what is the size of these crops and are these kept uniform each time? Explanation is required for clarity of the user.
- Section S2: the segmentation parameters seem to be an important part of tiling, but it is still unclear how they work.
Is this a way of capturing tiles that have background in a certain orientation? How do parameters for border and corners interact with the background percentage and how does this influence segmentation?

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jerome Cheng
Reviewer #2: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc.
For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, please see

28 Aug 2020

Submitted filename: Reviewers_comments_R1.pdf

17 Sep 2020

Dear Muñoz-Aguirre,

We are pleased to inform you that your manuscript 'PyHIST: A Histological Image Segmentation Tool' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,
Dina Schneidman
Software Editor
PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: In the revised and significantly improved version of the manuscript, the authors addressed each reviewer's concerns, and all of my previous comments have been satisfactorily addressed. I do not have any new recommendations.

Reviewer #2: The revised manuscript submitted by Muñoz-Aguirre and colleagues extensively addresses the comments raised by the reviewers. We commend them for adding detailed methods regarding pre-processing, including tile extraction, additional relevant references, and mask comparisons in Supplementary Text S1 and Supplementary Figure S3. Further, the edits made to Figure 2 have made the message clearer, and the authors have done a remarkable job.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: Yes: Jerome Cheng
Reviewer #2: Yes: Sana Syed

9 Oct 2020

PCOMPBIOL-D-20-00862R1
PyHIST: A Histological Image Segmentation Tool

Dear Dr Muñoz-Aguirre,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,
Matt Lyles
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

