Henrik Sahlin Pettersen, Ilya Belevich, Elin Synnøve Røyset, Erik Smistad, Melanie Rae Simpson, Eija Jokitalo, Ingerid Reinertsen, Ingunn Bakke, André Pedersen.
Abstract
Application of deep learning to histopathological whole slide images (WSIs) holds promise for improving diagnostic efficiency and reproducibility but largely depends on the ability to write computer code or purchase commercial solutions. We present a code-free pipeline utilizing free-to-use, open-source software (QuPath, DeepMIB, and FastPathology) for creating and deploying deep learning-based segmentation models for computational pathology. We demonstrate the pipeline on a use case of separating epithelium from stroma in colonic mucosa. A dataset of 251 annotated WSIs, comprising 140 hematoxylin-eosin (HE)-stained and 111 CD3-immunostained colon biopsy WSIs, was developed through active learning using the pipeline. On a hold-out test set of 36 HE-stained and 21 CD3-stained WSIs, mean intersection over union scores of 95.5 and 95.3%, respectively, were achieved for epithelium segmentation. We demonstrate pathologist-level segmentation accuracy and clinically acceptable runtime performance, and show that pathologists without programming experience can create near state-of-the-art segmentation solutions for histopathological WSIs using only free-to-use software. The study further demonstrates the strength of open-source solutions in their ability to create generalizable, open pipelines, in which trained models and predictions can seamlessly be exported in open formats and thereby used in external solutions. All scripts, trained models, a video tutorial, and the full dataset of 251 WSIs with ~31,000 epithelium annotations are made openly available at https://github.com/andreped/NoCodeSeg to accelerate research in the field.
Keywords: U-Net; code-free; colon; computational pathology; deep learning; inflammatory bowel disease; open datasets; semantic segmentation
Year: 2022 PMID: 35155486 PMCID: PMC8829033 DOI: 10.3389/fmed.2021.816281
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1Flowchart showing the pipeline from manual annotation in QuPath, export of labeled patches from QuPath (black arrows), CNN training in DeepMIB, expansion of the dataset by predicting unseen WSIs in DeepMIB and importing and correcting predictions in QuPath (red arrows), and final export of trained networks as ONNX-files and rapid prediction directly on WSIs in FastPathology (blue arrows).
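The patch-based workflow in the figure (tile the WSI, predict each patch, stitch the per-patch masks back into a full-slide mask) can be sketched as follows. This is a simplified illustration in plain NumPy with a placeholder model, not the actual DeepMIB/FastPathology internals; `predict_wsi` and `dummy_model` are illustrative names.

```python
import numpy as np

def predict_wsi(wsi: np.ndarray, model, patch: int = 256) -> np.ndarray:
    """Tile a WSI into non-overlapping patches, run a segmentation model on
    each patch, and stitch the per-patch masks into a full-WSI mask."""
    h, w = wsi.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tile = wsi[y:y + patch, x:x + patch]
            mask[y:y + patch, x:x + patch] = model(tile)
    return mask

# Placeholder "model": threshold the mean channel intensity per pixel
dummy_model = lambda tile: (tile.mean(axis=-1) > 127).astype(np.uint8)

# Toy WSI: top half bright, bottom half dark
wsi = np.zeros((512, 512, 3), dtype=np.uint8)
wsi[:256] = 255
full_mask = predict_wsi(wsi, dummy_model)
```

In the published pipeline this stitching happens inside FastPathology after loading the ONNX-exported network; here the model is a stand-in so the tiling logic itself can be run in isolation.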
Comparative accuracies on the HE stained (n = 36) and CD3 immunostained (n = 21) test sets with different hyperparameter settings for U-Net and SegNet.
| Stain | Architecture | Patch size | Filters | Depth | Batch size |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|
| HE | U-Net | 512 × 512 | 32 | 6 | 16 | 0.972 |  |  |  |  |  |
| HE | U-Net | 256 × 256 | 32 | 6 | 16 |  | 0.938 |  | 0.920 |  |  |
| HE | U-Net | 256 × 256 | 32 | 6 | 32 | 0.988 | 0.978 | 0.936* | 0.976 | 0.991 | 0.920 |
| HE | U-Net | 256 × 256 | 64 | 6 | 32 | 0.987 | 0.974 | 0.935* | 0.975 | 0.991 | 0.919 |
| HE | U-Net | 128 × 128 | 32 | 6 | 16 | 0.988 |  | 0.932* | 0.977 | 0.991 | 0.911 |
| HE | U-Net | 64 × 64 | 32 | 6 | 16 | 0.985 | 0.965 | 0.924* | 0.971 | 0.989 | 0.904 |
| HE | SegNet | 512 × 512 | 32 | 6 | 16 | 0.983 | 0.964 | 0.928* | 0.967 | 0.988 | 0.918 |
| HE | SegNet | 256 × 256 | 32 | 6 | 16 | 0.987 | 0.973 | 0.939* | 0.974 | 0.991 | 0.927 |
| HE | SegNet | 128 × 128 | 32 | 6 | 16 | 0.979 | 0.964 | 0.904* | 0.960 | 0.985 | 0.884 |
| CD3 | U-Net | 512 × 512 | 32 | 6 | 16 |  |  |  |  |  |  |
| CD3 | U-Net | 256 × 256 | 32 | 6 | 16 | 0.987 | 0.977 | 0.931* | 0.974 | 0.990 | 0.911 |
| CD3 | SegNet | 512 × 512 | 32 | 6 | 16 | 0.976 | 0.953 | 0.920* | 0.954 | 0.983 | 0.919 |
| CD3 | SegNet | 256 × 256 | 32 | 6 | 16 | 0.971 | 0.949 | 0.898* | 0.945 | 0.979 | 0.889 |

All metrics are reported as means at WSI level. The best performing method for each metric and each dataset is highlighted in bold. The number of train/validation/test patches for each dataset was as follows: HE, 4973/154/1195; CD3, 3539/110/674. Stars indicate significant differences (p < 0.01) compared to the single best performing architecture within each dataset (U-Net, 512 × 512, 32 filters, batch size 16, for both HE and CD3) using a two-level mixed regression model.
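The WSI-level segmentation metrics reported above can be reproduced from exported prediction and ground-truth masks. A minimal sketch, assuming binary NumPy masks per WSI; the function names and toy arrays are illustrative and not taken from the published scripts:

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over union (Jaccard index) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:  # both masks empty: define as perfect agreement
        return 1.0
    return np.logical_and(pred, truth).sum() / union

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient (F1 score) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0
    return 2 * np.logical_and(pred, truth).sum() / total

# Toy example: intersection = 2 px, union = 4 px -> IoU = 0.5
pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
mean_wsi_iou = np.mean([iou(pred, truth)])  # mean over (here, one) WSI
```

Averaging these per-WSI values over the test set gives the WSI-level means reported in the table and abstract.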
Figure 2Examples of predictions (middle column) and ground truth (right column) of epithelial segmentation (transparent green) of HE stained (top row) and CD3 immunostained (bottom row) 512 × 512-pixel image patches in DeepMIB. The arrow shows the approximate cut-offs for (filled or unfilled) minimal tubule hole size used during annotation.
Runtime measurements of different inference engines using FastPathology.
| Inference engine | Hardware |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|
| OpenVINO CPU | Intel i7-10750H | 3.65 | 1.03 | 135.31 | 0.80 | 2.76 | 7.09 | 76.38 |
| OpenVINO GPU | Intel UHD Graphics |  | 1.26 | 133.96 | 1.25 | 3.46 | 7.83 | 76.65 |
| TensorRT | RTX 2070 Max-Q | 5.12 |  |  |  |  |  |  |

The table shows means of 10 runtime experiments for the 256 × 256-pixel input patch size U-Net applied to a single representative WSI from the dataset (540 patches without overlap). Inference measurements show runtime per 256 × 256 patch in milliseconds (ms). Export of the full-WSI pyramidal TIFF, performed once after inference, is reported in ms, and the total runtime for the full WSI (including TIFF export) is shown in seconds (s). The fastest runtimes are highlighted in bold. GEN, generator; NN, neural network; WSI, whole slide image.
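The measurement protocol described above (mean per-patch runtime over repeated runs across all patches of one WSI) can be sketched as follows. Here `run_patch` is a hypothetical stand-in for a single engine's inference call, not FastPathology's actual benchmarking code:

```python
import time
import statistics

def mean_patch_runtime_ms(run_patch, patches, runs: int = 10) -> float:
    """Mean per-patch runtime in milliseconds, averaged over repeated
    experiments as in the table (10 runs over all patches of one WSI)."""
    per_run = []
    for _ in range(runs):
        start = time.perf_counter()
        for p in patches:
            run_patch(p)
        elapsed_ms = (time.perf_counter() - start) * 1000
        per_run.append(elapsed_ms / len(patches))
    return statistics.mean(per_run)

# Toy usage: a trivial "inference" call over a small patch list
runtime_ms = mean_patch_runtime_ms(lambda p: p * 2, list(range(10)), runs=2)
```

For a WSI tiled into 540 non-overlapping patches, as in the table, `patches` would hold those 540 tiles and `run_patch` the chosen engine's per-patch inference.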
Figure 3Examples of prediction errors in difficult regions: HE stained images with folding artifacts (top row, red arrows) and granulocyte aggregates (second row, blue arrows). CD3 immunostained images with thick, mucin-rich epithelium (third row, red stars) and poorly fixed, blurred epithelium at the edge of a patch (bottom row, blue stars). Prediction (middle column) and ground truth (right column) of epithelial segmentation are shown in transparent green. 512 × 512-pixel image patches displayed in DeepMIB.
Figure 4Significant differences in prediction accuracies for the (A) HE stained test set WSIs (n = 36) with active disease (n = 15) vs. inactive disease (n = 21), and (B) CD3 immunostained test set WSIs (n = 21) with active disease (n = 7) vs. inactive disease (n = 14). Error bars represent 95% confidence intervals assuming normality. Two-tailed Student's t-tests of active vs. inactive disease gave p < 0.0001 for all four comparisons.