Yan Kong, Georgi Z Genchev, Xiaolei Wang, Hongyu Zhao, Hui Lu.
Abstract
Nuclei segmentation is a fundamental but challenging task in histopathological image analysis. One of the main problems is the existence of overlapping regions, which makes it harder to separate individual nuclei. In this study, to segment both nuclei and their overlapping regions, we introduce a nuclei segmentation method based on a two-stage learning framework consisting of two connected Stacked U-Nets (SUNets). The proposed SUNets architecture consists of four parallel backbone nets, whose outputs are merged by an attention generation model. In the first stage, a Stacked U-Net predicts a pixel-wise segmentation of nuclei. The output binary map is concatenated with the RGB values of the original image to form the input of the second-stage SUNets. Because of the sizable imbalance between overlapping and background regions, the first network is trained with cross-entropy loss, while the second network is trained with focal loss. We applied the method to two publicly available datasets and achieved state-of-the-art performance for nuclei segmentation: mean Aggregated Jaccard Index (AJI) values were 0.5965 and 0.6210, and F1 scores were 0.8247 and 0.8060, respectively. Our method also segmented the overlapping regions between nuclei, with an average AJI of 0.3254. The proposed two-stage learning framework outperforms many current segmentation methods, and its consistently good segmentation performance on images from different organs indicates that the approach generalizes well.
Keywords: Stacked U-Nets; attention generation mechanism; deep learning; histopathological image; nuclei segmentation
Year: 2020 PMID: 33195135 PMCID: PMC7649338 DOI: 10.3389/fbioe.2020.573866
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
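The abstract's choice of focal loss for the imbalanced second stage can be illustrated with a minimal per-pixel sketch. The function names and the values `gamma=2.0` and `alpha=0.25` are assumptions made here for illustration (they are commonly used defaults for focal loss); the record above does not state which hyperparameters the authors used.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Cross-entropy for one pixel; p is the predicted foreground probability, y is 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one pixel; gamma and alpha are assumed, not taken from the paper."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of pixels that are already well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background pixel versus a hard overlap pixel
easy_ce = binary_cross_entropy(0.05, 0)
easy_fl = binary_focal_loss(0.05, 0)
hard_ce = binary_cross_entropy(0.10, 1)
hard_fl = binary_focal_loss(0.10, 1)
print(f"cross-entropy hard/easy ratio: {hard_ce / easy_ce:.1f}")
print(f"focal loss    hard/easy ratio: {hard_fl / easy_fl:.1f}")
```

Relative to cross-entropy, the focal term leaves the many easy background pixels with very little influence on the total loss, so the rare overlap pixels dominate the gradient.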
FIGURE 1. Overview of the two-stage learning method. (A) Overall flow chart of the two-stage method. Input: Images are first split into patches of 384 × 384 pixels, which are resized to four different scales (1.25×, 1.0×, 0.75×, and 0.50×). Stage 1: The patches are fed into the first set of Stacked U-Nets for the first round of nuclei segmentation. The Stacked U-Nets consist of four parallel backbone nets that take differently sized images as input. At the end of stage 1, a mask segmentation of nuclei regions is generated with pixel gray values of 0 (not a nuclei region) and 1 (nuclei region). In addition, a nuclei instance segmentation is predicted by the watershed algorithm. Stage 2: The stage 2 input contains the original RGB image patches together with the binary mask segmentation of nuclei regions predicted by stage 1 as a fourth channel. At the end of stage 2, the segmentation of overlapped regions is generated with pixel gray values of 0 (not an overlapped region) and 1 (overlapped region). Merge: The first-round nuclei instance segmentation results from stage 1 are updated by merging the corresponding overlapped objects that share at least 10 pixels with the contour objects derived from stage 2. Output: The final output is a nuclei instance segmentation in which nuclei in overlapping regions, if any, are separated. (B) Architecture of the Stacked U-Nets. Blue rectangles stand for the multiple layers in the backbone net that have the same spatial dimensions. The Attention Generation Model (AGM) is used to weight and sum the predictions of the four scaled backbone nets and generate the final segmentation. The output of the AGM is a weight matrix for each backbone net, each of which takes a differently scaled image as input. Each backbone net returns a segmentation result; the weight matrix generated by the AGM multiplies each segmentation result (X circle in panel C), and the weighted results are summed (+ circle in panel C) to give the final result. (C) Detailed architecture of the backbone net used in the Stacked U-Nets. Each dark blue box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The spatial dimensions are provided under some of the boxes (boxes with the same height have the same spatial dimensions). White boxes represent copied feature maps from the layers where the gray arrows originate. Arrows of different colors denote different operations: red for de-convolution, green for max-pooling, blue for regular convolution, and gray for copy and concatenation.
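The weight-and-sum fusion and the four-channel stage 2 input described in the caption can be sketched with NumPy as follows. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, the four predictions are assumed to have already been resized to a common resolution, and the softmax normalization of the attention weights across scales is an assumption (the caption only says the AGM output is used to weight and sum the predictions).

```python
import numpy as np


def fuse_scale_predictions(score_maps, attention_logits):
    """Weight and sum per-scale predictions; both inputs have shape (num_scales, H, W)."""
    score_maps = np.asarray(score_maps, dtype=np.float64)
    attention_logits = np.asarray(attention_logits, dtype=np.float64)
    if score_maps.shape != attention_logits.shape:
        raise ValueError("score maps and attention logits must have the same shape")
    # Assumed normalization: softmax over the scale axis, computed stably
    shifted = attention_logits - attention_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Multiply each prediction by its weight map, then sum over scales
    return (weights * score_maps).sum(axis=0)


def build_stage2_input(rgb_patch, stage1_mask):
    """Append the stage 1 binary mask (H, W) to the RGB patch (H, W, 3) as a fourth channel."""
    rgb_patch = np.asarray(rgb_patch, dtype=np.float32)
    mask = np.asarray(stage1_mask, dtype=np.float32)[..., np.newaxis]
    return np.concatenate([rgb_patch, mask], axis=-1)


rng = np.random.default_rng(0)
scores = rng.random((4, 8, 8))              # four scales, toy 8 x 8 patch
fused = fuse_scale_predictions(scores, np.zeros((4, 8, 8)))
stage2_input = build_stage2_input(np.zeros((384, 384, 3)), np.ones((384, 384)))
print(fused.shape, stage2_input.shape)
```

With all-zero attention logits the fusion reduces to a plain average over the four scales; a learned AGM would instead favor different scales at different pixels.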
Comparison of AJI of different methods applied to the TCGA test set.
| Method | Bladder | Colorectal | Stomach | Breast | Kidney | Liver | Prostate | Overall |
| FCN-8 | 0.5376 | 0.4018 | 0.5279 | 0.5598 | 0.5267 | 0.5045 | 0.5709 | 0.5171 |
| Mask R-CNN | 0.5011 | 0.3814 | 0.6151 | 0.4913 | 0.5182 | 0.4622 | 0.5322 | 0.5002 |
| U-Net | 0.5403 | 0.4061 | 0.6529 | 0.4681 | 0.5426 | 0.4284 | 0.5888 | 0.5182 |
| CNN3 | 0.5217 | 0.5292 | 0.4458 | 0.5385 | 0.5732 | 0.5162 | 0.4338 | 0.5083 |
| DIST | 0.5971 | 0.4362 | 0.6479 | 0.5609 | 0.5534 | 0.4949 | 0.6284 | 0.5598 |
| Stacked U-Net | 0.6138 | 0.5188 | 0.5845 | 0.5605 | 0.5647 | 0.4594 | 0.5300 | 0.5474 |
| U-Net (DLA) | 0.6215 | 0.5322 | 0.5938 | 0.5747 | 0.5624 | 0.4642 | 0.5602 | 0.5584 |
| A two-stage U-Net | 0.5706 | 0.4891 | 0.6545 | 0.5613 | 0.5755 | 0.4989 | 0.6316 | 0.5687 |
| Two-stage learning U-Net (DLA) | 0.5376 | 0.5142 | 0.5720 | 0.5895 | ||||
| Ours | 0.5926 | 0.6541 | 0.5907 | 0.5926 | 0. |
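For readers unfamiliar with the AJI values reported above, the following is a minimal sketch of the metric as it is commonly defined: each ground-truth nucleus is matched to the predicted nucleus with the highest IoU, intersections and unions are accumulated, and the pixels of unmatched predictions are added to the denominator. The function names and the convention that label 0 is background are assumptions made here; the evaluation code used for the table is not part of this record.

```python
def instance_pixels(label_map):
    """Map each nonzero instance id to the set of (row, col) pixels it covers."""
    pixels = {}
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            if label != 0:  # 0 is assumed to be background
                pixels.setdefault(label, set()).add((r, c))
    return pixels


def aggregated_jaccard_index(gt_map, pred_map):
    """AJI between two instance label maps given as nested lists of integer ids."""
    gt = instance_pixels(gt_map)
    pred = instance_pixels(pred_map)
    intersection_sum = 0
    union_sum = 0
    used = set()
    for g in gt.values():
        # Find the predicted instance with the highest IoU for this nucleus
        best_id, best_iou = None, 0.0
        for pid, p in pred.items():
            inter = len(g & p)
            if inter == 0:
                continue
            iou = inter / len(g | p)
            if iou > best_iou:
                best_id, best_iou = pid, iou
        if best_id is None:
            union_sum += len(g)  # missed nucleus: only the denominator grows
        else:
            intersection_sum += len(g & pred[best_id])
            union_sum += len(g | pred[best_id])
            used.add(best_id)
    # Predictions never matched to any ground-truth nucleus are penalized
    union_sum += sum(len(p) for pid, p in pred.items() if pid not in used)
    return intersection_sum / union_sum if union_sum else 0.0


gt_example = [[1, 1, 0, 0]]
pred_example = [[1, 0, 0, 2]]
print(aggregated_jaccard_index(gt_example, pred_example))
```

In the toy example, the single ground-truth nucleus shares one of its two pixels with its best match, and the spurious second prediction adds one more pixel to the denominator, so the score is 1/3.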
Comparison of F1 scores of different methods applied to the TCGA test set.
| Method | Bladder | Colorectal | Stomach | Breast | Kidney | Liver | Prostate | Overall |
| FCN-8 | 0.8084 | 0.6934 | 0.7982 | 0.8113 | 0.5797 | 0.7589 | 0.8367 | 0.7552 |
| Mask R-CNN | 0.7610 | 0.6820 | 0.8268 | 0.7481 | 0.7554 | 0.7157 | 0.7401 | 0.7470 |
| U-Net | 0.7953 | 0.7360 | 0.8638 | 0.7818 | 0.7913 | 0.6981 | 0.7904 | 0.7795 |
| CNN3 | 0.7808 | 0.7399 | 0.7181 | 0.7222 | 0.6881 | 0.7922 | 0.7623 | |
| DIST | 0.8196 | 0.7286 | 0.8534 | 0.8071 | 0.7706 | 0.7281 | 0.7967 | 0.7863 |
| Stacked U-Net | 0.8249 | 0.7685 | 0.8498 | 0.7990 | 0.7986 | 0.7276 | 0.7829 | 0.7930 |
| U-Net (DLA) | 0.8296 | 0.7756 | 0.8530 | 0.8025 | 0.7994 | 0.7296 | 0.7895 | 0.7970 |
| A two-stage U-Net | 0.7599 | 0.7668 | 0.8912 | 0.8024 | 0.8189 | |||
| Two-stage learning U-Net (DLA) | 0.7808 | 0.8629 | 0.8022 | 0.7513 | 0.8037 | 0.8079 | ||
| Ours | 0.8217 | 0.8690 | 0.8123 | 0.8251 | 0.7865 | 0.8451 |
Quantitative comparison of different methods applied to the TNBC dataset.
| Method | Recall | Precision | F1-Score | AJI |
| DeconvNet | 0.773 | 0.805 | – | |
| FCN-8 | 0.752 | 0.823 | 0.763 | – |
| U-Net | 0.800 | 0.820 | 0.810 | 0.578 |
| Ensemble | 0.741 | 0.802 | – | |
| Stacked U-Net | 0.802 | 0.830 | 0.816 | 0.580 |
| U-Net (DLA) | 0.812 | 0.826 | 0.818 | 0.586 |
| DIST | – | – | 0.824 | 0.585 |
| Two-stage learning U-Net (DLA) | 0.833 | 0.826 | 0.611 | |
| Ours | 0.853 | 0.792 | 0.806 | 0.621 |
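The F1-Score column relates to the Recall and Precision columns through the usual harmonic mean, sketched below. The helper name is hypothetical. The U-Net and Stacked U-Net rows are reproduced by this formula to three decimals, but not every row is; one possible explanation, which is an assumption here, is that the reported scores are averaged per image before tabulation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Precision and recall taken from the U-Net and Stacked U-Net rows of the TNBC table
unet_f1 = f1_score(precision=0.820, recall=0.800)
stacked_f1 = f1_score(precision=0.830, recall=0.802)
print(f"U-Net: {unet_f1:.3f}, Stacked U-Net: {stacked_f1:.3f}")
```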
FIGURE 2. Cropped images from seven different organs (first row) with their corresponding ground truth (second row) and the segmentation result of our method (third row).
FIGURE 3. Randomly selected example of nuclei segmentation using our method. Each nucleus is randomly colored. First column: ground truth segmentation and the segmentation produced by our method. Second column: partially enlarged view of the nuclei segmentation. Red arrows point to the overlapped regions.