BACKGROUND: Liver segmentation in computed tomography (CT) imaging has been widely investigated as a crucial step for analyzing liver characteristics and diagnosing liver diseases. However, obtaining satisfactory liver segmentation performance is highly challenging because of the poor contrast between the liver and its surrounding organs and tissues, the high levels of CT image noise, and the wide variability in liver shapes among patients. METHODS: To overcome these challenges, we propose a novel method for liver segmentation in CT image sequences. This method uses an enhanced mask region-based convolutional neural network (Mask R-CNN) with graph-cut segmentation. Specifically, the k-nearest neighbor (k-NN) algorithm is employed to cluster the target liver pixels in order to obtain an appropriate aspect ratio. Then, anchors are adapted to the liver size using the ratio information. Thus, high-accuracy liver localization can be achieved using the anchors and rotation-invariant object recognition. Next, a fully convolutional network (FCN) is used to segment the foreground objects, and local fine-grained liver detection is realized by pixel prediction. Finally, a whole liver mask is obtained by the Mask R-CNN proposed in this paper. RESULTS: The proposed Mask R-CNN algorithm achieved superior performance in comparison with the conventional Mask R-CNN algorithm in terms of the Dice similarity coefficient (DSC) and the Medical Image Computing and Computer-Assisted Intervention (MICCAI) metrics. CONCLUSIONS: Our experimental results demonstrate that the improved Mask R-CNN architecture has good performance, accuracy, and robustness for liver segmentation in CT image sequences. 2021 Annals of Translational Medicine. All rights reserved.
Liver segmentation in medical imaging data is of great clinical significance for lesion resections and liver transplantations (1-3). In addition, accurate liver segmentation offers a substantial aid to physicians in diagnosing and treating liver diseases (4-6). However, manual liver segmentation takes a lot of time and effort. Automatic liver segmentation algorithms have been proposed to reduce these costs, but the performance of such algorithms is still limited by several challenges in computed tomography (CT) imaging. First, the liver grayscale patterns in CT images are very similar to those of the surrounding organs (such as the stomach, pancreas, kidneys, and muscles) (7-9). Second, liver shapes and sizes vary widely across patients (10,11). Third, differences in CT scanning equipment lead to large variations in CT image appearance and liver location (12,13).
State-of-the-art liver segmentation performance has progressively improved with the emergence of deep network architectures (14,15). However, segmentation accuracy and speed are still unsatisfactory for complex scenes (16-18). In addition, the use of fine-grained classification in liver segmentation is complicated by the small inter-class differences that arise from the close similarities between subcategories and by the large intra-class variabilities in position, scale, and orientation (19,20). Indeed, fine-grained recognition is popular in computer vision and pattern recognition applications (21,22); it facilitates the learning of object parts and helps differentiate between object subclasses, and hence can be employed to learn liver patterns more accurately (23).
In 2016, Girshick et al. (24) designed a region-based convolutional neural network (R-CNN), which used candidate region proposals and a CNN-based classification algorithm for detection.
This R-CNN architecture boosted the performance of target detection and recognition systems and inspired the creation of more powerful deep-learning algorithms for such systems. In particular, the R-CNN algorithm adopted the 4 steps of the conventional target detection framework: candidate region generation, feature extraction, image classification, and non-maximum suppression. Unlike conventional detectors, however, the R-CNN algorithm exploited CNN-based features instead of traditional hand-crafted features such as those of the scale-invariant feature transform (SIFT) (25,26) or the histogram of oriented gradients (HOG) (27-29). The fast R-CNN (Fast R-CNN) (30) and mask R-CNN (Mask R-CNN) (31) were both built as variants of the R-CNN algorithm.
This study sought to improve liver segmentation performance through an enhanced variant of the Mask R-CNN algorithm (32-35). A novel idea in the study is that the k-NN algorithm was employed for data clustering and for obtaining an appropriate aspect ratio during the training phase. Compared with other models, our proposed model can effectively reduce the required computational resources and improve computational accuracy. The enhanced Mask R-CNN algorithm is based on the original algorithm combined with k-NN, and we applied this model to liver segmentation for the first time. At present, this model has not been widely used in clinical trials; we will study it further as soon as possible and apply it to clinical treatment. The experimental results demonstrate significant performance improvements based on the proposed method.
The key aspects of our study were as follows:
The study highlighted the importance of random variations in liver images. With these variations, the proposed Mask R-CNN method provided remarkable improvements in segmentation compared to the conventional Mask R-CNN method. Labeled data was augmented through image rotation operations during the data preparation phase to improve generalization and reduce overfitting;
During the training phase, the k-nearest neighbor (k-NN) algorithm was employed for data clustering and obtaining an appropriate aspect ratio. Moreover, a fully convolutional network (FCN) was adopted to realize the segmentation algorithm after accounting for width-height reversal and noise;
The performance of the proposed Mask R-CNN algorithm was evaluated in comparison with the conventional Mask R-CNN algorithm according to 3 metrics, namely, detection accuracy (DA), detection speed (DS), and false-detection rate (FD). The effectiveness and feasibility of the proposed algorithm were verified in comparison to state-of-the-art segmentation algorithms through the measurement of the volumetric overlap error (VOE), the relative volume difference (RVD), the average symmetric surface distance (ASD), the root-mean-square symmetric surface distance (RMSD), and the maximum symmetric surface distance (MSD).
We present the following article in accordance with the MDAR reporting checklist (available at https://dx.doi.org/10.21037/atm-21-5822).
Methods
Conventional Mask R-CNN
Figure 1 shows the general framework and the associated network structure for the conventional Mask R-CNN algorithm. First, feature extraction is performed through convolutional layers on input liver images of arbitrary sizes to form feature maps. Then, a region proposal network (RPN) exploits the convolutional layer outputs for domain (or proposal) generation, as well as category and bounding-box regression, in order to speed up the computations. Pixel-wise segmentation of the target liver and liver localization in the image are carried out by a parallel feature pyramid network (FPN).
Figure 1
Framework of the conventional Mask R-CNN algorithm. FPN, feature pyramid network; RPN, region proposal network; FCN, fully convolutional network; Mask R-CNN, mask region-based convolutional neural network.
The conventional Mask R-CNN algorithm follows a two-stage object detection method. First, a candidate target region is generated, and the bounding box of the candidate object is proposed in line with the Faster R-CNN method. Second, binary masks, predicted classes, and bounding-box offsets are returned by the conventional Mask R-CNN algorithm for each region of interest (RoI), where the classification outcomes are dependent on mask predictions (36,37). In the training phase, a multi-objective loss function is defined for each RoI sample as (38):
$$L = L_{cls} + L_{box} + L_{mask}$$
where $L_{cls}$, $L_{box}$, and $L_{mask}$ denote the classification loss, the bounding-box loss, and the segmentation loss, respectively.
The conventional Mask R-CNN algorithm proposes a RoI Align layer that obtains image values at pixel points with floating-point coordinates through bilinear interpolation. This approach avoids any quantization of the RoI boundaries or intervals and thus ensures the continuity of the entire process of feature aggregation. Specifically, the RoI Align layer does not carry out pooling by merely supplementing the coordinate points on the candidate region boundary. Instead, this layer first traverses each candidate region and keeps the floating-point boundary unquantized. Each region is then subdivided into K × K units with unquantized boundaries, and 4 fixed coordinate positions are found in each unit through bilinear interpolation. Max pooling is then applied over these positions. By aggregating local features in this way, the RoI Align layer reduces the misalignment caused by the two quantization operations in RoI pooling and thus improves the DA (39,40).
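The RoI Align procedure described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the function names `bilinear_sample` and `roi_align`, the single-channel 2D feature map, and the regular 2 × 2 placement of the 4 sample points in each unit are all assumptions made for illustration.

```python
import numpy as np


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D feature map at a floating-point (y, x)."""
    h, w = feature.shape
    # Clamp to the valid range so that all four neighbours exist.
    y = min(max(float(y), 0.0), h - 1.0)
    x = min(max(float(x), 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1.0 - wx) * feature[y0, x0] + wx * feature[y0, x1]
    bottom = (1.0 - wx) * feature[y1, x0] + wx * feature[y1, x1]
    return (1.0 - wy) * top + wy * bottom


def roi_align(feature, roi, k=2, samples_per_axis=2):
    """Pool one RoI (y1, x1, y2, x2), in feature-map coordinates, to k x k."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / k  # unit sizes stay floating point (no rounding)
    bin_w = (x2 - x1) / k
    out = np.empty((k, k), dtype=float)
    for i in range(k):
        for j in range(k):
            values = []
            for sy in range(samples_per_axis):
                for sx in range(samples_per_axis):
                    # Regularly spaced sample points inside unit (i, j).
                    y = y1 + (i + (sy + 0.5) / samples_per_axis) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples_per_axis) * bin_w
                    values.append(bilinear_sample(feature, y, x))
            out[i, j] = max(values)  # max pooling over the sampled values
    return out


# Toy feature map whose value equals the column index: feature[y, x] = x.
feature = np.tile(np.arange(8, dtype=float), (8, 1))
pooled = roi_align(feature, roi=(1.0, 1.0, 5.0, 5.0), k=2)
print(pooled)
```

Because neither the RoI boundary nor the unit boundaries are rounded, shifting the RoI by a fraction of a pixel changes the pooled values smoothly, which is the property that RoI pooling with quantization lacks.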
Enhanced Mask R-CNN algorithm
In this study, novel improvements were made to the conventional Mask R-CNN detection framework. The proposed network framework is outlined in Figure 2. This framework can be divided into two steps. In the first step, candidate regions were identified; in the second step, the global and local features of image blocks were learned, mainly by a part-based segmentation algorithm and the FCN architecture. The details of the two steps are as follows.
Figure 2
The enhanced Mask R-CNN framework. RPN, region proposal network; FPN, feature pyramid network; k-NN, k-nearest neighbor; FCN, fully convolutional network; Mask R-CNN, mask region-based convolutional neural network.
Step 1: the RPN was exploited to extract features and generate a feature map at the last layer. The whole image was scanned by a sliding window to get target anchors. Across the image positions, it was possible to identify tens of thousands of overlapping candidate target regions with different dimensions and aspect ratios. These regions covered the whole image. We used the k-NN algorithm to optimize the number of target regions, which largely reduced the computational overhead and improved computational accuracy. In addition, the RPN generated two outputs for each anchor, namely, the anchor category and the border tuning parameter. For multiple overlapping anchors, we adopted non-maximum suppression to obtain rough target results, where the anchor with the highest foreground score was retained. Therefore, it was possible to use the RPN prediction outcomes to select the best anchor containing the target, and fine border adjustment was then applied.
Step 2: image segmentation was performed using an FCN architecture that could take an input image of arbitrary resolution and produce an output of the same size. This architecture located targets in fine-grained images and treated segmentation prediction as target masking. For effective mask learning, all fine-grained training and testing images retained their original resolutions. The FCN-based mask learning process is shown in . First, FCN-based prediction was applied to obtain a local target mask in a given input image. If a pixel was predicted to be a local target position, the actual mask value was retained; otherwise, if the pixel-wise prediction indicated a background region, the corresponding mask value was reset to zero. Thus, fine-grained liver detection was carried out.
The global and local liver features in each image were learned, and the FCN algorithm returned a more accurate target mask. In addition, the obtained target masks could locate the target positions by finding the bounding rectangles. In this study, the FCN was used for target mask learning and prediction.
In addition, in , the 3 streams shown correspond to the angle rotations of 3 image blocks. The characteristics of each of these blocks were learned through a series of operations such as convolution, activation, pooling, and discriminator selection. Indeed, combining image features of different scales in this study enhanced the robustness of liver detection.
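The anchor handling in Step 1 can be sketched as follows, under stated assumptions. The text names the k-NN algorithm for grouping liver regions but gives no procedure, so this sketch swaps in a plain one-dimensional k-means over bounding-box aspect ratios, which is a common way to derive anchor ratios; it should not be read as the authors' method. The function names, the toy ratios, and the IoU threshold of 0.5 are hypothetical.

```python
import numpy as np


def mask_aspect_ratio(mask):
    """Width / height of the tight box around a non-empty binary liver mask."""
    ys, xs = np.nonzero(mask)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    return width / height


def cluster_ratios(ratios, k, iters=50):
    """1D k-means over aspect ratios (swapped in for the k-NN step)."""
    ratios = np.asarray(ratios, dtype=float)
    # Deterministic initialisation from evenly spaced quantiles.
    centers = np.quantile(ratios, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.argmin(np.abs(ratios[:, None] - centers[None, :]), axis=1)
        updated = np.array([
            ratios[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return np.sort(centers)


def anchor_shape(scale, ratio):
    """(width, height) of an anchor with area scale**2 and width/height = ratio."""
    return scale * np.sqrt(ratio), scale / np.sqrt(ratio)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression on boxes given as [x1, y1, x2, y2]."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # highest foreground score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        xx1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # drop anchors overlapping the best one
    return keep


# Toy aspect ratios standing in for ratios measured on training masks.
ratios = [0.48, 0.50, 0.52, 0.98, 1.00, 1.02, 1.90, 2.00, 2.10]
centers = cluster_ratios(ratios, k=3)
kept = nms([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], [0.9, 0.8, 0.7])
print(centers, kept)
```

Restricting the anchors to a few data-driven ratios is what reduces the number of candidate regions; the suppression step then keeps only the highest-scoring anchor within each group of overlapping candidates.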
Figure 3
A demonstration of liver image enhancement.
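The pixel-wise masking rule of Step 2 (retain values at predicted target pixels, reset background pixels to zero, then locate the target by its bounding rectangle) can be sketched as below. The function names, the probability threshold of 0.5, and the toy arrays are assumptions for illustration and are not taken from the study.

```python
import numpy as np


def mask_foreground(image, foreground_prob, threshold=0.5):
    """Keep pixel values where the prediction is foreground, zero elsewhere."""
    keep = np.asarray(foreground_prob) >= threshold
    return np.where(keep, image, 0), keep


def bounding_rectangle(binary_mask):
    """Return (x_min, y_min, x_max, y_max) of the mask, or None if it is empty."""
    ys, xs = np.nonzero(binary_mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy 4 x 4 image with a predicted foreground block of 2 rows by 3 columns.
image = np.arange(16).reshape(4, 4) + 1
prob = np.zeros((4, 4))
prob[1:3, 1:4] = 0.9
masked, keep = mask_foreground(image, prob)
rect = bounding_rectangle(keep)
print(masked)
print(rect)
```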
Statistical analysis
In our experiments, we used 6 metrics to evaluate the segmentation performance of the different algorithms. First, we used the Dice similarity coefficient (DSC) (41,42), which reflects the degree of spatial coincidence between a segmentation output region $U_1$ and the corresponding ground-truth region $U_2$. This coefficient can be defined mathematically as
$$\mathrm{DSC} = \frac{2\,|U_1 \cap U_2|}{|U_1| + |U_2|}$$
where $|U|$ denotes the cardinality of the set $U$, and $\cap$ denotes set intersection. The higher the DSC value, the better the segmentation performance. The best segmentation performance is achieved when the DSC has a value of 1 (i.e., when the $U_1$ and $U_2$ sets are identical), while the worst performance is indicated by a DSC value of 0 (when the $U_1$ and $U_2$ sets are mutually disjoint).
In addition to the DSC metric, we also employed 5 metrics suggested by the Medical Image Computing and Computer-Assisted Intervention (MICCAI) society. These metrics were the VOE, the RVD, the ASD, the RMSD, and the MSD (5,43):
$$\mathrm{VOE} = 1 - \frac{|U_1 \cap U_2|}{|U_1 \cup U_2|}$$
$$\mathrm{RVD} = \frac{|U_1| - |U_2|}{|U_2|}$$
$$\mathrm{ASD} = \frac{1}{|S_1| + |S_2|}\left(\sum_{v \in S_1} d(v, S_2) + \sum_{v \in S_2} d(v, S_1)\right)$$
$$\mathrm{RMSD} = \sqrt{\frac{1}{|S_1| + |S_2|}\left(\sum_{v \in S_1} d^2(v, S_2) + \sum_{v \in S_2} d^2(v, S_1)\right)}$$
$$\mathrm{MSD} = \max\left\{\max_{v \in S_1} d(v, S_2),\ \max_{v \in S_2} d(v, S_1)\right\}$$
where $S_1$ and $S_2$ denote the boundaries of the two regions $U_1$ and $U_2$, respectively. Also, for any pixel $v$, $d(v, S)$ denotes the shortest distance between $v$ and $S$, i.e., $d(v, S) = \min_{u \in S} \|v - u\|$, where $\|\cdot\|$ is the Euclidean distance.
The lower the value of each of the VOE, ASD, RMSD, and MSD, the better the segmentation performance. As the RVD metric can be negative (in the case of under-segmentation), the absolute value of this metric was used for evaluating the segmentation performance. The smaller the absolute RVD value, the better the segmentation performance. In general, each of these 5 metrics has a value of 0 for perfect segmentation.
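As a worked illustration of these metrics, the sketch below computes them for a pair of 2D binary masks with NumPy. It is a simplified stand-in rather than the evaluation code of this study: it assumes non-empty masks, unit pixel spacing, a 4-neighbour definition of the region boundary, and values reported as fractions rather than percentages. All function names are hypothetical.

```python
import numpy as np


def overlap_metrics(seg, gt):
    """Return (DSC, VOE, RVD) for two non-empty binary masks."""
    seg, gt = np.asarray(seg, dtype=bool), np.asarray(gt, dtype=bool)
    inter = np.logical_and(seg, gt).sum()
    union = np.logical_or(seg, gt).sum()
    dsc = 2.0 * inter / (seg.sum() + gt.sum())
    voe = 1.0 - inter / union
    rvd = (seg.sum() - gt.sum()) / gt.sum()  # negative for under-segmentation
    return dsc, voe, rvd


def boundary_pixels(mask):
    """Coordinates of mask pixels with at least one 4-neighbour outside the mask."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior).astype(float)


def surface_metrics(seg, gt):
    """Return (ASD, RMSD, MSD) using brute-force boundary-to-boundary distances."""
    s1, s2 = boundary_pixels(seg), boundary_pixels(gt)
    pairwise = np.linalg.norm(s1[:, None, :] - s2[None, :, :], axis=2)
    d12 = pairwise.min(axis=1)  # each seg boundary pixel to the gt boundary
    d21 = pairwise.min(axis=0)  # each gt boundary pixel to the seg boundary
    n = d12.size + d21.size
    asd = (d12.sum() + d21.sum()) / n
    rmsd = np.sqrt(((d12 ** 2).sum() + (d21 ** 2).sum()) / n)
    msd = max(d12.max(), d21.max())
    return asd, rmsd, msd


# Two 4 x 4 squares, the second shifted by one pixel along x.
seg = np.zeros((10, 10), dtype=bool)
gt = np.zeros((10, 10), dtype=bool)
seg[2:6, 2:6] = True
gt[2:6, 3:7] = True
print(overlap_metrics(seg, gt))
print(surface_metrics(seg, gt))
```

The brute-force pairwise distance matrix is only practical for small boundaries; for full CT volumes, a distance transform would be the usual choice.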
Results
Experimental data and environment
In our experiments, we used a CentOS 7 system with an Intel Core i7 processor, 128 GB of memory, and an NVIDIA RTX 8000 graphics card with 48 GB of GPU memory. We carried out the algorithm implementation and simulation in Python 3.6. For system training and testing, we exploited the liver segmentation dataset of the Codalab competition (https://competitions.codalab.org/). This dataset contains 723 sequences of enhanced CT with a resolution of 512×512 (39,40) and has 3-phase ground-truth data of liver CT images. The total CT image count was 3,034 after accounting for liver image rotations. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
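The rotation-based augmentation mentioned for the dataset can be sketched as follows. The rotation angles actually used are not stated in the text, so this sketch assumes rotations by multiples of 90°, which keep the 512×512 grid intact; the function names are hypothetical. The key point is that each image and its ground-truth mask must be rotated together.

```python
import numpy as np


def rotate_pair(image, mask, quarter_turns):
    """Rotate an image and its label mask together by quarter_turns * 90 degrees."""
    return np.rot90(image, quarter_turns), np.rot90(mask, quarter_turns)


def augment_with_rotations(image, mask, quarter_turns=(1, 2, 3)):
    """Return the original pair followed by its rotated copies."""
    pairs = [(image, mask)]
    for k in quarter_turns:
        pairs.append(rotate_pair(image, mask, k))
    return pairs


# Toy slice and mask standing in for a 512 x 512 CT image and its liver label.
rng = np.random.default_rng(0)
image = rng.random((512, 512))
mask = np.zeros((512, 512), dtype=np.uint8)
mask[100:300, 150:400] = 1
pairs = augment_with_rotations(image, mask)
print(len(pairs), pairs[1][0].shape)
```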
Parameter settings
In this study, the enhanced Mask R-CNN architecture was trained for liver segmentation. Reasonable parameter values were selected to speed up training and prevent overfitting. The specific parameter settings employed herein are shown in Table 1.
Table 1
Learning parameters for the enhanced Mask R-CNN architecture
Experiments were conducted to compare the performance of the conventional Mask R-CNN against that of the enhanced Mask R-CNN, which employed sequence information and a graph-cut function; each experiment was technically repeated 12 times. Six slices of liver images (with normal and pathological cases) were selected to investigate the effect of using sequence information and the graph-cut function in improving the Mask R-CNN segmentation outputs. Figure 4 gives a comparison of the results, which are shown in red contours.
Figure 4
A comparison of segmentation results between the enhanced and conventional Mask R-CNN algorithms. Mask R-CNN, mask region-based convolutional neural network.
As shown in Figure 4, the conventional Mask R-CNN method clearly missed lesion areas in adjacent greyscale liver regions during liver slice processing. This method also misjudged small liver regions, resulting in mis-segmentation. Our proposed method managed to correct these mis-segmentation problems by incorporating the sequence aspect ratio information. Our method could, to a certain extent, improve segmentation accuracy and robustness for each slice in the sequence. This is a clear advantage of our enhanced Mask R-CNN over the conventional Mask R-CNN.
As shown in Table 2, the experimental results of the enhanced Mask R-CNN were evaluated in terms of the following indicators: the average precision (AP), the false-positive rate (FPR) (also called the complemented specificity), and the detection rate (DR). The enhanced algorithm had relatively high AP and FPR values, as well as very low temporal complexity. This performance can be ascribed to the training adequacy and the relative roughness of the output boundary, though there were still mis-segmentation or over-segmentation errors.
Table 2
Comparison of segmentation algorithms before and after enhancements
Mask R-CNN, mask region-based convolutional neural network; AP, average precision; FPR, false-positive rate; DR, detection rate.
Further experiments were carried out to assess the impact of sequence information and the graph-cut function on improving the Mask R-CNN method in comparison to the FCN-8s algorithm (44), the 2D-dense-FCN algorithm (45), and the U-Net algorithm (46). Parameter settings for these networks are given in , and the performance was evaluated using the DSC, VOE, RVD, ASD, RMSD, and MSD metrics. A comparison of the segmentation results of the 4 algorithms is shown in Figure 5. The test data for this comparison included slices with large, medium, and small liver regions.
Figure 5
A comparison of the segmentation results between the algorithms of FCN-8s, U-Net, 2D-dense-FCN, and enhanced Mask R-CNN. FCN, fully convolutional network; Mask R-CNN, mask region-based convolutional neural network.
The results in Figure 5 show that for the 6 liver slices, some under-segmentation or over-segmentation errors were made by the FCN-8s, U-Net, and 2D-dense-FCN algorithms. Also, some of the segmented liver slices did not exhibit complete boundaries, while others had extraneous parts that did not belong to the original liver slices. However, our enhanced Mask R-CNN method was able to produce more solid boundaries and return segmented liver slices with no extra holes.
Table 3 indicates that the enhanced Mask R-CNN method significantly outperformed the other algorithms, except for the U-Net method, which shows a better VOE, and the 2D-dense-FCN method, which shows a better RVD (but less stability) compared to our method. However, our algorithm clearly outperformed the U-Net algorithm for all other indicators. In addition, our algorithm showed superior performance for all 5 indicators in comparison to all the other algorithms. The relatively large ASD, RMSD, and MSD values for both the U-Net and FCN-8s algorithms indicate large differences between the liver segmentation results and the corresponding ground-truth regions.
Table 3
A comparison of 4 liver segmentation algorithms based on 6 metrics
Automatic algorithms for liver segmentation in CT images seek to handle peripheral organs and the large inter-personal differences in liver characteristics (47). Filtering a certain number of anchors can greatly improve accuracy and reduce time consumption. To address the weaknesses of the conventional Mask R-CNN algorithm, we proposed a novel enhanced Mask R-CNN algorithm. Specifically, we augmented the conventional method with rotation angle adjustment and filtered out a certain anchor ratio. We also used the enhanced Mask R-CNN for liver slice segmentation as well as the creation of a probability map. Our proposed solution enhanced the Mask R-CNN algorithm by incorporating the advantages of the k-NN methodology. In addition, our solution improved the segmentation accuracy and robustness using the rotation information obtained from liver image sequences. While we focused on enhancing and employing the Mask R-CNN algorithm for liver segmentation in this study, the methodology can be extended to enhance the segmentation outcomes for other organs (48,49). For the same CT images, this model can also be applied to segment other types of bodily tissue, provided that the parameters are adjusted appropriately.