
An improved parallel fuzzy connected image segmentation method based on CUDA.

Liansheng Wang¹, Dong Li¹, Shaohui Huang².

Abstract

PURPOSE: The fuzzy connectedness (FC) method is effective for extracting fuzzy objects from medical images. However, when FC is applied to large medical image datasets, its running time becomes very long. A parallel CUDA version of FC (CUDA-kFOE) was therefore proposed by Zhuge et al. to accelerate the original FC. Unfortunately, CUDA-kFOE does not consider the edges between GPU blocks, which causes miscalculation of edge points. In this paper, an improved algorithm is proposed that adds a correction step for the edge points. The improved algorithm greatly enhances the calculation accuracy.
METHODS: The improved method is iterative. In the first iteration, the affinity computation strategy is changed and a lookup table is employed to reduce memory use. In the second iteration, the voxels that are erroneous because of asynchronism are updated again.
RESULTS: Three CT sequences of hepatic vessels of different sizes were used in the experiments, each with a different seed. An NVIDIA Tesla C2075 was used to evaluate the improved method on these three data sets. Experimental results show that the improved algorithm achieves faster segmentation than the CPU version and higher accuracy than CUDA-kFOE.
CONCLUSIONS: The calculation results were consistent with the CPU version, which demonstrates that the method corrects the edge point calculation error of the original CUDA-kFOE. The proposed method has a comparable time cost and fewer errors than the original CUDA-kFOE, as demonstrated in the experimental results. In the future, we will focus on automatic acquisition methods and automatic processing.


Keywords: CUDA; Fuzzy connectedness; Vessel segmentation


Year: 2016 PMID: 27175785 PMCID: PMC4866034 DOI: 10.1186/s12938-016-0165-2

Source DB: PubMed Journal: Biomed Eng Online ISSN: 1475-925X Impact factor: 2.819

Background

Vessel segmentation is important for the evaluation of vascular-related diseases and has applications in surgical planning. Vascular structure is a reliable landmark for localizing a tumor, especially in liver surgery. Accurately extracting the liver vessels from CT slices in real time is therefore a key factor in preliminary examination and hepatic surgical planning. In recent years, many methods of vascular segmentation have been proposed. For example, Gooya et al. [1] proposed a level-set based geometric regularization method for vascular segmentation. Yi et al. [2] used a locally adaptive region growing algorithm to segment vessels. Jiang et al. [3] employed a region growing method based on spectrum information to perform vessel segmentation. In 1996, Udupa et al. [4] introduced a theory of fuzzy objects for n-dimensional digital spaces based on a notion of fuzzy connectedness of image elements, and presented algorithms for extracting a specified fuzzy object and identifying all fuzzy objects present in the image data. Many medical applications of fuzzy connectedness have been proposed, including multiple abdominal organ segmentation [5], tumor segmentation [6], vascular segmentation in the liver, and so on. Based on the fuzzy connectedness algorithm, Harati et al. [6] developed a fully automatic and accurate method for tumor region detection and segmentation in brain MR images. Liu et al. [7] presented a method for brain tumor volume estimation via MR imaging and fuzzy connectedness. However, as the size of medical data increases, the sequential FC algorithm, which depends on the sequential performance of the CPU, becomes very time-consuming. Meanwhile, parallel technology has developed in many domains, such as high-throughput DNA sequence alignment using GPUs [8] and accelerating advanced MRI reconstructions on GPUs [9]. Some researchers have therefore proposed parallel implementations of FC.
An OpenMP-based FC was proposed in 2008, in which the authors adapted a sequential fuzzy segmentation algorithm to multiprocessor machines [10]. Thereafter, Zhuge et al. [11] presented the CUDA-kFOE algorithm, which is based on NVIDIA's compute unified device architecture (CUDA) platform. CUDA-kFOE computes the fuzzy affinity relations and the fuzzy connectedness relations as CUDA kernels and executes them on the GPU. The authors improved their method in 2011 [12] and 2013 [13]. However, their methods have a high computational cost because they are iterative and lack interblock communication on the GPU [13]. In this paper, we propose a novel solution to the limited communication capability between threads of different blocks. The purpose of our study is to improve the implementation of CUDA-kFOE and to enhance the calculation accuracy on the GPU with CUDA. The main contributions of the proposed method are twofold. Firstly, the improved method does not need a large amount of memory for large data sets, since we use a lookup table. Secondly, the voxels that are erroneous because of asynchronism are updated again and corrected in the last iteration of the proposed method. The paper is organized as follows. In the "Background" section, we first summarize the literature on fuzzy connectedness and the CPU-based FC algorithms. A brief description of fuzzy connectedness and the original CUDA-kFOE is then presented in the "Fuzzy connectedness and CUDA executing model" and "Previous work" sections, respectively. The proposed improved CUDA-kFOE is explained in the "Methods" section. The experiments and conclusion are given in the "Results and discussion" and "Conclusions" sections, respectively.

Fuzzy connectedness and CUDA executing model

Fuzzy connectedness

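The fuzzy connectedness segmentation method [14] was first proposed by Udupa et al. in 1996. The idea of the algorithm is to separate the target from the background by comparing the connectivity of seed points in the target area and in the background area. Let $X$ be any reference set. A fuzzy subset $A$ of $X$ is a set of ordered pairs,

$$A = \{(x, \mu_A(x)) \mid x \in X\}, \quad (1)$$

where $\mu_A : X \to [0, 1]$ is the membership function of $A$ in $X$. A fuzzy relation $\rho$ in $X$ is a fuzzy subset of $X \times X$, $\rho = \{((x, y), \mu_\rho(x, y)) \mid x, y \in X\}$, where $\mu_\rho : X \times X \to [0, 1]$. In addition, $\rho$ is reflexive if $\mu_\rho(x, x) = 1$ for all $x \in X$; $\rho$ is symmetric if $\mu_\rho(x, y) = \mu_\rho(y, x)$ for all $x, y \in X$; $\rho$ is transitive if $\mu_\rho(x, z) = \max_{y \in X} \min(\mu_\rho(x, y), \mu_\rho(y, z))$ for all $x, z \in X$. Let $\mathcal{C} = (C, f)$ be a scene of $(Z^n, \alpha)$; if a fuzzy relation $\kappa$ in $\mathcal{C}$ is reflexive and symmetric, $\kappa$ is said to be a fuzzy spel affinity in $\mathcal{C}$. We define $\mu_\kappa$ as

$$\mu_\kappa(c, d) = \mu_\alpha(c, d)\sqrt{g_1(f(c), f(d))\, g_2(f(c), f(d))}, \quad (2)$$

where $g_1$ and $g_2$ are Gaussian functions of the mean intensity and of the intensity difference of $c$ and $d$, respectively. The mean and variance of $g_1$ are computed from the intensity of the objects surrounded in the fuzzy scene, and $g_2$ is a zero-mean Gaussian.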
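As a rough illustration of the affinity described above, the following Python sketch evaluates an affinity for a pair of voxels. It assumes a geometric-mean combination of an object-feature Gaussian of the mean intensity and a zero-mean Gaussian of the intensity difference; the function names and the parameters `obj_mean`, `obj_sigma` and `diff_sigma` are hypothetical and are not taken from the paper.

```python
import math

def gaussian(x, mean, sigma):
    """Unnormalised Gaussian with peak value 1 at x == mean."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))

def affinity(fc, fd, adjacent, obj_mean, obj_sigma, diff_sigma):
    """Affinity of two voxels with intensities fc and fd (assumed form)."""
    if not adjacent:
        return 0.0  # non-adjacent voxels are given zero affinity
    g1 = gaussian((fc + fd) / 2.0, obj_mean, obj_sigma)  # object-feature term
    g2 = gaussian(abs(fc - fd), 0.0, diff_sigma)         # zero-mean homogeneity term
    return math.sqrt(g1 * g2)

# Hypothetical parameter values, chosen only for illustration.
same = affinity(120, 120, True, obj_mean=120, obj_sigma=15, diff_sigma=10)
edge = affinity(120, 60, True, obj_mean=120, obj_sigma=15, diff_sigma=10)
print(same, edge)
```

Two voxels at the object mean receive the maximum affinity, while a pair that straddles a strong intensity step receives a value close to zero. The function is symmetric in its two intensities, as a fuzzy spel affinity must be.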

CUDA executing model

The basic strategy of CUDA is for all computing threads to run concurrently in logic. In practice, a task is divided into thread blocks according to the resources of the particular CUDA device, and the GPU automatically distributes the blocks to the streaming multiprocessors (SMs). Figure 1 shows how blocks are mapped from the software level to the hardware level. In this procedure, all SMs run in parallel and independently, which means that blocks in different SMs do not execute synchronization instructions with each other [15].
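To make the block decomposition concrete, the short Python sketch below computes how many blocks are needed to cover a 3D scene and which block a voxel falls into. The block shape of 8 x 8 x 8 threads is a hypothetical choice for illustration only; the text does not state the launch configuration that was used.

```python
def grid_dims(scene, block):
    """Number of blocks per axis needed to cover the scene (ceiling division)."""
    return tuple((s + b - 1) // b for s, b in zip(scene, block))

def block_of(voxel, block):
    """Index of the block that contains the given voxel."""
    return tuple(v // b for v, b in zip(voxel, block))

def on_block_edge(voxel, block):
    """True if the voxel lies on the first or last layer of its block on any axis."""
    return any(v % b in (0, b - 1) for v, b in zip(voxel, block))

block = (8, 8, 8)                       # hypothetical block shape
dims = grid_dims((512, 512, 131), block)
seed_block = block_of((166, 224, 88), block)
print(dims, seed_block, on_block_edge((166, 224, 88), block))
```

Voxels for which `on_block_edge` is true have at least one neighbor in another block, and these are the voxels for which the lack of synchronization between blocks matters.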

Fig. 1

Automatic scalability in CUDA [17]

Previous work

This section briefly introduces the CUDA-kFOE algorithm proposed by Ying Zhuge et al., in which kFOE is well parallelized. The CUDA-kFOE algorithm consists of two parts. (1) Affinity computation. Eq. (2) is used to compute the affinity of a voxel pair (c, d), and the result is stored in dedicated GPU device memory. (2) Updating fuzzy connectivity. Computing the fuzzy connectivity is by nature a single-source shortest-path (SSSP) problem, and parallelizing SSSP is challenging. Fortunately, the CUDA-based SSSP algorithm proposed by Harish and Narayanan solves the problem [16]. With the computing capability of Eq. (2), atomic operations are employed to resolve the conflicts that arise when multiple threads access the same address, which basically achieves SSSP parallelization; the algorithm is presented in [11].
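The link to SSSP can be illustrated with a sequential reference: the strength of a path is the smallest affinity along it, and the connectedness of a voxel is the largest strength over all paths from the seed. The sketch below is a Dijkstra-style Python version of this max-min computation on a tiny hand-made graph. It is not the CUDA-kFOE kernel, and the graph and its affinity values are invented for illustration.

```python
import heapq

def fuzzy_connectedness(affinity, seed):
    """affinity: dict node -> dict neighbour -> affinity in [0, 1]."""
    conn = {node: 0.0 for node in affinity}
    conn[seed] = 1.0
    heap = [(-1.0, seed)]  # max-heap emulated with negated strengths
    while heap:
        neg, c = heapq.heappop(heap)
        strength = -neg
        if strength < conn[c]:
            continue  # stale heap entry, a stronger path was already found
        for d, a in affinity[c].items():
            cand = min(strength, a)  # path strength = weakest link
            if cand > conn[d]:       # keep the strongest path
                conn[d] = cand
                heapq.heappush(heap, (-cand, d))
    return conn

# Hypothetical four-node graph with symmetric affinities.
graph = {
    "s": {"a": 0.9, "b": 0.4},
    "a": {"s": 0.9, "t": 0.5},
    "b": {"s": 0.4, "t": 0.8},
    "t": {"a": 0.5, "b": 0.8},
}
conn = fuzzy_connectedness(graph, "s")
print(conn)
```

Node `b` ends with a connectedness of 0.5 rather than its direct affinity of 0.4, because the longer path through `a` and `t` has a stronger weakest link.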

Methods

Performance analysis and improvement

In the first step of the CUDA-kFOE algorithm, a large amount of memory must be allocated to store the six-adjacent affinities when processing large CT series. In addition, CUDA-kFOE suffers from errors in some voxels because different blocks are hard to execute synchronously. To overcome these drawbacks, this section proposes an improved double-iteration method that is easy to implement and more accurate. The main advantages of the improved method are as follows.

1. The proposed algorithm needs less memory than CUDA-kFOE when processing large data sets. (We change the affinity computation strategy by using a lookup table to reduce memory use.)
2. The proposed algorithm does not need the CPU to handle extra computation and therefore achieves more accurate results. (The main idea is to process the voxels that are erroneous because of asynchronism twice: those voxels are processed again in the last iteration.)

Let us analyze the performance of CUDA-kFOE. Consider a single seed that starts the CUDA-kFOE algorithm, with the fuzzy scene computed breadth-first. Figure 2 illustrates the processing of edge points, where red points are points whose neighbors need to be updated and blue points are points being updated. If the red points propagate fuzzy affinity outward, a race problem is triggered when red points reach a block's edge, because the fuzzy affinity must then be propagated between different blocks. The outward propagation from the seed point has a tree shape, so a path never forms a cycle. The calculation can therefore be seen as the generation of a tree whose root is the seed point.

Fig. 2

Illustration of edge points processing situation. Red points mean that their neighboring points need to be updated. Blue points mean that they are being updated

In Fig. 2, pixels 1, (2, 4), 3 and 5 are located in different thread blocks. Pixels 1, 2 and 3 are in the (c) array, and pixels 4 and 5 are updated points that are neighbors of pixel 2. Consider the worst situation: because thread blocks run in no fixed order, pixel 5 is influenced by pixels 2 and 3 together. The running orders fall into six situations, (a) to (f). Because updating pixel 5 only requires selecting the maximum fuzzy affinity between pixels 1 and 2, the order in situations (a) and (b) does not influence the propagation result of fuzzy affinity. Situations (a) and (b) therefore do not generate errors because of thread block asynchrony. In situations (c) and (d), if pixel 1 does not influence the values of pixels 2 and 3, the results are the same as in situations (a) and (b). However, if pixel 1 influences pixel 2 or 3, pixel 5 is influenced by the update of pixels 2 and 3. In this case, depending on which update runs first, the new value may not reach pixel 5, and pixel 5 then cannot compute the correct value. We can therefore run a correction iteration to propagate the correct value of pixel 1. Double iterations solve the problem of situations (c) and (d). In situations (e) and (f), pixels cross three thread blocks. The situation is the same as (c) and (d), so triple iterations solve the asynchronous problem.
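The memory argument can be made concrete with a small Python sketch. The text does not describe what the lookup table contains, so this is only one plausible reading: the two Gaussian terms of the affinity are tabulated over integer intensity sums and differences, and affinities are computed on demand instead of being stored per voxel. The 12-bit intensity range, the 4-byte storage per affinity and all parameter values are assumptions made for illustration.

```python
import math

def build_tables(max_intensity, obj_mean, obj_sigma, diff_sigma):
    """Tabulate both Gaussian terms over all integer sums and differences."""
    g1 = [math.exp(-((s / 2.0 - obj_mean) ** 2) / (2.0 * obj_sigma ** 2))
          for s in range(2 * max_intensity + 1)]   # indexed by f(c) + f(d)
    g2 = [math.exp(-(d ** 2) / (2.0 * diff_sigma ** 2))
          for d in range(max_intensity + 1)]       # indexed by |f(c) - f(d)|
    return g1, g2

def affinity_from_tables(fc, fd, g1, g2):
    """Affinity of two adjacent voxels using only table lookups."""
    return math.sqrt(g1[fc + fd] * g2[abs(fc - fd)])

g1, g2 = build_tables(max_intensity=4095, obj_mean=120, obj_sigma=15, diff_sigma=10)

# Cost of storing six affinities per voxel for the largest scene in Table 1,
# assuming 4 bytes per affinity.
voxels = 512 * 512 * 576
stored_bytes = voxels * 6 * 4
stored_gib = stored_bytes / 2 ** 30
table_entries = len(g1) + len(g2)
print(stored_gib, table_entries)
```

Under these assumptions, precomputed six-adjacent affinities for the largest scene would occupy about 3.4 GiB, whereas the two tables hold only a few thousand entries.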

Improved algorithm and implementation

The flow chart of improved GPU implementation is illustrated in Fig. 3, which is modified from Ref. [13]. The pseudo code of the proposed method is given in the following algorithm.

Fig. 3

The flow chart of improved CUDA-kFOE

As shown in the procedure of the algorithm, improved CUDA-kFOE is an iterative algorithm. In the first iteration, only one voxel participates in computing affinity and updating the six-adjacent connectivity. As the number of iterations increases, more and more voxels are computed in parallel until no thread performs any update operation, which means that the update flag of every voxel is false. In step 6 of the improved CUDA-kFOE algorithm, we use an atomic operation for consistency [16], since more than one thread may access the same address simultaneously in the update operation. In addition, the edges of different blocks cannot be easily controlled, which may cause erroneous values for the voxels at the edges of blocks. We therefore use two iterations to solve the problem.
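The iteration pattern described above can be emulated sequentially. The Python sketch below starts from a single seed, lets every flagged voxel try to update its neighbors, and stops when no flag is set. It does not model atomic operations, thread blocks or the second correction iteration; the `flags` dictionary and the small graph are hypothetical stand-ins for the per-voxel update flags and the six-adjacent scene.

```python
def propagate(affinity, seed):
    """Frontier-based propagation; stops when no voxel was updated."""
    conn = {v: 0.0 for v in affinity}
    conn[seed] = 1.0
    flags = {v: False for v in affinity}
    flags[seed] = True
    iterations = 0
    while any(flags.values()):
        frontier = [v for v, flagged in flags.items() if flagged]
        flags = {v: False for v in affinity}
        for c in frontier:
            for d, a in affinity[c].items():
                cand = min(conn[c], a)
                if cand > conn[d]:
                    # On a GPU this compare-and-store would have to be atomic,
                    # because several threads may target the same voxel d.
                    conn[d] = cand
                    flags[d] = True
        iterations += 1
    return conn, iterations

# Hypothetical four-node graph with symmetric affinities.
graph = {
    "s": {"a": 0.9, "b": 0.4},
    "a": {"s": 0.9, "t": 0.5},
    "b": {"s": 0.4, "t": 0.8},
    "t": {"a": 0.5, "b": 0.8},
}
conn, iterations = propagate(graph, "s")
print(conn, iterations)
```

In the first pass only the seed is active, and the set of active voxels grows and then shrinks as values settle, which mirrors the behavior described for the GPU version.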

Results and discussion

In the experiments, the accuracy of the proposed method is evaluated by comparison with the original CUDA-kFOE and the CPU version of FC under the same conditions. The CPU version source code of fuzzy connectedness is from the Insight Segmentation and Registration Toolkit (ITK). The experiments use a DELL Precision WorkStation T7500 Tower computer equipped with two quad-core 2.93 GHz Intel Xeon X5674 CPUs. It runs Windows 7 (64 bit) with 48 GB of memory. We use an NVIDIA Quadro 2000 for display and an NVIDIA Tesla C2075 for computing. The NVIDIA Tesla C2075 is equipped with 6 GB of memory and 14 multiprocessors, each of which consists of 32 CUDA cores. Table 1 shows the data sets used in the experiments and the running time and accuracy of the CPU version, the original GPU version and the improved GPU version. Error points are defined as the points that differ between the CPU version and the GPU version, and they are displayed in a new image.
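The error-point measure can be sketched in a few lines of Python. The sketch assumes both results are available as flat sequences of connectedness values in the same voxel order; the function name and the optional tolerance `tol` are hypothetical, and the sample values are invented.

```python
def error_points(cpu_scene, gpu_scene, tol=0.0):
    """Indices of voxels whose CPU and GPU values differ by more than tol."""
    if len(cpu_scene) != len(gpu_scene):
        raise ValueError("scenes must have the same number of voxels")
    return [i for i, (a, b) in enumerate(zip(cpu_scene, gpu_scene))
            if abs(a - b) > tol]

# Invented example values: only the voxel at index 1 differs.
cpu_scene = [1.0, 0.5, 0.2, 0.0]
gpu_scene = [1.0, 0.4, 0.2, 0.0]
errors = error_points(cpu_scene, gpu_scene)
print(len(errors), errors)
```

The returned indices can be written into a blank volume to obtain a difference image, and their count corresponds to the number of error points.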

Table 1

Experimental data set and performance comparison of original and improved CUDA-kFOE

Dataset	Small	Medium	Large
Seed position	(166, 224, 88)	(189, 245, 175)	(220, 217, 497)
Scene domain	512 * 512 * 131	512 * 512 * 261	512 * 512 * 576
Voxel size (mm³)	0.69 * 0.69 * 1.0	0.70 * 0.70 * 1.0	0.87 * 0.87 * 0.8
CPU time (s)	386	783	1157
Original GPU time (s)	6.5	15.5	39.9
Error points (original)	1169	4800	736
Improved GPU time (s)	7.2	16.8	41.9
Error points (improved)	0	1	0
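The timings in Table 1 can be turned into speed-up and overhead figures with the following Python sketch. The numbers are copied from the table; the dictionary names are chosen here for illustration.

```python
# Running times in seconds, copied from Table 1.
cpu = {"small": 386.0, "medium": 783.0, "large": 1157.0}
original_gpu = {"small": 6.5, "medium": 15.5, "large": 39.9}
improved_gpu = {"small": 7.2, "medium": 16.8, "large": 41.9}

# Speed-up of the improved GPU version over the CPU version.
speedup = {k: cpu[k] / improved_gpu[k] for k in cpu}

# Extra time of the improved version relative to the original GPU version, in percent.
overhead_pct = {k: 100.0 * (improved_gpu[k] - original_gpu[k]) / original_gpu[k]
                for k in cpu}

for k in cpu:
    print(k, round(speedup[k], 1), round(overhead_pct[k], 1))
```

On these figures the improved version remains roughly 28 to 54 times faster than the CPU version, while costing about 5 to 11% more time than the original CUDA-kFOE.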

Figure 4a shows the result of the original CUDA-kFOE in one slice, and Fig. 4b shows the result of the improved CUDA-kFOE. The result of the original CUDA-kFOE contains error points compared with our improved one. We mark one region with a red rectangle in the results to demonstrate the error points. The region is enlarged in the upper-left corner of the results, where pixels can clearly be seen to be missing in the result of the original CUDA-kFOE compared with the improved one.

Fig. 4

a The result of original CUDA-kFOE, b the result of improved CUDA-kFOE

Figure 5 compares the performance of the original CUDA-kFOE and the improved one on data sets of different sizes. In each row, column (a) shows one slice of the original CT series; columns (b) and (c) show the original fuzzy scenes and the threshold segmentation result, respectively; column (d) shows the points that differ between the original GPU version and the CPU version. The three rows correspond to the three data set sizes. The results show that the larger the vascular structure, the more differing points are generated.

Fig. 5

a One slice of the original CT series; b original fuzzy scenes; c threshold segmentation result; d differing points. Images in column a are in cross-sectional view. Columns b, c, and d are in longitudinal view of the -Y direction.

In addition, the improved method is further evaluated in different iteration directions, as shown in Table 2. The results are also visualized in Fig. 6. They show higher accuracy and fewer error points when more adjacent edges are chosen during iterations.

Table 2

Error points of the improved method in different iteration directions

Direction	0	1	2	3	4
Small	1187	897	348	164	2
Medium	4800	3868	880	578	1
Large	693	619	254	30	0

Fig. 6

Error points of the improved method in different iteration directions

The time cost of each iteration direction is shown in Fig. 7. For each data set, the time cost changes only slightly as the number of iteration directions increases, because in the proposed two-iteration method most points reach their correct values and only a few threads participate in the re-computing step.

Fig. 7

Time cost (Data 1: small, Data 2: medium, Data 3: large)

Conclusions

In this study, we proposed an improved CUDA-kFOE to overcome the drawbacks of the original one. The improved CUDA-kFOE uses two iterations and has two advantages. Firstly, the improved method does not need a large amount of memory for large data sets, since we use a lookup table. Secondly, the voxels that are erroneous because of asynchronism are updated again in the last iteration of the improved CUDA-kFOE. To evaluate the proposed method, three data sets of different sizes were used. The improved CUDA-kFOE has a comparable time cost and fewer errors than the original one, as demonstrated in the experiments. In the future, we will study automatic acquisition methods and fully automatic processing.

10 in total

1. Multiple abdominal organ segmentation: an atlas-based fuzzy connectedness approach.

Authors: Yongxin Zhou; Jing Bai
Journal: IEEE Trans Inf Technol Biomed Date: 2007-05

2. Parallel Fuzzy Segmentation of Multiple Objects.

Authors: Edgar Garduño; Gabor T Herman
Journal: Int J Imaging Syst Technol Date: 2008 Impact factor: 2.000

3. A variational method for geometric regularization of vascular segmentation in medical images.

Authors: Ali Gooya; Hongen Liao; Kiyoshi Matsumiya; Ken Masamune; Yoshitaka Masutani; Takeyoshi Dohi
Journal: IEEE Trans Image Process Date: 2008-08 Impact factor: 10.856

4. A system for brain tumor volume estimation via MR imaging and fuzzy connectedness.

Authors: Jianguo Liu; Jayaram K Udupa; Dewey Odhner; David Hackney; Gul Moonis
Journal: Comput Med Imaging Graph Date: 2005-01-24 Impact factor: 4.790

5. Parallel fuzzy connected image segmentation on GPU.

Authors: Ying Zhuge; Yong Cao; Jayaram K Udupa; Robert W Miller
Journal: Med Phys Date: 2011-07 Impact factor: 4.071

6. Fully automated tumor segmentation based on improved fuzzy connectedness algorithm in brain MR images.

Authors: Vida Harati; Rasoul Khayati; Abdolreza Farzan
Journal: Comput Biol Med Date: 2011-05-23 Impact factor: 4.589

7. GPU-based relative fuzzy connectedness image segmentation.

Authors: Ying Zhuge; Krzysztof C Ciesielski; Jayaram K Udupa; Robert W Miller
Journal: Med Phys Date: 2013-01 Impact factor: 4.071

8. Accelerating Advanced MRI Reconstructions on GPUs.

Authors: S S Stone; J P Haldar; S C Tsao; W-M W Hwu; B P Sutton; Z-P Liang
Journal: J Parallel Distrib Comput Date: 2008-10 Impact factor: 3.734

9. GPU accelerated fuzzy connected image segmentation by using CUDA.

Authors: Ying Zhuge; Yong Cao; Robert W Miller
Journal: Conf Proc IEEE Eng Med Biol Soc Date: 2009

10. A region growing vessel segmentation algorithm based on spectrum information.

Authors: Huiyan Jiang; Baochun He; Di Fang; Zhiyuan Ma; Benqiang Yang; Libo Zhang
Journal: Comput Math Methods Med Date: 2013-11-13 Impact factor: 2.238


2 in total

1. Automated Segmentation of Tissues Using CT and MRI: A Systematic Review.

Authors: Leon Lenchik; Laura Heacock; Ashley A Weaver; Robert D Boutin; Tessa S Cook; Jason Itri; Christopher G Filippi; Rao P Gullapalli; James Lee; Marianna Zagurovskaya; Tara Retson; Kendra Godwin; Joey Nicholson; Ponnada A Narayana
Journal: Acad Radiol Date: 2019-08-10 Impact factor: 3.173

2. An Improved Fuzzy Connectedness Method for Automatic Three-Dimensional Liver Vessel Segmentation in CT Images.

Authors: Rui Zhang; Zhuhuang Zhou; Weiwei Wu; Chung-Chih Lin; Po-Hsiang Tsui; Shuicai Wu
Journal: J Healthc Eng Date: 2018-10-29 Impact factor: 2.682
