
The Comparison of Clustering Algorithms K-Means and Fuzzy C-Means for Segmentation Retinal Blood Vessels.

Abstract

INTRODUCTION: Segmentation can be approached in several ways, one of which is clustering. Clustering is widely used to segment retinal blood vessels, especially the k-means and fuzzy c-means (FCM) algorithms. Unfortunately, no study so far has compared the two methods for blood vessel segmentation, and many studies do not explain why the method was chosen. AIM: This study aims to analyze the performance of the k-means and FCM algorithms for retinal blood vessel segmentation.
METHODS: The research method has three stages: preprocessing, segmentation, and performance analysis. Preprocessing uses the green channel, contrast-limited adaptive histogram equalization (CLAHE), and a median filter. Segmentation consists of three processes: clustering, thresholding, and determining the region of interest (ROI). In the thresholding process, the threshold value is determined in two ways, using the mean or the median. The third stage analyzes performance using the area under the curve (AUC) and statistical tests.
RESULTS: The statistical tests comparing the AUC values of FCM and k-means gave p-values < 0.05 at a 95% confidence level.
CONCLUSION: Retinal vascular segmentation with the FCM method is significantly better than k-means.


Keywords: clustering; fuzzy c-means; k-means; mean; median; segmentation

Year: 2020 PMID: 32210514 PMCID: PMC7085333 DOI: 10.5455/aim.2020.28.42-47

Source DB: PubMed Journal: Acta Inform Med ISSN: 0353-8109

INTRODUCTION

Vascular segmentation is the process of separating blood vessels from the background, and it can be done with a clustering approach. Clustering groups data according to how close the data points are to each other. Clustering algorithms can be grouped into partition-based, hierarchical, and large-data methods. The most widely used are partition-based algorithms, and the best known of these are k-means and fuzzy c-means (FCM). The two algorithms take different approaches: in k-means, each data point belongs to exactly one cluster, whereas in FCM a data point can belong to every cluster with a degree of membership in the range [0, 1] (1,2). Conceptually, however, the two algorithms work in a similar way.

Many studies have used k-means and FCM for image segmentation, but they usually combine the clustering algorithm with several other methods, and the reason for choosing a particular clustering algorithm is often not given. Wiharto et al. (3) used k-means for blood vessel segmentation, identifying vessels by thresholding with a threshold computed as the mean of the resulting cluster centers; however, that study focused mainly on classifying hypertensive retinopathy as positive or negative. Mapayi et al. (4) also used k-means, combined with pre-processing and post-processing stages to maximize segmentation quality. Because the post-processing used a median filter and morphological operations, the results do not show the capability of k-means alone. Other studies used FCM for segmentation. Wiharto et al. (5) tested the effect of the number of clusters on retinal vessel segmentation. Dey et al. (6) also used FCM, but did not explain why they chose FCM or the number of clusters. The study by Mapayi et al. (7) was almost identical to that of Mapayi et al. (4), differing only in its use of FCM as the clustering method. Mapayi et al. (8) did compare segmentation methods, but only FCM + phase congruence against gray-level co-occurrence matrix (GLCM) + sum entropy, and found FCM + phase congruence to be better. Memari et al. (9) combined FCM with Gabor and Frangi filters, so their study cannot show the performance of FCM on its own. This concern is reinforced by Wiharto et al. (10), who showed that a Frangi filter combined with Otsu thresholding alone, without FCM, already gives good segmentation performance. Supot et al. (11) combined a fuzzy approach with k-means, but, like the other studies, also applied post-processing, in this case a length filter.

These studies show that the clustering methods themselves need to be compared for retinal blood vessel segmentation, to determine which gives better segmentation results. This study therefore tests the ability of k-means and FCM to segment retinal blood vessels. The method has three stages: preprocessing, segmentation, and performance analysis. Preprocessing uses the green channel, CLAHE, and a median filter. In the segmentation stage, after clustering, blood vessels are identified by thresholding, with the threshold value computed as the mean or the median of the cluster centers. Performance is measured with the area under the curve (AUC), and FCM and k-means are compared with statistical tests.

AIM

This study aims to compare the performance of the k-means and fuzzy c-means algorithms combined with the thresholding method for segmenting retinal blood vessels.

METHODS

Retinal blood vessel segmentation is tested on two datasets, DRIVE (12) and STARE (13). Each dataset consists of 20 unsegmented retinal fundus images and the corresponding segmented images. The research method, shown in Figure 1, has three stages: preprocessing, segmentation, and performance analysis. The preprocessing stage improves image quality: the retinal image is split into its three color channels and the green channel is kept for further processing. The green-channel image is then processed with CLAHE and a median filter before segmentation. The clustering methods tested are k-means and fuzzy c-means (14). Segmentation combines two methods, clustering and thresholding. The FCM algorithm outputs the cluster centers as a vector V of c clusters. The FCM algorithm is given in Pseudocode 1 (9):

Figure 1.

Research Method

Pseudocode 1: FCM algorithm
Step 1: Determine the number of clusters c and the tolerance ε.
Step 2: Initialize the cluster centers and the partition matrix U.
Step 3: Set k = 1.
Step 4: While the change in U exceeds ε, compute the cluster centers v and the membership matrix U using equations (3-4), then increment k.
Step 5: Return the cluster centers v and the membership function u.

In addition to FCM, this study tests the k-means algorithm. K-means is a partition-based clustering algorithm that represents each cluster by the mean of its members. Its stages are given in Pseudocode 2 (4).

Pseudocode 2: k-means algorithm
Step 1: Determine the number of clusters c and initialize the centroids μ1, μ2, ..., μc.
Step 2: Assign each data point to a cluster by computing the Euclidean distance using equation (5).
Step 3: Compute the new centroids using equation (6).
Step 4: Repeat steps 2 and 3 until the centroids no longer change, using the criterion in equation (7).

The cluster centers produced by k-means or FCM are then used for thresholding. The threshold value is computed as either the mean or the median of the cluster centers, and it is used to convert the grayscale image to a binary image. The binary image is then combined, by subtraction, with a mask derived from the grayscale retina to obtain the region of interest (ROI). The ROI result is evaluated using sensitivity, specificity, and area under the curve (AUC). Sensitivity measures the ability of the system to detect vessel pixels, while specificity measures its ability to detect background pixels. AUC is calculated from the sensitivity and specificity values. Performance is measured by comparing each image segmented by clustering with the corresponding segmented retinal image in the dataset.
Comparisons are made by counting true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). TP are vessel pixels correctly segmented as vessel pixels, and TN are non-vessel pixels correctly segmented as non-vessel pixels. FP are non-vessel pixels wrongly segmented as vessel pixels, and FN are vessel pixels wrongly segmented as non-vessel pixels (15). These four values are used to calculate sensitivity, specificity, and AUC with equations (8-10). The two algorithms are compared with statistical tests at a 95% confidence level to determine whether their performance differs significantly.
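The clustering-and-thresholding stage described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it clusters the 1-D pixel intensities of the preprocessed image; because equations (3)-(7) are not shown in the text, it uses the standard textbook FCM updates with an assumed fuzzifier m = 2 and stops when U changes by less than ε; it assumes vessels are darker than the background, so pixels below the threshold are labeled vessel; and the `fov` mask is a hypothetical stand-in for the ROI masking step. All function names are hypothetical.

```python
import numpy as np


def kmeans_1d(x: np.ndarray, c: int, max_iter: int = 100, seed: int = 0) -> np.ndarray:
    """Hard k-means on pixel intensities; returns the sorted cluster centers."""
    rng = np.random.default_rng(seed)
    # Random initial centroids drawn from distinct intensity values.
    centers = rng.choice(np.unique(x), size=c, replace=False).astype(float)
    for _ in range(max_iter):
        # Assign every pixel to its nearest centroid (Euclidean distance in 1-D).
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(c)])
        if np.allclose(new, centers):  # stop when the centroids no longer change
            break
        centers = new
    return np.sort(centers)


def fcm_1d(x: np.ndarray, c: int, m: float = 2.0, eps: float = 1e-5,
           max_iter: int = 300, seed: int = 0) -> np.ndarray:
    """Fuzzy c-means on pixel intensities; returns the sorted cluster centers."""
    rng = np.random.default_rng(seed)
    # Random initial partition matrix U (c x n); each column sums to 1.
    u = rng.random((c, x.size))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        um = u ** m
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        v = (um @ x) / um.sum(axis=1)
        # Distance of every pixel to every center, guarded against division by zero.
        d = np.abs(x[None, :] - v[:, None]) + 1e-12
        # Membership update: u_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        u_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.max(np.abs(u_new - u)) < eps  # stop when U changes by less than eps
        u = u_new
        if converged:
            break
    return np.sort(v)


def threshold_from_centers(centers: np.ndarray, rule: str = "median") -> float:
    """Threshold = mean or median of the cluster centers."""
    if rule == "mean":
        return float(np.mean(centers))
    if rule == "median":
        return float(np.median(centers))
    raise ValueError("rule must be 'mean' or 'median'")


def segment(gray: np.ndarray, centers: np.ndarray, rule: str, fov: np.ndarray) -> np.ndarray:
    """Binary vessel map restricted to a field-of-view mask (ROI)."""
    t = threshold_from_centers(centers, rule)
    vessels = gray < t  # ASSUMPTION: vessels are darker than the background
    return vessels & fov  # HYPOTHETICAL ROI step: drop pixels outside the retina


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gray = rng.normal(150.0, 10.0, size=(40, 40))         # synthetic background
    gray[:, 18:22] = rng.normal(90.0, 10.0, size=(40, 4))  # synthetic dark "vessel"
    fov = gray > 10.0                                      # hypothetical retina mask
    x = gray.ravel()
    for name, cluster in (("k-means", kmeans_1d), ("FCM", fcm_1d)):
        centers = cluster(x, c=4)
        for rule in ("mean", "median"):
            mask = segment(gray, centers, rule, fov)
            print(name, rule, round(threshold_from_centers(centers, rule), 1), int(mask.sum()))
```

Clustering the intensities only determines the threshold here; the binary map itself comes from comparing each pixel with that single threshold, which mirrors the thresholding step the text describes.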

RESULTS

Figure 2 shows segmentation results for the STARE dataset using k-means and FCM with 10 clusters and the median threshold method. Figure 2 (a) is a sample input retinal image and Figure 2 (b) is the output of CLAHE and the median filter. Figure 2 (c) is the combined output of k-means and thresholding, and Figure 2 (d) is the combined output of FCM and thresholding.

Figure 2.

Example outputs of the segmentation model using k-means and FCM

Figures 3 and 4 show the segmentation performance obtained with both threshold methods (mean and median) and both clustering methods (k-means and FCM), measured by AUC, a parameter that combines sensitivity and specificity.
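The AUC values behind these results are derived from the TP, TN, FP, and FN counts defined in the Methods. The sketch below is illustrative and assumption-laden: it uses the usual definitions Se = TP/(TP+FN), Sp = TN/(TN+FP), and accuracy = (TP+TN)/(TP+TN+FP+FN), and, because each segmentation yields one binary output and hence one ROC point, it assumes AUC = (Se + Sp)/2. Equations (8-10) are not shown in the text, so the authors' exact formulas may differ; `pixel_metrics` is a hypothetical name.

```python
import numpy as np


def pixel_metrics(pred, truth, fov=None):
    """Pixel-wise confusion counts and derived metrics for one segmented image."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if fov is not None:  # evaluate only inside the region of interest
        keep = np.asarray(fov, dtype=bool)
        pred, truth = pred[keep], truth[keep]
    tp = int(np.sum(pred & truth))    # vessel pixels segmented as vessel
    tn = int(np.sum(~pred & ~truth))  # background pixels segmented as background
    fp = int(np.sum(pred & ~truth))   # background pixels wrongly segmented as vessel
    fn = int(np.sum(~pred & truth))   # vessel pixels wrongly segmented as background
    total = tp + tn + fp + fn
    se = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / total if total else 0.0
    # ASSUMPTION: one binary output gives a single ROC point, so AUC = (Se + Sp) / 2.
    auc = (se + sp) / 2.0
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "sensitivity": se, "specificity": sp, "accuracy": acc, "AUC": auc}


if __name__ == "__main__":
    truth = np.array([[1, 1, 0, 0], [0, 0, 0, 0]])
    pred = np.array([[1, 0, 1, 0], [0, 0, 0, 0]])
    print(pixel_metrics(pred, truth))
```

For this toy pair the counts are TP = 1, FN = 1, FP = 1, and TN = 5, giving Se = 0.5, Sp ≈ 0.833, accuracy = 0.75, and AUC ≈ 0.667.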

Figure 3.

Effect of the number of clusters on the DRIVE dataset

Figure 4.

Effect of the number of clusters on the STARE dataset

DISCUSSION

As Figures 3 and 4 show, when the number of clusters is below 4, both k-means and FCM perform relatively poorly; performance improves once the number of clusters reaches 4, but above 4 the performance of both methods stays relatively constant. A larger number of clusters therefore does not guarantee better performance. A change in the cluster centers changes the resulting threshold value and hence the segmentation result. The threshold value also depends on how it is computed from the cluster centers: Figures 2 and 3 show that the mean and median methods give different AUC values. In k-means, the initial cluster centers are chosen randomly, and Figures 3 and 4 show that performance fluctuates considerably as the number of clusters changes. Combining k-means with algorithms that compensate for this weakness could improve the segmentation results. FCM shows similarly fluctuating performance. Whereas k-means randomly initializes the cluster centers, FCM randomly initializes the partition matrix U, whose entries are the degrees of membership of each data point in each cluster. Random initialization can cause either algorithm to converge to a local optimum. In this study, both algorithms are compared under the same conditions, i.e., with randomly initialized cluster centers (k-means) and a randomly initialized partition matrix U (FCM).
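The dependence of the threshold on random initialization and on the mean/median rule can be illustrated with a small experiment: run k-means from several random seeds and record the range of thresholds each rule produces. This is a sketch on synthetic intensities, not the paper's data; the number of clusters (6), the seeds, and the names `kmeans_1d` and `threshold_spread` are hypothetical choices for illustration.

```python
import numpy as np


def kmeans_1d(x: np.ndarray, c: int, seed: int, max_iter: int = 100) -> np.ndarray:
    """Hard k-means on intensities with random initial centroids; returns sorted centers."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(np.unique(x), size=c, replace=False).astype(float)
    for _ in range(max_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(c)])
        if np.allclose(new, centers):  # converged to a (possibly local) optimum
            break
        centers = new
    return np.sort(centers)


def threshold_spread(x: np.ndarray, c: int, seeds: range) -> dict:
    """(min, max) of mean- and median-based thresholds over several random starts."""
    thr = {"mean": [], "median": []}
    for s in seeds:
        centers = kmeans_1d(x, c, seed=s)
        thr["mean"].append(float(np.mean(centers)))
        thr["median"].append(float(np.median(centers)))
    return {rule: (min(v), max(v)) for rule, v in thr.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic intensities: a small dark "vessel" population and a broad background.
    x = np.concatenate([rng.normal(90, 12, 300), rng.normal(150, 20, 2700)])
    print(threshold_spread(x, c=6, seeds=range(10)))
```

Even with identical centers the two rules can disagree: for centers 80, 120, 140, 150, 160, and 170, the mean is about 136.7 while the median is 145, so the same clustering yields two different binary images.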
Table 1 compares the performance of k-means and FCM for segmentation on the DRIVE and STARE datasets. Based on the significance tests between the two algorithms, FCM performs significantly better than k-means. With the median threshold, FCM achieves a significantly higher AUC than k-means on both datasets (p = 0.000696 for DRIVE and p = 0.002140 for STARE); with the mean threshold, the AUC difference is not significant, although FCM has significantly higher sensitivity and accuracy on both datasets (while k-means has significantly higher specificity on STARE). Determining the threshold with the median gives better clustering-based segmentation performance than the mean. The better performance of FCM comes at the cost of additional computation time compared with k-means, as explained by Ghosh & Kumar (16), who showed that FCM takes longer to compute than k-means. This is consistent with the complexity of the two algorithms, O(n) for k-means versus O(n²) for FCM, and is further supported by Panda et al. (17).
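Table 1 reports p-values at a 95% confidence level, but the text does not name the statistical test used. As an illustration only, the sketch below swaps in a paired sign-flip permutation test on per-image scores; a paired test fits because both algorithms segment the same images. It is not necessarily the authors' test, `paired_permutation_test` is a hypothetical name, and the AUC values in the usage example are synthetic, not the paper's results.

```python
import numpy as np


def paired_permutation_test(a, b, n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided paired sign-flip permutation test on the mean per-image difference."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    rng = np.random.default_rng(seed)
    # Under H0 each per-image difference is equally likely to have either sign.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return float((np.sum(null >= observed) + 1) / (n_perm + 1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # HYPOTHETICAL per-image AUCs for 20 images; not the paper's data.
    auc_kmeans = rng.uniform(0.70, 0.78, size=20)
    auc_fcm = auc_kmeans + rng.normal(0.01, 0.005, size=20)
    p = paired_permutation_test(auc_fcm, auc_kmeans)
    print(f"p = {p:.4f}:", "significant" if p < 0.05 else "not significant", "at 95% confidence")
```

A p-value below 0.05 corresponds to a significant difference at the 95% confidence level used in Table 1.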

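Table 1.

Comparison of system performance (p-values, k-means vs FCM)

Parameter	DRIVE mean: p-value	Better	DRIVE median: p-value	Better	STARE mean: p-value	Better	STARE median: p-value	Better
Sensitivity	0.009138	FCM	0.055646	-	0.002169	FCM	0.811024	-
Specificity	0.932436	-	0.005960	FCM	0.003464	k-means	0.319630	-
Accuracy	0.002677	FCM	0.085654	-	0.002857	FCM	0.692880	-
AUC	0.405206	-	0.000696	FCM	0.066524	-	0.002140	FCM

Mean and median denote the threshold method. "Better" gives the algorithm with significantly better performance at the 95% confidence level; "-" means no significant difference (p ≥ 0.05).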

Given these conditions, the lower complexity of k-means makes it faster to compute than FCM. However, processors with Hyper-Threading Technology can execute multiple threads or instructions at the same time to improve system performance and responsiveness, which makes the difference in computation speed between k-means and FCM insignificant. Both k-means and FCM achieve AUC values in the range of 70-80%, which falls in the medium category (18). The two methods need different numbers of clusters to reach their best performance: on the DRIVE dataset, FCM performs best with 4 clusters and k-means with 6. In terms of the number of clusters, FCM therefore needs relatively less computation than k-means, but each FCM run takes longer to converge, so cumulatively k-means reaches its best performance faster than FCM. In terms of the best performance achieved, FCM is significantly better than k-means. The advantage of FCM over k-means for segmentation is also supported by Dehariya et al. (19), who concluded that the fuzzy-based k-means algorithm can outperform k-means on general images, and by Uslan & Bucak (20), who reported that FCM outperforms k-means for segmentation, although neither method performs well when used for classification.

CONCLUSION

Both k-means and fuzzy c-means clustering can identify retinal blood vessels. The best performance of FCM for segmentation falls in the medium category. The performance of the two methods differs significantly with both the mean and the median threshold, and FCM performs better than k-means. The weakness of FCM is its computation time, which is longer than that of k-means.

3 in total

1. A Hoover, V Kouznetsova, M Goldbaum. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans Med Imaging. 2000-03.

2. Joes Staal, Michael D Abràmoff, Meindert Niemeijer, Max A Viergever, Bram van Ginneken. Ridge-based vessel segmentation in color images of the retina. IEEE Trans Med Imaging. 2004-04.

3. Volkan Uslan, Ihsan Ömür Bucak. Clustering-based spot segmentation of cDNA microarray images. Annu Int Conf IEEE Eng Med Biol Soc. 2010.

