Literature DB >> 27553625

Stepwise iterative maximum likelihood clustering approach.

Alok Sharma^1,2,3, Daichi Shigemizu^4,5,6, Keith A Boroevich⁴, Yosvany López^4,6, Yoichiro Kamatani⁴, Michiaki Kubo⁴, Tatsuhiko Tsunoda^7,8,9.

Abstract

BACKGROUND: Biological/genetic data is a complex mix of various forms or topologies which makes it quite difficult to analyze. An abundance of such data in this modern era requires the development of sophisticated statistical methods to analyze it in a reasonable amount of time. In many biological/genetic analyses, such as genome-wide association study (GWAS) analysis or multi-omics data analysis, it is required to cluster the plethora of data into sub-categories to understand the subtypes of populations, cancers or any other diseases. Traditionally, the k-means clustering algorithm is a dominant clustering method. This is due to its simplicity and reasonable level of accuracy. Many other clustering methods, including support vector clustering, have been developed in the past, but do not perform well with the biological data, either due to computational reasons or failure to identify clusters.
RESULTS: The proposed SIML clustering algorithm has been tested on microarray datasets and SNP datasets. It has been compared with a number of clustering algorithms. On MLL datasets, SIML achieved highest clustering accuracy and rand score on 4/9 cases; similarly on SRBCT dataset, it got for 3/5 cases; on ALL subtype it got highest clustering accuracy for 5/7 cases and highest rand score for 4/7 cases. In addition, SIML overall clustering accuracy on a 3 cluster problem using SNP data were 97.3, 94.7 and 100 %, respectively, for each of the clusters.
CONCLUSIONS: In this paper, considering the nature of biological data, we proposed a maximum likelihood clustering approach using a stepwise iterative procedure. The advantage of this proposed method is that it not only uses the distance information, but also incorporate variance information for clustering. This method is able to cluster when data appeared in overlapping and complex forms. The experimental results illustrate its performance and usefulness over other clustering methods. A Matlab package of this method (SIML) is provided at the web-link http://www.riken.jp/en/research/labs/ims/med_sci_math/ .

Entities: CellLine Chemical Disease Gene

Mesh：

Year: 2016 PMID： 27553625 PMCID： PMC4995791 DOI： 10.1186/s12859-016-1184-5

Source DB: PubMed Journal: BMC Bioinformatics ISSN： 1471-2105 Impact factor: 3.169

Background

In an unsupervised learning procedure, the class label of a training sample is not known and the aim is to partition the data into clusters. The unsupervised learning scheme uses the relationship between samples to perform partitioning. In many biological data (e.g. transcriptome data, genomic data etc.), the number of clusters and class labels are unknown. However, the distribution is sometimes known, which is usually normal. Therefore, it would be an advantage to build a technique that utilizes distance and variance information as it can track clusters with different conformations. Over last several decades, the k-means clustering algorithm has been used quite significantly in partitioning the biological data. In the most recent multi-omics data analysis tools, like iCluster, and iClusterPlus [1], the underlying clustering method used was also k-means. Some tools in cancer research, like ConsensusCluster (CC) and CCPlus [2, 3], also utilize k-means as one of the common clustering algorithms. Though the k-means clustering algorithm has been extensively applied [4] due to its simplicity and reasonable level of accuracy, it cannot track clusters when samples of different groups are overlapping to each other (i.e., data points of adjacent groups are spread in a way that the groups partly coincide over each other). In biological data, this is sometimes the case, and thereby leads to clusters which may not be accurate. This has a significant implication in biological findings, particularly in cancer subtypes analysis, population stratification analysis in GWAS and multi-omics data analysis. In general, k-means has played a significant role in carrying out analysis for various types of biological data over several years. Since data complexity and quantity are increasing, it is important to develop techniques that can perform clustering by following data topologies. In the field of unsupervised learning and clustering, several wonderful techniques have emerged. Some of the techniques are briefly summarized here as follows: 1) clustering using some criterion functions e.g. i) sum-of-squared error criterion; ii) related minimum variance criterion, iii) scattering criterion; iv) trace criterion; v) determinant criterion; and, vi) invariant criterion [5, 6]; 2) clustering using iterative optimization [7-9]; 3) hierarchical clustering [10-13]; several hierarchical-based algorithms can be found in the literature; e.g., single-linkage [14], complete-linkage [15], median-linkage [16] and so on. Single linkage (SLink) [14] merges two nearest-neighbor clusters at a time in an agglomerative hierarchical fashion. It uses Euclidean distance to measure the closeness between two clusters (if it is less than an arbitrary threshold). This method is very sensitive to data position, which sometimes creates problem by forming a cluster in a long chain (known as the chaining effect). The complete linkage (CLink) hierarchical approach [15] depends on the farthest-neighbor and reduces the effects of long chains. This technique is also sensitive to outliers. The use of average or median distance could be a way to overcome this sensitiveness. This was done in median linkage (MLink) hierarchical approach [16]; 4) clustering is also performed using Bayes classifier [17-21]; 5) clustering iterative maximum likelihood [22-24]; and, 6) support vector clustering [25-27]. In the recent literature, support vector clustering has gained a lot of attention [26-31]. However, it is expensive in processing time and sometimes fails to find meaningful clusters. In general, clustering methods based on Bayes classifier and maximum likelihood are still the preferred choice compared to support vector clustering for many applications. There are various approaches to implement these clustering methods. In this paper, we focus on maximum likelihood estimate. There are three ways to implement the maximum likelihood method. 1) Analytic way: likelihood functions are differentiated and equated to zero and the equations are solved to find extrema. The second derivative is then taken to ensure if maxima has reached rather than minima. 2) Grid search: an exhaustive search over a region is conducted to find the parameters that produce largest likelihood. 3) Numerical analysis: an initial value of parameter is used in a hill climbing algorithm or gradient ascent algorithm (e.g. Newton-Rapson, BHHH, DFP) to find the maxima. Maximum likelihood is also estimated via EM algorithm [5, 22, 32–39]. In general, it is impossible to use an analytic approach to find maximum likelihood estimates as the parametric form of data is unknown. Grid search is only possible when the dimensionality of the data is very small. Most of the time, maximum likelihood is computed by a hill climbing algorithm or by the EM algorithm. The potential problem with gradient algorithms is that when likelihood is not differentiable then it is not possible to find gradient to convergence. Considering this, in this paper, we propose a stepwise iterative maximum likelihood (SIML) procedure which does not require derivatives of likelihood functions. It can find all unknown parameters without solving first derivative and second derivatives of likelihood. The experimental results also show promising when compared to many state-of-the-art clustering methods.

Methods

Description of Maximum Likelihood Clustering

Here, we briefly discuss maximum likelihood method for clustering [5]. Assume a d -dimensional sample set χ = {x1, x2, …, xn} having n unlabelled samples, and, c is the number of clusters. Let Ω = {ω} (for j = 1, 2, …, c) be the state of the nature or class label for j th cluster χ. Suppose θ = {θ} (for j = 1 … c) is any unknown parameter (having mean μ and covariance Σ). Then the mixture density is given by where p(x|ω, θ) is the conditional density, and P(ω) is the a priori probability. The log likelihood can be represented by joint density Suppose that the joint density is differentiable with respect to θ then from Eqs. 1 and 2 where is the gradient of L with respect to θ . If θ and θ are independent and suppose a posteriori probability is given as then from Eqs. 3 and 4, we have The gradient of likelihood (Eq. 5) can be equated to zero () to obtain maximum likelihood estimate . The solution can be therefore obtained by In the above equations, θ is replaced by unknown mean and covariance parameters for normal distribution case, to yield maximum likelihood estimates. In the literature, parameter θ is iteratively updated to reach the final value using hill climbing algorithms such as the Newton-Raphson method. In general, the computation of first and second derivatives of likelihood is required to find the solution. If the likelihood is differentiable and the a priori probability is non-zero, then convergence can be obtained. However, there is always a possibility of being trapped in a local optima.

Stepwise iterative maximum likelihood method

In this section, we describe our proposed method. This method seeks the most optimal partitions in an iterative way. We begin with an initial partition of data and shift a sample from one partition to another partition, and test if such a shift improves the overall log-likelihood. A simple illustration of SIML is given in Fig. 1.

Fig. 1

An illustration of stepwise iterative maximum likelihood method using a = 2 cluster case. In this illustration, two clusters and are given with likelihood functions L1 and L2, respectively. The center of clusters are depicted by μ 1 and μ 2 (shown as ‘+’ inside two clusters). Initial total likelihood is Lold which is the sum of two likelihood functions (L1 + L2). A sample is checked for grouping. It is advantageous to shift sample to cluster only if the new likelihood (Lnew = L 1* + L 2*) is higher than the old likelihood; i.e., L > L If we define class-based log-likelihood of two clusters χ and χ asand then we would be interested in knowing how the class-based log-likelihood functions (referred as log-likelihood function hereafter) change if a sample is shifted from χ to χ. In order to know this, let us define mean and covariance of χ and χ as μ and μ, and, as Σ and Σ, respectively. The following equations describe mean and covariance:andwhere n and n are number of samples in χ and χ, respectively. If the component density is normal and let P(ω) = n/n (where n is the total number of samples) then Eqs. 9 and 10 can be written as where tr() is a trace function. Since we can write L as Similarly, we can write L as and the total log-likelihood for c clusters can be written as where L is from Eq. 15 or 16. If a sample is shifted to χ, then the mean and covariance will change as follows (from Eqs. 11, 12, 13 and 14): where μ*, μ*, Σ* and Σ* are means and covariances after the shift. In order to find the change in log-likelihood functions L and L, we first introduce the following Lemma. Lemma 1 If a sample is shifted to cluster χ and the changed covariance of χ is defined as then the determinant of Σ* can be given as . Proof By taking determinant of Σ*, we get since for m × m square matrices |AB| = |A||B| and for a scalar c, |cA| = c|A|. We can write Eq. L1 as From Sylvester’s determinant theorem, rectangular matrices A ∈ ℝ and B ∈ ℝ in |I + AB| is equal to |I + BA|. Therefore, we can write since |c| = c. Substituting right hand side of Eq. L3 in Eq. L2 proves the Lemma. As similar to Lemma 1, the determinant of the change in covariance of χ can be written as We can now observe the change in L (Eq. 16) due to the shift of a sample from χ to χ as From Lemma 1 and Eq. 16, we can rewrite Eq. 23 after doing algebraic manipulation as where ΔL is given by and constant C is given by In a similar manner, change in L can be obtained by using Eqs. 15 and 22 as where ΔL is given by and C is same as of Eq. 26. By adding Eqs. 24 and 27, we can get the change in total log-likelihood (L*) since there is no change in any other clusters apart from χ to χ; i.e., from Eqs. 17, 24 and 27 we have where ΔL = ΔL − ΔL. Therefore, the shift of a sample is advantageous if ΔL > 0. This will give the following algorithm (Table 1):

Table 1

Stepwise iterative maximum likelihood method procedure

1. Initialization: select initial partitions with means μ ₁, μ ₂, …, μ _c and covariance matrices Σ ₁, Σ ₂, …, Σ _c
2. Loop: Select a sample \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \widehat{\mathbf{x}}\in {\chi}_i $$\end{document}x^∈χi.
3. If n _i > 1 then compute
4. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\delta}_j=\left\{\begin{array}{c}\hfill \varDelta {L}_j,\kern0.75em j\ne i\hfill \\ {}\hfill \varDelta {L}_i,\kern0.75em j=i\hfill \end{array}\right. $$\end{document}δj=ΔLj,j≠iΔLi,j=i
5. Transfer \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \widehat{\mathbf{x}} $$\end{document}x^ to χ _k if δ _k = max δ _j for all j.
6. Update L _tot, μ _i, μ _k, Σ _i and Σ _k.
7. If L _tot doesn’t change in n attempts then stop otherwise go to Loop.

Stepwise iterative maximum likelihood method procedure The following sections discuss the characteristic of the SIML method.

Initial settings of the procedure

Similar to any other iterative based optimization technique, this technique also depends on the initial settings. Therefore, it is important to put consideration into the initial settings. In this paper, we implemented three ways of initializing the partitions: 1) random initialization, 2) k-means based initialization, and 3) initialization based on the solution of c − 1 partitions and the mean. These schemes are described as follows: Random initialization: In this scheme, we create c random means around the center of the data. This technique works well when the number of clusters is small. If c is very large then it can miss clusters. K-means initialization: In this scheme, the data is first partitioned into c clusters by using the k-means algorithm. The solution of k-means is used as an initial setting for the SIML method. This method works well even if the number of clusters is large. Most of the time this initialization technique provides good results. However, since the k-means algorithm does not track the data based on covariance information, it has limitations. Initialization based on the solution of c − 1 clusters: The initialization of c clusters is done by using the solution of c − 1 clusters, which would give c − 1 locations. The c th location is the mean of the overall data itself. If only two clusters are required to find, then 2 locations around center of the data is used since the solution of 1-cluster is the center of the data itself. In this paper, we used all the three schemes for initialization and in general schemes 2 and 3 provide satisfactory results for most of the data conformations.

Numerical stability

Due to numerical difficulties the convergences of an iterative algorithm can be missed (e.g. convergence problem for EM algorithm is discussed in [40]). The problem of numeral difficulties is of particular issue when data dimensionality is high. In this situation, iterative algorithms sometimes do not converge properly. This problem usually appears due to the small numerical values of the covariance matrix. If the eigenvalues of a covariance matrix Σ are small, then its determinant can give a value close to zero due to the fixed point architecture of the hardware. However, this problem can be easily overcome by first conducting eigenvalue decomposition of Σ and using the summation of the logarithm of eigenvalues. It is described as follows: The eigenvalue decomposition of Σ ∈ ℝ will give EDET where E ∈ ℝd × d is the eigenvector matrix and D ∈ ℝ is the diagonal matrix of eigenvalues. The determinant of Σ will be where λ is the k th eigenvalue of Σ. If the values of λ are small then |Σ| = 0. This problem can be overcome by simply taking logarithm as In a similar way, the inverse of Σ can cause problems in the term of Eq. 28; i.e., the computation of the term (where ) when the size of the covariance matrix is large. In order to make this numerically stable a small quantity ϵ > 0 can be included as follows: This will ensure that .

Small sample size case

When the dimensionality d is much greater than the number of samples n (d ≫ n) then small sample size problem appears [41-44]. Let a sample set χ = {x1, x2, …, x} be drawn independently and let the mean and covariance of χ be denoted by μ and Σ, respectively. In the normal density we have a term P = (x − μ)TΣ− 1(x − μ) to compute which cannot be solved due to singular covariance matrix as its inverse does not exist. A simple extension could be to use the pseudo-inverse of Σ (denoted here as Σ+). However, this doesn’t solve the problem. If samples x are from χ then P+ = (x − μ)TΣ+(x − μ) will always be equal to the rank of Σ or basically n − 1 (for d ≫ n). This means that all the samples in a particular cluster would have the same probability and it would not be possible to justify classification of samples based on probability. A second way would be to regularize Σ, however, computing optimal regularization parameter could be a challenging task. One way could be to apply principal component analysis (PCA) procedure on a d -dimensional sample set χ ∈ ℝ to transform it to a parsimonious sample set Y ∈ ℝ where h < min(d, n) h < min(d, n). Thereafter, the clustering procedure can be performed.

Determination of the number of clusters

It is potentially important to estimate the number of clusters c present in the sample set. Since this information is usually not provided, it is important to obtain the value of c with whatever information we have at hand. Basically, the only information we have is the sample itself. In the maximum likelihood procedure we compute likelihood from sample set. Therefore, this information can be utilized to estimate the number of clusters. In order to evaluate c, we can run the SIML algorithm for a range of clusters e.g. 1 ≤ c ≤ K to see at what point the likelihood function stabilizes or reaches maximum. In this paper, we investigated two ways to compute c. In the first way, we compute the maximum log-likelihood MaxL(c) achieved for all values of c ∈ [1, K]. At a particular value of c the MaxL reaches maximum and does not change much. This would be the estimated value of c. In the second way, we compute the difference between the maximum log-likelihood MaxL achieved and the first value of L after SIML procedure (excluding the initial L value computed from initial settings as this value is based on the first random guess). Therefore, for a particular number of cluster c, we will get this difference likelihood and we denote it as DelL which is equal to MaxL − L(1) or max(L) − L(1), where L(r) defines the value of L at an iteration r. The curve of DelL as a function of c would give a peak at some value of c which would be its best value. In most of the data conformations, MaxL gives reliable results. Nonetheless, both the graphs of MaxL and DelL (as a function of c) are illustrated in the experimental section of the paper.

Results

In order to evaluate the algorithm, we carried out experiments on normal Gaussian data as well as on biological data. We divide this section into 5 subsections. In subsection 1, we illustrate the performance of various methods using three cluster case. Subsection 2 indulges on maximum likelihood plots as a function of number of clusters. In subsection 3, we discuss the processing time of the algorithm. In subsection 4, we discuss the performance in terms of clustering accuracy and rand score of various methods; and, in subsection 5 (parts I and II), we discuss SIML on biological data.

An illustration using three clusters

Since data distribution of GWAS can appear as approximately Gaussian, we generated normal distribution data with 3 different mean and covariance for simulation purposes. Furthermore, if we consider GWAS data from a continent (e.g. Europe) as one cluster, from a country (e.g. Germany) as second cluster and from a city (e.g. Berlin) as a third cluster, then third cluster (Berlin data) will reside inside second cluster (Germany data), and second cluster (Germany data) will reside inside the first cluster (European data). Therefore, clusters will overlap with each other. To simulate this scenario, we generated a sample set with 1, 500 samples, 2 dimensions and 3 clusters as shown in Fig. 2a, and applied various methods on it. Cluster 1 is the least dense (or sparse) and Cluster 3 is the most dense. Cluster 1 has mean [0.1, 0.1] and variance 3 in each direction. Similarly, mean and variance of Cluster 2 and Cluster 3 were [−1, − 1] and 0.8, and, [−0.1, − 0.1] and 0.05, respectively. The clusters overlap each other and the goal is to track these clusters. It can be seen (from Fig. 2b) that k-means clustered the 3 clusters without considering the distribution information. The processing time to perform the k-means algorithm was 0.82 s. Support vector clustering (CG method) [25] was difficult to perform as it is not possible to provide number of class information. The parameters were tuned so that 3 clusters are outputed. The processing time by this method was 1183.1 s (excluding the tuning time). It can be observed from the Fig. 2c that this method was failed to track the clusters. Next, support vector clustering (using SEP method) [26] was performed. The default parameters gave 45 clusters. Therefore, as similar to the previous CG method, tuning of parameters was carried out to extract only 3 clusters. Processing time was 25.2 s excluding the tuning time. This method also misses the clusters (Fig. 2d). Then we performed the proposed SIML method. This method was able to track all the 3 clusters in 4.49 s per repetition (Fig. 2e). The likelihood plots are discussed in the following section.

Fig. 2

An illustration using 3 clusters: a Three cluster data where n = 1500 and d = 2; b k-means clustering, different colors show different clusters; c Support vector clustering (CG method); d Support vector clustering (SEP method); e Stepwise iterative maximum likelihood (SIML) method

Likelihood plots

Here we discussed three plots: log-likelihood (L) versus sample (Fig. 3a), maximum log-likelihood (MaxL) as a function of number of clusters (Fig. 3b) and DelL as a function of number of clusters (Fig. 3c).

Fig. 3

Likelihood plots a L plot, b MaxL plot and c DelL plot

Likelihood plots a L plot, b MaxL plot and c DelL plot Figure 3a depicts L plot for 3 clusters. When a sample is moved from one cluster to another cluster the value of L is updated. This is an increasing function and the maximum value of L is defined as MaxL in this paper. Figure 3b depicts MaxL plot. Since in general, the number of cluster c information is unknown, it is therefore crucial to estimate this value. In this paper we showed that by using MaxL plot and DelL plot, it is possible to estimate c. For this, one can provide a range of c values and the value for which MaxL curve converges (reaches highest peak or does not change much) is the estimated c. We use the same data we generated in Fig. 2a and provide 10 values of c as 1 ≤ c ≤ 10. It can be seen from MaxL plot that it converges or peaks at c = 3.

Processing time

Here we discuss the processing time of the SIML algorithm. In order to give a complete picture, we investigated the clock time in seconds for samples n = 3, 000, 9, 000, 27,000, 54,000 and 102,000 having 3 clusters. We use the same conformation of data as depicted in Fig. 2a, however, we increased the dimensionality to d = 10, 20, 100 and 200. Figure 4 shows the processing time of the algorithm when processed in Linux platform (Ubuntu 14.04 LTS, 64 bit) with 6 processors (Intel Xeon R CPU E5-1660 v2 @ 3.70 GHz) and with 128 GB memory for a repetition.

Fig. 4

Processing time of SIML method for n = 3k − 102k and d = 10 − 200

Clustering on artificial data

We performed clustering accuracy and rand score test on a set of artificial data. For artificial data, we generated d -dimensional, 4 cluster data such that cluster samples are overlapping to each other (in a similar way as shown in Fig. 2). There are in total 2000 samples (where each cluster having 500 samples). We computed cluster accuracy and rand score for various methods. For statistical stability, we generated data 20 times for a particular dimension d by changing random seed of the normal data. Thereby, we computed average clustering accuracy and average rand score over these 20 attempts for dimension d. We then varied dimension d = 2, 3, …, 20 and reported average clustering accuracy and average rand score in Fig. 5. For comparison, we used centroid-based technique like k-means, hierarchical-based technique like SLink [14], CLink [15] and MLink [16] and model-based technique (using EM algorithm) like mclust [39]. It can be observed from Fig. 5a that mclust and SIML methods perform quite well on Gaussian data. K-means algorithm also performs reasonably well on this data. MLink and SLink couldn’t perform well. For average rand score (Fig. 5b), CLink, k-means, SIML and mclust are exhibiting reasonable performance. However, mclust and SIML are superior reaching almost 100 rand score. Since mclust and SIML are derived from Gaussian model, their performance on Gaussian data are well compared to other techniques.

Fig. 5

a Average clustering accuracy on Gaussian data. b Average rand score on Gaussian data

Clustering on real data-I (publically available biological data)

In this section, we utilized various biological data and reported clustering accuracy and rand score. We employed several methods such as k-means, SLink, CLink, MLink and mclust for comparison. The description of biological data is given as follows: SRBCT dataset [45]: the small round blue-cell tumor dataset consists of the expression of 2308 genes from 83 samples. This is a four class classification problem. The tumors are Burkitt lymphoma (BL), the Ewing family of tumors (EWS), neuroblastoma (NB) and rhabdomyosarcoma (RMS). The dataset consists of 11, 29, 18 and 25 samples of BL, EWS, NB and RMS respectively. MLL leukemia [46]: This dataset has 3 classes, namely ALL, MLL and AML leukemia. The dataset contains 24 ALL, 20 MLL and 28 AML. The dimension of MLL dataset is 12,582. ALL subtype dataset [47]: this dataset consists of the expression of 12,558 genes of subtypes of acute lymphoblastic leukemia. The dataset has seven classes namely BCR-ABL, E2A-PBX1, hyperdiploid >50 chromosomes ALL, MLL, T-ALL, TEL-AML1 and other (contains diagnostic samples that did not fit into any of the former six classes). Samples per class are 15, 27, 64, 20, 43, 79 and 79 respectively. To vary the data dimensionality (number of features), we utilized Chi-squared feature selection method to rank the attributes. The dimensionality investigated was d = 2, 3, …, n/2, where n is the cluster with minimum number of samples. We then performed cluster analysis (to evaluate clustering accuracy and rand score) on these datasets and compared SIML with the k-means, SLink, CLink, MLink and mclust methods. The results are reported in Tables 2 and 3 (for SRBCT dataset), Tables 4 and 5 (for MLL dataset), Tables 6 and 7 (ALL subtype dataset) and Table 8 (for estimation of number of clusters by SIML). Clustering accuracy is depicted in Tables 2, 3, 4, 5, 6 and rand score is shown in Tables 2, 3, 4, 5, 6 and 7. The methods achieving highest results are depicted in bold faces.

Table 2

Clustering accuracy on SRBCT dataset

Dim	K-means	SLINK	CLINK	MLINK	mclust	SIML
2	60.4	34.9	62.7	54.2	62.7	63.9
3	67.9	39.8	69.9	71.1	69.9	66.3
4	77.1	49.4	65.1	67.5	72.3	81.9
5	70.3	50.6	72.3	50.6	65.1	67.5
6	64.0	39.8	53.0	53.0	57.8	69.9