Olga Lezhnina, Gábor Kismihók.
Abstract
Latent Class Cluster Analysis (LCCA) is an advanced model-based clustering method that is increasingly used in social, psychological, and educational research. Selecting the number of clusters in LCCA is a challenging task that inevitably involves subjective analytical choices. Researchers often rely excessively on fit indices, because model fit is the main selection criterion in model-based clustering; it has been shown, however, that a wider spectrum of criteria needs to be taken into account. In this paper, we suggest an extended analytical strategy for selecting the number of clusters in LCCA based on model fit, cluster separation, and stability of partitions. The suggested procedure is illustrated on simulated data and a real-world dataset from the International Computer and Information Literacy Study (ICILS) 2018. For the latter, we provide an example of end-to-end LCCA including data preprocessing. Researchers can use our R script to conduct LCCA in a few easily reproducible steps, or implement the strategy with any other software suitable for clustering. We show that the extended strategy, in comparison to a strategy based on fit indices alone, facilitates the selection of more stable and well-separated clusters in the data.
• The suggested strategy helps researchers select the number of clusters in LCCA
• It is based on model fit, cluster separation, and stability of partitions
• The strategy is useful for finding separable, generalizable clusters in the data
Keywords: Cluster separation; LCCA; Model fit; Stability of partitions
Year: 2022 PMID: 35712641 PMCID: PMC9192797 DOI: 10.1016/j.mex.2022.101747
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Cluster selection results for the simulated datasets.
| Dataset | Clusters | BIC | ICL | ASW | ARI | Jaccard |
|---|---|---|---|---|---|---|
| A | 4/1 | 14622/- | −7351/- | .19/- | .05/- | .28/- |
| B | 4/3 | 15252/15308 | −7711/−7652 | .31/.39 | .80/.67 | .76/.65 |
| C | 4/4 | 8316 | −4109 | .85 | 1 | 1 |
| D | 6/4 | 14752/14655 | −7402/−7279 | .63/.74 | .61/.51 | .52/.47 |
| E | 6/4 | 16141/16614 | −8064/−8290 | .53/.64 | .91/.73 | .86/.66 |
| F | 6/6 | 13148 | −6521 | .61 | 1 | 1 |
Note. BIC = Bayesian Information Criterion, ICL = Integrated Completed Likelihood criterion, ASW = Average Silhouette Width, ARI = Adjusted Rand Index. The number of clusters is given as total/separated, and the values of coefficients are given accordingly.
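The two stability coefficients in the table compare partitions of the same objects. The following is a minimal Python sketch of how such coefficients can be computed from pair counts. It is not the authors' R script. The function names are hypothetical, and the pair-counting form of the Jaccard coefficient is an assumption, since the record does not state which Jaccard variant the authors used.

```python
from collections import Counter
from math import comb


def pair_counts(labels_a, labels_b):
    """Count object pairs placed in the same cluster by both, by A, and by B."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two objects are needed to form a pair")
    # Contingency counts: how many objects share each (cluster in A, cluster in B).
    joint = Counter(zip(labels_a, labels_b))
    together_both = sum(comb(c, 2) for c in joint.values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return together_both, together_a, together_b, comb(n, 2)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index: pair agreement corrected for chance agreement."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    expected = in_a * in_b / total
    max_index = (in_a + in_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (both - expected) / (max_index - expected)


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard: pairs together in both / pairs together in either."""
    both, in_a, in_b, _ = pair_counts(labels_a, labels_b)
    union = in_a + in_b - both
    return both / union if union else 1.0


# Identical partitions up to relabeling agree perfectly.
same = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
# Partly overlapping partitions score well below 1.
partial_ari = adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
partial_jac = pair_jaccard([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(same, round(partial_ari, 2), round(partial_jac, 2))
```

Both coefficients reach 1 only when the two partitions group the objects identically, which matches the values of 1 reported for datasets C and F.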
Fig. 1. The graphic output of the LCCAselection function for six simulated datasets.
Fig. 2. The graphic output of the LCCAselection function for the ICILS positive views dataset.
Cluster selection results for the ICILS positive views dataset.
| N clusters | BIC | ICL | ASW | ARI | Jaccard |
|---|---|---|---|---|---|
| 1 | 33639.57 | −16816.52 | - | | |
| 2 | 30843.22 | −15598.33 | .24 | | |
| 3 | 29648.88 | −14998.36 | .23 | | |
| 4 | 28985.07 | −14684.43 | .26 | .88 | .85 |
| 5 | 28902.88 | −14655.55 | .24 | | |
| 6 | 28880.90 | −14693.07 | .17 | .76 | .70 |
| 7 | 28925.91 | −14651.72 | .16 | | |
| 8 | 29016.39 | −14767.68 | .16 | | |
| 9 | 29115.62 | −14736.12 | .15 | | |
| 10 | 29217.73 | −14804.14 | .17 | | |
Note. BIC = Bayesian Information Criterion, ICL = Integrated Completed Likelihood criterion, ASW = Average Silhouette Width, ARI = Adjusted Rand Index.
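To make the comparison of criteria concrete, the sketch below reads the table values and reports which number of clusters each criterion favors. It is an illustration in Python, not the authors' R procedure. It assumes that a lower BIC indicates better fit. ICL is left out because the record does not state its sign convention. Empty table cells are represented as `None`.

```python
# Values copied from the ICILS table: (n_clusters, BIC, ASW, ARI, Jaccard).
# None marks cells that the table leaves empty.
rows = [
    (1, 33639.57, None, None, None),
    (2, 30843.22, 0.24, None, None),
    (3, 29648.88, 0.23, None, None),
    (4, 28985.07, 0.26, 0.88, 0.85),
    (5, 28902.88, 0.24, None, None),
    (6, 28880.90, 0.17, 0.76, 0.70),
    (7, 28925.91, 0.16, None, None),
    (8, 29016.39, 0.16, None, None),
    (9, 29115.62, 0.15, None, None),
    (10, 29217.73, 0.17, None, None),
]

# Model fit: assumed convention that the smallest BIC is the best-fitting model.
best_by_bic = min(rows, key=lambda r: r[1])[0]

# Cluster separation: the largest average silhouette width among reported values.
best_by_asw = max((r for r in rows if r[2] is not None), key=lambda r: r[2])[0]

# Stability: compare only the solutions for which ARI and Jaccard are reported.
candidates = [r for r in rows if r[3] is not None]
most_stable = max(candidates, key=lambda r: (r[3], r[4]))[0]

print(f"BIC favors {best_by_bic} clusters")
print(f"ASW favors {best_by_asw} clusters")
print(f"Stability favors {most_stable} clusters")
```

Under these assumptions, BIC alone points to six clusters, while separation and stability both point to four, the solution shown in the figures that follow.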
Fig. 3. Cluster visualization and silhouette plot for the four-cluster solution.
Fig. 4. Item probability plot for the four-cluster solution.
| Subject area | Psychology |
| More specific subject area | Social Psychology |
| Method name | Extended selecting strategy for LCCA |
| Name and reference of original method | Fit indices-based selecting strategies for LCCA |
| Resource availability | The R script is available on GitHub and in Supplementary Materials; R is freely downloadable software. |