Abstract
Clustering can be a valuable tool for analyzing large amounts of data, but anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter when working within each of the three available frameworks in which one thinks of clustering: as a Euclidean distance problem, as a statistical model problem, or as a complexity theory problem. We report here a novel recursive square root heuristic, RSQRT, which accurately predicts K(reported) as a function of the attribute or item count, depending on attribute scales. We tested the heuristic on 226 widely varying, but mostly scientific, studies and found that the heuristic's K(best-predicted) rounded to exactly K(reported) in over half of the studies and was close in almost all of them. We claim that this strongly supported heuristic makes sense and that, although it is not prescriptive, using it prospectively is much better than guessing.
Year: 2010 PMID: 21096553 DOI: 10.1109/IEMBS.2010.5627287
Source DB: PubMed Journal: Annu Int Conf IEEE Eng Med Biol Soc ISSN: 2375-7477