
How many clusters to report: a recursive heuristic.

John Carlis, Kelsey Bruso

Abstract

Clustering can be a valuable tool for analyzing large amounts of data, but anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter within each of the three available frameworks for thinking about clustering: as a Euclidean distance problem, as a statistical model problem, or as a complexity theory problem. We report here a novel recursive square root heuristic, RSQRT, which accurately predicts K(reported) as a function of the attribute count or the item count, depending on attribute scales. We tested the heuristic on 226 widely varying, but mostly scientific, studies and found that the heuristic's K(best-predicted) rounded to exactly K(reported) in over half of the studies and was close in almost all of them. We claim that this strongly supported heuristic makes sense and that, although it is not prescriptive, using it prospectively is much better than guessing.
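The abstract does not state the RSQRT formula, so the following is only an illustrative sketch of one possible reading of "recursive square root": take the square root of a count n repeatedly to get a short list of candidate values of K, then pick the candidate nearest a study's reported K. The function names, the stopping threshold `floor`, and the nearest-candidate rule are all assumptions made for illustration, not the authors' definition.

```python
import math


def rsqrt_candidates(n, floor=2.0):
    """Return n**(1/2), n**(1/4), ... while the value stays >= floor.

    ASSUMPTION: this repeated-square-root sequence is a guess at what
    "recursive square root" means; the abstract gives no formula.
    `floor` is a hypothetical stopping threshold.
    """
    if n < 1:
        raise ValueError("n must be a positive count")
    candidates = []
    value = math.sqrt(n)
    while value >= floor:
        candidates.append(value)
        value = math.sqrt(value)
    return candidates


def best_predicted(n, k_reported):
    """Pick the candidate closest to the K a study reported.

    ASSUMPTION: "K(best-predicted)" is read here as the nearest
    candidate; the source does not define it.
    """
    candidates = rsqrt_candidates(n)
    if not candidates:
        raise ValueError("no candidates at or above the floor")
    return min(candidates, key=lambda c: abs(c - k_reported))


if __name__ == "__main__":
    # A hypothetical study with 256 items that reported 5 clusters.
    print(rsqrt_candidates(256))
    print(round(best_predicted(256, 5)))
```

Under this reading, n would be the attribute count or the item count, chosen according to the attribute scales as the abstract describes; how that choice is made is not specified in the abstract.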


Year:  2010        PMID: 21096553     DOI: 10.1109/IEMBS.2010.5627287

Source DB:  PubMed          Journal:  Annu Int Conf IEEE Eng Med Biol Soc        ISSN: 2375-7477


