RSQRT: AN HEURISTIC FOR ESTIMATING THE NUMBER OF CLUSTERS TO REPORT.

John Carlis, Kelsey Bruso.

Abstract

Clustering can be a valuable tool for analyzing large datasets, such as in e-commerce applications. Anyone who clusters must choose how many item clusters, K, to report. Unfortunately, one must guess at K or some related parameter. Elsewhere we introduced a strongly-supported heuristic, RSQRT, which predicts K as a function of the attribute or item count, depending on attribute scales. We conducted a second analysis where we sought confirmation of the heuristic, analyzing data sets from the UCI machine learning benchmark repository. For the 25 studies where sufficient detail was available, we again found strong support. Also, in a side-by-side comparison of 28 studies, the K best predicted by RSQRT and the K predicted by the Bayesian information criterion (BIC) are the same. RSQRT has a lower cost of O(log log n) versus O(n^2) for BIC, and is more widely applicable. Using RSQRT prospectively could be much better than merely guessing.
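The abstract does not define RSQRT, so the following is only an illustrative sketch of how an O(log log n) cost can arise. It assumes, based on the name alone, that the heuristic involves taking square roots repeatedly; the function name `repeated_sqrt_chain` and the stopping threshold `floor` are hypothetical and are not taken from the paper. The sketch shows that repeatedly square-rooting a count n reaches a small value after roughly log2(log2(n)) steps.

```python
import math


def repeated_sqrt_chain(n, floor=2.0):
    """Return the successive square roots of n, stopping once a value is <= floor.

    Hypothetical illustration only: this is NOT the authors' definition of RSQRT.
    It demonstrates that the chain length grows like log2(log2(n)).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    chain = []
    value = float(n)
    while value > floor:
        value = math.sqrt(value)
        chain.append(value)
    return chain


if __name__ == "__main__":
    for count in (100, 65536, 10**6, 10**12):
        chain = repeated_sqrt_chain(count)
        print(count, len(chain), [round(v, 2) for v in chain])
```

Under this assumption, even a count of 10^12 produces a chain of only a handful of values, which is why such a procedure would be far cheaper than a criterion whose cost grows quadratically with n. Which value in the chain, if any, corresponds to the reported K is specified in the paper, not here.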

Year:  2012        PMID: 22773923      PMCID: PMC3388514          DOI: 10.1016/j.elerap.2011.12.006

Source DB:  PubMed          Journal:  Electron Commer Res Appl        ISSN: 1567-4223            Impact factor:   6.014


  4 in total

1.  Dietary patterns and adenocarcinoma of the esophagus and distal stomach.

Authors:  Honglei Chen; Mary H Ward; Barry I Graubard; Ellen F Heineman; Rodney M Markin; Nancy A Potischman; Robert M Russell; Dennis D Weisenburger; Katherine L Tucker
Journal:  Am J Clin Nutr       Date:  2002-01       Impact factor: 7.045

2.  Integrative data mining: the new direction in bioinformatics.

Authors:  P Bertone; M Gerstein
Journal:  IEEE Eng Med Biol Mag       Date:  2001 Jul-Aug

3.  How many clusters to report: a recursive heuristic.

Authors:  John Carlis; Kelsey Bruso
Journal:  Annu Int Conf IEEE Eng Med Biol Soc       Date:  2010

4.  The effect of cluster size, dimensionality, and the number of clusters on recovery of true cluster structure.

Authors:  G W Milligan; S C Soon; L M Sokol
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  1983-01       Impact factor: 6.226

  1 in total

1.  Immunogenetic clustering of 30 cancers.

Authors:  Lisa M James; Apostolos P Georgopoulos
Journal:  Sci Rep       Date:  2022-05-04       Impact factor: 4.996
