
A Variational EM Acceleration for Efficient Clustering at Very Large Scales.

Florian Hirschberger, Dennis Forster, Jörg Lücke.

Abstract

How can we efficiently find very large numbers of clusters C in very large datasets of size N and potentially high dimensionality D? Here we address the question by using a novel variational approach to optimize Gaussian mixture models (GMMs) with diagonal covariance matrices. The variational method approximates expectation maximization (EM) by applying truncated posteriors as variational distributions and partial E-steps in combination with coresets. Run time complexity to optimize the clustering objective then reduces from O(NCD) per conventional EM iteration to a lower cost per variational EM iteration on coresets, determined by the coreset size and a truncation parameter. Based on the strongly reduced run time complexity per iteration, which scales sublinearly with NC, we then provide a concrete, practically applicable, parallelized and highly efficient clustering algorithm. In numerical experiments on standard large-scale benchmarks we (A) show that overall clustering times also scale sublinearly with NC, and (B) observe substantial wall-clock speedups compared to already highly efficient recently reported results. The algorithm's sublinear scaling allows for applications at scales where alternative methods cease to be applicable. We demonstrate such very large-scale applicability using the YFCC100M benchmark, for which we optimize a GMM with up to 50,000 clusters, that is, a data density model with up to 150 million parameters.
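
The idea of truncated posteriors for a diagonal-covariance GMM can be illustrated with a minimal NumPy sketch. This is not the authors' algorithm: the function name `truncated_responsibilities` and the parameter `G` (number of clusters kept per data point) are assumptions made for illustration, and the sketch evaluates all C clusters before truncating, so it still costs O(NCD). It shows only what a truncated posterior looks like, not the partial E-steps or coresets that give the sublinear scaling described in the abstract.

```python
import numpy as np

def truncated_responsibilities(X, means, variances, weights, G):
    """Posterior over clusters, truncated to the G most likely clusters per point.

    X: (N, D) data; means, variances: (C, D); weights: (C,) mixing proportions.
    Returns an (N, C) array with exactly G non-zero entries per row (barring underflow).
    Hypothetical helper for illustration; requires 1 <= G <= C.
    """
    # log pi_c + log N(x_n | mu_c, diag(var_c)) for every point/cluster pair
    diff = X[:, None, :] - means[None, :, :]                    # (N, C, D)
    log_joint = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
        - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    )                                                           # (N, C)

    # Indices of the G largest log-joints in each row (order within the G is arbitrary)
    top = np.argpartition(-log_joint, G - 1, axis=1)[:, :G]     # (N, G)
    kept = np.take_along_axis(log_joint, top, axis=1)

    # Renormalize over the kept clusters only (log-sum-exp style for stability)
    kept = kept - kept.max(axis=1, keepdims=True)
    probs = np.exp(kept)
    probs /= probs.sum(axis=1, keepdims=True)

    # Scatter back into a dense (N, C) array; all other clusters get zero mass
    resp = np.zeros_like(log_joint)
    np.put_along_axis(resp, top, probs, axis=1)
    return resp
```

With `G` equal to the number of clusters the function returns the ordinary EM responsibilities; smaller `G` concentrates each point's posterior mass on its few best-matching clusters.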


Year:  2021        PMID: 34882546     DOI: 10.1109/TPAMI.2021.3133763

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0098-5589            Impact factor:   6.226


  1 in total

1.  Triage and monitoring of COVID-19 patients in intensive care using unsupervised machine learning.

Authors:  Salah Boussen; Pierre-Yves Cordier; Arthur Malet; Pierre Simeone; Sophie Cataldi; Camille Vaisse; Xavier Roche; Alexandre Castelli; Mehdi Assal; Guillaume Pepin; Kevin Cot; Jean-Baptiste Denis; Timothée Morales; Lionel Velly; Nicolas Bruder
Journal:  Comput Biol Med       Date:  2021-12-31       Impact factor: 4.589

