Massive data clustering by multi-scale psychological observations.

Shusen Yang1, Liwen Zhang1, Chen Xu2, Hanqiao Yu1, Jianqing Fan3, Zongben Xu1.   

Abstract

Clustering, the discovery of latent group structure in data, is a fundamental problem in artificial intelligence and a vital procedure in data-driven scientific research across all disciplines. Yet existing methods have various limitations, especially weak cognitive interpretability and poor computational scalability, when clustering the massive datasets that are increasingly available in all domains. Here, by simulating the multi-scale cognitive observation process of humans, we design a scalable algorithm to detect clusters hierarchically hidden in massive datasets. The observation scale changes following the Weber–Fechner law, capturing the gradually emerging meaningful grouping structure. We validated our approach on real datasets with up to a billion records and 2000 dimensions, including taxi trajectories, single-cell gene expressions, face images, computer logs and audio recordings. Our approach outperformed popular methods in usability, efficiency, effectiveness and robustness across different domains.
© The Author(s) 2021. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd.
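
The abstract does not state the rule by which the observation scale changes. As a minimal sketch only, assuming the classical Weber form of the law (a just-noticeable change is a constant fraction of the current stimulus, so ΔS/S = k), a scale schedule would grow geometrically. The names `weber_scales`, `sigma_min`, `sigma_max` and `weber_fraction` are hypothetical and are not taken from the paper; this illustrates the law, not the authors' algorithm.

```python
def weber_scales(sigma_min, sigma_max, weber_fraction):
    """Return observation scales from sigma_min up to sigma_max.

    Assumption (not from the paper): each scale exceeds the previous one
    by a constant relative step, mirroring Weber's law dS / S = k.
    """
    if sigma_min <= 0 or sigma_max < sigma_min or weber_fraction <= 0:
        raise ValueError("need 0 < sigma_min <= sigma_max and weber_fraction > 0")

    scales = []
    sigma = sigma_min
    while sigma <= sigma_max:
        scales.append(sigma)
        # Constant relative increment: the next scale is (1 + k) times this one.
        sigma *= 1.0 + weber_fraction
    return scales


if __name__ == "__main__":
    # Coarsen the observation scale by 50% per step between 1 and 10.
    for s in weber_scales(1.0, 10.0, 0.5):
        print(s)
```

Under this assumption the number of scales grows only logarithmically with the ratio `sigma_max / sigma_min`, which is one plausible reason a multi-scale pass could remain cheap on massive data.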

Keywords:  Weber–Fechner law; clustering; cognitive interpretability; computational scalability; massive data; psychological observation

Year:  2021        PMID: 35242339      PMCID: PMC8889001          DOI: 10.1093/nsr/nwab183

Source DB:  PubMed          Journal:  Natl Sci Rev        ISSN: 2053-714X            Impact factor:   17.275


