mbkmeans: Fast clustering for single cell data using mini-batch k-means.

Stephanie C Hicks, Ruoxi Liu, Yuwei Ni, Elizabeth Purdom, Davide Risso.

Abstract

Single-cell RNA-Sequencing (scRNA-seq) is the most widely used high-throughput technology to measure genome-wide gene expression at the single-cell level. One of the most common analyses of scRNA-seq data detects distinct subpopulations of cells through the use of unsupervised clustering algorithms. However, recent advances in scRNA-seq technologies result in current datasets ranging from thousands to millions of cells. Popular clustering algorithms, such as k-means, typically require the data to be loaded entirely into memory and therefore can be slow or impossible to run with large datasets. To address this problem, we developed the mbkmeans R/Bioconductor package, an open-source implementation of the mini-batch k-means algorithm. Our package allows for on-disk data representations, such as the common HDF5 file format widely used for single-cell data, that do not require all the data to be loaded into memory at one time. We demonstrate the performance of the mbkmeans package using large datasets, including one with 1.3 million cells. We also highlight and compare the computing performance of mbkmeans against the standard implementation of k-means and other popular single-cell clustering methods. Our software package is available in Bioconductor at https://bioconductor.org/packages/mbkmeans.
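The mini-batch idea described in the abstract can be illustrated with a minimal sketch. This is a generic, pure-Python version of mini-batch k-means with a per-center learning rate, not the mbkmeans package's implementation or API. The function names, the `read_batch` callback (a hypothetical stand-in for reading selected rows from an on-disk store such as an HDF5 file), and all parameter values are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def mini_batch_kmeans(read_batch, n_points, k, batch_size, n_iter, seed=0):
    """Generic mini-batch k-means sketch.

    `read_batch(indices)` is a hypothetical callback returning the points at
    the given row indices; only one batch is held in memory at a time.
    """
    rng = random.Random(seed)
    # Initialise centers from k distinct randomly chosen points.
    centers = [list(p) for p in read_batch(rng.sample(range(n_points), k))]
    counts = [0] * k  # number of points each center has absorbed so far
    for _ in range(n_iter):
        batch = read_batch(rng.sample(range(n_points), batch_size))
        # Assign the whole batch first, then update the centers.
        assigned = [nearest_center(p, centers) for p in batch]
        for p, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-center learning rate shrinks over time
            centers[j] = [(1.0 - eta) * c + eta * x
                          for c, x in zip(centers[j], p)]
    return centers


# Usage on a small in-memory toy dataset (two Gaussian blobs).
data_rng = random.Random(1)
data = [[data_rng.gauss(m, 1.0), data_rng.gauss(m, 1.0)]
        for m in (0.0, 10.0) for _ in range(200)]
centers = mini_batch_kmeans(lambda idx: [data[i] for i in idx],
                            len(data), k=2, batch_size=32, n_iter=50)
```

Because each iteration touches only `batch_size` rows, memory use depends on the batch size and not on the total number of cells. In this sketch the in-memory list stands in for the on-disk store; a real reader would fetch only the requested rows from disk.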

Year: 2021   PMID: 33497379   PMCID: PMC7864438   DOI: 10.1371/journal.pcbi.1008625

Source DB: PubMed   Journal: PLoS Comput Biol   ISSN: 1553-734X   Impact factor: 4.475


