Thomas D Sherman, Tiger Gao, Elana J Fertig.
Abstract
BACKGROUND: Bayesian factorization methods, including Coordinated Gene Activity in Pattern Sets (CoGAPS), are emerging as powerful analysis tools for single-cell data. However, these methods have greater computational costs than their gradient-based counterparts, and these costs are often prohibitive for the analysis of large single-cell datasets. Many such methods can be run in parallel, which allows this limitation to be overcome by running on more powerful hardware. However, the constraints imposed by the prior distributions in CoGAPS limit the applicability of parallelization methods to enhance computational efficiency for single-cell analysis.
Keywords: Matrix factorization; Pattern detection; Single cell; Unsupervised learning
Year: 2020 PMID: 33054706 PMCID: PMC7556974 DOI: 10.1186/s12859-020-03796-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic for the asynchronous updating scheme used in CoGAPS. Updates are proposed sequentially until a conflicting proposal is generated, at which time the existing queue of proposals is evaluated in parallel. In this case, proposed change #4 is conflicting since it is in the same row as #2. When evaluating a proposal, an entire row or column of AP is used in the calculation of the conditional distribution used to perform Gibbs sampling. When a change is made in row n of A, the entire nth row of AP changes. So, if another change is later proposed in row n, the value of AP used will depend on the previous proposal, thereby changing the conditional distribution for this new proposal. This is exactly the case here for #2 and #4. Changes #1, #2, and #3 can be processed in parallel since they do not conflict with each other.
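The queueing rule described in the Fig. 1 caption can be illustrated with a minimal sketch. It is not the CoGAPS implementation: the function name `batch_proposals`, the representation of a proposal as a `(row, col)` pair, and the reduction of "conflict" to "same row as a queued proposal" are all assumptions made for illustration. The real sampler may apply a broader conflict rule.

```python
from typing import List, Tuple

Proposal = Tuple[int, int]  # hypothetical (row, col) of a proposed change


def batch_proposals(proposals: List[Proposal]) -> List[List[Proposal]]:
    """Split a sequence of proposals into batches with pairwise distinct rows.

    Proposals are queued in order. When a proposal lands in a row that is
    already in the queue (a conflict), the queue is closed as one batch and
    the conflicting proposal starts the next batch.
    """
    batches: List[List[Proposal]] = []
    queue: List[Proposal] = []
    rows_in_queue = set()

    for row, col in proposals:
        if row in rows_in_queue:
            # Conflict: everything queued so far could be evaluated in parallel.
            batches.append(queue)
            queue, rows_in_queue = [], set()
        queue.append((row, col))
        rows_in_queue.add(row)

    if queue:
        batches.append(queue)  # flush whatever remains at the end
    return batches


# Mirrors the figure: the fourth proposal shares a row with the second.
proposals = [(0, 1), (2, 3), (5, 0), (2, 1)]
batches = batch_proposals(proposals)
# batches == [[(0, 1), (2, 3), (5, 0)], [(2, 1)]]
```

In this sketch, the proposals within each inner list touch different rows, so they could be handed to separate threads, while the batches themselves must run in order.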
Relative performance of the sparse optimization on 2000 genes and 2000 cells. The baseline is the standard algorithm with 1 thread and no sparse optimization.
| Data sparsity (%) | Memory (MB) | Runtime (1 thread) | Runtime (4 threads) |
|---|---|---|---|
| 70 | 0.14 | 1.92 | 0.62 |
| 80 | 0.09 | 1.24 | 0.42 |
| 90 | 0.04 | 0.42 | 0.20 |
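As a small worked example, the speedup from 1 to 4 threads at each sparsity level can be derived from the table. This assumes the two runtime columns are ratios against the same baseline, as the table caption suggests, so that dividing one by the other gives the speedup. The variable names are illustrative.

```python
# Runtime values copied from the table: sparsity (%) -> (1 thread, 4 threads).
relative_runtime = {
    70: (1.92, 0.62),
    80: (1.24, 0.42),
    90: (0.42, 0.20),
}

# Speedup of 4 threads over 1 thread at the same sparsity level.
speedup = {
    sparsity: one_thread / four_threads
    for sparsity, (one_thread, four_threads) in relative_runtime.items()
}

for sparsity, factor in speedup.items():
    print(f"{sparsity}% sparse: {factor:.2f}x")
# 70% sparse: 3.10x
# 80% sparse: 2.95x
# 90% sparse: 2.10x
```

Under that reading, the benefit of four threads shrinks as the data become sparser, from roughly 3.1x at 70% sparsity to 2.1x at 90%.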
Fig. 2 Run time (in hours) of CoGAPS on random subsets of cells for a subset of 1000 genes (left) and random subsets of genes for a subset of 1000 cells (right) from the Li et al. [8] umbilical cord blood single-cell dataset. The dotted line in the left plot corresponds to the values in the right plot and is included for scale reference.