Nigatu Adossa, Sofia Khan, Kalle T Rytkönen, Laura L Elo.
Abstract
Single-cell omics technologies are now solving biological and medical problems that previously remained elusive, such as the discovery of new cell types, cellular differentiation trajectories, and communication networks across cells and tissues. Recent advances, especially in single-cell multi-omics, hold high potential for breakthroughs through the integration of multiple omics layers. To keep pace with these biotechnological developments, many computational approaches for processing and analyzing single-cell multi-omics data have been proposed. In this review, we first introduce recent developments in single-cell multi-omics in general and then focus on the available data integration strategies. The integration approaches are divided into three categories: early, intermediate, and late data integration. For each category, we describe the underlying conceptual principles and main characteristics, and provide examples of currently available tools and how they have been applied to analyze single-cell multi-omics data. Finally, we explore the challenges and prospective future directions of single-cell multi-omics data integration, including examples of adapting multi-view analysis approaches from other disciplines to single-cell multi-omics.
Keywords: Clustering; Integration; Multi-omics; Single-cell
Year: 2021 PMID: 34025945 PMCID: PMC8114078 DOI: 10.1016/j.csbj.2021.04.060
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Single-cell multi-omics workflow. The first step in the workflow is sample extraction, where cells are harvested, for example, from blood or tissues. Next, the extracted cells are dissociated and used to profile multiple layers of omics data from individual cells. In the computational analysis, three data integration strategies can be used: early, intermediate, and late data integration. Finally, distinct cell types and cell states can be recognized, for instance, by clustering.
Fig. 2. Schematic illustration of the early, intermediate, and late data integration strategies in single-cell multi-omics analysis. In early data integration, multiple omics datasets are concatenated for downstream analysis. By default, early integration increases the dimensionality of the data and does not account for the different distributions of values in the separate omics layers. The intermediate data integration strategy covers a range of techniques for jointly analyzing multiple omics datasets. Typically, this is done by transforming the datasets into a single integrated data matrix using, for instance, similarity-based integration, joint dimensionality reduction, or statistical modeling-based approaches. The late data integration strategy first analyzes each omics layer separately and then integrates these results into a consensus result.
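
The early and late strategies described in the Fig. 2 caption can be illustrated with a minimal Python sketch. It is not taken from the review or from any tool it lists. The function names (`zscore_columns`, `early_integration`, `late_integration`), the per-feature z-scoring applied before concatenation, the co-association form of the consensus, and the toy values are all assumptions made for illustration, and the cells are assumed to be matched across layers.

```python
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Scale each feature (column) to zero mean and unit variance."""
    scaled = []
    for col in zip(*matrix):
        mu, sd = mean(col), pstdev(col)
        # Constant features carry no signal; map them to zeros.
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def early_integration(layers):
    """Concatenate the per-cell feature vectors of all layers.

    Scaling each layer first is an assumption of this sketch; plain
    concatenation would let the layer with the largest values dominate.
    """
    scaled_layers = [zscore_columns(layer) for layer in layers]
    return [sum(rows, []) for rows in zip(*scaled_layers)]


def late_integration(labelings):
    """Co-association matrix: for each pair of cells, the fraction of
    per-layer clusterings that place the two cells in the same cluster."""
    n_cells = len(labelings[0])
    return [
        [mean(1.0 if labels[i] == labels[j] else 0.0 for labels in labelings)
         for j in range(n_cells)]
        for i in range(n_cells)
    ]


# Hypothetical matched data: 4 cells, 2 RNA features and 1 protein feature.
rna = [[1.0, 200.0], [2.0, 180.0], [9.0, 20.0], [8.0, 40.0]]
protein = [[0.1], [0.2], [0.9], [0.8]]

# Early integration: one 4 x 3 matrix for downstream analysis.
joint = early_integration([rna, protein])

# Late integration: combine cluster labels obtained separately per layer.
consensus = late_integration([[0, 0, 1, 1], [0, 1, 1, 1]])
```

The concatenated matrix grows with every added layer, which is the dimensionality increase the caption mentions. The co-association matrix could then be clustered itself to obtain a consensus partition.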
Computational single-cell multi-omics tools applying intermediate integration approaches and their applicable omic data types.
| Tool | Methodology | Single-cell omics types (designed for matched/unmatched) | Refs. |
|---|---|---|---|
| Similarity-based approaches | |||
| SCHEMA | Metric-learning based method | Multi-omics data (matched) | |
| Spectrum | Weighted-nearest neighbor analysis | Multi-omics data (unmatched) | |
| Seurat4 | Weighted-nearest neighbor analysis | Transcriptome and chromatin accessibility or proteome data (matched) | |
| Dimension reduction-based approaches | |||
| BindSC | Canonical correlation analysis | Transcriptome and chromatin accessibility data (matched) | |
| CoupledNMF | Non-negative matrix factorization | Transcriptome and chromatin accessibility data (unmatched) | |
| LIGER | Non-negative matrix factorization | Transcriptome and spatial gene expression data or DNA methylation (unmatched) | |
| MAGAN | Manifold alignment | Multi-omics data (unmatched) | |
| MATCHER | Manifold alignment | Transcriptome and DNA methylation data (matched) | |
| MMD-MA | Manifold alignment | Multi-omics data (matched) | |
| MOFA+ | Factor analysis | Multi-omics data (matched) | |
| scMVAE | Variational autoencoder | Multi-omics data (matched) | |
| Seurat3 | Canonical correlation analysis | Transcriptome and chromatin accessibility data (unmatched) | |
| totalVI | Deep generative model | Transcriptome and proteome data (matched) | |
| UnionCom | Manifold alignment | Multi-omics data (unmatched) | |
| Statistical modeling-based approaches | |||
| BREM-SC | Bayesian mixture model | Transcriptome and proteome data (matched) | |
| Clonealign | Statistical model | Transcriptome and genome data (unmatched) | |
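
As a rough illustration of the similarity-based family in the table, the sketch below builds one cell-by-cell similarity matrix per omics layer and fuses them into a single integrated matrix. It is a generic example under stated assumptions, not the algorithm of any listed tool: the Gaussian kernel, the fixed bandwidth, the equal layer weights, the function names, and the toy data are all hypothetical, and the cells are assumed to be matched across layers.

```python
from math import dist, exp


def gaussian_similarity(layer, bandwidth):
    """Cell-by-cell similarity matrix for one omics layer (rows are cells)."""
    return [
        [exp(-dist(a, b) ** 2 / (2 * bandwidth ** 2)) for b in layer]
        for a in layer
    ]


def fuse_similarities(matrices, weights):
    """Weighted average of the per-layer similarity matrices."""
    total = sum(weights)
    n_cells = len(matrices[0])
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) / total
         for j in range(n_cells)]
        for i in range(n_cells)
    ]


# Hypothetical matched data: 3 cells profiled in two layers.
rna = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
atac = [[1.0], [1.2], [4.0]]

fused = fuse_similarities(
    [gaussian_similarity(rna, bandwidth=1.0),
     gaussian_similarity(atac, bandwidth=1.0)],
    weights=[0.5, 0.5],  # equal weights are an illustrative choice
)
```

The fused matrix can be passed to any method that accepts a precomputed similarity or affinity matrix, for example graph-based clustering. In this toy input, cells 0 and 1 remain far more similar to each other than either is to cell 2.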