Abstract
Analysis of single-cell multiomics datasets is a new and considerably challenging topic because such datasets contain a large number of features with numerous missing values. In this study, we applied a recently proposed tensor-decomposition (TD)-based unsupervised feature extraction (FE) technique to this problem. The technique successfully integrates single-cell multiomics data composed of gene expression, DNA methylation, and DNA accessibility. Although the last two have very large dimensions, as many as ten million features, with only a few percent of values nonzero, TD-based unsupervised FE can integrate the three omics datasets without filling in missing values. Together with UMAP, which is frequently used to embed single-cell measurements into two-dimensional space, TD-based unsupervised FE produces a two-dimensional embedding coincident with the classification when integrating single-cell omics datasets. Genes selected by TD-based unsupervised FE are also significantly related to reasonable biological roles.
Keywords: feature extraction; multiomics data; single-cell; tensor decomposition
Year: 2021 PMID: 34573424 PMCID: PMC8468466 DOI: 10.3390/genes12091442
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
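The integration step described in the abstract can be illustrated with a minimal sketch. It assumes (this is not spelled out in the text above) that each omics layer is first reduced by SVD to a few cell-side singular vectors, that these are stacked into a cells × components × omics tensor, and that HOSVD is then applied to that tensor. The toy matrices, their sizes, and the names `unfold`, `hosvd`, `reconstruct`, and `n_components` are hypothetical, and dense random arrays stand in for the sparse real data.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor):
    """Return (core, factors): one orthonormal factor matrix per mode."""
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
               for m in range(tensor.ndim)]
    core = tensor
    for m, u in enumerate(factors):
        # Project mode m onto the columns of u.
        core = np.moveaxis(np.tensordot(u.T, core, axes=(1, m)), 0, m)
    return core, factors

def reconstruct(core, factors):
    """Multiply the core by every factor matrix to rebuild the tensor."""
    out = core
    for m, u in enumerate(factors):
        out = np.moveaxis(np.tensordot(u, out, axes=(1, m)), 0, m)
    return out

rng = np.random.default_rng(0)
n_cells, n_components = 30, 5
# Hypothetical toy layers (features x cells); feature counts differ per layer.
layers = [rng.random((n_features, n_cells)) for n_features in (40, 200, 120)]

# Assumed step 1: per-layer SVD, keeping the leading cell-side singular vectors.
cell_vectors = []
for x in layers:
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    cell_vectors.append(vt[:n_components].T)  # cells x components

# Assumed step 2: stack into a cells x components x omics tensor, then HOSVD.
tensor = np.stack(cell_vectors, axis=-1)
core, factors = hosvd(tensor)
cell_embedding = factors[0]  # rows are cells; columns are singular-value vectors
```

In a workflow of this shape, the columns of `cell_embedding` would be the inputs passed to a two-dimensional embedding method such as UMAP.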
Table 1. The number of single cells within individual cell types included in Dataset 1.
| FGO | GO1 | GO2 | Granulosa | Immune | MI | MII | StromaC1 | StromaC2 |
|---|---|---|---|---|---|---|---|---|
| 81 | 40 | 46 | 93 | 20 | 155 | 90 | 189 | 185 |
Table 2. The number of single cells at four embryonic time points included in Dataset 2. For E7.5, gene expression profiles were measured for only 296 of the 390 single cells.
| E4.5-5.5 | E6.5 | E6.75 | E7.5 |
|---|---|---|---|
| 267 | 98 | 97 | 390 (296) |
Table 3. Number of singular-value vectors coincident with the classification shown in Table 1.
| Adjusted P-value | SVD: Gene Expression | SVD: DNA Methylation | SVD: DNA Accessibility | HOSVD: DNA Methylation and Accessibility | HOSVD: All |
|---|---|---|---|---|---|
| <0.01 | 10 | 7 | 1 | 10 | 18 |
| ≥0.01 | 0 | 3 | 9 | 10 | 12 |
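The counting behind a table of this kind can be sketched as follows. The excerpt does not say which multiple-testing correction was used, so the Benjamini-Hochberg procedure here is an assumption, and the raw P-values are hypothetical placeholders rather than values from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def count_by_threshold(p_values, threshold=0.01):
    """Count vectors whose adjusted P-value is below or at/above the threshold."""
    adjusted = benjamini_hochberg(p_values)
    below = sum(1 for p in adjusted if p < threshold)
    return below, len(adjusted) - below

# Hypothetical raw P-values, one per singular-value vector.
raw_p = [0.001, 0.02, 0.03, 0.5]
below, at_or_above = count_by_threshold(raw_p)
```

Each column of the table then reports the pair (`below`, `at_or_above`) for one omics layer or one integrated combination.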
Figure 1. Two-dimensional embedding of the singular-value vectors computed by HOSVD applied to Dataset 1 (Table 3). Upper: only DNA methylation and accessibility are integrated. Lower: all three omics datasets are integrated. UMAP default settings were used except for custom.config$n_neighbors = 100.
Table 4. Number of singular-value vectors coincident with the classification shown in Table 2.
| Adjusted P-value | SVD: Gene Expression | SVD: DNA Methylation | SVD: DNA Accessibility | HOSVD: DNA Methylation and Accessibility | HOSVD: All |
|---|---|---|---|---|---|
| <0.01 | 10 | 7 | 5 | 10 | 18 |
| ≥0.01 | 0 | 3 | 5 | 10 | 12 |
Figure 2. Two-dimensional embedding of the singular-value vectors computed by HOSVD applied to Dataset 2 (Table 4). Upper: only DNA methylation and accessibility are integrated. Lower: all three omics datasets are integrated. UMAP default settings were used except for custom.config$n_neighbors = 100.
Table 5. Number of single cells, features, nonzero components, and their ratios.
| Numbers | Expression | DNA Methylation | DNA Accessibility |
|---|---|---|---|
| Dataset 1 | |||
| single cells | 899 | 899 | 899 |
| features | 26,500 | 26,438,807 | 15,478,375 |
| total components | | | |
| nonzero components | | | |
| the ratio of nonzero components | 0.28 | 0.02 | 0.03 |
| Dataset 2 | |||
| single cells | 758 | 852 | 852 |
| features | 22,084 | 20,106,507 | 13,627,678 |
| total components | | | |
| nonzero components | | | |
| the ratio of nonzero components | 0.29 | 0.04 | 0.07 |
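The total number of components for each omics layer follows from the counts in the table as single cells × features. The short sketch below recomputes it and, under the assumption that the reported ratio applies to that product, derives a rough nonzero count. Because the ratios are rounded to two decimals, the nonzero figures are approximations only, not the values from the original table. The function and variable names are hypothetical.

```python
# (single cells, features, reported ratio of nonzero components), from the table.
TABLE = {
    "Dataset 1": {
        "Expression": (899, 26_500, 0.28),
        "DNA Methylation": (899, 26_438_807, 0.02),
        "DNA Accessibility": (899, 15_478_375, 0.03),
    },
    "Dataset 2": {
        "Expression": (758, 22_084, 0.29),
        "DNA Methylation": (852, 20_106_507, 0.04),
        "DNA Accessibility": (852, 13_627_678, 0.07),
    },
}

def total_components(cells, features):
    """Size of the dense cells x features matrix."""
    return cells * features

def approx_nonzero(cells, features, ratio):
    """Rough nonzero count implied by the rounded ratio (an estimate only)."""
    return round(total_components(cells, features) * ratio)

for dataset, layers in TABLE.items():
    for layer, (cells, features, ratio) in layers.items():
        total = total_components(cells, features)
        nonzero = approx_nonzero(cells, features, ratio)
        print(f"{dataset} | {layer}: total = {total:.2e}, nonzero ~ {nonzero:.1e}")
```

The estimates make the scale of the sparsity concrete: the methylation and accessibility matrices have on the order of 10^10 entries each, of which only a few percent are nonzero.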