Abstract
Although single-cell RNA sequencing (scRNA-seq) is a new and promising technology, the gene expression obtained for each cell is hard to interpret because information that labels individual cells is lacking. For this reason, unsupervised clustering, for example, t-distributed stochastic neighbor embedding and uniform manifold approximation and projection, is usually employed to obtain a low-dimensional embedding that helps in understanding cell-cell relationships. One possible drawback of this strategy is that the outcome depends heavily on the genes selected for clustering. To address this dependence, many methods that perform unsupervised gene selection have been proposed. In this study, tensor decomposition (TD)-based unsupervised feature extraction (FE) was applied to the integration of two scRNA-seq expression profiles that measure human and mouse midbrain development. TD-based unsupervised FE selected genes that were not only coincident between human and mouse but also biologically reliable. Both the coincidence between the two species and the biological reliability of the selected genes were higher than those obtained with principal component analysis (PCA)-based FE applied to the same data set in the previous study. Since PCA-based unsupervised FE outperformed three other popular unsupervised gene selection methods (highly variable genes, bimodal genes, and dpFeature), TD-based unsupervised FE is expected to outperform them as well. In addition, 10 transcription factors (TFs) that might regulate the selected genes and might contribute to midbrain development were identified. These 10 TFs, BHLHE40, EGR1, GABPA, IRF3, PPARG, REST, RFX5, STAT3, TCF7L2, and ZBTB33, were previously reported to be related to brain functions and diseases. TD-based unsupervised FE is a promising method for integrating two scRNA-seq profiles effectively.
Keywords: enrichment analysis; inter-species analysis; midbrain development; single-cell RNA-sequencing; tensor decomposition
Year: 2019 PMID: 31608111 PMCID: PMC6761323 DOI: 10.3389/fgene.2019.00864
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Number of cells of each cell type without and with acute formalin stress.
| Cell type | Without acute stress | With acute stress |
|---|---|---|
| Astrocytes | 135 | 132 |
| Endothelial | 169 | 71 |
| Ependymal | 211 | 145 |
| Microglia | 34 | 14 |
| Neurons | 628 | 270 |
| Oligos | 570 | 431 |
| VSM | 38 | 33 |
VSM, vascular smooth muscle.
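As a quick arithmetic check, the per-type counts in the table above can be summed to give the total number of cells in each condition. This is only an illustrative sketch; the name `cell_counts` is hypothetical, and the values are copied from the table.

```python
# Cell counts copied from the table above: (without stress, with stress).
# The dictionary name `cell_counts` is a hypothetical helper for illustration.
cell_counts = {
    "Astrocytes": (135, 132),
    "Endothelial": (169, 71),
    "Ependymal": (211, 145),
    "Microglia": (34, 14),
    "Neurons": (628, 270),
    "Oligos": (570, 431),
    "VSM": (38, 33),
}

# Sum each column separately to get the number of cells per condition.
total_without = sum(without for without, _ in cell_counts.values())
total_with = sum(with_stress for _, with_stress in cell_counts.values())

print(total_without, total_with)  # 1785 1096
```

The totals match the 1,785 and 1,096 singular value vectors mentioned for the stress analysis later in this section.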
Confusion matrix of coincidence between the 55 singular value vectors selected among all 1,977 singular value vectors, u, attributed to human cells and the 44 singular value vectors selected among all 1,907 singular value vectors, v, attributed to mouse cells.
| | | Human: not selected | Human: selected |
|---|---|---|---|
| Mouse | Not selected | 1,833 | 12 |
| Mouse | Selected | 23 | 32 |
Selected: corrected P-values, computed with regression analysis [Eqs. (6) and (7)], are less than 0.01. Not selected: otherwise. The odds ratio is as large as 227, and the P-value computed by Fisher's exact test is 1.44 × 10^-44.
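To illustrate how an odds ratio and a Fisher's exact test P-value relate to a 2 × 2 confusion matrix such as the one above, here is a minimal standard-library sketch. It computes the simple cross-product odds ratio and a one-sided (upper-tail) hypergeometric P-value. The function names are hypothetical. The estimator and the sidedness behind the reported values are not stated here, so the numbers printed by this sketch are not expected to reproduce them exactly.

```python
from fractions import Fraction
from math import comb


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher's exact test: P(overlap >= d) with all margins fixed.

    Rows are one grouping (not selected / selected) and columns the other,
    so d is the number of items selected by both.
    """
    n_total = a + b + c + d
    col_selected = b + d  # selected in the column grouping
    row_selected = c + d  # selected in the row grouping
    denominator = comb(n_total, row_selected)
    numerator = sum(
        comb(col_selected, k) * comb(n_total - col_selected, row_selected - k)
        for k in range(d, min(col_selected, row_selected) + 1)
    )
    # Exact rational arithmetic avoids overflow in the huge binomials.
    return float(Fraction(numerator, denominator))


# Counts copied from the human/mouse singular value vector table above.
a, b, c, d = 1833, 12, 23, 32
print(f"cross-product odds ratio: {sample_odds_ratio(a, b, c, d):.2f}")  # 212.52
print(f"upper-tail P-value: {fisher_upper_tail(a, b, c, d):.3g}")
```

The cross-product ratio (about 212.5) differs from the reported 227. One possible explanation, offered only as an assumption, is that the reported value is a conditional maximum likelihood estimate of the kind returned by some statistical packages.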
Figure 1. Coincidence between singular value vectors shown in the confusion matrix above. Horizontal axis: singular value vector numbering ℓ. Black open circles, ℓ selected for human; blue crosses, ℓ selected for both human and mouse; red open triangles, ℓ selected for mouse. Vertical black broken lines connect ℓ selected for both human and mouse.
Confusion matrix of coincidence between the 456 genes selected for human and the 505 genes selected for mouse among all 13,384 common genes.
| | | Human: not selected | Human: selected |
|---|---|---|---|
| Mouse | Not selected | 13,233 | 151 |
| Mouse | Selected | 200 | 305 |
Selected: corrected P-values, computed with the χ² distribution [Eqs. (8) and (9)], are less than 0.01. Not selected: otherwise. The odds ratio is as large as 133, and the P-value computed by Fisher's exact test is 0 (i.e., less than numerical accuracy).
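The notes to the confusion matrices select items whose corrected P-values fall below 0.01, but the correction itself is not named in this excerpt. Assuming a Benjamini-Hochberg adjustment (an assumption, not confirmed here), the thresholding step could be sketched as follows. The function name and the example P-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four genes.
raw = [0.001, 0.02, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
selected = [p < 0.01 for p in adjusted]
print(adjusted)
print(selected)
```

With these made-up inputs only the first gene passes the 0.01 threshold after adjustment, even though its raw P-value is far below that of the others.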
Five top-ranked terms from “Allen Brain Atlas up” by Enrichr for selected 456 human genes and 505 mouse genes.
| Term | Overlap | P-value | Adjusted P-value |
|---|---|---|---|
| Human | | | |
| Paraventricular hypothalamic nucleus, magnocellular division, medial magnocellular part | 31/301 | 2.68 × 10^-12 | 2.91 × 10^-9 |
| Paraventricular hypothalamic nucleus, magnocellular division | 31/301 | 2.68 × 10^-12 | 2.91 × 10^-9 |
| Paraventricular hypothalamic nucleus, magnocellular division, posterior magnocellular part | 28/301 | 3.39 × 10^-10 | 1.47 × 10^-7 |
| Paraventricular hypothalamic nucleus | 29/301 | 7.02 × 10^-11 | 5.08 × 10^-8 |
| Paraventricular nucleus, dorsal part | 27/301 | 1.57 × 10^-9 | 4.88 × 10^-7 |
| Mouse | | | |
| Paraventricular hypothalamic nucleus, magnocellular division, medial magnocellular part | 31/301 | 4.03 × 10^-11 | 2.19 × 10^-8 |
| Paraventricular hypothalamic nucleus, magnocellular division | 31/301 | 4.03 × 10^-11 | 2.19 × 10^-8 |
| Paraventricular hypothalamic nucleus, magnocellular division, posterior magnocellular part | 31/301 | 4.03 × 10^-11 | 2.19 × 10^-8 |
| Lower dorsal lateral hypothalamic area | 29/301 | 8.40 × 10^-10 | 3.65 × 10^-7 |
| Paraventricular hypothalamic nucleus, magnocellular division, posterior magnocellular part, lateral zone | 31/301 | 4.03 × 10^-11 | 2.19 × 10^-8 |
Enrichment of embryonic brain by “JENSEN TISSUES” in Enrichr.
| Term | Overlap | P-value | Adjusted P-value |
|---|---|---|---|
| Human | | | |
| Embryonic_brain | 330/4936 | 3.36 × 10^-104 | 4.30 × 10^-102 |
| Mouse | | | |
| Embryonic_brain | 366/4936 | 3.59 × 10^-115 | 4.59 × 10^-113 |
Enrichment of embryonic brain by “ARCHS4 Tissues” in Enrichr.
| Term | Overlap | P-value | Adjusted P-value |
|---|---|---|---|
| Human | | | |
| MIDBRAIN | 248/2316 | 1.02 × 10^-129 | 1.11 × 10^-127 |
| Mouse | | | |
| MIDBRAIN | 248/2316 | 1.44 × 10^-99 | 1.56 × 10^-97 |
Five top-ranked terms from “GTEx Tissue Sample Gene Expression Profiles up” by Enrichr for selected 456 human genes and 505 mouse genes. Brain-related terms are asterisked.
| Term | Overlap | P-value | Adjusted P-value |
|---|---|---|---|
| Human | | | |
| GTEX-QCQG-1426-SM-48U22_ovary_female_50-59_years | 105/1165 | 3.56 × 10^-35 | 1.04 × 10^-31 |
| GTEX-RWS6-1026-SM-47JXD_ovary_female_60-69_years | 116/1574 | 7.96 × 10^-31 | 7.74 × 10^-28 |
| GTEX-TMMY-1726-SM-4DXTD_ovary_female_40-49_years | 117/1582 | 2.97 × 10^-31 | 4.33 × 10^-28 |
| GTEX-RU72-0008-SM-46MV8_skin_female_50-59_years | 94/1103 | 1.99 × 10^-31 | 1.45 × 10^-26 |
| GTEX-R55E-0008-SM-48FCG_skin_male_20-29_years | 111/1599 | 3.67 × 10^-27 | 1.78 × 10^-24 |
| Mouse | | | |
| *GTEX-WVLH-0011-R4A-SM-3MJFS_brain_male_50-59_years | 139/1957 | 1.93 × 10^-30 | 5.63 × 10^-27 |
| *GTEX-X261-0011-R8A-SM-4E3I5_brain_male_50-59_years | 135/1878 | 5.24 × 10^-30 | 7.65 × 10^-27 |
| *GTEX-T5JC-0011-R4A-SM-32PLT_brain_male_20-29_years | 129/1948 | 3.51 × 10^-25 | 3.42 × 10^-22 |
| GTEX-R55E-0008-SM-48FCG_skin_male_20-29_years | 109/1599 | 4.93 × 10^-22 | 2.40 × 10^-19 |
| GTEX-TMMY-1726-SM-4DXTD_ovary_female_40-49_years | 107/1582 | 2.37 × 10^-21 | 7.69 × 10^-19 |
Five top-ranked terms from “MGI Mammalian Phenotype 2017” by Enrichr for selected 456 human genes and 505 mouse genes. Brain-related terms are asterisked.
| Term | Overlap | P-value | Adjusted P-value |
|---|---|---|---|
| Human | | | |
| MP:0002169_no_abnormal_phenotype_detected | 82/1674 | 2.52 × 10^-11 | 5.53 × 10^-8 |
| MP:0001262_decreased_body_weight | 63/1189 | 3.40 × 10^-10 | 3.72 × 10^-7 |
| MP:0001265_decreased_body_size | 46/774 | 3.20 × 10^-9 | 2.33 × |
| *MP:0009937_abnormal_neuron_differentiation | 15/106 | 1.81 × 10^-8 | 9.90 × 10^-6 |
| *MP:0000788_abnormal_cerebral_cortex_morphology | 17/145 | 3.64 × 10^-8 | 1.60 × 10^-5 |
| Mouse | | | |
| MP:0002169_no_abnormal_phenotype_detected | 89/1674 | 1.36 × | 3.09 × 10^-8 |
| MP:0011091_prenatal_lethality,_complete_penetrance | 27/272 | 1.68 × 10^-9 | 1.91 × 10^-6 |
| MP:0001262_decreased_body_weight | 65/1189 | 3.93 × 10^-9 | 2.97 × 10^-6 |
| MP:0011100_preweaning_lethality,_complete_penetrance | 42/674 | 8.55 × 10^-8 | 3.88 × 10^-5 |
| MP:0001265_decreased_body_size | 46/774 | 8.22 × 10^-8 | 3.88 × 10^-5 |
TFs enriched in “ENCODE and ChEA Consensus TFs from ChIP-X” by Enrichr for human and mouse. Bold TFs are common.
| Species | TFs |
|---|---|
| Human | BCL3, **BHLHE40**, **EGR1**, **GABPA**, **IRF3**, **PPARG**, **REST**, **RFX5**, SP1, SP2, SRF, **STAT3**, **TCF7L2**, TRIM28, TRIM28, **ZBTB33** |
| Mouse | **BHLHE40**, CTCF, E2F4, E2F6, **EGR1**, ESR1, ETS1, FLI1, **GABPA**, **IRF3**, NFIC, NRF1, **PPARG**, RCOR1, **REST**, **RFX5**, SPI1, **STAT3**, **TCF7L2**, USF1, USF2, YY1, **ZBTB33**, ZNF384 |
TFs, transcription factors.
Figure 2. Transcription factor (TF) network identified by regnetworkweb for the TFs in the table above. (A) Human and (B) mouse.
Confusion matrix of coincidence between the 30 singular value vectors selected among all 1,096 singular value vectors, u, attributed to samples without stress and the 24 singular value vectors selected among all 1,096 singular value vectors, v, attributed to samples with stress.
| | | With stress: not selected | With stress: selected |
|---|---|---|---|
| Without stress | Not selected | 1,065 | 1 |
| Without stress | Selected | 7 | 23 |
For samples without stress, only the top 1,096 singular value vectors among all 1,785 singular value vectors are considered, since the total number of singular value vectors attributed to samples with stress is 1,096. Selected: corrected P-values, computed with regression analysis [Eqs. (12) and (13)], are less than 0.01. Not selected: otherwise. The odds ratio is as large as 2,483, and the P-value computed by Fisher's exact test is 1.92 × 10^-40.
Figure 3. Coincidence between singular value vectors shown in the confusion matrix above. Horizontal axis: singular value vector numbering ℓ. Black open circles, ℓs selected for samples without stress; blue crosses, ℓs selected for both samples without and with stress; red open triangles, ℓs selected for samples with stress. Vertical black broken lines connect ℓs selected for both samples without and with stress.
Confusion matrix of coincidence between the 4,150 genes selected for samples without stress and the 3,621 genes selected for samples with stress among all 24,341 genes.
| | | With stress: not selected | With stress: selected |
|---|---|---|---|
| Without stress | Not selected | 19,894 | 297 |
| Without stress | Selected | 826 | 3,324 |
Selected: corrected P-values, computed with the χ² distribution in the same way as Eqs. (8) and (9) in the human and mouse midbrain study, are less than 0.01. Not selected: otherwise. The odds ratio is as large as 270, and the P-value computed by Fisher's exact test is 0 (i.e., less than numerical accuracy).
Five top-ranked terms from “GTEx Tissue Sample Gene Expression Profiles up” by Enrichr for 3,324 genes selected commonly between samples without and with stress.
| Term | Overlap | P-value | Adjusted P-value |
|---|---|---|---|
| GTEX-WWYW-0011-R10A-SM-3NB35_brain_female_50-59_years | 1006/2885 | 2.7880 × 10^-151 | 8.135 × 10^-148 |
| GTEX-T6MN-0011-R1A-SM-32QOY_brain_male_50-59_years | 859/2317 | 2.9865 × 10^-144 | 4.3575 × 10^-141 |
| GTEX-QVUS-0011-R3A-SM-3GAFD_brain_female_60-69_years | 963/2759 | 6.8195 × 10^-144 | 6.6325 × 10^-141 |
| GTEX-T2IS-0011-R3A-SM-32QPB_brain_female_20-29_years | 967/2792 | 5.5265 × 10^-142 | 4.0315 × 10^-139 |
| GTEX-WZTO-0011-R3B-SM-3NMC6_brain_male_40-49_years | 991/2972 | 2.6805 × 10^-133 | 1.5645 × 10^-130 |
Five top-ranked terms from “Allen Brain Atlas up” by Enrichr for 3,324 genes selected commonly between samples without and with stress.
| Term | Overlap | P-value | Adjusted P-value |
|---|---|---|---|
| Paraventricular hypothalamic nucleus | 120/301 | 3.38 × 10^-22 | 7.41 × 10^-19 |
| Paraventricular hypothalamic nucleus, parvicellular division | 119/301 | 1.15 × 10^-21 | 1.27 × 10^-18 |
| Paraventricular hypothalamic nucleus, parvicellular division, medial parvicellular part, dorsal zone | 117/301 | 1.29 × 10^-20 | 9.42 × 10^-18 |
| Paraventricular nucleus, cap part | 116/301 | 4.22 × 10^-20 | 2.31 × 10^-17 |
| Paraventricular hypothalamic nucleus, magnocellular division | 115/301 | 1.36 × 10^-19 | 5.96 × 10^-17 |