Soumita Seth1, Saurav Mallik2, Tapas Bhadra1, Zhongming Zhao2,3.
Abstract
The major areas of interest in single-cell RNA sequencing analysis are the identification of known and novel cell types, characterization of cells, cell fate prediction, classification of tumor types, and investigation of heterogeneity among cells. Single-cell clustering plays an important role in addressing these questions. However, identifying clusters in high-dimensional single-cell sequencing data is challenging because of the nature of the data, and dimensionality reduction models can mitigate this problem. Here, we introduce a framework for discovering cluster-specific frequent biomarkers that combines dimensionality reduction with the hierarchical agglomerative clustering algorithm Louvain for single-cell RNA sequencing data analysis. First, we pre-filtered the features detected in too few cells and the cells expressing too few features. We then created a Seurat object to store the data and analysis together and used quality control metrics to discard low-quality or dying cells. Afterwards, we applied the global-scaling normalization method "LogNormalize". Next, we identified the features that vary most from cell to cell in our dataset. We then applied a linear transformation followed by a linear dimensionality reduction technique, Principal Component Analysis (PCA), to project the high-dimensional data onto an optimal low-dimensional space. After identifying fifty "significant" principal components (PCs) based on strong enrichment of low p-value features, we applied the graph-based clustering algorithm Louvain to cluster the cells using the top 10 significant PCs. We applied our model to a single-cell RNA sequencing dataset for a rare intestinal cell type in mice (NCBI accession ID: GSE62270; 23,630 features and 1,872 samples (cells)). We obtained 10 cell clusters with a maximum modularity of 0.8851.
After detecting the cell clusters, we found 3,871 cluster-specific biomarkers using Model-based Analysis of Single-cell Transcriptomics (MAST), a statistical tool for extracting expression features from single-cell sequencing data, with a log2 fold-change (log2 FC) threshold of 0.25 and a minimum feature detection rate of 25%. Among these cluster-specific biomarkers, we found 1,892 most frequent markers, i.e., overlapping biomarkers. We performed degree-based hub gene network analysis using Cytoscape and reported the five highest-degree genes (Rps4x, Rps18, Rpl13a, Rps12 and Rpl18a). Subsequently, we performed KEGG pathway and Gene Ontology enrichment analysis of the cluster markers using the DAVID 6.8 software tool. In summary, our proposed framework, which integrates dimensionality reduction and agglomerative hierarchical clustering, provides a robust approach for efficiently discovering cluster-specific frequent biomarkers, i.e., overlapping biomarkers, from single-cell RNA sequencing data.
Keywords: agglomerative hierarchical clustering; cluster specified biomarkers; dimensionality reduction; modularity optimization; principal component analysis (PCA); single-cell sequencing data analysis
Year: 2022 PMID: 35198011 PMCID: PMC8859265 DOI: 10.3389/fgene.2022.828479
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
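The "LogNormalize" step named in the abstract can be illustrated with a minimal plain-Python sketch. It assumes the usual global-scaling definition: each cell's counts are divided by that cell's total count, multiplied by a scale factor, and transformed with a natural-log `log1p`. The scale factor of 10,000 is an assumption, since the abstract does not state the value used, and `log_normalize` is a hypothetical helper written for illustration, not a Seurat function.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Global-scaling normalization for the raw counts of ONE cell.

    Each count is divided by the cell's total, multiplied by
    scale_factor (assumed value), and passed through log1p.
    """
    total = sum(counts)
    if total == 0:
        # A cell with no counts stays all-zero instead of dividing by zero.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Hypothetical toy cell with three features (raw counts).
toy_cell = [0, 3, 7]
normalized = log_normalize(toy_cell)
```

Features with a raw count of zero remain zero after normalization, and values become comparable across cells with different sequencing depth because each cell is rescaled to the same total before the log transform.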
FIGURE 1. Flowchart of the proposed framework.
FIGURE 2. Visualize QC metrics as a violin plot.
FIGURE 3. FeatureScatter plot to visualize feature-feature relationships.
FIGURE 4. Variable features with labels.
FIGURE 5. DimPlot of two principal components (PC1 vs. PC2).
FIGURE 6. DimHeatMap for principal components. (A) DimHeatMap for first principal component. (B) DimHeatMap for fifteen principal components.
FIGURE 7. Visualization of strong enrichment of features with low p-values.
FIGURE 8. Visualization of clusters.
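The abstract reports a maximum modularity of 0.8851 for the Louvain partition. As a reminder of what that score measures, the sketch below computes Newman modularity for a given partition of an undirected, unweighted graph. It only scores a partition and does not implement the Louvain optimization itself. The function name `modularity`, the toy graph, and the community labels are hypothetical, and the graph is assumed to have at least one edge.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q = sum over communities c of L_c/m - (d_c / 2m)^2.

    edges     : list of (u, v) pairs of an undirected, unweighted graph
    community : dict mapping each node to its community label
    L_c is the number of edges inside community c, d_c the summed degree
    of its nodes, and m the total number of edges (assumed > 0).
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in one community
    degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(toy_edges, toy_partition)  # 5/14, about 0.357
```

Putting every node into a single community gives a score of zero, so a high value indicates that edges fall within communities much more often than expected from node degrees alone.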
Top 30 cluster-specific frequent biomarkers.
| Marker name | Frequency | Specified clusters | FDR |
|---|---|---|---|
| | | Group1: Cluster0 | 1.82 × 10⁻⁶ |
| | | Group1: Cluster1 | 7.12 × 10⁻³ |
| | | Group1: Cluster3 | 2.49 × 10⁻⁷ |
| | | Group1: Cluster6 | 1.00 × 10⁰ |
| | | Group1: Cluster7 | 1.00 × 10⁰ |
| | | | |
| | | Group1: Cluster2 | 8.69 × 10⁻²⁸ |
| | | Group1: Cluster3 | 2.08 × 10⁻² |
| | | Group1: Cluster6 | 4.82 × 10⁻⁵ |
| | | Group1: Cluster7 | 3.24 × 10⁻¹ |
| | | Group1: Cluster8 | 1.14 × 10⁻⁴ |
| | | | |
| | | Group1: Cluster0 | 1.07 × 10⁻⁸ |
| | | Group1: Cluster3 | 3.45 × 10⁻⁷ |
| | | Group1: Cluster4 | 2.29 × 10⁻⁴ |
| | | Group1: Cluster5 | 6.97 × 10⁻¹ |
| | | Group1: Cluster9 | 4.97 × 10⁻³ |
| | | | |
| | | Group1: Cluster1 | 6.52 × 10⁻² |
| | | Group1: Cluster2 | 4.45 × 10⁻⁵ |
| | | Group1: Cluster6 | 1.00 × 10⁰ |
| | | Group1: Cluster7 | 8.31 × 10⁻² |
| | | | |
| | | Group1: Cluster0 | 1.29 × 10⁻² |
| | | Group1: Cluster4 | 1.50 × 10⁻⁶ |
| | | Group1: Cluster5 | 8.56 × 10⁻⁵ |
| | | Group1: Cluster9 | 8.12 × 10⁻⁵ |
| | | | |
| | | Group1: Cluster0 | 1.04 × 10⁻⁵ |
| | | Group1: Cluster1 | 3.00 × 10⁻⁴ |
| | | Group1: Cluster3 | 1.69 × 10⁻⁵ |
| | | Group1: Cluster8 | 4.63 × 10⁻³ |
| | | | |
| | | Group1: Cluster0 vs. Group2: Rest all clusters | 7.59 × 10⁻¹¹ |
| | | Group1: Cluster3 | 2.34 × 10⁻⁵ |
| | | Group1: Cluster6 | 9.96 × 10⁻⁵ |
| | | Group1: Cluster7 | 1.00 × 10⁻⁴ |
| | | | |
| | | Group1: Cluster0 | 6.49 × 10⁻¹⁵ |
| | | Group1: Cluster1 | 1.14 × 10⁻¹⁵ |
| | | Group1: Cluster3 | 8.88 × 10⁻¹¹ |
| | | Group1: Cluster9 vs. Group2: Rest all clusters | 5.56 × 10⁻³ |
| | | | |
| | | Group1: Cluster0 | 9.46 × 10⁻¹⁶ |
| | | Group1: Cluster1 | 9.37 × 10⁻⁴ |
| | | Group1: Cluster3 | 1.41 × 10⁻² |
| | | Group1: Cluster4 | 5.64 × 10⁻⁷ |
| | | | |
| | | Group1: Cluster1 | 6.53 × 10⁻¹⁸ |
| | | Group1: Cluster3 | 8.54 × 10⁻⁴ |
| | | Group1: Cluster5 | 1.28 × 10⁻⁵ |
| | | Group1: Cluster9 | 1.65 × 10⁻¹ |
| | | | |
| | | Group1: Cluster0 | 1.72 × 10⁻¹⁴ |
| | | Group1: Cluster3 | 9.78 × 10⁻⁷ |
| | | Group1: Cluster6 | 5.75 × 10⁻¹ |
| | | Group1: Cluster7 | 1.18 × 10⁻³ |
| | | | |
| | | Group1: Cluster1 | 3.35 × 10⁻¹⁰ |
| | | Group1: Cluster6 | 3.83 × 10⁻³ |
| | | Group1: Cluster7 | 1.15 × 10⁻⁴ |
| | | Group1: Cluster8 | 4.44 × 10⁻¹ |
| | | | |
| | | Group1: Cluster4 | 3.26 × 10⁻⁵ |
| | | Group1: Cluster5 | 2.29 × 10⁻⁵ |
| | | Group1: Cluster8 | 8.48 × 10⁻⁸ |
| | | Group1: Cluster9 | 6.19 × 10⁻¹ |
| | | | |
| | | Group1: Cluster1 | 1.34 × 10⁻⁶ |
| | | Group1: Cluster5 | 8.44 × 10⁻³ |
| | | Group1: Cluster8 | 4.38 × 10⁻³ |
| | | Group1: Cluster9 | 3.29 × 10⁻² |
| | | | |
| | | Group1: Cluster0 | 2.84 × 10⁻³⁶ |
| | | Group1: Cluster4 | 1.85 × 10⁻³ |
| | | Group1: Cluster5 | 4.43 × 10⁻² |
| | | Group1: Cluster9 | 6.96 × 10⁻¹ |
| | | | |
| | | Group1: Cluster1 | 8.53 × 10⁻⁷ |
| | | Group1: Cluster2 | 6.58 × 10⁻¹⁰ |
| | | Group1: Cluster3 | 3.11 × 10⁻⁷ |
| | | Group1: Cluster8 | 4.55 × 10⁻¹⁹ |
| | | | |
| | | Group1: Cluster1 | 3.14 × 10⁻⁹ |
| | | Group1: Cluster2 | 4.50 × 10⁻¹⁰ |
| | | Group1: Cluster3 | 5.90 × 10⁻³ |
| | | Group1: Cluster8 | 2.37 × 10⁻²⁷ |
| | | | |
| | | Group1: Cluster1 | 1.04 × 10⁻⁶ |
| | | Group1: Cluster2 | 7.88 × 10⁻² |
| | | Group1: Cluster3 | 9.58 × 10⁻¹² |
| | | Group1: Cluster8 | 6.17 × 10⁻¹⁸ |
| | | | |
| | | Group1: Cluster1 | 4.45 × 10⁻⁴ |
| | | Group1: Cluster2 | 6.28 × 10⁻¹ |
| | | Group1: Cluster6 | 1.91 × 10⁻⁸ |
| | | Group1: Cluster7 | 8.40 × 10⁻⁵ |
| | | | |
| | | Group1: Cluster0 | 9.60 × 10⁻⁵ |
| | | Group1: Cluster3 | 3.21 × 10⁻¹ |
| | | Group1: Cluster5 | 2.70 × 10⁻¹¹ |
| | | Group1: Cluster9 | 2.21 × 10⁻⁷ |
| | | | |
| | | Group1: Cluster0 | 2.48 × 10⁻³ |
| | | Group1: Cluster4 | 8.20 × 10⁻⁴ |
| | | Group1: Cluster5 | 7.20 × 10⁻¹ |
| | | Group1: Cluster9 | 3.15 × 10⁻⁶ |
| | | | |
| | | Group1: Cluster1 | 2.38 × 10⁻¹² |
| | | Group1: Cluster2 | 1.91 × 10⁻² |
| | | Group1: Cluster5 | 2.98 × 10⁻⁶ |
| | | Group1: Cluster9 | 2.76 × 10⁻⁷ |
| | | | |
| | | Group1: Cluster0 | 1.34 × 10⁻³⁹ |
| | | Group1: Cluster3 | 1.00 × 10⁰ |
| | | Group1: Cluster4 | 1.16 × 10⁻² |
| | | Group1: Cluster9 | 8.70 × 10⁻¹ |
| | | | |
| | | Group1: Cluster0 | 3.65 × 10⁻³⁵ |
| | | Group1: Cluster3 | 7.05 × 10⁻² |
| | | Group1: Cluster4 | 9.36 × 10⁻³ |
| | | Group1: Cluster9 | 1.00 × 10⁰ |
| | | | |
| | | Group1: Cluster0 | 5.36 × 10⁻²⁹ |
| | | Group1: Cluster4 | 3.64 × 10⁻¹ |
| | | Group1: Cluster6 | 3.68 × 10⁻¹ |
| | | Group1: Cluster9 | 1.00 × 10⁰ |
| | | | |
| | | Group1: Cluster0 | 1.39 × 10⁻²⁹ |
| | | Group1: Cluster4 | 5.02 × 10⁻³ |
| | | Group1: Cluster6 | 1.00 × 10⁰ |
| | | Group1: Cluster9 | 3.52 × 10⁻¹ |
| | | | |
| | | Group1: Cluster0 vs. Group2: Rest all clusters | 3.56 × 10⁻⁶ |
| | | Group1: Cluster4 | 1.63 × 10⁻¹⁰ |
| | | Group1: Cluster5 | 1.81 × 10⁻¹ |
| | | Group1: Cluster9 | 1.19 × 10⁻³ |
| | | | |
| | | Group1: Cluster1 | 1.24 × 10⁻⁵ |
| | | Group1: Cluster2 | 5.81 × 10⁻⁵ |
| | | Group1: Cluster6 | 1.68 × 10⁻² |
| | | Group1: Cluster7 | 2.44 × 10⁻¹ |
| | | | |
| | | Group1: Cluster0 | 1.42 × 10⁻³ |
| | | Group1: Cluster4 | 2.90 × 10⁻¹⁴ |
| | | Group1: Cluster5 | 1.98 × 10⁻² |
| | | Group1: Cluster9 | 1.77 × 10⁻³ |
| | | | |
| | | Group1: Cluster0 | 3.64 × 10⁻⁹ |
| | | Group1: Cluster3 | 3.75 × 10⁻⁸ |
| | | Group1: Cluster5 | 9.12 × 10⁻⁵ |
| | | Group1: Cluster9 | 2.26 × 10⁻² |
See Supplementary Table S2 for details.
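The "Frequency" column in the table above counts the clusters in which a marker is reported. A minimal sketch of that counting step follows. It assumes that a marker is "frequent" (overlapping) when it appears in the marker lists of at least two clusters, which is an interpretation the record does not spell out. The helper `frequent_markers`, the `min_clusters` parameter, and the gene names in the toy input are all hypothetical.

```python
from collections import Counter


def frequent_markers(markers_by_cluster, min_clusters=2):
    """Count, for every marker, the number of clusters that list it.

    markers_by_cluster : dict mapping a cluster id to its marker names
    min_clusters       : assumed cut-off for calling a marker "frequent"
    Returns a dict {marker: frequency} for markers at or above the cut-off.
    """
    counts = Counter(
        gene
        for genes in markers_by_cluster.values()
        for gene in set(genes)  # a marker counts once per cluster
    )
    return {gene: n for gene, n in counts.items() if n >= min_clusters}


# Hypothetical toy input: placeholder gene names, three clusters.
toy_markers = {
    0: ["GeneA", "GeneB"],
    1: ["GeneA", "GeneC"],
    3: ["GeneA", "GeneB"],
}
overlapping = frequent_markers(toy_markers)  # {"GeneA": 3, "GeneB": 2}
```

Sorting the returned dictionary by value in descending order would give a ranking comparable in shape to the table, with the most widely shared markers first.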
Top 20 hub genes ranked by degree centrality.
| Gene symbol | Degree | Average shortest path length | Betweenness centrality | Closeness centrality | Clustering coefficient |
|---|---|---|---|---|---|
| | 32 | 1.800 | 0.056 | 0.556 | 0.536 |
| | 32 | 1.861 | 0.065 | 0.537 | 0.566 |
| | 31 | 1.877 | 0.021 | 0.533 | 0.596 |
| | 29 | 1.892 | 0.018 | 0.528 | 0.640 |
| | 29 | 1.923 | 0.012 | 0.520 | 0.662 |
| | 29 | 1.923 | 0.036 | 0.520 | 0.589 |
| | 28 | 1.862 | 0.034 | 0.537 | 0.600 |
| | 28 | 1.661 | 0.099 | 0.602 | 0.587 |
| | 28 | 1.646 | 0.170 | 0.607 | 0.582 |
| | 27 | 1.877 | 0.032 | 0.533 | 0.587 |
| | 25 | 1.985 | 0.008 | 0.504 | 0.740 |
| | 24 | 1.923 | 0.018 | 0.520 | 0.677 |
| | 24 | 2.000 | 0.011 | 0.500 | 0.688 |
| | 23 | 2.015 | 0.005 | 0.496 | 0.794 |
| | 23 | 1.969 | 0.010 | 0.508 | 0.735 |
| | 22 | 1.785 | 0.142 | 0.560 | 0.420 |
| | 22 | 2.046 | 0.004 | 0.489 | 0.770 |
| | 22 | 2.323 | 0.007 | 0.430 | 0.675 |
| | 21 | 2.000 | 0.018 | 0.500 | 0.628 |
| | 21 | 2.338 | 0.008 | 0.428 | 0.652 |
FIGURE 9. Visualization of the hub gene network of strongly correlated frequent markers.
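In the hub gene table, the closeness centrality values match the reciprocal of the average shortest path length (for example, 1 / 1.800 ≈ 0.556). The sketch below shows how degree, average shortest path length, and closeness can be computed for one node with a breadth-first search. It is a plain-Python illustration rather than the Cytoscape computation, it assumes a connected, undirected, unweighted graph with at least two nodes, and the helper `node_metrics` and the toy network are hypothetical.

```python
from collections import deque


def node_metrics(adjacency, node):
    """Return (degree, average shortest path length, closeness) for node.

    adjacency : dict mapping each node to the set of its neighbours
    Closeness is taken as 1 / average shortest path length, which is
    consistent with the values in the hub gene table.
    """
    distance = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives hop counts to all nodes
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    hops = [d for other, d in distance.items() if other != node]
    average_path = sum(hops) / len(hops)
    return len(adjacency[node]), average_path, 1 / average_path


# Hypothetical toy network: one hub connected to three leaves.
toy_network = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
hub_degree, hub_path, hub_closeness = node_metrics(toy_network, "hub")
```

Ranking nodes by the first returned value reproduces a degree-based hub ordering like the one used for the table.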
Top significant KEGG pathways (sorted by FDR).
| KEGG pathway name | #genes | Enriched Adjusted p-value | FDR |
|---|---|---|---|
| | 112 | 7.54 × 10⁻⁴⁰ | 5.17 × 10⁻⁴⁰ |
| | 477 | 3.97 × 10⁻³³ | 1.36 × 10⁻³³ |
| | 124 | 5.34 × 10⁻²⁵ | 1.22 × 10⁻²⁵ |
| | 89 | 3.77 × 10⁻²² | 2.58 × 10⁻²² |
| | 111 | 1.50 × 10⁻²⁰ | 2.05 × 10⁻²¹ |
| | 90 | 2.79 × 10⁻¹⁹ | 3.19 × 10⁻²⁰ |
| | 100 | 1.19 × 10⁻¹⁸ | 1.16 × 10⁻¹⁹ |
| | 86 | 1.14 × 10⁻¹⁴ | 9.74 × 10⁻¹⁶ |
| | 70 | 1.50 × 10⁻¹⁴ | 1.14 × 10⁻¹⁵ |
| | 74 | 9.63 × 10⁻¹³ | 6.57 × 10⁻¹⁴ |
See Supplementary Table S3 for details.
Top significant enriched GO-MF terms (sorted by FDR).
| GO-MF term name | #genes | Enriched Adjusted p-value | FDR |
|---|---|---|---|
| | 538 | 2.94 × 10⁻¹²⁰ | 2.82 × 10⁻¹²⁰ |
| | 158 | 6.77 × 10⁻⁴³ | 3.25 × 10⁻⁴³ |
| | 1046 | 1.15 × 10⁻³⁶ | 3.68 × 10⁻³⁷ |
| | 137 | 2.83 × 10⁻³¹ | 6.29 × 10⁻³² |
| | 285 | 3.28 × 10⁻³¹ | 6.29 × 10⁻³² |
| | 551 | 8.93 × 10⁻²⁸ | 1.43 × 10⁻²⁸ |
| | 143 | 3.90 × 10⁻¹⁵ | 5.34 × 10⁻¹⁶ |
| | 196 | 5.39 × 10⁻¹⁴ | 6.46 × 10⁻¹⁵ |
| | 131 | 7.38 × 10⁻¹³ | 8.61 × 10⁻¹⁴ |
| | 111 | 4.18 × 10⁻¹² | 3.95 × 10⁻¹³ |
See Supplementary Table S6 for details.
Top significant enriched GO-BP terms (sorted by FDR).
| GO-BP term name | #genes | Enriched Adjusted p-value | FDR |
|---|---|---|---|
| | 195 | 1.04 × 10⁻³⁹ | 1.02 × 10⁻³⁹ |
| | 529 | 6.16 × 10⁻²⁸ | 3.02 × 10⁻²⁸ |
| | 241 | 2.04 × 10⁻²³ | 6.69 × 10⁻²⁴ |
| | 93 | 1.04 × 10⁻¹⁷ | 2.55 × 10⁻¹⁸ |
| | 198 | 4.07 × 10⁻¹⁵ | 7.99 × 10⁻¹⁶ |
| | 127 | 6.55 × 10⁻¹⁵ | 9.33 × 10⁻¹⁶ |
| | 104 | 6.66 × 10⁻¹⁵ | 9.33 × 10⁻¹⁶ |
| | 93 | 2.03 × 10⁻¹³ | 2.48 × 10⁻¹⁴ |
| | 155 | 2.15 × 10⁻¹¹ | 2.17 × 10⁻¹² |
| | 156 | 2.22 × 10⁻¹¹ | 2.17 × 10⁻¹² |
See Supplementary Table S4 for details.
Top significant enriched GO-CC terms (sorted by FDR).
| GO-CC term name | #genes | Enriched Adjusted p-value | FDR |
|---|---|---|---|
| | 1045 | 1.09 × 10⁻¹⁷¹ | 9.37 × 10⁻¹⁷² |
| | 632 | 6.43 × 10⁻⁸² | 2.76 × 10⁻⁸² |
| | 209 | 7.92 × 10⁻⁷⁶ | 2.26 × 10⁻⁷⁶ |
| | 1669 | 1.38 × 10⁻⁷⁴ | 2.96 × 10⁻⁷⁵ |
| | 609 | 2.06 × 10⁻⁶⁴ | 3.53 × 10⁻⁶⁵ |
| | 131 | 2.87 × 10⁻⁵¹ | 4.10 × 10⁻⁵² |
| | 1646 | 9.62 × 10⁻⁴⁹ | 1.18 × 10⁻⁴⁹ |
| | 1453 | 2.65 × 10⁻⁴⁷ | 2.84 × 10⁻⁴⁸ |
| | 590 | 2.30 × 10⁻⁴³ | 2.19 × 10⁻⁴⁴ |
| | 192 | 2.92 × 10⁻⁴³ | 2.50 × 10⁻⁴⁴ |
See Supplementary Table S5 for details.
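The enrichment tables above report an adjusted p-value and an FDR for each term, but this record does not state which correction produced either column. As one common way to control the false discovery rate, the sketch below implements the Benjamini-Hochberg adjustment. Treating it as the method behind these columns is an assumption, and `benjamini_hochberg` and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is multiplied by n / rank (rank 1 = smallest), then a
    running minimum from the largest rank downward keeps the adjusted
    values monotone and capped at 1.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical toy p-values for four terms.
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_fdr = benjamini_hochberg(toy_p)  # about [0.02, 0.04, 0.04, 0.02]
```

Terms whose adjusted value falls below a chosen threshold, such as 0.05, would then be reported as significantly enriched.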