Saurav Mallik, Zhongming Zhao.
Abstract
Rapid advances in single-cell RNA sequencing (scRNA-seq) allow gene expression to be measured at single-cell resolution in complex diseases and tissues. Although many methods have been developed to detect cell clusters from scRNA-seq data, the task remains a major challenge. We propose a multi-objective optimization-based fuzzy clustering approach for detecting cell clusters from scRNA-seq data. First, we conducted initial filtering and SCnorm normalization. We then considered a series of case studies with different cluster numbers (cl = 2 to a user-defined number) and applied the fuzzy c-means clustering algorithm to each one separately. For each case study, we evaluated four cluster validity indices: Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (FSI). Next, we set the first index as a minimization objective (↓) and the remaining three as maximization objectives (↑), and applied a multi-objective decision-making technique, TOPSIS, to identify the optimal solution. The case study with the highest TOPSIS score was selected as the final optimal clustering. Finally, we obtained differentially expressed genes (DEGs) using Limma by comparing the expression of the samples in each resultant cluster against the remaining clusters. We applied our approach to a scRNA-seq dataset for the rare intestinal cell type in mice [GEO ID: GSE62270; 23,630 features (genes) and 288 cells]. The optimal clustering result (TOPSIS optimal score = 0.858) comprised two clusters, one with 115 cells and the other with 91 cells. The scores of the four cluster validity indices FSI, PE, PC, and MPC for the optimized fuzzy clustering were 0.482, 0.578, 0.607, and 0.215, respectively. The Limma analysis identified 1240 DEGs (cluster 1 vs. cluster 2). The top ten gene markers were Rps21, Slc5a1, Crip1, Rpl15, Rpl3, Rpl27a, Khk, Rps3a1, Aldob, and Rps17. In this list, Khk (encoding ketohexokinase) is a novel marker for the rare intestinal cell type. In summary, this method is useful for detecting cell clusters from scRNA-seq data.
Keywords: Limma; TOPSIS; cluster validity indices; fuzzy clustering; multi-objective optimization; single cell sequencing
Year: 2019 PMID: 31412637 PMCID: PMC6723724 DOI: 10.3390/genes10080611
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Flowchart of the proposed analysis.
Figure 2. Plots of normalization and fuzzy membership. (A) Count-depth relation plot during SCnorm normalization for the scRNA-seq gene expression data. (B) Boxplot of the fuzzy membership scores of the cells for the two resultant clusters. (C) Fuzzy membership scores of the cells for the two resultant clusters.
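The fuzzy membership scores shown in Figure 2 are the output of fuzzy c-means clustering. As a rough illustration of how such scores arise, the following is a minimal pure-Python sketch of the standard fuzzy c-means updates. It is not the authors' implementation: the function name `fuzzy_c_means`, the fuzzifier `m = 2`, the toy two-dimensional points, and the initial centers are all assumptions made for this example.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, n_iter=100, tol=1e-9):
    """Sketch of standard fuzzy c-means; returns (memberships, centers).

    points  : list of equal-length numeric tuples (one per cell)
    centers : initial cluster centers (hypothetical starting guess)
    m       : fuzzifier (> 1); m = 2 is assumed here
    """
    centers = [tuple(c) for c in centers]
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        memberships = []
        for x in points:
            dists = [math.dist(x, c) for c in centers]
            if 0.0 in dists:
                # The point coincides with a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d_k / d_j) ** power for d_j in dists)
                       for d_k in dists]
            memberships.append(row)
        # Center update: mean of the points weighted by u_ik^m
        new_centers = []
        for k in range(len(centers)):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(len(points[0]))
            ))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships, centers

# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
memberships, centers = fuzzy_c_means(points, centers=[(0.0, 0.0), (5.0, 5.0)])

# Hard assignment: each point goes to the cluster with its largest membership.
labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` sums to 1, and the hard labels are obtained by taking the cluster with the largest membership per cell, which is one common way to turn fuzzy scores into cluster lists.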
The cluster validity scores for the nine case studies from the scRNA-seq gene expression dataset.
| Case Study (CS) ID | cl | FSI ↑ | PE ↓ | PC ↑ | MPC ↑ |
|---|---|---|---|---|---|
| CS1 | 2 | 0.482 | 0.578 | 0.607 | 0.215 |
| CS2 | 3 | 0.543 | 0.886 | 0.482 | 0.224 |
| CS3 | 4 | 0.588 | 0.117 | 0.373 | 0.164 |
| CS4 | 5 | 0.632 | 0.139 | 0.304 | 0.130 |
| CS5 | 6 | 0.333 | 0.157 | 0.246 | 0.095 |
| CS6 | 7 | 0.364 | 0.172 | 0.215 | 0.085 |
| CS7 | 8 | 0.340 | 0.185 | 0.190 | 0.075 |
| CS8 | 9 | 0.328 | 0.197 | 0.165 | 0.061 |
| CS9 | 10 | 0.267 | 0.209 | 0.153 | 0.059 |
↑: maximization index; ↓: minimization index.
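Three of the four validity indices in the table above have short closed forms in terms of the fuzzy membership matrix. The sketch below uses the textbook definitions of PC, PE, and MPC. It assumes the natural logarithm for PE and a hypothetical toy membership matrix `u`, and it omits FSI. The helper names are illustrative, not taken from the authors' code. The last part checks that the PC and MPC values reported in the abstract and table agree with MPC = 1 − cl/(cl − 1) · (1 − PC) up to rounding.

```python
import math

def partition_coefficient(u):
    """PC: mean of squared memberships over cells (higher is crisper)."""
    return sum(x * x for row in u for x in row) / len(u)

def partition_entropy(u):
    """PE: mean entropy of membership rows (lower is crisper).
    The natural logarithm is an assumption of this sketch."""
    return -sum(x * math.log(x) for row in u for x in row if x > 0) / len(u)

def mpc_from_pc(pc, c):
    """MPC rescales PC from [1/c, 1] to [0, 1] for c clusters."""
    return 1.0 - c / (c - 1.0) * (1.0 - pc)

# Hypothetical membership matrix: 4 cells x 2 clusters, rows sum to 1.
u = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]
pc = partition_coefficient(u)
pe = partition_entropy(u)
mpc = mpc_from_pc(pc, c=len(u[0]))

# (cl, PC, MPC) triples copied from the validity-score table (CS1 to CS9).
table_rows = [(2, 0.607, 0.215), (3, 0.482, 0.224), (4, 0.373, 0.164),
              (5, 0.304, 0.130), (6, 0.246, 0.095), (7, 0.215, 0.085),
              (8, 0.190, 0.075), (9, 0.165, 0.061), (10, 0.153, 0.059)]
max_gap = max(abs(mpc_from_pc(p, c) - m) for c, p, m in table_rows)
```

For the reported values, `max_gap` stays within rounding error of three decimal places, so the PC and MPC columns are mutually consistent under this definition.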
TOPSIS optimal scores and optimal ranks for the nine case studies of fuzzy c-means clustering from the scRNA-seq gene expression dataset.
| Case Study (CS) ID | cl | TOPSIS Optimal Score | Optimal Rank |
|---|---|---|---|
| CS1 | 2 | 0.858 | 1 |
| CS2 | 3 | 0.798 | 2 |
| CS3 | 4 | 0.602 | 3 |
| CS4 | 5 | 0.481 | 4 |
| CS5 | 6 | 0.236 | 5 |
| CS6 | 7 | 0.186 | 6 |
| CS7 | 8 | 0.123 | 7 |
| CS8 | 9 | 0.070 | 8 |
| CS9 | 10 | 0 | 9 |
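The ranking step can be illustrated with a generic TOPSIS sketch. It assumes vector normalization and equal criterion weights, neither of which is stated in this record, so it is not expected to reproduce the published scores. The function name `topsis` and the three alternatives are hypothetical.

```python
import math

def topsis(matrix, benefit, weights=None):
    """Generic TOPSIS closeness scores in [0, 1]; higher is better.

    matrix  : rows are alternatives (case studies), columns are criteria
    benefit : True for a maximization criterion, False for minimization
    weights : criterion weights; equal weights are assumed if omitted
    """
    n_crit = len(matrix[0])
    if weights is None:
        weights = [1.0 / n_crit] * n_crit
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix))
             for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)]
         for row in matrix]
    cols = list(zip(*v))
    # Ideal and negative-ideal points depend on each criterion's direction.
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(cols)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(cols)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        total = d_pos + d_neg
        scores.append(d_neg / total if total > 0 else 0.0)
    return scores

# Hypothetical alternatives with criteria ordered as (FSI, PE, PC, MPC).
alternatives = [
    (0.6, 0.2, 0.8, 0.6),  # best on every criterion
    (0.5, 0.5, 0.6, 0.4),  # intermediate
    (0.3, 0.9, 0.4, 0.1),  # worst on every criterion
]
benefit = [True, False, True, True]  # PE is minimized, the rest maximized
scores = topsis(alternatives, benefit)
ranks = [sorted(scores, reverse=True).index(s) + 1 for s in scores]
```

An alternative that coincides with the ideal point scores 1 and one that coincides with the negative-ideal point scores 0, which is how a score of exactly 0 can appear in a TOPSIS ranking.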
Two resultant clusters and their participating cells after optimized fuzzy clustering (cl = 2) from the scRNA-seq gene expression dataset.
| Cluster ID | # Cells | Cell IDs |
|---|---|---|
| Cluster 1 | 115 | I_1, I_2, I_3, I_4, I_5, I_7, I_12, I_13, I_15, I_17, I_20, I_23, I_26, I_27, I_28, I_30, I_32, I_35, I_36, I_39, I_40, I_41, I_42, I_43, I_44, I_45, I_47, I_48, I_49, I_51, I_52, I_53, I_54, I_55, I_56, I_58, I_59, I_61, I_62, I_66, I_67, I_68, I_70, I_71, I_72, I_73, I_75, I_76, I_77, I_79, I_80, I_81, I_86, I_87, I_92, I_93, I_96, II_1, II_3, II_4, II_11, II_17, II_18, II_20, II_24, II_27, II_28, II_31, II_34, II_39, II_40, II_41, II_42, II_44, II_46, II_48, II_56, II_57, II_58, II_66, II_69, II_73, II_74, II_75, II_76, II_79, II_80, II_83, II_87, II_88, II_89, II_95, III_10, III_14, III_16, III_21, III_35, III_36, III_39, III_40, III_45, III_46, III_49, III_51, III_54, III_55, III_56, III_59, III_68, III_74, III_79, III_82, III_84, III_89, III_95 |
| Cluster 2 | 91 | I_6, I_8, I_9, I_10, I_14, I_16, I_19, I_21, I_22, I_24, I_25, I_29, I_37, I_38, I_50, I_57, I_64, I_65, I_69, I_78, I_82, I_83, I_84, I_85, I_88, I_89, I_91, I_94, I_95, II_2, II_5, II_6, II_8, II_9, II_10, II_12, II_13, II_14, II_15, II_19, II_21, II_23, II_26, II_30, II_33, II_36, II_37, II_47, II_51, II_52, II_53, II_54, II_59, II_62, II_63, II_64, II_67, II_68, II_70, II_72, II_77, II_78, II_85, II_93, III_1, III_8, III_17, III_23, III_25, III_28, III_29, III_33, III_34, III_38, III_47, III_48, III_58, III_64, III_66, III_67, III_70, III_71, III_72, III_73, III_75, III_78, III_81, III_83, III_87, III_88, III_91 |
Figure 3. Plots for multi-objective optimization and Principal Component Analysis (PCA). (A) Multi-objective optimization (TOPSIS) score for different cluster numbers (nine case studies: cl = 2 to 10) using fuzzy c-means clustering. (B) The cluster plot (PCA plot) of the optimized fuzzy clustering along with the participating cells.
Figure 4. Plot of rank-wise Bonferroni-adjusted p-values for the differentially expressed genes.
Top ten KEGG pathways enriched with the differentially expressed genes.
| KEGG Pathway | Count | p-value | Bonferroni | Gene Symbols |
|---|---|---|---|---|
| mmu03010:Ribosome | 94 | 3.91 | 1.08 | |
| mmu03040:Spliceosome | 54 | 3.72 | 1.02 | |
| mmu01130:Biosynthesis of antibiotics | 52 | 5.87 | 1.61 | |
| mmu03050:Proteasome | 21 | 3.95 | 1.09 | |
| mmu01200:Carbon metabolism | 33 | 5.49 | 1.51 | |
| mmu00010:Glycolysis/Gluconeogenesis | 23 | 3.32 | 9.13 | |
| mmu01100:Metabolic pathways | 168 | 7.42 | 2.04 | |
| mmu00480:Glutathione metabolism | 20 | 1.51 | 4.16 | |
| mmu05204:Chemical carcinogenesis | 26 | 3.58 | 9.85 | |
| mmu01230:Biosynthesis of amino acids | 22 | 2.30 | 6.33 | |
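The Bonferroni column in enrichment tables of this kind is obtained by multiplying each raw p-value by the number of tests and capping the result at 1. A minimal sketch follows, with hypothetical p-values and a hypothetical helper name `bonferroni`. The number of tests used by the enrichment tool is not stated in this record, so it is left as a parameter.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni adjustment: p * m, capped at 1.

    n_tests defaults to the number of p-values supplied; pass it
    explicitly when only the top results of a larger test family are shown.
    """
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]

# Hypothetical raw p-values for three tested pathways.
raw = [0.001, 0.02, 0.5]
adjusted = bonferroni(raw)
```

Because the multiplier is the same for every test, the adjustment preserves the ordering of the raw p-values.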
Top five Gene Ontology (GO) terms in each GO domain enriched in differentially expressed genes.
| Gene Ontology | Count | p-value | Bonferroni Correction | Gene Symbols |
|---|---|---|---|---|
| GO:BP | 145 | 3.83 | 1.34 | |
| GO:BP | 67 | 2.68 | 9.38 | |
| GO:BP | 75 | 3.36 | 1.18 | |
| GO:BP | 101 | 2.51 | 8.79 | |
| GO:BP | 25 | 1.07 | 3.88 | |
| GO:CC | 151 | 1.35 | 9.24 | |
| GO:CC | 401 | 3.28 | 2.26 | |
| GO:CC | 101 | 3.93 | 2.70 | |
| GO:CC | 50 | 1.34 | 9.19 | |
| GO:CC | 148 | 1.01 | 6.91 | |
| GO:MF | 295 | 5.08 | 6.64 | |
| GO:MF | 104 | 2.15 | 2.81 | |
| GO:MF | 165 | 5.82 | 7.61 | |
| GO:MF | 46 | 6.85 | 8.95 | |
| GO:MF | 64 | 4.66 | 6.09 | |
BP: Biological Process; CC: Cellular Component; MF: Molecular Function.
Evaluation of the top ten gene markers through literature evidence, KEGG pathway and Gene Ontology analyses.
| Gene | Literature Evidence | KEGG Pathway & Gene Ontology Terms (Connected with) | Status |
|---|---|---|---|
| | Biological functions: artificial nucleic acid molecules [ | | Known |
| | Solute carriers [ | | Known |
| | | | Known |
| | Artificial nucleic acid molecules [ | | Known |
| | Artificial nucleic acid molecules [ | | Known |
| | Arterial vasculature [ | | Known |
| | - | | Known |
| | Different biological functions: proteomic analysis [ | | Known |
| | Hepatocellular carcinoma [ | - | Known |
| | - | - | |