Huan Hu1,2,3, Ruiqi Liu4, Chunlin Zhao5, Yuer Lu1, Yichun Xiong6, Lingling Chen1, Jun Jin1,2, Yunlong Ma6, Jianzhong Su3, Zhengquan Yu4, Feng Cheng1, Fangfu Ye3,7,8, Liyu Liu3,9, Qi Zhao10, Jianwei Shuai1,2,3.
Abstract
Simultaneous measurement of multiple modalities in single cells, exemplified by CITE-seq, is a promising approach for linking transcriptional changes to cellular phenotype and function. It calls for new computational methods that define cellular subtypes and states from multiple data types. Here, we design a flexible single-cell multimodal analysis framework, called CITEMO, that integrates transcriptome and antibody-derived tag (ADT) data to capture cell heterogeneity from a multi-omics perspective. CITEMO uses principal component analysis (PCA) to obtain low-dimensional representations of the transcriptome and of the ADT data separately. It then applies PCA again to integrate these low-dimensional multimodal representations for downstream analysis. To evaluate the effectiveness of the CITEMO framework, we apply it to analyse the cell subtypes of cord blood mononuclear cell (CBMC) samples. The results show that CITEMO can comprehensively analyse single-cell multimodal samples and accurately identify cell subtypes. We also find specific immune cells that co-express multiple ADT markers. To better describe this co-expression phenomenon, we introduce the co-expression entropy, which measures how heterogeneously the ADT combinations are distributed. To further validate the robustness of CITEMO, we analyse human bone marrow cell (HBMC) samples and identify different states of the same cell type. CITEMO performs well in identifying cell subtypes and states in multimodal omics data, and we suggest that its flexible design may inspire approaches to other single-cell multimodal tasks. The complete source code and dataset of the CITEMO framework are available at https://github.com/studentiz/CITEMO.
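The two-stage PCA described in the abstract can be illustrated with the minimal NumPy sketch below. This is not the CITEMO implementation, which is in the linked repository. The helper names `pca` and `citemo_like_embedding`, the component counts and the synthetic data are hypothetical. The abstract does not say whether the two modalities are reweighted before the second PCA, so the sketch simply concatenates the per-modality PCs.

```python
import numpy as np

def pca(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project rows (cells) of x onto the top principal components via SVD."""
    centred = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

def citemo_like_embedding(rna, adt, k_rna, k_adt, k_joint):
    """Hypothetical sketch: PCA per modality, concatenate the PCs, then PCA again."""
    rna_pcs = pca(rna, k_rna)
    adt_pcs = pca(adt, k_adt)
    joint = np.hstack([rna_pcs, adt_pcs])  # assumption: plain concatenation
    return rna_pcs, adt_pcs, pca(joint, k_joint)

rng = np.random.default_rng(0)
rna = rng.normal(size=(200, 50))  # synthetic: 200 cells x 50 genes
adt = rng.normal(size=(200, 10))  # synthetic: 200 cells x 10 antibodies
rna_pcs, adt_pcs, joint_pcs = citemo_like_embedding(rna, adt, 20, 5, 8)
print(rna_pcs.shape, adt_pcs.shape, joint_pcs.shape)  # (200, 20) (200, 5) (200, 8)
```

The per-modality representations would drive single-modality clustering, and `joint_pcs` would feed the multimodal clustering and UMAP.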
Keywords: CITE-seq; data integration; immune system; multi-omics; single-cell
Year: 2022 PMID: 35130112 PMCID: PMC8824218 DOI: 10.1080/15476286.2022.2027151
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652
Figure 1. The workflow of the CITEMO framework. The multimodal omics data obtained from the experiment are split into raw transcriptome and raw ADT data. After preliminary quality control, each modality is normalized and then reduced in dimension by PCA separately. The low-dimensional representations of the transcriptome and ADT are used for single-modality clustering. They are also combined and reduced again by PCA for multimodal omics clustering. Finally, the transcriptome, ADT and multimodal omics clusters are all visualized on the UMAP embedding of the multimodal omics data.
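The workflow does not specify the quality-control or normalization methods. As a hedged illustration only, the sketch below assumes three steps: a minimum-detected-genes filter, library-size log normalization for RNA, and a per-cell centred log-ratio (CLR) transform for ADT counts. These are common CITE-seq choices but are not confirmed as CITEMO's. `basic_qc`, `min_genes=200` and the other names are hypothetical.

```python
import numpy as np

def basic_qc(rna_counts, adt_counts, min_genes=200):
    """Keep cells (rows) with at least `min_genes` detected genes (assumed threshold)."""
    keep = (rna_counts > 0).sum(axis=1) >= min_genes
    return rna_counts[keep], adt_counts[keep]

def log_normalize(counts, scale=1e4):
    """Scale each cell to `scale` total counts, then apply log1p."""
    lib = counts.sum(axis=1, keepdims=True)  # > 0 after QC with min_genes >= 1
    return np.log1p(counts / lib * scale)

def clr(counts):
    """Centred log-ratio per cell: log1p(x) minus that cell's mean log1p(x)."""
    logged = np.log1p(counts)
    return logged - logged.mean(axis=1, keepdims=True)

rng = np.random.default_rng(1)
rna_counts = rng.poisson(1.0, size=(100, 500))  # synthetic cells x genes
adt_counts = rng.poisson(20.0, size=(100, 10))  # synthetic cells x antibodies
rna_qc, adt_qc = basic_qc(rna_counts, adt_counts)
rna_norm, adt_norm = log_normalize(rna_qc), clr(adt_qc)
print(rna_norm.shape, adt_norm.shape)
```

The normalized matrices would then be passed to the per-modality PCA step.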
Figure 2. Characterizing heterogeneity in the CBMC sample. (A-C) UMAP visualizations of clustering results, with cell clusters annotated by CITEMO using transcriptome modality (A), ADT modality (B) and multimodal omics (C) data. (D) Violin plots of Hmga2 gene expression in mouse cell clusters. (E) Violin plots of CD56 (top) and CD16 (bottom) ADT abundance in NK cell clusters. In (D) and (E), different background colours indicate which modality CITEMO used for clustering. (F) Left: the NK cell and Monocyte clusters obtained by CITEMO multimodal omics analysis. Right: box plots of ADT abundance across six distinct clusters for the NK cell markers CD56 and CD16, the Monocyte markers CD11c and CD14, and the proliferating marker CD45RA. (G) Left: feature plot of CD45RA ADT abundance on the CITEMO multimodal omics UMAP. Right: density distributions of CD45RA ADT in CD4+ Memory T cells and CD4+ Naïve T cells under the indicated modalities.
Figure 3. Low-dimensional representations of the heterogeneity in the CBMC sample. (A-C) Variance explained by each selected principal component of the transcriptome (A), the ADT (B) and the multimodal omics (C). Variance is estimated with n_samples - 1 degrees of freedom. Arrows mark the number of dimensions retained by the elbow method. (D-F) Heat maps of the average value of each selected principal component within each cell cluster, for clusters defined by the transcriptome (D), ADT (E) and multimodal omics (F). (G-I) Projections of features along the PC1 direction, sorted from smallest to largest PC1 value, for genes in the transcriptome (G), for ADTs (H), and for the transcriptome PCs and ADT PCs in the multimodal omics analysis (I). (J-L) UMAP visualizations of PC1 for the transcriptome (J), ADT (K) and multimodal omics (L). In (J-L), dashed circles mark mouse cell clusters (J) and CD4+ T cell clusters (K&H).
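The quantities in panels (A-F) can be sketched as follows. The per-component variance uses n_samples - 1 degrees of freedom, as the caption states. The caption does not specify which elbow rule was used, so `elbow_index` implements one common heuristic as an assumption: the point farthest from the chord joining the two ends of the curve. All function names are hypothetical.

```python
import numpy as np

def explained_variance(x):
    """Per-PC variance with n_samples - 1 degrees of freedom, plus its ratio."""
    centred = x - x.mean(axis=0)
    s = np.linalg.svd(centred, compute_uv=False)
    var = s**2 / (x.shape[0] - 1)
    return var, var / var.sum()

def elbow_index(curve):
    """Assumed elbow heuristic (not confirmed for CITEMO): the point farthest
    from the chord joining the first and last points, after rescaling both
    axes to [0, 1]. Returns the 1-based position, used as #PCs to keep."""
    y = np.asarray(curve, dtype=float)
    if len(y) < 3 or np.ptp(y) == 0:
        return len(y)
    xn = np.arange(len(y)) / (len(y) - 1)
    yn = (y - y.min()) / np.ptp(y)
    # Chord from (0, yn[0]) to (1, yn[-1]); perpendicular distance via 2-D cross product.
    d = np.array([1.0, yn[-1] - yn[0]])
    d /= np.linalg.norm(d)
    dist = np.abs(xn * d[1] - (yn - yn[0]) * d[0])
    return int(np.argmax(dist)) + 1

def cluster_means(pcs, labels):
    """Mean of each PC within each cluster (rows: clusters, columns: PCs)."""
    clusters = np.unique(labels)
    return clusters, np.vstack([pcs[labels == c].mean(axis=0) for c in clusters])

rng = np.random.default_rng(3)
latent = rng.normal(size=(300, 3))  # synthetic: 3 underlying signals
x = latent @ rng.normal(size=(3, 40)) * 3 + rng.normal(size=(300, 40))
var, ratio = explained_variance(x)
k = elbow_index(ratio[:20])
print(k)
```

Other elbow rules, such as a second-difference or variance-ratio threshold, can pick a slightly different cut-off on the same curve.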
Figure 4. Analysis of potential co-expressing cells in CBMC samples. (A) Feature plot in which coloured dots mark the indicated cell clusters with distinctive co-expression. (B) Heat map of the expression of specific ADTs across cell clusters. (C-G) ADT co-expression of CD4 and CD19 in the T-B conjugates cluster (C), CD8 and CD11c in the CD4+ T/Mono cluster (D), CD56 and CD16 in the CD8+ T/Mono cluster (E), CD8 and CD11c in the CD4+ CD8bright DP T cluster (F), and CD4 and CD8 in the CD4+ CD8dim DP T cluster (G). The entropy (En) value in the top left corner of each panel is the co-expression entropy of that cell cluster. (H) Comparison of how different methods (Pearson correlation, Spearman correlation, Kendall correlation and co-expression entropy) evaluate the ADT combinations for the cell clusters in (C-G).
Figure 5. Multimodal omics analysis of HBMC samples. (A, B) UMAP visualizations of clustering results for the HBMC sample, with cell clusters annotated by CITEMO (A) and Seurat v4 (B). (C) ADT abundance distributions of CD27 (orange) and CD45RA (green) in DNT cells, revealing two distinct states of DNT cells. (D) ADT abundance distributions of CD28 (light green) and CD57 (blue) in CD8+ Memory T cells, revealing two distinct states of CD8+ Memory T cells. (E) ADT abundance distributions of CD27 (orange) and CD45RA (green) in CD8+ Effector T cells, revealing three distinct states of CD8+ Effector T cells. (F) Different abundance distributions of CD27 ADT in CD4+ Memory T cells. (G) Box plots of CD69 ADT abundance in CD8+ Naive T cells and (H) box plots of CD45RA ADT abundance in Treg cells, indicating different cell subtypes. (I) Co-expression of CD3 and CD19 in the Circulating B-T complex, with a co-expression entropy of En = 0.16.