Literature DB >> 35860411

Single-cell entropy network detects the activity of immune cells based on ribosomal protein genes.

Qiqi Jin1,2,3, Chunman Zuo1, Haoyue Cui1,2,3, Lin Li1, Yiwen Yang1,2,3, Hao Dai1, Luonan Chen1,2,4,5.   

Abstract

We developed a new computational method, Single-Cell Entropy Network (SCEN) to analyze single-cell RNA-seq data, which used the information of gene-gene associations to discover new heterogeneity of immune cells as well as identify existing cell types. Based on SCEN, we defined association-entropy (AE) for each cell and each gene through single-cell gene co-expression networks to measure the strength of association between each gene and all other genes at a single-cell resolution. Analyses of public datasets indicated that the AE of ribosomal protein genes (RP genes) varied greatly even in the same cell type of immune cells and the average AE of RP genes of immune cells in each person was significantly associated with the healthy/disease state of this person. Based on existing research and theory, we inferred that the AE of RP genes represented the heterogeneity of ribosomes and reflected the activity of immune cells. We believe SCEN can provide more biological insights into the heterogeneity and diversity of immune cells, especially the change of immune cells in the diseases.
© 2022 The Author(s).

Entities:  

Keywords:  Association-entropy; Immune cell; Ribosomal protein gene; Single-cell RNA-seq; Single-cell entropy network

Year:  2022        PMID: 35860411      PMCID: PMC9287362          DOI: 10.1016/j.csbj.2022.06.056

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   6.155


Introduction

The advances of single-cell transcriptomics help researchers reveal the heterogeneity and functional diversity among cell populations and discover novel cell types [1], [2], [3], [4]. However, existing methods usually focus on the differential expression of genes, but ignore the heterogeneity of gene-gene associations. Theoretically, single-cell RNA sequencing (scRNA-seq) data not only explicitly describe the gene expressions at a single-cell resolution, but also implicitly contain the information on the gene-gene associations [5]. In this work, we propose a new computational method Single-Cell Entropy Network (SCEN) to analyze scRNA-seq data, which constructs a gene association network for each single cell (i.e. one network for one cell), where nodes are genes and edges are gene-gene associations measured by mutual information. Based on SCEN, we can define association-entropy (AE) for each cell to measure the association strength of gene coexpression at single-cell resolution. The higher AE level indicates that the genes are more strongly-associated with each other in a single cell. From dynamical viewpoint, gene expressions are the variables of a nonlinear biological system which change dynamically with time and conditions due to various fluctuations or noises, but their gene associations/regulations are relatively stable [6], [7]. Thus, gene associations can characterize the features of single cell more reliably, which may help us find new cell heterogeneity that is ignored by traditional gene expression analysis. In this study, we found the AE of ribosomal protein genes (RP genes) varied greatly even in the same cell type of immune cells. Ribosomal proteins are the main components of ribosomes, which act as a translational machine in cells related to protein expression. Some studies found the heterogeneity of ribosomes [8], and different ribosomes have different mRNA selectivity and produce different proteins [9], [10], e.g. Rps26-deficient ribosomes preferentially translate the mRNA from stress-response pathways [11]. Based on the definition of AE, we think AE of RP genes can reflect the heterogeneity of ribosomes. High AE of RP genes indicates that these RP genes are strongly associated and represent normal ribosomes, while low AE of RP genes represent defective ribosomes. Furthermore, we found that the average AE of RP genes of immune cells in each person was significantly associated with the healthy/disease state of this person, which validated the biological significance of AE of RP genes. Based on existing research and theory [12], [13], [14], we inferred that the AE of RP genes might reflect the activity of immune cells. Usually, immune cells will be activated by virus infection or other external stimuli to kill infected cells before progeny viruses are released, i.e. the activity of immune cells will rise after infection. This change is rapid and reversible, and is related to the heterogeneity of ribosomes. Thus, as a new method of single-cell sequencing data bioinformatics tool, we believe SCEN can provide more biological insights into the heterogeneity and diversity of immune cells, especially the change of immune cells in the diseases.

Methods

In this paper, we proposed a new method “Single-Cell Entropy Network” (SCEN) to analyze single-cell RNA-seq data (Fig. 1).
Fig. 1

Schematic illustration of SCEN method. (a) First, make a scatter diagram for every two genes based on the gene expression profile comprised of m genes and n cells, where points = cells, and then m genes can produce m*(m −1)/2 scatter diagrams. (b) In the scatter diagram of genes × and y, make a light orange box and a dark orange box near the cell k (red plot) to represent the neighborhood of x and y respectively. The number of plots/cells in the two boxes is n() and n() respectively, and the number of plots/cells in the intersection of the two boxes is n(). Then, calculate MI(x, y) and MI()(x, y). (c) Calculate w()(x, y) that represents the weight of edge x-y in cell-k network, and construct n single-cell networks. (d) Calculate the association entropy (AE) of each gene in each cell, and get an AE matrix. (e) Based on SCEN method, we can discover “dark” genes ignored by original gene expression matrix, characterize network heterogeneity at a single-cell level, and distinguish the active or resting state of immune cells. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

From the gene expression matrix comprised of m genes and n cells, we first make m*(m-1)/2 scatter diagrams, where each scatter diagram includes n points/cells, and the x-axis and y-axis represent the expression levels of two genes in the n cells. Then, from the scatter diagram of genes × and y, we could calculate the mutual information of the two genes based on Eqn. (2). where p(X) and p(Y) are marginal probability distributions of genes × and y respectively, p(X,Y) is their joint probability distribution function, X and Y are expressions of genes × and y, and Ω is the domain of definition. To estimate MI(x, y), we substitute the probability by the frequency numerically. As shown in Fig. 1, we draw light orange box that covers the plot k (or cell k) to represent the neighborhood of x. The width of the box is large just enough to make the box cover 0.1n points (0.05n points on the left of x and 0.05n points on the right of x), and thus the number of points in the neighborhood of x is fixed to 0.1n (n() = 0.1n). Similarly, we draw dark orange box to represent the neighborhood of y, and n() is also equal to 0.1n. Then, if the intersection of the previous two boxes cover n() points, mutual information of genes × and y for all n cells could be approximately calculated by Eqn. (3). Then, we define the contribution of cell k in MI(x, y) by decomposing Eqn. (3) into Eqn. (4). If two variables are correlated in some points and uncorrelated in the others, the former points give positive contributions to MI(x, y) and thus MI()(x, y) > 0, while the latter points give zero contributions or noises to MI(x, y) and thus MI()(x, y) ≤ 0 (Supplementary Note 1). The parameter 0.1 could be substituted by other values, but we find that the result is not sensitive to this parameter in the range of 0.05 to 0.2 based on the test of several public datasets, and 0.1 seems to be the best (Supplementary Note 2). Based on Eqns. (3), (4), we could construct a network for each cell, where nodes = genes and edges = gene-gene associations. By the normalization, the weight of edge x-y in cell-k network with m genes/variables is defined as. Specially, we require w()(x, y) ≥ 0 based on Eqn. (5) to make the weight of each edge non-negative. In addition, we set w()(x, y) = 0 artificially when x = 0 or y = 0, because most zeros in the gene expression matrix come from the technique problems in single-cell RNA-seq, which is meaningless in biology and makes the wrong estimation in gene association analysis. Thus, we construct a single-cell entropy network for each cell k based on Eqn. (5), from scRNA-seq data. At last, by Eqn. (5) we obtain n single-cell networks for the n cells. Based on these networks, we could find the network heterogeneity at a single-cell level. Furthermore, we could define the association-entropy (AE) of gene × in cell k based on Eqn. (6). AE is the degree of gene × in cell-k network. Thus, we can construct an association-entropy matrix (AE matrix) with AE as an element. Clearly, the AE matrix has the same number of rows and columns as the original gene expression matrix whose element is gene expression, and thus any traditional algorithm for gene expression matrix can be adopted for AE matrix, i.e. to analyze cell clustering and pseudotime trajectory. It should be noted that AE is different from the traditional definition of entropy. In this paper, AE represents the summation of mutual information. Intuitively, the higher AE level means this gene in the cell is associated with other genes more strongly due to its higher MI in the single-cell network, so high AE level represents the state of high order and low entropy (traditional definition). As AE quantifies the importance of genes from a network viewpoint, the analyses of AE matrix might help to find more features that is ignored by traditional gene expression analysis. Note that We can similarly construct the direct gene association network or single-cell entropy direct-network in the form of conditional mutual information [1], [2], [3], [4].

Results

SCEN shows better clustering performance

In this paper, we proposed a new method Single-Cell Entropy Network (SCEN) to analyze single-cell RNA-seq data (Fig. 1, Supplementary Note 1, 2). Based on SCEN, we can obtain n single-cell networks for the n cells, and further obtain an association-entropy (AE) matrix that has the same number of rows and columns as the original gene expression matrix. Thus, any traditional scRNA-seq algorithm for gene expression matrix can be adopted for our AE matrix to analyze cell clustering. To validate the performance of SCEN method, we collected seven high-quality scRNA-seq datasets with clear classification labels from literatures [15], [16], [17], [18], [19], [20] and performed cell clustering analysis on both gene expression matrix and AE matrix. By comparing the clustering result with the known classification labels, we could evaluate the clustering performances. As shown in Table 1, obviously our AE matrix is superior to original gene expression matrix on various datasets by various clustering algorithms [21], [22], [23]. In fact, the best result of each dataset is all from our AE matrix, which demonstrates the advantage of SCEN method. More details of clustering analysis are listed in Supplementary Note 3.
Table 1

Clustering performance of gene expression matrix and AE matrix, evaluated by adjusted rand index (ARI). Hierarchical (t-SNE) and k-means (t-SNE) represent that the clustering is performed after dimension-reduction by t-SNE.

BuettnerKolodziejczykDarmanisChu-typeChu-timeKimTrapnell
HierarchicalExpression0.480.490.630.750.670.660.08
AE0.860.990.640.750.770.670.23
k-meansExpression0.310.530.580.730.590.600.14
AE0.670.900.710.750.670.750.42
Hierarchical (t-SNE)Expression0.320.990.670.980.680.660.16
AE0.671.000.820.990.690.950.47
k-means (t-SNE)Expression0.300.990.650.980.690.720.16
AE0.701.000.820.990.690.890.47
SIMLRExpression0.920.990.750.740.660.970.21
AE0.981.000.880.750.700.950.28
SeuratExpression0.270.730.860.950.720.770.17
AE0.620.840.820.760.690.980.51
Schematic illustration of SCEN method. (a) First, make a scatter diagram for every two genes based on the gene expression profile comprised of m genes and n cells, where points = cells, and then m genes can produce m*(m −1)/2 scatter diagrams. (b) In the scatter diagram of genes × and y, make a light orange box and a dark orange box near the cell k (red plot) to represent the neighborhood of x and y respectively. The number of plots/cells in the two boxes is n() and n() respectively, and the number of plots/cells in the intersection of the two boxes is n(). Then, calculate MI(x, y) and MI()(x, y). (c) Calculate w()(x, y) that represents the weight of edge x-y in cell-k network, and construct n single-cell networks. (d) Calculate the association entropy (AE) of each gene in each cell, and get an AE matrix. (e) Based on SCEN method, we can discover “dark” genes ignored by original gene expression matrix, characterize network heterogeneity at a single-cell level, and distinguish the active or resting state of immune cells. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Clustering performance of gene expression matrix and AE matrix, evaluated by adjusted rand index (ARI). Hierarchical (t-SNE) and k-means (t-SNE) represent that the clustering is performed after dimension-reduction by t-SNE.

SCEN discovers new heterogeneity of immune cells marked by ribosomal protein genes

In this work, we performed our SCEN method on a public dataset of 2,700 Peripheral Blood Mononuclear Cells (PBMCs) made by 10X Genomics, which was a sample dataset in Seurat tutorial (https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html). As shown in Fig. 2a, clustering analysis of our AE matrix identifies all known cell types that are also identified by gene expression matrix, but our method detects two new subgroups of cells, RP + Monocyte and RP + T cell by Seurat 3.1 [23]. RP + Monocyte is a subgroup of CD14 + Monocyte, and RP + T cells come from three types of T cells (CD8 + T cells, naïve CD4 + T cells and memory CD4 + T cells) (Supplementary Note 4).
Fig. 2

Performance of SCEN on the PBMC dataset with 2700 cells. (a) Clustering based on AE matrix can identify two new groups of cells, RP + Monocyte and RP + T cell. (b) AE and gene expression of some RP genes. Plots are colored by log(1 + AE) or log(1 + normalized counts). (c) Differential AE genes (DAEGs) and differential expression genes (DEGs) analyses identify 176 “dark” genes (DAEGs but not DEGs) between RP + Monocytes and other CD14 + Monocytes, and 228 “dark” genes between RP + T cells and other T cells. Some translation-related GOs are enriched from these “dark” genes. (d) 4 single-cell entropy networks from 4 single cells with different sources, where the nodes are some RP genes.

Performance of SCEN on the PBMC dataset with 2700 cells. (a) Clustering based on AE matrix can identify two new groups of cells, RP + Monocyte and RP + T cell. (b) AE and gene expression of some RP genes. Plots are colored by log(1 + AE) or log(1 + normalized counts). (c) Differential AE genes (DAEGs) and differential expression genes (DEGs) analyses identify 176 “dark” genes (DAEGs but not DEGs) between RP + Monocytes and other CD14 + Monocytes, and 228 “dark” genes between RP + T cells and other T cells. Some translation-related GOs are enriched from these “dark” genes. (d) 4 single-cell entropy networks from 4 single cells with different sources, where the nodes are some RP genes. Differential expression genes (DEGs) and differential AE genes (DAEGs) analyses indicate that the AE of ribosomal protein genes (RP genes) is much higher in RP + cells than other cells, and thus we marked the two new subgroups with “RP+” (Fig. 2b). As shown in Fig. 2c, the number of DAEGs is much larger than DEGs, and DAEGs cover most DEGs, which implies that the major differences between RP + T cells/Monocytes and other T cells/Monocytes are in AE instead of gene expression. We define the DAEGs that do not belong to DEGs as “dark” genes, i.e. non-differential expression genes but differential AE genes. There are 176 “dark” genes in RP + Monocyte and 228 “dark” genes in RP + T cell. GO analyses show that these “dark” genes are enriched to translation-related GOs including translation initiation, mRNA catabolic process and co-translational protein targeting to membrane (Supplementary Note 5), demonstrating the major differences on the regulation of translation level between RP + cells and other cells. Furthermore, we constructed single-cell networks by our SCEN method. We selected 4 cells from 4 different sources and constructed the networks of RP genes. Fig. 2d clearly demonstrates that the associations of RP genes in RP + cells are much stronger than other cells, which implies that the ribosomes are different between RP + cells and other cells, i.e. the ribosomes in RP + cells are complete while the ribosomes in other cells may be defective comparatively. Besides the dataset of 2,700 PBMCs, we also performed SCEN method on other public datasets including PBMC-68 k dataset [24] (containing about 68,000 PBMCs, NCBI SRA: SRP073767), UCB dataset [25] (containing about 19,000 human umbilical cord blood cells, NCBI BioProject: PRJNA524398) and Melanoma tumor dataset [26] (containing 2,887 immune cells from melanoma tumor, NCBI GEO: GSE72056). As shown in Fig. 3, although the expressions of the RP genes do not show obvious difference, it is clear that the AE of RP genes varied greatly in the same cell type, and RP + cells are universal in the immune cells from various sources.
Fig. 3

The wide existence of RP + cells in (a) Peripheral blood mononuclear cells, (b) Human umbilical cord blood cells, and (c) Immune cells from melanoma tumor tissue. Clustering performance, AE of all RP genes and expression of all RP genes are illustrated.

The wide existence of RP + cells in (a) Peripheral blood mononuclear cells, (b) Human umbilical cord blood cells, and (c) Immune cells from melanoma tumor tissue. Clustering performance, AE of all RP genes and expression of all RP genes are illustrated. To confirm whether RP + cells represent a new cell type or not, we inspect the expression levels of all marker genes of classical cell types, and the results indicate that RP + cells do not express any known marker genes specifically (Supplementary Note 6). We also analyze the differential expression genes of RP + cells, and we cannot find any gene that is only expressed in RP + cells. In addition, we find the cells with low AE of RP genes highly express MHC class I molecules (T cell) and MHC class II molecules (Monocyte). In view of the key role of MHC molecules in immune response and antigen presentation, we do not consider RP + cells as a new cell type, but reflect the active or inactive state of immune cells.

RP + cells are in the initial stage of pseudo-trajectory

In this work, we performed pseudo-trajectory analyses on T cells and Monocytes from the human umbilical cord blood (UCB) dataset [25] (Fig. 4). As shown in Fig. 4a-4d, Monocytes and T cells are re-clustered separately, and trajectory analysis is performed using Monocle2 [27].
Fig. 4

The characteristics of RP + cells in UCB dataset. (a, d) Clustering and pseudo-trajectory results of Monocytes or T cell based on AE matrix, as well as the dynamic change of RP genes along the pseudotime. (b) Clustering of all UCB cells based on AE matrix. (c) AE levels of RP genes. (e) The network among 5 KEGG pathways in 4 different T cell subtypes.

The characteristics of RP + cells in UCB dataset. (a, d) Clustering and pseudo-trajectory results of Monocytes or T cell based on AE matrix, as well as the dynamic change of RP genes along the pseudotime. (b) Clustering of all UCB cells based on AE matrix. (c) AE levels of RP genes. (e) The network among 5 KEGG pathways in 4 different T cell subtypes. We find that the RP + cells are located in the beginning of the trajectory where the AE of RP genes presents a downward trend along the trajectory. In Fig. 4a, for Monocytes, the trajectory starts from RP + Monocytes and ends up with Monocytes highly expressing CD69, also known as a marker of Monocyte activation [28]. For T cells, the trajectory starts from RP + T cells, follows by CD4 TCM and ends up with TEM (Fig. 4d). It has been reported that in naïve T cells, RP genes are highly expressed and the ribosomal protein are preferentially translated for the preparation of the upcoming translation activities in cell proliferation [29]. Hence, we suppose that the RP + cells in our work may be also related to the translation activities. To demonstrate this, we performed enrichment analysis based on the differential AE genes (DAEGs) between the cells on both ends of the trajectory. Besides the translation-related GOs, we find some cytokine-mediated signaling pathways, such as IL-17 signaling pathway and TNF-alpha / NF-kappa B signaling pathway (Supplementary Note 7). This result implies that the RP level of cells may be related to the external stimulus. We also selected five representative pathways and analyze their associations. We find the associations are the strongest in RP + cells, which indicates the relationship between ribosome biogenesis / assembly and T cell activation / differentiation / proliferation (Fig. 4e). In addition, we also performed pseudo-trajectory analyses and GO analyses on PBMC-68 k dataset [24] (Supplementary Note 8) and melanoma tumor dataset [26] (Supplementary Note 9), and similar conclusion can be obtained.

AE of RP genes is associated with diseases and reflects the activity of immune cells

In this work, we collected four public datasets related with diseases, and each dataset contains multiple samples that were taken from healthy donors or patients. For each sample/individual, scRNA-seq produced a gene expression matrix, and we performed SCEN method based on this matrix to produce an AE matrix. Then we calculated RP level (average AE of RP genes) for each sample/individual based on the definition of Eqn. (1). We found RP level is significantly associated with diseases, while AE of RP genes is able to represent the activity of immune cells.where represents the AE of RP gene i in cell j, m is the number of RP genes and n is the number of cells. We first analyzed a COVID-19 dataset that contains 135,600 cells derived from 24 donors (32 samples) (FigShare: ) [30], including 5 samples from healthy people, 14 samples from moderate patients infected by COVID-19 and 13 samples from critical patients. Each sample includes various cell types that can be classified into immune cells and epithelial cells. As shown in Fig. 5a, RP levels of immune cells have the largest values in healthy people, the moderate values in moderate patients, and the smallest values in critical patients. As there are more active immune cells in patients than in healthy people, there seems to be negative correlation between AE of RP genes and the activity of immune cells. In addition, we can find the significant difference of RP level in lung epithelial cells between healthy people and COVID-19 patients. This result is of course as lung epithelial cells are the target of COVID-19.
Fig. 5

The associations between RP levels and diseases: (a) COVID-19, (b) Lung adenocarcinoma (LUAD) (nLN = normal lymph node; mLN = metastatic lymph node; nLung = distant normal lung tissue; tLung = primary tumor tissue; tL/B = tumor tissue from advanced stage LUAD; mBrain = metastatic brain tissue; PE = pleural fluid), (c) Immune compromised (TSCC = immune compromised organ transplant recipients; SCC = immune competent patients with cutaneous squamous cell carcinoma tumors), (d) multiple myeloma (NBM = healthy donors; MGUS = monoclonal gammopathy of undetermined significance; SMM = smoldering myeloma; MM = full-blown multiple myeloma). RP level of each sample was illustrated in box plots (center line = median; box limits = upper (3/4) and lower (1/4) quartiles; whiskers = 1.5 × interquartile range).

The associations between RP levels and diseases: (a) COVID-19, (b) Lung adenocarcinoma (LUAD) (nLN = normal lymph node; mLN = metastatic lymph node; nLung = distant normal lung tissue; tLung = primary tumor tissue; tL/B = tumor tissue from advanced stage LUAD; mBrain = metastatic brain tissue; PE = pleural fluid), (c) Immune compromised (TSCC = immune compromised organ transplant recipients; SCC = immune competent patients with cutaneous squamous cell carcinoma tumors), (d) multiple myeloma (NBM = healthy donors; MGUS = monoclonal gammopathy of undetermined significance; SMM = smoldering myeloma; MM = full-blown multiple myeloma). RP level of each sample was illustrated in box plots (center line = median; box limits = upper (3/4) and lower (1/4) quartiles; whiskers = 1.5 × interquartile range). We next analyzed a public scRNA-seq dataset (GSE131907) that contains 208,506 cells derived from 44 patients pathologically diagnosed with lung adenocarcinoma (LUAD) (58 samples) [31], which cover 7 metastatic lymph node samples (mLN), 10 normal lymph node samples (nLN), 4 lung tumor tissue samples from advanced stage LUAD patients (tL/B), 11 primary tumor tissue samples (tLung), 11 distant normal lung tissue samples (nLung), 5 pleural fluid samples (PE) from LUAD patients with malignant pleural effusion, and 10 metastatic brain tissue samples (mBrain). The extensive single cell profiles include cancer, stromal, and immune cells in the surrounding tumor microenvironments. As shown in Fig. 5b, in T cells, B cells and myeloid cells, RP level in normal lymph nodes is much lower than that in metastatic lymph nodes, suggesting the great influences of cancer cells on immune cells, which is consistent with the existing studies [32]. In cancer patients, immune evasion is a common phenomenon [33], [34], [35], i.e. the immune cells cannot attack tumor cells. Thus, there are less active immune cells in metastatic lymph nodes than normal lymph nodes. And we can still find the negative correlation between AE of RP genes and the activity of immune cells, which is consistent with the previous conclusion in COVID-19 dataset. In addition, in T cells and B cells, the RP levels of tumor lung tissues and distant normal lung tissues are almost the same, and are both significantly higher than those in normal lymph nodes, but similar to those in metastatic lymph nodes. This result suggests that even if the tumor cells have not metastasized, perhaps they have inhibited the activity of immune cells in a wide range. The third dataset (GSE145328) is from a study that reveals T-cell landscape differences between cutaneous squamous cell carcinoma tumors in immune competent (SCC in IC) and immune-compromised organ transplant recipients (TSCC in OTR) [36]. This dataset contained 32,738 cells derived from 5 SCC samples and 6 TSCC samples. As shown in Fig. 5c, RP levels in TSCC samples from immune-compromised patients are significantly higher than those in SCC samples from immune competent patients. This result is also consistent with the previous conclusion that there is negative correlation between AE of RP genes and the activity of immune cells. Fig. 5d illustrates the change of RP level during the progression of multiple myeloma (MM) [37], which is a cancer derived from B cells. The dataset (GSE124310) contains 11,450 immune cells from varying stages of MM progression: from healthy stage (NBM, 3 samples), to precursor stages: monoclonal gammopathy of undetermined significance (MGUS, 4 samples) and smoldering myeloma (SMM, 5 samples), to full-blown MM (6 samples). We can see that the RP level is associated with the progress of MM. RP levels are the highest from the cells in healthy donors, is moderate from the precursor stages MGUS and SMM, and is the lowest from full-blown MM. This result demonstrates the association between disease progression and AE of RP genes again, and gives an insight into the early immune changes of multiple myeloma. In summary, the average AE of RP genes of immune cells in each person is significantly associated with the healthy/disease state of this person, which validated the biological significance of AE of RP genes. Especially, we find the negative correlation between AE of RP genes and the activity of immune cells. It seems that high AE of RP genes (represent normal ribosomes) is associated with the inactive or resting state of immune cells, and low AE of RP genes (represent defective ribosomes) is associated with the active state. This inference is consistent with DRiPs hypothesis [13], [14]. More statistical details of these results are provided in Supplementary Note 10.

Discussion

Traditional scRNA-seq analyses are based on gene expression profiles, which aim to find new cell types with unique functions. In this paper, we present a new method SCEN to analyze scRNA-seq data based on gene-gene associations. From a dynamical viewpoint, gene associations can more reliably and stably characterize the feature/state of each cell, which may help us find the cell diversity that is ignored by traditional gene expression analysis. Based on SCEN, we define association-entropy (AE) for each cell to measure the association strength of gene coexpression at single-cell resolution, and we find the AE of ribosomal protein genes (RP genes) varied greatly even in the same cell type of immune cells. In order to rule out the possibility that the changes of RP genes are randomly arisen from the use of particular technological platforms, we further explored the relationship between the AE of RP genes and diseases. We found the average AE of RP genes of immune cells in each person is significantly associated with the healthy/disease state of this person, while AE of RP genes is not associated with the sex or age of donors. Thus, the possibility that technical artifacts lead to the changes of RP genes is considered to be quite low. In traditional opinion, ribosomes act as a translation machine in cells related to protein expression, in which ribosomal proteins (RP) and ribosomal RNA (rRNA) are the main components. However, more and more studies have found that ribosomes are also involved in DNA repair, regulation of cell development and stress response [38], [39], [40]. Moreover, the heterogeneity of ribosomes has been confirmed. Genuth et al. [8] used the state-of-the-art quantitative mass spectrometry measurements to measure the absolute abundance of 15 of the 80 core RPs in mouse embryonic stem cells, and found that 4 of the 15 RPs existed in only 60–70% ribosomes. This result indicates that ribosomes are different depending on the composition of RPs, and some ribosomes are not complete but deficient. In fact, even small changes of RPs could have large effects on translation given that millions of ribosomes are present in a single cell. Further studies found that different ribosomes have different physiological functions, depending on mRNA selectivity [9], [10]. For example, Ferretti et al. [11] found that Rps26 contributes to mRNA-specific translation by recognition of the Kozak sequence in well-translated mRNAs, and Rps26-deficient ribosomes preferentially translate the mRNA from stress-response pathways. This result confirms the relationship between RP composition and protein translation. Therefore, we infer that AE of RP genes represents the heterogeneity of ribosomes based on existing research and theory. The relationship between immune response and ribosomes has been concerned for twenty years [13]. Kinetic assays indicated that CD8 + T cells are activated only one hour after adding influenza viruses to cells [41], [42]. This rapid response is essential to kill infected cells before progeny viruses are released. However, from the kinetics of protein translation, cells are impossible to produce sufficient level of viral peptides and present on the cell surface to activate T cells [43], [44], so there must be a special mechanism for rapid immune response. Many studies have confirmed a unique translation mechanism, including CUG and other non AUG codes can be used to initiate translation of antigenic peptides [45]; CUG initiation of antigenic peptides uses eIF2A in place of eIF2 as translation initial factor [46]; the CUG initiation pathway is activated by pro-inflammatory stimuli, including influenza A virus (IAV) infection [47]. Especially, Lee et al. [12] demonstrated special relevance of ribosome heterogeneity to viral infections by knocking down each RP in cultured cells: 61 RPs were required for cell viability, while 8 RPs were required for vesicular stomatitis virus infection but not cell viability. Yewdell [13], [14] proposed a hypothesis of immunoribosomes, which suggested that antigenic peptides derived from defective ribosomal products (DRiPs), i.e. there is a special ribosome that is particularly good at synthesizing antigens. Recently, with the advent of ribosome analysis data, DRiPs has been recognized as an important source of tumor-specific antigens [48], and plays a critical role in immunesurveillance of viruses [49]. The non-canonical translation of mRNA by ribosomes is the major cause of DRiPs. Wei et al. knocked down 80 ribosomal proteins separately and found that 14 ribosomal proteins can regulate specific peptides or alter MHC class I expression. Knockdown of RPS28 increased the total number of peptides in cells by promoting DRiPs synthesis from non-canonical translations of “untranslated” regions (UTRs) and non-AUG start codons in mRNA, and enhanced the ability of CD8 + T cells to kill melanoma cells [50]. These results demonstrated that ribosomal protein mutations are closely related to DRiPs synthesis. Patel et al. found that RPL23-pertubed cells can resist T-cell mediated lysis, demonstrating that RPL23 negatively regulates CD8 + T cells to kill melanoma cells [51]. These previous findings support the relevance of ribosomal proteins to immunesurveillance. In this study, we clearly illustrate the negative correlation between AE of RP genes and the activity of various immune cells, which implies that the deficient ribosomes (low AE of RP genes) are more enriched in active immune cells. This result is consistent with the DRiPs hypothesis and provides new evidence for the relationship between ribosome heterogeneity and immune response. In addition, our analysis also found that there was no significant difference on transcriptomic level between the immune cells of high AE of RP genes and low AE of RP genes. This result is also consistent with the experimental evidence that knockdown of several RPs had minor effects on the transcriptome, which suggests that the effects of RPs on antigen presentation may arise directly from the affected ribosomes rather than the downstream “ribosomal stress” effects [52]. Generally, it is hard to measure gene expression and detect the activity of immune cells simultaneously at a single-cell level by current experimental technology, but our network-based method opens a new way, which is only based on scRNA-seq data. As the activity of immune cells is strongly associated with many diseases, our method provides a new tool to discover the change of immune cells during the disease progression. Clinically, human immunotherapy urgently requires more detailed characterization of immune cell response at the level of target peptides. As a new bioinformatics tool for single-cell RNA-seq data, SCEN method can predict tumor-specific ribosomal protein alterations for individual patients, which provides a new perspective to the improvement of immunotherapy.

Conclusions

In this paper, we present a new network-based method SCEN for scRNA-seq analysis. SCEN can identify all known cell types that have been discovered by traditional gene-expression-based analysis. Furthermore, SCEN can detect the activity of immune cells based on AE of RP genes that characterizes the heterogeneity of ribosomes and implicitly reflects the differences at the protein level. Analyses of several public datasets on immune cells indicate that the average AE of RP genes for each person is significantly associated with the healthy/disease state of this person, and AE of RP genes is negatively correlated with the activity of immune cells. We believe that SCEN method provides a new bioinformatics tool to measure gene expression and detect the activity of immune cells simultaneously at single-cell level. In addition to single-cell data, clearly SCEN can be directly applied to bulk tissue data to construct single-sample gene entropy/association network for network biomarker analysis [53], [54] and dynamic network biomarker (DNB) analysis [55], [56], [57], [58], [59]. And SCEN method may provide more biological insights into the heterogeneity of single cells and discover further other biological processes beyond gene expression.

Availability

More details of our SCEN method are illustrated in the Supplementary Information. The source code in this work is available in the Hao Dai’s repository: , which can be only used for learning and communication.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  57 in total

1.  The DRiP hypothesis decennial: support, controversy, refinement and extension.

Authors:  Jonathan W Yewdell; Christopher V Nicchitta
Journal:  Trends Immunol       Date:  2006-07-11       Impact factor: 16.687

Review 2.  Defective ribosomal products (DRiPs): a major source of antigenic peptides for MHC class I molecules?

Authors:  J W Yewdell; L C Antón; J R Bennink
Journal:  J Immunol       Date:  1996-09-01       Impact factor: 5.422

3.  Major histocompatibility class I molecules can present cryptic translation products to T-cells.

Authors:  N Shastri; V Nguyen; F Gonzalez
Journal:  J Biol Chem       Date:  1995-01-20       Impact factor: 5.157

Review 4.  Intracellular protein degradation in mammalian and bacterial cells: Part 2.

Authors:  A L Goldberg; A C St John
Journal:  Annu Rev Biochem       Date:  1976       Impact factor: 23.643

5.  A ribosome-specialized translation initiation pathway is required for cap-dependent translation of vesicular stomatitis virus mRNAs.

Authors:  Amy Si-Ying Lee; Rebeca Burdeinick-Kerr; Sean P J Whelan
Journal:  Proc Natl Acad Sci U S A       Date:  2012-11-19       Impact factor: 11.205

6.  Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation.

Authors:  Aleksandra A Kolodziejczyk; Jong Kyoung Kim; Jason C H Tsang; Tomislav Ilicic; Johan Henriksson; Kedar N Natarajan; Alex C Tuck; Xuefei Gao; Marc Bühler; Pentao Liu; John C Marioni; Sarah A Teichmann
Journal:  Cell Stem Cell       Date:  2015-10-01       Impact factor: 24.633

7.  Ribosome biogenesis during cell cycle arrest fuels EMT in development and disease.

Authors:  Varsha Prakash; Brittany B Carson; Jennifer M Feenstra; Randall A Dass; Petra Sekyrova; Ayuko Hoshino; Julian Petersen; Yuan Guo; Matthew M Parks; Chad M Kurylo; Jake E Batchelder; Kristian Haller; Ayako Hashimoto; Helene Rundqivst; John S Condeelis; C David Allis; Denis Drygin; M Angela Nieto; Michael Andäng; Piergiorgio Percipalle; Jonas Bergh; Igor Adameyko; Ann-Kristin Östlund Farrants; Johan Hartman; David Lyden; Kristian Pietras; Scott C Blanchard; C Theresa Vincent
Journal:  Nat Commun       Date:  2019-05-08       Impact factor: 14.919

8.  Circulating Gasdermin-D in Critically Ill Patients.

Authors:  Elie Homsy; Srabani Das; Paul Consiglio; Corynn McAtee; Angela Zachman; Haikady Nagaraja; Mark D Wewers; Matthew C Exline; Rama K Mallampalli; Anasuya Sarkar
Journal:  Crit Care Explor       Date:  2019-09-17

9.  Hunt for the tipping point during endocrine resistance process in breast cancer by dynamic network biomarkers.

Authors:  Rui Liu; Jinzeng Wang; Masao Ukai; Ki Sewon; Pei Chen; Yutaka Suzuki; Haiyun Wang; Kazuyuki Aihara; Mariko Okada-Hatakeyama; Luonan Chen
Journal:  J Mol Cell Biol       Date:  2019-08-19       Impact factor: 6.216

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.