Literature DB >> 34054849

A Cancer-Specific Qualitative Method for Estimating the Proportion of Tumor-Infiltrating Immune Cells.

Huiting Xiao1, Jiashuai Zhang1, Kai Wang1,2, Kai Song1, Hailong Zheng1, Jing Yang1, Keru Li1, Rongqiang Yuan1, Wenyuan Zhao1, Yang Hui2.   

Abstract

Tumor-infiltrating immune cells are important components in the tumor microenvironment (TME) and different types of these cells exert different effects on tumor development and progression; these effects depend upon the type of cancer involved. Several methods have been developed for estimating the proportion of immune cells using bulk transcriptome data. However, there is a distinct lack of methods that are capable of predicting the immune contexture in specific types of cancer. Furthermore, the existing methods are based on absolute gene expression and are susceptible to experimental batch effects, thus resulting in incomparability across different datasets. In this study, we considered two common neoplasms as examples (colorectal cancer [CRC] and melanoma) and introduced the Tumor-infiltrating Immune Cell Proportion Estimator (TICPE), a cancer-specific qualitative method for estimating the proportion of tumor-infiltrating immune cells. The TICPE was based on the relative expression orderings (REOs) of gene pairs within a sample and is notably insensitive to batch effects. Performance evaluation using public expression data with mRNA mixtures, single-cell RNA-Seq (scRNA-Seq) data, immunohistochemistry data, and simulated bulk RNA-seq samples, indicated that the TICPE can estimate the proportion of immune cells with levels of accuracy that are clearly superior to other methods. Furthermore, we showed that the TICPE could effectively detect prognostic signals in patients with tumors and changes in the fractions of immune cells during immunotherapy in melanoma. In conclusion, our work presented a unique novel method, TICPE, to estimate the proportion of immune cells in specific cancer types and explore the effect of the infiltration of immune cells on the efficacy of immunotherapy and the prognosis of cancer. The source code for TICPE is available at https://github.com/huitingxiao/TICPE.
Copyright © 2021 Xiao, Zhang, Wang, Song, Zheng, Yang, Li, Yuan, Zhao and Hui.

Entities:  

Keywords:  immunotherapy; prognosis; relative expression orderings; signature genes; tumor microenvironment

Year:  2021        PMID: 34054849      PMCID: PMC8160514          DOI: 10.3389/fimmu.2021.672031

Source DB:  PubMed          Journal:  Front Immunol        ISSN: 1664-3224            Impact factor:   7.561


Introduction

Immune cells are critical components in the complex tumor environment (TME). Tumor-infiltrating immune cells (TIICs) can have either tumor-promoting or tumor-suppressive effects on tumor development and progression, depending on the specific type of cancer involved (1). The types and densities of TIICs not only have predictive value in patient survival, they also affect tumor responses to therapy, particularly immunotherapy (2, 3). For instance, an increase in CD8+ T cells is generally associated with improved clinical outcomes, whereas regulatory T cells (Tregs) and tumor-associated macrophages (TAMs) are often associated with a poor prognosis (4, 5). In addition, immune checkpoint blockade (ICB) antibodies reinvigorate anti-tumor immunotherapy responses by disrupting co-inhibitory T-cell signaling, a pathway that has demonstrated clinical activity in several malignancies (6). Evidence has shown that CD4+ and CD8+ memory T cell subsets, as well as NK cell subsets, correlated with a clinical response to immunotherapy in patients with melanoma (7, 8). As such, an assessment of TIICs is of critical importance in biomedical research as well as clinical pathology (9). Previous studies concerning alterations in the composition of immune cells in human cancers have predominantly relied on immunohistochemistry (IHC) or flow cytometry. However, these techniques are compromised by the limited set of available molecular markers and are cumbersome to apply to large panels of tumors; furthermore, in the case flow cytometry, fresh or frozen tissue is required (10, 11). An abundance of transcriptomics data provide an ideal resource for large-scale immune landscape analysis and have been used to develop many computational methods that have been mainly classified into two categories: deconvolution-based approaches and methods that are based on marker genes (12). The deconvolution methods, which include CIBERSORT (13), TIMER (14), EPIC (15), and quanTIseq (16), estimate the cell fractions leveraging on a reference matrix composed of representative expression signatures for specific immune cells. Techniques that are based on marker genes, including MCP-counter (17), xCell (18), and ImmuCellAI (19), utilize a list of genes characterized for each immune cell type to compute an enrichment score and allow for inter-sample comparisons of the same immune cell type. However, these methods have been developed for the enumeration of immune cells from bulk transcriptome data from multiple cancer types that masked inter-tumor heterogeneity between different tumor types; this would affect accuracy, at least to some extent (20). In addition, all of these methods were based on absolute gene expression, thus resulting in incomparability across different datasets. Some of these techniques require data normalization, a process that is susceptible to experimental batch effects and can even distort real biological signals (21). In contrast, our research team has proven that qualitative information derived from relative gene expression is highly robust with regards to batch effects and does not necessarily require normalization (22, 23). It is therefore imperative to develop a cancer-specific qualitative method to estimate the proportion of tumor-infiltrating immune cells. In this study, we considered colorectal cancer and melanoma as examples and constructed Tumor-infiltrating Immune Cell Proportion Estimator (TICPE), a qualitative method based on the relative expression orderings (REOs) of gene pairs within a sample, to estimate the proportion of immune cells in a TME. These cell proportions could then be used to directly compare the proportion of the corresponding immune cells across samples within a cohort or different cohorts. TICPE was extensively validated in human solid tumors via publicly available IHC data, mRNA mixtures, single-cell RNA-Seq (scRNA-Seq) data from colorectal and melanoma tumors, and simulated bulk samples. Moreover, the immune cell proportions estimated by TICPE could be used for prognostic analysis and associated with treatment status and the efficacy of immunotherapy response to melanoma.

Materials and Methods

Dataset Preparation

We downloaded gene expression datasets from the Gene Expression Omnibus (GEO, http://cancergenome.nih.gov/) and RNA sequencing data from The Cancer Genome Atlas (TCGA) by the University of California Santa Cruz (UCSC) Xena website (https://xena.ucsc.edu/). The processed gene expression profiles of 97 datasets were divided into three sections (see ). Eighty-one of these datasets contained eight types of human immune cells and cancer cell lines; normal samples were used to generate signature genes and develop the TICPE. Five datasets were used for to assess the performance of the TICPE. Dataset 1 was derived from an in vitro RNA mixture experiment, GEO accession GSE64385. These mixtures contained different immune populations that were purified from the peripheral blood from healthy donors with variable concentrations and were further diluted in a fixed amount of a solution containing mRNA extracted from HCT-116, a CRC cell line ( ). Dataset 2 contained a large series of 566 CRC tumors and 19 non-tumoral colorectal mucosas, GEO accession GSE39582. Of the 566 tumors, 33 patients also had immunohistochemistry data relating to CD3, CD8, and CD68 (Aurélien de Reyniès, Personal Communication). The other three datasets (Accession numbers GSE146771, GSE115978, and GSE72056) were scRNA-Seq data, and corresponded to 10 colon cancer samples, and 31 and 19 patients with melanoma, respectively (see ). The remaining datasets that were associated with clinical information were used to investigate prognosis and response to immunotherapy. The response categories of the melanoma patients were defined by the RECIST classification scheme (Response Evaluation Criteria in Solid Tumors) as a complete response (CR) and partial response (PR) for responders, or stable disease (SD) and progressive disease (PD) for non-responders (24). For the data downloaded from GEO, we mapped the probe ID to the Entrez gene ID using the corresponding platform annotation file. Data were discarded if a probe had no or multiple corresponding Entrez gene IDs. If multiple probes shared the same Entrez gene ID, then the arithmetic mean of the expression values of these probes was used as the final expression value of the gene. For the RNA-Seq data, profiles of fragments per kilobase million (FPKM) were directly downloaded from the TCGA. For scRNA-Seq data, the reconstructed bulk samples from each donor were identified by aggregating expression profiles from all cell barcodes of the given donor. The cell ratio per cell type in a donor was then calculated by the cell number of a specific cell type divided by the total number of cells (19).

Marker Gene Preparation

For each immune cell type, we integrated a list of marker genes obtained from the literature and other analytical methods, such as xCell and MCP-counter. Most of these were overexpressed relative to other immune cells, and a total of 2,034 marker genes were acquired (see ).

Highly Stable Pairs in Cancer Cell Lines

For each cancer cell, pairwise comparisons were performed for the expression level of all genes. For each gene pair (G, G), with only two possible REO outcomes (the gene expression of G > G or G < G), we retained the gene pair with a certain REO (G > G or G < G) in at least 99% cancer cells, defined as a highly stable gene pair (SPairs).

The TICPE Development Pipeline

This cancer-specific method can be used for a variety of cancer types. Here, we took colorectal cancer as an example to describe the process in detail, the flowchart for TICPE is described in .
Figure 1

The pipeline of the TICPE algorithm. (A) Summary of the data sources used in the study to develop the TICPE. (B) The pipeline of the TICPE algorithm. RankComp algorithm was used to identify robust signature genes compared with cancer cells for each type of immune cell from the known marker genes. The upregulated score was the reversal significance of all signature genes corresponding to the cell type. Using the simulated model for each cell type, we derived a transformation pipeline for the scores. For each queried sample, calculated upregulated scores and transformed them to estimated cell proportions using learned parameters.

The pipeline of the TICPE algorithm. (A) Summary of the data sources used in the study to develop the TICPE. (B) The pipeline of the TICPE algorithm. RankComp algorithm was used to identify robust signature genes compared with cancer cells for each type of immune cell from the known marker genes. The upregulated score was the reversal significance of all signature genes corresponding to the cell type. Using the simulated model for each cell type, we derived a transformation pipeline for the scores. For each queried sample, calculated upregulated scores and transformed them to estimated cell proportions using learned parameters.

Identifying Signature Genes Compared With CRC Cells

Since cell-type-specific signatures can vary depending on cancer type, there is a need to identify cancer-type specific marker genes for each immune cell type. We take immune cell type A with n cells as an example to illustrate the process that can be used to identify the corresponding signature genes among reported marker genes. Firstly, we used the RankComp algorithm (25) to identify individual-level differentially upregulated marker genes (up-DEGs) for each immune cell compared with the SPairs in the CRC cell lines. The p-values of RankComp were adjusted using the Benjamini and Hochberg method (26). Secondly, the cumulative binomial distribution model was used to identify up-DEGs shared by a non-random high proportion of samples and the P value determined whether a marker gene was differentially upregulated at the population level. Then, the P values were also adjusted for multiple testing to control the false discovery rate (FDR). The significance was calculated as shown in Equation (1). Equation (1): In Equation (1), P represents the probability of observing a marker gene being differentially upregulated in a sample by chance (P = 0.5), n and k represent the total number of samples of the immune cell type A and the number of samples with the marker gene being differentially upregulated, respectively. Next, when a marker gene’s adjusted P-value was <0.05, it would be reserved. We finally removed the up-DEGs included in more than one type of immune cell to reduce the dependencies between closely related cell types, and the remained up-DEGs were defined as signature genes (Step 1).

Calculating Upregulated Scores Based on Signature Genes

Based on the signature genes for each immune cell type, we were able to compute the cell infiltration scores for each sample. However, the scores had different distributions between different signature genes and could not thus be compared across immune cell types in a sample. Thus, for each cell type, we conducted a simulated model using the immune cell (cell A) with an additional “control” cell type (a sample of normal colon) and a variety of CRC cell lines. Different types of CRC cell lines were used to reflect the heterogeneity of patients with the same cancer. For the simulated models, batch effects among the three types of dataset were removed using Combat (27). Then, we generated such simulations by using the median expression profile of the merged profile composed of three cell types: 60% of the CRC cell, X% of cell A, and 40–X% control (28). X% represents an arithmetic sequence with a range of 0.8 to 25.6% and an interval of 0.8%. We used this range because these interesting cell types had low fractions in the TME (18). Taking the SPairs of CRC cell lines as the background, we calculated the cell infiltration scores of the simulations using m signature genes {G, G,… G}, as described below. We were able to calculate the numbers of gene pairs (the gene pair was constructed by G and G) belonging to SPairs with ordering patterns (G > G) and (G < G) in cancer cells, which were denoted a and b. Similarly, c and d denoted the corresponding numbers of gene pairs with ordering patterns (G > G) and (G < G) in a simulation sample. When simulations involved an increasing proportion of immune cells, there were more stable pairs with the ordering patterns (G > G). Using Fisher’s exact test, we were able to determine the degree of cell infiltration by calculating the reversal significance, also known as the upregulated score (UpScore) for each simulation sample (Step 2).

Transforming UpScores to Estimate Cell Proportions

We designed a transformation pipeline for the UpScores of each cell type to enable the estimated proportions to be compared across cell types, and not just across samples. A simulation containing 0.8% of immune cells was considered to barely result in reversal significance. Therefore, for the simulated model of cell A, we first shifted the UpScores to 0 using the minimal UpScore (which corresponded to the simulation containing 0.8% of cell A) and fitted a power function to the UpScores that corresponded to proportions of 0.8 to 25.6%. The transformed parameters (V and V) were acquired by Equation (2). For each immune cell type, we could get a pair of transformed parameters. Equation (2): In Equation (2), F represents the proportions of 0.8 to 25.6% and S represents the corresponding UpScores of cell A. It was recommended that an expression dataset should contain as many signature genes as possible. The UpScores of different immune cell types were calculated based on their different signature genes. Subsequently, using the parameters corresponding to immune cells, the UpScores were transformed into estimated cell proportions for each sample (Step 3).

The Generation of Simulated Data

We simulated bulk RNA-seq data with different tumor purity values and immune infiltrates by mixing malignant cells with different immune cells from a scRNA-seq dataset. There were 100 simulated samples and each of these was composed of 1,000 cells that were randomly selected, as follows: (i) Cancer cells form the majority of a simulated samples, a fraction f of them was constrained to the interval [0.5, 0.99], and the remaining fraction 1 − f was randomly assigned to the other immune cell types; (ii) the fraction was multiplied by 1,000 to obtain cell counts for different cell types; (iii) the corresponding number of cells was randomly selected from the single cell dataset. If one cell type was available from the scRNA-seq dataset with only a few cells available, then, the same single cell sample would be selected multiple times for the artificial bulk sample.

Performance Assessment of the TICPE

The performance of the TICPE was evaluated using both microarray and RNA-Seq datasets and compared with that of previously published methods (CIBERSORT, EPIC, xCell, MCP-counter, and ImmuCellAI). For a given immune cell type, the accuracy and sensitivity of each method were measured using Pearson’s correlation between the results of in silico methods and the true proportions, as measured by immunohistochemistry or scRNA-Seq. Furthermore, we introduced a correlation deviation (19) for all cell types to measure the global performance of each method; this strategy took the sample size and overall accuracy into consideration. A smaller correlation deviation might suggest that the predicted cell fractions agree better with the true composition, as shown in Equation (3). Equation (3): In Equation (3), n represents the number of immune cell types detected in samples and r represents the Pearson correlation coefficient of immune cell type i.

Statistical Analysis

The correlation between estimated proportion and true composition was evaluated by Pearson’s correlation. The ROC analysis was performed to assess the validity of the TICPE and was completed by pROC R Package. The statistical significance of comparisons between two groups or more than two groups was estimated by the Wilcoxon rank-sum test or the Kruskal–Wallis test, respectively. The overall survival curves were estimated by the Kaplan–Meier method, and the differences between survival distributions were evaluated by the two-sided log-rank test (29). The Venn diagram was used to analyze the signature genes were different between CRC and melanoma by ggvenn R Package. All statistical analyses were performed using R program (version 4.0.2). P-values were two-sided, and P <0.05 was considered to be statistically significant (4).

Results

Development of the TICPE Algorithm

We designed a method, called TICPE, to estimate the proportions of eight important tumor-infiltrating immune cells (B cells, CD4+ T cells, CD8+ T cells, dendritic cells (DC), monocytes, macrophages, natural killer (NK) cells, and neutrophils) in a specify type of cancer. We integrated marker genes for these different cell types from publications and obtained expression profiles for immune cells, cancer cell lines, and normal tissues, from the GEO database ( ; ). Taking colorectal cancer as an example, a three-step strategy of the core algorithm of TICPE is shown in ; the detailed algorithm is described in the Materials and Methods section. Since the marker genes were screened relative to other immune cells, but not to tumor cells, the genes for a specific type of cancer needed to be filtered. We applied the RankComp algorithm to detect overexpressed marker genes in a given immune cell compared with CRC cells (FDR <5%). We then created a specific gene set from overexpressed marker genes as signature genes; this included 218 genes from the eight immune cell types (Binomial test, FDR <5%) ( ) (Step 1). We were then able to compute the cell infiltration scores for each sample based on the signature genes (Fisher’s exact test) (Step 2). However, the scores exhibited different distributions between different signature genes and could not therefore be compared across cell types in a given sample. For each immune cell type, we thus conducted a simulated model using the immune cells, CRC cells, and normal colon samples, and calculated the UpScores based on signature genes. Next, we designed a transformation pipeline for the UpScores and acquired a pair of transformed parameters for each cell type. For CRC samples, we were able to calculate the UpScores for each immune cell type and transform these into the estimated cell proportions with the acquired parameters corresponding to immune cells (Step 3).
Table 1

Datasets used in developing TICPE for colorectal cancer.

Cell TypeAccessionSamples#Marker Gene#
CRC cellsGSE11618, GSE13059, GSE110425, GSE14103, GSE16648, GSE122985, GSE18560, GSE24795, GSE115716, GSE35566, GSE55624, GSE59196, GSE63252, GSE112282, GSE50841, GSE116528, GSE90085, GSE59883, GSE59857, GSE116529, GSE75205, GSE106073, GSE72544, GSE50791, GSE119197, GSE120993687
B cellsGSE24736, GSE19599, GSE12366, GSE49910, GSE120367, GSE75007218422
CD4+ T cellsGSE11292, GSE36769, GSE32959, GSE50175, GSE103527, GSE71956230885
CD8+ T cellsGSE84251, GSE93683, GSE98640, GSE84331, GSE71956126807
NK cellsGSE27838, GSE8059, GSE21774, GSE35330, GSE7509193256
MacrophagesGSE102117, GSE100129, GSE7568, GSE16385, GSE13670, GSE24897136364
MonocytesGSE38351, GSE39840, GSE35683, GSE6054, GSE60199, GSE98480125331
DCsGSE7509, GSE10316, GSE23618, GSE23371, GSE87494, GSE8530592245
NeutrophilsGSE22103, GSE39889, GSE8668, GSE18810, GSE70044182225

#, number; CRC, colorectal cancer; NK cells, natural killer cells; DCs, dendritic cells.

Datasets used in developing TICPE for colorectal cancer. #, number; CRC, colorectal cancer; NK cells, natural killer cells; DCs, dendritic cells. A specific gene set compared with CRC cells for per cell type was selected and used in TICPE. CRC, colorectal cancer; NK cells, natural killer cells; DCs, dendritic cells. A specific gene set compared with melanoma cells for per cell type was selected. NK cells, natural killer cells; DCs, dendritic cells.

The Performance of the TICPE in CRC and Melanoma Samples

Firstly, we calculated the proportion of immune cells in CRC samples with three independent publicly available datasets, and simulated RNA-seq data, to evaluate the TICPE. In the in vitro RNA mixture experiment (GSE64385), we observed that the estimated cell proportions were highly correlated with the cell proportions for the populations introduced in the mixtures (ρ = 0.99 and P = 4.2 × 10−13 for B cells, ρ = 0.82 and P = 9.9 × 10−4 for monocytes, ρ = 0.96 and P = 3.3 × 10−7 for NK cells, ρ = 0.96 and P = 4.1 × 10−7 for neutrophils; ). Then, we used scRNA-Seq data from 10 colon cancer samples (GSE146771) and compared TICPE predictions with measured immune cell proportions. As shown in , the proportions of immune cells estimated by the TICPE showed a significantly positive correlation with the actual cell proportions (P <0.05). We also observed a significant correlation between our predictions and the immunohistochemistry data from 33 colorectal cancer tumors (P <0.05; ). The number of samples available from published data was limited, so we further used the scRNA-Seq data of annotated immune cells to generate 100 simulated bulk RNA-seq samples (further details are given in the Materials and Methods section). The TICPE showed a high correlation between the known proportions and the estimated fractions in the simulated dataset (P <0.01; ). Furthermore, we used two scRNA-Seq datasets from melanoma patients (GSE115978; GSE72056) as benchmark resources for assessing the performance of the TICPE. The estimated proportion of each immune cell type was found to correlate with the true immune cell proportions that were calculated from single cell barcode information (P <0.05; ). Similarly, we used one of the scRNA-seq datasets (GSE115978) from melanoma ecosystems to simulate bulk samples of known cell type proportions and observed a good agreement with our predictions (P <0.05; ). Furthermore, we employed the receiver operating characteristic (ROC) curve and the area under the curve (AUC) to evaluate the performance of TICPE. We used the median true cell proportions as the cutoff for each cell type, and found the TICPE estimates had an AUC value ranging from 0.667 to 1 on publicly available datasets ( ).
Figure 2

Performance assessment of the TICPE in solid tumors. (A) Correlation of the TICPE predictions with the cell proportions for the populations introduced in the mixtures. (B) Comparison with single-cell RNA-Seq data from colon samples. (C) Correlation of the TICPE predictions with corresponding cell densities measured by immunohistochemistry from colon cancer primary tumors. (D) Correlation of the TICPE predictions versus known cell type fractions on 100 simulated bulk samples generated from scRNA-seq from colon samples. Correlations were based on Pearson correlation. Proportions of cells observed experimentally were given in .

Performance assessment of the TICPE in solid tumors. (A) Correlation of the TICPE predictions with the cell proportions for the populations introduced in the mixtures. (B) Comparison with single-cell RNA-Seq data from colon samples. (C) Correlation of the TICPE predictions with corresponding cell densities measured by immunohistochemistry from colon cancer primary tumors. (D) Correlation of the TICPE predictions versus known cell type fractions on 100 simulated bulk samples generated from scRNA-seq from colon samples. Correlations were based on Pearson correlation. Proportions of cells observed experimentally were given in . We hypothesized that the high correlation values resulted, at least in part, from the robust signature genes for a specific type of cancer. Therefore, we respectively collated sets of marker genes, as reported by Angelova et al. (30), Aran et al. (18), and Manoharan et al. (31), and used the same pipeline to estimate the cell proportions to compare the representativeness of the signature genes selected by the TICPE and those in specific types of cancer. Using these public validation datasets with different cell types, we evaluated the performance of the signature gene sets on independent datasets. We observed that, in most cases, the selected signature genes in our study showed better performance in terms of the estimated cell proportions than the marker genes from previously published methods ( ). In general, and not only for colorectal cancer and melanoma, the TICPE can also be developed to estimate the proportions of infiltrating immune cells in other types of cancer when collect relevant cancer cells and normal cells are tested within the same development pipeline. Moreover, the more robust cell type-specific genes we attained, the better performance we saw in the TICPE.
Figure 3

Performance comparison with other methods. (A) Scatter plot organized by cell type showing the performance of different marker gene sets identified from four sources. Methods performance was quantified using Pearson’s correlation (R). Different colors represented marker genes collated from different methods and different datasets had corresponding shapes. (B, C) Performance of TICPE and other methods on CRC and melanoma validation cohorts, respectively. Here rows corresponded to methods and columns showed the Pearson correlation coefficient for the corresponding cell type in each dataset. (D) Correlation deviation of each method in both CRC and melanoma validation datasets.

Performance comparison with other methods. (A) Scatter plot organized by cell type showing the performance of different marker gene sets identified from four sources. Methods performance was quantified using Pearson’s correlation (R). Different colors represented marker genes collated from different methods and different datasets had corresponding shapes. (B, C) Performance of TICPE and other methods on CRC and melanoma validation cohorts, respectively. Here rows corresponded to methods and columns showed the Pearson correlation coefficient for the corresponding cell type in each dataset. (D) Correlation deviation of each method in both CRC and melanoma validation datasets.

A Comparison of the TICPE With Previously Published Methods

We utilized the public validation datasets and simulated RNA-seq data to benchmark a range of other methods (CIBERSORT, EPIC, MCP-counter, xCell, and ImmuCellAI) in order to predict immune cell proportions. Compared to the other methods used currently in CRC validation datasets, the TICPE did not obtain the highest correlations across all cell types; however, it did provide the most consistent performance of all the assessments ( ). The majority of cell types measured by TICPE showed higher correlations with the observed cell fractions than the other methods for both scRNA-Seq datasets from melanoma patients and simulated samples ( ). Performance of TICPE and previous computational methods was assessed with all validation datasets by cell type. We chose the cell type that was analyzed in more than three datasets. The TICPE robustly obtained positive correlations across all cell types and data sets and scored the high performers in the assessments ( ). In addition, the TICPE showed the least correlation deviation for the publicly available datasets for CRC and melanoma ( ). It is also worth noting that the available methods for estimating immune cell contents are based on quantitative expression measurements of reference profiles or signature genes, thus resulting in incomparability across different datasets. In contrast, our method was based on the relative ordering of gene expression and was developed to estimate cell proportions in every individual tumor sample; this strategy was more flexible than the other methods. We analyzed the cell infiltration of two scRNA-Seq datasets for melanoma and compared the results of in silico methods with the true proportions. With the exception of macrophages (Wilcoxon test; P = 0.017), we found that there was no significant difference in the actual cell fractions when compared between the two datasets for melanoma. The TICPE estimates, had a similar trend to the actual proportions and showed no statistical significance between the two datasets except for macrophages (Wilcoxon test; P = 0.02; ). However, the majority of cell contents estimated by xCell and ImmuCellAI between the two datasets were significantly different and differed from the actual proportions. These results showed that the TICPE is a robust approach that supports the comparisons of the same cell type across different datasets at the same time and shows high levels of accuracy and robustness to estimate the proportion of immune cells of tumor samples.

TICPE Revealed That Immune Cell Infiltration Has Prognostic Value

TIICs are indispensable components of the tumor microenvironment and have been demonstrated to be highly valuable in determining the prognosis of multiple cancers. We accessed data from the GEO and TCGA to investigate whether TIICs had prognosis value for melanoma patients. We employed the TICPE to systematically estimate the eight infiltrated immune cells and stratified patients into a high infiltration subtype and a low infiltration subtype by using the median cell proportions as the cutoff. The results of Kaplan–Meier analysis of 79 metastatic melanoma specimens (GSE54467) and a unique set of 51 treatment-naive primary melanoma samples (GSE98394) both indicated that higher fractions of CD4+ T, CD8+ T cells, and NK cells might be associated with better survival over those with low proportions (P <0.05; ). In addition, RNA-seq data from 472 SKCM patients and OS data for 468 patients were downloaded from the TCGA database. In addition, 323 patients with a blank therapy type were posited without chemo/radiotherapy. We only selected these patients to reduce the treatment affecting patient prognosis and also found that melanoma patients with a high density of NK cells had a better prognosis (P = 0.0021; ). Furthermore, the TICPE was able to estimate cell proportions in every individual tumor sample. Therefore, we combined 57 melanoma patients with lymphnode (GSE22153) and subcutaneous metastases, along with 20 melanoma patients with liver and lymphnode metastases (GSE22154), who were treated in the same clinical center. Melanoma patients with a higher abundance of B cells, CD4+ T, and CD8+ T cells had a longer overall survival with or without combining the 20 patients with liver and lymphnode metastases ( ).
Figure 4

The application of TICPE on prognostic analysis for melanoma. Survival high CD4+ T/CD8+ T/NK cells and low CD4+ T/CD8+ T/NK cells groups in melanoma patients, respectively. P values comparing two groups were calculated with the log-rank test.

The application of TICPE on prognostic analysis for melanoma. Survival high CD4+ T/CD8+ T/NK cells and low CD4+ T/CD8+ T/NK cells groups in melanoma patients, respectively. P values comparing two groups were calculated with the log-rank test. We also applied the TICPE to four cohorts of CRC patients to investigate the relationship between cell infiltrations and patient prognosis. In a cohort of 160 stage II and III CRC tissue samples that were treated surgically (GSE24551), patients with high levels of CD8+ T cell infiltration were significantly associated with a better DFS (disease-free survival) compared with those with a low infiltration subtype (P = 0.041). In a large series of CRC patients who had not received adjuvant chemotherapy (GSE39582), and a cohort comprising 232 colorectal cancer patients (GSE17538), we found that a higher proportion of NK cells indicated a prolonged period of patient survival (P = 0.0079; P = 0.018) while an increased number of CD8+ T cells was associated with a better prognosis, although this was not statistically significant in either of the datasets. Furthermore, in a cohort of 232 colorectal cancer patients (GSE17538), despite the fact that macrophage infiltration was not statistically significant, a higher proportion of macrophages was associated with a dismal prognosis (P = 0.17). We also observed the same tendency in 171 surgically resected CRC specimens without chemo/radiotherapy (GSE14333); relatively poor DFS was correlated with an increased fraction of macrophages (P = 0.06; ). Taken together, CD8+ T cells and NK cells were shown to play favorable roles in the survival of CRC and melanoma patients and the data obtained using the TICPE for several cohorts suggested that immune cell proportions can serve as an effective prognostic indicator for tumors.

TICPE Detected Changes in the Proportions of Immune Cells During Immunotherapy for Melanoma

An increasing number of research studies has revealed that an elevation in the levels of CD8+ T cells and NK cells is associated with an immunotherapy response in anti-PD-1 treatment (32, 33). We applied the TICPE to a melanoma dataset (GSE91061) to investigate the impact of immune cell proportions on cancer immunotherapy. The estimated fractions of CD8+ T cells and NK cells in pre-treatment and on-treatment samples showed a substantial increment in the complete and partial response group (CR & PR) (Kruskal-Wallis test; P < 0.05) than the other groups ( ). With regards to pre-treatment data, the immune cell fractions across different groups showed no statistical differences. Notably, there was no statistically significant difference between paired pre-treatment versus on-treatment immune cell proportions in responders. However, there was an increasing trend for changes in immune cell proportions during anti-PD-1 treatment; this indicated that these immune cells were associated with a favorable response to PD-1 inhibition ( ; Wilcoxon test). These results suggested that TICPE could provide important insights on the dynamic immune cell infiltration during immunotherapy and offer valuable indicators for immunotherapy response during treatment.
Figure 5

The application of TICPE on immunotherapy for melanoma. (A) The significant proportion differences of CD8+ T cells (left)/NK cells (right) in different response groups at pre- and on-treatment (anti-PD1) time point. (B) Change of the estimated immune cell proportions between pre-treatment and on-treatment time point in paired responders.

The application of TICPE on immunotherapy for melanoma. (A) The significant proportion differences of CD8+ T cells (left)/NK cells (right) in different response groups at pre- and on-treatment (anti-PD1) time point. (B) Change of the estimated immune cell proportions between pre-treatment and on-treatment time point in paired responders.

Discussion

In this study, we developed the TICPE, a cancer-specific qualitative method based on REOs, to estimate the proportion of eight different immune cells. The results of our extensive validation using immunohistochemistry data, mRNA mixtures in vitro, scRNA-Seq data, and simulated bulk RNA-seq samples, demonstrated that the TICPE could effectively infer immune cell fractions from transcriptome profiles. Of note, the TICPE does not only apply for colorectal cancer and melanoma, the TICPE could also be developed to estimate the infiltrating immune cells in any type of cancer as long as relevant data is available, such as cancer cells and normal samples. We had a straightforward comparison tumor microenvironment between colorectal cancer and melanoma utilizing data compiled by TCGA. The results showed that melanoma was highly infiltrated by CD4+ T cells, NK cells, dendritic cells, and neutrophils but poorly by cells of B cells, CD8+ T cells, monocytes, and macrophages in comparison with colorectal cancer ( ). Moreover, the TICPE could be broadly employed to other components of the tumor microenvironment with the increased availability of public data by using the proposed pipeline. In the further work, with the gradual accumulation of relevant data of other cancer types and cell types, we will develop the TICPE for each neoplasm. Then we will estimate the abundance of immune cell populations in samples across multiple cancer types from The Cancer Genome Atlas (TCGA), and propose a global analysis of immune landscape across human cancers. The key step in constructing the TICPE involved accurately identifying a list of genes characterized by a cell type. Compared to other methods based on marker genes, our method was more reliable due to the fact that it incorporated a group of signature genes for each cell type that was acquired from a comprehensive literature search and featured differentially expressed genes when compared with cancer cells. As shown the Venn diagram of , there were some shared cell type-specific genes in both CRC and melanoma, but the majority of signature genes for each cell type were cancer type-specific in our study ( ), which also indicated the gene expression of immune cell varied across different tissues. On the other hand, we chose to apply a gene signature approach over deconvolution methods because of the several advantages that the former provides. First, we did not require a reference expression matrix and could treat immune cells independently; this supported inter-sample comparisons and avoided issues relating to multicollinearity. Second, the TICPE was based on the rank of gene expression rather than the actual gene expression value and was therefore suitable for cross-platform transcriptomic measurements and comparisons. Finally, gene signatures are simple and can easily be adjusted. Furthermore, the procedure for developing TICPE was based on REOs so that it was agnostic to monotonic data normalization or concerns related to experimental batch effects; these effects rendered our technique more robust to both technical and biological noise. The TICPE was reliable and could stratify a cohort of similar tumors based on the composition of their immune microenvironments, and could follow proportional changes of the microenvironment during the course of immunotherapy. In this investigation, CD8+ T cells and NK cells were shown to play favorable roles in the survival of CRC and melanoma patients. CD8+ T cells are the most potent cytolytic cell subset and NK cells also exert cytolytic functions (34, 35). CD8+ T cells are able to exert a directly killing effect on tumors cells and have been linked to a better prognosis in several types of cancer (36). In parallel with CD8+ T cells, NK cells can recognize and kill neoplastic cells and play pivotal roles in innate and adaptive immune responses and tumor immunosurveillance (37). In addition, our study also showed that the abundance of macrophages may serve as an unfavorable prognostic marker for CRC. Macrophages are conventionally classified into M1 and M2 subtypes. M2 macrophages secrete Interleukin 10 (IL-10), transforming growth factor-β(TGF-β), and other mediators that stimulate tumor-related angiogenesis and inhibit antitumor immune response (4, 38). On the other hand, although limited in sample size, our analysis detected an association between anti-PD-1 immunotherapy response and elevated CD8+ T cell and NK cell levels and revealed the potential of TICPE for providing important insights into dynamic immune cell infiltration during immunotherapy. The TIICs found in our study make a significant contribution to patient survival and treatment and our findings are corroborated by previous studies (17, 39). Overall, the proportions of immune cells measured by the TICPE could serve as a prognostic factor or a potential predictive model for the response to immune checkpoint blockade therapy in solid tumors. Despite the utility of our method for estimating the tumor immune contexture for a particular type of cancer, several issues require further investigation. First, the TICPE was a cancer specific model and only focused on a very narrow range of cell types from the tumor microenvironment as our research involved publicly available datasets and gene sets that were characterized by each type of immune cell. Further efforts are required to collect more relevant information to extend the technique to other cell types (e.g., cancer-associated fibroblasts) and should be expanded to include more cancer types. Moreover, the inferences were strictly upregulated scores which could be compared with cancer cell and could not be interpreted as proportions. Thus, while we attempted to calibrate our method to resemble proportions, this strategy was hindered by conducting simulations to real-world datasets; the reliability of the technique needs to be improved. Furthermore, the final estimates were not normalized to sum up to one; therefore, the estimates could not be interpreted directly as cell fractions. Consequently, further improvement of our method is certainly warranted, including the reselection of signature genes from genome-wide genes to extend to other cell types and the selection of a large cohort of real-world datasets with cell proportions to conduct simulated models. In summary, the TICPE is a cancer-specific qualitative method for estimating the tumor immune contexture using public RNA-Seq and microarray datasets. The TICPE can estimate the proportion of infiltrating immune cells in CRC and melanoma but could also be extended to other types of cancer. Furthermore, the TICPE was based on REOs and can estimate the proportions of tumor-infiltrating immune cells in individual tumor samples. Therefore, the TICPE showed good comparability across different datasets and was only weakly affected by batch effects. We anticipate that this method will assist in the discovery of novel prognostic and predictive response biomarkers for both conventional and immunotherapy by taking immune cell composition into account.

Data Availability Statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/.

Author Contributions

WZ and YH conceived of, designed, and supervised the study. HX, JZ, and KW drafted the manuscript. HX and JZ implemented the source code of TICPE. KW participated in the study design and result validation. KS, HZ, JY, KL, and RY contributed to the design of TICPE. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National Natural Science Foundation of China under Grant: 81872396.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Cell TypeGene NumberSignature Genes
B cells24 BLK, CD19, CD79A, CD79B, IGLL1, TCL1A, TLR7, FCRL2, BANK1, CPNE5, KLHL14, LINC00926, FCRL5, EBF1, ARHGAP25, CLECL1, TNFRSF17, FCRLA, HLA-DOB, NCF1, P2RY10, PNOC, TLR9, FCRL4
CD4+ T cells65 ANK1, CD40LG, CD69, CD72, CHI3L2, CCR4, CCR8, DGKA, FYN, GATA3, GPR18, GPR19, IL2RA, IL6R, IL9R, IL12RB2, TNFRSF9, ITGA4, ITGB2, JAK3, LCK, LTB, MAL, CD200, NPAT, P2RX5, PDCD1, PLCL1, PTPRC, RGS1, SELPLG, STAT4, STAT5A, STAT5B, TXK, WIPF1, SOCS3, AIM2, HS3ST3B1, TLR6, CD226, PASK, PLCL2, ANKRD12, STAP1, ZBTB32, LAT, PNMA3, FOXP3, ASB2, LRRN3, LAX1, RNF125, PARP11, PLXDC1, MAN1C1, HIVEP3, BCL11B, PVRIG, ANKRD55, TRIM46, LIMD2, SIGLEC10, RCSD1, PIK3IP1
CD8+ T cells42 ABCD2, RUNX3, CD8A, CD8B, LYST, TSC22D3, GPR183, FLT3LG, HLA-DPB1, IFI16, INPP4A, POU6F1, PTGER4, RAB27A, ATXN7, ZBTB16, EOMES, IL18R1, SLC16A7, ITM2A, AKAP5, TOX, SPOCK2, ZEB2, PLXNC1, CA5B, IKZF2, PTPN22, PBXIP1, IGFLR1, APOL3, KIAA1109, SLA2, SLFN11, JAML, TC2N, TTC39C, TMIGD2, TMEM71, HAPLN3, PYHIN1, JAKMIP1
DCs15 SLAMF8, CCL17, CD1B, CD1E, CLIC2, CD1C, CD209, DNASE1L3, IL3RA, SAMSN1, FABP4, C1QC, SLAMF9, THBD, FAM49A
Macrophages18 CHIT1, CD14, APOC1, HAMP, VSIG4, SDS, SIGLEC7, ADAMDEC1, CCL18, CCL8, CCR1, CMKLR1, CSF1, HS3ST2, MMP19, CPM, ENG, MS4A7
Monocytes26 TLR8, ASGR2, IRAK3, CD33, CFP, CLEC4A, CLEC4E, CXorf21, DOK2, DOK3, FCN1, HCK, LILRA1, LILRA5, LILRB2, MNDA, MYO1F, NCF4, PILRA, PLEK, POU2F2, PSTPIP1, QKI, RETN, CD300LF, SRGN
NK cells16 KIR2DL3, NCR1, FGFBP2, KIR3DS1, PTGDR, LIM2, KIR3DL1, KIR2DS1, KIR3DL3, KIR2DS2, KIR2DS5, KIR2DL1, SH2D1B, KIR2DL4, PIK3CG, KIR2DS4
Neutrophils13 FCGR3B, ALPL, VNN3, FFAR2, MMP25, TREM1, FPR1, LINC00528, CMTM2, PROK2, CLEC7A, CAMP, VNN2

CRC, colorectal cancer; NK cells, natural killer cells; DCs, dendritic cells.

Table 2B

A specific gene set compared with melanoma cells for per cell type was selected.

Cell typeGene NumberSignature Genes
B cells15 MS4A1, BLK, CD19, GNG3, SGCA, CD79A, CD79B, CD53, CD72, HTR3A, IGLL1, TCL1A, TLR7, VPREB3, AICDA
CD4+ T cells85 LIMD2, TRAF1, NPAT, PIK3IP1, ANKRD12, AAK1, ACBD4, CD226, CUBN, GPSM3, GRAP2, IL16, INSL3, JAK3, KLHL3, KRT2, LAIR2, MLH3, MLXIP, NOL9, SELPLG, SORCS3, STAP1, TNK1, TSPAN32, ZNF780B, HS3ST3B1, FOXP3, LAX1, STAT5B, TTN, CCR3, NFATC3, IL2, GGT1, SYNGR3, IL12RB1, STAT4, ZBTB32, CSF2, DPP4, IL12RB2, IL22, EGFL6, IL4, GATA3, IL5, IL13, IL26, ANK1, MB, MICAL2, PHEX, PTGIS, IL1R1, RORC, IL21, IL1R2, IL17A, MAP4K1, SIK1, FOSB, PVRIG, CD69, BCL11B, CHI3L2, DGKA, LAT, LCK, MAP9, PASK, RGS1, SLC7A10, TCF7, TSHR, ZBTB10, TFAP4, COL5A3, ADCYAP1R1, DAB1, ERN1, FXYD7, PNMA3, ARHGEF5, DEFB126
CD8+ T cells43 BLNK, HTR1B, SMCP, RRH, CCDC87, MOGAT2, GJB4, CALY, KIAA1109, CD248, RFX2, AMBN, MYL1, GPR52, CILP, TNFRSF10C, ITGAM, PTGDR2, PRDM1, MPO, RUNX3, APOL3, DUSP2, ZBTB16, CCND2, EOMES, ITM2A, SNX9, CXCL13, HAVCR2, LINC00299, MYO7A, TIGIT, TNFRSF1B, AKAP5, TOX, RGS2, GALM, SYNGR2, PTGER4, CCR6, ATR, GIPR
DCs28 PTGIR, SLAMF8, SLC15A3, SYT17, CCL13, CCL17, CD1B, CD1E, CLIC2, MMP12, TREM2, PLA2G7, ALDH1A2, ALOX15, ALOX15B, BCL2L11, CCL23, CD1A, CD1C, CD209, CD80, DNASE1L3, FLT3, FUT7, GUCA1A, IL12B, IL3RA, KCNK13
Macrophages19 CAMP, CHIT1, CD14, FCGR1A, HAMP, MSR1, VSIG4, SDS, SIGLEC7, TYROBP, ADAMDEC1, CCL7, CCR1, CD84, CMKLR1, CPNE6, CXCL9, CYBA, CYP19A1
Monocytes31 CA1, TLR8, FOLR2, ASGR2, IRAK3, CD33, VCAN, AIF1, CD101, CD93, CEACAM4, CFP, CLEC4A, CXorf21, DOK2, DOK3, FCER1A, FCN1, FGL2, FOLR3, GPR183, HCK, KCNMB1, KDM6B, KSR1, LILRA1, LILRA5, LILRB1, LST1, LY86, LYL1
NK cells11 KIR2DL3, NCR1, NCR3, PRR5L, KIR3DS1, PTGDR, HIPK1, LIM2, NMUR1, PRDM2, TNFSF11
Neutrophils19 CXCR1, FCGR3B, S100A12, TREML2, TRPM6, SIGLEC5, CREB5, ALPL, CEACAM3, VNN3, CA4, CEACAM8, CYP4F3, FFAR2, HBB, MMP25, P2RY13, PGLYRP1, TGM3

NK cells, natural killer cells; DCs, dendritic cells.

  39 in total

Review 1.  The logrank test.

Authors:  J Martin Bland; Douglas G Altman
Journal:  BMJ       Date:  2004-05-01

2.  Adjusting batch effects in microarray expression data using empirical Bayes methods.

Authors:  W Evan Johnson; Cheng Li; Ariel Rabinovic
Journal:  Biostatistics       Date:  2006-04-21       Impact factor: 5.899

3.  M2 Macrophage-Derived Exosomes Promote Cell Migration and Invasion in Colon Cancer.

Authors:  Jingqin Lan; Li Sun; Feng Xu; Lu Liu; Fuqing Hu; Da Song; Zhenlin Hou; Wei Wu; Xuelai Luo; Jing Wang; Xianglin Yuan; Junbo Hu; Guihua Wang
Journal:  Cancer Res       Date:  2018-11-06       Impact factor: 12.701

4.  Progress and challenge for computational quantification of tissue immune cells.

Authors:  Ziyi Chen; Aiping Wu
Journal:  Brief Bioinform       Date:  2021-01-06       Impact factor: 11.622

Review 5.  The immune contexture in human tumours: impact on clinical outcome.

Authors:  Wolf Herman Fridman; Franck Pagès; Catherine Sautès-Fridman; Jérôme Galon
Journal:  Nat Rev Cancer       Date:  2012-03-15       Impact factor: 60.716

6.  Type, density, and location of immune cells within human colorectal tumors predict clinical outcome.

Authors:  Jérôme Galon; Anne Costes; Fatima Sanchez-Cabo; Amos Kirilovsky; Bernhard Mlecnik; Christine Lagorce-Pagès; Marie Tosolini; Matthieu Camus; Anne Berger; Philippe Wind; Franck Zinzindohoué; Patrick Bruneval; Paul-Henri Cugnenc; Zlatko Trajanoski; Wolf-Herman Fridman; Franck Pagès
Journal:  Science       Date:  2006-09-29       Impact factor: 47.728

7.  Prognostic value of intra-tumoral CD8+ /FoxP3+ lymphocyte ratio in patients with resected colorectal cancer liver metastasis.

Authors:  Kostandinos Sideras; Boris Galjart; Angela Vasaturo; Alexander Pedroza-Gonzalez; Katharina Biermann; Shanta Mancham; Alex L Nigg; Bettina E Hansen; Hans A Stoop; Guoying Zhou; Cornelis Verhoef; Stefan Sleijfer; Dave Sprengers; Jaap Kwekkeboom; Marco J Bruno
Journal:  J Surg Oncol       Date:  2018-06-07       Impact factor: 3.454

8.  Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data.

Authors:  Francesca Finotello; Clemens Mayer; Christina Plattner; Gerhard Laschober; Dietmar Rieder; Hubert Hackl; Anne Krogsdam; Zuzana Loncova; Wilfried Posch; Doris Wilflingseder; Sieghart Sopper; Marieke Ijsselsteijn; Thomas P Brouwer; Douglas Johnson; Yaomin Xu; Yu Wang; Melinda E Sanders; Monica V Estrada; Paula Ericsson-Gonzalez; Pornpimol Charoentong; Justin Balko; Noel Filipe da Cunha Carvalho de Miranda; Zlatko Trajanoski
Journal:  Genome Med       Date:  2019-05-24       Impact factor: 15.266

9.  A practical guide to methods controlling false discoveries in computational biology.

Authors:  Keegan Korthauer; Patrick K Kimes; Claire Duvallet; Alejandro Reyes; Ayshwarya Subramanian; Mingxiang Teng; Chinmay Shukla; Eric J Alm; Stephanie C Hicks
Journal:  Genome Biol       Date:  2019-06-04       Impact factor: 13.583

Review 10.  The Tumor Microenvironment in the Response to Immune Checkpoint Blockade Therapies.

Authors:  Florent Petitprez; Maxime Meylan; Aurélien de Reyniès; Catherine Sautès-Fridman; Wolf H Fridman
Journal:  Front Immunol       Date:  2020-05-07       Impact factor: 7.561

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.