Literature DB >> 36105798

Systematic identification of cancer-associated-fibroblast-derived genes in patients with colorectal cancer based on single-cell sequencing and transcriptomics.

Jia Zhao1,2,3,4, Ying Chen1,2,3,4.   

Abstract

Colorectal cancer (CRC) has a high incidence rate and poor prognosis, and the available treatment approaches have limited therapeutic benefits. Therefore, understanding the underlying mechanisms of occurrence and development is particularly crucial. Increasing attention has been paid to the pathophysiological role of cancer-associated fibroblasts (CAFs) in the heterogeneous tumour microenvironment. CAFs play a crucial role in tumorigenesis, tumour progression and treatment response. However, routine tissue sequencing cannot adequately reflect the heterogeneity of tumours. In this study, single-cell sequencing was used to examine the fibroblast population in CRC. After cluster analysis, the fibroblast population was divided into four subgroups. The distribution and role of these four subgroups in CRC were found to be different. Based on differential gene expression and lasso regression analysis of the main marker genes in these subgroups, four representative genes were obtained, namely, TCF7L1, FLNA, GPX3 and MMP11. Patients with CRC were divided into the low- and high-risk groups using the prognostic risk model established based on the expression of these four genes. The prognosis of patients in different risk groups varied significantly; patients with low-risk scores had a greater response to PDL1 inhibitors, significant clinical benefits and significantly prolonged overall survival. These effects may be attributed to inhibition of the function of T cells in the immune microenvironment and promotion of the function of tumour-associated macrophages.
Copyright © 2022 Zhao and Chen.

Entities:  

Keywords:  FLNA; GPX3; MMP11; TCF7L1; cancer-associated fibroblasts; colorectal cancer; prognostic risk model; single-cell sequencing

Mesh:

Year:  2022        PMID: 36105798      PMCID: PMC9465173          DOI: 10.3389/fimmu.2022.988246

Source DB:  PubMed          Journal:  Front Immunol        ISSN: 1664-3224            Impact factor:   8.786


Introduction

As the third most common malignancy, colorectal cancer (CRC) causes more than 8% of all deaths worldwide each year (1, 2). Routine treatment of CRC includes surgery, radiotherapy and chemotherapy, which are invasive and may have a greater impact on the quality of life of patients (3). After comprehensive treatment, the 5-year survival rate of patients with early-stage CRC is 90%; however, treatment options for patients with advanced-stage CRC who are ineligible for surgery are limited (4). Immunotherapy may be beneficial for patients with advanced-stage CRC. Because of its strong anti-tumour activity, immunotherapy is used for treating several solid tumours, including melanoma, kidney cancer, non-small cell lung cancer and prostate cancer (5). In addition to targeted and anti-vascular therapies, immunotherapeutic strategies have gradually improved. PD-1/L1 and CTLA-4 are the main immunotherapeutic agents; however, their clinical efficacy remains unclear. Studies have shown that only patients with CRC with defective mismatch repair (dMMR) or high microsatellite instability (MSI-H) are eligible for checkpoint inhibition and may benefit from it (6, 7). Therefore, dMMR/MSI-H is considered a predictive biomarker for the application and efficacy of immunosuppressants (8). However, the efficacy of dMMR/MSI-H is only 30–40% (9), which considerably limits the application of immune checkpoint inhibitors for the treatment of colon cancer. Therefore, understanding the underlying mechanisms of the occurrence and development of CRC is necessary to screen for more effective predictors and improve the currently available treatment approaches. In addition to the role of tumour cells, the tumour microenvironment (TME) is another major auxiliary factor in the onset and growth of tumours. Several studies have associated TME with the occurrence and growth of tumours, survival and clinical treatment sensitivity (10, 11). TME has an extremely complex system comprising stromal cells, tumour cells, various cytokines and an extracellular matrix (ECM) (12). Fibroblasts are the main cellular component of the matrix and are called cancer-associated fibroblasts (CAFs). They interact with cancer cells (13, 14) and are significantly associated with the prognosis of tumours (15). A recent study has demonstrated that CAFs play a significant role in various tumours. For example, matrix SOX2 upregulation promotes tumorigenesis by producing CAFs expressing SFRP1/2 (16), and Wnt-induced phenotypic transformation of CAFs inhibits EMT in CRC (17). However, most studies have focused only on the involvement of tumour cells in fibroblast remodelling or the effects of fibroblasts on tumour cells, and systematic analysis of tumours and TME including the whole fibroblast population is lacking. In this study, we identified fibroblast subsets based on single-cell sequencing analysis and identified hub genes significantly related to fibroblasts by differential analysis, correlation analysis, univariate cox analysis and lasso cox analysis. Further, we analysed the roles of hub genes in tumors from various aspects by studying the mutations and immunity of these genes. Finally, we constructed a multi-gene signature and confirmed its role in predicting patient outcomes and immunotherapy predictions.

Materials and methods

Extraction and preprocessing of scRNA data

The read count expression profile data of 16 cancer tissues and 7 adjacent tissues were extracted from the single-cell sequencing dataset GSE200997 from the NCBI database Gene Expression Omnibus (GEO). First, the single-cell data were filtered by ensuring that each gene was expressed in at least three cells, and at least 250 genes were expressed per cell. The PercentageFeatureSet function was used to determine the proportion of mitochondria and rRNA and ensure that <3000 genes are expressed per cell and the Unique molecular identifier (UMI) of each cell is at least >100. The data were standardised through log-normalisation, and highly variable genes were identified using the FindVariableFeatures function (variance-stabilising transformation was used to identify variable characteristics). Subsequently, the ScaleData function was used to scale all genes, and Principal components analysis (PCA) was used for dimensionality reduction to identify anchor points (dim = 40). The FindNeighbors and FindClusters functions (resolution = 0.2) were used to cluster the cells, and the RunTSNE function was used to reduce t-SNE dimensionality to screen for fibroblasts.

Extraction and preprocessing of the cancer genome atlas data

The clinical phenotype data of CRC were downloaded from TCGA database, and samples lacking data on survival time and survival status were removed. Samples were further filtered to ensure that the survival time in each sample was >0 days. In addition, the gene expression profile data were downloaded from TCGA database, and 431 tumour samples and 41 para-cancerous samples were selected for further analysis. The copy number variation (CNV) of CRC samples were downloaded from TCGA database and integrated using the GISTIC2 software. The single nucleotide variants (SNVs) data of TCGA-COAD cohort were downloaded from TCGA database and integrated using the Mutect2 software.

Extraction and preprocessing of GEO data

The GSE17536 and GSE17537 datasets were downloaded from GEO, and the probe IDs were converted to gene symbols according to the annotation files. A probe ID that corresponded to multiple genes was deleted, and the expression of several probes for a gene was averaged. Normal tissue samples were removed, and only tumour samples were retained. In addition, samples without clinical follow-up and OS data were removed to ensure that the survival time of all patients was >0 days. A total of 177 tumour samples and 21,655 genes were obtained from the GSE17536 dataset, and 55 tumour samples and 21,655 genes were obtained from the GSE17537 dataset.

Single-cell clustering dimensionality reduction

The R language Seurat package was first used to filter the single-cell data by setting each gene to be expressed in at least 3 cells, and each cell expresses at least 250 genes, calculating the proportion of mitochondria and rRNA through the PercentageFeatureSet function, and ensuring that each cell The expressed genes are less than 3000, and the UMI of each cell is at least greater than 100. Then, we normalized the data of 23 samples separately by log-normalization.The FindVariableFeatures function was used to find highly variable genes [identify variable features based on variance stabilizing transformation (“vst”)], then scaled all genes using the ScaleData function, and perform PCA dimensionality reduction to find anchors, we chose dim=40, pass The FindNeighbors and FindClusters functions cluster the cells (set Resolution=0.2), divided the subgroups, and used the RunTSNE function for TSNE dimensionality reduction,

Annotation and further segmentation of fibroblasts

The fibroblasts were screened with the four genes of ACTA2, FAP, PDGFRB and NOTCH3, and then the fibroblasts were extracted and clustered by the functions of FindNeighbors and FindClusters (setting Resolution=0.2), and the fibroblasts were further divided into 4 groups subpopulations and re-TSNE dimensionality reduction of fibroblasts using the RunTSNE function.

Identification of marker genes

The FindAllMarkers function of the Seurat package was used to identify marker genes of fibroblasts by LogFC=0.5, Minpct=0.35 (minimum expression ratio of differential genes) and identified marker genes with a corrected p<0.05.

Functional annotation of subgroups

KEGG enrichment analysis was performed on marker genes of fibroblast subpopulations using the compareCluster function of the clusterProfiler package in R language, and screening was performed with pvalue Cutoff=0.05.

Identification of malignant and non-malignant cells

Four fibroblast subpopulations were analyzed using the R language copykat package to differentiate between tumor cells/malignant cells and normal cells/non-malignant cells in each sample by changes in the cnv of the cells. Copykat’s statistical workflow combines Bayesian methods with hierarchical clustering to calculate genomic copy number profiles of individual cells and to define clonal substructures from high-throughput 3’ scRNA-seq data. The workflow takes a gene expression matrix of Unique Molecular Identifier (UMI) counts as input to the calculation. Analysis begins with rows of gene annotations, ordered by their genomic coordinates. Freeman-Tukey transformation (FTT) was performed to stabilize variance, followed by polynomial dynamic linear modeling (DLM) to smooth out outliers in single-cell UMI counts. A subset of diploid cells with high confidence was then examined to infer baseline copy number values ​​for normal 2N cells. To do this, we pooled individual cells into several small hierarchical clusters and estimated the variance of each cluster using a Gaussian mixture model (GMM). By following strict classification criteria, the cluster with the smallest estimated variance was defined as “confident diploid cells”. Potential misclassification can occur when the data have only a few normal cells, or when tumor cells have near-diploid genomes and limited CNA events. In this context, Copykat provides a “GMM-defined” model to identify diploid normal cells one by one, where a mixture of three Gaussian models of gene expression in a single cell is assumed to represent genomic gain, loss, and neutral states. A single cell is defined as a “confident diploid cell” when the genes in the neutral state account for at least 99% of the expressed genes.

Tumour-related pathways.

As reported in a previous study, the 10 pathways related to tumours and genes associated with these pathways are shown in . The scores of each cell for the 10 pathways were calculated via Single-sample GSEA (ssGSEA). The proportion of malignant and non-malignant cells and the MSI status in fibroblast subpopulations were compared via the chi-square test, and the scores of different fibroblast subpopulations associated with the 10 tumour-related pathways were compared via the Wilcoxon test.

Potential regulatory pathways of key genes

Using h.all.v7.5.1.symbols.gmt as a background, the enrichment scores of patients in TCGA cohort for each pathway were calculated using the GSVA package in R. Subsequently, the correlation between gene expression and pathway enrichment scores was analysed using the Hmisc package.

Construction of a risk model for predicting the response to PD-L1 inhibitor immunotherapy

The PD-L1 cohort (IMvigor210) was used to assess the relationship between risk scores and immunotherapy. The effects of PD-L1 inhibitors were different among 348 patients in the IMvigor210 cohort, which were characterised by stable disease (SD), progressive disease (PD), partial response (PR) and complete response (CR). In addition, differences between immunotherapy and chemotherapy were analysed in the IMvigor210 cohort. The risk model was used to evaluate the possible clinical outcomes of immunotherapy using the TIDE (http://tide.dfci.harvard.edu/) software. The likelihood of immune escape increased with increasing TIDE prediction scores, indicating that immunotherapy is less likely to benefit patients.

Statistical analyses

The Shapiro–Wilk test was used to compare the normality of variables between two groups. The unpaired Student’s t-test was used to determine the statistical significance of differences between normally distributed variables, and the Mann–Whitney U test was used to analyse non-normally distributed variables. The Kruskal–Wallis test and one-way ANOVA were employed as non-parametric and parametric methods, respectively, for comparing more than two groups. Spearman and distance correlation analyses were used to examine the correlation. The Kaplan–Meier method was used to compute survival rates, and the log-rank test was used to assess the significance of variations in survival curves.

Results

Identification of fibroblasts from scRNA-seq data

A total of 49,698 cells were obtained after filtering single-cell sequencing data. The PercentageFeatureSet function was used to calculate the proportion of mitochondria and rRNA, and 48,755 cells were obtained. As shown in , a significant correlation was observed between the number of UMI and mRNA but not between the number of UMI/mRNA and the content of mitochondrial genes. A violin diagram created before and after QC analysis is shown in . Furthermore, the data of 23 samples were standardised via log-normalisation. A total of 16 subgroups were obtained, and the RunTSNE function was used to reduce t-SNE dimensionality. Fibroblasts were screened based on the expression of ACTA2, FAP, PDGFRB and NOTCH3. Because these four genes were mainly expressed by cells in subgroup 9, the cells were defined as fibroblasts ( ) and extracted for cluster analysis. These fibroblasts were further divided into four subgroups, and the RunTSNE function was used to reduce t-SNE dimensionality. The t-SNE map of the four fibroblast subpopulations and marker gene expression is shown in . shows the t-SNE diagram of 23 samples, shows the t-SNE diagram of different tissues (cancer and adjacent tissues), shows the t-SNE diagram of the MSI status and shows the t-SNE diagram of fibroblast subsets after cluster analysis. The number of cells in each sample before and after data filtration is shown in .
Figure 1

(A) t-SNE diagram of 23 samples; (B) Distribution of t-SNE in cancer and adjacent tissues; (C) t-SNE distribution diagram of MSI status; (D) t-SNE map of four fibroblast subpopulations after cluster analysis; (E) Dot map of the expression of the top five marker genes in the subpopulations; (F) The proportion and cell number of subpopulations in cancer and adjacent tissues; (G) KEGG enrichment analysis of the four fibroblast subpopulations; (H) Distribution of t-SNE in malignant and non-malignant cells predicted using the copykat package.

Table 1

Counting of cell counts before and after sample filtration.

Samplesraw_countclean_countPercentage (%)
B_cac1028232823100
B_cac114644461199.29
B_cac144764472299.12
B_cac151034103099.61
B_cac42666265299.47
B_cac671771299.3
B_cac71565155499.3
T_cac11692158693.74
T_cac1069769099
T_cac112865276196.37
T_cac124038401899.5
T_cac1326422642100
T_cac144071402098.75
T_cac153675365199.35
T_cac161381124390.01
T_cac21674164998.51
T_cac31183109392.39
T_cac41584157599.43
T_cac5169169100
T_cac61690164397.22
T_cac71494148099.06
T_cac899090391.21
T_cac91640152893.17
(A) t-SNE diagram of 23 samples; (B) Distribution of t-SNE in cancer and adjacent tissues; (C) t-SNE distribution diagram of MSI status; (D) t-SNE map of four fibroblast subpopulations after cluster analysis; (E) Dot map of the expression of the top five marker genes in the subpopulations; (F) The proportion and cell number of subpopulations in cancer and adjacent tissues; (G) KEGG enrichment analysis of the four fibroblast subpopulations; (H) Distribution of t-SNE in malignant and non-malignant cells predicted using the copykat package. Counting of cell counts before and after sample filtration. The marker genes of the four subpopulations were identified using the FindAllMarkers function (logfc = 0.5 [difference multiple], Minpct = 0.35 [minimum expression ratio of different genes] and corrected p-value < 0.05). The expression of the top five marker genes with the most prominent contribution was analysed in each subgroup ( ). Furthermore, the proportion of the four fibroblast subpopulations was analysed in each sample ( ), and the clusterprofiler package in R was used for KEGG enrichment analysis of marker genes in each subgroup ( ). The copykat package in R was used to screen for tumour/malignant cells and normal/non-malignant cells in each sample based on CNVs (to ensure that normal cells were not included). A total of 297 cancer cells (malignant cells) and 491 normal cells (non-malignant cells) were eventually identified ( ).

Expression of fibroblasts in tumour-related pathways

Genes involved in 10 important pathways associated with tumorigenesis and development were extracted from previous studies. shows the enrichment of fibroblasts in the 10 tumour-related pathways. In addition, the proportion of malignant and non-malignant cells and the MSI status in the fibroblast subpopulations were compared ( ), and the scores of different fibroblasts in the 10 pathways were compared ( ).
Figure 2

(A) Heat map of the scores of 10 tumour-related pathways enriched in CAFs; (B) Comparison of CAF subpopulations in malignant and non-malignant cells; (C) Comparison of CAF subpopulations in terms of MSI status; (D) Comparison of the scores of 10 tumour-related pathways between malignant and non-malignant cells in the CAF_0 subgroup; (E) Comparison of the scores of 10 tumour-related pathways between malignant and non-malignant cells in the CAF_1 subgroup; (F) Comparison of the scores of 10 tumour-related pathways between malignant and non-malignant cells in the CAF_2 subgroup; (G) Comparison of the scores of 10 tumour-related pathways between malignant and non-malignant cells in the CAF_3 subgroup; (Wilcoxon test; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001). ns, no significant.

(A) Heat map of the scores of 10 tumour-related pathways enriched in CAFs; (B) Comparison of CAF subpopulations in malignant and non-malignant cells; (C) Comparison of CAF subpopulations in terms of MSI status; (D) Comparison of the scores of 10 tumour-related pathways between malignant and non-malignant cells in the CAF_0 subgroup; (E) Comparison of the scores of 10 tumour-related pathways between malignant and non-malignant cells in the CAF_1 subgroup; (F) Comparison of the scores of 10 tumour-related pathways between malignant and non-malignant cells in the CAF_2 subgroup; (G) Comparison of the scores of 10 tumour-related pathways between malignant and non-malignant cells in the CAF_3 subgroup; (Wilcoxon test; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001). ns, no significant.

Identification of key genes in fibroblasts

A total of 1424 upregulated and 1245 downregulated genes were identified in TCGA dataset using the limma package (FDR < 0.05 and |log2 (fold change)| > 1). shows a volcano map of differential analysis.
Figure 3

(A) Volcano map of differential gene expression between cancer and adjacent tissues in TCGA dataset; (B) The scores of the four fibroblast subgroups were compared between cancer and adjacent tissues (Wilcoxon test); (C) KM curve of the high- and low-score groups in the CAF_0 subgroup; (D) KM curve of the high- and low-score groups in the CAF_1 subgroup; (E) KM curve of the high- and low-score groups in the CAF_2 subgroup; (F) KM curve of the high- and low-score groups in the CAF_3 subgroup. **P < 0.01, ****P < 0.0001.

(A) Volcano map of differential gene expression between cancer and adjacent tissues in TCGA dataset; (B) The scores of the four fibroblast subgroups were compared between cancer and adjacent tissues (Wilcoxon test); (C) KM curve of the high- and low-score groups in the CAF_0 subgroup; (D) KM curve of the high- and low-score groups in the CAF_1 subgroup; (E) KM curve of the high- and low-score groups in the CAF_2 subgroup; (F) KM curve of the high- and low-score groups in the CAF_3 subgroup. **P < 0.01, ****P < 0.0001. Based on the results of single-cell sequencing analysis, the scores of the CAF subgroups in TCGA dataset were calculated using ssGSEA to screen for marker genes in each subgroup. The results revealed that the scores of the CAF_0 subgroup were higher in cancer tissues, whereas those of CAF_1, CAF_2 and CAF_3 subgroups were higher in paracancerous tissues ( ). Subsequently, the survminer package was used to select optimal truncation based on the total survival time, and the scores of the four fibroblast subgroups were divided into the high- and low-score groups. The KM curve revealed that the high-score group of the four subgroups had a poor prognosis ( ). Furthermore, the Hmisc package was used to examine the correlation between 2669 DEGs associated with tumorigenesis and development and the scores of the four CAF subgroups. A total of 248 key genes significantly associated with the four fibroblast subpopulations were identified (p < 0.001; cor > 0.7) and subjected to univariate cox analysis using the coxph function of the survival package. The results revealed 36 genes with a high prognostic impact, which were considered prognostic risk factors (p < 0.01) ( ).
Figure 4

(A) A total of 248 candidate DEGs were identified; (B) The locus of each independent variable changing with lambda; (C) Confidence interval under lambda; (D) Multivariate Cox regression analysis (coefficient of prognosis-related genes).

(A) A total of 248 candidate DEGs were identified; (B) The locus of each independent variable changing with lambda; (C) Confidence interval under lambda; (D) Multivariate Cox regression analysis (coefficient of prognosis-related genes). These 36 key genes were further filtered using lasso regression to decrease the number of genes used for constructing a risk model. Lasso regression is a compression estimation technique. By creating a penalty function, which causes certain coefficients to be compressed and some coefficients to be set to zero, lasso regression helps to create a more refined model. Therefore, lasso regression retains the benefit of subset contraction and is a biased estimation for analysing data with complex collinearity. It selects variables during parameter estimation and improves the method of dealing with multicollinearity in regression analysis. In this study, the R software package glmnet was used to perform lasso–Cox regression. The change in each independent variable was assessed ( ), and the number of independent variable coefficients tending to 0 was found to gradually increase with the increase in lambda. The risk model was constructed using 10-fold cross-validation, and the confidence interval of each lambda was evaluated ( ). The performance of the model was optimal at a lambda of 0.0251. The four genes obtained based on this value were selected as target genes for further analysis, and multivariate cox analysis revealed that the genes were prognostic risk factors ( ).

Mutation analysis of key genes

The SNVs of the four genes were examined in TCGA dataset, and FLNA was found to have the highest mutation frequency ( ). Subsequently, we examined the collinearity and mutual exclusiveness of these four and the top 10 genes with most mutations in CRC and found that the mutations of these four genes did not exhibit significant collinearity ( ). Furthermore, the CNVs of the four genes were analysed, and only a few samples were found to have copy number amplification/deletion ( ).
Figure 5

(A) Waterfall diagram of SNVs in the 4 key genes; (B) Collinearity and mutual exclusion analysis of the 4 key genes and 10 genes with the most mutations in CRC; (C) CNVs in the 4 key genes; (D) Heat map of the correlation between the 4 key genes and aneuploidy scores, homologous recombination defects, fraction altered, number of segments and non-silent mutation rates.

(A) Waterfall diagram of SNVs in the 4 key genes; (B) Collinearity and mutual exclusion analysis of the 4 key genes and 10 genes with the most mutations in CRC; (C) CNVs in the 4 key genes; (D) Heat map of the correlation between the 4 key genes and aneuploidy scores, homologous recombination defects, fraction altered, number of segments and non-silent mutation rates. The molecular characteristics of TCGA-COAD cohort were obtained from previous pan-cancer studies. Correlation analysis revealed that MMP11 and TCF7L1 were significantly positively correlated with aneuploidy scores, homologous recombination defects and the fraction altered ( ). The enrichment scores of each pathway in TCGA cohort were calculated using the gsva package in R, and Pearson correlation analysis between the expression of the four genes and the pathway enrichment scores was performed using the Hmisc package in R. A total of 22 significantly related pathways were identified (|cor| > 0.4 and p < 0.001). shows a heat map of the relationship between the 4 genes and 22 pathways. shows a heat map of the enrichment scores of 22 pathways.
Figure 6

(A) Heat map of the correlation between genes and pathways; (B) Heat map of the enrichment scores of key pathways. *P< 0.05, **P < 0.01, ***P < 0.001.

(A) Heat map of the correlation between genes and pathways; (B) Heat map of the enrichment scores of key pathways. *P< 0.05, **P < 0.01, ***P < 0.001.

Relationship between key genes and immunity

The immune scores of each sample in TCGA dataset were evaluated using the ESTIMATE algorithm and were found to have a significant positive correlation with the four genes ( ). The samples were divided into the high- and low-expression groups based on the median expression level of the four genes, and significant differences in immune scores were observed between the high- and low-expression groups ( ).
Figure 7

(A) Heat map of the correlation between key genes and immune scores predicted using the ESTIMATE algorithm; (B) Comparison of immune scores in the high- and low-expression groups (Wilcoxon test); (C) Heat map of the correlation between key genes and immune cell scores predicted using the CIBERSORT algorithm; (D) Comparison of the scores of 22 immune cells between the high- and low-expression groups (Wilcoxon test; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001). ns, no significant.

(A) Heat map of the correlation between key genes and immune scores predicted using the ESTIMATE algorithm; (B) Comparison of immune scores in the high- and low-expression groups (Wilcoxon test); (C) Heat map of the correlation between key genes and immune cell scores predicted using the CIBERSORT algorithm; (D) Comparison of the scores of 22 immune cells between the high- and low-expression groups (Wilcoxon test; *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001). ns, no significant. The CIBERSORT method was used to determine the immune cell scores of samples in TCGA dataset. Correlation analysis revealed that the expression of the four genes was significantly negatively correlated with T cell scores but was significantly positively correlated with macrophage-related scores ( ). The samples were divided into the high- and low-expression groups based on the median expression level of the four genes, and significant differences in some immune cell scores between the high- and low-expression groups ( ).

Construction of a risk model based on key genes

The results of multivariate Cox analysis are shown in . The risk scores of samples were calculated using the following formula: RiskScore = Σ βi × Expi, where i refers to the expression levels of the four key genes, and β is the multivariate Cox regression coefficient of the corresponding genes. The final formula for calculating risk scores based on the 4-gene signature is as follows: TCGA cohort was used as the training dataset to determine the risk score of each sample. ROC analysis was performed to examine the efficiency of the risk model in predicting prognosis at 1–5 years using the R software package timeROC ( ). The AUC value for predicting prognosis at 4 and 5 years was 0.7. In addition, z-scores were evaluated for risk scores, and samples with risk scores of >0 were included in the high-risk group, whereas those with risk scores of <0 were included in the low-risk group. Subsequently, KM curves were plotted, and significant differences were observed between the two groups (p < 0.0001) ( ).
Figure 8

(A) ROC curve of the risk model constructed based on 4 genes in TCGA dataset; (B) KM curve of the risk model constructed based on 4 genes in TCGA dataset; (C) ROC curve of the risk model constructed based on 4 genes in the GSE17536 dataset; (D) KM curve of the risk model constructed based on 4 genes in the GSE17536 dataset; (E) ROC curve of the risk model constructed based on 4 genes in the GSE17537 dataset; (F) KM curve of the risk model constructed based on 4 genes in the GSE17537 dataset.

(A) ROC curve of the risk model constructed based on 4 genes in TCGA dataset; (B) KM curve of the risk model constructed based on 4 genes in TCGA dataset; (C) ROC curve of the risk model constructed based on 4 genes in the GSE17536 dataset; (D) KM curve of the risk model constructed based on 4 genes in the GSE17536 dataset; (E) ROC curve of the risk model constructed based on 4 genes in the GSE17537 dataset; (F) KM curve of the risk model constructed based on 4 genes in the GSE17537 dataset. The GSE17536 dataset was used to verify the robustness of the model using the abovementioned method. A risk model was constructed, and its efficiency in predicting prognosis at 1–5 years was analysed using the R software package timeROC ( ). The AUC value for predicting prognosis at 1 year was 0.7. In addition, z-scores were evaluated for risk scores, and samples with risk scores of >0 were included in the high-risk group, whereas those with risk scores of <0 were included in the low-risk group. Subsequently, KM curves were plotted, and significant differences were observed between the two groups (p < 0.05) ( ). The GSE17537 dataset was analysed using the same method. As shown in , the AUC value for predicting prognosis at 1–4 years was >0.7, and substantial differences were observed between the high- and low-risk groups.

Combination of risk scores and clinicopathological features to improve survival prediction

Multivariate and univariate Cox regression analyses of the risk score and clinicopathological features showed that the risk score was the most significant prognostic factor ( ). A nomogram integrating the risk scores and other clinicopathological parameters was constructed for quantifying the risk assessment and survival probability of patients with CRC ( ). The risk score had the most influence on survival rate prediction. The predictive accuracy of the risk model was further assessed using a calibration curve ( ). The calibration curve plotted for predicting prognosis at 1, 3 and 5 years and the standard curve yielded similar results, indicating the good predictive performance of the nomogram. Additionally, decision curve analysis was performed to assess the reliability of the model, and the benefits of the nomogram and risk score were found to be considerably greater than those of the extreme curve. The performance of the nomogram and risk score in predicting survival was superior to that of other clinicopathological features ( ).
Figure 9

(A, B) Univariate and multivariate Cox analyses of the risk score and clinicopathological features; (C) Nomogram model; (D) Calibration curve of the nomogram (1, 3 and 5 years); (E) Decision curve of the nomogram; (F) Compared with other clinicopathological features, the nomogram exhibited a superior capacity for survival prediction.

(A, B) Univariate and multivariate Cox analyses of the risk score and clinicopathological features; (C) Nomogram model; (D) Calibration curve of the nomogram (1, 3 and 5 years); (E) Decision curve of the nomogram; (F) Compared with other clinicopathological features, the nomogram exhibited a superior capacity for survival prediction.

Prediction of the response to PD-L1 inhibitor immunotherapy via the risk model

The capability of the risk score to predict the response of patients to ICB therapy was assessed to study its association with immunotherapy. The results showed that patients with low risk scores had significant clinical benefits and prolonged OS in the anti-PD-L1 cohort (IMvigor210 cohort) ( , p < 0.05). PD-L1 inhibitors had different effects among 348 patients in the IMvigor210 cohort, which were characterised by progressive disease (PD), stable disease (SD), partial response (PR) and complete response (CR). The risk scores of patients with SD/PD were higher than that of patients with other types of reactions ( ). Additionally, patients with low-risk scores experienced considerably superior treatment outcomes ( ). In addition, differences in survival among patients with different CRC stages in the IMvigor210 samples were analysed. The results revealed that stage I+II samples showed substantial survival differences ( ); however, stage III+IV samples did not show significant survival differences ( ).
Figure 10

(A) Differences in immunotherapy responses and risk scores in the IMvigor210 cohort; (B) Immunotherapy response among different risk groups in the IMvigor210 cohort; (C) Prognostic differences between different risk groups in the IMvigor210 cohort; (D) Prognostic differences between different risk groups of patients with early-stage CRC in the IMvigor210 cohort; (E) Prognostic differences between different risk groups of patients with middle- and late-stage CRC in the IMvigor210 cohort; (F) Differences in immunotherapy response and different risk scores in the IMvigor210 cohort were analysed using the TIDE software; (G) Differences in TIDE scores and immunotherapy responses in the IMvigor210 cohort; (H) Correlation analysis between the risk and TIDE scores in the IMvigor210 cohort. *P< 0.05, **P < 0.01, ****P < 0.0001.

(A) Differences in immunotherapy responses and risk scores in the IMvigor210 cohort; (B) Immunotherapy response among different risk groups in the IMvigor210 cohort; (C) Prognostic differences between different risk groups in the IMvigor210 cohort; (D) Prognostic differences between different risk groups of patients with early-stage CRC in the IMvigor210 cohort; (E) Prognostic differences between different risk groups of patients with middle- and late-stage CRC in the IMvigor210 cohort; (F) Differences in immunotherapy response and different risk scores in the IMvigor210 cohort were analysed using the TIDE software; (G) Differences in TIDE scores and immunotherapy responses in the IMvigor210 cohort; (H) Correlation analysis between the risk and TIDE scores in the IMvigor210 cohort. *P< 0.05, **P < 0.01, ****P < 0.0001. Furthermore, differences in immunotherapy and chemotherapy responses were analysed among patients in the IMvigor210 cohort. The risk model was used to assess the potential clinical impacts of immunotherapy using the TIDE (http://tide.dfci.harvard.edu/) software. The likelihood of immune escape increased with increasing TIDE prediction scores, indicating that patients were less likely to benefit from immunotherapy. With regard to immunotherapy, the risk and TIDE scores of patients unresponsive to immunotherapy were found to be higher, which also showed that the high-risk group was less likely to benefit from immunotherapy ( ). In addition, Pearson correlation analysis revealed a strong positive correlation between the TIDE and risk scores ( ).

Discussion

The proliferation of connective tissue is one of the key hallmarks of tumours, and the components involved in proliferation include fibroblasts, macrophages, immune cells and dense ECM (18). Fibroblasts are the main cell type in ECM, which are called CAFs. Recently, a consensus statement was issued, which stated that cancer cells with slender morphology; a lack of mutations and negative markers of epithelial cells, endothelial cells and leukocytes may be considered CAFs (19). The characteristic markers of CAFs are α-SMA and fibroblast-activating protein (FAP), and the expression of fibroblast-specific protein 1 (FSP1), platelet-derived growth factor receptor (PDGFR)-α/β and vimentin is high in CAFs. These proteins are transcribed from ACTA2, FAP, PDGFRB and NOTCH3 genes, respectively. Because morphological features are subjective and not conducive to quantification, we used ACTA2, FAP, PDGFRB and NOTCH3 genes as markers to screen for CAFs in CRC samples via single-cell sequencing. Compared with single-cell sequencing technology, the traditional transcriptome sequencing technology (bulk RNA-seq) is based on tissue samples (cell population), which reflects the average expression level of genes in the cell population. However, several studies have indicated that CAF is heterogeneous, and certain CAF subtypes stimulate tumour growth, whereas some inhibit it. For instance, in a study by Costa et al., CAF subgroup 1 created an immunosuppressive microenvironment by suppressing CD4+CD25+ T cells in breast cancer (20). Su et al. (21) reported that the new subset, CD10+GPR77+ CAFs, can facilitate the formation of tumours in patients with breast and lung cancers. Therefore, conventional sequencing technology cannot reflect the role of CAFs in tumours. In this study, cells in subgroup 9 mainly expressed ACTA2, FAP, PDGFRB and NOTCH3 and were, therefore, defined as fibroblasts. The fibroblasts of subgroup 9 were extracted, subjected to cluster analysis and further divided into four subgroups. KEGG enrichment analysis of marker genes in each subgroup revealed that the genes were mainly enriched in pathways associated with ‘ECM’ and ‘focal adhesion’, which play an important role in tumours. However, this finding does not indicate that the four CAF subgroups play the same role in tumours. Consistent with previous studies, this study revealed that the four CAF subpopulations may play different or contradictory roles in tumours. The distribution of malignant and non-malignant cells among the CAF subpopulations was significantly different. In the CAF_0 subpopulation, the proportion of malignant cells was higher, and that of cells with MSI-H was lower. However, in the other three subpopulations, the proportion of malignant cells was lower, and that of cells with MSI-H was higher. Furthermore, single-cell sequencing was used to screen for marker genes in the CAF subgroups, and the scores of the subgroups in TCGA dataset were calculated via ssGSEA. The results showed that the scores of the CAF_0 subgroup were higher in cancer tissues, and those of CAF_1, CAF_2 and CAF_3 subgroups were higher in adjacent tissues, which was consistent with the previous results, that is, the proportion of malignant tumour cells was higher in the in CAF_0 subgroup and lower in the other three subgroups. However, no significant differences in prognosis were observed among the four subgroups, and subgroups with high gene expression had a better prognosis. This finding indicates that CAFs in the same subgroup have some heterogeneity and hence cannot adequately predict the survival of patients in different subgroups. Several studies have shown that CAFs promote tumour progression in various ways, such as by remodelling ECM (22, 23), interfering with drug delivery (24), producing collagen in ECM and regulating the hardness of the tumour matrix (25). CAFs can secrete chemokines (26, 27) and cytokines (28), leading to lymphatic angiogenesis (29), so as to promote the endocrine function of cancer cells. In addition, they change the immune cell environment by recruiting immunosuppressive cells and inhibiting the activity of immune effector cells (30). In this study, the role of different CAF subtypes in tumorigenesis and development of CRC was examined, and the scores of 10 tumour-related pathways in 4 CAF subtypes were compared between malignant and non-malignant cells. The PI3K pathway was found to be highly expressed in malignant cells. Studies have shown that the PI3K pathway promotes tumour progression. The EphA2–PI3K signal can simulate angiogenesis induced by CAFs in gastric cancer cells (31). CAF-derived HGF promotes cell proliferation and drug resistance by upregulating the c-Met/PI3K/Akt and GRP78 signalling pathways in ovarian cancer cells (32). The results of this study are consistent with those of previous studies, suggesting that CAFs promote tumour progression through the PI3K pathway. To decrease the heterogeneity among subgroups, the marker genes of different CAF subgroups were used to classify CAFs. After differential expression analysis, four genes were selected via lasso regression analysis, namely, TCF7L1, FLNA, GPX3 and MMP11. TCF7L1 is a member of the TCF/lymphoid enhancer (LEF) family of transcription factors, which is involved in maintaining stem cell pluripotency (33) and skin epithelial tissue homeostasis (34). Studies have shown that ectopic TCF7L1 expression impairs the growth and invasion of highly metastatic breast cancer cells (35). In addition, overexpression of TCF7L1 can induce the growth of colorectal tumour cells (36). FLNA, the most abundant and widely distributed member of the filamin family, is a non-muscle actin filament cross-linked protein (37). Some studies have shown that FLNA is associated with multiple functional non-cytoskeletal proteins and participates in several related pathways regulating cell migration and adhesion (38). FLNA acts as a pro-oncoprotein in various human malignancies, including metastatic melanoma and hepatocellular carcinoma (39, 40). However, the expression of FLNA is decreased in breast cancer, which is negatively correlated with lymph node metastasis. FLNA knockout can promote cell migration and invasion (41). In CRC, FLNA promotes chemotherapy resistance by inducing epithelial–mesenchymal transformation and the Smad2 signalling pathway (42). Therefore, the controversial role of FLNA in human malignant tumours has been reported in several studies. GPX3 is a tumour suppressor gene and the main antioxidant enzyme in plasma. It plays an important role in scavenging hydrogen peroxide and other oxygen free radicals and protecting cells from oxidative stress-induced damage (43–45). As an important member of the MMP family, MMP11 regulates a series of physiological processes and signalling events, manipulates some bioactive molecules on the cell surface, changes the biological behaviour of cells and plays an important role in TME (46, 47). In addition, studies have shown that MMP is closely related to tumorigenesis. The most important MMP is MMP11, which is overexpressed in tumours and participates in the proliferation and malignant development of tumour cells (48, 49). However, according to previous studies, CAFs can also degrade ECM by releasing MMPs and synthesising new matrix proteins to provide structural support for tumour invasion and angiogenesis (50, 51). Therefore, MMP11 can be used for the evaluation of prognosis. The four genes identified via lasso regression were subjected to enrichment analysis, and 22 significantly related pathways were identified including those associated with ‘angiogenesis’, ‘apical junction’, ‘apoptosis’ and ‘IL2–STAT5’. The four key genes were used to establish a prognostic risk model, which had good stability and accuracy in predicting prognosis in both training and validation sets. The prognosis of patients in the high-risk group was worse. To quantify the risk assessment and survival probability of patients, the risk score was combined with other clinicopathological features, and it was found that the risk score adequately predicted clinicopathological features, especially the M stage, indicating that patients with high risk scores may be more predisposed to distant metastasis. In addition, to examine the relationship between the risk score and immunotherapy, the ability of risk score to predict the response of patients to ICB therapy was examined. Patients with low risk scores had significant clinical benefits and significantly prolonged OS in the anti-PD-L1 cohort. Furthermore, mutation analysis of the four genes in TCGA cohort revealed that FLNA had the highest mutation frequency, and there was no significant collinearity among the mutations of the four genes. Moreover, only a few samples had copy number amplification/deletion. Because the mutation frequency of the four genes is not significant, their role may be directly realised through their expression levels. Furthermore, the correlation between the prognostic risk model and infiltrating immune cells was analysed, and a significant positive correlation was observed between the four genes and immune scores, indicating that high gene expression increased the abundance of infiltrating immune cells in ECM. Moreover, these four genes had a significant negative correlation with T cell-related scores. Therefore, CAFs labelled by these genes can promote tumour progression by inhibiting T-cell function. This result is consistent with that of previous studies. CAFs can induce immune evasion of cancer cells (52, 53) and restrict the recruitment of immune effector cells (such as CD8+ T cells) to tumour tissues by secreting different chemokines (54). In this study, a significant positive correlation was observed between the four genes and the score of macrophages, which is consistent with the finding of a previously reported study, indicating that CAF can induce M2 polarisation (55). These results suggest that the interaction between stromal cells and immune-related cells in TME promotes tumour progression. However, this study has certain limitations. First, the results of single-cell sequencing were not verified in actual clinical samples. The screened key genes lack basic in vivo and in vitro experimental verification, and the prognostic model should be verified in actual clinical samples, which is our next research direction. In addition, there are some contradictory and unexplained results. For example, the distribution of different CAF subpopulations among malignant and normal cells is different; however, the prognosis among these populations was not different. Whether their distribution in malignant cells also plays an important role warrants further investigation and verification. In conclusion, the fibroblast population screened via single-cell sequencing in CRC was divided into four subpopulations through cluster analysis. The distribution and role of these four subpopulations are different in CRC. In addition, by analysing the differential expression of the main marker genes in these subpopulations, four representative genes were identified via lasso regression, namely, TCF7L1, FLNA, GPX3 and MMP11. Using the prognostic risk model constructed based on the expression of these four genes, patients with CRC were divided into the high- and low-risk groups. Patients with low risk scores had significant clinical benefits from immunotherapy and had significantly prolonged OS, which may be attributed to inhibition of T-cell function in the immune microenvironment and promotion of the function of tumour-associated macrophages.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/ .

Author contributions

JZ performed the study and wrote the paper. YC edited and proofread the paper. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
  55 in total

Review 1.  Prognostic relevance of cancer-associated fibroblasts in human cancer.

Authors:  Janna Paulsson; Patrick Micke
Journal:  Semin Cancer Biol       Date:  2014-02-19       Impact factor: 15.707

Review 2.  Cancer-associated fibroblasts: Secretions, interactions, and therapy.

Authors:  Bagher Farhood; Masoud Najafi; Keywan Mortezaee
Journal:  J Cell Biochem       Date:  2018-09-27       Impact factor: 4.429

Review 3.  The Immunoscore: Colon Cancer and Beyond.

Authors:  Helen K Angell; Daniela Bruni; J Carl Barrett; Ronald Herbst; Jérôme Galon
Journal:  Clin Cancer Res       Date:  2019-08-14       Impact factor: 12.531

4.  Cancer statistics for African American/Black People 2022.

Authors:  Angela N Giaquinto; Kimberly D Miller; Katherine Y Tossas; Robert A Winn; Ahmedin Jemal; Rebecca L Siegel
Journal:  CA Cancer J Clin       Date:  2022-02-10       Impact factor: 286.130

5.  Control of breast cancer growth and initiation by the stem cell-associated transcription factor TCF3.

Authors:  Michal Slyper; Amit Shahar; Anat Bar-Ziv; Roy Z Granit; Tamar Hamburger; Bella Maly; Tamar Peretz; Ittai Ben-Porath
Journal:  Cancer Res       Date:  2012-10-22       Impact factor: 12.701

Review 6.  Cancer cells, adipocytes and matrix metalloproteinase 11: a vicious tumor progression cycle.

Authors:  Elena Roza Motrescu; Marie-Christine Rio
Journal:  Biol Chem       Date:  2008-08       Impact factor: 3.915

Review 7.  Therapeutic strategy in unresectable metastatic colorectal cancer: an updated review.

Authors:  Benoist Chibaudel; Christophe Tournigand; Franck Bonnetain; Hubert Richa; Magdalena Benetkiewicz; Thierry André; Aimery de Gramont
Journal:  Ther Adv Med Oncol       Date:  2015-05       Impact factor: 8.168

8.  Hyaluronan impairs vascular function and drug delivery in a mouse model of pancreatic cancer.

Authors:  Michael A Jacobetz; Derek S Chan; Albrecht Neesse; Tashinga E Bapiro; Natalie Cook; Kristopher K Frese; Christine Feig; Tomoaki Nakagawa; Meredith E Caldwell; Heather I Zecchini; Martijn P Lolkema; Ping Jiang; Anne Kultti; Curtis B Thompson; Daniel C Maneval; Duncan I Jodrell; Gregory I Frost; H M Shepard; Jeremy N Skepper; David A Tuveson
Journal:  Gut       Date:  2012-03-30       Impact factor: 23.059

9.  Remodeling of extracellular matrix by normal and tumor-associated fibroblasts promotes cervical cancer progression.

Authors:  Alexandra Fullár; József Dudás; Lászlóné Oláh; Péter Hollósi; Zoltán Papp; Gábor Sobel; Katalin Karászi; Sándor Paku; Kornélia Baghy; Ilona Kovalszky
Journal:  BMC Cancer       Date:  2015-04-11       Impact factor: 4.430

10.  Cancer-associated fibroblasts promote M2 polarization of macrophages in pancreatic ductal adenocarcinoma.

Authors:  Aibin Zhang; Yigang Qian; Zhou Ye; Haiyong Chen; Haiyang Xie; Lin Zhou; Yan Shen; Shusen Zheng
Journal:  Cancer Med       Date:  2017-01-18       Impact factor: 4.452

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.