Anji Ju1,2,3, Jiaze Tang1,2,3, Shuohua Chen1,2,3, Yan Fu1,2,3, Yongzhang Luo1,2,3. 1. The National Engineering Laboratory for Anti-Tumor Protein Therapeutics, Tsinghua University, Beijing, China. 2. Beijing Key Laboratory for Protein Therapeutics, Tsinghua University, Beijing, China. 3. Cancer Biology Laboratory, School of Life Sciences, Tsinghua University, Beijing, China.
Abstract
Skin cutaneous melanoma (SKCM) is a chronically malignant tumor with a high mortality rate. Pyroptosis, a kind of pro-inflammatory programmed cell death, has been linked to cancer in recent studies. However, the value of pyroptosis in the diagnosis and prognosis of SKCM is not clear. In this study, it was discovered that 20 pyroptosis-related genes (PRGs) differed in expression between SKCM and normal tissues, which were related to diagnosis and prognosis. Firstly, based on these genes, nine machine-learning algorithms were shown to perform well in constructing diagnostic classifiers, including K-Nearest Neighbor (KNN), logistic regression, Support Vector Machine (SVM), Artificial Neural Network (ANN), decision tree, random forest, XGBoost, LightGBM, and CatBoost. Secondly, the least absolute shrinkage and selection operator (LASSO) Cox regression analysis was applied and the prognostic model was constructed based on 9 PRGs. Subgroups in low and high risks determined by the prognostic model were shown to have different survival. Thirdly, functional enrichment analyses were performed by applying the gene set enrichment analysis (GSEA), and results suggested that the risk was related to immune response. In conclusion, the expression signatures of pyroptosis-related genes are effective and robust in the diagnosis and prognosis of SKCM, which is related to immunity.
Skin cutaneous melanoma (SKCM) is a chronically malignant tumor with a high mortality rate. Pyroptosis, a kind of pro-inflammatory programmed cell death, has been linked to cancer in recent studies. However, the value of pyroptosis in the diagnosis and prognosis of SKCM is not clear. In this study, it was discovered that 20 pyroptosis-related genes (PRGs) differed in expression between SKCM and normal tissues, which were related to diagnosis and prognosis. Firstly, based on these genes, nine machine-learning algorithms were shown to perform well in constructing diagnostic classifiers, including K-Nearest Neighbor (KNN), logistic regression, Support Vector Machine (SVM), Artificial Neural Network (ANN), decision tree, random forest, XGBoost, LightGBM, and CatBoost. Secondly, the least absolute shrinkage and selection operator (LASSO) Cox regression analysis was applied and the prognostic model was constructed based on 9 PRGs. Subgroups in low and high risks determined by the prognostic model were shown to have different survival. Thirdly, functional enrichment analyses were performed by applying the gene set enrichment analysis (GSEA), and results suggested that the risk was related to immune response. In conclusion, the expression signatures of pyroptosis-related genes are effective and robust in the diagnosis and prognosis of SKCM, which is related to immunity.
Malignant skin cutaneous melanoma (SKCM) is a serious life-threatening disease, and the incidence rate of SKCM is rapidly increasing throughout the world (1, 2). SKCM lacks specific treatment other than early surgical resection, which leads to a poor prognosis and extremely high mortality (3). Although non-Caucasian populations are less likely to develop melanoma, the severity of SKCM in Africa, Asia, Central America, and South America has increased (4). Lack of prevention and early diagnosis programs may contribute to the increased prevalence of SKCM in these regions (5). Therefore, developing efficient diagnosis and prognosis methods is important for the treatment of SKCM.Pyroptosis, or caspase 1-dependent cell death, also known as cellular inflammatory necrosis, is triggered by various pathological stimuli, such as microbial infections, stroke, heart attack, and cancer (6). The term pyroptosis was first proposed in 2001 from the Greek roots pyro, relating to fire or fever, and ptosis (to-sis) to denote a falling, to describe pro-inflammatory programmed cell death (7). In addition to apoptosis, ferroptosis, and autophagy, this newly discovered type of cell death has become a hot spot recently.Pyroptosis is characterized by the rapid rupture of the plasma membrane and the release of pro-inflammatory intracellular contents. A canonical pathway of pyroptosis is triggered by the activation of inflammasomes which are cytoplasmic multi-protein platforms containing the nucleotide-binding oligomerization domain (NOD)-like receptor (NLR) family (8). Caspase-1 can be activated by inflammasomes, which leads to the cleavage of gasdermin D (GSDMD) and both the maturation and secretion of pro-inflammatory cytokines, such as IL-18 and IL-1B (9). Caspase-1-dependent plasma membrane pores dissipate cellular ion gradients, resulting in osmotic pressure increase, which leads to water influx and cell swelling (10). Ultimately, osmotic lysis occurs and inflammatory intracellular contents are released (10). Caspase-1 dependence is a defining feature of pyroptosis in which mediates cell lysis during pyroptosis and is not involved in apoptosis (11–13). Besides GSDMD, the plasma membrane pores formation can be executed by the cleavage of other gasdermin proteins, especially gasdermin E (GSDME) which can be cleaved by caspase-3 to trigger pyroptosis (14, 15).The mechanism and functions of pyroptosis in tumor cells have been extensively studied, but its relationship to cancer prognosis has been ambiguous. This is because pyroptosis plays a dual role in cancer progression. On one hand, inducing pyroptosis may be a feasible method to kill tumor cells; on the other hand, as a type of pro-inflammatory death, pyroptosis can form a suitable microenvironment for tumor cell growth and thus promote tumor growth (16–20). In SKCM, aberrant expression of PRGs was associated with metastasis, invasion, and drug resistance, in addition to mediating melanoma cell death (20). Given the existing findings, it is likely that the impact of pyroptosis on the development of melanoma is bidirectional. As a result, the role of PRG expression in the diagnosis and prognosis of SKCM remains unclear. Studying the relationship between pyroptosis and clinical features of SKCM is helpful for its treatment, but the value of pyroptosis in the diagnosis and prognosis of SKCM has not been reported. Therefore, in this systematic study, classifiers were built through machine-learning algorithms to mine out the diagnosis value of pyroptosis-related genes (PRGs) in distinguishing between SKCM and normal tissue. Then a novel PRGs prognostic risk signature in SKCM was constructed for survival predicting. Besides, prognostic risk-related phenotypes were analyzed. Thus, this study provides a novel understanding of the role of pyroptosis in SKCM and suggests that PRG signatures have the potential to diagnose and predict the prognosis.
Materials and Methods
Data Collection
The study design and grouping are shown in
. Transcriptome profiles and clinical data in SKCM patients were collected in the database of The Cancer Genome Atlas (TCGA) -SKCM (18th December 2019. https://portal.gdc.cancer.gov/) and Gene Expression Omnibus (GEO. https://www.ncbi.nlm.nih.gov/geo/) including GSE54467, GSE65904, GSE98394, and GSE112509 (21–25). Transcriptome profiles in normal skin tissues were collected in the database of Genotype-Tissue Expression Project (GTEx-SKIN. https://gtexportal.org/home/). The RNA-seq data in TCGA-SKCM, GSE98394, GSE112509, and GTEx-SKIN were converted to Transcripts per Kilobase Million (TPM) format. The microarray data in GSE54467 and GSE65904 were normalized by using the R package “limma”. Repeat values were averaged and missing values were removed. The RNA-seq data in TCGA-SKCM and GTEx-SKIN were merged and normalized by using the R package “limma”.
Figure 1
The flowchart of the overall procedures. This flowchart illustrates the process of data collection and analyses for diagnostic and prognostic studies.
The flowchart of the overall procedures. This flowchart illustrates the process of data collection and analyses for diagnostic and prognostic studies.
Identification of Differentially Expressed Genes
Twenty PRGs (listed in
) were retrieved in the GeneCards database (8th January 2020. https://www.genecards.org/) by the keyword “pyroptosis” and verified in several reviews (26–29). The “limma” package was used to identify differentially expressed genes (DEGs) between SKCM and normal tissues with the FDR-adjusted p-value, i.e. the q-value < 0.1. The correlation of DEGs was analyzed and demonstrated by using the R package “corrplot”. The significance of relationships between OS and the DEGs in TCGA-SKCM was determined using univariate Cox regression analysis and the q-value < 0.1 was chosen as the criteria, which was carried out by using the “survival” R package. A protein-protein interaction (PPI) network for the DEGs was obtained from Search Tool for the Retrieval of Interacting Genes (STRING v11.0, https://string-db.org/).
Construction and Evaluation of PRGs-Based Classifiers for SKCM Diagnosis
Data from GSE98394 were randomly divided into a training set and a testing set according to 7:3. Data from the training set were used to train the classifiers respectively based on the K-Nearest Neighbor (KNN), logistic regression, Support Vector Machine (SVM), Artificial Neural Network (ANN), decision tree, random forest, XGBoost, LightGBM, and CatBoost via following Python packages: Scikit-learn (sklearn) v0.23.2, XGBoost v1.3.3, LightGBM v3.1.1, and CatBoost v0.24.4 (30–33).The “sklearn.metrics” Python package was used to evaluate the PRGs-based classifiers, and the “matplotlib” Python package was used to plot the receiver operating characteristic (ROC) curves. Besides the area under ROC curves (AUC), accuracy, precision (also known as positive predictive value), recall (also known as sensitivity), and F1 score were calculated to evaluate the prediction performance of the models by using the “sklearn.metrics” Python package. To assess the quality of the models, the Gini index and Kolmogorov–Smirnov (KS) value were calculated according to the methods described previously (34).Data from the testing set were used to perform internal evaluations and parameter tuning. Major parameters used in the above algorithms are listed in
. For external evaluations, data from TCGA-SKCM & GTEx-SKIN (validation 1 set) and GSE112509 (validation 2 set) were used. Data from each group were normalized by employing the “StandardScaler” function from the “sklearn.preprocessing” Python package before training and evaluations.
Consensus Clustering Analysis of PRGs
To classify the SKCM by consensus clustering, R packages “limma” and “ConsensusClusterPlus” were used. The “prcomp” function in the “stats” R package was used to conduct principal component analysis (PCA) based on the clusters. The correlations between clusters and clinical characteristics, including overall survival (OS), were analyzed by employing the chi-square test and R package “survival”. The results were presented by heat maps and Kaplan-Meier (KM) curves via R packages “pheatmap”,”survival”, and “survminer”.
Construction of PRGs-Based SKCM Prognostic Model
The least absolute shrinkage and selection operator (LASSO) Cox regression analysis was performed by using the R package “glmnet” to narrow down the candidate genes and to develop the prognostic model. The penalty parameter (λ) was determined by the minimum parameters. The risk scores were calculated using the following equations:where Coef is the coefficient and Exp is the expression level of every retained gene. Data from TCGA-SKCM were randomly divided into a training set and a testing set according to 7:3. The risk score was calculated by using the data from the training set. Data from the testing set was used for the internal evaluation. Data from GSE54467 and GSE65904 were merged and normalized as a validation set by using the R package “limma” for the external evaluation. The R packages “survival” and “survminer” was employed to perform KM analyses. The R package “survivalROC” was employed to perform 3- and 5- year ROC analysis.The correlation between subgroups and clinical characteristics in TCGA-SKCM was analyzed by employing the chi-square test and presented by heat map. The relationship and independence of the clinical factors and the risk score calculated from the prognostic model were determined using univariate and multivariate Cox regression analyses, which were carried out by using the “survival” R package.
Gene Sets Enrichment Analysis
The DEGs (|log2FC| ≥ 1 and FDR < 0.05) between the low- and high-risk subgroups in TCGA-SKCM were filtered, which was carried out with the Gene Ontology (GO) analysis by using the “clusterProfiler” R package. Besides, gene set enrichment analysis (GSEA) was used in TCGA-SKCM to identify the biological processes that were significantly alerted between the high-risk and low-risk subgroups (35, 36). The Java GSEA software (version 4.0.1) was employed and the gene set “c2.cp.kegg.v7.4.symbols.gmt” from the database of Kyoto Encyclopedia of Genes and Genomes (KEGG) was chosen as the reference (37–39). Biological processes with the normalized p < 0.05 and the false discovery rate (FDR) q value < 0.05 were considered as statistically significant. The top biological processes that had been altered were chosen based on a ranking of normalized enrichment ratings (NES).
Immune Infiltration Analysis
Transcriptome data from TCGA-SKCM was transformed into the total abundance of immune cells by utilizing the Cell-type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT) analysis with the “CIBERSORT” R package (40, 41). Patients were divided into low- and high-infiltration subgroups according to the median level. Tumor IMmune Estimation Resource 2.0 (TIMER2.0, http://timer.cistrome.org/) was employed to analyze the correlation between the immune infiltration and OS in SKCM (42–44).
Statistical Analyses
Wilcoxon test was applied to compare the gene expression levels between the normal skin and SKCM tissues and the immune infiltration levels between subgroups. The two-sided log-rank test was used to compare the OS between subgroups. Other statistical methods are specifically described above. All statistical analyses were accomplished with R (v3.6.2) and Anaconda 3 (Python v3.8.5).
Results
Identification of Differentially Expressed PRGs Between Normal Skin and SKCM Tissues
Expression levels of 20 PRGs were compared between 557 normal and 471 tumor tissues from GTEx-SKIN and TCGA-SKCM data. It was observed that all the 20 PRGs were significantly differentially expressed (all q-value < 0.1.
and
). Among them, 11 genes (CASP1, PYCARD, APIP, FOXO3, IL18, GSDMA, GSDMC, CASP4, GSDMB, NLRP1, and NAIP) were downregulated while 9 genes (NLRP9, DHX9, CASP3, NLRC4, AIM2, NLRP3, IL1B, GSDME, and GSDMD) were upregulated in tumor tissues. In addition, 13 genes showed significant associations with OS (
). Among them, 11 genes were protecting factors (hazard ratio < 1) and 2 genes were risk factors (hazard ratio > 1).
Figure 2
Expressions and the associations with OS of the 20 PRGs. (A) Violin plot of PRGs between the normal (blue) and the tumor tissues (red). q, FDR-adjusted p-value. (B) Significance and hazard ratio (95% CI) values of OS-related PRGs in univariate Cox regression. CI, confidence interval. q, FDR-adjusted p-value. (C) PPI network showing the interactions of the PRGs (interaction score = 0.9). The bottom boxes show the types of interactions.
Expressions and the associations with OS of the 20 PRGs. (A) Violin plot of PRGs between the normal (blue) and the tumor tissues (red). q, FDR-adjusted p-value. (B) Significance and hazard ratio (95% CI) values of OS-related PRGs in univariate Cox regression. CI, confidence interval. q, FDR-adjusted p-value. (C) PPI network showing the interactions of the PRGs (interaction score = 0.9). The bottom boxes show the types of interactions.To further explore the interactions of these PRGs, PPI and expression correlation analysis were performed (
and
). The minimum required interaction score for the PPI analysis was set at 0.9 (the highest confidence). The results suggested that CASP1, CASP3, GSDMD, NLRP3, PYCARD, AIM2, and NLRC4 play central roles in the pyroptosis process of SKCM. The Human Protein Atlas (www.proteinatlas.org) was used to retrieve immunohistochemistry staining images of proteins encoded by PRGs in SKCM, showing cellular sublocalization of these molecules (
). It can be seen that several widely reported pyroptosis-related proteins were in high levels, including AIM2, CASP1, CASP3, GSDMD, and GSDME, which indicate that pyroptosis occurred in a large part of SKCM tissues.
Diagnosis Value of PRGs-Based Classifiers in SKCM
Given the significant difference in PRG expression between normal and tumor tissues, it was hypothesized that PRGs can be used to diagnose SKCM. To verify this hypothesis, nine commonly used machine-learning algorithms were used to construct diagnostic classifiers, including KNN, logistic regression, SVM, ANN, decision tree, random forest, XGBoost, LightGBM, and CatBoost. Data from GSE98394 that contains primary melanoma and common acquired nevi were randomly divided into a training set and a testing set according to 7:3. Respectively, classifiers based on the above algorithms were trained by using the RNA-seq data in the training set. The testing set was designed to perform internal evaluations. As expected, RNA-seq data of PRGs were suitable for building the SKCM diagnostic classifiers, because of the high accuracy in the training and testing set (
,
, and
).
Table 1
Evaluation parameters of classifiers in different datasets.
Classifiers
Datasets
Precision
Recall
F1_score
Accuracy
ROC AUC
Gini
KS
KNN
Training
1.000
0.974
0.987
98.30%
0.997
0.995
0.974
Testing
1.000
1.000
1.000
95.00%
1.000
1.000
1.000
Validation 1
0.803
0.987
0.886
88.30%
0.962
0.923
0.861
Validation 2
0.843
0.754
0.796
72.50%
0.772
0.545
0.426
L2 Logistic Regression
Training
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Testing
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Validation 1
0.857
0.994
0.920
92.10%
0.994
0.988
0.953
Validation 2
0.852
0.912
0.881
82.50%
0.844
0.689
0.642
SVM
Training
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Testing
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Validation 1
0.849
1.000
0.918
91.80%
0.997
0.993
0.957
Validation 2
0.841
0.930
0.883
82.50%
0.831
0.661
0.590
ANN
Training
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Testing
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Validation 1
0.870
0.994
0.928
92.90%
0.995
0.990
0.955
Validation 2
0.897
0.912
0.904
86.30%
0.858
0.716
0.677
Decision Tree
Training
0.974
0.974
0.974
96.60%
0.962
0.924
0.924
Testing
0.833
0.769
0.800
75.00%
0.742
0.484
0.484
Validation 1
0.929
0.949
0.939
94.40%
0.944
0.888
0.888
Validation 2
0.804
0.649
0.718
63.70%
0.629
0.258
0.258
Random Forest
Training
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Testing
1.000
0.923
0.960
95.00%
1.000
1.000
1.000
Validation 1
0.816
0.998
0.898
89.60%
0.995
0.991
0.952
Validation 2
0.873
0.842
0.857
80.00%
0.854
0.709
0.615
XGBoost
Training
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Testing
0.923
0.923
0.923
90.00%
0.984
0.967
0.923
Validation 1
0.703
0.970
0.815
79.90%
0.971
0.941
0.906
Validation 2
0.857
0.842
0.850
78.70%
0.840
0.680
0.607
LightGBM
Training
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Testing
0.917
0.846
0.880
85.00%
0.978
0.956
0.857
Validation 1
0.768
0.994
0.867
86.00%
0.988
0.975
0.929
Validation 2
0.883
0.930
0.906
86.30%
0.839
0.679
0.643
Catboost
Training
1.000
1.000
1.000
100.00%
1.000
1.000
1.000
Testing
1.000
0.923
0.960
95.00%
0.989
0.978
0.923
Validation 1
0.836
0.996
0.909
90.90%
0.994
0.989
0.940
Validation 2
0.879
0.895
0.887
83.80%
0.855
0.710
0.624
Figure 3
Performance evaluations of diagnostic classifiers based on 9 algorithms. (A–I) ROC curves for evaluating the predictive performance of the diagnostic models respectively based on K-Nearest Neighbor (A), logistic regression (B), Support Vector Machine (C), Artificial Neural Network (D), decision tree (E), random forest (F), XGBoost (G), LightGBM (H), and CatBoost (I). Data from GSE98394 were randomly divided into a training set (not shown due to the AUCs were extremely close to 1.0 in all classifiers) and a testing set (blue line) according to 7:3. Validation 1: the combination of TCGA-SKCM & GTEx-SKIN (red line). Validation 2: GSE112509 (green line). (J) Columns showing the accuracy (%) of each classifier in different datasets. (K) Columns showing the F1 score of each classifier in different datasets. (L) Columns showing the Gini index of each classifier in different datasets. (M) Columns showing the KS value of each classifier in different datasets.
Evaluation parameters of classifiers in different datasets.Performance evaluations of diagnostic classifiers based on 9 algorithms. (A–I) ROC curves for evaluating the predictive performance of the diagnostic models respectively based on K-Nearest Neighbor (A), logistic regression (B), Support Vector Machine (C), Artificial Neural Network (D), decision tree (E), random forest (F), XGBoost (G), LightGBM (H), and CatBoost (I). Data from GSE98394 were randomly divided into a training set (not shown due to the AUCs were extremely close to 1.0 in all classifiers) and a testing set (blue line) according to 7:3. Validation 1: the combination of TCGA-SKCM & GTEx-SKIN (red line). Validation 2: GSE112509 (green line). (J) Columns showing the accuracy (%) of each classifier in different datasets. (K) Columns showing the F1 score of each classifier in different datasets. (L) Columns showing the Gini index of each classifier in different datasets. (M) Columns showing the KS value of each classifier in different datasets.In addition to the accuracy, ROC curves were used to evaluate the sensitivity and specificity of the classifiers. In the testing set, except for the poor performance of the decision tree, the AUC values of the other eight algorithms were all higher than 0.900 (
). This suggests that PRGs had a very high capacity to distinguish between normal and tumor samples in a single study (GSE98394). To verify the performance of these classifiers in out-of-sample data with different sample sizes and levels of balance, two external validation sets were used to perform ROC analysis. In the validation set 1 (TCGA-SKCM & GTEx-SKIN) with a relatively larger and more balanced sample size (the numbers of melanoma and normal skin samples are 471 and 557), all classifiers performed well (
, red line). Furthermore, except for KNN and decision tree, classifiers based on the other seven algorithms worked well in the validation set 2 (GSE112509) with a relatively smaller and unbalanced sample size (the numbers of melanoma and normal nevi samples are 57 and 23) (
, green line).In order to further evaluate the classifiers, precision, recall and F1 score were calculated and the results were consistent with the ROC analysis (
and
). Moreover, the Gini index and KS value were estimated to confirm the results (
, and
). Commonly, when these parameters are close to 1.000, it indicates that the classifier has a strong ability to distinguish. Besides, the classifier is robust when the difference of these parameters among datasets is minimal. Considering all the evaluation parameters, it was found that ANN is the most suitable algorithm to construct the diagnostic model based on PRGs in this study, while logistic regression, random forest, and SVM also performed well, which suggests that the expression signature of PRGs has a high diagnostic benefit in SKCM.
Identification of SKCM Clusters Using Consensus Clustering
In order to investigate the therapeutic utility of PRGs, we attempted to divide the SKCM samples into clusters depending on gene expression patterns (
). The number of clusters was represented by the letter “k”. The empirical CDF was plotted to determine the optimum k value for the sample distribution to reach maximal stability (
). Consensus matrices showed that, with k = 2, patients in TCGA-SKCM could be divided into two distinct and non-overlapping clusters, which was verified by the PCA (
and
). It was observed that there are significant differences in OS and the stage of SKCM (
). As shown in
, cluster 2 had a significantly poorer OS than cluster 1 (HR = 1.74).
Figure 4
Consensus clustering analysis of PRGs. (A) PCA plot for clusters. (B) KM curves showing the OS of cluster 1 (blue) and cluster 2 (red). HR, hazard ratio. CI, confidence interval. (C) Heatmap and the clinical characters of the two clusters (T, N, and M are the tumor node metastasis classification) (*p < 0.05).
Consensus clustering analysis of PRGs. (A) PCA plot for clusters. (B) KM curves showing the OS of cluster 1 (blue) and cluster 2 (red). HR, hazard ratio. CI, confidence interval. (C) Heatmap and the clinical characters of the two clusters (T, N, and M are the tumor node metastasis classification) (*p < 0.05).
Prognostic Value of PRGs Expression Signature in SKCM
Cox regression analysis was used to evaluate the correlations between each PRG and survival status to assess the prognostic value of PRGs expression signature. Data from TCGA-SKCM were randomly divided into a training set and a testing set according to 7:3. To narrow down the candidate genes and construct the prognostic model, the LASSO Cox regression model was used in the training set. Nine genes and their coefficients (
) were eventually preserved, and the penalty parameter (λ) was determined by the minimum parameters (
). Data from GSE54467 and GSE65904 were merged and normalized as a validation set for the external evaluation. The risk scores in the test and validation sets were calculated by the same equation obtained from the training set (
).
Table 2
Coefficients in the LASSO Cox regression model.
i
Gene
Coef
1
GSDMD
-0.006861
2
GSDME
0.0003969
3
CASP4
-0.001943
4
GSDMC
0.0079361
5
NLRC4
-0.022123
6
APIP
-0.009636
7
AIM2
-0.003569
8
CASP3
-0.00106
9
IL18
-0.000169
Figure 5
Construction of the PRGs-based prognostic model. (A) LASSO regression of the 7 OS-related genes. (B) Cross-validation for tuning the parameter in the LASSO regression. (C–H) KM curves showing the OS of the low- (blue) and high- (red) risk subgroups. ROC curves demonstrated the predictive efficiency of the risk score for 3- and 5-year survival. Data from TCGA-SKCM were randomly divided into a training set (C, D) and a testing set (E, F) according to 7:3. GSE54467 and GSE65904 were merged as the validation set (G, H). HR, hazard ratio. CI, confidence interval.
Coefficients in the LASSO Cox regression model.Construction of the PRGs-based prognostic model. (A) LASSO regression of the 7 OS-related genes. (B) Cross-validation for tuning the parameter in the LASSO regression. (C–H) KM curves showing the OS of the low- (blue) and high- (red) risk subgroups. ROC curves demonstrated the predictive efficiency of the risk score for 3- and 5-year survival. Data from TCGA-SKCM were randomly divided into a training set (C, D) and a testing set (E, F) according to 7:3. GSE54467 and GSE65904 were merged as the validation set (G, H). HR, hazard ratio. CI, confidence interval.According to the median risk score, patients in the training set were divided into low- and high-risk subgroups, and a significant difference in OS was observed via the KM survival analysis (
). The lifespan of patients in the high-risk subgroup was shorter than those in the low-risk subgroup. The sensitivity and specificity of the prognostic model were determined using the time-dependent ROC analysis, and the AUC was 0.640 for 3-year survival and 0.711 for 5-year survival, respectively (
). Furthermore, patients in the test and validation sets were also divided according to the median risk score. The OS and ROC analyses of these two subgroups showed similar results to the training set (
).In addition, significant differences in the tumor stage were observed between low-and high-risk subgroups, such as more stage-IV and fewer T1 samples in the high-risk subgroup (
). This observation led us to wonder whether the risk score could function as an independent prognostic factor in SKCM. To prove this hypothesis, univariate and multivariable Cox regression analyses were performed. Firstly, univariate Cox regression analysis revealed that the risk score was certainly related to prognosis, with the greater the risk score, the poorer the prognosis (HR = 2.470, p < 0.001.
). Secondly, multivariable Cox regression analysis showed that the risk score is an independent prognostic risk factor (HR=2.078, p < 0.001.
). These results suggest that the PRGs-based prognostic model is robust and independent in predicting the prognosis of SKCM.
Figure 6
Univariate and multivariate Cox regression analyses for the risk score. (A) Heatmap and the clinical characters of low- and high-risk subgroups (T, N, and M are the tumor node metastasis classification) (*p < 0.05, ***p < 0.001). (B, C) Univariate (B) and multivariate (C) Cox regression analyses showing the significance and hazard ratio values of risk score and clinical characters. CI, confidence interval.
Univariate and multivariate Cox regression analyses for the risk score. (A) Heatmap and the clinical characters of low- and high-risk subgroups (T, N, and M are the tumor node metastasis classification) (*p < 0.05, ***p < 0.001). (B, C) Univariate (B) and multivariate (C) Cox regression analyses showing the significance and hazard ratio values of risk score and clinical characters. CI, confidence interval.
Identification of the Prognostic Model-Related Biological Processes
It is meaningful to figure out what biological processes were influenced by the prognostic risk model to make them predictive. To answer this question, functional enrichment analyses were performed. Firstly, GO enrichment was employed to analyze the DEGs between the low- and high-risk subgroups. It was observed that genes related to immune cell activation and proliferation had different expression levels (
). Secondly, to further verify this observation, GSEA was utilized to find enriched pathways in the KEGG database. Results showed that 53 gene sets were significantly upregulated in the low-risk subgroup (normalized p < 0.05 and FDR q < 0.05) but no gene set was significantly upregulated in the high-risk subgroup (
). Interestingly, it was observed that the most enriched biological processes in the low-risk subgroup were closely associated with immune responses (
and
), including the chemokine signaling pathway (NES = 2.566), Toll-like receptor signaling pathway (NES = 2.507), leukocyte transendothelial migration (NES = 2.488), T cell receptor signaling pathway (NES = 2.423), cytokine-cytokine receptor interaction (NES = 2.402), NK cell-mediated cytotoxicity (NES = 2.238), etc. These results proved that the PRGs-based prognostic risk model is related to immune responses. Based on these findings, we proposed that the effects of PRGs on predicting the prognosis of SKCM could be related to the immune microenvironment. CIBERSORT was employed to estimate the immune cell component in SKCM tissues. The proportion of 22 human immune cell subpopulations, including naive and memory B cells, plasma cells, seven T cell types, NK cells, and myeloid subsets, was assessed. Results suggested that fractions of activated CD4+ memory T cells, γδ T cells, and M1 macrophages were significantly higher in the low-risk subgroup, whereas the high-risk subgroup had a higher fraction of M2 macrophages (
,
). In addition, we retrieved the relationship between immune cell infiltration and cumulative survival with Timer2.0 (
). Interestingly, only the contents of macrophages showed significant associations with survival, where the high level of M1 macrophages or the low level of M2 macrophages indicated better survival (
). Inflammation can be regulated by various types of tumor-associated macrophages (45). These findings suggest that, in the PRGs-based prognostic model, high-risk patients have less pro-inflammatory M1 macrophages and more anti-inflammatory M2 macrophages than low-risk patients, eventually resulting in a worse prognosis.
Figure 7
Functional enrichment analyses. (A) Bubble graph for GO enrichment (the bigger bubble means the more genes enriched, and the increasing depth of red means the differences were more significant). (B–G) Representative enrichment plots generated by GSEA reveal that the low risk was significantly associated with chemokine signaling pathway (B), Toll-like receptor signaling pathway (C), leukocyte transendothelial migration (D), T cell receptor signaling pathway (E), cytokine-cytokine receptor interaction (F), NK cell-mediated cytotoxicity (G).
Functional enrichment analyses. (A) Bubble graph for GO enrichment (the bigger bubble means the more genes enriched, and the increasing depth of red means the differences were more significant). (B–G) Representative enrichment plots generated by GSEA reveal that the low risk was significantly associated with chemokine signaling pathway (B), Toll-like receptor signaling pathway (C), leukocyte transendothelial migration (D), T cell receptor signaling pathway (E), cytokine-cytokine receptor interaction (F), NK cell-mediated cytotoxicity (G).
Identification of Risk-Related Genes
Since PRGs have been shown to have prognostic significance, identifying risk-related genes would aid in further research into the function of pyroptosis in SCKM. The correlation of the prognostic risk score and the expression level of each gene was analyzed by Pearson’s correlation analysis to screen the most relevant genes. Genes with the p < 0.05 and the absolute value of Pearson Correlation Coefficient (|Cor|) ≥ 0.6 were considered as the strong-correlated genes (
). Among them, the most relevant gene is NLRC4 which is also a component of the prognostic model. Respectively, KM survival analyses were performed for each gene with the p < 0.05 and |Cor| ≥ 0.7 (
). It was observed that all the six most relevant genes were significantly associated with survival, and higher expression means longer lifespan (
). These results imply that these genes may be involved in the pyroptosis of SKCM and function as protectors to patients.
Figure 8
Identification of risk-related genes. (A–F) Representative results of correlation analysis between the risk score and each gene in SKCM. Cor: correlation coefficient. (G–L) KM curves showing the relationship between the six most relevant genes and OS. HR, hazard ratio. CI, confidence interval.
Identification of risk-related genes. (A–F) Representative results of correlation analysis between the risk score and each gene in SKCM. Cor: correlation coefficient. (G–L) KM curves showing the relationship between the six most relevant genes and OS. HR, hazard ratio. CI, confidence interval.
Discussion
Since pyroptosis may be a double-edged sword for cancerpatients, the most straightforward and concrete way to explain its importance is to develop pyroptosis-related prognostic and diagnostic models. The mRNA levels of 20 PRGs were investigated in SKCM and normal tissues in this study, and it was discovered that they were all differentially expressed. The significance of these genes related to the survival of patients was studied. Several genes that were highly expressed in SKCM and lowly expressed in normal skin tissues, but those genes were shown to be associated with a better prognosis, such as GSDMD and NLRC4, which is consistent with previous findings (46). Furthermore, diagnosis by a single gene is difficult and inaccurate. So it seems that a single PRG is unreliable for SKCM diagnosis and predicting the prognosis. This has inspired us to explore the diagnostic and prognostic value of pyroptosis by using a multi-PRG signature.First of all, we established the SKCM-normal classifiers based on nine commonly used algorithms. Although there was little overfitting, the classifiers still had reasonable generalization ability and classification performance, especially classifiers based on the ANN, logistic regression, random forest, and SVM. Except for the decision tree, classifiers constructed from other tree-based algorithms (random forest, XGBoost, LightGBM, and Catboost) also had excellent performance. It’s worth noting that differences in immune infiltration and phenotypic patterns may lead to differences in diagnostic model performance between validation set 1 and 2, although these models performed well in these datasets. Since the PRG signature had the potential to diagnose SKCM but performed differently across the datasets, it is critical to collect more training samples and further tune parameters for the advancement of this SKCM diagnostic method. Clinically, because the PRGs-based classifiers were constructed using a dataset containing benign nevi and melanoma (GSE98394), and it was validated in datasets containing normal skin tissue (GTEx-SKIN) and benign nevi (GSE112509), they have the potential to provide a novel approach for distinguishing between malignant melanoma and benign nevus.Secondly, we proved that PRGs expression signature has prognostic value in SKCM. To verify the hypothesis, it was found that PRGs could cluster SKCM patients, and patients in different clusters have different clinical outcomes. This suggested that the occurrence of pyroptosis in tumor tissues may be different in SKCM patients, which led to a different OS. Then we constructed a 9-gene prognostic risk model via LASSO Cox regression analysis, and patients in different risk subgroups had different OS, which was then validated to perform well in the external datasets.Through the enrichment analysis of biological processes for different risk subgroups, it was found that there were significant differences in immune-related signaling pathways, which is in line with our expectations. Because the process of pyroptosis can lead to the secretion of many inflammatory cytokines, and it is also the result of inflammasome activation (6, 29). Interestingly, in addition to the representative results shown in
, we also found several signaling pathways associated with immunological rejection and autoimmune-related diseases including Type 1 diabetes. This may be due to the fact that certain patients have been treated with immune checkpoint therapy, such as ipilimumab (47, 48). While our study centered on melanoma, the importance of pyroptosis in immune checkpoint and autoimmune diseases deserves more investigation. In addition to immune checkpoint therapy, some commonly used melanoma-targeting drugs, including BRAF and MEK inhibitors, also affect the immune microenvironment through pyroptosis (49). Therefore, we hypothesize that patients will benefit from these drugs, and their curative efficacy can be monitored by PRGs-based risk score to guide the treatment.Furthermore, pyroptosis was firstly discovered in the infectious pathogenic bacteria Shigella and Salmonella, which induced lytic cell death in macrophages by activating caspase-1 through secreted effector proteins SipB and IpaB, respectively (50, 51). As for SKCM, circulating macrophages are selectively recruited into tumors during tumor development, where they modify the tumor microenvironment. In response to numerous microenvironmental signals produced by tumor and stromal cells, macrophages change their functional phenotypes including M1 and M2. On one hand, M1 macrophages participate in the inflammatory response, pathogen clearance, and antitumor immunity. M1 macrophages have high levels of the main histocompatibility complex class I (MHC1) and class II (MHC2) molecules, which are needed for tumor-specific antigen presentation. As a result, M1 macrophages play an important role in the inflammatory response as well as antitumor immunity. On the other hand, the M2 macrophages influence the anti-inflammatory response, wound healing, and pro-tumorigenic properties. Tumor-associated macrophages (TAMs) are M2-polarized macrophages that are important modulators of the tumor microenvironment to accelerate tumor progression (52). Coincidentally, by analyzing the fraction and types of immune cells in the microenvironment, we found that M1 and M2 macrophages were different between low- and high-risk subgroups (
). Nevertheless, it is crucial to emphasize that using bulk sequencing in tissues to estimate the immune infiltration is imprecise, so that further research on the relationship between pyroptosis and TAM in melanoma tissues, as well as their relevance to patient outcomes, is worth discussing in future works.There could be many complicating factors that drive the variations in gene expression among different tissues, especially the expression of PRGs and inflammation-related genes, including the percentage of immune cell infiltration and the differentiation status of melanoma (49, 53). Although these factors did not affect the use of PRG expression signature to diagnose and predict the prognosis of SKCM, the links between pyroptosis and immune cell infiltration, the differentiation status of SKCM, and other factors are interesting to investigate, which may provide novel inspirations to predict the diagnosis and prognosis of SKCM. To determine whether samples with either high or low PRGs is related to immune cell contamination, we investigated if critical PRGs were linked to the expression of inflammatory components. Firstly, it was reported that GSDME who can be activated by Caspase 3 to mediate pyroptosis is expressed in the majority of melanomas (
and
) (
16). Thus, CIBERSORT was utilized to analyze the TCGA-SKCM dataset and it was observed that the expression level of GSDME was not significantly related to immune invasion (data not shown), which indicated that the presence of GSDME may not be caused by immune cell contamination. Secondly, it was found that, compared with the canonical Caspase 1-mediated pyroptosis pathway, the expression level of Casp3 was highly correlated with the expression levels of some inflammasomes and inflammatory cytokine-related genes (
). Taken together, it was indicated that a number of melanoma cells underwent Casp3/GSDME pathway-mediated pyroptosis and hence generated inflammatory cytokines to recruit immune cells. In addition, much more research will be required in the future to fully understand how the PRG-related prognostic and diagnostic models work.Finally, we analyzed genes associated with risk scores. In particular, NLRC4 was the most associated gene with the risk score, even though it was one component of the prognostic model. This suggests that NLRC4 inflammasomes may be more involved in SKCM. The risk score can be estimated using the expression level of a single NLRC4 gene since it is associated with OS in SKCM (
). Furthermore, it was reported that Nlrc4-/- mice were shown to have increased tumor development when injected subcutaneously with mouse B16F10 melanoma (54). Therefore, the impairment of NLRC4 inflammasome in melanoma cells and the function in pyroptosis are worth further study.In decades, there have been many studies on data mining and modeling based on gene expression profiles and clinical outcomes of melanomapatients, which can provide reliable models for the diagnosis and prognosis of melanoma (55–58). Compared with these studies, we used candidate genes to establish diagnostic and prognostic models in one study, which proved that PRGs were significantly valuable for the diagnosis and prognosis of SKCM. In terms of diagnostic models, we employed a variety of traditional algorithms and compared their effectiveness. In future works, we will refer to previous reports to discuss the potential of these models in predicting the metastasis of melanoma (55). As for the prognosis of SKCM, we solely selected the mRNA levels of the protein-coding PRGs to establish the prognostic model. Although there were still some gaps with some models based on other genes, the results suggested that PRGs had an effective prognostic performance in SKCM (56). In addition to transcriptome data that can be used for these analyses, we hypothesize that other omics data, such as proteome and metabolome, can be used similarly for tumor diagnosis and prognosis.In conclusion, our study showed 20 PRGs differentially expressed between SKCM and normal tissues, and their association with diagnosis and prognosis. Then we showed that these genes can be used to distinguish between normal and SKCM tissues. Furthermore, the risk score derived from the prognostic model based on 9 PRGs was an independent risk factor for predicting SKCM prognosis, which was found to be related to the immune microenvironment.
Data Availability Statement
The original contributions presented in the study are included in the article/
. Further inquiries can be directed to the corresponding author.
Author Contributions
AJ, JT, and SC conceived the study. AJ designed the study and analyzed the data. AJ wrote the manuscript, which was reviewed by YL and YF. All authors contributed to the article and approved the submitted version.
Funding
This research was supported by the Science and Technology Major Project of National Health Commission of the PRC (No. 20181821569) and Self-Topic Fund of Tsinghua University (No. 20191080585).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Authors: H A Tawbi; J H Beumer; A A Tarhini; S Moschos; S C Buch; M J Egorin; Y Lin; S Christner; J M Kirkwood Journal: Ann Oncol Date: 2012-11-21 Impact factor: 32.976
Authors: Helena Cirenajwis; Henrik Ekedahl; Martin Lauss; Katja Harbst; Ana Carneiro; Jens Enoksson; Frida Rosengren; Linda Werner-Hartman; Therese Törngren; Anders Kvist; Erik Fredlund; Pär-Ola Bendahl; Karin Jirström; Lotta Lundgren; Jillian Howlin; Åke Borg; Sofia K Gruvberger-Saal; Lao H Saal; Kari Nielsen; Markus Ringnér; Hensin Tsao; Håkan Olsson; Christian Ingvar; Johan Staaf; Göran Jönsson Journal: Oncotarget Date: 2015-05-20
Authors: Elizabeth S Borden; Anngela C Adams; Kenneth H Buetow; Melissa A Wilson; Julie E Bauman; Clara Curiel-Lewandrowski; H-H Sherry Chow; Bonnie J LaFleur; Karen Taraszka Hastings Journal: Cancers (Basel) Date: 2021-12-21 Impact factor: 6.575