Bizhi Tu1, Yaya Jia2, Jun Qian1. 1. Department of Orthopedics, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui, People's Republic of China. 2. Department of Pediatrics, The Shanxi Medical University, Taiyuan, Shanxi, People's Republic of China.
Abstract
Background: Human sarcomas (SARC) are a group of malignant tumors that originate from mesenchymal lineages and comprise more than 60 subtypes. However, potential biomarkers for the diagnosis and prognosis of SARC remain to be investigated. Methods: We obtained three GSE raw matrix files (GSE39262, GSE21122, GSE48418) covering various sarcoma subtypes from the public GEO database and identified the differentially expressed genes in each file. Genes that were differentially expressed in all three files were defined as common differential expression genes (CDEGs). We analyzed the correlation between the expression of the top five interacting CDEGs and genome-wide differences, prognosis, genetic mutation, functional enrichment, immune infiltration, immune checkpoints, and the expression of N6-methyladenosine (m6A) modification marker genes in SARC patients. In addition, a prognostic nomogram was constructed to predict the survival of SARC patients. Results: Among the three GSE files, 42 CDEGs were identified, and the top five interacting genes were ASPM, CCNB2, PRC1, AURKA, and SMC2. The expression levels of the five genes were higher in the SARC group than in the normal group. The transcriptional levels of CCNB2, PRC1, and SMC2 were correlated with worse survival in SARC. The nomogram that combined CCNB2, PRC1, and SMC2 showed fairly good reliability in predicting the survival of SARC patients (C-index: 0.711). Furthermore, the five genes were widely involved in immune infiltration, immune checkpoints, and m6A modification. In addition, we found a low, survival-related mutation rate (9%) of the five identified genes in SARC patients (p < 0.05). Conclusion: The results suggest that the five identified genes are widely involved in the prognosis, immune infiltration, immune checkpoints, and m6A modification of SARC.
This study provides a theoretical basis for research on the correlation between the levels of the five identified genes and sarcoma, but the underlying mechanisms need to be verified experimentally.
Human sarcomas (SARC) encompass more than sixty subtypes, which originate from mesenchymal lineages and carry multifactorial somatic genetic alterations. In the United States in 2020, newly diagnosed SARC cases (13,130 people) accounted for 1% of all malignant tumors, and SARC caused more than 5350 deaths.1 The five-year survival rate for advanced SARC patients is less than 20–30%.2 Surgical resection is the primary strategy for the early treatment of sarcoma. For advanced-stage SARC patients, radiotherapy and combination chemotherapy are applied preoperatively or postoperatively in some sensitive types of sarcoma, with encouraging initial results.3 However, approximately 50% of patients with SARC experience recurrence and metastasis after early radiotherapy, chemotherapy, and surgery.4 Current chemotherapy regimens for advanced sarcoma rarely achieve substantial improvement, and the expected survival time ranges from 12 to 18 months.4–6 Therefore, identifying potential diagnostic and prognostic biomarkers of SARC is of great importance.

Different subtypes of sarcoma have different clinical courses and outcomes.7 For example, desmoid tumors tend toward local infiltration and cause various clinical symptoms, such as chronic pain, psychological disorders, and functional impairment,8 while osteosarcoma has a propensity for bone destruction and early distant metastasis.9 The biological characteristics and molecular markers shared by different subtypes of sarcoma are less well known.10 Recently, high-throughput gene sequencing has provided a better understanding of the genomic heterogeneity of SARC.11,12 Using gene sequence analysis, several overexpressed genes related to spindle assembly and centrosome activity have been identified.13 Similarly, minichromosome maintenance family proteins were found to participate widely in tumor progression and prognosis in patients with osteosarcoma.14 However, the role of common differential expression genes (CDEGs) among diverse subtypes of SARC has not been reported. In the current study, we obtained CDEGs from three GEO series (GSE) profiles and identified the genes with the most interactions. A better understanding of the multimolecular characteristics of these genes in SARC is expected to open novel perspectives for identifying potential biomarkers for tumor diagnosis and prognosis and to provide guidance for future studies.
Methods
Identification of CDEGs from the GEO Database
We downloaded three sarcoma-related GSE gene expression matrix files (GSE39262, GSE21122, GSE48418) from the GEO database, covering 10 subtypes of SARC: osteosarcoma, chondrosarcoma, rhabdomyosarcoma, fibrosarcoma, neuroblastoma, liposarcoma, Ewing sarcoma, leiomyosarcoma, synovial sarcoma, and myxofibrosarcoma. The samples in each GSE profile were classified into a sarcoma group and a control group. Differential gene expression in each GSE profile was analyzed using the limma package in R. The ggplot2 and ComplexHeatmap packages were used for visualization. Genes that were differentially expressed (|logFC| > 1 and p < 0.05) in all three datasets were defined as CDEGs.
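The selection rule above can be sketched in a few lines. This is a minimal Python illustration, not the authors' R/limma workflow: the gene names, logFC values, and p-values are invented placeholders, and the helper names `significant_genes` and `common_degs` are hypothetical. The sketch also does not check that a gene changes in the same direction in every dataset, because the text does not state whether that was required.

```python
def significant_genes(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return genes with |logFC| > lfc_cutoff and p < p_cutoff.

    `results` maps gene -> (logFC, p-value); the structure is an assumption
    made for this sketch.
    """
    return {
        gene
        for gene, (log_fc, p_value) in results.items()
        if abs(log_fc) > lfc_cutoff and p_value < p_cutoff
    }


def common_degs(*datasets):
    """Intersect the significant gene sets of every dataset."""
    gene_sets = [significant_genes(dataset) for dataset in datasets]
    return set.intersection(*gene_sets)


# Illustrative values only; these are not results from the GSE files.
dataset_a = {"GENE1": (2.1, 0.001), "GENE2": (-1.4, 0.02), "GENE3": (0.4, 0.001)}
dataset_b = {"GENE1": (1.6, 0.01), "GENE2": (-1.2, 0.2), "GENE3": (1.9, 0.003)}
dataset_c = {"GENE1": (1.2, 0.04), "GENE2": (-2.0, 0.001), "GENE3": (1.1, 0.01)}

cdegs = common_degs(dataset_a, dataset_b, dataset_c)
print(sorted(cdegs))
```

Only GENE1 passes both cutoffs in all three toy datasets; GENE2 fails the p-value cutoff in the second dataset and GENE3 fails the logFC cutoff in the first.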
GO/KEGG Enrichment Analysis of CDEGs
Using the clusterProfiler package in R, we conducted gene ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the CDEGs. Enriched functions and pathways with an adjusted p-value below 0.05 were retained.
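Enrichment analyses of this kind are commonly based on a hypergeometric over-representation test followed by multiple-testing adjustment. The Python sketch below illustrates that general idea under stated assumptions: the Benjamini-Hochberg adjustment is assumed (the text does not name the method), the gene counts are invented, and the function names are hypothetical rather than part of any package API.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to the term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative counts only: (genes from the list in the term, term size).
background_size = 2000
gene_list_size = 42
terms = {"TERM_A": (8, 60), "TERM_B": (1, 50), "TERM_C": (3, 100)}

raw = {
    name: hypergeom_upper_tail(hits, gene_list_size, size, background_size)
    for name, (hits, size) in terms.items()
}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
retained = [name for name, p_adj in adjusted.items() if p_adj < 0.05]
print(retained)
```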
Protein-Protein Interaction (PPI) Analysis of CDEGs
The PPI network of the CDEGs was constructed using the STRING database, an online resource that explores potential protein interaction networks among genes by collecting, integrating, and scoring publicly available data. Cytoscape was used to visualize the PPI network.15 The Cytoscape plug-in Molecular Complex Detection (MCODE) was applied to identify tightly connected genes. The five CDEGs with the highest connectivity degree were retained for further analysis.
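Ranking genes by connectivity degree can be illustrated as follows. This Python sketch only counts the number of distinct interaction partners per node in an edge list; it does not reproduce MCODE clustering or STRING confidence scoring, and the node names and the helper `top_by_degree` are hypothetical.

```python
from collections import Counter


def top_by_degree(edges, k=5):
    """Return the k nodes with the most distinct interaction partners.

    Duplicate edges (in either orientation) and self-loops are ignored.
    Ties are broken alphabetically so the output is deterministic.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


# Illustrative edge list; these are not interactions from the study.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A"), ("E", "A")]
print(top_by_degree(edges, k=2))
```

In the toy network, node A has four partners and nodes B and C have two each, so A and B are returned when k is 2.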
Differential Expression and Prognostic Analysis of the Top Five Genes of CDEGs
We used the GEPIA database, an online analysis tool that provides RNA-seq expression data for 9736 tumor and 8587 normal tissue samples, to compare the expression of the five genes between sarcomas and normal tissues.16 Then, we analyzed the relationship between the expression of the five identified genes and prognosis among SARC patients. Overall survival (OS) and disease-free survival (DFS) of SARC were analyzed in the Survival Analysis module of the GEPIA database. Moreover, the progression-free interval (PFI) and OS were explored in The Cancer Genome Atlas (TCGA) database, a comprehensive tumor database that contains expression information and survival data of 34 tumor types.17
Nomogram Model in Predicting the Prognosis of SARC
The candidate variables were identified in the univariate analysis and comprised age, gender (female, male), tumor depth (superficial, deep), tumor multifocality, radiation therapy, tumor necrosis, metastasis, margin status, race, and residual tumor. Three of the five identified genes (CCNB2, PRC1, and SMC2) were also included because they were significantly correlated with the prognosis of sarcoma (p<0.05 for all in OS). Based on the results of the univariate analysis and relevant clinical parameters, a predictive nomogram was generated using the rms and survival packages in R. To test the clinical applicability of the nomogram, calibration plots, decision curve analysis (DCA), and time-dependent ROC analysis were applied. Moreover, a risk score analysis was performed based on the Cox regression results.
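The discrimination of a survival model such as this nomogram is usually summarized with Harrell's concordance index (C-index). The Python sketch below shows the basic pairwise definition under simplifying assumptions: tied survival times are skipped, the patient data are invented, and `concordance_index` is a hypothetical helper rather than the rms implementation.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs where the higher-risk patient fails first.

    times       : follow-up time per patient
    events      : 1 if the event (death) was observed, 0 if censored
    risk_scores : model output, higher means worse predicted survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Illustrative data only.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
print(round(c_index, 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for interpreting a reported C-index.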
Correlation Analysis Between the Expression of Five Identified Genes and Immune Cell Infiltration
The relationship between the levels of the five identified genes and immune cell infiltration was investigated in the TIMER database, an online database that provides the clinical relevance of tumor immune subsets and corrects the output results in a multivariable Cox proportional hazards model.18 In the Survival module of the TIMER database, we analyzed the relationship between the expression levels of the five identified genes and the infiltration of six immune cell types (neutrophils, B cells, CD8+ T cells, CD4+ T cells, dendritic cells, and macrophages).
Genetic Mutation Analysis of Five Identified Genes
Mutations of the five identified genes in tumor tissues were analyzed using the cBioPortal database.19 Clinical data for 640 SARC samples, including gene copy-number alterations and prognostic information, were obtained from the TCGA database. Using cBioPortal, we also explored the correlation between mutations of the five identified genes and the prognostic indicators OS and disease-specific survival (DSS) in patients with SARC. The z-score threshold of mRNA expression (RNASeq V2 RSEM) was set as ±1.8.
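The ±1.8 z-score threshold can be illustrated with a short Python sketch that flags samples whose expression deviates strongly from the cohort. Two assumptions are made here: z-scores are computed relative to all samples, and the population standard deviation is used. The expression values, sample names, and the helper `flag_expression_outliers` are invented for illustration and do not reproduce the portal's internal calculation.

```python
from statistics import mean, pstdev


def flag_expression_outliers(expression, threshold=1.8):
    """Label each sample 'high', 'low', or 'unaltered' by expression z-score.

    `expression` maps sample id -> expression value for one gene.
    """
    values = list(expression.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # No variation in the cohort: nothing can be called altered.
        return {sample: "unaltered" for sample in expression}
    labels = {}
    for sample, value in expression.items():
        z = (value - mu) / sd
        if z > threshold:
            labels[sample] = "high"
        elif z < -threshold:
            labels[sample] = "low"
        else:
            labels[sample] = "unaltered"
    return labels


# Illustrative expression values for one hypothetical gene.
expression = {
    "S1": 10.0, "S2": 11.0, "S3": 9.0, "S4": 10.0,
    "S5": 12.0, "S6": 8.0, "S7": 10.0, "S8": 30.0,
}
labels = flag_expression_outliers(expression)
print(labels)
```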
Immune Checkpoint and N6-Methyladenosine (m6A) Modification Analysis
A standardized universal SARC dataset was downloaded from the UCSC database.20 The correlation between the expression of the five identified genes and 60 immune checkpoint genes (24 inhibitory and 36 stimulatory genes) or 21 marker genes of m6A modification was analyzed using the Sangerbox database.
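A gene-by-gene correlation screen of this kind can be sketched as below. The text does not state which correlation coefficient was used, so Pearson's r is an assumption of this Python illustration; the expression vectors and the names `CHECKPOINT_A` and `CHECKPOINT_B` are invented placeholders.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Illustrative expression values across five hypothetical samples.
target_gene = [2.0, 3.5, 1.0, 4.0, 5.5]
panel = {
    "CHECKPOINT_A": [1.0, 2.0, 0.5, 2.5, 3.0],
    "CHECKPOINT_B": [4.0, 3.0, 5.0, 2.5, 1.0],
}

correlations = {name: pearson_r(target_gene, values) for name, values in panel.items()}
for name, r in correlations.items():
    print(name, round(r, 3))
```

In practice each coefficient would be paired with a significance test before a gene is counted as correlated; that step is omitted here.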
Results
Identification of CDEGs Between the SARC and Normal Groups
The differentially expressed genes between SARC and normal tissues were obtained from the three GSE data files. Principal component analysis (PCA) showed that the transcriptional profiles of SARC and normal tissues were clearly separated in GSE48418 (Figure 1C), whereas they overlapped in GSE39262 (Figure 1A) and GSE21122 (Figure 1B). The volcano plots in Figure 1D–F present the distribution of the differential genes in the three GSE files. Red points represent upregulated genes (p-value < 0.05 and logFC > 1), whereas blue points represent downregulated genes (p-value < 0.05 and logFC < −1). The clustering heatmaps indicated that gene expression in SARC was distinct from that in normal samples (Figure 2A–C). As shown in the Venn diagram, 42 CDEGs shared by the three GSE files were identified (Figure 2D).
Figure 1
PCA plots of GSE files and volcano plots of differential expression genes. (A–C) PCA plots of GSE39262, GSE21122, and GSE48418. Red and blue spots represent samples from the SARC and Normal (Control) groups, respectively. (D–F) Volcano plots of GSE39262, GSE21122, and GSE48418. Red and green spots represent differentially expressed genes: red represents up-regulated genes and green represents down-regulated genes.
Figure 2
Heatmaps and Venn diagram of differential expression genes. Heatmap of differential expression genes in GSE39262 (A), GSE21122 (B), and GSE48418 (C). Venn diagram of differential expression genes in GSE39262, GSE21122, and GSE48418 (D).
Functional Enrichment and PPI Analysis of CDEGs in SARC
As shown in Figure 3A, the CDEGs were significantly enriched in the “response to acid chemical” and “collagen-containing extracellular matrix” pathways. Figure 3B shows the visualized network of the functional enrichment. Details were presented in . With respect to the PPI analysis, the top five interacting CDEGs (ASPM, CCNB2, PRC1, AURKA, SMC2) were identified according to their degree of connectivity in Cytoscape (Figure 4A and B).
Figure 3
Enrichment analysis of the CDEGs in SARC (Metascape). GO and KEGG enrichment analysis predicted the functional roles of the CDEGs (A) and provided a visual network (B).
Figure 4
Protein-protein interaction networks for CDEGs (Cytoscape). (A) Protein-protein interaction networks for 42 CDEGs. (B) Protein-protein interaction networks for five identified genes.
Aberrant Expressions of the Five Identified Genes and Prognostic Values in SARC
Figure 5 presents the differential expression of the five identified genes between tumor and normal tissues; all five genes were significantly overexpressed in the tumor samples (p<0.05 for all). Moreover, overexpression of ASPM was associated with poor DFS and PFI (Figure 6B and L), but not with OS (Figure 6A and K). A higher level of CCNB2 was related to poor OS in SARC (Figure 6C and M), whereas no relationship was observed between CCNB2 expression and DFS or PFI (Figure 6D and N). As shown in Figure 6E and O, a higher level of PRC1 was associated with poor OS in SARC. PRC1 expression was also significantly correlated with DFS and PFI (Figure 6F and P). Notably, AURKA was statistically correlated only with the DFS of SARC (Figure 6G–H and Q–R). Furthermore, SMC2 expression was associated with worse OS and PFI in SARC patients (Figure 6I–J and S–T). Based on the univariate analysis, CCNB2, PRC1, SMC2, residual tumor, metastasis, margin status, and tumor multifocality were associated with the clinical outcomes of SARC. Combining these parameters, we constructed a prognostic nomogram with a C-index of 0.711 (Figure 7A). The predictive ability and clinical utility of the nomogram were supported by the calibration curve and DCA (Figure 7B and C). Time-dependent ROC analysis (Figure 7D) showed that the nomogram predicted 3-year and 5-year survival in SARC with AUCs of 0.749 and 0.729, respectively. Based on the variables of the nomogram, a risk score was calculated for each patient (Figure 7E). Patients were divided into low-risk and high-risk groups according to the median risk score. The distribution of survival times indicated that a higher risk score was associated with poorer survival in SARC patients.
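The median-based risk grouping described above can be sketched as follows. In this Python illustration the risk score is taken to be a linear combination of covariates weighted by Cox regression coefficients, which is the usual construction but is assumed here; the coefficients and patient values are invented and do not come from the study, and both helper names are hypothetical.

```python
from statistics import median


def risk_score(coefficients, patient):
    """Linear predictor: sum of coefficient * covariate value."""
    return sum(coefficients[name] * patient[name] for name in coefficients)


def split_by_median(scores):
    """Assign 'high' to scores above the cohort median and 'low' otherwise."""
    cutoff = median(scores.values())
    return {pid: ("high" if score > cutoff else "low") for pid, score in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"CCNB2": 0.3, "PRC1": 0.2, "SMC2": 0.1}
patients = {
    "P1": {"CCNB2": 1.0, "PRC1": 1.0, "SMC2": 1.0},
    "P2": {"CCNB2": 2.0, "PRC1": 1.0, "SMC2": 0.0},
    "P3": {"CCNB2": 0.0, "PRC1": 0.0, "SMC2": 1.0},
    "P4": {"CCNB2": 3.0, "PRC1": 2.0, "SMC2": 1.0},
}

scores = {pid: risk_score(coefficients, values) for pid, values in patients.items()}
groups = split_by_median(scores)
print(groups)
```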
To investigate whether the heterogeneity of the sarcoma subtypes influences the results of our study, we obtained six sarcoma subtypes from TCGA database including dedifferentiated liposarcoma (58 cases), leiomyosarcoma (106 cases), myxofibrosarcoma (25 cases), pleomorphic sarcoma (52 cases), synovial sarcoma (10 cases), and malignant peripheral nerve sheath tumors (10 cases). We compared the expression level of the five target genes among six subtypes. As shown in , overall expression differences of the target genes in six subtypes of SARC were relatively small.
Figure 5
Distinct expression of five identified genes in SARC tissues and adjacent normal tissues (GEPIA). The aberrant expression of ASPM (A), CCNB2 (B), PRC1 (C), AURKA (D), and SMC2 (E) in SARC tissues. *P<0.05.
Figure 6
Prognostic role of the five identified genes in SARC patients (GEPIA, TCGA). Effect of ASPM expression on OS (A and K), DFS (B) and PFI (L) in SARC patients. Effect of CCNB2 expression on OS (C and M), DFS (D) and PFI (N) in SARC patients. Effect of PRC1 expression on OS (E and O), DFS (F) and PFI (P) in SARC patients. Effect of AURKA expression on OS (G and Q), DFS (H) and PFI (R) in SARC patients. Effect of SMC2 expression on OS (I and S), DFS (J) and PFI (T) in SARC patients.
Figure 7
The logistic regression-based nomogram for predicting the survival probability of SARC patients (A). Calibration evaluation of the survival probability predicted by the nomogram (B). Decision curve analysis (DCA) of the nomogram for OS of SARC (C). Time-dependent ROC analysis of the nomogram for predicting 1-year, 3-year, and 5-year OS (D). Risk score analysis of a risk-score model based on the Cox regression results for the OS of SARC (E).
The Correlation Between the Expression of the Five Identified Genes and Infiltration Levels of Immune Cells
Tumorigenesis and tumor development are linked to inflammatory responses and immune cell infiltration.21 Thus, we conducted a comprehensive analysis of the relationship between the levels of the five identified genes and immune cell infiltration (B cells, CD8+ T cells, CD4+ T cells, macrophages, neutrophils, and dendritic cells) in SARC. As shown in Figure 8A, ASPM expression was significantly correlated with the infiltration levels of B cells, CD8+ T cells, and CD4+ T cells, while CCNB2 expression was significantly correlated with CD4+ T cell infiltration. In addition, the infiltration of CD4+ T cells and macrophages was closely related to PRC1 and SMC2 expression, while B cell and dendritic cell infiltration was correlated with AURKA expression. Moreover, the enrichment of CD4+ T cells and neutrophils in sarcoma tissue was accompanied by longer survival (Figure 8B). Specific information on the distribution of each immune subset at each copy number (arm-level deletion, diploid/normal, arm-level gain, and high amplification) status in SARC is presented in .
Figure 8
Interactions between the expression of the five identified genes and the immune system. (A) Correlation between the expression of the five identified genes and the infiltration levels of six immune cell types. (B) The correlation between immune cell infiltration and survival time in SARC patients.
Correlations Between the Five Identified Gene Mutations in SARC and OS, DSS
As shown in Figure 9A, among 640 samples, the mutation rates of ASPM, CCNB2, PRC1, AURKA, and SMC2 were 2.2%, 0.9%, 2%, 2.3%, and 2.2%, respectively. Of interest, we observed lower OS and DSS in the altered group (Figure 9B and C). Moreover, the altered and unaltered groups differed in some clinical parameters, such as tumor mutational burden (TMB), Fraction Genome Altered, and MSIsensor Score.
Figure 9
Genetic mutations in the five identified genes and their association with OS and DSS of SARC patients (cBioPortal). A minor mutation rate (9%) of the target genes was observed in SARC patients (A). mRNA expression z-scores of target genes were related to all samples. A significant correlation was noticed between the genetic alterations of the five identified genes and DSS, OS in SARC (B and C).
Results of Immune Checkpoint and m6A Marker Gene Analysis
Immune checkpoint inhibitors have been successfully applied in the treatment of several tumors.22,23 m6A is the most common internal nucleotide modification of human RNA and is considered to be involved in the regulation of gene expression and in disease.24,25 In this study, we explored the relationship between the expression of the five identified genes and 60 immune checkpoint genes and 21 m6A modification marker genes. As presented in Figure 10, ASPM was significantly correlated with 42 of 60 immune checkpoints and 18 of 21 marker genes. CCNB2 was significantly associated with 18 of 60 immune checkpoints and 12 of 21 marker genes. The expression level of PRC1 was significantly correlated with 44 of 60 immune checkpoints and 18 of 21 marker genes. The level of AURKA was associated with 45 of 60 immune checkpoints and 19 of 21 marker genes. Finally, the expression level of SMC2 was correlated with 31 of 60 immune checkpoints and 19 of 21 marker genes.
Figure 10
Correlation of the expression of five identified genes and 60 immune checkpoints (A), 21 marker genes of m6A modification (B). *P<0.05.
Discussion
Many studies have clarified the molecular mechanisms and prognostic markers of various sarcoma subtypes from the perspective of gene expression.10–12,14,26,27 However, the prognostic value of genes that are differentially expressed across sarcoma subtypes has not been reported. In this study, we obtained three GSE series profiles from the GEO database and selected the differentially expressed genes in each. We then identified 42 CDEGs. Based on the analysis of the PPI network, the top five genes (ASPM, CCNB2, PRC1, AURKA, and SMC2) were selected. Finally, we explored the correlation between the expression of the five genes and the prognosis, gene mutation, immune infiltration, and methylation of SARC.

A previous study reported that downregulating ASPM expression suppressed cancer stemness and tumorigenicity in prostate cancer.28 Consistently, we found that lower expression of ASPM was correlated with better OS and PFI in SARC. It is well acknowledged that the expression of CCNB2 is closely related to the cell cycle pathway. Based on previous research, CCNB2 is correlated with the plasmalemma and is indispensable for the meiosis of mouse oocytes.29 Song et al reported that CCNB2 could act as a hub gene related to the progression and prognosis of hepatocellular carcinoma,30 and lower expression of CCNB2 restricts the proliferation and metastasis of tumor cells.31 In this study, CCNB2 expression was higher in SARC than in normal samples, and this overexpression was associated with a worse prognosis of SARC. Hernandez-Ortega et al indicated that PRC1 is the first substrate of phosphoregulation by the cyclin-dependent kinase/cyclin complex and that downregulating PRC1 decreased cellular proliferative capacity.32,33 AURKA belongs to the family of serine/threonine kinases, which participate widely in the occurrence and development of various tumors.34 The high level of AURKB is associated with worse survival in SARC patients.34 However, based on the GEPIA and TCGA databases, we found that overexpression of AURKA was related only to poor DFS. The impact of the AURKA level on the OS of SARC needs to be further verified. Concerning SMC2, Zhong et al found through gene expression sequence analysis that it plays an unfavorable role in pancreatic cancer.35 The prognostic role of the SMC gene family in sarcoma has also been explored, and a higher expression level of the SMC2 gene was found to be related to a worse prognosis of SARC.26 These results are consistent with the findings of the current study. Sarcomas are a heterogeneous group of tumors that consist of a wide variety of histological subtypes. Clinical outcomes also vary among sarcoma subtypes in terms of disease localization, recurrence rate, aggressiveness, and genetic complexity.36,37 In addition, it is challenging to perform a comprehensive analysis of a single sarcoma subtype because of its rarity. By comparing the expression of the target genes in six sarcoma subtypes, we found that the differences among subtypes were small, which may indicate that the expression of the target genes does not differ markedly across sarcoma subtypes. Although age is related to the vast majority of tumors,38 univariate/multivariate analysis of overall survival in SARC patients showed that age was not statistically associated with the prognosis of SARC. Nevertheless, more in vivo and in vitro follow-up studies and experiments are needed to verify the clinical value of the five genes in SARC.

Over the past few years, increasing attention has been paid to the relationship between the extracellular matrix and both local invasion and distant metastasis. Tumor metastasis and drug resistance are related to changes in the extracellular matrix.39,40 The decreased expression of the type I and type IV collagen α1 genes is related to the drug resistance of chondrosarcoma cells.39,41 Notably, among the 42 CDEGs, TIMP3 belongs to the TIMP gene family, which inhibits matrix metalloproteinases, a group of peptidases involved in the degradation of the extracellular matrix. Interestingly, another gene of this family, TIMP1, has recently been correlated with soft tissue sarcoma response to chemotherapy and cell invasiveness, and exhibited a positive prognostic role in soft tissue sarcoma DFS.39 Therefore, abnormalities in the signaling and structural components of the extracellular matrix may play a vital role in sarcoma development.39 In this study, we also found that the 42 CDEGs were most significantly enriched in signaling pathways related to the “collagen-containing extracellular matrix”, which suggests a vital function of the extracellular matrix in SARC.

The occurrence and development of tumors are always accompanied by gene mutations.7,42 In the current study, a low, survival-related mutation rate of the five identified genes was observed, suggesting that mutations of the five identified genes are harmful and shorten the survival time of SARC patients. Therefore, genetic assessment of mutations in the five identified genes should be considered when evaluating the clinical outcomes of SARC. However, the correlation between the clinical parameters of SARC and mutations of the five identified genes needs to be verified by further laboratory results.

Immune cell infiltration not only plays an essential role in tumors but also serves as a reference for both immunotherapy and clinical outcome.43,44 In this study, the significant correlation between the expression levels of all five identified genes and immune infiltration indicated a tight interaction between gene expression and immune cell infiltration in SARC. Meanwhile, immune checkpoint analysis has been a hot spot in the past few years because immunotherapy is ineffective in some tumors, especially in late-stage patients.45 It has been shown that immune checkpoints are involved in several types of soft tissue sarcoma.46 Many previous studies have shown that RNA methylation (m6A) participates in the initiation, progression, and metastasis of many tumors.24,25 In addition, m6A regulators have been found to be independent prognostic indicators in some tumors, such as colorectal cancer47 and gastric cancer.48 In the current study, we found that the expression levels of the five identified genes were significantly associated with multiple immune checkpoint genes and m6A regulators, which indicates that the five identified genes may participate widely in the clinical progression of SARC through immune checkpoints and m6A modification.
Limitations
Although this is the first study to investigate the multimolecular characteristics and prognostic role of the five identified genes in SARC, several limitations should be recognized. First, we did not explore the role of the target genes in specific sarcoma subtypes. Second, we used multiple online databases to perform the current comprehensive analysis, so data heterogeneity is inevitable and could reduce the reliability of our results. Further follow-up clinical studies and in vitro and in vivo experiments are required to verify the clinical value of the five identified genes in each subtype of SARC.
Conclusion
We comprehensively analyzed the relationship between the expression levels of the five identified genes and the prognosis, mutation, immune infiltration, methylation, and immune checkpoints of SARC patients. We found that the expression levels of CCNB2, PRC1, and SMC2 were significantly correlated with the survival of SARC patients. The five identified genes are widely involved in immune infiltration and affect the prognosis of SARC. Meanwhile, the five identified genes were significantly associated with m6A modification and immune checkpoints. This study provides a theoretical basis for research on the correlation between the levels of the five identified genes and sarcoma, but further experiments are needed to explore the underlying mechanisms.