Bioinformatics analysis of SOXF family genes reveals potential regulatory mechanism and diagnostic value in cancers.

Dan Wu1, Chuan Jiang1, Jing-Jing Zheng1, De-Sheng Luo1, Ji Ma1, Hai-Feng Que1, Chao Li2, Chao Ma2, Hui-Yong Wang2, Wei Wang2, Hong-Tao Xu1.   

Abstract

Background: SOXF family genes (SOX7, SOX17, SOX18) have been reported separately in previous articles to be involved in tumorigenesis and development, but the data sources, analysis contents, and criteria of those reports are not the same. Here, we focused on SOXF genes to analyze their regulatory mechanisms and diagnostic value under the same standards.
Methods: This study analyzed the functions, expression, methylation, and mutations of SOXF genes through public databases including Metascape, Gene Expression Profiling Interactive Analysis (GEPIA), cBioPortal, Tumor IMmune Estimation Resource (TIMER), and Kaplan-Meier Plotter. TIMER applies a deconvolution method to infer the abundance of tumor-infiltrating immune cells (TIICs) from gene expression profiles. Metascape combines several biological functions and over 40 independent knowledge bases within one integrated portal. GEPIA analyzes RNA sequencing expression data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) projects. The cBioPortal visualizes and analyzes genetic data from cancer studies.
Results: This study found that SOXF genes had low expression in multiple types of cancer, such as lung cancer and breast cancer (ANOVA differential methods, |log2FC| cutoff: 1, q value cutoff: 0.01). Lung adenocarcinoma (LUAD) patients with high expression of SOX7 [HR =0.72 (0.61-0.85), logrank P=8.1e-05] and SOX17 [HR =0.54 (0.45-0.64), logrank P=1.7e-12] had a higher overall survival (OS) rate. Expression of SOX7 was significantly related to copy number variation (CNV) (P=3.02e-8) and promoter methylation level (P=5.33e-14), while SOX17 was only related to promoter methylation level (P=3.32e-12). The expression of SOXF genes was positively correlated with CD4+ T cell infiltration (SOX7: P=8.32e-07, SOX17: P=4.93e-06, SOX18: P=1.61e-11). The AUC for the cg07660671 site of SOX7, the cg15377283 site of SOX17, and the cg24199599 site of SOX18 in distinguishing between normal and tumor tissue in LUAD, intestinal cancer, and breast cancer reached 0.9. SOXF genes were mainly involved in transcriptional regulation and the Wnt signaling pathway, and the expression of SOXF genes in tumor tissue was significantly negatively correlated with tumor hypoxia (correlation: -0.35, P≤0.001).
Conclusions: This study implied that the expression of SOX7 and SOX17 is a potential prognostic marker for patients with lung cancer and that SOXF gene methylation is a potential biomarker for pan-cancer screening. SOX7 and SOX17 might modulate the Wnt signaling pathway, and the expression of SOXF family genes was significantly negatively correlated with tumor hypoxia.
2022 Annals of Translational Medicine. All rights reserved.

Keywords:  SOXF family genes; diagnosis; immune infiltration; methylation; prognosis

Year:  2022        PMID: 35845531      PMCID: PMC9279774          DOI: 10.21037/atm-22-2749

Source DB:  PubMed          Journal:  Ann Transl Med        ISSN: 2305-5839


Introduction

Cancer is one of the biggest public health challenges today. According to the World Health Organization (WHO), 9.6 million people died of cancer in 2018. The most common cancers include lung cancer, breast cancer, colorectal cancer, prostate cancer, skin cancer (non-melanoma), and stomach cancer (1). Tumors occur when genetic and epigenetic changes cause more and more healthy cells to transform into cancer cells. These cancer cells are characterized by uncontrolled proliferation, a high survival rate, unlimited growth and replication potential, and strong angiogenesis, invasion, and metastasis ability (2). Studies on cancer are therefore necessary to help determine the cause of the disease and to explore new methods to prevent, diagnose, and treat different types of cancer (3). During the development of cancer, various key factors are involved, all of which contribute to tumorigenesis. The hallmarks of cancer include genome instability and mutation, resisting cell death, sustaining proliferative signaling, non-mutational epigenetic reprogramming, and avoiding immune destruction, among others. Many of the factors have been identified as proto-oncogenes and play an important role in tumorigenesis and development. Examples include platelet-derived growth factor (4), the insulin-like growth factor axis (5), the forkhead/wing helix box transcription factor (Fox) family (6), signal transduction pathways such as the Wnt (7) and Notch (8) pathways, and viruses such as human papilloma virus, Epstein-Barr virus, and the hepatitis B and C viruses. SOX family genes are a series of important transcription factors involved in tumorigenesis and development. They comprise a variety of transcriptional regulators that mediate DNA binding through the highly conserved high mobility group (HMG) domain. Some of these SOX family transcription factors tightly control cell differentiation in cancer, and some are involved in progression and metastasis (9).
The SOX gene family includes SOXA (SRY), SOXB1 (SOX1, SOX2, SOX3), SOXB2 (SOX14, SOX21), SOXC (SOX4, SOX11, SOX12), SOXD (SOX5, SOX6, SOX13), SOXE (SOX8, SOX9, SOX10), SOXF (SOX7, SOX17, SOX18), SOXG (SOX15), and SOXH (SOX30). The sex determining region Y gene (SRY) is a member of the SRY-like-box (SOX) family of DNA binding proteins and contains a central HMG region (10). In the process of sexual development, SRY is a determinant of the testis (11), which is also the physiological evidence that it plays a key role in gender differences during embryonic development (12). In addition, it has also been reported in the literature that the up-regulated expression of SRY is associated with poor prognosis of liver cancer, and there is no gender difference in this correlation (13). The SOX1, SOX2, and SOX3 are the three members of SOXB1 subclass transcription factors, and they have similar sequences, expression patterns, and overexpression phenotypes (14). The data show that high levels of SRY box transcription factor 1 (SOX1) are associated with lower overall survival (OS) rates in some patients, and suggest that SOX1 is a potential target for the glioma stem cell (GSC) population in glioblastoma (15). The radiation-activated PI3K/AKT pathway promotes the induction of tumor stem cell-like cells by up-regulating SOX2 in colorectal cancer (16). As the two members of the SOXB2 subclass transcription factors, SOX14 can promote the proliferation and invasion ability of cervical cancer cells by activating the Wnt/β-catenin pathway (17). The methylation of SOX21 gene promoter has great potential to be an epigenetic biomarker for early detection of colorectal cancer (18). Together, SOX4, SOX11, and SOX12 constitute the C group of SRY-related HMG box proteins. 
Both SOX4 and SOX11 regulate cell differentiation, proliferation, and survival in a number of basic processes, and they may function in a redundant manner to control more developmental, physiological, and pathological processes than currently known (19). The SOXD family genes (SOX5, SOX6, SOX13, and SOX23) are involved in the transcriptional regulation of developmental processes, including embryonic development, nerve growth and cartilage formation. The SOXD gene family was also identified as an important transcriptional regulator associated with a variety of cancers (20). Since the discovery of the SOX factor, members of the SOXF family (SOX7, SOX17, and SOX18) have been identified to play a wide range of roles, especially in cardiovascular development. Recently, SOXF factor was discovered to be a key factor in determining cell fate and regulating cancer cells (21). The SOX15 gene can regulate the proliferation and migration of endometrial cancer cells, and the up-regulation of SOX5 may be valuable for the treatment of endometrial cancer (22). As the main switch of desmosome genes, SOX30 inhibits the proliferation, migration, and invasion of lung adenocarcinoma (LUAD) cells by activating the transcription of desmosome genes (23). In this study, we focused on the investigation of SOXF family genes’ regulatory mechanisms and diagnostic value using the same data sources, analysis contents and criteria. Using public databases and public data, the relationships between gene expression, gene mutation, and gene methylation in LUAD were studied, which have not been reported in other studies. In addition, we also evaluated the potential of SOXF family genes as diagnostic markers in multiple cancers, and revealed their potential mechanisms affecting prognosis. We present the following article in accordance with the STARD reporting checklist (available at https://atm.amegroups.com/article/view/10.21037/atm-22-2749/rc).

Methods

We comprehensively analyzed the biological functions of the SOXF family using a variety of tools. First, a functional analysis of the SOX gene family was conducted, and expression analysis was performed according to subgroup classification. We then focused on the SOXF family. The clinical value of SOXF family expression was analyzed in LUAD. Subsequently, the factors affecting SOXF family expression were analyzed from the perspectives of mutation, immune microenvironment, methylation, and regulation. In addition, the biological functions of the SOXF family were analyzed, including Gene Ontology, pathway, and single-cell-level functional analyses.

Data collection

In this study, RNAseq data in HTSeq-fragments per kilobase of exon per million reads (HTseq-FPKM) format, clinical information, and prognostic data of The Cancer Genome Atlas (TCGA) LUAD were downloaded in Xena (https://xenabrowser.net/) for subsequent analysis. The RNAseq data in FPKM format was converted to transcripts per million (TPM) and processed by log2 for analysis. In addition, the processed Beta value data of SOXF group genes corresponding to lung cancer, intestinal cancer, and breast cancer were also downloaded in this database. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
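The FPKM-to-TPM conversion described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the function name `fpkm_to_log2_tpm` and the pseudocount of 1 are assumptions, since the text only states that the TPM values were "processed by log2".

```python
import math


def fpkm_to_log2_tpm(fpkm_values, pseudocount=1.0):
    """Convert one sample's FPKM values to log2(TPM + pseudocount).

    The pseudocount is an assumption made here to keep log2 defined for
    genes with zero expression; the source does not state which was used.
    """
    total = sum(fpkm_values)
    if total <= 0:
        raise ValueError("sample has no expression signal")
    # TPM rescales FPKM so that the values of one sample sum to one million.
    tpm = [value / total * 1e6 for value in fpkm_values]
    return [math.log2(value + pseudocount) for value in tpm]
```

The conversion is applied per sample (per column of the expression matrix), because the scaling factor depends on the total FPKM of that sample.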

Gene function enrichment analysis

The analysis tool Metascape (24) (http://metascape.org/gp/index.html#/main/step1) was used to analyze the function of the 20 member genes of the SOX gene family. The Search Tool for the Retrieval of Interacting Genes/Proteins (25) (STRING; version 11.0, http://www.string-db.org/) database was used to predict genes related to SOXF family genes. At the same time, the online analysis tool GeneMANIA (26) (http://genemania.org/) was used to predict genes related to SOXF group genes. The intersection of the results from the two databases was taken, and the intersection genes together with the three genes of the SOXF family were analyzed in R (version 3.6.3; The R Foundation for Statistical Computing, Vienna, Austria), where the clusterProfiler package (version 2.14.3) was used to perform Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis.

Gene expression analysis

The analysis tool Gene Expression Profiling Interactive Analysis 2 (27) (GEPIA2; http://gepia2.cancer-pku.cn/#) was used to analyze the expression of the 20 member genes of the SOX gene family in multiple cancer types. The “multiple genes comparison” function of this tool was used to group the genes by family and display the expression of SOX genes in pan-cancer. In addition, the profile tool under Expression DIY was selected for differential analysis, with the analysis of variance (ANOVA) method chosen for comparison. The selection criteria for differential expression were |log2 (fold change)| >1 and q value <0.01. GEPIA (28) (http://gepia.cancer-pku.cn/) was also used for the correlation analysis between SOXF group expression and LUAD tumor stage. The “Correlation” function of this tool was then used to analyze the relationship between the expression of SOXF group genes and that of immune cell marker genes.
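The selection criterion stated above (|log2 fold change| >1 and q value <0.01) amounts to a simple filter. The sketch below only illustrates that rule; the function names and the gene labels in the usage example are hypothetical, and the ANOVA and q value computation performed by GEPIA2 are not reproduced.

```python
def is_differentially_expressed(log2_fc, q_value, fc_cutoff=1.0, q_cutoff=0.01):
    """Apply the cutoffs named in the text: |log2FC| > 1 and q < 0.01."""
    return abs(log2_fc) > fc_cutoff and q_value < q_cutoff


def filter_genes(results):
    """Return the sorted gene names passing both cutoffs.

    `results` maps a gene name to a (log2_fc, q_value) tuple; this input
    layout is an assumption made for the example.
    """
    return sorted(
        gene
        for gene, (log2_fc, q_value) in results.items()
        if is_differentially_expressed(log2_fc, q_value)
    )


# Hypothetical values, not taken from the study.
example = {"GENE_A": (-2.3, 0.001), "GENE_B": (0.4, 0.0001), "GENE_C": (1.5, 0.2)}
selected = filter_genes(example)
```

Both conditions must hold: a gene with a large fold change but a q value above the cutoff, or a highly significant gene with a small fold change, is not selected.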

Prognostic analysis

The analysis tool Kaplan-Meier Plotter (29) (https://kmplot.com/analysis/index.php?P=servic) was used to analyze the prognostic value of SOXF Group expression in LUAD.
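For readers unfamiliar with the survival curves produced by such a tool, the Kaplan-Meier estimator can be sketched in a few lines. This is a generic textbook implementation with a hypothetical function name, not the code behind Kaplan-Meier Plotter, and it does not include the logrank test or hazard ratio estimation.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    `times` are follow-up times; `events` are 1 for death and 0 for a
    censored observation.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        current_time = order[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(order) and order[i][0] == current_time:
            deaths += order[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving this time.
            survival *= 1.0 - deaths / at_risk
            curve.append((current_time, survival))
        # Deaths and censored patients both leave the risk set.
        at_risk -= removed
    return curve
```

In a prognostic comparison, one curve would be computed for the high-expression group and one for the low-expression group.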

Receiver operating characteristic curve drawing

The receiver operating characteristic (ROC) curve in this study was analyzed in R version 3.6.3. After using the R package pROC (version 1.17.0.1) for analysis, the ggplot2 (version 3.3.3) package was used to visualize the results.
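The area under the ROC curve can also be computed directly from pairwise comparisons, which may help clarify what the AUC values reported in this study mean. The sketch below is a plain Python illustration with a hypothetical function name; it is not the pROC implementation used by the authors.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive sample scores above a negative one.

    `labels` are 1 (positive class) or 0 (negative class). Ties count as half
    a win, which matches the trapezoidal area under the ROC curve.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Because SOXF expression is lower in tumor than in normal tissue, the direction of the comparison matters: with tumor coded as the positive class and raw expression as the score, an AUC below 0.5 would correspond to 1 minus the reported value.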

Single factor Cox analysis

In this study, the single factor Cox analysis method was used to explore the effects of T stage, N stage, pathological stage, gender, age, and smoking history on the OS rate of patients with LUAD. The R version used for the analysis was 3.6.3, and the R package was survival (version 3.2-10).

Gene mutation analysis

The online analysis tool cBioPortal (30) (https://www.cbioportal.org/) was used to analyze the gene mutation frequency and main mutation types of SOXF Group genes in LUAD, and to explore the relationship between SOXF group gene copy number variation (CNV) and gene expression. At the same time, the Cancer Cell Line Encyclopedia (CCLE) database (31) (https://depmap.org/portal/) was used to verify the relationship between SOXF group gene CNV and gene expression.

Immune infiltration analysis

The online analysis tool Tumor IMmune Estimation Resource (TIMER; https://cistrome.shinyapps.io/timer/) (32,33) was used to analyze the correlation between the SOXF group gene copy number and the level of immune cell infiltration.

Methylation analysis

The online analysis tool University of ALabama at Birmingham CANcer data analysis Portal (UALCAN) (34) (http://ualcan.path.uab.edu/) was used in selecting TCGA data to analyze the methylation of SOX7, SOX17, and SOX18 gene promoter regions of normal and LUAD samples. The online analysis tool cBioPortal was used to select LUAD (TCGA, Firehose Legacy) and Lung Adenocarcinoma (TCGA, Nature 2014), and analyze the relationship between the methylation of SOXF Group gene promoter region and gene expression in the Plots module. We used the online analysis tool DiseaseMeth V2.0 (35) (http://bio-bigdata.hrbmu.edu.cn/diseasemeth/analyze.html). In LUAD, we selected ARRAY and NGS as the platform. We then selected t-test; significant P value: 0.01; absolute methylation difference >0.2 as the comparison method, to verify the methylation status of SOXF Group and RASSF1 genes.

Single cell function analysis

The online analysis tool CancerSEA (36) (http://biocc.hrbmu.edu.cn/CancerSEA/home.jsp) was used to analyze the functions of SOXF group genes in single-cell sequencing data.

Statistical analysis

We performed all statistical analyses using R (version 3.6.3) and the default statistical algorithms of the online analysis tools (Metascape, STRING, GEPIA, TIMER, etc.).

Results

SOX gene family function and expression analysis

In order to explore the functions of the SOX gene family, we conducted an enrichment analysis of the 20 genes in the family. The results showed that genes of the family were involved in a variety of biological processes, including deactivation of the beta-catenin transactivating complex, cell fate commitment, and stem cell fate specification (Figure 1A). Subsequently, we explored the expression of this gene family in different cancer types. The results showed that most of the genes in this family, such as SOX4, SOX7, SOX9, SOX12, SOX13, SOX17, and SOX18, were abnormally expressed in a variety of cancer tissues (Figure 1B). We then selected the SOXF family genes (SOX7, SOX17, SOX18) for further study.
Figure 1

SOX family gene function and expression analysis. (A) SOX family gene function enrichment analysis; (B) SOX family gene expression analysis in pan-cancer.

Expression analysis of SOXF family genes

First, we further explored the expression details of SOXF family genes (SOX7, SOX17, SOX18) in pan-cancer. The results showed that the expression of SOXF family genes in six cancer tissues, including lung squamous cell carcinoma (LUSC), LUAD, and breast cancer (BRCA), was significantly lower than that in normal samples (Figure 2). Since lung cancer is a relatively prevalent cancer type in China, LUAD was taken as an example for further exploration. Figure 3A-C shows the expression of SOX7, SOX17, and SOX18 in LUAD; their expression in tumor tissues is significantly lower than that in normal tissues. However, there is no significant difference in their expression across the different stages of LUAD (Figure 3D-F).
Figure 2

The expression of SOXF family genes SOX7 (A), SOX17 (B), and SOX18 (C) in pan-cancer.

Figure 3

The expression of SOXF family genes in lung adenocarcinoma. The expression of SOX7 (A), SOX17 (B), and SOX18 (C) in LUAD. The expression of SOX7 (D), SOX17 (E), and SOX18 (F) in different LUAD stages. *, the gene expression is significantly different between tumor and normal samples. Pr(F), the P value of the F-statistic. LUAD, lung adenocarcinoma. num(T), number of tumor samples. num(N), number of normal samples.

Prognostic value evaluation of SOXF family genes

In order to explore the clinical significance of SOXF family genes, we studied the OS of patients with different levels of SOXF family gene expression. The results showed that patients in the SOX7 and SOX17 high expression groups had a higher OS rate (Figure 4A,B). However, there was no significant difference in OS rate between the SOX18 high expression group and the SOX18 low expression group (Figure 4C). Subsequently, we drew ROC curves to evaluate the predictive ability of SOXF family genes in different stages. The results showed that the SOXF family genes had good accuracy in distinguishing the normal population from patients with LUAD (Figure 5A), and that when cancer stages were further distinguished, SOXF family genes still had high accuracy in predicting the endpoints of normal people and patients with LUAD (Figure 5B-E). In addition, we used single factor Cox analysis to explore the impact of different clinical factors on the OS of patients. The results showed that for LUAD patients, the later the pathological stage, the worse the OS (Figure 5F), and that smoking history, age, and gender were not related to the OS rate of LUAD patients.
Figure 4

Prognostic analysis. The overall survival analysis of different SOX7 (A), SOX17 (B), and SOX18 (C) expression populations. HR, hazard ratio. Logrank P, P value of the logrank test.

Figure 5

ROC analysis and single factor Cox analysis. The accuracy of SOX7, SOX17, and SOX18 in predicting clinical endpoint of normal and cancer population (A), patients with different cancer stages (B-E). (F) Single factor Cox analysis. ROC, receiver operating characteristic; TPR, true positive rate; FPR, false positive rate; HR, hazard ratio; 95% CI, 95% confidence interval; T/N/M stage, cancer staging system.

Mutations of SOXF family genes

Gene mutations are of great significance to tumorigenesis and the development of cancer. Therefore, we explored the mutations of SOXF family genes in LUAD. Across various published data sets (TCGA, Memorial Sloan Kettering Cancer Center (MSKCC), etc.), the overall mutation frequency of SOXF family genes is 7% (SOX7), 3% (SOX17), and 5% (SOX18), respectively (Figure 6A). The SOX7 gene has more deep deletions (Figure 6B); SOX17 has more evenly distributed mutations, with relatively more amplifications (Figure 6C); and SOX18 has more amplifications (Figure 6D). Even for the same type of mutation, the mutation frequency differed considerably across data sets.
Figure 6

Gene mutation analysis. (A) The overall mutation frequency of SOXF family genes. SOX7 (B), SOX17 (C), and SOX18 (D) specific mutation type analysis. +, this kind of mutation result exists in that study. −, this kind of mutation result doesn’t exist in that study.

In order to explore the correlation between CNV and SOXF family gene expression, we performed a correlation analysis between the two. The results showed that the expression of SOX7 was significantly correlated with copy number (correlation coefficient about 0.2, P<0.05) and that deep deletion may be the main reason for the low expression of this gene (Figure 7A), whereas the expression of SOX17 and SOX18 had little correlation with copy number and may be driven by other factors (Figure 7B,C). These conclusions were verified in another database (CCLE; Figure 7D-F).
Figure 7

Correlation analysis between SOXF group copy number variation and mRNA expression. cBioPortal database, analyzing the correlation between SOX7 (A), SOX17 (B), SOX18 (C) CNV and mRNA expression. CCLE database, analyzing the correlation between SOX7 (D), SOX17 (E), and SOX18 (F) CNV and mRNA expression. CNV, copy number variation; mRNA, messenger RNA; CCLE, cancer cell line encyclopedia.

The relationship between SOXF group gene copy number, gene expression, and immune cell infiltration level

The above results indicate a relationship between gene copy number and expression, and gene expression can affect the infiltration of immune cells. Therefore, we further explored the relationship between copy number and immune cell infiltration level. The results showed that in LUAD, the B cell and CD8+ T cell infiltration levels were significantly reduced in the SOX7 high amplification population (Figure 8A); the B cell, CD4+ T cell, macrophage, neutrophil, and dendritic cell (DC) infiltration levels were reduced in the SOX17 high amplification population (Figure 8B); and only the DC infiltration level was significantly reduced in the SOX18 high amplification population (Figure 8C). In addition, SOXF group gene expression was positively correlated with CD4+ T cell infiltration (Figure 8D-F); macrophage infiltration was positively correlated with the expression of SOX7 and SOX17 (Figure 8D,E); and neutrophil and DC infiltration had a more significant relationship with SOX7 expression (Figure 8D). This is roughly the same as the relationship between CNV and immune cell infiltration.
Figure 8

Correlation analysis of immune cell infiltration. Correlation analysis between SOX7 (A), SOX17 (B), and SOX18 (C) copy number and immune infiltration. Correlation analysis of SOX7 (D), SOX17 (E), and SOX18 (F) expression and immune infiltration. LUAD, the lung adenocarcinoma. *, cutoff 0.1; **, cutoff 0.01; ***, cutoff 0.001. cor, correlation.

In order to further explain the influence of SOXF group gene expression on immune cells, we explored the relationship between SOXF group genes and several immune cell markers. The results showed that SOX7 expression had the strongest positive correlation with CD4 expression (R=0.45), while SOX17 (R=0.37) and SOX18 (R=0.30) showed weaker correlations. The expression of all three genes was also positively correlated with NOS2 expression (R=0.39 to 0.43). This is consistent with the results of the previous immune infiltration analysis (Table 1).
Table 1

Correlation analysis between SOXF group and immune cells biomarkers in LUAD by using the GEPIA database

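Immune cell | Biomarker | SOX7 R value | SOX7 P value | SOX17 R value | SOX17 P value | SOX18 R value | SOX18 P value
B cell | CD19 | 0.21 | 2.90E-06 | 0.2 | 9.90E-06 | 0.18 | 5.30E-05
B cell | CD79A | 0.18 | 5.30E-05 | 0.19 | 3.50E-05 | 0.17 | 2.60E-04
CD8+ T cell | CD8A | 0.15 | 0.0011 | 0.085 | 0.061 | 0.064 | 0.16
CD8+ T cell | CD8B | 0.069 | 0.13 | 0.049 | 0.28 | 0.032 | 0.49
CD4+ T cell | CD4 | 0.45 | 2.90E-25 | 0.37 | 2.50E-17 | 0.3 | 2.30E-11
M1 macrophage | NOS2 | 0.39 | 1.40E-18 | 0.43 | 1.70E-23 | 0.4 | 1.20E-19
M1 macrophage | IRF5 | 0.2 | 1.10E-05 | 0.13 | 3.10E-03 | 0.18 | 7.40E-05
M1 macrophage | PTGS2 | 0.21 | 1.90E-06 | 0.2 | 8.00E-06 | 0.089 | 5.20E-02
M2 macrophage | CD163 | 0.33 | 4.10E-14 | 0.31 | 4.50E-12 | 0.2 | 1.20E-05
M2 macrophage | VSIG4 | 0.27 | 2.70E-09 | 0.22 | 1.00E-06 | 0.12 | 7.80E-03
M2 macrophage | MS4A4A | 0.34 | 1.10E-14 | 0.28 | 2.80E-10 | 0.14 | 2.10E-03
Neutrophil | CEACAM8 | 0.19 | 2.80E-05 | 0.19 | 1.70E-05 | 0.073 | 1.10E-01
Neutrophil | ITGAM | 0.34 | 1.80E-14 | 0.27 | 1.00E-09 | 0.23 | 2.40E-07
Neutrophil | CCR7 | 0.35 | 1.30E-15 | 0.29 | 1.50E-10 | 0.21 | 2.30E-06
Dendritic cell | HLA-DPB1 | 0.28 | 4.40E-10 | 0.21 | 4.00E-06 | 0.17 | 1.30E-04
Dendritic cell | HLA-DQB1 | 0.11 | 1.30E-02 | 0.12 | 7.10E-03 | 0.17 | 1.40E-04
Dendritic cell | HLA-DRA | 0.21 | 4.40E-06 | 0.13 | 3.50E-03 | 0.067 | 1.40E-01
Dendritic cell | HLA-DPA1 | 0.23 | 2.90E-07 | 0.15 | 9.30E-04 | 0.096 | 3.60E-02
Dendritic cell | CD1C | 0.29 | 1.50E-10 | 0.24 | 5.60E-08 | 0.14 | 2.10E-03
Dendritic cell | NRP1 | 0.22 | 1.70E-06 | 0.19 | 1.80E-05 | 0.09 | 4.90E-02
Dendritic cell | ITGAX | 0.34 | 1.00E-14 | 0.26 | 5.40E-09 | 0.29 | 4.30E-11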

LUAD, lung adenocarcinoma; GEPIA, Gene Expression Profiling Interactive Analysis.
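As an illustration of how an R value such as those in Table 1 is obtained, the sketch below computes a Pearson correlation coefficient between two expression vectors. The choice of Pearson is an assumption, because the text does not state which coefficient was selected in GEPIA, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)
```

Here `x` would hold the expression of one SOXF gene across LUAD samples and `y` the expression of one marker gene (for example CD4) in the same samples.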

SOXF group gene family methylation analysis and biomarker exploration

Gene methylation also affects gene expression (37), so we further explored the methylation of the SOXF group gene family. The results showed that across different pathological stages (Figure 9A-C), age groups (Figure 9D-F), genders (Figure 9G-I), and degrees of lymph node metastasis (Figure 9J-L), the promoter methylation level of SOXF family genes in normal samples was very low, while the promoter methylation level in tumor samples was significantly increased. However, the promoter methylation level of SOX18 in normal samples was relatively high, higher than that of SOX7 and SOX17. Subsequently, correlation analysis was performed in two different TCGA LUAD cohorts to explore the relationship between promoter region methylation and gene expression. The results showed that methylation of SOX7 (Figure 10A,B) and SOX17 (Figure 10C,D) was significantly negatively correlated with gene expression. No such relationship was found for SOX18 (Figure 10E,F).
Figure 9

SOXF Group promoter methylation assessment. The methylation status of SOX7 (A), SOX17 (B), and SOX18 (C) in different pathological stages. The methylation status of SOX7 (D), SOX17 (E), and SOX18 (F) in different age groups. The methylation status of SOX7 (G), SOX17 (H), and SOX18 (I) in different genders. The methylation status of SOX7 (J), SOX17 (K), and SOX18 (L) in people with different degrees of lymph node metastasis. Beta Value, represents the ratio between the methylated array intensity and total array intensity, falls between 0 (lower levels of methylation) and 1 (higher levels of methylation).

Figure 10

The correlation analysis SOXF group methylation and gene expression. The correlation analysis of SOX7 (A), SOX17 (C), and SOX18 (E) promoter region methylation and gene expression (TCGA Firehose data set). The correlation analysis of SOX7 (B), SOX17 (D), and SOX18 (F) promoter region methylation and gene expression (TCGA Nature data set). TCGA, The Cancer Genome Atlas.

The methylation in the RASSF1 gene promoter region can be used as an effective biomarker for the early diagnosis of lung cancer (38). Therefore, we compared the methylation status of the SOXF group with that of RASSF1 to evaluate whether SOXF family genes can be used as potential biomarkers. First, we observed the methylation status of RASSF1 in different pathological stages. The methylation level of this gene was low in the normal group and high in the tumor tissues (Figure 11). In the comparison of tumor and normal samples, the methylation levels of all four gene promoter regions were significantly different, with P values of <0.01. Among them, the difference between the mean methylation level of tumor samples and that of normal samples for SOX17 was 0.194, which was the largest of the four genes and exceeded that of RASSF1 (0.122) (Table 2).
In order to further evaluate the feasibility of SOXF group gene methylation for the early diagnosis of cancer, its diagnostic capability was explored in LUAD, intestinal cancer, and breast cancer. It was found that the best-performing CpG sites of SOX7, SOX17, and SOX18 (for example, cg07660671, cg15377283, and cg24199599 in LUAD) distinguished normal from tumor tissue with AUC values of about 0.9 or higher (0.885 to 0.982) in LUAD, intestinal cancer, and breast cancer (Table 3).
Figure 11

The methylation status of RASSF1 in different pathological stages. LUAD, lung adenocarcinoma; TCGA, The Cancer Genome Atlas.

Table 2

Verification of methylation results

Tumor | Genomic region | Transcript | Gene | P value | Mean (MethylTumor) - Mean (MethylNormal)
LUAD | chr3:50377867-50380367 | NM_007182 | RASSF1 | 0.000e+00 | 0.122
LUAD | chr8:55368494-55370994 | NM_022454 | SOX17 | 0.000e+00 | 0.194
LUAD | chr20:62680479-62682979 | NM_018419 | SOX18 | 1.624e-12 | 0.099
LUAD | chr8:10587584-10590084 | NM_031439 | SOX7 | 1.624e-12 | 0.114

LUAD, lung adenocarcinoma; Mean (MethylTumor), mean of methylation level in tumor samples; Mean (MethylNormal), mean of methylation level in normal samples.

Table 3

SOXF Group pan-cancer methylation marker analysis

Cancer type | Gene | Probe | AUC (best) | Sensitivity | Specificity
LUAD | SOX7 | cg07660671 | 0.924 | 0.841 | 1
LUAD | SOX17 | cg15377283 | 0.978 | 0.931 | 1
LUAD | SOX18 | cg24199599 | 0.915 | 0.865 | 1
COAD | SOX7 | cg07660671 | 0.923 | 0.847 | 1
COAD | SOX17 | cg26059468 | 0.982 | 0.922 | 1
COAD | SOX18 | cg23509896 | 0.973 | 0.942 | 1
BRCA | SOX7 | cg22008625 | 0.920 | 0.779 | 0.952
BRCA | SOX17 | cg00123055 | 0.956 | 0.854 | 0.976
BRCA | SOX18 | cg24199599 | 0.885 | 0.748 | 0.940

LUAD, lung adenocarcinoma; COAD, colon adenocarcinoma; BRCA, breast invasive carcinoma; AUC, area under the curve.
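Sensitivity and specificity values such as those in Table 3 depend on the cutoff applied to the methylation Beta value. One common way to choose a cutoff is to maximize Youden's J (sensitivity + specificity - 1). The sketch below illustrates that rule; it is an assumption that a criterion of this kind was used, since the text does not describe how the reported operating point was selected, and the function name is hypothetical.

```python
def best_cutoff(beta_values, is_tumor):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A sample is called tumor when its Beta value is >= cutoff, which assumes
    higher promoter methylation in tumor tissue, as reported in the text.
    """
    pairs = list(zip(beta_values, is_tumor))
    n_tumor = sum(1 for _, tumor in pairs if tumor)
    n_normal = len(pairs) - n_tumor
    if n_tumor == 0 or n_normal == 0:
        raise ValueError("both tumor and normal samples are required")
    best = None
    for cutoff in sorted(set(beta_values)):
        true_pos = sum(1 for beta, tumor in pairs if tumor and beta >= cutoff)
        true_neg = sum(1 for beta, tumor in pairs if not tumor and beta < cutoff)
        sensitivity = true_pos / n_tumor
        specificity = true_neg / n_normal
        youden_j = sensitivity + specificity - 1.0
        # Keep the first cutoff reaching the highest J.
        if best is None or youden_j > best[0]:
            best = (youden_j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]
```

A specificity of 1 with a sensitivity below 1, as seen for the LUAD and COAD probes, corresponds to a cutoff above every normal sample but also above some tumor samples.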

SOXF group related genes and function analysis

To further analyze the functions of SOXF family genes, we used two online databases, STRING and GeneMANIA, to predict the proteins related to SOXF family genes and constructed a protein interaction network (Figure 12A,B). Intersecting the results of the two databases yielded six related genes (Figure 12C): CTNNB1, LEF1, POU5F1, TCF7, TCF7L1, and TCF7L2. The three SOXF family genes and these six related genes were then subjected to KEGG and GO analysis. The results showed that the nine genes mainly participate in transcriptional regulation and the Wnt signaling pathway, among other processes (Figure 12D). A more in-depth analysis of SOXF group function was then performed using the CancerSEA single-cell sequencing database. The expression of the SOXF group was significantly negatively correlated with tumor hypoxia (Figure 13). In a hypoxic environment, tumor cells alter their metabolism and behavior to maintain growth and proliferation, which may be related to extracellular matrix remodeling and increased migration and metastasis (39).
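
The intersection step described above can be sketched in Python as a set operation over the partner lists exported from each tool. The helper name `shared_partners` and the two `PLACEHOLDER_*` entries are hypothetical; only the six shared gene symbols come from the text, and the actual per-tool gene lists are not given in the source.

```python
def shared_partners(*gene_lists):
    """Return gene symbols present in every list, sorted alphabetically.

    Symbols are stripped and upper-cased so that formatting differences
    between exports do not hide a match.
    """
    normalized = [{g.strip().upper() for g in genes} for genes in gene_lists]
    return sorted(set.intersection(*normalized))


# Placeholder entries stand in for tool-specific predictions that are
# not listed in the source; they are not real gene symbols.
string_partners = ["CTNNB1", "LEF1", "POU5F1", "TCF7", "TCF7L1", "TCF7L2",
                   "PLACEHOLDER_A"]
genemania_partners = ["ctnnb1", "LEF1 ", "POU5F1", "TCF7", "TCF7L1", "TCF7L2",
                      "PLACEHOLDER_B"]

print(shared_partners(string_partners, genemania_partners))
```

Taking the intersection keeps only partners supported by both resources, which trades recall for confidence before enrichment analysis.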
Figure 12

Functional analysis. (A) Protein interaction network of SOXF family genes analyzed with STRING. (B) SOXF family gene-related gene network predicted by GeneMANIA. (C) Venn diagram of the intersection of the genes predicted by STRING and GeneMANIA. (D) Pathway enrichment analysis of SOXF family genes and their function-related genes. STRING, Search Tool for the Retrieval of Interacting Genes/Proteins; GeneMANIA, a tool for predicting gene set function.

Figure 13

SOXF group single-cell function analysis. ***, the gene expression of the SOXF group is significantly correlated with hypoxia. geneExp, gene expression.

Discussion

This article first explored the functions of SOX family genes through enrichment analysis and found that the SOX gene family is mainly involved in signal transduction, cell fate determination, stem cell differentiation, and other pathways and functions, which is consistent with previous reports on breast cancer (40) and ovarian cancer (41). We then examined the expression of SOX family genes across cancers. Most genes of this family were abnormally expressed in a variety of cancer tissues, including SOX4 (42), SOX7 (43), SOX9 (44), SOX12 (45), SOX13 (46), SOX17 (47), and SOX18 (48,49). Subsequent analyses focused on the SOXF family, which comprises three genes: SOX7, SOX17, and SOX18. The expression levels of these three genes in a variety of cancers, including lung cancer (49,50), intestinal cancer (51), and breast cancer (50), were significantly lower than those in normal samples. All of these are solid tumors, which might reflect the functions of SOX7, SOX17, and SOX18. SOX7 (52), SOX17 (53), and SOX18 (54) have been reported to be related to angiogenesis. Angiogenesis is considered a key and early step of tumorigenesis, and it has also been described as a marker of solid tumors and a key promoter of tumor recurrence (55). Hence, the abnormal expression of these genes in most solid tumors may promote cancer development. Then, taking LUAD as an example, we studied the relationships among the expression, mutation, and methylation of SOXF family genes and patient prognosis. The expression of SOXF family genes was decreased in LUAD but did not differ significantly among LUAD pathological stages. The patients were then divided into a high expression group and a low expression group according to the expression level of SOXF family genes.
Patients in the SOX7 and SOX17 high expression groups had a better OS rate, whereas survival did not differ significantly between the SOX18 high and low expression groups. This may be related to the functions of the genes themselves. SOX7 has been reported to be a tumor suppressor: high SOX7 expression can inhibit the cell cycle and promote apoptosis, thereby inhibiting cancer (56). A similar report exists for SOX17 in esophageal squamous cell carcinoma, in which high SOX17 expression reduced cell migration and slowed tumor growth (57). This may explain why patients in the SOX7 and SOX17 high expression groups had a higher OS rate. In addition, we analyzed the ability of SOXF family genes to predict the clinical endpoints of cancer patients at different stages, and all three genes showed good predictive capability. To further explore the reasons for the differential expression of SOXF family genes, we analyzed the mutations of the three genes. SOX7 had relatively more deep deletions, SOX17 had more evenly distributed alterations, and SOX18 had more amplifications. CNV has been reported to be linearly correlated with gene expression (58), which may be one of the reasons for the abnormal expression of SOXF family genes in cancer tissues. We therefore also analyzed the correlation between copy number and expression for the SOXF family genes. Only SOX7 had a copy number significantly correlated with its expression, suggesting that the abnormal expression of SOX7 may be mainly caused by CNV. Because gene expression can help predict immune cell infiltration, which is one of the main factors affecting patient OS, we also explored the relationships of SOXF family copy number and gene expression with immune cell infiltration.
B cell and CD8+ T cell infiltration levels were significantly reduced in the SOX7 high amplification population. B cell, CD4+ T cell, macrophage, neutrophil, and DC infiltration levels were reduced in the SOX17 high amplification population. Only the DC infiltration level was significantly reduced in the SOX18 high amplification population. Although only the expression of SOX7 was related to its CNV, the CNV of SOX17 and SOX18 was also related to immune cell infiltration, suggesting that SOX17 and SOX18 may act more through the control of downstream genes. SOXF group gene expression was positively correlated with CD4+ T cell infiltration, and macrophage infiltration was positively correlated with the expression of SOX7. LUAD patients enriched in CD4+ T cells and macrophages have been reported to have a better prognosis (59). DC perform antigen presentation, which in turn triggers the T cell-mediated anti-tumor immune response (60). This suggests that affecting immune infiltration may be one of the ways in which SOXF family genes affect patient prognosis. The expression of SOXF family genes was positively correlated with that of the immune cell marker genes CD4 and NOS2, which supports this inference. Further, the activation of SOX7 could modulate interleukin (IL)-33 to recruit tumor-associated macrophages, leading to metastasis, and SOX17 deletion also resulted in decreased inflammatory DC in the lungs (61). As only SOX7 showed a significant correlation between CNV and expression, we went on to explore other factors that may affect the expression of SOX17 and SOX18. Methylation of the promoter region has been reported to affect gene expression (37). We therefore examined the methylation of SOXF family genes.
The results showed that, across pathological stages, age groups, genders, and lymph node metastasis statuses, the promoter methylation level of SOXF family genes was very low in normal samples and significantly increased in tumor samples. Methylation of SOX7 and SOX17 was significantly negatively correlated with gene expression. This indicates that the low expression of SOX7 in LUAD is due not only to CNV but also to methylation. The expression of SOX17 is affected more by methylation, whereas the factors affecting SOX18 expression remain to be explored. Methylation of the promoter region of the RASSF1 gene has been reported to be an effective biomarker for the early diagnosis of lung cancer (37). To explore whether SOXF family genes have similar capabilities, we compared them with RASSF1. Between tumor and normal samples, the promoter methylation of all four genes differed significantly. Among them, the difference in mean methylation between tumor and normal samples was 0.194 for SOX17, which exceeded that of RASSF1. Further analysis showed that the AUC for the cg07660671 site of SOX7, the cg15377283 site of SOX17, and the cg24199599 site of SOX18 in distinguishing between normal and tumor tissue in LUAD, intestinal cancer, and breast cancer reached 0.9, indicating that SOXF family genes have potential as methylation biomarkers for lung cancer diagnosis. Finally, we analyzed the function of SOXF family genes and found six related genes: CTNNB1, LEF1, POU5F1, TCF7, TCF7L1, and TCF7L2. Enrichment analysis showed that these genes mainly participate in transcriptional regulation, the Wnt signaling pathway, and a variety of cancer signaling pathways.
These pathways are of great significance to tumorigenesis and cancer development (7,62), further confirming the value of these genes for cancer research. SOX7 disrupts the β-catenin (CTNNB1)/BCL9 interaction to suppress Wnt signaling and also inhibits CTNNB1/TCF-mediated transcription (63). SOX17 decreased the expression of β-catenin and other proteins in the Wnt signaling pathway (64), and β-catenin has been shown to be inversely correlated with CD8+ T cell infiltration (65). Inhibition of Wnt/β-catenin signaling might therefore improve CD8+ T cell infiltration and priming, and so make immune checkpoint inhibition more effective (66). Conversely, the induction of Wnt signaling plays an important role in maintaining the stemness of memory CD8+ T cells by blocking T cell differentiation (67). The Wnt/β-catenin pathway has also been associated with the modulation of innate immunity, for example through DC (68). Hence, SOX7 and SOX17 might modulate β-catenin and TCF to regulate the Wnt signaling pathway, resulting in changes in immune cell infiltration. In most solid tumors, the oxygen supply from abnormal blood vessels cannot meet the demands of rapidly proliferating cancer cells, and hypoxia occurs to some extent. The oxygenation level within a single tumor varies greatly between regions and changes over time. Tumor hypoxia is an important obstacle to effective tumor treatment. Because radiotherapy acts mainly by producing reactive oxygen species, hypoxic tumors are radiation resistant (69), and hypoxia also contributes to tumor metastasis (39). In our study, the expression of SOXF group genes, which is low in tumor tissue, was negatively correlated with tumor hypoxia. This provides further theoretical support for a tumor-promoting effect of low SOXF group gene expression.

Conclusions

In this article, we integrated bioinformatics tools and public databases to explore the regulatory mechanisms and diagnostic value of the SOXF gene family. First, we found that the SOXF family genes are expressed at low levels in multiple cancer types, but the mechanisms affecting the expression of each gene differ considerably. For SOX7, deep deletion, promoter methylation, and immune infiltration together were found to affect its expression in tumor samples. For SOX17, methylation and immune infiltration were the main factors affecting its expression. For SOX18, amplification and methylation play important roles in its expression. These analyses provide new ideas for studying gene expression mechanisms in cancer. Second, systematic analysis of gene expression and methylation in multiple cancer types indicated that SOXF gene expression affects patient prognosis and that promoter methylation of these genes is a potential biomarker for pan-cancer screening. Diagnostic methods and reagents could be developed based on these results.
