Yi Zheng1, Sumin Chi2, Chengxin Li1. 1. Department of Dermatology, The First Medical Center of Chinese PLA General Hospital, Beijing. 2. Department of Physiology, Air Force Medical University of PLA, Xi'an, China.
Abstract
Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with an increasing incidence. As a pre-cancerous condition, actinic keratosis (AK) has an up to 20% risk of progression to cSCC. This study aims to define the potential genes that associated with genesis and progression of cSCC, thereby further identify critical biomarkers for the prevention, early diagnosis, and effective treatment of cSCC.Two datasets GSE42677 and GSE45216 were downloaded from the GEO. Microarray data analysis was applied to explore the differentially expressed genes (DEGs) between cSCC samples and AK samples. Then functional enrichment analysis, protein-protein interaction (PPI) network, and drug-gene interaction analysis were performed to screen key genes.A total of 711 DEGs, including 238 upregulated genes and 473 downregulated genes, were screened out. DEGs mainly involved in pathways as extracellular matrix (ECM)-receptor interaction, hematopoietic cell lineage, phosphatidylinositol 3-kinase (PI3K-Akt) signaling pathway, and focal adhesion. Candidate genes, including upregulated genes as JUN, filamin A (FLNA), casein kinase 1 delta (CSNK1D), and histone cluster 1 H3 family member f (HIST1H3F), and downregulated genes as androgen receptor (AR), heat shock protein family H member 1 (HSPH1), tropomyosin 1 (TPM1), pyruvate kinase, muscle (PKM), LIM domain and actin binding 1 (LIMA1), and synaptopodin (SYNPO) were screened out. In drug-gene interaction analysis, 13 genes and 44 drugs were identified.This study demonstrates that genes JUN, FLNA, AR, HSPH1, and CSNK1D have the potential to function as targets for diagnosis and treatment of cSCC.
Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with an increasing incidence. As a pre-cancerous condition, actinic keratosis (AK) has an up to 20% risk of progression to cSCC. This study aims to define the potential genes that associated with genesis and progression of cSCC, thereby further identify critical biomarkers for the prevention, early diagnosis, and effective treatment of cSCC.Two datasets GSE42677 and GSE45216 were downloaded from the GEO. Microarray data analysis was applied to explore the differentially expressed genes (DEGs) between cSCC samples and AK samples. Then functional enrichment analysis, protein-protein interaction (PPI) network, and drug-gene interaction analysis were performed to screen key genes.A total of 711 DEGs, including 238 upregulated genes and 473 downregulated genes, were screened out. DEGs mainly involved in pathways as extracellular matrix (ECM)-receptor interaction, hematopoietic cell lineage, phosphatidylinositol 3-kinase (PI3K-Akt) signaling pathway, and focal adhesion. Candidate genes, including upregulated genes as JUN, filamin A (FLNA), casein kinase 1 delta (CSNK1D), and histone cluster 1 H3 family member f (HIST1H3F), and downregulated genes as androgen receptor (AR), heat shock protein family H member 1 (HSPH1), tropomyosin 1 (TPM1), pyruvate kinase, muscle (PKM), LIM domain and actin binding 1 (LIMA1), and synaptopodin (SYNPO) were screened out. In drug-gene interaction analysis, 13 genes and 44 drugs were identified.This study demonstrates that genes JUN, FLNA, AR, HSPH1, and CSNK1D have the potential to function as targets for diagnosis and treatment of cSCC.
Cutaneous squamous cell carcinoma (cSCC) is the second most frequent skin cancer, accounting for approximately 20% of all skin cancers.[ The morbidity of cSCC is steadily increasing, posing a significant threat to public health.[ The most significant risk factor for cSCC is actinic keratosis (AK), a precancerous lesion developed from the damage effects of chronical ultraviolet radiation.[ Although it is difficult to determine whether an AK will evolve into cSCC, up to 65% to 97% of cSCCs are reported to originate in lesions previously diagnosed as AKs.[ Defining critical molecular biomarkers to help identify which AKs are more likely to progress to cSCC is an urgent research need, which will contribute to the prevention, early diagnosis, and effective treatment of cSCC.It is reported that the karyotypic profile of AK is similar to that of cSCC, but the degree of complexity is reduced, representing an earlier stage of tumor formation development.[ Multiple genetic abnormalities have been observed in cSCC and important roles of some genes contributing to the progression of AK to cSCC as TP53, CDKN2A, and NOTCH have been proven.[ Up-regulation of matrix metalloproteinases (MMP) has been validated in many different types of tumors, and in particular the expression and production of MMP-7 proves to be enhanced in cSCC specifically.[ In addition, overexpression of interleukin (IL-24) was found in cSCC lesions via promoting focal expression of MMP7.[ Besides, the mitogen-activated protein kinase (MAPK) pathway is reported to be vital in the transition from AK to cSCC.[ However, current knowledge about the underlying molecular pathogenesis is poorly characterized and the development of molecular biomarkers for early diagnosis, treatment, and prognosis prediction of tumor remain underexplored.Deoxyribonucleic acid (DNA) microarray is now widely used in clinical research to identify differentially expressed genes (DEGs) between normal samples and tumor specimens. Secondary analysis of exiting DNA microarray data provides a comprehensive and responsible comparison of gene expression among different tissues.[ In this study, we tried to investigate the possible molecular mechanism of progression from AK to cSCC and to identify the key genes associated with cSCC and thus potentially provide novel treatment targets for cSCC management. DEGs of cSCC samples compared with controlled AK samples were screened out to explore the molecular biological mechanisms of cSCC.
Methods
Microarray data source
Microarray data containing both cSCC samples and AK samples were retrieved in the Gene Expression Omnibus (GEO), a database for gene expression managed by the National Center of Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/geo/). Two datasets with the accession number of GSE42677 and GSE45216 were selected for the following analysis.[ The dataset GSE42677 included 10 cSCC samples and 5 AK samples and the platform employed was GPL571 (HG-U133A_2) Affymetrix Human Genome U133A 2.0 Array. The dataset GSE45216 contained 30 cSCC samples and 10 AK samples and the platform used was GPL570 (HG-U133_Plus_2) Affymetrix Human Genome U133 Plus 2.0 Array. The data used in this study were downloaded from the openly available GEO dataset and ethical approval was waived.
Data pre-processing and identification of DEGs
The affy package (http://bioconductor.org/packages/release/bioc/html/affy.html, version 1.60.0), which contained functions for exploratory oligonucleotide array analysis, was used to perform the background adjustment and normalization so as to ensure the comparability and integrity of data.[ The Robust Multi-array Average (RMA) method was used for normalizing and calculating expression measures.[ The limma package (http://bioconductor.org/packages/limma/, version 3.38.3) was used to identify the DEGs between cSCC and AK.[ Genes with an absolute value of log 2 (fold change) greater than 1 and P < .05 were screened out as DEGs. To discard probes that did not match any gene symbol, probes were annotated with corresponding platform annotation file.
Functional enrichment analysis of DEGs
The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEEG) enrichment analysis of DEGs was performed using the online analysis tool the Database for Annotation Visualization and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/, version 6.8), a database providing investigators tools to comprehend biological meaning behind a number of genes.[ Given a gene list, the DAVID could identify enriched biological terms and detect enriched functional-related gene groups through standard accumulative hypergeometric statistical test. Significance threshold P < .05 together with the enriched gene number ≥2 were regarded as the threshold of significant enrichment analysis.
Protein network analysis
Protein networks provide a complementary means to dynamically identify protein groupings that are functionally relevant. Proteins connected within a protein-protein interaction (PPI) network are likely to collaborate to perform various related biological processes (BP). PPI networks were constructed with the Metascape (http://metascape.org/gp/index.html) and Cytoscape software (https://cytoscape.org/, version 3.7.3).[ In Metascape, PPI enrichment analysis was carried out with databases BioGrid, InWeb IM, and OmniPath.[ The Molecular Complex Detection (MCODE) plugin (http://apps.cytoscape.org/apps/mcode, version 1.5.1) of Cytoscape was used to cluster the PPI network to recognize the most densely connected network neighborhoods, where such neighborhood components were more likely to associate with a particular complex or functional unit.[ The threshold score was set at ≥10. Pathway and process enrichment analysis was performed and the best-scoring terms appraised by P value were regarded as the functional description.
Drug-gene interaction analysis
The Drug Gene Interaction Database (DGIdb, http://www.dgidb.org/) collects and integrates genomic resources and therapeutic resources to search genes against the existing compendia of known or potential drug-gene interactions.[ In this study, DGIdb 3.0 was used to identify possible effective drugs based on the above obtained module genes. All the gene-drug interactions related were predicted.
Results
DEGs screening
Data processing was applied on the raw data by affy and limma packages in R software. According to the above-mentioned screening criteria, we identified 302 DEGs from GSE42677 dataset, containing 137 upregulated genes and 165 downregulated genes. At the same time, 619 DEGs were screened out from GSE45216 dataset, the number of genes upregulated was 194 while 425 genes were downregulated. Figure 1 showed the clustering heatmaps of the top 200 DEGs. To avoid loss of disease-associated DEGs, DEGs from two datasets were combined. After removal of duplicate genes and genes with self-contradictory expression trends, 711 genes altogether were included in the next analysis, of which 238 were upregulated and 473 were downregulated.
Figure 1
Heatmaps of the top 200 DEGs. (A) Heatmaps of DEGs in GSE42677. (B) Heatmaps of DEGs in GSE45216. DEGs = differentially expressed genes.
Heatmaps of the top 200 DEGs. (A) Heatmaps of DEGs in GSE42677. (B) Heatmaps of DEGs in GSE45216. DEGs = differentially expressed genes.The KEEG pathway enrichment analysis results demonstrated that upregulated genes were enriched in 10 pathways, for example, transcriptional misregulation in cancer, rheumatoid arthritis, focal adhesion, and amoebiasis. Downregulated genes were enriched in 18 pathways, like extracellular matrix (ECM)-receptor interaction, hematopoietic cell lineage, phosphatidylinositol 3-kinase (PI3K-Akt) signaling pathway, and focal adhesion. The KEEG pathway terms enriched by DEGs were shown in Table 1. In terms of GO BP, genes upregulated were enriched in 62 GO BP terms, while the downregulated genes were enriched in 55 terms. The bubble plot showed the most significant 20 GO BP terms (Fig. 2). The results indicated that upregulated genes mainly participated in BP such as extracellular matrix organization, cell proliferation, inflammatory response, and blood coagulation. At the same time, genes downregulated mainly involved in BP such as epidermis development, cell adhesion, sphingolipid biosynthetic process, and skin development.
Table 1
The main KEEG pathways enriched by DEGs.
Figure 2
The bubble plot of the 20 most remarkable GO terms of genes upregulated and genes downregulated. (A) The 20 most remarkable GO terms of genes upregulated. (B) The 20 most remarkable terms of downregulated genes. GO = gene ontology.
The main KEEG pathways enriched by DEGs.The bubble plot of the 20 most remarkable GO terms of genes upregulated and genes downregulated. (A) The 20 most remarkable GO terms of genes upregulated. (B) The 20 most remarkable terms of downregulated genes. GO = gene ontology.
PPI network analysis
Figure 3A showed the PPI network analysis results, which contained 369 nodes and 862 edges. Nodes with higher topological scores were key nodes of PPI network. The key 10 gene nodes included JUN (also known as Jun proto-oncogene or activator protein 1 transcription factor subunit, upregulated, degree = 52), filamin A (FLNA, upregulated, degree = 34), androgen receptor (AR, downregulated, degree = 29), heat shock protein family H member 1 (HSPH1, downregulated, degree = 28), casein kinase 1 delta (CSNK1D, upregulated, degree = 27), tropomyosin 1 (TPM1, downregulated, degree = 26), pyruvate kinase, muscle (PKM, downregulated, degree = 24), histone cluster 1 H3 family member f (HIST1H3F, upregulated, degree = 23), LIM domain and actin binding 1 (LIMA1, downregulated, degree = 23), and synaptopodin (SYNPO, downregulated, Degree = 19). Pathway and process enrichment analysis showed that these genes mainly involved in transmembrane receptor protein tyrosine kinase signaling pathway, extracellular matrix organization, and regulation of cell adhesion. The MCODE algorithm were applied to identify densely connected network components and the MCODE networks identified for individual gene lists were gathered and shown in Figure 3B. The biggest 10 nodes of the PPI network were also hub genes in the MCODE networks.
Figure 3
PPI network and the MCODE networks. (A) PPI network. (B) MCODE networks. The network contains proteins that form physical interactions with at least one other member. The densely connected network components were identified with the MCODE algorithm. Node size was decided by the degree, higher degree represented a larger node. MCODE = molecular complex detection, PPI = protein-protein interaction,.
PPI network and the MCODE networks. (A) PPI network. (B) MCODE networks. The network contains proteins that form physical interactions with at least one other member. The densely connected network components were identified with the MCODE algorithm. Node size was decided by the degree, higher degree represented a larger node. MCODE = molecular complex detection, PPI = protein-protein interaction,.By searching the DGIdb for the module genes identified by the MCODE networks, 52 gene-drug interactions were obtained, containing 13 genes (4 upregulated genes: NOTCH1, CXCL8, JUN, CYP27B1; 9 downregulated genes: CYP3A5, AR, MAP2, EDNRB, CYP7B1, LRRK2, PTGER3, SAA1, MMP7) and 44 drugs (preset filters: FDA approved and antineoplastic). The drug-gene interaction results were shown in Figure 4.
Figure 4
Drug-gene interaction diagram, yellow squares represents the DEGs and blue squares represents the drugs. DEG = differentially expressed gene.
Drug-gene interaction diagram, yellow squares represents the DEGs and blue squares represents the drugs. DEG = differentially expressed gene.
Discussion
cSCC usually arises from interfollicular epidermal keratinocytes and are characterized by different degrees of keratosis.[ In the year 2015, up to 2.2 million people were troubled with cSCC.[ Although prognosis of cSCC is generally favorable, its five-year survival rate reduces to only 34% when invasion and distant metastasis occurs.[ Until now, the biomarker study of cSCC was insufficient and the identification of biomarkers indicating progression from AK to cSCC was significant and urgent, which might be beneficial in the prevention, earlier diagnosis and prognostic management of cSCC.In this study, 53 samples, 40 cSCC samples and 13 AK samples from datasets GSE42677 and GSE45216, were used for bioinformatics analysis, aiming to explore the potential molecular mechanism of cSCC. Finally, 238 upregulated genes and 473 downregulated genes were identified. All the 711 DEGs were subjected to following gene annotation, enrichment analysis and PPI analysis. Upregulated genes JUN, FLNA, CSNK1D, and HIST1H3F, and down regulated genes AR, HSPH1, TPM1, PKM, LIMA1, and SYNPO were identified as the potential target genes. JUN, FLNA, AR, HSPH1, and CSNK1D were the genes most differentially expressed in cSCC and AK.JUN, otherwise known as activator protein 1 transcription factor subunit, is a transcription factor that regulates gene expression to respond to a variety of stimuli, such as cytokines, stress, and viral or bacterial infections.[ JUN governs many cellular processes like cell proliferation, cell differentiation, and apoptosis.[ It is overexpressed in a variety types of human tumors, including cSCC.[ The activated state of JUN in answer to extracellular signals can be modified under conditions in which the balance of keratinocyte proliferation and differentiation is rapidly changed.[ Therefore, JUN was investigated as a possible target for cancer prevention and treatment.[ As an actin-binding protein encoded by the FLNA gene, FLNA crosslinks actin filaments and attaches actin filaments to membrane glycoproteins. FLNA not only participates in remodeling the cytoskeleton to influence migration and cell shape, but also interacts with integrins and second messengers.[ It is reported that regulation of epidermal growth factor dependent inactivation of α5β1 integrin is by FLNA phosphorylation and cellular contractility.[ As a member of the casein kinase I gene family, CSNK1D mainly participates in the control of cytoplasmic and nuclear processes, including DNA replication and repair.[ The upregulation of CSNK1D has been verified in squamous carcinoma.[The AR has 3 major functional domains: the N-terminal domain, the androgen binding domain, and the DNA binding domain.[ To act as a DNA-binding transcription factor and regulate gene expression is the main function of the AR. One of the known AR activation target genes is the insulin-like growth factor receptor (IGFR).[ HSPH1, also known as HSP105, acts as a nucleotide exchange factor for the molecular chaperone heat shock cognate 71 kDa protein (Hsc70).[ Furthermore, HSPH1 plays a unique but related role as a holdase that inhibits the aggregation of misfolded proteins, including the cystic fibrosis transmembrane conductance regulator (CFTR) protein.[ However, the specific role of HSPH1 in the genesis and invasion of cSCC was still undefined, which warrants more studies. Interestingly, JUN and AR were also of significance in drug-gene interaction network.Altogether, our study indicates that genes JUN, FLNA, AR, and HSPH1 may play vital roles in the occurrence and development of cSCC and may be used as potential biomarkers of cSCC, indicating an application in the improvement of prognostic tools and treatments of cSCC. However, it should be noted that the lack of experimental verifications of identified genes is a limitation of this study, which will be focused in future study.
Conclusion
Our study identified several candidate genes which may be closely related to the occurrence and progression of cSCC from AK via comprehensive analysis of microarray data. These genes may function as the biomarkers of cSCC and will contribute to the development of novel targeted therapies for cSCC.
Author contributions
Conceptualization: Yi Zheng, Chengxin Li.Data curation: Yi Zheng, Sumin Chi.Formal analysis: Yi Zheng, Sumin Chi.Funding acquisition: Chengxin Li.Investigation: Chengxin Li.Methodology: Yi Zheng, Sumin Chi.Project administration: Chengxin Li.Resources: Chengxin Li.Software: Yi Zheng, Sumin Chi.Supervision: Chengxin Li.Validation: Sumin Chi, Chengxin Li.Writing – original draft: Yi Zheng.Writing – review & editing: Yi Zheng, Sumin Chi, Chengxin Li.
Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker Journal: Genome Res Date: 2003-11 Impact factor: 9.043
Authors: Nick Z Lu; Suzanne E Wardell; Kerry L Burnstein; Donald Defranco; Peter J Fuller; Vincent Giguere; Richard B Hochberg; Lorraine McKay; Jack-Michel Renoir; Nancy L Weigel; Elizabeth M Wilson; Donald P McDonnell; John A Cidlowski Journal: Pharmacol Rev Date: 2006-12 Impact factor: 25.468
Authors: A S Jonason; S Kunala; G J Price; R J Restifo; H M Spinelli; J A Persing; D J Leffell; R E Tarone; D E Brash Journal: Proc Natl Acad Sci U S A Date: 1996-11-26 Impact factor: 11.205
Authors: Nora Eisemann; Annika Waldmann; Alan C Geller; Martin A Weinstock; Beate Volkmer; Ruediger Greinert; Eckhard W Breitbart; Alexander Katalinic Journal: J Invest Dermatol Date: 2013-07-22 Impact factor: 8.551
Authors: Nicholas J Wang; Zachary Sanborn; Kelly L Arnett; Laura J Bayston; Wilson Liao; Charlotte M Proby; Irene M Leigh; Eric A Collisson; Patricia B Gordon; Lakshmi Jakkula; Sally Pennypacker; Yong Zou; Mimansa Sharma; Jeffrey P North; Swapna S Vemula; Theodora M Mauro; Isaac M Neuhaus; Philip E Leboit; Joe S Hur; Kyunghee Park; Nam Huh; Pui-Yan Kwok; Sarah T Arron; Pierre P Massion; Allen E Bale; David Haussler; James E Cleaver; Joe W Gray; Paul T Spellman; Andrew P South; Jon C Aster; Stephen C Blacklow; Raymond J Cho Journal: Proc Natl Acad Sci U S A Date: 2011-10-17 Impact factor: 11.205
Authors: Shari R Atilano; Sina Abedi; Narcisa V Ianopol; Mithalesh K Singh; J Lucas Norman; Deepika Malik; Payam Falatoonzadeh; Marilyn Chwa; Anthony B Nesburn; Baruch D Kuppermann; M Cristina Kenney Journal: Cells Date: 2022-08-26 Impact factor: 7.666