Analysis of differential gene expression caused by cervical intraepithelial neoplasia based on GEO database.

Shenghui Yao1, Taifeng Liu2.   

Abstract

The aim of the present study was to identify the differentially expressed genes between cervical intraepithelial neoplasia (CIN) and adjacent normal tissue, and to construct a protein-protein interaction (PPI) network. A CIN dataset was obtained from Gene Expression Omnibus, and gene expression data for CIN and adjacent normal tissue were extracted from GSE64217. The differentially expressed genes were selected using a software package, and a heat map was drawn using the 'pheatmap' package. The selected differentially expressed genes were subjected to PPI, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis using Cytoscape, Database for Annotation, Visualization and Integrated Discovery, STRING and KOBAS. In the present study, 287 genes were differentially expressed between CIN and adjacent normal tissue, of which 170 were significantly upregulated and 118 genes were significantly downregulated (P<0.00001, fold-change >6). A differential gene expression network map was constructed using STRING software to show the interactions of 30 protein products encoded by differentially expressed genes. In particular, the key gene, EGR1, was identified using Cytoscape software. The KEGG pathway analysis revealed that the differential genes were mainly involved in several pathways, including 'glutathione metabolism', 'arachidonic acid metabolism', and 'pentose phosphate pathway'. Results of the GO analysis showed that differential genes were enriched in different subsets. Specifically, small proline-rich protein 2E and 3, distal-less homeobox 5, epithelial membrane protein 1, cornifelin, periplakin, homeobox protein Hox-A13, estrogen receptor α, transglutaminase 1, small proline-rich protein 2A, Rh C glycoprotein, tumor protein p63, TGM3, homeobox B5 and small proline-rich protein 2D were enriched in 'epithelial cell differentiation', which affected the differentiation of epithelial cells.
In conclusion, 287 differentially expressed genes were identified successfully. The key gene was identified based on the results of PPI, GO and KEGG analyses, and functional annotation and pathway analysis were also performed. Our study provides the basis for further studies on the interaction among differentially expressed genes.

Keywords:  KEGG pathway analysis; differential genes; gene ontology analysis; intraepithelial neoplasias; protein-protein interaction networks

Year:  2018        PMID: 29805564      PMCID: PMC5950031          DOI: 10.3892/ol.2018.8403

Source DB:  PubMed          Journal:  Oncol Lett        ISSN: 1792-1074            Impact factor:   2.967


Introduction

Cervical cancer is a common malignant tumor (1) with an incidence of up to 1.2–4.5 per 10,000 women giving birth. The incidence of cervical cancer has shown an increasing trend in Chinese women (2,3). Cervical intraepithelial neoplasia (CIN) is a type of precancerous lesion closely related to cervical cancer. With a high incidence of up to 6.5% (4), CIN significantly impacts the development of cervical cancer. Therefore, accurate diagnosis of CIN and a better understanding of its pathogenesis should improve the prevention of cervical cancer (5). However, most studies have focused on the treatment of this disease, and studies on its pathogenesis are relatively rare (6,7). In recent years, with the explosion of gene expression data, bioinformatics-based data mining for gene expression profile analysis has become an active research field (8,9). In the present study, bioinformatics methods were applied to gene expression data to identify differentially expressed genes in CIN tissue. Our study provides a reference for further studies on the molecular pathogenesis of CIN.

Materials and methods

Acquisition of gene expression profiling data

Gene expression profiling data with the series number GSE64217 were obtained from the Gene Expression Omnibus (GEO) database. GSE64217 was provided by the Indian Institute of Technology Kharagpur, School of Medical Science and Technology, Multimodal Imaging and Computing for Theranostics (West Bengal, India). The data included 2 cases of CIN, of cervical squamous cell carcinoma, and 2 of normal tissues. Biopsy samples were collected during hysterectomy; half of each sample was examined by a pathologist with optical microscopy (Olympus, Tokyo, Japan) for histopathological confirmation, and the other half was used for microarray analysis.

Pretreatment of raw data, identification of differential genes, and preparation of a heat map

Statistical analysis of the chip data was performed using BRB-ArrayTools 4.3.2 Beta software. Chip data were first pre-treated using the JustRMA algorithm, then filtered and normalized using a median-based method. Chip data were filtered according to the following criteria: i) a difference of at least two-fold from the gene's median should be observed in ≥20% of the samples when comparing the two types of samples; and ii) missing gene expression data should be ≤50%. The filtered genes were tested with an independent-samples t-test. Classification and comparison of the dataset were performed with the Class comparison tool to identify differentially expressed genes between CIN and normal tissue (P<0.00001). Finally, a heat map was drawn using the ‘pheatmap’ package in ‘R’ software, and differentially expressed genes were highlighted.
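
The filtering and testing steps described above can be illustrated with a minimal Python sketch. It is not the BRB-ArrayTools implementation: the function names, the use of log2-scale values, the reading of criterion i) as a deviation from the per-gene median, and the toy expression values are all assumptions made for illustration. Only the t statistic is computed here; the corresponding P-value would come from the t distribution with n1 + n2 − 2 degrees of freedom.

```python
import math
import statistics

def passes_filter(log2_values, min_fold=2.0, min_fraction=0.2, max_missing=0.5):
    """Return True if a gene passes both filters; None marks a missing value."""
    present = [v for v in log2_values if v is not None]
    missing_fraction = 1 - len(present) / len(log2_values)
    if not present or missing_fraction > max_missing:
        return False
    median = statistics.median(present)
    threshold = math.log2(min_fold)  # a two-fold change is 1 unit on the log2 scale
    changed = sum(1 for v in present if abs(v - median) >= threshold)
    return changed / len(present) >= min_fraction

def pooled_t_statistic(group_a, group_b):
    """Independent-samples (equal-variance) t statistic for two groups."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical log2 expression values for one gene (not taken from GSE64217).
cin = [10.0, 10.4]
normal = [7.0, 7.2]
if passes_filter(cin + normal):
    print("t statistic:", round(pooled_t_statistic(cin, normal), 3))
```

With only two samples per group, as in the dataset described above, the test has two degrees of freedom, so very large t statistics are needed to reach a stringent P-value threshold.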

Gene Ontology (GO) enrichment analysis

Differentially expressed genes were subjected to GO enrichment analysis and functional annotation using the Database for Annotation, Visualization and Integrated Discovery (DAVID) and ‘BiNGO’ (a plug-in for Cytoscape software). DAVID integrates all the major public bioinformatics resources and can be used to interpret genes in terms of biological mechanisms by providing enrichment analysis with standardized genetic terminologies. The DAVID database aims to provide rapid access to heterogeneous annotation data and to enhance the biological information available for the individual genes in a gene list by enabling high-throughput gene function analysis. DAVID can be accessed free of charge (https://david.ncifcrf.gov/).

Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis

KEGG pathway analysis and functional annotation for differential genes were performed using KOBAS 3.0, which is the first hypergeometric distribution-based examination software to evaluate the significance of enrichment of pathways, and has been successfully applied in pathway analysis for plants, animals, bacteria, and other organisms. KOBAS server can be accessed via https://kobas.cbi.pku.edu.cn.
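
The hypergeometric test mentioned above can be sketched as follows. This is an illustrative calculation rather than the KOBAS implementation; the function name and the background and pathway sizes in the example are hypothetical, while the list size (287) and hit count (5) echo figures reported in this study.

```python
from math import comb

def hypergeom_enrichment_p(background, in_pathway, list_size, hits):
    """Upper-tail probability P(X >= hits) for X ~ Hypergeometric.

    background: number of annotated genes in the background set
    in_pathway: number of background genes belonging to the pathway
    list_size:  number of genes in the differential gene list
    hits:       number of listed genes that belong to the pathway
    """
    upper = min(in_pathway, list_size)
    total = comb(background, list_size)
    tail = sum(comb(in_pathway, i) * comb(background - in_pathway, list_size - i)
               for i in range(hits, upper + 1))
    return tail / total

# Hypothetical background of 20,000 genes and a 50-gene pathway.
p = hypergeom_enrichment_p(background=20000, in_pathway=50, list_size=287, hits=5)
print(f"enrichment P-value: {p:.3g}")
```

A small P-value indicates that the pathway contains more genes from the list than would be expected if the list had been drawn at random from the background.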

Protein-protein interaction (PPI) network analysis

Differential genes were subjected to PPI network analysis using STRING software. PPI refers to the protein complex formed by two or more proteins through non-covalent bonds. STRING can be accessed free of charge via https://string-db.org/.

Results

Identification of differential genes and preparation of the heat map

A total of 287 differential genes were obtained based on GSE64217, and 170 genes were significantly upregulated and 118 genes were significantly downregulated (P<0.00001, fold-change >6). Representative differential genes are presented in Table I. Fifty differential genes with the lowest P-values were analyzed in the heat map (Fig. 1).
Table I.

Major differential genes.

Gene       logFC            P-value     Adjusted P-value
SPRR2A     17458.83345      6.83E-11    1.35E-06
SPRR2E     10240.24442      1.30E-10    1.35E-06
TGM3       4833.325709      1.26E-09    8.70E-06
SCGB1D2    5415.64156       2.82E-09    1.26E-05
KLK13      2711.589154      3.03E-09    1.26E-05
NCCRP1     4315.455575      1.11E-08    3.84E-05
AKR1B10    −5251.888875     1.38E-08    4.09E-05
KLK12      1742.502479      1.75E-08    4.54E-05
MSLN       2480.833944      2.15E-08    4.97E-05
RPTN       1595.155099      4.20E-08    8.71E-05
Figure 1.

Heat map analysis of the first 50 differential genes. Major enrichment areas of differential genes are highlighted.
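
The source does not state how the adjusted P-values in Table I were computed. The repeated adjusted values (for example, 1.35E-06 for the two smallest P-values) are what a step-up procedure such as Benjamini-Hochberg would produce, so the following Python sketch assumes that method. The function name and the toy P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy P-values, not taken from the study.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because each adjusted value is capped by the one ranked after it, neighbouring genes can share the same adjusted P-value even when their raw P-values differ.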

GO enrichment analysis

The list of differential genes was uploaded to DAVID bioinformatics resource network (https://david.ncifcrf.gov/). The identifier was set as OFFICIAL_GENE_SYMBOL and list type as Gene List. Other parameters were all default. The results showed that differential genes were mainly concentrated in the following fields: ‘Epithelial cell differentiation’, ‘epithelium development’, and ‘epidermis development’, which can affect the development and differentiation of epithelial cells (Fig. 2).
Figure 2.

GO enrichment result of differential genes. The x-axis represents the enrichment of GO, and the y-axis represents the count and ratio of differential genes. Different colors correspond to different GO categories. GO, Gene Ontology.

KEGG pathway analysis

KOBAS 3.0 software was used for KEGG pathway analysis and functional annotation of differential genes, and five key KEGG pathways were identified, including ‘glutathione metabolism’, ‘arachidonic acid metabolism’, and ‘pentose phosphate pathway’, among which the ‘glutathione metabolism’ and ‘arachidonic acid metabolism’ pathways were considered to be the two most important (Table II).
Table II.

KEGG enrichment outcome of differential genes.

Term                                              Count    P-value        FDR
hsa00480: Glutathione metabolism                  5        0.006573368    7.715865143
hsa00590: Arachidonic acid metabolism             5        0.012975437    14.70170689
hsa00030: Pentose phosphate pathway               3        0.067690311    57.40275058
hsa05230: Central carbon metabolism in cancer     4        0.068191912    57.68095253
hsa04610: Complement and coagulation cascades     4        0.081430803    64.44748515

Term, enriched KEGG pathway; Count, number of differential genes within the term; P-value, P-value of the enrichment analysis; FDR, false discovery rate (adjusted P-value); KEGG, Kyoto Encyclopedia of Genes and Genomes.

PPI analysis

PPI analysis using STRING software identified 30 prominent proteins, of which estrogen receptor α (ESR1), STAT1, AURKA and GAK were relatively important. EGR1 was considered to be the most important protein and connected 14 nodes (Figs. 3 and 4).
Figure 3.

Protein-protein interaction (PPI) map. Circle represents the gene; line, protein-protein interaction (PPI); result in the circle, structure of the protein. Color of the arrows indicates varying PPI evidence.

Figure 4.

Histogram of key proteins. The y-axis represents the name of genes, the x-axis represents the number of adjacent genes, and height is the number of gene connections.
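
Ranking proteins by their number of connections, as in Fig. 4, amounts to computing node degree in the PPI network. The following Python sketch shows one way to do this from an edge list; it is an assumption about the approach, not the Cytoscape procedure used in the study, and the node names and edges are placeholders rather than STRING output.

```python
from collections import defaultdict

def node_degrees(edges):
    """Count distinct interaction partners per node in an undirected network."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(partners) for node, partners in neighbors.items()}

# Placeholder edge list; ("B", "A") duplicates ("A", "B") and is counted once.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
degrees = node_degrees(edges)
hub = max(degrees, key=degrees.get)
print(hub, degrees[hub])
```

Degree is only one notion of importance in a network; other centrality measures could rank the same proteins differently.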

Discussion

Cervical cancer is a common malignant tumor in women (10). In China, approximately 30,000 deaths and 100,000 new cases are reported annually (11). CIN, the precancerous lesion closely related to cervical cancer, is considered to be of great significance in studies of cervical cancer (12,13). Transition from CIN to cervical cancer may take as long as 10 years (14). Therefore, early diagnosis, timely follow-up and early treatment of CIN may effectively inhibit the development of cervical cancer (15). The latest report of the American Society of Clinical Oncology (ASCO) showed that patients with CIN were becoming younger, especially among urban residents (16).
In order to investigate the molecular pathogenesis of CIN, we analyzed the differential expression between CIN patients and healthy controls using gene expression profiling data with multiple bioinformatic methods, including enrichment analysis and PPI analysis. In the present study, strict inclusion criteria were followed to select the most reliable chips from microarray candidates. The reliability of the results was secured by the use of microarray data from multiple samples, which can reduce the error rate. Based on the GEO public database, we analyzed and integrated the chip data using a software package, and the resulting 287 differential genes were further subjected to PPI analysis with STRING.
PPI analysis of the differential genes showed that the EGR1 and ESR1 genes are important factors affecting CIN. EGR1 protein is regarded as the protein with the most significant impact on CIN. Van den Brandt et al (17) suggested that EGR1 was closely associated with the development of myopia and non-small cell lung cancer in humans. EGR1 is also an important factor affecting cell dysplasia, dedifferentiation, and the synthesis of nucleoli and ribosomes (18). The activation of EGR1 is related to the normal growth and differentiation of cells, whereas cell dysplasia and dedifferentiation play an important role in the development of CIN. Therefore, EGR1 can serve as a marker for the diagnosis of CIN. Schiavon et al (19) suggested that ESR1 is a potential risk factor for breast cancer and can be used as a tumor marker for targeted therapy of breast cancer. The biological effects of ESR1 mainly affect estrogen-targeted organs. ESR1 is mainly expressed in the cytoplasm of cervical cancer cells, but not in the nucleus, possibly due to blocked protein translation and modification. However, with the progression of CIN, the expression of ESR1 in epithelial cells gradually declines, indicating that ESR1 can be used as a marker for the early diagnosis of CIN. Further studies are needed to confirm these conclusions.
GO enrichment analysis found that differential expression between CIN cells and normal cells was mainly observed in epithelial cell differentiation, epithelial cell development, and epidermal development. Nagaoka et al (20) concluded that the biological processes of ‘epithelial cell differentiation’ and ‘development’ played an equally dominant role in the pathogenesis of breast and lung cancer, making the two the focus of study on the biological process of lung cancer. The specific maintenance of differentiation ability in squamous epithelial cells is an important feature of CIN, indicating the important impact of EGR1 and ESR1 on CIN. KEGG pathway analysis revealed the dominant role of the glutathione metabolism and arachidonic acid metabolism pathways in CIN. A previous study by Liu et al (21) suggested that glutathione metabolism was involved in various aspects of the development of cancers by affecting the rate of cancer progression. Glutathione, which can be found in every cell in the body, plays important roles in the maintenance of normal immune system function. Glutathione has been used widely as a basic ingredient in functional foods due to its function in improving resistance and inhibiting tumorigenesis. Arachidonic acid can be synthesized in the human body from linoleic acid. The metabolism of arachidonic acid can affect the cell proliferation rate, which is related to the cell dysplasia of CIN. These four signal pathways also play pivotal roles in the progression of other tumors, but their correlation with CIN has not yet been reported. The specific mechanism remains to be further explored.
The chip data used in this study are relatively old and the sample size was relatively small. Considering that CIN-relevant genes may vary with contributing factors or demographic characteristics (region and ethnicity), occult genetic differences may exist. We have reduced avoidable human error to the lowest possible level. Tumor development is difficult to predict. Further studies should focus on the gene and pathway candidates to elucidate the mechanism. The analysis identified that CIN was closely related to the EGR1 and ESR1 genes, epithelial cell differentiation and glutathione metabolism. However, the internal connection among these three factors remains to be explored in further studies. Considering that only a few studies on CIN-relevant genes have been reported, a better understanding of CIN at the genetic level may significantly benefit the diagnosis, treatment and prognosis of CIN.

References

1.  Development of a record linkage protocol for use in the Dutch Cancer Registry for Epidemiological Research.

Authors:  P A Van den Brandt; L J Schouten; R A Goldbohm; E Dorant; P M Hunen
Journal:  Int J Epidemiol       Date:  1990-09       Impact factor: 7.196

2.  [Significance and implication on changes of serum squamous cell carcinoma antigen in the diagnosis of recurrence squamous cell carcinoma of cervix].

Authors:  Qun Li; Shuyu Liu; Hongli Liu; Jing Zhang; Suyang Guo; Lihua Wang
Journal:  Zhonghua Fu Chan Ke Za Zhi       Date:  2015-02

3.  Expression of PBK/TOPK in cervical cancer and cervical intraepithelial neoplasia.

Authors:  Qiong Luo; Bin Lei; Shuguang Liu; Yaowen Chen; Wenjie Sheng; Peixin Lin; Wenxia Li; Haili Zhu; Hong Shen
Journal:  Int J Clin Exp Pathol       Date:  2014-10-15

4.  From cancer screening to treatment: service delivery and referral in the National Breast and Cervical Cancer Early Detection Program.

Authors:  Jacqueline W Miller; Vivien Hanson; Gale D Johnson; Janet E Royalty; Lisa C Richardson
Journal:  Cancer       Date:  2014-08-15       Impact factor: 6.860

5.  Human papillomavirus DNA testing for the detection of cervical intraepithelial neoplasia grade 3 and cancer: 5-year follow-up of a randomised controlled implementation trial.

Authors:  N W J Bulkmans; J Berkhof; L Rozendaal; F J van Kemenade; A J P Boeke; S Bulk; F J Voorhorst; R H M Verheijen; K van Groningen; M E Boon; W Ruitinga; M van Ballegooijen; P J F Snijders; C J L M Meijer
Journal:  Lancet       Date:  2007-10-04       Impact factor: 79.321

6.  Natural history of cervical neoplasia and risk of invasive cancer in women with cervical intraepithelial neoplasia 3: a retrospective cohort study.

Authors:  Margaret R E McCredie; Katrina J Sharples; Charlotte Paul; Judith Baranyai; Gabriele Medley; Ronald W Jones; David C G Skegg
Journal:  Lancet Oncol       Date:  2008-04-11       Impact factor: 41.316

7.  Analysis of ESR1 mutation in circulating tumor DNA demonstrates evolution during therapy for metastatic breast cancer.

Authors:  Gaia Schiavon; Sarah Hrebien; Isaac Garcia-Murillas; Rosalind J Cutts; Alex Pearson; Noelia Tarazona; Kerry Fenwick; Iwanka Kozarewa; Elena Lopez-Knowles; Ricardo Ribas; Ashutosh Nerurkar; Peter Osin; Sarat Chandarlapaty; Lesley-Ann Martin; Mitch Dowsett; Ian E Smith; Nicholas C Turner
Journal:  Sci Transl Med       Date:  2015-11-11       Impact factor: 17.956

8.  Emerging regulatory paradigms in glutathione metabolism.

Authors:  Yilin Liu; Annastasia S Hyde; Melanie A Simpson; Joseph J Barycki
Journal:  Adv Cancer Res       Date:  2014       Impact factor: 6.242

9.  Low risk of cervical cancer during a long period after negative screening in the Netherlands.

Authors:  M E van den Akker-van Marle; M van Ballegooijen; J D F Habbema
Journal:  Br J Cancer       Date:  2003-04-07       Impact factor: 7.640

10.  Association between cervical lesion grade and micronucleus frequency in the Papanicolaou test.

Authors:  Caroline Tanski Bueno; Cláudia Maria Dornelles da Silva; Regina Bones Barcellos; Juliana da Silva; Carla Rossana Dos Santos; João Evangelista Sampaio Menezes; Honório Sampaio Menezes; Maria Lucia Rosa Rossetti
Journal:  Genet Mol Biol       Date:  2014-09       Impact factor: 1.771
