Yingchun Hu1, Lingxia Cheng1, Wu Zhong1, Muhu Chen1, Qian Zhang2. 1. Department of Emergency Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, China (mainland). 2. Department of Infectious Diseases, The Affiliated Hospital of Southwest Medical University, Luzhou, Sichuan, China (mainland).
Abstract
BACKGROUND Septic shock occurs when sepsis is associated with critically low blood pressure, and has a high mortality rate. This study aimed to undertake a bioinformatics analysis of gene expression profiles for risk prediction in septic shock. MATERIAL AND METHODS Two good quality datasets associated with septic shock were downloaded from the Gene Expression Omnibus (GEO) database, GSE64457 and GSE57065. Patients with septic shock had both sepsis and hypotension, and a normal control group was included. The differentially expressed genes (DEGs) were identified using OmicShare tools based on R. Functional enrichment of DEGs was analyzed using DAVID. The protein-protein interaction (PPI) network was established using STRING. Survival curves of key genes were constructed using GraphPad Prism version 7.0. Each putative central gene was analyzed by receiver operating characteristic (ROC) curves using MedCalc statistical software. RESULTS GSE64457 and GSE57065 included 130 RNA samples derived from whole blood from 97 patients with septic shock and 33 healthy volunteers to obtain 975 DEGs, 455 of which were significantly down-regulated and 520 were significantly upregulated (P<0.05). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis identified significantly enriched DEGs in four signaling pathways, MAPK, TNF, HIF-1, and insulin. Six genes, WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP in the center of the PPI network were associated with septic shock, according to survival curve and ROC analysis. CONCLUSIONS Bioinformatics analysis of gene expression profiles identified four signaling pathways and six genes, potentially representing molecular mechanisms for the occurrence, progression, and risk prediction in septic shock.
Sepsis, or septicemia, is the response to systemic infection and is characterized by fever, tachycardia, and leukocytosis [1,2]. Septic shock occurs when sepsis is associated with critically low blood pressure and has a high mortality rate [2,3]. Sepsis and septic shock are common acute illnesses treated in the intensive care unit (ICU) [3]. Worldwide, it has been estimated that approximately 8 million people die annually from sepsis, usually from septic shock, and that circulatory, cellular, and metabolic abnormalities can significantly increase mortality rates [1,4].
Based on evidence from patient database analysis, the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3) proposed that the clinical definition of septic shock should include hypotension that requires vasopressor treatment despite adequate fluid resuscitation, and a serum lactate level >2 mmol/L [4,5]. Although active fluid resuscitation improves tissue perfusion in patients with septic shock, and vasoactive drugs and anti-infective and symptomatic treatments are used [6-8], the 28-day mortality, ICU mortality, and in-hospital mortality in patients with septic shock remain high [6]. Also, the likelihood of readmission after hospital discharge is higher than for general ICU patients, and a significant proportion of patients exhibit cognitive and functional impairment following treatment [9-11]. To reduce patient mortality and improve quality of life following resuscitation from septic shock, there has been continued clinical and preclinical research, but with few developments. However, bioinformatics analysis of gene expression profiles has the potential to identify genes and pathways that may stratify patients by risk and lead to new approaches for the diagnosis and treatment of septic shock.
Bioinformatics integrates computer science and life science to screen large amounts of molecular and clinical data, using data mining with pathway analysis, statistical analysis, and visual processing to investigate disease at the molecular level, and has been widely used in research on sepsis [2,12,13]. In some studies, bioinformatics has identified disease-related molecules, which have then been investigated and validated in clinical trials [14,15]. Although several biomarkers have previously been identified and tested in the treatment of sepsis, few RNA expression profiles have been studied in septic shock [12,16].
Therefore, this study aimed to undertake a bioinformatics analysis of gene expression profiles for risk prediction in septic shock.
Material and Methods
Data sources
The Gene Expression Omnibus (GEO) database from the National Center for Biotechnology Information (NCBI), which stores curated gene expression datasets, was used. First, the term ‘septic shock’ was entered into the GEO search box as a keyword, and ‘Homo sapiens’ was selected as the organism. Datasets with more than 20 patient samples were selected. Poor quality datasets (such as GSE48080 and GSE63042) were excluded, and the datasets GSE64457 and GSE57065 were selected for this study. Patients with sepsis and hypotension were included in the ‘septic shock’ group, while healthy people were included in the ‘control’ group. The gene expression profile data of GSE64457 included whole blood-derived RNA samples from 15 patients with septic shock and eight healthy volunteers. The expression data of GSE57065 were generated from 82 patients with septic shock and 25 healthy volunteers. The platform used for both datasets was the GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array (Affymetrix, Santa Clara, CA, USA).
Data preprocessing and analysis of differentially expressed genes (DEGs)
After log transformation of the original data, OmicShare online tools based on R (www.omicshare.com/tools) were used to identify the DEGs between the septic shock and control cases. P-values were corrected to limit false-positive results, and genes with a P-value <0.05 were considered to be initial DEGs. The final DEGs were defined as the intersection of the upregulated genes in the two datasets, together with the intersection of the down-regulated genes in the two datasets.
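The selection logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the correction method is not named in the text, so the Benjamini-Hochberg procedure is used here as one common choice, and the per-gene log2 fold changes and raw P-values are assumed to have been computed already. The function names are hypothetical and are not part of OmicShare.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the source does not name its correction method; BH is
    used here purely as a common example.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(genes, log2fc, pvalues, alpha=0.05):
    """Split genes that pass the adjusted cutoff into up and down sets."""
    adj = benjamini_hochberg(pvalues)
    up = {g for g, fc, q in zip(genes, log2fc, adj) if q < alpha and fc > 0}
    down = {g for g, fc, q in zip(genes, log2fc, adj) if q < alpha and fc < 0}
    return up, down


def final_degs(dataset_a, dataset_b):
    """Keep only genes that change in the same direction in both datasets."""
    up_a, down_a = call_degs(*dataset_a)
    up_b, down_b = call_degs(*dataset_b)
    return up_a & up_b, down_a & down_b
```

Each dataset is passed as a `(genes, log2fc, pvalues)` tuple. A gene that is upregulated in one dataset and down-regulated in the other falls out of both intersections, which matches the definition of the final DEGs given above.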
Gene Ontology (GO) and pathway enrichment analysis
GO is a widely used framework for classifying genes and their properties, including molecular function, biological processes (BPs), and cellular components, and provides comprehensive functional annotation that allows investigators to link significant genes with specific functions. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) was used to identify the overrepresented GO categories in the BPs with a P-value <0.05. Kyoto Encyclopedia of Genes and Genomes (KEGG) is an online database that collects information on genomic, biochemical, and enzymatic pathways. The DEGs were mapped to the KEGG database and the significantly related pathways were screened. The GO functional and KEGG pathway analyses were performed using DAVID. The results were visualized using OmicShare tools.
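The idea behind overrepresentation testing can be illustrated with a plain hypergeometric tail probability. This is a minimal sketch and not DAVID's implementation: DAVID reports a modified Fisher exact P-value, and the function names and arguments below are hypothetical. The second helper computes the rich factor in the sense used in the Figure 2 legend, assuming it means the fraction of a term's genes that are DEGs.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term, n_degs, n_overlap):
    """Probability of seeing at least n_overlap term genes among the DEGs.

    n_background: genes in the reference set
    n_term:       reference genes annotated to the GO term or pathway
    n_degs:       DEGs submitted (all assumed to be in the reference set)
    n_overlap:    DEGs annotated to the term
    """
    total = comb(n_background, n_degs)
    upper = min(n_term, n_degs)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def rich_factor(n_overlap, n_term):
    """Fraction of a term's annotated genes that are DEGs (assumed meaning)."""
    return n_overlap / n_term
```

A term is called enriched when this probability falls below the chosen cutoff (P<0.05 in this study). With many terms tested, a multiple-testing correction would normally be applied as well.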
Protein–protein interaction (PPI) network construction
The STRING database is used to search for interactions between known proteins and predicted proteins. PPI network analysis investigates the molecular mechanisms underlying various diseases and identifies the targets of new drugs from a systematic perspective. The STRING database was used in this study to annotate the functional interactions between target genes and other genes, based on node degree within the network, with a combined score >0.3. In theory, the more connections a gene has, the more important it is, because of its wider associations with other genes.
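Ranking genes by node degree, as described above, can be sketched as follows. This assumes the interactions have been exported as `(gene_a, gene_b, combined_score)` tuples with scores scaled from 0 to 1. The function name and the edge format are hypothetical and are not part of STRING.

```python
from collections import Counter


def hub_genes(edges, min_score=0.3, top_n=6):
    """Rank genes by the number of distinct partners above the score cutoff."""
    degree = Counter()
    seen = set()
    for gene_a, gene_b, score in edges:
        pair = frozenset((gene_a, gene_b))
        # Skip self-loops, weak interactions, and repeated pairs.
        if gene_a == gene_b or score <= min_score or pair in seen:
            continue
        seen.add(pair)
        degree[gene_a] += 1
        degree[gene_b] += 1
    # Sort by degree (highest first), then by name for a stable order.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

Degree is the simplest centrality measure. Whether the six core genes in this study were chosen by degree alone is not stated, so the ranking here should be read as one plausible way to shortlist central nodes.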
Survival curve analysis
In the GSE65682 dataset, which included 479 patients with sepsis, expression data for each gene and prognosis were recorded for each patient. The survival curves were generated using GraphPad Prism version 7.0 (GraphPad Software, La Jolla, CA, USA). The combination of gene expression and survival time enabled prediction of the function of a specific gene in septic shock.
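A survival curve of this kind is typically a Kaplan-Meier estimate computed separately for each expression group. The sketch below assumes that approach, and it also assumes a median split into high-expression and low-expression groups, since the cutoff used in this study is not stated. The function names are hypothetical.

```python
from statistics import median


def split_by_median(expression):
    """Return patient indices for high and low expression (assumed cutoff)."""
    cutoff = median(expression)
    high = [i for i, x in enumerate(expression) if x > cutoff]
    low = [i for i, x in enumerate(expression) if x <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed death time.

    events[i] is True if patient i died at times[i], False if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve
```

Running `kaplan_meier` once on the high group and once on the low group gives the two step functions that are plotted against each other. A log-rank test would then be the usual way to compare them.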
MedCalc statistical software was used to perform ROC curve analysis and to determine the specificity, sensitivity, likelihood ratios, positive-predictive values, and negative-predictive values for all possible thresholds of the ROC curve. The diagnostic value of the genes was assessed based on the ROC curve analysis.
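The quantities behind an ROC analysis can be computed directly from expression values and group labels. The following is a minimal sketch, not MedCalc's procedure: it assumes a label of 1 for septic shock and 0 for controls, treats a value at or above the threshold as a positive call, and uses hypothetical function names.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each observed value."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y and s >= thr)
        tn = sum(1 for s, y in zip(scores, labels) if not y and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    Equals the probability that a random case scores higher than a random
    control, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For a gene whose expression is lower in septic shock than in controls, this orientation gives an AUC below 0.5. In that case the values can be negated before the call, or 1 minus the AUC can be reported.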
Results
Identification of differentially expressed genes (DEGs)
The microarray datasets GSE64457 and GSE57065 were obtained from the Gene Expression Omnibus (GEO) database and were analyzed by OmicShare tools to map the volcano plots and Venn diagrams of the DEGs (Figure 1A, 1B). A total of 975 genes were designated as DEGs in septic shock cases when compared with the controls (P<0.05), including 455 downregulated genes and 520 upregulated genes (Figure 1C, 1D).
Figure 1
Volcano plots and Venn diagrams of differentially expressed genes (DEGs) in GSE64457 and GSE57065. (A) and (B) show the volcano plots of DEGs in GSE64457 and GSE57065, respectively. The abscissa represents log2 (fold change); negative values represent down-regulated genes and positive values represent upregulated genes. The ordinate represents −log10 (P-value). (C) and (D) show the intersection of the upregulated genes and of the down-regulated genes in GSE64457 and GSE57065, respectively.
Enrichment analysis in septic shock
Enrichment analysis techniques extract biological knowledge from a set of genes or proteins. To screen for biomarkers related to the diagnosis of septic shock, the DAVID online software database was used to annotate the gene functions. Also, significantly enriched categories were identified by comparing the distribution of genes in each Gene Ontology (GO) category between the gene sets of interest and the reference gene set. The differential genes associated with biological functions, including cellular macromolecule metabolic processes, and cell death were identified (Figure 2A). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis was used to screen the signaling pathways for differential genes. These DEGs were mainly involved in the MAPK signaling pathway, the TNF signaling pathway, the HIF-1 signaling pathway, and the insulin signaling pathway (Figure 2B).
Figure 2
Gene Ontology (GO) and pathway analysis of differentially expressed genes (DEGs). (A) and (B) show the results of GO and pathway analysis of the DEGs, respectively. The depth of the red and green colors represents the P-value. The size of the dots represents the number of genes in each item. The rich factor is the proportion of DEGs in each item.
Protein–protein interaction (PPI) networks
To identify the core genes from multiple angles, this study submitted DEGs to the STRING database to construct a protein-protein interaction (PPI) network, and to further identify the interaction between genes at the protein level. Identifying important nodes in the network map was valuable for screening the most critical genes. A total of 975 DEGs were submitted to construct a PPI network, and six genes located at the core of the network were identified, including WD repeat domain 82 (WDR82), ASH1 like histone lysine methyltransferase (ASH1L), translocated promoter region, nuclear basket protein (TPR), nuclear receptor coactivator 1 (NCOA1), splicing factor 1 (SF1), and CREB binding protein (CREBBP), which were related to histone modification, cellular protein modification, and metabolic process (Figure 3).
Figure 3
The protein-protein interaction (PPI) networks based on the screened differentially expressed genes (DEGs). (A) The PPI networks based on DEGs screened above. Red represents the cellular protein modification process, green represents histone modification, and blue represents cellular protein metabolic process. (B) The heat map of the six potential core genes in GSE64457 (WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP). Red represents relatively high expression, while green represents relatively low expression.
Based on the data from GSE65682, the patients were divided into high-expression and low-expression groups, according to the expression of the WDR82, ASH1L, CREBBP, NCOA1, SF1, and TPR genes. The survival curves were generated using GraphPad Prism version 7.0, based on the survival time of patients with septic shock. Expression of the six genes, WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP, showed a positive correlation with prognosis in patients with septic shock (Figure 4A–4F), suggesting that these may be protective genes.
Figure 4
Survival curves for the six potential core genes in GSE64457 (WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP). (A–F) The survival curves of the six potential core genes in GSE64457 (WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP). Patients with high expression of all six genes had a better prognosis than those with low gene expression.
Based on the data of GSE57065 and GSE64457, the patients were divided into the septic shock group and the control group. The ROC curves were analyzed by MedCalc to evaluate the diagnostic accuracy of the DEGs for septic shock. The area under the curve (AUC) was used to compare the candidate genes. The six genes identified, WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP, might therefore be considered as new biomarkers for sepsis and septic shock, although further studies are required to validate these preliminary findings (Figure 5A–5F).
Figure 5
Receiver operating characteristic (ROC) curves for the six potential core genes in GSE64457 (WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP). (A–F) The receiver operating characteristic (ROC) curves of the six potential core genes in GSE64457 (WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP). The area under the curve (AUC) for all six genes was >0.8 for ROC analysis, supporting their potential role as diagnostic indicators.
Discussion
Septic shock occurs when sepsis is associated with critically low blood pressure, and has a high mortality rate [2,3]. Although the ability to diagnose and treat septic shock has improved, the survival rate of patients with septic shock remains low, and new and sensitive diagnostic and predictive markers are needed. The early diagnosis and treatment of septic shock are essential to reduce patient mortality and to improve quality of life after discharge from the intensive care unit (ICU). Molecular biology techniques have previously been used to investigate the pathogenesis of sepsis and septic shock, to provide insight into diagnosis and treatment. In the present study, a comparative analysis of mRNA expression between patients with septic shock and controls identified a range of differentially expressed genes (DEGs) for septic shock.
Molecules have different roles in cellular functions, but collectively regulate the activity of cells. An interactive network of Gene Ontology (GO) functional enrichment analysis and pathway analysis showed that cellular functions and signaling pathways were affected by the DEGs. GO and pathway analysis showed that the DEGs were associated with histone modification, cellular protein modification, and metabolic processes. To identify the key components in the diagnosis or treatment of septic shock, protein-protein interaction (PPI) network analysis was performed, followed by pathway enrichment analysis. Six core genes were identified, WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP, which were found to be involved in the development of septic shock.
The WD repeat domain 82 protein, encoded by the WDR82 gene, is one of the subunits of the SETD1A/B complex, which is correlated with transcriptional activity and is considered to be a marker of active gene transcription [17,18]. In mammalian cells, Set1 has a role in histone H3 lysine 4 (H3K4) methylation. H3K4 methylation patterns and functional properties are related to the interaction between Wdr82 and Set1 complexes, and H3K4 trimethylation is related to the transcription start sites of transcribed genes [19,20]. In cells, WDR82 is involved in the expression of the cytokine interferon-β (IFN-β) [18,20,21], but its role in the development of sepsis remains unclear. The WDR82 gene may be a putative biomarker or drug target in septic shock.
Recent studies have shown that ASH1-like histone lysine methyltransferase (ASH1L) is associated with cancer pathogenesis and immunity [22]. ASH1L is a mammalian homolog of Drosophila Ash1, which activates the expression of multiple genes depending on the activity of the SET domain H3K4 methyltransferase [22,23]. The ASH1L gene is widely expressed in cells in vivo, including CD4 lymphocytes, macrophages, and natural killer (NK) cells [24,25]. Previously reported studies have shown that the expression of ASH1L inhibited toll-like receptor (TLR)-triggered production of interleukin-6 (IL-6) and tumor necrosis factor (TNF) in a mouse model of sepsis [26]. ASH1L-mediated H3K4 methylation at the Tnfaip3 promoter inhibits IL-6 production and inflammatory responses, indicating an inhibitory role of ASH1L in the development of septic shock through reduction of IL-6 and TNF, and supporting the potential for ASH1L as a therapeutic target in septic shock [26,27].
Nuclear receptor coactivator 1 (NCOA1), also known as steroid receptor coactivator-1 (SRC-1), is essential for transcriptional activation [28]. NCOA1 is a member of the SRC family (SRC-1, SRC-2, SRC-3) that interacts with nuclear receptors and other transcription factors to regulate gene transcription, and most studies have focused on its role in malignancy [29,30]. The findings from the present study are supported by previous studies showing that expression of the NCOA1 gene downregulates the activity of pro-inflammatory factors, including NF-κB and AP-1, to exert its anti-inflammatory effects [31-33]. These findings support the potential role of NCOA1 as a therapeutic target for the treatment of septic shock.
Transient receptor potential (TRP) channels are non-selective ion channels involved in several physiological processes [34]. The TRP channel family is divided into several subtypes, including TRPCs, TRPVs, TRPMs, and TRPA1. The transient receptor potential cation channel subfamily M member 2 (TRPM2) is an acid channel permeable to Ca2+, Na+, and K+ [35]. Recent studies have shown that expression of the TRPM2 gene regulates the activity of macrophages in bacterial phagocytosis and killing [36]. The ion channel transient receptor potential melastatin-like 7 (TRPM7) is a key component of TLR4 signaling. It is a non-selective ion channel that is highly permeable to Ca2+ and mediates the cytosolic increase in Ca2+ that is essential for lipopolysaccharide (LPS)-induced macrophage activation [37]. Increased intracellular Ca2+ is required for TLR4 endocytosis and activation of the transcription factor IRF3. TRPM7-mediated Ca2+ signaling is also key to the nuclear translocation of NF-κB. In the current study, the TRPM7 gene was down-regulated. Previous studies have shown that downregulation of TRPM7 inhibited the production of the key pro-inflammatory cytokine, IL-1β, suggesting that TRPM7 may be a potential target gene for the treatment of septic shock [38].
Splicing factor 1 (SF1), which is also known as ZFM1, recognizes the 3′ splice sites in yeast, flies, nematodes, and human cell lines [39]. As part of a ternary complex with U2AF65 and U2AF35, SF1 is required for cell proliferation [39,40]. SF1 acts specifically in precursor mRNA splicing in cells, and the phosphorylation of SF1 might enhance its RNA binding [41]. Expression of the SF1 gene plays a role in the development of several disease processes [40,42]. The findings from the present study indicated that the SF1 gene may be a new biomarker for septic shock.
CREB-binding protein (CREBBP or CBP) is a histone acetyltransferase that acetylates multiple transcription factors and histones [43]. By interacting with transcription factors, CREBBP increases the expression of target genes, which then regulate embryonic growth and homeostasis [44]. Recent in vitro and in vivo studies have shown competition between NF-κB and cAMP response element binding protein (CREB) for CREBBP, which regulates transcriptional activity to inhibit the inflammatory response, although the specific mechanism remains to be determined [43,45-47]. The findings from the present study indicated that the CREBBP gene may be a new biomarker for septic shock.
Conclusions
This study aimed to undertake a bioinformatics analysis of gene expression profiles for risk prediction in septic shock. Two datasets associated with septic shock were downloaded from the Gene Expression Omnibus (GEO), GSE64457 and GSE57065, and differentially expressed genes (DEGs) were identified. Following establishment of protein-protein interaction (PPI) networks, pathway analysis identified significantly enriched DEGs in four signaling pathways, MAPK, TNF, HIF-1, and insulin. Six genes were identified, WDR82, ASH1L, NCOA1, TPR, SF1, and CREBBP, which were associated with patient prognosis in septic shock. The roles of these pathways and genes in patients with septic shock require further investigation in future clinical studies.