Li Yin1,2,3, Cuifang Chang2, Cunshuan Xu1,2.
Abstract
The present study was designed to explore the molecular mechanisms at the early stage of hepatocarcinoma (HCC) and to identify the candidate genes and pathways that change significantly. We downloaded the gene expression dataset GSE6764 from GEO and preprocessed the raw files with the Robust Multi-array Average (RMA) algorithm. A total of 797 differentially expressed genes (DEGs) were screened out with the SAM method in R. Ingenuity Pathway Analysis (IPA) was used to perform canonical pathway analysis, identifying the most significantly changed pathways and predicting the upstream regulators. To confirm the DEG results, which are based on individual genes, gene set enrichment analysis (GSEA) was performed at the gene set level, and leading edge analysis was used to find the genes that appear most often across several gene sets. The PPI network was built using GeneMANIA, and the key genes were identified using the cytoHubba plugin in Cytoscape 3.4.0. IPA ranked Cell Cycle: G2/M DNA Damage Checkpoint Regulation as the top pathway at the early stage of HCC. Kaplan-Meier survival analysis showed that high expression of several genes, including CCNB1, CDC25B, XPO1, GMPS, KPNA2 and MELK, is correlated with high risk, poor prognosis and shorter overall survival time in HCC patients. Taken together, our study shows that the G2/M checkpoint plays a vital role in early HCC and that the genes participating in this process may serve as biomarkers for diagnosis and prognosis.
Keywords: G2/M checkpoint; GSEA; IPA; early HCC; leading edge analysis
Year: 2017 PMID: 29100313 PMCID: PMC5652707 DOI: 10.18632/oncotarget.19351
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. The QC plot and box plots before and after normalization
(A) The quality control (QC) plot analysis of the raw data. (B) The box plot for the data before normalization. (C) The box plot for the data after normalization.
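The shift between panels B and C comes from the normalization step of RMA, which is quantile normalization. The following is a minimal Python sketch of that one step, not the study's pipeline (which used RMA in R and also includes background correction and probe summarization). The function name `quantile_normalize` and the toy intensities are hypothetical, and ties are handled naively.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length intensity lists, one list per array."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across all arrays, for every rank k.
    rank_means = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for two arrays with three probes each.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
```

After this step every array has the same set of values, which is why the boxes in a post-normalization box plot line up.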
Figure 2. (A) Plot of the observed d-values vs. the ordered expected d-values. Each gene is represented by a dot, and the differentially expressed genes are colored green. Compared with the control group, 421 genes are significantly up-regulated (green dots above) and 376 genes are significantly down-regulated (green dots below) in HCC at an FDR of 0.1%. (B) Plot of the number of significant genes vs. the types identified among the DEGs by IPA.
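For readers unfamiliar with the d-values in panel A, the following is a minimal Python sketch of the SAM relative-difference statistic for a single gene in a two-class unpaired comparison. The expression values and the fudge factor `s0` are hypothetical. Real SAM chooses `s0` from the data and estimates the FDR by permutation, both of which are omitted here.

```python
import math


def sam_d(group1, group2, s0=0.1):
    """Relative difference d = (mean1 - mean2) / (s + s0) for one gene."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    # Pooled sum of squared deviations from each group's own mean.
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    # Gene-specific scatter: standard error of the difference in means.
    s = math.sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


# Hypothetical log2 expression values (HCC vs. control) for one gene.
hcc = [8.1, 8.4, 8.3, 8.6]
control = [6.9, 7.2, 7.0, 7.1]
d_up = sam_d(hcc, control)      # positive: higher in HCC
d_down = sam_d(control, hcc)    # same magnitude, opposite sign
```

A positive d corresponds to the up-regulated dots above the diagonal band and a negative d to the down-regulated dots below it.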
Figure 3. The most representative canonical pathways associated with the early stage of HCC, as shown by Ingenuity Pathway Analysis (IPA)
The number of DEGs is shown in the figure. Red represents the up-regulated genes, green represents the down-regulated genes and grey represents the genes with no overlap with the dataset. The significance (-log p value) of every pathway is indicated in parentheses.
Upstream regulator analysis of differentially expressed genes in the early stage of HCC
| Upstream regulator | Predicted activation state | Activation z-score | p-Value of overlap | Target molecules in dataset |
|---|---|---|---|---|
| IRF3 | Activated | 2.248 | 0.0000463 | ADAR,APOBEC3B,CLIC4,HLA-F,IFI27,IFI44,IFI6,IFIT2,ISG15,OASL,PARP12,PLAC8,PNP,STAT1,STAT2,TAP1,TDRD7,TLR4,TSLP |
| IRF7 | Activated | 2.367 | 0.0000635 | ADAR,BCL2L13,IFI44,IFI6,IFIT2,IL33,ISG15,MICB,MX1,OASL,PARP12,PLAC8,STAT1,STAT2,TAP1,TDRD7,TLR4 |
| NLRC5 | Activated | 2.182 | 0.000528 | HLA-A,HLA-B,HLA-C,HLA-F,TAP1 |
| HOXA10 | Activated | 2.335 | 0.00442 | ALPL,BCHE,CDKN2B,COL15A1,HSP90AA1,IGFBP3,MYCN,NDRG2,PEG3,PHGDH,PROS1,SOS1,XDH,YWHAG |
| IRF5 | Activated | 2.607 | 0.0105 | IFI44,IFIT2,ISG15,OASL,PARP12,STAT1,STAT2 |
| SATB1 | Activated | 2.373 | 0.0316 | DSTYK,FERMT2,FOXJ3,HSP90AA1,LRRN3,NCOR1,PTGS2,TAOK1,TSLP,ZKSCAN8,ZNF287 |
| FOXO1 | Activated | 2.005 | 0.0357 | ANLN,APOA5,ASPM,BCL2L13,CASP2,CCNA2,CCNB1,CCNB2,CDKN2B,CENPF,DLGAP5,EBF1,EGR1,FOS,GPD1,KLF7,NEK2,PRC1,STAT2 |
| TP53 | Inhibited | -2.388 | 1.93E-09 | ABAT,ACAA2,ADGRB3,ALB,ANLN,AQP3,ASPM,ATAD2,AURKA,BMX,CAMLG,CARHSP1,CASP2,CCNA2,CCNB1,CCNB2,CD82,CDKN2A,CDKN3,CENPF,CKAP2,CLIC4,CLU,COL4A1,COMT,CXCL12,DLGAP5,DNM1L,DUT,EDIL3,EGR1,EIF4G3,ELK4,ESR1,EZH2,FAT1,FERMT2,FOS,GMNN,GNA14,H2AFY,HLA-B,HMMR,HSP90AA1,IGFBP3,ISG15,KPNA2,MAP2K1,MDM4,MELK,MX1,MYBL1,NDC80,NDRG2,NEK2,NPNT,ORM2,PDGFA,PDLIM5,PEG3,PHGDH,PIK3R3,PLPBP,PODXL,PPFIBP1,PRC1,PRKAB1,PTGS2,PTTG1,PURA,PVT1,RACGAP1,RALBP1,RFWD2,RLIM,ROBO1,RRM2,SFRP1,SON,STAT1,STEAP3,TAP1,TFPI2,TINAGL1,TJP1,TOP2A,TP53BP2,TPD52L1,TRIO,USP14,WNT2,XPO1,ZEB2 |
| HNF1A | Inhibited | -2.256 | 0.0000157 | ABCC9,ADH6,ALB,ANKS4B,APOH,AQP3,C8A,C8B,CYP1A2,F11,FOXJ3,HPX,IFNAR1,LCAT,LEF1,LY6E,MT1H,MT1X,NBR1,NPC1L1,NR1H4,PAMR1,PKHD1,PNO1,PPP1R1A,PZP,SLC12A7,SLC17A2,SLC38A4,SLC7A2,SUPV3L1,TMEM27,TROVE2,ZNF502 |
| HMGA1 | Inhibited | -2.206 | 0.000605 | ALPL,COL4A1,EGR1,ESR1,FOS,GHR,IDI1,IER2,IGFALS,IGFBP3,LY6E,MAPT,PTGS2,PTH1R |
| TRIM24 | Inhibited | -2.525 | 0.00217 | IFI44,IFIT2,ISG15,OASL,PARP12,PLAC8,SAMHD1,STAT1,STAT2,TAP1 |
| IRF4 | Inhibited | -2.975 | 0.0322 | ALPL,CCNB1,CDKN2A,ENTPD1,IL33,ISG15,PDCD6,SMARCA4,STAT1,STAT2 |
| ELK1 | Inhibited | -2.146 | 0.0329 | CDKN2A,EGR1,FOS,PTGS2,TPD52L1 |
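
The predicted activation states in the table follow the sign of the activation z-score. The short Python sketch below re-derives the state column from the z-scores listed above. It assumes the commonly used convention that a call needs |z| ≥ 2; that cutoff is not stated in this section and is an assumption, as is the helper name `predicted_state`.

```python
# Activation z-scores copied from the table above.
Z_SCORES = {
    "IRF3": 2.248, "IRF7": 2.367, "NLRC5": 2.182, "HOXA10": 2.335,
    "IRF5": 2.607, "SATB1": 2.373, "FOXO1": 2.005, "TP53": -2.388,
    "HNF1A": -2.256, "HMGA1": -2.206, "TRIM24": -2.525, "IRF4": -2.975,
    "ELK1": -2.146,
}


def predicted_state(z, cutoff=2.0):
    """Map a z-score to a state; the cutoff of 2 is an assumed convention."""
    if z >= cutoff:
        return "Activated"
    if z <= -cutoff:
        return "Inhibited"
    return "No prediction"


states = {name: predicted_state(z) for name, z in Z_SCORES.items()}
activated = sorted(n for n, s in states.items() if s == "Activated")
inhibited = sorted(n for n, s in states.items() if s == "Inhibited")
```

Under this assumption the sketch yields 7 activated and 6 inhibited regulators, matching the table.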
Figure 4. Upstream regulator analysis of differentially expressed genes at the early stage of HCC
The 7 transcription factors (TFs) predicted by IPA to be activated.
Figure 5. Gene expression profiling identifies pathways upregulated at the early stage of HCC
(A-H) The 7 significantly enriched gene sets in HCC. The normalized enrichment score (NES), the false discovery rate (FDR) and the nominal p-value are indicated for each gene set. Each bar at the bottom of panels A-H represents a member gene of the respective pathway, and (I) shows its relative location in the ranked list of genes.
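The enrichment plots described here are built from a running-sum statistic over the ranked gene list. The following is a minimal Python sketch of that statistic and of the leading edge subset it defines. It is an illustration only: the gene names and ranking scores are hypothetical, the weighting exponent `p=1.0` is an assumed default, and normalization to NES and permutation testing are omitted.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """ranked: list of (gene, score) sorted by score, highest first.

    Assumes at least one set member and one non-member in the ranked list.
    Returns the enrichment score and the leading edge genes.
    """
    hit_weights = [abs(s) ** p if g in gene_set else 0.0 for g, s in ranked]
    total_hit = sum(hit_weights)
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    running, best, best_idx = 0.0, 0.0, -1
    for i, (gene, _) in enumerate(ranked):
        if gene in gene_set:
            running += hit_weights[i] / total_hit   # step up on a hit
        else:
            running -= 1.0 / n_miss                 # step down on a miss
        if abs(running) > abs(best):
            best, best_idx = running, i
    # Leading edge: set members before the peak (positive ES)
    # or after the trough (negative ES).
    window = ranked[: best_idx + 1] if best >= 0 else ranked[best_idx:]
    leading_edge = [g for g, _ in window if g in gene_set]
    return best, leading_edge


# Hypothetical ranked list and gene set.
ranked = [("G1", 3.0), ("G2", 2.5), ("G3", 2.0), ("G4", 0.5),
          ("G5", -0.5), ("G6", -1.0), ("G7", -2.0), ("G8", -3.0)]
es, leading_edge = enrichment_score(ranked, {"G1", "G2", "G4"})
```

In this toy case the running sum peaks after the second gene, so the leading edge contains `G1` and `G2` but not `G4`.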
Figure 6. Set-to-set and gene-in-subsets graphs from the leading edge analysis
The left graph shows the overlap between the 8 subsets: the darker the color, the greater the overlap between the subsets. The intensity of the cell for sets A and B corresponds to the ratio X/Y, where X is the number of leading edge genes from set A and Y is the union of leading edge genes in sets A and B. The right graph shows each gene and the number of subsets in which it appears.
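The two graphs in this figure can be reproduced from the leading edge subsets alone. The Python sketch below shows one way to do it on hypothetical subsets. It takes X to be the number of leading edge genes common to sets A and B, which makes X/Y a Jaccard index; that reading of X is an assumption, and the set and gene names are placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical leading edge subsets (placeholder set and gene names).
leading_edges = {
    "SET_A": {"G1", "G2", "G3"},
    "SET_B": {"G2", "G3", "G4"},
    "SET_C": {"G3", "G5"},
}


def overlap_ratio(a, b):
    """Shared genes divided by the union of both subsets (assumed X/Y)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Set-to-set graph: one ratio per pair of subsets.
overlap = {
    (m, n): overlap_ratio(leading_edges[m], leading_edges[n])
    for m, n in combinations(sorted(leading_edges), 2)
}

# Gene-in-subsets graph: how many subsets each gene appears in.
gene_counts = Counter(g for genes in leading_edges.values() for g in genes)
```

Genes with the highest counts in `gene_counts` are the ones that recur across several gene sets.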
Figure 7. PPI network of the 15 top-ranked DEGs and the top 20 most related genes associated with the onset of HCC
Genes belonging to the DEGs are colored by their logFC. The network was generated using the GeneMANIA plugin. The network legend indicates the types of interactions between genes.
Figure 8. The molecular activation prediction (MAP) figure based on IPA
Figure 9. Kaplan-Meier curves of XPO1 and KPNA2 in the TCGA liver cancer dataset (https://tcga-data.nci.nih.gov/publications/tcga) with SurvExpress (n=381)
Censored samples are shown as “+” marks. The horizontal axis represents time (days) to event. The outcome event, time scale, concordance index (CI) and p-value of the log-rank test are shown. Red and green curves represent the high- and low-risk groups. The numbers below the horizontal axis represent the number of individuals in each risk group who have not yet experienced the event at each time point. (A) High expression of XPO1 is correlated with high risk, poor prognosis and shorter overall survival time. (B) High expression of KPNA2 indicates high risk, poor prognosis and shorter overall survival time. The lower panels show box plots across risk groups with the p-value.
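The survival curves in this figure are Kaplan-Meier estimates. As a rough illustration of how such a curve is computed for one risk group, here is a minimal Python sketch of the product-limit estimator. It is not the SurvExpress implementation; the follow-up times and event flags are hypothetical, and the log-rank test and concordance index are not covered.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times: follow-up in days; events: 1 = event observed, 0 = censored.
    Returns [(time, survival_probability)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored individuals leave the risk set without a step down.
        at_risk -= removed
    return curve


# Hypothetical follow-up data for one risk group.
curve = kaplan_meier([100, 200, 200, 400, 500], [1, 1, 0, 1, 0])
```

For this toy group the estimate steps down to 0.8, 0.6 and 0.3 at days 100, 200 and 400, while the censored individuals at days 200 and 500 only shrink the risk set.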