Lianyue Guan¹, Qiang Luo², Na Liang³, Hongyu Liu¹.
Abstract
In the present study, gene expression data of hepatocellular carcinoma (HCC) were analyzed using a multi-step bioinformatics approach to establish a novel prognostic prediction system. Gene expression profiles were downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. The differentially expressed genes (DEGs) overlapping between the two datasets were identified using the limma package in R. Prognostic genes were further identified by Cox regression using the survival package. Significantly co-expressed gene pairs were selected using the R function cor to construct a co-expression network. Functional enrichment and module analyses were also performed. Next, a prognostic prediction system was established by Bayes discriminant analysis using the discriminant.bayes function in the e1071 package and was validated in an independent GEO dataset. A total of 177 overlapping DEGs were identified between TCGA and the GEO dataset GSE36376. Furthermore, 161 prognostic genes were selected; the top six were stanniocalcin 2, carbonic anhydrase 12, cell division cycle (CDC) 20, deoxyribonuclease 1 like 3, glucosylceramidase β3 and metallothionein 1G. A gene co-expression network involving 41 upregulated and 52 downregulated genes was constructed. SPC24, endothelial cell specific molecule 1, CDC20, CDCA3, cyclin (CCN) E1 and chromatin licensing and DNA replication factor 1 were significantly associated with cell division, mitotic cell cycle and positive regulation of cell proliferation. CCNB1, CCNE1, CCNB2 and stratifin were significantly enriched in the p53 signaling pathway. A prognostic prediction system containing 55 signature genes was established and then validated in the GEO dataset GSE20140. In conclusion, the present study identified a number of prognostic genes and established a prediction system to assess the prognosis of HCC patients.
Keywords: differentially expressed genes; gene expression data; hepatocellular carcinoma; prognostic genes; prognostic prediction system; survival curve
Year: 2019 PMID: 31086582 PMCID: PMC6489019 DOI: 10.3892/etm.2019.7494
Source DB: PubMed Journal: Exp Ther Med ISSN: 1792-0981 Impact factor: 2.447
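The abstract states that prognostic genes were selected by Cox regression with the R survival package. As a rough illustration only (not the authors' code), the Python sketch below fits a univariate Cox proportional hazards model for one gene by Newton-Raphson on the partial likelihood, assuming the Breslow approximation for tied event times. The function name `cox_univariate` and the toy data are hypothetical; the hazard ratio (HR) for a gene is exp(β).

```python
import math


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Fit a one-covariate Cox proportional hazards model.

    times  : follow-up times
    events : 1 (or True) if death was observed, 0 if censored
    x      : covariate for each patient, e.g. one gene's expression
    Returns (beta, hazard_ratio). Ties use the Breslow approximation.
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # first derivative of the log partial likelihood
        info = 0.0   # observed information (minus the second derivative)
        for i in range(n):
            if not events[i]:
                continue  # censored patients only contribute to risk sets
            s0 = s1 = s2 = 0.0
            for j in range(n):
                if times[j] >= times[i]:  # patient j is still at risk at times[i]
                    w = math.exp(beta * x[j])
                    s0 += w
                    s1 += w * x[j]
                    s2 += w * x[j] ** 2
            mean = s1 / s0
            score += x[i] - mean
            info += s2 / s0 - mean ** 2
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    return beta, math.exp(beta)


# Toy data (hypothetical): high expression (x = 1) tends to die earlier.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
expr = [1, 1, 0, 1, 0, 0]
beta, hr = cox_univariate(times, events, expr)
print(round(beta, 2), round(hr, 2))
```

A positive β (HR > 1) marks a gene whose higher expression is associated with shorter survival. The sketch assumes a finite maximum exists; with perfectly separated data β diverges, and a real analysis would use a tested package.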
Figure 1. Schematic diagram of the multi-step strategy used to identify a gene signature for the prognosis of hepatocellular carcinoma. The results of each step are summarized. TCGA, The Cancer Genome Atlas; DEGs, differentially expressed genes; OS, overall survival; K-M, Kaplan-Meier; TF, transcription factor.
Figure 2. Density distribution of mRNA expression values. The solid and dotted lines indicate the distribution before and after the removal of low-abundance mRNAs, respectively.
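Figure 2 compares expression densities before and after low-abundance mRNAs were removed, but this entry does not give the filtering rule. A common kind of rule keeps a gene only if it exceeds an expression cutoff in a minimum fraction of samples. The sketch below assumes such a rule; the name `filter_low_abundance` and both thresholds are hypothetical placeholders, not the study's values.

```python
def filter_low_abundance(expr, min_value=1.0, min_fraction=0.5):
    """Drop low-abundance genes.

    expr maps gene name -> list of expression values (one per sample).
    A gene is kept if its value exceeds `min_value` in at least
    `min_fraction` of samples. Both cutoffs are illustrative assumptions.
    """
    kept = {}
    for gene, values in expr.items():
        n_expressed = sum(1 for v in values if v > min_value)
        if n_expressed >= min_fraction * len(values):
            kept[gene] = values
    return kept


expr = {
    "GENE_A": [5.0, 6.0, 7.0, 8.0],  # above cutoff in 4 of 4 samples
    "GENE_B": [0.0, 0.0, 0.5, 2.0],  # above cutoff in 1 of 4 samples
    "GENE_C": [0.0, 2.0, 3.0, 0.0],  # above cutoff in 2 of 4 samples
}
print(sorted(filter_low_abundance(expr)))  # ['GENE_A', 'GENE_C']
```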
Figure 3. Hierarchical clustering analysis using the top 25 downregulated and upregulated genes among the 177 overlapping differentially expressed genes in (A) The Cancer Genome Atlas dataset and (B) the GSE36376 dataset. Pink indicates hepatocellular carcinoma cases and blue indicates normal controls.
Figure 4. Kaplan-Meier survival curves for the top six prognostic genes. (A) STC2; (B) CA12; (C) CDC20; (D) DNASE1L3; (E) GBA3; (F) MT1G. Red lines indicate cases with high expression (expression value > median) and black lines indicate cases with low expression (expression value < median). The 95% confidence interval of each HR is given in brackets after the HR. HR, hazard ratio; STC2, stanniocalcin 2; CA12, carbonic anhydrase 12; CDC20, cell division cycle 20; DNASE1L3, deoxyribonuclease 1 like 3; GBA3, glucosylceramidase β3; MT1G, metallothionein 1G.
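Figure 4 splits patients at the median expression of each gene and compares Kaplan-Meier curves. A minimal Python sketch of that split and of the product-limit estimator follows. It is illustrative only (the study used R), the function names are hypothetical, and it does not compute the HR or its 95% confidence interval, which are typically obtained from a Cox model.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


def median_split_curves(times, events, expression):
    """Kaplan-Meier curves for high (> median) and low (< median) expression.

    Following the caption's strict inequalities, samples exactly at the
    median are left out of both groups (an assumption about tie handling).
    """
    cut = median(expression)
    groups = {"high": ([], []), "low": ([], [])}
    for t, e, x in zip(times, events, expression):
        if x == cut:
            continue
        key = "high" if x > cut else "low"
        groups[key][0].append(t)
        groups[key][1].append(e)
    return {key: kaplan_meier(ts, es) for key, (ts, es) in groups.items()}


# Hypothetical toy cohort: all five patients die; expression decreases with time.
curves = median_split_curves([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], [5, 4, 3, 2, 1])
print(curves["high"])  # [(1, 0.5), (2, 0.0)]
print(curves["low"])   # [(4, 0.5), (5, 0.0)]
```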
Figure 5. Co-expression networks. (A) Network of the prognostic genes; (B) transcription factor-regulated network. Red lines indicate negative co-expression pairs, green lines indicate positive co-expression pairs and purple lines indicate transcriptional regulation. Triangles represent upregulated genes, inverted triangles represent downregulated genes and pink squares represent transcription factors. Functional modules are shown in different colors.
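The abstract says significantly co-expressed gene pairs were selected with R's `cor` function (Pearson correlation by default), and Figure 5 colors each edge by the sign of the correlation. The Python sketch below computes Pearson correlations for every gene pair and keeps pairs whose absolute coefficient reaches a cutoff. The cutoff of 0.8, the function names and the toy data are hypothetical; the study's significance criterion (which may also involve a P-value) is not stated in this entry.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)  # assumes neither gene is constant


def coexpression_edges(expr, min_abs_r=0.8):
    """Return (gene1, gene2, r, sign) for pairs with |r| >= min_abs_r.

    sign is "positive" (green edges in Figure 5) or "negative" (red edges).
    """
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= min_abs_r:
            edges.append((g1, g2, round(r, 3), "positive" if r > 0 else "negative"))
    return edges


expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with A
    "C": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with A
    "D": [1.0, -1.0, -1.0, 1.0], # uncorrelated with A, B and C
}
for edge in coexpression_edges(expr):
    print(edge)
```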
Figure 6. Significantly enriched (A) GO terms and (B) Kyoto Encyclopedia of Genes and Genomes pathways for the 93 prognostic genes in the gene co-expression network. GO, Gene Ontology; Hsa, Homo sapiens.
GO functional terms enriched among the prognostic genes in the co-expression network.
| Category/term | Count | P-value | Genes |
|---|---|---|---|
| Biological process | |||
| GO:0051301~cell division | 19 | 2.29×10⁻¹³ | CDC7, KIFC1, NEK2, AURKA, CDC20, PTTG1, CDC25A, SPC24, FAM83D, CCNB1, CCNE1, CDCA8, CCNB2, NCAPG, BUB1, CENPW, CDCA5, CCNA2, CDCA3 |
| GO:0007067~mitotic nuclear division | 16 | 2.76×10⁻¹² | NEK2, CDC20, AURKA, PBK, AURKB, CEP55, PTTG1, CDC25A, SPC24, FAM83D, CCNB2, BUB1, CENPW, CDCA5, CCNA2, CDCA3 |
| GO:0000082~G1/S transition of mitotic cell cycle | 8 | 1.05×10⁻⁶ | CDC7, CCNE1, CDC45, POLE2, CDKN3, CDCA5, CDC25A, CDT1 |
| GO:0007062~sister chromatid cohesion | 7 | 1.66×10⁻⁵ | SPC24, CDCA8, CENPL, BUB1, CDC20, AURKB, CDCA5 |
| GO:0006260~DNA replication | 7 | 1.63×10⁻⁴ | CDC7, CDC45, POLE2, CHAF1B, CDC25A, DSCC1, CDT1 |
| GO:0008284~positive regulation of cell proliferation | 7 | 3.56×10⁻² | CDC7, PRC1, PTH1R, PDGFRA, CDC20, ESM1, VIPR1 |
| GO:0071276~cellular response to cadmium ion | 6 | 2.07×10⁻⁸ | MT1A, CYP1A2, MT1H, MT1X, MT1G, MT1F |
| GO:0000086~G2/M transition of mitotic cell cycle | 6 | 7.65×10⁻⁴ | CCNB1, CCNB2, NEK2, AURKA, CDC25A, HMMR |
| GO:0006281~DNA repair | 6 | 7.87×10⁻³ | RAD51AP1, POLE2, PTTG1, CHAF1B, UBE2T, RAD51 |
| GO:0008283~cell proliferation | 6 | 4.32×10⁻² | FAM83D, STIL, DLGAP5, BUB1, AURKB, CDC25A |
| Cellular component | | | |
| GO:0005654~nucleoplasm | 26 | 1.40×10⁻³ | PRC1, AURKA, AURKB, CDT1, CCNE1, CDC45, CDCA8, POLE2, BUB1, THRSP, CCNA2, CDCA5, TOP2A, CDC7, RAD51AP1, CENPL, CDC20, CDC25A, RAD51, CCNB1, CCNB2, CENPW, CHAF1B, UBE2T, DSCC1, KIF20A |
| GO:0005576~extracellular region | 14 | 4.86×10⁻² | C7, CNDP1, HSD17B13, COL15A1, CCL19, DCN, ESM1, ECM1, GREM2, DNASE1L3, MMP11, LPA, APOF, MFAP4 |
| GO:0005813~centrosome | 9 | 1.13×10⁻³ | CCNB1, STIL, CDC45, CCNB2, NCAPG, NEK2, AURKA, CDC20, CEP55 |
| GO:0048471~perinuclear region of cytoplasm | 9 | 1.09×10⁻² | MT1A, AURKA, CDC20, CDKN3, MT1H, MT1X, MT1G, RAD51, MT1F |
| GO:0030496~midbody | 8 | 3.22×10⁻⁶ | CDCA8, PRC1, NEK2, AURKA, CEP55, AURKB, ECT2, KIF20A |
| GO:0005819~spindle | 6 | 3.16×10⁻⁴ | KIFC1, PRC1, AURKA, CDC20, AURKB, KIF20A |
| Molecular function | | | |
| GO:0019901~protein kinase binding | 9 | 7.10×10⁻⁴ | CCNB1, FAM83D, CCNE1, PRC1, AURKA, SFN, CCNA2, CDC25A, KIF20A |
| GO:0004672~protein kinase activity | 7 | 1.06×10⁻² | CDC7, NEK2, PDGFRA, BUB1, AURKA, PBK, AURKB |
| GO:0004674~protein serine/threonine kinase activity | 6 | 4.48×10⁻² | CDC7, NEK2, BUB1, AURKA, PBK, AURKB |
GO, Gene Ontology; APOF, apolipoprotein F; AURKA, aurora kinase A; AURKB, aurora kinase B; BUB1, BUB1 mitotic checkpoint serine/threonine kinase; C7, complement C7; CCL19, C-C motif chemokine ligand 19; CCNA2, cyclin A2; CCNB1, cyclin B1; CCNB2, cyclin B2; CCNE1, cyclin E1; CDC7, cell division cycle 7; CDC20, cell division cycle 20; CDC25A, cell division cycle 25A; CDC45, cell division cycle 45; CDCA3, cell division cycle associated 3; CDCA5, cell division cycle associated 5; CDCA8, cell division cycle associated 8; CDKN3, cyclin dependent kinase inhibitor 3; CDT1, chromatin licensing and DNA replication factor 1; CENPL, centromere protein L; CENPW, centromere protein W; CEP55, centrosomal protein 55; CHAF1B, chromatin assembly factor 1 subunit B; CNDP1, carnosine dipeptidase 1; COL15A1, collagen type XV alpha 1 chain; CYP1A2, cytochrome P450 family 1 subfamily A member 2; DCN, decorin; DLGAP5, DLG associated protein 5; DNASE1L3, deoxyribonuclease 1 like 3; DSCC1, DNA replication and sister chromatid cohesion 1; ECM1, extracellular matrix protein 1; ECT2, epithelial cell transforming 2; ESM1, endothelial cell specific molecule 1; FAM83D, family with sequence similarity 83 member D; GREM2, gremlin 2, DAN family BMP antagonist; HMMR, hyaluronan mediated motility receptor; HSD17B13, hydroxysteroid 17-beta dehydrogenase 13; KIF20A, kinesin family member 20A; KIFC1, kinesin family member C1; LPA, lipoprotein(a); MFAP4, microfibril associated protein 4; MMP11, matrix metallopeptidase 11; MT1A, metallothionein 1A; MT1F, metallothionein 1F; MT1G, metallothionein 1G; MT1H, metallothionein 1H; MT1X, metallothionein 1X; NCAPG, non-SMC condensin I complex subunit G; NEK2, NIMA related kinase 2; PBK, PDZ binding kinase; PDGFRA, platelet derived growth factor receptor alpha; POLE2, DNA polymerase epsilon 2, accessory subunit; PRC1, protein regulator of cytokinesis 1; PTH1R, parathyroid hormone 1 receptor; PTTG1, pituitary tumor-transforming 1; RAD51, RAD51 recombinase; RAD51AP1, RAD51 associated protein 1; SFN, stratifin; SPC24, kinetochore-associated Ndc80 complex subunit SPC24; STIL, STIL centriolar assembly protein; THRSP, thyroid hormone responsive; TOP2A, DNA topoisomerase II alpha; UBE2T, ubiquitin conjugating enzyme E2 T; VIPR1, vasoactive intestinal peptide receptor 1.
Significantly enriched pathways for the prognostic genes in the co-expression network.
| Term | Count | P-value | Genes |
|---|---|---|---|
| hsa04110:Cell cycle | 11 | 4.51×10⁻⁹ | CCNB1, CDC7, CCNE1, CDC45, CCNB2, BUB1, CDC20, PTTG1, SFN, CCNA2, CDC25A |
| hsa04978:Mineral absorption | 5 | 2.10×10⁻⁴ | MT1A, MT1H, MT1X, MT1G, MT1F |
| hsa04115:p53 signaling pathway | 4 | 9.25×10⁻³ | CCNB1, CCNE1, CCNB2, SFN |
| hsa00380:Tryptophan metabolism | 3 | 2.77×10⁻² | AADAT, CYP1A2, INMT |
| hsa00232:Caffeine metabolism | 2 | 3.22×10⁻² | NAT2, CYP1A2 |
| hsa00140:Steroid hormone biosynthesis | 3 | 3.45×10⁻² | CYP3A4, SRD5A2, CYP1A2 |
| hsa05204:Chemical carcinogenesis | 3 | 4.53×10⁻² | CYP3A4, NAT2, CYP1A2 |
Hsa, Homo sapiens; AADAT, aminoadipate aminotransferase; BUB1, BUB1 mitotic checkpoint serine/threonine kinase; CCNA2, cyclin A2; CCNB1, cyclin B1; CCNB2, cyclin B2; CCNE1, cyclin E1; CDC20, cell division cycle 20; CDC25A, cell division cycle 25A; CDC45, cell division cycle 45; CDC7, cell division cycle 7; CYP1A2, cytochrome P450 family 1 subfamily A member 2; CYP3A4, cytochrome P450 family 3 subfamily A member 4; INMT, indolethylamine N-methyltransferase; MT1A, metallothionein 1A; MT1F, metallothionein 1F; MT1G, metallothionein 1G; MT1H, metallothionein 1H; MT1X, metallothionein 1X; NAT2, N-acetyltransferase 2; PTTG1, pituitary tumor-transforming 1; SFN, stratifin; SRD5A2, steroid 5 alpha-reductase 2.
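The P-values in the GO and KEGG tables come from an over-representation analysis whose exact tool and background gene set are not stated in this entry. A standard version is a one-sided hypergeometric test: given N background genes, K of them annotated to a term, and n genes in the query list (for example, the 93 network genes), the P-value is the probability of observing at least k annotated genes by chance. The sketch below implements that formula with Python's `math.comb`; the function name and toy numbers are hypothetical, and it will not exactly reproduce tools that use a modified statistic (DAVID's EASE score, for instance, is a modified Fisher exact P-value).

```python
from math import comb


def hypergeometric_enrichment_p(k, n, K, N):
    """One-sided over-representation P-value, P(X >= k).

    X ~ Hypergeometric: N background genes, K annotated to the term,
    n genes in the query list, k of which are annotated.
    P = sum_{i=k}^{min(K, n)} C(K, i) * C(N - K, n - i) / C(N, n)
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy example (hypothetical numbers): 10 background genes, 5 annotated,
# and a 5-gene list that contains all 5 annotated genes.
print(hypergeometric_enrichment_p(5, 5, 5, 10))  # equals 1/252
```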
Figure 7. (A) Workflow of the established prognostic prediction system. (B) Distribution of the prognostic score. (C) Kaplan-Meier survival curves for cases from the TCGA dataset. (D) Kaplan-Meier survival curves for cases from the GSE20140 dataset. In (C) and (D), the blue line represents the favorable prognostic group and the green line represents the poor prognostic group, as identified by the prognostic prediction system. TCGA, The Cancer Genome Atlas.
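Figure 7 shows patients assigned to favorable or poor prognostic groups by the Bayes discriminant model. This entry does not give the model details, so the Python sketch below assumes a two-class Gaussian Bayes rule with a diagonal (per-gene) covariance, class priors estimated from training frequencies, and training labels that are already known; the R function named in the abstract may instead estimate a full covariance matrix. All names, the variance floor and the toy data are hypothetical.

```python
import math


def fit_gaussian_bayes(samples, labels):
    """Estimate class priors and per-gene means/variances (diagonal covariance).

    Each class needs at least two training samples (sample variance uses m - 1).
    """
    model = {}
    n = len(samples)
    for cls in set(labels):
        rows = [s for s, lab in zip(samples, labels) if lab == cls]
        m = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / m for col in columns]
        # Small floor (assumption) keeps a constant gene from giving zero variance.
        variances = [max(sum((v - mu) ** 2 for v in col) / (m - 1), 1e-6)
                     for col, mu in zip(columns, means)]
        model[cls] = (m / n, means, variances)
    return model


def log_posterior_scores(model, x):
    """Unnormalized log posterior of each class for one sample x."""
    scores = {}
    for cls, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, mu, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        scores[cls] = score
    return scores


def classify(model, x):
    """Assign x to the class with the largest posterior (Bayes rule)."""
    scores = log_posterior_scores(model, x)
    return max(scores, key=scores.get)


# Hypothetical training data: two signature genes per patient.
train = [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1], [5.0, 3.0], [5.2, 3.1], [4.8, 2.9]]
labels = ["favorable"] * 3 + ["poor"] * 3
model = fit_gaussian_bayes(train, labels)
print(classify(model, [1.1, 0.0]), classify(model, [5.1, 3.0]))
```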
The 55 signature genes involved in the prognostic prediction system.
| Gene | logFC | P-value | FDR |
|---|---|---|---|
| A, Upregulated genes | | | |
| SPC24 | 1.777 | 8.57×10⁻³³ | 2.66×10⁻³⁰ |
| TGM3 | 1.396 | 4.38×10⁻³⁵ | 1.96×10⁻³² |
| KIF20A | 1.173 | 2.42×10⁻³⁰ | 5.74×10⁻²⁸ |
| ESM1 | 1.162 | 6.50×10⁻³⁰ | 1.40×10⁻²⁷ |
| CDC20 | 1.106 | 1.51×10⁻³⁰ | 3.88×10⁻²⁸ |
| CDCA3 | 1.018 | 3.84×10⁻²³ | 4.22×10⁻²¹ |
| CCNE1 | 0.964 | 4.97×10⁻¹⁹ | 3.47×10⁻¹⁷ |
| TNFRSF4 | 0.949 | 8.28×10⁻¹⁹ | 5.59×10⁻¹⁷ |
| COL15A1 | 0.942 | 2.28×10⁻²⁵ | 3.00×10⁻²³ |
| CELSR3 | 0.921 | 1.59×10⁻¹⁹ | 1.16×10⁻¹⁷ |
| CEP55 | 0.920 | 1.99×10⁻¹⁷ | 1.19×10⁻¹⁵ |
| CDT1 | 0.860 | 4.43×10⁻²⁰ | 3.41×10⁻¹⁸ |
| C16orf59 | 0.846 | 1.13×10⁻¹⁶ | 6.18×10⁻¹⁵ |
| MUC13 | 0.800 | 1.07×10⁻¹⁹ | 7.93×10⁻¹⁸ |
| GPC3 | 0.778 | 9.91×10⁻²⁶ | 1.36×10⁻²³ |
| MMP11 | 0.723 | 2.34×10⁻¹⁶ | 1.24×10⁻¹⁴ |
| IGF2BP3 | 0.702 | 2.73×10⁻¹² | 1.04×10⁻¹⁰ |
| STC2 | 0.687 | 1.29×10⁻¹³ | 5.67×10⁻¹² |
| RAD51AP1 | 0.672 | 3.06×10⁻¹¹ | 1.05×10⁻⁹ |
| CCNB1 | 0.653 | 1.50×10⁻¹⁴ | 7.08×10⁻¹³ |
| STIL | 0.622 | 1.95×10⁻¹⁰ | 6.11×10⁻⁹ |
| CA12 | 0.615 | 9.69×10⁻¹¹ | 3.15×10⁻⁹ |
| PNMA3 | 0.550 | 8.68×10⁻⁷ | 1.55×10⁻⁵ |
| POLE2 | 0.540 | 4.99×10⁻⁸ | 1.07×10⁻⁶ |
| FOXD2 | 0.534 | 3.59×10⁻⁷ | 6.83×10⁻⁶ |
| MSX1 | 0.531 | 4.53×10⁻⁷ | 8.45×10⁻⁶ |
| ITPKA | 0.525 | 1.77×10⁻⁸ | 4.16×10⁻⁷ |
| ACTG2 | 0.525 | 3.01×10⁻⁸ | 6.78×10⁻⁷ |
| AKR1B10 | 0.514 | 7.73×10⁻¹³ | 3.14×10⁻¹¹ |
| B, Downregulated genes | | | |
| CYP4A22 | −0.492 | 2.25×10⁻¹² | 8.69×10⁻¹¹ |
| TMEM82 | −0.519 | 5.69×10⁻¹³ | 2.36×10⁻¹¹ |
| GLYAT | −0.536 | 1.39×10⁻¹⁶ | 7.57×10⁻¹⁵ |
| GBA3 | −0.546 | 1.40×10⁻¹⁵ | 7.17×10⁻¹⁴ |
| APOF | −0.555 | 1.52×10⁻¹⁸ | 1.00×10⁻¹⁶ |
| SLC22A1 | −0.565 | 1.50×10⁻²⁰ | 1.23×10⁻¹⁸ |
| DNASE1L3 | −0.580 | 4.07×10⁻¹⁷ | 2.31×10⁻¹⁵ |
| ECM1 | −0.591 | 7.68×10⁻¹⁷ | 4.26×10⁻¹⁵ |
| CETP | −0.602 | 1.68×10⁻¹⁵ | 8.57×10⁻¹⁴ |
| GLS2 | −0.603 | 6.99×10⁻²⁰ | 5.35×10⁻¹⁸ |
| GREM2 | −0.614 | 1.99×10⁻¹⁶ | 1.06×10⁻¹⁴ |
| AADAT | −0.615 | 2.46×10⁻¹⁷ | 1.42×10⁻¹⁵ |
| MME | −0.616 | 6.41×10⁻¹⁷ | 3.57×10⁻¹⁵ |
| SRD5A2 | −0.650 | 5.30×10⁻¹⁹ | 3.64×10⁻¹⁷ |
| C9 | −0.655 | 7.70×10⁻²⁸ | 1.33×10⁻²⁵ |
| CLRN3 | −0.701 | 2.09×10⁻²⁰ | 1.68×10⁻¹⁸ |
| IGFALS | −0.716 | 7.36×10⁻²⁶ | 1.03×10⁻²³ |
| TMEM27 | −0.730 | 5.13×10⁻²¹ | 4.37×10⁻¹⁹ |
| ASPG | −0.735 | 1.68×10⁻²⁵ | 2.26×10⁻²³ |
| SRPX | −0.763 | 7.97×10⁻²¹ | 6.69×10⁻¹⁹ |
| MYOM2 | −0.784 | 3.33×10⁻¹⁸ | 2.13×10⁻¹⁶ |
| MT1F | −0.846 | 1.21×10⁻³⁴ | 5.04×10⁻³² |
| PTH1R | −0.866 | 3.18×10⁻²⁸ | 5.82×10⁻²⁶ |
| FCN3 | −0.978 | 1.24×10⁻⁴² | 9.37×10⁻⁴⁰ |
| HAMP | −1.052 | 2.53×10⁻⁵⁵ | 2.78×10⁻⁵² |
| MT1H | −1.425 | 5.74×10⁻⁷⁶ | 6.94×10⁻⁷² |
logFC, log2(fold change); FDR, false discovery rate; SPC24, SPC24 component of NDC80 kinetochore complex; TGM3, transglutaminase 3; KIF20A, kinesin family member 20A; ESM1, endothelial cell specific molecule 1; CDC20, cell division cycle 20; CDCA3, cell division cycle associated 3; CCNE1, cyclin E1; TNFRSF4, tumor necrosis factor receptor superfamily, member 4; COL15A1, collagen type XV alpha 1 chain; CELSR3, cadherin EGF LAG seven-pass G-type receptor 3; CEP55, centrosomal protein 55; CDT1, chromatin licensing and DNA replication factor 1; C16orf59/TEDC2, tubulin epsilon and delta complex 2; MUC13, mucin 13, cell surface associated; GPC3, glypican 3; MMP11, matrix metallopeptidase 11; IGF2BP3, insulin like growth factor 2 mRNA binding protein 3; STC2, stanniocalcin 2; RAD51AP1, RAD51 associated protein 1; CCNB1, cyclin B1; STIL, STIL centriolar assembly protein; CA12, carbonic anhydrase 12; PNMA3, PNMA family member 3; POLE2, DNA polymerase epsilon 2, accessory subunit; FOXD2, forkhead box D2; MSX1, msh homeobox 1; ITPKA, inositol-trisphosphate 3-kinase A; ACTG2, actin gamma 2, smooth muscle; AKR1B10, aldo-keto reductase family 1 member B10; CYP4A22, cytochrome P450 family 4 subfamily A member 22; TMEM82, transmembrane protein 82; GLYAT, glycine-N-acyltransferase; GBA3, glucosylceramidase beta 3; APOF, apolipoprotein F; SLC22A1, solute carrier family 22 member 1; DNASE1L3, deoxyribonuclease 1 like 3; ECM1, extracellular matrix protein 1; CETP, cholesteryl ester transfer protein; GLS2, glutaminase 2; GREM2, gremlin 2, DAN family BMP antagonist; AADAT, aminoadipate aminotransferase; MME, membrane metalloendopeptidase; SRD5A2, steroid 5 alpha-reductase 2; C9, complement C9; CLRN3, clarin 3; IGFALS, insulin like growth factor binding protein acid labile subunit; TMEM27/CLTRN, collectrin, amino acid transport regulator; ASPG, asparaginase; SRPX, sushi repeat containing protein X-linked; MYOM2, myomesin 2; MT1F, metallothionein 1F; PTH1R, parathyroid hormone 1 receptor; FCN3, ficolin 3; HAMP, hepcidin antimicrobial peptide; MT1H, metallothionein 1H.
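In the signature-gene table, logFC is a log2 fold change and FDR is a multiple-testing-adjusted P-value. This entry does not state the adjustment method; limma's `topTable` defaults to the Benjamini-Hochberg procedure, which the Python sketch below implements. For a simple two-group comparison on log2-scale data, logFC reduces to the difference of the group means. The helper names `log2_fold_change` and `benjamini_hochberg` and the example values are hypothetical.

```python
def log2_fold_change(tumor_log2, normal_log2):
    """logFC for log2-scale data: mean(tumor) - mean(normal)."""
    return sum(tumor_log2) / len(tumor_log2) - sum(normal_log2) / len(normal_log2)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
print(log2_fold_change([8.0, 9.0, 10.0], [7.0, 8.0]))  # 1.5
```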