Nguyen Phuoc Long, Kyung Hee Jung, Sang Jun Yoon, Nguyen Hoang Anh, Tran Diem Nghi, Yun Pyo Kang, Hong Hua Yan, Jung Eun Min, Soon-Sun Hong, Sung Won Kwon.
Abstract
Although many outstanding achievements in the management of cervical cancer (CxCa) have been obtained, the disease still imposes a major burden, which has prompted scientists to discover and validate new CxCa biomarkers to improve its diagnostic and prognostic assessment. In this study, eight different gene expression data sets containing 202 cancer, 115 cervical intraepithelial neoplasia (CIN), and 105 normal samples were utilized for an integrative systems biology assessment in a multi-stage carcinogenesis manner. Deep learning-based diagnostic models were established based on the genetic panels of intrinsic genes of cervical carcinogenesis as well as on the unbiased variable selection approach. Survival analysis was also conducted to explore the potential biomarker candidates for prognostic assessment. Our results showed that cell cycle, RNA transport, mRNA surveillance, and one carbon pool by folate were the key regulatory mechanisms involved in the initiation, progression, and metastasis of CxCa. Various genetic panels combined with machine learning algorithms successfully differentiated CxCa from CIN and normalcy in cross-study normalized data sets. In particular, the 168-gene deep learning model for the differentiation of cancer from normalcy achieved an externally validated accuracy of 97.96% (99.01% sensitivity and 95.65% specificity). Survival analysis revealed that, among the candidates, ZNF281 and EPHB6 were the two most promising prognostic genetic markers for CxCa. Our findings open new opportunities to enhance current understanding of the characteristics of CxCa pathobiology. In addition, the combination of transcriptomics-based signatures and deep learning classification may become an important approach to improve CxCa diagnosis and management in clinical practice.
Keywords: cervical cancer; deep learning; meta-analysis; survival analysis; transcriptomics
Year: 2017 PMID: 29312619 PMCID: PMC5752532 DOI: 10.18632/oncotarget.22689
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. Overview of the study flow
(a) Flow diagram for data set selection. (b) The workflow of data processing and analysis.
Microarray data sets in the meta-analysis of CxCa
| Comparison | Author | Data set | Year | Platform¹ | Country | Participants: Normalcy | Participants: CIN(s)³ | Participants: Cancer |
|---|---|---|---|---|---|---|---|---|
| Cancer versus Normalcy | Martinez IM | GSE52903 | 2015 | 1.0 ST | Mexico | 17 | - | 55 |
| Cancer versus Normalcy | den Boon JA | GSE63514 | 2015 | U133 Plus 2.0 | USA | 24 | - | 28 |
| Cancer versus Normalcy | Polyzos A | GSE63678 | 2015 | U133A 2.0 | Greece | 5 | - | 5 |
| Cancer versus Normalcy | Yan R | GSE42764 | 2014 | U133 Plus 2.0 | Canada | 2 | - | 12 |
| Cancer versus Normalcy | Karagavriilidou K | GSE27678² | 2013 | U133 Plus 2.0 | UK | 3 | - | 28 |
| Cancer versus Normalcy | Murty VV | GSE9750 | 2008 | U133A | USA | 24 | - | 33 |
| Cancer versus Normalcy | Zhai Y | GSE7803 | 2007 | U133A | USA | 10 | - | 21 |
| Cancer versus Normalcy | Pyeon D | GSE6791 | 2007 | U133 Plus 2.0 | USA | 8 | - | 20 |
| CIN versus Normalcy | den Boon JA | GSE63514 | 2015 | U133 Plus 2.0 | USA | 24 | 76 | - |
| CIN versus Normalcy | Karagavriilidou K | GSE27678² | 2013 | U133A | UK | 12 | 32 | - |
| CIN versus Normalcy | Zhai Y | GSE7803 | 2007 | U133A | USA | 10 | 7 | - |
| Cancer versus CIN | den Boon JA | GSE63514 | 2015 | U133 Plus 2.0 | USA | - | 76 | 28 |
| Cancer versus CIN | Zhai Y | GSE7803 | 2007 | U133A | USA | - | 7 | 21 |
¹ All included data sets were generated on Affymetrix platforms.
² GSE27678 contains two different platforms (U133 Plus 2.0 and U133A).
³ Cervical intraepithelial neoplasia or cervical dysplasia.
Some representative up- and downregulated pathways in the differential gene expression meta-analysis
| Pathway | Number of genes | Gene symbols | P-value | FDR | |
|---|---|---|---|---|---|
| Upregulated pathway | |||||
| Cell cycle | 36 | CDC6, MCM5, MCM3, ORC2, CDC7, RBL1, YWHAH, CCNB1, CDK4, SMAD2, CCNE1, PCNA, CDKN2C, E2F3, EP300, MCM6, MCM2, STAG2, CDK2, CCNA2, CCNB2, MAD2L1, RAD21, CDC23, ORC5, CDC25A, CDC20, CDC25C, SMC1A, TFDP2, PTTG1, E2F1, SMC3, BUB3, HDAC1, HDAC2 | 1.43E-13 | 3.10E-11 | |
| RNA transport | 26 | NXT2, NUP107, NUP155, EIF2S3, NXT1, EIF2S1, NUPL2, NUP133, NUP153, UPF1, EIF2B2, NUP205, PAIP1, GEMIN2, NUP93, SUMO4, MAGOHB, NUP43, UPF2, MAGOH, EIF2S2, NCBP1, NUP160, NUP58, XPO1, EIF2B1 | 8.69E-7 | 9.43E-5 | |
| mRNA surveillance pathway | 18 | PABPN1, NXT2, NXT1, UPF1, PPP2R3A, PELO, NUDT21, MAGOHB, PPP2R5E, CPSF7, UPF2, SMG5, MAGOH, CSTF2, NCBP1, CPSF6, GSPT1, SMG1 | 1.81E-5 | 1.31E-3 | |
| Cell cycle | 31 | CDC6, MCM5, MCM3, CDC7, CCNB1, HDAC2, CDK4, CCNE1, CDKN2C, E2F3, MCM6, MCM2, CCNA2, CCNB2, MAD2L1, RAD21, ORC5, CDC25A, CDC20, PRKDC, CDC25C, SMC1A, PTTG1, SMC3, BUB3, HDAC1, RBL1, PCNA, STAG1, CDC23, TFDP2 | 1.21E-20 | 2.62E-18 | |
| One carbon pool by folate | 6 | TYMS, SHMT2, MTR, GART, MTHFD2, DHFR | 1.61E-5 | 1.75E-3 | |
| RNA transport | 14 | NXT2, NUP107, NUP155, NUPL2, NUP133, PAIP1, MAGOHB, NUP43, UPF2, MAGOH, EIF2S2, NCBP1, NUP160, XPO1 | 3.02E-5 | 2.18E-3 | |
| Cell cycle | 14 | MCM3, ORC2, CREBBP, E2F3, CDK2, MAD2L1, RAD21, PRKDC, E2F1, BUB3, ABL1, RBL1, CDC23, WEE1 | 8.60E-7 | 1.87E-4 | |
| mRNA surveillance pathway | 9 | MAGOHB, PPP2R5E, CPSF7, UPF2, NCBP1, CPSF6, GSPT1, PPP2R2D, SMG1 | 1.10E-4 | 1.20E-2 | |
| Pathways in cancer | 16 | CREBBP, E2F3, CBL, CDK2, TCF7L1, FZD7, MAP2K1, ITGB1, CXCL8, WNT11, E2F1, CRKL, ABL1, STK4, ETS1, PIAS2 | 2.16E-3 | 1.42E-1 | |
| Phototransduction | 8 | RCVRN, GUCA1B, GNAT1, GNAT2, GUCY2D, RHO, CALML3, ARRB1 | 2.37E-4 | 3.18E-2 | |
| Focal adhesion | 26 | PIK3R2, HGF, VTN, CCND1, PDGFRA, CCND2, VWF, PDGFRB, MAPK3, KDR, SHC2, COL5A3, ITGB5, VEGFD, ILK, PARVA, CAV1, CAV3, ACTN2, JUN, SHC3, ITGA8, PDGFD, RELN, LAMA2, ITGA7 | 2.93E-4 | 3.18E-2 | |
| Calcium signaling pathway | 23 | BDKRB1, PDGFRA, HTR2B, PDGFRB, HTR5A, TACR3, TACR1, ADRB2, CALML3, ATP2B2, GNAL, CACNA1H, CCKBR, ERBB4, ADCY2, TNNC2, GNA14, EDNRB, HTR2A, ADRA1D, GRPR, ADRA1A, ITPR2 | 6.62E-4 | 4.79E-2 | |
| Circadian rhythm - mammal | 4 | CLOCK, PER1, NPAS2, CSNK1E | 6.31E-4 | 1.37E-1 | |
| Amoebiasis | 7 | HSPB1, GNA15, SERPINB3, IL1R2, GNAL, SERPINB4, RELA | 4.45E-5 | 9.66E-3 | |
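
The FDR column is numerically consistent with a Benjamini-Hochberg style adjustment, in which the P-value of rank k among m tests is scaled to p·m/k. The sketch below is an illustration under assumptions and not a method stated in the source: the total number of tested pathways (`m = 217`) is a hypothetical value inferred from the ratio between the first reported FDR and its P-value, and the function name `bh_scaled` is illustrative. It is applied to the first three rows of the table only.

```python
def bh_scaled(p_values, m):
    """Benjamini-Hochberg scaling (p * m / rank) for the smallest p-values.

    p_values: the k smallest p-values of the analysis, in any order.
    m: total number of hypotheses tested (assumed here, not given in the source).
    """
    ordered = sorted(p_values)
    scaled = [min(1.0, p * m / rank) for rank, p in enumerate(ordered, start=1)]
    # Enforce monotonicity, walking from the largest rank down to the smallest.
    for i in range(len(scaled) - 2, -1, -1):
        scaled[i] = min(scaled[i], scaled[i + 1])
    return scaled


# P-values of the first three rows: cell cycle, RNA transport, mRNA surveillance.
reported_p = [1.43e-13, 8.69e-7, 1.81e-5]
adjusted = bh_scaled(reported_p, m=217)
for p, q in zip(sorted(reported_p), adjusted):
    print(f"p = {p:.2e} -> adjusted = {q:.2e}")
```

Under the assumed `m`, the three adjusted values agree with the reported FDRs (3.10E-11, 9.43E-5, 1.31E-3) to three significant figures. Because only a subset of P-values is available here, the monotonicity step uses the listed rows alone.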
Figure 2. Genetic panel of 168 consistently dysregulated genes
(a) Overlapping genes in the consistently upregulated group. (b) Overlapping genes in the consistently downregulated group. (c) Protein-protein network of 113 consistently upregulated genes. (d) Protein-protein network of 55 consistently downregulated genes. Red, green, blue, purple, yellow, light blue, and black lines indicate the presence of fusion, neighborhood, co-occurrence, experimental, text mining, database, and co-expression evidence, respectively.
The genetic panel for deep learning classification adapted from consistently upregulated and downregulated genes
| Number of genes | Gene symbols | |
|---|---|---|
| 113 | FOXM1, RFC4, ORC6, NCAPG2, KIF23, TRIP13, CENPN, KIF14, TYMS, AARS, DONSON, E2F3, RNPS1, BRIX1, TMEM194A, MELK, MCM3, MOCOS, BUB1, ECT2, CDC25B, DEPDC1, FBXO5, POLR2H, RAD1, CDK1, RAD54L, BUB3, SUPV3L1, DBF4, NETO2, CENPI, SLC7A6, CMC2, MEST, BLM, ZNF281, ACOT9, MAD2L1, DDX11, PDIA5, ELAVL1, NUP155, RUVBL1, WARS, SS18L1, MTHFD2, LRRC8D, MSH6, IMMT, LHX2, RAD21, EIF2S2, MDC1, NUP85, C5orf22, CDC23, UCK2, PTDSS1, UBA2, ZNF473, BRCA1, TMEM22, ADAR, NUP160, ACP1, WBP11, DIEXF, C9orf91, NVL, PTPLAD1, C3orf37, WASF1, HPRT1, ACTL6A, PPAT, DKC1, POGK, MTFR1, ELF4, HAUS6, TEX10, USP18, PRKCI, TNPO1, ARL6IP1, KIAA1598, NFATC2IP, KIAA0947, PARP12, NUCKS1, PARPBP, TAF5, CRKL, GOLT1B, CEP152, SLC25A17, HSP90AA1, HERC5, NSL1, FN3KRP, IFI30, LSG1, PALB2, MTMR4, PSMA6, SFMBT1, TTC13, DAP3, TRIM45, OSBPL11, COG2, NOL11 | |
| 55 | SYNGR1, CFD, C9orf125, TTC39A, BBOX1, CXCR2, CRYL1, HPGD, HEBP2, MAL, FHIT, EDN3, NDST2, ABR, UPK1A, SOSTDC1, ITPR2, CAB39L, ALOX12, FUT6, TP53AIP1, CD24, MREG, FGFR2, PLD2, EPHB6, ACAA1, CWH43, CA12, ZNF91, IL17RC, TBX3, RAPGEF3, PACRG, ECHDC2, ZC4H2, ASAP3, EPS8L1, ZNF426, ALDH2, C5orf4, MAPK10, SLC24A3, GYS2, PPP1R3C, PADI1, DEFB4A, RGS12, ENDOD1, GULP1, TMEM8B, PGAP3, TRIM13, CLN8, PLBD1 |
Genes reported in two previous studies on CxCa and their related functions
| Full name | Effect size: Cancer-Normalcy | Effect size: CIN-Normalcy | Effect size: Cancer-CIN | Function related to cervical cancer | | |
|---|---|---|---|---|---|---|
| BRCA1, DNA repair associated | 2.04 | 1.42 | 0.66 | Hypermethylation of the BRCA1 promoter was observed in advanced stage invasive cervical cancer patients [ | ||
| BUB1 mitotic checkpoint serine/threonine kinase | 1.71 | 0.81 | 0.78 | BUB1 has not been described to be associated with cervical cancer. | ||
| Cyclin dependent kinase 1 | 2.42 | 1.52 | 0.65 | Cyclin B1, a regulatory subunit of CDK1 and a crucial protein for the transition from G2 phase to mitosis of the cell cycle, is found to be overexpressed in invasive cervical cancer cells [ | ||
| DBF4 zinc finger | 1.99 | 1.34 | 0.91 | DBF4 has not been described to be associated with cervical cancer. | ||
| Epithelial cell transforming 2 | 3.10 | 1.46 | 1.40 | The high expression of ECT2 in the region 3q may be implicated in cervical oncogenesis [ | ||
| F-box protein 5 | 1.89 | 1.35 | 0.75 | A study suggested that the differential regulation of miR-654-3p on FBXO5 may enforce cell cycle progression and cause genomic instability in CIN III stage [ | ||
| Forkhead box M1 | 2.78 | 1.25 | 1.19 | High levels of FOXM1 expression were observed in cervical cancer. Its overexpression was correlated with tumor aggressiveness and the presence of cell proliferation indicator Ki67 [ | ||
| Kinesin family member 14 | 2.91 | 1.18 | 1.05 | A study reported the high levels of KIF14 expression in cervical carcinoma cell line C-33A [ | ||
| Kinesin family member 23 | 2.38 | 1.39 | 0.83 | A research showed an increase of KIF23 levels in preinvasive CIN 1 and invasive cervical cancer [ | ||
| MAD2 mitotic arrest deficient-like 1 (yeast) | 1.82 | 1.11 | 0.56 | A study suggested that the significant overexpression of MAD2L1 in HSILs and SCCs may be involved in the cervical carcinogenesis [ | ||
| Maternal embryonic leucine zipper kinase | 2.97 | 1.32 | 1.26 | The high MELK expression was associated with poor prognosis and advanced tumor stage (CIN3/CIS and invasive cancer) [ | ||
| Neuropilin and tolloid like 2 | 1.77 | 1.08 | 0.68 | A research showed that the NETO2 mRNA level was increased in 50% of cervical cancer samples [ | ||
| Dyskerin pseudouridine synthase 1 | 1.63 | 0.66 | 0.60 | DKC1 has not been described to be associated with cervical cancer. | ||
| Large 60S subunit nuclear export GTPase 1 | 1.62 | 0.66 | 1.11 | LSG1 has not been described to be associated with cervical cancer. | ||
| Nucleoporin 155 | 1.77 | 0.93 | 0.90 | NUP155 has not been described to be associated with cervical cancer. | ||
| RNA polymerase II subunit H | 1.98 | 1.36 | 1.28 | POLR2H has not been described to be associated with cervical cancer. | ||
| Protein kinase C iota | 1.75 | 0.85 | 0.95 | PRKCI overexpression was frequently observed in cervical squamous cell carcinoma [ | ||
| RAD1 checkpoint DNA exonuclease | 1.34 | 1.26 | 0.54 | A research suggested that the mutation in pathways containing RAD1 may predispose to cervical cancer transition [ | ||
| Replication factor C subunit 4 | 3.03 | 1.55 | 1.23 | The upregulation of RFC4 was observed in cervical cancer [ | ||
| Zinc finger protein 473 | 1.28 | 0.90 | 0.62 | ZNF473 has not been described to be associated with cervical cancer. | ||
* Gene whose expression pattern is associated with prognosis in the TCGA cohort (P-value < 0.05).
Figure 3. PCA visualization of the comparison groups
(a) PCA plot of Cancer versus Normalcy. (b) PCA plot of CIN versus Normalcy. (c) PCA plot of Cancer versus CIN.
Figure 4. ROC curves illustrating the diagnostic ability of the panel
(a) Cancer versus Normalcy, (b) Cancer versus CIN, and (c) CIN versus Normalcy.
Sensitivity and specificity of the deep learning classifier for the three comparison groups
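| Model | Group | Parameter | 10-fold cross-validation | Test set |
|---|---|---|---|---|
| Genetic panel | Cancer versus Normalcy | Accuracy (%) | 97.97 | 97.96 |
| Genetic panel | Cancer versus Normalcy | Sensitivity (%) | 97.03 | 99.01 |
| Genetic panel | Cancer versus Normalcy | Specificity (%) | 100.00 | 95.65 |
| Genetic panel | Cancer versus CIN | Accuracy (%) | 84.04 | 84.21 |
| Genetic panel | Cancer versus CIN | Sensitivity (%) | 71.43 | 85.71 |
| Genetic panel | Cancer versus CIN | Specificity (%) | 91.53 | 83.33 |
| Genetic panel | CIN versus Normalcy | Accuracy (%) | 90.35 | 91.49 |
| Genetic panel | CIN versus Normalcy | Sensitivity (%) | 91.36 | 91.18 |
| Genetic panel | CIN versus Normalcy | Specificity (%) | 87.88 | 92.31 |
| Unbiased variable selection | Cancer versus Normalcy | Accuracy (%) | 97.30 | 97.28 |
| Unbiased variable selection | Cancer versus Normalcy | Sensitivity (%) | 98.02 | 99.01 |
| Unbiased variable selection | Cancer versus Normalcy | Specificity (%) | 95.74 | 93.48 |
| Unbiased variable selection | Cancer versus CIN | Accuracy (%) | 92.55 | 86.84 |
| Unbiased variable selection | Cancer versus CIN | Sensitivity (%) | 91.43 | 85.71 |
| Unbiased variable selection | Cancer versus CIN | Specificity (%) | 93.22 | 87.50 |
| Unbiased variable selection | CIN versus Normalcy | Accuracy (%) | 89.47 | 82.98 |
| Unbiased variable selection | CIN versus Normalcy | Sensitivity (%) | 90.12 | 88.24 |
| Unbiased variable selection | CIN versus Normalcy | Specificity (%) | 87.88 | 69.23 |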
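
To make the three reported parameters concrete, the following minimal sketch computes accuracy, sensitivity, and specificity from a confusion matrix, treating cancer as the positive class. The counts used in the example are hypothetical: the source does not report confusion matrices, and these values were chosen only because they reproduce the test-set percentages quoted in the abstract for the cancer versus normalcy comparison (97.96%, 99.01%, 95.65%). The function name is likewise illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity as percentages."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # true positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true negative rate
    }


# Hypothetical test-set counts (not from the source): 101 cancer samples,
# 46 normal samples, with 1 false negative and 2 false positives.
metrics = classification_metrics(tp=100, fn=1, tn=44, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Accuracy depends on the class balance of the test set, whereas sensitivity and specificity are each computed within a single class, which is why the three values can move independently across the comparisons in the table.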
Figure 5. Kaplan-Meier plots of nine selected genes
(a) ZNF281, (b) DIEXF, (c) POGK, (d) TNPO1, (e) GOLT1B, (f) COG2, (g) EPHB6, (h) FGFR2, and (i) SYNGR1.
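
For readers unfamiliar with the curves in Figure 5, the sketch below shows how a Kaplan-Meier survival estimate is built from follow-up times and event indicators. It is a generic textbook implementation with toy data, not the authors' pipeline: the source does not state which software was used, and the function name, follow-up times, and event flags are all hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times: follow-up time for each subject.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Toy cohort (hypothetical): five subjects, two of them censored.
curve = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

In a gene-level analysis such as the one in Figure 5, one curve would typically be estimated per expression group (for example, high versus low expression) and the groups compared with a log-rank test; that grouping step is outside this sketch.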
Figure 6. Protein expression level of ZNF281 in cancer samples
(a) The representative staining scores (0-3) of ZNF281 in cancer tissues. (b) The staining score of ZNF281 is significantly higher in CxCa than in normal controls. ****, P-value < 0.0001.