Jiajun Zhi, Jiwei Sun, Zhongchuan Wang, Wenjun Ding.
Abstract
Colorectal cancer (CRC) is one of the most common cancers and a major cause of mortality. The present study aimed to identify potential biomarkers for CRC metastasis and uncover the mechanisms underlying the etiology of the disease. The five datasets GSE68468, GSE62321, GSE22834, GSE14297 and GSE6988 were utilized in the study, all of which contained metastatic and non-metastatic CRC samples. Among them, three datasets were integrated via meta-analysis to identify the differentially expressed genes (DEGs) between the two types of samples. A protein-protein interaction (PPI) network was constructed for these DEGs. Candidate genes were then selected by the support vector machine (SVM) classifier based on the betweenness centrality (BC) algorithm. A CRC dataset from The Cancer Genome Atlas database was used to evaluate the accuracy of the SVM classifier. Pathway enrichment analysis was carried out for the SVM-classified gene signatures. In total, 358 DEGs were identified by meta-analysis. The top ten nodes in the PPI network with the highest BC values were selected, including cAMP responsive element binding protein 1 (CREB1), cullin 7 (CUL7) and signal sequence receptor 3 (SSR3). The optimal SVM classification model was established, which was able to precisely distinguish between the metastatic and non-metastatic samples. Based on this SVM classifier, 40 signature genes were identified, which were mainly enriched in protein processing in endoplasmic reticulum (e.g., SSR3), AMPK signaling pathway (e.g., CREB1) and ubiquitin mediated proteolysis (e.g., FBXO2, CUL7 and UBE2D3) pathways. In conclusion, the SVM-classified genes, including CREB1, CUL7 and SSR3, precisely distinguished the metastatic CRC samples from the non-metastatic ones. These genes have the potential to be used as biomarkers for the prognosis of metastatic CRC.
Year: 2018 PMID: 29328363 PMCID: PMC5819940 DOI: 10.3892/ijmm.2018.3359
Source DB: PubMed Journal: Int J Mol Med ISSN: 1107-3756 Impact factor: 4.101
Quality control results of the five datasets.
| Dataset | IQC | EQC | CQCg | CQCp | AQCg | AQCp | SMR |
|---|---|---|---|---|---|---|---|
| GSE68468 | 5.19 | 3.28 | 69.15 | 103.59 | 27.46 | 56.31 | 2.13 |
| GSE62321 | 3.76 | 3.15 | 56.7 | 148.66 | 33.78 | 47.61 | 3.59 |
| GSE22834 | 0.21 | 0.67 | 0.01 | 0.27 | 0.83 | 1.98 | 13.87 |
| GSE14297 | 7.65 | 4.32 | 1.92 | 59.62 | 21.19 | 2.39 | 6.02 |
| GSE6988 | 0.03 | 1.19 | 0.86 | 0.53 | 1.73 | 1.96 | 8.62 |
IQC, internal quality control; EQC, external quality control; CQCg, consistency quality control of gene; CQCp, consistency quality control of pathway; AQCg, accuracy quality control of gene; AQCp, accuracy quality control of pathway; SMR, standardized mean rank.
Figure 1. Quality control of the five datasets via MetaQC. Numbers 1–5 denote the five datasets. IQC, internal quality control; EQC, external quality control; CQCg, consistency quality control of gene; CQCp, consistency quality control of pathway; AQCg, accuracy quality control of gene; AQCp, accuracy quality control of pathway.
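To make the relationship between the six QC scores and the SMR column more concrete, the sketch below ranks the datasets on each score and averages the ranks. This is a simplified stand-in, not the MetaQC procedure: it assumes that larger QC scores are better and that a smaller SMR is better, and the plain mean rank it computes does not reproduce the SMR values in the quality-control table. It only shows that the same ordering of the five datasets emerges. The function name `mean_ranks` is hypothetical.

```python
# QC scores copied from the quality-control table:
# columns are IQC, EQC, CQCg, CQCp, AQCg, AQCp.
qc_scores = {
    "GSE68468": [5.19, 3.28, 69.15, 103.59, 27.46, 56.31],
    "GSE62321": [3.76, 3.15, 56.7, 148.66, 33.78, 47.61],
    "GSE22834": [0.21, 0.67, 0.01, 0.27, 0.83, 1.98],
    "GSE14297": [7.65, 4.32, 1.92, 59.62, 21.19, 2.39],
    "GSE6988": [0.03, 1.19, 0.86, 0.53, 1.73, 1.96],
}

# SMR column as reported in the same table.
reported_smr = {
    "GSE68468": 2.13,
    "GSE62321": 3.59,
    "GSE22834": 13.87,
    "GSE14297": 6.02,
    "GSE6988": 8.62,
}


def mean_ranks(scores):
    """Average rank of each dataset over all QC measures (rank 1 = highest score).

    Assumption: higher QC score means better quality. Ties are not handled
    because the table contains none.
    """
    names = list(scores)
    n_measures = len(next(iter(scores.values())))
    totals = {name: 0 for name in names}
    for j in range(n_measures):
        ordered = sorted(names, key=lambda name: scores[name][j], reverse=True)
        for rank, name in enumerate(ordered, start=1):
            totals[name] += rank
    return {name: totals[name] / n_measures for name in names}


simple_rank = mean_ranks(qc_scores)
order_by_simple_rank = sorted(simple_rank, key=simple_rank.get)
order_by_reported_smr = sorted(reported_smr, key=reported_smr.get)
```

Both orderings place GSE68468, GSE62321 and GSE14297 ahead of GSE6988 and GSE22834.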
Top 10 differentially expressed genes identified via meta-analysis of the three integrated datasets.
| Gene | P-value | FDR | Q | Qp | τ² |
|---|---|---|---|---|---|
| | 1.00×10⁻²⁰ | 3.45×10⁻¹⁸ | 1.7104 | 0.4252 | 0 |
| | 1.00×10⁻²⁰ | 3.45×10⁻¹⁸ | 0.9410 | 0.6247 | 0 |
| | 1.00×10⁻²⁰ | 3.45×10⁻¹⁸ | 0.9375 | 0.6258 | 0 |
| | 1.00×10⁻²⁰ | 3.45×10⁻¹⁸ | 0.7498 | 0.6874 | 0 |
| | 1.00×10⁻²⁰ | 3.45×10⁻¹⁸ | 0.7372 | 0.6917 | 0 |
| | 1.00×10⁻²⁰ | 3.45×10⁻¹⁸ | 0.6972 | 0.7057 | 0 |
| | 1.00×10⁻²⁰ | 3.45×10⁻¹⁸ | 0.4327 | 0.8054 | 0 |
| | 1.00×10⁻²⁰ | 3.45×10⁻¹⁸ | 0.2751 | 0.8715 | 0 |
| | 3.62×10⁻⁶ | 7.69×10⁻⁴ | 1.9948 | 0.3688 | 0 |
| | 3.62×10⁻⁶ | 7.69×10⁻⁴ | 1.8035 | 0.4059 | 0 |
FDR, false discovery rate; MCF2L, MCF.2 cell line derived transforming sequence like; TCF21, transcription factor 21; FGD6, FYVE, RhoGEF and PH domain containing 6; MED28, mediator complex subunit 28; PRDM1, PR/SET domain 1; TMED10, transmembrane p24 trafficking protein 10; F5, coagulation factor 5; NUMA1, nuclear mitotic apparatus protein 1; ELOVL6, ELOVL fatty acid elongase 6; DLD, dihydrolipoamide dehydrogenase.
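The Q and Qp columns of the top-10 DEG table can be checked by hand. With three integrated datasets, a heterogeneity statistic Q is compared against a chi-square distribution with 2 degrees of freedom, whose upper-tail probability has the closed form exp(−Q/2). The sketch below recomputes Qp from the tabulated Q values. The remark on τ² assumes a DerSimonian-Laird-type estimator, which truncates τ² to zero whenever Q does not exceed the degrees of freedom; the source does not name the estimator, so that part is an assumption. Function names are hypothetical.

```python
import math

N_DATASETS = 3          # three datasets were integrated by meta-analysis
DF = N_DATASETS - 1     # degrees of freedom of the heterogeneity test

# (Q, Qp) pairs copied from the top-10 DEG table, in row order.
table_rows = [
    (1.7104, 0.4252), (0.9410, 0.6247), (0.9375, 0.6258), (0.7498, 0.6874),
    (0.7372, 0.6917), (0.6972, 0.7057), (0.4327, 0.8054), (0.2751, 0.8715),
    (1.9948, 0.3688), (1.8035, 0.4059),
]


def q_pvalue_two_df(q):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return math.exp(-q / 2.0)


def truncated_tau2_is_zero(q, df=DF):
    """Assumption: a moment estimator of the form max(0, (Q - df) / C), C > 0.

    Such an estimator is zero exactly when Q <= df, whatever C is.
    """
    return q <= df


recomputed = [q_pvalue_two_df(q) for q, _ in table_rows]
max_abs_diff = max(abs(p - qp) for p, (_, qp) in zip(recomputed, table_rows))
all_tau2_zero = all(truncated_tau2_is_zero(q) for q, _ in table_rows)
```

Every recomputed value agrees with the tabulated Qp to within rounding, and every Q is below 2, which matches the column of zeros for τ² under the stated assumption.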
Figure 2. Heat map of the gene expression of the 358 differentially expressed genes in metastatic and non-metastatic colon cancer samples. Red indicates high expression and green indicates low expression; yellow represents metastatic samples and blue represents non-metastatic samples.
Figure 3. Protein-protein interaction network of the differentially expressed genes. Orange indicates upregulated genes and blue represents downregulated genes in metastatic compared with non-metastatic colon cancer samples. Lines between two nodes represent interactions between them.
Top 10 differentially expressed genes ranked by their betweenness centrality value.
| Gene | BC | Exp | Degree | P-value | FDR | Q | Qp | τ² |
|---|---|---|---|---|---|---|---|---|
| | 1 | 1 | 2 | 1.41×10⁻² | 0.1337 | 0.1198 | 0.9418 | 0 |
| | 1 | 0 | 2 | 6.28×10⁻³ | 0.0845 | 0.8227 | 0.6627 | 0 |
| | 1 | 0 | 4 | 2.44×10⁻² | 0.1812 | 0.6522 | 0.7217 | 0 |
| | 1 | 0 | 2 | 7.17×10⁻⁴ | 0.0236 | 0.8618 | 0.6499 | 0 |
| | 0.7 | 0 | 3 | 3.82×10⁻² | 0.2279 | 0.0720 | 0.9646 | 0 |
| | 0.6667 | 1 | 2 | 3.26×10⁻⁵ | 0.0030 | 1.6994 | 0.4275 | 0 |
| | 0.6667 | 0 | 3 | 3.02×10⁻² | 0.1990 | 0.4374 | 0.8036 | 0 |
| | 0.6 | 0 | 2 | 2.54×10⁻⁵ | 0.0028 | 1.6978 | 0.4279 | 0 |
| | 0.4595 | 1 | 16 | 6.92×10⁻⁴ | 0.0234 | 1.0330 | 0.5966 | 0 |
| | 0.4 | 0 | 2 | 1.04×10⁻³ | 0.0291 | 1.5003 | 0.4723 | 0 |
BC, betweenness centrality; FDR, false discovery rate; BCOR, BCL6 corepressor; COPB2, coatomer protein complex subunit β 2; CREB1, cAMP responsive element binding protein 1; MYH11, myosin heavy chain 11; FAM3C, family with sequence similarity 3 member C; INADL, InaD-like; RAB32, RAB32, member RAS oncogene family; TOMM22, translocation of outer mitochondrial membrane 22; CUL7, cullin 7; SSR3, signal sequence receptor 3.
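Betweenness centrality scores a node by the fraction of shortest paths between other node pairs that pass through it. The sketch below is a minimal pure-Python version of Brandes' algorithm for an unweighted, undirected graph, normalized by the number of node pairs so that values fall between 0 and 1. The source does not say which tool or normalization produced the BC column, so both are assumptions here, and the five-node graph with nodes A to E is a hypothetical toy, not part of the PPI network.

```python
from collections import deque


def betweenness_centrality(adj):
    """Normalized betweenness centrality via Brandes' algorithm.

    adj maps each node to the set of its neighbours (undirected, unweighted).
    Scores are divided by (n - 1)(n - 2) / 2, the number of other node pairs.
    """
    raw = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search: count shortest paths and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes inwards.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += n_paths[v] / n_paths[w] * (1.0 + delta[w])
            if w != source:
                raw[w] += delta[w]
    n = len(adj)
    if n <= 2:
        return {v: 0.0 for v in adj}
    # Each undirected pair was visited from both endpoints, hence no factor 1/2.
    return {v: raw[v] / ((n - 1) * (n - 2)) for v in adj}


# Hypothetical toy graph: chain A-B-C with two extra leaves D and E on C.
toy_graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D", "E"},
    "D": {"C"},
    "E": {"C"},
}
toy_bc = betweenness_centrality(toy_graph)
```

In this toy graph, C lies on five of the six shortest paths between other node pairs and B on three, so C scores higher than B while the leaves score zero. In a three-node chain the middle node reaches a score of 1 with only two neighbours, which illustrates why a high BC does not require a high degree.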
Figure 4. SVM classification and the performance evaluation result. (A) Accuracy and error ratios of different training SVM classifications based on different signature genes. Red denotes the error ratio and blue represents the accuracy ratio. (B) Scattergram based on SVM classification on different kinds of samples. Orange represents non-metastatic samples and blue represents metastatic samples. SVM, support vector machine.
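As an illustration of what an SVM classifier does with expression values, the following is a minimal linear SVM trained by batch subgradient descent on the regularized hinge loss. It is a sketch under stated assumptions, not the study's model: the source does not specify the kernel, software or hyperparameters, and the two-feature samples below are hypothetical numbers chosen to be linearly separable (label +1 standing in for metastatic, −1 for non-metastatic).

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit weights w and bias b by subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b)))."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, centred expression values for two genes in six samples.
X_toy = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
         [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y_toy = [1, 1, 1, -1, -1, -1]

w_fit, b_fit = train_linear_svm(X_toy, y_toy)
train_accuracy = sum(
    predict(w_fit, b_fit, x) == label for x, label in zip(X_toy, y_toy)
) / len(y_toy)
```

The accuracy ratio shown in panel A corresponds to `train_accuracy` computed on whichever samples are being evaluated, and the error ratio is one minus that value.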
Figure 5. Scattergram based on support vector machine classification of different samples in two datasets. (A) GSE62321 and (B) GSE14297 datasets. Orange represents non-metastatic samples and blue represents metastatic samples.
Figure 6. Receiver operating characteristic curve of support vector machine classification on individual validation datasets. AUC, area under the curve.
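The AUC reported with a receiver operating characteristic curve equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are hypothetical and do not come from the validation datasets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. metastatic), 0 for negative.
    scores: classifier decision values, higher meaning more likely positive.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical decision values for three positive and three negative samples.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 1 means every positive sample outranks every negative one, and 0.5 corresponds to random ordering.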
Pathway enrichment results of the crucial 40 genes.
| Term | ID | Count | P-value | Genes |
|---|---|---|---|---|
| Protein processing in ER | hsa04141 | 5 | 0.0089 | FBXO2, DNAJC10, SSR3, CUL1, UBE2D3 |
| AMPK signaling pathway | hsa04152 | 4 | 0.0144 | PRKAB2, PFKP, PRKAA1, CREB1 |
| Dorso-ventral axis formation | hsa04320 | 2 | 0.0188 | MAPK1, NOTCH1 |
| Ubiquitin mediated proteolysis | hsa04120 | 4 | 0.0199 | FBXO2, CUL1, CUL7, UBE2D3 |
| Prion diseases | hsa05020 | 2 | 0.0313 | MAPK1, NOTCH1 |
ER, endoplasmic reticulum; AMPK, AMP-activated protein kinase; FBXO2, F-box protein 2; DNAJC10, DnaJ heat shock protein family (Hsp40) member C10; SSR3, signal sequence receptor 3; CUL1, cullin 1; UBE2D3, ubiquitin conjugating enzyme E2 D3; PRKAB2, protein kinase AMP-activated non-catalytic subunit β 2; PFKP, phosphofructokinase, platelet; PRKAA1, protein kinase AMP-activated catalytic subunit α 1; CREB1, cAMP responsive element binding protein 1; MAPK1, mitogen-activated protein kinase 1; NOTCH1, notch 1; CUL7, cullin 7.
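Pathway enrichment P-values of this kind are commonly obtained from a one-sided hypergeometric test: given a background of N genes of which K belong to a pathway, how likely is it that a list of n genes contains at least k pathway members by chance? The sketch below implements that test. The source does not state which enrichment tool or test was used, nor the background and pathway sizes, so the method is an assumption and the numbers in the example are a small hypothetical case rather than an attempt to reproduce the table. In the enrichment table, n would be the 40 signature genes and k the Count column.

```python
from math import comb


def enrichment_pvalue(background, in_pathway, list_size, overlap):
    """One-sided hypergeometric test: P(X >= overlap).

    background: total number of genes considered (N)
    in_pathway: genes in the background annotated to the pathway (K)
    list_size:  size of the gene list being tested (n)
    overlap:    genes in the list that belong to the pathway (k)
    """
    total = comb(background, list_size)
    upper = min(in_pathway, list_size)
    favourable = sum(
        comb(in_pathway, i) * comb(background - in_pathway, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / total


# Hypothetical small case: 10 background genes, 4 in the pathway,
# a list of 3 genes with 2 of them in the pathway.
toy_p = enrichment_pvalue(background=10, in_pathway=4, list_size=3, overlap=2)
```

For the toy case the favourable outcomes are C(4,2)·C(6,1) + C(4,3)·C(6,0) = 40 out of C(10,3) = 120 possible lists, giving a probability of 1/3.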