Abstract
A comprehensive meta-analysis of publicly available gene expression microarray data obtained from human-derived pancreatic ductal adenocarcinoma (PDAC) tissues and their histologically matched adjacent tissue samples was performed to identify diagnostic and prognostic biomarkers and molecular targets for PDAC. An integrative meta-analysis of four submissions (GSE62452, GSE15471, GSE62165, and GSE56560) containing 105 eligible tumor-adjacent tissue pairs revealed 344 over-expressed and 168 repressed genes in PDAC compared to the adjacent-to-tumor samples. A validation analysis using combined TCGA and GTEx data confirmed 98.24% of the identified up-regulated and 73.88% of the down-regulated protein-coding genes in PDAC. Pathway enrichment analysis showed that "ECM-receptor interaction", "PI3K-Akt signaling pathway", and "focal adhesion" are the most enriched KEGG pathways in PDAC. Protein-protein interaction analysis identified FN1, TIMP1, and MSLN as the most highly ranked hub genes among the DEGs. Transcription factor enrichment analysis revealed that TCF7, CTNNB1, SMAD3, and JUN are significantly activated in PDAC, while SMAD7 is inhibited. The prognostic significance of the identified and validated differentially expressed genes in PDAC was evaluated via survival analysis of TCGA Pan-Cancer pancreatic ductal adenocarcinoma data. The identified candidate prognostic biomarkers were then validated in four external validation datasets (GSE21501, GSE50827, GSE57495, and GSE71729) to further improve reliability. A total of 28 up-regulated genes were found to be significantly correlated with worse overall survival in patients with PDAC. Twenty-one of the identified prognostic genes (ITGB6, LAMC2, KRT7, SERPINB5, IGF2BP3, IL1RN, MPZL2, SFTA2, MET, LAMA3, ARNTL2, SLC2A1, LAMB3, COL17A1, EPSTI1, IL1RAP, AK4, ANXA2, S100A16, KRT19, and GPRC5A) were also found to be significantly correlated with the pathological stages of the disease.
The results of this study provided promising prognostic biomarkers that have the potential to differentiate PDAC from both healthy and adjacent-to-tumor pancreatic tissues. Several novel dysregulated genes merit further study as potentially promising candidates for the development of more effective treatment strategies for PDAC.
Keywords: Biomarker; Gene expression; Gene expression omnibus; Microarray; Pancreatic ductal adenocarcinoma
Year: 2020 PMID: 33194391 PMCID: PMC7597628 DOI: 10.7717/peerj.10141
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. The workflow of the meta-analysis and summary of the results.
Eligible public datasets used in the meta-analysis.
| Public datasets | Array platform | Number of sample pairs | PMID |
|---|---|---|---|
| GSE62452 | [HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array | 59 | 27197190 |
| GSE15471 | [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array | 35 | 19260470 |
| GSE62165 | [HG-U219] Affymetrix Human Genome U219 Array | 9 | 27520560 |
| GSE56560 | [HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array | 2 | 25587357 |
Note:
The number of sample pairs after exclusion of non-confirmed sample pairs and samples of low quality.
Ranked list of the top 10 up and down-regulated genes in pancreatic ductal adenocarcinoma.
| Gene symbol | Logratio combined | Fold change | P value | Adjusted P value |
|---|---|---|---|---|
| POSTN | 1.0004 | 10.009 | 0 | 0 |
| CEACAM5 | 0.8855 | 7.683 | 0 | 0 |
| SLC6A14 | 0.8611 | 7.263 | 0 | 0 |
| CEACAM6 | 0.8452 | 7.002 | 0 | 0 |
| SULF1 | 0.8347 | 6.835 | 0 | 0 |
| LAMC2 | 0.8279 | 6.728 | 0 | 0 |
| FN1 | 0.8083 | 6.432 | 0 | 0 |
| COL11A1 | 0.7918 | 6.191 | 0 | 0 |
| INHBA | 0.7713 | 5.907 | 0 | 0 |
| VCAN | 0.7644 | 5.813 | 0 | 0 |
| ALB | −0.8658 | −7.342 | 0 | 0 |
| SERPINI2 | −0.783 | −6.068 | 8.88E−16 | 4.96E−14 |
| PNLIPRP1 | −0.7683 | −5.866 | 9.40E−12 | 4.71E−10 |
| ERP27 | −0.7367 | −5.454 | 6.66E−16 | 3.75E−14 |
| PNLIPRP2 | −0.7359 | −5.444 | 3.66E−12 | 1.85E−10 |
| CTRL | −0.7199 | −5.247 | 3.55E−15 | 1.94E−13 |
| PDIA2 | −0.7025 | −5.041 | 0 | 0 |
| GP2 | −0.7005 | −5.018 | 3.06E−11 | 1.51E−09 |
| CELA2B | −0.685 | −4.842 | 7.62E−12 | 3.83E−10 |
| IAPP | −0.6768 | −4.751 | 8.88E−16 | 4.96E−14 |
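The fold changes in this table appear to follow from the combined log ratios through a base-10 transform with the sign carried over (for POSTN, 10^1.0004 ≈ 10.009). The sketch below assumes that relationship; the function name `signed_fold_change` is illustrative and does not come from the study.

```python
def signed_fold_change(log10_ratio: float) -> float:
    """Convert a combined log10 ratio into a signed fold change.

    Assumption: the log ratios are base-10, and down-regulated genes are
    reported with a negative fold change (e.g. -0.8658 -> about -7.342).
    """
    magnitude = 10 ** abs(log10_ratio)
    return magnitude if log10_ratio >= 0 else -magnitude


# Values taken from the table above; results agree with the listed fold
# changes up to the rounding of the published log ratios.
for gene, log_ratio in [("POSTN", 1.0004), ("CEACAM5", 0.8855), ("ALB", -0.8658)]:
    print(gene, round(signed_fold_change(log_ratio), 3))
```

Because the published log ratios are rounded to four decimals, the recomputed fold changes can differ from the table in the last digit.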
The ranked prognostic gene list for PDAC and the results of the Kaplan–Meier survival analysis in five datasets.
| Rank | Gene symbol | TCGA HR | TCGA P | Validation 1 HR | Validation 1 P | Validation 2 HR | Validation 2 P | Validation 3 HR | Validation 3 P | Validation 4 HR | Validation 4 P | Val. Stat. | Average P | Average HR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ITGB6 | 2.59 | 4.90E−06 | 1.39 | 0.001 | NS | NS | 1.54 | 0.002 | 1.30 | 0.01 | 3 | 3.38E−03 | 1.7050 |
| 2 | LAMC2 | 3.06 | 1.10E−04 | 1.53 | 0.003 | NS | NS | 1.33 | 0.03 | 1.24 | 0.01 | 3 | 1.06E−02 | 1.7900 |
| 3 | KRT7 | 3.12 | 7.20E−05 | 1.29 | 0.02 | NS | NS | 1.46 | 0.004 | NS | NS | 2 | 9.16E−03 | 1.9567 |
| 4 | SERPINB5 | 2.89 | 5.40E−06 | 1.37 | 0.01 | NS | NS | 1.23 | 0.03 | NS | NS | 2 | 1.18E−02 | 1.8300 |
| 5 | IGF2BP3 | 3.55 | 1.00E−05 | 1.17 | 0.04 | NS | NS | 1.28 | 0.01 | NS | NS | 2 | 1.40E−02 | 2.0000 |
| 6 | DCBLD2 | 2.19 | 2.00E−04 | NS | NS | NS | NS | 1.57 | 0.00 | 1.29 | 0.04 | 2 | 1.54E−02 | 1.6833 |
| 7 | TGM2 | 2.14 | 3.00E−04 | NS | NS | 1.42 | 0.04 | 1.48 | 0.01 | NS | NS | 2 | 1.68E−02 | 1.6800 |
| 8 | INPP4B | 3.07 | 1.00E−04 | 1.37 | 0.03 | NS | NS | 1.24 | 0.03 | NS | NS | 2 | 1.94E−02 | 1.8933 |
| 9 | IL1RN | 2.39 | 7.00E−04 | 1.29 | 0.02 | NS | NS | 1.26 | 0.05 | NS | NS | 2 | 2.29E−02 | 1.6467 |
| 10 | MPZL2 | 2.30 | 1.00E−04 | NS | NS | 1.47 | 0.05 | 1.36 | 0.05 | NS | NS | 2 | 3.14E−02 | 1.7100 |
| 11 | SFTA2 | 2.39 | 2.00E−04 | NS | NS | NS | NS | 1.37 | 0.00 | NS | NS | 2 | 5.50E−04 | 1.8800 |
| 12 | MET | 2.79 | 1.20E−07 | NS | NS | NS | NS | 1.68 | 0.001 | NS | NS | 2 | 7.00E−04 | 2.2350 |
| 12 | LAMA3 | 3.86 | 3.60E−06 | 1.50 | 0.002 | NS | NS | NS | NS | 1.57 | 0.001 | 2 | 9.68E−04 | 2.3100 |
| 14 | DHRS9 | 2.14 | 3.00E−04 | NS | NS | NS | NS | 1.31 | 0.004 | NS | NS | 1 | 1.90E−03 | 1.7250 |
| 15 | FRMD6 | 2.33 | 3.00E−04 | NS | NS | NS | NS | 1.65 | 0.01 | NS | NS | 1 | 3.15E−03 | 1.9900 |
| 16 | ARNTL2 | 2.51 | 7.40E−06 | 1.47 | 0.01 | NS | NS | NS | NS | NS | NS | 1 | 3.15E−03 | 1.9900 |
| 17 | PKM | 2.52 | 1.00E−05 | 1.88 | 0.01 | N/A | N/A | N/A | N/A | N/A | N/A | 1 | 3.51E−03 | 2.2000 |
| 18 | SLC2A1 | 3.73 | 4.40E−05 | NS | NS | NS | NS | 1.33 | 0.01 | NS | NS | 1 | 4.02E−03 | 2.5300 |
| 19 | LAMB3 | 2.18 | 3.00E−04 | NS | NS | NS | NS | NS | NS | 1.23 | 0.01 | 1 | 6.65E−03 | 1.7050 |
| 20 | COL17A1 | 2.19 | 2.00E−04 | NS | NS | NS | NS | 1.20 | 0.03 | NS | NS | 1 | 1.36E−02 | 1.6950 |
| 21 | EPSTI1 | 2.22 | 2.00E−04 | NS | NS | NS | NS | NS | NS | 1.45 | 0.03 | 1 | 1.51E−02 | 1.8350 |
| 22 | IL1RAP | 2.51 | 1.00E−04 | NS | NS | NS | NS | NS | NS | 1.56 | 0.03 | 1 | 1.71E−02 | 2.0350 |
| 23 | AK4 | 2.26 | 7.40E−05 | NS | NS | N/A | N/A | 1.30 | 0.04 | N/A | N/A | 1 | 1.75E−02 | 1.7800 |
| 24 | ANXA2 | 2.50 | 8.40E−06 | 1.61 | 0.04 | NS | NS | NS | NS | NS | NS | 1 | 1.85E−02 | 2.0550 |
| 25 | S100A16 | 2.16 | 2.00E−04 | 1.40 | 0.04 | NS | NS | NS | NS | NS | NS | 1 | 1.86E−02 | 1.7800 |
| 26 | KRT19 | 3.23 | 7.90E−05 | NS | NS | NS | NS | NS | NS | 1.22 | 0.04 | 1 | 1.90E−02 | 2.2250 |
| 27 | GPR87 | 3.37 | 4.60E−06 | NS | NS | NS | NS | 1.15 | 0.04 | NS | NS | 1 | 2.15E−02 | 2.2600 |
| 28 | GPRC5A | 2.64 | 4.60E−06 | NS | NS | NS | NS | 1.26 | 0.05 | NS | NS | 1 | 2.40E−02 | 1.9500 |
Note:
HR, hazard ratio; NS, nonsignificant (p > 0.05); N/A, not available; P, P value; Val. Stat, validation status; PDAC, Pancreatic ductal adenocarcinoma.
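The "Average HR" values appear to be the mean of the hazard ratios from the datasets in which a gene reached significance, TCGA included; for ITGB6, (2.59 + 1.39 + 1.54 + 1.30) / 4 = 1.705. The sketch below recomputes that summary for one row under this assumption. The helper name `summarize_gene` is hypothetical, and `None` stands in for NS or N/A entries.

```python
from statistics import mean
from typing import List, Optional, Tuple


def summarize_gene(hazard_ratios: List[Optional[float]]) -> Tuple[int, float]:
    """Return (external datasets with a significant HR, mean significant HR).

    hazard_ratios[0] is the TCGA hazard ratio; the remaining entries are the
    external validation datasets, with None for NS or N/A.
    Assumption: the average is taken over significant datasets only.
    """
    significant = [hr for hr in hazard_ratios if hr is not None]
    validated = sum(hr is not None for hr in hazard_ratios[1:])
    return validated, round(mean(significant), 4)


# Rows transcribed from the table: ITGB6 and KRT7.
print(summarize_gene([2.59, 1.39, None, 1.54, 1.30]))
print(summarize_gene([3.12, 1.29, None, 1.46, None]))
```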
Figure 2. Kaplan–Meier survival plots for the identified up-regulated genes in PDAC (A–BB).
Survival plots were created using Km-Plotter. Kaplan–Meier survival plots are shown only for genes whose elevated expression was significantly associated with the overall survival of patients in TCGA data and whose prognostic value was validated in at least one of the external validation datasets (GSE21501, GSE50827, GSE57495, and GSE71729).
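For readers unfamiliar with the curves in Figure 2, the following is a minimal sketch of the Kaplan–Meier (product-limit) estimator in plain Python. It is not the implementation used by the plotting tool, and the follow-up times and event flags are invented toy values.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    survival = 1.0
    at_risk = len(times)
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort (hypothetical): five patients, two of them censored.
toy_times = [5, 8, 12, 12, 20]
toy_events = [1, 0, 1, 1, 0]
for time, prob in kaplan_meier(toy_times, toy_events):
    print(time, round(prob, 4))
```

In a comparison such as the one in Figure 2, the estimator would be run separately for the high-expression and low-expression groups and the two curves compared.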
Figure 3. The identified prognostic genes whose mRNA expressions were found to be correlated with the pathological tumor stages in patients with PDAC (A–U).
Violin plots were created using GEPIA based on the TCGA PAAD dataset. F-value indicates the statistical value of F test; Pr (>F) indicates P-value. P < 0.05 was accepted as statistically significant.
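As an illustration of the F-value reported in these plots, the sketch below computes a one-way ANOVA F statistic across groups. It assumes the stage comparison is a one-way ANOVA with pathological stage as the grouping variable; the expression values are invented toy numbers, not TCGA data.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical expression values for three stage groups.
toy_stages = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_anova_f(toy_stages))
```

The P-value, Pr(>F), would then be read from the F distribution with `df_between` and `df_within` degrees of freedom.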
Figure 4. The protein-protein interaction (PPI) network analysis of differentially expressed genes in PDAC.
The network was constructed in Cytoscape based on the PPI correlations from the STRING database. The clusters in the network were identified using MCODE. A total of nine clusters with MCODE score >5 were marked and named with different colors in the network.
The list of the identified hub protein-coding genes in PDAC.
| Gene symbol | Node degree | Gene name |
|---|---|---|
| FN1 | 25 | Fibronectin type III domain containing |
| TIMP1 | 23 | Tissue inhibitor of metalloproteinases 1 |
| MSLN | 22 | Pre-pro-megakaryocyte-potentiating factor |
| FBN1 | 20 | Fibrillin 1 |
| ALB | 20 | Serum albumin |
| F5 | 20 | Coagulation factor V (proaccelerin, labile factor) |
| SERPINA1 | 20 | Serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 |
| COL1A2 | 20 | Collagen alpha-2(I) chain |
| IGFBP3 | 19 | Insulin-like growth factor binding protein 3 |
| COL3A1 | 19 | Collagen alpha-1(III) chain |
| COL1A1 | 19 | Collagen alpha-1(I) chain |
| ITGA2 | 18 | Integrin, alpha 2 (CD49B, alpha 2 subunit of VLA-2 receptor) |
| COL17A1 | 18 | 180 kDa bullous pemphigoid antigen 2 |
| TNC | 18 | Glioma-associated-extracellular matrix antigen |
| SPP1 | 18 | Secreted phosphoprotein 1 |
| VCAN | 17 | Chondroitin sulfate proteoglycan core protein 2 |
| MATN3 | 17 | Matrilin 3 |
| IGFBP5 | 17 | Insulin-like growth factor binding protein 5 |
| EGF | 16 | Pro-epidermal growth factor |
| GNB4 | 16 | Guanine nucleotide binding protein (G protein) |
| LTBP1 | 16 | Latent transforming growth factor beta binding protein 1 |
| COL4A2 | 16 | Collagen alpha-2(IV) chain |
| LGALS1 | 16 | Lectin, galactoside-binding, soluble, 1 |
| APOL1 | 16 | Apolipoprotein L, 1 |
| COL11A1 | 16 | Collagen alpha-1(XI) chain |
| ANXA1 | 16 | Phospholipase A2 inhibitory protein |
| CP | 16 | Ceruloplasmin (ferroxidase) |
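Node degree here is the number of interaction partners a gene has in the PPI network. The sketch below shows how such a ranking could be derived from an undirected edge list; the edges are invented for illustration and are not taken from STRING or from the study's network.

```python
def node_degrees(edges):
    """Count the distinct partners of each node in an undirected network."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


# Hypothetical edge list; the duplicate FN1-COL1A1 edge is counted once.
toy_edges = [
    ("FN1", "TIMP1"),
    ("FN1", "COL1A1"),
    ("FN1", "SPP1"),
    ("TIMP1", "COL1A1"),
    ("COL1A1", "FN1"),
]

degrees = node_degrees(toy_edges)
# Rank by descending degree, breaking ties alphabetically.
ranked = sorted(degrees.items(), key=lambda item: (-item[1], item[0]))
for gene, degree in ranked:
    print(gene, degree)
```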
Figure 5. Gene Ontology analysis of the differentially expressed genes in PDAC.
Enriched molecular functions (A and D), biological processes (B and E), and cellular locations (C and F) associated with the differential gene expression in PDAC are shown. Analyses were performed using FunRich.