Hui Cai1, Jiping Xu2, Yifang Han3, Zhengmao Lu1, Ting Han1, Yibo Ding4, Liye Ma1. 1. Department of General Surgery, Changhai Hospital, Second Military Medical University, Shanghai, People's Republic of China. 2. Department of Medical Administration, Changhai Hospital, Second Military Medical University, Shanghai, People's Republic of China. 3. Department of Epidemiology, Research Institute for Medicine of Nanjing Command, Nanjing, People's Republic of China. 4. Department of Epidemiology, Changhai Hospital, Second Military Medical University, Shanghai, People's Republic of China.
Abstract
PURPOSE: This study aimed to identify molecular prognostic biomarkers for gastric cancer. METHODS: mRNA and miRNA expression profiles of eligible gastric cancer and control samples were downloaded from Gene Expression Omnibus to screen the differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs), using MetaDE and limma packages, respectively. Target genes of the DEmiRs were also collected from both predictive and experimentally validated target databases of miRNAs. The overlapping genes between selected targets and DEGs were identified as risk genes, followed by functional enrichment analysis. Human pathways and their corresponding genes were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for the expression analysis of each pathway in gastric cancer samples. Next, co-pathway pairs were selected according to the Pearson correlation coefficients. Finally, the co-pathway pairs, miRNA-target pairs, and risk gene-pathway pairs were merged into a complex interaction network, the most important nodes (miRNAs/target genes/co-pathway pairs) of which were selected by calculating their degrees. RESULTS: Totally, 1,260 DEGs and 144 DEmiRs were identified. There were 336 risk genes found in the 9,572 miRNA-target pairs. Judging from the pathway expression files, 45 co-pathway pairs were screened out. There were 1,389 interactive pairs and 480 nodes in the integrated network. Among all nodes in the network, focal adhesion/extracellular matrix-receptor interaction pathways, CALM2, miR-19b, and miR-181b were the hub nodes with higher degrees. CONCLUSION: CALM2, hsa-miR-19b, and hsa-miR-181b might be used as potential prognostic targets for gastric cancer.
PURPOSE: This study aimed to identify molecular prognostic biomarkers for gastric cancer. METHODS: mRNA and miRNA expression profiles of eligible gastric cancer and control samples were downloaded from Gene Expression Omnibus to screen the differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs), using MetaDE and limma packages, respectively. Target genes of the DEmiRs were also collected from both predictive and experimentally validated target databases of miRNAs. The overlapping genes between selected targets and DEGs were identified as risk genes, followed by functional enrichment analysis. Human pathways and their corresponding genes were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database for the expression analysis of each pathway in gastric cancer samples. Next, co-pathway pairs were selected according to the Pearson correlation coefficients. Finally, the co-pathway pairs, miRNA-target pairs, and risk gene-pathway pairs were merged into a complex interaction network, the most important nodes (miRNAs/target genes/co-pathway pairs) of which were selected by calculating their degrees. RESULTS: Totally, 1,260 DEGs and 144 DEmiRs were identified. There were 336 risk genes found in the 9,572 miRNA-target pairs. Judging from the pathway expression files, 45 co-pathway pairs were screened out. There were 1,389 interactive pairs and 480 nodes in the integrated network. Among all nodes in the network, focal adhesion/extracellular matrix-receptor interaction pathways, CALM2, miR-19b, and miR-181b were the hub nodes with higher degrees. CONCLUSION:CALM2, hsa-miR-19b, and hsa-miR-181b might be used as potential prognostic targets for gastric cancer.
Gastric cancer ranks as the fifth most common cancer and the third leading cause for cancer death, with diverse histological and molecular subtypes.1,2 Over the past decades, due to the remarkable advances in diagnosis and treatment, the incidence and mortality of gastric cancer have decreased significantly in most countries globally.3 However, gastric cancer is still a challenge to be tackled, demanding intensive studies.One of the prerequisites for the effective therapy of gastric cancer is to understand its prognosis status. Recently, several studies have been performed to investigate prognostic biomarkers for gastric cancer and their prognostic effects.4,5 For example, overexpression of stomatin-like protein 2 is reported to be associated with invasion depth, tumor node metastasis stage, as well as lymph node and distant metastases, leading to poor prognosis in gastric cancer.6 Notch-1 has also been proven to be correlated with gastric cancer progression and to exert a predictive role for the poor clinical outcome of gastric cancer.7 Li et al8 have found by analyzing microarray data that a seven-miRNA signature (miR-10b, miR-21, miR-223, miR-338, let-7a, miR-30a-5p, and miR-126) is related to the survival and relapse of gastric cancer. Furthermore, analysis of oncogenic signaling pathways through the genomic and proteomic expression profiles has revealed the effect of their abnormalities on the prognosis of gastric cancer.9 For instance, the vascular endothelial growth factor (VEGF) pathway was reported as a possible prognostic factor for gastric cancer.10 The expression of VEGF is increased in gastric cancer cell lines, facilitating increases in their proliferation, motility, and adhesion.11 Targeted inhibition of the VEGF pathway has been an important strategy to improve prognosis in gastric cancer therapy.12 By constructing an interconnected network, Cheng et al13 showed that deregulation of WNT, NF-κB, and RTK-Ras-PIK3-AKT pathways may be involved in the development of gastric cancer by causing excessive cell proliferation and sustained angiogenesis. However, the prognostic biomarkers that can be applied in the clinic remain rare and thus further investigation of the etiology of gastric cancer is still necessary.In this study, we aim to screen for highly potential prognostic molecules for gastric cancer by comprehensively analyzing genes, miRNAs, and pathways and thereafter integrating them, to construct an miRNA–target–pathway pairs network, the results of which may be believed to be more credible compared with the separate analysis of each. To our knowledge, this has not been previously reported.
Methods
mRNA and miRNA expression profiles
The mRNA microarray profiles of humangastric cancer samples were retrieved from Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo). After the first screening, 154 related studies underwent a second screening using the following criteria: 1) the studies should have involved no less than ten samples; 2) tissue samples instead of cell samples should have been studied; 3) gastric tissue samples should be the only research topics; 4) the cancer should not have been caused by chemotherapy and other artificial stimulations; 5) the samples should have only undergone the treatment for microarray analysis; 6) normal control samples or normal adjacent tissues should be available; 7) the platforms should include data on >10,000 genes; 8) the annotation of the samples should be available; 9) the data should not be two-panel data; and 10) the profile should have many overlapping genes with the other profiles. Next, different subtypes of gastric cancer samples were merged as the disease samples, while the normal tissue and adjacent tissue samples were recognized as the control samples. Finally, a total of 13 data sets, comprising 497 disease samples and 425 control samples, were selected for further analysis (Table 1).
Table 1
Summary of mRNA expression profiles for gastric cancer
ID
Number
Disease sample
Control sample
Total
GSE51575
26
26
52
GSE65801
32
32
64
GSE33651
40
12
52
GSE63089
45
45
90
GSE56807
5
5
10
GSE33335
25
25
50
GSE27342
80
80
160
GSE13195
25
25
50
GSE30727
30
30
60
GSE19826
12
15
27
GSE13911
38
31
69
GSE38932
24
24
48
GSE37023
141
75
216
In addition, the miRNA expression profile for gastric cancer (accession number GSE26595) was also downloaded from GEO. A total of 60 gastric samples and eight normal samples were available, based on the GPL8197 platform.
Screening for differentially expressed genes and miRNAs
As the mRNA microarray data were merged from 13 individual data sets, MetaDE14 in R language was used to identify the genes whose expressions differed between cancer samples and control samples. Genes with adjusted P-value (false discovery rate or FDR) <0.01 were considered differentially expressed genes (DEGs). For the screening of the differentially expressed miRNAs (DEmiRs), limma15 in R language was utilized with the threshold of FDR <0.05.
Potential gastric cancer-related genes
The targets of the DEmiRs were first identified from the following four databases: PicTar (http://pictar.mdc-berlin.de/), TargetScan (http://www.targetscan.org/), PITA (http://genie.weizmann.ac.il/pubs/mir07/mir07_data.html), and miRanda (http://www.microrna.org/microrna/getDownloads.do). The targets of a certain miRNA were selected as those predicted by at least three of these four databases. Additionally, miRecords,16 miRTarBase,17 and TarBase18 were also searched to identify the verified targets of the DEmiRs. Targets from these screenings were merged as the final targets of the DEmiRs. Finally, the targets that overlapped with the DEGs were recognized as the potential gastric cancer-related genes.
Functional enrichment
The potential gastric cancer-related genes were subjected to functional interpretation using the online tools of the bioinformatics resource Database for Annotation, Visualization and Integrated Discovery for function and pathway enrichment. The terms with FDR <0.05 were considered the significant terms of these genes.
Pathway expression profile
The Kyoto Encyclopedia of Genes and Genomes (KEGG; http://www.genome.jp/kegg/pathway.htm) is a database for the functional analysis of genes. All the human pathways and genes in these pathways were downloaded from KEGG for further analysis. Next, the expression level of each pathway in the samples was calculated by computing the expression levels of its genes in each sample, to obtain the pathway expression profile. The expression level of the pathway was calculated using the following formula:Here, Path is the expression level of pathway i in sample k; g1, g2, g3,…, g are the expression values in sample k of all genes in pathway i. Path remains the same when there are observation errors, abnormal values, and heteroskedastic variances; therefore, it is chosen as the repression of expression levels of pathways.
Screening of co-pathway pairs
To obtain a comprehensive understanding about the roles of abnormal pathways in gastric cancer, the co-pathway pairs that are closely connected were identified by calculating their expression correlations in all samples. The correlation was measured by the value of the Pearson correlation coefficient, which was computed using the following formula:Here, P, is the Pearson correlation coefficient between pathways X and Y in all samples.
and
are the average expression values of pathways X and Y, respectively.Subsequently, the co-pathway pairs highly related to gastric cancer were screened using Equation 3 based on the hypergeometric distribution algorithm:Here, n represents the common gastric cancer-related genes in the two pathways. N represents all the genes of the co-pathway pairs, M represents cancer-related genes of the co-pathway pairs and C represents the combination of genes of the co-pathway pairs. The co-pathway pairs with P<0.05 were considered to be the significant gastric cancer-related co-pathway pairs.
Construction of miRNA–risk gene–pathway pair integrated network
The common cancer-related genes in the co-pathway pairs were abstracted to merge with the target genes of DEmiRs from the miRNA–risk gene pairs. Subsequently, an integrated network of miRNA–risk gene–pathway pair was constructed, for which the network analysis was carried out based on the Cytoscape software19 to calculate the network’s topological features. Additionally, classification of the most important miRNAs in the network was achieved by cross-validation by calculating the receiver operating characteristic (ROC) value.
Results
Screening for DEGs and DEmiRs
The expression profiles of the normal and gastric cancer samples from 13 studies were analyzed for the DEGs using the MetaDE method. Under the threshold of FDR <0.01, a total of 1,260 DEGs were identified, including 1,139 upregulated ones and 121 downregulated ones (Table 2). The DEmiRs were identified using limma package; totally, 144 DEmiRs were obtained, comprising 69 upregulated and 75 downregulated ones (Table 3).
Table 2
Top ten differentially expressed genes
Gene
FDR
COL4A1
6.65E-26
SERPINH1
1.23E-26
IFITM3
4.07E-25
LY6E
4.81E-23
COL1A1
1.37E-23
COL1A2
1.37E-23
SULF1
3.98E-21
ATP4B
2.16E-21
CTSB
2.65E-20
FN1
1.64E-20
Abbreviation: FDR, false discovery rate.
Table 3
Top ten differentially expressed miRNAs
miRNA
FDR
hsa-miR-204
1.02E-13
hsa-miR-1246
8.43E-12
hsa-miR-135b
3.35E-07
hsa-miR-30c
5.94E-07
hsa-miR-29c
5.94E-07
hsa-miR-646
7.43E-07
hsa-miR-148a
8.09E-07
hsa-miR-363
2.22E-06
hsa-miR-1206
5.14E-06
hsa-miR-196b
6.58E-06
Abbreviation: FDR, false discovery rate.
miRNA–target pairs
In all, a total of 8,642 predicted target genes for DEmiRs were identified from PicTar, PITA, miRanda, and TargetScan (Figure 1A), while a total of 912 verified target genes were identified from miRecords, miRTarBase, and Tarbase (Figure 1B). Finally, the common genes of potential targets and verified targets were selected for the eventual miRNA–target pairs (Figure 1).
Figure 1
The numbers of target genes for DEmiRs.
Notes: In all, (A) a total of 8,642 predicted target genes for DEmiRs were identified from PicTar, PITA, miRanda, and TargetScan, while (B) a total of 912 verified target genes were identified from miRecords, miRTarBase, and Tarbase. Finally, (C) the common genes of potential targets and verified targets were selected for the eventual miRNA–target pairs.
The eventual targets for DEmiRs were merged with DEGs to select the potential gastric cancer-related genes (risk genes), and the miRNA–risk gene pairs were believed to play vital roles in the development of gastric cancer. Totally, 570 miRNA–risk gene pairs were obtained with 35 miRNAs and 336 risk genes (Figure 2).
The enrichment analysis showed that DNA metabolic process, trabecula formation, and epidermis development were the most dysfunctional biological processes; nuclear part, nuclear lumen, and nucleoplasm were the most dysfunctional cellular components; adenyl ribonucleotide binding, calcium ion binding, and pyrophosphatase activity were the most dysfunctional molecular functions; extracellular matrix (ECM)–receptor interaction, focal adhesion, and cell cycle were the most dysfunctional pathways (Figure 3).
Figure 3
Enriched functions and pathways of risk genes.
Notes: (A) Biological process; (B) cellular components; (C) molecular functions; (D) pathways. The terms are listed in the Y-axis, while the significances are listed on the X-axis. The higher the log(P-value), the more significant is the term.
Abbreviation: ECM, extracellular matrix.
Pathway expression profile and co-pathway pairs
Totally, 239 human pathways were extracted from KEGG to construct the pathway expression profile (Figure 4). According to the Pearson correlation coefficients, 45 co-pathway pairs were selected as the most closely related pairs. Next, 15 of these co-pathway pairs were chosen as the most significant ones connected with risk genes. The risk gene–pathway pairs were next combined with the miRNA–risk gene pairs for the construction of miRNA–risk gene–pathway pair network. The integrated network consisted of 1,398 interaction pairs and 480 nodes (miRNA, risk genes, co-pathway pairs) (Figure 5).
Figure 4
Heat map of the pathway expression profile.
Notes: The X-axis refers to the pathways, and the Y-axis refers to the samples. The colors from green to red indicate the pathway expression level from low to high.
Notes: The rhombus refers to the miRNA, the rotundity refers to the mRNA, and yellow refers to the co-pathway pairs. Red indicates upregulation and green indicates downregulation.
Abbreviation: ECM, extracellular matrix.
Using Cytoscape, the topological features of the network were calculated (Table 4). The co-pathway pairs “Focal adhesion/ECM–receptor interaction”, “Proteoglycans in cancer/MicroRNAs in cancer”, and “3′,5′-cyclic guanosine monophosphate (cGMP)–protein kinase G (PKG) signaling pathway/3′,5′-cyclic adenosine monophosphate (cAMP) signaling pathway” were closely associated with risk genes. cGMP–PKG signaling pathway/cAMP signaling pathway pair was the co-pathway pair with the highest degree. CALM2 was the most important gene in the integrated network, with the highest connecting degree with miRNAs and pathway pairs (Figure 6; Table 5), such as cGMP–PKG signaling pathway/cAMP signaling pathway, Estrogen signaling pathway/Melanogenesis, and Ras signaling pathway/Rap1 signaling pathway. Moreover, CALM2 was regulated by hsa-miR-193a-3p, hsa-miR-1246, hsa-miR-193b, hsa-miR-19b, hsa-miR-196b, and hsa-miR-181b. The topological features of the top 15 miRNAs with highest degrees are listed in Table 6, with hsa-miR-19b and hsa-miR-181b as the top two miRNAs. As these two miRNAs had differential expressions in gastric cancer samples compared with the normal samples (Figure 7A), their ability to classify the disease samples was detected by cross-validation. The results showed that hsa-miR-19b had an ROC value of 0.7281, while hsa-miR-181b had an ROC value of 0.9125 (Figure 7B).
Table 4
Network topological features of 15 pathways
Pathway pair
Average shortest path length
Betweenness centrality
Closeness centrality
Degree
Topological coefficient
Focal adhesion/ECM–receptor interaction
3.793734
0.026451
0.263593
11
0.142857
Proteoglycans in cancer/microRNAs in cancer
3.903394
0.012734
0.256187
7
0.18254
cGMP–PKG signaling pathway/cAMP signaling pathway
3.767624
0.008914
0.265419
6
0.233333
Ras signaling pathway/Rap1 signaling pathway
3.762402
0.010502
0.265788
6
0.219298
Cell cycle/oocyte meiosis
4.331593
0.006181
0.230862
5
0.225
Estrogen signaling pathway/melanogenesis
3.929504
0.004038
0.254485
4
0.352941
Cholinergic synapse/serotonergic synapse
4.503916
0.001182
0.222029
3
0.484848
Adherens junction/tight junction
4.691906
0.001816
0.213133
2
0.5
Chronic myeloid leukemia/acute myeloid leukemia
4.561358
0.000973
0.219233
2
0.5
Dopaminergic synapse/long-term depression
4.775457
0.000139
0.209404
2
0.8
MAPK signaling pathway/ErbB signaling pathway
4.723238
0.00051
0.211719
2
0.5
NF-kappa B signaling pathway/HIF-1 signaling pathway
Interactive nodes of risk gene CALM2 in the integrated network of miRNA–risk gene–pathway pair.
Notes: The rhombus refers to the miRNA, the rotundity refers to the mRNA, and yellow refers to the co-pathway pairs. Red indicates upregulation and green indicates downregulation.
Abbreviation: ECM, extracellular matrix.
Table 5
Network topological features of the top 15 risk genes with highest degrees
Risk_gene
Average shortest path length
Betweenness centrality
Closeness centrality
Degree
Topological coefficient
CALM2
3.328982
0.043302
0.300392
9
0.128042
MAP2K1
3.898172
0.022464
0.25653
9
0.133903
NSMAF
3.402089
0.027349
0.293937
7
0.202381
COL1A2
3.694517
0.016129
0.270671
6
0.236111
FKBP1A
3.365535
0.034681
0.29713
6
0.20405
SOX4
3.240209
0.044469
0.308622
6
0.206037
CCND2
3.245431
0.028494
0.308126
5
0.233871
GATA6
3.537859
0.019326
0.282657
5
0.220513
GNAI2
3.887728
0.008331
0.25722
5
0.232432
GNAI3
4.221932
0.005805
0.236858
5
0.257143
ITGA5
4.007833
0.011827
0.249511
5
0.276923
KPNA3
3.292428
0.023322
0.303727
5
0.25812
RND3
3.590078
0.036432
0.278545
5
0.221739
THBS1
3.386423
0.03582
0.295297
5
0.212121
WEE1
3.240209
0.040427
0.308622
5
0.2304
Table 6
Network topological features of the top 15 differentially expressed miRNAs with highest degrees
The pathways with higher degrees in the integrated network were considered to be the pathways significantly associated with the occurrence and development of gastric cancer, and the co-pathway pair “Focal adhesion/ECM–receptor interaction” was found to interact with the DEGs of gastric cancer samples. Focal adhesion kinase regulates the intercellular signaling network, which plays a key role in the differentiation and metastasis of cells. Interruption of this pathway inhibits the spread of gastric cancer cells.20,21 It is reported that the connection of Focal adhesion/ECM–receptor interaction induces cascade reactions, and they are both involved in gastric carcinoma.22–24Calmodulin is a highly conserved Ca2+-binding protein in the calcium signaling process, participating in signaling pathways including motility, proliferation, and cell cycle progression and thus plays important roles in cancer development.25
CALM2, a risk gene with the highest degree in our integrated network, encodes calmodulin 2, which has also been found to be overexpressed in mammary cancer cells.26–28 High expression of CALM2 may promote the activation of a calcium/calmodulin-dependent kinase 2 (CaMKII), which then enhances the metastasis of gastric cancer by upregulating NF-κB and Akt-mediated matrix metalloproteinase-9 production.29,30miRNAs are evolutionarily conserved small noncoding RNAs, which have shown their roles in the onset and progression of cancers.31 Many miRNAs have been proven to be associated with the prognosis of various solid tumors.32 As a member in the miR-17-92 cluster, miR-19b is recognized as an oncomiR influencing multiple aspects of the phenotypes of gastric cancer.33 The expression of miR-19b is high in gastric cancer tissues and it is significantly associated with its metastasis.34 In gastric cancer, the expression of miR-181 is upregulated.35 Guo et al36 have found that miR-181b is a potential therapeutic target for gastric neoplasms. Moreover, miR-181b also holds great potential to be used as a prognostic biomarker for gastric cancer in the late stage.37 In this study, these two miRNAs showed satisfactory classification of the gastric cancers, with ROC values of 0.7281 and 0.9125, which further confirmed their prognostic roles in gastric cancer. Additionally, this consistency also proved the accuracy of the miRNA–targets–pathway pair network.It is reported that miRNAs may function in cancer by repressing the expression of target genes via binding to the 3′-untranslated region. A previous study38 has observed that CALM2 is a target anti-correlated with the DEmiR hsa-miR-130b. However, no researchers have identified the correlations between CALM2 and hsa-miR-19b/hsa-miR-181b. Considering the roles of these two genes and the association between these and CALM2, we assumed that CALM2 might be an important prognostic molecule for gastric cancer.
Conclusion
By constructing the miRNA–target–pathway pair network, our findings indicate that Focal adhesion/ECM–receptor interaction pathway, CALM2, hsa-miR-19b, and hsa-miR-181b might be used as potential prognostic targets for gastric cancer. However, further experimental studies are needed to confirm our conclusion.
Authors: Paul Shannon; Andrew Markiel; Owen Ozier; Nitin S Baliga; Jonathan T Wang; Daniel Ramage; Nada Amin; Benno Schwikowski; Trey Ideker Journal: Genome Res Date: 2003-11 Impact factor: 9.043
Authors: Christine M Coticchia; Chetana M Revankar; Tushar B Deb; Robert B Dickson; Michael D Johnson Journal: Breast Cancer Res Treat Date: 2008-06-28 Impact factor: 4.872