G Liu, D Z Li, C S Jiang, W Wang. Department of Gastroenterology, Fuzhou General Hospital of Nanjing Command, Fuzhou, China.
Abstract
To investigate signal regulation models of gastric cancer, databases and literature were used to construct the signaling network in humans. Topological characteristics of the network were analyzed by CytoScape. After marking gastric cancer-related genes extracted from the CancerResource, GeneRIF, and COSMIC databases, the FANMOD software was used for the mining of gastric cancer-related motifs in a network with three vertices. The significant motif difference method was adopted to identify significantly different motifs in the normal and cancer states. Finally, we conducted a series of analyses of the significantly different motifs, including gene ontology, function annotation of genes, and model classification. A human signaling network was constructed, with 1643 nodes and 5089 regulating interactions. The network was configured to have the characteristics of other biological networks. There were 57,942 motifs marked with gastric cancer-related genes out of a total of 69,492 motifs, and 264 motifs were selected as significantly different motifs by calculating the significant motif difference (SMD) scores. Genes in significantly different motifs were mainly enriched in functions associated with cancer genesis, such as regulation of cell death, amino acid phosphorylation of proteins, and intracellular signaling cascades. The top five significantly different motifs were mainly cascade and positive feedback types. Almost all genes in the five motifs were cancer related, including EPOR, MAPK14, BCL2L1, KRT18, PTPN6, CASP3, TGFBR2, AR, and CASP7. The development of cancer might be curbed by inhibiting signal transductions upstream and downstream of the selected motifs.
Numerous studies have shown that the abnormal transduction of cellular signaling is
closely related to differentiation, apoptosis, and proliferation of cells, and to the
occurrence, progression, and prognosis of disease (1). According to studies of intercellular protein-protein interaction
networks, the regulation of local signaling in normal tissue is different from that in
tumors (2). Network motifs are the specific
combinations of functional vertices and the basic building blocks of a network. Motifs
can react to external stimuli by regulating gene expression. Mining cancer
susceptibility genes in combination with network motifs and gene expression profiles (3) can markedly improve the identification of target genes in
tumor metastasis (4,5).

About 90% of early gastric cancer patients who receive adequate treatment survive for more
than 5 years and are considered cured; however, the 5-year survival rate of advanced
gastric cancer after treatment is less than 5% (6). Thus, early diagnosis is the key to improving treatment efficacy and
increasing survival rate (7).

In this study, in order to screen for gastric cancer-related genes and then investigate
the signal-regulating models, we constructed a human signaling network after integrating
information from many databases and references. After analysis of topological
properties, we mapped the verified genes onto the network, and mined the cancer-related
motifs using three vertices. Finally, we selected the motifs that were significantly
different in normal compared with gastric cancer cells. Genes in the significantly
different motifs were the screened genes.
Material and Methods
Gene expression profiles
The Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) is currently the largest fully
public gene expression resource. It provides flexible mining tools that enable users
to easily query, filter, inspect and download data within the context of their
specific interests (8). We downloaded the gene
expression profile data of GSE2685 (9) from
GEO, which was based on the GPL80 platform (HU6800; Affymetrix Human Full Length
HugeneFL Array) data. A total of 30 samples were available, including primary human
advanced gastric cancer tissues (n=22), and noncancerous gastric tissues (n=8). We
downloaded the raw data and the probe annotation files from Affymetrix for further
analysis. The probe-level data were converted into expression values, log2
transformed, and standardized using the median method (10).
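The preprocessing step can be illustrated with a minimal sketch. It assumes that "the median method" means shifting each sample so that its median log2 value matches a common target; the source does not spell out the exact procedure, and the function name, the `floor` parameter, and the toy intensities are hypothetical.

```python
import math
from statistics import median


def log2_median_normalize(samples, floor=1.0):
    """Log2-transform raw intensities and median-center each sample.

    samples: dict mapping sample name -> list of raw intensities
             (all lists in the same probe order).
    floor:   assumed lower bound applied before the log to avoid log2(0).
    """
    logged = {
        name: [math.log2(max(value, floor)) for value in values]
        for name, values in samples.items()
    }
    sample_medians = {name: median(vals) for name, vals in logged.items()}
    # Common target: the median of the per-sample medians (an assumption).
    target = median(sample_medians.values())
    return {
        name: [v - sample_medians[name] + target for v in vals]
        for name, vals in logged.items()
    }


# Toy data, not taken from GSE2685.
raw = {"normal_1": [8.0, 16.0, 32.0], "tumor_1": [16.0, 64.0, 256.0]}
normalized = log2_median_normalize(raw)
```

After this step every sample has the same median, so differences between samples reflect relative expression rather than overall array intensity.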
Extraction of gastric cancer-related genes
Gastric cancer-related genes were extracted from CancerResource (11), GeneRIF (Gene Reference into Function)
(12), and COSMIC (Catalogue of Somatic
Mutations in Cancer) (13) databases.
Human signaling network construction
All cellular activities, including division, differentiation, and apoptosis are
closely associated with signal transduction. The BioCarta database is the largest
collection of information on human signaling pathways. We downloaded all the human
signaling pathways from BioCarta (http://www.biocarta.com/genes/Cellsignaling.asp) (14), removed redundant information, and
represented all proteins with their corresponding genes. In addition, 10
cancer-related pathways from Cancer CellMap (15) and pathways published by Le and Kwon (16) were also used to construct the signaling network associated with
gastric cancer. Gastric cancer-related genes extracted from different databases were
then marked into the signaling network. Finally, the network analyzer tool in
CytoScape was used to calculate network topological characteristics such as degree
distribution and clustering coefficient.
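The two topological measures named above can be sketched in a few lines. This is not the CytoScape network analyzer itself; it is a plain re-implementation that assumes the network is treated as undirected for these statistics, and the helper names and toy edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_neighbors(edges):
    """Build an undirected neighbor map from (gene_a, gene_b) pairs."""
    nbrs = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            nbrs[a].add(b)
            nbrs[b].add(a)
    return nbrs


def clustering_coefficient(nbrs, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    k = len(nbrs[node])
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs[node], 2) if v in nbrs[u])
    return 2.0 * links / (k * (k - 1))


def degree_distribution(nbrs):
    """Map each degree value to the number of nodes having that degree."""
    counts = defaultdict(int)
    for node in nbrs:
        counts[len(nbrs[node])] += 1
    return dict(counts)


# Toy network with placeholder gene names.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
nbrs = build_neighbors(toy_edges)
distribution = degree_distribution(nbrs)
```

Plotting the degree distribution on log-log axes is the usual way to check for the power-law behavior expected of scale-free networks.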
Motif mining in human signaling network
Many biological networks consist of specific combinations of subnets with frequencies
of occurrence that are significantly higher than random. Topological motifs with high
frequencies can be used to explain the principles of bio-network organization (17). The fast network motif detection (FANMOD)
software (18) was used for motif mining in the
human signaling network, because it can handle networks with colors in nodes and
edges, and predict the mining time for the whole network with a high operating
efficiency.
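To make the idea of three-vertex motif mining concrete, the following sketch enumerates every connected set of three genes and keeps those containing at least one marked gene. It is a brute-force illustration only, not the algorithm used by FANMOD, and it skips the comparison against randomized networks; the function names and toy edges are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def connected_triples(edges):
    """Return all connected three-node sets of a directed edge list.

    Connectivity is judged on the undirected view: every connected
    triple has at least one node adjacent to both of the others.
    """
    nbrs = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            nbrs[src].add(dst)
            nbrs[dst].add(src)
    triples = set()
    for center, around in nbrs.items():
        for u, v in combinations(sorted(around), 2):
            triples.add(frozenset((center, u, v)))  # set removes duplicates
    return triples


def marked_triples(triples, marked_genes):
    """Keep triples that contain at least one marked (cancer-related) gene."""
    return {t for t in triples if t & marked_genes}


# Toy directed network with placeholder gene names.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
triples = connected_triples(toy_edges)
marked = marked_triples(triples, {"D"})
```

On a network with thousands of edges this exhaustive approach still works for three vertices, but dedicated tools are preferable because they also classify subgraphs by edge and node colors and estimate significance.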
Screening for significant differences among motifs
To investigate the differences of motifs in the normal and cancer states, the
significant motif difference (SMD) method (19), based on variations of coexpression, was used to calculate the SMD
scores of motifs. For a motif (MA) with three edges, E1, E2, and E3, the
difference score (S) is defined as:
where X, Y are the gene expression values in the normal state and X′,
Y′ are the gene expression values in the cancer state. Ek and
E′k are the absolute values of the Pearson correlation coefficients between
the two genes connected by edge k under normal and cancer states, respectively.

Motifs with SMD scores higher than the threshold are the significantly different motifs,
and the threshold is set according to the distribution of SMD scores. P=0.05 was
selected as the significance threshold.
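A coexpression-based difference score of this kind can be sketched as follows. The exact formula of the SMD method is not reproduced in this text, so the sketch assumes the score is the sum over the motif's edges of the absolute difference between Ek and E′k; the actual method (19) may weight or normalize the terms differently. The function names and toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def motif_difference(edges, normal, cancer):
    """Assumed score: sum over edges of | |r_normal| - |r_cancer| |.

    edges:  list of (gene_a, gene_b) pairs forming the motif.
    normal: dict gene -> expression values across normal samples.
    cancer: dict gene -> expression values across cancer samples.
    """
    score = 0.0
    for a, b in edges:
        e_normal = abs(pearson(normal[a], normal[b]))
        e_cancer = abs(pearson(cancer[a], cancer[b]))
        score += abs(e_normal - e_cancer)
    return score


# Toy expression values with placeholder gene names.
normal = {"G1": [1, 2, 3], "G2": [2, 4, 6], "G3": [3, 2, 1]}
cancer = {"G1": [1, 2, 3], "G2": [1, 0, 1], "G3": [3, 2, 1]}
motif_edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3")]
score = motif_difference(motif_edges, normal, cancer)
```

In the toy data, G2 loses its coexpression with both partners in the cancer state, so two of the three edges contribute to the score while the unchanged G1-G3 edge contributes nothing.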
Functional annotations of significantly different motifs
Gene ontology (GO) functional annotations (20)
of genes in significantly different motifs were performed using the Database for
Annotation, Visualization, and Integrated Discovery (DAVID) (21). Functions with a false discovery rate
(FDR)-corrected P value of less than 0.05 were selected.
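The logic of an enrichment test with FDR control can be sketched as below. This is an illustration under assumptions: it uses a plain hypergeometric upper-tail test and the Benjamini-Hochberg adjustment, whereas DAVID computes its own modified statistic, so the numbers would not match DAVID output exactly. The function names and example P values are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of n,
    when K of the N background genes carry the annotation."""
    total = comb(N, n)
    hits = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four GO terms.
raw_p = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(raw_p)
selected = [i for i, q in enumerate(fdr) if q < 0.05]
```

With these example values only the first term passes the 0.05 cutoff after correction, even though three raw P values are below 0.05, which shows why the corrected value is the one used for selection.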
Results
Gastric cancer-related genes
By screening the expression profiles and extracting from three databases, 5515 and
778 related genes were obtained, respectively.

The human signaling network was constructed by combining the pathways obtained from
databases and references. There were 1634 nodes and 5089 regulating interactions,
including 2403 activated, 741 inhibited, and 1915 physical interactions in the
network (Figure 1).
Figure 1
Human signaling network. Light gray lines represent the physical
interactions, dark black lines represent the inhibited interactions, and pink
lines represent the activated interactions. The dark red nodes are
cancer-related genes.
The integrated network was hypothesized to have the same characteristics, such as
small-world, scale-free, and hierarchy as protein-protein interaction networks, and
gene networks (22). The CytoScape
NetworkAnalyzer was used to calculate the degree distribution (Figure 2A) and clustering coefficient (Figure 2B) of the network. It turned out that the degree
distribution followed a power law, and the network had scale-free and small-world
characteristics. The average degree was 6.3, but was 10.5 for gastric cancer-related
genes, almost all of which were hub genes in the network (23). As shown in Figure 2B,
the genes with a higher number of neighbors tended to have lower clustering
coefficients.
Figure 2
A, Degree distributions of the human signaling network.
Nodes with higher degree were fewer in number, and the
distribution approximated a power law. B, Clustering coefficient
distribution of the human signaling network. The average clustering
coefficient of all nodes was plotted against the number of neighbors, and
nodes with more neighbors tended to have smaller coefficients.
Human signaling network motif mining
Biological networks are composed of recurring network models, and all models are
usually combinations of motifs with three vertices. We conducted the motif mining
using the FANMOD software for the gastric cancer-related motifs with three vertices.
The nodes and edges in the network were marked in different colors. In the total of
92 models, 90 were marked with cancer-related genes. Of a total of 69,492 motifs,
57,942 were marked with cancer-related genes.
Significantly different motif selection
SMD scores of 57,942 motifs were computed using the gene expression profiles under
normal and cancer states. In all, 26,354 motifs were selected with all three genes
expressed, and the SMD scores of these motifs were normally distributed
(Figure 3). The SMD scores in the normal and
cancer states were significantly different for 264 motifs (P<0.05).
Figure 3
Distribution of significant motif difference (SMD) scores of motifs marked
with gastric cancer-related genes.
Genes in the significantly different motifs were mainly enriched in functions closely
related to the occurrence of cancer, such as regulation of cell death, regulation of
programmed cell death, protein amino acid phosphorylation, and intracellular
signaling cascades (Table 1). This result
confirmed the relationship between the significantly different motifs and gastric
cancer.
Type and rank analysis of significantly different motifs
First, we classified the types of significantly different motifs, and found that the
types having more than five motifs were mainly cascades and positive feedback (Figure 4). Next, we ranked the motifs according to
their SMD scores (Table 2), and queried for
the relationship of genes of the top five motifs with gastric cancer. Among all the
genes, only two, NCOR2 (human nuclear receptor corepressor 2) and
ARHGEF7 (Rho guanine nucleotide exchange factor 7), were found to
have no reported relation to gastric cancer. The relationships of EPOR
(erythropoietin receptor), MAPK14 (mitogen-activated protein kinase
14), BCL2L1 (BCL2-like 1), KRT18 (keratin 18),
PTPN6 (protein tyrosine phosphatase nonreceptor type 6),
CASP3 (caspase-3), TGFBR2 (transforming growth
factor-beta, TGFβ, type II receptor), AR (androgen receptor), and
CASP7 (caspase-7) with gastric cancer were already known.
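The motif types mentioned above can be told apart with a small classifier. The source does not state its classification rules, so this sketch assumes common definitions: a cascade is a directed chain A to B to C with no other edges, a feedback loop is a directed three-node cycle, and a loop is positive when the product of its edge signs is positive. The function name and the placeholder gene names are hypothetical.

```python
from itertools import permutations


def classify_motif(edges):
    """Classify a three-node signed, directed motif.

    edges: dict mapping (source, target) -> +1 (activation) or -1 (inhibition).
    Returns "cascade", "positive feedback", "negative feedback", or "other".
    """
    nodes = sorted({gene for pair in edges for gene in pair})
    if len(nodes) != 3:
        return "other"
    edge_set = set(edges)
    for a, b, c in permutations(nodes):
        if edge_set == {(a, b), (b, c), (c, a)}:
            sign = edges[(a, b)] * edges[(b, c)] * edges[(c, a)]
            return "positive feedback" if sign > 0 else "negative feedback"
        if edge_set == {(a, b), (b, c)}:
            return "cascade"
    return "other"


# Placeholder motifs; gene names A, B, C are not from the study.
cascade = classify_motif({("A", "B"): 1, ("B", "C"): 1})
positive_loop = classify_motif({("A", "B"): 1, ("B", "C"): -1, ("C", "A"): -1})
negative_loop = classify_motif({("A", "B"): 1, ("B", "C"): 1, ("C", "A"): -1})
fan_out = classify_motif({("A", "B"): 1, ("A", "C"): 1})
```

Two inhibitions in a loop cancel out, which is why the second example counts as positive feedback under this sign-product convention.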
Figure 4
Models of significantly different motifs. Red arrows are the activated
interactions, while green arrows are the inhibited interactions. Black nodes
represent normal genes, while red nodes represent gastric cancer-related
genes.
Discussion
The human signaling network we constructed was very large and could reveal additional
signal-associated information about gastric cancer. Analysis of the topological
characteristics of the network revealed that gastric cancer-related genes had a higher
average degree than that of all the genes taken together, and that most of these
cancer-related genes were hub genes in the network. This result further confirmed the
importance of cancer-related genes (23). We also
conducted cancer-related motif mining for a better understanding of the mechanisms of
cancer occurrence and development. Cascade and positive feedback were the two types of
motifs with significantly different normal and cancer state SMD scores, suggesting that
they are disrupted in the cancer state, which may promote the speed of signal
transduction. Various types of motifs are associated with cell functions. The
significance of the cascade type lies in its influence on cell proliferation and
differentiation, the negative feedback type participates in an adaptive response, and
the positive feedback type can enhance signal robustness (24,25). Thus, efficient
signal transduction may be the reason why cancer cells can proliferate so rapidly.

We mapped gene expression values to the signaling network and then screened the
significantly different motifs according to differences in coexpression of motif genes
between the normal and cancer states. Genes in the selected motifs were
mainly enriched in functions implicated in cancer development, such as
regulation of cell death, regulation of programmed cell death, protein amino acid
phosphorylation, and intracellular signaling cascades. Recently, studies have shown that
amino acids are not only cell signaling molecules but also regulators of gene expression
and the protein phosphorylation cascade (26). Accurate
signal transmission in cellular response pathways relies on
protein phosphorylation and ultimately leads to the activation of specific
transcription factors that induce the expression of appropriate target genes (27). Extracellular signals are transmitted from the
cell membrane to genes in the nucleus via several communication lines known as
intracellular signaling pathways, and the transmission of signals through these pathways
involves sequential phosphorylation events, in many cases by protein kinases, that are
termed kinase cascades (28). Among signal
transduction events, protein phosphorylation modulated by protein kinases and
phosphatases is an important posttranslational modification event in a variety of cells.
Such phosphorylation plays a critical function in signal transduction, cell growth,
differentiation, and oncogenesis (29). All the
enriched functions in this network were involved in cancer development. Thus, the
selected motifs were also related to gastric cancer.

EPOR, MAPK14, BCL2L1,
KRT18, PTPN6, CASP3,
TGFBR2, AR, and CASP7 were genes
in the five motifs with the highest SMD scores, and some of them are already known to be
gastric cancer related. NCOR2 and ARHGEF7 were the only
two genes for which there have been no reports of a correlation with gastric cancer.
EPOR is a member of the cytokine receptor superfamily, and the increased expression of
EPOR is a potential, significant prognostic marker in the carcinogenesis, angiogenesis,
and progression of gastric cancer (30). The
protein tyrosine phosphatase (PTP) family plays an important part in the inhibition or
control of growth, and members may exert oncogenic functions (31). Several studies have detected aberrant DNA methylation of the
PTPN6 gene in gastric cancer (32,33). TGFBR2, a
constitutively active kinase, is reported to play a tumor suppressor role in the TGFβ
pathway in gastric cancer (34). Studies have also
reported the relevance of AR (35), CASP3 (36), and
CASP7 (37) to gastric
cancer. NCOR2, which participates in a corepressor complex resulting in
chromatin condensation, is involved with many cancers (38). It promotes the deacetylation of histone to silence genes. In addition,
ARHGEF7, also known as PAK-interacting exchange factor, participates
in the activation of Ras family genes (39). Based
on these findings, even though there is no direct evidence,
NCOR2 and ARHGEF7 may be latent gastric
cancer-related genes.

Gastric cancer is a common, fatal malignancy worldwide. At present, therapeutic
decisions are based on clinical and pathological parameters, including age,
tumor-involved lymph nodes, metastases, stage, and histological grade. Although useful,
these factors often fail to differentiate more aggressive tumor types from less
aggressive types (40). As a result, there is an
urgent need to find specific markers. If motifs, as functional units, can be used as
biomarkers, then the diagnostic efficiency will be greatly increased. We could then find
the locations of the already known cancer-related genes in a motif, and see which genes
they affect and which genes affect them. The development of cancers might then be
suppressed by inhibiting the signal transductions of their upstream and downstream genes
with new potential drugs for gastric cancer.