Minjie He1, Yan Lin2, Yuzhen Xu3. 1. Department of Medical Oncology, The First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan 650000, P.R. China. 2. Department of Oncology, The Affiliated Traditional Chinese Medical Hospital of Xinjiang Medical University, Urumqi, Xinjiang 830000, P.R. China. 3. Department of Gastrointestinal Surgery, Xuzhou Hospital Affiliated to Medical School of Southeast University, Xuzhou, Jiangsu 221009, P.R. China.
Abstract
Colorectal cancer (CRC) is a highly malignant gastrointestinal tumor accompanied by poor prognosis. Long non-coding RNA (lncRNA) plays an important role in the progression and physiology of tumors as it competes with endogenous RNAs, including miRNA and mRNA. In the present study, a multi-step computational method was used to build a CRC-related functional lncRNA-mediated competitive endogenous RNA (ceRNA) network (LMCN). lncRNAs with more degrees and betweenness centrality (BC) were screened out as hub lncRNAs. Then functional enrichment analyses of lncRNAs were carried out from the Gene Ontology (GO) and Reactome pathway databases based on the 'guilt by association' principle. As a result, lncRNAs in the LMCN displayed specific topological characteristics in accordance with the regulatory correlation of coding mRNAs in CRC pathology. HCP5, EPB41L4A-AS1, SNHG12, and LINC00649 were screened out as hub lncRNAs which were more significantly related to the development and prognosis of CRC. The hub lncRNAs in CRC were obviously involved in functions of cell cycle arrest, vacuolar transport, histone modification, and in pathways of GPCR, signaling by Rho GTPases, axon guidance pathways, meaning that they might be potential biomarkers for diagnosis, evaluation and gene-targeted therapy of CRC. Thus, the LMCN construction method could accelerate lncRNA discovery and therapeutic development in CRC.
Colorectal cancer (CRC) is a highly malignant gastrointestinal tumor accompanied by poor prognosis. Long non-coding RNA (lncRNA) plays an important role in the progression and physiology of tumors as it competes with endogenous RNAs, including miRNA and mRNA. In the present study, a multi-step computational method was used to build a CRC-related functional lncRNA-mediated competitive endogenous RNA (ceRNA) network (LMCN). lncRNAs with more degrees and betweenness centrality (BC) were screened out as hub lncRNAs. Then functional enrichment analyses of lncRNAs were carried out from the Gene Ontology (GO) and Reactome pathway databases based on the 'guilt by association' principle. As a result, lncRNAs in the LMCN displayed specific topological characteristics in accordance with the regulatory correlation of coding mRNAs in CRC pathology. HCP5, EPB41L4A-AS1, SNHG12, and LINC00649 were screened out as hub lncRNAs which were more significantly related to the development and prognosis of CRC. The hub lncRNAs in CRC were obviously involved in functions of cell cycle arrest, vacuolar transport, histone modification, and in pathways of GPCR, signaling by Rho GTPases, axon guidance pathways, meaning that they might be potential biomarkers for diagnosis, evaluation and gene-targeted therapy of CRC. Thus, the LMCN construction method could accelerate lncRNA discovery and therapeutic development in CRC.
Colorectal cancer (CRC), usually starting as a harmless polyp, is one of the most common gastrointestinal malignancies. Globally, CRC is the third most common cancer in the diagnosis of male diseases, ranking second in women. Approximately 1.4 million new CRC cases and 700,000 CRC deaths were evaluated in 2012. It has been found that the incidence of CRC is highest in North America, Europe, and Australia/New Zealand, and other countries with lower history rate are currently facing increased risk (1). The risk factors of CRC include environment (such as obesity, alcohol consumption and tobacco smoking), heredity and gene-environment interactions (2). Stool tests, sigmoidoscopy or colonoscopy, and virtual colonoscopy are common clinical screening methods for CRC (3). Although the fecal occult blood testing and sigmoidoscopy can provide valuable information for the guide therapy of CRC, they are not sufficient to confirm clinical outcomes. Therefore, seeking new effective diagnostic methods and exploring the canceration mechanism from polyp becomes urgent. It has been reported that bioinformatics analysis of gene expression profiles could clearly explain the occurrence, development, metastasis and prognosis of tumors, and is a hot spot in the current basic and clinical research (4–6).Long non-coding RNA (lncRNA) is a novel class of transcripts with 200 nucleotides that do not serve as a template for protein (7), but they are involved in a variety of biological processes, such as epigenetic regulation, chromosome remodeling and gene expression regulation (8). Studies have found that lncRNAs are abnormally expressed in many tumors, including non-small cell lung cancer, cervical cancer, hepatocellular cancer, and humanbreast cancer (9–12). The expression level of other transcripts as miRNA sponges could be regulated by these lncRNAs through competitive endogenous RNAs (ceRNAs) network (13,14). Furthermore, some canonical oncogenic pathways can be revealed by ceRNA interaction network (15), and Wang et al have reported that the newly identified ceRNA network contributes to the exploration of the regulatory mechanisms of ceRNA mediated by lncRNA in the nosogenesis of muscle-invasive bladder cancer (16). However, it remains a challenge to identify lncRNA biomarkers of CRC, and to understand the functional roles of ceRNAs mediated by lncRNA in CRC.In the present study, a functional lncRNA-mediated ceRNA network (LMCN) associated with CRC was constructed using a multi-step computational method, and the relevant lncRNAs were identified based on the constructed landscape map. Then we carried out functional enrichment analyses for mRNAs which were significantly associated with lncRNAs, and used the functions of the mature mRNAs to forecast the lncRNA functions.
Materials and methods
Data source
Expression profiles of lncRNA and mRNA in CRC
In the present study, the gene expression profile dataset no. GSE31737, deposited by Loo et al (17) in the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) based on the GPL5175 platform [HuEx-1_0-st; Affymetrix Human Exon 1.0 ST Array: transcript (gene) version; Affymetrix; Thermo Fisher Scientific, Inc., Waltham, MA, USA], was subjected to bioinformatics analysis. A total of 80 chips were involved in the dataset, including 40 colon cancer tissues (tumor group) and 40 paired adjacent-normal colon tissues (control group). After preprocessing and mapping between genes and probes, we obtained 14,451 gene expression profile data.
Identification of miRNA-target interactions
Firstly, interactions between miRNA and mRNA, and between miRNA and lncRNA were obtained from the starBase v2.0 which provided the most comprehensive decoding of miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data (18). Then a new expression profile was acquired through taking the intersection between the genes in the above-mentioned expression profile data of CRC and the mRNAs and lncRNAs, respectively, which were located in the interactions of miRNA-mRNA and lncRNA-miRNA. Subsequently, the interaction relationship of the new gene expression profile were extracted from the downloaded interactions of miRNA-mRNA and miRNA-lncRNA.
Construction of LMCN
The competitive lncRNA-mRNA interactions were identified using a hypergeometric test that was able to effectively assess the significance of the miRNAs shared by each lncRNA and mRNA. There were a total of N miRNAs in the genome, where K and M represented the number of miRNAs related to the current lncRNA and mRNA, and x represented the number of common miRNAs shared by lncRNA and mRNA. P-value was calculated in order to evaluate the enrichment significance for that function using the formula:False discovery rate (FDR) correction was performed to adjust the P-values. FDR<0.01 was used as the threshold of LMCN.The lncRNA-mRNA interaction-pairs screened out above were co-expression analyzed through calculating their Pearson's correlation coefficient in the control and tumor group. The calculation formula used was:of which cov(X,Y) represents the covariance of variables X and Y, and σX and σY represent the standard deviations of X and Y, respectively. FDR<0.01 was set as the threshold. The lncRNA-mRNA interaction-pairs whose difference values of Pearson's correlation coefficient was >0.3 between control and tumor groups were taken as the significant co-expression ceRNA interactions (19). The ceRNA network was constructed and described using a Cytoscape software (v3.5.1; Cytoscape Consortium, New York, NY, USA). A complete bipartite graph was involved in this model in which an edge was achieved from each apex of the lncRNA set to each vertex of the mRNA set.We studied the relationship between the number of genes and the degree of distribution, and the fitting coefficient R2 of the power-law Y = aXb of the objective networks was detected (20). The built-in Network Analyzer tool (Agilent Technologies, Inc., Santa Clara, CA, USA) was used to analyze the node degree and the betweenness centrality (BC). BC represents a central metric for the nodes in the network. It can be obtained by computing the number of the shortest paths through the node, from each node to all others. This property reflects the control ability or the probability to exert the ‘gate-keeping’ effect of a node in the ceRNA network. Generally, a higher degree implies that the node is a hub involved in more ceRNA interactions. A higher BC indicates that the node is a bottleneck that functions as a bridge to connect different network modules.
Screening of hub lncRNAs
LMCN can provide an all-sided landscape of all possible ceRNA interactions, which contributes in exploring the regulatory roles of the lncRNAs. Moreover, more detailed view on the synergy between lncRNAs and competing mRNAs can be confirmed by analyzing the regional sub-networks. A high-competitive sub-network was extracted from the LMCN based on a Pearson's correlation coefficient threshold >0.5. lncRNAs with more degrees and BCs were selected out as hub lncRNAs involved in CRC.
Function prediction of lncRNAs
Functions of the hub lncRNAs can be studied with a ‘guilt by association’ (GBA) approach (21,22). It has been stated by GBA principle that genes that participate in the same biological process are inclined to be correlated (or have similar properties, such as similar expression patterns), allowing statistically to infer previously unknown functions of a gene based on some previous knowledge on other genes and correlative data (23,24). In this study, the GBA meant that the function of lncRNAs could be deduced though carrying out a functional enrichment analysis for the mature mRNAs which were significantly associated with lncRNAs.Function and pathway enrichment analysis were performed to evaluate the lncRNAs competitive mediated modules with functionality obtained from the Gene Ontology (GO) database (25) as well as pathways acquired from the Reactome pathway database (26). The significance thresholds were set as P<0.01 in functional analysis, and P<0.05 in pathway analysis after Fisher's correction.
Results
The LMCN reveals specific topological properties
The LMCN was built with a multi-step method to assess the lncRNA-mediated ceRNA interactions. For miRNA-lncRNA interactions and miRNA-mRNA interactions in CRC, we collected and integrated full-scale datasets from the starBase v2.0 which provided experimentally supported high confidence miRNA-target interactions. A total of 265,728 pairs of miRNA-mRNA interactions as well as 598 pairs of lncRNA-miRNA interactions were acquired from a new expression profile covering a total of 8,522 genes consisting of 8,471 mRNAs and 51 lncRNAs. Based on the studies that lncRNA transcripts competed with endogenous mRNAs by binding to miRNAs that could be predicted with current miRNA target prediction approach (27,28). The significance of common miRNAs shared by each lncRNA-mRNA pair was examined with a hypergeometric test. Then 51 lncRNAs, 8,125 mRNAs and 3,586 lncRNA-mRNA pairs were obtained. Pearson's correlation coefficient of the identified candidate ceRNA pair was calculated to confirm active ceRNA pairs in CRC. Eventually, we constructed and graphically simulated the LMCNs using the significantly co-expressed lncRNA-mRNA ceRNA pairs (Fig. 1). There were 51 lncRNAs, 3,752 mRNAs, and 5,617 ceRNA interactions in the LMCN. The statistics of nodes degree (R2=0.9999) showed power law distributions (Fig. 2A), indicating that the LMCN related to CRC is a scale-free network. Moreover, the lncRNA nodes exhibited more specific topological properties with more degrees and BCs in comparison with mRNA nodes (Fig. 2B and C).
Topological analysis of LMCN. (A) The entire network reflected a power-law distribution. (B) The node degree distribution, and (C) the BC distribution of the lncRNAs were significantly higher than that of mRNAs. LMNC, lncRNA-mediated ceRNA network; BC, betweenness centrality; lncRNA, long non-coding RNA; ceRNA, competitive endogenous RNA.
HCP5, EPB41L4A-AS1, SNHG12, and LINC00649 are the hub lncRNAs in CRC
To determine the hub lncRNAs with highly specific properties in CRC, a sub-LMCN covering lncRNA-mRNA co-expression interactions with Pearson's correlation coefficient deference >0.5 between tumor and control groups was constructed (29). Results showed that the sub-LMCN also had a power-law distribution scaleless structure (Fig. 3C; R2=0.9995). The landscape map contained 42 lncRNAs, 681 mRNAs, and 730 ceRNA interactions (Fig. 3A). We found that HCP5, EPB41L4A-AS1, SNHG12, and LINC00649 had more node degrees and BCs in the sub-LMCN than others (Fig. 3B), indicating that they may be the prognostic markers in CRC.
Figure 3.
Overview of the highly competitive sub-network with Pearson's correlation coefficient deference >0.5. (A) The sub-LMCN contains 42 lncRNAs, 681 mRNAs, and 730 ceRNA interactions. The hub lncRNAs are shown as blue nodes, the other lncRNAs are shown as green nodes and mRNAs are shown as red nodes. (B) Amplified landscape map of the hub and bottleneck nodes. (C) The sub-LMCN had a scale-free structure complying with power-law distribution. LMNC, lncRNA-mediated ceRNA network; lncRNA, long non-coding RNA; ceRNA, competitive endogenous RNA.
Sixty-three functions and 66 pathways enriched from GO and Reactome pathway database
To further validate the potential functional implication of lncRNAs, function and pathway enrichment analysis of mRNAs in the sub-LMCN were conducted based on GO and Reactome pathway database. The mRNAs associated with hub lncRNAs were obviously enriched in 63 GO terms, especially involved in cell cycle arrest, vacuolar transport, histone modification, and regulation of protein serine/threonine kinase activity (Fig. 4), and 66 Reactome pathways, including GPCR pathway, signaling by Rho GTPases, axon guidance and developmental biology (Table I).
Figure 4.
The top 20 enriched functions of mRNAs associated with lncRNAs in GO database. lncRNA, long non-coding RNA; GO, Gene Ontology
Table I.
The top 20 enriched Reactome pathways of mRNAs associated with lncRNAs.
No.
Reactome pathway
P-value (FDR)
No. of genes
1
GPCR downstream signaling
2.64E-29
76
2
Signaling by GPCR
4.09E-25
93
3
GPCR ligand binding
6.93E-13
33
4
Class A/1 (Rhodopsin-like receptors)
1.31E-09
22
5
Signaling by Rho GTPases
1.01E-06
125
6
Peptide ligand-binding receptors
1.05E-06
11
7
Complement cascade
7.60E-06
1
8
Axon guidance
2.07E-05
110
9
Developmental Biology
5.78E-05
158
10
G α (i) signalling events
5.78E-05
21
11
Initial triggering of complement
3.70E-04
1
12
Membrane trafficking
3.70E-04
70
13
RHO GTPase effectors
3.70E-04
85
14
Asparagine N-linked glycosylation
1.12E-03
46
15
UPR
1.12E-03
37
16
Sema4D in semaphorin signaling
1.66E-03
16
17
Biological oxidations
2.43E-03
16
18
Scavenging heme from plasma
2.43E-03
1
19
Immunoregulatory interactions between a lymphoid and a non-lymphoid cell
2.47E-03
9
20
Translocation of GLUT4 to the plasma membrane
2.85E-03
27
lncRNA, long non-coding RNA; FDR, false discovery rate; UPR, unfolded protein response.
Discussion
Considering the characteristics with hidden tumorigenesis, apt to local invasion and metastasis of CRC, what we primarily do for good prognosis is to ascertain pathogenesis, make early diagnosis and targeted individualized treatment. Bioinformatics is one of the interdisciplinaries involved in many fields, and one of its core elements is to study the internal expression regulatory mechanisms of genes and reveal essential laws of human diseases. lncRNA is an important type of gene expression regulatory factor in bioinformatics, and is involved in many human diseases, such as tumors (30), cardiovascular disease (31), inflammation (32) and autoimmune disease (33). The interconnections of ceRNAs play a crucial role in the development and physiology of tumors (34,35). In the present study, we constructed a functional LMCN to explore the intermodulation relationship between lncRNAs and mRNAs, and predicted the functions and pathways involved in lncRNAs across 80 CRC samples. Firstly, the LMCN was constructed through integrating the expression profiles of genome-wide lncRNA/mRNA and comprehensive interactions of miRNA-target. The LMCN presented a scale-free network conforming to power-law distributions, and the lncRNA nodes exhibited more specific topological properties with more degrees and BCs than mRNA nodes. Thus, these results indicated that LMCN has similar properties with many biological networks, and can be greatly organized into a structured rather than random network through a set of core lncRNA-mRNA competing principles (36,37). Then a sub-LMCN was established to identify the hub and bottleneck lncRNAs with a threshold of Pearson's correlation coefficient deference >0.5. We found that HCP5, EPB41L4A-AS1, SNHG12, and LINC00649 have more node degrees and BCs in the landscape map. Finally, function enrichment analyses of mature mRNAs from GO and Reactome pathway databases were carried out to predict the functions of the obviously relative lncRNAs based on the GBA principle. As a result, the lncRNAs were found to be involved in 63 GO terms and 66 Reactome pathways, including the biological process of cell cycle arrest, vacuolar transport, histone modification, regulation of protein serine/threonine kinase activity, and the pathways of signaling by Rho GTPases, GPCR pathway, axon guidance, developmental biology had lower P-values and covered more mRNAs.lncRNAs regulate the expression level of protein-coding genes in different ways, such as basic regulation process of genes and post-transcriptional regulation (38–40). In this study, we found that HCP5, EPB41L4A-AS1, SNHG12, and LINC00649 present stronger centrality and bridge-joint, meaning that they are more important than other lncRNAs in the onset and progress of CRC. Some investigations have shown that SNHG12 could increase cell cycle-related protein expression and suppress caspase-3 expression in CRC and humanosteosarcoma cells. Moreover, silencing SNHG12 expression can inhibit triple-negative breast cancer cell proliferation, and it has been found that SNHG12, as an endogenous sponge of miR-199a/b-5p, regulates MLK3 expression in hepatocellular carcinoma and affects the activation of the NF-κB pathway (41–44). Therefore, SNHG12 may be considered as a potential biomarker and a promising therapeutic target for CRC and other cancers. It has been reported that (HCP5) is highly expressed in glioma tissues and U87 and U251 cells, which can promote the malignant biological behavior of glioma cells by enhancing proliferation, migration as well as invasion, and inhibiting apoptosis (45). Furthermore, a single nucleotide polymorphism in HCP5 (rs2244546) has been confirmed as a strong predictor of hepatitis C virus-related hepatocellular carcinoma in the Swiss Hepatitis C Cohort Study (46). EPB41L4A-AS1 was found a risky lncRNA in ovarian cancer (47). HCP5 and EPB41L4A-AS1 could be boldly inferred as potential lncRNA biomarkers for the diagnosis, evaluation and gene-targeted therapy of CRC, but further studies needs to be carried out to verify this inference.GO analysis suggested that the target genes of hub lncRNAs are significantly associated with cell cycle and histone modification. Cell cycle regulation is of great significance in maintaining homeostasis and preventing cancer, and its abnormity has been long regarded as an important intermediate link for the tumor development (48–50). The basic function of histone modification is to regulate gene expression, and phosphorylation of histones is not only an important intermediate step of certain signal transduction pathways, but also affects cell division and cycle through combining with other types of modification (51,52). Therefore, the result in this study was consistent with previous reports, and the occurrence and development of CRC are closely related to cell cycle and histones modification. The reactome pathway analysis showed that the lncRNAs might participate in the Rho GTPases and axon guidance pathways. Rho GTPases family is an intracellular signal transducer that connects cell surface signals to multiple intracellular reactions. They playe a critical role in various cellular processes, including cell morphology, gene transcription, cell cycle progression, and cell adhesion (53,54). Furthermore, it has been reported that oncogene-specific cell migration and invasion pathways are mediated by Rho GTPases in colon cancer cells (55). Semaphorins primarily known as ligands for plexins and neuropilins are key members of axon guidance molecules (56). Many studies have found that semaphorin proteins have certain modulatory effects on tumorous occurrence and development (57–59), for example, the axon guidance molecule semaphorin 3F was found in normal neuroendocrine cells, while it was lost in most human primary tumors and all metastases. Axon guidance molecule semaphorin 3F plays a negative regulator role in the development of tumor in ileal neuroendocrine tumors (60). Additionally, the inhibition of migration, invasion and epithelial to mesenchymal transition has been revealed because of the overexpression of SEMA3A by suppressing the expression of NF-κB and SNAI2 in head and neck squamous cell carcinoma (HNSCC); whereas, a shorter overall survival and more important independent prognostic significance could be found in HNSCCpatients with lower SEMA3A expression (61). Therefore, axon guidance cues could be promising targets for personalized anticancer therapies. In short, the results of the present study indicated that cancer-related LMCN contributes in improving the identification of biomarkers and promotes the development of CRC therapy.In conclusion, we have provided the LMCN landscape across 80 CRC samples with a new method. The specific topological properties and synergistic competition effects of lncRNAs may reveal the regulatory interactions of coding mRNAs in CRC.
Authors: Alexander E Kelly; Cristina Ghenoiu; John Z Xue; Christian Zierhut; Hiroshi Kimura; Hironori Funabiki Journal: Science Date: 2010-08-12 Impact factor: 47.728
Authors: Lourdes Peña-Castillo; Murat Tasan; Chad L Myers; Hyunju Lee; Trupti Joshi; Chao Zhang; Yuanfang Guan; Michele Leone; Andrea Pagnani; Wan Kyu Kim; Chase Krumpelman; Weidong Tian; Guillaume Obozinski; Yanjun Qi; Sara Mostafavi; Guan Ning Lin; Gabriel F Berriz; Francis D Gibbons; Gert Lanckriet; Jian Qiu; Charles Grant; Zafer Barutcuoglu; David P Hill; David Warde-Farley; Chris Grouios; Debajyoti Ray; Judith A Blake; Minghua Deng; Michael I Jordan; William S Noble; Quaid Morris; Judith Klein-Seetharaman; Ziv Bar-Joseph; Ting Chen; Fengzhu Sun; Olga G Troyanskaya; Edward M Marcotte; Dong Xu; Timothy R Hughes; Frederick P Roth Journal: Genome Biol Date: 2008-06-27 Impact factor: 13.583