Yao Liu1, Hao Wang2, Wenhan Yang3, Youhui Qian1. 1. Department of Thoracic Surgery, Shenzhen Second People's Hospital, Shenzhen, Guangdong, China (mainland). 2. Department of Thoracic Surgery, Mingzhou Hospital, Ningbo, Zhejiang, China (mainland). 3. School of Basic Medicine, Shenzhen University Health Science Center, Shenzhen, Guangdong, China (mainland).
Abstract
BACKGROUND There are various pathological types of lung cancer, including squamous cell carcinoma and adenocarcinoma. Although both are lung cancers, they differ significantly in diagnosis, pathogenesis, location, imaging, metastasis, and treatment. According to the competing endogenous RNA (ceRNA) theory, long non-coding RNAs (lncRNAs) compete with protein-coding genes (mRNAs) for binding to miRNAs, thus affecting mRNA levels. MATERIAL AND METHODS First, using the t test, we identified mRNAs and lncRNAs that were differentially expressed (fold change >2, P.
Lung cancer is a common cancer worldwide, with high morbidity and mortality [1]; 85% of lung cancer is non-small-cell lung cancer (NSCLC) [2,3]. Although significant progress has been made in treatments such as chemotherapy and radiation therapy, the 5-year survival rate for NSCLC is still 15% [4]. In addition, regional or distant metastases result in higher mortality [5]. Therefore, it is very important to understand the molecular mechanism of NSCLC and to seek new therapeutic strategies.
Long non-coding RNAs (lncRNAs) are a class of single-stranded RNA molecules more than 200 nt long, located in the nucleus or cytoplasm and lacking protein-coding function [6,7]. In the form of RNA, they regulate gene expression at many levels (e.g., epigenetic, transcriptional, and post-transcriptional regulation) [8,9]. lncRNAs are typically expressed at specific developmental stages and show tissue or cell specificity. Research on competing endogenous RNAs (ceRNAs) has revealed a new mechanism of interaction between RNAs. It is known that microRNAs can cause gene silencing by binding mRNAs, while ceRNAs can regulate gene expression by competitively binding to microRNAs [5]. Recent studies have reported that lncRNAs interact with other RNA transcripts through the ceRNA mechanism [10-12]: lncRNAs compete with mRNAs for miRNA molecules, relieving the repression of miRNA targets [13-15]. This type of lncRNA-related competitive triplet is a subclass of ceRNA and occurs widely in humans and several other species [16-18].
Therefore, this study constructed and analyzed the specific ceRNA networks of lung adenocarcinoma and lung squamous cell carcinoma. By constructing both the subtype-specific ceRNA networks and a common network of lung cancer, we found some important cancer markers that could serve as a guide and reference for the further study of lung cancer.
Material and Methods
Identification of differential expression genes
We downloaded the gene expression data (FPKM) of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) from the TCGA project. First, we used principal component analysis (PCA) to extract the classification features of the expression profiles of the 2 cancers, which greatly reduced the dimensionality of the data and allowed the classification features to be visualized while maintaining high classification accuracy. Differentially expressed genes of LUAD and LUSC were identified with an unpaired t test, and we kept the genes satisfying |log2 fold change| >1 and P<0.01. Finally, we input the lists of upregulated and downregulated genes into the R package “clusterProfiler” for GO term and pathway enrichment analysis.
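The fold-change and t-test filter described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the pseudocount, the pooled-variance form of the t statistic, and the toy expression values are all assumptions, and the P value (which would come from the t distribution with the returned degrees of freedom, via a statistics library) is deliberately left out.

```python
import math
from statistics import mean, variance

def log2_fold_change(tumor, normal, pseudocount=1.0):
    # The pseudocount is an assumption (not stated in the paper); it avoids log of zero.
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))

def unpaired_t(tumor, normal):
    # Student's pooled-variance t statistic and its degrees of freedom.
    n1, n2 = len(tumor), len(normal)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(tumor) + (n2 - 1) * variance(normal)) / df
    t = (mean(tumor) - mean(normal)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df

# Toy expression values for two hypothetical genes (not real TCGA data).
changed_tumor, changed_normal = [9.0, 11.0, 10.0, 10.0], [2.0, 3.0, 2.0, 1.0]
stable_tumor, stable_normal = [5.0, 6.0, 5.0, 6.0], [5.0, 5.0, 6.0, 6.0]

changed_lfc = log2_fold_change(changed_tumor, changed_normal)
stable_lfc = log2_fold_change(stable_tumor, stable_normal)
t_stat, df = unpaired_t(changed_tumor, changed_normal)

# A gene is a candidate only if |log2 fold change| > 1; the P<0.01 check
# on t_stat would be applied on top of this.
changed_is_candidate = abs(changed_lfc) > 1
stable_is_candidate = abs(stable_lfc) > 1
```

In this toy case only the first gene passes the fold-change threshold; the second has identical group means and is discarded before any significance test is needed.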
Building the specific DE-ceRNA network
The miRNA targets were obtained from the StarBase2.0, TargetScan, miRTarBase, LncBase, and miRcode databases. mRNA-lncRNA pairs that shared at least 1 miRNA and had a significant hypergeometric test P value (P<0.01) were retained. In total, we integrated more than 800 000 human lncRNA-PCG (protein-coding gene) interactions. Further, we used the lung cancer expression profiles to calculate the expression correlation of each differentially expressed mRNA-lncRNA pair by Pearson correlation analysis. Finally, we defined the positively co-expressed pairs (cor >0; P<0.01) as potential ceRNA relationships. We used Cytoscape software to analyze the topological properties of the network.
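The two filters in this step can be sketched as below. This is an illustrative reading under stated assumptions: the hypergeometric test is taken to be the upper-tail probability of observing at least the actual number of shared miRNAs, the size of the miRNA universe (100) and all miRNA names and expression values are hypothetical, and the P value of the Pearson correlation is not computed here.

```python
from math import comb, sqrt

def shared_mirna_pvalue(n_total, n_mrna, n_lnc, n_shared):
    # Upper-tail hypergeometric probability P(X >= n_shared): the chance that an
    # lncRNA with n_lnc miRNAs shares at least n_shared of them with an mRNA
    # that has n_mrna miRNAs, out of n_total miRNAs overall.
    upper = min(n_mrna, n_lnc)
    tail = sum(comb(n_mrna, k) * comb(n_total - n_mrna, n_lnc - k)
               for k in range(n_shared, upper + 1))
    return tail / comb(n_total, n_lnc)

def pearson_r(x, y):
    # Pearson correlation coefficient of two equally long expression vectors.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical miRNA target sets for one mRNA and one lncRNA.
mrna_mirnas = {"mirna_1", "mirna_2", "mirna_3"}
lnc_mirnas = {"mirna_2", "mirna_3", "mirna_4"}
shared = mrna_mirnas & lnc_mirnas

p_shared = shared_mirna_pvalue(100, len(mrna_mirnas), len(lnc_mirnas), len(shared))
r_expr = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0])

# Keep the pair if it shares at least 1 miRNA, the overlap is significant,
# and the expression correlation is positive.
is_candidate_pair = len(shared) >= 1 and p_shared < 0.01 and r_expr > 0
```

Using exact integer binomial coefficients keeps the tail probability free of rounding error for small sets; for database-scale miRNA sets a statistics library would be the more practical choice.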
Optimizing the DE-ceRNA network and building the sub-net
The random walk (RW) algorithm is a network-based method that models an irregular pattern of movement through the network. Nodes of interest in the network are given an initial weight as seed nodes, and this weight is then distributed from the seed nodes to neighboring nodes along the network structure. During the walk, the weights of all other nodes are obtained as well, so that a node closely connected to the seed nodes tends to receive a higher weight. Here we used a random walk with restart. In its defining formula, r is the probability parameter that governs how a node's weight is passed to its neighbor nodes at each step (default value 0.7), W is the normalized adjacency matrix representing the network, and p 0 is the initial weight vector of the nodes. The weight vector p t represents the new weights of the nodes after t steps on the network. We obtained gene-disease associations from DisGeNET, Genetic Association Database, Lnc2Cancer, and LncRNADisease. Then, we mapped these known pathogenic genes of lung cancer to the ceRNA network as seed nodes. We used the random walk algorithm to score each node according to the topological structure of the ceRNA network. Finally, we selected the genes with the top random walk scores (10 times the number of seed nodes for each cancer) as the nodes with which to extract the sub-net from the ceRNA network. We used Cytoscape software to visualize the sub-network.
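As an illustration of the scoring step, the sketch below implements the commonly used form of random walk with restart, p_{t+1} = (1 - r) W p_t + r p_0, with W column-normalized. Reading r = 0.7 as the restart probability is an assumption, since the formula itself is not reproduced in the text above; the toy network, the node names, and the convergence tolerance are likewise hypothetical.

```python
def random_walk_with_restart(neighbours, seeds, r=0.7, tol=1e-12, max_iter=10000):
    # neighbours: dict mapping each node to the set of its neighbours (undirected,
    # every node assumed to have at least one neighbour).
    nodes = list(neighbours)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fraction r of the weight returns to the seed nodes.
        nxt = {n: r * p0[n] for n in nodes}
        # Walk term: the remaining weight of each node is split evenly among its neighbours.
        for n in nodes:
            share = (1.0 - r) * p[n] / len(neighbours[n])
            for m in neighbours[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Toy path network A - B - C - D with A as the only seed.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
seed_nodes = {"A"}
scores = random_walk_with_restart(toy_network, seed_nodes)

# Rank nodes by score and keep the top 10 x (number of seeds), as in the text.
ranked = sorted(scores, key=scores.get, reverse=True)
top_nodes = ranked[:10 * len(seed_nodes)]
```

On this toy path the scores fall off with distance from the seed, which is the property the method relies on when selecting sub-network nodes.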
Building a common ceRNA network between LUAD and LUSC
To obtain the common markers of lung cancer, we extracted the lncRNA-mRNA pairs that were present in the ceRNA networks of both squamous cell carcinoma and adenocarcinoma, and we defined the network built from these shared pairs as the common ceRNA network. Finally, we input the gene list of each module into the DAVID website for GO term and pathway enrichment analysis.
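Taking the intersection of the two edge lists is straightforward; a minimal sketch is given below. The pair names are placeholders rather than entries from the actual networks, and representing each edge as an (lncRNA, mRNA) tuple is an assumption.

```python
def common_cerna_network(pairs_a, pairs_b):
    # Each pair is an (lncRNA, mRNA) tuple; the common network is the set of
    # pairs present in both cancer-specific networks.
    common = set(pairs_a) & set(pairs_b)
    lncrnas = {lnc for lnc, _ in common}
    mrnas = {mrna for _, mrna in common}
    return common, lncrnas, mrnas

# Placeholder edge lists for the two cancer-specific networks.
lusc_pairs = [("lnc_1", "gene_a"), ("lnc_1", "gene_b"), ("lnc_2", "gene_c")]
luad_pairs = [("lnc_1", "gene_a"), ("lnc_2", "gene_c"), ("lnc_3", "gene_a")]

common_pairs, common_lncrnas, common_mrnas = common_cerna_network(lusc_pairs, luad_pairs)
```

Note that a pair is kept only if the same lncRNA is linked to the same mRNA in both networks; a gene appearing in both networks with different partners does not contribute an edge.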
Survival analysis
We downloaded gene expression and clinical data from the TCGA project. The relationship between gene expression and patient prognosis was assessed by Kaplan-Meier analysis. For each gene in the ceRNA network, the mean expression of that gene was used as the cut-off to classify patients into high- and low-risk groups, and the difference in survival between the groups was evaluated with the log-rank test (P<0.05). We used the R package “survival” to carry out the survival analyses.
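The survival comparison can be sketched in a few lines. This is an illustrative stand-in for the R workflow, not a reproduction of it: the patient data are invented, and the P value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def split_by_mean(expression):
    # True marks patients whose expression is above the cohort mean.
    cutoff = sum(expression) / len(expression)
    return [value > cutoff for value in expression]

def kaplan_meier(times, events):
    # Returns [(time, survival probability)] at each distinct event time.
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank_test(times, events, in_high):
    # Two-group log-rank chi-square statistic (1 degree of freedom) and its P value.
    # Assumes both groups are non-empty.
    obs_minus_exp, var = 0.0, 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, in_high) if u >= t and g)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, in_high) if u == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Invented cohort: follow-up time, event indicator (1 = death), gene expression.
times = [2, 3, 4, 5, 10, 12, 14, 16]
events = [1, 1, 1, 1, 1, 1, 1, 1]
expression = [9.0, 8.0, 9.0, 8.0, 1.0, 2.0, 1.0, 2.0]

in_high = split_by_mean(expression)
high_curve = kaplan_meier([t for t, g in zip(times, in_high) if g],
                          [e for e, g in zip(events, in_high) if g])
chi2, p_value = logrank_test(times, events, in_high)
is_survival_associated = p_value < 0.05
```

In this invented cohort the high-expression patients all die before any low-expression patient, so the log-rank test separates the two groups at the P<0.05 level.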
Results
The functions of differentially expressed genes
We used PCA to reduce the dimensionality of the expression profiles of the 2 cancers (LUAD, LUSC) and to display their classification characteristics, and found that both datasets maintained high classification accuracy (Figure 1). After this data quality evaluation, we used the unpaired t test to find genes that were differentially expressed (|log2 fold change| >1 and P<0.01) between normal and cancer samples (Figure 2A–2D). The top 1000 differentially expressed genes, ranked by P value from low to high, were annotated to important GO terms related to cell proliferation (Figure 2E, 2F), such as “cell division”, “DNA replication”, “G1/S transition of mitotic cell cycle”, “regulation of cell cycle”, “sister chromatid cohesion”, “spindle organization”, “mitotic sister chromatid segregation”, and “G2/M transition of mitotic cell cycle”, and to cancer-associated pathways such as “cell cycle”. These differentially expressed mRNAs and lncRNAs were used in the following analyses.
Figure 1
Classification of samples by lung cancer expression profile. (A) LUSC mRNA expression profile. (B) LUSC lncRNA expression profile. (C) LUAD mRNA expression profile. (D) LUAD lncRNA expression profile.
Figure 2
Enrichment analysis of differentially expressed genes. (A) Differentially expressed mRNAs for LUSC. (B) Differentially expressed lncRNAs for LUSC. (C) Differentially expressed mRNAs for LUAD. (D) Differentially expressed lncRNAs for LUAD. (E) Differentially expressed mRNAs enrichment analysis for LUSC. (F) Differentially expressed mRNAs enrichment analysis for LUAD.
Lung cancer-specific DE-ceRNA network analysis
We used the shared miRNAs and the expression correlation between mRNAs and lncRNAs to build the specific DE-ceRNA networks of the 2 kinds of lung cancer (see Material and Methods section above). To retain only genes with appreciable expression, we removed the nodes whose average expression across all samples was less than 0.1. The LUSC DE-ceRNA network contains 16 436 edges, 1628 lncRNA nodes, and 1580 mRNA nodes. The LUAD DE-ceRNA network contains 9953 edges, 1475 lncRNA nodes, and 1011 mRNA nodes. By analyzing the topological properties of the networks with Cytoscape software, we found that, in the distribution of shared neighbors, a shared-neighbor count of 1 had the highest frequency (Figure 3A, 3E). The node degree distributions of the 2 networks follow a power law (Figure 3B, 3F): most nodes have a small number of connections and very few nodes have a very large number of connections, indicating that the networks did not arise by a random process. The closeness centrality and betweenness centrality of the networks are positively correlated with the number of shared neighbors (Figure 3C, 3D, 3G, 3H). The topology analysis showed that the networks are strongly connected; because they are also large, the next step was to optimize them.
Figure 3
Topological attributes of the specific DE-ceRNA networks. (A–D) Attributes of DE-ceRNA network for LUSC. (E–H) Attributes of DE-ceRNA network for LUAD.
Optimizing the DE-ceRNA network and finding marker genes
To optimize the network, we defined known pathogenic genes obtained from 4 gene-disease association databases as seed nodes and scored all nodes in the network by random walk [19]. We then mapped the genes with the top random walk scores back onto the DE-ceRNA network to retrieve their edges and build a sub-network. Each sub-network divides into an upregulated part (orange) and a downregulated part (blue) (Figure 4A, 4B). Some nodes in the network have a markedly higher degree than the others, and these nodes may contribute more to the network. Therefore, for each of the 2 networks, we extracted the nodes with a degree greater than 20 together with their 1-step neighbors, keeping the original connections. Nodes that participate in multiple connections in the network were defined as markers, such as the oncogene PVT1 [20,21], LINC00472 [22], the multiple tumor suppressor CDKN2A [23], the NSCLC prognostic biomarker FAM83B [24], and the cancer susceptibility gene BRCA2 in LUSC, and HOXA11-AS [25], HNF1A-AS1 [26], LINC00511 [27,28], and HOTAIR [29-32] in LUAD (Figure 4A, 4B). Many genes in this network are significantly associated with the survival of patients with the 2 types of cancer (Figure 5). The expression of the genes PVT1, KC877982.1, LINC01133, MAGI2-AS3, DSCR9, SFTA1P, LINC00968, and SOX2-OT can significantly distinguish the survival rate of LUSC patients (Figure 6A), and the expression of the genes AC064807.2, LINC00987, LINC01238, TMPO-AS1, OGFRP1, DRAIC, AC023421.1, and RNF144A-AS1 can significantly distinguish the survival rate of LUAD patients (Figure 6B).
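The hub-extraction step (nodes with degree greater than 20 plus their 1-step neighbors) can be sketched as follows. How "the original connections" were kept is not spelled out in the text; this sketch assumes that every original edge between two retained nodes is preserved. The toy edge list and node names are hypothetical.

```python
from collections import defaultdict

def hub_subnetwork(edges, min_degree=20):
    # Build an undirected neighbour map from the edge list.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Hubs are nodes whose degree exceeds the threshold.
    hubs = {n for n, nbrs in neighbours.items() if len(nbrs) > min_degree}
    # Keep the hubs and their direct (1-step) neighbours.
    keep = set(hubs)
    for hub in hubs:
        keep |= neighbours[hub]
    # Assumption: retain every original edge whose two ends are both kept.
    sub_edges = [(a, b) for a, b in edges if a in keep and b in keep]
    return hubs, sub_edges

# Toy network: one hub with 21 neighbours, plus a short chain hanging off one leaf.
toy_edges = [("hub", f"leaf{i}") for i in range(21)] + [("leaf0", "x"), ("x", "y")]
hubs, sub_edges = hub_subnetwork(toy_edges)
```

Here the chain nodes x and y are dropped because neither is a hub nor adjacent to one, while all 21 hub edges survive.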
Figure 4
Visualization of optimized DE-ceRNA Network. (A) Optimized DE-ceRNA Network for LUSC. (B) Optimized DE-ceRNA Network for LUAD. The border colour of the seed nodes is red.
Figure 5
Visualization of sub-DE-ceRNA Network. (A) Sub-DE-ceRNA Network for LUSC. (B) Sub-DE-ceRNA Network for LUAD. The border colour of the seed nodes is red.
Figure 6
KM survival curve of marker genes. (A) KM survival curves of lncRNAs and mRNAs for LUSC. (B) KM survival curves of lncRNAs and mRNAs for LUAD.
Common ceRNA network analysis
To find common markers, we took the intersection of the lncRNA-mRNA pairs of the 2 cancers to build a common ceRNA network. The network contains 2680 lncRNA-mRNA pairs, 633 lncRNAs, and 596 mRNAs (Figure 7A, 7B). Genes that occur at high frequencies in both types of cancer are thought to be more important in the treatment and detection of lung cancer. In the common network, mRNAs are also enriched in important pathways and functional terms of cancer development, such as “hsa04110: Cell cycle”, “hsa04115: p53 signaling pathway”, “GO: 0008283~cell proliferation”, “GO: 0051301~cell division”, “h_rbPathway: RB Tumor Suppressor/Checkpoint Signaling in response to DNA damage”, and “GAD_DISEASE: lung cancer” (Figure 8A). We selected the cell cycle and p53 pathway modules to display from this common ceRNA network (Figure 8B). For the cell cycle pathway module, we speculated that the lncRNAs connected to the coding genes in the pathway (CDC6, CDK1, E2F2, PKMYT1, TTK, CHEK1, ESPL1, PTTG1, CDC25C, CDC25A, CCNB1, CCNE2, MAD2L1, CCNB2, PLK1, BUB1, BUB1B, ORC6, ORC1, CCNA2) may be important molecules that affect the cell cycle and the proliferation of cancer cells, and that they stand in an endogenous competitive relationship with these genes (CDC25C-LINC00355; ORC6-DDX11-AS1; BUB1/CCNA2/PLK1-VPS9D1-AS1; ORC1/ORC6-LINC00337). In the same way, the lncRNAs connected to the coding genes CCNB1, CCNE2, CDK1, CCNB2, SERPINB5, RRM2, CHEK1, and GTSE1 may also be important molecules in the p53 cancer pathway. These close regulatory relationships are more likely to be pathogenic factors of NSCLC.
Figure 7
Common ceRNA network for non-small cell lung cancer. (A) The intersection of the ceRNA pairs, lncRNAs, or mRNAs for 2 kinds of non-small cell lung cancer. (B) Visualization of common ceRNA network.
Figure 8
Common markers for non-small cell lung cancer. (A) Enrichment analysis of mRNAs in common ceRNA network. (B) Network modules related to cancer-related pathway.
Discussion
The ceRNA mechanism plays an important role in the development of lung cancer. For example, lncRNA PVT1 regulates expression of HIF1α by functioning as a ceRNA for miR-199a-5p in non-small cell lung cancer under hypoxia. We identified some cancer markers by building and analyzing specific and common ceRNA networks [9,33,34]. Subtype-specific treatment of the 2 subtypes of non-small cell lung cancer is a problem that both researchers and clinicians want to solve, but so far there is no good explanation or evidence [35-37]. Exploiting the specificity of lncRNAs, we constructed 2 specific networks of mRNAs and lncRNAs to find markers and survival-related genes of the 2 different cancers. The representative markers we found in the LUSC and LUAD networks are PVT1, HOTAIR, FAM83B, LINC00472, HOXA11-AS, and CDKN2A, which have also been reported as potential risk factors for malignant tumors such as gastric cancer, glioma, and cervical cancer [20-22,24,25,28-31]. In addition, we analyzed the influence of gene expression on patient survival by incorporating the clinical information of lung cancer patients. We screened a few genes related to lung cancer patient survival and prognosis, such as PVT1, LINC01133, TMPO-AS1, and DRAIC. To predict markers shared by both cancers, we also built a common ceRNA network, and we selected 2 important cancer pathways, cell cycle and p53, to analyze the subnetworks associated with them. We found that the CDC25C/CDK1/RRM2-LINC00355 ceRNA relationship can affect non-small cell lung cancer. Recent studies have reported that exosome-mediated LINC00355 regulates bladder cancer cell proliferation and invasion [38]. Identification of these markers will further improve understanding of lncRNA-mediated pathway regulation and ceRNA functions. Our study is consistent with the conclusions of some previous experiments and also produced novel results that await experimental verification.
Continued investigation of the ceRNA networks analyzed here will aid in the development of targeted therapy for lung cancer. In future studies, we plan to collect more cancer expression data and clinical information to identify molecular markers more precisely using our network-based approach.
Conclusions
Accumulating evidence suggests that lncRNAs play important roles in lung cancer. However, only a few specific biomarkers have been identified in subtypes of non-small cell lung cancer so far. In this study, network analysis and functional gene prediction for non-small cell lung cancer were carried out from 2 aspects: specificity and universality. These results suggest that our method can distinguish different cancer subtypes by their characteristic molecules and cancer-related pathways, which would help to improve personalized cancer management. Our results provide a basis and evidence for future research and clinical treatment, and our experimental data will be of value to researchers.
Authors: Katrin Panzitt; Marisa M O Tschernatsch; Christian Guelly; Tarek Moustafa; Martin Stradner; Heimo M Strohmaier; Charles R Buck; Helmut Denk; Renée Schroeder; Michael Trauner; Kurt Zatloukal Journal: Gastroenterology Date: 2006-08-14 Impact factor: 22.682
Authors: P Carninci; T Kasukawa; S Katayama; J Gough; M C Frith; N Maeda; R Oyama; T Ravasi; B Lenhard; C Wells; R Kodzius; K Shimokawa; V B Bajic; S E Brenner; S Batalov; A R R Forrest; M Zavolan; M J Davis; L G Wilming; V Aidinis; J E Allen; A Ambesi-Impiombato; R Apweiler; R N Aturaliya; T L Bailey; M Bansal; L Baxter; K W Beisel; T Bersano; H Bono; A M Chalk; K P Chiu; V Choudhary; A Christoffels; D R Clutterbuck; M L Crowe; E Dalla; B P Dalrymple; B de Bono; G Della Gatta; D di Bernardo; T Down; P Engstrom; M Fagiolini; G Faulkner; C F Fletcher; T Fukushima; M Furuno; S Futaki; M Gariboldi; P Georgii-Hemming; T R Gingeras; T Gojobori; R E Green; S Gustincich; M Harbers; Y Hayashi; T K Hensch; N Hirokawa; D Hill; L Huminiecki; M Iacono; K Ikeo; A Iwama; T Ishikawa; M Jakt; A Kanapin; M Katoh; Y Kawasawa; J Kelso; H Kitamura; H Kitano; G Kollias; S P T Krishnan; A Kruger; S K Kummerfeld; I V Kurochkin; L F Lareau; D Lazarevic; L Lipovich; J Liu; S Liuni; S McWilliam; M Madan Babu; M Madera; L Marchionni; H Matsuda; S Matsuzawa; H Miki; F Mignone; S Miyake; K Morris; S Mottagui-Tabar; N Mulder; N Nakano; H Nakauchi; P Ng; R Nilsson; S Nishiguchi; S Nishikawa; F Nori; O Ohara; Y Okazaki; V Orlando; K C Pang; W J Pavan; G Pavesi; G Pesole; N Petrovsky; S Piazza; J Reed; J F Reid; B Z Ring; M Ringwald; B Rost; Y Ruan; S L Salzberg; A Sandelin; C Schneider; C Schönbach; K Sekiguchi; C A M Semple; S Seno; L Sessa; Y Sheng; Y Shibata; H Shimada; K Shimada; D Silva; B Sinclair; S Sperling; E Stupka; K Sugiura; R Sultana; Y Takenaka; K Taki; K Tammoja; S L Tan; S Tang; M S Taylor; J Tegner; S A Teichmann; H R Ueda; E van Nimwegen; R Verardo; C L Wei; K Yagi; H Yamanishi; E Zabarovsky; S Zhu; A Zimmer; W Hide; C Bult; S M Grimmond; R D Teasdale; E T Liu; V Brusic; J Quackenbush; C Wahlestedt; J S Mattick; D A Hume; C Kai; D Sasaki; Y Tomaru; S Fukuda; M Kanamori-Katayama; M Suzuki; J Aoki; T Arakawa; J Iida; K Imamura; M Itoh; T Kato; H Kawaji; N Kawagashira; T Kawashima; 
M Kojima; S Kondo; H Konno; K Nakano; N Ninomiya; T Nishio; M Okada; C Plessy; K Shibata; T Shiraki; S Suzuki; M Tagami; K Waki; A Watahiki; Y Okamura-Oho; H Suzuki; J Kawai; Y Hayashizaki Journal: Science Date: 2005-09-02 Impact factor: 47.728