Meng Zhou1, Zhiyong Diao2, Xiaolong Yue3, Yang Chen1, Hengqiang Zhao1, Liang Cheng1, Jie Sun1. 1. College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, PR China. 2. Department of Plastic Surgery, the First Affiliated Hospital of Harbin Medical University, Harbin, 150001, PR China. 3. Medical Oncology Department, Affiliated Tumor Hospital, Harbin Medical University, Harbin, 150001, PR China.
Abstract
It is increasing evidence that ceRNA activity of long non-coding RNAs (lncRNAs) played critical roles in both normal physiology and tumorigenesis. However, functional roles and regulatory mechanisms of lncRNAs as ceRNAs in pancreatic ductal adenocarcinoma (PDAC), and their potential implications for early diagnosis remain unclear. In this study, we performed a genome-wide analysis to investigate potential lncRNA-mediated ceRNA interplay based on "ceRNA hypothesis". A dysregulated lncRNA-associated ceRNA network (DLCN) was constructed by utilizing sample-matched miRNA, lncRNA and mRNA expression profiles in PDAC and normal samples in combination with miRNA regulatory network. The results of network analysis uncovered seven novel lncRNAs as functional ceRNAs whose aberrant expression will result in the extensive variation in tumorigenic or tumor-suppressive gene expression through DLCN at the post-transcriptional level contributing to PDAC. Therefore, we developed a 7-lncRNA signature (termed LncRisk-7) based on the expression data of seven lncRNAs and SVM algorithm as a novel diagnostic tool to improve early diagnosis of PDAC. The LncRisk-7 achieved high performance in distinguishing PDAC patients from nonmalignant pancreas samples in the discovery cohort and was further confirmed in another two independent validation cohorts. Functional analysis demonstrated that seven lncRNA biomarkers act as ceRNAs involving the regulation of cell death, cell adhesion and cell cycle. This study will help to improve our understanding of the lncRNA-mediated ceRNA regulatory mechanisms in the pathogenesis of PDAC and provide novel lncRNAs as candidate diagnostic biomarkers or potential therapeutic targets.
It is increasing evidence that ceRNA activity of long non-coding RNAs (lncRNAs) played critical roles in both normal physiology and tumorigenesis. However, functional roles and regulatory mechanisms of lncRNAs as ceRNAs in pancreatic ductal adenocarcinoma (PDAC), and their potential implications for early diagnosis remain unclear. In this study, we performed a genome-wide analysis to investigate potential lncRNA-mediated ceRNA interplay based on "ceRNA hypothesis". A dysregulated lncRNA-associated ceRNA network (DLCN) was constructed by utilizing sample-matched miRNA, lncRNA and mRNA expression profiles in PDAC and normal samples in combination with miRNA regulatory network. The results of network analysis uncovered seven novel lncRNAs as functional ceRNAs whose aberrant expression will result in the extensive variation in tumorigenic or tumor-suppressive gene expression through DLCN at the post-transcriptional level contributing to PDAC. Therefore, we developed a 7-lncRNA signature (termed LncRisk-7) based on the expression data of seven lncRNAs and SVM algorithm as a novel diagnostic tool to improve early diagnosis of PDAC. The LncRisk-7 achieved high performance in distinguishing PDACpatients from nonmalignant pancreas samples in the discovery cohort and was further confirmed in another two independent validation cohorts. Functional analysis demonstrated that seven lncRNA biomarkers act as ceRNAs involving the regulation of cell death, cell adhesion and cell cycle. This study will help to improve our understanding of the lncRNA-mediated ceRNA regulatory mechanisms in the pathogenesis of PDAC and provide novel lncRNAs as candidate diagnostic biomarkers or potential therapeutic targets.
Entities:
Keywords:
ceRNA network; competing endogenous RNAs; early diagnosis; long non-coding RNAs; pancreatic cancer
Pancreatic cancer (PC) remains the most common cause of cancer-related mortality worldwide [1]. Pancreatic ductal adenocarcinoma (PDAC) is the most frequent subtype of pancreatic cancer, accounting for more than 85% of pancreatic tumor cases [2]. Despite increasing efforts in PDAC, the 5-year survival rate is low at less than 5%, primarily due to late stage diagnosis at advanced stages in a large proportion of patients [1]. To decrease mortality and improve the management of PDAC, diagnostic markers is critical for early detection and risk stratification of PDAC which could aid clinicians to select early tailored treatment. However, the PDAC is a heterogeneous disease in various aspects including clinicopathological, molecular and cellular heterogeneity [3, 4]. Therefore, traditional recognizable clinical and pathological symptoms or signs have limited value in detecting early PDAC. Molecular biomarkers have been proven to be a promising clinical tool for identifying patients' subgroup having early-stage disease.Over the last ten years, large-scale genome and transcriptome studies have documented large numbers of non-coding RNAs (ncRNAs), including short ncRNAs and long ncRNAs [5, 6]. Long non-coding RNAs (lncRNAs), a major class of ncRNAs with larger than 200 nucleotides in length, have been reported to participate in a wide range of biological processes, including genomic imprinting, transcriptional and post-transcriptional regulation [7-9]. It has been shown that lncRNAs harboring miRNA response elements (MREs) can be regulated by miRNAs, thus acting as competing endogenous RNAs (ceRNAs) to communicate with mRNAs by competing for shared miRNAs [10, 11]. Experimental evidence revealed that dysregulated expressions of key lncRNAs of ceRNA network have greater effects on the miRNA-mediated lncRNA/mRNA ceRNA crosstalk interactions and disrupt bistable states, thus contributing to the initialization and development of cancers [11, 12]. Wang and his colleagues showed that the crosstalk between lncRNA HULC and PRKACB gene via competitive binding to miR-372 in a ceRNA-dependent feed-forward loop resulted in highly up- regulated expression of HULC in liver cancer [13]. Paci et al. built two miRNA-mediated lncRNA-mRNA networks in normal and pathological breast tissue and investigated the potential role of lncRNA as ceRNA in breast cancer [14]. Recent two studies had led to important insights into ceRNA-mediated gene regulation in gastric cancer and ovarian cancer and identified candidate functional cancer-related lncRNAs [15, 16]. Several lncRNAs have been found to be differentially expressed in PC tissues compared to the healthy controls, such as HOTTIP-005, RP11-567G11.1 and MALAT1 [17, 18], implying their potentials as diagnostic or prognostic biomarkers in PC. A more recent study reported that lncRNA NUTF2P3-001 function as ceRNAs to communicate with KRAS mRNA by competitively binding to miR-3923, and the overexpression of NUTF2P3-001 will unregulated KRAS expression by depriving the inhibition of miR-3923 on KRAS leading to the proliferation and invasion of pancreatic cancer cell [19], revealing the functional roles of lncRNA-mediated ceRNA crosstalk in PC for the first time.In this study, as demonstrated by Chou's 5-step rule [20] to establish a really useful prediction method for a biological system, we first investigated the altered expression patterns of lncRNAs, miRNAs and mRNAs between PDACpatients and normal samples. Then a dysregulated lncRNA-associated ceRNA network (DLCN) was constructed by integrating altered lncRNAs, miRNAs, mRNAs and their co-dysregulated regulatory relationships based on “ceRNA hypothesis”. We uncovered 7 novel lncRNAs as functional ceRNAs with key roles in the pathogenesis of PDAC, and developed a SVM-based 7-lncRNA signature (termed LncRisk-7) which significantly discriminate PDACtumors from nonmalignant pancreas samples with high performance. The diagnostic values of the LncRisk-7 were further validated in another two independent testing cohorts. With further experimental validation, these novel lncRNAs acting as ceRNAs at the post-transcriptional level may become promising diagnostic biomarkers and therapeutic targets.
RESULTS
Identification of differentially expressed mRNAs, miRNAs and lncRNAs in PDAC patients
We first compared the expression profiles of mRNAs, miRNAs and lncRNAs between 25 PDAC samples and 7 nonmalignant pancreas samples from the discovery cohort, and found that 1032 mRNAs, 87 miRNAs and 78 lncRNAs are differentially expressed (Fold change ≥ 1.5 or ≤ 0.67 and FDR-adjusted p ≤ 0.1) in PDAC samples compared with nonmalignant pancreas samples using the SAM analysis (Supplementary Table S1). Of these, 632 mRNAs, 47 miRNAs and 52 lncRNAs were over-expressed and 400 mRNAs, 40 miRNAs and 26 lncRNAs were down-expressed in PDACpatients compared with nonmalignant pancreas samples.
Construction and analysis of dysregulated lncRNA-associated ceRNA network
The preliminary dysregulated lncRNA-associated ceRNA network (DLCN) was first built by integrating expression profiles and regulatory relationships of mRNAs, miRNAs and lncRNAs of 32 samples in the discovery cohort. As described in the Materials and Methods section, we detected 290 miRNA-mediated lncRNA-mRNA competing triplets among 5 miRNAs, 7 lncRNAs and 150 mRNAs (Supplementary Table S2). These significant competing triplets were then assembled to constitute dysregulated lncRNA-associated ceRNA network for exploring the dynamic changes of ceRNA regulation in PDAC. The resulted DLCN was comprised of 467 edges among 5 miRNAs, 7 lncRNAs and 150 mRNAs (Figure 1A).
Figure 1
The layout of dysregulated lncRNA-mediated ceRNA network (DLCN) and its structural characteristics
(A) Global view of the DLCN in pancreatic cancer. The DLCN was comprised of 467 edges among 5 miRNAs, 7 lncRNAs and 150 mRNAs. (B) Degree distribution of the DLCN. (C) The clustering coefficient of the DLCN is higher than randomization test. The arrow represents the clustering coefficient in the real ceRNA network. (D) The characteristic path length of the DLCN is higher than randomization test. The arrow represents the characteristic path length in the real ceRNA network.
The layout of dysregulated lncRNA-mediated ceRNA network (DLCN) and its structural characteristics
(A) Global view of the DLCN in pancreatic cancer. The DLCN was comprised of 467 edges among 5 miRNAs, 7 lncRNAs and 150 mRNAs. (B) Degree distribution of the DLCN. (C) The clustering coefficient of the DLCN is higher than randomization test. The arrow represents the clustering coefficient in the real ceRNA network. (D) The characteristic path length of the DLCN is higher than randomization test. The arrow represents the characteristic path length in the real ceRNA network.Next, we performed network analysis to study the structure and organization of the DLCN. The degree distribution of nodes in the DLCN was investigated and the power-law distribution with a slope of −1.783 and R2 = 0.9988 was observed (Figure 2B), suggesting that the DLCN displayed scale-free characteristics typical of biological networks. Furthermore, the DLCN showed module characteristics with a significantly higher clustering coefficient than random networks (p < 0.01, Figure 2C). However, the characteristic path length of the DLCN is substantially larger than that of random networks (p < 0.001) (Figure 1D), implying that the DLCN had reduced global efficiency. The scale-free, module characteristics and reduced global efficiency of the DLCN suggested that dysregulated ceRNA interactions often occurred at a local scale and hubs in the modules tended to be critical in the context of the entire network.
Figure 2
The lncRNA ceRNAs are more critical components compared to mRNA ceRNAs in the DLCN
(A) The difference of degree among lncRNAs, miRNAs and mRNAs. The lncRNA ceRNAs have significantly higher degrees than mRNA ceRNAs in the DLCN. (B) The difference of betweenness centrality among lncRNAs, miRNAs and mRNAs. The lncRNA ceRNAs had a higher betweenness centrality than mRNA ceRNAs in the DLCN. (C) The unsupervised hierarchical clustering heatmap of 32 samples based on the expression profiles of 7 lncRNA biomarkers in the discovery cohort.
The lncRNA ceRNAs are more critical components compared to mRNA ceRNAs in the DLCN
(A) The difference of degree among lncRNAs, miRNAs and mRNAs. The lncRNA ceRNAs have significantly higher degrees than mRNA ceRNAs in the DLCN. (B) The difference of betweenness centrality among lncRNAs, miRNAs and mRNAs. The lncRNA ceRNAs had a higher betweenness centrality than mRNA ceRNAs in the DLCN. (C) The unsupervised hierarchical clustering heatmap of 32 samples based on the expression profiles of 7 lncRNA biomarkers in the discovery cohort.Further comparison analysis showed that there were significant differences in the degree distribution and betweenness centrality among mRNAs, miRNAs and lncRNAs (p = 3.356e-09 for degree and p = 2.65e- 07 for betweenness centrality, Kruskal-Wallis test) (Figure 2A and 2B). LncRNAs and miRNAs have significantly higher degree and betweenness centrality compared with mRNAs, suggesting that lncRNAs and miRNAs tended to be hub nodes. As showed in Figure 1A, a large proportion of mRNAs (55.3%) communicated with individual lncRNAs, and all lncRNAs acted as ceRNAs to communicate with multiple mRNAs by competing specific shared miRNAs. These results suggested that the aberrant expression of lncRNA ceRNA would result in the extensive variation in gene expression by miRNA-mediated lncRNA-mRNA ceRNA crosstalk interactions, implying that ceRNA function of lncRNAs in the DLCN is of crucial importance in the development of PDAC.
Identification of potential diagnostic lncRNA signature in PDAC from the discovery cohort
Based on the above observation, these 7 lncRNAs with ceRNA activity were considered as potential biomarkers associated with PDAC (Table 1). To test whether these 7 lncRNA biomarkers could efficiently distinguish PDACpatients from nonmalignant pancreas samples, we performed unsupervised hierarchical clustering for 32 samples in the discovery cohort according to the expression pattern of 7 lncRNA biomarkers. The results showed that all samples in the discovery cohort were grouped into two distinctive sample clusters (11 samples in Cluster 1 vs. 21 samples in Cluster 2), which were highly correlated with disease status (p = 9.804e- 05, Fisher exact test; Figure 2C). As seen in Figure 2C, all nonmalignant pancreas samples were clustered into Cluster 1, and most of PDACpatients (23/25, 84%) were subdivided into Cluster 2. The above results demonstrated that these 7 dysregulated lncRNAs in the DLCN might have a predictive role and could be used as biomarkers in the diagnosis of PDAC. Thus, we integrated these 7 lncRNA biomarkers to construct a 7-lncRNA signature (termed LncRisk-7) by developing an SVM classifier. The performance of the LncRisk-7 in distinguishing PDACpatients from nonmalignant pancreas samples was evaluated in the discovery cohort using the LOOCV procedure, in which 31 samples were used as training set and the remaining one was served as the test sample. Results of discovery cohort showed that the LncRisk-7 was able to correctly classify 24 out of 32 samples, achieving an overall predictive accuracy of 75% with a sensitivity of 84% and a specificity of 57.1%. The discriminatory performance of the LncRisk-7, evaluated by calculating the AUC and DOR, revealed that the AUC was 0.829 (Figure 3A and 3B) and the DOR was 3.938 (Figure 3C), which were significantly higher than those of randomization tests (Figure 3B and 3C). These results demonstrated that the LncRisk-7 had the better predictive performance for discriminating PDACtumors from nonmalignant pancreas samples.
Table 1
The detailed information of dysregulated 7 lncRNAs with ceRNA activity in the DLCN
Ensembl ID
Gene name
Genomic location
Fold change
FDR
ENSG00000224660
SH3BP5-AS1
Chr 3: 15,254,184–15,264,493 (+)
0.42
0.09713
ENSG00000246859
STARD4-AS1
Chr 5: 111,512,226–111,739,726 (+)
2.47
0.09713
ENSG00000245311
ARNTL2-AS1
Chr 12: 27,389,789–27,446,625 (–)
2.78
0.09713
ENSG00000261312
AC002550.5
Chr 16: 19,706,351–19,715,383 (+)
6.63
0.09713
ENSG00000240618
RP11-206L10.5
Chr 1: 759,032–764,925(–)
2.21
0.09713
ENSG00000223947
AC016738.4
Chr 2: 100,993,676–101,002,244(–)
2.67
0.09713
ENSG00000255306
RP5-901A4.1
Chr 11: 68,024,809–68,030,461(–)
3.12
0.09713
Figure 3
The performance of the LncRisk-7 in distinguishing PDAC patients from nonmalignant pancreas samples in the discovery cohort
(A) ROC analysis of the sensitivity and specificity of the LncRisk-7. (B) The AUC values of the LncRisk-7 based on leave-one-out cross validation. (C) The DOR values of the LncRisk-7 based on leave-one-out cross validation.
The performance of the LncRisk-7 in distinguishing PDAC patients from nonmalignant pancreas samples in the discovery cohort
(A) ROC analysis of the sensitivity and specificity of the LncRisk-7. (B) The AUC values of the LncRisk-7 based on leave-one-out cross validation. (C) The DOR values of the LncRisk-7 based on leave-one-out cross validation.
Further validation of the LncRisk-7 with two additional independent PDAC cohorts
To independently validate the diagnostic power of the LncRisk-7, we first test the discriminatory performance of the LncRisk-7 in an independent cohort of 52 samples from Pei's study [21]. For this, we first performed unsupervised hierarchical clustering analysis based on the expression profiles of 7 lncRNAs and found that 52 samples were grouped into two distinctive clusters based on expression patterns of seven lncRNAs, with significantly different tumor status (p = 4.241e- 05, Chi-square test; Figure 4A). Furthermore, the LncRisk-7 efficiently distinguished PDACpatients from nonmalignant pancreas samples in the Pei cohort, achieving 76.9% prediction accuracy with 86.1% sensitivity and 43.8% specificity. The discriminatory power measured by the AUC and DOR was 0.833 and 7.97, respectively (Figure 4B).
Figure 4
Independent validation of the LncRisk-7 for early diagnosis in another cohort of 52 samples from Pei's study
(A) Hierarchical clustering heatmap and dendrogram of 52 samples based expression patterns of 7 lncRNA biomarkers. (B) The discriminatory performance of the LncRisk-7, evaluated by ROC analysis and calculating the AUC and DOR.
Independent validation of the LncRisk-7 for early diagnosis in another cohort of 52 samples from Pei's study
(A) Hierarchical clustering heatmap and dendrogram of 52 samples based expression patterns of 7 lncRNA biomarkers. (B) The discriminatory performance of the LncRisk-7, evaluated by ROC analysis and calculating the AUC and DOR.Further validation of the discriminatory power of the LncRisk-7 was conducted using another completely independent cohort of 72 samples from Badea's study [22]. Results with unsupervised hierarchical clustering analysis were similar to those observed in the discovery cohort and Pei cohort above (Figure 5A). Most of PDACpatients (30/36, 83.3%) was clustered into one subgroup and 28 out of 36 (77.8%) normal samples was divided into another subgroup, indicating the significant association between expression patterns of 7 lncRNAs and tumor occurrence (p = 7.144e-07, Chi-square test; Figure 5A). Next, we performed classification of PDAC and control samples in the Badea cohort using the LncRisk-7, and found that the LncRisk-7 was able to correctly classify 26 out of 36 PDAC samples and 24 out of 36 control samples, resulting in 69.4 % accuracy, 72.2% sensitivity and 33.3% specificity. The discriminatory power measured by the AUC and DOR was 0.781 and 5.2, respectively (Figure 5B). Taken together, these results with additional independent validation cohorts demonstrated better and reproducible diagnostic performance of the LncRisk-7 in discriminating PDACtumors from normal samples.
Figure 5
Further confirmation of the LncRisk-7 for early diagnosis in other independent cohort of 72 samples from Badea's study
(A) Hierarchical clustering heatmap and dendrogram of 72 samples based expression patterns of 7 lncRNA biomarkers. (B) The discriminatory performance of the LncRisk-7, evaluated by ROC analysis and calculating the AUC and DOR.
Further confirmation of the LncRisk-7 for early diagnosis in other independent cohort of 72 samples from Badea's study
(A) Hierarchical clustering heatmap and dendrogram of 72 samples based expression patterns of 7 lncRNA biomarkers. (B) The discriminatory performance of the LncRisk-7, evaluated by ROC analysis and calculating the AUC and DOR.
Functional implication of lncRNA biomarkers
To investigate the potential functional implication of the LncRisk-7, we performed functional enrichment analysis of GO and KEGG for mRNAs in the DLCN. Results of GO analysis revealed 39 enriched GO terms in the “Biological Process” (GOTERM-BP-FAT) (p < 0.05 and Fold Enrichment > 3.0) (Supplementary Table S3), which could be clustered into three functional sub-networks involved in cell death, cell adhesion and cell cycle (Figure 6A). KEGG analysis focusing on the biological pathways showed that these mRNAs as ceRNA interactors of lncRNA biomarkers in the LncRisk-7 were significantly enriched in several pathways involved in pathways in cancer, ECM-receptor interaction, cell adhesion molecules (CMAs) and adherens junction (p < 0.05 and Fold Enrichment > 3.0) (Figure 6B and Supplementary Table S3). These enriched biological processes and pathways have been reported to play important roles in PDAC pathogenesis, thus the LncRisk-7 might be involved with.
Figure 6
Functional analysis of the diagnostic lncRNAs
(A) The functional enrichment map of GO terms for mRNAs as ceRNA counterparts of lncRNA biomarkers. Each node represents a GO term and an edge represents the proportion of shared genes between connecting GO terms. (B) The enriched KEGG pathways ranked by −log10 (p-value).
Functional analysis of the diagnostic lncRNAs
(A) The functional enrichment map of GO terms for mRNAs as ceRNA counterparts of lncRNA biomarkers. Each node represents a GO term and an edge represents the proportion of shared genes between connecting GO terms. (B) The enriched KEGG pathways ranked by −log10 (p-value).
DISCUSSION
PDAC is one of the deadliest solid tumors characterized by complex molecular and cellular heterogeneity [4, 23]. During the past years, great efforts have been made to provide novel insights into the molecular mechanisms underlying PDAC, but the focus has been on protein-coding genes or miRNAs [24, 25]. A recently discovered non-coding RNA class, termed lncRNAs, has been widely reported to participate in a wide range of biological processes and their dysregulated expression is associated with many complicated human disease phenotypes including cancers [26-28]. There is increasing evidence that lncRNAs are extensively targeted by miRNAs, and function as ceRNAs. The crosstalk between ceRNAs occurred through competitively binding to shared miRNAs and formed a complex ceRNA-mediated regulatory network contributing to a novel dimension of post-transcriptional gene regulation [10, 11]. It has been shown that lncRNAs are key components of ceRNA-mediated regulatory network and their ceRNA activity has been implicated in both physiological conditions and cancer development [12, 29, 30]. Systematic analysis of ceRNA network has been performed in breast cancer [14, 31], gastric cancer [16], glioblastoma multiforme [32] and ovarian cancer [15]. However, lncRNAs are known to have more developmental stage-, tissue-, organ- and disease-specific expression patterns than protein-coding genes, suggesting that lncRNA-associated ceRNA crosstalk will likely shift under different specific conditions and occur in a disease-specific manner. Therefore, it is critical for studying functional roles and regulatory mechanisms of lncRNA as ceRNAs in the development of PDAC, and investigating their potential implications for early diagnosis of PDAC.In this study, we obtained genome-wide expression profiles of lncRNAs in PDACpatients and nonmalignant pancreas samples by repurposing three publicly available PDAC-related GEO cohorts and identified 78 differentially expressed lncRNAs, implying that these lncRNAs may be associated with PDAC. In order to explore the ceRNA activity of lncRNAs in PDAC, we used an integrative computational framework to identify 290 dysregulated lncRNA-mediated ceRNA crosstalks through integrating sample-matched expression profiles of lncRNAs, miRNAs and mRNAs with miRNA-target regulatory information and constructed a lncRNA-mediated ceRNA regulatory network. Only 7 of 78 differentially expressed lncRNAs displayed ceRNA activity and communicated with 150 mRNAs by competing for 5 common miRNAs in PDAC. Network analysis showed that PDAC-specific DLCN was a scale-free and small world network as general biological networks, and the topological properties of DLCN were significantly distinguished from random networks, including high clustering coefficient and characteristic path length. We further examined the associations between mRNAs ceRNAs in the DLCN and cancers, and found that 150 mRNA ceRNAs in the DLCN were significantly enriched in the cancer class from The Genetic Association Database [33] (GAD, https://geneticassociationdb.nih.gov/) (p = 5.0e-03), in which 40 mRNAs are known to be related to cancers recorded in GAD (Supplementary Table S4). Previous studies have suggested that hub nodes, characterized by their high degree of connectivity to other nodes, play critical roles in a network and tend to be essential in network organization [34, 35]. In the DLCN, lncRNAs were observed to be topological key nodes whose degree and betweenness centrality are significantly higher than mRNA ceRNAs, indicating that ceRNA activity of lncRNAs has profound implications for PDAC and the dysregulated expression of lncRNA ceRNAs will result in the extensive variation in tumorigenic or tumor-suppressive gene expression through lncRNA-mediated ceRNA regulatory network at the post-transcriptional level contributing to PDAC.With increasing attention to the roles of lncRNAs as oncogenes and tumor suppressors in cancers, lncRNAs have exhibited superior potential as diagnostic and prognostic biomarkers than protein-coding genes owing to the more closely association between lncRNA expression and their functions [36, 37]. Recent studies have reported several lncRNA-focus signatures to improve prognosis prediction for some malignant tumors including breast cancer, lung cancer, colorectal cancer and so on [38-45]. However, the diagnostic role of lncRNAs in PDAC has not been investigated. Introducing graphic methods into biological systems can provide an intuitive vision and useful insights. It is particularly helpful to deal with complicated biological systems as demonstrated in recent some studies [46-48]. To explore whether 7 lncRNAs with ceRNA activity uncovered in the DLCN could become suitable diagnostic biomarkers for PDAC, we first performed unsupervised hierarchical clustering analysis and found that expression patterns of 7 lncRNAs could discriminate effectively between PDACtumors and nonmalignant pancreas samples in the discovery cohort and two independent validation cohorts. In statistical prediction, the following three cross-validation methods are often used to examine a predictor for its effectiveness in practical application: independent dataset test, subsampling test, and jackknife test [49]. We adopted the leave one out cross-validation in this study as done by many investigators with SVM as the prediction engine and chose independent dataset test method to examine the accuracy of 7 lncRNAs in dignostic precdition. Therefore, we developed a 7-lncRNA signature based on the expression data of 7 lncRNAs and SVM algorithm as a novel diagnostic tool to improve early diagnosis of PDAC. The 7-lncRNA signature achieved a high performance in distinguishing PDACpatients from nonmalignant pancreas samples in the discovery cohort and was further confirmed in another two independent validation cohorts. These results demonstrated that the 7-lncRNA signature had the robust discriminatory power and may become a reliable and powerful predictor for early diagnosis of patients with PDAC.As functional characterization of lncRNA is still in its infancy, only very few lncRNAs have been well functionally annotated until now. The ceRNA activity of these 7 lncRNAs was reported to be associated with PDAC for the first time and their function is unknown. It is increasing evidence that ceRNA crosstalk has become a powerful approach to infer functional roles of lncRNA with unknown function based on their ceRNA activity [27, 28, 50–52]. Therefore, as a preliminary exploration for the potential functional implication of the LncRisk-7, we used ‘guilt by association’ principle to investigate biological processes and pathways regulated by the dysregulation of lncRNAs biomarkers and associated ceRNA crosstalk by functional enrichment analysis for mRNAs in the DLCN. We found that mRNAs as ceRNA counterparts of lncRNA biomarkers were involved in three GO biological processes (cell death, cell adhesion and cell cycle) and four KEGG pathways (pathways in cancer, ECM-receptor interaction, cell adhesion molecules (CMAs) and adherens junction). Three GO biological processes(cell death, cell adhesion and cell cycle) are well known to be associated with the hallmarks of cancer [53]. The extracellular matrix (ECM) not only provided mechanical and structural support but also played important roles in the development processes [54]. The dysregulated expression of ECM receptors contributing to the alterations of ECM formation and composition has been implicated in PDAC [55, 56]. Therefore, it is a plausible inference that the seven lncRNA biomarkers act as ceRNAs involving the regulation of cell death, cell adhesion and cell cycle. The dysregulation of these lncRNA ceRNAs and the resultant perturbation in lncRNA-mediated ceRNA network will have important effects in cancer-related biological processes contributing to tumorigenesis and progress of PDAC.In summary, we performed a genome-wide analysis to investigate potential lncRNA-mediated ceRNA interplay by utilizing sample-matched miRNA, lncRNA and mRNA expression profiles in PDACpatients and normal samples in combination with miRNA regulatory network based on “ceRNA hypothesis”. Then a PDAC-specific dysregulated lncRNA-mediated ceRNA network was constructed, which for the first time enables an overall view and analysis of lncRNA-associated ceRNA-mediated gene regulation in the development of PDAC on a system-wide level. This study will help to improve our understanding of lncRNA-mediated ceRNA regulatory mechanisms in the pathogenesis of PDAC and provide novel lncRNAs as candidate diagnostic biomarkers or potential therapeutic targets.
MATERIALS AND METHODS
Patient datasets
Three independent, nonoverlapping PDACpatient cohorts were used in this study. The initial discovery cohort of 25 PDAC samples and 7 nonmalignant pancreas samples with whole genome gene expression profiles and miRNA expression profiles were retrieved from Donahue' study [63] and were used to identify dysregulated lncRNA-mediated ceRNA interplay in PDAC. Another two PDACpatient cohorts composed of 52 samples (36 tumor samples and 16 normal samples, denoted by “ Pei cohort”) and 72 samples (36 tumor samples and 36 normal samples, denoted by “ Badea cohort”) only with whole genome gene expression profiles were collected from Pei's study [21] and Badea's study [22], and were used as validation cohorts to test the diagnostic power of lncRNA biomarkers.
Acquisition and analysis of expression profiles
The sample-matched whole genome gene expression data and miRNA expression profiles data of 32 samples in the discovery cohort and the whole genome gene expression data of 52 samples in the Pei cohort and 72 samples in the Badea cohort were obtained from the public available GEO database (the GEO accession number is GSE32688, GSE16515 and GSE15471). Raw gene expression data profiled from Affymetrix Human Genome U133 Plus 2.0 Array (HG-U133_Plus_2.0) in the three patient cohorts were processed and normalized using the Robust Multichip Average (RMA) algorithm for background adjustment [64] and log-transformed (base 2). miRNA expression data produced by the Exiqon miRNA arrays (miRCURY LNA microRNA Array v.11.0 -hsa, mmu & rno) were adjusted and normalized by variance stabilizing transformation [63]. lncRNA expression data were obtained by repurposing the probes in the HG-U133_Plus_2.0 array to lncRNAs based on the annotation from the GENCODE project (http://www.gencodegenes.org, release 22) as previously described [36, 41, 65, 66]. Finally, expression data of 11005 protein-coding genes (PCGs), 826 miRNAs and 2330 lncRNAs were retained for further analysis.Differential expression analysis of mRNAs, miRNAs and lncRNAs between PDAC samples and normal samples was carried out using the significance analysis of microarrays (SAM) method [67]. Unsupervised hierarchical clustering was used to investigate the expression pattern between PDAC samples and normal samples, and the Chi-square test or Fisher exact test was used to analyze the correlations between tumor status and lncRNA biomarkers.
MiRNA-mRNA and miRNA-lncRNA interaction data
The experimentally validated miRNA-mRNA interaction data was collected and integrated from TarBase (version 6.0) [68], miRTarBase (version 4.5) [69] and miRecords [70], including 37659 interactions between 402 miRNAs and 12360 PCGs. The putative interactions of miRNA-lncRNA were downloaded from lnCeDB database [71], including 1562845 interactions between 1394 miRNA and 28364 lncRNAs.The dysregulated lncRNA-associated ceRNA network (DLCN) in PDAC was constructed based on “ceRNA hypothesis” as follows: Firstly, expression correlation between differentially expressed mRNAs and differentially expressed lncRNAs was evaluated using Pearson correlation coefficient (PCC) from matched mRNA and lncRNA expression profiles data. Those dysregulated lncRNA-mRNA pairs with PCC > 0.5 and p < 0.05 were selected as co- dysregulated lncRNA-mRNA pairs. Then, the Pearson correlation coefficient between differentially expressed miRNAs and differentially expressed mRNAs, and between differentially expressed miRNAs and differentially expressed lncRNAs was computed from paired miRNA, mRNA and lncRNA expression profile data. For a given co-dysregulated lncRNA-mRNA pair, both mRNAs and lncRNAs in this pair are targeted and co-expressed negatively with a certain common miRNA, this miRNA-mRNA-lncRNA was identified as dysregulated competing triplets (DCTs). Finally, a DLCN was built for PDAC by assembling all DCTs identified above.The network characteristics of DLCN, including degree, characteristic path length (CPL), clustering coefficient (CC) and betweenness centrality were analyzed.
Development of lncRNA-based signature in PDAC diagnosis
For classification of PDAC vs. normal samples, lncRNA biomarkers were integrated to form a lncRNA-focus signature using support vector machine (SVM) with the sigmoid kernel. An unbiased performance estimate in identifying PDACpatients was carried out using leave one out cross-validation (LOOCV). Diagnostic ability of the lncRNA-focus prediction model was evaluated by obtaining the area under a receiver operating characteristic (ROC) curve (AUC) and diagnostic odds ratio (DOR). The ROC curve was produced by plotting true positive rates (sensitivity) against false positive rates (1-specificity). The DORs were calculated as follows:The permutation p-value of AUC and DOR was obtained from 1,000 randomization tests by randomizing lncRNA expression data for testing the null hypothesis.
Functional analysis of lncRNA biomarkers
Functional enrichment analysis of Gene Ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) for mRNAs in the DLCN was performed to infer potential biological processes and pathways of lncRNA biomarkers using DAVID Bioinformatics Tool (version 6.7) [72] limited to GO terms in the “Biological Process”(GOTERM-BP-FAT) and KEGG pathway categories. The biological processes and pathways with p-value of < 0.05 and an enrichment score of > 3.0 using the whole human genome as background were considered as significant functional categories which were organized into an interaction network with similar functions using the Enrichment Map plugin in Cytoscape environment [73].
Authors: Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong Journal: Nature Date: 2007-06-14 Impact factor: 49.962
Authors: P Carninci; T Kasukawa; S Katayama; J Gough; M C Frith; N Maeda; R Oyama; T Ravasi; B Lenhard; C Wells; R Kodzius; K Shimokawa; V B Bajic; S E Brenner; S Batalov; A R R Forrest; M Zavolan; M J Davis; L G Wilming; V Aidinis; J E Allen; A Ambesi-Impiombato; R Apweiler; R N Aturaliya; T L Bailey; M Bansal; L Baxter; K W Beisel; T Bersano; H Bono; A M Chalk; K P Chiu; V Choudhary; A Christoffels; D R Clutterbuck; M L Crowe; E Dalla; B P Dalrymple; B de Bono; G Della Gatta; D di Bernardo; T Down; P Engstrom; M Fagiolini; G Faulkner; C F Fletcher; T Fukushima; M Furuno; S Futaki; M Gariboldi; P Georgii-Hemming; T R Gingeras; T Gojobori; R E Green; S Gustincich; M Harbers; Y Hayashi; T K Hensch; N Hirokawa; D Hill; L Huminiecki; M Iacono; K Ikeo; A Iwama; T Ishikawa; M Jakt; A Kanapin; M Katoh; Y Kawasawa; J Kelso; H Kitamura; H Kitano; G Kollias; S P T Krishnan; A Kruger; S K Kummerfeld; I V Kurochkin; L F Lareau; D Lazarevic; L Lipovich; J Liu; S Liuni; S McWilliam; M Madan Babu; M Madera; L Marchionni; H Matsuda; S Matsuzawa; H Miki; F Mignone; S Miyake; K Morris; S Mottagui-Tabar; N Mulder; N Nakano; H Nakauchi; P Ng; R Nilsson; S Nishiguchi; S Nishikawa; F Nori; O Ohara; Y Okazaki; V Orlando; K C Pang; W J Pavan; G Pavesi; G Pesole; N Petrovsky; S Piazza; J Reed; J F Reid; B Z Ring; M Ringwald; B Rost; Y Ruan; S L Salzberg; A Sandelin; C Schneider; C Schönbach; K Sekiguchi; C A M Semple; S Seno; L Sessa; Y Sheng; Y Shibata; H Shimada; K Shimada; D Silva; B Sinclair; S Sperling; E Stupka; K Sugiura; R Sultana; Y Takenaka; K Taki; K Tammoja; S L Tan; S Tang; M S Taylor; J Tegner; S A Teichmann; H R Ueda; E van Nimwegen; R Verardo; C L Wei; K Yagi; H Yamanishi; E Zabarovsky; S Zhu; A Zimmer; W Hide; C Bult; S M Grimmond; R D Teasdale; E T Liu; V Brusic; J Quackenbush; C Wahlestedt; J S Mattick; D A Hume; C Kai; D Sasaki; Y Tomaru; S Fukuda; M Kanamori-Katayama; M Suzuki; J Aoki; T Arakawa; J Iida; K Imamura; M Itoh; T Kato; H Kawaji; N Kawagashira; T Kawashima; M Kojima; S Kondo; H Konno; K Nakano; N Ninomiya; T Nishio; M Okada; C Plessy; K Shibata; T Shiraki; S Suzuki; M Tagami; K Waki; A Watahiki; Y Okamura-Oho; H Suzuki; J Kawai; Y Hayashizaki Journal: Science Date: 2005-09-02 Impact factor: 47.728
Authors: Kristina P Sørensen; Mads Thomassen; Qihua Tan; Martin Bak; Søren Cold; Mark Burton; Martin J Larsen; Torben A Kruse Journal: Breast Cancer Res Date: 2015-04-11 Impact factor: 6.466
Authors: Zhou Du; Teng Fei; Roel G W Verhaak; Zhen Su; Yong Zhang; Myles Brown; Yiwen Chen; X Shirley Liu Journal: Nat Struct Mol Biol Date: 2013-06-02 Impact factor: 15.369