
BPLLDA: Predicting lncRNA-Disease Associations Based on Simple Paths With Limited Lengths in a Heterogeneous Network.

Xiaofang Xiao1, Wen Zhu2, Bo Liao1,2, Junlin Xu1, Changlong Gu1, Binbin Ji2, Yuhua Yao2, Lihong Peng3, Jialiang Yang2,4.   

Abstract

In recent years, it has become increasingly clear that long noncoding RNAs (lncRNAs) play critical roles in many biological processes associated with human diseases. Inferring potential lncRNA-disease associations is essential for uncovering disease mechanisms, developing novel drugs, and optimizing personalized treatments. However, biological experiments to validate lncRNA-disease associations are time-consuming and costly, so effective computational models are needed. In this study, we propose a method called BPLLDA to predict lncRNA-disease associations based on paths of limited lengths in a heterogeneous lncRNA-disease network. Specifically, BPLLDA first constructs a heterogeneous network by integrating the lncRNA-disease association network, the lncRNA functional similarity network, and the disease semantic similarity network. It then infers the probability of an lncRNA-disease association from the paths connecting the lncRNA and the disease and from the lengths of those paths. Compared with existing methods, BPLLDA has several advantages: it does not require negative samples, and it can predict associations related to novel lncRNAs or novel diseases. We applied BPLLDA, together with two popular methods, LRLSLDA and GrwLDA, to a canonical lncRNA-disease association database called LncRNADisease. The leave-one-out cross-validation areas under the receiver operating characteristic curve of BPLLDA are 0.87117, 0.82403, and 0.78528 for predicting overall associations, associations related to novel lncRNAs, and associations related to novel diseases, respectively, all higher than those of the two compared methods. In addition, in case studies of cervical cancer, glioma, and non-small-cell lung cancer, the top five predicted lncRNA-disease associations were verified by recently published literature.
In summary, BPLLDA performs well in predicting novel lncRNA-disease associations, including associations related to novel lncRNAs and novel diseases. It may contribute to the understanding of lncRNA-associated diseases such as certain cancers.


Keywords:  Gaussian interaction profile kernel similarity; ROC curve; disease similarity; leave-one-out cross validation; lncRNA similarity; path with limited length

Year:  2018        PMID: 30459803      PMCID: PMC6232683          DOI: 10.3389/fgene.2018.00411

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


Introduction

It is known that there are about 20,000 protein-coding genes, accounting for less than 2% of the human genome (Bertone et al., 2004; Claverie, 2005). Most DNA regions in the human genome are either not transcribed or transcribed into noncoding RNAs (ncRNAs), which were long regarded as transcriptional noise. However, many recent studies have suggested that ncRNAs play key regulatory roles in many important biological processes such as cell proliferation (Esteller, 2011). Based on their sizes, ncRNAs can be divided into long ncRNAs (lncRNAs) (Pauli et al., 2011) and small ncRNAs such as microRNAs (miRNAs) (Farazi et al., 2013), transfer RNAs (tRNAs) (Birney et al., 2007), and Piwi-interacting RNAs (piRNAs) (Li et al., 2013). LncRNAs are ncRNAs longer than 200 nucleotides (Mercer et al., 2009; Mitchell Guttman et al., 2013). Compared with protein-coding RNAs, lncRNAs are less conserved across species (Harrow et al., 2012; Cabili et al., 2016), and they have relatively low expression levels, more tissue-specific expression patterns (Guttman et al., 2010), and longer but fewer exons (Chen, 2015). Recently, more and more lncRNAs have been identified in eukaryotes ranging from nematodes to humans, owing to advances in sequencing technologies and computational methods (Awan et al., 2017). Previous studies have suggested that lncRNAs are critical in cell proliferation, cell differentiation, chromatin remodeling, genome splicing, epigenetic regulation, transcription, and many other important biological processes (Guttman et al., 2009).
The dysregulation of lncRNAs has also been associated with the development of many diseases, including diabetes (Pasmant et al., 2011), cardiovascular diseases (Congrains et al., 2012), HIV (Zhang et al., 2013), neurological disorders (Johnson, 2012), and several cancers such as lung cancer (Ji et al., 2003; Zhang et al., 2003), breast cancer (Barsyte-Lovejoy et al., 2006; Gupta et al., 2010), and prostate cancer (Kok et al., 2002; Szell et al., 2008). As a result, identifying lncRNA-disease associations has recently become an active research topic, and many important disease-associated lncRNAs have been discovered. For example, a quantitative PCR study showed that HOTAIR expression in patients with breast cancer metastasis is about 100 to 2,000 times higher than in healthy people (Gupta et al., 2010). HOTAIR is also related to the metastasis and progression of other cancers, such as liver cancer (Hrdlickova et al., 2014), lung cancer (Li et al., 2014), colorectal cancer (Res, 2011; Maass et al., 2014), and gastric cancer (Li et al., 2014; Liu et al., 2014). Therefore, HOTAIR is regarded as a potential cancer biomarker (Maass et al., 2014). In addition, dysfunction of the lncRNA H19 has been found in several diseases, such as bladder cancer (Ariel et al., 2000). A knockdown study also showed that downregulation of H19 significantly reduces the clonogenic and anchorage-independent growth of breast cancer cells (Barsyte-Lovejoy et al., 2006). Known lncRNA-disease associations are stored in several databases, including LncRNADisease (Chen et al., 2013), Lnc2Cancer (Ning et al., 2016), and MNDR (Wang et al., 2013), which form the basis for predicting novel associations with efficient computational methods. Computational models for predicting lncRNA-disease associations generally fall into two categories: machine learning-based models and network-based models (Chen et al., 2017).
Machine learning-based models usually train predictors on features of training samples and evaluate them by cross-validation or on independent data. For example, Chen and Yan developed Laplacian Regularized Least Squares for LncRNA-Disease Association (LRLSLDA), which infers candidate disease-associated lncRNAs within a semisupervised learning framework (Chen and Yan, 2013). LRLSLDA assumes that similar diseases tend to be associated with functionally similar lncRNAs, and vice versa. It therefore combines known lncRNA-disease associations with lncRNA expression profiles to prioritize candidate disease-associated lncRNAs, and it does not require negative samples (i.e., confirmed non-associated lncRNA-disease pairs). However, the model parameters of LRLSLDA are difficult to optimize. Zhao T. et al. (2015) proposed a naïve Bayesian classifier that exploits various types of information related to cancer-associated lncRNAs, including regulome, genome, transcriptome, and multiomic data, and identified 707 potential cancer-related lncRNAs. However, this method requires negative samples, which are usually unavailable. In contrast, network-based methods take advantage of the lncRNA-disease association network, the disease similarity network, and the lncRNA similarity network to study the connectivity between lncRNAs and diseases. For instance, Sun et al. (2014) developed RWRlncD, which infers potential lncRNA-disease associations by a random walk with restart (RWR) on the lncRNA functional similarity network. However, this method cannot predict lncRNAs related to novel diseases (i.e., diseases with no known associated lncRNA). Gu et al. (2017) proposed a global network random walk model for predicting lncRNA-disease associations (GrwLDA), which performs RWR on both the lncRNA functional similarity network and the disease similarity network. However, the parameters of GrwLDA are also difficult to optimize.
In this study, we propose a novel method, BPLLDA, to predict lncRNA-disease associations based on paths of limited lengths connecting them in a heterogeneous network. Specifically, BPLLDA first establishes a heterogeneous network consisting of the known lncRNA-disease association network, the disease similarity network, and the lncRNA similarity network. It then calculates the association between a disease and an lncRNA from the paths connecting them and the lengths of those paths. BPLLDA does not require negative samples and can predict associations for novel diseases and novel lncRNAs.

Materials and methods

lncRNA-disease associations

The lncRNA-disease association data were retrieved from the LncRNADisease database (Chen et al., 2013; Sun et al., 2014). After merging duplicate lncRNA-disease entries supported by different pieces of evidence, we obtained 352 experimentally confirmed lncRNA-disease associations involving 156 lncRNAs and 190 diseases (see Supplementary Figure 1 and Supplementary Tables 2, 3). Table 1 summarizes basic characteristics of the dataset (e.g., the average degree). We then constructed the lncRNA-disease association network, whose adjacency matrix is denoted by LD; that is, LD(i, j) = 1 if lncRNA l(i) is associated with disease d(j), and 0 otherwise. Before presenting the details of BPLLDA, we first introduce two important concepts: disease semantic similarity and lncRNA functional similarity.
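A minimal sketch in Python of how a list of (lncRNA, disease) evidence records could be collapsed into the binary matrix LD described above. This is an illustration, not the authors' code; the function name `build_ld_matrix` and the toy records are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def build_ld_matrix(
    records: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (lncRNA names, disease names, LD) from (lncRNA, disease) records.

    Duplicate records (the same pair supported by several pieces of evidence)
    collapse into a single association, so LD is strictly binary.
    """
    pairs = set(records)
    lncrnas = sorted({l for l, _ in pairs})
    diseases = sorted({d for _, d in pairs})
    row: Dict[str, int] = {name: i for i, name in enumerate(lncrnas)}
    col: Dict[str, int] = {name: j for j, name in enumerate(diseases)}
    ld = [[0] * len(diseases) for _ in lncrnas]
    for l, d in pairs:
        ld[row[l]][col[d]] = 1  # LD(i, j) = 1 iff l(i) is associated with d(j)
    return lncrnas, diseases, ld


# Toy records (hypothetical); the repeated H19-breast cancer record is merged.
records = [
    ("H19", "bladder cancer"),
    ("H19", "breast cancer"),
    ("HOTAIR", "breast cancer"),
    ("H19", "breast cancer"),
]
lncrnas, diseases, ld = build_ld_matrix(records)
print(lncrnas, diseases, ld)
# ['H19', 'HOTAIR'] ['bladder cancer', 'breast cancer'] [[1, 1], [0, 1]]
```

Rows and columns are sorted alphabetically here only to make the indexing deterministic; any fixed ordering works as long as the same ordering is used for every matrix built from LD.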
Table 1

The basic characteristics of the lncRNA-disease association dataset.

Total lncRNAs | Total diseases | Total associations | Average lncRNA degree | Average disease degree | Max lncRNA degree | Max disease degree | Min lncRNA/disease degree
156 | 190 | 352 | 2.3 | 1.9 | 41 | 15 | 1

Disease semantic similarity

The Disease Ontology (DO) is an open-source ontology of human diseases (http://www.disease-ontology.org/). DO terms are diseases or disease-related concepts organized in a directed acyclic graph (DAG). Based on DO, Li et al. (2011) provided an R package called DOSim to calculate disease semantic similarity, which we adopted in this study. Specifically, we recorded the semantic similarity values among diseases in a symmetric matrix SS, in which SS(i, j) is the semantic similarity between diseases d(i) and d(j) as calculated by DOSim. We plot the distribution of SS in Figure 1A. Of the 36,100 (190 × 190) values, 21,148 (58.58%) are zeros.
Figure 1

The distributions of disease semantic and lncRNA functional similarity. (A) Disease semantic similarity (SS) distribution. (B) lncRNA functional similarity (FS) distribution. The x-axis indicates the intervals of similarity values and the y-axis indicates the numbers of values in the interval. The actual values are also marked above the histograms.


lncRNA functional similarity

We adopted a method similar to that of Sun et al. to measure the functional similarity between two lncRNAs (Wang et al., 2010; Sun et al., 2014). Specifically, suppose lncRNA l(i) is associated with the disease set $D_i = \{d_{ik} \mid 1 \le k \le m\}$ and lncRNA l(j) is associated with the disease set $D_j = \{d_{jl} \mid 1 \le l \le n\}$. The method first calculates the semantic similarity between a disease d and a disease group D as

$$S(d, D) = \max_{d_k \in D} SS(d, d_k).$$

Then, the functional similarity between l(i) and l(j) is calculated as

$$FS(l(i), l(j)) = \frac{\sum_{1 \le k \le m} S(d_{ik}, D_j) + \sum_{1 \le l \le n} S(d_{jl}, D_i)}{m + n}.$$

By construction, the lncRNA functional similarity matrix FS is symmetric. We plot the distribution of FS in Figure 1B. Of the 24,336 (156 × 156) values, 8,662 (35.59%) are zeros.
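The two formulas above can be sketched in Python as follows. The helper names and the toy semantic-similarity table are hypothetical; in the study itself, SS values come from DOSim.

```python
from typing import Callable, Dict, FrozenSet, Sequence

SemSim = Callable[[str, str], float]  # SS(a, b): disease semantic similarity


def disease_to_group(d: str, group: Sequence[str], ss: SemSim) -> float:
    """S(d, D): the best semantic match between disease d and any disease in D."""
    return max(ss(d, dk) for dk in group)


def functional_similarity(d_i: Sequence[str], d_j: Sequence[str], ss: SemSim) -> float:
    """FS(l(i), l(j)) computed from the disease sets D_i and D_j of two lncRNAs."""
    if not d_i or not d_j:
        return 0.0  # an lncRNA without known diseases has no functional similarity
    total = sum(disease_to_group(d, d_j, ss) for d in d_i)
    total += sum(disease_to_group(d, d_i, ss) for d in d_j)
    return total / (len(d_i) + len(d_j))


# Hypothetical symmetric SS table over three diseases A, B, C.
TABLE: Dict[FrozenSet[str], float] = {
    frozenset({"A", "B"}): 0.5,
    frozenset({"A", "C"}): 0.2,
    frozenset({"B", "C"}): 0.4,
}


def toy_ss(a: str, b: str) -> float:
    """Toy SS lookup: 1 on the diagonal, table value or 0 elsewhere."""
    return 1.0 if a == b else TABLE.get(frozenset({a, b}), 0.0)


# D_i = {A, B}, D_j = {B, C}: (0.5 + 1.0 + 1.0 + 0.4) / 4 = 0.725
print(functional_similarity(["A", "B"], ["B", "C"], toy_ss))
```

Because each disease is matched to its best counterpart in the other set, two lncRNAs sharing a disease (B here) already reach a fairly high similarity even when their remaining diseases differ.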

Gaussian interaction profile kernel similarity for lncRNAs

Because known lncRNA-disease associations are rather incomplete, FS contains many zeros. To alleviate this sparsity, we introduced the Gaussian interaction profile kernel similarity between lncRNAs l(i) and l(j) as

$$GL(l(i), l(j)) = \exp\left(-\gamma_l \lVert IP(l(i)) - IP(l(j)) \rVert^2\right),$$

where IP(l(i)) and IP(l(j)) are the binary vectors in the ith and jth rows of the lncRNA-disease association matrix LD. The parameter $\gamma_l$ controls the kernel bandwidth:

$$\gamma_l = \gamma_l' \Big/ \left(\frac{1}{\mathit{ln}} \sum_{i=1}^{\mathit{ln}} \lVert IP(l(i)) \rVert^2\right),$$

where ln is the number of lncRNAs studied and $\gamma_l'$ is usually set to 1 (van Laarhoven et al., 2011).

Gaussian interaction profile kernel similarity for diseases

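Similarly, we defined the Gaussian interaction profile kernel similarity for diseases as

$$GD(d(i), d(j)) = \exp\left(-\gamma_d \lVert IP(d(i)) - IP(d(j)) \rVert^2\right), \qquad \gamma_d = \gamma_d' \Big/ \left(\frac{1}{\mathit{dn}} \sum_{i=1}^{\mathit{dn}} \lVert IP(d(i)) \rVert^2\right),$$

where IP(d(i)) and IP(d(j)) are the binary vectors in the ith and jth columns of the adjacency matrix LD, dn is the number of diseases, and $\gamma_d'$ is defined analogously to $\gamma_l'$. Clearly, GD is also symmetric.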
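A pure-Python sketch of both Gaussian interaction profile kernels: GL treats the rows of LD as profiles and GD treats its columns. It uses γ' = 1 in both cases; the text gives this default for lncRNAs, and applying it to diseases is an assumption of this sketch. Function names and the toy matrix are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gip_kernel(profiles: Sequence[Sequence[int]], gamma_prime: float = 1.0) -> Matrix:
    """K(i, j) = exp(-gamma * ||IP_i - IP_j||^2), gamma = gamma' / mean ||IP_i||^2."""
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * dist_sq)  # symmetric
    return kernel


def transpose(m: Sequence[Sequence[int]]) -> List[List[int]]:
    """Columns of m become rows (disease profiles)."""
    return [list(col) for col in zip(*m)]


# Toy LD with 3 lncRNAs (rows) and 3 diseases (columns).
ld = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
]
gl = gip_kernel(ld)             # GL: lncRNA profiles are the rows of LD
gd = gip_kernel(transpose(ld))  # GD: disease profiles are the columns of LD
```

Here the mean squared profile norm is 4/3 for both rows and columns, so γ = 0.75; lncRNAs 0 and 1 differ in one disease and get GL = exp(-0.75), while lncRNAs 0 and 2 share nothing and get the smaller value exp(-2.25). Unlike FS, the kernel is nonzero for every pair, which is what reduces the zeros.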

Integrated similarity between lncRNAs and between diseases

We integrated disease semantic similarity (lncRNA functional similarity) with the Gaussian interaction profile kernel similarity for diseases (lncRNAs) to obtain the integrated disease similarity DS (integrated lncRNA similarity LS). The integration relies on NS, the set of diseases with no semantic similarity to any other disease, and NF, the set of lncRNAs with no functional similarity to any other lncRNA. By definition, DS and LS are symmetric. We plot the distributions of DS and LS in Figure 2; the numbers of zeros are greatly reduced compared with SS and FS.
Figure 2

The distributions of integrated similarities. (A) Distribution of the integrated similarity for diseases (DS). (B) Distribution of the integrated similarity for lncRNAs (LS). The x-axis indicates the intervals of similarity values and the y-axis indicates the numbers of values in the interval. The actual values are also marked above the histograms.


BPLLDA

The general workflow of BPLLDA is illustrated in Figure 3. A heterogeneous network is first constructed, with nodes denoting lncRNAs or diseases. For any two diseases d(i) and d(j), the weight of the edge between them is defined as

$$w(d(i), d(j)) = \begin{cases} DS(d(i), d(j)), & \text{if } DS(d(i), d(j)) \ge T, \\ 0, & \text{otherwise,} \end{cases}$$
Figure 3

The flowchart of BPLLDA. It consists of three steps: (1) disease similarity measurement, (2) lncRNA similarity measurement, and (3) the BPLLDA algorithm.

where T is a threshold that prevents all diseases from being connected (You et al., 2017). Similarly, the weight of the edge between two lncRNAs l(i) and l(j) is

$$w(l(i), l(j)) = \begin{cases} LS(l(i), l(j)), & \text{if } LS(l(i), l(j)) \ge T, \\ 0, & \text{otherwise.} \end{cases}$$

The weight of an edge between an lncRNA l(i) and a disease d(j) is LD(l(i), d(j)); that is, the weight is 1 if they are associated and 0 otherwise. We tuned T from 0.1 to 0.5 in steps of 0.1 by leave-one-out cross-validation (LOOCV) and finally chose T = 0.2.

For a given lncRNA node l(i) and disease node d(j), we performed a depth-first search (Hopcroft and Tarjan, 1974) to identify all acyclic (simple) paths between them. To avoid long paths, we restricted the maximum number of edges in a path to τ (denoted L in the parameter analysis below). An LOOCV search over τ = 1 to 4 led us to set τ = 3. Intuitively, l(i) and d(j) tend to be associated if many paths with high edge weights connect them. Therefore, a score measuring their association confidence, score(l(i), d(j)), is defined over the set P = {p1, p2, …} of paths connecting l(i) and d(j): for each path p, it combines ∏p, the product of the weights of all edges in p, with a decay function Fdecay(p). Because long paths should contribute little to the total score, Fdecay(p) decreases with len(p), the length of path p, at a rate governed by the decay factor α, which is set to 2.26 based on previous studies (Ba-Alawi et al., 2016; You et al., 2017). The higher score(l(i), d(j)), the more likely l(i) and d(j) are associated.
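A minimal sketch of the network construction and bounded depth-first path enumeration described above. Three details are assumptions of this sketch rather than statements from the text: similarity edges are kept when the weight is at least T; each path contributes Fdecay(p) multiplied by its edge-weight product; and the default `power_decay`, len(p) raised to the power -α, is a hypothetical placeholder for Fdecay (only α = 2.26 comes from the text). All function names are illustrative.

```python
from typing import Callable, Dict, Sequence, Tuple

Node = Tuple[str, int]  # ("l", i) is lncRNA l(i); ("d", j) is disease d(j)
Graph = Dict[Node, Dict[Node, float]]


def build_network(ld: Sequence[Sequence[int]], ls: Sequence[Sequence[float]],
                  ds: Sequence[Sequence[float]], t: float) -> Graph:
    """Undirected weighted heterogeneous graph; similarity edges need weight >= t."""
    g: Graph = {("l", i): {} for i in range(len(ls))}
    g.update({("d", j): {} for j in range(len(ds))})

    def add_edge(u: Node, v: Node, w: float) -> None:
        g[u][v] = w
        g[v][u] = w

    for i in range(len(ls)):
        for k in range(i + 1, len(ls)):
            if ls[i][k] >= t:  # assumed threshold rule for lncRNA-lncRNA edges
                add_edge(("l", i), ("l", k), ls[i][k])
    for j in range(len(ds)):
        for k in range(j + 1, len(ds)):
            if ds[j][k] >= t:  # assumed threshold rule for disease-disease edges
                add_edge(("d", j), ("d", k), ds[j][k])
    for i, row in enumerate(ld):
        for j, v in enumerate(row):
            if v == 1:  # lncRNA-disease edges carry weight LD(i, j) = 1
                add_edge(("l", i), ("d", j), 1.0)
    return g


def power_decay(length: int, alpha: float = 2.26) -> float:
    """Hypothetical Fdecay: len(p) ** -alpha (only alpha = 2.26 is from the text)."""
    return length ** -alpha


def path_score(g: Graph, source: Node, target: Node, tau: int = 3,
               decay: Callable[[int], float] = power_decay) -> float:
    """Sum of decay(len(p)) * (product of edge weights) over simple paths of <= tau edges."""
    total = 0.0
    on_path = {source}

    def dfs(node: Node, length: int, product: float) -> None:
        nonlocal total
        for nxt, w in g[node].items():
            if nxt in on_path:
                continue  # simple paths only: never revisit a node
            if nxt == target:
                total += decay(length + 1) * product * w  # a path ends at the target
            elif length + 1 < tau:  # extend only if another edge is still allowed
                on_path.add(nxt)
                dfs(nxt, length + 1, product * w)
                on_path.remove(nxt)

    dfs(source, 0, 1.0)
    return total


# Toy example: 2 lncRNAs, 2 diseases, T = 0.2.
ld = [[1, 0], [0, 1]]
ls = [[1.0, 0.5], [0.5, 1.0]]
ds = [[1.0, 0.8], [0.8, 1.0]]
g = build_network(ld, ls, ds, t=0.2)
# Paths l0-l1-d1 (0.5 * 1.0) and l0-d0-d1 (1.0 * 0.8), both of length 2.
print(path_score(g, ("l", 0), ("d", 1)))
```

Because the number of simple paths grows quickly with τ, this brute-force enumeration becomes expensive for larger τ; swapping in a different decay only requires passing another callable as `decay`.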

Analysis of the computational complexity

We analyzed the time and space complexity of BPLLDA. Let m be the number of diseases and n the number of lncRNAs, with m > n. The algorithm consists of two main steps. First, a heterogeneous network is constructed, which requires establishing two similarity matrices; both the time and the space complexity of this step are O(m²). Second, BPLLDA infers the probability of each lncRNA-disease association from paths with limited lengths in the network. A depth-first search identifies all acyclic paths between nodes in O((m + n)²) time per node; because there are m diseases, the time complexity of this step is O(m³). Its space complexity is O(mn), because only the prediction results need to be stored. In summary, the time and space complexity of BPLLDA are at most O(m³) and O(m²), respectively.

Results and discussions

Performance of BPLLDA in predicting lncRNA-disease associations

We applied BPLLDA, together with two popular methods, GrwLDA (Gu et al., 2017) and LRLSLDA (Chen and Yan, 2013), to the known lncRNA-disease association matrix LD. We selected these two methods for comparison because both can predict associations for novel lncRNAs and novel diseases. Two LOOCV schemes, global LOOCV and local LOOCV, were adopted to evaluate performance. Global LOOCV uses each experimentally confirmed lncRNA-disease association as the test sample once, whereas local LOOCV uses all associations of one lncRNA, or all associations of one disease, as the test samples once. The remaining known lncRNA-disease associations serve as training samples. Performance was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC). Figure 4 shows the global LOOCV ROC curves and AUCs of BPLLDA, GrwLDA, and LRLSLDA. BPLLDA achieved an AUC of 0.87117, outperforming LRLSLDA (0.81952) and GrwLDA (0.78246). Figure 5 shows the local LOOCV ROC curves and AUCs of the three methods for novel lncRNAs: BPLLDA achieved an AUC of 0.82403, about 8% and 18% higher than those of LRLSLDA (0.76542) and GrwLDA (0.69817), respectively. Finally, for novel diseases (Figure 6), the AUC of BPLLDA (0.78528) is substantially higher than those of LRLSLDA (0.65812) and GrwLDA (0.65802), an increase of about 19% over each. In summary, BPLLDA outperforms LRLSLDA and GrwLDA both in overall lncRNA-disease association prediction and in predictions related to novel lncRNAs and diseases.
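A hedged sketch of how a global LOOCV AUC can be computed for any scoring function. It assumes a simplification that is not necessarily the authors' exact protocol: each held-out association is ranked against all originally unobserved pairs, and the AUC is the average, over held-out associations, of the fraction of those pairs it outscores, with ties counted as one half. `scorer` is a hypothetical callback mapping a training LD to a full score matrix.

```python
from typing import Callable, List, Sequence

IntMatrix = List[List[int]]
Scorer = Callable[[IntMatrix], List[List[float]]]  # training LD -> score matrix


def rank_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a positive outscores a negative; ties count 1/2."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def global_loocv_auc(ld: IntMatrix, scorer: Scorer) -> float:
    """Hide each known association once, rescore, and rank it against unknown pairs."""
    known = [(i, j) for i, row in enumerate(ld) for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(ld) for j, v in enumerate(row) if v == 0]
    fold_aucs = []
    for i, j in known:
        train = [row[:] for row in ld]  # copy so the original LD is untouched
        train[i][j] = 0                 # the held-out pair becomes unlabeled
        scores = scorer(train)
        negatives = [scores[a][b] for a, b in unknown]
        fold_aucs.append(rank_auc([scores[i][j]], negatives))
    return sum(fold_aucs) / len(fold_aucs)


# Toy check with a fixed (training-independent) score matrix.
ld = [[1, 0], [1, 1], [0, 0]]
fixed = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3]]
print(global_loocv_auc(ld, lambda train: fixed))  # every held-out pair outranks all unknowns
```

For BPLLDA, `scorer` would rebuild the heterogeneous network from the training matrix and compute the path-based score for every lncRNA-disease pair. Local LOOCV differs only in hiding all associations of one lncRNA, or of one disease, in each fold.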
Figure 4

Performance evaluation of BPLLDA, LRLSLDA, and GrwLDA in predicting lncRNA-disease associations by global LOOCV.

Figure 5

Performance evaluation of BPLLDA, LRLSLDA, and GrwLDA in predicting novel lncRNA-associated diseases.

Figure 6

Performance evaluation of BPLLDA, LRLSLDA, and GrwLDA in predicting novel disease-associated lncRNAs.

Meanwhile, Table 2 lists the precision of BPLLDA at different prediction-score ranges in the global LOOCV. In general, the higher the score, the more likely the disease is associated with the lncRNA. Precision exceeds 0.9 when the prediction score is at least 21.58.
Table 2

Precision of BPLLDA on global LOOCV.

Prediction score | Precision
1.002–9.929 | ≥ 0.134
10.028–17.601 | ≥ 0.446
21.580–24.391 | ≥ 0.933
25.778–37.757 | 1

Effects of parameters

BPLLDA has two model parameters: the maximum path length L and the weight threshold T. We tested their effects on the LOOCV AUC for L = 2, 3, 4 and T = 0.2, 0.4, 0.5; Table 3 lists the results. The parameter L has a strong effect on the performance of BPLLDA, and the best AUC is achieved at L = 3. In contrast, T has only a minor effect. To illustrate this further, we fixed L = 3 and varied T from 0.1 to 0.5 in steps of 0.1 (Table 4). The AUC values range from 0.85568 to 0.87117, a difference of only about 2%.
Table 3

Tuning two model parameters: the maximum path length L and the weight threshold T by LOOCV.

Threshold T | L = 2 | L = 3 | L = 4
0.2 | 0.83903 | 0.87117 | *
0.4 | 0.82043 | 0.85568 | 0.81205
0.5 | 0.81761 | 0.85959 | 0.80830

The value in each cell represents the LOOCV AUC.

*The combination T = 0.2, L = 4 was not computed because it took more than 48 h.

Table 4

The effects of T on AUC when fixing L = 3.

T | 0.1 | 0.2 | 0.3 | 0.4 | 0.5
AUC | 0.87102 | 0.87117 | 0.86889 | 0.85568 | 0.85959

Effects of Gaussian interaction profile kernel similarity for lncRNAs and diseases

Disease similarity and lncRNA similarity are obtained by integrating disease semantic similarity and lncRNA functional similarity with the Gaussian interaction profile kernel similarity for diseases and lncRNAs. We tested the effects of the Gaussian interaction profile kernel similarity on LOOCV with L = 3 and T = 0.2 under four settings: (1) using neither the lncRNA nor the disease kernel similarity; (2) using only the kernel similarity for lncRNAs (GL); (3) using only the kernel similarity for diseases (GD); and (4) using both. The results are summarized in Table 5. The two similarities substantially influence the LOOCV AUC, and the best AUC (0.87117) is achieved when both are included in the model.
Table 5

The effects of the Gaussian interaction profile kernel similarity for lncRNAs and diseases on LOOCV.

Neither GL nor GD | GL only | GD only | GL and GD
0.78718 | 0.79036 | 0.80924 | 0.87117

The value in each cell represents LOOCV AUC.


Case studies on predicted lncRNA-disease associations

LncRNAs are known to play critical roles in the development of many diseases. To further evaluate the ability of BPLLDA to infer novel lncRNA-disease associations, we used all known lncRNA-disease associations in LD as training data and ranked the novel (unobserved) lncRNA-disease pairs by their BPLLDA prediction scores. To validate the predictions, we used the latest version of the LncRNADisease database, which curates 1,766 distinct known lncRNA-disease associations among 888 lncRNAs and 328 diseases. Table 6 lists the top five lncRNAs predicted for each of three diseases, cervical cancer, glioma, and non-small-cell lung cancer (NSCLC), and Supplementary Table 1 lists the paths for cervical cancer. Figure 7 shows the associations between the three diseases and their top 10 predicted lncRNAs.
Table 6

The top five lncRNA candidates predicted for cervical cancer, glioma, and non-small-cell lung cancer.

Disease | lncRNA | Evidence
Cervical cancer | MEG3 | LncRNADisease (Zhang J. et al., 2016)
Cervical cancer | PVT1 | LncRNADisease (Yang et al., 2016)
Cervical cancer | CDKN2B-AS1 | LncRNADisease (Zhang D. et al., 2016)
Cervical cancer | HOTAIR | LncRNADisease (Huang et al., 2014)
Cervical cancer | GAS5 | LncRNADisease (Cao et al., 2014)
Glioma | H19 | LncRNADisease (Shi et al., 2014)
Glioma | MALAT1 | LncRNADisease (Ma et al., 2015)
Glioma | PVT1 | (Zou et al., 2017)
Glioma | HOTAIR | LncRNADisease (Ke et al., 2015)
Glioma | GAS5 | LncRNADisease (Zhao X. et al., 2015)
Non-small-cell lung cancer | H19 | LncRNADisease (Zhang E. et al., 2016)
Non-small-cell lung cancer | MEG3 | LncRNADisease (Lu et al., 2013)
Non-small-cell lung cancer | HOTAIR | LncRNADisease (Liu X. H. et al., 2013)
Non-small-cell lung cancer | PVT1 | LncRNADisease (Yang et al., 2014)
Non-small-cell lung cancer | CDKN2B-AS1 | LncRNADisease (Nie et al., 2015)
Figure 7

Network view of the top 10 predicted lncRNAs for cervical cancer, glioma, and non-small-cell lung cancer.

Cervical cancer is a cancer of the cervix whose early symptoms are hard to detect. As the second most common cancer among women worldwide, cervical cancer causes many deaths in developing countries (Forouzanfar et al., 2011). Approximately 500,000 new cases of cervical cancer are reportedly diagnosed annually (Tewari et al., 2014). Therefore, there is an urgent need to explore its biological mechanisms and develop effective treatment strategies. Notably, all of the top five novel cervical cancer-associated lncRNAs predicted by BPLLDA were confirmed by the latest update of the LncRNADisease database. For example, the top predicted lncRNA, MEG3, which is regarded as a tumor suppressor, can inhibit tumor growth in cervical cancer by regulating miR-21-5p (Zhang J. et al., 2016). Serum PVT1 can accurately distinguish patients with cervical cancer from healthy controls (Yang et al., 2016). High expression of HOTAIR is involved in cervical cancer progression, and HOTAIR may be a potential target for diagnosis and gene therapy (Huang et al., 2014).

Glioma is considered the most common malignant tumor of the central nervous system and is characterized by aggressive blood vessel formation (Khasraw et al., 2010). Despite continuous improvements in treatments, including surgery, radiotherapy, and chemotherapy, the overall survival of patients with glioma is only about 12–14 months after diagnosis (Wang et al., 2015). The poor treatment outcome is mainly attributed to prominent tumor angiogenesis. BPLLDA also performed well in predicting glioma-associated lncRNAs: all top five predicted lncRNAs were confirmed by the latest LncRNADisease database and the literature.
For example, H19 was shown to regulate glioma development through its derived miR-675, offering an essential clue to the key role of the lncRNA-miRNA functional network in glioma (Shi et al., 2014). The expression level of MALAT1 is significantly correlated with the overall survival of patients with glioma and can serve as a convincing prognostic biomarker for these patients (Ma et al., 2015). In addition, GAS5 suppresses tumor malignancy by downregulating miR-222, which may offer a promising treatment for glioma (Zhao X. et al., 2015).

NSCLC, which includes adenocarcinoma and squamous cell carcinoma, is the predominant form of lung cancer (Siegel et al., 2012). Despite progress in clinical and experimental oncology, its prognosis remains poor. Increasing evidence indicates that ncRNAs take part in the pathogenesis of NSCLC. Again, the top five NSCLC-associated lncRNA candidates predicted by BPLLDA were validated by the literature. For example, HOTAIR is significantly upregulated in NSCLC tissues and partly regulates the invasion and metastasis of NSCLC cells by downregulating HOXA5 (Liu X. H. et al., 2013); HOTAIR is therefore a potential therapeutic target for NSCLC intervention. In addition, patients with NSCLC with high PVT1 expression have a significantly lower overall survival rate than those with low PVT1 expression (Yang et al., 2014). Finally, CDKN2B-AS1 (ANRIL) expression affects cell proliferation and apoptosis in vitro and in vivo (Nie et al., 2015) and is linked to the survival of patients with NSCLC.

Case studies on predicted novel diseases and novel lncRNAs

To test the ability of BPLLDA to predict lncRNAs associated with novel diseases, we removed all known lncRNA-disease associations of the disease under study. We selected two diseases, colorectal cancer and breast cancer (see Table 7). All top five lncRNAs predicted for colorectal cancer were confirmed by the latest LncRNADisease database, and four of the top five lncRNAs predicted for breast cancer were validated by the database or the literature.
Table 7

The top five novel disease-correlated lncRNA candidates predicted for colorectal cancer and breast cancer.

Disease | lncRNA | Evidence
Colorectal cancer | H19 | LncRNADisease (Tsang et al., 2010)
Colorectal cancer | CDKN2B-AS1 | LncRNADisease (Sun et al., 2016)
Colorectal cancer | PVT1 | LncRNADisease (Ping et al., 2018)
Colorectal cancer | MEG3 | LncRNADisease (Zhu et al., 2018)
Colorectal cancer | MALAT1 | LncRNADisease (Ji et al., 2014)
Breast cancer | H19 | LncRNADisease (Vennin et al., 2015)
Breast cancer | CDKN2B-AS1 | LncRNADisease (Xu et al., 2017)
Breast cancer | PVT1 | LncRNADisease (Guan et al., 2007)
Breast cancer | MALAT1 | LncRNADisease (Chou et al., 2016)
Breast cancer | B2 SINE RNA | Unconfirmed
Similarly, to test the ability of BPLLDA to predict diseases associated with novel lncRNAs, we removed all known lncRNA-disease associations of the lncRNA under study. As two case studies, we selected H19 and HOTAIR (see Table 8). In both cases, four of the top five predicted diseases were validated by the database or the literature. In summary, BPLLDA performs favorably in predicting lncRNAs for novel diseases and diseases for novel lncRNAs.
Table 8

The top five diseases predicted for H19 and HOTAIR when each is treated as a novel lncRNA.

lncRNA | Disease | Evidence
H19 | Prostate cancer | LncRNADisease (Zhu et al., 2014)
H19 | Tumor | (Matouk et al., 2007)
H19 | Cancer | LncRNADisease (DeBaun et al., 2002)
H19 | Breast cancer | LncRNADisease (Vennin et al., 2015)
H19 | Decreased myogenesis | Unconfirmed
HOTAIR | Cancer | LncRNADisease (Gupta et al., 2010)
HOTAIR | Breast cancer | LncRNADisease (Xue et al., 2016)
HOTAIR | Hepatocellular carcinoma | LncRNADisease (Yang et al., 2011)
HOTAIR | Prostate cancer | LncRNADisease (Zhang et al., 2015)
HOTAIR | Tumor | Unconfirmed

Conclusions

Many studies have demonstrated that lncRNAs are essential in many physiological processes related to human diseases, and they could be important biomarkers for the diagnosis, prognosis, and treatment of these diseases. However, biological experiments to validate lncRNA-disease associations are both time-consuming and costly, which motivates the development of computational prediction models. In this study, we proposed BPLLDA, a novel computational method that predicts lncRNA-disease associations based on simple paths with limited lengths in a heterogeneous network consisting of the lncRNA similarity network, the disease similarity network, and the lncRNA-disease association network. BPLLDA outperforms the two compared methods in prediction accuracy, and most of its top predicted novel lncRNA-disease associations were validated by the literature. However, BPLLDA has a few limitations. First, the available experimentally validated lncRNA-disease associations are rather incomplete. Second, lncRNA similarity is computed from known lncRNA-disease associations, and both disease semantic similarity and lncRNA functional similarity are sparse. Although this sparsity is partly remedied by integrating the Gaussian interaction profile kernel similarity for diseases and lncRNAs, BPLLDA may still produce biased predictions. Finally, the distance-decay function in BPLLDA is relatively simple and could be improved by machine learning methods.

Author contributions

JY and BL conceived the work and designed the experiments; XX, JX, BJ, and YY performed the literature search; XX, WZ, CG, and LP collected and analyzed the data; XX and JY wrote the paper. All authors have approved the manuscript.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References

1.  The enigmatic world of mRNA-like ncRNAs: their role in human evolution and in human diseases.

Authors:  Márta Széll; Zsuzsanna Bata-Csörgo; Lajos Kemény
Journal:  Semin Cancer Biol       Date:  2008-01-15

2.  Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses.

Authors:  Moran N Cabili; Cole Trapnell; Loyal Goff; Magdalena Koziol; Barbara Tazon-Vega; Aviv Regev; John L Rinn
Journal:  Genes Dev       Date:  2011-09-02

3.  Gas5 Exerts Tumor-suppressive Functions in Human Glioma Cells by Targeting miR-222.

Authors:  Xihe Zhao; Ping Wang; Jing Liu; Jian Zheng; Yunhui Liu; Jiajia Chen; Yixue Xue
Journal:  Mol Ther       Date:  2015-09-15

4.  lncRNA H19/miR-675 axis represses prostate cancer metastasis by targeting TGFBI.

Authors:  Miaojun Zhu; Qin Chen; Xin Liu; Qian Sun; Xian Zhao; Rong Deng; Yanli Wang; Jian Huang; Ming Xu; Jianshe Yan; Jianxiu Yu
Journal:  FEBS J       Date:  2014-07-21

5.  Long noncoding RNA MALAT1 associates with the malignant status and poor prognosis in glioma.

Authors:  Kang-xiao Ma; Hong-jie Wang; Xiao-rong Li; Tao Li; Gang Su; Pan Yang; Jian-wen Wu
Journal:  Tumour Biol       Date:  2015-01-23

6.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors:  Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; et al.
Journal:  Nature       Date:  2007-06-14

7.  A pituitary-derived MEG3 isoform functions as a growth suppressor in tumor cells.

Authors:  Xun Zhang; Yunli Zhou; Kshama R Mehta; Daniel C Danila; Staci Scolavino; Stacey R Johnson; Anne Klibanski
Journal:  J Clin Endocrinol Metab       Date:  2003-11

8.  H19 non coding RNA-derived miR-675 enhances tumorigenesis and metastasis of breast cancer cells by downregulating c-Cbl and Cbl-b.

Authors:  Constance Vennin; Nathalie Spruyt; Fatima Dahmani; Sylvain Julien; François Bertucci; Pascal Finetti; Thierry Chassat; Roland P Bourette; Xuefen Le Bourhis; Eric Adriaenssens
Journal:  Oncotarget       Date:  2015-10-06

9.  MEG3 Activated by Vitamin D Inhibits Colorectal Cancer Cells Proliferation and Migration via Regulating Clusterin.

Authors:  Yan Zhu; Peizhan Chen; Yisha Gao; Na Ta; Yunshuo Zhang; Jialin Cai; Yong Zhao; Shupeng Liu; Jianming Zheng
Journal:  EBioMedicine       Date:  2018-03-31

10.  NEAT1 long noncoding RNA and paraspeckle bodies modulate HIV-1 posttranscriptional expression.

Authors:  Quan Zhang; Chia-Yen Chen; Venkat S R K Yedavalli; Kuan-Teh Jeang
Journal:  MBio       Date:  2013-01-29
