NCP-BiRW: A Hybrid Approach for Predicting Long Noncoding RNA-Disease Associations by Network Consistency Projection and Bi-Random Walk.

Yanling Liu¹,², Hong Yang¹, Chu Zheng¹, Ke Wang¹, Jingjing Yan¹, Hongyan Cao¹, Yanbo Zhang¹,³,⁴.

Abstract

Long non-coding RNAs (lncRNAs) play significant roles in the disease process. Understanding the pathological mechanisms of lncRNAs during the course of various diseases will help clinicians prevent and treat diseases. With the emergence of high-throughput techniques, many biological experiments have been developed to study lncRNA-disease associations. Because experimental methods are costly, slow, and laborious, a growing number of computational models have emerged. Here, we present a new approach using network consistency projection and bi-random walk (NCP-BiRW) to infer hidden lncRNA-disease associations. First, integrated similarity networks for lncRNAs and diseases were constructed by merging similarity information. Subsequently, network consistency projection was applied to calculate space projection scores for lncRNAs and diseases, which were then introduced into a bi-random walk method for association prediction. To test model performance, we employed 5- and 10-fold cross-validation, with the area under the receiver operating characteristic curve as the evaluation indicator. The computational results showed that our method outperformed the other five advanced algorithms. In addition, the novel method was applied to another dataset in the Mammalian ncRNA-Disease Repository (MNDR) database and showed excellent performance. Finally, case studies were carried out on atherosclerosis and leukemia to confirm the effectiveness of our method in practice. In conclusion, we could infer lncRNA-disease associations using the NCP-BiRW model, which may benefit biomedical studies in the future.

Keywords: bi-random walk; integrated similarity; lncRNA-disease association prediction; network consistency projection; normalization

Year: 2022 PMID: 35495166 PMCID: PMC9043107 DOI: 10.3389/fgene.2022.862272

Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772

Introduction

Long non-coding RNAs (lncRNAs) were originally considered noise in transcriptional regulation and thought to have no biological functions (Guttman et al., 2013; Li et al., 2019). In recent decades, however, lncRNAs have attracted growing attention from researchers worldwide owing to the discovery of their critical biological functions. Increasing numbers of lncRNAs have been identified in eukaryotes (Guttman et al., 2009), and abnormal lncRNA expression has been shown to cause many human diseases, including nervous system diseases (Qureshi and Mehler, 2013; Chen et al., 2021), cardiovascular diseases (Bhatti et al., 2021; Xie et al., 2021), various cancers (Amelio et al., 2021; Taniue and Akimitsu, 2021), autoimmune diseases (Lodde et al., 2020; Zeni and Mraz, 2021), and blood diseases (Wei et al., 2013; Kirtonia et al., 2021). Therefore, searching for possible lncRNA-disease associations may help elucidate the molecular pathogenesis of human diseases and could be relevant to disease diagnosis, prognosis, prevention, and treatment in the clinical setting.

At present, researchers mainly study potential lncRNA-disease associations through biological experiment verification and computational model prediction. However, biological experiments are often costly, time-consuming, and inconclusive (Chen et al., 2017). Thus, few lncRNA-disease associations have been verified experimentally, and the use of more advanced algorithms is essential.

LncRNA-disease association predictive models can be roughly classified into two types, the first of which is machine learning-based. Chen and Yan (2013) proposed the computational model LRLSLDA, which integrates known lncRNA-disease interactions and lncRNA expression profiles and applies the Laplacian regularized least square method to predict disease-related lncRNAs. Subsequently, Chen et al. (2015) developed LRLSLDA-LNCSIM.
Under the hypothesis that lncRNAs with similar functions tend to be related to similar diseases, two new functional similarity computational models, LNCSIM1 and LNCSIM2, were developed. Then, the two models were combined with the LRLSLDA model for the prediction of lncRNA-disease associations. Yang et al. (2014) constructed a binary network for genes and diseases, and applied a network propagation algorithm to find hidden lncRNA-disease interactions. On the basis of the naïve Bayesian classifier, Zhao et al. (2015) developed a novel method to identify cancer-related lncRNAs by integrating genome, transcriptome, and regulome data and identified 707 lncRNAs. Furthermore, Lu et al. (2018) proposed SIMCLDA, which first computed disease functional similarity and lncRNA Gaussian interaction profile kernel similarity and then used principal component analysis to extract the principal eigenvector of disease and lncRNA similarity. Finally, the inductive matrix completion technique was used for association prediction. In recent years, many deep learning techniques have been developed in the field of bioinformatics. Zeng et al. (2020) developed the SDLDA model to predict lncRNA-disease interactions. SDLDA extracted the features of lncRNAs and diseases, including the linear features acquired by the singular value decomposition technique and the non-linear features obtained by the deep learning method. Zeng et al. (2021) proposed a deep matrix factorization model called DMFLDA. Based on the lncRNA-disease association matrix, the non-linear hidden layers of DMFLDA were employed to learn the latent representation of lncRNAs and diseases, which could capture more complex and non-linear lncRNA-disease associations. However, these machine learning methods require negative samples, which are difficult to obtain.

The second type of predictive model is network-based. Sun et al. (2014) constructed the RWRlncD model, in which random walk with restart was used to compute lncRNA functional similarity, and the lncRNA functional similarity network was then combined with the lncRNA-disease and disease similarity networks to form a global network. Finally, the candidate lncRNAs of specific diseases of interest were sorted. Chen (2015) developed KATZLDA, which integrated lncRNA functional similarity, lncRNA expression profiles, disease semantic similarity, Gaussian interaction profile kernel similarity, and the known lncRNA-disease pairs, and then used the KATZ method to predict the potential lncRNA-disease interactions. Wen et al. (2018) developed Lap-BiRWRHLDA. First, Laplacian normalization was applied to compute the lncRNA similarity matrix and the disease similarity matrix. Then, a heterogeneous network was constructed based on the lncRNA similarity network, the disease similarity network, and known lncRNA-disease associations. Finally, a bi-random walk algorithm was applied to this heterogeneous network to predict lncRNA-disease associations. Hu et al. (2019) proposed the BiWalkLDA model, which applied a bi-random walk method to predict hidden lncRNA-disease associations. It integrated gene ontology and interaction profiles to calculate disease similarity, and used interaction profile data to calculate lncRNA similarity, solving the cold-start problem by using the local topological structure of a new lncRNA. Xie et al. (2019) proposed NCPHLDA, which calculated the comprehensive similarity for lncRNAs and diseases and then applied a network consistency projection method to infer the interactions between lncRNAs and diseases. The most significant advantage of the network consistency projection algorithm is that it has no parameters. The network consistency projection algorithm and the bi-random walk algorithm share the characteristic that both perform their calculations on the similarity networks of lncRNAs and diseases.
Wang and Yan (2019) constructed the IDLDA model, which used an improved diffusion method to infer lncRNA-disease interactions based on a combined dataset. Recently, some hybrid computational models have emerged and shown good performance. Xie et al. (2021) designed the RWSF-BLP model to forecast lncRNA-disease interactions. The model first applied a random walk algorithm to fuse various similarity networks and then adopted bidirectional label propagation to make predictions. Yin et al. (2020) created the NCPLP model based on network consistency projection and label propagation to predict microbe-disease interactions. These biological network-based methods provide a fresh perspective and framework with which we can construct new computational models.

Here, we intend to construct a hybrid method consisting of two different methods. According to previous studies, we considered the following three factors in modeling. First, the two methods should combine properly and reasonably. Second, the combination should introduce as few additional parameters as possible, which is directly related to computational efficiency. Third, the combination of the two methods should contribute more biological information to the final result. Accordingly, in this paper, we propose a hybrid method consisting of network consistency projection and bi-random walk (NCP-BiRW) to infer lncRNA-disease interactions. First, we constructed comprehensive similarity networks for lncRNAs and diseases based on known lncRNA-disease relationships, disease semantic similarity, lncRNA functional similarity, and Gaussian interaction profile (GIP) kernel similarity for lncRNAs and diseases, so as to use more similarity information. Second, we constructed a heterogeneous network consisting of the lncRNA similarity network, the disease similarity network, and the lncRNA-disease association network. The network consistency projection method was used to compute lncRNA network projection scores and disease network projection scores. Third, we added the results of the network consistency projection algorithm to the bi-random walk algorithm and obtained the predicted scores of potential lncRNA-disease interactions. Five- and ten-fold cross-validation (CV) were adopted to verify the effectiveness of NCP-BiRW. Our results demonstrated that our method outperformed five other classical algorithms, and we showed that the model was robust when applied to another dataset. Finally, case studies on atherosclerosis and leukemia were used to further verify the validity of our model.

Materials and Methods

Long non-coding RNA-Disease Associations Dataset

We downloaded known lncRNA-disease associations from the 2017-version LncRNADisease database (Chen et al., 2013) (http://www.cuilab.cn/lncrnadisease). After conducting data quality control and data cleaning, 701 known experimentally validated interactions between 157 diseases and 82 lncRNAs were acquired, as previously reported (Fan et al., 2020). $n_l$ and $n_d$ indicate the numbers of lncRNAs and diseases, respectively. $A \in \mathbb{R}^{n_l \times n_d}$ denotes the association matrix, where $A(i,j)$ is defined as follows:

$$A(i,j) = \begin{cases} 1, & \text{if lncRNA } l_i \text{ is associated with disease } d_j \\ 0, & \text{otherwise} \end{cases}$$
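As an illustration of the definition above, the following minimal sketch assembles a binary association matrix from a list of known pairs and computes the fraction of known entries. The function names and the toy identifiers (`lnc1`, `dis1`, and so on) are hypothetical; only the counts 701, 82, and 157 come from the text.

```python
def build_association_matrix(pairs, lncrnas, diseases):
    """Return a binary matrix A with one row per lncRNA and one column per disease."""
    row = {name: i for i, name in enumerate(lncrnas)}
    col = {name: j for j, name in enumerate(diseases)}
    A = [[0] * len(diseases) for _ in lncrnas]
    for lnc, dis in pairs:
        A[row[lnc]][col[dis]] = 1  # mark a known association
    return A


def density(n_known, n_lncrnas, n_diseases):
    """Fraction of lncRNA-disease pairs that are known associations."""
    return n_known / (n_lncrnas * n_diseases)


# Toy example with placeholder identifiers (not real data).
lncrnas = ["lnc1", "lnc2", "lnc3"]
diseases = ["dis1", "dis2"]
pairs = [("lnc1", "dis1"), ("lnc2", "dis1"), ("lnc3", "dis2")]
A = build_association_matrix(pairs, lncrnas, diseases)

# Density of the dataset described in the text: 701 pairs, 82 lncRNAs, 157 diseases.
dataset_density = density(701, 82, 157)
```

With the counts reported above, the density is about 0.054, which matches the 5.4% sparsity figure quoted in the Discussion.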

Gaussian Interaction Profile Kernel Similarity for Long non-coding RNAs and Diseases

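Researchers have hypothesized that the more similar two lncRNAs are, the more likely they are to have similar interaction modes with similar diseases (van Laarhoven et al., 2011). Thus, GIP kernel similarity was used to measure the similarities of lncRNAs and diseases. Given lncRNA $l_i$ and lncRNA $l_j$, the GIP kernel similarity between the two lncRNAs can be calculated as follows:

$$KL(l_i, l_j) = \exp\left(-\gamma_l \, \lVert A(i,:) - A(j,:) \rVert^2\right)$$

$$\gamma_l = \gamma'_l \Big/ \left( \frac{1}{n_l} \sum_{i=1}^{n_l} \lVert A(i,:) \rVert^2 \right)$$

where $KL$ represents the GIP kernel similarity matrix of lncRNAs, $A(i,:)$ indicates the i-th row of the association matrix $A$, $n_l$ is the number of lncRNAs, $\gamma_l$ is the normalized kernel bandwidth, and $\gamma'_l$ is a parameter that is often set as 1 (van Laarhoven et al., 2011). Similarly, the GIP kernel similarity of diseases $d_i$ and $d_j$ is calculated as follows:

$$KD(d_i, d_j) = \exp\left(-\gamma_d \, \lVert A(:,i) - A(:,j) \rVert^2\right)$$

$$\gamma_d = \gamma'_d \Big/ \left( \frac{1}{n_d} \sum_{i=1}^{n_d} \lVert A(:,i) \rVert^2 \right)$$

where $KD$ represents the GIP kernel similarity matrix of diseases, $A(:,i)$ denotes the i-th column of $A$, $n_d$ is the number of diseases, $\gamma_d$ indicates the normalized kernel bandwidth, and $\gamma'_d = 1$.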
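The GIP kernel can be sketched in a few lines of dependency-free Python. This follows the standard formulation of van Laarhoven et al. (2011), in which the bandwidth is the parameter (set to 1) divided by the mean squared norm of the interaction profiles. The function names are illustrative, and the sketch assumes that at least one profile is non-zero.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel similarity between interaction profiles (one profile per row)."""
    n = len(profiles)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def transpose(M):
    """Swap rows and columns of a nested-list matrix."""
    return [list(col) for col in zip(*M)]


# Toy association matrix: 3 lncRNAs (rows) x 2 diseases (columns).
A = [[1, 0], [1, 0], [0, 1]]
KL = gip_kernel(A)             # lncRNA similarity uses the rows of A
KD = gip_kernel(transpose(A))  # disease similarity uses the columns of A
```

Rows with identical interaction profiles (the first two lncRNAs here) receive a similarity of 1, and the similarity decays as the profiles diverge.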

Disease Semantic Similarity

Directed acyclic graphs (DAGs) have been widely used to compute the semantic similarity between diseases when predicting potential lncRNA-disease interactions (Chen et al., 2017). Here, the disease semantic similarity was calculated as previously reported (Fan et al., 2020). First, the Medical Subject Headings (MeSH) descriptors of the diseases we needed were downloaded from the National Library of Medicine (http://www.nlm.nih.gov/) (Wang et al., 2010). We then constructed a DAG for each disease : , where represents all the ancestor nodes of (containing ), and denotes all the direct edges from parent nodes to child nodes. For a disease s in , its semantic contribution to disease is computed as follows: where denotes the semantic contribution factor and is set to 0.5 (Wang et al., 2010). is defined as: where K is the diseases set in MeSH, is the number of DAGs containing s, and represents the number of all diseases in MeSH. By accumulating the semantic contributions of all the diseases in , the following formula is used to compute the final semantic similarity of disease : In general, the similarity between the two diseases is higher if the nodes sharing in their DAGs are higher. Therefore, we compute the semantic similarity of diseases and using the following formula:
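For orientation, the sketch below implements the classical DAG-based semantic similarity of Wang et al. (2010) with a fixed contribution factor of 0.5. It is an assumption-laden simplification: the additional term defined over MeSH DAG counts that the text mentions is omitted, and the toy hierarchy, the `parents` mapping, and the function names are hypothetical.

```python
def semantic_contributions(disease, parents, delta=0.5):
    """Contribution of every ancestor (and the disease itself) to `disease`.

    `parents` maps each term to the list of its direct parent terms.
    The disease contributes 1 to itself; each step up the DAG multiplies
    the contribution by `delta`, keeping the maximum over all paths.
    """
    contrib = {disease: 1.0}
    stack = [disease]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            value = delta * contrib[node]
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                stack.append(parent)  # revisit ancestors with the improved value
    return contrib


def semantic_similarity(d1, d2, parents, delta=0.5):
    """Shared-ancestor contributions divided by the total semantic values."""
    c1 = semantic_contributions(d1, parents, delta)
    c2 = semantic_contributions(d2, parents, delta)
    shared = set(c1) & set(c2)
    numerator = sum(c1[s] + c2[s] for s in shared)
    return numerator / (sum(c1.values()) + sum(c2.values()))


# Toy hierarchy: B and C are both children of A.
parents = {"A": [], "B": ["A"], "C": ["A"]}
sim_bc = semantic_similarity("B", "C", parents)
sim_bb = semantic_similarity("B", "B", parents)
```

In the toy hierarchy the two sibling terms share only their parent, which gives a similarity of 1/3, while a term compared with itself scores 1.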

Long non-coding RNA Functional Similarity

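We computed the functional similarities of lncRNAs according to the LNCSIM model (Chen et al., 2015). Let $D(l_i)$ and $D(l_j)$ denote the corresponding disease sets of lncRNA $l_i$ and lncRNA $l_j$. The similarity between disease $d$ and the disease set of lncRNA $l_j$ ($D(l_j)$) is given by

$$S(d, D(l_j)) = \max_{d' \in D(l_j)} SV(d, d')$$

where $SV$ is the disease semantic similarity matrix. In view of the hypothesis that functionally similar lncRNAs are usually related to similar diseases, the functional similarity between lncRNAs $l_i$ and $l_j$ is computed as follows:

$$FL(l_i, l_j) = \frac{\sum_{d \in D(l_i)} S(d, D(l_j)) + \sum_{d \in D(l_j)} S(d, D(l_i))}{|D(l_i)| + |D(l_j)|}$$

where $|D(l_i)|$ denotes the number of elements in $D(l_i)$.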
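A minimal sketch of this LNCSIM-style calculation is shown below, assuming the usual best-match formulation: each disease is matched to its most similar counterpart in the other lncRNA's disease set, and the matches are averaged over both sets. The disease labels, the similarity table, and the helper names are made up for illustration.

```python
def set_similarity(disease, disease_set, sim):
    """Similarity between one disease and a set: the best match in the set."""
    return max(sim(disease, other) for other in disease_set)


def functional_similarity(set_i, set_j, sim):
    """Average best-match similarity between two lncRNAs' disease sets."""
    if not set_i or not set_j:
        return 0.0  # no associated diseases, so no evidence of similarity
    total = sum(set_similarity(d, set_j, sim) for d in set_i)
    total += sum(set_similarity(d, set_i, sim) for d in set_j)
    return total / (len(set_i) + len(set_j))


# Hypothetical semantic similarity values between placeholder diseases.
table = {frozenset(("dis1", "dis2")): 0.5}


def toy_sim(a, b):
    """Look up the toy semantic similarity; identical diseases score 1."""
    return 1.0 if a == b else table.get(frozenset((a, b)), 0.0)


fs = functional_similarity(["dis1"], ["dis1", "dis2"], toy_sim)
fs_same = functional_similarity(["dis1"], ["dis1"], toy_sim)
```

Here the first lncRNA is associated with one disease and the second with two, so the score is (1 + 1 + 0.5) / 3.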

Network Consistency Projection and Bi-Random Walk

We constructed a novel model NCP-BiRW involving network consistent projection (Xie et al., 2019) and bi-random walk (Hu et al., 2019) to forecast hidden lncRNA-disease interactions. We divided the model implementation process into three steps. Figure 1 shows the flowchart of the algorithm.

FIGURE 1

Flow chart of NCP-BiRW.

Construction of Integrated Similarity Networks for lncRNAs and Diseases

The integrated technique was adopted to obtain more similarity information. On the basis of the lncRNA GIP kernel similarity matrix (KL) and the lncRNA functional similarity matrix (FL), the integrated similarity between lncRNAs $l_i$ and $l_j$ is as follows:

Similarly, based on the disease semantic similarity matrix (SV) and the disease GIP kernel similarity matrix (KD), the integrated similarity between diseases $d_i$ and $d_j$ is as follows:

Network Consistency Projection for lncRNA and Disease Spaces

We constructed a heterogeneous network consisting of the above integrated similarity networks and the lncRNA-disease association network. The network consistency projection method was utilized to obtain more network topological information (Yin et al., 2020). Network consistency projection can be divided into lncRNA network consistency projection and disease network consistency projection (Li et al., 2019; Xie et al., 2019). The lncRNA network consistency projection fractions can be formulated as follows:

$$NCP_l(i,j) = \frac{LS(i,:) \times A(:,j)}{\lvert A(:,j) \rvert}$$

where $LS(i,:)$ is the i-th row of the lncRNA integrated similarity matrix (LS), $A(:,j)$ is the j-th column of the association matrix $A$ and represents the relevance between disease $d_j$ and all lncRNAs, $\lvert A(:,j) \rvert$ is the norm of $A(:,j)$, and $NCP_l(i,j)$ is the projection fraction of $LS(i,:)$ on $A(:,j)$. In particular, the smaller the angle between $LS(i,:)$ and $A(:,j)$, the higher the score (Bao et al., 2017). Similarly, the formula of the disease network consistency projection fractions is as follows:

$$NCP_d(i,j) = \frac{A(i,:) \times DS(:,j)}{\lvert A(i,:) \rvert}$$

where $DS(:,j)$ is the j-th column of the disease integrated similarity matrix (DS), $A(i,:)$ is the i-th row of $A$ (representing the relevance between lncRNA $l_i$ and all diseases), and $NCP_d(i,j)$ is the projection fraction of $DS(:,j)$ on $A(i,:)$.

Bi-Random Walk in the Integrated Similarity Networks of lncRNAs and Diseases

First, the integrated similarity networks, LS and DS, were normalized such that all the similarity values were between 0 and 1 (Hu et al., 2019). The formula of the normalized similarity of lncRNAs is as follows:

Similarly, the normalization of the disease similarity is as follows:

The association matrix should also be normalized, as follows:

Then, we carried out the random walk method for both the lncRNA similarity network and the disease similarity network, called bi-random walk, a global process (Zhang et al., 2018). $r_1$ and $r_2$ are designated as the maximum numbers of iterations in the lncRNA and disease similarity networks, respectively. If $r_1 > r_2$, the lncRNA similarity is considered more important in the prediction process (Hu et al., 2019). On the basis of the results of the network consistency projection, the iteration processes are as follows:

where and denote the random walk scores in the lncRNA similarity network and the disease similarity network, respectively. β is the decay factor that controls the proportion of primitive information, and NLS and NDS denote the lncRNA and disease normalized integrated similarity matrices, respectively. is the initial probability matrix of A, and the iterative function denotes the average value of and in step t. When , the algorithm ends, and we obtain the final (denoted as S), which contains all the predictive scores of lncRNA-disease pairs.
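The two stages described in this section can be sketched as follows. Several details are assumptions rather than the authors' exact formulas: the projection scores use the usual network consistency projection form (a similarity row or column projected onto an association column or row), the similarity matrices are normalized symmetrically by their row sums, the association matrix and projection scores are scaled to sum to 1, and the scaled projection scores serve as the restart terms of the walk. All function names are illustrative.

```python
import math


def matmul(X, Y):
    """Matrix product for plain nested lists."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def ncp_scores(LS, DS, A):
    """Projection scores in the lncRNA space and in the disease space."""
    n_l, n_d = len(A), len(A[0])
    col_norm = [math.sqrt(sum(A[i][j] ** 2 for i in range(n_l))) for j in range(n_d)]
    row_norm = [math.sqrt(sum(x * x for x in row)) for row in A]
    LA, AD = matmul(LS, A), matmul(A, DS)
    ncp_l = [[LA[i][j] / col_norm[j] if col_norm[j] else 0.0 for j in range(n_d)]
             for i in range(n_l)]
    ncp_d = [[AD[i][j] / row_norm[i] if row_norm[i] else 0.0 for j in range(n_d)]
             for i in range(n_l)]
    return ncp_l, ncp_d


def sym_normalize(S):
    """Assumed normalization: S(i,j) / sqrt(rowsum_i * rowsum_j)."""
    deg = [sum(row) for row in S]
    n = len(S)
    return [[S[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] and deg[j] else 0.0
             for j in range(n)] for i in range(n)]


def sum_normalize(M):
    """Scale a matrix so that its entries sum to 1."""
    total = sum(sum(row) for row in M)
    return [[x / total if total else 0.0 for x in row] for row in M]


def bi_random_walk(NLS, NDS, R0, P_l, P_d, beta=0.8, r1=1, r2=1):
    """Walk on both similarity networks and average the two walks at each step."""
    R = [row[:] for row in R0]
    for t in range(1, max(r1, r2) + 1):
        walks = []
        if t <= r1:  # step on the lncRNA similarity network
            W = matmul(NLS, R)
            walks.append([[beta * w + (1 - beta) * p for w, p in zip(wr, pr)]
                          for wr, pr in zip(W, P_l)])
        if t <= r2:  # step on the disease similarity network
            W = matmul(R, NDS)
            walks.append([[beta * w + (1 - beta) * p for w, p in zip(wr, pr)]
                          for wr, pr in zip(W, P_d)])
        R = [[sum(vals) / len(walks) for vals in zip(*rows)] for rows in zip(*walks)]
    return R


# Toy example: 2 lncRNAs, 1 disease, one known association.
LS = [[1.0, 0.5], [0.5, 1.0]]
DS = [[1.0]]
A = [[1], [0]]
ncp_l, ncp_d = ncp_scores(LS, DS, A)
S = bi_random_walk(sym_normalize(LS), sym_normalize(DS), sum_normalize(A),
                   sum_normalize(ncp_l), sum_normalize(ncp_d))
```

In the toy example the second lncRNA has no known association, yet it receives a non-zero score because it is similar to the first lncRNA; this propagation of evidence through the similarity networks is the behavior the hybrid method relies on.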

Results

Performance Evaluation

We used k-fold CV to evaluate the model performance. In k-fold CV, known lncRNA-disease pairs are divided into k subparts, with k-1 parts as the training set and the remaining part as the testing set. Here, we chose k = 5 (5-fold CV) and k = 10 (10-fold CV). All unknown associations were regarded as candidate samples. The predicted score of each lncRNA-disease pair was obtained using NCP-BiRW. The predicted scores of the test and candidate samples were sorted together. The receiver operating characteristic (ROC) curve was drawn according to the false positive rate (FPR) and the true positive rate (TPR) under different thresholds. The area under the ROC curve (AUC) was employed as a metric to assess the overall performance of our method. The closer the AUC is to 1, the better the model performs.
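The evaluation protocol can be sketched as below. The AUC is computed through its rank interpretation (the probability that a held-out known pair scores higher than a candidate pair, with ties counted as one half), which equals the area under the ROC curve. The `predict` argument stands in for any scoring method and is hypothetical, as are the function names; the sketch assumes k does not exceed the number of known pairs.

```python
import random


def k_fold_splits(known_pairs, k, seed=0):
    """Shuffle the known pairs and deal them into k folds."""
    pairs = list(known_pairs)
    random.Random(seed).shuffle(pairs)
    return [pairs[i::k] for i in range(k)]


def auc(pos_scores, neg_scores):
    """Probability that a positive outranks a negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def cross_validate(A, k, predict, seed=0):
    """Mean AUC over k folds; `predict` maps a training matrix to a score matrix."""
    known = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 1]
    unknown = [(i, j) for i, row in enumerate(A) for j, v in enumerate(row) if v == 0]
    aucs = []
    for fold in k_fold_splits(known, k, seed):
        train = [row[:] for row in A]
        for i, j in fold:
            train[i][j] = 0  # hide the test associations
        S = predict(train)
        aucs.append(auc([S[i][j] for i, j in fold], [S[i][j] for i, j in unknown]))
    return sum(aucs) / len(aucs)


example_auc = auc([0.9, 0.8], [0.1, 0.8])
```

Any scoring function with the same interface can be passed as `predict`, which makes it straightforward to compare several methods under identical folds.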

Effects of Parameters

In this research, there were three parameters: $\beta$, $r_1$, and $r_2$. $\beta$ denotes the decay factor in the bi-random walk, and its value ranges from 0 to 1. To test the performance of the model, we increased $\beta$ from 0.1 to 0.9 in steps of 0.1. The maximum numbers of iterations in the lncRNA and disease similarity networks ($r_1$ and $r_2$, respectively) were varied from 1 to 5 with a step size of 1. The grid search algorithm was used to determine the proper values of these parameters. By experimental comparison, the best parameter values were $\beta$ = 0.8 and $r_1$ = $r_2$ = 1 in the 5-fold CV framework, whereas in the 10-fold CV framework, the optimal values were $\beta$ = 0.7 and $r_1$ = $r_2$ = 1. The experimental results of the grid search are listed in Supplementary Table S1. In the 10-fold CV framework, when $\beta$ = 0.7 and $r_1$ = $r_2$ = 1, the AUC value was close to the best AUC value. Finally, we set $\beta$ = 0.8 and $r_1$ = $r_2$ = 1 in the proposed model. Figure 2 shows the experimental effects of different $r_1$ and $r_2$ values when $\beta$ = 0.8 in the 5-fold CV framework. The optimal parameters corresponding to the best AUC were $r_1$ = $r_2$ = 1.
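The search space described here contains 9 values of the decay factor and 5 × 5 iteration settings, or 225 combinations in total. A minimal sketch of such a grid search follows; the `evaluate` callable is a hypothetical stand-in for running cross-validation with the given parameters and returning an AUC.

```python
from itertools import product


def grid_search(evaluate):
    """Return (best_score, beta, r1, r2) over the grid described in the text."""
    betas = [round(0.1 * step, 1) for step in range(1, 10)]  # 0.1, 0.2, ..., 0.9
    iterations = range(1, 6)                                  # 1, 2, ..., 5
    best = None
    for beta, r1, r2 in product(betas, iterations, iterations):
        score = evaluate(beta, r1, r2)
        if best is None or score > best[0]:
            best = (score, beta, r1, r2)
    return best


# Toy objective that peaks at beta = 0.8 and r1 = r2 = 1 (not real results).
calls = []


def toy_evaluate(beta, r1, r2):
    calls.append((beta, r1, r2))
    return -abs(beta - 0.8) - (r1 - 1) - (r2 - 1)


best = grid_search(toy_evaluate)
```

Because every combination is evaluated independently, the loop can be parallelized or cached without changing the result.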

FIGURE 2

Results for $r_1$ and $r_2$ when $\beta$ = 0.8 in 5-fold CV.

Comparison With Other Methods

To assess the model's performance, we compared NCP-BiRW with five other popular algorithms: KATZLDA (Chen, 2015), Lap-BiRWRHLDA (Wen et al., 2018), BiWalkLDA (Hu et al., 2019), NCPHLDA (Xie et al., 2019), and IDLDA (Wang and Yan, 2019). For each model, we used the parameter values given in the original reference. First, we conducted 5-fold CV, as shown in Figure 3. The AUC of NCP-BiRW was 0.8982, which was better than the AUC values of the other five methods (KATZLDA: 0.8622, Lap-BiRWRHLDA: 0.8642, BiWalkLDA: 0.8702, NCPHLDA: 0.8338, and IDLDA: 0.8424). Then, we conducted 10-fold CV, in which NCP-BiRW again performed best, with an AUC of 0.9050 (Figure 3) (KATZLDA: 0.8646, Lap-BiRWRHLDA: 0.8666, BiWalkLDA: 0.8706, NCPHLDA: 0.8862, and IDLDA: 0.8413). In addition, we considered the following two models: 1) NCP, i.e., NCP-BiRW without bi-random walk; and 2) BiRW, i.e., NCP-BiRW without network consistency projection. We then compared the two models with NCP-BiRW, as shown in Figure 4. The results showed that our hybrid method was better than either single method. In summary, NCP-BiRW achieved the best performance for predicting lncRNA-disease interactions using the dataset from the LncRNADisease database.

FIGURE 3

ROCs and AUCs of the six methods using the LncRNADisease database.

FIGURE 4

Comparisons of NCP, BiRW, and NCP-BiRW using the LncRNADisease database.

Robustness of Evaluation Using Another Dataset

We then applied NCP-BiRW to another dataset to determine whether our method could still achieve outstanding performance. We chose the Mammalian ncRNA-Disease Repository (MNDR) database (Cui et al., 2018), from which the known lncRNA-disease interactions were downloaded. After data cleaning, 1,680 known interactions between 190 diseases and 89 lncRNAs were selected (Fan et al., 2020). We performed the same experiment as above, and Figure 5 shows the final computational results. In 5-fold CV, the AUC of NCP-BiRW was 0.9556, which was better than those of KATZLDA (0.9450), Lap-BiRWRHLDA (0.9374), BiWalkLDA (0.9412), NCPHLDA (0.9355), and IDLDA (0.9452). In 10-fold CV, NCP-BiRW also performed the best. The AUCs of KATZLDA, Lap-BiRWRHLDA, BiWalkLDA, NCPHLDA, IDLDA and NCP-BiRW were 0.9466, 0.9380, 0.9420, 0.9539, 0.9466 and 0.9591, respectively. The excellent performance of NCP-BiRW using the MNDR database demonstrated the robustness of our model.

FIGURE 5

ROCs and AUCs of the six methods using the MNDR database.

Case Studies

Next, we chose atherosclerosis (AS) and leukemia as model diseases and conducted case studies on them to further confirm the predictive performance of NCP-BiRW. The top 10 candidate lncRNAs predicted by our method for the two diseases are listed in Tables 1 and 2. The lncRNAs in the tables were then verified using the MNDR database (Ning et al., 2021) and the Lnc2Cancer database (Gao et al., 2021) (http://bio-bigdata.hrbmu.edu.cn/lnc2cancer).

TABLE 1

Top ten lncRNAs for atherosclerosis.

Rank	LncRNA	Evidence
1	MALAT1	MNDR
2	MEG3	MNDR
3	HOTAIR	MNDR
4	PVT1	Unknown
5	GAS5	MNDR
6	UCA1	MNDR
7	TUG1	MNDR
8	BCYRN1	Unknown
9	XIST	MNDR
10	SPRY4-IT1	Unknown

TABLE 2

Top ten lncRNAs for leukemia.

Rank	LncRNA	Evidence
1	H19	Lnc2Cancer
2	MEG3	Lnc2Cancer
3	MALAT1	Lnc2Cancer
4	HOTAIR	MNDR, Lnc2Cancer
5	PVT1	MNDR, Lnc2Cancer
6	GAS5	MNDR, Lnc2Cancer
7	UCA1	Lnc2Cancer
8	TUG1	Lnc2Cancer
9	MIAT	Lnc2Cancer
10	XIST	MNDR, Lnc2Cancer

AS is a chronic inflammatory disease characterized by lipid-rich plaques in the artery wall (Vigario et al., 2020). AS is the primary cause of most cardiovascular diseases, including acute myocardial infarction and stroke (Li et al., 2020). Many lncRNAs have been shown to function in AS, the central underlying pathology of cardiovascular diseases (Josefs and Boon, 2020). In this study, we predicted the top 10 lncRNAs associated with AS (Table 1). Seven of these top 10 lncRNAs were verified using the MNDR database. For example, MALAT1 (ranked first) inhibits AS through miR-155 and SOCS1. Specifically, MALAT1 inhibits the release of inflammatory cytokines and blocks apoptosis by sponging miR-155 and enhancing SOCS1 expression to suppress the Janus kinase/signal transducer and activator of transcription pathway (Li et al., 2018). Additionally, MEG3 (ranked second), an endothelial-enriched lncRNA, acts as a competing endogenous RNA against miR-223, which may explain the anti-AS functions of melatonin (Zhang et al., 2018). HOTAIR (ranked third) is related to the progression of various cancers; however, its functions in AS are still unclear. Notably, HOTAIR has been shown to control AS progression by sponging miR-330-5p in THP-1 cells (Liu et al., 2019).

Leukemia, a type of blood or bone marrow cancer, involves excessive production of white cells (Luo et al., 2015). There are four main types of leukemia: acute lymphocytic leukemia, acute myeloid leukemia (AML), chronic lymphocytic leukemia, and chronic myeloid leukemia (CML) (Siegel et al., 2021). In 2020, over 31,000 people died of leukemia worldwide (Siegel et al., 2021). Recent studies have demonstrated the relationships between lncRNAs and the pathophysiology of leukemia (Gao et al., 2020). The top 10 predicted leukemia-related lncRNAs are listed in Table 2. All 10 were validated using the Lnc2Cancer database and the MNDR database. MALAT1 (ranked third) promotes the survival of CML cells and stimulates the cell cycle and imatinib resistance by sponging miR-328, highlighting the vital roles of MALAT1 as a microRNA sponge in CML and supporting the application of lncRNA-targeted therapies in the treatment of CML (Wen et al., 2018). Additionally, TUG1 (ranked eighth) promotes the progression of AML through the miR-370-3p/mitogen-activated protein kinase 1 (MAPK1)/extracellular signal-regulated kinase (ERK) signaling pathway. The MAPK1/ERK signaling pathway inhibits the epithelial-mesenchymal transition and thus blocks the migration and invasion of AML cells (Li et al., 2019). Studies have shown that MIAT (ranked ninth) is highly expressed in various solid tumors in humans and promotes AML progression by negatively regulating miR-495, which may be a promising therapeutic target in patients with AML (Wang et al., 2019).

Discussion

According to a substantial body of evidence, lncRNAs are critical for disease research. Identification of hidden lncRNA-disease pairs may provide insights into the pathological mechanisms of diseases, disease prevention, diagnosis, and treatment. Experimental techniques have been used to identify unknown lncRNA-disease interactions; however, these approaches are slow and costly. Therefore, computing methods have been developed as alternative approaches.

Here, we constructed a new algorithm, NCP-BiRW, based on network consistency projection and bi-random walk. First, we integrated two similarity networks, i.e., one for diseases combining disease GIP kernel similarity and disease semantic similarity, and the other for lncRNAs combining lncRNA functional similarity and lncRNA GIP kernel similarity. Then, we used NCP-BiRW to forecast lncRNA-disease interactions on the LncRNADisease database. To validate its superiority, NCP-BiRW was compared with five classical models (KATZLDA, Lap-BiRWRHLDA, BiWalkLDA, NCPHLDA, and IDLDA) based on 5- and 10-fold CV frameworks. The AUCs of NCP-BiRW were 0.8982 and 0.9050 for the two frameworks, respectively. To further test the stability of NCP-BiRW, we applied the six methods to the MNDR database. Under the same experimental process, NCP-BiRW again performed best. Furthermore, case studies on AS and leukemia were used to validate the predictive performance of our algorithm in practice, and the prediction accuracies of the top 10 lncRNAs in AS and leukemia were 70% and 100%, respectively.

The reasons for the outstanding performance of our model are as follows. First, a considerable amount of biological information about lncRNAs and diseases was applied. Indeed, we used disease semantic similarity, GIP kernel similarity, and lncRNA functional similarity to construct similarity networks. Second, we did not use negative samples. Third, network consistency projection was applied to make full use of network topological information. Moreover, no parameters were necessary for this step, so the computational efficiency was improved. Finally, the model added the results of network consistency projection into the bi-random walk, so more network topological information was added to the initial association matrix in the computing process of the bi-random walk method. By conducting random walks on two similarity networks, the similarity information of lncRNAs and diseases is used reasonably and fully. Together, these factors improve the performance of the algorithm. In the future, our model can be used for other association predictions, such as miRNA-disease, gene-disease, and drug-disease associations.

Despite these advantages, there are still some limitations of the NCP-BiRW framework. First, the proportion of known lncRNA-disease interactions in the LncRNADisease database is only 5.4%, and the original association matrix is thus very sparse; this could influence various calculations, including GIP kernel similarity, network consistency projection, and bi-random walk. Second, in this study, we only considered two factors, lncRNAs and diseases; more biological information on different factors (such as genes, proteins, and other types of RNAs) may provide more evidence for the prediction of lncRNA-disease interactions. Therefore, more valuable biological information will be necessary in the future. Finally, NCP-BiRW is a network-based method. With the emergence of new methods in different fields, developing more algorithms that integrate approaches from various fields is essential. In our future studies, we plan to apply multiple types of data with more biological information to association prediction models in order to yield more accurate predictions.

49 in total

1. Novel human lncRNA-disease association inference based on lncRNA expression profiles.

Authors: Xing Chen; Gui-Ying Yan
Journal: Bioinformatics Date: 2013-09-02 Impact factor: 6.937

2. Gaussian interaction profile kernels for predicting drug-target interaction.

Authors: Twan van Laarhoven; Sander B Nabuurs; Elena Marchiori
Journal: Bioinformatics Date: 2011-09-04 Impact factor: 6.937

3. Long noncoding RNA MIAT promotes the progression of acute myeloid leukemia by negatively regulating miR-495.

Authors: Gaoyan Wang; Xuerong Li; Liang Song; Hua Pan; Jian Jiang; Lirong Sun
Journal: Leuk Res Date: 2019-10-31 Impact factor: 3.156

4. Long non-coding RNAs: novel targets for nervous system disease diagnosis and therapy. (Review)

Authors: Irfan A Qureshi; Mark F Mehler
Journal: Neurotherapeutics Date: 2013-10 Impact factor: 7.620

5. Role of long non-coding RNAs in normal and malignant hematopoiesis. (Review)

Authors: Panpan Wei; Bowei Han; Yueqin Chen
Journal: Sci China Life Sci Date: 2013-09-12 Impact factor: 6.038

6. Tolerogenic vaccines for the treatment of cardiovascular diseases. (Review)

Authors: Fernando Lozano Vigario; Johan Kuiper; Bram Slütter
Journal: EBioMedicine Date: 2020-06-20 Impact factor: 8.143

7. DNILMF-LDA: Prediction of lncRNA-Disease Associations by Dual-Network Integrated Logistic Matrix Factorization and Bayesian Optimization.

Authors: Yan Li; Junyi Li; Naizheng Bian
Journal: Genes (Basel) Date: 2019-08-12 Impact factor: 4.096