
Computational Methods and Applications for Identifying Disease-Associated lncRNAs as Potential Biomarkers and Therapeutic Targets.

Congcong Yan¹, Zicheng Zhang¹, Siqi Bao¹, Ping Hou¹, Meng Zhou¹, Chongyong Xu², Jie Sun³.

Abstract

Long non-coding RNAs (lncRNAs) have been recognized as critical components of a broad genomic regulatory network and play pivotal roles in physiological and pathological processes. Identification of disease-associated lncRNAs is becoming increasingly crucial for fundamentally improving our understanding of molecular mechanisms of disease and developing novel biomarkers and therapeutic targets. Considering lower efficiency and higher time and labor cost of biological experiments, computer-aided inference of disease-associated RNAs has become a promising avenue for facilitating the study of lncRNA functions and provides complementary value for experimental studies. In this study, we first summarize data and knowledge resources publicly available for the study of lncRNA-disease associations. Then, we present an updated systematic overview of dozens of computational methods and models for inferring lncRNA-disease associations proposed in recent years. Finally, we explore the perspectives and challenges for further studies. Our study provides a guide for biologists and medical scientists to look for dedicated resources and more competent tools for accelerating the unraveling of disease-associated lncRNAs.


Keywords: bioinformatics; computational method; disease; lncRNA-disease association; long non-coding RNAs

Year: 2020 PMID: 32585624 PMCID: PMC7321789 DOI: 10.1016/j.omtn.2020.05.018

Source DB: PubMed Journal: Mol Ther Nucleic Acids ISSN: 2162-2531 Impact factor: 8.886

Introduction

Advances in genomic and transcriptional analyses have markedly expanded our knowledge of the genomic dark matter and revealed that only about 2% of the human genome encodes proteins, whereas the vast majority is transcribed into non-coding RNAs (ncRNAs). Long ncRNAs (lncRNAs), which constitute the largest class of ncRNAs, are arbitrarily defined as ncRNAs more than 200 nt in length. There is increasing evidence that lncRNAs are hidden critical components of a broad genomic regulatory network involved in gene transcription, epigenetic regulation, and post-transcriptional regulation, and they thus play pivotal roles in a wide variety of biological processes. A large number of lncRNAs with oncogenic or tumor-suppressor function have been found, highlighting the emerging role of lncRNAs in complex diseases.7, 8, 9 Identification of disease genes is a significant and challenging task in biomedical research. Systematic identification of disease-associated lncRNAs not only contributes to our understanding of the underlying molecular mechanisms of complex diseases, but lncRNAs have also been shown to have intrinsic advantages over protein-coding genes in disease diagnosis, prognosis, and treatment.10, 11, 12, 13, 14, 15, 16, 17 Despite increasing efforts to explore the function of lncRNAs and their implications in diseases, the vast majority of lncRNAs are not functionally well characterized, and their associations with diseases remain unknown. Low-throughput biological experiments in vivo or in vitro have been extensively used to dissect disease-related lncRNAs. Although the exact association between disease and lncRNAs, as well as the pathogenic mechanism of lncRNAs, can be elucidated through in vivo or in vitro experiments, these low-throughput biological experiments tend to be time-consuming, expensive, and inefficient when faced with tens of thousands of lncRNAs with unknown function. 
With the application of high-throughput technologies (e.g., microarray and next-generation sequencing) to disease transcriptomes, a large number of dysregulated lncRNAs have been identified as associated with disease. However, the results of high-throughput technologies contain much noise, and most of the dysregulated lncRNAs tend to be unrelated rather than causal, because aberrant expression in disease is not sufficient evidence to confirm a causal association between lncRNAs and diseases. With large-scale heterogeneous data resources of lncRNAs and diseases now available, great efforts have been devoted to system-level inference of lncRNA-disease associations through computational or bioinformatics approaches, which complement wet-lab experiments. In this study, we present an overview of the computer-aided inference of the lncRNA-disease association. First, data sources accessible to the lncRNA-disease association study are introduced in detail. Second, novel computational methods and software tools, as well as their application in lncRNA-disease association prediction, are summarized and reviewed. Finally, we explore the future perspectives and challenges in this field.

Results

In this section, we reviewed dozens of novel computational methods in inferring the lncRNA-disease association proposed in recent years. Based on the core idea of the algorithm, these computational methods could be divided into four categories: (1) matrix completion-based methods, (2) recommendation algorithm-based methods, (3) resource allocation-based methods, and (4) integration-based methods.

Matrix Completion-Based Methods

Figure 1

Schematic Workflow of Matrix Completion-Based Methods

Three matrices (including the lncRNA-disease association matrix, lncRNA-lncRNA matrix, and disease-disease matrix) were first obtained as the input data. Then, feature extraction was accomplished based on the above three matrices to obtain lncRNA feature vectors and disease feature vectors. Finally, matrix completion methods were performed on the lncRNA-disease association matrix to acquire the lncRNA-disease association.

Table 1

Overview of Categories and Corresponding Method/Tool for Acquiring lncRNA-lncRNA Association

Categories	Method/Tool	Data Types	Data Resources	References
Sequence similarity	EMBOSS Needle tool	lncRNA sequence	LncRNADisease, UCSC, LNCipedia	Needleman and Wunsch³⁴
Functional similarity	LNCSIM	lncRNA-disease association, MeSH descriptors	LncRNADisease, Lnc2Cancer, MNDR, MeSH	Chen et al.³⁵
Functional similarity	ILNCSIM	lncRNA-disease association, MeSH descriptors	MNDR, Lnc2Cancer, MeSH	Huang et al.³⁶
Functional similarity	NA	lncRNA-gene association, protein-protein interaction	LncRNA2Target, StarBase, HPRD	Paik et al.³⁷
Functional similarity	NA	lncRNA-miRNA association	StarBase	Zhao et al.⁴⁰
Expression similarity	Spearman/Pearson correlation	lncRNA expression profiles	Array Express, UCSC Genome Bioinformatics	Chen and Yan⁴¹
Cosine similarity	cosine similarity	lncRNA-disease association	MNDR, Lnc2Cancer, LncRNADisease	Cheng et al.⁴²

NA, not applicable.

Li et al. developed a computational model of faster randomized matrix completion for latent disease lncRNA association (named FRMCLDA) that uses the faster singular value threshold (fSVT) algorithm to predict lncRNA-disease associations based on the idea of matrix completion. FRMCLDA uses the disease similarity matrix, lncRNA similarity matrix, lncRNA-disease association matrix, and the transpose of the association matrix to construct an adjacency matrix, and it improves prediction performance by fitting this adjacency matrix. LDAPM and SIMCLDA also use a matrix completion approach, but the difference is that LDAPM represents the approximated matrix as the product of two matrices. TSSR exploits learned representation matrices as feature matrices to reconstruct the original matrix.
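The matrix completion idea described above can be illustrated with a minimal sketch. It uses plain iterative soft-thresholding of singular values, which is a simpler stand-in and not the fSVT algorithm of FRMCLDA; the toy matrix, the mask, the function name `soft_impute`, and the values of `tau` and `n_iter` are all assumptions made for illustration.

```python
import numpy as np

def soft_impute(assoc, observed, tau=0.5, n_iter=200):
    """Complete a matrix by repeated soft-thresholding of its singular values.

    assoc    : lncRNA x disease matrix (1 = known association, 0 = none)
    observed : boolean mask, True where the entry of `assoc` is trusted
    tau      : shrinkage applied to every singular value (assumed value)
    """
    filled = np.where(observed, assoc, 0.0)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - tau, 0.0)                   # shrink singular values
        low_rank = (u * s) @ vt                        # current low-rank estimate
        filled = np.where(observed, assoc, low_rank)   # keep trusted entries fixed
    return low_rank

# Hypothetical toy data: 5 lncRNAs x 4 diseases forming two blocks.
assoc = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
observed = np.ones_like(assoc, dtype=bool)
observed[0, 1] = False   # a hidden true association
observed[3, 0] = False   # a hidden true non-association

scores = soft_impute(assoc, observed)
```

In this toy setting the hidden association at position (0, 1) receives a clearly higher score than the hidden non-association at (3, 0), which is the behavior a completion-based predictor relies on.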

Resource-Allocation-Based Methods

Resource allocation is the allocation of available resources to each node. To predict the lncRNA-disease association, resource allocation is based on the initial value of the multi-data source matrices as a possible value for the relationship between nodes. The process is demonstrated in Figure 2. Resource allocation-based methods are built on data from multiple sources, such as lncRNA-disease association, miRNA-disease association, miRNA-lncRNA association, and so on. The heterogeneous multilayer network is constructed, and the edges are weighted according to the corresponding values of the matrix. The lncRNA-disease scoring matrix was produced by post-processing resource allocation on the heterogeneous network (Table 2).

Figure 2

Schematic Workflow of Resource Allocation-Based Methods

Multi-type data source matrices were first obtained as the input data. Then, a heterogeneous multilayer network is constructed, and the edges are weighted by the corresponding values of the matrix. Finally, the lncRNA-disease scoring matrix was produced by post-processing resource allocation on the heterogeneous network.

Table 2

Overview of Categories and Corresponding Method/Tool for Acquiring Disease-Disease Association

Categories	Method/Tool	Data Types	Data Resources	References
Semantic similarity	R package DOSE	MeSH descriptor	Disease Ontology, MeSH	Yu and Wang⁴³
Semantic similarity	NA	MeSH descriptor, Disease Ontology terms	MeSH, DincRNA	Chen et al.³⁵
Functional similarity	Jaccard coefficient	disease-gene association, gene-Gene Ontology terms association	Ensembl, DisGeNET	Mathur and Dinakarpandian⁴⁴
Functional similarity	NA	disease-miRNA association	HMDD	Zhao et al.⁴⁰
Gaussian interaction profile kernel similarity	Gaussian interaction profile kernel similarity/radial basis function (RBF) kernel similarity	disease-miRNA association, disease-gene association, lncRNA-disease association, sequence, expression	DisGeNet, HMDD, MNDR, Lnc2Cancer, LncRNADisease	Chen and Yan⁴¹
Cosine similarity	Cosine similarity	lncRNA-disease association	MNDR, Lnc2Cancer, LncRNADisease	Hamaneh and Yu⁴⁵

NA, not applicable.

Resource allocation has been implemented in more than a dozen methods for predicting the lncRNA-disease association. Fan et al. proposed a computational model, IDHI-MIRW, that integrates diverse heterogeneous information sources with positive pointwise mutual information and the random walk with restart algorithm. Xiao et al. developed BPLLDA to predict lncRNA-disease associations based on simple paths with limited lengths in a heterogeneous network. Zhang et al. proposed a rule-based inference method on the linked tripartite network, which was constructed by integrating heterogeneous data with deep learning algorithms. Other information has also been introduced to allocate resources and improve prediction performance. For example, LION and DislncRF introduced protein information and genome-wide tissue expression profiles, aided by protein-coding genes. NBLDA and LLCLPLDA both constructed four matrices and used the label propagation algorithm for resource allocation. By constructing a disease weight matrix based on the similarity between the lncRNA disease set and the specified disease, IIRWR introduced the concept of disease clique and added the weight of disease linkages to the traveling network. 
Lap-BiRWRHLDA and BiWalkLDA both used Laplacian normalization and a bi-random walk algorithm on similarity networks. The other two methods, TPGLDA and ncPred, allocated resources from the disease to lncRNAs and other nodes, respectively, but the difference is that the resources were returned to the initial nodes.
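Several of the methods above rely on a random walk with restart over a heterogeneous network. The following is a minimal sketch of that general technique, not the implementation of any specific tool named here; the three-lncRNA, two-disease network, its edge weights, and the restart probability of 0.3 are hypothetical.

```python
import numpy as np

def random_walk_with_restart(weights, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at `seed`."""
    col_sums = weights.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    transition = weights / col_sums        # column-normalized transition matrix
    p0 = seed / seed.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * transition @ p + restart * p0
        converged = np.abs(p_next - p).sum() < tol
        p = p_next
        if converged:
            break
    return p

# Hypothetical heterogeneous network: nodes 0-2 are lncRNAs, nodes 3-4 are diseases.
ll = np.array([[0.0, 0.9, 0.1],
               [0.9, 0.0, 0.1],
               [0.1, 0.1, 0.0]])           # lncRNA-lncRNA similarity
dd = np.array([[0.0, 0.1],
               [0.1, 0.0]])                # disease-disease similarity
ld = np.array([[1.0, 0.0],
               [0.0, 0.0],
               [0.0, 1.0]])                # known lncRNA-disease associations
weights = np.block([[ll, ld], [ld.T, dd]])

seed = np.array([0.0, 0.0, 0.0, 1.0, 0.0])  # start the walk at disease 0
scores = random_walk_with_restart(weights, seed)
lncrna_scores = scores[:3]                  # rank candidate lncRNAs for disease 0
```

Here lncRNA 1 has no recorded disease link but is highly similar to lncRNA 0, which is associated with the seed disease, so it ends up ranked above lncRNA 2.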

Recommendation Algorithm-Based Methods

The common characteristic of the recommendation algorithm-based methods is to recommend a node that may be related to another node. It mainly includes content-based recommendation, collaborative filtering, and matrix factorization. The process is depicted in Figure 3. After applying a recommendation system algorithm to multi-data matrices, recommendation matrices at multiple levels (such as lncRNA, miRNA, etc.) were obtained. Finally, the possibility of the potential relationship between lncRNA and disease was measured through the combination of the recommendation matrices (Table 3).

Figure 3

Schematic Workflow of Recommendation Algorithm-Based Methods

Multi-type data source matrices were first obtained as the input data. Then, recommendation matrices at multiple levels (e.g., lncRNAs, miRNAs) are obtained by applying a recommendation system algorithm. Finally, the possibility of the potential relationship between lncRNA and disease is measured through the combination of the recommendation matrices.

Table 3

Overview of Matrix Completion-Based Computational Methods for Inferring lncRNA-Disease Association

Method Name	Computational Principle	Data Types	Available Tool (Package or Code)	References
SIMCLDA	inductive matrix completion, singular value decomposition	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	code (https://github.com//bioinfomaticsCSU/SIMCLDA)	Lu et al.⁴⁷
LDAPM	inductive matrix completion, singular value decomposition	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Fraidouni and Zaruba⁴⁸
FRMCLDA	faster randomized matrix completion, faster singular value threshold	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	code (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749816/bin/Table_7.docx)	Li et al.⁴⁶
TSSR	sparse self-representation	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	code (https://github.com/Oyl-CityU/TSSR)	Ou-Yang et al.⁴⁹

NA, not applicable.

The first category of recommendation algorithm-based methods is content-based recommendation, which recommends for a given node those nodes that are similar to the nodes it is already related to. For example, NCPLDA, proposed by Li et al., measured the lncRNA-disease association score based on network consistency projection. The second category is based on collaborative filtering, which brings in other nodes similar to the node of interest and uses their information to make inferences. CFNBC and NBCLDA, both developed by Yu et al., used collaborative filtering on multi-data matrices to uncover new relationships between lncRNAs and diseases and then applied a naive Bayesian classifier to determine whether there is an association between lncRNA and disease in the set of lncRNA-associated and disease-associated nodes. ILDMSF and SKF-LDA used a similarity correlation fusion method that introduces neighbor information and then predicted associations by fitting the original matrix as closely as possible. BLM-NPAI, developed by Cui et al., introduced the nearest profile to get the final prediction results after constructing the local models of lncRNA and disease. Another method, proposed by Ping et al., measured the one-step neighbor of a node based on the SimRank measure when there was no common neighbor. 
DCSLDA, proposed by Zhao et al., calculated the shortest path between lncRNA and disease and the distance correlation coefficient to construct the final matrix. The third category of recommendation algorithm-based methods is based on matrix factorization and includes MFLDA, WMFLDA, and PMFILDA; the latter two add weights and probabilities, respectively.43, 44, 45 Li et al. developed a computational method, DNILMF-LDA, that is anchored in neighborhood regularized logistic matrix factorization and optimizes its parameters to predict interaction probabilities. NNLDA, developed by Hu et al., addressed some of the disadvantages of traditional matrix factorization by changing the training method and the loss function and by adding a fully connected layer. DSCMF combines matrix factorization and collaborative filtering to predict associations efficiently by introducing neighbor information.
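The matrix factorization category can be sketched as follows. This is a generic regularized factorization trained by full-batch gradient descent, not the algorithm of MFLDA, WMFLDA, or PMFILDA; the toy matrix, the function name `factorize`, and all hyperparameters are assumptions, and unrecorded pairs are simply treated as zeros.

```python
import numpy as np

def factorize(assoc, rank=2, lr=0.05, reg=0.01, n_epochs=500, seed=0):
    """Approximate assoc by U @ V.T using gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    n_lnc, n_dis = assoc.shape
    U = 0.1 * rng.standard_normal((n_lnc, rank))   # latent lncRNA factors
    V = 0.1 * rng.standard_normal((n_dis, rank))   # latent disease factors
    for _ in range(n_epochs):
        err = assoc - U @ V.T
        U_new = U + lr * (err @ V - reg * U)       # step for U (uses old V)
        V = V + lr * (err.T @ U - reg * V)         # step for V (uses old U)
        U = U_new
    return U, V

# Hypothetical toy data: the pair (0, 1) is a true association that is not recorded.
assoc = np.array([[1, 0, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

U, V = factorize(assoc)
scores = U @ V.T   # predicted association scores for every lncRNA-disease pair
```

Because the rank-2 reconstruction cannot reproduce the isolated zero at (0, 1), that pair receives a noticeably positive score, whereas pairs across the two blocks stay near zero.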

Multi-model Integration-Based Methods

Multi-model integration methods have also been proposed to overcome the shortcomings of single models and improve prediction performance (Table 4). A combination of matrix completion ideas and recommendation system ideas was applied in three models to predict potential lncRNA-disease associations: LDASR, ECLDA, and a weighted bagging LightGBM model.49, 50, 51 Three methods (CNNLDA, CNNDLP, and GCNLDA) were developed by Xuan et al.52, 53, 54 to construct the final module through the integration of a convolutional module and an attention module. LDAPred, also proposed by Xuan et al., introduced a convolutional neural network based on the integration of resource allocation and matrix completion.

Table 4

Overview of Resource Allocation-Based Computational Methods for Inferring lncRNA-Disease Association

Method Name	Computational Principle	Data Types	Available Tool (Package or Code)	References
BPLLDA	paths together with a decay function	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Xiao et al.⁵¹
TPGLDA	resource allocation	disease-gene association, lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	code (https://github.com/USTC-HIlab/TPGLDA)	Ding et al.⁶²
IDHI-MIRW	positive pointwise mutual information, random walk with restart algorithm	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	IDHI-MIRW (https://github.com/NWPU-903PR/IDHI-MIRW)	Fan et al.⁵⁰
Lap-BiRWRHLDA	Laplacian normalization, random walks	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Wen et al.⁶⁰
IIRWR	random walk with restart algorithm	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	code (https://github.com/xiaoyubin123/code)	Wang et al.⁵⁹
LLCLPLDA	label propagation algorithm, locality-constrained linear coding	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Xie et al.⁵⁵
LION	network diffusion approach	lncRNA-protein interaction, protein-protein interaction, protein-disease interaction	NA	Sumathipala et al.⁵³
NBLDA	label propagation algorithm	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Liu et al.⁵⁸
DislncRF	random forest	RNA sequencing data, disease-protein coding gene association, lncRNA-disease association	code (https://github.com/xypan1232/DislncRF)	Pan et al.⁵⁴
NA	DeepWalk and a rule-based inference method	lncRNA-disease association, lncRNA-miRNA association, miRNA-disease association	code (https://github.com/Pengeace/lncRNA-disease-link)	Zhang et al.⁵²
NA	ncPred	disease-target association, target-ncRNA association, ncRNA-ncRNA association, target-target association	NA	Mori et al.⁶³
BiWalkLDA	Laplacian normalization, random walks	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	code (https://github.com/screamer/BiwalkLDA)	Gao et al.⁶¹

NA, not applicable.


Discussion

During the past decade, it has been well documented that lncRNAs play a critical role in nearly all biological processes and have become an emerging paradigm of human disease research. Identification of disease-associated lncRNAs is becoming increasingly important for fundamentally improving our understanding of molecular mechanisms and developing novel therapeutic targets, and it has thus attracted growing attention in the scientific community and become one of the hotspots in medical research. Although current experimental studies in vitro and in vivo can directly link identified lncRNAs with disease phenotypes, they are limited by lower efficiency and higher time and labor cost. Given these limitations, high-throughput technologies were then implemented, leading to exponential growth in the number of lncRNAs found to be dysregulated in diseases. However, the aberrant expression of lncRNAs is not sufficient evidence for ascribing to them a functional role in disease. Therefore, efficient and accurate identification and functional elucidation of disease-associated lncRNAs are in their infancy and remain a major challenge. With the rapidly increasing quantity and quality of bioinformatics databases and resources on lncRNAs and diseases, computer-aided inference of disease-associated RNAs has become a promising avenue for facilitating the unraveling of the functional role of lncRNAs in diseases and provides complementary value for experimental studies. A large number of computational models, algorithms, and tools have been developed and proposed to fill this gap. In this work, we first summarize data and knowledge sources available for the lncRNA-disease association study, including databases of lncRNAs, diseases, and known lncRNA-disease associations. We then present a detailed overview of previously proposed computational methods for inferring lncRNA-disease associations. 
Based on the core idea implemented, these computational methods can be divided into four categories: (1) matrix completion-based methods, (2) recommendation algorithm-based methods, (3) resource allocation-based methods, and (4) multi-model integration-based methods. Although each computational method performs very well according to the report in its own study, one emerging critical issue is that most of these methods used different data sources as their training datasets and carried out cross-validation on their own datasets, so benchmark performance evaluation is lacking. These computational methods have distinct limitations and weaknesses, which are noted as follows. First, matrix completion-based methods consider the feature vectors of lncRNA and disease to improve the accuracy of the prediction. However, these algorithms suffer from poor robustness. The ranks of diverse datasets are likely to vary widely. Second, because experimentally verified lncRNA-disease associations are still far from complete, resource allocation-based methods need to consider the prediction of separate nodes or integrate additional biological information. Although this can improve prediction accuracy, interactions from other databases may contain noise that interferes with prediction results. In addition, recommendation algorithm-based methods are considered separately by subtype. Content-based recommendation models only need prior knowledge of nodes, but new levels of disease-lncRNA associations cannot be recognized. Although collaborative filtering-based recommendation methods compensate for this shortcoming, the sparse lncRNA-disease association matrix is harmful to the recommendation, and the complexity and time cost of the algorithm increase sharply when the amount of data is too large. Additionally, matrix factorization-based recommendation methods reduce space complexity by mapping a matrix to a product of low-dimensional matrices. 
These methods also make it easy to add additional data from different sources and use the intrinsic structure. However, matrix factorization-based methods have the same weaknesses as collaborative filtering-based recommendation methods. The above computational approaches can thus complement each other, and multi-model integration-based methods were proposed to achieve better performance when investigating the association between lncRNAs and diseases. Finally, only a few computational approaches have been developed as online web tools, and most remain theoretical studies, which hampers their use by biologists and medical scientists. With the rapidly increasing knowledge of the functional mechanisms of lncRNAs, several challenges can be highlighted whose resolution would improve the accuracy and practicality of the predictors. It is well known that the majority of lncRNAs exhibit precise subcellular localization and thus perform regulatory roles in a spatiotemporal manner. Consequently, some interactions between lncRNAs and other biological molecules (DNA, RNA, and proteins) used in previous predictors are derived from prediction and do not exist in the real biological world. Co-localization information of lncRNAs and other biological molecules should therefore be considered. Additionally, it has been observed that lncRNAs are expressed in highly cell type-specific, tissue-specific, and disease-specific manners. More molecular information from the appropriate biological contexts should therefore be introduced into predictors so that they are better suited to the specific disease. Finally, the prediction results of these computational approaches are only descriptive associations, and determining the specific association type (e.g., causal or non-causal association of lncRNA with the disease) remains a challenging task. 
The implementation of efficient and reliable computational predictions, together with systematic biological experiments, will greatly accelerate the study of lncRNA functions and mechanisms in physiological and pathological conditions.

Materials and Methods

Databases and Knowledge Bases

NONCODE (http://www.noncode.org/) collects and integrates data from PubMed and other resources via text mining. Users can use CNCI to predict the protein-coding potential of transcripts and view the results of functional annotation and enrichment through the ncFANs online website. The current version covers expression, function, sequence, structure, disease relevance of lncRNA, and other factors. Compared with other lncRNA databases, NONCODE stores more information about lncRNA transcripts and unique annotations. LncRBase (http://bicresources.jcbose.ac.in/zhumur/lncrbase) is an annotation database resource for analyzing lncRNA functions based on feature sequences. It has recorded transcript information for 133,361 human lncRNA entries and 83,201 mouse lncRNA entries. Information about lncRNA subtypes and small ncRNA-lncRNA associations is included. The database also provides microarray probes mapped to specific lncRNAs and expression in tissues. LncBook (http://bigd.big.ac.cn/lncbook) collects information on 268,848 experimentally verified and predicted lncRNAs (including 1,867 functional lncRNAs) and includes information on related functions, diseases, expression, methylation, mutations, and miRNA interactions (via software prediction). The same team had developed LncRNAWiki, an integrated database built on a model of collaborative annotation; LncBook was then established as a complement to LncRNAWiki to organize large-scale annotations systematically. MONOCLdb (https://www.monocldb.org/) contains 20,728 lncRNAs from the sequencing of virus-infected lungs of eight respiratory-infected mice, of which 5,329 were differentially expressed. These differentially expressed lncRNAs are annotated by different methods (enrichment methods, as well as module-based and rank-based methods). Correlation scores between lncRNA expression profiles and six phenotypic data were used to define pathogenic associations. 
lncRNome (http://genome.igib.res.in/lncRNome) is a comprehensive database of human lncRNAs, which collects information on annotation, sequence, structure, interacting proteins, genomic variations, conservation, and epigenetic modifications for more than 18,000 lncRNAs. Annotation is manually curated from literature and databases, including associated diseases, related literature, and the mapping of disease-associated variation in lncRNA gene loci. LncRNASNP (http://bioinfo.life.hust.edu.cn/lncRNASNP/) mainly compiles information on the single nucleotide polymorphism (SNP) loci located on lncRNA genes in humans and mice. The cancer mutations in lncRNA transcripts and lncRNA expression in cancer, the predicted interactions of miRNAs and associated diseases, and the impact of variations on lncRNA structures were integrated into LncRNASNP. This database also collects experimentally verified and predicted disease-lncRNA associations in humans. LncRNADisease (http://www.rnanut.net/lncrnadisease/) is an open-access database that has been updated to version 2.0. As one of the more commonly used databases, LncRNADisease 2.0 provides 10,564 experimental lncRNA-disease associations and 195,395 computational lncRNA-disease associations in four species. Additionally, a confidence score was obtained for each lncRNA-disease association based on known experimental information. LncRNADisease also collects lncRNA regulatory networks. Lnc2Cancer (http://www.bio-bigdata.net/lnc2cancer) collected 4,989 comprehensive experimentally supported associations between 1,614 lncRNAs and 165 human cancers. These records were built through text mining on the PubMed database. The database consists of three classifications of relationships between lncRNAs and cancers: circulating, drug-resistant, and prognostic-related lncRNAs. Additionally, it collects transcription factor (TF), microRNA (miRNA), variant, and methylation molecular information on the regulation of lncRNAs. 
MNDR (http://www.rna-society.org/mndr/) is built through manual curation of scientific literature. The current release (MNDR v2.0) has recorded 261,042 entries including six species and 1,416 diseases from 26,600 studies. Detailed and comprehensive annotations for lncRNA-disease associations are presented at the bottom of each record, including the data from articles and evidence support. EVLncRNAs (http://biophy.dzu.edu.cn/EVLncRNAs) is a database that intends to include all lncRNA-disease associations that are validated by low-throughput experiments. The database includes 1,543 lncRNAs from 77 species, 886 of which are associated with 338 diseases, along with experimental information. For other lncRNAs that are not associated with diseases, their functional information is collected. NSDNA (http://www.bio-bigdata.net/nsdna/) is an online knowledge base of ncRNA-nervous system disease (NSD) associations. It contains 24,713 entries of associations covering 142 NSDs and 8,593 ncRNAs from more than 1,300 articles, of which 4,608 lncRNAs-NSDs are included. Users can browse by ncRNAs, diseases, species, or tissue name. Also, if searching for data, users can select low-throughput or high-throughput experimental data or both. Nc2Eye (http://nc2eye.bio-data.cn/) is the first high-quality manually curated ncRNAomics knowledge base associated with eye disease and includes 1,147 lncRNA-associated entries.

Computational Methods for Acquiring lncRNA-lncRNA Associations

Most of the computational methods for inferring lncRNA-disease associations are based on lncRNA-lncRNA association data. Therefore, acquiring high-quality lncRNA-lncRNA association data is critical for improving performance in predicting lncRNA-disease associations. We have summarized the currently available computational methods for acquiring lncRNA-lncRNA associations. In general, these methods can be divided into four categories (Table 1): sequence similarity-based methods, functional similarity-based methods, expression similarity-based methods, and cosine similarity-based methods.

Sequence Similarity-Based Methods

Due to plentiful information about the lncRNA sequence, the similarity between two lncRNAs was measured by comparing the sequence features of lncRNAs. Needleman and Wunsch first proposed the Needleman-Wunsch global alignment algorithm in 1970. Then, researchers developed it as a web tool, which calculates the optimum alignment and best score of two sequences in the order of sequence steps along their entire length. The alignment score SW(li,lj) was obtained from EMBOSS Needle, and the sequence similarity is defined as follows:
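One widely used normalization divides the pairwise alignment score by the geometric mean of the two self-alignment scores, SW(li,lj)/sqrt(SW(li,li)·SW(lj,lj)). The following is a minimal sketch under that assumption; it is not necessarily the exact definition intended here. The scoring values (match = 1, mismatch = -1, linear gap = -1) and the function names are hypothetical, whereas EMBOSS Needle uses substitution matrices and affine gap penalties.

```python
import math


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score with a linear gap penalty (toy scoring scheme)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def sequence_similarity(a, b):
    """Pairwise score normalized by the two self-alignment scores (assumed form)."""
    sw_ab = needleman_wunsch_score(a, b)
    sw_aa = needleman_wunsch_score(a, a)
    sw_bb = needleman_wunsch_score(b, b)
    return sw_ab / math.sqrt(sw_aa * sw_bb)


if __name__ == "__main__":
    print(sequence_similarity("ACGUACGU", "ACGUUCGU"))
```

The sketch assumes non-empty sequences, so that both self-alignment scores are positive. With this normalization, identical sequences receive a similarity of 1.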

Functional Similarity-Based Methods

Based on the assumption that lncRNAs whose associated molecules have similar functions tend to have similar functions themselves, several functional similarity-based computational methods were developed. Chen et al. proposed a method, named LNCSIM, to measure the semantic similarity between the two groups of diseases associated with a pair of lncRNAs. They integrated two semantic similarity models to achieve better performance. Both models collected the MeSH descriptors of diseases and constructed a directed acyclic graph (DAG) for each disease. The two models differ in how the contribution of disease term t to disease A is calculated. In the first model, it was calculated as in Equation 2. Since the contribution of other diseases to the semantic value of disease A decreases as their distance from disease A increases, a decay factor is added. In the second model, disease terms in DAG(A) that appear less often in the DAGs of other diseases make a more significant contribution, which is calculated as in Equation 3. The semantic value of disease A is then defined as the sum of the contributions from its ancestral diseases and from disease A itself. The semantic similarity between two diseases A and B is calculated based on the common nodes of DAG(A) and DAG(B). Finally, lncRNA functional similarity was obtained by calculating the average of the similarities between the two groups of diseases.

Huang et al. proposed an improved model called ILNCSIM, which introduced information content (IC) and focused on the hierarchical structure of disease DAGs. First, the information content value was calculated.
Information content of disease term a is defined as the negative log-likelihood of each term. Second, IC-based distances were used to calculate the most informative common ancestors (MICAs) and the most informative leaf (MIL). Third, three components are computed: α, which measures the specificity of the MICA; β, which measures the generality of the two disease terms; and γ, which estimates the total IC-based distance between the two terms and their MICA. Fourth, the semantic similarity of two diseases is computed from these three components. Finally, lncRNA functional similarity was measured by calculating the average of the similarities between the two groups of diseases.

ICod measures disease similarity by scoring disease-related gene similarity, and researchers applied this idea to lncRNAs.37, 38, 39 The similarity between lncRNAs i and j was calculated based on the shortest path between each pair of lncRNA-related genes in the integrated human protein-protein interaction (PPI) network. The shortest distance between two proteins in the PPI network is indicated as d(pm,pn). D(pm,pn) denotes the transformed distance between the networks. t is the threshold of d(pm,pn). NETi and NETj represent the networks related to the two lncRNAs, respectively. E and H are freely adjustable parameters.

Zhao et al. proposed a computational model to infer lncRNA-lncRNA associations, which introduced lncRNA-miRNA associations and was defined as in Equations 14 and 15. The similarity is the sum of the contributions of the commonly associated miRNAs divided by the number of miRNAs associated with the two lncRNAs. The contribution value of each miRNA for an lncRNA is computed by Equation 14. D(i) and D(j) are the number of lncRNAi-related edges and lncRNAj-related edges, respectively.
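The DAG-based semantic similarity behind LNCSIM can be illustrated with a short sketch. It follows the first model only: the contribution of a term is taken as the decay factor raised to its shortest distance from the disease, and two diseases are compared over their shared DAG terms. The decay value of 0.5, the best-match averaging used to compare two disease groups, the toy DAG, and all function names are assumptions made for illustration, not details confirmed by the text above.

```python
def semantic_contributions(disease, parents, decay=0.5):
    """Contribution of each term in DAG(disease), with the disease itself set to 1."""
    contrib = {disease: 1.0}
    frontier = [disease]
    while frontier:
        term = frontier.pop()
        for parent in parents.get(term, []):
            value = decay * contrib[term]
            # Keep the largest contribution over all paths to this ancestor.
            if value > contrib.get(parent, 0.0):
                contrib[parent] = value
                frontier.append(parent)
    return contrib


def disease_similarity(a, b, parents, decay=0.5):
    """Shared-term contributions divided by the two semantic values."""
    ca = semantic_contributions(a, parents, decay)
    cb = semantic_contributions(b, parents, decay)
    shared = set(ca) & set(cb)
    numerator = sum(ca[t] + cb[t] for t in shared)
    return numerator / (sum(ca.values()) + sum(cb.values()))


def lncrna_functional_similarity(group_1, group_2, parents, decay=0.5):
    """Average best-match disease similarity between two disease groups."""
    best_1 = [max(disease_similarity(d, e, parents, decay) for e in group_2)
              for d in group_1]
    best_2 = [max(disease_similarity(e, d, parents, decay) for d in group_1)
              for e in group_2]
    return (sum(best_1) + sum(best_2)) / (len(group_1) + len(group_2))


if __name__ == "__main__":
    # Hypothetical DAG: each term maps to its parent terms.
    toy_parents = {"A": ["P"], "B": ["P"], "P": ["root"]}
    print(disease_similarity("A", "B", toy_parents))
    print(lncrna_functional_similarity(["A"], ["A", "B"], toy_parents))
```

In the toy DAG, diseases A and B share the parent P and the root, so their similarity is (0.5 + 0.5 + 0.25 + 0.25) / (1.75 + 1.75) = 3/7.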

Expression Similarity-Based Methods

lncRNAs are expressed in highly cell type-specific, tissue-specific, and disease-specific manners. Consequently, the expression similarity between two lncRNAs is an important point. The co-expression relationship between lncRNAs measured by a Pearson or Spearman correlation coefficient was commonly used to infer the lncRNA-lncRNA association.
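As a minimal sketch of co-expression similarity, the two coefficients can be computed directly from paired expression profiles. The expression values and function names below are hypothetical, and the sketch assumes both profiles have the same length and non-zero variance.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical expression of two lncRNAs across five samples.
    lnc_a = [2.1, 3.4, 5.0, 7.2, 9.9]
    lnc_b = [1.0, 1.9, 4.2, 4.8, 8.1]
    print(pearson(lnc_a, lnc_b), spearman(lnc_a, lnc_b))
```

Spearman correlation depends only on rank order, which makes it less sensitive to outlying expression values than Pearson correlation.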

Cosine Similarity-Based Methods

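The concept of cosine similarity originates in mathematics. Cheng et al. proposed a computational method called IntNetLncSim, which linked cosine similarity with lncRNA similarity. The similarity between two lncRNAs, lnc1 and lnc2, was calculated as follows:

$$\mathrm{Sim}(lnc_1, lnc_2) = \frac{\sum_{i} w_{1,i}\, w_{2,i}}{\sqrt{\sum_{i} w_{1,i}^{2}}\,\sqrt{\sum_{i} w_{2,i}^{2}}}$$

where w1,i and w2,i represent the vector values of lnc1 and lnc2 on the ith dimension, respectively.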
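A minimal sketch of cosine similarity between two lncRNA feature vectors follows. The vectors and the function name are hypothetical, and returning 0 for an all-zero vector is a convention chosen here rather than part of any cited method.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for vectors with no non-zero entries
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical feature vectors for two lncRNAs.
    lnc1 = [0.2, 0.0, 0.7, 0.1]
    lnc2 = [0.1, 0.3, 0.6, 0.0]
    print(cosine_similarity(lnc1, lnc2))
```

Because the score depends only on the angle between the vectors, scaling one vector by a positive constant leaves the similarity unchanged.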

Computational Methods for Acquiring Disease-Disease Associations

Disease-disease association data are critical data used in most of the computational methods for inferring lncRNA-disease association. In general, these methods for obtaining disease-disease association can be divided into four categories (Table 2): semantic similarity-based methods, functional similarity-based methods, Gaussian interaction profile kernel similarity-based methods, and cosine similarity-based methods.

Semantic Similarity-Based Methods

Semantic similarity between diseases is one of the most commonly used measures. These methods use MeSH descriptors or Disease Ontology terms and determine the similarity of two disease terms based on the information content of their common ancestral terms. A package named DOSE, developed by Yu and Wang, can calculate disease semantic similarity. Other methods compute the same similarity directly from explicit formulas. The detailed algorithm is described under LNCSIM in the section on lncRNA functional similarity.

Functional Similarity-Based Methods

Functional similarity-based methods use the Jaccard coefficient to measure the overlap between the gene ontology terms related to two diseases. Disease-gene interactions and gene-gene ontology interactions were used, and the similarity was calculated as in Equation 19, where GOi represents the gene ontology terms related to disease i:

$$\mathrm{Sim}(d_i, d_j) = \frac{|GO_i \cap GO_j|}{|GO_i \cup GO_j|}$$

Zhao et al. proposed a model to infer disease-disease associations in their computational method, which introduced disease-miRNA associations and was defined as in Equations 20 and 21. The similarity is the sum of the contributions of the commonly associated miRNAs divided by the number of miRNAs associated with the two diseases. The contribution value of each miRNA for a disease is computed by Equation 20. D(i) and D(j) are the number of diseasei-related edges and diseasej-related edges, respectively.
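The Jaccard-based functional similarity described above can be sketched in a few lines. The disease names, gene symbols, GO identifiers, and function names below are hypothetical placeholders, and collecting a disease's GO terms as the union over its associated genes is an assumption about how the two interaction types are combined.

```python
def disease_go_terms(disease, disease_genes, gene_go):
    """Union of the GO terms annotated to the genes linked to a disease."""
    terms = set()
    for gene in disease_genes.get(disease, ()):
        terms |= set(gene_go.get(gene, ()))
    return terms


def jaccard_similarity(go_i, go_j):
    """Size of the intersection divided by the size of the union."""
    go_i, go_j = set(go_i), set(go_j)
    union = go_i | go_j
    if not union:
        return 0.0  # convention when neither disease has any GO term
    return len(go_i & go_j) / len(union)


if __name__ == "__main__":
    # Hypothetical disease-gene and gene-GO mappings.
    disease_genes = {"disease_1": ["G1", "G2"], "disease_2": ["G2", "G3"]}
    gene_go = {"G1": ["GO:a"], "G2": ["GO:b", "GO:c"], "G3": ["GO:d"]}
    go_1 = disease_go_terms("disease_1", disease_genes, gene_go)
    go_2 = disease_go_terms("disease_2", disease_genes, gene_go)
    print(jaccard_similarity(go_1, go_2))
```

In this toy example the two diseases share two of four GO terms, giving a similarity of 0.5.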

Gaussian Interaction Profile Kernel Similarity-Based Methods

Based on the assumption that genes with similar functions tend to be associated with similar diseases, Chen and Yan applied the Gaussian interaction profile kernel (also called the radial basis function kernel) similarity to measure disease-disease association as in Equation 22, where IP(i) indicates the row of diseasei in the disease-lncRNA association matrix:

$$KD(d_i, d_j) = \exp\left(-\gamma_d \,\lVert IP(d_i) - IP(d_j) \rVert^{2}\right)$$

The parameter γd controls the kernel bandwidth, which is defined as follows:

$$\gamma_d = \frac{\gamma_d'}{\frac{1}{n_d} \sum_{i=1}^{n_d} \lVert IP(d_i) \rVert^{2}}$$

where nd denotes the number of diseases, and γd is obtained by normalizing the original bandwidth parameter γd′ by the average number of lncRNA associations per disease.

Cosine Similarity-Based Methods

In a previous study, Hamaneh and Yu linked cosine similarity with disease similarity. We used the lncRNA-disease association matrix to replace the original matrix for display. The similarity between the two diseases, dis1 and dis2, was calculated as follows:

$$\mathrm{Sim}(dis_1, dis_2) = \frac{\sum_{i} w_{1,i}\, w_{2,i}}{\sqrt{\sum_{i} w_{1,i}^{2}}\,\sqrt{\sum_{i} w_{2,i}^{2}}}$$

where w1,i and w2,i represent the vector values of dis1 and dis2 on the ith dimension, respectively.
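The Gaussian interaction profile kernel can be sketched as follows, assuming the usual form in which the bandwidth is the original parameter divided by the mean squared norm of the interaction profiles. The association matrix, the default value of 1 for the original bandwidth parameter, and the function name are hypothetical.

```python
import math


def gip_kernel(assoc, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `assoc`.

    assoc: list of rows, one per disease, with 0/1 entries per lncRNA.
    gamma_prime: original bandwidth parameter (illustrative default).
    """
    n = len(assoc)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in assoc) / n
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


if __name__ == "__main__":
    # Hypothetical disease-by-lncRNA association matrix.
    toy_assoc = [
        [1, 0, 1],
        [1, 0, 1],
        [0, 1, 0],
    ]
    for row in gip_kernel(toy_assoc):
        print([round(v, 4) for v in row])
```

The sketch assumes that at least one association is known; otherwise the mean squared norm is zero and the bandwidth is undefined. Diseases with identical interaction profiles receive a kernel value of 1.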

Computational Methods for Inferring lncRNA-Disease Associations

In this section, we review dozens of novel computational methods in inferring the lncRNA-disease associations proposed in recent years. Based on the core idea of the algorithm, these computational methods can be divided into four categories: matrix completion-based methods, recommendation algorithm-based methods, resource allocation-based methods, and multi-model integration-based methods.

Matrix Completion-Based Methods

The common characteristic of matrix completion-based methods is that they fill in the missing values of a dataset represented as a matrix. As shown in Figure 1, three matrices are first obtained: the lncRNA-disease association matrix, the lncRNA-lncRNA matrix, and the disease-disease matrix. Then, feature extraction is performed on these three matrices to obtain lncRNA feature vectors and disease feature vectors. Finally, matrix completion is conducted on the lncRNA-disease association matrix to acquire the lncRNA-disease scoring matrix based on the lncRNA and disease feature vectors (Table 3). Li et al. developed a computational model of faster randomized matrix completion for latent disease-lncRNA association (named FRMCLDA) that used the fSVT algorithm to predict lncRNA-disease associations based on the idea of matrix completion. FRMCLDA uses the disease similarity matrix, the lncRNA similarity matrix, the lncRNA-disease association matrix, and the transpose of the association matrix to construct an adjacency matrix, and it improves prediction performance by fitting this adjacency matrix. LDAPM and SIMCLDA also use a matrix completion approach, but the difference is that LDAPM represents the approximated matrix as the product of two matrices. In addition, TSSR exploits learned representation matrices as feature matrices to reconstruct the original matrix.
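The construction of an adjacency matrix from the four matrices named for FRMCLDA can be illustrated with a short sketch. The block layout used here (lncRNA similarity in the top left, associations in the top right, transposed associations in the bottom left, disease similarity in the bottom right) is an assumption, as are the toy values and the function name. The completion step itself is not shown.

```python
def build_adjacency(lnc_sim, dis_sim, assoc):
    """Assemble a block matrix [[lnc_sim, assoc], [assoc^T, dis_sim]].

    lnc_sim: n_l x n_l lncRNA similarity matrix
    dis_sim: n_d x n_d disease similarity matrix
    assoc:   n_l x n_d lncRNA-disease association matrix (0/1)
    """
    n_l, n_d = len(lnc_sim), len(dis_sim)
    if len(assoc) != n_l or any(len(row) != n_d for row in assoc):
        raise ValueError("association matrix must be n_l x n_d")
    # Transpose of the association matrix: one row per disease.
    assoc_t = [[assoc[i][j] for i in range(n_l)] for j in range(n_d)]
    top = [list(lnc_sim[i]) + list(assoc[i]) for i in range(n_l)]
    bottom = [assoc_t[j] + list(dis_sim[j]) for j in range(n_d)]
    return top + bottom


if __name__ == "__main__":
    # Hypothetical inputs: two lncRNAs and one disease.
    lnc_sim = [[1.0, 0.5], [0.5, 1.0]]
    dis_sim = [[1.0]]
    assoc = [[1], [0]]
    for row in build_adjacency(lnc_sim, dis_sim, assoc):
        print(row)
```

With symmetric similarity matrices, the resulting matrix is symmetric, and the unknown entries of the association blocks are the values a completion algorithm would try to recover.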

Resource Allocation-Based Methods

Resource allocation is used to allocate available resources to each node. To predict lncRNA-disease associations, resource allocation takes the initial values of the multi-source data matrices as possible values for the relationships between nodes. The process is demonstrated in Figure 2. Resource allocation-based methods are built on data from multiple sources, such as lncRNA-disease associations, miRNA-disease associations, and miRNA-lncRNA associations. A heterogeneous multilayer network is constructed, and its edges are weighted according to the corresponding values of the matrices. The lncRNA-disease scoring matrix is then produced by performing resource allocation on the heterogeneous network (Table 4). Resource allocation has been implemented in more than a dozen methods for predicting lncRNA-disease associations. Fan et al. proposed a computational model, IDHI-MIRW, that integrates diverse heterogeneous information sources with positive pointwise mutual information and the random walk with restart algorithm. Xiao et al. developed BPLLDA to predict lncRNA-disease associations based on simple paths with limited lengths in a heterogeneous network. Zhang et al. proposed a rule-based inference method on a linked tripartite network, which was constructed by integrating heterogeneous data with deep-learning algorithms. Other information has also been introduced into resource allocation to improve prediction performance. For example, LION and DislncRF introduced protein information and genome-wide tissue expression profiles, aided by protein-coding genes. NBLDA and LLCLPLDA both constructed four matrices and used the label propagation algorithm for resource allocation. By constructing a disease weight matrix based on the similarity between the lncRNA disease set and the specified disease, IIRWR introduced the concept of disease clique and added the weight of disease linkages to the traveling network. Lap-BiRWRHLDA and BiWalkLDA both used Laplacian normalization and a bi-random walk algorithm on similarity networks. Two other methods, TPGLDA and ncPred, allocated resources from diseases to lncRNAs and to other nodes, respectively, the difference being that the resources are returned to the initial nodes.
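Random walk with restart, which several of these methods build on, can be sketched as a simple iteration on a weighted network. This is a generic illustration and not the implementation of any method named above; the toy network, the restart probability of 0.7, and the function name are assumptions.

```python
def random_walk_with_restart(weights, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at seed nodes.

    weights: square matrix of non-negative edge weights
    seeds:   indices of the nodes the walker restarts from
    restart: probability of jumping back to a seed at each step
    """
    n = len(weights)
    # Column-normalize so that each column of the transition matrix sums to 1.
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    trans = [[weights[i][j] / col_sums[j] if col_sums[j] else 0.0
              for j in range(n)] for i in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    # Hypothetical path network 0 - 1 - 2, restarting from node 0.
    toy_weights = [
        [0, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
    ]
    print(random_walk_with_restart(toy_weights, seeds=[0]))
```

Nodes closer to the seed receive higher scores, which is how such methods rank candidate lncRNAs for a query disease once the heterogeneous network has been built.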

Recommendation Algorithm-Based Methods

The common characteristic of recommendation algorithm-based methods is that they recommend nodes that may be related to a given node. These methods mainly include content-based recommendation, collaborative filtering, and matrix factorization. The process is depicted in Figure 3. After applying a recommendation system algorithm to the multi-data matrices, recommendation matrices at multiple levels (e.g., lncRNA, miRNA) are obtained. Finally, the likelihood of a potential relationship between an lncRNA and a disease is measured by combining the recommendation matrices (Table 5).

Table 5

Overview of Recommendation Algorithm-Based Computational Methods for Inferring lncRNA-Disease Association

Method Name	Computational Principle	Data Types	Available Tool (Package or Code)	References
ILDMSF	similarity network fusion	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Chen et al.³⁸
NBCLDA	naive Bayesian, collaborative filtering	miRNA-disease association, miRNA-lncRNA association, lncRNA-disease association, disease-disease association	NA	Yu et al.⁶⁶
NCPLDA	network consistency projection	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	code (https://github.com/ghli16/NCPLDA)	Li et al.⁶⁴
MFLDA	matrix factorization	lncRNA-miRNA association, lncRNA-gene association, lncRNA-Gene Ontology (GO) association, lncRNA-disease association, miRNA-gene association, miRNA-disease association, gene-disease association, gene-gene association, gene-drug association, drug-drug association, gene-GO association	code (http://mlda.swu.edu.cn/codes.php?name=MFLDA)	Fu et al.⁷⁰
WMFLDA	matrix factorization	lncRNA-miRNA association, lncRNA-gene association, lncRNA-GO association, lncRNA-disease association, miRNA-gene association, miRNA-disease association, gene-disease association, gene-gene association, gene-drug association, drug-drug association, gene-GO association	code (http://mlda.swu.edu.cn/codes.php?name=WMFLDA)	Wang et al.⁸³
PMFILDA	probabilities matrix factorization	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association, miRNA-disease association, miRNA-lncRNA association	NA	Xuan et al.⁷¹
DNILMF-LDA	logistic matrix factorization, Bayesian optimization	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Li et al.⁷³
DSCMF	collaborative matrix factorization	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Gao et al.⁷⁵
NNLDA	matrix factorization	lncRNA-disease association	code (https://github.com/gao793583308/NNLDA)	Hu et al.⁷⁴
NA	SimRank measure, common neighbor-based	lncRNA-disease association	NA	Ping et al.⁶⁹
CFNBC	naive Bayesian, collaborative filtering	miRNA-disease association, miRNA-lncRNA association, lncRNA-disease association, disease-disease association	code (https://github.com/jingwenyu18/CFNBC)	Yu et al.⁶⁵
DCSLDA	distance correlation set	disease-disease association, lncRNA-disease association, miRNA-disease association, miRNA-LncRNA association, lncRNA-lncRNA association	NA	Zhao et al.⁴⁰
SKF-LDA	similarity kernel fusion	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Xie et al.⁶⁷
BLM-NPAI	bipartite local model	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Cui et al.⁸⁴

NA, not applicable.

The first category of recommendation algorithm-based methods is content-based recommendation, which recommends to a node other nodes that are similar to those it was previously related to. For example, NCPLDA, proposed by Li et al., measured the lncRNA-disease association score based on network consistency projection.

The second category of recommendation algorithm-based methods is based on collaborative filtering, which identifies other nodes similar to a given node and uses the information of these nodes to make inferences. CFNBC and NBCLDA, both proposed by Yu et al., used collaborative filtering on multi-data matrices to uncover new relationships between lncRNAs and diseases and then took advantage of a naive Bayesian classifier to determine whether there is an association between an lncRNA and a disease in the set of lncRNA-associated nodes and disease-associated nodes. ILDMSF and SKF-LDA use a similarity fusion method that introduces neighbor information and then predicts associations by fitting the original matrix as closely as possible. BLM-NPAI, developed by Cui et al., introduced the nearest profile to get the final prediction results after constructing the local models of lncRNAs and diseases. Another method, proposed by Ping et al., measured the one-step neighbors of a node based on the SimRank measure when there is no common neighbor. DCSLDA, proposed by Zhao et al., calculated the shortest path between lncRNAs and diseases and the distance correlation coefficient to construct the final matrix.

The third category of recommendation algorithm-based methods is based on matrix factorization, including MFLDA, WMFLDA, and PMFILDA. These three methods are based on matrix factorization, and the difference is that the latter two add weights and probabilities, respectively.70, 71, 72 Li et al. developed a computational method, DNILMF-LDA, which is anchored in neighborhood-regularized logistic matrix factorization and optimizes the model parameters to predict interaction probabilities. NNLDA, developed by Hu et al., addressed some of the disadvantages of traditional matrix factorization by changing the training method and the loss function and adding a fully connected layer. Additionally, DSCMF combines matrix factorization and collaborative filtering to predict associations efficiently by introducing neighbor information.
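The basic idea of matrix factorization can be sketched with a plain regularized factorization of the association matrix into two low-rank factors. This is a generic illustration and not the algorithm of MFLDA, WMFLDA, PMFILDA, or any other method named above. The rank, learning rate, regularization weight, and toy matrix are hypothetical, and treating every zero entry as an observed negative is a simplifying assumption.

```python
import random


def factorize(assoc, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Approximate assoc (n x m) by U V^T using stochastic gradient steps."""
    rng = random.Random(seed)
    n, m = len(assoc), len(assoc[0])
    U = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                pred = sum(U[i][k] * V[j][k] for k in range(rank))
                err = assoc[i][j] - pred
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    # Gradient step on squared error plus an L2 penalty.
                    U[i][k] += lr * (err * v - reg * u)
                    V[j][k] += lr * (err * u - reg * v)
    return U, V


def predict(U, V):
    """Score matrix given by the product of the two factors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


def squared_error(assoc, scores):
    """Sum of squared differences between associations and scores."""
    return sum((a - s) ** 2
               for row_a, row_s in zip(assoc, scores)
               for a, s in zip(row_a, row_s))


if __name__ == "__main__":
    # Hypothetical lncRNA-by-disease association matrix.
    toy_assoc = [
        [1, 1, 0],
        [1, 1, 0],
        [0, 0, 1],
    ]
    U, V = factorize(toy_assoc)
    for row in predict(U, V):
        print([round(s, 2) for s in row])
```

In practice, the scores of unobserved lncRNA-disease pairs in the reconstructed matrix are ranked to propose candidate associations, and the published methods add weights, probabilistic models, or side information on top of this basic scheme.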

Multi-Model Integration-Based Methods

Multi-model integration methods have also been proposed to overcome the shortcomings of the single model and improve prediction performance (Table 6). A combination of matrix completion ideas and recommendation system ideas was applied in three models to predict potential lncRNA-disease associations, including LDASR, ECLDA, and the weighted bagging LightGBM model.76, 77, 78 Three methods (CNNLDA, CNNDLP, and GCNLDA) were developed by Xuan et al.79, 80, 81 to construct the final module through the integration of the convolutional module and attention module. LDAPred, proposed by Xuan et al., introduced the convolutional neural network based on the integration of resource allocation and matrix completion.

Table 6

Overview of Multi-Model Integration-Based Computational Methods for Inferring lncRNA-Disease Association

Method Name	Computational Principle	Data Types	Available Tool (Package or Code)	References
NA	weighted bagging lightGBM model	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Chen and Liu⁷⁷
LDASR	rotation forest	lncRNA-disease association	NA	Guo et al.⁷⁶
ECLDA	extreme learning machine, convolutional neural networks	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association	NA	Guo et al.⁷⁸
CNNLDA	convolutional neural networks, attention mechanisms	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association, miRNA-disease association, miRNA-lncRNA association, miRNA-miRNA association	NA	Xuan et al.⁷⁹
CNNDLP	convolutional neural networks, attention mechanisms	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association, miRNA-disease association, miRNA-lncRNA association, miRNA-miRNA association	NA	Xuan et al.⁸⁰
GCNLDA	convolutional neural networks, graph convolutional network	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association, miRNA-disease association, miRNA-lncRNA association, miRNA-miRNA association	NA	Xuan et al.⁸¹
LDAPred	convolutional neural networks, information flow propagation	lncRNA-disease association, lncRNA-lncRNA association, disease-disease association, miRNA-disease association, miRNA-lncRNA association, miRNA-miRNA association	NA	Xuan et al.⁸²

NA, not applicable.

During the past decade, it has been well documented that lncRNAs play a critical role in nearly all biological processes and have become an emerging paradigm of human disease research. Identification of disease-associated lncRNAs is becoming increasingly important for fundamentally improving our understanding of molecular mechanisms and developing novel therapeutic targets, and thus has attracted more and more attention in the scientific community and is becoming one of the hotspots in medical research. Although current experimental studies in vitro and in vivo could directly link identified lncRNAs with disease phenotypes, they are limited by low efficiency and high time and labor costs. Taking into account the limitations of experimental studies, high-throughput technologies were then implemented, leading to exponential growth in the number of dysregulated lncRNAs in diseases. However, the aberrant expression of lncRNAs is not sufficient evidence for ascribing to them a functional role in disease. Therefore, efficient and accurate identification and functional elucidation of disease-associated lncRNAs are in their infancy and remain a major challenge. With the rapidly increasing quantity and quality of bioinformatics databases and resources in lncRNAs and diseases, computer-aided inference of disease-associated RNAs has become a promising avenue for facilitating the unraveling of the functional role of lncRNAs in diseases and provides complementary value for experimental studies. A large number of computational models, algorithms, and tools have been developed and proposed, compensating for this dearth. In this work, we first summarize data and knowledge sources available for the lncRNA-disease association study, which contains databases of lncRNAs, diseases, and known lncRNA-disease associations.
We then present a detailed overview of previously proposed computational methods for inferring lncRNA-disease associations. Based on the core idea implemented, these computational methods can be divided into four categories: (1) matrix completion-based methods, (2) recommendation algorithm-based methods, (3) resource allocation-based methods, and (4) multi-model integration-based methods. Although each computational method reportedly performs very well in its own study, one emerging critical issue is that most of these methods used different data sources as their training datasets and carried out cross-validation on their own datasets, so benchmark performance evaluation is lacking. These computational methods have distinct limitations and weaknesses, which are noted as follows. First, matrix completion-based methods considered the feature vectors of lncRNAs and diseases to improve the accuracy of the prediction. However, these algorithms have the disadvantage of poor robustness, because the ranks of diverse datasets are likely to vary widely. Second, because experimentally verified lncRNA-disease associations are still far from complete, resource allocation-based methods need to consider the prediction of isolated nodes or integrate additional biological information. However, although this can improve prediction accuracy, some interactions from other databases may contain noise that interferes with prediction results. In addition, the limitations of recommendation algorithm-based methods are described by subtype. Content-based recommendation models only need prior knowledge of the nodes, but new levels of disease-lncRNA associations cannot be recognized. Although collaborative filtering-based recommendation methods compensate for this shortcoming, the sparse lncRNA-disease association matrix is harmful to the recommendation, and the complexity and time cost of the algorithm will sharply increase when the amount of data is too large.
Additionally, matrix factorization-based recommendation methods reduce space complexity by mapping a matrix to a product of low-dimensional matrices. These methods also make it easy to add additional data from different sources and use the intrinsic structure. However, matrix factorization-based methods have the same weaknesses as collaborative filtering-based recommendation methods. The above computational approaches can thus complement each other. Therefore, multi-model integration-based methods were proposed to achieve better performance when investigating the association between lncRNAs and diseases. Finally, only a few computational approaches have been developed as online web tools, and most are still theoretical studies, which hampers their use by biologists and medical scientists. With the rapidly increasing knowledge of the functional mechanisms of lncRNAs, several challenges can be highlighted whose resolution would help improve the accuracy and practicality of the predictors. It is well known that the majority of lncRNAs exhibit precise subcellular localization, thus performing regulatory roles in a spatiotemporal manner. Therefore, some interactions between lncRNAs and other biological molecules (DNA, RNA, and proteins) used in previous predictors are derived from prediction and may not exist in real biological contexts. Thus, co-localization information of lncRNAs and other biological molecules should be considered. Additionally, it has also been observed that lncRNAs are expressed in highly cell type-specific, tissue-specific, and disease-specific manners. Therefore, more molecular information in the appropriate biological contexts should be introduced into predictors so that they are more suitable for the specific disease.
Finally, the prediction results of these computational approaches are only descriptive associations, and determining the specific association type (e.g., causal or non-causal association of an lncRNA with the disease) remains a challenging task. The implementation of efficient and reliable computational predictions, together with systematic biological experiments, will greatly accelerate the study of lncRNA functions and mechanisms in physiological and pathological conditions.

Author Contributions

J.S., C.X., and M.Z. designed the study. C.Y., Z.Z., S.B., and P.H. collected and reviewed literature. J.S., M.Z., and C.Y. drafted the manuscript. All authors read and approved the final manuscript.

Conflicts of Interest

The authors declare no competing interests.

73 in total

1. RNA maps reveal new RNA classes and a possible function for pervasive transcription.

Authors: Philipp Kapranov; Jill Cheng; Sujit Dike; David A Nix; Radharani Duttagupta; Aarron T Willingham; Peter F Stadler; Jana Hertel; Jörg Hackermüller; Ivo L Hofacker; Ian Bell; Evelyn Cheung; Jorg Drenkow; Erica Dumais; Sandeep Patel; Gregg Helt; Madhavan Ganesh; Srinka Ghosh; Antonio Piccolboni; Victor Sementchenko; Hari Tammana; Thomas R Gingeras
Journal: Science Date: 2007-05-17 Impact factor: 47.728

2. An Immune-Related Six-lncRNA Signature to Improve Prognosis Prediction of Glioblastoma Multiforme.

Authors: Meng Zhou; Zhaoyue Zhang; Hengqiang Zhao; Siqi Bao; Liang Cheng; Jie Sun
Journal: Mol Neurobiol Date: 2017-05-19 Impact factor: 5.590

3. Annotation of long non-coding RNAs expressed in collaborative cross founder mice in response to respiratory virus infection reveals a new class of interferon-stimulated transcripts.

Authors: Laurence Josset; Nicolas Tchitchek; Lisa E Gralinski; Martin T Ferris; Amie J Eisfeld; Richard R Green; Matthew J Thomas; Jennifer Tisoncik-Go; Gary P Schroth; Yoshihiro Kawaoka; Fernando Pardo Manuel de Villena; Ralph S Baric; Mark T Heise; Xinxia Peng; Michael G Katze
Journal: RNA Biol Date: 2014-06-12 Impact factor: 4.652

4. NONCODEV5: a comprehensive annotation database for long non-coding RNAs.

Authors: ShuangSang Fang; LiLi Zhang; JinCheng Guo; YiWei Niu; Yang Wu; Hui Li; LianHe Zhao; XiYuan Li; XueYi Teng; XianHui Sun; Liang Sun; Michael Q Zhang; RunSheng Chen; Yi Zhao
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

5. A Novel Network-Based Computational Model for Prediction of Potential LncRNA⁻Disease Association.

Authors: Yang Liu; Xiang Feng; Haochen Zhao; Zhanwei Xuan; Lei Wang
Journal: Int J Mol Sci Date: 2019-03-28 Impact factor: 5.923

6. LDAPred: A Method Based on Information Flow Propagation and a Convolutional Neural Network for the Prediction of Disease-Associated lncRNAs.

Authors: Ping Xuan; Lan Jia; Tiangang Zhang; Nan Sheng; Xiaokun Li; Jinbao Li
Journal: Int J Mol Sci Date: 2019-09-10 Impact factor: 5.923

Review 7. Long noncoding RNAs and the genetics of cancer.

Authors: S W Cheetham; F Gruhl; J S Mattick; M E Dinger
Journal: Br J Cancer Date: 2013-05-09 Impact factor: 7.640

8. ILNCSIM: improved lncRNA functional similarity calculation model.

Authors: Yu-An Huang; Xing Chen; Zhu-Hong You; De-Shuang Huang; Keith C C Chan
Journal: Oncotarget Date: 2016-05-03

9. EVLncRNAs: a manually curated database for long non-coding RNAs validated by low-throughput experiments.

Authors: Bailing Zhou; Huiying Zhao; Jiafeng Yu; Chengang Guo; Xianghua Dou; Feng Song; Guodong Hu; Zanxia Cao; Yuanxu Qu; Yuedong Yang; Yaoqi Zhou; Jihua Wang
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

10. Nc2Eye: A Curated ncRNAomics Knowledgebase for Bridging Basic and Clinical Research in Eye Diseases.

Authors: Yan Zhang; Zhengbo Xue; Fangjie Guo; Fulong Yu; Liangde Xu; Hao Chen
Journal: Front Cell Dev Biol Date: 2020-02-14
