DES-ncRNA: A knowledgebase for exploring information about human micro and long noncoding RNAs based on literature-mining.

Adil Salhi¹, Magbubah Essack¹, Tanvir Alam¹, Vladan P Bajic², Lina Ma^3,4, Aleksandar Radovanovic¹, Benoit Marchand⁵, Sebastian Schmeier⁶, Zhang Zhang^3,4,7,8, Vladimir B Bajic¹.

Abstract

Noncoding RNAs (ncRNAs), particularly microRNAs (miRNAs) and long ncRNAs (lncRNAs), are important players in diseases and are emerging as novel drug targets. Thus, unraveling the relationships between ncRNAs and other biomedical entities in cells is critical for better understanding ncRNA roles, which may eventually help develop their use in medicine. To support ncRNA research and facilitate retrieval of relevant information regarding miRNAs and lncRNAs from the plethora of published ncRNA-related research, we developed DES-ncRNA (www.cbrc.kaust.edu.sa/des_ncrna). DES-ncRNA is a knowledgebase containing text- and data-mined information from public scientific literature and other public resources. Exploration of mined information is enabled through terms and pairs of terms from 19 topic-specific dictionaries including, for example, antibiotics, toxins, drugs, enzymes, mutations, pathways, human genes and proteins, drug indications and side effects, and diseases. DES-ncRNA contains approximately 878,000 associations of terms from these dictionaries, of which 36,222 involve miRNAs and 5,373 involve lncRNAs. We provide users with several ways to explore information regarding ncRNAs, including controlled generation of association networks as well as hypothesis generation. We show an example of how DES-ncRNA can aid research on Alzheimer disease and suggest a potential therapeutic role for Fasudil. DES-ncRNA is a powerful tool that can be used on its own or as a complement to existing resources to support research on human ncRNAs. To our knowledge, this is the only knowledgebase dedicated to human miRNAs and lncRNAs derived primarily through literature-mining that enables exploration of a broad spectrum of associated biomedical entities, not paralleled by any other resource.

Keywords: Alzheimer disease; Noncoding RNA; bioinformatics; data-mining; information integration; knowledgebase; literature-mining; long noncoding RNA; microRNA; text-mining

Year: 2017 PMID： 28387604 PMCID： PMC5546543 DOI： 10.1080/15476286.2017.1312243

Source DB: PubMed Journal: RNA Biol ISSN： 1547-6286 Impact factor: 4.652

Abbreviations: AD, Alzheimer disease; AJAX, Asynchronous JavaScript and XML; ChEBI, Chemical Entities of Biological Interest; CSS, Cascading Style Sheets; DES, Dragon Exploration System; DO, Disease Ontology; EC, Enzyme Commission; FARNA, Functional Annotation of RNA transcripts; FDR, False Discovery Rate; GO, Gene Ontology; HTML, HyperText Markup Language; IntEnz, Integrated relational Enzyme database; JSON, JavaScript Object Notation; KB, KnowledgeBase; KEGG, Kyoto Encyclopedia of Genes and Genomes; KOBAS, KO-Based Annotation System; lncRNA, long ncRNA; miRNA, microRNA; ncRNA, Noncoding RNA; NCBI, National Center for Biotechnology Information; PANTHER, Protein ANalysis THrough Evolutionary Relationships; SIDER, Side Effect Resource; sncRNAs, small ncRNAs; SNPs, Single Nucleotide Polymorphisms; T3DB, Toxin and Toxin Target Database; TcoF-DB, Transcription co-Factors DataBase; tmVar, Text-Mining for sequence Variants; XML, eXtensible Markup Language

Introduction

Noncoding RNAs (ncRNAs), in contrast to mRNAs, are RNA molecules that do not encode protein information. Many ncRNAs have unknown function and are transcribed from various places within the genome. Through high-throughput sequencing technologies, a considerable number of ncRNAs have been identified, many more than expected. New members and classes of ncRNAs are frequently detected and found to play crucial roles in a wide variety of important biologic processes. Among them, microRNAs (miRNAs), small ncRNAs of ∼22 nt in length, and long noncoding RNAs (lncRNAs), which are typically longer than 200 nt, are 2 popular and well-studied classes, featuring large numbers, ubiquitous distribution, significant functions in gene expression and regulation, and close associations with various human diseases. To facilitate in-depth investigations of human miRNAs and lncRNAs, several databases have been developed for collecting and managing their sequences and annotations. For miRNAs, representative examples include miRBase, miRDB, miRTarBase and DASHR. Specifically, miRBase is a central repository for miRNA sequence information, containing miRNAs from more than 200 species. miRDB is a database for miRNA target predictions and functional annotations, currently including 947,941 associations of predicted gene targets regulated by 2,588 human miRNAs. Unlike miRDB, miRTarBase is a database collecting experimentally validated miRNA-target interactions, containing 12,738 human genes targeted by 2,619 miRNAs. DASHR is a dedicated database for human small ncRNAs (sncRNAs) and contains information on ∼48,000 human sncRNA genes. Likewise, there are several representative examples for lncRNAs. NONCODE features a large collection of ncRNAs and contains 141,353 human lncRNA transcripts. DIANA-LncBase is a database indexing miRNA targets on ncRNAs and integrates an in silico predicted collection of ∼51,000 miRNA-lncRNA interaction pairs for human and mouse. 
LncRNA2Target stores lncRNA-“target gene” associations under the assumption that a gene is targeted by the lncRNA if the gene is differentially expressed after lncRNA knockdown or overexpression. LncRNAWiki, a core resource in the BIG Data Center, is a wiki-based database for community curation of human lncRNAs, which currently integrates a total of 105,824 lncRNAs and 9,387 computationally identified lncRNAs that potentially encode short proteins. RNAcentral provides a single entry point to access all types of ncRNA sequences, integrating information from 22 participating resources. There are also many other related ncRNA databases of interest. It has been documented that the roles of miRNAs and lncRNAs are critical for better understanding molecular regulation in humans, which affects a variety of biologic processes. However, to get deeper insight into the roles that miRNAs and lncRNAs may have in living cells, it is helpful to be able to explore associations of these ncRNAs with other biomedical entities. Interesting examples of resources that support this include ChemiRs, miRegulome, DisGeNET, miRCancer, LncRNADisease and FARNA. Each of these resources deals with few and/or specific types of these associations and would benefit from a broader spectrum of associations. This issue could be addressed to some extent by literature text-mining. However, efficient exploration of information in the biomedical field is not an easy task, as the volume of published information is substantial and continuously growing. The sheer volume of associations/links between relevant biomedical terms further exacerbates the problem. One approach to tackle this issue is to create topic-specific knowledgebases (KBs) that contain pre-computed term associations and are equipped with built-in information exploration capabilities to ease the task for researchers. 
Several such topic-specific KBs for life science have been developed, where text analysis has been applied to titles and abstracts of PubMed records. Although titles and abstracts of published scientific articles contain the highest information density, full-text articles provide significantly more information overall. To date, text-mining has been used to identify and extract term associations in the field of ncRNA research. Although these attempts at using literature-mining for ncRNA research have been effective in identifying very specific associations, the scope of these resources could be expanded to support broader information exploration. To enable a more comprehensive exploration of pairs of associated terms related to human miRNAs and lncRNAs, we developed the DES-ncRNA KB. This KB makes use of dictionaries that are pre-compiled and contain terms and phrases belonging to different thematic categories (e.g. pathways, genes, diseases, miRNAs, lncRNAs, chemicals, drugs, drug indications and side effects, etc.). These dictionaries are used to index text. DES-ncRNA enables exploration and discovery of statistically enriched terms from these dictionaries as found in the analyzed text and, more importantly, of the statistically enriched associations among the enriched terms. As the primary source of textual data, DES-ncRNA identified dictionary terms in titles and abstracts (retrieved from PubMed), as well as in open-access full-length articles (retrieved from PubMed Central and BioMed Central). Moreover, due to the importance of miRNAs and lncRNAs in studies of human diseases, relevant dictionaries have been included, so that users can explore different aspects of potential ncRNA roles in living cells and gain better insight into their roles in diseases. To illustrate how DES-ncRNA can assist researchers in the ncRNA domain, we present an example related to Alzheimer disease. 
To our knowledge, DES-ncRNA is the first KB dedicated to human miRNAs and lncRNAs that is derived through literature-mining, focuses on term associations, has comprehensive information exploration capabilities, covers a comprehensive spectrum of associations and offers users a variety of options to localize information of interest. There is no similar KB available for support of ncRNA research.

Materials and methods

Server architecture and underlying systems

The server architecture is based on a 3-tier model: data, logic and presentation tiers, implemented as shown in Fig. 1. The traditional SQL-based data tier is expanded with a NoSQL document store. The technologies used are PostgreSQL and MongoDB.

Figure 1.

DES-ncRNA 3-tier server architecture.

NoSQL document storage is populated with publicly available biomedical literature from BioMed Central, PubMed and PubMed Central. Purpose-written web crawlers regularly gather data from these sources and provide cross-system data federation. The JSON (JavaScript Object Notation) format is used for data federation since it has less syntactic overhead for similar amounts of information, is understood and processed by all tiers without much transformation and can be easily expanded if new data sources are included. NoSQL document storage was chosen since it is JSON-based, and as such has a flexible dynamic structure with no schema constraints. The resulting “Bio-Text” repository is fault tolerant and horizontally scalable. However, a SQL database is still used for indexes related to the KB, since it has a more flexible and powerful query system for relational data. The logic tier responds to user queries with data provided by the data tier. In addition, this tier handles data integration from SQL and NoSQL data sources (e.g., the literature view is constructed by fetching the relevant documents from the text repository (NoSQL) and annotating them using the index from the SQL store). Software modules in this tier respond to various AJAX (Asynchronous JavaScript and XML) calls from the presentation layer. The presentation tier uses jQuery to provide separation between presentation-related HTML/CSS code and server-side logic. jQuery provides AJAX for asynchronous background calls to the logic tier, native JSON parsing, and dynamic rendering of the user's browser display.
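The way the logic tier combines the two data stores can be illustrated with a small sketch. It is not the DES-ncRNA implementation: an in-memory dictionary of JSON strings stands in for the MongoDB text repository, SQLite stands in for PostgreSQL, and the table name `term_index`, its columns, the sample document and the bracketed annotation format are all assumptions made for illustration.

```python
import json
import sqlite3

# Hypothetical stand-in for the NoSQL "Bio-Text" repository: JSON documents by ID.
TEXT_REPOSITORY = {
    "doc1": json.dumps({
        "title": "BACE1-AS and miR-485-5p in Alzheimer disease",
        "abstract": "BACE1-AS masks the miR-485-5p binding site on BACE1.",
    }),
}

def build_index(conn):
    """Create a toy SQL index holding character offsets of dictionary terms."""
    conn.execute(
        "CREATE TABLE term_index "
        "(doc_id TEXT, term TEXT, dictionary TEXT, start_pos INTEGER, end_pos INTEGER)"
    )
    abstract = json.loads(TEXT_REPOSITORY["doc1"])["abstract"]
    for term, dictionary in [("BACE1-AS", "lncRNA"), ("miR-485-5p", "miRNA")]:
        start = abstract.find(term)
        conn.execute(
            "INSERT INTO term_index VALUES (?, ?, ?, ?, ?)",
            ("doc1", term, dictionary, start, start + len(term)),
        )

def literature_view(conn, doc_id):
    """Fetch a document from the text store and annotate it from the SQL index."""
    doc = json.loads(TEXT_REPOSITORY[doc_id])
    rows = conn.execute(
        "SELECT dictionary, start_pos, end_pos FROM term_index "
        "WHERE doc_id = ? ORDER BY start_pos DESC",
        (doc_id,),
    ).fetchall()
    text = doc["abstract"]
    # Insert markers from right to left so earlier offsets remain valid.
    for dictionary, start, end in rows:
        text = text[:start] + f"[{dictionary}:{text[start:end]}]" + text[end:]
    return {"title": doc["title"], "annotated_abstract": text}

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_index(connection)
    print(literature_view(connection, "doc1")["annotated_abstract"])
```

Storing offsets in the relational index rather than in the documents keeps the text repository schema-free, which mirrors the division of labor described above.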

Implementation and testing

DES-ncRNA is hosted on a CentOS-7 operating system using an Apache (2.4.6) web server. The literature repository is hosted on a MongoDB (2.6.11) database, and the KB index and related tables are stored in a PostgreSQL (9.2.15) database. DES-ncRNA was created using an Apache Lucene text index for fast querying of the text. Different components of the KB were developed using various programming languages/tools, namely: Java (openjdk 1.8.0_91), C/C++ (gcc 4.8.5), Perl (v5.16.3), PHP (5.4.16), JavaScript, and jQuery (3.0.0). DES-ncRNA is functional across major web browsers on Linux, Windows, and Mac OS platforms. It was specifically tested with Firefox, Chrome and Safari. DES-ncRNA was not tested on hand-held devices, and is not currently intended for such use.

Preparing the literature corpus

To create DES-ncRNA, we first queried our local literature repository, a MongoDB repository hosting PubMed, PubMed Central, and BioMed Central articles. The following DES-ncRNA query was used to create the ncRNA literature corpus: [((long AND (ncRNAs OR ncRNA OR “non-coding” OR noncoding OR “non coding”)) OR lncRNA OR lncRNAs OR “linc RNA” OR “linc RNAs” OR lincRNA OR lincRNAs OR microRNA OR miRNA OR microRNAs OR miRNAs OR “micro RNA” OR “micro RNAs”) AND (human OR humans OR “homo sapiens”)]. The query was made on 2016-12-25, and retrieved 31,074 articles.
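The Boolean logic of this query can be sketched as a plain text filter. The functions `has` and `matches_ncrna_query` are hypothetical names, and the case-insensitive whole-word matching is only an approximation; the actual query was run against the indexed repository, whose tokenization may differ.

```python
import re

def has(text, phrase):
    """True if the phrase occurs as a whole word or phrase, ignoring case."""
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def matches_ncrna_query(text):
    """Approximate the corpus query: (long-ncRNA clause OR RNA terms) AND human."""
    def any_of(phrases):
        return any(has(text, p) for p in phrases)

    long_ncrna = has(text, "long") and any_of(
        ["ncRNAs", "ncRNA", "non-coding", "noncoding", "non coding"]
    )
    rna_terms = any_of([
        "lncRNA", "lncRNAs", "linc RNA", "linc RNAs", "lincRNA", "lincRNAs",
        "microRNA", "miRNA", "microRNAs", "miRNAs", "micro RNA", "micro RNAs",
    ])
    human = any_of(["human", "humans", "homo sapiens"])
    return (long_ncrna or rna_terms) and human

if __name__ == "__main__":
    print(matches_ncrna_query("MicroRNAs regulate gene expression in human cells."))
    print(matches_ncrna_query("A long noncoding RNA in mouse liver."))
```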

Preparing the dictionaries

To ensure relevance and comprehensiveness, we used 19 dictionaries from the pre-existing DES v2.0 vocabularies (Table 1). Frequently, dictionary terms have several synonymous words/phrases. Where possible, we normalize them to the same internal identifier in DES-ncRNA. Such an approach allows for the universal identification of terms, for example using IDs from authoritative sources such as EntrezGene, ChEBI or UniProt, and makes it possible to complement text-mined information with information from such external sources as well as with links to these sources.
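A minimal sketch of this kind of synonym normalization follows. The identifiers `HYPO:0001` and `HYPO:0002` are placeholders rather than real EntrezGene or miRBase IDs, and the case-insensitive exact lookup is an assumption; the synonym handling in DES-ncRNA may be more elaborate.

```python
# Hypothetical dictionary: one internal identifier per concept, with its synonyms.
SYNONYMS = {
    "HYPO:0001": ["DLG4", "PSD-95", "SAP-90", "discs, large homolog 4 (Drosophila)"],
    "HYPO:0002": ["MIR485", "miR-485"],
}

def build_lookup(synonyms):
    """Map every lower-cased name, symbol or synonym to its internal identifier."""
    return {
        name.lower(): term_id
        for term_id, names in synonyms.items()
        for name in names
    }

def normalize(term, lookup):
    """Return the internal identifier for a term, or None if it is unknown."""
    return lookup.get(term.strip().lower())

if __name__ == "__main__":
    lookup = build_lookup(SYNONYMS)
    print(normalize("psd-95", lookup), normalize("SAP-90", lookup))
```

Because all synonyms resolve to one identifier, counts for a concept are pooled regardless of which name an article uses.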

Table 1.

Dictionary	# of terms in dictionaries	# of statistically significant terms in documents	Source
Chemicals/Compounds
Antibiotics	6,768	203	pre-existing in DES
Chemical Entities of Biological Interest (ChEBI)	164,419	3,943	pre-existing in DES
Drugs (DrugBank + Chembl)	40,131	1,563	updated from Chembl
Enzymes (IntEnz)	29,993	1,192	pre-existing in DES
Metabolites (MetaboLights)	59,569	1,139	pre-existing in DES
Toxins (T3DB)	47,140	728	pre-existing in DES
Functional Annotation
Biological Process (GO)	27,801	2,816	pre-existing in DES
Cellular Component (GO)	3,842	894	pre-existing in DES
Disease Ontology (DO)	23,553	1,701	pre-existing in DES
Molecular Function (GO)	10,796	717	pre-existing in DES
Pathways (KEGG, Reactome, UniPathway, PANTHER)	9,650	896	pre-existing in DES
General
Drug Indications and Side Effects (SIDER)	7,058	1,382	Newly compiled
Human Anatomy	7,167	1,476	pre-existing in DES
Genes/Proteins/Transcripts
Human Genes & Proteins (EntrezGene)	206,179	16,214	pre-existing in DES
Human Long Non-Coding RNAs (FARNA)	176,516	230	Updated
Human MicroRNAs (miRBase)	9,471	931	Updated
Human Transcription Factors (TcoF-DB)	12,280	1,273	Updated
Human Transcription Co-Factors (TcoF-DB)	3,850	345	Updated
Mutations (tmVar)	192,936	7,447	pre-existing in DES

References for the data sources indicated in Table 1 are as follows: ChEBI, DrugBank, Chembl, MetaboLights, IntEnz, T3DB, Industrially Important Enzymes, EC, GO, KEGG, Reactome, PANTHER, UniPathway, EntrezGene, NCBI Taxonomy, KOBAS, FARNA, miRBase, TcoF-DB, tmVar, SIDER.

List of dictionaries used in DES-ncRNA, with the number of terms that each dictionary contains and the number of statistically significantly enriched normalized terms identified in the analyzed documents.

Terms in these dictionaries are mined in the retrieved articles, highlighted and color-coded according to dictionary. This process is enabled by the back-end index that matches terms to their occurrences, down to the character level, within the mined articles. A term is defined as enriched when it is overrepresented in DES-ncRNA documents as compared with all PubMed, PubMed Central, and BioMed Central articles in our local repository. We used a false discovery rate (FDR) <0.05, calculated using the Benjamini-Hochberg procedure to correct for multiple testing. Terms in all dictionaries are normalized, i.e. names, symbols and synonyms referring to the same concept are represented by a single entity when analyzed.
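The enrichment criterion can be sketched as follows. The text specifies the Benjamini-Hochberg correction but not the underlying statistical test, so the one-sided hypergeometric test used here is an assumption, as are the function names and the toy counts.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k term-containing documents in a
    corpus of n documents drawn from N, of which K contain the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    # Toy counts: (k in corpus, corpus size n, K in repository, repository size N).
    counts = {"termA": (8, 10, 20, 1000), "termB": (1, 10, 500, 1000)}
    names = list(counts)
    pvals = [hypergeom_upper_tail(*counts[name]) for name in names]
    for name, q in zip(names, benjamini_hochberg(pvals)):
        print(name, "enriched" if q < 0.05 else "not enriched")
```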

Results

Knowledgebase statistics

Table 1 shows the dictionaries used to mine the text documents, as well as the statistically significantly enriched unique terms found in these documents. While only 45,090 unique normalized statistically enriched terms were identified from the 1,039,119 terms contained in the dictionaries used, the number of statistically significantly enriched pairs of these terms is 877,977. Table 2 summarizes the associations of miRNAs and lncRNAs with terms from the 19 dictionaries.

Table 2.

Statistically significantly enriched pairs of terms as identified in the analyzed set of documents (pairs with FDR ≤ 0.05), when one member of the pair is a miRNA or lncRNA.

Dictionary	# of statistically significantly enriched pairs of terms containing miRNAs	# of statistically significantly enriched pairs of terms containing lncRNAs
Chemicals/Compounds
Antibiotics	32	10
Chemical Entities of Biological Interest (ChEBI)	728	152
Drugs (DrugBank + Chembl)	357	56
Enzymes (IntEnz)	267	60
Metabolites (MetaboLights)	234	51
Toxins (T3DB)	236	57
Functional Annotation
Biological Process (GO)	913	102
Cellular Component (GO)	111	84
Disease Ontology (DO)	1,412	274
Molecular Function (GO)	112	43
Pathways (KEGG, Reactome, UniPathway, PANTHER)	518	71
General
Drug Indications and Side Effects (SIDER)	1,048	202
Human Anatomy	1,099	241
Genes/Proteins/Transcripts
Human Genes & Proteins (EntrezGene)	7,303	3,055
Human Long Non-Coding RNAs (FARNA)	70	364
Human MicroRNAs (miRBase)	19,724	70
Human Transcription Co-Factors (TcoF-DB)	268	56
Human Transcription Factors (TcoF-DB)	1,123	345
Mutations (tmVar)	667	80
Total	36,222	5,373
Searchable records (includes redundant inverse pairs for the same dictionary associations, i.e., for miRNA-miRNA and lncRNA-lncRNA associations)	36,222 + 19,724 = 55,946	5,373 + 364 = 5,737

Using the identified human genes and proteins that are statistically significantly enriched, we determined the GO, Reactome pathway, KOBAS pathway and KOBAS disease terms with which these genes and proteins are associated. The results are summarized in Table 3. These mapped entities are not necessarily present in the text documents analyzed.

Table 3.

Mapped entities from GO, Reactome and KOBAS resources.

	# of total inferred hits	# of statistically enriched inferred hits
GO Terms	12,755	2,893
Reactome Pathways	693	313
KOBAS Pathways	2,827	825
KOBAS Diseases	10,366	178

Utilization

DES-ncRNA provides users with tools that allow them to easily explore literature using statistically enriched single terms and pairs of terms, as well as potential hypotheses. These exploration tools are located in the main menu at the top of the DES-ncRNA homepage, and include “Enriched Terms,” “Enriched Term Pairs,” “Explore Hypotheses,” “GO Terms,” “Reactome Pathways,” “KOBAS Pathways,” “KOBAS Diseases” and “Show Literature.” An in-depth description of each exploration tool was previously given in Salhi et al. Briefly, the “Enriched Terms” option allows users to mine ncRNA-related literature using single biologic terms/keywords (such as XIST, HOTAIR, MIR21, plasma membrane, bone marrow etc.) organized into thematic dictionaries. The “Enriched Term Pairs” tool allows users to perform in-depth literature-mining for pairs of biologic terms/keywords co-occurring in the same text (identified at the title and abstract level of all available documents, and at sentence level in the remaining full-text document if available), thereby inferring possible biologic connections. The third tool, “Explore Hypotheses,” can be used to check whether the inferred biologic connections/associations in “Enriched Term Pairs” are known or are novel and can serve as a starting point for further investigation. Each of these exploration tools has a “Help” link that provides simple instructions on how to use the tool. Exploration via these tools allows users to view enriched terms in pre-compiled theme-based dictionaries that can be viewed and restricted by several types of ranking options. Additionally, when a user hovers the mouse pointer over a term, a hover box is activated with “Network,” “Term Co-occurrences” and “Term Link Sources” links that are generated for the term of interest. 
On the pages for “Enriched Terms,” “Enriched Term Pairs,” and “Explore Hypotheses,” we added a link to FARNA, shown when the term in question is a miRNA or lncRNA, where users can find more information and a comprehensive annotation for the ncRNA. This is achieved through FARNA's RESTful API, where the ncRNA is used as the query term. However, it is worth noting that not all ncRNAs identified within DES-ncRNA have annotations in FARNA. “KOBAS Pathways” integrates several pathway databases including “Reactome Pathways”; however, due to the large number of metabolic pathways for more than 2,500 organisms, metabolic pathways in humans (in “Reactome Pathways”) are usually not enriched. In DES-ncRNA we provide “Reactome Pathways” as a separate exploration tool to ensure that metabolic pathways in humans are fairly enriched, as several ncRNAs have been linked to disease states. Accordingly, “GO Terms” and “KOBAS Diseases” are also presented as exploration tools. Using the “GO Terms,” “Reactome Pathways,” “KOBAS Pathways” and “KOBAS Diseases” tools allows users to easily identify GO terms, pathways and diseases frequently associated with ncRNAs. The basic functionalities of DES-ncRNA are demonstrated in a short introductory video on the “Home” page, and a detailed “Software Manual” can be downloaded from the website.
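The co-occurrence rule behind “Enriched Term Pairs” (title and abstract treated as one unit, full text treated sentence by sentence) can be sketched as below. The document fields `title`, `abstract` and `full_text`, the naive sentence splitter and the regular-expression term matching are assumptions for illustration; DES-ncRNA itself relies on its character-level back-end index.

```python
import re
from collections import Counter
from itertools import combinations

def mentions(term, text):
    """True if the term occurs and is not part of a longer hyphenated token."""
    pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

def cooccurring_pairs(doc, terms):
    """Pairs of terms sharing the title/abstract or one full-text sentence."""
    units = [doc["title"] + " " + doc["abstract"]]
    units += re.split(r"(?<=[.!?])\s+", doc.get("full_text", ""))
    found = set()
    for unit in units:
        present = sorted(t for t in terms if mentions(t, unit))
        found.update(combinations(present, 2))
    return found

def count_pairs(docs, terms):
    """Number of documents supporting each co-occurring term pair."""
    counts = Counter()
    for doc in docs:
        counts.update(cooccurring_pairs(doc, terms))
    return counts

if __name__ == "__main__":
    docs = [
        {"title": "BACE1-AS in Alzheimer disease",
         "abstract": "MIR485 competes with BACE1-AS."},
        {"title": "Study", "abstract": "We studied BACE1.",
         "full_text": "MIR485 binds BACE1. BACE1-AS was not measured."},
    ]
    print(count_pairs(docs, ["BACE1", "BACE1-AS", "MIR485"]))
```

The hyphen-aware lookarounds keep a mention of ‘BACE1-AS’ from also being counted as ‘BACE1’, which a plain substring search would do.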

Case study that illustrates the usefulness of DES-ncRNA as an effective research support system: Progression of Alzheimer disease and potential therapeutic use of Fasudil

Using DES-ncRNA we can extract information regarding a specific disease, such as Alzheimer disease (AD), and the suggested involvement of ncRNAs in the disease pathology, to explore new targeting strategies. Here, we explore the associations of ncRNAs and their role in AD. Deciphering molecular signatures triggered by the amyloid cascade, which plays an important role in AD, has been a daunting task due to the complexity of the regulatory networks involving transcription factors and other regulatory proteins, ncRNAs, and numerous interactions between these entities. To explore potential ncRNAs involved in this process, we start by clicking “Enriched Term Pairs” (Fig. 2, Step 1). This opens a page with all enriched associated terms for all dictionaries. Since the miRNA miR-485-5p transcript is known to be deregulated in AD patients, we filter the first dictionary (term A) for miR-485-5p by typing mir485 in the white input field above the “Select the first dictionary” button. The 2 top-scoring associated terms form the pairs ‘mir485’-‘BACE1’ and ‘mir485’-‘BACE1-AS’ (Fig. 2, Step 2). Both ‘BACE1’ and ‘BACE1-AS’ are known to play key roles in AD progression. Since the focus of this exploration is ncRNAs, we chose to expand our search through ‘BACE1-AS’. In the “Network” window, we clicked on “Dictionaries” and checked the “Disease Ontology,” “Human Genes and Proteins” and “Human MicroRNAs” dictionaries. We then right-clicked on ‘BACE1-AS’ and selected “Expand from the term” (Fig. 2, Step 2). We repeated this process by expanding from ‘MIR485’, the only miRNA retrieved with ‘BACE1-AS’, using the “Disease Ontology,” “Human Genes and Proteins” and “Human Long Non-Coding RNAs” dictionaries. All nodes retrieved from ‘MIR485’ were further expanded with the “Cellular Component,” “Disease Ontology” and “Human Genes and Proteins” dictionaries. The resulting network was simplified by removing nodes with a single link (Fig. 2, Step 3).
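The expand-and-simplify procedure used in this case study can be sketched on a toy association table. The publication counts below are invented for illustration, the function names are hypothetical, and the single-pass pruning is one reading of "removing nodes with a single link"; this is not the DES-ncRNA network code.

```python
# Toy association table: term -> [(associated term, dictionary, publications)].
# The counts are invented placeholders, not values from DES-ncRNA.
ASSOCIATIONS = {
    "BACE1-AS": [
        ("MIR485", "Human MicroRNAs", 3),
        ("BACE1", "Human Genes and Proteins", 5),
        ("Alzheimer disease", "Disease Ontology", 4),
    ],
    "MIR485": [
        ("BACE1", "Human Genes and Proteins", 2),
        ("NTRK3", "Human Genes and Proteins", 1),
    ],
}

def expand(network, associations, term, dictionaries):
    """Add edges from `term` to its associated terms in the chosen dictionaries."""
    for neighbor, dictionary, n_pubs in associations.get(term, []):
        if dictionary in dictionaries:
            network.setdefault(term, {})[neighbor] = n_pubs
            network.setdefault(neighbor, {})[term] = n_pubs
    return network

def prune_single_link_nodes(network):
    """Remove every node that has only one link (single pass)."""
    keep = {node for node, links in network.items() if len(links) > 1}
    return {
        node: {n: w for n, w in links.items() if n in keep}
        for node, links in network.items()
        if node in keep
    }

if __name__ == "__main__":
    net = {}
    expand(net, ASSOCIATIONS, "BACE1-AS",
           {"Disease Ontology", "Human Genes and Proteins", "Human MicroRNAs"})
    expand(net, ASSOCIATIONS, "MIR485",
           {"Disease Ontology", "Human Genes and Proteins",
            "Human Long Non-Coding RNAs"})
    print(sorted(prune_single_link_nodes(net)))
```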

Figure 2.

Step-by-step illustration of how DES-ncRNA can be used to identify ncRNA-related components involved in AD progression. The green circles represent the “Human MicroRNAs” dictionary; the gray upside-down triangles represent the “Cellular Component” dictionary; the yellow parallelograms represent the “Disease Ontology” dictionary; the yellow squares represent the “Human Genes and Proteins” dictionary; and the lime circles represent the “Human Long Non-Coding RNAs” dictionary. The edge color is distributed across a color spectrum from hot/red (high-frequency co-occurrence, strong association) to cold/blue (small number of co-occurrences, weaker association). The numbers on the edges give the number of publications that link the associated nodes.

The final network is clearly divided into 2 sub-networks; one is centered on ‘MIR485’ while the other is centered on ‘BACE1-AS’ (Fig. 2), which is expected as BACE1-AS prevents miRNA-induced repression of BACE1 by masking the binding site for miR-485-5p. Both BACE1-AS and miR-485-5p are deregulated in RNA samples from AD patients; however, their opposing roles suggest that they are each part of separate smaller networks focused on a common goal. Within the ‘MIR485’ subnetwork, smaller networks centered on ‘NTRK3’ and ‘discs, large homolog 4 (Drosophila)’ are visible. The ‘discs, large homolog 4 (Drosophila)’ protein (also known as PSD-95 or SAP-90) is a molecule known to regulate synaptic plasticity; thus, its impairment is crucial to memory symptomatology in AD. Also, PSD-95 was demonstrated to play a key role in aging and other psychiatric disorders, making it an interesting molecule for chemical management. Moreover, increased miR-485 expression at presynapses was found to correlate with prevention of clustering of PSD-95, a post-synaptic protein. Mouse models have revealed that the levels of PSD-95 are reduced by Aβ deposition in vulnerable brain areas, in excitatory synapses in the hippocampus. 
Similar relevant connections can be made for ‘NTRK3’ (not discussed). Looking for a possible therapeutic strategy suggested by our query, we found that Fasudil may act on multiple targets, in our case the BACE and PSD-95 proteins. Using the APP/PS1 Tg mouse model, Yu et al. demonstrated that Fasudil decreases levels of BACE and increases levels of PSD-95, suggesting that a multitarget approach is an appropriate strategy to pursue. Fasudil also inhibited several inflammatory factors, suggesting a role in neuroprotection and immunomodulation. These results suggest the hypothesis that repression of neurogenesis “at the right time” may help to delay the onset of AD. This also suggests the importance of finding early biomarkers for increased neurogenesis in non-symptomatic stages of dementia.

Discussions and concluding remarks

DES-ncRNA is able to extract specific information on links between terms from 19 topic-specific dictionaries from the analyzed documents, as well as to link the extracted information to information in external resources. Such inferred information may not necessarily be present in the analyzed text. Overall, with the provided exploration interface and several integrated tools, DES-ncRNA can support exploration of many research questions that ncRNA researchers may have. It is worth mentioning that no similar system exists for support of miRNA and lncRNA research in humans. With the increasing complexity of the rapidly growing information corpus related to molecular processes in living cells and the control mechanisms they are subjected to, it becomes more and more difficult to analyze that information. This is even more pronounced in relation to diseases and the understanding of their molecular functioning, which would help in developing efficient diagnosis and treatment. The miRNA and lncRNA research fields are relatively new. However, these molecules have been found to be critical components in regulating fundamental cellular processes and many diseases. With the advent of advanced experimental high-throughput technologies, the volume of information derived through them is already enormous and is quickly increasing. It appears infeasible even for a large research group to follow all developments even in a selected topic. Thus, there is a need for tools that can assist researchers in summarizing the key information on selected topics and enable them to explore potential associations of terms as suggested by the mined text. For this reason we believe that systems such as DES-ncRNA will be of help to the ncRNA research community. DES-ncRNA allows for linking numerous types of entities (as defined in the dictionaries used), generating association networks of these entities, and exploring hypotheses in ways that no other system currently provides. 
Nonetheless, it should be acknowledged that every existing resource aimed at supporting research in particular fields of science has numerous limitations. DES-ncRNA has the same types of shortcomings as other text-mining-based resources. For example, it is confined to information presented in the available documents. It should be noted that many full-text documents are not permitted to be text-mined, which also limits what information can be extracted. In addition, all text-mining systems are far from being able to extract all useful information from the available texts. This is a field that will require significant improvements. In future developments of the DES-ncRNA system, we intend to expand it with additional types of ncRNAs and with the construction of ncRNA regulatory networks based on large-scale data sets and high-volume scientific literature. DES-ncRNA will be updated regularly.

Availability and requirements

DES-ncRNA is free for academic and nonprofit users and can be accessed through the portal (www.cbrc.kaust.edu.sa/des_ncrna). Users can access the KB using any of the mainstream web browsers, including Firefox, Safari and Chrome.
