Adil Salhi, Magbubah Essack, Tanvir Alam, Vladan P Bajic, Lina Ma, Aleksandar Radovanovic, Benoit Marchand, Sebastian Schmeier, Zhang Zhang, Vladimir B Bajic.
Abstract
Noncoding RNAs (ncRNAs), particularly microRNAs (miRNAs) and long ncRNAs (lncRNAs), are important players in diseases and are emerging as novel drug targets. Thus, unraveling the relationships between ncRNAs and other biomedical entities in cells is critical for better understanding ncRNA roles, which may eventually help develop their use in medicine. To support ncRNA research and facilitate retrieval of relevant information regarding miRNAs and lncRNAs from the plethora of published ncRNA-related research, we developed DES-ncRNA (www.cbrc.kaust.edu.sa/des_ncrna). DES-ncRNA is a knowledgebase containing text- and data-mined information from public scientific literature and other public resources. Exploration of mined information is enabled through terms and pairs of terms from 19 topic-specific dictionaries covering, for example, antibiotics, toxins, drugs, enzymes, mutations, pathways, human genes and proteins, drug indications and side effects, and diseases. DES-ncRNA contains approximately 878,000 associations of terms from these dictionaries, of which 36,222 involve miRNAs and 5,373 involve lncRNAs. We provide users with several ways to explore information regarding ncRNAs, including controlled generation of association networks as well as hypothesis generation. We show an example of how DES-ncRNA can aid research on Alzheimer disease and suggest a potential therapeutic role for Fasudil. DES-ncRNA is a powerful tool that can be used on its own, or as a complement to existing resources, to support research on human ncRNA. To our knowledge, this is the only knowledgebase dedicated to human miRNAs and lncRNAs that is derived primarily through literature-mining and enables exploration of a broad spectrum of associated biomedical entities.
Keywords: Alzheimer disease; Noncoding RNA; bioinformatics; data-mining; information integration; knowledgebase; literature-mining; long noncoding RNA; microRNA; text-mining
Year: 2017 PMID: 28387604 PMCID: PMC5546543 DOI: 10.1080/15476286.2017.1312243
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652
Figure 1. DES-ncRNA 3-tier server architecture.
Table 1. List of dictionaries used in DES-ncRNA, with the number of terms that each dictionary contains and the number of statistically significantly enriched normalized terms identified in the analyzed documents.
| Dictionary | # of terms in dictionaries | # of statistically significant terms in documents | Source |
|---|---|---|---|
| Antibiotics | 6,768 | 203 | pre-existing in DES |
| Chemical Entities of Biological Interest (ChEBI) | 164,419 | 3,943 | pre-existing in DES |
| Drugs (DrugBank + Chembl) | 40,131 | 1,563 | updated from Chembl |
| Enzymes (IntEnz) | 29,993 | 1,192 | pre-existing in DES |
| Metabolites (MetaboLights) | 59,569 | 1,139 | pre-existing in DES |
| Toxins (T3DB) | 47,140 | 728 | pre-existing in DES |
| Biological Process (GO) | 27,801 | 2,816 | pre-existing in DES |
| Cellular Component (GO) | 3,842 | 894 | pre-existing in DES |
| Disease Ontology (DO) | 23,553 | 1,701 | pre-existing in DES |
| Molecular Function (GO) | 10,796 | 717 | pre-existing in DES |
| Pathways (KEGG, Reactome, UniPathway, PANTHER) | 9,650 | 896 | pre-existing in DES |
| Drug Indications and Side Effects (SIDER) | 7,058 | 1,382 | Newly compiled |
| Human Anatomy | 7,167 | 1,476 | pre-existing in DES |
| Human Genes & Proteins (EntrezGene) | 206,179 | 16,214 | pre-existing in DES |
| Human Long Non-Coding RNAs (FARNA) | 176,516 | 230 | Updated |
| Human MicroRNAs (miRBase) | 9,471 | 931 | Updated |
| Human Transcription Factors (TcoF-DB) | 12,280 | 1,273 | Updated |
| Human Transcription Co-Factors (TcoF-DB) | 3,850 | 345 | Updated |
| Mutations (tmVar) | 192,936 | 7,447 | pre-existing in DES |
References for the data sources indicated in Table 1 are as follows: ChEBI, DrugBank, Chembl, MetaboLights, IntEnz, T3DB, Industrially Important EnzymesEC, GO, KEGG, Reactome, PANTHER, UniPathway, EntrezGene, NCBI Taxonomy, KOBAS, FARNA, miRBase, TcoF-DB, tmVar, SIDER.
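The way dictionary terms and pairs of terms are counted across documents can be illustrated with a minimal sketch. It assumes simple case-insensitive whole-term matching and document-level co-occurrence; the actual DES indexing and term normalization are not described here, and the dictionary contents, term names (`miR-A`, `disease X`, `gene Y`) and documents are hypothetical placeholders.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

# A tagged term is (dictionary name, term), e.g. ("Human MicroRNAs", "miR-A").
TaggedTerm = Tuple[str, str]


def find_terms(text: str, dictionaries: Dict[str, List[str]]) -> Set[TaggedTerm]:
    """Return every dictionary term that occurs in the text.

    Assumption: case-insensitive matching of the whole term, not
    embedded inside a longer word. Real systems normalize far more.
    """
    found: Set[TaggedTerm] = set()
    for dict_name, terms in dictionaries.items():
        for term in terms:
            pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add((dict_name, term))
    return found


def count_pairs(
    documents: Iterable[str], dictionaries: Dict[str, List[str]]
) -> Counter:
    """Count, for each pair of terms, the documents mentioning both."""
    counts: Counter = Counter()
    for text in documents:
        # Sorting gives each unordered pair one canonical orientation.
        terms = sorted(find_terms(text, dictionaries))
        for a, b in combinations(terms, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical dictionaries and documents, for illustration only.
dictionaries = {
    "Human MicroRNAs": ["miR-A"],
    "Disease Ontology": ["disease X"],
    "Human Genes & Proteins": ["gene Y"],
}
documents = [
    "miR-A is upregulated in disease X.",
    "Gene Y and miR-A interact in disease X.",
    "No relevant terms here.",
]
pair_counts = count_pairs(documents, dictionaries)
```

Here `pair_counts` maps each term pair to the number of documents in which both terms appear. Such counts are the kind of raw input on which an enrichment test could then be run.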
Table 2. Statistically significantly enriched pairs of terms as identified in the analyzed set of documents (pairs with FDR ≤ 0.05), when one member of the pair is a miRNA or lncRNA.
| Dictionary | # of statistically significantly enriched pairs of terms containing miRNAs | # of statistically significantly enriched pairs of terms containing lncRNAs |
|---|---|---|
| Antibiotics | 32 | 10 |
| Chemical Entities of Biological Interest (ChEBI) | 728 | 152 |
| Drugs (DrugBank + Chembl) | 357 | 56 |
| Enzymes (IntEnz) | 267 | 60 |
| Metabolites (MetaboLights) | 234 | 51 |
| Toxins (T3DB) | 236 | 57 |
| Biological Process (GO) | 913 | 102 |
| Cellular Component (GO) | 111 | 84 |
| Disease Ontology (DO) | 1,412 | 274 |
| Molecular Function (GO) | 112 | 43 |
| Pathways (KEGG, Reactome, UniPathway, PANTHER) | 518 | 71 |
| Drug Indications and Side Effects (SIDER) | 1,048 | 202 |
| Human Anatomy | 1,099 | 241 |
| Human Genes & Proteins (EntrezGene) | 7,303 | 3,055 |
| Human Long Non-Coding RNAs (FARNA) | 70 | |
| Human MicroRNAs (miRBase) | 70 | |
| Human Transcription Co-Factors (TcoF-DB) | 268 | 56 |
| Human Transcription Factors (TcoF-DB) | 1,123 | 345 |
| Mutations (tmVar) | 192,936 | 7,447 |
| Total | 36,222 | 5,373 |
| Searchable records (includes redundant inverse pairs for the same dictionary associations, i.e., for miRNA-miRNA and lncRNA-lncRNA associations) | 36,222+ | 5,373+ |
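The FDR ≤ 0.05 cutoff applied to the pairs in the table above can be sketched as follows. The multiple-testing correction used by DES-ncRNA is not stated here, so this example assumes the Benjamini-Hochberg procedure; the pair names and p-values are hypothetical and not taken from the knowledgebase.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def benjamini_hochberg(p_values: Dict[Pair, float]) -> Dict[Pair, float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) per pair."""
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted: Dict[Pair, float] = {}
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        pair, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[pair] = running_min
    return adjusted


def significant_pairs(
    p_values: Dict[Pair, float], fdr: float = 0.05
) -> Dict[Pair, float]:
    """Keep only pairs whose adjusted p-value is at or below the cutoff."""
    q_values = benjamini_hochberg(p_values)
    return {pair: q for pair, q in q_values.items() if q <= fdr}


# Hypothetical raw p-values for four term pairs (illustration only).
example = {
    ("miR-A", "disease X"): 0.001,
    ("miR-A", "gene Y"): 0.02,
    ("lncRNA-B", "disease X"): 0.03,
    ("lncRNA-B", "drug Z"): 0.5,
}
kept = significant_pairs(example)
```

With these made-up inputs, three of the four pairs pass the cutoff and the pair with a raw p-value of 0.5 is dropped.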
Table 3. Mapped entities from GO, Reactome and KOBAS resources.
| | # of total inferred hits | # of statistically enriched inferred hits |
|---|---|---|
| GO Terms | 12,755 | 2,893 |
| Reactome Pathways | 693 | 313 |
| KOBAS Pathways | 2,827 | 825 |
| KOBAS Diseases | 10,366 | 178 |
Figure 2. Step-by-step illustration of how DES-ncRNA can be used to identify ncRNA-related components involved in AD progression. The green circles represent the “Human MicroRNAs” dictionary; the gray upside-down triangles represent the “Cellular Component” dictionary; the yellow parallelograms represent the “Disease Ontology” dictionary; the yellow squares represent the “Human Genes and Proteins” dictionary; and the lime circles represent the “Human Long Non-Coding RNAs” dictionary. The edge color is distributed across a color spectrum from hot/red (high-frequency co-occurrence, strong association) to cold/blue (small number of co-occurrences, weaker association). The numbers on the edges give the number of publications that link the associated nodes.
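The edge coloring described in the caption can be approximated with a short sketch that maps publication counts to a red-to-blue spectrum. The linear scaling and the use of HSV hue are assumptions made for illustration, not the documented DES-ncRNA rendering; the node names and counts are hypothetical.

```python
import colorsys
from typing import Dict, Tuple

Edge = Tuple[str, str]


def edge_colors(counts: Dict[Edge, int]) -> Dict[Edge, str]:
    """Map co-occurrence counts to hex colors from blue (few) to red (many).

    Assumption: counts are scaled linearly between the smallest and
    largest value in the network; the real tool may scale differently.
    """
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    colors: Dict[Edge, str] = {}
    for edge, n in counts.items():
        strength = (n - lo) / span  # 0.0 = weakest link, 1.0 = strongest
        hue = (1.0 - strength) * 240.0 / 360.0  # 240 degrees = blue, 0 = red
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        colors[edge] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


# Hypothetical numbers of publications linking each pair of nodes.
publication_counts = {
    ("miR-A", "disease X"): 12,
    ("miR-A", "gene Y"): 5,
    ("lncRNA-B", "disease X"): 1,
}
colors = edge_colors(publication_counts)
```

In this example the most frequently co-mentioned pair is drawn in pure red and the least frequent in pure blue, with intermediate counts falling in between along the hue wheel.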