Jian Fan.
Abstract
Noncoding RNAs (ncRNAs) are transcripts without protein-coding potential that play fundamental regulatory roles in diverse cellular processes and diseases. The application of deep sequencing experiments in ncRNA research has generated massive omics datasets, which require rapid examination, interpretation and validation based on existing knowledge resources. Thus, text-mining methods have been increasingly adapted for automatic extraction of relations between an ncRNA and its target or a disease condition from biomedical literature. These bioinformatics tools can also assist in more complex research, such as database curation of candidate ncRNAs and hypothesis generation with respect to pathophysiological mechanisms. In this concise review, we first introduced basic concepts and the workflow of literature mining systems. Then, we compared available bioinformatics tools tailored for ncRNA studies, including their tasks, applicability, and limitations. Their utility and flexibility are demonstrated by examples in a variety of diseases, such as Alzheimer's disease, atherosclerosis and cancers. Finally, we outlined several challenges from the viewpoints of both system developers and end users. We concluded that the application of text-mining techniques will boost disease-associated ncRNA discoveries in the biomedical literature and enable integrative biology in the current omics era.
Keywords: biomedical literature mining; deep sequencing; ncRNA; omics
Year: 2022 PMID: 35897884 PMCID: PMC9331993 DOI: 10.3390/molecules27154710
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.927
Figure 1. Growth of MEDLINE database. The number of articles published from the 1970s to the 2020s is shown chronologically. ‘All’ means all papers indexed in MEDLINE; ‘ncRNA’ means papers with keyword ‘ncRNA’.
Figure 2. Flow chart of a typical text-mining system. Literature mining systems can be roughly divided into a base level and a high level. At the base level, there are three important tasks: named entity recognition (NER), named entity normalization (NEN) and relation extraction (RE). Topic recognition (TR), knowledge discovery (KD) and database curation (DC) are handled at the high level.
Figure 3. An illustration of the NER, NEN and RE steps in text analysis. As an example, a real-world sentence from a circRNA research paper is used to visualize each step. In the first step, entity names, such as circRNA, gene and disease names, are recognized; in the second step, word variations, such as synonyms and abbreviations, are resolved by NEN methods; finally, relations between entities are identified by co-occurrence-based, linguistic rule-based or machine learning methods.
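The three base-level steps described in the captions above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any tool covered by the review: it assumes dictionary-based NER, lookup-based NEN and sentence-level co-occurrence for RE. The lexicon, the entity names (`circEX1`, `EXG2`), the example sentence and the function names `recognize`, `normalize` and `cooccurrence_relations` are all hypothetical and were invented for this sketch.

```python
import re
from itertools import combinations

# Hypothetical toy lexicon: lowercased surface form -> (entity type, canonical name).
# Synonyms and abbreviations map to the same canonical name.
LEXICON = {
    "circex1": ("circRNA", "circEX1"),
    "circ-ex1": ("circRNA", "circEX1"),
    "exg2": ("gene", "EXG2"),
    "hcc": ("disease", "hepatocellular carcinoma"),
    "hepatocellular carcinoma": ("disease", "hepatocellular carcinoma"),
}


def recognize(sentence):
    """NER: return dictionary surface forms found in the sentence, in text order."""
    mentions = []
    for surface in LEXICON:
        # Lookarounds keep a term from matching inside a longer token.
        pattern = r"(?<![\w-])" + re.escape(surface) + r"(?![\w-])"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            mentions.append((match.start(), match.group(0)))
    return [text for _, text in sorted(mentions)]


def normalize(mentions):
    """NEN: map each mention to (type, canonical name) and drop duplicates."""
    entities = [LEXICON[mention.lower()] for mention in mentions]
    return list(dict.fromkeys(entities))  # preserves first-seen order


def cooccurrence_relations(entities):
    """RE: treat every pair of distinct entities in one sentence as a candidate relation."""
    return [(a[1], b[1]) for a, b in combinations(entities, 2)]


if __name__ == "__main__":
    sentence = "Circ-EX1 regulates EXG2 in hepatocellular carcinoma (HCC)."
    entities = normalize(recognize(sentence))
    for pair in cooccurrence_relations(entities):
        print(pair)
```

Co-occurrence only proposes candidate pairs; it does not say what kind of relation holds, which is where the rule-based and machine learning approaches named in the caption would come in.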
Selected text-mining systems and tools for ncRNA studies.
| Tools | Tasks | Methods | PMID | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| NER & NEN | RE | DC | KD | Dictionary-Based | Co-Occurrence | Semantic Approaches | Rule-Based | Machine Learning | ||
| Bagewadi et al. [ | Y | Y | Y | Y | Y | 26535109 | ||||
| miRSel | Y | Y | Y | Y | 20233441 | |||||
| miRTex | Y | Y | Y | Y | 26407127 | |||||
| miRiaD | Y | Y | Y | 27216254 | ||||||
| IBRel | Y | Y | Y | 28263989 | ||||||
| DES-ncRNA | Y | Y | Y | 28387604 | ||||||
| emiRIT | Y | Y | Y | 34048547 | ||||||
| miRetrieve | Y | Y | Y | 34988440 | ||||||
| LSI | Y | Y | 27766940 | |||||||
| RWRMTN | Y | Y | 32539680 | |||||||
| atheMir | Y | Y | Y | Y | 31378854 | |||||
| Henry et al. [ | Y | Y | 34250435 | |||||||
Note: The main tasks of literature mining systems include named entity recognition (NER), named entity normalization (NEN), relation extraction (RE), database curation (DC) and knowledge discovery (KD).