Sofia I R Conceição, Francisco M Couto.
Abstract
In the assembly of biological networks, it is important to provide reliable interactions so that the network represents real-life systems as accurately as possible. Commonly, the data used to build a network come from diverse high-throughput assays; however, most of the interaction data are available through the scientific literature. This has become a challenge with the notable increase in published scientific literature, as it is hard for human curators to track all recent discoveries without efficient tools that help them identify these interactions automatically. This challenge can be overcome with text mining approaches, which are capable of extracting knowledge from scientific documents. One of the most important text mining tasks for biological network building is relation extraction, which identifies relations between the entities of interest. Many interaction databases already use text mining systems, and the development of these tools will lead to more reliable networks, as well as the possibility of personalizing the networks by selecting the desired relations. This review focuses on different approaches to automatic information extraction from biomedical text that can be used to enhance existing networks or create new ones, such as state-of-the-art deep learning approaches, with cancer as a case study.
Keywords: cancer; natural language processing; network biology; text mining
Year: 2021 PMID: 34680062 PMCID: PMC8533101 DOI: 10.3390/biom11101430
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
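The pipeline outlined in the abstract, recognizing entities in text and then extracting relations between them to form network edges, can be illustrated with a minimal sketch. This is not one of the methods reviewed in the article: it swaps in a simple dictionary lookup for entity recognition and sentence-level co-occurrence for relation extraction, and the lexicon, the sentences, and the function names `find_entities` and `build_network` are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical entity dictionary: surface form -> (canonical name, entity type).
LEXICON = {
    "brca1": ("BRCA1", "gene"),
    "tp53": ("TP53", "gene"),
    "breast cancer": ("breast cancer", "disease"),
    "tamoxifen": ("tamoxifen", "drug"),
}

# Made-up sentences standing in for text taken from abstracts.
SENTENCES = [
    "BRCA1 mutations increase the risk of breast cancer.",
    "Tamoxifen is widely used to treat breast cancer.",
    "TP53 and BRCA1 are frequently altered in breast cancer.",
]


def find_entities(sentence):
    """Dictionary-based entity recognition: return the lexicon entities found."""
    text = sentence.lower()
    found = set()
    for surface, entity in LEXICON.items():
        # Whole-word match so that short names do not fire inside longer words.
        if re.search(r"\b" + re.escape(surface) + r"\b", text):
            found.add(entity)
    return found


def build_network(sentences):
    """Co-occurrence relation extraction: count entity pairs sharing a sentence."""
    edges = Counter()
    for sentence in sentences:
        names = sorted(name for name, _ in find_entities(sentence))
        for a, b in combinations(names, 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    for (a, b), weight in sorted(build_network(SENTENCES).items()):
        print(f"{a} -- {b} (weight {weight})")
```

Co-occurrence only signals that two entities are mentioned together, not what kind of relation holds between them, which is one reason supervised and deep learning approaches to relation extraction are of interest.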
Figure 1. Resource Description Framework example.
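The Resource Description Framework expresses each fact as a subject-predicate-object triple. A minimal sketch of that data model follows, assuming a hypothetical `http://example.org/` namespace and identifiers that are not taken from the original figure; `to_ntriples` and `objects_of` are illustrative helper names, not part of any RDF library.

```python
# Hypothetical namespace and identifiers, chosen only for illustration.
EX = "http://example.org/"

# Each RDF statement is a (subject, predicate, object) triple.
TRIPLES = [
    (EX + "BRCA1", EX + "associatedWith", EX + "BreastCancer"),
    (EX + "Tamoxifen", EX + "treats", EX + "BreastCancer"),
]


def to_ntriples(triples):
    """Serialize IRI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


def objects_of(triples, subject, predicate):
    """Return every object linked to a subject by a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Because every statement has the same three-part shape, triples extracted from different sources can be merged into one graph simply by concatenating the lists.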
Summary of the cited text mining methods and the text mining task each one addresses (NER: named entity recognition; RE: relation extraction).
| Method | Target | NER | RE | Reference |
|---|---|---|---|---|
| LSTM 1-CRF | Genes/proteins, chemicals, diseases, cell lines and species entity types | X | | [ |
| CollaboNet | Gene/protein, disease and chemicals | X | | [ |
| BioBERT | Gene-Disease and Protein-Chemical | | X | [ |
| BO-LSTM | Drug-Drug | | X | [ |
| BiOnt | Gene, phenotypes, disease and drugs combinations | | X | [ |
| RNN 3 + CNN 2 | Protein-Protein and Drug-Drug | | X | [ |
| LSTM 1 + CNN 2 | Protein-Protein | | X | [ |
| graph LSTM 1 | Drug-gene-mutation | | X | [ |
| graph LSTM 1 | Drug-gene-mutation | | X | [ |
| SETH | Gene variant normalization to dbSNP or UniProt | X | | [ |
| BeFree | Gene-disease and variant-disease | X | X | [ |
| LHGDN | Gene-Disease | X | X | [ |
| Link | Genes, diseases, drugs and key concepts | X | | [ |
1 Long Short-Term Memory; 2 Convolutional Neural Network; 3 Recurrent Neural Network.
Summary of the cited cancer text mining methods.
| Target Cancer | Method | Source | Reference |
|---|---|---|---|
| Breast | Data Mining and network analysis | Biomedical Abstracts | [ |
| Breast | Rule Based | Pathology Reports | [ |
| Breast | Machine Learning | Pathology Reports | [ |
| Breast | Unsupervised Learning, Text mining and Pattern mining | PubMed Articles | [ |
| Urothelial cancer | Latent Dirichlet Allocation and lda2vec | PubMed Abstracts and Titles | [ |
| Prostate adenocarcinoma | Machine Learning | Pathology Reports | [ |
| Generic | LSTM | PubMed Abstracts | [ |
| Generic | CNN 1 | Biomedical Abstracts | [ |
| Generic | Supervised Learning | Full PubMed | [ |
| Generic | Multitask CNN 1 | Pathology Reports | [ |
1 Convolutional Neural Network.