K Bretonnel Cohen, Lawrence E Hunter.
Abstract
Text mining for translational bioinformatics is a new field with tremendous research potential. It is a subfield of biomedical natural language processing that concerns itself directly with the problem of relating basic biomedical research to clinical practice, and vice versa. Applications of text mining fall into both T1 translational research (translating basic science results into new interventions) and T2 translational research, or translational research for public health. Potential use cases include better phenotyping of research subjects and pharmacogenomic research. A variety of methods for evaluating text mining applications exist, including corpora, structured test suites, and post hoc judging. Two basic principles of linguistic structure are relevant for building text mining applications. One is that linguistic structure consists of multiple levels. The other is that every level of linguistic structure is characterized by ambiguity. There are two basic approaches to text mining: rule-based, also known as knowledge-based; and machine-learning-based, also known as statistical. Many systems are hybrids of the two approaches. Shared tasks have had a strong effect on the direction of the field. Like all translational bioinformatics software, text mining software for translational bioinformatics can be considered health-critical and should be subject to the strictest standards of quality assurance and software testing.
Year: 2013 PMID: 23633944 PMCID: PMC3635962 DOI: 10.1371/journal.pcbi.1003044
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Some knowledge sources for biomedical natural language processing.
| Resource | Description |
| --- | --- |
| Informatics for Integrating Biology and the Bedside (i2b2 - https://www.i2b2.org/) | National Center for Biomedical Computing with a focus on translational research that facilitates and provides data sets for clinical natural language processing research |
| Gene Ontology (https://www.geneontology.org) | Controlled vocabulary with relationships including partonymy and inheritance, designed for describing gene functions, broadly construed |
| Entrez Gene (https://www.ncbi.nlm.nih.gov/gene) | Source for gene names, symbols, and synonyms; also the source for GeneRIFs and SUMMARY fields |
| PubMed/MEDLINE (https://www.ncbi.nlm.nih.gov/pubmed) | The National Library of Medicine's database of abstracts of biomedical publications (MEDLINE) and search interface for accessing them (PubMed) |
| Unified Medical Language System (https://www.nlm.nih.gov/research/umls/) | Large lexical and conceptual resource, including the UMLS Metathesaurus, which aggregates a large number of biomedical and some genomic vocabularies |
| SWISSPROT (https://www.uniprot.org/) | Database of information about proteins with literature references, useful as a gold standard |
| PharmGKB (https://www.pharmgkb.org/) | Database of relationships between a number of clinical, genomic, and other entities with literature references, useful as a gold standard |
| Comparative Toxicogenomics Database (https://ctdbase.org/) | Database of relationships between genes, diseases, and chemicals, with literature references, useful as a gold standard |
Various terminological resources, data sources, and gold-standard databases for biomedical natural language processing.
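Several of the resources above are described as useful gold standards. As a rough illustration of what that means in practice, the sketch below scores a set of relationships extracted by a text mining system against a curated set using precision, recall, and F1. It makes several assumptions: the `Relation` type, the function name, and the placeholder gene and chemical identifiers are all invented for this example and are not drawn from PharmGKB, the Comparative Toxicogenomics Database, or any other resource listed here.

```python
from typing import NamedTuple, Set, Tuple


class Relation(NamedTuple):
    """Hypothetical record: a gene-chemical pair asserted by some source."""
    gene: str
    chemical: str


def precision_recall_f1(
    predicted: Set[Relation], gold: Set[Relation]
) -> Tuple[float, float, float]:
    """Score predicted relations against a gold-standard set."""
    true_positives = len(predicted & gold)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder data (not real curated relationships).
gold = {
    Relation("GENE_A", "chemical_x"),
    Relation("GENE_B", "chemical_y"),
    Relation("GENE_C", "chemical_z"),
}
predicted = {
    Relation("GENE_A", "chemical_x"),  # correct
    Relation("GENE_B", "chemical_y"),  # correct
    Relation("GENE_D", "chemical_y"),  # not in the gold standard
}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Exact set matching is the simplest possible comparison. A real evaluation would first have to normalize gene and chemical mentions to database identifiers, which is where the ambiguity noted in the abstract becomes a practical problem.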