Tara B Borlawsky, Jianrong Li, Lyudmila Shagina, Matthew G Crowson, Yang Liu, Carol Friedman, Yves A Lussier.
Abstract
Adequately and efficiently integrating the unstructured, heterogeneous datasets inherent to systems biology and medicine remains one of the primary obstacles to their comprehensive analysis. Natural language processing (NLP) and biomedical ontologies provide automated methods for capturing, standardizing, and integrating information across diverse sources, including narrative text. We used the BioMedLEE NLP system to extract biomolecular mechanisms and clinical phenotypes from the scientific literature and to encode them using standard ontologies (e.g., Cell Type Ontology, Mammalian Phenotype, Gene Ontology). We then applied semantic processing techniques to the structured BioMedLEE output to determine the relationships between these biomolecular and clinical phenotype concepts. Our evaluation shows that the average precision and recall of BioMedLEE in annotating phrases composed of cell type, anatomy/disease, and gene/protein concepts were 86% and 78%, respectively. The precision of the asserted phenotype-molecular relationships was 75%.
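For readers less familiar with the two metrics reported above, the following minimal sketch shows how precision and recall are conventionally computed from annotation counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not describe the authors' exact scoring procedure.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw annotation counts.

    precision = correct annotations / all annotations the system produced
    recall    = correct annotations / all annotations it should have produced
    """
    produced = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / produced if produced else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall


# Hypothetical counts for illustration only (not taken from the paper).
precision, recall = precision_recall(
    true_positives=8, false_positives=2, false_negatives=2
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Guarding against a zero denominator keeps the function usable when a system produces no annotations at all, or when the reference set is empty.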
Year: 2010 PMID: 21347135 PMCID: PMC3041541
Source DB: PubMed Journal: Summit Transl Bioinform ISSN: 2153-6430
Figure 1. Overview of NLP and semantic methods for relating genes and phenotypes.
Figure 2. XML output of BioMedLEE for “Wnt5A regulates proliferation of progenitor cells.”
Examples from recall (coding) evaluation
| Correct (E) | PTP2C is widely expressed in … heart, | MA:0000168 [brain] |
| Correct (P) | Inhibition of | UMLS:C0021311 [infection] |
| Partial | … suggest … | GeneID:317783 [CELIAC3] |
| Incorrect | … mRNAs induced in | UMLS:C0009013 [clone cells] (missing UMLS:C0006413 [Burkitt Lymphoma]) |
| None | … identified … as the | Missed UMLS:C0024439 [Macular corneal dystrophy] |
| No code | … region within the candidate locus for | No exact UMLS code |
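The coded concepts in the table follow a visible `SOURCE:ID [label]` pattern (for example `UMLS:C0021311 [infection]`). As a rough sketch, assuming that pattern holds for every code, such strings could be split into their parts as shown below. The `CodedConcept` type, the `parse_coded_concepts` function, and the regular expression are illustrative assumptions; they are not part of BioMedLEE or its output format.

```python
import re
from typing import List, NamedTuple


class CodedConcept(NamedTuple):
    """One ontology-coded concept (hypothetical helper type)."""
    source: str      # vocabulary prefix, e.g. "UMLS", "MA", "GeneID"
    identifier: str  # code within that vocabulary
    label: str       # human-readable label from the square brackets


# Assumed pattern: PREFIX:ID followed by a bracketed label.
_CODE_PATTERN = re.compile(
    r"(?P<source>[A-Za-z]+):(?P<identifier>[A-Za-z0-9]+)\s*\[(?P<label>[^\]]+)\]"
)


def parse_coded_concepts(text: str) -> List[CodedConcept]:
    """Extract every SOURCE:ID [label] occurrence from a table cell."""
    return [
        CodedConcept(m["source"], m["identifier"], m["label"])
        for m in _CODE_PATTERN.finditer(text)
    ]


# Cell text copied from the "Incorrect" row of the table above.
cell = "UMLS:C0009013 [clone cells] (missing UMLS:C0006413 [Burkitt Lymphoma])"
for concept in parse_coded_concepts(cell):
    print(concept.source, concept.identifier, concept.label)
```

A cell with no code, such as "No exact UMLS code", simply yields an empty list.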
Figure 4. Summary of recall evaluation results.