Terminological resources for text mining over biomedical scientific literature.

Fabio Rinaldi, Kaarel Kaljurand, Rune Sætre.

Abstract

OBJECTIVE: We present a combined terminological resource for text mining over biomedical literature. The purpose of the resource is to allow the detection of mentions of specific biological entities in scientific publications, and their grounding to widely accepted identifiers. This is an essential process, useful in itself, and necessary as an intermediate step for almost every type of complex text mining application.
METHODS: We discuss some of the properties of the terminology for this domain, in particular the degree of ambiguity, which poses a particular problem for text mining applications. Without correct recognition and disambiguation of the domain entities, no reliable results can be produced.
RESULTS: We also discuss an application that makes use of the resulting terminological knowledge base. We annotate an existing corpus of sentences about protein interactions. The annotation consists of a normalization step that matches the terms in our resource with their actual representation in the corpus, and a disambiguation step that resolves the ambiguity of matched terms.
CONCLUSION: In this paper we present a large terminological resource, compiled through the aggregation of a number of different manually curated sources. We discuss the lexical properties of such resources, specifically the degree of ambiguity of the terms, and we inspect the causes of such ambiguity, in particular for protein names. This information is of vital importance for the implementation of an efficient term normalization and grounding algorithm.
Copyright © 2011 Elsevier B.V. All rights reserved.
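
The normalization and disambiguation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy entries, the placeholder identifiers, the normalization rule (lowercasing, dropping spaces and hyphens), and the organism-based filter are all assumptions chosen for illustration, and only single-token terms are matched.

```python
import re
from collections import defaultdict

# Hypothetical toy entries: (surface term, identifier, organism).
# Identifiers are placeholders, not taken from the paper's resource.
TOY_TERMS = [
    ("IL-2", "ID-A", "human"),
    ("interleukin 2", "ID-A", "human"),
    ("IL2", "ID-B", "mouse"),
    ("p53", "ID-C", "human"),
]

def normalize(term):
    """Lowercase and drop whitespace and hyphens so spelling variants collide."""
    return re.sub(r"[\s\-]+", "", term.lower())

def build_index(entries):
    """Map each normalized term to its set of (identifier, organism) candidates."""
    index = defaultdict(set)
    for term, identifier, organism in entries:
        index[normalize(term)].add((identifier, organism))
    return index

def ambiguity(index):
    """Number of candidate entries per normalized term."""
    return {key: len(candidates) for key, candidates in index.items()}

def annotate(sentence, index, organism_hint=None):
    """Match single tokens against the index, then filter by organism if given."""
    annotations = []
    for token in re.findall(r"[A-Za-z0-9\-]+", sentence):
        candidates = index.get(normalize(token))
        if not candidates:
            continue
        if organism_hint is not None:
            filtered = {c for c in candidates if c[1] == organism_hint}
            # Fall back to all candidates when the hint removes every option.
            candidates = filtered or candidates
        annotations.append((token, sorted(ident for ident, _ in candidates)))
    return annotations

index = build_index(TOY_TERMS)
print(ambiguity(index))
print(annotate("IL-2 binds p53 in human cells", index, organism_hint="human"))
```

In this toy setup the variants "IL-2" and "IL2" collapse to the same key, so the term becomes ambiguous between two placeholder identifiers until the organism hint narrows it to one.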

Year:  2011        PMID: 21652190     DOI: 10.1016/j.artmed.2011.04.011

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


  5 in total

1.  Ranking relations between diseases, drugs and genes for a curation task.

Authors:  Simon Clematide; Fabio Rinaldi
Journal:  J Biomed Semantics       Date:  2012-10-05

2.  Evaluation and cross-comparison of lexical entities of biological interest (LexEBI).

Authors:  Dietrich Rebholz-Schuhmann; Jee-Hyub Kim; Ying Yan; Abhishek Dixit; Caroline Friteyre; Robert Hoehndorf; Rolf Backofen; Ian Lewin
Journal:  PLoS One       Date:  2013-10-04       Impact factor: 3.240

3.  FlexiTerm: a flexible term recognition method.

Authors:  Irena Spasić; Mark Greenwood; Alun Preece; Nick Francis; Glyn Elwyn
Journal:  J Biomed Semantics       Date:  2013-10-10

4.  Strategies towards digital and semi-automated curation in RegulonDB.

Authors:  Fabio Rinaldi; Oscar Lithgow; Socorro Gama-Castro; Hilda Solano; Alejandra Lopez; Luis José Muñiz Rascado; Cecilia Ishida-Gutiérrez; Carlos-Francisco Méndez-Cruz; Julio Collado-Vides
Journal:  Database (Oxford)       Date:  2017-01-01       Impact factor: 3.451

5.  Using the OntoGene pipeline for the triage task of BioCreative 2012.

Authors:  Fabio Rinaldi; Simon Clematide; Simon Hafner; Gerold Schneider; Gintare Grigonyte; Martin Romacker; Therese Vachon
Journal:  Database (Oxford)       Date:  2013-02-09       Impact factor: 3.451
