A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature.

Miguel García-Remesal, Victor Maojo, José Crespo.

Abstract

In this paper we present a knowledge engineering approach to automatically recognize and extract genetic sequences from scientific articles. A preliminary recognizer based on a finite state machine first extracts all candidate DNA/RNA sequences. These candidates are then fed into a knowledge-based system that automatically discards false positives and refines noisy and incorrectly merged sequences. We created the knowledge base by manually analyzing different manuscripts containing genetic sequences. We evaluated our approach on a test set of 211 full-text articles in PDF format containing 3134 genetic sequences, achieving 87.76% precision and 97.70% recall. This method can facilitate research tasks such as text mining, information extraction, and information retrieval over large collections of documents containing genetic sequences.
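
The two-stage idea in the abstract (a finite-state recognizer that over-generates candidates, followed by rules that discard false positives) can be illustrated with a minimal sketch. This is not the authors' recognizer or knowledge base, which the abstract does not specify. The function names, the minimum length of 10, the tolerated separators, and both filtering rules are assumptions chosen only for illustration.

```python
# Hypothetical sketch: NOT the paper's actual state machine or rule set.
# Assumption: a candidate is a run of nucleotide letters (A, C, G, T, U),
# possibly interrupted by single spaces or hyphens, as PDF text often is.
NUCLEOTIDES = set("ACGTUacgtu")
SEPARATORS = set(" -")


def find_candidates(text, min_len=10):
    """Scan text with a two-state machine: outside or inside a candidate."""
    candidates = []
    current = []  # non-empty means we are in the "inside" state
    for ch in text + "\n":  # trailing newline flushes the last candidate
        if ch in NUCLEOTIDES:
            current.append(ch.upper())
        elif ch in SEPARATORS and current:
            continue  # tolerate a separator inside a candidate
        else:
            if len(current) >= min_len:
                candidates.append("".join(current))
            current = []  # back to the "outside" state
    return candidates


def looks_like_sequence(seq):
    """Illustrative rules standing in for a curated knowledge base."""
    letters = set(seq)
    if len(letters) < 3:  # assumed rule: too uniform to keep
        return False
    if "T" in letters and "U" in letters:  # assumed rule: mixed DNA/RNA
        return False
    return True


def extract_sequences(text, min_len=10):
    """Run the recognizer, then drop candidates rejected by the rules."""
    return [c for c in find_candidates(text, min_len) if looks_like_sequence(c)]


sample = "The forward primer was 5'-ACGTTGCAAGCTTGCA-3' and a tag cat."
print(extract_sequences(sample))
```

The recognizer is deliberately permissive and the rules do the pruning, mirroring the division of labor the abstract describes. A real system would need far richer rules, for example to split sequences that were merged across column or line breaks.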

Substances: DNA

Year: 2010 PMID: 21096556 DOI: 10.1109/IEMBS.2010.5627316

Source DB: PubMed Journal: Annu Int Conf IEEE Eng Med Biol Soc ISSN: 2375-7477

Cited

2 in total

1. PDF text classification to leverage information extraction from publication reports.

Authors: Duy Duc An Bui; Guilherme Del Fiol; Siddhartha Jonnalagadda
Journal: J Biomed Inform Date: 2016-04-01 Impact factor: 6.317

2. Biomarker identification using text mining.

Authors: Hui Li; Chunmei Liu
Journal: Comput Math Methods Med Date: 2012-11-11 Impact factor: 2.238
