M Krauthammer, A Rzhetsky, P Morozov, C Friedman.
Abstract
We describe a system which automatically identifies gene and protein names in journal articles, an important and non-trivial first step in knowledge extraction of protein and gene actions. Our system uses a database of gene and protein names and is based on BLAST [Altschul et al., Nucleic Acids Res. 25 (1997) 3389-3402], a popular tool for DNA and protein sequence comparison. We describe a method that consists of mapping sequences of text characters into sequences of nucleotides that can be processed by BLAST. We demonstrate that this approach is feasible: the system matches gene and protein names with a recall of 78.8% and a precision of 71.7%, which includes names that are not part of the system database. An analysis of the results suggests techniques that can be used to improve performance further.
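The character-to-nucleotide mapping mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual code table, so everything below is an assumption made for illustration: the alphabet, the fixed codeword length of three nucleotides, the reserved codeword for unknown characters, and the function names `build_code_table`, `encode`, and `decode` are all hypothetical. The sketch only shows how a name could be turned into a nucleotide string that a sequence-comparison tool could accept; it does not call BLAST.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
CODE_LEN = 3  # assumption: 4**3 = 64 codewords, enough for the alphabet below
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 -"  # assumption: illustrative only
UNKNOWN_CODE = "T" * CODE_LEN  # assumption: reserved codeword for any other character


def build_code_table(alphabet=ALPHABET):
    """Assign each character a distinct fixed-length nucleotide codeword."""
    codes = ["".join(p) for p in product(NUCLEOTIDES, repeat=CODE_LEN)]
    codes.remove(UNKNOWN_CODE)  # keep the reserved codeword out of the table
    if len(alphabet) > len(codes):
        raise ValueError("alphabet too large for the chosen codeword length")
    return dict(zip(alphabet, codes))


CODE_TABLE = build_code_table()
DECODE_TABLE = {code: ch for ch, code in CODE_TABLE.items()}


def encode(text):
    """Translate text into a nucleotide string, case-insensitively."""
    return "".join(CODE_TABLE.get(ch, UNKNOWN_CODE) for ch in text.lower())


def decode(sequence):
    """Translate a nucleotide string back into text ('?' for unknown codewords)."""
    chunks = [sequence[i:i + CODE_LEN] for i in range(0, len(sequence), CODE_LEN)]
    return "".join(DECODE_TABLE.get(chunk, "?") for chunk in chunks)


if __name__ == "__main__":
    name = "IL-2"
    seq = encode(name)
    print(name, "->", seq, "->", decode(seq))
```

Because every character maps to a codeword of the same length, names that differ by a character or two yield nucleotide strings that differ only locally, which is the kind of near match a sequence-comparison tool is designed to find. Whether the published system uses fixed-length codewords is not stated in the abstract.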
Year: 2000 PMID: 11163982 DOI: 10.1016/s0378-1119(00)00431-5
Source DB: PubMed Journal: Gene ISSN: 0378-1119 Impact factor: 3.688