A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

Sergei Egorov¹, Anton Yuryev, Nikolai Daraselia.

Abstract

OBJECTIVE: The aim of this study was to develop a practical and efficient protein identification system for biomedical corpora.
DESIGN: The developed system, called ProtScan, utilizes a carefully constructed dictionary of mammalian proteins in conjunction with a specialized tokenization algorithm to identify and tag protein name occurrences in biomedical texts and also takes advantage of Medline "Name-of-Substance" (NOS) annotation. The dictionaries for ProtScan were constructed in a semi-automatic way from various public-domain sequence databases followed by an intensive expert curation step.
MEASUREMENTS: The recall and precision of the system have been determined using 1000 randomly selected and hand-tagged Medline abstracts.
RESULTS: The developed system is capable of identifying protein occurrences in Medline abstracts with 98% precision and 88% recall. It was also found to be capable of processing approximately 300 abstracts per second. Without utilization of NOS annotation, precision and recall were found to be 98.5% and 84%, respectively.
CONCLUSION: The developed system appears to be well suited for protein-based Medline indexing and can help to improve biomedical information retrieval. Further approaches to improving ProtScan's recall are also discussed.
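
The abstract does not give ProtScan's tokenization rules or dictionary format, so the following is only a minimal sketch of the general technique it names: dictionary lookup over tokenized text, scored by precision and recall against hand-tagged spans. The toy dictionary, the regex tokenizer, the greedy longest-match strategy, and the names `PROTEIN_DICT`, `tag_proteins`, and `precision_recall` are all illustrative assumptions, not details taken from the paper.

```python
import re

# Hypothetical toy dictionary: token tuples mapped to a canonical name.
# The real ProtScan dictionaries are not described in the abstract.
PROTEIN_DICT = {
    ("cd44",): "CD44",
    ("tumor", "necrosis", "factor"): "TNF",
    ("insulin", "receptor"): "INSR",
}
MAX_LEN = max(len(key) for key in PROTEIN_DICT)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tag_proteins(text):
    """Return (start_token, end_token, name) tuples using greedy longest match."""
    tokens = tokenize(text)
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in PROTEIN_DICT:
                hits.append((i, i + n, PROTEIN_DICT[key]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return hits


def precision_recall(predicted, gold):
    """Score predicted spans against hand-tagged spans by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    found = tag_proteins("The insulin receptor binds insulin; CD44 is unrelated.")
    print(found)
```

Trying the longest span first keeps a multi-word name such as "insulin receptor" from being split into shorter partial matches; a production system would also need to handle spelling variants and ambiguous abbreviations, which this sketch ignores.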

Entities: Species

Substances: Proteins

Year: 2004 PMID: 14764613 PMCID: PMC400515 DOI: 10.1197/jamia.M1453

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
