Gene/protein name recognition based on support vector machine using dictionary as features.

Tomohiro Mitsumori, Sevrani Fation, Masaki Murata, Kouichi Doi, Hirohumi Doi.

Abstract

BACKGROUND: Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition.
RESULTS: Our recognition system uses a feature set consisting of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these "internal resource features", i.e., features that can be found in the training data. In addition, we treat features derived from matching against dictionaries as "external resource features". We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features improved the f-score only slightly. We attribute this to the possibility that the dictionary matching features overlap with other features in the current multiple feature setting.
CONCLUSION: During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the view that the SVM algorithm is robust to the high dimensionality of the feature vector space, and it suggests that feature selection is not required.
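
The feature set named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact feature definitions, so everything specific here is an assumption made for illustration: the orthographic category names, the affix length of 3, the simple single-token dictionary lookup, and the function names `orthography` and `token_features`. It is not the authors' implementation.

```python
import re


def orthography(token):
    """Return a coarse orthographic class (hypothetical categories)."""
    if token.isdigit():
        return "DIGITS"
    if re.search(r"\d", token) and re.search(r"[A-Za-z]", token):
        return "ALPHANUM"
    if token.isupper():
        return "ALLCAPS"
    if token[:1].isupper():
        return "INITCAP"
    if token.islower():
        return "LOWER"
    return "OTHER"


def token_features(tokens, pos_tags, i, prev_label, dictionary, affix_len=3):
    """Build the feature dict for token i.

    Internal resource features: word, POS, orthography, prefix, suffix,
    preceding class. External resource feature: dictionary match.
    """
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "pos": pos_tags[i],
        "orth": orthography(tok),
        "prefix": tok[:affix_len].lower(),
        "suffix": tok[-affix_len:].lower(),
        "prev_class": prev_label,              # label assigned to token i-1
        "in_dict": tok.lower() in dictionary,  # assumed single-token lookup
    }


# Toy sentence with assumed POS tags and a two-entry dictionary.
tokens = ["The", "p53", "protein", "binds", "DNA"]
pos_tags = ["DT", "NN", "NN", "VBZ", "NN"]
gene_dictionary = {"p53", "brca1"}

features = token_features(tokens, pos_tags, 1, "O", gene_dictionary)
```

Before training an SVM, categorical features like these would typically be one-hot encoded into a sparse, high-dimensional vector. That high dimensionality is the setting the conclusion refers to.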

Substances: Proteins

Year: 2005 PMID: 15960842 PMCID: PMC1869022 DOI: 10.1186/1471-2105-6-S1-S8

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169

6 in total

1. Automated extraction of information on protein-protein interactions from the biological literature.

Authors: T Ono; H Hishigaki; A Tanigami; T Takagi
Journal: Bioinformatics Date: 2001-02 Impact factor: 6.937

2. Protein names and how to find them.

Authors: Kristofer Franzén; Gunnar Eriksson; Fredrik Olsson; Lars Asker; Per Lidén; Joakim Cöster
Journal: Int J Med Inform Date: 2002-12-04 Impact factor: 4.046

3. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.

Authors: Brigitte Boeckmann; Amos Bairoch; Rolf Apweiler; Marie-Claude Blatter; Anne Estreicher; Elisabeth Gasteiger; Maria J Martin; Karine Michoud; Claire O'Donovan; Isabelle Phan; Sandrine Pilbout; Michel Schneider
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

4. Extraction of protein interaction information from unstructured text using a context-free grammar.

Authors: Joshua M Temkin; Mark R Gilder
Journal: Bioinformatics Date: 2003-11-01 Impact factor: 6.937

5. Toward information extraction: identifying protein names from biological papers.

Authors: K Fukuda; A Tamura; T Tsunoda; T Takagi
Journal: Pac Symp Biocomput Date: 1998

6. Can bibliographic pointers for known biological data be found automatically? Protein interactions as a case study.

Authors: C Blaschke; A Valencia
Journal: Comp Funct Genomics Date: 2001

14 in total

1. Quantitative assessment of dictionary-based protein named entity tagging.

Authors: Hongfang Liu; Zhang-Zhi Hu; Manabu Torii; Cathy Wu; Carol Friedman
Journal: J Am Med Inform Assoc Date: 2006-06-23 Impact factor: 4.497

2. Unsupervised biomedical named entity recognition: experiments with clinical and biological texts.

Authors: Shaodian Zhang; Noémie Elhadad
Journal: J Biomed Inform Date: 2013-08-15 Impact factor: 6.317

3. Prioritizing PubMed articles for the Comparative Toxicogenomic Database utilizing semantic information.

Authors: Sun Kim; Won Kim; Chih-Hsuan Wei; Zhiyong Lu; W John Wilbur
Journal: Database (Oxford) Date: 2012-11-17 Impact factor: 3.451

4. Automatic extraction of protein point mutations using a graph bigram association.

Authors: Lawrence C Lee; Florence Horn; Fred E Cohen
Journal: PLoS Comput Biol Date: 2007-02-02 Impact factor: 4.475

5. Overview of BioCreAtIvE: critical assessment of information extraction for biology.

Authors: Lynette Hirschman; Alexander Yeh; Christian Blaschke; Alfonso Valencia
Journal: BMC Bioinformatics Date: 2005-05-24 Impact factor: 3.169

6. Automated recognition of malignancy mentions in biomedical literature.

Authors: Yang Jin; Ryan T McDonald; Kevin Lerman; Mark A Mandel; Steven Carroll; Mark Y Liberman; Fernando C Pereira; Raymond S Winters; Peter S White
Journal: BMC Bioinformatics Date: 2006-11-07 Impact factor: 3.169

7. Evaluating gold standard corpora against gene/protein tagging solutions and lexical resources.

Authors: Dietrich Rebholz-Schuhmann; Senay Kafkas; Jee-Hyub Kim; Chen Li; Antonio Jimeno Yepes; Robert Hoehndorf; Rolf Backofen; Ian Lewin
Journal: J Biomed Semantics Date: 2013-10-11

8. Identifying named entities from PubMed for enriching semantic categories.

Authors: Sun Kim; Zhiyong Lu; W John Wilbur
Journal: BMC Bioinformatics Date: 2015-02-21 Impact factor: 3.169

9. Integrating high dimensional bi-directional parsing models for gene mention tagging.

Authors: Chun-Nan Hsu; Yu-Ming Chang; Cheng-Ju Kuo; Yu-Shi Lin; Han-Shen Huang; I-Fang Chung
Journal: Bioinformatics Date: 2008-07-01 Impact factor: 6.937

10. Biomedical named entity extraction: some issues of corpus compatibilities.

Authors: Asif Ekbal; Sriparna Saha; Utpal Kumar Sikdar
Journal: Springerplus Date: 2013-11-12