Gene/protein name recognition based on support vector machine using dictionary as features.

Tomohiro Mitsumori, Sevrani Fation, Masaki Murata, Kouichi Doi, Hirohumi Doi.

Abstract

BACKGROUND: Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the support vector machine (SVM) algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition.
RESULTS: Our recognition system uses a feature set consisting of the word, the part-of-speech (POS) tag, the orthography, the prefix, the suffix, and the preceding class. We call these "internal resource features", i.e., features that can be derived from the training data. In addition, we treat the results of matching against dictionaries as "external resource features". We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features improved the F-score only slightly. We attribute this to the possibility that the dictionary matching features overlap with other features in the current multiple-feature setting.
CONCLUSION: During SVM learning, each feature alone had a marginally positive effect on system performance. This is consistent with the view that the SVM algorithm is robust to the high dimensionality of the feature vector space, and it suggests that feature selection is not required.
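
ILLUSTRATION: The feature set listed under RESULTS can be sketched as a per-token feature dictionary. This is a minimal sketch, not the authors' implementation: the abstract does not say how each feature is encoded, so the orthographic categories, the affix length of 3, the toy dictionary GENE_DICTIONARY, and the function names orthography and token_features are all assumptions made for illustration.

```python
import re

# Hypothetical toy dictionary; the abstract does not name the dictionaries used.
GENE_DICTIONARY = {"p53", "brca1", "insulin receptor"}


def orthography(token):
    """Return a coarse orthographic class (illustrative categories only)."""
    if token.isdigit():
        return "DIGITS"
    if re.fullmatch(r"[A-Z]+", token):
        return "ALLCAPS"
    if re.search(r"[A-Za-z]", token) and re.search(r"\d", token):
        return "ALPHANUM"
    if token[:1].isupper():
        return "INITCAP"
    if token.islower():
        return "LOWER"
    return "OTHER"


def token_features(tokens, pos_tags, i, prev_class, affix_len=3):
    """Build the feature dictionary for tokens[i].

    The first six entries correspond to the "internal resource features";
    "in_dict" stands in for the "external resource" dictionary-matching feature.
    """
    token = tokens[i]
    # Also try a two-token match so multi-word dictionary entries can fire.
    bigram = " ".join(tokens[i:i + 2]).lower()
    return {
        "word": token.lower(),
        "pos": pos_tags[i],
        "orth": orthography(token),
        "prefix": token[:affix_len].lower(),
        "suffix": token[-affix_len:].lower(),
        "prev_class": prev_class,  # class assigned to the preceding token
        "in_dict": token.lower() in GENE_DICTIONARY or bigram in GENE_DICTIONARY,
    }


tokens = ["The", "p53", "protein"]
pos_tags = ["DT", "NN", "NN"]
features = token_features(tokens, pos_tags, 1, prev_class="O")
```

Dictionaries of this shape would typically be one-hot encoded into a sparse, high-dimensional vector before being passed to an SVM classifier, which is the setting in which the CONCLUSION's remark about robustness to dimensionality applies.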

Year:  2005        PMID: 15960842      PMCID: PMC1869022          DOI: 10.1186/1471-2105-6-S1-S8

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


