BioTagger-GM: a gene/protein name recognition system.

Manabu Torii, Zhangzhi Hu, Cathy H Wu, Hongfang Liu.

Abstract

OBJECTIVES: Biomedical named entity recognition (BNER) is a critical component of automated systems that mine biomedical knowledge from free text. Among the entity types in the domain, gene/protein is arguably the most studied for BNER. Our goal is to develop a gene/protein name recognition system, BioTagger-GM, that exploits rich information in terminology sources using powerful machine learning frameworks and system combination.
DESIGN: BioTagger-GM consists of four main components: (1) dictionary lookup, in which gene/protein names in BioThesaurus and biomedical terms in UMLS Metathesaurus are tagged in text; (2) machine learning, in which machine learning systems are trained using dictionary lookup results as one type of feature; (3) post-processing, in which heuristic rules are used to correct recognition errors; and (4) system combination, in which a voting scheme is used to combine recognition results from multiple systems.
MEASUREMENTS: The BioCreAtIvE II Gene Mention (GM) corpus was used to evaluate the proposed method. To test its general applicability, the method was also evaluated on the JNLPBA corpus modified for gene/protein name recognition. The performance of the systems was evaluated through cross-validation tests and measured using precision, recall, and F-Measure.
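The three measures named above can be illustrated with a minimal sketch. It assumes mentions are represented as (sentence id, start, end) tuples and that a prediction counts as correct only on an exact span match; both are assumptions made for illustration, and the official scoring for either corpus may differ (for example, by accepting alternative boundaries). The function name `precision_recall_f` is hypothetical.

```python
def precision_recall_f(gold, predicted):
    """Exact-match precision, recall, and balanced F-Measure.

    gold, predicted: iterables of hashable mention spans, here assumed to be
    (sentence_id, start, end) tuples. This representation is an assumption.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy data, not taken from any corpus.
gold = [("s1", 0, 4), ("s1", 10, 15), ("s2", 3, 8)]
predicted = [("s1", 0, 4), ("s2", 3, 8), ("s2", 20, 25), ("s3", 1, 2)]
p, r, f = precision_recall_f(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

With two of four predictions correct and two of three gold mentions found, precision is 2/4, recall is 2/3, and the F-Measure is 4/7.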
RESULTS: BioTagger-GM achieved an F-Measure of 0.8887 on the BioCreAtIvE II GM corpus, which is higher than that of the first-place system in the BioCreAtIvE II challenge. The applicability of the method was also confirmed on the modified JNLPBA corpus.
CONCLUSION: The results suggest that terminology sources, powerful machine learning frameworks, and system combination can be integrated to build an effective BNER system.
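The abstract does not specify how the voting scheme works, so the following is only a sketch of one plausible form of system combination: a mention span is kept when at least a chosen number of systems propose exactly that span. The function name `vote_spans`, the `min_votes` parameter, and the (start, end) span representation are all hypothetical and are not taken from BioTagger-GM.

```python
from collections import Counter


def vote_spans(system_outputs, min_votes):
    """Keep spans proposed by at least `min_votes` systems.

    system_outputs: one list of (start, end) spans per system (assumed format).
    Each system votes at most once per span.
    """
    counts = Counter()
    for spans in system_outputs:
        counts.update(set(spans))  # de-duplicate within a single system
    return sorted(span for span, votes in counts.items() if votes >= min_votes)


# Toy outputs from three hypothetical taggers for one sentence.
system_a = [(0, 4), (10, 15)]
system_b = [(0, 4), (10, 18)]
system_c = [(0, 4), (10, 15), (20, 25)]
print(vote_spans([system_a, system_b, system_c], min_votes=2))
```

Here `(0, 4)` receives three votes and `(10, 15)` two, while `(10, 18)` and `(20, 25)` receive one each and are dropped. Requiring a strict majority keeps overlapping spans from both being accepted; a lower threshold would trade precision for recall and would need an extra rule for resolving overlaps.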

Year: 2008    PMID: 19074302    PMCID: PMC2649315    DOI: 10.1197/jamia.M2844

Source DB: PubMed    Journal: J Am Med Inform Assoc    ISSN: 1067-5027    Impact factor: 4.497


