
SimConcept: A Hybrid Approach for Simplifying Composite Named Entities in Biomedicine.

Chih-Hsuan Wei, Robert Leaman, Zhiyong Lu.

Abstract

Many text-mining studies have focused on named entity recognition and normalization, especially in biomedical natural language processing. However, entity recognition in biomedical text remains a complicated and difficult task. One particular challenge is to identify and resolve composite named entities, where a single span refers to more than one concept (e.g., BRCA1/2). Most bioconcept recognition and normalization studies have either ignored this issue, used simple ad hoc rules, or handled only coordination ellipsis, which is just one of the many types of composite mentions studied in this work. No systematic method for simplifying composite mentions has been previously reported, so a robust approach is greatly needed. To this end, we propose a hybrid approach that integrates a machine learning model with a pattern identification strategy to identify the antecedent and conjunct regions of a concept mention, and then reassembles the composite mention using those identified regions. Our method, which we have named SimConcept, is the first to systematically handle most types of composite mentions. It achieves high performance in identifying and resolving composite mentions for three fundamental biological entities: genes (89.29% F-measure), diseases (85.52% F-measure) and chemicals (84.04% F-measure). Furthermore, our results show that using SimConcept can subsequently help improve the performance of gene and disease concept recognition and normalization.
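
To make the idea of simplifying a composite mention concrete, the following is a minimal rule-based sketch in Python. It is not the SimConcept method, which the abstract describes as combining a machine learning model with pattern identification; it only illustrates splitting a mention such as BRCA1/2 into a shared antecedent (the stem) and its conjuncts, then reassembling one mention per conjunct. The function name `simplify_mention` and the regular expressions are assumptions made for this illustration.

```python
import re
from typing import List

# Assumed separators between conjuncts: "/", ",", "and", "or".
_SEP = r"\s*(?:/|,|\band\b|\bor\b)\s*"
# Assumed conjunct shape: a number or a single capital letter.
_CONJ = r"(?:\d+|[A-Z])"

# Hypothetical pattern: a shared stem (the antecedent), a first conjunct,
# then one or more further conjuncts introduced by a separator.
_COMPOSITE = re.compile(
    rf"^(?P<stem>[A-Za-z]+)(?P<first>{_CONJ})(?P<rest>(?:{_SEP}{_CONJ})+)$"
)


def simplify_mention(mention: str) -> List[str]:
    """Expand a simple composite mention into individual mentions.

    Mentions that do not fit the toy pattern are returned unchanged.
    """
    match = _COMPOSITE.match(mention.strip())
    if match is None:
        return [mention]
    stem = match.group("stem")
    # Collect the first conjunct plus every conjunct in the remainder.
    conjuncts = [match.group("first")] + re.findall(_CONJ, match.group("rest"))
    # Reassemble: attach the shared stem to each conjunct.
    return [stem + conjunct for conjunct in conjuncts]


if __name__ == "__main__":
    for text in ["BRCA1/2", "SMAD1, 2 and 3", "BRCA1"]:
        print(text, "->", simplify_mention(text))
```

A fixed pattern like this covers only a narrow slice of composite mentions, which is consistent with the abstract's point that simple ad hoc rules are insufficient for the many mention types found in biomedical text.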


Keywords:  Algorithms; Mention simplification; conditional random field; named entity normalization; named entity recognition; natural language processing

Year:  2014        PMID: 25844401      PMCID: PMC4384177          DOI: 10.1145/2649387.2649420

Source DB:  PubMed          Journal:  ACM BCB


