Identifying non-elliptical entity mentions in a coordinated NP with ellipses.

Jeongmin Chae, Younghee Jung, Taemin Lee, Soonyoung Jung, Chan Huh, Gilhan Kim, Hyeoncheol Kim, Heungbum Oh.

Abstract

Named entities in the biomedical domain are often written as a noun phrase (NP) joined by a coordinating conjunction such as 'and' or 'or'. In addition, words repeated across the coordinated entity mentions are frequently omitted, which makes the individual named entities difficult to identify. Although various named entity recognition (NER) methods have tried to solve this problem, they can only deal with relatively simple elliptical patterns in coordinated NPs. We propose a new NER method that identifies non-elliptical entity mentions in NPs with simple or complex ellipses, using linguistic rules and an entity mention dictionary. The GENIA and CRAFT corpora were used to evaluate the proposed system. The GENIA corpus, which comprises 3434 non-elliptical entity mentions in 1585 coordinated NPs with ellipses, was used to evaluate the system according to the quality of the dictionary. On this corpus, the system achieved 92.11% precision, 95.20% recall, and 93.63% F-score in identifying non-elliptical entity mentions in coordinated NPs. Its accuracy in resolving simple and complex ellipses was 94.54% and 91.95%, respectively. The CRAFT corpus was used to evaluate the system under realistic conditions, where it achieved 78.47% precision, 67.10% recall, and 72.34% F-score in coordinated NPs. These evaluations show that the system efficiently solves the problem caused by ellipses and improves NER performance. The algorithm is implemented in PHP, and the code can be downloaded from https://code.google.com/p/medtextmining/.
Copyright © 2013. Published by Elsevier Inc.
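
To make the task concrete, the following is a minimal Python sketch of dictionary-checked ellipsis resolution for the simplest pattern only, where all conjuncts share a trailing head (for example, "B and T cells"). It is not the authors' PHP implementation or their rule set; the function name `expand_coordinated_np`, the exact-match lookup, and the toy dictionary are all assumptions made for illustration.

```python
import re

CONJUNCTIONS = {"and", "or"}


def expand_coordinated_np(np_text, dictionary):
    """Expand a coordinated NP whose conjuncts share a trailing head.

    Hypothetical helper: returns the list of non-elliptical mentions if
    every reconstructed mention is found in `dictionary`, else [].
    """
    tokens = np_text.split()
    conj_positions = [i for i, t in enumerate(tokens) if t.lower() in CONJUNCTIONS]
    if not conj_positions:
        # No coordination: the NP is already non-elliptical.
        return [np_text] if np_text in dictionary else []

    last = conj_positions[-1]
    # Conjuncts before the final conjunction, separated by commas or conjunctions.
    left_text = " ".join(tokens[:last])
    left = [m for m in re.split(r",\s*|\s+(?:and|or)\s+", left_text) if m]
    right = tokens[last + 1:]

    # Try each split of the final conjunct into "modifier + shared head".
    for k in range(1, len(right)):
        modifier = " ".join(right[:k])
        head = " ".join(right[k:])
        candidates = [f"{m} {head}" for m in left] + [f"{modifier} {head}"]
        # Accept the split only if the dictionary confirms every mention.
        if all(c in dictionary for c in candidates):
            return candidates
    return []


# Toy dictionary (assumed entries, not taken from GENIA or CRAFT).
toy_dictionary = {"B cells", "T cells", "IL-2 genes", "IL-4 genes", "IL-6 genes"}
two_way = expand_coordinated_np("B and T cells", toy_dictionary)
three_way = expand_coordinated_np("IL-2, IL-4 and IL-6 genes", toy_dictionary)
```

The dictionary check is what keeps the expansion from over-generating: a split is accepted only when every reconstructed mention is a known entity. Patterns in which the omitted words are not a shared trailing head would need additional rules that this sketch does not attempt.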

Keywords:  Ellipsis resolution; Named entity recognition; Text mining

Year: 2013    PMID: 24153413    DOI: 10.1016/j.jbi.2013.10.002

Source DB: PubMed    Journal: J Biomed Inform    ISSN: 1532-0464    Impact factor: 6.317


  5 in total

1.  SimConcept: A Hybrid Approach for Simplifying Composite Named Entities in Biomedicine.

Authors:  Chih-Hsuan Wei; Robert Leaman; Zhiyong Lu
Journal: ACM BCB    Date: 2014

2.  SimConcept: a hybrid approach for simplifying composite named entities in biomedical text.

Authors:  Chih-Hsuan Wei; Robert Leaman; Zhiyong Lu
Journal: IEEE J Biomed Health Inform    Date: 2015-04-13    Impact factor: 5.772

3.  tmChem: a high performance approach for chemical named entity recognition and normalization.

Authors:  Robert Leaman; Chih-Hsuan Wei; Zhiyong Lu
Journal: J Cheminform    Date: 2015-01-19    Impact factor: 5.514

4.  A graph-based method for reconstructing entities from coordination ellipsis in medical text.

Authors:  Chi Yuan; Yongli Wang; Ning Shang; Ziran Li; Ruxin Zhao; Chunhua Weng
Journal: J Am Med Inform Assoc    Date: 2020-07-01    Impact factor: 4.497

5.  Gold-standard ontology-based anatomical annotation in the CRAFT Corpus.

Authors:  Michael Bada; Nicole Vasilevsky; William A Baumgartner; Melissa Haendel; Lawrence E Hunter
Journal: Database (Oxford)    Date: 2017-01-01    Impact factor: 3.451
