A graph-based method for reconstructing entities from coordination ellipsis in medical text.

Chi Yuan, Yongli Wang, Ning Shang, Ziran Li, Ruxin Zhao, Chunhua Weng.

Abstract

OBJECTIVE: Coordination ellipsis is a linguistic phenomenon that abounds in medical text and is challenging for concept normalization because it is difficult to accurately recognize elliptical expressions that reference 2 or more entities. To resolve this bottleneck, we aim to contribute a generalizable method to reconstruct concepts from medical coordinated elliptical expressions in a variety of biomedical corpora.
MATERIALS AND METHODS: We proposed a graph-based representation model and built a pipeline to reconstruct concepts from coordinated elliptical expressions in medical text (RECEEM). There are 4 modules: (1) identify all possible candidate conjunct pairs from original coordinated elliptical expressions, (2) calculate coefficients for candidate conjuncts using the embedding model, (3) select the most appropriate decompositions by global optimization, and (4) rebuild concepts based on a pathfinding algorithm. We evaluated the pipeline's performance on 2658 coordinated elliptical expressions from 3 different medical corpora (ie, biomedical literature, clinical narratives, and eligibility criteria from clinical trials). Precision, recall, and F1 score were calculated.
RESULTS: The F1 scores for biomedical publications, clinical narratives, and research eligibility criteria were 0.862, 0.721, and 0.870, respectively. RECEEM outperformed 2 previously released methods. By incorporating RECEEM into 2 existing NLP tools, the F1 scores increased from 0.248 to 0.460 and from 0.287 to 0.630 on concept mapping of 1125 coordination ellipses.
CONCLUSIONS: RECEEM improves concept normalization for medical coordinated elliptical expressions in a variety of biomedical corpora. It outperformed existing methods and significantly enhanced the performance of 2 notable NLP systems for mapping coordination ellipses in the evaluation. The algorithm is open sourced online (https://github.com/chiyuan1126/RECEEM).
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
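
To make the last module in MATERIALS AND METHODS concrete, the following is a minimal sketch of path-based reconstruction, not the RECEEM implementation. It assumes modules 1 to 3 have already chosen the conjunct spans, and it handles only a single coordination group with a shared prefix and/or suffix. The names `build_graph`, `all_paths`, and `reconstruct` and the token-span input format are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def build_graph(n_tokens: int, conjuncts: List[Span]) -> Dict[int, List[int]]:
    """Build a directed graph over token indices.

    Node -1 is START and node n_tokens is END. Conjuncts become parallel
    branches; tokens between conjuncts (coordinators, commas) get no
    incoming edge, so no path passes through them.
    Assumes conjuncts are sorted and non-overlapping (illustrative input).
    """
    start, end = -1, n_tokens
    first, last = conjuncts[0][0], conjuncts[-1][1]
    graph: Dict[int, List[int]] = {i: [] for i in range(-1, n_tokens + 1)}

    # Shared prefix: START -> 0 -> ... -> first - 1
    entry = start
    for i in range(first):
        graph[entry].append(i)
        entry = i

    # Every branch rejoins at the first shared suffix token, or at END.
    exit_node = last if last < n_tokens else end
    for s, e in conjuncts:
        graph[entry].append(s)
        for i in range(s, e - 1):
            graph[i].append(i + 1)
        graph[e - 1].append(exit_node)

    # Shared suffix: last -> ... -> END
    for i in range(last, n_tokens):
        graph[i].append(i + 1)
    return graph


def all_paths(graph: Dict[int, List[int]], node: int, end: int) -> List[List[int]]:
    """Enumerate every path from node to end by depth-first search."""
    if node == end:
        return [[]]
    paths = []
    for nxt in graph[node]:
        for tail in all_paths(graph, nxt, end):
            paths.append([nxt] + tail)
    return paths


def reconstruct(tokens: List[str], conjuncts: List[Span]) -> List[str]:
    """Return one reconstructed concept string per START-to-END path."""
    end = len(tokens)
    graph = build_graph(end, conjuncts)
    return [
        " ".join(tokens[i] for i in path if i != end)
        for path in all_paths(graph, -1, end)
    ]


print(reconstruct(["breast", "and", "ovarian", "cancer"], [(0, 1), (2, 3)]))
print(reconstruct(["acute", "renal", "and", "hepatic", "failure"], [(1, 2), (3, 4)]))
```

In this sketch each path from START to END yields one full concept, so "breast and ovarian cancer" expands to "breast cancer" and "ovarian cancer". Selecting the conjunct spans, which the abstract attributes to embedding-based coefficients and global optimization, is deliberately left out.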

Keywords:  concept normalization; coordination ellipsis; natural language processing

Year:  2020        PMID: 32719840      PMCID: PMC7647336          DOI: 10.1093/jamia/ocaa109

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


