Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets.

Denis Newman-Griffis, Guy Divita, Bart Desmet, Ayah Zirikly, Carolyn P Rosé, Eric Fosler-Lussier.

Abstract

OBJECTIVES: Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research.
MATERIALS AND METHODS: We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language.
RESULTS: We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena.
DISCUSSION: Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods.
CONCLUSIONS: Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization.
Published by Oxford University Press on behalf of the American Medical Informatics Association 2020.
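
The string-level measurements described in the abstract can be illustrated with a minimal sketch. It assumes that each dataset is available as a list of (mention string, concept ID) pairs and that a string counts as ambiguous when it is annotated with more than one distinct concept. The function names, the lowercasing step, and the toy data with placeholder concept IDs are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def concepts_by_string(annotations):
    """Map each lowercased mention string to the set of concept IDs it was annotated with."""
    mapping = defaultdict(set)
    for mention, concept_id in annotations:
        mapping[mention.lower()].add(concept_id)
    return dict(mapping)


def ambiguity_rate(mapping):
    """Fraction of distinct strings linked to more than one concept."""
    if not mapping:
        return 0.0
    ambiguous = sum(1 for concepts in mapping.values() if len(concepts) > 1)
    return ambiguous / len(mapping)


def overlap_stats(map_a, map_b):
    """Return (strings shared by both datasets, shared strings whose concept sets differ)."""
    shared = set(map_a) & set(map_b)
    differing = {s for s in shared if map_a[s] != map_b[s]}
    return shared, differing


# Toy data; the concept IDs are hypothetical placeholders, not real UMLS CUIs.
dataset_a = [("cold", "C_A1"), ("cold", "C_A2"), ("discharge", "C_A3"), ("ms", "C_A4")]
dataset_b = [("Cold", "C_A1"), ("discharge", "C_A3"), ("fever", "C_B2")]

map_a = concepts_by_string(dataset_a)
map_b = concepts_by_string(dataset_b)
shared, differing = overlap_stats(map_a, map_b)

print(f"ambiguous in A: {ambiguity_rate(map_a):.0%}")
print(f"shared strings: {sorted(shared)}; differing annotations: {sorted(differing)}")
```

Under the same assumptions, potential ambiguity in a terminology could be estimated by building the same string-to-concepts mapping from the terminology's own string table and passing it to `ambiguity_rate`.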

Keywords:  Unified Medical Language System; controlled vocabulary; machine learning; natural language processing; semantics

Year:  2021        PMID: 33319905      PMCID: PMC7936394          DOI: 10.1093/jamia/ocaa269

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  4 in total

1.  Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics.

Authors:  Tiago Almeida; Rui Antunes; João F Silva; João R Almeida; Sérgio Matos
Journal:  Database (Oxford)       Date:  2022-07-01       Impact factor: 4.462

2.  A simple neural vector space model for medical concept normalization using concept embeddings.

Authors:  Dongfang Xu; Timothy Miller
Journal:  J Biomed Inform       Date:  2022-04-23       Impact factor: 8.000

3.  Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health.

Authors:  Denis Newman-Griffis; Eric Fosler-Lussier
Journal:  Front Digit Health       Date:  2021-03-10

4.  Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets.

Authors:  Shikhar Vashishth; Denis Newman-Griffis; Rishabh Joshi; Ritam Dutt; Carolyn P Rosé
Journal:  J Biomed Inform       Date:  2021-08-12       Impact factor: 6.317
