Challenges in clinical natural language processing for automated disorder normalization.

Robert Leaman, Ritu Khare, Zhiyong Lu.

Abstract

BACKGROUND: Identifying key variables such as disorders within the clinical narratives in electronic health records has wide-ranging applications in clinical practice and biomedical research. Previous research has shown that disorder named entity recognition (NER) and normalization (or grounding) perform worse on clinical narratives than on biomedical publications. In this work, we aim to identify the cause of this performance difference and to introduce general solutions.
METHODS: We use closure properties to compare the richness of the vocabulary in clinical narrative text with that of biomedical publications. We approach both disorder NER and normalization with machine learning. Our NER method is based on linear-chain conditional random fields with a rich feature approach, and we introduce several improvements that enhance the lexical knowledge of the NER system. Our normalization method, which had not previously been applied to clinical data, uses pairwise learning to rank to learn term variation automatically and directly from the training data.
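To make "pairwise learning to rank" concrete: in the formulation assumed here (a bilinear model in the style of DNorm), a mention vector m is scored against a concept-name vector c as m^T W c, and W is nudged whenever an incorrect name scores too close to the correct one. The following is a minimal pure-Python sketch under stated assumptions, not the DNorm-C implementation: plain term-frequency vectors stand in for TF-IDF, W starts as the identity, and the margin (1.0), learning rate (0.1), epoch count (5), function names (build_vocab, train, rank) and example strings are all hypothetical.

```python
import math
from collections import Counter


def build_vocab(texts):
    """Assign a column index to every lowercase token seen in `texts`."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(text, vocab):
    """L2-normalized term-frequency vector (IDF weighting omitted in this sketch)."""
    vec = [0.0] * len(vocab)
    for tok, n in Counter(text.lower().split()).items():
        if tok in vocab:
            vec[vocab[tok]] = float(n)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def identity(n):
    """n x n identity matrix: scoring with it reduces to token-overlap (cosine) similarity."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def score(m, W, c):
    """Bilinear score m^T W c, skipping zero entries of m and c."""
    total = 0.0
    for i, mi in enumerate(m):
        if mi:
            for j, cj in enumerate(c):
                if cj:
                    total += mi * W[i][j] * cj
    return total


def train(triples, vocab, epochs=5, lr=0.1, margin=1.0):
    """Pairwise updates on (mention, correct name, incorrect name) triples."""
    W = identity(len(vocab))  # start from plain token overlap
    for _ in range(epochs):
        for mention, correct, wrong in triples:
            m, cp, cn = (vectorize(t, vocab) for t in (mention, correct, wrong))
            # Update only when the correct name fails to beat the wrong one by `margin`.
            if score(m, W, cp) - score(m, W, cn) < margin:
                for i, mi in enumerate(m):
                    if mi:
                        for j in range(len(vocab)):
                            # W += lr * (m cp^T - m cn^T)
                            W[i][j] += lr * mi * (cp[j] - cn[j])
    return W


def rank(mention, names, W, vocab):
    """Candidate concept names sorted from highest to lowest score."""
    m = vectorize(mention, vocab)
    return sorted(names, key=lambda name: score(m, W, vectorize(name, vocab)), reverse=True)


# Hypothetical toy data: one mention and two candidate concept names.
names = ["myocardial infarction", "heart failure"]
vocab = build_vocab(["heart attack"] + names)
print(rank("heart attack", names, identity(len(vocab)), vocab)[0])  # heart failure
W = train([("heart attack", "myocardial infarction", "heart failure")], vocab)
print(rank("heart attack", names, W, vocab)[0])  # myocardial infarction
```

Before training, plain token overlap prefers "heart failure" because it shares the token "heart". After a few pairwise updates, W has associated "heart" and "attack" with "myocardial" and "infarction", which is the kind of term variation the method is meant to learn from training data rather than from hand-built synonym rules.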
RESULTS: We find that while the overall vocabulary size is similar between clinical narratives and biomedical publications, clinical narratives use a richer terminology to describe disorders than publications do. We apply our system, DNorm-C, to locate and normalize disorder mentions in the clinical narratives from the recent ShARe/CLEF eHealth Task. For NER (strict span-only), our system achieves precision=0.797, recall=0.713, f-score=0.753. For normalization (strict span+concept), it achieves precision=0.712, recall=0.637, f-score=0.672. The improvements described in this article increase the NER f-score by 0.039 and the normalization f-score by 0.036. We also describe a high-recall version of the NER, which raises normalization recall to as high as 0.744, albeit with reduced precision.
DISCUSSION: We perform an error analysis, demonstrating that NER errors outnumber normalization errors by more than 4 to 1. Abbreviations and acronyms are frequent causes of error, as are mentions that the annotators were not able to identify within the scope of the controlled vocabulary.
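The reported f-scores are the harmonic mean of precision and recall, so they can be checked directly from the figures above (agreement is only to three decimals, since the inputs are themselves rounded). The sketch below also illustrates the difference between the two strict settings, assuming "strict" means exact character offsets for span-only and exact offsets plus an identical concept identifier for span+concept; the offsets and the concept IDs C001 to C003 are invented for illustration, and strict_prf is a hypothetical helper, not the official evaluation script.

```python
def f_score(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict_prf(gold, predicted):
    """Strict matching: an item is a true positive only if it appears exactly in both sets."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


def spans(items):
    """Drop the concept ID, keeping only (start, end) offsets for span-only scoring."""
    return {(start, end) for start, end, _ in items}


# Recompute the reported f-scores from the reported precision and recall.
print(round(f_score(0.797, 0.713), 3))  # NER, strict span-only -> 0.753
print(round(f_score(0.712, 0.637), 3))  # normalization, strict span+concept -> 0.672

# Hypothetical toy annotations: (start, end, concept_id) with made-up IDs.
gold = {(0, 12, "C001"), (20, 33, "C002")}
pred = {(0, 12, "C001"), (20, 33, "C003")}
print(strict_prf(spans(gold), spans(pred)))  # span-only: (1.0, 1.0, 1.0)
print(strict_prf(gold, pred))                # span+concept: (0.5, 0.5, 0.5)
```

In the toy case both spans are found, so NER is perfect, but one of them is mapped to the wrong concept; span+concept scoring counts that as both a false positive and a false negative, which is why normalization scores can only be equal to or lower than the corresponding NER scores.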
CONCLUSION: Disorder mentions in clinical narratives use a rich vocabulary that results in high term variation, which we believe is one of the primary causes of reduced performance on clinical narratives. We show that pairwise learning to rank performs well in this context, and we introduce several lexical enhancements, generalizable to other clinical NER tasks, that improve the ability of the NER system to handle this variation. DNorm-C is a high-performing, open source system for disorder NER and normalization in clinical text, and a promising step toward NER and normalization methods that can be trained for a wide variety of domains and entities. (DNorm-C is open source software and is available with a trained model at the DNorm demonstration website: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/tmTools/#DNorm.)
Published by Elsevier Inc.
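The "rich vocabulary" finding rests on closure properties: a closed vocabulary levels off, adding few new types as more text is read, while a rich one keeps growing. A minimal sketch, assuming closure is summarized by a simple type-count growth curve; the paper's exact procedure and corpora are not reproduced, and vocabulary_growth and both toy samples are hypothetical.

```python
def vocabulary_growth(tokens, step):
    """Number of distinct (lowercased) types seen after every `step` tokens."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok.lower())
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


# Hypothetical toy samples of disorder mentions, not data from the paper.
closed_sample = "pain fever pain pain fever pain fever pain".split()
open_sample = "pain ache soreness discomfort fever pyrexia febrile hyperthermia".split()
print(vocabulary_growth(closed_sample, 4))  # [(4, 2), (8, 2)]
print(vocabulary_growth(open_sample, 4))    # [(4, 4), (8, 8)]
```

The flat curve for the first sample is what closure looks like; the steadily rising second curve corresponds to the high term variation attributed here to disorder mentions in clinical narratives. A real comparison would use equal-sized samples from each corpus and much larger step sizes.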

Keywords: Electronic health records; Information extraction; Natural language processing

Year: 2015; PMID: 26187250; PMCID: PMC4713367; DOI: 10.1016/j.jbi.2015.07.010

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317


