
An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records.

Ramakanth Kavuluru, Anthony Rios, Yuan Lu.

Abstract

BACKGROUND: Diagnosis codes are assigned to medical records in healthcare facilities by trained coders who review all physician-authored documents associated with a patient's visit. This is a necessary and complex task in which coders must adhere to coding guidelines and assign all applicable codes. With the popularity of electronic medical records (EMRs), computational approaches to code assignment have been proposed in recent years. However, most efforts have focused on single and often short clinical narratives, while realistic scenarios warrant full EMR-level analysis for code assignment.

OBJECTIVE: We evaluate supervised learning approaches to automatically assign International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes to EMRs by experimenting with a large realistic EMR dataset. The overall goal is to identify methods that offer superior performance in this task when considering such datasets.

METHODS: We use a dataset of 71,463 EMRs corresponding to in-patient visits with discharge dates falling in a two-year period (2011-2012) from the University of Kentucky (UKY) Medical Center. We curate a smaller subset of this dataset and also use a third gold standard dataset of radiology reports. We conduct experiments using different problem transformation approaches with feature and data selection components, employing suitable label calibration and ranking methods with novel features involving code co-occurrence frequencies and latent code associations.

RESULTS: Over all codes with at least 50 training examples, we obtain a micro F-score of 0.48. On the set of codes that occur in at least 1% of the two-year dataset, we achieve a micro F-score of 0.54. For the smaller radiology report dataset, the classifier chaining approach yields the best results. For the smaller subset of the UKY dataset, feature selection, data selection, and label calibration offer the best performance.

CONCLUSIONS: We show that datasets at different scales (size of the EMRs, number of distinct codes) and with different characteristics warrant different learning approaches. For shorter narratives pertaining to a particular medical subdomain (e.g., radiology, pathology), classifier chaining is ideal given that the codes are highly related to each other. For realistic in-patient full EMRs, feature and data selection methods offer high performance for smaller datasets. However, for large EMR datasets, we observe that the binary relevance approach with learning-to-rank based code reranking offers the best performance. Regardless of the training dataset size, for general EMRs, label calibration to select the optimal number of labels is an indispensable final step.
Copyright © 2015 Elsevier B.V. All rights reserved.
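For readers unfamiliar with the evaluation measure named in the abstract, the following is a minimal sketch of a micro-averaged F-score over multi-label code predictions, combined with a simple top-k cut-off per record. It is not the authors' implementation: the function names `top_k_codes` and `micro_f_score`, the toy scores, and the fixed per-record label counts are all assumptions made for illustration. In particular, the abstract describes label calibration as selecting the optimal number of labels, whereas here that number is simply given.

```python
from typing import Dict, List, Set


def top_k_codes(scores: Dict[str, float], k: int) -> Set[str]:
    """Keep the k highest-scoring codes for one record.

    Hypothetical stand-in for label calibration: k is supplied by the
    caller here rather than learned. Ties are broken by code name.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {code for code, _ in ranked[:max(k, 0)]}


def micro_f_score(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F-score: pool true/false positives and false
    negatives over all records and codes, then compute a single F-score."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # codes correctly assigned
        fp += len(p - g)   # codes assigned but not in the gold standard
        fn += len(g - p)   # gold-standard codes that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (illustrative only, not drawn from the paper's datasets).
gold = [{"428.0", "401.9"}, {"250.00"}]
scores = [
    {"428.0": 0.9, "401.9": 0.4, "250.00": 0.6},
    {"250.00": 0.8, "401.9": 0.7, "428.0": 0.1},
]
label_counts = [2, 1]  # assumed number of codes to keep per record

predicted = [top_k_codes(s, k) for s, k in zip(scores, label_counts)]
print(micro_f_score(gold, predicted))
```

Because counts are pooled before averaging, frequent codes dominate a micro F-score, which is one reason the abstract reports results separately for code sets defined by frequency thresholds.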

Keywords:  Diagnosis code assignment; Label calibration; Learning to rank; Multi-label text classification

Year:  2015        PMID: 26054428      PMCID: PMC4605853          DOI: 10.1016/j.artmed.2015.04.007

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


