An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records.

Ramakanth Kavuluru, Anthony Rios, Yuan Lu.

Abstract

BACKGROUND: Trained coders in healthcare facilities assign diagnosis codes to medical records by reviewing all physician-authored documents associated with a patient's visit. This is a necessary and complex task: coders must adhere to coding guidelines and assign every applicable code. With the growing popularity of electronic medical records (EMRs), computational approaches to code assignment have been proposed in recent years. However, most efforts have focused on single and often short clinical narratives, while realistic scenarios warrant full EMR-level analysis for code assignment.
OBJECTIVE: We evaluate supervised learning approaches to automatically assign International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes to EMRs by experimenting with a large, realistic EMR dataset. The overall goal is to identify methods that offer superior performance in this task on such datasets.
METHODS: We use a dataset of 71,463 EMRs corresponding to in-patient visits with discharge dates falling in a two-year period (2011-2012) from the University of Kentucky (UKY) Medical Center. We curate a smaller subset of this dataset and also use a third, gold-standard dataset of radiology reports. We conduct experiments using different problem transformation approaches with feature and data selection components, and we employ suitable label calibration and ranking methods with novel features involving code co-occurrence frequencies and latent code associations.
RESULTS: Over all codes with at least 50 training examples, we obtain a micro F-score of 0.48. On the set of codes that occur in at least 1% of the two-year dataset, we achieve a micro F-score of 0.54. For the smaller radiology report dataset, the classifier chaining approach yields the best results. For the smaller subset of the UKY dataset, feature selection, data selection, and label calibration offer the best performance.
CONCLUSIONS: We show that datasets at different scales (size of the EMRs, number of distinct codes) and with different characteristics warrant different learning approaches. For shorter narratives pertaining to a particular medical subdomain (e.g., radiology, pathology), classifier chaining is ideal, given that the codes are highly related to each other. For realistic in-patient full EMRs, feature and data selection methods offer high performance for smaller datasets. However, for large EMR datasets, we observe that the binary relevance approach with learning-to-rank based code reranking offers the best performance. Regardless of the training dataset size, for general EMRs, label calibration to select the optimal number of labels is an indispensable final step.
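
The two evaluation ideas named in the abstract, label calibration and the micro F-score, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the number of labels per record is chosen, so here the cutoff k is simply passed in by hand. The function names, the per-code scores, and the example records are hypothetical, and the ICD-9-CM codes are used only as sample labels.

```python
from typing import Dict, List, Set


def calibrate_top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Keep the k highest-scoring codes for one record.

    Assumption: `scores` maps each candidate code to a classifier score
    (e.g., from one binary classifier per code). How k is chosen is not
    described in the abstract; here it is supplied by the caller.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def micro_f_score(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Micro-averaged F-score: pool true/false positives and false
    negatives over all records and codes, then compute one F-score."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))
    fp = sum(len(p - g) for g, p in zip(gold, predicted))
    fn = sum(len(g - p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores for two records (illustrative values only).
record_scores = [
    {"428.0": 0.9, "401.9": 0.7, "250.00": 0.2},
    {"486": 0.8, "428.0": 0.6, "401.9": 0.1},
]
gold_codes = [{"428.0", "401.9"}, {"486"}]

# A fixed cutoff of 2 over-predicts on the second record ...
fixed = [calibrate_top_k(s, 2) for s in record_scores]
# ... while a per-record cutoff matches the gold label counts.
per_record = [calibrate_top_k(s, k) for s, k in zip(record_scores, [2, 1])]

print(micro_f_score(gold_codes, fixed))
print(micro_f_score(gold_codes, per_record))
```

In this toy example the fixed cutoff adds one false positive on the second record, which lowers the pooled precision. Choosing the number of labels per record removes that error, which is the role the abstract assigns to label calibration.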

Keywords: Diagnosis code assignment; Label calibration; Learning to rank; Multi-label text classification

Year: 2015 PMID: 26054428 PMCID: PMC4605853 DOI: 10.1016/j.artmed.2015.04.007

Source DB: PubMed Journal: Artif Intell Med ISSN: 0933-3657 Impact factor: 5.326

1. An overview of MetaMap: historical perspective and recent advances.

Review 2. Empirical distributional semantics: methods and biomedical applications.

3. Three approaches to automatic assignment of ICD-9-CM codes to radiology reports.

4. From episodes of care to diagnosis codes: automatic text categorization for medico-economic encoding.

5. Optimal training sets for Bayesian prediction of MeSH assignment.

6. Reflective Random Indexing and indirect inference: a scalable method for discovery of implicit connections.

7. Beyond synonymy: exploiting the UMLS semantics in mapping vocabularies.

8. A recent advance in the automatic indexing of the biomedical literature.

9. Automatic construction of rule-based ICD-9-CM coding systems.

10. Diagnosis code assignment: models and evaluation metrics.

1. deepBioWSD: effective deep neural word sense disambiguation of biomedical text data.

2. Computer-Assisted Diagnostic Coding: Effectiveness of an NLP-based approach using SNOMED CT to ICD-10 mappings.

3. Can structured EHR data support clinical coding? A data mining approach.

4. On Interestingness Measures for Mining Statistically Significant and Novel Clinical Associations from EMRs.

5. ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network.

6. Predicting mental conditions based on "history of present illness" in psychiatric notes with deep neural networks.

7. Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings.

8. EMR Coding with Semi-Parametric Multi-Head Matching Networks.

9. Improved biomedical word embeddings in the transformer era.

Review 10. Veterinary informatics: forging the future between veterinary medicine, human medicine, and One Health initiatives-a joint paper by the Association for Veterinary Informatics (AVI) and the CTSA One Health Alliance (COHA).