Selecting relevant features from the electronic health record for clinical code prediction.

Elyne Scheurwegs, Boris Cule, Kim Luyckx, Léon Luyten, Walter Daelemans

Abstract

The electronic health record (EHR) contains a multitude of information sources, each of which can hold clues for automatically assigning diagnosis and procedure codes. These sources, however, overlap in information and differ in quality, which complicates the retrieval of these clues. Through feature selection, a denser representation with consistent quality and less information overlap can be obtained. We introduce and compare coverage-based feature selection methods based on confidence and information gain. These approaches were evaluated over a range of medical specialties: seven for ICD-9-CM code prediction (six at the Antwerp University Hospital and one in the MIMIC-III dataset) and two for ICD-10-CM code prediction. Using confidence coverage to integrate all sources in an EHR shows a consistent improvement in F-measure (49.83% for diagnosis codes on average), compared both with the baseline (44.25% for diagnosis codes on average) and with the best standalone source (44.41% for diagnosis codes on average). Confidence coverage creates a concise patient stay representation that is independent of a rigid framework such as UMLS and contains easily interpretable features. Confidence coverage has several advantages over the baseline setup, in which feature selection was limited to a filter removing features with fewer than five total occurrences in the training set. Prediction results improved consistently when using multiple heterogeneous sources to predict clinical codes, while the number of features and the processing time were reduced.
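The abstract names two selection steps without giving their definitions: a baseline frequency filter and confidence coverage. The sketch below is one plausible reading, not the authors' implementation. It assumes that the confidence of a feature for a code is the fraction of stays containing the feature that also carry the code, and that "coverage" means greedily keeping high-confidence features until every positive training stay contains at least `k` selected features. All names (`frequency_filter`, `confidence`, `confidence_coverage`, `k`, the toy `stays` data) are hypothetical.

```python
from collections import Counter


def frequency_filter(stays, min_count=5):
    """Baseline described in the abstract: keep only features with at
    least `min_count` total occurrences in the training set."""
    totals = Counter(f for features, _ in stays for f in features)
    return {f for f, n in totals.items() if n >= min_count}


def confidence(stays, code):
    """Assumed definition: #(stays with f and code) / #(stays with f)."""
    with_f = Counter()
    with_f_and_code = Counter()
    for features, codes in stays:
        for f in set(features):
            with_f[f] += 1
            if code in codes:
                with_f_and_code[f] += 1
    return {f: with_f_and_code[f] / n for f, n in with_f.items()}


def confidence_coverage(stays, code, k=1):
    """Assumed greedy scheme: walk features by descending confidence and
    keep a feature only if it occurs in a positive stay that is not yet
    covered by k selected features."""
    conf = confidence(stays, code)
    ranked = sorted(conf, key=lambda f: (-conf[f], f))  # ties broken by name
    positives = [set(fs) for fs, codes in stays if code in codes]
    covered = [0] * len(positives)
    selected = []
    for f in ranked:
        hits = [i for i, fs in enumerate(positives)
                if f in fs and covered[i] < k]
        if hits:
            selected.append(f)
            for i in hits:
                covered[i] += 1
        if all(c >= k for c in covered):
            break
    return selected


# Toy training set (invented for illustration): (features, assigned codes).
stays = [
    (["fever", "cough", "xray"], {"486"}),
    (["fever", "cough"], {"486"}),
    (["fever", "rash"], set()),
    (["cough", "xray"], {"486"}),
    (["rash"], set()),
]

kept = frequency_filter(stays, min_count=3)
selected_k1 = confidence_coverage(stays, "486", k=1)
selected_k2 = confidence_coverage(stays, "486", k=2)
```

On this toy data a single feature already covers every positive stay at `k=1`, which illustrates how a coverage criterion can yield a much smaller feature set than a pure frequency threshold. Raising `k` trades compactness for redundancy.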

Keywords: Clinical coding; Data integration; Data representation; EHR mining; Feature selection

Year: 2017 PMID: 28919106 DOI: 10.1016/j.jbi.2017.09.004

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317

Cited

4 in total

1. Can structured EHR data support clinical coding? A data mining approach.

Authors: José Carlos Ferrão; Mónica Duarte Oliveira; Filipe Janela; Henrique M G Martins; Daniel Gartner
Journal: Health Syst (Basingstoke) Date: 2020-03-01

2. ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network.

Authors: Fei Li; Hong Yu
Journal: Proc Conf AAAI Artif Intell Date: 2020-04-03

3. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities.

Authors: Lauren J Beesley; Maxwell Salvatore; Lars G Fritsche; Anita Pandit; Arvind Rao; Chad Brummett; Cristen J Willer; Lynda D Lisabeth; Bhramar Mukherjee
Journal: Stat Med Date: 2019-12-20 Impact factor: 2.373

4. Data-driven phenotype discovery of FMR1 premutation carriers in a population-based sample.

Authors: Arezoo Movaghar; David Page; Murray Brilliant; Mei Wang Baker; Jan Greenberg; Jinkuk Hong; Leann Smith DaWalt; Krishanu Saha; Finn Kuusisto; Ron Stewart; Elizabeth Berry-Kravis; Marsha R Mailick
Journal: Sci Adv Date: 2019-08-21 Impact factor: 14.136
