Elyne Scheurwegs (1), Kim Luyckx (2), Léon Luyten (2), Walter Daelemans (3), Tim Van den Bulcke (4)
1. ADReM (Advanced Database Research and Modelling), Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp, Antwerp, Belgium. elyne.scheurwegs@uantwerpen.be
2. Department of Medical Information, Antwerp University Hospital, Antwerp, Belgium.
3. Computational Linguistics and Psycholinguistics (CLiPS) Research Center, University of Antwerp, Antwerp, Belgium.
4. Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp - Antwerp University Hospital, Belgium; ADReM (Advanced Database Research and Modelling), University of Antwerp, Antwerp, Belgium.
Abstract
OBJECTIVE: Enormous amounts of healthcare data are becoming increasingly accessible through the large-scale adoption of electronic health records. In this work, structured and unstructured (textual) data are combined to assign clinical diagnostic and procedural codes (specifically International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] codes) to patient stays. We investigate whether integrating these heterogeneous data types improves prediction strength compared to using the data types in isolation.
METHODS: Two separate data integration approaches were evaluated. Early data integration combines features of several sources within a single model, and late data integration learns a separate model per data source and combines the resulting predictions with a meta-learner. Both approaches are evaluated on data sources and clinical codes from a broad set of medical specialties.
RESULTS: When compared with the best individual prediction source, late data integration leads to improvements in predictive power (e.g., overall F-measure increased from 30.6% to 38.3% for ICD-9-CM diagnostic codes), while early data integration is less consistent. The predictive strength differs strongly between medical specialties, both for ICD-9-CM diagnostic and procedural codes.
DISCUSSION: Structured data provide complementary information to unstructured data (and vice versa) for predicting ICD-9-CM codes. This complementarity is captured most effectively by the proposed late data integration approach.
CONCLUSIONS: We demonstrated that models using multiple electronic health record data sources systematically outperform models using data sources in isolation in the task of predicting ICD-9-CM codes over a broad range of medical specialties.
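The difference between the two integration approaches described under METHODS can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifiers or the meta-learner, so a tiny gradient-descent logistic regression stands in for both, and the two toy sources (`structured`, `text`), their features, and the labels are invented. For brevity the meta-learner is fitted on in-sample base predictions; a realistic setup would more likely use held-out (cross-validated) predictions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, epochs=2000, lr=0.5):
    """Fit a small logistic regression by batch gradient descent (stand-in model)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def concat_sources(sources):
    """Join the feature vectors of all sources, stay by stay."""
    return [sum(rows, []) for rows in zip(*sources)]

def early_integration(sources, y):
    """Early integration: one model over the concatenated features."""
    return train_logreg(concat_sources(sources), y)

def base_predictions(base_models, sources):
    """One column of predicted probabilities per source, one row per stay."""
    columns = [predict_proba(m, X) for m, X in zip(base_models, sources)]
    return [list(row) for row in zip(*columns)]

def late_integration(sources, y):
    """Late integration: one model per source, then a meta-learner on their outputs."""
    base_models = [train_logreg(X, y) for X in sources]
    meta_model = train_logreg(base_predictions(base_models, sources), y)
    return base_models, meta_model

def late_predict(base_models, meta_model, sources):
    return predict_proba(meta_model, base_predictions(base_models, sources))

# Hypothetical toy data: six patient stays, two sources, one binary code label.
structured = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 0]]   # e.g. coded flags
text = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
        [1, 0, 0], [1, 1, 0], [0, 0, 1]]                        # e.g. term indicators
labels = [1, 1, 0, 0, 1, 0]
sources = [structured, text]

early_model = early_integration(sources, labels)
early_probs = predict_proba(early_model, concat_sources(sources))

base_models, meta_model = late_integration(sources, labels)
late_probs = late_predict(base_models, meta_model, sources)
```

In the early variant the single model sees five input features (two structured plus three textual); in the late variant the meta-learner sees only two inputs, one predicted probability per source, which is what allows each source to be modelled separately before the predictions are combined.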