Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies.

Laila Rasmy, Firat Tiryaki, Yujia Zhou, Yang Xiang, Cui Tao, Hua Xu, Degui Zhi.

Abstract

OBJECTIVE: Predictive disease modeling using electronic health record data is a growing field. Although clinical data in their raw form can be used directly for predictive modeling, it is a common practice to map data to standard terminologies to facilitate data aggregation and reuse. There is, however, a lack of systematic investigation of how different representations could affect the performance of predictive models, especially in the context of machine learning and deep learning.
MATERIALS AND METHODS: We projected the input diagnoses data in the Cerner HealthFacts database to Unified Medical Language System (UMLS) and 5 other terminologies, including CCS, CCSR, ICD-9, ICD-10, and PheWAS, and evaluated the prediction performances of these terminologies on 2 different tasks: the risk prediction of heart failure in diabetes patients and the risk prediction of pancreatic cancer. Two popular models were evaluated: logistic regression and a recurrent neural network.
RESULTS: For logistic regression, using UMLS delivered the best area under the receiver operating characteristic curve (AUROC) for both the heart failure in diabetes patients (81.15%) and pancreatic cancer (80.53%) tasks. For the recurrent neural network, UMLS worked best for pancreatic cancer prediction (AUROC 82.24%) and was second (AUROC 85.55%) only to PheWAS (AUROC 85.87%) for prediction of heart failure in diabetes patients.
DISCUSSION/CONCLUSION: In our experiments, terminologies with larger vocabularies and finer-grained representations were associated with better prediction performance. In particular, UMLS was consistently one of the best-performing terminologies. We believe that our work may help to inform better designs of predictive models, although further investigation is warranted.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Keywords:  UMLS; electronic health records; predictive modeling; terminology representation

Year:  2020        PMID: 32930711      PMCID: PMC7647355          DOI: 10.1093/jamia/ocaa180

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


