Representation of EHR data for predictive modeling: a comparison between UMLS and other terminologies.

Laila Rasmy¹, Firat Tiryaki¹, Yujia Zhou¹, Yang Xiang¹, Cui Tao¹, Hua Xu¹, Degui Zhi¹.

Abstract

OBJECTIVE: Predictive disease modeling using electronic health record data is a growing field. Although clinical data in their raw form can be used directly for predictive modeling, it is a common practice to map data to standard terminologies to facilitate data aggregation and reuse. There is, however, a lack of systematic investigation of how different representations could affect the performance of predictive models, especially in the context of machine learning and deep learning.
MATERIALS AND METHODS: We projected the input diagnoses data in the Cerner HealthFacts database to Unified Medical Language System (UMLS) and 5 other terminologies, including CCS, CCSR, ICD-9, ICD-10, and PheWAS, and evaluated the prediction performances of these terminologies on 2 different tasks: the risk prediction of heart failure in diabetes patients and the risk prediction of pancreatic cancer. Two popular models were evaluated: logistic regression and a recurrent neural network.
RESULTS: For logistic regression, UMLS delivered the best area under the receiver operating characteristic curve (AUROC) on both the heart failure in diabetes patients (81.15%) and pancreatic cancer (80.53%) tasks. For the recurrent neural network, UMLS worked best for pancreatic cancer prediction (AUROC 82.24%) and was second (AUROC 85.55%) only to PheWAS (AUROC 85.87%) for heart failure prediction.
DISCUSSION/CONCLUSION: In our experiments, terminologies with larger vocabularies and finer-grained representations were associated with better prediction performance. In particular, UMLS is consistently one of the best-performing terminologies. We believe that our work may help to inform better designs of predictive models, although further investigation is warranted.
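
To make the comparison described in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It shows how projecting diagnosis codes onto a coarser terminology shrinks the input vocabulary, and how AUROC can be computed from risk scores. The names `TOY_MAP`, `project`, `multi_hot`, and `auroc` are hypothetical, and the `GROUP:` labels are invented for illustration; they are not a real CCS, PheWAS, or UMLS crosswalk.

```python
# Hypothetical toy crosswalk from fine-grained codes to coarser groups.
# The GROUP:* labels are made up for illustration only.
TOY_MAP = {
    "ICD10:E11.9": "GROUP:diabetes",
    "ICD10:E11.65": "GROUP:diabetes",
    "ICD10:I10": "GROUP:hypertension",
    "ICD10:I50.9": "GROUP:heart_failure",
}


def project(codes, mapping):
    """Map each source code to the target terminology; keep unmapped codes as-is."""
    return [mapping.get(code, code) for code in codes]


def multi_hot(patients):
    """Encode each patient's code list as a 0/1 vector over a sorted vocabulary."""
    vocab = sorted({code for codes in patients for code in codes})
    index = {code: i for i, code in enumerate(vocab)}
    rows = []
    for codes in patients:
        row = [0] * len(vocab)
        for code in codes:
            row[index[code]] = 1
        rows.append(row)
    return vocab, rows


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Three toy patients, each a list of diagnosis codes.
patients = [
    ["ICD10:E11.9", "ICD10:I10"],
    ["ICD10:E11.65"],
    ["ICD10:I50.9", "ICD10:E11.9"],
]
raw_vocab, raw_rows = multi_hot(patients)
projected = [project(codes, TOY_MAP) for codes in patients]
coarse_vocab, coarse_rows = multi_hot(projected)
print(len(raw_vocab), len(coarse_vocab))  # finer vs. coarser vocabulary size

# AUROC for made-up labels and risk scores from some model.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In this toy setup the two diabetes codes collapse into one feature after projection, which is the kind of granularity loss the abstract associates with lower prediction performance. A real experiment would feed the encoded vectors to a classifier such as logistic regression and compare AUROC across terminologies on held-out data.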

Keywords: UMLS; electronic health records; predictive modeling; terminology representation

Year: 2020 PMID: 32930711 PMCID: PMC7647355 DOI: 10.1093/jamia/ocaa180

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
