
Limestone: high-throughput candidate phenotype generation via tensor factorization.

Joyce C Ho, Joydeep Ghosh, Steve R Steinhubl, Walter F Stewart, Joshua C Denny, Bradley A Malin, Jimeng Sun.

Abstract

The rapidly increasing availability of electronic health records (EHRs) from multiple heterogeneous sources has spearheaded the adoption of data-driven approaches for improved clinical research, decision making, prognosis, and patient management. Unfortunately, EHR data do not always directly and reliably map to medical concepts that clinical researchers need or use. Some recent studies have focused on EHR-derived phenotyping, which aims at mapping the EHR data to specific medical concepts; however, most of these approaches require labor-intensive supervision from experienced clinical professionals. Furthermore, existing approaches are often disease-centric and specialized to the idiosyncrasies of the information technology and/or business practices of a single healthcare organization. In this paper, we propose Limestone, a nonnegative tensor factorization method to derive phenotype candidates with virtually no human supervision. Limestone represents the data source interactions naturally using tensors (a generalization of matrices). In particular, we investigate the interaction of diagnoses and medications among patients. The resulting tensor factors are reported as phenotype candidates that automatically reveal patient clusters on specific diagnoses and medications. Using the proposed method, multiple phenotypes can be identified simultaneously from data. We demonstrate the capability of Limestone on a cohort of 31,815 patient records from the Geisinger Health System. The dataset spans 7 years of longitudinal patient records and was initially constructed for a heart failure onset prediction study. Our experiments demonstrate the robustness, stability, and conciseness of Limestone-derived phenotypes. Our results show that using only 40 phenotypes, we can outperform the original 640 features (169 diagnosis categories and 471 medication types) to achieve an area under the receiver operating characteristic curve (AUC) of 0.720 (95% CI 0.715 to 0.725). Moreover, in consultation with a medical expert, we confirmed that 82% of the top 50 candidates automatically extracted by Limestone are clinically meaningful.
Copyright © 2014 Elsevier Inc. All rights reserved.
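
To illustrate the idea described in the abstract, the following is a minimal sketch of a nonnegative CP (PARAFAC) factorization applied to a toy patient × diagnosis × medication count tensor. It is not the Limestone algorithm itself: the update rule (least-squares multiplicative updates), the function name `nonneg_cp`, the toy data, and all parameter values are assumptions chosen for brevity.

```python
import numpy as np


def nonneg_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP factorization of a 3-way tensor X (assumed helper).

    Uses Lee-Seung style multiplicative updates for the squared error.
    Returns factor matrices A (patients), B (diagnoses), C (medications)
    and the relative reconstruction error before and after each sweep.
    """
    rng = np.random.default_rng(seed)
    n_pat, n_diag, n_med = X.shape
    # Strictly positive start so that multiplicative updates can move.
    A = rng.random((n_pat, rank)) + 0.1
    B = rng.random((n_diag, rank)) + 0.1
    C = rng.random((n_med, rank)) + 0.1
    x_norm = np.linalg.norm(X)

    def rel_error():
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        return np.linalg.norm(X - X_hat) / x_norm

    errors = [rel_error()]
    for _ in range(n_iter):
        # Each update multiplies a factor by (data term) / (model term),
        # which keeps every entry nonnegative.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (
            A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (
            B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (
            C @ ((A.T @ A) * (B.T @ B)) + eps)
        errors.append(rel_error())
    return A, B, C, errors


# Toy tensor (made-up data): 8 patients, 4 diagnoses, 3 medications.
# Patients 0-3 share diagnoses {0, 1} with medication {0};
# patients 4-7 share diagnoses {2, 3} with medications {1, 2}.
X = np.zeros((8, 4, 3))
X[:4, :2, 0] = 1.0
X[4:, 2:, 1:] = 1.0

A, B, C, errors = nonneg_cp(X, rank=2)

# Each column r is one phenotype candidate: the nonzero-weight diagnoses
# in B[:, r] and medications in C[:, r], with patient loadings A[:, r].
for r in range(2):
    diag = np.flatnonzero(B[:, r] > 0.5 * B[:, r].max())
    med = np.flatnonzero(C[:, r] > 0.5 * C[:, r].max())
    print(f"candidate {r}: diagnoses {diag.tolist()}, medications {med.tolist()}")
print(f"relative error: {errors[0]:.3f} -> {errors[-1]:.3f}")
```

In this sketch each rank-one component plays the role of a phenotype candidate, and the patient loadings in `A` could serve as a reduced feature set in place of the raw diagnosis and medication counts. Multiplicative updates can stall in poor local optima, so several random restarts (different `seed` values) are a common precaution.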

Keywords:  Dimensionality reduction; EHR phenotyping; Nonnegative tensor factorization

Year:  2014        PMID: 25038555      PMCID: PMC6563906          DOI: 10.1016/j.jbi.2014.07.001

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  42 in total

1.  Implications of non-stationarity on predictive modeling using EHRs.

Authors:  Kenneth Jung; Nigam H Shah
Journal:  J Biomed Inform       Date:  2015-10-20       Impact factor: 6.317

2.  Trends in biomedical informatics: automated topic analysis of JAMIA articles.

Authors:  Dong Han; Shuang Wang; Chao Jiang; Xiaoqian Jiang; Hyeon-Eui Kim; Jimeng Sun; Lucila Ohno-Machado
Journal:  J Am Med Inform Assoc       Date:  2015-11       Impact factor: 4.497

3.  Coronary artery disease risk assessment from unstructured electronic health records using text mining.

Authors:  Jitendra Jonnagaddala; Siaw-Teng Liaw; Pradeep Ray; Manish Kumar; Nai-Wen Chang; Hong-Jie Dai
Journal:  J Biomed Inform       Date:  2015-08-28       Impact factor: 6.317

4.  Learning Clinical Workflows to Identify Subgroups of Heart Failure Patients.

Authors:  Chao Yan; You Chen; Bo Li; David Liebovitz; Bradley Malin
Journal:  AMIA Annu Symp Proc       Date:  2017-02-10

5.  Building bridges across electronic health record systems through inferred phenotypic topics.

Authors:  You Chen; Joydeep Ghosh; Cosmin Adrian Bejan; Carl A Gunter; Siddharth Gupta; Abel Kho; David Liebovitz; Jimeng Sun; Joshua Denny; Bradley Malin
Journal:  J Biomed Inform       Date:  2015-04-01       Impact factor: 6.317

6.  Clinical risk prediction by exploring high-order feature correlations.

Authors:  Fei Wang; Ping Zhang; Xiang Wang; Jianying Hu
Journal:  AMIA Annu Symp Proc       Date:  2014-11-14

7.  LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values.

Authors:  Kejing Yin; Ardavan Afshar; Joyce C Ho; William K Cheung; Chao Zhang; Jimeng Sun
Journal:  KDD       Date:  2020-08

Review 8.  Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress.

Authors:  S M Meystre; C Lovis; T Bürkle; G Tognola; A Budrionis; C U Lehmann
Journal:  Yearb Med Inform       Date:  2017-09-11

Review 9.  Health Informatics via Machine Learning for the Clinical Management of Patients.

Authors:  D A Clifton; K E Niehaus; P Charlton; G W Colopy
Journal:  Yearb Med Inform       Date:  2015-08-13

10.  A Graph Based Methodology for Temporal Signature Identification from EHR.

Authors:  Fei Wang; Chuanren Liu; Yajuan Wang; Jianying Hu; Guoqiang Yu
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05