
Automated disease cohort selection using word embeddings from Electronic Health Records.

Benjamin S Glicksberg, Riccardo Miotto, Kipp W Johnson, Khader Shameer, Li Li, Rong Chen, Joel T Dudley.

Abstract

Accurate and robust cohort definition is critical to biomedical discovery using Electronic Health Records (EHRs). As with prospective study designs, high-quality EHR-based research requires rigorous selection criteria to designate case/control status for each disease. Electronic phenotyping algorithms, which are manually built and validated per disease, have been successful in filling this need. However, these approaches are time-consuming, so algorithms have been developed for only a relatively small number of diseases. Methodologies that automatically learn features from EHRs have also been used for cohort selection. To date, however, there has been no systematic analysis of how these methods perform against current gold standards. Accordingly, this paper compares the performance of a state-of-the-art automated feature learning method in extracting research-grade cohorts for five diseases against their established electronic phenotyping algorithms. In particular, we use word2vec to create unsupervised embeddings of the phenotype space within an EHR system. Using medical concepts as a query, we then rank patients by their proximity in the embedding space and automatically extract putative disease cohorts via a distance threshold. Experimental evaluation shows promising results, with an average F-score of 0.57 and AUC-ROC of 0.98. However, results varied considerably between diseases, so further investigation and/or phenotype-specific refinement of the approach is needed before it can be readily deployed across all diseases.
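
The query, rank, and threshold steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two-dimensional vectors are toy stand-ins for embeddings that word2vec would learn from EHR data, and the concept names, patient IDs, gold-standard set, and threshold value are all hypothetical. Representing a patient as the mean of their concept vectors and measuring proximity with cosine distance are assumptions made here for illustration; the abstract does not specify either choice.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def patient_vector(concepts, embeddings):
    """Average the embeddings of a patient's concepts (an assumed representation)."""
    vectors = [embeddings[c] for c in concepts if c in embeddings]
    if not vectors:
        raise ValueError("patient has no concepts with known embeddings")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def select_cohort(query, patients, embeddings, threshold):
    """Rank patients by distance to the query concept, then apply a distance cutoff."""
    q = embeddings[query]
    ranked = sorted(
        (cosine_distance(q, patient_vector(concepts, embeddings)), pid)
        for pid, concepts in patients.items()
    )
    cohort = [pid for dist, pid in ranked if dist <= threshold]
    return ranked, cohort

def f_score(predicted, gold):
    """Harmonic mean of precision and recall against a gold-standard cohort."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy embeddings standing in for word2vec output (hypothetical values).
embeddings = {
    "type_2_diabetes": [0.9, 0.1],
    "metformin": [0.8, 0.2],
    "hba1c_test": [0.85, 0.15],
    "asthma": [0.1, 0.9],
    "albuterol": [0.2, 0.8],
}

# Each patient is a list of medical concepts found in their record (hypothetical).
patients = {
    "p1": ["metformin", "hba1c_test"],
    "p2": ["albuterol", "asthma"],
    "p3": ["metformin", "albuterol"],
}

ranked, cohort = select_cohort("type_2_diabetes", patients, embeddings, threshold=0.1)
gold_standard = {"p1", "p3"}  # hypothetical output of a manual phenotyping algorithm

for dist, pid in ranked:
    print(f"{pid}: distance {dist:.3f}")
print("cohort:", cohort)
print("F-score:", round(f_score(cohort, gold_standard), 3))
```

In this toy setup the threshold trades precision against recall: a tight cutoff keeps only patients very close to the query concept, while a looser one admits more borderline records. That sensitivity is one plausible reason results could differ between diseases, as the abstract reports.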

Year:  2018        PMID: 29218877      PMCID: PMC5788312     

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


