
Feature extraction for phenotyping from semantic and knowledge resources.

Wenxin Ning, Stephanie Chan, Andrew Beam, Ming Yu, Alon Geva, Katherine Liao, Mary Mullen, Kenneth D Mandl, Isaac Kohane, Tianxi Cai, Sheng Yu

Abstract

OBJECTIVE: Phenotyping algorithms can efficiently and accurately identify patients with a specific disease phenotype and construct electronic health record (EHR)-based cohorts for subsequent clinical or genomic studies. Previous studies have introduced unsupervised EHR-based feature selection methods that yield highly accurate algorithms. However, those methods still require expert intervention to tune the parameter settings to the EHR data distribution of each phenotype. To further accelerate the development of phenotyping algorithms, we propose a fully automated and robust unsupervised feature selection method that relies only on publicly available medical knowledge sources rather than on EHR data.
METHODS: SEmantics-Driven Feature Extraction (SEDFE) collects medical concepts from online knowledge sources as candidate features and represents each one as a distributional semantic vector derived from neural word embeddings and the Unified Medical Language System (UMLS) Metathesaurus. The features that are semantically closest to the target phenotype and that sufficiently characterize it are identified by a linear decomposition criterion and selected for the final classification algorithm.
RESULTS: SEDFE was compared with the EHR-based SAFE algorithm and with domain experts on feature selection for classifying five phenotypes (coronary artery disease, rheumatoid arthritis, Crohn's disease, ulcerative colitis, and pediatric pulmonary arterial hypertension), using both supervised and unsupervised approaches. Algorithms built on SEDFE-selected features achieved accuracy comparable to that of algorithms built on SAFE-selected and expert-curated features. SEDFE was also robust to variation in the input semantic vectors.
CONCLUSION: SEDFE attains satisfactory performance in unsupervised feature selection for EHR phenotyping. Because it is both fully automated and EHR-independent, the method promises efficiency and accuracy in developing algorithms for high-throughput phenotyping.
Copyright © 2019 Elsevier Inc. All rights reserved.
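
The selection step described under METHODS can be illustrated with a small sketch. The abstract does not spell out the linear decomposition criterion, so everything below is an assumption made for illustration, not the published SEDFE procedure: candidates are ranked by cosine similarity to the target phenotype vector, features are added in that order, and selection stops once a least-squares combination of the chosen vectors reconstructs the target vector to within a relative residual. The function names, the `max_residual` threshold, and the toy vectors are all hypothetical.

```python
import numpy as np


def cosine_similarity(target, candidates):
    """Cosine similarity between one target vector and each candidate row."""
    target_unit = target / np.linalg.norm(target)
    cand_unit = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return cand_unit @ target_unit


def select_features(target, candidates, names, max_residual=0.1):
    """Greedy selection under an ASSUMED stopping rule (not from the paper).

    Candidates are taken in order of decreasing cosine similarity to the
    target. After each addition, the target vector is decomposed as a linear
    combination of the chosen vectors by least squares; selection stops when
    the relative reconstruction residual falls to max_residual or below.
    """
    if len(names) == 0:
        raise ValueError("at least one candidate feature is required")
    order = np.argsort(-cosine_similarity(target, candidates))
    target_norm = np.linalg.norm(target)
    for k in range(1, len(order) + 1):
        chosen = order[:k]
        basis = candidates[chosen].T  # shape: (embedding_dim, k)
        coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
        residual = np.linalg.norm(target - basis @ coef) / target_norm
        if residual <= max_residual:
            break
    return [names[i] for i in chosen], float(residual)


# Toy 3-dimensional "semantic vectors" (hypothetical values).
phenotype_vec = np.array([2.0, 1.0, 0.0])
candidate_vecs = np.array([
    [1.0, 0.0, 0.0],  # concept_a: closest to the phenotype vector
    [0.0, 1.0, 0.0],  # concept_b: second closest
    [0.0, 0.0, 1.0],  # concept_c: orthogonal to the phenotype vector
])
candidate_names = ["concept_a", "concept_b", "concept_c"]

selected, residual = select_features(phenotype_vec, candidate_vecs, candidate_names)
```

In this toy case the first two concepts already span the phenotype vector, so the orthogonal third concept is never added. With real embeddings the threshold would have to be chosen empirically, which is precisely the kind of tuning the paper aims to avoid, so the actual criterion presumably differs from this rule.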


Keywords:  Distributional semantics; Electronic health records; Machine learning; Phenotyping


Year:  2019        PMID: 30738949      PMCID: PMC6424621          DOI: 10.1016/j.jbi.2019.103122

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


