Feature extraction for phenotyping from semantic and knowledge resources.

Wenxin Ning¹, Stephanie Chan², Andrew Beam³, Ming Yu¹, Alon Geva⁴, Katherine Liao⁵, Mary Mullen⁶, Kenneth D Mandl⁷, Isaac Kohane³, Tianxi Cai⁸, Sheng Yu⁹.

Abstract

OBJECTIVE: Phenotyping algorithms can efficiently and accurately identify patients with a specific disease phenotype and construct cohorts from electronic health records (EHR) for subsequent clinical or genomic studies. Previous studies have introduced unsupervised EHR-based feature selection methods that yielded highly accurate algorithms. However, those methods still require expert intervention to tune parameter settings to the EHR data distribution of each phenotype. To further accelerate the development of phenotyping algorithms, we propose a fully automated and robust unsupervised feature selection method that relies only on publicly available medical knowledge sources instead of EHR data.
METHODS: SEmantics-Driven Feature Extraction (SEDFE) collects medical concepts from online knowledge sources as candidate features and represents them as distributional semantic vectors derived with neural word embedding and the Unified Medical Language System (UMLS) Metathesaurus. The features that are semantically closest to the target phenotype and that together sufficiently characterize it are determined by a linear decomposition criterion and selected for the final classification algorithm.
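The abstract does not spell out the linear decomposition criterion. The following is a minimal sketch, not the authors' implementation, assuming every candidate concept already has an embedding vector: it ranks candidates by cosine similarity to the target phenotype vector and keeps adding them until the least-squares projection of the target onto the span of the selected vectors leaves a relative residual below a threshold. The functions `select_features` and `cosine`, the parameters `tol` and `max_features`, and the toy three-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def select_features(
    target: Vector,
    candidates: Dict[str, Vector],
    tol: float = 0.1,          # hypothetical stopping threshold (assumption)
    max_features: int = 50,    # hypothetical cap on selected features (assumption)
) -> List[str]:
    """Hypothetical SEDFE-style selection (an assumption, not the paper's code).

    Candidates are visited from most to least similar to the target. Each one is
    kept if it adds a new direction, and selection stops once the least-squares
    residual of the target on the span of the kept vectors is at most
    tol * ||target||, or once max_features concepts have been kept.
    """
    ranked = sorted(candidates, key=lambda c: cosine(target, candidates[c]), reverse=True)
    basis: List[List[float]] = []   # orthonormal basis of the selected vectors' span
    residual = list(target)         # part of the target not yet explained
    threshold = tol * norm(target)
    selected: List[str] = []
    for name in ranked:
        if len(selected) >= max_features or norm(residual) <= threshold:
            break
        # Gram-Schmidt step: remove components already covered by the basis.
        v = list(candidates[name])
        for q in basis:
            p = dot(v, q)
            v = [vi - p * qi for vi, qi in zip(v, q)]
        nv = norm(v)
        if nv < 1e-12:
            continue  # linearly dependent on earlier picks; adds nothing
        q_new = [vi / nv for vi in v]
        basis.append(q_new)
        selected.append(name)
        # The residual is already orthogonal to earlier basis vectors,
        # so only its component along q_new needs to be removed.
        p = dot(residual, q_new)
        residual = [ri - p * qi for ri, qi in zip(residual, q_new)]
    return selected


if __name__ == "__main__":
    # Toy 3-d vectors, made up for illustration only.
    target = [1.0, 1.0, 0.0]
    candidates = {
        "concept_a": [1.0, 0.9, 0.0],
        "concept_b": [0.0, 1.0, 0.0],
        "concept_c": [0.0, 0.0, 1.0],
    }
    print(select_features(target, candidates, tol=0.1))   # ['concept_a']
    print(select_features(target, candidates, tol=0.01))  # ['concept_a', 'concept_b']
```

The incremental Gram-Schmidt update keeps the running residual equal to the least-squares residual of the target on the selected vectors, so the stopping rule can be checked at each step without re-solving a regression. A stricter `tol` admits more of the ranked concepts.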
RESULTS: SEDFE was compared with the EHR-based SAFE algorithm and with domain experts on feature selection for classifying five phenotypes (coronary artery disease, rheumatoid arthritis, Crohn's disease, ulcerative colitis, and pediatric pulmonary arterial hypertension), using both supervised and unsupervised approaches. Algorithms built on SEDFE-selected features achieved accuracy comparable to those built on SAFE-selected and expert-curated features. SEDFE is also robust to the choice of input semantic vectors.
CONCLUSION: SEDFE attains satisfactory performance in unsupervised feature selection for EHR phenotyping. Because it is fully automated and EHR-independent, the method promises efficient and accurate development of algorithms for high-throughput phenotyping.

Keywords: Distributional semantics; Electronic health records; Machine learning; Phenotyping

Year: 2019 PMID: 30738949 PMCID: PMC6424621 DOI: 10.1016/j.jbi.2019.103122

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
