
Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

Sheng Yu, Katherine P Liao, Stanley Y Shaw, Vivian S Gainer, Susanne E Churchill, Peter Szolovits, Shawn N Murphy, Isaac S Kohane, Tianxi Cai.

Abstract

OBJECTIVE: Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy.
MATERIALS AND METHODS: Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. The selected NLP features were then combined with additional codified features to train a penalized logistic regression model that classifies the target phenotype.
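As an illustration of the penalized logistic regression step described in the methods, the following is a minimal sketch only. The abstract does not state which penalty was used, so an L1 (lasso) penalty is assumed here, fitted by proximal gradient descent. The function names, the synthetic data, and the values of `lam`, `step`, and `iters` are all hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=500):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    X is a list of feature rows, y a list of 0/1 labels.
    The intercept b is left unpenalized (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(iters):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the average logistic loss for one observation.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical synthetic data: feature 0 drives the label,
# features 1 and 2 are pure noise.
random.seed(0)
X, y = [], []
for _ in range(200):
    x = [random.gauss(0, 1) for _ in range(3)]
    X.append(x)
    y.append(1 if random.random() < sigmoid(2.0 * x[0]) else 0)

w, b = fit_l1_logistic(X, y)
```

In this toy setting the penalty is expected to keep the coefficient of the informative feature clearly positive while shrinking the noise coefficients toward zero, which is the property that makes such a penalty useful for selecting a small set of features.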
RESULTS: The authors applied the method to develop algorithms that identify patients with rheumatoid arthritis (RA) and, among those with RA, cases of coronary artery disease (CAD), using a large multi-institutional EHR. The areas under the receiver operating characteristic curve (AUCs) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared with AUCs of 0.938 and 0.929 for models trained with expert-curated features.
DISCUSSION: Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable.
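For readers unfamiliar with the AUC metric reported above, the following is a small sketch of one standard way to compute it: as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name `auc` and the example labels and scores are hypothetical and are unrelated to the study data.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is
    scored higher than a randomly chosen negative case (ties count 0.5).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: three cases (1) and three non-cases (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_auc = auc(labels, scores)  # 8 of 9 pairs are ordered correctly
```

The pairwise form is quadratic in the number of patients, so it is suited to illustration; a rank-based computation gives the same value more efficiently on large cohorts.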
CONCLUSION: The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping.
© The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Year:  2015        PMID: 25929596      PMCID: PMC4986664          DOI: 10.1093/jamia/ocv034

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


