Surrogate-assisted feature extraction for high-throughput phenotyping.

Sheng Yu, Abhishek Chakrabortty, Katherine P Liao, Tianrun Cai, Ashwin N Ananthakrishnan, Vivian S Gainer, Susanne E Churchill, Peter Szolovits, Shawn N Murphy, Isaac S Kohane, Tianxi Cai.

Abstract

OBJECTIVE: Phenotyping algorithms are capable of accurately identifying patients with specific phenotypes from within electronic medical records systems. However, developing phenotyping algorithms in a scalable way remains a challenge due to the extensive human resources required. This paper introduces a high-throughput unsupervised feature selection method, which improves the robustness and scalability of electronic medical record phenotyping without compromising its accuracy.
METHODS: The proposed Surrogate-Assisted Feature Extraction (SAFE) method selects candidate features from a pool of comprehensive medical concepts found in publicly available knowledge sources. The target phenotype's International Classification of Diseases, Ninth Revision (ICD-9) counts and natural language processing (NLP) counts, which act as noisy surrogates for the gold-standard labels, are used to create silver-standard labels. Candidate features that are highly predictive of the silver-standard labels are selected as the final features.
RESULTS: Algorithms were trained to identify patients with coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis using various numbers of labels to compare the performance of features selected by SAFE, a previously published automated feature extraction for phenotyping procedure, and domain experts. The out-of-sample area under the receiver operating characteristic curve and F-score from SAFE algorithms were remarkably higher than those from the other two, especially at small label sizes.
CONCLUSION: SAFE advances high-throughput phenotyping methods by automatically selecting a succinct set of informative features for algorithm training, which in turn reduces overfitting and the needed number of gold-standard labels. SAFE also potentially identifies important features missed by automated feature extraction for phenotyping or experts.
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
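
The selection step summarized under METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: the single surrogate, the cutoff `upper`, the correlation-based screening rule with `min_abs_corr`, the feature names, and the toy counts are all assumptions made for illustration. The abstract states only that ICD-9 and NLP counts serve as noisy surrogates for silver-standard labels and that features highly predictive of those labels are kept; a simple Pearson correlation screen stands in here for whatever predictive model the paper actually uses.

```python
import math


def silver_labels(surrogate_counts, upper=3):
    """Label patients at the extremes of a noisy surrogate count.

    Returns 1 for counts >= upper, 0 for a zero count, and None for the
    ambiguous middle, which is excluded from feature selection.
    The cutoff `upper` is an illustrative assumption.
    """
    labels = []
    for c in surrogate_counts:
        if c >= upper:
            labels.append(1)
        elif c == 0:
            labels.append(0)
        else:
            labels.append(None)
    return labels


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(features, surrogate_counts, upper=3, min_abs_corr=0.3):
    """Keep candidate features whose log-counts track the silver labels."""
    labels = silver_labels(surrogate_counts, upper)
    keep = [i for i, y in enumerate(labels) if y is not None]
    y = [labels[i] for i in keep]
    selected = []
    for name, counts in features.items():
        x = [math.log1p(counts[i]) for i in keep]
        if abs(pearson(x, y)) >= min_abs_corr:
            selected.append(name)
    return selected


# Hypothetical toy data: one surrogate count and two candidate features
# for ten patients. None of these values come from the paper.
icd_counts = [0, 0, 0, 0, 1, 2, 5, 7, 4, 6]
candidate_features = {
    "related_drug_mentions": [0, 0, 1, 0, 1, 0, 3, 4, 2, 5],
    "unrelated_mentions": [2, 1, 2, 1, 0, 3, 1, 2, 2, 1],
}

selected = select_features(candidate_features, icd_counts)
print(selected)
```

In this toy setup, patients with intermediate surrogate counts receive no silver label and are ignored during screening, so only the clear extremes drive which candidate features are kept. No gold-standard chart review is needed at this stage, which is the property the abstract credits for reducing the number of labels required later.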

Keywords:  data mining; electronic medical records; machine learning; phenotyping

Year: 2017; PMID: 27632993; PMCID: PMC6080726; DOI: 10.1093/jamia/ocw135

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497

