
Applying active learning to high-throughput phenotyping algorithms for electronic health records data.

Yukun Chen, Robert J Carroll, Eugenia R McPeek Hinz, Anushi Shah, Anne E Eyler, Joshua C Denny, Hua Xu.

Abstract

OBJECTIVES: Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms.
METHODS: We integrated an uncertainty sampling AL approach with support vector machine (SVM)-based phenotyping algorithms and evaluated its performance on three annotated disease cohorts: rheumatoid arthritis (RA), colorectal cancer (CRC), and venous thromboembolism (VTE). We investigated performance using two types of feature sets: unrefined features, which included, at a minimum, all clinical concepts extracted from notes and billing codes; and a smaller set of refined features selected by domain experts. The performance of AL was compared with that of a passive learning (PL) approach based on random sampling.
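The comparison described in METHODS (uncertainty sampling with an SVM versus random sampling) can be illustrated with a minimal sketch. It assumes dense numeric feature vectors, +1/-1 labels, a linear SVM fit by Pegasos-style subgradient descent (swapped in for a library SVM so the example has no dependencies), and "uncertainty" measured as the absolute SVM decision value. All function names are hypothetical; this is not the authors' implementation, and the abstract does not specify their kernel, batch size, or stopping rule.

```python
import random
from typing import List, Sequence

# Hypothetical helpers for illustration only; not the study's code.


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     epochs: int = 50, seed: int = 0) -> List[float]:
    """Fit a linear SVM (hinge loss + L2 penalty) with Pegasos-style stochastic
    subgradient descent. Labels must be +1 / -1. A constant 1.0 feature is
    appended, so the last weight acts as a (regularized) bias term."""
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * _dot(w, Xb[i]) < 1.0  # margin check before update
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if violates:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def decision_value(w: List[float], x: Sequence[float]) -> float:
    """Signed SVM score; for a fixed w its magnitude ranks points by distance
    from the separating hyperplane."""
    return _dot(w, list(x) + [1.0])


def most_uncertain(w: List[float], X: List[List[float]], pool: List[int]) -> int:
    """Uncertainty sampling: pick the unlabeled index closest to the boundary."""
    return min(pool, key=lambda i: abs(decision_value(w, X[i])))


def simulate_learning(X: List[List[float]], y: List[int], n_seed: int,
                      n_queries: int, strategy: str = "uncertainty",
                      seed: int = 0) -> List[int]:
    """Simulate annotation on an already-labeled cohort: 'querying' an index
    just reveals y[index]. strategy is 'uncertainty' (AL) or 'random' (PL).
    Returns labeled indices in the order they were annotated."""
    if n_seed < 1:
        raise ValueError("need at least one seed example to train on")
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    labeled, pool = order[:n_seed], order[n_seed:]
    for _ in range(min(n_queries, len(pool))):
        if strategy == "uncertainty":
            w = train_linear_svm([X[i] for i in labeled], [y[i] for i in labeled])
            pick = most_uncertain(w, X, pool)
        elif strategy == "random":
            pick = rng.choice(pool)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        pool.remove(pick)
        labeled.append(pick)  # the annotator's label y[pick] is now available
    return labeled
```

Because the cohorts in the study were already annotated, a retrospective simulation like this reads labels from `y` rather than asking a reviewer; in a live setting each query would be a chart review, and the classifier would be retrained (or scored) after each batch of new labels.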
RESULTS: Our evaluation showed that AL outperformed PL on all three phenotyping tasks. When unrefined features were used in the RA and CRC tasks, AL reduced the number of annotated samples required to achieve an area under the curve (AUC) score of 0.95 by 68% and 23%, respectively. Using refined features, AL also achieved a 68% reduction for VTE, with an optimal AUC of 0.70. As expected, refined features improved the performance of phenotyping classifiers and required fewer annotated samples.
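The percentages in RESULTS can be read as relative savings in the number of annotations needed for a learning curve to first reach a target AUC. The sketch below assumes exactly that definition, (n_PL - n_AL) / n_PL, which the abstract does not spell out, and takes learning curves as hypothetical lists of (n_labeled, AUC) pairs. It also computes AUC through its rank interpretation (the probability that a random positive outscores a random negative). Function names are illustrative only and no study data is reproduced.

```python
from typing import Optional, Sequence, Tuple

Curve = Sequence[Tuple[int, float]]  # hypothetical (n_labeled, AUC) points


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative),
    counting ties as 1/2. labels are 1 (positive) / 0 (negative)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def samples_to_reach(curve: Curve, target: float) -> Optional[int]:
    """Smallest annotation count whose AUC reaches the target, or None."""
    for n, score in sorted(curve):
        if score >= target:
            return n
    return None


def annotation_reduction(al_curve: Curve, pl_curve: Curve,
                         target: float) -> Optional[float]:
    """Assumed metric: fraction of annotations AL saves relative to PL at a
    target AUC. None if either curve never reaches the target."""
    n_al = samples_to_reach(al_curve, target)
    n_pl = samples_to_reach(pl_curve, target)
    if n_al is None or n_pl is None:
        return None
    return 1.0 - n_al / n_pl
```

When a task never reaches the fixed target (as VTE did not reach 0.95), the same function can be called with that task's best attainable AUC as the target instead.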
CONCLUSIONS: This study demonstrated that AL can be useful in ML-based phenotyping methods. Moreover, AL and feature engineering based on domain knowledge could be combined to develop efficient and generalizable phenotyping methods.

Keywords:  Active Learning; Electronic Health Records; Machine Learning; Natural Language Processing; Phenotyping Algorithm

Year:  2013        PMID: 23851443      PMCID: PMC3861916          DOI: 10.1136/amiajnl-2013-001945

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497

  Cited by: 44 in total

1.  A systematic comparison of feature space effects on disease classifier performance for phenotype identification of five diseases.

Authors:  Christopher Kotfila; Özlem Uzuner
Journal:  J Biomed Inform       Date:  2015-08-01       Impact factor: 6.317

2.  Text classification for assisting moderators in online health communities.

Authors:  Jina Huh; Meliha Yetisgen-Yildiz; Wanda Pratt
Journal:  J Biomed Inform       Date:  2013-09-08       Impact factor: 6.317

3.  Electronic health records-driven phenotyping: challenges, recent advances, and perspectives.

Authors:  Jyotishman Pathak; Abel N Kho; Joshua C Denny
Journal:  J Am Med Inform Assoc       Date:  2013-12       Impact factor: 4.497

4.  High-throughput multimodal automated phenotyping (MAP) with application to PheWAS.

Authors:  Katherine P Liao; Jiehuan Sun; Tianrun A Cai; Nicholas Link; Chuan Hong; Jie Huang; Jennifer E Huffman; Jessica Gronsbell; Yichi Zhang; Yuk-Lam Ho; Victor Castro; Vivian Gainer; Shawn N Murphy; Christopher J O'Donnell; J Michael Gaziano; Kelly Cho; Peter Szolovits; Isaac S Kohane; Sheng Yu; Tianxi Cai
Journal:  J Am Med Inform Assoc       Date:  2019-11-01       Impact factor: 4.497

5.  Modeling asynchronous event sequences with RNNs.

Authors:  Stephen Wu; Sijia Liu; Sunghwan Sohn; Sungrim Moon; Chung-Il Wi; Young Juhn; Hongfang Liu
Journal:  J Biomed Inform       Date:  2018-06-05       Impact factor: 6.317

6.  Phenotype Concept Set Construction from Concept Pair Likelihoods.

Authors:  Victor A Rodriguez; Tony Sun; Phyllis Thangaraj; Chao Pang; Krishna S Kalluri; Xinzhuo Jiang; Anna Ostropolets; RuiJun Chen; Karthik Natarajan; Patrick Ryan
Journal:  AMIA Annu Symp Proc       Date:  2021-01-25

7.  An Empirical Study for Impacts of Measurement Errors on EHR based Association Studies.

Authors:  Rui Duan; Ming Cao; Yonghui Wu; Jing Huang; Joshua C Denny; Hua Xu; Yong Chen
Journal:  AMIA Annu Symp Proc       Date:  2017-02-10

8.  Algorithm to detect pediatric provider attention to high BMI and associated medical risk.

Authors:  Christy B Turer; Celette S Skinner; Sarah E Barlow
Journal:  J Am Med Inform Assoc       Date:  2019-01-01       Impact factor: 4.497

9.  Clinical research informatics and electronic health record data.

Authors:  R L Richesson; M M Horvath; S A Rusincovitch
Journal:  Yearb Med Inform       Date:  2014-08-15

10.  A study of active learning methods for named entity recognition in clinical text.

Authors:  Yukun Chen; Thomas A Lasko; Qiaozhu Mei; Joshua C Denny; Hua Xu
Journal:  J Biomed Inform       Date:  2015-09-15       Impact factor: 6.317
