Applying active learning to high-throughput phenotyping algorithms for electronic health records data.

Yukun Chen, Robert J Carroll, Eugenia R McPeek Hinz, Anushi Shah, Anne E Eyler, Joshua C Denny, Hua Xu.

Abstract

OBJECTIVES: Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms.
METHODS: We integrated an uncertainty sampling AL approach with support vector machine-based phenotyping algorithms and evaluated its performance using three annotated disease cohorts: rheumatoid arthritis (RA), colorectal cancer (CRC), and venous thromboembolism (VTE). We investigated performance using two types of feature sets: unrefined features, which contained at least all clinical concepts extracted from notes and billing codes; and a smaller set of refined features selected by domain experts. The performance of AL was compared with a passive learning (PL) approach based on random sampling.
RESULTS: Our evaluation showed that AL outperformed PL on all three phenotyping tasks. When unrefined features were used in the RA and CRC tasks, AL reduced the number of annotated samples required to achieve an area under the curve (AUC) score of 0.95 by 68% and 23%, respectively. AL also achieved a reduction of 68% for VTE with an optimal AUC of 0.70 using refined features. As expected, refined features improved the performance of phenotyping classifiers and required fewer annotated samples.
CONCLUSIONS: This study demonstrated that AL can be useful in ML-based phenotyping methods. Moreover, AL and feature engineering based on domain knowledge could be combined to develop efficient and generalizable phenotyping methods.
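
ILLUSTRATION: The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: uncertainty sampling with a linear support vector machine, compared against random (passive) sampling. Everything here is an assumption made for illustration. The data are synthetic two-dimensional points rather than clinical features, the SVM is trained with plain subgradient descent on the regularized hinge loss rather than with any particular library, and the function names (`train_linear_svm`, `select_next`, `active_learn`) are hypothetical.

```python
import random

def decision(w, b, x):
    """Signed score of x under the linear model; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, epochs=100, eta=0.05, lam=0.01, seed=0):
    """Fit a linear SVM by subgradient descent on the regularized hinge loss.
    Labels must be +1 or -1. Returns (weights, bias)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * decision(w, b, X[i])
            w = [wj * (1 - eta * lam) for wj in w]      # L2 shrinkage
            if margin < 1:                               # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def select_next(pool, X, w, b, strategy, rng):
    """Pick the next sample to annotate from the unlabeled pool."""
    if strategy == "random":                             # passive learning
        return rng.choice(pool)
    # Uncertainty sampling: the sample closest to the decision boundary.
    return min(pool, key=lambda i: abs(decision(w, b, X[i])))

def active_learn(X, y, seed_idx, rounds, strategy="uncertainty", seed=0):
    """Grow the labeled set one sample per round; return it with the final model."""
    rng = random.Random(seed)
    labeled = list(seed_idx)
    pool = [i for i in range(len(X)) if i not in labeled]
    for _ in range(rounds):
        w, b = train_linear_svm([X[i] for i in labeled], [y[i] for i in labeled])
        pick = select_next(pool, X, w, b, strategy, rng)
        pool.remove(pick)
        labeled.append(pick)                             # "annotator" reveals y[pick]
    w, b = train_linear_svm([X[i] for i in labeled], [y[i] for i in labeled])
    return labeled, (w, b)

def make_data(n, rng):
    """Synthetic data (an assumption): two overlapping Gaussian clusters."""
    X, y = [], []
    for _ in range(n):
        label = rng.choice([-1, 1])
        X.append([rng.gauss(label * 1.5, 1.0), rng.gauss(label * 1.5, 1.0)])
        y.append(label)
    return X, y

def accuracy(w, b, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    hits = sum(1 for xi, yi in zip(X, y)
               if (1 if decision(w, b, xi) >= 0 else -1) == yi)
    return hits / len(X)

if __name__ == "__main__":
    rng = random.Random(42)
    X, y = make_data(200, rng)
    seed_idx = [y.index(1), y.index(-1)]                 # one seed sample per class
    for strategy in ("uncertainty", "random"):
        _, (w, b) = active_learn(X, y, seed_idx, rounds=20, strategy=strategy)
        print(strategy, round(accuracy(w, b, X, y), 3))
```

The only difference between the two strategies is `select_next`: passive learning draws from the pool at random, whereas uncertainty sampling requests a label for the sample whose score is closest to zero. On a toy problem like this the gap between them may be small or absent; the sketch is meant to show the mechanics, not to reproduce the reported reductions.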

Keywords: Active Learning; Electronic Health Records; Machine Learning; Natural Language Processing; Phenotyping Algorithm

Year: 2013 PMID: 23851443 PMCID: PMC3861916 DOI: 10.1136/amiajnl-2013-001945

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
