Literature DB >> 27174893

Learning statistical models of phenotypes using noisy labeled training data.

Vibhu Agarwal1, Tanya Podchiyska2, Juan M Banda3, Veena Goel4,5, Tiffany I Leung6, Evan P Minty2,7, Timothy E Sweeney2,8, Elsie Gyang9, Nigam H Shah3.   

Abstract

OBJECTIVE: Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record.
METHODS: We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1 penalized logistic regression models for a chronic and an acute disease and evaluate the performance of the models against a gold standard.
RESULTS: Our models for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.90, 0.89, and 0.86, 0.89, respectively. Local implementations of the previously validated rule-based definitions for Type 2 diabetes mellitus and myocardial infarction achieve precision and accuracy of 0.96, 0.92 and 0.84, 0.87, respectively.We have demonstrated feasibility of learning phenotype models using imperfectly labeled data for a chronic and acute phenotype. Further research in feature engineering and in specification of the keyword list can improve the performance of the models and the scalability of the approach.
CONCLUSIONS: Our method provides an alternative to manual labeling for creating training sets for statistical models of phenotypes. Such an approach can accelerate research with large observational healthcare datasets and may also be used to create local phenotype models.
© The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Entities:  

Keywords:  Electronic health record; high throughput; machine learning; noisy labels; phenotyping

Mesh:

Year:  2016        PMID: 27174893      PMCID: PMC5070523          DOI: 10.1093/jamia/ocw028

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  39 in total

Review 1.  Intelligent use and clinical benefits of electronic health records in rheumatoid arthritis.

Authors:  Robert J Carroll; Anne E Eyler; Joshua C Denny
Journal:  Expert Rev Clin Immunol       Date:  2015-02-08       Impact factor: 4.473

2.  A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries.

Authors:  Min Jiang; Yukun Chen; Mei Liu; S Trent Rosenbloom; Subramani Mani; Joshua C Denny; Hua Xu
Journal:  J Am Med Inform Assoc       Date:  2011-04-20       Impact factor: 4.497

3.  Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network.

Authors:  Katherine M Newton; Peggy L Peissig; Abel Ngo Kho; Suzette J Bielinski; Richard L Berg; Vidhu Choudhary; Melissa Basford; Christopher G Chute; Iftikhar J Kullo; Rongling Li; Jennifer A Pacheco; Luke V Rasmussen; Leslie Spangler; Joshua C Denny
Journal:  J Am Med Inform Assoc       Date:  2013-03-26       Impact factor: 4.497

4.  Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus.

Authors:  Wei-Qi Wei; Cynthia L Leibson; Jeanine E Ransom; Abel N Kho; Pedro J Caraballo; High Seng Chai; Barbara P Yawn; Jennifer A Pacheco; Christopher G Chute
Journal:  J Am Med Inform Assoc       Date:  2012-01-16       Impact factor: 4.497

5.  Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

Authors:  Sheng Yu; Katherine P Liao; Stanley Y Shaw; Vivian S Gainer; Susanne E Churchill; Peter Szolovits; Shawn N Murphy; Isaac S Kohane; Tianxi Cai
Journal:  J Am Med Inform Assoc       Date:  2015-04-29       Impact factor: 4.497

6.  Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors:  Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal:  J Stat Softw       Date:  2010       Impact factor: 6.440

7.  The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies.

Authors:  Catherine A McCarty; Rex L Chisholm; Christopher G Chute; Iftikhar J Kullo; Gail P Jarvik; Eric B Larson; Rongling Li; Daniel R Masys; Marylyn D Ritchie; Dan M Roden; Jeffery P Struewing; Wendy A Wolf
Journal:  BMC Med Genomics       Date:  2011-01-26       Impact factor: 3.063

8.  Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium.

Authors:  Jyotishman Pathak; Kent R Bailey; Calvin E Beebe; Steven Bethard; David C Carrell; Pei J Chen; Dmitriy Dligach; Cory M Endle; Lacey A Hart; Peter J Haug; Stanley M Huff; Vinod C Kaggal; Dingcheng Li; Hongfang Liu; Kyle Marchant; James Masanz; Timothy Miller; Thomas A Oniki; Martha Palmer; Kevin J Peterson; Susan Rea; Guergana K Savova; Craig R Stancl; Sunghwan Sohn; Harold R Solbrig; Dale B Suesse; Cui Tao; David P Taylor; Les Westberg; Stephen Wu; Ning Zhuo; Christopher G Chute
Journal:  J Am Med Inform Assoc       Date:  2013-11-04       Impact factor: 4.497

9.  Extracting research-quality phenotypes from electronic health records to support precision medicine.

Authors:  Wei-Qi Wei; Joshua C Denny
Journal:  Genome Med       Date:  2015-04-30       Impact factor: 11.117

Review 10.  The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future.

Authors:  Omri Gottesman; Helena Kuivaniemi; Gerard Tromp; W Andrew Faucett; Rongling Li; Teri A Manolio; Saskia C Sanderson; Joseph Kannry; Randi Zinberg; Melissa A Basford; Murray Brilliant; David J Carey; Rex L Chisholm; Christopher G Chute; John J Connolly; David Crosslin; Joshua C Denny; Carlos J Gallego; Jonathan L Haines; Hakon Hakonarson; John Harley; Gail P Jarvik; Isaac Kohane; Iftikhar J Kullo; Eric B Larson; Catherine McCarty; Marylyn D Ritchie; Dan M Roden; Maureen E Smith; Erwin P Böttinger; Marc S Williams
Journal:  Genet Med       Date:  2013-06-06       Impact factor: 8.822

View more
  47 in total

1.  High-throughput multimodal automated phenotyping (MAP) with application to PheWAS.

Authors:  Katherine P Liao; Jiehuan Sun; Tianrun A Cai; Nicholas Link; Chuan Hong; Jie Huang; Jennifer E Huffman; Jessica Gronsbell; Yichi Zhang; Yuk-Lam Ho; Victor Castro; Vivian Gainer; Shawn N Murphy; Christopher J O'Donnell; J Michael Gaziano; Kelly Cho; Peter Szolovits; Isaac S Kohane; Sheng Yu; Tianxi Cai
Journal:  J Am Med Inform Assoc       Date:  2019-11-01       Impact factor: 4.497

2.  PheValuator: Development and evaluation of a phenotype algorithm evaluator.

Authors:  Joel N Swerdel; George Hripcsak; Patrick B Ryan
Journal:  J Biomed Inform       Date:  2019-07-29       Impact factor: 6.317

3.  A Roadmap for Foundational Research on Artificial Intelligence in Medical Imaging: From the 2018 NIH/RSNA/ACR/The Academy Workshop.

Authors:  Curtis P Langlotz; Bibb Allen; Bradley J Erickson; Jayashree Kalpathy-Cramer; Keith Bigelow; Tessa S Cook; Adam E Flanders; Matthew P Lungren; David S Mendelson; Jeffrey D Rudie; Ge Wang; Krishna Kandarpa
Journal:  Radiology       Date:  2019-04-16       Impact factor: 11.105

4.  Feature extraction for phenotyping from semantic and knowledge resources.

Authors:  Wenxin Ning; Stephanie Chan; Andrew Beam; Ming Yu; Alon Geva; Katherine Liao; Mary Mullen; Kenneth D Mandl; Isaac Kohane; Tianxi Cai; Sheng Yu
Journal:  J Biomed Inform       Date:  2019-02-07       Impact factor: 6.317

5.  Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database.

Authors:  Neil R Smalheiser; Aaron M Cohen
Journal:  Data Inf Manag       Date:  2018-05-22

6.  Using and improving distributed data networks to generate actionable evidence: the case of real-world outcomes in the Food and Drug Administration's Sentinel system.

Authors:  Jeffrey S Brown; Judith C Maro; Michael Nguyen; Robert Ball
Journal:  J Am Med Inform Assoc       Date:  2020-05-01       Impact factor: 4.497

7.  Enabling phenotypic big data with PheNorm.

Authors:  Sheng Yu; Yumeng Ma; Jessica Gronsbell; Tianrun Cai; Ashwin N Ananthakrishnan; Vivian S Gainer; Susanne E Churchill; Peter Szolovits; Shawn N Murphy; Isaac S Kohane; Katherine P Liao; Tianxi Cai
Journal:  J Am Med Inform Assoc       Date:  2018-01-01       Impact factor: 4.497

8.  Machine learning for phenotyping opioid overdose events.

Authors:  Jonathan Badger; Eric LaRose; John Mayer; Fereshteh Bashiri; David Page; Peggy Peissig
Journal:  J Biomed Inform       Date:  2019-04-25       Impact factor: 6.317

9.  Automated disease cohort selection using word embeddings from Electronic Health Records.

Authors:  Benjamin S Glicksberg; Riccardo Miotto; Kipp W Johnson; Khader Shameer; Li Li; Rong Chen; Joel T Dudley
Journal:  Pac Symp Biocomput       Date:  2018

10.  Performing an Informatics Consult: Methods and Challenges.

Authors:  Alejandro Schuler; Alison Callahan; Kenneth Jung; Nigam H Shah
Journal:  J Am Coll Radiol       Date:  2018-02-13       Impact factor: 5.532

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.