Learning statistical models of phenotypes using noisy labeled training data.

Vibhu Agarwal¹, Tanya Podchiyska², Juan M Banda³, Veena Goel⁴,⁵, Tiffany I Leung⁶, Evan P Minty²,⁷, Timothy E Sweeney²,⁸, Elsie Gyang⁹, Nigam H Shah³.

Abstract

OBJECTIVE: Traditionally, patient groups with a phenotype are selected through rule-based definitions whose creation and validation are time-consuming. Machine learning approaches to electronic phenotyping are limited by the paucity of labeled training datasets. We demonstrate the feasibility of utilizing semi-automatically labeled training sets to create phenotype models via machine learning, using a comprehensive representation of the patient medical record.
METHODS: We use a list of keywords specific to the phenotype of interest to generate noisy labeled training data. We train L1-penalized logistic regression models for a chronic and an acute disease and evaluate the performance of the models against a gold standard.
RESULTS: Our models achieve a precision of 0.90 and an accuracy of 0.89 for Type 2 diabetes mellitus, and a precision of 0.86 and an accuracy of 0.89 for myocardial infarction. Local implementations of the previously validated rule-based definitions achieve a precision of 0.96 and an accuracy of 0.92 for Type 2 diabetes mellitus, and a precision of 0.84 and an accuracy of 0.87 for myocardial infarction.
DISCUSSION: We have demonstrated the feasibility of learning phenotype models using imperfectly labeled data for a chronic and an acute phenotype. Further research in feature engineering and in specification of the keyword list can improve the performance of the models and the scalability of the approach.
CONCLUSIONS: Our method provides an alternative to manual labeling for creating training sets for statistical models of phenotypes. Such an approach can accelerate research with large observational healthcare datasets and may also be used to create local phenotype models.
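
The pipeline described in METHODS can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the keyword list, the helper names (`noisy_label`, `fit_l1_logistic`, `predict`, `precision_accuracy`), the toy feature matrix, and the hyperparameters do not come from the abstract. The abstract also does not state which solver was used, so a plain proximal gradient method is swapped in to keep the example dependency-free.

```python
import math

# Hypothetical keyword list; the abstract does not give the actual keywords.
KEYWORDS = ("type 2 diabetes", "t2dm")


def noisy_label(note_text, keywords=KEYWORDS):
    """Return 1 if any keyword occurs in the note text, else 0.

    The label is noisy by design: a negated mention such as
    "denies type 2 diabetes" is still labeled 1.
    """
    text = note_text.lower()
    return int(any(keyword in text for keyword in keywords))


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, step=0.5, epochs=500):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    X is a list of feature rows and y a list of 0/1 labels. The intercept
    is not penalized. Returns (weights, intercept).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        b -= step * grad_b / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, row):
    """Predict 1 when the linear score is positive (probability above 0.5)."""
    return int(b + sum(wj * xj for wj, xj in zip(w, row)) > 0)


def precision_accuracy(y_true, y_pred):
    """Return (precision, accuracy) of predictions against a gold standard."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return precision, correct / len(y_true)


if __name__ == "__main__":
    # Toy data: feature 0 tracks the label, feature 1 is uninformative.
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    y = [0, 0, 1, 1]
    w, b = fit_l1_logistic(X, y)
    print("weights:", w, "intercept:", b)
    print(precision_accuracy(y, [predict(w, b, row) for row in X]))
```

In this sketch the L1 penalty is what drives the weight of the uninformative feature to zero, which is the usual reason for choosing it when the patient record is represented by a very large number of candidate features. In practice the noisy labels would be used only for training, and precision and accuracy would be computed against a separately reviewed gold standard, as the abstract describes.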

Keywords: Electronic health record; high throughput; machine learning; noisy labels; phenotyping

Year: 2016; PMID: 27174893; PMCID: PMC5070523; DOI: 10.1093/jamia/ocw028

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497

References (39 in total)

Review 1. Intelligent use and clinical benefits of electronic health records in rheumatoid arthritis.

Authors: Robert J Carroll; Anne E Eyler; Joshua C Denny
Journal: Expert Rev Clin Immunol Date: 2015-02-08 Impact factor: 4.473

2. A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries.

Authors: Min Jiang; Yukun Chen; Mei Liu; S Trent Rosenbloom; Subramani Mani; Joshua C Denny; Hua Xu
Journal: J Am Med Inform Assoc Date: 2011-04-20 Impact factor: 4.497

3. Validation of electronic medical record-based phenotyping algorithms: results and lessons learned from the eMERGE network.

Authors: Katherine M Newton; Peggy L Peissig; Abel Ngo Kho; Suzette J Bielinski; Richard L Berg; Vidhu Choudhary; Melissa Basford; Christopher G Chute; Iftikhar J Kullo; Rongling Li; Jennifer A Pacheco; Luke V Rasmussen; Leslie Spangler; Joshua C Denny
Journal: J Am Med Inform Assoc Date: 2013-03-26 Impact factor: 4.497

4. Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus.

Authors: Wei-Qi Wei; Cynthia L Leibson; Jeanine E Ransom; Abel N Kho; Pedro J Caraballo; High Seng Chai; Barbara P Yawn; Jennifer A Pacheco; Christopher G Chute
Journal: J Am Med Inform Assoc Date: 2012-01-16 Impact factor: 4.497

5. Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

Authors: Sheng Yu; Katherine P Liao; Stanley Y Shaw; Vivian S Gainer; Susanne E Churchill; Peter Szolovits; Shawn N Murphy; Isaac S Kohane; Tianxi Cai
Journal: J Am Med Inform Assoc Date: 2015-04-29 Impact factor: 4.497

6. Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors: Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal: J Stat Softw Date: 2010 Impact factor: 6.440

7. The eMERGE Network: a consortium of biorepositories linked to electronic medical records data for conducting genomic studies.

Authors: Catherine A McCarty; Rex L Chisholm; Christopher G Chute; Iftikhar J Kullo; Gail P Jarvik; Eric B Larson; Rongling Li; Daniel R Masys; Marylyn D Ritchie; Dan M Roden; Jeffery P Struewing; Wendy A Wolf
Journal: BMC Med Genomics Date: 2011-01-26 Impact factor: 3.063

8. Normalization and standardization of electronic health records for high-throughput phenotyping: the SHARPn consortium.

Authors: Jyotishman Pathak; Kent R Bailey; Calvin E Beebe; Steven Bethard; David C Carrell; Pei J Chen; Dmitriy Dligach; Cory M Endle; Lacey A Hart; Peter J Haug; Stanley M Huff; Vinod C Kaggal; Dingcheng Li; Hongfang Liu; Kyle Marchant; James Masanz; Timothy Miller; Thomas A Oniki; Martha Palmer; Kevin J Peterson; Susan Rea; Guergana K Savova; Craig R Stancl; Sunghwan Sohn; Harold R Solbrig; Dale B Suesse; Cui Tao; David P Taylor; Les Westberg; Stephen Wu; Ning Zhuo; Christopher G Chute
Journal: J Am Med Inform Assoc Date: 2013-11-04 Impact factor: 4.497

9. Extracting research-quality phenotypes from electronic health records to support precision medicine.

Authors: Wei-Qi Wei; Joshua C Denny
Journal: Genome Med Date: 2015-04-30 Impact factor: 11.117

Review 10. The Electronic Medical Records and Genomics (eMERGE) Network: past, present, and future.

Authors: Omri Gottesman; Helena Kuivaniemi; Gerard Tromp; W Andrew Faucett; Rongling Li; Teri A Manolio; Saskia C Sanderson; Joseph Kannry; Randi Zinberg; Melissa A Basford; Murray Brilliant; David J Carey; Rex L Chisholm; Christopher G Chute; John J Connolly; David Crosslin; Joshua C Denny; Carlos J Gallego; Jonathan L Haines; Hakon Hakonarson; John Harley; Gail P Jarvik; Isaac Kohane; Iftikhar J Kullo; Eric B Larson; Catherine McCarty; Marylyn D Ritchie; Dan M Roden; Maureen E Smith; Erwin P Böttinger; Marc S Williams
Journal: Genet Med Date: 2013-06-06 Impact factor: 8.822

Cited by (47 in total)

1. High-throughput multimodal automated phenotyping (MAP) with application to PheWAS.

Authors: Katherine P Liao; Jiehuan Sun; Tianrun A Cai; Nicholas Link; Chuan Hong; Jie Huang; Jennifer E Huffman; Jessica Gronsbell; Yichi Zhang; Yuk-Lam Ho; Victor Castro; Vivian Gainer; Shawn N Murphy; Christopher J O'Donnell; J Michael Gaziano; Kelly Cho; Peter Szolovits; Isaac S Kohane; Sheng Yu; Tianxi Cai
Journal: J Am Med Inform Assoc Date: 2019-11-01 Impact factor: 4.497

2. PheValuator: Development and evaluation of a phenotype algorithm evaluator.

Authors: Joel N Swerdel; George Hripcsak; Patrick B Ryan
Journal: J Biomed Inform Date: 2019-07-29 Impact factor: 6.317

3. A Roadmap for Foundational Research on Artificial Intelligence in Medical Imaging: From the 2018 NIH/RSNA/ACR/The Academy Workshop.

Authors: Curtis P Langlotz; Bibb Allen; Bradley J Erickson; Jayashree Kalpathy-Cramer; Keith Bigelow; Tessa S Cook; Adam E Flanders; Matthew P Lungren; David S Mendelson; Jeffrey D Rudie; Ge Wang; Krishna Kandarpa
Journal: Radiology Date: 2019-04-16 Impact factor: 11.105

4. Feature extraction for phenotyping from semantic and knowledge resources.

Authors: Wenxin Ning; Stephanie Chan; Andrew Beam; Ming Yu; Alon Geva; Katherine Liao; Mary Mullen; Kenneth D Mandl; Isaac Kohane; Tianxi Cai; Sheng Yu
Journal: J Biomed Inform Date: 2019-02-07 Impact factor: 6.317

5. Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database.

Authors: Neil R Smalheiser; Aaron M Cohen
Journal: Data Inf Manag Date: 2018-05-22

6. Using and improving distributed data networks to generate actionable evidence: the case of real-world outcomes in the Food and Drug Administration's Sentinel system.

Authors: Jeffrey S Brown; Judith C Maro; Michael Nguyen; Robert Ball
Journal: J Am Med Inform Assoc Date: 2020-05-01 Impact factor: 4.497

7. Enabling phenotypic big data with PheNorm.

Authors: Sheng Yu; Yumeng Ma; Jessica Gronsbell; Tianrun Cai; Ashwin N Ananthakrishnan; Vivian S Gainer; Susanne E Churchill; Peter Szolovits; Shawn N Murphy; Isaac S Kohane; Katherine P Liao; Tianxi Cai
Journal: J Am Med Inform Assoc Date: 2018-01-01 Impact factor: 4.497