Toward high-throughput phenotyping: unbiased automated feature extraction and selection from knowledge sources.

Sheng Yu, Katherine P Liao, Stanley Y Shaw, Vivian S Gainer, Susanne E Churchill, Peter Szolovits, Shawn N Murphy, Isaac S Kohane, Tianxi Cai.

Abstract

OBJECTIVE: Analysis of narrative (text) data from electronic health records (EHRs) can improve population-scale phenotyping for clinical and genetic research. Currently, selection of text features for phenotyping algorithms is slow and laborious, requiring extensive and iterative involvement by domain experts. This paper introduces a method to develop phenotyping algorithms in an unbiased manner by automatically extracting and selecting informative features, which can be comparable to expert-curated ones in classification accuracy.
MATERIALS AND METHODS: Comprehensive medical concepts were collected from publicly available knowledge sources in an automated, unbiased fashion. Natural language processing (NLP) revealed the occurrence patterns of these concepts in EHR narrative notes, which enabled selection of informative features for phenotype classification. The selected features were combined with additional codified features to train a penalized logistic regression model that classifies the target phenotype.
RESULTS: The authors applied the method to develop algorithms to identify patients with rheumatoid arthritis (RA), and cases of coronary artery disease (CAD) among patients with RA, from a large multi-institutional EHR. The areas under the receiver operating characteristic curve (AUCs) for classifying RA and CAD using models trained with automated features were 0.951 and 0.929, respectively, compared to AUCs of 0.938 and 0.929 for models trained with expert-curated features.
DISCUSSION: Models trained with NLP text features selected through an unbiased, automated procedure achieved comparable or slightly higher accuracy than those trained with expert-curated features. The majority of the selected model features were interpretable.
CONCLUSION: The proposed automated feature extraction method, generating highly accurate phenotyping algorithms with improved efficiency, is a significant step toward high-throughput phenotyping.
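
ILLUSTRATION (not part of the published abstract): The modeling step described above, a penalized logistic regression scored by AUC, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which penalty was used, so an L1 (lasso) penalty fitted by proximal gradient descent is assumed here; the data are synthetic; and all function names, parameters, and default values (fit_l1_logistic, lam, lr, n_iter, demo) are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=500):
    """Fit L1-penalized logistic regression by proximal gradient descent.

    Assumption: an L1 penalty with strength `lam`; the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the log-loss, then shrinkage for the L1 penalty.
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def auc(scores, labels):
    """AUC: probability that a random positive outscores a random negative.

    Ties count as one half. Quadratic in sample size, fine for a sketch.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def demo(seed=0, n=200):
    """Synthetic example: one informative feature plus two noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = rng.gauss(1.0 if label else -1.0, 1.0)
        X.append([informative, rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    w, b = fit_l1_logistic(X, y)
    scores = [b + sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
    # In-sample AUC, which is optimistic; a real evaluation would hold out data.
    return w, b, auc(scores, y)


if __name__ == "__main__":
    weights, intercept, train_auc = demo()
    print("weights:", weights)
    print("in-sample AUC:", train_auc)
```

The penalty shrinks coefficients toward zero, so weakly informative features tend to receive small or zero weights. That is one plausible way a large pool of automatically extracted features could be reduced to a compact model, though the abstract does not detail the selection procedure.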

Year: 2015 PMID: 25929596 PMCID: PMC4986664 DOI: 10.1093/jamia/ocv034

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
