Enabling phenotypic big data with PheNorm.

Sheng Yu, Yumeng Ma, Jessica Gronsbell, Tianrun Cai, Ashwin N Ananthakrishnan, Vivian S Gainer, Susanne E Churchill, Peter Szolovits, Shawn N Murphy, Isaac S Kohane, Katherine P Liao, Tianxi Cai.

Abstract

Objective: Electronic health record (EHR)-based phenotyping infers whether a patient has a disease based on the information in his or her EHR. A human-annotated training set with gold-standard disease status labels is usually required to build an algorithm for phenotyping based on a set of predictive features. The time intensiveness of annotation and feature curation severely limits the ability to achieve high-throughput phenotyping. While previous studies have successfully automated feature curation, annotation remains a major bottleneck. In this paper, we present PheNorm, a phenotyping algorithm that does not require expert-labeled samples for training.
Methods: The most predictive features, such as the number of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes or mentions of the target phenotype, are transformed so that their distributions approximate a normal mixture and achieve a high area under the receiver operating characteristic curve (AUC) for predicting disease status. The transformed features are then denoised and combined into a score for accurate disease classification.
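The Methods describe three stages: transform the key count features, denoise and combine them into one score, and classify patients using a normal mixture. The NumPy code below is a minimal sketch of that pipeline under stated assumptions; it is not the authors' implementation. The log(1 + x) transform, regressing out a healthcare-utilization measure (here a hypothetical note count), the dropout-based linear denoising step, and the two-component Gaussian mixture fitted by EM are illustrative choices. The function names `log_normalize`, `denoise_score`, and `gaussian_mixture_posterior` and the synthetic cohort are hypothetical.

```python
import numpy as np


def log_normalize(counts, utilization):
    """Log-transform raw counts, then regress out healthcare utilization.

    ASSUMPTION: `utilization` is a per-patient measure such as the number of
    notes, and the least-squares residual serves as the normalized feature.
    This is an illustrative simplification, not the published transform.
    """
    y = np.log1p(np.asarray(counts, dtype=float))
    h = np.log1p(np.asarray(utilization, dtype=float))
    design = np.column_stack([np.ones_like(h), h])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def denoise_score(features, target, dropout=0.3, n_copies=10, seed=0):
    """Fit a linear map from dropout-corrupted features to a clean target.

    ASSUMPTION: each entry of the stacked feature matrix is zeroed with
    probability `dropout` (survivors rescaled by 1 / (1 - dropout)); the
    fitted coefficients are applied to the uncorrupted features to give a
    single combined score per patient.
    """
    rng = np.random.default_rng(seed)
    stacked = np.tile(features, (n_copies, 1))
    keep = rng.random(stacked.shape) >= dropout
    corrupted = stacked * keep / (1.0 - dropout)
    y = np.tile(target, n_copies)
    design = np.column_stack([np.ones(len(corrupted)), corrupted])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.column_stack([np.ones(len(features)), features]) @ coef


def gaussian_mixture_posterior(score, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns each patient's posterior probability of belonging to the
    component with the higher mean (read here as "has the phenotype").
    """
    mu = np.percentile(score, [25, 75]).astype(float)
    var = np.full(2, score.var())
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: weighted normal densities and responsibilities.
        diff = score[:, None] - mu
        dens = weight * np.exp(-0.5 * diff**2 / var) / np.sqrt(2 * np.pi * var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update mixing weights, means, and variances.
        nk = resp.sum(axis=0)
        weight = nk / len(score)
        mu = (resp * score[:, None]).sum(axis=0) / nk
        var = np.maximum((resp * (score[:, None] - mu) ** 2).sum(axis=0) / nk, 1e-6)
    return resp[:, int(np.argmax(mu))]


# Hypothetical synthetic cohort: ~30% cases; ICD and NLP counts scale with the
# number of notes and are much higher for cases.
rng = np.random.default_rng(1)
n = 2000
disease = rng.random(n) < 0.3
notes = rng.poisson(50, n) + 1
icd = rng.poisson(np.where(disease, 10.0, 1.0) * notes / 50)
nlp = rng.poisson(np.where(disease, 15.0, 1.5) * notes / 50)

normalized = np.column_stack([log_normalize(icd, notes), log_normalize(nlp, notes)])
score = denoise_score(normalized, target=normalized.mean(axis=1))
prob = gaussian_mixture_posterior(score)
print(round(prob[disease].mean(), 2), round(prob[~disease].mean(), 2))
```

No gold-standard label enters any step of the sketch. The synthetic `disease` array is used only afterwards, to check that the mixture posterior separates cases from controls.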
Results: We validated the accuracy of PheNorm with 4 phenotypes: coronary artery disease, rheumatoid arthritis, Crohn's disease, and ulcerative colitis. The AUCs of the PheNorm score reached 0.90, 0.94, 0.95, and 0.94 for the 4 phenotypes, respectively, which were comparable to the accuracy of supervised algorithms trained with sample sizes of 100-300, with no statistically significant difference.
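An AUC such as the 0.90 reported for coronary artery disease can be read as the probability that a randomly chosen patient with the phenotype receives a higher score than a randomly chosen patient without it. Ties count as one half, which makes the AUC equal to the Mann-Whitney U statistic divided by the number of case-control pairs. The Python code below is a minimal sketch of that pairwise definition. The function name `auc` and the toy scores are hypothetical, and the sketch is not necessarily how the reported AUCs or their comparisons were computed.

```python
def auc(scores, labels):
    """Fraction of (case, control) pairs in which the case scores higher.

    Ties contribute 1/2. This quadratic loop is written for clarity, not for
    cohorts of EHR scale.
    """
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Cases score 0.9 and 0.3, controls 0.8 and 0.1: 3 of 4 pairs are ordered correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

For large cohorts, an equivalent rank-based formula computes the same quantity in O(n log n) time.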
Conclusion: The accuracy of the PheNorm algorithms is on par with that of algorithms trained with annotated samples. PheNorm fully automates the generation of accurate phenotyping algorithms and demonstrates the capacity for EHR-driven annotations to scale to the next level: phenotypic big data.
© The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Keywords:  electronic health records; high-throughput phenotyping; phenotypic big data; precision medicine

Year: 2018; PMID: 29126253; PMCID: PMC6251688; DOI: 10.1093/jamia/ocx111

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497


