
Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning.

Emily R Pfaff, Miles Crosskey, Kenneth Morton, Ashok Krishnamurthy.

Abstract

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that use only structured EHR data do not capture the full richness of a patient's medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning-based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning-based NLP with limited informatics involvement is a significant improvement over the status quo. We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.

©Emily R Pfaff, Miles Crosskey, Kenneth Morton, Ashok Krishnamurthy. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 24.01.2020.
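
As an illustration only, the workflow described in the abstract (user-specified features matched in note text, a choice of cross-validation folds, and evaluation by sensitivity, specificity, and predictive values) might be sketched as follows. This is not CLARK's code or interface: the feature patterns, the function names `featurize`, `k_fold_indices`, and `evaluation_metrics`, and the example note are all hypothetical, and the sketch stops short of training a classifier. It assumes features are regular expressions counted per note and that labels are binary (1 = case, 0 = non-case).

```python
import re
from typing import Dict, List, Sequence

# Hypothetical feature definitions (name -> regular expression).
# These patterns are illustrative and are not taken from CLARK.
FEATURES: Dict[str, str] = {
    "insulin": r"\binsulin\b",
    "metformin": r"\bmetformin\b",
    "a1c": r"\b(?:hb)?a1c\b",
}


def featurize(note: str, features: Dict[str, str] = FEATURES) -> List[int]:
    """Count case-insensitive matches of each feature pattern in one note."""
    return [
        len(re.findall(pattern, note, flags=re.IGNORECASE))
        for pattern in features.values()
    ]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Assign record indices 0..n-1 to k test folds in round-robin order."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of records")
    folds: List[List[int]] = [[] for _ in range(k)]
    for i in range(n):
        folds[i % k].append(i)
    return folds


def evaluation_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, and NPV for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true cases correctly flagged
        "specificity": ratio(tn, tn + fp),  # non-cases correctly cleared
        "ppv": ratio(tp, tp + fp),          # flagged records that are cases
        "npv": ratio(tn, tn + fn),          # cleared records that are non-cases
    }
```

In a fuller version, the feature vectors from `featurize` would be passed to one of the classifiers the abstract lists, trained on all folds but one and scored on the held-out fold with `evaluation_metrics`, repeating once per fold.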


Keywords:  electronic health records; machine learning; natural language processing

Year:  2020        PMID: 32012059      PMCID: PMC7007592          DOI: 10.2196/16042

Source DB:  PubMed          Journal:  JMIR Med Inform


