Peggy L Peissig (1), Vitor Santos Costa (2), Michael D Caldwell (3), Carla Rottscheit (4), Richard L Berg (4), Eneida A Mendonca (5), David Page (6)
1. Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA. Electronic address: peissig.peggy@marshfieldclinic.org.
2. DCC-FCUP and CRACS INESC-TEC, Department de Ciência de Computadores, Universidade do Porto, Portugal.
3. Department of Surgery, Marshfield Clinic, Marshfield, WI, USA.
4. Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, WI, USA.
5. Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, USA; Department of Pediatrics, University of Wisconsin-Madison, USA.
6. Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, USA; Department of Computer Sciences, University of Wisconsin-Madison, USA.
Abstract
OBJECTIVE: Electronic health records (EHRs) offer medical and pharmacogenomics research unprecedented opportunities to identify and classify patients at risk. EHRs are collections of highly interdependent records that include biological, anatomical, physiological, and behavioral observations. They comprise a patient's clinical phenome, where each patient has thousands of date-stamped records distributed across many relational tables. Developing EHR-based computer phenotyping algorithms requires time and medical insight from clinical experts, who most often can review only a small subset of patients, representative of the total EHR records, to identify phenotype features. In this research we evaluate whether relational machine learning (ML) using inductive logic programming (ILP) can help address these issues as a viable approach for EHR-based phenotyping.
METHODS: Two relational learning ILP approaches and three well-known WEKA (Waikato Environment for Knowledge Analysis) implementations of non-relational approaches (PART, J48, and JRIP) were used to develop models for nine phenotypes. International Classification of Diseases, Ninth Revision (ICD-9) coded EHR data were used to select training cohorts for the development of each phenotypic model. Accuracy, precision, recall, F-measure, and area under the receiver operating characteristic curve (AUROC) were measured for each phenotypic model on independent, manually verified test cohorts. A two-sided binomial distribution test (sign test) compared the five ML approaches across phenotypes for statistical significance.
RESULTS: We developed an approach to automatically label training examples using ICD-9 diagnosis codes for the ML approaches being evaluated. Nine phenotypic models were evaluated for each ML approach; ILP achieved better overall AUROC performance than PART (p=0.039), J48 (p=0.003), and JRIP (p=0.003).
DISCUSSION: ILP has the potential to improve phenotyping by independently delivering rules for phenotype definitions that clinical experts can interpret, or intuitive phenotypes to assist experts.
CONCLUSION: Relational learning using ILP offers a viable approach to EHR-driven phenotyping.
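The two-sided sign test named in the methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and the win/loss tallies are hypothetical, and ties are assumed to have been dropped before counting. The abstract does not report per-phenotype tallies; the example only shows that, with nine phenotypes, 8 wins out of 9 gives a p-value of about 0.039 and 9 out of 9 gives about 0.0039, which is in line with the magnitudes reported above.

```python
from math import comb


def sign_test_two_sided(wins: int, losses: int) -> float:
    """Two-sided sign test p-value under Binomial(n, 0.5).

    `wins` and `losses` count the phenotypes on which one approach scored
    higher or lower than the other (ties are assumed to be excluded).
    """
    n = wins + losses
    if n == 0:
        return 1.0
    k = max(wins, losses)
    # Probability of k or more successes out of n fair coin flips.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical tallies across nine phenotypes (not taken from the paper).
print(round(sign_test_two_sided(8, 1), 3))   # 0.039
print(round(sign_test_two_sided(9, 0), 4))   # 0.0039
```

With only nine paired comparisons, the smallest achievable two-sided p-value is 2/512, so the test can detect only near-unanimous differences between approaches.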