Alon Geva (1), Jessica L Gronsbell (2), Tianxi Cai (2), Tianrun Cai (3), Shawn N Murphy (4), Jessica C Lyons (5), Michelle M Heinz (6), Marc D Natter (7), Nandan Patibandla (8), Jonathan Bickel (9), Mary P Mullen (10), Kenneth D Mandl (11).
1. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA; Division of Critical Care Medicine, Department of Anesthesiology, Perioperative, and Pain Medicine, Boston Children's Hospital, Boston, MA; Department of Anesthesia, Harvard Medical School, Boston, MA.
2. Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA.
3. Division of Rheumatology, Immunology and Allergy, Brigham and Women's Hospital, Boston, MA.
4. Department of Research Information Services and Computing, Partners Healthcare, Boston, MA; Department of Neurology, Massachusetts General Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
5. Department of Biomedical Informatics, Harvard Medical School, Boston, MA.
6. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA.
7. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA.
8. Information Services Department, Boston Children's Hospital, Boston, MA.
9. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA; Information Services Department, Boston Children's Hospital, Boston, MA.
10. Department of Pediatrics, Harvard Medical School, Boston, MA; Department of Cardiology, Boston Children's Hospital, Boston, MA.
11. Computational Health Informatics Program, Boston Children's Hospital, Boston, MA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA; Department of Pediatrics, Harvard Medical School, Boston, MA. Electronic address: kenneth_mandl@harvard.edu.
Abstract
OBJECTIVES: To compare registry and electronic health record (EHR) data mining approaches for cohort ascertainment in patients with pediatric pulmonary hypertension (PH) in an effort to overcome some of the limitations of registry enrollment alone in identifying patients with particular disease phenotypes.

STUDY DESIGN: This study was a single-center retrospective analysis of EHR and registry data at Boston Children's Hospital. The local Informatics for Integrating Biology and the Bedside (i2b2) data warehouse was queried for billing codes, prescriptions, and narrative data related to pediatric PH. Computable phenotype algorithms were developed by fitting penalized logistic regression models to a physician-annotated training set. Algorithms were applied to a candidate patient cohort, and performance was evaluated using a separate set of 136 records and 179 registry patients. We compared clinical and demographic characteristics of patients identified by computable phenotype and the registry.

RESULTS: The computable phenotype had an area under the receiver operating characteristics curve of 90% (95% CI, 85%-95%), a positive predictive value of 85% (95% CI, 77%-93%), and identified 413 patients (an additional 231%) with pediatric PH who were not enrolled in the registry. Patients identified by the computable phenotype were clinically distinct from registry patients, with a greater prevalence of diagnoses related to perinatal distress and left heart disease.

CONCLUSIONS: Mining of EHRs using computable phenotypes identified a large cohort of patients not recruited using a classic registry. Fusion of EHR and registry data can improve cohort ascertainment for the study of rare diseases.

TRIAL REGISTRATION: ClinicalTrials.gov: NCT02249923.
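The general workflow named in the study design (fit a penalized logistic regression to annotated records, then evaluate it on held-out records by area under the ROC curve and positive predictive value) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the penalty type, the features, or the decision threshold, so the L2 (ridge) penalty, the three count features, the 0.5 threshold, the function names, and all of the data below are assumptions made for illustration; the data are synthetic and the printed metrics say nothing about the study's results.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, x):
    # w[0] is the intercept; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_penalized_logistic(X, y, lam=0.01, lr=0.3, epochs=300):
    # Batch gradient descent on the mean log-loss plus an L2 penalty
    # (ASSUMPTION: the abstract only says "penalized"). Intercept is unpenalized.
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w[0] -= lr * grad[0] / n
        for j in range(1, p + 1):
            w[j] -= lr * (grad[j] / n + lam * w[j])
    return w

def auc(y, scores):
    # Probability that a random positive outranks a random negative (ties count half).
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def ppv(y, scores, threshold=0.5):
    # Fraction of flagged records that are true cases; NaN if nothing is flagged.
    flagged = [yi for s, yi in zip(scores, y) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else float("nan")

def simulate(n, rng):
    # SYNTHETIC data. Hypothetical features: log(1 + count) of PH-related
    # billing codes, prescriptions, and narrative mentions per patient.
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.4 else 0
        p_event = 0.5 if label else 0.15
        counts = [sum(rng.random() < p_event for _ in range(10)) for _ in range(3)]
        X.append([math.log1p(c) for c in counts])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = simulate(300, rng)   # stands in for the annotated training set
X_test, y_test = simulate(200, rng)     # stands in for a separate evaluation set

w = fit_penalized_logistic(X_train, y_train)
test_scores = [score(w, x) for x in X_test]
test_auc = auc(y_test, test_scores)
test_ppv = ppv(y_test, test_scores)
print(f"synthetic AUC={test_auc:.2f}, PPV={test_ppv:.2f}")

# Arithmetic check on the abstract's figures: 413 newly identified patients
# relative to 179 registry patients.
extra_pct = round(100 * 413 / 179)
print(f"additional patients relative to registry: {extra_pct}%")
```

In practice a library implementation (for example, a regularized logistic regression with the penalty strength chosen by cross-validation) would replace the hand-written gradient descent; the sketch uses only the standard library so that it stays self-contained.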