Ritu Khare (1,2), Levon Utidjian (1,2), Byron J Ruth (1), Michael G Kahn (3), Evanette Burrows (1,2), Keith Marsolo (4), Nandan Patibandla (5), Hanieh Razzaghi (2), Ryan Colvin (6), Daksha Ranade (7), Melody Kitzmiller (8), Daniel Eckrich (9), L Charles Bailey (1,2,10)
1. Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
2. Department of Pediatrics, Children's Hospital of Philadelphia.
3. Department of Pediatrics, University of Colorado Denver Anschutz Medical Campus, Aurora, CO, USA.
4. University of Cincinnati Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
5. Information Services Department, Children's Hospital Boston, Boston, MA, USA.
6. Department of Pediatrics, Washington University in St. Louis, St. Louis, MO, USA.
7. Research Informatics, Seattle Children's Research Institute, Seattle, WA, USA.
8. Research Information Solutions and Innovation, Nationwide Children's Hospital, Columbus, OH, USA.
9. Center for Pediatric Auditory and Speech Sciences, Nemours Biomedical Research, Wilmington, DE, USA.
10. Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
Abstract
OBJECTIVE: PEDSnet is a clinical data research network (CDRN) that aggregates electronic health record data from multiple children's hospitals to enable large-scale research. Assessing data quality to ensure suitability for conducting research is a key requirement in PEDSnet. This study presents a range of data quality issues identified over a period of 18 months and interprets them to evaluate the research capacity of PEDSnet.
MATERIALS AND METHODS: Results were generated by a semiautomated data quality assessment workflow. Two investigators reviewed the programmatically identified data quality issues and discussed them with the data partners' extract-transform-load (ETL) analysts to determine the cause of each issue.
RESULTS: The results include a longitudinal summary of 2182 data quality issues identified across 9 data submission cycles. The metadata from the most recent cycle include annotations for 850 issues, covering the most frequent issue types, such as missing data (>300) and outliers (>100); the most complex domains, such as medications (>160) and lab measurements (>140); and the primary causes, such as source data characteristics (83%) and ETL errors (9%).
DISCUSSION: The longitudinal findings demonstrate the network's evolution from identifying difficulties in aligning the data to a common data model toward learning norms in clinical pediatrics and determining research capability.
CONCLUSION: While data quality is recognized as a critical aspect of establishing and using a CDRN, the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.