Hossein Estiri (1,2,3), Kari A Stephens (4,5), Jeffrey G Klann (1,2,3), Shawn N Murphy (1,2,3)
Affiliations: 1. Harvard Medical School. 2. Massachusetts General Hospital. 3. Partners HealthCare, Boston, MA, USA. 4. Department of Biomedical Informatics and Medical Education. 5. Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA.
Abstract
Objective: To provide an open source, interoperable, and scalable data quality assessment tool for evaluation and visualization of completeness and conformance in electronic health record (EHR) data repositories.

Materials and Methods: This article describes the tool's design and architecture and gives an overview of its outputs using a sample dataset of 200 000 randomly selected patient records with an encounter since January 1, 2010, extracted from the Research Patient Data Registry (RPDR) at Partners HealthCare. All the code and instructions to run the tool and interpret its results are provided in the Supplementary Appendix.

Results: DQe-c produces a web-based report that summarizes data completeness and conformance in a given EHR data repository through descriptive graphics and tables. Results from running the tool on the sample RPDR data are organized into 4 sections: load and test details, completeness test, data model conformance test, and test of missingness in key clinical indicators.

Discussion: Open science, interoperability across major clinical informatics platforms, and scalability to large databases are key design considerations for DQe-c. Iterative implementation of the tool across different institutions directed us to improve the scalability and interoperability of the tool and find ways to facilitate local setup.

Conclusion: EHR data quality assessment has been hampered by implementation of ad hoc processes. The architecture and implementation of DQe-c offer valuable insights for developing reproducible and scalable data science tools to assess, manage, and process data in clinical data repositories.
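To make the two kinds of test named in the abstract concrete, the following is a minimal sketch of a column-level completeness check and a data model conformance check. It is not DQe-c's code. The function names (`is_missing`, `completeness`, `conformance`), the set of strings treated as missing, and the sample columns are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Strings treated as "missing" in addition to None. This set is an
# assumption for illustration, not taken from DQe-c.
MISSING_STRINGS = {"", "NULL", "NA", "N/A"}


def is_missing(value: Any) -> bool:
    """Return True if a cell should count as missing."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().upper() in MISSING_STRINGS


def completeness(rows: List[Dict[str, Any]], columns: List[str]) -> Dict[str, float]:
    """Fraction of missing cells per column (0.0 = fully populated)."""
    total = len(rows)
    result = {}
    for col in columns:
        missing = sum(1 for row in rows if is_missing(row.get(col)))
        result[col] = missing / total if total else 0.0
    return result


def conformance(rows: List[Dict[str, Any]], expected: List[str]) -> Dict[str, List[str]]:
    """Compare the columns actually present with those a data model expects."""
    present = set()
    for row in rows:
        present.update(row.keys())
    return {
        "missing_columns": sorted(set(expected) - present),
        "unexpected_columns": sorted(present - set(expected)),
    }


# Hypothetical sample records; column names are illustrative only.
rows = [
    {"patient_id": "1", "birth_date": "1980-01-01", "sex": "F"},
    {"patient_id": "2", "birth_date": "", "sex": "NULL"},
    {"patient_id": "3", "birth_date": "1975-06-30", "sex": "M"},
    {"patient_id": "4", "birth_date": None, "sex": "F"},
]
expected_columns = ["patient_id", "birth_date", "sex", "race"]

missing_fraction = completeness(rows, expected_columns)
model_check = conformance(rows, expected_columns)
```

In this sketch a column that the data model expects but the data lacks (here `race`) is reported both as fully missing by the completeness check and as a missing column by the conformance check, which keeps the two reports consistent with each other.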