
A comprehensive framework for data quality assessment in CER.

Erin Holve¹, Michael Kahn, Meredith Nahm, Patrick Ryan, Nicole Weiskopf.
1. AcademyHealth, Washington, DC;

Abstract

The panel addresses the urgent need to ensure that comparative effectiveness research (CER) findings derived from diverse and distributed data sources are based on credible, high-quality data, and that the methods used to assess and report data quality are consistent, comprehensive, and available to data consumers. The panel consists of representatives from four teams leveraging electronic clinical data for CER, patient-centered outcomes research (PCOR), and quality improvement (QI), and seeks to change the current paradigm in which data quality assessment (DQA) is performed "behind the scenes" using one-off, project-specific methods. The panelists will present their process of harmonizing existing models for describing and measuring clinical data quality and will describe a comprehensive integrated framework for assessing and reporting DQA findings. The collaborative project is supported by the Electronic Data Methods (EDM) Forum, a three-year grant from the Agency for Healthcare Research and Quality (AHRQ) to facilitate learning and foster collaboration across a set of CER, PCOR, and QI projects designed to build infrastructure and methods for collecting and analyzing prospective data from electronic clinical data.


Year: 2013 PMID: 24303241 PMCID: PMC3845781

Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc

Introduction

Detailed clinical data from disparate data sources, including electronic health records (EHRs), is the backbone of large-scale comparative effectiveness research (CER). Yet there exist no formal methods for assessing and reporting on the quality of data obtained from these sources. This proposal will develop a comprehensive data quality assessment framework and guidelines for the CER community. This panel will present the collaboration between diverse research teams leveraging electronic clinical data for research and quality improvement (QI). The goal of this collaboration is to create draft recommendations and guidelines that can guide the development of new analytic and reporting methods specifically directed to data quality assessment and reporting for CER studies. The long-term vision is that all EHR-based clinical studies and all publicly available data sets would be linked to data quality assessment results that allow for an independent assessment of the quality of the data used to generate the reported results. In addition, as NIH data sharing requirements become more stringent, the presence of uniform, standardized data quality assessment measures will enable a potential data consumer to determine whether a given data set is sufficient for their intended use. The collaborative project is supported by the Electronic Data Methods (EDM) Forum, a three-year grant from the Agency for Healthcare Research and Quality (AHRQ) to facilitate learning and foster collaboration across a set of CER, patient-centered outcomes research (PCOR), and QI projects designed to build infrastructure and methods for collecting and analyzing prospective data from electronic clinical data. The EDM Forum has commissioned collaborative projects that examine current challenges and opportunities for conducting CER, PCOR, and QI with electronic clinical data.
Specific areas of focus include aspects of the data governance, clinical informatics, and analytic issues that are crucial to the design and use of electronic clinical data for CER, as well as lessons learned from quality improvement and other efforts to use electronic clinical data for health research and clinical care. The EDM Forum and the research projects connected to the Forum are funded by the American Recovery and Reinvestment Act (ARRA).

Panel Overview

The panel consists of representatives from four teams leveraging electronic clinical data for CER, PCOR, and QI. One representative from each of the projects will describe their pre-existing model for describing and measuring clinical data quality and will describe their role in constructing a harmonized model of data quality that captures the key elements of their individual models and any additional features described in the clinical data quality assessment literature.

Moderator: Erin Holve PhD (Electronic Data Methods Forum)

Dr. Erin Holve, principal investigator of the EDM Forum, will serve as the panel’s moderator. In this role, Dr. Holve will highlight cross-cutting themes and challenges for data quality among the broader set of EDM Forum research projects.

Panelist: Michael G. Kahn MD, PhD (University of Colorado)

Comparative effectiveness research studies require access to detailed clinical data collected across diverse clinical practice settings. Unlike “traditional” prospective clinical trials that utilize detailed data collection tools and procedures and rely on trained data collection personnel, EHR databases contain data collected during routine clinical care by practitioners focused on patient care rather than research. Differences in clinical workflows, practice standards, patient populations, available technologies, and referral resources impact what data are collected and how they are documented. Numerous studies have highlighted significant concerns about the quality of data in EHRs [1–6]. CER studies seek to exploit real-world diversity in order to detect and understand determinants impacting outcome variation. Data quality and completeness problems, however, may affect the validity of CER findings. The importance of good quality data in clinical research is well accepted [6, 7]. However, methods for categorizing, analyzing, and reporting on data quality are poorly developed. Most approaches to data quality assessment (DQA) are ad hoc, developed based on an intuitive understanding of data quality challenges, and focused on specific research questions. Few systematic approaches to DQA for the secondary use of clinically-obtained data have been proposed. Current methods do not emphasize the need to improve the reporting of DQA results. This presentation will describe the development of a comprehensive community-driven DQA framework and guidelines. Using this harmonized framework, we will describe how data quality can be continuously assessed and improved in large CER databases, and how investigators and consumers may utilize data quality assessment results to plan future studies and interpret study results.

Panelist: Meredith Nahm PhD (Duke University)

Similar to other research designs, clinical trials are making increased use of existing data. These data originate from multiple clinical sources, including health records, registries, and clinical data warehouses. At its core, secondary use of data is use by individuals other than those who originally collected the data, for purposes other than those for which the data were originally collected. Further, data from clinical trials, whether collected de novo for the trial or gathered from clinical documentation, may be reused for secondary (tertiary, or further) analysis. Thus, in addition to the quality needs imposed to support practice or regulatory decision-making [7, 8], clinical trials must themselves contend with data quality issues related to secondary data use, and must also consider additional dimensions of data quality to support further secondary use of the trial data itself, such as federal requirements for data sharing [9]. Potential solutions to this trifecta are likely multifold. Frameworks are needed to support secondary data use, specifically frameworks that facilitate analysis of data sources for a given data use. Because clinical trials are themselves a secondary data user and because fundamental informatics principles apply to representation and management of all data, such frameworks are likely consistent with other research designs across the spectrum of the NIH definition of clinical research [10] and can be unified. Further, because of increased secondary data use demands placed on the data from clinical trials, in some cases, data quality considerations for clinical trials will include additional data quality dimensions to support secondary data use.
Examples of additional information that may be needed to support secondary data use include procedural definition of the original observation or measurement, complete definition of collected data elements, documentation of cleaning algorithms to which the data values were subjected, quantitative assessments of data accuracy, and metadata including attribution, contemporaneity, and provenance of the data values. Methods for managing association of this data quality metadata along with the data values are needed [11].
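One way to picture the association of data quality metadata with data values is a record type that carries both together. The following Python sketch is illustrative only: the class names, field names, and the example measurement are hypothetical and are not drawn from any of the panelists' models.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QualityMetadata:
    """Hypothetical metadata travelling with a single data value."""
    attribution: str            # who recorded the value
    observed_at: datetime       # when the observation was made
    recorded_at: datetime       # when it was entered (contemporaneity)
    provenance: str             # originating system or document
    cleaning_steps: tuple = ()  # names of cleaning algorithms applied


@dataclass(frozen=True)
class AnnotatedValue:
    """A data value bundled with its data quality metadata."""
    element: str
    value: float
    metadata: QualityMetadata

    def recording_delay_hours(self) -> float:
        """Hours elapsed between observation and recording."""
        delta = self.metadata.recorded_at - self.metadata.observed_at
        return delta.total_seconds() / 3600


# Invented example: a systolic blood pressure entered 6.5 hours after measurement.
sbp = AnnotatedValue(
    element="systolic_blood_pressure_mmHg",
    value=132.0,
    metadata=QualityMetadata(
        attribution="clinic nurse",
        observed_at=datetime(2013, 1, 5, 9, 0),
        recorded_at=datetime(2013, 1, 5, 15, 30),
        provenance="outpatient EHR",
        cleaning_steps=("unit_normalization",),
    ),
)
print(sbp.recording_delay_hours())
```

Keeping the metadata immutable and attached to the value means that a secondary user receives the attribution, timing, and cleaning history with every value rather than in a separate document.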

Panelist: Patrick Ryan PhD (OMOP)

In order to interpret the results of any analysis on a data source, the characteristics of the data source must be clearly understood. The Observational Source Characteristics Analysis Report (OSCAR) provides a systematic approach for summarizing all observational healthcare data within the OMOP common data model. The procedure creates structured output of descriptive statistics for all relevant tables within the model to facilitate rapid summary and interpretation of the potential merits of a particular data source for addressing active surveillance needs. Generalized Review of OSCAR Unified Checking (GROUCH) is a program that produces, for each data source, a summary report of warnings about implausible and suspicious data observed in the OSCAR summary. It identifies potential issues across all OMOP common data model tables, including potential concerns with all drug exposures and all conditions. GROUCH allows for data quality review of specific drugs (such as the ingredients that comprise the OMOP drugs of interest) or specific conditions (including population-level prevalence of the health outcomes of interest, and unexpected gender-specific rates, such as males with pregnancy, and females with prostate cancer).
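The gender-specific plausibility check mentioned above can be illustrated with a small Python sketch. This is not the GROUCH implementation: the rule table, the record layout, and the condition labels are hypothetical and are not OMOP concept identifiers.

```python
from collections import Counter

# Hypothetical rule table: condition label -> sex for which it is implausible.
IMPLAUSIBLE_FOR_SEX = {
    "pregnancy": "M",
    "prostate cancer": "F",
}


def sex_condition_warnings(records):
    """Count records whose condition is implausible for the recorded sex.

    `records` is an iterable of (person_id, sex, condition) tuples.
    Returns a Counter keyed by (condition, sex).
    """
    warnings = Counter()
    for _person_id, sex, condition in records:
        if IMPLAUSIBLE_FOR_SEX.get(condition) == sex:
            warnings[(condition, sex)] += 1
    return warnings


# Invented records for illustration only.
records = [
    (1, "F", "pregnancy"),
    (2, "M", "pregnancy"),
    (3, "M", "prostate cancer"),
    (4, "F", "prostate cancer"),
    (5, "M", "pregnancy"),
]

warnings = sex_condition_warnings(records)
for (condition, sex), n in sorted(warnings.items()):
    print(f"WARNING: {n} record(s) of {condition!r} with sex {sex!r}")
```

A warning count of this kind flags records for review; it does not by itself establish whether the sex or the condition was recorded incorrectly.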

Panelist: Nicole Weiskopf (Columbia University)

In order to determine whether a dataset is of sufficient quality, it is first necessary to identify and understand the data needs of the intended use case. Systematic approaches to data quality assessment and reporting are desirable, but must be flexible enough to account for the fact that data quality may be a task-dependent concept. Further complicating matters is the increase in the secondary use of data pulled from sources like electronic health records (EHRs), which present a number of unique challenges. An ideal data quality model must accommodate the task-dependent nature of data quality assessment by making explicit the connections between study designs, data needs, data quality requirements, and potential data quality assessment methods, thereby allowing data consumers to determine the suitability of complicated datasets for specific research tasks. This presentation will use data completeness as an example to illustrate how different definitions of the same data quality dimension may lead to significantly different data quality assessment methods and findings. Data consumers must explicitly define data completeness as it relates to the intended research in order to enable appropriate assessment and allow transparent reporting of data methods.
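To make the point about competing definitions concrete, the Python sketch below scores the same small dataset under two completeness definitions. Both definitions, the field names, and the patient rows are assumptions chosen for illustration; they are not the definitions used in the presentation.

```python
# Hypothetical set of fields a study might require.
REQUIRED_FIELDS = ("age", "sex", "blood_pressure", "hba1c")

# Invented patient rows; None marks a missing value.
patients = [
    {"age": 54, "sex": "F", "blood_pressure": "130/85", "hba1c": 6.9},
    {"age": 61, "sex": "M", "blood_pressure": None, "hba1c": 7.4},
    {"age": 47, "sex": "F", "blood_pressure": "118/76", "hba1c": None},
    {"age": 70, "sex": "M", "blood_pressure": "142/90", "hba1c": None},
]


def field_level_completeness(rows, fields):
    """Definition A: share of required cells that hold a value."""
    filled = sum(row.get(f) is not None for row in rows for f in fields)
    return filled / (len(rows) * len(fields))


def patient_level_completeness(rows, fields):
    """Definition B: share of patients with a value for every required field."""
    complete = sum(all(row.get(f) is not None for f in fields) for row in rows)
    return complete / len(rows)


# Both functions assume a non-empty list of rows.
field_score = field_level_completeness(patients, REQUIRED_FIELDS)
patient_score = patient_level_completeness(patients, REQUIRED_FIELDS)
print(f"field-level completeness:   {field_score:.4f}")
print(f"patient-level completeness: {patient_score:.4f}")
```

On these invented rows, 13 of 16 required cells are filled (about 81%), yet only 1 of 4 patients has every required field (25%). A study that needs all four fields per patient would judge the same data far less complete than one that analyzes each field separately.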

Conclusion

A key task of the EDM Forum is to assemble relevant stakeholders with an interest in addressing and resolving (when feasible) the infrastructure and methods challenges likely to arise through the development of infrastructure and methods for CER based on electronic clinical data. This collaborative project represents an important opportunity to bring together four existing data quality assessment (DQA) models and create a harmonized model, data quality assessment methods, DQA best practices and data quality reporting recommendations for the CER community.

6 in total

1. Toward reuse of clinical data for research and quality improvement: the end of the beginning?

Authors: Mark G Weiner; Peter J Embi
Journal: Ann Intern Med Date: 2009-07-28 Impact factor: 25.391

2. Review: electronic health records and the reliability and validity of quality measures: a review of the literature.

Authors: Kitty S Chan; Jinnet B Fowles; Jonathan P Weiner
Journal: Med Care Res Rev Date: 2010-02-11 Impact factor: 3.929

3. Accuracy of data in computer-based patient records.

Authors: W R Hogan; M M Wagner
Journal: J Am Med Inform Assoc Date: 1997 Sep-Oct Impact factor: 4.497

4. Assessing the quality of clinical data in a computer-based record for calculating the pneumonia severity index.

Authors: D Aronsky; P J Haug
Journal: J Am Med Inform Assoc Date: 2000 Jan-Feb Impact factor: 4.497

5. Problems with primary care data quality: osteoporosis as an exemplar.

Authors: Simon de Lusignan; Tom Valentin; Tom Chan; Nigel Hague; Oliver Wood; Jeremy van Vlymen; Neil Dhoul
Journal: Inform Prim Care Date: 2004

6. Secondary Use of EHR: Data Quality Issues and Informatics Opportunities.

Authors: Taxiarchis Botsis; Gunnar Hartvigsen; Fei Chen; Chunhua Weng
Journal: Summit Transl Bioinform Date: 2010-03-01

