Nicole G Weiskopf, Suzanne Bakken, George Hripcsak, Chunhua Weng.
Abstract
INTRODUCTION: We describe the formulation, development, and initial expert review of 3x3 Data Quality Assessment (DQA), a dynamic, evidence-based guideline to enable electronic health record (EHR) data quality assessment and reporting for clinical research.
Year: 2017 PMID: 29881734 PMCID: PMC5983018 DOI: 10.5334/egems.218
Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214
Figure 1. Flowchart Summarizing Development of the EHR Data Quality Assessment Guideline
EHR Data Quality Constructs, Definitions, Counter Examples, Sources, Categories [25], and Inclusion Status in Final Guideline
| CONSTRUCT | DEFINITION (EXAMPLE VIOLATION) | SOURCE | PROXY FOR | CATEGORY | INCLUDED |
|---|---|---|---|---|---|
| Concordance | There is agreement between data elements. | literature, interviews | correctness | intrinsic | |
| Correctness | A value is true. | literature, interviews | | intrinsic | X |
| Plausibility | A value "makes sense" based on external knowledge. | literature | correctness | intrinsic | |
| Completeness | A truth about a patient is present. | literature, interviews | | contextual | X |
| Currency | A value is representative of the clinically relevant time. | literature | | contextual | X |
| Granularity | A data value is neither too specific nor too broad. | interviews | | contextual | |
| Fragmentation | A concept is recorded in one place in the record. | interviews | | representational | |
| Signal-to-noise | Information of interest can be distinguished from irrelevant data in the record. | interviews | | contextual | |
| Structuredness | Data are recorded in a format that enables reliable extraction. | interviews | | representational | |
Mapping Method Categories to Data Quality Constructs and Data Dimensionality
| METHOD CATEGORY | DEFINITION | PRIMARY CONSTRUCT | SECONDARY CONSTRUCT | PATIENTS | VARIABLES | TIME |
|---|---|---|---|---|---|---|
| Data element agreement | The values of two or more data elements are concordant | Correct | Complete | X | X | |
| Element presence | Desired or expected data elements are present | Complete | | X | X | X |
| Log review | Metadata (timestamps, edits, etc.) are used to determine quality | Current | Correct | X | X | X |
| Distribution comparison | Aggregated data are compared to external sources of information on the clinical concepts of interest | Correct | Complete | X | | |
| Validity check | The face validity of values and changes in values is assessed | Correct | Complete | X | X | |
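
Three of the method categories above (element presence, validity check, and data element agreement) can be illustrated with a minimal Python sketch. The record layout, the field names (`patient_id`, `sbp`, `hypertension_dx`), the plausible range, and the 160 cutoff are all hypothetical assumptions chosen for illustration; none of them comes from the guideline itself.

```python
# Hypothetical toy records; field names and values are illustrative assumptions.
records = [
    {"patient_id": 1, "sbp": 128, "hypertension_dx": False},
    {"patient_id": 2, "sbp": None, "hypertension_dx": True},
    {"patient_id": 3, "sbp": 610, "hypertension_dx": True},
    {"patient_id": 4, "sbp": 172, "hypertension_dx": False},
]


def element_presence(rows, field):
    """Element presence: fraction of records in which the field is populated."""
    if not rows:
        return 0.0
    present = sum(1 for r in rows if r.get(field) is not None)
    return present / len(rows)


def validity_check(rows, field, low, high):
    """Validity check: IDs of records whose value lies outside an assumed plausible range."""
    return [
        r["patient_id"]
        for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def element_agreement(rows, threshold=160):
    """Data element agreement: IDs where a high reading has no matching diagnosis flag.

    The threshold is an arbitrary illustrative cutoff, not a clinical rule.
    """
    return [
        r["patient_id"]
        for r in rows
        if r.get("sbp") is not None
        and r["sbp"] >= threshold
        and not r["hypertension_dx"]
    ]


print(element_presence(records, "sbp"))         # share of records with a value
print(validity_check(records, "sbp", 50, 300))  # records with implausible values
print(element_agreement(records))               # records with discordant elements
```

Each function returns either a proportion or a list of patient identifiers, so the results could be reported per patient or per variable, in line with the dimensions marked in the table.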
Number of Experts (out of six) Who Agreed that a Given Component of the Guideline was Clear, Comprehensive, Feasible, or Valid
| | CLEAR | COMPREHENSIVE | FEASIBLE | VALID |
|---|---|---|---|---|
| Complete | 4 (67%) | 4 (67%) | ||
| Correct | 3 (50%) | 3 (50%) | ||
| Current | 2 (33%) | 4 (67%) | ||
| Complete x Patients | 4 (67%) | 4 (67%) | ||
| Complete x Variable | 4 (67%) | 4 (67%) | ||
| Complete x Time | 3 (50%) | 4 (67%) | ||
| Correct x Patient | 6 (100%) | 6 (100%) | ||
| Correct x Variable | 4 (67%) | 5 (83%) | ||
| Correct x Time | 5 (83%) | 5 (83%) | ||
| Current x Patients | 3 (50%) | 4 (67%) | ||
| Current x Variable | 2 (33%) | 4 (67%) | ||
| Current x Time | 5 (83%) | 5 (83%) | ||
| all Complete | 4 (67%) | |||
| all Correct | 3 (50%) | |||
| all Current | 3 (50%) | |||
| Complete x Patients | 4 (67%) | 4 (67%) | ||
| Complete x Variable | 4 (67%) | 5 (83%) | ||
| Complete x Time | 4 (67%) | 6 (100%) | ||
| Correct x Patient | 5 (83%) | 5 (83%) | ||
| Correct x Variable | 5 (83%) | 4 (67%) | ||
| Correct x Time | 6 (100%) | 6 (100%) | ||
| Current x Patients | 5 (83%) | 5 (83%) | ||
| Current x Variable | 4 (67%) | 5 (83%) | ||
| Current x Time | 4 (67%) | 2 (33%) | ||
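
The "n (p%)" entries in the table above are counts of agreeing experts out of six, with the percentage rounded to a whole number. A small sketch of that arithmetic follows; the function name `agreement_summary` is a hypothetical helper, not something from the paper.

```python
def agreement_summary(n_agree, n_experts=6):
    """Format a count of agreeing experts as 'n (p%)', rounded to a whole percent."""
    if not 0 <= n_agree <= n_experts:
        raise ValueError("n_agree must be between 0 and n_experts")
    pct = round(100 * n_agree / n_experts)
    return f"{n_agree} ({pct}%)"


for n in (2, 3, 4, 5, 6):
    print(agreement_summary(n))
```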
Figure 2. 3x3 DQA Framework
Note: Data quality constructs are at the top, data dimensions along the side, and cells contain corresponding operationalized constructs.
| CLEAR | COMP. | VALID | FEASIBLE | SAMPLE STATEMENTS FROM EXPERT | |
|---|---|---|---|---|---|
| Y | Y | Typically, a researcher will evaluate data quality/availability and develop a research design appropriate for the data. | |||
| 0/3 | 0/3 | ||||
| 4/9 | 1/3 | 3/9 | |||
| 8/9 | 8/9 | ||||
| N | N | I think missing from the framework is actually the frame—when in the research process are we supposed to use this? It seems to be aimed at the analysis of a data set—after the data-collection process has been specified. | |||
| 0/3 | 0/3 | ||||
| 1/9 | 0/3 | 6/9 | |||
| 5/9 | 6/9 | ||||
| Y | N | Again it is contextual—fitness for purpose definition. But overall the logic of self-assessment and self-determination of what "sufficient" is makes sense. | |||
| 3/3 | 2/3 | ||||
| 6/9 | 0/3 | 6/9 | |||
| 7/9 | 9/9 | ||||
| Y | N | I don't see anything to address the quality issue of, "Are the right patients included in the data?" Perhaps this is more of a research question…but it seems to cross into the data quality boundary when someone attempts to use the data for something that's not fit for purpose. | |||
| 0/3 | 2/3 | ||||
| 9/9 | 3/3 | 9/9 | |||
| 4/9 | 3/9 | ||||
| N | Y | Each construct seems like it should be followed by the term "for the task at hand." | |||
| 2/3 | 3/3 | ||||
| 7/9 | 3/3 | 8/9 | |||
| 8/9 | 7/9 | ||||
| Y | Y | I found the questions to be a nice way to frame the framework! I think the questions will help users understand the framework and the subsequently presented Recommendations. | |||
| 3/3 | 3/3 | ||||
| 9/9 | 3/3 | 9/9 | |||
| 9/9 | 9/9 | ||||