Michael G Kahn, Tiffany J Callahan, Juliana Barnard, Alan E Bauck, Jeff Brown, Bruce N Davidson, Hossein Estiri, Carsten Goerg, Erin Holve, Steven G Johnson, Siaw-Teng Liaw, Marianne Hamilton-Lopez, Daniella Meeker, Toan C Ong, Patrick Ryan, Ning Shang, Nicole G Weiskopf, Chunhua Weng, Meredith N Zozus, Lisa Schilling.
Abstract
OBJECTIVE: Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized to a comprehensive unified terminology with definitions and examples and organized into a conceptual framework to support a common approach to defining whether EHR data is 'fit' for specific uses.
Keywords: data completeness; data use & quality; electronic health records
Year: 2016 PMID: 27713905 PMCID: PMC5051581 DOI: 10.13063/2327-9214.1244
Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214
Figure 1. Timeline of Significant Events in Developing the Harmonized DQ Terminology
Harmonized DQ Terms, Definitions, and Examples: Organized by Verification and Validation Contexts Within Categories and Subcategories
| Subcategory | Verification definition | Verification example | Validation definition | Validation example |
|---|---|---|---|---|
| Value | a. Data values conform to internal formatting constraints. | a. Sex is only one ASCII character. | a. Data values conform to representational constraints based on external standards. | a. Values for primary language conform to ISO standards. |
| Relational | a. Data values conform to relational constraints. | a. Patient medical record number links to other tables as required. | a. Data values conform to relational constraints based on external standards. | a. Data values conform to all not-NULL requirements in a common multi-institutional data exchange format. |
| Computational | a. Computed values conform to computational or programming specifications. | a. Database- and hand-calculated Body Mass Index (BMI) values are identical. | a. Computed results based on published algorithms yield values that match validation values provided by an external source. | a. Computed BMI percentiles yield identical values compared to test results and values provided by the CDC. |
| Completeness | a. The absence of data values at a single moment in time agrees with local or common expectations. | a. The encounter ID variable has missing values. | a. The absence of data values at a single moment in time agrees with trusted reference standards or external knowledge. | a. The current encounter ID variable is missing twice as many values as the institutionally validated database. |
| Uniqueness | a. Data values that identify a single object are not duplicated. | a. Patients from a single institution do not have multiple medical record numbers. | a. Data values that identify a single object in an external source are not duplicated. | a. An institution’s CMS facility identifier does not refer to multiple institutions. |
| Atemporal | a. Data values and distributions agree with an internal measurement or local knowledge. | a. Height and weight values are positive. | a. Data values and distributions (including subgroup distributions) agree with trusted reference standards or external knowledge. | a. HbA1c values from hospital and national reference lab are statistically similar under the same conditions. |
| Temporal | a. Observed or derived values conform to expected temporal properties. | a. Admission date occurs before discharge date. | a. Observed or derived values have similar temporal properties across one or more external comparators or gold standards. | a. Length of stay by outpatient procedure types conforms to Medicare data for similar populations. |
Notes: The lettering in each column can be used to map each definition to its corresponding example. Not every definition has a corresponding example.
Abbreviations: Extract, Transform, Load (ETL); International Organization for Standardization (ISO); Electronic Health Record (EHR); International Classification of Diseases, Ninth and Tenth Revisions (ICD-9CM and ICD-10CM); Current Procedural Terminology (CPT); Centers for Medicare & Medicaid Services (CMS); Centers for Disease Control and Prevention (CDC).
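As an illustration only, a few of the verification examples in the table above (sex is one ASCII character, height and weight are positive, admission precedes discharge, encounter ID missingness) could be expressed as simple record-level checks. The sketch below is not from the paper: the record layout, field names, and function names are all hypothetical, and treating a same-day discharge as valid is an assumption.

```python
from datetime import date

# Hypothetical EHR-like records; field names are illustrative assumptions.
records = [
    {"encounter_id": "E1", "sex": "F", "height_cm": 162.0, "weight_kg": 58.5,
     "admit": date(2016, 1, 4), "discharge": date(2016, 1, 6)},
    {"encounter_id": None, "sex": "MALE", "height_cm": 175.0, "weight_kg": -70.0,
     "admit": date(2016, 2, 10), "discharge": date(2016, 2, 9)},
    {"encounter_id": "E3", "sex": "M", "height_cm": 180.0, "weight_kg": 82.0,
     "admit": date(2016, 3, 1), "discharge": date(2016, 3, 2)},
]


def value_conformance(rec):
    """Value check: sex is exactly one ASCII character."""
    sex = rec["sex"]
    return isinstance(sex, str) and len(sex) == 1 and sex.isascii()


def atemporal_plausibility(rec):
    """Atemporal check: height and weight values are positive."""
    return rec["height_cm"] > 0 and rec["weight_kg"] > 0


def temporal_plausibility(rec):
    """Temporal check: admission does not occur after discharge.

    Same-day discharge is accepted here, which is an assumption.
    """
    return rec["admit"] <= rec["discharge"]


CHECKS = {
    "value_conformance": value_conformance,
    "atemporal_plausibility": atemporal_plausibility,
    "temporal_plausibility": temporal_plausibility,
}


def run_checks(recs):
    """Return, per check name, the indexes of records that fail it."""
    failures = {name: [] for name in CHECKS}
    for i, rec in enumerate(recs):
        for name, check in CHECKS.items():
            if not check(rec):
                failures[name].append(i)
    return failures


def missing_fraction(recs, field):
    """Completeness: fraction of records where `field` is absent or None."""
    missing = sum(1 for rec in recs if rec.get(field) is None)
    return missing / len(recs)


failures = run_checks(records)
encounter_missing = missing_fraction(records, "encounter_id")
```

Whether a given missing fraction or failure count is acceptable depends on local expectations (verification) or on an external reference (validation); the sketch only reports the counts.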
Crosswalk Between Harmonization Terminology, Categories, and Subcategories Versus Pre-Existing Categories and Frameworks
| Harmonized subcategory | Pre-existing terms (semicolons separate source frameworks) |
|---|---|
| Value | Representation-Integrity, Coding-Consistency, Representation-Consistency; Consistency; Internal Consistency, External Consistency; Plausibility |
| Relational | Domain-Consistency, Domain-Metadata; Data Element Completeness, Information Loss and Degradation |
| Computational | Correctness (Accuracy Elements); Concordance |
| Completeness | Representation-Complete, Domain-Complete, Relative-Completeness; “Column” Data Value Completeness; Completeness (Elements of Correctness); Documentation Completeness, Density Completeness; Completeness |
| Uniqueness | Ascertainment Completeness; No Duplication |
| Atemporal | Domain-Consistency, Relative-Correctness, Relative-Completeness; Representational Inaccuracy, Information Loss and Degradation, Consistency; Correctness (Reliability Elements), Consistency (Reliability Elements), External Consistency; Density Completeness; Correctness, Concordance, Plausibility |
| Temporal | Representation-Correctness; Consistency |
Notes: Existing DQ approaches were organized chronologically; approaches with the same year of publication were ordered alphabetically. Only the first author and publication date are provided in the table.
Johnson [51]: Correctness (RepresentationIntegrity, RelativeCorrectness, RepresentationCorrectness, Reliability); Consistency (RepresentationConsistency, DomainConsistency, CodingConsistency, DomainMetadata); Completeness (RepresentationComplete, DomainComplete, RelativeCompleteness, Sufficiency, DomainCoverage, TaskCoverage, Flexibility, Relevance); Currency (RepresentationCurrent, DatasetCurrent, TaskCurrency). The proposed terminology does not capture Reliability, Sufficiency, DomainCoverage, TaskCoverage, Flexibility, Relevance, RepresentationCurrent, DatasetCurrent, or TaskCurrency.
Zozus [50]: Completeness (Data Element Completeness, “Column” Data Value Completeness, “Row” Data Value Completeness, and Ascertainment Completeness); Accuracy (Representational Inadequacy, Information Loss and Degradation); Consistency has no lower-level terminology. The proposed terminology does not capture “Row” Data Value Completeness.
Liaw [46]: The proposed terminology does not capture Timeliness, Relevance, Usability, or Security.
Weiskopf [48]: Completeness (Documentation, Breadth, Density, and Prediction). The proposed terminology does not capture Breadth or Prediction.
Weiskopf [45]: Both Plausibility and Concordance are proxies of Correctness. The proposed terminology does not capture Currency.
| Harmonized subcategory | Pre-existing terms (semicolons separate source frameworks) |
|---|---|
| Value | Attribute Domain Constraints, Historical Data Rules, State-Dependent Object Rules; Granularity, Precision; Data Integrity Fundamentals; Representational Consistency |
| Relational | Relational Integrity Rules; Attribution; Data Specifications |
| Computational | Attribute Dependency Rules |
| Completeness | Attribute Domain Constraints; Completeness; Data Integrity Fundamentals; Completeness |
| Uniqueness | Relational Integrity Rules; Duplication |
| Atemporal | Attribute Domain Constraints, Relational Integrity Rules, Attribute Dependency Rules; Consistency (Internal), Granularity; Data Integrity Fundamentals, Accuracy, Consistency and Synchronization; Consistency, Correctness, Accuracy; Believability, Accuracy, Representational Consistency |
| Temporal | Historical Data Rules, Attribute Dependency Rules, State-Dependent Object Rules; Accuracy; Data Integrity Fundamentals |
Notes: Existing DQ approaches were organized chronologically; approaches with the same year of publication were ordered alphabetically. Only the first author and publication date are provided in the table.
Kahn [44]: Attribute Domain Constraints (Attribute Profiling, Optionality, Format, Valid Values, Precision); Relational Integrity Rules (Identity, Reference, Cardinality, Inheritance); Historical Data Rules (Currency, Retention, Granularity, Continuity, Timeline Patterns, Value Patterns, Event Dependencies, Event Conditions, Event Attributes); State-Dependent Object Rules (State-Transition Profiling, State Domain, Action Domain, Terminator Domain, State-Actions); Attribute Dependency Rules (Continuity, Duration, Redundant Attributes, Derived Attributes, Partially Dependent Attributes, Conditional Optionality, Correlated Attributes). The proposed terminology does not capture Historical Data Rules: Retention.
Nahm [57]: Inherent (Accuracy, Currency, Completeness, Consistency (internal), Specificity, Attribution); Context Dependent (Timeliness, Relevance, Granularity, Precision). The proposed terminology does not capture Currency, Timeliness, Relevance, or Specificity.
McGilvray [64]: The proposed terminology does not capture Timeliness and Availability, Ease of Use and Maintainability, Data Coverage, Presentation Quality, Perception, Relevance and Trust, Data Decay, or Transactability.
Eppler [65]: Community Level (Comprehensiveness, Accuracy, Clarity, Applicability); Product Level (Conciseness, Consistency, Correctness, Currency); Process Level (Convenience, Timeliness, Traceability, Interactivity); Infrastructure Level (Accessibility, Security, Maintainability, Speed). The proposed terminology does not capture Comprehensiveness, Clarity, Applicability, Conciseness, Currency, Convenience, Timeliness, Interactivity, Accessibility, Security, Maintainability, or Speed.
Wang [42]: Intrinsic DQ (Believability, Accuracy, Objectivity, Reputation); Contextual DQ (Value-Added, Relevance, Timeliness, Completeness, Appropriate Amount of Data); Accessibility DQ (Accessibility, Access Security); Representational DQ (Interpretability, Ease of Understanding, Representational Consistency, Concise). The proposed terminology does not capture Value-Added, Cost-effectiveness, Relevancy, Interpretability, Ease of Understanding, Ease of Operations, Accessibility, Flexibility, Objectivity, Timeliness, Reputation, Concise, Access Security, Appropriate Amount of Data, Variety of Data, or Traceability.
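A crosswalk of this kind is a many-to-many mapping, so it can be handy to hold it as a lookup structure and query it in both directions. The sketch below is a hypothetical illustration, not part of the paper. It encodes only the first group of terms in each row of the second crosswalk table, on the assumption that these belong to the Kahn framework because they match the rule families listed in the Kahn note. The names `KAHN_CROSSWALK` and `harmonized_for` are invented for this example.

```python
# Harmonized subcategory -> terms from one pre-existing framework.
# Assumption: the first group of terms in each crosswalk row is the Kahn
# framework, since the names match the rule families in the Kahn note.
KAHN_CROSSWALK = {
    "Value": ["Attribute Domain Constraints", "Historical Data Rules",
              "State-Dependent Object Rules"],
    "Relational": ["Relational Integrity Rules"],
    "Computational": ["Attribute Dependency Rules"],
    "Completeness": ["Attribute Domain Constraints"],
    "Uniqueness": ["Relational Integrity Rules"],
    "Atemporal": ["Attribute Domain Constraints", "Relational Integrity Rules",
                  "Attribute Dependency Rules"],
    "Temporal": ["Historical Data Rules", "Attribute Dependency Rules",
                 "State-Dependent Object Rules"],
}


def harmonized_for(term, crosswalk):
    """Reverse lookup: harmonized subcategories that a pre-existing term maps to."""
    return [sub for sub, terms in crosswalk.items() if term in terms]


subcategories = harmonized_for("Attribute Dependency Rules", KAHN_CROSSWALK)
print(subcategories)  # ['Computational', 'Atemporal', 'Temporal']
```

The reverse lookup makes the one-to-many nature of the mapping explicit: a single pre-existing term can correspond to several harmonized subcategories.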