Laura Goettinger Qualls1, Thomas A Phillips1, Bradley G Hammill1, James Topping1, Darcy M Louzao1, Jeffrey S Brown2, Lesley H Curtis1, Keith Marsolo3.
Abstract
INTRODUCTION: Distributed research networks (DRNs) are critical components of the strategic roadmaps for the National Institutes of Health and the Food and Drug Administration as they work to move toward large-scale systems of evidence generation. The National Patient-Centered Clinical Research Network (PCORnet®) is one of the first DRNs to incorporate electronic health record data from multiple domains on a national scale. Before conducting analyses in a DRN, it is important to assess the quality and characteristics of the data.
Keywords: data quality; distributed research networks; electronic health records; patient-centered care; quality improvement
Year: 2018 PMID: 29881761 PMCID: PMC5983028 DOI: 10.5334/egems.199
Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214
Figure 1. Data Curation Cycle.
Data Quality Checks.
| Category | Check | Description | Number of measures |
|---|---|---|---|
| Data Model Conformance | 1.01 | Required tables are not present | 15 tables |
| | 1.02 | Expected tables are not populated | 7 tables |
| | 1.03 | Required fields are not present | 188 fields |
| | 1.04 | Fields do not conform to CDM specifications for data type, length, or name | 188 fields |
| | 1.05 | Tables have primary key definition errors | 7 tables |
| | 1.06 | Fields contain values outside of CDM specifications | 29 fields |
| | 1.07 | Fields have non-permissible missing values | 25 fields |
| Data Plausibility | 2.01 | More than 5% of records have future dates | 7 fields |
| | 2.02 | More than 20% of records fall into the lowest or highest categories of age, height, weight, diastolic blood pressure, systolic blood pressure. | 5 fields |
| Data Completeness | 3.01 | The average number of diagnoses records per encounter is less than 1.0 for ambulatory, inpatient, emergency department, or ED-to-inpatient encounters. | 4 measures |
| | 3.02 | The average number of procedure records per encounter is less than 1.0 for ambulatory, inpatient, emergency department, or ED-to-inpatient encounters. | 4 measures |
| | 3.03 | More than 5% of records have missing or unknown values for the following fields: birth date; sex; diagnosis code type, procedure code type, and vital source. | 5 fields |
| | 3.04 | More than 15% of records have missing or unknown values for the following fields: race, discharge disposition (institutional encounters only), and principal diagnosis code (institutional encounters only). | 3 fields |
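The completeness checks in the table are threshold rules, so they can be illustrated with a short sketch. The example below is not the network's actual query code; it is a minimal illustration, assuming that "missing or unknown" means a null, an empty string, or one of the placeholder codes `NI`, `UN`, or `OT`. That value set and all function names are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Assumption: codes counted as "missing or unknown". The real value sets
# come from the CDM specification, not from this sketch.
MISSING_OR_UNKNOWN = {None, "", "NI", "UN", "OT"}


def missing_fraction(values: Iterable[Optional[str]]) -> float:
    """Fraction of records whose value is missing or unknown."""
    values = list(values)
    if not values:
        return 0.0
    missing = sum(1 for v in values if v in MISSING_OR_UNKNOWN)
    return missing / len(values)


def fails_missing_check(values: Iterable[Optional[str]], threshold: float) -> bool:
    """Checks 3.03 / 3.04 style rule: flag when MORE than `threshold` is missing."""
    return missing_fraction(values) > threshold


def fails_records_per_encounter(n_records: int, n_encounters: int,
                                minimum: float = 1.0) -> bool:
    """Checks 3.01 / 3.02 style rule: flag when the average is below `minimum`."""
    if n_encounters <= 0:
        raise ValueError("n_encounters must be positive")
    return (n_records / n_encounters) < minimum
```

For a field covered by check 3.03 the call would use `threshold=0.05`; for a field covered by check 3.04 it would use `threshold=0.15`. Both rules use a strict comparison, so a field with exactly 5% missing values is not flagged.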
Data Quality Results, Number and Percentage of DataMarts* with Data Check Exceptions.
| Category | Check | Description | Baseline Refresh | Final Refresh |
|---|---|---|---|---|
| Data Model Conformance | 1.01 | Required tables are not present | 2/64 (3.1%) | 0/64 (0.0%) |
| | 1.02 | Expected tables are not populated | 1/64 (1.6%) | 0/64 (0.0%) |
| | 1.03 | Required fields are not present | 1/64 (1.6%) | 0/64 (0.0%) |
| | 1.04 | Fields do not conform to CDM specifications for data type, length, or name | 2/64 (3.1%) | 0/64 (0.0%) |
| | 1.05 | Tables have primary key definition errors | 11/64 (17.2%) | 0/64 (0.0%) |
| | 1.06 | Fields contain values outside of CDM specifications | 17/64 (26.6%) | 0/64 (0.0%) |
| | 1.07 | Fields have non-permissible missing values | 13/64 (20.3%) | 0/64 (0.0%) |
| Data Plausibility | 2.01 | More than 5% of records have future dates: | | |
| | | Birth Date | 0/64 (0.0%) | 0/64 (0.0%) |
| | | Admit Date | 0/64 (0.0%) | 0/64 (0.0%) |
| | | Discharge Date, institutional encounters | 0/60 (0.0%) | 0/59 (0.0%) |
| | | Procedure Date | 0/54 (0.0%) | 0/62 (0.0%) |
| | | Enrollment Start Date | 0/64 (0.0%) | 0/64 (0.0%) |
| | | Enrollment End Date | 0/64 (0.0%) | 0/64 (0.0%) |
| | | Measure Date | 0/62 (0.0%) | 0/62 (0.0%) |
| | 2.02 | More than 20% of records fall into the lowest or highest categories of age, height, weight, diastolic blood pressure, systolic blood pressure: | | |
| | | Age | 0/62 (0.0%) | 0/63 (0.0%) |
| | | Height | 3/60 (5.0%) | 0/60 (0.0%) |
| | | Weight | 3/62 (4.8%) | 0/62 (0.0%) |
| | | Diastolic blood pressure | 5/60 (8.3%) | 1/60 (1.7%) |
| | | Systolic blood pressure | 1/60 (1.7%) | 1/60 (1.7%) |
| Data Completeness | 3.01 | The average number of diagnoses records per encounter is less than 1.0 for ambulatory, inpatient, emergency department, or emergency department to inpatient encounters. | | |
| | | Ambulatory | 7/60 (11.7%) | 2/64 (3.1%) |
| | | Inpatient | 8/53 (15.1%) | 1/58 (1.7%) |
| | | Emergency department | 7/59 (11.9%) | 0/59 (0.0%) |
| | | Emergency department to inpatient | 1/14 (7.1%) | 0/20 (0.0%) |
| | 3.02 | The average number of procedure records per encounter is less than 1.0 for ambulatory (AV), inpatient, emergency department, or emergency department to inpatient encounters. | | |
| | | Ambulatory | 23/59 (39.0%) | 13/64 (20.3%) |
| | | Inpatient | 9/52 (17.3%) | 11/58 (19.0%) |
| | | Emergency department | 18/58 (31.0%) | 0/59 (0.0%) |
| | | Emergency department to inpatient | 1/14 (7.1%) | 1/20 (5.0%) |
| | 3.03 | More than 5% of records have missing or unknown values for the following fields: birth date; sex; diagnosis code type, procedure code type, and vital source. | | |
| | | Birth date | 2/64 (3.1%) | 1/64 (1.6%) |
| | | Sex | 1/64 (1.6%) | 0/64 (0.0%) |
| | | Diagnosis code type | 2/64 (3.1%) | 0/64 (0.0%) |
| | | Procedure code type | 16/63 (25.4%) | 14/64 (21.9%) |
| | | Vital source | 12/62 (19.4%) | 7/62 (11.3%) |
| | 3.04 | More than 15% of records have missing or unknown values for the following fields: race, discharge disposition (institutional encounters only), and principal diagnosis code (institutional encounters only). | | |
| | | Race | 44/64 (68.8%) | 44/64 (68.8%) |
| | | Discharge disposition (institutional encounters only) | 31/64 (48.4%) | 24/59 (40.7%) |
| | | Principal diagnosis code (institutional encounters only) | 17/64 (26.6%) | 11/59 (18.6%) |
*The number of DataMarts varies by measure because of the data available in each DataMart. The number of DataMarts for a given measure may vary between the baseline and final refresh if network partners added, removed, or reclassified the data in the DataMart.
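Each cell in the results table is a count of flagged DataMarts over the number of DataMarts that had data for that measure, followed by the percentage. As a minimal sketch of that arithmetic (the function name `exception_rate` is hypothetical, not something from the study):

```python
def exception_rate(n_flagged: int, n_datamarts: int) -> str:
    """Format 'flagged/total (pct%)' using the measure-specific denominator."""
    if n_datamarts <= 0:
        raise ValueError("n_datamarts must be positive")
    if not 0 <= n_flagged <= n_datamarts:
        raise ValueError("n_flagged must be between 0 and n_datamarts")
    pct = 100 * n_flagged / n_datamarts
    return f"{n_flagged}/{n_datamarts} ({pct:.1f}%)"


# Check 1.05 at baseline and final refresh, using the counts from the table.
print(exception_rate(11, 64))
print(exception_rate(0, 64))
```

Because the denominator can differ between refreshes for the same measure, a change in percentage does not always reflect a change in the number of flagged DataMarts alone.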
Data Quality Results, Descriptive Statistics for Data Completeness Checks.
| Measure | Baseline Refresh: DataMarts* | Baseline Refresh: Min | Baseline Refresh: Median | Baseline Refresh: Max | Final Refresh: DataMarts* | Final Refresh: Min | Final Refresh: Median | Final Refresh: Max |
|---|---|---|---|---|---|---|---|---|
| Data Check 3.01 Diagnosis records per encounter, N | | | | | | | | |
| Ambulatory encounters | 60 | 0.00 | 2.17 | 100.16 | 64 | 0.77 | 2.08 | 6.14 |
| Inpatient encounters | 59 | 0.00 | 6.92 | 46.99 | 59 | 1.11 | 9.62 | 45.97 |
| Emergency Department encounters | 53 | 0.00 | 3.26 | 12.54 | 58 | 0.00 | 3.37 | 12.62 |
| ED to inpatient encounters | 14 | 0.55 | 14.19 | 51.24 | 20 | 2.84 | 15.47 | 77.41 |
| Data Check 3.02 Procedure records per encounter, N | | | | | | | | |
| Ambulatory encounters | 59 | 0.00 | 1.07 | 166.10 | 64 | 0.00 | 1.56 | 8.32 |
| Inpatient encounters | 58 | 0.00 | 4.42 | 173.90 | 59 | 0.00 | 10.14 | 173.18 |
| Emergency Department encounters | 52 | 0.00 | 1.36 | 16.66 | 58 | 0.00 | 3.47 | 16.66 |
| ED to inpatient encounters | 14 | 0.41 | 20.77 | 158.62 | 20 | 0.45 | 41.54 | 159.05 |
| Data Check 3.03 Records with missing or unknown values, % | | | | | | | | |
| Birth date | 64 | 0.00 | 0.00 | 82.51 | 64 | 0.00 | 0.00 | 8.51 |
| Sex | 64 | 0.00 | 0.04 | 6.00 | 64 | 0.00 | 0.04 | 5.82 |
| Diagnosis type | 64 | 0.00 | 0.00 | 17.85 | 64 | 0.00 | 0.00 | 1.82 |
| Procedure type | 63 | 0.00 | 0.00 | 100.00 | 64 | 0.00 | 0.00 | 100.00 |
| Vital source | 62 | 0.00 | 0.00 | 100.00 | 62 | 0.00 | 0.00 | 100.00 |
| Data Check 3.04 Records with missing or unknown values, % | | | | | | | | |
| Race | 64 | 0.67 | 27.20 | 86.43 | 64 | 0.69 | 21.94 | 86.09 |
| Discharge disposition, institutional encounters | 59 | 0.00 | 24.41 | 100.00 | 59 | 0.00 | 3.48 | 100.00 |
| Principal diagnoses, institutional encounters | 58 | 0.00 | 0.00 | 100.00 | 59 | 0.00 | 0.00 | 100.00 |
*The number of DataMarts varies by measure because of the data available in each DataMart. The number of DataMarts for a given measure may vary between the baseline and final refresh if network partners added, removed, or reclassified the data in the DataMart. ED = emergency department.
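The descriptive statistics above summarize one value per DataMart (for example, diagnosis records per encounter) as a count, minimum, median, and maximum. A minimal sketch of that summary follows; the function name and the three input values are illustrative only and are not study data.

```python
from statistics import median
from typing import Sequence, Tuple


def summarize(per_datamart_values: Sequence[float]) -> Tuple[int, float, float, float]:
    """Return (number of DataMarts, min, median, max) for one measure."""
    if not per_datamart_values:
        raise ValueError("at least one DataMart value is required")
    values = list(per_datamart_values)
    return len(values), min(values), median(values), max(values)


# Hypothetical per-DataMart averages for a single measure.
n, lo, mid, hi = summarize([0.9, 2.1, 6.0])
print(f"{n} | {lo:.2f} | {mid:.2f} | {hi:.2f}")
```

A median far from the maximum, as in several rows of the table, suggests that a small number of DataMarts account for the extreme values.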