Tiffany J Callahan, Alan E Bauck, David Bertoch, Jeff Brown, Ritu Khare, Patrick B Ryan, Jenny Staab, Meredith N Zozus, Michael G Kahn.
Abstract
OBJECTIVE: To compare rule-based data quality (DQ) assessment approaches across multiple national clinical data sharing organizations.
Year: 2017 PMID: 29881733 PMCID: PMC5982846 DOI: 10.5334/egems.223
Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214
Participating Organization Characteristics
| CHARACTERISTIC | CESR | MURDOCK | OHDSI | PEDSNET | PHIS | SENTINEL |
|---|---|---|---|---|---|---|
| Organization Type | Clinical Research Network | Registry and Biorepository | Open Science Collaborative | Clinical Research Network | Member Association | Clinical Research Network |
| Date Founded | 2010 | 2007 | 2014 | 2013 | 1993 | 2008 |
| Stakeholders [a] | Internal, External | Internal, External | External | External | External | External |
| Network Type [b] | Distributed | Data Center | Distributed | Distributed | Data Center | Distributed |
| Network Sites (#) | 7 | 8 | 50 | 8 | 49 | 18 |
| Patient Records [c] | 10,400,000 | 11,749 | 660,000,000 | 5,112,227 | 22,000,000 | 193,000,000 |
| Primary Analytical Focus | Comparative Effectiveness and Safety | Precision Medicine | Large-Scale Analytics | Pediatric Disease Surveillance | Comparative Effectiveness | Medical Product Safety Surveillance |
| Common Data Model [d] | CESR VDW [e] | — | OMOP [f] | OMOP | — | SCDM [g] |
| DQA Coordination | Centralized | Centralized | Distributed | Centralized | Centralized | Centralized |
| DQ Employees (#) [h] | 2 | 1 | Varies by site | 2 | 2 | 8 |
| DQA Programs and Tools | SAS | SAS | OHDSI tools [i] | R, OHDSI tools | SAS/SAP Business Objects | Sentinel tools [j] |
| DQ Checks Provided [k] | 3,434 | 3,220 | 172 | 875 | 1,835 | 1,487 |
| Received DQ Check Format | General Check List and VDW Information | Documented Check List | SQL Code | R Code | Documented Check List | Documented Check List |
| DQ Check Access | CESR Staff | MURDOCK Faculty | Open Source; GitHub [l] | Open Source; GitHub [m] | PHIS staff | Open Source; Sentinel website [n] |
Notes:
[a] Stakeholders: Refers to the governing group or organization for the project, not specifically for DQ-related work.
[b] Network Type: Refers to the organizations providing the data for secondary use, not the network where the patients are seen and the data is collected.
[c] Patient Records: Depending on the organization, refers to either the current active patient records (CESR, MURDOCK, PEDSnet) or all available patient records (OHDSI, PHIS, Sentinel).
[d] Common Data Model: Refers to the data model that is used for data sharing, unless the organization utilizes a single data model for multiple purposes (i.e., for data storage versus data sharing); this distinction is not denoted in the table.
[e] The Center for Effectiveness & Safety Research (CESR) Virtual Data Warehouse (VDW) is an expanded data model built on top of the Health Care Systems Research Network (HCSRN) VDW. http://cesr.kp.org/en/
[f] Observational Medical Outcomes Partnership (OMOP).
[g] Sentinel Common Data Model (SCDM).
[h] DQ employees: Individuals who are not hired specifically to conduct DQ-related work, but who purposefully dedicate a portion of their FTE to assessing DQ.
[i] Tool information can be found on the OHDSI home page: http://www.ohdsi.org/
[j] Sentinel Data QA SAS tools and information can be found at the website: http://www.mini-sentinel.org/
[k] The total number of provided DQ checks shown includes only those checks that were mapped to categories within the data verification context.
[l] Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) Heel DQ checks obtained from https://github.com/OHDSI/Achilles/blob/master/inst/sql/sql_server/Achilles_v5.sql
[m] PEDSnet DQ checks obtained from https://github.com/PEDSnet; permission required to access code repository
[n] Sentinel DQ checks obtained from http://www.mini-sentinel.org/data_activities/distributed_db_and_data/details.aspx?ID=131
DQ Check Coverage by DQA Category by Organization
| CATEGORY | SUBCATEGORY | CESR N (%) | MURDOCK N (%) | OHDSI N (%) | PEDSnet N (%) | PHIS N (%) | SENTINEL N (%) | TOTAL N (%) |
|---|---|---|---|---|---|---|---|---|
| Conformance | Value | 1,434 (41.76) | 43 (1.34) | 0 (0.00) | 3 (0.34) | 65.5 (3.57) | 421 (28.31) | 1,966.5 (17.84) |
| Conformance | Relational | 786 (22.89) | 36 (1.12) | 25 (14.53) | 13 (1.49) | 114 (6.21) | 42 (2.82) | 1,016 (9.22) |
| Conformance | Calculation | 50 (1.46) | 0 (0.00) | 5 (2.91) | 0 (0.00) | 10 (0.54) | 1 (0.07) | 66 (0.60) |
| Completeness | Atemporal | 754 (21.96) | 9 (0.28) | 3 (1.74) | 367.5 (42.00) | 186.5 (10.16) | 111 (7.46) | 1,431 (12.98) |
| Completeness | Temporal | 0 (0.00) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 22 (1.20) | 0 (0.00) | 22 (0.20) |
| Plausibility | Uniqueness | 1 (0.03) | 0 (0.00) | 0 (0.00) | 0 (0.00) | 29 (1.58) | 18 (1.21) | 48 (0.44) |
| Plausibility | Atemporal | 207 (6.03) | 3,031 (94.13) | 87 (50.58) | 315 (36.00) | 1,300 (70.84) | 527 (35.44) | 5,467 (49.60) |
| Plausibility | Temporal | 202 (5.88) | 101 (3.14) | 52 (30.23) | 176.5 (20.17) | 108 (5.89) | 367 (24.68) | 1,006.5 (9.13) |
Notes: A total value of one was assigned to each mapped DQ check; DQ checks mapped to two categories were each given a value of 0.5.
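The weighting rule in the note above can be illustrated with a short sketch. This is not the authors' code: the function names `tally_coverage` and `as_percentages` and the four sample mappings are hypothetical, chosen only to show how a check mapped to two categories contributes 0.5 to each.

```python
from collections import defaultdict

def tally_coverage(check_mappings):
    """Sum weighted category counts so that each DQ check contributes 1 in total.

    check_mappings: list of lists; each inner list names the one or two
    harmonized categories that a single DQ check was mapped to.
    """
    totals = defaultdict(float)
    for categories in check_mappings:
        if not categories:
            raise ValueError("every check must map to at least one category")
        weight = 1.0 / len(categories)  # 1.0 for one category, 0.5 each for two
        for category in categories:
            totals[category] += weight
    return dict(totals)

def as_percentages(totals):
    """Express weighted counts as percentages of all mapped checks."""
    grand_total = sum(totals.values())
    return {cat: round(100.0 * n / grand_total, 2) for cat, n in totals.items()}

# Hypothetical mappings for four checks (not taken from any organization).
example = [
    ["Value Conformance"],
    ["Atemporal Plausibility"],
    ["Value Conformance", "Atemporal Completeness"],  # split 0.5 / 0.5
    ["Temporal Plausibility"],
]
counts = tally_coverage(example)
shares = as_percentages(counts)
```

Under this scheme the weighted counts always sum to the number of mapped checks, which is why fractional values such as 65.5 or 367.5 can appear in the table while each column still adds up to the organization's total.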
Figure 1. Harmonized DQA Terminology Mapped DQ Check Coverage
Figure 2. Harmonized DQA Terminology Coverage of Mapped DQ Checks by Organization
DQ Check Mapping Conventions and Example DQ Checks from Participating Organizations
| Category | Subcategory | Mapping Convention | Example DQ check |
|---|---|---|---|
| Conformance | Value | The DQ check examines the formatting of variables (i.e., length, string/numeric variable typing). | Type: Check the type of variable |
| | | | Abnormal Flag valid values are LL for low panic, L for low, H for high, HH for high panic, and A for abnormal. Are other values present? |
| Conformance | Relational | The DQ check examines the relational database constraints (e.g., primary and foreign key relationships), as well as constraints specified by the metadata (e.g., table existence). | Compare the number of records in person table with the number of records in observation_period table |
| | | | At least one PatID in the ENR table is not in the DEM table |
| Conformance | Calculation | The DQ check examines computationally derived variables. | For |
| | | | Admit date = date of physician order to admit as inpatient. LOS does not include time prior to admit order, i.e. time in ED, observation |
| Completeness | Atemporal | The DQ check examines counts of missing or available single variable or multiple variables at one time point. | Describe missing values |
| | | | Fire if date_of_birth is |
| Completeness | Temporal | The DQ check examines counts of missing or available single variable or multiple variables across multiple time points. | Medical Discharge Hour values are missing for >=8 consecutive hours (and <100% single value) |
| Plausibility | Uniqueness | The DQ check tests for duplicated variables, variable values, and records. | Enr_End occurs more than once in the file in combination with PatID, MedCov, and DrugCov |
| | | | Classify how many records: 1 = no duplicate admission to same facility, 2+ = duplicate admission |
| Plausibility | Atemporal | The DQ check examines the range or distribution of a single variable (e.g., height or weight) or the relationship between multiple variables (e.g., gender and procedure type) to determine if values are correct. | Distribution of age at first observation period |
| | | | HEIGHT is not between 36 and 84 |
| Plausibility | Temporal | The DQ check examines the believability of data values over a certain period of time (e.g., hours, days, years). | Number of visit records with end date < start date |
| | | | Count number of observations by year across all years of data |
Note: The conventions in the table are generalized to the category level of the harmonized DQA terminology.
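To make the categories concrete, here is a minimal sketch of how four of the example checks in the table might be expressed as rules. It is an illustration only, not code from any participating organization (which supplied SAS, SQL, or R checks): the function names, the dictionary-based record layout, the sample rows, and the choice to treat the height bounds as inclusive are all assumptions.

```python
from collections import Counter
from datetime import date

def height_out_of_range(rows, low=36, high=84):
    """Atemporal plausibility: rows whose HEIGHT is not between low and high.
    Bounds are treated as inclusive here (an assumption); missing values are skipped."""
    return [r for r in rows
            if r["HEIGHT"] is not None and not (low <= r["HEIGHT"] <= high)]

def visits_ending_before_start(visits):
    """Temporal plausibility: number of visit records with end date < start date."""
    return sum(1 for v in visits if v["end"] < v["start"])

def orphan_patient_ids(enr_rows, dem_rows):
    """Relational conformance: PatIDs present in ENR but absent from DEM."""
    dem_ids = {r["PatID"] for r in dem_rows}
    return sorted({r["PatID"] for r in enr_rows} - dem_ids)

def duplicate_keys(rows, key_fields):
    """Uniqueness plausibility: key combinations that occur more than once."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical sample data, invented for illustration.
dem = [{"PatID": "A", "HEIGHT": 70}, {"PatID": "B", "HEIGHT": 90}]
enr = [
    {"PatID": "A", "Enr_End": date(2015, 12, 31), "MedCov": "Y", "DrugCov": "Y"},
    {"PatID": "A", "Enr_End": date(2015, 12, 31), "MedCov": "Y", "DrugCov": "Y"},
    {"PatID": "C", "Enr_End": date(2016, 6, 30), "MedCov": "Y", "DrugCov": "N"},
]
visits = [
    {"start": date(2016, 1, 5), "end": date(2016, 1, 3)},  # ends before it starts
    {"start": date(2016, 2, 1), "end": date(2016, 2, 4)},
]

bad_heights = height_out_of_range(dem)
bad_visits = visits_ending_before_start(visits)
orphans = orphan_patient_ids(enr, dem)
dupes = duplicate_keys(enr, ["PatID", "Enr_End", "MedCov", "DrugCov"])
```

Each function returns the offending records or a count rather than a pass/fail flag, which leaves the threshold for "firing" a check to the caller; real implementations would typically run equivalent logic inside the database.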