Tiffany Callahan¹, Juliana Barnard¹, Laura Helmkamp¹, Julie Maertens¹, Michael Kahn¹.
Abstract
INTRODUCTION: Electronic health record (EHR) data are known to have significant data quality issues, yet the practice and frequency of assessing EHR data quality are unknown. We sought to understand the current practices of data professionals, and their attitudes toward reporting data quality assessment (DQA) results.
Year: 2017 PMID: 29881736 PMCID: PMC5982990 DOI: 10.5334/egems.214
Source DB: PubMed Journal: EGEMS (Wash DC) ISSN: 2327-9214
Figure 1. Project Timeline
Interview Participants and Site Characteristics
| CHARACTERISTIC | SITE 1 | SITE 2 | SITE 3 | SITE 4 |
|---|---|---|---|---|
| Site type | Academic | Medical group | Medical group | Commercial |
| No. of DQ employees | 1 | 8 | 2 | 8 |
| Stakeholdersᵃ | Internal/External | External | Internal/External | Internal |
| Common data modelᵇ | OHDSI, i2b2 | Sentinel | HCSRN VDW | OHDSI |
| Network type | Site-based | Distributed | Distributed | Site-based |
| Formal DQ plan | No | Yes | Yes | Yes |
| DQA tools | R, | Sentinel toolsᵈ, | SAS | OHDSI toolsᶜ, |
| Visualization tools | R, | COMPAREᵈ | SAS, | OHDSI toolsᶜ, |
| Key DQ dimensions | Completeness, | Validity, | Completeness, | Accuracy, |
| DQ remediation plan | No | Yes | Yes | Yes |
ᵃ An internal stakeholder works within the organization (e.g., employees, researchers, and/or project managers). In contrast, an external stakeholder (e.g., regulatory organizations, community members, and government agencies) operates outside the organization.
ᵇ Common data model(s) (CDMs) in use as of summer 2014: Observational Health Data Sciences and Informatics (OHDSI); Informatics for Integrating Biology and the Bedside (i2b2); Sentinel; Health Care Systems Research Network (HCSRN) Virtual Data Warehouse (VDW).
ᶜ OHDSI tool information can be found on the OHDSI home page: http://www.ohdsi.org/.
ᵈ Sentinel data QA SAS tools and information can be found here: http://www.mini-sentinel.org/.
Description of Participant Demographic and Employment Characteristics
| VARIABLE | N | % |
|---|---|---|
| Female | 56 | 50.5 |
| Male | 53 | 47.7 |
| Missing/unknown | 2 | 1.8 |
| 30 or younger | 14 | 12.6 |
| 31–50 | 62 | 55.9 |
| 51 or older | 33 | 29.7 |
| Missing/unknown | 2 | 1.8 |
| White | 89 | 80.2 |
| Asian/Pacific Islander | 9 | 8.1 |
| Hispanic/Latino | 2 | 1.8 |
| Multiracial | 5 | 4.5 |
| Missing/unknown | 6 | 5.4 |
| College graduate | 13 | 11.7 |
| Master's Degree | 31 | 27.9 |
| PhD/MD | 67 | 60.4 |
| 0–9 hours/week | 51 | 45.9 |
| 10–19 hours/week | 34 | 30.6 |
| 20–39 hours/week | 16 | 14.4 |
| 40 or more hours/week | 10 | 9.0 |
| 6 months – 1 year | 11 | 9.9 |
| 1 year – 2 years | 10 | 9.0 |
| 2 years – 4 years | 31 | 27.9 |
| 5 years or more | 58 | 52.3 |
| Missing/unknown | 1 | 0.9 |
| Data producer (i.e., creates/populates databases for a consumer/analyst) | 14 | 12.6 |
| Data consumer (i.e., uses data provided by a data producer) | 38 | 34.2 |
| Both a data producer and consumer | 54 | 48.6 |
| Missing/unknown | 5 | 4.5 |
| 0–25 customers/month | 38 | 34.2 |
| 26–50 customers/month | 4 | 3.6 |
| 51–100 customers/month | 5 | 4.5 |
| Greater than 100 customers/month | 10 | 9.0 |
| I am not sure how many consumers use the data that I produce each month | 9 | 8.1 |
| Missing/unknown | 45 | 40.5 |
| 1–4 sites | 32 | 28.8 |
| 5–10 sites | 21 | 18.9 |
| 11–14 sites | 9 | 8.1 |
| 15–20 sites | 10 | 9.0 |
| More than 20 sites | 21 | 18.9 |
| Does not apply | 16 | 14.4 |
| Missing/unknown | 2 | 1.8 |
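As a quick consistency check on the table above, each percentage appears to be N divided by the full sample. The sketch below assumes a denominator of 111 respondents, inferred from the row counts (for example, the gender rows sum to 56 + 53 + 2 = 111) rather than stated in this excerpt, and recomputes a few rows copied from the table. The constant `TOTAL_RESPONDENTS` and the helpers `percent` and `mismatches` are hypothetical names used only for illustration.

```python
# Sketch: recompute the % column of the demographics table from its N column.
# ASSUMPTION: every percentage uses the full sample as its denominator.
# TOTAL_RESPONDENTS = 111 is inferred from the table's row counts
# (e.g., 56 + 53 + 2 = 111 for gender); it is not stated in this excerpt.

TOTAL_RESPONDENTS = 111

# (label, N, reported %) rows copied from the table above.
ROWS = [
    ("Female", 56, 50.5),
    ("Male", 53, 47.7),
    ("Age 31-50", 62, 55.9),
    ("PhD/MD", 67, 60.4),
    ("Customers/month missing/unknown", 45, 40.5),
]


def percent(n: int, total: int = TOTAL_RESPONDENTS) -> float:
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100 * n / total, 1)


def mismatches(rows):
    """Return labels whose recomputed percentage differs from the reported one."""
    return [label for label, n, reported in rows if percent(n) != reported]


if __name__ == "__main__":
    for label, n, reported in ROWS:
        print(f"{label}: {n}/{TOTAL_RESPONDENTS} = {percent(n)}% (reported {reported}%)")
    print("mismatches:", mismatches(ROWS))
```

For the rows copied here, the mismatch list comes back empty, which supports (but does not prove) the assumed common denominator.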
Description of Current Data Quality Assessment Practices
| VARIABLE | N | % |
|---|---|---|
| OMOP | 16 | 14.4 |
| HCSRN VDW | 7 | 6.3 |
| i2b2 | 8 | 7.2 |
| PCORnet | 6 | 5.4 |
| OpenEHR | 2 | 1.8 |
| Other | 15 | 13.5 |
| I do not utilize a common data model in my organization | 53 | 47.7 |
| Missing/unknown | 4 | 3.6 |
| No | 14 | 12.6 |
| Yes | 94 | 84.7 |
| Missing/unknown | 3 | 2.7 |
| No | 57 | 51.4 |
| Yes | 52 | 46.8 |
| Missing/unknown | 2 | 1.8 |
| < 10% | 22 | 19.8 |
| 10–40% | 53 | 47.7 |
| 41–70% | 17 | 15.3 |
| 71–100% | 9 | 8.1 |
| Does not apply | 7 | 6.3 |
| Missing/unknown | 3 | 2.7 |
| Probably won’t happen | 4 | 3.6 |
| Probably will happen | 19 | 17.1 |
| Definitely will happen | 86 | 77.5 |
| Missing/unknown | 2 | 1.8 |
| Will not happen | 9 | 8.1 |
| Probably won’t happen | 28 | 25.2 |
| Probably will happen | 45 | 40.5 |
| Definitely will happen | 20 | 18.0 |
| I am not currently assessing the quality of the data I work with | 4 | 3.6 |
| Consistency | 80 | 72.1 |
| Completeness | 83 | 74.8 |
| Non-redundancy | 40 | 36.0 |
| Processing Integrity | 64 | 57.7 |
| Quality of Documentation and Metadata | 51 | 45.9 |
| Consistency | 71 | 64.0 |
| Completeness | 84 | 75.7 |
| Non-redundancy | 35 | 31.5 |
| Processing Integrity | 50 | 45.0 |
| Quality of Documentation and Metadata | 31 | 27.9 |
Figure 2. Factors Obtained for the Individual Barriers Questions
Note: Darker bars indicate a stronger loading between a question and a factor. The questions that represent each factor are marked with a star: the Personal Consequences factor comprises six items, the Process Issues factor three items, and the Lack of Resources factor two items.
Figure 3. Factors Obtained for the Organizational Barriers Questions
Note: Darker bars indicate a stronger loading between a question and a factor. The questions that represent each factor are marked with a star: the Environment/Support factor comprises four items and the Practices factor three items.
Factor Structure Matrix for Individual and Organizational Barriers Items
| ITEM | FACTOR LOADING (F1) | FACTOR LOADING (F2) | FACTOR LOADING (F3) |
|---|---|---|---|
| **Individual barriers** | | | |
| Factor 1: Personal Consequences | |||
| Unknown consequences to my career should I discover and disclose data quality issues (e.g., losing my job, potential for attaining future federal funding, or losing a competitive edge within my field) | 0.79 | 0.12 | 0.11 |
| Concerns about discovering data quality issues that will invalidate my prior work and increase the difficulty of future publications | 0.77 | 0.30 | -0.03 |
| Concern that I will be expected to publicly report my data quality assessment findings | 0.72 | 0.39 | 0.00 |
| Possibility of colleagues leaving a collaboration because of discovered or unresolved data quality issues | 0.69 | 0.08 | 0.14 |
| A belief that the nature of my work does not require the assessment of data quality | 0.68 | -0.27 | 0.11 |
| Concern that data quality assessment reporting will create an expectation for reproducibility of my data quality findings | 0.68 | 0.43 | 0.07 |
| Factor 2: Process Issues | |||
| A lack of clear definitions for good or bad data quality | -0.08 | 0.82 | 0.22 |
| Concerns about discovering data quality issues that cannot easily be resolved | 0.35 | 0.63 | 0.18 |
| Belief that data quality efforts, no matter how comprehensive, fail to solve or prevent all potential analysis roadblocks | 0.36 | 0.54 | -0.30 |
| Factor 3: Lack of Resources | |||
| A lack of resources (i.e., not enough funding or time to carry out detailed data quality assessments) | 0.03 | 0.13 | 0.79 |
| My knowledge, experience, or training limits the types of data quality efforts that can be applied to the data I produce/consume | 0.17 | 0.04 | 0.74 |
| **Organizational barriers** | | | |
| Factor 1: Environment/Support | | | |
| Data providers/data owners are resistant to change | 0.77 | 0.21 | — |
| Data quality assessment is not a high priority for investigators | 0.74 | 0.14 | — |
| There are excess layers of management that interfere with data quality assessment efforts | 0.72 | 0.20 | — |
| The funding agencies' compensation is not linked to achieving data quality goals | 0.64 | 0.17 | — |
| Factor 2: Practices | |||
| There are no best practices for effectively measuring data quality | 0.01 | 0.84 | — |
| Quality action plans/requirements/expectations are often vague | 0.22 | 0.78 | — |
| The high costs of implementing data quality assessments outweigh the benefits | 0.31 | 0.59 | — |
| Excluded items | |||
| Data providers/data owners are not trained in problem identification and problem solving skills | 0.56 | 0.47 | — |
| There is frequent turnover of data providers/data owners | 0.31 | 0.46 | — |
The three Factor Loading columns (F1 to F3) give each item's loading on each of the factors identified above. The organizational barriers analysis yielded only two factors, so the F3 column is shown as a dash for those items.
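The table does not state how items were assigned to factors or why the two items were excluded. One simple rule consistent with every loading shown is sketched below: assign each item to the factor with its largest absolute loading, provided that loading is at least 0.50 and exceeds the next-largest absolute loading by at least 0.10; otherwise exclude it. Both cutoffs (`MIN_PRIMARY`, `MIN_GAP`), the helper `assign`, and the shortened item labels are hypothetical choices for illustration, not the authors' documented criteria.

```python
# Sketch: assign survey items to factors from their loadings.
# HYPOTHETICAL RULE, not the authors' documented criteria: keep an item on the
# factor with its largest absolute loading only if that loading is >= MIN_PRIMARY
# and exceeds the next-largest absolute loading by >= MIN_GAP; otherwise exclude it.
from typing import Optional, Sequence

MIN_PRIMARY = 0.50  # assumed cutoff for a salient primary loading
MIN_GAP = 0.10      # assumed margin over the strongest cross-loading


def assign(loadings: Sequence[float],
           min_primary: float = MIN_PRIMARY,
           min_gap: float = MIN_GAP) -> Optional[int]:
    """Return the 1-based index of the item's factor, or None if excluded."""
    # Rank factor indices by absolute loading, strongest first.
    ranked = sorted(range(len(loadings)), key=lambda i: abs(loadings[i]), reverse=True)
    primary = abs(loadings[ranked[0]])
    cross = abs(loadings[ranked[1]]) if len(ranked) > 1 else 0.0
    if primary < min_primary or primary - cross < min_gap:
        return None
    return ranked[0] + 1


# Loadings copied from the table (F1, F2[, F3]); keys are shortened labels.
ITEMS = {
    "career consequences": (0.79, 0.12, 0.11),
    "work does not require DQA": (0.68, -0.27, 0.11),
    "efforts fail to solve all roadblocks": (0.36, 0.54, -0.30),
    "lack of resources": (0.03, 0.13, 0.79),
    "owners resistant to change": (0.77, 0.21),
    "high costs outweigh benefits": (0.31, 0.59),
    "owners not trained (excluded)": (0.56, 0.47),
    "frequent turnover (excluded)": (0.31, 0.46),
}

if __name__ == "__main__":
    for label, row in ITEMS.items():
        factor = assign(row)
        status = "excluded" if factor is None else f"F{factor}"
        print(f"{label}: {status}")
```

Under these assumed cutoffs, the turnover item is excluded for a weak primary loading (0.46) and the training item for cross-loading (0.56 vs. 0.47). A stricter margin such as 0.20 would also drop the 0.54/0.36 "fail to solve" item from Process Issues, so the resulting groupings are sensitive to whichever cutoffs are chosen.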