Qi Tian1,2, Zhexi Han1,2, Ping Yu3, Jiye An1,2, Xudong Lu4,5,6, Huilong Duan1,2.
Abstract
BACKGROUND: Ensuring that data are of appropriate quality is essential for the secondary use of electronic health records (EHRs) in research and clinical decision support. An effective approach to data quality assessment (DQA) is to automate the creation of data quality rules (DQRs), replacing the time-consuming, labor-intensive manual process, with which it is difficult to guarantee standardized and comparable DQA results. This paper presents a case study of automatically creating DQRs based on openEHR archetypes in a Chinese hospital to investigate the feasibility and challenges of automating DQA for EHR data.
Keywords: Automatic; Data quality assessment; Data quality rule; OpenEHR archetypes; Secondary use of EHR
Year: 2021 PMID: 33812388 PMCID: PMC8019503 DOI: 10.1186/s12911-021-01481-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The details of Kahn’s DQA framework
| Definition of assessment dimensions | Sub-dimension | Definition |
|---|---|---|
| Conformance: whether data value fulfills certain standards and formats | Value conformance: data value conforms to prespecified data types, data domain, allowable values, value sets, or terminology standards | Data values conform to internal formatting constraints |
| | | Data values conform to allowable values or ranges |
| | Relational conformance: data value conforms to relational constraints imposed by physical database structure | Data values conform to relational constraints |
| | | Unique (key) data values are not duplicated |
| | | Changes to the data model or data model versioning |
| | Computational conformance: calculated value is consistent with technical functional specification | Computed values conform to computational or programming specifications |
| Completeness: features that describe the frequencies of data attributes present in a data set without reference to data values | – | The absence of data values at a single moment in time agrees with local or common expectations |
| | | The absence of data values measured over time agrees with local or common expectations |
| Plausibility: features that describe the believability or truthfulness of data values | Uniqueness plausibility: objects that appear multiple times are not duplicates or cannot be distinguished | Data values that identify a single object are not duplicated |
| | Atemporal plausibility: observed data values, distributions, or densities agree with local or “common” knowledge (Verification) or from comparisons with external sources that are deemed to be trusted or relative gold standards (Validation) | Data values and distributions agree with an internal measurement or local knowledge |
| | | Data values and distributions for independent measurements of the same fact are in agreement |
| | | Logical constraints between values agree with local or common knowledge (includes “expected” missingness) |
| | | Values of repeated measurement of the same fact show expected variability |
| | Temporal plausibility: time-varying variables change values as expected based on known temporal properties or across one or more external comparators or gold standards | Observed or derived values conform to expected temporal properties |
| | | Sequences of values that represent state transitions conform to expected properties |
| | | Measures of data value density against a time-oriented denominator are expected based on internal knowledge |
Fig. 1 The workflow of automatic creation of DQRs based on the CDR
An example of representing a quality requirement of CARSES as an SQL query
| Description of quality requirement | Corresponding table name | Corresponding column name | Corresponding DQR (SQL) |
|---|---|---|---|
| ID of request should not be empty | Imaging_exam_request | Request_identifier | SELECT Request_identifier FROM Imaging_exam_request WHERE Request_identifier IS NOT NULL |
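The rule in the table above can be exercised against a small in-memory database. The following is a minimal sketch, not the authors' implementation: it uses Python's built-in `sqlite3` module, the `Requested_at` column and the three sample rows are invented for illustration, and the query is inverted so that it returns the problematic rows (empty identifiers) rather than the compliant ones.

```python
import sqlite3

def find_missing_request_ids(conn):
    """Return rowids of imaging requests whose Request_identifier is empty."""
    cur = conn.execute(
        "SELECT rowid FROM Imaging_exam_request "
        "WHERE Request_identifier IS NULL OR Request_identifier = '' "
        "ORDER BY rowid"
    )
    return [row[0] for row in cur.fetchall()]

# Hypothetical stand-in for the CDR table; Requested_at is an assumed column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Imaging_exam_request (Request_identifier TEXT, Requested_at TEXT)"
)
conn.executemany(
    "INSERT INTO Imaging_exam_request VALUES (?, ?)",
    [
        ("REQ-001", "2021-01-05T08:00:00"),  # compliant
        (None, "2021-01-05T09:00:00"),       # missing identifier
        ("", "2021-01-05T10:00:00"),         # empty-string identifier
    ],
)

violations = find_missing_request_ids(conn)
print(violations)
```

Returning the violating rows directly makes it easy to count failures per rule, which is the form the pseudocode templates later in this record also use.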
Fig. 2 The workflow of verifying the method of automatic creation of DQRs
Sub-dimensions in Kahn’s framework and corresponding keywords of ADL and DQR templates
| Sub-dimension of Kahn’s DQA framework | Keywords of ADL | Definition | Example of constraint description | DQR templates with example for problematic data (in pseudocode) |
|---|---|---|---|---|
| Relational conformance | Cardinality | Limits the max number of memberships | (2..5) | Select * From archetypes group by attributes having count(*) > 5 or count(*) < 2 |
| | Occurrences | Data exist only once | (1..1) | Select attribute From archetype group by attribute having count(*) != 1 |
| Completeness | Existence | Attribute value is optional | (0..1) | – |
| | | Attribute value is mandatory | (1..1) | Select attribute From archetype Where attribute is Null or attribute = '' |
| Uniqueness plausibility | Cardinality | Objects in one list are not duplicate | (..unique) | Select * From archetypes group by attributes having count(*) > 1 |
| Value conformance | Defining_code | Designate terminology code | (Codeset) | Select attribute From archetype Where attribute.code not in {Codeset} |
| | Matches (∈) | Value range | (\|10..1000\|) | Select attribute From archetype Where attribute.value < 10 or attribute.value > 1000 |
| | | Designate value | (“mmHg”) | Select attribute From archetype Where attribute.value != 'mmHg' |
The archetype and attribute(s) stand for corresponding table name and column name(s) of the CDR database. Codeset stands for the content of a specific code constraint
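To make the templates concrete, the sketch below turns four of them (existence, value range, code set, occurrences) into SQL strings and runs them with Python's `sqlite3`. It is an illustration under stated assumptions, not the tool described in the paper: the `blood_pressure` table, its columns, the sample rows, and all function names are hypothetical, and table and column names are interpolated directly because they are assumed to come from trusted archetype metadata.

```python
import sqlite3

# Each helper fills one pseudocode template and returns SQL that selects
# the PROBLEMATIC rows. Function names are illustrative assumptions.

def existence_rule(table, column):
    # Existence (1..1): the value is mandatory.
    return f"SELECT rowid FROM {table} WHERE {column} IS NULL OR {column} = ''"

def range_rule(table, column, low, high):
    # Matches |low..high|: the value must fall inside the interval.
    return f"SELECT rowid FROM {table} WHERE {column} < {low} OR {column} > {high}"

def codeset_rule(table, column, codes):
    # Defining_code / designated value: the value must be in the code set.
    quoted = ", ".join("'" + c.replace("'", "''") + "'" for c in codes)
    return f"SELECT rowid FROM {table} WHERE {column} NOT IN ({quoted})"

def occurrences_rule(table, column):
    # Occurrences (1..1): each value may appear exactly once.
    return f"SELECT {column} FROM {table} GROUP BY {column} HAVING COUNT(*) != 1"

def run(conn, sql):
    return [row[0] for row in conn.execute(sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blood_pressure (record_id TEXT, systolic REAL, unit TEXT)")
conn.executemany(
    "INSERT INTO blood_pressure VALUES (?, ?, ?)",
    [
        ("bp-1", 120, "mmHg"),  # clean row
        ("bp-2", 5, "mmHg"),    # systolic below the allowed range
        ("bp-2", 130, "kPa"),   # duplicate record_id and unexpected unit
        (None, 110, "mmHg"),    # missing record_id
    ],
)

results = {
    "existence": run(conn, existence_rule("blood_pressure", "record_id")),
    "range": run(conn, range_rule("blood_pressure", "systolic", 10, 1000)),
    "codeset": run(conn, codeset_rule("blood_pressure", "unit", ["mmHg"])),
    "occurrences": run(conn, occurrences_rule("blood_pressure", "record_id")),
}
print(results)
```

Keeping one small function per ADL keyword mirrors the table: an archetype parser would only need to map each constraint it finds to the matching function and its arguments.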
The quality-related features in the openEHR RM and the corresponding DQR templates [a]
| Sub-dimension of Kahn’s DQA framework | Features | Type | Definition in RM | Use in this study | DQR templates with example for problematic data (in pseudocode) |
|---|---|---|---|---|---|
| Temporal plausibility | Instruction, action, observation, evaluation | Entry class | To represent the status of one clinical event | The time sequence logic of one clinical event, for example, the operation request time in Instruction should be earlier than the operation executed time in Action | Select attribute From archetypes Where instruction.attribute.date/time > action.attribute.date/time |
[a] The archetype and attribute stand for the corresponding table name and column name of the CDR database
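The temporal plausibility template compares timestamps across two entry classes of the same clinical event. A minimal sketch of that check with Python's `sqlite3` follows; the table names `operation_request` and `operation_action`, the `order_id` join key, the column names, and the sample rows are all assumptions made for illustration. Timestamps are stored as ISO 8601 strings, which compare correctly as text.

```python
import sqlite3

# Problematic rows: the request (Instruction) is recorded later than
# the execution (Action) of the same order.
TEMPORAL_RULE = """
SELECT i.order_id
FROM operation_request AS i
JOIN operation_action AS a ON a.order_id = i.order_id
WHERE i.request_time > a.execute_time
ORDER BY i.order_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE operation_request (order_id TEXT, request_time TEXT)")
conn.execute("CREATE TABLE operation_action (order_id TEXT, execute_time TEXT)")
conn.executemany(
    "INSERT INTO operation_request VALUES (?, ?)",
    [("op-1", "2021-01-05T08:00:00"), ("op-2", "2021-01-06T14:30:00")],
)
conn.executemany(
    "INSERT INTO operation_action VALUES (?, ?)",
    [("op-1", "2021-01-05T10:15:00"), ("op-2", "2021-01-06T09:00:00")],
)

# op-2 was "executed" before it was requested, so it is flagged.
out_of_order = [row[0] for row in conn.execute(TEMPORAL_RULE)]
print(out_of_order)
```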
The number of DQRs created by automatic and manual methods using the 27 archetypes
| Category | Title of archetype | Auto-created | Expert-created |
|---|---|---|---|
| Demographic | openEHR-DEMOGRAPHIC-PERSON.person | 3 | 2 |
| | openEHR-DEMOGRAPHIC-ITEM_TREE.person_details | 7 | 7 |
| | openEHR-DEMOGRAPHIC-PERSON.person-patient | 3 | 3 |
| | openEHR-DEMOGRAPHIC-PARTY_IDENTITY.person_name | 1 | 1 |
| Admission | openEHR-EHR-ADMIN_ENTRY.admission | 7 | 7 |
| | openEHR-EHR-EVALUATION.problem_diagnosis | 6 | 9 |
| Orders | openEHR-EHR-INSTRUCTION.order | 25 | 20 |
| | openEHR-EHR-ACTION.order | 14 | 14 |
| | openEHR-EHR-INSTRUCTION.prescription | 22 | 22 |
| | openEHR-EHR-ACTION.Prescription | 12 | 12 |
| Lab test | openEHR-EHR-INSTRUCTION.request-lab_test | 14 | 14 |
| | openEHR-EHR-OBSERVATION.lab_test | 5 | 8 |
| | openEHR-EHR-OBSERVATION.lab_test_single | 15 | 17 |
| | openEHR-EHR-CLUSTER.specimen | 18 | 17 |
| Imaging examination | openEHR-EHR-INSTRUCTION.request-imaging_exam | 27 | 28 |
| | openEHR-EHR-OBSERVATION.imaging_exam_image_series | 16 | 17 |
| | openEHR-EHR-OBSERVATION.imaging_exam_report | 13 | 13 |
| EMR | openEHR-EHR-OBSERVATION.EMR_first_page | 10 | 11 |
| | openEHR-EHR-OBSERVATION.EMR_document | 11 | 13 |
| Nursing | openEHR-EHR-INSTRUCTION.nursing | 13 | 15 |
| | openEHR-EHR-ACTION.nursing | 14 | 14 |
| | openEHR-EHR-OBSERVATION.physical_sign | 15 | 15 |
| Operation | openEHR-EHR-INSTRUCTION.request-operation | 24 | 23 |
| | openEHR-EHR-OBSERVATION.operation_record | 19 | 20 |
| | openEHR-EHR-ACTION.operation | 12 | 13 |
| | openEHR-EHR-OBSERVATION.blood_match | 15 | 13 |
| | openEHR-EHR-INSTRUCTION.transfusion | 18 | 18 |
| Sum | 27 | 359 | 366 |
Fig. 3 The proportion of expert-created and auto-created DQRs and their coverage
The number of DQRs and coverage in each dimension of the requirements of CARSES
| Category | Completeness (Auto) | Completeness (A/E) [a] | Conformity (Auto) | Conformity (A/E) | Consistency (Auto) | Consistency (A/E) | Timeliness (Auto) | Timeliness (A/E) | Extra (Auto) | Overall (Auto) | Overall (A/E) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Demographic | 6 | 6/6 | 1 | 1/1 | 7 | 6/6 | 0 | 0/0 | 0 | 14 | 13/13 |
| Admission | 8 | 8/8 | 1 | 1/5 | 4 | 2/2 | 0 | 0/1 | 1 | 14 | 11/16 |
| Orders | 35 | 35/35 | 9 | 9/16 | 17 | 13/13 | 2 | 2/4 | 10 | 75 | 61/68 |
| Lab test | 30 | 30/30 | 3 | 3/13 | 11 | 8/8 | 4 | 4/5 | 5 | 55 | 47/56 |
| Imaging exam | 36 | 36/37 | 8 | 8/12 | 7 | 6/6 | 2 | 2/3 | 4 | 59 | 54/58 |
| EMR | 13 | 13/16 | 4 | 4/6 | 1 | 0/0 | 0 | 0/2 | 3 | 21 | 17/24 |
| Nursing | 25 | 25/25 | 8 | 8/12 | 5 | 5/5 | 2 | 2/2 | 2 | 42 | 40/44 |
| Operation | 55 | 55/55 | 15 | 15/22 | 9 | 5/5 | 1 | 1/5 | 7 | 87 | 76/87 |
| Sum | 208 | 208/212 | 47 | 47/87 | 61 | 45/45 | 11 | 11/22 | 32 | 367 | 311/366 |
| Coverage (%) | | 98.11 | | 54.02 | | 100.00 | | 50.00 | | | 84.97 |
[a] Number of agreement rules / number of expert-created rules
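The coverage row follows directly from the A/E sums: coverage is the number of agreement rules divided by the number of expert-created rules, expressed as a percentage. A short sketch that reproduces the reported figures from the Sum row (the function name and the handling of an empty denominator are assumptions of this example):

```python
# (agreement rules, expert-created rules) taken from the Sum row of the table
agreement = {
    "Completeness": (208, 212),
    "Conformity": (47, 87),
    "Consistency": (45, 45),
    "Timeliness": (11, 22),
    "Overall": (311, 366),
}

def coverage_percent(agreed, expert):
    """Share of expert-created rules matched by auto-created rules, in percent."""
    if expert == 0:
        return None  # coverage is undefined when experts wrote no rules
    return round(100 * agreed / expert, 2)

coverage = {dim: coverage_percent(a, e) for dim, (a, e) in agreement.items()}
for dim, pct in coverage.items():
    print(f"{dim}: {pct:.2f}%")
```

The same function can be applied per category; cells such as 0/0 have no expert-created rules, so no coverage value is defined for them.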