Loan R van Hoeven, Martine C de Bruijne, Peter F Kemper, Maria M W Koopman, Jan M M Rondeel, Anja Leyte, Hendrik Koffijberg, Mart P Janssen, Kit C B Roes.
Abstract
BACKGROUND: Although data from electronic health records (EHR) are often used for research purposes, systematic validation of these data prior to their use is not standard practice. Existing validation frameworks discuss validity concepts without translating these into practical implementation steps or addressing the potential influence of linking multiple sources. Therefore, we developed a practical approach for validating routinely collected data from multiple sources and applied it to a blood transfusion data warehouse to evaluate its usability in practice.
Keywords: Data quality; Data validation; Linkage of multiple sources; Routinely collected data
Year: 2017 PMID: 28709453 PMCID: PMC5512751 DOI: 10.1186/s12911-017-0504-7
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1. Development of the validation approach. First, validity concepts are identified, selected and defined. Second, concrete validity outcomes are established, tailored to the specific application or dataset.
Fig. 2. The validation approach. The approach distinguishes external validation concepts (upper part) and internal validation concepts (lower part). The numbers indicate a suggested order in which to check the concepts in order to identify errors in the data efficiently.
Concepts and applied validation outcomes for n = 2 hospitals from the DTD
| Order | Concept | Aim | Outcome | Average of two hospitals |
|---|---|---|---|---|
| External validation | | | | |
| 1 | Concordance with report | Data are concordant with (annual) report | % agreement between number of products in annual blood bank report and DWH | 98.7% (RBC 99.2%, PLT 97.6%, FFP 98.7%) |
| 10 | Concordance with literature | Data are concordant with previous findings in literature | Comparison of distribution of blood products by age and gender per product type in the Netherlands | Distributions were quite similar, only platelet use has shifted towards younger patients (Additional file |
| 11 | Concordance with experts | Data are concordant with expert opinions; findings can be explained in a clinical context | Plausibility of changes in Hb after blood transfusion | The experts concluded that the plausibility is acceptable; the 1% unexpected decreases might be explained by other factors. |
| 12 | Concordance with other databases | Findings are concordant with other databases | Comparison of findings with SCANDAT, a Scandinavian transfusion database | The SCANDAT database has similar external concordance, completeness and linkage rates. |
| Internal validation | | | | |
| 2 | Linkage data sources within DWH | Entities occurring in multiple data tables can be linked | % transfusions linked to issued products by id of the end product | 99.96% (no link for |
| | | | % products issued linked to transfusion (indicates spilling rate) | 97.65% (RBC 97.95%, PLT 99.25%, FFP 93.35%) |
| | | | % products that can be linked to donation(s); % products linked to donors | Initially 96.73%, after improving the donation numbers this increased to 99.99%; the link from product to donor was 99.98% |
| 3 | Identity | No duplicates | % duplicated transfusions (donation identification code + product type) | 0.14% (initially this was 1%; it turned out that most duplicates were split products. Due to unavailable product codes in one hospital, the broader product type had to be used) |
| | | | % duplicated donations (donation identification code + product code) | 0.005% (RBCs); 0% (FFP and PLT) |
| | | | % duplicated procedure codes | 0% (all duplicates were removed, because it was expected that double registration would occur) |
| 4 | Completeness | No missing variables or values | % non-missing values for patient ID; date of birth; gender; procedure date; Hb and thrombocyte counts; product code | 100%; 99.99%; 99.99%; 100%; 99.8% and 97.5%; 50% |
| | | | % non-missing values for donor ID; date of birth; gender; Hb value; expiration or production date | 100%; 99.995%; 100%; 98.8%; 100% |
| | | | % of transfusions that fall within at least one diagnosis start and end date | 98% (see Additional file |
| 5 | Uniformity | Measures across time and data sources all have the same units, level of detail and/or coding system | % product codes that occur in the reference list of ISBT product codes | 50% (for one hospital product code was not available) |
| | | | % diagnosis codes that occur in the reference list (of national diagnosis codes and descriptions) | 96.1% |
| | | | % of Hb measurements from hospitals and blood bank with the same level of precision | >99.6% use one decimal place in all sources |
| 6 | Time patterns | No unexplained changes over time | Compare number of donations, products and donors of subsequent (calendar) years | The observed decrease (Additional file |
| | | | Examine number of transfusions per year per product type | The relatively high decrease for FFP use (Additional file |
| | | | Examine linkage percentage of transfusions to products issued per year | In 2010 relatively many unlinked transfusions occurred (see Additional file |
| 7 | Plausibility | Data are free of identifiable errors | % donation date < date of pooling | 100% |
| | | | % within limits for number of donations per donor per year (maximum is 3 (females) or 5 (males) for whole blood and 23 for plasma) | FFP 100%; WB 99.8% (0.2% exceeds the limit, with a total of 6 or 7 donations within a year) |
| | | | % donor age ≥ 18 and ≤ 70 years (minimum and maximum age for donating) | 100% (only 0.0006% was >70 and 0.0004% was <18, and these were mainly autologous donations) |
| | | | % transfusions with increase (and decrease) in Hb level (Hb values ± 1 day around transfusion; a difference of more than ±8.8% is considered a clinical change) | 54% increases; 6% decreases; 40% no change. Of those decreasing, 97% had a diagnosis indicating high bleeding risk |
| | | | % patient age < 121 years | 100% |
| | | | Maximum number of transfusions per year | 476 transfusions per year (mainly FFP), for diagnosis TTP |
| | | | % correct gender for gynecology diagnoses | 100% |
| | | | % patients with transfusions/surgery after date of death | 0.0% ( |
| | | | % with admission date before discharge date (zero-length rule) | 100% |
| | | | % with non-negative difference between expiration and transfusion date | 99.93% |
| 8 | Event attributes | All attributes relevant to an event description are present | % of pooled products that are linked to the correct number of unique donors (in this case 5 or 6 donors contribute to one pooled platelet product) | 100% |
| | | | % of patients that are transferred to another hospital according to the ‘discharge destination’ variable | 6% |
| | | | % transfusions linked to hospitalization (indicates outpatient transfusions) | 99.16% (of which 23.64% day admissions, likely including transfusions given at the outpatient ward) |
| 9 | Consistency hospitals within DWH | No unexplained differences between hospitals | Comparison of (validity) outcomes of the hospitals | The two hospitals have very similar validity outcomes, not requiring further investigation. |
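Several of the internal validation outcomes in the table above (linkage, identity, completeness and the Hb plausibility check) reduce to simple percentages over records. The following is a minimal sketch of how such outcomes could be computed, not the authors' implementation. The record layout, field names and toy values are hypothetical, and treating the 8.8% threshold as a change relative to the pre-transfusion Hb value is an assumption.

```python
# Hypothetical toy records; field names are illustrative, not the DTD schema.
transfusions = [
    {"donation_id": "D1", "product_type": "RBC", "patient_id": "P1", "gender": "F"},
    {"donation_id": "D2", "product_type": "PLT", "patient_id": "P2", "gender": None},
    {"donation_id": "D2", "product_type": "PLT", "patient_id": "P2", "gender": "M"},
    {"donation_id": "D9", "product_type": "FFP", "patient_id": "P3", "gender": "M"},
]
issued_products = [
    {"donation_id": "D1", "product_type": "RBC"},
    {"donation_id": "D2", "product_type": "PLT"},
    {"donation_id": "D3", "product_type": "RBC"},
]
KEY = ("donation_id", "product_type")


def pct(part, whole):
    """Percentage of `part` in `whole`; NaN when there are no records."""
    return 100.0 * part / whole if whole else float("nan")


def linkage_rate(left, right, key):
    """% of records in `left` whose key also occurs in `right`."""
    right_keys = {tuple(r[k] for k in key) for r in right}
    linked = sum(1 for rec in left if tuple(rec[k] for k in key) in right_keys)
    return pct(linked, len(left))


def duplicate_rate(rows, key):
    """% of records whose key was already seen in an earlier record."""
    seen, duplicates = set(), 0
    for rec in rows:
        k = tuple(rec[f] for f in key)
        if k in seen:
            duplicates += 1
        else:
            seen.add(k)
    return pct(duplicates, len(rows))


def completeness(rows, field):
    """% of records with a non-missing value for `field`."""
    present = sum(1 for rec in rows if rec.get(field) not in (None, ""))
    return pct(present, len(rows))


def hb_change(before, after, threshold=0.088):
    """Classify an Hb change; the relative threshold is an assumption."""
    relative = (after - before) / before
    if relative > threshold:
        return "increase"
    if relative < -threshold:
        return "decrease"
    return "no change"


print(f"transfusions linked to products: {linkage_rate(transfusions, issued_products, KEY):.2f}%")
print(f"products linked to transfusions: {linkage_rate(issued_products, transfusions, KEY):.2f}%")
print(f"duplicated transfusions: {duplicate_rate(transfusions, KEY):.2f}%")
print(f"gender completeness: {completeness(transfusions, 'gender'):.2f}%")
```

Running the linkage check in both directions mirrors the table: unlinked transfusions point to missing product records, whereas unlinked issued products point to products that were never transfused.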
Hospital characteristics (for the year 2014)
| | Hospital A | Hospital B |
|---|---|---|
| Number of beds | 1100 | 471 |
| Annual number of RBC transfusions | 12,653 | 6681 |
| Presence of typical transfusion specialisms | Hematology, oncology, thoracic surgery, trauma center | Hematology, oncology, thoracic surgery, trauma center (and heavy emphasis on major vascular / aneurysm surgery and obstetrics) |
Comparison of validity outcomes in the SCANDAT study and the current DTD results
| Outcome | SCANDAT 1/2 | DTD example |
|---|---|---|
| External concordance of database and official statistics on the number of transfusions | >97% | >98.7% for products and 99.96% for transfusions |
| % transfusions linked to the corresponding donor | 95% | 99.99% |
| % transfusions linked to hospitalization | 88.7% | 99.2% (of which 23.6% day admissions) |
| % duplicated donations and transfusions | 4.9% (donations) and 9.1% (transfusions) | 0% (donations) and 0.14% (transfusions) |
| % missing or invalid values for identification number or date values | Range from 0.1% to 3.6% | 0%–0.01% |
| Time patterns for donations and transfusion counts | In 1 year approximately 160,000 transfusions were missing; it took 2 years for the number of donations and transfusions to stabilize after the start of a new registration system | In 1 year the link of transfusions to products could not be made for 2.2%; however, this could be improved by adding donation data from the previous year |
| % of recipients with records of receiving a blood transfusion in two or more local registers | 8.9% | 6% of patients are transferred to another hospital |
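The external concordance outcome compares product counts in the data warehouse with counts from an official report. The source does not state the exact formula for "% agreement", so the sketch below assumes one plausible definition: the smaller count divided by the larger count. All counts are hypothetical and are not taken from the DTD or SCANDAT.

```python
def percent_agreement(report_count, dwh_count):
    """Assumed definition: smaller count as a percentage of the larger count."""
    if report_count < 0 or dwh_count < 0:
        raise ValueError("counts must be non-negative")
    largest = max(report_count, dwh_count)
    if largest == 0:
        return 100.0  # both sources agree that nothing was recorded
    return 100.0 * min(report_count, dwh_count) / largest


# Hypothetical yearly product counts per product type.
report = {"RBC": 1000, "PLT": 200, "FFP": 100}
dwh = {"RBC": 990, "PLT": 205, "FFP": 100}

per_product = {p: percent_agreement(report[p], dwh[p]) for p in report}
overall = percent_agreement(sum(report.values()), sum(dwh.values()))

for product, value in per_product.items():
    print(f"{product}: {value:.1f}%")
print(f"overall: {overall:.1f}%")
```

A symmetric ratio treats over-registration and under-registration in the warehouse alike. If only missing records matter, dividing the warehouse count by the report count would be the more direct measure.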