Literature DB >> 12720565

Training in data definitions improves quality of intensive care data.

Daniëlle G T Arts¹, Rob J Bosman, Evert de Jonge, Johannes C A Joore, Nicolette F de Keizer.

Abstract

BACKGROUND: Our aim was to assess the contribution of training in data definitions and data extraction guidelines to improving quality of data for use in intensive care scoring systems such as the Acute Physiology and Chronic Health Evaluation (APACHE) II and Simplified Acute Physiology Score (SAPS) II in the Dutch National Intensive Care Evaluation (NICE) registry.
METHODS: Before and after attending a central training programme, a training group of 31 intensive care physicians from Dutch hospitals who were newly participating in the NICE registry extracted data from three sample patient records. The 5-hour training programme provided participants with guidelines for data extraction and strict data definitions. A control group of 10 intensive care physicians, who were trained according the to train-the-trainer principle at least 6 months before the study, extracted the data twice, without specific training in between.
RESULTS: In the training group the mean percentage of accurate data increased significantly after training for all NICE variables (+7%, 95% confidence interval 5%-10%), for APACHE II variables (+6%, 95% confidence interval 4%-9%) and for SAPS II variables (+4%, 95% confidence interval 1%-6%). The percentage data error due to nonadherence to data definitions decreased by 3.5% after training. Deviations from 'gold standard' SAPS II scores and predicted mortalities decreased significantly after training. Data accuracy in the control group did not change between the two data extractions and was equal to post-training data accuracy in the training group.
CONCLUSION: Training in data definitions and data extraction guidelines is an effective way to improve quality of intensive care scoring data.

Entities: Chemical Disease Species

Mesh：

Year: 2003 PMID： 12720565 PMCID： PMC270628 DOI： 10.1186/cc1886

Source DB: PubMed Journal: Crit Care ISSN： 1364-8535 Impact factor: 9.097

Introduction

A number of regional and (inter)national intensive care registries have been developed to enable quality assessment in intensive care, including the British Intensive Care National Audit & Research Centre (ICNARC) and the IMPACT project from the American Society of Critical Care Medicine. These registries enable evaluation of the effectiveness and efficiency of the care process. In 1996 the National Intensive Care Evaluation (NICE) registry was set up in The Netherlands for the same reason. A minimum data set is extracted from the patient record for every patient admitted to each of 21 intensive care units (ICUs) currently participating in the NICE registry. Scoring systems such as the Acute Physiology and Chronic Health Evaluation (APACHE) II [1] and the Simplified Acute Physiology Score (SAPS) II [2] are used to calculate a score for each patient based on the most abnormal data from the first 24 hours following intensive care admission; from this they quantify the severity of illness and calculate the corresponding probability of in-hospital mortality. As an indicator for quality assessment of intensive care, the observed mortality in the intensive care population is compared with the calculated case–mix corrected mortality in that population. The use of such an intensive care registry for quality assessment strongly depends on the quality of registry data. Several studies have shown interobserver and intraobserver variability in calculation of severity-of-illness scores [3-10]. Fery-Lemonnier and coworkers [3] discussed some of the problems that cause inaccurate APACHE II data collection, including ambiguous definitions and complex calculations. Chen and coworkers [4] also cited lack of clear instructions concerning the timing of APACHE II data collection as a source of variability. In order to improve quality of data, the NICE registry implemented a framework of quality assurance procedures [11]. As a part of this framework the NICE foundation has defined all variables. The NICE data definitions are present in a data dictionary that is available on the internet [12]. At least two physicians per ICU are obliged to attend a central training session organized by the NICE board, during which the data definitions are discussed. Physicians who have attended the central training session train their local staff. This is called the 'train-the-trainer' principle. The objective of the present study was to investigate whether this total training concept improves data quality and increases the validity of severity-of-illness scoring in the Dutch NICE registry.

Methods

Sample data

Two intensive care physicians who were experienced in severity scoring and formerly involved in composing the NICE data definitions selected three sample patient cases. These cases were modified to include some potential pitfalls in data extraction (e.g. abnormal physiological values just before or 24 hours after admission to the ICU). The NICE dataset consists of 88 variables (37 categorical variables, 43 numerical variables, six date/time variables, and two strings). In order to reduce errors associated with identifying the worst value, NICE requires the lowest and the highest values recorded in the first 24 hours, Subsequently a central computer algorithm selects the worst value. The standardized data definitions used in the NICE registry are in agreement with widely accepted data definitions used in the severity-of-illness scoring models (e.g. APACHE II and SAPS II) [1,2,13-15]. According to these definitions, the two physicians reached consensus on values for all data items for the three sample patient cases. These values were considered the 'gold standard'.

Training group

Between February 1999 and May 2001, four central NICE training sessions took place, which were attended by a total of 31 participants. Training session participants were physicians from ICUs that intended to participate in the NICE registry. Each training session took approximately 5 hours. All training participants received a copy of the NICE data dictionary. During the sessions the definitions of all variables and the data extraction guidelines were discussed and practiced with some patient cases. These central training sessions were given by members of the NICE board who had been involved in the composition of the NICE data definitions and were highly experienced with severity-of-illness scoring systems. All training participants received photocopies of the records from the three selected sample patient cases. They were asked to extract the NICE data from these records into specially designed paper forms 1 week before attending the training session and within 1 month afterward.

Control group

In order to assess the effect on quality of data extracted for the same patient records twice (without training), photocopies of the records from the same three sample patient cases were issued to a control group. The control group consisted of 10 randomly selected physicians and registrars working in one of the ICUs that had been routinely extracting NICE data for several years. The control group had been locally instructed on data definitions and guidelines at least 6 months before the study, according to the 'train-the-trainer' principle, by one of the intensive care staff who had previously attended a central NICE training session. A copy of the NICE data dictionary was available to all control group members. The control group was asked to extract data from the sample patient records twice at an interval of 4–6 weeks without training in between. After the first extraction both the training group and the control group were informed about the study design, implying that there would be a second data extraction 1 month after the first. Participants in the training group and in the control group did not receive their results for the first data extraction before they had completed the second.

Analysis of data quality

For both groups the recorded data were included only if a physician or registrar had extracted the data twice (before and after training, or for the first and the second data extractions). We analysed the accuracy of recorded data for three different data subsets: all variables in the NICE registry (n = 88), APACHE II variables (n = 35) and SAPS II variables (n = 26). Data accuracy for all three subsets was determined by comparing the extracted data with the gold standard data. The criteria used for analyzing the accuracy of APACHE II and SAPS II variables were different from those used for all NICE variables.

Criteria for all NICE variables

When assessing data quality for all NICE variables, categorical values, strings, dates and times were judged inaccurate when they were incomplete (an item left blank when, according to the gold standard, it was available) or not equal to the gold standard value. Numerical values were considered inaccurate when they were incomplete or deviated from the gold standard value by a degree greater than was considered acceptable. For example, a deviation in systolic blood pressure of more than 10 mmHg below or above the gold standard systolic blood pressure was considered inaccurate. Detailed criteria are available from the authors.

Criteria for APACHE II variables and SAPS II variables

APACHE II and SAPS II data were judged inaccurate if they caused a deviation from the gold standard score for that particular APACHE II or SAPS II variable. For instance: a recorded mean blood pressure of 130 mmHg instead of the gold standard value of 127 would be considered inaccurate because the first results in 3 APACHE II points for blood pressure and the latter in only 2 APACHE II points.

Statistics

Values are presented as percentage accurate data and as absolute deviations from gold standard APACHE II and SAPS II scores and predicted mortalities. The percentage accurate data per participant per case was calculated by dividing the number of correctly recorded data items by the total number of data items that should have been recorded. A 95% confidence interval was calculated for all medians. For the training group and for the control group, differences in percentages of accurate data or in deviations from gold standard scores between the first and the second scoring were tested using the Wilcoxon signed rank sum test. P < 0.05 was considered statistically significant. We analyzed the type of data errors and compared their frequency of occurrence between the two data extractions by the training group. All data analyses were performed using SPSS software version 10.0 (SPSS Inc, Chicago, IL, USA).

Results

Group characteristics

Of all 31 training participants, 22 extracted the NICE data for one or more of the three sample patient cases before and after training. A total of 55 sample cases were evaluated. Eight of the 10 physicians and registrars in the control group returned the completed data collection forms for all three sample cases for the first and the second data extractions. This resulted in 24 cases. Training and control group characteristics are described in Table 1. Participants in the training group had more experience with APACHE II (82%) than with SAPS II (41%). Participants who did not extract data for all three cases before as well as after training did not differ in data accuracy as compared with those who fully extracted data for all three cases.

Table 1

Group characteristics

Variables	Training group	Control group
N	22	8
Participant types	Intensive care physicians (22)	Intensive care physicians (6), Intensive care registrars (2)
Gender (male/female)	18/4	6/2
Completed patient cases (total)	60	24
Prior experience in
APACHE II	18 (82%)	8 (100%)
SAPS II	9 (41%)	8 (100%)

APACHE, Acute Physiology and Chronic Health Evaluation; SAPS, Simplified Acute Physiology Score.

Percentage complete and accurate data items

Table 2 shows the percentage accurate data. Results of the Wilcoxon signed rank sum test indicate that, for all three data subsets, the percentage accurate data increased after training (P < 0.01) in the training group. The percentage accurate data for all NICE variables was 79% before and 86% after training. The control group showed no difference between the two sessions for all three data subsets. In the control group, the percentages of accurate data in the first and the second scoring were 86% and 85%, respectively.

Table 2

Percentage complete and accurately recorded data items for all NICE variables, APACHE II variables and SAPS II variables

Group	All variables NICE (n = 88)	APACHE II variables (n = 35)	SAPS II variables (n = 26)
Training group (60 cases)
Before training	79 (74–84)	82 (77–88)	89 (86–92)
After training	86 (82–91)	89 (87–91)	93 (91–96)
Difference	7 (2–11)*	6 (2–9)*	4 (0–5)*
Control group (24 cases)
1st data extraction	86 (80–91)	86 (81–90)	94 (90–94)
2nd data extraction	85 (75–91)	86 (77–92)	93 (82–95)
Difference	3 (-10 to +9)	1 (-6 to +7)	1 (-10 to +5)

Values are pressed as median percentage (95% confidence interval). *P < 0.05. APACHE, Acute Physiology and Chronic Health Evaluation; NICE, National Intensive Care Evaluation; SAPS, Simplified Acute Physiology Score.

Table 3 displays data items with an accuracy percentage below 75%. A relatively large number of inaccurate values was recorded for physiological data and laboratory data. 'Body temperature' was one of the physiological variables with a high error rate. Before training as well as after, the 'mean blood pressure' was frequently left blank and the 'mean alveolar–arterial oxygen difference' was frequently incorrect. These two variables must be calculated on the basis of other physiological variables. The APACHE II diagnosis was often erroneously extracted.

Table 3

NICE data items with percentage accuracy below 75% before training in the training group (for all NICE variables)

	% Accurate

Data item	Before training	After training
Body temperature	18	40
Alveolar–arterial oxygen difference	23	47
Mean arterial blood pressure	42	53
APACHE II diagnosis	48	54
Respiratory rate	52	68
Admission type	62	68
Urine output (8 hours)	70	87
Glasgow Coma Scale score	73	85
All data items	79	86

Sixty cases were assessed in total in the training group. APACHE, Acute Physiology and Chronic Health Evaluation; NICE, National Intensive Care Evaluation.

Types of data errors

Data errors were categorized into three types: (1) incomplete data; (2) nonadherence to data definitions, such as the selection of values outside the first 24 hours of intensive care admission, inaccurate calculation and ignoring specific guidelines; and (3) other errors that could not directly be accounted for, such as writing errors, transposed minimal and maximal values, or values that were not in the source data. The percentages of data error types are displayed in Table 4. All types of data errors exhibited a decrease after training. Nonadherence to data definitions decreased the most after training. Incomplete data mostly concerned respiratory rate and mean blood pressure. Inaccurate data from outside the first 24 hours of intensive care admission affected all kinds of physiological and laboratory data. Calculation errors mostly pertained to urine production and alveolar–arterial oxygen difference. Blood gas values should be selected from the sample that results in the highest alveolar–arterial oxygen difference; incorrect selection of blood gas samples resulted in inaccurate values for all five blood gas variables (i.e. partial arterial oxygen pressure, partial arterial carbon dioxide pressure, fractional inspired oxygen, alveolar–arterial oxygen difference and pH), causing a relatively high increase in the percentage of inaccurate data. Other variables for which training participants frequently ignored data definitions were admission type, diagnoses, Glasgow Coma Scale scores and body temperature. Measurements of body temperature from the rectum, blood, oesophagus or ear are considered to be core temperatures. According to the definition 1°C should be added to temperatures measured at the patient's groin; this was often forgotten. Other errors that could not directly be accounted for were randomly distributed among all variables. In most of these cases the documented data value did not correspond to the true lowest or highest value.

Table 4

Types of data extraction errors and their frequency of occurrence in the training group before and after training (for all NICE variables)

	Frequency (%)

Data error type	Before training	After training	Difference
Incomplete	7.4	4.5	2.9*
Nonadherence to data definitions	11.4	7.9	3.5*
Inclusion of values outside first 24 hours	3.8	1.3	2.5*
Calculation error	2.8	2.3	0.5
Other	4.8	4.3	0.5*
Other (errors that could not directly be accounted for)	2.1	1.3	0.8
Total inaccurate data	21.2	13.9	7.3*

Sixty cases were assessed in total in the training group. * P < 0.05.

Severity of illness scores and predicted mortalities

Results of the Wilcoxon signed rank sum test indicate that, after training, SAPS II scores and mortality probabilities in the training group showed less absolute deviation from gold standard scores and predicted mortalities (P = 0.002 in both cases; Table 5). APACHE II scores and mortality probabilities showed no improvement after training. In the control group, deviations from SAPS II and APACHE II gold standard scores and mortality probabilities were equally large at both data extractions.

Table 5

Deviation from the gold standard severity-of-illness scores and mortality probabilities for training and control groups

Group	SAPS II score	SAPS II probability of death	APACHE II score	APACHE II probability of death
Training group (60 cases)
Before	7.5 (5 to 10)	0.17 (0.11 to 0.23)	4 (2 to 5)	0.14 (0.11 to 0.21)
After	4 (3 to 6)	0.09 (0.06 to 0.12)	2.5 (2 to 4)	0.12 (0.07 to 0.17)
Difference	-2 (-4 to 0)*	-0.05 (-0.09 to 0)*	0 (-1 to +1)	0 (-0.01 to +0.10)
Control group (24 cases)
1st extraction	5 (3 to 8)	0.11 (0.07 to 0.19)	3 (2 to 5)	0.11 (0.07 to 0.17)
2nd extraction	5.5 (2 to 9)	0.12 (0.04 to 0.19)	2 (1 to 3)	0.10 (0.04 to 0.17)
Difference	0.5 (0 to 3)	0.008 (0 to 0.07)	-1 (-3 to +1)	-0.002 (-0.06 to +0.04)

Values are expressed as median absolute deviation (95% confidence interval). *P < 0.05. APACHE, Acute Physiology and Chronic Health Evaluation; NICE, National Intensive Care Evaluation; SAPS, Simplified Acute Physiology Score.

Discussion

Based on the results of the present study we may conclude that training in data definitions and data collection guidelines improves data quality in general. Before training, many variables were incorrect and incomplete; this was probably due to the fact that participants were unacquainted with most of the data definitions. After training, completeness and adherence to data definitions increased significantly. It could be argued that the decrease in errors after training was not the result of training but simply the result of extracting the same data twice within 2 months. Therefore, a control group was included, members of which also extracted the data of the same three cases twice with an interval of 4–6 weeks but without any training (or other intervention) in between. In that group, no difference was observed in data accuracy between the first and second data extractions. The fact that data quality did not change between the first and second data extractions in the control group suggests that simply assessing the same cases twice did not influence data quality in the training group. However, it cannot be ruled out that data quality in the control group was already optimal at the first data extraction, making it impossible to improve further ('ceiling effect'). Data accuracy in the control group was equal to the data accuracy after training in the training group. We could conclude from this that central and local training sessions (in the control group) are equally effective and that the effect of training remains for a longer period. Alternatively, the difference in baseline data quality might have been a reflection of different characteristics of both groups. For example, the number of participants in both groups was not equal, and the control group consisted of physicians from one hospital whereas the participants in the training group were from different sites. We only evaluated the short-term effects of training on data quality. Further studies are necessary to determine how long these beneficial effects will last. In the present study we found that, after training, 14% of all data items were still incomplete or inaccurate. We recently examined the quality of data contained in the NICE registry and found a considerable lower error rate (6%) [16]. The different findings in these two studies may have various reasons. First, in the present study the cases were specially selected for evaluation of data quality and contained many artificially incorporated pitfalls in data extraction. Second, in contrast to real data extraction for the NICE registry, no automatic data checks were run on the extracted data. Finally, in reality, data are extracted by the treating physician. For this study, the physicians had to extract data from copies of patient records that they were not familiar with and from patients they had never seen. The NICE registry is primarily used to calculate severity-of-illness scores and predicted mortality based on these scores. The validity of SAPS II scores and mortality probabilities improved after training, whereas the validity of APACHE II scores and mortality probabilities did not. This difference is probably accounted for by the fact that, before training, only a few participants were experienced in SAPS II data collection and almost all participants were familiar with APACHE II data collection. The deviations from gold standard scores and probabilities found in the present study, even after training, are still considerable and may not support their use in clinical practice. However, the reliability of severity-of-illness scores was found to be sufficient by two other studies [7,16]. These different findings can be explained by differences in study designs, such as the incorporation of artificial pitfalls in the present study. Four APACHE II variables, namely temperature, alveolar–arterial oxygen difference, mean arterial blood pressure and APACHE II diagnostic category, exhibited a high percentage of incorrect data before and after training. These variables have similarly been reported by other researchers to have low accuracy and reliability rates [3,4,7]. In a study conducted by Chen and coworkers [4], variables involving calculations, such as the alveolar oxygen difference, were found to have the lowest agreement. Several studies suggest that the ambiguous definitions for some of the APACHE II medical terms are an important cause of the wide interobserver variations [3-5]. Physicians, researchers and decision makers should be aware of the variability in severity-of-illness scores and mortality probabilities, and take them into account. To increase data accuracy and reduce variability on an international level, there should be an international agreement on unambiguous definitions for all variables used in APACHE II and SAPS II models. A study conducted by Polderman and coworkers [10] showed that training in data extraction reduced the interobserver variability in APACHE II scoring in a single university hospital setting. It is possible that the positive effect of training in their study was overestimated because their training programme focused on the errors observed in the first data extraction episode, before training. The content of our NICE training programme was determined before we started the present study and was not affected by the results of the first data extraction. Many ICUs collect severity-of-illness scores. Data transcription from a patient record to a case record form is the most commonly used method for collection of these scores. Therefore, we believe that the positive effect of training found in the present study will also be found in other intensive care registries and clinical trials. Although it is probably not possible to have an intensive care registry that is completely free of errors, this study shows that centrally organized training in data definitions, which is further diffused by the train-the-trainer principle, is an important basis for good data quality.

Competing interests

None declared.

Key messages

• Good definitions for all data items are a prerequisite for highly accurate data collection • Training in data extraction and definitions appears to be effective in improving quality of intensive care data • Within the scoring system data, the positive training effect was only proven for the SAPS II data, in which the study population was less experienced as compared with APACHE II • The positive effect of central training may be diffused by the train-the-trainer principle • Medical registries should implement quality assurance programmes in order to optimize their data quality. These programmes should include training sessions, in combination with other quality assurance procedures

Abbreviations

APACHE = Acute Physiology and Chronic Health Evaluation; ICU = intensive care unit; NICE = National Intensive Care Evaluation; SAPS = Simplified Acute Physiology Score.

15 in total

1. Inter-observer variability in APACHE II scoring: effect of strict guidelines and training.

Authors: K H Polderman; E M Jorna; A R Girbes
Journal: Intensive Care Med Date: 2001-08 Impact factor: 17.440

Review 2. Defining and improving data quality in medical registries: a literature review, case study, and generic framework.

Authors: Danielle G T Arts; Nicolette F De Keizer; Gert-Jan Scheffer
Journal: J Am Med Inform Assoc Date: 2002 Nov-Dec Impact factor: 4.497

3. Reliability of a measure of severity of illness: acute physiology of chronic health evaluation--II.

Authors: A M Damiano; M Bergner; E A Draper; W A Knaus; D P Wagner
Journal: J Clin Epidemiol Date: 1992-02 Impact factor: 6.437

4. Interobserver variability in data collection of the APACHE II score in teaching and community hospitals.

Authors: L M Chen; C M Martin; T L Morrison; W J Sibbald
Journal: Crit Care Med Date: 1999-09 Impact factor: 7.598

5. APACHE II: a severity of disease classification system.

Authors: W A Knaus; E A Draper; D P Wagner; J E Zimmerman
Journal: Crit Care Med Date: 1985-10 Impact factor: 7.598

6. Quality of data collected for severity of illness scores in the Dutch National Intensive Care Evaluation (NICE) registry.

Authors: Daniëlle Arts; Nicolette de Keizer; Gert-Jan Scheffer; Evert de Jonge
Journal: Intensive Care Med Date: 2002-04-13 Impact factor: 17.440

7. The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults.

Authors: W A Knaus; D P Wagner; E A Draper; J E Zimmerman; M Bergner; P G Bastos; C A Sirio; D J Murphy; T Lotring; A Damiano
Journal: Chest Date: 1991-12 Impact factor: 9.410

8. A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study.

Authors: J R Le Gall; S Lemeshow; F Saulnier
Journal: JAMA Date: 1993 Dec 22-29 Impact factor: 56.272

9. Prospective evaluation of residents and nurses as severity score data collectors.

Authors: A W Holt; L K Bury; A D Bersten; G A Skowronski; A E Vedig
Journal: Crit Care Med Date: 1992-12 Impact factor: 7.598

10. Mortality probability models for patients in the intensive care unit for 48 or 72 hours: a prospective, multicenter study.

Authors: S Lemeshow; J Klar; D Teres; J S Avrunin; S H Gehlbach; J Rapoport; M Rué
Journal: Crit Care Med Date: 1994-09 Impact factor: 7.598

8 in total

1. Redesign of diagnostic coding in pediatrics: from form-based to discharge letter linked.

Authors: Hilco Prins; Hans Büller; Betty Zwetsloot-Schonk
Journal: Perspect Health Inf Manag Date: 2004-12-07

2. Perfusion Downunder Collaboration Database--data quality assurance: towards a high quality clinical database.

Authors: Sigrid C Tuble
Journal: J Extra Corpor Technol Date: 2011-03

3. Comparison of sequential organ failure assessment (SOFA) scoring between nurses and residents.

Authors: Nur Baykara; Kaan Gökduman; Tülay Hoşten; Mine Solak; Kamil Toker
Journal: J Anesth Date: 2011-09-20 Impact factor: 2.078

4. Use of the All Patient Refined-Diagnosis Related Group (APR-DRG) Risk of Mortality Score as a Severity Adjustor in the Medical ICU.

Authors: Daniel Baram; Feroza Daroowalla; Ruel Garcia; Guangxiang Zhang; John J Chen; Erin Healy; Syed Ali Riaz; Paul Richman
Journal: Clin Med Circ Respirat Pulm Med Date: 2008-04-18

5. Hospital mortality is associated with ICU admission time.

Authors: Hans A J M Kuijsten; Sylvia Brinkman; Iwan A Meynaar; Peter E Spronk; Johan I van der Spoel; Rob J Bosman; Nicolette F de Keizer; Ameen Abu-Hanna; Dylan W de Lange
Journal: Intensive Care Med Date: 2010-06-15 Impact factor: 17.440

6. Evaluating the effectiveness of a tailored multifaceted performance feedback intervention to improve the quality of care: protocol for a cluster randomized trial in intensive care.

Authors: Sabine N van der Veer; Maartje L G de Vos; Kitty J Jager; Peter H J van der Voort; Niels Peek; Gert P Westert; Wilco C Graafmans; Nicolette F de Keizer
Journal: Implement Sci Date: 2011-10-24 Impact factor: 7.327

7. Quantifying data quality for clinical trials using electronic data capture.

Authors: Meredith L Nahm; Carl F Pieper; Maureen M Cunningham
Journal: PLoS One Date: 2008-08-25 Impact factor: 3.240

8. Case mix, outcome and length of stay for admissions to adult, general critical care units in England, Wales and Northern Ireland: the Intensive Care National Audit & Research Centre Case Mix Programme Database.

Authors: David A Harrison; Anthony R Brady; Kathy Rowan
Journal: Crit Care Date: 2004-02-26 Impact factor: 9.097

8 in total