Lindsey A Knake, Monika Ahuja, Erin L McDonald, Kelli K Ryckman, Nancy Weathers, Todd Burstain, John M Dagle, Jeffrey C Murray, Prakash Nadkarni.
Abstract
BACKGROUND: The use of Electronic Health Records (EHR) has increased significantly in the past 15 years. This study compares electronic vs. manual data abstractions from an EHR for accuracy. While the dataset is limited to preterm birth data, our work is generally applicable. We enumerate challenges to reliable extraction, and state guidelines to maximize reliability.
Keywords: Bioinformatics; Data quality; EHR and manual chart abstraction comparison; EHR vs. Manual chart abstraction, and difference in data quality; Neonatology; PEDs data registry; Prematurity; Quality assurance
Year: 2016 PMID: 27130217 PMCID: PMC4851819 DOI: 10.1186/s12887-016-0592-z
Source DB: PubMed Journal: BMC Pediatr ISSN: 1471-2431 Impact factor: 2.125
Neonate demographics
| Characteristic | Value |
|---|---|
| Sex | |
| Male | 55.6 % |
| Female | 44.4 % |
| Ethnicity | |
| Non-Hispanic | 91.7 % |
| Hispanic | 6.3 % |
| Unknown/Not reported | 2.0 % |
| Race | |
| White | 85.4 % |
| African American | 6.0 % |
| Asian | 1.9 % |
| American Indian or Native Alaskan | 1.1 % |
| Other or more than one race | 4.2 % |
| Unknown/Not reported | 0.9 % |
| GA (weeks) | |
| < 32 | 44.2 % |
| 32–36 | 37.7 % |
| ≥37 | 18.1 % |
| Mean | 32.2 |
| Range | 22–42 |
| Birthweights (grams) | |
| Range | 328–5,006 |
| Mean | 1,989 |
| Patent ductus arteriosus (PDA) | |
| % of total Neonates | 20.4 % |
| < 32 weeks | 87.6 % |
| 32–36 weeks | 12.4 % |
| ≥ 37 weeks | 0.0 % |
Demographics of the 1,772 neonates enrolled in Iowa’s Prematurity study during the years of 2001–2011
Demographic parameters compared
| Parameter | 1. Manually abstracted database, # of subjects | 2. EHR extraction, # of subjects | 3. Discrepancy (% and # of subjects) between the databases a | 4. Manually abstracted database errors | 5. EHR-extracted data errors | 6. Median discrepancy | 7. Discrepancy range |
|---|---|---|---|---|---|---|---|
| Gestational age | 1772 | 700 | 2.6 % (18) | 1.0 % (7) | 1.3 % (9) | 1 week | 1–10 weeks |
| Birthweight | 1772 | 735 | 9.7 % (71) | 1.5 % (11) | 8.0 % (59) c **** | 13 g | 2–548 g |
| Neonate race b | 1758 | 1384 | 3.2 % (44) | !- | !- | NA | NA |
| Neonate ethnicity | 1757 | 596 | 1.5 % (9) | !- | !- | NA | NA |
| Mother race b | 1749 | 1378 | 3.2 % (45) | !- | !- | NA | NA |
| Mother ethnicity | 1739 | 595 | 5.0 % (30) | !- | !- | NA | NA |
Demographic parameters compared in the paper. The denominator for the percentage is the smaller of the corresponding values in the first two columns
! – EHR manual review data could not be used as a gold standard – often recorded as unknown or null, while the manually collected data was based on patient interviews and was more detailed. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001
a - In general, the sum of the error counts in columns 4 and 5 does not add up to the number in column 3, because the error occurred in both manually and electronically extracted data, or the cause was ambiguous
b - Re-calculated discrepancies after adjusting for the inappropriate Hispanic category in the race column
c - Difference statistically significant, p = 4.3 × 10⁻⁹ by Chi-square test
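The denominator rule stated in the table caption (the smaller of the two subject counts) can be illustrated with a minimal sketch. The function name `discrepancy_percent` is hypothetical and not from the paper; the counts are copied from the table rows above.

```python
def discrepancy_percent(n_manual: int, n_ehr: int, n_discrepant: int) -> float:
    """Percentage of discrepant subjects, using the smaller cohort as denominator.

    Hypothetical helper: it follows the caption's stated rule, not code from the paper.
    """
    denominator = min(n_manual, n_ehr)
    return round(100 * n_discrepant / denominator, 1)


# Counts copied from the demographic table (columns 1, 2, and 3).
print(discrepancy_percent(1772, 700, 18))  # gestational age
print(discrepancy_percent(1772, 735, 71))  # birthweight
```

For these two rows the sketch gives 2.6 and 9.7, matching column 3. Other rows may differ in the last digit, depending on how the published values were rounded.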
Laboratory data and PDA diagnosis compared
| Parameter | 1. Manually abstracted database, # of subjects | 2. EHR extraction, # of subjects | 3. Discrepancy (% and # of subjects) between the databases a | 4. Manually abstracted database errors | 5. EHR-extracted data errors | 6. Median discrepancy | 7. Discrepancy range |
|---|---|---|---|---|---|---|---|
| 1st WBC count b | 1257 | 1437 | 3.2 % (40) | 2.5 % (32) | 0.6 % (8) c *** | 0.75 k/mm³ | 0.01–109 k/mm³ |
| 1st Hemoglobin | 1333 | 1460 | 11.9 % (158) | 5.8 % (77) | 8.3 % (110) d * | 1.4 g/dl | 0.1–25.9 g/dl |
| Peak total bilirubin | 1565 | 1336 | 11.4 % (152) | 6.9 % (92) | 5.1 % (68) e * | 1.45 mg/dl | 0.1–15.2 mg/dl |
| Peak direct bilirubin | 681 | 674 | 4.9 % (33) | 4.5 % (30) | 0.9 % (6) f **** | 0.5 mg/dl | 0.1–16.4 mg/dl |
| PDA | 512 | 414 | 12.8 % (53) | 12.8 % (53) | 8.2 % (34) g *** | NA | NA |
Laboratory data and PDA parameters compared in the paper. The denominator for the percentage is the smaller of the corresponding values in the first two columns
*P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001
a - In general, the sum of the error counts in columns 4 and 5 does not add up to the number in column 3, because the error occurred in both manually and electronically extracted data, or the cause was ambiguous
b - Re-calculated discrepancies after adjusting for the inappropriate Hispanic category in the race column
c - Difference statistically significant, p = 1.3 × 10⁻⁴ by Chi-square test
d - Difference statistically significant, p = 0.012 by Chi-square test
e - Difference statistically significant, p = 0.05 by Chi-square test
f - Difference statistically significant, p = 4.9 × 10⁻⁵ by Chi-square test
g - Difference statistically significant, p = 0.001 by Chi-square test
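The footnotes report Chi-square p-values without giving the underlying contingency tables. The sketch below shows one plausible reading, not the authors' confirmed method. It assumes an uncorrected Pearson Chi-square test on a 2 × 2 table (errors vs. non-errors, manual vs. EHR), with the smaller subject count as the denominator for both groups. The function name `chi_square_2x2_p` is hypothetical.

```python
import math


def chi_square_2x2_p(errors_manual: int, errors_ehr: int, n: int) -> float:
    """p-value of an uncorrected Pearson Chi-square test on a 2 x 2 table.

    Assumption (not stated in the source): both error counts share the
    same denominator n, giving rows [errors, n - errors] for each method.
    """
    a, b = errors_manual, n - errors_manual
    c, d = errors_ehr, n - errors_ehr
    total = a + b + c + d
    # Closed-form Pearson statistic for a 2 x 2 table.
    chi2 = total * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a Chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Error counts from columns 4 and 5; n is the smaller of columns 1 and 2.
print(chi_square_2x2_p(32, 8, 1257))    # 1st WBC count
print(chi_square_2x2_p(77, 110, 1333))  # 1st hemoglobin
```

Under these assumptions the results are close to the p-values in footnotes c and d (about 1.3 × 10⁻⁴ and 0.012). That agreement suggests the reading is reasonable but does not confirm it.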