Matthew K H Hong, Henry H I Yao, John S Pedersen, Justin S Peters, Anthony J Costello, Declan G Murphy, Christopher M Hovens, Niall M Corcoran.
Abstract
OBJECTIVE: Data errors are a well-documented part of clinical datasets as is their potential to confound downstream analysis. In this study, we explore the reliability of manually transcribed data across different pathology fields in a prostate cancer database and also measure error rates attributable to the source data.
Keywords: clinical informatics; data quality; database; error sources; prostate cancer
Year: 2013 PMID: 23793682 PMCID: PMC3657671 DOI: 10.1136/bmjopen-2012-002406
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1. Schematic representation of the digital import of pathology data. Structured ‘synoptic’ reports facilitated digital recognition of relevant pathology fields. (A) Demographics data were used to link reports to individual patients in the database and (B) individual data were then extracted from the report and directed to populate relevant fields in the main database.
Figure 2. Schematic representation of the comparison of a dataset imported digitally and in parallel to a manually entered dataset. (A) Records were linked using unique patient identifiers and (B) pathology fields were individually compared. Concordant data were flagged for merging in order to eliminate duplicate data. (C) Mismatches were used to identify errors in the manual entry dataset. (D) We compared across all pathology fields for individual patients.
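The comparison described in Figure 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dictionaries keyed by a patient identifier), the field names, and the helper names `normalise` and `compare_datasets` are hypothetical and do not come from the study's actual database.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: patient identifier -> {field name: recorded value}
Record = Dict[str, str]


def normalise(value: str) -> str:
    """Ignore case and surrounding whitespace so 'T3a ' matches 't3a'."""
    return value.strip().lower()


def compare_datasets(
    manual: Dict[str, Record],
    imported: Dict[str, Record],
    fields: List[str],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str, str]]]:
    """Compare two datasets field by field for patients present in both.

    Returns (concordant, mismatches):
      concordant -> (patient_id, field) pairs that agree and could be merged
      mismatches -> (patient_id, field, manual_value, imported_value) for review
    """
    concordant = []
    mismatches = []
    # (A) link records on the shared patient identifier
    for patient_id in sorted(manual.keys() & imported.keys()):
        # (B) compare each pathology field individually
        for field in fields:
            manual_value = manual[patient_id].get(field)
            imported_value = imported[patient_id].get(field)
            if manual_value is None or imported_value is None:
                continue  # nothing to compare when one side is missing
            if normalise(manual_value) == normalise(imported_value):
                concordant.append((patient_id, field))
            else:
                # (C) a mismatch only flags a candidate error; the source
                # report still has to be checked to decide which side is wrong
                mismatches.append((patient_id, field, manual_value, imported_value))
    return concordant, mismatches
```

Keeping the mismatching values alongside the patient identifier and field name makes it straightforward to review each discrepancy against the original report.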
Digitally imported cases containing pathological stage and extraprostatic extension variables where comparison enabled analysis of error within the source pathology reports
| Pathological stage | Number |
|---|---|
| T2a | 131 |
| T2b | 4 |
| T2c | 543 |
| T3a | 225 |
| T3b | 68 |
| T4 | 0 |
| Extraprostatic extension | |
| Absent | 670 |
| Present | 302 |
| Total cases for comparison | 971 |
Analysis of error rate in individual pathology fields in a radical prostatectomy dataset
| Pathology field | Variable type | Data format | Total data points | Error | Error rate (%) | (95% CI) |
|---|---|---|---|---|---|---|
| Gleason 1 | Categorical | Numeric | 415 | 2 | 0.5 | (0.06 to 1.7) |
| Gleason 2 | Categorical | Numeric | 415 | 3 | 0.7 | (0.15 to 2.1) |
| Gleason score | Categorical | Numeric | 415 | 1 | 0.2 | (0.01 to 1.3) |
| Extraprostatic extension* | Binary | Text | 421 | 21 | 5.0 | (3.1 to 7.5) |
| Stage | Categorical | Alphanumeric | 421 | 13 | 3.1 | (1.7 to 5.2) |
| Focality | Binary | Text | 421 | 9 | 2.1 | (1.0 to 4.0) |
| Perineural invasion* | Categorical | Text | 421 | 27 | 6.4 | (4.3 to 9.2) |
| Lymphovascular invasion* | Categorical | Text | 421 | 27 | 6.4 | (4.3 to 9.2) |
| Prostatic intraepithelial neoplasia* | Categorical | Text | 420 | 27 | 6.4 | (4.3 to 9.2) |
| Margins* | Binary | Text | 386 | 5 | 1.3 | (0.42 to 3.0) |
| Tumour volume | Continuous | Numeric | 310 | 4 | 1.3 | (0.35 to 3.3) |
| Prostate dimensions† | Continuous | Numeric | 272 | 2 | 0.7 | (0.09 to 2.6) |
| Prostate weight | Continuous | Numeric | 410 | 5 | 1.2 | (0.40 to 2.8) |
| All fields | | | 5148 | 146 | 2.8 | (2.4 to 3.3) |
*Data required some interpretation on data entry; these were coded numerically.
†Each data point was a combination of three numbers. Error never occurred in more than one dimension.
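For readers who want to recompute the intervals in the table above, the following is a minimal sketch of an exact (Clopper-Pearson) binomial confidence interval using only the Python standard library. The text here does not state which interval method was used, so the choice of the exact method is an assumption, and the function name `exact_ci` is hypothetical.

```python
import math


def _log_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability P(X = k) for 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def _cdf(k: int, n: int, p: float) -> float:
    """Binomial cumulative probability P(X <= k)."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    return min(1.0, sum(math.exp(_log_pmf(i, n, p)) for i in range(k + 1)))


def _solve(k: int, n: int, target: float) -> float:
    """Find p with P(X <= k) = target by bisection (the CDF falls as p rises)."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if _cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(errors: int, total: int, alpha: float = 0.05):
    """Clopper-Pearson interval for an error proportion, as fractions (not %)."""
    if total <= 0 or not 0 <= errors <= total:
        raise ValueError("need 0 <= errors <= total and total > 0")
    # Lower limit: the p at which P(X >= errors) = alpha / 2
    lower = 0.0 if errors == 0 else _solve(errors - 1, total, 1 - alpha / 2)
    # Upper limit: the p at which P(X <= errors) = alpha / 2
    upper = 1.0 if errors == total else _solve(errors, total, alpha / 2)
    return lower, upper


# Example: the "All fields" row (146 errors in 5148 data points)
low, high = exact_ci(146, 5148)
print(f"{146 / 5148:.1%} ({low:.1%} to {high:.1%})")
```

Working in log space with `math.lgamma` avoids overflow in the binomial coefficients for the larger denominators in the table.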
Analysis of errors in source pathology data was performed by matching stage with extraprostatic extension (EPE) status. T2 disease should have been EPE negative, whereas T3 disease should have been EPE positive unless seminal vesicle involvement was documented.
| Pathological stage | Matches | Mismatches | Error rate (%) | (95% CI) |
|---|---|---|---|---|
| T2 | 672 | 6 | 0.9 | (0.33 to 1.9) |
| T3 | 292 | 1 | 0.3 | (0.01 to 1.9) |
| Total | 964 | 7 | 0.7 | (0.30 to 1.5) |
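The consistency rule behind this table (T2 should be EPE negative; T3 should be EPE positive unless seminal vesicle involvement is documented) can be expressed as a small check. The sketch below is illustrative only: the function names, the boolean encoding of EPE and seminal vesicle involvement, and the tuple layout of a case are assumptions, not the study's implementation.

```python
from typing import Dict, Iterable, Tuple


def stage_epe_consistent(
    stage: str, epe_present: bool, svi_documented: bool = False
) -> bool:
    """Return True when stage and EPE status agree under the stated rule."""
    s = stage.strip().upper()
    if s.startswith("T2"):
        # Organ-confined disease: EPE should be absent
        return not epe_present
    if s.startswith("T3"):
        # EPE should be present unless seminal vesicle involvement
        # explains the T3 stage
        return epe_present or svi_documented
    raise ValueError(f"rule is only defined for T2 and T3 stages, got {stage!r}")


def tally(cases: Iterable[Tuple[str, bool, bool]]) -> Dict[str, Dict[str, int]]:
    """Count matches and mismatches per major stage (T2 or T3).

    Each case is a hypothetical (stage, epe_present, svi_documented) tuple.
    """
    counts = {
        "T2": {"matches": 0, "mismatches": 0},
        "T3": {"matches": 0, "mismatches": 0},
    }
    for stage, epe_present, svi_documented in cases:
        group = stage.strip().upper()[:2]
        ok = stage_epe_consistent(stage, epe_present, svi_documented)
        counts[group]["matches" if ok else "mismatches"] += 1
    return counts
```

A mismatch flagged this way points to an internal inconsistency in the source report itself, independent of any transcription step.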