Sian Taylor-Phillips, Markus C Elze, Elizabeth A Krupinski, Kathryn Dennick, Alastair G Gale, Aileen Clarke, Claudia Mello-Thoms.
Abstract
The vigilance decrement describes a decrease in sensitivity or increase in specificity with time on task. It has been observed in a variety of repetitive visual tasks, but little is known about these patterns in radiologists. We investigated whether there is systematic variation in performance over the course of a radiology reading session. We re-analyzed data from six previous lesion-enriched radiology studies. Studies featured 8-22 participants assessing 27-100 cases (including mammograms, chest CT, chest x-ray, and bone x-ray) in a reading session. Changes in performance and speed as the reading session progressed were analyzed using mixed effects models. Time taken per case decreased 9-23% as the reading session progressed (p < 0.005 for every study). There was a sensitivity decrease or specificity increase over the course of reading 100 chest x-rays (p = 0.005), 60 bone fracture x-rays (p = 0.03), and 100 chest CT scans (p < 0.0001). This effect was not found in the shorter mammography sessions with 27 or 50 cases. We found evidence supporting the hypothesis that behavior and performance may change over the course of reading an enriched test set. Further research is required to ascertain whether this effect is present in radiological practice.
Year: 2015 PMID: 25005866 PMCID: PMC4305061 DOI: 10.1007/s10278-014-9717-9
Source DB: PubMed Journal: J Digit Imaging ISSN: 0897-1889 Impact factor: 4.056
Characteristics of studies included in the analysis
| Study | Participants | Cases | Reading session length (cases) | Performance measure | Task | Prevalence of abnormality (%) | Treatment in original investigation |
|---|---|---|---|---|---|---|---|
| Krupinski et al. 2007 [ | 18 | 100 | 100 | Confidence of positive decisions (1 = nodule definitely absent, 6 = nodule definitely present) | Pulmonary nodule detection in chest x-rays | 50 | 8- or 11-bit displays |
| Mello-Thoms et al. 2008, 2009 [ | 8 | 50 | 50 | Confidence of positive decisions 1 to 3 (negative decisions treated as 0) | Detection of abnormal screening mammograms | 76 | Eye tracking study |
| Krupinski et al. 2010 [ | 20 | 60 | 60 | Confidence of positive decisions 0 to 100 % in 10 % intervals (negative decisions treated as 0 %) | Detection of bone fracture on x-rays (all body areas) | 50 | Before or after a workday |
| Taylor-Phillips et al. 2012 [ | 8 | 162 | 27 | Probability of malignancy from 0 % definitely not malignant to 100 % definitely malignant (negative decisions treated as 0 %) | Detection of abnormal screening mammograms | 41 | Digital mammography with digitized, film, or no previous mammograms |
| Krupinski et al. 2012 [ | 22 | 100 | 100 | Confidence of positive decisions 0 to 100 % (negative decisions treated as 0 %) | Detecting lung nodules on CT scans | 50 | Before or after a workday |
Prevalence refers to the proportion of cases that contained the abnormality observers were searching for in the task.
Fig. 1 Mean scores assigned to normal (dashed lines) and abnormal cases (solid lines) as the reading session progressed. To provide a convenient visualization, the cases are grouped into batches of 25 cases and mean scores are calculated for each of those groups. The first case group includes the first 25 cases read, with subsequent groups including subsequent cases in reading order. For Krupinski 2010 [11], the last group includes fewer than 25 cases. For Krupinski 2007, scores from 1 to 6 have been rescaled onto 0 to 100 % for this graphic. For Taylor-Phillips 2012, scores refer to judgement of probability of malignancy rather than confidence.
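The batching described in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the linear mapping of the 1 to 6 scale onto 0 to 100 % is an assumption, since the caption does not state how the rescaling was done.

```python
from statistics import mean


def rescale_to_percent(score, low=1, high=6):
    """Map a score on [low, high] onto 0-100 %.

    Assumption: a simple linear rescaling; the source does not say
    which mapping was actually used.
    """
    return 100.0 * (score - low) / (high - low)


def batch_means(scores_in_reading_order, batch_size=25):
    """Mean score for each consecutive batch of cases in reading order.

    The last batch may contain fewer than `batch_size` cases.
    """
    return [
        mean(scores_in_reading_order[start:start + batch_size])
        for start in range(0, len(scores_in_reading_order), batch_size)
    ]
```

For a 60-case session, `batch_means` returns three values, with the third averaging only the final 10 cases, which matches the caption's note about a shorter last group.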
Fig. 2 Histograms of scores assigned to normal cases (left) and abnormal cases (right) in groups of 25 cases in the reading set. The first case group includes the first 25 cases read, with subsequent groups including subsequent cases in reading order. For Taylor-Phillips 2012 and Krupinski 2010, the last group includes fewer than 25 cases. Note that for Taylor-Phillips 2012, scores were rounded to the nearest multiple of 10 to facilitate comparison with the other datasets, and in this dataset scores refer to radiologists' judgement of "probability of malignancy" rather than confidence in decision.
Fig. 3 Overview of the effects of case order on sensitivity (top) and specificity (bottom) in the five datasets. Effect sizes were calculated using logit mixed effects models and exp(effect size) is the multiplicative change in the probability of being correct when moving from case i to case i + 1. Positive effects (greater than zero) indicate sensitivity/specificity increasing with time on task. Negative effects (less than zero) indicate sensitivity/specificity decreasing with time on task. The black box indicates the estimate for the effect size. The area of the black box is proportional to the standard error. The lines around the black box show 1.96 × standard error. p values are based on Wald Z-tests. Note that the effects of experience were not considered for this comparison, but the effect for the group of easier cases was considered for the Krupinski 2010 dataset.
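The quantities named in the Fig. 3 caption (an estimate, an interval of 1.96 × standard error, and a Wald Z-test p value) can be derived from a fitted coefficient and its standard error roughly as follows. This is a minimal sketch with a hypothetical function name, not the authors' analysis code. Note that in a logit model, exp(coefficient) is, strictly speaking, the multiplicative change in the odds of a correct decision.

```python
import math


def wald_summary(estimate, std_error):
    """Summarize one coefficient from a fitted model.

    Returns the Wald z statistic, the two-sided p value under a
    standard normal reference, the estimate +/- 1.96 * SE interval,
    and exp(estimate).
    """
    z = estimate / std_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    half_width = 1.96 * std_error
    return {
        "z": z,
        "p": p_value,
        "ci": (estimate - half_width, estimate + half_width),
        "exp_estimate": math.exp(estimate),
    }
```

An interval that excludes zero corresponds to p < 0.05 in this test, so the plotted lines and the reported p values carry the same information.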
Fig. 4 Overview of the effects of case order on time taken for cases classified as abnormal (top) and normal (bottom) in the five datasets. Effect sizes were calculated using linear mixed effects models and exp(effect size) is the multiplicative change in the log(time taken) when moving from case i to case i + 1. The decrease in time taken per case as the session progressed was significant in all studies. The black box indicates the estimate for the effect size. The area of the black box is proportional to the standard error. The lines around the black box show 95 % MCMC confidence intervals. p values are also based on MCMC. One study (Krupinski 2007) [15] did not have records of time taken per case so could not be included in this analysis.
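To relate a per-case effect on log(time taken) to a change over a whole session, one can compound the effect across cases. The sketch below assumes a constant slope on the log scale, so that exp(slope) multiplies the expected time at each step from case i to case i + 1. The function name and the slope value are hypothetical and are not taken from the paper.

```python
import math


def percent_change_in_time(slope_per_case, cases_elapsed):
    """Percent change in expected time per case after `cases_elapsed` steps.

    Assumes a constant slope on log(time) per one-case step, so the
    expected time is multiplied by exp(slope_per_case) at every step.
    """
    return 100.0 * (math.exp(slope_per_case * cases_elapsed) - 1.0)


# Hypothetical slope chosen only for illustration (not a reported value).
hypothetical_slope = -0.002
# Change between the first and the 100th case is 99 one-case steps.
print(round(percent_change_in_time(hypothetical_slope, 99), 1))
```

A negative result indicates faster reading later in the session. Small per-case slopes can compound into a sizeable change over 50 to 100 cases.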