Literature DB >> 26605121

Diagnostic performance on briefly presented digital pathology images.

Joseph P Houghton¹, Bruce R Smoller², Niamh Leonard³, Michael R Stevenson¹, Tim Dornan¹.

Abstract

BACKGROUND: Identifying new and more robust assessments of proficiency/expertise (finding new "biomarkers of expertise") in histopathology is desirable for many reasons. Advances in digital pathology permit new and innovative tests such as flash viewing tests and eye tracking and slide navigation analyses that would not be possible with a traditional microscope. The main purpose of this study was to examine the usefulness of time-restricted testing of expertise in histopathology using digital images.
METHODS: 19 novices (undergraduate medical students), 18 intermediates (trainees), and 19 experts (consultants) were invited to give their opinion on 20 general histopathology cases after 1 s and 10 s viewing times. Differences in performance between groups were measured and the internal reliability of the test was calculated.
RESULTS: There were highly significant differences in performance between the groups using the Fisher's least significant difference method for multiple comparisons. Differences between groups were consistently greater in the 10-s than the 1-s test. The Kuder-Richardson 20 internal reliability coefficients were very high for both tests: 0.905 for the 1-s test and 0.926 for the 10-s test. Consultants had levels of diagnostic accuracy of 72% at 1 s and 83% at 10 s.
CONCLUSIONS: Time-restricted tests using digital images have the potential to be extremely reliable tests of diagnostic proficiency in histopathology. A 10-s viewing test may be more reliable than a 1-s test. Over-reliance on "at a glance" diagnoses in histopathology is a potential source of medical error due to over-confidence bias and premature closure.

Entities: Chemical

Keywords: Digital pathology; expertise; overconfidence bias; premature closure; time-restricted test

Year: 2015 PMID： 26605121 PMCID： PMC4639946 DOI： 10.4103/2153-3539.168517

Source DB: PubMed Journal: J Pathol Inform

INTRODUCTION

Histopathology is, at its core, a visual discipline. The cornerstone of accurate tissue diagnosis is a pathologist viewing and correctly interpreting a microscopic image. A number of other important skills such as clinicopathological correlation are also important, but visual pattern recognition is the critical element. The gradual acquisition of pattern recognition skills, while progressing from a novice style to an expert style and maintaining these skills throughout a professional career, is essential. Very few studies have attempted to assess visual memory or pattern recognition skills in histopathology. A widely supported theory explaining how we visually examine images to identify key information proposes that there are two pathways.[12345] These two systems run in parallel are fluid and communicate with each other. First, a nonselective pathway, which has alternatively been described as holistic, automatic, Gestalt-like, coup d’oeil, top-down, thin-slicing or subconscious searching, involves a global (at a glance) impression of the image. The second pathway is the selective pathway and involves careful screening for specific findings. This has also been called conscious, analytic, or bottom-up searching. Studies in neuroimaging which support this two-pathway model have shown that object identification maps to regions in the occipitotemporal cortex whereas global identification maps to other regions in the brain.[6789] There are two broad approaches to testing these pathways. The nonselective pathway can be tested using time-restricted tests whereas the selective pathway can be tested using the eye-tracking equipment. A small number of studies have examined differences in performance in time-restricted tests in radiology and pathology.[101112131415] Most of these studies have shown superior accuracy by experts. In other words, experts tend to make correct diagnoses quickly and accurately. Performance in time-restricted tests can, therefore, be used as a marker of expertise. Identifying new and more robust assessments of proficiency/expertise in histopathology (finding new “biomarkers of expertise”)[16] is desirable for several reasons (for example, as a serial assessment tool of histopathologists in training) and the main purpose of this study was to examine the usefulness of time-restricted tests in histopathology using digital images.

METHODS

Ethical approval for the study was granted by the Queen's University Belfast Research Ethics Committee, School of Medicine Dentistry and Biomedical Sciences Ref: 14.47v2, 3/11/2014. Participants (n = 56; mean ± standard deviation [SD] age: 33.2 ± 10.3 years; 31 males and 25 females) formed three groups with increasing levels of experience. Novices (Group 1) were 19 third-year undergraduate medical students who were midway through a core pathology module and who, in addition, had just completed an elective in pathology (mean ± SD age: 23.1 ± 2.9 years; 13 males and 6 females). Intermediates (Group 2) were 18 trainee histopathologists/residents (mean ± SD age: 31.3 ± 3.4 years; 6 males and 12 females). Within this group, six trainees had fewer than 2 years’ experience viewing histopathology slides; seven trainees had between 2 and 4 years’ experience, and five trainees had > 4 years’ experience. Experts (Group 3) were 19 practicing consultant histopathologists (mean ± SD age: 45.1 ± 8.0 years; 12 males and 7 females) with a mean ± SD length of experience in consultant practice of 12.2 ± 8.2 years. The consultant group was a heterogenous mixture of general and specialist histopathologists. Trainees and consultant histopathologists were based in Belfast City Hospital and Royal Victoria Hospital, Belfast, and St. James’ Hospital, Dublin. Stimuli consisted of 20 digital histological images from teaching archives [Table 1]. An example of an image used is presented in Figure 1. These were general pathology cases from four different anatomical sites chosen to represent a full range of difficulty ranging from normal histology to challenging cases. In order to achieve this, we referred to the competency framework for graded responsibility for Specialist Registrars in Histopathology and Cytopathology published by the Joint Committee on Pathology Training of the Royal College of Pathologists (UK).[17] This categorizes cases into four increasing levels of complexity, where one is the lowest and four is the highest. In addition, we introduced level 5, which is not in the original document, but includes cases that would be considered more difficult than level 4. The magnification used was tailored to each individual case; if the diagnosis was based primarily on an architectural feature (e.g., diverticulosis) a low-power magnification was used, whereas if the diagnosis was based on a cytological feature (e.g., Barrett's metaplasia) a high-power magnification was used. All test material including the clinical history, image quality, and correct diagnosis for each case was verified by two experienced consultant histopathologists who did not participate in the study.

Table 1

Twenty cases from four anatomical sites representing a broad range of difficulty

Figure 1

Pilomatrixoma (benign skin adnexal tumor) in which there are uniform basaloid cells on the left and “ghost cells” on the right, high magnification

Twenty cases from four anatomical sites representing a broad range of difficulty Pilomatrixoma (benign skin adnexal tumor) in which there are uniform basaloid cells on the left and “ghost cells” on the right, high magnification The experiment was carried out in a seminar room where single representative fixed photographs (.jpg) of each diagnostic entity were displayed on a white screen using an overhead projector. For each case, a brief clinical summary was provided to participants. The clinical information was deliberately brief and only included age, gender, and the site of biopsy and did not give any further information from which participants could guess the correct answer. Each image was displayed for 1 s followed by a 20 s pause during which participants recorded their diagnoses. Responses were written down in free-text format rather than using a multiple-choice format in order to reduce the likelihood of guessing the correct answer. Immediately after the 1-s tests were complete, the test was repeated as before using the same images; however, the images were now displayed for 10-s. The rationale for choosing 1 s/10 s timings was that 1 s represented a brief glance, whereas 10 s was chosen to represent a longer but still challenging exposure. The candidates’ answers were marked manually, and their scores were presented using a box and whisker plot graph. Differences in performance between individual groups were analyzed using the Fisher's least significant difference method for multiple comparisons. The Kuder–Richardson formula 20 (KR-20) internal reliability coefficient was calculated using Statistical Package for the Social Sciences (SPSS, IBM SSPS Statistics 21) for both tests as quality measure of internal reliability. As a general principle, KR-20 coefficients of between 0.7 and 0.9 are considered good and coefficients of >0.9 are considered excellent.

RESULTS

The range of participants’ scores for the 1-s and 10-s tests is presented in Figures 2 and 3, respectively. For all groups (including consultants), accuracy was higher at 10-s.

Figure 2

Box and whisker plots with the ends of the whiskers representing the maximum and minimum scores for all participants for the 1-s test

Figure 3

Box and whisker plots with the ends of the whiskers representing the maximum and minimum scores for all participants for the 10-s test

Box and whisker plots with the ends of the whiskers representing the maximum and minimum scores for all participants for the 1-s test Box and whisker plots with the ends of the whiskers representing the maximum and minimum scores for all participants for the 10-s test Table 2 illustrates highly significant differences in performance between groups using the Fisher's least significant difference method for multiple comparisons. Differences between groups were consistently greater with the 10-s than the 1-s test.

Table 2

Fisher's least significant difference method for multiple comparisons, 1-s and 10-s tests

Fisher's least significant difference method for multiple comparisons, 1-s and 10-s tests KR-20 internal reliability coefficients were very high for both tests: 0.905 for the 1-s test and 0.926 for the 10-s test. The range of incorrect answers (over both tests) for the five cases that the consultants found most difficult is presented in Table 3. In some cases, the incorrect answers are major diagnostic errors; for example interpreting a benign tumor as malignant. The magnification of the image that was selected did not influence the likelihood of error.

Table 3

Range of incorrect answers submitted by consultants for the five most difficult cases

CONCLUSIONS

In this study, we analyzed performance in 2 time-restricted tests, at 1-s and also at 10-s. Both of these tests demonstrated very high degrees of internal reliability 0.905 and 0.926, respectively. Intuitively, we had expected that a 1-s test would be more discriminating than a 10-s test because it has been suggested that extreme time restriction can expose differences in ability that would otherwise be undetectable. However, the results of this study did not corroborate this; in fact, the 10-s test had a marginally superior reliability coefficient and differences among groups were consistently greater. There was a broad range of ability in performance within each group. A strong performance in the student group could potentially identify candidates with a natural talent for pattern recognition who could be suited to a career in pathology. The broad range of trainees most likely reflected the broad range of experience in this subgroup. Among the consultant group, the study highlighted truly expert performances; at the very top was a consultant who scored 18/20 at 1-s and 20/20 at 10-s. The lowest consultant scores most likely represented the loss of general pathology skills among pathologists who had worked as subspecialists for many years. With respect to the utility formula of assessment,[18] the time-restricted tests used in this study scored highly due to high reliability coefficients were highly valid in that they were an assessment of a key diagnostic skill, could potentially have a positive impact on learning, and were extremely brief and cost effective. We did not formally survey the participants afterward with regards to acceptability but a 10-s test appeared to be more acceptable than a 1-s test. We propose that time-restricted testing could be used in a number of situations; as a formative assessment during training to track trainees longitudinally to record the acquisition or failure of acquisition of skills with a view to accelerate this progression but also to quickly identify, at an early stage, trainees in difficulty who may require additional support or who may not be suited to a career in histopathology; to demonstrate maintenance of expert skills throughout a career; to assess pathologists who chose to re-train in a new subspecialty; to compare training programs nationally and internationally to identify best practice. Of course, time-restricted testing only assesses one skill, the ability to make rapid, and accurate assessments of histopathological images. While some cases are “spot/at a glance diagnoses” others require careful screening of slides to identify a subtle or scanty feature; for example, in cervical cytology or prostate chippings. In this study, we only looked at expertise in rapid image identification, whereas differences in experience with regard to screening histological slides requires analysis of pathologists’ searching strategies which can be assessed using eye tracking studies or search maps recorded when pathologists view digital slides.[1920212223242526] We suggest that performance in time-restricted tests could be incorporated with performance in other tests such as eye tracking to provide a global overview of an individual pathologist's expertise. Being a safe and competent practitioner also requires attention to detail, an ability to correlate clinical and pathological information, a commitment to audit, quality improvement, and continuous professional development. Some pathologists may perform well in a time-restricted test but may not be committed to these other attributes. An interesting finding in this study was the remarkably high diagnostic accuracy of experts of 72% at 1-s and 83% at 10-s, which could have implications for routine practice. With increasing experience, experts can diagnose so many cases within the first few seconds that there is potentially an associated increased risk of medical error due to overconfidence bias and premature closure.[27] During routine busy practice, there is a constant tension between rapid diagnosis and cautious decision making and it is important that histopathologists are aware that over-reliance on “at a glance” diagnoses is a potential source of medical error. In this study, there were a number of examples where the consultant's first impression was that of a malignant tumor when the lesion was benign. It is likely that some of these discrepancies were due to specialists giving opinions on cases with which they were unfamiliar but there are numerous examples in surgical pathology where benign lesions can closely mimic malignant lesions, and an interesting further study could involve specifically focusing on these problematic areas. In addition, in a small number of cases, there is double pathology, and it is important not to miss the second abnormality having identified the first more obvious pathology. A potential weakness in our study was using a consultant/expert group that was a mixture of generalists and specialists and, while all of the experts had originally trained in general pathology, some had since subspecialized in areas not included in this study. If all the experts had been practicing general pathologists, the expert group might have performed better. Furthermore, inclusion of 19 novices of very low ability could have artificially inflated the high KR-20 reliability coefficient[28] and it would, therefore, be useful to carry out similar tests with trainees and/or consultant histopathologists. Another potential weakness is that interpretation of the images at 10-s may have been influenced by the 1-s test due to a direct/repetition priming effect.[29]

Financial Support and Sponsorship

Nil.

Conflicts of Interest

There are no conflicts of interest.

21 in total

1. Viewpoint-specific scene representations in human parahippocampal cortex.

Authors: Russell Epstein; Kim S Graham; Paul E Downing
Journal: Neuron Date: 2003-03-06 Impact factor: 17.173

2. Achieving quality in clinical decision making: cognitive strategies and detection of bias.

Authors: Pat Croskerry
Journal: Acad Emerg Med Date: 2002-11 Impact factor: 3.451

3. Eye-movement study and human performance using telepathology virtual slides: implications for medical education and differences with experience.

Authors: Elizabeth A Krupinski; Allison A Tillack; Lynne Richter; Jeffrey T Henderson; Achyut K Bhattacharyya; Katherine M Scott; Anna R Graham; Michael R Descour; John R Davis; Ronald S Weinstein
Journal: Hum Pathol Date: 2006-12 Impact factor: 3.466

4. The assessment of professional competence: Developments, research and practical implications.

Authors: C P Van Der Vleuten
Journal: Adv Health Sci Educ Theory Pract Date: 1996-01 Impact factor: 3.853

Review 5. Informatics in radiology: what can you see in a single glance and how might this guide visual search in medical images?

Authors: Trafton Drew; Karla Evans; Melissa L-H Võ; Francine L Jacobson; Jeremy M Wolfe
Journal: Radiographics Date: 2012-10-25 Impact factor: 5.333

6. Lung lesions: correlation between viewing time and detection.

Authors: J W Oestmann; R Greene; D C Kushner; P M Bourgouin; L Linetsky; H J Llewellyn
Journal: Radiology Date: 1988-02 Impact factor: 11.105

7. Finding lung nodules with and without comparative visual scanning.

Authors: D P Carmody; C F Nodine; H L Kundel
Journal: Percept Psychophys Date: 1981-06

8. Distributed and overlapping representations of faces and objects in ventral temporal cortex.

Authors: J V Haxby; M I Gobbini; M L Furey; A Ishai; J L Schouten; P Pietrini
Journal: Science Date: 2001-09-28 Impact factor: 47.728

Review 9. Neural components of topographical representation.

Authors: G K Aguirre; E Zarahn; M D'Esposito
Journal: Proc Natl Acad Sci U S A Date: 1998-02-03 Impact factor: 11.205

10. The gist of the abnormal: above-chance medical decision making in the blink of an eye.

Authors: Karla K Evans; Diane Georgian-Smith; Rosemary Tambouret; Robyn L Birdwell; Jeremy M Wolfe
Journal: Psychon Bull Rev Date: 2013-12

3 in total

1. Accuracy of Digital Pathologic Analysis vs Traditional Microscopy in the Interpretation of Melanocytic Lesions.

Authors: Tracy Onega; Raymond L Barnhill; Michael W Piepkorn; Gary M Longton; David E Elder; Martin A Weinstock; Stevan R Knezevich; Lisa M Reisch; Patricia A Carney; Heidi D Nelson; Andrea C Radick; Joann G Elmore
Journal: JAMA Dermatol Date: 2018-10-01 Impact factor: 10.282

Review 2. The Holistic Processing Account of Visual Expertise in Medical Image Perception: A Review.

Authors: Heather Sheridan; Eyal M Reingold
Journal: Front Psychol Date: 2017-09-28

3. The Use of Screencasts with Embedded Whole-Slide Scans and Hyperlinks to Teach Anatomic Pathology in a Supervised Digital Environment.

Authors: Mary Wong; Joseph Frye; Stacey Kim; Alberto M Marchevsky
Journal: J Pathol Inform Date: 2018-11-14

3 in total