Joseph P Houghton1, Bruce R Smoller2, Niamh Leonard3, Michael R Stevenson1, Tim Dornan1. 1. Centre for Medical Education, Queen's University Belfast, Belfast BT9 7BL, Ireland. 2. Department of Pathology and Laboratory Medicine, University of Rochester Medical Center, Rochester, New York, USA. 3. Department of Histopathology, St. James's Hospital, James's Street, Dublin 8, Ireland.
Abstract
BACKGROUND: Identifying new and more robust assessments of proficiency/expertise (finding new "biomarkers of expertise") in histopathology is desirable for many reasons. Advances in digital pathology permit new and innovative tests such as flash viewing tests and eye tracking and slide navigation analyses that would not be possible with a traditional microscope. The main purpose of this study was to examine the usefulness of time-restricted testing of expertise in histopathology using digital images. METHODS: 19 novices (undergraduate medical students), 18 intermediates (trainees), and 19 experts (consultants) were invited to give their opinion on 20 general histopathology cases after 1 s and 10 s viewing times. Differences in performance between groups were measured and the internal reliability of the test was calculated. RESULTS: There were highly significant differences in performance between the groups using the Fisher's least significant difference method for multiple comparisons. Differences between groups were consistently greater in the 10-s than the 1-s test. The Kuder-Richardson 20 internal reliability coefficients were very high for both tests: 0.905 for the 1-s test and 0.926 for the 10-s test. Consultants had levels of diagnostic accuracy of 72% at 1 s and 83% at 10 s. CONCLUSIONS: Time-restricted tests using digital images have the potential to be extremely reliable tests of diagnostic proficiency in histopathology. A 10-s viewing test may be more reliable than a 1-s test. Over-reliance on "at a glance" diagnoses in histopathology is a potential source of medical error due to over-confidence bias and premature closure.
Keywords:
Digital pathology; expertise; overconfidence bias; premature closure; time-restricted test
Histopathology is, at its core, a visual discipline. The cornerstone of accurate tissue diagnosis is a pathologist viewing and correctly interpreting a microscopic image. Other skills, such as clinicopathological correlation, are also important, but visual pattern recognition is the critical element. The gradual acquisition of pattern recognition skills, while progressing from a novice style to an expert style, and the maintenance of these skills throughout a professional career are essential. Very few studies have attempted to assess visual memory or pattern recognition skills in histopathology.

A widely supported theory explaining how we visually examine images to identify key information proposes that there are two pathways.[1-5] These two systems run in parallel, are fluid, and communicate with each other. The first, a nonselective pathway, which has alternatively been described as holistic, automatic, Gestalt-like, coup d’oeil, top-down, thin-slicing, or subconscious searching, involves a global (at a glance) impression of the image. The second, the selective pathway, involves careful screening for specific findings. This has also been called conscious, analytic, or bottom-up searching. Neuroimaging studies that support this two-pathway model have shown that object identification maps to regions in the occipitotemporal cortex, whereas global identification maps to other regions in the brain.[6-9]

There are two broad approaches to testing these pathways. The nonselective pathway can be tested using time-restricted tests, whereas the selective pathway can be tested using eye-tracking equipment. A small number of studies have examined differences in performance in time-restricted tests in radiology and pathology.[10-15] Most of these studies have shown superior accuracy by experts. In other words, experts tend to reach correct diagnoses quickly. Performance in time-restricted tests can, therefore, be used as a marker of expertise.

Identifying new and more robust assessments of proficiency/expertise in histopathology (finding new “biomarkers of expertise”)[16] is desirable for several reasons (for example, as a serial assessment tool for histopathologists in training). The main purpose of this study was to examine the usefulness of time-restricted tests in histopathology using digital images.
METHODS
Ethical approval for the study was granted by the Queen's University Belfast Research Ethics Committee, School of Medicine, Dentistry and Biomedical Sciences (Ref: 14.47v2, 3/11/2014).

Participants (n = 56; mean ± standard deviation [SD] age: 33.2 ± 10.3 years; 31 males and 25 females) formed three groups with increasing levels of experience. Novices (Group 1) were 19 third-year undergraduate medical students who were midway through a core pathology module and who, in addition, had just completed an elective in pathology (mean ± SD age: 23.1 ± 2.9 years; 13 males and 6 females). Intermediates (Group 2) were 18 trainee histopathologists/residents (mean ± SD age: 31.3 ± 3.4 years; 6 males and 12 females). Within this group, six trainees had fewer than 2 years’ experience viewing histopathology slides, seven had between 2 and 4 years’ experience, and five had more than 4 years’ experience. Experts (Group 3) were 19 practicing consultant histopathologists (mean ± SD age: 45.1 ± 8.0 years; 12 males and 7 females) with a mean ± SD length of experience in consultant practice of 12.2 ± 8.2 years. The consultant group was a heterogeneous mixture of general and specialist histopathologists. Trainees and consultant histopathologists were based in Belfast City Hospital and Royal Victoria Hospital, Belfast, and St. James's Hospital, Dublin.

Stimuli consisted of 20 digital histological images from teaching archives [Table 1]. An example of an image used is presented in Figure 1. These were general pathology cases from four different anatomical sites, chosen to represent a full range of difficulty from normal histology to challenging cases. To achieve this, we referred to the competency framework for graded responsibility for Specialist Registrars in Histopathology and Cytopathology published by the Joint Committee on Pathology Training of the Royal College of Pathologists (UK).[17] This categorizes cases into four increasing levels of complexity, where one is the lowest and four is the highest. In addition, we introduced level 5, which is not in the original document but includes cases that would be considered more difficult than level 4. The magnification used was tailored to each individual case: if the diagnosis was based primarily on an architectural feature (e.g., diverticulosis), a low-power magnification was used, whereas if the diagnosis was based on a cytological feature (e.g., Barrett's metaplasia), a high-power magnification was used. All test material, including the clinical history, image quality, and correct diagnosis for each case, was verified by two experienced consultant histopathologists who did not participate in the study.
Table 1
Twenty cases from four anatomical sites representing a broad range of difficulty
Figure 1
Pilomatrixoma (benign skin adnexal tumor) in which there are uniform basaloid cells on the left and “ghost cells” on the right, high magnification
The experiment was carried out in a seminar room where a single representative fixed photograph (.jpg) of each diagnostic entity was displayed on a white screen using an overhead projector. For each case, a brief clinical summary was provided to participants. The clinical information was deliberately brief: it included only age, gender, and the site of biopsy, and gave no further information from which participants could guess the correct answer. Each image was displayed for 1 s, followed by a 20 s pause during which participants recorded their diagnoses. Responses were written down in free-text format rather than chosen from multiple options in order to reduce the likelihood of guessing the correct answer. Immediately after the 1-s test was complete, the test was repeated using the same images, but with each image now displayed for 10 s. The rationale for choosing the 1 s/10 s timings was that 1 s represented a brief glance, whereas 10 s represented a longer but still challenging exposure.

The candidates’ answers were marked manually, and their scores were presented using box-and-whisker plots. Differences in performance between individual groups were analyzed using Fisher's least significant difference method for multiple comparisons. The Kuder–Richardson formula 20 (KR-20) internal reliability coefficient was calculated for both tests using the Statistical Package for the Social Sciences (IBM SPSS Statistics 21) as a measure of internal reliability. As a general principle, KR-20 coefficients of between 0.7 and 0.9 are considered good and coefficients of >0.9 are considered excellent.
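For readers unfamiliar with the reliability statistic, KR-20 for k dichotomously marked items is (k / (k - 1)) × (1 - Σ p·q / σ²), where p is the proportion of participants answering an item correctly, q = 1 - p, and σ² is the variance of participants' total scores. The Python sketch below illustrates this calculation under stated assumptions; it is not the SPSS procedure used in the study. The function name `kr20`, the use of the population variance for total scores, and the example marks are all hypothetical choices made for illustration.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous (0/1) item scores.

    responses: one list per participant, each holding a 0/1 mark per case.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    if n_items < 2 or n_people < 2:
        raise ValueError("need at least two items and two participants")

    # Variance of participants' total scores (population form, to match p*q).
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    # Sum over items of p*q, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)

    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)


# Hypothetical marks for five participants on four cases (not study data).
example_marks = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(example_marks), 3))
```

In the study design, `responses` would hold 56 rows of 20 marks for each test. Statistical packages may use the sample (n - 1) variance throughout, so small numerical differences from this sketch are possible.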
RESULTS
The range of participants’ scores for the 1-s and 10-s tests is presented in Figures 2 and 3, respectively. For all groups (including consultants), accuracy was higher at 10 s than at 1 s.
Figure 2
Box and whisker plots with the ends of the whiskers representing the maximum and minimum scores for all participants for the 1-s test
Figure 3
Box and whisker plots with the ends of the whiskers representing the maximum and minimum scores for all participants for the 10-s test
Table 2 shows highly significant differences in performance between groups using Fisher's least significant difference method for multiple comparisons. Differences between groups were consistently greater with the 10-s than the 1-s test.
Table 2
Fisher's least significant difference method for multiple comparisons, 1-s and 10-s tests
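The pairwise comparisons summarised in Table 2 follow the general logic of Fisher's least significant difference (LSD): the within-group variance is pooled across all groups (the ANOVA mean square error), and two group means are declared different when their absolute difference exceeds t_crit × sqrt(MSE × (1/n_i + 1/n_j)). The Python sketch below illustrates that logic only; it is not the authors' analysis. The function name `fisher_lsd`, the example scores, and the requirement that the caller supplies the two-sided critical t value for N - k degrees of freedom are assumptions made for illustration. In its protected form, the method is applied only after a significant omnibus ANOVA, which this sketch does not perform.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def fisher_lsd(groups, t_crit):
    """Pairwise Fisher LSD comparisons.

    groups: dict mapping group name -> list of scores.
    t_crit: two-sided critical t value for df = N - k, looked up by the caller.
    Returns {(a, b): (mean_b - mean_a, lsd, is_significant)}.
    """
    means = {name: mean(scores) for name, scores in groups.items()}
    n_total = sum(len(scores) for scores in groups.values())
    df_error = n_total - len(groups)

    # Pooled within-group variance (ANOVA mean square error).
    sse = sum(
        (x - means[name]) ** 2
        for name, scores in groups.items()
        for x in scores
    )
    mse = sse / df_error

    results = {}
    for a, b in combinations(groups, 2):
        diff = means[b] - means[a]
        lsd = t_crit * sqrt(mse * (1 / len(groups[a]) + 1 / len(groups[b])))
        results[(a, b)] = (diff, lsd, abs(diff) > lsd)
    return results


# Hypothetical scores out of 20 (not study data); df = 9 - 3 = 6,
# for which the two-sided 5% critical t value is about 2.447.
example_scores = {
    "novices": [2, 3, 4],
    "trainees": [9, 10, 11],
    "consultants": [14, 15, 16],
}
for pair, (diff, lsd, significant) in fisher_lsd(example_scores, 2.447).items():
    print(pair, round(diff, 2), round(lsd, 2), significant)
```

With the study's group sizes (19, 18, and 19), the error degrees of freedom would be 53, and the critical value would need to be looked up for that figure.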
KR-20 internal reliability coefficients were very high for both tests: 0.905 for the 1-s test and 0.926 for the 10-s test.

The range of incorrect answers (over both tests) for the five cases that the consultants found most difficult is presented in Table 3. In some cases, the incorrect answers are major diagnostic errors; for example, interpreting a benign tumor as malignant. The magnification of the image that was selected did not influence the likelihood of error.
Table 3
Range of incorrect answers submitted by consultants for the five most difficult cases
CONCLUSIONS
In this study, we analyzed performance in two time-restricted tests, at 1 s and at 10 s. Both tests demonstrated very high internal reliability (KR-20 coefficients of 0.905 and 0.926, respectively). Intuitively, we had expected that a 1-s test would be more discriminating than a 10-s test because it has been suggested that extreme time restriction can expose differences in ability that would otherwise be undetectable. However, the results of this study did not corroborate this; in fact, the 10-s test had a marginally superior reliability coefficient and differences among groups were consistently greater.

There was a broad range of performance within each group. A strong performance in the student group could potentially identify candidates with a natural talent for pattern recognition who could be suited to a career in pathology. The broad range of scores among trainees most likely reflected the broad range of experience in this subgroup. Among the consultant group, the study highlighted truly expert performances; at the very top was a consultant who scored 18/20 at 1 s and 20/20 at 10 s. The lowest consultant scores most likely represented the loss of general pathology skills among pathologists who had worked as subspecialists for many years.

With respect to the utility formula of assessment,[18] the time-restricted tests used in this study scored highly: they had high reliability coefficients, were highly valid in that they assessed a key diagnostic skill, could potentially have a positive impact on learning, and were extremely brief and cost-effective.
We did not formally survey the participants afterward with regard to acceptability, but a 10-s test appeared to be more acceptable than a 1-s test.

We propose that time-restricted testing could be used in a number of situations: as a formative assessment during training to track trainees longitudinally, recording the acquisition (or failure of acquisition) of skills with a view to accelerating progression and to identifying, at an early stage, trainees in difficulty who may require additional support or who may not be suited to a career in histopathology; to demonstrate maintenance of expert skills throughout a career; to assess pathologists who choose to retrain in a new subspecialty; and to compare training programs nationally and internationally to identify best practice.

Of course, time-restricted testing assesses only one skill: the ability to make rapid and accurate assessments of histopathological images. While some cases are “spot/at a glance diagnoses,” others require careful screening of slides to identify a subtle or scanty feature; for example, in cervical cytology or prostate chippings. In this study, we looked only at expertise in rapid image identification, whereas differences in experience with regard to screening histological slides require analysis of pathologists’ searching strategies, which can be assessed using eye-tracking studies or search maps recorded when pathologists view digital slides.[19-26] We suggest that performance in time-restricted tests could be combined with performance in other tests, such as eye tracking, to provide a global overview of an individual pathologist's expertise.

Being a safe and competent practitioner also requires attention to detail, an ability to correlate clinical and pathological information, and a commitment to audit, quality improvement, and continuous professional development.
Some pathologists may perform well in a time-restricted test but may not demonstrate these other attributes.

An interesting finding in this study was the remarkably high diagnostic accuracy of experts (72% at 1 s and 83% at 10 s), which could have implications for routine practice. With increasing experience, experts can diagnose so many cases within the first few seconds that there is potentially an associated increased risk of medical error due to overconfidence bias and premature closure.[27] During routine busy practice, there is a constant tension between rapid diagnosis and cautious decision making, and it is important that histopathologists are aware that over-reliance on “at a glance” diagnoses is a potential source of medical error. In this study, there were a number of examples where the consultant's first impression was that of a malignant tumor when the lesion was benign. It is likely that some of these discrepancies were due to specialists giving opinions on cases with which they were unfamiliar, but there are numerous examples in surgical pathology where benign lesions can closely mimic malignant lesions, and an interesting further study could focus specifically on these problematic areas. In addition, in a small number of cases, there is double pathology, and it is important not to miss the second abnormality having identified the first, more obvious pathology.

A potential weakness in our study was the use of a consultant/expert group that was a mixture of generalists and specialists; while all of the experts had originally trained in general pathology, some had since subspecialized in areas not included in this study. If all the experts had been practicing general pathologists, the expert group might have performed better.
Furthermore, inclusion of 19 novices of very low ability could have artificially inflated the KR-20 reliability coefficient,[28] and it would, therefore, be useful to carry out similar tests with trainees and/or consultant histopathologists only. Another potential weakness is that interpretation of the images at 10 s may have been influenced by the 1-s test due to a direct/repetition priming effect.[29]