
What do radiologists look for? Advances and limitations of perceptual learning in radiologic search.

Robert G Alexander^1,2, Stephen Waite^3,4, Stephen L Macknik^1,5, Susana Martinez-Conde^1,6.

Abstract

Supported by guidance from training during residency programs, radiologists learn clinically relevant visual features by viewing thousands of medical images. Yet the precise visual features that expert radiologists use in their clinical practice remain unknown. Identifying such features would allow the development of perceptual learning training methods targeted to the optimization of radiology training and the reduction of medical error. Here we review attempts to bridge current gaps in understanding with a focus on computational saliency models that characterize and predict gaze behavior in radiologists. There have been great strides toward the accurate prediction of relevant medical information within images, thereby facilitating the development of novel computer-aided detection and diagnostic tools. In some cases, computational models have achieved equivalent sensitivity to that of radiologists, suggesting that we may be close to identifying the underlying visual representations that radiologists use. However, because the relevant bottom-up features vary across task context and imaging modalities, it will also be necessary to identify relevant top-down factors before perceptual expertise in radiology can be fully understood. Progress along these dimensions will improve the tools available for educating new generations of radiologists, and aid in the detection of medically relevant information, ultimately improving patient health.


Year: 2020 PMID: 33057623 PMCID: PMC7571277 DOI: 10.1167/jov.20.10.17

Source DB: PubMed Journal: J Vis ISSN: 1534-7362 Impact factor: 2.240

Introduction

Current models of medical image perception are incomplete and demonstrate significant gaps in the current understanding of radiologic expertise (see Waite et al., 2019 for a review). That is to say, we do not precisely know what an expert radiologist does: currently, radiologists achieve peak expertise only after years of trial-and-error training, during which they acquire their skillset through veiled principles that are yet to be articulated. Mentors provide feedback about mistakes, guidance for what is benign versus malignant, and other conceptual, factual, and procedural information. However, pattern recognition is difficult to teach (Kellman & Garrigan, 2009), and expertise in viewing radiologic images is therefore gained largely as a function of the number of images read, rather than through explicit instruction and understanding (Krupinski, Graham, & Weinstein, 2013; Nodine & Mello-Thoms, 2010). The result is a knowledge base that has not translated into concrete methods of training derived from critical perceptual features. Determining the precise features that radiologists use to discriminate abnormalities in medical images—and designing innovative heuristics for trainees that enable efficient learning of informative features—would help optimize performance in the field.

Error rates in radiology

The long-term goal of most studies of radiologist performance is to reduce error. Although it is difficult to determine the precise error rates in current clinical practice, it is clear that reductions in error rate would improve patient care. The diagnostic error rate in a typical clinical practice (comprising both normal and abnormal image studies) is between 3% and 4% (Borgstede, Lewis, Bhargavan, & Sunshine, 2004; Siegle et al., 1998), which translates into approximately 40 million interpretive errors per year worldwide (Bruno, Walker, & Abujudeh, 2015). This error rate is substantially higher—approximately 30%—when all images contain abnormalities (Berlin, 2007; Rauschecker et al., 2020; see Waite et al., 2017 for a review). Detection/omission errors account for 60% to 80% of interpretive errors (Funaki, Szymski, & Rosenblum, 1997; Rosenkrantz & Bansal, 2016). Thus faulty perception is the most important source of interpretive error in diagnostic imaging (Berlin, 2014; Donald & Barnard, 2012; Krupinski, 2010).
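The figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not a reported analysis: it back-calculates the annual study volume that the quoted error rate and error count would imply, and the number of errors attributable to detection failures. Only the three quoted values come from the text; the derived numbers and the function names are illustrative assumptions.

```python
# Back-of-the-envelope check of the error figures quoted above.
# Only the three quoted values are from the text; everything derived
# from them is an illustration, not a reported statistic.

ERRORS_PER_YEAR = 40_000_000          # approx. interpretive errors worldwide
ERROR_RATE_RANGE = (0.03, 0.04)       # 3% to 4% in typical clinical practice
DETECTION_SHARE_RANGE = (0.60, 0.80)  # share of errors that are detection/omission errors


def implied_study_volume(errors, rate):
    """Number of studies per year that would yield `errors` at error rate `rate`."""
    return errors / rate


def detection_errors(errors, share):
    """Number of errors attributable to detection/omission failures."""
    return errors * share


if __name__ == "__main__":
    for rate in ERROR_RATE_RANGE:
        volume = implied_study_volume(ERRORS_PER_YEAR, rate)
        print(f"error rate {rate:.0%}: about {volume / 1e9:.2f} billion studies per year")
    for share in DETECTION_SHARE_RANGE:
        n = detection_errors(ERRORS_PER_YEAR, share)
        print(f"detection share {share:.0%}: about {n / 1e6:.0f} million errors per year")
```

Under these assumptions, the quoted rate and count are mutually consistent with a worldwide volume on the order of one billion studies per year, and detection failures would account for tens of millions of errors annually.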

Radiologic image viewing as a visual search task

Radiologic image viewing is essentially a specialized visual search task: the first step in medical imaging is the detection of medically relevant information in an image (e.g., nodules in a chest x-ray [CXR]), by searching for abnormalities amid normal anatomy and physiology (e.g., normal lung tissue). Visual features play a critical role in such tasks: search is generally faster and targets easier to detect and recognize when target features are dissimilar to those of the background and nontarget distractors (Alexander, Nahvi, & Zelinsky, 2019; Alexander & Zelinsky, 2011; Alexander & Zelinsky, 2012; Duncan & Humphreys, 1989; Ralph, Seli, Cheng, Solman, & Smilek, 2014; Treisman, 1991). Outside the field of radiology, researchers often decompose visual search tasks into two stages: the initial detection of various simple target features across the visual field, and the subsequent deployment of attention and gaze to specific objects, in which more complex visual information is available on foveation (Alexander & Zelinsky, 2012; Alexander & Zelinsky, 2018; Castelhano, Pollatsek, & Cave, 2008; Wolfe, 1994a). Radiologic search tasks are similarly thought to involve an initial feature-processing step across the visual field, followed by foveation of specific objects (Krupinski, 2011; Kundel, Nodine, Conant, & Weinstein, 2007; Nodine & Kundel, 1987; Sheridan & Reingold, 2017). Although there is a consensus on a few features (color, motion, orientation, and size) used by the human visual system to guide attention and eye movements during search tasks, the exploration of other potential features has been limited due to dissenting opinions, insufficient data, and alternative explanations for observed data patterns (Wolfe & Horowitz, 2004; Wolfe & Horowitz, 2017). Further, even if the visual system can use a feature, it does not mean that the feature will be used (Alexander et al., 2019). 
As a result, the features that guide search in any given context—including medical image viewing—are not yet fully known.

Types of errors in radiologic search

Some authors have proposed that errors in medical image perception can be best understood as stemming from different aspects of the task—leading to three general types of errors: search, recognition, and decision errors (Kundel, Nodine, & Carmody, 1978). “Scanning errors” (also called “search errors”) result from failures in the first stage of search. Specifically, peripheral information fails to guide the observer's gaze to a relevant location, and high-resolution foveal vision does not assist interpretation because the observer never looks at the location directly (Doshi et al., 2019; Holland, Sun, Gackle, Goldring, & Osmar, 2019; Ukweh, Ugbem, Okeke, & Ekpo, 2019). In “recognition errors,” abnormalities are foveated too briefly for the observer to correctly recognize them (Holland et al., 2019). Depending on the imaging modality, the foveation time that suffices to prevent recognition errors varies from 500 to 1000 ms (Hamnett & Jack, 2019; Holland et al., 2019). “Decision-making errors” occur when the observer either fails to recognize relevant features or actively dismisses them, despite foveating an abnormality for a relatively long period of time (Baskaran et al., 2019; Holland et al., 2019). Roughly one-third of omission errors falls under each of the earlier mentioned three categories (Kundel et al., 1978). Traditionally, decision-making errors have been considered “cognitive” errors (in which the abnormality is visually detected but the meaning or importance of the finding is not correctly understood or appreciated), and both scanning and recognition errors have been considered “perceptual” errors (in which an abnormality is not observed) (Kundel, 1989). However, all three kinds of errors may result from perceptual failures, and all three kinds of errors might reflect expectation or cognitive biases, making it sometimes impossible to determine whether a given search, recognition, or decision-making error is perceptual or cognitive in nature. 
Indeed, both foveal and peripheral search performance might rely on the same perceptual features, rather than on different features for peripheral versus foveal search (Maxfield & Zelinsky, 2012; Nakayama & Martini, 2011; Zelinsky, Peng, Berg, & Samaras, 2013; but see Alexander & Zelinsky, 2018). Thus an observer looking specifically for dark, oriented bands might fail to foveate a bright, round lesion for the same reasons they might fail to recognize it if foveated.
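The three-way taxonomy described above is usually operationalized from eye-tracking data by how long a missed abnormality was foveated. The sketch below illustrates that logic under stated assumptions: the input is a hypothetical cumulative dwell time (in milliseconds) on each missed lesion, and the recognition threshold defaults to 1000 ms, chosen from the 500 to 1000 ms range mentioned in the text. It is not the procedure of any cited study.

```python
from collections import Counter


def classify_missed_lesion(dwell_ms, recognition_threshold_ms=1000.0):
    """Label a missed lesion by how long it was foveated.

    dwell_ms: cumulative foveation time on the lesion (hypothetical input).
    recognition_threshold_ms: assumed dwell time that suffices for recognition;
        the text gives a modality-dependent range of 500 to 1000 ms.
    """
    if dwell_ms < 0:
        raise ValueError("dwell time cannot be negative")
    if dwell_ms == 0:
        return "scanning"       # never foveated
    if dwell_ms < recognition_threshold_ms:
        return "recognition"    # foveated too briefly
    return "decision"           # foveated long enough, yet still missed


def summarize_misses(dwell_times_ms, recognition_threshold_ms=1000.0):
    """Count how many missed lesions fall into each error category."""
    return Counter(
        classify_missed_lesion(d, recognition_threshold_ms) for d in dwell_times_ms
    )


if __name__ == "__main__":
    # Made-up dwell times for six missed lesions.
    print(summarize_misses([0, 0, 250, 700, 1400, 2100]))
```

As the surrounding text cautions, a dwell-time label of this kind says nothing about whether the underlying failure was perceptual or cognitive; it only sorts misses by gaze behavior.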

Visual search in 3D volumetric imaging

Searching through CXRs and other two-dimensional (2D) images is superficially similar to search tasks conducted in traditional laboratory settings. However, search through three-dimensional (3D) volumetric images involves a qualitatively different process than that through 2D images. When reading a computed tomography (CT) or a magnetic resonance imaging scan, a radiologist must scroll through a stack of images—thin slices of the 3D volume of an organ (Nakashima, Komori, Maeda, Yoshikawa, & Yokosawa, 2016). When searching for lung nodules in an image stack from a chest CT, a common strategy is to restrict one's eye movements to a small region of the image, while quickly scrolling (i.e., “drilling”) through the stack (Drew et al., 2013). An alternative strategy is to change depth more slowly, and make eye movements across a larger area of the image (Drew et al., 2013) (Figure 1). Similar patterns have been found in searches through digital breast pathology images, in which observers “zoom” in and out of a single image, rather than scroll through image stacks (Mercan, Shapiro, Brunyé, Weaver, & Elmore, 2018). These strategies may depend on the body part imaged, with “scanning” being more likely in studies of larger body parts: when intending to search a small anatomic region, it makes little sense to make large eye movements. Thus radiologists typically adopt a “driller” strategy when viewing CTs of the abdomen and pelvis (Kelahan et al., 2019). Although some searches through these anatomic regions involve slower changes in depth—and those radiologists might be characterized as “scanners” (as opposed to “drillers”)—their search patterns still qualitatively resemble the “driller” pattern shown in Figure 1. Regardless, and despite any differences in strategy, the vast amount of 3D data that radiologists must scrutinize effectively prevents the careful foveation of each image region within a CT stack (Eckstein, Lago, & Abbey, 2017; Miller et al., 2015). 
Therefore, because some image regions are seen only peripherally on some slices, peripheral vision is especially important in searches of CT images.

Figure 1.

Description of 3D scan paths from Drew et al. (2013), who recorded eye position in each quadrant (left panel) as observers scrolled through CT scans in depth. Color indicates the quadrant of the image the radiologist was looking at during a given time in the trial. “Depth” on the y-axis refers to the 2D orthogonal slice of the scan currently viewed. In this study, radiologists looking for nodules on chest CTs could be characterized into two groups based on their search strategies. “Drillers,” such as the radiologist whose data appear in the middle column, tend to look within a single region of an image while quickly scrolling back and forth in depth through stacks of images. “Scanners,” such as the radiologist whose data appear in the right column, scroll more slowly in depth, and typically do not return to depths that they have already viewed. Scanners make more frequent eye movements to different spatial locations on the image, exploring the current 2D slice in greater detail. Note that although scanners spend more time than drillers making saccades per slice, neither scanners nor drillers visit all four quadrants of the image on every slice. Thus some regions of some slices may never be viewed foveally by either group. (Reprinted from Drew et al., 2013).

As peripheral vision cannot provide the kind of fine spatial discrimination that characterizes foveal vision, detectability of certain lesions can differ between searches through 3D or 2D images. For instance, Eckstein et al. (2017) found higher detectability for calcifications in 2D single slice images, and relatively improved detection of masses in 3D volumetric imaging, lending support to the notion that observer performance in 2D search tasks might not generalize to that in volumetric searches. To address this possibility, Wen et al. (2016) used a novel dynamic 3D saliency approach to model naive observers’ gaze distributions in images designed to mimic lung CTs. They found that the dynamic 3D saliency model predicted gaze distributions better than the traditional saliency approach, especially for observers who scrolled quickly in depth. Wen et al. (2016) proposed that the success of the dynamic saliency model might result from the human visual system's sensitivity to optic flow (the apparent motion of objects caused by the relative motion between observer and scene). However, evidence from other studies indicates that optic flow may not preattentively guide attention or eye movements (Wolfe & Horowitz, 2004; Wolfe & Horowitz, 2017).
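The "driller" and "scanner" descriptions in this section can be turned into a rough heuristic. The sketch below is purely illustrative: it assumes a hypothetical recording format of time-ordered (slice index, quadrant) gaze samples, and the two thresholds are invented for the example. It is not the classification criterion used by Drew et al. (2013), which is not given in this text.

```python
def count_depth_reversals(depths):
    """Count changes of scroll direction in a time-ordered list of slice indices."""
    reversals = 0
    previous_direction = 0
    for current, following in zip(depths, depths[1:]):
        step = following - current
        if step == 0:
            continue  # gaze sample on the same slice, no scrolling
        direction = 1 if step > 0 else -1
        if previous_direction and direction != previous_direction:
            reversals += 1
        previous_direction = direction
    return reversals


def mean_quadrants_per_slice(samples):
    """Average number of distinct quadrants fixated per slice,
    pooled over all visits to that slice."""
    quadrants_by_slice = {}
    for depth, quadrant in samples:
        quadrants_by_slice.setdefault(depth, set()).add(quadrant)
    return sum(len(q) for q in quadrants_by_slice.values()) / len(quadrants_by_slice)


def label_strategy(samples, max_scanner_reversals=2, min_scanner_quadrants=2.0):
    """Label a trial 'scanner' or 'driller' using two hypothetical thresholds."""
    if not samples:
        raise ValueError("at least one gaze sample is required")
    reversals = count_depth_reversals([depth for depth, _ in samples])
    quadrants = mean_quadrants_per_slice(samples)
    # Scanners: few returns to earlier depths, several quadrants explored per slice.
    if reversals <= max_scanner_reversals and quadrants >= min_scanner_quadrants:
        return "scanner"
    return "driller"


if __name__ == "__main__":
    up, down = list(range(10)), list(range(9, -1, -1))
    driller_trial = [(d, 1) for d in up + down + up + down]
    scanner_trial = [(d, q) for d in range(5) for q in (1, 2, 3)]
    print(label_strategy(driller_trial), label_strategy(scanner_trial))
```

A two-threshold rule like this treats the strategies as discrete, whereas the text notes that some readers scroll slowly yet still resemble drillers; a continuous score would likely describe real data better.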

Gaining expertise in medical image analysis

Expert radiologists efficiently direct their gaze to clinically relevant information using learned features in their peripheral vision (Kundel, 2015). That is, expert radiologists are better at “search,” finding abnormalities faster than novices because they need fewer eye movements to foveate an abnormality that they first detect peripherally (Drew et al., 2013; Manning, Ethell, Donovan, & Crawford, 2006). The initial stage of peripheral processing typically produces the greatest differences between experts and novices in radiologic search (Drew et al., 2013; Manning et al., 2006). Researchers have therefore begun to ask whether performance differences between expert and novice radiologists may be the result of underlying differences in search strategies (Brams et al., 2020; Wood, 1999; Wood et al., 2013). Search tasks that are initially performed slowly and inefficiently can become faster and efficient with practice, as the searchers learn task-relevant features (Frank, Reavis, Tse, & Greenlee, 2014; Sireteanu & Rettenbach, 2000; Steinman, 1987). Thus performance in one study's initially slow search task—searching for red-green bisected disks among green-red bisected disks—improved dramatically across eight training sessions spread over eight different days, and this improvement was still present when the same subjects were retested 9 months later (Frank et al., 2014). Radiologists similarly become faster at searching for abnormalities as they gain expertise (Krupinski, 1996; Kundel et al., 2007; Nodine, Kundel, Lauver, & Toto, 1996).

Perceptual learning in radiology

Like any other perceptual skill, the ability to detect radiologic abnormalities can improve through perceptual learning, that is, experience-induced improvements in the way perceptual information is extracted from stimuli. Perceptual learning techniques have been developed to accelerate the acquisition of perceptual expertise in domains as varied as flight training and mathematics, as well as in histopathology and surgery (Guerlain et al., 2004; Kellman, 2013; Kellman & Kaiser, 2016; Kellman, Massey, & Son, 2010; Krasne, Hillman, Kellman, & Drake, 2013). One example from radiology is that novice mammography film readers’ sensitivity toward low-contrast information in x-rays improves with increasing practice viewing x-ray images (Sowden, Davies, & Roling, 2000; Sowden, Rose, & Davies, 2002). Several studies have demonstrated transfer of learning from trained images to new images. For instance, performance improvements resulting from training through exposure to CXR images can transfer from positive contrast images to negative contrast images, and vice versa (Sowden et al., 2000). Recent research has shown that training to ascertain if abdominal CTs are consistent with appendicitis can transfer both to previously unseen abdominal CT images and to different image orientations (Johnston et al., 2020). Another recent study (Li, Toh, Remington, & Jiang, 2020) presented novice nonradiologist participants with pairs of CXRs, one of which always contained a tumor. Across four sessions, observers practiced discriminating which CXR contained the tumor and locating the tumor within the image. Perceptual learning resulted in discrimination performance improvements both for old images and for novel images. Transfer of learning to novel images has also been demonstrated when training to identify bone fractures on pelvic radiographs (Chen, HolcDorf, McCusker, Gaillard, & Howe, 2017). 
Studies on the effects of image variability have produced mixed results: whereas one study found that greater variability of training images led to greater transfer effects (Chen et al., 2017), another study found comparable performance after training with either a larger number of images or more repetitions of a smaller number of images (Li et al., 2020). The earlier mentioned evidence of limited transfer notwithstanding, improvements in detection or discrimination of visual stimuli are usually constrained to the particular tasks (Ball & Sekuler, 1987; Saffell & Matthews, 2003) or to the features (Ball & Sekuler, 1987; Fahle, 2005) used during training (see Karni & Sagi, 1991; Sagi, 2011; Watanabe & Sasaki, 2015 for reviews). Similarly, the perceptual skills that radiologists develop over the course of their training are restricted to specific radiologic image perception tasks. Indeed, radiologists are no better at performing nonradiologic search tasks than nonradiologists are (Nodine & Krupinski, 1998). Thus radiologic expertise does not arise from general perceptual improvements, but instead results from the learning of features and task demands specific to radiologic search tasks. This is consistent with findings from laboratory studies suggesting that perceptual learning primarily improves radiologic search as a result of the learning of task-relevant visual features (as opposed to the learning of other task demands [Frank et al., 2014]). Perceptual learning regimens have been shown to improve performance even in medical students who had already seen thousands of images prior to perceptual training (Krasne et al., 2013; Sowden et al., 2000). Further, even small amounts of practice in a relatively short interval can produce significant improvements in radiologic performance. 
For example, Krasne et al. (2013) used web-based perceptual and adaptive learning modules to enhance histopathology pattern recognition and image interpretation in a test group of medical students. The many short classification trials in the learning modules were combined with a continuous assessment of accuracy and reaction time, which was used both to track progress and to adapt trials to focus perceptual learning on the categories of patterns that needed the most practice. The training led to improved accuracy and reaction times from pretest to posttest, with a delayed posttest (6–7 weeks later) showing that much of this learning was retained. In a different perceptual learning study with other radiologic tasks, improvement was appreciated after just a few hours of training (Johnston et al., 2020). Detailed feedback may be particularly important for the perceptual learning of radiologic features: in people with no prior experience viewing mammographic images, sensitivity to lesions with complex visual structures only improved when feedback about both response correctness and correctness of the identified location of the lesion was provided (Frank et al., 2020). In such conditions, performance improvements were significantly retained 6 months after training. Using a different radiologic task, Johnston et al. (2020) similarly found stronger perceptual learning when participants received feedback about both accuracy and location, compared with accuracy alone. Training to recognize the often-subtle diagnostic features in radiology may thus benefit more from specific feedback during instruction than from learning the simple features that are often used in perceptual learning tasks (e.g., orientation). In particular, subtle structural feature layouts may be both difficult to perceive and vary substantially across different radiologic images. 
Although observers can be successfully trained in perceptual learning of difficult- or impossible-to-perceive patterns (Seitz, Kim, & Watanabe, 2009; Watanabe, Náñez, & Sasaki, 2001), explicit feedback can help radiologists learn how to better resolve subtle critical features (Seitz, 2020). This does not necessarily mean that feedback must be provided after each image: blocked feedback (i.e., explicit messages indicating the percent of images that were correctly diagnosed) has been shown to boost perceptual learning (Seitz et al., 2010). The main caveat for this general approach is that developing the perceptual skills needed for successful detection of abnormalities requires practice within the correct task context and with the correct set of features. Thus the key is: how do we determine the features that expert radiologists use when searching through medical images?

Saliency models and guiding features in radiologic images

In visual search, eye movements are often directed to the most “salient” or “informative” regions in an image (McCamy, Otero‐Millan, Di Stasi, Macknik, & Martinez-Conde, 2014; Otero‐Millan, Troncoso, Macknik, Serrano-Pedraza, & Martinez-Conde, 2008). Salient regions are thus a reasonable place for radiologists to explore first when searching medical images for clinically relevant abnormalities. Several studies have therefore attempted to use computational models of saliency to specify the features radiologists use when searching for abnormalities. Tests of saliency models may provide insights into the importance of different features in radiologic image viewing. For instance, if a saliency model fails to accurately predict human performance, it may be that the model neglects relevant visual features. Conversely, if a model identifies salient lesions that radiologists miss, the model may rely on overlooked image features that radiologists might incorporate in future searches. CXRs, the radiologic test most ordered in hospitals (Pirnejad, Niazkhani, & Bal, 2013), produce 2D images with overlapping structures that have different luminance or optical densities. The larger the difference in thickness or density in the anatomy between two structures (e.g., air and dense tissue), the larger the resulting difference in radiographic density or contrast. Salient regions in these images are therefore typically those that differ in density from the regions around them: for example, nodules differ in density from their surround, and can be salient on CXRs (Jampani, Ujjwal, Sivaswamy, & Vaidya, 2012). Alzubaidi, Balasubramanian, Patel, Panchanathan, and Black (2010) found that image regions that typically capture radiologists’ gazes in CXRs are characterized by oriented edges and textures. 
Jampani, Ujjwal, et al. (2012) moreover found that, in the case of CXRs demonstrating pneumoconiosis (a lung disease caused by breathing in certain kinds of dust), saliency maps generated using the graph-based visual saliency (GBVS) computational model performed relatively well compared with radiologists’ fixations. In a more recent and comprehensive study, Wen et al. (2017) evaluated 16 representative saliency models and ranked them by how well the saliency maps agreed with radiologists’ eye positions during interpretation of CXR, CT scans, and positron emission tomography (PET) scans. Relative to CXR, CT scans produce higher resolution views of nodules and other abnormalities, as well as better visualization of soft tissue. Scrolling through 3D CT scans can produce feature dimensions not present in 2D images, such as apparent motion and optic flow. In PET, the customary use of a dye containing radioactive tracers yields stronger signals in the regions to which the tracer flows, and therefore higher saliency. Wen et al. (2017) found that the rank orders of the models over medical images were different from the benchmark rank-order over natural images, and that the models’ performance differed across medical imaging modalities (Figure 2). This pattern is consistent with reports that radiologic search uses different features than other search tasks, and that the skills involved may differ across imaging modalities (Gunderman, Williamson, Fraley, & Steele, 2001; Nodine & Krupinski, 1998). Further, certain saliency models matched the performance of individual radiologists better than that of others (Wen et al., 2017). For example, the “Saliency in Context” or “Salicon” model—which operates very differently from most saliency models—uses high-level semantics in deep neural networks to recognize objects, rather than relying solely on low-level feature differences (Huang, Shen, Boix, & Zhao, 2015). 
As one might expect, the Salicon model did not correlate strongly with any of the other saliency models tested, but it outperformed most other models in predicting the eye movements of two participants (a radiologist faculty member and a fellow) when viewing CT scans. However, Salicon ranked 15th or 16th among 16 models for predicting the eye movements of each of the four other radiologists, suggesting that individual differences in gaze behavior and performance may result at least partly from different radiologists using different image features, or employing different task strategies, during visual search (Wen et al., 2017; see also Wen et al., 2016).
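Ranking saliency models by how well their maps agree with recorded eye positions requires an agreement score. The text does not state which score Wen et al. (2017) used, so the sketch below substitutes normalized scanpath saliency (NSS), a commonly used metric in saliency evaluation: the map is z-scored and the z-values at fixated pixels are averaged. The data layout (nested lists, (row, column) fixations) and the function names are assumptions made for illustration.

```python
from statistics import mean, pstdev


def nss(saliency_map, fixations):
    """Normalized scanpath saliency: mean z-scored saliency at fixated pixels.

    saliency_map: 2D list of floats (rows of pixel saliency values).
    fixations: non-empty list of (row, col) pixel coordinates.
    Higher values mean fixations landed on relatively more salient pixels.
    """
    values = [v for row in saliency_map for v in row]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    return mean((saliency_map[r][c] - mu) / sigma for r, c in fixations)


def rank_models(model_maps, fixations):
    """Return (model name, NSS) pairs sorted from best to worst agreement."""
    scores = {name: nss(smap, fixations) for name, smap in model_maps.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 3x3 maps and a single fixation at the image center (made-up data).
    toy_maps = {
        "center_peak": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
        "corner_peak": [[1, 0, 0], [0, 0, 0], [0, 0, 0]],
    }
    for name, score in rank_models(toy_maps, [(1, 1)]):
        print(f"{name}: NSS = {score:.3f}")
```

Computing such a ranking separately for each reader, rather than pooling fixations, is what would expose the kind of individual differences described above, where one model ranks near the top for some radiologists and near the bottom for others.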

Figure 2.

Examples of saliency models applied to PET, CT, and CXR. Saliency maps are represented as heat maps, with color indicating the saliency at that location: red is more salient than blue. The left column displays representative images. The middle column shows examples of saliency maps that accurately highlighted the regions of interest in the images. The right column shows examples of saliency maps that highlighted task-irrelevant regions in the images. Models with accurate predictions may provide insight into the features that radiologists use to view images, and models with inaccurate predictions may help narrow the list of potential features that need to be assessed. Image signature (ImgSig; Hou, Harel, & Koch, 2012), fast and efficient saliency (FES; Tavakoli, Rahtu, & Heikkilä, 2011), and RARE (Riche et al., 2013) were top-ranked models for PET, CT, and CXR. (Reprinted from Wen et al., 2017). SIM, Saliency by induction mechanisms; CovSal, Covariance saliency; AIM, Attention based on information maximization.

Visual features that have predicted the gaze patterns of expert radiologists in 2D medical image models include orientation, pixel intensity, and size (Alzubaidi et al., 2010; Jampani, Sivaswamy, & Vaidya, 2012). Intensity and orientation also predict the gaze patterns of expert radiologists in 3D volumetric images, as does optic flow, although the features relevant to 3D images may partly depend on the search strategies that radiologists adopt (e.g., “scanning” slices before scrolling in depth versus “drilling” more quickly in depth) (Wen et al., 2016; Wen et al., 2017). We note that the earlier mentioned saliency models do not consider different task strategies. Instead, these models are “bottom-up,” or solely driven by the low-level visual features of an image. The bottom-up modeling approach assumes that certain parts of an image are salient enough to be attended and looked at, regardless of the ongoing goals of the viewer (e.g., Theeuwes, 2004). 
Bottom-up features are particularly important for the detection of unexpected or incidental findings. Yet, attention and eye movements are also affected by “top-down” information: recurrent feedback processing can bias the direction of eye movements and attention as a result of the viewer's goals and expectations (Alexander et al., 2019; Chen & Zelinsky, 2006; Folk, Remington, & Johnston, 1992). Top-down mechanisms are known to play powerful roles in search, by restricting exploration to image regions that are likely to contain targets, by preventing salient but task-irrelevant features from capturing as much attention as task-relevant features (Alexander & Zelinsky, 2012; Chen & Zelinsky, 2006; Folk et al., 1992; Wolfe, Butcher, Lee, & Hyle, 2003), and by “filling-in” missing information about search targets (Alexander & Zelinsky, 2018). In the case of radiologists, previous knowledge and expertise (including their a priori expectations about a task) can change the features they may use to accomplish their task goals. Further, the interpretation of medical imaging is highly task-dependent: radiologic expertise in one domain does not necessarily rely on the same skills as expertise in another domain (Beam, Conant, & Sickles, 2006; Elmore et al., 2016; Gunderman et al., 2001). Task demands can similarly change within a specific domain and therefore affect search behavior: for example, in patients with known renal cell cancer, radiologists may prolong their search of the lungs, looking carefully for nodules that might represent metastatic lesions. Thus “top-down” features likely play a large role in radiologic search, in which relevant information may not be salient from a low-level perspective. For instance, a fat-containing tumor may resemble normal fat—and therefore fail to be detected by bottom-up models—but could be displaced spatially, appearing in a location where there should not be fatty tissue. 
Similarly, air in the lungs is normal, but air in the heart is abnormal. In addition, knowledge of the typical appearance of lesions and the normal appearance of the surrounding organ/tissue also aids radiologists in searching for lesions. Understanding the patient's clinical scenario can help ensure that the radiologist carefully examines the relevant regions of the image with the right features in mind. Consequently, including top-down information is a major challenge in developing accurate models of radiologic search. Computational models may theoretically make use of such top-down information, and some have begun to do so. Jampani et al. (2012) found that although the lung regions cover only approximately 40% to 50% of the area in a typical 2D CXR, they contain approximately 84% of all fixations, which is indicative of their top-down importance. A modified saliency model that combined bottom-up and top-down saliency (and thus avoided unimportant image regions) performed better than the standard bottom-up GBVS model in predicting eye fixations (Jampani, Ujjwal, et al., 2012). To incorporate top-down factors in their saliency models, Wen et al. (2017) segmented relevant anatomic regions (i.e., the liver and aorta in CT images), so that other regions (e.g., the kidney on CT images) were not allowed to be salient.
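The simplest form of the top-down constraint described above is to suppress saliency outside a segmented anatomic region, and the region's importance can be gauged by comparing its share of fixations with its share of image area. The sketch below shows both steps under stated assumptions: a hard binary mask stands in for the segmentation, and the nested-list data layout and function names are hypothetical. It is not the implementation of Jampani et al. (2012) or Wen et al. (2017), whose combination rules are not detailed in this text.

```python
def apply_region_mask(saliency_map, region_mask):
    """Zero out saliency wherever the binary region mask is 0 (hard-mask assumption)."""
    return [
        [value if inside else 0.0 for value, inside in zip(s_row, m_row)]
        for s_row, m_row in zip(saliency_map, region_mask)
    ]


def region_area_share(region_mask):
    """Fraction of image pixels that belong to the region."""
    cells = [inside for row in region_mask for inside in row]
    return sum(1 for inside in cells if inside) / len(cells)


def fixation_share_in_region(fixations, region_mask):
    """Fraction of (row, col) fixations that land inside the region."""
    inside = sum(1 for r, c in fixations if region_mask[r][c])
    return inside / len(fixations)


if __name__ == "__main__":
    # Toy 2x4 image: the left half plays the role of the relevant region.
    mask = [[1, 1, 0, 0],
            [1, 1, 0, 0]]
    saliency = [[0.2, 0.1, 0.9, 0.3],
                [0.4, 0.5, 0.8, 0.1]]
    fixations = [(0, 0), (1, 1), (1, 0), (0, 1), (0, 2)]  # made-up gaze data

    print(apply_region_mask(saliency, mask))
    print(f"area share: {region_area_share(mask):.0%}")
    print(f"fixation share: {fixation_share_in_region(fixations, mask):.0%}")
```

In the toy data, the most salient pixels lie outside the region and are removed by the mask, while the region holds 50% of the area but 80% of the fixations, mirroring the pattern reported for lung regions in CXRs.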

Potential future approaches

As promising candidate features continue to be identified by models that accurately predict radiologist performance, further studies will need to confirm that these are the features radiologists do use, rather than features that are correlated with actual critical features. One approach to confirming that the correct features have been identified is to create faux radiologic images that are matched to real images in terms of such features. Recently, Semizer, Michel, Evans, and Wolfe (2018) took this approach, testing whether a texture model (the Portilla-Simoncelli texture algorithm; Portilla & Simoncelli, 2000) accurately captured features used by radiologists (for similar approaches in other domains, see Alexander, Schmidt, & Zelinsky, 2014; Rosenholtz, Huang, & Ehinger, 2012; Rosenholtz, Huang, Raj, Balas, & Ilie, 2012). This study used the texture algorithm to generate faux images that perfectly matched real medical images in terms of the modeled texture features, but which were otherwise different. When no visible lesion was present in the image, radiologists’ performance was comparable for real images and faux images, suggesting that both types of images were equivalent in terms of the critical features that radiologists use to render judgment. However, when visible lesions were present, radiologists performed better with real than with faux images, suggesting that they used additional information other than texture to conduct the task (Semizer et al., 2018). One possibility is that the spatial relationships between image elements are also important: the Portilla-Simoncelli texture algorithm discards spatial relationships between local features, thereby removing the global or configural shape, which previous research has found to be important in the targeting of visual attention during search (Alexander et al., 2014). We believe that the texture approach can prove fruitful in conjunction with the correct features or model. 
Although a simple grayscale object (e.g., a white bar on a black background) can be described by first-order cardinal image statistics (including contrast, spatial frequency, position, entropy, and orientation), none of these dimensions individually indicates that a radiologic image region is normal or abnormal. These dimensions may guide the targeting of eye movements (Wolfe, 1994a), but are not necessarily task relevant, and therefore do not provide a complete picture of the relationship between informativeness and ocular targeting (McCamy et al., 2014). The identification of some combinations of features has proven efficient in other contexts (Rappaport, Humphreys, & Riddoch, 2013; Wildegger, Riddoch, & Humphreys, 2015; Wolfe, 1994b; Wolfe et al., 1990), and may apply to efficient radiologic search. Krupinski, Berger, Dallas, and Roehrig (2003) found that individual features (signal-to-noise ratio, size, conspicuity, location, and calcification status) did not predict the gaze patterns of radiologists, but their combination did. Although Semizer et al. (2018) suggested that more than texture is involved in radiologic search, their model might have relied on the wrong features or combination of features. Because texture space is so large, with many potential dimensions and feature values along those dimensions, it remains unknown which values are important. However, just as normal images may be represented as statistical combinations of features (i.e., visual texture), abnormal images are theoretically detectable as deviations from such statistical combinations or textures. That said, understanding eye movements will be important for determining these features. One of the main results from the study of radiologic search is that experts can find tumors faster than novices, meaning that experts can find relevant features in their peripheral vision and direct their central vision to those features (Drew et al., 2013; Manning et al., 2006).
A key difference between expert and novice radiologic search is therefore the ability to detect critical peripheral features. Whereas a general approach in radiology has been to determine the features that distinguish normal from abnormal image regions (a process that has aided the growth and development of computer-aided detection and diagnostic tools), we propose that future work should be directed to identifying the key features that experts locate in their visual periphery. Such features, critical for radiologists’ peripheral advantage, have not yet been determined. Longitudinal studies of how gaze dynamics change as a function of expertise may be particularly valuable toward this goal. Once these features are known, follow-up studies may develop heuristics based on them, to help optimize radiologists’ eye movements and foveation during radiologic screening. Many of the approaches outlined earlier use purely visual information, without considering the additional knowledge that radiologists rely on, such as the patient's history or the context in which a test was ordered. Although much can be achieved through visual-only approaches, current models are likely to constitute the first few steps along the way to future models that may combine bottom-up visual information with oculomotor behavior, top-down expectations, and other contextual data (e.g., patient history) to more fully characterize expert radiologists’ perceptual experience. As current understanding of the features that radiologists use becomes more refined, it will become possible to design heuristics for radiologists in training, which may specifically target the perceptual learning of relevant image features. Such tools have the potential to enhance student instruction, decrease perceptual errors, and help improve patient health and well-being.
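The notion raised in this section, that abnormal regions may be detectable as deviations from the statistical combination of features that characterizes normal tissue, can be made concrete with a small sketch. It is not taken from any study cited here: Mahalanobis distance is used only as one simple stand-in for "deviation from a feature combination", and the feature vectors (e.g., contrast and spatial-frequency measurements per image region) are assumed to have been extracted already. The names `fit_normal_model` and `deviation_score` are hypothetical.

```python
import numpy as np

def fit_normal_model(normal_features):
    """Summarize normal regions by the mean and inverse covariance
    of their feature vectors.

    normal_features : array of shape (n_regions, n_features), one row
                      per normal image region (hypothetical input).
    """
    x = np.asarray(normal_features, dtype=float)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # Pseudo-inverse tolerates a near-singular covariance matrix.
    inv_cov = np.linalg.pinv(np.atleast_2d(cov))
    return mean, inv_cov

def deviation_score(features, mean, inv_cov):
    """Mahalanobis distance of one feature vector from the normal model.

    Larger values mean the combination of features is less typical of
    normal regions, even if each feature alone looks unremarkable.
    """
    d = np.asarray(features, dtype=float) - mean
    return float(np.sqrt(max(d @ inv_cov @ d, 0.0)))
```

The design choice mirrors the point about feature combinations: when two features are strongly correlated in normal tissue, a region in which each value is individually ordinary but the pair violates that correlation receives a high score, whereas thresholding either feature alone would miss it.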

Conclusions

Perceptual errors in radiology are a significant contributor to patient harm (Waite et al., 2019; Waite et al., 2017). Educational and practical interventions to improve human perceptual and decision-making skills are therefore needed to improve diagnostic accuracy and to reduce medical error (Ekpo, Alakhras, & Brennan, 2018; Waite et al., 2020; Waite et al., 2019). However, the features used by expert radiologists during visual inspection of medical images are not yet well specified. Feature-based modeling approaches, including the saliency models discussed in this review, may help isolate such features—a practical matter critically important to the perceptual training of future radiologists, and ultimately to patient safety.
