Daniel T Huff1, Amy J Weisman1, Robert Jeraj1,2. 1. Department of Medical Physics, University of Wisconsin-Madison, Madison WI, United States of America. 2. Faculty of Mathematics and Physics, University of Ljubljana, Ljubljana, Slovenia.
Abstract
Deep learning (DL) approaches to medical image analysis tasks have recently become popular; however, they suffer from a lack of human interpretability critical for both increasing understanding of the methods' operation and enabling clinical translation. This review summarizes currently available methods for performing image model interpretation and critically evaluates published uses of these methods for medical imaging applications. We divide model interpretation into two categories: (1) understanding model structure and function and (2) understanding model output. Understanding model structure and function summarizes ways to inspect the learned features of the model and how those features act on an image. We discuss techniques for reducing the dimensionality of high-dimensional data and cover autoencoders, both of which can also be leveraged for model interpretation. Understanding model output covers attribution-based methods, such as saliency maps and class activation maps, which produce heatmaps describing the importance of different parts of an image to the model prediction. We describe the mathematics behind these methods, give examples of their use in medical imaging, and compare them against one another. We summarize several published toolkits for model interpretation specific to medical imaging applications, cover limitations of current model interpretation methods, provide recommendations for DL practitioners looking to incorporate model interpretation into their task, and offer general discussion on the importance of model interpretation in medical imaging contexts.
Recently, artificial intelligence (AI) and, more specifically, deep learning (DL) approaches have achieved state-of-the-art results for many medical imaging tasks, including image segmentation (Hu ; Kamnitsas ; Roth ), disease detection and diagnosis (Gao and Noble, 2017; Roth ; Kim ; Huynh ; Roth ), and image classification (Yang ; Chen and Shi, 2018; Van Molle ; Shen and Gao, 2018; Yi ). The workhorse of DL applications for medical imaging is the convolutional neural network (CNN). CNNs are a type of deep learning model that take images as input and consist of a series of convolutional layers and non-linear activations, the behavior of which is tuned by weights and biases learned during the model training process (Krizhevsky ).

While the popularity of using CNNs to perform medical image analysis tasks has increased rapidly in the past few years, a criticism often raised against them is their “black box” nature, meaning the internal structure of a CNN is not conducive to providing a simple explanation of why a given input produces a corresponding output. To address this, researchers have developed model interpretation techniques and tools that aim to explain or visualize the decision-making process of CNNs. Interpretable models and interpretation methods in medical imaging have been the topic of several recent reviews and editorials (Jia ; Reyes ; Gastounioti and Kontos, 2020).

Model interpretation is of particular importance for medical imaging applications due to the complexity and high-stakes nature of medical decisions. An incorrect diagnosis or failure to detect disease can be highly detrimental to patient care, so contributions of a deep learning model to medical decision-making processes need to be explainable in order to gain clinician trust (Hengstler ; Nundy ). In fact, clinicians and patients alike have advocated for increased transparency in medical imaging applications of deep learning.
In a recent review, (Hosny ) notes that the “lack of transparency [of deep learning models] makes it difficult to predict failures, isolate the logic for a specific conclusion or troubleshoot inabilities to generalize to different imaging hardware, scanning protocols and patient populations.” From a patient perspective, (Andrews, 2017) stresses the need for model interpretation with an analogy, likening a radiologist’s use of AI to provide a medical diagnosis to a car mechanic’s use of a computer diagnostic tool to provide an automotive diagnosis. The use of the technology by both the radiologist and the mechanic is helpful, but only so long as the technology can provide an explanation in terms the patient can understand.

As the use of AI tools in medicine becomes more widespread, additional legal requirements for model interpretability could become relevant. Article 22 of the General Data Protection Regulation (GDPR) adopted by European Union member states in 2018 contains requirements for automated decision-making, and some have argued that this could have implications for the explanation of AI models in healthcare (Selbst and Powles, 2017).

The overall goal of this review is to describe existing approaches to model interpretation, provide examples of their application to medical imaging, provide recommendations for deep learning practitioners looking to incorporate model interpretation into their task, and offer some general discussion on the importance of model interpretation in medical imaging contexts. Broadly, we divide the approaches to model interpretation into two categories: (1) understanding model structure and function (Section 2) and (2) understanding model predictions (Section 3), as described in Figure 1. Approaches in (1) primarily concern the hidden layers of the model: looking at hidden layer filters and features and visualizing or utilizing latent representations of data within the model.
In contrast, approaches in (2) primarily concern the output of the model; these techniques produce heatmaps that indicate which parts of an image are important to the model output.
Figure 1:
Model interpretation techniques can focus either (1) on increasing understanding of internal model structure and function, or (2) on increasing understanding of model output. These two approaches to model interpretation are covered in review Section 2 and Section 3, respectively.
Model interpretation closely follows the model development process as illustrated in Figure 2. Inspecting the model filters and feature activations can provide insight during the model training process (covered in Section 2.1), while techniques for reducing the dimension of high-dimensional data can come into play during both data collection and model deployment (covered in Section 2.2). Post-hoc attribution-based techniques for model interpretation, such as saliency maps or grad-CAM, provide interpretation of model output, and are most relevant during model training and at model deployment (covered in Section 3).
Figure 2:
Overview of the model development process (bottom), overlaid with the aspects of model interpretation best suited to each phase (top), and the sections of the review in which they are covered. Different approaches to model interpretation are applicable at every step of the model development process, from initial data collection, through model training, to model deployment.
It is worth noting that there are several model interpretation toolkits published specifically with medical imaging in mind (covered in Section 4), which can facilitate application of interpretation methodologies. We conclude our review with specific recommendations for model interpretation best practices (Section 5), and discussion on the importance and limitations of model interpretation in medical contexts (Section 6).
Understanding model structure and function
When opening the “black box” of CNNs, the most direct approach to model interpretation is to look at the hidden layers of the network. This can be done in several ways, including direct inspection of the learned filters and feature maps, plotting high dimensional latent representations in two dimensions, or through employing models which learn useful latent representations.
Model filter and feature map visualization
A core component of modern CNNs is the convolutional layer (Goodfellow ). A convolutional layer takes the output of the previous layer as input, convolves it with a set of filters, sometimes also called features or kernels, and then applies a non-linear activation function. The output of the activation function is called a feature map or activation map and serves as input for the next layer (Figure 3A). CNNs learn features of varying complexity, often edges and corners in the first layers followed by more complex patterns in subsequent layers (Olah ). Understanding the features that the model learns, and how those features act on images as they pass through the model, can assist not only in ensuring the model is learning practical information, but also in connecting this information to patterns recognizable by humans.
Figure 3:
The parts of a convolutional layer are shown to make clear the distinction between a convolutional filter and a feature map (A). The input layer is convolved with a set of filters and an activation function is applied to generate a feature map. Comparison of learned model filters from (Yu et al., 2018) (B) and (Shin et al., 2016) (C). Both examples show filters learned via random initialization (RI) of filters and after transfer learning (TL). Filters in (B) are of size 3×3, and filters in (C) range from size 5×5 to 11×11.
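The filter-plus-activation mechanics described above can be sketched in a few lines of numpy. This is an illustrative single-channel example rather than a framework implementation; the helper names are ours:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid (no-padding) 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def conv_layer(image, filters, biases):
    """One convolutional layer: convolve the input with each filter, add a
    bias, and apply a ReLU activation. Returns one feature map per filter."""
    maps = np.stack([conv2d_valid(image, f) + b for f, b in zip(filters, biases)])
    return np.maximum(maps, 0.0)  # ReLU: non-zero values mean the feature fired
```

Passing an image with a vertical edge through a vertical-edge filter, for example, yields a feature map that activates only along the edge, which is exactly the kind of pattern first-layer filter inspection tries to surface.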
Visual inspection of convolutional filters
After training a CNN, the learned filters can be visualized by loading the trained model and accessing the saved model weights. The weights learned by the first convolutional layer are arguably the most useful for interpretation, as they act on the images directly. While subsequent layers may also provide useful information, their filters are more difficult to interpret because they act on feature maps from previous layers.

Within medical imaging, filter visualization has been used to compare filters across models and tasks. For example, (Roth ) compared filters learned for computer-aided detection (CADe) of sclerotic metastases, lymph nodes, and colonic polyps on 3D computed tomography (CT) images. The authors observed that filters learned for lymph node detection represented a blobby texture and gradients in different orientations, whereas filters learned for colonic polyp detection were visually more diverse.

The ability to visualize the filters themselves depends on many aspects of the model architecture, most notably the filter size. This can be seen by comparing the filters learned in (Yu ) with those learned in (Shin ), as shown in Figures 3B and 3C. The filters learned in (Yu ) were of size 3×3, from which only limited conclusions can be drawn given the small amount of information contained in just 9 filter elements. Conversely, commonly known CNNs like AlexNet, which utilize 11×11 filters in the first layer, have been applied to medical imaging tasks such as lymph node detection and prostate segmentation (Shin ; Roth ). This allows the visualization of much more complex patterns and shapes (Figure 3C).

Filter visualization can also be used to compare filters learned after training a model with random initialization (i.e., from scratch) to those learned using transfer learning (initial weights taken from a network trained for another task).
For example, filter visualization in (Shin ) indicated that AlexNet learned blurrier filters when random initialization was used, whereas transfer learning allowed more fine-tuning of higher-contrast and edge-preserving patterns (Figure 3C).

It should be noted that although larger filter sizes can arguably result in more interpretable filters, they require more memory. Commonly used 2D network architectures (e.g., AlexNet, GoogLeNet) generally utilize a combination of multiple filter sizes, whereas large filters are often not feasible in fully 3D models, as the memory demands of large kernels come at the cost of having fewer layers or fewer filters per layer (Kamnitsas ). However, memory limitations are likely to become less significant as GPU memory capacity expands over time.
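In practice, first-layer filters are visualized by rescaling each filter's weights independently so its pattern is visible regardless of absolute weight scale. A minimal sketch of that preprocessing step (the function name is ours; the rescaled array can then be passed to an image grid plotter such as matplotlib's imshow):

```python
import numpy as np

def normalize_filters_for_display(weights):
    """Rescale each first-layer filter to [0, 1] independently for display.

    `weights` has shape (n_filters, height, width). Constant (uninformative)
    filters are mapped to mid-gray so they do not dominate the color range."""
    w = weights.astype(float).copy()
    for k in range(w.shape[0]):
        lo, hi = w[k].min(), w[k].max()
        w[k] = (w[k] - lo) / (hi - lo) if hi > lo else 0.5
    return w
```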
Visual inspection of feature maps
A more intuitive way of visualizing the features learned by a CNN is to look at the feature maps, sometimes referred to as activation maps. Feature maps are the output of the CNN at each layer; that is, they are the result of convolving the input of the layer with the filters of that layer and then applying an activation function. Thus, non-zero values in a feature map indicate that a feature was activated. Networks with many feature maps that are all zero may indicate a problem with the training process.

Feature map visualization is a commonly used and straightforward model interpretation technique. It has been used in a wide variety of medical imaging tasks, including brain lesion segmentation on MRI images (Kamnitsas ), fetal facial plane recognition on ultrasound images (Yu ), classification of skin lesions on dermatology photographs (Van Molle ), and diagnosing Alzheimer’s disease with PET/MRI (Zhang ).

Visualization of feature maps allows users to connect features that a human may learn to identify with features that the CNN learns. For example, (Kamnitsas ) observed the network learning features to identify ventricles, cerebrospinal fluid (CSF), and white and gray matter on MRI images, indicating that differentiating between tissue types is useful for lesion segmentation. A similar finding was reported in (Van Molle ), where a CNN trained to classify skin lesions learned features corresponding to darker colors, skin types, lesion borders, and hair.
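The all-zero-map diagnostic mentioned above is easy to automate once a layer's feature maps have been extracted. A small sketch (our own helper, not tied to any framework) that reports the fraction of "dead" maps in a layer:

```python
import numpy as np

def dead_feature_map_fraction(feature_maps, tol=0.0):
    """Fraction of feature maps in a layer that never activate.

    `feature_maps` has shape (n_maps, height, width), e.g. the ReLU output of
    one layer for one input image. A large fraction of all-zero ("dead") maps
    can signal a problem with the training process."""
    maxima = np.abs(feature_maps).reshape(feature_maps.shape[0], -1).max(axis=1)
    return float(np.mean(maxima <= tol))
```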
Dimensionality reduction
As discussed above, the hidden layer filters and feature maps of a CNN can be visualized directly. However, CNNs often have upwards of thousands of features per layer. To visualize such high-dimensional data, techniques can be used that reduce the number of dimensions while maintaining meaningful relationships between data points. These methods take vectorized high-dimensional CNN features as input and produce a 2D summary that is easier to interpret.

Principal component analysis (PCA) is perhaps the most well-known and widely used dimensionality reduction algorithm. PCA transforms the input data into orthogonal principal components (PCs) which are linear combinations of the original features. PCs are ordered by descending variance, such that the first few PCs often contain most of the useful information in the data and the remaining ones can be discarded without substantial loss of information. However, PCA relies on linear transformations, which are often not sufficient to preserve the relationships present in very high-dimensional data (Maaten and Hinton, 2008).

T-distributed Stochastic Neighbor Embedding (tSNE), introduced in (Maaten and Hinton, 2008), is a nonlinear dimensionality reduction technique consisting of two main stages. First, a probability distribution is constructed over pairs of high-dimensional data points, in which the conditional probability of picking two objects is proportional to the similarity of those objects. Second, a similar probability distribution is created in a low-dimensional map (typically two dimensions). The Kullback-Leibler (KL) divergence between the two distributions is then minimized to ensure a good mapping, so that data points close together in the high-dimensional space remain close together in the low-dimensional space. tSNE is typically performed on the components of the last fully connected layer before the final classification layer.
Depending on the dimension of this layer, PCA may be applied before tSNE to reduce the computational demands of performing tSNE.

tSNE is commonly used in visualizing deep learning models because it preserves the local neighborhood structure of the data. It can thus be used for several applications, for example visualizing patterns and clusters across classes and detecting outliers. This technique has been applied to the classification of abdominal ultrasound images (Cheng and Malhi, 2017) as well as classification and anomaly detection in histopathology images (Faust ). An example of tSNE images generated from (Faust ) is shown in Figure 4A. In (Yu ), tSNE was performed not only on the components of the fully connected layers but also on vectorized 2D ultrasound images.
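The PCA preprocessing step described above can be written in a few lines with an SVD. This is a sketch with our own function name; in practice the returned array would then be fed to a tSNE implementation such as sklearn.manifold.TSNE:

```python
import numpy as np

def pca_project(features, n_components=50):
    """Project high-dimensional CNN features onto the top principal components.

    `features` has shape (n_samples, n_features). Centering followed by SVD
    gives the principal directions in the rows of Vt, ordered by descending
    variance; projecting onto the first few reduces the cost of running tSNE."""
    X = features - features.mean(axis=0)        # center each feature
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T              # scores on the top PCs
```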
Figure 4:
Examples of visualizing high-dimensional deep learning features in a 2D projection. (A) tSNE used to classify regions of histopathology images in (Faust et al., 2018). (B) Constraint-based embedding used to visualize brain MRI images from healthy controls and patients with a varying severity of Huntington disease (Plis et al., 2014).
Another approach to visualizing high-dimensional data was proposed in (Plis ) to assess whether their CNN models were learning useful information. The authors argue that, due to the complicated nature of tSNE, it is difficult to know whether a poor two-dimensional mapping of CNN features results from the tSNE process or from the deep learning process. Instead, the authors propose a constraint-based embedding technique that uses a divide-and-conquer algorithm, recursively breaking the problem into smaller sub-problems until each can be solved directly, and that explicitly outputs the constraints being satisfied. The constraint used in (Plis ) was that the k nearest neighbors of each point in the resulting 2D projection be the same as in the original space, where k is a tunable parameter. The authors apply their technique to visualize how well a deep belief network (DBN) can separate brain MRI images of schizophrenic and healthy patients at each layer of the network, showing increasing separation at deeper layers. Similarly, they separate brain MRI images of patients with and without Huntington disease, as shown in Figure 4B. The authors also note that neither tSNE nor the constraint-based embedding was able to separate patients when applied directly to the raw data.
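The k-nearest-neighbor constraint at the heart of this approach also makes a useful quality metric for any 2D embedding, tSNE included: measure how many of each point's high-dimensional neighbors survive in the projection. A brute-force sketch (function names ours):

```python
import numpy as np

def knn_preservation(X_high, X_low, k=5):
    """Mean fraction of each point's k nearest neighbors in the original
    high-dimensional space that remain among its k nearest neighbors in the
    low-dimensional embedding. 1.0 means the neighbor constraint is fully met."""
    def knn(X):
        d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # a point is not its own neighbor
        return np.argsort(d, axis=1)[:, :k]
    high, low = knn(X_high), knn(X_low)
    overlap = [len(set(h) & set(l)) / k for h, l in zip(high, low)]
    return float(np.mean(overlap))
```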
Autoencoders for learning latent representations
Autoencoders are a class of deep learning model commonly used for unsupervised feature learning (Vincent ), with applications in anomaly detection (Kiran ), image compression (Cheng ; Theis ), and representation learning (Tschannen ). Autoencoders for imaging applications are similar to CNNs in that they take images as input, but differ in that their output is not a label; rather, the output is intended to equal the input. Autoencoders consist of two stages: an encoder, which converts an input image into a latent representation, and a decoder, which reconstructs the image from the latent representation. Typically, the encoder and decoder are trained jointly, minimizing the reconstruction loss between input and output. However, multiple types of autoencoders with different structures and loss functions have been developed, including variational autoencoders (VAEs) (Doersch, 2016) and adversarial autoencoders (AAEs) (Makhzani ), among others.

Autoencoder use in medical imaging has focused predominantly on abnormality detection. In such applications, an autoencoder is first trained on many examples containing no abnormality (i.e., scans of healthy patients with no pathology). In this way, the encoder learns a latent representation of normal images. When abnormal test examples are then introduced, their abnormalities are not captured in the latent representation, and the decoder will struggle to accurately reconstruct the parts of the image containing the abnormality. As a result, abnormal images can be detected by assessing the difference between the input image and the model reconstruction. Simultaneously, the autoencoder can also localize the abnormality by highlighting parts of the image with high reconstruction loss.
In this way, autoencoders may be considered an interpretable form of deep learning model for image analysis tasks, because they can show where an image differs from what is expected based on a distribution of normal images.

This approach to abnormality detection has been applied to multiple medical imaging tasks. In (Uzunova ), a VAE was trained to reconstruct OCT retinal images of healthy patients. The autoencoder was then used to classify three retinal pathologies in a separate dataset. The authors also assessed the ability of the VAE to localize the pathology in the OCT image and compared the VAE to other visualization methods, concluding that their proposed VAE-perturbation method was well suited for explaining the output of their classifier. The authors applied the same methodology to brain MRI with similar results. An AAE is employed in (Chen ) to learn the distribution of healthy-subject brain MRI images and is then applied to test images of brain MRI containing lesions; the difference images between input and AAE reconstruction successfully localize lesions in the test images. A convolutional autoencoder was also used to detect nuclei in histopathology images by (Hou ), who developed an interesting approach to nuclei detection by combining learned latent representations with thresholding to separate images into a foreground containing nuclei and a background containing cytoplasm.
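The encode-decode-compare workflow can be illustrated without a neural network at all by using a linear encoder/decoder (PCA) as a stand-in for the trained autoencoder; the per-pixel reconstruction error then serves as the anomaly heatmap. This is a toy sketch under that substitution, with our own function names:

```python
import numpy as np

def fit_linear_autoencoder(normal_images, n_latent=8):
    """Fit a linear encoder/decoder (PCA) on flattened normal images.
    A linear stand-in for a convolutional autoencoder trained on healthy scans."""
    X = normal_images.reshape(len(normal_images), -1).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_latent]

def anomaly_heatmap(image, mean, components):
    """Encode, decode, and return the squared per-pixel reconstruction error.
    Regions the 'normal' model cannot explain light up in the heatmap."""
    x = image.reshape(-1).astype(float)
    code = components @ (x - mean)              # encoder
    recon = mean + components.T @ code          # decoder
    return ((x - recon) ** 2).reshape(image.shape)
```

Trained on uniform "healthy" images, the model reconstructs overall brightness well but cannot explain a localized bright spot, so the heatmap peaks at the abnormality.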
Understanding model predictions
In contrast to model interpretation methods that involve visualizing intermediate network features or learned representations of data, other approaches to model interpretation attribute the model output to different parts of the input image. In general, they produce heatmaps that describe the importance of different parts of an image to the model decision on a pixel-by-pixel basis. Most attribution-based interpretation methods function as post-hoc explanations: they are only meaningful when applied to a fully trained model and should therefore be applied after training has been completed. The available approaches generally fall into three groups: perturbation-based approaches (Section 3.1), backpropagation- or gradient-based approaches (Section 3.2), and decomposition-based approaches (Section 3.3). In addition to post-hoc attribution, so-called attention maps can be produced by trainable attention modules, which can be added to typical CNN architectures (Jetley ). This approach to model interpretation is covered in Section 3.4.
Perturbation-based methods
Perturbation-based approaches to model interpretation involve altering different parts of an image and observing how those perturbations change the output of the model. Common to the approaches in this section is the underlying idea that perturbing important parts of an image strongly affects the model output, while perturbing unimportant parts leaves the output largely unchanged. These approaches can be thought of as a type of sensitivity analysis, testing the effect of small changes in model input on model output.
Occlusion
Occlusion as a means of performing model interpretation was first introduced in (Zeiler and Fergus, 2014). This technique consists of systematically occluding parts of an image and monitoring how strongly the perturbation influences model output. Image parts that, when occluded, strongly affect the output of the model are assigned high importance, while image parts that have little effect on model output when occluded are of low importance. Kermany et al. employed occlusion as a way to perform model interpretation for the diagnosis of retinal pathologies in optical coherence tomography images (Kermany ). Several occlusion-produced heatmaps from this application are shown in Figure 5A.
Figure 5:
Example uses of perturbation-based attribution methods for model interpretability. In (A), (Kermany et al., 2018) employed occlusion in order to visualize CNN-based diagnosis of retinal pathologies in optical coherence tomography images. (B) compares several approaches to interpretation for identifying congestive heart failure on chest x-ray (Seah et al., 2018). In (C), (Sayres et al., 2019) uses integrated gradients to visualize evidence of diabetic retinopathy on retinal fundus images.
One drawback of occlusion is that it amounts to performing inference on many slightly perturbed versions of the image, which is computationally expensive. This drawback is especially pertinent if a high-resolution heatmap is desired, as inference must be performed on as many perturbed images as there are occluded patches in the desired heatmap.
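The occlusion procedure itself is simple to sketch: slide a patch across the image, re-run the model, and record the drop in the class score. A minimal, model-agnostic version (the function name and the `predict` callable are our own; any trained classifier returning a scalar class score could be plugged in):

```python
import numpy as np

def occlusion_heatmap(image, predict, patch=2, fill=0.0):
    """Slide an occluding patch over the image; the heatmap records how much
    the model's class score drops when each region is hidden.

    `predict` maps a 2D image to a scalar score for the class of interest.
    The heatmap resolution is set by the patch size, and one forward pass is
    needed per patch position, which is what makes occlusion expensive."""
    base = predict(image)
    h, w = image.shape
    heat = np.zeros_like(image, dtype=float)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            heat[i:i + patch, j:j + patch] = base - predict(occluded)
    return heat
```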
Local interpretable model-agnostic explanations (LIME)
LIME is an approach to model interpretation introduced by Ribeiro et al. (Ribeiro ). While LIME can be used to explain the predictions of any classifier, for this review we only consider LIME in the context of image models. LIME for images works by first identifying groups of contiguous pixels with similar intensities, called superpixels. The image is then perturbed by turning subsets of superpixels “off”, replacing the value of all pixels in a superpixel with that superpixel's mean intensity. As with occlusion, changes in the model output due to the perturbation are used to identify how important each superpixel is to the model output, and a heatmap highlighting the important superpixels is produced.

Seah et al. used LIME to visualize the salient portions of chest radiographs for identifying congestive heart failure (Seah ). For their application, they found that LIME produced heatmaps that were less intelligible than those of their proposed Generative Visual Rationale method, as demonstrated in Figure 5B. An advantage of LIME over occlusion is that superpixels are more likely to correspond to semantically distinct parts of an image, whereas occlusion perturbs image patches in a systematic, uniform way, ignoring possible semantic similarity between adjacent pixels. LIME also uses less extreme perturbations than occlusion, as the intensities in the perturbed region are replaced by the mean intensity rather than zeros, although either method could be modified to remove this difference.
Integrated gradients
Integrated gradients was first introduced in (Sundararajan ). The integrated gradients method considers the input image and a baseline image of all zeroes. Starting from the baseline image, a set of intermediate images are produced along the path from the baseline to the input image. At each step along the path, the gradient of the model output with respect to each pixel in the intermediate image is computed. Then, these gradients are summed over the path from baseline to input image. This produces the heatmap of pixelwise importance desired. Formally, the integrated gradients heatmap IG(x) produced for a given input image x and baseline image x’ is given by:
where the function F: Rn → [0,1] represents the CNN model.In its introductory paper, the authors demonstrate the use of integrated gradients for providing model explanations for detecting diabetic retinopathy on retinal fundus images. This use is expanded upon in (Sayres ), where ophthalmologists were tasked with grading the severity of diabetic retinopathy both with and without the explanatory heatmap produced by integrated gradients. Figure 5C shows an example of images provided to clinicians with and without interpretation. The authors report that readers provided with model-predicted grades and heatmaps graded patients with diabetic retinopathy more accurately than readers without any model assistance.
Backpropagation- or gradient-based methods
Backpropagation is the method by which weights in a neural network are updated during the model training process. Model interpretation methods in this section do not actually update model weights as occurs during training; rather, they rely on backpropagation to compute gradients, and these gradients are combined in different ways to visualize salient parts of an image.
Saliency maps
Saliency maps, introduced in 2013 by (Simonyan ), use gradients to visualize the classification of an image evaluated by a deep convolutional network. In the introductory paper, the authors offer two uses for saliency maps: class maximization visualization and image-specific class saliency maps.Class maximization uses gradient ascent to produce an image that maximizes the activation of that class, and therefore can be interpreted as being most representative of that class. Formally, class maximization finds an image I of class c for which a class score Sc is maximized:
where λ is a regularization parameter.In (Yi ), class maximization is used to produce visualizations of maximally malignant and maximally benign breast masses to help interpret the performance of a network trained for mammogram classification. Figure 6A contains examples of the class maximization visualizations for benign and malignant breast masses. The authors note that the maximally malignant visualization appears to contain a highly spiculated mass, a visual feature used by radiologists to identify malignant breast masses.
Figure 6:
Backpropagation-based approaches to model interpretation include: (A) class maximization visualization of malignant and benign breast masses on mammogram (Yi et al., 2017), (B) weakly supervised detection of extra perivascular spaces on brain MR via saliency mapping (Dubost et al., 2019), and (C) class activation map visualization for classifying breast masses on mammograms (Kim et al., 2018).
Image-specific class saliency maps are image- and class-specific heatmaps that represent the importance of individual pixels to the assignment of the image to a class, providing an assessment of which parts of an image are most important to the model. Saliency mapping is sometimes also referred to as “sensitivity analysis”, but it should be noted that it is a separate technique from the perturbation-based methods outlined previously in Section 3.1. Here, the heatmap Sal(x) for a class c is computed directly as the derivative of the model output score F(x) with respect to each pixel in the input image x through backpropagation:Because of its simplicity, saliency mapping is one of the most widely implemented methods for model interpretation in medical imaging to date. As shown in Figure 6B, Dubost et al. employed image-specific class saliency maps as part of a weakly supervised approach to segmentation of structures on brain MRI (Dubost ; Dubost ). Other areas of application for saliency maps include heart disease classification on chest x-rays (Chen and Shi, 2018), detection of abnormalities in the spine on MRI (Jamaludin ), detection of artifacts on magnetoencephalography (Garg ), classification of breast masses in mammography (Lévy and Jain, 2016), and classification of pediatric elbow fractures on x-ray (Rayan ).Gonzalez-Gonzalo et al. presented an expansion of class saliency maps in 2018 with the introduction of iterative saliency maps (González-Gonzalo ). The objective of iterative saliency mapping is to identify less discriminative image regions that may have been ignored in the initial saliency map. Briefly, the method works by iteratively computing a saliency map, inpainting the most salient image regions identified, and computing the saliency map again. This process repeats until the perturbed image is no longer classified as containing an abnormality, or a maximum number of iterations is reached. 
The final iterative saliency map is computed as a weighted sum of the saliency maps computed at each step. The authors apply their technique to the task of identifying retinal fundus image segments relevant to grading diabetic retinopathy and demonstrate higher sensitivity with iterative saliency maps compared to saliency maps without iterative refinement.

Despite their popularity, saliency maps have the notable drawback of providing no indication as to whether a pixel provides evidence for or against a class, only that the classification is sensitive to that pixel. Several authors have also noted that in binary classification settings, saliency maps lose their class specificity: if a feature is important for distinguishing between two classes, it may be highlighted by the saliency maps for both classes (Garg et al., 2017).
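As a concrete illustration of the saliency-map definition above, the following minimal numpy sketch computes a saliency map for a toy linear scorer. All names here are hypothetical, and a finite-difference gradient stands in for backpropagation so the result can be checked against the closed form (for a linear model, the gradient is exactly the weight vector):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "classifier": F_c(x) = w_c . x + b_c on a flattened 8x8 image.
# For this model dF_c/dx is exactly w_c, so the numerical saliency map can
# be verified against the closed form.
H = W = 8
w_c = rng.normal(size=H * W)   # weights for class c (stand-in for a network)
b_c = 0.1

def score(x_flat):
    """Class score F_c(x) for a flattened image."""
    return float(w_c @ x_flat + b_c)

def saliency_map(x_flat, eps=1e-5):
    """Central finite-difference estimate of dF_c/dx, reshaped to image space."""
    grad = np.empty_like(x_flat)
    for i in range(x_flat.size):
        xp, xm = x_flat.copy(), x_flat.copy()
        xp[i] += eps
        xm[i] -= eps
        grad[i] = (score(xp) - score(xm)) / (2 * eps)
    return np.abs(grad).reshape(H, W)  # magnitude map, as in Simonyan et al.

x = rng.normal(size=H * W)  # a random "image"
sal = saliency_map(x)
```

In a real network the per-pixel loop is unnecessary: one backward pass returns the full gradient image at once.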
Guided backpropagation
Guided backpropagation, introduced in (Springenberg et al., 2015), is an extension of the saliency maps introduced by (Simonyan et al., 2014) and the 'deconvnet' concept introduced in (Zeiler and Fergus, 2014). The difference between these approaches lies in how backpropagation through Rectified Linear Unit (ReLU) activation layers of the network is handled. ReLU is an activation function commonly used in CNNs (Glorot et al., 2011). During the forward pass, neurons with negative output are clamped to zero by ReLU by definition (ReLU(x) = max(0, x)). (Zeiler and Fergus, 2014) extended this idea to computing gradients in the backward pass by clamping negative gradients to zero. Guided backpropagation combines these two ideas, zeroing out signal through neurons that have either a negative output during the forward pass or a negative gradient during the backward pass. This produces a heatmap that highlights only pixels that provide positive evidence for a classification. Further discussion of the relationship between saliency maps, deconvnet, and guided backpropagation can be found in (Mahendran and Vedaldi, 2016).

Guided backpropagation has been used by Gao et al. to visualize salient image pixels for the task of fetal heartbeat localization in ultrasound images (Gao and Noble, 2017). They find that the heatmaps produced by guided backpropagation are robust to variations in heart appearance, scale, position, and contrast. Böhle et al. evaluated guided backpropagation as a method for visualizing Alzheimer's disease (AD) diagnosis on brain MRI, but found the visualizations produced by guided backpropagation to be less discriminative than those produced by other methods (Böhle et al., 2019).
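The three backward rules described above differ only in how they gate the gradient at a ReLU. A minimal numpy sketch of a single ReLU layer (the pre-activations and incoming gradients below are illustrative values, not taken from any cited network) makes the distinction concrete:

```python
import numpy as np

# Backward rules through a single ReLU:
#   plain backprop:  pass gradient where the forward input was positive
#   deconvnet:       pass gradient where the gradient itself is positive
#   guided backprop: require both conditions (Springenberg et al.)
def relu_backward(grad_out, pre_act):
    return grad_out * (pre_act > 0)

def deconvnet_backward(grad_out, pre_act):
    return grad_out * (grad_out > 0)

def guided_backward(grad_out, pre_act):
    return grad_out * (pre_act > 0) * (grad_out > 0)

pre_act = np.array([1.5, -0.5, 2.0, -1.0])   # forward-pass pre-activations
grad_out = np.array([0.7, 0.3, -0.2, -0.4])  # gradient arriving from above

g = guided_backward(grad_out, pre_act)       # only positive-evidence signal survives
```

Note that the guided rule keeps only the first entry, where both the forward activation and the backward gradient are positive, which is why guided backpropagation heatmaps contain only positive evidence.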
Class activation mapping
Class activation mapping (CAM) was first introduced in 2016 by (Zhou et al., 2016). Class activation mapping works by computing a weighted sum of the feature maps following the final convolutional layer, where the weights are provided by the fully connected layer following global average pooling, a type of pooling described in (Lin et al., 2014). The class activation map $\mathrm{CAM}_c(x)$ for a class $c$ and image $x$ is defined as:

$$\mathrm{CAM}_c(x) = \sum_k w_k^c f_k(x)$$
where $w_k^c$ are the weights for class $c$ in the final network layer, and $f_k(x)$ is the $k$th feature map prior to global average pooling. Thus, $\mathrm{CAM}_c(x)$ is a class-specific heatmap that indicates discriminative image segments.

Class activation mapping has seen use in both classification and localization applications in medical imaging. (Feng et al., 2017) used class activation maps as part of a weakly supervised approach to lung nodule segmentation on thoracic CT scans. First, a CNN was trained to perform binary classification of CT images as containing a nodule or not. The authors then show that class activation maps generated from the trained classification model successfully highlight nodule candidates. Similar weakly supervised approaches using CAMs are described for chest x-ray abnormality and breast mass localization in (Hwang and Kim, 2016), and for ACL tear localization on knee MRI in (Liu et al., 2019). Kim et al. computed class activation maps for the classification of benign vs malignant breast masses on mammograms, but found them difficult to interpret for their task, as shown in Figure 6C (Kim et al., 2018). Other applications of class activation maps to medical imaging tasks include localization of diabetic retinopathy lesions in retinal fundus images (Gondal et al., 2017) and weakly supervised diagnosis of tuberculosis on chest x-rays (Hwang and Kim, 2016).

A drawback of class activation mapping is that it places some restrictions on network architecture: it requires a global pooling layer followed by a fully connected layer as the last layers before the output layer. While Zhou et al. used global average pooling when they introduced class activation mapping, related work by Oquab et al. produces similar localization score maps using global max pooling (Oquab et al.). To address this limitation of class activation maps, (Selvaraju et al., 2017) introduced gradient-weighted class activation maps (grad-CAM).
In grad-CAM, the weights are instead derived from the gradients of the class score with respect to each feature map, rather than being taken from a fully connected layer. That is:

$$\text{grad-CAM}_c(x) = \mathrm{ReLU}\left(\sum_k \alpha_k^c f_k(x)\right)$$
where the weights $\alpha_k^c$ are the gradients of the score $y^c$ for class $c$ with respect to the $k$th feature map $f_k(x)$ of the preceding convolutional layer, averaged over the $Z$ spatial locations of the map:

$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial f_k^{ij}(x)}$$

Selvaraju et al. define grad-CAM to include a ReLU activation because they are interested only in features with a positive association with the class $c$. They also offer a further refinement with guided grad-CAM, which is the pixelwise product of grad-CAM and guided backpropagation (Springenberg et al., 2015). More recently, (Zhao et al.) have added a further variant of class activation mapping with respond-weighted class activation mapping (Respond-CAM).

Garg et al. employed grad-CAM visualizations to identify discriminative regions of magnetoencephalography images in the task of detecting eye-blink artifacts (Garg et al., 2017). The authors found that the regions of the eye highlighted by grad-CAM are the same regions that human experts rely on. Furthermore, (Shen and Gao, 2018) used grad-CAM to visualize areas of chest x-rays indicative of fourteen suspected diseases. In this multi-class setting, the class-specific nature of grad-CAM proved valuable.
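A minimal numpy sketch makes the CAM and grad-CAM definitions above concrete. The feature maps and gradients here are random stand-ins for values that would come from a trained network (all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

K, H, W = 4, 7, 7
fmaps = rng.random((K, H, W))   # f_k(x): final-conv feature maps (stand-ins)
w = rng.normal(size=K)          # CAM weights w_k^c from the FC layer (stand-ins)

# CAM_c(x) = sum_k w_k^c f_k(x): weighted sum of feature maps -> (H, W) heatmap
cam = np.tensordot(w, fmaps, axes=1)

# grad-CAM replaces w_k^c with alpha_k^c, the spatially averaged gradient
# dy^c/df_k, and applies a ReLU. In practice the gradients come from one
# backward pass through the trained network; random values stand in here.
grads = rng.normal(size=(K, H, W))  # stand-in for dy^c/df_k^{ij}(x)
alpha = grads.mean(axis=(1, 2))     # 1/Z sum_ij dy^c/df_k^{ij}(x)
grad_cam = np.maximum(np.tensordot(alpha, fmaps, axes=1), 0.0)  # ReLU
```

The resulting heatmaps have the spatial resolution of the final convolutional layer and are typically upsampled to the input image size before overlay.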
Decomposition-based methods
Decomposition-based methods for model interpretation seek to decompose the prediction of the model into a heatmap that describes how much each pixel contributes to the prediction. Whereas perturbation- and gradient-based methods for interpretation highlight parts of the image that, if altered, affect the prediction of the model, decomposition-based methods identify parts of the image that directly provide evidence for the model decision.
Layer-wise relevance propagation
Layer-wise relevance propagation (LRP) was introduced by Bach et al. in 2015 (Bach et al., 2015). Unlike saliency mapping, guided backpropagation, and grad-CAM, LRP does not rely on gradients to generate a heatmap. Instead, LRP works by computing relevance scores that distribute the output of the final layer amongst the nodes in the previous layer. This process continues recursively until the input layer of the network is reached, producing a relevance score heatmap that can be overlaid on the input image. Formally, the relevance score contribution to a neuron $i$ in the $l$th layer from a neuron $k$ in the $(l+1)$th layer is:

$$R_{i \leftarrow k}^{(l,\,l+1)} = \frac{z_{ik}}{\sum_{i'} z_{i'k}} R_k^{(l+1)}$$

where $z_{ik}$ is the contribution of neuron $i$ to the pre-activation of neuron $k$. The total relevance score for a neuron $i$ in the $l$th layer is the sum of contributions over all neurons $k$ in layer $l+1$ to which neuron $i$ is connected:

$$R_i^{(l)} = \sum_k R_{i \leftarrow k}^{(l,\,l+1)}$$

Further properties of LRP and details of its theoretical basis are given in (Montavon et al.), and comparisons of LRP to other interpretation methods can be found in (Samek et al.; Kohlbrenner et al.).

Layer-wise relevance propagation has seen some use in medical imaging applications. Eitel et al. applied LRP to provide interpretation of CNN-based diagnosis of multiple sclerosis (MS) on brain T2-weighted MRI (Eitel et al., 2019). Interestingly, they show that LRP heatmaps focus both on hyperintense lesions in regions of the brain clinically associated with MS diagnosis and on the thalamus, which is known to be affected by MS at an early disease stage (Figure 7A). The same researchers also used LRP to interpret CNN-based Alzheimer's disease (AD) diagnosis on MRI in (Böhle et al., 2019). The authors find that LRP heatmaps from their trained model highlight the hippocampal volume, which has been used to diagnose AD and predict disease progression (Figure 7B). They also compared LRP to guided backpropagation and concluded that LRP may be more valuable than guided backpropagation for their task because the difference in heatmap scores between Alzheimer's disease patients and healthy controls was more evident for LRP. Finally, Thomas et al.
use LRP as part of their DeepLight framework for associating brain regions with different cognitive states on functional MRI (Thomas et al., 2018).
Figure 7:
Layer-wise relevance propagation for model interpretation in (A) diagnosing multiple sclerosis on brain MRI (Eitel et al., 2019), and (B) visualizing evidence for Alzheimer's disease on brain MRI (Böhle et al., 2019b).
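The LRP redistribution rule described above can be sketched for a single dense layer in a few lines of numpy. The activations, weights, and output relevances below are illustrative stand-ins (in a real network, the rule is applied layer by layer from the output back to the input), and the key property to observe is that total relevance is conserved across layers:

```python
import numpy as np

# LRP through one dense layer (basic z-rule sketch, no stabilizer term):
# the relevance R_k of each output neuron is redistributed to the inputs in
# proportion to their contributions z_ik = x_i * w_ik to its pre-activation.
x = np.array([1.0, 2.0, 0.5])        # activations of layer l (stand-ins)
W = np.array([[0.2, -0.1],
              [0.4,  0.3],
              [-0.2, 0.5]])          # weights from layer l to layer l+1
R_next = np.array([0.6, 0.4])        # relevance of the two neurons in layer l+1

z = x[:, None] * W                   # z_ik, shape (3 inputs, 2 outputs)
denom = z.sum(axis=0)                # sum_i' z_i'k for each output neuron k
R = (z / denom * R_next[None, :]).sum(axis=1)  # R_i = sum_k (z_ik / sum z) R_k
```

Practical LRP implementations add a small epsilon to the denominator for numerical stability; it is omitted here to keep the conservation property exact.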
Trainable attention models
In addition to techniques that produce attribution heatmaps from a fully trained model, as covered in Sections 3.1–3.3, some research has also been directed toward developing trainable mechanisms for attribution. In particular, the concept of soft attention as introduced for CNNs by (Jetley et al.) has seen application to medical image analysis tasks. This type of attention module can be added to any layer of a CNN to produce a fine-grained attention map which highlights salient parts of an image. Attention modules take as input the activation map output from the previous network layer (local features, $l$) and a global feature vector ($g$) obtained from the final network layer. The attention module computes a compatibility score $c$ between the local features and the global features. In (Jetley et al.), two functions for computing compatibility are proposed: the dot product between local and global features ($\langle l, g \rangle$), and the dot product between the sum of local and global features and a learned vector $u$ ($\langle u, l + g \rangle$). Intuitively, the compatibility score is high for local features which are similar to the global features from deeper in the network. Finally, the attention map $a$ is produced by applying softmax normalization to the compatibility scores $c$:

$$a_i = \frac{\exp(c_i)}{\sum_j \exp(c_j)}$$

The output $g_a$ of the attention module is the local features weighted by the attention map:

$$g_a = \sum_i a_i l_i$$

Thus, the attention module increases signal from features with high compatibility with the global feature description of the image and suppresses signal from features with low compatibility.

Attention for 3D medical images was implemented by (Schlemper et al., 2019), who introduced an Attention U-Net for organ segmentation in abdominal CT scans, and Attention Gate (AG)-Sononet for fetal ultrasound image plane classification. In both tasks, the authors demonstrated that the attention maps produced by their attention gates correctly highlighted the structures of interest.
In (Li et al., 2019), soft attention mechanisms were added to a traditional U-Net for segmenting breast masses on digital mammograms. For this task, the addition of attention provided a modest improvement in segmentation performance over the authors' base model. In melanoma lesion classification, the use of a regularized attention mechanism for the classification of skin lesion photographs demonstrated that attention maps generated from deeper network layers may focus more strongly on valid image regions than those from shallower layers (Yan et al., 2019). Attention modules have also been added to a VGG-16 architecture for grading osteoarthritis severity on x-ray images of the knee (Górriz et al., 2019). In contrast with the previous study, these authors obtained higher classification accuracy and more reasonable attention maps from earlier layers in the network. Other applications of attention mechanisms to medical imaging include classification of breast cancer histopathology images (Yang et al., 2019) and segmentation of cardiac substructures on MRI (Sun et al., 2020).

Trainable attention is particularly interesting as an interpretation strategy because not only does it provide an attribution heatmap, it can also improve model performance (Schlemper et al., 2019; Li et al., 2019). However, the optimal architecture and implementation hyperparameters have yet to be determined and are likely to vary by application.
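A minimal numpy sketch of dot-product soft attention (the first of the two compatibility functions described above; all dimensions and values are illustrative stand-ins for learned features):

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

# Soft attention over N spatial locations:
#   c_i = <l_i, g>, a = softmax(c), output g_a = sum_i a_i l_i
N, D = 6, 8
l = rng.normal(size=(N, D))  # local features, one D-vector per location
g = rng.normal(size=D)       # global feature vector from the final layer

c = l @ g                    # compatibility scores, one per location
a = softmax(c)               # attention map: non-negative, sums to 1
g_a = a @ l                  # attention-weighted feature summary
```

Because the attention map `a` is normalized, it can be reshaped to the spatial grid of the layer and overlaid on the input image directly, which is what makes trainable attention usable as an interpretation tool.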
Toolkits for model interpretation specific to medical imaging
In Sections 2 and 3, we described application-agnostic methods for model interpretation, while highlighting examples of their use in medical imaging applications. In this section, we summarize several publications that introduce tools for model interpretation designed specifically for medical imaging tasks. Some of the toolkits in this section incorporate approaches to model interpretation described in Sections 2 and 3. For example, Mimer, described in Section 4.3, makes use of grad-CAM, previously described in Section 3.2.3.
CLEAR-DR
Kumar et al. introduced CLass-Enhanced Attentive Response Discovery Radiomics (CLEAR-DR) as a framework for model interpretation in 2019 (Kumar et al., 2019). CLEAR-DR is built on CLEAR, published previously by the same authors (Kumar et al.). CLEAR produces two types of explanatory heatmap: a dominant class attentive map, which assigns each image pixel to the class most influential at that location, and a dominant response map, which shows the dominant attentive level based on the identified dominant class. The authors apply the method to grading diabetic retinopathy (DR) in a set of more than 50,000 retinal fundus images. They produce heatmaps corresponding to evidence for each of five classes: mild, moderate, severe, and proliferative DR, as well as DR-negative. For their classification task, the authors find that in correctly classified images the CLEAR-DR maps correspond to relevant portions of the eye anatomy, whereas in misclassified cases, CLEAR-DR fails to focus on the relevant abnormality.
DeepMiner
DeepMiner, introduced in (Wu et al.), is a framework for discovering interpretable representations that explain medical imaging predictions. This framework builds on the authors' previous work demonstrating that visual patterns learned by their network correspond to relevant medical phenomena (Wu et al.). DeepMiner uses a technique called network dissection (Bau et al.) to identify influential network features, and an expert then manually annotates these features with an application-specific semantic concept. The authors apply their framework to the task of mammogram classification. In this application, a radiologist specializing in mammography manually assigns a concept from the BI-RADS lexicon, such as "benign vascular calcification" or "spiculation", to each influential feature. DeepMiner then generates a report consisting of a test image overlaid with influential feature activation maps and the corresponding medical phenomena as an explanatory aid. The proposed framework is application-agnostic and could be extended to other image classification tasks where semantic concepts of image sections are of interest. However, a drawback of DeepMiner is its reliance on expert annotation of network feature activation maps. In their application, the authors find that 75% of influential features correspond to an identifiable medical phenomenon, but this percentage may be task-dependent.
Mimer
Hicks et al. describe the development of an automated multimedia reporting system called Mimer in their 2018 paper (Hicks et al., 2018). The goal of Mimer is to produce an understandable and reproducible report containing text and images from a medical procedure, appropriate for non-technical users. Users select an image, a network layer, and a target class, and Mimer produces a report describing the likelihood of the selected image containing the target class (model output), along with a guided grad-CAM visualization of evidence for the target class (model interpretation). The authors provide example reports for performing polyp detection following a colonoscopy. They also provide clear, step-by-step instructions for installing the necessary dependencies and running Mimer either from a provided git repository or from a pre-configured Docker image.
DX-Caps
In (LaLonde et al.), the authors introduce a capsule network-based model for producing explainable diagnoses. Capsule networks differ from traditional CNNs in that the scalar feature maps produced at each layer of a traditional CNN are replaced with vectorized representations (Sabour et al., 2017). The authors use the capsule architecture to assign specific semantic concepts to each component of the network output. They apply DX-Caps to lung nodule classification and use a six-dimensional output to capture six attributes important to lung nodule malignancy prediction (subtlety, sphericity, margin, lobulation, spiculation, and texture). While DX-Caps does not produce an interpretation heatmap, the approach of using a vectorized network output to provide a clinician with an explanation of the model output in terms of familiar semantic concepts is interesting and widely applicable to diagnosis tasks. However, a notable drawback of this approach is its reliance on expert annotation of individual semantic concepts in training images, which may be time-consuming to produce and may differ from clinician to clinician.
MDNet
MDNet, introduced in (Zhang et al.), uses joint image and language models to provide diagnosis interpretation for images paired with text reports. For their image model, the authors use ResNet (He et al., 2016), which generates an image feature vector that is passed to a long short-term memory (LSTM) language model. The LSTM also takes the text report as input. MDNet uses the attention mechanism from (Xu et al.) to produce heatmaps showing the image support for each word in the accompanying text. The authors apply MDNet to a bladder cancer pathology dataset of 32 whole-slide hematoxylin and eosin (H&E) stained samples from patients at risk for papillary urothelial neoplasm, with paired diagnostic report text. The authors report that collaborating pathologists found the MDNet-produced attention maps "fairly encouraging" for highlighting informative regions of the images. While the authors only investigate applying MDNet to pathology images, radiological images would be another promising area of application, as large-scale repositories of paired images and radiologist-produced report text exist at any large academic hospital.
Recommendations
In this review, we have summarized current approaches to interpreting deep learning models used in medical image analysis. Here, we provide a succinct summary of recommended steps for conducting model interpretation in medical imaging. When thinking of the deep learning data and model workflow, it helps to consider three distinct steps (Figure 2): data collection, model training, and model deployment. Model interpretation can be performed during both model training and deployment; however, different approaches for interpretation are appropriate at different steps. For example, dimensionality reduction may be useful prior to model training to better understand the underlying structure of a high-dimensional dataset, whereas attribution-based methods for interpretation are only meaningful once a trained model is obtained, and so should be used after model training or once a model is deployed.

Before performing model interpretation, it is important to ensure that the model has been trained properly. Attempting to perform any of the interpretation techniques on a model trained with incorrect data, or a model overfit to the training data, can be misleading. To help avoid common training pitfalls, we include Appendix 1, which summarizes some of the most common issues likely to be encountered in training CNNs for medical image analysis tasks. As training of CNNs is a vast and complex topic, the summary is not meant to be exhaustive, but rather to cover the issues most typical of medical image analysis. For a full discussion of training CNNs, several other resources are available, for example (Goodfellow et al.).

Filter and feature map visualization, as described in Section 2.1, is a simple way to perform model interpretation, but the value of doing so can be unclear. Some authors have been able to connect individual feature maps with human-identifiable features (Van Molle et al., 2018), but others find that layers contain many similar feature maps with little intuitive meaning (Zhang et al., 2019).
It is also important to keep in mind that feature maps are likely to become more complicated and less intuitive at deeper layers of the model (Olah et al.). Techniques like network dissection (Bau et al.), which identify important features, should be considered when pursuing feature map visualization for model interpretation.

Attribution-based methods are perhaps the largest research direction in model interpretation, and the existence of multiple such methods can make the process of choosing one overwhelming. To better understand the relationships between attribution-based methods, we provide their "family tree" (Figure 8). Qualities to consider when making this choice include method maturity and popularity, weaknesses described in the literature, and publicly available implementations. More mature methods, such as saliency mapping (Simonyan et al., 2014), have the advantages of simple intuition and abundant examples of use in the literature (Table 1), but drawbacks of the method have also been identified (Adebayo et al.; Rudin, 2018). Newer methods, such as grad-CAM (Selvaraju et al., 2017) or layer-wise relevance propagation (Bach et al., 2015), have fewer examples of use in the literature, but there are also fewer publications identifying their weaknesses. It is unclear whether this is due to their being inherently more useful than previous methods, or simply a product of their more recent development. Given the rapid pace of development in deep learning, new methods for model interpretation will likely be developed in the future, which may address the shortcomings of current methods.
Figure 8:
Family tree of attribution-based methods for model interpretation. Dates correspond to when the method was first published. Arrows indicate methods that are developments or refinements of previous methods.
Table 1:
Summary of uses of model interpretation methods for medical imaging tasks. The final column indicates which interpretation method(s) each publication used; publications that compare multiple methods list more than one. Feat. Vis. = feature visualization, Dim. Red. = dimensionality reduction, AE = autoencoder, Occ. = occlusion, LIME = local interpretable model-agnostic explanations, IG = integrated gradients, Sal. = saliency mapping, GB = guided backpropagation, CAM = class activation mapping, LRP = layer-wise relevance propagation, Att. = trainable attention.

| Author | Task | Modality | Method(s) |
|---|---|---|---|
| (Yu et al., 2018) | classification | US | Feat. Vis., Dim. Red. |
| (Roth et al., 2014) | detection | CT | Feat. Vis. |
| (Roth et al., 2015a) | segmentation | CT | Feat. Vis. |
| (Shin et al., 2016) | detection | CT | Feat. Vis. |
| (Van Molle et al., 2018) | classification | photo | Feat. Vis. |
| (Zhang et al., 2019) | classification | PET, MRI | Feat. Vis. |
| (Cheng and Malhi, 2017) | classification | US | Dim. Red. |
| (Faust et al., 2018) | detection | histopath | Dim. Red. |
| (Plis et al., 2014) | classification | MRI | Dim. Red. |
| (Uzunova et al., 2019) | classification | OCT, MRI | AE |
| (Chen et al., 2020) | segmentation | MRI | AE |
| (Hou et al., 2019) | detection | histopath | AE |
| (Seah et al., 2018) | classification | x-ray | Occ., LIME, IG |
| (Kermany et al., 2018) | classification | OCT | Occ. |
| (Sayres et al., 2019) | classification | DR | IG |
| (Sundararajan et al., 2017) | classification | DR | IG |
| (Garg et al., 2017) | detection | MEG | Sal., CAM |
| (Chen and Shi, 2018) | classification | x-ray | Sal. |
| (Dubost et al., 2019) | detection | MRI | Sal. |
| (González-Gonzalo et al., 2018) | detection | DR | Sal. |
| (Jamaludin et al., 2016) | classification | MRI | Sal. |
| (Lévy and Jain, 2016) | classification | mammo | Sal. |
| (Rayan et al., 2019) | detection | x-ray | Sal. |
| (Yi et al., 2017) | classification | mammo | Sal. |
| (Hicks et al., 2018a) | classification | colonoscopy | GB, CAM |
| (Böhle et al., 2019a) | classification | MRI | GB, LRP |
| (Gao and Noble, 2017) | detection | US | GB |
| (Feng et al., 2017) | detection | CT | CAM |
| (Gondal et al., 2017) | detection | DR | CAM |
| (Hwang and Kim, 2016) | detection | x-ray, mammo | CAM |
| (Kim et al., 2018) | classification | mammo | CAM |
| (Liu et al., 2019) | detection | MRI | CAM |
| (Shen and Gao, 2018) | detection | x-ray | CAM |
| (Eitel et al., 2019) | classification | MRI | LRP |
| (Thomas et al., 2018) | classification | MRI | LRP |
| (Schlemper et al., 2019) | classification, segmentation | US, CT | Att. |
| (Li et al., 2019) | segmentation | mammo | Att. |
| (Yan et al., 2019) | classification | photo | Att. |
| (Górriz et al., 2019) | classification | x-ray | Att. |
| (Yang et al., 2019) | classification | histopath | Att. |
| (Sun et al., 2020) | segmentation | MRI | Att. |
Another point to consider when choosing a method for model interpretation is ease of implementation, and several public implementations are available. For example, the Keras Visualization Toolkit (keras-vis, https://raghakot.github.io/keras-vis/) has implementations of class maximization, saliency maps, and grad-CAM. Keras-explain (https://pypi.org/project/keras-explain/) is another project with implementations of multiple interpretation methods, including grad-CAM, guided backpropagation, and integrated gradients. When choosing a publicly available implementation, it can be useful to check whether the developer is still actively supporting the project. GitHub (https://github.com/), a popular website for sharing code, shows when a project was last updated, and the Issues tab of a GitHub repository can be a useful indicator of whether the developer is likely to respond to questions.
Discussion
This review is the first to summarize both the technical and practical implementation details of model interpretability approaches for deep learning practitioners focusing on medical imaging applications. We have grouped interpretation approaches by their technical similarities and by their relevance to different stages of the model development process. We have also provided practical advice for choosing between interpretation techniques and for implementing them. Deep learning-based medical image analysis is a rapidly expanding and exciting area of research; however, the medical nature of the problems addressed in this field warrants extra emphasis on model interpretation.

When performing model interpretation, it is important to keep in mind that one of the end goals is to improve clinician trust in the model. To assess whether model interpretation has an impact on clinician behavior, comparing clinician performance alone versus clinician performance with model interpretation can be a valuable tool. For example, (Sayres et al., 2019) compared reader performance in assigning diabetic retinopathy grades to retinal fundus images in three assistance settings: unassisted, reader provided with the algorithm-predicted grade only, and reader provided with the algorithm-predicted grade and an integrated gradients heatmap. The authors compare grading accuracy, reader confidence, and read time across these three conditions and find that while algorithm-predicted grades improved grading accuracy and reader confidence, they also resulted in increased read time. There was also no significant difference in grading accuracy when the integrated gradients heatmap was provided alongside the predicted grade as compared to the predicted grade alone.
Regardless, this type of comparison is a strong example of assessing the impact of model interpretation on clinician-algorithm interaction.

It is important to recognize that the importance of model interpretation is task dependent, and different levels of model interpretation are necessary for different tasks. Table 1 organizes the reviewed literature by interpretation technique and image analysis task. The implementation of model interpretation techniques is common in detection and classification tasks, but not in segmentation tasks, despite segmentation being a major application of deep learning in medical imaging (Litjens et al.). This disparity may be attributed to the difference in perceived importance of interpretation by task. For detection and classification, it is natural to want to compare the parts of an image the model uses to make a prediction to the parts of an image a physician would use. In segmentation, the equivalent rationale for the importance of interpretation is less clear: interpretation heatmaps would perhaps highlight anatomical landmarks near a structure of interest, or would simply highlight the target structure itself, but this has not been fully investigated. Despite these differences, all classes of application stand to gain the same benefits from model interpretation, including the investigation of model limitations, assessment of generalizability, and increased user trust.

The uses for model interpretation methods can extend beyond providing interpretation alone. For example, (Dubost et al., 2019) used saliency maps as part of a weakly supervised method for detecting enlarged perivascular spaces on brain MRI. This use of saliency mapping is especially interesting because it circumvented the need for the dense, pixel-by-pixel labels that would be required by a traditional, fully supervised detection approach.
Dimensionality reduction techniques are commonly used to visualize network representations of data and gain intuition into how the network separates classes, but they can also be used to identify outlier data. In (Faust et al., 2018), the authors apply a k-nearest-neighbors approach to tSNE-produced representations of patches drawn from whole-slide histology images of CNS tissue samples. They find that tSNE representations of new glioma patches fall close to glioma patches in their training data. Another possible application of model interpretation techniques is to inform the design of mathematical models. For example, understanding feature maps, as described in Section 2.1, can potentially provide insight into the key dependencies describing a particular system, and these could be useful inputs for modeling the behavior of the system under investigation. As these examples indicate, there is great potential for creative and valuable use of interpretation techniques as part of other medical image analysis approaches.
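The nearest-neighbor idea described above can be sketched in plain numpy. Here the 2-D coordinates stand in for a tSNE embedding (a real pipeline would first embed network feature vectors), and the threshold choice and all names are illustrative rather than taken from the cited work:

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in for embedded training patches (e.g., tSNE coordinates of
# network representations); a new point is flagged as an outlier when its
# mean distance to its k nearest training neighbors is unusually large.
train = rng.normal(size=(200, 2))

def mean_knn_distance(point, reference, k=5):
    """Mean Euclidean distance from `point` to its k nearest rows of `reference`."""
    d = np.linalg.norm(reference - point, axis=1)
    return np.sort(d)[:k].mean()

# Calibrate a threshold on the training data itself (leave-one-out):
# flag points whose mean k-NN distance exceeds the 99th percentile of
# in-distribution distances.
train_d = np.array([mean_knn_distance(p, np.delete(train, i, axis=0))
                    for i, p in enumerate(train)])
thresh = np.quantile(train_d, 0.99)

inlier = np.array([0.1, -0.2])   # near the bulk of the training cloud
outlier = np.array([8.0, 8.0])   # far from every training patch
```

The percentile cutoff is a simple heuristic; any calibrated one-class criterion on the embedded distances would serve the same purpose.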
Current challenges and future directions
While current approaches to model interpretation can provide valuable insight into how a deep learning model is performing, there are important limitations that should be discussed. First, some researchers have argued that explanations provided by commonly employed attribution-based interpretation methods, such as saliency mapping or class activation mapping, are not reliable and can be misleading (Rudin, 2018; Adebayo et al.). These concerns have been voiced within medical imaging as well. For example, (Seah et al., 2018) report that in detecting abnormalities in chest x-rays, several attribution-based interpretation methods (occlusion, integrated gradients, LIME) produced nonspecific heatmaps for the expected abnormality. In (Böhle et al., 2019), the authors compared guided backpropagation and layer-wise relevance propagation for producing heatmaps of evidence separating Alzheimer's disease from healthy controls on brain MRI, and observed that guided backpropagation failed to produce heatmaps that were visually dissimilar between Alzheimer's disease patients and healthy controls.

To avoid these weaknesses of post-hoc explanation methods, (Rudin, 2018) suggested that new types of models designed to be inherently interpretable should be used instead. One possible approach was described in (Hase et al.), where test images were classified by comparing them to a predefined hierarchical taxonomy of images that act as prototypes of each classification category. As an example, the authors describe the classification of an image of a capuchin monkey: first, the image is determined to contain an animal based on its similarity to an animal prototype image, then it is determined to contain a primate, and finally a capuchin. This interesting approach to interpretation could be valuable in medical imaging applications where abnormalities or pathologies have a hierarchical relationship.
However, this approach would likely be restricted to classification problems.

Adversarial attacks represent another limitation of current model interpretation techniques. Adversarial attacks are image perturbations designed to strongly affect the prediction of a deep learning model without affecting the appearance of the image to a human observer (Szegedy ; Kurakin ). Multiple researchers have demonstrated that medical images are susceptible to adversarial attack (Finlayson ; Mirsky ). In addition to adversarial attacks designed to maximally perturb model predictions, adversarial attacks against model interpretation heatmaps have also been investigated. These perturbations are designed to maximally change the interpretation heatmap while leaving the model prediction unaffected. (Ghorbani ) showed that small perturbations to natural images could be designed to maximally change the heatmaps produced by several commonly used interpretation methods. The perturbations could also be designed to cause the interpretation method to selectively highlight a part of the targeted image that is semantically different from the predicted label. This kind of attack would be especially damaging to physician trust in medical settings. Given the susceptibility of medical image predictions to adversarial attack, it is reasonable to expect interpretations of medical images to be similarly vulnerable. Despite these limitations, however, many of the publications summarized in this review have demonstrated the benefit of performing model interpretation with various methods.

Finally, several growing areas of medical imaging research are ripe for the application of model interpretation techniques. First, image-to-image synthesis tasks are an interesting area of application.
Examples include MRI-to-CT synthesis for PET/MR attenuation correction and MRI-only radiotherapy planning (Wolterink ), and low-dose-to-high-dose image synthesis for radiotracer dose reduction (Wang ; Yi and Babyn, 2018). No established interpretation methods have been consistently applied to this class of application. Additionally, several cancer imaging studies investigating the relationship between non-invasive imaging modalities and pathologic or genetic information have recently been published. For example, pre-treatment PET textural features were correlated with vascular endothelial growth factor (VEGF) expression in head and neck cancer patients (Chen ). Similarly, several groups have correlated PET findings with programmed death ligand-1 (PD-L1) expression, a possible marker for patient response to cancer immunotherapies (Chen ; Takada ; Jreige ). Applying model interpretation techniques to this class of problem could increase understanding of the connections between image features and the underlying biology. Early work in this direction was done in (Wang ), where a deep learning approach to predicting epidermal growth factor receptor (EGFR) status from chest CT in lung adenocarcinoma patients used grad-CAM to provide visual evidence in the image for the model decision. The authors provide example visualizations of both EGFR+ and EGFR- cases to demonstrate how the grad-CAM maps differ by EGFR status.
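As a reminder of the mechanics behind such visual evidence, grad-CAM weights each feature map of a chosen convolutional layer by the global-average-pooled gradient of the class score with respect to that map, sums the weighted maps, and rectifies the result. The sketch below uses hypothetical activations and gradients; in a real model both would come from forward and backward passes through the trained CNN (e.g., via framework hooks):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heatmap from one conv layer's activations (K, H, W) and the
    gradients of the target class score w.r.t. those activations (K, H, W)."""
    alphas = gradients.mean(axis=(1, 2))              # global-average-pool grads
    cam = np.tensordot(alphas, feature_maps, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam

# Hypothetical activations and gradients for a 4-channel, 7x7 layer.
rng = np.random.default_rng(0)
activations = rng.random((4, 7, 7))
grads = rng.normal(size=(4, 7, 7))
heatmap = grad_cam(activations, grads)
print(heatmap.shape)    # upsampled to image size for overlay in practice
```

Because the heatmap has the (coarse) spatial resolution of the chosen layer, it is typically upsampled and overlaid on the input image, as in the EGFR visualizations described above.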
Summary
We have reviewed approaches to interpreting CNN-produced predictions and their use in medical imaging applications. Model interpretation can be performed by looking inside the model at the features it learns, or by looking at the output of the model and understanding which parts of an image were important to producing that output. Medical images have unique characteristics that should be considered when performing model interpretation, and several tools designed to accommodate these characteristics have been developed. It is now well established that deep learning models can achieve state-of-the-art performance on a wide variety of medical image analysis tasks, but to better understand these models and gain clinician trust, it is critical to develop methods that provide clear and interpretable rationale for model decisions. For this reason, it is imperative that developments in model interpretation progress in step with developments in model performance. It is equally important that investigators applying deep learning to medical imaging tasks rigorously implement and consistently report on the model interpretation steps undertaken for their task.