The materials science community has been increasingly interested in harnessing the power of deep learning to solve various domain challenges. However, despite their effectiveness in building highly predictive models, e.g., for predicting material properties from microstructure images, the opaque nature of deep neural networks poses fundamental challenges for extracting meaningful domain knowledge from them. In this work, we propose a technique for interpreting the behavior of deep learning models by injecting domain-specific attributes as tunable "knobs" in the material optimization analysis pipeline. By incorporating these material concepts in a generative modeling framework, we are able to explain what structure-to-property linkages the black-box models have learned, providing scientists with a tool to leverage the full potential of deep learning for domain discoveries.
Inspired by the tremendous
success of deep learning in commercial
applications, there are significant efforts to leverage these tools
to solve scientific challenges.[1−9] Scientists have successfully built machine learning models that
can accurately predict material properties from raw microstructure
images.[4,5,10−12] These machine learning models hold great potential in materials
design because of their great predictive power and highly automated
pipeline.[13−18] However, these complex machine learning models are often difficult
to interpret and are considered as black boxes. This lack of interpretability
prevents useful scientific insights from being gained from the predictive
models and forms a bottleneck for applying machine learning in scientific settings that are exploratory in nature.

Although many model
explanation methods exist for image-based input,[19−23] they usually do not translate well to scientific
image prediction
tasks. The main reason is that most currently available explanation
methods are tailored for natural images, which are very different
from scientific images. In natural images, objects of interest are
usually clear sets of pixels which are easily recognizable. An explanation
of natural images can be provided via pixel salience heat maps by
highlighting the image regions that contribute the most to the prediction.
For example, a cat classification model can be successfully explained
by highlighting the face of the cat. However, in the case of scientific
images, what matters is usually not a few distinctive objects but
the distribution of objects and the interaction between different
objects.[24] Scientists are usually more
interested in an attribute abstracted from an image rather than a
single distinctive object within the image.

To better illustrate
the point, let us look at a real-world material
structure–property prediction example using deep neural networks.
First, similar to the state-of-the-art works for structure–property
prediction we have built a convolutional neural network (CNN) that
predicts the ultimate compressive strength (i.e., peak stress) of
a material given scanning electron microscope (SEM) images of its
feedstock powders as the input (see Figure 1a). A heat-map explanation generated by a typical CNN explanation approach is shown in Figure 1b. This per-pixel explanation is not particularly
insightful because the image pixel space does not contain important
materials science attributes directly. Materials scientists are usually
interested in microstructure statistics (e.g., size distribution and
porosity level) and the underlying processing parameters (e.g., temperature
and pressure), not specific image pixels. However, these attributes
(e.g., microstructure statistics and processing parameters) are not
directly measurable from a complicated SEM image (e.g., Figure 1a).
Figure 1
Overview of the actionable
explanation pipeline. We have a deep
neural network model (a) for predicting material peak stress from
SEM images. Instead of trying to attribute the decision to the input
pixel space (e.g., GradCAM[23]) (b), which
cannot produce an understandable and actionable solution, we can rely on a generative model to produce a hypothetical lot that is conditioned
on the key attributes of the material, from which we can obtain an
explanation that is not only directly understandable by the material
scientist but also can easily be translated into actionable guidelines
in the material synthesis process (c).
The effect of these domain attributes on a structure–property
prediction model can be understood if input microstructure images
contain systematically varying attributes. We can then understand
the impact of the varying attribute by simply passing corresponding
input images through the prediction model and observing its responses.
Such input images can be hard and expensive to obtain via experiments,
while synthetic images may better serve the purpose. Many microstructure
synthesis methods have been proposed in the field of computational
materials science. Classic examples include those based on n-point correlations,[25,26] hierarchical reconstruction
with physical descriptors,[27,28] random fields,[29,30] and many more.[31,32] However, as pointed out by a
recent extensive review by Bostanabad et al.,[24] these classic methods are usually based on specific assumptions
of the underlying microstructure and do not generalize easily. They
are also not readily compatible with deep learning models. One promising
technique that emerged in recent years is the generative adversarial
network (GAN),[33,34] which is a deep learning method
that generates high-quality images. Several works[35−39] have proposed the adoption of GAN models for generating
microstructure images. However, GANs have not been combined with structure–property
prediction models for human-friendly explanation. Moreover, training
a GAN model that can produce high-quality, high-resolution images
conditioned on given attribute values remains a challenging task,
especially in a small data setting.

In this work, we propose
to explain state-of-the-art deep learning
structure–property prediction models in terms of human understandable
scientific domain attributes with an image editing GAN model. Our
GAN model successfully injects domain attributes in a post hoc manner
into the prediction pipeline with only a few data labels (no more
than 30 for each domain attribute). As illustrated in Figure 1c, we start by building a “domain-aware” generative model that can produce synthetic SEM images compliant with user-controlled attributes such as size and porosity. In other words, a synthetic SEM image with a larger or smaller average crystal size can be generated with respect to a reference image. We then leverage these attributes
as explainable handles to reason more effectively by probing the predictive
model behavior with generated hypothetical materials. This approach
allows us to explain deep learning decision making in the language
that a domain scientist can understand, i.e., how does a change in
the crystal size (or porosity, etc.) impact the peak stress prediction,
or how should material attributes be altered to obtain a material
with a higher peak stress?

The key contributions of our work are
summarized as follows: (1)
We propose a general explainable deep learning approach for reasoning
about model decisions in human-interpretable terms; (2) we demonstrate
that the domain-aware generative model can capture the association
between material attributes and intricate image features with an extremely
small amount of supervised information (up to 30 unique labels); (3)
we verify the effectiveness of the proposed model explanation approach
with a real-world material optimization application and an experiment
validation: by providing model explanations that are easily understandable
for domain scientists and by conducting actual physical experiments
to validate the explanation for the target material property. Our
work demonstrates that human understandable attributes can be manipulated
as model inputs to facilitate an extremely flexible model behavior
explanation framework. It is also important to note that we achieve
this kind of input manipulation with extremely little supervised information
by leveraging advancements in content-editing GANs (e.g., ref (21)). We present our framework
with a specific real-world material property prediction problem, but
the framework is highly generalizable and can easily adapt to other
materials systems and scientific attributes.
Results
Experimental
Setup
In our exemplar case study, the
material of interest is 2,4,6-triamino-1,3,5-trinitrobenzene (TATB),
and the property of interest is the ultimate compressive strength
(i.e., peak stress) of compacted TATB. The peak stress can vary significantly
with changes in TATB crystal characteristics, including average size,
size distribution, porosity, and surface textures to name a few. The
experiment involves 30 different synthesis batches (referred to as lots in the context of this work) of material samples. Each
batch is synthesized with different experiment parameters and shows
different crystal characteristics. The raw material of each lot is imaged by a scanning electron microscope (SEM) that
captures high-resolution images. As illustrated in Figure 2, for each sample, the entire
SEM stub surface is mapped into smaller regions, and corresponding
images (with a raw pixel resolution of 1024 × 1024) are collected
for analysis. The stress and strain mechanical properties are tested
for each lot in duplicate and under identical conditions. A deep neural
network regression model is then trained to predict the peak stress
of a material lot from a given SEM image tile (see Figure 1a).
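The tiling of each full-stub SEM scan into fixed-resolution image tiles can be sketched as follows; the helper name and the example scan dimensions are illustrative, not taken from this work.

```python
def tile_coords(height, width, tile=1024):
    """Yield (row, col) origins of non-overlapping tiles that fit
    fully inside a height x width SEM scan."""
    for r in range(0, height - tile + 1, tile):
        for c in range(0, width - tile + 1, tile):
            yield (r, c)

# Example: a hypothetical 4096 x 3072 px stub scan yields a 4 x 3 grid
# of 1024 x 1024 px tiles.
coords = list(tile_coords(4096, 3072))
```

Each coordinate pair marks the top-left corner of one tile that is cropped out and fed to the regression network.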
Figure 2
SEM high-resolution scan
(a) of a given lot divided into smaller
image tiles. All the image tiles from 30 lots are used for training
a CNN-based peak-stress prediction network. In (b), examples of the
tiles from different lots are illustrated. We can see that each image captures key characteristics, e.g., crystal size, of its respective lot.
Material scientists identified
several key microstructure attributes
of TATB crystals, including size,[40,41] porosity, polydispersity,[42] and facetness.[43] The meanings of the attributes are the following: size,
the average size of crystals; porosity, how “porous”
the crystals are, i.e., does it look like they have a lot of small
pinprick holes on the surface or are they solid; polydispersity, how varied the size of the crystals are, i.e., how broad is the
size distribution; facetness, do the crystals look
rounded/smooth at the edges or do they have flat faces that meet at
different angles to give a faceted structure? The attribute values
of our synthesized lots were estimated by human experts via visually
examining a large number of images per lot. For each lot, multiple
experts made their estimations independently, and their estimations
were averaged to generate the final per-lot material attribute value.
The estimated values are normalized (0.0–1.0). The average
standard deviation across all features and lots was 0.11, indicating
a reasonable consensus among experts. A detailed mean and variance
breakdown of the annotation scores is summarized in the Supporting Information Part A.
Attribute-Guided
Explanation Pipeline
The above-mentioned
material microstructure attributes came from domain knowledge and
decades of experimental experience.[40−43] They have helped materials scientists
customize TATB to meet different functional requirements and can potentially
help the understanding of material property prediction processes.
However, these attributes have not been fully utilized in modern deep
learning frameworks as state-of-the-art interpretation techniques[19−23] only provide pixel-based explanations (see Figure 1b). We currently have limited means to directly
reason about the relationship between the microstructure attributes
and the desired material property (peak stress). The proposed work
aims to address this fundamental limitation by explaining the model
behavior in the space of domain concepts, i.e., the material attributes.
As illustrated in Figure 1c, by leveraging the power of the generative adversarial network,[44] we are able to produce new SEM images from a
hypothetical lot with modified material attributes, which we can then
feed into the predictive model to investigate the connection among
the material attributes, the predicted peak stress, and the corresponding
SEM images from the hypothetical lot.

A crucial component for
the success of such an explanation pipeline is the ability to generate
appropriate images corresponding to given changes in the attribute
values. Unlike the past works in this direction, our generative model
takes an SEM image and target material attributes as inputs and then
performs selective editing of the desired attributes in the given
image. The additional information from the input image allows us to
train high-quality editing models capable of generating hypothetical
images that capture intricate details of the material attributes,
while relying on only extremely sparse supervised information.

There are only 30 lots and no more than 30 distinct values/labels for each attribute. This amount of labeled data is extremely small for any traditional supervised learning task. Compared to typical attribute-editing GAN applications, e.g., face images, in which every image carries its own label, the lot-level supervision for the SEM images is extremely sparse. Besides
the sparsity in labeling information, the other challenges originate
from the presence of intricate patterns in the images themselves.
For example, the porosity of a material is reflected by the presence
of small pinprick holes on the surface of the crystals in the SEM
image, which only occupy an extremely small number of pixels. Learning
attributes represented by such a minuscule feature can be very challenging.
Despite these obstacles, as illustrated in Figure 3, by utilizing the attGAN, the attribute-driven
generation model can accurately capture these intricate material features.
Such a success not only indicates the accuracy of the estimated material
attributes by the scientists but also demonstrates the coherency among
images from the same lot.
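The sparse supervision described here — one attribute vector per lot, shared by every tile from that lot — amounts to a simple broadcast when assembling training pairs for the editing GAN. A minimal sketch (lot names, values, and tile ids are hypothetical):

```python
# One attribute label set per lot (at most 30 such sets in total).
lot_attrs = {
    "M":  {"size": 0.8, "porosity": 0.3},
    "AS": {"size": 0.5, "porosity": 0.6},
}
# Image tiles cropped from each lot's SEM scans.
tiles_by_lot = {"M": ["M_000", "M_001"], "AS": ["AS_000"]}

# Broadcast: every tile inherits its lot-level attribute vector,
# yielding (tile_id, label) training pairs for the editing GAN.
training_pairs = [(tile, lot_attrs[lot])
                  for lot, tiles in tiles_by_lot.items()
                  for tile in tiles]
```

Hundreds to thousands of tiles per lot thus share the same handful of labels, which is what makes the supervision so sparse.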
Figure 3
Illustration of material attributes-guided SEM
image generation.
The left column is the original SEM image. The middle and right columns
show the GAN-generated images of hypothetical lots that increase or
decrease the corresponding material attributes, respectively. The
colored boxes highlight the corresponding regions in the image (different
colors mark different regions in the image), in which we can find
clear changes that reflect the alterations in the attributes.
To ensure that the image generation model is producing
the intended
modification, we examine the quality of a generated SEM image from
the following two aspects: (1) the synthesized images should be indistinguishable
from the real SEM images, and (2) the generated images should exhibit
material features that correspond to the modified attributes. For
a comprehensive analysis, we not only looked into widely adopted computational
metrics but also investigated human perception through the feedback
from material scientists. Both of these evaluations corroborated that
the GAN-based SEM image editing process produces satisfactory results,
i.e., meaningful images from a hypothetical lot. To confirm the quality
of the GAN model from a computational aspect, we closely examined
the convergence and the loss behavior of the generator, the discriminator,
and the classifier in our SEM attGAN[44] model.
In particular, the low and stable reconstruction error indicates that
the GAN can reproduce realistic looking SEM images. We also observed
that the classifier can accurately predict the attributes from both
the original and the hypothetical (GAN-generated) images. This implies
that the generator can produce realistic modifications that can be
correctly classified by the same classifier which correctly predicted
attributes from the original images. Moreover, we also resort to material
scientists for further evaluating the quality of generated images,
as their domain knowledge is essential for understanding the intrinsic
details and material concepts that may not easily be evaluated by
the computational metrics. According to the feedback from three material
scientists, they not only had a hard time distinguishing between the
images from original and hypothetical lots but also confirmed that
the modification reflects the intended changes as described by the
attribute inputs.

These observations are also demonstrated in
the examples of the
attribute-guided modification as shown in Figure 3 (additional samples are also provided in
the Supporting Information). We see in
the top row that the larger crystal in the original image (left column)
is naturally broken into smaller ones in the synthesized image that
aim to decrease the overall size. Alternatively, we can see that smaller
crystals are removed (or suppressed) in the synthesized image to increase
the overall crystal size (highlighted by brown boxes). In the second
row, we can see that the small porous structures are being added in
the rightmost image (increased porosity), whereas the corresponding
region is smoothed out in the middle image (decreased porosity). The
polydispersity attribute also works well: the GAN tries to remove smaller crystals in the middle image (decreased polydispersity) while adding them in the case of increased polydispersity. The
facetness is the only attribute that does not seem to be effectively
learned. Even though the model appears to reduce/increase the facetness
(see the region marked by green and yellow squares), it also brings
along more drastic change with respect to polydispersity and size.
Moreover, there is likely an inherent dependency among these attributes,
which we may not be able to eliminate even with additional data and
labels. For more examples, please refer to the Supporting Information.
How Do Changes in Attributes
Affect the Predicted Peak Stress?
As illustrated in Figure 1, once an image from
the hypothetical lot is generated, we
can feed it into the predictive model to predict the respective mechanical
properties (e.g., peak stress). The introduction of the attribute-driven
image generation process not only exposes the explicitly defined material
features but also enables the ability to actively control them to
form intervention operations that are essential for reasoning about
counterfactual relationships (i.e., alter a material feature and then
observe corresponding changes in the prediction). An added benefit
of the image editing GAN is that it often strives to introduce minimal
alteration in the image for the required attribute change (e.g., for
a face image, the attGAN can change the hair color without altering
other facial features). Such behavior makes it suitable to reason
about the effect of the desired change, as the editing does not intend
to change other features or the general structure of the original
image.

The most straightforward way to ascertain the relationship
between the material attributes and the predicted mechanical properties
is to do a simple “forward” sensitivity analysis by
observing how the predicted stress changes as we vary the material
properties in the image generation process. To understand the impact
of a particular set of attributes, we can fix all other attributes
while varying the values of the attributes of interest. We then feed
the generated images to the predictive model and obtain the corresponding
predicted peak stress. Such an analysis allows us to estimate the sensitivity (i.e., importance) of the prediction to each of the material attributes, which enables material scientists to form intuition about
the influence of the attribute changes on the peak stress of a given
lot.

As illustrated in Figure 4a, by altering the size attribute when generating the
images
of hypothetical lots, we can observe changes in the predicted peak
stress values. Here, we generated 11 images with size attribute varying
from 0.0 to 1.0 (the full range of the attribute) while fixing all
other attributes. Attribute values are shown on the x-axis, whereas the predicted peak stress (in psi) is shown on the y-axis. As shown in Figure 4a, for a single input SEM image instance, the predicted
peak stress decreases as we increase the crystal particle size. The
visual effects of the size attribute change can also be observed in
the corresponding images (only three images are shown due to space
constraint). Since our regression model generates the peak stress
prediction for a given lot based on a single image tile (each lot
image contains hundreds to thousands of image tiles), variations exist
among the image tiles within each lot. To illustrate the average behavior
of the given lot, as shown in Figure 4b, we show the aggregated results from all SEM images
from the lot AS. In the box plot, each vertical glyph (along the x-axis that corresponds to the attribute value size) encodes
predictions of all image tiles with the same attribute values. The y-axis shows the predicted peak stress. Despite the variation
among the tiles in the lot, we can observe a similar trend in both
(a) and (b).
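The forward sensitivity probe described above can be sketched as a simple sweep. The trained attGAN generator and CNN regressor are replaced here by trivial stand-ins (their names and the linear response are invented), so only the probing loop itself is illustrated:

```python
def edit_image(image, attrs):
    # Stand-in for the attGAN generator G(I; A): it just tags the input
    # image with the requested attribute values.
    return {"base": image, "attrs": dict(attrs)}

def predict_stress(image):
    # Stand-in for the CNN regressor R: an invented linear response in
    # which a larger crystal size lowers the predicted peak stress.
    return 1200.0 - 400.0 * image["attrs"]["size"]

def forward_sweep(image, attrs, knob, steps=11):
    """Vary one attribute over [0, 1] with all others fixed and record
    the predicted peak stress at each setting."""
    curve = []
    for i in range(steps):
        probe = dict(attrs, **{knob: i / (steps - 1)})
        curve.append(predict_stress(edit_image(image, probe)))
    return curve

curve = forward_sweep("tile_AS_000", {"size": 0.5, "porosity": 0.3}, "size")
```

With 11 steps this reproduces the 0.0-to-1.0 sweep used for Figure 4a; the real pipeline simply swaps in the trained generator and regressor.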
Figure 4
Illustration of the forward explanation. In (a), the effect
of
varying the attribute of a single input image tile is illustrated.
Here we explore the images from different hypothetical lots by varying
the crystal size. The x-axis is the normalized material
attribute values (0–1), which correspond to the relative strength
of the changes, e.g., zero indicates a very small size with respect
to the norm in the given lot. The horizontal dotted line illustrates
the measured peak stress of the given lot. We can also perform similar
aggregated analysis on all image tiles from a specific lot (e.g.,
lot AS) through a box plot[45] as illustrated
in (b).
We can evaluate how each of the
four attributes impacts the peak
stress prediction by applying a similar sensitivity analysis for different
lots by varying the attribute values one at a time. In Figure 5, we illustrate three lots,
with high (lot M), median (lot AS), and low (lot N) peak stress values,
respectively. As shown in the plots, the size and facet attributes
have a pronounced and consistent effect on the prediction output,
which shows that having larger particles in general has a
detrimental impact on the peak stress of the sample, while having
more well-faceted crystals in the samples is beneficial to increasing
the peak stress values. The inverse correlation between size
and peak stress aligns with established materials knowledge such as
the Hall–Petch rule.[46,47] This discovery is not
particularly new to an experienced materials scientist, but we note
that this prior knowledge was not explicitly encoded in our models
in any way. Our method rediscovered it automatically through explaining
the forward predictive model. Our analysis also highlights that there is no clear trend for either the porosity or the polydispersity attribute; their effects diverge depending on the selected lot. The divergence
in these attributes shows that there is more than one single
pathway to achieve a particular peak stress value, since
the exemplar cases of M, AS, and N lots all have very different original
attributes. The ability to modify different attributes of a given
microstructure image for either increased or decreased performance
provides powerful visualization cues to the subject matter experts
while also informing them about which knobs should be tuned (and their
sensitivities) to achieve the desired performance.
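Comparing how strongly each attribute moves the prediction, as done across lots above, can be summarized by the spread of each one-at-a-time sweep; the surrogate predictor and its weights below are invented for illustration:

```python
def predict_stress(attrs):
    # Invented surrogate regressor: size hurts peak stress, facetness
    # helps, porosity barely registers (weights are illustrative only).
    return (1000.0 - 350.0 * attrs["size"]
            + 150.0 * attrs["facetness"] + 10.0 * attrs["porosity"])

def sensitivity(base, knob, steps=11):
    """Spread of predictions as one knob sweeps [0, 1], others fixed."""
    preds = [predict_stress(dict(base, **{knob: i / (steps - 1)}))
             for i in range(steps)]
    return max(preds) - min(preds)

base = {"size": 0.5, "facetness": 0.5, "porosity": 0.5}
ranking = sorted(base, key=lambda k: sensitivity(base, k), reverse=True)
```

Attributes with a large spread (here, size) are the knobs worth tuning first; flat curves indicate attributes whose effect is weak or lot-dependent.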
Figure 5
Forward explanations
illustrating the sensitivity of the peak stress
with respect to varying attribute values for different lots.
Optimizing Material Attributes for the Target
Material Property
Despite the effectiveness of a “forward”
explanation
approach, i.e., explaining a model through the sensitivity of a given
attribute of the material, it is extremely valuable to answer retrospective
questions that can provide precise insights for improving the performance
of a certain lot, e.g., what specific changes should be made to the
attributes of a given lot to increase its peak stress? Such questions
are often framed as an inverse material design problem,[48,49] in which we try to guide the design of the material given the desired
specification. Such an analysis not only facilitates a direct way
for examining the effects of simultaneous modification of multiple
attributes but also produces a guidance for the synthesis process
for achieving a more desirable material property.

To address
this challenge, we proposed a “backward” explanation
scheme, in which an optimization is performed to obtain the necessary
changes to the input attributes for producing the desired material
property, i.e., peak stress. Let us define the generative editing model as G(I; A), where I is the original image and A = {a1, ..., an} are the material attributes that control the editing. Given an SEM image I for which the regressor R predicts a certain peak stress, we aim to identify attributes A′ with minimal deviation from A such that the edited image I(A′) = G(I; A′) would lead to a higher/lower peak stress prediction p. Given an image I with corresponding image attribute vector A and a target peak stress p, we formulate the backward explanation problem as

$$ \min_{A'} \; \lVert A' - A \rVert_q \quad \text{subject to} \quad R(G(I; A')) = p $$

The neural network models make this formulation nonlinear and nonconvex, which makes it difficult to solve in its original form. Thus, we formulate a relaxed version of the problem that can be solved efficiently:

$$ \min_{A'} \; \mathrm{MSE}\!\left(R(G(I; A')),\, p\right) + \lambda \lVert A' - A \rVert_q $$

where the mean-squared error (MSE) loss encourages the predicted peak stress to be closer to the target peak stress p. Further, to obtain a sparser (i.e., more understandable) explanation, we set q = 1 in the regularization term. Since both the regressor R and the generator G are differentiable, we can compute the gradient of the objective function via back-propagation and solve the optimization
problem using a gradient descent algorithm.

Despite the model’s
aim to predict the peak stress of the
given lot, the prediction itself is made based on a single image tile
(each lot contains a large number of image tiles; please refer to Method section for details of the SEM image acquisition
process). As a result, it is imperative to look beyond the behavior
of individual prediction and examine the average behavior of all image
tiles of a specific lot. The same applies to the model explanation,
in which we can obtain a more comprehensive understanding of the behavior
of the lot by averaging or aggregating its explanation.

In Figure 6, the
original image and modified image for increasing and decreasing predicted
peak stress (based on the changes in the attributes) are shown. In
the top row (lot N), we can see in both SEM images (left) and the
attribute bar plot (right) that decreasing the crystal size while increasing the porosity, polydispersity, and facetness will lead to a higher peak stress prediction. The same pattern can be observed for
lot AT (midrow). Although the bottom row (lot F) shows a slight deviation
compared to earlier patterns on porosity, the small absolute value
indicates that the change in porosity does not contribute much to
the changes in the generated image. One thing to note is that the
increase of the facetness attributes in the image generation process
seems to also lead to a marked increase in polydispersity and a reduction
of average size, so the effect we observe for altering facetness is
likely also due to the changes in the size attribute.
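The relaxed backward optimization — gradient descent on an MSE term toward the target stress plus an L1 penalty on the attribute change — can be sketched with a made-up differentiable surrogate standing in for the composed model R(G(I; ·)); in the real pipeline the gradient instead comes from back-propagating through the CNN regressor and the attGAN generator.

```python
# Invented linear surrogate for R(G(I; A')); the weights only mimic the
# qualitative trends reported (size hurts peak stress, facetness helps).
W = {"size": -400.0, "porosity": 20.0, "polydispersity": 30.0,
     "facetness": 150.0}

def predict(attrs):
    return 1000.0 + sum(W[k] * v for k, v in attrs.items())

def backward_explain(attrs, target, lam=1.0, lr=1e-6, steps=20000):
    """Minimize MSE(predict(A'), target) + lam * ||A' - A||_1 over A'."""
    a = dict(attrs)
    for _ in range(steps):
        err = predict(a) - target
        for k in a:
            grad = 2.0 * err * W[k]                      # MSE term
            d = a[k] - attrs[k]                          # L1 subgradient
            grad += lam * ((d > 0) - (d < 0))
            a[k] = min(1.0, max(0.0, a[k] - lr * grad))  # stay in [0, 1]
    return a

orig = {"size": 0.8, "porosity": 0.5, "polydispersity": 0.5,
        "facetness": 0.4}
# Ask for a lot whose predicted peak stress is 100 units higher.
new = backward_explain(orig, target=predict(orig) + 100.0)
```

With this surrogate, the optimizer mainly lowers the size attribute while slightly raising facetness, and the q = 1 penalty drives the weakly contributing attributes back toward their original values, mirroring the sparsity effect discussed above.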
Figure 6
Backward actionable explanation
for a single SEM image. The images
from the original and hypothetical lot (through GAN-based image manipulation)
are shown on the left, and the corresponding attribute changes that
led to an increase or decrease of the predicted peak stress are illustrated
in the bar chart on the right.
We also estimate the overall behavior of the entire lot by averaging
the backward explanation of all image tiles from a given lot. As
shown in Figure 7,
we can utilize a similar attribute change plot to illustrate the averaging
behaviors by showing the mean values. The plots show that the rule
we identified by examining a backward explanation of individual images
is consistent with the average behavior of the entire lot. The polydispersity,
size, and facetness attributes behave consistently in increasing/decreasing the peak stress. Since both polydispersity and size attributes are
directly associated with the mean and variance of the crystal size, the reduction of size appears to be the most effective route to increase
the peak stress prediction.
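The lot-level averaging described above is a simple reduction over the per-tile results; `per_image_deltas` is a hypothetical (n_tiles, n_attributes) array of backward-explanation attribute changes:

```python
import numpy as np

def lot_level_explanation(per_image_deltas):
    """Average per-tile attribute changes into one lot-level explanation
    (the mean values shown in the lot-level attribute change plot)."""
    deltas = np.asarray(per_image_deltas, dtype=float)
    return deltas.mean(axis=0)
```

The standard deviation over tiles could be reported alongside the mean to convey within-lot variability.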
Figure 7
Backward explanation results for the given lots, obtained by averaging
the per-image explanations.
Experimental Validation
of the Explanation Results
As discussed before, the explanations
generated by the proposed method,
especially the impact of particle size, align well with general materials
science domain knowledge.[46,47] Nevertheless, it remains
valuable to verify the explanations with physical experiments for
our specific TATB materials. We would like to experimentally verify
how the peak stress performance of a given lot changes following the
attGAN-generated guidelines. Analyses in the previous section show
that the particle size is a dominating factor in determining the material
peak stress, especially for the lots with lower peak stress values.
An example is given in Figure , which shows that altering particle size is expected to drastically
change the peak stress value of lot N. Note that lot N has the lowest
peak stress of 819.81 psi among all 30 available material lots.
We aim to experimentally verify the attGAN explanations by reducing
the particle size of lot N and checking its peak stress change. The
particle size reduction was achieved via a grinding step, during which
large particles break and the average particle size reduces. The grinding
process was carried out using a jar mill with a 000-size (0.3 L nominal
volume) ceramic jar filled with 25 g of lot N powder, 50 g of water,
and 30 pcs of a half-inch diameter ceramic cylinder. The samples were
ground for 24 h before being filtered, collected, and dried to a constant
mass. After the grinding step, we reexamined the sample microstructure
by taking new SEM images and performed new mechanical tests to measure
the peak stress of the newly ground samples.
The microstructure of the newly ground sample is shown on the left
side of Figure . We
see that the grinding process is not perfect. The microstructure of
the ground sample is not identical to that generated by the attGAN
model, and size was not the only microstructural characteristic that
changed after grinding. However, grinding is the best experimental
approach currently available to us, and we do see a significant reduction
in sample particle size. The uniaxial compressive stress–strain
curve of the newly ground sample is shown on the right side of Figure ; we observe a significant
upward shift of the stress–strain curve and a substantial gain
in the sample peak stress after grinding. The grinding operation alters
the peak stress of lot N from 819.81 to 2361.09 psi or from the lowest
peak stress of the 30 lots to surpassing the highest peak stress of
the 30 lots (lot AM, 2300.30 psi).
Figure 8
By grinding the lot with the lowest peak
stress, we are able to
obtain a sample that surpasses all original lots in terms of the peak
stress performance. The experimental result aligned with both the
explanation suggestion of a given material lot and the common wisdom.[50]
Though the grinding experiment
is not perfect, it provides experimental
support for the explanation generated by our attGAN model: smaller
particle size leads to higher peak stress for the TATB material. We
also checked the peak stress prediction of the newly ground lot N using
our previously trained forward model. The forward prediction of the
newly ground lot N is around 1600 psi, which is in the right direction
(much higher than the 819.81 psi of the unground lot N) but not accurate
enough (less than the ground truth experimental result of 2361.09
psi). This discrepancy between the predicted and the true peak stress
is probably related to the fact that the grinding process might have
introduced microstructure features that did not exist in the original
training set. From a materials science perspective, it is not surprising
that the ground (thus deformed) TATB crystals behave differently than
the undeformed TATB crystals.[51,52] In machine learning
terms, these changes will lead to a distribution shift that would
impact the prediction accuracy.
Discussion
The
power of the proposed technique lies not only in its ability
to provide fine-grained (i.e., per-lot and per-image)
explanations of the material behavior that are directly linked to
human-understandable guidelines but also in its capability to visualize
the hypothetical target lot through synthesizing SEM images with given
material attributes. In this particular materials science application,
compared to the conventional wisdom, our method not only explicitly
confirms the impact of the crystal sizes but also produces a realistic
depiction of the appearance of the material for which the given predictive
model would predict to have a higher peak stress value. As a result,
the method enables a rapid and multifaceted analysis: from a single
instance of the prediction to the averaging behavior of the entire
lot and from the sensitivity of a single attribute to the joint influence
of multiple ones for achieving an optimal objective value.
Our
approach is illustrated with a real-world materials optimization
application. By examination of the model explanations, a well-established
scientific insight (inverse correlation between size and sample strength)
was rediscovered. We note that the approach itself is general and
can be valuable in emerging applications where scientists have not
yet formed deep scientific insights by serving as a computation tool
to provide intuitive explanations without extensive experiments. Moreover,
since we obtain the explanations through controlling the attribute-aware
variation in the input data, compared to many state-of-the-art explanation
techniques, the proposed technique is not restricted to a specific
model and can be adapted to understand or compare the behavior of
different models.
As with any newly developed technique, our approach has some shortcomings
that need future improvement. One particular challenge originates
from the potential distribution shift from the original images to
the reconstructed images (when we generate new image tiles using the
attributes associated with the corresponding lot). Even though a human
viewer often cannot discern any noticeable difference between the
original images and reconstructed ones, these unnoticeable changes
can lead to minor prediction shifts from the original ones. Moreover,
due to the way the regression model is built, we predict the peak
stress for the given lot based on a single SEM image tile, which leads
to built-in variation among predicted values generated from different
image tiles from the same lot. We are currently exploring other approaches
to build more robust regression models that capture the overall sample
quality from limited data (i.e., data-efficient model design[53]), a common obstacle in applying machine learning
to scientific data.
For domain scientists to adopt emerging
machine learning techniques,
domain-specific explanations of how machine learning models function
are essential. Without tangible and actionable information from machine
learning models, the overall scientific benefit that machine learning
can bring is limited. This work serves as a proof that it is possible
to directly reason about meaningful domain concepts and extract domain
insights in a complex machine learning pipeline. We believe that it
is an important step toward the credible use of machine learning in
materials science applications that can potentially lead to new scientific
discoveries.
Method
SEM Image Data Acquisition
Our experiment
involves
30 different synthesis batches (referred to as lots in the context
of this work) of material samples, with each batch showing different
overall crystal characteristics. Each of the 30 lots is analyzed with
a Zeiss Sigma HD VP scanning electron microscope (SEM) using a 30
μm aperture, 2.00 keV beam energy, and ca. 5.1 mm working distance
to capture high-resolution images. The software Atlas is used to automate
the image collection. As illustrated in Figure , for each sample, the entire SEM stub surface
is mapped, and corresponding images are collected with slight overlap
to create a stitched mosaic of the full area. The field of each mosaic
tile is 256.19 μm × 256.19 μm with a pixel size of
256.19 nm × 256.19 nm (1024 × 1024 image size). In total,
we captured 69 894 sample images from 30 lots of TATB. These
images are then downselected by removing the ones with black margins
(i.e., at the edge of the scan) and other inconsistencies to ensure
the quality of the training and validation sets, which consist of
59 690 images with a resolution of 256 × 256 (downsampled
from 1024 × 1024 in the original image).
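The downselection and downsampling step described above can be sketched as follows; the black-margin threshold and the box-average downsampling are assumptions for illustration, not the authors' exact procedure:

```python
import numpy as np

def downselect_and_downsample(tile, black_fraction_threshold=0.05):
    """Discard SEM tiles with black margins (edge of the scan) and reduce
    a 1024 x 1024 tile to the 256 x 256 resolution used for training."""
    tile = np.asarray(tile, dtype=float)
    # Tiles dominated by black pixels sit at the scan boundary; drop them.
    if (tile == 0).mean() > black_fraction_threshold:
        return None
    h, w = tile.shape
    f = h // 256
    # Box-average each f x f block down to one output pixel.
    return tile.reshape(256, f, 256, f).mean(axis=(1, 3))
```

Applying this filter across the 69 894 raw tiles would yield the cleaned set of 59 690 training and validation images.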
Material Mechanical Properties
Experiment
The stress
and strain mechanical properties are tested for each lot by uniaxially
pressing duplicate samples from each TATB powder lot in a cylindrical
die at ambient temperature to 0.5 in. diameter by 1 in. height, with
a nominal density of 1.800 g/cm3. Strain-controlled compression
tests were run in duplicate at 23 °C at a ramp rate of 0.0001
s–1 on an MTS Mini-Bionix servohydraulic test system
model 858 with a pair of 0.5 in. gauge length extensometers to collect
strain data. From the obtained stress–strain curve, only the
peak stress values were considered as the outputs of the machine learning
models, resulting in an image data set, in which the same properties
are assigned to all images (tiles) from the same lot.
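The labeling scheme above reduces each lot's stress–strain curve to a single scalar that is shared by all of that lot's tiles; a minimal sketch with illustrative names:

```python
import numpy as np

def peak_stress(stress_curve):
    """Peak stress is the maximum of the measured stress-strain curve;
    this single scalar is the only output kept for the ML models."""
    return float(np.max(stress_curve))

def label_lot_tiles(tiles, stress_curve):
    """Assign the lot-level peak stress to every image tile of that lot,
    producing (tile, target) pairs for regression training."""
    y = peak_stress(stress_curve)
    return [(tile, y) for tile in tiles]
```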
Details of
the Peak Stress Prediction Model
The regression
model architecture for peak stress prediction is based on the Wide
ResNet model,[54] with a total of 28 convolutional
layers and a widening factor of 1, followed by an adaptive average
pooling layer. Since the Wide ResNet model was originally proposed
for classification, we replace the final softmax layer with a fully
connected regression layer with tanh activation to predict continuous
scalar values. Our implementation is based on PyTorch.
We set aside 10% of the training data for
validation, leaving a total of 53 721 training images and 5 969
validation images. All images are preprocessed by subtracting the
mean and dividing by the standard deviation. For data augmentation,
we do horizontal flips. We train the regression model with the mean-squared
error (MSE) loss function and the Adam optimizer[55] with a learning rate of 0.001 and a minibatch size of 64.
We used early stopping to terminate training when the validation performance
did not improve; the whole training procedure stopped after 48 epochs.
Globally, the regression model achieved a root-mean-square error
(RMSE) of 66.0 and a mean absolute percentage error (MAPE) of 3.07%
across all lots. For each lot, the peak stress predictions versus
the ground-truth peak stress values are shown in Figure , where the error bars present
the standard deviation of predictions across images in the lot. The
root-mean-square error per lot is plotted in Figure .
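The regression head described above (adaptive average pooling followed by a fully connected layer with tanh) can be sketched in PyTorch. Here `backbone` stands in for the Wide ResNet feature extractor (28 convolutional layers, widening factor 1), and the tanh output assumes peak-stress targets normalized into (-1, 1):

```python
import torch
import torch.nn as nn

class PeakStressHead(nn.Module):
    """Regression head replacing the original softmax classifier:
    adaptive average pooling, then a fully connected layer with tanh,
    producing one scalar prediction per SEM tile."""

    def __init__(self, backbone, feat_channels):
        super().__init__()
        self.backbone = backbone
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(feat_channels, 1)

    def forward(self, x):
        h = self.pool(self.backbone(x)).flatten(1)  # (N, feat_channels)
        return torch.tanh(self.fc(h)).squeeze(1)    # (N,) in (-1, 1)
```

In the paper's setup, such a model is trained with the MSE loss and the Adam optimizer (learning rate 0.001, minibatch size 64) with early stopping on the validation set.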
Figure 9
Predicted and ground-truth values of peak stress
for different
lots.
Figure 10
RMSE of the predicted peak stress values
for different lots.
Detail of the Generative
Adversarial Network (GAN) Model
The generative adversarial
network (GAN)[33] has revolutionized our
ability to generate incredibly realistic
samples from highly complex distributions.[56,57] In general, a GAN transforms noise vectors (Z vectors
from a high-dimensional latent space) into synthetic samples I, resembling data in the training set. The GAN is learned
in an adversarial manner, in which a discriminator D(I) (differentiate real vs fake samples) and a generator G(Z) (produce realistic fake samples) are trained
together to compete with each other. One limitation of the standard
GAN is that the latent space is not immediately understandable, which
limits our ability to control the generated content. This problem
is partially addressed by conditional GAN that is conditioned on the
labels,[58] i.e., generate different types
of images by providing both a noise vector Z and a label L. Still, these models, like most GANs, are often extremely
hard to train and require a large number of samples for even moderately
complex data. Our initial attempts to apply conditional GAN on our
SEM image data with the lot indices or other properties as labels
were met with mixed results: the GAN appears to produce
rather realistic-looking SEM images, but the generated images do not
reflect the corresponding labels on which they are conditioned. This
is likely due to an insufficient amount of images and labels as well
as the innate complexity of the SEM image data. To mitigate the training
challenge and to improve the control over generated contents, we turn
our focus to another class of GANs that make selective modifications
to existing images rather than generating them from scratch (i.e.,
transform a vector into the images). Instead of providing a noise
vector to the generator, these image editing GANs (e.g., attGAN[44]) take an input image along with the attributes A that describe the desirable changes (G(I, A)). For face images, such a GAN can be trained
to alter attributes, such as the color of the hair or the presence
of eyewear in the original image.
The ability to modify SEM
images is achieved by training an attGAN[44] with the material attribute labels provided by materials scientists
(the continuous attribute values between 0 and 1 are converted into
binary labels for training the attGAN). Compared to the training setup
for the celebrity image data set, the largest difference with SEM
images is the number of available labels. For the celebrity images,
labels are obtained for each individual face image. However, this is not
the case for the SEM images, for which we only have one label per
lot, each containing a large number of images. We trained the attGAN
utilizing a PyTorch implementation with the same learning rate as
for the celebrity data set. The output image of the GAN is the same
as the input with a resolution of 256 × 256. We trained the model
for 70 epochs (around 40 h on an NVIDIA T4 GPU). The training cutoff
is determined by examining the sample results during training, where
additional epochs do not appear to improve the visual fidelity of
the generated modification. Despite the relatively lengthy time to
train the generative model, this process incurs only a one-time cost.
The explanation method can then leverage the trained attGAN generative
model to produce an explanation in a timely fashion; e.g., it takes
1 min or less to produce an explanation of a given forward prediction.
The training details for the GAN are shown in Figure . The attGAN jointly trains
the generator (which contains both the encoder and decoder), the discriminator,
and the classifier for predicting attributes. Please refer to the
original work[44] for more details about
the model architecture. In the figure, (a) illustrates the generator’s reconstruction
error. As the error decreases, the generator is able to produce more
realistic SEM images. (b) shows the discriminator’s adversarial
loss. The loss increases over training iterations, indicating that the
generator is producing more realistic images, in turn fooling the
discriminator. (c) and (d) show the classification loss for predicting
the attribute labels from both the original (part of the overall training
loss for the discriminator) and the generated (part of the overall
training loss for the generator) SEM images. The decreasing and then
stabilizing behavior of the classifier losses indicates the jointly
trained classifier can accurately predict the attributes for both
the real and fake (generated) images. This also implies that the generator
can produce realistic modifications that can be correctly classified
by the same classifier that is predicting attributes from the real
images correctly. Moreover, the low and stable reconstruction loss indicates
that the generator can reproduce realistic-looking SEM images. These combined
observations provide evidence to support our claim that the GAN is
trained well and the quality of the image editing process is good
as showcased by the classifier performance on generated images.
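Once trained, the attGAN generator is used for image editing by supplying a real tile together with a modified attribute vector. A minimal sketch, where `generator` is a hypothetical callable standing in for the trained G(I, A) and attributes follow the binarized labels used for training:

```python
import torch

def edit_tile(generator, tile, attrs, changes):
    """Produce a hypothetical SEM tile by editing selected attributes.

    `changes` maps attribute index -> new binary label, e.g. {size_idx: 0.0}
    to request smaller crystals while leaving other attributes untouched."""
    target = attrs.clone()
    for idx, value in changes.items():
        target[:, idx] = value
    with torch.no_grad():  # inference only; no gradients needed
        return generator(tile, target)
```

This is the operation behind the "hypothetical lot" images: the original tile is kept fixed while one or more attribute knobs are flipped.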
Figure 11
Performance
curves for attGAN training. (a) The generator’s
reconstruction loss. As the error decreases, the generator is able
to produce more realistic SEM images. (b) The discriminator’s
adversarial loss, which increases over training iterations as the
generator is producing more realistic images, in turn fooling the
discriminator. (c) and (d) The classification loss for predicting
the attribute labels for both the original (part of the overall training
loss for the discriminator) and the generated (part of the overall
training loss for the generator) SEM images.