The materials science community has been increasingly interested in harnessing the power of deep learning to solve various domain challenges. However, despite their effectiveness in building highly predictive models, e.g., for predicting material properties from microstructure images, the opaque nature of deep neural networks poses fundamental challenges for extracting meaningful domain knowledge from them. In this work, we propose a technique for interpreting the behavior of deep learning models by injecting domain-specific attributes as tunable "knobs" in the material optimization analysis pipeline. By incorporating these material concepts in a generative modeling framework, we are able to explain what structure-to-property linkages the black-box models have learned, providing scientists with a tool to leverage the full potential of deep learning for domain discoveries.
Inspired by the tremendous
success of deep learning in commercial
applications, there are significant efforts to leverage these tools
to solve scientific challenges.[1−9] Scientists have successfully built machine learning models that
can accurately predict material properties from raw microstructure
images.[4,5,10−12] These machine learning models hold great potential in materials
design because of their great predictive power and highly automated
pipeline.[13−18] However, these complex machine learning models are often difficult
to interpret and are considered as black boxes. This lack of interpretability
prevents useful scientific insights from being gained from the predictive
models and forms a bottleneck for applying machine learning in scientific settings that are exploratory in nature.

Although many model
explanation methods exist for image-based input,[19−23] they usually do not translate well to scientific
image prediction
tasks. The main reason is that most currently available explanation
methods are tailored for natural images, which are very different
from scientific images. In natural images, objects of interest are
usually clear sets of pixels which are easily recognizable. An explanation
of natural images can be provided via pixel salience heat maps by
highlighting the image regions that contribute the most to the prediction.
For example, a cat classification model can be successfully explained
by highlighting the face of the cat. However, in the case of scientific
images, what matters is usually not a few distinctive objects but
the distribution of objects and the interaction between different
objects.[24] Scientists are usually more
interested in an attribute abstracted from an image rather than a
single distinctive object within the image.

To better illustrate
the point, let us look at a real-world material
structure–property prediction example using deep neural networks.
First, similar to the state-of-the-art works for structure–property
prediction we have built a convolutional neural network (CNN) that
predicts the ultimate compressive strength (i.e., peak stress) of
a material given scanning electron microscope (SEM) images of its
feedstock powders as the input (see Figure 1a). A heat-map explanation generated by a typical CNN explanation approach is shown in Figure 1b. This per-pixel explanation is not particularly
insightful because the image pixel space does not contain important
materials science attributes directly. Materials scientists are usually
interested in microstructure statistics (e.g., size distribution and
porosity level) and the underlying processing parameters (e.g., temperature
and pressure), not specific image pixels. However, these attributes
(e.g., microstructure statistics and processing parameters) are not
directly measurable from a complicated SEM image (e.g., Figure 1a).
Figure 1
Overview of the actionable
explanation pipeline. We have a deep
neural network model (a) for predicting material peak stress from
SEM images. Instead of trying to attribute the decision to the input
pixel space (e.g., GradCAM[23]) (b), which
cannot produce an understandable and actionable solution, we can rely on a generative model to produce a hypothetical lot that is conditioned
on the key attributes of the material, from which we can obtain an
explanation that is not only directly understandable by the material
scientist but also can easily be translated into actionable guidelines
in the material synthesis process (c).
The effect of these domain attributes on a structure–property
prediction model can be understood if input microstructure images
contain systematically varying attributes. We can then understand
the impact of the varying attribute by simply passing corresponding
input images through the prediction model and observing its responses.
Such input images can be hard and expensive to obtain via experiments,
while synthetic images may better serve the purpose. Many microstructure
synthesis methods have been proposed in the field of computational
materials science. Classic examples include those based on n-point correlations,[25,26] hierarchical reconstruction
with physical descriptors,[27,28] random fields,[29,30] and many more.[31,32] However, as pointed out by a
recent extensive review by Bostanabad et al.,[24] these classic methods are usually based on specific assumptions
of the underlying microstructure and do not generalize easily. They
are also not readily compatible with deep learning models. One promising
technique that emerged in recent years is the generative adversarial
network (GAN),[33,34] which is a deep learning method
that generates high-quality images. Several works[35−39] have proposed the adoption of GAN models for generating
microstructure images. However, GANs have not been combined with structure–property
prediction models for human-friendly explanation. Moreover, training
a GAN model that can produce high-quality, high-resolution images
conditioned on given attribute values remains a challenging task,
especially in a small data setting.

In this work, we propose
to explain state-of-the-art deep learning
structure–property prediction models in terms of human understandable
scientific domain attributes with an image editing GAN model. Our
GAN model successfully injects domain attributes in a post hoc manner
into the prediction pipeline with only a few data labels (no more
than 30 for each domain attribute). As illustrated in Figure 1c, we start by building a “domain-aware” generative model that can produce synthetic SEM images compliant with user-controlled attributes such as size and porosity. In other words, a synthetic SEM image with a larger or smaller average crystal size can be generated with respect to a reference image. We then leverage these attributes
as explainable handles to reason more effectively by probing the predictive
model behavior with generated hypothetical materials. This approach
allows us to explain deep learning decision making in the language
that a domain scientist can understand, i.e., how does a change in
the crystal size (or porosity, etc.) impact the peak stress prediction,
or how should material attributes be altered to obtain a material
with a higher peak stress?

The key contributions of our work are
summarized as follows: (1)
We propose a general explainable deep learning approach for reasoning
about model decisions in human-interpretable terms; (2) we demonstrate
that the domain-aware generative model can capture the association
between material attributes and intricate image features with an extremely
small amount of supervised information (up to 30 unique labels); (3)
we verify the effectiveness of the proposed model explanation approach
with a real-world material optimization application and an experiment
validation: by providing model explanations that are easily understandable
for domain scientists and by conducting actual physical experiments
to validate the explanation for the target material property. Our
work demonstrates that human understandable attributes can be manipulated
as model inputs to facilitate an extremely flexible model behavior
explanation framework. It is also important to note that we achieve
this kind of input manipulation with extremely little supervised information
by leveraging advancements in content-editing GANs (e.g., ref (21)). We present our framework
with a specific real-world material property prediction problem, but
the framework is highly generalizable and can easily adapt to other
materials systems and scientific attributes.
Results
Experimental
Setup
In our exemplar case study, the
material of interest is 2,4,6-triamino-1,3,5-trinitrobenzene (TATB),
and the property of interest is the ultimate compressive strength
(i.e., peak stress) of compacted TATB. The peak stress can vary significantly
with changes in TATB crystal characteristics, including average size,
size distribution, porosity, and surface textures to name a few. The
experiment involves 30 different synthesis batches (referred to as lots in the context of this work) of material samples. Each
batch is synthesized with different experiment parameters and shows
different crystal characteristics. The raw material of each lot is imaged by a scanning electron microscope (SEM) that
captures high-resolution images. As illustrated in Figure 2, for each sample, the entire
SEM stub surface is mapped into smaller regions, and corresponding
images (with a raw pixel resolution of 1024 × 1024) are collected
for analysis. The stress and strain mechanical properties are tested
for each lot in duplicate and under identical conditions. A deep neural
network regression model is then trained to predict the peak stress
of a material lot from a given SEM image tile (see Figure 1a).
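The tiling of each full-stub SEM scan into fixed-resolution image tiles can be sketched as follows; the helper name and the example scan dimensions are illustrative, not taken from this work.

```python
def tile_coords(height, width, tile=1024):
    """Yield (row, col) origins of non-overlapping tiles that fit
    fully inside a height x width SEM scan."""
    for r in range(0, height - tile + 1, tile):
        for c in range(0, width - tile + 1, tile):
            yield (r, c)

# Example: a hypothetical 4096 x 3072 px stub scan yields a 4 x 3 grid
# of 1024 x 1024 px tiles.
coords = list(tile_coords(4096, 3072))
```

Each coordinate pair marks the top-left corner of one tile that is cropped out and fed to the regression network.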
Figure 2
SEM high-resolution scan
(a) of a given lot divided into smaller
image tiles. All the image tiles from 30 lots are used for training
a CNN-based peak-stress prediction network. In (b), examples of the
tiles from different lots are illustrated. We can see that each image captures key characteristics, e.g., crystal size, of its respective lot.
Material scientists identified
several key microstructure attributes
of TATB crystals, including size,[40,41] porosity, polydispersity,[42] and facetness.[43] The meanings of the attributes are the following: size,
the average size of crystals; porosity, how “porous”
the crystals are, i.e., does it look like they have a lot of small
pinprick holes on the surface or are they solid; polydispersity, how varied the size of the crystals are, i.e., how broad is the
size distribution; facetness, do the crystals look
rounded/smooth at the edges or do they have flat faces that meet at
different angles to give a faceted structure? The attribute values
of our synthesized lots were estimated by human experts via visually
examining a large number of images per lot. For each lot, multiple
experts made their estimations independently, and their estimations
were averaged to generate the final per-lot material attribute value.
The estimated values are normalized (0.0–1.0). The average
standard deviation across all features and lots was 0.11, indicating
a reasonable consensus among experts. A detailed mean and variance
breakdown of the annotation scores is summarized in the Supporting Information Part A.
Attribute-Guided
Explanation Pipeline
The above-mentioned
material microstructure attributes came from domain knowledge and
decades of experimental experience.[40−43] They have helped materials scientists
customize TATB to meet different functional requirements and can potentially
help the understanding of material property prediction processes.
However, these attributes have not been fully utilized in modern deep
learning frameworks as state-of-the-art interpretation techniques[19−23] only provide pixel-based explanations (see Figure 1b). We currently have limited means to directly
reason about the relationship between the microstructure attributes
and the desired material property (peak stress). The proposed work
aims to address this fundamental limitation by explaining the model
behavior in the space of domain concepts, i.e., the material attributes.
As illustrated in Figure 1c, by leveraging the power of the generative adversarial network,[44] we are able to produce new SEM images from a
hypothetical lot with modified material attributes, which we can then
feed into the predictive model to investigate the connection among
the material attributes, the predicted peak stress, and the corresponding
SEM images from the hypothetical lot.

A crucial component for
the success of such an explanation pipeline is the ability to generate
appropriate images corresponding to given changes in the attribute
values. Unlike the past works in this direction, our generative model
takes an SEM image and target material attributes as inputs and then
performs selective editing of the desired attributes in the given
image. The additional information from the input image allows us to
train high-quality editing models capable of generating hypothetical
images that capture intricate details of the material attributes,
while relying on only extremely sparse supervised information.

There are only 30 lots and no more than 30 distinct values/labels for each attribute. This amount of labeled data is extremely small for any traditional supervised learning task. Compared to typical attribute-editing GAN applications, e.g., face images, in which every image carries its own label, the lot-level supervision for the SEM images is extremely sparse. Besides
the sparsity in labeling information, the other challenges originate
from the presence of intricate patterns in the images themselves.
For example, the porosity of a material is reflected by the presence
of small pinprick holes on the surface of the crystals in the SEM
image, which only occupy an extremely small number of pixels. Learning
attributes represented by such a minuscule feature can be very challenging.
Despite these obstacles, as illustrated in Figure 3, by utilizing the attGAN, the attribute-driven
generation model can accurately capture these intricate material features.
Such a success not only indicates the accuracy of the estimated material
attributes by the scientists but also demonstrates the coherency among
images from the same lot.
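The sparse supervision described here — one attribute vector per lot, shared by every tile from that lot — amounts to a simple broadcast when assembling training pairs for the editing GAN. A minimal sketch (lot names, values, and tile ids are hypothetical):

```python
# One attribute label set per lot (at most 30 such sets in total).
lot_attrs = {
    "M":  {"size": 0.8, "porosity": 0.3},
    "AS": {"size": 0.5, "porosity": 0.6},
}
# Image tiles cropped from each lot's SEM scans.
tiles_by_lot = {"M": ["M_000", "M_001"], "AS": ["AS_000"]}

# Broadcast: every tile inherits its lot-level attribute vector,
# yielding (tile_id, label) training pairs for the editing GAN.
training_pairs = [(tile, lot_attrs[lot])
                  for lot, tiles in tiles_by_lot.items()
                  for tile in tiles]
```

Hundreds to thousands of tiles per lot thus share the same handful of labels, which is what makes the supervision so sparse.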
Figure 3
Illustration of material attributes-guided SEM
image generation.
The left column is the original SEM image. The middle and right columns
show the GAN-generated images of hypothetical lots that increase or
decrease the corresponding material attributes, respectively. The
colored boxes highlight the corresponding regions in the image (different
colors mark different regions in the image), in which we can find
clear changes that reflect the alterations in the attributes.
To ensure that the image generation model is producing
the intended
modification, we examine the quality of a generated SEM image from
the following two aspects: (1) the synthesized images should be indistinguishable
from the real SEM images, and (2) the generated images should exhibit
material features that correspond to the modified attributes. For
a comprehensive analysis, we not only looked into widely adopted computational
metrics but also investigated human perception through the feedback
from material scientists. Both of these evaluations corroborated that
the GAN-based SEM image editing process produces satisfactory results,
i.e., meaningful images from a hypothetical lot. To confirm the quality
of the GAN model from a computational aspect, we closely examined
the convergence and the loss behavior of the generator, the discriminator,
and the classifier in our SEM attGAN[44] model.
In particular, the low and stable reconstruction error indicates that
the GAN can reproduce realistic looking SEM images. We also observed
that the classifier can accurately predict the attributes from both
the original and the hypothetical (GAN-generated) images. This implies
that the generator can produce realistic modifications that can be
correctly classified by the same classifier which correctly predicted
attributes from the original images. Moreover, we also resort to material
scientists for further evaluating the quality of generated images,
as their domain knowledge is essential for understanding the intrinsic
details and material concepts that may not easily be evaluated by
the computational metrics. According to the feedback from three material
scientists, they not only had a hard time distinguishing between the
images from original and hypothetical lots but also confirmed that
the modification reflects the intended changes as described by the
attribute inputs.

These observations are also demonstrated in
the examples of the
attribute-guided modification as shown in Figure 3 (additional samples are also provided in
the Supporting Information). We see in
the top row that the larger crystal in the original image (left column)
is naturally broken into smaller ones in the synthesized image that
aim to decrease the overall size. Alternatively, we can see that smaller
crystals are removed (or suppressed) in the synthesized image to increase
the overall crystal size (highlighted by brown boxes). In the second
row, we can see that the small porous structures are being added in
the rightmost image (increased porosity), whereas the corresponding
region is smoothed out in the middle image (decreased porosity). The
polydispersity attribute also works well: the GAN tries to remove smaller crystals in the middle image (decreased polydispersity) while adding them in the case of increased polydispersity. The
facetness is the only attribute that does not seem to be effectively
learned. Even though the model appears to reduce/increase the facetness
(see the region marked by green and yellow squares), it also brings
along more drastic change with respect to polydispersity and size.
Moreover, there is likely an inherent dependency among these attributes,
which we may not be able to eliminate even with additional data and
labels. For more examples, please refer to the Supporting Information.
How Do Changes in Attributes
Affect the Predicted Peak Stress?
As illustrated in Figure 1, once an image from
the hypothetical lot is generated, we
can feed it into the predictive model to predict the respective mechanical
properties (e.g., peak stress). The introduction of the attribute-driven
image generation process not only exposes the explicitly defined material
features but also enables the ability to actively control them to
form intervention operations that are essential for reasoning about
counterfactual relationships (i.e., alter a material feature and then
observe corresponding changes in the prediction). An added benefit
of the image editing GAN is that it often strives to introduce minimal
alteration in the image for the required attribute change (e.g., for
a face image, the attGAN can change the hair color without altering
other facial features). Such behavior makes it suitable to reason
about the effect of the desired change, as the editing does not intend
to change other features or the general structure of the original
image.

The most straightforward way to ascertain the relationship
between the material attributes and the predicted mechanical properties
is to do a simple “forward” sensitivity analysis by
observing how the predicted stress changes as we vary the material
properties in the image generation process. To understand the impact
of a particular set of attributes, we can fix all other attributes
while varying the values of the attributes of interest. We then feed
the generated images to the predictive model and obtain the corresponding
predicted peak stress. Such an analysis allows us to estimate the sensitivity (i.e., importance) of the prediction to each of the material attributes, which enables material scientists to form intuition about
the influence of the attribute changes on the peak stress of a given
lot.

As illustrated in Figure 4a, by altering the size attribute when generating the
images
of hypothetical lots, we can observe changes in the predicted peak
stress values. Here, we generated 11 images with size attribute varying
from 0.0 to 1.0 (the full range of the attribute) while fixing all
other attributes. Attribute values are shown on the x-axis, whereas the predicted peak stress (in psi) is shown on the y-axis. As shown in Figure 4a, for a single input SEM image instance, the predicted
peak stress decreases as we increase the crystal particle size. The
visual effects of the size attribute change can also be observed in
the corresponding images (only three images are shown due to space
constraint). Since our regression model generates the peak stress
prediction for a given lot based on a single image tile (each lot
image contains hundreds to thousands of image tiles), variations exist
among the image tiles within each lot. To illustrate the average behavior
of the given lot, as shown in Figure 4b, we show the aggregated results from all SEM images
from the lot AS. In the box plot, each vertical glyph (along the x-axis that corresponds to the attribute value size) encodes
predictions of all image tiles with the same attribute values. The y-axis shows the predicted peak stress. Despite the variation
among the tiles in the lot, we can observe a similar trend in both
(a) and (b).
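The forward sensitivity probe described above can be sketched as a simple sweep. The trained attGAN generator and CNN regressor are replaced here by trivial stand-ins (their names and the linear response are invented), so only the probing loop itself is illustrated:

```python
def edit_image(image, attrs):
    # Stand-in for the attGAN generator G(I; A): it just tags the input
    # image with the requested attribute values.
    return {"base": image, "attrs": dict(attrs)}

def predict_stress(image):
    # Stand-in for the CNN regressor R: an invented linear response in
    # which a larger crystal size lowers the predicted peak stress.
    return 1200.0 - 400.0 * image["attrs"]["size"]

def forward_sweep(image, attrs, knob, steps=11):
    """Vary one attribute over [0, 1] with all others fixed and record
    the predicted peak stress at each setting."""
    curve = []
    for i in range(steps):
        probe = dict(attrs, **{knob: i / (steps - 1)})
        curve.append(predict_stress(edit_image(image, probe)))
    return curve

curve = forward_sweep("tile_AS_000", {"size": 0.5, "porosity": 0.3}, "size")
```

With 11 steps this reproduces the 0.0-to-1.0 sweep used for Figure 4a; the real pipeline simply swaps in the trained generator and regressor.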
Figure 4
Illustration of the forward explanation. In (a), the effect
of
varying the attribute of a single input image tile is illustrated.
Here we explore the images from different hypothetical lots by varying
the crystal size. The x-axis is the normalized material
attribute values (0–1), which correspond to the relative strength
of the changes, e.g., zero indicates a very small size with respect
to the norm in the given lot. The horizontal dotted line illustrates
the measured peak stress of the given lot. We can also perform similar
aggregated analysis on all image tiles from a specific lot (e.g.,
lot AS) through a box plot[45] as illustrated
in (b).
We can evaluate how each of the
four attributes impacts the peak
stress prediction by applying a similar sensitivity analysis for different
lots by varying the attribute values one at a time. In Figure 5, we illustrate three lots,
with high (lot M), median (lot AS), and low (lot N) peak stress values,
respectively. As shown in the plots, the size and facet attributes
have a pronounced and consistent effect on the prediction output,
which shows that having larger particles in general has a
detrimental impact on the peak stress of the sample, while having
more well-faceted crystals in the samples is beneficial to increasing
the peak stress values. The inverse correlation between size
and peak stress aligns with established materials knowledge such as
the Hall–Petch rule.[46,47] This discovery is not
particularly new to an experienced materials scientist, but we note
that this prior knowledge was not explicitly encoded in our models
in any way. Our method rediscovered it automatically through explaining
the forward predictive model. Our analysis also highlights that there is no clear trend for either the porosity or the polydispersity attribute; their effects diverge depending on the selected lot. The divergence
in these attributes shows that there is more than one single
pathway to achieve a particular peak stress value, since
the exemplar cases of M, AS, and N lots all have very different original
attributes. The ability to modify different attributes of a given
microstructure image for either increased or decreased performance
provides powerful visualization cues to the subject matter experts
while also informing them about which knobs should be tuned (and their
sensitivities) to achieve the desired performance.
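Comparing how strongly each attribute moves the prediction, as done across lots above, can be summarized by the spread of each one-at-a-time sweep; the surrogate predictor and its weights below are invented for illustration:

```python
def predict_stress(attrs):
    # Invented surrogate regressor: size hurts peak stress, facetness
    # helps, porosity barely registers (weights are illustrative only).
    return (1000.0 - 350.0 * attrs["size"]
            + 150.0 * attrs["facetness"] + 10.0 * attrs["porosity"])

def sensitivity(base, knob, steps=11):
    """Spread of predictions as one knob sweeps [0, 1], others fixed."""
    preds = [predict_stress(dict(base, **{knob: i / (steps - 1)}))
             for i in range(steps)]
    return max(preds) - min(preds)

base = {"size": 0.5, "facetness": 0.5, "porosity": 0.5}
ranking = sorted(base, key=lambda k: sensitivity(base, k), reverse=True)
```

Attributes with a large spread (here, size) are the knobs worth tuning first; flat curves indicate attributes whose effect is weak or lot-dependent.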
Figure 5
Forward explanations
illustrating the sensitivity of the peak stress
with respect to varying attribute values for different lots.
Optimizing Material Attributes for the Target
Material Property
Despite the effectiveness of a “forward”
explanation
approach, i.e., explaining a model through the sensitivity of a given
attribute of the material, it is extremely valuable to answer retrospective
questions that can provide precise insights for improving the performance
of a certain lot, e.g., what specific changes should be made to the
attributes of a given lot to increase its peak stress? Such questions
are often framed as an inverse material design problem,[48,49] in which we try to guide the design of the material given the desired
specification. Such an analysis not only facilitates a direct way
for examining the effects of simultaneous modification of multiple
attributes but also produces a guidance for the synthesis process
for achieving a more desirable material property.

To address
this challenge, we proposed a “backward” explanation
scheme, in which an optimization is performed to obtain the necessary
changes to the input attributes for producing the desired material
property, i.e., peak stress. Let us define the generative editing model as G(I; A), where I is the original image and A = {a1, ..., an} are the material attributes that control the editing. Given an SEM image I for which the regressor R predicts a certain peak stress, we aim to identify attributes A′ with minimal deviation from A such that the edited image I(A′) = G(I; A′) would lead to a higher/lower peak stress prediction p. Given an image I with corresponding image attribute vector A and a target peak stress p, we formulate the backward explanation problem as

$$ \min_{A'} \; \lVert A' - A \rVert_q \quad \text{subject to} \quad R(G(I; A')) = p $$

The neural network models make this formulation nonlinear and nonconvex, which makes it difficult to solve in its original form. Thus, we formulate a relaxed version of the problem that can be solved efficiently:

$$ \min_{A'} \; \mathrm{MSE}\!\left(R(G(I; A')),\, p\right) + \lambda \lVert A' - A \rVert_q $$

where the mean-squared error (MSE) loss encourages the predicted peak stress to be closer to the target peak stress p. Further, to obtain a sparser (i.e., more understandable) explanation, we set q = 1 in the regularization term. Since both the regressor R and the generator G are differentiable, we can compute the gradient of the objective function via back-propagation and solve the optimization
problem using a gradient descent algorithm.

Despite the model’s
aim to predict the peak stress of the
given lot, the prediction itself is made based on a single image tile
(each lot contains a large number of image tiles; please refer to Method section for details of the SEM image acquisition
process). As a result, it is imperative to look beyond the behavior
of individual prediction and examine the average behavior of all image
tiles of a specific lot. The same applies to the model explanation,
in which we can obtain a more comprehensive understanding of the behavior
of the lot by averaging or aggregating its explanation.

In Figure 6, the
original image and modified image for increasing and decreasing predicted
peak stress (based on the changes in the attributes) are shown. In
the top row (lot N), we can see in both SEM images (left) and the
attribute bar plot (right) that decreasing the crystal size while increasing the porosity, polydispersity, and facetness will lead to a higher peak stress prediction. The same pattern can be observed for
lot AT (midrow). Although the bottom row (lot F) shows a slight deviation
compared to earlier patterns on porosity, the small absolute value
indicates that the change in porosity does not contribute much to
the changes in the generated image. One thing to note is that the
increase of the facetness attributes in the image generation process
seems to also lead to a marked increase in polydispersity and a reduction
of average size, so the effect we observe for altering facetness is
likely also due to the changes in the size attribute.
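The relaxed backward optimization — gradient descent on an MSE term toward the target stress plus an L1 penalty on the attribute change — can be sketched with a made-up differentiable surrogate standing in for the composed model R(G(I; ·)); in the real pipeline the gradient instead comes from back-propagating through the CNN regressor and the attGAN generator.

```python
# Invented linear surrogate for R(G(I; A')); the weights only mimic the
# qualitative trends reported (size hurts peak stress, facetness helps).
W = {"size": -400.0, "porosity": 20.0, "polydispersity": 30.0,
     "facetness": 150.0}

def predict(attrs):
    return 1000.0 + sum(W[k] * v for k, v in attrs.items())

def backward_explain(attrs, target, lam=1.0, lr=1e-6, steps=20000):
    """Minimize MSE(predict(A'), target) + lam * ||A' - A||_1 over A'."""
    a = dict(attrs)
    for _ in range(steps):
        err = predict(a) - target
        for k in a:
            grad = 2.0 * err * W[k]                      # MSE term
            d = a[k] - attrs[k]                          # L1 subgradient
            grad += lam * ((d > 0) - (d < 0))
            a[k] = min(1.0, max(0.0, a[k] - lr * grad))  # stay in [0, 1]
    return a

orig = {"size": 0.8, "porosity": 0.5, "polydispersity": 0.5,
        "facetness": 0.4}
# Ask for a lot whose predicted peak stress is 100 units higher.
new = backward_explain(orig, target=predict(orig) + 100.0)
```

With this surrogate, the optimizer mainly lowers the size attribute while slightly raising facetness, and the q = 1 penalty drives the weakly contributing attributes back toward their original values, mirroring the sparsity effect discussed above.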
Figure 6
Backward actionable explanation
for a single SEM image. The images
from the original and hypothetical lot (through GAN-based image manipulation)
are shown on the left, and the corresponding attribute changes that
led to an increase or decrease of the predicted peak stress are illustrated
in the bar chart on the right.
We also estimate the overall behavior of the entire lot by averaging
the backward explanation of all image tiles from a given lot. As
shown in Figure 7,
we can utilize a similar attribute change plot to illustrate the averaging
behaviors by showing the mean values. The plots show that the rule
we identified by examining a backward explanation of individual images
is consistent with the average behavior of the entire lot. The polydispersity,
size, and facetness attributes behave consistently in increasing/decreasing the peak stress. Since both polydispersity and size attributes are
directly associated with the mean and variance of the crystal size, the reduction of size appears to be the most effective route to increase
the peak stress prediction.
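The lot-level averaging described above is a simple reduction over the per-tile results; `per_image_deltas` is a hypothetical (n_tiles, n_attributes) array of backward-explanation attribute changes:

```python
import numpy as np

def lot_level_explanation(per_image_deltas):
    """Average per-tile attribute changes into one lot-level explanation
    (the mean values shown in the lot-level attribute change plot)."""
    deltas = np.asarray(per_image_deltas, dtype=float)
    return deltas.mean(axis=0)
```

The standard deviation over tiles could be reported alongside the mean to convey within-lot variability.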
Figure 7
Backward explanation results for the given lots, obtained by averaging
the per-image explanations.
Experimental Validation
of the Explanation Results
As discussed before, the explanations
generated by the proposed method,
especially the impact of particle size, align well with general materials
science domain knowledge.[46,47] Nevertheless, it remains
valuable to verify the explanations with physical experiments for
our specific TATB materials. We would like to experimentally verify
how the peak stress performance of a given lot changes following the
attGAN-generated guidelines. Analyses in the previous section show
that the particle size is a dominating factor in determining the material
peak stress, especially for the lots with lower peak stress values.
An example is given in Figure , which shows that altering particle size is expected to drastically
change the peak stress value of lot N. Note that lot N has the lowest
peak stress of 819.81 psi among all 30 available material lots.
We aim to experimentally verify the attGAN explanations by reducing
the particle size of lot N and checking its peak stress change. The
particle size reduction was achieved via a grinding step, during which
large particles break and the average particle size reduces. The grinding
process was carried out using a jar mill with a 000-size (0.3 L nominal
volume) ceramic jar filled with 25 g of lot N powder, 50 g of water,
and 30 pcs of a half-inch diameter ceramic cylinder. The samples were
ground for 24 h before being filtered, collected, and dried to a constant
mass. After the grinding step, we reexamined the sample microstructure
by taking new SEM images and performed new mechanical tests to measure
the peak stress of the newly ground samples.
The microstructure of the newly ground sample is shown on the left
side of Figure . We
see that the grinding process is not perfect. The microstructure of
the ground sample is not identical to that generated by the attGAN
model, and size was not the only microstructural characteristic that
changed after grinding. However, grinding is the best experimental
approach currently available to us, and we do see a significant reduction
in sample particle size. The uniaxial compressive stress–strain
curve of the newly ground sample is shown on the right side of Figure ; we observe a significant
upward shift of the stress–strain curve and a substantial gain
in the sample peak stress after grinding. The grinding operation alters
the peak stress of lot N from 819.81 to 2361.09 psi or from the lowest
peak stress of the 30 lots to surpassing the highest peak stress of
the 30 lots (lot AM, 2300.30 psi).
Figure 8
By grinding the lot with the lowest peak
stress, we are able to
obtain a sample that surpasses all original lots in terms of the peak
stress performance. The experimental result aligned with both the
explanation suggestion of a given material lot and the common wisdom.[50]
Though the grinding experiment
is not perfect, it provides experimental
support for the explanation generated by our attGAN model: smaller
particle size leads to higher peak stress for the TATB material. We
also checked the peak stress prediction of the newly ground lot N using
our previously trained forward model. The forward prediction of the
newly ground lot N is around 1600 psi, which is in the right direction
(much higher than the 819.81 psi of the unground lot N) but not accurate
enough (less than the ground truth experimental result of 2361.09
psi). This discrepancy between the predicted and the true peak stress
is probably related to the fact that the grinding process might have
introduced microstructure features that did not exist in the original
training set. From a materials science perspective, it is not surprising
that the ground (thus deformed) TATB crystals behave differently than
the undeformed TATB crystals.[51,52] In machine learning
terms, these changes will lead to a distribution shift that would
impact the prediction accuracy.
Discussion
The
power of the proposed technique lies not only in its ability
to provide fine-grained (i.e., per-lot and per-image)
explanations of the material behavior that are directly linked to
human-understandable guidelines but also in its capability to visualize
the hypothetical target lot through synthesizing SEM images with given
material attributes. In this particular materials science application,
compared to the conventional wisdom, our method not only explicitly
confirms the impact of the crystal sizes but also produces a realistic
depiction of the appearance of the material for which the given predictive
model would predict to have a higher peak stress value. As a result,
the method enables a rapid and multifaceted analysis: from a single
instance of the prediction to the averaging behavior of the entire
lot and from the sensitivity of a single attribute to the joint influence
of multiple ones for achieving an optimal objective value.
Our
approach is illustrated with a real-world materials optimization
application. By examination of the model explanations, a well-established
scientific insight (inverse correlation between size and sample strength)
was rediscovered. We note that the approach itself is general and
can be valuable in emerging applications where scientists have not
yet formed deep scientific insights by serving as a computation tool
to provide intuitive explanations without extensive experiments. Moreover,
since we obtain the explanations through controlling the attribute-aware
variation in the input data, compared to many state-of-the-art explanation
techniques, the proposed technique is not restricted to a specific
model and can be adapted to understand or compare the behavior of
different models.
As with any newly developed technique, our approach has some shortcomings
that need future improvement. One particular challenge originates
from the potential distribution shift from the original images to
the reconstructed images (when we generate new image tiles using the
attributes associated with the corresponding lot). Even though a human
viewer often cannot discern any noticeable difference between the
original images and reconstructed ones, these unnoticeable changes
can lead to minor prediction shifts from the original ones. Moreover,
due to the way the regression model is built, we predict the peak
stress for the given lot based on a single SEM image tile, which leads
to built-in variation among predicted values generated from different
image tiles from the same lot. We are currently exploring other approaches
to build more robust regression models that capture the overall sample
quality from limited data (i.e., data-efficient model design[53]), a common obstacle in applying machine learning
to scientific data.
For domain scientists to adopt emerging
machine learning techniques,
domain-specific explanations of how machine learning models function
are essential. Without tangible and actionable information from machine
learning models, the overall scientific benefit that machine learning
can bring is limited. This work serves as a proof that it is possible
to directly reason about meaningful domain concepts and extract domain
insights in a complex machine learning pipeline. We believe that it
is an important step toward the credible use of machine learning in
materials science applications that can potentially lead to new scientific
discoveries.
Method
SEM Image Data Acquisition
Our experiment
involves
30 different synthesis batches (referred to as lots in the context
of this work) of material samples, with each batch showing different
overall crystal characteristics. Each of the 30 lots is analyzed with
a Zeiss Sigma HD VP scanning electron microscope (SEM) using a 30
μm aperture, 2.00 keV beam energy, and ca. 5.1 mm working distance
to capture high-resolution images. The software Atlas is used to automate
the image collection. As illustrated in Figure , for each sample, the entire SEM stub surface
is mapped, and corresponding images are collected with slight overlap
to create a stitched mosaic of the full area. The field of each mosaic
tile is 256.19 μm × 256.19 μm with a pixel size of
256.19 nm × 256.19 nm (1024 × 1024 image size). In total,
we captured 69 894 sample images from 30 lots of TATB. These
images are then downselected by removing the ones with black margins
(i.e., at the edge of the scan) and other inconsistencies to ensure
the quality of the training and validation sets, which consist of
59 690 images with a resolution of 256 × 256 (downsampled
from 1024 × 1024 in the original image).
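The downselection and downsampling step described above can be sketched as follows; the black-margin threshold and the box-average downsampling are assumptions for illustration, not the authors' exact procedure:

```python
import numpy as np

def downselect_and_downsample(tile, black_fraction_threshold=0.05):
    """Discard SEM tiles with black margins (edge of the scan) and reduce
    a 1024 x 1024 tile to the 256 x 256 resolution used for training."""
    tile = np.asarray(tile, dtype=float)
    # Tiles dominated by black pixels sit at the scan boundary; drop them.
    if (tile == 0).mean() > black_fraction_threshold:
        return None
    h, w = tile.shape
    f = h // 256
    # Box-average each f x f block down to one output pixel.
    return tile.reshape(256, f, 256, f).mean(axis=(1, 3))
```

Applying this filter across the 69 894 raw tiles would yield the cleaned set of 59 690 training and validation images.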
Material Mechanical Properties
Experiment
The stress
and strain mechanical properties are tested for each lot by uniaxially
pressing duplicate samples from each TATB powder lot in a cylindrical
die at ambient temperature to 0.5 in. diameter by 1 in. height, with
a nominal density of 1.800 g/cm3. Strain-controlled compression
tests were run in duplicate at 23 °C at a ramp rate of 0.0001
s–1 on an MTS Mini-Bionix servohydraulic test system
model 858 with a pair of 0.5 in. gauge length extensometers to collect
strain data. From the obtained stress–strain curve, only the
peak stress values were considered as the outputs of the machine learning
models, resulting in an image data set, in which the same properties
are assigned to all images (tiles) from the same lot.
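The labeling scheme above reduces each lot's stress–strain curve to a single scalar that is shared by all of that lot's tiles; a minimal sketch with illustrative names:

```python
import numpy as np

def peak_stress(stress_curve):
    """Peak stress is the maximum of the measured stress-strain curve;
    this single scalar is the only output kept for the ML models."""
    return float(np.max(stress_curve))

def label_lot_tiles(tiles, stress_curve):
    """Assign the lot-level peak stress to every image tile of that lot,
    producing (tile, target) pairs for regression training."""
    y = peak_stress(stress_curve)
    return [(tile, y) for tile in tiles]
```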
Details of
the Peak Stress Prediction Model
The regression
model architecture for peak stress prediction is based on the Wide
ResNet model,[54] with a total of 28 convolutional
layers and a widening factor of 1, followed by an adaptive average
pooling layer. Since the Wide ResNet model was originally proposed
for classification, we replace the final softmax layer with a fully
connected regression layer with tanh activation to predict continuous
scalar values. Our implementation is based on PyTorch.
We set aside 10% of the training data for
validation, leaving a total of 53 721 training images and 5 969
validation images. All images are preprocessed by subtracting the
mean and dividing by the standard deviation. For data augmentation,
we do horizontal flips. We train the regression model with the mean-squared
error (MSE) loss function and the Adam optimizer[55] with a learning rate of 0.001 and a minibatch size of 64.
We used early stopping to terminate training when the validation performance
did not improve; the whole training procedure stopped after 48 epochs.
Globally, the regression model achieved a root-mean-square error
(RMSE) of 66.0 and a mean absolute percentage error (MAPE) of 3.07%
across all lots. For each lot, the peak stress predictions versus
the ground-truth peak stress values are shown in Figure , where the error bars present
the standard deviation of predictions across images in the lot. The
root-mean-square error per lot is plotted in Figure .
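The regression head described above (adaptive average pooling followed by a fully connected layer with tanh) can be sketched in PyTorch. Here `backbone` stands in for the Wide ResNet feature extractor (28 convolutional layers, widening factor 1), and the tanh output assumes peak-stress targets normalized into (-1, 1):

```python
import torch
import torch.nn as nn

class PeakStressHead(nn.Module):
    """Regression head replacing the original softmax classifier:
    adaptive average pooling, then a fully connected layer with tanh,
    producing one scalar prediction per SEM tile."""

    def __init__(self, backbone, feat_channels):
        super().__init__()
        self.backbone = backbone
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(feat_channels, 1)

    def forward(self, x):
        h = self.pool(self.backbone(x)).flatten(1)  # (N, feat_channels)
        return torch.tanh(self.fc(h)).squeeze(1)    # (N,) in (-1, 1)
```

In the paper's setup, such a model is trained with the MSE loss and the Adam optimizer (learning rate 0.001, minibatch size 64) with early stopping on the validation set.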
Figure 9
Predicted and ground-truth values of peak stress
for different
lots.
Figure 10
RMSE of the predicted peak stress values
for different lots.
Detail of the Generative
Adversarial Network (GAN) Model
The generative adversarial
network (GAN)[33] has revolutionized our
ability to generate incredibly realistic
samples from highly complex distributions.[56,57] In general, a GAN transforms noise vectors (Z vectors
from a high-dimensional latent space) into synthetic samples I, resembling data in the training set. The GAN is learned
in an adversarial manner, in which a discriminator D(I) (differentiate real vs fake samples) and a generator G(Z) (produce realistic fake samples) are trained
together to compete with each other. One limitation of the standard
GAN is that the latent space is not immediately understandable, which
limits our ability to control the generated content. This problem
is partially addressed by conditional GAN that is conditioned on the
labels,[58] i.e., generate different types
of images by providing both a noise vector Z and a label L. Still, these models, like most GANs, are often extremely
hard to train and require a large number of samples for even moderately
complex data. Our initial attempts to apply conditional GAN on our
SEM image data with the lot indices or other properties as labels
were met with mixed results: the GAN appears to produce
rather realistic-looking SEM images, but the generated images do not
reflect the corresponding labels on which they are conditioned. This
is likely due to an insufficient amount of images and labels as well
as the innate complexity of the SEM image data. To mitigate the training
challenge and to improve the control over generated contents, we turn
our focus to another class of GANs that make selective modifications
to existing images rather than generating them from scratch (i.e.,
transform a vector into the images). Instead of providing a noise
vector to the generator, these image editing GANs (e.g., attGAN[44]) take an input image along with the attributes A that describe the desirable changes (G(I, A)). For face images, such a GAN can be trained
to alter attributes, such as the color of the hair or the presence
of eyewear in the original image.
The ability to modify SEM
images is achieved by training an attGAN[44] with the material attribute labels provided by materials scientists
(the continuous attribute values between 0 and 1 are converted into
binary labels for training the attGAN). Compared to the training setup
for the celebrity image data set, the largest difference with SEM
images is the number of available labels. For the celebrity images,
labels are obtained for each individual face image. However, this is not
the case for the SEM images, for which we only have one label per
lot, each containing a large number of images. We trained the attGAN
utilizing a PyTorch implementation with the same learning rate as
for the celebrity data set. The output image of the GAN is the same
as the input with a resolution of 256 × 256. We trained the model
for 70 epochs (around 40 h on an NVIDIA T4 GPU). The training cutoff
is determined by examining the sample results during training, where
additional epochs do not appear to improve the visual fidelity of
the generated modification. Despite the relatively lengthy time to
train the generative model, this process incurs only a one-time cost.
The explanation method can then leverage the trained attGAN generative
model to produce an explanation in a timely fashion; e.g., it takes
1 min or less to produce an explanation of a given forward prediction.
The training details for the GAN are shown in Figure . The attGAN jointly trains
the generator (which contains both the encoder and decoder), the discriminator,
and the classifier for predicting attributes. Please refer to the
original work[44] for more details about
the model architecture. In the figure, (a) illustrates the generator’s reconstruction
error. As the error decreases, the generator is able to produce more
realistic SEM images. (b) shows the discriminator’s adversarial
loss. The loss increases over training iterations, indicating that the
generator is producing more realistic images, in turn fooling the
discriminator. (c) and (d) show the classification loss for predicting
the attribute labels from both the original (part of the overall training
loss for the discriminator) and the generated (part of the overall
training loss for the generator) SEM images. The decreasing and then
stabilizing behavior of the classifier losses indicates the jointly
trained classifier can accurately predict the attributes for both
the real and fake (generated) images. This also implies that the generator
can produce realistic modifications that can be correctly classified
by the same classifier that is predicting attributes from the real
images correctly. Moreover, the low and stable reconstruction loss indicates
that the generator can reproduce realistic-looking SEM images. These combined
observations provide evidence to support our claim that the GAN is
trained well and the quality of the image editing process is good
as showcased by the classifier performance on generated images.
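Once trained, the attGAN generator is used for image editing by supplying a real tile together with a modified attribute vector. A minimal sketch, where `generator` is a hypothetical callable standing in for the trained G(I, A) and attributes follow the binarized labels used for training:

```python
import torch

def edit_tile(generator, tile, attrs, changes):
    """Produce a hypothetical SEM tile by editing selected attributes.

    `changes` maps attribute index -> new binary label, e.g. {size_idx: 0.0}
    to request smaller crystals while leaving other attributes untouched."""
    target = attrs.clone()
    for idx, value in changes.items():
        target[:, idx] = value
    with torch.no_grad():  # inference only; no gradients needed
        return generator(tile, target)
```

This is the operation behind the "hypothetical lot" images: the original tile is kept fixed while one or more attribute knobs are flipped.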
Figure 11
Performance
curves for attGAN training. (a) The generator’s
reconstruction loss. As the error decreases, the generator is able
to produce more realistic SEM images. (b) The discriminator’s
adversarial loss, which increases over training iterations as the
generator is producing more realistic images, in turn fooling the
discriminator. (c) and (d) The classification loss for predicting
the attribute labels for both the original (part of the overall training
loss for the discriminator) and the generated (part of the overall
training loss for the generator) SEM images.