Christian M. Heil (1), Anvay Patil (2), Ali Dhinojwala (2), Arthi Jayaraman (1,3)
1. Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Newark, Delaware 19716, United States.
2. School of Polymer Science and Polymer Engineering, The University of Akron, 170 University Avenue, Akron, Ohio 44325, United States.
3. Department of Materials Science and Engineering, University of Delaware, 201 DuPont Hall, Newark, Delaware 19716, United States.
Abstract
We present a new open-source, machine learning (ML) enhanced computational method for experimentalists to quickly analyze high-throughput small-angle scattering results from multicomponent nanoparticle mixtures and solutions at varying compositions and concentrations to obtain reconstructed 3D structures of the sample. This new method is an improvement over our original computational reverse-engineering analysis for scattering experiments (CREASE) method (ACS Materials Au 2021, 1 (2), 140-156), which takes as input the experimental scattering profiles and outputs a 3D visualization and structural characterization (e.g., real space pair-correlation functions, domain sizes, and extent of mixing in binary nanoparticle mixtures) of the nanoparticle mixtures. The new gene-based CREASE method reduces the computational running time by >95% as compared to the original CREASE and performs better in scenarios where the original CREASE method performed poorly. Furthermore, the ML model linking features of nanoparticle solutions (e.g., concentration, nanoparticles' tendency to aggregate) to a computed scattering profile is generic enough to analyze scattering profiles for nanoparticle solutions at conditions (nanoparticle chemistry and size) beyond those that were used for the ML training. Finally, we demonstrate application of this new gene-based CREASE method for analysis of small-angle X-ray scattering results from a nanoparticle solution with unknown nanoparticle aggregation and small-angle neutron scattering results from a binary nanoparticle assembly with unknown mixing/segregation among the nanoparticles.
Directed
or self-assembly of nanoparticles in solution or near
surfaces/interfaces is used to engineer functional materials for optical,
biomedical, catalytic, and electronic applications.[1−5] For example, assembly of nanoparticles in thin film
or droplets (e.g., in emulsion assembly) into ordered three-dimensional
(3D) nanostructures[6−12] is used in photonic applications where the noniridescent structural
color can be adjusted by tuning the nanoparticle size and structural
order.[11,13−16] In these applications, the assembled
structure directly affects the resulting macroscopic properties, making
structural characterization an important step during development of
such functional materials.

Assembled nanoparticle structures
are commonly characterized using
microscopy [e.g., transmission electron microscopy (TEM) and scanning
electron microscopy (SEM)] and small-angle scattering (SAS) techniques.[17−25] Although microscopy enables direct imaging of the nanoparticles,
the 2D images may probe a limited sample area and/or be limited by
instrument resolution.[19,20,26] More advanced microscopy techniques that can possibly obtain a 3D
representation of the sample often require extensive data collection
time and substantial data processing and analysis while still probing
limited length scales compared to SAS techniques.[27−29] Small-angle
neutron and X-ray scattering (SANS and SAXS) are used to analyze the
assembled structure with spatial resolution ranging from the nm to
μm scale and present ensemble averaged structural information.[26] The output from the SANS or SAXS measurement,
the scattering intensity as a function of the scattering wavevector I(q) vs q, includes the
nanoparticles’ form factor (i.e., size and shape) and the nanoparticles’
structure factor (i.e., spatial correlation).[18] Interpretation of I(q) vs q from dilute nanoparticle dispersions[30,31] (i.e., nonaggregating) has relied on analytical structure factor
models,[26,32] such as hard sphere[33,34] and sticky hard sphere.[35,36] However, the models
developed assuming fluid suspensions do not perform well at high packing
fractions above 0.4.[37,38] For aggregating particles, the
choice of structure factor model dictates the type of structural information
one can obtain (e.g., aggregation numbers and aggregates’ radii
of gyration).[39]

The above limitations
of existing analytical models motivate the
development of a computational method that is more broadly applicable
without requiring the choice of a specific analytical model to fit
the scattering data or significant a priori characterization.
Further, it would be ideal if the computational method also provides
a representative 3D structural reconstruction that could then be used
as an input for other calculations (e.g., finite-difference time-domain
method for optical properties[40] and resistor
network model calculation for electrical conductivity[41]).

Reverse Monte Carlo (RMC) simulation is a well-known
computational
method to analyze scattering data from nanoparticle dispersions. In
RMC simulations, the random particle moves are rejected or accepted
based on how much the particle move impacts the match between computed
and target (e.g., experimental) scattering intensity profiles.[42] With increasing number of particle moves, the
structure progressively migrates toward one with a closer match between
the computed and target scattering intensity profile. RMC simulations
are commonly used for nanoparticle solutions at low particle packing
fraction[43−45] because the particle move acceptance rate is low
at higher packing fraction.[46] MONSA is
an algorithm shown to identify various polymer-grafted nanoparticle
morphologies using an RMC-based approach to reconstruct the structure
by fitting a scattering profile curve.[25] However, most RMC-related methods are limited by their reliance
on a sequential, incrementally evolving structure that can require
a significant number of accepted particle moves (computational
time) to converge to the optimal structure (assuming no early convergence
to a local minimum).

To overcome shortcomings in some of the
above RMC-related methods,
we developed our original analysis method called computational reverse-engineering
analysis for scattering experiment (CREASE) using a genetic algorithm
(GA)-based optimization that sampled many configurations simultaneously
in parallel simulations and iterated toward the “optimal”
structure(s) whose computed scattering profile(s) closely matched
the target scattering profile. Besides successful application to analyzing
structure in assembled polymer solutions,[47−51] this CREASE method also worked well for analyzing
scattering from polydisperse nanoparticles at high
nanoparticle packing fraction (near the close-packed limit) for varying
particle size dispersity, mixture composition, and chemical interactions
in binary nanoparticle mixtures.[52] In the
case of nanoparticle systems, the key drawback of the original CREASE
method was the representation of every individual as a 3D configuration
(i.e., each individual stores the coordinates of all particles) and
the creation of new individual configurations being dependent on stochastic
particle swaps, much like RMC methods. These drawbacks led to the
method being computationally slow and manifested in poor performance
for the case of high packing fraction nanoparticle mixtures with strongly
segregated particle domains.

To improve the original CREASE
method, we draw inspiration from
other implementations of CREASE from our laboratory that have used
gene-based representation of individuals and successfully analyzed
experimental and in silico generated target scattering
profiles of micelles and vesicles.[47,49−51] The micelle or vesicle structure is represented using a set of genes
that include relevant structural dimensions (e.g., micelle diameter,
vesicle core diameter, wall thickness, and dispersity) and parameters
of interest (e.g., micelle aggregation number, ratio of molecules
in one domain versus another). Using this idea of genes to represent
relevant features of the structure, we present here a new gene-based
CREASE method (Figure 1) to analyze the structural arrangement of binary nanoparticle mixtures
and one-component nanoparticle solutions at low and high packing fractions.
The switch to a gene-based description of an individual from the cumbersome
3D configurations used before[52] enables
faster optimization toward the best structure whose computed scattering
matches the target scattering. Furthermore, the gene-based CREASE
also facilitates the incorporation of machine learning (ML) that directly
links the genes defining the structure to a scattering intensity profile,
bypassing the need to create a 3D structure with nanoparticles to
calculate the computed scattering intensity profile using the Debye[53,54] scattering equation.[50]
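To make concrete why the explicit scattering calculation is costly, the following is a minimal Python sketch of a Debye-type pairwise sum for homogeneous spheres. It is an illustration under stated assumptions, not the CREASE implementation: the function names `sphere_amplitude` and `debye_intensity`, the use of sphere volume as the scattering weight, and the restriction to q > 0 are all choices made here for the example.

```python
import numpy as np

def sphere_amplitude(q, radius):
    """Normalized form-factor amplitude of a homogeneous sphere (valid for q > 0)."""
    x = q * radius
    return 3.0 * (np.sin(x) - x * np.cos(x)) / x**3

def debye_intensity(q_values, positions, radii):
    """Debye-type double sum over all particle pairs.

    Assumption for this sketch: each sphere scatters with amplitude
    (volume x sphere form-factor amplitude); contrast is set to 1.
    """
    positions = np.asarray(positions, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Pair distance matrix r_ij (zeros on the diagonal): this is the O(N^2) part.
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    volumes = 4.0 / 3.0 * np.pi * radii**3
    intensity = np.empty(len(q_values))
    for k, q in enumerate(q_values):
        f = volumes * sphere_amplitude(q, radii)  # per-particle amplitude at this q
        # np.sinc(x) = sin(pi x) / (pi x), so this is sin(q r) / (q r), equal to 1 at r = 0.
        kernel = np.sinc(q * r / np.pi)
        intensity[k] = f @ kernel @ f
    return intensity

# Example: two 220 nm diameter spheres in contact (positions and radii in nm).
q = np.logspace(-3, -1, 50)
i_comp = debye_intensity(q, [[0.0, 0.0, 0.0], [220.0, 0.0, 0.0]], [110.0, 110.0])
```

The cost grows with the square of the number of particles for every q value, which is the step a model that maps genes directly to a scattering profile can skip.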
Figure 1
Overview of the new gene-based
CREASE workflow showing the input,
how the structural information is stored as genes, two options for
calculating the Icomp(q), checking for fitness convergence, population evolution, and finally,
the types of output information.
We first validate our new gene-based CREASE approach using in silico scattering profiles from binary nanoparticle assemblies
(with differing nanoparticle mixing/demixing, composition, and size
dispersity) and one-component nanoparticle solutions (with varying
nanoparticle aggregation, concentration, and size dispersity). Our
incorporation of ML into the gene-based CREASE method significantly
reduces computational time, maintains the performance similar to the
non-ML version of the gene-based CREASE, and is general enough to
analyze scattering profiles from systems beyond those that were used
for ML training. Finally, we apply our gene-based CREASE method on
an in vitro (i.e., from SAXS and SANS experiments)
scattering profile of aggregating nanoparticles where both the extent
of aggregation and nanoparticle size characteristics are unknown and
of a binary nanoparticle assembly with unknown nanoparticle mixing/segregation.

The ML-augmented gene-based CREASE method described in this paper
will be a broadly applicable computational method for experimentalists
to quickly analyze high-throughput scattering results and obtain reconstructed
3D structures of the sample.
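The gene-based optimization loop described above (genes, computed profile, fitness, population evolution) can be sketched in a few lines of Python. This is a toy illustration only: `toy_profile` is a hypothetical stand-in for the Debye or ML evaluation of Icomp(q), the two genes (diameter and a log intensity scale), the log-intensity error metric, and every GA setting (population size, elitism fraction, mutation scale) are assumptions made for the example and are not taken from the CREASE code.

```python
import numpy as np

def toy_profile(genes, q):
    """Hypothetical stand-in for the I_comp(q) evaluation.

    genes = (sphere diameter in nm, log10 of an intensity scale).
    """
    diameter, log_scale = genes
    x = q * diameter / 2.0
    amp = 3.0 * (np.sin(x) - x * np.cos(x)) / x**3
    return 10.0**log_scale * amp**2 + 1e-6  # small floor keeps log10 finite

def fitness(genes, q, i_target):
    # Higher is better: negative sum of squared log-intensity residuals
    # (an assumed error metric for this sketch).
    resid = np.log10(toy_profile(genes, q)) - np.log10(i_target)
    return -np.sum(resid**2)

def run_ga(q, i_target, bounds, pop_size=40, generations=60,
           mutation_scale=0.05, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    for _ in range(generations):
        scores = np.array([fitness(ind, q, i_target) for ind in pop])
        order = np.argsort(scores)[::-1]
        elite = pop[order[: pop_size // 4]]  # keep the best quarter unchanged
        # Crossover: each child gene is copied from one of two random elite parents.
        idx = rng.integers(len(elite), size=(pop_size - len(elite), 2))
        parents = elite[idx]
        mask = rng.random((len(parents), len(lo))) < 0.5
        children = np.where(mask, parents[:, 0], parents[:, 1])
        # Mutation: Gaussian noise scaled to each gene's range, clipped to bounds.
        children = children + rng.normal(0.0, mutation_scale, children.shape) * (hi - lo)
        pop = np.vstack([elite, np.clip(children, lo, hi)])
    scores = np.array([fitness(ind, q, i_target) for ind in pop])
    return pop[np.argmax(scores)], scores.max()

# Example: recover genes from a synthetic target profile.
q = np.logspace(-3, -1, 60)
i_target = toy_profile((220.0, 0.0), q)
best_genes, best_score = run_ga(q, i_target, [(150.0, 300.0), (-1.0, 1.0)])
```

Because each individual is only a short gene vector, crossover and mutation act on a handful of numbers rather than on the coordinates of every particle, which is the practical appeal of the gene-based representation.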
Results and Discussion
We first
validate the new gene-based CREASE method by showing that
it can produce the correct structure when used to analyze Itarget(q) obtained from in silico experiments (i.e., molecular simulations) where
the target structure corresponding to the Itarget(q) is known. The in silico experiments
contain both binary nanoparticle assemblies and one-component nanoparticle
solutions to illustrate the versatility of the gene-based CREASE method.

The binary (A and B) nanoparticle assemblies are produced using
molecular dynamics simulations with the details of the protocol described
in our earlier work.[12] We consider a range
of systems with A-type volume compositions of 25%v
and 50%v, weak and medium nanoparticle demixing, and nanoparticle
size dispersity of 9% and 20%. We select the specific systems to briefly
illustrate that this gene-based CREASE method outperforms the previous
original CREASE method[52] while requiring
less specific inputs. The one-component nanoparticle solution systems
are produced by placing nanoparticles to achieve a range of nanoparticle
aggregation from disperse to strongly aggregating with nanoparticle
concentrations by volume from 10%v to 50%v and nanoparticle size dispersity
from 10% to 20%. For the one-component nanoparticle solution systems, the target
radial distribution function (RDF) and Itarget(q) are averages over 10 structures with similar
characteristics (e.g., nanoparticle aggregation, concentration, and
size distribution) to incorporate the variability in the structure.
For these in silico systems, we compare the RDF returned
by CREASE against that of the target structure, and we quantify the
RDF match quality by determining the percent error between the CREASE
output structure’s RDF and the target structure’s RDF.

For all systems, we perform three independent gene-based CREASE
runs and compare the average and standard deviation from the three
gene-based CREASE runs against the target structure. All visualizations
are created using the VMD software.[55]

After validation, we apply the new gene-based CREASE method to
analyze Iexp(q) vs q from a SAXS measurement on a dilute, aggregating nanoparticle
solution for which the extent of nanoparticle aggregation is unknown a priori and from SANS measurements on a densely packed mixture
of melanin and silica nanoparticles for which the extent of mixing
between the two particle types is unknown.
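The validation protocol above compares each CREASE output RDF against the target RDF and reports the average and standard deviation over three independent runs. A minimal Python sketch of that bookkeeping follows. The specific percent-error formula (summed absolute deviation normalized by the summed target RDF) is an assumption made here for illustration, since the exact definition is not stated in this section, and the function names are hypothetical.

```python
import numpy as np

def rdf_percent_error(g_crease, g_target):
    """Percent error between two RDFs sampled on the same r grid.

    Assumed definition: 100 * sum|g_crease - g_target| / sum|g_target|.
    """
    g_crease = np.asarray(g_crease, dtype=float)
    g_target = np.asarray(g_target, dtype=float)
    return 100.0 * np.abs(g_crease - g_target).sum() / np.abs(g_target).sum()

def error_over_runs(run_rdfs, g_target):
    """Mean and sample standard deviation of the percent error over independent runs."""
    errors = np.array([rdf_percent_error(g, g_target) for g in run_rdfs])
    return errors.mean(), errors.std(ddof=1)

# Example with made-up RDF values for three runs.
g_target = [0.0, 1.0, 2.0, 1.0]
runs = [[0.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 1.0], [0.0, 1.0, 3.0, 1.0]]
mean_err, std_err = error_over_runs(runs, g_target)
```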
New Gene-Based CREASE Method vs Original CREASE Method for Analysis of I(q) vs q from Binary Nanoparticle Mixtures
We first highlight (briefly)
the key differences between the new gene-based CREASE method and our
original CREASE method.[52] Additional details
on this comparison are provided in the ESI Section II.

In the original CREASE, we described each individual
in the GA as a 3D structure storing each nanoparticle’s identity
and position; the original CREASE method also required the user to
input the exact particle size distributions (defined as the average
particle diameter and dispersity for a log-normal distribution) and
nanoparticle binary mixture composition (in volume%). In contrast,
in the new gene-based CREASE method, we define an individual as a
collection of genes that identify a nanoparticle structure in a lower
dimensional space than the positions of all nanoparticles (see ESI Section I). Further, the new gene-based CREASE
does not require knowing the exact particle size distributions, composition
(for binary mixture at high packing fraction) or concentration (for
one-component nanoparticle solution) and can instead provide these as output.
The gene-based description of individuals enables exploration of the
phase space (degree of mixing/demixing, particle size distributions)
more rapidly than the 3D configuration-based description of individuals
in the original CREASE. Lastly, the gene-based CREASE can utilize ML
models that link the individual’s genes to a computed I(q) vs q, which significantly
reduces the computational time and makes the method more general.

We begin our comparison between the original and gene-based CREASE
methods using target scattering profiles obtained from in
silico experiments (where we know the structure) as it enables
us to elucidate how well both methods reproduce the known target structure.
To evaluate the performance of the gene-based CREASE method further,
we consider two scenarios:
(1) where the user inputs the exact particle size distribution, but not the composition of the particle mixture, and allows the gene-based CREASE to converge to the optimal composition of the nanoparticle binary mixture (or concentration in case of 1-component nanoparticle solutions) in addition to the optimal structure; and
(2) where the user does not input the particle size distribution or particle mixture composition and lets the gene-based CREASE output the optimal particle size distributions and mixture composition/nanoparticle concentration in addition to the optimal structure.

The first scenario
shows how the gene-based CREASE method outperforms
the original CREASE with similar user input. The second scenario shows
the superiority of the gene-based CREASE in exhibiting excellent performance
even with less user input than the original CREASE method.

Figure 2 shows the
performance of both CREASE methods on Itarget(q) from one specific binary nanoparticle mixture
with 25%v A-type particles assembled within a spherical
confinement. The weak and medium demixing in the target structures
were brought about by modeling pairwise A–A and B–B
interactions using the colloid Lennard-Jones potential[56] with Hamaker constants of 0.25 kBT (weak demixing) or 0.3 kBT (medium demixing) and A–B interactions
using Hamaker constants of 0.2 kBT (weak and medium demixing). Figure 2a,b provides the scattering profiles of the A-type and B-type particles, respectively; in
experiments one would obtain these through variations in contrast
matching.[57] The weak demixing system is
offset in the y-axis direction from the medium demixing
system to show how increasing the particle demixing impacts the scattering
profile shape especially at low to medium q values.
The Icomp(q) of the best
structure from the original CREASE method (red curve) shows notable
deviations from the Itarget(q), while both gene-based CREASE methods’ Icomp(q) more closely match the Itarget(q). This is confirmed
by the higher χ2 value of the original
CREASE’s Icomp(q) than the new gene-based CREASE methods’ Icomp(q) (ESI Figure S1).
Figure 2
Comparison of the original and new gene-based CREASE methods on
scattering profiles from an in silico nanoparticle
mixture assembled in spherical confinement (using the simulation approach
from ref (12)). The
systems considered include binary (A and B) nanoparticle mixtures
with 25%v A-type nanoparticle, weak or medium demixing
of A and B nanoparticles, and an average nanoparticle diameter of
220 nm with 9% particle size dispersity (log-normal distribution)
for both A and B nanoparticles. In all plots, curves from the target
structure are in black, those from the original CREASE method are
in red, those from the gene-based CREASE method with given particle
size distribution as input are in blue, and those from the gene-based
CREASE method without the particle size distribution as input are
in purple. (e,f) Representative 3D renderings of the target structure
and each of the CREASE method results. For all the CREASE methods,
we provide the average and standard deviation from three independent
CREASE runs.
Figure 2c provides
the ensemble averaged real-space structure in the form of pairwise
radial distribution functions (RDFs) for the target structure from
a weakly demixed nanoparticle mixture; representative 3D visualizations
of the target and CREASE structures are in Figure 2e. We quantify the percent error between
the RDFs of the target structure and the optimized structure from
each CREASE method in ESI Figure S1. In
addition to the optimized structures, the gene-based CREASE method
with the particle size distribution known via input also outputs the
optimal composition of the structure as 24.2 ± 2.8%v A-type (the error 2.8% describes the variability between multiple
CREASE runs). The gene-based CREASE with the particle size not known
as input converges to an A-type particle diameter of
220.3 ± 2.4 nm, A-type particle diameter dispersity
of 9.4 ± 1.1%, B-type particle diameter of 217.9
± 3.3 nm, B-type particle diameter dispersity of
10.7 ± 0.7%, and a 27.9 ± 2.6%v A-type composition.
For both scenarios of input (unknown or known information), the gene-based
CREASE method achieves a composition that is quantitatively close
to the composition in the target structure. Strikingly, even when
the particle size distribution is not input to the gene-based CREASE,
the gene-based CREASE converges to a particle size distribution close
to the target values (220 nm diameter and 9% dispersity). However,
if the exact particle size distribution is not input, we expect the
resulting RDF to not match the target RDF as well as when the size
distribution is input. For this weakly demixed target structure, the
original CREASE method and the gene-based CREASE method with the particle
size distribution input achieve similar error (ESI Figure S1). Not surprisingly, the gene-based CREASE method
with the particle size distributions not input has the largest RDF
percent error and also exhibits a minor shift in the primary RDF peak
location because the particle size distribution is not the same as that
of the target structure. It is important to note that the gene-based CREASE
method (regardless of whether the particle size distribution was input
or not) requires only ∼46% of the time needed by the original
CREASE method and less input from the user.

Figure 2d,f shows
similar information as Figure 2c,e but for the medium demixing target structure; how well
the gene-based CREASE method performs with increasing extent of particle
demixing is important to study as our past work showed that the original
CREASE method’s performance decreased with increasing degree
of demixing.[52] Because the nanoparticle
system has an asymmetric composition, the A–A RDF contains
the most noticeable differences in the RDF profiles from increasing
the particle demixing. The gene-based CREASE with the particle size
input achieves a lower RDF percent error than the original CREASE
method, with an especially good match to the target A–A RDF
primary peak, as quantified in ESI Figure S1. Similar to the weakly demixed case, the gene-based CREASE method with
the particle size distribution not input has the largest RDF percent
error. Besides the error resulting from slight differences in particle
size distributions, this increased error reflects the expected
trade-off of providing less user input. In addition to the structure,
the gene-based CREASE with the particle size distribution input converges
to a composition of 25.5 ± 2.0%v A-type particle,
and the gene-based CREASE without the particle size distribution input
converges to an A-type particle diameter of 219.9 ±
3.2 nm, A-type particle diameter dispersity of 9.2 ±
0.4%, B-type particle diameter of 219.8 ± 0.7 nm, B-type particle diameter dispersity of 9.8 ± 0.5%, and
a 26.4 ± 2.9%v A-type composition. While the resulting
structure is qualitatively similar for the two gene-based CREASE methods,
there is a loss in the quantitative match to the target structure
because the gene-based CREASE method with less input has to converge
to more outputs: the particle structure, particle composition, and
particle size distributions. Thus, we suggest users include as much
information as possible for cases where a close quantitative match
is desired.

Figure 3 highlights
how the gene-based CREASE method outperforms the original CREASE method
even with increasing particle size dispersity; controlling dispersity
in particle size during synthesis remains a challenge in experiments.[58,59] In Figure 3a,b,
the Icomp(q) of the best
structure from the original CREASE method shows notable deviations
from the Itarget(q) while
both the gene-based CREASE methods with/without the nanoparticle size
distributions input closely match the Itarget(q), also confirmed by the higher χ2 value of the original CREASE’s Icomp(q) than the new gene-based CREASE methods’ Icomp(q) (ESI Figure S2).
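Since the particle sizes in these systems follow a log-normal distribution specified by an average diameter and a dispersity, a short Python sketch may help show how such diameters could be drawn. It assumes that dispersity means the ratio of standard deviation to mean diameter; that interpretation, and the function names `lognormal_params` and `sample_diameters`, are assumptions of this example rather than definitions taken from the paper.

```python
import numpy as np

def lognormal_params(mean_diameter, dispersity):
    """Convert (mean, std/mean) of the diameters to the (mu, sigma) of ln(diameter).

    Uses the standard log-normal identities:
      mean = exp(mu + sigma^2 / 2),  (std/mean)^2 = exp(sigma^2) - 1.
    """
    sigma2 = np.log(1.0 + dispersity**2)
    mu = np.log(mean_diameter) - 0.5 * sigma2
    return mu, np.sqrt(sigma2)

def sample_diameters(mean_diameter, dispersity, n, seed=0):
    """Draw n particle diameters from the corresponding log-normal distribution."""
    mu, sigma = lognormal_params(mean_diameter, dispersity)
    rng = np.random.default_rng(seed)
    return rng.lognormal(mean=mu, sigma=sigma, size=n)

# Example: 220 nm average diameter at 9% and 20% dispersity.
d_narrow = sample_diameters(220.0, 0.09, 10000)
d_broad = sample_diameters(220.0, 0.20, 10000)
```

In a gene-based description, the average diameter and dispersity can serve as two genes, and a size list like this is regenerated whenever a 3D structure needs to be built.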
Figure 3
Similar to Figure 2, except that this figure focuses on 50%v A-type nanoparticle,
medium demixing of A and B nanoparticles, and 9 or 20% nanoparticle
size dispersity.
For the 9% particle size
dispersity target structure, Figure 3c illustrates that
the gene-based CREASE method with the particle size distributions
input achieves the closest RDF match to the target structures (also
confirmed by the percent error plotted in ESI Figure S2). The gene-based CREASE with the particle size input,
over three independent replicates, converges to a composition of 48.0
± 1.6%v A-type; the gene-based CREASE without the
particle size input converges to an average A-type particle
diameter of 221.7 ± 3.4 nm, A-type particle diameter
dispersity of 9.8 ± 0.8%, average B-type particle
diameter of 220.1 ± 2.2 nm, B-type particle diameter
dispersity of 9.4 ± 1.2%, and a mixture composition of 47.1 ±
5.9%v A-type particle.

For the 20% particle size
dispersity target structure, the original
CREASE method produces a structure with larger RDF percent errors
(ESI Figure S2) than both of the gene-based
CREASE outputs (Figure 3d and ESI Figure S2). Our past work showed that the
original CREASE method performs poorly when the degree of particle
demixing is high, and the increased particle size dispersity worsens
that poor performance.[52] Unsurprisingly,
the structure generated from the gene-based CREASE with the particle
size distributions input results in the lowest RDF percent error.
In addition, the gene-based CREASE with the particle size input converges
to a mixture composition of 48.4 ± 2.5%v A-type;
the gene-based CREASE with the particle size not input converges to
an average A-type particle diameter of 220.1 ± 5.2
nm, A-type particle diameter dispersity of 20.5 ±
2.4%, average B-type particle diameter of 223.6 ±
5.4 nm, B-type particle diameter dispersity of 21.8 ±
1.1%, and a mixture composition of 43.9 ± 2.4%v A-type particle.

ESI Figure S3 provides
additional comparisons
between the original and gene-based CREASE methods for binary nanoparticle
mixtures similar to Figure 3 but with weak particle demixing. Consistent with Figures 2 and 3, the original CREASE method achieves the worst match between
the computed and target scattering profiles (also confirmed with χ2 values in ESI Figure S4). The
original CREASE method (which was shown to perform reasonably well
for weak demixing systems) and gene-based CREASE method with the particle
size distributions input return structures with similar RDF percent
error (ESI Figure S4). The gene-based CREASE
with particle size distributions not input achieves a qualitatively
similar local structure as the target assembly with quantitative differences
largely resulting from the minor differences in particle size distributions
that the gene-based CREASE method converges to.

So far, we have
demonstrated that our new gene-based CREASE method
is a significant improvement over the original CREASE method. It improves
the accuracy of real-space structure determination for binary nanoparticle
mixtures, especially with increasing particle demixing. The gene-based
CREASE method significantly reduces the computational time compared
to the original CREASE method, even without the ML-based enhancement
we describe in the next section. Lastly, the gene-based CREASE method
predicts additional information about the nanoparticle sample (e.g.,
average particle size and dispersity, mixture composition) which would
be greatly useful to users who have limited knowledge about the system
that was used in the scattering experiment. Not surprisingly,
the gene-based CREASE method also performs better quantitatively
if the user provides more input (e.g., particle size information).

In the next section, we expand our validation of the gene-based
CREASE method to another experimentally relevant system of one-component
nanoparticles (e.g., one-component nanoparticle solutions or polymer
nanocomposites with the scattering primarily from the nanoparticles,
as in ref [60]).
Gene-Based CREASE Methods Applied to I(q) vs q from One-Component Nanoparticle Solutions with Varying Nanoparticle Aggregation and Concentration
We select three (one-component) nanoparticle solution concentrations
that span from dilute to concentrated, and for each nanoparticle concentration,
we consider three different nanoparticle degrees of aggregation from
dispersed (nonaggregating) to strongly aggregating. We also demonstrate
incorporation of machine learning (ML), specifically a trained artificial
neural network (ANN), into the gene-based CREASE method to evaluate
the Icomp(q) for each
individual directly from the individual’s genes rather than
using the computationally intensive Icomp(q) calculation with 3D structure creation and pairwise
sums in the Debye equation. The details of how the ANN is trained
are presented in the ESI Section I and Section VII; briefly, the ANN is trained to relate
an individual’s genes and the q value to the
resulting Icomp(q) at
that q value. ESI Figure S5 shows that the ANN-evaluated gene-based CREASE method requires an order of magnitude less time (a ∼95% reduction)
than the Debye-equation-evaluated
gene-based CREASE method.

Next, we discuss the performance of
the ANN-evaluated gene-based CREASE method compared to the explicit
Debye-equation-evaluated gene-based CREASE methods (exclusively utilized
up to this point in Figures 2 and 3).

In Figure 4a–c,
we provide the nanoparticles’ scattering profiles spanning
two decades of q values to include scattering from
low q values (large distances) containing structural
information and high q values (small distances) incorporating
particle size and dispersity data. At intermediate q values (∼0.008–0.02 nm⁻¹), the degree
of nanoparticle aggregation influences the scattering profile shape
with nonaggregating systems exhibiting a plateau I(q) value while aggregating systems have a negative
slope in the I(q) vs q curve. At the lowest q values, we observe the emergence
of minor oscillations in the scattering profiles due to finite system
sizes used for the target structure and the 3D structure constructed
for the converged final genes in the gene-based CREASE method.[61] The finite system size effects can be mitigated
at lower q values by considering larger system sizes
or only utilizing those q values that have minor
to negligible system size effect-driven oscillations. Regardless,
all three gene-based CREASE methods converge to structures with Icomp(q) that closely matches
the Itarget(q). We quantify
the quality of fit of the scattering profile matches in ESI Figure S6 (10%v concentration), ESI Figure S7 (30%v concentration), and ESI Figure S8 (50%v concentration). In those
ESI Figures, we show the converged nanoparticle size distribution
(if not input) and the nanoparticle concentration to demonstrate that
all three gene-based CREASE methods converge to particle size distributions
close to target structures’ values. For the 50%v concentration,
the Debye-equation-evaluated gene-based CREASE methods (blue and purple)
converge to values closer to the target concentration than the ANN-evaluated
gene-based CREASE method.
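To make the cost difference between the two evaluation routes concrete, the sketch below evaluates the Debye scattering equation for a set of point scatterers with a direct double sum over pairs. It is a minimal illustration and not the CREASE implementation: the function name `debye_intensity`, the unit scattering amplitudes, and the neglect of the particle form factor and size dispersity are simplifying assumptions made here. The nested loop scales as N² for every q value, which is the cost that the ANN evaluation avoids.

```python
import math

def debye_intensity(positions, q_values, amplitudes=None):
    """Debye sum I(q) = sum_i sum_j f_i f_j sin(q r_ij) / (q r_ij).

    Hypothetical helper for illustration: point scatterers with
    q-independent amplitudes f_i (defaults to 1 for every scatterer).
    """
    n = len(positions)
    if amplitudes is None:
        amplitudes = [1.0] * n
    intensities = []
    for q in q_values:
        # Self terms (i == j): sin(x)/x -> 1 as r_ij -> 0.
        total = sum(f * f for f in amplitudes)
        # Cross terms: each unordered pair appears twice in the double sum.
        for i in range(n):
            for j in range(i + 1, n):
                x = q * math.dist(positions[i], positions[j])
                sinc = math.sin(x) / x if x > 1e-12 else 1.0
                total += 2.0 * amplitudes[i] * amplitudes[j] * sinc
        intensities.append(total)
    return intensities

# Two scatterers 220 nm apart, evaluated at q values in nm^-1.
pair = [(0.0, 0.0, 0.0), (220.0, 0.0, 0.0)]
profile = debye_intensity(pair, [0.001, 0.01, 0.1])
```

In the low-q limit the sum approaches N² (all scatterers in phase), which gives a quick sanity check on any implementation of this kind.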
Figure 4
Evaluation of the gene-based CREASE method on
scattering profiles
from simulated one-component nanoparticle solutions with varying degrees
of nanoparticle aggregation (disperse, weakly aggregated, and strongly
aggregated) and nanoparticle concentration (10%v, 30%v, and 50%v)
for nanoparticles with average diameter of 220 nm and 10% particle
size dispersity. (a–c) I(q) vs q for varying nanoparticle concentration (10%v,
30%v, and 50%v); shifted curves represent varying degrees of aggregation.
(d–f) Nanoparticle radial distribution functions (RDFs). In
all parts, the curve for the target structure is in black, the gene-based
CREASE method with the particle size distribution input is in blue,
the gene-based CREASE method without the particle size distribution
input is in purple, and the ANN-evaluated gene-based CREASE method
is in green. The inset images in (d–f) show representative
renderings of the target structure and an output structure from the
gene-based CREASE method with the nanoparticle size distribution input.
For all the CREASE methods, we provide the average and standard deviation
from three independent CREASE runs.
In Figure d–f,
we find that the Debye-equation-evaluated gene-based CREASE methods
converge to structures that have local ordering similar to that of the target
structure. We quantify the RDF percent error in ESI Figures S6–S8 (10%–50%v concentration) to
confirm the quantitative differences. The poorer match to the target RDF
for the ANN-evaluated method compared to the Debye-equation-evaluated gene-based
methods is not surprising, as the ANN-interpolated scattering calculation
is not as precise as the explicit Debye scattering equation, and the
converged value of the nanoparticle concentration is also mildly incorrect.
However, despite these shortcomings, the ANN-evaluated method returns a structure with a qualitatively similar degree of aggregation to the target (dispersed
for a dispersed target or strongly aggregated for a strongly aggregated
target) with significantly reduced computational time as compared
to the Debye-equation-evaluated methods.

We now validate our gene-based
CREASE method on target structures
with a higher nanoparticle size dispersity of 20% in Figure . The ANN used in Figure was trained with
data from systems with smaller nanoparticle size dispersity (7–13%),
so a new ANN is needed for systems with larger size dispersity. We
consider two approaches, one where the new ANN is trained solely on
data with size dispersity of 17–23% and another where the ANN
is trained on data with a broader range of dispersity (both the 7–13%
and 17–23% data). The results in Figure describe the performance of the former ANN
models and in ESI Figure S11 the performance
of the latter.
Figure 5
Similar to Figure , except this figure is focused on 20% particle size dispersity
and
10%v and 50%v nanoparticle concentrations.
At 20% particle size dispersity (Figure a,b) we observe the expected trend of less
prominent features in the scattering profiles especially at high q values than at 10% particle size dispersity; having less
prominent features in the scattering profile is a challenge when analyzing
scattering results of increasingly polydisperse systems. All three
gene-based CREASE methods converge to Icomp(q) that closely matches the Itarget(q). We quantify the fit of the scattering
matches as well as the converged values of nanoparticle size distribution
and nanoparticle concentration for all three gene-based CREASE methods
in ESI Figure S9 (10%v system) and S10 (50%v system). As expected, the increased
nanoparticle size dispersity reduces the ability of the gene-based
CREASE methods to achieve the exact target nanoparticle size distributions
and concentrations; the converged particle size distributions and
concentrations are qualitatively near the target values but not precise
matches. For the average nanoparticle size, the CREASE methods without
the nanoparticle size input (purple and green) converge to a value
within a few nanometers of the target value (within ∼3% error).
Interestingly, those two gene-based CREASE methods excel at determining
the target nanoparticle size dispersity (within a standard deviation
from the average dispersity returned). In ESI Figures S9 and S10, all three gene-based CREASE methods converge
to concentration values with larger errors than for the systems with
10% nanoparticle size dispersity (ESI Figures S6–S8).

The nanoparticle RDFs in Figure c,d show that all three gene-based
CREASE methods
achieve excellent (blue and purple curves) to reasonable (green curve)
agreement for both 10%v and 50%v; particularly for the 50%v concentration
systems, the ANN-evaluated gene-based CREASE method also performs
well. Of the two Debye-equation-evaluated gene-based CREASE methods,
the one that takes the nanoparticle size distribution as input
converges to a closer match with the target structure RDF; without that input,
the method converges to a slightly different nanoparticle size distribution.
The ANN-evaluated gene-based CREASE method converges to structures
with similar average RDF values as the target RDF; however, as the
nanoparticle size dispersity increases there is larger variability
in the output RDF, shown with larger error bars. We quantify the RDF
percent error for the 10%v and 50%v systems in ESI Figures S9 and S10, respectively. Overall, we observe
a similar performance of the three gene-based CREASE methods for target
structures at 20% nanoparticle size dispersity as we saw at the 10%
nanoparticle size dispersity in Figure .

In ESI Figure S11 we show the performance of the ANN-evaluated
gene-based CREASE when the ANN is trained using all training data
generated for the two ANNs used in Figures and 5. The Icomp(q) and Itarget(q) agree, with the match quantified
in ESI Figures S12 (10%v system) and S13 (50%v system). The ANN trained over all training
data does not converge to closer values for the nanoparticle size
distribution or concentrations than the dispersity-specific ANNs. Overall,
the RDF matches are of similar quality between the two ANNs because
one ANN will achieve a closer match for one system while the other
ANN will perform better for a different system. Thus, while training
a single ANN over a wider dispersity range does not improve performance,
it does not reduce accuracy, and it enables a user to rely on a single
ANN over a much wider size dispersity range.

In Figures and 5, we demonstrate how the gene-based CREASE method
can utilize a pretrained ANN to output structures significantly faster
than using the Debye scattering equation. Regardless of the nature
of the data used for the ANN training, there is a computational cost
to generate the data to train the ANN (ESI Figure S5). To generalize the ANN for various particle sizes, during
the ANN training, we normalize the q value input
by 2π divided by the nanoparticle average diameter gene instead
of using the training data specific values. This normalization forces
the ANN to learn the Icomp(q) value as a function of the normalized q values
ranging from less than 1 to greater than 1, making the trained ANNs
nanoparticle size agnostic. Therefore, a single trained
ANN can be applied to target systems with different nanoparticle sizes
without the need to obtain new data and train a new ANN. In Figure , we demonstrate
how the ANN previously trained on data from 220 ± 10 nm diameter
particles with 10 ± 3% size dispersity (used in Figure ) can be directly applied to
a target system with 130 nm diameter particles with 10% size dispersity.
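The normalization described above can be written as a simple conversion between raw and dimensionless q. The sketch below is illustrative only: the function names `normalize_q` and `denormalize_q` are not taken from the CREASE code base, and the `mean_diameter` argument stands in for the average-diameter gene.

```python
import math

def normalize_q(q, mean_diameter):
    """Return the dimensionless q / (2*pi / D) for q in nm^-1 and D in nm."""
    return q / (2.0 * math.pi / mean_diameter)

def denormalize_q(q_norm, mean_diameter):
    """Invert normalize_q: recover q in nm^-1 for a given diameter in nm."""
    return q_norm * (2.0 * math.pi / mean_diameter)

# Lowest q used for the 220 nm systems, expressed in normalized form.
q_norm_low = normalize_q(0.001, 220.0)

# The same normalized value mapped back to raw q for 130 nm particles.
q_low_130 = denormalize_q(q_norm_low, 130.0)

print(f"normalized q_low = {q_norm_low:.4f}")
print(f"q_low for 130 nm = {q_low_130:.5f} nm^-1")
```

Because the ANN only ever sees the normalized value, a raw q of about 0.00169 nm⁻¹ for 130 nm particles is presented to the network as the same input as 0.001 nm⁻¹ for 220 nm particles.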
Figure 6
Similar
to Figures and 5, except this figure is for systems
with an average nanoparticle diameter of 130 nm with 10% particle
size dispersity and 50%v nanoparticle concentration.
Figure a
illustrates
the scattering profiles of the target structures and the computed
scattering profile of the ANN-evaluated gene-based CREASE method.
Because the average particle diameter is smaller than in Figures and 5, the q range is shifted to higher q values to still span two decades without extrapolating
to normalized q values that the ANN was not trained
on. To understand why the q range shifts, consider that in Figure , for an average diameter
of 220 nm, the lowest q value is 0.001 nm⁻¹, giving a normalized q value of 0.001/(2π/220)
≈ 0.035; in Figure , the average diameter is 130 nm, so the same normalized value corresponds to a lowest q value of qlow ≈ 0.035 × (2π/130) ≈ 0.00169 nm⁻¹.

We compare the closeness of the scattering
fit values to those
from the 220 nm diameter 50%v systems in Figure (shown in ESI Figure S8), and we find that the ANN achieves a similar quality of
scattering match in both cases. Figure b provides the local structure comparison through the
RDF calculation, and ESI Figure S14 illustrates
the RDF percent error. The ANN-evaluated gene-based CREASE method
converges to structures with a close match to the target structures.
Additionally, the RDF percent error for results in Figure is quantitatively similar
to that for Figure highlighting that the ANN-evaluated gene-based CREASE does not suffer
performance degradation for systems with nanoparticle diameters different
from the ones the ANN is trained on.

We note, however, that
our explanation as to why the q range changes for Figure compared to Figure was primarily to
illustrate that even though the ANN is particle size agnostic, the ANN should only be applied within
the limits of the trained range of normalized q values.
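One way to respect that limit in practice is to check a target q grid against the normalized range covered by the training data before calling the ANN. The sketch below rests on stated assumptions: the bounds `Q_NORM_MIN` and `Q_NORM_MAX` are placeholders derived from a two-decade window starting at 0.001 nm⁻¹ for 220 nm particles, and `usable_q_window` is a hypothetical helper, not part of CREASE.

```python
import math

# Placeholder training limits in normalized q (assumed here from a
# 0.001-0.1 nm^-1 window at 220 nm diameter); replace with the real ones.
Q_NORM_MIN = 0.001 / (2.0 * math.pi / 220.0)
Q_NORM_MAX = 0.1 / (2.0 * math.pi / 220.0)

def usable_q_window(q_values, mean_diameter):
    """Keep only raw q values (nm^-1) whose normalized value lies in range."""
    scale = 2.0 * math.pi / mean_diameter
    return [q for q in q_values
            if Q_NORM_MIN <= q / scale <= Q_NORM_MAX]

# For 130 nm particles, q = 0.001 nm^-1 falls below the assumed range
# and q = 0.2 nm^-1 falls above it, so only the middle values survive.
kept = usable_q_window([0.001, 0.002, 0.05, 0.2], 130.0)
```

Filtering the grid up front keeps the network in interpolation mode; the alternative of silently evaluating out-of-range points would give no warning that the prediction is an extrapolation.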
CREASE Performance on Scattering from Experimental One-Component (Melanin) Nanoparticle Solution and Binary (Melanin and Silica) Nanoparticle Assembly
First, we apply the validated gene-based CREASE
method to analyze Iexp(q) vs q from a SAXS measurement on a dilute, weakly
aggregating nanoparticle solution (Figure ). For this system, the exact particle size
distribution and degree of aggregation are unknown, so the gene-based
CREASE must also optimize those parameters in addition to the real-space
structure. In Figure a the shape of Iexp(q) suggests that the solution exhibits some degree of aggregation.
Specifically, the Iexp(q) in the q range from ∼0.008 to ∼0.02
nm⁻¹ exhibits a steady decline, unlike nonaggregating
systems that are flat over that range (compare the Iexp(q) shape in Figure a to the Itarget(q) shapes in Figure a). ESI Figure S15 shows
that all three CREASE variations achieve a similar goodness of fit
(χ² value) and converge to a similar average nanoparticle
size and size dispersity. The main difference between the three gene-based
CREASE variations is the nanoparticle concentration, as the gene-based
CREASE method without the concentration input converges to ∼3.3%v
instead of the experimental estimate used as input, ∼0.77%v.
Figure 7
Application
of the gene-based CREASE method on scattering profiles
from an experimental system of one-component (melanin) nanoparticle
solution with ∼0.77%v nanoparticle concentration. (a) Iexp(q) in black, Icomp(q) from the gene-based CREASE method
with the nanoparticle concentration input in blue, Icomp(q) from the gene-based CREASE method
without the nanoparticle concentration input in purple, and the Icomp(q) from the ANN-evaluated
gene-based CREASE method with concentration input in green. The Iexp(q) comes from a slit-smeared,
small-angle X-ray scattering experiment, and as such, the Icomp(q) curves shown incorporate
instrument smearing effects (as described in the Methods). (b) Renderings of representative output structures
from the three gene-based CREASE methods. (c) Nanoparticle radial
distribution functions (RDFs) and (d) Average number of nanoparticles
per aggregate; the gene-based CREASE method with the nanoparticle
concentration input is in blue, the gene-based CREASE method without
the nanoparticle concentration input is in purple, and the ANN-evaluated
gene-based CREASE method with concentration input is in green. For
all three gene-based CREASE methods, we provide the average and standard
deviation from three independent runs.
Figure b shows
that despite the differences in concentration, all three methods
visually return similarly small aggregates. We characterize the real-space
structure in Figure c-d by considering both the nanoparticle RDF and the nanoparticle
aggregation number (i.e., the number of nanoparticles in each aggregate).
The three methods converge to structures with a similar RDF shape
though the peak value is different when the gene-based CREASE method
does not have an input concentration. The difference in peak value
is a result of the concentration differences because the RDF is normalized
by the number of particles: the structure that converged to the higher concentration (∼3.3%v)
is normalized by that larger value rather than by the value for
∼0.77%v, leading to a lower RDF peak value for the purple curve. ESI Figure S15 illustrates that the RDFs are
the same if we normalize the CREASE variations as if they all had
the same concentration. Figure d shows that the three methods achieve nearly identical averages
and standard deviations of the nanoparticle aggregation number. The nanoparticle
aggregation number is a metric for quantifying nanoparticle clusters
that, unlike the RDF calculation, is less affected by system concentration.
Thus, CREASE consistently returns quantitatively similar
structures, though with some variability in the converged concentration at such
a low concentration in the experimental system.

Next, we apply
the validated gene-based CREASE method to analyze Iexp(q) vs q from a
SANS measurement on a binary (melanin and silica) nanoparticle
mixture assembled into a spherical supra-structure (Figure ). For this system, the gene-based
CREASE determines the melanin and silica nanoparticles’ size
distribution and extent of mixing between melanin and silica nanoparticles
within the real-space structure. We take as input both the total scattering
from melanin and silica nanoparticles (Figure a) as well as the scattering from just the
silica nanoparticles (Figure b). We note that the gene-based CREASE fits both scattering
profiles concurrently as opposed to the traditional
approach of manually fitting each scattering profile separately. ESI Figure S16 shows that both CREASE versions, with
and without concentration input, achieve a similar goodness of fit
(χ² value) and converge to a similar average nanoparticle
size and size dispersity. The CREASE version without the concentration
input converges to a slightly smaller average melanin nanoparticle diameter,
so the melanin–melanin RDF is shifted slightly left, to a smaller
contact position. Figure c,d demonstrates that both gene-based CREASE versions achieve
similar structures with a visualization that agrees with the experimental
TEM images. Thus, CREASE is able to reconstruct correct structures
for experimental scattering profiles coming from binary nanoparticle
assemblies at high packing fractions.
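The concentration dependence of the RDF peak height noted earlier for the one-component melanin solution follows from how an RDF is normalized. The sketch below assumes the conventional normalization by bulk number density in a non-periodic box (the RDF routine used in the paper is not reproduced here), and `radial_distribution` is a hypothetical helper. Placing the same small aggregate in a larger box, that is, at a lower concentration, raises every RDF value by the ratio of the box volumes.

```python
import math

def radial_distribution(positions, box_volume, r_max, n_bins):
    """g(r) histogram normalized by the bulk number density N / V."""
    n = len(positions)
    density = n / box_volume
    dr = r_max / n_bins
    counts = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(positions[i], positions[j])
            if d < r_max:
                # Each unordered pair contributes to both particles.
                counts[min(int(d / dr), n_bins - 1)] += 2
    g = []
    for k, c in enumerate(counts):
        # Volume of the spherical shell covered by bin k.
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        g.append(c / (n * density * shell))
    return g

# One touching trimer of 220 nm particles (coordinates in nm).
trimer = [(0.0, 0.0, 0.0), (220.0, 0.0, 0.0), (110.0, 190.5, 0.0)]
g_dense = radial_distribution(trimer, box_volume=1.0e9, r_max=500.0, n_bins=50)
g_dilute = radial_distribution(trimer, box_volume=4.0e9, r_max=500.0, n_bins=50)
peak_ratio = max(g_dilute) / max(g_dense)  # equals the volume ratio
```

The aggregate itself is identical in both cases, so a difference in peak height alone does not indicate a difference in local structure; rescaling to a common concentration removes it.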
Figure 8
Application of the gene-based CREASE method
on small-angle neutron
scattering profiles from an experimental system of a two-component
(melanin and silica) nanoparticle assembly with ∼50%v melanin
nanoparticle concentration. (a) Itotal(q) vs q is the scattering from
both the melanin and silica nanoparticles (non-contrast-matched) with
the experimental profile shown in black, Icomp(q) from the Debye-evaluated gene-based CREASE method
with the nanoparticle concentration input is in red, and the Icomp(q) from the Debye-evaluated
gene-based CREASE method without the nanoparticle concentration input
is in purple. (b) ISil(q) vs q is the scattering from only the silica nanoparticles
(i.e., melanin particles are contrast-matched with background) with
the experimental profile shown in black, Icomp(q) from the gene-based CREASE method with the nanoparticle
concentration input in red, and the Icomp(q) from the gene-based CREASE method without the
nanoparticle concentration input in purple. The experimental profiles
come from a pinhole-smeared, small-angle neutron scattering experiment,
and as such, the computed scattering Icomp(q) curves shown also have instrument smearing effects
(as described in the Methods). (c) Melanin
and silica nanoparticle radial distribution functions (RDFs); the
gene-based CREASE method with the nanoparticle concentration input
is in red and the gene-based CREASE method without the nanoparticle
concentration input is in purple. (d) Cross section TEM image of a
binary assembly (dark particles are silica; light particles are melanin)
and renderings of sections of representative output structures from
the gene-based CREASE method (blue particles are melanin; yellow particles
are silica). For both gene-based CREASE methods, we provide the average
and standard deviation from three independent runs.
Conclusion
We have presented an open-source computational
tool to reconstruct
the bulk structure from small-angle scattering experiments of binary
nanoparticle mixtures and one-component nanoparticle solutions. We
demonstrated that this redesigned gene-based CREASE method is more
accurate and faster than the previously developed approach, applicable
to a wider range of experimentally relevant systems, and can be augmented
with an artificial neural network (ANN) to further accelerate the
speed of the method while maintaining good performance. For every
system considered, we quantify the closeness of the match between
the target structure and CREASE’s determined structure by calculating
the error (difference) between the target and CREASE scattering profiles
and radial distribution functions. Furthermore, our approach to training
the ANN using the q value nondimensionalized
by 2π divided by the nanoparticle diameter enables a single ANN to be used on multiple
systems with differing nanoparticle sizes. The gene-based CREASE method
presents a facile tool to obtain a computational structural reconstruction
enabling users to perform additional characterization such as optical
modeling for further structural validation and development of structure–property
relationships. The successful implementation of this gene-based CREASE
method promises further extensions to characterize the amorphous structure
of other soft materials that also have more complex form factors (e.g.,
anisotropic nanoparticles, concentrated micelle solutions, etc.).