Generating sets of trial structures that sample the configurational space of crystal packing possibilities is an essential step in the process of ab initio crystal structure prediction (CSP). One effective methodology for performing such a search relies on low-discrepancy, quasi-random sampling, and our implementation of such a search for molecular crystals is described in this paper. Herein we restrict ourselves to rigid organic molecules and, by considering their geometric properties, build trial crystal packings as starting points for local lattice energy minimization. We also describe a method to match instances of the same structure, which we use to measure the convergence of our packing search toward completeness. The use of these tools is demonstrated for a set of molecules with diverse molecular characteristics and as representative of areas of application where CSP has been applied. An important finding is that the lowest energy crystal structures are typically located early and frequently during a quasi-random search of phase space. It is usually the complete sampling of higher energy structures that requires extended sampling. We show how the procedure can first be refined, through targetting the volume of the generated crystal structures, and then extended across a range of space groups to make a full CSP search and locate experimentally observed and lists of hypothetical polymorphs. As the described method has also been created to lie at the base of more involved approaches to CSP, which are being developed within the Global Lattice Energy Explorer (Glee) software, a few of these extensions are briefly discussed.
Generating sets of trial structures that sample the configurational space of crystal packing possibilities is an essential step in the process of ab initio crystal structure prediction (CSP). One effective methodology for performing such a search relies on low-discrepancy, quasi-random sampling, and our implementation of such a search for molecular crystals is described in this paper. Herein we restrict ourselves to rigid organic molecules and, by considering their geometric properties, build trial crystal packings as starting points for local lattice energy minimization. We also describe a method to match instances of the same structure, which we use to measure the convergence of our packing search toward completeness. The use of these tools is demonstrated for a set of molecules with diverse molecular characteristics and as representative of areas of application where CSP has been applied. An important finding is that the lowest energy crystal structures are typically located early and frequently during a quasi-random search of phase space. It is usually the complete sampling of higher energy structures that requires extended sampling. We show how the procedure can first be refined, through targetting the volume of the generated crystal structures, and then extended across a range of space groups to make a full CSP search and locate experimentally observed and lists of hypothetical polymorphs. As the described method has also been created to lie at the base of more involved approaches to CSP, which are being developed within the Global Lattice Energy Explorer (Glee) software, a few of these extensions are briefly discussed.
The
great majority of compounds synthesized by chemists exist at
room temperature as solids, often in crystal form. Crystallization
itself can be a challenging part of the synthetic process, further
complicated by polymorphs (the existence of multiple crystal structures
of a given compound), impurities, or the desire for crystals with
particular structural properties.[1] The
physical properties of an organic molecular crystal must derive in
part not only from those of its constituent molecules but also by
the arrangement of molecules in a crystal and the intermolecular interactions
that either drive, or result from, a particular crystal packing. Many
of the molecules which the chemist is interested in synthesizing are
chosen due to their solid-state properties, such as in the fields
of organic semiconductors,[2] pigments,[3] and porous molecular materials.[4−6] The selection and control of solid form is also vitally important
in developing pharmaceutical molecules into tablets with satisfactory
stability and bioavailability; the issues raised by polymorphism in
pharmaceutical chemistry have been characterized extensively.[7] A large proportion of organic molecules is known
to be polymorphic,[8] although the relationship
between molecular characteristics and the existence of polymorphs
is unclear.For the above reasons, the importance of characterizing
the crystal
structure is key to rationalizing many properties. Many tools used
to probe the molecular structure, such as X-ray diffraction, solid
state NMR,[9−11] or those of solid-state spectroscopy,[12,13] are sensitive to the local and long-range structure of a molecule
within the crystal. From a theoretical perspective, the prediction
of crystal structures ab initio is a natural challenge
to theoretical and computational chemists and has valuable applications
in characterizing the landscape of possible crystal structures available
to a given molecule. This challenge has drawn a community of researchers
who seek to solve these structures from limited initial data and preferably
from just the two-dimensional chemical diagram of the molecular structure.[14−16] The progress that has been made by this community is clear from
published studies on large and flexible molecules[5,17−23] and can be tracked in a series of collaborative exercises in which
active members of the field have been challenged to predict the structures
of unpublished crystals.[24,25] These ”blind
tests” of crystal structure prediction (CSP) attempt to benchmark
the successes and limitations of the different contemporary approaches,
the progress that has been made, and that which is still required.A fundamental concept in our approach to CSP is to represent the
internal potential energy of a crystal structure as a function of
the intra- and intermolecular coordinates, where the intramolecular
structure, energy, and properties are calculated using quantum chemical
methods, and intermolecular interactions are calculated using anisotropic
atom–atom potentials.[26] Although
research in our group into methods for efficiently handling molecular
flexibility, and for the calculation of free-energies,[27] is active, this paper is concerned with characterizing
the potential energy surface describing the crystal packing of rigid
molecules. Thus, our configurational space is the union of the coordinates
which describe the positions and orientations of the molecules in
the crystal’s asymmetric unit and the degrees of freedom which
determine the unit cell. These, along with the space group operations,
define a crystal structure. Our potential energy surface (PES), which
is a function of these variables, is a force field which comprises
an exp-6 model of short-range and dispersion interactions combined
with an atomic multipole electrostatic model derived from single molecule
DFT calculations. We seek to characterize the resulting lattice energy
surface by reliably locating and ranking all local minima within a
certain energy of the global minimum. Each local minimum on the lattice
energy surface could represent an observable polymorph of the molecule
in question.Efforts toward improving our ability to predict
crystal structures
largely focus on either the evaluation of more accurate lattice energies
or the challenge of locating all possible structures through sampling
of the PES. The problem of sampling phase space is the chief concern
of this paper, and we describe methods that have been implemented
in our Global Lattice Energy Explorer (Glee) software which
is described herein. Each minimum on the PES will have an associated
basin within which any structure will, when relaxed using local lattice
energy minimization, be reached. The structure generator outlined
here seeks to sample trial structures such that all local minima are
located during the minimization step. We take an approach of ensuring
as diverse a sampling as possible, because we are not only interested
in the structure corresponding to the global lattice energy minimum
but rather the entire ”landscape” of structures. For
most applications of the prediction of molecular crystals, the landscape
should be sampled as completely as possible within the energy range
of expected polymorphism. Recent calculations on over 1000 crystal
structures of known polymorphs show that, while the majority of polymorphs
are separated by less than 2 kJ mol–1, occasional
pairs of known polymorphs differ by 10 or more kJ mol–1.[27] The number of distinct crystal packing
alternatives within such an energy window above the global minimum
usually amounts to many 10s and very frequently over 100 distinct
crystal structures for small organic molecules.[28] In the class of crystals which can support inclusion compounds,
it may be that the structures which are higher in energy, and less
perfectly packed from a purely energetic point of view, are the most
interesting.[29] A focus on too small a set
of structures at, or around, the global lattice energy minimimum would
be to risk losing the richness of the landscape and potential solid
form diversity of a molecule, which must be considered in developing
a molecule into a useful material.Alternative approaches to
crystal structure generation which have
been applied to molecular crystals include simulated annealing[30] and more sophisticated[31] variants of Monte Carlo searches and genetic algorithms,[32−34] as well as the early pioneering CSP studies using purely random
or grid searches.[35−37] In fact, these simplest methods have been remarkably
successful, consistently performing well in the structure searching
aspect of the blind tests of CSP.[24,25] We follow
previous groups who favor low-discrepancy, quasi-random sampling,[38,39] as we require an algorithm that samples the phase-space completely
and efficiently. Quasi-random sequences have attractive properties
for locating local minima; in particular, at each step in the sequence
the configurational space is as uniformly sampled as possible. Unlike
a deterministic method such as a grid-based search, the convergence
of the search can be continuously monitored and extended until one
is suitably confident that all relevant local energy minima have been
sampled. The problems of structure generation and lattice energy minimization
can be programmed in this way to make full use of the computational
resources available to us, to require very few pieces of input information,
and to be repeatable.Chemical diagrams of the three molecules studied here:
a) artemisinin,
b) quinacridone, and c) CC1.The purpose here is to describe our implementation of a quasi-random
search, whose use has already been demonstrated in studies of mechanochemical
reactions[40] and cocrystal formation,[41] as well as to investigate and optimize the performance
of our algorithm. While quasi-random CSP methods have been extensively
applied to CSP, there have been few detailed studies of the performance
of such a search. For the purposes of this study, we follow the convergence
of finding a complete set of possible low energy crystal structures
for three molecular systems. We investigate the coverage of packing
space as a function of the number of trial structures that have been
generated and lattice energy minimized and the influence of modifying
the volume available to the molecule during the structure generation.
We also describe the use of the separating axis theorem to relieve
molecular clashes in generated structures, in place of rejection.The three molecules studied, artemisinin, quinacridone, and an
organic cage (Figure ), were chosen for diversity in molecular characteristics, in terms
of shape and intermolecular interactions, and from three areas where
CSP has found applications. Artemisinin, whose discovery was honored
by the 2015 Nobel prize in medicine, is a drug used in the treatment
of malaria[42] and, potentially, cancer.[43] Quinacridone finds use in the pigment and semiconductor
industries[44] and has known polymorphism.
The third molecule investigated, hereafter referred to as CC1, is
one of a series of porous organic cages that we have studied previously
using simulated annealing;[21,45] these cages are of
interest as solution processable porous materials, and CC1 has the
interesting behavior of switching between porous and nonporous polymorphs.[46] These three molecules not only exemplify the
relevance of CSP in various application areas but also test and demonstrate
the performance of the Glee code in cases with a range of
known experimental polymorphs, molecular geometries, and intermolecular
interactions.
Figure 1
Chemical diagrams of the three molecules studied here:
a) artemisinin,
b) quinacridone, and c) CC1.
Methods
For rigid
molecules, the process of CSP involves the following
general steps: (i) molecular geometry optimization; (ii) trial crystal
structure generation; and (iii) local lattice energy minimization
of trial structures. Clustering of structures is performed after their
lattice energy minimization to remove duplicates and assess the completeness
of the search.All calculations presented here are performed
with rigid molecular
geometries after step (i), taken from isolated molecule geometry optimization
using the B3LYP functional with a 6-311G** basis set within the Gaussian09
software.[47]
Crystal
Structure Generation
Mapping Quasi-Random
Numbers to Structural
Parameters
Our sampling of the crystal packing configurational
space is based on quasi-random, low-discrepancy sequences generated
by the Sobol method,[48] in a similar manner
to Della Valle and co-workers[38] and Pantelides
and Adjiman.[39,49] The present study is restricted
to rigid molecules, for which the molecular geometry is kept fixed
throughout the generation and optimization of crystal structures.
In this approximation, each independent molecule in the asymmetric
unit requires three parameters to determine its position and three
for its orientation. The values of a further X (X = 1–6) parameters must be generated in order to
specify the internal angles and lengths of the unit cell parallelepiped: X = 6 in the case of a triclinic cell, although fewer for
lattices with restrictions on cell lengths and angles. Each parameter, p, is associated with a quasi-random
number, x ∈ [0,1),
although, as is discussed below, not all parameters are determined
independently of each other.
Molecular Positions
The mapping
from three random numbers
to the three positions of a molecule’s centroid is trivial.
Each number, x, is taken
as a position in fractional coordinates along a particular cell axis.
To keep the method general, we include translation along all three
lattice vectors in all space groups, regardless of whether the energy
is invariant to particular translations in certain space groups. Molecular
orientations relative to the global axis frame are sampled using the
quaternion based Shoemake method,[50] which
has previously, for example, been used in the generation of molecular
dimers.[51] The positions and orientations
of all molecules in the unit cell are then generated by applying space
group symmetry operators to the asymmetric unit.
Unit Cell
Sampling
Each unit cell angle, θ, that is not constrained by space group
symmetry is sampled to give an even distribution in cos(θ) according towith n = 2 and , and x is the relevant element of
the Sobol vector. This choice samples
the range from to with a probability density that is highest
at its center of . The function
used to sample cell angles
was chosen to provide a balance between sampling a spread of angles
and avoiding problematic representations of structures. It is not
generally the case that a particular crystal structure has a unique
choice of lattice vectors. In triclinic and monoclinic systems, many
options for the unit cell have very acute or obtuse angles, which
are computationally awkward and inefficient to lattice energy minimize.
The chosen range for cell angles will not exclude any structures but
attempts to only generate versions of structures without flat unit
cells.It is only at the stage of selecting bounds for the cell
lengths that our algorithm includes specific information pertaining
to the individual system. Our sampling is influenced by the ”box
model” of Pidcock and Motherwell,[52] which established relationships between molecular dimensions and
unit cell lengths. We establish a target volume for the unit cell
as the sum of the volumes of all molecules in the unit cell, multiplied
by a constant, henceforth referred to as the target volume parameter
(TVP). TVP takes a default value of 1.0 but is varied in a later section
of this paper to investigate its influence on the performance of the
method. The molecular volume is calculated as that of a box chosen
to enclose all of its atoms. This box is defined by calculating the
axes of inertia of each molecule and finding the maximum and minimum
value of the projection of each of its atomic coordinates onto these
axes, including standard van der Waals radii[53] for each atom. In this section, the difference between the maximum
and minimum value of the projections of atomic coordinates, with an
appropriate consideration of each atom’s van der Waals radius,
will be referred to as the molecule’s ”shadow”
onto that axis. As a measure of the volume of the molecule, the product
of these three shadows onto the molecule’s axes of inertia
would be an overestimate when compared to a more usual measure of
volume based on atomic volumes or the molecular van der Waals surface.
However, when generating crystal structures we expect to start with
a larger volume before allowing the cell to contract under intermolecular
forces at a later stage of the process.Molecular projections
onto lattice vectors, used to define the
sampling range for unit cell lengths. The directions of the three
lattice vectors, l1,2,3, are shown, and
the molecular projections of two quinacridone molecules are shown
onto lattice vector l2. Thin lines show
the projection of the edges of the van der Waals radii of each atom
onto the lattice vector. Bold red and blue lines show the molecular
shadows onto l2. In this example, s2 = 9.13 Å and s2 = 9.57 Å.The bounds of the three
cell lengths can be calculated by considering
this target volume and also the projections of the atomic positions
onto the lattice vectors. Given that the unit cell angles have been
determined, we are able to fix the direction of each unit cell vector
in a global axis frame and can calculate the shadow of each lattice
vector onto each of our global axes. We must consider separately all
molecules in the unit cell that differ by rotation and find the maximal
and minimal values of molecular projection on each cell vector, j, which we denote s and s, respectively
(Figure ). To sample
a physically realistic range of cell lengths, we choose the length
of the first unit cell vector in the range from c · s to c · N · s, where N is the number of molecules
in the unit cell, and c is a constant used to scale
the entire range:
Figure 2
Molecular projections
onto lattice vectors, used to define the
sampling range for unit cell lengths. The directions of the three
lattice vectors, l1,2,3, are shown, and
the molecular projections of two quinacridone molecules are shown
onto lattice vector l2. Thin lines show
the projection of the edges of the van der Waals radii of each atom
onto the lattice vector. Bold red and blue lines show the molecular
shadows onto l2. In this example, s2 = 9.13 Å and s2 = 9.57 Å.
The constant c is fixed at 0.75 in this study,
reflecting the fact that molecular dimensions can extend past the
length of unit cell dimensions.[52] The second
unit cell length is sampled in the same manner, using projections
of the molecular dimensions onto the direction of the second vector
and taking the next element of the Sobol vector to sample the relevant
range. The third (final) cell length is chosen to give a normal distribution
of cell volumes, whose mean is the target volume described above.
Thus, the next element of the Sobol vector samples a normal distribution
with a standard deviation of 0.15 s, which we find yields a reasonable distribution of volumes.
The only cases where this sampling of unit cell lengths is altered
are (i) when N =
1, where s is increased
by 50% to ensure a spread of unit cell lengths is sampled and (ii)
in crystal systems that place restrictions on cell lengths, where
fewer independent unit cell lengths must be determined. We also cycle
through the permutations of possible orderings in which the cell lengths
could be assigned, so as to avoid any possible systematic bias.
Screening of Unphysical Structures
Before
the crystal structure’s parameters are optimized with
respect to the lattice energy, unphysical structures, particularly
those in which molecules overlap, should be rejected or adjusted.
To do this quickly, the convex hull of the molecule is calculated,[54] and the separating axis theorem[55] is employed to calculate the overlap it has with its neighbors.
The molecule’s convex hull is a polytope whose vertices are
at atomic positions; these are defined such that the object is convex,
and all atomic positions that are not vertices of the hull lie within
its volume. A common analogy is to compare the convex hull of an object
to its shape if it were wrapped in wrapping paper. As the number of
vertices defining the convex hull grows more slowly than the total
number of atoms in the molecule, it is an efficient object to deal
with when molecules are large. Furthermore, in the case of rigid molecules,
the convex hull needs only to be calculated once for each type of
molecule, which is performed before generating structures. The convex
hulls can be manipulated with the usual symmetry operators, and all
neighboring molecular pairs are tested for overlap.From the
separating axis theorem we determine whether a pair of convex hulls
overlap and the vector of minimum length which is required to separate
the two objects. A set of vectors is taken, which are either normal
to the faces of a hull or to an edge from each.[56] Onto this set, the shadow of each convex hull is projected
(as always, considering the finite size of the atoms by including
the appropriate van der Waals radius), and the overlap of the two
shadows is measured. The minimal length of overlap along any vector
in this set yields the smallest vector required to separate the objects.
If there is no overlap of the shadows on any axis in the set, the
convex hulls do not overlap, as is the case in the example in Figure .
Figure 3
Separating axis theorem
test for molecular overlap. The separating
axis theorem prescribes the vectors upon which to project the vertices
of the convex hulls when testing polytopes for their overlap in space.
An example for the cage molecule CC1 is shown with convex hulls overlaid
on the molecular geometry. In the geometry shown there is a vector
upon which the “shadows“ of their hulls, the blue and
red vectors, do not overlap. If they did overlap, the set of overlapping
blue and red vectors would determine the minimum displacement necessary
to separate them in the direction of that vector.
Separating axis theorem
test for molecular overlap. The separating
axis theorem prescribes the vectors upon which to project the vertices
of the convex hulls when testing polytopes for their overlap in space.
An example for the cage molecule CC1 is shown with convex hulls overlaid
on the molecular geometry. In the geometry shown there is a vector
upon which the “shadows“ of their hulls, the blue and
red vectors, do not overlap. If they did overlap, the set of overlapping
blue and red vectors would determine the minimum displacement necessary
to separate them in the direction of that vector.The result of the separating axis theorem test can be used
in one
of two ways, each of which are investigated in this study. The simplest
procedure is to reject any trial crystal structure that contains overlapping
molecules. The proportion of rejected structures decreases as the
target unit cell volume (as determined by the parameter TVP) is increased,
which makes more efficient use of the Sobol sequence, at the expense
of creating crystal structures that are farther from their final (post
energy minimization) density and thus more expensive to lattice energy
minimize. For this reason, we test the influence of our choice of
TVP on the performance of the search. In this study, we test the structure
generation procedure with rejection of trial structures using TVP
= 1.0, 1.5, 2.0, and 2.5.The second option that we have implemented
is to adjust trial crystal
structures to remove the overlap between molecules. We do this by
expanding lattice vector lengths according toThe lattice vector indexed by i grows due
to the
relationship of the overlap vector, v, and the vector between the centroids of the
objects, v. When
both vectors are given in fractional coordinates, the increase in
cell length, Δl, is given by the ratio of their components along that axis. For
numerical stability, the cell is only expanded along axis j when v > 0.05,
and we add a parameter, η, to Δl with the value 0.001 Å in this study.
This lattice vector expansion procedure is iterated until the structure
contains no overlapping molecules. In cases where molecules are positioned
close to a space group symmetry element, the cell expansion required
to relieve molecular overlap can lead to very large unit cells. We
therefore place a limit on unit cell volume after lattice vector expansion,
above which the trial structure is rejected. We define this limit
(maximum volume parameter, MVP) in reference to the molecular volumes
used to calculate the target volume parameter, TVP, as 2.5 times the
sum of molecular volumes in the unit cell.A typical use of
the crystal structure generation procedure is
to generate a set number of trial structures within a specified space
group, to allow them to reach a minimum on the PES through lattice
energy minimization, and then to monitor the results achieved to assess
whether the sampling of possible structures is sufficiently complete.
If more structures must be generated for a particular space group,
then the search is continued, starting from the highest value of Sobol
seed that has previously been used.
Lattice Energy Minimization
The crystal
structure generator described above creates trial structures that
could be lattice energy minimized by any method that can affordably
be applied to the number of structures required to sample the PES.
Currently, the Glee software is interfaced with the DMACRYS crystal structure modeling software,[26] to make use of anisotropic atom–atom model potentials. All
lattice energy minimizations reported here were performed using DMACRYS, which employs a quasi-Newton–Raphson, rigid-molecule
optimization of molecular positions, orientations, and unit cell parameters
with space group symmetry constrained. The intermolecular interaction
energy between molecules M and N was modeled with an anisotropic model potential of the formwhere i, k are atoms of type ι and κ belonging
to molecules M and N, respectively,
separated by the
distance r. The first
two terms model the repulsive and attractive nonelectrostatic intermolecular
interactions, whose parameters are taken from a revised version[57,58] of the Williams99 force field.[59] The
final term, describing electrostatic interactions, is calculated from
atom-centered multipoles up to rank 4 (hexadecapole) on all atoms,
obtained from a distributed multipole analysis[60] (DMA) of the B3LYP/6-311G** charge density. Charge–charge,
charge–dipole, and dipole–dipole interactions were calculated
using Ewald summation, while repulsion–dispersion interactions
and all higher multipole–multipole interactions were truncated
after a cutoff distance. The summation cutoff (for exp-6 interactions
and higher-order multipole-multipole interactions) was set to 30 Å
for CC1 and quinacridone and 15 Å for the more compact artemisinin
molecule.
Clustering
Any method for structure
prediction requires a procedure for comparing pairs of generated structures
and determining whether they are, to within a set tolerance, identical.
This step is essential to both remove duplicates from a data set and
also to monitor the convergence of the completeness of the sampling.
Clustering is only performed after lattice energy minimization of
the trial structures. Various methods exist to perform this task in
CSP, including the comparison of similarities of computed X-ray powder
diffraction patterns[61] and the Compack
algorithm,[62] which tests interatomic separations,
and performs an overlay of molecules in order to quantify the similarity
of the structures.Structure comparison and clustering in this
work has been processed with our in-house method, which is related
to the Compack approach. A cluster of molecules is constructed surrounding
each molecule in the asymmetric unit (we use clusters of 25 molecules
in this work). We then construct a list of the displacements between
atoms in the neighboring molecules and the centroid of the reference
molecule. Two such lists can be compared, by positioning the origins
of both clusters at the same position and testing whether, for every
molecule in one cluster, a set of points occurs in the second which
can be overlaid upon the first by the action of rotation only. An
algorithm to calculate the optimal RMSD exists,[63] and we use a tolerance for comparing pairs of structures,
in Å, of 0.5 + 0.05 · r(c1), where r(c1) is the distance
of the centroid of the molecule in the first list, to that of the
reference molecule around which the cluster is built. If the centroid
to origin distance of the molecule in the second structure, r(c2), is not within 20% of that of r(c1), or if the molecules contain different
numbers of atoms (in the case of multicomponent crystals), the test
fails automatically. If, under this criterion, the lists of clusters
of atomic coordinates are determined to match for clusters around
all molecules in the asymmetric unit, then the crystals are judged
to have identical packings, corresponding to identical minima of the
PES.In order to build up the clusters in a robust and computationally
efficient manner, we make use of the Niggli reduced cell[64] representation of each crystal structure. The
reduced cell vectors are calculated using an algorithm from the Computational
Crystallography Toolbox.[65] Furthermore,
to improve the performance of the algorithm in comparing crystal structures
whose molecules are large, the set of atomic positions used in the
comparison is reduced to those atoms which comprise the convex hull
of the molecule, after the hydrogen atoms have been removed.A final point concerns molecular symmetry. The algorithm which
calculates the optimal RMSD of the overlaid points is sensitive to
the order of the coordinates, and hence up to S overlays
may have to be performed, where S is the order of
symmetry of the molecule. Without making assumptions about combinations
of crystal and molecular symmetry operations, when building up lists
of atomic positions, S such lists must be calculated:
one for each set of coordinates that are equivalent under the internal
symmetry of the molecule. We calculate the matrices for each of these
operations, and by maintaining a consistent atomic labeling scheme
and limiting ourselves, in this paper, to rigid molecules, this calculation
is only required once. The CC1 cage is a particularly symmetric molecule,
with S = 12, but even with this ”worst case”
example the clustering is an inexpensive step in the entire CSP procedure.
Results and Discussion
We start with the
results for a selection of molecule/space group
combinations, choosing the space groups of the observed polymorphs
of each molecule for detailed investigation of the convergence of
the search for crystal structures. Since we must treat whole molecules
in the crystal structure generation procedure, we consider the space
groups of the observed structures after removing space group symmetry
elements that correspond to intramolecular symmetry. Searches were
performed with one molecule in the asymmetric unit (Z′ = 1) in P212121 for artemisinin; P1 and P21/c for CC1; and P21/c and P1̅ for quinacridone.
Quinacridone has a known polymorph with two independent molecules
in the asymmetric unit (Z′ = 2), so searches
were also performed with Z′ = 2 in space group P1̅ .We generated trial structures with each
of the five variations
on the structure generation procedure for each system (molecule/space
group combination) with Z′ = 1. Four seaches
employed rejection of trial structures with overlapping molecules,
using different target cell volumes in the assignment of lattice parameters
to the trial structures (TVP = 1.0; 1.5; 2.0; 2.5). A fifth search
was performed for each system using the cell expansion method in place
of rejection, with TVP = 1.0 and a maximum expansion of the unit cell
volume (MVP) to a volume parameter of 2.5 (we refer to this method
hereafter as SAT-expand). The SAT-expand method should be viewed as
a variation on the simpler TVP = 1.0 search, but where structures
with overlapping molecules are retained if this overlap can be relieved
through expansion of the unit cell volume by up to 250%.10000
trial structures were generated (after rejection) with each
variation of the method for the searches with one molecule in the
asymmetric unit. We expect this to be more structures than would generally
be required per space group in a CSP study. This deliberate oversampling
is performed to gather meaningful statistics. 50000 structures were
generated for the quinacridone Z′ = 2 search
with each method; a larger number is expected to be required to cover
the higher dimensional space. A second, low symmetry (P1, Z′ = 4) polymorph of artemisinin is known.
Therefore, to test if this structure could be located with our method,
a 50000 structure search was performed for artemisinin in P1 with Z′ = 4 with the SAT-expand
method only.
Convergence of the Number of Unique Crystal
Structures
We first examine the efficiency with which each
variation of the structure generation method uses the Sobol sequence.
Since the low-discrepancy sampling is designed to uniformly sample
phase space, we want a method that makes the best use of each point
in the sequence; high rates of rejecting trial structures could undermine
the uniformity of the search.As expected, we find that fewer
trial crystal structures are rejected when the target volume is increased
(Table ). The probability
of molecular overlap is decreased as the volume per molecule is increased.
Between 1 in 3 and 1 in 50 trial structures contain overlapping molecules
when the target unit cell volume is chosen to just fit the molecules
(TVP = 1.0) and there are large variations in the rejection rate between
molecules. Quinacridone leads to the most rejected structures: this
long, thin molecule clashes with neighbors in most orientations generated
from a random sampling. Trial structures of the more isotropically
shaped CC1 and artemisinin less frequently contain molecular clashes.
We also find variations between space groups for a given molecule. P21/c generally leads to more rejected structures than
simpler space groups
with fewer symmetry elements (P21/c vs P1 for CC1, P21/c vs P1̅ for quinacridone),
since more of the configurational space lies sufficiently close to
a symmetry element such that symmetry generated molecules overlap
with the original. The Z′ = 2 search is particularly
problematic: the generation of 50000 accepted structures required
almost 108 trial structures with TVP = 1.0, an acceptance
rate of 0.05%.
Table 1
Number of Trial Structures Required
To Generate 10000 Accepted Crystal Structures (50000 for Z′ = 2 Quinacridone and Z′ = 4 Artemisinin)
for Each Systema
system
TVP = 1.0
TVP = 1.5
TVP = 2.0
TVP =
2.5
SAT-expand
CC1 (P1)
17863 (9581)
17225 (8999)
17215 (8274)
17215 (7681)
10090 (8514)
CC1 (P21/c)
156918 (9723)
38951 (9211)
23395 (8843)
18133 (8279)
16022 (9059)
quinacridone (P1̅ )
251805 (9804)
55400 (9827)
30696 (9761)
23262 (9651)
25131 (9862)
quinacridone (P21/c)
501181 (9767)
78400 (9533)
36348 (9315)
24135 (9075)
26617 (9353)
quinacridone (P1̅ , Z′ = 2)
96693852 (32021)
8325359 (46213)
1057042 (46443)
504452 (45714)
480626 (43541)
artemisinin (P212121)
39082 (9894)
16866 (9584)
13166 (9071)
12018 (8490)
11208 (9362)
artemisinin (P1, Z′ = 4)
363185 (38510)
Z′ =
1 unless otherwise stated. The number in parentheses is the number
of accepted structures that lead to a successful lattice energy minimization.
Z′ =
1 unless otherwise stated. The number in parentheses is the number
of accepted structures that lead to a successful lattice energy minimization.Considering only Z′ = 1, the differences
in rejection rate between space groups and between molecules nearly
disappear at large target volumes, where the proportion of rejected
structures is decreased. An increase of only 50% to the target volume
(TVP = 1.5) has the largest impact on systems where rejection rates
were very large (CC1P21/c and both space groups for quinacridone). At TVP = 2.5, the acceptance
rates are quite high: almost all trial structures of artemisinin are
accepted, and acceptance rates are in the 40–50% range for
CC1 and quinacridone. The acceptance rate for Z′
= 2 is also improved dramatically when a larger target volume is used,
so that almost 1 in 10 structures is accepted for TVP = 2.5.Increasing the volume of generated unit cells clearly makes more
efficient use of the Sobol sequence. On the other hand, trial structures
with smaller volumes are closer in cell parameters to the final densely
packed, lattice energy minimized crystal structures. As a result,
the proportion of accepted structures that results in successful lattice
energy minimization is highest (96–99%) when the target volume
is matched with the molecular volume (TVP = 1.0). Trial unit cells
with large volumes prove more challenging for the lattice energy minimizer;
up to 23% of accepted Z′ = 1 trial structures
fail to find a local minimum using TVP = 2.5 (Table ). Furthermore, it could be conjectured that
making initial guesses that are close to the final, energy minimized
structures are more likely to end up in narrow wells and thus more
reliably locate all low energy structures than when TVP is set artificially
high.In searches where structures with overlapping molecules
are rejected,
there is a balance between efficient use of the Sobol sequence, which
is best at high target volume, and ease of lattice energy minimization,
which is best at smaller target volumes. The SAT-expand method compares
favorably to the rejection-based methods on both criteria (final column, Table ). The rate of accepting
trial structures is as high as the TVP = 2.5 searches, since only
those that require excessively large unit cell expansion to relieve
molecular clashes are rejected. However, since structures in the SAT-expand
approach are initially generated with TVP = 1.0 and many of these
do not require significant expansion, many of the structures entering
energy minimization are close to the densities of the final lattice
energy minima. This results in higher success rates of lattice energy
minimization than generating directly with large unit cells (e.g.,
TVP = 2.5).Number of unique crystal structures, within 15 kJ/mol of the global
minimum, for artemisinin in space group P212121, displayed (a) as a function of the current
position in the Sobol sequence and (b) as a function of the total
number of successfully energy minimized structures.As a first analysis of the convergence of the crystal
structure
searches, we monitored the number of unique, low energy lattice energy
minima that had been located as the search progressed. For this analysis,
we defined the low energy region as that within 15 kJ/mol of the global
minimum. Figure displays
the results for artemisinin in P212121 (corresponding plots for the other systems can
be found in the Supporting Information).
The rate of finding new crystal structures is high at the beginning
of the search but levels off to the point where no new crystal structures
are being located. We observed that all of the methods converge to
the same number of unique structures. The TVP = 1.0 method converges
most slowly as a function of the number of Sobol vectors attempted
but fastest as a function of the number of valid, lattice energy minimized
structures. Given that lattice energy minimization is the most costly
part of the process, this suggests a slight advantage of generating
trial structures with small unit cell volumes. Again, the SAT-expand
approach compares favorably with simple rejection, making efficient
use of the Sobol sequence and converging quickly with respect to the
number of lattice energy minimizations.
Figure 4
Number of unique crystal structures, within 15 kJ/mol of the global
minimum, for artemisinin in space group P212121, displayed (a) as a function of the current
position in the Sobol sequence and (b) as a function of the total
number of successfully energy minimized structures.
Energetic
Assessment of Sampling Convergence
As well as monitoring
how the total number of unique low energy
crystal structures converges during a crystal structure search (Figure ), it is useful to
monitor the evolution of the energy of the lowest energy structure
found during a search, as a function of the number of structures that
have been energy minimized. Figure displays the rate at which the energy of both the
lowest individual structure, and the set of the lowest 10 structures,
converges with respect to the number of successful energy minimizations
for artemisinin (P212121) and CC1 (P21/c).
Figure 5
Average lattice energy of the ten lowest energy structures is shown,
as a function of the number of minimized structures generated in the
experimentally observed space group for a) artemisinin in P212121 and b) CC1 in P21. The dashed lines indicate the energy of
the single lowest energy structure, where the color relates to the
same method in the legend. The data had converged after 1000 and 6500
minimizations for a) and b), respectively, so is not shown beyond
this point for clarity.
Average lattice energy of the ten lowest energy structures is shown,
as a function of the number of minimized structures generated in the
experimentally observed space group for a) artemisinin in P212121 and b) CC1 in P21. The dashed lines indicate the energy of
the single lowest energy structure, where the color relates to the
same method in the legend. The data had converged after 1000 and 6500
minimizations for a) and b), respectively, so is not shown beyond
this point for clarity.Several points are immediately obvious. The lowest energy
structure
in the set is found rapidly, which we find to be true for all systems
studied here (see the Supporting Information for results for the other systems). Once the lowest energy structure
remains stable with respect to the number of energy minimized structures,
we assume that this corresponds to the true global minimum on the
lattice energy surface. We also monitor the mean energy of the 10
lowest energy structures that have been located, to see by which point
in the search this larger set of low energy structures remains stable.
We find that convergence of the set of the 10 lowest energy structures
is about an order of magnitude slower than the rate of finding the
global minimum and that the convergence is quicker for artemisinin
than CC1. For artemisinin, the sampling in this space group appears
to be complete for all variations of the search (different TVP and
SAT-expand) well before 1000 successful lattice energy minimizations.We observe that searches using a large TVP generally appear to
converge more slowly than the smaller target volumes, with the SAT-expand
method performing fairly well; this is in line with our expectations
based on the convergence of the number of unique structures (Figure ). The change in
performance of the search upon changing TVP is particularly stark
for CC1 in P21/c, where
the entire set of 10 lowest energy structures converges slowly for
TVP ≥ 1.5. On the basis of these results, the most satisfactory
results are obtained when searching either with rejection-based sampling
and a small target volume, recognizing that many trial structures
will be rejected, or the SAT-expand method.There is evidence[66] and intuition behind
the idea that the deeper wells on the PES may well also have a large
watershed around them, and a quasi-random search seeks to take advantage
of this. The rapid convergence of the set of lowest energy structures
is a useful property when looking to make rapid searches in a wide
range of space groups, as it should be possible to estimate the limit
of the lowest energy structure in the set before completeness is achieved.
Noting the number of structures needed to find the ten lowest energy
structures gives us a ball park figure of the absolute minimum number
of structures that we would wish to successfully lattice energy minimize
in these space groups. In some cases, this can be as few as several
hundred lattice energy minimizations, although the small computational
expense of generating and minimizing structures means that we would
generally afford ourselves several thousand structures in space groups
whose lowest energy structure is within the lattice energy range of
interest to us.
Rate of Sampling of Low
Energy Structures
The number of times that each low energy
predicted crystal structure
is located can be studied in detail, to investigate how our attempt
at uniform sampling of configurational space during trial structure
generation translates into uniformity of sampling of local energy
minima. Figure displays
the number of times that each of the 10 lowest energy crystal structures
appear in each search for four of our systems. The sampling of individual
low energy crystal structures is clearly uneven; each system has some
structures that are more rarely located than others.
Figure 6
Bar charts showing the
frequency with which each low energy structure
is located. For each of the lowest 10 unique structures, for the denoted
systems, the energy above the minimum in the set is displayed on the
horizontal axis, and the number of times that it was found in the
search is read from the vertical axis. The five methods appear alongside
each other, with the color of the bar signifying the method.
Bar charts showing the
frequency with which each low energy structure
is located. For each of the lowest 10 unique structures, for the denoted
systems, the energy above the minimum in the set is displayed on the
horizontal axis, and the number of times that it was found in the
search is read from the vertical axis. The five methods appear alongside
each other, with the color of the bar signifying the method.The case of CC1 in space group P1 (Figure ) is very simple: the frequency
of finding each minimum decreases as the lattice energy increases,
and for all methods, well over half of the initial structures relax
to the two lowest minima (4000–5000 hits to te global minimum,
1500–2000 hits to the second lowest energy structure). CC1
(P1) is also the system with the largest energy differences
between structures, since there are few ways to achieve a low energy
crystal packing when all molecules are related by translational symmetry
only (space group P1). The five variations on the
sampling method show very similar performance for CC1 (P1), with the searches that sample smaller volume trial structures
leading to slightly more structures overall, as the rate of achieving
successful lattice energy mimimization is slightly larger from these
trial structures.The tendency for the lowest energy structures
within a space group
to be frequently located is repeated for all other systems, with the
global minimum always one of the most sampled structures; this finding
explains the rapid convergence of the global minimum energy shown
in Figure . However,
the lattice energy surfaces of most systems are more detailed than
that of CC1 (P1), with more low-lying energy minima,
and a less clear relationship between the energy of a local minimum
and the frequency of finding it in a search. For example, the structure
search for quinacridone in space group P1̅
leads to many energetically similar crystal structures, with similar
layered packings of the planar molecule. The first two structures
are found frequently, but we observe that it is much harder to locate
all of the others (Figure ). The differing frequencies of obtaining each minimum are
difficult to explain, as only 6.68 kJ/mol separate the set, and they
are structurally very similar; as will be noted later, we also have
concerns with regard to our force-field for this system.It
is those crystal structures that are located infrequently that
are most concerning, as they could easily be missed if sampling is
stopped too early, and most systems that we studied have such structures
on their landscapes. The fourth lowest energy structure in CC1 (P21/c), with a relative lattice
energy of 4.15 kJ/mol is one such example (Figure b), as are structures 4 (1.81 kJ/mol) and
8–10 for quinacridone in P1̅ (Figure b) and, to a lesser
extent, structures 2 (0.58 kJ/mol), 5 (2.27 kJ/mol), and 7 (2.72 kJ/mol)
for artemisinin (Figure d). In most cases, the rate at which these challenging structures
are found decreases as TVP is increased, meaning that the rejection-based
methods with large unit cell volumes have a high risk of missing some
low energy structures. An advantage of the SAT-expand method is that
a range of initial volumes are covered during the search, and we find
this method performs well on the challenging, infrequently sampled
crystal structures.Hits to the 10 lowest ranked crystal structures of quinacridone P1̅ based on the combined complete search of the
five methods. Each point represents a lattice energy minimization
from a trial structure, showing the step in the Sobol sequence where
the trial structure was generated and the lattice energy minimum to
which it optimizes.Another way of examing
the sampling of low energy structures is
to keep track of where each occurrence of each low energy crystal
structure was generated in the original Sobol sequence (Figure ). This representation reassures
us that the Sobol sequence is evenly exploring the configurational
space, as the points leading to each low energy structure are evenly
distributed along the series. This representation of how well the
low energy structures are sampled is useful for monitoring a calculation
as it proceeds, since it provides an immediate picture of the state
of completeness. Again, we clearly see that increasing TVP hinders
the sampling of some low energy structures (Figure b) and that a more even sampling is achieved
with the SAT-expand method (Figure c).
Figure 7
Hits to the 10 lowest ranked crystal structures of quinacridone P1̅ based on the combined complete search of the
five methods. Each point represents a lattice energy minimization
from a trial structure, showing the step in the Sobol sequence where
the trial structure was generated and the lattice energy minimum to
which it optimizes.
Convergence
and sampling of the quinacridone P1̅ Z′
= 2 search for crystal structures. a) The number
of unique structures within 15 kJ/mol of the global minimum as a function
of the total number of successful lattice energy minimizations, using
each variation of the structure generation method. b) Hits of the
10 lowest energy crystal structures througout the Sobol sequence,
using the SAT-expand method.
Multiple Independent Molecules
Our calculations on Z′= 2 crystal
structures
of quinacridone demonstrate the increased difficulty of predicting
crystal structures with multiple independent molecules in the asymmetric
unit. The inclusion of a second independent molecule can greatly increase
the number of Sobol seeds needed to generate the desired number of
crystal structures. This can be seen in Table , where far higher values in the Sobol sequence
must be used for quinacridone P1̅ Z′ = 2 across all TVP values compared to the same
space group with Z′ = 1. The larger Sobol
sequences are needed as a large proportion of structures is rejected
due to overlap of molecules. This is a particular problem with TVP
= 1.0; large numbers of rejected structures lead to much more time
spent on the structure generation as a whole.The other challenging
aspect of Z′ = 2 is the higher dimensionality
of the energy surface and, hence, the smaller relative volume of configurational
space that is expected to lattice energy minimize to any particular
crystal structure. Although the number of unique Z′ = 2 crystal structures in the low energy region is small,
some of our searches do not find the full set of structures until
well over 10000 lattice energy minimizations have been completed (Figure ); the TVP = 2.0
search has not located one of the low energy structures, even as 50000
lattice energy minimizations are approached. The SAT-expand method
performs well in finding all low energy structures in a relatively
low number of lattice energy minimizations but still suffers from
very infrequent sampling of some structures (Figure b).
Figure 8
Convergence
and sampling of the quinacridone P1̅ Z′
= 2 search for crystal structures. a) The number
of unique structures within 15 kJ/mol of the global minimum as a function
of the total number of successful lattice energy minimizations, using
each variation of the structure generation method. b) Hits of the
10 lowest energy crystal structures througout the Sobol sequence,
using the SAT-expand method.
A second polymorph is known for
artemisinin, with four independent
molecules in the asymmetric unit. Ensuring complete searches of high Z′ structures is known to be difficult[67] due to the very high dimensionality of search
space, and we know of few previous studies which have successfully
located Z′ = 3[68] and Z′ = 4[69] polymorphs
in CSP studies. As a test of our methods, we generated 50000 structures
in the relevant space group, P1 with Z′ = 4, and found the known crystal structure to be the lowest
energy structure of all. However, there were only 3 matches to the
experimentally observed structure from 38510 valid lattice energy
minimizations. As with Z′ = 2 quinacridone,
the search required a large number of steps in the Sobol sequence
(Table ), due to a
high proportion of unphysical structures. The results demonstrate
that high Z′ CSP is possible, albeit challenging.
Full Searches
The SAT-expand method,
with a maximum volume parameter of 2.5, showed some of the best characteristics
in the above tests and was used in an extended search across a range
of space groups. Currently, 95 space groups are available to be searched
in the Glee program, but we restrict ourselves here to a
subset of the most commonly observed symmetries for organic molecular
crystals. For chiral, enantiomerically pure artemisinin, 5000 structures
were generated in each of 8 space groups (P1, P21, C2, P21212, P212121, C2221, P41212, and R3). These space
groups were searched for CC1 and quinacridone, in addition to P1̅, Cc, P21/c, C2/c, Pna21, Pbcn, Pbca, and Pnma,
all with 5000 accepted structures. All calculations were performed
with one molecule in the asymmetric unit, except for the case of a
search for the Z′ = 4 polymorph of artemisinin
which has been included as a special example. Lattice energy minimization
and clustering were performed using the same procedures as have been
employed throughout.Lattice energy vs density plots for artemisinin, quinacridone,
and CC1. Each point corresponds to a distinct crystal structure (a
unique minimum on the PES). For the case of quinacridone, two sets
of data have been calculated, and the basis set used in generating
the electrostatic model is included in parentheses in the subcaption.
The α polymorph was located at too high a lattice energy to
appear on the graph in the case of the 6-311G** basis set. For artemisinin
the Z′ = 4 structure is added. Predicted structures
that geometrically match the experimental structures (see Table ) are circled and
labeled.
Table 2
Matches from the Full CSP to Experimentally
Determined Structures of the Observed Polymorphsa
cell
lengths
cell
angles
crystal structure
a
b
c
α
β
γ
RMSD30
artemisinin (P212121)
expt
24.066
9.439
6.354
90.00
90.00
90.00
pred
24.456
9.399
6.386
90.00
90.00
90.00
0.131
artemisinin (P1) ,Z′ = 2
expt
9.881
9.891
15.343
93.28
90.92
102.99
pred
9.892
10.020
15.164
90.81
93.64
102.32
0.247
CC1 (R3, β′)
expt
21.015
21.015
10.491
90.00
90.00
120.00
pred
21.623
21.602
10.851
90.02
90.02
119.98
0.603
CC1 (P21/c, α′)
expt
12.810
10.910
36.810
90.00
97.49
90.00
pred
13.425
11.156
37.761
90.00
94.45
90.00
0.812
quinacridone (P21/c, γ)
expt
13.697
3.881
13.402
90.00
100.44
90.00
pred (6-31G**)
12.847
4.251
13.370
90.00
97.08
90.00
0.288
pred (6-311G**)
13.397
4.115
13.002
90.00
98.21
90.00
0.439
quinacridone (P21/c, β)
expt
5.692
3.975
30.020
90.00
96.76
90.00
pred (6-31G**)
5.746
4.110
29.565
90.00
93.90
90.00
0.369
pred (6-311G**)
8.972
5.296
34.750
90.00
123.27
90.00
0.492
quinacridone (P1̅, αI)
expt
3.802
6.612
14.485
100.68
94.40
102.11
pred (6-31G**)
4.331
6.203
13.632
97.49
97.07
98.00
0.451
pred (6-311G**)
4.620
6.372
12.530
95.25
97.82
103.62
1.245
quinacridone (P1̅, αII)
expt
14.934
3.622
12.935
91.39
107.13
92.84
pred (6-31G**)
13.684
4.369
13.239
90.00
115.39
90.00
0.219
pred (6-311G**)
13.397
4.115
13.002
90.00
98.21
90.00
1.019
RMSD30 is the deviation
in atomic positions of a cluster of 30 molecules taken from predicted
and experimental structures, not including hydrogen atoms. CC1 (R3) was generated in the P1 space group,
which reduces to R3 on account of intramolecular
symmetry, hence the cell angles differ at the second decimal place.
The experimental structures of CC1 also contained residual solvent,
which was removed for purposes of comparison. All structures were
converted to their reduced unit cell for comparison. Å and degrees
are used throughout.
Results are summarized in Figure , where each structure
is represented by its calculated
lattice energy and density. A central assumption of crystal structure
prediction by global lattice energy minimization is that the most
likely structure to be observed experimentally is that with the lowest
free energy of formation. Although free energy contributions associated
with the dynamics of molecules about the equilibrium positions can
be significant,[27,70] in this study we have focused
on the lattice energy, which is the largest contribution to the free
energy difference between crystal structures. The existence of polymorphs
indicates that the process of crystallization is more subtle than
a simple drive toward the single lowest lattice energy structure,
but our methodology is predicated upon the assumption that all solvent-free,
stable crystal structures can be located in a set of low-lying, lattice
energy ordered structures determined from a quasi-random search.
Figure 9
Lattice energy vs density plots for artemisinin, quinacridone,
and CC1. Each point corresponds to a distinct crystal structure (a
unique minimum on the PES). For the case of quinacridone, two sets
of data have been calculated, and the basis set used in generating
the electrostatic model is included in parentheses in the subcaption.
The α polymorph was located at too high a lattice energy to
appear on the graph in the case of the 6-311G** basis set. For artemisinin
the Z′ = 4 structure is added. Predicted structures
that geometrically match the experimental structures (see Table ) are circled and
labeled.
For artemisinin, we find that the second lowest energy structure
from the full search of Z′ = 1 structures
corresponds to the known crystal form (Figure a), to within 1.6% in lattice dimensions
(Table ). The structure
is only 0.15 kJ/mol above the global minimum within the constraint
of Z′ = 1. As described above, the second
known artemisinin polymorph, in P1 with Z′ = 4, was located in our search, as the global minimum in P1 with Z′ = 4 and lower in energy
than any other crystal structure that was generated in our full search
(Figure a). These
results suggest that this low symmetry crystal structure results from
lowering the lattice energy, rather than being kinetically trapped
as an “incomplete“ crystallization.[71]Our results for quinacridone were surprisingly sensitive
to the
basis set used in generating the electrostatic model for intermolecular
interactions. Among the quinacridone structures generated in the full
search using B3LYP/6-311G** electrostatics, the γ polymorph
(encircled Figure c) is the lowest energy experimentally known structure located, being
the fifth lowest structure in energy 3.4 kJ/mol above the global minimum.
The β (encircled Figure c) and α polymorphs sit
at 9.9 and 16.9 kJ/mol above the global minimum, respectively. These
energy rankings are surprisingly high, and there is no reason to believe
that these polymorphs are truly high energy crystal forms. Furthermore,
the predicted structures are geometrically in fairly poor agreement
with the structures determined from X-ray diffraction (Table ).To investigate the
sensitivity of these results to the electrostatic
model used in the force field model, all predicted crystal structures
were reoptimized using atomic multipoles derived from a smaller basis
set (6-31G**). The ranking of observed structures within the predictions
changes significantly; β is now the global minimum (encircled Figure d) with γ and
α 3.4 and 5.68 kJ/mol above the
minimum. While no full search was performed for Z′ = 2, the proposed structure of α has been located in each of the preliminary CSP searches that
have been used above to analyze the performance of TVP values and
the SAT-expand method, with both 6-311G** and 6-31G** basis sets.
However, as reported by Paulus et al.,[72] we observe that the Z′ = 2 α structure relaxes to the Z′
= 1 γ structure during lattice energy minimization. These two
polymorphs seem to correspond to the same minimum on the lattice energy
surface.RMSD30 is the deviation
in atomic positions of a cluster of 30 molecules taken from predicted
and experimental structures, not including hydrogen atoms. CC1 (R3) was generated in the P1 space group,
which reduces to R3 on account of intramolecular
symmetry, hence the cell angles differ at the second decimal place.
The experimental structures of CC1 also contained residual solvent,
which was removed for purposes of comparison. All structures were
converted to their reduced unit cell for comparison. Å and degrees
are used throughout.Throughout
this study we have maintained the same methodology for
generating a force field for all systems, but while the success has
been fairly strong for the cases of artemisinin and CC1, it was not
so for quinacridone. Previous work by Leusen[72] located the known polymorphs of quinacridone among the lowest structures
generated and minimized with a simple, isotropic atom force field,
but Kraft has demonstrated the importance of anisotropic force fields
in modeling the PES of polyaromatic hydrocarbons.[73] We include an ab initio anisotropic electrostatic
model, but it is highly sensitive to the underlying DFT calculations.
The results for quinacridone demonstrate important differences in
how the atomic multipoles model intermolecular electrostatics, which
may be due to the strong basis set dependence of the original distributed
multipole analysis algorithm that we employed here.[74] When we change the basis set used for the electrostatic
model, we use a empirically fitted exp-6 parameter set for all other
intermolecular interactions, although not one that is fitted to polyaromatic
hydrocarbons specifically. Even so, we would not expect such large
changes in lattice energy and polymorph ordering as have been observed
in this case. We also note that, even at the level of Hückel
theory, the electronic structure of a π system will change as
multiple rings are fused together, and, of course, this molecule is
semiconducting in the solid state, indicating its unusual character.
From the literature on enhanced π van der Waals interactions
in aromatics, Grimme has suggested that stronger dispersion effects
arise in systems beyond three fused rings,[75] which would include quinacridone, but few systems from which our
potential has been fitted. This study has, at the least, highlighted
the need for more work to produce a transferable, accurate force field
for these systems, while also tested the search methodology in a difficult
case.Finally, the results for CC1 agree with our earlier study
of this
molecule,[21] which used a Monte Carlo simulated
annealing approach to generating trial crystal structures. The two
polymorphs are found in the low energy region of the landscape and
are good geometrical matches to the structures determined by X-ray
diffraction. The crystal structures of this organic cage are obtained
by desolvation of solvate structures in which guest solvent molecules
fill the voids within and between cage molecules. The structure-directing
effect of the included solvent has been shown to be so strong that
polymorph transformation can be achieved through exposure to solvent
vapor.[46] In this situation of strong solvent
directing effects, it is unsurprising that the observed structures
do not correspond to the lowest energy possibilities on the solvent-free
energy landscape. Indeed, re-evaluation of the energies of the predicted
structures of CC1 using dispersion-corrected solid state DFT shows
little energetic reranking,[21] providing
further confidence in the force field based relative energies.
Conclusions
This paper outlines a method of generating
trial structures of
molecular crystals, which is an essential part of an ab initio crystal structure prediction methodology. A core idea of our methodology
is to consider the shape of the molecules but to use as few other
restrictions as possible in our quasi-random search. We have demonstrated
that this is an effective method of determining the full set of low
energy crystal packing possibilities of a molecule, which includes
the experimentally observed polymorphs in the cases that we have studied
here. We find that the global lattice energy minimum is typically
located early in a search and sampled frequently throughout a quasi-random
search. This is an important finding, as it suggests that short quasi-random
searches can be applied to rapidly evaluate a molecule’s crystal
packing preferences, which can be extended to complete, converged
searches if desired. However, we also find that some low energy crystal
structures are more infrequently sampled, making the frequency of
locating such difficult structures rate-limiting when a complete crystal
structure search is required.We examined the influence of increasing
the target unit cell volume
in generating trial crystal structures, but large target volumes led
to less reliable sampling of structures. Our use of the separating
axis theorem allows us to quickly rule out unphysical trial structures,
and by expanding the cell to relieve clashes, we can keep a larger
proportion of trial structures. This is important, as the Sobol sequence
is designed to cover the manifold of random numbers in an efficient
way and helps us to rapidly consider a wide range of potential structures
in the search. This SAT-expand approach has the best characterists,
overall, of the variations on our method that we have investigated.This tool not only provides us with a method to conduct a crystal
structure prediction study for rigid molecules but also provides a
platform upon which further functionality can be built. We have a
robust method of exploring the PES that we show to be effective for
a set of different molecules and structures with multiple molecules
in the asymmetric unit (Z′ = 2 and 4). The search methodology
also finds polymorphs whose crystallization is determined by solvent
templating rather than the principle of close packing (CC1) or even
in a case for which our energy model is not optimized for a particular
case (quinacridone). The principle of a pseudorandom number search,
when coupled to our code base, provides us with a lot of flexibility.
We are currently extending this methodology to include further functionality,
such as molecular flexibility, and are also incorporating these tools
into high-througput screening of molecules for discovering molecular
crystals with targeted properties.
Authors: Graeme M Day; Timothy G Cooper; Aurora J Cruz-Cabeza; Katarzyna E Hejczyk; Herman L Ammon; Stephan X M Boerrigter; Jeffrey S Tan; Raffaele G Della Valle; Elisabetta Venuti; Jovan Jose; Shridhar R Gadre; Gautam R Desiraju; Tejender S Thakur; Bouke P van Eijck; Julio C Facelli; Victor E Bazterra; Marta B Ferraro; Detlef W M Hofmann; Marcus A Neumann; Frank J J Leusen; John Kendrick; Sarah L Price; Alston J Misquitta; Panagiotis G Karamertzanis; Gareth W A Welch; Harold A Scheraga; Yelena A Arnautova; Martin U Schmidt; Jacco van de Streek; Alexandra K Wolf; Bernd Schweizer Journal: Acta Crystallogr B Date: 2009-03-16
Authors: Tomokazu Tozawa; James T A Jones; Shashikala I Swamy; Shan Jiang; Dave J Adams; Stephen Shakespeare; Rob Clowes; Darren Bradshaw; Tom Hasell; Samantha Y Chong; Chiu Tang; Stephen Thompson; Julia Parker; Abbie Trewin; John Bacsa; Alexandra M Z Slawin; Alexander Steiner; Andrew I Cooper Journal: Nat Mater Date: 2009-10-25 Impact factor: 43.841
Authors: Andrei V Kazantsev; Panagiotis G Karamertzanis; Claire S Adjiman; Constantinos C Pantelides; Sarah L Price; Peter T A Galek; Graeme M Day; Aurora J Cruz-Cabeza Journal: Int J Pharm Date: 2011-04-08 Impact factor: 5.875
Authors: Sarah L Price; Maurice Leslie; Gareth W A Welch; Matthew Habgood; Louise S Price; Panagiotis G Karamertzanis; Graeme M Day Journal: Phys Chem Chem Phys Date: 2010-07-07 Impact factor: 3.676
Authors: Hiyam Hamaed; Jenna M Pawlowski; Benjamin F T Cooper; Riqiang Fu; S Holger Eichhorn; Robert W Schurko Journal: J Am Chem Soc Date: 2008-07-26 Impact factor: 15.419
Authors: Anthony M Reilly; Richard I Cooper; Claire S Adjiman; Saswata Bhattacharya; A Daniel Boese; Jan Gerit Brandenburg; Peter J Bygrave; Rita Bylsma; Josh E Campbell; Roberto Car; David H Case; Renu Chadha; Jason C Cole; Katherine Cosburn; Herma M Cuppen; Farren Curtis; Graeme M Day; Robert A DiStasio; Alexander Dzyabchenko; Bouke P van Eijck; Dennis M Elking; Joost A van den Ende; Julio C Facelli; Marta B Ferraro; Laszlo Fusti-Molnar; Christina Anna Gatsiou; Thomas S Gee; René de Gelder; Luca M Ghiringhelli; Hitoshi Goto; Stefan Grimme; Rui Guo; Detlef W M Hofmann; Johannes Hoja; Rebecca K Hylton; Luca Iuzzolino; Wojciech Jankiewicz; Daniël T de Jong; John Kendrick; Niek J J de Klerk; Hsin Yu Ko; Liudmila N Kuleshova; Xiayue Li; Sanjaya Lohani; Frank J J Leusen; Albert M Lund; Jian Lv; Yanming Ma; Noa Marom; Artëm E Masunov; Patrick McCabe; David P McMahon; Hugo Meekes; Michael P Metz; Alston J Misquitta; Sharmarke Mohamed; Bartomeu Monserrat; Richard J Needs; Marcus A Neumann; Jonas Nyman; Shigeaki Obata; Harald Oberhofer; Artem R Oganov; Anita M Orendt; Gabriel I Pagola; Constantinos C Pantelides; Chris J Pickard; Rafal Podeszwa; Louise S Price; Sarah L Price; Angeles Pulido; Murray G Read; Karsten Reuter; Elia Schneider; Christoph Schober; Gregory P Shields; Pawanpreet Singh; Isaac J Sugden; Krzysztof Szalewicz; Christopher R Taylor; Alexandre Tkatchenko; Mark E Tuckerman; Francesca Vacarro; Manolis Vasileiadis; Alvaro Vazquez-Mayagoitia; Leslie Vogt; Yanchao Wang; Rona E Watson; Gilles A de Wijs; Jack Yang; Qiang Zhu; Colin R Groom Journal: Acta Crystallogr B Struct Sci Cryst Eng Mater Date: 2016-08-01
Authors: Angeles Pulido; Linjiang Chen; Tomasz Kaczorowski; Daniel Holden; Marc A Little; Samantha Y Chong; Benjamin J Slater; David P McMahon; Baltasar Bonillo; Chloe J Stackhouse; Andrew Stephenson; Christopher M Kane; Rob Clowes; Tom Hasell; Andrew I Cooper; Graeme M Day Journal: Nature Date: 2017-03-22 Impact factor: 49.962
Authors: Peng Cui; David P McMahon; Peter R Spackman; Ben M Alston; Marc A Little; Graeme M Day; Andrew I Cooper Journal: Chem Sci Date: 2019-09-17 Impact factor: 9.825
Authors: Doris E Braun; Sreenivas R Lingireddy; Mark D Beidelschies; Rui Guo; Peter Müller; Sarah L Price; Susan M Reutzel-Edens Journal: Cryst Growth Des Date: 2017-09-07 Impact factor: 4.076
Authors: Kecheng Jie; Ming Liu; Yujuan Zhou; Marc A Little; Angeles Pulido; Samantha Y Chong; Andrew Stephenson; Ashlea R Hughes; Fumiyasu Sakakibara; Tomoki Ogoshi; Frédéric Blanc; Graeme M Day; Feihe Huang; Andrew I Cooper Journal: J Am Chem Soc Date: 2018-05-24 Impact factor: 15.419
Authors: David P McMahon; Andrew Stephenson; Samantha Y Chong; Marc A Little; James T A Jones; Andrew I Cooper; Graeme M Day Journal: Faraday Discuss Date: 2018-10-26 Impact factor: 4.008