Genki Imam Prayogo1, Andrea Tirelli2, Keishu Utimula1, Kenta Hongo3, Ryo Maezono4, Kousuke Nakano2,4. 1. School of Materials Science, JAIST, Asahidai 1-1, Nomi, Ishikawa 923-1292, Japan. 2. International School for Advanced Studies (SISSA), Via Bonomea 265, 34136 Trieste, Italy. 3. Research Center for Advanced Computing Infrastructure, JAIST, Asahidai 1-1, Nomi, Ishikawa 923-1292, Japan. 4. School of Information Science, JAIST, Asahidai 1-1, Nomi, Ishikawa 923-1292, Japan.
Abstract
A common approach for studying a solid solution or disordered system within a periodic ab initio framework is to create a supercell in which certain amounts of target elements are substituted with other elements. The key to generating supercells is determining how to eliminate symmetry-equivalent structures from many substitution patterns. Although the total number of substitutions is on the order of trillions, only symmetry-inequivalent atomic substitution patterns need to be identified, and their number is far smaller than the total. Our developed Python software package, which is called Shry (Suite for High-throughput generation of models with atomic substitutions implemented by Python), allows the selection of only symmetry-inequivalent structures from the vast number of candidates based on the canonical augmentation algorithm. Shry is implemented in Python 3 and uses the CIF format as the standard for both reading and writing the reference and generated sets of substituted structures. Shry can be integrated into another Python program as a module or can be used as a stand-alone program. The implementation was verified through a comparison with other codes with the same functionality, based on the total numbers of symmetry-inequivalent structures, and also on the equivalencies of the output structures themselves. The provided crystal structure data used for the verification are expected to be useful for benchmarking other codes and also developing new algorithms in the future.
A common approach for studying a solid solution or disordered system within a periodic ab initio framework is to create a supercell in which certain amounts of target elements are substituted with other elements. The key to generating supercells is determining how to eliminate symmetry-equivalent structures from many substitution patterns. Although the total number of substitutions is on the order of trillions, only symmetry-inequivalent atomic substitution patterns need to be identified, and their number is far smaller than the total. Our developed Python software package, which is called Shry (Suite for High-throughput generation of models with atomic substitutions implemented by Python), allows the selection of only symmetry-inequivalent structures from the vast number of candidates based on the canonical augmentation algorithm. Shry is implemented in Python 3 and uses the CIF format as the standard for both reading and writing the reference and generated sets of substituted structures. Shry can be integrated into another Python program as a module or can be used as a stand-alone program. The implementation was verified through a comparison with other codes with the same functionality, based on the total numbers of symmetry-inequivalent structures, and also on the equivalencies of the output structures themselves. The provided crystal structure data used for the verification are expected to be useful for benchmarking other codes and also developing new algorithms in the future.
Atomic substitution
in solids is the most common strategy in materials
science to tune material properties for applications such as alloying
and cation/anion/vacancy doping, which often demonstrate unprecedented
functionalities superior to nonsubstituted materials.[1] Within the scientific and technological communities, there
is a considerable demand for the ability to predict the extent to
which substitutions affect material properties using ab initio simulations, as well as the ability to evaluate whether substitutions
are useful for achieving the desired properties. For instance, if
we consider vacancy-type defects as a form of substitution (by a vacancy),
this demand covers another industrially important problem: understanding
how defects affect material properties.[2,3] This problem
arises during the evaluation of sample qualities, damages, and degradation. Ab initio calculations based on density functional theory
(DFT) are the most promising framework for such an assessment; they
utilize the microscopic structure models of substitutions.There
are many strategies for studying disordered crystals or solid
solutions within the DFT framework. If one considers disorder effects
at the DFT level with the preservation of periodicity of a target
crystal, virtual crystal approximation (VCA)[4] or coherent potential approximation (CPA)[5] combined with the Korringa–Kohn–Rostoker (KKR) method[6,7] are good choices. The former simulates disordered crystals or solid
solutions by mixing (all-electron or pseudo) potentials of the constituent
elements based on the composition, whereas the latter utilizes an
effective potential constructed from the Green’s function of
the effective medium. However, because both methods require special
implementations, not all ab initio codes can perform
them. The so-called supercell method is a considerably
simpler yet very powerful strategy. In this method, substitutions
are performed within the supercell of the original (i.e., unsubstituted)
unit cell. While the supercell approach is simple and applicable for
any ab initio framework, an intrinsic problem is
that the approach introduces an artificial periodic
boundary condition that does not exist in real materials. A direct
approach to alleviate this artificial periodicity is to increase the
size of the supercell. Although a larger supercell inevitably increases
computational cost, recent advances in high-performance computing
allow us to handle it even at the phonon analysis level.[8,9] Further, a larger supercell affords a finer resolution of substitution
concentration compared to an unsubstituted unit cell because only
discrete substitutions are allowed in the supercell approach.[10] In addition, more sophisticated methods to alleviate
the artificial periodicity have been devised. One such successful
method is the method of quasi-random structures (SQS),[11,12] which finds the closest periodic supercell to a genuine disordered
state based on correlation functions. If the thermodynamical properties
are in focus, the cluster expansion method with Monte Carlo sampling[13] is another very effective technique for studying
disordered crystals.[14] Several schemes
have been developed[15−18] to correct spurious interactions between supercell images for studying
charged defects in crystals more quantitatively. These methods ensure
that the supercell approach is effective and reliable. In this paper,
we focus on the supercell approach.Within the supercell approach,
a solid solution or disordered system
is modeled with a supercell in which certain amounts of target elements
are substituted with other elements. The number of possible atomic
substitution patterns grows combinatorically with
the substitution concentration and size of the supercell; this readily
causes technical issues in the ab initio framework.
A simple but powerful solution is to randomly select several configurations
from the vast number of the possible combinations.[19,20] With an efficient implementation, this random sampling method is
not limited by the number of configurations.[20] Further, the method is easy to implement. In this paper, we focus
on an exhaustive approach of exploring all the symmetry-inequivalent
structures. Indeed, many configurations in the redundant structure
set are related to crystallographic symmetry operations; however, ab initio simulations require only symmetry-inequivalent
structures. This implies that these redundant configurations
containing many symmetry-equivalent configurations can be further
categorized into those composed of only symmetry-inequivalent configurations. If the number of symmetry-inequivalent structures
is too large, one can also select several feasible configurations
after enumerating the symmetry-inequivalent structures according to
a criterion such as the Ewald energy.Several software packages
have been developed to address such vast
combinations of atomic substitution in a solid; examples include Supercell,[21]Materials Studio (i.e., Disorder tool),[22]enumlib[23] (also combined with Pymatgen (Python Materials Genomics)[24]), Crystal,[25,26] and Sod.[27] Recently, we developed a Python package called Shry (a Suite for High-throughput generation of models with
atomic substitutions implemented in Python). Shry allows
one to select only symmetry-inequivalent structures from a set of
redundant structures based on the canonical augmentation algorithm.[28]Shry is implemented in Python 3 and
uses the CIF format as the standard for both reading and writing the
reference and generated sets of substituted structures. Shry can be integrated into another Python program as a module or can
be used as a stand-alone program. In the following, we describe the
package specification, underlying method, implementation, and several
benchmark and validation tests.
Summary of Technical Details
Shry is implemented in Python 3 and uses the CIF format
as the standard for both reading and writing the reference and generated
sets of substituted structures. Structure abstractions, manipulations,
and symmetry analysis rely on Pymatgen(24) and spglib.[29] The key
point for the implementation is identifying and avoiding, among all
possible permutations, multiple instances of symmetrically identical
structures. The implementation of a variant of the canonical augmentation[30] algorithm is unique to Shry, which
allows symmetrically unique structures corresponding to the specified
substitute concentration to be recursively generated from the unique
structures. As a command line tool, a CIF format file containing site
occupation information can be used as an input structure. Shry writes all the identified symmetry-inequivalent structures as separate
CIF files. A user can also specify the maximum number of outputted
CIF files if the total number of structures is too large to write
all of them. The total number of symmetry-inequivalent structures
can be estimated a priori by Polya’s enumeration
theorem,[31,32] which provides the expected number of irreducible
structures. To enable a user to include Shry as a module
in their own Python scripts, we provide several examples of Python
scripts in the distribution, which also include interfaces for Shry with Pymatgen.Since our formulation described
in the Supporting Information is quite formal (but entirely self-contained),
we here present several hints on the properties of canonical augmentation.
The objective of Shry is to eliminate symmetry-equivalent
structures from the possible atomic substitution patterns. A straightforward
solution, classifying symmetry-inequivalent structures after determining
all possible configurations, is redundant and practically unfeasible
because the number of total combinations increases exponentially (i.e.,
combinatorial explosion). The key observation to alleviate the combinatorial
explosion problem is that the atomic substitution problem can be mapped
into a vertex coloring problem (Figures and 2). Here, we
consider a very simple case: the problem of coloring the eight vertices
of a cube with two colors, as shown in Figure . We color white vertices with red in a stepwise
manner until three of them become red. This is essentially equivalent
to the substitution in which three out of eight atoms on a Wyckoff
site are substituted with another element (e.g., see Figure ; Ce8Pd24Sb → (Ce5,La3)Pd24Sb, where
Ce and La atoms are on the 8g Wyckoff position and
the space group is 221-Pm3m). Then, the question of how many symmetry-inequivalent
structures exist in all the possible substitution patterns essentially
becomes equivalent to the question of how many symmetrically unique
coloring patterns exist in a given graph.
Figure 1
Example of the search
tree, where three vertices out of eight are
colored in red in a cube. p(X) denotes
the parent node of X, whereas C(X) denotes the set of children of X.
Figure 2
Atomic substitutions corresponding to Figure . Ce8Pd24Sb →
(Ce5,La3)Pd24Sb, where Ce and La
atoms are on the 8g Wyckoff site. The crystal structure[34] was obtained from the ICSD database[35,36] (CollCode: 83378). The space group is 221-Pm3m. The crystal structures are depicted
using VESTA.[37]
Example of the search
tree, where three vertices out of eight are
colored in red in a cube. p(X) denotes
the parent node of X, whereas C(X) denotes the set of children of X.Atomic substitutions corresponding to Figure . Ce8Pd24Sb →
(Ce5,La3)Pd24Sb, where Ce and La
atoms are on the 8g Wyckoff site. The crystal structure[34] was obtained from the ICSD database[35,36] (CollCode: 83378). The space group is 221-Pm3m. The crystal structures are depicted
using VESTA.[37]A search tree to find the unique coloring patterns can be constructed
as shown in Figure , where the search tree is truncated at the depth corresponding to
the number of colored vertices. The tree is traversed by a proper
algorithm such as the depth-first traversal procedure. The mapping
itself does not solve the combinational explosion problem, because
one visits all the nodes in the worst-case scenario. If one can truncate
large parts of the search tree at the upper level without visiting
lower nodes, the computational cost can be drastically decreased.
For instance, in Figure , since p(X) and p(Y) are symmetrically equivalent, all the children
of p(Y) such as Y and W can be disregarded without visiting these
nodes, if all the children associated with p(X) such as X and Z have
been already visited. This so-called isomorphic rejection(28) procedure is one of the key points
in the present study. Such truncations should be performed using only local information. Indeed, if one needs to store all the
information on the visited nodes for the truncation (i.e., nonlocal information is needed), the combinatorial explosion
problem still remains. The orderly generation and canonical augmentation
exploit the so-called canonicity for a node representation
to truncate a search tree using only local information (see the Supporting Information for the
mathematical definition). They are very powerful algorithms for the
classification of combinatorial structures, the main idea of which
is to construct only nonequivalent objects (namely, symmetry-inequivalent
structures in the atomic substitution problem) and thereby classify
such objects. The orderly generation and canonical augmentation were
first introduced by Read[33] and McKay,[30] respectively.[28] They
are similar algorithms in the sense that both exploit the canonicity.
However, they also show some significant differences. The orderly
generation requires that all nodes should themselves be canonical.[28] In contrast, the canonical augmentation does
not require the canonicity of nodes themselves; rather, it requires
a search tree to be generated in a canonical manner.[28] This feature of the canonical augmentation
allows us to occasionally avoid the computation of time-consuming
canonicity tests in the tree search. In the Supporting Information, we outline the mathematical details of the algorithms
implemented in Shry.
Comparison with Existing Software Packages
Since the atomic substitution problem is very general and important,
several software packages have already been developed. To our knowledge,
the site-occupancy disorder (Sod) package is the pioneering
work in this field.[27] The software package
has been applied for many disordered systems since 2007. Another open-source
package with similar functionalities is the enumlib package.[23] The code enables us to generate derivative superstructures
of a parent lattice in a considerably efficient manner, and it has
been integrated into Pymatgen.[24]Supercell[21] is the most recent
and most sophisticated software package in the field. The package
implements considerably efficient algorithms for the atomic substitution
problem. The atomic substitution problem can be solved using several
commercial software packages. The Disorder tool implemented
in Materials Studio(22) can generate
solid solutions with a very user-friendly graphical user interface
(GUI). Similar functionalities are implemented in a well-known DFT
code, Crystal program,[26,38] to treat disorder systems
within the periodic ab initio framework. Within the
existing solutions, Okhotonikov et al.[21] reported that only the Supercell package could handle cases
with a large number of permutations because the other programs crash
on such computationally demanding test cases. Therefore, for the benchmark
and validation tests, the relevant comparison is mostly between Shry ver. 1.1.0 and Supercell ver. 2.0.2.
Benchmark Test
APb1–Te is a typical benchmark system used for testing the performance
of an atomic substitution program. Table summarizes the results for this benchmark
test. Since Shry is implemented in Python (i.e., an interpreted
language), it is intrinsically slower than the other tools implemented
in compiled programming languages such as C++. Note that the actual
computational time of Supercell is much smaller than its
CPU time because Supercell is multithreaded very well. In
contrast, Shry is not multithreaded at present.
Table 1
Total Number of Possible Atom Combinations
for Different Supercell Sizes of the A0.5Pb0.5Te Systema
Total number of atoms in simulation cell
8
16
24
32
64
Supercell size a × b × c
1 × 1 × 1
1 × 1 × 2
1 × 1 × 3
1 × 2 × 2
2 × 2 × 2
Symmetry operations
192
128
192
256
1536
Total combinations (symmetry-inequivalent
structures)
6(1)
70(8)
924 (34)
12,870 (153)
601,080,390 (404,582)
Performance of Shry (s)
2.85(1.21)
1.70(1.17)
1.78(1.27)
1.85(1.39)
161.91(160.99)
Performance of Supercell (s)
0.03(0.01)
0.03(0.02)
0.04(0.02)
0.06(0.04)
8.40(200.72)
The computational
times of Shry and Supercell were measured on an
Intel Xeon G-6242
(2.80 GHz, 16 cores, 32 threads) processor using the time command, where the time-consuming I/O operations were disabled.
Both real and user (in parentheses) times are shown in the Shry and Supercell rows. The distributed binary Supercell program was used for the test.
The computational
times of Shry and Supercell were measured on an
Intel Xeon G-6242
(2.80 GHz, 16 cores, 32 threads) processor using the time command, where the time-consuming I/O operations were disabled.
Both real and user (in parentheses) times are shown in the Shry and Supercell rows. The distributed binary Supercell program was used for the test.We benchmarked Shry and Supercell for a variety
of systems. For the benchmark data set, we randomly chose several
CIF files from the Crystallography Open Database (COD)[39] for each space group, as listed in Table S2 (Supporting Information), and we generated CIF files containing up to three atomic substitution
sites. The supercell size and atomic substitution sites were randomly
and repeatedly selected for each compound until the total number of
symmetry-inequivalent structures becomes less than 109.
The number of compounds used for the benchmark test is 500. Table S2 lists the compounds, unique IDs in the
database, and other relevant information used for the benchmark test.
Further, we report the sizes of supercells, total numbers of substituted
structures, and numbers of symmetry-inequivalent structures. The computational
times required to find all symmetry-inequivalent structures are summarized
in the rightmost columns of Table S2. We
confirmed that both the number of substitution patterns including
symmetry-equivalent structures and the number of symmetry-inequivalent
structures are consistent between Shry and Supercell for all compounds listed in Table S2 with
the given substitution patterns. We show the computational times of Shry on the benchmark set in Figure . The computational time is almost constant
up to N ∼ 104 in the small N region, where N refers to the number
of symmetrically distinct configurations. This is because preconditioning
steps, such as finding the symmetry operations of an unsubstituted
structure, are the rate-determining steps of the algorithms (e.g.,
the actual computational times are less than 3 s for N ≤ 104). Instead, the computational time in the
large N region increases as N increases;
however, it scales linearly up to N ∼ 109. Since Shry is purely implemented
in Python and not yet parallelized, the actual computational time
is longer than that of other packages realizing the same functionality,
such as Supercell shown in Figure S3. There are several possible strategies for speeding up the computation
in Shry, such as parallelizing the code or rewriting the
core parts with Cython or C++, which would be promising future works.
Figure 3
Computational
times of Shry on a benchmark data set. On
the horizontal axis, we place the number of symmetrically distinct
structures, whereas the vertical axis indicates the absolute computational
time.
Computational
times of Shry on a benchmark data set. On
the horizontal axis, we place the number of symmetrically distinct
structures, whereas the vertical axis indicates the absolute computational
time.
Validation Test
We compared not
only computational times and the numbers of structures
but also the redundancies and equivalencies of the generated structures
among the software packages. We performed both (a) intra-software
and (b) inter-software tests. (a) The intra-software code test checked whether each program generates zero redundant
structures. Thus, for each program, we investigated whether each generated
structure is symmetry inequivalent to all the others (Figure (a)). Brute-force comparisons were performed for this test (i.e., all the possible
combinations were investigated). (b) The inter-software code test checked whether the symmetry-inequivalent structure sets
generated by Shry and Supercell (or Disorder
tool) are identical. Thus, we built pairs of Shry and Supercell (or Disorder tool) structures and confirmed
that there is no excess or deficiency (Figure (b)).
Figure 4
Schematics of (a) intra-software and (b)
inter-software tests.
The intra-software test checked whether each program provides zero
redundant structures, where brute-force comparisons
were performed (i.e., all the possible combinations were considered).
The inter-software test checked whether the symmetry-inequivalent
structure sets generated by Shry and Supercell (Disorder tool) are identical; i.e., there is no excess or deficiency
among them. The crystal structures are depicted using VESTA.[37]
Schematics of (a) intra-software and (b)
inter-software tests.
The intra-software test checked whether each program provides zero
redundant structures, where brute-force comparisons
were performed (i.e., all the possible combinations were considered).
The inter-software test checked whether the symmetry-inequivalent
structure sets generated by Shry and Supercell (Disorder tool) are identical; i.e., there is no excess or deficiency
among them. The crystal structures are depicted using VESTA.[37]For the validation test,
we used a total of 600 compounds, which
were taken from COD.[39]Table S3 lists the compounds, unique IDs in the database,
and other relevant information used for the validation test. No discrepancy
was found among the three codes (Shry, Supercell, and Disorder tool). Both the intra-software and inter-software tests
were successful for all the 600 compounds with Shry and Supercell (Disorder tool was tested only for some cases;
see the Supporting Information). All the
used CIF files are uploaded in our git repository.
Conclusion
We developed a Python software package, Shry, that enables
the selection of only symmetry-inequivalent structures from a vast
number of candidates. This is very useful when employing the supercell
approach to study a solid solution or a disordered system in an ab initio calculation. Although we showed several examples
in this paper where some elements are substituted by other elements,
the application of Shry is not limited to them. For instance, Shry can also be applied for studying vacancies and charge disproportionation
in crystals within the supercell approach as far as symmetry-inequivalent
patterns are concerned. Notice that the magnetic structures (e.g.,
configurations of up and down spins) cannot be studied with Shry at present because the current implementation does not support Shubnikov
groups. There are several perspectives for the future development
of Shry. One major concern is the long absolute computational
time. Since Shry is purely implemented in Python and is not
multithreaded at present, the absolute computational time is not compatible
with other codes for large N. A possible solution
is parallelizing the code, which should be very efficient because
canonical augmentation can be performed independently. Another possibility
is rewriting the core parts of Shry with much a faster language,
such as C++, like other scientific packages. From the algorithmic
point of view, a great advantage of the canonical augmentation algorithm
used in Shry is the use of the subobject invariant (Supporting Information), which enables us to
avoid computing the time-consuming canonicity test. The present implementation,
i.e., the sum of the distance matrix, is sometimes not capable of
distinguishing objects belonging to different orbits on a subobject.
Finding a more distinguishable subject invariant is an interesting
direction for future work that must be pursued to fully exploit the
advantage of the canonical augmentation algorithm, realizing a more
efficient exhaustive search.
Data and Software Availability
The
Python software package, Shry, is distributed on our GitHub repository [https://github.com/giprayogo/SHRY] under the MIT license. The
package is archived in the Zenodo repository [10.5281/zenodo.5652360]. All the CIF files used for the benchmark and validation tests
are also available from our GitHub repository [https://github.com/giprayogo/SHRY], which is also archived in the Zenodo repository [10.5281/zenodo.5652360], and those of unsubstituted structures are available from the Crystallography
Open Database (COD)[39] [https://www.crystallography.net/cod/].
Authors: Philippe D'Arco; Sami Mustapha; Matteo Ferrabone; Yves Noël; Marco De La Pierre; Roberto Dovesi Journal: J Phys Condens Matter Date: 2013-08-02 Impact factor: 2.333
Authors: Sami Mustapha; Philippe D'Arco; Marco De La Pierre; Yves Noël; Matteo Ferrabone; Roberto Dovesi Journal: J Phys Condens Matter Date: 2013-02-06 Impact factor: 2.333
Authors: Roberto Dovesi; Fabien Pascale; Bartolomeo Civalleri; Klaus Doll; Nicholas M Harrison; Ian Bush; Philippe D'Arco; Yves Noël; Michel Rérat; Philippe Carbonnière; Mauro Causà; Simone Salustro; Valentina Lacivita; Bernard Kirtman; Anna Maria Ferrari; Francesco Silvio Gentile; Jacopo Baima; Mauro Ferrero; Raffaella Demichelis; Marco De La Pierre Journal: J Chem Phys Date: 2020-05-29 Impact factor: 3.488
Authors: Saulius Gražulis; Daniel Chateigner; Robert T Downs; A F T Yokochi; Miguel Quirós; Luca Lutterotti; Elena Manakova; Justas Butkus; Peter Moeck; Armel Le Bail Journal: J Appl Crystallogr Date: 2009-05-30 Impact factor: 3.304