Literature DB >> 35678099

Shry: Application of Canonical Augmentation to the Atomic Substitution Problem.

Genki Imam Prayogo¹, Andrea Tirelli², Keishu Utimula¹, Kenta Hongo³, Ryo Maezono⁴, Kousuke Nakano^2,4.

Abstract

A common approach for studying a solid solution or disordered system within a periodic ab initio framework is to create a supercell in which certain amounts of target elements are substituted with other elements. The key to generating supercells is determining how to eliminate symmetry-equivalent structures from many substitution patterns. Although the total number of substitutions is on the order of trillions, only symmetry-inequivalent atomic substitution patterns need to be identified, and their number is far smaller than the total. Our developed Python software package, which is called Shry (Suite for High-throughput generation of models with atomic substitutions implemented by Python), allows the selection of only symmetry-inequivalent structures from the vast number of candidates based on the canonical augmentation algorithm. Shry is implemented in Python 3 and uses the CIF format as the standard for both reading and writing the reference and generated sets of substituted structures. Shry can be integrated into another Python program as a module or can be used as a stand-alone program. The implementation was verified through a comparison with other codes with the same functionality, based on the total numbers of symmetry-inequivalent structures, and also on the equivalencies of the output structures themselves. The provided crystal structure data used for the verification are expected to be useful for benchmarking other codes and also developing new algorithms in the future.

Entities: Chemical

Mesh：
Algorithms
Software

Year: 2022 PMID： 35678099 PMCID： PMC9241080 DOI： 10.1021/acs.jcim.2c00389

Source DB: PubMed Journal: J Chem Inf Model ISSN： 1549-9596 Impact factor: 6.162

Introduction

Atomic substitution in solids is the most common strategy in materials science to tune material properties for applications such as alloying and cation/anion/vacancy doping, which often demonstrate unprecedented functionalities superior to nonsubstituted materials.[1] Within the scientific and technological communities, there is a considerable demand for the ability to predict the extent to which substitutions affect material properties using ab initio simulations, as well as the ability to evaluate whether substitutions are useful for achieving the desired properties. For instance, if we consider vacancy-type defects as a form of substitution (by a vacancy), this demand covers another industrially important problem: understanding how defects affect material properties.[2,3] This problem arises during the evaluation of sample qualities, damages, and degradation. Ab initio calculations based on density functional theory (DFT) are the most promising framework for such an assessment; they utilize the microscopic structure models of substitutions. There are many strategies for studying disordered crystals or solid solutions within the DFT framework. If one considers disorder effects at the DFT level with the preservation of periodicity of a target crystal, virtual crystal approximation (VCA)[4] or coherent potential approximation (CPA)[5] combined with the Korringa–Kohn–Rostoker (KKR) method[6,7] are good choices. The former simulates disordered crystals or solid solutions by mixing (all-electron or pseudo) potentials of the constituent elements based on the composition, whereas the latter utilizes an effective potential constructed from the Green’s function of the effective medium. However, because both methods require special implementations, not all ab initio codes can perform them. The so-called supercell method is a considerably simpler yet very powerful strategy. In this method, substitutions are performed within the supercell of the original (i.e., unsubstituted) unit cell. While the supercell approach is simple and applicable for any ab initio framework, an intrinsic problem is that the approach introduces an artificial periodic boundary condition that does not exist in real materials. A direct approach to alleviate this artificial periodicity is to increase the size of the supercell. Although a larger supercell inevitably increases computational cost, recent advances in high-performance computing allow us to handle it even at the phonon analysis level.[8,9] Further, a larger supercell affords a finer resolution of substitution concentration compared to an unsubstituted unit cell because only discrete substitutions are allowed in the supercell approach.[10] In addition, more sophisticated methods to alleviate the artificial periodicity have been devised. One such successful method is the method of quasi-random structures (SQS),[11,12] which finds the closest periodic supercell to a genuine disordered state based on correlation functions. If the thermodynamical properties are in focus, the cluster expansion method with Monte Carlo sampling[13] is another very effective technique for studying disordered crystals.[14] Several schemes have been developed[15−18] to correct spurious interactions between supercell images for studying charged defects in crystals more quantitatively. These methods ensure that the supercell approach is effective and reliable. In this paper, we focus on the supercell approach. Within the supercell approach, a solid solution or disordered system is modeled with a supercell in which certain amounts of target elements are substituted with other elements. The number of possible atomic substitution patterns grows combinatorically with the substitution concentration and size of the supercell; this readily causes technical issues in the ab initio framework. A simple but powerful solution is to randomly select several configurations from the vast number of the possible combinations.[19,20] With an efficient implementation, this random sampling method is not limited by the number of configurations.[20] Further, the method is easy to implement. In this paper, we focus on an exhaustive approach of exploring all the symmetry-inequivalent structures. Indeed, many configurations in the redundant structure set are related to crystallographic symmetry operations; however, ab initio simulations require only symmetry-inequivalent structures. This implies that these redundant configurations containing many symmetry-equivalent configurations can be further categorized into those composed of only symmetry-inequivalent configurations. If the number of symmetry-inequivalent structures is too large, one can also select several feasible configurations after enumerating the symmetry-inequivalent structures according to a criterion such as the Ewald energy. Several software packages have been developed to address such vast combinations of atomic substitution in a solid; examples include Supercell,[21]Materials Studio (i.e., Disorder tool),[22]enumlib[23] (also combined with Pymatgen (Python Materials Genomics)[24]), Crystal,[25,26] and Sod.[27] Recently, we developed a Python package called Shry (a Suite for High-throughput generation of models with atomic substitutions implemented in Python). Shry allows one to select only symmetry-inequivalent structures from a set of redundant structures based on the canonical augmentation algorithm.[28]Shry is implemented in Python 3 and uses the CIF format as the standard for both reading and writing the reference and generated sets of substituted structures. Shry can be integrated into another Python program as a module or can be used as a stand-alone program. In the following, we describe the package specification, underlying method, implementation, and several benchmark and validation tests.

Summary of Technical Details

Shry is implemented in Python 3 and uses the CIF format as the standard for both reading and writing the reference and generated sets of substituted structures. Structure abstractions, manipulations, and symmetry analysis rely on Pymatgen(24) and spglib.[29] The key point for the implementation is identifying and avoiding, among all possible permutations, multiple instances of symmetrically identical structures. The implementation of a variant of the canonical augmentation[30] algorithm is unique to Shry, which allows symmetrically unique structures corresponding to the specified substitute concentration to be recursively generated from the unique structures. As a command line tool, a CIF format file containing site occupation information can be used as an input structure. Shry writes all the identified symmetry-inequivalent structures as separate CIF files. A user can also specify the maximum number of outputted CIF files if the total number of structures is too large to write all of them. The total number of symmetry-inequivalent structures can be estimated a priori by Polya’s enumeration theorem,[31,32] which provides the expected number of irreducible structures. To enable a user to include Shry as a module in their own Python scripts, we provide several examples of Python scripts in the distribution, which also include interfaces for Shry with Pymatgen. Since our formulation described in the Supporting Information is quite formal (but entirely self-contained), we here present several hints on the properties of canonical augmentation. The objective of Shry is to eliminate symmetry-equivalent structures from the possible atomic substitution patterns. A straightforward solution, classifying symmetry-inequivalent structures after determining all possible configurations, is redundant and practically unfeasible because the number of total combinations increases exponentially (i.e., combinatorial explosion). The key observation to alleviate the combinatorial explosion problem is that the atomic substitution problem can be mapped into a vertex coloring problem (Figures and 2). Here, we consider a very simple case: the problem of coloring the eight vertices of a cube with two colors, as shown in Figure . We color white vertices with red in a stepwise manner until three of them become red. This is essentially equivalent to the substitution in which three out of eight atoms on a Wyckoff site are substituted with another element (e.g., see Figure ; Ce8Pd24Sb → (Ce5,La3)Pd24Sb, where Ce and La atoms are on the 8g Wyckoff position and the space group is 221-Pm3m). Then, the question of how many symmetry-inequivalent structures exist in all the possible substitution patterns essentially becomes equivalent to the question of how many symmetrically unique coloring patterns exist in a given graph.

Figure 1

Example of the search tree, where three vertices out of eight are colored in red in a cube. p(X) denotes the parent node of X, whereas C(X) denotes the set of children of X.

Figure 2

Atomic substitutions corresponding to Figure . Ce8Pd24Sb → (Ce5,La3)Pd24Sb, where Ce and La atoms are on the 8g Wyckoff site. The crystal structure[34] was obtained from the ICSD database[35,36] (CollCode: 83378). The space group is 221-Pm3m. The crystal structures are depicted using VESTA.[37]

Example of the search tree, where three vertices out of eight are colored in red in a cube. p(X) denotes the parent node of X, whereas C(X) denotes the set of children of X. Atomic substitutions corresponding to Figure . Ce8Pd24Sb → (Ce5,La3)Pd24Sb, where Ce and La atoms are on the 8g Wyckoff site. The crystal structure[34] was obtained from the ICSD database[35,36] (CollCode: 83378). The space group is 221-Pm3m. The crystal structures are depicted using VESTA.[37] A search tree to find the unique coloring patterns can be constructed as shown in Figure , where the search tree is truncated at the depth corresponding to the number of colored vertices. The tree is traversed by a proper algorithm such as the depth-first traversal procedure. The mapping itself does not solve the combinational explosion problem, because one visits all the nodes in the worst-case scenario. If one can truncate large parts of the search tree at the upper level without visiting lower nodes, the computational cost can be drastically decreased. For instance, in Figure , since p(X) and p(Y) are symmetrically equivalent, all the children of p(Y) such as Y and W can be disregarded without visiting these nodes, if all the children associated with p(X) such as X and Z have been already visited. This so-called isomorphic rejection(28) procedure is one of the key points in the present study. Such truncations should be performed using only local information. Indeed, if one needs to store all the information on the visited nodes for the truncation (i.e., nonlocal information is needed), the combinatorial explosion problem still remains. The orderly generation and canonical augmentation exploit the so-called canonicity for a node representation to truncate a search tree using only local information (see the Supporting Information for the mathematical definition). They are very powerful algorithms for the classification of combinatorial structures, the main idea of which is to construct only nonequivalent objects (namely, symmetry-inequivalent structures in the atomic substitution problem) and thereby classify such objects. The orderly generation and canonical augmentation were first introduced by Read[33] and McKay,[30] respectively.[28] They are similar algorithms in the sense that both exploit the canonicity. However, they also show some significant differences. The orderly generation requires that all nodes should themselves be canonical.[28] In contrast, the canonical augmentation does not require the canonicity of nodes themselves; rather, it requires a search tree to be generated in a canonical manner.[28] This feature of the canonical augmentation allows us to occasionally avoid the computation of time-consuming canonicity tests in the tree search. In the Supporting Information, we outline the mathematical details of the algorithms implemented in Shry.

Comparison with Existing Software Packages

Since the atomic substitution problem is very general and important, several software packages have already been developed. To our knowledge, the site-occupancy disorder (Sod) package is the pioneering work in this field.[27] The software package has been applied for many disordered systems since 2007. Another open-source package with similar functionalities is the enumlib package.[23] The code enables us to generate derivative superstructures of a parent lattice in a considerably efficient manner, and it has been integrated into Pymatgen.[24]Supercell[21] is the most recent and most sophisticated software package in the field. The package implements considerably efficient algorithms for the atomic substitution problem. The atomic substitution problem can be solved using several commercial software packages. The Disorder tool implemented in Materials Studio(22) can generate solid solutions with a very user-friendly graphical user interface (GUI). Similar functionalities are implemented in a well-known DFT code, Crystal program,[26,38] to treat disorder systems within the periodic ab initio framework. Within the existing solutions, Okhotonikov et al.[21] reported that only the Supercell package could handle cases with a large number of permutations because the other programs crash on such computationally demanding test cases. Therefore, for the benchmark and validation tests, the relevant comparison is mostly between Shry ver. 1.1.0 and Supercell ver. 2.0.2.

Benchmark Test

APb1–Te is a typical benchmark system used for testing the performance of an atomic substitution program. Table summarizes the results for this benchmark test. Since Shry is implemented in Python (i.e., an interpreted language), it is intrinsically slower than the other tools implemented in compiled programming languages such as C++. Note that the actual computational time of Supercell is much smaller than its CPU time because Supercell is multithreaded very well. In contrast, Shry is not multithreaded at present.

Table 1

Total Number of Possible Atom Combinations for Different Supercell Sizes of the A0.5Pb0.5Te Systema

Total number of atoms in simulation cell	8	16	24	32	64
Supercell size a × b × c	1 × 1 × 1	1 × 1 × 2	1 × 1 × 3	1 × 2 × 2	2 × 2 × 2
Symmetry operations	192	128	192	256	1536
Total combinations (symmetry-inequivalent structures)	6(1)	70(8)	924 (34)	12,870 (153)	601,080,390 (404,582)
Performance of Shry (s)	2.85(1.21)	1.70(1.17)	1.78(1.27)	1.85(1.39)	161.91(160.99)
Performance of Supercell (s)	0.03(0.01)	0.03(0.02)	0.04(0.02)	0.06(0.04)	8.40(200.72)

The computational times of Shry and Supercell were measured on an Intel Xeon G-6242 (2.80 GHz, 16 cores, 32 threads) processor using the time command, where the time-consuming I/O operations were disabled. Both real and user (in parentheses) times are shown in the Shry and Supercell rows. The distributed binary Supercell program was used for the test. We benchmarked Shry and Supercell for a variety of systems. For the benchmark data set, we randomly chose several CIF files from the Crystallography Open Database (COD)[39] for each space group, as listed in Table S2 (Supporting Information), and we generated CIF files containing up to three atomic substitution sites. The supercell size and atomic substitution sites were randomly and repeatedly selected for each compound until the total number of symmetry-inequivalent structures becomes less than 109. The number of compounds used for the benchmark test is 500. Table S2 lists the compounds, unique IDs in the database, and other relevant information used for the benchmark test. Further, we report the sizes of supercells, total numbers of substituted structures, and numbers of symmetry-inequivalent structures. The computational times required to find all symmetry-inequivalent structures are summarized in the rightmost columns of Table S2. We confirmed that both the number of substitution patterns including symmetry-equivalent structures and the number of symmetry-inequivalent structures are consistent between Shry and Supercell for all compounds listed in Table S2 with the given substitution patterns. We show the computational times of Shry on the benchmark set in Figure . The computational time is almost constant up to N ∼ 104 in the small N region, where N refers to the number of symmetrically distinct configurations. This is because preconditioning steps, such as finding the symmetry operations of an unsubstituted structure, are the rate-determining steps of the algorithms (e.g., the actual computational times are less than 3 s for N ≤ 104). Instead, the computational time in the large N region increases as N increases; however, it scales linearly up to N ∼ 109. Since Shry is purely implemented in Python and not yet parallelized, the actual computational time is longer than that of other packages realizing the same functionality, such as Supercell shown in Figure S3. There are several possible strategies for speeding up the computation in Shry, such as parallelizing the code or rewriting the core parts with Cython or C++, which would be promising future works.

Figure 3

Computational times of Shry on a benchmark data set. On the horizontal axis, we place the number of symmetrically distinct structures, whereas the vertical axis indicates the absolute computational time.

Validation Test

We compared not only computational times and the numbers of structures but also the redundancies and equivalencies of the generated structures among the software packages. We performed both (a) intra-software and (b) inter-software tests. (a) The intra-software code test checked whether each program generates zero redundant structures. Thus, for each program, we investigated whether each generated structure is symmetry inequivalent to all the others (Figure (a)). Brute-force comparisons were performed for this test (i.e., all the possible combinations were investigated). (b) The inter-software code test checked whether the symmetry-inequivalent structure sets generated by Shry and Supercell (or Disorder tool) are identical. Thus, we built pairs of Shry and Supercell (or Disorder tool) structures and confirmed that there is no excess or deficiency (Figure (b)).

Figure 4

Schematics of (a) intra-software and (b) inter-software tests. The intra-software test checked whether each program provides zero redundant structures, where brute-force comparisons were performed (i.e., all the possible combinations were considered). The inter-software test checked whether the symmetry-inequivalent structure sets generated by Shry and Supercell (Disorder tool) are identical; i.e., there is no excess or deficiency among them. The crystal structures are depicted using VESTA.[37] For the validation test, we used a total of 600 compounds, which were taken from COD.[39]Table S3 lists the compounds, unique IDs in the database, and other relevant information used for the validation test. No discrepancy was found among the three codes (Shry, Supercell, and Disorder tool). Both the intra-software and inter-software tests were successful for all the 600 compounds with Shry and Supercell (Disorder tool was tested only for some cases; see the Supporting Information). All the used CIF files are uploaded in our git repository.

Conclusion

We developed a Python software package, Shry, that enables the selection of only symmetry-inequivalent structures from a vast number of candidates. This is very useful when employing the supercell approach to study a solid solution or a disordered system in an ab initio calculation. Although we showed several examples in this paper where some elements are substituted by other elements, the application of Shry is not limited to them. For instance, Shry can also be applied for studying vacancies and charge disproportionation in crystals within the supercell approach as far as symmetry-inequivalent patterns are concerned. Notice that the magnetic structures (e.g., configurations of up and down spins) cannot be studied with Shry at present because the current implementation does not support Shubnikov groups. There are several perspectives for the future development of Shry. One major concern is the long absolute computational time. Since Shry is purely implemented in Python and is not multithreaded at present, the absolute computational time is not compatible with other codes for large N. A possible solution is parallelizing the code, which should be very efficient because canonical augmentation can be performed independently. Another possibility is rewriting the core parts of Shry with much a faster language, such as C++, like other scientific packages. From the algorithmic point of view, a great advantage of the canonical augmentation algorithm used in Shry is the use of the subobject invariant (Supporting Information), which enables us to avoid computing the time-consuming canonicity test. The present implementation, i.e., the sum of the distance matrix, is sometimes not capable of distinguishing objects belonging to different orbits on a subobject. Finding a more distinguishable subject invariant is an interesting direction for future work that must be pursued to fully exploit the advantage of the canonical augmentation algorithm, realizing a more efficient exhaustive search.

Data and Software Availability

The Python software package, Shry, is distributed on our GitHub repository [https://github.com/giprayogo/SHRY] under the MIT license. The package is archived in the Zenodo repository [10.5281/zenodo.5652360]. All the CIF files used for the benchmark and validation tests are also available from our GitHub repository [https://github.com/giprayogo/SHRY], which is also archived in the Zenodo repository [10.5281/zenodo.5652360], and those of unsubstituted structures are available from the Crystallography Open Database (COD)[39] [https://www.crystallography.net/cod/].

13 in total

1. Special quasirandom structures.

Authors:
Journal: Phys Rev Lett Date: 1990-07-16 Impact factor: 9.161

2. Theory of defects in one-dimensional systems: application to Al-catalyzed Si nanowires.

Authors: Riccardo Rurali; Xavier Cartoix
Journal: Nano Lett Date: 2009-03 Impact factor: 11.189

3. Symmetry and random sampling of symmetry independent configurations for the simulation of disordered solids.

Authors: Philippe D'Arco; Sami Mustapha; Matteo Ferrabone; Yves Noël; Marco De La Pierre; Roberto Dovesi
Journal: J Phys Condens Matter Date: 2013-08-02 Impact factor: 2.333

4. A new ab initio modeling scheme for the ion self-diffusion coefficient applied to the ε-Cu₃Sn phase of the Cu-Sn alloy.

Authors: Tom Ichibha; Genki Prayogo; Kenta Hongo; Ryo Maezono
Journal: Phys Chem Chem Phys Date: 2019-02-27 Impact factor: 3.676

5. On the use of symmetry in configurational analysis for the simulation of disordered solids.

Authors: Sami Mustapha; Philippe D'Arco; Marco De La Pierre; Yves Noël; Matteo Ferrabone; Roberto Dovesi
Journal: J Phys Condens Matter Date: 2013-02-06 Impact factor: 2.333

6. The CRYSTAL code, 1976-2020 and beyond, a long story.

Authors: Roberto Dovesi; Fabien Pascale; Bartolomeo Civalleri; Klaus Doll; Nicholas M Harrison; Ian Bush; Philippe D'Arco; Yves Noël; Michel Rérat; Philippe Carbonnière; Mauro Causà; Simone Salustro; Valentina Lacivita; Bernard Kirtman; Anna Maria Ferrari; Francesco Silvio Gentile; Jacopo Baima; Mauro Ferrero; Raffaella Demichelis; Marco De La Pierre
Journal: J Chem Phys Date: 2020-05-29 Impact factor: 3.488

7. Investigation into Structural Phase Transitions in Layered Titanium-Oxypnictides by a Computational Phonon Analysis.

Authors: Kousuke Nakano; Kenta Hongo; Ryo Maezono
Journal: Inorg Chem Date: 2017-11-02 Impact factor: 5.165

8. Crystallography Open Database - an open-access collection of crystal structures.

Authors: Saulius Gražulis; Daniel Chateigner; Robert T Downs; A F T Yokochi; Miguel Quirós; Luca Lutterotti; Elena Manakova; Justas Butkus; Peter Moeck; Armel Le Bail
Journal: J Appl Crystallogr Date: 2009-05-30 Impact factor: 3.304

9. Phonon dispersions and Fermi surfaces nesting explaining the variety of charge ordering in titanium-oxypnictides superconductors.

Authors: Kousuke Nakano; Kenta Hongo; Ryo Maezono
Journal: Sci Rep Date: 2016-07-19 Impact factor: 4.379

10. Supercell program: a combinatorial structure-generation approach for the local-level modeling of atomic substitutions and partial occupancies in crystals.

Authors: Kirill Okhotnikov; Thibault Charpentier; Sylvian Cadars
Journal: J Cheminform Date: 2016-03-31 Impact factor: 5.514