
Sparse modeling of chemical bonding in binary compounds.

Yosuke Kanda1, Hitoshi Fujii2, Tamio Oguchi1,2.   

Abstract

A sparse model for quantifying energy difference between zinc-blende and rock-salt crystal structures in octet elemental and binary materials is constructed by using the linearly independent descriptor-generation method and exhaustive search, following the previous work by Ghiringhelli et al. [Phys Rev Lett. 2015;114:105503]. The obtained simplest model includes only atomic radius information of constituent atoms and its physical meaning is interpreted in relation to van Arkel-Ketelaar's triangle for classifying chemical bonding in binary compounds.
© 2019 The Author(s). Published by National Institute for Materials Science in partnership with Taylor & Francis Group.

Keywords:  404 Materials informatics / Genomics; Sparse modeling; binary compounds; chemical bonding; machine learning

Year:  2019        PMID: 32082439      PMCID: PMC7006824          DOI: 10.1080/14686996.2019.1697858

Source DB:  PubMed          Journal:  Sci Technol Adv Mater        ISSN: 1468-6996            Impact factor:   8.090


Introduction

Recently, data-intensive scientific discovery and design, widely called materials informatics (MI), have attracted great attention as a means of accelerating research and development in materials science. The major aims of MI are the exploration of new materials with desired properties, the optimization of existing materials for particular performances, and the understanding of underlying physical mechanisms for further development. Generally, if one demands high predictability from a model constructed by machine-learning techniques, non-linear models such as kernel ridge regression [1], neural networks [2], and random forests [3] are appropriate, though their interpretation becomes troublesome because of the non-linearity involved. On the other hand, simple modeling such as linear regression with interpretable descriptors is suitable for extracting intuitive understanding from materials data, at the sacrifice of some predictability. Sparse modeling [4] is a statistical learning technique that realizes such a simple model by selecting and reducing the assumed descriptors. A pioneering application of sparse modeling to materials properties was reported by Ghiringhelli et al. [5]. Total energy differences between zinc-blende and rock-salt crystal structures obtained by density-functional-theory (DFT) calculations for 82 elemental and binary semiconductors were modeled with the least absolute shrinkage and selection operator (LASSO) [6] and exhaustive search within linear regression. They succeeded in constructing simple models with a small number of descriptors at relatively high predictability. The key to their success lies in the construction of the descriptors.
They first assumed several basic descriptors such as ionization potential, electron affinity, and some DFT atomic data for the constituent atoms, and then combined them by multiplication, division, and application of functions to obtain thousands of higher-order descriptors. LASSO was used to reduce the number of descriptors to tens by statistical procedures and error evaluations. Finally, an exhaustive search was used to extract the most important descriptors for a given number of descriptors. Nevertheless, the obtained model is still far from physically intuitive because it involves complicated functions of several basic descriptors. In this study, we aim to construct a simpler and more interpretable model for the same problem that Ghiringhelli and coworkers attacked. Our idea is twofold: the first is the symmetrization of basic descriptors with respect to the permutation of constituent elements, and the second is the high-order operation of basic descriptors without using complicated functions such as exponentials. Regression trials with a single basic descriptor are also carried out. During the high-order descriptor operations, collinearity problems (including multicollinearity and near multicollinearity) often arise because of strong dependency between the higher-order descriptors generated as products of descriptors. The linearly independent descriptor generation (LIDG) method recently proposed by us [7] is adopted to remove those collinearity problems when they occur. Our models are found to be as simple as the previous models without utilizing complicated descriptors, and are able to quantitatively classify the chemical bonding in binary compound systems.

Methods

Target variables

The target variables prepared by Ghiringhelli et al. [5] are used for the construction of models in this study, namely the total energy differences between zinc-blende (ZB) and rock-salt (RS) type structures calculated for 82 octet elemental and binary compounds AB with main-group elements, defined as

$$\Delta E = E_{\mathrm{RS}} - E_{\mathrm{ZB}}. \qquad (1)$$

The data used for the present regression are listed in Appendix A. To confirm the precision of the target data, the total energies of the 82 systems with ZB and RS structures are recalculated by using the all-electron full-potential linearized augmented plane-wave method implemented in our HiLAPW code [8]; the root-mean-square errors are 7 meV/atom in the total-energy difference and 0.009 Å in the equilibrium lattice constant.

Descriptors

Ghiringhelli et al. [5] distinguished the constituent elements A and B according to the size of their electronegativity. However, permutation of A and B leads to no physical change in the system at the equiatomic composition, and the models constructed should be symmetric under this permutation. In the present study, we generate descriptors as follows:

1. Prepare basic descriptors $x_A$ and $x_B$ for constituent atoms A and B on the basis of our intuition.
2. Symmetrize them with respect to permutation and add their inverses; the results are called first-order descriptors.
3. Generate high-order descriptors by multiplication of the first-order descriptors.
4. Remove multicollinearity and near multicollinearity by erasing the irrelevant descriptors.
5. Iterate the high-order descriptor generation and the removal of collinearity problems, if needed.

Concerning the basic descriptors, easily obtainable physical quantities appeal to our intuition and help to construct interpretable models. Atomic radius, ionization potential, electron affinity, and electronegativity are adopted in this study and tabulated in Appendix A. As for the symmetrization and inversion, we consider the following operations (cf. the first-order descriptors listed in Appendix B):

$$x_A + x_B, \quad |x_A - x_B|, \quad \frac{1}{x_A + x_B}. \qquad (2)$$

The high-order descriptors are generated by multiplication of the first-order descriptors. From $m$ first-order descriptors, $\binom{m+n-1}{n}$ $n$th-order descriptors can be constructed. As mentioned above, every time high-order descriptors are generated, multicollinearity and near multicollinearity are removed by the LIDG method [7]. Here, multicollinearity is a linear dependency between descriptor vectors, which often occurs when higher-order descriptors are generated. The existence of a linear dependency means that $X\mathbf{c} = \mathbf{0}$ has non-trivial solutions $\mathbf{c}$, where $X$ is a descriptor matrix (design matrix) with the descriptor vectors $\mathbf{x}_1, \ldots, \mathbf{x}_p$ as its columns and $p$ is the number of descriptors. Thus, to find multicollinearity, all non-trivial solutions of $X\mathbf{c} = \mathbf{0}$ should be found. Fortunately, the non-trivial solutions can be easily found by computing the reduced row echelon form (RREF) [9] of $X$. In the LIDG method, $X$ is made linearly independent by appropriately removing the detected descriptors that have a multicollinearity relationship. Since a constant term is originally included in the regression model, constant terms additionally arising from multiplication are tacitly removed.
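The generation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the LIDG implementation of Ref. [7]: the function names (`first_order`, `products`, `drop_dependent`) are hypothetical, a greedy rank test from NumPy stands in for the RREF-based detection, near multicollinearity is not treated, and the twelve radius pairs are taken from Table A2 purely as sample input.

```python
import itertools
import numpy as np

def first_order(name, xa, xb):
    """Permutation-symmetric first-order descriptors built from one basic quantity."""
    total = xa + xb
    return {
        f"{name}_sum": total,
        f"{name}_absdiff": np.abs(xa - xb),
        f"{name}_invsum": 1.0 / total,
    }

def products(descs, order):
    """All products of `order` descriptors chosen with repetition."""
    out = {}
    for combo in itertools.combinations_with_replacement(descs, order):
        out["*".join(combo)] = np.prod([descs[k] for k in combo], axis=0)
    return out

def drop_dependent(descs, tol=1e-8):
    """Keep a descriptor only if it is linearly independent of the intercept
    and of the descriptors kept so far (greedy rank test, assumed stand-in for RREF)."""
    n = len(next(iter(descs.values())))
    basis = np.ones((n, 1)) / np.sqrt(n)  # normalized intercept column
    kept = {}
    for name, col in descs.items():
        trial = np.column_stack([basis, col / np.linalg.norm(col)])
        if np.linalg.matrix_rank(trial, tol=tol) == trial.shape[1]:
            kept[name] = col
            basis = trial
    return kept

# Atomic radii (in angstrom) of A and B for twelve sample compounds, from Table A2:
# LiF, NaCl, KBr, MgO, CaS, BN, CC, SiSi, GaAs, ZnS, CdTe, InSb
r_a = np.array([1.67, 1.90, 2.43, 1.45, 1.94, 0.87, 0.67, 1.11, 1.36, 1.42, 1.61, 1.56])
r_b = np.array([0.42, 0.79, 0.94, 0.48, 0.88, 0.56, 0.67, 1.11, 1.14, 0.88, 1.23, 1.33])

first = first_order("r", r_a, r_b)
candidates = {**first, **products(first, 2)}   # first- and second-order candidates
independent = drop_dependent(candidates)
print(sorted(independent))
```

In this sketch the product of the sum and its inverse is a constant and is therefore discarded as dependent on the intercept, which mirrors the tacit removal of constant terms described above.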

Model selection

In sparse modeling [4], the best model, i.e., the one with the highest predictivity, is usually selected by cross-validation [10,11]. For this purpose, we employ the leave-one-out scheme in this study: 81 sets of data (target and descriptors) are used for the construction of the model, and the remaining one set is used for estimating the predictivity error [12] (Equation 3), which is defined in terms of the true, predicted, and averaged target values. The measure of predictivity $Q^2$ (Equation 4) used for model selection is then obtained by averaging over all data sets (82 in this case). To obtain models as simple as possible, the exhaustive search method [13] is employed for a given number of descriptors.
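The selection procedure can be illustrated with the following sketch. It assumes one common leave-one-out definition, $Q^2 = 1 - \mathrm{PRESS}/\mathrm{TSS}$, which may differ in detail from Equations (3) and (4); the function names `loo_q2` and `exhaustive_search` are hypothetical.

```python
import itertools
import numpy as np

def loo_q2(X, y):
    """Leave-one-out Q^2 = 1 - PRESS / TSS for ordinary least squares with intercept.
    This particular definition is an assumption made for illustration."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i                      # hold out sample i
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2            # squared prediction error
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss

def exhaustive_search(X, y, m):
    """Try every combination of m descriptor columns and return the best by Q^2."""
    best = max(
        itertools.combinations(range(X.shape[1]), m),
        key=lambda cols: loo_q2(X[:, list(cols)], y),
    )
    return best, loo_q2(X[:, list(best)], y)
```

Given a descriptor matrix `X` with one column per descriptor and a target vector `y`, `exhaustive_search(X, y, 2)` returns the indices of the best pair of columns together with its score.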

Results

Using the procedures described in the previous section with the four kinds of basic descriptors, 86 descriptors are generated up to the second order; this set is called descriptor space 1 (DS1) and is listed in Appendix B. Figure 1 shows $Q^2$ for the best models found by the exhaustive search as a function of the number of descriptors $M$ in DS1. That is, when $M$ descriptors are used, linear regressions (ordinary least-squares method) are performed for all combinations of $M$ descriptors chosen from the 86 descriptors in DS1, $Q^2$ of Equation (4) is calculated for each, and the maximum value of $Q^2$ is plotted. Detailed results of model selection are summarized in Appendix C.
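To give a feeling for the cost of the exhaustive search, the number of candidate models in DS1 can be counted directly as binomial coefficients. The short sketch below uses only the descriptor count of 86 stated above; the variable names are illustrative.

```python
from math import comb

n_descriptors = 86  # size of DS1
# number of distinct linear models with m descriptors, for m = 1..5
counts = {m: comb(n_descriptors, m) for m in range(1, 6)}
for m, c in counts.items():
    print(f"M = {m}: {c} candidate models")
```

Each candidate model additionally requires one regression per left-out sample in the leave-one-out scheme.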
Figure 1.

Measure of predictivity for the best models with descriptor space 1 (DS1), 2 (DS2), and 3 (DS3) as a function of the number of descriptors obtained by the exhaustive search.

It is seen in Figure 1 that as the number of descriptors is increased, the predictivity with DS1 also increases up to $M = 3$ and is almost saturated afterward. Therefore, the model with $M = 3$ is appropriately simple with relatively high predictability. This model, called Model 1, is given as

$$\Delta E = 0.59\,|EN_A - EN_B| - \frac{1.95\,|EN_A - EN_B|}{r_A + r_B} + \frac{6.15}{(r_A + r_B)^2} - 0.75, \qquad (5)$$

where $EN$ and $r$ are electronegativity and atomic radius, respectively. The regression performance of Model 1 is shown in Figure 2. It is quite interesting that only the electronegativity and atomic radius are included in Model 1 with a simple form, but its physical meaning is not readily understandable.
Figure 2.

Regression performance of Model 1. (a) predicted and DFT data. (b) predicted and DFT data for each semiconductor. ID corresponds to that in Table A1.

Table A1.

Target data for regression taken from Ghiringhelli et al. [5]. $\Delta E$ is the total energy difference (in eV/atom) between zinc-blende and rock-salt structures of the 82 elemental and binary systems AB.

ID   A    B    ΔE       ID   A    B    ΔE       ID   A    B    ΔE
0    Li   F    0.059    28   Si   Si   0.275    56   Rb   I    0.169
1    Li   Cl   0.038    29   Si   Ge   0.264    57   Sr   O    0.221
2    Li   Br   0.033    30   Si   Sn   0.136    58   Sr   S    0.369
3    Li   I    0.022    31   K    F    0.146    59   Sr   Se   0.375
4    Be   O    0.430    32   K    Cl   0.165    60   Sr   Te   0.381
5    Be   S    0.506    33   K    Br   0.166    61   Ag   F    0.156
6    Be   Se   0.495    34   K    I    0.168    62   Ag   Cl   0.044
7    Be   Te   0.466    35   Ca   O    0.266    63   Ag   Br   0.030
8    B    N    1.713    36   Ca   S    0.369    64   Ag   I    0.037
9    B    P    1.020    37   Ca   Se   0.361    65   Cd   O    0.087
10   B    As   0.879    38   Ca   Te   0.350    66   Cd   S    0.070
11   B    Sb   0.581    39   Cu   F    0.019    67   Cd   Se   0.083
12   C    C    2.638    40   Cu   Cl   0.156    68   Cd   Te   0.113
13   C    Si   0.668    41   Cu   Br   0.152    69   In   N    0.150
14   C    Ge   0.808    42   Cu   I    0.203    70   In   P    0.170
15   C    Sn   0.450    43   Zn   O    0.102    71   In   As   0.122
16   Na   F    0.146    44   Zn   S    0.275    72   In   Sb   0.080
17   Na   Cl   0.133    45   Zn   Se   0.259    73   Sn   Sn   0.016
18   Na   Br   0.127    46   Zn   Te   0.241    74   Cs   F    0.112
19   Na   I    0.115    47   Ga   N    0.433    75   Cs   Cl   0.152
20   Mg   O    0.178    48   Ga   P    0.341    76   Cs   Br   0.158
21   Mg   S    0.087    49   Ga   As   0.271    77   Cs   I    0.165
22   Mg   Se   0.055    50   Ga   Sb   0.158    78   Ba   O    0.095
23   Mg   Te   0.005    51   Ge   Ge   0.202    79   Ba   S    0.326
24   Al   N    0.072    52   Ge   Sn   0.087    80   Ba   Se   0.350
25   Al   P    0.219    53   Rb   F    0.136    81   Ba   Te   0.381
26   Al   As   0.212    54   Rb   Cl   0.161
27   Al   Sb   0.150    55   Rb   Br   0.164
Electronegativity and atomic radius are known to be empirically correlated, electronegativity being inversely related to atomic radius [14,15] (Equation 6), though they are not so highly collinear as to be flagged by our near-collinearity criteria. Note that Pearson’s correlation coefficient is . Therefore, atomic radius only and electronegativity only are each used on trial as the basic descriptor set to generate 24 high-order descriptors up to fourth order for sparse modeling, called descriptor space 2 (DS2) and 3 (DS3), respectively, as listed in Appendix B. As a result, it is found that DS2 gives much better $Q^2$ than DS3. For example, $Q^2$ for $M = 5$ in DS2 and DS3 is 0.892 and 0.714, respectively. In Figure 1, $Q^2$ with DS2 soon becomes almost constant as $M$ increases, and the model with $M = 2$ might be a good one from the viewpoints of predictivity and interpretable sparse modeling. This model, called Model 2, is expressed as

$$\Delta E = \frac{6.87}{(r_A + r_B)^3} - \frac{5.02\,|r_A - r_B|}{(r_A + r_B)^3} - 0.18, \qquad (7)$$

and its regression performance is given in Figure 3.
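As a quick numerical illustration, Model 2 can be evaluated directly from atomic radii. The sketch below assumes the coefficients of the $M = 2$, rank-1 entry of Table C2, with a negative radius-difference term and a negative constant; these signs are an assumption chosen to be consistent with the 1.68 Å threshold discussed later. The function names are hypothetical, and the radii in the example calls are those of Table A2.

```python
def model2_delta_e(r_a, r_b):
    """Energy difference predicted by Model 2 (signs of the terms are assumed)."""
    s = r_a + r_b
    return 6.87 / s**3 - 5.02 * abs(r_a - r_b) / s**3 - 0.18

def predicted_structure(r_a, r_b):
    """Positive values favour zinc blende, negative values favour rock salt."""
    return "zinc blende" if model2_delta_e(r_a, r_b) > 0 else "rock salt"

# Example calls with radii from Table A2 (in angstrom)
print("C-C  ", round(model2_delta_e(0.67, 0.67), 3), predicted_structure(0.67, 0.67))
print("Na-Cl", round(model2_delta_e(1.90, 0.79), 3), predicted_structure(1.90, 0.79))
```

The function is symmetric under exchange of its two arguments, as required of the symmetrized descriptors.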
Figure 3.

Regression performance of Model 2. (a) predicted and DFT data. (b) predicted and DFT data for each semiconductor. ID corresponds to that in Table A1.

It should be emphasized that Model 2 is a really simple model that includes atomic-radius descriptors only, yet retains high predictivity ($Q^2 = 0.866$). The regression performance of the present models (Model 1 and Model 2) and the previous ones (Model A, Model B, and Model C) is summarized in Table 1 in terms of the coefficient of determination $R^2$ [16], measure of predictivity $Q^2$ (Equation 4) [12], Akaike information criterion AIC [17,18], mean absolute error MAE, and maximum absolute error MaxAE. Model A, Model B, and Model C of the previous work in Table 1 are the best models with one, two, and three descriptors, respectively, selected from the left of the following descriptor list:
Table 1.

Regression performance of models obtained in the present and the previous works. $M$, $R^2$, $Q^2$, AIC, MAE, and MaxAE are the number of descriptors, coefficient of determination [16], measure of predictivity (Equation 4) [12], Akaike information criterion [17,18], mean absolute error, and maximum absolute error, respectively. Models in the previous work are given in the text.

              Present               Previous work [5]
Criterion     Model 1   Model 2     Model A   Model B   Model C
M             3         2           1         2         3
R2            0.913     0.876       0.883     0.929     0.957
Q2            0.902     0.866       0.867     0.918     0.946
AIC           92.4      65.0        72.0      110.6     149.4
MAE (eV)      0.102     0.118       0.121     0.097     0.071
MaxAE (eV)    0.457     0.460       0.400     0.349     0.301
Note that the values of ionization potential, electron affinity, and electronegativity used in the present study are slightly different from those in the previous work [5]. Because of that, MAE and MaxAE do not perfectly coincide with those listed previously.
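Three of the criteria in Table 1 can be computed from predicted and reference values as in the sketch below. It uses the standard textbook definitions of the coefficient of determination, MAE, and MaxAE; AIC is omitted because its exact form in the present work is not reproduced here. The function name `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R^2, mean absolute error, and maximum absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "R2": 1.0 - np.sum(resid**2) / tss,
        "MAE": np.mean(np.abs(resid)),
        "MaxAE": np.max(np.abs(resid)),
    }

metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
print(metrics)
```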

Discussion

Let us consider the possible consequences of Model 2, which is the simplest among the models constructed in the preceding section. In the case of elemental materials ($r_A = r_B$), $\Delta E$ becomes positive for $r_A < 1.68$ Å, preferring the zinc-blende (properly, diamond) structure. Actually, neither elemental materials nor compounds of equal atomic radii with radii greater than 1.68 Å are included in the present octet compounds. For compounds with largely different atomic radii, the rock-salt structure, which has higher coordination than zinc blende, is realized. From Equation (7), the borderline between ZB and RS structures, namely $\Delta E = 0$, is given as $|r_A - r_B| \approx 1.37 - 0.036\,(r_A + r_B)^3$, providing a quantitative guideline to classify ZB and RS structures in the present systems. The borderline and the structural classification will be discussed further below in relation to van Arkel-Ketelaar’s triangle of chemical bonding. Approximately, Model 2 shown in Equation (7) tells us that the energy difference between ZB and RS structures scales linearly with the absolute difference in the atomic radii of the constituent atoms ($|r_A - r_B|$) and is inversely proportional to the cell volume ($\propto (r_A + r_B)^3$). In the octet compounds, the cohesion mechanism is dominated by covalent bonds with additional ionic electrostatic interactions. Covalent bonds originate from the formation of bonding and antibonding states between neighboring orbitals, and their strength is roughly proportional to the size of the corresponding hopping integrals. According to the scaling rules of tight-binding theory [19-21], the hopping integral between neighboring orbitals decays as an inverse power of the interatomic distance. Therefore, it is reasonable to see the inverse proportionality to the cell volume in the energy difference. Chemical trends in the stable structure directly derived from Model 2 are listed in Table 2.
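For reference, the borderline and the elemental threshold can be traced step by step. The derivation below assumes that Model 2 has the form of the $M = 2$, rank-1 entry of Table C2 with a negative radius-difference term and a negative constant; these signs are an assumption, chosen because they reproduce the 1.68 Å threshold stated in the text.

```latex
\begin{align*}
\Delta E &= \frac{6.87 - 5.02\,|r_A - r_B|}{(r_A + r_B)^3} - 0.18 \\
\Delta E = 0 \;&\Rightarrow\; |r_A - r_B|
   = \frac{6.87 - 0.18\,(r_A + r_B)^3}{5.02}
   \approx 1.37 - 0.036\,(r_A + r_B)^3 \\
r_A = r_B = r,\ \Delta E > 0 \;&\Rightarrow\; (2r)^3 < \frac{6.87}{0.18} \approx 38.2
   \;\Rightarrow\; r < 1.68\ \text{\AA}
\end{align*}
```

The rounded coefficients 1.37 and 0.036 follow from the two-decimal values in Table C2 and may differ in the last digit from values obtained with unrounded regression coefficients.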
Table 2.

Relations between atomic radius and stable structure derived from Model 2 (Equation 7). $\Delta E$ is defined in Equation 1.

Atomic radius                                          ΔE     Stable structure
$|r_A - r_B|$ : large                                  < 0    Rock salt
$r_A + r_B$ : small and $|r_A - r_B|$ : small          > 0    Zinc blende
Empirically, electronegativity is well known to be related to chemical bonding in compounds [22] and has an inverse relation to the atomic radius, as shown in Equation (6). With this relation, the trends with respect to the atomic radius listed in Table 2 can be converted to trends with respect to electronegativity, given in Table 3. This result is consistent with our knowledge of the stable structure in semiconductors, namely that covalent (ionic) compounds tend to possess the zinc-blende (rock-salt) crystal structure [23]. Nevertheless, it is quite interesting that the energy difference can be modeled quantitatively better with atomic radius than with electronegativity, as mentioned in the preceding section.
Table 3.

Relations between electronegativity, stable structure, and chemical bond, derived from Table 2 and Equation 6.

Electronegativity                                        ΔE     Stable structure   Chemical bond
$|EN_A - EN_B|$ : large                                  < 0    Rock salt          Ionic
$EN_A + EN_B$ : large and $|EN_A - EN_B|$ : small        > 0    Zinc blende        Covalent
van Arkel-Ketelaar’s triangle is a map for displaying the chemical bonding of compounds [24-26]. In the latest version [27,28], metallic, ionic, and covalent bonding are represented in a two-dimensional (2D) map whose axes are the mean and the difference of the electronegativities of the constituent atoms. Following van Arkel-Ketelaar’s triangle, the total energy difference given by Equation (7) is plotted in a 2D map of the sum and difference of atomic radii, as shown in Figure 4. Figure 4 precisely reproduces the stable crystal structure, either zinc-blende or rock-salt, and, via the relation between structure and chemical bonding, the covalent or ionic bonding character. Note that the models constructed by regression include no information about chemical bonding characteristics beyond the training data. As a matter of fact, Model 2 cannot represent metallic systems, which may correspond to the empty region in the present triangle shown in Figure 4.
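The construction of such a map can be sketched as follows: each compound is placed at the coordinates (sum of radii, absolute difference of radii) and compared with the $\Delta E = 0$ curve of Model 2. The sketch assumes the coefficients and signs of the $M = 2$, rank-1 entry of Table C2 as discussed above; the function names are hypothetical, and the radii are those of Table A2.

```python
def boundary_diff(radius_sum):
    """|r_A - r_B| on the Delta E = 0 curve of Model 2 for a given r_A + r_B
    (coefficient signs are an assumption; see the lead-in)."""
    return (6.87 - 0.18 * radius_sum**3) / 5.02

def map_point(r_a, r_b):
    """Return map coordinates and the side of the borderline a compound falls on."""
    s, d = r_a + r_b, abs(r_a - r_b)
    label = "zinc blende / covalent" if d < boundary_diff(s) else "rock salt / ionic"
    return s, d, label

# Radii in angstrom from Table A2
for name, r_a, r_b in [("GaAs", 1.36, 1.14), ("KBr", 2.43, 0.94), ("SiSi", 1.11, 1.11)]:
    s, d, label = map_point(r_a, r_b)
    print(f"{name}: sum = {s:.2f}, diff = {d:.2f} -> {label}")
```

Because the difference of two positive radii can never exceed their sum, all points lie inside a triangular region of this plane.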
Figure 4.

Total energy difference map in a triangle of the sum and difference of the atomic radii of the constituent atoms given by Model 2 (Equation 7). Red-colored (blue-colored) dots form an area where the zinc-blende (rock-salt) structure is stable and covalent (ionic) bonding is realized. An area with no dots corresponds to the region where training data are not included, possibly indicating a metallic bonding region.


Conclusions

A simple model quantifying the energy difference between zinc-blende and rock-salt structures in octet elemental and binary semiconductors is obtained with only the information of the atomic radii of the constituent atoms, leading to a 2D map of chemical bonding represented in terms of the sum and difference of atomic radii. It is found that our descriptor-generation method, including symmetrization for permutation, multiplication to higher order, and removal of collinearity problems, is crucial for constructing such a sparse model in addition to the exhaustive search. That is, since we use only symmetrized descriptors as initial descriptors, it is guaranteed that a correct model can be obtained at least in terms of symmetry. In addition, since inappropriate descriptors that do not satisfy the symmetry are not included, the number of descriptor candidates can be reduced. These two points are the effects of descriptor symmetrization. On the other hand, the model obtained in the previous study does not satisfy the symmetry with respect to the permutation of the A and B elements. Therefore, no matter how high the prediction accuracy, it can be said that this is a physically inappropriate model, at least in terms of symmetry. We would also like to mention the effect of removing multicollinearity. For example, if there is multicollinearity among three descriptors in the descriptor matrix, the estimation and prediction accuracies do not change regardless of which one of the three is deleted from the descriptor matrix. Therefore, it cannot be decided from statistics which of them should be removed. In our LIDG method, however, since the multicollinearity has been detected prior to regression, one can introduce the simplicity of descriptors into the descriptor selection process and employ the two descriptors with the simpler forms among the three. Therefore, the obtained model is the simplest among the models that give the same prediction accuracy. This is the advantage of the LIDG method in the detection and removal of multicollinearity.
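The point about multicollinearity can be checked numerically with a toy example. In the sketch below, three synthetic descriptors are assumed to satisfy `x3 = x1 + x2` (an illustrative dependency, not one taken from the present descriptor spaces), and ordinary least squares is fitted with each possible pair of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2                      # assumed exact linear dependency
y = 1.5 * x1 - 0.5 * x2 + 0.1 * rng.normal(size=n)

def fitted_values(columns):
    """Ordinary least-squares fit with intercept; returns the fitted targets."""
    A = np.column_stack([np.ones(n)] + columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

# Drop x3, x2, or x1 in turn: the remaining pair spans the same space each time
fits = [fitted_values([x1, x2]), fitted_values([x1, x3]), fitted_values([x2, x3])]
print(np.allclose(fits[0], fits[1]), np.allclose(fits[0], fits[2]))
```

All three fits coincide, so the choice of which descriptor to discard cannot be made from accuracy alone and can instead be guided by the simplicity of the descriptor form.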
Table A2.

Basic descriptors. $r$, IP, EA, and EN are atomic radius, ionization potential, electron affinity, and electronegativity, respectively. $r_s$, $r_p$, and $r_d$ are the radii at maximum probability amplitude of the $s$, $p$, and $d$ orbitals, respectively.

Atom   r(a) (Å)   IP(b) (eV)   EA(c) (eV)   EN(d)   r_s(e) (Å)   r_p(e) (Å)   r_d(e) (Å)
Li     1.67       5.392        0.6180       0.98    1.652        1.995        6.930
Be     1.12       9.322        0.5000       1.57    1.078        1.211        2.877
B      0.87       8.298        0.2770       2.04    0.805        0.826        1.946
C      0.67       11.260       1.2629       2.55    0.644        0.630        1.631
N      0.56       14.534       0.0700       3.04    0.539        0.511        1.540
O      0.48       13.618       1.4611       3.44    0.462        0.427        2.219
F      0.42       17.422       3.3990       3.98    0.406        0.371        1.428
Na     1.90       5.139        0.5479       0.93    1.715        2.597        6.566
Mg     1.45       7.646        0.4000       1.31    1.330        1.897        3.171
Al     1.18       5.986        0.4410       1.61    1.092        1.393        1.939
Si     1.11       8.151        1.3850       1.90    0.938        1.134        1.890
P      0.98       10.486       0.7465       2.19    0.826        0.966        1.771
S      0.88       10.360       2.0771       2.58    0.742        0.847        2.366
Cl     0.79       12.967       3.6170       3.16    0.679        0.756        1.666
K      2.43       4.341        0.5015       0.82    2.128        2.443        1.785
Ca     1.94       6.113        0.3000       1.00    1.757        2.324        0.679
Cu     1.45       7.726        1.2280       1.90    1.197        1.680        2.576
Zn     1.42       9.394        0.6000       1.65    1.099        1.547        2.254
Ga     1.36       5.999        0.3000       1.81    0.994        1.330        2.163
Ge     1.25       7.899        1.2000       2.01    0.917        1.162        2.373
As     1.14       9.810        0.8100       2.18    0.847        1.043        2.023
Se     1.03       9.752        2.0207       2.55    0.798        0.952        2.177
Br     0.94       11.814       3.3650       2.96    0.749        0.882        1.869
Rb     2.65       4.177        0.4859       0.82    2.240        3.199        1.960
Sr     2.19       5.695        0.3000       0.95    1.911        2.548        1.204
Ag     1.65       7.576        1.3020       1.93    1.316        1.883        2.968
Cd     1.61       8.993        0.7000       1.69    1.232        1.736        2.604
In     1.56       5.786        0.3000       1.78    1.134        1.498        3.108
Sn     1.45       7.344        1.2000       1.96    1.057        1.344        2.030
Sb     1.33       8.641        1.0700       2.05    1.001        1.232        2.065
Te     1.23       9.009        1.9708       2.10    0.945        1.141        1.827
I      1.15       10.451       3.0591       2.66    0.896        1.071        1.722
Cs     2.98       3.894        0.4716       0.79    2.464        3.164        1.974
Ba     2.53       5.212        0.3000       0.89    2.149        2.632        1.351

(a) Ref. [14]; (b) Ref. [29]; (c) Ref. [30]; (d) Ref. [15]; (e) Ref. [5].

Table B1.

Descriptor space 1 (DS1): 86 descriptors up to second order of atomic radius, ionization potential, electron affinity, and electronegativity.

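Order 1:
$r_A+r_B$, $IP_A+IP_B$, $EA_A+EA_B$, $EN_A+EN_B$
$|r_A-r_B|$, $|IP_A-IP_B|$, $|EA_A-EA_B|$, $|EN_A-EN_B|$
$1/(r_A+r_B)$, $1/(IP_A+IP_B)$, $1/(EA_A+EA_B)$, $1/(EN_A+EN_B)$

Order 2:
$(r_A+r_B)^2$, $(r_A+r_B)(IP_A+IP_B)$, $(r_A+r_B)(EA_A+EA_B)$, $(r_A+r_B)(EN_A+EN_B)$, $(r_A+r_B)|r_A-r_B|$, $(r_A+r_B)|IP_A-IP_B|$, $(r_A+r_B)|EA_A-EA_B|$, $(r_A+r_B)|EN_A-EN_B|$, $(r_A+r_B)/(IP_A+IP_B)$, $(r_A+r_B)/(EA_A+EA_B)$, $(r_A+r_B)/(EN_A+EN_B)$
$(IP_A+IP_B)^2$, $(IP_A+IP_B)(EA_A+EA_B)$, $(IP_A+IP_B)(EN_A+EN_B)$, $(IP_A+IP_B)|r_A-r_B|$, $(IP_A+IP_B)|IP_A-IP_B|$, $(IP_A+IP_B)|EA_A-EA_B|$, $(IP_A+IP_B)|EN_A-EN_B|$, $(IP_A+IP_B)/(r_A+r_B)$, $(IP_A+IP_B)/(EA_A+EA_B)$, $(IP_A+IP_B)/(EN_A+EN_B)$
$(EA_A+EA_B)^2$, $(EA_A+EA_B)(EN_A+EN_B)$, $(EA_A+EA_B)|r_A-r_B|$, $(EA_A+EA_B)|IP_A-IP_B|$, $(EA_A+EA_B)|EA_A-EA_B|$, $(EA_A+EA_B)|EN_A-EN_B|$, $(EA_A+EA_B)/(r_A+r_B)$, $(EA_A+EA_B)/(IP_A+IP_B)$, $(EA_A+EA_B)/(EN_A+EN_B)$
$(EN_A+EN_B)^2$, $(EN_A+EN_B)|r_A-r_B|$, $(EN_A+EN_B)|IP_A-IP_B|$, $(EN_A+EN_B)|EA_A-EA_B|$, $(EN_A+EN_B)|EN_A-EN_B|$, $(EN_A+EN_B)/(r_A+r_B)$, $(EN_A+EN_B)/(IP_A+IP_B)$, $(EN_A+EN_B)/(EA_A+EA_B)$
$|r_A-r_B|^2$, $|r_A-r_B||IP_A-IP_B|$, $|r_A-r_B||EA_A-EA_B|$, $|r_A-r_B||EN_A-EN_B|$, $|r_A-r_B|/(r_A+r_B)$, $|r_A-r_B|/(IP_A+IP_B)$, $|r_A-r_B|/(EA_A+EA_B)$, $|r_A-r_B|/(EN_A+EN_B)$
$|IP_A-IP_B|^2$, $|IP_A-IP_B||EA_A-EA_B|$, $|IP_A-IP_B||EN_A-EN_B|$, $|IP_A-IP_B|/(r_A+r_B)$, $|IP_A-IP_B|/(IP_A+IP_B)$, $|IP_A-IP_B|/(EA_A+EA_B)$, $|IP_A-IP_B|/(EN_A+EN_B)$
$|EA_A-EA_B|^2$, $|EA_A-EA_B||EN_A-EN_B|$, $|EA_A-EA_B|/(r_A+r_B)$, $|EA_A-EA_B|/(IP_A+IP_B)$, $|EA_A-EA_B|/(EA_A+EA_B)$, $|EA_A-EA_B|/(EN_A+EN_B)$
$|EN_A-EN_B|^2$, $|EN_A-EN_B|/(r_A+r_B)$, $|EN_A-EN_B|/(IP_A+IP_B)$, $|EN_A-EN_B|/(EA_A+EA_B)$, $|EN_A-EN_B|/(EN_A+EN_B)$
$1/(r_A+r_B)^2$, $1/[(r_A+r_B)(IP_A+IP_B)]$, $1/[(r_A+r_B)(EA_A+EA_B)]$, $1/[(r_A+r_B)(EN_A+EN_B)]$
$1/(IP_A+IP_B)^2$, $1/[(IP_A+IP_B)(EA_A+EA_B)]$, $1/[(IP_A+IP_B)(EN_A+EN_B)]$
$1/(EA_A+EA_B)^2$, $1/[(EA_A+EA_B)(EN_A+EN_B)]$
$1/(EN_A+EN_B)^2$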
Table B2.

Descriptor space 2 (DS2): 24 descriptors up to fourth order of atomic radius.

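Order 1: $r_A+r_B$, $|r_A-r_B|$, $1/(r_A+r_B)$
Order 2: $(r_A+r_B)^2$, $(r_A+r_B)|r_A-r_B|$, $|r_A-r_B|^2$, $|r_A-r_B|/(r_A+r_B)$, $1/(r_A+r_B)^2$
Order 3: $(r_A+r_B)^3$, $(r_A+r_B)^2|r_A-r_B|$, $(r_A+r_B)|r_A-r_B|^2$, $|r_A-r_B|^3$, $|r_A-r_B|^2/(r_A+r_B)$, $|r_A-r_B|/(r_A+r_B)^2$, $1/(r_A+r_B)^3$
Order 4: $(r_A+r_B)^4$, $(r_A+r_B)^3|r_A-r_B|$, $(r_A+r_B)^2|r_A-r_B|^2$, $(r_A+r_B)|r_A-r_B|^3$, $|r_A-r_B|^4$, $|r_A-r_B|^3/(r_A+r_B)$, $|r_A-r_B|^2/(r_A+r_B)^2$, $|r_A-r_B|/(r_A+r_B)^3$, $1/(r_A+r_B)^4$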
Table B3.

Descriptor space 3 (DS3): 24 descriptors up to fourth order of electronegativity.

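Order 1: $EN_A+EN_B$, $|EN_A-EN_B|$, $1/(EN_A+EN_B)$
Order 2: $(EN_A+EN_B)^2$, $(EN_A+EN_B)|EN_A-EN_B|$, $|EN_A-EN_B|^2$, $|EN_A-EN_B|/(EN_A+EN_B)$, $1/(EN_A+EN_B)^2$
Order 3: $(EN_A+EN_B)^3$, $(EN_A+EN_B)^2|EN_A-EN_B|$, $(EN_A+EN_B)|EN_A-EN_B|^2$, $|EN_A-EN_B|^3$, $|EN_A-EN_B|^2/(EN_A+EN_B)$, $|EN_A-EN_B|/(EN_A+EN_B)^2$, $1/(EN_A+EN_B)^3$
Order 4: $(EN_A+EN_B)^4$, $(EN_A+EN_B)^3|EN_A-EN_B|$, $(EN_A+EN_B)^2|EN_A-EN_B|^2$, $(EN_A+EN_B)|EN_A-EN_B|^3$, $|EN_A-EN_B|^4$, $|EN_A-EN_B|^3/(EN_A+EN_B)$, $|EN_A-EN_B|^2/(EN_A+EN_B)^2$, $|EN_A-EN_B|/(EN_A+EN_B)^3$, $1/(EN_A+EN_B)^4$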
Table C1.

Results of model selection in descriptor space 1 (DS1) by exhaustive search. Models are ranked by $Q^2$ score and only the top 3 are listed.

M    Ranking    R2    Q2    ΔE
110.6580.5954.121rA+rB2 0.60
 20.5920.5223.561rA+rB 1.33
 30.5460.48186.631rA+rB1IPA+IPB 1.82
210.8230.7820.53|ENAENB|rA+rB + 4.251rA+rB2 0.36
 20.7710.7280.17IPA+IPBrA+rB + 9.111rA+rB2 0.20
 30.7670.7160.11|IPAIPB|rA+rB + 4.421rA+rB2 0.46
310.9130.9020.59|ENAENB| 1.95|ENAENB|rA+rB
    + 6.151rA+rB2 0.75
 20.9110.8990.12|EAAEAB||ENAENB| 1.26|ENAENB|rA+rB
    + 5.771rA+rB2 0.60
 30.9040.8920.10(rA+rB)|ENAENB| 1.13|ENAENB|rA+rB
    + 5.891rA+rB2 0.70
410.9340.9211.86|EAAEAB|IPA+IPB+ 0.13|ENAENB|2
    1.52|ENAENB|rA+rB + 6.141rA+rB2 0.68
 20.9330.9210.04(rA+rB)|EAAEAB| + 0.11|ENAENB|2
    1.42|ENAENB|rA+rB + 6.091rA+rB2 0.67
 30.9300.9200.13(rA+rB)|ENAENB| 0.06(IPA+IPB)|ENAENB|
    + 0.22IPA+IPBrA+rB + 0.23|ENAENB|2 1.00
510.9450.9360.22(rA+rB)|ENAENB| 0.07(IPA+IPB)|ENAENB|
    + 0.22IPA+IPBrA+rB 1.06|rArB|ENA+ENB
    + 0.21|ENAENB|2 0.10
 20.9460.9360.01(IPA+IPB)(EAA+EAB)
    + 0.05(EAA+EAB)|EAAEAB| + 0.11|ENAENB|2
    1.50|ENAENB|rA+rB + 6.161rA+rB2 0.53
 30.9440.9350.22(rA+rB)|ENAENB| 0.07(IPA+IPB)|ENAENB|
    + 0.22IPA+IPBrA+rB 4.55|rArB|IPA+IPB
    + 0.21|ENAENB|2 1.01
Table C2.

Results of model selection in descriptor space 2 (DS2) by exhaustive search. Models are ranked by $Q^2$ score and only the top 3 are listed.

M    Ranking    R2    Q2    ΔE
110.7110.6828.081rA+rB4 0.19
 20.6980.6525.701rA+rB3 0.34
 30.6580.5954.121rA+rB2 0.60
210.8760.8666.871rA+rB3 5.02|rArB|(rA+rB)3 0.18
 20.8630.8445.161rA+rB2 5.49|rArB|(rA+rB)3 0.51
 30.8440.8332.01|rArB|(rA+rB)2+ 8.431rA+rB4+ 0.03
310.9030.8930.08|rArB|2+ 6.031rA+rB2 7.03|rArB|(rA+rB)3 0.68
 20.9020.8935.971rA+rB2+ 0.02(rA+rB)|rArB|2
    6.61|rArB|(rA+rB)3 0.67
 30.9020.8925.851rA+rB2 + 0.04|rArB|3 6.58|rArB|(rA+rB)3 0.64
410.9030.8920.02(rA+rB)|rArB| + 6.041rA+rB2 + 0.08|rArB|3rA+rB
    7.01|rArB|(rA+rB)3 0.69
 20.9030.8925.991rA+rB2 + 0.01(rA+rB)2|rArB| + 0.09|rArB|3rA+rB
    6.84|rArB|(rA+rB)3 0.67
 30.9030.8920.07|rArB|2 + 6.001rA+rB2 + 0.03|rArB|3rA+rB
    7.01|rArB|(rA+rB)3 0.67
510.9050.8920.05(rA+rB)3 + 0.03(rA+rB)|rArB|2
    + 5.831rA+rB3 + 0.01(rA+rB)4 6.65|rArB|(rA+rB)3+ 0.34
 20.9040.8920.12|rArB|2 + 5.881rA+rB2 + 0.18|rArB|3
    0.04|rArB|4 6.56|rArB|(rA+rB)3 0.64
 30.9050.8910.12(rA+rB)2 + 0.03(rA+rB)|rArB|2 + 5.601rA+rB3
    + 0.003(rA+rB)4 6.69|rArB|(rA+rB)3 + 0.54
Table C3.

Results of model selection in descriptor space 3 (DS3) by exhaustive search. Models are ranked by $Q^2$ score and only the top 3 are listed.

M    Ranking    R2    Q2    ΔE
110.3750.33819.07|ENAENB|(ENA+ENB)3 + 0.47
 20.3750.3365.25|ENAENB|(ENA+ENB)2 + 0.51
 30.3360.2941.27|ENAENB|ENA+ENB + 0.50
210.5720.4980.72(ENA+ENB) 0.004(ENA+ENB)3|ENAENB|
    2.47
 20.5650.4820.08(ENA+ENB)2 0.004(ENA+ENB)3|ENAENB|
    0.94
 30.5460.47811.541ENA+ENB 0.004(ENA+ENB)3|ENAENB|
    + 3.31
310.7080.6540.07(ENA+ENB)|ENAENB|2 + 0.004(ENA+ENB)4
    0.01(ENA+ENB)3|ENAENB| 0.44
 20.7080.6460.02(ENA+ENB)3 + 0.05(ENA+ENB)|ENAENB|2
    0.01(ENA+ENB)3|ENAENB| 0.85
 30.6970.6300.02(ENA+ENB)3 + 0.06|ENAENB|3
    0.01(ENA+ENB)3|ENAENB| 0.72
410.7280.6760.05(ENA+ENB)2|ENAENB| + 0.004(ENA+ENB)4
    0.02(ENA+ENB)3|ENAENB|
    + 0.01(ENA+ENB)2|ENAENB|2 0.60
 20.7250.6730.11(ENA+ENB)|ENAENB| + 0.004(ENA+ENB)4
    0.02(ENA+ENB)3|ENAENB|
    + 0.01(ENA+ENB2)|ENAENB|2 0.61
 30.7180.6620.32|ENAENB| + 0.004(ENA+ENB)4
    0.02(ENA+ENB)3|ENAENB|
    + 0.01(ENA+ENB)2|ENAENB|2 0.60
510.7540.7141.20(ENA+ENB)|ENAENB|
    0.32(ENA+ENB)2|ENAENB| + 0.005(ENA+ENB)4
    + 0.03(ENA+ENB)2|ENAENB|2
    6.10|ENAENB|ENA+ENB2 0.91
 20.7520.7091.08(ENA+ENB)|ENAENB|
    0.29(ENA+ENB)2|ENAENB|
    + 0.19(ENA+ENB)|ENAENB|2
    2.76|ENAENB|2ENA+ENB + 0.01(ENA+ENB)4 0.86
 30.7530.7092.33|ENAENB| + 0.90|ENAENB|2
    0.17(ENA+ENB)2|ENAENB| + 0.005(ENA+ENB)4
    13.32|ENAENB|ENA+ENB2 0.85
  4 in total

1.  Bonds and Bands in Semiconductors: New insight into covalent bonding in crystals has followed from studies of energy-band spectroscopy.

Authors:  J C Phillips
Journal:  Science       Date:  1970-09-11       Impact factor: 47.728

2.  External validation and prediction employing the predictive squared correlation coefficient test set activity mean vs training set activity mean.

Authors:  Gerrit Schüürmann; Ralf-Uwe Ebert; Jingwen Chen; Bin Wang; Ralph Kühne
Journal:  J Chem Inf Model       Date:  2008-11       Impact factor: 4.956

3.  Big data of materials science: critical role of the descriptor.

Authors:  Luca M Ghiringhelli; Jan Vybiral; Sergey V Levchenko; Claudia Draxl; Matthias Scheffler
Journal:  Phys Rev Lett       Date:  2015-03-10       Impact factor: 9.161

4.  Accelerating materials property predictions using machine learning.

Authors:  Ghanshyam Pilania; Chenchen Wang; Xun Jiang; Sanguthevar Rajasekaran; Ramamurthy Ramprasad
Journal:  Sci Rep       Date:  2013-09-30       Impact factor: 4.379
