
BSE49, a diverse, high-quality benchmark dataset of separation energies of chemical bonds.

Viki Kumar Prasad¹, M Hossein Khalilian¹, Alberto Otero-de-la-Roza², Gino A DiLabio³.

Abstract

We present an extensive and diverse dataset of bond separation energies associated with the homolytic cleavage of covalently bonded molecules (A-B) into their corresponding radical fragments (A• and B•). The dataset contains two classes of model structures, referred to as "Existing" (molecules with associated experimental data) and "Hypothetical" (molecules with no associated experimental data). In total, it consists of 4502 datapoints (1969 in the Existing class and 2533 in the Hypothetical class). The dataset covers 49 unique X-Y single-bond types, where X and Y are H, B, C, N, O, F, Si, P, S, or Cl, excluding bonds between two atoms drawn from H, F, and Cl (H-H, H-F, H-Cl, F-F, F-Cl, and Cl-Cl). All reference data were calculated at the (RO)CBS-QB3 level of theory. The reference bond separation energies are non-relativistic ground-state energy differences and contain no zero-point energy corrections. This new dataset of bond separation energies (BSE49) is presented as a high-quality reference for assessing and developing computational chemistry methods.
© 2021. The Author(s).


Year:  2021        PMID: 34815431      PMCID: PMC8611007          DOI: 10.1038/s41597-021-01088-2

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

Bond dissociation enthalpies (BDEs) are a central property in chemistry and have been studied experimentally and computationally for decades[1-4]. BDEs can be used to estimate the selectivity and reactivity of molecules toward free radicals (such as ·OH, ·OOH, ·OR, ·OOR, ·NO, and ·NO₂) that are generated and transformed during chemical reactions relevant to chemistry and biology[5-10]. In this context, calculating BDEs for C-H, O-H, N-H, S-H, O-O, and S-S bonds in biologically relevant systems can help explain the efficiency of antioxidants[11-13]. BDE calculations are also fundamental to understanding various enzyme catalytic processes[14-16] and surface functionalization chemistry[17-19]. In 2012, Drew and Reynisson employed BDE calculations to predict the major metabolic sites of fifty known drug molecules[20]. Similarly, Andersson and co-workers applied BDE calculations to estimate the sensitivity of various drug candidates toward autoxidation[21]. These applications show how computational techniques can be incorporated into the risk assessment of drug products and can guide further experimentation. Computed C-O and C-C BDEs have also been reported for several substituted analogues of lignin, an abundant polymeric organic material and a potential renewable source of biofuels and chemicals[22-24]. In those studies, model compounds representing the dominant linkages of lignin were used to predict the homolytic dissociation of C-C and C-O bonds under thermal decomposition. Given the importance of BDEs in many areas of chemistry, and the consequent need to predict bond energies accurately by computation, we develop here a dataset of bond separation energies (BSEs) computed with a high-level computational chemistry method. 
Bond separation energies are a molecular property that can be computed straightforwardly in vacuum and that provides direct information about the strength of a chemical bond. The BSEs presented in this work are differences between non-relativistic ground-state energies. They contain no vibrational energy contributions and no zero-point energies, and no attempt has been made to thermally average over molecular conformations. As such, the reported BSEs are not directly comparable to experimental BDEs, but they are an ideal resource for developing and evaluating lower-cost computational chemistry methods used in a wide range of applications in chemistry and biology. Datasets similar to the one proposed here are available in the literature, but they tend to be small in total number of datapoints[25], lack bond-type diversity[26,27], or are calculated with less accurate computational methods than the one used in this work[28-30]. To the best of our knowledge, no accurate and extensive dataset of computed BSEs is available in the literature. The main reason for this absence is that high-accuracy BSE calculations require computationally expensive methods that scale poorly with system size. This work addresses that gap by constructing a large dataset (4502 datapoints) of computed BSEs for 49 unique bond types, all determined with the high-level composite procedure (RO)CBS-QB3[31-33]. This approach ensures uniform, high-quality reference data and eliminates the need to collect and verify data from various sources, which may differ substantially in accuracy. The (RO)CBS-QB3 method is known to produce highly accurate BDEs[8,33-37], which makes it suitable for building a database of BSEs for testing and parametrizing low-cost computational methods. 
One particular target application of our dataset is the training of cost-effective computational approaches such as atom-centered potentials (ACPs)[38-40] or machine learning potentials[28-30].

Methods

Dataset composition

We present the BSE49 dataset, which comprises a broad range of bond separation energies for 49 unique bond types. The model systems present in the dataset are neutral molecules with X-H, X-F, X-Cl, X-X, and X-Y single bonds, where X and Y are B, C, N, O, Si, P, and S. The number of datapoints and the ranges of bond separation energies associated with each bond type are provided in Table 1. The structures of model systems on which the calculations were performed are divided into “Existing” and “Hypothetical” classes. The Existing type structures were built by selecting molecules with experimental data reported in the Comprehensive Handbook of Chemical Bond Dissociation Energies[41]. In contrast, the Hypothetical type structures were constructed by functional group substitutions of X-Y single bonds in order to include bond types that were not present in the handbook and to increase the diversity and number of datapoints for each bond type in the dataset. The candidate molecules for both Existing and Hypothetical subsets were generated using a partially automated computational workflow as described below.
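The count of 49 bond types follows directly from the composition rule above. There are seven X-H, seven X-F, and seven X-Cl bond types, plus the 28 unordered X-X and X-Y pairs drawn from the seven atoms B, C, N, O, Si, P, and S. The short Python sketch below enumerates them as a sanity check. The function name `bse49_bond_types` is hypothetical, and the atom order within each label (e.g., `Si-F` here versus `F-Si` in Table 1) is an arbitrary choice of this sketch.

```python
from itertools import combinations_with_replacement

# The seven "X"/"Y" atoms named in the Dataset composition section.
XY_ATOMS = ["B", "C", "N", "O", "Si", "P", "S"]


def bse49_bond_types() -> list:
    """Enumerate the bond-type labels implied by the composition rule (hypothetical helper)."""
    types = [f"{x}-H" for x in XY_ATOMS]    # 7 X-H bonds
    types += [f"{x}-F" for x in XY_ATOMS]   # 7 X-F bonds
    types += [f"{x}-Cl" for x in XY_ATOMS]  # 7 X-Cl bonds
    # X-X and X-Y bonds: unordered pairs with repetition, C(7 + 1, 2) = 28
    types += [f"{x}-{y}" for x, y in combinations_with_replacement(XY_ATOMS, 2)]
    return types


if __name__ == "__main__":
    bonds = bse49_bond_types()
    print(len(bonds))  # 7 + 7 + 7 + 28 = 49
```

Using `combinations_with_replacement` covers both the homonuclear X-X bonds (such as C-C and S-S) and the heteronuclear X-Y bonds in one step, without double-counting X-Y and Y-X.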
Table 1

List of the number of datapoints in the BSE49 dataset and the ranges of bond separation energies associated with each bond type calculated using (RO)CBS-QB3.

Bond type   Datapoints   Range of bond separation energies
B-H         68           77.22–115.14
C-H         395          80.08–141.22
N-H         156          53.05–131.63
O-H         240          68.65–126.75
Si-H        111          74.31–106.06
P-H         118          61.73–87.98
S-H         39           74.80–95.81
B-B         75           47.41–112.40
B-C         83           92.26–142.78
B-N         71           85.50–155.16
B-O         51           100.14–158.50
B-F         82           152.61–177.24
B-Si        84           36.27–110.83
B-P         89           72.64–99.12
B-S         51           84.10–128.28
B-Cl        81           81.86–128.98
C-C         363          64.69–156.08
C-N         98           27.65–122.95
C-O         171          48.31–127.45
C-F         40           103.44–133.45
C-Si        153          36.82–111.67
C-P         85           60.93–115.15
C-S         64           41.42–105.29
C-Cl        129          64.26–113.54
N-N         37           15.64–70.81
N-O         31           22.50–70.80
N-F         36           49.72–83.45
N-Si        64           33.93–122.94
N-P         93           40.82–91.06
N-S         53           24.53–72.61
N-Cl        31           35.89–80.63
O-O         60           21.20–56.42
O-F         90           11.04–51.79
O-Si        144          74.85–144.88
O-P         27           83.10–130.79
O-S         51           46.55–93.05
O-Cl        85           9.38–61.56
F-Si        36           123.92–169.04
F-P         32           99.43–125.94
F-S         99           72.84–107.41
Si-Si       165          34.86–104.94
Si-P        65           60.09–87.04
Si-S        57           62.95–98.18
Si-Cl       102          109.68–123.12
P-P         20           44.37–77.37
P-S         29           67.42–96.01
P-Cl        32           69.91–89.30
S-S         64           37.09–78.33
S-Cl        102          50.44–71.17

The bond separation energy ranges are in kcal/mol.
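As a consistency check on Table 1, the per-bond-type datapoint counts should add up to the 4502 datapoints quoted in the Abstract. That total should in turn equal the 1969 Existing plus 2533 Hypothetical datapoints. The following is a minimal Python sketch, with the counts transcribed from Table 1.

```python
# Datapoint counts per bond type, transcribed from Table 1.
TABLE1_COUNTS = {
    "B-H": 68, "C-H": 395, "N-H": 156, "O-H": 240, "Si-H": 111, "P-H": 118, "S-H": 39,
    "B-B": 75, "B-C": 83, "B-N": 71, "B-O": 51, "B-F": 82, "B-Si": 84, "B-P": 89,
    "B-S": 51, "B-Cl": 81,
    "C-C": 363, "C-N": 98, "C-O": 171, "C-F": 40, "C-Si": 153, "C-P": 85, "C-S": 64,
    "C-Cl": 129,
    "N-N": 37, "N-O": 31, "N-F": 36, "N-Si": 64, "N-P": 93, "N-S": 53, "N-Cl": 31,
    "O-O": 60, "O-F": 90, "O-Si": 144, "O-P": 27, "O-S": 51, "O-Cl": 85,
    "F-Si": 36, "F-P": 32, "F-S": 99,
    "Si-Si": 165, "Si-P": 65, "Si-S": 57, "Si-Cl": 102,
    "P-P": 20, "P-S": 29, "P-Cl": 32,
    "S-S": 64, "S-Cl": 102,
}

EXISTING, HYPOTHETICAL = 1969, 2533  # subset sizes quoted in the Abstract

total = sum(TABLE1_COUNTS.values())
print(len(TABLE1_COUNTS), total)  # 49 bond types, 4502 datapoints
assert total == EXISTING + HYPOTHETICAL
```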


Dataset generation

The calculated bond separation energies are defined as the difference in ground-state electronic energies between the products and the reactant of the homolytic dissociation reaction

$$\mathrm{A{-}B} \longrightarrow \mathrm{A}^{\bullet} + \mathrm{B}^{\bullet}, \qquad \mathrm{BSE} = E(\mathrm{A}^{\bullet}) + E(\mathrm{B}^{\bullet}) - E(\mathrm{A{-}B}),$$

where A• and B• represent the two radical fragments formed by homolytically breaking the A-B covalent bond in vacuum. Calculating the bond separation energies therefore requires the equilibrium geometries of the parent molecules and their respective radical fragments. For both the Existing and Hypothetical subsets, the geometries of the parent molecule and the associated radicals were constructed manually with the Avogadro[42] program. These geometries were then used as starting points for a conformer search, in which the CSD conformer generator[43] and FullMonte[44] codes were used to generate multiple conformers. The geometry of each conformer was relaxed to the corresponding local minimum with the Gaussian[45] software package. This relaxation was first carried out with a low-level method combining the B3LYP[46-51] density functional, the 6-31G*[52,53] basis set, and the D3[54-56] dispersion correction with Becke-Johnson[57] damping (B3LYP-D3(BJ)/6-31G*). The optimized conformers were ranked by their B3LYP-D3(BJ)/6-31G* relative energies at the local minima, and the ten lowest-energy conformers were re-optimized at the higher CAM-B3LYP-D3(BJ)/def2-TZVP level of theory[54-59]. Range-separated functionals such as CAM-B3LYP reduce delocalization error, which can be important in the description of radical species[60]. The lowest-energy conformer obtained in this procedure was used to calculate the bond separation energies with the composite method described below. 
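The two-level conformer screening described above follows a common "funnel" pattern. Every conformer is optimized cheaply, the ten lowest are kept, those are re-optimized at the higher level, and the minimum is selected. The Python sketch below captures only that control flow. The optimizer callables `low_level_opt` and `high_level_opt` are hypothetical stand-ins for the B3LYP-D3(BJ)/6-31G* and CAM-B3LYP-D3(BJ)/def2-TZVP Gaussian jobs. Each is assumed to return the relaxed geometry together with its energy.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

G = TypeVar("G")  # any geometry representation (hypothetical)


def conformer_funnel(
    conformers: Sequence[G],
    low_level_opt: Callable[[G], Tuple[G, float]],
    high_level_opt: Callable[[G], Tuple[G, float]],
    keep: int = 10,
) -> Tuple[G, float]:
    """Return the lowest-energy structure after a two-level optimization funnel."""
    # Stage 1: relax every conformer at the low level and record its energy.
    stage1: List[Tuple[G, float]] = [low_level_opt(c) for c in conformers]
    # Rank by low-level energy and keep the `keep` lowest-energy structures.
    stage1.sort(key=lambda pair: pair[1])
    survivors = [geom for geom, _ in stage1[:keep]]
    # Stage 2: re-optimize the survivors at the higher level of theory.
    stage2 = [high_level_opt(geom) for geom in survivors]
    # The lowest high-level energy selects the structure passed on to the composite step.
    return min(stage2, key=lambda pair: pair[1])
```

The higher-level re-optimization can reorder the survivors. This is why the sketch takes the minimum again after stage 2 instead of keeping the stage-1 winner.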
All calculations employed the default self-consistent field (SCF) convergence criterion of 10⁻⁸ Hartree, an ultrafine integration grid, and tight optimization convergence criteria (maximum force = 1.5 × 10⁻⁵ Hartree/Bohr, RMS force = 1 × 10⁻⁵ Hartree/Bohr, maximum displacement = 6 × 10⁻⁵ Bohr, RMS displacement = 4 × 10⁻⁵ Bohr). This partially automated workflow produced structures that are not necessarily the global minima. A visual inspection revealed that about 20% of the generated conformers do not correspond to global minima. This reflects the difficulty of reliably solving a global optimization problem (finding the most stable conformer) for such a large number of systems. In addition, due to computational constraints, no attempt was made to evaluate the conformational energy landscape or to statistically weight the low-energy conformers of each molecule. Therefore, the dataset is not appropriate for direct comparison with bond separation energies obtained by back-correcting experimental BDEs. It is, however, suitable for testing and training computationally less expensive methods on their ability to calculate accurately the energy difference between the chosen conformers of the products (A• and B•) and the reactant (A-B). The structures obtained from this workflow were then used in the final step of the reference data calculation, with the composite (RO)CBS-QB3[31-33] method. The restricted-open-shell[61] variant, ROCBS-QB3, was employed for the open-shell radical fragments, and restricted closed-shell CBS-QB3 calculations were performed for the closed-shell parent molecules. 
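The four tight-optimization thresholds quoted above act jointly. In this reading, a geometry step counts as converged only when the largest and RMS force components and the largest and RMS displacements all fall below their limits. A minimal Python sketch of that test follows. The function and dictionary names are hypothetical, and the optimizer's actual convergence logic may include additional rules that are not modeled here.

```python
import math
from typing import Sequence

# Tight optimization thresholds quoted in the text (atomic units).
TIGHT_CRITERIA = {
    "max_force": 1.5e-5,  # Hartree/Bohr
    "rms_force": 1.0e-5,  # Hartree/Bohr
    "max_disp": 6.0e-5,   # Bohr
    "rms_disp": 4.0e-5,   # Bohr
}


def _rms(values: Sequence[float]) -> float:
    """Root-mean-square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def step_converged(forces: Sequence[float], displacements: Sequence[float],
                   criteria: dict = TIGHT_CRITERIA) -> bool:
    """True only if all four force/displacement criteria hold (hypothetical helper)."""
    return (
        max(abs(f) for f in forces) < criteria["max_force"]
        and _rms(forces) < criteria["rms_force"]
        and max(abs(d) for d in displacements) < criteria["max_disp"]
        and _rms(displacements) < criteria["rms_disp"]
    )
```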
The composite (RO)CBS-QB3 method approximates energies at the complete-basis-set CCSD(T) level by combining a series of lower-cost calculations:
(i) geometry optimization followed by a vibrational frequency calculation with the unrestricted-open-shell[62] B3LYP/6-311G(2d,d,p) method[46-51,63];
(ii) an ROMP2/6-311+G(3d2f,2df,2p)[63-65] energy extrapolated to the complete-basis-set limit;
(iii) an ROMP4(SDQ)/6-31+G(d(f),p)[63,64,66] energy calculation;
(iv) an ROCCSD(T)/6-31+G†[63,64,67] energy calculation, where 6-31+G† is a modified 6-31+G(d) basis set.
The final (RO)CBS-QB3 energy also includes additional empirical correction terms described in Reference[33]. Any system for which imaginary frequencies were obtained was removed. The (RO)CBS-QB3 energies of the three structures associated with each bond-breaking reaction were then used to obtain the corresponding bond separation energy.
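Once the (RO)CBS-QB3 energies of the parent and both radicals are available, each reference value is the energy of the two fragments minus the energy of the parent, converted to kcal/mol. Quantum-chemistry codes usually report these energies in Hartree. The sketch below assumes the three energies are already in hand. The function name is hypothetical, and the conversion factor is the CODATA 2018 value of 1 Hartree ≈ 627.5095 kcal/mol.

```python
HARTREE_TO_KCAL_PER_MOL = 627.509474  # CODATA 2018: 1 Hartree in kcal/mol


def bond_separation_energy(e_a: float, e_b: float, e_ab: float) -> float:
    """BSE in kcal/mol for A-B -> A• + B•, from ground-state energies in Hartree.

    No zero-point, thermal, or relativistic corrections are applied, matching the
    definition of the BSE49 reference values (hypothetical helper).
    """
    return (e_a + e_b - e_ab) * HARTREE_TO_KCAL_PER_MOL
```

A positive result means the bonded molecule lies below the separated radicals. For example, fragments lying 0.1 Hartree above the parent give about 62.75 kcal/mol.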

Data Records

The reference bond separation energies (in kcal/mol) and coordinates (in Å) of the structures in the BSE49 dataset are publicly available free of charge from the Figshare[68] and GitHub (https://github.com/aoterodelaroza/bse49) repositories, in the plain-text database file format (DB format) described in Table 2. The atomic coordinates of the model structures are also stored in plain-text XYZ format in the Geometries directory. The dataset contains one DB format file and three XYZ format files (parent molecule and two radical fragments) for each bond separation energy. In total, the deposited files include 4502 DB format files in the db-BSE49 directory and 13506 (3 × 4502) XYZ format files in the respective Existing or Hypothetical classification directories. Two additional files, BSE49_Existing.org and BSE49_Hypothetical.org, summarize the reference data and geometry filenames for all model systems.
Table 2

A description of the DB format file (.db) for an A-B molecule containing N atoms, with two radical fragments (A• and B•) containing n1 and n2 atoms, respectively.

Line                             Column   Content
1                                1        'ref' string specifying the reference energy
1                                2        reference bond separation energy (in kcal/mol)
2                                1        'molc' string specifying the start of the first molecular block
2                                2        unique integer identifier, 1 indicating the A• fragment
2                                3        the charge of the A• fragment
2                                4        the multiplicity of the A• fragment
3, …, n1 + 2                     1        element type
3, …, n1 + 2                     2        X coordinates (in Å)
3, …, n1 + 2                     3        Y coordinates (in Å)
3, …, n1 + 2                     4        Z coordinates (in Å)
n1 + 3                           1        'end' string specifying the end of the first molecular block
n1 + 4                           1        'molc' string specifying the start of the second molecular block
n1 + 4                           2        unique integer identifier, 1 indicating the B• fragment
n1 + 4                           3        the charge of the B• fragment
n1 + 4                           4        the multiplicity of the B• fragment
n1 + 5, …, n1 + n2 + 4           1        element type
n1 + 5, …, n1 + n2 + 4           2        X coordinates (in Å)
n1 + 5, …, n1 + n2 + 4           3        Y coordinates (in Å)
n1 + 5, …, n1 + n2 + 4           4        Z coordinates (in Å)
n1 + n2 + 5                      1        'end' string specifying the end of the second molecular block
n1 + n2 + 6                      1        'molc' string specifying the start of the third molecular block
n1 + n2 + 6                      2        unique integer identifier, -1 indicating the A-B parent molecule
n1 + n2 + 6                      3        the charge of the A-B parent molecule
n1 + n2 + 6                      4        the multiplicity of the A-B parent molecule
n1 + n2 + 7, …, n1 + n2 + N + 6  1        element type
n1 + n2 + 7, …, n1 + n2 + N + 6  2        X coordinates (in Å)
n1 + n2 + 7, …, n1 + n2 + N + 6  3        Y coordinates (in Å)
n1 + n2 + 7, …, n1 + n2 + N + 6  4        Z coordinates (in Å)
n1 + n2 + N + 7                  1        'end' string specifying the end of the third molecular block
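The layout in Table 2 is simple enough to read with a few lines of Python. The sketch below is a hypothetical reader, not code distributed with the dataset. It stores the integer in column 2 of each 'molc' line as `weight`. Both fragments carry 1 and the parent carries -1, so one plausible reading (an assumption of this sketch) is that these integers are the coefficients in BSE = E(A•) + E(B•) − E(A-B). On that reading, a weighted sum of per-block energies reproduces the reference definition.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # element, x, y, z (in Å)


@dataclass
class MolBlock:
    weight: int        # integer after 'molc' (1 for A• and B•, -1 for A-B)
    charge: int
    multiplicity: int
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class DBRecord:
    ref_kcal: float         # reference BSE in kcal/mol
    blocks: List[MolBlock]  # in file order: A•, B•, A-B


def parse_db(text: str) -> DBRecord:
    """Parse the contents of one .db file laid out as in Table 2 (hypothetical reader)."""
    ref: Optional[float] = None
    blocks: List[MolBlock] = []
    current: Optional[MolBlock] = None
    for line in text.splitlines():
        tok = line.split()
        if not tok:
            continue  # tolerate blank lines
        key = tok[0].lower()
        if key == "ref":
            ref = float(tok[1])
        elif key == "molc":
            current = MolBlock(int(tok[1]), int(tok[2]), int(tok[3]))
        elif key == "end":
            if current is None:
                raise ValueError("'end' without a matching 'molc'")
            blocks.append(current)
            current = None
        elif current is not None:
            x, y, z = map(float, tok[1:4])  # element symbol, then Cartesian coordinates
            current.atoms.append((tok[0], x, y, z))
        else:
            raise ValueError(f"unexpected line outside a molc block: {line!r}")
    if ref is None or len(blocks) != 3:
        raise ValueError("expected one 'ref' line and three molc blocks")
    return DBRecord(ref, blocks)


def weighted_energy_kcal(record: DBRecord, energies_kcal: Sequence[float]) -> float:
    """Sum weight * energy over the blocks; energies must follow the file order."""
    return sum(b.weight * e for b, e in zip(record.blocks, energies_kcal))
```

Comparing `weighted_energy_kcal(rec, energies)` with `rec.ref_kcal` is one way a user might check a low-cost method against a single BSE49 entry.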

File format

For each molecule, the reference bond separation energy and the atomic coordinates are stored in a file named MoleculeName.db. The Cartesian coordinates of the atoms are also stored in files called MoleculeName_AB.xyz, MoleculeName_A.xyz, and MoleculeName_B.xyz, where AB denotes the parent molecule, A the first radical fragment, and B the second radical fragment. The DB format file contains a header line specifying the reference energy (in kcal/mol), followed by three 'molc' (short for molecule) blocks for the A• fragment, the B• fragment, and the parent molecule, in that order. Each block holds a unique integer identifier, the charge, the multiplicity, and the atomic coordinates (in Å). The XYZ format file contains a header line giving the number of atoms N, a comment line containing the charge and multiplicity, and N lines, each containing an element type and the X, Y, and Z coordinates (in Å). The BSE49_Existing.org and BSE49_Hypothetical.org files are plain-text tables with one line per model system and eight columns separated by the '|' character. The columns are: (i) the dataset name of the model system, (ii) the unique integer identifier 1 indicating the A• fragment, (iii) the geometry filename of the A• fragment, (iv) the unique integer identifier 1 indicating the B• fragment, (v) the geometry filename of the B• fragment, (vi) the unique integer identifier -1 indicating the A-B model system, (vii) the geometry filename of the A-B model system, and (viii) the computed reference bond separation energy (in kcal/mol).
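Similarly, one row of BSE49_Existing.org or BSE49_Hypothetical.org can be split on '|' into the eight columns listed above. The sketch below is hypothetical and makes several assumptions about layout. It accepts Org-mode-style rows that may begin and end with '|', and it skips blank lines, comment lines starting with '#', and horizontal rules starting with '|-'. It also treats a row whose second column is not an integer as a header.

```python
from typing import NamedTuple, Optional


class OrgRow(NamedTuple):
    name: str        # (i) dataset name of the model system
    w_a: int         # (ii) integer for the A• fragment (1)
    geom_a: str      # (iii) geometry file of A•
    w_b: int         # (iv) integer for the B• fragment (1)
    geom_b: str      # (v) geometry file of B•
    w_ab: int        # (vi) integer for the A-B parent (-1)
    geom_ab: str     # (vii) geometry file of A-B
    bse_kcal: float  # (viii) reference BSE in kcal/mol


def _is_int(s: str) -> bool:
    return s.lstrip("+-").isdigit()


def parse_org_row(line: str) -> Optional[OrgRow]:
    """Parse one '|'-separated row; return None for lines that carry no data."""
    s = line.strip()
    if not s or s.startswith("#") or s.startswith("|-"):
        return None
    cols = [c.strip() for c in s.strip("|").split("|")]
    if len(cols) != 8:
        raise ValueError(f"expected 8 columns, got {len(cols)}: {line!r}")
    if not _is_int(cols[1]):
        return None  # assumed header row
    return OrgRow(cols[0], int(cols[1]), cols[2], int(cols[3]), cols[4],
                  int(cols[5]), cols[6], float(cols[7]))
```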

Technical Validation

The reliable (RO)CBS-QB3 method was chosen to generate the reference data for all model systems in the BSE49 dataset. (RO)CBS-QB3 has been widely used in the literature in recent years[69-90]. Its developers reported that it predicts heats of formation at 298 K with a mean absolute deviation (MAD) from experiment of 0.91 kcal/mol[33]. For the bond dissociation enthalpies of eleven molecules with chemical structures typically found in amino acid side chains, peptide termini, and peptide backbones, Moore et al. reported an MAD of 1.72 kcal/mol from the experimental values[8]. For small lignin model molecules, CBS-QB3 was shown to yield bond dissociation enthalpies within 2.99 kcal/mol of experimental values[34]. (RO)CBS-QB3 also served as the reference method for benchmarking density functional theory methods for bond dissociation enthalpies in a separate study of small lignin model systems[23]. Hudzik and co-workers used CBS-QB3 to study the C-H bond separation energies of a few alkanes and reported good agreement with literature values[35]. Menon et al. used (RO)CBS-QB3 to predict bond dissociation enthalpies[36] and reported an MAD from experiment of only 0.60 kcal/mol. Compared with the other composite methods they tested, they recommended it as a reliable and efficient procedure for calculating bond separation energies. In another work, the bond dissociation enthalpies of 200 molecules were calculated with CBS-Q, an earlier version of the composite method used here, and were shown to agree with the reported experimental values to within 2.39 kcal/mol[37]. 
Collectively, these results support the selection of (RO)CBS-QB3 as a practical and accurate method for generating the reference data in this work. Note that the reference bond separation energies reported here are non-relativistic (RO)CBS-QB3 energy differences without zero-point energy corrections. This makes the reference data suitable for supporting the development of low-cost computational chemistry methods such as those described in references[28-30,38-40].
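The validation studies above quote accuracy as a mean absolute deviation (MAD) from reference values. The same statistic is a natural first metric when a low-cost method is assessed against BSE49, optionally broken down by bond type as in Table 1. The following is a minimal Python sketch. The record layout `(bond_type, predicted, reference)` is an assumption of this example, with all energies in kcal/mol.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mad(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean absolute deviation between predicted and reference values."""
    errors = [abs(pred - ref) for pred, ref in pairs]
    if not errors:
        raise ValueError("no data")
    return sum(errors) / len(errors)


def mad_by_bond_type(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Group (bond_type, predicted, reference) records and compute one MAD per bond type."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for bond, pred, ref in records:
        groups[bond].append((pred, ref))
    return {bond: mad(pairs) for bond, pairs in groups.items()}
```

Reporting per-bond-type MADs alongside the overall value helps expose bond types where a method underperforms, which a single dataset-wide average can hide.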
Measurement(s): bond separation energies
Technology Type(s): ab initio quantum chemistry computational method
Factor Type(s): existing or hypothetical model structure
References (partial list, 10 of 33)

1.  Dissociation energies of X-H bonds in amino acids.

Authors:  Benjamin N Moore; Ryan R Julian
Journal:  Phys Chem Chem Phys       Date:  2012-01-30       Impact factor: 3.676

2.  How the Co-C bond is cleaved in coenzyme B12 enzymes: a theoretical study.

Authors:  Kasper P Jensen; Ulf Ryde
Journal:  J Am Chem Soc       Date:  2005-06-29       Impact factor: 15.419

3.  Field regulation of single-molecule conductivity by a charged surface atom.

Authors:  Paul G Piva; Gino A DiLabio; Jason L Pitters; Janik Zikovsky; Moh'd Rezeq; Stanislav Dogel; Werner A Hofer; Robert A Wolkow
Journal:  Nature       Date:  2005-06-02       Impact factor: 49.962

4.  Antiradical and antioxidant activities of new bio-antioxidants.

Authors:  V D Kancheva; L Saso; S E Angelova; M C Foti; A Slavova-Kasakova; C Daquino; V Enchev; O Firuzi; J Nechev
Journal:  Biochimie       Date:  2011-08-25       Impact factor: 4.079

5.  The impact of carbon-hydrogen bond dissociation energies on the prediction of the cytochrome P450 mediated major metabolic site of drug-like compounds.

Authors:  Kurt L M Drew; Jóhannes Reynisson
Journal:  Eur J Med Chem       Date:  2012-08-16       Impact factor: 6.514

6.  Prediction of drug candidates' sensitivity toward autoxidation: computational estimation of C-H dissociation energies of carbon-centered radicals.

Authors:  Thomas Andersson; Anders Broo; Emma Evertsson
Journal:  J Pharm Sci       Date:  2014-05-13       Impact factor: 3.534

7.  Predicting the activity of phenolic antioxidants: theoretical method, analysis of substituent effects, and application to major families of antioxidants.

Authors:  J S Wright; E R Johnson; G A DiLabio
Journal:  J Am Chem Soc       Date:  2001-02-14       Impact factor: 15.419

8.  Co-C Bond Dissociation Energies in Cobalamin Derivatives and Dispersion Effects: Anomaly or Just Challenging?

Authors:  Zheng-wang Qu; Andreas Hansen; Stefan Grimme
Journal:  J Chem Theory Comput       Date:  2015-03-10       Impact factor: 6.006

9.  The Cobalt-Methyl Bond Dissociation in Methylcobalamin: New Benchmark Analysis Based on Density Functional Theory and Completely Renormalized Coupled-Cluster Calculations.

Authors:  Pawel M Kozlowski; Manoj Kumar; Piotr Piecuch; Wei Li; Nicholas P Bauman; Jared A Hansen; Piotr Lodowski; Maria Jaworska
Journal:  J Chem Theory Comput       Date:  2012-05-04       Impact factor: 6.006

10.  Computational study of bond dissociation enthalpies for lignin model compounds. Substituent effects in phenethyl phenyl ethers.

Authors:  Ariana Beste; A C Buchanan
Journal:  J Org Chem       Date:  2009-04-03       Impact factor: 4.354

