Mohammad M Ghahremanpour, Paul J van Maaren, David van der Spoel.
Abstract
Data quality and library size are both crucial for force field development. To predict molecular properties across a large chemical space, the foundation on which force fields are built needs to encompass a wide variety of chemical compounds, and the tabulated molecular physicochemical properties need to be accurate. Because of the limited transparency of the data used to develop existing force fields, data quality is hard to establish and reusability is low. This paper presents the Alexandria library, an open and freely accessible database of optimized molecular geometries, frequencies, electrostatic moments up to the hexadecupole, electrostatic potential, polarizabilities, and thermochemistry, obtained from quantum chemistry calculations for 2704 compounds. Values are tabulated and, where available, compared to experimental data. This library can assist the systematic development and training of empirical force fields for a broad range of molecules.
Year: 2018 PMID: 29633987 PMCID: PMC5892371 DOI: 10.1038/sdata.2018.62
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
The number of calculations for each quantum-chemical method in the library. G2, G3, G4, CBS-QB3, W1BD, and W1U were used to calculate thermochemical properties. The HF/6-311G** and B3LYP/aug-cc-pVTZ levels of theory were used to optimize the molecular geometries and to determine the electric moments, polarizability, molecular electrostatic potential map, atomic partial charges, vibrational frequencies, and zero-point vibrational energy. Not all calculations have been done for all compounds, so some of the numbers below are lower than the total number of compounds.
| Method | Number of calculations |
|---|---|
| B3LYP/aug-cc-pVTZ | 2500 |
| CBS-QB3 | 2179 |
| G2 | 2096 |
| G3 | 2090 |
| G4 | 2091 |
| HF/6-311G** | 2537 |
| W1BD | 705 |
| W1U | 606 |
Compound information provided in the repository.
| # | Field |
|---|---|
| 1 | IUPAC name |
| 2 | Formula |
| 3 | Total charge |
| 4 | Multiplicity |
| 5 | CAS number |
| 6 | ChemSpider ID (CSID) |
| 7 | PubChem ID (CID) |
| 8 | Number of rotatable bonds |
| 9 | StdInChI |
| 10 | InChIKey |
Root mean square deviation (RMSD) from experiment for polarizability α and dipole moment μ for compounds where calculations were done at both levels of theory. The RMSD and its error bar (in parentheses) are obtained by bootstrapping with 100 iterations. N is the number of compounds, which is limited by the availability of experimental data.
| Property | N | HF/6-311G** | B3LYP/aug-cc-pVTZ |
|---|---|---|---|
| α | 1198 | 2.39(0.004) | 0.43(0.006) |
| μ | 542 | 0.48(0.002) | 0.30(0.003) |
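The bootstrapped RMSD and error bar described above can be illustrated with a minimal sketch. It assumes paired lists of calculated and experimental values, resampling of compounds with replacement, and the standard deviation over resamples as the error bar; the function names, the seed, and the example numbers are hypothetical and are not taken from the Alexandria library or its tooling.

```python
import math
import random


def rmsd(calc, expt):
    """Root mean square deviation between paired calculated and experimental values."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, expt)) / len(calc))


def bootstrap_rmsd(calc, expt, n_iter=100, seed=0):
    """Return (mean, standard deviation) of the RMSD over bootstrap resamples.

    Assumption: compounds are resampled with replacement, and the spread of
    the resampled RMSD values is used as the error bar.
    """
    rng = random.Random(seed)
    n = len(calc)
    samples = []
    for _ in range(n_iter):
        # Draw n compound indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        samples.append(rmsd([calc[i] for i in idx], [expt[i] for i in idx]))
    mean = sum(samples) / n_iter
    var = sum((s - mean) ** 2 for s in samples) / (n_iter - 1)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    # Hypothetical dipole moments, for illustration only.
    calculated = [1.92, 0.11, 2.75, 1.48, 3.10]
    experimental = [1.85, 0.00, 2.88, 1.70, 2.98]
    value, error = bootstrap_rmsd(calculated, experimental, n_iter=100)
    print(f"RMSD = {value:.2f}({error:.3f})")
```

With 100 iterations, as in the table note, the error bar is itself only a rough estimate; a fixed seed keeps the result reproducible.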
Figure 1. Residual plot for the isotropic polarizability α as calculated at two levels of theory.
Figure 2. Residual plot for the dipole moment μ as calculated at two levels of theory.
Chemical space analysis of polarizability α. N is the number of compounds with experimental data.
| Formula | Compounds | Mean α (SD) | N | RMSD | MSE |
|---|---|---|---|---|---|
| C4H6O2 | 8 | 9(0.3) | 6 | 0.1 | 0.1 |
| C4H8O2 | 9 | 9(0.2) | 9 | 0.2 | 0.0 |
| C4H10O2 | 9 | 9(0.1) | 8 | 0.1 | 0.0 |
| C5H8 | 12 | 10(0.5) | 10 | 0.2 | 0.2 |
| C5H10 | 10 | 10(0.3) | 9 | 0.2 | −0.2 |
| C5H10O2 | 11 | 11(0.1) | 11 | 0.1 | −0.1 |
| C5H10O | 10 | 10(0.1) | 9 | 0.1 | −0.1 |
| C5H12O | 11 | 11(0.2) | 10 | 0.3 | −0.2 |
| C6H10 | 35 | 12(0.5) | 28 | 0.3 | 0.1 |
| C6H12O2 | 10 | 13(0.0) | 8 | 0.1 | −0.0 |
| C6H12 | 30 | 12(0.3) | 29 | 0.2 | −0.2 |
| C6H12O | 11 | 12(0.3) | 9 | 0.2 | −0.1 |
| C6H14O | 14 | 12(0.1) | 13 | 0.2 | −0.1 |
| C7H9N | 12 | 14(0.3) | 7 | 0.1 | 0.1 |
| C7H12 | 23 | 13(0.2) | 21 | 0.2 | −0.1 |
| C7H14 | 44 | 13(0.3) | 41 | 0.2 | −0.2 |
| C7H14O | 17 | 14(0.2) | 8 | 0.2 | −0.2 |
| C7H16 | 9 | 14(0.1) | 9 | 0.3 | −0.3 |
| C8H10O | 9 | 15(0.1) | 5 | 0.2 | 0.1 |
| C8H11N | 9 | 16(0.4) | 5 | 0.4 | 0.2 |
| C8H16 | 113 | 15(0.3) | 109 | 0.3 | −0.2 |
| C8H18 | 18 | 15(0.1) | 16 | 0.4 | −0.4 |
| C9H10 | 8 | 16(0.4) | 6 | 0.7 | 0.6 |
| C9H12 | 8 | 16(0.1) | 7 | 0.1 | 0.1 |
| C9H18 | 31 | 17(0.2) | 25 | 0.4 | −0.4 |
| C9H18O | 9 | 17(0.1) | 2 | 0.5 | −0.5 |
| C9H20 | 16 | 17(0.1) | 6 | 0.2 | −0.2 |
| C10H14 | 19 | 18(0.1) | 12 | 0.2 | 0.1 |
| C10H22 | 14 | 19(0.1) | 3 | 0.1 | −0.0 |
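A per-formula error summary of this kind can be sketched as follows. The sketch assumes that the last two columns are the RMSD and the mean signed error with respect to experiment, and that the error is taken as calculated minus experimental; neither the sign convention nor the function name `per_formula_errors` comes from the source, and the input values are hypothetical.

```python
import math
from collections import defaultdict


def per_formula_errors(records):
    """Group (formula, calculated, experimental) records by formula.

    Returns {formula: (N, RMSD, mean signed error)}.
    Assumption: the error is defined as calculated minus experimental.
    """
    groups = defaultdict(list)
    for formula, calc, expt in records:
        groups[formula].append(calc - expt)

    stats = {}
    for formula, errors in groups.items():
        n = len(errors)
        mean_signed = sum(errors) / n
        rmsd = math.sqrt(sum(e * e for e in errors) / n)
        stats[formula] = (n, rmsd, mean_signed)
    return stats


if __name__ == "__main__":
    # Hypothetical polarizabilities for a few isomers, for illustration only.
    data = [
        ("C5H8", 10.1, 9.9),
        ("C5H8", 9.8, 10.0),
        ("C7H16", 13.6, 13.9),
    ]
    for formula, (n, rmsd, mse) in per_formula_errors(data).items():
        print(f"{formula}: N={n} RMSD={rmsd:.1f} MSE={mse:.1f}")
```

Because the RMSD can never be smaller than the absolute value of the mean signed error, rows where the two magnitudes coincide indicate errors that all have the same sign and nearly the same size.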
Chemical space analysis of standard entropy S0. N is the number of compounds with experimental data.
| Formula | Compounds | Mean S0 (SD) | N | RMSD | MSE |
|---|---|---|---|---|---|
| C4H8O2 | 12 | 349(29.0) | 12 | 15.9 | 8.1 |
| C4H10O2 | 8 | 384(11.8) | 7 | 16.5 | −8.9 |
| C5H8 | 11 | 318(16.0) | 10 | 4.8 | 1.1 |
| C5H10 | 10 | 327(18.1) | 10 | 8.1 | 0.1 |
| C5H10O2 | 11 | 394(11.7) | 9 | 10.4 | 4.1 |
| C5H12O | 12 | 381(11.5) | 11 | 7.0 | −3.4 |
| C6H10 | 20 | 354(17.7) | 17 | 8.1 | 0.6 |
| C6H12O2 | 10 | 444(21.2) | 10 | 18.6 | −1.8 |
| C6H12 | 19 | 368(20.6) | 19 | 5.8 | −0.9 |
| C6H12O | 8 | 402(30.2) | 5 | 8.8 | −0.9 |
| C6H14O2 | 8 | 461(22.2) | 4 | 17.1 | −9.7 |
| C6H14O | 14 | 424(12.2) | 9 | 8.1 | −2.6 |
| C7H9N | 9 | 355(6.8) | 7 | 10.2 | 3.4 |
| C7H12 | 23 | 375(26.3) | 23 | 9.5 | −2.5 |
| C7H14 | 20 | 395(29.9) | 19 | 8.1 | −2.7 |
| C7H14O | 15 | 417(33.4) | 6 | 17.0 | −0.1 |
| C7H16 | 9 | 408(14.9) | 9 | 9.4 | 5.3 |
| C8H10O | 12 | 395(5.2) | 10 | 10.2 | −8.0 |
| C8H16 | 31 | 414(37.2) | 30 | 7.9 | −0.6 |
| C8H18 | 18 | 441(18.9) | 17 | 6.2 | 1.9 |
| C9H10 | 8 | 382(16.6) | 7 | 9.8 | −1.2 |
| C9H12 | 10 | 392(9.2) | 7 | 7.3 | 3.7 |
| C9H18 | 9 | 463(41.8) | 2 | 2.5 | −2.5 |
| C9H20 | 16 | 470(25.7) | 6 | 17.3 | 12.9 |
| C10H14 | 20 | 428(10.2) | 5 | 9.3 | 6.0 |
| C10H22 | 14 | 522(23.3) | 2 | 8.1 | 7.6 |
Chemical space analysis of heat capacity at constant volume Cv. N is the number of compounds with experimental data.
| Formula | Compounds | Mean Cv (SD) | N | RMSD | MSE |
|---|---|---|---|---|---|
| C4H8O2 | 9 | 97(8.7) | 9 | 5.2 | 0.8 |
| C4H8O | 8 | 88(7.9) | 7 | 4.9 | −3.6 |
| C5H8 | 12 | 91(7.4) | 11 | 3.9 | −2.8 |
| C5H10 | 10 | 97(8.6) | 10 | 5.7 | −2.9 |
| C5H10O2 | 11 | 126(2.9) | 9 | 9.3 | −6.9 |
| C5H12O | 12 | 127(4.3) | 11 | 7.6 | −6.4 |
| C6H10 | 16 | 110(11.1) | 13 | 5.5 | −2.0 |
| C6H12O2 | 9 | 148(2.4) | 9 | 5.6 | −3.3 |
| C6H12 | 19 | 121(9.2) | 19 | 8.0 | −4.5 |
| C6H14O | 11 | 149(2.0) | 7 | 10.0 | −9.0 |
| C7H9N | 9 | 117(2.7) | 7 | 3.6 | −0.4 |
| C7H12 | 23 | 131(11.4) | 23 | 6.5 | −4.4 |
| C7H14 | 19 | 137(9.8) | 18 | 6.2 | −5.2 |
| C7H14O | 14 | 151(10.8) | 6 | 11.6 | −5.5 |
| C7H16 | 8 | 156(2.7) | 8 | 8.6 | −7.6 |
| C8H10O | 12 | 143(8.6) | 10 | 8.0 | −5.4 |
| C8H16 | 18 | 156(10.7) | 18 | 8.0 | −7.1 |
| C8H18 | 18 | 179(2.8) | 17 | 10.4 | −9.7 |
| C9H10 | 8 | 133(4.8) | 7 | 3.2 | −2.2 |
| C9H12 | 10 | 143(4.3) | 7 | 3.8 | −1.3 |
| C9H20 | 16 | 201(3.4) | 6 | 14.7 | −14.6 |
| C10H14 | 20 | 170(3.8) | 5 | 5.3 | −4.4 |
| C10H22 | 14 | 223(2.5) | 2 | 17.6 | −17.5 |
Chemical space analysis of enthalpy of formation ΔH0. N is the number of compounds with experimental data.
| Formula | Compounds | Mean ΔH0 (SD) | N | RMSD | MSE |
|---|---|---|---|---|---|
| C4H8O2 | 12 | −379(63.9) | 12 | 14.0 | 9.3 |
| C4H8O | 8 | −172(42.3) | 7 | 3.5 | −0.8 |
| C4H10O2 | 10 | −404(66.4) | 9 | 14.6 | 10.9 |
| C5H8 | 12 | 114(39.3) | 11 | 3.2 | 1.7 |
| C5H10 | 12 | −23(21.7) | 12 | 5.6 | 2.0 |
| C5H10O2 | 11 | −457(36.2) | 9 | 10.5 | 2.6 |
| C5H10O | 9 | −244(13.6) | 8 | 8.4 | 3.8 |
| C5H12O | 12 | −293(21.3) | 11 | 7.8 | −0.7 |
| C6H10 | 20 | 68(41.8) | 17 | 7.5 | 2.9 |
| C6H12O2 | 11 | −470(56.7) | 11 | 19.1 | 10.5 |
| C6H12 | 28 | −52(23.7) | 28 | 8.5 | 3.3 |
| C6H12O | 8 | −267(33.9) | 5 | 2.7 | −1.1 |
| C6H14O2 | 9 | −466(34.4) | 4 | 17.8 | 11.9 |
| C6H14O | 14 | −317(19.3) | 9 | 6.5 | 0.7 |
| C7H9N | 9 | 69(15.2) | 7 | 7.1 | 1.3 |
| C7H12 | 28 | 31(54.9) | 26 | 8.5 | −0.9 |
| C7H14 | 44 | −88(23.0) | 43 | 7.2 | 2.6 |
| C7H14O | 15 | −316(30.4) | 6 | 23.7 | 14.6 |
| C7H16 | 9 | −197(6.3) | 9 | 3.2 | 0.7 |
| C8H10O | 12 | −148(17.9) | 10 | 10.6 | 3.4 |
| C8H16 | 104 | −111(25.3) | 102 | 6.3 | 1.8 |
| C8H18 | 18 | −217(5.0) | 17 | 7.7 | 4.9 |
| C9H10 | 8 | 113(21.0) | 7 | 3.0 | −2.3 |
| C9H12 | 10 | 27(63.1) | 7 | 3.4 | −3.3 |
| C9H18 | 9 | −146(41.0) | 2 | 22.1 | 18.8 |
| C9H20 | 16 | −237(6.5) | 6 | 25.8 | 19.9 |
| C10H14 | 20 | −27(8.2) | 5 | 15.0 | 6.1 |
| C10H22 | 14 | −260(7.8) | 2 | 3.4 | −3.4 |