László Gyevi-Nagy, Mihály Kállay, Péter R Nagy.
Abstract
The accurate and systematically improvable frozen natural orbital (FNO) and natural auxiliary function (NAF) cost-reducing approaches are combined with our recent coupled-cluster singles, doubles, and perturbative triples [CCSD(T)] implementations. Both the closed- and open-shell FNO-CCSD(T) codes benefit from OpenMP parallelism, completely or partially integral-direct density-fitting algorithms, checkpointing, and hand-optimized, memory- and operation-count-efficient implementations exploiting all permutational symmetries. The closed-shell CCSD(T) code requires negligible disk I/O and network bandwidth, is MPI/OpenMP parallel, and exhibits outstanding peak performance utilization of 50-70% up to hundreds of cores. Conservative FNO and NAF truncation thresholds benchmarked for challenging reaction, atomization, and ionization energies of both closed- and open-shell species are shown to maintain 1 kJ/mol accuracy against canonical CCSD(T) for systems of 31-43 atoms even with large basis sets. The cost reduction of up to an order of magnitude achieved extends the reach of FNO-CCSD(T) to systems of 50-75 atoms (up to 2124 atomic orbitals) with triple- and quadruple-ζ basis sets, which is unprecedented without local approximations. Consequently, a considerably larger portion of the chemical compound space can now be covered by the practically "gold standard" quality FNO-CCSD(T) method using affordable resources and about a week of wall time. Large-scale applications are presented for organocatalytic and transition-metal reactions as well as noncovalent interactions. Possible applications for benchmarking local CCSD(T) methods, as well as for the accuracy assessment or parametrization of less complete models, for example, density functional approximations or machine learning potentials, are also outlined.
Year: 2021 PMID: 33400527 PMCID: PMC7884001 DOI: 10.1021/acs.jctc.0c01077
Source DB: PubMed Journal: J Chem Theory Comput ISSN: 1549-9618 Impact factor: 6.006
Wall Times and Corresponding Peak Performance Utilizations Measured for Medium-Sized Systems
| species | atoms | no. of AOs | no. of AFs | FNO threshold | % NO | % NAF | CCSD wall time [min] | (T) wall time [h] | CCSD % performance | (T) % performance |
|---|---|---|---|---|---|---|---|---|---|---|
| FLPO | 41 | 1037 | 2500 | 5 × 10⁻⁵ | 60 | 39 | 13 | 15 | 47 | 50 |
| | | | | 10⁻⁵ | 81 | 53 | 40 | 57 | 53 | 45 |
| TSAdd | 43 | 1071 | 2578 | 5 × 10⁻⁵ | 59 | 38 | 15 | 16 | 47 | 55 |
| | | | | 10⁻⁵ | 81 | 53 | 47 | 74 | 53 | 42 |
| FLPA | 43 | 1071 | 2578 | 5 × 10⁻⁵ | 59 | 38 | 15 | 16 | 47 | 55 |
| | | | | 10⁻⁵ | 81 | 53 | 45 | 75 | 55 | 41 |
| OO | 40 | 1089 | 2620 | 5 × 10⁻⁵ | 63 | 41 | 5 | 4.6 | 35 | 55 |
| | | | | 10⁻⁵ | 82 | 53 | 13 | 13 | 37 | 58 |
| TS1 | 40 | 1089 | 2620 | 5 × 10⁻⁵ | 63 | 41 | 5 | 4.9 | 34 | 53 |
| | | | | 10⁻⁵ | 83 | 54 | 13 | 16 | 38 | 46 |
| ABP | 31 | 1569 | 3671 | 5 × 10⁻⁵ | 41 | 31 | 4 | 2.6 | 35 | 51 |
| | | | | 10⁻⁵ | 65 | 48 | 17 | 13 | 44 | 60 |
Percentage of active virtual NOs.
Percentage of retained NAFs with the threshold set to 5 × 10⁻².
Time of one iteration.
Performed on two 12-core Intel Xeon E5-2670 v3 CPUs clocked at 2.3 GHz.
Performed on four 28-core Intel Xeon Platinum 8180M CPUs clocked at 1.7 GHz.
Dimensions of the Various Orbital Spaces Employed for the Largest Systems of 47–75 Atoms, As Well As the Corresponding FNO-CCSD(T) Correlation Energies
| species | atoms | basis set | no. of AOs | no. of AFs | % NO | % NAF | FNO-CCSD(T) correlation energy |
|---|---|---|---|---|---|---|---|
| ED28 | 47 | def2-QZVP | 1978 | 4469 | 66 | 50 | −4.1527 |
| PR28 | 51 | def2-QZVP | 2124 | 4745 | 65 | 49 | −4.2185 |
| enamine | 57 | def2-TZVP | 998 | 2478 | 86 | 52 | −4.6411 |
| Corannulene dimer | 60 | def2-TZVPPD | 1820 | 4460 | 66 | 42 | −6.6704 |
| GC-dDMP-B | 63 | 6-311++G(d,p) | 1042 | 5320 | 80 | 23 | −6.6211 |
| TSCCRS | 75 | def2-TZVP | 1381 | 3419 | 86 | 53 | −6.7300 |
Percentage of active virtual NOs. The NO threshold was set to 10⁻⁵ for the ED28 and PR28 molecules and to 5 × 10⁻⁵ for the other species.
Percentage of retained NAFs with the threshold set to 5 × 10⁻².
Figure 1. Relationship of the various truncation strategies for the FNO (left panel) and the NAF (right panel) approximations. The values were averaged over the molecules of the AW test set.
Figure 2. Percentage error of the CCSD(T) correlation energy as a function of the discarded cumulative occupation (1 − COT) (left panel) and the percentage error of the MP2 correlation energy (right panel). Symbols and colors refer to five ONT thresholds ranging from 10⁻⁴ to 10⁻⁶, while individual points mark a single species of the AW test set with the cc-pVQZ basis.
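The two truncation measures in Figure 2 can be related with a few lines of code. The following is a minimal sketch, assuming that ONT denotes a cutoff on the individual virtual NO occupation numbers and that COT is the retained fraction of the total virtual occupation. The function name and the occupation numbers are hypothetical and are not taken from the paper.

```python
def truncate_virtual_nos(occupations, ont):
    """Apply an occupation-number threshold to a list of virtual NO occupations.

    Returns the percentage of retained NOs, the retained cumulative
    occupation (assumed here to correspond to COT), and the discarded
    fraction (1 - COT).
    """
    occ = sorted(occupations, reverse=True)
    kept = [n for n in occ if n >= ont]
    total = sum(occ)
    cot = sum(kept) / total
    return {
        "percent_no": 100.0 * len(kept) / len(occ),
        "cot": cot,
        "discarded": 1.0 - cot,
    }


# Hypothetical occupation numbers, for illustration only.
example_occupations = [2e-2, 5e-3, 1e-3, 2e-4, 4e-5, 8e-6, 1e-6]
result = truncate_virtual_nos(example_occupations, ont=1e-5)
print(f"retained NOs: {result['percent_no']:.1f}%")
print(f"discarded cumulative occupation: {result['discarded']:.2e}")
```

Because the occupation numbers typically decay quickly, a tight ONT can discard a sizable share of the orbitals while discarding only a small fraction of the cumulative occupation.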
Average Relative (Maximum) Error of Correlation Energies as the Percentage of the Conventional CCSD(T) Correlation Energy for the AW Test Set with the cc-pVQZ Basis Set and Various FNO Truncation Thresholds
| technique | threshold 10⁻⁴ | threshold 3.16 × 10⁻⁵ | threshold 10⁻⁵ | threshold 3.16 × 10⁻⁶ | threshold 10⁻⁶ |
|---|---|---|---|---|---|
| uncorrected | 4.22 (5.32) | 1.41 (2.43) | 0.40 (1.18) | 0.07 (0.19) | 0.00 (0.03) |
| ΔMP2 | 0.93 (2.72) | 0.51 (2.55) | 0.20 (1.23) | 0.04 (0.16) | 0.00 (0.03) |
| COT linear | 0.14 (1.79) | 0.03 (0.43) | 0.01 (0.02) | 0.00 (0.01) | |
| MP2 linear | 0.05 (0.12) | ||||
| Shanks | 0.04 (0.10) | ||||
| Richardson 2 | |||||
The best performing methods are highlighted in bold for each ONT value.
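The "Shanks" rows refer to a sequence extrapolation. As a minimal sketch, assuming the textbook three-point Shanks transformation applied to correlation energies computed at three successively tighter thresholds, the extrapolated value can be obtained as follows. This record does not specify which thresholds the authors combine, so the function name and the input energies are hypothetical.

```python
def shanks(a_prev, a_curr, a_next):
    """Three-point Shanks transformation of consecutive sequence elements.

    Written in the difference form, which is algebraically equal to
    (a_next * a_prev - a_curr**2) / (a_next + a_prev - 2 * a_curr)
    but less prone to cancellation errors.
    """
    d_next = a_next - a_curr
    d_prev = a_curr - a_prev
    denom = d_next - d_prev
    if denom == 0.0:
        # No curvature information: fall back to the last element.
        return a_next
    return a_next - d_next * d_next / denom


# Hypothetical energies converging geometrically toward -1.0.
energies = [-0.96, -0.988, -0.9964]
extrapolated = shanks(*energies)
print(f"extrapolated energy: {extrapolated:.6f}")
```

The transformation is exact for a sequence whose error decays geometrically, which is why it can help when the truncation error shrinks by a roughly constant factor between thresholds.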
Average (Maximum) Error of Reaction Energies [in kJ/mol] Compared to Conventional CCSD(T) Calculations for the AW Test Set with the cc-pVQZ Basis Set and Various FNO Truncation Thresholds
| technique | threshold 10⁻⁴ | threshold 3.16 × 10⁻⁵ | threshold 10⁻⁵ | threshold 3.16 × 10⁻⁶ | threshold 10⁻⁶ |
|---|---|---|---|---|---|
| uncorrected | 4.31 (14.81) | 1.90 (6.55) | 0.61 (2.65) | 0.28 (1.15) | 0.05 (0.17) |
| ΔMP2 | 0.71 (2.24) | 0.13 (0.52) | 0.03 (0.12) | ||
| COT linear | 1.62 (6.49) | 0.30 (1.58) | 0.09 (0.67) | 0.03 (0.09) | |
| MP2 linear | 0.78 (2.96) | 0.22 (1.18) | 0.03 (0.13) | ||
| Shanks | 0.23 (1.22) | 0.03 (0.13) | 0.01 (0.08) | ||
| Richardson 2 | 0.21 (0.97) | ||||
The best performing methods are highlighted in bold for each ONT value.
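The "ΔMP2" rows refer to a correction based on MP2 energies. A minimal sketch, assuming the commonly used form in which the MP2 correlation energy lost by truncating the virtual space is added back to the truncated CCSD(T) energy, is shown below. The function name and all energies are hypothetical.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # conversion factor from Eh to kJ/mol


def delta_mp2_corrected(e_ccsdt_fno, e_mp2_full, e_mp2_fno):
    """Add the MP2 energy missing from the truncated (FNO) space.

    e_ccsdt_fno: CCSD(T) correlation energy in the truncated NO space
    e_mp2_full:  MP2 correlation energy in the full virtual space
    e_mp2_fno:   MP2 correlation energy in the truncated NO space
    All energies are in Hartree.
    """
    return e_ccsdt_fno + (e_mp2_full - e_mp2_fno)


# Hypothetical correlation energies in Hartree.
e_corrected = delta_mp2_corrected(-1.2000, -1.1500, -1.1450)
correction_kj = (e_corrected - (-1.2000)) * HARTREE_TO_KJ_PER_MOL
print(f"corrected energy: {e_corrected:.4f} Eh")
print(f"size of the correction: {correction_kj:.2f} kJ/mol")
```

The correction costs essentially nothing extra when both MP2 energies are already available from building the natural orbitals.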
Figure 3. Errors of CCSD(T) reaction energies [in kJ/mol] for the AW test set with various basis sets and truncation thresholds separately for the FNO (left panel) and NAF (right panel) approximations. For clarity, the 3.69 kJ/mol RMS and 15.68 kJ/mol MAX errors obtained with the 10⁻¹ NAF threshold are not shown.
Figure 4. Average percentage of retained virtual NOs (left panel) and NAFs (right panel) for the AW test set with various basis sets and truncation thresholds. The plotted numerical data are collected in Tables S5 and S8 of the Supporting Information.
Combined FNO and NAF Truncation Errors (in kJ/mol) Including All MP2-Based Corrections for the Reaction Energies of the AW and the NWH Test Sets Using 10⁻⁵ FNO and 5 × 10⁻² NAF Thresholds
| test set | basis | MAE | MAX | RMS |
|---|---|---|---|---|
| AW | cc-pVDZ | 0.75 | 2.85 | 1.02 |
| | cc-pVTZ | 0.19 | 0.75 | 0.24 |
| | cc-pVQZ | 0.18 | 0.67 | 0.24 |
| | CBS(D,T) | 0.41 | 1.93 | 0.55 |
| | CBS(T,Q) | 0.30 | 1.18 | 0.41 |
| NWH | cc-pVTZ | 0.10 | 0.37 | 0.14 |
CBS(X,X+1) denotes results obtained with the basis set extrapolation using the cc-pVXZ and cc-pV(X+1)Z basis sets.
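The footnote does not state which extrapolation formula is behind CBS(X,X+1). As a minimal sketch, assuming the widely used two-point inverse-cubic formula for correlation energies, the extrapolation can be written as below. The function name, the exponent default, and the energies are assumptions for illustration.

```python
def cbs_two_point(e_x, e_y, x, y=None, power=3.0):
    """Two-point basis set extrapolation of a correlation energy.

    Assumes E(X) = E_CBS + A * X**(-power), where X is the cardinal
    number of cc-pVXZ (D=2, T=3, Q=4). `e_x` and `e_y` are the energies
    obtained with cardinal numbers `x` and `y` (default y = x + 1).
    """
    if y is None:
        y = x + 1
    wx, wy = x ** power, y ** power
    return (wy * e_y - wx * e_x) / (wy - wx)


# Hypothetical cc-pVTZ and cc-pVQZ correlation energies in Hartree.
e_tz, e_qz = -0.99, -0.99578125
e_cbs = cbs_two_point(e_tz, e_qz, x=3)
print(f"CBS(T,Q) estimate: {e_cbs:.6f} Eh")
```

Since the extrapolation weights the larger-basis energy by more than one, truncation errors in the individual energies can be amplified in the CBS value, which is consistent with the CBS rows showing somewhat larger errors than the corresponding single-basis rows.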
Combined FNO and NAF Truncation Errors (in kJ/mol) Including All MP2-Based Corrections for the Atomization Energies of the HEAT and the Ionization Potentials of the VIP Test Sets Using 10⁻⁵ FNO and 5 × 10⁻² NAF Thresholds
| test set | basis | MAE (all reactions) | MAX (all reactions) | RMS (all reactions) | MAE (no hydrogen) | MAX (no hydrogen) | RMS (no hydrogen) |
|---|---|---|---|---|---|---|---|
| HEAT | cc-pVTZ | 0.84 | 1.99 | 0.98 | 1.12 | 1.99 | 1.21 |
| | cc-pVQZ | 0.96 | 2.71 | 1.20 | 0.30 | 0.88 | 0.39 |
| | CBS(T,Q) | 1.34 | 3.96 | 1.73 | 0.57 | 1.46 | 0.74 |
| VIP | aug-cc-pV(T+d)Z | 0.24 | 0.97 | 0.37 | | | |
Calculated from atomization energies excluding hydrogen-containing species.
Figure 5. FNO-CCSD(T) reaction energy errors compared to the conventional DF-CCSD(T) reference for medium-sized molecules of 31–43 atoms with two FNO thresholds and 5 × 10⁻² as NAF threshold. Notations: “FLP TS” stands for FLPO + H₂ → TSAdd, “FLP reac.” for FLPO + H₂ → FLPA, “orgcat. TS” for en-trans + nitrostyrene → TS1, “orgcat. reac.” for en-trans + nitrostyrene → OO, and “Pd reac.” for AA + BA + TBHP → ABP + TBP + H₂O. The plotted data are collected in Table S9 of the Supporting Information.
Figure 6. Guanine–cytosine deoxydinucleotide monophosphate (GC-dDMP-B) system: 63 atoms, 1042 AOs.[30]
Wall Times for CCSD(T) Calculations for the GC-dDMP-B Molecule Performed with the NWChem,[30] MPQC,[10] and MRCC Suites,[12] as well as the CCSD Program of TeraChem[24]
| program | no. of CPUs | no. of cores | CCSD it. [min] | (T) [day] | % CCSD performance | % (T) performance |
|---|---|---|---|---|---|---|
| NWChem | 1100 | 1100 | 72 | 11 | | |
| NWChem | 20,000 | 160,000 | 13 | 0.06 | 3.4 | 10 |
| MPQC | 128 | 1024 | 43 | 1.98 | 24 | 44 |
| MRCC | 4 | 112 | 67 | 5.48 | 47 | 53 |
| MRCC (FNO & NAF) | 4 | 112 | 31 | 2.27 | 39 | 54 |
Efficiency based on the operation count of an optimal CCSD algorithm estimated as the sum of the operation counts of the sixth-power scaling terms, and in the case of a DF algorithm, the assembly of the four-external two-electron integrals utilizing the full permutational symmetry.
Efficiency based on the operation count of an optimal (T) algorithm estimated as the operation count of the seventh-power scaling terms utilizing the full permutational symmetry.
Performed with 8-core AMD 6276 Interlagos CPUs clocked at 2.3 GHz.
The CCSD calculation utilized one core per node.
Performed with 8-core Intel Xeon E5-2670 CPUs clocked at 2.6 GHz.
The calculation was performed on 8 Tesla V100 GPUs. The “no. of CPUs” column contains the number of GPUs, the “no. of cores” corresponds to the number of CUDA cores.
Performed with four 28-core Intel Xeon Platinum 8180M CPUs clocked at 1.7 GHz.
Calculated with an FNO threshold of 5 × 10⁻⁵ and a NAF threshold of 5 × 10⁻².
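The efficiency footnotes above describe peak performance utilization as a ratio of an estimated operation count to what the hardware could deliver in the measured wall time. A minimal sketch of that arithmetic follows, assuming the theoretical peak is the product of core count, clock rate, and floating-point operations per cycle per core. The function names, the value of 32 operations per cycle, and the operation count are assumptions for illustration and do not come from the paper.

```python
def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak in floating-point operations per second."""
    return cores * clock_hz * flops_per_cycle


def utilization_percent(operation_count, wall_time_s, peak):
    """Share of the theoretical peak actually achieved, in percent."""
    return 100.0 * operation_count / (wall_time_s * peak)


# Four 28-core CPUs at 1.7 GHz, as in the footnote; 32 double-precision
# operations per cycle per core is an assumed hardware figure.
node_peak = peak_flops(cores=4 * 28, clock_hz=1.7e9, flops_per_cycle=32)

# Hypothetical operation count and wall time for one CCSD iteration.
hypothetical_ops = 5.0e15
hypothetical_time_s = 31 * 60
usage = utilization_percent(hypothetical_ops, hypothetical_time_s, node_peak)
print(f"peak: {node_peak:.3e} FLOP/s, utilization: {usage:.1f}%")
```

Because the operation count in the footnotes is that of an optimal algorithm, any extra operations a code performs lower its reported utilization instead of inflating it.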
Figure 7. Ligand exchange reaction of a Ru-complex (reaction 28 of the MOR41 test set[81]).
Figure 8. C–C bond formation step of the Michael addition reaction via the transition state labeled TSCCRS in ref (82).
Figure 9. Concave–convex, eclipsed conformer of the corannulene dimer taken from ref (83).
Figure 10. Reaction and NCIEs of extended molecules using the FNO and NAF thresholds specified in Table . Notation: “Ru reac.” for ED28 + dmpe → PR28 + COD,[81] “organocat. TS” for NS + enamine → TSCCRS,[82] and “NCIE (no CP)” and “NCIE (CP)” for the NCIE of the corannulene dimer without and with CP correction, respectively. The plotted data are collected in Table S10 of the Supporting Information.
Wall Times and Corresponding Peak Performance Utilizations Measured for Large Systems of 47–75 Atoms with the FNO-NAF Thresholds Specified in Table
| species | atoms | no. of AOs | | no. of cores | CCSD wall time [min] | (T) wall time [day] | CCSD % performance | (T) % performance |
|---|---|---|---|---|---|---|---|---|
| ED28 | 47 | 1978 | 57 | 112 | 60 | 3.4 | 48 | 54 |
| PR28 | 51 | 2124 | 60 | 112 | 76 | 5.1 | 49 | 50 |
| enamine | 57 | 998 | 69 | 32 | 74 | 4.7 | 55 | 71 |
| corannulene dimer | 60 | 1820 | 90 | 160 | 80 | 13.5 | 28 | 33 |
| GC-dDMP-B | 63 | 1042 | 103 | 112 | 31 | 2.3 | 39 | 54 |
| TSCCRS | 75 | 1381 | 97 | 112 | 87 | 9.9 | 46 | 50 |
Time of one iteration.
Performed with four 28-core Intel Xeon Platinum 8180M CPUs clocked at 1.7 GHz.
Performed with four 8-core Intel Xeon E5-2609 v4 CPUs clocked at 1.7 GHz.
Performed with eight 20-core Intel Xeon Gold 6138 CPUs clocked at 1.3 GHz.