Literature DB >> 35936506

Mechanistic Insights into Enzyme Catalysis from Explaining Machine-Learned Quantum Mechanical and Molecular Mechanical Minimum Energy Pathways.

Zilin Song¹, Francesco Trozzi¹, Hao Tian¹, Chao Yin¹, Peng Tao¹.

Abstract

With the increasing popularity of machine learning (ML) applications, the demand for explainable artificial intelligence techniques to explain ML models developed for computational chemistry has also emerged. In this study, we present the development of the Boltzmann-weighted cumulative integrated gradients (BCIG) approach for effective explanation of mechanistic insights into ML models trained on high-level quantum mechanical and molecular mechanical (QM/MM) minimum energy pathways. Using the acylation reactions of the Toho-1 β-lactamase and two antibiotics (ampicillin and cefalexin) as the model systems, we show that the BCIG approach could quantitatively attribute the energetic contribution in one system and the relative reactivity of individual steps across different systems to specific chemical processes such as the bond making/breaking and proton transfers. The proposed BCIG contribution attribution method quantifies chemistry-interpretable insights in terms of contributions from each elementary chemical process, which is in agreement with the validating QM/MM calculations and our intuitive mechanistic understandings of the model reactions.

Entities: Chemical

Year: 2022 PMID： 35936506 PMCID： PMC9344433 DOI： 10.1021/acsphyschemau.2c00005

Source DB: PubMed Journal: ACS Phys Chem Au ISSN： 2694-2445

Introduction

Machine learning (ML) has emerged with a great promise to approximate the target function of any form regardless of the a priori knowledge about the underlying correlations among variables. The application of various ML techniques has also advanced theoretical chemistry in various subjects,[1−4] which have been suffering from either the extensive computational demands of high levels of theory[5−8] or the high dimensionality of the chemical and/or conformational spaces.[9−14] Although ML could be introduced to many topics that require accurate and efficient approximations, its performance and effectiveness have been limited by the feature representation and interpretability of the model.[15] In addition to the routinely applied feature representations, unsupervised models and rational statistical procedures have been developed and applied to extract robust feature vectors from the chemical feature space.[16−18] In particular, considerable pioneering efforts have focused on the development of suitable descriptors and accurate ML models for approximating hybrid quantum mechanical/molecular mechanical (QM/MM) potentials.[19−23] With the booming popularity of ML, interest to interpret deep learning (DL) neural networks have synergistically risen as a subfield of great importance, namely the explainable artificial intelligence (XAI).[24] The DL models being interpretable not only elevate our understanding of the learning algorithms but also constitute responsible ML/DL-assisted decision making.[25] Practically, XAI techniques attribute the predicted outcome of DL models to individual feature contributions, therefore, rationalizing the driving forces behind the decision flow in the models that are black boxes. While explicit indicators for feature contributions are straightforward in linear models[26] and are incorporated by design in specific ensemble-based models,[27] explaining neural networks is in general hindered by the high nonlinearity accumulated through the activations of the hidden layers. Based on the assumption that the predicted nonlinear surface could be approximated as linear at local regions, effective importance attribution methods have been proposed based on model gradients.[28] The state-of-the-art XAI techniques, such as the integrated gradients (IG)[29] and the layer-wise relevance propagation,[30] have demonstrated great promise in various explaining tasks such as medical diagnosis[31] and cheminformatics applications.[32] In addition, interpretable-by-design DL architectures effectively making the use of the attention mechanism have been developed for challenging explaining tasks such as protein–ligand binding affinities.[33,34] Whereas DL models have proven to accurately predict various chemical properties, we are particularly interested in that if the DL models could also be post hoc explained to gain mechanistic insights into enzyme catalysis. In this study, the acylation between Toho-1, a class A serine-based β-lactamase (ASβL), and two β-lactam antibiotics, ampicillin (AMP) and cefalexin (CEX), were used as the model reactions (Figure a,b). Theoretical efforts have revealed that the acylation of ASβLs could undergo two distinct pathways, which are distinguished by the reactant protonation states of Lys73 and Glu166 (Figure c).[35−40] Hermann et al.[35,36] first reported that the acylation is initialized by a deprotonated Glu166 acting as the general base, leading to the Glu166 as the sole base pathway which produces the acyl-enzyme product (R1-AE). Meroueh et al.[37] further proposed that the deprotonated Lys73 and protonated Glu166 could concertedly mediate the acylation (R2-AE). Whereas the acylation mechanisms of β-lactamases are considered similar, in general, the reactivity of β-lactamases varies with different β-lactam families.[38] The understanding of the determinant factors that impact the acylation activity of different ligand scaffolds is challenging but critical for future optimizations of β-lactam drugs.[39] Our recent study on the acylation of Toho-1 by AMP and CEX showed that the R1-AE pathways are viable for AMP but prohibitive for CEX, while the R2-AE mechanism is generally accessible for both AMP and CEX.[40] Herein, we aim to unravel the conformational and energetic factors that prime the different pathways observed for the acylation of Toho-1 by AMP and CEX. We first present 800 minimum energy pathways (MEPs) that were obtained from a robust computational workflow combining constrained molecular dynamics (MD) and QM/MM chain-of-state calculations. Feed-forward neural networks were designed to accurately predict the MEP profiles using conformational features selected from an ad hoc univariate variable analysis. In light of the pioneering works on XAI, we developed a thermodynamics-aware contribution attribution method to interpret the ML MEP models. We show that quantitative mechanistic insights into various aspects of enzyme catalysis can be derived from interpreting purpose-oriented ML-MEP models in terms of energy contributions from chemistry-interpretable factors.

Figure 1

Toho-1 β-lactamase, the β-lactam antibiotics, and the acylation pathways of Toho-1. (a) Toho-1/β-lactam complex and the selection of QM atoms. The carbon atoms of the amino acid residues are colored in dark gray. The Cα–Cβ bonds (used as the QM/MM boundary for the amino acid side chains) are colored in cyan. The carbon atoms of the β-lactam ligand are colored in magenta, except the carbonyl carbon in β-lactams, which is colored in dark purple. The catalytic water molecule is uniformly colored in green. All nitrogen, oxygen, sulfur, and hydrogen atoms are colored in blue, red, yellow, and white, respectively. All hydrogen atoms in the ligand are omitted for clarity. (b) Chemical structures of AMP and CEX. The penam and cephem scaffolds are colored in red; (c) Acylation pathways using Glu166 as the general base (R1-AE) and using Glu166/Lys73 concertedly as the general base (R2-AE).

Methods

For clarity, we first present the theoretical framework for contribution attribution of ML/DL models that were trained on an ensemble of QM/MM MEPs. For an ML-MEP model, which is designated as F and is trained on a dataset of MEPs, the contribution of a chemical process c on the p-th MEP can be represented as the “pathway-wise” contribution attributing function Apath The overall contribution of chemical process c in the ensemble of MEPs is the sum of the contribution weighted by the accessibility (the Boltzmann factor) of p-th MEP, which iswhere ΔE is the energy barrier of the p-th MEP, R is the ideal gas constant, T is the temperature, and CN is a normalization factor. The exponential averaging implicitly assumes sample completeness in the MEP datasets, which is mostly impractical for actual MEP calculations. In practice, the direct application of Boltzmann weights would lead to numerical instability with a limited number of sampled MEPs. Alternatively, a probability density function (PDF), PDF(ΔE), could be introduced to smoothen the density of MEP barriers. In its simplest form, the PDF could be a single Gaussian function. Furthermore, in cases where the sampled barrier distribution failed to approximate a Gaussian distribution, alternative density estimators such as Gaussian mixture models (GMMs) or Kernel density estimations could be employed for better approximation. Nonetheless, introducing PDF to eq yields Note that the normalization factor CN′ is In our implementation, 2-component GMMs, which approximate the distribution of MEP barriers using a weighted sum of two independent Gaussian, were used as the PDF for MEP barriers. If the chain-of-states replica path method (RPM, see Computational Details)[41−43] is used for MEP calculations, each transition path is represented by a series of discrete replicated structures (replicas) that connect the reactant and product. The “pathway-wise” contribution Apath could be calculated from the “replica-wise” attribution function Areplica of c to the energy of the r-th replica on the p-th MEPwhere is the total number of replicas in each MEP. As proposed by Sundararajan et al.,[29] for a DL model F, the contribution of the i-th feature x of the feature vector corresponding to a specific sample point can be calculated as the IG along a path γ(α) that connects the sample point with feature vector (where α = 1) and a baseline with feature vector x′ (where α = 0) In our case, the reactant states on the MEPs were selected as the baselines, and the contribution of c at the r-th replica was the integrated partial derivatives (with regard to c) through the intermediate replicas preceding the r-th replica along the MEP. Accordingly, eq is adapted for the discrete reaction pathway aswhere i is the index of the pathway replicas, and (p,i) is the feature vector of the i-th replica on the p-th MEP. The representation of c must be determined to expand the first partial derivative in eq . As noted, c represents a “chemical process” that includes (but is not limited to) bond making/breaking and proton transfers. The progress of the chemical process is commonly represented by the linear combination of multiple order parameters such as atomic distances, often referred to as the reaction coordinates or collective variables. However, we note that the correlation between the atomic distances is highly nonlinear in the evolution along the optimal reaction path obtained from the chain-of-state calculations. Therefore, instead of feeding the reduced representation of linear-combined atomic distances, we used a set of atomic distances that accounts for all chemical processes during the acylation for the training of the DL model F, that iswhere E(p,i) is the energy of the i-th replica on the p-th MEP. Furthermore, owing to the nonlinearity of the correlation between the feature dimensions, the gradients of F with regard to c, which usually correspond to multiple feature dimensions in , cannot be calculated analytically. Therefore, the first partial derivative in eq is computed numericallywhere ε denotes a small perturbative factor (0.01 as in the current study); δ acts as the selector for the feature dimensions included in the chemical process (c). In practice, δ is a multihot-encoded mask to ensure that the perturbation ε is applied only to the features that represent the chemical process of interest; Dist(A,B) stands for the distance metric that accounts for the pathway curvature. The gradient on the pathway curvature reads Combining eqs , 9, and 10, the contribution of the chemical process c to the r-th replica on the p-th pathway is calculated as the integrated partial gradients of the ML-MEP model F as Accordingly, the contribution of c along one MEP Apath(F,c,p) can be calculated by cumulatively summing the integrated partial gradients (eq ). The sign of Apath(F,c,p) gives the interaction between the chemical processes, whereas its absolute values give the perturbative response of the ML-MEP model regarding different c. Therefore, the absolute values of Apath(F,c,p) were used to calculate the weighted contributions in eq . In this study, eq is termed the Boltzmann-weighted cumulative IG (BCIG) contributions. We proceed with the computational details that implement the BCIG approach for interpreting the acylation reactions of Toho/AMP and Toho/CEX.

Computational Details

System Setup

The crystal structure of the Toho-1/benzylpenicillin complex with high resolution (PDB entry: 5KMW)[44] was used as the template to generate the target systems. Toho/AMP and Toho/CEX systems were constructed from the crystal structure of the Toho-1/benzylpenicillin complex, respectively. The parameters for AMP and CEX were obtained from the CHARMM general force field (CGenFF).[45,46] The three engineered mutations (Ala166/Asn274/Asn276) in the crystal structure were modified to Glu166/Arg274/Arg276 as in the wild type enzyme. All titratable residues were protonated under neutral pH based on PropKa calculations and experimental neutron diffraction results.[47,48] Both systems were solvated in 80 Å cubic water boxes. Sodium and chloride ions were added to neutralize the total charges in the systems. The CHARMM36 force field (C36),[49] CGenFF,[50] and TIP3P parameters were used to treat the protein, ligands, and solvent molecules, respectively. The solvent molecules were held rigid with the SHAKE constraint.[51] The nonbonding interactions excluding long-range electrostatic interactions between the atoms using molecular mechanics (MM) were explicitly treated within 10 Å and were smoothened out to zero at 12 Å. The periodic boundary conditions were employed in all three dimensions, and the particle mesh Ewald method[52] was used for the summation of classical long-range electrostatic interactions. The solvated systems were then subjected to 500 steps of steepest descent minimizations followed by 5000 steps of adopted basis Newton–Raphson (ABNR) minimizations using the MM potentials. In order to obtain reliable orientations of the active site, we further optimized the system for 5000 ABNR steps using the semiempirical third-order density-functional tight-binding (DFTB3) potential with the 3OB parameter set for the QM region and CHARMM36 force field for the MM region (DFTB3/3OB/C36).[53,54] The DFTB3/3OB/C36 potential has been used extensively for simulating biomolecules, including β-lactamases,[55] for its computational efficiency. While the size of the QM region converges slowly with respect to physicochemical properties,[56,57] we note that we did not perform extensive benchmarks on the dependency of energetic profiles to the QM selections. Instead, the QM atoms are picked as the balance between interactions with the reacting atoms and computational cost. The QM region includes Ser70, Lys73, Ser130, Glu166, Asn170, Lys234, Thr235, Ser237, ligands, and catalytic water (Figure a). Our selections yield 125 and 123 atoms in the QM subsystems of Toho/AMP and Toho/CEX, respectively. The QM part of the amino acid residues was capped by hydrogen link atoms through their Cα–Cβ bonds. In all QM/MM simulations, the classical charges on the MM host atoms (Cα) were removed to prevent the unphysical overpolarizations to the QM density. After energy minimization, both systems were subjected to heating and equilibration simulations using classical force fields. All QM atoms were kept frozen during the heating and equilibration stage in order to preserve the QM-optimized geometries of the active site. Accordingly, the QM link atoms (used in the DFTB3 minimizations) were temporally removed, and the masses of the MM hosting carbon atoms were rescaled to their unit atomic mass (12.0110). Both systems were heated from 110 to 310 K with a temperature increment of 10 K per 1.5 ps followed by canonical (NVT) equilibrations at 310 K for 150 ps. The temperatures during the heating and the NVT equilibration dynamics were regulated via explicit velocity rescaling. Both Toho/AMP and Toho/CEX systems were then subjected to 150 ps isothermal–isobaric (NPT) equilibration dynamics. During this stage, the temperature of the systems was maintained via the Hoover thermostat[58] at 310 K, while the pressure was coupled to 1 atm with the Langevin piston method.[59] Based on reaction mechanisms of the target reaction, we created the reactant state suitable for the mechanism with Glu166 as the sole base (R1), the reactant state suitable for the mechanism with Lys73 and Glu166 as dual bases (R2), and the acyl-enzyme (AE) state for Toho/AMP and Toho/CEX systems using the DFTB3/3OB/C36 potential with necessary distance-based restraints. A total of six states (R1, R2, and AE of Toho/AMP and Toho/CEX) were subjected to extensive conformational sampling with constrained NVT simulations for 150 ns. During the constrained dynamics, the reacting function groups (the hydroxyls of the Ser70/Ser130, the Lys73 amino group, the Glu166 carboxyl group, the carbonyl–nitrogen bond of the ligands, and the catalytic water) were fixed in place to retain their QM optimized orientations, and the rest of the QM atoms were allowed to move freely. The snapshots used for the MEP calculations were taken from the last 120 ns of the constrained MD trajectories with a time interval of 1.2 ns for each system. Briefly, a total number of 600 snapshots (100 snapshots from three states of two systems) were selected as the starting conformations for the MEP calculations.

QM/MM MEPs

The 600 starting conformations were all optimized using the DFTB3 potential. During the optimizations, the surrounding MM residues within 4 Å of the QM region were allowed to move while the remaining of the system was fixed. The corresponding product/reactant states were generated from the starting conformations. Specifically, AE states were generated from each R1 or R2 conformation from the simulations as the product state for each R1 → AE or R2 → AE pathway. For each AE state sampled from the simulations, both R1 and R2 reactant states are generated, leading to two independent AE → R1 and AE → R2 pathways. In total, 800 pairs of either R1/AE or R2/AE states were generated for the Toho/ligand complexes and were used as the reactant/product pairs for the MEP optimizations using the RPM with holonomic constraints. The accuracy of the QM potential is crucial for the quality of the MEP profiles. The DFTB3 method has been previously validated to produce accurate geometries at stable states. However, it was also reported that this approach would largely overestimate the bond dissociation energies between the heavy atoms.[60] Alternatively, a plausible approach for obtaining high-quality pathway profiles is to first optimize the MEP geometry under a fast, parameterized semiempirical QM method and refine the single-point energies using higher ab initio/DFT QM levels of theory along the reaction pathway represented as a chain of replicas.[61] This approach requires that the MEPs optimized at the semiempirical QM level of theory are close to the MEPs optimized at higher ab initio/DFT QM levels of theory to achieve the goals of avoiding large errors of the energetics using semiempirical QM models such as DFTB3 and high computational cost of pathway optimization using the higher QM levels of theory. We compared the MEPs optimized using DFTB3/3OB/C36 and B3LYP-D3/6-31G**/C36 levels of theory, respectively (Figure S1). Additionally, the 3OB-f parameter set,[53] with optimal stretching vibrational frequencies for the C=C, C=N, and C=O species, was also employed as comparison (referred to as DFTB3/3OB-f/C36). Overall, the DFTB3/3OB/C36 level of theory overestimates the bond lengths of the tetrahedral intermediates compared to that of DFT calculations. On the other hand, DFTB3/3OB-f/C36 MEPs closely resemble the B3LYP-D3/6-31G**/C36 MEPs in terms of key bond distances during the acylation processes. Therefore, DFTB3/3OB-f/C36 was applied to carry out all the pathway optimization calculations in this study. The B3LYP hybrid functional[62,63] with the 6-31+G** basis set[64] plus the D3 dispersion correction[65] (B3LYP-D3/6-31+G**/C36) were employed as the high-level QM counterpart to refine the single-point energies along the optimized MEPs. We note that the combination of the levels of theory (DFTB3 and B3LYP-D3) has been proposed previously for enzyme catalysis simulations.[66,67] It has also been demonstrated that both the DFTB3 and B3LYP methods are applicable for studying concerted reaction steps in QM/MM simulations.[68,69] The initial guesses of the MEPs were created by linearly intercepting the Cartesian space between the reactants and the acyl-enzyme products using 50 replicated structures. The constrained RPM method,[41−43] which enforces equal mass-weighted root-mean-square deviation distances between adjacent replicas, was applied to optimize the initial linear chain-of-replicas. The atomic weights of the migrating protons were scaled by a factor of 50 in order to capture continuous proton transfers along the pathway trajectory. A kinetic energy potential force constant of 0.05 kcal mol–1 Å–2 was applied for all MEP optimizations to facilitate the smoothness and convergence of the transition path. All 800 Toho/AMP and Toho/CEX acylation reaction MEPs were subjected to single-point calculations at the B3LYP-D3/6-31+G**/C36 level of theory. These 800 MEPs (40,000 replica conformations and energies) were grouped into four datasets: the Toho/AMP: R1-AE pathways, the Toho/AMP: R2-AE pathways, the Toho/CEX: R1-AE pathways, and the Toho/CEX: R2-AE pathways (Figure S2).

Machine Learning

A proper selection of feature representation is critical to the performance and interpretability of the ML models. Various feature vectors built from conformational and/or physical descriptors have been proposed to accurately predict molecular properties or potentials.[13,14] In this study, we note that the goal of the ML regressions is not only to predict the single-point total energies of the systems but also to bridge the conformational change with the energy evolution along the optimized MEPs, which mainly attributes to the displacement of the reacting atoms in the QM regions. Therefore, the initial selection of features covered (1) the atomic distances between the chemical-bonded and hydrogen-bonded heavy atoms in the QM region and (2) the hydrogen-donor/acceptor distances between the reacting hydrogens and surrounding QM heavy atoms. This initial selection of features was then subjected to a univariate analysis against the energy profiles using mutual information as the metric. In each pathway dataset, the atomic distances sharing higher mutual information than the intuitive reaction coordinates were included in the feature vectors. The features selected based on datasets of the same mechanism (e.g., R1-AE pathways for both Toho/AMP and Toho/CEX complexes) were merged to produce unified selections for either R1-AE or R2-AE mechanisms across different systems, respectively. The final selection of features on the R1-AE and R2-AE pathway datasets was illustrated in Figure a,b.

Figure 2

Selected features, the 2D representation of the pathway conformations, and the architecture of the ML-MEP models. The selected features and chemical processes of (a) Toho/AMP: R1-AE and Toho/CEX: R1-AE datasets and (b) Toho/AMP: R2-AE and Toho/CEX: R2-AE datasets. The atomic distances that are included in feature vectors are noted in orange lines and the chemical processes are noted in blue; see also Table S1. (c) 2D principal component dimensionality reduction of the pairwise inter-heavy-atom distances in the QM region, and a schematic demonstration for the loss of pathway context of the replicas; (d) architecture of the QM/MM MEP learning deep-and-wide neural network. In order to reflect the conformational change along each MEP, the selected bond lengths as features were offset by subtracting their corresponding values in the acyl-enzyme conformation as the product state in the pathway. Furthermore, for each MEP, the feature vectors on the replicas are rescaled by dividing the largest values along each feature dimensionwhere represents the feature vectors of all replicas on the -th MEP. Similarly, for each MEP, the replica energy profiles were also offset using the energies of the acyl-enzyme product states as the reference (0 kcal mol–1). Notably, our preprocessing of input features enables the independent representation of conformational and energetic evolutions along each MEP: each pathway is treated individually without reference to other MEPs in the dataset. However, the largely reduced feature representation also blurred the context of each replica, leading to the loss of pathway identity. As shown in Figure c, replicas that belong to different pathways could distribute closer than their neighboring replica on the same pathway (d1 < d2). Using the reduced feature representation, it is important to distinguish the pathway identity for replicas, especially those that are close to each other in the reduced feature space. In this regard, the distinguishability between different pathways should be retrieved by further hardcoding the pathway affinity of the replicas in a “one-hot” manner. With these considerations, a feed-forward neural network with the “deep-and-wide” learning architecture (DaWNN) proposed by Cheng et al.[70] was implemented for the learning of the QM/MM MEPs (Figure d). The conformational features were fed to the deep side of the input layers and were embedded through four hidden layers. The encoded pathway identities were introduced from the wide entrance and were concatenated to the deep embedding flow before the last hidden layer. The single-point replica energies were obtained from a linear output layer. The rectifier linear unit was used to activate all hidden neuron nodes. The dropout strategy[71] was applied for all hidden layers to promote the generalizability of the neural networks and prevent overfitting. The dropout rate (0.1) and the number of neuron units (256) on the hidden layers were tuned via a grid search strategy on a 10% pathwise stratified validation set. Practically, we constructed the validation set by randomly picking five replicas from each of the pathways carrying 50 replicas. The standard mean squared error (MSE) was used as the objective loss function to train the ML-MEP models. All models were trained with the AdaM optimizer[72] for 300 epochs with a sample batch size of 25.

Implementation Details

All MD simulations were performed using CHARMM and OpenMM.[73,74] All QM/MM calculations were performed with the built-in DFTB module of CHARMM or its interface to Q-Chem.[75−77] The postprocessing of the pathway conformations, featurization, and the density estimating GMMs took advantage of the MDAnalysis and the scikit-learn package.[78,79] The MEP-learning neural networks were implemented with TensorFlow.[80] All molecular graphics were prepared with UCSF ChimeraX.[81]

Results

QM/MM MEPs

Pioneering mechanistic studies have suggested that the acylation half of the ASβL catalysis is the rate-limiting step of the β-lactam hydrolysis,[35,36] allowing the comparison between the theoretical acylation barriers and the experimental kcat values using the Eyring equation. According to the Toho-1 catalysis kinetic studies by Shimizu-Ibuka et al.,[82] the acylation barrier of Toho/AMP is ∼15.5 kcal mol–1. Moreover, the enzyme kinetics reported by Nitanai et al.[83] on Toho-1 with the Arg274Asn/Arg276Asn-engineered mutants showed that the acylation barrier of AMP is lower compared to that of CEX (determined from the kcat/kM ratio) by ∼1.7 kcal mol–1. While both mutations were known to not participate in the acylation process, the acylation barrier of wild type Toho/CEX can be estimated as ∼17.2 kcal mol–1. The barrier distributions of the calculated QM/MM MEPs are plotted in Figure . While the free-energy barriers of a certain reaction path could be obtained from the exponential average of the barriers from a set of MEPs, the number of minimum energy barriers needed to achieve an accurate estimation is prohibitively high. According to the numerical simulations of Ryde,[84] if the barriers were Gaussian distributed, the number of pathway samples required for an accurate exponential averaged free-energy barrier is >5000 for all four pathways investigated in this study. Fortunately, the relative rankings of free-energy barriers for different mechanisms can be robustly estimated from the arithmetic averaged barrier. In particular, in cases where the difference between mean averages of the modeled systems is larger than ∼4.8 kcal mol–1, reliable comparisons of reactivity can be assessed from ∼70 barrier samples with regard to the largest standard deviation of the pathway barriers in the current study (12.58 kcal mol–1 of the Toho/CEX: R2-AE pathways, Figure d).[84] With 200 barrier samples for each mechanism, we accordingly rank the relative activity of the four mechanisms as Toho/AMP: R2-AE > Toho/AMP: R1-AE > Toho/CEX: R2-AE > Toho/CEX: R1-AE.

Figure 3

The distribution of the acylation barriers (ΔE) at B3LYP-D3/6-31+G**/C36 level of theory. (a) Toho/AMP: R1-AE acylation pathways; (b) Toho/CEX: R1-AE acylation pathways; (c) Toho/AMP: R2-AE acylation pathways; and (d) Toho/CEX: R2-AE acylation pathways. The scatters show the locations of the energy barriers. The width of the histograms is 4 kcal mol–1. The red curves note the density estimation from the GMMs. The labels “min. ΔE”, “expo. ”, “mean ”, “med. ΔE”, and “std” refer to the minimum, exponential average, arithmetic average, median, and standard deviation of the energy barriers from the QM/MM MEP profiles, respectively. Both R1-AE and R2-AE acylation pathways of Toho/AMP are accessible as they show lower mean averaged barriers than the Toho/CEX ones. The Toho/AMP pathways also resemble the energetic landscapes of ASβL/penam acylation reported in previous studies.[37] The exponential-averaged acylation barrier is 16.98 kcal mol–1 for the Toho/AMP: R1-AE pathway (Figure a), which is 12.81 kcal mol–1 higher than the Toho/AMP: R2-AE pathway(4.17 kcal mol–1, Figure c). The lowest Toho/AMP: R1-AE barrier is shown to be 14.01 kcal mol–1, which is lower than the experimental acylation barriers of ∼15.5 kcal mol–1. As for Toho/CEX acylation, the pathway via the R2-AE mechanism confers an exponential-averaged barrier of 14.33 kcal mol–1 (Figure d), which is 11.22 kcal mol–1 lower than its R1-AE alternative (25.55 kcal mol–1, Figure b). Furthermore, as the lowest energy barrier found on Toho/CEX: R1-AE pathways (22.35 kcal mol–1) is much higher than the estimated experimental barrier (∼17.2 kcal mol–1), these pathways are considered generally inaccessible. The viable acylation path for Toho/CEX is therefore verified to be the R2-AE mechanism.[40]

Energetics Interpreted from BCIG Contributions

In order to delineate the BCIG contributions of the chemical processes during the acylation, each MEP dataset was used to train an ML-MEP model to learn the correlation between the evolution of energetics and conformations along the MEPs. The prerequisite for model expandability is that the ML/DL model should correctly resemble the underlying variable correlations, which also facilitate accurate predictions. The importance of accounting for the pathway identity is reflected in the predictive performances of our DaWNN architecture and conventional deep neural networks (DNN) with identical hyperparameters and training configurations. As shown in Figure a, the DaWNN models could accurately predict the replica-wise energy with R2 scores >0.995. Meanwhile, the predictive performance of the DNN models dramatically decreases (R2 < 0.940) in the absence of pathway distinguishability (Figure S3). On the other hand, the DaWNN models could correctly predict the barrier heights with RMSE < 2 kcal mol–1 (Figure b). While the improvement of including the pathway identity of each replica is evident, intuitively, the identity encoding extended the input feature space to higher dimensions and embedded the replicas into their respective pathways, allowing each pathway replica to be independently correlated to its own conformational/energetic context.

Figure 4

Predictive performance and the BCIG contributions of the ML-MEP models. The predictive performance of (a) replica energies and (b) pathway barriers of (left to right) the Toho/AMP: R1-AE, Toho/CEX: R1-AE, Toho/AMP: R2-AE, and Toho/CEX: R2-AE models. The BCIG contributions of the models: (c) Toho/AMP: R1-AE; (d) Toho/CEX: R1-AE; (e) Toho/AMP: R2-AE; and (f) Toho/CEX: R2-AE. The rankings (highest to lowest) of the BCIG contributions are noted below. To interpret the ML-MEP models in terms of energetic contributions, we first grouped the input features (as atomic distances) by the chemical processes that they are involved in (Figure a,b, Table S1). The contributions of each chemical process are quantified using the proposed BCIG metrics. At this stage, the BCIG measurements are not ready to be quantitatively compared between the ML-MEP models that were trained from different pathway datasets (see below). For the Toho/AMP: R1-AE pathways (Figure c), the highest BCIG contributions come from the concerted proton transfers from Lys73 to the thiazolidine nitrogen (P2 and P3). The proton abstraction of Glu166 carboxyl (P0 and P1) was assessed to be moderately rate-limiting as they pose higher contributions than the nucleophilic attacks of the Ser70 hydroxyl to the β-lactam carbonyl (B0 and B1). The contributions for the Toho/CEX: R1-AE pathways also suggest that the protonation of the cephem nitrogen (P2 and P3) is the determinant factor with the highest contribution (Figure d). However, the deprotonation of Ser70 Oγ (P1) and its nucleophilic attach to the cephem carbonyl carbon (B0) were shown to considerably contribute to the reaction profiles of Toho/CEX: R1-AE. On the R2-AE acylation pathways, the concerted proton transfers from Lys73 to the β-lactam nitrogen, bridged by Ser130 hydroxyl (P2 and P3), remains as the reaction step of the highest BCIG contributions in both Toho/AMP and Toho/CEX systems (Figure e,f). Interestingly, in the Toho/AMP: R2-AE pathways, the highest individual contribution comes from P2 (the proton transfer between Lys73 and Ser130), while in Toho/CEX: R2-AE, it was determined as P3 (the protonation of the cephem nitrogen). In summary, on all reactive pathways (Figure c–f), the protonation of the β-lactam nitrogen (P2 and P3) is deemed to be the highest energy contribution, in agreement with our previous investigations on TEM-1/benzylpenicillin acylation.[85] Noteworthily, even if the contributions between two systems cannot be directly compared, the ordering of the contributions from various chemical processes could qualitatively hint at the major factors that differentiate the acylation kinetics of different systems. Specifically, the ranking of contributions from the nucleophilic attack of Ser70 Oγ (B0 and P1) are higher for the Toho/CEX: R1-AE pathways (Figure d) than that of the Toho/AMP: R1-AE pathways (Figure c). This suggests that the formation of the tetrahedral intermediates for the CEX acylation is one of the main causes for its decreased acylation activity. On the other hand, the ranked order of BCIG contributions on the R2-AE pathways suggests that the tetrahedral collapsing (B1 and P3) is the major source of the kinetic differences, as P3 and B1 are respectively ranked the second- and the fourth-highest BCIG contributions in the Toho/AMP contributions (Figure e), and increase to the first and the third as in the Toho/CEX contributions (Figure f), respectively. These implications are related to the BCIG-explained reactivities.

Reactivity Interpreted from BCIG Contributions

As aforementioned, the BCIG contributions cannot be compared among different models trained from different MEP datasets due to that the distributions of input features from different pathway datasets are not guaranteed to be identical. In this case, the BCIG contributions were incompatible with each other as the models from which the contributions were derived do not share the same input space (Figure c). A solution is to derive BCIG contributions from a new model that is trained on both datasets, which allows the ML model to extract knowledge from the union input space of different systems. Therefore, the pathway datasets of the same mechanism were merged to produce the Toho/AMP&CEX: R1-AE and Toho/AMP&CEX: R2-AE datasets. Additional ML-MEP models with the DaWNN architecture were trained on the combined datasets with extended pathway identity encodings. We note that the mixed ML-MEP models could deep-embed all the input MEP samples through their hidden layers from both systems, adapting their trainable weights to the mixing of training samples from different sources. While this mixed embedding does not impact the prediction accuracy of the model, from the perspective of sample affiliations (Toho/AMP or Toho/CEX), the model’s knowledge on the learned PES landscape of one system is likely biased. In other words, when the DaWNNs are trained from a dataset with MEPs from different systems, the BCIG-explained reactivity contribution of the same feature from one system (e.g., the Toho/AMP: R1-AE pathways) can be compared to its contribution to the MEPs of the other system (the Toho/CEX: R1-AE pathways), as the knowledge landscape of the ML-MEP models is aware of the coexistence and the merged sample distribution of both systems in the training data. Meanwhile, the reactivity contributions of different features from the same system cannot be compared because the contributions of all feature dimensions in one system are biased by the other. With the above awareness, we first examined the predictive performance of the ML-MEP models trained on the unified datasets. As shown in Figure a,b, the model prediction on the replica energies or the pathway barriers remain at the same predictive accuracies (RMSEs < 2 kcal mol–1 and R2 > 0.995) compared with the models trained on individual datasets (Figure a,b). The scalability of the DaWNN architecture for learning MEPs is demonstrated.

Figure 5

Predictive performance and the BCIG contributions of the unified ML-MEP models. The predictive performance of (left to right) the replica energies and the pathway barriers of (a) Toho/AMP&CEX: R1-AE and (b) Toho/AMP&CEX: R2-AE models. The BCIG contributions of (c) Toho/AMP&CEX: R1-AE and (d) Toho/AMP&CEX: R2-AE models. The BCIG contributions are computed for the reactivity-explaining models: Toho/AMP&CEX: R1-AE and Toho/AMP&CEX: R2-AE. For the R1-AE pathways, the BCIG contributions to all chemical processes are much higher in the Toho/CEX acylation MEPs than the Toho/AMP ones (Figure c). As expected, the correlated P2 and P3 contributions largely increase for the Toho/CEX R1-AE acylation pathways, reflecting the less active protonation of the cephem nitrogen. In addition, the nucleophilic serine attack (B0 and P1) in Toho/CEX is more than three times higher than that in Toho/AMP, which aligns with the increment of rankings of B0 and P1 for the energetic contribution in Toho/CEX (Figure c,d). While the acylation pathways are initialized by the nucleophilic serine addition, the high BCIG contributions attributed to this process indicate that the R1-AE acylation pathways for Toho/CEX are unfavored compared to Toho/AMP. Combining the enzyme kinetics discussed above, the interpretation of BCIG contributions shows that the acylation pathway using solely Glu166 as the general base is turned off for Toho/CEX due to its incapability to activate the serine attack on the cephem carbonyl. On the other hand, the contributions in the Toho/AMP&CEX: R2-AE models demonstrated the same trend (Figure d): the BCIG values for most chemical processes in Toho/CEX pathways are higher than those of the Toho/AMP pathways. The differences of the BCIG contributions in the two systems mainly come from the residue-involved processes: B0, P4 (the serine nucleophilic attack to β-lactam carbonyl), and P2, P3 (the concerted proton transfers to protonate the β-lactam nitrogen). Interestingly, the BCIG contribution from C–N bond breaking (B1) for Toho/CEX concerted base acylation is shown to be slightly lower than that for Toho/AMP. In brief, the BCIG contributions from the Toho/AMP&CEX: R2-AE MEP learning model show that the energy contributions of concerned chemical processes on the Toho/CEX pathways are moderately higher than the Toho/AMP pathways, suggesting lower acylation activity for the Toho/CEX: R2-AE pathways.

Validation of the BCIG Explanations

The mechanistic insights into the Toho/β-lactam energetic and reactivity contributions derived from the ML-MEP models using the BCIG metric are further validated by additional QM/MM protocols. Specifically, we measured the atomic distances between critical heavy atoms and performed the ChElPG population analysis[86] on the 800 reactant conformations in all MEPs. We first validate the BCIG-explained energy contribution for each system. For the Glu166 mediated acylation pathways (R1-AE), the concerted proton transfers from Lys73 to the β-lactam nitrogen (P2 and P3) are determined as the reaction steps of the highest energy contributions in both Toho/AMP and CEX pathways (Figure c,d). The proton transfer processes in enzyme catalysis are known to highly correlate with the strength of the hydrogen-bonding interactions between the proton donor and the acceptor. Here, the atomic distances among Lys73 Nζ, Ser70 Oγ, and the β-lactam nitrogen (P2 and P3) are shown to be the largest compared to that of other proton transfers (P0 and P1, Figure a), suggesting the weaker the hydrogen actions, the harder the proton transfers. A similar conclusion could also be drawn on the R2-AE acylation pathways (Figure e,f), where the P2/P3 atomic distances are larger than that of P4 (Figure c).

Figure 6

Atomic distances between the ChElPG charges on critical heavy atoms in all reactant states at the B3LYP-D3/6-31+G**/C36 level. The (a) atomic distances and (b) ChElPG charge profiles of the Toho/AMP: R1-AE and Toho/CEX: R1-AE pathways. The (c) atomic distances and (d) ChElPG charge profiles of the Toho/AMP: R2-AE and Toho/CEX: R2-AE pathways. The meanings of the labels on x-axes are defined in the symbolic legend (right panel). The BCIG-explained reactivities generally state that most of the chemical processes in Toho/AMP acylation pathways are more reactive than the Toho/CEX ones, regardless of the acylation mechanism. For the R1-AE acylation pathways that are prohibitive for Toho/CEX, it is shown that the atomic distances between the Lys73 amino and the Ser130 hydroxyl (P2) and the Ser130 hydroxyl to the cephem nitrogen (P3) are generally longer than those in Toho/AMP (Figure a). Moreover, the ChElPG charges on the β-lactam nitrogen atoms in CEX are much higher than that of the penam nitrogen of AMP (Figure b). These suggest the poor proton affinity of the CEX nitrogen atoms, which lead to higher BCIG contributions of P2 and P3 observed in Figure c. The contributions from the Ser70 addition processes (B0, P1) are another source of the deactivation of R1-AE acylation pathways for CEX. The key reactant distances in the two systems also suggest that the serine attack in CEX is less favorable than that of AMP due to the longer B0 distance in Toho/CEX (Figure a). Meanwhile, the ChElPG analysis shows no significant change in the density population of Ser70 Oγ upon binding of different ligands, but the atomic charges on the CEX carbonyl carbon are found to decrease (Figure b). Such electrostatic features are expected for the cephem scaffolds as the extended π-conjugations introduced by the C3=C4 double bond could delocalize the densities on the β-lactam carbonyl and nitrogen, leading to stronger resistance to the nucleophilic serine addition for CEX acylation. In the BCIG-explained ML-MEPs, this is reflected as the increased contributions from B0 and P1 observed in Toho/CEX: R1-AE pathways (Figure c). The general mechanistic insights made for the different reactivity in R1-AE acylation also hold for the R2-AE acylation pathways: the cephem scaffold delocalizes the density on the β-lactam, promoting its resistance to the protonation of the dihydrothiazine nitrogen (P2, P3, Figure c) and the nucleophilic serine addition (B0, P4, Figure d). As for the C–N bond breaking (B1) in different systems, the BCIG contribution from the pairwise distance between two bonded atoms refers not to the bond strength between the atoms but the impact to the energy profiles when perturbations are applied to the progress of that distance (eq ). Specifically, the BCIG contribution of B1 in the R2-AE hybrid models being close (Figure d) does not indicate that the “bond strengths” of the scissile C–N bonds are similar. Instead, it suggests that the “sensitivity” of the energetics of the QM/MM MEPs to the perturbation of the atomic distance between the carbonyl carbon and the nitrogen are similar. This is further supported by the reactant conformations and the atomic charges. As shown in Figure c, the β-lactam C–N bond distances are generally similar in different systems; it is expected that the BCIG contributions of B1 are close in Toho/CEX: R2-AE and Toho/AMP: R2-AE. On the other hand, the distributions of ChElPG charges of the bonded atoms (the β-lactam C, N) largely deviate between the two systems (Figure d), suggesting very different densities and thus the “bond strength” of the C–N scissile bonds in Toho/AMP and Toho/CEX systems. Alternatively, the relative “bond strength” of the β-lactam carbonyl nitrogen is reflected from the contribution of the serine attack (B0) and the nitrogen protonation (P3), which are concerted processes to the ring-opening (B1).

Discussions

In this work, we propose the contribution attribution method, BCIG, as an XAI technique to interpret ML-MEP models for mechanistic insights into enzyme catalysis. Using the BCIG contributions, we demonstrate how to extract interpretable mechanistic information that is difficult to quantify due to the concerted nature of the enzymatic reaction steps. The BCIG method is helpful to discern the thermodynamics that governs all chemical transitions. However, since the ML-MEP models are trained based on the limited samples of MEPs, the direct application of the Boltzmann factor to the cumulative IG could be limited by the size of the training set. Therefore, a PDF was introduced to smoothen the density of barriers distributions. Pioneering efforts have extensively applied equivalent density smoothening assumptions, such as Gaussian functions, for investigating or promoting the numerical convergence of free-energy methods.[84,87,88] Moreover, von der Esch et al.[89] reported that the adiabatic mapped desuccinylation barriers of Sirtuin five could reasonably resemble a Gaussian distribution with ∼150 samples. In this regard, we further examined the applicability of Gaussian distributions of our MEP barriers by a goodness-of-fit test. It was shown that Toho/CEX: R2-AE barriers failed to approximate a reasonable Gaussian distribution (Table S2). However, we note that the type of the PDFs does not seem to qualitatively impact the BCIG results (Table S3, Figure S4). Furthermore, numerical features of BCIG are characterized with a bootstrapping-based convergence test (Supporting Information Methods). We note that the BCIG contributions in all four pathway datasets converge when ∼170 MEPs were used (Figure S5). Moreover, our BCIG approach inherits the advantage of “model implementation invariance” from the IG formulism of Sundararajan et al.,[29] with the contributions attributed to the feature inputs being independent of the ML/DL algorithms used. Based on the model implementation invariance, if two models predict the same output on all input examples, the gradients of the predicted hypersurface should also be equal everywhere, and thus the invariance of the IG and the BCIG contributions. On the other hand, it is notable that the “path implementation invariance” of the IG approach is not preserved in the BCIG formulism for obvious reasons: it is vital to account for the contribution of the chemical process along a specific transition path (MEPs as in the current study). Thus, the gradients calculated for each replica along the path are cumulatively summed to achieve path specificity for the feature contributions. It is also worth mentioning that our explanations are purpose-oriented. It is preferred to build individual ML-MEP models using datasets for distinct chemical processes, such as Toho/AMP acylation pathways or Toho/CEX acylation pathways in this study. Therefore, we intentionally trained multiple ML-MEP models to learn different aspects of Toho-1-catalyzed acylation. Practically, the energetic contributions in each acylation pathway dataset are interpreted from models that learn only on a particular pathway dataset, and the reactivity of the chemical processes between different systems is interpreted from models trained on per mechanism merged datasets. We further note the extensibility of the current workflow based on the BCIG approach. In terms of data preparation, any MEP sampling method that effectively explores the transition region could be applied to construct the MEP datasets. Thanks to the “model implementation invariance,” the ML/DL algorithm used to learn the chemical MEPs are not limited to the DaWNN as we used. However, it is important to monitor the transformations applied to the model input and rationally design the learning strategies. Herein, we sequentially applied feature selections and pathway-wise feature rescaling, resulting in the largely dimensionality-reduced feature space and the loss of the pathway context for each replica. Accordingly, this is solved by hardcoding the pathway identity, which leads to the DaWNN architecture. Lastly, the c term to be explained is not limited to “chemical processes” as in the current study. The only requirement for the representation of c is that it should describe a certain interaction of interest, which is generally open to any physical descriptors that are known to correlate with chemical transitions.

Conclusions

In this study, we first presented a QM/MM computational workflow that achieves fast sampling of QM/MM MEPs for enzyme catalysis. Based on the observation that the 3OB-f DFTB3 parameter set could correctly resemble the bond dissociation distances during the β-lactam acylation reaction, we optimized 800 MEP conformations and refined the single-point energies using B3LYP-D3/6-31+G**/C36 calculations. The energetics from this computational workflow are in good agreement with previous calculations based on DFT/MM calculations.[40] ML-MEP models with high performance and scalability using the DaWNN architecture were developed for the ML of the QM/MM MEPs of enzyme catalysis. Compared to conventional DNN models, the DaWNN architecture achieves much higher accuracy in learning the energetic profiles from the conformational evolutions along the QM/MM MEPs. Furthermore, the DaWNN model is shown to be highly scalable to the training size or the source of the training data without significant loss in performance. Inspired by the IG approach for explaining ML/DL models, we further developed the BCIG approach to interpreting the ML-MEP models for mechanistic insights into enzyme catalysis. Using Toho/AMP and Toho/CEX as the model systems, the energetic and the reactivity contributions of the processes with different substrates are quantified by the BCIG attributions. The conformational factors that differentiate the Toho-1 acylation activities of AMP and CEX were identified. The BCIG contributions quantified that the cephem scaffold was less susceptible to the nucleophilic serine addition and the protonation of the β-lactam nitrogen than the penam. Moreover, we presented a purpose-oriented training-explaining strategy to focus on mode interpretability. Whereas the different ML-MEP models are trained and interpreted for specific mechanistic aspects, we have shown that the interpretations of different models give consistent mechanistic insights that agree with our intuitive mechanistic understandings and the validating QM/MM calculations on the modeled systems. With the developments of robust ML/DL-based QM/MM potential functions and the implementations of thermodynamics-aware XAI approaches for protein-related systems, one could foresee the emergence of the “full-stack” (simulation to analysis) ML/DL-assisted QM/MM studies on enzyme catalytic mechanisms in the near future.

65 in total

1. Roles of residues Cys69, Asn104, Phe160, Gly232, Ser237, and Asp240 in extended-spectrum beta-lactamase Toho-1.

Authors: Akiko Shimizu-Ibuka; Mika Oishi; Shoko Yamada; Yoshikazu Ishii; Kiyoshi Mura; Hiroshi Sakai; Hiroshi Matsuzawa
Journal: Antimicrob Agents Chemother Date: 2010-11-15 Impact factor: 5.191

Review 2. Bacterial resistance to beta-lactam antibiotics: compelling opportunism, compelling opportunity.

Authors: Jed F Fisher; Samy O Meroueh; Shahriar Mobashery
Journal: Chem Rev Date: 2005-02 Impact factor: 60.622

3. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces.

Authors: Zhenwei Li; James R Kermode; Alessandro De Vita
Journal: Phys Rev Lett Date: 2015-03-06 Impact factor: 9.161

4. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning.

Authors: Frank Noé; Simon Olsson; Jonas Köhler; Hao Wu
Journal: Science Date: 2019-09-06 Impact factor: 47.728

5. QM/MM modeling of class A β-lactamases reveals distinct acylation pathways for ampicillin and cefalexin.

Authors: Zilin Song; Francesco Trozzi; Timothy Palzkill; Peng Tao
Journal: Org Biomol Chem Date: 2021-11-03 Impact factor: 3.876

6. DFTB3: Extension of the self-consistent-charge density-functional tight-binding method (SCC-DFTB).

Authors: Michael Gaus; Qiang Cui; Marcus Elstner
Journal: J Chem Theory Comput Date: 2012-04-10 Impact factor: 6.006

7. CHARMM general force field: A force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields.

Authors: K Vanommeslaeghe; E Hatcher; C Acharya; S Kundu; S Zhong; J Shim; E Darian; O Guvench; P Lopes; I Vorobyov; A D Mackerell
Journal: J Comput Chem Date: 2010-03 Impact factor: 3.376

8. Microscopic basis for kinetic gating in Cytochrome c oxidase: insights from QM/MM analysis.

Authors: Puja Goyal; Shuo Yang; Qiang Cui
Journal: Chem Sci Date: 2015-01 Impact factor: 9.825

9. Layer-wise relevance propagation of InteractionNet explains protein-ligand interactions at the atom level.

Authors: Hyeoncheol Cho; Eok Kyun Lee; Insung S Choi
Journal: Sci Rep Date: 2020-12-03 Impact factor: 4.379