
Modeling Epoxidation of Drug-like Molecules with a Deep Machine Learning Network.

Tyler B Hughes¹, Grover P Miller², S Joshua Swamidass¹.

Abstract

Drug toxicity is frequently caused by electrophilic reactive metabolites that covalently bind to proteins. Epoxides comprise a large class of three-membered cyclic ethers. These molecules are electrophilic and typically highly reactive due to ring tension and polarized carbon-oxygen bonds. Epoxides are metabolites often formed by cytochromes P450 acting on aromatic or double bonds. The specific location on a molecule that undergoes epoxidation is its site of epoxidation (SOE). Identifying a molecule's SOE can aid in interpreting adverse events related to reactive metabolites and direct modification to prevent epoxidation for safer drugs. This study utilized a database of 702 epoxidation reactions to build a model that accurately predicted sites of epoxidation. The foundation for this model was an algorithm originally designed to model sites of cytochromes P450 metabolism (called XenoSite) that was recently applied to model the intrinsic reactivity of diverse molecules with glutathione. This modeling algorithm systematically and quantitatively summarizes the knowledge from hundreds of epoxidation reactions with a deep convolution network. This network makes predictions at both an atom and molecule level. The final epoxidation model constructed with this approach identified SOEs with 94.9% area under the curve (AUC) performance and separated epoxidized and non-epoxidized molecules with 79.3% AUC. Moreover, within epoxidized molecules, the model separated aromatic or double bond SOEs from all other aromatic or double bonds with AUCs of 92.5% and 95.1%, respectively. Finally, the model separated SOEs from sites of sp2 hydroxylation with 83.2% AUC. Our model is the first of its kind and may be useful for the development of safer drugs. The epoxidation model is available at http://swami.wustl.edu/xenosite.


Year: 2015 PMID: 27162970 PMCID: PMC4827534 DOI: 10.1021/acscentsci.5b00131

Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553

Introduction

Drug discovery and development involve significant efforts to identify safe and efficacious drugs; nevertheless, unanticipated toxicity and adverse drug reactions do occur and cause approximately 40% of drug candidates to fail.[1] Frequently, these harmful outcomes are linked to the formation of electrophilic metabolites that covalently bind to proteins or DNA and, in some cases, elicit an immune response in susceptible patients.[2−6] One of the most common types of reactive metabolites are epoxides, the subject of this study. Epoxides are three membered cyclic ethers and are often highly reactive due to ring tension and polarized carbon–oxygen bonds.[7−11] Epoxides are formed by cytochromes P450 acting on aromatic or double bonds,[12,13] and these epoxidation reactions comprise around 10%[14] to 15%[15] of all bioactivation reactions. Biological defense mechanisms to epoxides, including glutathione conjugation and cleavage by epoxide hydrolase, offer only partial protection.[7,11,16,17] Glutathione can be depleted,[18,19] and certain products of glutathione conjugation[17] and epoxide hydrolase[20,21] are themselves toxic. Epoxide metabolites often drive toxicity for drugs, and accurate strategies for anticipating the formation of epoxides are critical in drug development. Knowledge of epoxide formation aids assessment of drug candidates. Furthermore, the identity of the specific bond in a molecule undergoing epoxidation, its site of epoxidation (SOE), could enable rational modification of the molecule to reduce risk of reactive metabolite formation. An example of how this knowledge can lead to drugs with improved safety is illustrated by carbamazepine (Figure 1). The metabolism of this anti-epileptic drug forms carbamazepine-10,11-epoxide. 
Carbamazepine metabolism can also form an iminoquinone,[22] but the epoxide’s formation is the focus of this study and more correlated with adverse reactions.[23−25] The molecular mechanism for this response involves reactions between the epoxide and proteins to form adducts.[26] However, the epoxide formation can be blocked by modifying carbamazepine’s SOE. For example, oxcarbazepine[23] or eslicarbazepine are analogues of carbamazepine that are no longer epoxidized.[25] While oxcarbazepine and eslicarbazepine were not prospectively designed in order to reduce epoxide formation, they demonstrate how small molecular changes can significantly impact toxicity caused by epoxide metabolites. These analogues retain the same mechanism of action as carbamazepine, yet have a lower incidence of adverse effects because they prevent the formation of epoxides.[25,27]

Figure 1

Adverse drug reactions are often caused by reactive metabolites. For example, carbamazepine is metabolized by cytochromes P450 to carbamazepine-10,11-epoxide. Carbamazepine metabolism can also form an iminoquinone,[22] but the epoxide’s formation is the focus of this study and more correlated with adverse reactions.[23−25] The epoxide is electrophilically reactive and covalently binds to nucleophilic sites within proteins. The resulting adduct serves as a hapten complex and elicits an immune response. This mechanism is thought to be responsible for many carbamazepine adverse reactions.[35,36] This site of epoxidation is circled on carbamazepine.

A number of studies, including those by our group, have established that computational methods can predict the sites at which molecules are metabolized.[28−33] A shortcoming of those approaches has been the lack of predictions for the actual metabolites generated by those reactions. Cytochromes P450 catalyze many different types of oxidative reactions, including commonly observed hydroxylations.[12,30,34] While several cytochromes P450 site of metabolism models are reported in the literature, to the best of our knowledge, none of those models specifically identify SOEs in molecules. Instead, all existing methods only report which atoms undergo oxidation, without distinguishing the specific type of reaction (such as epoxidation or hydroxylation) or the resulting modification to the structure. In this study, we construct an epoxidation model, based on the structural data of several hundred diverse molecules, that is successful at three key objectives. First, the model accurately predicts SOEs within epoxidized molecules; these SOE predictions can be used to direct structural modifications to drug candidates. Second, the model distinguishes SOEs from sites of sp2 hydroxylation (SOHs), a key negative control. 
Both SOEs and SOHs are oxidized by P450s, and we expect a useful model to correctly identify which of these oxidations give rise to epoxides. In contrast, commonly reported P450 site of metabolism models will not distinguish these two cases and report both as sites of metabolism. Third, the model identifies which molecules are metabolized into epoxides, separating these molecules from closely related molecules that are not epoxidized. This enables rapid screening of drug candidates for molecules that are potentially toxic due to epoxidation.

Methods

Epoxidation Training Data

We mined a large, chemically diverse training data set from the Accelrys Metabolite Database (AMD), which includes a collection of metabolic reactions drawn from the literature. A total of 702 reactions were extracted, each of which takes place in humans, human cells, or human microsomes and is classified as epoxidation. Because of the short half-life of many epoxides, however, some product molecules do not explicitly contain an epoxide. Instead, an epoxidation product may be a dihydrodiol or a DNA, glutathione, or protein conjugate (Figure 2).[37,38] An automated labeling algorithm used these motifs to label SOEs on the starting molecule of each reaction.

Figure 2

In the database, each epoxidation reaction acting on a site of epoxidation (abbreviated SOE and circled) forms an epoxide, dihydrodiol, or a conjugate adjacent to a hydroxylation. For example, the epoxidation reaction of nevirapine forms an epoxide (top),[40] of N-desmethyl triflubazam forms a dihydrodiol (middle),[41] and of benzo(a)pyrene forms a DNA conjugate adjacent to a hydroxylation (bottom).[38] The first case explicitly records the epoxide, while the other two record a tell-tale signature of a transient, reactive epoxide that is not directly observed.[37,38] A total of 702 human epoxidation reactions were identified in the Accelrys Metabolite Database. An automated labeling algorithm labeled SOEs on the starting molecule of each reaction based on these motifs.

In this study, we defined an SOE as the bond between the two carbons to which an epoxide forms and identified these bonds in depictions with circles. When bonds were topologically equivalent to observed SOEs, as identified using the Pybel Python library, they were themselves labeled as SOEs.[39] Duplicate starting molecules were identified by canonical SMILES and merged into a single training example with all observed SOEs labeled. The final data set included 389 epoxidized molecules, each with its SOEs labeled. These epoxidized molecules included 411 aromatic bond SOEs and 168 double bond SOEs. Additionally, 20 single bond SOEs were included; the labeling of single bonds as SOEs is likely due to rearrangements or intermediates (absent from the database) that allow epoxidation to occur at an aromatic or double bond. We also identified structurally similar but non-epoxidized molecules. These compounds were mined from the reaction network for each previously identified epoxidized molecule. This strategy ensured the inclusion of the metabolic parent and sibling molecules, so that the model could learn a robust distinction between molecules that undergo epoxidation and those that do not. 
After excluding molecules already classified as epoxidized, the remaining 135 molecules were marked non-epoxidized. Each one was metabolically studied and chemically similar to an epoxidized molecule in the data set. Our license for the AMD data did not allow us to disclose the structures of the full data set. However, all molecule registry numbers are included in the Supporting Information, and this is sufficient data to rebuild the database and reproduce our results.
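The duplicate-merging step described above can be illustrated with a minimal sketch. It assumes that canonical SMILES strings have already been computed upstream (for example with a cheminformatics toolkit) and that atom indices are consistent across records of the same molecule. The function name and the toy records are hypothetical and are not taken from the study's software.

```python
from collections import defaultdict

def merge_by_canonical_smiles(reactions):
    """Merge reactions that share a starting molecule into one example.

    `reactions` is an iterable of (canonical_smiles, soe_bonds) pairs, where
    soe_bonds is an iterable of (atom_i, atom_j) index pairs. Canonical SMILES
    are assumed to be precomputed; this sketch only groups and unions labels.
    """
    merged = defaultdict(set)
    for smiles, soe_bonds in reactions:
        for i, j in soe_bonds:
            # Store each bond with sorted indices so (2, 1) and (1, 2) match.
            merged[smiles].add((min(i, j), max(i, j)))
    return dict(merged)

# Hypothetical records: two reactions start from the same molecule.
reactions = [
    ("c1ccccc1", [(0, 1)]),
    ("c1ccccc1", [(2, 1)]),
    ("C=CC", [(0, 1)]),
]
examples = merge_by_canonical_smiles(reactions)
```

Each key of `examples` is then one training molecule carrying the union of all SOEs observed for it.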

Hydroxylation Negative Control Data

As discussed in the Introduction, sp2 sites can be either epoxidized or hydroxylated. An epoxidation model must be validated using hydroxylation data as a negative control to distinguish the epoxidation model from a general oxidation model. An epoxidation model should rank SOEs above SOHs, whereas an oxidation model would rank them approximately equally. For use as negative controls, we also extracted SOHs from the AMD. Both SOHs and SOEs are acted on by cytochromes P450, but the epoxides formed from SOEs are more likely to be toxic. To build a hydroxylation test data set, 3000 human hydroxylation reactions were randomly sampled from the AMD. We filtered out sp3 hydroxylations and any SOHs that included non-carbon atoms, both of which are easily distinguishable from epoxidations. After these filtrations, 1105 hydroxylations remained. Duplicate starting molecules were identified by canonical SMILES and merged by labeling all known SOHs for each molecule. This final data set included 811 molecules, each with bonds adjacent to hydroxylations labeled as SOHs.

Descriptors

Our approach used information encoded in descriptors for each bond to assess its susceptibility to epoxidation. Each bond was associated with a total of 214 numerical descriptors, including atom-level, bond-level, and molecule-level descriptors. Descriptors were calculated by in-house software that took as input SDF files with explicit hydrogens and 3D coordinates created by Open Babel.[42] The majority of our descriptors were atom-level descriptors previously developed for the XenoSite metabolism model[28] and the XenoSite reactivity model.[43] Each bond carried 89 descriptors from its “left” atom and 89 from its “right” atom. To prevent representation bias due to atom ordering, left and right atom assignment was randomized on a bond-by-bond basis. Twenty-three molecule-level descriptors, reported in our prior work, were also computed and used by the network to make predictions. We supplemented these atom and molecule descriptors with bond descriptors developed specifically to capture the chemical properties of bonds. These 13 new bond descriptors are summarized in Table 1; a comprehensive table of the descriptors used in this study is available in the Supporting Information. There were two types of bond descriptors. First, topological bond descriptors summarized information from the molecular 2D structure. Second, quantum chemical descriptors were calculated from self-consistent field computations by MOPAC, a semiempirical quantum chemistry modeler, utilizing an implicit solvent model and the PM7 semiempirical method.[44,45]

Table 1

Condensed List of Bond Descriptors Developed for This Study (a)

Topological Bond Descriptors

Descriptor                  Description
single bond                 binary value indicating whether bond is a single bond
aromatic bond               binary value indicating whether bond is an aromatic bond
double bond                 binary value indicating whether bond is a double bond
conjugated bond             binary value indicating whether bond is conjugated
triple bond                 binary value indicating whether bond is a triple bond
topologically equivalent    number of topologically equivalent bonds in the same molecule

(a) Descriptors were generated using both topological and quantum chemical information. A full list of descriptors used in this study is available in the Supporting Information.

In total, 214 numbers were used to describe each bond: 89 atom descriptors for the “left” atom, 89 for the “right” atom, 23 molecule descriptors, and 13 bond-specific descriptors.
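As a rough illustration of how the 214-entry input for one bond might be assembled, the sketch below concatenates the four descriptor groups and randomizes the left/right atom assignment. The concatenation order, the function name, and the placeholder values are assumptions made for illustration; only the group sizes (89, 89, 23, 13) come from the text.

```python
import random

N_ATOM, N_MOL, N_BOND = 89, 23, 13  # group sizes stated in the text

def bond_descriptor_vector(atom_a, atom_b, bond, molecule, rng=random):
    """Concatenate the descriptor groups for one bond into a single vector."""
    assert len(atom_a) == len(atom_b) == N_ATOM
    assert len(bond) == N_BOND and len(molecule) == N_MOL
    # Randomize which atom is "left" so atom ordering carries no signal.
    if rng.random() < 0.5:
        left, right = atom_a, atom_b
    else:
        left, right = atom_b, atom_a
    # Assumed layout: left atom, right atom, molecule, bond.
    return list(left) + list(right) + list(molecule) + list(bond)

rng = random.Random(0)
vec = bond_descriptor_vector([0.0] * 89, [1.0] * 89, [2.0] * 13, [3.0] * 23, rng)
```

The resulting vector has 89 + 89 + 23 + 13 = 214 entries, matching the total given above.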

Combined Atom- and Molecule-Level Epoxidation Model

We built a model for bond and molecule epoxidation using a deep neural network with one input layer, two hidden layers, and two output layers (Figure 3). The top-level output layer computed molecule-level predictions called the molecule epoxidation scores (MES); the next output layer computed bond-level predictions called the bond epoxidation scores (BES). Here, the term “deep network” does not mean a deep autoencoder network as is being increasingly used.[46] Instead, we mean a deep convolution network, with many more layers than a standard network and extensive weight sharing between replicates of the BES network.[47] This network was trained in two stages.

Figure 3

The structure of the epoxidation model. This diagram shows how information flowed through the model, which was composed of one input layer, two hidden layers, and two output layers. This model computed a molecule-level prediction for each test molecule as well as predictions for each bond within that test molecule. From the 3D structure of an input molecule, 23 molecule-level and 191 bond-associated descriptors were calculated. These input nodes feed into the first hidden layer (with 10 nodes), which outputs a bond epoxidation score (BES) for each bond in the molecule. The BES quantifies the probability that the bond is a site of epoxidation. The top five BES, and all molecule-level descriptors, flow into the second hidden layer (with 10 nodes), which outputs a single molecule epoxidation score (MES) for the input molecule, reflecting the probability that the molecule will be epoxidized. For conciseness, the diagram is abbreviated and only shows two nodes for each hidden layer, one molecule input node, two atom input nodes (one for each atom associated with the bond), and one bond input node. The actual model had several additional nodes in the input and hidden layers.

First, we trained the bond-level network to compute accurate BES values. In this training, each bond within a molecule was considered a possible SOE. Each bond had a vector of numbers (descriptors), with each entry of the vector describing a chemical property of that bond. The data set was a matrix, structured as one column per descriptor and one row per bond. A final binary target vector labeled experimentally observed SOEs with a 1. The weights of the network were trained using gradient descent on the cross-entropy error, so that SOEs scored higher BES than other bonds. These BES ranged from zero to one, representing the probability that a bond was an SOE. Second, the molecule-level output layer was trained to compute MES values. 
Several versions of this output layer were considered, including another multilayer neural network, a logistic regressor, and a max layer that computed the MES as the maximum BES observed in the molecule. The logistic regressor and neural network took as input the top five BES, as well as all molecule-level descriptors. As we will see, both the neural network and the logistic regressor offer better scaled predictions with higher classification performances than the simpler max layer.
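The flow of information through the two-stage network can be sketched as a forward pass. This is a minimal illustration with untrained random weights, not the study's implementation: the tanh hidden activation, the zero padding for molecules with fewer than five bonds, and all function names are assumptions. The layer sizes (214 inputs per bond, 10 hidden nodes, top five BES plus 23 molecule descriptors) follow the description above.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer; `weights` has one row per output node."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def make_params(rng, n_bond_inputs=214, n_mol=23, n_hidden=10, top_k=5):
    return {
        "bond_hidden": random_layer(n_bond_inputs, n_hidden, rng),
        "bond_out": random_layer(n_hidden, 1, rng),
        "mol_hidden": random_layer(top_k + n_mol, n_hidden, rng),
        "mol_out": random_layer(n_hidden, 1, rng),
    }

def bond_score(bond_vec, params):
    hidden = dense(bond_vec, *params["bond_hidden"], math.tanh)
    return dense(hidden, *params["bond_out"], sigmoid)[0]

def molecule_score(bond_vecs, mol_desc, params, top_k=5):
    # The same bond network (shared weights) scores every bond.
    bes = [bond_score(v, params) for v in bond_vecs]
    top = sorted(bes, reverse=True)[:top_k]
    top += [0.0] * (top_k - len(top))  # assumed padding for small molecules
    hidden = dense(top + list(mol_desc), *params["mol_hidden"], math.tanh)
    mes = dense(hidden, *params["mol_out"], sigmoid)[0]
    return bes, mes

rng = random.Random(0)
params = make_params(rng)
bond_vecs = [[rng.random() for _ in range(214)] for _ in range(3)]
mol_desc = [rng.random() for _ in range(23)]
bes, mes = molecule_score(bond_vecs, mol_desc, params)
```

Replacing the second stage with `max(bes)` would give the simpler max-layer variant mentioned above.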

Results and Discussion

The following sections study the classification performance and inner workings of the epoxidation model. First, we evaluated the ability of BES to predict the SOE of epoxidized molecules. Second, we considered the credibility of the model by analyzing which descriptors are most important to the model’s performance. Third, we increased resolution on the quality of the model predictions by calculating classification performance on aromatic and double bonds individually. Fourth, we asked whether BES distinguish SOEs from sites of sp2 hydroxylation, because both epoxidation and sp2 hydroxylation are catalyzed by P450s but have significantly different implications for toxicity. Fifth, we tested how well MES separated epoxidized and non-epoxidized molecules. Finally, we studied how the model could direct drug modifications to reduce toxicity of known drugs.

Accuracy in Identifying Sites of Epoxidation

An important goal for designing drugs less prone to metabolic activation is to accurately identify the site (bond) within a molecule that undergoes epoxidation. In our study, SOE predictions gave a specific hypothesis about the mechanism of a molecule’s toxicity. Furthermore, knowledge of the SOE lays a strong foundation for guiding the modification of a molecule to make it less susceptible to epoxidation and thus less likely to cause protein and DNA adducts that lead to toxic effects. There are currently no other published computational methods that specifically predict SOEs among a diverse set of molecules. The trained model predicts SOEs by computing a BES for each bond in a test molecule. These scores ranged between zero and one and reflected the probability that an epoxide will form on the two atoms within the corresponding bond. If accurate, BES should discriminate between SOEs and all other bonds within epoxidized molecules. We assessed the generalization performance of our model using a cross-validation protocol. In this procedure, we separated molecules into metabolically related groups that represented metabolic networks in the database. Each group comprised epoxidized molecules and all parent and sibling molecules of those epoxidized molecules. One by one, each group of molecules derived from these networks was withheld from the training set. The rest of the molecules were used to train a model and make predictions on all the molecules in the withheld group. In each cross-validation fold, the model predictions for test molecules therefore did not depend on training data from identical or closely related molecules and thus provided a rigorous evaluation of the model. In this way, BES predictions were made on all molecules in the training data. We used two metrics to quantitatively measure the classification performance of the cross-validated BES. 
First, we computed the “average site AUC” by calculating the area under the ROC curve (AUC) for each molecule and quantified the whole data set performance by averaging the AUCs for each molecule in the data set. Second, we used the “top-two” metric, which is often used in site of metabolism prediction.[28,48,49] By this metric, a molecule was considered correctly predicted if any of its observed SOEs were predicted in the first- or second-rank position by a given model. Both metrics measure the separation of known SOEs from all other bonds within each molecule known to undergo epoxidation. The BES reported by the neural network model accurately identified SOEs with an average site AUC performance of 94.9% and a top-two performance of 83.0% (Figure 4). The neural network outperformed a simpler logistic regressor model (BES[LR] in the figure), which had an average site AUC performance of 93.7% and a top-two performance of 80.5%. The neural network was significantly more accurate than the logistic regressor, reducing the error by 19.0% (average site AUC) and 12.8% (top-two). This improvement is significant according to a paired t-test, with p-values of 0.000454 (average site AUC) and 0.0328 (top-two).[50] This improvement indicated nonlinearity in the epoxidation data that cannot be taken into account by a logistic regressor. This finding justified the use of the more complex neural network and was consistent with a previous study on site of metabolism prediction,[51] as well as our previous work on sites of glutathione reactivity.[43]
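The two metrics and the reported error reductions can be made concrete with a short sketch. The functions below are an illustrative reading of the definitions above, not the authors' code; the toy molecules are invented, and skipping molecules without both classes of bond is an assumption.

```python
def auc(scores, labels):
    """ROC AUC as the chance a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_site_auc(molecules):
    """Mean of the per-molecule AUCs over (scores, labels) pairs."""
    values = [auc(scores, labels) for scores, labels in molecules]
    values = [v for v in values if v is not None]
    return sum(values) / len(values)

def top_two(molecules):
    """Fraction of molecules with an observed SOE in rank one or two."""
    hits = 0
    for scores, labels in molecules:
        ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
        hits += any(y == 1 for _, y in ranked[:2])
    return hits / len(molecules)

def error_reduction(baseline, improved):
    """Relative reduction in error (1 - performance) versus a baseline."""
    return ((1 - baseline) - (1 - improved)) / (1 - baseline)

# Invented toy data: one (scores, labels) pair per molecule.
molecules = [
    ([0.9, 0.2, 0.1, 0.4], [1, 0, 0, 0]),  # SOE ranked first
    ([0.3, 0.5, 0.6, 0.1], [1, 0, 0, 0]),  # SOE ranked third
]
site_auc = average_site_auc(molecules)
top2 = top_two(molecules)
auc_gain = error_reduction(0.937, 0.949)   # about 0.190
top2_gain = error_reduction(0.805, 0.830)  # about 0.128
```

The last two lines reproduce the error reductions quoted above: the error falls from 6.3% to 5.1% for average site AUC (a 19.0% reduction) and from 19.5% to 17.0% for top-two (a 12.8% reduction).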

Figure 4

Bond epoxidation scores (BES) accurately identify sites of epoxidation (SOEs). Top left, for each prediction method, average site AUC was computed for 389 molecules extracted from the Accelrys Metabolite Database with their SOEs labeled. This metric reflected how often SOEs were ranked above other sites within these molecules. Bottom left, top-two classification performance was computed, by which a molecule was considered correctly predicted if any of its observed SOEs were predicted in the first- or second-rank position. By both metrics, the cross-validated predictions generated by a neural network (BES) outperformed the predictions of a logistic regressor (BES[LR]). The classification performance of BES also exceeded that of all raw descriptors, the five best of which are included in each panel. Right, examples from the data set are visualized with their predictions.[52−54] In the bar graph axis, the two-center electron–nuclear attraction energy is abbreviated as electron–nuclear attraction. For each molecule, the colored shading represents BES, which range from 0 to 0.73. Each experimentally observed SOE is circled.

This model for epoxidation is the first of its kind, and thus there are no other published models to which performance can be compared. Instead, we tested the performance of each raw descriptor to provide a baseline for comparison. Each descriptor was treated as a very simple model limited to a single chemical attribute to predict SOEs. The best performing descriptor was πp occupancy; however, this descriptor significantly underperformed our model, with accuracies of 90.8% (average site AUC) and 72.8% (top-two). Using machine learning to collectively consider many chemical attributes classified SOEs more accurately than any attribute considered in isolation.

Descriptors Driving Bond Epoxidation Score Performance

We used sensitivity analysis to identify which descriptors the model relied upon and thereby further assess whether the model is sensible. The contribution of individual descriptors for identifying SOEs was measurable with a permutation sensitivity analysis.[43,55] First, a baseline model was built using the entire training data set, and its performance was calculated on this training data. The average site AUC performance was used for the sensitivity analysis, because it most closely measures performance in the intended use case. It quantifies how accurately the model identifies the correct SOEs within epoxidized test molecules, relative to all other potential sites, on a molecule-by-molecule basis. Reassuringly, very similar results from the sensitivity analysis are obtained using other metrics (data not shown). Next, the influence of individual descriptors, as well as groups of descriptors, was measured by recording the drop in the model’s performance on the training data when the descriptor values were shuffled randomly. For each descriptor set, the shuffling procedure was performed 10 times, and the mean performance drop was reported. Descriptors more heavily relied upon by the model were associated with higher performance drops. As seen in Figure 5, the model primarily relied on quantum chemical bond descriptors. Shuffling all quantum chemical bond descriptors (listed in Table 1) as a group resulted in a performance drop of 10.3%. The most important individual descriptor was πp occupancy; shuffling of this descriptor was associated with a performance drop of 4.8%. This observation was consistent with πp occupancy predicting SOEs reasonably well by itself, with the best performance among all lone descriptors (Figure 4). The model's heavy reliance on πp occupancy is logical given its role in epoxidation. 
In fact, a π-complex is the initial intermediate formed during epoxidation by cytochromes P450.[37,56,57] While reasonable, πp occupancy has never been proposed as a way to identify SOE.
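The permutation procedure described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the toy model reads only the first column, plain accuracy stands in for the average site AUC used in the study, and columns in a group are shuffled jointly (whole rows of the group are permuted together). All names and data are hypothetical.

```python
import random

def accuracy(scores, labels, threshold=0.5):
    """Stand-in metric: fraction of rows classified correctly at a threshold."""
    return sum((s >= threshold) == bool(y)
               for s, y in zip(scores, labels)) / len(labels)

def permutation_importance(model, rows, labels, metric, columns,
                           repeats=10, seed=0):
    """Mean drop in `metric` when the given descriptor columns are shuffled."""
    rng = random.Random(seed)
    baseline = metric([model(r) for r in rows], labels)
    drops = []
    for _ in range(repeats):
        order = list(range(len(rows)))
        rng.shuffle(order)
        shuffled = [list(r) for r in rows]
        for dst, src in enumerate(order):
            for c in columns:
                # Move the chosen descriptor values between rows.
                shuffled[dst][c] = rows[src][c]
        drops.append(baseline - metric([model(r) for r in shuffled], labels))
    return sum(drops) / len(drops)

# Toy data: column 0 determines the label, column 1 is noise.
rows = [[0.9, 0.3], [0.8, 0.7], [0.7, 0.1], [0.6, 0.9],
        [0.4, 0.2], [0.3, 0.8], [0.2, 0.5], [0.1, 0.6]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
model = lambda row: row[0]  # toy model that ignores column 1

drop_used = permutation_importance(model, rows, labels, accuracy, [0])
drop_unused = permutation_importance(model, rows, labels, accuracy, [1])
```

A descriptor the model ignores produces no performance drop, while shuffling a descriptor the model depends on degrades performance, which is the signal the analysis above relies on.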

Figure 5

The importance of specific descriptors to the bond epoxidation model. A permutation sensitivity analysis quantified the importance of descriptors for the final trained site of epoxidation model. Left, the 10 most important individual descriptors in decreasing order of importance from top to bottom. Right, the importance of four broad descriptor categories. The graph shows the drop in model performance after permuting the associated descriptor values, averaged over 10 iterations.

The second most important descriptor was SMARTCyp reactivity, with a performance drop of 2.5%. The relevance of SMARTCyp reactivity is readily understandable, because it predicts the sites of cytochromes P450 metabolism of drug-like molecules.[30] The remaining most important individual descriptors were topological. Previous studies by our group found topological descriptors to be important for many different types of chemical modeling.[43] Topology encompasses fundamental information, such as atom element identity or bond type, which has been useful for finding many different types of patterns. Overall, the results of sensitivity analysis indicated that the model logically relied upon descriptors relevant to epoxidation.

Accuracy in Identifying Aromatic and Double Bond Sites of Epoxidation

Ideally, the model should be able to distinguish SOEs from all other bonds across the entire data set. This is not assessed by the average site AUC and top-two metrics used in prior sections, which only compare BES predictions on a molecule-by-molecule basis. In contrast, global AUC, computed across all bonds in the data set, does measure this behavior. The model’s BES is very accurate across the whole data set, with a global AUC of 95.6%. The logistic regressor is slightly less accurate with a global AUC of 94.5%, but this performance drop is significant with a p-value of 10^−8 computed with a paired t-test.[50] Similarly, the best performing descriptor is πp occupancy with a global AUC of 88.4%, which is also a significant performance drop from the BES with a p-value approaching zero. We further assessed the model’s performance by testing whether it could distinguish SOEs from other bonds of the same type, for aromatic bonds and for double bonds separately (Figure 6). These tests excluded (for example) single bonds, which are very rarely epoxidized and might artificially inflate performance if included in performance calculations. An aromatic bond AUC was computed by first extracting all aromatic bonds within epoxidized molecules and then calculating AUC. A double bond AUC was calculated similarly. Encouragingly, BES were very accurate in identifying both epoxidized aromatic bonds (92.5%) and epoxidized double bonds (95.1%) and also substantially outperformed all individual descriptors.

Figure 6

Bond epoxidation scores (BES) accurately identified both aromatic and double bond sites of epoxidation. Across the 389 molecules that underwent epoxidation, the model accurately separated epoxidized and non-epoxidized aromatic bonds (left) and double bonds (right). Using cross-validated scores, classification performance was quantified by computing the AUC of the model on either the aromatic or the double bonds in the full data set. The AUC of the model was compared with similarly computed AUCs for individual descriptors. In both cases, the model BES outperformed all individual descriptors.

Distinguishing Epoxidation from Hydroxylation

Another key task was to accurately distinguish SOEs from SOHs, because epoxidation and hydroxylation may have significantly different implications for toxicity and downstream metabolism. Generally, SOEs are not obviously distinguishable from sites of sp2 hydroxylation, because either epoxidation or hydroxylation may occur at sp2 atoms. While several studies have already demonstrated that computational models can predict the sites where molecules are oxidized,[28−33] they do not predict if the oxidation is an epoxidation or a hydroxylation. For our study, we tested whether BES distinguished SOEs from SOHs. We initially built a hydroxylation data set of 3000 hydroxylation reactions that were randomly sampled from the AMD resource, as described in the Methods. After filtering and merging duplicates, the final data set included 811 molecules, in which atoms were marked if they were sites of sp2 hydroxylation. In this study, an SOE was defined as a bond between the two carbons of the final epoxide, whereas an SOH is usually defined as the single atom targeted for hydroxylation. However, our model only makes predictions on bonds. So, for validation purposes, we labeled the bonds connecting to hydroxylated atoms as SOHs and asked whether these sites receive lower scores than SOEs. Only bonds between two sp2 carbon atoms were included. Each of the 811 molecules in the hydroxylation data set was tested by our model, and the predictions for each SOH bond were extracted. As previously explained, the hydroxylation reactions were sampled randomly from our database. Therefore, some molecules appeared in both the hydroxylation and epoxidation data sets. Cross-validated predictions were used for molecules that were also part of the training set. Within these molecules, it was possible for the same site to be subject to both epoxidation and hydroxylation. These sites were labeled as SOEs. We investigated whether these SOEs were distinguishable from SOHs. 
Encouragingly, BES separated SOEs from SOHs with an AUC of 83.3% (Figure 7). In contrast, the best performing raw descriptor among all tested was πp occupancy, with an AUC of only 77.0%. This is a critical result because it demonstrates that the model can distinguish SOEs from sites that are also acted on by P450s, but not epoxidized.

Figure 7

Bond epoxidation scores (BES) distinguish sites of epoxidation (SOEs) from sites of hydroxylation (SOHs). Top, each prediction method was assessed by its ability to separate SOEs from SOHs. The cross-validated scores on the SOEs of 389 epoxidized training molecules were compared with the SOH scores on 811 test molecules with their sites of sp2 hydroxylation labeled. The scores for each SOE and SOH were extracted and performance was quantified by computing the AUC. The classification performance of the model was then compared with similarly computed AUCs for individual descriptors. The model’s BES outperformed all individual descriptors. Right, from top to bottom are 1-nitropyrene[58] and ketoconazole,[59] example molecules subject to both epoxidation and hydroxylation. Each SOE is indicated by solid circles, and SOHs are indicated by dashed circles. The colored shading indicates BES (which range from 0 to 0.45).

πp Occupancy and Epoxidation

One striking result from these experiments is the consistently high importance of πp occupancy in identifying SOEs. Although it has been known for a long time that a π-complex is the initial intermediate formed during epoxidation by cytochromes P450,[37,56,57] no published literature has suggested πp occupancy is a marker for SOEs or quantitatively assessed its ability to identify SOEs. To further investigate this observation, which may provide mechanistic clues, we studied the distribution of πp occupancy and BES as a function of epoxidation and bond type (Figure 8). From these distributions, it seems immediately clear that SOEs have higher πp occupancy than non-epoxidized sites. However, πp occupancy is also strongly correlated with the type of bond, and the optimal cutoff between SOEs and non-epoxidized sites is different for double and aromatic bonds. This result suggests that πp occupancy may not be the direct driver of the π-intermediate’s formation. Instead, πp occupancy may be a proxy for another factor that we do not directly capture in other descriptors. One possible factor may be the ability of neighboring groups to donate πp electrons, but directly testing this hypothesis is beyond the immediate scope of this study and will be left for future work.

Figure 8

Bond epoxidation scores (BES) represent a well-scaled probability that a site will be epoxidized. Across the 389 molecules that underwent epoxidation, the normalized distribution of BES (bottom) and πp occupancy (top) across both aromatic bonds (left) and double bonds (right) are displayed for all epoxidized and non-epoxidized sites, indicated by the shaded bars. The solid lines represent the percentage of bonds that are epoxidized (using non-normalized frequencies). The diagonal dashed lines on the bottom plots indicate a hypothetical perfectly scaled prediction. This demonstrates that BES is much better scaled than πp occupancy. These distributions also highlight another key feature of our approach: the model’s output is well-scaled and can be interpreted as a probability. In other words, bonds with a BES of 0.8 have approximately an 80% chance of being epoxidized. In contrast, πp occupancy, though predictive, is not scaled to be an SOE probability.
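A reliability plot of this kind can be sketched by binning scores and comparing each bin's mean score with the observed fraction of positives. The code below is an illustrative sketch under stated assumptions: scores lie in [0, 1], bins are equal-width, and agreement with the diagonal is summarized as a coefficient of determination. The text does not specify how its R2 values were computed, so `r_squared_vs_diagonal` is a hypothetical choice, as are the function names and toy data.

```python
def reliability_curve(scores, labels, n_bins=10):
    """Return (mean score, observed positive fraction) for each non-empty bin.
    Assumes scores lie in [0, 1] and labels are 0 or 1."""
    bins = [[] for _ in range(n_bins)]
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, y))
    curve = []
    for members in bins:
        if members:
            mean_score = sum(s for s, _ in members) / len(members)
            frac_pos = sum(y for _, y in members) / len(members)
            curve.append((mean_score, frac_pos))
    return curve


def r_squared_vs_diagonal(curve):
    """R^2 of observed fractions against the perfectly scaled line y = x
    (one possible definition, assumed here)."""
    observed = [f for _, f in curve]
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((f - m) ** 2 for m, f in curve)
    ss_tot = sum((f - mean_obs) ** 2 for f in observed)
    if ss_tot == 0:
        raise ValueError("observed fractions are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Toy data, NOT from the study: a perfectly scaled two-bin example.
scores = [0.25] * 4 + [0.75] * 4
labels = [1, 0, 0, 0, 1, 1, 1, 0]
curve = reliability_curve(scores, labels, n_bins=2)
print(curve, r_squared_vs_diagonal(curve))
```

A score that ranks sites well but is poorly scaled, as the caption describes for πp occupancy, would give a curve far from the diagonal even when its AUC is high.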

Accuracy at Identifying Molecules that Undergo Epoxidation

We also assessed the ability of our model to separate epoxidized from non-epoxidized molecules. With high enough classification performance, our model might be a useful tool to rapidly screen drug candidates for potentially problematic molecules.[7−11] In this assessment, we trained the model for epoxidation to distinguish between those molecules that underwent epoxidation and those that did not. We included in our training data set molecules that are structurally closely related to epoxidized molecules, but are not themselves epoxidized in our database. After training the model on the SOE level, we tested several methods of separating epoxidized and non-epoxidized molecules (Figure 9). In this case, classification performance was quantified by measuring the AUC across the entire data set.

Figure 9

Molecule epoxidation scores accurately identify molecules subject to epoxidation. Left, several prediction methods were compared by their ability to identify molecules that underwent epoxidation. The data set included 524 molecules, 389 of which were epoxidized and 135 structurally similar but not epoxidized molecules. Model performance was measured by computing the AUC across epoxidized and non-epoxidized molecules (Molecule AUC), using cross-validated scores. By this metric, the best approach inputted the top five bond epoxidation scores (BES) and all molecule-level descriptors into a neural network (MES[NN]). This slightly outperformed the simpler methods of using a logistic regressor (MES[LR]) or merely taking the maximum bond epoxidation score (max[BES]). While this improvement is not statistically significant, on the basis of the reliability plots in Figure 10, the neural network (MES[NN]) was chosen to calculate molecule epoxidation scores (MES) for this study. Right, example pairs of epoxidized and closely related non-epoxidized molecules are visualized. From left to right, top to bottom: resveratrol (MES: 0.79),[60] quinalbarbitone (MES: 0.88),[61] glucuronidated resveratrol (MES: 0.37),[62] and thiopental (MES: 0.60).[63] Each experimentally observed site of epoxidation is circled. For each molecule, the colored shading represents BES, which range from 0 to 0.76.

Figure 10

MES[NN] offers a well-scaled probabilistic prediction of molecule epoxidation. The bar graphs plot the normalized distributions of max[BES], MES[LR], and MES[NN] across 524 epoxidized and non-epoxidized molecules. The solid lines plot the percentage of molecules that are epoxidized (using non-normalized frequencies) in each bin. The diagonal dashed lines indicate a hypothetical perfectly scaled prediction. MES[NN] offers the best scaled prediction of the three methods, with a strong correlation to a perfectly scaled prediction. This means that the MES[NN] is interpretable as the probability that a molecule is epoxidized.

The simplest method for predicting molecule epoxidation was to take the cross-validated maximum BES within each molecule. Across the entire data set, this approach yielded MES that separate epoxidized and non-epoxidized molecules with an AUC of 78.6%. The addition of a training step to input the top five BES and molecule-level descriptors into a logistic regressor or neural network slightly improved classification performance. The cross-validated scores outputted by a logistic regressor (MES[LR] in the figure) had a higher AUC of 78.9%, and those of the neural network (MES[NN]) had an AUC of 79.3%. A false positive rate paired t-test[50] indicated that MES[NN] was not significantly better than max[BES] (p-value 0.14) or MES[LR] (p-value 0.19). However, MES[NN] provided a better scaled prediction than either max[BES] or MES[LR], as demonstrated by the reliability plots in Figure 10. The neural network closely approximated a perfectly well-scaled prediction, with an R² value of 0.971, compared to 0.956 for the logistic regressor or 0.889 for max[BES]. The neural network’s reliability plot is superior to that of the logistic regressor, not only due to the higher R² value, but also because it assigns significantly more non-epoxidized molecules low scores, and epoxidized molecules high scores, evidenced by the relative densities in Figure 10.
Nevertheless, choosing between the logistic regressor and neural network is debatable. The logistic regressor offers a simpler model structure, whereas the neural network provides a slightly higher classification performance and better scaled prediction. Going forward, we decided to use the neural network, but we believe that the logistic regressor could also be used with similar results. For the rest of the study, we define MES to mean MES[NN].
The significantly lower AUC of the molecule-level MES compared to the site-level BES was a consequence of the lower quality of the molecule-level data, which included “non-epoxidized” molecules. This was based on our assumption that molecules were non-epoxidized if they were not subject to any epoxidation reaction in our literature-derived database. While necessary, this assumption was not strong evidence that molecules were not subject to epoxidation, because not all studies look for epoxidation products. As a consequence, some epoxidizable molecules were incorrectly labeled as non-epoxidized in our data. In contrast, our site-level epoxidation data is much less noisy, because it is drawn from experiments detecting epoxidation, and this is reflected in the higher site-level performance. Nevertheless, MES separated epoxidized and non-epoxidized molecules with 79.3% AUC. This result is consistent with our presumption that most of the molecules labeled as non-epoxidized are truly not epoxidized. If epoxidized and non-epoxidized molecules were drawn from the same chemical distribution, it would not be possible to separate them with any accuracy.
Furthermore, MES outperformed all molecule-level descriptors in terms of classification performance. This result demonstrated that our model offers an informative prediction on the molecule level. The best performing descriptor was the negative of the total number of single bonds in a molecule, yet its AUC was only 72.3%, considerably worse than MES. In contrast to site-level epoxidation, for which πp occupancy was quite predictive (Figure 4), maximum πp occupancy predicted molecule epoxidation with only 57.7% AUC. The model’s MES predicted which molecules will be epoxidized much more accurately than any single chemical descriptor.
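The two bond-level summaries described above, max[BES] and the top five BES used as inputs to the molecule-level models, can be sketched as follows. This is an illustration only: the function names are hypothetical, and padding molecules with fewer than five scored bonds with zeros is an assumption, since the text does not say how such molecules were handled. The trained logistic regressor and neural network, and the molecule-level descriptors they also take as input, are not reproduced here.

```python
def max_bes(bond_scores):
    """Simplest molecule-level score: the highest BES in the molecule."""
    if not bond_scores:
        raise ValueError("molecule has no scored bonds")
    return max(bond_scores)


def top_k_bes(bond_scores, k=5, pad=0.0):
    """Fixed-length feature vector holding the k highest BES, in descending order.
    Shorter lists are padded with `pad` (an assumption, not from the source)."""
    top = sorted(bond_scores, reverse=True)[:k]
    return top + [pad] * (k - len(top))


# Hypothetical per-bond scores for two molecules, for illustration only.
small_molecule = [0.10, 0.70, 0.30]
larger_molecule = [0.05, 0.62, 0.01, 0.44, 0.20, 0.15, 0.03]

print(max_bes(small_molecule), top_k_bes(small_molecule))
print(max_bes(larger_molecule), top_k_bes(larger_molecule))
```

A fixed-length vector is what lets molecules with different numbers of bonds share one downstream classifier; max[BES] is simply the first element of that vector.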

Case Studies

Knowledge of the SOE of a drug or drug candidate can direct rational drug design to avoid the formation of reactive metabolites and reduce the risk of adverse drug reactions. Case studies provide excellent examples of how our model could enable the development of safer drugs (Figure 11).

Figure 11

The epoxidation model recognizes sites of epoxidation within drugs that can be modified to reduce toxicity. The figure includes three groups of closely related drugs shaded by their BES; the top three are prone to hypersensitivity reactions while their analogues are not. The top three molecules and meloxicam are epoxidized and their sites of epoxidation are circled.[21,23−25,66,67] The model’s BES correctly identifies the SOEs in these molecules. The model’s MES correctly identifies these molecules as epoxidized, with higher scores than the non-epoxidized molecules. For the top three molecules, epoxidation is the primary mechanism of their hypersensitivity.[65] Encouragingly, the two analogues of carbamazepine are correctly identified as non-epoxidized and therefore non-hepatotoxic. This demonstrates how the model could be used to identify less toxic analogues. Furosemide does not have a close analogue on the market, but the model correctly identifies the furan ring as problematic. The other diuretics with the same active scaffold, but without this furan, are less toxic.[65] Identifying meloxicam as less toxic is a more difficult task and would require more comprehensive metabolism modeling. Meloxicam is a safer analogue of sudoxicam because an alternate hydroxylation pathway is introduced by the modification that outcompetes the epoxidation pathway.[21]

Carbamazepine is an effective drug to treat epilepsy; however, it can cause severe adverse reactions mediated by reactive metabolites. Carbamazepine metabolism can form several reactive metabolites, including an iminoquinone,[22] but the epoxide’s formation is the focus of this study and more correlated with adverse reactions.[23−25] Analogues of carbamazepine that block the epoxidation have a lower incidence of adverse effects. Replacement of the problematic double bond with a ketone yielded oxcarbazepine, which lacks the metabolic activation to an epoxide and adverse events, yet retains similar efficacy.[27] Similarly, eslicarbazepine does not contain the problematic double bond, is no longer epoxidized at this position, and also has a lower incidence of adverse reactions.[25,64] The model correctly identifies carbamazepine’s SOE. Furthermore, the model correctly identified two carbamazepine analogues as less likely to be epoxidized: oxcarbazepine (MES: 0.38) and eslicarbazepine (MES: 0.20) compared with carbamazepine (MES: 0.88).
Furosemide is a commonly prescribed diuretic but is prone to hypersensitivity reactions and hepatotoxicity due to the epoxidation of its furan ring.[65] The model correctly identifies this as an SOE. There are no close analogues of furosemide on the market. However, there are three other drugs in the same class that contain the same sulfamyl-based active scaffold: piretanide, bumetanide, and torasemide. None of these drugs contains the problematic furan; all are predicted not to form epoxides (MES: 0.21, 0.19, and 0.21, respectively, compared with 0.94 for furosemide), and all are less prone to hypersensitivity-driven reactions than furosemide.[65]
The case of hepatotoxic sudoxicam and its non-hepatotoxic analogue, meloxicam, is more complicated.[65] Sudoxicam is an NSAID that was withdrawn from testing due to hepatotoxicity caused by epoxidation of its thiazole ring; the unstable epoxide causes ring scission and formation of a reactive acylthiourea metabolite.[21,65] This reaction pathway is suppressed by the addition of a single methyl group to sudoxicam’s SOE. The resulting drug meloxicam is less prone to epoxidation, although the epoxide still forms.[21] Instead, meloxicam is primarily hydroxylated at the added carbon.[21] As a result, the reactive acylthiourea metabolite forms less often, and consequently meloxicam is not hepatotoxic, despite being prescribed at a similar dose to the hepatotoxic sudoxicam.[21,65] The model correctly predicts the SOEs of both sudoxicam and meloxicam, and assigns them high MES of 0.95 and 0.96, respectively. However, the model does not identify meloxicam as the less toxic molecule. This is exactly what we should expect, because both molecules are epoxidized by P450s.[21] Meloxicam’s modification introduces an alternative hydroxylation pathway that reduces the amount of epoxide formed, and this change is responsible for its reduced toxicity. This highlights the limitations of considering the epoxidation pathway in isolation. A better risk assessment might combine epoxidation predictions with more comprehensive models of metabolism to predict if epoxides are a major metabolite. Building this system is exactly our long-range goal, but beyond the scope of the current study. Nevertheless, our findings provide a critical step in the right direction: the first reported model that predicts the formation of reactive epoxides from drug candidates and the accurate identification of the specific epoxidized bonds. As is clear in all three of these cases, the model can be used to identify SOEs that can then be modified to make drugs safer.
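The comparisons in these case studies can be laid out as a small screening sketch. The MES values below are the ones reported in the text; the 0.5 cutoff, the grouping of each drug with its alternatives, and the function name `lower_risk_alternatives` are hypothetical choices made only for illustration and are not part of the study.

```python
# MES values as reported in the case studies above.
MES = {
    "carbamazepine": 0.88, "oxcarbazepine": 0.38, "eslicarbazepine": 0.20,
    "furosemide": 0.94, "piretanide": 0.21, "bumetanide": 0.19, "torasemide": 0.21,
    "sudoxicam": 0.95, "meloxicam": 0.96,
}

# Each epoxidized drug paired with the alternatives discussed in the text.
ALTERNATIVES = {
    "carbamazepine": ["oxcarbazepine", "eslicarbazepine"],
    "furosemide": ["piretanide", "bumetanide", "torasemide"],
    "sudoxicam": ["meloxicam"],
}

CUTOFF = 0.5  # hypothetical decision threshold, not defined in the study


def lower_risk_alternatives(mes, alternatives, cutoff):
    """List alternatives that score below the cutoff while their parent drug
    scores at or above it."""
    flagged = []
    for parent, candidates in alternatives.items():
        for name in candidates:
            if mes[parent] >= cutoff and mes[name] < cutoff:
                flagged.append(name)
    return flagged


lower_risk = lower_risk_alternatives(MES, ALTERNATIVES, CUTOFF)
print(lower_risk)
```

Under this assumed cutoff, the carbamazepine and furosemide alternatives are flagged and meloxicam is not, which mirrors the limitation noted above: an epoxidation score alone cannot capture a competing hydroxylation pathway.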

Conclusion

This study establishes a new system to predict the formation of reactive epoxide metabolites. The epoxidation model—trained on SOE data—identifies with 94.9% AUC performance the SOEs within epoxidized molecules. The model also classifies epoxidized and non-epoxidized molecules with 79.3% AUC. This method needs to be combined with additional tools to be useful for predicting the toxicity of drugs. For example, while this model predicts the formation of epoxides, it does not score the reactivity of these epoxides. Epoxide reactivity can vary widely, with half-lives ranging from one second to several hours,[37] and this variation may have significant implications for toxicity. To address this, we plan to combine this epoxidation model with a model of reactivity already developed.[43] Furthermore, we will expand to model quinone formation, another motif of potentially high reactivity that frequently causes adverse drug reactions.[15,68,69] Ultimately, we envision a powerful model for predicting adverse drug reactions that integrates metabolism models, reactivity models, and dosage information. By accurately modeling epoxidation, this study provides a key piece of this ultimate goal.
