Christina de Bruyn Kops1, Martin Šícho2, Angelica Mazzolari3, Johannes Kirchmair1,4. 1. Center for Bioinformatics (ZBH), Department of Informatics, Faculty of Mathematics, Informatics and Natural Sciences, Universität Hamburg, 20146 Hamburg, Germany. 2. CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, 166 28 Prague 6, Czech Republic. 3. Facoltà di Scienze del Farmaco, Dipartimento di Scienze Farmaceutiche "Pietro Pratesi", Università degli Studi di Milano, I-20133 Milan, Italy. 4. Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria.
Abstract
Predicting the structures of metabolites formed in humans can provide advantageous insights for the development of drugs and other compounds. Here we present GLORYx, which integrates machine learning-based site of metabolism (SoM) prediction with reaction rule sets to predict and rank the structures of metabolites that could potentially be formed by phase 1 and/or phase 2 metabolism. GLORYx extends the approach from our previously developed tool GLORY, which predicted metabolite structures for cytochrome P450-mediated metabolism only. A robust approach to ranking the predicted metabolites is attained by using the SoM probabilities predicted by the FAME 3 machine learning models to score the predicted metabolites. On a manually curated test data set containing both phase 1 and phase 2 metabolites, GLORYx achieves a recall of 77% and an area under the receiver operating characteristic curve (AUC) of 0.79. Separate analysis of performance on a large amount of freely available phase 1 and phase 2 metabolite data indicates that achieving a meaningful ranking of predicted metabolites is more difficult for phase 2 than for phase 1 metabolites. GLORYx is freely available as a web server at https://nerdd.zbh.uni-hamburg.de/ and is also provided as a software package upon request. The data sets as well as all the reaction rules from this work are also made freely available.
Predicting the structures of metabolites formed in humans can provide advantageous insights for the development of drugs and other compounds. Here we present GLORYx, which integrates machine learning-based site of metabolism (SoM) prediction with reaction rule sets to predict and rank the structures of metabolites that could potentially be formed by phase 1 and/or phase 2 metabolism. GLORYx extends the approach from our previously developed tool GLORY, which predicted metabolite structures for cytochrome P450-mediated metabolism only. A robust approach to ranking the predicted metabolites is attained by using the SoM probabilities predicted by the FAME 3 machine learning models to score the predicted metabolites. On a manually curated test data set containing both phase 1 and phase 2 metabolites, GLORYx achieves a recall of 77% and an area under the receiver operating characteristic curve (AUC) of 0.79. Separate analysis of performance on a large amount of freely available phase 1 and phase 2 metabolite data indicates that achieving a meaningful ranking of predicted metabolites is more difficult for phase 2 than for phase 1 metabolites. GLORYx is freely available as a web server at https://nerdd.zbh.uni-hamburg.de/ and is also provided as a software package upon request. The data sets as well as all the reaction rules from this work are also made freely available.
Metabolism has a large
impact on the safety and efficacy of the
xenobiotics that enter the human body, from drugs to cosmetics and
agrochemicals, because metabolic reactions can change these compounds
into metabolites with different physicochemical and pharmacological
properties.[1] Computational approaches can
be useful for predicting how drugs and other xenobiotics will be metabolized
in humans, allowing, for example, the focusing of the drug development
process on the most promising compounds in order to save time and
reduce costs.Human xenobiotic metabolism is generally separated
into two phases,
phase 1 and phase 2, based on the type of reaction (note that the
nomenclature does not indicate that a phase 1 reaction must occur
before a phase 2 reaction can take place). Phase 1 metabolism consists
of oxidation, reduction, and hydrolysis reactions that generally result
in increased polarity of the metabolite compared to the parent molecule
by creating or unmasking polar functional groups. The main enzyme
family responsible for phase 1 metabolism is the cytochrome P450 (CYP)
enzyme family, which is responsible for the formation of approximately
60% of first-generation metabolites but only for approximately 40%
of metabolites overall (all percentages presented here are based on
the current version[2] of the MetaQSAR database,[3] which contains over 4000 parent molecules, including
drugs and other xenobiotics, along with their first-, second-, and
third-generation metabolites produced by mammalian metabolic enzymes
in vitro and/or in vivo). CYPs are also the cause of a large portion
of toxic metabolites and drug–drug interactions.[4] Of the non-CYP phase 1 enzymes, the ones with
the highest impact on metabolite formation are hydrolases and flavin-containing
monooxygenases (FMOs), which account for approximately 9% and approximately
4% of all metabolites, respectively.Phase 2 metabolism consists
of conjugation reactions that, like
phase 1 metabolic reactions, tend to modify compounds to more excretable
forms. In all, phase 2 metabolism accounts for approximately 30% percent
of all metabolites.[2] The enzymes responsible
for phase 2 drug metabolism belong primarily to five enzyme families:
UDP-glucuronosyltransferases (UGTs), glutathione S-transferases (GSTs), sulfotransferases (SULTs), N-acetyltransferases (NATs), and methyltransferases (MTs).[5] These five enzyme families are responsible for
nearly 90% of all phase 2 metabolites.[4]Computational prediction of xenobiotic metabolism encompasses
several
aspects, including the prediction of the metabolically labile atom
positions in molecules, which are known as sites of metabolism (SoM),
and prediction of metabolite structures.[1,6] Although SoM
prediction can provide valuable information and often allows the structure
of the resulting metabolite to be inferred by a chemist, it is also
of interest to directly predict the metabolites themselves. There
are many freely available and commercial methods for metabolite structure
prediction, though most focus exclusively on CYP-mediated metabolism.
Commercial tools that offer comprehensive prediction of metabolites
for both phase 1 and phase 2 metabolism include Meteor Nexus (Lhasa
Ltd.),[7] TIMES (LMC),[8] and MetabolExpert (CompuDrug Ltd.).[9]In terms of freely available metabolite structure predictors,
a
popular and relatively long-lived, open-source tool is SyGMa.[10] SyGMa generates and ranks metabolites based
on reaction rules and their occurrence ratios derived from the Metabolite
database.[11] The current version of SyGMa
predicts phase 1 and phase 2 metabolites using a set of 145 knowledge-based
reaction rules (118 phase 1 rules and 27 phase 2 rules). Using the
combined set of reaction rules, SyGMa was able to predict 68% of the
metabolites in a test set consisting of 175 parent compounds and 385
reactions (taken from a later release of the Metabolite database compared
to the training data).[10] The predictor
was able to rank 45% of the metabolites in the test set within the
top 10 predictions for their corresponding parent molecules. Unfortunately,
the Metabolite database, which was used to develop the reaction rules
and the occurrence ratios for the scoring, has been discontinued.Another freely available tool for metabolite prediction is BioTransformer,[12] an open-source, comprehensive program that predicts
metabolite structures for humanCYP and phase 2 metabolism as well
as gut microbial, environmental microbial, and human “enzyme
commission (EC)-based” metabolism. BioTransformer has 163 CYP
rules and 74 phase 2 rules as well as additional constraints regarding
molecule types that various rules are allowed to be applied to. Using
a combination of the CYP, phase 2, and EC-based modules (408 rules),
BioTransformer achieved a recall of 88% on its test set of 40 pharmaceuticals
and pesticides, with a precision of 0.49 (188 true positive predictions
and 198 putative false positive predictions). BioTransformer does
not currently rank its predictions.A further freely available
tool, MetaTox,[13,14] predicts metabolites by separately
predicting the probability that
each potential reaction class is relevant to the given molecule and
also predicting the probability of a reaction occurring at each possible
reaction center given each possible reaction class. The probability
that the resulting metabolite is formed is calculated by combining
both of these probabilities, which can then be used to rank the predictions.
The reaction types include both phase 1 and phase 2 reaction types,
though it is unclear how many reaction rules there are in total. During
leave-one-out cross-validation, MetaTox obtained invariant accuracy
prediction (IAP, a metric related to area under the receiver operating
characteristic curve (AUC)) values between 0.79 and 0.95 for the prediction
of the correct reaction class and IAP values between 0.86 and 0.99
for the prediction of the reacting atoms for each of the biotransformation
classes.[14]We recently reported on
the development of a tool called GLORY
that predicts the structures of metabolites formed by the CYP enzyme
family.[15] GLORY includes a new set of reaction
rules for CYP-mediated metabolism, whereby common reaction types are
distinguished from more unusual reactions. Importantly, GLORY explored
how SoM prediction could be effectively employed within the context
of metabolite structure prediction. We were able to demonstrate that
using the predicted SoM probabilities for each atom in a molecule
to score the predicted metabolites, resulting from reactions taking
place at those atom positions, led to a meaningful ranking of the
predictions.The software for SoM prediction that was used in
GLORY was FAME
2,[16] a machine learning-based SoM prediction
program that uses extremely randomized trees classifiers combined
with two-dimensional (2D) circular descriptors to predict SoMs for
CYP-mediated metabolism. Since the development of GLORY, a successor
to FAME 2 has become available. FAME 3[17] continues to use the concept of extra trees classifiers and 2D circular
descriptors developed in FAME 2 and applies this approach to generate
comprehensive SoM prediction models for both phase 1 and phase 2 metabolism.There are several other metabolite prediction tools, both commercial
(e.g., ADMET Predictor[18] from SimulationsPlus,
StarDrop[19] from Optibrium, and MetaSite
from Molecular Discovery[20]) and freely
available (e.g., MetaTox[13,14]), that incorporate
SoM prediction into their metabolite prediction approaches. These
tools either focus solely on CYP-mediated metabolism, have not been
published, or, as described above for MetaTox, have been evaluated
in such a way that makes it difficult to determine how well the metabolite
structures themselves were predicted. Thus, although the concept of
combining SoM prediction with reaction rules for comprehensive metabolite
prediction has been applied in various ways, a systematic analysis
of the performance for both phase 1 and phase 2 metabolism has not
yet been published.We have extended the approach developed
in GLORY to create a new
tool, called GLORYx, that combines SoM prediction with a set of reaction
rules to predict metabolites for both phase 1 and phase 2 metabolism.
GLORYx employs FAME 3 for SoM prediction, the results of which are
used to score and rank the predicted metabolites. Compared to GLORY,
GLORYx requires more reaction rules in order to cover non-CYP phase
1 metabolic reactions as well as phase 2 metabolic reactions. GLORYx
is freely available via a web server at https://nerdd.zbh.uni-hamburg.de/.
Methods
Reference Data Set
A reference data set of compound-metabolite
pairs was compiled from the freely available metabolism data in the
DrugBank (drug group “All”)[21,22] and MetXBioDB[23] databases to serve as
a basis for evaluation of the method during the development of GLORYx.
For each metabolic reaction in either database, the reactant was considered
to be the parent molecule, and the product was considered to be the
metabolite. The reference data set is therefore in the format of a
map of each parent molecule to its first-generation metabolites, regardless
of whether the parent molecule is itself the metabolite of another
molecule.The extraction of the data from DrugBank and MetXBioDB
is consistent with the method used in GLORY (see ref (15)). The differences in the
preprocessing of the data (i.e., assigning a phase and removing the
minor component of salts; see below) arise from considering both phase
1 and phase 2 metabolism rather than just CYP metabolism.The
preprocessing procedure is as follows:Structural information for both the
parent and the metabolite was required in order for a reaction to
be included. For DrugBank, the structures are provided in SD file
format. In MetXBioDB, only InChIs and InChIKeys are provided, so the
InChI was used to generate the structure. Note that stereochemistry
information was ignored for parent molecules as well as metabolites,
so stereoisomers were thereby condensed.The multicomponent parent molecules
in the DrugBank database had to be handled (no multicomponent parent
molecules were found in MetXBioDB). The minor component of each salt
was removed (e.g., K+, Ca2+). There was one
multicomponent compound (DrugBank ID: DB09327) in which the main component
could not be automatically determined, so this compound was excluded
from the reference data set. Note that multicomponent metabolites,
on the other hand, are simply separated into their individual components
and each is considered a separate metabolite.Any metabolite that contained only
one heavy atom (six cases consisting of metal ions, SeH2, and a water molecule; DrugBank only) or had the same InChI, ignoring
stereochemistry, as its parent molecule, was excluded.The metabolites were classified as
either phase 1 or phase 2 metabolites, according to the enzyme or
biotransformation type annotation (see subsection below for details).
If a metabolite could not be assigned a phase, the metabolite was
ignored.Parent molecules
with no remaining
valid metabolites after applying the above criteria were removed.The metabolism data corresponding
to all parent molecules that overlap with a manually curated test
data set (described below) were removed from the reference data set.
The removal of the overlap with the test data set affected 15 parent
molecules from DrugBank and 9 from MetXBioDB.The DrugBank and MetXBioDB data were combined in a straightforward
manner using InChIs generated without stereochemistry information
to compare molecules. If the same parent molecule was present in both
DrugBank and MetXBioDB, then the metabolites from both sources were
combined, disregarding stereochemistry, into one set.
Assigning
a Metabolism Phase to Metabolites in the Data Set
To enable
separate evaluation for phase 1 and phase 2 metabolite
prediction, we assigned each metabolite in the reference data set
to a phase based on the relevant information in DrugBank and MetXBioDB.
This allowed the creation of two distinct subsets of the reference
data set. The phase 1 and phase 2 subsets of the reference data set
represent only phase 1 and phase 2 biotransformations, respectively.
If a parent compound has no relevant metabolites for the given phase,
then it was excluded from the corresponding subset of the data set.For the DrugBank data, the metabolites were assigned to a metabolism
phase based on the enzyme annotation of the reaction. Some enzymes
were omitted completely because they are not enzymes typically associated
with human xenobiotic metabolism (e.g., hemoglobin, serum albumin,
and lyases). See Table S1 in the Supporting
Information for a list of all enzymes that were excluded. This criterion
resulted in the exclusion of only 17 metabolites from 11 parent compounds.For the MetXBioDB data, the appropriate phase for each metabolite
was determined based on the “Biotransformation type”
annotation in the database. The reactions annotated “Human
Phase 1” or “Human Phase 2” were classified as
phase 1 or phase 2, respectively.
Manually Curated Test Data
Set
The test data set was
manually assembled from the scientific literature. We wanted to include
all known metabolites of the parent compounds (i.e., all metabolites
which have been experimentally observed and reported in the scientific
literature), so we chose to structure the data as metabolic trees,
including all generations of metabolites that were found in the literature.The selection of parent molecules for the test data set was based
on the top 100 best-selling drugs from 2018.[24,25] For all the small-molecule drugs within these 100 drugs that are
made up of only the atoms H, C, N, S, O, F, Cl, Br, I, and P, we searched
the scientific literature for relevant metabolism information, specifically
the structures of human metabolites and preferably a scheme depicting
the metabolic tree (see below for more detail). For the listed pharmaceutical
products that are a combination of two or more named drugs (e.g.,
Mavyret is composed of glecaprevir and pibrentasvir), a separate literature
search was undertaken for each drug component. For sources of metabolite
information, we considered all scientific journal publications that
could be found online with Google.The basic criteria for inclusion
in the data set were as follows:The metabolites must be clearly indicated
to be found in humans (either in vivo or in vitro using human hepatocytes,
human liver microsomes, or human liver S9 fractions).Structures of metabolites must be
provided. In cases in which a metabolism scheme is not shown, it must
be clear, based on chemistry knowledge, that the depicted metabolites
are not metabolites of each other, that is, that the depicted metabolites
are all first-generation metabolites of the parent drug.Only fully defined metabolite structures
(i.e., the exact position of the added functional group is shown)
are included in the data set. The branches of the metabolic tree are
followed, and the metabolites included and annotated with the corresponding
generation, until a not-fully defined structure is reached. Any further
metabolites derived from such a not-fully defined structure are ignored.
The maximum metabolite generation included in the data set was generation
five, which occurred for only two parent molecules.Intermediates designated as such in
the scheme are not included in the data set.Some metabolites could not be considered
first-generation (based on chemistry knowledge and additional information
from the text of the publication), even though the visual scheme indicated
that their precursors were only intermediates. Such cases were also
removed.Fatty acid
conjugation was not considered.In the case of one prodrug (abiraterone
acetate), we used the drug itself (abiraterone) as the parent molecule
in the data set because it had more (first-generation) metabolites
shown in the scheme.The data set was
assembled by extracting the SMILES for the parent
compounds from the ChEMBL Database[26,27] by looking
up each drug name. These structures were manually verified for correctness
before proceeding. The SMILES for the metabolites were generated with
MarvinSketch[28] by modifying the parent
molecules to create the metabolites and saving them in SMILES format.
Metabolite stereoisomers were combined, resulting in a structure with
unspecified stereochemistry at the relevant stereocenter.
Data Set
Structure
The final data set contains 37 parent
molecules and is provided as a JSON file (see Notes). There are 136 first-generation metabolites in total.The
JSON file is structured to represent the metabolic trees, which include
multiple generations of metabolites, whenever relevant, following
the procedure explained above. For each parent compound, the DOI or
PMID of the reference paper(s) is provided, along with the drug name,
SMILES, and metabolites. For each metabolite, the name it was given
in the publication is provided for reference (this name is often something
like “M1”) along with the metabolism generation number
and the SMILES. Due to the JSON file format, it is always clear for
the second, third, and subsequent generations of metabolites which
first-generation metabolites were their precursor, and so on.No distinction between phase 1 and phase 2 metabolites is made,
and enzyme annotations are not included, as this information was only
rarely provided in the original literature used to assemble the data
set.
Analysis of the Metabolite Data from MetXBioDB
and DrugBank
The data from MetXBioDB and DrugBank were considered
separately.
The data were extracted and preprocessed from each source as described
in the Reference Data Set subsection, except
that the parent molecules that overlap with the test data set were
not removed. For the analysis described here, only the properties
of the parent molecules were considered.Calculations of molecular
weight and log P in order to plot the distributions
were performed using RDKit.[29] The molecular
weight calculated was the average molecular weight including hydrogens.
One molecule in MetXBioDB was not considered a valid molecule by RDKit
(explicit valence greater than permitted) and was therefore excluded
from all analysis described in this section.Principal component
analysis (PCA) was performed with scikit-learn[30,31] using 44 physicochemical descriptors calculated with the Molecular
Operating Environment (MOE).[32] A full list
of the descriptors, as well as a brief description of each, can be
found in Table S2.
Reaction Rules
The metabolic reaction rules used in
GLORYx are encoded as SMIRKS.[33] Three sets
of reaction rules were used: (1) all of SyGMa’s reaction rules,
which include both phase 1 and phase 2 rules; (2) GLORY’s reaction
rules, covering only CYP metabolism; and (3) a newly developed set
of GSH conjugation rules to augment SyGMa’s phase 2 rules,
which are missing reactions of this type. The reaction rules from
GLORY were used unchanged.
Implementation of SyGMa Reaction Rules
Because the
so-called SMIRKS provided in SyGMa’s open-source python package[34] are actually in the format of RDKit’s
reaction SMARTS, it was necessary to convert them to proper SMIRKS
in order to implement them in our software. This conversion was performed
manually, with care being taken to preserve the chemical meaning of
the reaction.In one case, namely that of oxidative deamination,
additional SMIRKS strings were necessary to achieve the same result
with the SMIRKS that SyGMa achieved with its reaction SMARTS. The
reason is that double bonds in an aromatic ring are not automatically
shifted during transformation in GLORYx. We therefore added two additional
SMIRKS in order to explicitly shift the double bonds for 6-rings and
5-rings. Any invalid products generated by the SMIRKS for this reaction
are ignored because the molecule validity checker in GLORYx discards
transformation products with a carbon atom of invalid atom type (in
this case, a valence >4 due to incorrect bond placement).
Development
of Reaction Rules for Glutathione Conjugation
The scientific
literature indicates that glutathione (GSH) conjugation
by the GST enzyme family occurs mainly at the following functional
groups: epoxides, α,β-unsaturated carbonyls, quinones,
nucleophilic substitution (aliphatic and aromatic), isocyanates (and
isothiocyanates), and nitriles.[35−40] The SMIRKS for these cases were developed based on the reaction
descriptions and example reactions present in the referenced literature.
Metabolite Prediction Program GLORYx
GLORYx applies
the reaction rules to all appropriate positions in the molecule, determined
by where each reaction rule SMIRKS matches, if it matches at all.
Within the program, SoMs are predicted with FAME 3,[17] and the predicted SoM probabilities are used to score and
rank the predicted metabolites. The software is written in Java and
uses CDK version 2.0.[41,42]GLORYx performs an initial
preprocessing step for all input molecules to check that the input
molecule can be successfully parsed by CDK, does not have multiple
components, and contains no element types other than C, N, S, O, H,
F, Cl, Br, I, P, B, and Si (FAME 3’s allowed element types;
note that FAME 3 does not make predictions for B and Si due to a lack
of training data, and for this reason the test set was chosen to not
include any molecules with a B or Si atom). If any of these checks
fail, no predictions are made for the input molecule. Further preprocessing
steps that occur within the context of SoM prediction and the application
of the reaction rules are described in the following subsections.
SoM
Prediction
The SoM prediction in GLORYx is performed
using FAME 3.[17] FAME 3 was trained on the
SoM data from the MetaQSAR database[3,43] and offers
three SoM prediction models: the P1 model predicts SoMs corresponding
to phase 1 metabolic reactions, the P2 model predicts SoMs corresponding
to phase 2 metabolic reactions, and the P1 + P2 model predicts SoMs
corresponding to both metabolism phases.The FAME 3 code includes
preprocessing of the input molecules, involving the standardization
of nitro groups, aromaticity detection, and automatic addition of
hydrogens if the hydrogens of the input molecule are not explicitly
specified. Because the SoM prediction step comes before the application
of the reaction rules within the GLORYx program, the standardization
of the molecules described here remains in place for the subsequent
transformation step described below.FAME 3 uses circular descriptors
that incorporate 15 basic 2D CDK
descriptors and circular atom-type fingerprints (see ref (17) for details). During the
development of FAME 3, the effect of the bond depth of the circular
descriptors was examined, and a bond depth of five was chosen as the
default bond depth for the descriptors. GLORYx uses the default models
with a descriptor depth of five.In order to improve GLORYx’s
ability to rank its predictions
of phase 2 metabolites (see Results for details),
we used previously unpublished reaction class-specific individual
phase 2 SoM models from FAME 3. These models were created using the
identical modeling procedure described in the FAME 3 paper (see ref (17)), but with each model
trained on only a subset of the data. The SoM data from the reaction
classes in the MetaQSAR database[43] corresponding
approximately to the five main enzyme families of phase 2 xenobiotic
metabolism (UGTs, GSTs, SULTs, MTs, and NATs) were selected, and a
separate model was created for each subset. The reaction class types
and the number of molecules used to train the models are described
in Table S3. Note that the data from two
classes of glucuronidation reactions in MetaQSAR were combined to
create a single glucuronidation model.For the reaction class-specific
phase 2 SoM models, GLORYx again
uses the models with a descriptor depth of five, for consistency.
Transformation of Molecules According to Reaction Rules
The reaction rules were applied using Ambit-SMIRKS.[44,45] As for GLORY, any product containing fewer than three heavy atoms
is not included in the set of predicted metabolites.In order
to apply the reaction rules correctly, that is, to achieve the same
predicted metabolites as SyGMa while using the same rules, it was
necessary to use an aromaticity model that could recognize aromaticity
in rings with exocyclic heteroatoms. To achieve this, we chose an
aromaticity model in CDK that uses the ElectronDonation.daylight() electron donation model. In order to allow for better ring recognition
in molecules with more than three rings, we set the cycles portion
of the aromaticity model to Cycles.or(Cycles.all(), Cycles.relevant()), indicating that all cycles would be used whenever possible and
the “relevant” cycles would be used in cases in which
the molecule contained too many cycles for all of them to be considered.
This new aromaticity model is applied directly before reaction rule-based
transformation using all reaction rules, not just the rules sourced
from SyGMa, and does not affect the aromaticity recognition used for
SoM prediction with FAME 3.There is one noticeable remaining
discrepancy related to aromaticity,
as determined by comparison on the reference data set, between GLORYx’s
predictions and SyGMa’s predictions for the “same”
reaction rules. Tetrazoles appear to be recognized as aromatic by
GLORYx but not by SyGMa, as indicated by an aromatic glucuronidation
reaction being successfully applied by GLORYx but not by SyGMa. This
discrepancy affects only three parent molecules in the phase 2 subset
of the reference data set.
Scoring
Each predicted metabolite
is assigned a priority
score in order to rank the predictions. The priority score has two
components. The first component is the predicted SoM probability from
the FAME 3 model used to make the prediction. The maximum SoM
probability among the atoms in the mapping of the reaction rule’s
SMIRKs onto the parent molecule is used.The second component
is a reaction rule weighting factor based on a simple designation
of either “common” or “uncommon” for each
reaction type. This designation, which we previously introduced for
CYP reactions in ref (15), was based primarily on a detailed review of CYP-mediated reaction
types that described both common and uncommon types of reactions.[46] In this work, we use the common–uncommon
designation more loosely, as a simple differentiation in reaction
type prioritization that allows a binary weighting of the reaction
rules. A weighting factor corresponding to the common–uncommon
classification is multiplied with the maximum SoM probability mentioned
above in order to calculate the priority score for the predicted metabolite.
In GLORYx, a weighting factor of 1 is used for reaction rules designated
“common”, and a weighting factor of 0.2 is used for
reaction rules designated “uncommon”. These weighting
factors thereby maintain the same ratio of 5:1 as described previously
in ref (15) but are
scaled such that the final priority score more reflects a probability-like
concept, with values ranging from 0 to 1.The final priority
score of a predicted metabolite is thereby the
product of the maximum SoM probability and the weighting factor corresponding
to the priority level, common or uncommon, of the reaction type.The final assignment of a priority level to the reaction rules
was determined rationally. The priority levels of the CYP metabolism-based
rules from GLORY were not changed. All of the phase 1 rules from SyGMa
were designated uncommon, which does not affect the higher priority
given to common CYP-mediated reaction types in the case of duplicate
reaction types in the SyGMa and GLORY rule sets. The phase 2 rules
corresponding to the five main phase 2 enzyme families were designated
common, while the others (glycination, phosphorylation, and dephosphorylation)
were designated uncommon.
Validation
Predicted
metabolites were compared to the
known metabolites from either the reference data set or the test data
set using InChIs that were generated without stereochemistry information.
Special
Consideration for CYP Reactions
Spontaneous
oxidation from an aldehyde to a carboxylic acid was considered during
the evaluation process, as in GLORY (see ref (15)), but only for predicted
metabolites that were the product of a phase 1 reaction rule. It was
intended that this consideration only apply to CYP reactions, but
SyGMa’s phase 1 reaction rules do not distinguish between CYP
and non-CYP, so this step was applied to all phase 1 products. Note
that this applies only to the validation and does not affect the predicted
metabolites that are provided to the users of GLORYx.
Comparison
to SyGMa
The comparison of GLORYx to SyGMa
was performed using SyGMa[34] with RDKit.[29] One change to the standard usage of SyGMa was
required, in the case where both phase 1 and phase 2 metabolite predictions
were desired. When SyGMa is run with a single metabolism Scenario
object specifying both phase 1 and phase 2, the rule sets for the
phases are applied sequentially, that is, the first rule set listed
(phase 1) is applied first, and then the second rule set (phase 2)
is applied to the parent compound as well as the products of the first
rule set. This behavior corresponds to a different research question
than the one posed in our evaluation, so SyGMa was instead run twice
for each molecule in the test set, once using only the phase 1 rules
and then separately using only the phase 2 rules. The predictions
from both runs were combined.In addition, any predicted metabolite
with the same InChI as the parent compound was ignored, and, for the
sake of comparison, a filter to remove all predicted metabolites with
fewer than three heavy atoms was added (SyGMa’s built-in percentage-based
size filter was turned off). Implicit hydrogens were also added to
SyGMa’s output SMILES before generating the InChIs for comparison
with GLORYx’s predictions.
Results
The concept
of GLORYx is that SoMs, or rather the probability of
each heavy atom being a SoM, are predicted with FAME 3, and, building
on these predictions, a set of reaction rules is applied in order
to generate the structures of predicted metabolites for both phase
1 and phase 2 metabolism. We have previously determined, for our earlier
CYP-focused metabolite prediction tool GLORY, that using the predicted
SoM probabilities as a hard cutoff to determine whether or not to
apply a reaction rule at a given position is not a particularly effective
approach, except if the goal is to simply reduce the number of predictions.[15] Instead, we found that using the predicted SoM
probabilities to score and rank the predicted metabolites enabled
a reasonable ranking of the predicted metabolites while retaining
a high recall of known metabolites. Therefore, we again use the predicted
SoM probabilities to rank the metabolites predicted by GLORYx. For
GLORYx, we also have the capability of using a different FAME 3SoM
prediction model depending on which phase of metabolism is being predicted.GLORYx was developed and analyzed using a large reference data
set containing metabolism data from DrugBank and MetXBioDB. This reference
data set was used to examine phase 1 and phase 2 metabolism separately
to make sure each phase could be handled satisfactorily on its own
as well as to determine how to best combine predictions for both phases.
The final validation of GLORYx was subsequently performed on a manually
curated test data set.
Analysis of the Approach Using a Large Reference
Data Set
A reference data set for the development of the
GLORYx method was
created by combining the freely available metabolism data from DrugBank
and MetXBioDB (see Methods for details). Considering
both phase 1 and phase 2 metabolism, and using the data preparation
process described in Methods, we collected
metabolite data for 560 parent molecules from DrugBank and 1188 parent
molecules from MetXBioDB. Of these parent molecules, 310 are identical,
not considering stereochemistry, meaning there are 1438 parent molecules
total from both sources combined. The metabolites for the overlapping
parent molecules were consolidated when forming the reference data
set. Within this overlap, 555 of 868 metabolites were present in both
data sets. Of the rest, 135 were from DrugBank and 178 from MetXBioDB.It is relevant to mention here that DrugBank does not contain species
annotations for the metabolism data, while MetXBioDB specifies “Human
Phase 1” and “Human Phase 2” metabolic reactions.
Neither data source includes annotations regarding whether any given
metabolite data were collected in an in vivo or an in vitro study.Beyond noting the amount of overlap between the two data sources,
we wanted to examine the chemical space covered by each, in terms
of the parent molecules. To the best of our knowledge, such an analysis
has not yet been done for MetXBioDB. For DrugBank, an analysis focused
specifically on the compounds for which there is metabolite data has
also not yet been undertaken. When performing this analysis, we retained
the overlapping parent molecules in both data sets.In terms
of molecule size, we observe a narrower distribution among
the parent molecules of MetXBioDB than among those of DrugBank, as
seen in Figure A for
molecular weight. In addition, we noted a shift in the distributions,
whereby DrugBank has a median molecular weight of 322 while MetXBioDB
has a median of only 282. The mean values are not compared due to
the presence of an outlier with a molecular weight of 4114 Da (semaglutide)
in the DrugBank data. For calculated log P (clog P) values as well, a narrower distribution is observed for
MetXBioDB (Figure B). However, for clog P the median values of the
two distributions are very similar, at 3.04 for DrugBank and 3.05
for MetXBioDB.
Figure 1
Comparison of the metabolite data from MetXBioDB and DrugBank,
in terms of parent molecules. (A) Distribution of molecular weight.
(B) Distribution of clog P. (C) Histogram of the
number of metabolites per parent molecule in terms of percentage of
parent molecules. (D) Comparison of the chemical space of the parent
molecules from MetXBioDB and DrugBank using PCA calculated using 44
physicochemical descriptors. The percentage of the total variance
explained by each of the first two principal components is included
in the axis labels.
Comparison of the metabolite data from MetXBioDB and DrugBank,
in terms of parent molecules. (A) Distribution of molecular weight.
(B) Distribution of clog P. (C) Histogram of the
number of metabolites per parent molecule in terms of percentage of
parent molecules. (D) Comparison of the chemical space of the parent
molecules from MetXBioDB and DrugBank using PCA calculated using 44
physicochemical descriptors. The percentage of the total variance
explained by each of the first two principal components is included
in the axis labels.In the context of metabolite
prediction, it is especially interesting
to compare the ratio of parent molecules and metabolites recorded
in a data set as this ratio can give an indication of the comprehensiveness
of the metabolism data (metabolism data are generally incomplete;
more metabolites are typically known for compounds of high relevance,
in particular approved drugs). In the case of the DrugBank and MetXBioDB
data, the distributions of the number of metabolites per parent molecule
are quite similar (Figure C). In both cases, the majority of parent molecules have only
one known metabolite. At the same time, over 40% of the parent molecules
from each data source have multiple metabolites.Finally, to
achieve a visual comparison that takes into account
multiple physicochemical properties of the parent molecules, we performed
principal component analysis (PCA) on each set of parent molecules
using 44 physicochemical descriptors (Figure D; see Methods for
details). From the PCA we see that there is a large amount of overlap
between the two data sets, which is unsurprising given that most of
the molecules in the DrugBank data set are also included in the MetXBioDB
data set. However, we also see that there are portions of the chemical
space populated by parent molecules from DrugBank but not from MetXBioDB,
which is consistent with the results from the comparison of the distributions
of molecular weight and clog P. Inspection of the
PCA loading plot (Figure S1) shows that
molecule size and polarity seem to play a large role in the variance
in the PCA plot. In particular, molecule size seems to influence the
first principal component, while polarity seems to influence the second
principal component. Interestingly, the five data points (two from
DrugBank, three from MetXBioDB) in the far right portion of the PCA
plot correspond to the five largest molecules included in the calculation,
all of which have a molecular weight between 1000 and 1300 Da (the
outlier with a molecular weight of over 4000 Da was not included in
the PCA). These five molecules consist of five macrocyclic peptides
(including cyclosporine) and one nonmacrocyclic peptide (angiotensin II).Whereas
the above chemical space analysis included all valid metabolite
data from DrugBank and MetXBioDB, a further data preprocessing step
was performed for the formation of the final reference data set used
for the evaluation of the metabolite structure prediction approach.
All metabolism data corresponding to parent molecules contained in
the test set were removed from the reference data set. This removal
resulted in a final reference data set containing 1420 parent molecules
and a total of 2453 metabolites.The reference data set was
further separated into two subsets,
corresponding to phase 1 and phase 2 metabolism. The phase 1 subset
contains 944 parent molecules and 1763 metabolites, and the phase
2 subset contains 582 parent molecules and their 690 metabolites (Table ). Most of the phase
1 metabolites are CYP metabolites, and most of the phase 2 metabolites
are UGT metabolites (Table ). Note that some of the phase 2 metabolites do not correspond
to any of the listed enzyme families, just as some of the phase 1
metabolites are not formed by CYPs.
Table 1
Composition of the
Reference Data
Seta in Terms of Metabolism Phase and Enzyme
Family
number of metabolitesb
number of
parent molecules
phase 1, all
1763
944
CYP
1640
–
phase 2, all
690
582
UGT
480
–
SULT
92
–
GST
46
–
NAT
34
–
MT
17
–
phase 1 + phase 2
2453
1420
The reference data set was created
by combining the DrugBank and MetXBioDB metabolism data and removing
the data for all parent molecules contained in the test set.
Note that the total numbers of phase
1 and phase 2 metabolites do not equal the sum of the metabolites
from the listed enzyme families, because not all metabolites in the
data set correspond to these main enzyme families.
The reference data set was created
by combining the DrugBank and MetXBioDB metabolism data and removing
the data for all parent molecules contained in the test set.Note that the total numbers of phase
1 and phase 2 metabolites do not equal the sum of the metabolites
from the listed enzyme families, because not all metabolites in the
data set correspond to these main enzyme families.The two separate subsets of the
reference data set were used to
analyze the performance of GLORYx for phase 1 and phase 2 individually,
because there are slightly different considerations for each metabolism
phase. In addition, the entire reference data set was used to analyze
the combined prediction of both phase 1 and phase 2 metabolites.Note that GLORYx is unable to process two parent molecules in the
phase 1 subset of the reference data set and one parent molecule in
the phase 2 subset. Both of the phase 1 parent molecules contain a
Se atom, which FAME 3 cannot handle (partial charges cannot be calculated;
see Methods for a list of allowed element
types). Because no SoM predictions can be made, no metabolites are
predicted. The parent molecule in the phase 2 subset is unable to
be processed because it contains a nitrogen atom with a state that
FAME 3 does not recognize. This is the case regardless of which FAME
3 model is used.
Phase 1 Metabolism
The fundamental
concept of our approach
to predicting metabolites is to integrate machine learning-based SoM
prediction in order to score the predicted metabolites. Therefore,
the first thing we wanted to know is how GLORYx’s SoM probability-based
scoring approach compares to the scoring approach used by the state-of-the-art,
open source, comprehensive metabolite prediction tool SyGMa.To compare the scoring approaches, GLORYx was initially implemented
using only the phase 1 reaction rules sourced from SyGMa. The phase
1-specific FAME 3SoM prediction model (model P1) was used to predict
SoMs. The predicted metabolites were scored using the maximum SoM
probability predicted among all heavy atoms in the mapping onto the
parent molecule of the reaction rule that led to the particular predicted
metabolite. In this case, the score was therefore equal to this SoM
probability; no weighting based on reaction type was used. SyGMa,
on the other hand, ranks its predictions based on probability scores
that are calculated using the occurrence ratios of each reaction rule
in the Metabolite database. Each of SyGMa’s predicted metabolites
is assigned a probability score corresponding to the reaction rule
that formed the predicted metabolite.Given the same reaction
rules, SyGMa with its reaction probability
score-based ranking performed slightly better than our SoM probability-based
ranking, with an AUC of 0.76 compared to 0.73, respectively, as shown
in Figure A. This
result is reasonable if we suppose that the Metabolite database, which
was used to calculate the occurrence ratios for SyGMa’s reaction
types, was so exhaustive even in its 2001 version (the version used
to develop SyGMa) that it contained most of the contents of the current
versions of DrugBank and MetXBioDB. This supposition is consistent
with the observation in 2013 by Kirchmair et al. that nearly all of
the Approved Drugs in DrugBank at the time (1341 out of 1391) were
found in the 2011 version of the Metabolite database as top-level
substrates.[47] Unfortunately, without access
to the Metabolite database, which is currently unavailable, we are
unable to perform a comparison ourselves. Nevertheless, it appears
that GLORYx achieves a comparable ranking performance to SyGMa.
Figure 2
Rank-based
ROC curves for the evaluation of metabolite prediction
performance on the reference data set. The ranks are calculated based
on the priority scores of the predicted metabolites for each parent
molecule. (A) Comparison of GLORYx, which scores its predicted metabolites
based on predicted SoM probabilities, to SyGMa, which uses reaction
probability scores, for phase 1 metabolite prediction. Weighted rules
refer to the weighting of the SoM probability-based score based on
whether the reaction type is designated common or uncommon. (B) Comparison
of the ranking performance of GLORYx with different scoring approaches
and rule sets as well as direct comparison to SyGMa’s performance,
for phase 2 metabolite prediction. The scoring approach that is based
on both SoM probability and reaction probability is achieved by a
simple multiplication of the two components. (C) Comparison of the
ranking performance of GLORYx for combined prediction of metabolites
for phases 1 and 2 metabolism, using different SoM prediction approaches
to score the predicted metabolites. In both cases, the score is based
on predicted SoM probability with weighting according to reaction
type, and the rule set is made up of the final phase 1 rule set (SyGMa
and GLORY rules) and final phase 2 rule set (SyGMa and GSH conjugation
rules).
Rank-based
ROC curves for the evaluation of metabolite prediction
performance on the reference data set. The ranks are calculated based
on the priority scores of the predicted metabolites for each parent
molecule. (A) Comparison of GLORYx, which scores its predicted metabolites
based on predicted SoM probabilities, to SyGMa, which uses reaction
probability scores, for phase 1 metabolite prediction. Weighted rules
refer to the weighting of the SoM probability-based score based on
whether the reaction type is designated common or uncommon. (B) Comparison
of the ranking performance of GLORYx with different scoring approaches
and rule sets as well as direct comparison to SyGMa’s performance,
for phase 2 metabolite prediction. The scoring approach that is based
on both SoM probability and reaction probability is achieved by a
simple multiplication of the two components. (C) Comparison of the
ranking performance of GLORYx for combined prediction of metabolites
for phases 1 and 2 metabolism, using different SoM prediction approaches
to score the predicted metabolites. In both cases, the score is based
on predicted SoM probability with weighting according to reaction
type, and the rule set is made up of the final phase 1 rule set (SyGMa
and GLORY rules) and final phase 2 rule set (SyGMa and GSH conjugation
rules).
Phase 1 Metabolism: Combination of Reaction
Rules from SyGMa
and GLORY
For GLORY, we had developed a set of reaction rules
specific to CYP-mediated metabolism.[15] These
reaction rules were manually created based on the scientific literature
on CYP-mediated reaction types and mechanisms, and each reaction rule
received a designation of either common or uncommon reaction type,
also according to the literature. SyGMa’s phase 1 reaction
rules are not separated into CYP and non-CYP rules, so it was of interest
to determine whether adding these CYP-specific rules to the phase
1 rules sourced from SyGMa would result in any gains in performance
for GLORYx.When combining the rule sets, the overlap of the
rules from the two different sources is handled in a straightforward
manner. Duplicate metabolite predictions are combined by retaining
the highest priority score. The addition of the CYP reaction rules
from GLORY resulted in a substantial jump in recall (portion of known
metabolites that were successfully predicted, also known as sensitivity)
from 0.72 to 0.84 (Table ). The precision (portion of predictions that match known
metabolites), on the other hand, was halved, as the number of total
predicted metabolites more than doubled, from over 10,000 to nearly
25,000. Note that only a fraction of the metabolites generated by
organisms is experimentally observed and reported in the scientific
literature and databases, for a number of reasons (e.g., lack of chemical
stability, low concentrations of metabolites, limitations of the in
vitro system, research interest focused on a specific metabolic enzyme
or reaction or metabolite). Therefore, any predicted metabolites that
are not “known” should more correctly be considered
as putative false positive predictions. Nevertheless, the number of
predicted metabolites is enormous, so it is crucial that metabolite
prediction methods are able to rank their predictions in a meaningful
way.
Table 2
Performance of GLORYx on Predicting
Phase 1 Metabolites
GLORYx using
reaction rules from both SyGMa and GLORY
GLORYx using
reaction rules from SyGMa only
recall
0.84
0.72
precision
0.060
0.12
total number of predictions
24,906
10,550
number of true
positive
predictions
1487
1262
AUC (rank-based)
0.80
0.73
To examine the ranking performance of GLORYx using
the combined
rule set, we first used only the SoM probability to score and rank
the predicted metabolites, as described above. This nonweighted scoring
approach resulted in an AUC of 0.75 (Figure A), which was close to SyGMa’s AUC
of 0.76. Note that even though the sets of predicted metabolites are
different in this case, the ranking ability of each approach can still
be compared using the ROC curves and AUC. We then applied the concept
of weighting reaction rules that we first developed for GLORY, namely
applying a simple common vs uncommon distinction between reaction
types and generating the priority score for a predicted metabolite
by multiplying the SoM probability by a factor corresponding to whether
or not the reaction that led to that particular predicted metabolite
was designated common (see ref (15) for details). The common–uncommon designations of
the reaction rules from GLORY were used unchanged. Then we simply
designated all of SyGMa’s phase 1 reaction rules as uncommon,
based on the following logic: The CYP enzyme family is the most prevalent
enzyme family involved in phase 1 metabolism,[4] SyGMa’s phase 1 reaction rules contain rules for both CYP-
and non-CYP-mediated reactions, and our process of combining duplicate
predictions by keeping the highest score ensures that any CYP rules
from SyGMa that are also “common” rules from GLORY will
be in effect scored appropriately as being “common”.
The result of this weighting of the rule sets was a jump in AUC to
0.80 (Figure A).A similar trend in AUCs for GLORYx in terms of the weighting approach
is observed when the ROC curves are calculated based on score rather
than rank (Figure S2A). This means that
predicted metabolites are compared across different parent molecules
in the reference data set in terms of their priority scores. Here,
it is important to note that the original publication of SyGMa implied
that its score was only intended to be used to compare likelihoods
of predicted metabolites of the same parent molecule, and the evaluation
in that publication only considered the ranking per parent molecule.[10] This consideration should be kept in mind when
viewing all score-based ROC curves for SyGMa throughout this manuscript,
which are included for the sake of completeness, especially since
the score-based ROC curves for GLORYx tend to yield a higher AUC than
the rank-based curves, yet the opposite is true for SyGMa (Figure S2 and Figure ).It is also relevant to note that
the phase 1 subset of our reference
data set is heavily biased toward CYP-mediated metabolism, with over
90% of the metabolites in the data set being CYP metabolites (Table ). Although CYPs are
widely considered the most relevant enzyme family for phase 1 human
xenobiotic metabolism, the available data are perhaps even more skewed
toward CYP data than would be realistic in humans. Due to the composition
of this phase 1 reference data set, it is reasonable that the addition
of the CYP-specific rules from GLORY leads to improved performance.
Phase 2 Metabolism
For phase 2 metabolite structure
prediction, we again examined the question of how scoring the predicted
metabolites based on the SoM probability predicted by FAME 3 compares
to SyGMa’s scoring approach. Similarly to the phase 1 protocol,
the initial comparison was carried out using only the phase 2 reaction
rules from SyGMa, along with the general phase 2 SoM prediction model
from FAME 3 (model P2), and scoring the predicted metabolites using
only the SoM probability predicted by the SoM model. This comparison
showed a large difference in ranking performance between SyGMa and
our approach (Figure B). SyGMa achieved an AUC of 0.85, while our approach, which used
the SoM probabilities predicted by the FAME 3P2 model to rank the
predicted metabolites, achieved an AUC of only 0.67.It therefore
appears that SoM probabilities are a surprisingly poor indication
of the likelihood of phase 2 metabolism occurring. We know, however,
that FAME 3 predicts SoMs corresponding to phase 2 metabolic reactions
very well (AUC of 0.97 on a holdout data set consisting of 157 randomly
selected compounds with a total of 3476 annotated atoms).[17] The reason for this discrepancy is that multiple
predicted metabolites, corresponding to different reaction types,
receive the same score because they correspond to the same predicted
SoM. Phase 2 metabolic reactions are more specific in terms of functional
groups at which they can occur than, for example, CYP-mediated reactions,
which makes it easier to predict SoMs but more difficult to predict
which reaction type would be more likely to actually occur at a given
location. To illustrate this point, consider the case of a hydroxyl
group. A hydroxyl group that is a phase 2 SoM could be glucuronidated,
sulfated, methylated, or phosphorylated. Another difficult case would
be an amine group, which, if it is a phase 2 SoM, could be glucuronidated
or N-acetylated. These observations combined with the poor ranking
performance indicate that, so far, GLORYx struggles to discriminate
between phase 2 reaction types.In light of this observation
and to further investigate the relationship
between the predictive capabilities of SoM probabilities and reaction
probabilities, we attempted to combine the two scores, since in theory
both the SoM and the likelihood of a particular reaction rule compared
to other reaction rules that could be applied at a given location
are both relevant to the likelihood of the predicted metabolite. We
tried two combination approaches: multiplying the reaction probability
with the SoM probability and calculating a weighted average. Despite
trying various weights (Table S4), a combination
score was unable to do better than SyGMa’s reaction probability-based
scoring approach alone at ranking the predictions. In addition, by
varying the weights, it became clear that the more highly the predicted
SoM probability was weighted compared to the reaction probability,
the worse the ranking performance was (Table S4). The weighted average score combination, using weights up to 10:1,
achieved a maximum rank-based AUC of 0.83 (Table S4), whereas multiplying the SoM probability by the reaction
probability resulted in a rank-based AUC of 0.85 (Table S4, Figure B), which is the same as for SyGMa’s reaction probability
score alone (however, the shape of the ROC curve is slightly different).
These results indicated that the SoM probabilities predicted in this
way could not compete with SyGMa’s reaction probability scores
when it comes to ranking performance.SyGMa’s good ranking
performance was to be expected, for
the same reasons discussed in the above section on phase 1 metabolism
regarding the use of the Metabolite database to develop SyGMa’s
reaction probability scores. Meanwhile, the poor showing by the SoM
probability scoring approach indicates that reactivity is not sufficient
to discriminate between the different types of phase 2 reactions,
especially not when compared to the data-derived likelihoods of each
reaction type. We therefore examined how we could use SoM prediction
to achieve a distinction between different reaction types without
resorting to precomputed occurrence ratios for the reaction rules.
Phase
2 Metabolism: Reaction Type-Specific SoM Prediction Models
In order to attempt to better predict which reaction type would
be more likely at a given SoM, without using SyGMa’s reaction
probabilities, we developed FAME 3 reaction type-specific SoM prediction
models that roughly correspond to the five main phase 2 enzyme families:
UGTs, GSTs, SULTs, MTs, and NATs. These models were created using
the same training protocol as for the previously published FAME 3
models. Each model was trained on only a subset of the FAME 3 data
set, whereby the subsets were selected based on the reaction class
annotation in the MetaQSAR database. The reaction classes and the
number of molecules used to train each model are provided in Table S3. The 10-fold cross-validation performance
of these models was high across the board, with average AUCs all above
0.95 and the average percentage of molecules in which a correct SoM
was predicted among the top two atom positions with the highest SoM
probabilities (top 2 metric) all above 0.87 (Table ), despite the relatively small number of
molecules used for training in each case. Note, however, that these
models are trained on atoms, not molecules, so the number of training
instances (although not entirely independent from each other) is much
larger than the number of molecules.
Table 3
Average
SoM Prediction Performance
of the FAME 3 Reaction Class-Specific Models During Cross-Validation
reaction
class
average top
2
average
AUC
glucuronidations and glycosylations
0.957
0.988
GSH and RSHa conjugations
0.874
0.950
sulfonations
0.966
0.992
methylations
0.877
0.968
acetylations and acylations
0.956
0.992
RSH = protein thiol.
RSH = protein thiol.Because these reaction type-specific
SoM models were each trained
on only a subset of the molecules that were used to train the general
phase 2 SoM model, not all atom types (i.e., Sybyl atom types) were
represented in the training data for each individual model, which
can then not make predictions for molecules containing these unrepresented
atom types. Therefore, these individual reaction type-specific SoM
models were used to overrule the predicted SoM probabilities from
the general P2 model for the molecules to which they apply rather
than as a complete substitute for the general model.There are
a few phase 2 reaction rules in SyGMa that do not correspond
to any of the five main phase 2 enzyme families. These rules are simply
designated “uncommon”, while all other phase 2 reaction
rules are designated “common”, and the general P2SoM
prediction model is always used to score the products of these uncommon
reaction rules.The general P2 model is also used to score the
predicted metabolites
corresponding to the individual reaction type-specific SoM models
that can not make predictions for a given input molecule. For example,
if the SoM model for sulfonation reactions could not make predictions,
the predicted metabolites resulting from sulfonation reaction rules
are scored using the predicted SoM probabilities from the general
P2 model. An illustration of the workflow for predicting phase 2 metabolites
using the reaction type-specific SoM models for scoring is shown in Figure .
Figure 3
Workflow of phase 2 metabolite
prediction using reaction type-specific
SoM models to score and rank the predicted metabolites. The reaction
type-specific SoM models (“UGT”, “GST”,
“SULT”, “NAT”, “MT”) are
used instead of the general phase 2 SoM model (P2) to score the products
of the relevant reactions for all molecules in which all of the reaction
type-specific models are able to make a prediction. The green arrows
indicate the molecules that were predicted successfully by the relevant
reaction type-specific SoM model. If one or more of the reaction type-specific
models cannot make predictions for a given molecule, then that molecule
additionally follows the path of the black arrows, followed by a deduplication
of predictions. The “UGT” model covers glucuronidation
and glycosylation reactions, the “GST” model covers
GSH and RSH conjugations, the “SULT” model covers sulfonations,
the “NAT” model covers acetylations and acylations,
and the “MT” model covers methylation reactions. The
“other phase 2 rules” refer to the rules that are neither
glucuronidation, GSH conjugation, sulfonation, acetylation, or methylation
rules.
Workflow of phase 2 metabolite
prediction using reaction type-specific
SoM models to score and rank the predicted metabolites. The reaction
type-specific SoM models (“UGT”, “GST”,
“SULT”, “NAT”, “MT”) are
used instead of the general phase 2 SoM model (P2) to score the products
of the relevant reactions for all molecules in which all of the reaction
type-specific models are able to make a prediction. The green arrows
indicate the molecules that were predicted successfully by the relevant
reaction type-specific SoM model. If one or more of the reaction type-specific
models cannot make predictions for a given molecule, then that molecule
additionally follows the path of the black arrows, followed by a deduplication
of predictions. The “UGT” model covers glucuronidation
and glycosylation reactions, the “GST” model covers
GSH and RSH conjugations, the “SULT” model covers sulfonations,
the “NAT” model covers acetylations and acylations,
and the “MT” model covers methylation reactions. The
“other phase 2 rules” refer to the rules that are neither
glucuronidation, GSH conjugation, sulfonation, acetylation, or methylation
rules.In this way, the same metabolites
are predicted as if only the
general P2 model was used, but the reaction type-specific scoring
approach results in different ranks of the metabolites and, perhaps
most importantly, a drastic reduction in the number of tied ranks
for predicted metabolites of a single parent molecule.Using
the individual reaction type-specific phase 2 SoM models
to score the predicted metabolites resulted in a large improvement
in the ranking, with an AUC of 0.77 compared to an AUC of 0.67 using
the general P2 model (Figure B) and only the reaction rules sourced from SyGMa for comparison.
Similarly, the score-based AUC increased from 0.66 to 0.79 upon implementation
of the reaction type-specific SoM models (Figure S2). Unfortunately, even using the reaction type-specific SoM
prediction models resulted in a ranking performance that was worse
than SyGMa’s (AUC of 0.77 compared to 0.85, respectively).
However, as discussed for phase 1 metabolism above, SyGMa’s
approach has the advantage of having derived its scoring approach
directly based on, in effect, all available metabolism data from a
comprehensive but not freely available database. Meanwhile, the difficulty
of using SoM prediction for phase 2 metabolism appears to be that
there are relatively few potential SoMs, but that the atom environments
may not be specific enough to differentiate between different types
of reactions.Based on these results, we therefore use the individual
reaction
type-specific phase 2 SoM models to score the phase 2 metabolites
predicted in all subsequent sections of this manuscript.
Phase 2
Metabolism: Addition of GSH Conjugation Reaction Rules
We
found that the reaction rules sourced from SyGMa do not contain
any GSH conjugation reactions, which correspond to the GST enzyme
family, one of the five main enzyme families for phase 2 xenobiotic
metabolism. We therefore developed a set of GSH conjugation reaction
rules based on descriptions of GSH conjugation metabolic reactions
in the scientific literature. This resulted in nine new reaction rules.When only the reaction rules sourced from SyGMa were implemented,
GLORYx achieved a recall of 0.78 and a precision of 0.21 (Table ). When the new GSH
conjugation reaction rules were added, GLORYx achieved a recall of
0.80 with the same precision, because only 141 more metabolites were
predicted in total (Table ). Though we do not see a very large improvement in performance
on the reference data set after adding these GSH conjugation reaction
rules to GLORYx, we believe that this addition is actually meaningful
for the purpose of metabolite structure prediction in the real world,
because GSTs are actually the second most relevant phase 2 enzyme
family for xenobiotic metabolism in terms of number of metabolites
formed.[4]
Table 4
Performance of GLORYx
on Predicting
Phase 2 Metabolites
GLORYx using
SyGMa rules only
GLORYx using
SyGMa rules plus GSH conjugation rules
recall
0.78
0.80
precision
0.21
0.21
total number of predictions
2509
2650
number of true
positive
predictions
539
555
AUC (rank-based)
0.77
0.78
As was expected based on the relatively low number
of GST-mediated
metabolites in the reference data set (Table ), the ranking performance remained similar
upon the addition of GSH conjugation reaction rules (AUC of 0.77 and
0.78, respectively; Figure B). This comparable performance seems to suggest that the
products of the new GSH conjugation reaction rules are scored in a
meaningful way based on the corresponding reaction type-specific SoM
prediction model.
Combined Phase 1 and Phase 2
A general
use case of
predicting “all” possible metabolites at once was also
considered. FAME 3 provides one model, “P1 + P2”, that
predicts all SoMs from both phases of metabolism. For this use case,
we therefore examined whether it makes sense to use the P1 + P2 FAME
3 model’s predictions to score the predicted metabolites or
to use the separate models, as determined separately for phase 1 and
phase 2 (see sections Phase 1 Metabolism and Phase 2 Metabolism), and combine
the predictions. The predicted metabolite structures are the same
in both cases; what changes is their scores, since those are based
on the predicted SoM probabilities.Using separate SoM prediction
models for the two phases did provide a slight advantage in terms
of the ranking performance, with an improvement in AUC from 0.78 to
0.80 compared to using the P1 + P2SoM model, as shown in Figure C. An improvement
of the same amount is seen in the AUCs of the score-based ROC curves
(AUC increased from 0.79 to 0.81; Figure S2C). Although this advantage appears small at first glance, it is important
to recall the composition of the reference data set. This data set
contains more than twice as much phase 1 data as phase 2 data, in
terms of number of known metabolites, which may cause the benefit
of using separate SoM prediction models for the two phases to be underrepresented
by this analysis. Based on these considerations along with the ROC
curves, we conclude that the multimodel approach should be used for
optimal performance, and we use this approach in the validation on
the test data set (see section Performance on a
Manually Curated Test Data Set).
Performance on a Manually
Curated Test Data Set
The
performance of the final version of GLORYx was evaluated on a manually
curated test data set consisting of 37 parent molecules that were
among the top 100 best-selling drugs in 2018. For these parent molecules,
the data set contains a total of 136 first-generation metabolites,
which equates to an average of 3.7 known metabolites per parent molecule.
This test data set does not contain enzyme or metabolism phase annotations,
so the evaluation was carried out from the perspective of predicting
all possible metabolites, from both phase 1 and phase 2 metabolism.GLORYx was able to predict 77% of the known metabolites in the
test data set, which is higher than SyGMa’s recall of 68% (Table ). In conjunction
with this higher recall, GLORYx had a lower precision than SyGMa (0.061
compared to 0.12, respectively), which is unsurprising given that
GLORYx contains many more reaction rules than SyGMa due to the addition
of the CYP metabolism rules from GLORY and the new GSH conjugation
rules. The total number of metabolites predicted by GLORYx was nearly
double the number predicted by SyGMa. However, SyGMa’s precision
of 0.12 was also very low, due to a relatively large number of predictions
(800 total). Another potential contribution to the low precision of
both tools is that experimentally determined metabolites whose structures
have not been fully defined were not included in the test data set.
It is possible that this aspect of the data set has an effect on the
number of false positive predictions, which would have an effect on
the precision as well.
Table 5
Performance of GLORYx
and SyGMa on
the Test Data Set of 37 Parent Compounds and Their 136 Metabolites
GLORYx
SyGMa
recall
0.77
0.68
precision
0.061
0.12
total number of predictions
1724
800
number of true positives
(out of 136)
105
93
AUC (rank-based)
0.79
0.74
The relatively large number of predictions made by
both SyGMa and
GLORYx is a general problem that is shared by all available metabolite
structure prediction approaches.[48] This
phenomenon clearly underlines the need to have a meaningful way to
rank the predicted metabolites. In our case in particular, neither
SyGMa nor GLORYx has sufficiently high precision to be used without
ranking the predicted metabolites.In terms of the ability to
rank the predicted metabolites, GLORYx
showed better performance than SyGMa, as indicated by the ROC curves
shown in Figure .
The AUC of the rank-based ROC curve was 0.79, compared to 0.74 for
SyGMa. In addition, GLORYx’s priority score seems to be a meaningful
score in and of itself, not just for ranking the predictions for individual
parent molecules separately, because the ROC curve and AUC were actually
slightly better using the score than they were using the rank (0.81
compared to 0.79, respectively; Figure ). For SyGMa, the AUC of the score-based approach was
also higher than that of the rank-based approach (0.77 compared to
0.74); however, it is important to note that SyGMa’s score
was most likely not intended to be used to compare predicted metabolites
from different parent compounds (see section Phase 1 Metabolism: Combination of
Reaction Rules from SyGMa and GLORY).
Figure 4
ROC curves for GLORYx
and SyGMa representing ranking performance
on the test set based on the (A) ranks and (B) scores of the predicted
metabolites.
ROC curves for GLORYx
and SyGMa representing ranking performance
on the test set based on the (A) ranks and (B) scores of the predicted
metabolites.To get an idea of the variability
in the ranking performance on
the test data set, we calculated the AUCs while systematically removing
one parent molecule at a time from the data set. This resulted in
37 different AUCs for each tool and AUC type (rank-based or score-based),
which are plotted in Figure S3. From this
analysis, we observed a similar amount of variability in the AUCs
between the two tools with the different metrics (score based and
rank based). In all cases, the median AUCs from this analysis were
within 0.01 of the AUCs reported in Figure . From the outliers observed in Figure S3, we learn that two to three molecules
are particularly challenging for each tool. In the case of GLORYx,
the two outliers correspond to having excluded everolimus (the furthest
outlier) and budesonide for both the score-based and rank-based AUCs.
For SyGMa, the furthest outlier is also caused by the exclusion of
everolimus, while the second-furthest outlier corresponds to having
excluded darunavir. Everolimus is a macrocycle with 12 known metabolites
in the test data set, while budesonide and darunavir each have 6 known
metabolites.Overall, a clear difference in performance is observed
between
the two tools, with GLORYx outperforming SyGMa in both cases. The
improvement in ranking performance seems to indicate that combining
predicted SoM probabilities with reaction rules to score the predicted
metabolites, whereby the SoM model and the reaction rules correspond
to the same type(s) of reactions, provides very valuable information.
This approach also has the benefit of not relying on reaction rule
occurrence ratios based on existing metabolism data to score and rank
the predictions. Our reference data set was used to measure performance
during development but was not used to develop reaction rules or calculate
occurrence ratios. This difference could potentially make GLORYx more
flexible with regard to never-before-seen input molecules.
Conclusion
GLORYx is a new tool for predicting the structures of metabolites
formed by both phase 1 and phase 2 metabolic reactions in humans.
The tool utilizes FAME 3 to predict, for all atom positions in a molecule,
the likelihood of a biotransformation to take place at this position
and, based on these predictions, applies a set of reaction rules to
generate and rank likely metabolites.In conjunction with a
high recall of known metabolites (77% on
the test data set), GLORYx ranked the predicted metabolites with an
AUC of 0.79 on the manually curated test data set. This recall and
ranking performance is better than we observed for the established,
freely available tool SyGMa on the same data set. However, when considering
only phase 2 metabolite prediction, SyGMa’s ranking performance
was better than that of GLORYx.We have observed that it is
difficult to predict phase 2 metabolites,
that is, difficult to rank the predicted metabolites in a meaningful
way, based on predicted SoM probabilities despite high performance
of the SoM prediction models themselves. We have concluded that the
cause of this difficulty is that reactivity is an insufficient metric
for determining which type of conjugation reaction would be more likely
to occur at a particular atom position. We were able to mitigate this
problem substantially by using individual reaction type-specific SoM
prediction models corresponding roughly to the five main phase 2 enzyme
families.During each run of GLORYx, the algorithm generates
and ranks one
generation of metabolites based on the parent compound(s) provided.
Users may of course provide (predicted) metabolites as input to GLORYx,
hence enabling multigeneration metabolite prediction.Given
the scarcity of the available high-quality data on small-molecule
metabolism, it is difficult to provide a robust definition of the
applicability domain of GLORYx. However, we know from thorough analyses
of FAME 3 that the metabolic properties of the atoms in a molecule
are first and foremost determined by the proximate atom environment,
and these environments are much more redundant across the chemical space than the overall (global) structure
of molecules. Considering also that the reaction rules implemented
in GLORYx are based on only a few connected atoms, GLORYx is expected
to provide reliable results for a wide range of synthetic compounds
and natural products alike.GLORYx is freely available as a
web server at https://nerdd.zbh.uni-hamburg.de/ and is also provided as a software package upon request. Note that
GLORYx should be considered an extension of GLORY rather than a replacement.
Hence both tools are available on the Web site, to enable users to
choose between CYP-specific metabolite structure prediction with GLORY
and comprehensive phase 1 and phase 2 metabolite structure prediction
with GLORYx.
Authors: Petra Jancova; Pavel Anzenbacher; Eva Anzenbacherova Journal: Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub Date: 2010-06 Impact factor: 1.245
Authors: Anna Gaulton; Anne Hersey; Michał Nowotka; A Patrícia Bento; Jon Chambers; David Mendez; Prudence Mutowo; Francis Atkinson; Louisa J Bellis; Elena Cibrián-Uhalte; Mark Davies; Nathan Dedman; Anneli Karlsson; María Paula Magariños; John P Overington; George Papadatos; Ines Smit; Andrew R Leach Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971
Authors: Yongjie Zhang; Shalenie P den Braver-Sewradj; Michiel W den Braver; Steven Hiemstra; Nico P E Vermeulen; Bob van de Water; Jan N M Commandeur; J C Vos Journal: Front Pharmacol Date: 2018-04-18 Impact factor: 5.810
Authors: Marjory Moreau; Pankajini Mallick; Marci Smeltz; Saad Haider; Chantel I Nicolas; Salil N Pendse; Jeremy A Leonard; Matthew W Linakis; Patrick D McMullen; Rebecca A Clewell; Harvey J Clewell; Miyoung Yoon Journal: Front Toxicol Date: 2022-04-29
Authors: Prashant J Chaudhari; Sanjaykumar B Bari; Sanjay J Surana; Atul A Shirkhedkar; Chandrakant G Bonde; Saurabh C Khadse; Vinod G Ugale; Akhil A Nagar; Rameshwar S Cheke Journal: ACS Omega Date: 2022-05-12
Authors: S Cyrus Khojasteh; Upendra A Argikar; James P Driscoll; Carley J S Heck; Lloyd King; Klarissa D Jackson; Wenying Jian; Amit S Kalgutkar; Grover P Miller; Valerie Kramlinger; Ivonne M C M Rietjens; Aaron M Teitelbaum; Kai Wang; Cong Wei Journal: Drug Metab Rev Date: 2021-06-24 Impact factor: 6.984
Authors: Scott Wyer; Danyelle M Townsend; Zhiwei Ye; Antonis Kourtidis; Yeun-Mun Choo; André Luís Branco de Barros; Mohamed S Donia; Mark T Hamann Journal: Biomed Pharmacother Date: 2022-02-08 Impact factor: 6.529