Quantitative analysis of known drug-target interactions emerged in recent years as a useful approach for drug repurposing and assessing side effects. In the present study, we present a method that uses probabilistic matrix factorization (PMF) for this purpose, which is particularly useful for analyzing large interaction networks. DrugBank drugs clustered based on PMF latent variables show phenotypic similarity even in the absence of 3D shape similarity. Benchmarking computations show that the method outperforms those recently introduced provided that the input data set of known interactions is sufficiently large--which is the case for enzymes and ion channels, but not for G-protein coupled receptors (GPCRs) and nuclear receptors. Runs performed on DrugBank after hiding 70% of known interactions show that, on average, 88 of the top 100 predictions hit the hidden interactions. De novo predictions permit us to identify new potential interactions. Drug-target pairs implicated in neurobiological disorders are overrepresented among de novo predictions.
Quantitative analysis of known drug-target interactions emerged in recent years as a useful approach for drug repurposing and assessing side effects. In the present study, we present a method that uses probabilistic matrix factorization (PMF) for this purpose, which is particularly useful for analyzing large interaction networks. DrugBank drugs clustered based on PMF latent variables show phenotypic similarity even in the absence of 3D shape similarity. Benchmarking computations show that the method outperforms those recently introduced provided that the input data set of known interactions is sufficiently large--which is the case for enzymes and ion channels, but not for G-protein coupled receptors (GPCRs) and nuclear receptors. Runs performed on DrugBank after hiding 70% of known interactions show that, on average, 88 of the top 100 predictions hit the hidden interactions. De novo predictions permit us to identify new potential interactions. Drug-target pairs implicated in neurobiological disorders are overrepresented among de novo predictions.
Drug discovery and development
has become increasingly challenging
in recent years, evidenced by the estimated cost of around $1.8 billion
for the development of a novel molecular entity with suitable pharmacological
properties.[1] This cost increase partly
originates from the failure of many drug candidates in phase II or
III clinical trials due to their toxicity or lack of efficacy.[2] The efficiency of drug discovery and development
might be improved by adopting a systemic approach that takes into
consideration the interaction of existing drugs and candidate compounds
with the entire network of target proteins and other biomolecules
in a cell.[3] Indeed, the “one gene,
one drug, one disease” paradigm is widely recognized to fail
in describing experimental observations.[4] Many drugs act on multiple targets, and many targets are themselves
involved in multiple pathways. For example, β-lactam antibiotics
and most antipsychotic drugs exert their effect through interactions
with multiple proteins.[5,6] Biological networks are highly
robust to single-gene knockouts, as recently shown for yeast where
80% of the gene knockouts did not affect cell survival.[7] Similarly, 81% of the 1500 genes knocked out
in mice did not cause embryonic lethality, further corroborating the
robustness of biological networks against single target perturbagens.[8] These results suggest that quantitative systems
pharmacology strategies that take account of target (and drug) promiscuities
can present attractive alternative routes to drug discovery.Recent years have seen many network-based models adopted to reduce
the complexity of, and efficiently explore, drug–target interaction
systems.[2,5,6,9] In particular, the development of computational methods
that can efficiently assess potential new interactions became an important
goal. In this regard, the important role that machine learning approaches
such as active learning (AL) can play has been recently been highlighted.[10] Computational approaches used to predict unknown
drug–target interactions can be divided into roughly four categories:
chemical-similarity-based methods,[11−13] target-similarity-based
methods,[14−16] integrative (both target- and chemical-similarity-based)
methods,[17−23] and holistic approaches.[24−29] The first two posit that if two entities are chemically or structurally
similar they will share interactions. The integrative approaches combine
the chemical- and target-similarity methods. While the intuition behind
these approaches is very reasonable, their performance has been observed
to be tied to the underlying similarity computation method. We also
note that the utility of different methods may depend on the size
of the data set being analyzed, e.g., computing chemical–chemical
and target–target similarity matrices can be problematic for
large databases like STITCH[30] (that contains
information on the interactions between more than 2.6 million proteins
and 300 000 chemicals). To overcome these limitations, holistic
methods have been introduced, which utilize a number of different
data sources such as gene expression perturbation[25,26] or high-throughput screening.[28]In this study, we propose a novel approach by using a collaborative
filtering algorithm to predict interactions without reliance on chemical/target
similarity or external data collection. We validate the utility of
probabilistic matrix factorization (PMF) for predicting unknown drug–target
interactions with the help of a detailed investigation of its performance.
The method is shown to group drugs according to their therapeutic
effects, irrespective of their three-dimensional (3D) shape similarity.
Benchmarking computations show that the method outperforms recent
methods[17,20,22] when applied
to large data sets of protein–drug associations, such as those
of enzyme– and ion channel–drug pairs; whereas the performance
falls short of these methods with decreasing size of the examined
data set (e.g., GPCR- and nuclear receptor-drug data sets). The ability
of the method to efficiently analyze and make inferences from large data sets of protein–drug interactions suggests
that, with growing sizes of those data sets, the utility (and accuracy)
of the method will further improve.Application of the same
benchmarking procedure to DrugBank[31] confirms
its ability to disclose hidden data:
88 out of the top 100 predictions (or 587 out of 1000) are found to
hit known (but hidden) interactions, when only 30% of the entire data
is used for training. Finally, when the method is trained on the entire
data set of drug–target interactions compiled in DrugBank,
de novo predictions for drug repurposing can be made along with the
corresponding confidence levels. Top de novo predictions include many
drugs indicated for neurodegenerative diseases or neurobiological
disorders, including drug–target pairs apparently supported
by previous experiments (not reported in DrugBank), e.g., ergotamine–serotonin
receptor 1A (5HT1A),[32] amoxapine-5-HT2A,[33] and verapamil–calmodulin.[34] In conclusion, the newly introduced computational
method provides an efficient approach for identifying potential drug–target
association between chemicals and targets and formulating new hypotheses
for repurposable drugs or side effects, thus complementing those deduced
from chemical–chemical or target–target similarities.
Materials and Methods
Problem Definition
The drug–target
interaction network is a bipartite graph with two types of nodes:
drugs and targets (Supporting Information Figure S1). Each edge represents an interaction between a drug and
a target. The drug–target interaction identification problem
is to determine the missing edges that are likely to exist given all
nodes and some of the edges in the network.
Data
Set
We used DrugBank (version
of September 20, 2011) as the database.[31] All drugs annotated therein as approved, along with their annotated
targets, are included in our data set (i.e., we excluded compounds
annotated as withdrawn or nutraceutical), resulting in N = 1413 drugs and M = 1050 targets with 4731 interactions
among them. The interaction network displays small-world characteristics:
many nodes have low degree and a few, very high degree, as illustrated
in panels b and c of Supporting Information Figure S1, in line with previous studies on drug–target networks.[35] On average, there are 3.35 interactions per
drug, and 4.50 interactions per target.
Probabilistic
Matrix Factorization (PMF)
PMF is a member of the collaborative
filtering family of machine
learning algorithms that decomposes the connectivity matrix, R, of
a bipartite graph of N drugs and M targets as a product of two matrices of latent variables (LVs).[36,37]R is defined asThe
matrix R is modeled as the product of two matrices UT and V, that
express each drug/target in terms of D LVs. Our objective
is to find the best approximation for LVs, while
avoiding overfitting. The predicted R̂ is then expressed aswhere UT and V are composed of N rows uT and M columns v, respectively, each being D-dimensional. The PMF adopts a probabilistic linear model
with Gaussian noise to model the interaction. Therefore, the conditional
probability over observed interactions is represented aswhere f(x|μ, σ2) is the Gaussianly distributed probability
density function for x, with mean μ and variance
σ, and I is the
indicator function equal to 1 if the entry R is known and 0 otherwise. Therefore, p(R|U, V, σ2) gives us a probabilistic representation of the connectivity
matrix, R.[37] Using zero-mean,
spherical Gaussian priors on LVs, we can writewhich leads to the log-likelihood
of U and V given byHere C is a term that does
not depend on LVs; the first term on the right-hand side is the squared
error function to be minimized; and the two summations over the square
magnitudes of u and v are regularization terms that
favor simpler solutions and penalize overfitting. The above log-likelihood
directly follows from the Bayes’ rule where R stands
for data, and U and V represent the model
(see the Supporting Information for details).
To learn an optimal model means to find the U and V matrices, or the D-dimensional LV vectors, u (1 ≤ i ≤ N) and v (1 ≤ j ≤ M), that maximize the log-likelihood function.The PMF method
yields the optimal u and v vectors corresponding to each
drug, d, and each target t, respectively. The basic
idea is that the model is forced toward making a “no-interaction”
prediction by the regularization—i.e., there is a penalty associated
with any nonzero value in the LV matrices. However, there is also
a penalty for failing to capture known interactions—i.e., if
the dot product of the LV vectors corresponding to an interacting
drug–target pair is close to zero. Therefore the learning of
a model means to optimally balance out two objectives: developing
a sufficiently complex model to describe the known interactions, but
not overly complex to end up in overfitting. In this study, we use
gradient descent for optimization. The adoption of higher D values usually yields more accurate results, although
beyond a certain limit the increase in complexity and decrease in
efficiency may not warrant the marginal improvement, if any, in prediction
accuracy. D = 50 is adopted here as an optimal dimensionality
for prediction runs. The method is highly efficient: a 50-dimensional
model is trained on the entire DrugBank in approximately 2 s using
a 2.00 GHz AMD Opteron processor. Moreover, the computing time to
learn a PMF model scales linearly with the number of interactions,
and as such, the method can be advantageously used for much larger
data sets.
Active Learning (AL) Using
Probabilistic Matrix
Factorization
The AL strategy adopted in the present study
is, in part, motivated by the success reported by Warmuth et al.[38] who demonstrated that hit maximization is a
viable AL strategy applicable to predicting drug–target interactions.
The AL strategy adopted here also prioritizes the discovery of unknown
interactions. Our method differs in that we aim at capturing the interactions
between all drugs and targets, as opposed to predicting activity against
a single target.The procedure is the following: We begin with
the set of N drugs and M targets,
and known associations, schematically shown in Figure 1 by black connectors. The purpose is to identify
new associations, indicated by red connectors. For
each candidate interaction, say the possible interaction between d and t, we compute the model’s estimate, f(R|uTv, σ2) (eq 3). The dot product uTv serves as a weight ω for the edge/connector between d and t. Clearly, ω, or the likelihood
of association between d and t, is high when u and v have both large values of the same sign at the same
dimension(s). For example, a relatively large weight may originate
from the second component of both u and v, which means
that the predicted association is mainly due to latent variable 2.
We evaluated the statistical weights ω(d, t) for the N × M pairs of drug targets for two purposes: (i) benchmarking the methodology
via an iterative AL scheme and (ii) making de novo predictions. In
the former case, the method is benchmarked by hiding 70% of known
interactions and examining whether the top-ranking prediction is a
“hit”, i.e., whether it corresponds to a known (but
hidden) interaction. The outcome from this test is fed back to the
model, to repeat the calculation for the next prediction. Therefore,
the AL model is updated at each iteration using the newly acquired “hit”
or “miss” data until a predetermined number (m) of predictions are made. The passive learner (PL) makes
the m predictions simultaneously without updating
its model.
Figure 1
Qualitative illustration of the method for identifying drug–target
interactions. The known interactions between drugs and targets (indicated
by the black lines) are used to learn the LV vectors
(shown adjacent to each node) that describe each drug and target.
The dot product uTv of the LVs for each pair
of drug d and target t defines the predicted statistical
weight ω of corresponding connection.
Example predictions are shown in red.
Qualitative illustration of the method for identifying drug–target
interactions. The known interactions between drugs and targets (indicated
by the black lines) are used to learn the LV vectors
(shown adjacent to each node) that describe each drug and target.
The dot product uTv of the LVs for each pair
of drug d and target t defines the predicted statistical
weight ω of corresponding connection.
Example predictions are shown in red.In the case of de novo predictions, all DrugBank
data were used
as input. De novo predictions also lend themselves to an AL scheme
provided that the top-ranking prediction is experimentally tested
and then the new hit or miss data are incorporated in the model to
perform a new prediction, and so on, until the experimentation budget
is exhausted.
Results and Discussion
PMF Cluster Drugs with Therapeutic Similarities,
Irrespective of Their Chemical–Structural Similarities
To assess whether the LVs provide us with a pharmacologically meaningful
metric, we examined the clustering of drugs in the D-dimensional space of the latent vectors. The clustering was performed
for D = 30—the value that gave the lowest
Akaike information criterion,[39] using as
basis the drug–drug distance L1(d, d) = ∑|u – u| where u designates the kth component of u, and
the summation is performed over D components.Inasmuch as our method evaluates drugs based on their interaction
profiles with targets, which in turn refer to specific therapeutic
or phenotypic actions, the similarity of a pair of drugs should be
high when their therapeutic effects are comparable and vice versa.
Thus, the method will tend to cluster drugs that exhibit similar patterns
of interactions (with target proteins), which we term as functionally
similar drugs.The heat map in Figure 2a displays the resulting
organization of drugs in 30 clusters (indicated by different colors
and indices along the axes). Supporting Information Table 1 lists the dominant therapeutic action associated with each
cluster. The dark regions on the map indicate high functional similarity.
The dark blocks along the diagonal show that most clusters include
highly similar members, except for two (clusters 29 and 30), which apparently combine the outliers.
Figure 2
Comparison
of pairwise similarities of drugs, based on their (a)
therapeutic targets compiled in DrugBank and (b) 3D structure. (a)
30 clusters of drugs (color-coded along the axes; see Supporting Information Table S1 for their dominant
therapeutic indication) deduced from the PMF of 1413 approved drugs
and corresponding 1050 targets compiled in DrugBank. By definition,
drugs belonging to a given cluster share similar interaction patterns
with respect to targets. (b) 3D similarities, with the drugs being
ordered as in panel a. Dark regions indicate high
similarity based on LVs (panel a) or 3D similarities (panel b). Comparison
of the panels shows that close proximity in LV space (which indicates
functional similarity) does not necessarily imply 3D-structure similarity.
LV distances were distributed in the range [0, 1]; with the distribution
of values also skewed in different ways. To render the two sets comparable,
we performed rank normalization on both the LV similarities and 3D
similarities. Selected boxes are enlarged in Figures 3 (white) and 4 (yellow), and Supporting Information Figure S2 (green).
Comparison
of pairwise similarities of drugs, based on their (a)
therapeutic targets compiled in DrugBank and (b) 3D structure. (a)
30 clusters of drugs (color-coded along the axes; see Supporting Information Table S1 for their dominant
therapeutic indication) deduced from the PMF of 1413 approved drugs
and corresponding 1050 targets compiled in DrugBank. By definition,
drugs belonging to a given cluster share similar interaction patterns
with respect to targets. (b) 3D similarities, with the drugs being
ordered as in panel a. Dark regions indicate high
similarity based on LVs (panel a) or 3D similarities (panel b). Comparison
of the panels shows that close proximity in LV space (which indicates
functional similarity) does not necessarily imply 3D-structure similarity.
LV distances were distributed in the range [0, 1]; with the distribution
of values also skewed in different ways. To render the two sets comparable,
we performed rank normalization on both the LV similarities and 3D
similarities. Selected boxes are enlarged in Figures 3 (white) and 4 (yellow), and Supporting Information Figure S2 (green).
Figure 3
Latent variables capturing
therapeutic action similarities when
3D similarity metrics cannot. Closer examination of the similarities
between the members of cluster 14 in Figure 2 (enclosed in white boxes in Figure 2, enlarged in panels b and c here) shows that the cluster
contains a series of antianxiety drugs. A few members of this cluster
(indicated by orange boxes along the abscissa of panels b and c) are
displayed in panel a, to illustrate their shared structural features,
also indicated by panel c that reflects their 3D similarities. The
same cluster however contains ethchlorvynol, also used as a sedative,
which would have been missed if we had used exclusively used 3D similarity
to identify functionally similar drugs.
Figure 4
Strong
cross-correlations between different clusters of drugs,
consistent with their similar therapeutic functions. Cluster 11, color-coded cyan, is essentially composed
of hypnotics and sedatives. Cluster 14 (dark
gray) contains antianxiety drugs. The drugs in these two
clusters are located very closely on the drug–target interaction
network, as shown in panel a, consistent with their similar actions.
The LV-derived heat maps capture the functional similarity between
these two clusters (as indicated by strong signals, or the dark region, in panel b); the maps based on 3D similarity
(panel c) do not. In panel a drugs are shown in blue, protein targets in red. Most drugs and targets
are part of a single connected component. Data are retrieved from
DrugBank.[48] Cytoscape is used for visualization.[56]
Given that (promiscuous) proteins present more than one site
for
ligand-binding, different functionalities may be modulated by chemical-structurally
different drugs, depending on the binding site on the target (e.g.,
catalysis, substrate recognition, or allosteric signaling). Furthermore,
a shared phenotype may arise from the targeting of different proteins
along a given pathway. In order to make a better assessment of the
properties of drugs grouped in those clusters, we examined their 3D
structural similarities. High similarities would suggest that they
bind similar epitopes, if not similar (or identical) structural domains
or proteins. If, on the contrary, they are structurally dissimilar,
this might indicate a different site on the same protein, a different
target on the same pathway, or other indirect effect due to drug–target
network connectivity.The extent of 3D structure similarity
between pairs of drugs was
computed using the OpenEye Scientific software (http://www.eyesopen.com/). 3D similarity was reported to be a better predictor than 2D methods
for off-target interactions and to perform equally well in on-target
interactions,[40] although 3D methods may
suffer from more noise due to the conformational flexibility of the
small molecule. We generated for each drug all possible stereoisomers
using OpenEye FLIPPER[41] and up to 200 conformers
per stereoisomer using OpenEye OMEGA.[41] All combinations of conformers accessible to the examined pair of
drugs were examined using OpenEye Shape[23] toolkit; and the best matching pair was adopted to assign a 3D similarity
score. This computationally expensive task led to the heat map presented
in panel b of Figure 2. The drugs (along the
axes) are ordered as in panel a to enable visual comparison.The comparison of Figure 2 shows that some
clusters of functionally similar drugs (panel a) also exhibit some
3D similarities (panel b), whereas others display little structural
similarity. We examined more closely the individual clusters to see
if shared therapeutic functions were captured even when 3D similarities
were absent. Figure 3 illustrates the results
for cluster 14. This cluster essentially consists
of antianxiety drugs, the majority of which are both functionally
(panel b) and structurally (panel c) similar. However, the cluster
also includes a structurally dissimilar drug, ethchlorvynol (panel
a), which shares the same type of phenotypic action (as a sedative)
as the majority of the cluster membership (mostly targeting GABA receptors).
The present approach thus detects chemically or structurally distinctive
drugs that share common activities, which would have been missed by
methods based on ligand fingerprint similarities.Latent variables capturing
therapeutic action similarities when
3D similarity metrics cannot. Closer examination of the similarities
between the members of cluster 14 in Figure 2 (enclosed in white boxes in Figure 2, enlarged in panels b and c here) shows that the cluster
contains a series of antianxiety drugs. A few members of this cluster
(indicated by orange boxes along the abscissa of panels b and c) are
displayed in panel a, to illustrate their shared structural features,
also indicated by panel c that reflects their 3D similarities. The
same cluster however contains ethchlorvynol, also used as a sedative,
which would have been missed if we had used exclusively used 3D similarity
to identify functionally similar drugs.Another interesting observation concerns the cross-correlations
between different clusters (i.e., the off-diagonal regions of the
heat maps). We note for example that cluster 11 also
contains a set of sedatives. LVs are able to capture the commonality
between the clusters 11 and 14 as
may be seen by the strong signal (dark region) at the off-diagonal
region enlarged in Figure 4b. The 3D similarity,
on the other hand, cannot recognize the functional similarity and
potential interference/side effects between these drugs in these two
clusters (Figure 4c). See Supporting Information Figure S2, which illustrates the same
behavior for another cluster, whose members are mostly antineoplastic
agents, albeit with various 3D structures. The LVs thus provide information
on drug groups that potentially share pathways or exhibit similar
activity patterns despite their distinct physicochemical properties.Strong
cross-correlations between different clusters of drugs,
consistent with their similar therapeutic functions. Cluster 11, color-coded cyan, is essentially composed
of hypnotics and sedatives. Cluster 14 (dark
gray) contains antianxiety drugs. The drugs in these two
clusters are located very closely on the drug–target interaction
network, as shown in panel a, consistent with their similar actions.
The LV-derived heat maps capture the functional similarity between
these two clusters (as indicated by strong signals, or the dark region, in panel b); the maps based on 3D similarity
(panel c) do not. In panel a drugs are shown in blue, protein targets in red. Most drugs and targets
are part of a single connected component. Data are retrieved from
DrugBank.[48] Cytoscape is used for visualization.[56]
Benchmarking Computations Support the Utility
of the Method for Analyzing Large Data Sets
To evaluate the
performance of the method in comparison to previous work, we considered
three important studies in this area, one recently published by Gonen[22] and two by Yamanishi et al.[17,20] Gonen used a Kernel-based matrix factorization (KBMF) with chemical
and genomic similarities to predict multiple targets. Yamanishi et
al., on the other hand, integrated chemical, genomic, and pharmacological
data to map all drugs and targets to the same unified feature space
where each protein–compound pair closer than a predefined threshold
was predicted to interact. Our approach differs from both studies,
in that PMF assumes an independent LV for each row and column with
Gaussian priors; whereas KBMF employs LVs spanning all rows and columns
with Gaussian process priors, and Yamanishi et al. project drugs and
targets into a pharmacological space based on the eigenvalue decomposition
of the graph-based similarity matrix.The benchmarking procedure
that we adopted is a 5-fold cross-validation of drugs on four target
classes: enzymes, ion channels, G-protein coupled receptors (GPCRs),
and nuclear receptors. In order to achieve comparable results, we
used the same protocol as that adopted earlier, i.e., we divided our
data set into five subsets; each was used as a test set, and the others,
as training sets. Due to the randomness involved in the selection
of subsets, we repeated the cross-validation experiments 100 times
with randomly selected subsets and evaluated the average AUC (area
under the receiver operating curve) for each subset. The first four
rows in Table 1 compare the results (columns
6–10) for the four classes, and the fifth row lists the average
performances weighted by the size of the interaction space. Our method
performs best when applied to large data sets (e.g., enzymes and ion
channels); whereas Gonen’s performs best in the case of GPCRs,
and Yamanishi et al.[20] exhibits the highest
performance for nuclear receptors, where the present method yields
a relatively low (0.642) AUC value. Examination of the statistical
significance of our results (Supporting Information Figure S3a) indicates that the mean AUC values obtained for all
four sets are highly robust. Their covariances vary from 2% (enzymes
and ion channels) to 11% (nuclear receptors). Finally, the application
of the same benchmarking protocol to DrugBank yielded an accuracy
rate of 79.4 ± 0.01% (Table 1, last row,
and Figure S3), supporting the utility of the method when applied
to large data sets.
Table 1
Properties of the
Examined Space of
Proteins–Drugs and Performance of the Present Method in Comparison
to Others (*)
Yamanishi (pred pharmacol effects)
target type
no. of known
interactions
no. of drugs
(N)
no. of targets
(M)
size of interaction
space (N×M)
percent occupancy
of the space
2008a
2010a
Gonen, 2012a
present method (D = 50)
enzymes
1515
212
478
101336
1.50%
0.821
0.845
0.832
0.861 ± 0.02
ion channels
776
99
146
14454
5.37%
0.692
0.731
0.799
0.904 ± 0.02
GPCRs
314
105
84
8820
3.56%
0.811
0.812
0.857
0.771 ± 0.04
nuclear receptors
44
27
22
594
7.41%
0.814
0.830
0.824
0.650 ± 0.11
allb
2649
443
730
0.782
0.807
0.825
0.859 ±
0.03
DrugBank
4731
1413
1050
1483650
0.32%
0.794 ± 0.01
The last
four columns present the
comparison with the results of Yamanishi[17,20] and Gonen[22] for the same data set.
Weighted-average mean and covariances,
evaluated using the number of interactions as weights.
The last
four columns present the
comparison with the results of Yamanishi[17,20] and Gonen[22] for the same data set.Weighted-average mean and covariances,
evaluated using the number of interactions as weights.In principle, it might be intrinsically
harder to make accurate
predictions for larger data sets as the size of the potential interaction
space N × M grows quadratically
when the number of drugs and targets grow linearly, particularly if
the number of known interactions is small. The occupancy of the N × M interaction matrix is only 1.5%
in the enzyme class, which could make it difficult to learn an informative
model. The present PMF technique, however, successfully learned an
informative model and handled the complexity of interactions in this
space of interactions, apparently due to the availability of a sufficiently
large (absolute) number of known interactions (Supporting Information Figure S3b).The ion channel
drug class has the second largest number of known
interactions among the four. Although the size of interaction space
is 1 order of magnitude smaller than the enzyme class, there are 776
known interactions leading to a percent occupancy of 5.37% of all
possible ion channel-drug associations. The success of our method
in this case may be attributed to both the relatively large number
of known interactions and the rich annotation of that class of interactions.The two other classes, GPCRs and nuclear receptors, are significantly
smaller in terms of their interaction space and/or occupancy of that
space. Nuclear receptors comprise only 27 drugs, 22 targets, and 44
interactions. A method that relies solely on connectivity, like ours,
cannot presumably formulate an informative model when the set of “edges”
to construct the network connectivity matrix is incomplete. In those
cases, the data that come from other sources, e.g. chemical similarity
and genomic patterns, amend this lack of information. Consequently,
methods that incorporate such features[20,22] outperform
ours.To further examine the effect of scarcity of known interactions
on the performance of the present method, we performed additional
tests by varying the fraction of hidden interactions. The results
are presented in Supporting Information Figure S4. Panels a–d show the performance on ion channels,
enzymes, GPCRs, and nuclear receptors, respectively. These results
show that the performance depends on the fraction of known interactions.
To put the results into perspective, we indicated by a vertical dashed
line in each panel the fraction of data (80%) used in previous studies[20,22] for training purposes. Consistent with the above findings, ion channels
yield the best result: previous AUC values[22] (of 0.799; Table 1) are matched with about
only 35% of the data. In the enzyme group, we match the performance
of Yamanishi et al.[20] (AUC of 0.845) with
roughly 70% of the data used for training. GPCRs and nuclear receptors
yield AUC values lower than those previously attained,[20,22] irrespective of the fraction of hidden interactions.In summary,
the method is particularly suitable for screening and
inferring repurposable drugs or potential side effects from large
data sets where computational assessment of structure similarity kernels
become prohibitively expensive. In cases where the data set of known
interactions is too small, on the other hand, 2D or 3D similarity
metrics provide more accurate assessments.
Absolute
Number of Known Interactions Overrides
the Scarcity of the Data in Determining the Accuracy Rate of PMF Active
Learner
As a more stringent test, 3318 (70%) of the known
4731 interactions in DrugBank were randomly hidden, reducing the average
number of interactions per drug from 3.35 to 1. The resulting “incomplete”
interaction matrix was then used to predict the hidden interactions,
one at a time (rank-ordered by statistical weights ω(d, t)) as described
in Materials and Methods. The outcome was
checked in a simulated experiment to assess whether the predicted
interaction is a true positive (TP) or a false positive (FP). If the
prediction is an existing, but hidden, interaction, the result is
considered a TP (or hit), otherwise a FP (or miss). Then the model
is updated in line with our AL scheme, and this loop is repeated until
the completion of m = 1000 predictions. At that point,
the simulation is halted and the overall performance of the model,
or the hit ratio, is evaluated. Note that this method gives us a lower
bound for hit ratio because the predictions are labeled as hits only
if they are annotated in DrugBank, although they can be true but not
yet observed experimentally or annotated in DrugBank.The results
are presented in Figure 5. The figure displays
the number of hits as a function of the number of predictions, obtained
with three approaches: active learning (dark blue curves), passive learning (dark red curves), and
random (green). The approach is able to achieve,
on average, 587 hits out of 1000 predictions via AL, 407 hits, via
PL; and the corresponding variances (indicated by the dashed curves)
are 35 and 46, respectively. Compared to the random probability of
2.23 hits per 1000 predictions, the AL result is a 263-fold improvement
over random. The improvement of AL over PL is 1.44 fold. The AL improvement
over random was reported to be up to 3.19-fold in a previous SVM-based
study for predicting the activity of 1316 drugs against a single target.[38] The same study also reported 1.59-fold improvement
between passive and active learners. Closer examination of the results
from the top 100 predictions (enlarged in the inset) further shows
that hit ratios of 88.0 ± 4.7% and 82.2 ± 6.4% are obtained
by the respective AL and PL protocols. The results are obtained with D = 50, which yields optimal results, as can be seen from Supporting Information Figures S5 and 6 display the dependence of the results on D, in support of the choice of D = 50.
Figure 5
Ability
of the method to recapitulate hidden drug–target
interactions. The number of drug–target interactions per drug
was reduced from 3.35 (average) to 1 by hiding 70% of known interactions,
selected randomly. Simulations were repeated n =
96 times for each of the 1 < m < 1000 predictions
(abscissa) and the number of hits (correctly identified hidden interactions)
is plotted for each run, along the ordinate. The dark blue
and dark red solid curves refer to the average performance
obtained by active learning and passive learning protocols, respectively,
using D = 50, ε = 3, λ = 0.01, and μ
= 0.9 in the adopted PMF algorithm. Dashed curves show the corresponding
variances (by one standard deviation) above and below the mean value.
The green curves (practically overlapping with the
abscissa) refer to results from random predictions. The inset shows
a close-up of the first 100 predictions. AL reaches an accuracy rate
(hit ratio) of 88.0 ± 4.7% and 58.7 ± 3.5% in the respective
cases of m = 100 and 1000 predictions.
Figure 6
Improvement in prediction accuracy by AL over random (a)
and over
PL (b), as a function of the latent space dimensionality. Fold improvement
is based on hit ratios obtained at the end of 1000 predictions, using
same parameters as Figure 5. The AL performance
levels off at about D = 50 in panel a. The last bar
in each panel refers to the work of Warmuth et al.[38]
Ability
of the method to recapitulate hidden drug–target
interactions. The number of drug–target interactions per drug
was reduced from 3.35 (average) to 1 by hiding 70% of known interactions,
selected randomly. Simulations were repeated n =
96 times for each of the 1 < m < 1000 predictions
(abscissa) and the number of hits (correctly identified hidden interactions)
is plotted for each run, along the ordinate. The dark blue
and dark red solid curves refer to the average performance
obtained by active learning and passive learning protocols, respectively,
using D = 50, ε = 3, λ = 0.01, and μ
= 0.9 in the adopted PMF algorithm. Dashed curves show the corresponding
variances (by one standard deviation) above and below the mean value.
The green curves (practically overlapping with the
abscissa) refer to results from random predictions. The inset shows
a close-up of the first 100 predictions. AL reaches an accuracy rate
(hit ratio) of 88.0 ± 4.7% and 58.7 ± 3.5% in the respective
cases of m = 100 and 1000 predictions.Improvement in prediction accuracy by AL over random (a)
and over
PL (b), as a function of the latent space dimensionality. Fold improvement
is based on hit ratios obtained at the end of 1000 predictions, using
same parameters as Figure 5. The AL performance
levels off at about D = 50 in panel a. The last bar
in each panel refers to the work of Warmuth et al.[38]These results permit us to draw
two conclusions. First, a hit ratio
of 88% is attainable in the top 100 predictions (and 59% in top 1000)
upon adopting a PMF-based AL strategy for identifying hidden/unknown
interactions in a sparse (0.32% occupancy) data set of about 1.5 million
potential interactions. Second, the AL method outperforms random by
2 orders of magnitude and PL by a ratio of 1.5 approximately, in support
of AL strategy for predicting new interactions.
De novo Predictions of Drug–Target
Interactions
We used our method to predict new (potential)
drug–target interactions after training our model on the latest
available version of DrugBank (September 10, 2013) comprised of 5041
interactions between 1502 approved drugs and 1138 targets. The highest
confidence pairs obtained for twenty distinct drugs are presented
in Table 2. The pairs therein were observed
to lie frequently among the top-ranking 10 pairs (in the space of N × M = 1.7 × 106 potential
interactions), deduced from 104 independent runs initiated
with different random numbers.
Table 2
De novo Predictions,
Rank-Ordered
Based on Confidence
The cases listed as “indirect”
refer to interactions of the drugs with different subtypes of the
identified target.
Orphenadrine
inhibits norepinephrine
reuptake thus potentiating the effect of norepinephrine There are
several drugs acting as norepinephrine-dopamine reuptake inhibitors,
and ophenadrine might exhibit the same behavior.
Abbreviations:
GABA γ-aminobutyric-acid
receptor; 5-HT 5-hydroxytryptamine (or serotonin) receptor; NET norepinephrine
(or noradrenaline) transporter; ADRA1A adrenoreceptor α1 A;
DAT dopamine transporter.The cases listed as “indirect”
refer to interactions of the drugs with different subtypes of the
identified target.Orphenadrine
inhibits norepinephrine
reuptake thus potentiating the effect of norepinephrine There are
several drugs acting as norepinephrine-dopamine reuptake inhibitors,
and ophenadrine might exhibit the same behavior.We note that the list of de novo
predictions in Table 2 is dominated by drugs
used for neurological or
psychiatric disorders, consistent with the known pharmacological promiscuity
of this of drugs. The last column in Table 2 lists the experimental support from the literature, if any, for
the possible interaction of the drug–target pair in each row.
Among pairs supported by previous experiments, we note ergotamine–serotonin
receptor 1A (5HT1A),[32] amoxapine-5-HT2A,[33] verapamil–calmodulin,[34] paliperidone-5-HTc,[42] meperidine–sodium channels,[43] and cinnarizine–calmodulin.[44] We also note that chronic treatment with paroxetine
has been recently reported to increase the mRNA levels of histamine
receptor H1, indicating an association between paroxetine
and Histamine receptor H1.[45] Although paroxetine
is a potent serotonin reuptake inhibitor, its weight gain side effect
has been attributed to its medication action on histamine receptors.[46]Many others (indicated as “indirect”)
are interactions
known to occur either with subtypes of the listed targets or proteins
implicated in the same phenotype (e.g., citalopram induces norepinephrine
receptor hypoactivity,[47]which may relate
to norepinephrine transport by NET). Yet, the validity of these predictions
need to be established by experiments. Here, for exploratory purposes,
we examined the potential binding pose and energetics of verapamil–calmodulin.
Verapamil is known as a Ca2+ channel entry blocker. Its
interaction with calmodulin is supported by the fluorescence experiments[32] and by the inhibition of calmodulin-stimulated
(Ca2+ + Mg2+) ATPase activity.[48] We used as template the Protein DataBank (PDB)[49] structure of the cocrystal between calmodulin
and trifluoperazine (PDB identifier 1CTR)[50] and examined
the binding affinity of verapamil relative to that of trifluoperazine,
using the module SMINA,[51] based on AutoDock
VINA.[52] The software yielded an attractive
energy of −4.8 kcal/mol for trifluoperazine binding to calmodulin,
which after energy minimization, becomes −6.8 kcal/mol. Docking
of verapamil to the same location and energy minimization led to −5.8
kcal/mol (Figure 7). While docking software
with fixed target often fails to provide a quantitative assessment
of binding affinity, and more elaborate simulation methods have been
developed for druggability assessment (see, e.g., our recent work[53]), these results support, qualitatively, the
shape complementarity between calmodulin and verapamil, as well as
their favorable electrostatic interaction. Establishment of this and
other predicted interactions, however, awaits experimental verifications
by essays designed to test the specific drug–target interactions.
Figure 7
Structural
model for possible binding pose of verapamil onto calmodulin.
The molecular surfaces of both the drug and calmodulin are colored
by electrostatic potential (a and b) and a close-up view is shown
in panel c. Calmodulin surface, especially at the ligand-binding cleft,
is negatively (partial) charged, favorably interacting with the verapamil
surface that has partial positive charge. Note the shape between the
verapamil and calmodulin. All the visualizations have been performed
with OpenEye VIDA software.
Structural
model for possible binding pose of verapamil onto calmodulin.
The molecular surfaces of both the drug and calmodulin are colored
by electrostatic potential (a and b) and a close-up view is shown
in panel c. Calmodulin surface, especially at the ligand-binding cleft,
is negatively (partial) charged, favorably interacting with the verapamil
surface that has partial positive charge. Note the shape between the
verapamil and calmodulin. All the visualizations have been performed
with OpenEye VIDA software.
Concluding Remarks
Over the last couple
of years, there have been a number of computational
studies performed to identify targets of existing drugs and drug candidates
other than those originally known/proposed to be targeted. A pioneering
study is that of Roth, Shoichet, and co-workers[11,13] based on compound chemical similarities. Dudley et al. focused on
inverse correlations between gene expression profiles in the presence
of a drug and in a disease state.[25] Yamanishi
and his colleagues represented drugs and targets in an integrated
“pharmacological space”.[17,20] Gonen used
a KBMF method where chemical and genomic similarities were integrated.[22] We proposed a PMF-based AL methodology that
can be advantageously used for large data sets.The applicability
of the method to large data sets is worth further
attention, given that we will increasingly have access to bigger data
(e.g., STITCH Database[30]), which will be
exploited for repurposable drug identification. The software developed
here, made accessible in http://www.csb.pitt.edu/Faculty/bahar/files/, is readily scalable. For very large data sets, which typically
have more known interactions, the PMF is able to construct a better
model using the plethora of available data; whereas when the number
of known interactions is limited, the use of chemical and genomic
kernels allows KBMF to outperform PMF. The application of KBMF to
large data sets may, however, become challenging, For example, STITCH
contains on the order of 106 proteins and 105 compounds, implying that 1012 sequence and 1010 chemical similarity comparisons are needed to make predictions.
However, the PMF method is independent of chemical, structural or
other similarity metrics, and its computation time scales linearly
with the number of known interactions; and it proves to perform well
on large data sets. The data sets reporting drug–target interactions
are constantly improving in quality and quantity and, therefore, expected
to give even better results when analyzed by an efficient tool. Finally,
the extension of the method to analyzing big data (with millions of
nodes) is foreseeable in the near future. The recently introduced
GraphChi tool[54] can be used for optimized
and parallelized model learning for further performance improvements.The fact that the PMF is independent of 2D/3D shape comparison
methods commonly employed in drug–target pair inferences implies
that the derived LVs capture similarities based on the interaction
patterns of drugs at the cellular level, even if their molecular structures
are dissimilar (see Figures 2 and 3). As such, the method may be advantageously used
for lead hopping, thus complementing those (e.g., SVM classification
algorithms) used in conjunction with 2D or 3D pharmacophoric fingerprints
(see the work of Saeh et al.[55]). Inasmuch
as the currently proposed method does not require structural data
for proteins but knowledge of drug–target interactions, it
can be advantageously applied to membrane proteins (major drug targets)
for which structural data still remain sparse. It can also be used
to make predictions across major drug or target classification boundaries.
One implication is that the de novo predictions are not restricted
to major drug or target classification boundaries.A major utility
of the developed tool is the ability to deliver
testable hypotheses with regard to repurposable drugs, thus significantly
reducing the search space for identifying potent applications of existing
drugs (that proved to meet ADMET requirements). The number of experiments
that can be efficiently conducted is usually limited, e.g. of the
order of 102 if not 101 for high-confidence
assays as opposed to the complete space of ∼1.5 million combinations
for the data set used in this study. The fact that the top-ranking
predictions exhibit a hit ratio of 59% (for the top 1000 predictions;
or 88% for top 100 predictions) suggest that de novo predictions made
by the presently introduced method of approach applied to increasingly
large data sets are likely to provide useful guidance for experimentally
testing, streamlining or prioritizing existing or investigational
drugs or new compounds. Another important byproduct is the probabilistic
assessments on potential side effects, a topic that will become increasingly
important with advances in personalized medicine.
Authors: H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971
Authors: Maureen E Hillenmeyer; Eula Fung; Jan Wildenhain; Sarah E Pierce; Shawn Hoon; William Lee; Michael Proctor; Robert P St Onge; Mike Tyers; Daphne Koller; Russ B Altman; Ronald W Davis; Corey Nislow; Guri Giaever Journal: Science Date: 2008-04-18 Impact factor: 47.728
Authors: Muhammed A Yildirim; Kwang-Il Goh; Michael E Cusick; Albert-László Barabási; Marc Vidal Journal: Nat Biotechnol Date: 2007-10 Impact factor: 54.908
Authors: D Lansing Taylor; Albert Gough; Mark E Schurdak; Lawrence Vernetti; Chakra S Chennubhotla; Daniel Lefever; Fen Pei; James R Faeder; Timothy R Lezon; Andrew M Stern; Ivet Bahar Journal: Handb Exp Pharmacol Date: 2019
Authors: Mary Regina Boland; Alexandra Jacunski; Tal Lorberbaum; Joseph D Romano; Robert Moskovitch; Nicholas P Tatonetti Journal: Wiley Interdiscip Rev Syst Biol Med Date: 2015-11-12