
In silico drug repositioning based on integrated drug targets and canonical correlation analysis.

Hailin Chen, Zuping Zhang, Jingpu Zhang.

Abstract

BACKGROUND: Besides binding to proteins, drugs can also regulate the expression of non-coding RNAs (ncRNAs), according to the most recent advances in pharmacogenomics. The polypharmacological nature of drugs enables us to find new uses for existing drugs (namely, drug repositioning). However, current computational methods for drug repositioning mainly consider proteins as drug targets. Moreover, these methods identify only statistical relationships between drugs and diseases and provide little information about how drug-disease associations are formed at the molecular target level.
METHODS: Herein, we first comprehensively collect proteins and two categories of ncRNAs as drug targets from public databases to construct drug-target interactions. Experimentally confirmed drug-disease associations are downloaded from an established database. A canonical correlation analysis (CCA) based method is then applied to the two datasets to extract correlated sets of targets and diseases. The correlated sets are regarded as canonical components and are used to investigate drugs' mechanisms of action. Finally, we develop a strategy to predict novel drug-disease associations for drug repositioning by combining all the extracted correlated sets.
RESULTS: We obtain 400 canonical components that correlate targets with diseases. Analysing 4 of these components, we find that some top-ranking diseases in an extracted set might be treated by drugs interacting with the top-ranking targets in the same set. Results from 10-fold cross-validation show that integrating different categories of target information yields better prediction performance than using only proteins or only ncRNAs as targets. Compared with 3 state-of-the-art approaches, our method achieves the highest AUC value (0.8576). We use our method to predict new indications for 789 drugs, and the top-ranked (top 1) prediction is confirmed for 24 of them.
CONCLUSIONS: To the best of our knowledge, this is the first computational effort that combines both proteins and ncRNAs as drug targets for drug repositioning. Our study provides a biologically relevant interpretation of how drug-disease associations are formed, which is useful for guiding future biomedical tests.
© 2022. The Author(s).

Keywords:  Canonical correlation analysis; Drug repositioning; Integrated targets

Year:  2022        PMID: 35249529      PMCID: PMC8898485          DOI: 10.1186/s12920-022-01203-1

Source DB:  PubMed          Journal:  BMC Med Genomics        ISSN: 1755-8794            Impact factor:   3.063


Background

Over 100 years ago, the Nobel laureate Paul Ehrlich established his revolutionary ‘magic bullet’ concept, which has inspired generations of chemists and pharmacologists to create target-specific drugs for disease treatment [1]. This paradigm has served as a pragmatic criterion in drug discovery for decades. However, the interpretation of the magic bullet as a drug that acts through a single crucial target in an exclusive and highly specific way has been challenged, because a growing number of studies demonstrate that drugs usually have multiple physiological targets rather than one [2-4]. This polypharmacological nature of drugs enables us to find new indications for existing drugs (also known as drug repositioning [5]). For instance, Skrott et al. [6] found that the metabolite of disulfiram binds to a new target, NPL4, which is responsible for its anti-cancer effects; the old alcohol-aversion drug can therefore be repurposed for tumour treatment. Meanwhile, unintended ‘off-targets’ may cause adverse drug reactions (ADRs) [7], which can limit the use of drugs. It is therefore necessary to discover the real targets implicated in drug indications. There are 4 potential types of macromolecules in biological systems with which small-molecule drugs can interfere: proteins, polysaccharides, lipids and nucleic acids [8]. Previous research efforts have focused mainly on the first type of molecular target [9-12]. The most recent studies in pharmacogenomics have discovered that drugs can regulate the expression levels of two categories of ncRNAs, namely microRNAs (miRNAs) and long non-coding RNAs (lncRNAs). For example, Smith et al. [13] revealed that the expression levels of 44 miRNAs are repressed during glucocorticoid-induced apoptosis, and Guo et al. [14] found that aspirin can activate the expression of a lncRNA named OLA1P2 in human colorectal cancer.
Given that ncRNAs play significant roles in disease development [15-17], targeting them with small-molecule drugs offers a new and promising type of therapy for human diseases [18-23]. As traditional biomedical experiments are expensive and time-consuming, computational approaches provide an alternative tool for drug repositioning. For example, Chen et al. [24] exploited multiple heterogeneous data to integrate a drug-disease network and a drug–target network into one coherent model, and applied cross-network embedding to predict drug-disease associations for drug repositioning. A comprehensive survey of computational drug repositioning is available in [25]. Note that previous computational approaches for drug repositioning seldom take integrated target information into consideration; they usually use only proteins as drug targets. We argue that integrating different types of targets would provide a better and more comprehensive understanding of drugs' mechanism of action (MoA). Further, these methods discover only statistical associations between drugs and diseases at the data level and seldom investigate how drug-disease associations are formed at the molecular target level. In this paper, we first comprehensively select drug targets from proteins, miRNAs and lncRNAs to construct drug–target interactions. Therapeutically verified drug indications are downloaded to form drug-disease associations. Then, we apply a CCA-based method to extract correlated sets of targets and diseases. The correlated targets and diseases help explain how drug-disease associations are formed. Finally, we predict novel drug-disease associations for drug repositioning by combining the correlated sets. Comprehensive experiments demonstrate that using integrated target information not only improves prediction performance but also provides a more extensive view of drugs' MoA. Case studies show that some top predictions are confirmed by existing databases. Compared with other methods on the benchmark datasets in our study, our approach achieves a higher AUC value.

Results

Preliminary analysis of the datasets

In total, we obtain 1190 drugs with both target and indication information. For these 1190 drugs, we obtain 5331 drug–target interactions involving 1668 targets and 5869 drug-disease associations involving 1111 diseases. Overviews of the two datasets are given in Tables 1 and 2, respectively.
Table 1

Statistics of the drug–target interactions used in our manuscript

Name | Statistics
# drugs | 1190
# total targets (including proteins, miRNAs and lncRNAs) | 1668
# proteins | 1167
# miRNAs | 348
# lncRNAs | 153
# total drug–target interactions | 5331
# drug–protein interactions | 4337
# drug–miRNA interactions | 825
# drug–lncRNA interactions | 169
Average number of targets for each drug | 4.5
Table 2

Statistics of the drug-disease associations used in our manuscript

Name | Statistics
# drugs | 1190
# diseases | 1111
# drug-disease associations | 5869
Average number of associated diseases for each drug | 4.9
We further use a boxplot (Additional file 1) to describe the distribution of the numbers of targets and indications of the 1190 drugs. We find that 885 (74.4%) drugs have fewer targets than the average (4.5) and 887 (80.0%) drugs have fewer indications than the average (4.9). Meanwhile, because ncRNAs are a newly discovered category of targets, experimentally supported drug–ncRNA interactions are far fewer than drug–protein interactions. This analysis suggests that our knowledge of drug–target interactions and drug-disease associations is incomplete.
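The averages in Tables 1 and 2 are totals divided by the number of drugs (5331/1190 ≈ 4.5 targets and 5869/1190 ≈ 4.9 indications per drug). The short sketch below shows how the per-drug average and the number of drugs below it could be computed; the function name, the input dictionary layout and the toy identifiers are hypothetical and are not taken from the study's data.

```python
def below_average_summary(drug_items):
    """Return (average count, number of drugs below it, fraction below it).

    drug_items maps each drug to the set of its targets (or indications);
    this layout is a hypothetical stand-in for the study's datasets.
    """
    counts = [len(items) for items in drug_items.values()]
    avg = sum(counts) / len(counts)
    below = sum(1 for c in counts if c < avg)
    return avg, below, below / len(counts)


# Toy example with made-up identifiers.
toy_targets = {
    "drug_a": {"P1"},
    "drug_b": {"P1", "miR-21"},
    "drug_c": {"P2", "P3", "P4", "UCA1", "miR-135b", "miR-520h"},
}
avg, below, fraction = below_average_summary(toy_targets)
print(avg, below, round(fraction, 3))  # 3.0 2 0.667

# Averages reported in Tables 1 and 2.
print(round(5331 / 1190, 1), round(5869 / 1190, 1))  # 4.5 4.9
```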

Performance evaluation

In this study, we collect both proteins and ncRNAs as drug targets. We therefore separately use proteins, ncRNAs and integrated targets to conduct 10-fold cross-validation experiments, and use average AUC values for performance evaluation. The results are summarized in Table 3. Integrating both proteins and ncRNAs yields better prediction performance than using only proteins or only ncRNAs as targets. Imposing a sparsity constraint on CCA (sparse CCA, SCCA) also improves prediction performance. Note that almost all elements of the weight vectors in ordinary CCA (OCCA) are non-zero, indicating that OCCA cannot select a small number of features as informative drug targets and indications.
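The AUC used for evaluation can be computed without tracing the ROC curve point by point: it equals the probability that a randomly chosen positive (a known drug-disease association) is scored higher than a randomly chosen negative, with ties counted as one half. The sketch below shows this together with a shuffled 10-fold split of drug indices. It is a minimal illustration under our own assumptions: the helper names and the seed are hypothetical, and the source does not state whether AUC is computed per drug or over pooled held-out pairs.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via its rank interpretation.

    Equals the probability that a random positive outscores a random
    negative, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_splits(n_items, n_folds=10, seed=0):
    """Yield (train, test) index lists for one shuffled k-fold partition."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
print([len(test) for _, test in k_fold_splits(25)])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```

Repeating the whole procedure with different seeds gives the repeated cross-validation runs whose AUC values are averaged.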
Table 3

Average AUC values received from the CCA methods based on 10-fold cross-validations

Method | Average AUC
SCCA (proteins + ncRNAs) | 0.8576
SCCA (ncRNAs) | 0.7391
SCCA (proteins) | 0.8537
OCCA (proteins + ncRNAs) | 0.8107
OCCA (ncRNAs) | 0.7283
OCCA (proteins) | 0.8106

The highest value (0.8576) is obtained by SCCA with proteins + ncRNAs.

Effects of parameters on cross-validation experiments

There are three parameters (c1, c2 and k) in our method. The parameters c1 and c2 control the sparsity, and k is the number of canonical components. For simplicity, we use the same value for c1 and c2. In the 10-fold cross-validation experiments, we vary c1 and c2 over the range [0.1, 0.9] and k over the range [60, 500]. The average AUC values are listed in Table 4. The best performance is achieved when c1 = c2 = 0.1 and k = 400.
Table 4

Average AUC values received based on 10-fold cross-validations by parameter tuning

c1 = c2 | k = 60 | k = 80 | k = 100 | k = 200 | k = 300 | k = 400 | k = 500
0.1 | 0.8146 | 0.8244 | 0.8293 | 0.8463 | 0.8542 | 0.8576 | 0.8575
0.3 | 0.8124 | 0.8124 | 0.8107 | 0.8027 | 0.8012 | 0.8014 | 0.8003
0.5 | 0.8146 | 0.8099 | 0.8026 | 0.7753 | 0.7717 | 0.7686 | 0.7649
0.7 | 0.8160 | 0.8107 | 0.8043 | 0.7752 | 0.7702 | 0.7659 | 0.7645
0.9 | 0.8160 | 0.8106 | 0.8042 | 0.7751 | 0.7702 | 0.7659 | 0.7645

The highest value (0.8576) is obtained with c1 = c2 = 0.1 and k = 400.
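The parameter tuning above is an exhaustive grid search over (c1 = c2, k). In the sketch below, `evaluate(c, k)` is a hypothetical stand-in for one full 10-fold cross-validation run; for illustration it simply looks up the average AUC values of Table 4, and the search recovers the reported optimum.

```python
# Average AUC values from Table 4, one row per c1 = c2 value.
K_VALUES = [60, 80, 100, 200, 300, 400, 500]
TABLE_4 = {
    0.1: [0.8146, 0.8244, 0.8293, 0.8463, 0.8542, 0.8576, 0.8575],
    0.3: [0.8124, 0.8124, 0.8107, 0.8027, 0.8012, 0.8014, 0.8003],
    0.5: [0.8146, 0.8099, 0.8026, 0.7753, 0.7717, 0.7686, 0.7649],
    0.7: [0.8160, 0.8107, 0.8043, 0.7752, 0.7702, 0.7659, 0.7645],
    0.9: [0.8160, 0.8106, 0.8042, 0.7751, 0.7702, 0.7659, 0.7645],
}


def grid_search(evaluate, c_values, k_values):
    """Return (c, k, score) maximising evaluate(c, k); the first maximum wins ties.

    evaluate is a hypothetical stand-in for one full 10-fold
    cross-validation run with c1 = c2 = c and k canonical components.
    """
    best = None
    for c in c_values:
        for k in k_values:
            score = evaluate(c, k)
            if best is None or score > best[2]:
                best = (c, k, score)
    return best


def table_lookup(c, k):
    """Read the average AUC for (c, k) from Table 4 instead of re-running CV."""
    return TABLE_4[c][K_VALUES.index(k)]


print(grid_search(table_lookup, list(TABLE_4), K_VALUES))  # (0.1, 400, 0.8576)
```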

Investigating drug’s MoA at the molecular target level

Drugs exert their therapeutic effects by modulating their biological targets, which in turn promotes healthy functioning of our metabolic system. As a drug usually has multiple targets, detecting the real target(s) implicated in a disease is critical for understanding drugs' MoA and for further drug repositioning. We obtain 400 canonical components (Additional file 2) that correlate targets with diseases. We use four components (#1, #3, #6 and #7) as examples to investigate the biological meaning of the extracted sets of targets and diseases, and analyse the top targets and diseases in each component. In component #1, there are 34 targets and 23 diseases with positive weights. According to the DisGeNET database [26], 4 high-ranking target proteins, Interleukin-1 beta (3rd), Caspase-1 (3rd), Caspase-3 (3rd) and Matrix metalloproteinase-9 (3rd), are associated with the top disease, Periodontitis (1st). Two of these top-scoring targets, Interleukin-1 beta (3rd) and Matrix metalloproteinase-9 (3rd), are related to another top-scoring disease, Cholera (4th). The target Caspase-3 (3rd) is associated with the disease Chlamydia trachomatis infection of genital structure (5th). Similar findings are obtained for components #3, #6 and #7; the confirmed top target-disease associations in these three components are listed in Additional files 3, 4 and 5, respectively. Besides proteins, ncRNAs are also found to be associated with diseases. For example, in component #3 the top-ranking miRNA miR-135b is related to malignant neoplasm of thyroid (4th), malignant neoplasm of lung (6th) and breast carcinoma (7th), and the top-ranking miRNA miR-520h is associated with malignant neoplasm of lung (6th) and breast carcinoma (7th). These relationships are confirmed by the HMDD database [15]. In component #7, the lncRNA UCA1 (8th) is found to be related to Leukemia, Myeloid, Chronic-Phase (3rd), which is verified by the LncRNADisease database [16].
Based on these findings, we hypothesize that drugs may act on the top-ranking targets in a canonical component to treat the top-ranking diseases in the same component.
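Within one component, the targets (or diseases) are read off the corresponding weight vector in decreasing order of weight, keeping only positive weights. A small sketch of that ranking step follows. The function name, feature names and weights are invented for illustration, and the tie rule (tied weights share a rank, standard competition ranking) is our assumption; the source does not describe how ties are ranked.

```python
def ranked_positive_features(names, weights, n_top=None):
    """List (name, rank, weight) for features with positive weight, best first.

    Tied weights share a rank (standard competition ranking: 1, 2, 2, 4, ...),
    which is an assumption, not a rule stated in the source.
    """
    positive = sorted(
        ((name, w) for name, w in zip(names, weights) if w > 0),
        key=lambda item: -item[1],
    )
    ranked = []
    for position, (name, w) in enumerate(positive, start=1):
        # Reuse the previous rank when the weight ties with the previous feature.
        rank = ranked[-1][1] if ranked and w == ranked[-1][2] else position
        ranked.append((name, rank, w))
    return ranked if n_top is None else ranked[:n_top]


# Hypothetical weights for one component's target vector.
targets = ["T_a", "T_b", "T_c", "T_d", "T_e"]
alpha = [0.5, 0.0, 0.3, 0.3, -0.1]
print(ranked_positive_features(targets, alpha))
# [('T_a', 1, 0.5), ('T_c', 2, 0.3), ('T_d', 2, 0.3)]
```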

Comparison with other methods

As mentioned before, this is the first computational effort to use integrated targets for drug repositioning. Previous computational approaches for drug repositioning were developed for the particular data features they analysed, so we choose 3 other methods that can take our datasets as inputs for comparison. The 3 baseline methods are as follows. DBSI [27]: a collaborative-filtering-based method using chemical similarity for drug–target interaction prediction. SDTNBI [28]: an integrated tool for large-scale drug–target interaction prediction using chemical substructures. MLKNN [29]: a multi-label k-nearest neighbour method for drug side-effect prediction. To make a fair comparison, we apply the 3 methods to our datasets and compare prediction performance using 10-fold cross-validation. For DBSI, we calculate drug–drug similarity as the Jaccard score of the drugs' target sets, a strategy that has been applied in other studies [30, 31]. The AUC values of these methods are shown in Table 5. We perform Wilcoxon rank-sum tests on the AUC values between SCCA and each of the other 3 methods; the p values are listed in Table 6. The results show that SCCA performs best among the 4 methods. Note also that the other 3 methods provide no clues for biological interpretation.
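DBSI scores a candidate disease for a query drug by a similarity-weighted vote of the other drugs. The sketch below assumes the formulation score(l) = Σ_j S(q, j)·a_jl / Σ_j S(q, j), with S the Jaccard score of target sets as described above and a_jl = 1 when drug j is indicated for disease l. This is our reading of DBSI, not the implementation used in the comparison, and all drug, target and disease names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard score |a & b| / |a | b| between two target sets (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dbsi_scores(query_targets, train_targets, train_diseases):
    """Similarity-weighted disease scores for one query drug.

    score(l) = sum_j S(q, j) * a_jl / sum_j S(q, j), where S is the Jaccard
    score on target sets and a_jl = 1 if training drug j treats disease l.
    """
    sims = {drug: jaccard(query_targets, t) for drug, t in train_targets.items()}
    total = sum(sims.values())
    scores = {}
    if total == 0:
        return scores  # no training drug shares a target with the query
    for drug, sim in sims.items():
        for disease in train_diseases.get(drug, ()):
            scores[disease] = scores.get(disease, 0.0) + sim / total
    return scores


train_targets = {"drug_1": {"P1", "P2"}, "drug_2": {"P2", "P3"}, "drug_3": {"P4"}}
train_diseases = {"drug_1": {"pain"}, "drug_2": {"pain", "migraine"}, "drug_3": {"malaria"}}
scores = dbsi_scores({"P1", "P2"}, train_targets, train_diseases)
print(round(scores["pain"], 3), round(scores["migraine"], 3))  # 1.0 0.25
```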
Table 5

Comparison of average AUC values with existing methods based on 10-fold cross-validations

Method | SCCA | DBSI | SDTNBI | MLKNN
AUC value | 0.8576 ± 0.0005 | 0.8413 ± 0.0022 | 0.8395 ± 0.0010 | 0.7945 ± 0.0002

The highest value is that of SCCA.

Table 6

The p-values received from Wilcoxon rank sum tests

Method compared with SCCA | DBSI | SDTNBI | MLKNN
p value (based on AUC values) | 1.6305E−04 | 1.7168E−04 | 1.6973E−04
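Table 6 reports Wilcoxon rank-sum p values computed on AUC values. The text does not say whether an exact or an approximate test was used, so the sketch below implements the common normal approximation with average ranks for ties, a tie-corrected variance and a 0.5 continuity correction, assuming each method contributes one AUC per repeated cross-validation run. The AUC lists in the usage example are hypothetical, not the study's values.

```python
import math


def ranksum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p value via the normal approximation.

    Uses average ranks for ties, the tie-corrected variance and a 0.5
    continuity correction.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    nx, ny = len(x), len(y)
    n = nx + ny
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = nx * (n + 1) / 2
    var_w = nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return 1.0  # every value tied: no evidence of a difference
    z = max(abs(w - mean_w) - 0.5, 0.0) / math.sqrt(var_w)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


# Hypothetical per-repeat average AUCs (10 repeats each), not the study's values.
scca_aucs = [0.8570 + 0.0001 * i for i in range(10)]
baseline_aucs = [0.8390 + 0.0004 * i for i in range(10)]
print(f"{ranksum_pvalue(scca_aucs, baseline_aucs):.1e}")  # 1.8e-04
```

With 10 values per method and complete separation, the approximation gives a p value of about 1.8 × 10⁻⁴; an exact test would give a smaller value for the same data.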

New indication prediction for existing drugs

Having confirmed the predictive ability of our method, we further apply it to predict new indications for drugs that are not in the benchmark datasets but whose target information is available. There are 789 such drugs. All known information in our gold-standard datasets, including drug–target interactions and drug-disease associations, is used for training. The potential indications are ranked by their SCCA prediction scores in descending order. We list the top 50 predicted results for the 789 drugs in Additional file 6 for future screening. We further validate the top k (k = 5, 10, 20, 30 and 50) predictions against the public database CTD [32], a knowledgebase that houses information on chemicals, genes, phenotypes, diseases and exposures to advance understanding of human health. As this database contains both inferred and curated records, we use only curated drug-disease associations for confirmation. The numbers of confirmed drug indications in the top k predictions are shown in Fig. 1. Because of space limitations, Table 7 reports only the top-1 predictions supported by CTD. More detailed information on the verified drug-disease associations in the top 50 predictions is available in Additional file 7. These results suggest that our method can be applied in practice.
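The counts in Fig. 1 amount to checking, for each cutoff k, how many (drug, disease) pairs in the drugs' top-k lists appear among curated records. A minimal sketch follows; the function name and the toy ranked lists and curated pairs are hypothetical, standing in for the SCCA rankings and curated CTD records.

```python
def confirmed_in_top_k(ranked, curated, k_values=(5, 10, 20, 30, 50)):
    """For each k, count (drug, disease) pairs in the top-k lists found in `curated`.

    ranked maps a drug to its candidate diseases sorted by descending score;
    curated is a set of (drug, disease) pairs, e.g. curated CTD records.
    """
    return {
        k: sum(
            1
            for drug, diseases in ranked.items()
            for disease in diseases[:k]
            if (drug, disease) in curated
        )
        for k in k_values
    }


ranked = {"drug_a": ["disease_1", "disease_2", "disease_3"], "drug_b": ["disease_4", "disease_1"]}
curated = {("drug_a", "disease_2"), ("drug_b", "disease_4")}
print(confirmed_in_top_k(ranked, curated, k_values=(1, 2, 3)))  # {1: 1, 2: 2, 3: 2}
```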
Fig. 1

The numbers of validated indications by CTD in the top k predictions for the 789 drugs

Table 7

The confirmed results in the top 1 drug indication predictions by CTD

Drug name | Disease name | Ranking in the prediction list | Evidence
Troglitazone | Hypertriglyceridemia | Top 1 | CTD
Methysergide | Migraine disorders | Top 1 | CTD
Ropivacaine | Pruritus | Top 1 | CTD
Tenofovir disoproxil | HIV infections | Top 1 | CTD
Remoxipride | Schizophrenia | Top 1 | CTD
Rosiglitazone | Hypercholesterolemia | Top 1 | CTD
Cerivastatin | Hypercholesterolemia | Top 1 | CTD
Meperidine | Pain | Top 1 | CTD
Dronabinol | Obesity | Top 1 | CTD
Phenindione | Thromboembolism | Top 1 | CTD
Amodiaquine | Malaria, falciparum | Top 1 | CTD
Alfentanil | Pain | Top 1 | CTD
Risedronic acid | Osteoporosis, postmenopausal | Top 1 | CTD
Levobupivacaine | Pruritus | Top 1 | CTD
Ketamine | Pain | Top 1 | CTD
Sulfadoxine | Malaria, falciparum | Top 1 | CTD
Methotrimeprazine | Schizophrenia | Top 1 | CTD
Acenocoumarol | Thromboembolism | Top 1 | CTD
Diamorphine | Pain | Top 1 | CTD
Pimavanserin | Schizophrenia | Top 1 | CTD
Ciprofibrate | Hypertriglyceridemia | Top 1 | CTD
Vitamin d | Hypoparathyroidism | Top 1 | CTD
Elagolix | Endometriosis | Top 1 | CTD
mg132 | Multiple myeloma | Top 1 | CTD

Discussion

Uncovering drugs' MoA is of great importance for drug repositioning. In vivo and in vitro experiments are useful but expensive tools for addressing this problem. Our CCA-based computational method provides an alternative way to reveal the targets implicated in drug indications, and the results suggest that the extracted sets of targets and diseases are biologically meaningful. Compared with previous studies, we integrate both proteins and ncRNAs as drug targets, and experiments demonstrate that using integrated targets improves prediction performance. Although our method has been shown to be useful for drug repositioning, some limitations of this study need to be pointed out. First, our method depends heavily on known drug–target interactions and drug-disease associations. Many drug targets (especially drug–ncRNA interactions) and drug indications have not yet been discovered, and this incompleteness of the data can bias the prediction results. We expect that incorporating more experimentally confirmed drug–target interactions and drug-disease associations would provide more reliable predictions. Second, there are 3 parameters in our method, and selecting appropriate values for them to obtain optimal results is a challenging task. Third, the number of extracted components is determined by the parameter k (see Eq. 4), and different numbers of extracted components would influence our interpretation of drugs' MoA. Recently, a growing number of studies [24, 33–43] have exploited features of both drugs and diseases for drug repositioning; integrating these features may provide more reliable prediction results. Another trend in drug repositioning is drug combinations [44-46] (see [47] for a review), which can lead to fewer adverse side effects and higher treatment efficacy than single-drug administration. We believe these efforts can aid drug discovery and disease treatment from different perspectives.

Conclusions

In this study, we apply a CCA-based method to extract correlated sets of targets and diseases; the correlated targets and diseases provide clues for explaining drugs' MoA for drug repositioning. We further propose a prediction scheme for drug repositioning based on the extracted correlated sets. Cross-validation results indicate that integrating different categories of targets and imposing a sparsity constraint on CCA improve prediction performance. Case studies show that some of the top predictions of our method are supported by the literature. Moreover, our method achieves better prediction performance than the other approaches compared. We expect that our study offers a useful tool for drug repositioning.

Methods

Data preparation

We collect two datasets, namely drug–target interactions and drug-disease associations, from public databases. The two datasets are regarded as gold-standard data: we use them as benchmark datasets to evaluate the performance of our method, and as training datasets for comprehensive indication prediction. For drug–target interactions, we integrate 3 categories of macromolecules (proteins, miRNAs and lncRNAs) as drug targets. We obtain drug–protein interactions from DrugBank [48], a freely available web resource containing detailed information about drugs, their mechanisms, their interactions and their targets. We select only small-molecule drugs and approved targets in DrugBank. We download drug–miRNA interactions and drug–lncRNA interactions from SM2miR [49] and D-lnc [50], respectively; these two databases provide comprehensive repositories of the effects of drugs on miRNA and lncRNA expression. We restrict the species to Homo sapiens in both databases and exclude inferred results in D-lnc. Drug-disease associations are obtained from repoDB [51], a database of approved and failed drugs and their indications. We keep only the approved drug-indication pairs.
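Once the pairs are collected, both datasets reduce to binary matrices with one row per drug, restricted to drugs that have both target and indication records (as with the 1190 drugs described in the Results). A minimal sketch of that step follows; the function name and the toy pair lists are hypothetical and are not entries from DrugBank, SM2miR, D-lnc or repoDB.

```python
def build_binary_matrices(drug_target_pairs, drug_disease_pairs):
    """Return (drugs, targets, diseases, X, Y) with X[i][j] = 1 if drug i hits target j.

    Only drugs present in both pair lists are kept; proteins, miRNAs and
    lncRNAs share one target vocabulary (one column per target).
    """
    keep = {d for d, _ in drug_target_pairs} & {d for d, _ in drug_disease_pairs}
    drugs = sorted(keep)
    targets = sorted({t for d, t in drug_target_pairs if d in keep})
    diseases = sorted({s for d, s in drug_disease_pairs if d in keep})
    row = {d: i for i, d in enumerate(drugs)}
    t_col = {t: j for j, t in enumerate(targets)}
    s_col = {s: j for j, s in enumerate(diseases)}
    X = [[0] * len(targets) for _ in drugs]
    Y = [[0] * len(diseases) for _ in drugs]
    for d, t in drug_target_pairs:
        if d in keep:
            X[row[d]][t_col[t]] = 1
    for d, s in drug_disease_pairs:
        if d in keep:
            Y[row[d]][s_col[s]] = 1
    return drugs, targets, diseases, X, Y


# Toy pairs with made-up identifiers.
dt = [("drug_a", "P1"), ("drug_a", "miR-21"), ("drug_b", "UCA1"), ("drug_c", "P2")]
dd = [("drug_a", "disease_1"), ("drug_b", "disease_2")]
drugs, targets, diseases, X, Y = build_binary_matrices(dt, dd)
print(drugs)    # ['drug_a', 'drug_b']  (drug_c has no indication record)
print(targets)  # ['P1', 'UCA1', 'miR-21']
print(X)        # [[1, 0, 1], [0, 1, 0]]
```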

Method description

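Suppose that we have a set of m drugs with p molecular target features and q disease features. We represent the i-th drug (i = 1, 2, …, m) by a target feature vector $\mathbf{t}_i = (t_{i1}, t_{i2}, \ldots, t_{ip})^\top$ and a disease feature vector $\mathbf{d}_i = (d_{i1}, d_{i2}, \ldots, d_{iq})^\top$, where $t_{ij}$ (or $d_{ij}$) is 1 if the corresponding target (or disease) is present and 0 otherwise. Consider two linear combinations of targets and diseases, $u_i = \boldsymbol{\alpha}^\top \mathbf{t}_i$ and $v_i = \boldsymbol{\beta}^\top \mathbf{d}_i$ (i = 1, 2, …, m), where $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_p)^\top$ and $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_q)^\top$ are weight vectors. We apply canonical correlation analysis [52] to find the weight vectors α and β that maximize the following correlation coefficient:

$$\rho = \frac{\sum_{i=1}^{m} u_i v_i}{\sqrt{\sum_{i=1}^{m} u_i^2}\,\sqrt{\sum_{i=1}^{m} v_i^2}} \tag{1}$$

Let X denote the m × p matrix whose i-th row is $\mathbf{t}_i^\top$, and Y the m × q matrix whose i-th row is $\mathbf{d}_i^\top$. Then the maximization problem can be formally rewritten as follows:

$$\max_{\boldsymbol{\alpha},\,\boldsymbol{\beta}} \; \boldsymbol{\alpha}^\top X^\top Y \boldsymbol{\beta} \quad \text{subject to} \quad \boldsymbol{\alpha}^\top X^\top X \boldsymbol{\alpha} = 1, \; \boldsymbol{\beta}^\top Y^\top Y \boldsymbol{\beta} = 1 \tag{2}$$

We refer to this as ordinary canonical correlation analysis (OCCA). OCCA usually results in vectors α and β with many non-zero elements. To impose sparsity on α and β, we add penalties to (2) as in [53-55]; following the usual sparse CCA practice of replacing $X^\top X$ and $Y^\top Y$ by identity matrices, the maximization problem becomes:

$$\max_{\boldsymbol{\alpha},\,\boldsymbol{\beta}} \; \boldsymbol{\alpha}^\top X^\top Y \boldsymbol{\beta} \quad \text{subject to} \quad \|\boldsymbol{\alpha}\|_2^2 \le 1, \; \|\boldsymbol{\beta}\|_2^2 \le 1, \; \|\boldsymbol{\alpha}\|_1 \le c_1, \; \|\boldsymbol{\beta}\|_1 \le c_2 \tag{3}$$

where c1 and c2 are parameters that control the sparsity. We refer to this as sparse canonical correlation analysis (SCCA). We apply the penalized matrix decomposition (PMD) strategy [56] to the matrix $Z = X^\top Y$ to obtain the weight vectors α and β. To obtain multiple pairs of canonical variates, we iteratively apply the following deflation:

$$Z^{(1)} = X^\top Y, \qquad Z^{(j+1)} = Z^{(j)} - d_j\, \boldsymbol{\alpha}_j \boldsymbol{\beta}_j^\top \quad (j = 1, 2, \ldots, k-1) \tag{4}$$

where $\boldsymbol{\alpha}_j$ and $\boldsymbol{\beta}_j$ are the weight vectors and $d_j = \boldsymbol{\alpha}_j^\top Z^{(j)} \boldsymbol{\beta}_j$ is the singular value obtained at the j-th iteration step. In each of the k pairs of weight vectors, the targets and diseases with the highest weights are chosen as a correlated set. To predict new indications for a drug with a known target vector $\mathbf{x}$, we compute the score vector $\mathbf{y}$ by combining the k pairs of weight vectors according to the following equation:

$$\mathbf{y} = \sum_{j=1}^{k} \rho_j\, \boldsymbol{\beta}_j\, \boldsymbol{\alpha}_j^\top \mathbf{x} \tag{5}$$

where $\rho_j$ is the correlation coefficient (1) of the j-th pair. The elements of $\mathbf{y}$ with the highest scores are chosen as the predicted indications for the drug. This prediction strategy was used in previous studies [53, 54]. The workflow of our method is depicted in Fig. 2.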
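The procedure above (PMD on Z = XᵀY, deflation as in Eq. 4, and score combination as in Eq. 5) can be sketched in plain Python as follows. This is not the authors' code (which is provided in Additional file 8) and it rests on assumptions we state explicitly: c1 and c2 are treated as absolute L1 bounds, which PMD requires to satisfy 1 ≤ c ≤ √(dimension), so the paper's values in [0.1, 0.9] are presumably on a different scale that the text does not specify; the alternating updates start from the column of Z with the largest norm and run a fixed number of times; and each pair is weighted in the prediction by its correlation ρ_j computed as in Eq. (1).

```python
import math


def l1(v):
    return sum(abs(a) for a in v)


def l2(v):
    return math.sqrt(sum(a * a for a in v))


def soft(v, delta):
    """Soft-thresholding: sign(a) * max(|a| - delta, 0) for each entry."""
    return [math.copysign(max(abs(a) - delta, 0.0), a) for a in v]


def sparse_unit(v, c, iters=60):
    """PMD update: unit-L2 vector from soft-thresholded v with L1 norm <= c (c >= 1)."""
    norm = l2(v)
    if norm == 0:
        return [0.0] * len(v)
    out = [a / norm for a in v]
    if l1(out) <= c:
        return out  # constraint already holds with no thresholding
    lo, hi = 0.0, max(abs(a) for a in v)
    for _ in range(iters):  # binary search for the threshold
        mid = (lo + hi) / 2
        s = soft(v, mid)
        if l2(s) > 0 and l1(s) / l2(s) > c:
            lo = mid
        else:
            hi = mid
    s = soft(v, hi)
    if l2(s) == 0:  # guard: never threshold every entry away
        s = soft(v, lo)
    norm = l2(s)
    return [a / norm for a in s]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rank1_pmd(Z, c1, c2, iters=100):
    """One sparse pair (alpha, beta) and d = alpha^T Z beta for the current Z."""
    q = len(Z[0])
    # Start beta at the column of Z with the largest norm (our choice).
    j0 = max(range(q), key=lambda j: sum(row[j] ** 2 for row in Z))
    beta = [1.0 if j == j0 else 0.0 for j in range(q)]
    Zt = [list(col) for col in zip(*Z)]
    for _ in range(iters):  # fixed number of alternating updates
        alpha = sparse_unit(matvec(Z, beta), c1)
        beta = sparse_unit(matvec(Zt, alpha), c2)
    d = sum(a * z for a, z in zip(alpha, matvec(Z, beta)))
    return alpha, beta, d


def correlation(X, Y, alpha, beta):
    """Correlation of Eq. (1) between u = X alpha and v = Y beta."""
    u, v = matvec(X, alpha), matvec(Y, beta)
    den = l2(u) * l2(v)
    return sum(a * b for a, b in zip(u, v)) / den if den else 0.0


def scca(X, Y, c1, c2, k):
    """Return k (alpha_j, beta_j, rho_j) triples, deflating Z = X^T Y as in Eq. (4)."""
    Yt = [list(col) for col in zip(*Y)]
    Z = [matvec(Yt, xcol) for xcol in zip(*X)]  # Z[a][b] = sum_i X[i][a] * Y[i][b]
    components = []
    for _ in range(k):
        alpha, beta, d = rank1_pmd(Z, c1, c2)
        components.append((alpha, beta, correlation(X, Y, alpha, beta)))
        Z = [[z - d * a * b for z, b in zip(row, beta)] for row, a in zip(Z, alpha)]
    return components


def predict(x, components):
    """Score vector y = sum_j rho_j * beta_j * (alpha_j^T x), the assumed form of Eq. (5)."""
    y = [0.0] * len(components[0][1])
    for alpha, beta, rho in components:
        s = rho * sum(a * xi for a, xi in zip(alpha, x))
        y = [yi + s * b for yi, b in zip(y, beta)]
    return y


# Toy data: 4 drugs, 3 targets, 3 diseases.
X = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
components = scca(X, Y, c1=1.0, c2=1.0, k=2)
scores = predict([1, 0, 0], components)
print(max(range(len(scores)), key=scores.__getitem__))  # 0
```

In this toy case, the first component links target 0 with disease 0 and the second links target 1 with disease 1, so a drug hitting target 0 is scored highest for disease 0.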
Fig. 2

The workflow of our proposed method. Drug–target interactions and drug-disease associations are first downloaded from public databases. CCA is then applied to the two datasets to extract correlated sets. Finally, new drug-disease associations are predicted by combining the extracted sets. The top predictions are selected as new indications for drugs of interest.

Evaluation metrics

To test the prediction performance of our method, we perform 10-fold cross-validation on the drugs. We split the drugs into 10 subsets of roughly equal size; each subset is used in turn as the test set while our method is trained on the remaining 9 subsets. We rank the inferred drug-disease associations by their prediction scores (see Eq. (5)). By varying the threshold, we calculate the true positive rate (TPR) and false positive rate (FPR) to plot ROC curves, and compute the area under the ROC curve (AUC) for performance evaluation. To obtain robust results, we repeat the cross-validation experiments 10 times. Moreover, we predict novel drug-disease associations for drug repositioning for the drugs not included in the benchmark datasets, and analyse the top-ranked results by searching for evidence in the public database CTD [32]. Note that we use only curated records of drug indications in this database for prediction confirmation.

Additional file 1. Distribution of numbers of targets and indications of the 1190 drugs.
Additional file 2. The extracted 400 correlated sets by SCCA.
Additional file 3. Confirmed top-ranking target-disease associations in component #3.
Additional file 4. Confirmed top-ranking target-disease associations in component #6.
Additional file 5. Confirmed top-ranking target-disease associations in component #7.
Additional file 6. The top 50 predicted indications for the 789 drugs.
Additional file 7. The verified drug-disease associations in the top 50 predictions.
Additional file 8. The source code and data sets used in this study.
References (10 of 52 shown)

S Zhang, L Chen, E J Jung, G A Calin. Targeting microRNAs with small molecules: from dream to reality. Clin Pharmacol Ther. 2010-04-28.

Irene Nobeli, Angelo D Favia, Janet M Thornton. Protein promiscuity and its implications for biotechnology (Review). Nat Biotechnol. 2009-02.

Randall T Peterson. Chemical biology and the limits of reductionism. Nat Chem Biol. 2008-11.

Jiao Li, Si Zheng, Bin Chen, Atul J Butte, S Joshua Swamidass, Zhiyong Lu. A survey of current trends in computational drug repositioning (Review). Brief Bioinform. 2015-03-31.

Xinyi Liu, Shuyuan Wang, Fanlin Meng, Jizhe Wang, Yan Zhang, Enyu Dai, Xuexin Yu, Xia Li, Wei Jiang. SM2miR: a database of the experimentally validated small molecules' effects on microRNA expression. Bioinformatics. 2012-12-05.

Edouard Pauwels, Véronique Stoven, Yoshihiro Yamanishi. Predicting drug side-effect profiles: a chemical fragment-based approach. BMC Bioinformatics. 2011-05-18.

Haiyan Guo, Jun Liu, Qiwen Ben, Yuehong Qu, Man Li, Ying Wang, Wantao Chen, Jianjun Zhang. The aspirin-induced long non-coding RNA OLA1P2 blocks phosphorylated STAT3 homodimer formation. Genome Biol. 2016-02-22.

Mengyun Yang, Huimin Luo, Yaohang Li, Jianxin Wang. Drug repositioning based on bounded nuclear norm regularization. Bioinformatics. 2019-07-15.

Allan Peter Davis, Cynthia J Grondin, Robin J Johnson, Daniela Sciaky, Jolene Wiegers, Thomas C Wiegers, Carolyn J Mattingly. Comparative Toxicogenomics Database (CTD): update 2021. Nucleic Acids Res. 2021-01-08.

Aleksandr Ianevski, Anil K Giri, Tero Aittokallio. SynergyFinder 2.0: visual analytics of multi-drug combination synergies. Nucleic Acids Res. 2020-07-02.