Hansaim Lim, Aleksandar Poleksic, Yuan Yao, Hanghang Tong, Di He, Luke Zhuang, Patrick Meng, Lei Xie.
Abstract
Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized or characterized. These off-target interactions can be responsible for either therapeutic effects or side effects. Thus, identifying the genome-wide off-targets of lead compounds or existing drugs will be critical for designing effective and safe drugs, and will provide new opportunities for drug repurposing. Although many computational methods have been developed to predict drug-target interactions, they are either less accurate than the method proposed here or too computationally intensive, which limits their capability for large-scale off-target identification. In addition, most machine-learning-based algorithms have mainly been evaluated on predicting off-target interactions within the same gene family for hundreds of chemicals. It is not clear how these algorithms perform in detecting off-targets across gene families on a proteome scale. Here, we present a fast and accurate off-target prediction method, REMAP, which is based on a dual regularized one-class collaborative filtering algorithm, to explore continuous chemical space, protein space, and their interactome on a large scale. When tested on a reliable, extensive, cross-gene-family benchmark, REMAP outperforms the state-of-the-art methods. Furthermore, REMAP is highly scalable: it can screen a dataset of 200 thousand chemicals against 20 thousand proteins within 2 hours. Using the reconstructed genome-wide target profile as the fingerprint of a chemical compound, we predicted that seven FDA-approved drugs can be repurposed as novel anti-cancer therapies. The anti-cancer activity of six of them is supported by experimental evidence.
Thus, REMAP is a valuable addition to the existing in silico toolbox for drug target identification, drug repurposing, phenotypic screening, and side effect prediction. The software and benchmark are available at https://github.com/hansaimlim/REMAP.
Year: 2016 PMID: 27716836 PMCID: PMC5055357 DOI: 10.1371/journal.pcbi.1005135
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Table 1. The symbols and the descriptions for numerical calculations
| Symbol | Definition and Description |
|---|---|
| | The adjacency matrix of the known drug-target associations |
| | The chemical-chemical and the target-target similarity matrices |
| | The chemical-chemical similarity score for the chemicals |
| | The Tanimoto dissimilarity coefficient for the chemicals |
| | The target-target similarity score for the query protein |
| | The bit score for the query protein |
| | The degree matrices of |
| | The chemical-side and the target-side low-rank approximation matrices |
| | The element of |
| | The |
| | The |
| | The transpose matrix of |
| | The trace of |
| | The penalty weight on observed and unobserved associations which indicate the reliability of assigned probability of true association |
| | The imputed value (i.e. the probability of unobserved associations as real associations) |
| | The regularization parameter to prevent overfitting |
| | The importance parameter for chemical-chemical similarity |
| | The importance parameter for protein-protein similarity |
| | The rank of the low-rank approximation matrices |
| | The number of maximum iterations to minimize the objective function |
| | The raw prediction score by REMAP for the |
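The quantities listed above (association matrix, similarity and degree matrices, low-rank factors, penalty weights, imputed values, and the regularization and importance parameters) fit the general shape of a dual-regularized weighted low-rank objective. One common way to write such an objective is sketched below. The symbol names are illustrative choices and not necessarily the paper's notation: $\tilde{R}_{ij}$ is 1 for a known association and the imputed value otherwise, $w_{ij}$ is the penalty weight, $C$ and $T$ are the chemical-chemical and target-target similarity matrices with degree matrices $D_C$ and $D_T$, and $\lambda$, $\alpha$, $\beta$ are the regularization and importance parameters.

```latex
\min_{U,\,V}\;
\sum_{i,j} w_{ij}\,\bigl(\tilde{R}_{ij} - U_{i}V_{j}^{\top}\bigr)^{2}
+ \lambda\,\bigl(\lVert U\rVert_F^{2} + \lVert V\rVert_F^{2}\bigr)
+ \alpha\,\operatorname{tr}\!\bigl(U^{\top}(D_{C}-C)\,U\bigr)
+ \beta\,\operatorname{tr}\!\bigl(V^{\top}(D_{T}-T)\,V\bigr)
```

The two trace terms are graph-Laplacian penalties: they are small when chemicals (or targets) with high similarity receive similar rows in $U$ (or $V$).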
Fig 1. The overall process of REMAP. The rectangular boxes with capitalized symbols are matrices, and the smaller boxes and ovals are chemicals and proteins, respectively, in the simplified network representation (top-left corner).
Solid lines within the network represent connectivity (edges), and the arrows represent mathematical processes. Red squares represent single similarity values, and blue bars in U and V represent row and column vectors. Lower-case c and p represent chemicals and proteins, respectively. The letter symbols are annotated in Table 1.
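The process in Fig 1 (an association matrix approximated by the product of a chemical-side factor U and a protein-side factor V, guided by two similarity matrices) can be illustrated with a small gradient-descent sketch. This is not the REMAP solver: the update rule, the toy data, and all parameter names and values (`w_obs`, `w_unobs`, `imputed`, `reg`, `a_chem`, `a_prot`, `lr`) are assumptions chosen only to make the idea concrete.

```python
import random


def factorize(R, M, N, rank=2, w_obs=1.0, w_unobs=0.1, imputed=0.1,
              reg=0.1, a_chem=0.1, a_prot=0.1, lr=0.05, iters=1000, seed=0):
    """Fit R ~ U V^T by plain gradient descent (illustrative sketch only).

    R: n x m list of 0/1 known associations (rows: chemicals, cols: proteins).
    M: n x n chemical-chemical similarity; N: m x m protein-protein similarity.
    Constant factors of 2 in the gradients are folded into the step size lr.
    """
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(iters):
        gU = [[0.0] * rank for _ in range(n)]
        gV = [[0.0] * rank for _ in range(m)]
        # Weighted squared error: observed pairs target 1; unobserved pairs
        # target a small imputed value and carry a lower weight.
        for i in range(n):
            for j in range(m):
                observed = R[i][j] == 1
                weight = w_obs if observed else w_unobs
                target = 1.0 if observed else imputed
                pred = sum(U[i][k] * V[j][k] for k in range(rank))
                err = weight * (pred - target)
                for k in range(rank):
                    gU[i][k] += err * V[j][k]
                    gV[j][k] += err * U[i][k]
        # Ridge penalty plus graph-Laplacian smoothing: similar chemicals
        # (or proteins) are pulled toward similar latent vectors.
        for i in range(n):
            for k in range(rank):
                lap = sum(M[i][a] * (U[i][k] - U[a][k]) for a in range(n))
                gU[i][k] += reg * U[i][k] + a_chem * lap
        for j in range(m):
            for k in range(rank):
                lap = sum(N[j][b] * (V[j][k] - V[b][k]) for b in range(m))
                gV[j][k] += reg * V[j][k] + a_prot * lap
        for i in range(n):
            for k in range(rank):
                U[i][k] -= lr * gU[i][k]
        for j in range(m):
            for k in range(rank):
                V[j][k] -= lr * gV[j][k]
    return U, V


def score(U, V, i, j):
    """Raw prediction score for chemical i and protein j."""
    return sum(u * v for u, v in zip(U[i], V[j]))


# Hypothetical toy data: chemicals 0 and 1 are similar, as are 2 and 3.
R = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1],
     [0, 0, 1]]
M = [[1.0, 0.9, 0.0, 0.0],
     [0.9, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.9],
     [0.0, 0.0, 0.9, 1.0]]
N = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
U, V = factorize(R, M, N)
```

In this toy setting, chemical 1 has no recorded association with protein 1, but its similar neighbor (chemical 0) does, so the unobserved pair (1, 1) is expected to score higher than the unrelated pair (1, 2).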
Fig 2. (A) REMAP score distributions for active (blue), inactive (orange), and ambiguous (green) pairs. For each bin of raw prediction scores (x-axis, bin width = 0.05), the number of pairs in the bin was divided by the total number of pairs of that type (totals are shown in the plot). Raw prediction scores over 1.10 were regarded as outliers and are not included in the figure. Active pairs were obtained from the ZINC and ChEMBL databases; inactive and ambiguous pairs were obtained from the ChEMBL database. (B) Adjusted scores for each bin of raw prediction scores (x-axis, same bin width as A), using adjustment by the counts only (blue) and adjustment with weighted counts (orange). A weight of 5.25 was given to the counts of inactive pairs, as explained in the prediction score adjustment section.
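The binning described in this caption can be sketched as follows. The caption does not give the adjustment formula, so the ratio used here (active count divided by active count plus weighted inactive count, per bin) is an assumption, as are the function and parameter names; only the bin width of 0.05, the 1.10 outlier cutoff, and the weight of 5.25 come from the caption.

```python
def adjusted_scores(active, inactive, bin_width=0.05, max_score=1.10,
                    inactive_weight=1.0):
    """Per-bin adjusted score: active / (active + weight * inactive).

    The ratio is an assumed form of the adjustment, not taken from the paper.
    Scores above max_score are skipped as outliers; empty bins yield None.
    """
    n_bins = round(max_score / bin_width)
    act = [0] * n_bins
    ina = [0] * n_bins
    for scores, counts in ((active, act), (inactive, ina)):
        for s in scores:
            if 0.0 <= s <= max_score:
                counts[min(int(s / bin_width), n_bins - 1)] += 1
    out = []
    for a, i in zip(act, ina):
        denom = a + inactive_weight * i
        out.append(a / denom if denom > 0 else None)
    return out


# Hypothetical raw scores, all falling in the bin that starts at 0.50.
active = [0.52, 0.53, 0.54]
inactive = [0.51]
counts_only = adjusted_scores(active, inactive)
weighted = adjusted_scores(active, inactive, inactive_weight=5.25)
```

With a weight above 1 on inactive pairs, the adjusted value of a bin drops relative to the counts-only version, which is the qualitative difference between the two curves in panel B.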
Fig 3. Performance comparison for REMAP (green), PRW (blue), and NRLMF (orange).
NT2 (2 known targets per chemical) datasets were used for varying numbers of ligands (A) and chemical structural similarity (B). The performance measurement is explained in the section on measuring prediction accuracy of REMAP by TPR vs. cutoff rank. (A) Performance comparison on the datasets with varying numbers of ligands per protein. For example, the x-axis label L11to15 means that the proteins of interest have between 11 and 15 known binding chemicals. (B) Performance comparison on the datasets grouped by the chemical structural similarity of the tested chemicals to the trained chemicals. For instance, the x-axis label Tc0.6to0.7 means that, for each tested chemical, at least one trained chemical was found with a similarity in the 0.6 to 0.7 range and no trained chemical was found with a similarity greater than 0.7. All TPR values are based on 10-fold cross validation. Error bars represent s.e.m. Asterisks represent statistical significance based on a t-test of the 10 TPR values (* for p < 0.05, ** for p < 0.001).
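A true positive rate at a cutoff rank can be computed along the following lines. This is a minimal sketch assuming that each held-out true pair is ranked among all candidate proteins for its chemical by predicted score; the data layout and function name are hypothetical, and the benchmark's exact ranking protocol (for example, how training pairs are excluded) is not specified here.

```python
def tpr_at_cutoff(scores, held_out, cutoff):
    """Fraction of held-out true pairs ranked within the top `cutoff`.

    scores: dict mapping chemical -> dict mapping protein -> predicted score.
    held_out: list of (chemical, protein) true pairs hidden during training.
    """
    hits = 0
    for chem, prot in held_out:
        row = scores[chem]
        # Rank 1 is the best score; ties do not push the true pair down.
        rank = 1 + sum(1 for s in row.values() if s > row[prot])
        if rank <= cutoff:
            hits += 1
    return hits / len(held_out)


# Hypothetical scores for two chemicals against three proteins.
scores = {
    "c1": {"p1": 0.9, "p2": 0.4, "p3": 0.1},
    "c2": {"p1": 0.2, "p2": 0.3, "p3": 0.8},
}
held_out = [("c1", "p1"), ("c2", "p2")]
tpr_top1 = tpr_at_cutoff(scores, held_out, cutoff=1)
tpr_top2 = tpr_at_cutoff(scores, held_out, cutoff=2)
```

Sweeping `cutoff` over a range of ranks gives the kind of TPR vs. cutoff rank curve that the figure summarizes.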
Fig 4. Performance comparison for REMAP (green), PRW (blue), and NRLMF (orange).
NT3 (3 or more known targets per chemical) datasets were used for varying numbers of ligands (A) and chemical structural similarity (B). The performance measurement is explained in the section on measuring prediction accuracy of REMAP by TPR vs. cutoff rank. (A) Performance comparison on the datasets with varying numbers of ligands per protein. For example, the x-axis label L21more means that the proteins of interest have 21 or more known binding chemicals. (B) Performance comparison on the datasets grouped by the chemical structural similarity of the tested chemicals to the trained chemicals. For instance, the x-axis label Tc0.5to0.6 means that, for each tested chemical, at least one trained chemical was found with a similarity in the 0.5 to 0.6 range and no trained chemical was found with a similarity greater than 0.6. All TPR values are based on 10-fold cross validation. Error bars represent s.e.m. Asterisks represent statistical significance based on a t-test of the 10 TPR values (* for p < 0.05, ** for p < 0.001).
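The similarity bins in panel B can be illustrated by assigning each tested chemical to a bin according to its highest similarity to any trained chemical. The sketch below assumes fingerprints represented as sets of on-bits and the Tanimoto similarity; the bin edges, the boundary handling (lower bound exclusive, upper bound inclusive), and the function names are assumptions for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_bin(test_fp, train_fps, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Label a tested chemical by its maximum similarity to the training set.

    Returns a label such as 'Tc0.6to0.7', or None if the maximum similarity
    falls outside the assumed bin edges.
    """
    best = max(tanimoto(test_fp, fp) for fp in train_fps)
    for lo, hi in zip(edges, edges[1:]):
        if lo < best <= hi:
            return f"Tc{lo}to{hi}"
    return None


# Hypothetical fingerprints: the closest training chemical shares 4 of 6 bits.
test_fp = {1, 2, 3, 4}
train_fps = [{1, 2, 3, 5}, {1, 2, 3, 4, 6, 7}]
label = similarity_bin(test_fp, train_fps)
```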
Fig 5. Performance of REMAP according to the amount of chemical-chemical or protein-protein similarity information used, for 10-fold cross validation on the ZINC dataset.
(A) True positive rate at the given cutoff rank, with all available chemical and protein similarity information included (blue), half of the chemical-chemical similarity ignored (orange), and the entire chemical-chemical similarity ignored (green). (B) The blue line is the same as in A. Half of the protein-protein similarity matrix was ignored (gray), or the entire protein-protein similarity was ignored (red).
Fig 6. Performance of REMAP according to the importance parameters for the chemical-chemical or the protein-protein similarity information, for 10-fold cross validation on the ZINC dataset.
(A) The chemical-chemical similarity importance parameter was varied while the protein-protein similarity importance parameter was fixed at 0.1. (B) The protein-protein similarity importance parameter was varied while the chemical-chemical similarity importance parameter was fixed at 0.1.
Fig 7. Average running times of REMAP using a single core node with 2.88 GB of memory. All running times are in seconds.
(A) Average running times on the ZINC dataset (12,384 chemicals and 3,500 proteins) according to the rank of the low-rank approximation (r). The orange line is the linear fit (R² = 0.9856). (B) Average running times according to the number of proteins (columns), from 1,000 to 20,000. The number of chemicals (rows) was fixed at 200,000. Error bars represent s.e.m., with n ≥ 15 for (A) and n ≥ 30 for (B).
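A linear fit with its coefficient of determination, as reported in panel A, can be computed by ordinary least squares. The sketch below is generic; the rank and timing values are placeholders invented for illustration and are not measurements from the paper.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Placeholder data (not from the paper): rank values and run times in seconds.
ranks = [100, 200, 300, 400]
seconds = [12.0, 23.5, 36.1, 47.9]
slope, intercept, r_squared = linear_fit(ranks, seconds)
```

An R² close to 1 indicates that running time grows roughly linearly with the rank over the tested range.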
The known uses and target information for the anti-cancer drug cluster in Fig 8B, obtained from DrugBank.
The known targets are given as UniProt accession numbers. The target information from UniProt is in S1 Table.
| Drug name | Approved treatment(s) | Known binding target(s) | Principal mode of action |
|---|---|---|---|
| Albendazole | Parenchymal neurocysticercosis | F1L7U3, Q71U36, P68371, P83223 | Tubulin polymerization inhibitor |
| Aprepitant | Antiemetic | P25103 | Substance P/Neurokinin NK1 receptor antagonist |
| Carbidopa hydrate | Reduce adverse effects of levodopa in Parkinson disease treatment | P20711 | DOPA decarboxylase inhibitor |
| Colchicine | Gout | Q9H4B7, P07437 | N/A (depolymerizes microtubules) |
| Griseofulvin | Ringworm infection | P10875, P87066, Q99456 | N/A |
| Mebendazole | Anthelmintic | Q71U36, P68371 | Tubulin polymerization inhibitor |
| Niclosamide | Anthelmintic against tapeworm infections | P40763, O60674, P12931 | Disrupts oxidative phosphorylation |
| Aza-epothilone B | Breast cancer | Q13509 | Microtubule stabilizer |
| Bosutinib | Chronic Myelogenous Leukemia | P11274, P00519, P07948, P08631, P12931, P24941, Q02750, P36507, Q9Y2U5, Q13555 | Tyrosine kinase inhibitor |
| Cabazitaxel | Prostate cancer | P68366, Q9H4B7 | Microtubule stabilizer |
| Crizotinib | Non-small cell lung cancer | Q9UM73, P08581 | Anaplastic lymphoma kinase inhibitor |
| Dabrafenib | Metastatic melanoma | P15056, P04049, P57059, Q8NG66, P53667 | Inhibitor of some mutant BRAF kinases |
| Dasatinib | Chronic myeloid leukemia | P00519, P12931, P29317, P06239, P07947, P10721, P09619, P51692, P24684, P06241 | BCR/ABL and Src family tyrosine kinase inhibitor |
| Docetaxel | Breast, ovarian and non-small cell lung cancer | Q9H4B7, P10415, P11137, P27816, P10636, O75469 | Microtubule stabilizer |
| Erlotinib | Non-small cell lung cancer, pancreatic cancer | P00533, O75469 | N/A (EGFR inhibitor) |
| Gefitinib | Non-small cell lung cancer | P00533 | EGFR inhibitor |
| Imatinib | Chronic myelogenous leukemia | A9UF02, P10721, O43519, P04629, P07333, P16234, Q08345, P00519, P09619 | Tyrosine kinase inhibitor |
| Nilotinib | Various leukemias (investigational) | P00519, P10721 | Tyrosine kinase inhibitor |
| Paclitaxel | Lung, ovarian and breast cancers | P10415, Q9H4B7, O75469, P27816, P11137, P10636 | Microtubule stabilizer |
| Pazopanib | Renal cell cancer and soft tissue sarcoma | P17948, P35968, P35916, P16234, P09619, P10721, P22607, Q08881, P05230, Q9UQQ2 | Tyrosine kinase inhibitor |
| Ponatinib | Chronic myeloid leukemia | P00519, P11274, P10721, P07949, Q02763, P36888, P11362, P21802, P22607, P22455, P06239, P12931, P07948, P35968, P16234 | Bcr-Abl tyrosine kinase inhibitor |
| Regorafenib | Metastatic colorectal cancer and gastrointestinal stromal tumors | P07949, P17948, P35968, P35916, P10721, P16234, P09619, P11362, P21802, Q02763, Q16832, P04629, P29317, P04049, P15056, P15759, P42685, P00519 | Multiple kinases inhibitor |
| Ruxolitinib | Myelofibrosis | P23458, O60674 | Janus Associated Kinases (JAK) 1 and 2 inhibitor |
| Sorafenib | Renal cell carcinoma | P15056, P04049, P35916, P35968, P36888, P09619, P10721, P11362, P07949, P17948 | Inhibitor of Raf kinase, PDGF, VEGFR 2 and 3 |
| Sunitinib | Renal cell carcinoma and gastrointestinal stromal tumor | P09619, P17948, P10721, P35968, P35916, P36888, P07333, P16234 | Multi-targeted receptor tyrosine kinase inhibitor |
| Trametinib | Metastatic melanoma | Q02750, P36507 | Allosteric inhibitor of mitogen-activated extracellular signal regulated kinase 1 and 2 |
| Vandetanib | Broad range tumor types | P15692, P00533, Q13882, Q02763 | Inhibitor of VEGFR |
| Vinblastine | Breast, testicular cancers, lymphomas, neuroblastoma | Q71U36, P07437, Q9UJT1, P23258, Q9UJT0, P05412 | N/A (inhibition of mitosis at metaphase) |
| Vincristine | Acute lymphocytic leukemia, lymphomas, neuroblastoma, rhabdomyosarcoma | P07437, P68366 | N/A (inhibition of mitosis at metaphase) |
| Vindesine | Acute leukemia, malignant lymphoma, Hodgkin’s disease, acute erythraemia, acute panmyelosis | Q9H4B7 | Inhibition of mitosis at metaphase |
| Vinorelbine | Non-small cell lung carcinoma | P07437 | N/A (inhibition of mitosis at metaphase) |
Fig 8. (A) The drug clusters created based on the profile similarity with the anti-cancer drug cluster in the middle (darker blue grid). (B) The clusters of FDA-approved anti-cancer drugs: a set of 25 known anti-cancer drugs (blue boxes), and another set of 7 FDA-approved drugs that are closely linked to the former set but have not yet been approved for anti-cancer treatment (darker blue boxes). The procedures are explained in the drug-target interaction profile analysis for drug repurposing section.
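Comparing drugs by their predicted target profiles can be sketched as a nearest-neighbor search over score vectors. The similarity measure used for the clusters in this figure is not stated here, so cosine similarity is an assumption, and the drug names and profile values below are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_profiles(query, profiles, top_k=2):
    """Return the top_k drugs whose target profiles are closest to `query`."""
    sims = [(name, cosine(profiles[query], vec))
            for name, vec in profiles.items() if name != query]
    return sorted(sims, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical predicted scores of three drugs against three proteins.
profiles = {
    "drug_a": [0.9, 0.1, 0.8],
    "drug_b": [0.8, 0.2, 0.9],
    "drug_c": [0.1, 0.9, 0.0],
}
neighbors = nearest_profiles("drug_a", profiles, top_k=1)
```

Drugs that land next to a group of known anti-cancer drugs under such a measure would be candidates for closer inspection, in the spirit of the darker blue boxes in panel B.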