Edgar D Coelho, Joel P Arrais, José Luís Oliveira.
Abstract
De novo experimental drug discovery is an expensive and time-consuming task. It requires the identification of drug-target interactions (DTIs) towards targets of biological interest, either to inhibit or enhance a specific molecular function. Dedicated computational models for protein simulation and DTI prediction are crucial for speeding up DTI identification and reducing its cost. In this paper we present a computational pipeline that enables the discovery of putative leads for drug repositioning and that can be applied to any microbial proteome, as long as the interactome of interest is at least partially known. Network metrics calculated for the interactome of the bacterial organism of interest were used to identify putative drug targets. Then, a random forest classification model for DTI prediction was constructed using known DTI data from publicly available databases, resulting in an area under the ROC curve of 0.91 for classification of out-of-sample data. A drug-target network was created by combining 3,081 unique ligands and the ten highest-ranked putative drug targets. This network was used to predict new DTIs and to calculate the probability of the positive class, allowing the predicted instances to be scored. Molecular docking experiments were performed on the best scoring DTI pairs, and the results were compared with those of the same ligands docked to their original targets. The results obtained suggest that the proposed pipeline can be used in the identification of new leads for drug repositioning. The proposed classification model is available at http://bioinformatics.ua.pt/software/dtipred/.
Year: 2016 PMID: 27893735 PMCID: PMC5125559 DOI: 10.1371/journal.pcbi.1005219
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Confusion matrix of external validation data set classification.
| | Predicted positive | Predicted negative |
|---|---|---|
| Condition positive | 2,792 (TP) | 542 (FN) |
| Condition negative | 69 (FP) | 5,126 (TN) |
TN–true negative; FP–false positive; FN–false negative; TP–true positive
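The counts in the confusion matrix are enough to derive the usual summary metrics. The following is a minimal sketch of that arithmetic; the function name `summarize` is an illustrative choice and is not taken from the paper, and the paper itself may report a different set of metrics.

```python
# Counts copied from the external validation confusion matrix above.
TP, FN = 2792, 542
FP, TN = 69, 5126


def summarize(tp, fn, fp, tn):
    """Return standard binary-classification metrics from raw counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "total": total,
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


metrics = summarize(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}" if name != "total" else f"{name}: {value}")
```

The asymmetry between the two error types is visible directly in the counts: false negatives (542) far outnumber false positives (69), so recall is lower than precision and specificity on this data set.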
Ten best putative drug targets.
| STRING ID | UniProt ID | Protein name | SC | BC |
|---|---|---|---|---|
| SACOL2213 | Q5HDY4 | DNA-directed RNA polymerase subunit alpha | 1.85E+23 | 0.0329 |
| SACOL0591 | Q5HID0 | 30S ribosomal protein S12 | 2.76E+23 | 0.0198 |
| SACOL0588 | Q5HID3 | DNA-directed RNA polymerase subunit beta | 1.17E+23 | 0.0178 |
| SACOL2675 | Q5HCP4 | Accessory Sec system protein translocase subunit SecY2 | 1.01E+23 | 0.0128 |
| SACOL1292 | Q5HGF8 | 30S ribosomal protein S15 | 2.65E+23 | 0.0112 |
| SACOL0593 | Q5HIC8 | Elongation factor G | 2.82E+23 | 0.0093 |
| SACOL2234 | Q5HDW3 | 50S ribosomal protein L22 | 3.29E+23 | 0.0049 |
| SACOL2233 | Q5HDW4 | 30S ribosomal protein S3 | 3.11E+23 | 0.0047 |
| SACOL2207 | Q5HDZ0 | 50S ribosomal protein L13 | 2.94E+23 | 0.0046 |
| SACOL0545 | Q5HIH4 | 50S ribosomal protein L25 | 1.06E+23 | 0.0045 |
SC–Subgraph centrality; BC–Betweenness centrality
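To make the BC column more concrete, here is a minimal sketch of betweenness centrality computed with Brandes' algorithm on an unweighted, undirected graph. It is an illustration only: the toy graph and its node labels are hypothetical, the normalization by the number of node pairs is an assumption, and the source does not state which software or settings were used to compute the values in the table. Subgraph centrality is not covered here.

```python
from collections import deque


def betweenness_centrality(adj, normalized=True):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours (symmetric).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Each unordered pair was visited from both ends, hence the factor 1/2.
    scale = 0.5
    if normalized and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))
    return {v: c * scale for v, c in bc.items()}


# Hypothetical toy interactome: a hub protein connecting three others.
toy = {"hub": ["p1", "p2", "p3"], "p1": ["hub"], "p2": ["hub"], "p3": ["hub"]}
print(betweenness_centrality(toy))
```

In the toy graph every shortest path between two peripheral nodes passes through the hub, so the hub receives the maximum normalized score and the peripheral nodes receive zero. Nodes that score highly on this kind of measure are the ones a pipeline like the one described would flag as candidate targets.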
Five best scoring putative drug-target interactions.
| UniProt ID | Protein Name | ZINC ID | Ligand name | Class probability |
|---|---|---|---|---|
| Q5HIC8 | Elongation factor G | ZINC85537089 | Proglumetacin maleate | 0.93 |
| Q5HID3 | DNA-directed RNA polymerase subunit beta | ZINC01550477 | Lapatinib | 0.93 |
| Q5HID3 | DNA-directed RNA polymerase subunit beta | ZINC85537027 | Tacrolimus | 0.92 |
| Q5HCP4 | Accessory Sec system protein translocase subunit SecY2 | ZINC19418959 | Trifluoperazine dihydrochloride | 0.92 |
| Q5HIC8 | Elongation factor G | ZINC01535101 | Rosuvastatin calcium | 0.91 |
Fig 1. ProSA-web overall model quality output for Q5HIC8 (left) and Q5HCP4 (right).
The panels show that both proteins fall within the range of scores typically found for proteins of similar size.
Results of the molecular docking experiments performed for predicted and real (benchmark) DTIs.
| ZINC ID | UniProt ID | Target type | PDB ID | SD | AD4 |
|---|---|---|---|---|---|
| 85537089 | Q5HIC8 | Predicted | N/A | -2.69 | -2.03 |
| 85537089 | P23219 | Real | 1CQE | -2.06 | -3.93 |
| 85537089 | P35354 | Real | 5F19 | -2.29 | -3.98 |
| 19418959 | Q5HCP4 | Predicted | N/A | -0.89 | -5.69 |
| 19418959 | P63316 | Real | 1J1D | -1.50 | -5.28 |
| 19418959 | P62158 | Real | 1CLL | -1.29 | -7.16 |
| 19418959 | P26447 | Real | 2Q91 | -0.59 | -6.83 |
| 19418959 | P14416 | Real | 5AER | -1.04 | -5.56 |
| 01535101 | Q5HIC8 | Predicted | N/A | -2.90 | -1.59 |
| 01535101 | P04035 | Real | 1DQ8 | -2.20 | -2.63 |
Values are presented in cal/mol. SD–SwissDock; AD4–AutoDock4.
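One simple way to read the docking table is to compare each predicted target's score with the most favourable score among the same ligand's real targets. The sketch below does only that arithmetic. It assumes that lower (more negative) scores indicate more favourable binding, which is the usual convention for docking energies but is not stated in this excerpt; the data structure and the function name `gap_to_best_real` are illustrative and are not the authors' analysis.

```python
# (SwissDock, AutoDock4) scores copied from the docking table above,
# keyed by ZINC ID as written in that table.
DOCKING = {
    "85537089": {
        "predicted": {"Q5HIC8": (-2.69, -2.03)},
        "real": {"P23219": (-2.06, -3.93), "P35354": (-2.29, -3.98)},
    },
    "19418959": {
        "predicted": {"Q5HCP4": (-0.89, -5.69)},
        "real": {
            "P63316": (-1.50, -5.28),
            "P62158": (-1.29, -7.16),
            "P26447": (-0.59, -6.83),
            "P14416": (-1.04, -5.56),
        },
    },
    "01535101": {
        "predicted": {"Q5HIC8": (-2.90, -1.59)},
        "real": {"P04035": (-2.20, -2.63)},
    },
}

SD, AD4 = 0, 1  # column indices within each score tuple


def gap_to_best_real(entry, column):
    """Predicted score minus the lowest real-target score for one program.

    A negative gap means the predicted target scored lower than every
    real target of the same ligand (assumed here to be more favourable).
    """
    predicted = next(iter(entry["predicted"].values()))[column]
    best_real = min(scores[column] for scores in entry["real"].values())
    return round(predicted - best_real, 2)


for zinc_id, entry in DOCKING.items():
    print(zinc_id, "SD gap:", gap_to_best_real(entry, SD),
          "AD4 gap:", gap_to_best_real(entry, AD4))
```

Keeping the two programs in separate columns matters because their scores are on different scales and need not agree on the ranking of a given pair.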
Fig 2. Diagram of the proposed pipeline.
Fig 3. Data set construction.