Yan A Ivanenkov, Alex Zhavoronkov, Renat S Yamidanov, Ilya A Osterman, Petr V Sergiev, Vladimir A Aladinskiy, Anastasia V Aladinskaya, Victor A Terentiev, Mark S Veselov, Andrey A Ayginin, Victor G Kartsev, Dmitry A Skvortsov, Alexey V Chemeris, Alexey Kh Baimiev, Alina A Sofronova, Alexander S Malyshev, Gleb I Filkov, Dmitry S Bezrukov, Bogdan A Zagribelnyy, Evgeny O Putin, Maria M Puchinina, Olga A Dontsova.
Abstract
Many pharmaceutical companies avoid developing novel antibacterials because of the high risk of failure and a range of other rational considerations. However, novel antibiotics are urgently needed, especially against resistant bacterial strains. Available in silico models suffer from many drawbacks and are therefore not applicable for scoring structurally diverse novel molecules by their antibacterial potency. The overall aim of this study was to develop an efficient in silico model able to identify compounds that are likely to exhibit antibacterial activity. Based on a proprietary screening campaign, we accumulated a representative dataset of more than 140,000 molecules whose antibacterial activity against Escherichia coli was assessed in the same assay and under the same conditions. This set has no analogue in the scientific literature. We applied six in silico techniques to mine these data. For external validation, we used 5,000 compounds with low similarity to the training samples. The antibacterial activity of the selected molecules against E. coli was assessed in a comprehensive biological study. Kohonen-based nonlinear mapping was used for the first time and provided the best predictive power (av. 75.5%). Several compounds showed outstanding antibacterial potency and were identified as translation machinery inhibitors in vitro and in vivo. For the best compounds, MIC and CC50 values were determined to estimate a selectivity index (SI). Many active compounds have a robust IP position.
Keywords: Kohonen-based SOM; machine learning techniques; novel antibacterials; translation inhibitors; virtual screening
Year: 2019 PMID: 31507413 PMCID: PMC6719509 DOI: 10.3389/fphar.2019.00913
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
In silico models for the development of novel antibacterial compounds.
| No. | N_total | N_antibiotics | Number of variables | Technique a) | Overall accuracy b) (%) | Ref. |
|---|---|---|---|---|---|---|
| | 111 | 60 | 7 | LDA | 93.8/91.5** | |
| | | | | ANN | 89.0/97.9** | |
| | 664 | 249 | 62 | ANN | 94.8** | |
| | 59 | 24 | 17 | LDA | 85.0/84.0*** | |
| | 661 | 249 | 6 | LDA | 92.6/93.6* | |
| | | | | BLR | 94.7/94.3* | |
| | 664 | 249 | 3 | LDA | 90.1** | |
| | | | | BLR | 92.1** | |
| | 351 | 213 | 7 | LDA | 91.0/89.0*** | |
| | 433 | 217 | 6 | LDA | 85.7/87.5** | |
| | | | 62 | ANN | 98.7/91.4** | |
| | 667 | 363 | 7 | LDA | 92.9/94.0** | |
| | 657 | 249 | 34 | ANN | 92.9**/100.0*** | |
| | 2,030 | 1,006 | 8 | LDA c) | 90.4/89.3**/93.1*** | |
| | 4,346 | 520 | 62 | kNN | 95.0/95.0*/84.4*** | |
| | 611 | 230 | 36 | SVC | 100.0*/100.0**/98.1*** | |
| | | | | kNN | 97.7**/96.1*** | |
| | | | | DT | 98.6*/92.3**/91.0*** | |
| | 7,517 | 2,066 | 21 | kNN c) | 99.2*/81.8**/78.3*** | |
| | 2,230 | 1,051 | 3 | LDA c) | 85.6/87.2**/86.2*** | |
| | 3,500 | 628 | 4 | ISE | 94.6/72.0*** | |
a) LDA, linear discriminant analysis; ANN, artificial neural network; BLR, binary logistic regression; kNN, k-nearest neighbors; MLR, multiple linear regression; SVC, support vector classification; DT, decision tree; ISE, iterative stochastic elimination. b) *Cross-validation; **internal test set; ***external test set. c) Models that demonstrated the highest quality with the external test set.
Key features of the training dataset.
| Number of compounds | Active | Inactive | Diversity* | Unique heterocycles (all) | Unique heterocycles (active) | Unique heterocycles (inactive) | Clusters** | Av. cluster size | Singletons |
|---|---|---|---|---|---|---|---|---|---|
| 74,567 | 8,724 | 65,843 | 0.86 | 3,961 | 1,146 | 3,370 | 2,021 | 15 | 22,521 |
*Reverse Tanimoto metrics; **min. cluster size, 5; max. cluster size, 30; similarity threshold, 0.5.
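The diversity value in the table is described only as a "reverse Tanimoto" metric. The sketch below shows one common reading of that phrase, the mean of 1 − Tanimoto similarity over all compound pairs. This is an assumption, as the source does not give the formula. Fingerprints are modelled as plain Python sets of "on" bit indices, and the three toy fingerprints are hypothetical, not taken from the dataset.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    # Convention (assumption): two empty fingerprints count as identical.
    return len(a & b) / union if union else 1.0


def mean_pairwise_diversity(fingerprints):
    """Mean of (1 - Tanimoto) over all unordered pairs; 0.0 for fewer than two items."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy fingerprints, for illustration only.
toy_fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
diversity = mean_pairwise_diversity(toy_fps)
```

For a set of tens of thousands of compounds the all-pairs loop becomes expensive, so in practice the mean would usually be estimated on a random sample of pairs.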
Figure 1. Representative examples of molecular descriptors included in the final set of input variables: HBD, number of potential H-bond donors; Hy, hydrophilic factor; RB, number of freely rotatable bonds; LogP_VSA, reflects hydrophobic and hydrophilic effects; TPSA, polar surface area; HBA, number of potential H-bond acceptors.
Figure 2. A 30 × 30 2D Kohonen SOM for discrimination between antibacterial (A) and nonantibacterial (B) compounds within the same map. The color gradient corresponds to the percentage of molecules. Basic contours of the map were smoothed for convenient visual inspection.
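To make the mapping idea behind this figure concrete, the following is a minimal, generic Kohonen SOM sketch in pure Python. It is not the authors' implementation: the linear decay schedules, the Gaussian neighbourhood, the 5 × 5 grid, and the two-dimensional toy "descriptor" vectors are all assumptions chosen for brevity (the figure itself uses a 30 × 30 map). The function `occupancy_percent` is a hypothetical helper that computes, for one class of compounds, the percentage landing on each node, which is the kind of quantity a color gradient like the one described could display.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=10, lr0=0.5, seed=0):
    """Train a rectangular SOM with linearly decaying rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 * (1.0 - frac) + 1e-3  # keep the radius positive
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood around the winning node.
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


def occupancy_percent(weights, data):
    """Percentage of the given molecules that map to each node."""
    rows, cols = len(weights), len(weights[0])
    counts = [[0] * cols for _ in range(rows)]
    for x in data:
        r, c = best_matching_unit(weights, x)
        counts[r][c] += 1
    return [[100.0 * counts[r][c] / len(data) for c in range(cols)]
            for r in range(rows)]


# Hypothetical scaled descriptor vectors for two classes (toy data).
actives = [[0.10, 0.15], [0.12, 0.10], [0.08, 0.20], [0.15, 0.12]]
inactives = [[0.90, 0.85], [0.88, 0.90], [0.85, 0.80], [0.92, 0.88]]
som = train_som(actives + inactives, rows=5, cols=5)
active_map = occupancy_percent(som, actives)
inactive_map = occupancy_percent(som, inactives)
```

Comparing `active_map` and `inactive_map` node by node gives the two class-specific views of a single trained map, analogous to panels (A) and (B).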
Figure 3. A brief statistical analysis of basic nonheterocyclic (A) and heterocyclic (B) fragments present in antibacterial and nonantibacterial compounds.
Representative examples of active compounds that were correctly predicted as antibacterials. The detailed biological results are presented in .
| ID | Structure | Activity | ID (from database) | MIC (µg/ml, ΔtolC) | Mechanism of action | In vitro translation | 14C-test | SI* | IP** |
|---|---|---|---|---|---|---|---|---|---|
| | | ++++ | - | 0.016 ± 0.009 | SOS | − | − | H | - |
| | | ++++ | - | 2.5 ± 0.5 | T | + | + | M | - |
| | | +++ | STOCK1S-88700 | 1.8 ± 0.8 | T | + | + | M | M |
| | | +++ | STOCK1N-86948 | 2 ± 0.4 | T | + | ± | M | M |
| | | ++++ | STOCK1N-55723 | 3.9 ± 1.4 | S + T | + | − | L | H |
| | | +++ | D090-0093 | 6.25 ± 1.3 | T | + | + | H | H |
| | | ++ | P991-0387 | 12.5 ± 1.9 | T | ± | − | H | H |
| | | + | F333-0013 | 42 ± 5 | T | + | + | H | M |
| | | +++ | F418-0205 | 0.8 | SOS | − | − | H | M |
| | | +++ | STOCK1N-64226 | 20.8 | SOS | − | − | L | H |
| | | +++ | F092-0369 | <0.2 | O | − | − | H | M |
| | | +++ | F269-0279 | <0.2 | O | − | − | H | H |
| | | +++ | Y030-6952 | 0.8 | O | − | − | H | H |
| | | +++ | D475-2799 | 0.8 | O | − | − | M | H |
| | | +++ | STOCK2S-91453 | 1.8 ± 0.8 | T | + | + | M | M |
*SI, selectivity index = CC50 (µg/ml or %)/MIC (µg/ml): H, high, SI > 100; M, moderate, 20 < SI < 100; L, low, SI < 20. Mechanism of action: T, translation inhibition; SOS, SOS response; O, other mechanism of action. **IP, intellectual property: L (low), matches an antibacterial Markush structure; M (moderate), matches a non-antibacterial Markush structure (but is not listed among the examples); H (high), clear IP status.
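The selectivity index and its H/M/L bands from the footnote can be written as a short calculation. The sketch below assumes the cytotoxicity value is the CC50 mentioned in the abstract and that both quantities are in µg/ml. The footnote does not say which band the exact boundary values 20 and 100 belong to, so assigning them to the lower band here is an assumption. The input values in the example are hypothetical.

```python
def selectivity_index(cc50_ug_ml, mic_ug_ml):
    """SI = CC50 / MIC, both in µg/ml."""
    if mic_ug_ml <= 0:
        raise ValueError("MIC must be positive")
    return cc50_ug_ml / mic_ug_ml


def si_category(si):
    """Map an SI value to the H/M/L bands used in the table footnote.

    Assumption: SI exactly equal to 100 or 20 falls into the lower band,
    because the footnote leaves the boundaries undefined.
    """
    if si > 100:
        return "H"
    if si > 20:
        return "M"
    return "L"


# Hypothetical values, for illustration only.
example_si = selectivity_index(cc50_ug_ml=250.0, mic_ug_ml=2.5)  # 100.0
example_band = si_category(example_si)                           # "M"
```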
Antibacterial activity of compounds 11 (4-bromo-N-{5-[(4-chlorophenyl)methyl]-1,3,4-thiadiazol-2-yl}benzene-1-sulfonamide) and 13 (5′-(4-fluorobenzamido)-[2,3′-bithiophene]-4′-carboxylic acid) against selected archival strains.
| Species | Strain ID | Source | Activity, compound 11 | Activity, compound 13 |
|---|---|---|---|---|
| | ATCC 25922 | ATCC* | − | ± |
| | 181210171-2 | Clinic of the Bashkir State Medical University | ± | + |
| | ATCC 27853 | ATCC | − | − |
| | ATCC USA 206 | Clinic of the Bashkir State Medical University | ++++ | ++++ |
| | 181210169-1 | ATCC | + | ± |
*ATCC, American Type Culture Collection.
Overall performance of in silico modeling.
| Model | Classification accuracy (%)*: training, active | Classification accuracy (%)*: training, inactive | Classification accuracy (%)*: training, average | Classification accuracy (%)*: cross-validation, active | Classification accuracy (%)*: cross-validation, inactive | Classification accuracy (%)*: cross-validation, average | Classification accuracy (%)*: internal test, active | Classification accuracy (%)*: internal test, inactive | Classification accuracy (%)*: internal test, average | Prediction accuracy, active/inactive (%)** |
|---|---|---|---|---|---|---|---|---|---|---|
| | 75 | 80 | 77.5 | – | – | – | – | – | – | 73/78 |
| | 83.2 | 93.4 | 88.3 | 74.2 | 90.5 | 82.4 | 73.1 | 90.5 | 81.3 | 72/77 |
| | 100 | 100 | 100 | 70.7 | 92.7 | 81.7 | 68.8 | 90.2 | 79.5 | 69/80 |
| | 79.2 | 95.7 | 87.5 | 68.5 | 91.1 | 79.3 | 68 | 87 | 77.5 | 68/81 |
| | 84.5 | 97.6 | 91.0 | 73.5 | 91.7 | 82.6 | 73.9 | 91.5 | 82.7 | 73/78 |
| | 100 | 100 | 100 | 77.9 | 87.9 | 82.9 | 77.7 | 88.7 | 83.2 | 63/76 |
*Values for the best randomization; ** external test set.
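The "Average" columns and the 75.5% figure quoted in the abstract are consistent with a simple mean of the per-class accuracies, which the short sketch below reproduces for the external-test column. The averaging rule is inferred from the numbers rather than stated in the source. The model names are not recoverable from the table, so rows are referred to by position only.

```python
def average_accuracy(active_pct, inactive_pct):
    """Unweighted mean of the accuracies on active and inactive compounds."""
    return (active_pct + inactive_pct) / 2.0


def parse_pair(cell):
    """Split an 'active/inactive' table cell such as '73/78' into two floats."""
    active, inactive = cell.split("/")
    return float(active), float(inactive)


# External-test column of the table above, in row order.
external_cells = ["73/78", "72/77", "69/80", "68/81", "73/78", "63/76"]
external_averages = [average_accuracy(*parse_pair(c)) for c in external_cells]
best_external = max(external_averages)
```

Because the mean is unweighted, it does not reflect the roughly 1:7.5 ratio of active to inactive compounds in the training set, so it should be read as a per-class average rather than as overall accuracy.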