Sara Tortorella¹, Emanuele Carosati², Giulia Sorbi¹, Giovanni Bocci³, Simon Cross⁴, Gabriele Cruciani², Loriano Storchi⁵,⁴
Abstract
Molecular interaction fields (MIFs), which describe molecules in terms of their ability to interact with any chemical entity, are one of the most established and versatile concepts in drug discovery. Improving this molecular description is highly desirable for in silico drug discovery and medicinal chemistry applications. In this work, we revised a well-established molecular mechanics force field and applied a hybrid quantum mechanics and machine learning approach to parametrize the hydrogen-bonding (HB) potentials of small molecules, improving this aspect of the molecular description. Approximately 66,000 molecules were selected from available drug databases and subjected to density functional theory (DFT) calculations. For each atom, the molecular electrostatic potential (EP) was extracted and used to derive new HB energy contributions; this information was subsequently combined with a fingerprint-based description of the structural environment via partial least squares (PLS) modeling, enabling the new potentials to be used for molecules outside of the training set. We demonstrate that the parameters predicted for molecules outside of the training set correlate with their DFT-derived EP, and that the new potentials correlate with hydrogen-bond acidity and basicity scales. We show that the newly derived MIFs vary in strength with ring substitution in accordance with chemical intuition. Finally, we report that this derived parameter, when extended to non-HB atoms, can also be used to estimate sites of reaction.
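The abstract names PLS as the link between a fingerprint description of each atom's environment and its HB energy. As a rough illustration only, the sketch below fits a single-response PLS model (PLS1, NIPALS algorithm) in pure Python. The toy fingerprints, coefficients and the function names `fit_pls1` and `predict_pls1` are hypothetical, and this is not the authors' implementation; `n_components` plays the role of the number of latent variables (LV) reported in the model table.

```python
"""Minimal PLS1 (NIPALS) sketch: regress an energy-like target on fingerprint bits.

Illustrative only (hypothetical data and names): no scaling, no cross-validation.
"""
from typing import List, NamedTuple

Vector = List[float]
Matrix = List[List[float]]


class PLSModel(NamedTuple):
    x_mean: Vector
    y_mean: float
    weights: List[Vector]   # w_a: X-weights, one per latent variable
    loadings: List[Vector]  # p_a: X-loadings used for deflation
    y_coefs: List[float]    # q_a: inner regression coefficient of y on the scores


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X: Matrix, y: Vector, n_components: int) -> PLSModel:
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Mean-centred working copies; the inputs are left untouched.
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    f = [yi - y_mean for yi in y]
    weights, loadings, y_coefs = [], [], []
    for _ in range(n_components):
        # Weight direction proportional to E^T f: the unit vector that
        # maximises the covariance between the X scores and the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # residual y is (numerically) zero: nothing left to model
        w = [wj / norm for wj in w]
        t = [_dot(row, w) for row in E]  # scores of this latent variable
        tt = _dot(t, t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = _dot(f, t) / tt
        # Deflation: remove what this latent variable explains from X and y.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        loadings.append(p_load)
        y_coefs.append(q)
    return PLSModel(x_mean, y_mean, weights, loadings, y_coefs)


def predict_pls1(model: PLSModel, x: Vector) -> float:
    """Predict y for a new fingerprint by replaying the centring and deflation steps."""
    e = [xj - mj for xj, mj in zip(x, model.x_mean)]
    y_hat = model.y_mean
    for w, p_load, q in zip(model.weights, model.loadings, model.y_coefs):
        t = _dot(e, w)
        e = [ej - t * pj for ej, pj in zip(e, p_load)]
        y_hat += q * t
    return y_hat


if __name__ == "__main__":
    # Hypothetical 4-bit "fingerprints" and a made-up linear energy (kcal/mol).
    X = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
         [1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 1, 1]]
    coef = [-1.5, 0.5, -0.8, 0.3]
    y = [-2.0 + _dot(coef, row) for row in X]
    model = fit_pls1(X, y, n_components=2)
    print(len(model.weights), predict_pls1(model, [0, 0, 1, 1]))
```

Predicting an unseen atom only requires replaying the stored centring and deflation steps, which is what lets a fitted model be applied outside the training set. With as many components as independent fingerprint bits, PLS1 reduces to ordinary least squares; fewer components trade fit for robustness on large, collinear fingerprints.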
Keywords: drug discovery; machine learning; medicinal chemistry applications; molecular descriptors; molecular interaction fields
Year: 2021 PMID: 34410004 PMCID: PMC9291213 DOI: 10.1002/jcc.26737
Source DB: PubMed Journal: J Comput Chem ISSN: 0192-8651 Impact factor: 3.672
Statistical parameters for the obtained models. AT, atom type; Description, chemical description of the atom type; H‐bond type, H‐bond donor (D) or H‐bond acceptor (A); Atoms, number of atoms in the training set; LV, number of latent variables considered; R², coefficient of determination for the training set; Q², coefficient of determination for the predicted compounds; SDEC, standard deviation error in calculation; SDEP, standard deviation error in external prediction.
| AT | Description | H‐bond type | Atoms | LV | R² | Q² | SDEC (kcal/mol) | SDEP (kcal/mol) |
|---|---|---|---|---|---|---|---|---|
| N: | sp3 (tertiary) nitrogen, accepting one H‐bond | A | 6954 | 9 | 0.92 | 0.88 | 0.56 | 0.41 |
| N1: | sp3 (secondary) nitrogen, donating one hydrogen and accepting one H‐bond | A | 3941 | 8 | 0.91 | 0.84 | 0.24 | 0.49 |
| | | D | 4776 | 7 | 0.96 | 0.92 | 0.30 | 0.53 |
| N2: | sp3 (primary) nitrogen, donating up to two hydrogens and accepting one H‐bond | A | 3618 | 8 | 0.84 | 0.71 | 0.26 | 0.38 |
| | | D | 4895 | 7 | 0.95 | 0.92 | 0.30 | 0.41 |
| ON | oxygen of nitro or nitroso group, accepting up to two H‐bonds | A | 4907 | 8 | 0.82 | 0.69 | 0.26 | 0.38 |
| N:= | sp2 (aromatic) nitrogen, accepting one H‐bond | A | 27,140 | 12 | 0.91 | 0.89 | 0.35 | 0.47 |
| N:: | sp2 nitrogen with two lone pairs and one double bond | A | 472 | 4 | 0.89 | 0.59 | 0.23 | 0.12 |
| N:# | sp nitrogen | A | 15,798 | 10 | 0.72 | 0.66 | 0.29 | 0.32 |
| O1 | Alcoholic oxygen atom in an sp3 hydroxyl group, capable of donating one hydrogen and accepting up to two H‐bonds | A | 1367 | 6 | 0.86 | 0.66 | 0.30 | 0.55 |
| | | D | 1392 | 7 | 0.87 | 0.65 | 0.29 | 0.50 |
| OC1 | Aliphatic and aryl ether oxygen, accepting one H‐bond | A | 12,725 | 10 | 0.74 | 0.66 | 0.32 | 0.44 |
| OC2 | Aliphatic ether oxygen, accepting two H‐bonds | A | 7100 | 8 | 0.81 | 0.73 | 0.30 | 0.44 |
| OC= | Aryl ether oxygen, accepting one H‐bond | A | 2527 | 9 | 0.89 | 0.75 | 0.26 | 0.46 |
| OES | Tetrahedral ester oxygen, not accepting H‐bonds | A | 11,501 | 10 | 0.82 | 0.76 | 0.28 | 0.39 |
| OFU | Aromatic furan or oxazole oxygen, accepting one H‐bond | A | 6114 | 9 | 0.88 | 0.81 | 0.26 | 0.47 |
| OH | Phenolic and carboxy oxygen, capable of donating one hydrogen and accepting up to two H‐bonds | A | 4892 | 7 | 0.78 | 0.62 | 0.29 | 0.50 |
| | | D | 4892 | 7 | 0.78 | 0.62 | 0.29 | 0.50 |
| O=S | Oxygen bonded only to one central S (sulphones, sulfates, unionized sulfate, sulphonamides), accepting two H‐bonds | A | 15,886 | 10 | 0.84 | 0.81 | 0.24 | 0.37 |
| OS | Oxygen bonded only to one central S (sulphoxides, unionized sulphonate esters, unionized alkyl sulphinates), accepting two H‐bonds | A | 947 | 4 | 0.90 | 0.69 | 0.25 | 0.41 |
| O= | Oxygen bonded to one atom (e.g., phosphates, arsenates, silicates) and accepting up to two H‐bonds | A | 13,307 | 7 | 0.86 | 0.83 | 0.33 | 0.44 |
| O | sp2 carbonylic oxygen, accepting up to two H‐bonds | A | 7811 | 6 | 0.90 | 0.86 | 0.33 | 0.61 |
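The R², Q², SDEC and SDEP columns follow common regression definitions. A minimal sketch, assuming SDEC and SDEP are root-mean-square errors over the n atoms of the relevant set and that Q² takes the same 1 − SS_res/SS_tot form as R², computed on predicted rather than fitted values. The caption does not state which mean enters Q², so this version simply uses the mean of the observed values passed in; the function names are hypothetical.

```python
"""Sketch of the regression statistics reported in the model table (assumed definitions)."""
import math
from typing import Sequence


def coefficient_of_determination(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - SS_res / SS_tot: R^2 on fitted values, Q^2 on predicted (held-out) values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def standard_deviation_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """sqrt(sum of squared errors / n): SDEC on fitted values, SDEP on predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical energies in kcal/mol, not values from the paper.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 2.0, 3.0, 3.5]
print(coefficient_of_determination(obs, pred))  # 0.9
print(standard_deviation_error(obs, pred))
```

Because SDEC and SDEP carry the units of the target, they read directly in kcal/mol, whereas R² and Q² are dimensionless.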
FIGURE 1. dEmin versus QM electronic potential correlations for (A) the N:= atom type (2711 atoms, Pearson R = 0.90) and (B) the N1 atom type (2159 atoms, Pearson R = −0.89) of the test set. The red lines represent the traditional, static Emin values of the GRID force field, namely −5.5 for the N:= and −4.0 for the N1 atom type. dEmin, dynamic Emin.
FIGURE 2. dEmin versus H‐bond basicity scale for the Kenny dataset (279 atoms, Pearson R = −0.85). The color palette is shown at the bottom of the figure. dEmin, dynamic Emin.
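The Pearson R values quoted in Figures 1 and 2 measure the linear correlation between paired values. Its sign only indicates direction: a negative R, as for the N1 atom type and the basicity scale, means one quantity decreases as the other increases. A minimal pure-Python sketch; the function name and sample values are made up for illustration.

```python
"""Pearson correlation coefficient for paired observations."""
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values (e.g. an energy vs. an H-bond scale); not paper data.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.2]))
```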
FIGURE 3. MIFs for phenazopyridine derivatives: (A) deaminated, nitro‐substituted phenazopyridine; (B) deaminated phenazopyridine; (C) phenazopyridine. The isocontour surfaces for the H‐bond donating probe (“N1,” blue fields) are drawn at −4.0 kcal/mol.
FIGURE 4. Summary of the performance of the “fraction of drug dose excreted unchanged in urine” model. (A) Pie chart of the training set confusion matrix. (B) Pie chart of the test set confusion matrix. (C) Metrics for model performance in fitting (prediction of training set molecules). (D) Metrics for model performance in validation (prediction of test set molecules). In the confusion matrix pies, colors indicate the prediction outcomes: true positives in blue, false negatives in orange, true negatives in gray, and false positives in yellow. In the bar plots, the metrics are: ACC, accuracy; F1, F1 score; MCC, Matthews correlation coefficient; PREC+, positive precision; PREC−, negative precision; SE, sensitivity; SP, specificity.
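Every metric in Figure 4 is a function of the four confusion-matrix counts. The sketch assumes the standard definitions of each abbreviation and reports 0.0 for any undefined ratio (a convention chosen here, not taken from the paper); the function name and the counts are hypothetical.

```python
"""Binary classification metrics from confusion-matrix counts (Figure 4 abbreviations)."""
import math
from typing import Dict


def _ratio(num: float, den: float) -> float:
    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
    return num / den if den else 0.0


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    se = _ratio(tp, tp + fn)        # sensitivity: recall on the positive class
    sp = _ratio(tn, tn + fp)        # specificity: recall on the negative class
    prec_pos = _ratio(tp, tp + fp)  # PREC+: precision of positive calls
    prec_neg = _ratio(tn, tn + fn)  # PREC-: precision of negative calls
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of PREC+ and SE
        "MCC": _ratio(tp * tn - fp * fn, mcc_den),
        "PREC+": prec_pos,
        "PREC-": prec_neg,
        "SE": se,
        "SP": sp,
    }


# Hypothetical counts, not the paper's confusion matrix.
for name, value in classification_metrics(tp=40, fn=10, tn=30, fp=20).items():
    print(f"{name}: {value:.3f}")
```

With these counts the sketch gives ACC = 0.700, SE = 0.800, SP = 0.600, F1 = 0.727 and MCC = 0.408.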
Predicted and experimental sites of reaction, as reported in Reference 68, compared with the VolSurf3 electronic description (GRID charges, GC). The molecular moieties proposed as possible sites of reaction in Reference 68 are highlighted in bold. Electron‐poor molecular moieties are highlighted in red, electron‐rich ones in blue.
| Substrate | Reaction | Predicted | Experimental |
|---|---|---|---|
| Risperdal | Electrophilic halogenation | | |
| Methotrexate | Electrophilic halogenation | | |
| Voriconazole | Acid‐promoted electrophilic bromination | | |
| Pioglitazone | Baran‐Minisci reaction with different alkylsulfinate Diversinates | | |