Evan N Feinberg, Debnil Sur, Zhenqin Wu, Brooke E Husic, Huanghao Mai, Yang Li, Saisai Sun, Jianyi Yang, Bharath Ramsundar, Vijay S Pande.
Abstract
The arc of drug discovery entails a multiparameter optimization problem spanning vast length scales. The key parameters range from solubility (angstroms) to protein-ligand binding (nanometers) to in vivo toxicity (meters). Through feature learning, instead of feature engineering, deep neural networks promise to outperform both traditional physics-based and knowledge-based machine learning models for predicting molecular properties pertinent to drug discovery. To this end, we present the PotentialNet family of graph convolutions. These models are specifically designed for, and achieve state-of-the-art performance on, protein-ligand binding affinity. We further validate these deep neural networks by setting new standards of performance in several ligand-based tasks. In parallel, we introduce a new metric, the Regression Enrichment Factor EFχ(R), to measure the early enrichment of computational models for chemical data. Finally, we introduce a cross-validation strategy based on structural homology clustering that can more accurately measure model generalizability, which crucially distinguishes the aims of machine learning for drug discovery from standard machine learning tasks.
Year: 2018 PMID: 30555904 PMCID: PMC6276035 DOI: 10.1021/acscentsci.8b00507
Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553
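The abstract's Regression Enrichment Factor EFχ(R) can be sketched in a few lines. This is a minimal reading, and an assumption since the record does not give the formula: the mean standardized true value among the top χ fraction of compounds ranked by predicted score.

```python
import numpy as np

def regression_enrichment_factor(y_true, y_pred, chi=0.05):
    """Average standardized true value among the top chi fraction of
    compounds ranked by predicted score (one plausible reading of the
    EF_chi(R) metric; the exact normalization is in the paper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n_top = max(1, int(round(chi * len(y_true))))
    # standardize the true labels so the metric is scale-free
    z = (y_true - y_true.mean()) / y_true.std()
    top = np.argsort(-y_pred)[:n_top]  # indices of highest predictions
    return z[top].mean()
```

A perfect ranker scores well above zero, an inverted ranker well below, and a random ranker near zero, which is what makes the metric useful for early enrichment.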
Figure 1. Visual depiction of the gated graph neural network with atoms as nodes and bonds as edges. The small molecule propanamide is chosen to illustrate the propagation of information among the different update layers of the network.
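The propagation step depicted in Figure 1 can be sketched as a GRU-style gated update. This is a simplified single-edge-type sketch, not the paper's exact parameterization; all weight names here are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(h, A, W_msg, gates):
    """One gated graph update: aggregate neighbor messages along the
    adjacency matrix A, then gate them into the node states h with a
    GRU-style update (h: (N, d), A: (N, N), all weights: (d, d))."""
    m = A @ (h @ W_msg)                     # message passing over bonds
    W_z, U_z, W_r, U_r, W_h, U_h = gates
    z = sigmoid(m @ W_z + h @ U_z)          # update gate
    r = sigmoid(m @ W_r + h @ U_r)          # reset gate
    h_cand = np.tanh(m @ W_h + (r * h) @ U_h)
    return (1 - z) * h + z * h_cand

# toy molecule: 5 atoms with random features and a sparse bond graph
rng = np.random.default_rng(0)
n, d = 5, 8
h = rng.normal(size=(n, d))
A = (rng.random((n, n)) < 0.3).astype(float)
gates = tuple(rng.normal(scale=0.1, size=(d, d)) for _ in range(6))
W_msg = rng.normal(scale=0.1, size=(d, d))
h_next = ggnn_step(h, A, W_msg, gates)
```

The gate z lets each atom decide how much of the neighborhood message to absorb versus how much of its current state to keep, which is what keeps deep stacks of these layers stable.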
Figure 2. Visual depiction of the multistaged spatial gated graph neural network. Stage 1 entails graph convolutions over only bonds, which derives new node (atom) feature maps roughly analogous to differentiable atom types in more traditional forms of molecular modeling. Stage 2 entails both bond-based and spatial distance-based propagation of information. In the final stage, a graph gather operation is conducted over the ligand atoms, whose feature maps are derived from bonded ligand information and spatial proximity to protein atoms.
Figure 3. PotentialNet stage 1 exploits only covalent (bonded) interaction edge types, encoded in the first slices of the last dimension of the adjacency tensor A.
Figure 4. PotentialNet stage 2 exploits both bonded and nonbonded interaction edge types, spanning the entirety of the last dimension of the adjacency tensor A.
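Figures 3 and 4 can be summarized in code. The sketch below assumes per-edge-type weights are omitted and plain tanh mixing stands in for the gated update, purely for brevity: stage 1 restricts propagation to the bond-type slices of the adjacency tensor, while stage 2 uses every slice.

```python
import numpy as np

def staged_pass(h, A, n_bond_types, bond_steps=2, spatial_steps=1):
    """h: (N, d) atom features; A: (N, N, n_edge_types) adjacency tensor
    whose first n_bond_types slices encode covalent edge types and whose
    remaining slices encode binned spatial distances."""
    bond_adj = A[:, :, :n_bond_types].sum(axis=-1)  # stage 1: bonds only
    full_adj = A.sum(axis=-1)                       # stage 2: bonds + space
    for _ in range(bond_steps):
        h = np.tanh(h + bond_adj @ h)
    for _ in range(spatial_steps):
        h = np.tanh(h + full_adj @ h)
    return h

# toy system: 6 atoms, 3 bond edge types, 2 spatial distance bins
rng = np.random.default_rng(1)
A = (rng.random((6, 6, 5)) < 0.2).astype(float)
h = rng.normal(size=(6, 4))
out = staged_pass(h, A, n_bond_types=3)
```

Slicing the last dimension of A is what separates the two stages: the same propagation machinery runs twice, first over chemistry, then over geometry.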
Figure 5. Notional comparison of cross-validation splitting algorithms. The first four vertical panels from the left depict simple examples of random split, stratified split, time split, and scaffold split. The rightmost panel depicts a toy example of the agglomerative split proposed in this work. Both scaffold split and agglomerative split group similar data points together to promote the generalizability of the network to new data. Scaffold split uses the algorithm introduced by Bemis and Murcko[47] to group ligands into common frameworks. The agglomerative split uses hierarchical agglomerative clustering to group ligand-protein systems according to pairwise sequence or structural similarity of the proteins. This figure is adapted from ref (3) with permission from the Royal Society of Chemistry.
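The agglomerative split of Figure 5 can be sketched with SciPy's hierarchical clustering. The distance matrix here stands in for pairwise protein sequence or structural dissimilarity, and the fill-to-fraction assignment rule is an assumption about how whole clusters end up in the test set.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def agglomerative_split(dist, test_frac=0.2, cut=0.4):
    """Cluster systems by pairwise dissimilarity, then assign whole
    clusters to the test set until it holds roughly test_frac of the
    data, so no cluster straddles the train/test boundary."""
    n = dist.shape[0]
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=cut, criterion="distance")
    test, target = [], int(round(test_frac * n))
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        if len(test) + len(members) <= target:
            test.extend(members.tolist())
    train = sorted(set(range(n)) - set(test))
    return train, test

# toy example: two well-separated families of five systems each
d = np.full((10, 10), 0.9)
d[:5, :5] = 0.1
d[5:, 5:] = 0.1
np.fill_diagonal(d, 0.0)
train, test = agglomerative_split(d, test_frac=0.5)
```

Because entire clusters go to one side of the boundary, a model evaluated on the test set never sees a close homolog of any test protein during training, which is the point of the split.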
Benchmark: PDBBind 2007, Refined Train, Core Test
| model | Test R² | Test EFχ(R) | Test Pearson | Test Spearman | Test stdev | Test MUE |
|---|---|---|---|---|---|---|
| PotentialNet | 0.668 (0.043) | 1.643 (0.127) | 0.822 (0.021) | | 1.388 (0.070) | 0.626 (0.037) |
| PotentialNet (ligand-only control) | 0.419 (0.234) | 1.404 (0.171) | 0.650 (0.017) | 0.670 (0.014) | 1.832 (0.135) | 0.839 (0.005) |
| TopologyNet, no validation set | N/A | N/A | | N/A | N/A | N/A |
| RF-Score | N/A | N/A | 0.783 | 0.769 | N/A | N/A |
| X-Score | N/A | N/A | 0.643 | 0.707 | N/A | N/A |
Error bars are recorded as standard deviation of the test metric over three random initializations of the best model as determined by average validation set score. MUE is mean unsigned error. Pearson test scores for TopologyNet are reported from ref (37), and RF- and X-Scores are reported from ref (44).
Benchmark: PDBBind 2007 Refined, Agglomerative Structure Split
| model | Test R² | Test EFχ(R) | Test Pearson | Test Spearman | Test MUE |
|---|---|---|---|---|---|
| PotentialNet | |||||
| ligand-only PotentialNet | 0.500 (0.010) | 1.498 (0.411) | 0.733 (0.007) | 0.726 (0.005) | 1.700 (0.067) |
| RF-Score | 0.594 (0.005) | 0.869 (0.090) | 0.779 (0.003) | 0.757 (0.005) | 1.542 (0.046) |
| X-Score | 0.517 | 0.891 | 0.730 | 0.751 | 1.751 |
Error bars are recorded as standard deviation of the test metric over three random initializations of the best model as determined by average validation set score. MUE is mean unsigned error. X-Score does not have error because it is a deterministic linear model.
Quantum Property Prediction with the QM8 Data Set
| network | Valid MAE | Test MAE |
|---|---|---|
| spatial PotentialNet, staged | 0.0120 | |
| spatial PotentialNet, SingleUpdate | 0.0133 | 0.0131 (0.0001) |
| MPNN | 0.0142 | 0.0139 (0.0007) |
| DTNN | 0.0168 | 0.0163 (0.0010) |
Error bars are recorded as standard deviation of the test metric over three random initializations of the best model as determined by average validation set score.
Toxicity Prediction with the Tox21 Data Set
| network | Valid ROC–AUC | Test ROC–AUC |
|---|---|---|
| PotentialNet | 0.878 | |
| Weave | 0.852 | 0.831 (0.013) |
| GraphConv | 0.858 | 0.838 (0.001) |
| XGBoost | 0.778 | 0.808 (0.000) |
Error bars are recorded as standard deviation of the test metric over three random initializations of the best model as determined by average validation set score.
Solubility Prediction with the Delaney ESOL Data Set
| network | Valid RMSE | Test RMSE |
|---|---|---|
| PotentialNet | 0.517 | |
| Weave | 0.549 | 0.553 (0.035) |
| GraphConv | 0.721 | 0.648 (0.019) |
| XGBoost | 1.182 | 0.912 (0.000) |
Error bars are recorded as standard deviation of the test metric over three random initializations of the best model as determined by average validation set score.
Benchmark: PDBBind 2007 Refined, Agglomerative Sequence Split
| model | Test R² | Test EFχ(R) | Test Pearson | Test Spearman | Test MUE |
|---|---|---|---|---|---|
| PotentialNet | 0.480 (0.030) | 0.867 (0.036) | 0.700 (0.003) | 0.694 (0.012) | 1.680 (0.061) |
| ligand-only PotentialNet | 0.414 (0.058) | 0.883 (0.025) | 0.653 (0.031) | 0.674 (0.020) | 1.712 (0.110) |
| RF-Score | | 1.078 (0.143) | 0.723 (0.013) | | |
| X-Score | 0.470 | | 0.702 | | 1.667 |
Error bars are recorded as standard deviation of the test metric over three random initializations of the best model as determined by average validation set score. MUE is mean unsigned error. X-Score does not have error because it is a deterministic linear model.
Hyperparameters for Neural Networks (equations –9)
| network | hyperparameter name | symbol | possible values |
|---|---|---|---|
| PotentialNet | gather widths (bond and spatial) | | [64, 128] |
| PotentialNet | number of bond convolution layers | bond | [1, 2] |
| PotentialNet | number of spatial convolution layers | spatial | [1, 2, 3] |
| PotentialNet | gather width | | [64, 128] |
| PotentialNet | number of graph convolution layers | | [1, 2, 3] |
| both | fully connected widths | | [[128, 32, 1], [128, 1], [64, 32, 1], [64, 1]] |
| both | learning rate | | [1e-3, 2e-4] |
| both | weight decay | | [0., 1e-7, 1e-5, 1e-3] |
| both | dropout | | [0., 0.25, 0.4, 0.5] |
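For reference, the search space in the table above can be expressed as a plain dictionary and sampled randomly. The grid keys below are illustrative names, not identifiers from the paper.

```python
import random

# hypothetical encoding of the hyperparameter table above
GRID = {
    "gather_width": [64, 128],
    "bond_conv_layers": [1, 2],
    "spatial_conv_layers": [1, 2, 3],
    "fc_widths": [[128, 32, 1], [128, 1], [64, 32, 1], [64, 1]],
    "learning_rate": [1e-3, 2e-4],
    "weight_decay": [0.0, 1e-7, 1e-5, 1e-3],
    "dropout": [0.0, 0.25, 0.4, 0.5],
}

def sample_config(rng=None):
    """Draw one random hyperparameter configuration from the grid."""
    rng = rng or random.Random()
    return {name: rng.choice(values) for name, values in GRID.items()}

cfg = sample_config(random.Random(0))
```

Models would then be trained per sampled configuration and ranked by average validation score, matching the selection protocol the table footnotes describe.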
QM8 Test Set Performances of All Tasks (Mean Absolute Error)
| task | DTNN | MPNN | PotentialNet, single update | PotentialNet, staged |
|---|---|---|---|---|
| E1–CC2 | 0.0092 | 0.0084 | 0.0070 | |
| E2–CC2 | 0.0092 | 0.0091 | 0.0079 | |
| f1–CC2 | 0.0182 | 0.0151 | 0.0137 | |
| f2–CC2 | 0.0377 | 0.0314 | 0.0296 | |
| E1–PBE0 | 0.0090 | 0.0083 | 0.0070 | |
| E2–PBE0 | 0.0086 | 0.0086 | 0.0074 | |
| f1–PBE0 | 0.0155 | 0.0123 | 0.0112 | |
| f2–PBE0 | 0.0281 | 0.0236 | 0.0228 | |
| E1–CAM | 0.0086 | 0.0079 | 0.0066 | |
| E2–CAM | 0.0082 | 0.0082 | 0.0069 | |
| f1–CAM | 0.0180 | 0.0134 | 0.0123 | |
| f2–CAM | 0.0322 | 0.0258 | 0.0245 |