Han Altae-Tran, Bharath Ramsundar, Aneesh S. Pappu, Vijay Pande.
Abstract
Recent advances in machine learning have made significant contributions to drug discovery. Deep neural networks in particular have been demonstrated to provide significant boosts in predictive power when inferring the properties and activities of small-molecule compounds (Ma, J. et al. J. Chem. Inf. Model. 2015, 55, 263-274). However, the applicability of these techniques has been limited by the requirement for large amounts of training data. In this work, we demonstrate how one-shot learning can be used to significantly lower the amount of data required to make meaningful predictions in drug discovery applications. We introduce a new architecture, the iterative refinement long short-term memory, that, when combined with graph convolutional neural networks, significantly improves learning of meaningful distance metrics over small molecules. We open source all models introduced in this work as part of DeepChem, an open-source framework for deep learning in drug discovery (Ramsundar, B. deepchem.io. https://github.com/deepchem/deepchem, 2016).
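To make the idea of predicting from a learned distance metric concrete, the following is a minimal sketch of similarity-weighted one-shot prediction. It assumes the molecule embeddings are already computed (here they are hand-made two-dimensional vectors, not outputs of a graph convolutional network), and it assumes a softmax over cosine similarities as the weighting scheme. The function names `cosine` and `one_shot_predict` are hypothetical and are not taken from DeepChem.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def one_shot_predict(query, support, labels):
    """Estimate the probability that `query` is active.

    Assumption: the prediction is a softmax-weighted average of the 0/1
    support labels, with weights given by cosine similarity between the
    query embedding and each support embedding.
    """
    sims = [cosine(query, s) for s in support]
    top = max(sims)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in sims]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, labels)) / total


# Toy support set: one active (label 1) and one inactive (label 0) compound.
support = [[1.0, 0.0], [0.0, 1.0]]
labels = [1, 0]
print(one_shot_predict([0.9, 0.1], support, labels))  # closer to the active example
```

A query that lies nearer the active support example receives a score above 0.5, and one nearer the inactive example a score below 0.5, so the quality of the prediction depends entirely on how meaningful the embedding distances are.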
Year: 2017 PMID: 28470045 PMCID: PMC5408335 DOI: 10.1021/acscentsci.6b00367
Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553
Figure 1. Schematic of network architecture for one-shot learning in drug discovery.
Figure 2. Pictorial depiction of iterative refinement of embeddings. Inputs/outputs are two-dimensional for illustrative purposes, with q1 and q2 forming the coordinate axes. Red and blue points depict positive/negative samples (for illustrative purposes only). The original embedding g′(S) is shown as squares. The expected features r are shown as empty circles.
Figure 3. Graphical representation of the major graph operations described in this paper. For each of the operations, the nodes being operated on are shown in blue, with unchanged nodes shown in light blue. For graph convolution and graph pool, the operation is shown for a single node, v; however, these operations are performed on all nodes v in the graph simultaneously.
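As a rough illustration of what "performed on all nodes simultaneously" means, the sketch below implements two simplified graph operations on a molecule stored as per-atom feature lists plus an adjacency list. The definitions are assumptions, since the captions here do not spell them out: graph pool is taken to be an elementwise max over a node and its neighbors, and graph gather is taken to be a sum over all nodes. The function names are hypothetical.

```python
def graph_pool(features, neighbors):
    """Elementwise max over each node's own features and its neighbors' features.

    Every output row is computed from the original `features`, never from
    already-pooled rows, which is what makes the update simultaneous.
    """
    pooled = []
    for v, feat in enumerate(features):
        group = [feat] + [features[u] for u in neighbors[v]]
        pooled.append([max(column) for column in zip(*group)])
    return pooled


def graph_gather(features):
    """Sum all node feature vectors into a single graph-level vector."""
    return [sum(column) for column in zip(*features)]


# Three-atom chain 0 - 1 - 2 with two features per atom.
features = [[1.0, 5.0], [3.0, 2.0], [0.0, 4.0]]
neighbors = [[1], [0, 2], [1]]

print(graph_pool(features, neighbors))
print(graph_gather(features))
```

Pooling keeps one feature vector per atom, while gathering collapses the whole molecule into a fixed-length vector regardless of the number of atoms.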
ROC-AUC Scores of Models on Median Held-out Task for Each Model on Tox21
| Tox21 | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.586 ± 0.056 | 0.648 ± 0.029 | 0.820 ± 0.003 | 0.801 ± 0.001 | |
| 5+/10– | 0.573 ± 0.060 | 0.637 ± 0.061 | 0.823 ± 0.004 | 0.753 ± 0.173 | |
| 1+/10– | 0.551 ± 0.067 | 0.541 ± 0.093 | 0.549 ± 0.088 | 0.724 ± 0.008 | |
| 1+/5– | 0.559 ± 0.063 | 0.595 ± 0.086 | 0.687 ± 0.210 | 0.593 ± 0.153 | |
| 1+/1– | 0.535 ± 0.056 | 0.589 ± 0.068 | 0.657 ± 0.222 | 0.507 ± 0.079 |
Numbers reported are means and standard deviations. Randomness is over the choice of support set; experiment is repeated with 20 support sets. The Appendix contains results for all held-out Tox21 tasks. The result with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
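The evaluation protocol described in this note (sample a support set such as 10+/10–, score the remaining compounds, repeat 20 times, report mean and standard deviation of ROC-AUC) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' evaluation code: `score_fn` is a hypothetical callback standing in for any trained model, the compounds outside the support set are used as the query set, and the sample standard deviation is used because the note does not say which estimator was applied.

```python
import random
import statistics


def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sample_support(labels, n_pos, n_neg, rng):
    """Indices of n_pos positives and n_neg negatives, drawn without replacement."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    return rng.sample(pos_idx, n_pos) + rng.sample(neg_idx, n_neg)


def evaluate(score_fn, labels, n_pos, n_neg, n_trials=20, seed=0):
    """Mean and sample standard deviation of ROC-AUC over random support sets.

    score_fn(support_idx, query_idx) is a hypothetical callback that returns
    one score per query index.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_trials):
        support = sample_support(labels, n_pos, n_neg, rng)
        chosen = set(support)
        query = [i for i in range(len(labels)) if i not in chosen]
        scores = score_fn(support, query)
        aucs.append(roc_auc(scores, [labels[i] for i in query]))
    return statistics.mean(aucs), statistics.stdev(aucs)
```

With this layout, a row such as 1+/10– corresponds to calling `evaluate(score_fn, labels, n_pos=1, n_neg=10)`; the spread across trials reflects only the choice of support set, as the note states.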
ROC-AUC Scores of Models on Median Held-out Task for Each Model on SIDER
| SIDER | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.535 ± 0.036 | 0.483 ± 0.026 | 0.553 ± 0.058 | 0.669 ± 0.007 | |
| 5+/10– | 0.533 ± 0.030 | 0.473 ± 0.029 | 0.648 ± 0.070 | 0.534 ± 0.053 | |
| 1+/10– | 0.540 ± 0.034 | 0.447 ± 0.016 | 0.544 ± 0.056 | 0.506 ± 0.016 | |
| 1+/5– | 0.529 ± 0.028 | 0.457 ± 0.029 | 0.530 ± 0.050 | 0.505 ± 0.022 | |
| 1+/1– | 0.506 ± 0.039 | 0.468 ± 0.045 | 0.510 ± 0.016 | 0.501 ± 0.022 |
Numbers reported are means and standard deviations. Randomness is over the choice of support set; experiment is repeated with 20 support sets. The Appendix contains results for all held-out SIDER tasks. The result with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on Median Held-out Task for Each Model on MUV
| MUV | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.568 ± 0.085 | 0.601 ± 0.041 | 0.504 ± 0.058 | 0.499 ± 0.053 | |
| 5+/10– | 0.565 ± 0.068 | 0.655 ± 0.166 | 0.507 ± 0.052 | 0.663 ± 0.019 | |
| 1+/10– | 0.556 ± 0.084 | 0.569 ± 0.061 | 0.504 ± 0.044 | 0.569 ± 0.012 | |
| 1+/5– | 0.598 ± 0.067 | 0.554 ± 0.089 | 0.514 ± 0.053 | 0.515 ± 0.021 | |
| 1+/1– | 0.552 ± 0.084 | 0.500 ± 0.0001 | 0.500 ± 0.027 | 0.479 ± 0.037 |
Numbers reported are means and standard deviations. Randomness is over the choice of support set; experiment is repeated with 20 support sets. The Appendix contains results for all held-out MUV tasks. The result with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models Trained on Tox21 on Median SIDER Task for Each Model on SIDER
| SIDER from Tox21 | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|
| 10+/10– | 0.511 ± 0.031 | 0.509 ± 0.014 | 0.509 ± 0.012 |
Note that models are evaluated on all SIDER tasks and not just the held-out SIDER tasks from the previous section. Numbers reported are means and standard deviations. Randomness is over the choice of support set; experiment is repeated with 20 support sets. The result with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
Convolutional Network Architecture
| layer | conv | pool | conv | pool | conv | pool | dense | gather |
|---|---|---|---|---|---|---|---|---|
| dimension | 64 | | 128 | | 64 | | 128 | |
| nonlinearity | relu | | relu | | relu | | tanh | tanh |
ROC-AUC Scores of Models on Tox21 Assay SR-HSE
| SR-HSE | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.532 ± 0.033 | 0.540 ± 0.025 | 0.767 ± 0.005 | 0.747 ± 0.003 | |
| 5+/10– | 0.521 ± 0.037 | 0.546 ± 0.023 | 0.733 ± 0.098 | 0.716 ± 0.098 | |
| 1+/10– | 0.525 ± 0.033 | 0.531 ± 0.035 | 0.647 ± 0.202 | 0.498 ± 0.046 | |
| 1+/5– | 0.510 ± 0.041 | 0.537 ± 0.043 | 0.680 ± 0.167 | 0.505 ± 0.074 | |
| 1+/1– | 0.507 ± 0.039 | 0.526 ± 0.0378 | 0.613 ± 0.187 | 0.507 ± 0.029 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on Tox21 Assay SR-MMP
| SR-MMP | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.629 ± 0.058 | 0.648 ± 0.029 | 0.825 ± 0.006 | 0.801 ± 0.001 | |
| 5+/10– | 0.634 ± 0.079 | 0.637 ± 0.061 | 0.846 ± 0.028 | 0.811 ± 0.003 | |
| 1+/10– | 0.587 ± 0.068 | 0.541 ± 0.093 | 0.551 ± 0.086 | 0.730 ± 0.003 | |
| 1+/5– | 0.597 ± 0.097 | 0.595 ± 0.086 | 0.687 ± 0.210 | 0.602 ± 0.122 | |
| 1+/1– | 0.560 ± 0.0844 | 0.589 ± 0.068 | 0.657 ± 0.222 | 0.527 ± 0.090 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on Tox21 Assay SR-p53
| SR-p53 | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.586 ± 0.056 | 0.653 ± 0.021 | 0.820 ± 0.003 | 0.809 ± 0.002 | |
| 5+/10– | 0.573 ± 0.0604 | 0.639 ± 0.042 | 0.823 ± 0.004 | 0.753 ± 0.173 | |
| 1+/10– | 0.551 ± 0.067 | 0.597 ± 0.083 | 0.549 ± 0.088 | 0.724 ± 0.008 | |
| 1+/5– | 0.559 ± 0.063 | 0.595 ± 0.073 | 0.745 ± 0.156 | 0.593 ± 0.153 | |
| 1+/1– | 0.535 ± 0.056 | 0.591 ± 0.084 | 0.680 ± 0.197 | 0.507 ± 0.079 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on “Renal and Urinary Disorders”
| R.U.D. | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.564 ± 0.031 | 0.496 ± 0.035 | 0.576 ± 0.081 | 0.706 ± 0.002 | |
| 5+/10– | 0.564 ± 0.022 | 0.477 ± 0.031 | 0.670 ± 0.119 | 0.540 ± 0.026 | |
| 1+/10– | 0.540 ± 0.034 | 0.449 ± 0.025 | 0.518 ± 0.025 | 0.575 ± 0.015 | |
| 1+/5– | 0.538 ± 0.045 | 0.457 ± 0.029 | 0.518 ± 0.060 | 0.509 ± 0.026 | |
| 1+/1– | 0.508 ± 0.046 | 0.468 ± 0.045 | 0.503 ± 0.063 | 0.497 ± 0.022 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on “Pregnancy, Puerperium and Perinatal Conditions”
| P.P.P.C. | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.515 ± 0.034 | 0.552 ± 0.028 | 0.505 ± 0.018 | 0.669 ± 0.007 | |
| 5+/10– | 0.517 ± 0.050 | 0.548 ± 0.032 | 0.645 ± 0.073 | 0.545 ± 0.041 | |
| 1+/10– | 0.529 ± 0.043 | 0.521 ± 0.041 | 0.505 ± 0.026 | 0.539 ± 0.018 | |
| 1+/5– | 0.507 ± 0.044 | 0.538 ± 0.030 | 0.511 ± 0.032 | 0.505 ± 0.022 | |
| 1+/1– | 0.504 ± 0.032 | 0.527 ± 0.027 | 0.510 ± 0.016 | 0.493 ± 0.016 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on “Ear and Labyrinth Disorders”
| E.L.D. | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.527 ± 0.034 | 0.533 ± 0.026 | 0.559 ± 0.070 | 0.661 ± 0.001 | |
| 5+/10– | 0.518 ± 0.038 | 0.486 ± 0.032 | 0.648 ± 0.070 | 0.528 ± 0.047 | |
| 1+/10– | 0.524 ± 0.021 | 0.456 ± 0.018 | 0.547 ± 0.037 | 0.506 ± 0.016 | |
| 1+/5– | 0.526 ± 0.031 | 0.463 ± 0.027 | 0.534 ± 0.064 | 0.504 ± 0.021 | |
| 1+/1– | 0.509 ± 0.032 | 0.519 ± 0.035 | 0.514 ± 0.059 | 0.501 ± 0.022 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on “Cardiac Disorders”
| C.D. | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.552 ± 0.036 | 0.456 ± 0.038 | 0.687 ± 0.089 | 0.532 ± 0.076 | |
| 5+/10– | 0.560 ± 0.041 | 0.444 ± 0.027 | 0.678 ± 0.085 | 0.534 ± 0.053 | |
| 1+/10– | 0.540 ± 0.029 | 0.422 ± 0.035 | 0.544 ± 0.056 | 0.504 ± 0.016 | |
| 1+/5– | 0.537 ± 0.052 | 0.447 ± 0.035 | 0.536 ± 0.052 | 0.517 ± 0.045 | |
| 1+/1– | 0.506 ± 0.039 | 0.461 ± 0.0478 | 0.543 ± 0.068 | 0.509 ± 0.029 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on “Nervous System Disorders”
| N.S.D. | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.681 ± 0.077 | 0.367 ± 0.040 | 0.809 ± 0.013 | 0.657 ± 0.119 | |
| 5+/10– | 0.638 ± 0.102 | 0.360 ± 0.035 | 0.791 ± 0.022 | 0.637 ± 0.078 | |
| 1+/10– | 0.639 ± 0.043 | 0.334 ± 0.025 | 0.631 ± 0.115 | 0.511 ± 0.057 | |
| 1+/5– | 0.604 ± 0.091 | 0.344 ± 0.033 | 0.617 ± 0.107 | 0.514 ± 0.080 | |
| 1+/1– | 0.598 ± 0.100 | 0.437 ± 0.095 | 0.515 ± 0.121 | 0.508 ± 0.060 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on “Injury, Poisoning and Procedural Complications”
| I.P.P.C. | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.535 ± 0.036 | 0.483 ± 0.026 | 0.553 ± 0.058 | 0.667 ± 0.001 | |
| 5+/10– | 0.533 ± 0.0302 | 0.473 ± 0.029 | 0.589 ± 0.125 | 0.509 ± 0.036 | |
| 1+/10– | 0.541 ± 0.021 | 0.447 ± 0.016 | 0.537 ± 0.045 | 0.510 ± 0.015 | |
| 1+/5– | 0.529 ± 0.028 | 0.458 ± 0.024 | 0.530 ± 0.050 | 0.501 ± 0.021 | |
| 1+/1– | 0.477 ± 0.029 | 0.475 ± 0.023 | 0.501 ± 0.044 | 0.504 ± 0.019 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on MUV-832
| MUV-832 | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.568 ± 0.0851 | 0.655 ± 0.066 | 0.484 ± 0.058 | 0.500 ± 0.053 | |
| 5+/10– | 0.565 ± 0.068 | 0.656 ± 0.136 | 0.517 ± 0.045 | 0.726 ± 0.025 | |
| 1+/10– | 0.556 ± 0.084 | 0.569 ± 0.061 | 0.511 ± 0.042 | 0.573 ± 0.013 | |
| 1+/5– | 0.598 ± 0.067 | 0.573 ± 0.082 | 0.511 ± 0.179 | 0.529 ± 0.052 | |
| 1+/1– | 0.552 ± 0.084 | 0.500 ± 0.001 | 0.497 ± 0.030 | 0.463 ± 0.024 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on MUV-846
| MUV-846 | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.608 ± 0.079 | 0.601 ± 0.041 | 0.504 ± 0.058 | 0.460 ± 0.054 | |
| 5+/10– | 0.595 ± 0.063 | 0.655 ± 0.166 | 0.494 ± 0.040 | 0.663 ± 0.019 | |
| 1+/10– | 0.576 ± 0.075 | 0.602 ± 0.118 | 0.504 ± 0.045 | 0.598 ± 0.013 | |
| 1+/5– | 0.554 ± 0.089 | 0.562 ± 0.149 | 0.517 ± 0.059 | 0.632 ± 0.011 | |
| 1+/1– | 0.588 ± 0.077 | 0.500 ± 0.0001 | 0.496 ± 0.015 | 0.511 ± 0.029 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on MUV-852
| MUV-852 | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.657 ± 0.103 | 0.678 ± 0.047 | 0.514 ± 0.049 | 0.514 ± 0.048 | |
| 5+/10– | 0.670 ± 0.068 | 0.765 ± 0.017 | 0.495 ± 0.046 | 0.755 ± 0.023 | |
| 1+/10– | 0.631 ± 0.105 | 0.627 ± 0.156 | 0.574 ± 0.053 | 0.569 ± 0.012 | |
| 1+/5– | 0.632 ± 0.106 | 0.597 ± 0.135 | 0.663 ± 0.109 | 0.485 ± 0.022 | |
| 1+/1– | 0.590 ± 0.134 | 0.500 ± 0.002 | 0.502 ± 0.032 | 0.471 ± 0.032 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on MUV-858
| MUV-858 | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.552 ± 0.083 | 0.550 ± 0.143 | 0.516 ± 0.053 | 0.530 ± 0.044 | |
| 5+/10– | 0.564 ± 0.072 | 0.554 ± 0.069 | 0.580 ± 0.105 | 0.548 ± 0.051 | |
| 1+/10– | 0.537 ± 0.089 | 0.552 ± 0.069 | 0.553 ± 0.101 | 0.492 ± 0.032 | |
| 1+/5– | 0.577 ± 0.068 | 0.526 ± 0.050 | 0.486 ± 0.082 | 0.506 ± 0.028 | |
| 1+/1– | 0.500 ± 0.009 | 0.500 ± 0.027 | 0.503 ± 0.041 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. The model with highest mean in each row is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.
ROC-AUC Scores of Models on MUV-859
| MUV-859 | RF (100 trees) | Graph Conv | Siamese | AttnLSTM | IterRefLSTM |
|---|---|---|---|---|---|
| 10+/10– | 0.503 ± 0.0717 | 0.534 ± 0.084 | 0.514 ± 0.054 | 0.498 ± 0.098 | 0.474 ± 0.059 |
| 5+/10– | 0.502 ± 0.068 | 0.510 ± 0.067 | 0.498 ± 0.051 | 0.507 ± 0.052 | 0.386 ± 0.017 |
| 1+/10– | 0.530 ± 0.053 | 0.511 ± 0.049 | 0.507 ± 0.062 | 0.497 ± 0.076 | 0.412 ± 0.010 |
| 1+/5– | 0.515 ± 0.074 | 0.513 ± 0.042 | 0.514 ± 0.053 | 0.515 ± 0.021 | 0.397 ± 0.010 |
| 1+/1– | 0.521 ± 0.060 | 0.493 ± 0.065 | 0.500 ± 0.001 | 0.502 ± 0.044 | 0.479 ± 0.037 |
Numbers reported are means and standard deviations. Each model is evaluated 20 times with different support sets to compute means and standard deviations. No model showed signal on this task, so none is highlighted. The notation 10+/10– indicates supports with 10 positive examples and 10 negative examples.