Zhenpeng Zhou, Steven Kearnes, Li Li, Richard N. Zare, Patrick Riley.
Abstract
We present a framework, which we call Molecule Deep Q-Networks (MolDQN), for molecule optimization by combining domain knowledge of chemistry with state-of-the-art reinforcement learning techniques (double Q-learning and randomized value functions). We directly define modifications on molecules, thereby ensuring 100% chemical validity. Further, we operate without pre-training on any dataset to avoid possible bias from the choice of that set. MolDQN achieves comparable or better performance than several other recently published algorithms on benchmark molecular optimization tasks. However, we also argue that many of these tasks are not representative of real optimization problems in drug discovery. Inspired by problems faced during medicinal chemistry lead optimization, we extend our model with multi-objective reinforcement learning, which maximizes drug-likeness while maintaining similarity to the original molecule. We further show the path through chemical space taken to optimize a molecule, to illustrate how the model works.
Year: 2019 PMID: 31341196 PMCID: PMC6656766 DOI: 10.1038/s41598-019-47148-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
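The abstract credits double Q-learning for stabilizing training: one network selects the best next action while a second network evaluates it, which counteracts the overestimation bias of vanilla Q-learning. A minimal sketch of that target computation (not the authors' code; the function name, argument layout, and discount value are illustrative):

```python
import numpy as np

def double_q_target(reward, next_q_online, next_q_target, gamma=0.9, done=False):
    """Bellman target with double Q-learning: the online network picks the
    argmax action for the next state, and the target network scores it."""
    if done:
        return reward
    best_action = int(np.argmax(next_q_online))          # selection: online net
    return reward + gamma * next_q_target[best_action]   # evaluation: target net
```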
Figure 1. Valid actions on the state of cyclohexane. Modifications are shown in red. Invalid bond additions that violate the heuristics explained in Section 2.1 are not shown.
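Guaranteeing 100% validity works by enumerating only chemically legal modifications as actions. Below is a minimal RDKit sketch of one action family from Fig. 1, single-bond atom additions, keeping only modifications that pass sanitization. This is illustrative only; the paper's heuristics in Section 2.1 impose further constraints not reproduced here.

```python
from rdkit import Chem

def valid_atom_additions(mol, elements=("C", "N", "O")):
    """Return molecules obtained by attaching one new atom via a single bond."""
    candidates = []
    for atom in mol.GetAtoms():
        # Only attach to atoms that still have implicit hydrogens to replace.
        if atom.GetNumImplicitHs() == 0:
            continue
        for element in elements:
            rw = Chem.RWMol(mol)
            new_idx = rw.AddAtom(Chem.Atom(element))
            rw.AddBond(atom.GetIdx(), new_idx, Chem.BondType.SINGLE)
            try:
                Chem.SanitizeMol(rw)  # rejects valence violations
                candidates.append(rw.GetMol())
            except (Chem.rdchem.MolSanitizeException, ValueError):
                pass  # chemically invalid modification; skip
    return candidates

cyclohexane = Chem.MolFromSmiles("C1CCCCC1")
actions = valid_atom_additions(cyclohexane)
print(sorted({Chem.MolToSmiles(m) for m in actions}))
```

On cyclohexane this yields three unique products (methylcyclohexane, cyclohexanol, cyclohexanamine), since every ring carbon is equivalent.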
Top three unique molecule property scores found by each method.
| Method | Penalized logP | | | | QED | | | |
|---|---|---|---|---|---|---|---|---|
| | 1st | 2nd | 3rd | Validity | 1st | 2nd | 3rd | Validity |
| random walk^a | −3.99 | −4.31 | −4.37 | 100% | 0.64 | 0.56 | 0.56 | 100% |
| greedy^b | 11.41 | — | — | 100% | 0.39 | — | — | 100% |
| ε-greedy^b | 11.64 | 11.40 | 11.40 | 100% | 0.914 | 0.910 | 0.906 | 100% |
| JT-VAE^c | 5.30 | 4.93 | 4.49 | 100% | 0.925 | 0.911 | 0.910 | 100% |
| ORGAN^c | 3.63 | 3.49 | 3.44 | 0.4% | 0.896 | 0.824 | 0.820 | 2.2% |
| GCPN^c | 7.98 | 7.85 | 7.80 | 100% | 0.948 | 0.947 | 0.946 | 100% |
| MolDQN-naïve | 11.51 | 11.51 | 11.50 | 100% | 0.934 | 0.931 | 0.930 | 100% |
| MolDQN-bootstrap | 11.84 | 11.84 | 11.82 | 100% | 0.948 | 0.944 | 0.943 | 100% |
| MolDQN-twosteps | — | — | — | — | 0.948 | 0.948 | 0.948 | 100% |
^a "Random walk" is a baseline that chooses a random action at each step.
^b "Greedy" is a baseline that, at each step, chooses the action leading to the molecule with the highest reward. "ε-greedy" follows the random policy with probability ε and the greedy policy with probability 1 − ε. In contrast, the ε-greedy MolDQN models choose actions ε-greedily based on predicted Q-values rather than on immediate rewards (see the sketch below).
^c Values are reported in You et al. [18].
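A minimal sketch of the ε-greedy selection rule referenced in footnote b, applied to predicted Q-values (illustrative, not the authors' implementation):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick an action index: random with probability epsilon, else argmax-Q."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit
```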
Figure 2. Sample molecules from the property optimization task. (a) Optimization of penalized logP with MolDQN-bootstrap; note that the generated molecules are clearly not drug-like, a consequence of the single-objective reward. (b) Optimization of QED with MolDQN-twosteps.
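Penalized logP is not defined in this excerpt; the commonly used formulation (as in You et al. [18]) is logP minus a synthetic-accessibility score minus a penalty for rings larger than six atoms. A hedged RDKit sketch follows; normalization constants vary between papers, so treat this as illustrative:

```python
import os, sys
import networkx as nx
from rdkit import Chem, RDConfig
from rdkit.Chem import Crippen, rdmolops

# The synthetic accessibility scorer ships in RDKit's contrib directory.
sys.path.append(os.path.join(RDConfig.RDContribDir, "SA_Score"))
import sascorer

def penalized_logp(mol):
    log_p = Crippen.MolLogP(mol)
    sa_score = sascorer.calculateScore(mol)
    graph = nx.from_numpy_array(rdmolops.GetAdjacencyMatrix(mol))
    largest_ring = max((len(c) for c in nx.cycle_basis(graph)), default=0)
    ring_penalty = max(largest_ring - 6, 0)  # penalize rings with >6 atoms
    return log_p - sa_score - ring_penalty

print(penalized_logp(Chem.MolFromSmiles("C1CCCCC1")))
```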
Mean and standard deviation of penalized logP improvement in constrained optimization tasks.
| δ | JT-VAE^a | | GCPN^a | | MolDQN-naïve | | MolDQN-bootstrap | |
|---|---|---|---|---|---|---|---|---|
| | Improvement | Success | Improvement | Success | Improvement | Success | Improvement | Success |
| 0.0 | 1.91 ± 2.04 | 97.5% | 4.20 ± 1.28 | 100% | 6.83 ± 1.30 | 100% | 7.04 ± 1.42 | 100% |
| 0.2 | 1.68 ± 1.85 | 97.1% | 4.12 ± 1.19 | 100% | 5.00 ± 1.55 | 100% | 5.06 ± 1.79 | 100% |
| 0.4 | 0.84 ± 1.45 | 83.6% | 2.49 ± 1.30 | 100% | 3.13 ± 1.57 | 100% | 3.37 ± 1.62 | 100% |
| 0.6 | 0.21 ± 0.71 | 46.4% | 0.79 ± 0.63 | 100% | 1.40 ± 1.05 | 100% | 1.86 ± 1.21 | 100% |
δ is the threshold of the similarity constraint SIM(m, m₀) ≥ δ (see the sketch below). The success rate is the percentage of molecules satisfying the similarity constraint.
^a Values are reported in You et al. [18].
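A minimal sketch of checking the similarity constraint SIM(m, m₀) ≥ δ with Tanimoto similarity over Morgan fingerprints; the fingerprint radius and bit-vector size here are assumptions, not taken from the paper:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def satisfies_similarity(smiles, original_smiles, delta, radius=2, n_bits=2048):
    """True if the Tanimoto similarity to the original molecule is >= delta."""
    fp = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(smiles), radius, nBits=n_bits)
    fp0 = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(original_smiles), radius, nBits=n_bits)
    return DataStructs.TanimotoSimilarity(fp, fp0) >= delta

print(satisfies_similarity("CC1CCCCC1", "C1CCCCC1", delta=0.4))
```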
Figure 3. (a) The QED and Tanimoto similarity of molecules optimized under different objective weights. The grey dashed line shows the QED and similarity score of the starting molecule. The legend is transparent so that it does not cover any data points. (b) The empirical distribution of relative QED improvements in 20 multi-objective optimization tasks. The variable w in the legends denotes the weight of the similarity in the multi-objective reward, while the QED score is weighted by 1 − w, i.e. reward = w × SIM(m, m₀) + (1 − w) × QED(m). (c) Unique molecules sampled from the multi-objective optimization task. The original molecule is boxed.
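A hedged sketch of the scalarized reward given in the caption, reward = w × SIM(m, m₀) + (1 − w) × QED(m); the fingerprint parameters are illustrative:

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, QED

def multi_objective_reward(mol, original_mol, w):
    """Weighted sum of Tanimoto similarity to the original molecule and QED."""
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    fp0 = AllChem.GetMorganFingerprintAsBitVect(original_mol, 2, nBits=2048)
    similarity = DataStructs.TanimotoSimilarity(fp, fp0)
    return w * similarity + (1 - w) * QED.qed(mol)
```

Sweeping w from 0 to 1 traces the trade-off shown in panel (a): larger w keeps the optimized molecule closer to the original at the cost of smaller QED gains.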
Figure 4. (a) Visualization of the Q-values of selected actions. The full set of Q-values of actions is shown in Fig. S2. The original atoms and bonds are shown in black, while modified ones are colored. Dashed lines denote bond removals. The Q-values are rescaled to [0, 1]. (b) The steps taken to maximize QED starting from a molecule. The modifications are highlighted in yellow. The QED values are given beneath the modified molecules.
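The rescaling mentioned in panel (a) can be a simple min-max normalization; a minimal sketch:

```python
def rescale_unit_interval(q_values):
    """Min-max rescale a list of Q-values to [0, 1] for visualization."""
    lo, hi = min(q_values), max(q_values)
    if hi == lo:
        return [0.5 for _ in q_values]  # degenerate case: all values equal
    return [(q - lo) / (hi - lo) for q in q_values]
```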