Rajendra P Joshi, Neeraj Kumar.
Abstract
Domain-aware artificial intelligence has been increasingly adopted in recent years to expedite molecular design in various applications, including drug design and discovery. Recent advances in areas such as physics-informed machine learning and reasoning, software engineering, high-end hardware development, and computing infrastructures are providing opportunities to build scalable and explainable AI molecular discovery systems. Such systems could refine a design hypothesis through feedback analysis, integrate data to provide a basis for end-to-end automation of compound discovery and optimization, and enable more intelligent searches of chemical space. Several state-of-the-art ML architectures are used, largely independently of one another, for predicting the properties of small molecules, for their high-throughput synthesis and screening, and for iteratively identifying and optimizing lead therapeutic candidates. However, such deep learning and ML approaches also raise considerable conceptual, technical, scalability, and end-to-end error quantification challenges, as well as skepticism about the current AI hype around building automated tools. To this end, using these individual components synergistically and intelligently in a closed loop, along with robust quantum physics-based molecular representation and data generation tools, holds enormous promise for accelerated therapeutic design. This article critically analyzes the opportunities and challenges for their more widespread application, identifies the most recent technologies and breakthroughs achieved by each of the components, and discusses how such autonomous AI and ML workflows can be integrated to radically accelerate protein target- or disease model-based probe design that can be iteratively validated experimentally. Taken together, this could significantly reduce the timeline for end-to-end therapeutic discovery and optimization upon the arrival of any novel zoonotic transmission event.
Our article serves as a guide for the medicinal chemistry, computational chemistry and biology, analytical chemistry, and ML communities to practice autonomous molecular design in precision medicine and drug discovery.
Keywords: artificial intelligence; autonomous workflow; computational modeling and simulations; computer aided drug discovery; deep learning; machine learning; machine reasoning and causal inference and causal reasoning; quantum mechanics and quantum computing; therapeutic design
Year: 2021 PMID: 34833853 PMCID: PMC8619999 DOI: 10.3390/molecules26226761
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Closed-loop workflow for computational autonomous molecular design (CAMD) for medical therapeutics. Individual components of the workflow are labeled. It consists of data generation, feature extraction, predictive machine learning, and an inverse molecular design engine.
Figure 2. Molecular representations, with all the formulations used in the literature for predictive and generative modeling.
Figure 3. The iterative update process used for learning a robust molecular representation, based on either 2D SMILES or 3D optimized geometrical coordinates from physics-based simulations. The molecular graph is usually represented by features at the atomic level, the bond level, and the global state, which represent the key properties. Each of these features is iteratively updated during the representation learning phase, and the updated features are subsequently used for the predictive part of the model.
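The iterative update described in this caption can be illustrated with a minimal sketch. It assumes a toy three-atom graph with made-up two-dimensional features, and it uses plain averaging in place of the learned neural update functions of real models; the function name `update_graph` and all numbers are hypothetical and not taken from the article.

```python
import numpy as np

def update_graph(atoms, bonds, edges, state):
    """One illustrative update round: bonds, then atoms, then global state.

    atoms: (n_atoms, d) array, bonds: (n_bonds, d) array,
    edges: list of (i, j) atom index pairs, state: (d,) array.
    Plain averaging stands in for learned update functions (an assumption).
    """
    # Bond update: each bond sees its two end atoms and the global state.
    new_bonds = np.stack([
        (bonds[k] + atoms[i] + atoms[j] + state) / 4.0
        for k, (i, j) in enumerate(edges)
    ])
    # Atom update: each atom aggregates the updated bonds attached to it.
    new_atoms = atoms.copy()
    for a in range(len(atoms)):
        attached = [new_bonds[k] for k, (i, j) in enumerate(edges) if a in (i, j)]
        if attached:
            new_atoms[a] = (atoms[a] + np.mean(attached, axis=0) + state) / 3.0
    # Global state update: pool over all atoms and all bonds.
    new_state = (state + new_atoms.mean(axis=0) + new_bonds.mean(axis=0)) / 3.0
    return new_atoms, new_bonds, new_state

# Hypothetical 3-atom chain with made-up 2-dimensional features.
atoms = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
bonds = np.array([[1.0, 1.0], [1.0, 1.0]])
edges = [(0, 1), (1, 2)]
state = np.zeros(2)

for _ in range(3):  # three rounds of iterative updates
    atoms, bonds, state = update_graph(atoms, bonds, edges, state)

# Fixed-length molecule vector that a downstream predictor could consume.
molecule_vector = np.concatenate([atoms.mean(axis=0), bonds.mean(axis=0), state])
```

The concatenated `molecule_vector` plays the role of the learned representation that is passed to the predictive part of a model; in practice the averaging steps would be replaced by trainable layers.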
Figure 4. Physics-informed ML framework for predictive modeling. It takes into account the properties obtained from quantum mechanics-based simulations or from experimental data to ultimately generate features, in addition to the standard process used in benchmark models (e.g., message passing neural network (MPNN)).
Highlights and benchmarks of predictive ML methods: a comparison of their key features, advantages, and drawbacks.
| Method | Key features | Advantages | Drawbacks |
|---|---|---|---|
| MPNN | Messages exchanged between atoms depend only on the features of the sending atom and the corresponding edge features, and are independent of the representation of the atom receiving the message; generates a global representation of the molecule; the predicted property is a function of the global representation of the molecule; generates messages centered on atoms | Achieved chemical accuracy on 11 of 13 properties in the QM9 data; performs well for intensive properties | Including the state of the message-receiving atom (dubbed pair message) increases the property prediction error; a message passed from atom A to atom B can be transmitted back to atom A, resulting in noise |
| d-MPNN | Learns a molecular representation centered on bonds instead of atoms; an update of MPNN that combines the learned representation with prior known fixed atomic, bond, and global molecular descriptors | Avoids the noise that results from messages being passed along any path by using directed messages; uses only the SMILES string to generate the input representation | Does not use spatial information as part of the input features |
| SchNet | Learns atomistic representations of molecules; the total property of the molecule is the sum over atomic contributions; learns representations using only the atomic number and geometry as atom and bond features, respectively | Improves performance on 8 of 13 properties in the QM9 data compared to MPNN; performs relatively well compared to MPNN for extensive properties; requires only the nuclear charges and nuclear coordinates for learning input representations | Relatively poor performance for intensive properties compared to MPNN; uses optimized 3D coordinates |
| MEGNet | Learns global representations of molecules; uses several atomic and bond properties as atom and bond features; adds a global state attribute of the molecule in addition to atom and bond features | Improves performance on all extensive properties compared to MPNN and SchNet; works equally well for molecules and solids; provides good accuracy with RDKit-generated 3D coordinates | Larger error for intensive properties compared to MPNN; it calculates MAEs for the atomization energies of U0, U, H, and G and compares them with the MAEs on U0, U, H, and G of SchNet |
| SchNet-edge | Edge features also depend on the features of the atom receiving the message | Improves accuracy over SchNet/MPNN on all properties in the QM9 dataset | Requires optimized 3D coordinates |
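The directed-message idea attributed to d-MPNN in the comparison above can be sketched as follows. This is only an illustration under simplifying assumptions: hidden states are single floats, the update is a plain sum rather than a learned function, and the function name `directed_message_step` is hypothetical. It shows one property only: the new value on the directed bond a→b ignores whatever currently sits on b→a, so a message is not immediately echoed back to its sender.

```python
from collections import defaultdict

def directed_message_step(hidden, bonds):
    """One directed message-passing step over bonds (illustrative only).

    hidden: dict mapping directed bond (a, b) -> float hidden value.
    bonds: list of undirected bonds (a, b).
    The new value on a->b adds the values on k->a for every neighbour k of a
    except b, so information that arrived along b->a is not sent back on a->b.
    """
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    new_hidden = {}
    for (a, b), h in hidden.items():
        incoming = sum(hidden[(k, a)] for k in neighbours[a] if k != b)
        new_hidden[(a, b)] = h + incoming
    return new_hidden

# Hypothetical 3-atom chain 0-1-2 with distinct values on each directed bond.
bonds = [(0, 1), (1, 2)]
hidden = {(0, 1): 1.0, (1, 0): 10.0, (1, 2): 100.0, (2, 1): 1000.0}
updated = directed_message_step(hidden, bonds)
```

In this toy chain, the bond 1→2 picks up the value from 0→1 but nothing from 2→1, which is the non-backtracking behavior the table credits with reducing noise.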
Mean absolute errors obtained from several benchmark methods on 12 different properties using the QM9 molecular dataset. Bold represents the lowest mean absolute errors among the models. * represents the property trained for respective atomization energies. Target corresponds to the chemical accuracy for each property desired from the predictive ML models.
| Property | Units | MPNN | SchNet-edge | SchNet | MEGNet | Target |
|---|---|---|---|---|---|---|
| HOMO | eV | 0.043 | | 0.041 | 0.038 ± 0.001 | 0.043 |
| LUMO | eV | 0.037 | | 0.034 | | 0.043 |
| band gap | eV | 0.069 | | 0.063 | 0.061 ± 0.001 | 0.043 |
| ZPVE | meV | 1.500 | 1.490 | 1.700 | | 1.200 |
| dipole moment | Debye | 0.030 | | 0.033 | 0.040 ± 0.001 | 0.100 |
| polarizability | Bohr³ | 0.092 | | 0.235 | 0.083 ± 0.001 | 0.100 |
| ⟨R²⟩ | Bohr² | 0.180 | | 0.073 | 0.265 ± 0.001 | 1.200 |
| U0 | eV | 0.019 | 0.011* | 0.014 | | 0.043 |
| U | eV | 0.019 | 0.016* | 0.019 | | 0.043 |
| H | eV | 0.017 | 0.011* | 0.014 | | 0.043 |
| G | eV | 0.019 | 0.012* | 0.014 | | 0.043 |
| Cv | cal/(mol K) | 0.040 | 0.032 | 0.033 | | 0.050 |
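As a small worked illustration of how the entries in this table are read, the sketch below defines the mean absolute error and compares a few of the MPNN values from the table against the stated targets. The helper name `mean_absolute_error`, the choice of "less than or equal to the target" as the criterion for reaching chemical accuracy, and the toy prediction arrays in the usage lines are assumptions made for illustration.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE: average absolute difference between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

# MPNN MAEs and targets copied from the table (same units within each property).
mpnn_mae = {"HOMO": 0.043, "LUMO": 0.037, "band gap": 0.069,
            "ZPVE": 1.500, "dipole moment": 0.030}
target = {"HOMO": 0.043, "LUMO": 0.043, "band gap": 0.043,
          "ZPVE": 1.200, "dipole moment": 0.100}

# Assumed criterion: a model reaches the target when its MAE does not exceed it.
meets_target = {prop: mpnn_mae[prop] <= target[prop] for prop in mpnn_mae}

# Toy usage of the MAE helper on made-up reference and predicted values.
toy_mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```

Under this criterion, the listed MPNN values reach the target for HOMO, LUMO, and dipole moment but not for band gap or ZPVE.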
Figure 5. Generative models such as 3D-scaffold [69] can be used to inverse design novel candidates with desired target properties, starting from a core scaffold or functional group.
Figure 6. Molecular modeling methods used to study protein–ligand interactions, including molecular docking simulations, molecular mechanics methods, hybrid quantum mechanics/molecular mechanics simulations, and deep learning models for activity and affinity prediction.