Abstract
The availability of computers has brought novel prospects in drug design. Neural networks (NN) were an early tool that cheminformatics tested for converting data into drugs. However, the initial interest faded for almost two decades. The recent success of Deep Learning (DL) has inspired a renaissance of neural networks for their potential application in deep chemistry. DL targets direct data analysis without any human intervention. Although back-propagation NN is the main algorithm in the DL that is currently being used, unsupervised learning can be even more efficient. We review self-organizing maps (SOM) in mapping molecular representations from the 1990s to the current deep chemistry. We discovered the enormous efficiency of SOM not only for features that could be expected by humans, but also for those that are not trivial to human chemists. We reviewed the DL projects in the current literature, especially unsupervised architectures. DL appears to be efficient in pattern recognition (Deep Face) or chess (Deep Blue). However, an efficient deep chemistry is still a matter for the future. This is because the availability of measured property data in chemistry is still limited.
Keywords: deep chemistry; deep learning; drug design; feature engineering; feature learning; molecular representation; self-organizing maps; supervised learning; unsupervised learning
Year: 2022 PMID: 35269939 PMCID: PMC8910896 DOI: 10.3390/ijms23052797
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The direct drug design problem can be defined as mapping property to structure (P → S)3. In practice, it is mainly realized in the indirect mode, by structure to property mapping (S → P). Individual methods allow various domains to be included, (S → P)1 or (S → P)2. Domain diversity is indicated schematically by colors.
Figure 2. Supervised learning vs. unsupervised learning architectures. Both modes demand optimization. In supervised learning, the inputs carry a label, which is used to estimate the error between the label and the output value; in unsupervised learning, the error is minimized by comparing the unlabeled inputs.
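The distinction drawn in this caption can be sketched in a few lines of NumPy. The example below is illustrative only and is not taken from the reviewed work: the function names, the two synthetic clusters (standing in for molecular descriptors), the four-unit 1-D chain, and the learning-rate and radius schedules are all assumptions. It contrasts a label-based squared error with the label-free quantization error that a Kohonen SOM reduces during training.

```python
import numpy as np


def supervised_error(weights, x, label):
    """Supervised mode: the error compares the model output with a label."""
    output = float(np.dot(weights, x))
    return (label - output) ** 2


def quantization_error(codebook, data):
    """Unsupervised mode: mean distance from each unlabeled input to its
    best-matching unit (BMU); no label enters the calculation."""
    dists = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def train_som(data, codebook, epochs=20, seed=0):
    """Train a 1-D chain of SOM units with a shrinking Gaussian neighbourhood.

    The schedules below are assumed values chosen for this toy example.
    """
    rng = np.random.default_rng(seed)
    codebook = codebook.copy()
    unit_index = np.arange(len(codebook))
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs      # decays from 1 towards 0
        rate = 0.5 * frac                # learning rate
        radius = 0.1 + frac              # neighbourhood width
        for x in data[rng.permutation(len(data))]:
            bmu = np.argmin(np.linalg.norm(codebook - x, axis=1))
            influence = np.exp(-((unit_index - bmu) ** 2) / (2.0 * radius ** 2))
            # Every unit moves towards the input, scaled by its grid
            # distance from the winning unit.
            codebook += rate * influence[:, None] * (x - codebook)
    return codebook


rng = np.random.default_rng(42)
# Two synthetic, unlabeled clusters (hypothetical data).
data = np.vstack([
    rng.normal(0.0, 0.02, size=(20, 2)),
    rng.normal(1.0, 0.02, size=(20, 2)),
])
initial = rng.random((4, 2))
trained = train_som(data, initial)

print("supervised squared error:",
      supervised_error(np.array([1.0, 2.0]), np.array([3.0, 4.0]), 10.0))
print("quantization error before training:", quantization_error(initial, data))
print("quantization error after training: ", quantization_error(trained, data))
```

In the supervised function the label is an explicit argument; in the SOM functions the only quantities compared are the inputs and the codebook vectors that represent them.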
Figure 3. Propane vs. butane, colored by methyl (yellow or blue in butane, yellow or green in propane) and methylene (red or green in butane, red in propane) fragments (a), give two types of CoMSA (SOM) projections (b), depending upon the SOM network regulation. The two types of patterns (b) can be explained by fuzzy topology (c). Details in text.
Figure 4. A series of CBG steroid surface data projected by CoMSA (SOM) without superimposition [21]. Without a single misinterpretation, H (high) and M (medium) activity compounds can be differentiated from the L (low) activity compounds. Details in text. Copyright © 1996 Polish Chemical Society.
Recent DL applications in drug design.
| Problem | Data/Learning Type | Reference |
|---|---|---|
| DNA subregion binding | In vitro HTS/convolutional neural networks | [ |
| Protein function | 3D electron density/convolutional filters | [ |
| Genomics | Gene expression contrastive divergence (unsupervised) [ | [ |
| Pharmacodynamics (DeepDTI) | Drug-protein interaction/unsupervised/then supervised [ | [ |
| DeepAffinity | Compound-protein affinity/supervised | [ |
| DeepTox toxicity | Toxic data/multi-task networks (supervised) | [ |
| Drug IC50 | Mol. descriptors/supervised | [ |
| VAE chemical properties | SMILES; molecular graphs/unsupervised | [ |
| VAE/GENTRL DDR1 small molecule design | SMILES; Kohonen-SOM based reward function/semi-supervised | [ |
| VAE/Graph encoders | Molecular graphs/unsupervised | [ |
| Protein-ligand pair | SMILES; voxels/unsupervised | [ |
| CMap/gen perturbagens | Gene-expression profiles/unsupervised | [ |
| Scaffold generation | Molecular graphs; physicochemical properties; fragments/unsupervised | [ |
Figure 5. Automatic chemical design using a data-driven continuous representation of molecules. In the critical operation of the latent space formation, the architecture analyzes the similarity of the SMILES codes of the candidate and the known inhibitor structures. A deep neural network involves three coupled functions: an encoder, a decoder (a) and a predictor (b) [45]. Copyright © 2018 American Chemical Society.
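The data flow between the three coupled functions named in this caption can be sketched as follows. This is a structural illustration only, not the architecture of [45]: the weights are random and untrained, the layers are single linear maps, the encoder is deterministic rather than variational, and the toy alphabet, sizes, and example SMILES strings are all assumptions.

```python
import numpy as np

ALPHABET = "CNO()=1 "   # hypothetical toy SMILES alphabet; the space pads
MAX_LEN = 8             # assumed fixed string length
LATENT = 3              # assumed latent dimensionality
N_IN = MAX_LEN * len(ALPHABET)

rng = np.random.default_rng(0)
# Untrained random weights: they show the wiring, not a learned model.
W_enc = rng.normal(scale=0.1, size=(LATENT, N_IN))
W_dec = rng.normal(scale=0.1, size=(N_IN, LATENT))
w_pred = rng.normal(scale=0.1, size=LATENT)


def one_hot(smiles):
    """Flatten a padded SMILES string into a one-hot vector."""
    x = np.zeros((MAX_LEN, len(ALPHABET)))
    for i, ch in enumerate(smiles.ljust(MAX_LEN)):
        x[i, ALPHABET.index(ch)] = 1.0
    return x.ravel()


def encode(smiles):
    """Encoder: discrete SMILES string -> continuous latent vector."""
    return np.tanh(W_enc @ one_hot(smiles))


def decode(z):
    """Decoder: latent vector -> most likely character at each position."""
    logits = (W_dec @ z).reshape(MAX_LEN, len(ALPHABET))
    return "".join(ALPHABET[i] for i in logits.argmax(axis=1))


def predict(z):
    """Predictor: latent vector -> scalar property estimate."""
    return float(w_pred @ z)


known = encode("CC(=O)N")      # stands in for a known inhibitor
candidate = encode("C1CC1")    # stands in for a candidate structure
print("latent distance:", float(np.linalg.norm(known - candidate)))
print("decoded string: ", repr(decode(candidate)))
print("predicted value:", predict(candidate))
```

Because the weights are untrained, the decoded string and predicted value are meaningless here; the point is that similarity between a candidate and a known structure is measured in the continuous latent space, and that the decoder and predictor both read from that same vector.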