Manish Kumar Tripathi¹, Abhigyan Nath², Tej P Singh¹, A S Ethayathulla¹, Punit Kaur³
Abstract
The accumulation of massive data in the plethora of cheminformatics databases has made big data and artificial intelligence (AI) indispensable in drug design. This has necessitated the development of newer algorithms and architectures to mine these databases and fulfil the specific needs of various drug discovery processes, such as virtual drug screening and de novo molecule design and discovery, in this big data era. The development of deep learning neural networks and their variants, together with the corresponding increase in chemical data, has resulted in a paradigm shift in information mining pertaining to the chemical space. The present review summarizes the role of big data and AI techniques currently being implemented to satisfy the ever-increasing research demands in drug discovery pipelines.
Keywords: Artificial intelligence; Autoencoders; Big data; Deep learning; Drug discovery; Machine learning
Year: 2021 PMID: 34159484 PMCID: PMC8219515 DOI: 10.1007/s11030-021-10256-w
Source DB: PubMed Journal: Mol Divers ISSN: 1381-1991 Impact factor: 3.364
Fig. 1 Growth of machine learning with the subsequent increase in big data and computation power. KB, kilobyte; MB, megabyte; CPU, central processing unit; GPU, graphics processing unit; HTS, high-throughput sequencing
Data sources used in drug discovery
| S. no. | Database | Data type | Data size (as of 15 March 2021) |
|---|---|---|---|
| 1 | ChEMBL | Chemical database containing bioactive and drug-like molecules | 1,800,000 compounds |
| 2 | PubChem | Chemical database containing chemicals and their activities against biological targets | 110 million compounds, 271 million substances, 297 million bioactivities |
| 3 | Small Molecule Pathway Database (SMPDB) | Database of small-molecule pathways in humans | 48,690 pathways |
| 4 | ZINC | Database containing curated chemical compounds | > 750 million compounds |
| 5 | Human Metabolome Database (HMDB) | Database containing chemical, clinical and molecular biology/biochemistry data | 114,304 metabolite entries |
| 6 | Binding Database (BindingDB) | Database of binding affinities between small molecules and protein targets | 2,240,573 binding measurements for 8503 protein targets and 971,073 small molecules |
| 1 | DrugBank | Database containing information about drugs and drug targets | 13,441 drugs |
| 2 | Drugs@FDA | Database of FDA-approved drug molecules | 1600 FDA-approved drugs |
| 3 | DrugCentral | Database containing active chemical entities, pharmaceutical products and drug modes of action | 4642 drugs, 110,577 pharmaceutical products |
| 1 | SuperTarget | Database containing drug–target information | 332,828 drug–target interactions |
| 2 | Ligand Depot | Database containing chemical and structural information about small molecules | 30,480 ligand entries |
| 3 | BioGRID | Database containing information about protein, genetic and chemical interactions | 2,015,809 protein and genetic interactions, 29,093 chemical interactions and 1,017,123 post-translational modifications |
| 1 | Database of Interacting Proteins (DIP) | Database containing protein–protein interaction information | 40,678 interactions |
| 2 | Therapeutic Target Database (TTD) | Database containing information on known therapeutic protein and nucleic acid targets | 2458 protein targets, 5059 patented drugs |
| 3 | Potential Drug Target Database (PDTD) | Database containing information about drug targets with known 3D structures | 1100 entries covering > 800 known potential drug targets |
| 1 | BioCyc | Collection of organism-specific Pathway/Genome Databases | 18,030 Pathway/Genome Databases |
| 2 | BRENDA | Database of enzyme function data | 84,000 enzyme entries |
| 3 | Reactome | Curated database of biological pathways, including metabolic, protein trafficking and signalling pathways | > 9600 proteins, 9800 reactions and 2000 pathways for humans |
| 4 | KEGG | Database of genomic, chemical and functional information | 18,778 chemical compounds/metabolites, 7062 genomes, 1312 networks |
| 1 | Comparative Toxicogenomics Database (CTD) | Database containing information about the effects of environmental exposures on human health | 2.7 million manually curated chemical–gene, chemical–phenotype, chemical–disease, gene–disease and chemical–exposure interactions |
| 2 | TOXNET | Database containing hazardous substance data | 5800 chemical substances |
| 3 | DrugMatrix | Database of toxicogenomic reference resources | 600 drug molecules and 10,000 genes |
Different classes of descriptors with their examples
| S. no. | Descriptor class | Examples |
|---|---|---|
| 1 | 0D or count descriptors | Atom counts, bond counts, molecular weight |
| 2 | 1D or fingerprints | Molecular weight |
| 3 | 2D or topological descriptors | Atom and bond counts, connectivity between atoms, pharmacophore features, adjacency and distance matrices, molecular fingerprints |
| 4 | 3D or geometrical descriptors | Potential energy, surface area, volume and shape, conformational charge |
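Count (0D) descriptors need only the molecular formula, so they can be illustrated directly. The sketch below rests on a few assumptions:

- `parse_formula` and `count_descriptors` are hypothetical helper names.
- The atomic weights are rounded standard values.
- The parser handles only flat formulas such as `C9H8O4`, with no parentheses, isotopes or charges.

Bond counts are left out because they need a connection table rather than a formula.

```python
import re
from collections import Counter

# Rounded standard atomic weights for elements common in drug-like molecules (assumed subset).
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat molecular formula such as 'C9H8O4' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def count_descriptors(formula: str) -> dict:
    """Simple 0D (count) descriptors that need no structural information."""
    counts = parse_formula(formula)
    return {
        "atom_count": sum(counts.values()),
        "heavy_atom_count": sum(n for element, n in counts.items() if element != "H"),
        "molecular_weight": round(sum(ATOMIC_WEIGHTS[e] * n for e, n in counts.items()), 3),
    }


print(count_descriptors("C9H8O4"))  # aspirin
# {'atom_count': 21, 'heavy_atom_count': 13, 'molecular_weight': 180.159}
```

In practice, cheminformatics toolkits compute these descriptors, and the 1D to 3D classes, from a full structure such as a SMILES string rather than from a formula.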
Fig. 2 Workflow of the machine learning (ML) process in drug discovery
Fig. 3 (a) Deep learning neural network (DLNN) without dropout. (b) Deep learning neural network (DLNN) with dropout
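Dropout, contrasted in Fig. 3, randomly silences hidden units during training so that the network cannot rely on any single unit. At inference the full network is used. Below is a minimal plain-Python sketch of the common "inverted dropout" variant. It assumes one layer's activations are held in a list, and `dropout` is a hypothetical helper rather than a specific library's API.

```python
import random


def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: during training, zero each unit with probability p and
    scale the survivors by 1 / (1 - p); at inference, pass activations through."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # inference: every unit is kept
    scale = 1.0 / (1.0 - p)
    # rng.random() >= p happens with probability 1 - p, i.e. the unit is kept
    return [a * scale if rng.random() >= p else 0.0 for a in activations]


hidden = [0.5, 1.2, -0.3, 0.8, 2.0, -1.1]
print(dropout(hidden, p=0.5, rng=random.Random(42)))  # random subset zeroed, rest doubled
print(dropout(hidden, p=0.5, training=False))         # unchanged at inference
```

Scaling the surviving units at training time keeps each unit's expected output equal to its inference-time value. As a result, no rescaling is needed when the trained network is used for prediction.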
Fig. 4 De novo chemical design using generative adversarial networks (GANs)
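In a GAN, a generator $G$ proposes candidate samples (here, molecules) and a discriminator $D$ tries to tell them apart from real training molecules. For reference, the original GAN formulation trains both networks on the minimax objective below:

- $G$ maps a noise vector $z \sim p_z$ to a candidate sample.
- $D(x)$ is the discriminator's estimated probability that $x$ comes from the training data $p_{\text{data}}$.
- $D$ is trained to maximize $V$, and $G$ is trained to minimize it.

The figure does not state which loss the depicted models use, and molecular GANs often modify or augment this objective. Treat it as background rather than as the exact method.

$$
\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
$$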
Fig. 5 Representation of an autoencoder. The green circles represent the hidden layer
Fig. 6 A deep autoencoder with hidden layers. The hierarchical representations from the hidden layers can be used as features in training learning algorithms
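An autoencoder has three parts:

- An encoder compresses the input into a smaller hidden layer.
- A decoder reconstructs the input from that layer.
- The hidden code can then be reused as a learned feature vector.

The sketch below shows these mechanics in the simplest setting. It assumes a linear 2-1-2 network (two inputs, one hidden unit, two outputs) trained by full-batch gradient descent on toy data. It is an illustration only: molecular autoencoders are nonlinear, much deeper, and trained on real descriptors or SMILES. All function names here are hypothetical.

```python
def encode(w_enc, x):
    """Encoder: project a 2-D input onto a single hidden unit (the bottleneck)."""
    return w_enc[0] * x[0] + w_enc[1] * x[1]


def decode(w_dec, h):
    """Decoder: map the 1-D hidden code back to a 2-D reconstruction."""
    return [w_dec[0] * h, w_dec[1] * h]


def reconstruction_error(data, w_enc, w_dec):
    """Mean (over samples) of the summed squared reconstruction error."""
    total = 0.0
    for x in data:
        x_hat = decode(w_dec, encode(w_enc, x))
        total += sum((x_hat[i] - x[i]) ** 2 for i in range(2))
    return total / len(data)


def train_linear_autoencoder(data, epochs=500, lr=0.05):
    """Full-batch gradient descent on the reconstruction error defined above."""
    w_enc, w_dec = [0.1, 0.1], [0.1, 0.1]  # small, fixed initial weights
    n = len(data)
    for _ in range(epochs):
        g_enc, g_dec = [0.0, 0.0], [0.0, 0.0]
        for x in data:
            h = encode(w_enc, x)
            r = [xh - xi for xh, xi in zip(decode(w_dec, h), x)]  # residuals
            dh = 2 * (r[0] * w_dec[0] + r[1] * w_dec[1])  # d(loss)/dh
            for i in range(2):
                g_dec[i] += 2 * r[i] * h / n  # d(loss)/d(w_dec[i])
                g_enc[i] += dh * x[i] / n     # chain rule through h
        w_enc = [w - lr * g for w, g in zip(w_enc, g_enc)]
        w_dec = [w - lr * g for w, g in zip(w_dec, g_dec)]
    return w_enc, w_dec


# Toy 2-D inputs lying on the line x2 = 2 * x1, so one hidden unit suffices.
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
w_enc, w_dec = train_linear_autoencoder(data)
print(reconstruction_error(data, w_enc, w_dec))  # close to zero after training
features = [encode(w_enc, x) for x in data]  # hidden codes reused as learned features
```

Because every toy input lies on one line, a single hidden unit can reconstruct them almost exactly. `features` holds the one-number codes that a downstream model could use in place of the raw inputs, which is the role Fig. 6 assigns to hidden-layer representations.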
Fig. 7 Schematic representation of the stacking ensemble approach
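In stacking, level-0 base models make predictions, and a level-1 meta-learner is trained to combine them. Below is a minimal pure-Python sketch built on hypothetical assumptions:

- Two base learners, each a one-descriptor linear regression.
- A least-squares meta-learner trained on out-of-fold predictions, so it never sees predictions on samples the base models were fitted to.

The review does not specify which base learners or meta-learner it has in mind. The toy data and all names are illustrative.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b for a single input variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


def make_base_learner(col):
    """Hypothetical level-0 learner: a linear model using only descriptor `col`."""
    def fit(rows, ys):
        a, b = fit_line([r[col] for r in rows], ys)
        return lambda r: a * r[col] + b
    return fit


BASE_LEARNERS = [make_base_learner(0), make_base_learner(1)]


def out_of_fold_predictions(rows, ys, k=5):
    """Base-model predictions for each sample, made by models that never saw it."""
    preds = [[0.0] * len(BASE_LEARNERS) for _ in rows]
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        held_out = [i for i in range(len(rows)) if i % k == fold]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit([rows[i] for i in train], [ys[i] for i in train])
            for i in held_out:
                preds[i][j] = model(rows[i])
    return preds


def fit_meta_learner(preds, ys):
    """Level-1 learner: least-squares weights (w1, w2) for y ~ w1 * p1 + w2 * p2."""
    s11 = sum(p[0] * p[0] for p in preds)
    s22 = sum(p[1] * p[1] for p in preds)
    s12 = sum(p[0] * p[1] for p in preds)
    t1 = sum(p[0] * y for p, y in zip(preds, ys))
    t2 = sum(p[1] * y for p, y in zip(preds, ys))
    det = s11 * s22 - s12 * s12  # 2x2 normal equations solved by Cramer's rule
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def fit_stacking(rows, ys, k=5):
    """Train the meta-learner on out-of-fold predictions, then refit base models on all data."""
    weights = fit_meta_learner(out_of_fold_predictions(rows, ys, k), ys)
    models = [fit(rows, ys) for fit in BASE_LEARNERS]
    return lambda r: sum(w * m(r) for w, m in zip(weights, models))


def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)


# Toy data: the target depends on both descriptors, so neither base model suffices alone.
rng = random.Random(0)
rows = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
ys = [2 * a + 3 * b for a, b in rows]
predict = fit_stacking(rows, ys)
print(predict((0.1, 0.2)))
```

Training the meta-learner on out-of-fold predictions is the key design choice. If it were fitted on in-sample base predictions, it would learn to trust overfitted base models.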
Fig. 8 Role of AI technology in different phases of drug discovery
AI computational tools for drug design
| S. no. | Tool | Function and algorithm used |
|---|---|---|
| 1 | AlphaFold | Predicts protein tertiary structure using a deep neural network |
| 2 | Chemputer | Gives detailed recipes for compound synthesis |
| 3 | Conv_qsar_fast | Predicts molecular properties using a convolutional neural network (CNN) |
| 4 | Chemical VAE | Automated chemical design using a variational autoencoder (VAE) |
| 5 | DeepChem | Open-source Python library that applies deep learning to compound identification |
| 6 | DeepTox | Predicts the toxicity of chemical compounds using deep learning |
| 7 | DeepNeuralNetQSAR | Predicts molecular activity using multilevel deep neural networks (DNNs) |
| 8 | DeltaVina | Predicts small-molecule binding affinity by combining random forest (RF) with the AutoDock Vina scoring function |
| 9 | Hit Dexter | Predicts frequent hitters using machine learning (ML) |
| 10 | InnerOuterRNN | Predicts physical, chemical and biological properties of molecules using inner and outer recursive neural networks |
| 11 | JunctionTree VAE | De novo molecule design using a junction tree variational autoencoder (VAE) |
| 12 | Neural graph fingerprint | Predicts properties of novel compounds using a convolutional neural network (CNN) on molecular graphs |
| 13 | NNScore | Predicts protein–ligand binding affinity using a neural network-based scoring function |
| 14 | ORGANIC | De novo design of organic molecules and polymers using ML |
| 15 | Open Drug Discovery Toolkit (ODDT) | Cheminformatics pipeline incorporating the random forest-based RF-Score and NNScore scoring functions |
| 16 | PotentialNet | Predicts binding affinity using a graph convolutional neural network |
| 17 | PPB2 | Predicts the targets of a query molecule using nearest-neighbour search and machine learning |
| 18 | QML | Python toolkit for quantum machine learning |
| 19 | REINVENT | De novo molecule design using a recurrent neural network (RNN) and reinforcement learning (RL) |
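PPB2 is listed as predicting targets by nearest-neighbour search. The core idea of similarity-based target prediction can be sketched as follows. The sketch assumes fingerprints are stored as sets of on-bit indices and compared with Tanimoto similarity. The reference ligands, bit indices and target labels are hypothetical, and the actual PPB2 method is more elaborate than this and is not reproduced here.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints given as sets of on-bit indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_targets(query, reference, k=2):
    """Score each target by the best similarity among the query's k nearest reference ligands."""
    neighbours = sorted(reference, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    scores = {}
    for fingerprint, target in neighbours:
        scores[target] = max(scores.get(target, 0.0), tanimoto(query, fingerprint))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical reference ligands: (fingerprint on-bits, annotated target).
REFERENCE = [
    ({1, 4, 7, 9, 12}, "target_A"),
    ({1, 4, 7, 10, 15}, "target_A"),
    ({2, 5, 8, 11, 14}, "target_B"),
    ({3, 6, 8, 13, 14}, "target_C"),
]

print(predict_targets({1, 4, 7, 9, 15}, REFERENCE))
# [('target_A', 0.6666666666666666)]
```

The query shares four of six combined on-bits with each target_A ligand and none with the others. Both nearest neighbours therefore point to target_A.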
Collaborations of AI organizations with pharmaceutical companies
| S. no. | Company | Role of AI | Collaboration with the pharmaceutical company | Platform developed/Clinical trial candidates |
|---|---|---|---|---|
| 1 | Numerate | A platform for AI-based drug design in oncology and gastroenterology | Takeda | Drug candidate S48168 in Phase 1 clinical trial against ryanodine receptor 2 |
| 2 | Numerate | A platform for AI-based drug design in oncology and gastroenterology | Servier | Drug development for oncology, gastroenterology and central nervous system disorders |
| 3 | Atomwise | A platform for AI-based structure modelling | Lilly | Drug candidate BBT-401 in Phase 2 clinical trial |
| 4 | Atomwise | A platform for AI-based structure modelling | Bridge Biotherapeutics | Expansion of Pellino inhibitor pipeline; BBT-401 in Phase 2a clinical trial |
| 5 | BenevolentAI | AI-enabled Judgement Augmented Cognition System (JACS) for developing novel clinical candidates against neurodegenerative diseases | Janssen | New range of drug molecules to be developed through this collaboration |
| 6 | BenevolentAI | AI-enabled platform for developing novel clinical candidates against chronic kidney disease | AstraZeneca | Drug candidate Placebo in Phase 2b clinical trial for chronic kidney disease |
| 7 | Exscientia | A platform for AI-based drug discovery and lead optimization | Sanofi | Research in obsessive–compulsive disorder; drug candidate DSP-1181 in Phase 1 clinical trial; developed the Centaur Chemist™ platform for AI-based drug discovery |
| 8 | IBM Watson Health | Provides a platform for clinical and health-related data research | Pfizer | Fast-tracking drug discovery research in immuno-oncology |
| 9 | IBM Watson Health | Provides a platform for clinical and health-related data research | Novartis | Real-time patient monitoring to improve breast cancer outcomes |
| 10 | Microsoft | A platform for image processing and cell- and gene-based therapeutics | Novartis | Establishing an AI Innovation Lab to transform drug discovery and its commercialization |
| 11 | Owkin | Provides a platform for clinical trials based on ML technology | Roche | Developed Owkin’s Studio platform using AI technology |
| 12 | Sensyne Health | A platform for clinical AI technology | Bayer | Developed Sensyne Health’s proprietary clinical AI technology platform |
| 13 | XtalPi | A platform for target identification and validation based on quantum mechanics (QM) and ML algorithms | Pfizer | Prediction and optimization of crystalline forms of drug candidates for early drug screening |
| 14 | BioXcel Therapeutics | A platform for drug discovery applications using AI technology | Pfizer | Drug candidate BXCL501 in Phase 3 clinical trial; drug candidate BXCL701 in Phase 2 clinical trial |