Marcelo C R Melo, Jacqueline R M A Maasch, Cesar de la Fuente-Nunez.
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotic development from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Year: 2021 PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Table 1. Databases for computational antibiotic discovery.
| Database | Site |
|---|---|
| General drug discovery and biomolecular informatics | |
| Binding MOAD | |
| BindingDB | |
| BRENDA | |
| ChEMBL | |
| Drug Design Data Resource | |
| Drug Repurposing Hub | |
| DrugBank | |
| MoleculeNet | |
| Protein Data Bank | |
| PubChem | |
| Search Tool for Interacting Chemicals | |
| Side Effect Resource | |
| SuperTarget | |
| Therapeutics Data Commons | |
| Therapeutic Target DB | |
| UniProt | |
| ZINC | |
| Exclusively infectious disease | |
| ADAM | |
| ADAPTABLE | |
| Collection of Antimicrobial Peptides | |
| Data Repository of Antimicrobial Peptides | |
| DB of Antimicrobial Activity and Structure of Peptides | |
| dbAMP | |
| MEGARes: Antimicrobial DB for High-Throughput Sequencing | |
| National DB of Antibiotic-Resistant Organisms | |
| Pathosystems Resource Integration Center | |
| Tropical Disease Research Targets | |
Public databases (DB) of general use in computational drug discovery and biomolecular informatics, as well as those specific to antimicrobial discovery and resistance.
Fig. 1. Computational antibiotic discovery pipeline.
The figure provides an overview of data and methods used in antibiotic discovery and development using AI. From left to right, key elements in the drug development process are exemplified. The first part of any AI-driven project is gathering the experimental information that will enable model creation. The data are then transformed into AI-ready representations. Subsequently, models are trained using algorithms that can range from traditional decision trees to novel neural networks. Finally, trained models can be used to predict diverse qualities, e.g., the effectiveness of an antibiotic, potential for toxic activity, development of resistance, or the structure of novel compounds that exhibit desirable traits.
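The four stages described in this caption (gather data, build a representation, train a model, predict) can be sketched end to end in a few lines. The example below is only an illustration under stated assumptions, not a method from any study cited in the review: the peptide strings and labels are made-up toy data, amino-acid composition stands in for the "AI-ready representation", a nearest-centroid rule stands in for the trained model, and all function names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Represent a peptide as its 20-dimensional amino-acid frequency vector."""
    counts = Counter(seq.upper())
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def train(labeled_seqs):
    """'Train' by storing one centroid per class label."""
    by_label = {}
    for seq, label in labeled_seqs:
        by_label.setdefault(label, []).append(composition(seq))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, seq):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    x = composition(seq)

    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(model, key=lambda label: dist(model[label]))


# Step 1: gather data (toy, made-up sequences; labels are illustrative only).
toy_data = [
    ("KKLLKKLLKK", "active"),
    ("KRLLKRLLKR", "active"),
    ("DDEEGGSSDD", "inactive"),
    ("EEDDSSGGEE", "inactive"),
]

# Steps 2-3: featurize and train. Step 4: predict on an unseen sequence.
model = train(toy_data)
print(predict(model, "KKLLRRLLKK"))
```

In practice the representation and the learner would each be replaced by something far richer (for example, learned embeddings and a neural network), but the division of labor between the stages stays the same.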
Table 2. Machine learning models for antibiotic discovery.
| Algorithm | Code released | Data released | Software released | Software type |
|---|---|---|---|---|
| Antimicrobial activity prediction | | | | |
| Artificial neural network | Yes | | | |
| Support vector machine | Yes | | | |
| Multinomial logistic regression | Yes | | | |
| LSTM RNN | Yes | Yes | Yes | Command-line tool |
| XGBoost | Yes | Yes | Yes | Command-line tool |
| Directed-message passing neural network | Yes | Yes | Yes | Web server, Docker container |
| DBSCAN | Yes | Yes | Web server | |
| DBSCAN | Yes | Web server | | |
| Convolutional neural network | Yes | Yes | Web server | |
| Generalized linear model | | | | |
| Random forest | | | | |
| Hemolytic activity prediction | | | | |
| Classification and regression trees | Yes | | | |
| Artificial neural network | Yes | Yes | Web server | |
| Gradient boosting classifiers | Yes | Yes | | |
| Support vector machine | Yes | Yes | Web server, mobile app, standalone | |
| De novo antibiotic design | | | | |
| Variational autoencoder | Yes | | | |
| LSTM RNN | Yes | Yes | Yes | Command-line tool |
| LSTM RNN | | | | |
| Generative adversarial network | Yes | Yes | Yes | Command-line tool |
Machine learning models cited in this review pertain specifically to antimicrobial compound discovery, i.e., those that predict antimicrobial activity, those trained on antimicrobial compound data to predict drug-likeness, and those that generate potential antimicrobials. Public release of model source code, training and/or testing data, and associated software tools is noted for each model. Criteria for data release were lenient, with “yes” indicating partial or full release of training or testing data.
Fig. 2. Open science practices in machine learning for antibiotic discovery.
This Euler diagram visualizes public release rates for source code, training or testing data, software, and combinations thereof among publications cited in this review (Table 2). Note that data release criteria for this analysis include both partial and full public availability. This analysis was performed post hoc on studies previously cited in this review.
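The tally behind an Euler diagram of this kind can be sketched as a count of publications per disjoint combination of released items. The snippet below is a minimal sketch under assumptions: the `studies` records are placeholders rather than the review's data, and the function names `release_regions` and `release_rate` are hypothetical.

```python
from collections import Counter

# Hypothetical records: one dict per publication, True where the item was
# publicly released. These values are placeholders, not the review's data.
studies = [
    {"code": True, "data": True, "software": True},
    {"code": False, "data": True, "software": False},
    {"code": False, "data": True, "software": True},
    {"code": False, "data": False, "software": False},
]


def release_regions(records, keys=("code", "data", "software")):
    """Count publications in each disjoint region of the Euler diagram.

    Each region is identified by the frozenset of items a publication
    released; the empty frozenset means nothing was released.
    """
    return Counter(
        frozenset(k for k in keys if rec.get(k)) for rec in records
    )


def release_rate(records, key):
    """Fraction of publications that released the given item."""
    return sum(1 for rec in records if rec.get(key)) / len(records)


regions = release_regions(studies)
for combo, n in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(combo) or ["none"], n)
print("data release rate:", release_rate(studies, "data"))
```

Keying each region by the exact set of released items keeps the regions mutually exclusive, so the counts sum to the number of publications; per-item release rates are computed separately because they overlap.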
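Fig. 3. Machine learning in antibiotic discovery over time.
From top to bottom: total PubMed results when querying for AI/ML keywords only, total results when querying for AI/ML and general or disease group-specific drug keywords, and the proportion of general AI/ML publications pertaining to each category of drugs (i.e., total publication counts per drug category scaled by total AI/ML publications per year). Queries sought keywords in titles and abstracts only, with the general drug query excluding keywords contained in the disease group queries to prevent double-counting. Key events in the broader ML community are noted to contextualize trend lines. The relevant literature used to set key dates is as follows: development of SVM[146] and random forest algorithms[147] in 1995; publication of the R language and software environment in 1996[148]; development of LSTM in 1997[149]; development of the Biopython package in 2000[150]; release of the Java interface for Weka in 2002[151]; publication of the Torch library in 2002[152]; release of Bioconductor in 2004[153]; the publication of ImageNet in 2009[154]; the initial release of Scikit-learn in 2010[155]; the initial release of XGBoost[156] and development of GANs[102] in 2014; development of Keras[157] and TensorFlow[158] in 2015; and the initial release of PyTorch in 2016[159]. Exact Boolean searches in PubMed can be found in Supplementary Table 1.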
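The scaling described for the bottom panel (per-category publication counts divided by total AI/ML publications in the same year) amounts to a per-year ratio. The following is a minimal sketch of that arithmetic, assuming counts are already available as dictionaries; the numbers are placeholders, not PubMed results, and the function name `yearly_proportions` is hypothetical.

```python
def yearly_proportions(category_counts, total_counts):
    """Scale per-category publication counts by total AI/ML counts per year.

    category_counts: {category: {year: count}}
    total_counts:    {year: count of all AI/ML publications}
    Years with zero or missing totals are skipped to avoid division by zero.
    """
    return {
        category: {
            year: n / total_counts[year]
            for year, n in per_year.items()
            if total_counts.get(year)
        }
        for category, per_year in category_counts.items()
    }


# Placeholder numbers for illustration only; not PubMed results.
totals = {2019: 1000, 2020: 2000}
by_category = {
    "antibiotic": {2019: 10, 2020: 30},
    "general drug": {2019: 100, 2020: 250},
}

props = yearly_proportions(by_category, totals)
print(props["antibiotic"][2020])
```

Normalizing by the yearly AI/ML total separates growth specific to a drug category from the overall growth of the AI/ML literature.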
From top to bottom: total PubMed results when querying for AI/ML keywords only, total results when querying for AI/ML and general or disease group-specific drug keywords, and the proportion of general AI/ML publications pertaining to each category of drugs (i.e., total publication counts per drug category scaled by total AI/ML publications per year). Queries sought keywords in titles and abstracts only, with the general drug query excluding keywords contained in the disease group queries to prevent double-counting. Key events in the broader ML community are noted to contextualize trend lines. The relevant literature used to set key dates are as follows: development of SVM[146] and random forest algorithms[147] in 1995; publication of the R language and software environment in 1996[148]; development of LSTM in 1997[149]; development of the Biopython package in 2000[150]; release of the Java interface for Weka in 2002[151]; publication of the Torch library in 2002[152]; release of Bioconductor in 2004[153]; the publication of ImageNet in 2009[154]; the initial release of Scikit-learn in 2010[155]; the initial release of XGBoost[156] and development of GANs[102] in 2014; development of Keras[157] and TensorFlow[158] in 2015; and the initial release of PyTorch in 2016[159]. Exact Boolean searches in PubMed can be found in Supplementary Table 1.