Alice Capecchi, Xingguang Cai, Hippolyte Personne, Thilo Köhler, Christian van Delden, Jean-Louis Reymond.
Abstract
Machine learning (ML) consists of the recognition of patterns from training data and offers the opportunity to exploit large structure-activity databases for drug design. In the area of peptide drugs, ML is mostly being tested to design antimicrobial peptides (AMPs), a class of biomolecules potentially useful to fight multidrug-resistant bacteria. ML models have successfully identified membrane-disruptive amphiphilic AMPs, but mostly without addressing the associated toxicity to human red blood cells. Here we trained recurrent neural networks (RNN) with data from DBAASP (Database of Antimicrobial Activity and Structure of Peptides) to design short non-hemolytic AMPs. Synthesis and testing of 28 generated peptides, each at least 5 mutations away from training data, allowed us to identify eight new non-hemolytic AMPs against Pseudomonas aeruginosa, Acinetobacter baumannii, and methicillin-resistant Staphylococcus aureus (MRSA). These results show that ML can be used to design new non-hemolytic AMPs. This journal is © The Royal Society of Chemistry.
Year: 2021 PMID: 34349895 PMCID: PMC8285431 DOI: 10.1039/d1sc01713f
Source DB: PubMed Journal: Chem Sci ISSN: 2041-6520 Impact factor: 9.825
Fig. 1 (a) Strategy schematic. An AMP RNN generative model, an AMP RNN activity classifier, and a hemolysis RNN classifier were trained using activity (orange) and hemolysis (blue) data from DBAASP. (1) Two copies of the AMP RNN generative model (prior model) were fine-tuned by transfer learning on non-hemolytic peptides active against specific strains: P. aeruginosa/A. baumannii and S. aureus, respectively. (2) The fine-tuned models were sampled, and the generated sequences were classified first with the RNN AMP activity classifier and then with the RNN hemolysis classifier. (3) The selected sequences were further filtered to obtain short peptides of at most 15 residues, with at least five mutations from the sequences in DBAASP and no D-amino acids. Two different selection strategies were then used. In the first selection strategy (1st strategy), the sequences were further filtered by their calculated amphiphilicity, and the selected ones were clustered. In the second selection strategy (2nd strategy), 10 sequences were selected at random. (4) Finally, the 28 chosen sequences were synthesized and tested. (b, c) ROC curves of the test set for the NB, RF, SVM, RNN, and RNN with scrambled labels (RNN scr.) models for the AMP activity (b) and hemolysis (c) classification tasks. The probabilistic prediction values were converted into binary classification values using a threshold of 0.5.
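The classification and filtering steps (2) and (3) of the caption can be illustrated with a minimal sketch. It is not the authors' code. Several points are assumptions made for illustration only: "mutations" are counted as Levenshtein edit distance, D-amino acids are written as lowercase letters, a probability of exactly 0.5 counts as a positive prediction, and the names `levenshtein` and `passes_filters` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (substitutions, insertions, deletions) between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def passes_filters(seq, known_sequences, p_active, p_hemolytic,
                   max_len=15, min_mutations=5, threshold=0.5):
    """Return True if a generated sequence survives all filters of the sketch."""
    # Binary classification from probabilities (assumed: >= threshold is positive).
    if p_active < threshold or p_hemolytic >= threshold:
        return False
    # Short peptides only.
    if len(seq) > max_len:
        return False
    # Assumed convention: lowercase letters denote D-amino acids.
    if any(c.islower() for c in seq):
        return False
    # Novelty: at least `min_mutations` edits away from every known sequence.
    return all(levenshtein(seq, k) >= min_mutations for k in known_sequences)
```

Comparing each candidate against every database entry is quadratic in practice; for a large database, a length-based pre-filter (sequences whose lengths differ by at least five residues are necessarily at least five edits apart) would avoid most distance computations.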
Synthesis and activity of generated peptides
| cpd | Sequence | | | MRSA | MHC | |
|---|---|---|---|---|---|---|
| | RRWKWRRKIKKWL | | | 4 | | 16 |
| GN3 | IDKWKAAFKKIKNLF | | | 8 | | 8–16 |
| GN4 | LNALKKVFQKIRQGL | 32 | | >64 | | 4 |
| GN5 | KFFRKLKKLVKK | | >64 | 64 | | 64 |
| GN6 | RLRKKWRKLKKLL | 32 | | 64 | | 16–32 |
| GN7 | KRIRKWVRRILKKL | 4 | 4 | 4 | 250 | 16 |
| GN8 | LRKFWKKIRKFLKKI | 8 | 4 | 4 | 62.5 | 16 |
| GN9 | KRLWKRIYRLLKK | 8 | 8 | 8 | 250 | 4–8 |
| GN10 | IRRIRKKIKKIFKKI | 32 | 32 | 64 | >2000 | 16 |
| GN11 | LRKARRLLKKLRARL | >64 | 32 | 32 | >2000 | 32 |
| GN12 | GNWRKIVHKIKKAG | 32 | >64 | >64 | >2000 | 16 |
| GN13 | AGRLQKVFKVIAK | 64 | >64 | >64 | >2000 | 32 |
| GN14 | IHKLAKLAKNVL | >64 | >64 | >64 | >2000 | 32 |
| GP2 | RWRWPILGRILR | 8 | 16 | | | 16 |
| GP3 | FLHSIGKAIGRLLR | 16 | 16 | 8 | 250 | 8 |
| GP4 | GIGAVLNVAKKLL | 64 | 32 | 32 | >2000 | 16 |
| GP5 | KVARFLKKFFR | 64 | 32–64 | 32 | >2000 | 4 |
| GP6 | LKKLWKRIIKVGR | 32 | 16–32 | 64 | >2000 | 8 |
| GP7 | ARKWRKFLKKI | >64 | 64 | 64 | >2000 | 32–64 |
| GP8 | GRIKRIRKIIHKY | 8 | 32 | >64 | >2000 | 32 |
| GP9 | ARKKWRKRLKKLKI | 32–64 | >64 | >64 | >2000 | 32–64 |
| GP10 | AKKVVKKIYKRFQK | >64 | 64 | >64 | >2000 | 64 |
| GP11 | ARKFRRLVKKLR | >64 | >64 | >64 | >2000 | 64 |
| GP12 | LRKARRLVKKLA | >64 | >64 | >64 | >2000 | >64 |
| GP13 | KRLWKIRQRIAK | >64 | >64 | >64 | >2000 | 32 |
| GP14 | LNALKKVFQKIH | >64 | >64 | >64 | >2000 | >64 |
Compounds labeled as GN were obtained from the P. aeruginosa/A. baumannii model, compounds labeled as GP were obtained from the S. aureus model; in both sets, compounds were ordered according to their activity and hemolysis profile; GN2, 6, 9, 10 and GP2, 6, 9, 11 were obtained using the second selection strategy.
One-letter code for amino acids. All peptides are carboxamides (–CONH2) at the C terminus.
MIC was determined after incubation for 16–20 h at 37 °C.
MHC was measured on human red blood cells in 10 mM phosphate-buffered saline, pH 7.4, 25 °C. 0.1% Triton X-100 was used as a positive control. Cells in italic denote MIC <32 μg mL−1 towards the bacterial strains used for the design (P. aeruginosa/A. baumannii for GN and S. aureus for GP) and MHC ≥500 μg mL−1.
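The hit criterion stated in the note above (MIC <32 μg mL−1 towards a design strain and MHC ≥500 μg mL−1) can be applied to table cells such as `8–16` or `>2000` with a short sketch like the following. The function names `bounds` and `is_hit` are hypothetical. The sketch assumes that activity against at least one design strain is sufficient, and it reads ranges conservatively (upper end for MIC, lower end for MHC); neither choice is confirmed by the source.

```python
import math


def bounds(cell: str) -> tuple:
    """Return (low, high) in μg/mL for a cell such as '4', '8–16', '>64' or '<0.5'."""
    text = cell.strip().replace("–", "-")  # normalize the en dash used in ranges
    if text.startswith(">"):
        return (float(text[1:]), math.inf)  # above the highest concentration tested
    if text.startswith("<"):
        return (0.0, float(text[1:]))       # below the lowest concentration tested
    parts = text.split("-")
    return (float(parts[0]), float(parts[-1]))


def is_hit(design_strain_mics, mhc, mic_cutoff=32.0, mhc_cutoff=500.0) -> bool:
    """Active (MIC < cutoff on any design strain, assumed) and non-hemolytic."""
    active = any(bounds(c)[1] < mic_cutoff for c in design_strain_mics)
    non_hemolytic = bounds(mhc)[0] >= mhc_cutoff
    return active and non_hemolytic
```

For example, `is_hit(["4", "4"], "250")` is false because the MHC is below 500 μg mL−1, while `is_hit(["8", "16–32"], ">2000")` is true.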
MIC of GN1 and GP1 towards further MDR and non-MDR bacterial strains
| | | | Polymyxin B |
|---|---|---|---|
| | 4 | 4 | 0.5 |
| | 64 | 64 | 4 |
| | 2 | 8–16 | <0.5 |
| | 2 | 8–16 | 1 |
| | 4 | 32–64 | 2 |
| | 8 | 64 | 2 |
| | 4 | 16 | 0.5 |
| | 8 | 16–32 | 1 |
| | >64 | 16–32 | 1 |
| | >64 | 32 | 1 |
| | >64 | >64 | >64 |
| | 16 | 16 | 32–64 |
The MIC was determined in Müller–Hinton medium after 16–20 h of incubation at 37 °C. Each result represents two independent experiments performed in duplicate.
MDR strains.
Gram-negative strains.
Strains carrying spontaneous mutations in the indicated genes, all leading to polymyxin B resistance.
Gram-positive strain.
Fig. 2 (a) CD spectra of GN1, GN2, and GP1 recorded at 0.100 mg mL−1 in 10 mM phosphate buffer pH 7.4 with or without 5 mM DPC. (b) Percentages of secondary structure extracted from primary CD data using DichroWeb. The Contin-LL method and reference set 4 were used. (c) Helix properties predicted by HeliQuest. Circle size is proportional to side-chain size; blue indicates cationic residues, yellow hydrophobic residues, grey alanine, green proline, and purple serine. The arrow inside each helical wheel indicates the magnitude and direction of the hydrophobic moment.
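For reference, the hydrophobic moment shown as an arrow in the helical wheels is commonly computed with Eisenberg's definition, given below as a sketch. Here $H_n$ is the hydrophobicity of residue $n$ on a chosen scale, $N$ is the number of residues, and $\delta$ is the angle between consecutive side chains viewed down the helix axis (100° for an ideal α-helix). That HeliQuest uses exactly this normalization and angle convention is an assumption, not something stated in the source.

```latex
\langle \mu_H \rangle
  = \frac{1}{N}
    \sqrt{\left[\sum_{n=1}^{N} H_n \sin(n\delta)\right]^{2}
        + \left[\sum_{n=1}^{N} H_n \cos(n\delta)\right]^{2}},
\qquad \delta = 100^{\circ}

\theta = \operatorname{atan2}\!\left(\sum_{n=1}^{N} H_n \sin(n\delta),\;
                                     \sum_{n=1}^{N} H_n \cos(n\delta)\right)
```

A sequence whose hydrophobic residues cluster on one face of the helix gives a large $\langle \mu_H \rangle$, and $\theta$ gives the direction in which the arrow points.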
Fig. 3 MD simulations of GN1 in water and in the presence of a DPC micelle over 250 ns using GROMACS. (a) Average structure (stick model) in water over 100 structures sampled over the last 100 ns (thin lines). Hydrophobic side chains are colored in red and cationic side chains in blue. (b) Average structure (cartoon model for backbone and stick model for side chains) with DPC micelle over 100 structures sampled over the last 100 ns (thin lines). (c) RMSD (root mean square deviation) of the peptide backbone atoms relative to the starting α-helical conformation. (d) Number of intramolecular hydrogen bonds. The DPC micelle was omitted for clarity.
Fig. 4 TEM images of P. aeruginosa and A. baumannii after 2 h of treatment with GN1 in MH medium. Blue arrows indicate effects on the bacteria.