Gisela Gabernet, Damian Gautschi, Alex T Müller, Claudia S Neuhaus, Lucas Armbrecht, Petra S Dittrich, Jan A Hiss, Gisbert Schneider.
Abstract
Membranolytic anticancer peptides represent a potential strategy in the fight against cancer. However, our understanding of the underlying structure-activity relationships and the mechanisms driving their cell selectivity is still limited. We developed a computational approach as a step towards the rational design of potent and selective anticancer peptides. This machine learning model distinguishes between peptides with and without anticancer activity. This classifier was experimentally validated by synthesizing and testing a selection of 12 computationally generated peptides. In total, 83% of these predictions were correct. We then utilized an evolutionary molecular design algorithm to improve the peptide selectivity for cancer cells. This simulated molecular evolution process led to a five-fold selectivity increase with regard to human dermal microvascular endothelial cells and more than ten-fold improvement towards human erythrocytes. The results of the present study advocate for the applicability of machine learning models and evolutionary algorithms to design and optimize novel synthetic anticancer peptides with reduced hemolytic liability and increased cell-type selectivity.
Year: 2019 PMID: 31375699 PMCID: PMC6677754 DOI: 10.1038/s41598-019-47568-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Performance of support vector machine and random forest models for ACP prediction.
| Metrics | Support vector machine, CV score | Support vector machine, Train score | Support vector machine, Test score | Random forest, CV score | Random forest, Train score | Random forest, Test score |
|---|---|---|---|---|---|---|
| MCC | 0.88 ± 0.05 | 0.91 | 0.90 | 0.90 ± 0.05 | 1 | 0.91 |
| Accuracy | 0.94 ± 0.02 | 0.96 | 0.96 | 0.95 ± 0.02 | 1 | 0.96 |
| Precision | 0.89 ± 0.04 | 0.92 | 0.91 | 0.96 ± 0.03 | 1 | 0.97 |
| Recall | 0.95 ± 0.06 | 0.96 | 0.95 | 0.90 ± 0.06 | 1 | 0.90 |
Scores are reported for ten-fold cross-validation (CV score, mean ± std), for the whole training dataset (Train score), and for the independent test dataset (Test score), for both the support vector machine and the random forest model.
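The "CV score (mean ± std)" columns follow the usual k-fold procedure: the training data are split into ten folds, each fold is held out once, and the per-fold scores are summarized. The sketch below illustrates only that bookkeeping in plain Python. The toy data, the threshold classifier, and all function names are assumptions made for illustration; they are not the authors' SVM or random forest, and the use of the sample standard deviation is likewise an assumption.

```python
import random
import statistics

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Return (mean, sample std) of the score over the k held-out folds.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that
    trains on train_idx and returns a score measured on test_idx.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return statistics.mean(scores), statistics.stdev(scores)

# Toy 1-D data (hypothetical): class 1 tends to have larger feature values.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
y = [0] * 100 + [1] * 100

def threshold_classifier(train_idx, test_idx):
    """Place a cut halfway between the class means; return test accuracy."""
    mean0 = statistics.mean(x[i] for i in train_idx if y[i] == 0)
    mean1 = statistics.mean(x[i] for i in train_idx if y[i] == 1)
    cut = (mean0 + mean1) / 2
    correct = sum((x[i] > cut) == bool(y[i]) for i in test_idx)
    return correct / len(test_idx)

mean_acc, std_acc = cross_validate(len(x), threshold_classifier)
print(f"accuracy: {mean_acc:.2f} ± {std_acc:.2f}")
```

Any scoring function can be swapped into the callback, for example the Matthews correlation coefficient instead of accuracy.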
Comparison of the model performance (Test score) with other ACP prediction tools available online, calculated on the independent test set. The Matthews correlation coefficient (MCC), accuracy, precision and recall were used as metrics (Methods Eqs 1–4).
| Metrics | AntiCP Model 1 | AntiCP Model 2 | iACP | MLACP |
|---|---|---|---|---|
| MCC | −0.04 | 0.81 | 0.51 | 0.84 |
| Accuracy | 0.29 | 0.92 | 0.77 | 0.93 |
| Precision | 0.29 | 0.81 | 0.58 | 0.96 |
| Recall | 0.99 | 0.92 | 0.78 | 0.80 |
The 18 features obtained after covariance elimination and sequential feature selection.
| Feature | Weight | Description |
|---|---|---|
| ADd0 | −1.94 | Frequency of amino acids with hydrogen-bond acceptor and donor groups (T, C, Q, N, S and Y) |
| DDd0 | 1.67 | Frequency of amino acids with hydrogen-bond donor groups (K, T, C, Q, H, R, W, N, S and Y) |
| H | 1.65 | Global peptide hydrophobicity (Eisenberg consensus scale) |
| RPd0 | −0.72 | Frequency of aromatic amino acids with a positively ionizable group (H) |
| ADd2 | 0.65 | Frequency of amino acids with hydrogen-bond acceptor and amino acids with donor groups at distance 2 |
| µH | 0.50 | Peptide hydrophobic moment |
| LDd0 | 0.40 | Frequency of lipophilic amino acids with hydrogen-bond donor groups |
| Len | 0.40 | Peptide length |
| PPd2 | 0.39 | Frequency of amino acids with positively ionizable groups at distance 2 |
| RPd5 | 0.38 | Frequency of aromatic amino acids and amino acids with positively ionizable groups at distance 5 |
| APd6 | −0.38 | Frequency of amino acids with hydrogen-bond acceptor groups and amino acids with positively ionizable groups at distance 6 |
| RAd3 | −0.26 | Frequency of aromatic amino acids and amino acids with hydrogen-bond acceptor groups at distance 3 |
| RAd2 | −0.25 | Frequency of aromatic amino acids and amino acids with hydrogen-bond acceptor groups at distance 2 |
| APd1 | −0.25 | Frequency of amino acids with hydrogen-bond acceptor groups and amino acids with positively ionizable groups at distance 1 |
| DNd1 | −0.16 | Frequency of amino acids with hydrogen-bond donor groups and amino acids with negatively ionizable groups at distance 1 |
| APd2 | −0.11 | Frequency of amino acids with hydrogen-bond acceptor groups and amino acids with positively ionizable groups at distance 2 |
| RPd2 | −0.08 | Frequency of aromatic amino acids and amino acids with positively ionizable groups at distance 2 |
| RRd6 | 0.02 | Frequency of aromatic amino acids at distance 6 |
The top scoring features are ranked by their absolute support vector machine weight values, as a measure of their relative importance for ACP classification. An interpretation of each feature is provided.
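The feature names in the table combine two residue classes with a sequence distance (for example, DDd0 pairs donor with donor at distance 0). A minimal sketch of how such a pair frequency could be counted is shown below. The two residue sets are copied from the DDd0 and ADd0 rows; the function name and the normalization by sequence length are assumptions, and the descriptor implementation used in the study may differ, for instance in scaling or in how both pair orders are counted.

```python
# Residue classes copied from the feature table.
DONOR = set("KTCQHRWNSY")        # hydrogen-bond donor groups (DDd0 row)
ACCEPTOR_DONOR = set("TCQNSY")   # acceptor and donor groups (ADd0 row)

def pair_frequency(seq, class_a, class_b, distance):
    """Count positions i where seq[i] is in class_a and seq[i + distance]
    is in class_b, divided by the sequence length (assumed normalization)."""
    if not seq:
        return 0.0
    hits = sum(
        1
        for i in range(len(seq) - distance)
        if seq[i] in class_a and seq[i + distance] in class_b
    )
    return hits / len(seq)

# AmphiArc2 sequence, taken from the experimental validation table.
amphiarc2 = "KIFKKFKTIIKKVWRIFGRF"

dd_d0 = pair_frequency(amphiarc2, DONOR, DONOR, 0)                    # donor residues
ad_d0 = pair_frequency(amphiarc2, ACCEPTOR_DONOR, ACCEPTOR_DONOR, 0)  # acceptor+donor residues
dd_d2 = pair_frequency(amphiarc2, DONOR, DONOR, 2)                    # donor pairs two positions apart

print(f"DDd0 = {dd_d0:.2f}, ADd0 = {ad_d0:.2f}, DDd2 = {dd_d2:.2f}")
```

At distance 0 both classes refer to the same residue, so the count reduces to the fraction of residues that belong to both sets.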
Experimental validation of the SVM prediction model.
| Peptide | Sequence (a) | ϕACP | ϕNeg | Prediction (b) | MCF7 EC50 (µM) | A549 EC50 (µM) | Outcome (c) |
|---|---|---|---|---|---|---|---|
| Helical1 | FLWIKLGKLAGAVLKLILGLKKVV | 0.94 | 0.45 | + | 4.4 ± 1.3 | 8.3 ± 2.0 | TP |
| Helical2 | GLWAIAVKAGKVILKLIVFIWIRV | 0.94 | 0.45 | + | >50 | >50 | FP |
| Helical3 | GLLDIAGGNAETLAGHAV | 0.44 | 0.90 | − | >50 | >50 | TN |
| Helical4 | GLFDVIGSQAGGAAPHFLG | 0.46 | 0.89 | − | >50 | >50 | TN |
| AmphiArc1 | KWVKKVHNWLRRWIKVFEALFG | 0.96 | 0.46 | + | 7.0 ± 0.5 | 18.4 ± 0.7 | TP |
| AmphiArc2 | KIFKKFKTIIKKVWRIFGRF | 0.95 | 0.46 | + | 5.7 ± 0.7 | 9.3 ± 1.5 | TP |
| AmphiArc3 | AFRHSVKEELNYIRRRLERFPNRL | 0.42 | 0.91 | − | >50 | >50 | TN |
| AmphiArc4 | RIENGLRKRLQSIYRHLEE | 0.42 | 0.91 | − | >50 | >50 | TN |
| Gradient1 | KWVRIWIKVLRGLFVWVWFF | 0.96 | 0.46 | + | >50 | >50 | FP |
| Gradient2 | AWLKRIKKFLKALFWVWVW | 0.96 | 0.46 | + | 19.0 ± 1.8 | >50 | TP |
| Gradient3 | KVVDNFENILII | 0.40 | 0.85 | − | >50 | >50 | TN |
| Gradient4 | RVNAAIPNIIV | 0.41 | 0.84 | − | >50 | >50 | TN |
The peptides from each virtually designed library were evaluated according to a similarity-weighted score for belonging to the positive (ϕACP) and negative (ϕNeg) class. From each library, the two peptides with the highest ϕACP score and the two with the highest ϕNeg score were synthesized and tested for anticancer activity on breast adenocarcinoma (MCF7) and lung adenocarcinoma (A549) cell lines (EC50, mean ± std, N = 3).
(a) All peptides were synthesized with amidated C-termini. (b) Prediction: + predicted to be active, − predicted to be inactive. (c) Outcome: TP, true positive; FP, false positive; TN, true negative.
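The outcome column can be turned into the same four metrics used for the model benchmarks. The sketch below tallies the twelve outcomes from the table and applies the standard textbook definitions of accuracy, precision, recall and MCC, which are assumed here to correspond to Methods Eqs 1–4. The function name is illustrative.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics; zero denominators yield 0.0."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mcc": mcc}

# Outcome column of the validation table, in row order
# (Helical1-4, AmphiArc1-4, Gradient1-4).
outcomes = ["TP", "FP", "TN", "TN", "TP", "TP", "TN", "TN", "FP", "TP", "TN", "TN"]
counts = {label: outcomes.count(label) for label in ("TP", "FP", "TN", "FN")}

metrics = classification_metrics(counts["TP"], counts["FP"], counts["TN"], counts["FN"])
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With 4 true positives, 2 false positives, 6 true negatives and no false negatives, the accuracy is 10/12, which matches the 83% of correct predictions stated in the abstract.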
Figure 1. Characterization of the AmphiArc2 peptide. (a) Helical wheel plot of the peptide sequence with annotated hydrophobic moment direction and magnitude (µ). Polar residues are shown in light blue, positively charged residues in dark blue, hydrophobic residues in yellow, and aromatic residues in orange. (b) Circular dichroism spectra of the peptide in water (blue) and in a 50% v/v TFE:water solution (red). (c) Time sequence of cell death of a single MCF7 cell trapped in a microfluidic chamber after exposure to the AmphiArc2 peptide. The cells were fluorescently labeled with calcein-AM dye in the cytosol, and their membrane was stained with fluorescently labeled EpCAM antibody. The scale bar represents 10 µm. (d) EC50 values of the peptide against the A549 and MCF7 cancer cells and the noncancer HDMEC primary cells, and the hemolytic activity (HC50) of the peptide against human erythrocytes. Error bars show the standard deviation of N = 3 independent experiments.
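The hydrophobic moment annotated on a helical wheel is commonly computed with Eisenberg's formula, in which each residue contributes a vector of length equal to its hydrophobicity, rotated by 100° per residue for an ideal α-helix. The sketch below is a minimal version of that calculation over the whole sequence. The scale values are the Eisenberg consensus values as commonly tabulated and should be checked against the original reference; the study may have used a windowed or otherwise different variant.

```python
import math

# Eisenberg consensus hydrophobicity scale, as commonly tabulated
# (verify against the original reference before relying on the numbers).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}

def hydrophobicity(seq):
    """Mean hydrophobicity over all residues (global H)."""
    return sum(EISENBERG[aa] for aa in seq) / len(seq)

def hydrophobic_moment(seq, angle_deg=100.0):
    """Length of the summed hydrophobicity vectors, divided by sequence length.

    angle_deg is the rotation per residue; 100 degrees assumes an ideal alpha-helix.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)

amphiarc2 = "KIFKKFKTIIKKVWRIFGRF"  # sequence from the validation table
print(f"H  = {hydrophobicity(amphiarc2):.2f}")
print(f"µH = {hydrophobic_moment(amphiarc2):.2f}")
```

A useful sanity check is a homopolymer of 18 residues: at 100° per residue it spans exactly five full turns, so the vectors cancel and the moment is zero.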
Figure 2. Peptide selectivity optimization by simulated molecular evolution (SME). (a) Principle of the iterative variation (mutation) and selection steps in SME, starting with the model parent peptide “ANTICANCER”. (b) Probability of the mutation of amino acid residue i in the parent sequence to residue j in the offspring as a function of the amino acid pairwise similarity (d). The sigma (σ) parameter controls the sequence diversity among the offspring. (c) Comparison of the 10 generated offspring sequences and their Euclidean distance to the parent sequence according to the Grantham similarity matrix. The Shannon entropy of each residue position, normalized to [0, 1] (given in bits in the graph), is shown below. Residue coloring is as follows: light blue: polar, dark blue: positively ionizable, red: negatively charged, yellow: hydrophobic, orange: aromatic, green: proline. (d) Peptide activity towards the A549 and MCF7 cancer cell lines (EC50), the noncancer HDMEC primary cells (EC50), and the human erythrocytes (HC50). The error bars give the standard deviation of N = 2 independent measurements with six technical replicates each.
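The variation step and the per-position entropy described in this caption can be sketched as follows. The Gaussian form of the mutation probability, exp(−d²/2σ²) normalized over all residues, is an assumption about the exact function plotted in panel (b). The four-letter distance table is a hypothetical toy matrix, not the Grantham matrix, and dividing the entropy by log2 of the alphabet size is an assumed normalization.

```python
import math
import random

# Hypothetical toy distances on a [0, 1] scale for a four-letter alphabet.
# These are NOT Grantham values; they only illustrate the mechanics.
TOY_DISTANCE = {
    "K": {"K": 0.0, "R": 0.1, "L": 0.6, "D": 0.7},
    "R": {"K": 0.1, "R": 0.0, "L": 0.6, "D": 0.7},
    "L": {"K": 0.6, "R": 0.6, "L": 0.0, "D": 0.9},
    "D": {"K": 0.7, "R": 0.7, "L": 0.9, "D": 0.0},
}

def mutation_probabilities(residue, sigma, distance=TOY_DISTANCE):
    """Gaussian-weighted probability of residue i -> j (assumed functional form)."""
    weights = {
        j: math.exp(-d * d / (2.0 * sigma * sigma))
        for j, d in distance[residue].items()
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def mutate(parent, sigma, rng, distance=TOY_DISTANCE):
    """Draw one offspring by mutating every position independently."""
    offspring = []
    for residue in parent:
        probs = mutation_probabilities(residue, sigma, distance)
        offspring.append(rng.choices(list(probs), weights=list(probs.values()))[0])
    return "".join(offspring)

def normalized_entropy(column, alphabet_size=20):
    """Shannon entropy of one alignment column, scaled to [0, 1]."""
    n = len(column)
    probs = [column.count(aa) / n for aa in set(column)]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(alphabet_size)

rng = random.Random(42)
parent = "KLDRKL"  # toy parent built from the four-letter alphabet
offspring = [mutate(parent, sigma=0.3, rng=rng) for _ in range(10)]
entropies = [
    normalized_entropy([seq[pos] for seq in offspring], alphabet_size=4)
    for pos in range(len(parent))
]
for seq in offspring:
    print(seq)
print([round(h, 2) for h in entropies])
```

A small σ concentrates the probability on the parent residue and its closest neighbors, so offspring stay near the parent; a large σ flattens the distribution and increases diversity. In a full SME cycle, the best offspring according to the selection criterion would become the parent of the next generation.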
Figure 3. Characterization of the parent peptides and the most selective offspring peptides from three subsequent SME generations. (a) Amino acid sequences; red residues denote sequence changes from the respective parent sequence. (b) Circular dichroism spectra in water (blue) and in a 50% v/v TFE:water mixture. (c) Helical wheel plots with hydrophobic moment direction and magnitude (µH). Residue coloring: polar residues in light blue, positively ionizable residues in dark blue, hydrophobic residues in yellow, and aromatic residues in orange. (d) Peptide activity towards the A549 and MCF7 cancer cell lines (EC50), the noncancer HDMEC primary cells (EC50), and the human erythrocytes (HC50). The error bars represent the standard deviation of three independent measurements. **p-value < 0.01, ***p-value < 0.001 of the mean differences (Welch t-test).
Cellular growth inhibition of 60 cell lines in the NCI-60 cancer cell test for the AmphiArc2 (Parent), Off2 and Off2.2.10 peptides.
| Cancer type | AmphiArc2 log GI50 | Off2 log GI50 | Off2.2.10 log GI50 |
|---|---|---|---|
| Leukemia | −5.5 | −5.2 | −5.3 |
| Lung | −5.6 | −5.4 | −5.2 |
| Colon | −5.6 | −5.2 | −5.1 |
| CNS | −5.6 | −5.2 | −5.2 |
| Melanoma | −5.6 | −5.3 | −5.2 |
| Ovarian | −5.6 | −5.2 | −5.2 |
| Renal | −5.7 | −5.3 | −5.2 |
| Prostate | −5.7 | −5.5 | −5.4 |
| Breast | −5.6 | −5.4 | −5.4 |
The averaged peptide activity for each cancer type tested is shown as the logarithm of the half growth inhibitory concentration (GI50, Supplementary Information Eq. S1), which is the molar concentration of peptide needed to inhibit half of the normal cancer cell growth. Each value is the exponent n in GI50 = 10^n M, so values from −5 to −6 correspond to growth inhibition in the 1–10 µM range. The growth inhibition values for the individual cell lines are displayed in Supplementary Information Table S4.
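Converting the tabulated log GI50 values back to concentrations makes the differences between peptides easier to read. The short sketch below performs that conversion for three rows of the table; the function and variable names are illustrative.

```python
def log_gi50_to_micromolar(log_gi50):
    """Convert log10(GI50 in mol/L) to a concentration in µM."""
    return 10 ** log_gi50 * 1e6

# A few rows from the NCI-60 table: (AmphiArc2, Off2, Off2.2.10) log GI50.
log_gi50 = {
    "Leukemia": (-5.5, -5.2, -5.3),
    "Renal": (-5.7, -5.3, -5.2),
    "Breast": (-5.6, -5.4, -5.4),
}

for cancer_type, values in log_gi50.items():
    micromolar = ", ".join(f"{log_gi50_to_micromolar(v):.1f} µM" for v in values)
    print(f"{cancer_type}: {micromolar}")
```

Because the scale is logarithmic, a difference of 0.3 log units corresponds to roughly a two-fold change in concentration.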