Saeed Zanganeh, Loghman Firoozpour, Soroush Sardari, Ali Afgar, Reza Ahangari Cohan, Nasir Mohajel.
Abstract
Self-assembling amphiphilic peptides have recently received special attention in medicine. However, testing the myriad of combinations that can be generated from at least 20 coded and several hundred noncoded amino acids to find candidate sequences for each application is time-consuming and expensive, where it is feasible at all. Rapid and accurate approaches are therefore needed to select candidates from this vast number of combinations. In the current study, we examined three conventional descriptor sets, along with a novel descriptor set derived from the simulated aggregation propensity of di- and tripeptides, to model the critical aggregation concentration (CAC) of amphiphilic peptides. In contrast to the models built on conventional descriptors, the radial kernel model derived from the novel descriptor set accurately predicted the CAC of the test set, with a residual standard error of 0.10. Analysis of the influential descriptors highlighted the importance of aromatic side chains, as well as of neighboring amino acids, in self-assembly. Adding very long peptides (70–100 residues) to the data set decreased model accuracy and changed the influential descriptors. The developed model can be used to predict the CAC of self-assembling amphiphilic peptides and to derive rules for designing novel amphiphilic peptides with desired properties.
Year: 2021 PMID: 34056481 PMCID: PMC8158804 DOI: 10.1021/acsomega.1c01293
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
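The model names in Figure 2 (svmRadial, svmLinear, PLS, GBM) match method names in R's caret package, where svmRadial wraps kernlab's `ksvm` with the `rbfdot` kernel. Assuming that parameterization, the "radial kernel" in the abstract scores the similarity of two descriptor vectors as k(x, y) = exp(−σ‖x − y‖²). The minimal sketch below shows only the kernel, not a fitted model; the descriptor vectors and σ are hypothetical values, not data from the study.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float) -> float:
    """Radial basis function kernel k(x, y) = exp(-sigma * ||x - y||^2).

    Assumes the kernlab-style parameterization in which sigma multiplies the
    squared Euclidean distance; other libraries use gamma or 1 / (2 * s^2).
    """
    if len(x) != len(y):
        raise ValueError("descriptor vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sigma * sq_dist)


# Hypothetical descriptor vectors for two peptides (illustrative values only).
peptide_a = [0.42, -1.10, 0.35]
peptide_b = [0.40, -0.95, 0.10]

print(round(rbf_kernel(peptide_a, peptide_a, sigma=0.5), 3))  # identical vectors give 1.0
print(round(rbf_kernel(peptide_a, peptide_b, sigma=0.5), 3))  # -> 0.958
```

Because the kernel decays with squared distance, peptides whose descriptor vectors are close contribute most to a prediction, and σ controls how quickly that similarity falls off.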
Figure 1. Schematic representation of the approaches used for CAC modeling of self-assembling amphiphilic peptides. Four descriptor-generation approaches were used, each depicted by a colored line. Black line (whole peptide): PaDEL descriptors computed for the whole peptide structure; red line (PCA): principal components of PaDEL descriptors computed for each amino acid in the sequence; blue line (z-scale): z-scales defined for each amino acid in the sequence; purple line (AP/APH scales): novel AP/APH scales defined for each amino acid in the sequence. The last approach consisted of three steps, each of which added descriptors to the AP/APH scales of the previous step. In the first step (indicated by ①), AP/APH scales for Ala-AA and AA-Ala (structures and AP/APH scales in the first and second columns of Table S5) were used for model building. In the second step (indicated by ②), AP/APH scales for Ala-AA-Ala (third column of Table S5) were added to those of the first step. In the third step (indicated by ④), AP/APH scales for Ala-Ala-AA and AA-Ala-Ala (fifth and sixth columns of Table S5) were added to the scales of the two previous steps. AA is the ith amino acid in the peptide sequence. PCA: principal component analysis; ACC: auto cross-covariance.
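Per-residue scales produce a matrix whose number of rows depends on peptide length, whereas most regression models need a fixed-length input; the ACC abbreviation in the caption points to auto cross-covariance for this step. The sketch below assumes the ACC formulation common in z-scale QSAR work, ACC(j, k, lag) = Σᵢ x[i][j] · x[i + lag][k] / (n − lag), computed for every pair of scales j, k and every lag from 1 to a chosen maximum. The values in `HYPOTHETICAL_SCALES` are invented placeholders, not the AP/APH values of Table S5, and the lag range used in the study is not stated here.

```python
from itertools import product
from typing import Dict, List, Tuple

# HYPOTHETICAL per-residue scales (two values per amino acid, e.g. an AP-like
# and an APH-like column). Placeholders only, NOT the Table S5 values.
HYPOTHETICAL_SCALES: Dict[str, List[float]] = {
    "A": [0.10, -0.20],
    "V": [0.45, 0.30],
    "K": [-0.60, -0.35],
    "D": [-0.50, -0.40],
}


def residue_matrix(sequence: str, scales: Dict[str, List[float]]) -> List[List[float]]:
    """Look up the descriptor vector of each residue; row i = position i."""
    return [scales[aa] for aa in sequence]


def auto_cross_covariance(
    sequence: str, scales: Dict[str, List[float]], max_lag: int
) -> Dict[Tuple[int, int, int], float]:
    """ACC(j, k, lag) = sum_i x[i][j] * x[i + lag][k] / (n - lag), lag = 1..max_lag.

    j and k index the scales, so j == k gives auto-covariance terms and
    j != k gives cross-covariance terms between different scales.
    """
    x = residue_matrix(sequence, scales)
    n = len(x)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_scales = len(x[0])
    acc: Dict[Tuple[int, int, int], float] = {}
    for lag in range(1, max_lag + 1):
        for j, k in product(range(n_scales), repeat=2):
            total = sum(x[i][j] * x[i + lag][k] for i in range(n - lag))
            acc[(j, k, lag)] = total / (n - lag)
    return acc


# Peptides of different lengths yield feature vectors of the same size.
short = auto_cross_covariance("VVKD", HYPOTHETICAL_SCALES, max_lag=2)
longer = auto_cross_covariance("AAAAAAVVKK", HYPOTHETICAL_SCALES, max_lag=2)
print(len(short), len(longer))  # 2 scales ** 2 * 2 lags = 8 terms each
```

The output always has (number of scales)² × max_lag entries, independent of sequence length. Adding the scales from a later step (for example, Ala-AA-Ala values) would lengthen each residue's vector, which increases the number of ACC terms.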
Amphiphilic Peptides and the Measured CAC Values Collected from the Literatureᵃ
| peptides | CAC (−log M) | refs | peptides | CAC (−log M) | refs | peptides | CAC (−log M) | refs |
|---|---|---|---|---|---|---|---|---|
| V6K2GRGDS | 4.83 | ( | A10H6 | 6.64 | ( | Ac-GAVILEE | 3.15 | ( |
| Ac-A6K± | 4.60 | ( | RF | 1.74 | ( | Ac-GAVILEE-NH2 | 3.10 | |
| Ac-L6K2-NH2 | 4.34 | ( | [RF]2 | 4.40 | | | 2.96 | |
| Ac-L6K3-NH2 | 3.60 | | [RF]3 | 2.64 | | Ac-L3D | 2.92 | |
| Ac-V6K2-NH2 | 3.48 | | [RF]4 | 3.52 | | Ac-L3K-NH2 | 2.92 | |
| Ac-V6K3-NH2 | 3.08 | | [RF]5 | 4.70 | | | 2.80 | |
| Ac-V3D | 2.64 | |||||||
| Ac-V6K4-NH2 | 2.33 | | A6YD | 2.52 | ( | Ac-V6K-NH2 | 3.35 | ( |
| | 2.27 | | V4WD2 | 2.12 | | A3C | 3.42 | ( |
| Ac-A6K2-NH2 | 2.10 | | V4D | 2.70 | ( | | 3.77 | |
| Ac-V6D | 3.30 | ( | V4WD | 2.32 | | I3C | 4.15 | |
| Ac-V6D2 | 2.96 | | | 2.39 | | I4K | 3.60 | ( |
| A6RGD | 1.74 | ( | L4WD2 | 2.72 | | I5K | 3.89 | |
| G3A3V3I3K3 | 3.23 | ( | RFL4FR | 3.10 | ( | LI2K | 2.99 | |
| K3I3V3A3G3 | 3.19 | | | 10.19 | ( | L4K | 3.27 | |
| I3V3A3G3K3 | 3.28 | | | 6.17 | | | 3.85 | |
| K3G3A3V3I3 | 3.55 | | | 5.46 | | Ac-A6D | 3.34 | ( |
| V3G3I3A3K3 | 3.01 | | | 4.38 | | Ac-GAVILRR-NH2 | 3.09 | ( |
| K3A3I3G3V3 | 3.15 | | | 4.01 | | K4X4-gA | 3.70 | ( |
| K-K8 | 2.10 | ( | Ac-A9K-NH2 | 4.82 | ( | K5X3-gA | 3.68 | |
| KK8 | 2.10 | | Ac-A6K-NH2 | 3.70 | | K8-gA | 3.66 | |
| IK-K11 | 2.58 | | Ac-A3K-NH2 | 2.00 | | K6X2-gA | 3.64 | |
| IKK11 | 2.66 | | Ac-A6D | 3.52 | ( | K7X1-gA | 3.62 | |
| IK-K16 | 3.03 | | DA6-NH2 | 3.70 | | K3X5-gA | 3.77 | |
| IKK16 | 3.13 | 3.52 | ||||||
| GAAVILRR | 1.52 | ( | Ac-I3K-NH2 | 3.35 |
ᵃ The CAC values were measured in pure water by fluorimetry,[23–36] conductivity,[37–41] or dynamic light scattering (DLS).[42–44] All concentrations are given as −log M. Test-set peptides are shown in bold and long peptides in italics. All peptide structures and molar concentrations are listed in Supporting Information Table S4.
The CAC value for Ac-A6K-NH2 was taken from ref (38) because the techniques used there tended to be more accurate than those used in the other reports.
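Because the table reports CAC as −log M (larger numbers mean a lower CAC), converting between the two scales is a single power of ten. A minimal sketch, using two values from the table (V6K2GRGDS and RF):

```python
import math


def neg_log_to_molar(neg_log_m: float) -> float:
    """Convert a CAC given as -log10(M) back to a molar concentration."""
    return 10 ** (-neg_log_m)


def molar_to_neg_log(molar: float) -> float:
    """Convert a molar CAC to the -log10(M) scale used in the table."""
    if molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(molar)


for name, value in [("V6K2GRGDS", 4.83), ("RF", 1.74)]:
    print(f"{name}: {neg_log_to_molar(value):.2e} M")
```

The loop prints about 1.48e-05 M for V6K2GRGDS and 1.82e-02 M for RF. Because the scale is logarithmic, a difference of 1.0 corresponds to a tenfold change in molar CAC; assuming the abstract's residual standard error of 0.10 is on this same −log M scale, it corresponds to a concentration factor of about 10^0.10 ≈ 1.26.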
Figure 2. Performance of the svmRadial, svmLinear, PLS, and GBM models on the training and test sets. Performance on the training (green dots) and test (orange dots) sets was measured as the residual standard error (RSE) and visualized by plotting predicted against experimental CAC values. The black lines show predicted CAC = experimental CAC.
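The caption does not give the exact RSE formula. A common definition is RSE = √(RSS / df), where RSS is the sum of squared differences between experimental and predicted CAC. The sketch below assumes df = n − p, with the number of fitted parameters p left to the caller (p = 0 reduces it to the root-mean-square error); the experimental and predicted values are hypothetical.

```python
import math
from typing import Sequence


def residual_standard_error(
    experimental: Sequence[float],
    predicted: Sequence[float],
    n_params: int = 0,
) -> float:
    """RSE = sqrt(sum((y - y_hat)^2) / (n - n_params)).

    Assumption: the degrees-of-freedom correction is n - n_params; with
    n_params = 0 this is the ordinary root-mean-square error.
    """
    if len(experimental) != len(predicted):
        raise ValueError("inputs must have equal length")
    n = len(experimental)
    dof = n - n_params
    if dof <= 0:
        raise ValueError("not enough observations for the requested n_params")
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(experimental, predicted))
    return math.sqrt(rss / dof)


# Hypothetical test-set values in -log M units (not data from the study).
experimental = [3.2, 2.5, 4.0, 3.6]
predicted = [3.3, 2.4, 4.1, 3.5]
print(round(residual_standard_error(experimental, predicted), 2))
```

Every residual in this toy example is ±0.1, so the RSE is 0.1, which is the same magnitude as the test-set RSE reported in the abstract. Points on the identity line in Figure 2 have zero residual and add nothing to the RSS.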
Correlation between AP and the PaDEL Molecular Descriptors for Ala-AA-Ala Tripeptides Constructed by Replacing AAi with the Amino Acids in the Data Setᵃ
| number | descriptor | definition | correlation coefficient |
|---|---|---|---|
| 1 | SdsCH | sum of atom-type E-state: =CH– | 0.96 |
| 2 | khs.aaCH | count of occurrences of the aaCH E-state fragment | 0.95 |
| 3 | ndsCH | count of atom-type E-state: =CH– | 0.95 |
| 4 | nHother | count of atom-type H E-state: H on aaCH, dCH2, or dsCH | 0.95 |
| 5 | nHdsCH | count of atom-type H E-state: =CH– | 0.95 |
| 6 | HybRatio | hybridization ratio (fraction of sp3 carbons to sp2 carbons) | −0.87 |
| 7 | GATS1i | Geary autocorrelation—lag 1/weighted by first ionization potential | −0.82 |
| 8 | GATS1p | Geary autocorrelation—lag 1/weighted by polarizabilities | −0.81 |
| 9 | SpMax1_Bhs | largest absolute eigenvalue of Burden modified matrix— | −0.80 |
| 10 | SpMax3_Bhs | largest absolute eigenvalue of Burden modified matrix— | −0.80 |
ᵃ The ten descriptors with the strongest positive and negative correlations are shown.
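The table ranks descriptors by the strength of their correlation with AP across the Ala-AA-Ala tripeptides. Assuming the coefficient is Pearson's r (the table does not name it), the ranking can be sketched as computing r for each descriptor column and sorting by |r|, so strong negative correlations rank alongside strong positive ones. The AP values and descriptor columns below are hypothetical, not the PaDEL values behind the table.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant descriptor has undefined correlation")
    return cov / (sx * sy)


def rank_by_abs_correlation(
    target: Sequence[float], descriptors: Dict[str, Sequence[float]], top: int
) -> List[Tuple[str, float]]:
    """Return the `top` descriptors with the largest |r| against `target`."""
    scored = [(name, pearson_r(values, target)) for name, values in descriptors.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top]


# Hypothetical AP values and descriptor columns for five tripeptides.
ap = [1.2, 1.5, 2.1, 2.4, 3.0]
descriptors = {
    "desc_up": [0.1, 0.2, 0.4, 0.5, 0.7],     # rises with AP
    "desc_down": [9.0, 8.1, 6.9, 6.0, 4.8],   # falls with AP
    "desc_noise": [3.0, 1.0, 4.0, 1.0, 5.0],  # weak relation to AP
}
for name, r in rank_by_abs_correlation(ap, descriptors, top=2):
    print(f"{name}: r = {r:+.2f}")
```

Sorting on the absolute value keeps the sign available for interpretation, mirroring how the table lists both positive (SdsCH) and negative (HybRatio) correlations.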
Three Descriptors or ACCs with the Most Influence on the Best Model in Each Approach
| approach | 1st | 2nd | 3rd |
|---|---|---|---|
| whole peptide-svmRadial | TopoPSA | ATSC7v | nHBAcc3 |
| PCA-svmRadial | | | |
| AP-scale-svmRadial | | | |
| APH-scale-svmRadial | | | |
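The table does not state how descriptor influence was scored, so the sketch below swaps in one generic, model-agnostic technique, permutation importance, and is not necessarily the method used here. Each descriptor column is shuffled in turn, the model re-predicts, and the mean increase in error (RMSE) is recorded. `predict` stands in for any fitted model (for example, an svmRadial fit); `toy_predict` and the data are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: Sequence[float],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean increase in RMSE when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with column j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model: the target depends on feature 0 only.
def toy_predict(row: Row) -> float:
    return 2.0 * row[0] + 1.0


X = [[float(i), float((7 * i) % 5)] for i in range(10)]
y = [toy_predict(row) for row in X]
print([round(v, 3) for v in permutation_importance(toy_predict, X, y)])
```

A descriptor the model ignores scores exactly zero, while the feature that drives predictions shows a clear error increase; sorting the scores gives an ordering analogous to the 1st/2nd/3rd columns of the table.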
Figure 3. Adding long peptides to the data set reduced model performance. Performance was measured (a) after adding five long peptides from ref (35) to the data set and (b) after then removing the shortest peptide (RF) from that data set. The model fitted on the data set containing RF and the five long peptides had a test-set RSE of 0.29; removing RF reduced the test-set RSE to 0.23.