László Keresztes, Evelin Szögi, Bálint Varga, Viktor Farkas, András Perczel, Vince Grolmusz.
Abstract
Hexapeptides are widely applied as a model system for studying the amyloid-forming properties of polypeptides, including proteins. Recently, large experimental databases have become publicly available with amyloidogenic labels. Using these data sets for training and testing purposes, one may build artificial intelligence (AI)-based classifiers for predicting the amyloid state of peptides. In our previous work (Biomolecules 2021, 11, 500), we described the Support Vector Machine (SVM)-based Budapest Amyloid Predictor (https://pitgroup.org/bap). Here, we apply the Budapest Amyloid Predictor for discovering numerous amyloidogenic and nonamyloidogenic hexapeptide patterns with accuracy between 80% and 84%, as surprising and succinct novel rules for further understanding the amyloid state of peptides. For example, we have shown that for any independently mutated residue (position marked by "x"), the patterns CxFLWx, FxFLFx, or xxIVIV are predicted to be amyloidogenic, while those of PxDxxx, xxKxEx, and xxPQxx are nonamyloidogenic. We note that each amyloidogenic pattern with two x's (e.g., CxFLWx) describes succinctly 20² = 400 hexapeptides, while the nonamyloidogenic patterns comprising four point mutations (e.g., PxDxxx) give 20⁴ = 160 000 hexapeptides in total. We also examine the restricted substitutions for positions "x" from subclasses of proteinogenic amino acid residues; for example, if "x" is substituted with hydrophobic amino acids, then there exist patterns containing three x's, like MxVVxx, predicted to be amyloidogenic. If we can choose for the x positions any hydrophobic amino acids, except the "structure breaker" proline, then we get amyloid patterns with five x positions, for example, xxxFxx, each corresponding to 8⁵ = 32 768 hexapeptides. To our knowledge, no similar applications of artificial intelligence tools or succinct amyloid patterns were described before the present work.
Year: 2022 PMID: 36249386 PMCID: PMC9558248 DOI: 10.1021/acsomega.2c02513
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
List of All Nonamyloid Patterns with Four Free Positions
| PxPxxx | PxDxxx | xxPPxx | xxPDxx | xxPGxx | xxPKxx |
| xxPQxx | xxDPxx | xxDDxx | xxDGxx | xxDKxx | xxDQxx |
| xxKPxx | xxKDxx | xxNPxx | xxGPxx | xxRPxx | xxPxEx |
| xxPxKx | xxPxDx | xxDxEx | xxDxKx | xxDxDx | xxKxEx |
The list contains 24 patterns. Note that each pattern succinctly describes 20⁴ = 160 000 hexapeptides, all of which are predicted to be nonamyloids by the Budapest Amyloid Predictor.[14] Of the 24 patterns, only nine do not contain proline in a fixed position.
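The counts in this note can be checked with a short sketch. The pattern list is copied from the table above; the names `NONAMYLOID_PATTERNS` and `fixed_residues` are illustrative and are not part of the Budapest Amyloid Predictor.

```python
# Patterns copied from the table above; "x" marks a free position.
NONAMYLOID_PATTERNS = """
PxPxxx PxDxxx xxPPxx xxPDxx xxPGxx xxPKxx
xxPQxx xxDPxx xxDDxx xxDGxx xxDKxx xxDQxx
xxKPxx xxKDxx xxNPxx xxGPxx xxRPxx xxPxEx
xxPxKx xxPxDx xxDxEx xxDxKx xxDxDx xxKxEx
""".split()


def fixed_residues(pattern):
    """Return the residues at the fixed (non-x) positions of a pattern."""
    return [c for c in pattern if c != "x"]


# Each free position may hold any of the 20 proteinogenic residues.
peptides_per_pattern = 20 ** NONAMYLOID_PATTERNS[0].count("x")

# Patterns that have no proline among their fixed residues.
proline_free = [p for p in NONAMYLOID_PATTERNS if "P" not in fixed_residues(p)]

print(len(NONAMYLOID_PATTERNS), peptides_per_pattern, len(proline_free))
```

The three printed numbers correspond to the pattern count, the hexapeptides per pattern, and the number of proline-free patterns stated in the note.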
Amyloid Effect Matrix, Constructed from the Precomputed Values from Equation a
| | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A | –0.26 | –0.32 | –0.27 | –0.14 | –0.43 | –0.22 |
| R | –0.45 | –0.41 | –0.46 | –0.33 | –0.52 | –0.35 |
| N | –0.40 | –0.34 | –0.49 | –0.27 | –0.46 | –0.30 |
| D | –0.49 | –0.43 | –0.56 | –0.41 | –0.56 | –0.36 |
| C | –0.09 | –0.21 | 0.03 | –0.05 | –0.17 | –0.05 |
| Q | –0.37 | –0.30 | –0.36 | –0.34 | –0.48 | –0.32 |
| E | –0.51 | –0.41 | –0.43 | –0.30 | –0.61 | –0.39 |
| G | –0.23 | –0.37 | –0.46 | –0.37 | –0.30 | –0.33 |
| H | –0.32 | –0.26 | –0.26 | –0.30 | –0.35 | –0.25 |
| I | –0.06 | –0.08 | 0.26 | 0.09 | –0.06 | –0.07 |
| L | –0.10 | –0.18 | 0.02 | 0.04 | –0.22 | –0.13 |
| K | –0.39 | –0.45 | –0.51 | –0.35 | –0.59 | –0.32 |
| M | –0.17 | –0.25 | –0.02 | –0.10 | –0.19 | –0.18 |
| F | –0.13 | –0.11 | 0.05 | –0.03 | –0.13 | –0.11 |
| P | –0.56 | –0.38 | –0.56 | –0.51 | –0.42 | –0.45 |
| S | –0.37 | –0.35 | –0.41 | –0.30 | –0.48 | –0.23 |
| T | –0.34 | –0.33 | –0.28 | –0.23 | –0.40 | –0.23 |
| W | –0.17 | –0.17 | –0.09 | –0.06 | –0.12 | –0.16 |
| Y | –0.23 | –0.11 | –0.13 | –0.06 | –0.18 | –0.15 |
| V | –0.05 | –0.14 | 0.19 | 0.14 | –0.19 | 0.01 |
The rows correspond to the amino acids, while the columns correspond to the positions. Larger numbers indicate stronger amyloidogenic properties in the given position. Source: ref (14) (Copyright 2021 the authors). In ref (14), by ordering the columns of this table, a position-dependent amyloidogenicity order of amino acids is given in a subsequent table.
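As a rough illustration of how the matrix relates to patterns, the sketch below computes the lowest and highest summed effect that a pattern can reach when each "x" ranges over all 20 residues. It assumes that the per-position values are simply added, which this section does not state explicitly, and it does not apply any decision threshold, since none is given here. The names `EFFECT` and `score_range` are hypothetical.

```python
# Amyloid Effect Matrix, transcribed from the table above.
# Keys: amino acids; tuple entries: positions 1..6.
EFFECT = {
    "A": (-0.26, -0.32, -0.27, -0.14, -0.43, -0.22),
    "R": (-0.45, -0.41, -0.46, -0.33, -0.52, -0.35),
    "N": (-0.40, -0.34, -0.49, -0.27, -0.46, -0.30),
    "D": (-0.49, -0.43, -0.56, -0.41, -0.56, -0.36),
    "C": (-0.09, -0.21, 0.03, -0.05, -0.17, -0.05),
    "Q": (-0.37, -0.30, -0.36, -0.34, -0.48, -0.32),
    "E": (-0.51, -0.41, -0.43, -0.30, -0.61, -0.39),
    "G": (-0.23, -0.37, -0.46, -0.37, -0.30, -0.33),
    "H": (-0.32, -0.26, -0.26, -0.30, -0.35, -0.25),
    "I": (-0.06, -0.08, 0.26, 0.09, -0.06, -0.07),
    "L": (-0.10, -0.18, 0.02, 0.04, -0.22, -0.13),
    "K": (-0.39, -0.45, -0.51, -0.35, -0.59, -0.32),
    "M": (-0.17, -0.25, -0.02, -0.10, -0.19, -0.18),
    "F": (-0.13, -0.11, 0.05, -0.03, -0.13, -0.11),
    "P": (-0.56, -0.38, -0.56, -0.51, -0.42, -0.45),
    "S": (-0.37, -0.35, -0.41, -0.30, -0.48, -0.23),
    "T": (-0.34, -0.33, -0.28, -0.23, -0.40, -0.23),
    "W": (-0.17, -0.17, -0.09, -0.06, -0.12, -0.16),
    "Y": (-0.23, -0.11, -0.13, -0.06, -0.18, -0.15),
    "V": (-0.05, -0.14, 0.19, 0.14, -0.19, 0.01),
}


def score_range(pattern):
    """Lowest and highest summed effect over all substitutions of 'x'.

    Assumption (not from the source): per-position effects are additive.
    """
    low = high = 0.0
    for pos, residue in enumerate(pattern):
        if residue == "x":
            # A free position contributes the column minimum or maximum.
            column = [row[pos] for row in EFFECT.values()]
            low += min(column)
            high += max(column)
        else:
            low += EFFECT[residue][pos]
            high += EFFECT[residue][pos]
    return round(low, 2), round(high, 2)


for pattern in ("xxIVIV", "PxDxxx"):
    print(pattern, score_range(pattern))
```

Under the additivity assumption, the lowest sum reachable by xxIVIV is still higher than the highest sum reachable by PxDxxx, which is consistent with the two patterns being predicted amyloidogenic and nonamyloidogenic, respectively.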
Figure 1. Examples of amyloid and nonamyloid patterns.
Amino Acid Subsets Examined
| class name | class elements listing | no. of free positions | no. of patterns |
|---|---|---|---|
| small nonpolar | GAST | 3 | 411 |
| hydrophobic | CVLIMPFYW | 3 | 43 |
| polar | DENQHKR | 3 | 4 |
| hydrophobic-{P} | CVLIMFYW | 5 | 38 |
| amino acids-{P} | QFYESNCDMLIAHGWRKVT | 3 | 4 |
The classification of the residues in the first three rows is as in ref (21). The last two rows correspond to the classes where we left out proline, a well-known structure-breaker from the hydrophobic set or from all of the amino acids. The third column shows the number of free positions we get in the special substitutions, and the fourth column shows the number of patterns found for these special substitutions for “x”.
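For each class, the number of hexapeptides covered by a single pattern is the class size raised to the number of free positions. The following sketch works this out from the table; the dictionary `CLASSES` and the function `peptides_per_pattern` are illustrative names only.

```python
# Class name -> (class elements, number of free positions), from the table above.
CLASSES = {
    "small nonpolar": ("GAST", 3),
    "hydrophobic": ("CVLIMPFYW", 3),
    "polar": ("DENQHKR", 3),
    "hydrophobic-{P}": ("CVLIMFYW", 5),
    "amino acids-{P}": ("QFYESNCDMLIAHGWRKVT", 3),
}


def peptides_per_pattern(elements, free_positions):
    """Hexapeptides described by one pattern: |class| ** free positions."""
    return len(set(elements)) ** free_positions


for name, (elements, free) in CLASSES.items():
    print(f"{name}: {len(elements)} residues, "
          f"{peptides_per_pattern(elements, free)} hexapeptides per pattern")
```

For example, the hydrophobic class has nine residues and three free positions, and the hydrophobic class without proline has eight residues and five free positions.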
List of All 43 Amyloidogenic Patterns with Three Free Positions When x Is Hydrophobic, Chosen from CVLIMPFYW
| VxIVxx | VxIIxx | VxILxx | VxIFxx | VxVVxx | VxVIxx | VxVLxx | IxIVxx |
| IxIIxx | IxILxx | IxVVxx | IxVIxx | CxIVxx | CxIIxx | CxILxx | CxVVxx |
| CxVIxx | LxIVxx | LxIIxx | LxILxx | LxVVxx | LxVIxx | FxIVxx | FxIIxx |
| FxVVxx | MxIVxx | MxIIxx | MxVVxx | WxIVxx | WxIIxx | GxIVxx | YxIVxx |
| xxIVIx | xxIVxV | xxIVxC | xxIVxI | xxIVxF | xxIIxV | xxIIxC | xxILxV |
| xxVVxV | xxVVxC | xxVIxV |
Each pattern describes 9³ = 729 hexapeptides.
List of All 38 Amyloidogenic Patterns with Five Free Positions When x Is Hydrophobic, but Cannot Be Proline, Chosen from CVLIMFYW
| Vxxxxx | Ixxxxx | Cxxxxx | Lxxxxx | Fxxxxx | Mxxxxx | Wxxxxx | xIxxxx |
| xFxxxx | xYxxxx | xVxxxx | xWxxxx | xLxxxx | xCxxxx | xxIxxx | xxVxxx |
| xxFxxx | xxCxxx | xxLxxx | xxMxxx | xxWxxx | xxxVxx | xxxIxx | xxxLxx |
| xxxFxx | xxxCxx | xxxWxx | xxxYxx | xxxxIx | xxxxWx | xxxxFx | xxxxCx |
| xxxxYx | xxxxxV | xxxxxC | xxxxxI | xxxxxF | xxxxxL |
Each pattern describes 8⁵ = 32 768 hexapeptides.
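A restricted pattern of this kind can be read as a simple membership rule: fixed positions must match exactly, and each "x" must come from the allowed residue set. The sketch below is one possible way to test and enumerate such patterns; `matches` and `expand` are hypothetical helpers, not functions of the Budapest Amyloid Predictor.

```python
from itertools import product

# Hydrophobic residues without proline, as listed in the table title.
HYDROPHOBIC_NO_PRO = "CVLIMFYW"


def matches(pattern, peptide, allowed=HYDROPHOBIC_NO_PRO):
    """True if `peptide` fits `pattern`, with every 'x' drawn from `allowed`."""
    if len(pattern) != len(peptide):
        return False
    return all(
        (res in allowed) if slot == "x" else (res == slot)
        for slot, res in zip(pattern, peptide)
    )


def expand(pattern, allowed=HYDROPHOBIC_NO_PRO):
    """Yield every hexapeptide described by `pattern` under `allowed`."""
    choices = [allowed if slot == "x" else slot for slot in pattern]
    for combo in product(*choices):
        yield "".join(combo)


print(matches("xxxFxx", "VLIFYW"))          # all free positions hydrophobic
print(matches("xxxFxx", "VLPFYW"))          # proline at a free position
print(sum(1 for _ in expand("xxxFxx")))     # number of described hexapeptides
```

Passing a different `allowed` string, such as CVLIMPFYW, gives the corresponding check for the three-free-position hydrophobic patterns.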