Jaime Santos, Jordi Pujols, Irantzu Pallarès, Valentín Iglesias, Salvador Ventura.
Abstract
Protein aggregation is a widespread phenomenon that stems from the establishment of non-native intermolecular contacts, resulting in protein precipitation. Despite its deleterious impact on fitness, protein aggregation is a generic property of polypeptide chains, indissociable from protein structure and function. Protein aggregation is behind the onset of neurodegenerative disorders and is a serious obstacle in the production of protein-based therapeutics. The development of computational tools opened a new avenue to rationalize this phenomenon, enabling prediction of the aggregation propensity of individual proteins as well as proteome-wide analysis. These studies identified aggregation as a major force driving protein evolution. Current algorithms work on both protein sequences and structures, and some of them also account for conformational fluctuations around the native state and for the protein microenvironment. This toolbox makes it possible to delineate conformation-specific routines that assist in the identification of aggregation-prone regions and guide the optimization of more soluble and stable biotherapeutics. Here we review how the advent of predictive tools has changed the way we think about and address protein aggregation.
Keywords: A3D, AGGRESCAN3D; APRs, Aggregation-prone regions; Amyloid; Bioinformatics; DI, Developability index; Evolution; IAPP, Islet amyloid polypeptide; IDPs, Intrinsically disordered proteins; Protein aggregation; Protein production; Protein structure; Proteomics; SAP, Spatial aggregation propensity; STAP, STructural Aggregation-Prone region; mAbs, Monoclonal antibodies
Year: 2020 PMID: 32637039 PMCID: PMC7322485 DOI: 10.1016/j.csbj.2020.05.026
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Innate competition between functional interactions and non-functional protein aggregation. Several factors contribute to balancing this subtle equilibrium.
Fig. 2. Computational strategies to predict protein aggregation. In each folding state, aggregation is driven by different molecular determinants, delimiting the best-performing predictive strategy in each particular case. Aggregation-prone residues are colored in red and solubilizing amino acids in blue. APR and STAP designate Aggregation-Prone Regions and STructural Aggregation-prone Regions, respectively. PDB structures correspond to monomeric and tetrameric transthyretin (PDB: 1F41). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Sequence-based prediction methods, grouped according to the rationale behind their analysis. *Registration prior to analysis is required.
| Method | Underlying rationale | Webserver, software or equation |
|---|---|---|
| Phenomenological methods | | |
| AGGRESCAN | Prediction is assayed against an aggregation propensity scale for the 20 proteinogenic amino acids derived from | |
| Zyggregator | Prediction over a 21-residue sliding window from an equation accounting for hydrophobicity, secondary structure propensity, and net charge, built upon changes in aggregation rate upon mutation. It also considers the presence of gatekeeper residues or hydrophobic patches. | |
| Theoretical methods | | |
| TANGO | Evaluation of the population of random coil, native conformation or aggregated species from empirically and statistically derived conformational preferences of amino acids, along with physico-chemical variables. | |
| PASTA 2.0 | Energetic function derived from high-resolution protein structures, which considers interaction potential and H-bond formation between all non-consecutive residues for parallel and anti-parallel β-pairing. | |
| FoldAmyloid | A scale derived from protein structures, based on the notion that hydrophobic stretches exhibit higher “packing density” and H-bonding propensity. | |
| WALTZ | Application of a position specific matrix derived from a large group of hexapeptides, for predicting amyloid-like formation. | |
| Pafig | Analysis of a six-residue sliding window against a scale derived from supervised machine learning over 531 physicochemical properties, of which 41 gave the best discrimination. | Code can be downloaded from their web page |
| Betascan | Evaluation of β-strand pairing propensity, obtained from probabilities of residues to be H-bonded in amphiphilic β-sheets. | |
| GAP | Discriminates amyloid-like or β-amorphous hexapeptides from position-specific pairing frequencies. | |
| 3D Profile | Assesses the energetic impact of spatially accommodating a sequence onto the backbone of the fibril-forming Sup35 hexapeptide. | |
| Machine learning methods | | |
| APPNN | Machine learning approach based on the analysis of seven physicochemical and biochemical features such as β-sheet frequency, hydrophobic moment, helix termination parameters or isoelectric point. | |
| NetCSSP | Analysis of contact-dependent secondary structure prediction to identify hidden β-propensities. | |
| FiSH Amyloid | Classification of amyloidogenic stretches based on co-occurrence patterns in protein sequences. | |
| Consensus methods | | |
| AmylPred2 | Generates a consensus prediction over 11 algorithms. Users can customize which methods are included, since some methodologies are partially redundant and could bias the consensus. | |
| MetAmyl | Score is obtained from a linear combination of the outcomes of four predictors (chosen because they showed lower redundancy), weighting the individual contribution of each method. | |
This list is intended to be illustrative rather than an extensive enumeration and description of all available methods. Programs in this list are not necessarily more accurate than those absent.
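Two ideas recur in the table above: scoring a sliding window against a per-residue scale, and combining several predictors into a consensus. The sketch below illustrates both in a minimal form. It does not reproduce any of the listed tools. The Kyte-Doolittle hydropathy scale is used only as a stand-in for a real aggregation propensity scale, and the function names, window size, threshold and minimum region length are all assumptions chosen for illustration.

```python
# Illustrative sketch only: NOT the algorithm of AGGRESCAN, AmylPred2 or any listed tool.
# Kyte-Doolittle hydropathy values serve as a stand-in for an aggregation propensity scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq, window=5):
    """Average the per-residue scale over a centered sliding window.

    Windows are truncated at the sequence ends. The window size is an
    arbitrary choice for this example.
    """
    half = window // 2
    scores = [KD[aa] for aa in seq]
    profile = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        profile.append(sum(scores[lo:hi]) / (hi - lo))
    return profile


def hot_spots(profile, threshold=1.5, min_len=5):
    """Return (start, end) pairs, end exclusive, for runs above the threshold.

    The threshold and minimum length are hypothetical values.
    """
    regions, start = [], None
    # A sentinel value closes any run that reaches the end of the profile.
    for i, value in enumerate(profile + [float("-inf")]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


def consensus(region_sets, length, min_votes=2):
    """Flag positions covered by at least `min_votes` of the given predictions."""
    votes = [0] * length
    for regions in region_sets:
        for start, end in regions:
            for i in range(start, end):
                votes[i] += 1
    return [v >= min_votes for v in votes]


# Toy sequence: a hydrophobic stretch flanked by charged residues.
toy = "KKKKK" + "VVVVVVV" + "DDDDD"
print(hot_spots(window_profile(toy)))  # [(6, 11)]
```

A real predictor would replace `KD` with an experimentally or statistically derived scale and calibrate the threshold against known aggregating sequences. The `consensus` function uses a simple majority-style vote, whereas a weighted linear combination, as described for MetAmyl, would multiply each predictor's score by a fitted weight before summing.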
Fig. 3. Comparison between a computationally guided pipeline for optimizing protein-based biotherapeutics and currently used strategies. Computational analysis of a candidate pool, and/or the reduction of candidate aggregation by introducing solubilizing mutations, offers a powerful alternative to expensive, blind trial-and-error approaches and is a cost-effective way to increase the success rate in the development of protein-based therapeutics.