Marcos Gil-Garcia, Valentín Iglesias, Irantzu Pallarès, Salvador Ventura.
Abstract
Prions are self-perpetuating proteins able to switch between a soluble state and an aggregated, transmissible conformation. These proteinaceous entities have been widely studied in yeast, where they are involved in heritable phenotypic adaptations. The notion that such proteins could play functional roles and be positively selected by evolution has triggered the development of computational tools to identify prion-like proteins in different kingdoms of life. These algorithms have succeeded in screening multiple proteomes, allowing the identification of prion-like proteins in a diversity of unrelated organisms and providing evidence that the prion phenomenon is well conserved among species. Prion-like proteins are not only connected with the formation of functional membraneless protein-nucleic acid coacervates, but are also linked to human diseases. This review addresses state-of-the-art computational approaches to identify prion-like proteins, describes proteome-wide analysis efforts, discusses the functional role of these unique proteins, and illustrates recently validated examples in different domains of life.
Keywords: bioinformatics; functional amyloids; prion; prion-like prediction; prion-like protein; proteome screenings
Year: 2021 PMID: 34057308 PMCID: PMC8409284 DOI: 10.1002/2211-5463.13213
Source DB: PubMed Journal: FEBS Open Bio ISSN: 2211-5463 Impact factor: 2.693
Performance of aggregation prediction methods when identifying prionic sequences. The AGGRESCAN [132], PATH [133], RFAmyloid [134] and AmyloGram [135] prediction methods were run with default parameters, and results were obtained using their standard thresholds. PrionW [19] is intended for predicting prion‐like proteins and was used for performance comparison (in italics). The sensitivity, specificity, precision, accuracy, Matthews correlation coefficient (MCC) and F1 score were calculated from yeast prion domains and prion‐like domains experimentally validated by Alberti and coworkers [13]. The authors characterized the domains for their amyloid‐ and prion‐forming ability in four assays and scored them from 0 to 10. As described previously [26], domains that were positive in all four assays and scored ≥ 9 were considered prions, whereas sequences that scored ≤ 2 and were positive in at most one assay were considered nonprions. The dataset was composed of 12 true positives (TP) (including the bona fide prions Sup35, New1, Swi1, Ure2p and Rnq1) and 39 true negatives (TN). False negatives and false positives are abbreviated as FN and FP, respectively.
| Algorithm | TP | TN | FP | FN | Sensitivity | Specificity | Precision | Accuracy | MCC | F1 score |
|---|---|---|---|---|---|---|---|---|---|---|
| AGGRESCAN | 0 | 39 | 0 | 12 | 0.00 | 1.00 | – | 0.76 | – | 0.00 |
| PATH | 3 | 39 | 0 | 9 | 0.25 | 1.00 | 1.00 | 0.82 | 0.45 | 0.40 |
| RFAmyloid | 8 | 15 | 24 | 4 | 0.67 | 0.38 | 0.25 | 0.45 | 0.04 | 0.36 |
| AmyloGram | 10 | 11 | 28 | 2 | 0.83 | 0.28 | 0.26 | 0.41 | 0.11 | 0.40 |
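The metrics in the table follow from the four confusion-matrix counts, so they can be recomputed as a sanity check. The sketch below is a minimal illustration, not the authors' code: the function name `confusion_metrics` and the convention of returning `None` for undefined ratios (shown as "–" in the table) are assumptions made here.

```python
from math import sqrt


def confusion_metrics(tp, tn, fp, fn):
    """Return standard binary-classification metrics; None where undefined."""

    def ratio(num, den):
        # A zero denominator means the metric is undefined for these counts.
        return num / den if den else None

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Counts copied from the PATH and AGGRESCAN rows of the table above.
path = confusion_metrics(tp=3, tn=39, fp=0, fn=9)
aggrescan = confusion_metrics(tp=0, tn=39, fp=0, fn=12)
```

With no positive predictions at all (the AGGRESCAN row), precision and MCC have a zero denominator, which is why those cells carry no value.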
Prion‐like prediction methods and applied analytical rationale.
| Algorithm | Strategy | Brief description/Underlying rationale | Availability | References |
|---|---|---|---|---|
| DIANA | Compositional | Identifies Q/N‐rich domains by counting Qs and Ns in windows of 80 consecutive amino acids and retrieves the most enriched stretch above a 30‐Q/N threshold. Stretch length and minimum Q/N content are based on the length and Q/N percentage of the Sup35 and Ure2p yeast prions and of human disease‐causing polyQ expansions | – | [ |
| LPS | Compositional | The program is designed to retrieve any compositionally biased sequence, and it has been used to identify Q/N‐rich regions as a proxy for PrLD. It first searches for all possible single amino acid biases by comparing each window against the input background frequencies and retrieving the lowest‐probability stretch. Subsequent updates allowed automatic calculation of bias for multiple residue types by checking whether their combined probability was lower than those of the individual residues separately. LPS calculates biases both for and against residue types | Script | [ |
| PAPA | Compositional | Prediction is made on disordered segments, exploiting an amino acid propensity scale obtained by randomly mutating a short stretch of a prionic Sup35 variant | Web server + script | [ |
| PLAAC | Compositional | Applies an HMM trained on 28 yeast PrD and PrLD with high experimental prion propensity against user‐selected backgrounds of amino acid frequencies | Web server + script | [ |
| pWALTZ | Amyloid | Applies the experimentally derived amyloid propensity scoring matrix of WALTZ to longer stretches, averaging the amyloid load over 21‐residue stretches and retrieving the strongest amyloid core | Executable | [ |
| PrionW | Compositional + Amyloid | Disordered fragments with a minimum QN content are evaluated with pWALTZ. The QN‐threshold can be adjusted for different species' background frequencies | Web server | [ |
| PrionScan | Compositional | Uses an unsupervised classifier and a statistical representation of PrLD relying on the amino acid frequencies of positive and negative sequences in Lindquist's dataset. PrionScan incorporates a built‐in database that regularly updates its predictions for UniProt KB releases | Web server + Database | [ |
| pRANK | Compositional/Machine learning | Implements a supervised learning strategy, trained on 22 known Q/N‐rich yeast PrD as positive sequences and the remaining proteome with < 90% similarity to them as negative ones | Web server + script | [ |
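The compositional strategy described for DIANA in the table above (count Q and N residues in 80-residue windows and keep the most enriched stretch above a 30-Q/N threshold) can be sketched with a sliding window. This is an illustration under stated assumptions, not the DIANA implementation: the function name `richest_qn_window` is hypothetical, the threshold is treated as inclusive (at least 30 Q/N), and ties are resolved in favour of the first window.

```python
def richest_qn_window(sequence, window=80, min_qn=30):
    """Return (start, qn_count) for the most Q/N-rich window, or None.

    `start` is a 0-based index. None means the sequence is shorter than
    the window or no window reaches `min_qn` (assumed inclusive).
    """
    seq = sequence.upper()
    if len(seq) < window:
        return None
    # Q/N count of the first window.
    count = sum(residue in "QN" for residue in seq[:window])
    best_start, best_count = 0, count
    for start in range(1, len(seq) - window + 1):
        # Slide by one residue: drop the one leaving, add the one entering.
        count -= seq[start - 1] in "QN"
        count += seq[start + window - 1] in "QN"
        if count > best_count:
            best_start, best_count = start, count
    return (best_start, best_count) if best_count >= min_qn else None


# Synthetic toy sequence (not a real protein): 40 Q/N residues flanked by alanines.
toy = "A" * 50 + "QN" * 20 + "A" * 50
hit = richest_qn_window(toy)
```

Updating the running count as the window slides keeps the scan linear in sequence length, which matters when the same check is applied across a whole proteome.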
Fig. 1. Bioinformatics has been a valuable tool in the discovery of prion‐like proteins. (A) Prion‐like prediction methods correctly identify the PrD in the yeast prion proteins Sup35 and Ure2p. Disordered N‐terminal regions shown in cartoon representation are models not derived from structural data. (B) Proposed pipeline for optimizing the discovery of prion‐like proteins. Large‐scale computational analysis offers a powerful alternative to more expensive and time‐consuming experimental approaches. Stepwise sequential restrictions in the selection from a pool of initial candidates increase the success rate for discovering novel prion‐like proteins. (a) mPAPA corresponds to the 2020 modified version of PAPA. (b) pWALTZ requires a previously defined PrLD; thus, pWALTZ soft‐amyloid core predictions are only shown for PrionW.
Fig. 2. Proteome‐wide analyses of prion‐like proteins in distinct kingdoms of life. Large‐scale analyses approximate the content of PrLD‐containing proteins in the proteomes of unrelated organisms. Experimentally validated examples (if any) of prion‐like proteins and proteome enrichments are indicated for each species. ND, not determined. Created with BioRender.com.