Gábor Erdős, Mátyás Pajkos, Zsuzsanna Dosztányi.
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) exist without a single well-defined conformation. They carry out important biological functions and play multifaceted roles, which is also reflected in their evolutionary behavior. Computational methods play important roles in the characterization of IDRs. One of the commonly used disorder prediction methods is IUPred, which relies on an energy estimation approach. The IUPred web server takes an amino acid sequence or a UniProt ID/accession as input and predicts, for each amino acid, the tendency to be in a disordered region, with an option to also predict context-dependent disordered regions. In this new iteration of IUPred, we added multiple novel features to enhance the prediction capabilities of the server. First, learning from the latest evaluation of disorder prediction methods, we introduced multiple new smoothing functions that decrease noise and increase the performance of the predictions. Second, we constructed a dataset of experimentally verified ordered/disordered regions with unambiguous annotations, which is displayed alongside the prediction. Third, we introduced a novel tool that enables the exploration of the evolutionary conservation of protein disorder, coupled to sequence conservation, in model organisms. The web server is freely available at https://iupred3.elte.hu.
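The abstract says that an additional smoothing layer reduces noise in the per-residue disorder profile. As a rough illustration, the sketch below applies a centered moving average with the window size of 11 listed in the comparison table, then calls residues disordered above a 0.5 cutoff. The function names, the truncated window at the sequence termini and the 0.5 cutoff are assumptions for illustration, not IUPred3's own code. For the Savitzky-Golay variant, assuming the (19, 5) parameters mean window length 19 and polynomial order 5, `scipy.signal.savgol_filter(scores, window_length=19, polyorder=5)` applies a comparable filter, although how the server treats the sequence ends is not stated here.

```python
from typing import List


def moving_average(scores: List[float], window: int = 11) -> List[float]:
    """Centered moving average over per-residue disorder scores.

    Assumption: near the termini the window is truncated to the residues
    that exist, so each value is the mean of the available neighbours.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)          # clip the window at the N-terminus
        hi = min(n, i + half + 1)      # clip the window at the C-terminus
        segment = scores[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def disordered_mask(scores: List[float], cutoff: float = 0.5) -> List[bool]:
    """Call a residue disordered when its score exceeds the (assumed) cutoff."""
    return [s > cutoff for s in scores]
```

Because an isolated high score is averaged with its ten neighbours, a one-residue spike of 1.0 in an otherwise ordered stretch drops to about 0.09 and is no longer called disordered, which is the kind of noise such smoothing is meant to suppress.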
Year: 2021 PMID: 34048569 PMCID: PMC8262696 DOI: 10.1093/nar/gkab408
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Influence of an additional layer of smoothing on the performance of IUPred on the CAID dataset: the previous method (no second-layer smoothing), the Savitzky-Golay filter with parameters (19, 5) and moving-average smoothing with window size 11, compared with other state-of-the-art disorder prediction tools
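| Method | AUC | F1 score |
|---|---|---|
| IUPred, previous method (no second-layer smoothing) | 0.736 | 0.417 |
| IUPred, Savitzky-Golay filter (19, 5) | 0.738 | 0.421 |
| IUPred, moving average (window size 11) | 0.744 | 0.428 |
| Other state-of-the-art tool | 0.798 | 0.472 |
| Other state-of-the-art tool | 0.765 | 0.43 |
| Other state-of-the-art tool | 0.747 | 0.44 |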
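The table scores each variant by AUC and F1 on per-residue disorder calls. A minimal sketch of both metrics follows, assuming binary per-residue labels (True = disordered) and, for F1, predictions binarized at a fixed cutoff. The CAID assessment may choose thresholds and pool residues differently, so this is an illustration of the metrics rather than a reimplementation of that protocol. AUC is computed here as the Mann-Whitney probability that a disordered residue outscores an ordered one.

```python
from typing import Sequence


def f1_score(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """F1 for the disordered (positive) class from binary per-residue calls."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(scores: Sequence[float], truth: Sequence[bool]) -> float:
    """ROC AUC as the probability that a random disordered residue scores
    higher than a random ordered one; ties count one half."""
    pos = [s for s, t in zip(scores, truth) if t]
    neg = [s for s, t in zip(scores, truth) if not t]
    if not pos or not neg:
        raise ValueError("need at least one disordered and one ordered residue")
    wins = 0.0
    for p in pos:              # compare every disordered/ordered pair
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of residues; for dataset-scale evaluation a rank-based formula or an existing routine such as scikit-learn's `roc_auc_score` is the practical choice.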
Figure 1. The output of IUPred3 for the repressor-activator protein 1 (Rap1) of Saccharomyces cerevisiae. The strong smoothing option was used to generate this plot. In the upper part of the figure, unambiguously verified disordered and ordered protein regions from experiments are marked by red lines at the top and bottom of the plot, respectively. The part of the protein that has an unambiguously verified order/disorder profile, based on manual curation of experimental data, is coloured grey. The bottom part shows the various annotations for Rap1: disordered regions from DisProt are shown as deep red boxes, light red and blue boxes correspond to Pfam families and domains, respectively, and green boxes mark mapped consensus regions of PDB structures.
Figure 2. The output of disorder conservation for the human eIF2A protein. At the top of the figure, IUPred3 profiles of human eIF2A and its orthologs from five well-known model organisms are depicted, with predicted disordered regions highlighted in red. The bottom of the figure shows the multiple sequence alignment of orthologs identified from an extended set of eukaryotic model organisms. Human eIF2A, the query protein, is highlighted in red in both parts of the figure. Model organisms are classified by taxonomic level, with levels indicated by different colours. Using the regular-expression-based motif search box, the YxPPxΦR motif of eIF2A is highlighted by blue rectangles in each profile.
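Figure 2 highlights the YxPPxΦR motif through the server's regular-expression search. The sketch below shows one way to run a comparable search in Python, translating `x` to any residue and `Φ` to a hydrophobic residue. The hydrophobic set `FILMVWY` and the helper names `motif_to_regex` and `find_motif` are hypothetical choices, not the server's documented syntax. A lookahead pattern is used so that overlapping hits are also reported, with 1-based inclusive positions.

```python
import re
from typing import List, Tuple

# Assumption: residues standing in for Φ (hydrophobic). Motif definitions
# differ on which residues Φ covers, so adjust this set as needed.
HYDROPHOBIC = "FILMVWY"


def motif_to_regex(motif: str, phi: str = HYDROPHOBIC) -> str:
    """Translate motif notation: 'x' = any residue, 'Φ' = hydrophobic,
    every other character is matched literally."""
    parts = []
    for ch in motif:
        if ch == "x":
            parts.append(".")
        elif ch == "Φ":
            parts.append("[" + phi + "]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def find_motif(sequence: str, motif: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, match) for every hit, 1-based and inclusive.

    The zero-width lookahead lets finditer report overlapping occurrences.
    """
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    hits = []
    for m in pattern.finditer(sequence):
        hit = m.group(1)
        start = m.start() + 1
        hits.append((start, start + len(hit) - 1, hit))
    return hits
```

On the made-up sequence `MAAYSPPLLRGG`, `find_motif("MAAYSPPLLRGG", "YxPPxΦR")` returns `[(4, 10, "YSPPLLR")]`.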