Ian Walsh, Alberto J M Martin, Tomàs Di Domenico, Alessandro Vullo, Gianluca Pollastri, Silvio C E Tosatto.
Abstract
CSpritz is a web server for the prediction of intrinsic protein disorder. It combines the earlier Spritz predictor with two novel orthogonal systems developed by our group (Punch and ESpritz). Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single-sequence method based on bidirectional recursive neural networks. Spritz was extended to filter predictions based on structural homologues. After extensive testing, predictions are combined by averaging their probabilities. The CSpritz website can run single or multiple predictions for either short or long disorder. The server provides a global output page for downloading all predictions and viewing their combined statistics. Links are provided to each individual protein, where the amino acid sequence and disorder prediction are displayed along with statistics for that protein. As a novel feature, CSpritz provides information about structural homologues as well as secondary structure and short functional linear motifs in each disordered segment. Benchmarking was performed on the very recent CASP9 data, where CSpritz would have ranked consistently well with an Sw measure of 49.27 and an AUC of 0.828. The server, together with help and methods pages including examples, is freely available at URL: http://protein.bio.unipd.it/cspritz/.
Year: 2011 PMID: 21646342 PMCID: PMC3125791 DOI: 10.1093/nar/gkr411
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Pearson correlation of the three systems on CASP9 targets
| | ESpritz | Spritz | Punch |
|---|---|---|---|
| ESpritz | 1.00 | 0.51 | 0.59 |
| Spritz | | 1.00 | 0.42 |
| Punch | | | 1.00 |
The probabilities are produced by each component on all residues for 117 CASP9 targets. Since the correlations are low, combining the three systems improves performance over the individual systems.
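The correlation check and the averaging step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the CSpritz implementation: the function names `pearson` and `average_probabilities` and the toy per-residue probabilities are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of probabilities."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

def average_probabilities(*predictors):
    """Combine predictors by averaging their per-residue probabilities."""
    if len({len(p) for p in predictors}) != 1:
        raise ValueError("all predictors must cover the same residues")
    return [sum(values) / len(values) for values in zip(*predictors)]

# Toy per-residue disorder probabilities (invented for illustration only).
espritz = [0.9, 0.8, 0.2, 0.1, 0.4]
spritz = [0.7, 0.3, 0.4, 0.2, 0.6]
punch = [0.8, 0.6, 0.1, 0.3, 0.2]

r_espritz_spritz = pearson(espritz, spritz)
consensus = average_probabilities(espritz, spritz, punch)
```

Averaging tends to help most when the individual predictors make different errors, which is what low pairwise correlation suggests.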
Results for the top five performing groups at the CASP9 experiment, CSpritz and the original Spritz
| GroupID: Name | Sw (±SE) | ACC | AUC |
|---|---|---|---|
| 291: PRDOS2 | 50.44 (±1.08) | 75.22 | 0.852 |
| 119: MULTICOM-REFINE | 49.53 (±1.00) | 74.77 | 0.818 |
| 000: CSpritz | 49.27 (±1.02) | 74.64 | 0.828 |
| 351: BIOMINE_DR_PDB | 48.21 (±1.25) | 74.11 | 0.818 |
| 374: GSMETADISORDERMD | 47.13 (±0.96) | 73.57 | 0.815 |
| 193: MASON | 45.98 (±1.17) | 73.00 | 0.740 |
| 000: Spritz | 24.91 (±1.18) | 62.46 | 0.716 |
Disordered segments of fewer than three residues were removed (results are unchanged if they are included, see Supplementary Table S3). The standard error (SE) for Sw is shown in brackets. ACC is the accuracy, i.e. (sensitivity + specificity)/2, and AUC is the area under the receiver operating characteristic (ROC) curve. A total of 32 groups participated in the CASP9 disorder prediction category.
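The three measures can be illustrated with a short sketch. ACC follows the definition given above. The Sw formula is not stated in this section; the sketch assumes Sw = 2 × ACC − 100, which matches the tabulated values up to rounding (for PRDOS2, 2 × 75.22 − 100 = 50.44). The function names and toy data are hypothetical.

```python
def confusion_counts(truth, predicted):
    """Return (tp, fn, tn, fp) for binary labels: 1 = disordered, 0 = ordered."""
    tp = fn = tn = fp = 0
    for t, p in zip(truth, predicted):
        if t == 1 and p == 1:
            tp += 1
        elif t == 1:
            fn += 1
        elif p == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp

def acc_percent(tp, fn, tn, fp):
    """ACC as defined in the text: (sensitivity + specificity) / 2, in percent."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 100.0 * (sensitivity + specificity) / 2.0

def sw_percent(tp, fn, tn, fp):
    """ASSUMPTION: Sw = 2 * ACC - 100, inferred from the tabulated values."""
    return 2.0 * acc_percent(tp, fn, tn, fp) - 100.0

def auc(truth, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for t, s in zip(truth, scores) if t == 1]
    neg = [s for t, s in zip(truth, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (invented): true labels and predicted disorder probabilities.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.2, 0.1, 0.3, 0.2, 0.1, 0.6]
predicted = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(truth, predicted)
acc = acc_percent(*counts)
sw = sw_percent(*counts)
area = auc(truth, scores)
```

Because ACC averages sensitivity and specificity, it is not dominated by the ordered residues, which usually far outnumber the disordered ones.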
Comparison for DisProt disordered regions
| Method | Sw (±SE) | ACC | AUC |
|---|---|---|---|
| CSpritz (short) | 54.64 (±3.58) | 77.32 | 0.837 |
| CSpritz (long) | 65.70 (±3.52) | 82.85 | 0.891 |
| Spritz (short) | 12.12 (±6.16) | 56.06 | 0.685 |
| Spritz (long) | 35.55 (±3.58) | 67.78 | 0.734 |
| PONDR-FIT | 51.53 (±4.34) | 75.77 | 0.817 |
| Disopred2 | 46.20 (±4.00) | 73.10 | 0.806 |
| IUPred (short) | 37.65 (±4.77) | 68.83 | 0.814 |
| IUPred (long) | 42.57 (±4.75) | 71.29 | 0.818 |
CSpritz is compared with the original Spritz, PONDR-FIT, Disopred2 and IUPred. Where applicable, both the short and long options are reported. The standard error (SE) for Sw is shown in brackets. ACC is the accuracy, i.e. (sensitivity + specificity)/2, and AUC is the area under the receiver operating characteristic (ROC) curve. The decision threshold and the best Sw were found to be 0.26 and 51.85, respectively, on the training set.
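One plausible way to obtain such a decision threshold is to sweep candidate cut-offs on the training set and keep the one that maximizes Sw. The sketch below is an assumption about the procedure, not the authors' documented method. It also assumes Sw = 100 × (sensitivity + specificity − 1), which is consistent with the Sw and ACC columns above. The names `sw_at_threshold` and `best_threshold` and the toy data are hypothetical.

```python
def sw_at_threshold(truth, probs, threshold):
    """Sw (ASSUMED to be 100 * (sens + spec - 1)) when residues with
    probability >= threshold are called disordered."""
    tp = sum(1 for t, p in zip(truth, probs) if t == 1 and p >= threshold)
    fn = sum(1 for t, p in zip(truth, probs) if t == 1 and p < threshold)
    tn = sum(1 for t, p in zip(truth, probs) if t == 0 and p < threshold)
    fp = sum(1 for t, p in zip(truth, probs) if t == 0 and p >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 100.0 * (sensitivity + specificity - 1.0)

def best_threshold(truth, probs, steps=100):
    """Scan thresholds 0, 1/steps, ..., 1 and return (threshold, Sw) for the
    first threshold that reaches the maximum Sw."""
    best_t, best_sw = 0.0, float("-inf")
    for i in range(steps + 1):
        t = i / steps
        sw = sw_at_threshold(truth, probs, t)
        if sw > best_sw:
            best_t, best_sw = t, sw
    return best_t, best_sw

# Toy training data (invented): 1 = disordered, 0 = ordered.
train_truth = [1, 1, 1, 0, 0, 0, 0]
train_probs = [0.9, 0.6, 0.3, 0.35, 0.2, 0.1, 0.05]

threshold, sw = best_threshold(train_truth, train_probs)
```

A threshold well below 0.5, such as the reported 0.26, is what one would expect when disordered residues are the minority class and the score rewards sensitivity as much as specificity.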
Figure 1. Global output page for multiple sequences. Summary statistics are displayed for the disordered segments of all query sequences. An archive is offered for download containing all disorder predictions, linear motifs and statistics for each protein the user supplied. The inset shows a graph of the length distribution of disordered segments among all proteins.
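The segment length distribution shown in the inset could be derived from per-residue predictions roughly as follows. This is a sketch only: the `D`/`O` label encoding, the function names and the toy strings are assumptions, not the CSpritz file format. The `min_length` parameter mirrors the removal of segments shorter than three residues used in the CASP9 evaluation.

```python
from collections import Counter

def disordered_segments(labels):
    """Return 1-based inclusive (start, end) pairs for each run of 'D'."""
    segments = []
    start = None
    for position, label in enumerate(labels, start=1):
        if label == "D" and start is None:
            start = position
        elif label != "D" and start is not None:
            segments.append((start, position - 1))
            start = None
    if start is not None:  # the sequence ends inside a disordered run
        segments.append((start, len(labels)))
    return segments

def length_distribution(label_strings, min_length=1):
    """Count disordered segments by length across several proteins."""
    counts = Counter()
    for labels in label_strings:
        for start, end in disordered_segments(labels):
            length = end - start + 1
            if length >= min_length:
                counts[length] += 1
    return counts

# Toy predictions (invented): D = disordered, O = ordered.
predictions = ["DDOOOODDDDO", "OOODDD"]

all_segments = length_distribution(predictions)
filtered = length_distribution(predictions, min_length=3)
```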
Figure 2. Individual output page for D. melanogaster Cryptochrome. The main figure shows the list of available files and the actual disorder prediction. The latter is composed of the amino acid sequence, its predicted secondary structure and the CSpritz disorder classification, with disordered residues highlighted in red font. Disorder statistics for the protein are presented on the right. Two insets show the graphs for the disorder propensity plot (top right) and the number of available structural coordinates versus disordered segments in homologous sequences. The inset at the bottom shows the annotated disordered segment covering the C-terminus of Cryptochrome (residues 513–542). The propensities for secondary structure and the locations of putative functional motifs are shown. Links to the ELM description of each motif and the amino acids involved in it are supplied on the right. A graph and the probabilities of secondary structure propensity are also supplied.