
SUMOsp: a web server for sumoylation site prediction.

Yu Xue, Fengfeng Zhou, Chuanhai Fu, Ying Xu, Xuebiao Yao.

Abstract

Systematic dissection of the sumoylation proteome is emerging as an appealing but challenging research topic because of the significant roles sumoylation plays in cellular dynamics and plasticity. Although several proteome-scale analyses have been performed to delineate potential sumoylatable proteins, the bona fide sumoylation sites still remain to be identified. Previously, we carried out a genome-wide analysis of the SUMO substrates in the human nucleus using the putative motif psi-K-X-E and evolutionary conservation. However, a highly specific predictor for in silico prediction of sumoylation sites in any individual organism is still urgently needed to guide experimental design. In this work, we present a computational system, SUMOsp (SUMOylation Sites Prediction), based on a manually curated dataset and integrating the results of two methods, GPS and MotifX, which were originally designed for phosphorylation site prediction. SUMOsp offers at least as good prediction performance as the only available method, SUMOplot, on a very large test set. We expect that the prediction results of SUMOsp combined with experimental verification will propel our understanding of sumoylation mechanisms to a new level. SUMOsp has been implemented on a freely accessible web server at: http://bioinformatics.lcd-ustc.org/sumosp/.

Year:  2006        PMID: 16845005      PMCID: PMC1538802          DOI: 10.1093/nar/gkl207

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Sumoylation, a reversible post-translational modification (PTM) of proteins by the small ubiquitin-related modifiers (SUMOs), is crucial in a variety of biological processes, including transcription (1,2), mRNA metabolism (3) and signal transduction (4), and may be involved in the perception of sound (5). Protein sumoylation has also been reported to play essential roles in various diseases and disorders, such as type-1 diabetes (T1D) (6) and Parkinson's disease (PD) (7). SUMO proteins are highly conserved across eukaryotes and comprise four members in mammals: SUMO-1, SUMO-2, SUMO-3 and SUMO-4 (8). There is only one SUMO gene, SMT3, in budding yeast, while there are at least eight SUMO paralogs in plants (9). Sumoylation is an unusual phenomenon with quite distinct characteristics. For example, although there are many lysines (K) in a sumoylated protein, only a few of them are bona fide sumoylation sites. Many sumoylation sites follow a consensus motif ψ-K-X-E (ψ is a hydrophobic amino acid) (8,10) or ψ-K-X-E/D (11,12); however, the accumulating experimental data show that about 23% (56/239) of real sumoylation sites do not follow the above consensus motif [Supplementary Table S1 (A)]. It has also been proposed that a nuclear localization signal (NLS) together with a consensus motif confers the ability to be sumoylated, but some real SUMO substrates are not localized in the nucleus. For example, the protein DRP1 (dynamin related protein) is localized in the mitochondria and is sumoylated during mitochondrial fission (13). In this regard, our understanding of sumoylation mechanisms is still in its infancy. Moreover, the sumoylation process is dynamic, and only a small fraction of the proteome, often <1%, will be sumoylated in vivo at any given time (10). These complex features of sumoylation sites have introduced great difficulties in the systematic analysis of the sumoylation proteome.
Using mass spectrometry (MS) approaches, several large-scale experiments on sumoylation substrates have been carried out (12,14–17); however, the bona fide sumoylation sites still remain to be identified. Computational approaches might therefore represent a promising method for identification of sumoylation sites. Previous work on in silico identification of SUMO substrates and their sumoylation sites is mainly based on matching the consensus motif, ψ-K-X-E or ψ-K-X-E/D, which may miss many true positives. Moreover, since many consensus sites are not sumoylated, these approaches will often generate very high false positive prediction rates. In this work, we have developed a computational system, SUMOsp (SUMOylation Sites Prediction), based on two methods, GPS (18,19) and MotifX (20). GPS and MotifX were originally designed for phosphorylation site prediction, and leave-one-out validation and 5-fold cross validation in this article indicate that these two pattern recognition strategies are also robust and accurate for sumoylation site prediction. SUMOsp offers at least as good prediction performance as the only existing system, SUMOplot. To facilitate use of this system by others, we have developed an easy-to-use web server for SUMOsp, which is freely accessible at: .

IMPLEMENTATION

Data preparation

We searched PubMed with the keywords ‘SUMO’ and ‘sumoylation’, and manually curated 239 unambiguously experimentally identified sumoylation sites in 144 proteins from ∼400 research articles published online before December 10, 2005. We retrieved their primary sequences from the Swiss-Prot/TrEMBL database (). Because of database updates, the sumoylation positions reported in the literature may have changed in the current primary sequences; the dataset was therefore manually validated before our analyses.
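As an illustration of the kind of position check described above, the following minimal sketch flags reported sites that no longer point at a lysine in the current sequence. The authors validated their dataset manually; the function name `check_sites` and the toy sequence are hypothetical and are not taken from the paper.

```python
def check_sites(sequence, positions):
    """Return the 1-based positions that do NOT point at a lysine (K)."""
    bad = []
    for pos in positions:
        # Out-of-range positions are reported as mismatches as well.
        if pos < 1 or pos > len(sequence) or sequence[pos - 1] != "K":
            bad.append(pos)
    return bad


# Hypothetical toy sequence: lysines sit at positions 3 and 6.
toy_sequence = "MAKLVKTEPQ"
mismatches = check_sites(toy_sequence, [3, 6, 5, 42])
print(mismatches)  # [5, 42]
```

Positions flagged in this way would still need to be re-mapped by hand, for example by locating the flanking residues reported in the original article.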

Algorithm

We first define a potential sumoylation peptide PSP(n) as a lysine (K) residue flanked by n residues upstream and n residues downstream. We hypothesize that the biochemical properties of a sumoylation site mainly depend on the neighboring amino acids, and this hypothesis is supported by our validation results. In this work, we use n = 7 for the PSP(n)s, which the prediction performance shows to be sufficient to represent the flanking information of a sumoylation site. Although other substitution matrices could be employed, we chose BLOSUM62, as in our previous work (19). In this study, we have employed two prediction strategies, GPS (18,19) and MotifX (20), for prediction of sumoylation sites, and our server provides both results to its users. As described in (19), two peptides flanking the same amino acid may carry a similar PTM if the BLOSUM62 substitution score between them is sufficiently high. In this study, GPS first partitioned the dataset of PSP(7)s flanking the 239 known sumoylation sites into three clusters. For a given PSP(7) flanking a lysine (K) and one of the clusters, the average of the scores between this peptide and the peptides in the cluster is defined as the score of this cluster. The GPS score of the given peptide is defined as the maximum of its three cluster scores. A cut-off value on this score is used to make the final judgment. MotifX (20) generated a set of highly specific motifs for the sumoylation sites, IKXEP, VKXE, IKXE, LKXE and KXE (X can be any amino acid), which can be easily applied by users. In fact, we found that MotifX exhibits greater predictive power when it is combined with GPS. The combination of MotifX with GPS predicts a PSP(7) as a positive hit when the peptide is predicted as positive by either of them. SUMOsp, the integration of GPS and MotifX, acts in this way.
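The scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the padding of PSPs at the sequence termini, the `>=` comparison against the cut-off, the identity-based `toy_score` (a stand-in for a real BLOSUM62 lookup) and the two toy clusters are all hypothetical. Only n = 7, the five motifs, the "mean within a cluster, maximum over clusters" rule and the "either method" combination come from the text.

```python
FLANK = 7  # n = 7, as stated in the text
MOTIFS = ["IKXEP", "VKXE", "IKXE", "LKXE", "KXE"]  # X = any amino acid


def extract_psps(sequence, n=FLANK, pad="-"):
    """Yield (1-based position, PSP(n)) for each lysine.

    Padding the termini with '-' is an assumption of this sketch.
    """
    padded = pad * n + sequence + pad * n
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i + 1, padded[i:i + 2 * n + 1]


def toy_score(a, b):
    """Stand-in for a BLOSUM62 lookup: 1 for identical residues, else 0."""
    return 1 if a == b and a != "-" else 0


def peptide_similarity(p, q, score=toy_score):
    """Sum of position-wise substitution scores between two PSPs."""
    return sum(score(a, b) for a, b in zip(p, q))


def gps_score(psp, clusters, score=toy_score):
    """Maximum over clusters of the mean similarity to the cluster members."""
    return max(
        sum(peptide_similarity(psp, q, score) for q in cluster) / len(cluster)
        for cluster in clusters
    )


def matches_motif(psp, motif, n=FLANK):
    """True if the motif matches with its K aligned to the central lysine."""
    start = n - motif.index("K")
    segment = psp[start:start + len(motif)]
    return len(segment) == len(motif) and all(
        m == "X" or m == s for m, s in zip(motif, segment)
    )


def sumosp_like_predict(sequence, clusters, cutoff):
    """Report a lysine if GPS OR any motif calls it positive."""
    hits = []
    for pos, psp in extract_psps(sequence):
        by_gps = gps_score(psp, clusters) >= cutoff  # '>=' is an assumption
        by_motif = any(matches_motif(psp, m) for m in MOTIFS)
        if by_gps or by_motif:
            hits.append((pos, psp, by_gps, by_motif))
    return hits


# Hypothetical toy data: two one-member clusters and a short sequence.
toy_clusters = [["AAAAAAVKTEAAAAA"], ["GGGGGGIKQEPGGGG"]]
toy_hits = sumosp_like_predict("MAKLVKTEPQ", toy_clusters, cutoff=3)
print([pos for pos, _, _, _ in toy_hits])  # [6]
```

Because `toy_score` is not BLOSUM62, the cut-off used here is not comparable to the cut-off scores offered by the server; a real substitution matrix can be passed through the `score` parameter.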

RESULTS

We use sensitivity (Sn), specificity (Sp) and accuracy (Ac) to evaluate the performance of SUMOsp. Sensitivity and specificity measure the fractions of positive and negative data that are predicted correctly, respectively, while accuracy gives the overall fraction of correct predictions. These measures are inadequate when the numbers of positive and negative data differ significantly, so in addition to the Sn, Sp and Ac values we have also used a correlation coefficient (CC) to assess our prediction system. CC lies between −1 and 1, and the larger the CC, the more accurate the prediction. As in previous work (18,19,21), the known sumoylation sites are regarded as the positive data, while all the other lysines (K) in the known sumoylation substrates are regarded as the negative data. Among the data with positive predictions by SUMOsp, the real positive ones are called true positives (TP), and the others are called false positives (FP). Among the data with negative predictions by SUMOsp, the real positive ones are called false negatives (FN), while the others are called true negatives (TN). The performance measurements sensitivity (Sn), specificity (Sp), accuracy (Ac) and Matthews correlation coefficient (CC) (22) are defined as follows:

\[ Sn = \frac{TP}{TP + FN}, \qquad Sp = \frac{TN}{TN + FP}, \qquad Ac = \frac{TP + TN}{TP + FP + TN + FN} \]

and

\[ CC = \frac{(TP \times TN) - (FN \times FP)}{\sqrt{(TP + FN)(TN + FP)(TP + FP)(TN + FN)}} \]

We provide three cut-off scores, 1.5, 4 and 18, which apply only to the GPS scores. Users may choose a different cut-off score according to their requirements on the prediction performance (refer to Supplementary Table S2). SUMOsp with cut-off score 0 will generate the prediction results of GPS and MotifX for all the lysines, which is of interest for further investigations. We have compared the prediction performance of SUMOsp with that of the only publicly available tool, SUMOplot (). SUMOplot makes predictions based on hydrophobic similarity with the consensus motif and the degree of matching with the sumoylation sites from Ubc9-binding substrates, and is considered an excellent computational program.
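The four measures named above can be computed directly from the TP, FP, TN and FN counts. The sketch below uses the standard textbook definitions of sensitivity, specificity, accuracy and the Matthews correlation coefficient; the function name `evaluate` and the example counts are hypothetical and are not results from the paper.

```python
import math


def evaluate(tp, fp, tn, fn):
    """Return (Sn, Sp, Ac, CC) computed from confusion-matrix counts."""
    sn = tp / (tp + fn)                   # sensitivity
    sp = tn / (tn + fp)                   # specificity
    ac = (tp + tn) / (tp + fp + tn + fn)  # accuracy
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    # CC is undefined when a marginal total is zero; return 0.0 in that case.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, ac, cc


# Hypothetical, deliberately imbalanced counts (100 positives, 1000 negatives).
sn, sp, ac, cc = evaluate(tp=80, fp=100, tn=900, fn=20)
print(f"Sn={sn:.2%}  Sp={sp:.2%}  Ac={ac:.2%}  CC={cc:.4f}")
```

With these toy counts the accuracy is dominated by the large negative class, which is the situation in which CC is the more informative summary.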
Here we denote the two stringency levels of SUMOplot as high (hits with high probability) and all (all predictions). As shown in Table 1, the Ac, Sn, Sp and CC of SUMOsp with threshold 18 are 92.71%, 83.68%, 93.08% and 0.5012, respectively, while the Ac, Sn, Sp and CC of SUMOsp with threshold 4 are 80.43%, 89.12%, 80.07% and 0.3232, respectively. The Ac, Sn, Sp and CC of SUMOplot at the high/all levels are 89.94%/80.45%, 79.50%/88.70%, 93.31%/80.07% and 0.4825/0.3211, respectively. SUMOsp thus has higher Sn and CC than SUMOplot at both stringency levels, while Ac and Sp are comparable (SUMOsp is slightly lower in Sp at the high level, 93.08% versus 93.31%, and in Ac at the lower level, 80.43% versus 80.45%). To test the robustness of SUMOsp, we have used both leave-one-out validation and 5-fold cross validation. Both methods show levels of performance similar to the above results. The Ac, Sn, Sp and CC of the consensus motif ψ-K-X-E are 97.21%, 74.48%, 98.16% and 0.6689, respectively. Compared with the consensus motif, SUMOsp at threshold 18 therefore provides higher sensitivity (83.68% versus 74.48%) with a moderately lower specificity (93.08% versus 98.16%). Experimentalists may want to generate more reliable in silico predictions by integrating the above methods with phylogenetic conservation and structural analysis. Detailed information about the validations can be found in Supplementary Table S2.
Table 1

Prediction performance of SUMOsp and SUMOplot

Predictor   Threshold   Ac (%)   Sn (%)   Sp (%)   CC
SUMOsp      18          92.71    83.68    93.08    0.5012
SUMOsp      4           80.43    89.12    80.07    0.3232
SUMOplot    high        89.94    79.50    93.31    0.4825
SUMOplot    all         80.45    88.70    80.07    0.3211
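The leave-one-out and 5-fold cross validation mentioned in the Results follow a generic pattern that can be sketched as below. How the authors actually partitioned their data (for example, whether folds were stratified) is not stated in the text, so the random shuffling, the `seed` parameter and the function name `k_fold_indices` are assumptions of this sketch.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


# 5-fold split of 239 items (the number of curated sites in the text).
five_fold = list(k_fold_indices(239, k=5))
print([len(test) for _, test in five_fold])  # [48, 48, 48, 48, 47]

# Leave-one-out is the special case k = n_items.
leave_one_out = list(k_fold_indices(239, k=239))
```

In each round, the predictor would be rebuilt from the training indices only and scored on the held-out indices.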
To illustrate the threshold-independent performance of SUMOsp, we provide the receiver operating characteristic (ROC) curves of self validation, leave-one-out validation and 5-fold cross validation (refer to Supplementary Figure S1). Both the ROC curves and the areas under the ROC curves (AUC) suggest that SUMOsp is a robust prediction system. For non-canonical real sumoylation sites, SUMOsp also provides a satisfactory prediction performance [as in Supplementary Table S1 (B)].
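For readers unfamiliar with ROC analysis, the following minimal sketch shows how a ROC curve and its AUC can be obtained by sweeping a cut-off over prediction scores. It is a generic illustration with hypothetical scores and labels, not a reproduction of Supplementary Figure S1, and it assumes that at least one positive and one negative example are present.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping the cut-off over all scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scores; label 1 marks a known site, 0 any other lysine.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
print(round(toy_auc, 4))  # 0.8333
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of sites from non-sites.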

USE OF SUMOSP WEB SERVICE

The SUMOsp web server has been designed to be easy to use. A user can visit SUMOsp at (Figure 1), enter the protein sequences in either raw or FASTA format into the text box, and run the program by pressing the ‘Submit’ button. The prediction results should be regarded as potential sites until they are experimentally validated. By pressing the word here in the sentence ‘Download the TAB-deliminated data file from here’, a user can obtain the prediction results as tab-delimited plain text for further analysis.
Figure 1

The prediction page of SUMOsp web server.

DISCUSSION AND CONCLUSION

The systematic identification of the sumoylation proteome represents a great challenge. Although experimental verification is essential, computational methods can serve as a complementary and powerful tool to help accelerate sumoylation research. Previously, we performed a genome-wide analysis of the SUMO substrates in the human nucleus, based on pattern recognition and evolutionary conservation (5). An in silico predictor for sumoylation sites is still urgently needed. In this work, we have developed a novel computational method and computer program, SUMOsp, for the highly specific prediction of sumoylation sites. Based on its prediction performance, we believe that SUMOsp could serve as a powerful and complementary tool for in vivo or in vitro sumoylation site identification, and that the combination of computational analyses with experimental verification could greatly speed up our understanding of the mechanisms and dynamics of sumoylation.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
  22 in total

1.  Comparison of the predicted and observed secondary structure of T4 phage lysozyme.

Authors:  B W Matthews
Journal:  Biochim Biophys Acta       Date:  1975-10-20

Review 2.  Protein modification by SUMO.

Authors:  Erica S Johnson
Journal:  Annu Rev Biochem       Date:  2004       Impact factor: 23.643

3.  Sumoylation of heterogeneous nuclear ribonucleoproteins, zinc finger proteins, and nuclear pore complex proteins: a proteomic analysis.

Authors:  Tianwei Li; Evgenij Evdokimov; Rong-Fong Shen; Chien-Chung Chao; Ephrem Tekle; Tao Wang; Earl R Stadtman; David C H Yang; P Boon Chock
Journal:  Proc Natl Acad Sci U S A       Date:  2004-05-25       Impact factor: 11.205

Review 4.  SUMO and transcriptional regulation.

Authors:  David W H Girdwood; Michael H Tatham; Ronald T Hay
Journal:  Semin Cell Dev Biol       Date:  2004-04       Impact factor: 7.727

5.  Prediction of phosphorylation sites using SVMs.

Authors:  Jong Hun Kim; Juyoung Lee; Bermseok Oh; Kuchan Kimm; Insong Koh
Journal:  Bioinformatics       Date:  2004-07-01       Impact factor: 6.937

6.  Sumo1 conjugates mitochondrial substrates and participates in mitochondrial fission.

Authors:  Zdena Harder; Rodolfo Zunino; Heidi McBride
Journal:  Curr Biol       Date:  2004-02-17       Impact factor: 10.834

7.  GPS: a novel group-based phosphorylation predicting and scoring method.

Authors:  Feng-Feng Zhou; Yu Xue; Guo-Liang Chen; Xuebiao Yao
Journal:  Biochem Biophys Res Commun       Date:  2004-12-24       Impact factor: 3.575

8.  The small ubiquitin-like modifier (SUMO) protein modification system in Arabidopsis. Accumulation of SUMO1 and -2 conjugates is increased by stress.

Authors:  Jasmina Kurepa; Joseph M Walker; Jan Smalle; Mark M Gosink; Seth J Davis; Tessa L Durham; Dong-Yul Sung; Richard D Vierstra
Journal:  J Biol Chem       Date:  2002-12-12       Impact factor: 5.157

9.  Proper SUMO-1 conjugation is essential to DJ-1 to exert its full activities.

Authors:  Y Shinbo; T Niki; T Taira; H Ooe; K Takahashi-Niki; C Maita; C Seino; S M M Iguchi-Ariga; H Ariga
Journal:  Cell Death Differ       Date:  2006-01       Impact factor: 15.828

10.  Regulation of Smad4 sumoylation and transforming growth factor-beta signaling by protein inhibitor of activated STAT1.

Authors:  Min Liang; Frauke Melchior; Xin-Hua Feng; Xia Lin
Journal:  J Biol Chem       Date:  2004-03-17       Impact factor: 5.157
