Hao-Dong Xu1, Shao-Ping Shi2, Xiang Chen1, Jian-Ding Qiu3.
Abstract
Protein function often depends on a small set of essential sites rather than on every site being indispensable. Small ubiquitin-related modifier (SUMO) conjugation, or sumoylation, is a highly dynamic and reversible process whose outcomes are extremely diverse, ranging from changes in localization to altered activity and, in some cases, altered stability of the modified protein; it has proven especially important in cellular biology. Motivated by the significance of SUMO conjugation in biological processes, we report the first exploratory assessment of whether sumoylation-related genetic variability affects protein function and the occurrence of SUMO-related diseases. We define SUMOAMVRs as sumoylation-related amino acid variations that affect sumoylation sites or the enzymes involved in the conjugation process, and we categorize four types of potential SUMOAMVRs. With our system, we find that 17.13% of amino acid variations are potential SUMOAMVRs and that 4.83% of disease mutations could lead to a SUMOAMVR. More interestingly, statistical analysis shows that amino acid variations that directly create new potential lysine sumoylation sites are more likely to cause disease. We anticipate that our method can provide instructive guidance for identifying the mechanisms of genetic diseases.
Year: 2015 PMID: 26154679 PMCID: PMC4495600 DOI: 10.1038/srep10900
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Comparison of predictive performance of SumoPred with other predictors.
| Method | Stringency | Ac (%) | Sn (%) | Sp (%) | MCC (%) |
|---|---|---|---|---|---|
| GPS-SUMO | Low | 66.67% | 78.79% | 54.55% | 34.36% |
| GPS-SUMO | Medium | 77.23% | 72.73% | 81.82% | 54.73% |
| GPS-SUMO | High | 72.73% | 51.52% | | 50.20% |
| SeeSUMO | Low | 69.70% | | 48.48% | 43.50% |
| SeeSUMO | Medium | 63.14% | 75.76% | 51.52% | 28.11% |
| SeeSUMO | High | 51.53% | 9.09% | | 5.73% |
| SumoPred | — | 87.88% | 81.82% | | |
a GPS-SUMO, SeeSUMO and SumoPred were tested using an entirely independent dataset.
b The highest values for each threshold are indicated in bold.
Abbreviations: Ac, Accuracy; Sn, Sensitivity; Sp, Specificity; MCC, Matthews correlation coefficient.
Figure 1. The receiver operating characteristic (ROC) curves of SumoPred prediction on 10 training sets.
Figure 2. Schematic illustration of the four types of SUMOAMVRs: variations that replace an amino acid with a lysine residue to create a potential new lysine sumoylation site (Type I (+)), or replace a lysine to remove an original one (Type I (−)); variations adjacent to sumoylation sites that create (Type II (+)) or remove (Type II (−)) sumoylation sites; variations that may change the type of E3 ligase recognizing a sumoylation site without changing the site itself (Type III); and variations adjacent to sumoylation sites that change the type of PTM occurring there (Type IV).
Yellow amino acid residues are mutated residues, "sumo" denotes the small ubiquitin-related modifier, and a lysine (K) linked to a sumo indicates that this K can be sumoylated by an E3 ligase. E3a and E3b represent two different types of lysine E3 ligase.
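The Type I and Type II categories can be expressed as a small decision rule once sumoylation sites are known for both the reference and the variant sequence. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name, its arguments, and the ±10 residue definition of "adjacent" (chosen to mirror a 21-mer peptide) are all hypothetical. Types III and IV need E3 ligase or PTM-type information and are left out.

```python
def classify_sumoamvr(pos, ref, alt, sites_before, sites_after, window=10):
    """Classify one substitution as Type I/II (+/-), or return None.

    pos: 1-based position of the variation; ref/alt: one-letter residues.
    sites_before / sites_after: sets of lysine positions called sumoylated
    in the reference and in the variant sequence (e.g. by a predictor).
    window: assumed maximum distance for a variation to count as adjacent.
    """
    # Type I: the variation sits on the sumoylated lysine itself.
    if ref == "K" and alt != "K" and pos in sites_before:
        return "Type I (-)"
    if alt == "K" and ref != "K" and pos in sites_after:
        return "Type I (+)"

    def near(site):
        return site != pos and abs(site - pos) <= window

    # Type II: a nearby site appears or disappears because of the variation.
    gained = {s for s in sites_after - sites_before if near(s)}
    lost = {s for s in sites_before - sites_after if near(s)}
    if gained:
        return "Type II (+)"
    if lost:
        return "Type II (-)"
    return None


# Site sets below are simplified assumptions for illustration.
print(classify_sumoamvr(386, "K", "N", {386}, set()))   # lysine itself lost
print(classify_sumoamvr(776, "S", "A", {772}, set()))   # neighbouring site lost
```

If a variation both creates and removes nearby sites, this sketch reports only the gain; a fuller version would return both.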
Statistics for the three types of SUMOAMVR detected under different specificity options.
| Specificity | Type I (+) | Type I (−) | Type I (all) | Type II (+) | Type II (−) | Type II (all) | Type III |
|---|---|---|---|---|---|---|---|
| Default | 512 | 381 | 893 | 1364 | 1374 | 2738 | 8001 |
| 70% | 235 | 176 | 411 | 645 | 689 | 1334 | 3552 |
| 80% | 135 | 102 | 237 | 413 | 431 | 844 | 2053 |
| 90% | 34 | 45 | 79 | 194 | 203 | 397 | 760 |
| 95% | 9 | 12 | 21 | 75 | 88 | 163 | 220 |
a We used SumoPred to predict sumoylation sites after taking the amino acid variations in human proteins into account.
Examples of type I SUMOAMVRs, in which an amino acid is replaced by a lysine residue, or vice versa, creating a potential new lysine sumoylation site (Type I (+)) or removing an original one (Type I (−)).
| Gene name (UniProtKB ID) | Variation site | Sumoylation site | Local peptide sequence | Modification source | Variation source | Status | MIM |
|---|---|---|---|---|---|---|---|
| Type I (−) SUMOAMVRs | |||||||
| TP53 (P04637) | Lys386Asn | K386 | STSRHKKLMF | Phosphositeplus | IntOGen | DOID:1612-breast cancer | 114480 |
| NPM1 (P06748) | Lys263Arg | K263 | ASIEKGGSLP | Phosphositeplus | ICGC; IntOGen | DOID:9119-acute myeloid leukemia; DOID:74-hematopoietic system disease | 601626 |
| CBS (P35520) | Lys211Arg | K211 | ESHVGVAWRL | Phosphositeplus | dbSNP | — | — |
| NR1H3 (Q13133) | Lys434Asn | K434 | QVFALRLQDK | Phosphositeplus | dbSNP | — | — |
| ZNF221 (Q9UK13) | Lys423Ile | K423 | QQVHSGQKSF | Phosphositeplus | TCGA | — | — |
| FOSL2 (P15408) | Lys222Asn | K222 | GGGSVGAVVV | Phosphositeplus | TCGA | — | — |
| NR1H3 (Q13133) | Lys328Gln | K328 | DFSYNREDFA | Phosphositeplus | dbSNP | — | — |
| INO80B (Q9C086) | Lys168Asn | K168 | KGELDDNGDL | HPRD9.0 | nextProt | — | — |
| HIF1A (Q16665) | Lys477T | K477 | DPALNQEVAL | Phosphositeplus | TCGA | — | — |
| PML;MYL (P29590) | Lys487Asn | K487 | RKCSQTQCPR | Phosphositeplus | ICGC | — | — |
| SALL1 (Q9NSC2) | Lys1086Asn | K1086 | IPANSLSSLI | HPRD9.0 | TCGA | — | — |
| IRF2 (P14316) | Lys166Ile | K166 | PEYAVLTSTI | UniProtKB | dbSNP | — | — |
| RSBN1 (Q5VWQ0) | Lys313Asn | K313 | GLNKESFRYL | HPRD9.0 | dbSNP | — | — |
| SREBF1 (P36956) | Lys123Gln | K123 | MPAFSPGPGI | HPRD9.0 | dbSNP | — | — |
| ARID5B (Q14865) | Lys629Asn | K629 | MADYIANCTV | HPRD9.0 | ICGC;TCGA; COSMIC; IntOGen | DOID:2871-endometrial carcinoma; DOID:9460-uterine corpus cancer | 608089 |
| ACTB (P60709) | Lys68Gln | K68 | AQSKRGILTL | Phosphositeplus | dbSNP | — | — |
| SCIN (Q9Y6U3) | Lys299Gln | K299 | AAKQIFVWKG | Phosphositeplus | COSMIC | DOID:234-colon adenocarcinoma | — |
| ZBTB1 (Q9Y2K1) | Lys328Gln | K328 | RAAERKRIII | Phosphositeplus | nextProt | DOID:2526-prostate adenocarcinoma | 176807 |
| VHL (P40337) | Lys171Asn | K171 | RCLQVVRSLV | Phosphositeplus | nextProt; COSMIC | DOID:4465- papillary renal cell carcinoma | 605074 |
| PIAS4 (Q8N2W9) | Lys128Met | K128 | GLGRLPAKTL | UniProtKB | nextProt; dbSNP | — | — |
| HIST1H4A (P62805) | Lys13Ile | K13 | GRGKGGKGLG | Phosphositeplus | nextProt; COSMIC; IntOGen | DOID:1749-squamous cell carcinoma; DOID:8557-oropharynx cancer | — |
| HIST1H4A (P62805) | Lys13Asn | K13 | GRGKGGKGLG | Phosphositeplus | TCGA; IntOGen | DOID:8557-oropharynx cancer | — |
| SNCA (P37840) | Lys96Arg | K96 | GSIAAATGFV | Phosphositeplus | ICGC;TCGA | DOID:234-colon adenocarcinoma | — |
| IGF1R (P08069) | Lys1150Arg | K1150 | NCMVAEDFTV | Phosphositeplus | TCGA; COSMIC | — | — |
| BLM (P54132) | Lys347Asn | K347 | TSKDLLSKPE | Phosphositeplus | NCI-60panel | — | — |
| HDAC1 (Q13547) | Lys476Gln | K476 | KEEKPEAKGV | Phosphositeplus | NCI-60panel | — | — |
| PDE4D (Q08499) | Lys387Asn | K387 | TNSSIPRFGV | UniProtKB | TCGA | — | — |
| UBA2 (Q9UBT2) | Lys257Asn | K257 | TGYDPVKLFT | Phosphositeplus | TCGA | — | — |
| BLM (P54132) | Lys344Asn | K344 | VLSTSKDLLS | Phosphositeplus | dbSNP | — | — |
| MAPKAPK2 (P49137) | Lys353Arg | K353 | KEDKERWEDV | Phosphositeplus | nextProt; dbSNP | — | — |
| CHD7 (Q9P2D1) | Lys1196Asn | K1196 | MLRRLKEDVE | Phosphositeplus | ICGC;TCGA | DOID:2871-endometrial carcinoma | 608089 |
| XRCC4 (Q13426) | Lys210Arg | K210 | NAAQEREKDI | Phosphositeplus | dbSNP | — | — |
| USP25 (Q9UHP3) | Lys141Thr | K141 | RVLEASIAEN | Phosphositeplus | dbSNP | — | — |
| RLF (Q13129) | Lys1561Thr | K1561 | CMVQGCLSVV | HPRD9.0 | TCGA | — | — |
| TBX22 (Q9Y458) | Lys63Asn | K63 | GKSEPLEKQP | Phosphositeplus | nextProt; COSMIC; IntOGen | DOID:8557-oropharynx cancer | — |
| SUMO1 (P63165) | Lys16Asn | K16 | AKPSTEDLGD | Phosphositeplus | TCGA; COSMIC | — | — |
| KCNIP3 (Q9Y2W7) | Lys90Gln | K90 | DQLQAQTKFT | Phosphositeplus | TCGA | — | — |
| HIF1A (Q16665) | Lys391Arg | K391 | EDTSSLFDKL | Phosphositeplus | TCGA | — | — |
| RANBP2 (P49792) | Lys2725Gln | K2725 | EKKPTVEEKA | Phosphositeplus | TCGA | — | — |
| IRF2 (P14316) | Lys137Gln | K137 | TEKEDKVKHI | UniProtKB | COSMIC | DOID:684-hepatocellular carcinoma | 114550 |
| POLD3 (Q15054) | Lys433Gln | K433 | SVHRPPAMTV | Phosphositeplus | ICGC;TCGA; COSMIC; IntOGen | DOID:1324-lung cancer; DOID:3907-lung squamous cell carcinoma | 211980 608935 612593 614210 |
| ZNF462 (Q96JM2) | Lys2482Asn | K2482 | DEAIGIDFSL | HPRD9.0 | TCGA | — | — |
| MDM4 (O15151) | Lys254Asn | K254 | SVSEQLGVGI | Phosphositeplus | dbSNP | — | — |
| APP;A4 (P05067) | Lys670Asn | K670 | NIKTEEISEV | Phosphositeplus | nextProt | — | — |
| UBA2 (Q9UBT2) | Lys623Asn | K623 | KLDEKENLSA | Phosphositeplus | ICGC;TCGA | DOID:2871-endometrial carcinoma | 608089 |
| SUMO1 (P63165) | Lys7Gln | K7 | MSDQEA | Phosphositeplus | TCGA; IntOGen | DOID:1324-lung cancer | 211980 608935 612593 |
| HSF1 (Q00613) | Lys298Asn | K298 | PLSSSPLVRV | UniProtKB | dbSNP | — | — |
| USP25 (Q9UHP3) | Lys99Gln | K99 | TNVIDLTGDD | UniProtKB | dbSNP | — | — |
| NF2 (P35240) | Lys76Arg | K76 | YTIKDTVAWL | Experimental verification | — | Disrupts merlin cortical cytoskeleton residency and attenuates its stability | — |
| AGO2 (Q9UKV8) | Lys402Arg | K402 | PYVREFGIMV | Experimental verification | — | Regulates its stability | — |
| RARA (P10276) | Lys399Arg | K399 | AKGAERVITL | UniProtKB | dbSNP | Regulates its subcellular localization and transcriptional activity | — |
| ATF7 (P17544) | Lys118Arg | K118 | SLPSTPDIKI | UniProtKB | — | Impacts transcriptional and promoter-binding activities | — |
| RARA (P10276) | Lys166Arg | K166 | ESVRNDRNKK | UniProtKB | dbSNP | Regulates its subcellular localization and transcriptional activity | — |
| RARA (P10276) | Lys171Arg | K171 | DRNKKKKEVP | UniProtKB | dbSNP | Regulates its subcellular localization and transcriptional activity | — |
| ZIC3 (O60481) | Lys248Arg | K248 | AFFRYMRQPI | Experimental verification | — | Regulates nuclear localization and function of zinc finger transcription factor ZIC3 | — |
| Type I (+) SUMOAMVRs | |||||||
| CASP8AP2 (Q9UKL3) | Gln1792Lys | K1792 | ANRPLKCIVE | Experimental verification | — | Regulates proteasome-dependent degradation of FLASH/Casp8AP2 | — |
| BHLHE41 (Q9C0J8) | Leu240Lys | K240 | DFLRCHEERI | Experimental verification | — | — | — |
| BHLHE42 (Q9C0J8) | Thr255Lys | K255 | ADVKCVDWHP | Experimental verification | — | — | — |
| NPM1 (P06748) | Arg101Lys | K101 | GFEITPPVVL | Experimental verification | — | Involved in the survival of the parasite | — |
a Location and amino acid changes of variations in the proteins.
b Peptide sequences with 21-mer amino acids. The amino acids in the eleventh position with bold style and underline are sumoylated residues.
c The sources of the sumoylation sites.
d The sources of the mutations.
e The effects of the variations.
f Mendelian Inheritance in Man of the related disease.
Figure 3. A two-sample logo of the compositional biases around the sumoylation sites compared to the non-sumoylation sites.
This logo was prepared using the web server http://www.twosamplelogo.org/, and only residues significantly enriched or depleted around sumoylation sites (t-test, P-value <0.05) are shown.
Several examples of type II SUMOAMVRs, in which the amino acid variation is located not at the sumoylation position itself but at an adjacent position, and creates (Type II (+)) or removes (Type II (−)) the sumoylation site.
| Gene name (UniProtKB ID) | Variation site | Sumoylation site | Local peptide sequence | Modification source | Variation source | Status | MIM |
|---|---|---|---|---|---|---|---|
| Type II (−) SUMOAMVRs | |||||||
| ATXN1 (P54253) | Ser776Ala | K772 | IEPSKPAATR | UniProtKB | dbSNP | DOID:1441- Spinocerebellar ataxia 1 | 164400 |
| HSF1 (Q00613) | Ser303Ala | K298 | PLSSSPLVRV | UniProtKB | dbSNP | – | – |
| LMNA (P02545) | Gln203Gly | K201 | VDAENRLQTM | UniProtKB | dbSNP | DOID:11726- Emery-Dreifuss muscular dystrophy 3, autosomal recessive | 181350 300696 310300 612998 612999 614302 |
| USP25 (Q9UHP3) | Ile92Ala | K99 | TNVIDLTGDD | UniProtKB | dbSNP | – | – |
| USP39 (Q53GS9) | Lys6Arg | K16 | | Experimental verification | – | DOID:10283- prostate cancer | 176807 300147 300704 601518 602759 |
a Peptide sequences with 21-mer amino acids. The amino acids marked only with the bold style are variation sites, and those with bold style and underline are sumoylated residues.
Several examples of type III SUMOAMVRs, which are caused by a change in the type of E3 ligase involved rather than in the sumoylation site itself, regardless of the position of the variation.
| Gene name (UniProtKB ID) | Variation site | Sumoylation site | Local peptide sequence | Modification source | Variation source | Status | MIM |
|---|---|---|---|---|---|---|---|
| Type III SUMOAMVRs | |||||||
| CBX4 (O00257) | Thr497Ala | K494 | AGEPPSSLQV | UniProtKB | dbSNP | Small decrease in ZNF131 sumoylation | – |
| TP53 (P04637) | Phe385Ala | K386 | STSRHKKLM | UniProtKB | dbSNP | Reduced SUMO1 conjugation; DOID:1612-breast cancer | 114480 |
Statistical analysis of the different types of SUMOAMVR among disease-associated and polymorphic mutations affecting lysine sumoylation, using the 70% specificity option.
| Type | Disease: num | Disease: per. (%) | Polymorphism: num | Polymorphism: per. (%) | Unknown disease: num | Unknown disease: per. (%) | P-value |
|---|---|---|---|---|---|---|---|
| Type I (+) | 69 | 0.28 | 138 | 0.36 | 28 | 0.43 | 0.09 |
| Type I (−) | 38 | 0.16 | 119 | 0.31 | 19 | 0.29 | 1.06E-4 |
| Type II (+) | 157 | 0.64 | 419 | 1.11 | 47 | 0.72 | 2.09E-9 |
| Type II (−) | 162 | 0.66 | 452 | 1.19 | 68 | 1.05 | 2.35E-9 |
| Type III | 752 | 3.09 | 2501 | 6.61 | 293 | 4.51 | 0.00 |
| All | 1178 | 4.83 | 3629 | 9.58 | 455 | 7.00 | 0.00 |
a The number of different types of SUMOAMVR.
b The proportion of different types of SUMOAMVR.
c The P-value of Pearson’s Chi-square test.
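A Pearson chi-square comparison of two groups, such as disease mutations versus polymorphisms for one SUMOAMVR type, reduces to a 2×2 contingency table. The sketch below is a generic pure-Python version without continuity correction; whether the authors applied a correction is not stated, and the group totals in the usage lines are hypothetical placeholders, since the table reports only counts and rounded percentages.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Type I (-) counts from the table; the group totals are assumed placeholders.
disease_hits, disease_total = 38, 24_000
poly_hits, poly_total = 119, 38_000
stat, p = chi_square_2x2(
    disease_hits, disease_total - disease_hits,
    poly_hits, poly_total - poly_hits,
)
print(f"chi2 = {stat:.2f}, p = {p:.3g}")
```

Because the totals are placeholders, the printed p-value is illustrative and should not be expected to match the table.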
Figure 4. Statistics of pathway terms for disease-related sumoylation substrates against the background of normal sumoylation substrates.
The blue bars represent the number of substrates for each pathway term, and the red bars represent the percentage of all substrates that fall under each pathway term. Statistical significance (P-value) gradually increased from top to bottom.
Statistical comparison of the GO terms of the disease-related and normal sumoylation substrates.
| Description of GO term | Diseased sumoylation: Num | Diseased sumoylation: Per. (%) | Normal sumoylation: Num | Normal sumoylation: Per. (%) | P-value | Over/Under |
|---|---|---|---|---|---|---|
| Regulation of transcription from RNA polymerase II promoter (GO:0006357) | 49 | 7.62 | 89 | 26.89 | 2.51E-15 | Over |
| Positive regulation of transcription (GO:0045941) | 47 | 7.31 | 83 | 25.08 | 7.45E-14 | Over |
| Positive regulation of gene expression (GO:0010628) | 50 | 7.78 | 85 | 25.68 | 1.68E-13 | Over |
| Negative regulation of nucleobase, nucleoside, and nucleic acid metabolic process (GO:0045934) | 34 | 5.29 | 69 | 20.85 | 5.41E-13 | Over |
| Positive regulation of nucleobase, nucleoside, and nucleic acid metabolic process (GO:0045935) | 52 | 8.09 | 85 | 25.68 | 5.58E-13 | Over |
| Intracellular organelle lumen (GO:0070013) | 105 | 16.33 | 135 | 40.79 | 2.18E-16 | Over |
| Organelle lumen (GO:0043233) | 110 | 17.11 | 136 | 41.09 | 1.44E-15 | Over |
| Membrane-enclosed lumen (GO:0031974) | 113 | 17.57 | 136 | 41.09 | 6.76E-15 | Over |
| Cell projection (GO:0042995) | 64 | 9.95 | 19 | 5.74 | 2.88E-02 | Over |
| Axon (GO:0030424) | 21 | 3.27 | 7 | 2.11 | 0.418431 | Under |
| Sequence-specific DNA binding (GO:0043565) | 44 | 6.84 | 79 | 23.87 | 2.39E-05 | Over |
| Transcription activator activity (GO:0016563) | 28 | 4.35 | 63 | 19.03 | 2.39E-05 | Over |
| RNA polymerase II transcription factor activity (GO:0003702) | 17 | 2.64 | 29 | 8.76 | 4.28E-04 | Over |
| Structure-specific DNA binding (GO:0043566) | 17 | 2.64 | 25 | 7.55 | 6.04E-04 | Over |
| Double-stranded DNA binding (GO:0003690) | 12 | 1.87 | 20 | 6.04 | 9.98E-04 | Over |
a The number of diseased sumoylation substrates in different GO terms.
b The proportion of diseased sumoylation substrates in different GO terms.
c The P-value of Fisher exact test (two-sided).
d Over- or under-representation of diseased sumoylation compared with normal sumoylation in different GO terms.
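A two-sided Fisher exact test for one GO term compares substrates annotated and not annotated with the term across the two groups. The sketch below is a generic pure-Python version that sums the hypergeometric probabilities of all tables no more likely than the observed one; it is not the authors' code. The group totals in the usage lines (643 diseased, 331 normal) are an inference from the reported counts and percentages, not figures stated in the source.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# GO:0006357 row: 49 diseased and 89 normal substrates carry the term.
# Totals are assumed (inferred from 49/7.62% and 89/26.89%).
diseased_total, normal_total = 643, 331
p = fisher_exact_two_sided(49, diseased_total - 49, 89, normal_total - 89)
print(f"two-sided p = {p:.3g}")
```

Exact integer binomials keep the arithmetic stable for tables of this size; for much larger counts a log-gamma formulation would be the usual alternative.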