Gil-Mi Ryu, Pamela Song, Kyu-Won Kim, Kyung-Soo Oh, Keun-Joon Park, Jong Hun Kim.
Abstract
We define phosphovariants as genetic variations that change phosphorylation sites or their interacting kinases. Considering the essential role of phosphorylation in protein functions, it is highly likely that phosphovariants change protein functions. Therefore, a comparison of phosphovariants between individuals or between species can give clues about phenotypic differences. We categorized phosphovariants into three subtypes and developed a system that predicts them. Our method can be used to screen important polymorphisms and help to identify the mechanisms of genetic diseases.
Year: 2009 PMID: 19139070 PMCID: PMC2651802 DOI: 10.1093/nar/gkn1008
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Schematic illustration of phosphovariants according to their types.
The general performance test: the number of each data set
| Central residue | Data I (+) | Data I (−) | Data II (+) | Data II (−) |
|---|---|---|---|---|
| Ser/Thr | 1860 | 926 | 3017 | 315 |
| Tyr | 38 | 95 | 359 | 11 |
aDifferent data sets compiled by mass spectrometry. See the text for a detailed explanation of data sets I and II.
bThe type of amino acid located at the center of each peptide. We annotated a peptide as (+) if the Ser/Thr or Tyr at its center is phosphorylated, and as (−) if it is not.
Examples of type I phosphovariants
| Gene name (Swiss-Prot ID) | Variation site | Phosphorylation site | Local peptide sequence | Effect | Reference(s) for variation | Reference(s) for phosphorylation site |
|---|---|---|---|---|---|---|
| EDNRB (P24530) | S305N (VAR_003472) | S305 | CEMLRKK S GMQIALN | Hirschsprung disease type 2 | 8852659 | 14636059 |
| FANCA (O15360) | S858R (VAR_017498) | S858 | QSRDTLC S CLSPGLI | Fanconi anemia | 10094191 11091222 | 17924679 |
| KCNJ1 (P48048) | S219R (VAR_019726) | S219 | RVANLRK S LLIGSHI | Bartter syndrome type 2 | 8841184 | 8621594 |
| L1CAM (P32004) | S1194L (VAR_003947) | S1194 | AFGSSQP S LNGDIKP | Hydrocephalus due to stenosis of the aqueduct of Sylvius; mental retardation, aphasia, shuffling gait and adducted thumbs syndrome | 8556302 7881431 | 17081983 |
| MAPT (P10636) | S622N (VAR_010350) | S622 | KHVPGGG S VQIVYKP | Frontotemporal dementia and parkinsonism linked to chromosome 17 | 10208578 | 7706316 |
| MAPT (P10636) | S637F (VAR_019665) | S637 | VDLSKVT S KCGSLGN | Pick disease | 11891833 | 11104762 9199504 |
| MAPT (P10636) | S669L (VAR_019667) | S669 | DFKDRVQ S KIGSLDN | Fatal respiratory hypoventilation | 14595660 | 11104762 |
| MITF (O75030) | S405P (VAR_010302) | S405 | QARAHGL S LIPSTGL | Waardenburg syndrome type IIa | 8589691 | 10587587 |
| NFKBIA (P25963) | S32I (VAR_034871) | S32 | LLDDRHD S GLDSMKD | Autosomal dominant anhidrotic ectodermal dysplasia with immunodeficiency | 14523047 | 10882136 9721103 8601309 16319058 10723127 9214631 |
| PER2 (O15055) | S662G (VAR_029080) | S662 | ALPGKAE S VASLTSQ | Familial advanced sleep-phase syndrome | 11232563 | 11232563 |
| PTPN11 (Q06124) | Y62D (VAR_015605) | Y62 | KIQNTGD Y YDLYGGE | Patients with Noonan syndrome 1 manifesting juvenile myelomonocytic leukemia | 11992261 12325025 12960218 12717436 | 15951569 15592455 |
| RAF1 (P04049) | S259F (VAR_037809) | S259 | SQRQRST S TPNVHMV | Noonan syndrome type 5 | 17603483 | 8349614 11997508 11971957 10576742 |
| RAF1 (P04049) | T491R (VAR_037819) | T491 | IGDFGLA T VKSRWSG | Noonan syndrome type 5 | 17603483 | 11447113 |
| RAF1 (P04049) | T491I (VAR_037818) | T491 | IGDFGLA T VKSRWSG | Noonan syndrome type 5 | 17603483 | 11447113 |
| RPS6KA3 (P51812) | S227A (VAR_006195) | S227 | DHEKKAY S FCGTVEY | Coffin–Lowry syndrome | 8955270 | 17192257 |
| STAT3 (P40763) | Y657C (VAR_037381) | Y657 | FAEIIMG Y KIMDATN | Hyperimmunoglobulin E recurrent infection syndrome autosomal dominant | 17881745 | 15037656 |
| TGFBR2 (P37173) | Y336N (VAR_022352) | Y336 | AKGNLQE Y LTRHVIS | Loeys–Dietz aortic aneurysm syndrome | 15731757 | 9169454 |
| TNNI3 (P19429) | S166F (VAR_029454) | S166 | LGARAKE S LDLRAHL | Hypertrophic cardiomyopathy | 12974739 | 11121119 |
| TSC1 (Q92574) | T417I (VAR_009403) | T417 | SLPQATV T PPRKEER | Tuberous sclerosis complex, could be a polymorphism | 10570911 10607950 | 14551205 |
| CDH1 (P12830) | S838G (VAR_001322) | S838 | LVFDYEG S GSEAASL | Ovarian cancer | 8075649 | 10671552 |
| CTNNB1 (P35222) | S23R (VAR_017612) | S23 | PDRKAAV S HWQQQSY | Hepatocellular carcinoma, no effect | 10435629 12027456 | 12027456 |
| CTNNB1 (P35222) | S33F (VAR_017617) | S33 | QQQSYLD S GIHSGAT | Pilomatrixoma, medulloblastoma and hepatocellular carcinoma | 10666372 10435629 10192393 | 12000790 12114015 11818547 |
| CTNNB1 (P35222) | S33L (VAR_017618) | S33 | QQQSYLD S GIHSGAT | Hepatocellular carcinoma | 10435629 | 12000790 12114015 11818547 |
| CTNNB1 (P35222) | S37A (VAR_017624) | S37 | YLDSGIH S GATTTAP | Medulloblastoma, hepatocellular carcinoma | 12027456 10435629 10666372 | 12000790 12114015 11818547 |
| CTNNB1 (P35222) | S37C (VAR_017625) | S37 | YLDSGIH S GATTTAP | Pilomatrixoma, hepatoblastoma | 9927029 10192393 | 12000790 12114015 11818547 |
| CTNNB1 (P35222) | S37F (VAR_017626) | S37 | YLDSGIH S GATTTAP | Pilomatrixoma | 10192393 | 12000790 12114015 11818547 |
| CTNNB1 (P35222) | T41A (VAR_017629) | T41 | GIHSGAT T TAPSLSG | Hepatoblastoma and hepatocellular carcinoma, also in a desmoid tumor | 12051714 10398436 9927029 12027456 10655994 10435629 | 12051714 12114015 11818547 12000790 |
| CTNNB1 (P35222) | T41I (VAR_017630) | T41 | GIHSGAT T TAPSLSG | Pilomatrixoma and hepatocellular carcinoma | 10192393 10435629 | 12051714 12114015 11818547 12000790 |
| CTNNB1 (P35222) | S45F (VAR_017631) | S45 | GATTTAP S LSGKGNP | Hepatocellular carcinoma | 10435629 | 12051714 12000790 11955436 |
| CTNNB1 (P35222) | S45P (VAR_017632) | S45 | GATTTAP S LSGKGNP | Hepatocellular carcinoma | 10435629 | 12051714 12000790 11955436 |
| FAM10A4 (Q8IZP2) | S71L (VAR_023644) | S71 | DLKADEP S SEESDLE | B-cell leukemia, multiple myeloma, and prostate cancer | 12079276 | 17081983 |
| MET (P08581) | Y1230C (VAR_006292) | Y1230 | FGLARDM Y DKEYYSV | Hereditary papillary renal carcinoma | 9140397 | 12475979 |
| MET (P08581) | Y1230H (VAR_006293) | Y1230 | FGLARDM Y DKEYYSV | Hereditary papillary renal carcinoma | 9140397 | 12475979 |
| NME1 (P15531) | S120G (VAR_004625) | S120 | GRNIIHG S DSVESAE | Neuroblastoma | 8047138 | 8810265 |
| RB1 (P06400) | S567L (VAR_005579) | S567 | SLAWLSD S PLFDLIK | Retinoblastoma | 10671068 2594029 | 10207050 |
| TP53 (P04637) | T155A (VAR_005901) | T155 | DSTPPPG T RVRAMAI | Esophageal cancer | 1868473 | 12628923 |
| BARD1 (Q99728) | S186G (VAR_038371) | S186 | SYEFVSP S PPADVSE | Polymorphism (rs16852741) | 15855157 | |
| C10orf11 (Q9H2I8) | S153F (VAR_033686) | S153 | SSEDVAS S PERHYTP | Polymorphism (rs35349706) | 16964243 | |
| CTNND1 (O60716) | Y217C (VAR_020929) | Y217 | PDGYSRH Y EDGYPGG | Polymorphism (rs11570194) | 15592455 16212419 | |
| CTPS (P17812) | S571I (VAR_027055) | S571 | RDTYSDR S GSSSPDS | Polymorphism (rs17856308) | 15489334 | 16097034 17081983 |
| HIF1A (Q16665) | T796A (VAR_015854) | T796 | ESGLPQL T SYDCEVN | Polymorphism (rs1802821) | 17382325 | |
| INSR (P06213) | Y1361C (VAR_015933) | Y1361 | SYEEHIP Y THMNGGK | Polymorphism (rs13306449) | 7657032 | 11401470 |
| KRT36 (O76013) | T315M (VAR_020306) | T315 | EIIELRR T VNALEIE | Polymorphism (rs2301354) | 17081983 | |
| MYH15 (Q9Y2K3) | T1125A (VAR_030238) | T1125 | KTVKELQ T QIKDLKE | Polymorphism (rs3900940) | 17081983 | |
| PDLIM5 (Q96HC4) | S136F (VAR_023779) | S136 | PRPFGSV S SPKVTSI | Polymorphism (rs2452600) | 17287340 | |
| PNN (Q9H307) | S671G (VAR_023368) | S671 | HKSSKGG S SRDTKGS | Polymorphism (rs13021) | 10095061 | 17287340 |
| SUB1 (P53999) | S11G (VAR_032870) | S11 | SKELVSS S SSGSDSD | Polymorphism (rs17850527) | 15489334 | 17081983 16689930 |
| SRRM2 (Q9UQ35) | S883C (VAR_027260) | S883 | SPDPELK S RTPSRHS | Polymorphism (rs17136053) | 17287340 | |
| TP53 (P04637) | S366A (VAR_022317) | S366 | PGGSRAH S SHLKSKK | Polymorphism | 9183006 | |
| DDX27 (Q96GQ7) | G766S | S766 | ALKQYRA G PSFEERK | Unknown | 16565220 | 16565220 |
aLocations and amino acid changes of the variations in the proteins.
bPeptide sequences of 15 amino acids. The amino acid at the eighth position is the phosphorylated residue.
cThe meanings or consequences of the variations, taken from the Swiss-Prot feature tables. If a polymorphism is registered in dbSNP, its dbSNP ID is given in parentheses.
dPubMed IDs of the references for the variations.
ePubMed IDs of the references for the phosphorylation sites.
Protein names which are abbreviated by their gene names: epithelial cadherin (precursor), CDH1; catenin β-1, CTNNB1; probable ATP-dependent RNA helicase DDX27, DDX27; endothelin B receptor (precursor), EDNRB; protein FAM10A4, FAM10A4; Fanconi anemia group A protein, FANCA; ATP-sensitive inward rectifier potassium channel 1, KCNJ1; keratin, type I cuticular Ha6, KRT36; Neural cell adhesion molecule L1, L1CAM; microtubule-associated protein tau, MAPT; hepatocyte growth factor receptor (precursor), MET; microphthalmia-associated transcription factor, MITF; NF-κ-B inhibitor α, NFKBIA; nucleoside diphosphate kinase A, NME1; period circadian protein homolog 2, PER2; tyrosine-protein phosphatase nonreceptor type 11, PTPN11; RAF proto-oncogene serine/threonine-protein kinase, RAF1; retinoblastoma-associated protein, RB1; ribosomal protein S6 kinase alpha-3, RPS6KA3; signal transducer and activator of transcription 3, STAT3; TGF-beta receptor type-2 (precursor), TGFBR2; cardiac troponin I, TNNI3; cellular tumor antigen p53, TP53; Hamartin, TSC1.
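The notation used in this table lends itself to a simple programmatic check. The following is a minimal sketch, not the authors' prediction system: it parses a variation string such as S305N, reads the eighth residue of a 15-mer written with spaces around the central residue, and labels a substitution that falls on the site itself. The function names and the "III candidate" label are assumptions made for illustration, and the reading of (−) as loss and (+) as gain of a phosphorylatable residue is inferred from the table headings.

```python
import re

# Residues that can carry a phosphate group in this context.
PHOSPHO_RESIDUES = {"S", "T", "Y"}


def parse_variation(variation):
    """Split a notation such as 'S305N' into (reference, position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variation)
    if match is None:
        raise ValueError(f"unrecognised variation: {variation!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def central_residue(peptide):
    """Return the eighth residue of a 15-mer such as 'CEMLRKK S GMQIALN'."""
    residues = peptide.replace(" ", "")
    if len(residues) != 15:
        raise ValueError("expected a 15-mer peptide")
    return residues[7]


def classify_site_variation(variation):
    """Label a substitution located at the (potential) phosphorylation site.

    Hypothetical labels: 'I(-)' when a phosphorylatable residue is lost,
    'I(+)' when one is gained, 'III candidate' when S/T/Y is swapped for a
    different S/T/Y residue.
    """
    ref, _pos, alt = parse_variation(variation)
    ref_ok = ref in PHOSPHO_RESIDUES
    alt_ok = alt in PHOSPHO_RESIDUES
    if ref_ok and not alt_ok:
        return "I(-)"
    if alt_ok and not ref_ok:
        return "I(+)"
    if ref_ok and alt_ok and ref != alt:
        return "III candidate"
    return "no change in phosphorylatable residue"


print(central_residue("CEMLRKK S GMQIALN"))
print(classify_site_variation("S305N"))
print(classify_site_variation("G766S"))
```

The sketch only inspects the substituted residue; whether the position is actually phosphorylated still has to come from experimental annotation or a predictor.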
Figure 2. Sequence logos of amino acid sequences near phosphorylation sites recognized by the CMGC kinase group. The horizontal axis represents sequential positions relative to the phosphorylation site. The vertical axis represents decreases in uncertainty. Each letter refers to an amino acid. As the frequency of an amino acid at a given position increases, its height increases.
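The "decrease in uncertainty" on the vertical axis can be illustrated with the usual sequence-logo calculation, in which the information at a position is log2(20) minus the Shannon entropy of the residues observed there, and each letter's height is that information scaled by the letter's frequency. The sketch below assumes this standard definition without small-sample correction; the toy peptides and function names are hypothetical and are not taken from the CMGC data.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=20):
    """Decrease in uncertainty (bits) for one alignment column."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_heights(peptides):
    """Per-position letter heights: information content times frequency."""
    length = len(peptides[0])
    heights = []
    for i in range(length):
        column = [p[i] for p in peptides]
        info = information_content(column)
        counts = Counter(column)
        heights.append({aa: info * n / len(column) for aa, n in counts.items()})
    return heights


# Hypothetical aligned peptides: site, +1 and +2 positions only.
toy_peptides = ["SPK", "SPR", "TPK", "TPR"]
for position, letters in enumerate(logo_heights(toy_peptides)):
    print(position, letters)
```

In this toy alignment the fully conserved proline column reaches the maximum height, which is how a strongly preferred residue stands out in a logo.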
Examples of type II(−) phosphovariants
| Gene name (Swiss-Prot ID) | Variation site (Swiss-Prot variant ID) | Removed phosphorylation site (related kinases) | Local peptide sequence | Effect | Reference(s) for variation | Reference(s) for phosphorylation site |
|---|---|---|---|---|---|---|
| DUT (P33316) | P100S (VAR_022314) | S99 | GPETPAI S | Polymorphism | 17081983 8631817 | |
| GJA1 (P17302) | P283L (VAR_014101) | S282 | TAPLSPM S | Polymorphism (rs2228974) | 8631994 9535905 | |
| PPARG (P37231) | P113Q (VAR_010724) | S112 | AIKVEPA S | Obesity and polymorphism (rs1800571) | 9753710 | 9030579 |
| RXRA (P19793) | P261L (VAR_014620) | S260 | NMGLNPS S | Polymorphism (rs2234960) | 12048211 | |
aThe variation sites are underlined and shown in bold.
bThe removal of these phosphorylation sites by the variations has not been confirmed experimentally. However, removal is highly likely because the nearby phosphorylation sites have been shown to be recognized by the CMGC group.
cThe removal of the phosphorylation site by the variation has been confirmed by an experiment (12).
If a variation substitutes another amino acid for the proline residue at position +1 relative to a phosphorylation site, the nearby phosphorylation site recognized by the CMGC kinase group can be eliminated, or the efficiency of phosphorylation at that site can be significantly decreased.
Protein names which are abbreviated by their gene names: deoxyuridine 5′-triphosphate nucleotidohydrolase, mitochondrial (precursor), DUT; gap junction α-1 protein, GJA1; peroxisome proliferator-activated receptor γ, PPARG; retinoic acid receptor RXR-α, RXRA.
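The +1 proline rule described in the footnote can be expressed as a small positional test. This is a minimal sketch under stated assumptions: it works only on the site and variation notations used in the table, the function name is hypothetical, and it does not check that the site is in fact recognized by a CMGC kinase, which the rule also requires.

```python
import re


def replaces_plus_one_proline(variation, site):
    """True if `variation` replaces a Pro at position +1 of a Ser/Thr `site`.

    variation: notation such as 'P113Q'
    site:      notation such as 'S112'
    """
    var = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variation)
    pho = re.fullmatch(r"([STY])(\d+)", site)
    if var is None or pho is None:
        raise ValueError("unrecognised variation or site notation")
    ref, var_pos, alt = var.group(1), int(var.group(2)), var.group(3)
    site_residue, site_pos = pho.group(1), int(pho.group(2))
    return (
        site_residue in "ST"          # proline-directed sites are Ser/Thr
        and var_pos == site_pos + 1   # variation sits at position +1
        and ref == "P"                # the original residue is proline
        and alt != "P"                # and it is replaced by something else
    )


# Rows taken from the table above.
rows = [("P100S", "S99"), ("P283L", "S282"), ("P113Q", "S112"), ("P261L", "S260")]
for variation, site in rows:
    print(variation, site, replaces_plus_one_proline(variation, site))
```

All four table rows satisfy the test, whereas a substitution at the site itself, such as S305N at S305, does not.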
Possible examples of type III phosphovariants
| Gene name (Swiss-Prot ID) | Variation site (Swiss-Prot variant ID) | Related phosphorylation site (kinase recognizing it) | Local peptide sequence | Effect | Reference(s) for variation | Reference(s) for phosphorylation site |
|---|---|---|---|---|---|---|
| PTPN1 (P18031) | P387L (VAR_022014) | S386 (CDC2 and CK2) | LRGAQAA S | Low glucose tolerance and polymorphism (rs16995309) | 15919835 | 9600099 8491187 |
| BRCA1 (P38398) | S1217Y (VAR_020695) | S1217 | ESSEENL S SEDEELP | Breast cancer and breast-ovarian cancer | 14722926 | 17081983 |
| CASP8 (Q14790) | S219T (VAR_025816) | S219 | PREQDSE S QTLDKVY | Polymorphism (rs35976359) | 17525332 | |
| CDK2 (P24941) | Y15S (VAR_016157) | Y15 | EKIGEGT Y GVVYKAR | Polymorphism (rs3087335) | 1396589 12912980 12972555 15144186 | |
| CTNNB1 (P35222) | S33Y (VAR_017619) | S33 | QQQSYLD S GIHSGAT | Pilomatrixoma | 12027456 10192393 | 12000790 12114015 11818547 |
| CTNNB1 (P35222) | S37Y (VAR_017627) | S37 | YLDSGIH S GATTTAP | Hepatocellular carcinoma | 10435629 | 12000790 12114015 11818547 |
| TEK (Q02763) | Y897S (VAR_008716) | Y897 | GACEHRG Y LYLAIEY | Dominantly inherited venous malformations | 10369874 | 11080633 |
| XRCC1 (P18887) | S485Y (VAR_014779) | S485 | QDNGAED S GDTEDEL | Polymorphism (rs2307184) | 15066279 | |
The reasons why these variations are classified as type III are detailed in the text.
Protein names which are abbreviated by their gene names: breast cancer type 1 susceptibility protein, BRCA1; caspase 8, CASP8; cyclin dependent kinase 2, CDK2; catenin β-1, CTNNB1; tyrosine-protein phosphatase nonreceptor type 1, PTPN1; angiopoietin-1 receptor (precursor), TEK; DNA repair protein XRCC1, XRCC1.
The general performance test of PredPhospho for two real data sets
| Specificity option | Group level: Data I Sn | Group level: Data I Sp | Group level: Data II Sn | Group level: Data II Sp | Specificity option | Family level: Data I Sn | Family level: Data I Sp | Family level: Data II Sn | Family level: Data II Sp |
|---|---|---|---|---|---|---|---|---|---|
| No | 79.40 | 60.62 | 75.47 | 61.04 | No | 95.36 | 29.29 | 93.60 | 30.98 |
| 95% | 73.24 | 72.09 | 65.76 | 72.39 | 95% | 92.68 | 42.80 | 88.95 | 46.63 |
| 97% | 53.37 | 80.22 | 48.31 | 79.45 | 97% | 88.46 | 56.81 | 83.50 | 57.36 |
| 98% | 43.11 | 89.23 | 38.92 | 89.26 | 98% | 82.03 | 66.80 | 75.44 | 62.88 |
| 99% | 23.39 | 95.79 | 20.05 | 96.62 | 99% | 73.18 | 72.67 | 66.73 | 72.39 |
aSpecificity options. For example, the ‘99%’ specificity option means that the cutoff value of each model is adjusted to give 99% specificity, and the ‘No’ specificity option means that each model uses its default cutoff value without adjustment (see supplementary material).
Abbreviations: sensitivity, Sn; specificity, Sp.
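The specificity options trade sensitivity for specificity by moving a score cutoff. The sketch below illustrates the idea on made-up scores and is not the PredPhospho implementation: it assumes that each model returns a numeric score, that a peptide is called phosphorylated when its score is at or above the cutoff, and that the cutoff is calibrated on negative peptides. All names and numbers are hypothetical.

```python
import math


def sensitivity_specificity(pos_scores, neg_scores, cutoff):
    """Sn = TP / positives, Sp = TN / negatives, calling score >= cutoff positive."""
    true_pos = sum(s >= cutoff for s in pos_scores)
    true_neg = sum(s < cutoff for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


def cutoff_for_specificity(neg_scores, target):
    """Lowest candidate cutoff whose specificity on neg_scores reaches `target`."""
    # Candidates are the observed negative scores; infinity guarantees a result.
    candidates = sorted(set(neg_scores)) + [math.inf]
    for cutoff in candidates:
        specificity = sum(s < cutoff for s in neg_scores) / len(neg_scores)
        if specificity >= target:
            return cutoff


# Hypothetical scores for non-phosphorylated (negative) and phosphorylated peptides.
negatives = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
positives = [5, 8, 9, 10, 12]

cutoff = cutoff_for_specificity(negatives, 0.9)
sn, sp = sensitivity_specificity(positives, negatives, cutoff)
print(cutoff, sn, sp)
```

Raising the target specificity pushes the cutoff up and lowers sensitivity, which is the pattern seen down the rows of the table.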
The general performance test of Scansite for two real data sets
| Stringency | Data I Sn | Data I Sp | Data II Sn | Data II Sp |
|---|---|---|---|---|
| Low | 84.47 | 52.60 | 83.92 | 57.06 |
| Medium | 48.63 | 85.21 | 43.81 | 87.73 |
| High | 16.39 | 96.77 | 13.60 | 95.71 |
aScansite has three levels of stringency: high, medium and low. High stringency involves low sensitivity and high specificity, whereas low stringency involves high sensitivity and low specificity.
The number of the phosphovariants
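| Method | Specificity / stringency | Type I(−) | Type I(+) | Type II(−) | Type II(+) | Type III |
|---|---|---|---|---|---|---|
| PredPhospho (group level) | No | 1729 | 2036 | 5455 | 4980 | 5299 |
| PredPhospho (group level) | 95% | 981 | 1195 | 1304 | 1070 | 986 |
| PredPhospho (group level) | 97% | 613 | 778 | 694 | 542 | 401 |
| PredPhospho (group level) | 98% | 314 | 409 | 329 | 213 | 151 |
| PredPhospho (group level) | 99% | 116 | 150 | 98 | 52 | 21 |
| PredPhospho (family level) | No | 3039 | 3717 | 3969 | 3926 | 23955 |
| PredPhospho (family level) | 95% | 2379 | 2910 | 2882 | 2840 | 8113 |
| PredPhospho (family level) | 97% | 1720 | 2104 | 1439 | 1483 | 2390 |
| PredPhospho (family level) | 98% | 1268 | 1551 | 783 | 862 | 1213 |
| PredPhospho (family level) | 99% | 946 | 1180 | 539 | 548 | 638 |
| Scansite | Low | 1581 | 1852 | 4255 | 3773 | 7697 |
| Scansite | Medium | 443 | 498 | 487 | 384 | 152 |
| Scansite | High | 83 | 128 | 35 | 28 | 1 |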
Predicted phosphovariants whose phosphorylation sites were confirmed in human or orthologous proteins
| Gene name (Swiss-Prot ID) | Site (Swiss-Prot variant ID) | Related phosphorylation site (predicted kinase recognizing it) | Local peptide sequence | Effect | Reference(s) for variation | Reference(s) for phosphorylation site |
|---|---|---|---|---|---|---|
| ACIN1 (Q9UKV3) | S478F (VAR_022033) | S478 (CDK, GSK, and MAPK) | VQLVGGL S PLSSPSD | Polymorphism (rs3751501) | 17242355 (mouse) | |
| MECP2 (P51608) | S229L (VAR_018200) | S229 (MAPK) | VKMPFQT S PGGKAEG | Polymorphism | 10767337 12872250 | 17046689 (rat) |
| PAH (P00439) | S16P (VAR_000869) | S16 (RSK and CAMKL) | PGLGRKL S DFGQETS | Phenylketonuria | 1679029 2246858 1301187 | 7387651 (rat) |
| GTSE1 (Q9NYZ3) | R506W (VAR_024154) | S504 (PKC) | PAPQSLL S A | Polymorphism (rs140054) | 10591208 | 16964243 |
| LIG1 (P18858) | P52L (VAR_020194) | S51 (CDK and MAPK) | GVVSESD S | Polymorphism (rs4987181) | 16964243 | |
Eight type I(−) phosphovariants (VAR_006195, VAR_023368, VAR_023779, VAR_023644, VAR_030238, VAR_020306, VAR_033686, and VAR_027260) and one type III phosphovariant (VAR_020695) were also predicted; their details are already given in Tables 1 and 3.
aThe prediction was done with the 99% specificity option of PredPhospho at the kinase family level.
bKinases that were predicted to recognize the original sequence.
cThe experiment was performed on proteins from a species other than human. The species name is given in parentheses.
dRemoved kinases are those predicted to recognize the original sequence but not the variant sequence.
eAdded kinases are those predicted to recognize the variant sequence but not the original sequence. The added kinases are given in parentheses.
fReference numbers starting with ‘rs’ are dbSNP IDs.
Protein names which are abbreviated by their gene names: Proto-oncogene tyrosine-protein kinase ABL1, ABL1; ATP-binding cassette sub-family B member 11, ABCB11; Apoptotic chromatin condensation inducer in the nucleus, ACIN1; Aquaporin-2, AQP2; Caspase-8 [Precursor], CASP8; Secretogranin-1 [precursor], CHGB; Eukaryotic translation initiation factor 4 gamma 3, EIF4G3; G2 and S phase-expressed protein 1, GTSE1; Zinc finger protein KIAA1802, KIAA1802; DNA ligase 1, LIG1; Methyl-CpG-binding protein 2, MECP2; Myosin-binding protein C, cardiac-type, MYBPC3; Phenylalanine-4-hydroxylase, PAH; Parkinson disease protein 7, PARK7; Membrane-associated tyrosine- and threonine-specific cdc2-inhibitory kinase, PKMYT1; Pinin, PNN; Protein phosphatase 1 regulatory subunit 12B, PPP1R12B; Ribosomal protein S6 kinase alpha-3, RPS6KA3; SH3 and PX domain-containing protein 2A, SH3PXD2A; WD repeat-containing protein 91, WDR91.
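The "removed" and "added" kinases defined in the footnotes amount to set differences between the kinases predicted for the original and the variant sequences. A minimal sketch follows, assuming the predictions are already available as lists of kinase family names; the function name and the example predictions are hypothetical and do not reproduce any row of the table.

```python
def kinase_changes(original_kinases, variant_kinases):
    """Compare kinases predicted for the original and the variant sequence."""
    original = set(original_kinases)
    variant = set(variant_kinases)
    return {
        # Predicted for the original sequence only.
        "removed": sorted(original - variant),
        # Predicted for the variant sequence only.
        "added": sorted(variant - original),
        # Predicted for both sequences.
        "retained": sorted(original & variant),
    }


# Hypothetical predictions for one site before and after a variation.
changes = kinase_changes(["CDK", "MAPK"], ["MAPK", "PKC"])
print(changes)
```

A site with a non-empty "removed" or "added" list but at least one kinase still predicted for the variant would be a candidate for a change of recognizing kinase rather than a loss of the site.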