Jan-Ole Christian, Rostyslav Braginets, Waltraud X Schulze, Dirk Walther.
Abstract
The regulation of protein function by modulating the surface charge status via sequence-locally enriched phosphorylation sites (P-sites) in so-called phosphorylation "hotspots" has gained increased attention in recent years. We set out to identify P-hotspots in the model plant Arabidopsis thaliana. We analyzed the spacing of experimentally detected P-sites within peptide-covered regions along Arabidopsis protein sequences as available from the PhosPhAt database. Confirming earlier reports (Schweiger and Linial, 2010), we found that P-sites indeed tend to cluster and that the distributions of distances from serine and threonine P-sites to their respective closest neighboring P-site differ significantly from those for tyrosine P-sites. The ability of available computational P-site prediction programs, which focus on identifying single P-sites, to predict P-hotspots was observed to be severely compromised by the inevitable interference of nearby P-sites. We devised a new approach, named HotSPotter, for the prediction of phosphorylation hotspots. HotSPotter is based primarily on local amino acid compositional preferences rather than sequence position-specific motifs and uses support vector machines as the underlying classification engine. HotSPotter correctly identified experimentally determined phosphorylation hotspots in A. thaliana with high accuracy. Applied to the Arabidopsis proteome, HotSPotter predicted 13,677 candidate P-hotspots in 9,599 proteins corresponding to 7,847 unique genes. Hotspot-containing proteins are involved predominantly in signaling processes, confirming the surmised modulating role of hotspots in signaling and interaction events. Our study provides new bioinformatics means to identify phosphorylation hotspots and lays the basis for further investigating novel candidate P-hotspots. All phosphorylation hotspot annotations and predictions have been made available as part of the PhosPhAt database at http://phosphat.mpimp-golm.mpg.de.
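HotSPotter is described as relying on local amino acid composition rather than position-specific motifs. The sketch below illustrates what a position-independent feature for a 17-residue window could look like; the exact HotSPotter feature set and SVM settings are not given in this record, so the 20-dimensional relative-frequency vector and the helper names `sliding_windows` and `composition_vector` are illustrative assumptions. Vectors like these would then be passed to an SVM classifier, which is not shown.

```python
"""Sketch: amino acid composition features for fixed-length sequence windows.

Assumption: this is NOT HotSPotter's published feature set; it only
illustrates a composition vector that ignores residue order.
"""

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
WINDOW = 17  # window length used in the cross-validation table


def sliding_windows(sequence: str, length: int = WINDOW):
    """Yield (start, window) pairs for every full-length window (0-based start)."""
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]


def composition_vector(window: str) -> list[float]:
    """Return the relative frequency of each standard amino acid in the window.

    Residue order is ignored, so two windows with the same composition map
    to the same vector (no position-specific motif information).
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in window:
        if residue in counts:  # skip non-standard symbols such as X
            counts[residue] += 1
    n = len(window)
    return [counts[aa] / n for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hotspot sequence of AT1G01540.1, copied from the hotspot table
    seq = "RVVFSDRVSSGESRGTA"
    features = composition_vector(seq)
    print(len(features))  # 20
    print(round(features[AMINO_ACIDS.index("S")], 3))  # 0.235 (4 of 17 residues are S)
```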
Keywords: Arabidopsis thaliana; hotspots; protein phosphorylation; regulation; support vector machines
Year: 2012 PMID: 22973286 PMCID: PMC3433687 DOI: 10.3389/fpls.2012.00207
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Frequency distribution of sequence distances between neighboring P-sites: (A) between any pST and its nearest other P-site (pSTY); (B) between any pY and its nearest other P-site. For the respective neighboring site, no distinction was made as to which residue type (S, T, or Y) was phosphorylated. (C) Equivalent distributions of nearest pST-to-pSTY distances for P-flag-randomized and sequence-randomized protein sequences, averaged over 100 repeat runs. (D) Similarly for pY-to-pSTY distances. In P-flag-randomized runs, phosphorylation signals were randomly redistributed among the existing serines and tyrosines, whereas in sequence-randomized runs, the entire protein sequence was randomized.
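Panels (A) to (D) rest on two computations: the distance from each P-site to its closest neighboring P-site, and a P-flag randomization that keeps the sequence fixed while reshuffling the phosphorylation labels. A minimal Python sketch follows, assuming 0-based site indices and treating S, T, and Y as acceptor residues (the caption mentions serines and tyrosines, so the acceptor set is left as a parameter). The function names are hypothetical, sequence randomization is not shown, and the example reuses the AT1G01540.1 hotspot from the hotspot table rather than the full PhosPhAt data set.

```python
"""Sketch of the nearest-neighbor distance and P-flag randomization steps.

Assumptions: hypothetical helper names; P-sites are 0-based indices into
`sequence`; P-flags are redistributed among S/T/Y by default.
"""
import random
from collections import Counter


def nearest_neighbor_distances(sequence: str, psites: list[int], query: str = "ST") -> list[int]:
    """For each P-site whose residue is in `query`, return the distance to the
    closest other P-site of any type, i.e. dN(pST, pSTY) with the default."""
    sites = sorted(psites)
    distances = []
    for i, pos in enumerate(sites):
        if sequence[pos] not in query:
            continue
        others = [abs(pos - other) for j, other in enumerate(sites) if j != i]
        if others:  # a lone P-site has no neighbor and is skipped
            distances.append(min(others))
    return distances


def pflag_randomize(sequence: str, psites: list[int], acceptors: str = "STY", rng=None) -> list[int]:
    """Keep the sequence fixed and reassign the same number of P-flags
    uniformly at random among acceptor residues."""
    rng = rng or random.Random()
    candidates = [i for i, aa in enumerate(sequence) if aa in acceptors]
    return sorted(rng.sample(candidates, len(psites)))


if __name__ == "__main__":
    seq = "RVVFSDRVSSGESRGTA"  # AT1G01540.1 hotspot from the hotspot table
    sites = [4, 8, 9, 12]      # sites 102, 106, 107, 110 minus hotspot start 98
    print(sorted(Counter(nearest_neighbor_distances(seq, sites)).items()))
    # -> [(1, 2), (3, 1), (4, 1)]
    rng = random.Random(0)
    background = Counter()
    for _ in range(100):       # 100 repeat runs, as in panels (C) and (D)
        shuffled = pflag_randomize(seq, sites, rng=rng)
        background.update(nearest_neighbor_distances(seq, shuffled))
    print(sorted(background.items()))
```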
Figure 2. Frequency distribution of distances between closest neighboring predicted P-sites in the Arabidopsis proteome: (A) between any pST and the nearest pSTY, and (B) between any pY and the nearest pSTY. For comparison, results for 100 P-flag randomizations are shown as red filled circles. The nearest-neighbor distance distribution of predicted sites differs from that of experimentally identified sites (Figure 1), with a secondary peak at dN(pST, pSTY) = 4 and a more even distribution of dN(pY, pSTY).
| AGI ID | Gene symbol | Annotation | Phosphorylation sites (0-based position) | Start of hotspot (0-based position) | Hotspot sequence |
|---|---|---|---|---|---|
| AT1G01540.1 | | Protein kinase family protein | 102, 106, 107, 110 | 98 | RVVFSDRVSSGESRGTA |
| AT1G07985.1 | | Expressed protein | 130, 132, 134, 138 | 126 | KVVGSSSPTNIHSKSWR |
| AT1G08680.1 | ZIGA4, AGD14 | ZIGA4 (ARF GAP-like zinc finger-containing protein ZiGA4); ARF GTPase activator/DNA binding/zinc ion binding | 190, 191, 194, 195 | 184 | GLHAKASSFVYSPGRFS |
| AT1G20440.1 | COR47, RD17 | COR47 (COLD-REGULATED 47) | 89, 98, 108, 113 | 86 | QEKTEEDEENKPSVIEKLHRSNSSSSSSSDE |
| AT1G26540.1 | | Agenet domain-containing protein | 324, 328, 334, 336 | 321 | HLRSFLNSKEISETPTKAK |
| AT1G27500.1 | | Kinesin light chain-related | 32, 33, 36, 44 | 29 | ELQSSNQSPSRQSFGSYGD |
| AT1G29220.1 | | Transcriptional regulator family protein | 80, 82, 86, 89 | 76 | GVGASSSAHGTPRSLDN |
| AT1G29350.1 | | Expressed in: male gametophyte, guard cell, pollen tube; expressed during: L mature pollen stage, M germinated pollen stage; BEST | 105, 107, 108, 112 | 100 | RYAGRSGSTHFSSTDSG |
| AT1G35580.1 | CINV1 | CINV1 (cytosolic invertase 1); beta-fructofuranosidase | 43, 45, 48, 49 | 38 | SFDERSMSELSTGYSRH |
| AT1G35580.1 | CINV1 | CINV1 (cytosolic invertase 1); beta-fructofuranosidase | 60, 65, 69, 72, 73 | 57 | IHDSPRGRSVLDTPLSSARN |
| AT1G45688.1 | | Unknown protein | 15, 19, 29, 31, 34 | 12 | AASSPARSPRRPVYYVQSPSRDSHDG |
| AT1G55310.1 | SR33, ATSCL33, SCL33 | SR33; RNA binding/protein binding | 4, 5, 6, 8 | 0 | MRGRSYTPSPPRGYGRR |
| AT1G59710.1 | | Expressed in: 23 plant structures; expressed during: 13 growth stages; contains InterPro domain/s: protein of unknown function DUF569 (InterPro:IPR007679), actin_cross-linking (InterPro:IPR008999) | 195, 196, 198, 203 | 191 | FRQESTDSLAVGSPPKS |
| AT1G59870.1 | PEN3, PDR8, ATPDR8 | PEN3 (penetration 3); ATPase, coupled to transmembrane movement of substances/cadmium ion transmembrane transporter | 36, 39, 42, 44 | 32 | EDIFSSGSRRTQSVNDD |
| AT1G62830.1 | LDL1, SWP1, ATSWP1 | LDL1 (LSD1-LIKE1); amine oxidase/electron carrier/oxidoreductase | 820, 822, 830, 831 | 817 | ERKSLSQEGESMISSLKA |
| AT1G66680.1 | AR401 | AR401 | 34, 44, 46, 53 | 31 | SLASDDDRSIAADSWSIKSEYGSTLD |
| AT1G73200.1 | | Phosphoinositide binding | 312, 314, 316, 317 | 306 | VQVISRSWSHSSHASDV |
| AT1G76920.1 | | F-box family protein (FBX3) | 177, 178, 180, 190 | 174 | ALYYSGTVVANQWLKFSSNL |
| AT1G80530.1 | | Nodulin family protein | 270, 271, 275, 276 | 265 | RSNAKSSPLGSSDNLAK |
| AT2G01190.1 | | Octicosapeptide/Phox/Bem1p (PB1) domain-containing protein | 381, 386, 394, 399 | 378 | RVYSDDERSDHGVQAGYRKPPTPRS |
| AT2G23350.1 | PAB4, PABP4 | PAB4 [POLY(A) binding protein 4]; RNA binding/translation initiation factor | 640, 647, 649, 656 | 637 | SQGSEGNKSGSPSDLLASLSIND |
| AT2G26730.1 | | Leucine-rich repeat transmembrane protein kinase, putative | 631, 632, 636, 639, 648 | 628 | LRQSSDDPSKGSEGQTPPGESRTP |
| AT2G29210.1 | | Splicing factor PWI domain-containing protein | 390, 392, 395, 400, 402, 405 | 387 | RRRSPSPLYRRNRSPSPLYRRN |
| AT2G31650.1 | ATX1, SDG27 | ATX1 ( | 481, 482, 484, 487, 489 | 477 | MRKFTSLTDHSASALYK |
| AT2G35030.1 | | Pentatricopeptide (PPR) repeat-containing protein | 110, 112, 116, 121 | 107 | NVVTWTAMVSGYLRSKQL |
| AT2G35350.1 | PLL1 | PLL1 (poltergeist like 1); catalytic/protein serine/threonine phosphatase | 188, 190, 192, 198 | 185 | GEISRSNSAGVHFSAPL |
| AT2G35880.1 | | Expressed in: 24 plant structures; expressed during: 13 growth stages; contains InterPro DOMAIN/s: targeting for Xklp2 (InterPro:IPR009675) | 108, 111, 116, 117 | 104 | YTDITRKSIDATTSKTS |
| AT2G37340.1 | RSZ33, ATRSZ33 | RSZ33; nucleic acid binding/nucleotide binding/zinc ion binding | 201, 203, 210, 218, 225, 229, 238, 244, 252, 255, 264, 265 | 198 | MDDSLSPRARDRSPVLDDEGSPKIIDGSPPPSPKLQKEVGSDRDGGSPQDNGRNSVVSPVVGAGGDSSKED |
| AT2G41705.1 | | Camphor resistance CrcB family protein | 60, 64, 65, 68 | 56 | RRRHSAGRSSRLSADDF |
| AT2G41720.1 | EMB2654 | EMB2654 (EMBRYO DEFECTIVE 2654) | 529, 531, 537, 539 | 526 | KADSVTFTILISGSCRM |
| AT2G41740.1 | VLN2, ATVLN2 | VLN2 (VILLIN 2); actin binding | 845, 848, 854, 855 | 842 | NKKSPDTSPTRRSTSSN |
| AT2G43680.1 | IQD14 | IQD14; calmodulin-binding | 125, 127, 132, 142 | 122 | VPRTLSPKPPSPRAEVPRSLSPKP |
| AT2G45540.1 | | WD-40 repeat family protein/beige-related | 1612, 1613, 1618, 1621 | 1608 | SSERSSGNSVTLDSGSQ |
| AT2G46170.1 | | Reticulon family protein (RTNLB5) | 27, 28, 29, 30, 31 | 21 | KIHHHDSSSSSESEYEK |
| AT2G46495.1 | | Zinc finger (C3HC4-type RING finger) family protein | 401, 405, 407, 410 | 397 | KRLLTFNISGSPFSPRF |
| AT3G05090.1 | | Transducin family protein/WD-40 repeat family protein | 368, 378, 385, 392 | 365 | EVQSPKTVFQRGGSFLAGNLSFNRARVSLEG |
| AT3G07790.1 | | DGCR14-related | 117, 120, 121, 127 | 114 | KTQTPGSTFLRNFTPLD |
| AT3G13570.1 | SCL30A | SCL30a; RNA binding/nucleic acid binding/nucleotide binding | 165, 173, 175, 177 | 162 | GYNSPPAKRHQSRSVSPQD |
| AT3G13990.1 | | Unknown protein | 493, 495, 498, 501 | 489 | RVSRSDSPVSAVSEPQL |
| AT3G17420.1 | GPK1 | GPK1; ATP binding/kinase/protein kinase/protein serine/threonine kinase | 58, 62, 69, 74, 75 | 55 | VTQSPRFTEEIKEISVDHGSSNNN |
| AT3G23100.1 | XRCC4 | XRCC4; protein C-terminus binding | 224, 225, 230, 233 | 220 | EEEESTDKAESFESGRS |
| AT3G27960.1 | | Kinesin light chain-related | 573, 577, 581, 582, 588 | 570 | CGPYHPDTLAVYSNLAGTYDAM |
| AT3G29310.1 | | Calmodulin-binding protein-related | 324, 325, 326, 331 | 319 | NRHDLTSSAEDDSVDGD |
| AT3G29390.1 | RIK | RIK (RS2-interacting KH protein); RNA binding | 511, 513, 516, 517 | 506 | PPRSKTMSPLSSKSMLP |
| AT3G48530.1 | KING1 | KING1 (SNF1-related protein kinase regulatory subunit gamma 1) | 11, 13, 18, 21, 22 | 8 | IMRSESLGHRSDVSSPEA |
| AT3G52400.1 | SYP122, ATSYP122 | SYP122 (syntaxin of plants 122); SNAP receptor | 7, 17, 21, 27 | 4 | LSGSFKTSVADGSSPPHSHNIEMSKAK |
| AT3G52930.1 | | Fructose-bisphosphate aldolase, putative | 31, 32, 34, 41 | 28 | ADESTGTIGKRLASINV |
| AT3G53500.1 | RSZ32 | Zinc knuckle (CCHC-type) family protein | 172, 174, 183, 188, 190, 193 | 169 | RDQSLSPDRKVIDASPKRGSDYDGSPKE |
| AT3G55460.1 | SCL30 | SCL30; RNA binding/nucleic acid binding/nucleotide binding | 3, 4, 7, 9 | 0 | MRRYSPPYYSPPRRGYG |
| AT3G55460.1 | SCL30 | SCL30; RNA binding/nucleic acid binding/nucleotide binding | 175, 177, 179, 181 | 170 | DSRSRYRSRSYSPAPRR |
| AT3G55460.1 | SCL30 | SCL30; RNA binding/nucleic acid binding/nucleotide binding | 203, 204, 205, 208 | 197 | ENYSRRSYSPGYEGAAA |
| AT3G56510.1 | | TBP-binding protein, putative | 232, 237, 238, 240 | 228 | RQKKSIENETSQSKPGL |
| AT3G58940.1 | | F-box family protein | 112, 115, 118, 121 | 108 | QRGVSDLYLFTDFSDED |
| AT3G61860.1 | ATRSP31, RSP31 | RSP31; RNA binding/nucleic acid binding/nucleotide binding | 246, 252, 254, 256 | 243 | RQRSPGYDRYRSRSPVP |
| AT3G62280.1 | | Carboxylesterase/hydrolase, acting on ester bonds | 90, 93, 95, 98, 100 | 87 | LKMTYLSPYLDSLSPNF |
| AT4G05150.1 | | Octicosapeptide/Phox/Bem1p (PB1) domain-containing protein | 263, 265, 269, 276 | 260 | EVSTLSDPGSPRRDVPSPYG |
| AT4G07523.1 | | Transposable element gene; similar to unknown protein ( | 3, 5, 6, 8, 9, 10 | 0 | MPLSYSSPSSSEERSDD |
| AT4G11740.1 | SAY1 | SAY1 | 312, 314, 323, 325, 327 | 309 | RAASGSLAPPNADRSRSGSPEE |
| AT4G13510.1 | AMT1;1, ATAMT1, ATAMT1;1 | AMT1;1 (AMMONIUM TRANSPORTER 1;1); ammonium transmembrane transporter | 487, 489, 491, 495 | 483 | VEPRSPSPSGANTTPTP |
| AT4G25160.1 | | Protein kinase family protein | 312, 314, 321, 323 | 309 | TRFSWSGMGVDTTHSRAS |
| AT4G25580.1 | | Stress-responsive protein-related | 155, 157, 161, 167 | 152 | GAPTLTPHNTPVSLLSATE |
| AT4G31580.1 | SRZ-22, SRZ-22, RSZP22 | SRZ-22; protein binding | 159, 169, 171, 173, 177 | 156 | RRRSPSPPPARGRSYSRSPPPYRAR |
| AT4G31700.1 | RPS6 | RPS6 (ribosomal protein S6); structural constituent of ribosome | 230, 236, 239, 240, 246 | 227 | RSESLAKKRSRLSSAAAKPSVTA |
| AT4G32250.1 | | Protein kinase family protein | 12, 21, 23, 29, 30 | 9 | PDDTEYEIIEGESESALAAGTSPWM |
| AT4G35785.1 | | Nucleic acid binding/nucleotide binding | 40, 42, 48, 50 | 37 | RSRSRSLPRPVSPSRSR |
| AT4G38600.1 | KAK, UPL3 | KAK (KAKTUS); ubiquitin-protein ligase | 1366, 1367, 1372, 1373, 1374 | 1362 | EGKITSLDDLSTTAAKV |
| AT4G39680.1 | | SAP domain-containing protein | 310, 318, 319, 323 | 307 | AGDSEKLNLDRSSGDESMED |
| AT5G02240.1 | | Binding/catalytic/coenzyme binding | 234, 235, 236, 238 | 228 | GSKPEGTSTPTKDFKAL |
| AT5G04930.1 | ALA1 | ALA1 (aminophospholipid ATPase1); ATPase, coupled to transmembrane movement of ions, phosphorylative mechanism | 39, 46, 51, 57 | 36 | DLGSKRIRHGSAGADSEMLSMSQKE |
| AT5G06210.1 | | RNA binding protein, putative | 136, 138, 139, 141, 142 | 129 | DPAVIAATRTTETSKSD |
| AT5G18660.1 | PCB2 | PCB2 (Pale-green and chlorophyll B reduced 2); 3,8-divinyl protochlorophyllide a 8-vinyl reductase | 370, 378, 381, 382 | 367 | AAESMLILDPETGEYSEEK |
| AT5G21160.1 | | La domain-containing protein/proline-rich family protein | 369, 377, 380, 383 | 366 | SAETIGDGDKDSPKSITSGDN |
| AT5G41600.1 | BTI3 | BTI3 (VIRB2-interacting protein 3) | 24, 25, 28, 30 | 19 | HGHGDSSSLSDSDDDKK |
| AT5G47690.1 | | Binding | 1273, 1280, 1283, 1288 | 1270 | HLESDMDKNVSLDSHDENSDQE |
| AT5G52040.1 | ATRSP41 | ATRSP41; RNA binding/nucleic acid binding/nucleotide binding | 191, 201, 209, 218, 219, 228, 230, 232, 238 | 188 | RRRSPSPYRRERGSPDYGRGASPVAHKRERTSPDYGRGRRSPSPYKRARLSPDY |
| AT5G52040.1 | ATRSP41 | ATRSP41; RNA binding/nucleic acid binding/nucleotide binding | 336, 341, 346, 348, 350 | 333 | GRGYDGADSPIRESPSRSPPA |
| AT5G57110.1 | ACA8, AT-ACA8 | ACA8 (autoinhibited Ca2+-ATPASE, isoform 8); calcium-transporting ATPase/calmodulin-binding/protein self-association | 18, 21, 26, 28 | 15 | DVESGKSEHADSDSDTF |
| AT5G62820.1 | | Integral membrane protein, putative | 27, 30, 40, 46 | 24 | RFHSPLSDAGDLPESRYVSPEGSPFK |
| AT5G64200.1 | ATSC35, SC35 | ATSC35; RNA binding/nucleic acid binding/nucleotide binding | 273, 277, 279, 282 | 269 | PERRSNERSPSPGSPAP |
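The site positions and the hotspot start share one coordinate system. In the rows whose hotspot begins with the initiator methionine (AT1G55310.1, AT3G55460.1), the start is 0, and the listed sites fall on S, T, or Y when positions are counted from 0. The short check below reads the residue at each site as `hotspot[site - start]` for three rows copied from the table; the helper name `residues_at_sites` is hypothetical.

```python
"""Sketch: check that listed P-sites fall on S, T, or Y in the hotspot sequence.

Assumption: 'Start of hotspot' and 'Phosphorylation sites' are both 0-based
protein positions, so site p sits at hotspot_sequence[p - start].
"""


def residues_at_sites(hotspot: str, start: int, sites: list[int]) -> str:
    """Return the residues found at the given protein positions."""
    return "".join(hotspot[p - start] for p in sites)


ROWS = [
    # (AGI ID, sites, start, hotspot sequence) copied from the hotspot table
    ("AT1G55310.1", [4, 5, 6, 8], 0, "MRGRSYTPSPPRGYGRR"),
    ("AT3G55460.1", [3, 4, 7, 9], 0, "MRRYSPPYYSPPRRGYG"),
    ("AT1G08680.1", [190, 191, 194, 195], 184, "GLHAKASSFVYSPGRFS"),
]

if __name__ == "__main__":
    for agi, sites, start, hotspot in ROWS:
        found = residues_at_sites(hotspot, start, sites)
        print(agi, found, all(aa in "STY" for aa in found))
    # AT1G55310.1 SYTS True
    # AT3G55460.1 YSYS True
    # AT1G08680.1 SSYS True
```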
HotSPotter prediction performance.
| Prediction | Actual positive (hotspot sequence) | Actual negative (non-hotspot sequence) | Predictive value |
|---|---|---|---|
| Predicted positive | TP = 63% (230/365) | FP = 2% (51/2915) | PPV = 82% |
| Predicted negative | FN = 37% (135/365) | TN = 98% (2864/2915) | NPV = 95% |
Contingency table of the prediction results obtained in fivefold cross-validation for HotSPotter, the SVM-based classifier, applied to sequence windows of length 17.
TP, true positive; FP, false positive; FN, false negative; TN, true negative; PPV, positive predictive value; NPV, negative predictive value. Numbers in parentheses are the underlying counts; TP and FN percentages are relative to all actual positives, FP and TN percentages to all actual negatives. Positive predictions correspond to SVM scores > 0.
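All percentages in the contingency table follow from its four counts. A small Python sketch (the function name `contingency_metrics` is illustrative) makes the definitions explicit and recomputes them from TP = 230, FP = 51, FN = 135, and TN = 2864.

```python
"""Sketch: recompute the contingency-table rates from the raw counts."""


def contingency_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Return standard binary-classification rates as fractions."""
    return {
        "TPR (sensitivity)": tp / (tp + fn),  # share of actual hotspots found
        "FPR": fp / (fp + tn),                # share of non-hotspots flagged
        "TNR (specificity)": tn / (fp + tn),
        "PPV (precision)": tp / (tp + fp),    # share of positive calls that are correct
        "NPV": tn / (tn + fn),                # share of negative calls that are correct
    }


if __name__ == "__main__":
    # Counts from the fivefold cross-validation table (SVM score > 0)
    for name, value in contingency_metrics(tp=230, fp=51, fn=135, tn=2864).items():
        print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, the raw values reproduce the table: 63%, 2%, 98%, 82%, and 95%.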
Statistics of HotSPotter predictions in the Arabidopsis proteome.
| (A) SVM score threshold, S, for positive prediction | (B) Number of windows of length 17 | (C) Number of windows with score above S | (D) Number of STY-content-filtered and run-consolidated hotspots | (E) Number of unique proteins (genes) containing hotspots |
|---|---|---|---|---|
| S > 0 | 12,866,960 | 945,670 (7.3%) | 54,329 (44,247) | 23,524 (19,252) |
| S > 1 | 12,866,960 | 160,780 (1.2%) | 13,677 (11,065) | 9,599 (7,847) |
(D) Counts of hotspots after filtering for STY content (at least four S, T, or Y residues, with no more than 10 residues between adjacent STY residues) and merging consecutive runs of positive windows; numbers in parentheses are unique hotspot sequences. (E) Protein counts include all proteins derived from all genes, including annotated splice forms; numbers in parentheses are genes.
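Footnote (D) describes two post-processing steps: an STY-content filter and the merging of consecutive positive windows. One way to express them in Python is sketched below, assuming that the filter is applied per window before merging and that the 10-residue limit refers to gaps between adjacent S/T/Y residues. The function names, the toy sequence, and the scores are hypothetical and stand in for real SVM output.

```python
"""Sketch: turn per-window SVM scores into consolidated hotspot regions.

Assumptions (not specified in detail in this record): a window passes if its
score exceeds the threshold S, it holds at least 4 S/T/Y residues, and no more
than 10 residues separate adjacent S/T/Y; passing windows with consecutive
start positions are merged into one region.
"""

WINDOW = 17


def sty_ok(window: str, min_sty: int = 4, max_gap: int = 10) -> bool:
    """Apply the STY-content filter to one window."""
    pos = [i for i, aa in enumerate(window) if aa in "STY"]
    if len(pos) < min_sty:
        return False
    # number of residues strictly between adjacent STY positions
    return all(b - a - 1 <= max_gap for a, b in zip(pos, pos[1:]))


def consolidate(sequence: str, scores: list[float], threshold: float = 0.0) -> list[tuple[int, int]]:
    """Return (start, end) regions, end exclusive; scores[i] is the score of
    the window starting at position i."""
    regions: list[tuple[int, int]] = []
    for start, score in enumerate(scores):
        window = sequence[start:start + WINDOW]
        if score <= threshold or not sty_ok(window):
            continue
        # merge if the previous passing window started at start - 1
        if regions and regions[-1][1] - WINDOW + 1 == start:
            regions[-1] = (regions[-1][0], start + WINDOW)
        else:
            regions.append((start, start + WINDOW))
    return regions


if __name__ == "__main__":
    # Hypothetical toy input: the AT1G01540.1 hotspot padded with alanines
    seq = "A" * 5 + "RVVFSDRVSSGESRGTA" + "A" * 5
    scores = [-1.0] * (len(seq) - WINDOW + 1)
    for s in (4, 5, 6):
        scores[s] = 1.5  # pretend the SVM scored these windows positive
    print(consolidate(seq, scores))                 # [(4, 23)]
    print(consolidate(seq, scores, threshold=2.0))  # []
```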
Gene Ontology (GO)-slim terms statistically enriched or depleted in the set of 9,599 predicted hotspot-containing proteins.
| FDR | GO-slim process | FDR | GO-slim function | FDR | GO-slim component |
|---|---|---|---|---|---|
| 9.27E−08 | DNA or RNA metabolism | 1.31E−28 | Nucleotide binding | 2.50E−67 | Chloroplast |
| 2.15E−07 | Developmental processes | 8.36E−16 | Kinase activity | 6.70E−16 | Nucleus |
| 2.38E−07 | Cell organization and biogenesis | 3.26E−15 | Transcription factor activity | 4.59E−13 | Plasma membrane |
| 1.38E−02 | Other cellular processes | 4.80E−14 | Protein binding | 1.22E−12 | Plastid |
| 1.52E−02 | Signal transduction | 4.67E−06 | Transferase activity | 3.00E−05 | Other intracellular components |
| | | 1.09E−02 | Hydrolase activity | 4.82E−02 | Golgi apparatus |
| | | 1.43E−02 | DNA or RNA binding | | |
| 3.16E−05 | Transport | 6.83E−16 | Unknown molecular functions | 2.28E−48 | Other cellular components |
| 1.51E−04 | Unknown biological processes | 1.31E−15 | Other binding | 1.57E−19 | Unknown cellular components |
| 6.52E−03 | Response to stress | 4.11E−12 | Other enzyme activity | 1.97E−08 | Extracellular |
| | | 6.50E−09 | Structural molecule activity | 2.93E−08 | Cytosol |
| | | 4.43E−04 | Nucleic acid binding | 1.07E−07 | Ribosome |
| | | 2.68E−03 | Transporter activity | 1.03E−02 | Other membranes |
| | | 1.09E−02 | Other molecular functions | 1.10E−02 | ER |
| | | | | 2.25E−02 | Cell wall |
Neither removing the 57 proteins contained in both the experimental and the predicted sets nor removing chloroplast and mitochondrial proteins significantly changed the GO-slim term enrichment/depletion statistics.
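The GO-slim table reports an FDR for each term, but this record does not describe the statistical test behind it. A common way to obtain such values, shown here only as an assumed illustration and not as the authors' procedure, is a hypergeometric test of each term's frequency among hotspot proteins against the proteome, followed by Benjamini-Hochberg adjustment across terms. The sketch uses only the standard library; the counts in the usage block are invented toy numbers, not data from this study.

```python
"""Sketch: GO-slim term enrichment/depletion p-values with Benjamini-Hochberg FDR.

Assumption: the test used for the table is not stated in this record; a
two-sided hypergeometric test (twice the smaller tail) is used for illustration.
"""
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k term-annotated proteins among n drawn from N, of which K are annotated."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def two_sided_p(k: int, N: int, K: int, n: int) -> float:
    """Twice the smaller tail probability, capped at 1 (enrichment or depletion)."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    pmf = {x: hypergeom_pmf(x, N, K, n) for x in range(lo, hi + 1)}
    upper = sum(p for x, p in pmf.items() if x >= k)  # enrichment tail
    lower = sum(p for x, p in pmf.items() if x <= k)  # depletion tail
    return min(1.0, 2 * min(upper, lower))


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values (FDR), in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical toy numbers, NOT data from this study
    N, n = 1000, 100  # proteome size, hotspot-protein set size
    terms = {         # term: (count in hotspot set, count in proteome)
        "kinase activity": (25, 120),
        "ribosome": (1, 60),
        "transport": (9, 95),
    }
    raw = [two_sided_p(k, N, K, n) for k, K in terms.values()]
    for name, fdr in zip(terms, benjamini_hochberg(raw)):
        print(f"{name}: FDR = {fdr:.2e}")
```

For proteome-scale counts, library routines such as SciPy's hypergeometric distribution and statsmodels' `multipletests` with `method="fdr_bh"` are the usual choice; doubling the smaller tail is only one of several two-sided conventions.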
Figure 3. Screenshot of the PhosPhAt database with P-hotspot annotation information.