Diego Mauricio Riaño-Pachón, Sabrina Kleessen, Jost Neigenfind, Pawel Durek, Elke Weber, Wolfgang R Engelsberger, Dirk Walther, Joachim Selbig, Waltraud X Schulze, Birgit Kersten.
Abstract
BACKGROUND: Protein phosphorylation is an important post-translational modification influencing many aspects of dynamic cellular behavior. Site-specific phosphorylation of the amino acid residues serine, threonine, and tyrosine can have profound effects on protein structure, activity, stability, and interaction with other biomolecules. Phosphorylation sites can be affected in diverse ways in members of any species; one such way is through single nucleotide polymorphisms (SNPs). The availability of large numbers of experimentally identified phosphorylation sites and of natural variation datasets in Arabidopsis thaliana prompted us to analyze the effect of non-synonymous SNPs (nsSNPs) on phosphorylation sites.
Year: 2010 PMID: 20594336 PMCID: PMC2996939 DOI: 10.1186/1471-2164-11-411
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Frequency distribution of the number of phosphorylation sites per protein. To compute the expected distribution of phosphorylation sites per protein (black circles), we assumed that every possible STY site becomes phosphorylated with a constant probability p, which is independent of the number of STY sites per protein. This probability was obtained by dividing the total number of pSTY sites by the total number of STY positions across all proteins in the data set: p = (total number of pSTY sites) / (total number of STY positions). With p available, the expected number of phosphorylation sites in protein X was computed as E(pSTY)_X = p × (number of STY sites in protein X). The observed distribution of phosphorylation sites per protein appears as red circles (A: experimental phosphorylation sites; B: high-confidence predicted phosphorylation sites).
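The expectation described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input layout (a mapping from protein identifier to a pair of STY and pSTY counts), and the toy counts are hypothetical and are not taken from the study.

```python
def expected_psty_per_protein(counts):
    """Compute the constant phosphorylation probability p and E(pSTY) per protein.

    counts: dict mapping protein id -> (n_sty, n_psty), where n_sty is the
    number of S/T/Y residues and n_psty the number of phosphorylated ones.
    (Hypothetical input layout, chosen for illustration.)
    """
    total_sty = sum(n_sty for n_sty, _ in counts.values())
    total_psty = sum(n_psty for _, n_psty in counts.values())
    # p = total number of pSTY sites / total number of STY positions
    p = total_psty / total_sty
    # E(pSTY)_X = p * number of STY sites in protein X
    expected = {pid: p * n_sty for pid, (n_sty, _) in counts.items()}
    return p, expected


# Toy counts, invented for this example only.
toy_counts = {"protA": (40, 4), "protB": (100, 2), "protC": (60, 4)}
p, expected = expected_psty_per_protein(toy_counts)
for pid, value in expected.items():
    print(pid, round(value, 2))
```

By construction, the expected values sum to the total number of observed pSTY sites, so any difference between the red and black curves reflects how the sites are distributed among proteins rather than how many there are.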
Proteins with 9 or more experimentally identified phosphorylation sites in A. thaliana. Proteins with phosphorylation hotspots appear in italics. Proteins annotated with the GO term "nucleobase, nucleoside, nucleotide and nucleic acid metabolic process" (GO: 6139).
| AGI | Number of phosphorylation sites | Number of p(S);p(T);p(Y) | Protein length | TAIR7 function |
|---|---|---|---|---|
|  | 22 | (19;1;2) | 357 | ATRSP41 (Arabidopsis thaliana arginine/serine-rich splicing factor 41); RNA binding |
| AT2G29210.1 | 16 | (13;1;2) | 879 | Splicing factor PWI domain-containing protein |
| AT2G37340.1 | 13 | (13;0;0) | 291 | RSZ33 (Arginine/serine-rich Zinc knuckle-containing protein 33); nucleic acid binding/nucleotide binding/zinc ion binding |
|  | 13 | (13;0;0) | 304 | ATSC35 ("Arabidopsis thaliana arginine/serine-rich splicing factor 35, 35 kDa protein"); RNA binding |
| AT2G43680.1 | 13 | (11;2;0) | 669 | IQD14; calmodulin binding |
|  | 13 | (11;0;2) | 263 | SCL30 (SC35-like splicing factor 30); RNA binding |
|  | 12 | (10;2;0) | 552 | CINV1 (CYTOSOLIC INVERTASE 1); beta-fructofuranosidase |
| AT3G25500.1 | 12 | (12;0;0) | 1052 | AFH1 (FORMIN HOMOLOGY 1); actin binding |
| AT3G63400.1 | 12 | (11;1;0) | 571 | Peptidyl-prolyl cis-trans isomerase cyclophilin-type family protein |
| AT5G47690.1 | 12 | (10;2;0) | 1606 | Binding |
| AT3G23900.1 | 11 | (11;0;0) | 988 | RNA recognition motif (RRM)-containing protein |
| AT2G18960.1 | 11 | (5;4;2) | 950 | AHA1 (PLASMA MEMBRANE PROTON ATPASE); ATPase |
| AT2G20960.1 | 11 | (4;7;0) | 749 | pEARLI4 |
|  | 10 | (9;0;1) | 244 | RSZ32; nucleic acid binding |
|  | 10 | (8;2;0) | 1758 | 1-phosphatidylinositol-4-phosphate 5-kinase/zinc ion binding |
|  | 10 | (7;3;0) | 1468 | ESP4 (ENHANCED SILENCING PHENOTYPE 4); binding |
| AT1G31870.1 | 10 | (10;0;0) | 562 | Similar to splicing factor PWI domain-containing protein [Arabidopsis thaliana] (TAIR:AT2G29210.1) |
| AT5G47430.1 | 10 | (9;0;1) | 893 | Unknown function |
| AT4G32420.1 | 9 | (8;0;1) | 838 | Peptidyl-prolyl cis-trans isomerase cyclophilin-type family protein |
| AT5G10470.1 | 9 | (9;0;0) | 1274 | Kinesin motor protein-related |
| AT4G02510.1 | 9 | (8;1;0) | 1504 | TOC159 (translocon outer membrane complex 159) |
|  | 9 | (7;2;0) | 1847 | Transducin family protein/WD-40 repeat family protein |
| AT3G26935.1 | 9 | (6;2;1) | 444 | Zinc finger (DHHC type) family protein |
| AT5G47910.1 | 9 | (9;0;0) | 922 | RBOHD (RESPIRATORY BURST OXIDASE PROTEIN D) |
| AT5G61150.1 | 9 | (9;0;0) | 626 | VIP4 (VERNALIZATION INDEPENDENCE 4) |
| AT5G43310.1 | 9 | (5;4;0) | 1238 | COP1-interacting protein-related |
| AT5G40450.1 | 9 | (8;1;0) | 2890 | Unknown function |
|  | 9 | (8;1;0) | 263 | SCL30a (SC35-like splicing factor 30a); RNA binding |
| AT3G61860.1 | 9 | (8;0;1) | 265 | ATRSP31 (ARGININE/SERINE-RICH SPLICING FACTOR 31); RNA binding |
| AT1G48920.1 | 9 | (8;1;0) | 558 | Nucleolin, putative |
| AT1G19870.1 | 9 | (9;0;0) | 795 | IQD32 (IQ-domain 32); calmodulin binding |
A. thaliana proteins with potential phosphorylation hotspots consisting of experimental phosphorylation sites in a window of 10 amino acids.
| AGI | Number of significant windows | Number of phospho-sites in window of 10 amino acids (window start) | TAIR7 function |
|---|---|---|---|
| AT4G07523.1 | 4 | 5(6), 6(4), 4(7), 5(1) | Unknown function |
| AT4G33240.1 | 4 | 4(1538), 4(1545), 4(1557), 5(1543) | 1-phosphatidylinositol-4-phosphate 5-kinase/zinc ion binding |
| AT1G53165.1 | 3 | 4(442), 4(439), 5(441) | Kinase |
| AT2G46170.1 | 3 | 4(22), 5(28), 4(29) | Reticulon family protein (RTNLB5) |
| AT3G04650.1 | 3 | 4(4), 5(3), 4(1) | Oxidoreductase |
| AT1G35580.1 | 2 | 4(66), 4(44) | CINV1 (CYTOSOLIC INVERTASE 1); beta-fructofuranosidase |
| AT3G18180.1 | 2 | 4(67), 4(69) | Unknown function |
| AT1G08680.1 | 1 | 4(191) | ZIGA4 (ARF GAP-LIKE ZINC FINGER-CONTAINING PROTEIN ZIGA4); DNA binding |
| AT1G29220.1 | 1 | 4(81) | Transcriptional regulator family protein |
| AT1G55310.1 | 1 | 4(5) | SR33 (SC35-like splicing factor 33); RNA binding |
| AT1G70130.1 | 1 | 4(281) | Lectin protein kinase, putative |
| AT1G73200.1 | 1 | 4(313) | Unknown function |
| AT2G26730.1 | 1 | 4(632) | Leucine-rich repeat transmembrane protein kinase, putative |
| AT2G35880.1 | 1 | 4(109) | Unknown function |
| AT2G41705.1 | 1 | 4(61) | Camphor resistance CrcB family protein |
| AT2G46495.1 | 1 | 4(402) | Zinc finger (C3HC4-type RING finger) family protein |
| AT3G27960.1 | 1 | 4(574) | Kinesin light chain-related |
| AT3G29310.1 | 1 | 4(325) | Calmodulin-binding protein-related |
| AT3G29390.1 | 1 | 4(512) | RIK (RS2-INTERACTING KH PROTEIN) |
| AT3G48530.1 | 1 | 4(14) | CBS domain-containing protein |
| AT3G55460.1 | 1 | 4(176) | SCL30 (SC35-like splicing factor 30); RNA binding |
| AT3G58940.1 | 1 | 4(113) | F-box family protein |
| AT4G14605.1 | 1 | 4(290) | Mitochondrial transcription termination factor-related/mTERF-related |
| AT4G31580.1 | 1 | 4(170) | SRZ-22 (serine/arginine-rich 22) |
| AT4G32250.1 | 1 | 4(22) | Protein kinase family protein |
| AT5G02240.1 | 1 | 4(235) | Catalytic/coenzyme binding |
| AT5G14890.1 | 1 | 4(60) | NHL repeat-containing protein |
| AT5G52040.1 | 1 | 4(342) | ATRSP41 (Arabidopsis thaliana arginine/serine-rich splicing factor 41); RNA binding |
| AT5G64200.1 | 1 | 4(274) | ATSC35 ("Arabidopsis thaliana arginine/serine-rich splicing factor 35, 35 kDa protein"); RNA binding |
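The window counts reported in the table above can be illustrated with a simple sliding-window scan. The sketch below is not the authors' procedure: the significance criterion used in the study is not given in this excerpt, so a plain count threshold (`min_sites`, a hypothetical parameter) stands in for it. Positions are assumed to be 1-based, and the site positions in the example are invented.

```python
def count_sites_in_windows(site_positions, protein_length, window=10):
    """Return {window_start: number of phospho-sites} for every 1-based start."""
    sites = set(site_positions)
    counts = {}
    # The last full window starts at protein_length - window + 1.
    for start in range(1, protein_length - window + 2):
        counts[start] = sum(
            1 for pos in range(start, start + window) if pos in sites
        )
    return counts


def candidate_hotspots(site_positions, protein_length, window=10, min_sites=4):
    """List (window_start, n_sites) for windows reaching the count threshold.

    min_sites is an assumed stand-in for the study's significance test.
    """
    counts = count_sites_in_windows(site_positions, protein_length, window)
    return [(start, n) for start, n in counts.items() if n >= min_sites]


# Invented example: five phospho-sites in a protein of length 50.
print(candidate_hotspots([3, 5, 6, 9, 40], 50))
```

Overlapping windows that cover the same cluster of sites are reported separately, which matches the way several window starts are listed per protein in the table.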
Figure 2. Phosphorylation hotspots in A. thaliana. Hotspots in three A. thaliana proteins, identified from the analysis of experimental phosphorylation sites with a window size of 10 amino acids (Additional file 5). Hotspots are indicated by red boxes. A: AT4G07523.1 represents a protein of unknown function (TAIR7). B: AT2G46170.1 is annotated as a reticulon family protein (TAIR7). C: AT1G53165.1 was found to be a protein kinase (TAIR7). Amino acid residues S, T and Y are marked by green, blue and purple rectangles, respectively. Rectangles with a flag represent experimentally verified phosphorylation sites. Protein domains identified by Pfam are highlighted by yellow boxes.
Figure 3. Amino acids in the reference genome affected by SNPs. A. Log2 odds ratio relating the number of non-synonymous substitutions per amino acid to its abundance in the whole non-redundant proteome. All ratios are significantly different from 1 after Benjamini-Hochberg p-value correction [63] (FDR ≤ 5E-2, see Additional file 8). B. Proportion of amino acids affected by SNPs (synonymous and non-synonymous substitutions).
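The two quantities named in the Figure 3 caption, a log2 odds ratio and Benjamini-Hochberg adjusted p-values, can be sketched as follows. The 2x2 layout used for the odds ratio is an assumption made for illustration (the excerpt does not spell out the contingency table), and all numbers are invented.

```python
import math


def log2_odds_ratio(hits_aa, hits_other, abundance_aa, abundance_other):
    """Log2 odds ratio for one amino acid.

    Assumed 2x2 layout: substitutions hitting this amino acid vs. all others,
    compared with proteome occurrences of this amino acid vs. all others.
    """
    return math.log2((hits_aa * abundance_other) / (hits_other * abundance_aa))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented numbers for illustration only.
print(log2_odds_ratio(20, 80, 10, 90))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An amino acid would then be reported as enriched or depleted when its adjusted p-value is at or below the chosen FDR level (5E-2 in the caption) and the sign of the log2 odds ratio gives the direction.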
Figure 4. Effect of SNPs: comparison between experimental phosphorylation sites and non-phosphorylation sites. We evaluated the enrichment and depletion of each substitution pair from an experimentally identified phosphorylation site to any other amino acid by building a 2-way contingency table for each pair and testing, with Fisher's exact test, whether the odds ratio differs significantly from 1 (corrected p-value, FDR ≤ 5E-2). All ratios are statistically indistinguishable from 1 (log odds = 0). Only experimentally verified phosphoproteins were included in this analysis. Substitution amino acids in bold were never found, neither in phosphorylation sites nor in non-phosphorylation sites. Substitution amino acids in red were present among non-phosphorylation sites, but absent among phosphorylation sites. The substitution cost is the minimal number of DNA substitutions required to change one amino acid into another (see Additional file 9).
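The substitution cost defined in the Figure 4 caption can be computed directly from a codon table. The sketch below assumes the standard genetic code and takes the minimum number of differing nucleotide positions over all codon pairs; the function and constant names are illustrative, not from the study.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest); '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def substitution_cost(aa_from, aa_to):
    """Minimal number of nucleotide changes turning aa_from into aa_to."""
    codons_from = [c for c, aa in CODON_TABLE.items() if aa == aa_from]
    codons_to = [c for c, aa in CODON_TABLE.items() if aa == aa_to]
    if not codons_from or not codons_to:
        raise ValueError("unknown one-letter amino acid code")
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons_from
        for c2 in codons_to
    )


# Costs of replacing each phosphorylatable residue by a few other residues.
for source, target in [("S", "T"), ("Y", "W"), ("M", "Y")]:
    print(source, "->", target, substitution_cost(source, target))
```

Because a single SNP changes one nucleotide, only substitutions with a cost of 1 can arise from one polymorphic position; pairs with a cost of 2 or 3 need several changes within the same codon.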
Overview of the number of SNPs per dataset.
| Dataset | Number of non-redundant SNPs in this study | Number of non-redundant SNPs mapping onto cDNAs | Number of non-redundant SNPs mapping onto CDS | Number of non-redundant SNPs causing at least one non-synonymous substitution | Number of non-redundant SNPs always causing synonymous substitutions |
|---|---|---|---|---|---|
| Nordborg2005 | 20,667 | 9,251 | 8,023 | 4,047 | 3,975 |
| Clark2007 | 637,522 | 263,718 | 227,497 | 109,709 | 117,788 |
| Ossowski2008 | 860,154 | 220,984 | 174,559 | 84,400 | 90,159 |
| TOTAL | 1,247,284 | 382,770 | 315,039 | 156,034 | 159,004 |