Panayotis Vlastaridis (1), Pelagia Kyriakidou (1), Anargyros Chaliotis (1), Yves Van de Peer (2,3,4), Stephen G Oliver (5), Grigoris D Amoutzias (1).
1. Bioinformatics Laboratory, Department of Biochemistry and Biotechnology, University of Thessaly, Larisa, 41500, Greece.
2. Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, B-9052 Ghent, Belgium.
3. Bioinformatics Institute Ghent, Technologiepark 927, B-9052 Ghent, Belgium.
4. Department of Genetics, Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa.
5. Cambridge Systems Biology Centre & Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK.
Abstract
BACKGROUND: Phosphorylation is the most frequent post-translational modification made to proteins and may regulate protein activity as either a molecular digital switch or a rheostat. Despite the cornucopia of high-throughput (HTP) phosphoproteomic data in the last decade, it remains unclear how many proteins are phosphorylated and how many phosphorylation sites (p-sites) can exist in total within a eukaryotic proteome. We present the first reliable estimates of the total number of phosphoproteins and p-sites for four eukaryotes (human, mouse, Arabidopsis, and yeast).

RESULTS: In all, 187 HTP phosphoproteomic datasets were filtered, compiled, and studied along with two low-throughput (LTP) compendia. Estimates of the number of phosphoproteins and p-sites were inferred by two methods: Capture-Recapture, and fitting the saturation curve of cumulative redundant vs. cumulative non-redundant phosphoproteins/p-sites. Estimates were also adjusted for different levels of noise within the individual datasets and other confounding factors. We estimate that in total, 13 000, 11 000, and 3000 phosphoproteins and 230 000, 156 000, and 40 000 p-sites exist in human, mouse, and yeast, respectively, whereas estimates for Arabidopsis were not as reliable.

CONCLUSIONS: Most of the phosphoproteins have been discovered for human, mouse, and yeast, while the dataset for Arabidopsis is still far from complete. The datasets for p-sites are not as close to saturation as those for phosphoproteins. Integration of the LTP data suggests that current HTP phosphoproteomics appears to be capable of capturing 70% to 95% of total phosphoproteins, but only 40% to 60% of total p-sites.
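The Capture-Recapture idea named in the abstract can be illustrated with a minimal two-sample sketch. The abstract does not say which estimator the authors used, so the choice here (the classic Lincoln-Petersen estimator and its Chapman correction) is an assumption, as are the function names and the toy protein identifiers, which are invented purely for illustration. The sketch also ignores the noise adjustments and confounding factors that the abstract says were applied.

```python
def lincoln_petersen(sample_a, sample_b):
    """Two-sample Lincoln-Petersen estimate of total population size.

    N ~ n1 * n2 / m, where n1 and n2 are the sizes of the two samples
    and m is the number of items seen in both.
    """
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between samples; estimate is undefined")
    return len(a) * len(b) / overlap


def chapman(sample_a, sample_b):
    """Chapman's bias-corrected variant; defined even when overlap is zero."""
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (overlap + 1) - 1


# Hypothetical toy data: phosphoproteins reported by two independent datasets.
dataset_a = {"P1", "P2", "P3", "P4", "P5", "P6"}
dataset_b = {"P4", "P5", "P6", "P7", "P8"}

lp_estimate = lincoln_petersen(dataset_a, dataset_b)
chapman_estimate = chapman(dataset_a, dataset_b)
print(f"Lincoln-Petersen: {lp_estimate:.1f}, Chapman: {chapman_estimate:.1f}")
```

In this toy case the two datasets jointly observe 8 distinct proteins, and both estimators suggest a slightly larger total, which is the sense in which overlap between independent experiments hints at how much remains undiscovered.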
Keywords:
Arabidopsis; Capture-Recapture; Curve-Fitting; Phosphoproteomics; human; mouse; total number of phosphoproteins; total number of phosphorylation sites; yeast
Authors: Panayotis Vlastaridis; Pelagia Kyriakidou; Anargyros Chaliotis; Yves Van de Peer; Stephen G Oliver; Grigoris D Amoutzias Journal: Gigascience Date: 2017-02-01 Impact factor: 6.524