Pengyi Yang (1), Sean J Humphrey (2), David E James (3), Yee Hwa Yang (4), Raja Jothi (1)
1. Systems Biology Section, Epigenetics & Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences, National Institutes of Health, RTP, NC 27709, USA. 2. Department of Proteomics and Signal Transduction, Max-Planck-Institute of Biochemistry, Martinsried, Germany. 3. Charles Perkins Centre, School of Molecular Bioscience, Sydney Medical School and 4. School of Mathematics and Statistics, The University of Sydney, NSW 2006, Australia.
Abstract
MOTIVATION: Protein phosphorylation is a post-translational modification that underlies various aspects of cellular signaling. A key step in reconstructing signaling networks is identifying the set of all kinases and their substrates. Experimental characterization of kinase substrates is both expensive and time-consuming. To expedite the discovery of novel substrates, computational approaches based on kinase recognition sequences (motifs) from known substrates, protein structure, interaction and co-localization have been proposed. However, these methods rarely take into account the dynamic responses of signaling cascades measured from in vivo cellular systems. Given that recent advances in mass spectrometry-based technologies make it possible to quantify phosphorylation on a proteome-wide scale, computational approaches that can integrate static features with dynamic phosphoproteome data would greatly facilitate the prediction of biologically relevant kinase-specific substrates.
RESULTS: Here, we propose a positive-unlabeled ensemble learning approach that integrates dynamic phosphoproteomics data with static kinase recognition motifs to predict novel substrates for kinases of interest. We extended a positive-unlabeled learning technique to an ensemble model, which significantly improves prediction sensitivity on novel substrates of kinases while retaining high specificity. We evaluated the performance of the proposed model using simulation studies and subsequently applied it to predict novel substrates of key kinases relevant to insulin signaling. Our analyses show that static sequence motifs and dynamic phosphoproteomics data are complementary and that the proposed integrated model predicts kinase-specific substrates more accurately than methods relying only on static information.
AVAILABILITY AND IMPLEMENTATION: An executable GUI tool, source code and documentation are freely available at https://github.com/PengyiYang/KSP-PUEL.
CONTACT: pengyi.yang@nih.gov or jothi@mail.nih.gov
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
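To make the idea of positive-unlabeled ensemble learning concrete, the following is a minimal sketch and not the authors' implementation (that is available at the repository above). It assumes a generic bagging-style scheme: each base model treats a random draw of unlabeled sites as provisional negatives, and the scores are averaged across models. The nearest-centroid base model, the function name pu_ensemble_scores, the two features (a static motif score and a dynamic response score) and all data values are hypothetical choices made for illustration only.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_ensemble_scores(positives, unlabeled, n_models=50, seed=0):
    """Score each unlabeled site in [0, 1]; higher means more positive-like.

    Hypothetical sketch: every base model draws as many unlabeled sites as
    there are positives (with replacement), treats them as provisional
    negatives, and scores each site by its relative closeness to the
    positive centroid. The ensemble score is the mean over base models.
    """
    rng = random.Random(seed)
    pos_c = centroid(positives)
    totals = [0.0] * len(unlabeled)
    for _ in range(n_models):
        provisional_negatives = [rng.choice(unlabeled) for _ in positives]
        neg_c = centroid(provisional_negatives)
        for i, x in enumerate(unlabeled):
            d_pos = sq_dist(x, pos_c)
            d_neg = sq_dist(x, neg_c)
            denom = d_pos + d_neg
            # A site equidistant from (or identical to) both centroids gets 0.5.
            totals[i] += d_neg / denom if denom > 0 else 0.5
    return [t / n_models for t in totals]


# Made-up features per site: (static motif score, dynamic response score).
known_substrates = [(0.90, 0.80), (0.80, 0.90), (0.95, 0.85)]
unlabeled_sites = [
    (0.85, 0.90),  # resembles the known substrates
    (0.10, 0.20),
    (0.20, 0.10),
    (0.15, 0.15),
    (0.05, 0.10),
    (0.10, 0.05),
]

scores = pu_ensemble_scores(known_substrates, unlabeled_sites)
for site, score in zip(unlabeled_sites, scores):
    print(site, round(score, 3))
```

Averaging over many random draws is what makes the scheme tolerant of true substrates hiding in the unlabeled set: any single draw may include one as a provisional negative, but the effect is diluted across the ensemble. A real application would swap in a stronger base classifier and properly derived features.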