MOTIVATION: Granzyme B (GrB) and caspases cleave specific protein substrates to induce apoptosis in virally infected and neoplastic cells. While substrates for both types of proteases have been determined experimentally, there are many more yet to be discovered in humans and other metazoans. Here, we present a bioinformatics method based on support vector machine (SVM) learning that identifies sequence and structural features important for protease recognition of substrate peptides and then uses these features to predict novel substrates. Our approach can act as a convenient hypothesis generator, guiding future experiments by high-confidence identification of peptide-protein partners.

RESULTS: The method is benchmarked on the known substrates of both protease types, including our literature-curated GrB substrate set (GrBah). On these benchmark sets, the method outperforms a number of other methods that consider sequence only, predicting at a 0.87 true positive rate (TPR) and a 0.13 false positive rate (FPR) for caspase substrates, and a 0.79 TPR and a 0.21 FPR for GrB substrates. The method is then applied to approximately 25 000 proteins in the human proteome to generate a ranked list of predicted substrates of each protease type. Two of these predictions, AIF-1 and SMN1, were selected for further experimental analysis, and each was validated as a GrB substrate.

AVAILABILITY: All predictions for both protease types are publicly available at http://salilab.org/peptide. A web server at the same site allows a user to train new SVM models and make predictions for any protein that recognizes specific oligopeptide ligands.
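
For readers unfamiliar with the benchmark metrics quoted in RESULTS, the following is a minimal sketch of how a TPR and FPR pair can be computed from classifier scores at a chosen threshold. It is not the authors' evaluation code: the function name `tpr_fpr`, the threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
def tpr_fpr(labels, scores, threshold):
    """Return (TPR, FPR) when peptides scoring >= threshold are called substrates.

    labels: 1 for a known substrate peptide, 0 for a non-substrate.
    scores: classifier outputs (for example SVM decision values); higher
            means more substrate-like.
    """
    tp = fp = fn = tn = 0
    for is_substrate, score in zip(labels, scores):
        predicted = score >= threshold
        if is_substrate and predicted:
            tp += 1   # substrate correctly predicted
        elif is_substrate:
            fn += 1   # substrate missed
        elif predicted:
            fp += 1   # non-substrate wrongly predicted
        else:
            tn += 1   # non-substrate correctly rejected
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical toy data, not taken from the benchmark sets.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
tpr, fpr = tpr_fpr(labels, scores, threshold=0.5)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Lowering the threshold raises both rates, so any reported TPR and FPR pair describes one operating point of the classifier rather than its performance as a whole.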