Fuyi Li1,2, Jinxiang Chen1,3, André Leier4,5, Tatiana Marquez-Lago4,5, Quanzhong Liu3, Yanze Wang3, Jerico Revote1, A Ian Smith1, Tatsuya Akutsu6, Geoffrey I Webb2, Lukasz Kurgan7, Jiangning Song1,2,8. 1. Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia. 2. Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia. 3. College of Information Engineering, Northwest A&F University, Yangling 712100, China. 4. Department of Genetics, USA. 5. Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA. 6. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan. 7. Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA. 8. ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia.
Abstract
MOTIVATION: Proteases are enzymes that cleave target substrate proteins by catalyzing the hydrolysis of peptide bonds between specific amino acids. While the functional proteolysis regulated by proteases plays a central role in 'life and death' cellular processes, many of the corresponding substrates and their cleavage sites have not yet been identified. The availability of accurate predictors of substrates and cleavage sites would facilitate understanding of proteases' functions and physiological roles. Deep learning is a promising approach for developing accurate predictors of substrate cleavage events. RESULTS: We propose DeepCleave, the first deep learning-based predictor of protease-specific substrates and cleavage sites. DeepCleave uses protein substrate sequence data as input and employs convolutional neural networks with transfer learning to train accurate predictive models. The high predictive performance of our models stems from the use of high-quality cleavage site features extracted from the substrate sequences through the deep learning process, and from the application of transfer learning, multiple kernels and an attention layer in the design of the deep network. Empirical tests against several related state-of-the-art methods demonstrate that DeepCleave outperforms these methods in predicting caspase and matrix metalloprotease substrate-cleavage sites. AVAILABILITY AND IMPLEMENTATION: The DeepCleave webserver and source code are freely available at http://deepcleave.erc.monash.edu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
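ILLUSTRATION: The abstract states that DeepCleave takes substrate sequences as input to a convolutional network, but it does not specify the preprocessing. The sketch below shows one common way such input could be prepared: a fixed-length window is cut around every peptide bond and one-hot encoded. The window size (`flank=4`), the padding symbol `X`, and the function names `candidate_windows` and `one_hot` are assumptions made for illustration and are not taken from DeepCleave.

```python
# Illustrative sketch only: window size, padding and encoding are assumptions,
# not DeepCleave's documented preprocessing.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def candidate_windows(sequence, flank=4, pad="X"):
    """Yield (bond_position, window) for every peptide bond in `sequence`.

    bond_position p (1-based) denotes the bond between residues p and p+1.
    Each window holds `flank` residues on either side of that bond; positions
    that fall outside the sequence are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    for p in range(1, len(sequence)):
        yield p, padded[p:p + 2 * flank]


def one_hot(window):
    """Encode a window as a list of 20-dimensional one-hot rows.

    The padding symbol and any non-standard residue map to an all-zero row.
    """
    rows = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Toy peptide containing DEVD, a well-known caspase-3 recognition motif.
    for position, window in candidate_windows("DEVDGA"):
        matrix = one_hot(window)
        print(position, window, len(matrix), "x", len(matrix[0]))
```

Each encoded window is a small matrix (here 8 rows by 20 columns) that a one-dimensional convolutional layer could scan with kernels of different widths; a per-bond score from the network would then rank candidate cleavage sites.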