Simone Marini (1), Francesca Vitali (2), Sara Rampazzi (3), Andrea Demartini (4), Tatsuya Akutsu (5). 1. Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA. 2. Department of Medicine, Center for Biomedical Informatics and Biostatistics, BIO5 Institute, University of Arizona, Tucson, AZ, USA. 3. Department of Computer Science and Engineering, University of Michigan, Ann Arbor, MI, USA. 4. Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy. 5. Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan.
Abstract
MOTIVATION: Protein cleavage is an important cellular event, involved in a myriad of processes, from apoptosis to immune response. Bioinformatics provides in silico tools, such as machine learning-based models, to guide the discovery of targets for the proteases responsible for protein cleavage. State-of-the-art models are limited in scope to specific protease families (such as caspases) and do not explicitly include biological or medical knowledge (such as hierarchical protein domain similarity or gene-gene interactions). To fill this gap, we present a novel approach for protease target prediction based on data integration. RESULTS: By representing protease-protein target information in the form of relational matrices, we design a model that (i) is general and not limited to a single protease family, and (ii) leverages the available knowledge, managing extremely sparse data from heterogeneous sources, including primary sequence, pathways, domains and interactions. When compared with other algorithms on test data, our approach performs better, even against models specifically focusing on a single protease family. AVAILABILITY AND IMPLEMENTATION: https://gitlab.com/smarini/MaDDA/ (Matlab code and data used). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.