Ghazaleh Taherzadeh (1), Abdollah Dehzangi (2), Maryam Golchin (1), Yaoqi Zhou (1,3), Matthew P Campbell (3). (1) School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia. (2) Department of Computer Science, Morgan State University, Baltimore, MD, USA. (3) Institute for Glycomics, Griffith University, Parklands Drive, Gold Coast, QLD, Australia.
Abstract
MOTIVATION: Protein glycosylation is one of the most abundant post-translational modifications and plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, because of the poor ionization efficiency and microheterogeneity of glycopeptides, identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N- and O-linked glycosylation sites, respectively. RESULTS: The method, called SPRINT-Gly, achieved consistent results between ten-fold cross-validation and independent tests for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well on human glycoproteins and vice versa; however, because of significant differences in O-linked sites, separate models were generated. Overall, the Matthews correlation coefficient of SPRINT-Gly is 18% and 50% higher than that of the next best method compared for N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure and sequence-based features. AVAILABILITY AND IMPLEMENTATION: http://sparks-lab.org/server/SPRINT-Gly/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
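The abstract reports performance as the Matthews correlation coefficient (MCC), which is well suited to site prediction because glycosylated residues are typically far rarer than non-glycosylated ones. The following is a minimal sketch of the standard MCC formula computed from confusion-matrix counts. The function name and the example counts are hypothetical illustrations and are not taken from the SPRINT-Gly paper or its code.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the Matthews correlation coefficient for a binary classifier.

    tp, tn, fp, fn are the true-positive, true-negative, false-positive and
    false-negative counts. When any marginal total is zero the coefficient is
    undefined; this sketch returns 0.0 in that case (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for an imbalanced site-prediction task
# (illustrative only, not results from the paper).
example_mcc = matthews_corrcoef_from_counts(tp=80, tn=900, fp=20, fn=20)
print(f"MCC = {example_mcc:.3f}")
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), and unlike accuracy it stays informative when the negative class dominates.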