Fuyi Li, Yang Zhang, Anthony W Purcell, Geoffrey I Webb, Kuo-Chen Chou, Trevor Lithgow, Chen Li, Jiangning Song.
Abstract
BACKGROUND: As an important type of post-translational modification (PTM), protein glycosylation plays a crucial role in protein stability and protein function. Its abundance and ubiquity across the three domains of life (Eukarya, Bacteria and Archaea) reflect its roles in regulating a variety of signalling and metabolic pathways. Mutations at or near glycosylation sites are highly associated with human diseases. Accordingly, accurate prediction of glycosylation sites can complement laboratory-based methods and greatly benefit experimental efforts to characterize and understand the functional roles of glycosylation. For this purpose, a number of supervised-learning approaches have been proposed to identify glycosylation sites, and they have shown promising predictive performance. Training a conventional supervised-learning model requires both reliable positive and reliable negative samples. In practice, however, a large portion of negative samples (i.e. non-glycosylation sites) are mislabelled owing to the limitations of current experimental technologies. Moreover, supervised algorithms often fail to take advantage of large volumes of unlabelled data, which can aid model learning in conjunction with positive samples (i.e. experimentally verified glycosylation sites).
Keywords: AlphaMax; Positive unlabelled-learning; Protein glycosylation prediction; Sequence analysis; Sequence-derived features; Supervised-learning
Year: 2019 PMID: 30841845 PMCID: PMC6404354 DOI: 10.1186/s12859-019-2700-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The overall framework of the experiments
A statistical summary of glycosylated proteins and glycosylation sites collected from 2007, 2010, 2013, and 2016 data
| Year | Type | Sites (before redundancy removal) | Substrates (before redundancy removal) | Sites (after redundancy removal) | Substrates (after redundancy removal) |
|---|---|---|---|---|---|
| 2007 | C-linked | 36 | 10 | 36 | 10 |
| | N-linked | 1245 | 537 | 1208 | 520 |
| | O-linked | 321 | 101 | 320 | 100 |
| 2010 | C-linked | 38 | 12 | 38 | 12 |
| | N-linked | 2175 | 908 | 2118 | 872 |
| | O-linked | 345 | 114 | 344 | 113 |
| 2013 | C-linked | 43 | 15 | 43 | 15 |
| | N-linked | 2508 | 1004 | 2442 | 965 |
| | O-linked | 474 | 178 | 455 | 162 |
| 2016 | C-linked | 46 | 17 | 46 | 17 |
| | N-linked | 2805 | 1111 | 2728 | 1066 |
| | O-linked | 698 | 221 | 679 | 212 |
Summary of the results for mislabelled negative sites
| Year | Measure | C-linked | N-linked | O-linked |
|---|---|---|---|---|
| 2010 | N1 | 0 | 237 (26.04%) | 22 (91.67%) |
| | P1 | 11.76% | 3.38% | 1.26% |
| | P2 | 14.86% | 3.41% | 1.28% |
| | P3 | 12.07% | 3.39% | 1.28% |
| 2013 | N1 | 0 | 119 (36.73%) | 32 (19.82%) |
| | P1 | 8.35% | 3.01% | 1.22% |
| | P2 | 9.36% | 4.36% | 1.24% |
| | P3 | 8.62% | 3.97% | 1.22% |
| 2016 | N1 | 0 | 99 (34.62%) | 32 (19.82%) |
| | P1 | 6.51% | 3.09% | 1.11% |
| | P2 | 7.15% | 4.68% | 1.13% |
| | P3 | 6.63% | 3.83% | 1.13% |
Note: N1, the number of mislabelled non-glycosylation sites, with the percentage relative to the previous collection year in parentheses; P1, the actual class probability of glycosylation sites; P2, the prior probability of glycosylation sites estimated by the Elkan-Noto algorithm; P3, the prior probability of glycosylation sites estimated by the AlphaMax algorithm
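The P2 row relies on the Elkan-Noto estimator. The sketch below illustrates only its core identity, under two assumptions that are not taken from this record: labelled positives are selected completely at random from all positives, and a probabilistic classifier g(x), trained to separate labelled from unlabelled sites, is already available. All function names and the toy scores are hypothetical, and this is not the authors' implementation.

```python
from statistics import mean


def estimate_label_frequency(scores_on_labelled_positives):
    """Estimate c = p(s=1 | y=1) as the mean classifier score g(x)
    over held-out labelled positives (hypothetical helper)."""
    if not scores_on_labelled_positives:
        raise ValueError("need at least one held-out labelled positive")
    return mean(scores_on_labelled_positives)


def estimate_class_prior(scores_on_labelled_positives, n_labelled, n_total):
    """Estimate p(y=1) over the whole sample from p(s=1) = c * p(y=1)."""
    c = estimate_label_frequency(scores_on_labelled_positives)
    labelled_fraction = n_labelled / n_total  # empirical p(s=1)
    return min(1.0, labelled_fraction / c)


def posterior_positive(score, c):
    """Convert g(x) = p(s=1 | x) into p(y=1 | x) = g(x) / c, capped at 1."""
    return min(1.0, score / c)


# Toy numbers, purely illustrative (not data from the study).
held_out_scores = [0.5, 0.25, 0.75]
prior = estimate_class_prior(held_out_scores, n_labelled=100, n_total=1000)
print(f"estimated c = {estimate_label_frequency(held_out_scores):.2f}")
print(f"estimated class prior = {prior:.2f}")
```

The estimate is only as good as the classifier's calibration on the held-out positives, which is one reason to compare it with a second estimator such as AlphaMax (P3).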
Fig. 2 Rapid increase in the numbers of glycosylation sites and unlabelled samples over time (2007 to 2016)
Performance comparison of PU-learning, supervised-learning, and one-class classification algorithms on the benchmark datasets
| Type | Algorithm | F1 (2007) | ACC (2007) | AUC (2007) | F1 (2010) | ACC (2010) | AUC (2010) | F1 (2013) | ACC (2013) | AUC (2013) |
|---|---|---|---|---|---|---|---|---|---|---|
| C | PA2DE (V2.0) | 0.917 ± 0.049 | 0.909 ± 0.041 | 0.975 ± 0.061 | 0.917 ± 0.147 | 0.917 ± 0.087 | 0.948 ± 0.074 | 0.917 ± 0.034 | 0.923 ± 0.073 | 0.941 ± 0.037 |
| | PA2DE | 0.910 ± 0.051 | 0.904 ± 0.055 | 0.966 ± 0.039 | 0.898 ± 0.049 | 0.890 ± 0.054 | 0.925 ± 0.055 | 0.915 ± 0.044 | 0.910 ± 0.049 | 0.937 ± 0.046 |
| | PAODE | 0.843 ± 0.082 | 0.843 ± 0.077 | 0.912 ± 0.071 | 0.826 ± 0.111 | 0.837 ± 0.086 | 0.902 ± 0.083 | 0.870 ± 0.083 | 0.870 ± 0.074 | 0.933 ± 0.060 |
| | PNB | 0.889 ± 0.047 | 0.872 ± 0.060 | 0.943 ± 0.048 | 0.906 ± 0.047 | 0.895 ± 0.057 | 0.923 ± 0.047 | 0.908 ± 0.046 | 0.899 ± 0.056 | 0.936 ± 0.043 |
| | PTAN | 0.861 ± 0.056 | 0.841 ± 0.070 | 0.932 ± 0.049 | 0.887 ± 0.053 | 0.875 ± 0.064 | 0.923 ± 0.053 | 0.918 ± 0.046 | 0.912 ± 0.052 | 0.951 ± 0.039 |
| | PFBC | 0.869 ± 0.049 | 0.846 ± 0.066 | 0.947 ± 0.042 | 0.882 ± 0.045 | 0.866 ± 0.058 | 0.935 ± 0.044 | 0.893 ± 0.054 | 0.879 ± 0.068 | 0.940 ± 0.046 |
| | RF[a] | 0.847 ± 0.070 | 0.835 ± 0.075 | 0.922 ± 0.064 | 0.864 ± 0.058 | 0.856 ± 0.059 | 0.922 ± 0.056 | 0.883 ± 0.057 | 0.875 ± 0.062 | 0.941 ± 0.051 |
| | SVM | 0.810 ± 0.095 | 0.814 ± 0.080 | 0.814 ± 0.080 | 0.851 ± 0.083 | 0.853 ± 0.073 | 0.853 ± 0.073 | 0.847 ± 0.085 | 0.855 ± 0.068 | 0.855 ± 0.068 |
| | O-SVM[b] | 0.365 ± 0.152 | 0.612 ± 0.065 | 0.612 ± 0.065 | 0.400 ± 0.151 | 0.613 ± 0.066 | 0.613 ± 0.066 | 0.366 ± 0.136 | 0.606 ± 0.055 | 0.606 ± 0.055 |
| | O-Classifier[c] | 0.680 ± 0.142 | 0.760 ± 0.081 | 0.760 ± 0.081 | 0.662 ± 0.139 | 0.740 ± 0.082 | 0.740 ± 0.082 | 0.726 ± 0.122 | 0.785 ± 0.076 | 0.785 ± 0.076 |
| N | PA2DE (V2.0) | 0.933 ± 0.041 | 0.928 ± 0.051 | 0.929 ± 0.012 | 0.989 ± 0.011 | 0.985 ± 0.003 | 0.998 ± 0.013 | 0.990 ± 0.012 | 0.985 ± 0.003 | 0.998 ± 0.011 |
| | PA2DE | 0.916 ± 0.007 | 0.910 ± 0.008 | 0.914 ± 0.008 | 0.987 ± 0.003 | 0.983 ± 0.004 | 0.998 ± 0.002 | 0.988 ± 0.003 | 0.984 ± 0.004 | 0.998 ± 0.001 |
| | PAODE | 0.916 ± 0.009 | 0.910 ± 0.009 | 0.943 ± 0.026 | 0.940 ± 0.051 | 0.920 ± 0.049 | 0.928 ± 0.018 | 0.957 ± 0.004 | 0.938 ± 0.006 | 0.928 ± 0.015 |
| | PNB | 0.916 ± 0.009 | 0.910 ± 0.009 | 0.943 ± 0.026 | 0.948 ± 0.005 | 0.928 ± 0.007 | 0.923 ± 0.009 | 0.957 ± 0.004 | 0.938 ± 0.006 | 0.925 ± 0.009 |
| | PTAN | 0.985 ± 0.004 | 0.985 ± 0.004 | 0.996 ± 0.002 | 0.929 ± 0.010 | 0.899 ± 0.015 | 0.915 ± 0.010 | 0.939 ± 0.010 | 0.909 ± 0.015 | 0.920 ± 0.009 |
| | PFBC | 0.916 ± 0.008 | 0.910 ± 0.009 | 0.945 ± 0.029 | 0.949 ± 0.005 | 0.929 ± 0.007 | 0.937 ± 0.022 | 0.957 ± 0.004 | 0.938 ± 0.006 | 0.937 ± 0.021 |
| | RF[a] | 0.980 ± 0.005 | 0.980 ± 0.005 | 0.994 ± 0.003 | 0.984 ± 0.004 | 0.978 ± 0.005 | 0.997 ± 0.002 | 0.985 ± 0.003 | 0.979 ± 0.004 | 0.997 ± 0.002 |
| | SVM | 0.916 ± 0.007 | 0.910 ± 0.008 | 0.910 ± 0.008 | 0.948 ± 0.005 | 0.928 ± 0.007 | 0.912 ± 0.009 | 0.957 ± 0.004 | 0.938 ± 0.006 | 0.916 ± 0.009 |
| | O-SVM[b] | 0.551 ± 0.125 | 0.695 ± 0.695 | 0.695 ± 0.061 | 0.553 ± 0.108 | 0.585 ± 0.069 | 0.695 ± 0.051 | 0.567 ± 0.097 | 0.570 ± 0.066 | 0.701 ± 0.046 |
| | O-Classifier[c] | 0.868 ± 0.052 | 0.850 ± 0.045 | 0.894 ± 0.045 | 0.871 ± 0.037 | 0.855 ± 0.043 | 0.897 ± 0.036 | 0.923 ± 0.042 | 0.896 ± 0.051 | 0.904 ± 0.038 |
| O | PA2DE (V2.0) | 0.979 ± 0.007 | 0.979 ± 0.011 | 0.995 ± 0.007 | 0.986 ± 0.007 | 0.986 ± 0.010 | 0.997 ± 0.006 | 0.982 ± 0.013 | 0.982 ± 0.007 | 0.996 ± 0.012 |
| | PA2DE | 0.977 ± 0.011 | 0.978 ± 0.011 | 0.996 ± 0.012 | 0.983 ± 0.008 | 0.983 ± 0.008 | 0.997 ± 0.002 | 0.977 ± 0.013 | 0.977 ± 0.013 | 0.996 ± 0.005 |
| | PAODE | 0.968 ± 0.012 | 0.968 ± 0.013 | 0.994 ± 0.004 | 0.980 ± 0.010 | 0.980 ± 0.009 | 0.998 ± 0.002 | 0.967 ± 0.011 | 0.968 ± 0.010 | 0.995 ± 0.005 |
| | PNB | 0.984 ± 0.007 | 0.984 ± 0.007 | 0.997 ± 0.002 | 0.967 ± 0.010 | 0.966 ± 0.011 | 0.998 ± 0.001 | 0.980 ± 0.009 | 0.980 ± 0.008 | 0.997 ± 0.002 |
| | PTAN | 0.938 ± 0.016 | 0.936 ± 0.017 | 0.987 ± 0.007 | 0.942 ± 0.013 | 0.940 ± 0.014 | 0.988 ± 0.006 | 0.935 ± 0.014 | 0.932 ± 0.015 | 0.987 ± 0.006 |
| | PFBC | 0.967 ± 0.012 | 0.966 ± 0.013 | 0.998 ± 0.002 | 0.965 ± 0.011 | 0.965 ± 0.011 | 0.995 ± 0.003 | 0.963 ± 0.010 | 0.962 ± 0.010 | 0.993 ± 0.004 |
| | RF[a] | 0.979 ± 0.016 | 0.979 ± 0.016 | 0.994 ± 0.006 | 0.980 ± 0.016 | 0.981 ± 0.015 | 0.996 ± 0.004 | 0.967 ± 0.011 | 0.968 ± 0.010 | 0.995 ± 0.005 |
| | SVM | 0.977 ± 0.011 | 0.978 ± 0.011 | 0.996 ± 0.012 | 0.974 ± 0.014 | 0.974 ± 0.013 | 0.996 ± 0.005 | 0.981 ± 0.014 | 0.981 ± 0.013 | 0.994 ± 0.005 |
| | O-SVM[b] | 0.582 ± 0.053 | 0.691 ± 0.026 | 0.691 ± 0.026 | 0.575 ± 0.575 | 0.681 ± 0.028 | 0.681 ± 0.028 | 0.578 ± 0.045 | 0.666 ± 0.027 | 0.666 ± 0.027 |
| | O-Classifier[c] | 0.695 ± 0.039 | 0.593 ± 0.080 | 0.593 ± 0.080 | 0.665 ± 0.026 | 0.535 ± 0.059 | 0.535 ± 0.059 | 0.702 ± 0.026 | 0.621 ± 0.059 | 0.621 ± 0.059 |
[a] RF – Random Forest; [b] O-SVM – One-class SVM; [c] O-Classifier – One-class Classifier
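The table reports F1, accuracy (ACC) and AUC for each algorithm. As a reminder of what those three numbers measure, here is a minimal pure-Python sketch using the standard definitions. The function names and the toy labels and scores are illustrative assumptions; this is not the authors' evaluation code or data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of sites whose predicted label matches the true label."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall; 0.0 if there are no true positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: six sites, predicted labels obtained by thresholding scores at 0.5.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

F1 and ACC depend on the chosen decision threshold, whereas AUC is computed from the ranking of scores alone.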
The numbers of glycosylated proteins and corresponding sites included in the test datasets
| Type | Num. of Sites | Num. of Substrates |
|---|---|---|
| C-linked | 3 | 2 |
| N-linked | 324 | 156 |
| O-linked | 244 | 76 |
Performance comparison of PU-learning, supervised-learning, and one-class classification algorithms on the test datasets
| Type | Algorithm | F1 (2007) | ACC (2007) | AUC (2007) | F1 (2010) | ACC (2010) | AUC (2010) | F1 (2013) | ACC (2013) | AUC (2013) |
|---|---|---|---|---|---|---|---|---|---|---|
| N | PA2DE (V2.0) | 0.935 ± 0.003 | 0.934 ± 0.004 | 0.949 ± 0.007 | 0.933 ± 0.001 | 0.932 ± 0.002 | 0.950 ± 0.005 | 0.962 ± 0.011 | 0.963 ± 0.007 | 0.997 ± 0.008 |
| | PA2DE | 0.930 ± 0.002 | 0.928 ± 0.002 | 0.943 ± 0.004 | 0.930 ± 0.002 | 0.929 ± 0.003 | 0.947 ± 0.003 | 0.951 ± 0.018 | 0.952 ± 0.017 | 0.996 ± 0.002 |
| | PAODE | 0.929 ± 0.013 | 0.928 ± 0.010 | 0.958 ± 0.013 | 0.922 ± 0.051 | 0.923 ± 0.035 | 0.950 ± 0.010 | 0.931 ± 0.002 | 0.929 ± 0.003 | 0.950 ± 0.008 |
| | PNB | 0.929 ± 0.004 | 0.928 ± 0.004 | 0.950 ± 0.013 | 0.896 ± 0.074 | 0.887 ± 0.094 | 0.954 ± 0.070 | 0.931 ± 0.003 | 0.929 ± 0.003 | 0.955 ± 0.011 |
| | PTAN | 0.916 ± 0.013 | 0.913 ± 0.015 | 0.933 ± 0.004 | 0.876 ± 0.019 | 0.860 ± 0.024 | 0.938 ± 0.006 | 0.875 ± 0.019 | 0.859 ± 0.024 | 0.941 ± 0.003 |
| | PFBC | 0.910 ± 0.016 | 0.904 ± 0.018 | 0.939 ± 0.004 | 0.930 ± 0.004 | 0.928 ± 0.005 | 0.955 ± 0.012 | 0.893 ± 0.076 | 0.882 ± 0.096 | 0.939 ± 0.088 |
| | RF[a] | 0.924 ± 0.018 | 0.929 ± 0.016 | 0.994 ± 0.004 | 0.922 ± 0.039 | 0.923 ± 0.035 | 0.950 ± 0.010 | 0.931 ± 0.002 | 0.929 ± 0.003 | 0.947 ± 0.003 |
| | SVM | 0.919 ± 0.002 | 0.907 ± 0.002 | 0.935 ± 0.002 | 0.904 ± 0.002 | 0.897 ± 0.003 | 0.929 ± 0.003 | 0.931 ± 0.002 | 0.929 ± 0.003 | 0.929 ± 0.003 |
| | O-SVM[b] | 0.683 ± 0.009 | 0.740 ± 0.011 | 0.740 ± 0.011 | 0.689 ± 0.010 | 0.748 ± 0.012 | 0.748 ± 0.012 | 0.689 ± 0.010 | 0.747 ± 0.012 | 0.747 ± 0.012 |
| | O-Classifier[c] | 0.820 ± 0.032 | 0.847 ± 0.023 | 0.847 ± 0.023 | 0.849 ± 0.029 | 0.865 ± 0.023 | 0.865 ± 0.023 | 0.836 ± 0.029 | 0.857 ± 0.021 | 0.857 ± 0.021 |
| O | PA2DE (V2.0) | 0.933 ± 0.046 | 0.930 ± 0.053 | 0.986 ± 0.010 | 0.945 ± 0.021 | 0.943 ± 0.014 | 0.995 ± 0.012 | 0.986 ± 0.013 | 0.986 ± 0.020 | 0.997 ± 0.031 |
| | PA2DE | 0.928 ± 0.052 | 0.924 ± 0.060 | 0.978 ± 0.022 | 0.932 ± 0.018 | 0.928 ± 0.050 | 0.981 ± 0.019 | 0.974 ± 0.019 | 0.974 ± 0.019 | 0.994 ± 0.006 |
| | PAODE | 0.848 ± 0.061 | 0.816 ± 0.090 | 0.976 ± 0.006 | 0.923 ± 0.017 | 0.926 ± 0.014 | 0.984 ± 0.019 | 0.952 ± 0.015 | 0.955 ± 0.013 | 0.996 ± 0.007 |
| | PNB | 0.906 ± 0.030 | 0.896 ± 0.036 | 0.989 ± 0.002 | 0.926 ± 0.017 | 0.921 ± 0.020 | 0.991 ± 0.002 | 0.970 ± 0.012 | 0.969 ± 0.012 | 0.997 ± 0.001 |
| | PTAN | 0.798 ± 0.075 | 0.832 ± 0.051 | 0.961 ± 0.011 | 0.844 ± 0.044 | 0.815 ± 0.067 | 0.924 ± 0.052 | 0.886 ± 0.064 | 0.867 ± 0.090 | 0.972 ± 0.035 |
| | PFBC | 0.838 ± 0.046 | 0.810 ± 0.070 | 0.916 ± 0.057 | 0.910 ± 0.031 | 0.901 ± 0.038 | 0.990 ± 0.002 | 0.904 ± 0.073 | 0.886 ± 0.103 | 0.991 ± 0.004 |
| | RF[a] | 0.914 ± 0.019 | 0.919 ± 0.016 | 0.984 ± 0.015 | 0.923 ± 0.017 | 0.926 ± 0.014 | 0.984 ± 0.019 | 0.952 ± 0.015 | 0.955 ± 0.013 | 0.996 ± 0.007 |
| | SVM | 0.924 ± 0.014 | 0.919 ± 0.016 | 0.988 ± 0.002 | 0.930 ± 0.008 | 0.975 ± 0.009 | 0.975 ± 0.009 | 0.920 ± 0.019 | 0.924 ± 0.020 | 0.974 ± 0.001 |
| | O-SVM[b] | 0.677 ± 0.016 | 0.537 ± 0.033 | 0.537 ± 0.033 | 0.661 ± 0.006 | 0.506 ± 0.012 | 0.506 ± 0.012 | 0.665 ± 0.007 | 0.007 ± 0.016 | 0.506 ± 0.016 |
| | O-Classifier[c] | 0.141 ± 0.141 | 0.529 ± 0.033 | 0.529 ± 0.033 | 0.135 ± 0.100 | 0.532 ± 0.025 | 0.532 ± 0.025 | 0.144 ± 0.116 | 0.527 ± 0.035 | 0.527 ± 0.035 |
[a] RF – Random Forest; [b] O-SVM – One-class SVM; [c] O-Classifier – One-class Classifier
Statistical significance of PA2DE performance in terms of F1 scores relative to the RF and SVM algorithms on the test datasets
| Type | Comparison | 2007 | 2010 | 2013 |
|---|---|---|---|---|
| N-linked | PA2DE vs. Random Forest | 6.35E-04 | 0.0369 | 6.07E-23 |
| | PA2DE vs. SVM | 5.61E-21 | 8.10E-06 | 6.96E-23 |
| | PA2DE (V2.0) vs. Random Forest | 2.44E-09 | 0.0233 | 7.35E-40 |
| | PA2DE (V2.0) vs. SVM | 1.09E-32 | 1.34E-06 | 8.33E-40 |
| O-linked | PA2DE vs. Random Forest | 0.0104 | 6.10E-04 | 1.86E-09 |
| | PA2DE vs. SVM | 0.4566 | 0.0210 | 9.64E-16 |
| | PA2DE (V2.0) vs. Random Forest | 6.23E-04 | 1.04E-04 | 6.53E-20 |
| | PA2DE (V2.0) vs. SVM | 0.0986 | 9.19E-04 | 8.19E-29 |
Fig. 3 Boxplots showing that PA2DE outperformed the RF and SVM algorithms in terms of F1 score on the test datasets
A statistical summary of glycosylated proteins and glycosylation sites collected from UniProt, dbPTM and PhosphoSitePlus
| Type | Proteins (before redundancy removal) | Sites (before redundancy removal) | Proteins (after redundancy removal) | Sites (after redundancy removal) |
|---|---|---|---|---|
| C-linked | 13 | 134 | 10 | 109 |
| N-linked (motif) | 1103 | 3850 | 770 | 2669 |
| N-linked (non-motif) | 100 | 158 | 91 | 146 |
| O-linked (S) | 192 | 683 | 165 | 602 |
| O-linked (T) | 169 | 2150 | 155 | 2095 |
Numbers of glycosylation sites included in the training sets and independent test sets
| Type | Training set | Independent test set |
|---|---|---|
| C-linked | 76 | 33 |
| N-linked (motif) | 1869 | 800 |
| N-linked (non-motif) | 102 | 44 |
| O-linked (S) | 421 | 181 |
| O-linked (T) | 1467 | 628 |
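The training and test counts can be cross-checked against the post-redundancy-removal site totals in the preceding table. The sketch below is only a consistency check on the published numbers. The variable and function names are illustrative, and the roughly 70/30 ratio it reports is an observation from these counts, not a split ratio stated in this record.

```python
# Site totals after redundancy removal, copied from the preceding table.
sites_after_redundancy_removal = {
    "C-linked": 109,
    "N-linked (motif)": 2669,
    "N-linked (non-motif)": 146,
    "O-linked (S)": 602,
    "O-linked (T)": 2095,
}

# (training sites, independent test sites), copied from the table above.
train_test_sites = {
    "C-linked": (76, 33),
    "N-linked (motif)": (1869, 800),
    "N-linked (non-motif)": (102, 44),
    "O-linked (S)": (421, 181),
    "O-linked (T)": (1467, 628),
}


def split_report(totals, splits):
    """For each site type, check train + test == total and compute the training fraction."""
    report = {}
    for site_type, (train, test) in splits.items():
        report[site_type] = {
            "consistent": train + test == totals[site_type],
            "train_fraction": train / (train + test),
        }
    return report


report = split_report(sites_after_redundancy_removal, train_test_sites)
for site_type, row in report.items():
    print(f"{site_type}: consistent={row['consistent']}, "
          f"train fraction={row['train_fraction']:.3f}")
```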
Numbers of features selected from each feature group as a result of feature selection
| Type | AAC | Auto-correlation | CTD | Sequence-order | Pseudo-AAC | AAindex |
|---|---|---|---|---|---|---|
| C-linked | 3 | 23 | 2 | 4 | 5 | 63 |
| N-linked (motif) | 2 | 1 | 4 | 2 | 4 | 87 |
| N-linked (non-motif) | 4 | 1 | 3 | 2 | 4 | 86 |
| O-linked (S) | 11 | 7 | 13 | 3 | 3 | 63 |
| O-linked (T) | 8 | 4 | 8 | 2 | 2 | 76 |
Summary of the training datasets and performance results of PA2DE (V2.0)
| Type | Positive sites | Unlabelled sites | AUC | ACC | F1 |
|---|---|---|---|---|---|
| C-linked | 76 | 258 | 0.997 | 0.981 | 0.981 |
| N-linked (motif) | 1869 | 7111 | 0.927 | 0.886 | 0.894 |
| N-linked (non-motif) | 102 | 4888 | 0.874 | 0.974 | 0.973 |
| O-linked (S) | 421 | 8331 | 0.974 | 0.972 | 0.972 |
| O-linked (T) | 1467 | 10,000 | 0.876 | 0.857 | 0.859 |
Performance comparison results between different methods on the independent test datasets
| Type | Method | AUC | ACC | F1 |
|---|---|---|---|---|
| C-linked | PA2DE (V2.0) | 0.999 | 0.983 | 0.984 |
| | GlycoEP | 0.546 | 0.600 | 0.647 |
| | ModPred | 0.933 | 0.933 | 0.938 |
| N-linked (motif) | PA2DE (V2.0) | 0.893 | 0.815 | 0.820 |
| | GlycoEP | 0.697 | 0.637 | 0.638 |
| | NetNGlyc | 0.638 | 0.616 | 0.627 |
| | ModPred | 0.837 | 0.782 | 0.791 |
| N-linked (non-motif) | PA2DE (V2.0) | 0.872 | 0.761 | 0.758 |
| | GlycoEP | 0.669 | 0.648 | 0.644 |
| | NetNGlyc | 0.630 | 0.716 | 0.675 |
| | ModPred | 0.842 | 0.807 | 0.773 |
| O-linked (S) | PA2DE (V2.0) | 0.915 | 0.859 | 0.766 |
| | GlycoEP | 0.796 | 0.787 | 0.848 |
| | ModPred | 0.873 | 0.770 | 0.670 |
| | NetOGlyc | 0.770 | 0.662 | 0.583 |
| O-linked (T) | PA2DE (V2.0) | 0.864 | 0.793 | 0.827 |
| | GlycoEP | 0.739 | 0.694 | 0.747 |
| | ModPred | 0.821 | 0.746 | 0.781 |
| | NetOGlyc | 0.769 | 0.718 | 0.763 |
Fig. 4 ROC curves for PA2DE (V2.0), NetNGlyc, NetOGlyc, GlycoEP, and ModPred on independent test datasets