Shu-An Chen, Tzong-Yi Lee, Yu-Yen Ou.
Abstract
BACKGROUND: O-linked glycosylation occurs enzymatically in biological systems and affects protein folding, localization and trafficking, solubility, antigenicity, and biological activity, as well as cell-cell interactions on membrane proteins. The enzymes involved include glycosyltransferases, which transfer sugars, and glycosidases, which trim specific monosaccharides from precursors to form intermediate structures. Because experimental identification is difficult, several studies have used computational methods to identify glycosylation sites.
Year: 2010 PMID: 21034461 PMCID: PMC2989983 DOI: 10.1186/1471-2105-11-536
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The analysis flowchart of GlycoRBF.
The statistics of experimentally verified O-linked glycosylation sites on transmembrane (TM) proteins and non-transmembrane (non-TM) proteins.
| Statistic | Original data, TM proteins | Original data, non-TM proteins | Independent test data, TM proteins | Independent test data, non-TM proteins |
|---|---|---|---|---|
| Number of O-linked glycosylated proteins | 40 | 199 | 9 | 13 |
| Number of O-linked glycosylated serine | 87 | 281 | 7 | 10 |
| Number of O-linked glycosylated threonine | 115 | 356 | 7 | 34 |
| Number of non-glycosylated serine and threonine | 2,450 | 14,234 | 1,238 | 662 |
The experimental O-linked glycosylated proteins extracted from UniProt release 15.0 are regarded as the original data. To evaluate the real performance of the constructed models, the experimental O-linked glycosylated proteins collected from HPRD release 8.0 are used for independent testing.
The distribution of structural topology on 40 transmembrane proteins extracted from UniProt release 15.0. Columns L through Unknown give the percentage of sequence length in each structural topology.
| UniProt ID | Sequence length | Protein name | Number of TM segments | L | N | E | C | TM | S | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|
| O08984 | 620 | Lamin-B receptor | TM = 8 | 0.0% | 34.4% | 0.0% | 0.0% | 27.4% | 0.0% | 38.2% |
| O14786 | 923 | Neuropilin-1 | TM = 1 | 0.0% | 0.0% | 90.5% | 4.8% | 2.5% | 2.3% | 0.0% |
| O15431 | 190 | High affinity copper uptake protein 1 | TM = 3 | 0.0% | 0.0% | 33.7% | 33.2% | 33.2% | 0.0% | 0.0% |
| P01375 | 233 | Tumor necrosis factor | TM = 1 | 0.0% | 0.0% | 76% | 15% | 9% | 0.0% | 0.0% |
| P01589 | 272 | Interleukin-2 receptor alpha chain | TM = 1 | 0.0% | 0.0% | 80.5% | 4.8% | 7% | 7.7% | 0.0% |
| P01867 | 404 | Ig gamma-2B chain C region | TM = 1 | 0.0% | 0.0% | 86.6% | 8.9% | 4.5% | 0.0% | 0.0% |
| P02724 | 150 | Glycophorin-A | TM = 1 | 0.0% | 0.0% | 48% | 24% | 15.3% | 12.7% | 0.0% |
| P02725 | 133 | Glycophorin-A | TM = 1 | 0.0% | 0.0% | 46.6% | 39.0% | 17.3% | 0.0% | 0.0% |
| P02726 | 120 | Glycophorin-A | TM = 1 | 0.0% | 0.0% | 40.8% | 40.0% | 19.2% | 0.0% | 0.0% |
| P02727 | 129 | Glycophorin-A | TM = 1 | 0.0% | 0.0% | 50.4% | 20.2% | 16.3% | 13.2% | 0.0% |
| P02786 | 760 | Transferrin receptor protein 1 | TM = 1 | 0.0% | 0.0% | 88.4% | 8.8% | 2.8% | 0.0% | 0.0% |
| P03138 | 389 | Large envelope protein | TM = 4 | 0.0% | 0.0% | 63.2% | 14.7% | 22.1% | 0.0% | 0.0% |
| P04441 | 279 | H-2 class II histocompatibility antigen gamma chain | TM = 1 | 0.0% | 0.0% | 80.3% | 10.4% | 9.3% | 0.0% | 0.0% |
| P04921 | 128 | Glycophorin-C | TM = 1 | 0.0% | 0.0% | 44.5% | 36.7% | 18.8% | 0.0% | 0.0% |
| P05067 | 770 | Amyloid beta A4 protein | TM = 1 | 0.0% | 0.0% | 88.6% | 6.1% | 3.1% | 2.2% | 0.0% |
| P06028 | 91 | Glycophorin-B | TM = 1 | 0.0% | 0.0% | 44% | 11% | 24.2% | 20.9% | 0.0% |
| P07204 | 575 | Thrombomodulin | TM = 1 | 0.0% | 0.0% | 86.4% | 6.3% | 4.2% | 3.1% | 0.0% |
| P07359 | 626 | Platelet glycoprotein Ib alpha chain | TM = 1 | 0.0% | 0.0% | 78.1% | 16% | 3.4% | 2.6% | 0.0% |
| P07725 | 236 | T-cell surface glycoprotein CD8 alpha chain | TM = 1 | 0.0% | 0.0% | 69.1% | 11% | 8.9% | 11% | 0.0% |
| P08514 | 1039 | Integrin alpha-IIb | TM = 1 | 0.0% | 0.0% | 92.6% | 1.9% | 2.5% | 3% | 0.0% |
| P08592 | 770 | Amyloid beta A4 protein | TM = 1 | 0.0% | 0.0% | 88.6% | 6.1% | 3.1% | 2.2% | 0.0% |
| P11279 | 417 | Lysosome-associated membrane glycoprotein 1 | TM = 1 | 84.9% | 0.0% | 0.0% | 2.9% | 5.5% | 6.7% | 0.0% |
| P13473 | 410 | Lysosome-associated membrane glycoprotein 2 | TM = 1 | 84.6% | 0.0% | 0.0% | 2.7% | 5.9% | 6.8% | 0.0% |
| P13838 | 378 | Leukosialin | TM = 1 | 0.0% | 0.0% | 59.6% | 32.8% | 6.1% | 1.9% | 0.0% |
| P14221 | 144 | Glycophorin-A | TM = 1 | 0.0% | 0.0% | 47.9% | 36.1% | 16.0% | 0.0% | 0.0% |
| P16150 | 400 | Leukosialin | TM = 1 | 0.0% | 0.0% | 58.2% | 31% | 5.8% | 4.8% | 0.0% |
| P17404 | 335 | Chondromodulin-1 | TM = 1 | 0.0% | 0.0% | 0.0% | 0.0% | 6.3% | 0.0% | 93.7% |
| P21583 | 273 | Kit ligand | TM = 1 | 0.0% | 0.0% | 69.2% | 13.2% | 8.4% | 9.2% | 0.0% |
| P42098 | 421 | Zona pellucida sperm-binding protein 3 | TM = 1 | 0.0% | 0.0% | 85.3% | 4.5% | 5.0% | 5.2% | 0.0% |
| P51681 | 352 | C-C chemokine receptor type 5 | TM = 7 | 0.0% | 0.0% | 26.1% | 27.0% | 46.9% | 0.0% | 0.0% |
| P61073 | 352 | C-X-C chemokine receptor type 4 | TM = 7 | 0.0% | 0.0% | 28.1% | 29.8% | 42.0% | 0.0% | 0.0% |
| P80370 | 383 | Protein delta homolog 1 | TM = 1 | 0.0% | 0.0% | 73.1% | 14.6% | 6.3% | 6.0% | 0.0% |
| Q00657 | 2326 | Chondroitin sulfate proteoglycan 4 | TM = 1 | 0.0% | 0.0% | 94.4% | 3.3% | 1.1% | 1.2% | 0.0% |
| Q01455 | 230 | Membrane protein | TM = 3 | 17.0% | 0.0% | 0.0% | 57.4% | 25.7% | 0.0% | 0.0% |
| Q07287 | 536 | Zona pellucida sperm-binding protein 4 | TM = 1 | 0.0% | 0.0% | 91.4% | 0.7% | 3.9% | 3.9% | 0.0% |
| Q09163 | 385 | Protein delta homolog 1 | TM = 1 | 0.0% | 0.0% | 73.2% | 14.5% | 6.2% | 6.0% | 0.0% |
| Q16790 | 459 | Carbonic anhydrase 9 | TM = 1 | 0.0% | 0.0% | 82.1% | 5.2% | 4.6% | 8.1% | 0.0% |
| Q62765 | 843 | Neuroligin-1 | TM = 1 | 0.0% | 0.0% | 77.3% | 14.8% | 2.5% | 5.3% | 0.0% |
| Q71M36 | 566 | Chondroitin sulfate proteoglycan 5 | TM = 1 | 0.0% | 0.0% | 69.4% | 21.6% | 3.7% | 5.3% | 0.0% |
| Q99075 | 208 | Proheparin-binding EGF-like growth factor | TM = 1 | 0.0% | 0.0% | 67.8% | 11.5% | 11.5% | 9.1% | 0.0% |
Abbreviation: L, Lumenal; N, Nucleoplasmic; E, Extracellular; C, Cytoplasmic; TM, Transmembrane; S, Signal peptide.
The structural distribution of O-linked glycosylation sites on 40 transmembrane proteins that are extracted from release 15.0 of UniProt.
| Type of membrane topology | Number of O-linked glycosylation sites |
|---|---|
| Extracellular | 177 |
| Lumenal | 22 |
| Nucleoplasmic | 1 |
| Cytoplasmic | 0 |
| Transmembrane | 0 |
| Unknown | 2 |
The sequence frequency logos of O-linked glycosylated serine and threonine on transmembrane (TM) proteins and non-transmembrane (non-TM) proteins that are extracted from release 15.0 of UniProt.
| Glycosylated residues | Number of non-homologous sites | Number of proteins | Window length | Sequence frequency logo |
|---|---|---|---|---|
| Glycosylated serine on TM proteins | 87 | 29 | -14 ~ +14 | |
| Glycosylated serine on non-TM proteins | 281 | 27 | -14 ~ +14 | |
| Glycosylated threonine on TM proteins | 115 | 118 | -14 ~ +14 | |
| Glycosylated threonine on non-TM proteins | 356 | 137 | -14 ~ +14 | |
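The -14 ~ +14 window in the table above corresponds to a 29-residue fragment centred on each candidate serine or threonine. A minimal Python sketch of how such windows could be cut from a protein sequence follows. The function name `extract_windows`, the `-` padding character for positions beyond the sequence ends, and the toy sequence are assumptions made for illustration, not details taken from GlycoRBF.

```python
def extract_windows(sequence, flank=14, pad="-"):
    """Return (position, residue, window) for every S or T in the sequence.

    position is 1-based; window has length 2 * flank + 1 and is centred
    on the residue. Positions outside the sequence are filled with `pad`
    (an assumed convention).
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in "ST":
            # padded[i + flank] is sequence[i], so this slice is centred on it
            windows.append((i + 1, residue, padded[i:i + 2 * flank + 1]))
    return windows


# Toy example with a short flank so the padding is easy to see
for position, residue, window in extract_windows("MSTA", flank=2):
    print(position, residue, window)
```

With `flank=14` each window has 29 characters and the candidate residue sits at index 14.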
The significant amino acid pairs that surround the O-linked glycosylated serine and threonine on transmembrane (TM) proteins and non-transmembrane (non-TM) proteins that are extracted from release 15.0 of UniProt.
| Glycosylated serine, TM proteins: pair | Value | Glycosylated serine, non-TM proteins: pair | Value | Glycosylated threonine, TM proteins: pair | Value | Glycosylated threonine, non-TM proteins: pair | Value |
|---|---|---|---|---|---|---|---|
| (+3T,+9E or +9T) | 0.071 | (+1G,-2E) | 0.058 | (+5S,-9M or -9P) | 0.057 | (+3P,-1P') | 0.078 |
| (-7T,+2S or +3S) | 0.064 | (+1G,-1G) | 0.041 | (+5S,-7T or -7S) | 0.053 | (+1P,-1P') | 0.058 |
| (+5P,+3P or +3T) | 0.064 | (+1G,+11G) | 0.041 | (+5P,+9E or +9P) | 0.05 | (+3P,+1P') | 0.053 |
| (+1T,+4P) | 0.056 | (+1G,+5G) | 0.04 | (-7T,-6S or -6T) | 0.049 | (-1P,+5P') | 0.053 |
| (+1P,-1P) | 0.056 | (+1G,-3G) | 0.034 | (+5E,+1P or +1T or +1V) | 0.047 | (-4T,+3P') | 0.036 |
| (-7P,-5T) | 0.053 | (+1G,+10S) | 0.034 | (+7T,+8S or +8P) | 0.047 | (+1P,+5P') | 0.033 |
| (+1G,+8E) | 0.049 | (+4S,+5G) | 0.032 | (-7T,+8T) | 0.044 | (-4T,-1P') | 0.033 |
| (-7T,-12A or -12S) | 0.047 | (-1G,-2E) | 0.031 | (+7T,+10P or +10S) | 0.043 | (-4T,+14T') | 0.033 |
| (-7T,+6T) | 0.047 | (-5S,+12S) | 0.029 | (-7T,-12S) | 0.042 | (-4T,+4T') | 0.032 |
| (+1G,-13D) | 0.046 | (-1P,-3P) | 0.028 | (+5P,-9T or -9V) | 0.042 | (+3T,+2A') | 0.031 |
| (+5P,-12S) | 0.046 | (+4S,+1G) | 0.027 | (+1S,-8S or -8T) | 0.041 | (-4T,-10T') | 0.030 |
| (-7P,-10S or -10T) | 0.045 | (+1G,+2E) | 0.027 | (+1T,+11S or +11P) | 0.041 | (-4T,+2T') | 0.030 |
| (+3P,+11S) | 0.043 | (-5S,+9A) | 0.027 | (+7P,+12S or +12T) | 0.040 | (-1A,+3P') | 0.030 |
| (-7G,+6K or +6H) | 0.038 | (+9A,-5S) | 0.027 | (-7S,+10S) | 0.039 | | |
| (+3T,+10I) | 0.036 | (+1T,-7T) | 0.026 | (+7S,+10S) | 0.039 | | |
| (+1G,-1G) | 0.034 | (+4P,-3P) | 0.025 | (+1P,-13Q or -13T) | 0.037 | | |
| | | (-1G,-3G) | 0.025 | (-7P,-11S or -11L) | 0.036 | | |
| | | (-1P,+3P) | 0.025 | (+1P,-3P or -3S) | 0.036 | | |
| | | (-5G,-4D) | 0.025 | (+5T,+10S or +10E) | 0.036 | | |
| | | (-5S,+2S) | 0.024 | (+5E,-7S or -7T) | 0.036 | | |
| | | (+9A,+12S) | 0.024 | (+7P,+6D) | 0.035 | | |
| | | (-5S,-12T) | 0.023 | (+1S,-6S or -6T) | 0.034 | | |
| | | (-1G,+9G) | 0.021 | (-7S,-13P) | 0.033 | | |
| | | (-5S,-7S) | 0.019 | (+1T,-13L or -13D) | 0.030 | | |
| | | (-1G,+7S) | 0.018 | (-10P,+7P or +7E) | 0.017 | | |
| | | (+4S,+7S) | 0.017 | | | | |
| | | (+4S,+12S) | 0.017 | | | | |
| | | (+4S,-2E) | 0.017 | | | | |
| | | (-1G,+4S) | 0.017 | | | | |
| | | (-5G,-6S) | 0.017 | | | | |
| | | (+4S,+14S) | 0.016 | | | | |
| | | (-5G,+10S) | 0.015 | | | | |
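One way to read an entry such as (+1G,-2E) is as a pair of position-specific residues relative to the glycosylated site: G at position +1 together with E at position -2. Under that assumed reading, the Python sketch below tests whether a 29-residue window contains a given pair. The names `residue_at` and `has_pair`, the tuple encoding of a pair, and the toy window are hypothetical and are not taken from GlycoRBF.

```python
def residue_at(window, offset, flank=14):
    """Residue at a signed offset from the central site of the window."""
    return window[flank + offset]


def has_pair(window, first, second, flank=14):
    """Check a pair of (offset, allowed_residues) constraints.

    allowed_residues is a string, so "ET" encodes "E or T" at that
    offset (assumed encoding of entries such as "+9E or +9T").
    """
    return all(
        residue_at(window, offset, flank) in allowed
        for offset, allowed in (first, second)
    )


# Toy 29-residue window: S at the centre, E at -2, G at +1
window = "A" * 12 + "E" + "A" + "S" + "G" + "A" * 13

print(has_pair(window, (1, "G"), (-2, "E")))   # pair written as (+1G,-2E)
print(has_pair(window, (3, "T"), (9, "ET")))   # pair written as (+3T,+9E or +9T)
```

A binary indicator per pair, computed this way, is one plausible way to turn the listed pairs into additional input features alongside the amino acid encoding.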
The five-fold cross-validation performance of O-linked glycosylation sites on transmembrane (TM) proteins and non-transmembrane (non-TM) proteins that are extracted from release 15.0 of UniProt.
| Training features | Amino acid (BLOSUM62), TM proteins | Amino acid (BLOSUM62), non-TM proteins | Amino acid + SAAPs, TM proteins | Amino acid + SAAPs, non-TM proteins | Amino acid + SAAPs + membrane topology*, TM proteins |
|---|---|---|---|---|---|
| True Positive | 132 | 361 | 132 | 384 | 132 |
| False Positive | 462 | 2124 | 365 | 1933 | 317 |
| True Negative | 1988 | 12110 | 2085 | 12301 | 2133 |
| False Negative | 70 | 276 | 70 | 253 | 70 |
| Sensitivity | 65.3% | 56.7% | 65.3% | 60.3% | 65.3% |
| Specificity | 81.1% | 85.1% | 85.1% | 86.4% | 87.1% |
| Accuracy | 79.9% | 83.9% | 83.6% | 85.3% | 85.4% |
| Balanced Accuracy | 73.2% | 70.9% | 75.2% | 73.4% | 76.2% |
| MCC | 0.30 | 0.23 | 0.34 | 0.26 | 0.37 |
*This model incorporates the membrane topology of transmembrane proteins when predicting their O-linked glycosylation sites.
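The measures in the table above follow from the four confusion-matrix counts. The Python sketch below uses the standard definitions of sensitivity, specificity, accuracy, balanced accuracy, and the Matthews correlation coefficient (MCC). The function name `binary_metrics` is illustrative, and the counts TP = 384, FP = 1933, TN = 12301, FN = 253 are read here as those of the amino acid + SAAPs model on non-TM proteins.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    # Matthews correlation coefficient
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


metrics = binary_metrics(tp=384, fp=1933, tn=12301, fn=253)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With roughly 22 negative sites for every positive site in this column, accuracy is dominated by the negatives, which is why balanced accuracy and MCC are reported alongside it.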
Comparison of independent test between GlycoRBF and other methods.
| Data set | Measure | GlycoRBF | GPP | NetOglyc3.1 | CKSAAP |
|---|---|---|---|---|---|
| O-linked glycosylation sites on TM proteins | Sensitivity | 50.0% | 57.1% | 21.4% | 28.6% |
| | Specificity | 70.2% | 38.4% | 82.1% | 74.2% |
| | Accuracy | 70.0% | 38.7% | 81.4% | 73.6% |
| | Balanced Accuracy | 60.1% | 47.8% | 51.7% | 51.4% |
| | MCC | 0.05 | -0.01 | 0.01 | 0.01 |
| O-linked glycosylation sites on non-TM proteins | Sensitivity | 61.4% | 54.5% | 47.7% | 20.5% |
| | Specificity | 80.4% | 44.1% | 82.0% | 78.2% |
| | Accuracy | 79.2% | 44.8% | 79.9% | 74.6% |
| | Balanced Accuracy | 70.9% | 49.3% | 64.9% | 49.4% |
| | MCC | 0.24 | -0.01 | 0.18 | -0.01 |
Figure 2. The user interface of the web-based GlycoRBF.
Figure 3. A case study of human cyclic AMP-dependent transcription factor ATF-6 alpha, which contains three O-linked glycosylation sites at 474T, 586T, and 645T according to the HPRD annotation.
Figure 4The .