Seizi Someya, Masanori Kakuta, Mizuki Morita, Kazuya Sumikoshi, Wei Cao, Zhenyi Ge, Osamu Hirose, Shugo Nakamura, Tohru Terada, Kentaro Shimizu.
Abstract
Carbohydrate-binding proteins are proteins that can interact with sugar chains but do not modify them. They are involved in many physiological functions, and we have developed a method for predicting them from their amino acid sequences. Our method is based on support vector machines (SVMs). We first clarified the definition of carbohydrate-binding proteins and then constructed positive and negative datasets with which the SVMs were trained. In a leave-one-out test on these datasets, our method achieved an area under the receiver operating characteristic (ROC) curve of 0.92. We also examined two amino acid grouping methods that enable effective learning of sequence patterns and evaluated the performance of these methods. When we applied our method in combination with the homology-based prediction method to the annotated human genome database, H-invDB, we found that the true positive rate of prediction was improved.
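The leave-one-out evaluation mentioned in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the features are plain amino acid frequencies (an assumption), the toy sequences are made up, and a nearest-centroid score stands in for the SVM decision value so that the example needs only the standard library.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def loo_scores(features, labels):
    """Leave-one-out: score each sample with a model built from all the others.

    Assumes at least two samples per class, so that neither class is empty
    after one sample is held out.
    """
    scores = []
    for i, x in enumerate(features):
        train = [(f, y) for j, (f, y) in enumerate(zip(features, labels)) if j != i]
        pos = centroid([f for f, y in train if y == 1])
        neg = centroid([f for f, y in train if y == 0])
        # Stand-in decision value (NOT an SVM): positive when x lies
        # closer to the positive centroid than to the negative one.
        scores.append(sq_dist(x, neg) - sq_dist(x, pos))
    return scores

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = carbohydrate-binding, 0 = not.
toy_seqs = ["WWGCNNYWGS", "CWGNYGSWNC", "AAELLKKAEL", "KELAALEEKA"]
toy_labels = [1, 1, 0, 0]
toy_scores = loo_scores([composition(s) for s in toy_seqs], toy_labels)
print("leave-one-out AUC on toy data:", auc(toy_scores, toy_labels))
```

Replacing the body of the loop with a trained SVM's decision value would leave the rest of the procedure unchanged, since the AUC depends only on how the held-out scores rank positives against negatives.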
Year: 2010 PMID: 20936154 PMCID: PMC2948896 DOI: 10.1155/2010/289301
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
Figure 1. Outline of prediction system.
List of query commands submitted to the Sequence Retrieval System (SRS) to create the positive dataset.
| Subsets | Search conditions in SRS Query Language | Number of hits | Number of hits in the positive dataset |
|---|---|---|---|
| Subset 1 Lectins that are not enzymes | [libs = {swiss_prot trembl}-Description: lectin*] ∣ [libs-Keywords:Lectin*] ∣ [libs-Keywords:Chitin-binding*] ∣ [libs-Description:sugarbinding*] ! ([libs-Description: | 2017 | 231 |
| Subset 2 Lectins that are also enzymes | [libs = {swiss_prot trembl}-Description: lectin*] ∣ [libs-Keywords:Lectin*] ∣ [libs-Keywords:Chitin-binding*] ∣ [libs-Description: sugar-binding*] & ([libs-Description: *Peptidase*] ∣ [libs-Description: ligase*] ∣ [libs-Description: ribonuclease*] ∣ [libs-Description: *Protease*] ∣ [libs-Description: *Proteinase*] ∣ [libs-Keywords: *lipase*] ∣ [libs-Keywords: ribonuclease*] ∣ [libs-Keywords: *Protease*] ∣ [libs-Keywords: *Proteinase*] ∣ [libs-Keywords: *lipase*]) ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:] | 37 | 4 |
| Subset 3 Other “Carbohydrate-binding” proteins | [libs = {swiss_prot trembl}-Keywords: Carbohydrate-binding*] ∣ [libs-Description:Carbohydrate-binding*] ! [libs-Description: CUT*] ! [libs-Description: Hydrolase*] ! [libs-Description:lyase*] ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:] | 16 | 15 |
| Subset 4 Hyaluronic acid binding proteins | [libs = {swiss_prot trembl}-Description: Hyaluronate*] ∣[libs-Keywords:Hyaluronate*] ∣ [libs-Description: Hyaluronan*] ∣ [libs-Keywords:Hyaluronan*] ∣ [libs-Description: Hyaluronic*] ∣ [libs-Keywords:Hyaluronic*] ! [libs-Description: lyase*] ! [libs-Description: synthase*] & ([libs-Description: *link*] ∣ [libs-Description: *bind*] ∣ [libs-Description: *associate*] ∣ [libs-Description: *receptor*] ∣ [libs-Description: *mediate*] ∣ [libs-Keywords: *link*] ∣ [libs-Keywords: *bind*] ∣ [libs-Keywords: *associate*]) ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:] | 90 | 14 |
| Subset 5 Heparin-binding proteins | [libs = {swiss_prot trembl}-Keywords: Heparin-binding*] ∣ [libs-Description:Heparin-binding*] ! [libs-Description: Putative*] ! [libs-Description:lyase*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength# 30:] | 333 | 60 |
| Subset 6 Interleukins that can bind to sugar chains | [libs = {swiss_prot trembl}-ID: IL1A_*] ∣ [libs-ID: IL1B_*] ∣ [libs-ID: IL4_*] ∣ [libs-ID: IL1RA_*] ∣ [libs-ID: IL6_*] ∣ [libs-ID: IL3_*] ∣ [libs-ID: IL2_*] ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:] | 154 | 7 |
| Subset 7 FimH adhesin of type 1 pili | [libs = {swiss_prot trembl}-Description: FimH*] ∣ [libs-Description: Neuraminyllactose-binding*] ∣ [libs-Description: S-fimbrial adhesin*] ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:] | 1 | 1 |
| Subset 8 F-box only proteins that can bind to sugar chains | [libs = {swiss_prot trembl}-ID: FBX27_HUMAN*] ∣ [libs-ID: FBX6_HUMAN*] | 2 | 1 |
| Subset 9 Agrin, Tenascin-C, Phospholipase A2 inhibitor subunit A, Neurexin | [libs = {swiss_prot trembl}-ID: AGRIN_HUMAN] ∣ [libs-ID: PLIA_TRIFL] ∣ [libs-Description: Tenascin-C] ∣ [libs-ID: NRX1A_HUMAN*] | 13 | 8 |
| Subset 10 Chitin-binding proteins | [libs = {swiss_prot trembl}-Description: cbp-1] ! [libs-Description: Centromere* ] ! [libs-Description: EC*] ! [libs-Description: synthase*] ! [libs-Description: Putative*] ! [libs-Description:putative*] ! [libs-ProtExist: 4*] ! [libs-ProtExist: 5*] ! [libs-ProtExist: 3*] & [libs-SeqLength#30:] | 4 | 4 |
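Most of the queries above end with the same exclusion clauses: no "Putative" descriptions, no ProtExist values 3, 4, or 5, and a `SeqLength#30:` condition. The sketch below restates those shared clauses as a Python predicate. It rests on several assumptions: `!` is read as "but not", `SeqLength#30:` as "length of at least 30", word-prefix matching only approximates SRS wildcard semantics, and the `Entry` record and `passes_common_filters` function are hypothetical names, not part of SRS.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Hypothetical, simplified view of a Swiss-Prot/TrEMBL record."""
    description: str
    prot_exist: int   # UniProt protein-existence level, 1 (protein evidence) to 5 (uncertain)
    sequence: str

def passes_common_filters(entry, min_len=30):
    """Apply the exclusion clauses shared by most subsets in the table.

    Assumed reading of the SRS clauses:
      ! [libs-Description: Putative*] / putative*  -> drop "putative" entries
      ! [libs-ProtExist: 3*|4*|5*]                 -> keep only levels 1 and 2
      & [libs-SeqLength#30:]                       -> keep sequences of length >= 30
    """
    words = entry.description.lower().split()
    if any(w.startswith("putative") for w in words):
        return False
    if entry.prot_exist in (3, 4, 5):
        return False
    return len(entry.sequence) >= min_len

# Usage with made-up records
candidates = [
    Entry("Mannose-binding lectin", 1, "M" * 120),
    Entry("Putative lectin", 1, "M" * 120),
    Entry("Mannose-binding lectin", 4, "M" * 120),
]
kept = [e for e in candidates if passes_common_filters(e)]
print(len(kept), "of", len(candidates), "records kept")
```

Under this reading, the filters keep only entries whose existence is supported at the protein or transcript level, which matches the large drop from "Number of hits" to "Number of hits in the positive dataset" only in part; the table does not say what further curation was applied.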
Figure 2. Amino acid frequencies of carbohydrate-binding proteins.
Figure 3. Prediction performance.
List of performance measures.
| Measure | AA-20 | Levitt-6 | Someya-7 |
|---|---|---|---|
| ACC | 0.87 | 0.83 | 0.84 |
| TPR | 0.83 | 0.77 | 0.80 |
| FPR | 0.09 | 0.11 | 0.11 |
| MCC | 0.74 | 0.67 | 0.70 |
| AUC | 0.929 | 0.890 | 0.918 |
ACC, TPR, FPR, and MCC were obtained by the leave-one-out method with a classification threshold (decision value) of θ = 0. The AUCs of the AA-20, Levitt-6, and Someya-7 grouping methods are also listed.
Abbreviations: ACC: accuracy, TPR: true positive rate, FPR: false positive rate, MCC: Matthews correlation coefficient, and AUC: area under the ROC curve.
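The threshold-based measures listed above follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the decision values and labels are made up, and treating a decision value strictly greater than θ as a positive prediction is an assumption about how ties at the threshold are handled.

```python
import math

def confusion(decision_values, labels, theta=0.0):
    """Count TP, FP, TN, FN when predicting positive for decision values above theta."""
    tp = fp = tn = fn = 0
    for d, y in zip(decision_values, labels):
        predicted_positive = d > theta  # assumption: strictly above the threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def measures(tp, fp, tn, fn):
    """Return ACC, TPR, FPR, and MCC from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }

# Made-up decision values for three positives followed by three negatives
values = [1.2, 0.3, -0.5, 0.4, -1.0, -0.2]
truth = [1, 1, 1, 0, 0, 0]
print(measures(*confusion(values, truth)))
```

Unlike these four measures, the AUC does not depend on θ, because it summarizes the ranking of decision values over all possible thresholds.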
Figure 4. Performance in genome-wide prediction.
(a)
| | Nonpolar | Polar |
|---|---|---|
| | A, C, L, M | E, H, K, Q, R |
| | F, I, V, W, Y | T |
| Turn | G, P | D, N, S |
(b)
| | Nonpolar | Polar |
|---|---|---|
| | A, L, M | E, H, K, Q, R |
| | F, I, V, W, Y | T |
| Turn | G, P | D, N, S |
| Cysteine | C | |
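The two groupings can be applied to a sequence by mapping each residue to the symbol of its group, which shrinks the alphabet from 20 letters to 6 or 7. The sketch below encodes tables (a) and (b) directly. Since (a) has six groups and (b) seven, they presumably correspond to Levitt-6 and Someya-7, but that pairing is an inference from the group counts, and the single-letter group symbols and function names are arbitrary choices made for illustration.

```python
# Group symbols (a-g) are arbitrary labels; the memberships come from tables (a) and (b).
GROUPING_A = {"a": "ACLM", "b": "EHKQR", "c": "FIVWY", "d": "T", "e": "GP", "f": "DNS"}
GROUPING_B = {"a": "ALM", "b": "EHKQR", "c": "FIVWY", "d": "T", "e": "GP", "f": "DNS", "g": "C"}

def build_map(grouping):
    """Invert {group symbol: residues} into {residue: group symbol}."""
    return {residue: symbol for symbol, residues in grouping.items() for residue in residues}

def reduce_sequence(seq, residue_map, unknown="x"):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the 20 standard amino acids map to `unknown`.
    """
    return "".join(residue_map.get(residue, unknown) for residue in seq.upper())

MAP_A = build_map(GROUPING_A)
MAP_B = build_map(GROUPING_B)

print(reduce_sequence("ACDT", MAP_A))
print(reduce_sequence("ACDT", MAP_B))
```

The only difference between the two maps is that cysteine gets its own symbol in (b), so the two reduced sequences differ exactly at cysteine positions.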