Khader Shameer, Ganesan Pugalenthi, Krishna Kumar Kandaswamy, Ponnuthurai N Suganthan, Govindaraju Archunan, Ramanathan Sowdhamini.
Abstract
Three-dimensional (3D) domain swapping is a mechanism in which two or more protein molecules form higher-order oligomers by exchanging identical or similar subunits. Recently, this phenomenon has received much attention in the context of prions and neurodegenerative diseases because of its role in functional regulation, formation of higher oligomers, protein misfolding and aggregation. While 3D domain swapping can be detected from three-dimensional structures, it remains a formidable challenge to derive common sequence or structural patterns from proteins involved in swapping. We have developed an SVM-based classifier to predict domain swapping events using a set of features derived from sequence and structural data. The SVM classifier was trained on features derived from 150 proteins reported to be involved in 3D domain swapping and 150 proteins that are neither known to adopt a swapped conformation nor related to proteins involved in swapping. Testing was performed on 63 proteins from the positive dataset and 63 proteins from the negative dataset. We obtained 76.33% accuracy in training and 73.81% accuracy in testing. Given the high diversity in the sequence, structure and function of proteins involved in domain swapping, an algorithm that predicts swapping events from sequence- and structure-derived features is an initial step towards identifying more putative proteins that may be involved in swapping or in deposition diseases. Further, the top features emerging from our feature selection method may be analysed to understand their roles in the mechanism of domain swapping.
Keywords: 3D domain swapping; SVM; domain swap; feature selection; machine learning
Year: 2010 PMID: 20634983 PMCID: PMC2901629 DOI: 10.4137/bbi.s4464
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
List of 10 structures with GO annotation, SCOP fold and Pfam domain ID.
| PDB ID | GO annotation | SCOP fold | Pfam domain ID |
|---|---|---|---|
| 1A64 | antigen binding, protein binding, protein homodimerization activity, protein self-association | Immunoglobulin-like beta-sandwich | V-set |
| 1OQF | catalytic activity, lyase activity, methylisocitrate lyase activity | TIM beta/alpha-barrel | ICL |
| 1K6W | cytosine deaminase activity, iron ion binding, hydrolase activity, hydrolase activity, acting on carbon-nitrogen (but not peptide) bond, metal ion binding | Composite domain of metallo-dependent hydrolases | Amidohydro_3 |
| 11BA | nucleic acid binding, nuclease activity, endonuclease activity, pancreatic ribonuclease activity, hydrolase activity | RNase A-like | RnaseA |
| 1EK1 | magnesium ion binding, catalytic activity, epoxide hydrolase activity, hydrolase activity, metal ion binding | alpha/beta-Hydrolases, HAD-like | Abhydrolase_1, Hydrolase |
| 1I21 | glucosamine 6-phosphate N-acetyltransferase activity, N-acetyltransferase activity, acyltransferase activity, transferase activity | Acyl-CoA N-acyltransferases (Nat) | Acetyltransf_1 |
| 1M5M | sugar binding | Cyanovirin-N | CVNH |
| 1FRO | lactoylglutathione lyase activity, zinc ion binding, lyase activity, metal ion binding | Glyoxalase/Bleomycin resistance protein/Dihydroxybiphenyl dioxygenase | Glyoxalase |
| 1DDT | transferase activity, transferase activity, transferring glycosyl groups, NAD+-diphthamide ADP-ribosyltransferase activity | Common fold of diphtheria toxin/transcription factors/cytochrome f | Diphtheria_R, Diphtheria_T, Diphtheria_C |
| 1LSS | catalytic activity, binding, cation transmembrane transporter activity, potassium ion binding | NAD(P)-binding Rossmann-fold domains | TrkA_N |
Figure 1. Structures of three different proteins involved in 3D domain swapping (PDB IDs: 1A64 [57], 1OQF [58], 1K6W [59]). The hinge region is colored red and the swapped segment is colored coffee brown.
Figure 2. Schematic representation of data curation steps.
Performance evaluation on training data (150 proteins from positive dataset and 150 proteins from negative dataset).
| Feature set | Accuracy (%) |
|---|---|
| 10 features | 71.67 |
| 25 features | 75.33 |
| 50 features | 76.33 |
| All features (66) | 76.33 |
Test with independent validation dataset (63 proteins from positive dataset and 63 proteins from negative dataset).
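| Feature set | Sensitivity (%) | Specificity (%) | MCC | Accuracy (%) | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|
| 10 features | 69.84 | 66.67 | 0.37 | 68.25 | 67.69 | 68.85 |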
| 25 features | 73.02 | 65.08 | 0.38 | 69.05 | 67.65 | 70.69 |
| 50 features | 73.02 | 79.37 | 0.52 | 76.19 | 77.97 | 74.63 |
| All features (66) | 73.02 | 74.60 | 0.48 | 73.81 | 74.19 | 73.44 |
Abbreviations: MCC, Matthews correlation coefficient; PPV, positive predictive value; NPV, negative predictive value; AROC, asymptotic receiver operating characteristic.
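The test-set metrics above all follow from a 2x2 confusion matrix. The sketch below is a minimal illustration, assuming that the six numeric columns are sensitivity, specificity, MCC, accuracy, PPV and NPV, in that order. The raw counts (46 of 63 positives and 47 of 63 negatives classified correctly for the full 66-feature set) are not stated in the source; they are inferred from the reported percentages. The function name `confusion_metrics` is hypothetical.

```python
from math import sqrt


def confusion_metrics(tp, fn, tn, fp):
    """Binary-classification metrics from confusion-matrix counts.

    All values are percentages except MCC, which lies in [-1, 1].
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
        # MCC is defined as 0 when any marginal total is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
        "accuracy": 100.0 * (tp + tn) / (tp + fn + tn + fp),
        "ppv": 100.0 * tp / (tp + fp),
        "npv": 100.0 * tn / (tn + fn),
    }


# Assumed counts for the "All features (66)" row (inferred, not from the source):
# 63 positives -> 46 correct, 17 missed; 63 negatives -> 47 correct, 16 false alarms.
all_features = confusion_metrics(tp=46, fn=17, tn=47, fp=16)

for name, value in all_features.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed counts, the computed values agree with the "All features (66)" row to two decimal places, which supports the assumed column order.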
Figure 3. ROC curves plotted from the true-positive and false-positive fractions obtained with the top 10 features and with all features.
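An ROC curve of this kind can be traced by sweeping a decision threshold over the classifier's scores and recording the false-positive and true-positive fractions at each step. The sketch below is purely illustrative: the labels and scores are made-up toy values, not data from the study, and the function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(labels, scores):
    """Return (false-positive rate, true-positive rate) points for an ROC curve.

    labels: 1 for positive (swapping), 0 for negative.
    scores: higher means more likely positive.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_curve = roc_points(toy_labels, toy_scores)
toy_auc = auc(toy_curve)
print(toy_curve)
print(toy_auc)
```

For the toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the area under the curve is 8/9.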
List of top 10 selected features.
| Rank | Feature |
|---|---|
| 1 | Solvent inaccessible residues in coil |
| 2 | Frequency of residues (that form hydrogen bond to main chain CO) in helix |
| 3 | Number of cysteines in strand |
| 4 | Physicochemical property (refractivity) |
| 5 | Number of cysteines in helix |
| 6 | Frequency of neutral amino acids (THSQ) |
| 7 | Frequency of valine |
| 8 | Frequency of tyrosine |
| 9 | Frequency of tryptophan |
| 10 | Composition of coil |
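Several of the top-ranked features are simple counts or frequencies that can be computed from a sequence and a per-residue secondary-structure assignment. The sketch below is an illustration under stated assumptions, not the authors' feature-extraction code: it assumes a three-state secondary-structure string (`H` helix, `E` strand, `C` coil) aligned to the sequence, interprets "composition of coil" as the fraction of residues in coil, and uses a made-up example sequence. The function name and dictionary keys are hypothetical.

```python
def composition_features(sequence, secondary_structure):
    """Compute a few sequence/structure-derived features for one protein.

    sequence: one-letter amino acid codes.
    secondary_structure: same length, one of 'H', 'E', 'C' per residue
    (assumed three-state encoding).
    """
    seq = sequence.upper()
    ss = secondary_structure.upper()
    if not seq or len(seq) != len(ss):
        raise ValueError("sequence and secondary structure must be non-empty and equal in length")
    n = len(seq)

    def frequency(residues):
        # Fraction of positions occupied by any of the given residue types.
        return sum(seq.count(r) for r in residues) / n

    def cysteines_in(state):
        # Number of cysteines whose secondary-structure state matches.
        return sum(1 for aa, s in zip(seq, ss) if aa == "C" and s == state)

    return {
        "freq_valine": frequency("V"),
        "freq_tyrosine": frequency("Y"),
        "freq_tryptophan": frequency("W"),
        "freq_neutral_THSQ": frequency("THSQ"),
        "cysteines_in_strand": cysteines_in("E"),
        "cysteines_in_helix": cysteines_in("H"),
        "coil_fraction": ss.count("C") / n,  # assumed meaning of "composition of coil"
    }


# Made-up 10-residue example, for illustration only.
example = composition_features("MVCTHSQYWC", "CCEEEHHHHH")
print(example)
```

Features that depend on solvent accessibility or hydrogen bonding (ranks 1 and 2) need a three-dimensional structure and are outside the scope of this sketch.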
Example results using the prediction model.
| PDB ID | Protein | Prediction |
|---|---|---|
| 1YVS | Barnase | Domain swap |
| 2NZ7 | Caspase-recruitment domain | Domain swap |
| 2OQR | Response regulator RegX3 | Domain swap |
| 2VTY | Novel Bcl-2-Like domain swapped dimer | Domain swap |
| 2B9I | GITRL | Domain swap |
| 3EXM | Cyanovirin-N | Domain swap |
| 2V4N | SurE | Non swap |
| 2PQM | Cysteine synthase | Non swap |