Akshay Yadav, Valadi Krishnamoorthy Jayaraman.
Abstract
The function of a protein is primarily dictated by its structure. It is therefore logical to look for functional clues in the protein's overall three-dimensional fold, that is, its global structure. In this paper, we develop a novel Support Vector Machine (SVM) based model for the functional classification and prediction of proteins, using features extracted from the global structure by means of fragment libraries. Fragment libraries have previously been used for ab initio modelling of proteins and for protein structure comparison. The query protein structure is broken down into a collection of short contiguous backbone fragments, and this collection is discretized using a library of fragments. The input feature vector is a frequency vector that counts the occurrences of each library fragment in the collection, determined by all-to-all fragment comparisons. SVM models were trained and optimised to obtain the best 10-fold cross-validation (CV) accuracy for classification. As an example, the method was applied to the prediction and classification of Cell Adhesion Molecules (CAMs). Thirty-four different fragment libraries, with sizes ranging from 4 to 400 and fragment lengths ranging from 4 to 12, were evaluated to obtain the best prediction model. The best 10-fold CV accuracy, 95.25%, was obtained for the library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 NonCAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10.
Keywords: Cell Adhesion Molecules; Fragment libraries; Function prediction; Protein fragments; Support vector machines
Year: 2012 PMID: 23144557 PMCID: PMC3488839 DOI: 10.6026/97320630008953
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. For illustration, we consider a library of 6 fragments. Each (overlapping) contiguous CA segment in the backbone is associated with its most similar library fragment, which is then added to the bag. In this example, feature vector = (12, 15, 10, 8, 4, 18), corresponding to the fragments (A, B, C, D, E, F). The class label is appended as the last coordinate of the feature vector: +1 for CAMs and -1 for NonCAMs.
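The bag-of-fragments construction in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy coordinates, and in particular the dissimilarity measure (root-mean-square difference of internal CA-CA distances, chosen here because it needs no superposition) are hypothetical, since the abstract does not specify how fragments are compared.

```python
from collections import Counter
from itertools import combinations
from math import dist


def fragments(ca_coords, length):
    """Return every overlapping contiguous window of `length` CA atoms."""
    return [ca_coords[i:i + length] for i in range(len(ca_coords) - length + 1)]


def distance_profile(fragment):
    """Internal CA-CA distances; unchanged by rotation or translation."""
    return [dist(a, b) for a, b in combinations(fragment, 2)]


def dissimilarity(frag_a, frag_b):
    """ASSUMED metric: RMS difference between two internal-distance profiles."""
    pa, pb = distance_profile(frag_a), distance_profile(frag_b)
    return (sum((x - y) ** 2 for x, y in zip(pa, pb)) / len(pa)) ** 0.5


def frequency_vector(ca_coords, library):
    """Count, per library fragment, how many backbone fragments it best matches."""
    length = len(library[0])
    counts = Counter()
    for frag in fragments(ca_coords, length):
        best = min(range(len(library)), key=lambda k: dissimilarity(frag, library[k]))
        counts[best] += 1
    return [counts[k] for k in range(len(library))]


# Toy library of two length-4 fragments (hypothetical coordinates, in angstroms).
library = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],  # extended
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)],   # compact
]
# Toy backbone: 7 CA atoms on a straight line, giving 7 - 4 + 1 = 4 fragments.
backbone = [(0.0, 3.8 * i, 0.0) for i in range(7)]

features = frequency_vector(backbone, library)
print(features)  # [4, 0]: every window matches the extended fragment

# Append the class label as the last coordinate (+1 for CAMs, -1 for NonCAMs).
labelled = features + [+1]
```

The length of the resulting vector equals the library size, so a library of 400 fragments yields a 400-dimensional input (plus the label) for the SVM regardless of the length of the protein.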
Figure 2. Variation of the 10-fold cross-validation % accuracy as a function of library size for fragment length 4 (A), 5 (B), 7 (C), 9 (D), 10 (E), 11 (F), and 12 (G). The X-axis represents the library size and the Y-axis represents the 10-fold cross-validation % accuracy.