Alireza Meshkin, Hossein Ghafuri.
Abstract
Because the native structure of most proteins is believed to be defined by their sequences, the use of data mining methods to extract hidden knowledge from protein sequences is unavoidable. A major difficulty in mining bioinformatics data is the size of the datasets, which frequently contain large numbers of variables. In this study, a two-step procedure for predicting the relative solvent accessibility of proteins is presented. In the first, "feature selection" step, a small subset of evolutionary information is identified on the basis of selected physicochemical properties. In the second step, support vector regression is used for real-value prediction of protein solvent accessibility from these custom-selected features of evolutionary information. The experimental results show that the proposed method improves average prediction accuracy and training time.
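The two-step idea (select a feature subset, then fit a support vector regressor on it) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic stand-ins for per-residue evolutionary features, the univariate F-test selector (`SelectKBest` with `f_regression`) is swapped in for the paper's physicochemical-property-based selection, and the values of `k`, `C`, and `epsilon` are arbitrary assumptions. It relies on NumPy and scikit-learn.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for per-residue evolutionary features
# (assumption: 20 columns, loosely analogous to a sequence profile).
n_residues, n_features = 300, 20
X = rng.normal(size=(n_residues, n_features))

# Synthetic "relative solvent accessibility" in (0, 1) that depends
# on the first three features only; the rest are uninformative.
signal = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * X[:, 2]
y = 1.0 / (1.0 + np.exp(-signal))

# Step 1: keep the k highest-scoring features (generic univariate
# selection, NOT the paper's physicochemical criterion).
# Step 2: real-value regression with an RBF-kernel SVR.
model = make_pipeline(
    SelectKBest(score_func=f_regression, k=5),
    SVR(kernel="rbf", C=1.0, epsilon=0.05),
)

X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]
model.fit(X_train, y_train)

# Clip predictions to the valid range of a relative accessibility.
pred = np.clip(model.predict(X_test), 0.0, 1.0)

selected = model.named_steps["selectkbest"].get_support(indices=True)
mae = float(np.mean(np.abs(pred - y_test)))
print("selected feature indices:", selected)
print("mean absolute error:", round(mae, 3))
```

Fitting the regressor on fewer columns is what would be expected to shorten training, which is the trade-off the abstract describes; the sketch only shows the mechanics, not the reported results.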
Keywords: PSI-BLAST; feature selection method; physicochemical properties of amino acids; support vector regression
Year: 2010 PMID: 29255385 PMCID: PMC5698889
Source DB: PubMed Journal: EXCLI J ISSN: 1611-2156 Impact factor: 4.068
Table 1. 48 physicochemical properties of amino acids
Figure 1. A detailed overview of the proposed method
Table 2. Results of subset selection of physicochemical features
Table 3. Results of subset selection of evolutionary information
Table 4. The total count of selected features
Figure 2. Example of predicted RSA values for a protein (PDB code 1ABA)
Table 5. Comparison between our method and other reported methods; unreported results are denoted by “-”