Harsh Saini (1), Gaurav Raicar (2), Alok Sharma (3), Sunil Lal (4), Abdollah Dehzangi (5), James Lyons (6), Kuldip K Paliwal (7), Seiya Imoto (8), Satoru Miyano (9).
1. University of the South Pacific, Fiji. Electronic address: saini_h@usp.ac.fj.
2. University of the South Pacific, Fiji. Electronic address: raicar_g@usp.ac.fj.
3. University of the South Pacific, Fiji; Griffith University, Brisbane, Australia. Electronic address: sharma_al@usp.ac.fj.
4. University of the South Pacific, Fiji. Electronic address: lal_s@usp.ac.fj.
5. Griffith University, Brisbane, Australia. Electronic address: a.dehzangi@griffith.edu.au.
6. Griffith University, Brisbane, Australia. Electronic address: james.lyons@griffithuni.edu.au.
7. Griffith University, Brisbane, Australia. Electronic address: k.paliwal@griffith.edu.au.
8. Human Genome Center, University of Tokyo, Japan. Electronic address: imoto@ims.u-tokyo.ac.jp.
9. Human Genome Center, University of Tokyo, Japan. Electronic address: miyano@hgc.jp.
Abstract
BACKGROUND: Identifying the tertiary (3D) structure of a protein is a fundamental problem in biology, and knowing the structure helps in identifying the protein's function. Predicting a protein's fold is considered an intermediate step towards identifying its tertiary structure. Computational methods have been applied to determine a protein's fold by assembling information from its structural, physicochemical and/or evolutionary properties. METHODS: In this study, we propose a scheme in which a feature extraction technique extracts, from the Position Specific Scoring Matrix (PSSM), probabilistic expressions of amino acid dimers that have varying degrees of spatial separation in the primary sequence of the protein. A support vector machine (SVM) classifier is then used to build a fold recognition model from the extracted features. RESULTS: The performance of the proposed scheme is evaluated on three benchmark datasets, namely the Ding and Dubchak, Extended Ding and Dubchak, and Taguchi and Gromiha datasets. CONCLUSIONS: The proposed scheme performed well in the experiments conducted, improving on previously published results in the literature.
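To make the METHODS description more concrete, the following is a minimal sketch of one way dimer features with spatial separation could be computed from a PSSM. The abstract gives no formulas, so everything here is an assumption for illustration rather than the authors' implementation: the row-wise softmax used to turn PSSM scores into probabilities, the product-of-probabilities dimer score, the function names, and the `max_gap` parameter are all hypothetical.

```python
import math


def pssm_to_probabilities(pssm):
    """Turn each PSSM row (scores for one sequence position) into probabilities.

    Assumption: a row-wise softmax is used; the abstract does not say how
    the scores are normalised.
    """
    probs = []
    for row in pssm:
        top = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(score - top) for score in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def dimer_features(probs, gap):
    """Expected count of each ordered residue pair (a, b) with a at
    position i and b at position i + gap, flattened row by row."""
    n = len(probs[0])
    feats = [[0.0] * n for _ in range(n)]
    for i in range(len(probs) - gap):
        left, right = probs[i], probs[i + gap]
        for a in range(n):
            for b in range(n):
                feats[a][b] += left[a] * right[b]
    return [value for row in feats for value in row]


def extract_features(pssm, max_gap):
    """Concatenate dimer features for gaps 1..max_gap (hypothetical parameter)."""
    probs = pssm_to_probabilities(pssm)
    features = []
    for gap in range(1, max_gap + 1):
        features.extend(dimer_features(probs, gap))
    return features


# Toy PSSM: 4 sequence positions over a 3-letter alphabet (a real PSSM has 20 columns).
toy_pssm = [
    [2, -1, 0],
    [0, 3, -2],
    [-1, 0, 1],
    [1, 1, -3],
]
features = extract_features(toy_pssm, max_gap=2)
```

With a real 20-column PSSM, each gap contributes 400 values, so the feature vector has length `400 * max_gap`. A vector like this could then be passed to an SVM implementation (for example scikit-learn's `SVC`) for the classification step; the choice of library, kernel, and parameters is not stated in the abstract.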