
Oxypred: prediction and classification of oxygen-binding proteins.

S Muthukrishnan, Aarti Garg, G P S Raghava.

Abstract

This study describes a method for predicting and classifying oxygen-binding proteins. Firstly, support vector machine (SVM) modules were developed using amino acid composition and dipeptide composition for predicting oxygen-binding proteins, and achieved maximum accuracy of 85.5% and 87.8%, respectively. Secondly, an SVM module was developed based on amino acid composition, classifying the predicted oxygen-binding proteins into six classes with accuracy of 95.8%, 97.5%, 97.5%, 96.9%, 99.4%, and 96.0% for erythrocruorin, hemerythrin, hemocyanin, hemoglobin, leghemoglobin, and myoglobin proteins, respectively. Finally, an SVM module was developed using dipeptide composition for classifying the oxygen-binding proteins, and achieved maximum accuracy of 96.1%, 98.7%, 98.7%, 85.6%, 99.6%, and 93.3% for the above six classes, respectively. All modules were trained and tested by five-fold cross validation. Based on the above approach, a web server Oxypred was developed for predicting and classifying oxygen-binding proteins (available from http://www.imtech.res.in/raghava/oxypred/).

Year:  2007        PMID: 18267306      PMCID: PMC5054225          DOI: 10.1016/S1672-0229(08)60012-1

Source DB:  PubMed          Journal:  Genomics Proteomics Bioinformatics        ISSN: 1672-0229            Impact factor:   7.691


Introduction

Oxygen-binding proteins are widely present in eukaryotes ranging from non-vertebrates to humans. These proteins have also been reported in many prokaryotes and protozoans. Their occurrence in all kingdoms of organisms, though not in all organisms, shows their biological importance. Extensive studies have categorized oxygen-binding proteins into six broad types: erythrocruorin, hemerythrin, hemocyanin, hemoglobin, leghemoglobin, and myoglobin, each with its own functional characteristics, structure, and oxygen-binding capacity. These oxygen-binding proteins are crucial for the survival of any living organism. With the advancement of sequencing technology, protein sequence databases are growing at an exponential rate. There is thus a clear need for bioinformatic methods for the functional annotation of proteins, particularly for identifying oxygen-binding proteins (3, 4). Recently, Lin et al. developed a support vector machine (SVM)-based method for predicting functional classes of metal-binding proteins. However, to the best of our knowledge, no method has been developed specifically for predicting and classifying oxygen-binding proteins. In the present study, we have developed a reliable SVM-based method for predicting and classifying oxygen-binding proteins using different residue compositions.

Results and Discussion

Prediction of oxygen-binding proteins

SVM modules were trained and tested on our dataset of oxygen-binding and non-oxygen-binding proteins. First, we developed an SVM module using amino acid composition and achieved a Matthews correlation coefficient (MCC) of 0.71 with 85.5% accuracy when evaluated by five-fold cross validation. Dipeptide composition has been shown to provide more information than simple amino acid composition because it encapsulates local order information. We therefore developed an SVM module using dipeptide composition and achieved an MCC of 0.76 with 87.8% accuracy, 88.5% sensitivity, and 87.1% specificity. This result demonstrates that the dipeptide composition-based module performs better than the amino acid composition-based module for the prediction of oxygen-binding proteins.

Classification of oxygen-binding proteins

We classified the predicted oxygen-binding proteins into six classes: erythrocruorin, hemerythrin, hemocyanin, hemoglobin, leghemoglobin, and myoglobin. The amino acid compositions vary significantly from one class to another (Figure 1), indicating that one class of proteins can be discriminated from the others on the basis of composition. We therefore developed six SVM modules, one for each class. First, we developed amino acid composition-based SVM modules and achieved accuracies from 95.8% to 99.4%, with an overall accuracy of 97.2% (Table 1). Then we developed dipeptide composition-based SVM modules and achieved accuracies from 85.6% to 99.6%, with an overall accuracy of 95.3% (Table 1). It is interesting to note that here the amino acid composition-based module performs better than the dipeptide composition-based module. This study demonstrates that it is possible to predict and classify oxygen-binding proteins using compositional information (amino acid and dipeptide).
Fig. 1

Average (AVG) amino acid composition of six different classes of oxygen-binding proteins. Amino acids are denoted by their single letter codes.

Table 1

Performance of SVM modules for classifying oxygen-binding proteins

Protein class     Accuracy (%)
                  Amino acid composition   Dipeptide composition
Erythrocruorin    95.8                     96.1
Hemerythrin       97.5                     98.7
Hemocyanin        97.5                     98.7
Hemoglobin        96.9                     85.6
Leghemoglobin     99.4                     99.6
Myoglobin         96.0                     93.3

Average           97.2                     95.3
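
The "Average" row is consistent with an unweighted mean of the six per-class accuracies, which can be checked directly from the table. The short sketch below does that arithmetic; the variable names are illustrative. Because the classes differ greatly in size (486 hemoglobin against 13 leghemoglobin proteins in the dataset), an unweighted mean gives every class the same influence regardless of how many proteins it contains.

```python
# Per-class accuracies (%) copied from Table 1:
# (amino acid composition, dipeptide composition)
table1 = {
    "Erythrocruorin": (95.8, 96.1),
    "Hemerythrin": (97.5, 98.7),
    "Hemocyanin": (97.5, 98.7),
    "Hemoglobin": (96.9, 85.6),
    "Leghemoglobin": (99.4, 99.6),
    "Myoglobin": (96.0, 93.3),
}

# Unweighted mean over the six classes for each feature type.
aa_mean = sum(aa for aa, _ in table1.values()) / len(table1)
dp_mean = sum(dp for _, dp in table1.values()) / len(table1)

print(round(aa_mean, 1), round(dp_mean, 1))
```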

Oxypred server

The SVM modules constructed in the present study have been implemented as a web server, Oxypred, using CGI/Perl scripts. The server is available for academic use at http://www.imtech.res.in/raghava/oxypred/. Users can submit protein sequences in one of the standard formats such as FASTA, GenBank, EMBL, or GCG. The server first predicts oxygen-binding proteins and then classifies them into the six classes.

Materials and Methods

Dataset

We extracted the sequences of oxygen-binding and non-oxygen-binding proteins from the Swiss-Prot database (http://www.expasy.org/sprot/). To obtain a high-quality dataset, we removed all proteins annotated as “fragments”, “isoforms”, “potentials”, “similarity”, or “probables” (9, 10), and used the PROSET software to create a non-redundant dataset in which no two proteins share more than 90% similarity. Our final dataset consisted of 672 oxygen-binding proteins and 700 non-oxygen-binding proteins. The 672 oxygen-binding proteins were then divided into six classes: 20 erythrocruorin, 31 hemerythrin, 77 hemocyanin, 486 hemoglobin, 13 leghemoglobin, and 45 myoglobin proteins.

Support vector machine

SVM modules were implemented using SVMlight, a freely downloadable package (http://www.cs.cornell.edu/people/tj/svm_light/). The software lets users set a number of parameters and choose among built-in kernel functions such as the linear kernel, the radial basis function, and the polynomial kernel (of a given degree). To develop the prediction method, we trained SVMs using oxygen-binding proteins as positive examples and non-oxygen-binding proteins as negative examples. For classifying oxygen-binding proteins, we used the one-versus-rest SVM strategy.
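
As a rough illustration of the one-versus-rest setup, the sketch below builds one binary training set per class, labelling proteins of that class +1 and all others -1, and writes each example in the sparse text format that SVMlight reads (`<target> <index>:<value> ...`, with feature indices starting at 1). The helper names are hypothetical and this is not the authors' code. The final step, assigning a protein to the class whose module returns the largest decision value, is a common convention assumed here; the paper does not state how ties or multiple positive modules were resolved.

```python
CLASSES = ["erythrocruorin", "hemerythrin", "hemocyanin",
           "hemoglobin", "leghemoglobin", "myoglobin"]

def svmlight_line(target, features):
    """Format one example as '<target> <index>:<value> ...' (1-based indices).

    Zero-valued features are omitted, which the sparse format allows.
    """
    pairs = " ".join(f"{i}:{v:.6g}"
                     for i, v in enumerate(features, start=1) if v != 0)
    return f"{target:+d} {pairs}".rstrip()

def one_vs_rest_files(examples):
    """Build one binary training set per class.

    examples: list of (class_name, feature_vector) pairs.
    Returns {class_name: [SVMlight-format lines]}.
    """
    return {
        cls: [svmlight_line(1 if label == cls else -1, feats)
              for label, feats in examples]
        for cls in CLASSES
    }

def assign_class(scores):
    """Assumed decision rule: pick the class with the largest decision value."""
    return max(scores, key=scores.get)

# Toy three-dimensional vectors, for illustration only.
toy = [("hemoglobin", [0.5, 0.0, 0.25]), ("myoglobin", [0.0, 1.0, 0.0])]
for line in one_vs_rest_files(toy)["hemoglobin"]:
    print(line)
```

Each list of lines could be written to its own file and passed to the SVMlight training program, giving six binary models for the six classes.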

Input features and performance evaluation

We used amino acid composition and dipeptide composition as input features. For amino acid composition, a protein is represented by a vector of 20 dimensions, while for dipeptide composition a protein is represented by a vector of 400 dimensions. We used the five-fold cross validation technique to evaluate the performance of the SVM modules (12, 13). The performance of these modules was measured with standard parameters: accuracy, sensitivity, specificity, and MCC.
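
The two feature vectors and the five-fold split can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: compositions are expressed as percentages, residues outside the 20 standard letters count toward the sequence length but toward no component, and folds are assigned by striding over the example indices without shuffling. The section above does not specify any of these details, and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs

def amino_acid_composition(seq):
    """20-dimensional vector: percentage of each standard residue."""
    seq = seq.upper()
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """400-dimensional vector: percentage of each overlapping residue pair."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    return [100.0 * counts[d] / len(pairs) for d in DIPEPTIDES]

def five_fold_indices(n, k=5):
    """Yield (train, test) index lists; each example is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

seq = "MVLSPADKTNVKAAW"  # arbitrary example sequence
print(len(amino_acid_composition(seq)), len(dipeptide_composition(seq)))
for train, test in five_fold_indices(10):
    print(test)
```

In each round, four folds are used for training and the remaining fold for testing, so after five rounds every protein has been predicted once by a model that did not see it during training.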

Authors’ contributions

SM and AG created datasets, developed various modules, and evaluated all modules. SM and AG also developed the web server. GPSR conceived the idea, coordinated it and refined the manuscript drafted by SM and AG. All authors read and approved the final manuscript.

Competing interests

The authors have declared that no competing interests exist.

References

1.  ESLpred: SVM-based method for subcellular localization of eukaryotic proteins using dipeptide composition and PSI-BLAST.

Authors:  Manoj Bhasin; G P S Raghava
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

2.  Prediction of alpha-turns in proteins using PSI-BLAST profiles and secondary structure information.

Authors:  Harpreet Kaur; G P S Raghava
Journal:  Proteins       Date:  2004-04-01

3.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000.

Authors:  A Bairoch; R Apweiler
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

4.  Support vector machine-based method for subcellular localization of human proteins using amino acid compositions, their order, and similarity search.

Authors:  Aarti Garg; Manoj Bhasin; Gajendra P S Raghava
Journal:  J Biol Chem       Date:  2005-01-12       Impact factor: 5.157

Review 5.  Recent developments and future prospects of Vitreoscilla hemoglobin application in metabolic engineering.

Authors:  Lei Zhang; Yingjun Li; Zinan Wang; Yang Xia; Wansheng Chen; Kexuan Tang
Journal:  Biotechnol Adv       Date:  2006-11-11       Impact factor: 14.227

6.  BTXpred: prediction of bacterial toxins.

Authors:  Sudipto Saha; Gajendra P S Raghava
Journal:  In Silico Biol       Date:  2007

7.  Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach.

Authors:  H H Lin; L Y Han; H L Zhang; C J Zheng; B Xie; Z W Cao; Y Z Chen
Journal:  BMC Bioinformatics       Date:  2006-12-18       Impact factor: 3.169

8.  AlgPred: prediction of allergenic proteins and mapping of IgE epitopes.

Authors:  Sudipto Saha; G P S Raghava
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

9.  VICMpred: an SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition.

Authors:  Sudipto Saha; G P S Raghava
Journal:  Genomics Proteomics Bioinformatics       Date:  2006-02       Impact factor: 7.691

Review 10.  Microbial globins.

Authors:  Guanghui Wu; Laura M Wainwright; Robert K Poole
Journal:  Adv Microb Physiol       Date:  2003       Impact factor: 3.517
