Abstract
In metal-binding proteins (metalloproteins), the bound metal ions are important for protein stability and also serve as cofactors in functions such as controlling metabolism, regulating signal transport, and maintaining metal homeostasis. In structural genomics, predicting metal-binding proteins helps in selecting a suitable growth medium for overexpression studies and in obtaining functional protein. Computational prediction using machine learning approaches has been widely used in various fields of bioinformatics, on the premise that all the required information is contained in the amino acid sequence. In this study, random forest prediction systems were deployed with simplified amino acid alphabets to predict binding sites for individual major metal ions: copper, calcium, cobalt, iron, magnesium, manganese, nickel, and zinc.
Keywords: amino acid sequence; binding sites; machine learning; proteins
Year: 2017 PMID: 29307143 PMCID: PMC5769865 DOI: 10.5808/GI.2017.15.4.162
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
Fig. 1. Construction of the dataset used for prediction.
The 18 variables, obtained by merging three simplified alphabets of amino acid residues, used to represent protein sequences
| Variable | Residues |
|---|---|
| V1 | CMQLEKRA |
| V2 | P |
| V3 | ND |
| V4 | G |
| V5 | HWFY |
| V6 | S |
| V7 | TIV |
| V8 | CFILMVW |
| V9 | AG |
| V10 | PH |
| V11 | EDRK |
| V12 | NQSTY |
| V13 | FWY |
| V14 | CILMV |
| V15 | H |
| V16 | ST |
| V17 | EDNQ |
| V18 | KR |
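The table above can be turned into a feature vector in a few lines. The sketch below is only illustrative: it assumes each variable is the fraction of residues in a sequence (or sequence window) that belong to the group, which the source does not spell out. The names `GROUPS` and `encode` are hypothetical.

```python
from typing import Dict, List

# The 18 residue groups from the table above, in order V1..V18.
GROUPS: List[str] = [
    "CMQLEKRA", "P", "ND", "G", "HWFY", "S", "TIV",  # V1-V7
    "CFILMVW", "AG", "PH", "EDRK", "NQSTY",           # V8-V12
    "FWY", "CILMV", "H", "ST", "EDNQ", "KR",          # V13-V18
]


def encode(sequence: str) -> Dict[str, float]:
    """Return, for each variable, the fraction of residues in its group.

    Assumption: a composition fraction per group; the exact feature
    definition used in the study is not given in this record.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {
        f"V{i}": sum(seq.count(aa) for aa in group) / len(seq)
        for i, group in enumerate(GROUPS, start=1)
    }


features = encode("HHGP")
```

Because the three alphabets overlap, one residue can contribute to several variables: in `"HHGP"`, histidine counts toward V5 (HWFY), V10 (PH), and V15 (H).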
Overall prediction performance of the classifier in predicting individual metal ion binding sites
| Metal | Sensitivity | Specificity | Matthews correlation | Accuracy |
|---|---|---|---|---|
| Ca | 0.769 | 0.739 | 0.507 | 0.754 |
| Co | 0.884 | 0.823 | 0.708 | 0.853 |
| Cu | 0.746 | 0.815 | 0.563 | 0.781 |
| Fe | 0.772 | 0.740 | 0.512 | 0.756 |
| Mg | 0.766 | 0.714 | 0.481 | 0.740 |
| Mn | 0.729 | 0.647 | 0.378 | 0.688 |
| Ni | 0.945 | 0.869 | 0.817 | 0.907 |
| Zn | 0.740 | 0.640 | 0.382 | 0.690 |
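For reference, the four measures in the table follow from a binary confusion matrix using the standard definitions. The sketch below is a minimal illustration; the function name `binary_metrics` and the counts in the usage lines are hypothetical, chosen only to mirror the calcium row's rates on an assumed balanced set of 1,000 positives and 1,000 negatives.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard sensitivity, specificity, accuracy and Matthews correlation."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts reproducing the Ca row's sensitivity and specificity.
ca = binary_metrics(tp=769, fp=261, tn=739, fn=231)
```

With these counts the accuracy is 0.754 and the Matthews correlation is about 0.508, close to the reported 0.507. In every row of the table the accuracy is close to the mean of sensitivity and specificity, which is what equally sized positive and negative sets would produce; that balance is an inference from the numbers, not something stated in this record.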
Feature selection of variables in improving the performance of copper ion prediction against proteins that lack metal ions
| Variable removed | Average sensitivity | Average specificity | Average accuracy | Average Matthews correlation |
|---|---|---|---|---|
| None | 0.746 | 0.815 | 0.781 | 0.563 |
| AG | 0.762 | 0.809 | 0.786 | 0.571 |
| CMQLEKRA | 0.794 | 0.804 | 0.799 | 0.599 |
| NQSTY | 0.779 | 0.814 | 0.796 | 0.593 |
| EDNQ | 0.796 | 0.797 | 0.796 | 0.592 |
| CFILMVW | 0.785 | 0.803 | 0.794 | 0.588 |
| TIV | 0.785 | 0.798 | 0.792 | 0.583 |
| PH | 0.774 | 0.801 | 0.788 | 0.576 |
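The feature-selection tables report performance after a variable is removed. One way to produce such a table is a leave-one-variable-out loop, sketched below under stated assumptions: `evaluate` is a hypothetical callback that trains and cross-validates a classifier on the kept variables and returns a score such as the Matthews correlation, and variables are dropped one at a time from the full set. Whether the study removed variables singly or cumulatively is not stated in this record.

```python
from typing import Callable, Dict, List, Sequence


def leave_one_out(
    variables: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Score the full variable set, then each set with one variable removed."""
    results = {"None": evaluate(list(variables))}
    for removed in variables:
        kept = [v for v in variables if v != removed]
        results[removed] = evaluate(kept)
    return results


def helpful_removals(results: Dict[str, float]) -> List[str]:
    """Variables whose removal beats the baseline, best first."""
    baseline = results["None"]
    better = [name for name, score in results.items()
              if name != "None" and score > baseline]
    return sorted(better, key=lambda name: results[name], reverse=True)


# Matthews correlations copied from the copper table above.
copper_mcc = {
    "None": 0.563, "AG": 0.571, "CMQLEKRA": 0.599, "NQSTY": 0.593,
    "EDNQ": 0.592, "CFILMVW": 0.588, "TIV": 0.583, "PH": 0.576,
}
ranking = helpful_removals(copper_mcc)
```

Applied to the copper values, the ranking starts with `CMQLEKRA`, whose removal gives the largest Matthews correlation in the table (0.599 against a baseline of 0.563).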
Feature selection of variables in improving the performance of calcium ion prediction against proteins that lack metal ions
| Variable removed | Average sensitivity | Average specificity | Average accuracy | Average Matthews correlation |
|---|---|---|---|---|
| None | 0.769 | 0.738 | 0.754 | 0.507 |
| P | 0.783 | 0.758 | 0.770 | 0.541 |
| EDNQ | 0.788 | 0.751 | 0.770 | 0.541 |
| EDRK | 0.796 | 0.758 | 0.777 | 0.554 |
| PH | 0.785 | 0.756 | 0.770 | 0.541 |
| CILMV | 0.801 | 0.754 | 0.777 | 0.556 |
| AG | 0.790 | 0.749 | 0.770 | 0.539 |
| CFILMVW | 0.789 | 0.765 | 0.777 | 0.554 |
| NQSTY | 0.785 | 0.767 | 0.776 | 0.552 |
| CMQLEKRA | 0.780 | 0.765 | 0.772 | 0.545 |
Feature selection of variables in improving the performance of cobalt ion prediction against proteins that lack metal ions
| Variable removed | Average sensitivity | Average specificity | Average accuracy | Average Matthews correlation |
|---|---|---|---|---|
| None | 0.884 | 0.823 | 0.853 | 0.708 |
| CILMV | 0.903 | 0.842 | 0.872 | 0.747 |
| CFILMVW | 0.899 | 0.837 | 0.868 | 0.737 |
| ND | 0.894 | 0.828 | 0.861 | 0.724 |
| EDNQ | 0.884 | 0.833 | 0.858 | 0.717 |
| PH | 0.894 | 0.847 | 0.870 | 0.741 |
| ST | 0.903 | 0.837 | 0.870 | 0.742 |
| NQSTY | 0.860 | 0.833 | 0.846 | 0.693 |
Feature selection of variables in improving the performance of iron ion prediction against proteins that lack metal ions
| Variable removed | Average sensitivity | Average specificity | Average accuracy | Average Matthews correlation |
|---|---|---|---|---|
| None | 0.772 | 0.740 | 0.756 | 0.512 |
| NQSTY | 0.778 | 0.731 | 0.754 | 0.509 |
| S | 0.786 | 0.727 | 0.757 | 0.514 |
| PH | 0.786 | 0.724 | 0.755 | 0.511 |
| CMQLEKRA | 0.785 | 0.720 | 0.753 | 0.507 |
| CFILMVW | 0.787 | 0.734 | 0.761 | 0.523 |
| AG | 0.790 | 0.720 | 0.755 | 0.511 |
| TIV | 0.780 | 0.725 | 0.753 | 0.507 |
| HWFY | 0.790 | 0.735 | 0.762 | 0.525 |
Feature selection of variables in improving the performance of magnesium ion prediction against proteins that lack metal ions
| Variable removed | Average sensitivity | Average specificity | Average accuracy | Average Matthews correlation |
|---|---|---|---|---|
| None | 0.766 | 0.714 | 0.740 | 0.481 |
| ST | 0.779 | 0.714 | 0.746 | 0.494 |
| ND | 0.774 | 0.720 | 0.747 | 0.494 |
| NQSTY | 0.767 | 0.717 | 0.742 | 0.485 |
| S | 0.772 | 0.711 | 0.742 | 0.484 |
| HWFY | 0.770 | 0.716 | 0.743 | 0.487 |
| PH | 0.777 | 0.709 | 0.743 | 0.487 |
| CMQLEKRA | 0.775 | 0.708 | 0.741 | 0.484 |
Feature selection of variables in improving the performance of manganese ion prediction against proteins that lack metal ions
| Variable removed | Average sensitivity | Average specificity | Average accuracy | Average Matthews correlation |
|---|---|---|---|---|
| None | 0.729 | 0.647 | 0.688 | 0.378 |
| FWY | 0.731 | 0.717 | 0.734 | 0.474 |
| EDNQ | 0.741 | 0.656 | 0.698 | 0.398 |
| CMQLEKRA | 0.750 | 0.647 | 0.698 | 0.399 |
| AG | 0.750 | 0.643 | 0.697 | 0.396 |
| S | 0.739 | 0.660 | 0.700 | 0.400 |
Feature selection of variables in improving the performance of nickel ion prediction against proteins that lack metal ions
| Variable removed | Average sensitivity | Average specificity | Average accuracy | Average Matthews correlation |
|---|---|---|---|---|
| None | 0.945 | 0.869 | 0.907 | 0.817 |
| EDRK | 0.950 | 0.887 | 0.918 | 0.838 |
| G | 0.931 | 0.892 | 0.917 | 0.824 |
| NQSTY | 0.923 | 0.887 | 0.905 | 0.810 |
| ST | 0.941 | 0.878 | 0.909 | 0.821 |
| EDNQ | 0.936 | 0.865 | 0.900 | 0.803 |
| FWY | 0.918 | 0.860 | 0.889 | 0.780 |
| HWFY | 0.931 | 0.865 | 0.898 | 0.800 |
| TIV | 0.927 | 0.869 | 0.898 | 0.797 |
Feature selection of variables in improving the performance of zinc metal ion prediction against proteins that lack metal ions
| Variable removed | Average sensitivity | Average specificity | Average accuracy | Average Matthews correlation |
|---|---|---|---|---|
| None | 0.740 | 0.640 | 0.690 | 0.382 |
| HWFY | 0.751 | 0.638 | 0.695 | 0.391 |
| CMQLEKRA | 0.750 | 0.636 | 0.692 | 0.386 |
| AG | 0.747 | 0.638 | 0.693 | 0.388 |
| ST | 0.743 | 0.644 | 0.693 | 0.389 |
| EDNQ | 0.743 | 0.636 | 0.689 | 0.381 |
Fig. 2. Performance graph of the random forest classifier using feature selection (10-fold cross-validation for cobalt ion prediction).
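The 10-fold cross-validation mentioned in the caption can be sketched with the standard library alone. The version below is an illustration, not the authors' procedure: stratifying the folds by class label is an assumption, and `stratified_kfold` and its `seed` parameter are hypothetical names.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[int], k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k folds.

    Assumption: folds are stratified, so each one keeps roughly the
    same class proportions as the full dataset.
    """
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        yield train, test


# Toy labels: 50 metal-binding (1) and 50 non-binding (0) examples.
toy_labels = [1] * 50 + [0] * 50
splits = list(stratified_kfold(toy_labels, k=10))
```

Each example appears in exactly one test fold, and the per-fold metrics are averaged afterwards, which would match the "Average" columns in the feature-selection tables.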