M Michael Gromiha, Yukimitsu Yabuki.
Abstract
BACKGROUND: Discriminating membrane proteins based on their functions is an important task in genome annotation. In this work, we have analyzed the characteristic features of amino acid residues in membrane proteins that perform major functions, such as channels/pores, electrochemical potential-driven transporters and primary active transporters.
Year: 2008; PMID: 18312695; PMCID: PMC2375119; DOI: 10.1186/1471-2105-9-135
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Membrane propensity of amino acid residues in channels/pores, electrochemical and active transporters
| Residue | Channels/pores | Electrochemical | Active |
| Ala | 0.59 | 0.64 | 0.65 |
| Asp | 0.39 | 0.43 | 0.26 |
| Cys | 0.80 | 0.81 | 0.69 |
| Glu | 0.48 | 0.56 | 0.47 |
| Phe | 0.63 | 0.61 | 0.60 |
| Gly | 0.48 | 0.56 | 0.36 |
| His | 0.46 | 0.53 | 0.47 |
| Ile | 0.67 | 0.73 | 0.62 |
| Lys | 0.58 | 0.43 | 0.40 |
| Leu | 0.70 | 0.69 | 0.68 |
| Met | 0.63 | 0.63 | 0.49 |
| Asn | 0.41 | 0.36 | 0.38 |
| Pro | 0.26 | 0.44 | 0.28 |
| Gln | 0.49 | 0.51 | 0.56 |
| Arg | 0.52 | 0.60 | 0.47 |
| Ser | 0.49 | 0.44 | 0.43 |
| Thr | 0.59 | 0.62 | 0.48 |
| Val | 0.67 | 0.70 | 0.55 |
| Trp | 0.76 | 0.63 | 0.68 |
| Tyr | 0.71 | 0.67 | 0.54 |
Amino acid composition (%) in channels/pores, electrochemical and active transporters
| Residue | Channels/pores | Electrochemical | Active |
| Ala | 7.95 | 9.16 | 8.96 |
| Asp | 5.37 | 3.39 | 4.63 |
| Cys | 1.22 | 1.34 | 0.96 |
| Glu | 5.51 | 3.95 | 5.57 |
| Phe | 4.30 | 5.91 | 4.49 |
| Gly | 7.71 | 7.98 | 7.32 |
| His | 1.91 | 1.70 | 1.66 |
| Ile | 5.60 | 7.46 | 6.63 |
| Lys | 5.39 | 3.80 | 5.26 |
| Leu | 9.30 | 12.08 | 10.77 |
| Met | 2.21 | 2.99 | 2.74 |
| Asn | 5.34 | 3.39 | 4.04 |
| Pro | 4.13 | 4.39 | 4.48 |
| Gln | 4.03 | 2.89 | 3.81 |
| Arg | 4.74 | 3.85 | 4.66 |
| Ser | 7.64 | 7.36 | 6.47 |
| Thr | 5.94 | 5.60 | 5.62 |
| Val | 6.66 | 7.90 | 7.36 |
| Trp | 1.37 | 1.73 | 1.47 |
| Tyr | 3.58 | 3.13 | 3.00 |
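Each column of the percentage table above sums to roughly 100, which is what one expects when the value for a residue is its count divided by the total number of residues, times 100. The following is a minimal sketch of that calculation for a single sequence. The function name `composition` and the example sequence are illustrative assumptions, not taken from the paper, and the paper's averaging over all proteins in a class is not reproduced here.

```python
from collections import Counter

# The 20 standard residues in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each standard residue in `sequence`.

    Non-standard characters (e.g. 'X' or gaps) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Made-up sequence, used only to exercise the function.
example = composition("MKTLLLTLVVVTIVCLDLGYT")
for aa in "LVT":
    print(aa, round(example[aa], 2))
```

The resulting 20 values per protein are the kind of feature vector that a composition-based classifier would take as input.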
Discrimination of channels/pores, electrochemical potential-driven transporters and primary active transporters using different machine learning approaches with amino acid composition as features
| Method (5-fold cross-validation) | Sensitivity F1 | Sensitivity F2 | Sensitivity F3 | Precision F1 | Precision F2 | Precision F3 | F-measure F1 | F-measure F2 | F-measure F3 | Accuracy (%) |
| Bayesnet | 0.582 | 0.777 | 0.538 | 0.606 | 0.643 | 0.612 | 0.594 | 0.703 | 0.573 | 62.1 |
| Naive Bayes | 0.496 | 0.823 | 0.534 | 0.626 | 0.597 | 0.606 | 0.554 | 0.692 | 0.568 | 60.7 |
| Logistic function | 0.535 | 0.695 | 0.619 | 0.615 | 0.638 | 0.601 | 0.572 | 0.665 | 0.610 | 61.6 |
| RBF network | 0.543 | 0.735 | 0.625 | 0.640 | 0.666 | 0.603 | 0.587 | 0.699 | 0.614 | 63.3 |
| Support vector machine | 0.469 | 0.757 | 0.642 | 0.675 | 0.620 | 0.603 | 0.553 | 0.682 | 0.622 | 62.4 |
| k-nearest neighbor | 0.525 | 0.707 | 0.572 | 0.586 | 0.588 | 0.615 | 0.554 | 0.642 | 0.593 | 59.8 |
| Bagging meta learning | 0.541 | 0.679 | 0.677 | 0.646 | 0.660 | 0.618 | 0.589 | 0.669 | 0.646 | 63.6 |
| Classification via Regression | 0.492 | 0.695 | 0.630 | 0.599 | 0.628 | 0.599 | 0.540 | 0.660 | 0.614 | 60.8 |
| Decision tree J4.8 | 0.506 | 0.580 | 0.572 | 0.529 | 0.581 | 0.554 | 0.517 | 0.580 | 0.563 | 55.5 |
| NBTree | 0.512 | 0.669 | 0.569 | 0.569 | 0.610 | 0.568 | 0.539 | 0.638 | 0.569 | 68.2 |
| Partial decision tree | 0.473 | 0.649 | 0.550 | 0.544 | 0.551 | 0.568 | 0.506 | 0.596 | 0.559 | 55.6 |
| Jack-knife test | 0.571 | 0.709 | 0.676 | 0.664 | 0.660 | 0.644 | 0.571 | 0.709 | 0.676 | 65.4 |
| Equal data | 0.635 | 0.713 | 0.624 | 0.689 | 0.698 | 0.591 | 0.661 | 0.705 | 0.607 | 65.7 |
F1: channels/pores; F2: electrochemical potential-driven transporters; F3: primary active transporters. Equal data: Results obtained with a dataset of 502 proteins each in all the three classes of transporters.
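The F-measure values in this table are consistent with the usual definition as the harmonic mean of precision and sensitivity (recall); that the authors used exactly this definition is an assumption here. A short sketch, checked against the Bayesnet row for class F1 (sensitivity 0.582, precision 0.606):

```python
def f_measure(sensitivity: float, precision: float) -> float:
    """Harmonic mean of precision and sensitivity (the balanced F-measure)."""
    if sensitivity + precision == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


# Bayesnet, class F1 (channels/pores), values taken from the table above.
bayesnet_f1 = f_measure(sensitivity=0.582, precision=0.606)
print(round(bayesnet_f1, 3))  # 0.594, matching the tabulated F-measure
```

Because the harmonic mean is pulled toward the smaller of the two inputs, a method with high precision but low sensitivity for one class still receives a low F-measure for that class.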
Discrimination of channels/pores, electrochemical potential-driven transporters and primary active transporters using different machine learning approaches with amino acid occurrence as features
| Method (5-fold cross-validation) | Sensitivity F1 | Sensitivity F2 | Sensitivity F3 | Precision F1 | Precision F2 | Precision F3 | F-measure F1 | F-measure F2 | F-measure F3 | Accuracy (%) |
| Bayesnet | 0.329 | 0.735 | 0.567 | 0.554 | 0.515 | 0.572 | 0.413 | 0.606 | 0.569 | 54.6 |
| Naive Bayes | 0.202 | 0.757 | 0.575 | 0.477 | 0.512 | 0.534 | 0.284 | 0.611 | 0.554 | 51.8 |
| Logistic function | 0.533 | 0.713 | 0.705 | 0.689 | 0.717 | 0.604 | 0.601 | 0.715 | 0.651 | 65. |
| RBF network | 0.247 | 0.727 | 0.633 | 0.486 | 0.593 | 0.530 | 0.328 | 0.654 | 0.577 | 54.6 |
| Support vector machine | 0.163 | 0.727 | 0.826 | 0.847 | 0.705 | 0.529 | 0.273 | 0.716 | 0.645 | 60.0 |
| k-nearest neighbor | 0.629 | 0.705 | 0.640 | 0.634 | 0.683 | 0.651 | 0.632 | 0.694 | 0.646 | 65.6 |
| Bagging meta learning | 0.553 | 0.685 | 0.737 | 0.676 | 0.733 | 0.625 | 0.608 | 0.709 | 0.676 | 66.7 |
| Classification via Regression | 0.465 | 0.721 | 0.721 | 0.686 | 0.702 | 0.602 | 0.547 | 0.711 | 0.656 | 64.5 |
| Decision tree J4.8 | 0.543 | 0.625 | 0.555 | 0.526 | 0.592 | 0.593 | 0.534 | 0.609 | 0.574 | 57.2 |
| NBTree | 0.471 | 0.570 | 0.659 | 0.553 | 0.656 | 0.548 | 0.508 | 0.610 | 0.598 | 57.7 |
| Partial decision tree | 0.520 | 0.647 | 0.623 | 0.551 | 0.645 | 0.600 | 0.535 | 0.646 | 0.612 | 60.0 |
| Jack-knife test | 0.500 | 0.703 | 0.729 | 0.639 | 0.749 | 0.607 | 0.561 | 0.726 | 0.663 | 65.4 |
| Equal data | 0.723 | 0.743 | 0.574 | 0.691 | 0.712 | 0.630 | 0.707 | 0.727 | 0.601 | 68.0 |
F1: channels/pores; F2: electrochemical potential-driven transporters; F3: primary active transporters. Equal data: Results obtained with a dataset of 502 proteins each in all the three classes of transporters.
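Most rows in these tables come from 5-fold cross-validation: the data are split into five parts, and each part is used once for testing while the other four are used for training. The sketch below shows one simple way to generate such splits. The helper name `k_fold_indices`, the shuffling seed and the round-robin fold assignment are assumptions for illustration; the source does not describe how its folds were built.

```python
import random
from typing import Iterator


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(12, k=5))
for train, test in splits:
    print(len(train), len(test))
```

Setting `k` equal to the number of samples leaves exactly one sample in each test fold, which corresponds to the leave-one-out procedure usually meant by a jack-knife test.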
Discrimination accuracy between two different transporters
| 5-fold cross-validation accuracy (%) | F1 | F2 | F3 |
| F1 | - | 86.8 | 73.2 |
| F2 | 86.8 | - | 80.5 |
| F3 | 73.2 | 80.5 | - |
| F1 | - | 81.4 | 71.8 |
| F2 | 81.4 | - | 77.1 |
| F3 | 71.8 | 77.1 | - |
Highest accuracy is shown.
F1: channels/pores; F2: electrochemical potential-driven transporters; F3: primary active transporters.
Discrimination of channels and pores using different machine learning approaches
| Method (5-fold cross-validation) | Sensitivity (%) | Specificity (%) | F-measure (Channel) | F-measure (Pore) | Accuracy (%) |
| Bayesnet | 94.1 | 81.4 | 0.910 | 0.857 | 88.9 |
| Naive Bayes | 92.5 | 88.4 | 0.923 | 0.887 | 90.8 |
| Logistic function | 92.0 | 89.1 | 0.922 | 0.888 | 90.8 |
| Neural network | 93.0 | 91.5 | 0.935 | 0.915 | 92.4 |
| RBF network | 92.5 | 88.4 | 0.923 | 0.887 | 90.8 |
| Support vector machines | 95.2 | 88.4 | 0.937 | 0.905 | 92.4 |
| k-nearest neighbor | 89.8 | 86.8 | 0.903 | 0.862 | 88.6 |
| Bagging meta learning | 89.8 | 83.7 | 0.894 | 0.844 | 87.3 |
| Classification via Regression | 88.2 | 85.3 | 0.889 | 0.843 | 87.0 |
| Decision tree J4.8 | 86.1 | 78.3 | 0.856 | 0.789 | 82.9 |
| NBTree | 90.9 | 83.7 | 0.899 | 0.850 | 88.0 |
| Partial decision tree | 87.2 | 79.1 | 0.865 | 0.800 | 83.9 |
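For a two-class problem such as channel versus pore, sensitivity, specificity and accuracy all follow from the four cells of a confusion matrix. The sketch below assumes that channels are treated as the positive class. The counts passed in are hypothetical: they were chosen only because they reproduce the rounded Bayesnet row above, and the actual class sizes are not stated in this section.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity and accuracy (all in %) from confusion counts.

    tp/fn: positive-class samples predicted correctly/incorrectly
    tn/fp: negative-class samples predicted correctly/incorrectly
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": round(100 * sensitivity, 1),
        "specificity": round(100 * specificity, 1),
        "accuracy": round(100 * accuracy, 1),
    }


# Hypothetical counts (channel = positive class), NOT taken from the paper.
metrics = binary_metrics(tp=176, fn=11, tn=105, fp=24)
print(metrics)  # {'sensitivity': 94.1, 'specificity': 81.4, 'accuracy': 88.9}
```

Accuracy is a weighted average of sensitivity and specificity, with weights given by the class sizes, so it always lies between the two.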