Munira Alballa, Gregory Butler.
Abstract
BACKGROUND: Membrane transport proteins (transporters) play an essential role in every living cell by transporting hydrophilic molecules across hydrophobic membranes. While the sequences of many membrane proteins are known, their structures and functions are still poorly characterized and understood, owing to the immense effort needed to characterize them. There is therefore a need for advanced computational techniques that use sequence information alone to distinguish membrane transporter proteins; such predictions can then direct new experiments and give a hint about the function of a protein.
Keywords: Amino acid composition; Ensemble learning; Transporter prediction
Year: 2020 PMID: 32321420 PMCID: PMC7178945 DOI: 10.1186/s12859-019-3311-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 TooT-T overview
The dataset
| Class | Training dataset | Testing dataset |
|---|---|---|
| Transporter | 780 | 120 |
| Non-Transporter | 600 | 60 |
| Total | 1380 | 180 |
Different BLAST thresholds on TCDB
| Name | BLAST threshold | Motivation |
|---|---|---|
| TCDB_exact | e-value = 0; percent identity 100% | exact match |
| TCDB_high | e-value 1e-20; percent identity 40%; query coverage 70%; subject coverage 70%; length difference within 10% | thresholds recommended by Butler et al. [ |
| TCDB_med | e-value 1e-8 | threshold recommended by Barghash et al. [ |
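The threshold sets above can be read as filters on BLAST hits against TCDB. The sketch below shows one way to apply them and to transfer the "transporter" annotation by homology. It is a minimal illustration under stated assumptions, not the authors' code: the `BlastHit` record and its field names are hypothetical, each value is treated as a bound (e-value at most, identity and coverage at least), length difference is assumed to be relative to the longer sequence, and a query is assumed to be labelled a transporter if any hit passes.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """One BLAST alignment of a query against a TCDB sequence (hypothetical record)."""
    evalue: float
    pident: float  # percent identity, 0-100
    qcov: float    # query coverage, 0-100
    scov: float    # subject coverage, 0-100
    qlen: int      # query length in residues
    slen: int      # subject length in residues


def passes(hit: BlastHit, level: str) -> bool:
    """Return True if the hit satisfies the named threshold set."""
    if level == "TCDB_exact":
        return hit.evalue == 0 and hit.pident == 100
    if level == "TCDB_high":
        # Assumption: length difference is measured relative to the longer sequence.
        len_diff = abs(hit.qlen - hit.slen) / max(hit.qlen, hit.slen)
        return (hit.evalue <= 1e-20 and hit.pident >= 40
                and hit.qcov >= 70 and hit.scov >= 70 and len_diff <= 0.10)
    if level == "TCDB_med":
        return hit.evalue <= 1e-8
    raise ValueError(f"unknown threshold set: {level}")


def is_transporter(hits: list[BlastHit], level: str) -> bool:
    """Annotation transfer by homology (assumed rule): any passing TCDB hit."""
    return any(passes(h, level) for h in hits)


# A hypothetical hit: strong, but not an exact match.
hit = BlastHit(evalue=1e-30, pident=45.0, qcov=80.0, scov=75.0, qlen=400, slen=420)
print(is_transporter([hit], "TCDB_high"))   # True
print(is_transporter([hit], "TCDB_exact"))  # False
```

Stricter sets such as TCDB_exact accept fewer hits, which is why they trade sensitivity for specificity.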
Average performance of different models
| Classifier | Encoding | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|---|
| SVM | psiPAAC* | 86.73 ±0.29 | 87.99 ±0.54 | 87.29 ±0.11 | 0.7448 ±0.0027 |
| SVM | blast-PAAC | 87.03 ±0.37 | 86.08 ±0.24 | 86.62 ±0.22 | 0.7299 ±0.0045 |
| SVM | psiAAC* | 82.69 ±0.21 | 90.64 ±0.41 | 86.13 ±0.15 | 0.7278 ±0.0036 |
| SVM | psiPseAAC* | 80.18 ±0.58 | 91.51 ±0.45 | 85.13 ±0.40 | 0.7125 ±0.0075 |
| SVM | blast-AAC | 84.97 ±0.35 | 84.14 ±0.52 | 84.61 ±0.22 | 0.6897 ±0.0050 |
| SVM | PSSM | 83.83 ±0.59 | 82.03 ±0.59 | 83.06 ±0.21 | 0.6579 ±0.0038 |
| SVM | blast-PseAAC | 84.59 ±0.53 | 78.19 ±0.82 | 81.81 ±0.35 | 0.6306 ±0.0077 |
| SVM | PseAAC | 80.45 ±0.42 | 70.62 ±0.70 | 76.19 ±0.44 | 0.5149 ±0.0098 |
| SVM | AAC | 79.73 ±0.50 | 70.66 ±0.89 | 75.79 ±0.51 | 0.5069 ±0.0101 |
| SVM | PAAC | 77.93 ±0.31 | 72.14 ±0.56 | 75.41 ±0.31 | 0.5014 ±0.0062 |
The table shows the mean ± sd performance over ten runs of 10-fold cross-validation (10-CV), in descending order of accuracy. An asterisk (*) marks the features used in TooT-T.
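Each entry above summarizes ten repetitions of 10-fold cross-validation as mean ± sd, and the rows are ordered by mean accuracy. The sketch below shows that bookkeeping under one assumption the source does not state: that "sd" is the sample standard deviation (n − 1 denominator). The function names are illustrative only.

```python
import statistics


def summarize(run_values: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of one metric over repeated CV runs.

    Assumption: the reported sd uses the n - 1 (sample) denominator.
    """
    return statistics.mean(run_values), statistics.stdev(run_values)


def rank_by_accuracy(mean_acc: dict[str, float]) -> list[str]:
    """Order encodings from highest to lowest mean accuracy."""
    return sorted(mean_acc, key=lambda name: mean_acc[name], reverse=True)


# Mean accuracies (%) for a few encodings, copied from the table above.
reported = {"AAC": 75.79, "psiPAAC": 87.29, "PSSM": 83.06, "blast-PAAC": 86.62}
print(rank_by_accuracy(reported))  # ['psiPAAC', 'blast-PAAC', 'PSSM', 'AAC']
```

If the authors used the population sd instead, `statistics.pstdev` would be the drop-in replacement.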
Impact of incorporating evolutionary information on the accuracy
| Encoding (X) | X accuracy (%) | blast-X accuracy (%) | psi-X accuracy (%) | blast-X vs X (percentage points) | psi-X vs X (percentage points) | psi-X vs blast-X (percentage points) |
|---|---|---|---|---|---|---|
| AAC | 75.79 | 84.61 | 86.13 | +8.82 | +10.34 | +1.52 |
| PAAC | 75.41 | 86.62 | 87.29 | +11.21 | +11.88 | +0.67 |
| PseAAC | 76.19 | 81.81 | 85.13 | +5.62 | +8.94 | +3.32 |
| Average | 75.80 | 84.35 | 86.18 | +8.55 | +10.39 | +1.84 |
The table reports the accuracy gain, in percentage points, from adding each type of evolutionary information to the baseline compositions. The psi-compositions gave the largest gain, averaging 10.39 percentage points over the baseline.
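The gain columns are plain differences between mean accuracies, so they are in percentage points rather than relative percentages. The short sketch below recomputes them, and the "Average" row, from the three accuracy columns; the function name `gains` is illustrative.

```python
# Mean 10-CV accuracies (%) from the table above: (X, blast-X, psi-X).
acc = {
    "AAC": (75.79, 84.61, 86.13),
    "PAAC": (75.41, 86.62, 87.29),
    "PseAAC": (76.19, 81.81, 85.13),
}


def gains(base: float, blast: float, psi: float) -> tuple[float, float, float]:
    """Absolute accuracy gains in percentage points, rounded to two decimals."""
    return (round(blast - base, 2), round(psi - base, 2), round(psi - blast, 2))


rows = {name: gains(*vals) for name, vals in acc.items()}
# Average each gain column over the three encodings.
averages = tuple(round(sum(r[i] for r in rows.values()) / len(rows), 2) for i in range(3))
print(rows["AAC"])  # (8.82, 10.34, 1.52)
print(averages)     # (8.55, 10.39, 1.84)
```

A relative improvement would instead divide by the baseline, e.g. 10.34 / 75.79 ≈ 13.6% for psiAAC over AAC, which is not what the table reports.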
Performance of annotation transfer by homology
| Method | Threshold | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|---|
| ATH | TCDB_exact | 56.92 | 95.17 | 73.55 | 0.5440 |
| ATH | TCDB_high | 85.90 | 85.50 | 85.72 | 0.7112 |
| ATH | TCDB_med | 90.38 | 64.17 | 78.98 | 0.5737 |
The table shows the performance of homology annotation transfer on the training dataset using different thresholds. The TCDB_high threshold gave the highest accuracy (85.72%) and MCC (0.7112). Transporter predictions from TCDB_exact are the most reliable because of its high specificity (95.17%). ATH = Annotation Transfer by Homology
Cross-validation performance of the proposed model
| Method | Model | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|---|
| SVM | psiAAC | 82.69 ±0.21 | 90.64 ±0.41 | 86.13 ±0.15 | 0.7278 ±0.0036 |
| SVM | psiPAAC | 86.73 ±0.29 | 87.99 ±0.54 | 87.29 ±0.11 | 0.7448 ±0.0027 |
| SVM | psiPseAAC | 80.43 ±0.43 | 91.47 ±0.46 | 85.23 ±0.34 | 0.7142 ±0.0069 |
| ATH | TCDB_exact | 56.92 | 95.17 | 73.55 | 0.5440 |
| ATH | TCDB_high | 85.90 | 85.50 | 85.72 | 0.7112 |
| ATH | TCDB_med | 90.38 | 64.17 | 78.98 | 0.5737 |
| Ensemble | Proposed_Ensemble* | 90.15 ±0.24 | 89.97 ±0.34 | 90.07 ±0.07 | 0.7995 ±0.001 |
The table lists the mean ± sd performance of the proposed ensemble over ten runs of 10-fold cross-validation, together with the performance of each of its constituent classifiers.
*The proposed model; ATH = Annotation Transfer by Homology
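The ensemble combines the outputs of six constituents: three SVMs (psiAAC, psiPAAC, psiPseAAC) and three annotation-transfer predictors (TCDB_exact, TCDB_high, TCDB_med). This excerpt does not state the combination rule, so the sketch below uses a plain majority vote as a stand-in to show the data flow; it is not necessarily the rule used by TooT-T. Ties (3 of 6) are resolved as "transporter", which is also an assumption.

```python
from collections.abc import Mapping

CONSTITUENTS = ("psiAAC", "psiPAAC", "psiPseAAC", "TCDB_exact", "TCDB_high", "TCDB_med")


def majority_vote(votes: Mapping[str, int]) -> int:
    """Combine binary votes (1 = transporter, 0 = non-transporter) from all six constituents.

    Assumption: simple majority, with a 3-3 tie labelled transporter (1).
    """
    missing = set(CONSTITUENTS) - set(votes)
    if missing:
        raise ValueError(f"missing votes from: {sorted(missing)}")
    positives = sum(votes[name] for name in CONSTITUENTS)
    return 1 if 2 * positives >= len(CONSTITUENTS) else 0


# Hypothetical votes for one protein: only the exact-match ATH says "no".
votes = {"psiAAC": 1, "psiPAAC": 1, "psiPseAAC": 1,
         "TCDB_exact": 0, "TCDB_high": 1, "TCDB_med": 1}
print(majority_vote(votes))  # 1
```

A learned combiner (for example, a meta-classifier trained on the constituents' outputs) could replace `majority_vote` without changing the inputs; the low correlations between the SVM and ATH constituents are what give any such combiner room to improve on its members.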
Independent testing performance of the proposed model
| Method | Model | Sensitivity (%) | Specificity (%) | Accuracy (%) | MCC |
|---|---|---|---|---|---|
| SVM | psiAAC | 83.33 | 95.00 | 87.22 | 0.75 |
| SVM | psiPAAC | 89.17 | 88.33 | 88.89 | 0.76 |
| SVM | psiPseAAC | 80.00 | 96.67 | 85.56 | 0.73 |
| ATH | TCDB_exact | 56.67 | 91.67 | 68.33 | 0.46 |
| ATH | TCDB_high | 86.67 | 80.00 | 84.44 | 0.66 |
| ATH | TCDB_med | 92.50 | 58.33 | 81.11 | 0.56 |
| Ensemble | Proposed_Ensemble* | 94.17 | 88.33 | 92.22 | 0.82 |
The table shows the performance of the proposed ensemble and each of its constituent classifiers on the independent test set (120 transporters, 60 non-transporters).
*The proposed model; ATH = Annotation Transfer by Homology
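All four columns follow from a 2×2 confusion matrix. As a worked check, the counts below are inferred from the ensemble's reported percentages and the test-set sizes (94.17% of 120 transporters gives 113 true positives; 88.33% of 60 non-transporters gives 53 true negatives); they are a reconstruction, not numbers stated in the source. The function name `binary_metrics` is illustrative.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy (all in %) and MCC from a 2x2 confusion matrix."""
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sens": sensitivity, "spec": specificity, "acc": accuracy, "mcc": mcc}


# Inferred counts for Proposed_Ensemble on the 120 + 60 test proteins.
m = binary_metrics(tp=113, fn=7, tn=53, fp=7)
print(round(m["sens"], 2), round(m["spec"], 2), round(m["acc"], 2), round(m["mcc"], 3))
# 94.17 88.33 92.22 0.825
```

Sensitivity, specificity and accuracy match the table exactly; the MCC of 0.825 agrees with the reported 0.82 to within two-decimal rounding.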
Pearson correlation coefficient of constituent classifiers
| model | psiAAC | psiPAAC | psiPseAAC | TCDB_exact | TCDB_high | TCDB_med |
|---|---|---|---|---|---|---|
| psiAAC | 1.00 | 0.81 | 0.90 | 0.56 | 0.63 | 0.52 |
| psiPAAC | 0.81 | 1.00 | 0.80 | 0.51 | 0.61 | 0.50 |
| psiPseAAC | 0.90 | 0.80 | 1.00 | 0.55 | 0.62 | 0.52 |
| TCDB_exact | 0.56 | 0.51 | 0.55 | 1.00 | 0.65 | 0.51 |
| TCDB_high | 0.63 | 0.61 | 0.62 | 0.65 | 1.00 | 0.78 |
| TCDB_med | 0.52 | 0.50 | 0.52 | 0.51 | 0.78 | 1.00 |
The table shows the correlation between the constituent classifiers of the ensemble. The homology annotation-transfer classifiers are less correlated with one another (0.51 to 0.78) than the machine-learning models are (0.80 to 0.90), and correlations between the two groups are lower still (0.50 to 0.63). This diversity among the constituents motivates the use of ensemble techniques and helps to build a more powerful model.
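The correlation table compares constituent classifiers pairwise. The sketch below computes a Pearson correlation between two classifiers' outputs on the same proteins; for 0/1 predictions this equals the phi coefficient. Whether the authors correlated binary labels or continuous scores is not stated here, so the input type is an assumption, and the example vectors are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two classifiers' outputs on the same proteins."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least two entries")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        # A classifier that predicts one class for every protein has no variance.
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


# Hypothetical predictions (1 = transporter) from two classifiers on four proteins.
svm_preds = [1, 1, 0, 0]
ath_preds = [1, 0, 0, 0]
print(round(pearson(svm_preds, ath_preds), 4))  # 0.5774
```

Two classifiers that always agree score 1.0; the lower the score, the more often one constituent can correct the other inside the ensemble.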
Comparison with other published work
| Method | Sensitivity (%) Ind. | Sensitivity (%) CV | Specificity (%) Ind. | Specificity (%) CV | Accuracy (%) Ind. | Accuracy (%) CV | MCC Ind. | MCC CV |
|---|---|---|---|---|---|---|---|---|
| | 80.00 | 83.76 | 68.33 | 77.68 | 76.11 | 81.12 | 0.47 | 0.62 |
| TrSSP [ | 76.67 | 76.67 | 81.67 | 78.46 | 80.00 | 78.99 | 0.57 | 0.58 |
| Ou et al. [ | 100.00 | 83.14 | 77.50 | 84.48 | 85.00 | 83.94 | 0.73 | 0.68 |
| Proposed model | 94.17 | 90.15 | 88.33 | 89.97 | 92.22 | 90.07 | 0.82 | 0.80 |
| Li et al. [ | 96.67 | 99.50 | 95.83 | 97.44 | 96.11 | 98.33 | 0.91 | 0.97 |
Ind. = independent test set; CV = cross-validation