Jianzhao Gao¹, Hong Wei¹, Alberto Cano², Lukasz Kurgan².
Abstract
Computational prediction of ion channels facilitates the identification of putative ion channels from protein sequences. Several predictors of ion channels and their types were developed over the last 15 years. While they offer reasonably accurate predictions, they also suffer from several shortcomings, including lack of availability, a parallel prediction mode, single-label prediction (inability to predict multiple channel subtypes), and incomplete scope (inability to predict subtypes of the voltage-gated channels). We developed PSIONplusm, a first-of-its-kind method that performs sequential multi-label prediction of ion channels and their subtypes for both voltage-gated and ligand-gated channels. PSIONplusm sequentially combines the outputs produced by three support vector machine-based models from the PSIONplus predictor and is available as a webserver. Empirical tests show that PSIONplusm outperforms current methods for the multi-label prediction of ion channel subtypes. These include the existing single-label methods that are available to users, a naïve multi-label predictor that combines the results produced by multiple single-label methods, and methods that make predictions based on sequence alignment and domain annotations. We also found that current methods (including PSIONplusm) fail to accurately predict a few of the least frequently occurring ion channel subtypes. Thus, new predictors should be developed once a larger quantity of annotated ion channels becomes available to train predictive models.
Keywords: ion channel; ion channel type; ligand-gated ion channel; multi-label prediction; sequential prediction; voltage-gated ion channel
Year: 2020 PMID: 32517331 PMCID: PMC7355608 DOI: 10.3390/biom10060876
Source DB: PubMed Journal: Biomolecules ISSN: 2218-273X
Summary of the benchmark multi-label dataset.
| Protein Type | Ion Channel Type | Ion Channel Subtype | Number of Proteins |
|---|---|---|---|
| Ion channels | Voltage-gated | Sodium (Na⁺) | 19 |
| | | Potassium (K⁺) | 26 |
| | | Calcium (Ca²⁺) | 28 |
| | | Anions | 22 |
| | Ligand-gated | Sodium (Na⁺) | 20 |
| | | Potassium (K⁺) | 18 |
| | | Calcium (Ca²⁺) | 41 |
| | | Anions | 6 |
| Non-ion channels (other types of membrane proteins) | | | 111 |
| Total number of proteins | | | 221 |
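The subtype counts in the benchmark table sum to more than the number of ion channel proteins, which is expected for a multi-label dataset. The short Python sketch below checks that arithmetic. It assumes, as the table's totals imply, that every protein not counted as a non-ion channel is an ion channel carrying at least one subtype label; the constant and function names are illustrative only.

```python
# Per-subtype label counts transcribed from the benchmark table.
SUBTYPE_COUNTS = {
    ("Voltage-gated", "Na+"): 19,
    ("Voltage-gated", "K+"): 26,
    ("Voltage-gated", "Ca2+"): 28,
    ("Voltage-gated", "Anions"): 22,
    ("Ligand-gated", "Na+"): 20,
    ("Ligand-gated", "K+"): 18,
    ("Ligand-gated", "Ca2+"): 41,
    ("Ligand-gated", "Anions"): 6,
}
NON_ION_CHANNELS = 111
TOTAL_PROTEINS = 221


def label_statistics(counts, non_ion, total):
    """Summarize a multi-label count table (label occurrences vs. proteins)."""
    # Assumption: every protein that is not a non-ion channel is an ion channel.
    ion_channels = total - non_ion
    # Counts are label occurrences, so a multi-label protein is counted per subtype.
    subtype_labels = sum(counts.values())
    per_type = {}
    for (gating, _subtype), n in counts.items():
        per_type[gating] = per_type.get(gating, 0) + n
    return {
        "ion_channels": ion_channels,
        "subtype_labels": subtype_labels,
        "labels_per_type": per_type,
        "mean_labels_per_ion_channel": subtype_labels / ion_channels,
    }


summary = label_statistics(SUBTYPE_COUNTS, NON_ION_CHANNELS, TOTAL_PROTEINS)
print(summary)
```

Under this assumption, 180 subtype labels (95 voltage-gated, 85 ligand-gated) are spread over 110 ion channel proteins, or about 1.64 labels per ion channel on average.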
Figure 1. Parallel (A), sequential (B), and complete sequential (C) prediction of ion channels and their types and subtypes. Solid lines denote inputs, while dashed black lines denote putative annotations generated by predictive models.
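The difference between the parallel and sequential modes in Figure 1 can be sketched in a few lines of Python. This is a hypothetical illustration, not the PSIONplusm implementation: the three stand-in scoring functions, the label strings, and the 0.5 cutoff are all assumptions, and each stage makes a multi-label decision by keeping every label whose score reaches the cutoff.

```python
from typing import Callable, Dict, Set

# Hypothetical model interface: sequence -> {label: score in [0, 1]}.
Scorer = Callable[[str], Dict[str, float]]
CUTOFF = 0.5  # assumed decision threshold, not taken from the paper


def above(scores: Dict[str, float], cutoff: float = CUTOFF) -> Set[str]:
    """Multi-label decision: keep every label whose score reaches the cutoff."""
    return {label for label, s in scores.items() if s >= cutoff}


def predict_parallel(seq: str, ion: Scorer, gating: Scorer, subtype: Scorer) -> Set[str]:
    """Parallel mode: all three models run independently on the same sequence."""
    labels = above(ion(seq)) or {"non-ion channel"}
    return labels | above(gating(seq)) | above(subtype(seq))


def predict_sequential(seq: str, ion: Scorer, gating: Scorer, subtype: Scorer) -> Set[str]:
    """Sequential mode: each stage runs only on the positive output of the previous one."""
    if "ion channel" not in above(ion(seq)):
        return {"non-ion channel"}
    types = above(gating(seq))
    # Keep only subtypes that belong to one of the predicted gating types.
    subs = {s for s in above(subtype(seq)) if any(s.startswith(t) for t in types)}
    return {"ion channel"} | types | subs


# Toy stand-in models with fixed scores (illustrative only).
def weak_ion(seq: str) -> Dict[str, float]:
    return {"ion channel": 0.3}


def strong_ion(seq: str) -> Dict[str, float]:
    return {"ion channel": 0.9}


def gating_model(seq: str) -> Dict[str, float]:
    return {"voltage-gated": 0.8, "ligand-gated": 0.1}


def subtype_model(seq: str) -> Dict[str, float]:
    return {"voltage-gated K+": 0.7, "ligand-gated Ca2+": 0.6}


print(predict_parallel("MKT", weak_ion, gating_model, subtype_model))
print(predict_sequential("MKT", weak_ion, gating_model, subtype_model))
print(predict_sequential("MKT", strong_ion, gating_model, subtype_model))
```

With these toy scores, the parallel mode returns a contradictory label set (a non-ion channel that also carries channel subtypes), while the sequential mode stops after the first stage. Avoiding such inconsistent outputs is one plausible motivation for the sequential design.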
Figure 2. Flowchart of the prediction protocol implemented by PSIONplusm.
Coverage by Pfam domains and the average rate of correct predictions for the domain-based prediction of ion channels and their subtypes. The right-most column is the average of the correct prediction rates across proteins with a given label, where the rate is computed as the number of correctly predicted labels divided by the number of all predicted labels.
| Prediction Target/Label | % of Proteins with at Least One Pfam Domain (PSIONplus Training Dataset) | % of Proteins with at Least One Pfam Domain (Benchmark Dataset) | Average Rate of Correct Predictions on Benchmark Dataset |
|---|---|---|---|
| Non-ion channels | 93.0% | 95.5% | 92.8% |
| Voltage-gated sodium channels | 100.0% | 94.7% | 58.9% |
| Voltage-gated potassium channels | 100.0% | 96.2% | 49.4% |
| Voltage-gated calcium channels | 96.6% | 96.4% | 14.3% |
| Voltage-gated anion channels | 90.9% | 77.3% | 13.6% |
| Ligand-gated sodium channels | 100.0% | 100.0% | 64.8% |
| Ligand-gated potassium channels | 100.0% | 100.0% | 71.7% |
| Ligand-gated calcium channels | 100.0% | 100.0% | 4.9% |
| Ligand-gated anion channels | 100.0% | 100.0% | 0.0% |
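The right-most column of the Pfam table averages a per-protein rate over all proteins that carry a given label. Below is a minimal Python sketch of that calculation. It assumes the domain-based predicted labels are already available as sets (how they are derived from Pfam domains is not shown), and, as a labeled assumption, it scores a protein with no predicted labels as 0. The toy annotations and function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Set


def correct_rate(true_labels: Set[str], predicted: Set[str]) -> float:
    """Correctly predicted labels divided by all predicted labels for one protein."""
    if not predicted:
        return 0.0  # assumption: no prediction counts as a rate of 0
    return len(true_labels & predicted) / len(predicted)


def average_rate_for_label(label: str,
                           truth: Dict[str, Set[str]],
                           predicted: Dict[str, Set[str]]) -> float:
    """Average correct_rate over the proteins whose true labels include `label`."""
    rates = [correct_rate(labels, predicted.get(pid, set()))
             for pid, labels in truth.items() if label in labels]
    return mean(rates) if rates else float("nan")


# Hypothetical toy annotations (protein id -> label set).
truth = {
    "P1": {"voltage-gated", "voltage-gated K+"},
    "P2": {"voltage-gated", "voltage-gated Na+"},
    "P3": {"non-ion channel"},
}
predicted = {
    "P1": {"voltage-gated", "voltage-gated K+", "ligand-gated"},
    "P2": {"ligand-gated Ca2+"},
    "P3": {"non-ion channel"},
}
print(average_rate_for_label("voltage-gated", truth, predicted))  # (2/3 + 0) / 2
```

Because the rate divides by the number of predicted labels, a method that attaches many spurious labels to a protein is penalized even when the correct label is among them, as protein P1 illustrates.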
Evaluation of the sequential prediction of ion channels and their subtypes on the benchmark dataset. The random predictor is implemented by shuffling the actual labels; we report the average over 1000 repetitions. IonchanPred2.0+PSIONplus is a multi-label predictor that combines the outputs generated by these two methods. The best values in each row (a given quality index and outcome) are shown in bold font.
| Prediction Target/Label | Measure | Random | IonchanPred2.0 | PSIONplus | IonchanPred2.0+PSIONplus | PSIONplusm |
|---|---|---|---|---|---|---|
| Overall (multi-label prediction of ion channels and their types) | F1 | 31.6 | 40.3 +/− ᵃ | 54.1 +/− | 52.5 +/− | |
| | Accuracy | 30.6 | 37.3 +/− | | 46.6 +/− | 47.1 + |
| | Precision | 31.6 | 43.9 +/− | | 53.4 +/= | 53.4 + |
| | Recall | 31.6 | 37.3 +/− | 50.2 +/− | 51.6 +/− | |
| Ion vs. non-ion channels | F1 | 50.2 | 70.4 +/− | | | |
| | Precision | 0.2 | 81.2 +/= | | | |
| | Recall | 50.2 | 62.2 +/− | | | |
| Voltage-gated sodium channels | F1 | 8.6 | 0.0 =/− | 0.0 =/− | 0.0 =/− | |
| | Precision | | 0.0 =/− | 0.0 =/− | 0.0 =/− | 7.0 = |
| | Recall | 8.6 | 0.0 =/− | 0.0 =/− | 0.0 =/− | |
| Voltage-gated potassium channels | F1 | 11.8 | 34.3 +/− | | 38.3 +/− | 43.6 + |
| | Precision | 11.8 | 22.8 =/− | | 24.5 +/− | 29.3 + |
| | Recall | 11.8 | 69.2 +/− | 84.6 +/= | | 84.6 + |
| Voltage-gated calcium channels | F1 | 12.7 | 22.6 =/− | 26.4 +/− | 31.0 +/− | |
| | Precision | 12.7 | 24.0 =/= | | 25.6 +/− | 27.1 + |
| | Recall | 12.7 | 21.4 =/− | 25.0 =/− | 39.3 +/− | |
| Voltage-gated anion channels | F1 | 10.0 | 19.4 =/− | 27.6 +/− | 26.7 +/− | |
| | Precision | 10.0 | 33.3 +/= | | 50.0 +/+ | 26.1 + |
| | Recall | 10.0 | 13.6 =/− | 18.2 =/− | 18.2 =/− | |
| Ligand-gated sodium channels | F1 | 9.0 | 0.0 =/= | | | 6.6 = |
| | Precision | 9.0 | 0.0 =/− | | | 4.9 = |
| | Recall | 9.0 | 0.0 =/− | 5.0 =/− | 5.0 =/− | |
| Ligand-gated potassium channels | F1 | | 7.7 =/+ | 5.7 =/+ | 5.4 =/+ | 3.4 = |
| | Precision | 8.1 | | 5.9 =/+ | 5.3 =/+ | 2.4 = |
| | Recall | | 5.6 =/= | 5.6 =/= | 5.6 =/= | 5.6 = |
| Ligand-gated calcium channels | F1 | 18.5 | 0.0 −/− | | 48.5 +/− | 53.7 + |
| | Precision | 18.5 | 0.0 −/− | | 64.0 +/+ | 53.7 + |
| | Recall | 18.5 | 0.0 −/− | 39.0 +/− | 39.0 +/− | |
| Ligand-gated anion channels | F1 | | 0.0 =/= | 0.0 =/= | 0.0 =/= | 0.0 = |
| | Precision | | 0.0 =/= | 0.0 =/= | 0.0 =/= | 0.0 = |
| | Recall | | 0.0 =/= | 0.0 =/= | 0.0 =/= | 0.0 = |
ᵃ We report the statistical significance of the differences between the random predictor and each of the four predictors of ion channels, and between PSIONplusm and the other three predictors of ion channels, where +, −, and = denote that a given predictor is significantly better, significantly worse, or not significantly different from the other method. For instance, +/− for the overall F1 of IonchanPred2.0 means that the F1 of IonchanPred2.0 is significantly better than the F1 of the random predictor and significantly worse than the F1 of PSIONplusm. The comparison with the random predictor is based on the 99.9% confidence interval over the 1000 repetitions (p-value < 0.001). The comparison with PSIONplusm is based on 100 tests on randomly selected 50% subsets of the benchmark proteins, to ensure that the differences are robust across diverse datasets. Significance was measured using the paired t-test, and differences are considered significant if p-value < 0.001.
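The PSIONplusm comparison protocol in the footnote (100 paired evaluations on random 50% subsets, followed by a paired t-test at p < 0.001) can be sketched in dependency-free Python. The per-protein hit/miss data, the function names, and the use of accuracy as the scored metric are hypothetical. The p-value is computed with the closed-form finite series for Student's t with integer degrees of freedom; `scipy.stats.ttest_rel` would be the usual library choice.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided p-value of Student's t with integer df (closed-form finite series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    c2 = cos_t * cos_t
    series = 0.0
    if df % 2 == 1:  # odd df: cos + (2/3)cos^3 + (2*4)/(3*5)cos^5 + ...
        term = cos_t
        for k in range(1, (df - 1) // 2 + 1):
            series += term
            term *= c2 * (2 * k) / (2 * k + 1)
        inside = 2.0 / math.pi * (theta + sin_t * series)  # P(|T| < |t|)
    else:  # even df: 1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...
        term = 1.0
        for k in range(1, df // 2 + 1):
            series += term
            term *= c2 * (2 * k - 1) / (2 * k)
        inside = sin_t * series
    return min(1.0, max(0.0, 1.0 - inside))


def paired_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Paired t-test on matched scores; returns (t statistic, two-sided p)."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_d, sd_d = statistics.mean(diffs), statistics.stdev(diffs)
    if sd_d == 0.0:
        return (0.0, 1.0) if mean_d == 0.0 else (math.copysign(math.inf, mean_d), 0.0)
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, t_two_sided_p(t, len(diffs) - 1)


Metric = Callable[[List[int]], float]  # scores a method on a subset of protein indices


def half_sample_scores(metric_a: Metric, metric_b: Metric, n_items: int,
                       reps: int = 100, frac: float = 0.5, seed: int = 0):
    """Score both methods on the same random subsets so the scores are paired."""
    rng = random.Random(seed)
    k = int(n_items * frac)
    scores_a, scores_b = [], []
    for _ in range(reps):
        idx = rng.sample(range(n_items), k)
        scores_a.append(metric_a(idx))
        scores_b.append(metric_b(idx))
    return scores_a, scores_b


def accuracy_on(hits: List[bool]) -> Metric:
    return lambda idx: sum(hits[i] for i in idx) / len(idx)


# Hypothetical per-protein hit/miss records for two methods on 221 proteins.
hits_a = [i % 10 < 7 for i in range(221)]
hits_b = [i % 2 == 0 for i in range(221)]
a, b = half_sample_scores(accuracy_on(hits_a), accuracy_on(hits_b), 221)
t, p = paired_t_test(a, b)
print(f"t = {t:.1f}, p = {p:.2g}, significant: {p < 0.001}")
```

Because both methods are scored on the same subsets, each pair of scores shares its sampling noise, which is why a paired rather than an unpaired test fits this design.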
Figure 3. Predictive performance of sequential prediction with PSIONplusm, IonchanPred2.0, PSIONplus, the combination of predictions from IonchanPred2.0 and PSIONplus (IonchanPred2.0+PSIONplus), and the random predictor (implemented by shuffling the actual labels) on the benchmark dataset. Panel (A) summarizes the F1, precision, and recall values for the multi-label prediction of channels and their types. Panel (B) shows F1 values for individual outcomes, including the prediction of ion channels and their 8 types/subtypes. Annotations above the bars denote the statistical significance of the differences between the random predictor and each of the four predictors, where +, −, and = denote that a given predictor is significantly better, significantly worse, or not significantly different from the random predictor. The thick horizontal black lines identify the ion channel predictors that outperform the random predictor and are statistically significantly better than the other channel predictors for a given label. A difference is considered significant when p-value < 0.001. The calculation of significance is explained in the footnote to Table 3.
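The shuffled-label random baseline can be sketched as below. The sketch assumes that "shuffling the actual labels" means permuting the true label sets across proteins, and that overall precision, recall, and F1 are micro-averaged over all predicted labels; neither detail is stated in the captions, and the toy label sets are hypothetical. The 99.9% interval is taken from the empirical 0.05% and 99.95% quantiles of the 1000 shuffled F1 values.

```python
import random
import statistics
from typing import List, Set, Tuple


def micro_prf(truth: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all (protein, label) pairs."""
    tp = sum(len(t & p) for t, p in zip(truth, pred))
    n_pred = sum(len(p) for p in pred)
    n_true = sum(len(t) for t in truth)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_true if n_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def shuffled_baseline(truth: List[Set[str]], reps: int = 1000, seed: int = 0):
    """Mean F1 of a label-shuffling predictor and its empirical 99.9% interval."""
    rng = random.Random(seed)
    f1s = []
    for _ in range(reps):
        pred = list(truth)  # copy, then give each protein another protein's label set
        rng.shuffle(pred)
        f1s.append(micro_prf(truth, pred)[2])
    cuts = statistics.quantiles(f1s, n=2000, method="inclusive")
    return statistics.fmean(f1s), (cuts[0], cuts[-1])


# Hypothetical toy label sets for five proteins.
truth = [
    {"ion channel", "voltage-gated", "voltage-gated K+"},
    {"ion channel", "ligand-gated", "ligand-gated Ca2+"},
    {"non-ion channel"},
    {"non-ion channel"},
    {"ion channel", "voltage-gated", "voltage-gated Ca2+", "voltage-gated Na+"},
]
mean_f1, (low, high) = shuffled_baseline(truth)
print(f"random F1 = {mean_f1:.3f}, 99.9% interval = [{low:.3f}, {high:.3f}]")
```

A shuffle preserves the total number of predicted labels, so under micro-averaging the random predictor's precision and recall coincide; this is consistent with the identical Random precision, recall, and F1 values in most rows of the evaluation table. Under this sketch, a predictor whose F1 lies above the upper bound of the interval would be marked + against the random baseline.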
Figure 4. Web interface of the PSIONplusm webserver at https://yanglab.nankai.edu.cn/PSIONplusm/. Panel (A) shows the main page of the webserver, while panel (B) shows an example output produced by the webserver.