Altan Kara, Martin Vickers, Martin Swain, David E Whitworth, Narcis Fernandez-Fuentes.
Abstract
BACKGROUND: Two-component systems (TCS) are signalling complexes composed of a histidine kinase (receptor) and a response regulator (effector). They are the most abundant signalling pathways in prokaryotes and control a wide range of biological processes. The pairing of these two components is highly specific, and identifying cognate pairs often requires costly and time-consuming experimental characterisation. Therefore, there is considerable interest in developing accurate prediction tools to lessen the burden of experimental work and cope with the ever-increasing amount of genomic information.
Year: 2015 PMID: 26384938 PMCID: PMC4575426 DOI: 10.1186/s12859-015-0741-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic representation of MetaPred2CS. Individual predictions are performed for given pairs of HK and RR (a). The prediction scores are then used as the input vector for the SVM (b) trained on the P+ and P- sets (c). Finally, prediction scores are scaled from 0 to 1 (d)
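The data flow in Fig. 1 can be illustrated with a minimal sketch. It assumes the six individual methods listed in the tables (i2h, MT, GF, PP, GN, GO) each return one score per HK-RR pair, and that the final 0 to 1 scaling is a simple min-max rescaling of the classifier outputs. Both the function names and the choice of min-max scaling are assumptions made for illustration; this record does not state how MetaPred2CS performs the scaling, and the SVM step itself is left out.

```python
# Hypothetical sketch of steps (a)-(b) and (d) in Fig. 1.
# The SVM in step (b)/(c) is not implemented here.

# Fixed method order so every pair yields a vector with the same layout.
METHODS = ("i2h", "MT", "GF", "PP", "GN", "GO")


def feature_vector(scores):
    """Arrange per-method scores for one HK-RR pair into an ordered list."""
    missing = [m for m in METHODS if m not in scores]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    return [float(scores[m]) for m in METHODS]


def min_max_scale(values):
    """Rescale raw classifier outputs linearly to the range 0 to 1.

    Assumption: min-max scaling; the source only says scores are
    'scaled from 0 to 1'.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All outputs identical: no ordering information to preserve.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy per-method scores for one pair (made-up numbers).
pair_scores = {"i2h": 0.7, "MT": 0.4, "GF": 0.0, "PP": 0.2, "GN": 1.0, "GO": 1.0}
vector = feature_vector(pair_scores)

# Toy raw decision values standing in for SVM outputs on three pairs.
raw_outputs = [-1.5, 0.0, 2.5]
scaled = min_max_scale(raw_outputs)
```

Keeping the method order fixed in one tuple means the same layout is used at training and prediction time, which is what any vector-based classifier requires.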
AUC values of predictions by individual methods for the P+/P-, NP+/P- and OP+/P- datasets. The GN and GO methods were not included in the AUC comparison because the large genomic distance between pairs in the P- dataset made these predictions unfeasible
| Dataset | i2h | MT | GF | PP | GN | GO |
|---|---|---|---|---|---|---|
| P+/P- | 0.84 | 0.66 | 0.58 | 0.57 | N/A | N/A |
| NP+/P- | 0.90 | 0.69 | 0.60 | 0.55 | N/A | N/A |
| OP+/P- | 0.78 | 0.63 | 0.55 | 0.59 | N/A | N/A |
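For readers unfamiliar with the AUC values reported above, the following is a minimal sketch of how an AUC can be computed from prediction scores. It uses the standard rank-based interpretation (the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half). The function name and the toy scores are hypothetical and are not taken from the study.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of positive/negative score pairs ranked correctly.

    Ties contribute 0.5. Runs in O(len(pos) * len(neg)), which is fine
    for small illustrative inputs.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy scores (made up): three interacting and three non-interacting pairs.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
toy_auc = auc_from_scores(pos, neg)  # 8 of 9 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which puts the GF and PP values in the table close to chance and the i2h values well above it.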
Combinations of prediction methods and prediction performance at 10-fold cross-validation. 1: i2h not included, 2: MT not included, 3: GF not included, 4: PP not included, 5: GN not included, 6: GO not included, 7: GN and GO not included, 8: all methods included. AUC and MCC represent the area under the ROC curve and the Matthews correlation coefficient, respectively
| Combinations | AUC Values | MCC Values |
|---|---|---|
| 1 : i2h method excluded | 88.86 | 0.401 |
| 2 : MT method excluded | 94.69 | 0.500 |
| 3 : GF method excluded | 94.45 | 0.484 |
| 4 : PP method excluded | 91.89 | 0.414 |
| 5 : GN method excluded | 94.04 | 0.454 |
| 6 : GO method excluded | 94.76 | 0.504 |
| 7 : GN/GO methods excluded | 90.15 | 0.408 |
| 8 : all methods included | 94.79 | 0.508 |
Performance of default predictor on species-specific gene sets. Sensitivity, specificity, accuracy and MCC values are presented, as defined in the text
| Species used as test data | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|
|  | 0.82 | 0.86 | 0.85 | 0.607 |
|  | 0.92 | 0.87 | 0.87 | 0.582 |
|  | 0.81 | 0.86 | 0.77 | 0.477 |
|  | 0.75 | 0.89 | 0.88 | 0.476 |
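The four measures in the table above can be computed from confusion-matrix counts. The sketch below uses the standard textbook definitions of sensitivity, specificity, accuracy and the Matthews correlation coefficient; the definitions given in the article text are not part of this record, so treat these as an assumption. The function name and the example counts are hypothetical.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy and MCC from raw counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Made-up counts: 10 true pairs and 100 non-pairs.
metrics = confusion_metrics(tp=8, fp=15, tn=85, fn=2)
```

With imbalanced data such as this toy example, accuracy is dominated by the larger negative class, which is why MCC is usually reported alongside it.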
Prediction performance of default predictor on neighbouring and orphan pairs. AUC and MCC values for MetaPred2CS trained on the NP+/P- and OP+/P- datasets at different levels of k-fold cross-validation
| Dataset | AUC (5-fold) | MCC (5-fold) | AUC (10-fold) | MCC (10-fold) | AUC (20-fold) | MCC (20-fold) |
|---|---|---|---|---|---|---|
| NP+/P- | 98.79 | 0.639 | 98.40 | 0.639 | 98.75 | 0.634 |
| OP+/P- | 90.28 | 0.409 | 89.36 | 0.407 | 90.31 | 0.410 |
Fig. 2 ROC curves of predictions on the NP+/P-, P+/P- and OP+/P- datasets using default predictor. Blue, black and red ROC curves represent the performance on the NP+/P-, P+/P- and OP+/P- datasets, respectively
Prediction of the T dataset by the Bayesian approach [32] and MetaPred2CSc. Non-interacting protein pairs are marked with an asterisk and best predictions are highlighted in bold
| Type (a) | Protein Pairs | Bayesian Approach | MetaPred2CS |
|---|---|---|---|
| IT | CC0248 - CC0247 |  | 0.894 |
| IT | CC0289 - CC0294 |  | 0.633 |
| IT | CC2755 - CC2757 |  | 0.164 |
| IT | CC2765 - CC2766 |  | 0.852 |
| IT | CC2932 - CC2931 | 0.945 |  |
| IT | CenK - CenR |  | 0.491 |
| IT | CckN - DivK | 0.306 |  |
| IT | ChpT - CtrA | 0.197 |  |
| IT | ChpT - CpdR | 0.001 |  |
| IT | DivJ - CtrA | 0.461 |  |
| IT | DivJ - PleD | 0.385 |  |
| IT | DivJ - DivK | 0.041 |  |
| IT | DivL - DivK | 0.537 |  |
| IT | DivL - CtrA | 0.130 |  |
| IT | PleC - DivK | 0.080 |  |
| IT | PleC - PleD | 0.001 |  |
| NI | ChpT - CC3477* | 0.607 |  |
| NI | ChpT - CC2757* | 0.128 |  |
| NI | ChpT - CenR* | 0.067 |  |
| NI | PleC - CtrA* |  | 0.022 |
| NI | PleC - CC3477* | 0.001 |  |
(a) IT: interacting pair; NI: non-interacting pair
Fig. 3 ROC curves of predictions on the SP+/SP- datasets. Red, blue and green ROC curves represent predictions by MetaPred2CS, STRING [33], and the Bayesian approach of Burger and van Nimwegen [21], respectively