Ming-Ju Tsai, Jyun-Rong Wang, Chi-Dung Yang, Kuo-Ching Kao, Wen-Lin Huang, Hsi-Yuan Huang, Ching-Ping Tseng, Hsien-Da Huang, Shinn-Ying Ho.
Abstract
Cyclic AMP receptor protein (CRP), a global regulator in Escherichia coli, regulates more than 180 genes via two roles: activation and repression. Few methods are available for predicting the regulatory roles from the binding sites of transcription factors. This work proposes an accurate method, PredCRP, to derive an optimised model (named PredCRP-model) and a set of four interpretable rules (named PredCRP-ruleset) for predicting and analysing the regulatory roles of CRP from sequences of CRP-binding sites. A dataset consisting of 169 CRP-binding sites with regulatory roles strongly supported by evidence was compiled. The PredCRP-model, which uses 12 informative features of CRP-binding sites in cooperation with a support vector machine, achieved training and test accuracies of 0.98 and 0.93, respectively. PredCRP-ruleset has two activation rules and two repression rules derived using the 12 features and the decision tree method C4.5. This work further screened and identified 23 previously unobserved regulatory interactions in Escherichia coli. Using quantitative PCR for validation, PredCRP-model and PredCRP-ruleset achieved test accuracies of 0.96 (=22/23) and 0.91 (=21/23), respectively. The proposed method is suitable for designing predictors for regulatory roles of all global regulators in Escherichia coli. PredCRP can be accessed at https://github.com/NctuICLab/PredCRP.
Year: 2018 PMID: 29343727 PMCID: PMC5772556 DOI: 10.1038/s41598-017-18648-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Components for developing and evaluating the proposed PredCRP method: (1) establishment of the CRPS dataset consisting of 169 CRP-binding sites with regulatory roles supported by strong evidence, (2) feature extraction from the training dataset CRPS-TRN, (3) feature selection in cooperation with an SVM, (4) PredCRP-ruleset obtained using the decision tree method C4.5, (5) PredCRP-model evaluated on CRP-binding sites with weak evidence (the CRPW dataset) and (6) the qPCR experimental validation of the regulatory roles of CRP.
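The feature-extraction step can be illustrated with a minimal sketch of counting overlapping 4-mers in a binding-site sequence. This is not the authors' implementation: the function name `kmer_counts` and the example sequence are assumptions made for illustration, and the source does not specify how its motif or composition features are encoded.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k=4):
    """Count overlapping k-mers in a DNA sequence (hypothetical helper)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed feature order: every possible k-mer over ACGT, zero-filled.
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: counts.get(kmer, 0) for kmer in all_kmers}


# Illustrative 22-bp sequence only; it is not taken from the CRPS dataset.
site = "AAATGTGATCTAGATCACATTT"
features = kmer_counts(site)
print(len(features), features["TGTG"], features["GATC"])
```

A fixed, zero-filled feature order keeps every site's vector the same length, which is what a classifier such as an SVM would need as input.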
Prediction performance comparisons between PredCRP-model and the SVM-based methods with various feature sets on the CRPS-TRN dataset.
| Feature set | No. of features | SPE | SEN | MCC | ACC |
|---|---|---|---|---|---|
| PredCRP-model | 12 | 1.00 | 0.92 | 0.95 | 0.98 |
| Informative 4-mer motifs | 8 | 0.99 | 0.38 | 0.52 | 0.86 |
| Informative location features | 4 | 1.00 | 0.29 | 0.49 | 0.85 |
| All features (baseline) | 380 | 0.98 | 0.29 | 0.41 | 0.83 |
| Composition descriptor | 320 | 0.98 | 0.17 | 0.26 | 0.81 |
| Location-dependent descriptor | 17 | 0.99 | 0.29 | 0.45 | 0.84 |
| Location-independent descriptor | 363 | 0.99 | 0.17 | 0.31 | 0.81 |
Prediction performance comparisons between PredCRP-model and the SVM-based method with various feature sets on the CRPS-TST dataset.
| Feature set | No. of features | SPE | SEN | MCC | ACC |
|---|---|---|---|---|---|
| PredCRP-model | 12 | 0.95 | 0.83 | 0.79 | 0.93 |
| Informative 4-mer motifs | 8 | 0.97 | 0.25 | 0.36 | 0.82 |
| Informative location features | 4 | 0.98 | 0.25 | 0.36 | 0.82 |
| All features (baseline) | 380 | 0.98 | 0.17 | 0.26 | 0.80 |
| Composition descriptor | 320 | 0.93 | 0.33 | 0.33 | 0.80 |
| Location-dependent descriptor | 17 | 1.00 | 0.08 | 0.26 | 0.80 |
| Location-independent descriptor | 363 | 0.93 | 0.33 | 0.33 | 0.80 |
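The four columns in the tables above follow the standard definitions of specificity (SPE), sensitivity (SEN), Matthews correlation coefficient (MCC) and accuracy (ACC). The sketch below computes them from a confusion matrix. The counts are hypothetical: the source does not report a confusion matrix or state which regulatory role is treated as the positive class, and the values were chosen only because they round to the PredCRP-model row of the CRPS-TST table.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return SPE, SEN, MCC and ACC for a binary confusion matrix."""
    sen = tp / (tp + fn)                      # true positive rate
    spe = tn / (tn + fp)                      # true negative rate
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SPE": spe, "SEN": sen, "MCC": mcc, "ACC": acc}


# Hypothetical counts (assumption, not reported in the source).
m = binary_metrics(tp=10, fn=2, tn=42, fp=2)
print({name: round(value, 2) for name, value in m.items()})
```

With a minority class this small, MCC and SEN separate the feature sets far more clearly than ACC does, which is why the baselines can reach an ACC near 0.80 while their SEN stays below 0.35.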
Figure 2. ROC curves of various methods using the CRPS-TST dataset. The AUCs of PredCRP-model and SVMs with informative 4-mer motifs, informative location features, all features, composition feature, location-dependent feature, and location-independent feature were 0.71, 0.73, 0.90, 0.79, 0.61, 0.79, and 0.70, respectively.
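For readers unfamiliar with the AUC, it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The scores are made up for illustration, and the source does not describe how its AUCs were calculated.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair (rank-statistic definition).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores (assumption, not from the paper).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2, 0.1]
value = auc(pos, neg)
print(round(value, 3))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a test set of this size; a sort-based rank computation would be the usual choice for larger data.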
Figure 3. Four interpretable rules for illustrating the regulatory roles of CRP acting on the binding region.
Figure 4. Quantitative PCR experiments for determining the regulatory roles of CRP in the 23 previously unobserved regulatory interactions in E. coli. The y-axis represents the relative quantity. The whiskers show the standard deviation of the relative quantity. The black bars represent the control group (0 mM cAMP), and the white bars represent the case group (1 mM cAMP).