Julia K Varga, Gábor E Tusnády.
Abstract
Motivation: Transmembrane proteins (TMPs) are crucial to the life of cells. Because of their special properties, their structures are hard to determine: TMPs account for only 2% of the PDB database, even though they are predicted to make up as much as 25% of the human proteome. Crystallization prediction methods were developed to aid target selection for structure determination; however, there is a need for a TMP-specific service.
Year: 2018 PMID: 29718100 PMCID: PMC6137969 DOI: 10.1093/bioinformatics/bty342
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Performance of the models of the different steps with cross-validation and on the respective test sets
| Step | CV Acc | CV Sens | CV Spec | CV G-mean | CV AUC | CV MCC | Test Acc | Test Sens | Test Spec | Test G-mean | Test AUC | Test MCC | Features |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Solubilization | 0.745 | 0.701 | 0.700 | 0.700 | 0.772 | 0.368 | 0.732 | 0.694 | 0.770 | 0.731 | 0.803 | 0.421 | 156 |
| Purification | 0.812 | 0.761 | 0.758 | 0.758 | 0.820 | 0.437 | 0.734 | 0.753 | 0.717 | 0.735 | 0.813 | 0.394 | 193 |
| Crystallization | 0.763 | 0.809 | 0.807 | 0.807 | 0.885 | 0.583 | 0.795 | 0.743 | 0.847 | 0.794 | 0.875 | 0.581 | 201 |
| Whole process | 0.923 | 0.934 | 0.927 | 0.930 | 0.976 | 0.786 | 0.752 | 0.662 | 0.841 | 0.746 | 0.833 | 0.456 | –– |
Note: The numbers of features after feature selection are also included for every step.
CV, Cross-validation; Acc, Balanced accuracy; Sens, Sensitivity; Spec, Specificity and MCC, Matthews correlation coefficient.
Number of sequences in each group for every step of the crystallization process
| Step | Success (training) | Success (test) | Failure (training) | Failure (test) | Total |
|---|---|---|---|---|---|
| Solubilization | 2161 | 549 | 864 | 217 | 3833 |
| Purification | 1732 | 439 | 429 | 107 | 2735 |
| Crystallization | 543 | 152 | 1279 | 321 | 2367 |
| Whole process | 543 | 152 | 2545 | 632 | 3950 |
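The success and failure groups in this table are unevenly sized, which is one reason balanced accuracy and MCC are more informative here than plain accuracy. As a rough illustration, the sketch below computes the fraction of successful sequences in each training set from the counts above; the variable and function names are hypothetical.

```python
# Training-set counts copied from the table: (success, failure)
training_counts = {
    "Solubilization": (2161, 864),
    "Purification": (1732, 429),
    "Crystallization": (543, 1279),
    "Whole process": (543, 2545),
}


def success_fraction(success, failure):
    """Share of sequences labeled as successful."""
    return success / (success + failure)


fractions = {
    step: success_fraction(success, failure)
    for step, (success, failure) in training_counts.items()
}

for step, fraction in fractions.items():
    print(f"{step}: {fraction:.3f}")
```

Under these counts, successes are the majority class for solubilization and purification but the minority class for crystallization and for the whole process.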
Fig. 1. Performance of TMCrys. ROC curves of TMCrys for the corresponding test sets and for cross-validation: (A) solubilization, (B) purification and (C) crystallization step. (D) Comparison of the performance of TMCrys (shading: confidence interval) with existing tools for the whole process: CRYSTALP2, XtalPred, XtalPred-RF, ParCrys, Crysalis I and Crysalis II.
Performance of the different prediction methods on the test set
| Method | Acc | Sens | Spec | G-mean | AUC | MCC |
|---|---|---|---|---|---|---|
| TMCrys | 0.752 | 0.662 | 0.841 | 0.746 | 0.833 | 0.456 |
| Crysalis I | 0.493 | 0.105 | 0.881 | 0.304 | 0.510 | –0.017 |
| Crysalis II | 0.492 | 0.112 | 0.871 | 0.312 | 0.499 | –0.020 |
| XtalPred | 0.491 | 0.016 | 0.967 | 0.124 | 0.482 | –0.038 |
| XtalPred-RF | 0.577 | 0.620 | 0.524 | 0.570 | 0.578 | 0.299 |
| CRYSTALP2 | 0.572 | 0.606 | 0.538 | 0.571 | 0.593 | 0.106 |
| ParCrys | 0.445 | 0.107 | 0.783 | 0.289 | 0.564 | –0.125 |
Acc, Balanced accuracy; Sens, Sensitivity; Spec, Specificity and MCC, Matthews correlation coefficient.