Youfang Cao, Shi Liu, Lida Zhang, Jie Qin, Jiang Wang, Kexuan Tang.
Abstract
BACKGROUND: A new method for predicting protein structural classes is constructed based on the Rough Sets algorithm, a rule-based data mining method. Amino acid compositions and data on 8 physicochemical properties are used as conditional attributes to construct the decision system. After the decision system is reduced, decision rules are generated, which can be used to classify new objects.
Year: 2006 PMID: 16412240 PMCID: PMC1363362 DOI: 10.1186/1471-2105-7-20
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
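The abstract names amino acid composition as one family of conditional attributes. A minimal sketch of how such a 20-component composition vector might be computed is given here; the function name, the 20-letter residue alphabet, and the choice to skip non-standard residue codes are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from typing import Dict

# Assumption: the 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> Dict[str, float]:
    """Return the fraction of each standard amino acid in a sequence.

    Characters outside the 20-letter alphabet (e.g. 'X') are ignored,
    which is an illustrative choice rather than the paper's rule.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical toy sequence, not a real protein domain.
composition = amino_acid_composition("AACD")
```

Each domain would then contribute one row of 20 fractions, to which the 8 physicochemical property values could be appended as further attributes.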
Results of self-consistency test (rate of correct prediction for each class and overall rate of accuracy)
| Dataset | Algorithm | All-α | All-β | α/β | α+β | Overall |
| 277 domains | Component coupled | 95.7% | 93.4% | 95.1% | 92.3% | 94.2% |
| 277 domains | Neural network | 98.6% | 93.4% | 96.3% | 84.6% | 93.5% |
| 277 domains | SVM | 100% | 100% | 100% | 100% | 100% |
| 277 domains | Rough Sets | 100% | 100% | 100% | 100% | 100% |
| 498 domains | Component coupled | 95.8% | 95.2% | 94.9% | 95.4% | 95.8% |
| 498 domains | Neural network | 100% | 98.4% | 96.3% | 84.5% | 94.6% |
| 498 domains | SVM | 100% | 100% | 100% | 100% | 100% |
| 498 domains | Rough Sets | 100% | 100% | 100% | 100% | 100% |
Results of jackknife test (rate of correct prediction for each class and overall rate of accuracy)
| Dataset | Algorithm | All-α | All-β | α/β | α+β | Overall |
| 277 domains | Component coupled | 84.3% | 82.0% | 81.5% | 67.7% | 79.1% |
| 277 domains | Neural network | 68.6% | 85.2% | 86.4% | 56.9% | 74.7% |
| 277 domains | SVM | 74.3% | 82.0% | 87.7% | 72.3% | 79.4% |
| 277 domains | Rough Sets | 77.1% | 77.0% | 93.8% | 66.2% | 79.4% |
| 498 domains | Component coupled | 93.5% | 88.9% | 90.4% | 84.5% | 89.2% |
| 498 domains | Neural network | 86.0% | 96.0% | 88.2% | 86.0% | 89.2% |
| 498 domains | SVM | 88.8% | 95.2% | 96.3% | 91.5% | 93.2% |
| 498 domains | Rough Sets | 87.9% | 91.3% | 97.1% | 86.0% | 90.8% |
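A jackknife (leave-one-out) test predicts each domain with a model built from all the remaining domains. The sketch below shows only the general procedure. The `nearest_neighbour` classifier is a hypothetical stand-in chosen to keep the example self-contained; it is not the Rough Sets rule induction used in the study, and the toy feature vectors are invented for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
Classifier = Callable[[Sequence[Vector], Sequence[str], Vector], str]


def jackknife_accuracy(
    features: Sequence[Vector], labels: Sequence[str], classify: Classifier
) -> float:
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    if len(features) != len(labels) or len(features) < 2:
        raise ValueError("need at least two labelled samples")
    correct = 0
    for i in range(len(features)):
        # Build the training set without sample i.
        train_x = [x for j, x in enumerate(features) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        if classify(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def nearest_neighbour(
    train_x: Sequence[Vector], train_y: Sequence[str], query: Vector
) -> str:
    """Hypothetical stand-in classifier: 1-NN with squared Euclidean distance."""
    def dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    best = min(range(len(train_x)), key=lambda j: dist(train_x[j], query))
    return train_y[best]


# Invented toy data: two well-separated classes in one dimension.
toy_x = [[0.0], [0.1], [1.0], [1.1]]
toy_y = ["All-α", "All-α", "All-β", "All-β"]
toy_accuracy = jackknife_accuracy(toy_x, toy_y, nearest_neighbour)
```

Because every sample is held out exactly once, the procedure has no random split to tune, which is why jackknife rates are typically lower and more informative than self-consistency rates.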
Statistics of decision rules
| Class | Rules (277 domains) | Percentage | Rules (498 domains) | Percentage |
| α | 11711 | 25.10% | 12744 | 24.29% |
| β | 10250 | 21.97% | 11211 | 21.36% |
| α/β | 11886 | 25.48% | 13771 | 26.24% |
| α+β | 12804 | 27.45% | 14748 | 28.11% |
| Total | 46651 | | 52474 | |
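The percentage columns can be recomputed from the rule counts as each class's share of the column total. A small sketch using the 277-domain counts from the table follows; the variable and function names are illustrative.

```python
from typing import Dict

# Decision-rule counts per class for the 277-domain dataset (from the table).
rule_counts_277 = {"α": 11711, "β": 10250, "α/β": 11886, "α+β": 12804}


def percentage_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's share of the total count, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


shares_277 = percentage_shares(rule_counts_277)
for name, share in shares_277.items():
    print(f"{name}: {share:.2f}%")
```

The same function applies unchanged to the 498-domain counts.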
Figure 1. Comparison between the actual and predicted numbers of objects in the 4 protein structural classes.
The composition of two datasets in this study
| Dataset | All-α | All-β | α/β | α+β | Total |
| 277 domains | 70 | 61 | 81 | 65 | 277 |
| 498 domains | 107 | 126 | 136 | 129 | 498 |
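The overall rate of accuracy reported for each algorithm is the average of the per-class rates weighted by class size. As a hedged consistency check, the sketch below combines the 277-domain class sizes from this table with the per-class jackknife rates reported for Rough Sets on the same dataset; the function and variable names are illustrative.

```python
from typing import Dict

# Class sizes of the 277-domain dataset (from the composition table).
class_sizes_277 = {"All-α": 70, "All-β": 61, "α/β": 81, "α+β": 65}

# Per-class jackknife rates reported for Rough Sets on 277 domains.
rough_sets_jackknife_277 = {
    "All-α": 0.771,
    "All-β": 0.770,
    "α/β": 0.938,
    "α+β": 0.662,
}


def overall_accuracy(rates: Dict[str, float], sizes: Dict[str, int]) -> float:
    """Size-weighted mean of per-class rates of correct prediction."""
    total = sum(sizes.values())
    return sum(rates[name] * sizes[name] for name in sizes) / total


overall_277 = overall_accuracy(rough_sets_jackknife_277, class_sizes_277)
print(f"{overall_277:.1%}")  # prints 79.4%
```

The result agrees with the 79.4% overall jackknife accuracy reported for Rough Sets on the 277-domain dataset.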
Figure 2. Pipeline script for the cross-validation test on the Rosetta system.