You Jung Kim, Jignesh M Patel.
Abstract
BACKGROUND: Protein structure classification plays a central role in understanding the function of a protein molecule with respect to all known proteins in a structure database. With the rapid increase in the number of new protein structures, automated and accurate methods for protein classification are becoming increasingly important.
Year: 2006 PMID: 17042958 PMCID: PMC1622760 DOI: 10.1186/1471-2105-7-456
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Classification results of proCC for new domains in SCOP 1.69
| SCOP level | Classified correctly (CC) | Classified incorrectly (CI) | Unclassified, new class (UN) | Unclassified, existing class (UE) | Total, new classes (TN) | Total, existing classes (TE) | | | |
| Family | 4008 | 347 | 555 | 379 | 726 | 4563 | 86.3% | 8.0% | 76.5% |
| Superfamily | 4321 | 154 | 292 | 522 | 353 | 4936 | 87.2% | 3.4% | 82.7% |
| Fold | 4597 | 159 | 153 | 380 | 209 | 5080 | 90.1% | 3.3% | 75.0% |
This table shows the result of classifying 5298 new domains in SCOP 1.69 using proCC.
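The count columns can be cross-checked with a little arithmetic. The sketch below assumes that CC/CI are domains assigned to an existing SCOP class (correctly or incorrectly), UN/UE are domains labeled unclassified (truly new or actually existing), and TN/TE are the totals of new and existing domains; these readings and the two ratios computed are assumptions, not definitions taken from the paper. Under this reading every row partitions the same 5289 domains (the caption gives 5298), CI/(CC+CI) reproduces the middle percentage column at all three levels, and (CC+UN)/(TN+TE) reproduces the first percentage column for the family and superfamily rows but gives 89.8% rather than 90.1% for the fold row, so the paper's exact definition may differ.

```python
# Counts copied from the proCC classification table (new domains in SCOP 1.69).
# Column meanings are an assumed reading, not definitions from the paper.
ROWS = {
    "Family":      dict(CC=4008, CI=347, UN=555, UE=379, TN=726, TE=4563),
    "Superfamily": dict(CC=4321, CI=154, UN=292, UE=522, TN=353, TE=4936),
    "Fold":        dict(CC=4597, CI=159, UN=153, UE=380, TN=209, TE=5080),
}


def check_row(r):
    """Return (total, partition_ok, error_pct, correct_pct) for one SCOP level."""
    total = r["TN"] + r["TE"]
    # Each domain should fall into exactly one of the four outcome columns.
    partition_ok = r["CC"] + r["CI"] + r["UN"] + r["UE"] == total
    # Share of domains assigned to an existing class that received the wrong label.
    error_pct = round(100.0 * r["CI"] / (r["CC"] + r["CI"]), 1)
    # Share of all domains either classified correctly or correctly flagged as new.
    correct_pct = round(100.0 * (r["CC"] + r["UN"]) / total, 1)
    return total, partition_ok, error_pct, correct_pct


if __name__ == "__main__":
    for level, row in ROWS.items():
        print(level, check_row(row))
```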
The comparison between SGM and proCC
| SCOP level | SGM | proCC | SGM | proCC | SGM | proCC |
| Family | 71.3% | 86.3% | 19.7% | 8.0% | 77.4% | 76.5% |
| Superfamily | 69.6% | 87.2% | 17.0% | 3.4% | 82.2% | 82.7% |
| Fold | 71.3% | 90.1% | 15.7% | 3.3% | 76.6% | 75.0% |
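This table compares SGM and proCC for classifying the 5298 new domains in SCOP 1.69. Each pair of columns reports one of the three percentage measures from the proCC classification table, in the same order; the proCC values are identical to those reported there.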
The comparison between SCOPmap and proCC using the predicted SCOP superfamily labels
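| Method | Classified correctly (CC) | Classified incorrectly (CI) | Unclassified, new class (UN) | Unclassified, existing class (UE) | Incorrectly identified as multi-domain (ID) | | | Running time |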
| SCOPmap | 2069 | 65 | 190 | 212 | 237 | 81.5% | 61.9% | 2–3 hours per query |
| proCC | 2025 | 75 | 246 | 275 | 152 | 81.9% | 80.1% | 9 minutes per query |
This table shows the result of classifying 2773 single-domain chains in SCOP 1.69. All numbers in columns 2–6 are counts of chains (equivalently, of domains, since only single-domain chains were used). Columns 2–3 show the number of chains that are correctly identified as single-domain chains and are classified into known superfamilies, correctly (CC) or incorrectly (CI). Columns 4–5 show the number of chains that are correctly identified as single-domain chains and are labeled as unclassified. Column 6 shows the number of single-domain chains that are incorrectly identified as multi-domain chains.
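The chain counts can be checked the same way. Columns 2–6 of each row sum to 2773, and reading CC plus UN as the correct decisions (an assumption about the column meanings) reproduces the first percentage column for both methods. The per-query times (2–3 hours versus 9 minutes) correspond to a speed-up of roughly 13 to 20 times for proCC. The dictionary layout and the function name are illustrative only.

```python
# Chain counts copied from the SCOPmap vs. proCC table (2773 single-domain chains).
ROWS = {
    "SCOPmap": dict(CC=2069, CI=65, UN=190, UE=212, ID=237),
    "proCC":   dict(CC=2025, CI=75, UN=246, UE=275, ID=152),
}


def summarize(r):
    """Return (total_chains, correct_pct) under the assumed column reading."""
    total = sum(r.values())  # columns 2-6 together should cover every chain
    # Assumed: correct = classified correctly (CC) + correctly flagged as new (UN).
    correct_pct = round(100.0 * (r["CC"] + r["UN"]) / total, 1)
    return total, correct_pct


# Per-query running time: 2-3 hours (120-180 min) for SCOPmap vs. 9 minutes for proCC.
SPEEDUP_RANGE = (120 / 9, 180 / 9)

if __name__ == "__main__":
    for method, row in ROWS.items():
        print(method, summarize(row))
    print("speed-up: %.1fx to %.1fx" % SPEEDUP_RANGE)
```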
The clustering effectiveness at the SCOP family, superfamily, and fold levels
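| SCOP level | SCOP classes | Generated clusters | Correctly mapped clusters | Correctly labeled domains |
| Family | 320 | 358 | 301 | 822 (88%) |
| Superfamily | 260 | 327 | 234 | 731 (78%) |
| Fold | 200 | 318 | 191 | 670 (72%) |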
This table shows the result of clustering 934 unclassified domains at the SCOP family, superfamily, and fold levels. Column 2 shows the number of SCOP families, superfamilies, and folds across which these 934 domains are spread. Column 3 shows the number of automatically generated clusters at each SCOP level. Column 4 shows the number of clusters that were correctly mapped to a SCOP class. Column 5 shows the number of domains in these clusters whose label matched that of the corresponding SCOP class, with the percentage of all 934 domains in parentheses.
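The parenthesized percentages are fractions of the 934 clustered domains. The sketch recomputes them and adds one derived view that the table does not report: the share of SCOP classes at each level that received a correctly mapped cluster, which assumes each mapped cluster corresponds to a distinct SCOP class.

```python
TOTAL_DOMAINS = 934  # unclassified domains that were clustered

# (SCOP classes at this level, generated clusters, correctly mapped clusters,
#  correctly labeled domains), copied from the clustering table.
TABLE = {
    "Family":      (320, 358, 301, 822),
    "Superfamily": (260, 327, 234, 731),
    "Fold":        (200, 318, 191, 670),
}


def derived(row):
    """Return (% of the 934 domains correctly labeled, % of SCOP classes mapped)."""
    n_classes, _n_clusters, n_mapped, n_domains = row
    domain_pct = round(100 * n_domains / TOTAL_DOMAINS)
    # Assumes each correctly mapped cluster matches a distinct SCOP class.
    class_pct = round(100 * n_mapped / n_classes, 1)
    return domain_pct, class_pct


if __name__ == "__main__":
    for level, row in TABLE.items():
        print(level, derived(row))
```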
Figure 1. Assessing the quality of the automatically generated clusters. This figure shows the automatically generated family-level clusters for the unclassified domains in the SCOP 1.69 "d" class (i.e., the alpha and beta proteins (a+b)), together with a representative domain structure for each cluster. Each connected graph corresponds to one automatically detected MCL cluster, and the ellipses indicate the novel families in SCOP 1.69. Each MCL cluster is assigned the family-level label that occurs most often among its members; within a cluster, nodes of the same color share the same family-level label. To keep the figure simple, only clusters with more than four domains are shown. An additional 79 clusters matched the SCOP family label, and 30 of these correspond to new families in SCOP 1.69. This figure was generated using BioLayout [31] and PyMol [32].
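The clustering and labeling steps in Figure 1 can be illustrated with a small pure-Python sketch of the Markov Cluster (MCL) algorithm followed by the majority-vote labeling the caption describes. Assumptions: the input is a symmetric adjacency (or similarity) matrix between domains, since how proCC builds this graph is not stated here; the inflation value of 2.0, the helper names, and the example labels are hypothetical. This is a generic MCL, not the authors' or BioLayout's implementation.

```python
from collections import Counter


def _normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov Cluster algorithm on a symmetric (weighted) adjacency matrix."""
    n = len(adj)
    # Adding self-loops is a standard MCL preprocessing step.
    m = _normalize_columns(
        [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion: two-step random walk
        inflated = _normalize_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation
        diff = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if diff < tol:
            break
    # Rows with a non-zero diagonal are attractors; their non-zero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n) if m[i][i] > 1e-6}
    return sorted(sorted(c) for c in clusters)


def label_clusters(clusters, labels):
    """Give each cluster its most common member label: (label, hits, size)."""
    result = []
    for members in clusters:
        label, hits = Counter(labels[i] for i in members).most_common(1)[0]
        result.append((label, hits, len(members)))
    return result


if __name__ == "__main__":
    # Hypothetical graph: two groups of mutually similar domains.
    adj = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    labels = ["fam_A", "fam_A", "fam_B", "fam_C", "fam_C", "fam_C"]  # hypothetical
    clusters = mcl(adj)
    print(clusters, label_clusters(clusters, labels))
```

Inflation controls granularity: larger values tend to split the graph into more, smaller clusters.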
Figure 2. Visualization of the classification decision boundary. This figure shows the classification boundary created for entries in SCOP 1.67 using SCOP 1.65 as the database. An SVM is used to learn the boundary between "Classified" and "Unclassified" entries, and the trained SVM is then used to predict class labels for SCOP 1.69.
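A linear SVM that separates "Classified" from "Unclassified" entries can be sketched as below. The two input features are hypothetical placeholder scores (the caption does not list proCC's SVM features), the kernel is assumed to be linear, and training uses Pegasos-style stochastic subgradient steps over the data in a fixed order instead of a library solver. It illustrates a learned decision boundary, not the authors' configuration.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=2000):
    """Linear soft-margin SVM trained with Pegasos-style subgradient steps.

    points: list of (x1, x2) feature pairs; labels: +1 (Classified) / -1 (Unclassified).
    A constant 1.0 feature is appended so the last weight acts as a (regularized) bias.
    """
    w = [0.0, 0.0, 0.0]
    t = 0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x = (x1, x2, 1.0)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss active: move w toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, point):
    """Return +1 (Classified) or -1 (Unclassified) from the sign of the decision value."""
    score = w[0] * point[0] + w[1] * point[1] + w[2]
    return 1 if score >= 0 else -1


# Hypothetical 2-D scores per entry; the real proCC features are not given here.
POINTS = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.95), (0.1, 0.2), (0.2, 0.1), (0.15, 0.05)]
LABELS = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    w = train_linear_svm(POINTS, LABELS)
    print("weights:", w)
    print([predict(w, p) for p in [(0.95, 0.9), (0.05, 0.1)]])
```

In the workflow the caption describes, such a boundary would be fitted on SCOP 1.67 entries (labeled against SCOP 1.65) and then applied to SCOP 1.69 entries.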