Abstract
Proteins are diverse in their sequences, structures, and functions, and it is important to study the relations among them. In this paper, we survey the relations between protein sequences and their structures. We use the natural vector (NV) and the averaged property factor (APF) features to represent protein sequences as feature vectors, and we use the multi-class MSE and convex hull methods to separate proteins of different structural classes into different regions. We found that proteins from different structural classes are separable by hyperplanes and convex hulls in the natural vector feature space: the feature vectors of different structural classes fall into disjoint regions or convex hulls of the high-dimensional feature space. The natural vector outperforms the averaged property factor method in identifying the structures, and the convex hull method outperforms the multi-class MSE in separating the feature points. These outcomes support a strong connection between protein sequences and their structures, and may imply that the amino acid composition and sequence arrangement represented by the natural vectors have a greater influence on the structures than the averaged physical property factors of the amino acids.
Year: 2019 PMID: 31869390 PMCID: PMC6927603 DOI: 10.1371/journal.pone.0226768
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Information for the CATH datasets.
| Datasets | CATH I | CATH II | CATH III | |
|---|---|---|---|---|
| 458 | 536 | 8321 | ||
| 10 | 14 | 1673 | ||
| 157 | 195 | |||
| 10 | 11 | 1772 | ||
| 141 | 145 | |||
| 10 | 15 | 4876 | ||
| 160 | 196 | |||
This table shows the group and sequence statistics for the three CATH datasets.
The classification results for the 30 CATH groups by the multi-class MSE method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 92.99 | 90.07 | 78.13 | |
| 83.44 | 53.90 | 60.62 | |
| 91.72 | 67.38 | 71.25 | |
| 93.63 | 90.07 | 83.75 | |
| 95.54 | 91.49 | 88.12 | |
| 93.63 | 78.02 | 78.02 | |
| 96.18 | 89.36 | 90.00 | |
| 60.49 | 53.33 | 55.51 | |
| 88.45 | 76.70 | 75.68 | |
This table shows the MSE classification rates for the 30 CATH groups with different feature methods.
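The multi-class MSE rates above are reported per structural class. As a rough illustration of how such a classifier and its per-class rates could be computed, here is a minimal sketch that assumes the common formulation: one linear discriminant per class, fitted by least squares against one-hot targets, with each point assigned to the class of largest score. The exact formulation used in the paper is not given on this page, and the names `solve`, `fit_mse`, `predict`, `class_rates`, the small `ridge` term, and the toy 2D data are all illustrative assumptions.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_mse(X, labels, n_classes, ridge=1e-8):
    """Least-squares fit of one linear discriminant per class (one-hot targets)."""
    Xa = [list(x) + [1.0] for x in X]  # append a constant bias feature
    d = len(Xa[0])
    Y = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    # Normal equations (X^T X + ridge * I) W = X^T Y; ridge is only for numerical safety.
    XtX = [[sum(r[i] * r[j] for r in Xa) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    XtY = [[sum(r[i] * t[c] for r, t in zip(Xa, Y)) for c in range(n_classes)]
           for i in range(d)]
    return solve(XtX, XtY)  # d x n_classes weight matrix


def predict(W, x):
    """Assign x to the class whose linear discriminant gives the largest score."""
    xa = list(x) + [1.0]
    scores = [sum(xa[i] * W[i][c] for i in range(len(xa))) for c in range(len(W[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


def class_rates(W, X, labels, n_classes):
    """Percentage of correctly classified points within each true class."""
    rates = []
    for c in range(n_classes):
        idx = [i for i, y in enumerate(labels) if y == c]
        hits = sum(predict(W, X[i]) == c for i in idx)
        rates.append(100.0 * hits / len(idx))
    return rates


# Toy data (hypothetical): three well-separated clusters standing in for three classes.
X = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 0), (5, 1), (0, 5), (1, 5), (0, 6)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
W = fit_mse(X, labels, 3)
rates = class_rates(W, X, labels, 3)
```

On this toy data the three clusters are linearly separable, so every class is recovered. Real feature vectors are higher-dimensional and the classes overlap, which is why the table reports rates below 100%.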
The classification results for the 30 CATH groups by the convex hull method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | ||
| 100 | 87.94 | 89.38 | ||
| 100 | 87.94 | 89.38 | ||
| 98.09 | 85.82 | 88.75 | ||
| 100 | 87.50 | 97.50 | ||
| 80.89 | 73.76 | 79.37 | ||
| 89.17 | 82.98 | 80.63 | ||
| 79.62 | 86.52 | 89.38 | ||
| 92.36 | 85.82 | 89.38 | ||
| 100 | 85.11 | 89.38 | ||
| 62.75 | 69.79 | 59.71 | ||
| 90.29 | 83.32 | 85.29 | ||
This table shows the convex hull classification rates for the 30 CATH groups, where the natural vectors and PseAAC vectors are partitioned into 10 dimensions. N1 refers to the first 10 dimensions of the natural vector, which are the numbers for amino acids A,R,N,D,C,Q,E,G,H,I; N2 refers to the second 10 dimensions of the natural vector, which are the numbers for amino acids L,K,M,F,P,S,T,W,Y,V. The other labels are similarly defined.
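The caption above fixes the amino acid order of the natural vector and its split into 10-dimensional blocks. The sketch below illustrates this under stated assumptions: it uses the commonly cited 60-dimensional definition of the natural vector (for each amino acid, its count, its mean position, and its second central moment of position normalized by count times sequence length), 1-based positions, and zeros for amino acids that do not occur. Reading N3 to N6 as the two halves of the mean-position and moment blocks is likewise an assumption based on "similarly defined". The names `natural_vector` and `block` are illustrative.

```python
# Amino acid order as given in the caption: N1 = A..I, N2 = L..V.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def natural_vector(seq):
    """Return a 60-dim vector: 20 counts, 20 mean positions, 20 normalized moments."""
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        # 1-based positions of this amino acid (indexing convention is an assumption).
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]
        n_k = len(positions)
        mu_k = sum(positions) / n_k if n_k else 0.0
        # Second central moment normalized by n_k * n; defined as 0 when absent.
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def block(vec, j):
    """Return the j-th 10-dimensional block (j = 1..6); block(vec, 1) is N1."""
    return vec[10 * (j - 1): 10 * j]


nv = natural_vector("ARNAV")  # toy sequence, not from the datasets
n1, n2 = block(nv, 1), block(nv, 2)
```

For the toy sequence `ARNAV`, `n1` holds the counts of A, R, N, D, C, Q, E, G, H, I and `n2` the counts of L, K, M, F, P, S, T, W, Y, V, matching the caption's description of N1 and N2.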
The classification results for the 30 CATH groups by the SVM method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 100 | 78.72 | 81.87 | |
| 100 | 60.28 | 71.88 | |
| 100 | 84.40 | 69.37 | |
| 100 | 79.31 | 77.04 | |
| 100 | 75.83 | 80.00 | |
| 100 | 61.54 | 64.00 | |
| 99.36 | 94.33 | 85.00 | |
| 51.95 | 72.59 | 87.84 | |
| 93.91 | 75.88 | 77.13 | |
This table shows the classification rates for the 30 CATH groups by the SVM method.
The classification results for the 30 CATH groups by the random forest method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 95.54 | 86.52 | 96.25 | |
| 94.90 | 56.74 | 85.00 | |
| 100 | 92.91 | 95.63 | |
| 95.54 | 87.94 | 96.25 | |
| 100 | 92.20 | 96.25 | |
| 100 | 88.65 | 99.38 | |
| 100 | 89.36 | 98.12 | |
| 46.75 | 97.46 | 62.16 | |
| 91.59 | 86.47 | 91.13 | |
This table shows the classification rates for the 30 CATH groups by the random forest method.
The classification results for the 40 CATH groups by the multi-class MSE method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 86.15 | 68.28 | 67.35 | |
| 82.05 | 43.45 | 60.20 | |
| 85.13 | 46.21 | 70.41 | |
| 88.72 | 66.21 | 70.92 | |
| 89.23 | 71.72 | 79.59 | |
| 82.05 | 52.41 | 70.92 | |
| 88.72 | 69.66 | 80.61 | |
| 55.00 | 45.49 | 50.11 | |
| 82.13 | 57.93 | 68.76 | |
This table shows the multi-class MSE classification rates for the 40 CATH groups with different feature combinations.
The classification results for the 40 CATH groups by the convex hull method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | ||
| 97.95 | 91.03 | 93.37 | ||
| 98.46 | 93.10 | 92.35 | ||
| 88.21 | 86.90 | 88.78 | ||
| 91.79 | 87.59 | 88.78 | ||
| 94.36 | 88.28 | 89.29 | ||
| 98.46 | 88.28 | 92.35 | ||
| 89.74 | 82.07 | 81.63 | ||
| 84.10 | 86.21 | 84.18 | ||
| 88.72 | 86.21 | 76.02 | ||
| 69.69 | 79.44 | 72.81 | ||
| 90.15 | 86.91 | 85.96 | ||
This table shows the convex hull classification rates for the 40 CATH groups, where the natural vectors and the PseAAC vectors are partitioned into 10 dimensions. N1 refers to the first 10 dimensions of the natural vector, which are the numbers for the amino acids A,R,N,D,C,Q,E,G,H,I; N2 refers to the second 10 dimensions of the natural vector, which are the numbers for the amino acids L,K,M,F,P,S,T,W,Y,V. The other labels are similarly defined.
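The convex hull rates rest on a simple geometric test: does a feature point fall inside the hull spanned by a class's points? The sketch below shows that test in two dimensions only, using Andrew's monotone chain algorithm and a cross-product containment check. It is a toy stand-in, not the paper's procedure: the paper works with 10-dimensional blocks, where membership would more typically be decided by a linear program, and the names `convex_hull`, `inside`, `classes_containing`, and the example points are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, q):
    """True if q lies inside or on a counter-clockwise convex polygon (>= 3 vertices)."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


def classes_containing(hulls, q):
    """Names of all classes whose hull contains q (more than one if hulls overlap)."""
    return [name for name, hull in hulls.items() if inside(hull, q)]


# Toy 2D feature points for two hypothetical classes.
class_a = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
class_b = [(5, 5), (7, 5), (7, 7), (5, 7)]
hulls = {"A": convex_hull(class_a), "B": convex_hull(class_b)}
hit = classes_containing(hulls, (1, 1))
```

When the hulls of different classes are disjoint, as in this toy case, each point is contained in at most one hull, which is the separation property the abstract describes.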
The classification results for the 40 CATH groups by the SVM method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 100 | 79.31 | 71.43 | |
| 100 | 58.62 | 59.69 | |
| 100 | 57.24 | 75.51 | |
| 100 | 79.31 | 77.04 | |
| 100 | 62.14 | 95.00 | |
| 100 | 53.79 | 88.57 | |
| 67.37 | 82.86 | 73.33 | |
| 91.77 | 98.65 | 99.10 | |
| 94.89 | 71.49 | 79.96 | |
This table shows the classification rates for the 40 CATH groups by the SVM method.
The classification results for the 40 CATH groups by the random forest method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 95.90 | 76.55 | 91.84 | |
| 95.38 | 54.48 | 91.84 | |
| 98.46 | 86.21 | 100 | |
| 96.41 | 80.69 | 93.88 | |
| 98.46 | 86.90 | 100 | |
| 99.49 | 83.45 | 100 | |
| 99.49 | 86.21 | 99.49 | |
| 99.13 | 85.20 | 83.41 | |
| 97.84 | 79.96 | 95.06 | |
This table shows the classification rates for the 40 CATH groups by the random forest method.
The classification results for the CATH data with low similarity by the multi-class MSE method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 75.49 | 70.88 | 55.87 | |
| 72.80 | 71.78 | 40.07 | |
| 75.37 | 74.77 | 44.93 | |
| 74.42 | 72.52 | 52.38 | |
| 75.67 | 76.58 | 53.69 | |
| 75.07 | 73.81 | 46.70 | |
| 67.90 | 68.96 | 65.67 | |
| 50.93 | 50.51 | 35.91 | |
| 70.96 | 69.98 | 49.40 | |
This table shows the MSE classification rates for the CATH data with sequence similarity below 30%.
The classification results for the CATH data with low similarity by the random forest method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 83.56 | 91.20 | 80.66 | |
| 60.97 | 79.35 | 74.90 | |
| 94.74 | 96.28 | 96.16 | |
| 85.59 | 93.74 | 83.22 | |
| 95.28 | 96.95 | 96.29 | |
| 93.31 | 97.12 | 97.66 | |
| 95.52 | 98.36 | 91.55 | |
| 99.40 | 53.44 | 86.67 | |
| 88.55 | 88.31 | 88.39 | |
This table shows the classification rates for the CATH data with low similarity by the random forest method.
The classification results for the CATH data with low similarity by the convex hull method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | ||
| 90.20 | 87.98 | 79.41 | ||
| 92.83 | 82.22 | 84.21 | ||
| 81.41 | 83.58 | 79.80 | ||
| 85.00 | 72.40 | 80.21 | ||
| 86.61 | 86.17 | 84.21 | ||
| 90.02 | 82.22 | 84.60 | ||
| 48.95 | 37.25 | 21.33 | ||
| 81.59 | 78.78 | 50.80 | ||
| 90.02 | 79.01 | 58.20 | ||
| 52.78 | 66.65 | 33.33 | ||
| 79.94 | 75.63 | 65.61 | ||
This table shows the convex hull classification rates for the CATH data with low similarity, where the natural vectors and PseAAC vectors are partitioned into 10 dimensions. N1 refers to the first 10 dimensions of the natural vector, which are the numbers for amino acids A,R,N,D,C,Q,E,G,H,I; N2 refers to the second 10 dimensions of the natural vector, which are the numbers for amino acids L,K,M,F,P,S,T,W,Y,V. The other labels are similarly defined.
The classification results for the CATH data with low similarity by the SVM method.
| Feature methods | Classification rates by structural classes (%) | ||
|---|---|---|---|
| Mainly α | Mainly β | Mixed α & β | |
| 100 | 14.73 | 99.69 | |
| 100 | 39.62 | 73.28 | |
| 100 | 26.30 | 99.57 | |
| 100 | 47.69 | 98.30 | |
| 100 | 22.74 | 99.98 | |
| 100 | 16.25 | 99.10 | |
| 57.26 | 22.74 | 100 | |
| 53.62 | 63.49 | 63.00 | |
| 88.86 | 31.70 | 91.62 | |
This table shows the classification rates for the CATH data with low similarity by the SVM method.
Information for the SCOP datasets.
| Datasets | SCOP I | SCOP II | SCOP III | SCOP IV | |
|---|---|---|---|---|---|
| 817 | 406 | 2509 | 4836 | ||
| 6 | 10 | 12 | 960 | ||
| 202 | 104 | 611 | |||
| 6 | 10 | 12 | 1030 | ||
| 205 | 94 | 568 | |||
| 6 | 10 | 12 | 1356 | ||
| 213 | 94 | 651 | |||
| 6 | 10 | 12 | 1490 | ||
| 197 | 114 | 679 | |||
This table presents the group and sequence numbers for each of the four SCOP datasets.
The classification results for the 24 SCOP groups by the multi-class MSE method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 72.77 | 85.85 | 75.12 | 72.59 | |
| 76.24 | 79.02 | 47.42 | 31.47 | |
| 69.80 | 89.76 | 57.75 | 68.02 | |
| 92.08 | 89.27 | 82.16 | 76.65 | |
| 93.07 | 91.22 | 81.69 | 87.82 | |
| 72.28 | 89.76 | 60.09 | 69.54 | |
| 93.56 | 91.71 | 82.16 | 87.82 | |
| 54.04 | 61.54 | 70.22 | 70.51 | |
| 77.98 | 84.77 | 69.58 | 70.55 | |
This table shows the multi-class MSE classification rates for the 24 SCOP groups with different feature combinations.
The classification results for the 24 SCOP groups by the random forest method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 90.10 | 82.44 | 77.00 | 86.80 | |
| 74.75 | 77.56 | 63.38 | 59.90 | |
| 100 | 99.51 | 100 | 100 | |
| 92.08 | 82.93 | 78.40 | 88.32 | |
| 100 | 100 | 100 | 98.48 | |
| 100 | 99.51 | 100 | 100 | |
| 98.02 | 95.61 | 96.71 | 96.95 | |
| 100 | 50.00 | 100 | 100 | |
| 94.37 | 85.95 | 89.44 | 91.31 | |
This table shows the classification rates for the 24 SCOP groups by the random forest method.
The classification results for the 24 SCOP groups by the convex hull method.
| Feature methods | Classification rates by structural classes (%) | ||||
|---|---|---|---|---|---|
| All α | All β | α + β | α/β | ||
| 95.54 | 91.71 | 95.31 | 97.46 | ||
| 100 | 87.80 | 99.53 | 95.94 | ||
| 93.56 | 95.61 | 93.90 | 88.83 | ||
| 99.01 | 80.49 | 91.55 | 96.95 | ||
| 86.63 | 78.05 | 80.75 | 71.07 | ||
| 78.22 | 71.22 | 79.81 | 77.66 | ||
| 97.52 | 31.22 | 84.51 | 88.83 | ||
| 91.58 | 96.59 | 75.59 | 91.88 | ||
| 100 | 79.51 | 72.77 | 90.86 | ||
| 80.40 | 52.43 | 94.03 | 94.59 | ||
| 92.25 | 76.46 | 86.78 | 89.41 | ||
This table shows the convex hull classification rates for the 24 SCOP groups, where the natural vectors and the PseAAC vectors are partitioned into 10 dimensions. N1 refers to the first 10 dimensions of the natural vector, which are the numbers for amino acids A,R,N,D,C,Q,E,G,H,I; N2 refers to the second 10 dimensions of the natural vector, which are the numbers for amino acids L,K,M,F,P,S,T,W,Y,V. The other labels are similarly defined.
The classification results for the 24 SCOP groups by the SVM method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 100 | 89.76 | 64.79 | 86.80 | |
| 100 | 68.29 | 49.77 | 73.60 | |
| 100 | 91.71 | 73.71 | 76.65 | |
| 100 | 95.61 | 69.48 | 90.86 | |
| 100 | 96.59 | 76.53 | 89.85 | |
| 100 | 93.66 | 35.21 | 80.71 | |
| 97.52 | 96.59 | 79.34 | 90.36 | |
| 81.12 | 100 | 95.28 | 100 | |
| 97.33 | 91.53 | 68.01 | 86.10 | |
This table shows the classification rates for the 24 SCOP groups by the SVM method.
The classification results for the 40 SCOP groups by the multi-class MSE method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 66.35 | 71.28 | 75.53 | 91.23 | |
| 50.96 | 51.06 | 8.51 | 82.46 | |
| 62.50 | 50.00 | 48.94 | 81.58 | |
| 84.62 | 73.40 | 76.60 | 92.98 | |
| 89.42 | 74.47 | 79.79 | 89.47 | |
| 69.23 | 51.06 | 47.87 | 82.46 | |
| 90.38 | 77.66 | 81.91 | 92.98 | |
| 61.09 | 53.96 | 60.89 | 63.18 | |
| 71.82 | 62.86 | 60.01 | 84.54 | |
This table shows the multi-class MSE classification rates for the 40 SCOP groups with different feature combinations.
The classification results for the 40 SCOP groups by the random forest method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 96.15 | 79.79 | 81.91 | 96.49 | |
| 92.31 | 74.47 | 68.09 | 96.49 | |
| 95.19 | 77.66 | 89.36 | 98.25 | |
| 98.08 | 80.85 | 81.91 | 96.49 | |
| 98.08 | 80.85 | 91.49 | 98.25 | |
| 97.12 | 79.79 | 92.55 | 99.12 | |
| 99.04 | 82.98 | 91.49 | 97.37 | |
| 87.50 | 100 | 98.18 | 100 | |
| 95.43 | 82.05 | 86.87 | 97.81 | |
This table shows the classification rates for the 40 SCOP groups by the random forest method.
The classification results for the 40 SCOP groups by the convex hull method.
| Feature methods | Classification rates by structural classes (%) | ||||
|---|---|---|---|---|---|
| All α | All β | α+β | α/β | ||
| 100 | 90.43 | 80.85 | 100 | ||
| 100 | 81.91 | 81.91 | 100 | ||
| 99.04 | 81.91 | 78.72 | 91.23 | ||
| 94.23 | 77.66 | 68.09 | 85.09 | ||
| 92.31 | 77.66 | 60.64 | 71.05 | ||
| 95.19 | 81.91 | 63.83 | 66.67 | ||
| 82.69 | 89.36 | 80.85 | 88.60 | ||
| 90.38 | 90.43 | 80.85 | 99.12 | ||
| 97.12 | 81.91 | 80.85 | 85.09 | ||
| 92.81 | 99.48 | 94.45 | 96.21 | ||
| 94.38 | 85.27 | 77.10 | 88.31 | ||
This table shows the convex hull classification rates for the 40 SCOP groups, where the natural vectors and the PseAAC vectors are partitioned into 10 dimensions. N1 refers to the first 10 dimensions of the natural vector, which are the numbers for amino acids A,R,N,D,C,Q,E,G,H,I; N2 refers to the second 10 dimensions of the natural vector, which are the numbers for amino acids L,K,M,F,P,S,T,W,Y,V. The other labels are similarly defined.
The classification results for the 40 SCOP groups by the SVM method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 100 | 64.89 | 54.26 | 76.32 | |
| 100 | 73.40 | 46.81 | 73.68 | |
| 100 | 75.53 | 73.40 | 82.46 | |
| 100 | 74.47 | 84.04 | 92.98 | |
| 100 | 82.98 | 94.68 | 90.35 | |
| 100 | 81.91 | 81.91 | 84.21 | |
| 89.42 | 85.11 | 94.68 | 91.23 | |
| 100 | 100 | 100 | 100 | |
| 98.68 | 79.79 | 78.72 | 86.40 | |
This table shows the classification rates for the 40 SCOP groups by the SVM method.
The classification results for the 48 SCOP groups by the multi-class MSE method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 61.54 | 50.18 | 45.31 | 68.92 | |
| 48.77 | 57.22 | 45.31 | 63.18 | |
| 66.61 | 61.44 | 48.08 | 57.88 | |
| 73.16 | 60.04 | 70.05 | 69.81 | |
| 78.56 | 65.14 | 71.27 | 73.49 | |
| 67.10 | 63.20 | 54.84 | 59.06 | |
| 77.91 | 67.08 | 72.20 | 75.26 | |
| 60.39 | 55.69 | 70.36 | 54.19 | |
| 66.76 | 56.00 | 59.68 | 65.22 | |
This table shows the multi-class MSE classification rates for the 48 SCOP groups with different feature combinations.
The classification results for the 48 SCOP groups by the random forest method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 84.45 | 77.46 | 86.02 | 91.90 | |
| 80.20 | 77.11 | 85.25 | 91.16 | |
| 90.34 | 84.33 | 93.09 | 96.61 | |
| 89.03 | 82.39 | 89.09 | 95.29 | |
| 92.14 | 86.44 | 94.16 | 97.79 | |
| 93.78 | 89.08 | 95.70 | 99.41 | |
| 91.98 | 91.20 | 92.93 | 97.79 | |
| 98.86 | 100 | 89.16 | 88.12 | |
| 90.10 | 86.00 | 90.68 | 94.76 | |
This table shows the classification rates for the 48 SCOP groups by the random forest method.
The classification results for the 48 SCOP groups by the convex hull method.
| Feature methods | Classification rates by structural classes (%) | ||||
|---|---|---|---|---|---|
| All α | All β | α + β | α/β | ||
| 80.03 | 87.32 | 84.79 | 86.45 | ||
| 85.11 | 89.08 | 84.64 | 82.33 | ||
| 63.01 | 65.85 | 68.20 | 38.14 | ||
| 68.41 | 61.97 | 63.90 | 58.32 | ||
| 54.99 | 54.40 | 64.98 | 30.04 | ||
| 57.94 | 54.58 | 61.60 | 37.11 | ||
| 71.36 | 77.29 | 62.21 | 49.48 | ||
| 70.54 | 77.99 | 70.20 | 57.58 | ||
| 68.58 | 81.87 | 66.05 | 55.38 | ||
| 80.31 | 99.74 | 92.53 | 97.20 | ||
| 70.03 | 75.01 | 71.91 | 59.20 | ||
This table shows the convex hull classification rates for the 48 SCOP groups, where the natural vectors and the PseAAC vectors are partitioned into 10 dimensions. N1 refers to the first 10 dimensions of the natural vector, which are the numbers for amino acids A,R,N,D,C,Q,E,G,H,I; N2 refers to the second 10 dimensions of the natural vector, which are the numbers for amino acids L,K,M,F,P,S,T,W,Y,V. The other labels are analogously defined.
The classification results for the 48 SCOP groups by the SVM method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 100 | 64.89 | 54.26 | 76.32 | |
| 100 | 73.40 | 46.81 | 73.68 | |
| 100 | 75.53 | 73.40 | 82.46 | |
| 100 | 74.47 | 84.04 | 92.98 | |
| 100 | 82.98 | 94.68 | 90.35 | |
| 100 | 81.91 | 81.91 | 84.21 | |
| 100 | 97.78 | 81.67 | 97.50 | |
| 100 | 96.25 | 100 | 100 | |
| 100 | 81.37 | 77.10 | 87.19 | |
This table shows the classification rates for the 48 SCOP groups by the SVM method.
The classification results for the SCOP data with sequence similarity below 30% by the multi-class MSE method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 61.15 | 52.82 | 51.84 | 52.08 | |
| 63.85 | 58.74 | 42.85 | 31.21 | |
| 64.27 | 58.74 | 52.58 | 40.00 | |
| 66.77 | 56.80 | 52.36 | 54.70 | |
| 68.85 | 58.35 | 50.96 | 55.70 | |
| 59.90 | 50.00 | 51.03 | 48.46 | |
| 63.54 | 52.33 | 50.37 | 51.41 | |
| 43.54 | 41.07 | 26.25 | 31.21 | |
| 61.48 | 53.61 | 47.28 | 45.60 | |
This table shows the MSE classification rates for the SCOP data with sequence similarity below 30%.
The classification results for the SCOP data with low similarity by the random forest method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 53.65 | 55.15 | 99.93 | 78.72 | |
| 54.27 | 59.61 | 98.53 | 47.45 | |
| 50.83 | 56.60 | 99.93 | 52.01 | |
| 60.21 | 60.68 | 100 | 79.26 | |
| 50.52 | 57.48 | 99.93 | 59.66 | |
| 53.65 | 57.86 | 100 | 51.07 | |
| 62.81 | 64.37 | 100 | 79.66 | |
| 64.38 | 76.31 | 73.60 | 99.80 | |
| 56.29 | 61.01 | 96.49 | 68.45 | |
This table shows the classification rates for the SCOP data with sequence similarity below 30% by the random forest method.
The classification results for the SCOP data with sequence similarity below 30% by the convex hull method.
| Feature methods | Classification rates by structural classes (%) | ||||
|---|---|---|---|---|---|
| All α | All β | α + β | α/β | ||
| 76.56 | 69.61 | 58.19 | 56.51 | ||
| 78.75 | 69.13 | 63.42 | 64.50 | ||
| 60.83 | 60.78 | 59.66 | 55.64 | ||
| 66.98 | 52.62 | 61.21 | 64.03 | ||
| 69.48 | 66.21 | 63.86 | 66.58 | ||
| 71.67 | 63.40 | 66.81 | 71.14 | ||
| 62.29 | 53.79 | 42.26 | 18.59 | ||
| 60.21 | 51.55 | 43.73 | 19.13 | ||
| 67.29 | 56.50 | 50.22 | 20.60 | ||
| 67.50 | 60.00 | 57.52 | 32.48 | ||
| 68.16 | 60.36 | 56.69 | 46.92 | ||
This table shows the convex hull classification rates for the SCOP data with sequence similarity below 30%, where the natural vectors and PseAAC vectors are partitioned into 10 dimensions. N1 refers to the first 10 dimensions of the natural vector, which are the numbers for amino acids A,R,N,D,C,Q,E,G,H,I; N2 refers to the second 10 dimensions of the natural vector, which are the numbers for amino acids L,K,M,F,P,S,T,W,Y,V. The other labels are similarly defined.
The classification results for the SCOP data with sequence similarity below 30% by the SVM method.
| Feature methods | Classification rates by structural classes (%) | |||
|---|---|---|---|---|
| All α | All β | α + β | α/β | |
| 100 | 26.89 | 57.67 | 96.85 | |
| 100 | 40.87 | 15.86 | 79.46 | |
| 100 | 18.35 | 24.56 | 96.17 | |
| 100 | 24.56 | 21.98 | 93.09 | |
| 100 | 28.84 | 25.44 | 97.72 | |
| 100 | 21.65 | 20.21 | 98.32 | |
| 41.56 | 59.61 | 61.06 | 99.46 | |
| 47.50 | 83.98 | 45.28 | 99.93 | |
| 86.13 | 38.09 | 34.01 | 95.13 | |
This table shows the classification rates for the SCOP data with sequence similarity below 30% by the SVM method.