Y-h Taguchi, M Michael Gromiha.
Abstract
BACKGROUND: Predicting the three-dimensional structure of a protein from its amino acid sequence is a long-standing goal in computational/molecular biology. Discriminating between different structural classes and folding types is an intermediate step in protein structure prediction.
Year: 2007 PMID: 17953741 PMCID: PMC2174517 DOI: 10.1186/1471-2105-8-404
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Role of re-weighting. Leave-one-out cross-validation results [%] obtained with different measures and two types of LDA.
| Feature | Sensitivity (with re-weighting) | Precision (with re-weighting) | F1 (with re-weighting) | Accuracy (with re-weighting) | Sensitivity (without re-weighting) | Precision (without re-weighting) | F1 (without re-weighting) | Accuracy (without re-weighting) |
|---|---|---|---|---|---|---|---|---|
| Occurrence | 33 | 29 | 29 | 33 | 28 | 35 | 30 | 38 |
| Composition | 27 | 23 | 23 | 26 | 24 | 27 | 27 | 33 |
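
The results above come from leave-one-out cross-validation. As a minimal sketch of that evaluation loop, and not the authors' implementation, the example below holds out one sample at a time and scores a simple nearest-centroid classifier, which is only a stand-in for LDA. The function names and the toy data are hypothetical.

```python
import math


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (not LDA): return the label of the closest class mean."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[y] = counts.get(y, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = math.dist(centroid, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(features, labels):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical two-dimensional toy data with two well-separated classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]
accuracy = leave_one_out_accuracy(X, y)
```

Because every sample is predicted by a model that never saw it, the resulting accuracy is an estimate of performance on unseen proteins rather than a fit to the training set.
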
Table 2. Performance of fold recognition. Leave-one-out cross-validation performance [%] for each fold. wo: without re-weighting; w: with re-weighting.
| ID | Fold | Fold Description | Number | Ratio [%] | Sensitivity wo | Sensitivity w | Precision wo | Precision w | F1 wo | F1 w |
|---|---|---|---|---|---|---|---|---|---|---|
| all-α | | | | | | | | | | |
| 1 | a.3 | Cytochrome C | 25 | 2 | 24 | 48 | 50 | 27 | 32 | 35 |
| 2 | a.4 | DNA/RNA binding 3-helical bundle | 103 | 6 | 73 | 49 | 43 | 51 | 54 | 50 |
| 3 | a.24 | Four helical up and down bundle | 26 | 2 | 23 | 38 | 35 | 20 | 28 | 26 |
| 4 | a.39 | EF hand-like fold | 25 | 2 | 40 | 44 | 45 | 26 | 43 | 33 |
| 5 | a.60 | SAM domain-like | 26 | 2 | 8 | 27 | 29 | 12 | 12 | 16 |
| 6 | a.118 | α-α superhelix | 47 | 3 | 47 | 45 | 50 | 50 | 48 | 47 |
| all-β | | | | | | | | | | |
| 7 | b.1 | Immunoglobulin-like | 173 | 11 | 76 | 38 | 41 | 69 | 54 | 49 |
| 8 | b.2 | Common fold of diphtheria toxin/transcription factors/cytochrome f | 28 | 2 | 4 | 29 | 11 | 21 | 5 | 24 |
| 9 | b.6 | Cupredoxin-like | 30 | 2 | 27 | 37 | 42 | 22 | 33 | 27 |
| 10 | b.18 | Galactose-binding domain-like | 25 | 2 | 20 | 36 | 50 | 26 | 29 | 30 |
| 11 | b.29 | Concanavalin A-like lectins/glucanases | 26 | 2 | 23 | 27 | 24 | 18 | 24 | 22 |
| 12 | b.34 | SH3-like barrel | 42 | 3 | 0 | 29 | 0 | 20 | - | 24 |
| 13 | b.40 | OB-fold | 78 | 5 | 22 | 24 | 24 | 24 | 23 | 24 |
| 14 | b.82 | Double-stranded β-helix | 34 | 2 | 12 | 18 | 19 | 17 | 15 | 17 |
| 15 | b.121 | Nucleoplasmin-like | 42 | 3 | 52 | 52 | 51 | 47 | 52 | 49 |
| α/β | | | | | | | | | | |
| 16 | c.1 | TIM barrel | 145 | 9 | 44 | 27 | 57 | 65 | 50 | 38 |
| 17 | c.2 | NAD(P)-binding Rossmann-fold domains | 77 | 5 | 34 | 31 | 30 | 32 | 32 | 32 |
| 18 | c.3 | FAD/NAD(P)-binding domain | 31 | 2 | 10 | 16 | 13 | 11 | 11 | 13 |
| 19 | c.23 | Flavodoxin-like | 55 | 3 | 11 | 5 | 17 | 8 | 13 | 7 |
| 20 | c.26 | Adenine nucleotide α hydrolase-like | 34 | 2 | 12 | 29 | 14 | 22 | 13 | 25 |
| 21 | c.37 | P-loop containing nucleoside triphosphate hydrolases | 95 | 6 | 43 | 34 | 42 | 53 | 43 | 41 |
| 22 | c.47 | Thioredoxin fold | 32 | 2 | 9 | 19 | 38 | 10 | 15 | 13 |
| 23 | c.55 | Ribonuclease H-like motif | 49 | 3 | 4 | 6 | 11 | 8 | 6 | 7 |
| 24 | c.66 | S-adenosyl-L-methionine-dependent methyltransferases | 34 | 2 | 29 | 29 | 31 | 21 | 30 | 24 |
| 25 | c.69 | α/β-Hydrolases | 37 | 2 | 35 | 41 | 39 | 34 | 37 | 37 |
| α+β | | | | | | | | | | |
| 26 | d.15 | β-Grasp (ubiquitin-like) | 42 | 3 | 5 | 21 | 40 | 18 | 9 | 19 |
| 27 | d.17 | Cystatin-like | 25 | 2 | 0 | 8 | - | 4 | - | 5 |
| 28 | d.58 | Ferredoxin-like | 118 | 7 | 32 | 7 | 17 | 25 | 22 | 11 |
| small proteins | | | | | | | | | | |
| 29 | g.3 | Knottins | 80 | 5 | 98 | 89 | 72 | 82 | 83 | 85 |
| 30 | g.41 | Rubredoxin-like | 28 | 2 | 11 | 71 | 75 | 32 | 19 | 44 |
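
The per-fold measures can be reproduced from counts of true and predicted labels. The sketch below assumes the standard definitions (sensitivity = correct predictions / proteins in the fold, precision = correct predictions / proteins predicted as the fold, F1 = their harmonic mean), which agree with the rounded entries above; undefined values are returned as `None`, corresponding to the "-" cells. The function names and toy labels are hypothetical.

```python
from typing import Optional


def f1_score(sensitivity: float, precision: Optional[float]) -> Optional[float]:
    """Harmonic mean of sensitivity and precision; None when it is undefined."""
    if precision is None or sensitivity + precision == 0:
        return None
    return 2 * sensitivity * precision / (sensitivity + precision)


def per_fold_metrics(true_folds, predicted_folds, fold):
    """Sensitivity, precision and F1 (as fractions) for a single fold label."""
    pairs = list(zip(true_folds, predicted_folds))
    tp = sum(1 for t, p in pairs if t == fold and p == fold)
    n_true = sum(1 for t, _ in pairs if t == fold)
    n_pred = sum(1 for _, p in pairs if p == fold)
    sensitivity = tp / n_true if n_true else None
    precision = tp / n_pred if n_pred else None
    if sensitivity is None:
        return None, precision, None
    return sensitivity, precision, f1_score(sensitivity, precision)


# Cytochrome C (a.3) without re-weighting: sensitivity 24 %, precision 50 %.
f1_a3 = f1_score(24, 50)  # about 32.4, consistent with the tabulated 32

# Hypothetical toy labels to exercise the counting logic.
truth = ["a.3", "a.3", "a.4", "a.4", "b.1"]
pred = ["a.3", "a.4", "a.4", "a.4", "a.4"]
sens, prec, f1 = per_fold_metrics(truth, pred, "a.4")
```

A fold that is never predicted has no defined precision, which is why a sensitivity of 0 can appear next to a "-" in the precision and F1 columns.
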
Figure 1. Prediction versus experiment. Comparison between predicted and experimental folds for 1612 proteins. The diagonal elements show the correctly predicted proteins. Darker blocks indicate a larger number of proteins, and solid lines mark the boundaries between the five classes shown in Table 2, i.e., all-α, all-β, α/β, α+β, and small proteins. (a) Without re-weighting. (b) With re-weighting.
Figure 2. Amino acid occurrence. (a) Comparison between the mean amino acid occurrence of two typical folds, DNA/RNA binding 3-helical bundle (a.4, black) and Immunoglobulin-like β-sandwich (b.1, red). (b) Distribution of these two folds over the first two discriminant functions with re-weighting. a.4: filled black circles; b.1: red crosses.
Figure 3. Probability measure of discrimination. Rows: 103 proteins in fold a.4. Columns: 30 folds, ordered from left to right by ID in Table 2. The darkest square corresponds to a probability of 0.5, and the lightest to zero.
Table 3. Performance on an independent dataset. Predictive ability [%] of our method on the independent dataset of proteins used in Ding and Dubchak [18]. wo: without re-weighting; w: with re-weighting.
| Fold Description | Number | Ratio [%] | Sensitivity wo | Sensitivity w | Precision wo | Precision w | F1 wo | F1 w |
|---|---|---|---|---|---|---|---|---|
| Cytochrome C | 16 | 3 | 56 | 94 | 64 | 47 | 60 | 63 |
| DNA/RNA binding 3-helical bundle | 32 | 6 | 75 | 56 | 41 | 47 | 53 | 51 |
| Four helical up and down bundle | 15 | 3 | 33 | 33 | 71 | 42 | 45 | 37 |
| EF hand-like fold | 15 | 3 | 53 | 53 | 57 | 42 | 55 | 47 |
| Immunoglobulin-like | 74 | 14 | 66 | 31 | 44 | 68 | 53 | 43 |
| Cupredoxin-like | 21 | 4 | 29 | 38 | 50 | 33 | 36 | 36 |
| Concanavalin A-like lectins/glucanases | 13 | 2 | 38 | 38 | 42 | 33 | 40 | 36 |
| SH3-like barrel | 16 | 3 | 0 | 50 | - | 44 | - | 47 |
| OB-fold | 32 | 6 | 16 | 28 | 26 | 31 | 20 | 30 |
| TIM barrel | 77 | 14 | 40 | 25 | 66 | 70 | 50 | 37 |
| FAD/NAD(P)-binding domain | 23 | 4 | 22 | 30 | 114 | 50 | 37 | 38 |
| Flavodoxin-like | 24 | 5 | 8 | 13 | 28 | 35 | 13 | 18 |
| NAD(P)-binding Rossmann-fold domains | 40 | 8 | 40 | 35 | 5 | 8 | 8 | 13 |
| P-loop containing nucleoside triphosphate hydrolases | 22 | 4 | 23 | 18 | 38 | 50 | 29 | 27 |
| Thioredoxin fold | 17 | 3 | 18 | 35 | 33 | 25 | 23 | 29 |
| Ribonuclease H-like motif | 22 | 4 | 5 | 18 | 14 | 22 | 7 | 20 |
| α/β-Hydrolases | 18 | 3 | 33 | 39 | 43 | 41 | 38 | 40 |
| β-Grasp (ubiquitin-like) | 15 | 3 | 0 | 33 | 0 | 20 | - | 25 |
| Ferredoxin-like | 40 | 8 | 23 | 3 | 11 | 10 | 15 | 4 |
| Total/Mean | 532 | | 31 | 35 | 42 | 38 | 34 | 34 |
| Accuracy | | | | | | | | |
| without re-weighting | 36 | | | | | | | |
| with re-weighting | 32 | | | | | | | |
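
The accuracy rows can be related to the per-fold sensitivities. Assuming accuracy means the fraction of all proteins assigned to their correct fold, it equals the size-weighted mean of the per-fold sensitivities, whereas the Total/Mean row is an unweighted average over folds. The rough check below uses the rounded Number and Sensitivity entries from the table, so it is approximate; the names are illustrative.

```python
# (number of proteins, sensitivity wo [%], sensitivity w [%]) for each fold,
# copied from the rounded table entries in row order.
ROWS = [
    (16, 56, 94), (32, 75, 56), (15, 33, 33), (15, 53, 53), (74, 66, 31),
    (21, 29, 38), (13, 38, 38), (16, 0, 50), (32, 16, 28), (77, 40, 25),
    (23, 22, 30), (24, 8, 13), (40, 40, 35), (22, 23, 18), (17, 18, 35),
    (22, 5, 18), (18, 33, 39), (15, 0, 33), (40, 23, 3),
]


def weighted_mean_sensitivity(rows, column):
    """Mean of one sensitivity column, weighted by the number of proteins per fold."""
    total = sum(row[0] for row in rows)
    return sum(row[0] * row[column] for row in rows) / total


n_proteins = sum(row[0] for row in ROWS)
accuracy_wo = weighted_mean_sensitivity(ROWS, 1)  # rounds to 36
accuracy_w = weighted_mean_sensitivity(ROWS, 2)   # rounds to 32
```

The weighting explains why re-weighting can raise the mean sensitivity (31 to 35) while lowering accuracy (36 to 32): the gains fall mostly on small folds, while large folds such as Immunoglobulin-like and TIM barrel lose sensitivity.
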
Table 4. Performance with other features. Mean performance [%] obtained with different features for the dataset used in Ding and Dubchak [18]. The re-weighting scheme is employed.
| Feature set | Sensitivity | Precision | F1 | Accuracy |
|---|---|---|---|---|
| Features | | | | |
| secondary structure | 35 | 32 | 40 | 36 |
| polarity | 19 | 18 | 26 | 21 |
| polarizability | 18 | 18 | 26 | 19 |
| hydrophobicity | 23 | 22 | 28 | 24 |
| volume | 21 | 20 | 25 | 22 |
| Composition | ||||
| composition | 34 | 33 | 34 | 35 |
| composition + length | 36 | 35 | 38 | 38 |
| composition + other five features | 35 | 39 | 39 | 39 |
| Occurrence | ||||
| occurrence | 40 | 40 | 39 | 42 |
| occurrence + other five features | 40 | 46 | 42 | 44 |
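
The "occurrence" and "composition" rows refer to two closely related 20-dimensional feature vectors. A minimal sketch, assuming occurrence means the raw count of each amino acid and composition means those counts divided by sequence length (an interpretation consistent with the separate "composition + length" row, but not spelled out here); the function names and the toy sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code, in a fixed feature order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence(sequence: str) -> list:
    """Raw count of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def composition(sequence: str) -> list:
    """Counts divided by the number of standard residues, so the vector sums to 1."""
    occ = occurrence(sequence)
    total = sum(occ)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [count / total for count in occ]


# Hypothetical toy sequence, not taken from the data set.
seq = "ACDAAK"
occ = occurrence(seq)
comp = composition(seq)
```

Under this reading, occurrence keeps sequence-length information that composition discards, since the occurrence vector sums to the number of residues.
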
Table 5. Correlation between physical properties and the first discriminant function. Brief descriptions of 49 selected physico-chemical, energetic and conformational properties, their correlation coefficients with the first discriminant function, and q-values. An asterisk in the last column indicates a q-value below 5%.
| No. | Description | Corr. Coef. | q-value [%] | q < 5% |
|---|---|---|---|---|
| 1. | Compressibility | 0.04 | 38.6 | |
| 2. | Thermodynamic transfer hydrophobicity | 0.54 | 1.9 | * |
| 3. | Surrounding hydrophobicity | 0.74 | 0.4 | * |
| 4. | Polarity | 0.36 | 9.2 | |
| 5. | Isoelectric point | 0.02 | 41.2 | |
| 6. | Equilibrium constant with reference to the ionization property | 0.01 | 41.7 | |
| 7. | Molecular weight | 0.06 | 38.4 | |
| 8. | Bulkiness | 0.49 | 3.0 | * |
| 9. | Chromatographic index | 0.51 | 2.7 | * |
| 10. | Refractive index | 0.36 | 9.2 | |
| 11. | Normalized consensus hydrophobicity | 0.48 | 3.4 | * |
| 12. | Short and medium range non-bonded energy | 0.11 | 32.7 | |
| 13. | Long-range non-bonded energy | 0.65 | 0.7 | * |
| 14. | Total non-bonded energy | 0.57 | 1.5 | * |
| 15. | Alpha-helical tendency | 0.29 | 14.1 | |
| 16. | Beta-strand tendency | 0.63 | 0.8 | * |
| 17. | Turn tendency | 0.61 | 0.9 | * |
| 18. | Coil tendency | 0.60 | 1.1 | * |
| 19. | Helical contact area | 0.20 | 23.0 | |
| 20. | Mean rms fluctuational displacement | 0.57 | 1.5 | * |
| 21. | Buriedness | 0.63 | 0.8 | * |
| 22. | Solvent accessible reduction ratio | 0.70 | 0.4 | * |
| 23. | Average number of surrounding residues | 0.72 | 0.4 | * |
| 24. | Power to be at the N-terminal of alpha helix | 0.57 | 1.5 | * |
| 25. | Power to be at the C-terminal of alpha helix | 0.18 | 26.4 | |
| 26. | Power to be at the middle of alpha helix | 0.05 | 38.6 | |
| 27. | Partial-specific volume | 0.25 | 18.8 | |
| 28. | Average medium-range contacts | 0.11 | 32.7 | |
| 29. | Average long-range contacts | 0.65 | 0.7 | * |
| 30. | Combined surrounding hydrophobicity (globular and membrane) | 0.69 | 0.4 | * |
| 31. | Solvent accessible surface area for denatured protein | 0.12 | 32.7 | |
| 32. | Solvent accessible surface area for native protein | 0.52 | 2.5 | * |
| 33. | Solvent accessible surface area for protein unfolding | 0.47 | 3.7 | * |
| 34. | Gibbs free energy change of hydration for unfolding | 0.30 | 14.1 | |
| 35. | Gibbs free energy change of hydration for denatured protein | 0.40 | 7.3 | |
| 36. | Gibbs free energy change of hydration for native protein | 0.46 | 4.1 | * |
| 37. | Unfolding enthalpy change of hydration | 0.05 | 38.6 | |
| 38. | Unfolding entropy change of hydration | 0.37 | 8.9 | |
| 39. | Unfolding hydration heat capacity change | 0.54 | 1.9 | * |
| 40. | Unfolding Gibbs free energy change of chain | 0.16 | 27.6 | |
| 41. | Unfolding enthalpy change of chain | 0.22 | 21.7 | |
| 42. | Unfolding entropy change of chain | 0.44 | 4.7 | * |
| 43. | Unfolding Gibbs free energy change | 0.33 | 11.0 | |
| 44. | Unfolding enthalpy change | 0.35 | 10.2 | |
| 45. | Unfolding entropy change | 0.34 | 10.3 | |
| 46. | Volume (number of non-hydrogen side chain atoms) | 0.11 | 32.7 | |
| 47. | Shape (position of branch point in a side-chain) | 0.10 | 32.8 | |
| 48. | Flexibility (number of side-chain dihedral angles) | 0.24 | 19.5 | |
| 49. | Backbone dihedral probability | 0.51 | 2.5 | * |
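
The q-values in the table correct for testing 49 properties at once. The procedure used is not stated here; as an illustration only, the sketch below assumes a Benjamini-Hochberg adjustment of per-property p-values. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical p-values, not taken from the table.
p = [0.001, 0.008, 0.039, 0.041, 0.60]
q = benjamini_hochberg(p)
significant = [value < 0.05 for value in q]  # analogous to the asterisk column
```

Here the third and fourth tests have raw p-values below 0.05 but adjusted values just above it, which shows how a multiple-testing correction can remove an asterisk that a per-test threshold would grant.
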