J Eduardo Fajardo, Andras Fiser.
Abstract
BACKGROUND: Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation.
Year: 2013 PMID: 23433045 PMCID: PMC3598644 DOI: 10.1186/1471-2105-14-63
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Correlations between all pairs of variables considered in this study
| −0.2949 | | | | | | |
| 0.4026 | −0.7765 | | | | | |
| 0.3504 | −0.4898 | 0.7452 | | | | |
| 0.3163 | −0.2732 | 0.4977 | 0.8068 | | | |
| −0.3577 | 0.4067 | −0.5901 | −0.6053 | −0.7136 | | |
| 0.1546 | −0.1340 | 0.1363 | 0.1086 | 0.0594 | −0.0458 | |
Conserv: sequence conservation; Distance: distance to the general center of mass (GCM); Closeness: closeness; Between: betweenness; PageRank: PageRank; RSA: relative surface accessibility.
Figure 1. Cumulative distribution of functional (filled circles) and non-functional (open circles) residues for each of the attributes analyzed. The plain solid line shows the relative difference between the two other lines. A, sequence conservation using relative entropy (the dotted line indicates the threshold value used to select residues before presenting them to the neural networks); B, distance to the GCM; C, closeness; D, PageRank; E, betweenness; F, relative surface accessibility.
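Panel A scores sequence conservation with relative entropy. The following is a minimal sketch of such a score for one alignment column, not the authors' implementation. It assumes a uniform background distribution over the 20 standard amino acids, base-2 logarithms, and that gaps and non-standard symbols are ignored; the caption states none of these choices, and the name `relative_entropy` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_entropy(column, background=None):
    """Relative entropy (in bits) of one alignment column against a background.

    column: iterable of one-letter residue codes observed at this position.
    background: dict mapping residue -> probability; uniform if omitted
                (an assumption made for this sketch).
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    # Count only symbols present in the background (gaps are skipped).
    counts = Counter(aa for aa in column if aa in background)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    score = 0.0
    for aa, n in counts.items():
        p = n / total
        score += p * math.log2(p / background[aa])
    return score


# A fully conserved column reaches the maximum, log2(20), under a uniform background.
conserved = relative_entropy("AAAAAAAA")
# A column containing every residue once matches the background and scores 0.
variable = relative_entropy(AMINO_ACIDS)
```

Under these assumptions the score ranges from 0 (column matches the background) to log2(20), about 4.32 bits (a single residue type); a different background or logarithm base would shift that scale.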
A representative list of performances of neural networks with different combinations of features used as inputs
| C | D | A | K | All residues | Conserved > 3.5 |
| X | | | | 16.28 | 22.05 |
| | X | | | 15.51 | 26.68 |
| | | X | | 10.12 | 27.56 |
| X | | X | | 18.71 | 28.16 |
| | X | X | | 19.78 | 32.48 |
| X | X | | | 20.56 | 27.67 |
| X | X | X | | 20.68 | 28.95 |
| X | X | 10.95 | 25.48 | ||
C, sequence conservation; D, distance to the GCM; A, residue type; K, closeness. An X in a column indicates that the feature was included in training. The Matthews correlation coefficient (MCC), expressed as a percentage, was calculated on the validation set after convergence of the training procedure. Networks were trained either with “All residues” or only with residues that showed a conservation value of 3.5 or more. The MCC was calculated over the entire original set of residues, regardless of the effect of preselection on the overall counts.
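The MCC values in the table can be illustrated with a short sketch. It uses the standard MCC formula on confusion-matrix counts and expresses the result as a percentage. The handling of preselection is an assumption about how "calculated over the entire original set" might be realized: residues below the conservation threshold are counted as negative predictions instead of being dropped. The function names and the tuple layout are hypothetical.

```python
import math


def mcc_percent(tp, fp, tn, fn):
    """Matthews correlation coefficient as a percentage (range -100 to 100)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: MCC is taken as 0 when any marginal count is zero.
        return 0.0
    return 100.0 * (tp * tn - fp * fn) / denom


def confusion_with_preselection(residues, threshold=3.5):
    """Count TP, FP, TN, FN over all residues.

    residues: iterable of (conservation, is_functional, network_call) tuples.
    Assumption: a residue below the threshold is never presented to the
    network, so it is scored as a negative prediction.
    """
    tp = fp = tn = fn = 0
    for conservation, is_functional, network_call in residues:
        predicted = network_call and conservation >= threshold
        if predicted and is_functional:
            tp += 1
        elif predicted:
            fp += 1
        elif is_functional:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Toy data: the third residue is functional but filtered out by preselection,
# so it is counted as a false negative.
toy = [(4.0, True, True), (4.0, False, True), (2.0, True, True), (2.0, False, False)]
score = mcc_percent(*confusion_with_preselection(toy))
```

Counting filtered residues as negatives keeps the scores of the "All residues" and "Conserved > 3.5" columns comparable, because both are then computed over the same residue set.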
Performance of selected catalytic-residue prediction methods
| | |||
| 63.06 | 17.06 | 26.82 | |
| 54.05 | 7.85 | 13.71 | |
| 51.35 | 21.75 | 30.56 | |
| 55.7 | 14.1 | 22.49 | |
| 48.2 | 17.0 | 25.13 | |
| 51.1 | 17.13 | 25.66 | |
A. Comparison of three methods using our independent testing set of 29 proteins. B. Comparison of three methods based on a 10-fold cross-validation procedure on a common dataset.
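If the first two numeric columns of the table above are read as recall and precision in percent (an assumption, since the column labels are not shown), the third column agrees closely with their harmonic mean, the F-measure. The sketch below shows that calculation; the function name `f_measure` is hypothetical.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (same units as the inputs)."""
    if recall + precision == 0:
        return 0.0
    return 2.0 * recall * precision / (recall + precision)


# Rows taken from the table: (first column, second column, third column).
rows = [
    (54.05, 7.85, 13.71),
    (51.35, 21.75, 30.56),
    (51.1, 17.13, 25.66),
]
for first, second, reported in rows:
    print(f"{f_measure(first, second):.2f} vs reported {reported:.2f}")
```

The remaining rows agree to within a few hundredths, which is what rounding of the inputs would produce.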
Figure 2. Predicted (green) and experimentally characterized (red and blue) functional residues. Experimentally characterized functional residues that were correctly predicted are marked in red, while those that were missed are in blue. (A) A case where several methods failed (1bd3). (B) A successful case in which all six functional residues were captured (1lcb).