Yasser B Ruiz-Blanco, Waldo Paz, James Green, Yovani Marrero-Ponce.
Abstract
BACKGROUND: The exponential growth of protein structural and sequence databases is enabling multifaceted approaches to understanding the long-sought sequence-structure-function relationship. Advances in computation now make it possible to apply well-established data mining and pattern recognition techniques to these data to learn models that effectively relate structure and function. However, extracting meaningful numerical descriptors of protein sequence and structure is a key issue that requires an efficient and widely available solution.
Year: 2015 PMID: 25982853 PMCID: PMC4432771 DOI: 10.1186/s12859-015-0586-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. GUI layer corresponding to the configuration of indices and weighting operators.
Illustration of the application of the Autocorrelation operator to the index using the parameter k = 2
| lnFD | Position | Autocorrelation (k = 2) | Result |
|---|---|---|---|
| −3.53E-02 | L1 | L1’ = L1·L3 | 2.93E-04 |
| −1.54E-02 | L2 | L2’ = L2·L4 | 1.39E-04 |
| −8.31E-03 | L3 | L3’ = L3·L1 + L3·L5 | 3.72E-04 |
| −9.01E-03 | L4 | L4’ = L4·L2 + L4·L6 | 2.05E-04 |
| −9.43E-03 | L5 | L5’ = L5·L3 + L5·L7 | 2.88E-04 |
| −7.36E-03 | L6 | L6’ = L6·L4 + L6·L8 | 3.09E-04 |
| −2.23E-02 | L7 | L7’ = L7·L5 | 2.10E-04 |
| −3.30E-02 | L8 | L8’ = L8·L6 | 2.43E-04 |
The structure of an octapeptide from a mammalian prion protein (PDB code: 1OEH) was employed for calculations of lnFD values.
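The arithmetic in the table above can be reproduced with a minimal Python sketch. It assumes, based only on the expressions listed in the table, that the operator at lag k replaces each value with the sum of its products with the values k positions before and after it, wherever those positions exist. The names `autocorrelation` and `ln_fd` are illustrative and are not part of ProtDCal.

```python
from typing import List


def autocorrelation(values: List[float], k: int) -> List[float]:
    """Lag-k autocorrelation as inferred from the table (an assumption).

    Each value is multiplied by its neighbours at positions i - k and i + k,
    and the products that exist are summed.
    """
    n = len(values)
    result = []
    for i, value in enumerate(values):
        total = 0.0
        if i - k >= 0:          # neighbour k positions before
            total += value * values[i - k]
        if i + k < n:           # neighbour k positions after
            total += value * values[i + k]
        result.append(total)
    return result


# lnFD values for the eight residues, copied from the table.
ln_fd = [-3.53e-2, -1.54e-2, -8.31e-3, -9.01e-3,
         -9.43e-3, -7.36e-3, -2.23e-2, -3.30e-2]

weighted = autocorrelation(ln_fd, k=2)
for position, value in enumerate(weighted, start=1):
    print(f"L{position}' = {value:.2E}")
```

Because the lnFD inputs in the table are already rounded to three significant figures, the recomputed values agree with the tabulated ones only to within rounding.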
Figure 2. GUI layer corresponding to the configuration of groups of residues.
Figure 3. GUI layer corresponding to the configuration of aggregation operators.
Figure 4. Comparison of Shannon entropy values of sequence-based features computed with PROFEAT, PROTEIN RECON, and the different weighting operators of ProtDCal: autocorrelation (AC), Kier-Hall (KH), electrotopological state (ES), Ivanciuc-Balaban (IB) and gravitational-like (GR) operators.
Figure 5. Comparison of runtime values per protein per feature versus protein lengths by using four families of protein features: structure-based thermodynamic indices, sequence-based thermodynamic indices, topographic indices (weighted by topological distance), and amino-acid-property-based indices (TAE-derived indices excluded).
Performance metrics for N-linked glycosylation prediction with GPP and ProtDCal features, using Random Forest and Naïve Bayes classifiers
| | | | | | | |
|---|---|---|---|---|---|---|
| | 92.8 | 96.6 | 91.8 | 90.3 | 83.8 | 94.6 |
| | 91.6 | 93.2 | 91.4 | 91.1 | 97.6 | 90.6 |
CCI = Correctly classified instances. GPP results from Hamby & Hirst [59].
Performance metrics for N-linked glycosylation prediction from different contemporary predictors
| | | | | | | |
|---|---|---|---|---|---|---|
| | 91.6 | 91.1 | 92.8 | 76.7 | 95.0 | 79.8 |
| | 93.2 | 97.6 | 96.6 | 43.9 | 98.0 | 72.7 |
| | 91.4 | 90.6 | 91.8 | 95.7 | 77.0** | 81.9 |
Results reproduced from Hamby & Hirst [59]. CCI = Correctly classified instances. *Metrics of EnsembleGly are based on sequence-based 5-fold cross-validation. **This value refers to precision [= TP/(TP + FP)] and not to specificity [= TN/(TN + FP)] as it was originally reported [60].
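The distinction drawn in the footnote between precision and specificity can be made concrete with a short Python sketch. The precision and specificity formulas are the ones quoted above; the CCI and sensitivity formulas are the standard definitions. The function name `classification_metrics` and the confusion-matrix counts are hypothetical and are not taken from the paper or from any of the compared predictors.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute percentage metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        # CCI: correctly classified instances, i.e. overall accuracy.
        "cci": 100.0 * (tp + tn) / total,
        # Sensitivity: fraction of true sites that are recovered.
        "sensitivity": 100.0 * tp / (tp + fn),
        # Specificity = TN / (TN + FP), as defined in the footnote.
        "specificity": 100.0 * tn / (tn + fp),
        # Precision = TP / (TP + FP), as defined in the footnote.
        "precision": 100.0 * tp / (tp + fp),
    }


# Hypothetical, imbalanced counts (many more negatives than positives).
metrics = classification_metrics(tp=40, fp=50, tn=900, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}")
```

With imbalanced classes like these hypothetical counts, specificity stays high while precision is much lower, which is why reporting one under the other's name changes how a predictor compares.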
Performance metrics for N-linked glycosylation prediction using the GPP and ProtDCal models in a blind test
| | | | | |
|---|---|---|---|---|
| | 87.11 | 93.50 | 86.40 | 43.60 |
| | 86.78 | 95.80 | 85.80 | 43.10 |
| | 66.21 | 97.22 | 62.72 | 22.70 |
CCI = Correctly classified instances.