Pooya Zakeri, Ben Jeuris, Raf Vandebril, Yves Moreau.
Abstract
MOTIVATION: Various approaches that combine features extracted from protein sequences, often with machine learning methods, have been used to predict protein folds. Finding an efficient technique for integrating these different protein features has received increasing attention. In particular, kernel methods are an interesting class of techniques for integrating heterogeneous data. Various methods have been proposed to fuse multiple kernels. Most techniques for multiple kernel learning focus on learning a convex linear combination of base kernels. In addition to the limitation of linear combinations, working with such approaches could cause a loss of potentially useful information.
Year: 2014 PMID: 24590441 PMCID: PMC4071197 DOI: 10.1093/bioinformatics/btu118
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The architecture of our fusion model for protein fold recognition. GeoFold refers to the fusion model that uses 26 different data sources, and FunGeoFold refers to the kernel fusion model that incorporates the FunD information through GM between the FunD kernel and the fused kernel produced by GeoFold (GeoFold kernel).
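
The GM step between two kernels can be illustrated with the standard geometric mean of two symmetric positive definite matrices, A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}. The sketch below is not the authors' implementation; the function names are hypothetical, and it assumes both kernel matrices are strictly positive definite (in practice a small ridge on the diagonal may be needed first).

```python
import numpy as np

def spd_power(m, p):
    """Raise a symmetric positive definite matrix to a real power
    through its eigendecomposition (assumes all eigenvalues > 0)."""
    w, v = np.linalg.eigh(m)
    return (v * w ** p) @ v.T

def geometric_mean_kernel(a, b):
    """Geometric mean A # B of two SPD kernel matrices (hypothetical helper)."""
    a_half = spd_power(a, 0.5)
    a_inv_half = spd_power(a, -0.5)
    inner = a_inv_half @ b @ a_inv_half
    inner = (inner + inner.T) / 2.0  # remove round-off asymmetry
    return a_half @ spd_power(inner, 0.5) @ a_half

# Toy 2x2 stand-ins for a FunD kernel and a fused (GeoFold-style) kernel.
fund_kernel = np.array([[2.0, 1.0], [1.0, 2.0]])
fused_kernel = np.array([[3.0, 0.5], [0.5, 1.0]])
combined = geometric_mean_kernel(fund_kernel, fused_kernel)
```

Unlike a linear combination, this mean is symmetric in its arguments and satisfies G A^{-1} G = B, which is a convenient sanity check on any implementation.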
Comparison of proposed models with the existing predictor and meta-predictors
| Methods | PERF | Description |
|---|---|---|
| SVM | 56 | ( |
| SE | 61.1 | ( |
| PFP-Pred | 62.1 | ( |
| PFRES | 68.4 | ( |
| VBKC | 68.1 | ( |
| MKLdiv-dc | 73.36 | ( |
| MKLdiv-conv | 71.01 | ( |
| MKLdiv-dc | 75.19 | ( |
| PFP-FunDseqE | 70.5 | ( |
| Classifier Fusion | 67.02 | ( |
| MarFold | 71.7 | ( |
| Tax-Fold | 71.5 | ( |
| Bi-grams | 69.5 | ( |
| HPFP | 74.21 | ( |
| MKLdiv-dc | 61.1 | 26 PFs |
| MKLdiv-conv | 63.70 | 26 PFs |
| AK-MKL | 61.88 | 26 PFs |
| SimpleMKL | 56.92 | 26 PFs |
| Harmonic mean | 65.80 | 26 PFs |
| Arithmetic mean | 60.57 | 26 PFs |
| Karcher-KF (GeoFold1) | 86.16 | GFK1 (geometric mean) 26 PFs |
| AGH-KF (GeoFold2) | 86.68 | GFK2 (geometric mean) 26 PFs |
| LogE-KF (LogEFold) | 81.72 | LogE (Log-Euclidean mean) 26 PFs |
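
To make the averaging rows of the table concrete, the following sketch computes the arithmetic, harmonic and Log-Euclidean means of a list of kernel matrices. It is an illustration under stated assumptions, not the code behind the reported numbers: the function names are hypothetical, every kernel is assumed strictly positive definite, and the Karcher and AGH geometric means of more than two matrices are iterative procedures that are not reproduced here.

```python
import numpy as np

def spd_apply(m, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fn(w)) @ v.T

def arithmetic_mean(kernels):
    """Entrywise average of the kernel matrices."""
    return sum(kernels) / len(kernels)

def harmonic_mean(kernels):
    """Inverse of the average of the inverse kernels."""
    inv_avg = sum(np.linalg.inv(k) for k in kernels) / len(kernels)
    return np.linalg.inv(inv_avg)

def log_euclidean_mean(kernels):
    """Matrix exponential of the average of the matrix logarithms."""
    log_avg = sum(spd_apply(k, np.log) for k in kernels) / len(kernels)
    return spd_apply(log_avg, np.exp)

# Toy diagonal kernels, chosen so the three means are easy to check by hand.
toy_kernels = [np.diag([1.0, 4.0]), np.diag([4.0, 16.0])]
am = arithmetic_mean(toy_kernels)      # diag(2.5, 10.0)
hm = harmonic_mean(toy_kernels)        # diag(1.6, 6.4)
lem = log_euclidean_mean(toy_kernels)  # diag(2.0, 8.0)
```

For commuting matrices such as these diagonal examples, the Log-Euclidean mean coincides with the geometric mean; the two differ in general.
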
Fig. 2. The effect of sequentially incorporating PFs in decreasing order of their kernels' performance. The results of sequentially adding sequence-based features are further discussed in the Supplementary Material.
The results of incorporating the FunD composition
| Methods | PERF | Methods | PERF |
|---|---|---|---|
| FunFold-cdd | 69.94 | FunLogFold-cdd | 87.43 |
| FunFold-InterPro | 73.89 | FunLogFold-InterPro | 89.30 |
| FunFold-Combined | 76.50 | FunAmtFold-cdd | 77.2 |
| FunGeoFold-cdd | 87.71 | FunAmtFold-InterPro | 84.07 |
| FunGeoFold-InterPro | 89.30 | | |
FunLogEFold (FunAmFold) refers to the kernel fusion model that incorporates the FunD information (extracted from CDD or InterPro) and the GeoFold kernels through LogE (AM).
Fig. 3. The performance of the convex linear combination of two different kernels using 201 different pairs of kernel weights (blue line), the relative performance of kernels fused through weighted LogEM (red line), and the relative performance of kernels fused using weighted GM (magenta line; for more details see the Supplementary Material).
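
A weight sweep of this kind can be sketched as follows. The caption only states that 201 weight pairs were used; the evenly spaced grid (t, 1 - t) with t from 0 to 1 is an assumption, as are all function names. The sketch only builds the fused kernels; scoring each one with a classifier is left out.

```python
import numpy as np

def spd_apply(m, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(m)
    return (v * fn(w)) @ v.T

def convex_combination(k1, k2, t):
    """Convex linear combination (1 - t) K1 + t K2."""
    return (1.0 - t) * k1 + t * k2

def weighted_log_euclidean(k1, k2, t):
    """Weighted Log-Euclidean mean exp((1 - t) log K1 + t log K2)."""
    log_mix = (1.0 - t) * spd_apply(k1, np.log) + t * spd_apply(k2, np.log)
    return spd_apply(log_mix, np.exp)

def weighted_geometric(k1, k2, t):
    """Weighted geometric mean K1^{1/2} (K1^{-1/2} K2 K1^{-1/2})^t K1^{1/2}."""
    half = spd_apply(k1, np.sqrt)
    inv_half = spd_apply(k1, lambda w: w ** -0.5)
    inner = inv_half @ k2 @ inv_half
    inner = (inner + inner.T) / 2.0  # remove round-off asymmetry
    return half @ spd_apply(inner, lambda w: w ** t) @ half

# Assumed grid: 201 evenly spaced weights, i.e. a step of 0.005.
weights = np.linspace(0.0, 1.0, 201)

def sweep(k1, k2, fuse):
    """Return one fused kernel per weight for the given fusion rule."""
    return [fuse(k1, k2, t) for t in weights]

# Toy SPD kernels standing in for two base kernels.
k1 = np.array([[2.0, 1.0], [1.0, 2.0]])
k2 = np.array([[3.0, 0.0], [0.0, 1.0]])
fused_linear = sweep(k1, k2, convex_combination)
fused_loge = sweep(k1, k2, weighted_log_euclidean)
fused_gm = sweep(k1, k2, weighted_geometric)
```

All three rules return K1 at t = 0 and K2 at t = 1, so the curves in such a comparison share their endpoints and differ only in the path between them.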
Performance of our proposed data fusion approach on the newDD dataset
| Methods | Performance | Protein features |
|---|---|---|
| Tax-Fold | 90 | ( |
| PS2 | | Sequence evolution information |
| S | | Predicted secondary structure |
| FunFold-cdd | | FunD-cdd |
| FunFold-InterPro | | FunD-InterPro |
| GeoFold (AGH-KF) | | PS2 and S |
| FunGeoFold-cdd | | PS2, S, FunD-cdd |
| FunGeoFold-InterPro | | PS2, S, FunD-InterPro |
| LogEFold (LogE) | | PS2 and S |
| FunLogEFold-cdd | | PS2, S, FunD-cdd |
| FunLogEFold-InterPro | | PS2, S, FunD-InterPro |