Abstract
Metabolic pathways can be conceptualized as the biological equivalent of a data pipeline. In living cells, series of chemical reactions are carried out stepwise by different proteins called enzymes. However, many pathways remain incompletely characterized, and in some of them not all enzyme components have been identified. Kernel methods are useful in many difficult problem areas, such as document classification and bioinformatics. In particular, kernel methods have recently been used to predict biological networks, such as protein-protein interaction networks and metabolic networks. In this paper, we implement and compare different methods and types of data for predicting metabolic networks. The methods are Penalized Kernel Matrix Regression (PKMR) and pairwise Support Vector Machine (pSVM). We run several experiments using these methods with sequence, non-sequence, and combined data. Both methods achieve better accuracy when sequence data are used. When the methods are compared on the same type of data, the pSVM approach shows better accuracy. The best results are obtained with pSVM using all combined kernels.
Keywords: Kernel methods; Machine learning; Metabolic pathways; Network prediction
Year: 2016 PMID: 27471658 PMCID: PMC4947111 DOI: 10.1007/s13721-016-0134-5
Source DB: PubMed Journal: Netw Model Anal Health Inform Bioinform ISSN: 2192-6670
Experiments are grouped by methods (experiment I, II, III—PKMR and experiment IV, V, VI—pSVM) and by type of data (I, IV—non-sequence data, II, V—sequence data, and III, VI—combined data)
| Experiment | Methods | Type of kernel |
|---|---|---|
| I | PKMR | Non-sequence (described in Sect. |
| II | PKMR | Sequence (described in Sect. |
| III | PKMR | Combined sequence and non-sequence (described in Sect. |
| IV | pSVM | Non-sequence (described in Sect. |
| V | pSVM | Sequence (described in Sect. |
| VI | pSVM | Combined sequence and non-sequence (described in Sect. |
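The combined-data experiments and the pairwise SVM both rest on building new kernels from base kernels. The sketch below illustrates two standard constructions under stated assumptions: an unweighted sum of base kernel matrices as the "combined" kernel, and the tensor product pairwise kernel as the kernel between enzyme pairs. Neither is confirmed as the exact construction used in these experiments, and the toy matrices `K_seq` and `K_nonseq` are hypothetical values over three imaginary enzymes.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def combine_kernels(kernels: List[Matrix]) -> Matrix:
    """Elementwise, unweighted sum of n x n base kernel matrices.

    A sum of positive semidefinite kernels is again a valid kernel.
    """
    n = len(kernels[0])
    return [[sum(k[i][j] for k in kernels) for j in range(n)] for i in range(n)]


def pairwise_kernel(K: Matrix, p: Tuple[int, int], q: Tuple[int, int]) -> float:
    """Tensor product pairwise kernel between pairs p = (a, b) and q = (c, d).

    K_pair((a, b), (c, d)) = K[a][c] * K[b][d] + K[a][d] * K[b][c]
    The symmetrized form makes the value independent of the order
    of the two enzymes inside each pair.
    """
    a, b = p
    c, d = q
    return K[a][c] * K[b][d] + K[a][d] * K[b][c]


# Hypothetical base kernels over three enzymes (illustrative values only).
K_seq = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.125], [0.25, 0.125, 1.0]]
K_nonseq = [[1.0, 0.25, 0.0], [0.25, 1.0, 0.5], [0.0, 0.5, 1.0]]

K_comb = combine_kernels([K_seq, K_nonseq])
similarity = pairwise_kernel(K_comb, (0, 1), (0, 2))
print(similarity)
```

A pairwise SVM would then be trained on a Gram matrix whose entries are `pairwise_kernel` values between candidate enzyme pairs, with labels indicating whether each pair is linked in the known network.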
Results collected during the experiments
| Experiment | Predictor kernel | AUC | Time (s) | Confidence interval |
|---|---|---|---|---|
| I–PKMR–Non-Sequence | | 0.660 | 300 | [0.655, 0.665] |
| | | 0.503 | 240 | [0.499, 0.507] |
| | | 0.775 | 240 | [0.771, 0.779] |
| | | 0.755 | 350 | [0.752, 0.759] |
| | | 0.799 | 420 | [0.791, 0.807] |
| II–PKMR–Sequence | | 0.797 | 450 | [0.793, 0.801] |
| | | 0.782 | 430 | [0.778, 0.786] |
| | | 0.725 | 420 | [0.720, 0.731] |
| | | 0.817 | 480 | [0.811, 0.823] |
| | | 0.821 | 530 | [0.818, 0.824] |
| III–PKMR–Combined | | 0.812 | 470 | [0.809, 0.816] |
| | | 0.831 | 610 | [0.828, 0.834] |
| | | 0.840 | 720 | [0.831, 0.849] |
| IV–pSVM–Non-Sequence | | 0.791 | 9020 | [0.786, 0.796] |
| | | 0.696 | 7800 | [0.692, 0.700] |
| | | 0.802 | 7980 | [0.797, 0.807] |
| | | 0.818 | 10,100 | [0.812, 0.824] |
| | | 0.877 | 10,121 | [0.871, 0.883] |
| V–pSVM–Sequence | | 0.887 | 12,060 | [0.879, 0.895] |
| | | 0.868 | 12,000 | [0.859, 0.877] |
| | | 0.840 | 11,760 | [0.836, 0.844] |
| | | 0.898 | 12,220 | [0.891, 0.905] |
| | | 0.910 | 12,800 | [0.901, 0.919] |
| VI–pSVM–Combined | | 0.890 | 12,100 | [0.882, 0.898] |
| | | 0.939 | 13,420 | [0.935, 0.944] |
| | | 0.940 | 14,010 | [0.934, 0.946] |
The reported values are the AUC score (area under the ROC curve, used as the accuracy measure), the execution time in seconds, and the confidence interval
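For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen true link receives a higher score than a randomly chosen non-link. The following is a minimal sketch of that rank-based definition, not the evaluation code behind the table; the function name `auc_score` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = enzyme pair is linked, 0 = not linked (illustrative only).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
auc = auc_score(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking, which puts the 0.503 entry of experiment I in context, while values near 1 indicate that true links are almost always ranked above non-links.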
Fig. 1 Comparison of the methods (PKMR, Penalized Kernel Matrix Regression, and pSVM, pairwise Support Vector Machine) for the sequence data kernels in terms of accuracy and execution time
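The accuracy versus execution time trade-off that the figure compares can be quantified directly from the results table. The sketch below takes the highest-AUC rows of experiment II (PKMR, sequence) and experiment V (pSVM, sequence) as inputs; the helper `trade_off` and the dictionary layout are illustrative assumptions, while the numbers are copied from the table.

```python
from typing import Dict, Tuple

# Highest-AUC rows for the sequence kernels, copied from the results table.
results: Dict[str, Dict[str, float]] = {
    "PKMR": {"auc": 0.821, "time_s": 530},    # experiment II
    "pSVM": {"auc": 0.910, "time_s": 12800},  # experiment V
}


def trade_off(baseline: Dict[str, float], candidate: Dict[str, float]) -> Tuple[float, float]:
    """Return (absolute AUC gain, execution time ratio) of candidate over baseline."""
    auc_gain = candidate["auc"] - baseline["auc"]
    time_ratio = candidate["time_s"] / baseline["time_s"]
    return auc_gain, time_ratio


auc_gain, time_ratio = trade_off(results["PKMR"], results["pSVM"])
print(f"AUC gain: {auc_gain:.3f}, time ratio: {time_ratio:.1f}x")
```

On these rows, pSVM improves the AUC by about 0.089 while taking roughly 24 times longer to run, which is the kind of cost a reader would weigh when choosing between the two methods.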