Abiel Roche-Lima, Michael Domaratzki, Brian Fristensky.
Abstract
BACKGROUND: Metabolic networks are represented by the set of metabolic pathways. A metabolic pathway is a series of biochemical reactions in which the product (output) of one reaction serves as the substrate (input) of another. Many pathways remain incompletely characterized, and obtaining better models of metabolic pathways is one of the major challenges of computational biology. Existing models depend on gene annotation, so errors accumulate when pathways are predicted from incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pairs of entities. Some of these methods, e.g., pairwise Support Vector Machines (SVMs), use pairwise kernels, which describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large amounts of storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been used effectively in problems that handle large amounts of sequence information, such as protein essentiality, natural language processing and machine translation.
Year: 2014 PMID: 25260372 PMCID: PMC4261252 DOI: 10.1186/1471-2105-15-318
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Weighted transducer and weighted automaton representing sequences in the alphabet . (a) Weighted transducer T. (b) Weighted automaton A (A is obtained by projecting the output of T).
Figure 2. Conversion from a metabolic network to a graph representation. (a) Part of the glycolysis pathway, from the BioCyc database [5, 6]. (b) The resulting graph, with nodes (enzymes) and edges (enzyme-enzyme relations). (c) Table of known enzyme relations (related EC numbers are labeled +1 and unrelated ones -1).
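The labeling step described in panel (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EC numbers and edges below are toy values chosen for the example rather than read from the figure, and `label_pairs` is a hypothetical helper name.

```python
from itertools import combinations

# Toy inputs (assumption): enzymes as EC numbers, known relations as edges.
enzymes = ["2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13"]
edges = {
    ("2.7.1.1", "5.3.1.9"),
    ("5.3.1.9", "2.7.1.11"),
    ("2.7.1.11", "4.1.2.13"),
}

def label_pairs(nodes, related):
    """Label every unordered pair of nodes +1 (related) or -1 (not related)."""
    related = {frozenset(edge) for edge in related}  # ignore edge direction
    return {
        (a, b): 1 if frozenset((a, b)) in related else -1
        for a, b in combinations(nodes, 2)
    }

labels = label_pairs(enzymes, edges)
for pair, label in labels.items():
    print(pair, label)
```

Treating edges as unordered is a design choice made for this sketch; a directed substrate-to-product relation would need ordered pairs instead.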
Figure 3. Diagram of a pairwise SVM applied to metabolic network prediction. (a) An example of the pairs in the training set, using EC numbers (top) or gene names (bottom). (b) The pairwise kernel as a matrix, where the value in each cell is a similarity measure between two pairs of EC numbers (top) or two pairs of gene names (bottom). (c) A model is trained to estimate the parameters α and b of the decision function f. (d) Given a new pair of EC numbers (left) or gene names (right), the decision function is evaluated and the pair is classified as interacting or non-interacting.
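Steps (c) and (d) can be illustrated with the standard SVM decision function, f(p) = Σ α_i y_i K(p_i, p) + b, evaluated on a new pair p. The sketch below assumes that standard form, since the caption does not spell it out. The gene names, the values of α and b, and the toy `overlap_kernel` are all hypothetical and stand in for a trained model and a real pairwise kernel.

```python
def overlap_kernel(p, q):
    """Toy pairwise kernel (assumption): number of entities two pairs share."""
    return len(set(p) & set(q))

def decision_function(new_pair, support_pairs, alphas, labels, b,
                      kernel=overlap_kernel):
    """Evaluate f(p) = sum_i alpha_i * y_i * K(p_i, p) + b."""
    return sum(
        a * y * kernel(sp, new_pair)
        for sp, a, y in zip(support_pairs, alphas, labels)
    ) + b

def classify(score):
    """Map the score to +1 (interacting) or -1 (non-interacting)."""
    return 1 if score >= 0 else -1

# Hypothetical "trained" parameters, for illustration only.
support_pairs = [("geneA", "geneB"), ("geneC", "geneD")]
alphas = [0.5, 0.5]
labels = [1, -1]
b = 0.0

score_known = decision_function(("geneB", "geneA"), support_pairs, alphas, labels, b)
score_other = decision_function(("geneC", "geneE"), support_pairs, alphas, labels, b)
print(classify(score_known), classify(score_other))
```

In practice α and b would come from training an SVM on the pairwise kernel matrix of panel (b) rather than being set by hand.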
Groups for PRK and pairwise kernel comparison
| Group | PRKs ¹ | Pairwise Kernel ² |
|---|---|---|
| N-GRAM | | |
| PHY | | |
| PFAM | | |

¹ Kernels were taken from Table 2.
² Computed with the Tensor Product Pairwise Kernel.
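For orientation, the tensor product pairwise kernel is commonly written as K((x1, x2), (y1, y2)) = k(x1, y1)·k(x2, y2) + k(x1, y2)·k(x2, y1), where k is a base kernel on single entities. The sketch below assumes that form and pairs it with a plain n-gram count kernel as the base. It counts n-grams directly in Python and is a stand-in for, not a reproduction of, the transducer-based computation behind rational kernels; the function names and example strings are hypothetical.

```python
from collections import Counter

def ngram_counts(seq, n=3):
    """Count every contiguous n-gram in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def ngram_kernel(x, y, n=3):
    """Base kernel: sum over shared n-grams of count_x * count_y."""
    cx, cy = ngram_counts(x, n), ngram_counts(y, n)
    return sum(cx[g] * cy[g] for g in cx.keys() & cy.keys())

def tensor_product_pairwise_kernel(p, q, base=ngram_kernel):
    """Pairwise kernel on pairs p = (x1, x2) and q = (y1, y2)."""
    (x1, x2), (y1, y2) = p, q
    return base(x1, y1) * base(x2, y2) + base(x1, y2) * base(x2, y1)

# Toy sequences (assumption), chosen so the shared 3-grams are easy to see.
p = ("ABCD", "XYZW")
q = ("ABCE", "XYZQ")
print(tensor_product_pairwise_kernel(p, q))
```

The second term makes the kernel insensitive to the order of entities within a pair, which suits undirected enzyme-enzyme relations.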
Average AUC ROC scores and processing times for various PRKs
| Exp | Type of kernels | Kernel | Average AUC score | Runtime (s) | Confidence interval |
|---|---|---|---|---|---|
| I | Pairwise Rational Kernels (3-gram) | PRK-Direct-Sum | 0.499 | 15.0 | [0.486, 0.512] |
| | | PRK-Tensor-Product | 0.597 | 16.2 | [0.589, 0.605] |
| | | PRK-Metric-Learning | 0.641 | 17.4 | [0.633, 0.648] |
| | | PRK-Cartesian | 0.640 | 15.0 | [0.632, 0.647] |
| II | PRKs combined with phylogenetic data ( sequence kernel) | PRK-Direct-Sum+Phy | 0.425 | 136.2 | [0.411, 0.438] |
| | | PRK-Tensor+Phy | 0.733 | 135.6 | [0.725, 0.741] |
| | | PRK-Metric+Phy | 0.761 | 139.2 | [0.753, 0.768] |
| | | PRK-Cartesian+Phy | 0.742 | 132.6 | [0.734, 0.749] |
| III | PRKs combined with PFAM data ( Sequence kernel) | PRK-D-Sum+PFAM | 0.493 | 136.2 | [0.480, 0.506] |
| | | PRK-Tensor+PFAM | 0.827 | 136.8 | [0.819, 0.834] |
| | | PRK-Metric+PFAM | 0.844 | 140.4 | [0.837, 0.850] |
| | | PRK-Cartesian+PFAM | 0.842 | 132.0 | [0.835, 0.849] |
Figure 4. Comparison of some pairwise rational kernels and pairwise kernels grouped by kernel type (N-GRAM group, PHY group and PFAM group).