Mark F Rogers¹, Colin Campbell¹, Yiming Ying².
Abstract
There is significant interest in inferring the structure of subcellular networks of interaction. Here we consider supervised interactive network inference, in which a reference set of known network links and nonlinks is used to train a classifier for predicting new links. Many types of data are relevant to inferring functional links between genes, motivating the use of data integration. We use pairwise kernels to predict novel links, along with multiple kernel learning to integrate distinct sources of data into a decision function. We evaluate various pairwise kernels to establish which are most informative and compare individual kernel accuracies with accuracies for weighted combinations. By associating a probability measure with classifier predictions, we enable cautious classification, which can increase accuracy by restricting predictions to high-confidence instances, and data cleaning, which can mitigate the influence of mislabeled training instances. Although one pairwise kernel (the tensor product pairwise kernel) appears to work best, different kernels may contribute complementary information about interactions: experiments in S. cerevisiae (yeast) reveal that a weighted combination of pairwise kernels applied to different types of data yields the highest predictive accuracy. Combined with cautious classification and data cleaning, we can achieve predictive accuracies of up to 99.6%.
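A pairwise kernel turns a kernel on single genes into a kernel on gene pairs, which is what a link classifier needs. The sketch below uses the standard literature forms of the tensor product pairwise kernel (TPPK) and the metric learning pairwise kernel (MLPK); whether the study uses exactly these forms is an assumption, and the Gaussian base kernel `rbf` and the toy feature vectors are hypothetical illustrations.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two gene feature vectors (illustrative choice)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

def tppk(pair1, pair2, k=rbf):
    """Tensor product pairwise kernel:
    K((a, b), (c, d)) = k(a, c) * k(b, d) + k(a, d) * k(b, c)."""
    a, b = pair1
    c, d = pair2
    return k(a, c) * k(b, d) + k(a, d) * k(b, c)

def mlpk(pair1, pair2, k=rbf):
    """Metric learning pairwise kernel:
    K((a, b), (c, d)) = (k(a, c) - k(a, d) - k(b, c) + k(b, d)) ** 2."""
    a, b = pair1
    c, d = pair2
    return (k(a, c) - k(a, d) - k(b, c) + k(b, d)) ** 2

# Hypothetical feature vectors for three genes.
g1, g2, g3 = (0.0, 0.0), (1.0, 0.0), (0.0, 2.0)
print(tppk((g1, g2), (g1, g3)))
print(mlpk((g1, g2), (g1, g3)))
```

Both kernels are unchanged when the two genes in a pair are swapped, so the candidate links (a, b) and (b, a) are treated identically.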
Year: 2015 PMID: 25874225 PMCID: PMC4385617 DOI: 10.1155/2015/707453
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Kernel weights for the pairwise kernels used in this study. For each pairwise kernel, the weights shown are those at the highest C value for which two or more weights were nonzero.
| Kernel | Kernel weights for individual models | | | | | |
|---|---|---|---|---|---|---|
|  | 0 | 0.449 | 0.099 | 0.033 | 0 | 0.419 |
|  | 0.362 | 0.198 | 0 | 0.096 | 0.308 | 0.035 |
|  | 0 | 0.258 | 0 | 0.200 | 0 | 0.542 |
|  | 0 | 0.170 | 0.176 | 0.211 | 0.191 | 0.252 |
|  | 0 | 0.266 | 0 | 0 | 0 | 0.734 |
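Each row of the weight table above sums to one (up to rounding), so a minimal sketch of how such weights form a single decision kernel is a convex combination of base kernel matrices. The nonnegativity and sum-to-one constraints are assumed from the table values; the six 2 × 2 base kernel matrices are hypothetical stand-ins, since the base kernel identities are not given here.

```python
def combine_kernels(kernels, weights, tol=5e-3):
    """Return the weighted sum K = sum_i w_i * K_i of square kernel matrices.

    Assumes MKL-style weights: nonnegative and summing to one (the tolerance
    allows for published weights rounded to three decimals)."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per base kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must sum to one")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for w, K in zip(weights, kernels):
        if w == 0:
            continue  # a zero weight removes that base kernel from the model
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * K[i][j]
    return combined

def active_kernels(weights):
    """Indices of base kernels that receive a nonzero weight."""
    return [i for i, w in enumerate(weights) if w > 0]

# First row of the individual-model weight table.
row1 = [0, 0.449, 0.099, 0.033, 0, 0.419]
# Six hypothetical 2 x 2 base kernel matrices over the same two gene pairs.
base = [[[1.0, 0.1 * t], [0.1 * t, 1.0]] for t in range(6)]
print(active_kernels(row1))
print(combine_kernels(base, row1))
```

Because a nonnegative combination of positive semidefinite matrices is itself positive semidefinite, the combined matrix is still a valid kernel.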
Figure 1: Comparison of average rankings for accuracy (a) and AUC (b) on 20 small data sets using unweighted pairwise kernels. The dot for each kernel marks its mean rank; horizontal bars depict the Nemenyi test critical region for α = 0.05. The tensor product kernel consistently had the highest ranking, while the symmetric direct sum kernel had the lowest. Differences among the remaining three kernels become clearer when AUC is considered alongside accuracy: the metric learning kernel ranks higher than the other two on both measures.
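The critical region in Figure 1 comes from comparing mean ranks across data sets. A minimal sketch, assuming the usual Nemenyi procedure: rank the kernels on each data set (1 = best, ties averaged), average the ranks, and compute the critical difference CD = q_α · sqrt(k(k + 1) / (6N)). The q_α values are the commonly tabulated two-tailed Nemenyi critical values for α = 0.05; the accuracy rows in the usage lines are hypothetical.

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05, keyed by the
# number of compared methods k (standard tabulated values).
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}

def average_ranks(scores):
    """scores[d][m]: score of method m on data set d (higher is better).
    Returns each method's mean rank (1 = best), averaging ranks over ties."""
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: -row[m])
        pos = 0
        while pos < n_methods:
            end = pos
            while end + 1 < n_methods and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
            for idx in order[pos:end + 1]:
                totals[idx] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]

def nemenyi_cd(k, n, q_alpha):
    """Critical difference in mean rank for k methods compared on n data sets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

# Hypothetical accuracies of three methods on two data sets.
print(average_ranks([[0.90, 0.80, 0.70], [0.90, 0.70, 0.80]]))
# Five pairwise kernels compared on 20 data sets, as in Figure 1.
print(nemenyi_cd(k=5, n=20, q_alpha=Q_05[5]))
```

Under these assumptions, two of the five kernels differ significantly on 20 data sets when their mean ranks differ by more than about 1.36.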
Cross-validation results for the pairwise kernels using unweighted (U) and weighted (W) combinations of the six unpaired kernels for data sets of different sizes; the weights are learned by multiple kernel learning (MKL). Shown is test accuracy averaged over N = 20, N = 10, or N = 5 data sets (1,098, 2,196, or 4,392 examples each, respectively, split into 80% training and 20% test sets). In many cases the MKL weights yield a significant improvement; in other cases there is no significant change. Significance markers: ** Wilcoxon signed-rank test, α = 0.01; * Wilcoxon signed-rank test, α = 0.05; † paired t-test, α < 0.01. Statistically significant values are marked in bold type.
| Kernel | N = 20, U | N = 20, W | N = 10, U | N = 10, W | N = 5, U | N = 5, W |
|---|---|---|---|---|---|---|
|  | 0.826 | 0.836* | 0.860 | 0.867 | 0.895 | 0.901 |
|  | 0.667 | 0.662 | 0.663 | 0.681** | 0.694 | 0.716† |
|  | 0.764 | 0.801** | 0.802 | 0.837** | 0.852 | 0.883† |
|  | 0.731 | 0.740 | 0.756 | 0.764 | 0.755 | 0.791† |
|  | 0.764 | 0.759 | 0.817 | 0.807 | 0.862 | 0.849 |
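The Wilcoxon signed-rank test in the caption compares paired accuracies (unweighted versus weighted) across the N data sets. The per-data-set accuracies behind the averages are not shown, so the sketch below only illustrates the statistic on hypothetical paired values, using a large-sample normal approximation without tie correction (an assumption; for small N an exact test such as SciPy's `scipy.stats.wilcoxon` is more appropriate).

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z) for paired samples x (e.g. unweighted) and
    y (e.g. weighted). Zero differences are dropped; tied |differences| get
    average ranks; z uses the normal approximation without tie correction."""
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        for idx in order[pos:end + 1]:
            ranks[idx] = (pos + end) / 2 + 1  # average rank for the tied block
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, w_minus, (w_plus - mean) / sd

# Hypothetical paired accuracies on five data sets.
unweighted = [0.80, 0.82, 0.79, 0.85, 0.81]
weighted = [0.83, 0.84, 0.78, 0.88, 0.85]
print(wilcoxon_signed_rank(unweighted, weighted))
```

A large positive z indicates that the weighted combination tends to outperform the unweighted one across data sets.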
Figure 2: Typical improvement in accuracy from using a weighted sum of base kernels via MKL. The average performance of the best-performing composite kernel (solid grey bars) is compared with that of its base kernels (hashed bars) on data sets of three different sizes. By leveraging information from multiple kernels, the composite kernel provides an accuracy increase of 4% to 5% over the best of its base kernels. Using MKL over all 30 base kernels combined yields a further 1.2% to 1.4% increase (black bars). Differences between the composite kernel and its base kernels are significant at α < 0.001; differences between the all-kernel combination and the composite kernel are significant at α < 0.01.
Kernel weights learned for a comprehensive kernel that combines all base pairwise kernels. For each pairwise kernel, we show the final weight assigned to each of its base kernels. The tensor product kernel and the metric learning kernel contribute the most information to this comprehensive kernel. None of the motif base kernels contribute, nor do any of the Cartesian product base kernels. The kernel weights sum to unity.
| Kernel | Kernel weights for combined model | | | | | |
|---|---|---|---|---|---|---|
|  | 0 | 0.193 | 0 | 0.103 | 0.075 | 0.372 |
|  | 0 | 0.002 | 0 | 0.010 | 0 | 0 |
|  | 0 | 0.044 | 0 | 0.023 | 0 | 0.153 |
|  | 0 | 0.006 | 0 | 0.019 | 0 | 0 |
|  | 0 | 0 | 0 | 0 | 0 | 0 |
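The caption's claims can be checked directly from the combined-model weight table: summing each row gives the total weight per pairwise kernel, summing each column gives the total weight per base kernel, and the grand total should be one. Rows are labelled by table position only, because the kernel symbols are not recoverable here.

```python
# Combined-model weights in table order (rows: pairwise kernels;
# columns: base kernels), copied from the table above.
COMBINED = [
    [0, 0.193, 0, 0.103, 0.075, 0.372],
    [0, 0.002, 0, 0.010, 0, 0],
    [0, 0.044, 0, 0.023, 0, 0.153],
    [0, 0.006, 0, 0.019, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def row_totals(table):
    """Total weight per pairwise kernel (row), rounded to the table's precision."""
    return [round(sum(row), 3) for row in table]

def column_totals(table):
    """Total weight per base kernel (column) across all pairwise kernels."""
    return [round(sum(col), 3) for col in zip(*table)]

print(row_totals(COMBINED))
print(column_totals(COMBINED))
print(round(sum(map(sum, COMBINED)), 3))
```

The first and third rows carry about 0.743 and 0.220 of the total weight, two base-kernel columns receive no weight at all, and the fifth row is entirely zero; this pattern is consistent with the caption's statements about which kernels contribute.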
Figure 3: Test accuracy ((a), y-axis) and fraction of pairs predicted ((b), y-axis) as a function of the p-value cutoff (x-axis) for (i) the combination of all available pairwise and data kernels (solid curve) and (ii) the top-performing pairwise kernel (dashed curve). Increasing the p-value cutoff increases predictive accuracy but decreases the fraction of pairs for which predictions can be made.
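Cautious classification trades coverage for accuracy. The study's exact confidence measure (a p-value cutoff) is not specified in this record, so the sketch below swaps in a simpler stand-in: a threshold on the confidence max(p, 1 − p) of an estimated link probability p. The function name and the toy probabilities are hypothetical.

```python
def cautious_metrics(probs, labels, cutoff):
    """probs[i]: estimated probability that pair i is a link (label 1).
    A prediction is made only when the confidence max(p, 1 - p) is at least
    `cutoff`; other pairs are left unlabeled.
    Returns (accuracy on predicted pairs, fraction of pairs predicted)."""
    kept = [(p, y) for p, y in zip(probs, labels) if max(p, 1 - p) >= cutoff]
    if not kept:
        return float("nan"), 0.0
    correct = sum(1 for p, y in kept if (p >= 0.5) == (y == 1))
    return correct / len(kept), len(kept) / len(probs)

# Hypothetical link probabilities and true labels for five pairs.
probs = [0.95, 0.90, 0.60, 0.40, 0.05]
labels = [1, 1, 0, 0, 0]
for cutoff in (0.5, 0.8):
    print(cutoff, cautious_metrics(probs, labels, cutoff))
```

Raising the cutoff discards the least confident pairs first, which mirrors the accuracy gain and coverage loss shown in panels (a) and (b).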
Figure 4: Mean test error as a fraction (y-axis) versus the number of patterns learnt (x-axis) for the top-performing pairwise kernel. Error bars depict 95% confidence intervals for 5-fold cross-validation test error averaged over 10 distinct data subsets, each with m = 2,196. The upper curve shows performance when all the data are learnt sequentially (from a common start set) in random order. The lower curve shows the test error when the next addition to the training set is the pattern whose predicted link label has the lowest confidence.
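The lower curve in Figure 4 corresponds to a least-confidence selection rule. A minimal sketch, assuming confidence is measured by how far a predicted link probability lies from 0.5: after each retraining, the unlabeled pattern closest to 0.5 is added next. The callback `predict_proba` is hypothetical; it stands in for retraining the classifier on the current training indices and scoring every pattern in the pool.

```python
from typing import Callable, List, Sequence

def active_learning_order(n_pool: int, start: Sequence[int], n_steps: int,
                          predict_proba: Callable[[List[int]], List[float]]) -> List[int]:
    """Grow the training set one pattern at a time by least confidence.

    predict_proba(train_idx) is a hypothetical callback that retrains on the
    indices in train_idx and returns a link probability for every pattern."""
    train = list(start)
    chosen = set(train)
    pool = [i for i in range(n_pool) if i not in chosen]
    for _ in range(min(n_steps, len(pool))):
        probs = predict_proba(train)
        # Lowest confidence = predicted probability closest to 0.5.
        nxt = min(pool, key=lambda i: abs(probs[i] - 0.5))
        pool.remove(nxt)
        train.append(nxt)
    return train

# Toy callback that ignores retraining and returns fixed probabilities.
fixed = [0.9, 0.55, 0.1, 0.48, 0.7]
print(active_learning_order(5, [0], 3, lambda train: fixed))
```

The random-order baseline (upper curve) would instead add pool indices in a shuffled order, for example via `random.Random(seed).shuffle`.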