Shi Yu, Tillmann Falck, Anneleen Daemen, Leon-Charles Tranchevent, Johan A.K. Suykens, Bart De Moor, Yves Moreau.
Abstract
BACKGROUND: This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, which is different from the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have more advantages over the sparse integration method for thoroughly combining complementary information in heterogeneous data sources.
Year: 2010 PMID: 20529363 PMCID: PMC2906488 DOI: 10.1186/1471-2105-11-309
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
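As a sketch of the central idea (simplified notation, not a verbatim reproduction of the paper's formulations (19)-(42) referenced below): given candidate kernels K_1, ..., K_p and the dual variable α restricted to a convex set 𝒞, the MKL dual bounds the vector of quadratic forms by a scalar t, and the norm chosen for that bound determines the character of the kernel coefficients θ:

$$
\min_{\alpha \in \mathcal{C}} \; t
\qquad \text{s.t.} \qquad
t \;\geq\; \bigl\| \bigl( \alpha^{\top} K_1 \alpha,\; \ldots,\; \alpha^{\top} K_p \alpha \bigr) \bigr\|_{m}.
$$

With m = ∞ this reduces to the classical sparse MKL constraint t ≥ α⊤K_jα for every j, whose multipliers θ live on the L1 simplex; with m = 2 it yields the non-sparse L2 MKL variant with θ_j proportional to α⊤K_jα; with m = 1 it collapses to the unweighted sum of kernels.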
Notation
| Symbol | Domain | Meaning |
|---|---|---|
| α | ℝ | the dual variable of SVM |
|  | ℝ | a semi-positive definite matrix |
|  | ℝ | a convex set |
| Ω | ℝ | a combination of multiple semi-positive definite matrices |
| j | ℕ | the index of kernel matrices |
| p | ℕ | the number of kernel matrices |
| θ | [0, 1] | coefficients of kernel matrices |
|  | [0, +∞) | dummy variable in the optimization problem |
| w | ℝ | the norm vector of the separating hyperplane |
| φ | ℝ | the feature map |
| i | ℕ | the index of training samples |
| x_i | ℝ | the vector of the i-th training sample |
| ρ | ℝ | the bias term in 1-SVM |
| ν | ℝ+ | the regularization term of 1-SVM |
| ξ_i | ℝ | the slack variable for the i-th training sample |
| K | ℝ | kernel matrix |
|  | ℝ | kernel function |
|  | ℝ | the vector of a test data sample |
| y_i | -1 or +1 | the class label of the i-th training sample |
| Y | ℝ | the diagonal matrix of class labels |
| C | ℝ+ | the box constraint on the dual variables of SVM |
| b | ℝ | the bias term in SVM and LSSVM |
| k | ℕ | the number of classes |
|  | ℝ | variable vector in the SIP problem |
|  | ℝ | dummy variable in the SIP problem |
|  | ℕ | the index of classes in the classification problem |
| λ | ℝ+ | the regularization parameter in LSSVM |
| e_i | ℝ | the error term of the i-th training sample in LSSVM |
| α | ℝ | the dual variable of LSSVM |
| ε | ℝ+ | the precision value used as the stopping criterion of the SIP iteration |
| τ | ℕ | the index of SIP iterations |
The notation used in this paper is based on the dual problem and can be linked to an equivalent notation in the primal problem.
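For the standard SVM, for instance, the textbook correspondence is

$$
w = \sum_{i} \alpha_i\, y_i\, \phi(x_i),
\qquad
f(x) = \operatorname{sign}\Bigl( \sum_{i} \alpha_i\, y_i\, k(x_i, x) + b \Bigr),
$$

so the primal variable w maps to the dual variables α, and the feature map φ enters the dual only through the kernel function k(x_i, x) = φ(x_i)⊤φ(x).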
Summary of algorithms implemented in the paper
| Algorithm Nr. | Formulation Nr. | Name | References | Formulation | Equations |
|---|---|---|---|---|---|
| 1 | 1-A | 1-SVM | […] | SOCP | (20) |
| 1 | 1-B | 1-SVM | […] | QCQP | (20) |
| 2 | 2-A | 1-SVM | […] | SOCP | (20) |
| 2 | 2-B | 1-SVM | […] | QCQP | (20) |
| 3 | 3-A | 1-SVM | […] | SOCP | (19) |
| 3 | 3-B | 1-SVM | […] | QCQP | (19) |
| 4 | 4-A | 1-SVM | novel | SOCP | (23) |
| 5 | 5-B | SVM | […] | QCQP | (26) |
| 5 | 5-C | SVM | […] | SIP | (33) |
| 6 | 6-B | SVM | novel | QCQP | (26) |
| 7 | 7-A | SVM | […] | SOCP | (25) |
| 7 | 7-B | SVM | […] | QCQP | (25) |
| 8 | 8-A | SVM | novel | SOCP | (27) |
| 8 | 8-C | SVM | […] | SIP | (34) |
| 9 | 9-B | Weighted SVM | novel | QCQP | Suppl. (3) |
| 10 | 10-B | Weighted SVM | novel | QCQP | Suppl. (3) |
| 11 | 11-B | Weighted SVM | […] | QCQP | Suppl. (2) |
| 12 | 12-A | Weighted SVM | novel | SOCP | Suppl. (4) |
| 13 | 13-B | LSSVM | […] | QCQP | (39) |
| 13 | 13-C | LSSVM | […] | SIP | (41) |
| 14 | 14-B | LSSVM | novel | QCQP | (39) |
| 15 | 15-D | LSSVM | […] | linear | (38) |
| 16 | 16-B | LSSVM | novel | SOCP | (40) |
| 16 | 16-C | LSSVM | novel | SIP | (42) |
| 17 | 17-B | Weighted LSSVM | novel | QCQP | Suppl. (8) |
| 18 | 18-B | Weighted LSSVM | novel | QCQP | Suppl. (8) |
| 19 | 19-D | Weighted LSSVM | […] | linear | Suppl. (6) |
| 20 | 20-A | Weighted LSSVM | novel | SOCP | Suppl. (9) |
Summary of algorithms implemented in the paper. Because the same algorithm can be solved via different formulations, several formulation numbers can correspond to one algorithm number. In total, 20 different algorithms were implemented and solved through 28 different formulations. For an algorithm with several formulations, the solutions are identical and differ only in computational efficiency. Algorithms that had already been proposed in the literature are indicated in the reference column; the novel algorithms and formulations proposed in this paper are labeled "novel".
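To illustrate how the norms in these formulations translate into kernel coefficients, the following minimal NumPy sketch recovers θ from a dual solution via the dual-norm relationship sketched earlier. It is an illustration only, not the paper's implementation; `alpha` and `kernels` are assumed to come from whichever solver above was used.

```python
import numpy as np

def quadratic_forms(alpha, kernels):
    """Compute f_j = alpha' K_j alpha for each candidate kernel K_j."""
    return np.array([alpha @ K @ alpha for K in kernels])

def recover_theta(alpha, kernels, norm="l2"):
    """Sketch of kernel-coefficient recovery from a dual solution.

    'linf': sparse theta; weight concentrates on the kernel(s) attaining
            max_j f_j (the argmax indicator used here is a proxy for the
            Lagrange multipliers of the active constraints).
    'l1'  : uniform weights (equivalent to the plain sum of kernels).
    'l2'  : non-sparse theta proportional to f, i.e. theta = f / ||f||_2.
    """
    f = quadratic_forms(alpha, kernels)
    if norm == "linf":
        theta = (f == f.max()).astype(float)
        return theta / theta.sum()
    if norm == "l1":
        return np.ones_like(f) / len(f)
    return f / np.linalg.norm(f)  # the "l2" case
```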
Summary of data sets and algorithms used in five experiments
| Nr. | Data Set | Problem | Samples | Classes | Algorithms | Evaluation |
|---|---|---|---|---|---|---|
| 1 | disease-relevant genes | ranking | 620 | 1 | 1-4 | LOO AUC |
| 2 | prostate cancer genes | ranking | 9 | 1 | 1-4 | AUC |
| 3 | rectal cancer patients | classification | 36 | 2 | 5-8, 13-16 | LOO AUC |
| 4 | endometrial disease | classification | 339 | 2 | 5-8, 13-16 | 3-fold AUC |
|  | miscarriage | classification | 2356 | 2 | 5-8, 13-16 | 3-fold AUC |
|  | pregnancy | classification | 856 | 2 | 9-12, 17-20 | 3-fold AUC |
| 5 | UCI pen digit and optical digit | classification | 1000-3000 | 10 | 1-A, 1-B, 5-B, 5-C, 13-B, 13-C | CPU time |
Results of experiment 1: prioritization of 620 disease-relevant genes by genomic data fusion
| Method | Error of AUC (mean) | Error of AUC (std.) | p-value | corr | corr | corr | corr |
|---|---|---|---|---|---|---|---|
|  | 0.0923 | 0.0035 | 2.98 · 10⁻¹⁷ | - | 0.94 | 0.66 | 0.82 |
|  | 0.0806 | 0.0033 | 2.66 · 10⁻⁶ | 0.94 | - | 0.82 | 0.92 |
|  | 0.0908 | 0.0042 | 1.92 · 10⁻¹⁶ | 0.66 | 0.82 | - | 0.90 |
| L2 |  | 0.0034 | - | 0.82 | 0.92 | 0.90 | - |
Results of experiment 1: disease-relevant gene prioritization by genomic data fusion. The error of AUC values is evaluated by LOO validation in 20 random repetitions. The best performance (L2) is shown in bold. The p-values compare each method with the best performer using a paired t-test; as shown, the L2 method is significantly better than the other methods. The paired Spearman correlation scores compare the similarity of the rankings obtained by the different approaches (a dash marks a method compared with itself); higher Spearman correlation values mean that the two rankings are more similar.
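For readers reproducing this kind of comparison, here is a minimal SciPy sketch of the two statistics reported in the table (paired t-test across repetitions, Spearman correlation between rankings). All data below are placeholders, not the paper's results.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Placeholder: error-of-AUC values of two methods over 20 LOO repetitions.
err_l2 = rng.normal(0.080, 0.003, size=20)
err_linf = rng.normal(0.092, 0.004, size=20)

# Paired t-test: is the difference between the two methods significant?
t_stat, p_value = stats.ttest_rel(err_l2, err_linf)

# Spearman correlation between the prioritization scores that two methods
# assign to the same genes (placeholder scores for 620 genes).
scores_a, scores_b = rng.random(620), rng.random(620)
rho, _ = stats.spearmanr(scores_a, scores_b)

print(f"paired t-test p = {p_value:.2e}, Spearman rho = {rho:.2f}")
```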
Figure 1. Optimal kernel coefficients for disease gene prioritization. Optimal kernel coefficients assigned to the genomic data sources in disease gene prioritization; for each method, the average coefficients over 20 repetitions are shown. The three most important data sources ranked by L∞ are Text, GO, and Motif; the coefficients of the other six sources are almost zero. The L2 method ranks these three best data sources in the same order as L∞ and, in addition, ranks the other six sources. Thus, as another advantage, the L2 method provides a more refined ranking of data sources than the L∞ method in data integration.
Results of experiment 2: prioritization of prostate cancer genes by genomic data fusion
| Name | Ensembl id | References |  |  |  | Endeavour |
|---|---|---|---|---|---|---|
| CPNE | ENSG00000085719 | Thomas […] | 0.3030 | 0.2323 |  | - |
|  |  |  | 31/100 | 24/100 |  | 70/100 |
| CDH23 | ENSG00000107736 | Thomas […] | 0.0606 | 0.0303 |  | - |
|  |  |  | 7/100 | 4/100 |  | 78/100 |
| EHBP1 | ENSG00000115504 | Gudmundsson […] | 0.5354 | 0.5152 |  | - |
|  |  |  | 54/100 | 52/100 |  | 57/100 |
| MSMB | ENSG00000138294 | Eeles […]; Thomas […] | 0.0505 |  |  | - |
|  |  |  | 6/100 |  |  | 69/100 |
| KLK3 | ENSG00000142515 | Eeles […] | 0.3434 | 0.3535 |  | - |
|  |  |  | 35/100 | 36/100 |  |  |
| JAZF1 | ENSG00000153814 | Thomas […] |  |  |  | - |
|  |  |  | 7/100 |  |  |  |
| LMTK2 | ENSG00000164715 | Eeles […] | 0.4646 | 0.8081 | 0.7677 | - |
|  |  |  | 47/100 | 81/100 | 77/100 |  |
| IL16 | ENSG00000172349 | Thomas […] | 0.0303 |  |  | - |
|  |  |  | 4/100 |  |  | 72/100 |
| CTBP2 | ENSG00000175029 | Thomas […] | 0.5758 | 0.6869 |  | - |
|  |  |  | 58/100 | 69/100 |  |  |
Results of experiment 2: prioritization of prostate cancer genes by genomic data fusion. For each novel prostate cancer gene, the first row shows the error of AUC values and the second row lists the ranking position of the gene among its 99 closest neighboring genes.
Results of experiment 3: classification of patients for rectal cancer clinical decision support using microarray and proteomics data sets
|  | LSSVM | LSSVM | LSSVM | LSSVM | LSSVM | SVM | SVM | SVM | SVM | SVM |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 14 p | 15 p | 16 p | 17 p | 18 p | 14 p | 15 p | 16 p | 17 p | 18 p |
| 24 g | 0.0584 | 0.0519 | 0.0812 | 0.0812 | 0.1331 | 0.1331 | 0.1331 | 0.1331 | 0.1364 |  |
| 25 g | 0.0519 | 0.0617 | 0.0649 | 0.1136 | 0.1104 | 0.1234 | 0.1201 | 0.1234 |  |  |
| 26 g | 0.0487 | 0.0487 | 0.0812 | 0.0844 | 0.0877 | 0.1266 | 0.1136 | 0.1234 | 0.1299 | 0.1364 |
| 27 g | 0.0617 | 0.0649 | 0.0812 | 0.0877 | 0.0942 | 0.1429 | 0.1364 | 0.1364 | 0.1331 | 0.1461 |
| 28 g | 0.0552 | 0.0487 | 0.0617 | 0.0747 | 0.0714 | 0.1429 | 0.1331 | 0.1331 | 0.1364 | 0.1396 |

|  | LSSVM | LSSVM | LSSVM | LSSVM | LSSVM | SVM | SVM | SVM | SVM | SVM |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 14 p | 15 p | 16 p | 17 p | 18 p | 14 p | 15 p | 16 p | 17 p | 18 p |
| 24 g | 0.0584 | 0.0519 | 0.0812 | 0.0812 | 0.1266 | 0.1006 | 0.1266 | 0.1299 | 0.1331 |  |
| 25 g | 0.0519 | 0.0617 | 0.0649 | 0.1136 | 0.1071 | 0.1234 | 0.1201 | 0.1234 |  |  |
| 26 g | 0.0487 | 0.0487 | 0.0812 | 0.0844 | 0.0877 | 0.1136 | 0.1136 | 0.1201 | 0.1266 | 0.1331 |
| 27 g | 0.0617 | 0.0649 | 0.0812 | 0.0877 | 0.0942 | 0.1364 | 0.1364 | 0.1364 | 0.1331 | 0.1461 |
| 28 g | 0.0552 | 0.0487 | 0.0617 | 0.0747 | 0.0714 | 0.1299 | 0.1299 | 0.1299 | 0.1331 | 0.1364 |

|  | LSSVM | LSSVM | LSSVM | LSSVM | LSSVM | SVM | SVM | SVM | SVM | SVM |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 14 p | 15 p | 16 p | 17 p | 18 p | 14 p | 15 p | 16 p | 17 p | 18 p |
| 24 g | 0.0747 | 0.0747 | 0.0584 | 0.0714 | 0.0747 |  |  |  |  |  |
| 25 g | 0.0584 | 0.0519 | 0.0649 | 0.0714 | 0.0714 |  |  |  |  |  |
| 26 g | 0.0584 | 0.0519 | 0.0682 | 0.0682 | 0.0682 |  |  |  |  |  |
| 27 g | 0.0617 | 0.0584 | 0.0714 | 0.0682 | 0.0682 |  |  |  |  |  |
| 28 g | 0.0584 | 0.0584 | 0.0649 | 0.0649 | 0.0682 |  |  |  |  |  |

|  | LSSVM | LSSVM | LSSVM | LSSVM | LSSVM | SVM | SVM | SVM | SVM | SVM |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 14 p | 15 p | 16 p | 17 p | 18 p | 14 p | 15 p | 16 p | 17 p | 18 p |
| 24 g | 0.0909 | 0.0877 | 0.0974 | 0.0942 | 0.1006 |  |  |  |  |  |
| 25 g | 0.0747 | 0.0649 | 0.0812 | 0.0844 | 0.0844 |  |  |  |  |  |
| 26 g | 0.0747 | 0.0584 | 0.0812 | 0.0779 | 0.0779 |  |  |  |  |  |
| 27 g | 0.0779 | 0.0812 | 0.0844 | 0.0812 | 0.0812 |  |  |  |  |  |
| 28 g | 0.0812 | 0.0714 | 0.0812 | 0.0779 | 0.0812 |  |  |  |  |  |
The table shows the error of AUC in patient classification using microarray and proteomics data. In LSSVM L∞, L∞(0.5), and L2, the regularization parameter λ was estimated jointly as the kernel coefficient of an identity matrix; in LSSVM L1, λ was set to 1. In all SVM approaches, the C parameter of the box constraint was set to 1. The row and column labels give the numbers of genes (g) and proteins (p) used to construct the kernels; the genes and proteins were ranked by feature selection techniques (see text). The AUC of LOO validation was evaluated without the bias term b (the implicit-bias approach) because its value varied with each left-out sample; in this problem, including the bias term decreased the AUC performance. The performance was compared among eight algorithms for the same numbers of genes and proteins: the best values (the smallest error of AUC) are shown in bold, the second best in italic, and the best performance over all feature selection results is underlined. The table presents the 25 best feature selection results of each method; the complete results, covering 26 different numbers of genes and 26 numbers of proteins, are available at http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html.
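The LOO AUC used throughout this experiment can be computed by pooling the left-out decision values and scoring them once. A minimal scikit-learn sketch follows; `train_and_score` is a hypothetical stand-in for any of the MKL classifiers above (trained without the bias term, as described), and the tables report the error of AUC, i.e. one minus the returned value.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

def loo_auc(X, y, train_and_score):
    """Pool left-out decision values, then compute a single AUC.

    train_and_score(X_train, y_train, X_test) -> decision value(s) for
    X_test; it is a placeholder for an MKL classifier, not the paper's code.
    """
    scores = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        scores[test_idx] = train_and_score(X[train_idx], y[train_idx],
                                           X[test_idx])
    return roc_auc_score(y, scores)
```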
Figure 2. The effect of θ in LSSVM MKL and SVM MKL classifiers for rectal cancer diagnosis. Top: performance of LSSVM MKL. Bottom: performance of SVM MKL. Each panel compares three feature selection results; the performance of L2 MKL is shown as dashed lines.
Figure 3. Benchmark of various λ values in LSSVM MKL classifiers for rectal cancer diagnosis. The four kernels were constructed using 27 gene features and 17 protein features (see text). For each fixed λ value, the error of AUC was evaluated by LOO validation. The maximal and minimal λ values estimated in L∞ and L2 MKL are shown.
Results of experiment 4, data set I: classification of endometrial disease patients using multiple kernels derived from clinical data
| Classifier | Error of AUC (mean) | Error of AUC (std.) | p-value |
|---|---|---|---|
|  |  |  | - |
|  |  |  | 0.4369 |
|  |  |  | 0.2483 |
| LSSVM | 0.2456 | 0.0124 | 0.0363 |
| SVM | 0.2489 | 0.0178 | 0.0130 |
| SVM | 0.2513 | 0.0144 | 0.0057 |
| LSSVM | 0.2574 | 0.0189 | 9.98 · 10⁻⁵ |
| LSSVM | 0.2678 | 0.0130 | 1.53 · 10⁻⁶ |
Results of experiment 4, data set I: classification of endometrial disease patients using multiple kernels derived from clinical data. The classifier with the best performance is shown in bold. The p-values compare each classifier with the best performer using a paired t-test. Classifiers are sorted by p-value from high to low.
Results of experiment 4, data set II: classification of miscarriage patients using multiple kernels derived from clinical data
| Classifier | Error of AUC (mean) | Error of AUC (std.) | p-value |
|---|---|---|---|
|  |  |  | - |
|  |  |  | 0.0712 |
| LSSVM | 0.2027 | 0.0045 | 9.77 · 10⁻⁴ |
| SVM | 0.2109 | 0.0040 | 9.55 · 10⁻¹² |
| SVM | 0.2168 | 0.0040 | 1.79 · 10⁻¹² |
| LSSVM | 0.2132 | 0.0029 | 1.11 · 10⁻¹³ |
| SVM | 0.2297 | 0.0038 | 1.10 · 10⁻¹⁵ |
| LSSVM | 0.2319 | 0.0015 | 3.42 · 10⁻²¹ |
Results of experiment 4, data set II: classification of miscarriage patients using multiple kernels derived from clinical data. The classifier with the best performance is shown in bold. The p-values compare each classifier with the best performer using a paired t-test. Classifiers are sorted by p-value from high to low.
Results of experiment 4, data set III: classification of PUL patients using multiple kernels derived from clinical data
| Classifier | Error of AUC (mean) | Error of AUC (std.) | p-value |
|---|---|---|---|
|  |  |  | - |
|  |  |  | 0.0519 |
| Weighted LSSVM | 0.1290 | 0.0206 | 0.0169 |
| Weighted SVM | 0.1499 | 0.0248 | 4.79 · 10⁻⁵ |
| Weighted SVM | 0.1552 | 0.0210 | 1.02 · 10⁻⁶ |
| Weighted SVM | 0.1551 | 0.0153 | 3.87 · 10⁻⁶ |
| Weighted SVM | 0.1594 | 0.0162 | 2.29 · 10⁻⁹ |
| Weighted LSSVM | 0.1651 | 0.0174 | 4.41 · 10⁻¹⁰ |
Results of experiment 4, data set III: classification of PUL patients using multiple kernels derived from clinical data. The classifier with the best performance is shown in bold. The p-values compare each classifier with the best performer using a paired t-test. Classifiers are sorted by p-value from high to low.
Figure 4. The effect of θ in LSSVM MKL and SVM MKL classifiers on the endometrial disease data set. Left: performance of the regularized LSSVM L∞ MKL with various θ values. Right: performance of the regularized SVM L∞ MKL. The black dashed lines represent the performance of the L2 MKL classifiers. The error bars are standard deviations over 20 repetitions.
Figure 5. The effect of θ in LSSVM MKL and SVM MKL classifiers on the miscarriage data set. Left: performance of the regularized LSSVM L∞ MKL with various θ values. Right: performance of the regularized SVM L∞ MKL. The black dashed lines represent the performance of the L2 MKL classifiers. The error bars are standard deviations over 20 repetitions.
Figure 6. The effect of θ in LSSVM MKL and SVM MKL classifiers on the pregnancy data set. Left: performance of the regularized LSSVM L∞ MKL with various θ values. Right: performance of the regularized SVM L∞ MKL. The black dashed lines represent the performance of the L2 MKL classifiers. The error bars are standard deviations over 20 repetitions.
Comparison of the performance obtained by joint estimation of λ and standard cross-validation in LSSVM MKL
| Data Set | Norm | Validation Approach | Estimation Approach |
|---|---|---|---|
| endometrial disease | L∞ | 0.2625 ± 0.0146 | 0.2678 ± 0.0130 |
|  | L2 | 0.2584 ± 0.0188 | 0.2456 ± 0.0124 |
| miscarriage | L∞ | 0.1873 ± 0.0100 | 0.2319 ± 0.0015 |
|  | L2 | 0.1912 ± 0.0089 | 0.2002 ± 0.0049 |
| pregnancy | L∞ | 0.1321 ± 0.0243 | 0.1651 ± 0.0173 |
|  | L2 | 0.1299 ± 0.0172 | 0.1165 ± 0.0100 |
Comparison of the performance obtained by joint estimation of λ and by standard cross-validation using LSSVM MKL. As shown, the estimation approach based on L2 MKL performs better than that based on L∞ MKL. When the kernel coefficients are sparse, the estimated regularization parameters λ are either very large or very small, which is usually not optimal for LSSVM. In contrast, the λ values estimated by the L2 method are on a normal scale and often close to the optimal values.
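A sketch of the joint-estimation trick described above: append the identity matrix to the kernel set, let the MKL solver assign it a coefficient, and read λ off that coefficient. The inverse relationship assumed below follows from writing the LSSVM system with Ω + I/λ; the paper's exact scaling convention may differ.

```python
import numpy as np

def augment_with_identity(kernels):
    """Append an identity 'kernel' so that lambda is learned jointly."""
    n = kernels[0].shape[0]
    return kernels + [np.eye(n)]

def recover_lambda(theta):
    """If the learned combination is sum_j theta_j K_j + theta_0 * I and
    the LSSVM system uses Omega + I/lambda, then lambda ~ 1/theta_0
    (an assumption about the scaling convention; see the lead-in text).
    theta[-1] is the identity matrix's coefficient."""
    return 1.0 / theta[-1]
```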
Convexity and complexity of all methods
| Method | Convexity | Complexity |
|---|---|---|
| 1-SVM SOCP | convex | |
| 1-SVM QCQP | convex | |
| SVM SOCP | convex | |
| SVM QCQP | convex | |
| SVM SIP | convex | |
| SVM SIP | relaxation | |
| LSSVM SOCP | convex | |
| LSSVM QCQP | convex | |
| LSSVM SIP | convex | |
| LSSVM SIP | relaxation |
Convexity and complexity of all methods. Here n is the number of samples, p is the number of kernels, k is the number of classes, and τ is the number of iterations in SIP. The complexity of LSSVM SIP depends on the algorithm used to solve the linear system; for the conjugate gradient method, the complexity is between O(n^1.5) and O(n^2) [22].
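For context on the "linear" LSSVM formulations and the conjugate-gradient remark: with a fixed kernel combination, training an LSSVM classifier amounts to solving one linear system in Suykens' standard form. A minimal NumPy sketch (not the paper's code):

```python
import numpy as np

def lssvm_train(K, y, lam):
    """Solve the LSSVM dual linear system

        [ 0      y^T         ] [ b ]   [ 0 ]
        [ y   Y K Y + I/lam  ] [ a ] = [ 1 ]

    where Y = diag(y); K may be a single kernel matrix or a fixed
    combination sum_j theta_j K_j."""
    n = len(y)
    omega = np.outer(y, y) * K + np.eye(n) / lam
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]  # alpha, b

def lssvm_decision(K_test, y_train, alpha, b):
    """Decision values for test samples; K_test holds k(x_test, x_train)."""
    return K_test @ (alpha * y_train) + b
```

For large n, the direct solve can be replaced by an iterative method such as conjugate gradients (after the usual symmetric transformation of the bordered system), which is where the O(n^1.5) to O(n^2) range quoted above comes from.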
Figure 7. Comparison of QP and SIP formulations on large-scale data. Top left: comparison of SOCP and QCQP formulations for solving 1-SVM MKL using two kernels. To simulate the ranking problem in 1-SVM, 3000 digit samples were retrieved as training data and two RBF kernels were constructed, one per data source; the computational time was evaluated on combining the two 3000 × 3000 kernel matrices. Top right: comparison of SVM and LSSVM MKL on problems with an increasing number of samples. The benchmark data set was made up of two linear kernels and labels in 10 digit classes; the number of data points was increased from 1000 to 3000. Bottom left: comparison of SVM and LSSVM MKL on problems with an increasing number of kernels. The benchmark data set consisted of 2000 samples labeled in 2 classes; RBF kernel matrices with different kernel widths were constructed, increasing the number of kernel matrices from 2 to 200. The QCQP formulations had memory issues when the number of kernels exceeded 60. Bottom right: comparison of SVM and LSSVM on problems with an increasing number of classes. The benchmark data was made up of two linear kernel matrices and 2000 samples; the samples were equally and randomly divided into an increasing number of classes, from 2 to 20.