Wenbo Liu1,2, Shengnan Liang3,4, Xiwen Qin5.
Abstract
The kernel function in an SVM enables linear separation in a feature space for a large number of linearly inseparable data. The choice of kernel function directly affects the classification performance of the SVM. To improve the applicability and classification performance of SVM in different areas, we propose a weighted p-norm distance t kernel SVM classification algorithm based on improved polarization. A t-class kernel function is constructed from the probability density function of the t distribution, and its theoretical proof is presented. To find a suitable mapping space, the t-class kernel function is extended to a p-norm distance kernel. The training samples are obtained by stratified sampling, and the affinity matrix is redefined. An improved local kernel polarization criterion is established to obtain the optimal kernel weights and kernel parameters, so that different kernel functions are combined with weights. A cumulative optimal performance rate is constructed to evaluate the overall classification performance of the different kernel SVM algorithms, and the significant effects of different p-norms on the classification performance of SVM are verified by statistical comparison tests based on 10 repetitions of fivefold cross-validation. Results on 6 real datasets show that, in most cases, the proposed weighted p-norm distance t kernel improves the classification performance of SVM compared with traditional kernel functions.
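To make the construction described in the abstract more concrete, here is a minimal Python sketch of a t-type kernel evaluated on a p-norm distance, together with a weighted combination of kernels. The exact kernel formula is not given in this text, so the functional form in `t_kernel`, the parameter names `nu` and `sigma`, and the helper names are all assumptions chosen for illustration; they mirror the shape of the Student t density and are not necessarily the authors' definition. For p other than 2 this sketch makes no claim that the resulting Gram matrix is positive semidefinite.

```python
import math


def p_norm_distance(x, y, p=2.0):
    """Minkowski (p-norm) distance between two equal-length vectors, p >= 1."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def t_kernel(x, y, p=2.0, nu=1.0, sigma=1.0):
    """ASSUMED t-type kernel: (1 + (d_p / sigma)^2 / nu) ** (-(nu + 1) / 2).

    nu (degrees of freedom) and sigma (scale) are hypothetical parameter names.
    """
    d = p_norm_distance(x, y, p)
    return (1.0 + (d / sigma) ** 2 / nu) ** (-(nu + 1.0) / 2.0)


def weighted_kernel(kernels, weights):
    """Return k(x, y) = sum_i w_i * k_i(x, y) for non-negative weights w_i."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    def combined(x, y):
        return sum(w * k(x, y) for w, k in zip(weights, kernels))

    return combined


def gram_matrix(kernel, samples):
    """Kernel matrix K[i][j] = kernel(samples[i], samples[j])."""
    return [[kernel(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    # Which base kernels the paper combines is not stated here; two t-type
    # kernels with different p are used purely as an example.
    k = weighted_kernel(
        [lambda x, y: t_kernel(x, y, p=1.0), lambda x, y: t_kernel(x, y, p=2.0)],
        [0.5, 0.5],
    )
    for row in gram_matrix(k, [[0.0, 0.0], [3.0, 4.0]]):
        print(row)
```

A Gram matrix built this way could be passed to any SVM implementation that accepts a precomputed kernel.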
Year: 2022 PMID: 35418133 PMCID: PMC9008017 DOI: 10.1038/s41598-022-09766-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. p-norm distance t-kernel value under different scale parameters.
Data information.
| Dataset name | Sample size | Feature | Categories | Data source |
|---|---|---|---|---|
| Kidney | 400 | 24 | 2 | UCI |
| Dermatology | 366 | 34 | 6 | UCI |
| Sonar | 208 | 60 | 2 | UCI |
| Pima | 768 | 8 | 2 | UCI |
| Postcode | 7291 | 256 | 10 | [ |
| Breast | 98 | 1213 | 3 | BIGDAC [ |
UCI: http://archive.ics.uci.edu/ml/index.php.
BIGDAC: http://portals.broadinstitute.org/cgi-bin/cancer/datasets.cgi.
The optimized result of the weight coefficients and kernel parameters.
| Dataset | Kernel parameter 1 | Kernel parameter 2 | | |
|---|---|---|---|---|
| Kidney | 0.78 | 0.22 | | |
| Dermatology | 0.23 | 0.77 | | |
| Sonar | 0.91 | 0.29 | | |
| Pima | 0.78 | 0.22 | | |
| Postcode | 0.86 | 0.14 | | |
| Breast | 0.65 | 0.35 | | |
The fivefold cross-validation classification accuracy based on the SVM algorithm with different kernel functions.
| Dataset | Poly + SVM | Sig + SVM | Gau + SVM | Lap + SVM | SMKL + SVM | W |
|---|---|---|---|---|---|---|
| Kidney | 0.9575 | 0.9850 | | | | |
| Dermatology | 0.9344 | 0.9672 | 0.9645 | 0.9699 | 0.9672 | |
| Sonar | 0.7978 | 0.8413 | 0.8170 | 0.8364 | 0.8459 | |
| Pima | 0.7448 | 0.6771 | 0.7643 | 0.7735 | 0.7696 | |
| Postcode | 0.9243 | 0.8432 | 0.9230 | 0.9243 | | |
| Breast | 0.8684 | 0.7653 | 0.8384 | 0.6947 | 0.8684 | |
Significant values are in bold.
The fivefold cross-validation classification recall based on the SVM algorithm with different kernel functions.
| Dataset | Poly + SVM | Sig + SVM | Gau + SVM | Lap + SVM | SMKL + SVM | W |
|---|---|---|---|---|---|---|
| Kidney | 0.9494 | 0.9875 | | | | |
| Dermatology | 0.9453 | 0.9727 | 0.9704 | 0.9749 | 0.9727 | |
| Sonar | 0.8077 | 0.8678 | 0.8237 | 0.8398 | 0.8636 | |
| Pima | 0.7491 | 0.6820 | 0.7761 | 0.7883 | 0.7774 | |
| Postcode | 0.9211 | 0.8693 | 0.9358 | 0.9370 | 0.9370 | |
| Breast | 0.8070 | 0.6377 | 0.7820 | 0.5789 | 0.8070 | |
Significant values are in bold.
The fivefold cross-validation classification Kappa coefficient based on the SVM algorithm with different kernel functions.
| Dataset | Poly + SVM | Sig + SVM | Gau + SVM | Lap + SVM | SMKL + SVM | W |
|---|---|---|---|---|---|---|
| Kidney | 0.9113 | 0.9679 | | | | |
| Dermatology | 0.9174 | 0.9586 | 0.9550 | 0.9619 | 0.9585 | |
| Sonar | 0.5922 | 0.6778 | 0.6300 | 0.6692 | 0.6861 | |
| Pima | 0.4154 | 0.2894 | 0.4573 | 0.4743 | 0.4741 | |
| Postcode | 0.9148 | 0.8232 | 0.9130 | 0.9146 | 0.9161 | |
| Breast | 0.7683 | 0.5935 | 0.7106 | 0.3879 | 0.7694 | |
Significant values are in bold.
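The three metrics tabulated above (accuracy, recall, and the Kappa coefficient) can all be derived from a confusion matrix. The following is a minimal sketch, not the authors' evaluation code. The function names are illustrative, and macro-averaging the recall over classes is an assumption, since this text does not say how recall is averaged for the multi-class datasets. The Kappa coefficient follows the standard Cohen definition, (p_o − p_e) / (1 − p_e).

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true class, columns index the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (observed agreement p_o)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def macro_recall(matrix):
    """Unweighted mean of per-class recall (ASSUMED averaging scheme)."""
    recalls = [
        matrix[i][i] / sum(matrix[i])
        for i in range(len(matrix))
        if sum(matrix[i]) > 0  # skip classes absent from y_true
    ]
    return sum(recalls) / len(recalls)


def cohen_kappa(matrix):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = accuracy(matrix)
    row_sums = [sum(matrix[i]) for i in range(n)]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance from the marginal class frequencies.
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    if p_e == 1.0:
        return float("nan")  # undefined when chance agreement is total
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    m = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], labels=[0, 1])
    print(accuracy(m), macro_recall(m), cohen_kappa(m))
```

Because Kappa subtracts chance agreement, it can be much lower than accuracy on imbalanced data, which is consistent with the gap between the accuracy and Kappa tables for a dataset such as Pima.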
The fivefold cross-validation classification training time based on the SVM algorithm with different kernel functions (minutes).
| Dataset | Poly + SVM | Sig + SVM | Gau + SVM | Lap + SVM | SMKL + SVM | W |
|---|---|---|---|---|---|---|
| Kidney | 0.05 | 0.12 | 0.87 | 0.05 | 0.64 | 0.78 |
| Dermatology | 0.07 | 0.18 | 2.64 | 0.133 | 0.54 | 0.65 |
| Sonar | 0.04 | 0.18 | 0.60 | 0.03 | 0.21 | 0.29 |
| Pima | 4.20 | 0.29 | 2.01 | 0.08 | 2.53 | 2.71 |
| Postcode | 1.37 | 3.73 | 35.25 | 1.71 | 8.21 | 17.53 |
| Breast | 0.31 | 0.36 | 0.70 | 0.33 | 0.32 | 0.39 |
Figure 2. The fivefold cross-validation accuracy varies with p-norm distance based on 6 datasets.
Figure 3. The fivefold cross-validation recall varies with p-norm distance based on 6 datasets.
Figure 4. The fivefold cross-validation Kappa coefficient varies with p-norm distance based on 6 datasets.
The p-norm distance setting in different datasets.
| Dataset | |
|---|---|
| Kidney | |
| Dermatology | |
| Sonar | |
| Pima | |
| Postcode | |
| Breast | |
The statistical comparison test of the weighted t kernel SVM classification performance at the 2 level p-norm.
| Dataset | Accuracy test | | | Recall test | | | Kappa coefficient test | | |
|---|---|---|---|---|---|---|---|---|---|
| | | | Sig | | | Sig | | | Sig |
| Kidney | 6.488 | 0.0232 | | 7.489 | 0.0279 | | 4.421 | 0.0359 | |
| Dermatology | −1.214 | −0.0009 | No | −0.9019 | −0.0005 | No | −1.68 | −0.0030 | No |
| Sonar | 3.074 | 0.0057 | | 2.775 | 0.0054 | | 2.874 | 0.0104 | |
| Pima | −5.116 | −0.0060 | | −3.283 | −0.0055 | | −5.254 | −0.0118 | |
| Postcode | 1.937 | 0.0025 | No | 1.157 | 0.0015 | No | 1.934 | 0.028 | No |
| Breast | 1.5 | 0.0020 | No | 1.5 | 0.0017 | No | 1.496 | 0.0035 | No |
Sig: significance.
Significant values are in bold.
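The abstract describes statistical comparison tests over 10 repetitions of fivefold cross-validation, but this text does not name the test. As a hedged illustration only, the sketch below computes a plain paired t statistic and the mean difference from two lists of per-fold scores (for example, 50 scores per method). The function name and inputs are hypothetical. Note that folds from repeated cross-validation overlap in their training data, so the independence assumption behind the plain paired t-test does not strictly hold; corrected variants exist and the authors may have used one.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, mean_difference) for paired per-fold scores.

    t = mean(d) / (stdev(d) / sqrt(n)), with d_i = a_i - b_i and the
    sample standard deviation (n - 1 in the denominator).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired and of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired scores")
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (sd / math.sqrt(n)), mean_diff


if __name__ == "__main__":
    # Made-up scores for illustration; not taken from the paper.
    method_a = [0.90, 0.80, 0.70]
    method_b = [0.80, 0.60, 0.70]
    t, mean_diff = paired_t_statistic(method_a, method_b)
    print(f"t = {t:.3f}, mean difference = {mean_diff:.4f}")
```

The resulting statistic would be compared against a t distribution with n − 1 degrees of freedom at the chosen significance level.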