Zhenlong Sun1,2, Jing Yang1, Xiaoye Li2.
Abstract
The support vector machine (SVM) is an efficient classification method in machine learning. A conventionally trained SVM model can seriously threaten personal privacy when the training dataset contains sensitive information. Principal component analysis (PCA) projects instances into a low-dimensional subspace while capturing as much of the variance of the matrix A as possible. PCA is commonly computed with one of two algorithms: eigenvalue decomposition (EVD) or singular value decomposition (SVD). The main advantage of SVD over EVD is that it does not need to compute the covariance matrix. This study presents a new differentially private SVD algorithm (DPSVD) to prevent privacy leakage from SVM classifiers. DPSVD generates a set of private singular vectors; the instances projected into the resulting singular subspace can be used directly to train an SVM without disclosing private information about the original instances. After proving that DPSVD satisfies differential privacy, we carried out several experiments. The results confirm that our method achieved higher accuracy and better stability on several real datasets than other existing private PCA algorithms used to train SVMs.
Year: 2022 PMID: 35378802 PMCID: PMC8976603 DOI: 10.1155/2022/2935975
Source DB: PubMed Journal: Comput Intell Neurosci
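The abstract's point that SVD does not need the covariance matrix can be illustrated with a small NumPy sketch. This is not the authors' code: the function names `pca_via_evd` and `pca_via_svd`, the synthetic data, and the choice of `k = 2` are all assumptions made for illustration. Both routes should recover the same principal directions (up to sign) and the same explained variances.

```python
import numpy as np

def pca_via_evd(X, k):
    """Top-k principal directions from the eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                      # center each feature
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)          # d x d covariance matrix, formed explicitly
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    return eigvals[order], eigvecs[:, order]

def pca_via_svd(X, k):
    """Top-k principal directions from the SVD of the centered data; no covariance matrix."""
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values in descending order
    variances = s[:k] ** 2 / (Xc.shape[0] - 1)         # squared singular values -> variances
    return variances, Vt[:k].T

# Synthetic data (an assumption): 200 instances, 5 features with well-separated scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

evd_vals, evd_vecs = pca_via_evd(X, k=2)
svd_vals, svd_vecs = pca_via_svd(X, k=2)

# Eigenvectors are only defined up to sign, so compare absolute cosines per direction.
agreement = np.abs(np.sum(evd_vecs * svd_vecs, axis=0))

# Project the centered instances into the 2-dimensional subspace.
projected = (X - X.mean(axis=0)) @ svd_vecs
```

The SVD route works on the n x d data matrix directly, which is the property the abstract credits to SVD; the projected instances are what a downstream classifier would be trained on.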
Symbols.
| Symbol | Description |
|---|---|
| | The adjacent matrix of datasets |
| | Matrix of covariance |
| | Train instance |
| | Label |
| Α | Dual vector |
| | Symmetric matrix for kernel function |
| | Kernel function |
| | Vector composed entirely of ones |
| | Upper limit of |
| | Eigenvalue |
| | Eigenvector |
| Γ | The accumulative contribution rate of principal components |
| | The singular vectors or eigenvectors matrix |
| | The singular values or eigenvalues diagonal matrix |
| | Singular value |
| | Unit diagonal matrix |
| | A randomized mechanism |
| | All subsets of possible outcomes of mechanism |
| | Privacy budget |
| | Privacy parameter |
| | The |
| | Laplace noise (mean: 0; scale: |
| | Gaussian noise with (mean: 0; deviation: |
The comparison between the three algorithms.
| Algorithm | PCA | Adding mode | Noise form | Noise scale | Mechanism | Privacy level |
|---|---|---|---|---|---|---|
| DPSVD | SVD | | Asymmetric | | Gaussian | ( |
| AG | EVD | | Symmetric | | Gaussian | ( |
| DPPCA-SVM | EVD | | Symmetric | | Laplace | ( |
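The table pairs symmetric noise with EVD and asymmetric noise with SVD. The sketch below shows one plausible reading of that pairing; it is not the authors' algorithm. The helper names, the synthetic matrix, the perturbed matrix `M`, and the noise level `sigma = 0.1` are all assumptions, since the adding mode and noise scale are not recoverable from the table. A symmetric noise matrix keeps a symmetric input symmetric, so `numpy.linalg.eigh` still applies; a matrix of independent Gaussian entries breaks that symmetry, but its SVD is always defined.

```python
import numpy as np

def symmetric_gaussian_noise(d, sigma, rng):
    """d x d noise matrix: sample the upper triangle, then mirror it below the diagonal."""
    upper = np.triu(rng.normal(0.0, sigma, size=(d, d)))  # upper triangle incl. diagonal
    return upper + np.triu(upper, k=1).T                  # copy strict upper part to lower

def asymmetric_gaussian_noise(shape, sigma, rng):
    """Noise matrix with independent Gaussian entries (no symmetry imposed)."""
    return rng.normal(0.0, sigma, size=shape)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # synthetic data (an assumption)
M = X.T @ X                     # a symmetric d x d matrix to perturb (an assumption)
sigma = 0.1                     # placeholder noise level, NOT calibrated to any (epsilon, delta)

# Symmetric noise: the perturbed matrix stays symmetric, so EVD via eigh is valid.
S = symmetric_gaussian_noise(4, sigma, rng)
eigvals, eigvecs = np.linalg.eigh(M + S)

# Asymmetric noise: the perturbed matrix is no longer symmetric; SVD still applies.
N = asymmetric_gaussian_noise((4, 4), sigma, rng)
U, singular_values, Vt = np.linalg.svd(M + N)
```

A real differentially private mechanism would calibrate `sigma` to the sensitivity of the perturbed quantity and to the privacy parameters; that calibration is deliberately left out here.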
Test datasets.
| Indices | Datasets | Instances | Features | Ranges |
|---|---|---|---|---|
| 1 | A1a | 1605 | 119 | [0, 1] |
| 2 | Mushrooms | 8124 | 112 | [-1, 1] |
| 3 | Musk | 6598 | 166 | [-1, 1] |
| 4 | Splice | 1000 | 60 | [-1, 1] |
Performance comparison of algorithms on different datasets.
| Dataset | ε | Algorithm | Accuracy mean | Accuracy std | Accuracy max | Accuracy min | SV mean | SV std | SV max | SV min |
|---|---|---|---|---|---|---|---|---|---|---|
| A1a | -- | SVM | 83.49 | -- | -- | -- | 754 | -- | -- | -- |
| A1a | 0.1 | DPSVD | | 0.15 | 83.61 | 83.24 | | 6 | 771 | 756 |
| A1a | 0.1 | AG | 83.08 | 0.24 | 83.30 | 82.68 | 743 | 4 | 747 | 738 |
| A1a | 0.1 | DPPCA-SVM | 82.58 | 0.58 | 83.30 | 81.93 | 771 | 12 | 783 | 754 |
| A1a | 0.5 | DPSVD | | 0.22 | 83.86 | 83.30 | | 9 | 768 | 743 |
| A1a | 0.5 | AG | 83.09 | 0.16 | 83.18 | 82.80 | 686 | 3 | 689 | 682 |
| A1a | 0.5 | DPPCA-SVM | 83.07 | 0.50 | 83.49 | 82.24 | 764 | 14 | 778 | 747 |
| A1a | 1 | DPSVD | | 0.13 | 83.74 | 83.43 | | 5 | 759 | 748 |
| A1a | 1 | AG | 83.45 | 0.24 | 83.80 | 83.18 | 697 | 4 | 702 | 693 |
| A1a | 1 | DPPCA-SVM | 83.46 | 0.41 | 84.11 | 82.99 | 756 | 14 | 778 | 742 |
| Mushrooms | -- | SVM | 99.90 | -- | -- | -- | 617 | -- | -- | -- |
| Mushrooms | 0.1 | DPSVD | | 0.01 | 99.90 | 99.88 | | 39 | 700 | 607 |
| Mushrooms | 0.1 | AG | 99.18 | 0.02 | 99.21 | 99.15 | 518 | 3 | 521 | 514 |
| Mushrooms | 0.1 | DPPCA-SVM | 99.49 | 0.42 | 99.89 | 99.02 | 683 | 67 | 747 | 604 |
| Mushrooms | 0.5 | DPSVD | | 0.01 | 99.90 | 99.89 | | 25 | 674 | 607 |
| Mushrooms | 0.5 | AG | 99.19 | 0.05 | 99.26 | 99.14 | 524 | 5 | 531 | 517 |
| Mushrooms | 0.5 | DPPCA-SVM | 99.59 | 0.38 | 99.90 | 99.06 | 763 | 50 | 811 | 687 |
| Mushrooms | 1 | DPSVD | | 0.00 | 99.90 | 99.90 | | 26 | 625 | 559 |
| Mushrooms | 1 | AG | 99.79 | 0.04 | 99.83 | 99.73 | 445 | 22 | 469 | 417 |
| Mushrooms | 1 | DPPCA-SVM | 99.83 | 0.08 | 99.98 | 99.78 | 651 | 81 | 779 | 559 |
| Musk | -- | SVM | 93.95 | -- | -- | -- | 1351 | -- | -- | -- |
| Musk | 0.1 | DPSVD | | 0.11 | 94.23 | 93.95 | | 15 | 1369 | 1330 |
| Musk | 0.1 | AG | 88.96 | 0.08 | 89.09 | 88.89 | 1865 | 10 | 1876 | 1855 |
| Musk | 0.1 | DPPCA-SVM | 93.97 | 0.20 | 94.23 | 93.74 | 1379 | 8 | 1391 | 1369 |
| Musk | 0.5 | DPSVD | | 0.11 | 94.27 | 94.00 | | 14 | 1379 | 1341 |
| Musk | 0.5 | AG | 88.94 | 0.01 | 88.95 | 88.92 | 1866 | 6 | 1874 | 1858 |
| Musk | 0.5 | DPPCA-SVM | 94.10 | 0.15 | 94.35 | 93.97 | 1336 | 15 | 1355 | 1315 |
| Musk | 1 | DPSVD | 94.14 | 0.10 | 94.29 | 94.04 | | 10 | 1358 | 1333 |
| Musk | 1 | AG | 88.93 | 0.02 | 88.95 | 88.91 | 1872 | 10 | 1887 | 1860 |
| Musk | 1 | DPPCA-SVM | | 0.17 | 94.35 | 93.92 | 1318 | 40 | 1384 | 1275 |
| Splice | -- | SVM | 94.30 | -- | -- | -- | 607 | -- | -- | -- |
| Splice | 0.1 | DPSVD | | 0.75 | 92.40 | 90.60 | 635 | 17 | 662 | 616 |
| Splice | 0.1 | AG | 90.56 | 0.83 | 91.30 | 89.30 | | 16 | 605 | 568 |
| Splice | 0.1 | DPPCA-SVM | 87.14 | 0.32 | 87.40 | 86.70 | 643 | 16 | 660 | 619 |
| Splice | 0.5 | DPSVD | 92.00 | 0.80 | 92.80 | 90.70 | | 10 | 625 | 600 |
| Splice | 0.5 | AG | | 0.38 | 94.00 | 93.00 | 588 | 5 | 595 | 582 |
| Splice | 0.5 | DPPCA-SVM | 87.22 | 0.73 | 88.40 | 86.60 | 659 | 34 | 706 | 615 |
| Splice | 1 | DPSVD | 92.36 | 0.56 | 93.10 | 91.80 | | 19 | 645 | 594 |
| Splice | 1 | AG | | 0.31 | 93.90 | 93.10 | 594 | 3 | 599 | 591 |
| Splice | 1 | DPPCA-SVM | 87.36 | 0.94 | 88.30 | 86.00 | 641 | 21 | 667 | 614 |
Figure 1. Accuracy at various ε on dataset A1a.
Figure 2. Accuracy at various ε on dataset Mushrooms.
Figure 3. Accuracy at various ε on dataset Musk.
Figure 4. Accuracy at various ε on dataset Splice.
Figure 5. SV at various ε on dataset A1a.
Figure 6. SV at various ε on dataset Mushrooms.
Figure 7. SV at various ε on dataset Musk.
Figure 8. SV at various ε on dataset Splice.