Itay Mosafi, Eli Omid David, Yaniv Altshuler, Nathan S Netanyahu.
Abstract
As state-of-the-art deep neural networks are deployed at the core of an increasing number of AI-based products and services, the incentive for "copying them" (i.e., their intellectual property, manifested through the knowledge encapsulated in them), whether by adversaries or by commercial competitors, is expected to increase considerably over time. The most efficient way to extract or steal knowledge from such networks is to query them using a large dataset of random samples and record their outputs, and then train a student network to mimic these outputs, without making any assumption about the original networks. The most effective way to protect against such a mimicking attack is to answer queries with the classification result only, omitting the confidence values associated with the softmax layer. In this paper, we present a novel method for generating composite images for attacking a mentor neural network using a student model. Our method assumes no information regarding the mentor's training dataset, architecture, or weights. Furthermore, assuming no information regarding the mentor's softmax output values, our method successfully mimics the given neural network and is capable of stealing large portions (and sometimes all) of its encapsulated knowledge. Our student model achieved 99% accuracy relative to the protected mentor model on the CIFAR-10 test set. In addition, we demonstrate that our student network (which copies the mentor) is impervious to watermarking protection methods and thus would evade being detected as a stolen model by existing dedicated techniques. Our results imply that all current neural networks are vulnerable to mimicking attacks, even if they divulge nothing but the most basic required output, and that the student model that mimics them cannot be easily detected using currently available techniques.
Keywords: adversarial AI; artificial intelligence; communication; cybersecurity; deep learning; entropy; information theory; models; neural networks; swarm intelligence
Year: 2022 PMID: 35327860 PMCID: PMC8947501 DOI: 10.3390/e24030349
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Illustration of images created using our composite data-generation method. The images and their relative mixture are random. Using this method, during each epoch we create an entirely new dataset of random data that the model has not seen before.
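The caption does not spell out how the mixing is done, so the following is only a minimal sketch of one plausible reading: two randomly chosen source images are blended with a random weight, and a fresh set of composites is drawn for every epoch. The convex-combination blend, the function names `make_composite` and `composite_epoch`, and the tiny random "image" pool are all assumptions made for illustration. Images are kept as flat lists of floats so that the sketch needs only the standard library.

```python
import random

def make_composite(img_a, img_b, rng):
    """Blend two equally sized images (flat lists of floats in [0, 1]).

    Assumption: the "relative mixture" is a convex combination with a
    random weight; the source does not specify the exact blending rule.
    """
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    alpha = rng.random()  # random mixing weight in [0, 1)
    blended = [alpha * a + (1.0 - alpha) * b for a, b in zip(img_a, img_b)]
    return blended, alpha

def composite_epoch(pool, n_samples, rng):
    """Build a fresh composite dataset for one epoch from an unlabeled pool."""
    dataset = []
    for _ in range(n_samples):
        img_a, img_b = rng.sample(pool, 2)  # two distinct random source images
        composite, _ = make_composite(img_a, img_b, rng)
        dataset.append(composite)
    return dataset

rng = random.Random(0)
# Hypothetical pool: 8 tiny "images", each with 3 * 4 * 4 random pixel values.
pool = [[rng.random() for _ in range(3 * 4 * 4)] for _ in range(8)]

# Each call draws new pairs and new weights, so every epoch sees new data.
epoch_1 = composite_epoch(pool, n_samples=5, rng=rng)
epoch_2 = composite_epoch(pool, n_samples=5, rng=rng)
```

In an attack setting, each composite would then be sent to the mentor and the returned label stored as the training target for the student.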
The architecture used in the composite training experiment for the student model. This architecture is a modification of the VGG-16 architecture [47], which has proven to be very successful and robust. By performing only small modifications over the input and output layers, we can adapt this architecture for a student model intended to mimic a different mentor model.
| Modified VGG-16 Model Architecture for Student Network |
|---|
| 3 × 3 Convolution 64 |
| 3 × 3 Convolution 64 |
| Max pooling |
| 3 × 3 Convolution 128 |
| 3 × 3 Convolution 128 |
| Max pooling |
| 3 × 3 Convolution 256 |
| 3 × 3 Convolution 256 |
| 3 × 3 Convolution 256 |
| Max pooling |
| 3 × 3 Convolution 512 |
| 3 × 3 Convolution 512 |
| 3 × 3 Convolution 512 |
| Max pooling |
| 3 × 3 Convolution 512 |
| 3 × 3 Convolution 512 |
| 3 × 3 Convolution 512 |
| Max pooling |
| Dense 512 |
| Dense 512 |
| Softmax 10 |
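As a sanity check on the table, the sketch below traces the spatial size and counts the trainable parameters of the listed layers. It rests on several assumptions that the table does not state: a 32 × 32 × 3 input (CIFAR-10 sized), "same" padding in every convolution, 2 × 2 max pooling with stride 2, a bias term in every layer, and no batch normalization. The `LAYERS` encoding and the `trace` helper are hypothetical names introduced here.

```python
# Layer list transcribed from the table:
# ("conv", out_channels), ("pool",), ("dense", units)
LAYERS = (
    [("conv", 64)] * 2 + [("pool",)]
    + [("conv", 128)] * 2 + [("pool",)]
    + [("conv", 256)] * 3 + [("pool",)]
    + [("conv", 512)] * 3 + [("pool",)]
    + [("conv", 512)] * 3 + [("pool",)]
    + [("dense", 512), ("dense", 512), ("dense", 10)]
)

def trace(layers, side=32, channels=3):
    """Return (final spatial side, total trainable parameters).

    Assumes 3x3 'same' convolutions with bias, 2x2 stride-2 pooling,
    and fully connected layers with bias applied to the flattened map.
    """
    params = 0
    features = None  # becomes the flattened size at the first dense layer
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            out = layer[1]
            params += 3 * 3 * channels * out + out  # weights + biases
            channels = out
        elif kind == "pool":
            side //= 2  # each pooling halves height and width
        else:  # dense
            if features is None:
                features = side * side * channels
            units = layer[1]
            params += features * units + units
            features = units
    return side, params

final_side, total_params = trace(LAYERS)
print(f"final feature map: {final_side}x{final_side}, parameters: {total_params:,}")
```

Under these assumptions the five pooling stages reduce a 32 × 32 input to a 1 × 1 map, which is why the dense layers can stay as small as 512 units.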
Parameters used for training in the composite experiment.
| Parameters | Values |
|---|---|
| Learning rate | 0.001 |
| Activation function | ReLU |
| Batch size | 128 |
| Dropout rate | - |
| | - |
| SGD momentum | 0.9 |
| Data augmentation | - |
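To make the learning-rate and momentum settings concrete, here is a minimal sketch of a classical SGD-with-momentum update using the values from the table (learning rate 0.001, momentum 0.9). The helper name `sgd_momentum_step` and the toy one-weight problem with a constant gradient are assumptions for illustration; deep learning frameworks may phrase the update slightly differently.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.001, momentum=0.9):
    """One classical-momentum SGD update; returns new (weights, velocity)."""
    # The velocity accumulates a decaying sum of past gradient steps.
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Toy example: a single weight with a constant (hypothetical) gradient of 2.0.
w, v = [1.0], [0.0]
for _ in range(2):
    w, v = sgd_momentum_step(w, grads=[2.0], velocity=v)
```

With a constant gradient the second step is larger than the first, because the velocity carries over 90% of the previous step.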
Parameters used for the training process using standard (non-composite) mimicking.
| Parameters | Values |
|---|---|
| Learning rate | 0.001 |
| Activation function | ReLU |
| Batch size | 128 |
| Dropout rate | 0.2 |
| | 0.0005 |
| SGD momentum | 0.9 |
| Data augmentation | Used |
Figure 2. Generated images and their corresponding expected softmax distribution, which reveals the model’s certainty level for each example. In practice, the manner in which objects overlap and the degree of their overlap largely affect the certainty level.
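The difference between what an unprotected and a protected mentor reveals can be sketched as follows. The unprotected response is the full softmax distribution; the protected response keeps only the predicted class, encoded here as a one-hot vector. The function names and the example logits are hypothetical, and the one-hot encoding is an assumption about how a label-only answer would be turned into a training target.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def unprotected_response(logits):
    """Full confidence vector: reveals the mentor's certainty per class."""
    return softmax(logits)

def protected_response(logits):
    """Classification result only: a one-hot vector for the top class."""
    probs = softmax(logits)
    top = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == top else 0.0 for i in range(len(probs))]

# Hypothetical logits for an image in which two classes overlap.
logits = [2.0, 1.5, 0.1]
soft = unprotected_response(logits)  # graded certainty across classes
hard = protected_response(logits)    # certainty information removed
```

The soft response shows that the second class is nearly as likely as the first, whereas the hard response hides this, which is the information a student must recover by other means when the mentor is protected.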
Figure 3. Student test accuracies for the composite and soft-label experiments, with the student trained over 100 epochs. The student trained using the composite method is superior during almost the entire training process. The two experiments were selected for visual comparison as they reached the highest success rates on the test set.
Summary of the experiments. The table provides the CIFAR-10 test accuracy of three student models in absolute terms and in comparison to the 90.48% test accuracy achieved by the mentor itself. The three mimicking methods use standard mimicking for unprotected and protected mentors, as well as composite mimicking for a protected mentor, which provides the best results.
| Method | Mentor Status | Test Accuracy | Relative Accuracy |
|---|---|---|---|
| Standard | Unprotected | 89.10% | 98.47% |
| Standard | Protected | 87.46% | 96.66% |
| Composite | Protected | 89.59% | 99.01% |
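The relative-accuracy column appears to be the student's test accuracy divided by the mentor's 90.48% test accuracy. The short check below recomputes it from the rounded values in the table; the helper name `relative_accuracy` and the dictionary layout are assumptions made for illustration. Because the inputs are already rounded to two decimals, the recomputed figures can differ from the table in the last digit.

```python
MENTOR_ACC = 90.48  # mentor test accuracy on CIFAR-10, in percent

# (method, mentor status) -> (student test accuracy, relative accuracy in table)
RESULTS = {
    ("Standard", "Unprotected"): (89.10, 98.47),
    ("Standard", "Protected"): (87.46, 96.66),
    ("Composite", "Protected"): (89.59, 99.01),
}

def relative_accuracy(student_acc, mentor_acc=MENTOR_ACC):
    """Student accuracy as a percentage of the mentor's accuracy."""
    return 100.0 * student_acc / mentor_acc

for (method, status), (acc, reported) in RESULTS.items():
    recomputed = relative_accuracy(acc)
    print(f"{method:9s} {status:11s} recomputed={recomputed:.2f}% reported={reported:.2f}%")
```

All three recomputed values agree with the reported column to within about 0.01 percentage points.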