Abstract
One of the most insidious methods of bypassing security mechanisms in modern information systems is the domain generation algorithm (DGA), which is used to disguise malware by periodically switching the domain name assigned to a command-and-control (C&C) server. Combating advanced techniques such as DGAs is an ongoing challenge: security organizations often need to collaborate, and possibly share private data, to train better and more up-to-date machine learning models. This raises serious concerns about data integrity, trade-related issues, and the strict privacy protocols that must be adhered to. To address these concerns regarding the privacy and security of private data, we propose in this work a privacy-preserving variational-autoencoder approach to DGA detection, combined with case studies from the education industry and distance learning, chosen because the recent pandemic has caused an explosive increase in remote learning. Using the secure multi-party computation (SMPC) methodology, the system successfully applies machine learning techniques, specifically a Siamese variational-autoencoder algorithm, to encrypted data and metadata. The proposed method, presented for the first time in the literature, facilitates learning specialized extraction functions for useful intermediate representations in complex deep learning architectures, producing improved training stability, high generalization performance, and remarkable classification accuracy.
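The abstract's privacy claim rests on SMPC, under which parties compute jointly on data none of them can see in the clear. A common building block for this is additive secret sharing over a finite field; the minimal sketch below is illustrative only (the function names, the modulus, and the three-party setup are assumptions for the example, not the paper's actual protocol), but it shows why linear operations such as additions inside a model can be carried out on shares without ever reconstructing the inputs.

```python
import random

PRIME = 2**61 - 1  # field modulus, chosen here only for illustration

def share(secret: int, n_parties: int) -> list[int]:
    """Split an integer into n additive shares that sum to the secret mod PRIME.
    Any n-1 shares alone are uniformly random and reveal nothing."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine all shares to recover the secret."""
    return sum(shares) % PRIME

def add_shared(a_shares: list[int], b_shares: list[int]) -> list[int]:
    """Secure addition: each party adds its two local shares.
    No communication is needed and no party ever sees a or b."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]

a, b = 1234, 5678
sa, sb = share(a, 3), share(b, 3)
assert reconstruct(add_shared(sa, sb)) == (a + b) % PRIME
```

Multiplications on shares require extra interaction (e.g. precomputed multiplication triples), which is where most of the cost of running a deep model under SMPC comes from.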
Year: 2022 PMID: 35371250 PMCID: PMC8970956 DOI: 10.1155/2022/7384803
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Siamese variational-autoencoder architecture.
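The architecture in Figure 1 combines two standard ingredients: the VAE reparameterization trick, which keeps latent sampling differentiable, and a Siamese objective that compares the two twins' latent codes. The sketch below shows these components as they are commonly formulated; the function names and the contrastive loss with a fixed margin are assumptions for illustration, not the authors' exact implementation.

```python
import math
import random

def reparameterize(mu: list[float], log_var: list[float]) -> list[float]:
    """VAE reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1).
    Sampling stays differentiable with respect to mu and log_var."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_divergence(mu: list[float], log_var: list[float]) -> float:
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior,
    the regularization term of the VAE objective."""
    return -0.5 * sum(1.0 + lv - m**2 - math.exp(lv)
                      for m, lv in zip(mu, log_var))

def contrastive_loss(z1: list[float], z2: list[float],
                     same_class: bool, margin: float = 1.0) -> float:
    """Siamese objective on the twins' latent codes: pull same-class pairs
    together, push different-class pairs at least `margin` apart."""
    d = math.sqrt(sum((a - b)**2 for a, b in zip(z1, z2)))
    return d**2 if same_class else max(0.0, margin - d)**2
```

In a Siamese VAE, a training pair (e.g. a legitimate domain and a DGA domain) is encoded by the shared encoder, each code is sampled via `reparameterize`, and the total loss combines reconstruction, `kl_divergence`, and `contrastive_loss` over the pair.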
Figure 2. DGA domains, unique client IPs, and DNS queries, from https://data.netlab.360.com/dga/.
Training and test datasets.
| Training dataset 1 | Training dataset 2 | Test dataset |
|---|---|---|
| Nonwordlist-based DGA | Wordlist-based DGA | MixTest (nonwordlist- and wordlist-based DGA) |
| 200,000 legit | 250,000 legit | 800,000 legit |
| 200,000 DGA | 250,000 DGA | 700,000 DGA |
Results with various training datasets.
| Training dataset | Accuracy | Recall | Precision | F-score |
|---|---|---|---|---|
| Nonwordlist-based DGA | 0.8949 | 0.8883 | 0.8904 | 0.8903 |
| Wordlist-based DGA | 0.9072 | 0.9038 | 0.9056 | 0.9058 |
| MixTrain (nonwordlist- and wordlist-based DGA) | 0.9260 | 0.9263 | 0.9259 | 0.9261 |