XiangDong Huang, Hao Li, Jiajia Liu, FengChun Liu, Jian Wang, BaoShan Xie, BaoPing Chen, Qi Zhang, Tao Xue.
Abstract
With the rapid development of the Internet, malicious domain names pose increasingly serious threats to fields such as network security and social security, and malicious domain detection has been studied extensively. This article proposes a malicious domain name detection model based on improved deep learning. The model combines the advantages of three network models, the convolutional neural network (CNN), the temporal convolutional network (TCN), and the long short-term memory network (LSTM), to detect malicious domain names more effectively than any single model or pair of models. Experiments show that the proposed model outperforms both the combined CNN and LSTM model and the combined CNN and TCN model, with accuracy and recall reaching 99.76% and 98.81%, respectively.
Year: 2022 PMID: 35795747 PMCID: PMC9252679 DOI: 10.1155/2022/9241670
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Basic structure of detection model.
Figure 2. Basic principles of character encoding.
Figure 3. LSTM combined with an attention mechanism.
Figure 4. Basic structure diagram of TCN (dilated convolution).
Figure 5. Basic CNN network model.
Figure 6. CNN + TCN network model.
Figure 7. CNN + LSTM with an attention mechanism.
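Figure 2 refers to character encoding of domain names before they are fed to the networks. The following is a minimal sketch of one common approach, assuming each character is mapped to a positive integer index and shorter sequences are left-padded with zeros to a fixed length. The alphabet, the `MAX_LEN` value, and the function name `encode_domain` are illustrative assumptions and are not taken from the paper.

```python
import string
from typing import List

# Assumed alphabet (hypothetical choice): lowercase letters, digits, hyphen, dot.
ALPHABET = string.ascii_lowercase + string.digits + "-."
# Index 0 is reserved for padding, so real characters start at 1.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 16  # assumed fixed input length


def encode_domain(domain: str, max_len: int = MAX_LEN) -> List[int]:
    """Map a domain name to a fixed-length list of integer indices."""
    # Characters outside the assumed alphabet are skipped in this sketch.
    indices = [CHAR_TO_INDEX[ch] for ch in domain.lower() if ch in CHAR_TO_INDEX]
    # Keep only the rightmost characters when the name is too long.
    indices = indices[-max_len:]
    # Left-pad with zeros so every output has the same length.
    return [0] * (max_len - len(indices)) + indices


encoded = encode_domain("abc.com")
print(encoded)
```

Reserving index 0 for padding lets a downstream embedding layer treat padded positions differently from real characters, which is why the character indices start at 1.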
54 DGA domain name datasets.
| Label | DGA | Number |
|---|---|---|
| 1 | Abcbot | 27 |
| 2 | Antavmu | 16 |
| 3 | Bamital | 104 |
| 4 | Banjori | 483084 |
| 5 | Bigviktor | 1000 |
| 6 | Blackhole | 2 |
| 7 | Ccleaner | 1 |
| 8 | Chinad | 1000 |
| 9 | Conficker | 498 |
| 10 | Copperstealer | 11 |
| 11 | Cryptolocker | 1000 |
| 12 | Dircrypt | 766 |
| 13 | Dyre | 1000 |
| 14 | Emotet | 496892 |
| 15 | Enviserv | 492 |
| 16 | Feodo | 263 |
| 17 | Flubot | 30000 |
| 18 | Fobber | 597 |
| 19 | Gameover | 12000 |
| 20 | Gspy | 100 |
| 21 | Kfos | 124 |
| 22 | Locky | 1158 |
| 23 | Madmax | 1 |
| 24 | Matsnu | 905 |
| 25 | Mirai | 1 |
| 26 | Monerominer | 0 |
| 27 | Murifet | 8560 |
| 28 | Mydoom | 10049 |
| 29 | Necro | 2708 |
| 30 | Necurs | 8188 |
| 31 | Ngioweb | 5275 |
| 32 | Nymaim | 480 |
| 33 | Omexo | 38 |
| 34 | Padcrypt | 168 |
| 35 | Proslikefan | 100 |
| 36 | Pykspa | 45670 |
| 37 | Qadars | 2000 |
| 38 | Ramnit | 20065 |
| 39 | Ranbyus | 11160 |
| 40 | Rovnix | 179993 |
| 41 | Shifu | 2545 |
| 42 | Shiotob | 8004 |
| 43 | Simda | 30289 |
| 44 | Suppobox | 2269 |
| 45 | Symmi | 4256 |
| 46 | Tempedreve | 193 |
| 47 | Tinba | 100653 |
| 48 | Tinynuke | 32 |
| 49 | Tofsee | 20 |
| 50 | Tordwm | 510 |
| 51 | Vawtrak | 842 |
| 52 | Vidro | 100 |
| 53 | Virut | 9748 |
| 54 | Xshellghost | 1 |
DGA domain name dataset selection of the experimental model.
| Label | DGA | Number | Input dataset size (training set + validation set) | Test set |
|---|---|---|---|---|
| 1 | Banjori | 483084 | 10000 | 20000 |
| 2 | Emotet | 496892 | 10000 | 20000 |
| 3 | Flubot | 30000 | 10000 | 20000 |
| 4 | Gameover | 12000 | 10000 | 2000 |
| 5 | Mydoom | 10049 | 10000 | 49 |
| 6 | Pykspa | 45670 | 10000 | 20000 |
| 7 | Ramnit | 20065 | 10000 | 10065 |
| 8 | Ranbyus | 11160 | 10000 | 1160 |
| 9 | Rovnix | 179993 | 10000 | 20000 |
| 10 | Simda | 30289 | 10000 | 20000 |
| 11 | Tinba | 100653 | 10000 | 20000 |
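The sizes in this table appear to follow a simple rule. The 11 selected families are exactly those with more than 10000 samples in the 54-family table, each contributes 10000 samples as input, and the test set is the remainder capped at 20000. The sketch below reproduces that arithmetic for a few families. The rule itself is inferred from the numbers rather than stated in the source, and the names `split_sizes`, `TRAIN_VAL_SIZE`, and `TEST_CAP` are hypothetical.

```python
from typing import Dict, Tuple

TRAIN_VAL_SIZE = 10000  # per-family training + validation samples (from the table)
TEST_CAP = 20000        # largest per-family test set that appears in the table

# Sample counts for a few families, copied from the 54-family table.
family_counts: Dict[str, int] = {
    "Banjori": 483084,
    "Gameover": 12000,
    "Mydoom": 10049,
    "Ramnit": 20065,
    "Virut": 9748,    # at most 10000 samples, so not selected
    "Murifet": 8560,  # at most 10000 samples, so not selected
}


def split_sizes(total: int) -> Tuple[int, int]:
    """Return (training + validation size, test size) for one family."""
    # Assumed rule: the test set is what remains after the input samples,
    # capped at TEST_CAP.
    return TRAIN_VAL_SIZE, min(total - TRAIN_VAL_SIZE, TEST_CAP)


# Assumed selection criterion: keep families with more than 10000 samples.
selected = {
    name: split_sizes(count)
    for name, count in family_counts.items()
    if count > TRAIN_VAL_SIZE
}

for name, (train_val, test) in selected.items():
    print(f"{name}: train+val={train_val}, test={test}")
```

Under this rule, small families such as Mydoom end up with very small test sets (49 samples), so per-family results for them are likely to be noisier than for the larger families.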
Comparison table of various performance indicators of each model.
| Model | Precision (%) | Recall (%) | Accuracy (%) | FPR (%) | F1-score |
|---|---|---|---|---|---|
| CNN | 88.94 | 94.99 | 82.35 | 15.74 | 0.8822 |
| CNN + TCN | 94.16 | 94.99 | 91.50 | 6.44 | 0.9321 |
| CNN + LSTM_Attention | 92.61 | 86.92 | 92.61 | 3.77 | 0.8967 |
| CNN + LSTM_Attention + TCN | 99.12 | 99.76 | 98.81 | 1.52 | 0.8935 |
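For reference, the five indicators in this table are conventionally derived from the counts of a binary confusion matrix. The sketch below uses the standard textbook definitions. The source does not state exactly how its values were computed, so this may differ from the authors' procedure. The counts are made up for illustration and the function name `classification_metrics` is hypothetical. Note that the table reports F1-score as a fraction while the other columns are percentages.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary-classification metrics as fractions in [0, 1]."""
    precision = tp / (tp + fp)                  # flagged domains that are malicious
    recall = tp / (tp + fn)                     # malicious domains that were flagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # all correct decisions
    fpr = fp / (fp + tn)                        # benign domains wrongly flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "fpr": fpr,
        "f1": f1,
    }


# Hypothetical counts for illustration only, not taken from the paper.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which gives a quick sanity check when reading a results table.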
Figure 8. Average values of different parameter indexes of different models (repeated five times).
Figure 9. Model checking effect comparison chart.
Figure 10. ROC curve of model.
Figure 11. PR curve of model.
Figure 12. Comparison chart of operation time of different network models.