Meijia Zhang, Wenwen Sun, Jie Tian, Xiyuan Zheng, Shaopeng Guan.
Abstract
Internet traffic classification is fundamental to network monitoring, service quality and security. In this paper, we propose an Internet traffic classification method based on the Echo State Network (ESN). To enhance identification performance, we improve the Salp Swarm Algorithm (SSA) to optimize the ESN. First, Tent mapping with reverse learning, a polynomial operator and a dynamic mutation strategy are introduced to improve the SSA, which enhances its optimization performance. Then, the advanced SSA is utilized to optimize the hyperparameters of the ESN, including the size of the reservoir, sparse degree, spectral radius and input scale. Finally, the optimized ESN is adopted to classify Internet traffic. The simulation results show that the proposed ESN-based method performs much better than other traditional machine learning algorithms in terms of per-class metrics and overall accuracy. © 2022 Zhang et al.
Keywords: Classification; Echo state network; Hyperparameter optimization; Internet traffic; Salp swarm algorithm
Year: 2022 PMID: 35494824 PMCID: PMC9044245 DOI: 10.7717/peerj-cs.860
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
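To make the ESN side of the method concrete, below is a minimal Python/NumPy sketch (not the authors' code) of an echo state network classifier. It assumes a standard ESN in which the reservoir is fixed and random and only the linear readout is trained by ridge regression, and it exposes the four hyperparameters the improved SSA tunes: reservoir size, sparse degree, spectral radius and input scale. The `steps` washout count and the `ridge` regularizer are illustrative assumptions.

```python
import numpy as np

class ESN:
    """Minimal echo state network sketch: the reservoir is fixed and random,
    and only the linear readout is trained, which is why ESN training is fast."""

    def __init__(self, n_in, n_out, reservoir_size=100, sparsity=0.9,
                 spectral_radius=0.8, input_scale=1.0, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        # Input weights, scaled by the input-scale hyperparameter.
        self.W_in = input_scale * rng.uniform(-1, 1, (reservoir_size, n_in))
        # Random reservoir, sparsified and rescaled to the target spectral radius.
        W = rng.uniform(-1, 1, (reservoir_size, reservoir_size))
        W[rng.random(W.shape) < sparsity] = 0.0
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W, self.ridge, self.n_out, self.W_out = W, ridge, n_out, None

    def _states(self, X, steps=5):
        # For static per-flow feature vectors, run the reservoir a few steps per
        # sample from a zero state and use the final state as the representation.
        states = np.zeros((X.shape[0], self.W.shape[0]))
        for t, u in enumerate(X):
            x = np.zeros(self.W.shape[0])
            for _ in range(steps):
                x = np.tanh(self.W_in @ u + self.W @ x)
            states[t] = x
        return states

    def fit(self, X, y):
        # Train only the readout with ridge regression against one-hot targets.
        S = self._states(X)
        Y = np.eye(self.n_out)[y]
        self.W_out = np.linalg.solve(S.T @ S + self.ridge * np.eye(S.shape[1]), S.T @ Y)
        return self

    def predict(self, X):
        return np.argmax(self._states(X) @ self.W_out, axis=1)
```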
Summary of related work.
| Work | Simple description | Comments |
|---|---|---|
| | Classify the network traffic of a specific port into the corresponding network application | Affected by dynamic ports |
| | Identify the application by analyzing the protocol signature in the payload | Large computational overhead; may cause privacy disputes |
| | Utilize a bidirectional GRU to extract the forward and backward features of byte sequences in a session, then employ an attention mechanism to weight the features according to their contributions | Long training time and high computational cost |
| | Cut the original data traffic and input it into a CNN to classify network traffic | |
| | Adopt a CNN to extract high-dimensional features of the network traffic, then extract representative features from them based on an AE | |
| | Utilize a CNN, LSTM and SAE to extract the spatial, temporal and coding features of the original traffic, and combine these features for a comprehensive understanding of the original traffic | |
| | Use header information and payload data to train a CNN and an SAE, respectively | |
| | Divide a large classification task into smaller ones using a tree structure | |
| | Design a joint deep learning model as a basic classifier, then adopt an attention mechanism to aggregate the basic predictions generated in the first step | |
| This paper | Utilize the advanced SSA to optimize the hyperparameters of the ESN, then adopt the optimized ESN to classify Internet traffic | Simplifies the training process; easy to implement and fast to train |
Figure 1. ESN structure.
Figure 2. The testing results with the Sphere function.
Figure 3. The testing results with the Griewank function.
Figure 4. The flowchart of ESN-based network traffic classification.
Moore dataset statistics.
| Class | Total flows | Sampled flows | Proportion of samples |
|---|---|---|---|
| WWW | 241,186 | 86,906 | 86.906% |
| Mail | 21,000 | 7,567 | 7.567% |
| Ftp-data | 4,261 | 1,536 | 1.536% |
| Ftp-pasv | 1,976 | 712 | 0.712% |
| Ftp-control | 2,245 | 809 | 0.809% |
| Services | 1,543 | 556 | 0.556% |
| Database | 1,947 | 701 | 0.701% |
| P2P | 1,539 | 555 | 0.555% |
| Attack | 1,318 | 475 | 0.475% |
| Multimedia | 423 | 153 | 0.153% |
| Interactive | 82 | 28 | 0.028% |
| Games | 6 | 2 | 0.002% |
| Total | 277,526 | 100,000 | 100% |
NISM dataset statistics.
| Class | Total flows | Sampled flows | Proportion of samples |
|---|---|---|---|
| DNS | 32,691 | 5,325 | 5.325% |
| Ftp | 1,486 | 242 | 0.242% |
| Http | 10,236 | 1,668 | 1.668% |
| Telnet | 1,076 | 175 | 0.175% |
| Lime | 555,738 | 90,533 | 90.533% |
| Local forwarding | 2,199 | 358 | 0.358% |
| Remote forwarding | 2,083 | 339 | 0.339% |
| Scp | 2,102 | 342 | 0.342% |
| Sftp | 2,074 | 338 | 0.338% |
| Shell | 2,142 | 349 | 0.349% |
| X11 | 2,025 | 330 | 0.330% |
| Total | 613,851 | 100,000 | 100% |
Hyperparameters of ESN.
| Dataset | Hyperparameter | Value |
|---|---|---|
| Moore dataset | Size of the reservoir | 98 |
| | Spectral radius | 0.78 |
| | Sparse degree | 0.89 |
| | Input scale | 0.95 |
| NISM dataset | Size of the reservoir | 77 |
| | Spectral radius | 0.83 |
| | Sparse degree | 0.78 |
| | Input scale | 0.90 |
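For illustration only, the Moore-dataset values above would plug into the illustrative ESN sketch given earlier as follows; the 248 input features and 12 output classes assume the full set of Moore flow discriminators and the 12 Moore traffic classes, and the constructor is the hypothetical one from that sketch, not the authors' implementation.

```python
# Hypothetical instantiation with the optimized Moore-dataset hyperparameters.
esn_moore = ESN(n_in=248, n_out=12,
                reservoir_size=98, sparsity=0.89,
                spectral_radius=0.78, input_scale=0.95)
```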
Figure 5. The changing curves of fitness values on the Moore dataset.
Figure 6. The changing curves of fitness values on the NISM dataset.
The parameter values of the comparison algorithms.
| Dataset | Algorithm | Parameter settings |
|---|---|---|
| Moore dataset | SVM | RBF kernel, |
| | SAE | Hidden layers = 2 (50 nodes per layer) |
| | CNN | Hidden layers = 2 (50 nodes per layer) |
| | GRU | Hidden layers = 2 (50 nodes per layer) |
| | DBN | Hidden layers = 2 (50 nodes per layer) |
| NISM dataset | SVM | RBF kernel, |
| | SAE | Hidden layers = 2 (40 nodes per layer) |
| | CNN | Hidden layers = 2 (40 nodes per layer) |
| | GRU | Hidden layers = 2 (40 nodes per layer) |
| | DBN | Hidden layers = 2 (40 nodes per layer) |
The class accuracy of different ML algorithms on the Moore dataset.
| Algorithm | WWW | Mail | Ftp-data | Ftp-pasv | Ftp-control | Services | Database | P2P | Attack | Multimedia | Interactive | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.90 | 0.90 | 0.93 | 0.82 | 0.73 | 0.88 | 0.62 | 0.80 | 0.82 | 0.84 | 0.00 | 0.00 |
| SAE | 0.93 | 0.97 | 0.97 | 1.00 | 0.83 | 0.93 | 0.82 | 0.79 | 0.93 | 0.86 | 0.83 | 0.00 |
| CNN | 0.94 | 0.97 | 0.98 | 0.95 | 0.95 | 0.96 | 0.94 | 0.81 | 0.94 | 0.89 | 0.47 | 1.00 |
| GRU | 0.97 | 0.97 | 0.97 | 0.97 | 0.91 | 0.99 | 0.95 | 0.76 | 0.99 | 0.88 | 0.56 | 0.00 |
| DBN | 0.94 | 0.92 | 0.95 | 0.96 | 0.80 | 0.97 | 0.76 | 0.79 | 0.92 | 0.86 | 1.00 | 0.00 |
| ESN | 0.92 | 0.95 | 0.95 | 0.90 | 0.91 | 0.93 | 0.92 | 0.80 | 0.92 | 0.86 | 0.83 | 0.00 |
| SSA-ESN | 0.97 | 0.97 | 0.98 | 0.95 | 0.93 | 0.97 | 0.94 | 0.89 | 0.95 | 0.89 | 0.90 | 1.00 |
| Ours | 0.99 | 0.98 | 1.00 | 0.95 | 1.00 | 1.00 | 0.99 | 0.94 | 0.99 | 0.91 | 1.00 | 1.00 |
The class accuracy of different ML algorithms on the NISM dataset.
| Algorithm | DNS | Ftp | Http | Telnet | Lime | Local forwarding | Remote forwarding | Scp | Sftp | Shell | X11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.90 | 0.94 | 0.98 | 0.97 | 0.76 | 1.00 | 1.00 | 0.66 | 0.68 | 0.91 | 0.92 |
| SAE | 0.90 | 0.94 | 0.99 | 0.93 | 0.81 | 1.00 | 1.00 | 0.70 | 0.76 | 0.93 | 1.00 |
| CNN | 0.95 | 1.00 | 1.00 | 1.00 | 0.81 | 1.00 | 1.00 | 0.99 | 0.95 | 0.97 | 1.00 |
| GRU | 0.85 | 0.98 | 0.99 | 0.96 | 1.00 | 0.99 | 1.00 | 0.98 | 0.94 | 0.97 | 0.99 |
| DBN | 0.91 | 0.94 | 0.99 | 0.99 | 0.82 | 1.00 | 1.00 | 0.72 | 0.81 | 0.94 | 0.99 |
| ESN | 0.89 | 0.94 | 0.99 | 0.94 | 0.89 | 1.00 | 1.00 | 0.85 | 0.80 | 0.94 | 0.99 |
| SSA-ESN | 0.97 | 1.00 | 0.99 | 1.00 | 0.98 | 1.00 | 1.00 | 0.98 | 0.98 | 0.98 | 1.00 |
| Ours | 0.99 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 |
The class recall rate of different ML algorithms on the Moore dataset.
| Algorithm | WWW | Mail | Ftp-data | Ftp-pasv | Ftp-control | Services | Database | P2P | Attack | Multimedia | Interactive | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.97 | 0.88 | 0.92 | 0.82 | 0.75 | 0.94 | 0.89 | 0.37 | 0.81 | 0.82 | 0.00 | 0.00 |
| SAE | 0.98 | 0.92 | 1.00 | 0.96 | 0.90 | 0.94 | 0.98 | 0.71 | 0.84 | 0.86 | 0.50 | 0.00 |
| CNN | 0.99 | 0.96 | 1.00 | 0.93 | 0.86 | 0.95 | 0.95 | 0.83 | 0.88 | 0.83 | 0.70 | 0.50 |
| GRU | 0.97 | 0.98 | 1.00 | 0.97 | 0.89 | 0.94 | 0.97 | 0.94 | 0.83 | 0.87 | 0.50 | 0.00 |
| DBN | 0.98 | 0.92 | 0.99 | 0.93 | 0.87 | 0.93 | 0.95 | 0.70 | 0.81 | 0.81 | 0.20 | 0.00 |
| ESN | 0.97 | 0.92 | 0.92 | 0.93 | 0.85 | 0.94 | 0.95 | 0.89 | 0.83 | 0.81 | 0.50 | 0.00 |
| SSA-ESN | 1.00 | 0.98 | 1.00 | 0.99 | 0.95 | 0.99 | 0.98 | 0.94 | 0.94 | 0.94 | 0.80 | 0.50 |
| Ours | 1.00 | 0.99 | 1.00 | 0.99 | 0.96 | 0.99 | 0.98 | 0.95 | 0.95 | 0.95 | 0.90 | 0.50 |
The class recall rate of different ML algorithms on the NISM dataset.
| Algorithm | DNS | Ftp | Http | Telnet | Lime | Local forwarding | Remote forwarding | Scp | Sftp | Shell | X11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.72 | 1.00 | 0.96 | 0.93 | 0.90 | 0.97 | 0.98 | 0.67 | 0.67 | 0.92 | 0.97 |
| SAE | 0.79 | 1.00 | 0.99 | 0.93 | 0.91 | 0.99 | 0.98 | 0.76 | 0.68 | 0.97 | 0.98 |
| CNN | 0.78 | 1.00 | 0.99 | 1.00 | 0.96 | 0.99 | 0.98 | 0.97 | 0.99 | 1.00 | 0.99 |
| GRU | 1.00 | 1.00 | 0.99 | 0.98 | 0.81 | 0.99 | 0.98 | 0.94 | 0.97 | 0.99 | 0.99 |
| DBN | 0.80 | 1.00 | 1.00 | 0.93 | 0.91 | 0.96 | 0.99 | 0.81 | 0.71 | 1.00 | 0.98 |
| ESN | 0.80 | 1.00 | 1.00 | 0.93 | 0.91 | 0.97 | 0.98 | 0.87 | 0.85 | 0.95 | 0.98 |
| SSA-ESN | 0.95 | 1.00 | 1.00 | 1.00 | 0.96 | 0.99 | 0.99 | 0.97 | 0.97 | 0.99 | 0.98 |
| Ours | 0.99 | 1.00 | 1.00 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
The class F-measure of different ML algorithms on the Moore dataset.
| Algorithm | WWW | Mail | Ftp-data | Ftp-pasv | Ftp-control | Services | Database | P2P | Attack | Multimedia | Interactive | Games |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.93 | 0.89 | 0.92 | 0.82 | 0.74 | 0.94 | 0.73 | 0.50 | 0.81 | 0.83 | 0.00 | 0.00 |
| SAE | 0.95 | 0.94 | 0.99 | 0.98 | 0.86 | 0.94 | 0.89 | 0.75 | 0.88 | 0.86 | 0.62 | 0.00 |
| CNN | 0.97 | 0.97 | 0.99 | 0.94 | 0.90 | 0.95 | 0.94 | 0.82 | 0.91 | 0.86 | 0.56 | 0.67 |
| GRU | 0.97 | 0.97 | 0.99 | 0.97 | 0.90 | 0.97 | 0.96 | 0.84 | 0.90 | 0.87 | 0.53 | 0.00 |
| DBN | 0.96 | 0.92 | 0.97 | 0.95 | 0.83 | 0.95 | 0.84 | 0.74 | 0.86 | 0.84 | 0.33 | 0.00 |
| ESN | 0.96 | 0.92 | 0.95 | 0.90 | 0.88 | 0.94 | 0.94 | 0.80 | 0.88 | 0.86 | 0.70 | 0.00 |
| SSA-ESN | 0.99 | 0.97 | 0.99 | 0.97 | 0.97 | 0.99 | 0.99 | 0.94 | 0.95 | 0.93 | 0.90 | 0.67 |
| Ours | 1.00 | 0.98 | 1.00 | 0.97 | 0.98 | 0.99 | 0.99 | 0.95 | 0.97 | 0.93 | 0.95 | 0.67 |
The class F-measure of different ML algorithms on the NISM dataset.
| Algorithm | DNS | Ftp | Http | Telnet | Lime | Local forwarding | Remote forwarding | Scp | Sftp | Shell | X11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.80 | 0.97 | 0.97 | 0.95 | 0.83 | 0.99 | 0.99 | 0.67 | 0.68 | 0.91 | 0.95 |
| SAE | 0.84 | 0.97 | 0.99 | 0.93 | 0.86 | 0.99 | 0.99 | 0.73 | 0.72 | 0.95 | 0.99 |
| CNN | 0.86 | 1.00 | 1.00 | 1.00 | 0.88 | 0.99 | 0.99 | 0.98 | 0.97 | 0.98 | 0.99 |
| GRU | 0.92 | 0.99 | 0.99 | 0.97 | 0.90 | 0.99 | 0.99 | 0.96 | 0.96 | 0.98 | 0.99 |
| DBN | 0.85 | 0.97 | 0.99 | 0.96 | 0.86 | 0.98 | 0.99 | 0.76 | 0.76 | 0.97 | 0.99 |
| ESN | 0.80 | 0.97 | 0.97 | 0.92 | 0.88 | 0.99 | 0.99 | 0.89 | 0.85 | 0.95 | 0.99 |
| SSA-ESN | 0.98 | 0.99 | 1.00 | 0.97 | 0.98 | 0.99 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 |
| Ours | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
Figure 7. The overall accuracy of different machine learning algorithms on the Moore and NISM datasets.
The training time and testing time of each algorithm.
| Algorithm | Moore training time (s) | Moore testing time (s) | NISM training time (s) | NISM testing time (s) |
|---|---|---|---|---|
| SVM | 35.613 | 0.2031 | 89.7042 | 0.9505 |
| SAE | 60.437 | 1.3562 | 113.489 | 2.369 |
| CNN | 38.1249 | 0.2504 | 56.3493 | 1.3795 |
| GRU | 31.8204 | 0.4143 | 52.2412 | 0.9124 |
| DBN | 33.8685 | 0.4361 | 52.1904 | 0.9579 |
| ESN | 9.892 | 0.1998 | 19.593 | 0.8237 |
| SSA-ESN | 250.078 | 0.4036 | 372.6851 | 0.9601 |
| Ours | 233.441 | 0.3983 | 345.7114 | 0.9582 |
Algorithm 1: ASSA.
Input: population size
Output: the optimal salp position in the population
1: Adopt Tent mapping with reverse learning to initialize the salp population.
2: for each iteration do
3:     Calculate the fitness value of each salp in the population.
4:     Sort the salps in the population according to fitness value.
5:     Choose the food: the salp position with the best fitness is taken as the food position.
6:     Choose leaders and followers from the remaining salps.
7:     Update the leader positions according to the leader update formula.
8:     Update the follower positions according to the follower update formula.
9: end for
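As a hedged illustration of the skeleton above, the following Python sketch implements the standard SSA loop (the usual leader/follower updates). The paper's ASSA additions, Tent-map initialization with reverse learning, the polynomial operator and the dynamic mutation strategy, would replace the plain random initialization and modify the update steps; they are not reproduced here. The half-and-half leader/follower split and the bound clipping are assumptions.

```python
import numpy as np

def ssa(fitness, lb, ub, pop_size=30, max_iter=100, seed=0):
    """Standard salp swarm optimization loop mirroring Algorithm 1's skeleton."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    pop = lb + rng.random((pop_size, dim)) * (ub - lb)   # step 1 (plain random init)
    food, food_fit = None, np.inf
    for t in range(1, max_iter + 1):                      # step 2
        fits = np.array([fitness(x) for x in pop])        # step 3
        order = np.argsort(fits)                          # step 4
        pop, fits = pop[order], fits[order]
        if fits[0] < food_fit:                            # step 5: best salp -> food
            food, food_fit = pop[0].copy(), fits[0]
        c1 = 2.0 * np.exp(-(4.0 * t / max_iter) ** 2)
        n_leaders = pop_size // 2                         # step 6 (assumed split)
        for i in range(pop_size):
            if i < n_leaders:                             # step 7: leader update
                c2, c3 = rng.random(dim), rng.random(dim)
                step = c1 * ((ub - lb) * c2 + lb)
                pop[i] = np.where(c3 < 0.5, food + step, food - step)
            else:                                         # step 8: follower update
                pop[i] = (pop[i] + pop[i - 1]) / 2.0
        pop = np.clip(pop, lb, ub)
    return food, food_fit                                 # step 9 / output
```

In this paper's setting, `fitness` would train an ESN with a candidate (reservoir size, sparse degree, spectral radius, input scale) vector and return its classification error on a validation split; the search bounds are not given in this record and would have to be chosen.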