Ke Yuan, Daoming Yu, Jingkai Feng, Longwei Yang, Chunfu Jia, Yiwang Huang.
Abstract
Cryptographic algorithm identification, which refers to analyzing and identifying the encryption algorithm used in a cryptographic system, is of great significance to cryptanalysis. To improve identification accuracy, this article proposes a new ensemble learning-based model named hybrid k-nearest neighbor and random forest (HKNNRF) and uses it to construct a block cipher algorithm identification scheme. In the ciphertext-only scenario, we use NIST randomness test methods to extract ciphertext features and carry out binary-classification and five-classification experiments on block cipher algorithms using the proposed scheme. Experiments show that, when the ciphertext size and other experimental conditions are the same, the HKNNRF model achieves higher classification accuracy than the baselines. Specifically, the average binary-classification identification accuracy of HKNNRF is 69.5%, which is 13, 12.5, and 10 percentage points higher than that of the single-layer support vector machine (SVM), k-nearest neighbor (KNN), and random forest (RF), respectively. The five-classification identification accuracy can reach 34%, compared with 21% for KNN, 22% for RF, and 23% for SVM under the same experimental conditions.
Keywords: Cryptographic algorithm identification; K-nearest neighbor algorithm; Machine learning; Random forest algorithm; Randomness test
Year: 2022 PMID: 36262148 PMCID: PMC9575859 DOI: 10.7717/peerj-cs.1110
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
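The abstract says ciphertext features come from NIST randomness tests, but this record does not list which tests are used or how their outputs form a feature vector. As a minimal sketch, assuming one p-value per test is used as a feature, the code below implements two tests from NIST SP 800-22 (Frequency (Monobit) and Runs) on a ciphertext's bit stream. `ciphertext_features` and the other function names are hypothetical, not taken from the paper.

```python
import math


def bytes_to_bits(data: bytes) -> str:
    """Expand bytes into a '0'/'1' string, most significant bit first."""
    return "".join(f"{b:08b}" for b in data)


def monobit_p_value(bits: str) -> float:
    """NIST SP 800-22 Frequency (Monobit) test p-value."""
    n = len(bits)
    s_n = sum(1 if b == "1" else -1 for b in bits)  # +1 per one, -1 per zero
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


def runs_p_value(bits: str) -> float:
    """NIST SP 800-22 Runs test p-value."""
    n = len(bits)
    pi = bits.count("1") / n
    tau = 2 / math.sqrt(n)
    # Prerequisite frequency check: if it fails, the Runs test reports p = 0.
    if abs(pi - 0.5) >= tau:
        return 0.0
    # V_n(obs): 1 + number of positions where adjacent bits differ.
    v_obs = 1 + sum(bits[k] != bits[k + 1] for k in range(n - 1))
    num = abs(v_obs - 2 * n * pi * (1 - pi))
    den = 2 * math.sqrt(2 * n) * pi * (1 - pi)
    return math.erfc(num / den)


def ciphertext_features(ciphertext: bytes) -> list:
    """Hypothetical feature vector: one p-value per implemented test."""
    bits = bytes_to_bits(ciphertext)
    return [monobit_p_value(bits), runs_p_value(bits)]
```

The worked examples in SP 800-22 can serve as a check: `monobit_p_value("1011010101")` is about 0.527089 and `runs_p_value("1001101011")` is about 0.147232. NIST recommends at least 100 bits per tested sequence, which any of the ciphertext sizes in the tables below (1 KB and up) comfortably exceeds.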
Figure 1. Workflow of cryptographic algorithm identification.
Figure 2. The cryptographic algorithm identification process based on HKNNRF.
Figure 3. The flow of the cryptographic algorithm identification scheme constructed based on the HKNNRF model.
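This record does not describe how HKNNRF combines its KNN and RF components. One common way to build such a hybrid is soft voting, where each model's class-probability estimates are averaged and the most probable class wins. The sketch below uses that swapped-in technique purely for illustration: `knn_proba` is a from-scratch KNN probability estimate, `soft_vote` and its weight `w` are hypothetical, and the RF probabilities are a made-up stand-in for what a random forest (for example scikit-learn's `predict_proba`) would return.

```python
import math
from collections import Counter


def knn_proba(train_X, train_y, x, k=3):
    """Estimate class probabilities as label frequencies among the
    k nearest training vectors (Euclidean distance)."""
    neighbours = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return {label: votes[label] / k for label in set(train_y)}


def soft_vote(p_knn, p_rf, w=0.5):
    """Hypothetical combiner: weighted average of KNN and RF class
    probabilities. Returns the winning label and the combined distribution."""
    labels = sorted(set(p_knn) | set(p_rf))
    combined = {c: w * p_knn.get(c, 0.0) + (1 - w) * p_rf.get(c, 0.0) for c in labels}
    return max(labels, key=combined.get), combined


# Toy usage with made-up 2-D feature vectors (e.g. two test p-values).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
train_y = ["AES", "AES", "3DES", "3DES"]
p_knn = knn_proba(train_X, train_y, (0.15, 0.15), k=3)
p_rf = {"AES": 0.4, "3DES": 0.6}  # stand-in for a random forest's probabilities
label, combined = soft_vote(p_knn, p_rf)
```

With `w=0.5` both models get equal say; here KNN favours AES (2 of 3 neighbours) strongly enough to outweigh the RF stand-in's mild preference for 3DES.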
Parameter settings of the five block cipher algorithms.
| Algorithm | Structure | Key | Mode | Parameter | Implementation |
|---|---|---|---|---|---|
| AES | SP | Fixed | ECB | Fixed | Crypto |
| 3DES | Feistel | Fixed | ECB | Fixed | Crypto |
| Blowfish | Feistel | Fixed | ECB | Fixed | Crypto |
| CAST | Feistel | Fixed | ECB | Fixed | Crypto |
| RC2 | Feistel | Fixed | ECB | Fixed | Crypto |
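All five ciphers in the table run in ECB mode with a fixed key, so identical plaintext blocks encrypt to identical ciphertext blocks. AES uses 128-bit blocks, while 3DES, Blowfish, RC2 and (assuming "CAST" denotes CAST-128) CAST use 64-bit blocks. The sketch below is not the paper's NIST-based feature set; it only illustrates what ECB and block size can leak in ciphertext. The function and feature names are hypothetical.

```python
from collections import Counter


def repeated_block_ratio(ciphertext: bytes, block_size: int) -> float:
    """Fraction of full block_size-byte blocks that repeat an earlier block."""
    blocks = [
        ciphertext[i:i + block_size]
        for i in range(0, len(ciphertext) - block_size + 1, block_size)
    ]
    if not blocks:
        return 0.0
    counts = Counter(blocks)
    repeats = sum(c - 1 for c in counts.values())  # every extra copy is a repeat
    return repeats / len(blocks)


def block_features(ciphertext: bytes) -> dict:
    """Hypothetical structural features: a 64-bit-block cipher's padded ECB
    output need only be a multiple of 8 bytes, an AES one a multiple of 16."""
    return {
        "len_mod_16": len(ciphertext) % 16,
        "repeat_ratio_8": repeated_block_ratio(ciphertext, 8),
        "repeat_ratio_16": repeated_block_ratio(ciphertext, 16),
    }
```

For example, a 32-byte input made of one 8-byte pattern repeated four times gives `repeat_ratio_8 == 0.75` (three of four 8-byte blocks are repeats) and `repeat_ratio_16 == 0.5`.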
Confusion matrix of classification results.
| Actual class | Predicted positive | Predicted negative |
|---|---|---|
| Positive | TP (true positive) | FN (false negative) |
| Negative | FP (false positive) | TN (true negative) |
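The accuracy, precision and recall reported in the tables follow from the four confusion-matrix counts by the standard definitions: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN). A minimal sketch, with made-up counts for the usage example:

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int):
    """Return (accuracy, precision, recall) from confusion-matrix counts.
    Precision and recall fall back to 0.0 when their denominator is zero."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical counts for a 40-sample binary test set.
acc, prec, rec = binary_metrics(tp=15, fn=5, fp=6, tn=14)
```

Here accuracy is 29/40 = 0.725, precision is 15/21 (about 0.714) and recall is 15/20 = 0.75, which shows how the three metrics can differ on the same predictions.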
Binary-classification experimental results based on ciphertext features.
| Metric | File size (KB) | SVM | KNN | RF | HKNNRF |
|---|---|---|---|---|---|
| Accuracy | 512 | 0.600 | 0.600 | 0.600 | 0.725 |
| Accuracy | 256 | 0.575 | 0.575 | 0.625 | 0.650 |
| Accuracy | 64 | 0.625 | 0.600 | 0.650 | 0.675 |
| Accuracy | 8 | 0.525 | 0.525 | 0.575 | 0.700 |
| Accuracy | 1 | 0.500 | 0.550 | 0.525 | 0.725 |
| Precision | 512 | 0.580 | 0.601 | 0.600 | 0.725 |
| Precision | 256 | 0.580 | 0.583 | 0.628 | 0.700 |
| Precision | 64 | 0.620 | 0.594 | 0.650 | 0.650 |
| Precision | 8 | 0.530 | 0.532 | 0.600 | 0.675 |
| Precision | 1 | 0.420 | 0.530 | 0.615 | 0.700 |
| Recall | 512 | 0.580 | 0.600 | 0.600 | 0.700 |
| Recall | 256 | 0.580 | 0.575 | 0.625 | 0.650 |
| Recall | 64 | 0.660 | 0.600 | 0.650 | 0.700 |
| Recall | 8 | 0.520 | 0.525 | 0.575 | 0.650 |
| Recall | 1 | 0.420 | 0.550 | 0.525 | 0.625 |
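The abstract's 69.5% figure and its gaps of 13, 12.5 and 10 percentage points are the means of the Accuracy rows above over the five file sizes. The arithmetic below reproduces them from the table values; it is a check, not new data.

```python
# Accuracy rows from the binary-classification table, ordered 512, 256, 64, 8, 1 KB.
acc_by_model = {
    "SVM":    [0.600, 0.575, 0.625, 0.525, 0.500],
    "KNN":    [0.600, 0.575, 0.600, 0.525, 0.550],
    "RF":     [0.600, 0.625, 0.650, 0.575, 0.525],
    "HKNNRF": [0.725, 0.650, 0.675, 0.700, 0.725],
}

# Mean accuracy of each classifier across the five file sizes.
mean = {name: sum(v) / len(v) for name, v in acc_by_model.items()}

# Gap between HKNNRF and each baseline, in percentage points.
gap_points = {name: (mean["HKNNRF"] - mean[name]) * 100 for name in ("SVM", "KNN", "RF")}

for name, gap in gap_points.items():
    print(f"HKNNRF {mean['HKNNRF']:.3f} vs {name} {mean[name]:.3f}: +{gap:.1f} points")
```

The means come out to 0.695 for HKNNRF, 0.565 for SVM, 0.570 for KNN and 0.595 for RF. For scale, always guessing one class scores 0.5 on a balanced two-class test.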
Figure 4. Identification accuracy under different file sizes.
HKNNRF binary-classification accuracy for each pair of the five block cipher algorithms.
| Algorithm pair | 1 KB | 8 KB | 64 KB | 256 KB | 512 KB |
|---|---|---|---|---|---|
| 3DES and AES | 0.725 | 0.700 | 0.675 | 0.650 | 0.725 |
| 3DES and Blowfish | 0.625 | 0.625 | 0.650 | 0.625 | 0.650 |
| 3DES and CAST | 0.650 | 0.625 | 0.725 | 0.675 | 0.625 |
| 3DES and RC2 | 0.650 | 0.675 | 0.700 | 0.625 | 0.650 |
| AES and Blowfish | 0.600 | 0.700 | 0.675 | 0.650 | 0.675 |
| AES and CAST | 0.650 | 0.675 | 0.650 | 0.700 | 0.700 |
| AES and RC2 | 0.625 | 0.600 | 0.700 | 0.675 | 0.650 |
| Blowfish and CAST | 0.650 | 0.650 | 0.650 | 0.625 | 0.625 |
| Blowfish and RC2 | 0.700 | 0.600 | 0.675 | 0.700 | 0.700 |
| CAST and RC2 | 0.725 | 0.650 | 0.625 | 0.675 | 0.725 |
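The ten rows above are exactly the C(5, 2) = 10 unordered pairs of the five algorithms, in the order Python's `itertools.combinations` yields them. The sketch enumerates those binary tasks and averages HKNNRF accuracy over the pairs for each file size; the per-size means are computed from the table and are not reported in the source.

```python
from itertools import combinations

ALGORITHMS = ["3DES", "AES", "Blowfish", "CAST", "RC2"]
pairs = list(combinations(ALGORITHMS, 2))  # 10 pairs, same order as the table rows

SIZES_KB = [1, 8, 64, 256, 512]
# HKNNRF accuracy per pair (rows) and file size (columns), copied from the table.
table = [
    [0.725, 0.700, 0.675, 0.650, 0.725],  # 3DES and AES
    [0.625, 0.625, 0.650, 0.625, 0.650],  # 3DES and Blowfish
    [0.650, 0.625, 0.725, 0.675, 0.625],  # 3DES and CAST
    [0.650, 0.675, 0.700, 0.625, 0.650],  # 3DES and RC2
    [0.600, 0.700, 0.675, 0.650, 0.675],  # AES and Blowfish
    [0.650, 0.675, 0.650, 0.700, 0.700],  # AES and CAST
    [0.625, 0.600, 0.700, 0.675, 0.650],  # AES and RC2
    [0.650, 0.650, 0.650, 0.625, 0.625],  # Blowfish and CAST
    [0.700, 0.600, 0.675, 0.700, 0.700],  # Blowfish and RC2
    [0.725, 0.650, 0.625, 0.675, 0.725],  # CAST and RC2
]

# Mean accuracy over all ten pairs at each file size.
mean_by_size = {
    size: sum(row[j] for row in table) / len(table)
    for j, size in enumerate(SIZES_KB)
}
```

Averaged over the ten pairs, HKNNRF stays between 0.65 and 0.6725 at every file size, so no single ciphertext size dominates.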
Figure 5. Comparison of binary-classification accuracy of different file sizes.
Five-classification experimental results based on ciphertext features.
| Metric | File size (KB) | SVM | KNN | RF | HKNNRF |
|---|---|---|---|---|---|
| Accuracy | 512 | 0.190 | 0.200 | 0.170 | 0.240 |
| Accuracy | 256 | 0.210 | 0.200 | 0.200 | 0.330 |
| Accuracy | 64 | 0.100 | 0.160 | 0.220 | 0.270 |
| Accuracy | 8 | 0.170 | 0.190 | 0.240 | 0.300 |
| Accuracy | 1 | 0.230 | 0.210 | 0.220 | 0.340 |
| Precision | 512 | 0.230 | 0.219 | 0.210 | 0.298 |
| Precision | 256 | 0.233 | 0.225 | 0.208 | 0.330 |
| Precision | 64 | 0.115 | 0.099 | 0.231 | 0.227 |
| Precision | 8 | 0.204 | 0.207 | 0.233 | 0.305 |
| Precision | 1 | 0.214 | 0.222 | 0.223 | 0.371 |
| Recall | 512 | 0.190 | 0.200 | 0.170 | 0.240 |
| Recall | 256 | 0.210 | 0.200 | 0.200 | 0.330 |
| Recall | 64 | 0.100 | 0.160 | 0.220 | 0.270 |
| Recall | 8 | 0.170 | 0.190 | 0.240 | 0.300 |
| Recall | 1 | 0.230 | 0.210 | 0.220 | 0.340 |
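In the five-classification table the Recall rows match the Accuracy rows exactly. That is what one would expect if recall is macro-averaged and every class has the same number of test samples, since macro recall then reduces to total correct predictions over total samples. The source does not state the averaging method or class balance, so this is an interpretation. The sketch checks the identity on a hypothetical 5x5 confusion matrix with 20 samples per class; the matrix values are made up. For scale, uniform guessing over five balanced classes scores 1/5 = 0.2.

```python
def accuracy_from_cm(cm):
    """Overall accuracy: diagonal (correct) counts over all counts."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))


def macro_recall(cm):
    """Unweighted mean of per-class recall; rows are the true classes."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)


CHANCE_LEVEL = 1 / 5  # uniform guessing over five balanced classes

# Hypothetical confusion matrix: rows = true class, columns = predicted class,
# 20 test samples per class (balanced).
cm = [
    [8, 3, 3, 3, 3],
    [4, 6, 3, 4, 3],
    [3, 4, 7, 3, 3],
    [4, 3, 4, 5, 4],
    [4, 4, 4, 4, 4],
]
```

Both functions return 0.30 for this matrix (30 correct out of 100). With unequal class sizes the two metrics would generally diverge.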