Ruimin Wang, Haitao Li, Jing Jing, Liehui Jiang, Weiyu Dong.
Abstract
As Internet of Things (IoT) devices become more intelligent and interconnected, they also become more vulnerable and more exposed to threats. Device identification underpins many cybersecurity operations, such as asset management, vulnerability response, and situational awareness, all of which are important for securing IoT devices. The more information sources and viewpoints are available, the more precise the identification results. This study proposes a novel alternative method for IoT device identification that uses commonly available WebUI login pages, which carry vendor-specific characteristics, as the data source. An ensemble learning model that combines Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN) identifies the device vendor, and an Optical Character Recognition (OCR) based method identifies the device type and model. Experimental results show that the ensemble learning model achieves 99.1% accuracy and a 99.5% F1-Score in determining whether a device comes from a vendor that appeared in the training dataset and, when it does, 98% accuracy and a 98.3% F1-Score in identifying that vendor. The OCR-based method identifies fine-grained attributes of the device and achieves 99.46% accuracy in device model identification, which is 11.39 percentage points higher than the result of the Shodan cyber search engine.
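
The staged flow described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the callables `is_known_vendor`, `vendor_probabilities`, and `read_model_text` are hypothetical stand-ins for the known/unknown check, the vendor classifier, and the OCR step, and running OCR only after a known vendor is recognized is an assumption made here for simplicity. The model string in the toy usage is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Identification:
    known_vendor: bool
    vendor: Optional[str] = None
    model_text: Optional[str] = None


def identify_device(
    screenshot: object,
    vendors: Sequence[str],
    is_known_vendor: Callable[[object], bool],
    vendor_probabilities: Callable[[object], Sequence[float]],
    read_model_text: Callable[[object], str],
) -> Identification:
    """Hypothetical pipeline: known/unknown check, vendor choice, then OCR."""
    # Stage 1: reject pages from vendors that were not in the training data.
    if not is_known_vendor(screenshot):
        return Identification(known_vendor=False)
    # Stage 2: pick the vendor with the highest predicted probability.
    probs = vendor_probabilities(screenshot)
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Stage 3: read type/model text from the page (assumed ordering).
    return Identification(True, vendors[best], read_model_text(screenshot))


# Toy usage with stub callables in place of trained models.
result = identify_device(
    screenshot="login.png",
    vendors=["QNAP", "Asus", "Juniper"],
    is_known_vendor=lambda s: True,
    vendor_probabilities=lambda s: [0.9, 0.07, 0.03],
    read_model_text=lambda s: "EXAMPLE-100",
)
print(result)
```

Passing the three stages in as callables keeps the sketch independent of any particular CNN, DNN, or OCR library.
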
Keywords: IoT; OCR; WebUI; device identification; ensemble learning model
Year: 2022 PMID: 35808388 PMCID: PMC9269544 DOI: 10.3390/s22134892
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Samples of device login page screenshots: (a) maxis; (b) QNAP; (c) ASUS and Juniper.
Figure 2. Framework for IoT device identification.
Figure 3. Device vendor classification based on an ensemble model.
Figure 4. Device type/model identification based on OCR.
30 IoT vendor devices.
| Vendor | Type | Quantity | Vendor | Type | Quantity |
|---|---|---|---|---|---|
| AXIS | Camera, etc. | 1769 | Samsung | Camera | 1597 |
| Asus | Router | 1431 | SonicWall | Firewall | 1286 |
| Avtech | Camera | 1301 | Sophos | Firewall | 2541 |
| Check-Point | Firewall, etc. | 1523 | Super-Micro | Gateway | 1862 |
| Cisco | Router, etc. | 1409 | Synology | NAS | 1131 |
| Cyberoam | VPN | 354 | TP-LINK | Router, etc. | 1747 |
| D-Link | Router, etc. | 678 | Technicolor | Gateway | 1032 |
| Dahua | DVR, etc. | 635 | Topsec | Firewall | 1428 |
| H3c | Firewall | 685 | Yamaha | Network switch | 527 |
| Hikvision | Camera, etc. | 871 | ZTE | Router, etc. | 2086 |
| Huawei | Switch, etc. | 2378 | Zyxel | Router, etc. | 740 |
| Juniper | Firewall | 1023 | maxis | Router | 462 |
| Linksys | Router, etc. | 856 | peplink | Router | 791 |
| MikroTik | Router | 385 | pfSense | Firewall | 692 |
| QNAP | NAS | 2863 | ruckus | Access Controller | 563 |
Parameters configuration of the CNN and Res-DNN.
| Classifier Model | CNN | Res-DNN |
|---|---|---|
| Input size | 224 × 224 × 3 | 16 × 16 |
| Convolutional kernel number | 16, 32, 64 | - |
| Convolutional kernel size | 7 × 7, 3 × 3, 3 × 3 | - |
| Pooling type | Max pooling | - |
| pool_size | (2, 2) | - |
| Optimizer | SGD | Adam |
| Sizes of dense layers | 1000, 256, 128, 64, 30 | L1: 1000, 500, 150, 10, 2; L2: 30, 2; L3: 60, 2 |
| Batch_size | 128 | 128 |
| Momentum | 0.9 | - |
| Learning Rate | 0.005 | 0.002 |
| Activation function | ReLU, SoftMax | ReLU |
| Loss function | Cross entropy | Cross entropy |
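
To make the CNN column concrete, the sketch below traces feature-map shapes and parameter counts through the three convolutional layers and five dense layers listed in the table. It assumes several details the table does not state: "same" padding with stride 1 for every convolution, one max-pooling layer with stride 2 after each convolution (reading the `pool_size` entry as a 2 × 2 window), and a plain flatten before the dense layers. Other choices would give different numbers.

```python
# Layer settings taken from the CNN column of the table.
INPUT_SHAPE = (224, 224, 3)                # height, width, channels
CONV_LAYERS = [(16, 7), (32, 3), (64, 3)]  # (number of kernels, kernel size)
DENSE_SIZES = [1000, 256, 128, 64, 30]

# Assumptions (not stated in the table): "same" padding, stride 1,
# and a 2x2 max pool with stride 2 after every convolution.
POOL = 2


def trace_cnn(input_shape, conv_layers, dense_sizes, pool=POOL):
    """Return (layer name, output shape, parameter count) for each layer."""
    h, w, c = input_shape
    rows = []
    for i, (filters, k) in enumerate(conv_layers, start=1):
        params = k * k * c * filters + filters   # weights + biases
        rows.append((f"conv{i}", (h, w, filters), params))
        h, w, c = h // pool, w // pool, filters  # pooling halves height/width
        rows.append((f"pool{i}", (h, w, c), 0))
    units = h * w * c                            # flatten
    for i, size in enumerate(dense_sizes, start=1):
        rows.append((f"dense{i}", (size,), units * size + size))
        units = size
    return rows


layers = trace_cnn(INPUT_SHAPE, CONV_LAYERS, DENSE_SIZES)
for name, shape, params in layers:
    print(f"{name:7s} {str(shape):16s} {params:>12,d}")
```

Under these assumptions the last pooled feature map is 28 × 28 × 64, and the 30-unit output layer matches the 30 vendors in the dataset.
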
Performance of multi-classifiers in IoT device identification on cross-validation.
| Times | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| 1st cross-validation | 0.978 | 0.987 | 0.980 | 0.983 |
| 2nd cross-validation | 0.984 | 0.986 | 0.990 | 0.988 |
| 3rd cross-validation | 0.978 | 0.984 | 0.983 | 0.983 |
| 4th cross-validation | 0.981 | 0.980 | 0.987 | 0.983 |
| 5th cross-validation | 0.978 | 0.977 | 0.980 | 0.978 |
| Average | 0.980 | 0.983 | 0.984 | 0.983 |
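
The average row can be reproduced from the five fold results. The short sketch below copies the per-fold values from the table and takes the arithmetic mean of each metric, rounded to three decimals; the variable names are illustrative only.

```python
from statistics import mean

# Per-fold results copied from the cross-validation table.
folds = {
    "accuracy":  [0.978, 0.984, 0.978, 0.981, 0.978],
    "precision": [0.987, 0.986, 0.984, 0.980, 0.977],
    "recall":    [0.980, 0.990, 0.983, 0.987, 0.980],
    "f1_score":  [0.983, 0.988, 0.983, 0.983, 0.978],
}

# Arithmetic mean over the five folds, rounded to three decimals.
averages = {metric: round(mean(values), 3) for metric, values in folds.items()}
print(averages)
```
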
Figure 5. Multi-classifier training and validation visualization.
Figure 6. Multi-classifier confusion matrix.
Figure 7. Evaluation results of the multi-classification model: (a) PDF; (b) CDF.
Figure 8. Visualization of the probability vectors from the multi-classifier.
Performance of the classifiers in distinguishing between the known and unknown vendors.
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Logistic Regression | 0.957 | 0.956 | 1 | 0.9786 |
| Decision Tree | 0.969 | 0.990 | 0.8977 | 0.984 |
| KNN | 0.976 | 0.992 | 0.983 | 0.987 |
| SVM | 0.980 | 0.996 | 0.983 | 0.989 |
| TCN | 0.985 | 0.996 | 0.989 | 0.992 |
| RF | 0.987 | 0.991 | 0.994 | 0.993 |
| WideResNet | 0.989 | 0.990 | 0.996 | 0.994 |
| Res-DNN | 0.991 | 0.992 | 0.998 | 0.995 |
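
The F1-Score column relates to the precision and recall columns through the standard harmonic-mean definition, F1 = 2PR / (P + R). The sketch below applies that formula to the Res-DNN row as an example. Because the table values are already rounded, recomputing F1 from them can differ from a reported value in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Res-DNN row from the table: precision 0.992, recall 0.998.
res_dnn_f1 = f1_score(0.992, 0.998)
print(round(res_dnn_f1, 3))
```
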
Figure 9. OCR-based identification results.
Comparison of OCR-based method and Shodan.
| Attribute | Count (Our Method) | Accuracy (Our Method) | Count (Shodan) | Accuracy (Shodan) |
|---|---|---|---|---|
| Vendor | 7010 | 99.9% | 6411 | 91.36% |
| Type | 6990 | 99.62% | 6830 | 97.34% |
| Model | 6978 | 99.46% | 6786 | 88.07% |
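
The margin quoted in the abstract is the difference between the two accuracy columns for the Model attribute. The sketch below computes the same difference, in percentage points, for every attribute in the table; the dictionary layout is illustrative only.

```python
# Accuracy (%) per attribute, copied from the table: (our method, Shodan).
accuracy = {
    "Vendor": (99.90, 91.36),
    "Type": (99.62, 97.34),
    "Model": (99.46, 88.07),
}

# Difference in percentage points, rounded to two decimals.
margins = {
    attribute: round(ours - shodan, 2)
    for attribute, (ours, shodan) in accuracy.items()
}
print(margins)
```

The Model entry gives the 11.39-point margin reported in the abstract.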