Shida Lu, Kai Huang, Talha Meraj, Hafiz Tayyab Rauf.
Abstract
A Completely Automated Public Turing Test to tell Computers and Humans Apart (CAPTCHA) is used in web systems to secure authentication; however, it can be broken using Optical Character Recognition (OCR)-type methods. CAPTCHA breakers make web systems highly insecure. At the same time, techniques that break CAPTCHAs show CAPTCHA designers where their designs need improvement to withstand computer vision-based malicious attacks. Research in this area has primarily used deep learning methods to break state-of-the-art CAPTCHA codes; however, the validation schemes and conventional Convolutional Neural Network (CNN) designs still need more reliable validation and feature schemes that cover multiple aspects. Several public datasets of text-based CAPTCHAs are available, including those on Kaggle and other dataset repositories where self-generated CAPTCHA datasets are available. Previous studies are dataset-specific and do not perform well on other CAPTCHAs. Therefore, the proposed study uses two publicly available datasets of 4- and 5-character text-based CAPTCHA images to propose a CAPTCHA solver. Furthermore, the proposed study uses a skip-connection-based CNN model to solve a CAPTCHA. The proposed research applies 5-fold validation to the data, which yields 10 different CNN models on the two datasets, with promising results compared to other studies. ©2022 Lu et al.
Keywords: CAPTCHA; CNN; Computer vision; Deep learning; Optical Character Recognition
Year: 2022 PMID: 35494833 PMCID: PMC9044336 DOI: 10.7717/peerj-cs.879
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
A few recent CAPTCHA recognition studies, their methods, and their results.
| Reference | Year | Dataset | Method | Results |
|---|---|---|---|---|
| | 2021 | CNKI CAPTCHA, | Binarization, smoothing, segmentation, and annotation with adhesion and more interference | Recognition rate = 99%, 98.5%, 97.84% |
| | 2021 | Tamil, Hindi and Bengali | Pillow Library, CNN | ∼ |
| | 2021 | Private created Dataset | 15-layer CNN | Classification accuracy = 80% |
| | 2021 | Private | 7-Layer CNN | Classification Accuracy = 99.7% |
| | 2021 | 4-words Kaggle Dataset | CNN | Classification Accuracy = 100% |
| | 2021 | Private GAN based dataset | CNN | Classification Accuracy = 96%, overall = 74% |
| | 2020 | Weibo, Gregwar | CNN | Testing Accuracy = 92.68% |
Figure 1: The proposed framework for CAPTCHA recognition on both the 4- and 5-character datasets.
Description of both datasets employed in the proposed study.
| Properties | d1 | d2 |
|---|---|---|
| Image dimension | 50 × 200 × 3 | 24 × 72 × 3 |
| Extension | PNG | PNG |
| Number of images | 9955 | 1040 |
| Character types | 32 | 19 |
| Resized image dimension (Per Character) | 20 × 24 × 1 | 20 × 24 × 1 |
Figure 2: The five- and four-character datasets used in the proposed study and their character-wise frequencies (row 1: four-character dataset 1 (d2); row 2: five-character dataset 2 (d1)).
Figure 3: Preprocessing and isolation of characters in both datasets (row 1: the d1 dataset, with binarization, erosion, area-wise selection, and segmentation; row 2: binarization and isolation of each character).
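The preprocessing steps named in the Figure 3 caption can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the threshold of 128, the 3 × 3 erosion window, and the column-gap splitting (used here as a simpler stand-in for area-wise selection) are all assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Map dark pixels (ink) to 1 and light pixels (background) to 0.

    The threshold value is an assumption, not taken from the paper.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def erode(binary: Image) -> Image:
    """3x3 erosion: a pixel stays 1 only if its whole neighbourhood is 1.

    Border pixels are set to 0 because their neighbourhood is incomplete.
    """
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if all(binary[r + dr][c + dc]
                   for dr in (-1, 0, 1) for dc in (-1, 0, 1)):
                out[r][c] = 1
    return out


def split_columns(binary: Image) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges separated by ink-free columns."""
    w = len(binary[0])
    has_ink = [any(row[c] for row in binary) for c in range(w)]
    spans, start = [], None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                  # a character begins
        elif not ink and start is not None:
            spans.append((start, c))   # a character ends at the first empty column
            start = None
    if start is not None:
        spans.append((start, w))
    return spans
```

Erosion thins strokes, which helps remove narrow noise lines before splitting; characters that touch each other would still need a stronger method than column gaps.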
Parameter settings and learnable weights of the proposed skip-connection CNN architecture.
| Number | Layers name | Category | Parameters | Weights/Offset | Padding | Stride |
|---|---|---|---|---|---|---|
| 1 | Input | Image Input | 24 × 20 × 1 | – | – | – |
| 2 | Conv (1) | Convolution | 24 × 20 × 8 | 3 × 3 × 1 × 8 | Same | 1 |
| 3 | BN (1) | Batch Normalization | 24 × 20 × 8 | 1 × 1 × 8 | – | – |
| 4 | ReLU (1) | ReLU | 24 × 20 × 8 | – | – | – |
| 5 | Conv (2) | Convolution | 12 × 10 × 16 | 3 × 3 × 8 × 16 | Same | 2 |
| 6 | BN (2) | Batch Normalization | 12 × 10 × 16 | 1 × 1 × 16 | – | – |
| 7 | ReLU (2) | ReLU | 12 × 10 × 16 | – | – | – |
| 8 | Conv (3) | Convolution | 12 × 10 × 32 | 3 × 3 × 16 × 32 | Same | 1 |
| 9 | BN (3) | Batch Normalization | 12 × 10 × 32 | 1 × 1 × 32 | – | – |
| 10 | ReLU (3) | ReLU | 12 × 10 × 32 | – | – | – |
| 11 | Skip-connection | Convolution | 12 × 10 × 32 | 1 × 1 × 8 × 32 | 0 | 2 |
| 12 | Add | Addition | 12 × 10 × 32 | – | – | – |
| 13 | Pool | Average Pooling | 6 × 5 × 32 | – | 0 | 2 |
| 14 | FC | Fully connected | 1 × 1 × 19 (d2) | 19 × 960 (d2) | – | – |
| 15 | Softmax | Softmax | 1 × 1 × 19 | – | – | – |
| 16 | Class Output | Classification | – | – | – | – |
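The layer sizes in the table above can be checked with a short arithmetic sketch. It assumes that the 1 × 1 skip convolution and the pooling layer both use stride 2 with no padding (which is what the listed output sizes imply), that the skip branch starts from the 8-channel 24 × 20 map, and that the pooling window is 2 × 2; the window size is not stated in the table. The helper names are hypothetical.

```python
import math


def conv_same(size, stride):
    """Output (height, width) of a 'same'-padded convolution: ceil(n / stride)."""
    return tuple(math.ceil(n / stride) for n in size)


def window_valid(size, kernel, stride):
    """Output (height, width) of an unpadded window: floor((n - k) / stride) + 1."""
    return tuple((n - kernel) // stride + 1 for n in size)


def conv_weights(kernel, c_in, c_out):
    """Number of weights in a kernel x kernel convolution, biases excluded."""
    return kernel * kernel * c_in * c_out


x = (24, 20)                                    # input height x width, 1 channel
a1 = conv_same(x, stride=1)                     # Conv (1), 8 filters
a2 = conv_same(a1, stride=2)                    # Conv (2), 16 filters
a3 = conv_same(a2, stride=1)                    # Conv (3), 32 filters
skip = window_valid(a1, kernel=1, stride=2)     # 1x1 skip conv, 8 -> 32 channels
pooled = window_valid(a3, kernel=2, stride=2)   # average pooling, 2x2 window assumed
fc_inputs = pooled[0] * pooled[1] * 32          # flattened features fed to the FC layer

weights = {
    "Conv (1)": conv_weights(3, 1, 8),
    "Conv (2)": conv_weights(3, 8, 16),
    "Conv (3)": conv_weights(3, 16, 32),
    "Skip-connection": conv_weights(1, 8, 32),
    "FC (d2)": 19 * fc_inputs,                  # 19 classes for d2
}

print(a1, a2, a3, skip, pooled, fc_inputs)
print(weights)
```

Under these assumptions the skip branch and the main branch both arrive at 12 × 10 × 32, which the Add layer requires, and the flattened pooled map has 960 values, matching the 19 × 960 fully connected weight matrix listed for d2.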
Figure 4: Trained weights of the five-fold CNNs, shown with their respective layers, illustrating the variation across the proposed skip-connection CNN architectures.
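As a rough illustration of how five-fold validation yields one trained CNN per fold (and so 10 models across the two datasets), the following sketch generates train/test index splits. The shuffling, the seed, and the function name `k_fold_indices` are assumptions; the source does not say whether the folds were shuffled or stratified by character.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold validation.

    Each sample lands in exactly one test fold; the other k - 1 folds
    form the training set for that round.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # assumed: shuffle before splitting
    folds = [idx[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# One model would be trained per split, so k = 5 gives five models per dataset.
for fold_no, (train, test) in enumerate(k_fold_indices(1040), start=1):
    print(fold_no, len(train), len(test))
```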
Five-character dataset: per-character accuracy (%) and F1-score from five-fold text-recognition testing of the trained CNNs.
| Accuracy (%) | F1-measure (%) | Accuracy (%) | F1-measure (%) | ||||
|---|---|---|---|---|---|---|---|
| Character | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | 5-Fold Mean | 5-Fold Mean |
| | 87.23 | 83.33 | 89.63 | 84.21 | 83.14 | 84.48 | 86.772 |
| | 87.76 | 75.51 | 87.75 | 90.323 | 89.32 | 86.12 | 86.0792 |
| | 84.31 | 88.46 | 90.196 | 91.089 | 90.385 | 89.06 | 89.4066 |
| | 84.31 | 80.39 | 90.00 | 90.566 | 84.84 | 86.56 | 85.2644 |
| | 86.95 | 76.59 | 82.61 | 87.50 | 82.22 | 87.58 | 85.2164 |
| | 89.36 | 87.23 | 86.95 | 86.957 | 88.636 | 86.68 | 87.3026 |
| | 89.58 | 79.16 | 91.66 | 93.47 | 89.362 | 87.49 | 89.5418 |
| | 81.81 | 73.33 | 97.72 | 86.04 | 90.90 | 85.03 | 87.7406 |
| | 87.23 | 79.16 | 85.10 | 82.60 | 80.0 | 82.64 | 81.0632 |
| | 91.30 | 78.26 | 91.30 | 87.91 | 88.66 | 88.67 | 86.7954 |
| | 62.79 | 79.54 | 79.07 | 85.41 | 81.928 | 78.73 | 79.4416 |
| | 92.00 | 84.00 | 93.87 | 93.069 | 82.47 | 89.1 | 87.5008 |
| | 95.83 | 91.83 | 100 | 95.833 | 94.73 | 95.06 | 94.522 |
| | 64.00 | 56.00 | 53.061 | 70.47 | 67.34 | 62.08 | 63.8372 |
| | 81.40 | 79.07 | 87.59 | 79.04 | 78.65 | 81.43 | 77.8656 |
| | 97.78 | 78.26 | 82.22 | 91.67 | 98.87 | 90.34 | 92.0304 |
| | 95.24 | 83.72 | 90.47 | 96.66 | 87.50 | 90.55 | 91.3156 |
| | 89.58 | 87.50 | 82.97 | 87.23 | 82.105 | 85.68 | 86.067 |
| | 93.02 | 95.45 | 97.67 | 95.43 | 95.349 | 95.40 | 95.8234 |
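For readers who want to reproduce per-character scores of this kind, the sketch below computes precision, recall, and F1 for one character label and averages a metric over folds. The exact definition of per-character accuracy used for the table is not given in this section, so treating it as a one-vs-rest score is an assumption, and the function names and toy labels are hypothetical.

```python
from statistics import mean


def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, f1) for one character label, one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def fold_mean(values):
    """Average one metric over the folds (for example, five F1 values)."""
    return mean(values)


# Toy labels, not data from the paper.
y_true = list("aabbb")
y_pred = list("abbbb")
print(per_class_scores(y_true, y_pred, "b"))
```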
Four-character dataset: per-character accuracy (%) and F1-score from five-fold text-recognition testing of the trained CNNs.
| Accuracy (%) | F1-measure (%) | Accuracy (%) | F1-measure (%) | ||||
|---|---|---|---|---|---|---|---|
| Character | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5 | 5-Fold Mean | 5-Fold Mean |
| | 97.84 | 99.14 | 99.57 | 98.92 | 99.13 | 98.79 | 98.923 |
| | 97.02 | 94.92 | 98.72 | 97.403 | 97.204 | 96.52 | 97.5056 |
| | 97.87 | 97.46 | 99.15 | 98.934 | 98.526 | 98.55 | 98.4708 |
| | 98.76 | 98.76 | 99.17 | 97.97 | 98.144 | 99.01 | 98.0812 |
| | 100 | 95.65 | 99.56 | 99.346 | 99.127 | 98.69 | 98.947 |
| | 98.80 | 99.60 | 99.19 | 99.203 | 98.603 | 99.36 | 98.9624 |
| | 99.15 | 98.72 | 97.42 | 98.283 | 98.073 | 98.29 | 98.1656 |
| | 98.85 | 96.55 | 98.08 | 98.092 | 99.617 | 98.39 | 98.4258 |
| | 97.85 | 98.71 | 99.13 | 98.712 | 97.645 | 98.54 | 98.2034 |
| | 99.57 | 96.59 | 98.72 | 97.89 | 96.567 | 97.95 | 97.912 |
| | 99.58 | 98.75 | 99.16 | 99.379 | 99.374 | 99.25 | 99.334 |
| | 100 | 100 | 100 | 99.787 | 99.153 | 99.92 | 99.6612 |
| | 99.18 | 97.57 | 100 | 98.994 | 98.374 | 98.94 | 98.6188 |
| | 98.69 | 98.26 | 100 | 98.253 | 98.253 | 98.52 | 98.3076 |
| | 98.76 | 97.93 | 100 | 98.319 | 98.551 | 98.43 | 98.7944 |
| | 99.58 | 97.90 | 100 | 98.347 | 99.371 | 99.33 | 99.1232 |
| | 100 | 98.72 | 99.57 | 99.788 | 99.574 | 99.66 | 99.4458 |
| | 99.15 | 99.58 | 100 | 99.156 | 99.371 | 99.58 | 99.1606 |
| | 97.41 | 98.28 | 100 | 99.355 | 99.352 | 98.79 | 99.1344 |
| | 99.16 | 96.23 | 99.16 | 99.17 | 99.532 | 98.58 | 98.9816 |
| | 99.58 | 97.10 | 99.17 | 99.793 | 98.755 | 98.83 | 98.652 |
| | 98.35 | 97.94 | 98.77 | 98.347 | 96.881 | 97.86 | 97.8568 |
| | 100 | 100 | 99.58 | 99.576 | 99.787 | 99.75 | 99.7456 |
| | 99.58 | 99.17 | 99.17 | 99.174 | 98.319 | 99.00 | 99.0834 |
| | 98.75 | 99.58 | 100 | 99.583 | 99.156 | 99.42 | 99.4118 |
| | 97.47 | 97.90 | 98.73 | 98.305 | 98.312 | 97.98 | 99.558 |
| | 100 | 97.43 | 99.57 | 99.134 | 98.925 | 98.80 | 99.1794 |
| | 100 | 98.67 | 98.67 | 99.332 | 98.441 | 98.47 | 98.8488 |
| | 100 | 100 | 100 | 99.376 | 99.167 | 99.67 | 99.418 |
| | 99.15 | 97.46 | 100 | 99.573 | 99.788 | 99.15 | 99.3174 |
| | 97.90 | 98.33 | 98.74 | 98.156 | 99.371 | 98.66 | 98.7866 |
| | 99.17 | 98.75 | 99.16 | 98.965 | 99.163 | 99.16 | 99.0832 |
Figure 5: Validation loss and validation accuracy graphs for each fold of the CNN (row 1: five-character CAPTCHA; row 2: four-character CAPTCHA).
Comparison of the proposed study on the five- and four-character datasets with state-of-the-art methods.
| References | No. of characters | Method | Results |
|---|---|---|---|
| | 6 | Faster R-CNN | Accuracy = 98.5% |
| | 4 | | Accuracy = 97.8% |
| | 5 | | Accuracy = 97.5% |
| | 4 | Selective D-CNN | Success rate = 95.4% |
| | Different | CNN | Accuracy = 80% |
| | Different | KNN | Precision = 98.99% |
| | | SVM | 99.80% |
| | | Feed forward-Net | 98.79% |
| Proposed Study | 4 | Skip-CNN with 5-Fold Validation | Accuracy = 98.82% |
| | 5 | – | Accuracy = 85.52% |