Mvv Prasad Kantipudi, Sandeep Kumar, Ashish Kumar Jha.
Abstract
Deep learning is a subfield of artificial intelligence that allows a computer to learn new rules from data. Deep learning algorithms can identify images, objects, observations, text, and other structures. In recent years, scene text recognition has attracted many researchers in the computer vision community, yet it still needs improvement because existing scene text recognition algorithms perform poorly. This paper proposes a novel approach to scene text recognition that integrates a bidirectional LSTM (Bi-LSTM) with a deep convolutional neural network (CNN). In the proposed method, the contour of the image is identified first and then fed into the CNN, which generates an ordered sequence of features from the contoured image. This feature sequence is then encoded by the Bi-LSTM, which is well suited to extracting features from a sequence of words. The paper thus combines two powerful feature-extraction mechanisms, and the contour-based input image makes recognition faster, which gives this technique an advantage over existing methods. The proposed method is evaluated on the MSRATD 50 dataset, the SVHN dataset, a vehicle number plate dataset, the SVT dataset, and random datasets, with accuracies of 95.22%, 92.25%, 96.69%, 94.58%, and 98.12%, respectively. According to quantitative and qualitative analysis, the approach is more promising in terms of accuracy and precision rate.
Year: 2021 PMID: 34858492 PMCID: PMC8632382 DOI: 10.1155/2021/2676780
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Sample image of text-involved scenes of SVT dataset.
Literature study on existing methodology.
| S. no. | Author & year | Methodology | Dataset | Performance |
|---|---|---|---|---|
| 1 | S. Yasser Arafat et al. [ | Faster RCNN + two stream deep neural network (TSDNN) | UPTI dataset | Avg. precision = 98%; R. R. = 95.20% |
| 2 | Asghar Ali Chandio et al. [ | Multiscale and multilevel features | Chars74 K and ICDAR03 datasets | Precision = 90%; Recall = 91%; F-score = 91% |
| 3 | Yao Qin et al. [ | Faster RCNN + BLSTM | ICDAR 2015 datasets | Precision = 89.8%; Recall = 84.3%; F-score = 86.9% |
| 4 | Jheng-Long Wu et al. [ | BLSTM + CNN | Corpus dataset | Macro-F1 = 72%; Micro-F1 = 71% |
| 5 | S. Yasser Arafat et al. [ | (AlexNet and Vgg16) + BLSTM | UPTI dataset | Accuracy = 97% |
| 6 | Sardar Jaf et al. [ | Recurrent neural network (RNN) + BLSTM | English web treebank universal dependencies dataset | Precision = 91.43%; Recall = 94.52%; F-score = 92.20% |
| 7 | M. A. Panhwar et al. [ | ANN | Self-dataset | Accuracy = 85% |
| 8 | Yen-Min Su et al. [ | Contour + morphological operation + ROI | ICDAR datasets | Accuracy = 93.44%; Recall = 79.16%; F-score = 85.71% |
| 9 | Ling-Qun Zuo Su et al. [ | CNN + BLSTM | SVT dataset, IIIT5K dataset, ICDAR 2003 and 2015 dataset | Accuracy = 95.96%, 98%, 98.2%, 91% |
| 10 | Baoguang Shi et al. [ | CRNN | SVT dataset, IIIT5K dataset, ICDAR dataset | Accuracy = 97.5%, 97.8%, 98.7%, 89.6% |
| 11 | Xiaohang Ren et al. [ | Text structure component detector (TSCD) | Ren's dataset, Zhou's dataset, Pan's dataset | Precision = 82%; Recall = 72%; F-score = 77% |
| 12 | Xiang Bai et al. [ | Bag of strokelets + HOG | SVT dataset, IIIT5K dataset, ICDAR 2003 dataset | Accuracy = 80.99%, 85.6%, 82.64% |
| 13 | Mingkun Yang et al. [ | CAPTCHA system | IIIT5K, SVT, IC03, IC13, IC15, SVTP, CUTE | Accuracy = 92.9%, 89.6%, 92.5%, 92.2%, 76.8%, 80%, 77.1% |
| 14 | Anna Zhu et al. [ | Anchor selection-based region proposal network | ICDAR2013, ICDAR2015, and MSRA-TD500 | Precision = 90.18%, 83.34%, 84.67%; Recall = 91.16%, 79.99%, 80.37%; F-score = 90.62%, 81.63%, 82.49% |
| 15 | ZiLing Hu et al. [ | Text contour attention text detector | ICDAR2015, CTW1500 | Precision = 88.9%, 86.5%; Recall = 85.2%, 80%; F-score = 87%, 83.1% |
Figure 2. Block diagram of proposed work.
Figure 3. Original image and contour of the image on SVT dataset.
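The contour step illustrated in Figure 3 can be sketched in a few lines. The source does not say which contour algorithm is used, so the boundary rule below is an assumption: a foreground pixel is kept if any of its 4-connected neighbours is background or lies outside the image. The names `Image` and `extract_contour` are hypothetical, and the input is assumed to be already binarized.

```python
from typing import List

Image = List[List[int]]  # 1 = foreground (text) pixel, 0 = background


def extract_contour(img: Image) -> Image:
    """Keep only foreground pixels that touch the background or the image border.

    ASSUMPTION: a simple 4-neighbour boundary rule, not the paper's own method.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            # Check the 4-connected neighbours; out-of-bounds counts as background.
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if not (0 <= ny < h and 0 <= nx < w) or img[ny][nx] == 0:
                    out[y][x] = 1
                    break
    return out


if __name__ == "__main__":
    # Hypothetical 5x5 test image: a filled 3x3 square on a blank background.
    square = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in extract_contour(square):
        print(row)
```

For the filled square, only the interior pixel is dropped, so the network would receive the outline rather than the filled region.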
Hyperparameters used in the proposed work.
| Parameter | Value |
|---|---|
| Epochs | 50 |
| Validation split | 0.1 |
| Dropout | 0.2 |
| Filters | 16 |
| Batch_size | 64 × 64 |
| Learning rate | 0.00001 |
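As a rough illustration, the values in the table above could be collected into one configuration object. This is a sketch, not the authors' code: the class `TrainConfig` and the helper `steps_per_epoch` are hypothetical, the batch size listed as "64 × 64" is read here as 64 (an assumption), and the dataset size of 1000 in the usage lines is invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters copied from the table; field names are hypothetical."""
    epochs: int = 50
    validation_split: float = 0.1
    dropout: float = 0.2
    filters: int = 16
    batch_size: int = 64  # ASSUMPTION: the table says "64 × 64"; read as 64 here
    learning_rate: float = 1e-5


def steps_per_epoch(n_samples: int, cfg: TrainConfig) -> int:
    """Optimizer steps per epoch after holding out the validation fraction."""
    n_val = int(n_samples * cfg.validation_split)
    n_train = n_samples - n_val
    return math.ceil(n_train / cfg.batch_size)


if __name__ == "__main__":
    cfg = TrainConfig()
    n = 1000  # hypothetical dataset size, not from the source
    print(steps_per_epoch(n, cfg), "steps per epoch,",
          steps_per_epoch(n, cfg) * cfg.epochs, "steps in total")
```

Keeping the values in a frozen dataclass makes it harder to change a setting by accident between experiments.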
Figure 4. Block diagram of RCNN combined with Bi-LSTM.
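The Bi-LSTM stage in Figure 4 reads the CNN feature sequence in both directions and pairs the two resulting states at every position. The toy sketch below shows only that bidirectional idea. It makes several assumptions that do not come from the source: a plain tanh recurrent cell stands in for the LSTM cell, each feature is a single scalar, and both directions share the same hypothetical weights `w_x` and `w_h` (a real Bi-LSTM learns separate weights per direction).

```python
import math
from typing import List, Sequence, Tuple


def run_rnn(xs: Sequence[float], w_x: float, w_h: float) -> List[float]:
    """Run a scalar tanh recurrent cell over xs and return the state at each step."""
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def bidirectional(xs: Sequence[float], w_x: float = 0.5,
                  w_h: float = 0.8) -> List[Tuple[float, float]]:
    """Pair a left-to-right state with a right-to-left state for every position."""
    fwd = run_rnn(xs, w_x, w_h)
    # Process the reversed sequence, then flip the states back into input order.
    bwd = run_rnn(xs[::-1], w_x, w_h)[::-1]
    return list(zip(fwd, bwd))


if __name__ == "__main__":
    features = [0.2, -0.4, 0.9]  # hypothetical per-column CNN features
    for t, (f, b) in enumerate(bidirectional(features)):
        print(f"t={t}: forward={f:+.4f} backward={b:+.4f}")
```

Each position ends up with context from both its left and its right, which is why a bidirectional encoder suits character sequences in which neighbouring glyphs disambiguate one another.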
Figure 5. Block diagram of LSTM.
Figure 6. Text recognition on MSRA dataset.
Metrics of MSRATD 50 dataset.
| S. no. | Parameters | Output (%) |
|---|---|---|
| 1 | Precision | 94.15 |
| 2 | Recall | 85.73 |
| 3 | F-score | 87.09 |
| 4 | Accuracy | 95.22 |
Figure 7. Text recognition on SVHN dataset.
Metrics of SVHN dataset.
| S. no. | Parameters | Output (%) |
|---|---|---|
| 1 | Precision | 92.49 |
| 2 | Recall | 79.03 |
| 3 | F-score | 89.80 |
| 4 | Accuracy | 92.25 |
Figure 8. Text recognition on vehicle number plates dataset.
Metrics of UFPR-ALPR dataset.
| S. no. | Parameters | Output (%) |
|---|---|---|
| 1 | Precision | 93.11 |
| 2 | Recall | 86.77 |
| 3 | F-score | 90.01 |
| 4 | Accuracy | 96.69 |
Figure 9. Text recognition on SVT dataset.
Metrics of SVT dataset.
| S. no. | Parameters | Output (%) |
|---|---|---|
| 1 | Precision | 91.86 |
| 2 | Recall | 84.27 |
| 3 | F-score | 88.49 |
| 4 | Accuracy | 94.58 |
Figure 10. Text recognition on random/self-dataset.
Metrics of random/self-dataset.
| S. no. | Parameters | Output (%) |
|---|---|---|
| 1 | Precision | 90.18 |
| 2 | Recall | 98.19 |
| 3 | F-score | 97.07 |
| 4 | Accuracy | 98.12 |
Metrics of various datasets used in the proposed system.
| S. no. | Parameters | MSRATD 50 | UFPR-ALPR | SVHN | SVT | Random/self |
|---|---|---|---|---|---|---|
| 1 | Precision | 94.15 | 93.11 | 92.49 | 91.86 | 90.18 |
| 2 | Recall | 85.73 | 86.77 | 79.03 | 84.27 | 98.19 |
| 3 | F-score | 87.09 | 90.01 | 89.80 | 88.49 | 97.07 |
| 4 | Accuracy | 95.22 | 96.69 | 92.25 | 94.58 | 98.12 |
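The four metrics reported in the table above are conventionally derived from confusion counts. The sketch below uses the standard definitions; the source does not state how its own figures were computed (for example, per character or per word, or with what averaging), so this is an illustration under that assumption. The function name `classification_metrics` and the counts in the usage lines are hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Return precision, recall, F-score (F1), and accuracy as percentages.

    tp/fp/fn/tn are true-positive, false-positive, false-negative, and
    true-negative counts. Empty denominators yield 0.0 instead of raising.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": round(100 * precision, 2),
        "recall": round(100 * recall, 2),
        "f_score": round(100 * f_score, 2),
        "accuracy": round(100 * accuracy, 2),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    print(classification_metrics(tp=90, fp=10, fn=30, tn=70))
```

Because the F-score here is the harmonic mean of precision and recall, it always lies between the two.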
Figure 11. Overall text recognition on all datasets.
Figure 12. Incorrect text recognition on all datasets.
Comparison of the proposed work with existing techniques.
| S. no. | Parameters | Ref. [ | Ref. [ | Ref. [ | Ref. [ | Proposed work (average) |
|---|---|---|---|---|---|---|
| 1 | Precision | 82 | — | 91.43 | 90 | 92.15 |
| 2 | Recall | 72 | 79.16 | 94.52 | 91 | 83.50 |
| 3 | F-score | 77 | 85.71 | 92.20 | 91 | 88.56 |
| 4 | Accuracy | — | 93.44 | — | — | 93.83 |
Figure 13. Comparative analysis of the proposed work with existing techniques.