Mir Moynuddin Ahmed Shibly, Tahmina Akter Tisha, Tanzina Akter Tani, Shamim Ripon.
Abstract
In this era of deep learning advancements, an autonomous system that recognizes handwritten characters and text can eventually be integrated with software to provide a better user experience. As with other languages, Bangla handwritten text extraction has various applications, such as post-office automation and signboard recognition. A large-scale and efficient isolated Bangla handwritten character classifier can be the first building block of such a system. This study aims to classify handwritten Bangla characters. The proposed methods are divided into three phases. In the first phase, seven convolutional neural network (CNN) architectures are created. The best-performing CNN model is then identified and used as a feature extractor, and classifiers are obtained with shallow machine learning algorithms. In the last phase, five ensemble methods are applied to achieve better performance in the classification task. To systematically assess the outcomes of this study, a comparative analysis of the performances has also been carried out. Among all the methods, the stacked generalization ensemble achieved the best performance, obtaining accuracy, precision, and recall of 98.68%, 98.69%, and 98.68%, respectively, on the Ekush dataset. Moreover, the use of CNN architectures and ensemble methods in large-scale Bangla handwritten character recognition is further justified by consistent results on the BanglaLekha-Isolated dataset. Such efficient systems can move handwriting recognition to the next level, so that handwriting can be automated more easily.
Keywords: Bangla handwritten character recognition; Bootstrap aggregating; Convolutional neural network; Deep learning; Ensemble learning; Feature extraction; Image classification; Stacked generalization
Year: 2021 PMID: 34307856 PMCID: PMC8279136 DOI: 10.7717/peerj-cs.565
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Figure 1. Complexity comparison of Bangla handwritten characters with handwritten characters from other languages.
This figure compares the complexity of Bangla handwritten characters with those of other languages. (A) An English handwritten character, (B) a Bangla handwritten character, and (C) an Arabic handwritten character.
Figure 2. Overview of the workflow of the study.
Details of the AlexNet, VGG16, VGG19, and small CNN architectures.
| AlexNet | VGG16 | VGG19 | Small CNN |
|---|---|---|---|
| Flatten | Flatten | Flatten | Flatten |
| 1st Dense layer | 1st Dense layer | 1st Dense layer | 1st Dense layer + Dropout (50%) |
| 2nd Dense layer | 2nd Dense layer | 2nd Dense layer | 2nd Dense layer + Dropout (30%) |
| Softmax output layer | Softmax output layer | Softmax output layer | Softmax output layer |
Note:
The layer description is given for each architecture. For a convolutional layer (Conv2D), the filter dimensions (e.g., 3 × 3) and the number of filters (e.g., 96) are given. For max-pooling, the pool size (e.g., 2 × 2) is mentioned.
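A minimal sketch of a small CNN with the dense-layer dropout shown in the table, using `tensorflow.keras`. The convolutional filter counts and dense-layer widths here are illustrative assumptions; this excerpt does not preserve the paper's exact layer sizes, only the dropout rates (50% and 30%) and the softmax output over 122 classes.

```python
# Illustrative small CNN (assumed filter counts and dense widths;
# dropout rates and 122-class softmax output follow the table above).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_small_cnn(num_classes=122, input_shape=(28, 28, 1)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),   # 50% dropout after the 1st dense layer
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),   # 30% dropout after the 2nd dense layer
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model

model = build_small_cnn()
```

The input shape is also an assumption; any grayscale character-image resolution would work with the same layer stack.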
Figure 3. Two building blocks of the ResNet architecture.
The ResNet architecture is built from these two blocks. (A) Convolutional block and (B) identity block.
Figure 4. Xception architecture details.
Figure 5. Overview of the DenseNet architecture.
Different image augmentation hyperparameters.
| Name | Augmentation hyperparameters |
|---|---|
| aug0 | No image augmentation |
| aug1 | |
| aug2 | |
| aug3 | |
| aug4 | |
| aug5 | |
| aug6 | |
| aug7 | |
| aug8 | |
| aug9 | |
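The specific hyperparameter values for aug1–aug9 are not preserved in this excerpt, so the cells above are empty. As an illustration only, a typical augmentation setting of this kind combines a small random rotation and translation; the ranges below are hypothetical, not the paper's values.

```python
# Hypothetical augmentation sketch: a random small rotation and shift
# applied to a 2-D character image (ranges are illustrative only).
import numpy as np
from scipy.ndimage import rotate, shift

rng = np.random.default_rng(0)

def augment(image, max_rotation_deg=10, max_shift_px=2):
    """Return a randomly rotated and translated copy of a 2-D image."""
    angle = rng.uniform(-max_rotation_deg, max_rotation_deg)
    dy, dx = rng.uniform(-max_shift_px, max_shift_px, size=2)
    out = rotate(image, angle, reshape=False, mode="nearest")
    out = shift(out, (dy, dx), mode="nearest")
    return out

img = rng.random((28, 28))
aug_img = augment(img)
```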
Figure 6. The working procedure of the stacked generalization ensemble method.
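Stacked generalization trains first-level classifiers and then a second-level meta-learner on their out-of-fold predictions. A minimal sketch with scikit-learn's `StackingClassifier` follows; synthetic data stands in for the CNN-extracted features used in the paper, and the choice of base and meta estimators here is an assumption for illustration.

```python
# Stacked generalization sketch: SVM and Naive Bayes as first-level
# classifiers, logistic regression as the second-level meta-learner.
# Synthetic features stand in for CNN-extracted features.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions feed the meta-learner
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
```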
Figure 7. The working procedure of the bootstrap aggregating ensemble method.
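Bootstrap aggregating (bagging) trains each base estimator on a bootstrap resample of the training set and combines their predictions by voting. A minimal sketch with scikit-learn's `BaggingClassifier` (default decision-tree base estimators, an assumption; synthetic data again stands in for CNN features):

```python
# Bagging sketch: 25 base estimators, each fit on a bootstrap resample;
# predictions are aggregated by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

bag = BaggingClassifier(n_estimators=25, bootstrap=True, random_state=1)
bag.fit(X, y)
train_acc = bag.score(X, y)
```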
Ekush dataset details.
| Character type | No. of classes | No. of instances |
|---|---|---|
| Modifier | 10 | 54,829 |
| Basic character | 50 | 307,316 |
| Compound character | 52 | 306,231 |
| Digit | 10 | 61,374 |
| Total | 122 | 729,750 |
Figure 8. A few representative images of each character type from the Ekush dataset.
(A) A few Bangla handwritten modifiers, (B) a few Bangla handwritten vowels, (C) a few Bangla handwritten consonants, (D) a few Bangla handwritten compounds, and (E) a few Bangla handwritten digits.
Performance comparison with state-of-the-art works on Bangla handwritten character recognition.
| Work | Dataset | Number of characters | Method | Accuracy (%) |
|---|---|---|---|---|
| | Own collected dataset | 138 | MQDF | 85.90 |
| | Own prepared dataset with 20,000 samples | 50 | Convnets | 85.96 |
| | BanglaLekha-Isolated | 80 | Convnets | 88.93 |
| | Own collected dataset | 152 | SVM | 94.3 |
| | CMATERdb | 73 | Modified ResNet18 | 95.10 |
| | NumtaDB | 10 | Ensemble | 96.69 |
| | Ekush | 122 | Convnets | 97.73 |
| Proposed method | Ekush | 122 | Stacked Generalization | 98.68 |
| Proposed method | Ekush | 122 | Bagging | 98.37 |
| Proposed method | BanglaLekha-Isolated | 84 | Stacked Generalization | 92.67 |
| Proposed method | BanglaLekha-Isolated | 84 | Bagging | 93.55 |
Figure 9. Validation accuracy vs. epochs for the convnets on the Ekush dataset.
Figure 10. Validation loss vs. epochs for the convnets on the Ekush dataset.
All models’ performances on the Ekush and BanglaLekha-Isolated datasets.
| Dataset | Methods | Models | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| Ekush Dataset | Convnets | AlexNet | 96.80 | 96.80 | 96.79 | 96.80 |
| | | VGG16 | 97.06 | 97.05 | 97.04 | 97.05 |
| | | VGG19 | 96.99 | 96.97 | 96.97 | 96.97 |
| | | ResNet50 | 97.82 | 97.81 | 97.81 | 97.81 |
| | | Xception | 97.64 | 97.63 | 97.63 | 97.63 |
| | | DenseNet | 96.31 | 96.25 | 96.26 | 96.25 |
| | | Small CNN | 97.33 | 97.30 | 97.30 | 97.30 |
| | ResNet50 as the feature extractor | Logistic Regression | 97.76 | 97.76 | 97.76 | 97.76 |
| | | SVM | 97.76 | 97.75 | 97.75 | 97.75 |
| | | Naïve Bayes | 97.20 | 97.15 | 97.16 | 97.15 |
| | | Decision Tree | 95.75 | 95.74 | 95.74 | 95.74 |
| | Ensemble | Stacked Generalization | 98.69 | 98.68 | 98.68 | 98.68 |
| | | Bootstrap Aggregating | 98.38 | 98.37 | 98.37 | 98.37 |
| | | AdaBoost | 96.42 | 96.36 | 96.36 | 96.37 |
| | | XGBoost | 96.59 | 96.58 | 96.58 | 96.58 |
| | | Random Forest | 97.33 | 97.32 | 97.31 | 97.32 |
| BanglaLekha-Isolated Dataset | Convnets | AlexNet | 88.99 | 88.88 | 88.89 | 88.88 |
| | | VGG16 | 92.16 | 92.11 | 92.10 | 92.11 |
| | | VGG19 | 89.86 | 89.75 | 89.76 | 89.75 |
| | | ResNet50 | 92.68 | 92.63 | 92.65 | 92.63 |
| | | Xception | 90.37 | 90.19 | 90.22 | 90.19 |
| | | DenseNet | 89.60 | 89.41 | 89.50 | 89.41 |
| | | Small CNN | 92.63 | 92.59 | 92.58 | 92.59 |
| | ResNet50 as the feature extractor | Logistic Regression | 92.24 | 92.17 | 92.19 | 92.17 |
| | | SVM | 92.09 | 92.03 | 92.04 | 92.03 |
| | | Naïve Bayes | 91.87 | 91.67 | 91.72 | 91.67 |
| | | Decision Tree | 90.12 | 90.04 | 90.06 | 90.04 |
| | Ensemble | Stacked Generalization | 92.78 | 92.67 | 92.67 | 92.67 |
| | | Bootstrap Aggregating | 93.60 | 93.55 | 93.55 | 93.55 |
| | | AdaBoost | 91.72 | 91.62 | 91.66 | 91.62 |
| | | XGBoost | 91.92 | 91.86 | 91.88 | 91.86 |
| | | Random Forest | 92.28 | 92.20 | 92.22 | 92.20 |
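The middle block of the table above corresponds to the second phase: activations from the best CNN (ResNet50) serve as features for shallow classifiers. A minimal sketch of that wiring follows; a fixed random projection stands in for the CNN feature extractor (an assumption, so the whole example stays self-contained), and the data and class count are synthetic.

```python
# Sketch of the "CNN as feature extractor" phase: extract fixed features,
# then fit a shallow classifier on them. A random ReLU projection stands
# in for ResNet50's penultimate-layer activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
images = rng.random((400, 28 * 28))       # flattened synthetic "images"
labels = rng.integers(0, 5, size=400)     # synthetic class labels

W = rng.standard_normal((28 * 28, 64))    # stand-in feature extractor
features = np.maximum(images @ W, 0)      # ReLU-like fixed projection

X_tr, X_te, y_tr, y_te = train_test_split(features, labels, random_state=0)
clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
preds = clf.predict(X_te)
```

With real data, `features` would come from the trained CNN's penultimate layer instead of the random projection.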
Figure 11. First- and second-level classifiers' performances on the Ekush dataset.
Figure 12. Individual and bagged classifiers' performances on the Ekush dataset.
Training and testing time of different models on the Ekush dataset.
| Model | Time to process an image during training (ms) | Time to predict an image during testing (ms) |
|---|---|---|
| AlexNet | 0.9766 | 0.3672 |
| Small CNN | 0.3916 | 0.1582 |
| DenseNet | 1.468 | 0.4307 |
| ResNet50 | 0.4727 | 0.2031 |
| VGG16 | 0.332 | 0.1572 |
| VGG19 | 0.3271 | 0.336 |
| Xception | 0.5469 | 0.1914 |
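Per-image times like those in the table are typically obtained by dividing the wall-clock time for a batch by the number of images; the exact measurement setup is not given in this excerpt, so the sketch below uses a trivial stand-in for model inference.

```python
# Generic per-image timing sketch: wall-clock time over a batch divided
# by the batch size. A trivial function stands in for model.predict.
import time
import numpy as np

def predict(batch):
    # Stand-in for a model's inference call.
    return batch.sum(axis=1)

batch = np.random.random((1000, 784))
start = time.perf_counter()
predict(batch)
elapsed_ms = (time.perf_counter() - start) * 1000.0
per_image_ms = elapsed_ms / len(batch)
```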
False positives and false negatives for a few classes of the ResNet50 model on the Ekush dataset.
| Performance | Class | Support | False negatives | False positives |
|---|---|---|---|---|
| Poor | 111 | 156 | 47 | 32 |
| | 46 | 894 | 60 | 60 |
| | 84 | 872 | 63 | 53 |
| | 97 | 642 | 48 | 53 |
| Average | 103 | 924 | 46 | 2 |
| | 16 | 926 | 15 | 15 |
| | 39 | 926 | 42 | 46 |
| Good | 20 | 928 | 5 | 2 |
| | 96 | 926 | 3 | 6 |
Figure 13. A few instances from the Ekush dataset misclassified by the ResNet50 model.