Tuyen Danh Pham, Dat Tien Nguyen, Chanhum Park, Kang Ryoung Park.
Abstract
Automatic sorting of banknotes in payment facilities, such as automated payment machines or vending machines, involves several tasks, such as recognition of the banknote type, classification of fitness for recirculation, and counterfeit detection. Previous studies addressing these problems have mostly reported on each classification task separately and for a specific type of currency only. In other words, little research has considered a combination of these tasks, such as classification of both banknote denomination and banknote fitness, under a multinational currency condition. To overcome this issue, we propose a multinational banknote type and fitness classification method that both recognizes the denomination and input direction of a banknote and determines whether the banknote is suitable for reuse or should be replaced by a new one. We also propose a method for estimating the fitness value of banknotes and the consistency of the estimation results among input trials of a banknote. Our method is based on a combination of infrared-light transmission and visible-light reflection images of the input banknote and uses deep-learning techniques with a convolutional neural network. Experimental results on a dataset composed of Indian rupee (INR), Korean won (KRW), and United States dollar (USD) banknote images with a mixture of two and three fitness levels showed that the proposed method performs well under the combined condition of multiple currency types and classification tasks.
Keywords: deep learning; fitness value estimation; infrared-light transmission image; multinational banknote type and fitness classification; visible-light reflection image
Year: 2019 PMID: 30781367 PMCID: PMC6412798 DOI: 10.3390/s19040792
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Summary of related works on banknote recognition and fitness classification considering the variety of currency types.
| Category | Subcategory | Method | Advantage | Disadvantage |
|---|---|---|---|---|
| Banknote recognition | Single Currency Recognition | - Using HSV color features and template matching [ | Simple in image acquisition and classification, as the recognition process is conducted on separate currency types with visible-light images. | Currency type needs to be manually selected before recognition. |
| Banknote recognition | Multiple Currency Recognition | - Using GA for optimizing feature extraction and NN for classifying [ | Multinational banknote classification methods do not require pre-selection of the currency type. | The classification task becomes complex as the number of classes increases. |
| Banknote fitness classification | Using Single Sensor | - Using image morphology and Otsu’s thresholding [ | Simple in image acquisition, as only one type of visible-light banknote image is used. | Currency type needs to be manually selected. |
| Banknote fitness classification | Using Multiple Sensors | - Using fuzzy system on VR and NIRT banknote images [ | Performance can be enhanced by using multiple imaging sensors. | Complex and expensive hardware implementation. |
| Banknote type and fitness classification (proposed method) | | Using CNN for banknote recognition and fitness classification of banknotes from multiple countries with VR and IRT images. | Takes advantage of deep learning with a CNN for a large number of classes when combining banknote recognition and fitness classification into one classifier. | A time-consuming CNN training procedure is required. |
Summary of banknote datasets used for experiments in the previous studies (Ref: Reference(s), N/A: Not Available).
| Category | Subcategory | Ref. | Currency Type | Output Description | Dataset Availability |
|---|---|---|---|---|---|
| Banknote Recognition | Single Currency Recognition | [ | INR, AUD, EUR, SAR, USD | 2 denominations for each of INR, AUD, EUR, and SAR. USD was not reported. | N/A |
| Banknote Recognition | Single Currency Recognition | [ | USD, RMB, EUR | 24 classes of USD, 20 classes of RMB, and 28 classes of EUR. | N/A |
| Banknote Recognition | Single Currency Recognition | [ | USD, Angola (AOA), Malawi (MWK), South Africa (ZAR) | 68 classes of USD, 36 classes of AOA, 24 classes of MWK, and 40 classes of ZAR. | N/A |
| Banknote Recognition | Single Currency Recognition | [ | Hong Kong (HKD), Kazakhstan (KZT), Colombia (COP), USD | 128 classes of HKD, 60 classes of KZT, 32 classes of COP, and 68 classes of USD. | N/A |
| Banknote Recognition | Multiple Currency Recognition | [ | Japan (JPY), Italy (ITL), Spain (ESP), France (FRF) | 23 denominations. | N/A |
| Banknote Recognition | Multiple Currency Recognition | [ | KRW, USD, EUR, CNY, RUB | 55 denominations. | N/A |
| Banknote Recognition | Multiple Currency Recognition | [ | CNY, EUR, JPY, KRW, RUB, USD | 248 classes of 62 denominations. | DMC-DB1 [ |
| Banknote Recognition | Multiple Currency Recognition | [ | 23 countries (USD, RUB, KZT, JPY, INR, EUR, CNY, etc.) | 101 denominations. | N/A |
| Banknote Fitness Classification | | [ | INR, KRW, USD | 5 classes: 3 classes in case 1 (fit, normal, and unfit) and 2 classes in case 2 (fit and unfit). | DF-DB2 [ |
| Banknote Fitness Classification | | [ | EUR, RUB | 2 classes (fit and unfit). | N/A |
| Banknote Fitness Classification | | [ | USD, KRW, INR | 2 classes (fit and unfit). | N/A |
| Banknote Fitness Classification | | [ | KRW, INR, USD | 3 classes of KRW and INR (fit, normal, and unfit); 2 classes of USD (fit and unfit). | DF-DB1 [ |
| Banknote type and fitness classification (proposed method) | | | INR, KRW, USD | 116 classes of banknote kinds and fitness levels. | DF-DB3 [ |
Figure 1Overall flowchart of the proposed method.
Figure 2Example of KRW banknote images captured by the device in: (a) front side’s forward VR image (A direction); (b) front side’s backward VR image (B direction); (c) back side’s forward VR image (C direction); (d) back side’s backward VR image (D direction); (e) forward IRT image and (f) backward IRT image; (g–l) are the corresponding banknote region segmented images from the original images in (a–f), respectively.
Figure 3Convolutional neural network (CNN) architectures used for comparisons in our research.
Architecture of AlexNet used in this study (unit: pixel).
| Layer Name | Filter Size | Stride | Padding | Number of Filters | Output Feature Map Size | |
|---|---|---|---|---|---|---|
| Image Input Layer | 120 × 240 × 3 | |||||
| Conv1 | Conv. | 7 × 7 × 3 | 2 | 0 | 96 | 57 × 117 × 96 |
| CCN | ||||||
| Max Pooling | 3 × 3 | 2 | 0 | 28 × 58 × 96 | ||
| Conv2 | Conv. | 5 × 5 × 96 | 1 | 2 | 128 | 28 × 58 × 128 |
| CCN | ||||||
| Max Pooling | 3 × 3 | 2 | 0 | 13 × 28 × 128 | ||
| Conv3 | Conv. | 3 × 3 × 128 | 1 | 1 | 256 | 13 × 28 × 256 |
| Conv4 | Conv. | 3 × 3 × 256 | 1 | 1 | 256 | 13 × 28 × 256 |
| Conv5 | Conv. | 3 × 3 × 256 | 1 | 1 | 128 | 13 × 28 × 128 |
| Max Pooling | 3 × 3 | 2 | 0 | 6 × 13 × 128 | ||
| Fully Connected Layers | Fc1 | 4096 | ||||
| Fc2 | 2048 | |||||
| Dropout | ||||||
| Fc3 | 116 | |||||
| Softmax | ||||||
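As a sanity check on the table, the output feature-map sizes follow the standard convolution/pooling size formula, out = ⌊(in + 2·pad − k)/stride⌋ + 1. The short plain-Python sketch below (illustrative, not code from the paper) reproduces the AlexNet size column:

```python
def conv_out(size, kernel, stride, pad=0):
    """Standard output-size formula: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

h, w = 120, 240                                     # input banknote image (pixels)
h, w = conv_out(h, 7, 2), conv_out(w, 7, 2)         # Conv1: 7x7, stride 2
assert (h, w) == (57, 117)
h, w = conv_out(h, 3, 2), conv_out(w, 3, 2)         # Max pooling: 3x3, stride 2
assert (h, w) == (28, 58)
h, w = conv_out(h, 5, 1, 2), conv_out(w, 5, 1, 2)   # Conv2: 5x5, stride 1, pad 2
assert (h, w) == (28, 58)
h, w = conv_out(h, 3, 2), conv_out(w, 3, 2)         # Max pooling
assert (h, w) == (13, 28)
# Conv3-Conv5 (3x3, stride 1, pad 1) preserve the 13 x 28 size
h, w = conv_out(h, 3, 2), conv_out(w, 3, 2)         # Final max pooling
assert (h, w) == (6, 13)                            # fed to Fc1 (4096)
```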
Architecture of GoogLeNet used in this study. “Conv. 1×1 (a)” and “Conv. 1×1 (b)” denote the 1 × 1 convolutional layers used for 3 × 3 and 5 × 5 convolutional computing reduction, respectively; “Conv. 1×1 (c)” denotes the 1 × 1 convolutional layers used for 3 × 3 pooling dimensional matching (unit of filter size, stride, and feature map size: pixel).
| Layer Name | Filter Size/Stride | Number of Filters | Output Feature Map Size | |||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Conv. 1 × 1 | Conv. 1 × 1 (a) | Conv. 3 × 3 | Conv. 1 × 1 (b) | Conv. 5 × 5 | Conv. 1 × 1 (c) | |||||
| Image Input Layer | 120 × 240 × 3 | |||||||||
| Conv1 | Conv. | 7 × 7/2 | 64 | 60 × 120 × 64 | ||||||
| Max Pooling | 3 × 3/2 | 30 × 60 × 64 | ||||||||
| CCN | ||||||||||
| Conv2 | Conv. | 1 × 1/1 | 64 | 30 × 60 × 64 | ||||||
| Conv. | 3 × 3/1 | 192 | 30 × 60 × 192 | |||||||
| CCN | ||||||||||
| Max Pooling | 3 × 3/2 | 15 × 30 × 192 | ||||||||
| Conv3 | Inception3a | 64 | 96 | 128 | 16 | 32 | 32 | 15 × 30 × 256 | ||
| Inception3b | 128 | 128 | 192 | 32 | 96 | 64 | 15 × 30 × 480 | |||
| Max Pooling | 3 × 3/2 | 7 × 15 × 480 | ||||||||
| Conv4 | Inception4a | 192 | 96 | 208 | 16 | 48 | 64 | 7 × 15 × 512 | ||
| Inception4b | 160 | 112 | 224 | 24 | 64 | 64 | 7 × 15 × 512 | |||
| Inception4c | 128 | 128 | 256 | 24 | 64 | 64 | 7 × 15 × 512 | |||
| Inception4d | 112 | 144 | 288 | 32 | 64 | 64 | 7 × 15 × 528 | |||
| Inception4e | 256 | 160 | 320 | 32 | 128 | 128 | 7 × 15 × 832 | |||
| Max Pooling | 3 × 3/2 | 3 × 7 × 832 | ||||||||
| Conv5 | Inception5a | 256 | 160 | 320 | 32 | 128 | 128 | 3 × 7 × 832 | ||
| Inception5b | 384 | 192 | 384 | 48 | 128 | 128 | 3 × 7 × 1024 | |||
| Average Pooling | 3 × 7/1 | 1 × 1 × 1024 | ||||||||
| Fully-Connected Layer | Dropout | |||||||||
| Fc | 116 | |||||||||
| Softmax | ||||||||||
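The output depth of each Inception module can be read directly off the table: the four parallel branches are concatenated, so the depth is the sum of the 1 × 1, 3 × 3, 5 × 5, and pool-projection filter counts, while the reduction layers “(a)” and “(b)” only feed the 3 × 3 and 5 × 5 branches internally. A small plain-Python check (illustrative, not from the paper):

```python
def inception_depth(n1x1, n3x3_reduce, n3x3, n5x5_reduce, n5x5, pool_proj):
    """Output depth of an Inception module: the four branch outputs are
    concatenated; the 1x1 reduction layers only feed the 3x3/5x5 branches,
    so they do not contribute to the output depth."""
    return n1x1 + n3x3 + n5x5 + pool_proj

assert inception_depth(64, 96, 128, 16, 32, 32) == 256      # Inception3a
assert inception_depth(128, 128, 192, 32, 96, 64) == 480    # Inception3b
assert inception_depth(192, 96, 208, 16, 48, 64) == 512     # Inception4a
assert inception_depth(256, 160, 320, 32, 128, 128) == 832  # Inception4e
assert inception_depth(384, 192, 384, 48, 128, 128) == 1024 # Inception5b
```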
Architecture of ResNet-18 used in this study (unit: pixels).
| Layer Name | Filter Size | Stride | Padding | Number of Filters | Output Feature Map Size | ||
|---|---|---|---|---|---|---|---|
| Image Input Layer | 120 × 240 × 3 | ||||||
| Conv1 | Conv. | 7 × 7 × 3 | 2 | 3 | 64 | 60 × 120 × 64 | |
| BN | |||||||
| Max Pooling | 3 × 3 | 2 | 1 | 30 × 60 × 64 | |||
| Conv2 | Res2a | Conv. | 3 × 3 × 64 | 1 | 1 | 64 | 30 × 60 × 64 |
| Conv. | 3 × 3 × 64 | 1 | 1 | 64 | |||
| Res2b | Conv. | 3 × 3 × 64 | 1 | 1 | 64 | 30 × 60 × 64 | |
| Conv. | 3 × 3 × 64 | 1 | 1 | 64 | |||
| Conv3 | Res3a | Conv. | 3 × 3 × 64 | 2 | 1 | 128 | 15 × 30 × 128 |
| Conv. | 3 × 3 × 128 | 1 | 1 | 128 | |||
| Conv. (Shortcut) | 1 × 1 × 64 | 2 | 0 | 128 | |||
| Res3b | Conv. | 3 × 3 × 128 | 1 | 1 | 128 | 15 × 30 × 128 | |
| Conv. | 3 × 3 × 128 | 1 | 1 | 128 | |||
| Conv4 | Res4a | Conv. | 3 × 3 × 128 | 2 | 1 | 256 | 8 × 15 × 256 |
| Conv. | 3 × 3 × 256 | 1 | 1 | 256 | |||
| Conv. (Shortcut) | 1 × 1 × 128 | 2 | 0 | 256 | |||
| Res4b | Conv. | 3 × 3 × 256 | 1 | 1 | 256 | 8 × 15 × 256 | |
| Conv. | 3 × 3 × 256 | 1 | 1 | 256 | |||
| Conv5 | Res5a | Conv. | 3 × 3 × 256 | 2 | 1 | 512 | 4 × 8 × 512 |
| Conv. | 3 × 3 × 512 | 1 | 1 | 512 | |||
| Conv. (Shortcut) | 1 × 1 × 256 | 2 | 0 | 512 | |||
| Res5b | Conv. | 3 × 3 × 512 | 1 | 1 | 512 | 4 × 8 × 512 | |
| Conv. | 3 × 3 × 512 | 1 | 1 | 512 | |||
| Average Pooling | 4 × 8 | 1 | 0 | | 1 × 1 × 512 | | |
| Fully-Connected Layers | Fc | 116 | |||||
| Softmax | |||||||
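The spatial sizes in the ResNet-18 table follow the same formula, out = ⌊(in + 2·pad − k)/stride⌋ + 1, with downsampling at Conv1, the first max pooling, and the first (stride-2) residual block of each of Conv3 to Conv5. A plain-Python sketch reproducing the schedule (illustrative, not from the paper):

```python
def conv_out(size, kernel, stride, pad):
    """Output-size formula: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

h, w = 120, 240
h, w = conv_out(h, 7, 2, 3), conv_out(w, 7, 2, 3)   # Conv1: 7x7, stride 2, pad 3
assert (h, w) == (60, 120)
h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)   # Max pooling: 3x3, stride 2, pad 1
assert (h, w) == (30, 60)
# Res3a, Res4a, Res5a each halve the map with a stride-2 3x3 convolution
for expected in [(15, 30), (8, 15), (4, 8)]:
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    assert (h, w) == expected
# The 4 x 8 average pooling then collapses the map to 1 x 1 x 512 before Fc
```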
Architecture of ResNet-50 used in this study (unit: pixels).
| Layer Name | Filter Size | Stride | Padding | Number of Filters | Output Feature Map Size | ||
|---|---|---|---|---|---|---|---|
| Image Input Layer | 120 × 240 × 3 | ||||||
| Conv1 | Conv. | 7 × 7 × 3 | 2 | 3 | 64 | 60 × 120 × 64 | |
| BN | |||||||
| Max Pooling | 3 × 3 | 2 | 1 | 29 × 59 × 64 | |||
| Conv2 | Res2a | Conv. | 1 × 1 × 64 | 1 | 0 | 64 | 29 × 59 × 256 |
| Conv. | 3 × 3 × 64 | 1 | 1 | 64 | |||
| Conv. | 1 × 1 × 64 | 1 | 0 | 256 | |||
| Conv. (Shortcut) | 1 × 1 × 64 | 1 | 0 | 256 | |||
| Res2b-c | Conv. | 1 × 1 × 256 | 1 | 0 | 64 | 29 × 59 × 256 | |
| Conv. | 3 × 3 × 64 | 1 | 1 | 64 | |||
| Conv. | 1 × 1 × 64 | 1 | 0 | 256 | |||
| Conv3 | Res3a | Conv. | 1 × 1 × 256 | 2 | 0 | 128 | 15 × 30 × 512 |
| Conv. | 3 × 3 × 128 | 1 | 1 | 128 | |||
| Conv. | 1 × 1 × 128 | 1 | 0 | 512 | |||
| Conv. (Shortcut) | 1 × 1 × 256 | 2 | 0 | 512 | |||
| Res3b-d | Conv. | 1 × 1 × 512 | 1 | 0 | 128 | 15 × 30 × 512 | |
| Conv. | 3 × 3 × 128 | 1 | 1 | 128 | |||
| Conv. | 1 × 1 × 128 | 1 | 0 | 512 | |||
| Conv4 | Res4a | Conv. | 1 × 1 × 512 | 2 | 0 | 256 | 8 × 15 × 1024 |
| Conv. | 3 × 3 × 256 | 1 | 1 | 256 | |||
| Conv. | 1 × 1 × 256 | 1 | 0 | 1024 | |||
| Conv. (Shortcut) | 1 × 1 × 512 | 2 | 0 | 1024 | |||
| Res4b-f | Conv. | 1 × 1 × 1024 | 1 | 0 | 256 | 8 × 15 × 1024 | |
| Conv. | 3 × 3 × 256 | 1 | 1 | 256 | |||
| Conv. | 1 × 1 × 256 | 1 | 0 | 1024 | |||
| Conv5 | Res5a | Conv. | 1 × 1 × 1024 | 2 | 0 | 512 | 4 × 8 × 2048 |
| Conv. | 3 × 3 × 512 | 1 | 1 | 512 | |||
| Conv. | 1 × 1 × 512 | 1 | 0 | 2048 | |||
| Conv. (Shortcut) | 1 × 1 × 1024 | 2 | 0 | 2048 | |||
| Res5b-c | Conv. | 1 × 1 × 2048 | 1 | 0 | 512 | 4 × 8 × 2048 | |
| Conv. | 3 × 3 × 512 | 1 | 1 | 512 | |||
| Conv. | 1 × 1 × 512 | 1 | 0 | 2048 | |||
| Average Pooling | 4 × 8 | 1 | 0 | 1 × 1 × 2048 | |||
| Fully Connected Layers | Fc | 116 | |||||
| Softmax | |||||||
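Each ResNet-50 block in the table is a bottleneck: a 1 × 1 reduction, a 3 × 3 convolution, and a 1 × 1 expansion whose output depth is four times the middle width, which is why the stage depths jump to 256/512/1024/2048 while the 3 × 3 layers stay narrow. A one-line plain-Python check (illustrative):

```python
def bottleneck_depths(mid):
    """Channel widths of a ResNet-50 bottleneck block:
    1x1 reduce -> 3x3 -> 1x1 expand (4x the middle width)."""
    return (mid, mid, 4 * mid)

# Middle widths per stage in the table above -> stage output depths
for mid, out in [(64, 256), (128, 512), (256, 1024), (512, 2048)]:
    assert bottleneck_depths(mid)[-1] == out
```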
Figure 4Structure of the Inception module.
Figure 5Structures of residual blocks: (a) two-layer deep block; (b) three-layer deep (bottleneck) block.
Figure 6Comparison of average pixel values of VR banknote images from (a) INR10 front forward direction (INR10A); (b) back side of banknotes in (a) (INR10C); (c) KRW5000 front backward direction (KRW5000B); and (d) back side of banknotes in (c) (KRW5000D).
Figure 7Examples of INR banknote images: (a) fit; (b) normal; and (c) unfit banknotes. From left to right of each figure are the VR1, VR2, and IRT images of the input banknote, respectively.
Figure 8Examples of KRW banknote images: (a) fit; (b) normal; and (c) unfit banknotes. Images in each figure have the same order as those in Figure 7.
Figure 9Examples of USD banknote images: (a) fit and (b) unfit banknotes. Images on the left and right of each figure are the VR and IRT images of the input banknote, respectively.
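The architecture tables all take a 120 × 240 × 3 input, and the figures above show two VR images and one IRT image per banknote. One plausible composition, which is an assumption of this sketch rather than a detail confirmed by this excerpt (including the hypothetical helper name `compose_input`), is to stack the three grayscale images as the input channels:

```python
import numpy as np

def compose_input(vr1, vr2, irt, size=(120, 240)):
    """Stack two visible-light reflection (VR) images and one infrared-light
    transmission (IRT) image into one 3-channel CNN input.
    NOTE: this channel assignment is an illustrative assumption, not a
    specification taken from the paper."""
    channels = [np.asarray(img, dtype=np.float32) for img in (vr1, vr2, irt)]
    for ch in channels:
        assert ch.shape == size, "segmented banknote images must be pre-resized"
    return np.stack(channels, axis=-1)  # shape: (120, 240, 3)

x = compose_input(np.zeros((120, 240)),
                  np.ones((120, 240)),
                  np.full((120, 240), 0.5))
assert x.shape == (120, 240, 3)
```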
Number of input banknotes in the experimental multinational banknote fitness dataset.
| Banknote Type | Number of Banknotes | Number of Banknotes after Data Augmentation | ||||
|---|---|---|---|---|---|---|
| Fit | Normal | Unfit | Fit | Normal | Unfit | |
| INR10 | 1299 | 553 | 196 | 2598 | 2212 | 1960 |
| INR20 | 898 | 456 | 57 | 2694 | 2280 | 1425 |
| INR50 | 719 | 235 | 206 | 1438 | 1175 | 2060 |
| INR100 | 1477 | 1464 | 243 | 2954 | 2928 | 1944 |
| INR500 | 1399 | 435 | 130 | 2798 | 2175 | 1950 |
| INR1000 | 153 | 755 | 71 | 1530 | 2265 | 1775 |
| KRW1000 | 3690 | 3344 | 2695 | 3690 | 3344 | 2695 |
| KRW5000 | 3861 | 3291 | 3196 | 3861 | 4045 | 3196 |
| KRW10000 | 3900 | 3779 | N/A | 3900 | 3779 | N/A |
| KRW50000 | 3794 | 3799 | N/A | 3794 | 3799 | N/A |
| USD5 | 177 | N/A | 111 | 3540 | N/A | 2775 |
| USD10 | 384 | N/A | 83 | 3072 | N/A | 2075 |
| USD20 | 390 | N/A | 51 | 3120 | N/A | 1275 |
| USD50 | 851 | N/A | 42 | 4255 | N/A | 1050 |
| USD100 | 772 | N/A | 90 | 3860 | N/A | 2250 |
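The augmented counts in the table are integer multiples of the original counts, with larger replication factors applied to the scarcer classes (e.g., INR10 unfit 10×, USD50 unfit 25×) while the already-plentiful KRW classes are mostly left unaugmented. The per-class factors below are inferred from the table, not stated in it:

```python
def augmentation_factor(original, augmented):
    """Per-class replication factor implied by the dataset table."""
    assert augmented % original == 0, "not an integer replication factor"
    return augmented // original

assert augmentation_factor(1299, 2598) == 2   # INR10 fit
assert augmentation_factor(196, 1960) == 10   # INR10 unfit
assert augmentation_factor(177, 3540) == 20   # USD5 fit
assert augmentation_factor(42, 1050) == 25    # USD50 unfit
assert augmentation_factor(3690, 3690) == 1   # KRW1000 fit (no augmentation)
```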
Figure 10Average accuracy of banknote type and fitness classification using various CNN architectures with training from scratch and transfer learning.
Figure 11Average accuracy calculated separately for banknote recognition and fitness classification using various CNN architectures with training from scratch and transfer learning.
Figure 12Examples of error cases in the testing of our proposed method: (a) Case 1—unfit banknote with banknote type correctly recognized but fitness level misclassified as normal; (b) Case 2—unfit banknote with misclassified banknote type; and (c) Case 3—misclassification of both banknote type and fitness. In (a) and (c), images are arranged in the same order as in Figure 7; in (b), images are arranged in the same order as in Figure 9.
Comparative experimental results of the proposed method with and without banknote region segmentation.
| Method | AlexNet | ResNet-18 | ||||
|---|---|---|---|---|---|---|
| Banknote Recognition Accuracy | Fitness Classification Accuracy | Overall Accuracy | Banknote Recognition Accuracy | Fitness Classification Accuracy | Overall Accuracy | |
| Using Original Banknote Image | 99.535 | 97.035 | 96.773 | 99.608 | 97.532 | 97.408 |
| Using Segmented Banknote Image (Proposed Method) | 99.935 | 97.928 | 97.926 | 99.936 | 97.678 | 97.690 |
Figure 13Average testing accuracy of five-fold cross-validation with overall accuracy and accuracy calculated separately for banknote recognition and fitness classification using various CNN architectures with models trained from scratch.
Figure 14Average root-mean-squared error (RMSE) of banknote fitness estimation using various regression CNN architectures with training from scratch and transfer learning.
Figure 15Examples of input banknotes with desired fitness values and their trials with predicted fitness values.
Figure 16Average quantization error (AQE) of banknote fitness estimation using various regression CNN architectures with training from scratch and transfer learning.
Figure 17Adjusted average quantization error (AQE) of banknote fitness estimation using various regression CNN architectures with training from scratch and transfer learning, in which the differences of estimated values among input trials of a banknote are taken into account.
Comparison of the banknote type and fitness classification and banknote fitness value estimation results of the proposed method with those of a previous method. “RMSE” denotes root-mean-squared error; “AQE” denotes average quantization error.
| Method | Banknote Type and Fitness Classification Accuracy (unit: %) | Banknote Fitness Value Estimation | ||||
|---|---|---|---|---|---|---|
| Banknote Recognition Accuracy | Fitness Classification Accuracy | Overall Accuracy | RMSE | AQE | ||
| Using VR Images and AlexNet [ | 99.955 | 95.040 | 95.038 | 1.048 | 0.0688 | |
| Using IRT, VR images and CNNs (Proposed Method) | AlexNet | 99.935 | 97.928 | 97.926 | 0.920 | 0.0513 |
| ResNet-18 | 99.963 | 97.678 | 97.690 | 1.041 | 0.0522 | |
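For reference, the two reported regression metrics can be computed as follows. RMSE is standard; the exact definition of AQE is not given in this excerpt, so the sketch below (plain Python) uses one plausible reading, the mean absolute distance of each estimated fitness value to the nearest discrete fitness level, purely as an illustration:

```python
import math

def rmse(predicted, desired):
    """Root-mean-squared error between estimated and desired fitness values."""
    return math.sqrt(sum((p - d) ** 2 for p, d in zip(predicted, desired))
                     / len(predicted))

def aqe(predicted, levels=(1.0, 2.0, 3.0)):
    """Average quantization error: mean absolute distance of each estimate to
    the nearest discrete fitness level. NOTE: both the level values and this
    reading of AQE are illustrative assumptions, not taken from the paper."""
    return sum(min(abs(p - lv) for lv in levels) for p in predicted) / len(predicted)

# A perfect estimator yields RMSE = 0 and AQE = 0
assert rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0
assert aqe([1.0, 2.0, 3.0]) == 0.0
```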