Asghar Ali Chandio, Md Asikuzzaman, Mark Pickering, Mehwish Leghari.
Abstract
Reading text in natural scene images is an active research area in computer vision and pattern recognition because it requires text detection, text recognition and script identification. In this data article, a comprehensive dataset for Urdu text detection and recognition in natural scene images is presented and analysed. To develop the dataset, more than 2500 natural scene images were captured using a digital camera and built-in mobile phone cameras. Three separate datasets were developed: isolated Urdu character images, cropped word images and end-to-end text spotting. The isolated Urdu character and cropped word image datasets contain a much larger number of samples than existing Arabic natural scene text datasets. The Urdu text spotting dataset contains images with Urdu, English and Sindhi text instances, although the focus is on the Urdu text instances. Ground truths for each image in the isolated character, cropped word and text spotting datasets are provided separately. The proposed datasets can be used to perform Urdu text detection and recognition or end-to-end recognition in natural scenes. They can also help in developing Arabic and Persian natural scene text detection and recognition systems, as the Urdu script is derived from these scripts and shares many letters with them. The datasets can further support multi-language translation systems, which can help foreign tourists read and translate multilingual text in natural scene images. To evaluate the datasets, state-of-the-art machine learning methods and deep neural networks were used to build the text detection and recognition models that achieved the best classification accuracies. To the best of the authors' knowledge, this is the first dataset proposed for Urdu text detection, recognition or end-to-end text recognition in natural scene images.
The aim of this data article is to present a benchmark work in the field of document analysis and recognition.
Specifications Table
| Field | Details |
|---|---|
| Subject | Computer Science |
| Specific subject area | Computer Vision and Pattern Recognition |
| Type of data | Tables, Figures, Images, Text Files |
| How data were acquired | Using a digital camera with a 20 megapixel (MP) sensor, an iPhone with a 12 MP back camera and a Samsung mobile with a 16 MP back camera. |
| Data format | Raw, Analyzed |
| Parameters for data collection | Environmental factors such as illumination, blurring and lighting conditions were considered while capturing images. The focus was given to the text within an image. |
| Description of data collection | The images in the dataset were obtained from advertisement banners, sign-boards along roadsides and streets, shop name boards, and text written on passing vehicles and walls. |
| Data source location | The images provided in this dataset were collected in different cities of Sindh, Pakistan. |
| Data accessibility | Summarized data are hosted with the article. The datasets and their related files are hosted in a Mendeley public data repository. DOI: http://dx.doi.org/10.17632/k5fz57zd9z.1 URL: https://data.mendeley.com/datasets/k5fz57zd9z/1 |
Keywords: Convolutional neural networks; Cursive text in the wild; Multilingual text spotting dataset; Natural scene images; Urdu natural scene text dataset; Urdu text detection; Urdu text recognition
Year: 2020 PMID: 32490098 PMCID: PMC7262424 DOI: 10.1016/j.dib.2020.105749
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Natural scene images with text variations and complex backgrounds.
Fig. 2. Some examples of scene images in the Urdu-Text dataset. First row: machine-printed text; second row: handwritten text; third row: handwritten and machine-printed text with blur and uneven lighting conditions.
Fig. 3. Some example images in the dataset with multilingual text.
Characteristics of Urdu, Arabic and Persian Scripts
| Characteristics | Urdu | Arabic | Persian |
|---|---|---|---|
| Number of Characters | 39 | 28 | 32 |
| Writing Direction | Right to left | Right to left | Right to left |
| Cursive | Yes | Yes | Yes |
| Dots and Diacritics | Yes | Yes | Yes |
Fig. 4. Some examples with challenging text in the Urdu-Text dataset: (a) ligature overlapping and context sensitivity, (b) diagonal text and (c) ligature overlapping on different baselines.
Fig. 5. Representation of ligatures in an Urdu word.
Fig. 6. Variations in writing styles: (a) the Urdu word سندھ written in ten different styles; (b) a ligature placed on top of another ligature.
Fig. 7. Some examples of stretched text in the cropped Urdu word image dataset.
Fig. 8. Manual segmentation of characters.
Fig. 9. Some examples of isolated character images in the Urdu-Char dataset.
Comparison of the Urdu-Char and Urdu-Word datasets with related datasets
| Datasets | No. of Images | No. of Character Images | No. of Word Images |
|---|---|---|---|
| ICDAR03 | 251 | 6185 | 1157 |
| ICDAR13 | 462 | — | 5003 |
| ICDAR15 | 1670 | — | 6545 |
| Chars74K | 1922 | 7705 English, 3345 Kannada | 1416 |
| ICDAR17 MLT Arabic | 800 | — | 3712 |
| ARASTI | 371 | 2093 | 1687 |
| EASTR | 2469 | 16624 Arabic, 5904 English | 2593 Arabic, 5172 English |
Fig. 10. Some examples of cropped Urdu word images.
Fig. 11. Some sample images in the Urdu-Text spotting dataset. Left: Urdu-Text images; right: annotation files.
Statistics of the Word Instances for each script
| Script Type | No. of Words |
|---|---|
| Urdu | 7603 |
| English | 5653 |
| Sindhi | 350 |
| Arabic | 68 |
| Symbols | 113 |
| Others | 1 |
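To put the counts above in proportion, the short sketch below computes each script's share of the annotated word instances. The counts are copied from the table; the names `word_counts` and `script_shares` are illustrative and are not part of the dataset or its tooling.

```python
# Word-instance counts copied from the table above.
word_counts = {
    "Urdu": 7603,
    "English": 5653,
    "Sindhi": 350,
    "Arabic": 68,
    "Symbols": 113,
    "Others": 1,
}


def script_shares(counts):
    """Return each script's fraction of all annotated word instances."""
    total = sum(counts.values())
    return {script: n / total for script, n in counts.items()}


shares = script_shares(word_counts)
for script, share in shares.items():
    # Print the share as a percentage with one decimal place.
    print(f"{script:8s} {share:6.1%}")
```

On these figures Urdu makes up a little over half of the word instances, which is consistent with the stated focus on Urdu text.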
Classification accuracy for each character class. Rows are ordered by ascending F-Score.
| Character Class | Precision | Recall | F-Score | Character Class | Precision | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| غ | 0.56 | 0.55 | 0.56 | ذ | 0.79 | 0.86 | 0.83 |
| ع | 0.62 | 0.55 | 0.58 | ہ | 0.89 | 0.79 | 0.84 |
| چ | 0.60 | 0.56 | 0.58 | ھ | 0.88 | 0.81 | 0.85 |
| ت | 0.63 | 0.64 | 0.63 | ہـ | 0.83 | 0.88 | 0.85 |
| ح | 0.65 | 0.61 | 0.63 | یـ | 0.85 | 0.86 | 0.86 |
| خ | 0.65 | 0.60 | 0.63 | ط | 0.85 | 0.87 | 0.86 |
| ب | 0.64 | 0.65 | 0.65 | م | 0.87 | 0.85 | 0.86 |
| ض | 0.59 | 0.77 | 0.67 | ز | 0.85 | 0.92 | 0.88 |
| پ | 0.69 | 0.68 | 0.68 | ظ | 0.89 | 0.87 | 0.88 |
| ص | 0.68 | 0.70 | 0.69 | ڈ | 0.88 | 0.88 | 0.88 |
| ف | 0.70 | 0.70 | 0.70 | س | 0.88 | 0.90 | 0.89 |
| ث | 0.71 | 0.70 | 0.71 | ٹ | 0.91 | 0.87 | 0.89 |
| ج | 0.71 | 0.71 | 0.71 | ک | 0.90 | 0.91 | 0.90 |
| ء | 0.67 | 0.77 | 0.72 | ل | 0.92 | 0.88 | 0.90 |
| ں | 0.81 | 0.68 | 0.74 | ے | 0.91 | 0.93 | 0.92 |
| ش | 0.77 | 0.72 | 0.74 | ا | 0.92 | 0.96 | 0.94 |
| ق | 0.84 | 0.66 | 0.74 | و | 0.95 | 0.94 | 0.94 |
| ن | 0.76 | 0.75 | 0.75 | ی | 0.95 | 0.95 | 0.95 |
| گ | 0.76 | 0.76 | 0.76 | ر | 0.94 | 0.96 | 0.95 |
| ڑ | 0.88 | 0.70 | 0.78 | ژ | 0.98 | 0.93 | 0.95 |
| د | 0.78 | 0.87 | 0.82 | آ | 0.94 | 0.97 | 0.96 |
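The F-Score column can be related to the other two columns with a small sketch. It assumes the reported F-Score is the balanced F1 measure (the harmonic mean of precision and recall); the record does not state this explicitly, and `f_score` is an illustrative helper rather than the authors' code. Because the published precision and recall are rounded to two decimals, recomputed values may not match every row exactly.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, published F-Score) for three rows of the table above.
rows = {
    "ض": (0.59, 0.77, 0.67),
    "ق": (0.84, 0.66, 0.74),
    "ڑ": (0.88, 0.70, 0.78),
}

for char, (p, r, published) in rows.items():
    # Compare the recomputed value with the published one.
    print(char, round(f_score(p, r), 2), published)
```

For these three rows the recomputed value agrees with the published F-Score at two decimal places.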
Accuracy (word recognition rate, WRR) for Urdu word image text recognition
| Model | RNN Type | No. of Hidden Units | WRR (%) |
|---|---|---|---|
| CNN + RNN + CTC | LSTM | 128 | 74.77 |
| CNN + RNN + CTC | BLSTM | 128 | 78.13 |
| Tesseract-OCR | — | — | 6.81 |
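The WRR column is read here as the word recognition rate, i.e. the percentage of cropped word images whose predicted transcription matches the ground truth exactly. That definition is an assumption, since the record does not spell it out, and the function and sample words below are hypothetical illustrations rather than material from the dataset.

```python
def word_recognition_rate(predictions, ground_truths):
    """Percentage of words whose prediction equals the ground truth exactly."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not ground_truths:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return 100.0 * correct / len(ground_truths)


# Hypothetical transcriptions; the last prediction differs in its final letter.
truth = ["سندھ", "پاکستان", "کراچی", "دکان"]
pred = ["سندھ", "پاکستان", "کراچی", "دکاں"]

print(word_recognition_rate(pred, truth))  # 75.0
```

Exact string comparison is strict: a single wrong letter counts the whole word as an error. In practice both strings would usually be normalised to the same Unicode form before comparison.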
Comparison of text detection accuracy on Urdu-Text spotting dataset
| Model | Precision | Recall | F-Score |
|---|---|---|---|
| EAST | 0.18 | 0.39 | 0.26 |
| CTPN | 0.32 | 0.71 | 0.43 |
| Proposed CNN without Pre-trained Weights | 0.22 | 0.45 | 0.30 |
| Proposed CNN with Pre-trained Weights | 0.29 | 0.70 | 0.37 |
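For context on how detection precision and recall of this kind are commonly obtained, the sketch below matches predicted boxes to ground-truth boxes by intersection over union (IoU). It is an assumption-laden illustration: the record does not describe the evaluation protocol, so the axis-aligned box format, the 0.5 threshold, the greedy matching and the function names are all hypothetical choices, not the authors' method.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_scores(predicted, ground_truth, threshold=0.5):
    """Greedy one-to-one matching; returns (precision, recall)."""
    unmatched = list(ground_truth)
    true_positives = 0
    for box in predicted:
        # Pick the still-unmatched ground-truth box with the highest overlap.
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical boxes: two good detections plus two false alarms.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred = [(1, 1, 10, 10), (20, 20, 29, 30), (50, 50, 60, 60), (70, 70, 80, 80)]
print(detection_scores(pred, gt))  # (0.5, 1.0)
```

The toy example shows the same pattern as the table: a detector that finds every text region but also raises false alarms has high recall and lower precision.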