Abstract
This article presents a dataset of handwritten Arabic alphabets, words and paragraphs (AHAWP). The dataset contains 65 different Arabic alphabets (covering the beginning, middle, end and regular forms), 10 different Arabic words (which together encompass all Arabic alphabets) and 3 different paragraphs. The dataset was collected anonymously from 82 different users. Each user was asked to write each alphabet and word 10 times. A userid uniquely but anonymously identifies the writer of each alphabet, word and paragraph. In total, the dataset consists of 53199 alphabet images, 8144 word images and 241 paragraph images. The dataset can serve several purposes. It can be used for optical handwriting recognition of alphabets and words, and for writer identification (or verification) of handwritten Arabic text. It also makes it possible to evaluate how a user's writing style for an isolated alphabet differs from the same alphabet written by that user as part of a word or paragraph. The dataset is publicly available at https://data.mendeley.com/datasets/2h76672znt/1.
Keywords: Arabic Text recognition; Handwritten Arabic alphabets; Handwritten Arabic paragraphs; Handwritten Arabic words; Writer identification
Year: 2022 PMID: 35242929 PMCID: PMC8866147 DOI: 10.1016/j.dib.2022.107947
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
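The counts quoted in the abstract can be cross-checked with simple arithmetic. The sketch below compares the nominal number of images implied by the collection protocol (items × users × repetitions) with the reported totals. The variable names are illustrative, and the assumption that each user wrote each paragraph once is ours; the abstract states the repetition count only for alphabets and words.

```python
# Figures taken from the abstract.
NUM_USERS = 82
REPETITIONS = 10  # stated for alphabets and words only

# (number of distinct items, repetitions per user)
protocol = {
    "alphabets": (65, REPETITIONS),
    "words": (10, REPETITIONS),
    # Assumption: each paragraph was written once per user (not stated in the abstract).
    "paragraphs": (3, 1),
}

# Totals reported in the abstract.
reported = {"alphabets": 53199, "words": 8144, "paragraphs": 241}

# Nominal count if every user had completed every item every time.
nominal = {
    name: items * NUM_USERS * reps
    for name, (items, reps) in protocol.items()
}

# Difference between the nominal and the reported totals.
shortfall = {name: nominal[name] - reported[name] for name in protocol}

for name in protocol:
    print(f"{name}: nominal={nominal[name]}, "
          f"reported={reported[name]}, shortfall={shortfall[name]}")
```

Under these assumptions the reported totals fall slightly below the nominal ones, which would be consistent with a small number of missing or discarded samples; the abstract itself does not explain the difference.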
Fig. 1. Arabic alphabets with variations collected in the dataset.
Fig. 2. Arabic words collected in the dataset.
Fig. 3. Arabic paragraphs collected in the dataset.
Fig. 4. Arabic alphabets handwritten by a user.
Fig. 5. Arabic words handwritten by a user.
Fig. 6. Arabic paragraph handwritten by a user.
| Subject | Computer Science |
| Specific subject area | Image processing, Optical Handwritten Text Recognition, Writer Identification |
| Type of data | Image |
| How the data were acquired | Users filled in forms (based on a fixed template) by hand, and the completed forms were then scanned |
| Data format | Raw |
| Description of data collection | Data was collected anonymously in a classroom setting. A “userid” was used to uniquely but anonymously identify the writer of each alphabet, word and paragraph. |
| Data source location | College of Computer Engineering and Science, Prince Mohammad Bin Fahd University |
| Data accessibility | Repository name: Mendeley Data |