Ammad Ul Islam, Muhammad Jaleed Khan, Muhammad Asad, Haris Ahmad Khan, Khurram Khurshid.
Abstract
This article presents a dataset of hyperspectral images of handwriting samples collected from 54 individuals. The dataset is intended to further explore the use of hyperspectral imaging in document image analysis and to benchmark forensic analysis methods for hyperspectral document images. Each hyperspectral cube in the dataset has a spatial resolution of 512 × 650 pixels and contains 149 spectral channels in the spectral range of 478-901 nm. The individuals have different personalities and their own writing patterns, and the age and gender of each individual were recorded. Each subject wrote twenty-eight sentences, each approximately 9 words or 33 characters long, using 12 different blue pens from different brands, as well as the English alphabet in capital and small letters and the digits 0 to 9. Previous methods use synthetic mixed samples created by joining different parts of images from the UWA WIHSI dataset. In contrast, each document in this dataset contains real mixed samples written with different pens and by different writers, with a variety of mixing ratios of inks and writers, for forensic analysis. Standard 70 gsm A4 pages manufactured by the "AA" company were used for data collection. The handwritten notes written by each subject with different pens are annotated in rectangular boxes. This dataset can be used for several tasks related to hyperspectral document image analysis and document forensic analysis, including handwritten optical character recognition, ink mismatch detection, writer identification at sentence, word, and character level, handwriting-based gender classification, handwriting-based age prediction, handwritten word segmentation, and word generation.
This dataset was designed and collected by the research team at the Artificial Intelligence and Computer Vision Lab (iVision), Institute of Space Technology, Pakistan, and the hyperspectral images were acquired through imaging spectroscopy in the visible wavelength range at Wageningen University & Research, the Netherlands.
Keywords: Age estimation; Document forensics; Document image analysis; Handwritten optical character recognition; Hyperspectral image analysis; Hyperspectral imaging; Ink mismatch detection; Writer identification
Year: 2022 PMID: 35242944 PMCID: PMC8873541 DOI: 10.1016/j.dib.2022.107964
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Manufacturer and brand details of pen/inks used, with pen numbers.
| Pen Number | Brand Name | Manufacturer |
|---|---|---|
| Pen # 1 | Dollar Clipper | Dollar Industries |
| Pen # 2 | Piano Pro | Sayyed Engineers |
| Pen # 3 | Mercury Handy Grip | Mark Industries |
| Pen # 4 | Piano Point | Sayyed Engineers |
| Pen # 5 | Picasso Oria | Shahsons |
| Pen # 6 | Piano Silk | Sayyed Engineers |
| Pen # 7 | Picasso Grip | Shahsons |
| Pen # 8 | Piano Click | Sayyed Engineers |
| Pen # 9 | Piano Click Sky | Sayyed Engineers |
| Pen # 10 | Piano Ball Point Pen | Sayyed Engineers |
| Pen # 11 | Piano Crystal Gel | Sayyed Engineers |
| Pen # 12 | Piano Crystal | Sayyed Engineers |
ID-wise age and gender details.
| Age | Total | Male IDs | Female IDs | Male | Female |
|---|---|---|---|---|---|
| 18 | 5 | 30,31,42,49 | 2 | 4 | 1 |
| 19 | 16 | 17,19,20,24,25,27,28,29,32,33,35,36,38,41,46,48 | | 16 | 0 |
| 20 | 9 | 18,26,34,37,40,43,44,45,47 | | 9 | 0 |
| 21 | 6 | 8,22,50 | 3,4,6 | 3 | 3 |
| 22 | 5 | 7,13,21,23,39 | | 5 | 0 |
| 24 | 4 | 1,15,16 | 5 | 3 | 1 |
| 25 | 2 | 53 | 9 | 1 | 1 |
| 26 | 3 | 14,52 | 10 | 2 | 1 |
| 27 | 2 | 54 | 11 | 1 | 1 |
| 30 | 2 | 12,51 | | 2 | 0 |
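For tasks such as handwriting-based gender classification or age prediction, the age and gender table can be turned into a per-writer label map. The following is a minimal sketch, not part of the dataset's tooling: the IDs are transcribed from the table above, rows that list no female writers are given an empty female ID list, and the names `DEMOGRAPHICS` and `build_labels` are hypothetical.

```python
# Rows transcribed from the age/gender table: age -> (male IDs, female IDs).
# Rows with no female writers get an empty female list (our reading of the table).
DEMOGRAPHICS = {
    18: ([30, 31, 42, 49], [2]),
    19: ([17, 19, 20, 24, 25, 27, 28, 29, 32, 33, 35, 36, 38, 41, 46, 48], []),
    20: ([18, 26, 34, 37, 40, 43, 44, 45, 47], []),
    21: ([8, 22, 50], [3, 4, 6]),
    22: ([7, 13, 21, 23, 39], []),
    24: ([1, 15, 16], [5]),
    25: ([53], [9]),
    26: ([14, 52], [10]),
    27: ([54], [11]),
    30: ([12, 51], []),
}


def build_labels(table):
    """Return {writer_id: (age, gender)} and reject IDs that appear twice."""
    labels = {}
    for age, (male_ids, female_ids) in table.items():
        for gender, ids in (("M", male_ids), ("F", female_ids)):
            for writer_id in ids:
                if writer_id in labels:
                    raise ValueError(f"writer {writer_id} listed more than once")
                labels[writer_id] = (age, gender)
    return labels


labels = build_labels(DEMOGRAPHICS)
n_male = sum(1 for _, gender in labels.values() if gender == "M")
n_female = len(labels) - n_male
print(len(labels), n_male, n_female)
```

Building the map this way also serves as a consistency check: every writer ID from 1 to 54 should appear exactly once, matching the 54 individuals stated in the abstract.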
Fig. 1. Potential domains of data utilization.
Fig. 2. First two pages of a handwritten document written by Writer #3.
Fig. 3. Mixed combinations of different inks in different ratios: (a) third page written by Writer #3; (b) mixed combination of two inks, Pen #1 and Pen #2, in 1:1; (c) mixed combination of three inks, Pen #7, Pen #8 and Pen #9, in 1:1:1; (d) mixed combination of three inks, Pen #10, Pen #11 and Pen #12, in 1:8:1; (e) different sentence written by Writer #3 with Pen #2.
Combinations of different writing samples in a single sentence with different ratios.
| Number of Writers | Ratio |
|---|---|
| 2 | 1:1 |
| 2 | 3:7 |
| 2 | 1:4 |
| 3 | 1:1:1 |
| 3 | 1:8:1 |
| 4 | 1:1:1:1 |
| 5 | 1:1:1:1:1 |
| 6 | 1:1:1:1:1:1 |
| 9 | 1:1:1:1:1:1:1 |
Fig. 4. (a) Page #4 of the handwritten document written by Writer #3; (b) Page #5 of the handwritten document written by Writer #3.
Fig. 5. Mixed combinations of text written by different writers in different ratios: (a) mixed combination of text written by two writers, Writer #43 and Writer #51, in ratio 2:3; (b) mixed combination of text written by three different writers, Writer #12, Writer #43 and Writer #51, in ratio 1:1:1; (c) mixed combination of text written by four different writers, Writer #51, Writer #43, Writer #12 and Writer #52, in ratio 1:1:1:1; (d) mixed combination of text written by six different writers, Writer #43, Writer #54, Writer #52, Writer #12, Writer #53 and Writer #51, in ratio 1:1:1:1:1:1; (e) capital-case English alphabet written by Writer #51; (f) small-case English alphabet written by Writer #51; (g) numeric digits written by Writer #51.
Combinations of different inks in a single sentence with different ratios.
| Number of Inks | Ratio |
|---|---|
| 2 | 1:1 |
| 2 | 3:7 |
| 2 | 1:4 |
| 3 | 1:1:1 |
| 3 | 1:8:1 |
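The ratio tables state how a single sentence is shared among inks or writers, but not the unit in which the sentence is divided. As an illustration only, the sketch below assumes the split is made by word count and uses largest-remainder rounding so that the per-ink (or per-writer) counts always sum to the sentence length. The function name `split_by_ratio` and the word-level assumption are ours, not the authors'.

```python
def split_by_ratio(total_words, ratio):
    """Split `total_words` into integer shares proportional to a ratio like '1:8:1'.

    Uses largest-remainder rounding: each share is floored, then the leftover
    words go to the shares with the largest fractional remainders.
    """
    parts = [int(p) for p in ratio.split(":")]
    weight = sum(parts)
    # Integer arithmetic keeps the floors and remainders exact.
    counts = [total_words * p // weight for p in parts]
    remainders = [total_words * p % weight for p in parts]
    leftover = total_words - sum(counts)
    # sorted() is stable, so ties are resolved in favour of earlier parts.
    order = sorted(range(len(parts)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Ratios taken from the ink table; 9 words is the approximate sentence length
# given in the abstract.
for ratio in ("1:1", "3:7", "1:4", "1:1:1", "1:8:1"):
    print(ratio, split_by_ratio(9, ratio))
```

With a 9-word sentence, a ratio such as 1:8:1 cannot be realised exactly, which is why some rounding rule is needed in any such reconstruction. The actual boundaries in the dataset are given by the annotated boxes, not by this calculation.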
Fig. 6. Quantitative distribution, based on age and gender.
| Subject | Computer Science; Computer Vision and Pattern Recognition |
| Specific subject area | Hyperspectral Document Imaging |
| Type of data | Image |
| How data were acquired | Individuals aged 18–30 who can write English and are able to follow complex instructions were given a 5-page data collection form. The completed forms were then stored in a safe environment and sent for hyperspectral scanning. All collected documents were scanned with an Imec SNAPSCAN VNIR hyperspectral camera, using 149 spectral bands covering 478.783 nm to 900.972 nm. |
| Data format | Hyperspectral RAW Images in ENVI Format |
| Description of data collection | Several people were briefed and invited to volunteer for data collection; 9 of them were selected and instructed in a single session. All 9 individuals were given the 5-page data collection form and provided with pen # 1. Each participant was followed up during the data collection process. Once all group members had completed the section to be written with pen # 1, the pens were collected and pen # 2 was provided to each member, and so on for the remaining pens. The mixed combinations for ink mixing were completed in the same manner. For the section to be written by different writers (a mixed combination for writer identification), the documents were shuffled and distributed again. After all pages were completed, the documents were cross-checked and, after verification, stored in an envelope tagged “completed documents”. |
| Data source location | Institution: Institute of Space Technology |
| Data accessibility | Repository name: Harvard Dataverse Repository |
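Because the cubes are distributed as raw images in ENVI format, each data file is normally accompanied by a plain-text header of `key = value` fields. The sketch below parses such a header with the standard library and, when the header carries no `wavelength` list, falls back to evenly spaced band centres between 478.783 nm and 900.972 nm. The header text is a hypothetical example, not copied from the dataset: the assignment of 650 and 512 to `samples` and `lines`, the `data type`, and the `interleave` are assumptions, as is the uniform band spacing.

```python
import re

# Hypothetical header for illustration; only bands and the 512 x 650 size come from the article.
HYPOTHETICAL_HEADER = """ENVI
description = {hypothetical example, not from the dataset}
samples = 650
lines = 512
bands = 149
header offset = 0
data type = 4
interleave = bsq
byte order = 0
"""


def parse_envi_header(text):
    """Parse the 'key = value' fields of an ENVI header into a dict."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "ENVI":
        raise ValueError("an ENVI header must start with the line 'ENVI'")
    body = "\n".join(lines[1:])
    # A value is either a {...} block (possibly spanning lines) or the rest of the line.
    pattern = re.compile(r"^\s*([^=\n]+?)\s*=\s*(\{.*?\}|[^\n]*)", re.S | re.M)
    fields = {}
    for key, value in pattern.findall(body):
        value = value.strip()
        if value.startswith("{") and value.endswith("}"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        fields[key.lower()] = value
    return fields


def linear_wavelengths(first_nm, last_nm, bands):
    """Evenly spaced band centres; an assumption when the header lists none."""
    step = (last_nm - first_nm) / (bands - 1)
    return [first_nm + i * step for i in range(bands)]


fields = parse_envi_header(HYPOTHETICAL_HEADER)
bands = int(fields["bands"])
values_per_cube = int(fields["samples"]) * int(fields["lines"]) * bands
wavelengths = [float(w) for w in fields.get("wavelength", [])]
if not wavelengths:
    wavelengths = linear_wavelengths(478.783, 900.972, bands)
print(bands, values_per_cube, round(wavelengths[1] - wavelengths[0], 3))
```

In practice the header shipped with each cube should be treated as authoritative for dimensions, data type, and band centres. The fallback grid is only a rough guide when that information is missing.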
Quantitative comparison of iVision HHID with the publicly available datasets of hyperspectral handwritten images.
| Dataset | UWA WIHSI (Blue) | UWA WIHSI (Blue) | iVision HHID (Proposed) |
|---|---|---|---|
| | 7 | 7 | |
| | 5 | 5 | |
| | 7 | 7 | |
| | 45 | 45 | |
| | 165 | 165 | |
| | 5 | 5 | |
| | No | No | |
| | No | No | |
| | No | No | |
| | 752 × 480 | 752 × 480 | |
| | 400–720 | 400–720 | |
| | 33 | 33 | |