Data set for Tifinagh handwriting character recognition.

Omar Bencharef1, Younes Chihab2, Nouredine Mousaid1, Mustapha Oujaoura3.   

Abstract

The Tifinagh-IRCAM alphabet is the official alphabet of the Amazigh language, which is widely used in North Africa [1]. It includes thirty-one basic letters and two letters each composed of a base letter followed by the sign of labialization. Standardized only in 2003 (Unicode) [2], IRCAM-Tifinagh is a young character repertoire that needs more work at all levels. In this context, we propose a data set of handwritten Tifinagh characters composed of 1376 images, 43 images for each character. The data set can be used to train a Tifinagh character recognition system or to extract the meaningful characteristics of each character.

Year:  2015        PMID: 26217753      PMCID: PMC4510372          DOI: 10.1016/j.dib.2015.04.008

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications table

Value of the data

The Amazigh language has been considered an official language only since 2003. Therefore, integrating the Tifinagh alphabet into new information and communication technologies (ICT) and engaging in research in this field have become a major necessity [1], [2]. The Amazigh language is spoken by about 30 million people in North Africa (from the oasis of Siwa in Egypt to Morocco, through Libya, Tunisia, Algeria, Niger, Mali, Burkina Faso and Mauritania) [3], [4]. Because handwritten characters are so diverse, there are two main approaches in this field, and both need a data set. The first is based on complex classifiers such as artificial neural networks or support vector machines, which need a data set to be trained to classify characters [5]. The other approaches also need a data set, this time to find a normalization of each character. The data set is very useful for training classification systems for Tifinagh handwriting, which remains an active area of research. It is the first free, online data set of handwritten Tifinagh characters available without formalities.

Experimental design, materials and methods

We asked 30 people (17 male and 13 female) to write the 32 Tifinagh characters (Fig. 1) on one page, and we added 13 more images per character (for 43 in total) to take horizontal and vertical inclination into consideration. The pages were scanned using an Epson 10000XL.
Fig. 1. Elementary IRCAM Tifinagh characters.

The extraction steps were as follows. We used the horizontal histogram to correct the inclination of every page [6]. Using a connected-components algorithm, we detected the center of each character [7]. We extracted 31 sub-images of 30×30 px that contain the characters (Fig. 2).
Fig. 2. Example of handwritten Tifinagh characters from the proposed database.
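
The three extraction steps described above can be sketched in Matlab as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name extract_characters_sketch, the fixed threshold of 0.5, the search range of ±5 degrees and the minimum component area of 20 pixels are all hypothetical choices. The deskewing criterion used here (keeping the rotation that maximizes the variance of the horizontal histogram) is one common projection-profile technique and may differ from the method of [6]. The sketch relies on imrotate, bwconncomp, regionprops and padarray from the Image Processing Toolbox.

```matlab
function subImages = extract_characters_sketch(page)
% Illustrative sketch only; the name and all constants are assumptions.
% page: 2-D gray-level scan with values in [0, 1], dark ink on light paper.

ink = 1 - page;                 % ink becomes bright, paper becomes 0
bw = ink > 0.5;                 % assumed fixed threshold

% Step 1: pick the rotation whose horizontal histogram (row sums of ink
% pixels) has the largest variance, then apply it to the page.
angles = -5:0.5:5;              % assumed search range, in degrees
score = zeros(size(angles));
for k = 1:numel(angles)
    rotated = imrotate(bw, angles(k), 'nearest', 'crop');
    score(k) = var(sum(rotated, 2));
end
[~, best] = max(score);
ink = imrotate(ink, angles(best), 'bilinear', 'crop');
bw = ink > 0.5;

% Step 2: connected components and their centers. Each component is
% treated as one character; multi-stroke characters would need merging.
cc = bwconncomp(bw);
stats = regionprops(cc, 'Centroid', 'Area');
stats = stats([stats.Area] >= 20);    % assumed noise filter

% Step 3: cut a 30x30 window around each center. The page is padded so
% that windows near the border stay inside the image.
half = 15;
padded = padarray(ink, [half half], 0);
subImages = cell(numel(stats), 1);
for k = 1:numel(stats)
    c = round(stats(k).Centroid);     % [column, row], unpadded coordinates
    rows = c(2) + (0:2*half - 1);
    cols = c(1) + (0:2*half - 1);
    subImages{k} = 1 - padded(rows, cols);   % back to dark ink on light paper
end
end
```

A call might look like subImages = extract_characters_sketch(im2double(imread('page.jpg'))), where page.jpg is a hypothetical gray-level scan of one written page.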

Subject area: Computer science
More specific subject area: Image processing, character recognition
Type of data: Image
How data was acquired: Handwriting and scanner
Data format: JPG image
Experimental factors: We asked 30 students to write all Tifinagh characters, one in each cell of a table; we used an Epson 10000XL to scan the pages; and we added 13 more images per character to take horizontal and vertical inclination into consideration
Experimental features: 1376 images with a size of 30×30 px (43 images/character)
Data source location: Essaouira, Morocco
Data accessibility: Within this article

The sub-images are named using the Latin character given in Fig. 1 for each character, followed by a number from 1 to 43 (a1, a2 … a43). For characters with a sub-point we use a doubled character (hh or zz). For epsilon we use a doubled a (aa1, aa2 … aa43). To automatically explore the data set or to extract features from the whole data set, we propose the following Matlab code:
function x = base_generation()
% Read all JPG images from the folder 'data_set' and return one feature row per image.
% zmoment is the feature extraction function (Zernike moments, for example);
% it is not defined in this listing and must be supplied separately.
fileFolder = fullfile('data_set');
dirOutput = dir(fullfile(fileFolder, '*.jpg'));
fileNames = {dirOutput.name}';
numFrames = numel(fileNames);
% Read the first image, convert it to gray level, then call the feature extraction process
p = fullfile(fileFolder, fileNames{1});
d = imread(p);
d = double(d) / 255;
if ndims(d) == 3
    y = rgb2gray(d);
else
    y = d; % the image is already gray level
end
t = zmoment(y, 11); % Call the feature extraction
b = t; % Add 't' to the data matrix
% Repeat the same treatment for the rest of the data set
for i = 2:numFrames
    p = fullfile(fileFolder, fileNames{i});
    d = imread(p);
    d = double(d) / 255;
    if ndims(d) == 3
        y = rgb2gray(d);
    else
        y = d;
    end
    t = zmoment(y, 11); % Call the feature extraction function (Zernike, for example)
    b = [b; t];
end
x = b;
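
To train a classifier on the feature matrix, each row also needs a class label. Under the naming convention described above (one or two Latin letters followed by a number, such as a12, hh3 or aa43), the label can be read from the file name. The helper below is a hypothetical sketch, not part of the original article; it assumes every file name consists only of letters followed by digits, plus an extension.

```matlab
function label = label_from_filename(fileName)
% Hypothetical helper: returns the leading letters of a file name,
% e.g. 'a' for 'a12.jpg' and 'hh' for 'hh3.jpg'.
[~, baseName] = fileparts(fileName);          % drop folder and extension
tok = regexp(baseName, '^([A-Za-z]+)(\d+)$', 'tokens', 'once');
if isempty(tok)
    error('label_from_filename:badName', ...
          'Unexpected file name: %s', fileName);
end
label = tok{1};                               % letters only, number dropped
end
```

Applying it to the same file list, in the same order as the images are read, for instance with cellfun(@label_from_filename, fileNames, 'UniformOutput', false), gives one label per row of the feature matrix.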