Semi-automatic ground truth generation using unsupervised clustering and limited manual labeling: Application to handwritten character recognition.

Szilárd Vajda, Yves Rangoni, Hubert Cecotti

Abstract

Training supervised classifiers to recognize patterns requires large data collections with accurate labels. In this paper, we propose a generic, semi-automatic labeling technique for large collections of handwritten characters. To speed up the creation of large-scale ground truth, the method combines unsupervised clustering with minimal expert knowledge. To exploit potentially complementary discriminative information across features, each character is projected into five different feature spaces. After the images are clustered in each feature space, a human expert labels the cluster centers, and each data point inherits the label of its cluster's center. A majority (or unanimity) vote across the feature spaces then decides the label of each character image. The amount of human involvement (labeling) is therefore strictly controlled by the number of clusters produced by the chosen clustering approach. To test the efficiency of the proposed approach, we compared and evaluated three state-of-the-art clustering methods (k-means, self-organizing maps, and growing neural gas) on the MNIST digit data set and on a Lampung Indonesian character data set. Using a k-NN classifier, we show that manually labeling only 1.3% (MNIST) and 3.2% (Lampung) of the training data yields performance in the same range as a completely labeled data set.
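The labeling loop described in the abstract (cluster in each feature space, have the expert label one image per cluster, propagate that label to the cluster's members, then vote across spaces) can be sketched in plain Python. This is a minimal illustration under several assumptions that the abstract does not state: only k-means is used (not self-organizing maps or growing neural gas), seeded deterministically with farthest-first selection; the expert is shown the member image nearest each centroid; the expert is simulated by a hypothetical `expert_label` callback; three toy feature extractors stand in for the paper's five feature spaces; and "majority" means a strict majority, with undecided images returned as `None`. All function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centers: List[List[float]]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))


def kmeans(points: List[Vector], k: int, iters: int = 20):
    """Lloyd's k-means with deterministic farthest-first seeding (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centers) for p in points], centers


def label_one_space(features: List[Vector], k: int, expert_label: Callable[[int], str]):
    """Cluster one feature space, query the expert once per cluster, and
    propagate that answer to every member of the cluster."""
    assign, centers = kmeans(features, k)
    labels: List[Optional[str]] = [None] * len(features)
    queries = 0
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if not members:
            continue
        # Assumption: the expert sees the member image nearest the centroid.
        rep = min(members, key=lambda i: squared_distance(features[i], centers[c]))
        label = expert_label(rep)  # the only manual step
        queries += 1
        for i in members:
            labels[i] = label
    return labels, queries


def vote(per_space: List[List[Optional[str]]], rule: str = "majority") -> List[Optional[str]]:
    """Combine per-space labels; None marks images the rule cannot decide."""
    if rule not in ("majority", "unanimity"):
        raise ValueError(f"unknown rule: {rule}")
    n = len(per_space)
    final: List[Optional[str]] = []
    for votes in zip(*per_space):
        label, count = Counter(votes).most_common(1)[0]
        agreed = count == n if rule == "unanimity" else count > n / 2
        final.append(label if agreed else None)
    return final


def semi_automatic_labels(images, extractors, k, expert_label, rule="majority"):
    """Run clustering plus expert labeling in every feature space, then vote."""
    per_space, total_queries = [], 0
    for extract in extractors:
        labels, queries = label_one_space([tuple(extract(x)) for x in images], k, expert_label)
        per_space.append(labels)
        total_queries += queries
    return vote(per_space, rule), total_queries


# Toy usage: two well-separated "character classes", 25 images each, and a
# simulated expert that reads the hidden ground truth (both hypothetical).
images = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(5)]
images += [(10 + 0.1 * i, 10 + 0.1 * j) for i in range(5) for j in range(5)]
truth = ["a"] * 25 + ["b"] * 25
extractors = [lambda v: v, lambda v: (2 * v[0], 3 * v[1]), lambda v: (v[1], v[0])]
labels, queries = semi_automatic_labels(images, extractors, k=3, expert_label=lambda i: truth[i])
print(labels == truth, queries, "expert queries for", len(images), "images")
```

With k = 3 clusters per space and three spaces, the simulated expert answers at most nine questions for 50 images: as in the abstract, the manual effort is bounded by the number of clusters rather than by the number of images. Switching `rule` to `"unanimity"` trades coverage for precision, since any image on which the spaces disagree is left unlabeled.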


Year: 2015; PMID: 25870463; PMCID: PMC4392711; DOI: 10.1016/j.patrec.2015.02.001

Source DB: PubMed; Journal: Pattern Recognit Lett; ISSN: 0167-8655; Impact factor: 3.756


