| Literature DB >> 28287960 |
Charles Otto, Dayong Wang, Anil K Jain.
Abstract
Given a large collection of unlabeled face images, we address the problem of clustering faces into an unknown number of identities. This problem is of interest in social media, law enforcement, and other applications, where the number of faces can be of the order of hundreds of million, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-Means and Spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) quality measures, and run-time. Our algorithm achieves an F-measure of 0.87 on the LFW benchmark (13 K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13 K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71. An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.Year: 2017 PMID: 28287960 DOI: 10.1109/TPAMI.2017.2679100
Source DB: PubMed Journal: IEEE Trans Pattern Anal Mach Intell ISSN: 0098-5589 Impact factor: 6.226