Hajer Ghodhbani, Mohamed Neji, Imran Razzak, Adel M Alimi.
Abstract
In recent years, technology has advanced rapidly in many industries, particularly the garment industry, which aims to follow consumer desires and demands. One of these demands is to try on clothes before purchasing them online. Many research works have therefore focused on developing an intelligent apparel industry that supports the online shopping experience. Image-based virtual try-on is among the most promising approaches to virtual fitting: it transfers target clothes onto a customer's image, and it has received considerable research effort in recent years. However, several challenges make it difficult to achieve a natural-looking virtual outfit, such as shape, pose, occlusion, illumination, cloth texture, logos, and text. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of virtual try-on. This review first introduces virtual try-on and its challenges, followed by its demand in the fashion industry. We summarize state-of-the-art image-based virtual try-on methods for both fashion detection and fashion synthesis, together with their respective advantages, drawbacks, and guidelines for selecting a specific try-on model, followed by recent developments and successful applications. Finally, we conclude the paper with promising directions for future research.
Keywords: Fashion detection; Fashion industry; Fashion synthesis; Virtual try-on
Year: 2022 PMID: 35291716 PMCID: PMC8908950 DOI: 10.1007/s11042-022-12802-6
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
Fig. 1 Article classification based on research questions
Fig. 2 Classification of approaches for image-based virtual try-on systems
Fig. 3 Examples of fashion parsing based on semantic segmentation [28]
Fig. 4 Annotation examples from LIP [13] with appearance variability and different views
Fig. 5 Example of human pose estimation from DeepPose [79] on the LSP dataset [30]
Fig. 6 Example of multi-person HPE [38]
Fig. 7 Example results from the fashion landmark detection approach [37]. The first row shows results on DeepFashion-C [43]; the second row shows results on the Fashion Landmark Dataset (FLD) [44]
Fig. 8 Examples of image style transfer by TextureGAN [90]
Fig. 9 Example of warping a clothing image as proposed by VITON [16]: given the target clothing image and a clothing mask, shape context matching is used to estimate the thin-plate spline (TPS) transformation and generate a warped clothing image
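The TPS step named in the Fig. 9 caption can be illustrated with a minimal sketch. It assumes the point correspondences between the clothing image and the target are already known (VITON obtains them through shape context matching, which is not implemented here). The names `fit_tps`, `_tps_kernel`, and `_solve` are hypothetical helpers written for this illustration, not code from VITON.

```python
import math


def _tps_kernel(r2):
    # TPS radial basis U(r) = r^2 * log(r), expressed via r^2; U(0) = 0.
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)


def _solve(A, B):
    # Gauss-Jordan elimination with partial pivoting.
    # B holds one right-hand-side column per output coordinate.
    n = len(A)
    M = [row_a[:] + row_b[:] for row_a, row_b in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: control points may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [[M[i][n + k] / M[i][i] for k in range(len(B[0]))] for i in range(n)]


def fit_tps(src, dst):
    """Fit a 2D thin-plate spline that maps each src point onto its dst point.

    Assumption: src[i] corresponds to dst[i]. Returns a function warp(x, y).
    """
    n = len(src)
    if n != len(dst) or n < 3:
        raise ValueError("need at least 3 matched control points")
    A = [[0.0] * (n + 3) for _ in range(n + 3)]
    B = [[0.0, 0.0] for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            A[i][j] = _tps_kernel((xi - xj) ** 2 + (yi - yj) ** 2)
        # Affine part [1, x, y] and its transpose (side conditions on the weights).
        for k, value in enumerate((1.0, float(xi), float(yi))):
            A[i][n + k] = value
            A[n + k][i] = value
        B[i] = [float(dst[i][0]), float(dst[i][1])]
    params = _solve(A, B)
    weights, affine = params[:n], params[n:]

    def warp(x, y):
        out = []
        for dim in range(2):
            value = affine[0][dim] + affine[1][dim] * x + affine[2][dim] * y
            for (px, py), w in zip(src, weights):
                value += w[dim] * _tps_kernel((x - px) ** 2 + (y - py) ** 2)
            out.append(value)
        return tuple(out)

    return warp
```

To warp an image rather than isolated points, `warp` would be evaluated on a pixel grid and the result used for resampling; that step is left out to keep the sketch short.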
Fig. 10 Results from CP-VTON [84], CP-VTON+ [50], ACGPN [98], and CIT [62]
Fig. 11 Different architectures for the warping module: (a) based on a segmentation mask, from VITON [16]; (b) without human segmentation, from WUTON [27] and PF-AFN [12]
Fig. 12 Garment transfer results generated by the work of Sarkar et al. [67]
Fig. 13 Examples of pose transformation results generated by the PG2 work of Liqian Ma et al. [48] on (a) the DeepFashion dataset [43] and (b) the Market-1501 dataset [104]
Fig. 14 Example of clothing simulation results obtained with the DRAPE model [15]
Fig. 15 Examples of physical simulation from the work of Wang et al. [86]
Summary of the benchmark datasets for fashion tasks
| Task | Dataset | Number of photos | Description | Publish time |
|---|---|---|---|---|
| Virtual Try-On | LookBook | 84,748 | Composed of 9,732 top product images and 75,016 fashion model images | 2016 |
| Virtual Try-On | DeepFashion | 78,979 | Selected from the In-shop Clothes Benchmark and associated with several sentences as captions and a segmentation map | 2016 |
| Virtual Try-On | VITON | 32,506 | Built from around 19,000 frontal-view woman and top clothing image pairs, yielding 16,253 pairs | 2018 |
| Virtual Try-On | FashionTryOn | 28,714 | Comprises 28,714 clothing-person-person triplets, each consisting of a clothing item image and two model images in different poses | 2019 |
| Virtual Try-On | FashionOn | 22,566 | Pairs of person images wearing the same clothes in different poses | 2019 |
| Fashion Parsing | Fashionista | 158,235 | Outfit information in the form of tags, comments, and links | 2012 |
| Fashion Parsing | Paper Doll | 339,797 | Annotated with metadata tags denoting characteristics, e.g., color, style, occasion, clothing type, brand | 2013 |
| Fashion Parsing | Chictopia10k | 10,000 | Contains real-world annotated images in the wild with arbitrary postures, views, and backgrounds | 2015 |
| Fashion Parsing | LIP | 50,462 | Focuses on semantic understanding of persons and contains images with elaborate pixel-wise annotations with 19 semantic human part labels and 2D human poses with 16 key points. Images collected from real-world scenarios contain humans appearing with challenging poses and views, occlusions, and various appearances. | 2017 |
| Fashion Parsing | MHP v1.0 | 4,980 | Instance-aware setting with fine-grained pixel-level annotations covering 7 body parts and 11 clothes categories | 2017 |
| Fashion Parsing | MHP v2.0 | 25,403 | Images annotated with 58 fine-grained semantic categories: 11 body parts and 47 clothes categories. Images captured in real-world scenes with various viewpoints, poses, occlusion, interaction, and backgrounds. | 2018 |
| Fashion Parsing | Crowd Instance-level Human Parsing (CIHP) | 38,280 | Multi-person images. Pixel-wise annotations at instance level. | 2018 |
| Fashion Parsing | ModaNet | 55,176 | Annotated with pixel-level labels, bounding boxes, and polygons | 2018 |
| Fashion Parsing | DeepFashion2 | 491,000 | Diverse images of 13 popular clothing categories from both commercial shopping stores and consumers. Labeled with scale, occlusion, zoom-in, viewpoint, category, style, bounding box, dense landmarks, and per-pixel mask. | 2019 |
| Fashion Parsing | Fashionpedia | 48,000 | Contains 294 fine-grained attributes with high resolution (1710 × 2151) | 2020 |
| Fashion Parsing | RichWear | 322,198 | Street fashion dataset containing various text labels for fashion analysis. The images are collected from an Asian social network site and focus on street styles in Japan and other Asian areas. | 2021 |
| Fashion Landmark Detection | DeepFashion-C | 289,222 | Annotated with clothing bounding box, pose variation type, landmark visibility, clothing type, category, and attributes | 2016 |
| Fashion Landmark Detection | Fashion Landmark Dataset (FLD) | 123,016 | Annotated with clothing type, pose variation type, landmark visibility, clothing bounding box, and human body joint | 2016 |
| Fashion Landmark Detection | Unconstrained Landmark Database (ULD) | 30,000 | Collected from fashion blogs, forums, and the consumer-to-shop retrieval benchmark of DeepFashion. Contains substantial foreground scatters and background clutters. | 2017 |
| Fashion Landmark Detection | DeepFashion2 | 491,000 | Used in diverse tasks such as fashion parsing, clothes detection, pose estimation, segmentation, and retrieval | 2019 |
| Human Pose Estimation | MPII Human Pose | 2.5 × 10⁴ | Data are from YouTube videos. Covers 410 human activities, and each image is provided with an activity label. | 2014 |
| Human Pose Estimation | MSCOCO | 328,000 | Data are from the Internet. Used for diverse activities. | 2014 |
| Human Pose Estimation | AI Challenger | 300,000 | Data are crawled from the Internet. Provides three sub-datasets for human keypoint detection, attribute-based zero-shot recognition, and image Chinese captioning. | 2017 |
| Human Pose Estimation | PoseTrack | 550 video sequences | Focuses on three aspects: (1) single-frame multi-person pose estimation, (2) multi-person pose estimation in videos, (3) multi-person articulated tracking | 2017 |
| Pose Transfer | Human3.6M | 3.6M | Contains 3.6 million different 3D articulated poses captured from a set of male and female actors. Provides synchronized 2D and 3D data (including time of flight, high-quality image, and motion capture data), accurate 3D human models of the actors, and mixed reality settings. | 2014 |
| Pose Transfer | Market-1501 | 32,668 | Contains over 32,000 annotated boxes, plus a distractor set of over 500K images, produced using the Deformable Part Model (DPM) as pedestrian detector | 2015 |
| Pose Transfer | DeepFashion | 52,712 | The In-shop Clothes Retrieval Benchmark of DeepFashion is used for pose transfer | 2016 |
| Pose Transfer | SMPL-NPT | 24,000 | Contains 24,000 synthesized body meshes; used for 3D pose transfer | 2020 |
| Pose Transfer | SMG-3D | 8,000 | Contains 8,000 pairs of naturally plausible body meshes of 40 identities and 200 poses; 35 identities and 180 poses are used as the training set | 2021 |
| Clothing Simulation | MG-Cloth | 356 scans | Contains 3D scans of persons with different body shapes, poses, and clothes | 2019 |
| Clothing Simulation | DeepFashion3D | 2,078 models | Contains 3D garment models with 10 different clothing categories and 563 garment instances | 2020 |
| Clothing Simulation | AFRIFASHION1600 | 1,600 | African fashion dataset curated to improve visibility, inclusion, and familiarity of African fashion in computer vision tasks | 2021 |
Performance comparisons of fashion parsing methods (in %) [28]
| Method | Dataset | mIOU | aPA | mAGR | Acc. | Fg.acc. | Avg.prec. | Avg.recall | Avg.F-1 |
|---|---|---|---|---|---|---|---|---|---|
| Yamaguchi et al. | | – | – | – | 88.96 | 62.18 | 52.75 | 49.43 | 44.76 |
| Liang et al. | | – | – | – | 91.11 | 71.04 | 71.69 | 60.5 | 64.38 |
| Co-CNN | | – | – | – | | | | | |
| Yamaguchi et al. | | – | – | – | 89.98 | 65.66 | 54.87 | 51.16 | 46.80 |
| Liang et al. | | – | – | – | 92.33 | 76.54 | 73.93 | 66.49 | 69.30 |
| Co-CNN | | – | – | – | | | | | |
| CE2P | | 53.10 | – | – | 63.20 | – | – | – | – |
| Wang et al. | | – | – | – | – | | | | |
| Co-CNN | | – | 96.02 | – | – | 83.57 | 84.95 | 77.66 | 80.14 |
| TGPNet | | – | 96.45 | – | – | 87.91 | 83.36 | 80.22 | 81.76 |
| Wang et al. | | – | | | | | | | |
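Several of the metrics reported in the table above (pixel accuracy, foreground accuracy, mean IoU) can be computed from per-pixel label maps. The sketch below uses common textbook definitions and assumes label 0 denotes the background; the cited papers may differ in details such as which classes are averaged. The function name `parsing_metrics` is hypothetical.

```python
from collections import Counter


def parsing_metrics(pred, truth, background=0):
    """Compute Acc., Fg.acc., and mIoU from flat lists of per-pixel class labels.

    Assumption: `background` is the label of non-foreground pixels.
    """
    if len(pred) != len(truth) or not truth:
        raise ValueError("pred and truth must be non-empty and equally long")
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, t in zip(pred, truth):
        if p == t:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p where it was not present
            fn[t] += 1  # missed the true class t
    classes = sorted(set(pred) | set(truth))
    fg_total = sum(1 for t in truth if t != background)
    fg_correct = sum(count for label, count in tp.items() if label != background)
    # Per-class intersection over union: TP / (TP + FP + FN).
    ious = [tp[c] / (tp[c] + fp[c] + fn[c]) for c in classes]
    return {
        "acc": sum(tp.values()) / len(truth),
        "fg_acc": fg_correct / fg_total if fg_total else 0.0,
        "miou": sum(ious) / len(ious),
    }
```

For 2D label maps, the rows would first be flattened into a single list per image; the values in the table are percentages, so the returned fractions would be multiplied by 100.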
Results of different state-of-the-art methods for pose transfer [68]
| Model | Market-1501 [ | DeepFashion [ | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| SSIM | IS | Mask-SSIM | Mask-IS | DS | pSSIM | SSIM | IS | DS | pSSIM | |
| PG2 [ | 0.261 | 0.782 | 3.367 | 0.390 | – | 0.773 | 3.163 | 0.951 | – | |
| Def-GAN [ | 0.291 | 3.230 | 0.807 | 3.502 | 0.720 | – | 0.760 | 0.976 | – | |
| PATN [ | 0.81 | 3.162 | 0.799 | 3.737 | 0.6186 | 0.771 | 3.201 | 0.976 | 0.799 | |
| Loss function [ | 3.326 | 0.742 | 3.262 | |||||||
| Real Data | 1.000 | 3.890 | 1.000 | 3.706 | 0.740 | 1 | 1.000 | 4.053 | 0.968 | 1 |
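The SSIM columns in the table above report structural similarity between generated and real images. As a rough illustration, the sketch below computes a simplified, single-window (global) SSIM over two grayscale images given as flat lists of pixel values. This is an assumption made for brevity: the standard SSIM averages the same expression over local sliding windows, so values from this sketch are not comparable with the table. The name `global_ssim` is hypothetical.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale images.

    Simplification: statistics are taken over the whole image instead of
    local windows, unlike the standard SSIM formulation.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and equally sized")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants with the usual K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

An image compared with itself scores 1, which matches the SSIM entries of 1.000 in the "Real Data" row.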
Fig. 16 Evaluation of the work of Santesteban et al. [4] compared with DRAPE [65] and ClothCap [101]
Fig. 17 Applications of AI techniques in the fashion industry
Fig. 18 Illustration of the idea of a virtual try-on system
Fig. 19 Limitations of the generated results of the virtual try-on task presented in the work of Sarkar et al. [67]