Ruqia Bibi, Zahid Mehmood, Asmaa Munshi, Rehan Mehmood Yousaf, Syed Sohail Ahmed.
Abstract
Recent years have seen exponential growth in the production of multimedia data, prompting the exploration and expansion of domains that are expected to have a major impact on society in the near future. One such domain, explored in this article, is content-based image retrieval (CBIR), in which images are mostly encoded using hand-crafted approaches that employ different descriptors and their fusions. Although these approaches have yielded strong results, their performance with respect to the semantic gap, computational cost, and the choice of fusion appropriate to the problem domain remains debatable. This article proposes a novel CBIR method based on a transfer learning-based visual geometry group (VGG-19) model, a genetic algorithm (GA), and an extreme learning machine (ELM) classifier. Instead of hand-crafted feature extraction, the proposed method extracts features automatically with a transfer learning-based VGG-19 model, capturing both local and global image information for robust retrieval. Because deep features are high-dimensional, the method reduces computational expense by passing the extracted features through the GA, which returns a reduced set of optimal features. For image classification, an ELM classifier is used, which is much simpler in terms of parameter tuning and learning time than traditional classifiers. The proposed method is evaluated on five datasets, where it performs better on the evaluation metrics than state-of-the-art image retrieval methods. A statistical analysis using the nonparametric Wilcoxon matched-pairs signed-rank test also indicates that the improvement is significant.
Year: 2022 PMID: 36191011 PMCID: PMC9529116 DOI: 10.1371/journal.pone.0274764
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Images that have a similar visual appearance but belong to different classes [3] (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Details of competitive CBIR methods.
| Technique | Problem addressed | Feature extraction | Clustering | Classification | Similarity measure | Limitations/ Future work |
|---|---|---|---|---|---|---|
| | Semantic gap, computational cost | 8-Directional Gray Level Co-occurrence Matrix, geometric shape features, and HSV Color Moments | N.A. | N.A. | Manhattan, Canberra, Euclidean, and statistical distance | To integrate optimization techniques to reduce dimensions of feature vectors |
| | Classification accuracy | Fine-tuned AlexNet | N.A. | Extreme learning machine | N.A. | Misclassified visually similar images |
| | Retrieval accuracy | Color moments | N.A. | N.A. | Manhattan | Incorporate deep learning techniques |
| | Lack of spatial information, dimensionality problem | GMM-based color quantization method, spatiograms | Expectation maximization-Bayesian Information Criterion | N.A. | Mahalanobis distance, Jensen–Shannon divergence | To explore more sophisticated color and texture information |
| | Semantic gap | SIFT, SURF | k-means | SVM | Euclidean distance | Incorporate deep learning techniques |
| | Semantic gap | Color moments, Gabor and discrete wavelet transform, Color and Edge | N.A. | N.A. | Euclidean distance | Incorporate deep learning-based classifiers |
| | Incompatibility of image descriptor and ranking methods | Perceptual Uniform Descriptor | N.A. | N.A. | L1/L2 norm | Manifold ranking with multi-graph fusion |
| | Role of image re-ranking in CBIR | HSV, SIFT, AlexNet | N.A. | N.A. | Jaccard similarity | To incorporate feature extraction and fusion re-ranking |
| | Vanishing gradient | Residual network | N.A. | Minimum distance classifier | Canberra distance | Use of deep architecture in the medical field |
| | Fusion framework for ranking retrieval results | Color Difference Histogram and Angular Radial | N.A. | N.A. | Euclidean distance, Modified Canberra distance | Handcrafted features, computationally expensive |
| | Semantic gap | Image binarization, image transform, and morphological operator | N.A. | ANN, SVM | Euclidean distance, City block distance | Handcrafted features, computationally expensive |
| | Semantic gap, effective feature representation | Hierarchical wavelet packet descriptors | N.A. | SVM | Euclidean distance | Over-partitioning of images leads to disrupted texture patterns |
| | Learning difficulty in dynamic image samples | Independent Subspace Analysis-spatial pyramid matching | k-means | SVM | Histogram intersection | Sensitive to noise, requires a fixed group size for random vectors |
| | Challenges in object and scene image classification | GP-HOG, FC-GPHOG, enhanced Fisher model | N.A. | Nearest neighbor | Cosine similarity | To handle fuzzy memberships of class images |
| | Computational complexities in case of a large no. of classes | Dense SIFT, Locally constrained linear coding, Spatial pyramid matching | k-means | One-vs-all binary classifiers | N.A. | Incorporation of the semantic relationship between classes and distribution of classes |
| | Lack of prior knowledge while transferring to | ResNet-50, VGG-16 | k-means | Hopfield network | Euclidean distance between two weighted matrices | Inherent ambiguity while retrieving images from certain classes |
| | Visual similarity vs semantic correlation | N.A. | Spectral clustering | SVM | Euclidean distance | Incorporate CNN-based representations for better classification accuracies |
| | Lack of spatial information, dimensionality reduction | BRISK like FREAK, Spatial CH, BoW | N.A. | K-nearest neighbors | Chebyshev distance | To incorporate scale invariance |
| | Semantic gap | SURF, HOG | k-means++ | SVM | Euclidean distance | To incorporate spatial information |
| | Image diffusion | CH, LDP, SIFT, BoF | N.A. | K-nearest neighbors | Manhattan distance | Time optimization of a diffusion process |
| | Semantic gap | MDGHM-SURF-ORB | Fuzzy c-means | Soft label SVM | Canberra distance | Incorporate VLAD, deep learning approaches |
| | Semantic gap | Local binary pattern, Zernike moments, HSV histogram | Adaptive Sunflower optimization algorithm (SFO) | Deep neural network-search and rescue optimization algorithm (DNN-SAR) | Matching difference | Integration of Hadoop approaches with CBIR |
| | Semantic gap, dimensionality reduction, robust feature representation, computational expense | Transfer learning based on VGG-19 architecture, GA | N.A. | ELM | Canberra distance | Incorporate other deep learning techniques |
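The last row of the table pairs VGG-19 features, GA and ELM with Canberra distance as the similarity measure. The following is a minimal sketch of ranking a gallery by Canberra distance to a query vector. The function names `canberra` and `top_k`, the convention that a 0/0 term contributes zero, and the use of plain NumPy arrays as feature vectors are assumptions made for illustration; the table does not describe how the original method combines the classifier output with the distance ranking.

```python
import numpy as np

def canberra(query, gallery):
    """Canberra distance from one query vector to every row of gallery."""
    query = np.asarray(query, dtype=float)
    gallery = np.asarray(gallery, dtype=float)
    num = np.abs(gallery - query)
    den = np.abs(gallery) + np.abs(query)
    # Assumption: a term whose denominator is zero (0/0) contributes 0.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return terms.sum(axis=1)

def top_k(query, gallery, k=20):
    """Indices and distances of the k gallery rows closest to the query."""
    d = canberra(query, gallery)
    order = np.argsort(d, kind="stable")[:k]
    return order, d[order]
```

Called as `top_k(query_features, gallery_features, k=20)`, this would return the 20 nearest gallery indices, which matches the top-20 retrieval setting shown in the figures.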
Fig 2. Proposed methodology for CBIR (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Fig 3. VGG-19 architecture for the proposed method.
Fig 4. Methodology of the genetic algorithm for the proposed method.
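The abstract states that the GA returns a reduced set of features, but the operators and fitness function are not given here. The sketch below shows one generic way a GA can select a feature subset: binary masks as chromosomes, tournament selection, one-point crossover, bit-flip mutation and elitism. All of these choices, the function name `ga_select`, and every parameter value are assumptions for illustration, not the authors' configuration. The `fitness` callable is supplied by the user and should return a higher score for a better mask.

```python
import numpy as np

def ga_select(fitness, n_features, pop_size=20, n_gen=30, p_mut=0.05, seed=0):
    """Return (best_mask, best_score_per_generation). Needs n_features >= 2."""
    rng = np.random.default_rng(seed)
    # Each chromosome is a boolean mask over the feature dimensions.
    pop = rng.random((pop_size, n_features)) < 0.5
    history = []
    for _ in range(n_gen):
        scores = np.array([fitness(mask) for mask in pop])
        best = pop[np.argmax(scores)].copy()
        history.append(float(scores.max()))
        # Tournament selection: the better of two random individuals survives.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # One-point crossover between consecutive parents.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            cut = rng.integers(1, n_features)
            children[i, cut:] = parents[i + 1, cut:]
            children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation, then elitism keeps the best mask unchanged.
        children ^= rng.random(children.shape) < p_mut
        children[0] = best
        pop = children
    scores = np.array([fitness(mask) for mask in pop])
    history.append(float(scores.max()))
    return pop[np.argmax(scores)], history
```

In practice the fitness would typically be a validation score of a classifier trained on the masked features, possibly with a penalty on the number of selected features; that choice is left open here.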
Fig 5. ELM schematic diagram [56].
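An ELM is commonly described as a single-hidden-layer network whose input weights are drawn at random and left fixed, so that only the output weights are solved for in closed form, which is why it needs little tuning and trains quickly. The sketch below is a minimal NumPy version under that standard formulation. The class name `ELM`, the sigmoid activation, the ridge term `reg` and the default sizes are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine classifier (illustrative sketch)."""

    def __init__(self, n_hidden=100, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg      # ridge regularization strength (assumed)
        self.seed = seed

    def _hidden(self, X):
        # Sigmoid activations of the fixed random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        rng = np.random.default_rng(self.seed)
        self.classes_ = np.unique(y)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Random input weights and biases are never trained.
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from regularized least squares:
        # beta = (H^T H + reg * I)^(-1) H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

The only quantity to tune in this sketch, apart from `reg`, is the number of hidden neurons, which is the parameter whose effect on accuracy is examined in Fig 21.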
Fig 6. Sample images of the Wang-A dataset (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Performance comparison of the proposed method with state-of-the-art methods on the Wang-A dataset (values presented in bold are significant among competitive methods).
| Semantic Classes | Metric | FIF-IRS | VGG-16 | SCNN-ELM | AlexNet | CM-LBP-CED | Proposed Method |
|---|---|---|---|---|---|---|---|
| African Tribes | P | 82.00 | 96.06 | 70.00 | 93.33 | 81.00 | 84.85 |
| | R | 16.40 | 19.21 | 14.00 | 18.66 | 16.20 | 16.97 |
| | F | 27.33 | 32.00 | 23.33 | 31.10 | 27.00 | 28.28 |
| Beaches | P | 60.00 | 84.19 | 66.00 | 90.00 | 66.00 | 87.50 |
| | R | 12.00 | 16.83 | 13.20 | 18.00 | 13.20 | 17.50 |
| | F | 20.00 | 28.00 | 22.00 | 30.00 | 22.00 | 29.16 |
| Building | P | 67.00 | 87.30 | 72.00 | 96.67 | 78.75 | 100 |
| | R | 13.40 | 17.46 | 14.40 | 19.33 | 15.75 | 20.00 |
| | F | 22.33 | 29.10 | 24.00 | 32.21 | 26.25 | 33.33 |
| Buses | P | 95.00 | 100 | 70.00 | 100 | 96.25 | 100 |
| | R | 19.00 | 20.00 | 14.00 | 20.00 | 19.25 | 20.00 |
| | F | 31.66 | 33.33 | 23.33 | 33.33 | 32.08 | 33.33 |
| Dinosaurs | P | 100 | 97.99 | 78.00 | 100 | 100 | 100 |
| | R | 20.00 | 19.59 | 15.60 | 20.00 | 20.00 | 20.00 |
| | F | 33.33 | 32.65 | 26.00 | 33.33 | 33.33 | 33.33 |
| Elephants | P | 95.00 | 91.60 | 96.00 | 100 | 70.75 | 100 |
| | R | 19.00 | 18.32 | 19.20 | 20.00 | 14.15 | 20.00 |
| | F | 31.66 | 30.53 | 32.00 | 33.33 | 23.58 | 33.33 |
| Flowers | P | 100 | 98.03 | 96.00 | 96.67 | 95.75 | 100 |
| | R | 20.00 | 19.60 | 19.20 | 19.33 | 19.15 | 20.00 |
| | F | 33.33 | 32.66 | 32.00 | 32.21 | 31.91 | 33.33 |
| Horses | P | 100 | 100 | 82.00 | 100 | 98.75 | 100 |
| | R | 20.00 | 20.00 | 16.40 | 20.00 | 19.75 | 20.00 |
| | F | 33.33 | 33.33 | 27.33 | 33.33 | 32.91 | 33.33 |
| Mountain | P | 63.00 | 90.70 | 67.00 | 83.83 | 67.75 | 93.33 |
| | R | 12.60 | 18.14 | 13.40 | 16.76 | 13.55 | 18.66 |
| | F | 21.00 | 30.23 | 22.33 | 27.93 | 22.58 | 31.10 |
| Foods | P | 71.00 | 100 | 100 | 96.83 | 77.25 | 100 |
| | R | 14.20 | 20.00 | 20.00 | 19.36 | 15.45 | 20.00 |
| | F | 23.66 | 33.33 | 33.33 | 32.26 | 25.75 | 33.33 |
| Average | P | 83.30 | 94.58 | 79.70 | 95.73 | 83.22 | |
| | R | 16.66 | 18.91 | 15.90 | 19.14 | 16.64 | |
| | F | 27.76 | 31.51 | 26.51 | 31.90 | 27.73 | |
| z-value | | -2.8031 | -1.9876 | -2.8031 | -1.9876 | -2.8031 | -2.8031 |
| p-value | | 0.00512 | 0.0466 | 0.00512 | 0.0466 | 0.00512 | 0.00512 |
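The P, R and F rows above are numerically related: in each class the recall is one fifth of the precision, and F is the harmonic mean of the two. This is what one would expect for top-20 retrieval from classes of 100 images. The sketch below reproduces that arithmetic; the class size of 100 is an assumption inferred from the table values rather than stated in this section, and the function name `retrieval_metrics` is hypothetical.

```python
def retrieval_metrics(n_relevant_retrieved, n_retrieved=20, n_relevant_total=100):
    """Precision, recall and F-measure (all in percent) for one query.

    n_relevant_total=100 is an assumed class size, chosen because it
    reproduces the recall values in the table above.
    """
    precision = n_relevant_retrieved / n_retrieved
    recall = n_relevant_retrieved / n_relevant_total
    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return 100 * precision, 100 * recall, 100 * f_measure
```

For example, an average of 16.4 relevant images among the top 20 gives a precision of 82.00, a recall of 16.40 and an F-measure of about 27.33, which are the FIF-IRS values for the African Tribes class.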
Fig 7. Top-20 retrieved images against query image (class: African tribes) (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Fig 8. Top-20 retrieved images against query image (class: Elephants) (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Fig 9. Sample images of the Wang-B dataset (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Performance comparison of the proposed method with state-of-the-art methods on the Wang-B dataset.
| Performance metrics | GMM-mSpatiogram | SIFT-SURF | LIOP-LBPV | CM-DWT-CEDD | Proposed Method |
|---|---|---|---|---|---|
| mAP | 74.10 | 74.95 | 76.02 | 86.33 | |
| Avg. recall | 13.80 | 14.99 | 15.20 | 17.26 | |
| Avg. F-measure | 23.26 | 24.98 | 25.33 | 28.76 | |
| z-value | -2.8030 | -2.8031 | -2.8032 | -2.8036 | -2.8031 |
| p-value | 0.00512 | 0.00512 | 0.00514 | 0.00517 | 0.00512 |
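The z-values and p-values in these tables come from the Wilcoxon matched-pairs signed-rank test. The sketch below implements the test with the usual normal approximation, using only the standard library. The scores in the example are hypothetical, since this section does not say which paired samples were compared, and the sketch applies no tie or continuity correction. It does show that 10 pairs which all favor the same method, with no tied differences, give z of about -2.803, the magnitude that recurs in the tables.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (W, z, two-sided p) using the normal approximation."""
    # Pairs with zero difference are dropped, as in the classic procedure.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p

# Hypothetical per-class scores, for illustration only.
baseline = [70, 72, 65, 80, 90, 85, 88, 91, 60, 75]
proposed = [71, 74, 68, 84, 95, 91, 95, 99, 69, 85]
w, z, p = wilcoxon_signed_rank(baseline, proposed)
```

The exact p-value obtained this way can differ slightly in the last digits from published values, depending on how z is rounded before the normal table lookup.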
Fig 10. Top-20 retrieved images against query image (class: Bus) (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Fig 11. Top-20 retrieved images against query image (class: Tiger) (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Fig 12. Sample images of the Wang 10k dataset (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Performance comparison of the proposed method with state-of-the-art methods on the Wang 10k dataset.
| Performance metrics | GLCM-GSF-HSVCM | CM-LBP-CED | PUD | N3G-MFR | ResNet | Proposed Method |
|---|---|---|---|---|---|---|
| mAP | 56.4 | 59.98 | 58.46 | 65 | 74.60 | |
| Avg. recall | 11.28 | 11.99 | 11.69 | 13 | 14.92 | |
| Avg. F-measure | 18.8 | 19.98 | 19.48 | 21.66 | 24.86 | |
| z-value | -2.8031 | -2.8033 | -2.8032 | -2.8035 | -2.8037 | -2.8031 |
| p-value | 0.00512 | 0.00512 | 0.00512 | 0.00513 | 0.00515 | 0.00512 |
Fig 13. Top-20 retrieved images against a query image (class: Car) (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Fig 14. Top-20 retrieved images against a query image (class: Train) (reprinted from [3] under a CC BY license, with permission from J. Z. WANG, original copyright [2003]).
Fig 15. Sample images of the OT Scene dataset [64].
Performance comparison of the proposed method with state-of-the-art methods on the OT Scene dataset.
| Performance metrics | CDH-ART | B-T-Morph | SIFT-SURF | HWVP | ISA-SPM | FC-GPHOG | Proposed Method |
|---|---|---|---|---|---|---|---|
| mAP | 51.04 | 60.7 | 69.75 | 77.2 | 86.29 | 89.6 | |
| Avg. recall | 10.20 | 12.14 | 13.95 | 15.44 | 17.25 | 17.92 | |
| Avg. F-measure | 17 | 20.23 | 23.25 | 25.73 | 28.75 | 29.86 | |
| z-value | -2.8031 | -2.8031 | -2.8033 | -2.8033 | -2.8035 | -2.8036 | -2.8033 |
| p-value | 0.00512 | 0.00512 | 0.00513 | 0.00513 | 0.00514 | 0.00514 | 0.00513 |
Fig 16. Top-20 retrieved images according to the query image of the OT Scene dataset (class: open country).
Fig 17. Top-20 retrieved images according to the query image of the OT Scene dataset (class: inside a city).
Fig 18. Sample images of the Caltech-256 dataset [66].
Performance comparison of the proposed method with state-of-the-art methods on the Caltech-256 dataset.
| Performance metrics | FC-GPHOG | ACEnet | Balanced tree structures | ResNet-HAM | Proposed Method |
|---|---|---|---|---|---|
| mAP | 33 | 36.99 | 38.56 | 74.7 | |
| Avg. recall | 6.6 | 7.39 | 7.71 | 14.94 | |
| Avg. F-measure | 11 | 12.31 | 12.85 | 24.9 | |
| z-value | -2.8025 | -2.8026 | -2.8027 | -2.8034 | -2.8027 |
| p-value | 0.00510 | 0.00510 | 0.00510 | 0.00513 | 0.00510 |
Fig 19. Top-20 retrieved images according to the query image of the Caltech-256 dataset (class: French Horn).
Fig 20. Top-20 retrieved images according to the query image of the Caltech-256 dataset (class: Grapes).
Fig 21. Effect of the number of hidden neurons on the accuracy of the proposed method.
Fig 22. Performance analysis of the proposed method in terms of the precision-recall curve.
Computational time (in seconds) of the proposed method and its comparative analysis with competitive methods of CBIR on the Wang-A dataset.
| CM-LBP-CED | FIF-IRS | Spatial color-Shape | DNN-SAR | SURF-HOG | CHLDP-DSIFT | MDGHM-SURF-ORB | Proposed Method |
|---|---|---|---|---|---|---|---|
| 1.1087 | 1.46 | 1.34 | 1.26 | 0.7845 | 0.7837 | 0.5124 | 0.47 |
Computational time (in seconds) of the proposed method and its comparative analysis with competitive methods of CBIR on the Caltech-256 dataset.
| No. of images retrieved | Proposed method | DNN-SAR | Spatial color-Shape |
|---|---|---|---|
| 10 | 0.28 | 0.93 | 1.06 |
| 15 | 0.76 | 1.0 | 1.11 |
| 20 | 0.9 | 1.07 | 1.19 |
| 25 | 0.94 | 1.11 | 1.25 |
| 30 | 1.01 | 1.16 | 1.26 |