Soraia M Alarcão, Vânia Mendonça, Carolina Maruta, Manuel J Fonseca.
Abstract
One of the main challenges in content-based image retrieval (CBIR) systems is choosing discriminative and compact features, from among dozens, to represent the images under comparison. Over the years, considerable effort has gone into combining multiple features, mainly through early, late, and hierarchical fusion techniques. Finding the best combination of features is highly domain-specific and depends on the type of image. Designing a CBIR system for a new dataset or domain therefore involves a large experimentation overhead and leads to multiple fine-tuned CBIR systems. It would be desirable to find the best combination of CBIR systems dynamically, without such extensive experimentation and without prior domain knowledge. In this paper, we propose ExpertosLF, a model-agnostic, interpretable late fusion technique based on online learning with expert advice, which dynamically combines CBIR systems without knowing a priori which ones are best for a given domain. At each query, ExpertosLF uses the user's feedback to determine each CBIR system's contribution to the ensemble for the following queries. ExpertosLF produces an interpretable ensemble that is independent of the dataset and domain. Moreover, ExpertosLF is designed to be modular and scalable. Experiments on 13 benchmark datasets from the Biomedical, Real, and Sketch domains revealed that: (i) ExpertosLF surpasses the performance of state-of-the-art late fusion techniques; and (ii) it quickly converges to the performance of the best CBIR sets across domains without any previous domain knowledge (in most cases, fewer than 25 queries need to receive human feedback).
Keywords: Content-based image retrieval; Late fusion; Online learning; Prediction with expert advice; Relevance feedback
Year: 2022 PMID: 36035324 PMCID: PMC9391217 DOI: 10.1007/s11042-022-13119-0
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
Fig. 1 Example of a query image and three related images that illustrate the semantic gap between high-level concepts and low-level features: only Fig. 1d is similar according to both visual and semantic features. (figure best seen in color)
Fig. 2 Architecture of the typical CBIR setting. The retrieved images with a green border represent relevant images, while a red border represents non-relevant images for the given query. (figure best seen in color)
Fig. 3 Overview of our late fusion of multiple CBIR systems with expert advice. (figure best seen in color)
Fig. 4 Example of an iteration using our online CBIR setting. The orange arrows represent the new weight for each expert. (figure best seen in color)
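
The per-query weight update described in the abstract can be illustrated with a generic prediction-with-expert-advice scheme. The sketch below uses the standard exponentially weighted (Hedge-style) update as a stand-in. This record does not give the paper's exact update rule, loss, or learning rate, so `eta`, the function names, the weighted-vote fusion, and the loss values are all assumptions made for illustration.

```python
import math

def update_weights(weights, losses, eta=0.5):
    """Hedge-style update: scale each expert by exp(-eta * loss), then renormalise.

    `eta` is an assumed learning rate; losses are assumed to lie in [0, 1].
    """
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def fuse_rankings(rankings, weights, k=10):
    """Assumed late fusion: every expert votes for its retrieved images
    with its current weight, and the k highest-scoring images are returned."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for image_id in ranking:
            scores[image_id] = scores.get(image_id, 0.0) + weight
    # Sort by descending score; break ties by image id so the output is deterministic.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]

# Three hypothetical experts (CBIR systems) that start with equal weights.
weights = [1 / 3, 1 / 3, 1 / 3]
rankings = [["a", "b", "c"], ["a", "d", "e"], ["f", "g", "h"]]
fused = fuse_rankings(rankings, weights, k=3)

# Hypothetical losses derived from user feedback on this query (lower is better).
losses = [0.2, 0.5, 1.0]
new_weights = update_weights(weights, losses)
```

After one update, the expert with the smallest loss holds the largest share of the ensemble, which is the behaviour the orange arrows in Fig. 4 depict.
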
Summary of the color, shape, texture, and joint descriptors selected for study. The column ‘#’ indicates the feature vector length
| Feature | # | Short Description |
|---|---|---|
| Auto Color Correlogram | 1024 | Color histogram combined with spatial correlation between identical colors. |
| Color Histogram | 128 | HSB color histogram with 8 bins for hue, 4 for saturation, and 4 for brightness. |
| Itten Contrasts | 14 | Histogram for saturation (avg., low, middle, and high), lightness (avg., very light, light, middle, dark, and very dark), hue (avg., warm and cold), and contrast. |
| Opponent Histogram | 64 | Combination of 1D histograms based on the channels of the opponent color space, where |
| Reference Color Similarity | 77 | Average pixel color similarity of 77 colors spaced evenly in HSV color space (18 hues with 100% and 50% each in saturation and brightness, plus 5 gray values). |
| Edge Histogram | 80 | 5-bin histogram counting edges in vertical, horizontal, 35°, 135°, and non-directional directions (image divided into 16 equal-sized, non-overlapping blocks). |
| Edges | 6 | Number of edges along vertical, horizontal, 35°, 135°, non-directional, and all. |
| Haralick | 14 | Relative frequency distribution that describes how often one gray tone appears in a specific spatial relationship to another gray tone in the image. |
| Tamura | 18 | Histogram for coarseness, contrast, and directionality. |
| Color and Edge Directive Descriptor | 144 | Histogram made up of 6 regions determined by texture, where each region is made up of 24 individual HSV fuzzy color regions. |
| Fuzzy Color and Texture Histogram | 192 | Histogram made up of 8 regions determined by texture (Haar Wavelet), where each region is made up of 24 individual regions resulting from the combination of YIQ and HSV color fuzzy systems. |
| Joint Composite Descriptor | 168 | Combines CEDD and FCTH. It is made up of 7 texture areas, with each area made up of 24 color regions. |
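
To make the feature vector lengths concrete, the Color Histogram row can be worked through: 8 hue bins × 4 saturation bins × 4 brightness bins gives the 128 entries listed in the table. The sketch below is a minimal illustration using uniform bin edges and Python's standard `colorsys` module. The quantisation actually used by the descriptor is not stated in this record, so the bin boundaries, the bin ordering, and the function names are assumptions.

```python
import colorsys

# Bin counts taken from the table: 8 * 4 * 4 = 128 histogram entries.
H_BINS, S_BINS, B_BINS = 8, 4, 4

def hsb_bin(r, g, b):
    """Map one 8-bit RGB pixel to a bin index in [0, 127] (uniform bins assumed)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    # Clamp so that a component equal to 1.0 falls into the last bin.
    hi = min(int(h * H_BINS), H_BINS - 1)
    si = min(int(s * S_BINS), S_BINS - 1)
    bi = min(int(v * B_BINS), B_BINS - 1)
    return (hi * S_BINS + si) * B_BINS + bi

def color_histogram(pixels):
    """Return a normalised 128-bin HSB histogram for an iterable of RGB tuples."""
    pixels = list(pixels)
    hist = [0.0] * (H_BINS * S_BINS * B_BINS)
    for r, g, b in pixels:
        hist[hsb_bin(r, g, b)] += 1
    count = len(pixels)
    return [c / count for c in hist] if count else hist

# Tiny hypothetical "image": two red pixels, one black, one white.
hist = color_histogram([(255, 0, 0), (255, 0, 0), (0, 0, 0), (255, 255, 255)])
```

Two images can then be compared by any distance between their 128-dimensional histograms.
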
Summary of the semantic descriptors selected for study. The column ‘#’ indicates the feature vector length
| Feature | # | Short Description |
|---|---|---|
| Adjective-Noun Pairs | 2089 | Computes the probability for each possible adjective-noun pair. |
| Adjectives | 231 | Computes the probability of the most relevant adjectives from adjective-noun pairs. |
| Nouns | 424 | Computes the probability of the most relevant nouns from adjective-noun pairs. |
| General Concepts | 7786 | Probability of the presence of relevant general concepts (objects, moods, etc.). |
Summary of the datasets
| Dataset | #Images | #Categories | #Relevant (per category) |
|---|---|---|---|
| BrainCE-MRI | 3064 | 3 | 708 - 1426 |
| BreakHis | 7909 | 8 | 444 - 3451 |
| COVID19-Rx | 3886 | 3 | 1200 - 1345 |
| HAM10000 | 10015 | 7 | 115 - 6705 |
| IRMA | 14410 | 193 | 1 - 2343 |
| PlantPathology | 1821 | 4 | 91 - 622 |
| CopyDays | 3212 | 157 | 20 - 24 |
| COREL1K | 1000 | 10 | 100 |
| COREL10K | 10000 | 100 | 97 - 103 |
| GHIM10K | 10000 | 20 | 500 |
| $P | 4802 | 16 | 299 - 301 |
| ImiSketchS | 1871 | 13 | 43 - 372 |
| mCali | 8159 | 24 | 339 - 340 |
Fig. 5 Example images from the Biomedical datasets. (figure best seen in color)
Fig. 6 Example images from the Real datasets. (figure best seen in color)
Fig. 7 Example images from the Sketch datasets. (figure best seen in color)
Results for Biomedical datasets using only visual descriptors. Bold values indicate the best results
| Dataset | Method | Metric | avgP@10 | avgF1 | mAP |
|---|---|---|---|---|---|
| BrainCE-MRI | BestExpert (texture) | - | 0.67 | ||
| EF | - | 0.49 | 0.33 | ||
| FreqRankLF | - | 0.54 | 0.50 | 0.28 | |
| SimRankLF | - | 0.60 | 0.46 | 0.26 | |
| ExpertosLF_V | Jaccard | 0.67 | |||
| BreakHis | BestExpert (color) | - | |||
| EF | - | 0.75 | 0.20 | ||
| FreqRankLF | - | 0.07 | 0.35 | 0.17 | |
| SimRankLF | - | 0.53 | 0.27 | 0.11 | |
| ExpertosLF_V | SorensenDice | 0.77 | |||
| COVID19-Rx | BestExpert (shape) | - | 0.87 | ||
| EF | - | 0.59 | 0.45 | ||
| FreqRankLF | - | 0.34 | 0.54 | 0.40 | |
| SimRankLF | - | 0.81 | 0.55 | 0.37 | |
| ExpertosLF_V | SorensenDice | 0.87 | |||
| HAM10000 | BestExpert (color) | - | 0.68 | 0.52 | 0.38 |
| EF | - | 0.38 | |||
| FreqRankLF | - | 0.04 | 0.50 | 0.23 | |
| SimRankLF | - | 0.68 | 0.52 | ||
| ExpertosLF_V | Overlap | 0.70 | 0.52 | 0.37 | |
| IRMA | BestExpert (shape) | - | 0.68 | 0.28 | |
| EF | - | ||||
| FreqRankLF | - | 0.05 | 0.22 | 0.09 | |
| SimRankLF | - | 0.46 | 0.28 | 0.16 | |
| ExpertosLF_V | SorensenDice | 0.67 | 0.28 | ||
| Plant Pathology | BestExpert (shape) | - | 0.44 | 0.13 | |
| EF | - | ||||
| FreqRankLF | - | 0.31 | 0.32 | 0.11 | |
| SimRankLF | - | 0.47 | 0.32 | 0.12 | |
| ExpertosLF_V | OtsukaOchiai | 0.54 | 0.13 |
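
The three columns reported in the results tables (avgP@10, avgF1, mAP) are averages over queries of per-query scores. The sketch below shows the common textbook definitions of those per-query scores. The exact variants used in the paper (for example, the cut-off used for F1 or the normalisation of average precision) are not given in this record, so the definitions and function names here are assumptions.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k retrieved images that are relevant."""
    return sum(1 for image_id in ranked[:k] if image_id in relevant) / k

def f1_score(retrieved, relevant):
    """Harmonic mean of precision and recall over the retrieved list."""
    hits = len(set(retrieved) & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant image appears,
    normalised by the total number of relevant images (one common convention)."""
    hits, total = 0, 0.0
    for rank, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical query: four retrieved images, three relevant images in the dataset.
ranked = ["a", "x", "b", "y"]
relevant = {"a", "b", "c"}
p_at_2 = precision_at_k(ranked, relevant, k=2)
f1 = f1_score(ranked, relevant)
ap = average_precision(ranked, relevant)
```

Averaging `precision_at_k(..., k=10)`, `f1_score`, and `average_precision` over all queries of a dataset would give values comparable in kind to the table columns.
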
Results for Biomedical datasets using visual and semantic descriptors. Bold values indicate the best results
| Dataset | Method | Metric | avgP@10 | avgF1 | mAP |
|---|---|---|---|---|---|
| BrainCE-MRI | BestExpert (texture) | - | 0.67 | ||
| EF | - | 0.49 | 0.33 | ||
| FreqRankLF | - | 0.47 | 0.48 | 0.26 | |
| SimRankLF | - | 0.60 | 0.47 | 0.28 | |
| ExpertosLF_VS | Jaccard | 0.67 | |||
| BreakHis | BestExpert (color) | - | |||
| EF | - | 0.77 | 0.34 | 0.18 | |
| FreqRankLF | - | 0.06 | 0.29 | 0.11 | |
| SimRankLF | - | 0.55 | 0.27 | 0.11 | |
| ExpertosLF_VS | SorensenDice | 0.77 | |||
| COVID19-Rx | BestExpert (shape) | - | 0.87 | 0.61 | 0.46 |
| EF | - | 0.62 | |||
| FreqRankLF | - | 0.48 | 0.58 | 0.46 | |
| SimRankLF | - | 0.86 | 0.59 | 0.44 | |
| ExpertosLF_VS | Overlap | 0.85 | 0.48 | ||
| HAM10000 | BestExpert (color) | - | 0.68 | 0.37 | |
| EF | - | ||||
| FreqRankLF | - | 0.04 | 0.50 | 0.24 | |
| SimRankLF | - | 0.67 | 0.51 | 0.35 | |
| ExpertosLF_VS | Overlap | 0.70 | 0.52 | 0.37 | |
| IRMA | BestExpert (shape) | - | 0.68 | 0.39 | 0.28 |
| EF | - | ||||
| FreqRankLF | - | 0.09 | 0.30 | 0.14 | |
| SimRankLF | - | 0.49 | 0.28 | 0.17 | |
| ExpertosLF_VS | SorensenDice | 0.67 | 0.39 | 0.28 | |
| Plant Pathology | BestExpert (semantic) | - | |||
| EF | - | 0.63 | 0.35 | 0.17 | |
| FreqRankLF | - | 0.36 | 0.17 | ||
| SimRankLF | - | 0.58 | 0.37 | 0.17 | |
| ExpertosLF_VS | OtsukaOchiai |
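
The Metric column of the ExpertosLF rows names four set-similarity coefficients: Jaccard, Sørensen-Dice, Overlap, and Otsuka-Ochiai. Their standard definitions over two sets are sketched below. How the paper feeds them into the weight update is not stated in this record; comparing an expert's retrieved set with the set the user marked as relevant, and using 1 minus the similarity as a loss, is only one plausible reading and is labeled as an assumption in the code.

```python
import math

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|"""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def sorensen_dice(a, b):
    """2 |A ∩ B| / (|A| + |B|)"""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0

def overlap(a, b):
    """|A ∩ B| / min(|A|, |B|)"""
    smallest = min(len(a), len(b))
    return len(a & b) / smallest if smallest else 0.0

def otsuka_ochiai(a, b):
    """|A ∩ B| / sqrt(|A| * |B|)"""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0

# Hypothetical sets: images one expert retrieved vs. images the user marked relevant.
retrieved = {"img1", "img2", "img3", "img4"}
relevant = {"img3", "img4", "img5"}

similarities = {
    "Jaccard": jaccard(retrieved, relevant),
    "SorensenDice": sorensen_dice(retrieved, relevant),
    "Overlap": overlap(retrieved, relevant),
    "OtsukaOchiai": otsuka_ochiai(retrieved, relevant),
}
# Assumption: a loss in [0, 1] could be derived as 1 - similarity.
losses = {name: 1.0 - value for name, value in similarities.items()}
```

All four coefficients lie in [0, 1] and equal 1 when the two sets are identical, but they penalise size mismatches differently, which may be why different datasets favour different coefficients in the tables.
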
Fig. 8 Evolution of the weights for each expert, and evolution of F1 over queries. (figure best seen in color)
Results for Real datasets using only visual descriptors. Bold values indicate the best results
| Dataset | Method | Metric | avgP@10 | avgF1 | mAP |
|---|---|---|---|---|---|
| CopyDays | BestExpert (joint) | - | 0.88 | ||
| EF | - | 0.70 | 0.69 | ||
| FreqRankLF | - | 0.75 | 0.62 | ||
| SimRankLF | - | 0.82 | 0.64 | 0.60 | |
| ExpertosLF_V | SorensenDice | 0.88 | |||
| COREL1K | BestExpert (color) | - | 0.78 | 0.49 | 0.40 |
| EF | - | ||||
| FreqRankLF | - | 0.27 | 0.50 | 0.32 | |
| SimRankLF | - | 0.40 | 0.27 | 0.14 | |
| ExpertosLF_V | Overlap | 0.67 | 0.53 | 0.40 | |
| COREL10K | BestExpert (color) | - | 0.51 | 0.22 | 0.15 |
| EF | - | ||||
| FreqRankLF | - | 0.12 | 0.21 | 0.09 | |
| SimRankLF | - | 0.17 | 0.07 | 0.03 | |
| ExpertosLF_V | OtsukaOchiai | 0.57 | 0.25 | 0.11 | |
| GHIM10K | BestExpert (joint) | - | 0.58 | 0.25 | 0.12 |
| EF | - | ||||
| FreqRankLF | - | 0.12 | 0.25 | 0.10 | |
| SimRankLF | - | 0.24 | 0.12 | 0.02 | |
| ExpertosLF_V | SorensenDice | 0.57 | 0.25 | 0.11 |
Results for Real datasets using visual and semantic descriptors. Bold values indicate the best results
| Dataset | Method | Metric | avgP@10 | avgF1 | mAP |
|---|---|---|---|---|---|
| CopyDays | BestExpert (joint) | - | 0.88 | 0.74 | 0.73 |
| EF | - | ||||
| FreqRankLF | - | 0.65 | 0.66 | 0.51 | |
| SimRankLF | - | 0.85 | 0.66 | 0.63 | |
| ExpertosLF_VS | Jaccard | 0.88 | 0.74 | 0.73 | |
| COREL1K | BestExpert (semantic) | - | |||
| EF | - | 0.92 | 0.91 | ||
| FreqRankLF | - | 0.86 | 0.89 | ||
| SimRankLF | - | 0.78 | 0.59 | 0.52 | |
| ExpertosLF_VS | OtsukaOchiai | ||||
| COREL10K | BestExpert (semantic) | - | 0.87 | ||
| EF | - | ||||
| FreqRankLF | - | 0.41 | 0.52 | ||
| SimRankLF | - | 0.25 | 0.17 | 0.09 | |
| ExpertosLF_VS | SorensenDice | 0.86 | |||
| GHIM10K | BestExpert (semantic) | - | |||
| EF | - | 0.97 | 0.76 | 0.70 | |
| FreqRankLF | - | 0.45 | 0.72 | ||
| SimRankLF | - | 0.27 | 0.25 | 0.10 | |
| ExpertosLF_VS | SorensenDice |
Fig. 9 Evolution of the weights for each expert, and evolution of F1 over queries. (figure best seen in color)
Results for Sketch datasets using only visual descriptors. Bold values indicate the best results
| Dataset | Method | Metric | avgP@10 | avgF1 | mAP |
|---|---|---|---|---|---|
| $P | BestExpert (shape) | - | 0.94 | 0.58 | 0.50 |
| EF | - | ||||
| FreqRankLF | - | 0.24 | 0.45 | 0.28 | |
| SimRankLF | - | 0.88 | 0.44 | 0.31 | |
| ExpertosLF_V | OtsukaOchiai | 0.94 | 0.58 | 0.50 | |
| ImiSketchS | BestExpert (joint) | - | 0.43 | 0.33 | 0.16 |
| EF | - | 0.19 | |||
| FreqRankLF | - | 0.27 | 0.33 | ||
| SimRankLF | - | 0.50 | 0.29 | 0.14 | |
| ExpertosLF_V | Overlap | 0.48 | 0.17 | ||
| mCali | BestExpert (shape) | - | 0.52 | 0.21 | 0.10 |
| EF | - | ||||
| FreqRankLF | - | 0.06 | 0.18 | 0.06 | |
| SimRankLF | - | 0.48 | 0.16 | 0.06 | |
| ExpertosLF_V | SorensenDice | 0.43 | 0.21 | 0.10 |
Results for Sketch datasets using visual and semantic descriptors. Bold values indicate the best results
| Dataset | Method | Metric | avgP@10 | avgF1 | mAP |
|---|---|---|---|---|---|
| $P | BestExpert (semantic) | - | 0.68 | 0.61 | |
| EF | - | ||||
| FreqRankLF | - | 0.36 | 0.68 | 0.51 | |
| SimRankLF | - | 0.62 | 0.42 | 0.24 | |
| ExpertosLF_VS | Overlap | 0.89 | 0.61 | ||
| ImiSketchS | BestExpert (semantic) | - | 0.74 | 0.28 | |
| EF | - | ||||
| FreqRankLF | - | 0.33 | 0.26 | ||
| SimRankLF | - | 0.36 | 0.26 | 0.10 | |
| ExpertosLF_VS | SorensenDice | 0.74 | 0.28 | ||
| mCali | BestExpert (semantic) | - | 0.82 | 0.33 | |
| EF | - | ||||
| FreqRankLF | - | 0.16 | 0.26 | ||
| SimRankLF | - | 0.33 | 0.25 | 0.11 | |
| ExpertosLF_VS | SorensenDice | 0.82 | 0.33 |
Fig. 10 Evolution of the weights for each expert, and evolution of F1 over queries. (figure best seen in color)
Fig. 11 Biomedical: PlantPathology (Q = 800), BrainCE-MRI, COVID19-Rx (Q = 2000), BreakHis (Q = 6000), and IRMA and HAM10000 (Q = 9000). (figure best seen in color)
Fig. 12 Real: CopyDays (Q = 800), COREL10K, and GHIM10K (Q = 9000). (figure best seen in color)
Fig. 13 Sketch: ImiSketchS (Q = 800), $P (Q = 2000), and mCali (Q = 6000). (figure best seen in color)
Fig. 14 Average elapsed time of performing a query on the datasets IRMA, GHIM10K, and mCali, using different fusion approaches. (figure best seen in color)
Fig. 15 Distribution of the experts' weights per dataset. (figure best seen in color)
Fig. 16 Difference in performance between ExpertosLF_VS (visual and semantic experts) and ExpertosLF_V (visual experts only). Darker shades of green mean that ExpertosLF_VS performs better than ExpertosLF_V. (figure best seen in color)