Davood Zabihzadeh¹, Amar Tuama², Ali Karami-Mollaee³, Seyed Jalaleddin Mousavirad¹
Abstract
An important challenge in metric learning is scalability to both the size and the dimension of the input data. Online metric learning algorithms have been proposed to address this challenge. Most existing methods are based on the Passive/Aggressive (PA) approach, which lets them process large volumes of data rapidly with an adaptive learning rate. However, these algorithms rely on the Hinge loss and are therefore not robust against outliers and label noise. We address this limitation by formulating the online Distance/Similarity learning problem with the robust Rescaled Hinge loss function. The proposed model is general and can be applied to any PA-based online Distance/Similarity algorithm. To achieve scalability to the data dimension, we propose low-rank online Distance/Similarity methods that learn a rectangular projection matrix instead of a full Mahalanobis matrix. The low-rank approaches not only reduce the computational cost but also preserve the discrimination power of the learned metrics. In addition, current online methods usually assume that training triplets or pairwise constraints are available in advance. This assumption often does not hold in practice, and generating triplets with existing batch sampling methods is costly in both time and memory. We address this issue by developing an efficient yet effective, robust one-pass triplet construction algorithm. We conduct several experiments on datasets from various applications. The results confirm that, in the presence of label noise and outliers, the proposed methods significantly outperform state-of-the-art online metric learning methods by a large margin.
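A minimal sketch of why a rectangular projection keeps the metric valid and cheap, assuming the distance is parameterized as d_L(x, y) = ||L(x − y)||² with a k × D projection L (the names `L`, `k`, `D` and all helper functions are illustrative, not the authors' code). Since M = LᵀL is positive semi-definite for any real L, no p.s.d. projection step is needed, and one distance evaluation costs O(kD) instead of O(D²).

```python
import random


def matvec(A, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def low_rank_sq_distance(L, x, y):
    """Squared distance ||L (x - y)||^2 for a rectangular k x D projection L."""
    z = matvec(L, [a - b for a, b in zip(x, y)])
    return sum(c * c for c in z)


def mahalanobis_sq_distance(M, x, y):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a full D x D matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(a * b for a, b in zip(diff, matvec(M, diff)))


def gram(L):
    """Return M = L^T L (D x D), which is p.s.d. for any real L."""
    D = len(L[0])
    return [[sum(row[i] * row[j] for row in L) for j in range(D)] for i in range(D)]


random.seed(0)
D, k = 6, 2  # input dimension and smaller projection rank (illustrative values)
L = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(k)]
x = [random.gauss(0.0, 1.0) for _ in range(D)]
y = [random.gauss(0.0, 1.0) for _ in range(D)]

print(low_rank_sq_distance(L, x, y))           # O(kD) work per distance
print(mahalanobis_sq_distance(gram(L), x, y))  # same value via the full matrix, O(D^2) per distance
print("parameters:", k * D, "vs", D * D)
```

Both printed distances agree up to floating-point rounding, while the projection stores k·D instead of D² parameters.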
Keywords: Label noise; Metric learning; One pass triplet construction; Online distance/similarity learning; Rescaled hinge loss; Robust algorithm
Year: 2022 PMID: 35469120 PMCID: PMC9020766 DOI: 10.1007/s10489-022-03419-1
Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.086
Summary of the main notations
| Notation | Description |
|---|---|
| | Distance or similarity matrix at time t |
| | Linear projection matrix at time t |
| | Incoming triplet at time t |
| | Bilinear similarity function |
| | Mahalanobis distance |
| | p.s.d. matrix |
| | The Hinge loss |
| | The Rescaled Hinge loss |
| η | Rescaling parameter of the Rescaled Hinge loss |
| HQ | Half-Quadratic |
| | Auxiliary variable of the HQ algorithm at time t |
| | Maximum number of iterations in the HQ algorithm |
| | Slack variable |
| | Weight of the triplet at time t |
| | Adaptive learning rate |
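The bilinear similarity, the Mahalanobis-type distance, and the Hinge loss listed above combine on an incoming triplet (x, x⁺, x⁻) roughly as follows. This is a sketch under common conventions (unit margin; squared Euclidean distance standing in for a learned metric); the function names are hypothetical.

```python
def bilinear_similarity(W, x, y):
    """Bilinear similarity S_W(x, y) = x^T W y, with W a D x D list of rows."""
    return sum(x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def sq_euclidean(x, y):
    """Squared Euclidean distance, i.e. a Mahalanobis distance with M = I."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def similarity_triplet_hinge(W, x, x_pos, x_neg, margin=1.0):
    """Zero only when S_W(x, x_pos) exceeds S_W(x, x_neg) by at least `margin`."""
    return max(0.0, margin - bilinear_similarity(W, x, x_pos) + bilinear_similarity(W, x, x_neg))


def distance_triplet_hinge(dist, x, x_pos, x_neg, margin=1.0):
    """Zero only when x_neg is farther from x than x_pos by at least `margin`."""
    return max(0.0, margin + dist(x, x_pos) - dist(x, x_neg))


I = [[1.0, 0.0], [0.0, 1.0]]
print(similarity_triplet_hinge(I, [1.0, 0.0], [2.0, 0.0], [0.0, 1.0]))  # margin satisfied
print(similarity_triplet_hinge(I, [1.0, 0.0], [0.5, 0.0], [0.0, 1.0]))  # margin violated
print(distance_triplet_hinge(sq_euclidean, [0.0, 0.0], [3.0, 0.0], [1.0, 0.0]))  # positive is farther
```

The similarity form rewards a large S_W for the positive pair, whereas the distance form penalizes a large distance to the positive; both reach zero once the unit margin is met.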
Advantages and limitations of existing online metric learning methods
| Method | Description | Advantages | Limitations |
|---|---|---|---|
| OASIS [ | Uses a bilinear similarity measure. Omits the p.s.d. constraint. Adopts the Hinge loss. | Scalable. | Non-Robust. More prone to overfitting. |
| OKSᵃ [ | Extends OASIS to the feature space of a kernel. | Learns a non-linear projection in the input space. | Non-Robust. Does not provide an online triplet sampling algorithm. More prone to overfitting. The model size increases over time. |
| OMKS [ | Extends OKS to multiple kernel learning. Combines the results of the individual kernels using the Hedge algorithm. | Learns a non-linear projection in the input space. High flexibility of the learned similarity. | Non-Robust. Does not provide an online triplet sampling algorithm. More prone to overfitting. The model size increases over time. |
| ODMLᵇ [ | Similar to OASIS but uses the Mahalanobis distance. Enforces the p.s.d. constraint. | Resistance against overfitting. | Non-Robust. Non-Scalable. Does not provide an online triplet sampling algorithm. |
| OMDML [ | Extends ODML to multiple kernel learning. Combines the results of the individual kernels using the Hedge algorithm. | Learns a non-linear projection in the input space. High flexibility of the learned metric. Resistance against overfitting. | Non-Robust. Non-Scalable. Does not provide an online triplet sampling algorithm. |
| SLMOML [ | Online version of the seminal ITML method. Uses LogDet regularization. | Scalable to some extent. Still, it has | Non-Robust. Does not provide an online triplet sampling algorithm. |
| LPA-ODMLᶜ [ | Learns multiple metrics, each consisting of a global (shared) and a local component. Utilizes DRP to achieve scalability. | Learns a non-linear projection. Good discrimination power. Resistance against overfitting. | Non-Robust. Does not provide an online triplet sampling algorithm. |
| OPML [ | Learns the projection matrix directly. Provides a one-pass triplet construction algorithm. | Scalable to some extent. Still, needing | Non-Robust. The triplet sampling algorithm does not consider the structure of the data. |
| OAHU [ | An online deep metric learning method. Learns a metric per network layer and combines the metrics using the Hedge algorithm. | End-to-end metric learning. Good discrimination power. Dynamically adapts the model complexity. | Non-Robust. Non-Convex. Does not provide an online triplet sampling algorithm. Too many parameters. Needs a metric embedding per layer. |
| LSMDML [ | Learns a metric for each source of multi-modal data. Fuses the metrics using a PA-based method. | Good discrimination power. Resistance against overfitting. | Non-Robust. Does not provide an online triplet sampling algorithm. |
ᵃ Online Kernel Similarity Learning
ᵇ Online Distance Metric Learning
ᶜ Local Passive/Aggressive Online Distance Metric Learning
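The table credits the speed of OASIS-style methods to the Passive/Aggressive update. Below is a minimal sketch of the standard PA-I step for a bilinear similarity; `C` is the aggressiveness parameter that caps the step size τ, and all names are ours. The robust variants proposed in this paper modify this Hinge-loss-based step, which the sketch does not attempt to reproduce.

```python
def bilinear_similarity(W, x, y):
    """S_W(x, y) = x^T W y."""
    return sum(x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def pa_step(W, x, x_pos, x_neg, C=0.1):
    """One PA-I update of W on the triplet (x, x_pos, x_neg); returns (new W, step size tau)."""
    loss = max(0.0, 1.0 - bilinear_similarity(W, x, x_pos) + bilinear_similarity(W, x, x_neg))
    diff = [p - n for p, n in zip(x_pos, x_neg)]
    V = [[xi * di for di in diff] for xi in x]     # V = x (x_pos - x_neg)^T
    sq_norm = sum(v * v for row in V for v in row)  # squared Frobenius norm of V
    if loss == 0.0 or sq_norm == 0.0:
        return W, 0.0                               # passive: margin met or no usable direction
    tau = min(C, loss / sq_norm)                    # aggressive step, capped by C
    W_new = [[w + tau * v for w, v in zip(w_row, v_row)] for w_row, v_row in zip(W, V)]
    return W_new, tau


W0 = [[1.0, 0.0], [0.0, 1.0]]
x, x_pos, x_neg = [1.0, 0.0], [0.5, 0.0], [0.0, 1.0]  # initial Hinge loss is 0.5
for C in (10.0, 0.1):
    W1, tau = pa_step(W0, x, x_pos, x_neg, C)
    after = max(0.0, 1.0 - bilinear_similarity(W1, x, x_pos) + bilinear_similarity(W1, x, x_neg))
    print(C, tau, after)
```

With a large `C` the uncapped step τ = loss/||V||² drives the triplet's loss to zero in one update; a small `C` only shrinks it, which limits how far a single (possibly mislabeled) triplet can move W.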
Fig. 1 Illustration of target neighbors and imposters of [17]
Fig. 2 The margin-based Hinge loss function. The loss grows linearly, without bound, as z decreases below 1
Fig. 3 The robust Rescaled Hinge loss function vs. z for different η values
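Figs. 2 and 3 contrast an unbounded and a bounded loss. The sketch below assumes the rescaled hinge form commonly used in robust SVMs, β[1 − exp(−η·hinge(z))] with β = 1/(1 − exp(−η)); the exact normalization used in the paper may differ. With this β the two losses agree at z = 0 and for z ≥ 1, while the rescaled loss saturates at β for large violations, so the influence of an outlier is capped.

```python
import math


def hinge(z):
    """Hinge loss max(0, 1 - z); grows without bound as z decreases."""
    return max(0.0, 1.0 - z)


def rescaled_hinge(z, eta):
    """Assumed rescaled hinge: beta * (1 - exp(-eta * hinge(z))), beta = 1 / (1 - exp(-eta)).

    beta makes the loss equal 1 at z = 0 (like the hinge loss); the loss never exceeds beta.
    """
    beta = 1.0 / (1.0 - math.exp(-eta))
    return beta * (1.0 - math.exp(-eta * hinge(z)))


zs = (-5, -1, 0, 0.5, 1, 2)
print("hinge", [hinge(z) for z in zs])
for eta in (0.5, 1.0, 3.0):
    print("eta", eta, [round(rescaled_hinge(z, eta), 3) for z in zs])
```

A larger η makes the loss saturate earlier (its bound β approaches 1), trading some sensitivity to genuine margin violations for stronger resistance to outliers.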
Fig. 4 The proposed neural network model for low-rank robust online Distance/Similarity learning
Fig. 5 Illustration of imposters of the data point x
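Figs. 1 and 5 use the target-neighbor/imposter terminology of large-margin nearest neighbor (LMNN) learning. Under that convention, a differently labeled point x_l is an imposter of x_i with respect to a target neighbor x_j when d(x_i, x_l) ≤ d(x_i, x_j) + 1. The sketch below applies this rule with squared Euclidean distance and the k nearest same-class points as targets; it illustrates the definition only and is not the paper's triplet construction algorithm.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def target_neighbors(X, y, i, k=1):
    """Indices of the k nearest points that share the label of X[i]."""
    same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
    return sorted(same, key=lambda j: sq_dist(X[i], X[j]))[:k]


def imposter_triplets(X, y, i, k=1, margin=1.0):
    """Triplets (i, j, l): j is a target neighbor of i and l is a differently labeled imposter."""
    triplets = []
    for j in target_neighbors(X, y, i, k):
        radius = sq_dist(X[i], X[j]) + margin  # perimeter of the target neighbor plus unit margin
        for l in range(len(X)):
            if y[l] != y[i] and sq_dist(X[i], X[l]) <= radius:
                triplets.append((i, j, l))
    return triplets


X = [[0.0, 0.0], [1.0, 0.0], [1.2, 0.0], [5.0, 0.0]]
y = [0, 0, 1, 1]
print(imposter_triplets(X, y, 0))
print(imposter_triplets(X, y, 2))
```

Point 2 invades the margin around point 0's target neighbor, so it yields the triplet (0, 1, 2); point 3 lies far outside that region and produces no triplet for anchor 0.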
Fig. 6 The system flow of the proposed learning/test schemes
Fig. 7 Three types of noisy triplets of the form (anchor, positive, negative): (a) anchor noisy triplet, where the anchor is contaminated with label noise; (b) positive noisy triplet, where the positive example has label noise; and (c) negative noisy triplet, where the negative example has a wrong label
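The noisy triplets in Fig. 7 tend to violate the margin badly and therefore incur a large Hinge loss h. Assuming the Rescaled Hinge loss takes the form β[1 − exp(−η·h)], its derivative with respect to h is βη·exp(−η·h), so the rescaled loss behaves like a Hinge loss in which each triplet is reweighted by exp(−η·h). Treating exp(−η·h) as the per-triplet weight is our assumption; it is consistent with the notation table's triplet weight and HQ auxiliary variable, but the exact HQ update is not given in this section. The sketch checks the derivative numerically and evaluates the weight at the mean Hinge losses reported for OCTG's normal (0.39) and noisy (1.67) triplets in the Wine table, with η = 3 as in Fig. 10.

```python
import math


def rescaled_from_hinge(h, eta):
    """Assumed Rescaled Hinge loss as a function of the Hinge loss value h >= 0."""
    beta = 1.0 / (1.0 - math.exp(-eta))
    return beta * (1.0 - math.exp(-eta * h))


def triplet_weight(h, eta):
    """Assumed HQ-style triplet weight exp(-eta * h): large violations get small weights."""
    return math.exp(-eta * h)


eta = 3.0
beta = 1.0 / (1.0 - math.exp(-eta))
eps = 1e-6
for label, h in [("normal", 0.39), ("noisy", 1.67)]:
    # Central finite difference of the rescaled loss with respect to h.
    slope = (rescaled_from_hinge(h + eps, eta) - rescaled_from_hinge(h - eps, eta)) / (2 * eps)
    # slope / (beta * eta) should equal the weight, i.e. a reweighted Hinge-loss gradient.
    print(label, round(triplet_weight(h, eta), 3), round(slope / (beta * eta), 3))
```

At these loss levels a typical noisy triplet receives a weight roughly 45 times smaller than a typical normal one, which is how a bounded loss suppresses mislabeled triplets without discarding them outright.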
Fig. 8 t-SNE visualization of the Wine dataset after applying 10% label noise
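The experiments contaminate the training labels at rates from 5% to 20%. A common protocol, assumed here because the exact procedure is not stated in this section, flips the labels of a randomly chosen fraction of the samples to a different class drawn uniformly at random. The class sizes below are those of the UCI Wine dataset (59, 71 and 48 samples); `inject_label_noise` is a hypothetical helper.

```python
import random


def inject_label_noise(labels, rate, num_classes, seed=0):
    """Flip round(rate * n) randomly chosen labels to a different, uniformly drawn class."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(rate * len(labels)))
    for i in rng.sample(range(len(labels)), n_flip):
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy


y = [0] * 59 + [1] * 71 + [2] * 48  # Wine: 178 instances, 3 classes
y_noisy = inject_label_noise(y, 0.10, num_classes=3)
print(len(y), sum(a != b for a, b in zip(y, y_noisy)))
```

Because a flipped label always moves to a different class, exactly round(0.1 × 178) = 18 labels change; other protocols (e.g. class-dependent noise) would change which triplets become noisy.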
Statistics of generated triplets in the Wine dataset contaminated with 10% label noise
| Feature | Batch (#) | Batch (mean Hinge loss) | OPML (#) | OPML (mean Hinge loss) | OCTG, ours (#) | OCTG, ours (mean Hinge loss) |
|---|---|---|---|---|---|---|
| Instances | 178 | – | 178 | – | 178 | – |
| Classes | 3 | – | 3 | – | 3 | – |
| Triplets | 413 | 0.92 | 85 | | 140 | 0.71 |
| Normal triplets | 131 | 0.85 | 46 | 0.45 | 105 | 0.39 |
| Noisy triplets | 282 | 0.96 | 39 | 0.51 | 35 | 1.67 |
| Anchor noisy triplets | 38 | 1.01 | 17 | 0.57 | 35 | 1.67 |
| Positive noisy triplets | 23 | 1.02 | 17 | 0.51 | 0 | – |
| Negative noisy triplets | 249 | 0.95 | 14 | 0.42 | 0 | – |
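Since every triplet is either normal or noisy (131 + 282 = 413, 46 + 39 = 85, 105 + 35 = 140), each overall mean Hinge loss should equal the count-weighted average of the two group means. The check below uses the table's own rounded numbers: it reproduces OCTG's 0.71, matches Batch's 0.92 to within rounding (about 0.925), and gives roughly 0.48 for OPML, whose overall mean is missing from this table. The noisy subtype counts do not sum to the noisy totals (e.g. 38 + 23 + 249 = 310 vs. 282 for Batch), so they are left out.

```python
def pooled_mean(groups):
    """Count-weighted mean of per-group means; groups is a list of (count, mean)."""
    total = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total


# (count, mean Hinge loss) for normal and noisy triplets, taken from the table
table = {
    "Batch": [(131, 0.85), (282, 0.96)],
    "OPML": [(46, 0.45), (39, 0.51)],
    "OCTG": [(105, 0.39), (35, 1.67)],
}
for method, groups in table.items():
    print(method, sum(n for n, _ in groups), round(pooled_mean(groups), 3))
```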
Fig. 9 The kNN (k = 3) accuracy of the metrics learned by various algorithms on the Wine dataset with 10% label noise
Fig. 10 The kNN accuracy of the metric learned by the Robust-LODML algorithm (η = 3) on the Wine dataset with 10% label noise
Fig. 11 The t-SNE visualization of the Wine dataset with 10% label noise, where data points are displayed (a) with equal sizes and (b) with sizes proportional to their weights
Fig. 12 Mean accuracy of kNN with Robust-LODML (k = 3) vs. η on the Wine dataset with 20% label noise
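Figs. 9 to 12 evaluate each learned metric through a kNN classifier with k = 3. A minimal sketch of that evaluation, assuming the metric is a low-rank projection L so that distances become squared Euclidean distances between projected points; the toy data and function names are illustrative. Here L keeps only the first feature and thereby removes a noisy second feature that misleads plain Euclidean kNN.

```python
from collections import Counter


def project(L, x):
    """Apply the projection L (list of rows) to x."""
    return [sum(a * b for a, b in zip(row, x)) for row in L]


def knn_predict(L, X_train, y_train, x, k=3):
    """Majority vote among the k training points nearest to x in the projected space."""
    z = project(L, x)
    Z = [project(L, xt) for xt in X_train]
    order = sorted(range(len(Z)), key=lambda i: sum((a - b) ** 2 for a, b in zip(Z[i], z)))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


X_train = [[0, 0], [0, 0.5], [0, -0.5], [1, 10], [1, 9], [1, 11]]
y_train = [0, 0, 0, 1, 1, 1]
query = [0.1, 10]
identity = [[1, 0], [0, 1]]   # plain Euclidean metric
L = [[1, 0]]                  # rank-1 projection onto the informative feature
print(knn_predict(identity, X_train, y_train, query), knn_predict(L, X_train, y_train, query))
```

With the identity metric the query sits next to class 1 because of the noisy feature; after projection, its three nearest neighbors all belong to class 0.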
Statistics and explanations of evaluated datasets
| Data Set | #classes | n | #dim | d | Description |
|---|---|---|---|---|---|
| Wine [ | 3 | 178 | 13 | 13 | Standard UCI classification dataset. |
| Letters [ | 26 | 20,000 | 16 | 16 | Includes 20,000 examples of the 26 English capital letters. The letter images were generated from 20 different fonts, and 16 numerical attributes were extracted from each image. |
| Extended Yale Faces [ | 38 | 2414 | 1024 | 200 | A standard face recognition dataset that contains 2414 face images of 38 classes. For each person, at most 64 images were taken under extreme illumination conditions. |
| Ionosphere [ | 2 | 351 | 34 | 33 | Standard UCI classification dataset. |
| WDBC [ | 2 | 569 | 32 | 30 | Breast Cancer Wisconsin (Diagnostic) Data Set |
| Australian | 2 | 690 | 14 | 14 | Statlog (Australian Credit Approval) dataset from UCI. It concerns credit card applications; all attribute names and values were anonymized to protect the confidentiality of the data. |
| German Credit | 2 | 1000 | 24 | 24 | Each instance represents a person who takes credit from a bank and is classified as a good or bad credit risk according to a set of attributes. |
kNN classification accuracy (%) using the metrics learned by the competing methods at label noise levels of 0% to 20%
| Data Set | Label noise (%) | Robust-LODML | Robust-ODML | LPA-ODML | ODML | OPML | Euclidean |
|---|---|---|---|---|---|---|---|
| Wine | 0 5 10 15 20 | 97.65 ± 4.11 96.47 ± 4.96 | 96.47 ± 6.32 97.06 ± 3.10 95.88 ± 5.58 | 97.06 ± 4.16 97.06 ± 5.00 96.47 ± 4.11 94.71 ± 6.47 90.00 ± 6.82 | 96.00 ± 4.78 93.14 ± 3.26 90.86 ± 6.52 89.14 ± 3.73 | 97.06 ± 4.16 95.88 ± 4.84 96.47 ± 4.11 94.12 ± 5.55 91.18 ± 8.43 | 95.29 ± 5.41 93.53 ± 5.154 94.12 ± 4.80 92.35 ± 6.82 85.88 ± 8.41 |
| Letters | 0 5 10 15 20 | 96.80 ± 0.25 | 96.68 ± 0.37 95.85 ± 0.61 94.36 ± 0.44 93.27 ± 0.63 92.20 ± 1.04 | 96.08 ± 0.29 93.57 ± 0.34 91.78 ± 0.28 88.69 ± 0.30 | 96.76 ± 0.29 95.98 ± 0.28 94.08 ± 0.32 91.53 ± 0.71 88.46 ± 0.40 | 96.78 ± 0.34 96.02 ± 0.37 94.29 ± 0.31 91.57 ± 0.40 88.32 ± 0.32 | 95.39 ± 0.36 94.53 ± 0.50 92.64 ± 0.51 90.03 ± 0.55 86.67 ± 0.81 |
| Extended Yale Faces | 0 5 10 15 20 | 95.52 ± 1.12 94.27 ± 1.12 93.69 ± 1.02 92.70 ± 1.78 92.37 ± 0.68 | 93.94 ± 0.90 92.86 ± 0.84 92.78 ± 1.24 91.33 ± 0.90 88.88 ± 1.26 | 93.82 ± 0.82 92.82 ± 1.05 91.70 ± 1.41 88.51 ± 1.19 85.56 ± 0.91 | 93.57 ± 0.88 92.53 ± 0.62 90.95 ± 1.23 88.71 ± 1.32 85.23 ± 1.28 | 93.36 ± 0.89 92.57 ± 0.27 91.54 ± 0.86 88.63 ± 1.03 85.56 ± 0.91 | |
| Ionosphere | 0 5 10 15 20 | 93.14 ± 3.35 90.86 ± 4.82 | 92.00 ± 3.24 91.43 ± 5.39 89.14 ± 9.31 88.00 ± 5.35 | 91.71 ± 4.75 88.86 ± 4.56 87.14 ± 6.06 83.71 ± 4.68 | 90.29 ± 4.09 89.71 ± 3.61 87.71 ± 3.82 87.71 ± 5.23 84.29 ± 6.35 | 86.57 ± 5.05 87.43 ± 4.30 86.57 ± 3.31 84.35 ± 7.31 82.32 ± 7.25 | 84.86 ± 3.29 86.00 ± 3.41 84.57 ± 4.89 81.43 ± 2.26 79.43 ± 6.29 |
| WDBC | 0 5 10 15 20 | 94.29 ± 3.01 91.07 ± 2.92 | 95.36 ± 2.55 93.57 ± 2.41 92.32 ± 3.67 | 95.18 ± 3.04 94.82 ± 2.59 93.04 ± 2.59 90.18 ± 3.69 85.89 ± 3.62 | 95.00 ± 2.64 93.93 ± 3.17 92.68 ± 2.97 88.39 ± 6.08 85.00 ± 5.60 | 95.00 ± 3.84 93.75 ± 3.88 93.57 ± 2.94 88.93 ± 2.35 85.89 ± 3.81 | 92.86 ± 3.15 92.32 ± 3.37 89.11 ± 3.81 85.71 ± 5.26 83.93 ± 6.63 |
| Australian | 0 5 10 15 20 | 85.51 ± 5.11 86.09 ± 5.17 84.78 ± 2.49 85.51 ± 3.92 | 86.23 ± 3.82 | 85.94 ± 5.68 83.77 ± 5.01 82.32 ± 4.92 79.42 ± 5.10 | 85.51 ± 3.98 84.93 ± 4.44 81.45 ± 4.62 78.26 ± 5.68 74.64 ± 6.84 | 83.62 ± 6.11 83.33 ± 5.08 81.45 ± 4.92 81.45 ± 3.54 78.84 ± 4.94 | 82.03 ± 5.43 81.30 ± 5.31 78.99 ± 7.43 76.81 ± 5.76 73.19 ± 5.12 |
| German Credit | 0 5 10 15 20 | 74.60 ± 2.07 73.20 ± 4.44 73.20 ± 3.58 | 74.70 ± 2.21 73.40 ± 5.42 71.50 ± 4.14 | 73.50 ± 4.40 72.70 ± 3.13 71.20 ± 5.09 70.30 ± 5.54 | 73.90 ± 2.38 72.40 ± 4.20 71.10 ± 5.22 68.60 ± 2.95 65.70 ± 3.68 | 72.20 ± 3.79 70.30 ± 4.50 69.40 ± 5.21 65.20 ± 3.88 63.30 ± 5.14 | 69.40 ± 4.14 67.90 ± 5.26 66.50 ± 5.99 64.30 ± 4.79 62.30 ± 6.20 |
Fig. 13 Comparison of the classification accuracy of RDML with other DML methods versus label noise
Fig. 14 Boxplots of some statistically different results at a significance level of 5%
Fig. 15 Four images from the COVID-19 dataset. First row: normal cases; second row: COVID-19 patients
Classification metrics of kNN using the learned metrics of competing methods on the COVID-19 dataset
| Method | Label noise (%) | Accuracy | Sensitivity | Precision | Specificity | G-mean | F1-Score |
|---|---|---|---|---|---|---|---|
Robust-ODML Robust-LODML LPA-ODML ODML OPML BLMNN | 0 | 99.23 ± 0.66 99.42 ± 0.57 99.36 ± 0.23 99.29 ± 0.42 99.23±0.70 | 95.35 ± 3.52 96.54 ± 3.80 96.62±2.61 96.29 ± 3.77 | 99.50 ± 1.12 99.13 ± 1.25 98.10 ± 2.18 98.60 ± 2.29 98.57 ± 0.97 | 99.92 ± 0.18 99.85 ± 0.21 99.70 ± 0.31 99.78 ± 0.33 99.93 ± 0.17 | 97.59 ± 1.83 98.20 ± 1.96 98.50 ± 1.42 98.18 ± 1.25 97.62 ± 2.10 | 97.43 ± 1.99 97.97 ± 2.16 97.66 ± 0.88 97.56 ± 1.10 97.40 ± 1.96 |
Robust-ODML Robust-LODML LPA-ODML ODML OPML BLMNN | 5 | 99.10 ± 0.48 98.97 ± 0.35 98.97 ± 0.42 98.53 ± 0.49 98.59 ± 0.87 | 96.11 ± 2.61 94.71 ± 4.12 96.50 ± 3.37 93.81 ± 3.21 92.64 ± 3.06 | 97.75 ± 1.30 95.83 ± 1.25 96.29 ± 2.26 95.72 ± 5.45 97.23 ± 3.73 | 99.62 ± 0.28 99.33 ± 0.17 99.40 ± 0.33 99.34 ± 0.79 99.55 ± 0.62 | 97.84 ± 1.30 97.26 ± 2.13 97.93 ± 1.64 96.52 ± 1.41 96.02 ± 1.78 | 96.90 ± 1.13 96.37 ± 1.08 96.34 ± 1.39 94.62 ± 1.97 94.86 ± 3.02 |
Robust-ODML Robust-LODML LPA-ODML ODML OPML BLMNN | 10 | 98.21 ± 1.23 97.50 ± 0.89 98.08 ± 0.60 97.88 ± 1.12 | 94.90 ± 2.03 88.95 ± 7.19 92.53 ± 4.39 90.53 ± 5.19 89.90 ± 7.86 | 94.78 ± 3.34 94.31 ± 3.53 90.20 ± 7.81 96.04 ± 4.68 94.76 ± 2.82 | 99.00 ± 1.01 99.02 ± 0.64 98.37 ± 1.22 99.41 ± 0.66 99.18 ± 0.49 | 96.92 ± 0.87 94.18 ± 3.73 95.38 ± 2.00 94.83 ± 2.44 94.35 ± 4.15 | 94.62 ± 2.78 93.59 ± 3.62 91.09 ± 3.33 93.00 ± 1.19 92.10 ± 4.48 |
Robust-ODML Robust-LODML LPA-ODML ODML OPML BLMNN | 15 | 97.95 ± 0.66 97.76 ± 1.96 95.90 ± 0.83 96.03 ± 0.18 96.28 ± 2.03 | 95.68 ± 3.36 90.78 ± 3.68 93.69 ± 4.61 93.41 ± 2.71 86.19 ± 10.62 | 89.16 ± 4.22 90.52 ± 7.72 79.84 ± 5.70 80.84 ± 4.63 87.38 ± 7.07 | 98.14 ± 0.62 98.09 ± 1.80 96.29 ± 0.75 96.50 ± 0.48 97.91 ± 1.21 | 96.87 ± 2.46 95.01 ± 1.99 94.96 ± 2.35 94.93 ± 1.16 91.72 ± 5.87 | 92.84 ± 2.59 92.97 ± 5.58 86.09 ± 3.79 86.56 ± 1.86 86.57 ± 7.55 |
Robust-ODML Robust-LODML LPA-ODML ODML OPML BLMNN | 20 | 96.73 ± 0.69 97.31 ± 1.03 92.63 ± 0.93 92.44 ± 1.41 93.91 ± 1.55 | 96.08 ± 4.67 89.82 ± 4.86 90.45 ± 3.36 91.40 ± 5.26 81.31 ± 7.12 | 82.06 ± 5.47 86.35 ± 5.73 67.32 ± 5.37 66.55 ± 7.70 76.70 ± 4.70 | 96.57 ± 0.93 97.54 ± 0.97 92.99 ± 0.71 92.64 ± 1.62 95.97 ± 0.81 | 96.78 ± 2.38 94.18 ± 2.90 91.70 ± 1.91 91.98 ± 2.59 88.27 ± 4.19 | 89.17 ± 2.67 90.82 ± 3.51 77.09 ± 3.93 76.75 ± 5.41 78.88 ± 5.55 |
Fig. 16 F1-score (2 × Sensitivity × Precision / (Sensitivity + Precision)) and G-means of the competing methods on the COVID-19 dataset
Mean confusion matrices of the proposed methods over 5-fold cross-validation on the COVID-19 dataset (label noise = 20%)
| Method | Actual class | Predicted Positive (COVID-19) | Predicted Negative (Normal) |
|---|---|---|---|
| Robust-LODML | Actual Positive (COVID-19) | 42.00 | 1.80 |
| | Actual Negative (Normal) | 6.60 | 261.60 |
| Robust-ODML | Actual Positive (COVID-19) | 42.80 | 1.00 |
| | Actual Negative (Normal) | 9.20 | 259.00 |
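The metrics in the COVID-19 table follow from a confusion matrix with COVID-19 as the positive class. The sketch computes them from the mean confusion matrix of Robust-LODML above (TP = 42.0, FN = 1.8, FP = 6.6, TN = 261.6). Because the table averages per-fold metrics, values derived from an averaged matrix need not match it exactly; `classification_metrics` is a hypothetical helper.

```python
import math


def classification_metrics(tp, fn, fp, tn):
    """Standard binary metrics from confusion-matrix counts (positive class = COVID-19)."""
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative (Normal) class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    g_mean = math.sqrt(sensitivity * specificity)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "g_mean": g_mean,
        "f1": f1,
    }


m = classification_metrics(tp=42.0, fn=1.8, fp=6.6, tn=261.6)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

G-mean rewards balanced recall on both classes, which matters here because normal cases outnumber COVID-19 cases roughly six to one; accuracy alone would hide errors on the minority class.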
Fig. 17 Mean run time of the evaluated methods in the 5-fold cross-validation setting on the COVID-19 dataset
Summary of statistics and run-time of the competing methods in noise-free (nl = 0%) and highly noisy (nl = 20%) settings
| Method | nl (%) | #triplets | #active | hyper-parameters | run-time (sec) |
|---|---|---|---|---|---|
| Robust-ODML Robust-LODML LPA-ODML ODML OPML | 0 | 43.40 43.40 1231.00 1231.00 1231.00 | 21.00 26.80 65.00 100.60 33.20 | | 0.7297 0.6810 3.3945 2.4982 |
| Robust-ODML Robust-LODML LPA-ODML ODML OPML | 20 | 323.00 323.00 1245.00 1245.00 1245.00 | 325.80 292.60 508.80 572.40 515.00 | | 1.4442 5.1661 2.7546 5.5909 |