
Recognition awareness: adding awareness to pattern recognition using latent cognizance.

Abstract

This study investigates an application of a new probabilistic interpretation of the softmax output to Open-Set Recognition (OSR). Softmax is a mechanism widely used in classification and object recognition. However, a softmax mechanism forces a model to operate under a closed-set paradigm, i.e., to predict an object class out of a set of pre-defined labels. This characteristic contributes to efficacy in classification, but poses a risk of nonsensical predictions in object recognition. Object recognition often operates under dynamic and diverse conditions. A foreign object (an object of any unprepared class) can be encountered at any time. OSR is intended to address the issue of identifying a foreign object in object recognition. Softmax inference has been re-interpreted with an emphasis on conditioning on the context. This re-interpretation and Bayes' theorem have led to an approach to OSR called Latent Cognizance (LC). LC utilizes what a classifier has learned and provides a simple and fast computation for foreign identification. Our investigation of LC employs various scenarios, using the ImageNet 2012 dataset as well as foreign and fooling images. Its potential application to adversarial-image detection is also explored. Our findings support the LC hypothesis and show its effectiveness on OSR.

Entities: Chemical

Keywords: Artificial neural network; Machine learning; Object recognition; Open-set recognition; Pattern recognition; Softmax

Year: 2022 PMID: 35434402 PMCID: PMC9010638 DOI: 10.1016/j.heliyon.2022.e09240

Source DB: PubMed Journal: Heliyon ISSN: 2405-8440

Introduction

The widely adopted softmax function, along with its accompanying cross-entropy loss, was introduced by Bridle [1] in 1990. Since then, softmax has proved effective and has been used extensively in classification. With a proper setting, a softmax output converges to a class probability conditioned on the input. Nevertheless, softmax limitations become more noticeable in object recognition [2], [3], where encountering an unprepared class is common. Many approaches address this issue through open-set recognition (OSR). While there are various approaches to OSR, one is particularly striking because it is derived directly from a probabilistic re-interpretation of the softmax output: Latent Cognizance (LC), proposed by Nakjai and Katanyukul [4]. LC was not originally intended for OSR. It was designed to address the issue of unprepared classes in hand-sign recognition, but its underlying hypothesis is general and its mechanism can fit various domains. The new interpretation underlying LC has been verified using traceable synthetic examples [5]. LC applications have been shown to be effective in hand-sign recognition [4] and facial expression recognition [6]. However, the recently introduced LC has not been adequately investigated in other domains. An investigation of LC applied to a more general task, such as OSR, will allow better insight into the potential of LC and the hypothesis behind it. Our study sets out to investigate an application of LC to OSR and its related issues, such as an OSR evaluation metric, the role of the base classifier, and the potential of LC for adversarial-image detection.

Background

Although a softmax output is conventionally viewed as a class probability, several studies [7], [8], [9] have observed that a softmax output is often uncorrelated with the class probability, especially when an input is “foreign”. For conciseness, a foreign image refers to an image of any class that has not been included in model preparation. Based on this observation, a new interpretation (§2.1) of the softmax output was proposed [4]. The new interpretation reveals a relation between penultimate values and posterior probabilities, which in turn has led to the invention of Latent Cognizance (LC, §2.2). LC exploits what has already been learned in a deep network to estimate the probability that an image is “domestic”, i.e., of any category used in model preparation. This allows its application to open-set recognition (§2.3).

Probabilistic interpretation of softmax output

A softmax function is commonly employed for multi-class classification. The softmax computation (1) is performed as the last calculation of a classifier. To classify an input $x$ into one of $K$ classes, the predicted class is computed as $\hat{k} = \arg\max_k y_k$. Denote a softmax output $y = [y_1, \ldots, y_K]^T$, where

$$y_k = \frac{\exp(a_k)}{\sum_{j=1}^{K} \exp(a_j)}. \qquad (1)$$

A logit or penultimate vector is $a = [a_1, \ldots, a_K]^T = f(x; w)$, where $w$ represents the network parameters and $f$ is the network computation prior to the softmax. The realization of $f$ depends on the specific network configuration, while the values of $w$ are obtained through a training process. Softmax regulates $\sum_{k} y_k = 1$ and $y_k \in [0, 1]$ for all $k$'s. Bridle [1] shows that a well-trained classifier has its softmax output converge to the posterior class probability: $y_k \to p(y = k \mid x)$. Softmax is effective in classification. Extended beyond classification, however, the softmax output is often found to be unrelated to class probabilities when an input is foreign [7], [8], [9]. This observation contradicts the carried-over perception that $y_k \approx p(y = k \mid x)$. Thus, the softmax output is reinterpreted as a class probability conditioned on a given domestic input [4]:

$$y_k = p(y = k \mid x, s), \qquad (2)$$

where $s$ indicates the context that $x$ is domestic, i.e., $y \in \{1, \ldots, K\}$. This interpretation emphasizes the domestic condition $s$. The recognition of this conditioning is not unique to LC: other works [10], [11] have stated it as well. This interpretation has also been verified using a set of traceable (but small) examples [5].
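As a small illustration of the softmax computation and the predicted class described above, the following sketch uses only the Python standard library. The function names and the example logits are hypothetical and chosen for illustration; the max-shift is a common numerical-stability trick and is not taken from the paper.

```python
import math

def softmax(a):
    """Softmax of a list of logits, shifted by the maximum for numerical stability."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(a):
    """1-based index of the largest softmax output (the predicted class)."""
    y = softmax(a)
    return max(range(len(y)), key=lambda k: y[k]) + 1

# Hypothetical penultimate vector for K = 3 classes.
logits = [2.0, 0.5, -1.0]
y = softmax(logits)
predicted = predict_class(logits)
```

Whatever the logits are, the outputs lie in [0, 1] and sum to one, which is why a foreign input still receives a seemingly valid class distribution.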

Latent cognizance

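Based on (2) and Bayes' theorem, the softmax output can be written as:

$$y_k = \frac{p(y = k, s \mid x)}{p(s \mid x)} = \frac{p(y = k, s \mid x)}{\sum_{i=1}^{K} p(y = i, s \mid x)}. \qquad (3)$$

Comparing (1) to (3), the relation (4) is found:

$$\frac{\exp(a_k)}{\sum_{i=1}^{K} \exp(a_i)} = \frac{p(y = k, s \mid x)}{\sum_{i=1}^{K} p(y = i, s \mid x)}. \qquad (4)$$

Considering the similar patterns on both sides of (4), a relation between penultimate values and the probabilities is hypothesized: given a well-trained network, the penultimate value $a_k$ relates to the posterior probability $p(y = k, s \mid x)$ through some function. To lessen the burden of enforcing probabilistic properties, it is more convenient to work with a function whose value merely correlates with the probability rather than with one that equals the probability exactly. Assume that there exists a monotonic function $g$ such that $g(a_k) \propto p(y = k, s \mid x)$. Thus, marginalization reveals

$$\sum_{k=1}^{K} g(a_k) \propto p(s \mid x). \qquad (5)$$

Function $g$ is called a cognizance function. A marginalized cognizance $\sum_{k} g(a_k)$ quantifies the degree of $x$ being domestic: the lower its value is, the more likely $x$ is foreign.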
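The marginalized cognizance can be sketched in a few lines. The two cognizance functions below (exponential and cubic) are the candidates the experiments later report as viable; the function names and the penultimate vectors are hypothetical values made up for illustration.

```python
import math

def g_exp(v):
    """Exponential cognizance function, g(a) = exp(a)."""
    return math.exp(v)

def g_cubic(v):
    """Cubic cognizance function, g(a) = a^3."""
    return v ** 3

def marginalized_cognizance(a, g):
    """Sum of g over all penultimate values of one input."""
    return sum(g(v) for v in a)

# Hypothetical penultimate vectors: one with a strong response, one with weak responses.
strong = [9.0, 1.0, 0.5]
weak = [1.2, 1.0, 0.9]

score_strong = marginalized_cognizance(strong, g_exp)
score_weak = marginalized_cognizance(weak, g_exp)
```

Under the LC hypothesis, the lower score of the second vector would mark it as the more likely foreign input, even though its softmax output is still a valid distribution.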

Open-set recognition

Open-Set Recognition (OSR) can be viewed as a mapping task in which an input image $x$ is mapped to a label index in $\{0, 1, \ldots, K\}$. Indices $1, \ldots, K$ are associated with the $K$ pre-defined classes used in the model preparation (training) process. These prepared classes are called domestic. Label index 0 represents any class not used in the training process, collectively called a foreign class. In practice, a foreign class can represent multiple classes. OSR is different from object detection [10], [12], [13]. As Scheirer et al. [2] have pointed out, an object-detection model is usually trained on images containing both positive and negative examples, while OSR is to identify whether an image is foreign or domestic without negative examples. If an image is domestic, OSR also has to recognize its category. With this outline, OSR can be viewed as conventional object recognition with an additional novelty-detection capability.

Many novelty detection methods [14] rely on some kind of distance-based scheme, using a distance between the input and the training samples as a cue. This distance-based approach is often characterized by a search over training examples (or their representatives), which is computationally expensive. In addition, being designed solely for novelty detection, these methods do not scale well to OSR, nor do they exploit a well-trained classifier, which is available in OSR settings. To address the issue from an OSR perspective, Scheirer et al. [2] formalize the OSR definition and propose the concept of open-space risk as well as an SVM-based one-vs-set machine. Instead of using a single hyperplane as in a one-class SVM [15], the one-vs-set machine uses two hyperplanes to bound the domestic samples in order to minimize the open-space risk. Later approaches [3], [16] rely more on statistical models and are reported to achieve better foreign identification.
OpenMax [3] employs statistical models to identify foreign samples: a sample whose penultimate values are all too different from their corresponding statistics is likely to be foreign. Specifically, OpenMax operates in two phases: (phase 1) meta-recognition calibration learns domestic statistics, and (phase 2) class probability re-adjustment re-adjusts the softmax output and estimates a probability of being foreign, based on the statistics learned in phase 1. In addition, OpenMax employs thresholding to overrule a domestic prediction if its probability is below the threshold. A Weibull distribution is used to model each intra-class distance. Similar to OpenMax, the Extreme Value Machine (EVM [16]) also uses statistical models. But rather than relying on a base classifier as OpenMax does, EVM uses each class statistical model for both classification and foreign identification. As a result, training can be done at once: both classification and foreign identification are learned in one training process. In addition, EVM can easily add a statistical model for each new class found. However, this comes at the cost of a less efficient classification inference, since EVM has to search over all non-redundant training examples to complete the task. Note that both OpenMax and EVM implicitly assume a uni-modal distribution of intra-class distances.

Resorting to a generative model, Neal et al. [9] have used counterfactual images as foreign samples to train a new classifier accounting for $K + 1$ classes: $K$ domestic classes and one additional class representing any foreign class. The counterfactual images are supposed to look closest to the domestic images, but not belong to any domestic class. With the conjecture that counterfactual images lie just outside the ideal decision boundaries, using them in training could help tighten the decision boundaries of the new classifier. Pivotal to their approach is how to create counterfactual images. Neal et al. have prepared an encoder and a generator, then used them in the synthesis of counterfactual images. The encoder and the generator are obtained through a generative adversarial framework with reconstruction loss [17], [18], [19], [20]. That is, given a set of training images $X$, a discriminator $D$ and a generator $G$ (along with an encoder $E$) are trained in alternating steps with a discriminator loss and a generator loss, respectively. The Lagrange multiplier $\lambda$ is a user-specified parameter. The encoder and generator are trained jointly. The encoder $E$ maps an input image to its representation in a latent space. The generator $G$ reconstructs an image from a latent representation. Then, to synthesize a counterfactual image $\tilde{x}$, a base image $x$ is encoded to a latent base $z = E(x)$. A counterfactual representation $z^*$ is obtained through optimization: $z^* = \arg\min_{z'} \|z' - E(x)\|_2^2 + \log\left(1 + \sum_{i=1}^{K} \exp C_K(G(z'))_i\right)$, where $C_K$ gives the output of a $K$-class classifier. At last, a counterfactual image is generated: $\tilde{x} = G(z^*)$. Thus a counterfactual $\tilde{x}$ is presumably foreign, but very similar to its domestic base $x$. The second term of the counterfactual objective is supposed to constrain a latent representation to be foreign. Neal et al. have formulated it based on the assumption that a classifier gives low output values on a foreign sample. This assumption suggests that LC and counterfactual approaches could complement each other, and the issue is worth a dedicated study.

With a concern that some information might have been lost during supervised training, Yoshihashi et al. [11] have proposed classification-reconstruction learning, where a compact representation is learned simultaneously with the classification. The compact representation $z$ is learned through reconstruction of the input. Then, it is used as additional information (along with the classifier's penultimate vector $a$) provided to a foreign detector, a binary classifier based on the distance between these vectors and their average values. They believe that the compact representation will compensate for the presumably lost information.
In spite of these approaches, OSR remains greatly challenging [21]. Most OSR approaches require a considerable extra mechanism or a re-design of the entire model. In the sense of the substantial effort crafted for the task, they are more comparable to the analytical system II decision of Kahneman [22]. The LC approach requires much less effort and relies on quick deduction using only cues provided by a base classifier. It is more analogous to Kahneman's instinctive system I decision. As human survival relies on both decision systems, we believe that a practical OSR system, or more generally a robust intelligent agent, may not need to pick only one best approach. Both systems can co-exist and complement each other, as this has been shown to be a winning strategy in nature. A more comprehensive review of OSR is provided by Geng et al. [23].

Concern over information loss

A concern over information loss [11] might be associated with the softmax bottleneck [24], [25]. Yang et al. [24] have analyzed a softmax-based model for its capacity to represent a conditional probability for a language model. Based on matrix factorization, they have concluded that a softmax-based model does not have enough capacity to express the true language distribution. This is referred to as the “softmax bottleneck”. Their rationale is drawn from the diversity of the natural language domain and the standard practice of computing a softmax language model. A softmax language model is computed using a logit, which is a dot product between a fixed-size context vector and a word-embedding vector. The chosen fixed size is generally too small for the diversity of natural language contexts. How this nature carries over to other domains and settings may be subject to dedicated studies. Nonetheless, our investigation of LC effectiveness may answer this concern for OSR to some degree.
Related but with slightly different objectives, many studies [7], [8], [26] investigate mechanisms to quantify the uncertainty of a classification inference. Inference uncertainty is a measure quantifying a degree of confidence or a level of expertise in making a particular prediction. Gal and Ghahramani [7] have discussed that inference uncertainty is different from model confidence, which is conventionally taken as the value of each softmax output. A softmax output can be very high (close to one) even when the input lies far beyond the vicinity of the training samples in the input space. In this respect, quantifying inference uncertainty is similar to quantifying a degree of being foreign (in our context). However, a striking distinction between identifying a foreign sample and quantifying uncertainty lies in difficult classification, or ambiguity among domestic classes. Difficulty in distinguishing among domestic classes is well encompassed by inference uncertainty, but it is not an issue of foreign identification. Since the two are closely related, a potential application of LC to inference uncertainty seems highly likely, and our study here could lay the groundwork for such an investigation.

Models and methods

Section 2.2 explains Latent Cognizance (LC) from a mathematical perspective. This section elaborates it from architectural and procedural views with a focus on its application to Open-Set Recognition (OSR). To elaborate the LC approach, consider a softmax discriminative model (Fig. 1), which is commonly trained with a cross-entropy loss (Fig. 2). First, it can be shown that the training, i.e., minimizing the cross-entropy loss $-\sum_{n} \log y_{k_n}(x_n)$ over $w$, where $k_n$ is the true class for the $n$-th training input $x_n$, will have the model converge to predict $y_k(x) = p(y = k \mid x)$, where $p(y = k \mid x)$ represents the probability that the true class is $k$ given the input $x$.

Figure 1

Commonly-used softmax discriminative model.

Figure 2

Common training setting for a softmax discriminative model.

Since the loss is a sum of negative log outputs of the true classes, to minimize it is to have the true-class output $y_{k_n}(x_n)$ as large as possible for all $n$'s, subject to the softmax constraint that $\sum_k y_k = 1$. Thus the solution to the minimization is to have $y_k = p(y = k \mid x)$. However, recall that every training sample belongs to one of the $K$ prepared classes. That is, when the training is on a closed set $s$, and the model may be used in an open set, the prediction actually converges to $p(y = k \mid x, s)$. Note that this observation is quite common in practice (conditioning on the context is usually present), but the emphasis on this conditioning leads to an awareness of the validity of the prediction, which is particularly essential in open-set recognition. The LC approach is drawn upon a relation between the softmax calculation and the Bayesian derivation of the interpreted probability, i.e., Equation (4). Considering Equation (4), one may deduce that $g(a_k) \propto p(y = k, s \mid x)$, where $g$ is some function. Then marginalization gives $\sum_k g(a_k) \propto \sum_k p(y = k, s \mid x) = p(s \mid x)$. The rightmost term is a posterior probability representing any domestic class and consequently indicating the validity of the classification. In practice, the exact form of this function is quite elusive, therefore the idea is to empirically choose a cognizance function $g$ that is good enough to approximate it. Fig. 3 emphasizes LC as an additional component of a commonly-used softmax discriminative model.

Figure 3

A softmax discriminative model with LC addition.

Our main investigation is on the LC approach, whose mechanism can be built on top of a well-prepared object recognition model (LC is shown in Fig. 4c). A typical object recognition model (Fig. 4a) employs feature extraction, then applies a class-scoring process, whose function is performed by the penultimate and softmax layers shown in the figure. The softmax layer regulates the values of the output vector, making them interpretable as probabilities. Nakjai and Katanyukul [4] re-interpreted the softmax output, then derived and proposed LC, specifically the use of marginalized cognizance (5).

Figure 4

Architectures of conventional object recognition (a), OpenMax (b), and Latent Cognizance (c).

Since marginalized cognizance is proportional to the probability of being domestic, LC can be straightforwardly applied to OSR. For OSR with $K$ domestic classes, our application is as follows.
(1) Choose a base classifier $C$ that gives a predicted class $\kappa = \arg\max_k y_k$ and a softmax output $y = \mathrm{softmax}(a)$, where $a$ is the penultimate vector computed from the input $x$.
(2) Choose a cognizance function $g$.
(3) Choose a threshold $\tau$.
(4) Compute the marginalized cognizance $\sum_{k=1}^{K} g(a_k)$.
(5) If $\sum_{k} g(a_k) \geq \tau$, predict class $\kappa$; otherwise predict class 0 (foreign).

The choices of cognizance function and threshold can be obtained empirically. A well-adopted classifier can be exploited as the base classifier $C$. This characteristic is beneficial, as foreign identification can be seamlessly added to a well-established classification system. However, a potential downside is that, since the LC logic relies heavily on the penultimate vector computed by its base classifier, its performance may be closely tied to the base classifier. How much effect the base classifier has on LC had not previously been explored. Thus, we have also investigated this issue in our experiments. In addition, to properly evaluate OSR, we propose metric counts (Table 2) and a performance metric Q1 (based on F-score, §4). The metric counts account for every case in OSR evaluation.
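The LC decision procedure for OSR described in this section can be sketched as a single function. This is a minimal sketch, not the authors' implementation: the function name `lc_predict`, the example vectors, the threshold value, and the choice of a non-strict comparison against the threshold are all assumptions made for illustration.

```python
import math

def lc_predict(a, g, tau):
    """Return 0 (foreign) or the 1-based index of the predicted domestic class.

    a   : penultimate vector from the base classifier (list of floats)
    g   : cognizance function applied to each penultimate value
    tau : threshold on the marginalized cognizance (assumed: foreign if below tau)
    """
    cognizance = sum(g(v) for v in a)
    if cognizance < tau:
        return 0
    # Softmax is monotonic, so the argmax of a equals the argmax of the softmax output.
    return max(range(len(a)), key=lambda k: a[k]) + 1

# Hypothetical penultimate vectors and threshold.
domestic_like = [9.0, 1.0, 0.5]
foreign_like = [1.2, 1.0, 0.9]
TAU = 100.0

label_domestic = lc_predict(domestic_like, math.exp, TAU)
label_foreign = lc_predict(foreign_like, math.exp, TAU)
```

Because the function only reads the penultimate vector, it can be attached to an existing classifier without retraining, which is the property the text highlights.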

Table 2

Our OSR metric. Symbols i, i′, f, and u represent respectively a sample of domestic class i, a sample of domestic class i′(≠i), a fooling sample, and a sample of any foreign class. The evaluated systems do not have f output. Thus, predicting u on f is counted as TP.

Ground truth	Prediction	Metric count
i	i	TP_i
i	i′	FN_i and FP_i′
i	u	FN_i and FP_u
u	i	FN_u and FP_i
u	u	TP_u
f	i	FN_f and FP_i
f	u	TP_f
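The counting rules in Table 2 can be written as a small tally routine. The sketch below is illustrative only: the function name `tally` and the encoding of ground truth and prediction (integers for domestic classes, "u" for foreign, "f" for fooling) are assumptions.

```python
from collections import Counter

def tally(pairs):
    """Accumulate TP, FP, and FN counts following Table 2.

    pairs: iterable of (ground_truth, prediction), where ground_truth is a
    domestic class index, "u" (foreign), or "f" (fooling), and prediction is
    a domestic class index or "u". The systems never output "f".
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in pairs:
        if truth == "f":
            if pred == "u":
                tp["f"] += 1          # predicting u on f is counted as TP
            else:
                fn["f"] += 1
                fp[pred] += 1
        elif truth == pred:
            tp[truth] += 1            # covers both i -> i and u -> u
        else:
            fn[truth] += 1            # covers i -> i', i -> u, and u -> i
            fp[pred] += 1
    return tp, fp, fn

# One sample for each row of Table 2, with i = 1 and i' = 2.
samples = [(1, 1), (1, 2), (1, "u"), ("u", 2), ("u", "u"), ("f", 2), ("f", "u")]
tp, fp, fn = tally(samples)
```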

Experiments

OpenMax and Latent Cognizance (LC) are investigated on Open-Set Recognition (OSR). Fig. 4 illustrates the structural differences between OpenMax (Fig. 4b) and LC (Fig. 4c). A conventional object recognition network ends with a softmax layer. OpenMax replaces the softmax layer with the OpenMax computation. LC extends a conventional object recognition network with the cognizance computation (5).

Model preparation

OpenMax and the LCs use Alexnet [27] with pre-trained weights as their base classifier. A base classifier provides a penultimate vector for OpenMax and LC. The pre-trained weights were obtained from Caffe [28], which has trained Alexnet on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 dataset. There is no additional fine-tuning of the Alexnet weights.

OpenMax hyperparameters

In its meta-calibration phase (§2.3), OpenMax requires domestic learning. Our experiment follows the default meta-parameter values of Bendale and Boult's implementation for the Weibull tail size and the number of top classes, and uses the Euclidean-cosine distance method. OpenMax domestic learning is to find the class mean of the penultimate vectors and the Weibull parameters for all domestic classes. Note that the LCs do not require domestic learning.

LC hyperparameters

LC uses a cognizance function $g$. The previous work [4] has empirically explored various candidates for $g$ and found cubic and exponential functions viable. Our experiment investigates both as cognizance functions.

Data

Our experiment uses two datasets to evaluate the models, as summarized in Table 1. (1) A domestic test dataset is taken from the ILSVRC 2012 validation set. It has 50000 images belonging to 1000 classes. (2) An open dataset is a combination of 108360 selected images from ILSVRC 2010 and 15000 fooling images. All of the 108360 selected examples belong to 360 classes, none of which is in ILSVRC 2012. All of the 15000 fooling images are random noise images with additional perturbations. The perturbations are based on adversarial image generation, loosely following the work of Szegedy et al. [29].
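To make the idea of a fooling image concrete, the sketch below perturbs a random-noise input until a classifier's softmax output for a chosen target class reaches a level alpha (0.9 in the experiments). It is a toy illustration under strong assumptions, not the procedure used in the study: the classifier is a hypothetical linear model with hand-picked weights rather than Alexnet, plain gradient ascent stands in for the actual optimizer, and no constraint keeps the input within a valid pixel range.

```python
import math
import random

def softmax(a):
    """Numerically stable softmax of a list of logits."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]

def class_probs(x, weights):
    """Softmax output of a toy linear classifier with logits a = W x."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(logits)

def make_fooling_input(x, weights, target, alpha=0.9, step=0.5, max_iter=1000):
    """Perturb x by gradient ascent on log y_target until y_target >= alpha."""
    x = list(x)
    for _ in range(max_iter):
        y = class_probs(x, weights)
        if y[target] >= alpha:
            break
        # For linear logits, d log y_target / dx = W_target - sum_k y_k W_k.
        grad = [weights[target][d] - sum(y[k] * weights[k][d] for k in range(len(weights)))
                for d in range(len(x))]
        x = [v + step * g for v, g in zip(x, grad)]
    return x, class_probs(x, weights)[target]

# Hypothetical 3-class, 4-dimensional toy classifier and a random-noise base input.
TOY_WEIGHTS = [[1.0, 0.0, 0.0, 0.5],
               [0.0, 1.0, 0.0, -0.5],
               [0.0, 0.0, 1.0, 0.0]]
rng = random.Random(0)
noise = [rng.random() for _ in range(4)]
fooling, target_prob = make_fooling_input(noise, TOY_WEIGHTS, target=1)
```

The resulting input is still structured noise, yet the classifier assigns it to the target class with high confidence, which is exactly the situation foreign identification is meant to catch.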

Table 1

Associated datasets.

Dataset	Source	Size	Remark
Classifier training set	ILSVRC 2012 training set	1281167	Via pre-trained Alexnet
OpenMax domestic learning	ILSVRC 2012 training set	1281167	OpenMax only
Test domestic data	ILSVRC 2012 validation set (a)	50000
Test fooling images	Newly generated	15000
Test foreign images	ILSVRC 2010 training set	108360	Only classes not in 2012

(a) The ILSVRC 2012 validation set is chosen over the test set because its ground truth is available.

Szegedy et al.'s formulation is that, for a given target class $k$, perform $\min_r \|r\|_2$ s.t. $f(x + r) = k$; $x + r \in [0, 1]^D$, where $r$ is an input perturbation and $x$ is a base input of $D$ dimensions. Function $f$ represents a classifier. Our implementation relaxes Szegedy et al.'s formulation, i.e., find $r$ s.t. $y_k(x + r) \geq \alpha$, where $y_k$ refers to the softmax output of the classifier for class $k$. Parameter $\alpha$ is user-specified and set to 0.9 in our experiments. All generated fooling images have been thoroughly inspected.

Performance index

Our experiment evaluates the models through the performance index Q1. The index combines two components: a performance measure of domestic samples and an F-score of foreign and fooling samples. Specifically, the domestic measure is an arithmetic mean of class F-scores, i.e., $\frac{1}{|\mathcal{D}|} \sum_{i \in \mathcal{D}} F_i$, where $F_i$ is the F-score of class $i$ and $\mathcal{D}$ is the set of domestic-class indices. The class F-scores are computed from precision $P_i$ and recall $R_i$ with a small number $\epsilon$ added for computational stability, set to 0.0001 in our experiment. Precisions and recalls are defined as $P_i = TP_i / (TP_i + FP_i)$ and $R_i = TP_i / (TP_i + FN_i)$ for $i \in \mathcal{D}$, and analogously for the foreign class $u$, with fooling samples counted together with foreign samples. True positives (TP), false positives (FP), and false negatives (FN) are defined as shown in Table 2.

Comfort ratio

The proportion of domestic data can be an indicator of how difficult the task is. The ratio of domestic samples to all samples is called the comfort ratio. Our investigation examines four scenarios with different comfort ratios, as shown in Table 3. A different number of images per foreign class is chosen to set each scenario.

Table 3

Test scenarios.

Test case	I	II	III	IV
Comfort ratio	0.625	0.500	0.333	0.288
#images/foreign class	42	97	236	300
Foreign data	15120	34920	84960	108000
Fooling data	15000	15000	15000	15000
Domestic data	50000	50000	50000	50000

Subsections 4.1, 4.2 and 4.3 provide the main results, the error analysis, and an additional investigation of a potential application to adversarial-image detection, respectively.

Results

Fig. 5 and Table 4 show OSR performance over the four scenarios. Table 5 reports the time spent in each operation. Total time spent in domestic learning is the total time spent to fine-tune the open-set capability. Average time spent in foreign identification is the average time per image that a method spends to identify whether an image is domestic or foreign.

Figure 5

OSR Performance. Q1 of each method over different comfort ratios. OSR performance of Alexnet is provided only for perspective.

Table 4

Performance results.

	Comfort ratio (%)	OpenMax	Exponential LC, g(a)=exp(a)	Cubic LC, g(a)=a^3	Alexnet
Q1	62.5	0.553	0.578	0.566	0.253
	50.0	0.579	0.575	0.535	0.231
	33.3	0.602	0.591	0.539	0.197
	28.8	0.606	0.595	0.542	0.186

Table 5

Time spent. All time durations are reported in seconds. Normalized time durations are shown in parentheses.

	Total time spent in domestic learning	Average time spent in foreign identification
OpenMax	4.1 × 10⁴ (5.5 × 10⁵)	1.03 (13.7)
Exponential LC	0	7.2 × 10⁻⁵ (9.6 × 10⁻⁴)
Cubic LC	0	9.3 × 10⁻⁵ (1.2 × 10⁻³)
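Since the normalized durations in Table 5 are the raw durations divided by the classification time per image, dividing each raw value by its normalized value recovers that classification time. The short sketch below does this for the foreign-identification column; the dictionary layout and names are hypothetical.

```python
# (raw seconds, normalized value) for average foreign-identification time, from Table 5.
foreign_identification = {
    "OpenMax": (1.03, 13.7),
    "Exponential LC": (7.2e-5, 9.6e-4),
    "Cubic LC": (9.3e-5, 1.2e-3),
}

# Implied classification time per image, in seconds, for each row.
implied_classification_time = {
    method: raw / normalized
    for method, (raw, normalized) in foreign_identification.items()
}
```

All three rows imply a classification time between 0.07 s and 0.08 s per image, so foreign identification with LC costs roughly a thousandth of one Alexnet classification, while OpenMax costs over ten times as much as the classification itself.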

A number in parentheses in Table 5 represents a normalized time, i.e., the duration divided by the classification time per image. The classification time per image is the average time Alexnet spends to classify an image; the ratios of raw to normalized durations in Table 5 imply a value of roughly 0.075 s. All three methods use Alexnet and are subject to the same classification time per image. Exponential LC seems to provide slightly better performance than its cubic counterpart. The performances of all three methods are comparable, but the LC methods spend considerably less time than OpenMax. In addition, OpenMax requires significant domestic learning time, while both cubic and exponential LCs can work right off the shelf. All three methods seem to be robust across the various comfort ratios.

Error analysis

Tables 6 and 7 show confusion matrices of OpenMax and exponential LC with thresholds at their maximal Q1 values. The test data is composed of three distinct groups (domestic, fooling, and foreign), while the foreign-identification prediction is limited to either domestic or foreign. The tables differentiate domestic predictions on domestic samples into correct classification (CR) and incorrect classification (IC). CR counts domestic samples predicted as their correct class, IC counts domestic samples predicted as an incorrect domestic class, and the other entries are obtained as specified in Table 2.

Table 6

OpenMax confusion matrices.

Comfort ratio (%)	Prediction	Domestic data	Fooling data	Foreign data
62.5	Domestic	CR: 19981; IC: 4526	1315	3957
62.5	Foreign	25493	13685	11163
50.0	Domestic	CR: 19755; IC: 4322	1090	8862
50.0	Foreign	25923	13910	26058
33.3	Domestic	CR: 18043; IC: 3164	297	16987
33.3	Foreign	28793	14703	67973
28.8	Domestic	CR: 17778; IC: 3010	224	20941
28.8	Foreign	29212	14776	87059

Table 7

Exponential LC confusion matrices.

Comfort ratio (%)	Prediction	Domestic data	Fooling data	Foreign data
62.5	Domestic	CR: 24527; IC: 11572	646	8518
62.5	Foreign	13901	14354	6602
50.0	Domestic	CR: 20936; IC: 7561	4	12226
50.0	Foreign	21503	14996	22694
33.3	Domestic	CR: 18365; IC: 5576	0	20251
33.3	Foreign	26060	15000	64709
28.8	Domestic	CR: 17320; IC: 4931	0	22132
28.8	Foreign	27749	15000	85868

Since OSR performance incorporates both classification and foreign identification aspects, Table 8 shows accuracies separated by sub-function: accuracies of foreign identification (denoted “F ACC”) and accuracies of classification (denoted “C ACC”).

Table 8

Accuracies of foreign identification (F ACC) and accuracies of classification (C ACC).

	Comfort ratio (%)	OpenMax	Exponential LC	Cubic LC
F ACC	62.5	0.616	0.712	0.730
	50.0	0.641	0.662	0.639
	33.3	0.693	0.612	0.643
	28.8	0.709	0.712	0.670
C ACC	62.5	0.815	0.679	0.629
	50.0	0.820	0.735	0.658
	33.3	0.851	0.767	0.696
	28.8	0.855	0.778	0.708
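The two accuracies in Table 8 can be recomputed from the confusion matrices in Tables 6 and 7. The sketch below rests on an assumed reading of the definitions: F ACC is the fraction of all test samples whose domestic-or-foreign decision is right (fooling samples count as foreign), and C ACC is the fraction of correct classes among domestic samples that were predicted as domestic. The function name and argument order are hypothetical.

```python
def split_accuracies(cr, ic, domestic_as_foreign,
                     fooling_as_domestic, fooling_as_foreign,
                     foreign_as_domestic, foreign_as_foreign):
    """Return (F ACC, C ACC) from one comfort-ratio block of a confusion matrix."""
    total = (cr + ic + domestic_as_foreign
             + fooling_as_domestic + fooling_as_foreign
             + foreign_as_domestic + foreign_as_foreign)
    # Foreign identification is right when domestic stays domestic (CR or IC)
    # and when fooling or foreign samples are predicted as foreign.
    f_acc = (cr + ic + fooling_as_foreign + foreign_as_foreign) / total
    # Classification accuracy among domestic samples that passed as domestic.
    c_acc = cr / (cr + ic)
    return f_acc, c_acc

# 62.5% comfort ratio rows of Table 6 (OpenMax) and Table 7 (exponential LC).
openmax_f, openmax_c = split_accuracies(19981, 4526, 25493, 1315, 13685, 3957, 11163)
lc_f, lc_c = split_accuracies(24527, 11572, 13901, 646, 14354, 8518, 6602)
```

With these definitions the 62.5% entries of Table 8 are reproduced: 0.616 and 0.815 for OpenMax, 0.712 and 0.679 for exponential LC.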

Breaking down performance into foreign identification and classification reveals that exponential LC performs reasonably well on foreign identification (F ACCs are 0.612 to 0.712 across scenarios). When considering foreign identification alone, both LCs are on par with OpenMax. The classification aspect is mostly attributed to the base classifier. Although all methods employ the same Alexnet as their base classifier, classification accuracies vary greatly. The explanation may be that foreign identification changes the number of domestic samples that are evaluated for classification performance. For example, when difficult domestic samples get incorrectly identified as foreign, this hurts F ACC but helps C ACC, because the number of incorrectly-classified samples decreases. In addition to tail statistics and compact abating probability, OpenMax applies thresholding on the maximal class probability. This mechanism filters out class probabilities that are too low and may lead to a tendency of OpenMax toward predicting foreign. The OpenMax thresholding mechanism may have boosted its classification accuracies, as it brings C ACC up to 81.5%, compared with the reported top-1 accuracy of 57.1% of its base classifier, Alexnet.

Exponential LC

Fig. 6 shows boxplots of marginalized exponential cognizance of different data groups at 28.8% comfort. On the left, marginalized cognizance values of domestic samples (including both correctly-classified and incorrectly-classified domestic samples, denoted “Domestic”), foreign samples (denoted “Foreign”), and fooling samples (denoted “Fooling”) are shown. On the right, marginalized cognizance values of correctly-classified domestic samples (denoted “Correct”) and of incorrectly-classified domestic samples (denoted “Incorrect”) are shown separately.

Figure 6

Boxplots of marginalized exponential cognizance values of different groups at 28.8% comfort. (a) Boxplots are shown for Domestic, Foreign and Fooling groups. (b) The domestic group is broken down into correct and incorrect classifications.

Fig. 6 exposes an important aspect of evaluating OSR. While it is difficult to find a threshold that separates domestic from foreign samples (as shown in the left plot), marginalized cognizance can distinguish the correctly-classified domestic samples from foreign samples well (as shown in the right plot). A true challenge of OSR may actually lie in differentiating difficult-to-classify samples from foreign samples, as a previous work [30] has also pointed out.

Evaluating OSR without base-classifier misclassified samples

To quantify the effect of classifier misclassification, we examine OSR performance without incorrectly-classified samples. Table 9 and Fig. 7 show OSR performances of the three methods after removing the samples misclassified by the base classifier. That is, the evaluation was conducted in a similar manner as described earlier, but all domestic test samples that Alexnet misclassified were discarded. All results seem much more promising: all Q1 measures of the three methods are over 0.6. With improvements of at least 19%, the significance of the base classifier is apparent.

Table 9

Q1 after removing base-classifier misclassified samples. Percentage improvement (compared with Table 4) is shown in parentheses.

Comfort ratio (%)	OpenMax		Exponential LC, g(a)=exp(a)		Cubic LC, g(a)=a^3		Alexnet
62.5	0.758	(37.1%)	0.783	(35.5%)	0.757	(33.7%)	0.415	(64.0%)
50.0	0.750	(29.5%)	0.744	(29.4%)	0.690	(29.0%)	0.360	(55.8%)
33.3	0.733	(21.8%)	0.720	(21.8%)	0.651	(20.8%)	0.288	(46.2%)
28.8	0.726	(19.8%)	0.714	(20.0%)	0.645	(19.0%)	0.267	(43.5%)

Figure 7

OSR performance over different comfort ratios after removing base-classifier weakness.

A large number of incorrectly classified samples may reflect ambiguity in the domestic classes or immaturity of the classifier. Attention to this aspect may lead to a better understanding of the underlying factors and to further improvement.

Examining potential application to detection of adversarial images

As exponential LC is shown to accurately identify fooling images in Table 7, it may appear that LC can address the issue of adversarial-image detection. However, fooling images are quite different from actual adversarial images. The fooling images were generated using random noise as base images, and this may give away far more clues than actual adversarial images do. To properly examine the issue, 15000 adversarial images were generated and tested against the 50000 domestic images. The adversarial images were generated with the same process used for the fooling images described earlier, but the base images were randomly chosen from images of the other 999 classes (excluding the target class) instead of random noise. The resulting images were visually inspected. Fig. 8 shows boxplots of marginalized exponential cognizance for various image types, including adversarial images. Fig. 9 shows Precision-Recall (P-R) plots of adversarial-image detection, a binary classification in which a positive is an adversarial sample and a negative is a regular sample (without adversarial manipulation). Table 10 shows the Area Under Curve (AUC) of the P-R plot for each method. For perspective, a random classifier was tested on adversarial-image detection and achieved an AUC of 0.317 on average (10 repeats).
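A P-R summary of this kind can be computed from per-sample scores as sketched below. The sketch assumes that a lower marginalized cognizance marks a more suspicious input, so the negated cognizance serves as the adversarial score. It integrates the curve with the step-wise average-precision rule, which is one common choice and an assumption here, since the integration rule behind Table 10 is not stated in this section. The function names are hypothetical, and tied scores are not given special treatment.

```python
import numpy as np

def pr_curve(scores, labels):
    """Precision and recall after each sample, in descending score order.

    labels: 1 for a positive (adversarial) sample, 0 for a regular one.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    if labels.sum() == 0:
        raise ValueError("at least one positive sample is required")
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]
    tp = np.cumsum(labels)          # true positives so far
    fp = np.cumsum(1 - labels)      # false positives so far
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    return precision, recall

def average_precision(scores, labels):
    """Step-wise area under the P-R curve (average precision)."""
    precision, recall = pr_curve(scores, labels)
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))

# Hypothetical cognizance values: low values should flag adversarial inputs.
cognizance = np.array([0.5, 1.0, 20.0, 30.0])
is_adversarial = np.array([1, 1, 0, 0])
ap = average_precision(-cognizance, is_adversarial)
```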

Figure 8

Boxplots of marginalized cognizance. (a) Boxplots are shown in Domestic, Adv (adversarial), and Fooling groups. (b) The Domestic group is broken down to Correct and Incorrect groups.

Figure 9

Precision-Recall plots of adversarial-image detection. (a) P-R curves of distinguishing adversarial images from domestic images. (b) P-R curves of distinguishing adversarial images from correctly-classified domestic images.

Table 10

AUC of P-R plot: detecting adversarial images.

	OpenMax	Exponential LC	Cubic LC
Adversarial and domestic data	0.379	0.423	0.458
Adversarial and correctly classified data	0.635	0.741	0.719

The small AUCs (top row, Table 10) rule out any of these OSR methods as an effective adversarial-image detector. However, better AUCs (bottom row) are achieved when the methods are tested against only correctly classified samples. This may indicate some potential in these approaches, but improvement or further investigation would require a dedicated study.

Complementary comparison

To put OSR into perspective, OpenMax and the LCs are compared against an off-the-shelf object detector. An object detector can be seen simply as a function that maps an input image x to a set of detections, where M is the number of all possible detections. Each detection usually consists of bounding-box coordinates c, a detection score p, and an object class k. A well-trained Faster R-CNN ResNet101 V1 model is chosen for this comparison. To set up a comparable setting, the test dataset comprises 7020 images. Of these, 1000 images from the ILSVRC 2012 validation set belong to 20 classes verified to be domestic for both Alexnet and the object detector. The other 6020 images, from ILSVRC 2010, belong to 20 classes verified to be foreign for both Alexnet and the object detector. To adapt object detection for OSR, the detection score p is treated as a predicted degree of being domestic. The comparison results are provided in Table 11, and the P-R plot is shown in Fig. 10. The details of this comparison and of how the object detector is adapted for OSR are provided in the supplementary materials. Note that this comparison is only meant to provide a preliminary perspective. Several factors may deserve more attention, such as the underlying models, how the models are prepared, the original numbers of classes used in training, and how the outputs are interpreted in the OSR context. The full OSR capability of object detection may be worth a dedicated study.
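The adaptation described above, treating the detection score p as a degree of being domestic, might look like the following sketch. The actual procedure is given in the supplementary materials, which are not reproduced here, so the aggregation rule (taking the highest score among the detections of an image, and 0.0 when there are none) is an assumption, as are the `Detection` type and the function names.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # bounding-box coordinates c
    score: float                            # detection score p
    label: int                              # object class k

def domestic_degree(detections: List[Detection]) -> float:
    """Assumed aggregation: the highest detection score in the image,
    or 0.0 when the detector returns nothing."""
    return max((d.score for d in detections), default=0.0)

def is_foreign(detections: List[Detection], threshold: float) -> bool:
    """Flag the image as foreign when no detection is confident enough."""
    return domestic_degree(detections) < threshold

# Hypothetical detections for one image.
dets = [
    Detection(box=(0.1, 0.1, 0.5, 0.5), score=0.35, label=7),
    Detection(box=(0.2, 0.3, 0.9, 0.8), score=0.82, label=3),
]
degree = domestic_degree(dets)
```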

Table 11

Complementary comparison: dedicated OSR methods and off-the-shelf object detection on OSR (OD-OSR).

Metric	OpenMax	Exponential LC	Cubic LC	OD-OSR
Score used	OpenMax	$\sum_i \exp(a_i)$	$\sum_i a_i^3$	p
Q1	0.6127	0.6300	0.5767	0.6046
AUC of P-R	0.8713	0.8790	0.8663	0.7532

Figure 10

Precision-Recall plots of foreign detection in the complementary study.


Discussion

Our investigation has revealed the viability of Latent Cognizance (LC) in Open-Set Recognition (OSR) and re-affirmed the hypothesis underlying LC. In addition, our study introduces the performance metric Q1, discloses some potential for detecting adversarial images, and shows that the base classifier can affect the final OSR performance by 19% or more.

OpenMax and LC

Both LC and OpenMax rely on penultimate values, but they use them differently. OpenMax uses statistics of penultimate values to estimate how far the values corresponding to the input deviate from those statistics. LC uses only the penultimate vector corresponding to the input to compute a marginalized cognizance. LC is simpler to implement and faster to compute, yet delivers a similar level of effectiveness. Both LC and OpenMax are quite effective for OSR over a wide range of comfort ratios, but there is room for improvement. Besides a better base classifier for both approaches (our study has shown its great influence on the final performance), OpenMax is implicitly based on a uni-modal assumption. Relaxing this assumption might be a direction for further investigation of the OpenMax approach. We see LC as analogous to the System I decision of Kahneman [22]. For OSR, LC could provide a quick judgment on the validity of the input at hand. As humans learn to balance decisions from both quick/instinctive and slow/analytic systems, we envision LC or its derivatives working alongside a more elaborate approach to provide OSR capability for various scenarios. It is always better to be prepared for what we might encounter, but it is good to have a backup system, the instinctive system, that can work to some degree even in scenarios we did not anticipate. That is the position in which we see LC in the big picture.

Practical issue of LC formulation

Regarding practical deployment and potential numerical issues, cubic cognizance may have an issue with negative logit values in the penultimate vector. 
Although this was not the case in our experiment, the issue may arise in practice. Exponential cognizance does not have an issue with negative logits, but a large logit value may destabilize the computation (exponential cognizance could reach numerical infinity for a large logit). This could be mitigated by a numerically safer formulation of LC. For example, a log formulation may be safer than the one used in our experiment. Note that this log formulation may help explain the experiment of Nakjai and Katanyukul [4], in which thresholding on a related quantity gives results similar to thresholding on marginalized exponential cognizance. This is because the logarithm is a monotonically increasing function, which makes thresholding on marginalized exponential cognizance similar to thresholding on its logarithm, and the remaining term is likely to be small.

On the concern over information loss

The effectiveness of LC on OSR may have somewhat assuaged the concern over information loss discussed in Yoshihashi et al. [11]. In addition, marginalized exponential cognizance has been used satisfactorily to quantify a degree of being foreign in a counterfactual approach [9], as Neal et al. use logit values for their classifier output (see §2). Our results, along with the effectiveness shown in previous works [4], [6], [9], support the hypothesis underlying LC.

OSR metric

Q1 is based on averaging the performances of classification and foreign detection. It accounts for all cases (Table 2). The rationales are justified from our current point of view, but an assessment of this and other OSR metrics should be properly studied in its own right. Additionally, metric Q1 measures the overall OSR performance, but we found it more beneficial to examine the broken-down performances: classification and foreign identification. A unified metric that provides a more convenient way to examine overall OSR performance as well as its underlying factors could be very useful. 
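The numerical concern raised above for exponential cognizance can be illustrated with the standard log-sum-exp identity, $\log \sum_i \exp(a_i) = a_{\max} + \log \sum_i \exp(a_i - a_{\max})$. The sketch below is an assumption about what a numerically safer log formulation could look like, not a reproduction of the authors' expression; the function name is hypothetical.

```python
import numpy as np

def log_marginalized_exp_cognizance(logits):
    """Log of sum_i exp(a_i), computed without overflow.

    Subtracting the largest logit before exponentiating keeps every
    exponent at or below zero, so exp() cannot overflow.
    """
    logits = np.asarray(logits, dtype=np.float64)
    a_max = logits.max(axis=-1, keepdims=True)
    residual = np.log(np.exp(logits - a_max).sum(axis=-1))
    return a_max.squeeze(-1) + residual

# A logit of 1000 would overflow a direct exp(), but is handled here.
large = log_marginalized_exp_cognizance([1000.0, 999.0])

# For moderate logits the result matches the direct computation.
small = log_marginalized_exp_cognizance([1.0, 2.0, 3.0])
direct = np.log(np.exp([1.0, 2.0, 3.0]).sum())
```

Because the logarithm is monotonically increasing, comparing this value with the logarithm of a threshold yields the same decision as comparing the marginalized exponential cognizance with the threshold itself.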

Adversarial detection and inference uncertainty

Although there might be some potential for adversarial detection, at this point we do not see adversarial detection as a promising side benefit of these OSR approaches. However, when domestic examples are broken down into correctly and incorrectly classified examples (Fig. 7 and 8b), the results seem more favorable from the perspective of inference uncertainty. This may suggest that, while we are testing LC for foreign and adversarial detection, it may naturally be more suitable for providing inference uncertainty. In that case, the outcomes look much more decisively positive. Re-purposing LC for inference uncertainty does not diminish its value as a mechanism for machine awareness. A lower value of marginalized cognizance indicates that the classification prediction is likely to be incorrect, whether because of a wrong class, a foreign input, or even an adversarial input. It still provides an awareness that the input is beyond the machine's capability.

OSR and object detection

Comparing intrinsic OSR methods to object detection modified for OSR reveals a marginal benefit of dedicated OSR approaches over a simply modified object detector. Although the comparison is at a preliminary stage, it shows strong potential for extending object detection capability to address OSR. Since the mechanism behind many object detection systems also employs softmax, it may be possible to enhance this capacity through LC. A further study on enhancing the OSR capacity of object detection could benefit both domains. However, the object detector has been trained in a different setting. Objects of a “foreign” class might accidentally appear in some of the training images even though their class is not one of the target classes. (This situation makes it difficult to justify a class as foreign without inspecting all the images.) Therefore, the model might have learned the presumed foreign objects. Consequently, the class is not truly foreign for the model. 
Thus, the experiments and evaluation should be conducted carefully.

Big picture

Regardless of how OSR is carried out, it is a crucial step in machine intelligence. Enhancing classification with foreign identification is, to a large extent, analogous to enriching machine intelligence with an awareness of its own limitations. It is the awareness that a question is beyond what the system can answer. This awareness, when fully developed, could provide a safeguard for the dynamic and diverse settings in which an intelligent system will be deployed. Therefore, OSR and similar concepts in other domains should be sufficiently addressed in the development of a robust intelligent agent.

Conclusion

Our investigation has explored the application of Latent Cognizance (LC) to Open-Set Recognition (OSR) as well as its potential application to adversarial-image detection. Our findings reveal that LC performance is comparable to that of OpenMax (exponential LC is no worse than 4% behind OpenMax, Table 6), but LC requires considerably less computing time (several orders of magnitude less, Table 5) and does not require an extra learning session. In practice, however, an extra learning session may be used to determine a proper value for the marginalized-cognizance threshold. Regarding performance over various scenarios, LC is shown to be robust across a wide range of comfort ratios (performance varies by less than 1% across the tested scenarios, Table 6). The effectiveness of LC, particularly as shown by the foreign-identification accuracies (Table 8), supports its underlying hypothesis. Error analysis reveals that LC distinguishes domestic samples from foreign ones very well when samples are less ambiguous to the base classifier, but has difficulty when the base classifier underperforms (Table 7 and Fig. 6). The effect of the base classifier on the final OSR performance has been empirically estimated to be at least 19% (Table 9). Our preliminary investigation of applying LC to adversarial-image detection does not show an immediate prospect (Table 10 and Fig. 9), but it suggests a possible new interpretation of LC as a measure of inference uncertainty. Nevertheless, whether LC is regarded as providing a degree of the input being domestic or a degree of inference uncertainty in the base classification, it still reflects a level of awareness of the input and of the capability of the underlying machine to process it. In addition to our focus on the application of LC to OSR, metric counts (Table 2) and a performance index Q1 (§4) designed for OSR are also introduced.

Declarations

Author contribution statement

Tatpong Katanyukul: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper. Pisit Nakjai: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data availability statement

The source code is available at https://github.com/beebrain/Lecognition-Openset.

Declaration of interests statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.
