Ji Tao1, Yap-Peng Tan1. 1. Division of Information Engineering, School of Electrical & Electronic Engineering, Block S1, Nanyang Avenue, Nanyang Technological University, Singapore.
Abstract
We propose a novel probabilistic approach to recognizing people entering and leaving a closed room in human work place or living environment. Specifically, people in the view of a monitoring camera are first tracked and represented using low-level color features. Based on a new color similarity measure, optimal recognition of people leaving and entering the room is carried out by probabilistic reasoning under the constraints imposed by the domain knowledge, e.g., a person currently inside a room cannot enter again without first leaving it, and vice versa. The novelty of our work mainly lies in the development of a systematic way to incorporate the correlation and constraint among a sequence of people observations, and the optimality of recognition is achieved by maximizing a joint posterior probability of the observations. Experimental results of real and synthetic data are presented to show the efficacy of the proposed approach.
We propose a novel probabilistic approach to recognizing people entering and leaving a closed room in human work place or living environment. Specifically, people in the view of a monitoring camera are first tracked and represented using low-level color features. Based on a new color similarity measure, optimal recognition of people leaving and entering the room is carried out by probabilistic reasoning under the constraints imposed by the domain knowledge, e.g., a person currently inside a room cannot enter again without first leaving it, and vice versa. The novelty of our work mainly lies in the development of a systematic way to incorporate the correlation and constraint among a sequence of people observations, and the optimality of recognition is achieved by maximizing a joint posterior probability of the observations. Experimental results of real and synthetic data are presented to show the efficacy of the proposed approach.
As automated video surveillance becomes increasingly important [9], [10], [18], we are endeavoring to develop a video-based system for closed-room people monitoring. In this paper, we present a promising system of our recent effort. The system consists of two modules: a video analysis module to detect and track people entering or leaving the only entrance/exit of a closed room (e.g., a research lab, class room or meeting room), and a people recognition module to correspond each observed person with a person previously entering the room or to identify him/her as a new person unseen before. The results obtained from the system allow the derivation of such useful information as to how long a person has stayed in the room, how many people are in the room during a certain period, and who they are. Potential applications of the system include, for example, identifying who could possibly be infected by a newly identified severe acute respiratory syndrome (SARS) victim [3], and characterizing the regular or abnormal human activities in a monitored work place.Besides our proposed approach, there exist other solutions to the problems of interest. For example, biometrics have been successfully used for person recognition and very satisfactory results have been achieved. Representative work in biometric-based recognition ranges from fingerprint and iris identification to face and gait recognition [1], [4], [11], [20]. However, many of these methods involve intrusive data collection, i.e., require human proactive action and collaboration in the course of identification or authentication, and thus work mainly in well-controlled environments. Although gait recognition partly addresses this limitation by exploring human motion characteristics, gait information, by itself, has limited discriminating power and only works for people whose motion patterns have been well characterized. In another approach, Nakajima et al. [17] attempt to recognize people by color-based and shape-based low-level features with the use of multi-class support vector machine classifiers (SVMs), where the people under monitoring are restricted to their lab members, for whom a database for recognition can be constructed beforehand.Regardless of the selection of feature representation, most of the existing approaches accomplish the recognition task based on some maximum likelihood classification rule [7], which is generally simple, effective and fast. Vega et al. [25] model people by their gait-based feature relational distribution and recognize a person by choosing the one with the smallest distribution distance from a test data set. McKenna et al. [14] attempt to recognize people after separation while tracking a group of people. They compare the color models of people after interaction with the models built before the interaction and correspond them to maximize a joint likelihood. It should be noted that these existing methods mainly perform recognition based on observations or features obtained at a single time instance or duration. The temporal correlation and constraint among observation sequence, however, are seldom utilized although they often exist in some specific contexts; for example, a person currently inside a room cannot be entering the room without first leaving it.In comparison, we develop a systematic approach to exploit the inherent correlation and constraint among multiple people observations for improving the recognition accuracy. Specifically, an HMM-like, probabilistic inference framework is proposed to incorporate domain-specific knowledge (i.e., a closed-room with a single entrance/exit) for enhanced people recognition. To be unintrusive, we choose color histogram as the feature for people recognition for it can be easily and reliably extracted using a low-cost camera from a distance. Rather than using only a single observation, our method is capable of performing optimal recognition based on multiple observations acquired at different time instances to maximize a joint posterior probability. As a result, it can effectively enhance the limited discriminating power of low-level features, such as color histogram. Experimental results demonstrate that the proposed approach can obtain promising recognition rates, as compared with the existing maximum likelihood approaches.
Video analysis
People detection & tracking
In this module, we extend the color-based people detection and tracking system developed in our previous work [13], [24]. To work in a more perceptually uniform color space, we choose the HSV (hue, saturation, and value) color space and represent the color of each pixel with a vector of three color attributes . To have a reference background for change detection, we use a background initialization scheme to construct the background model automatically from a scene that may contain moving objects or light flickers. Specifically, a pixel will not be deemed as a background pixel unless its color/intensity value remains stable for a sufficiently long period. The color attributes of each pixel during this period are then modeled with a multivariate Gaussian distribution as the pixel's background model.To accommodate scene change over time, we also update the background model adaptively based on the newly observed background pixels. The updating is carried out as follows:
where and denote the values of each newly observed background pixel at time , and is an updating parameter proportional to standard deviation . In our implementation, we set , for the reason that the faster or larger a pixel changes, the faster its background model should be updated.Since the chrominance information becomes less reliable when the scene is too dark, a value threshold and a saturation threshold are used to decide whether the observed hue information could be used for segmenting the foreground regions. (In our case, the foreground regions of interest are regions containing moving people.) Specifically, the following rules are applied to classify each pixel as either a background or a foreground pixel:Using the binary segmentation results, the module then localizes a moving person within a minimum bounding rectangle (MBR) and represents the person with a set of identity attributes, including a label, the size, and the centroid of the MBR.If the hue information is reliable, the pixel is marked as a foreground pixel if ; otherwise, it is marked as a background pixel.If the hue information is not reliable, the pixel is marked as a changed pixel if ; otherwise, it is marked as a background pixel.For people tracking, we extend the continuously adaptive mean shift (CAMSHIFT) algorithm [2] to work with the density function of the binary segmentation results. The extended CAMSHIFT algorithm searches within the current frame for the nearest foreground region of each moving person tracked in the previous frame. This is because each moving person in the current frame cannot move far away from his/her location in the previous frame in a video sequence of high frame rate—an assumption that holds for many general applications.During tracking, color histogram, a computationally easy color feature that is known to be rather invariant to changes of shapes and sizes [23], is acquired to represent the color appearance of each person. In our implementation, we quantize the coordinates of color space uniformly into 12(hue), 4(saturation) and 4(value) bins, respectively, resulting in a total of 192 quantized colors. To account for varying person size, the color histogram for each tracked person is normalized by the total number of foreground pixels of the person. Furthermore, to accommodate changes due to different lighting condition or camera view angle, the color histogram is updated as follows:where is a learning parameter, and is the color histogram obtained from the new frame acquired at time . Note that as the color histogram of a tracked person could be severely affected by other pixels, such as the shadow pixels or noise clutters due to occlusion, we also make use of a shadow-removal scheme in the proposed system to enhance the quality of each extracted color histogram [24].The information acquired during the tracking of a person, including the size variation and motion trajectory of a tracked person, is then used to judge whether the person is entering or leaving a room as seen by the camera (Figs. 1
and 2
).
Fig. 1
System overview with two sample frames illustrating a person entering and leaving a closed room with a single entrance/exit.
Fig. 2
Sample images of the people detection and tracking results: (a) a person is leaving the lab, and (b) another person is entering the lab.
System overview with two sample frames illustrating a person entering and leaving a closed room with a single entrance/exit.Sample images of the people detection and tracking results: (a) a person is leaving the lab, and (b) another person is entering the lab.
Similarity measure
Color histogram intersection has been widely used as a similarity measure for image recognition and retrieval [6], [8], [15]; however, direct application of this measure to gauge the likeness of people in our work poses some potential problems. For example, color histogram intersection of two different people is generally larger than zero for some of their appearance features, such as hair and skin, could share similar colors. On the other hand, the same person observed at different time instances may not have identical color histograms due to the differences in lighting conditions, camera view angles, and segmentation results. A function is therefore required to map the value of color histogram intersection of two observed people and , defined as , onto a similarity probability, , indicating whether and correspond to the same person.Conceivably, the function needs to possess the following properties: (1) it should be nondecreasing; (2) should be approaching 0 and 1 as takes values near the ends of the interval [0, 1], respectively; (3) the transition of from a low value to a high value should take place at where the value of becomes evident to support that the two observed people are identical. After some subjective studies and comparisons, we select the sigmoid function [12] to perform the required mapping, which is shown in Fig. 3
and can be expressed as
Fig. 3
The Sigmoid function to map color histogram intersection onto the similarity probability of two observed people.
The Sigmoid function to map color histogram intersection onto the similarity probability of two observed people.There are two parameters determining the shape of a sigmoid function curve: controlling the steepness of the transition and defining the center of the transition point. To assign proper values to the two parameters and , we have examined a large number of people segmentation results with different similarity degrees; some of them are indicated in Fig. 3 using the crosses, whose -coordinates are color histogram intersection values and -coordinates are the similarity degrees judged by human observers. We have found that setting the parameter pair to (16, 0.55) can fit the curve well.It should be noted that other features, such as face or gait features, for which a similarity measure is defined can be used in our proposed people monitoring system. To make the system less intrusive, we have only made use of color histogram in the work reported in this paper.
Probabilistic reasoning framework
Problem formulation
As hidden Markov models (HMMs) could be used to integrate prior knowledge and incomplete observation data to perform probabilistic reasoning [16], [22], [26], our first attempt is to establish a suitable HMM for the recognition task in our problem. In general, an HMM can be fully characterized by a set of parameters , where denotes the state transition probabilities, , the observation symbol probabilities, and , initial state probabilities. These parameters are usually pre-learned from a set of representative data based on a fixed number of states. This situation, however, is not applicable to our case, for the number of people as well as their activity patterns, i.e., the frequencies of entering and leaving, are generally different from place to place or time to time, and hard to estimate from prior data.To construct a framework that is well suited for the problem of our concern, we make use of the lattice structure and parameter setting of HMMs and formulate the problem of people recognition as follows. Assume that the room being monitored has only a single entrance/exit (referred to as a closed room) and is empty when the system is first activated. When a person is entering the room, we append a new state to the state set (database) to represent his/her identity; when a person leaves at time , he/she will be recorded as a new observation . Thus the states in the state set at time correspond to the current people identities in the database (i.e., people that possibly stay in the room at time ), and are denoted as . When an observation sequence has been obtained over a period of time, the identities of the people leaving the room can be recognized from the state sets, and , recorded over the same period. Characterizing this reasoning framework with a time-variant parameter set , we can use the Viterbi algorithm [19] to find an optimal state sequence associated with the observation sequence to maximize a joint posterior probability , so that each leaving person can correspond to one of those who are judged still inside the closed room.A recognition example of the proposed framework is illustrated in Fig. 4
, where the optimal state sequence obtained is highlighted in bold (red). From this path,we can identify as as , as , etc. We can see that the framework has a lattice structure similar to HMMs; however, there are several key differences that distinguish our framework from conventional HMMs. First, and most importantly, the parameter set is time variant and needs to be derived at each observation time instance based on all the previous possible states and current observations rather than from some training data. Second, the number of states in our model is not fixed but can increase over time before a decision is made. Third, the states can be indefinite because more than one states could be associated with the same person, e.g., states and in the example shown in Fig. 4.
Fig. 4
A recognition example of the proposed probabilistic reasoning framework.
A recognition example of the proposed probabilistic reasoning framework.It should be noted that in our framework a state could represent the feature model and the identity of a person simultaneously. For clarity, we shall use to denote a person's identity and his/her feature model.
Framework construction
It can be seen from our problem formulation that the main task in constructing the proposed framework lies in the estimation of the time-variant parameter set . Once is known, we can find the optimal path indicating the recognized people by a “turn-the-crank” procedure given by the Viterbi algorithm [5]. Our solutions are given as follows.The initial state distribution, :According to the definition of the state set , there are states (people) in the database when the first observation of exit is made at time . Without any other prior knowledge, we intuitively assume that everyone inside the room has an equal probability to leave the room. Hence,The output probability of state
at
time
, ():In analogy with the definition in HMMs, let be the probability of observation generated by state . We regard this probability as how likely an observed is due to person , and simply approximate it by Eq. (3) using their feature similarity asThe state transition probability, :Before proceeding to the next step, we define a set of probabilities between observation times and as shown in Fig. 5
. For concision, we shall refer to “observation time” as “time” hereafter without causing confusion. Let be the probability of person staying in the room at time instance (straightforwardly ), where and denote the time instances right after and before the observation being made. Let be a likelihood matrix measuring the similarities between people entering between and (i.e., ) and the people staying in the room at time (i.e., ), defined aswhere is the number of states (people who possibly stay in the room) at time and is the number of people entering between times and . With these auxiliary probabilities, we can now derive the transition probability using the following three steps:
Fig. 5
The likelihood matrix between times and and the probability of the existence of a person associated with a certain state at time instances and .
The likelihood matrix between times and and the probability of the existence of a person associated with a certain state at time instances and .(i) Transition probability, :The transition probability measures the odds of person leaving the room at time , given that person left at time , i.e., and . Without other prior knowledge, it is reasonable to assume that the probability for to leave at time is proportional to that of his/her existence in the room at time instance . Conceivably, a person cannot leave a room if he/she is not in the room at all. Thus, the transition probability can be computed aswhere the denominator is a normalization factor to make for each . The term in Eq. (7) can be further expanded for each particular state by using Bayesian rules aswhere is one of the possible realizations of over all , and or 0 represents the status of person being in or out of the room. By assuming that the status of each person is statistically independent of the others, the second term on the right hand side of Eq. (8) can be expressed asFrom Eqs. (7)–(9), we can see that the transition probability depends on two probabilities: and ; the former is the conditional odds of person staying in the room at time instance given the status of all the people observed up to time instance , and the latter is the probability of person being status at time instance given that person leaves the room at time . The calculation of these two probabilities is discussed below.(ii) Conditional probability, :We compute this conditional probability with the aid of the likelihood matrix defined in Eq. (6). The specific calculation is given bywhere is the entry of a modified matrix which is derived from to indicate how likely person is in fact person returning under a certain cond. We introduce the matrix in order to incorporate the domain knowledge in the context of the cond and improve our estimation. In particular, the value can be considered as the gain of the existence odds for person due to newly entered people between times and . This gain, along with the existence odds of person at time instance , which is the binary value of , constitutes his/her existence odds at time instance . For those newly added states where , the value can therefore be considered as the odds for person to exist as a new person. Note that in this framework we attempt to estimate these odds in an approximate manner, which is intuitive and has been found to be effective for improving the recognition rates.Specifically, is derived from aswhere is a 2-D normalization operation to be discussed later and
and are two indication factors incorporating the domain knowledge of a particular cond. Specifically, shows that it is impossible for one to enter a closed room if he/she is currently inside, while accounts for the fact that a person staying in the room could not enter the room again if he/she has not been observed leaving the room. is the most likely path ending in state at time , i.e., the most probable state sequence that is obtained by the Viterbi algorithm and includes all the people that have been observed leaving the room up to time .To illustrate it, we provide in the following a numerical example, in which five people enter a room with three of them leaving, forming a process of three observations as shown in Fig. 6
.
Fig. 6
An example illustrating the estimation of conditional probability.
An example illustrating the estimation of conditional probability.For the time pair shown in Fig. 6, suppose that the likelihood matrix obtained by Eq. (6) is equal toConsider one of the possible realizations of people's status at time instance , , ; that is, or all people are outside the room at time instance except for person . The conditional probabilities that need to be computed are .First, we incorporate the knowledge of the people status into the likelihood matrix by using the two indication factors. From Eq. (12), and the third row of is set to zero. This is because if is known to be inside the room, then neither nor can be regardless of their similarities. The second indication factor relies on the most likely path ending in the state for which the transition probability needs to be calculated. Assume that at time is the state and the most likely path is as shown in Fig. 6. It can be seen that is not on this optimal path, which means person has not left the room since he/she entered. So none of and could be and according to Eq. (13). With this domain knowledge, the original likelihood matrix can be modified asIn this example, the two indication factors affect the same row of the likelihood matrix . In general, the adjustments could be made on different rows depending on the cond as well as the state chosen for evaluating the transition probability.Next, the adjusted likelihood matrix is normalized so that the summations of the likelihoods corresponding to all possible circumstances are equal to one. To do this, we introduce a normalization operation, referred to as the correlated normalization operation or operation, which works as follows. Consider a person and a group of candidates consisting of people, and define the similarity measure . Under the constraint that at most one person (or none) of could be , we can obtain the following new similarity measures
To make the probabilities corresponding to all possible circumstances sum up to one, the operation is defined to normalize the above new similarity measures asThis normalization operation aims to correlate likelihoods that are measured independently by imposing the constraint mentioned before Eq. (14). It should be noted that this constraint can be applied to the adjusted likelihood matrix both row-wise and column-wise. For instance, at most one of and (or none of them) could be , while could be at most one of , , and (or none of them, i.e., is a new person) in the example shown in Fig. 6. For simplicity, we apply a 2-D operation , a row-wise normalization followed by a column-wise normalization, to normalize the likelihood matrix and obtainFrom this normalized likelihood matrix , we can obtain useful information such as the existence odds of is increased by 0.05+0.72 = 0.77, the odds that is a new person is , etc. Consequently, the existence odds at time instance under can be obtained using Eq. (10) asApplying the same procedure to all possible cond's, we can obtain all the conditional probabilities required in Eq. (8), one of the two prerequisites for computing the transition probability as discussed in step (i).(iii) Probability:The remaining unknown for calculating the transition probability is , which is the existence odds of person at time instance given that the person leaving at time is . To illustrate how to compute this unknown, we shall extend the example of Fig. 6 to time and show instead how to compute the unknown by making use of the probabilities obtained so far and assuming the full knowledge of , where is the previous state of in the most likely path . By the same procedure, can be similarly estimated based on . Note that, , and are indices of the states at times , , and , respectively.As shown in Fig. 7
, assume that person leaves the room both at time and at time , i.e., and (denoted by the shaded circles). We manifest the values of , which are assumed to be known, on the right hand side of the observation made at time , and the computed values of on the left hand side of the observation made at time . The estimation process of the probabilities on the right hand side of the observation and the final results are shown in Table 1
.
Fig. 7
An example to illustrate the estimation of existence odds.
Table 1
The estimation of the existence odds
P[sj,t+=1|qt=s1]
P[sj,t+1-=1|qt=s1]
γ
γ˜
P[sj,t+1+=1|qt+1=s1]
s1
0
0.77
0.77
1
0
s2
0.30
0.50
0.20
0.16
0.46
s3
0.70
0.75
0.05
0.04
0.74
s4
N/A
0.75
0.75
0.61
0.61
s5
N/A
0.23
0.23
0.19
0.19
An example to illustrate the estimation of existence odds.The estimation of the existence oddsWe first examine the gain of existence odds from time instance to for each person: . Clearly, because the gain in existence odds is due to the two newly entered people. However, if we know for sure that leaves the room at time , then he/she must stay in the room at time instance , and thus, should be equal to 1 rather than 0.77. In other words, the gain for each person needs to be re-adjusted ( in the table) to incorporate the knowledge of this new assumption to ensure that . That is equal to one can be also explained as follows: since leaves at time and time successively, one of the entering persons and must be . As there is no reason to favor any person other than , our approach is to increase 's existence odds from 0.77 to 1 and decrease the others’ proportionally. This is sensible as once we know that the person who leaves the room at time is , then and should be more likely to be than what we originally estimate, and as a result, less likely to be the other people. At time instance , the existence odds of becomes zero due to his exit, and the others’ can be obtained as the summation of their existence odds at time instance and the re-adjusted gain .The above analysis and derivation can be generalized into the formulas below:
It can be seen from these formulas that the estimation of existence odds is recursive; that is, one's existence odds at time instance depends on that at . To initialize the estimation, we set for all and if is the first person leaving the room (i.e. ), since all the people who have entered the room, except for , should stay inside at time instance .It should also be noted that although the derivation may appear somewhat complex, it leads to an important underlying property of the proposed framework: the summation of the existence odds of all states at any time instance , i.e., for any and , is always equal to the number of people who stay in the room at that time.Batch recognition of observations (people who have left the room) can be performed at any time when necessary by retrieving the state sequence with the maximum score of joint posterior probability as the optimal state sequence.To use the proposed framework for recognizing people re-entering the room is equivalent to finding and merging those states associated with a same person in the optimal state sequence. This can be accomplished by the following local maximum likelihood scheme. Let and search backward in the optimal state sequence for , where and is minimized. If such exists, the person will be assumed as the person re-entering, where is obtained asand is the state set at time . Then, should be merged into and removed from all the state sets containing it thereafter. The backward search is performed from to the end of the optimal state sequence until all the states possibly representing a same person are disclosed and merged.
Case studies
In this section, we study two special cases in which certain extreme conditions are assumed so that intuitive analysis can be performed. By comparing the results obtained by our model and that of the intuitive analysis, we demonstrate that the proposed model can produce reasonable results even under such extreme conditions.
Case 1: Successive exits
In this case, we assume that three people enter a room initially and then leave successively without any other new entries among them. The model established is shown in Fig. 8
, where the likelihood matrices are null as nobody enters between two observations.
Fig. 8
A model illustrating the case in which three people enter and leave a closed room successively, where the dash arrows indicate the transitions that are not allowed.
A model illustrating the case in which three people enter and leave a closed room successively, where the dash arrows indicate the transitions that are not allowed.The transition matrix is first computed at time by using Eqs. (7)–(19) derived in Section 3.2 and obtained as The diagonal entry is zero, which is in accordance with the intuitive analysis that if leaves at time and nobody enters before the next observation time , could not leave at time again.Moving on to the next observation time , we further assume that the previous states of , and in their own most likely paths are , and at time , respectively, which can be obtained by using the Viterbi algorithm based on and . Hence, the transition matrix at time can be computed as That means, from time to time , each state has only one possible choice of transition. Because, for instance, if leaves at time and its previous state has already left at time , the person who leaves the room at time must be . More specifically, to select which state at time depends on not only the states at the previous time but also those prior to .
Case 2: Ideal similarity measure
In this case, we shall assume that the ideal similarity measure between two people can be a binary value of either one (identical) or zero (distinct). In Fig. 9
, two people enter first and one of them leaves afterward. After that one person enters again followed by one exit. We suppose that looks the same as but has different appearance from , i.e. and (likelihood matrix ). The transition matrix can be computed as
Fig. 9
A model illustrating the case in which has perfect similarity with and .
A model illustrating the case in which has perfect similarity with and .We can interpret this process based on our general knowledge. If leaves at time , will be viewed as re-entering due to their same appearance. Therefore, at time instance , and must stay in the room while cannot be a new person. and then take the equal probability to leave at time , which is reflected in the transition matrix as . If leaves at time , it is interesting to see that has to be a new person although he/she has the same appearance with since is already known to be inside. (One possibility is that and are identical twins wearing the same clothes.) In this case, and are equally possible to leave at time while could not leave again. In the transition matrix, it is shown as . Once again, the proposed model obtains the same result as that derived by the intuitive analysis based on our general knowledge.
Experimental results
We have tested the proposed people monitoring system with two videos acquired in a research laboratory using a camera to monitor the lab's only entrance. During 1-h monitoring period, video-I recorded eleven different people who were unaware of the experiment, of which four entered and left twice, another four entered and left only once, and three entered without leaving. Video-II simulated the process of people entering and leaving with the help of eight different students, among whom seven entered and left the lab for three times and another one twice. To further increase the test samples, a synthetic process generator is also designed to randomly re-arrange the entries and exits from the two videos based on the rule that one cannot enter unless he/she is outside the lab, and vice versa. This generator allows us to simulate a wide variety of combinations of entries and exits over time from the same group of people.For comparison, we also implement a recognition approach based on maximum likelihood (ML) classification [14], [21] as follows. When a person is detected entering a closed room, he/she is compared with people who are in the system's database and currently labeled as ‘out’. If the maximum likelihood with respect to an ‘out’ person is larger than a threshold (set to 0.7 in our implementation), they are considered the same person. Then, the observed person is labeled as ‘in’ and his/her corresponding color histogram in the database is updated with the latest one. Otherwise, the person is assumed to be new and then labeled as ‘in’ with a new identity label. When there are multiple exits without entries among them, the leaving people are recognized from the people with label ‘in’ by maximizing a joint likelihood.To ensure that the system can perform as desired, we assume that each person has consistent color appearance (both frontal and back views) and it remains the same during the period of observation. This is because our proposed system uses only the color appearance of each person to perform recognition and the camera looks only at a fixed direction. This limitation can likely be circumvented by incorporating other possible features, such as face and thermal infrared imagery, or using more than one camera looking at different directions. We consider this as a potential future research direction.Fig. 10
shows the sample images of the eight people (1–8) recorded in video-II after foreground segmentation. Table 2
shows an example sequence of entries and exits obtained by the synthetic process generator based on these eight people, where ‘I’ and ‘O’ represents in (enter) and out (exit), respectively. A total of 44 entries and exits are observed, among them 22 are entering while the other 22 are exiting.
Fig. 10
The segmentation results of eight different people recorded in video-II, with their entering (frontal) images in the first row and leaving (backward) ones in the second row.
Table 2
An example sequence of entries (I) and exits (O) produced by the synthetic process generator using the eight people observed in video-II
Person
P6
P2
P6
P6
P1
P1
P3
P2
P1
P8
P2
P1
P7
P5
P3
In/out
I
I
O
I
I
O
I
O
I
I
I
O
I
I
O
Person
P5
P1
P8
P7
P3
P5
P2
P6
P4
P6
P8
P5
P8
P7
P2
Cont’d
O
I
O
O
I
I
O
O
I
I
I
O
O
I
I
Person
P7
P2
P6
P5
P7
P4
P4
P4
P7
P1
P3
P8
P8
P5
Cont’d
O
O
O
I
I
O
I
O
O
O
O
I
O
O
The segmentation results of eight different people recorded in video-II, with their entering (frontal) images in the first row and leaving (backward) ones in the second row.An example sequence of entries (I) and exits (O) produced by the synthetic process generator using the eight people observed in video-IIThe recognition results of the eight people (at observation time) are given in Table 3
. The results obtained by our proposed approach and the ML approach are compared with the ground truth. In this particular process, our approach achieves 100 percent recognition rates, outperforming the ML approach. In the ML approach, the system wrongly recognizes as at time 7 and reversely at time 19. Furthermore, is incorrectly identified as at time 18 and on the contrary at time 20. The errors of recognizing different people as the same one are mainly due to that the similarities between the color histograms of these different people exceed the preset threshold . On the contrary, the reason for the errors at time 10, 13 and 15, where and are identified as new people when they are in fact just re-entering, is that the similarities of their color histograms observed at different times are lower than . In other words, the results of the ML approach are rather sensitive to the threshold , inappropriate selection of which often causes such false recognitions as mentioned above. In contrast, our proposed approach benefits from using a threshold-free scheme; therefore, it is more robust to less-than-perfect segmentation results as well as changes in lighting conditions or view angles.
Table 3
Comparison of the recognition results obtained by the maximum likelihood (ML) approach and our proposed approach
Obs. time
1
2
3
4
5
6
7
8
9
10
11
Ground truth
S1
S4
S2
S4
S5
S10
S7
S9
S2
S1
S10
Proposed approach
S1
S4
S2
S4
S5
S10
S7
S9
S2
S1
S10
ML approach
S1
S4
S2
S4
S5
S10
S4_
S9
S2
S3_
S10
cont’d
12
13
14
15
16
17
18
19
20
21
22
Ground truth
S7
S9
S2
S1
S14
S14
S9
S4
S5
S7
S10
Proposed approach
S7
S9
S2
S1
S14
S14
S9
S4
S5
S7
S10
ML approach
S7
S17_
S2
S3_
S14
S14
S12_
S16_
S9_
S7
S10
Comparison of the recognition results obtained by the maximum likelihood (ML) approach and our proposed approachAt each observation time, there is a most likely path ending in each individual state with the probability of , which can be iteratively computed as using the Viterbi algorithm. To make a recognition decision at time is to choose from all the most likely paths the one with the largest value of . Hence, a confidence index can be defined in the range of [0, 1] to evaluate the reliability of a decision made at each timeThe variation of the confidence index over time for the above example is shown in Fig. 11
.
Fig. 11
The confidence index over decision time.
The confidence index over decision time.At the early monitoring stage, the system has only few choices for making decision so that the confidence index is usually high. With the increase of the state number, i.e., more possible paths to choose from, the reliability of a decision may decrease. However, the advantage of our approach lies in that it is capable of maintaining the decision reliability at a later observation time by collectively considering all the available observations. In Fig. 11, when the confidence index is lower than 0.8, the ML approach is likely to make a wrong decision at the corresponding time (as indicated by the (red) circles). Meanwhile, the proposed approach can still find the right choice since the probabilities (or scores) of the other paths are even lower. Note that as the recognition decision made at time depends upon all the observations (not the decisions) obtained prior to time in a sense of maximizing the joint posterior probability of multiple observations—a merit of the Viterbi algorithm—our proposed system can minimize the chance of error propagation. However, if there is a substantial change in the lighting condition or a person changes his/her color appearance (e.g., clothes) during the period of observation, the proposed framework may generate fallacious state set and degrade the recognition performance.Overall, 20 synthesized processes of entries and exits are generated from each of the two test videos. We set to 0.7 to obtain the best results of the ML approach. The ground truth is obtained manually. It is evident from Table 4
that our proposed approach can improve the recognition rates notably.
Table 4
Comparison of the recognition rates obtained by the maximum likelihood (ML) approach and our proposed approach
Data
ML approach
Proposed approach
Video-I (%)
83.3
99.6
Video-II (%)
75.0
97.5
Comparison of the recognition rates obtained by the maximum likelihood (ML) approach and our proposed approachIt should also be noted that the computational complexity of our algorithm may increase rapidly when more states are generated. However, it is possible for one to make a definite decision when the confidence index is sufficiently high so that states associated with the same person can be merged to reduce the total number of the states, e.g., at time 16 in Fig. 11. We shall investigate this in our future work.
Conclusions
We have presented in this paper a novel probabilistic reasoning framework for people monitoring in a closed environment. The main contribution of our work lies in the development of a systematic way to exploit the temporal correlation and domain constraint of multiple people observations for improving the recognition accuracy compared with that of conventional maximum-likelihood approaches. Rather than identifying each single observation from a database, the framework is devised to recognize people based on multiple observations acquired at different time instances. Experimental results demonstrate that the proposed approach outperforms the conventional maximum-likelihood approach when using the same feature and being tested with the same test videos.It should also be noted that although we focus mainly on a system suitable for closed-room monitoring, we expect such correlation and constraint to be present in many application scenarios, e.g., an interconnected network of cameras monitoring multiple entrances/exits or adjacent areas of a large site, and our proposed approach can be readily extended to these applications.