Literature DB >> 24982986

Pain expression recognition based on pLSA model.

Shaoping Zhu1.   

Abstract

We present a new approach to automatically recognize pain expression from video sequences, categorizing pain into four levels: "no pain," "slight pain," "moderate pain," and "severe pain." First, facial velocity information, which is used to characterize pain, is computed with an optical flow technique. Visual words based on facial velocity are then used to represent pain expression in a bag-of-words framework. Finally, a pLSA model is used for pain expression recognition; to improve recognition accuracy, class label information is used when learning the pLSA model. Experiments on a pain expression dataset built by the authors show an average recognition accuracy of over 92%, which validates the method's effectiveness.


Year:  2014        PMID: 24982986      PMCID: PMC3985295          DOI: 10.1155/2014/736106

Source DB:  PubMed          Journal:  ScientificWorldJournal        ISSN: 1537-744X


1. Introduction

In recent years, a great deal of research has been devoted to automatically recognizing expressions (such as pain, anger, and sadness) from video sequences. Pain is a subjective and personal experience, and pain recognition remains difficult. There are numerous potential applications for pain recognition. Automatic recognition can help clinicians take genuine pain seriously in patients who cannot self-report, such as young children, patients in postoperative care or transient states of consciousness, and patients with severe disorders requiring assisted breathing, among other conditions [1, 2]. A real-time automatic system could provide significant advantages in patient care and cost reduction. Pain is normally measured or monitored via self-report, as this is convenient and requires no special skill or staffing. However, self-report measures cannot be used when patients cannot communicate verbally. Many researchers have pursued a continuous objective measure of pain through analyses of tissue pathology, neurological "signatures," imaging procedures, testing of muscle strength, and so on [3]. These approaches have been fraught with difficulty because they are often inconsistent with other evidence of pain [3], in addition to being highly invasive and constraining to the patient. The experience of pain is often reflected in changes of facial expression, so facial expression is considered the most reliable source of information when judging the pain intensity experienced by another. In the past several years, significant efforts have been made to identify reliable and valid facial indicators of pain [4-14].
In [4, 5], an approach was developed to automatically recognize acute pain: active appearance models (AAM) were used to decouple shape and appearance parameters from face images, three pain representations were derived from the AAM, and SVMs were then used to classify pain. In [6-10], Prkachin and Solomon validated a facial action coding system (FACS) based measure of pain that can be applied on a frame-by-frame basis. These methods, however, require manual labeling of facial action units or other observational measurements by highly trained observers [15, 16], which is both time-consuming and costly; most must be performed offline, which makes them ill-suited for real-time applications in clinical settings. In [11], a robust approach for pain expression recognition from video sequences was presented. An automatic face detector based on skin color modeling detects the human face in the video sequence, the pain-affected portions of the face are obtained with a mask image, and the resulting face images are projected onto a feature space defined by Eigenfaces to produce a biometric template. Pain recognition is performed by projecting a new image onto the Eigenface feature space and classifying the painful face by comparing its position with those of known individuals. Zhang and Xia [12] used supervised locality preserving projections (SLPP) to extract pain expression features and multiple kernel support vector machines (MKSVM) to recognize pain expression. The methods described above use static features to characterize pain expression, but static features cannot fully represent pain. In this paper, we propose a method for automatically inferring pain from video sequences. The approach consists of two steps: extracting pain expression features and classifying pain expression. In the feature extraction step, pain expression features are extracted by a motion descriptor based on optical flow.
We then convert the facial velocity information to visual words using a "bag-of-words" model, so that pain expression is represented by a set of visual words; finally, a pLSA model is used for pain expression recognition. In addition, to improve recognition accuracy, class label information is used when learning the pLSA model. The paper is structured as follows. After reviewing related work in this section, we describe pain feature extraction based on the optical flow technique and the "bag-of-words" model in Section 2. Section 3 gives details of the pLSA model for recognizing pain expression. Section 4 presents experimental results, comparing our approach with three state-of-the-art methods, and conclusions are given in the final section.

2. Pain Expression Representation

2.1. Facial Velocity Feature

According to physiology, the experience of pain is reflected in changes of facial expression; since expression is a dynamic event, it must be represented by the motion information of the face. We therefore use facial velocity features to characterize pain. The facial velocity features (optical flow vectors) are estimated with an optical flow model, and each pain expression is coded on a 4-level intensity dimension (A-D): "no pain," "slight pain," "moderate pain," and "severe pain." Given a stabilized video sequence in which the face of a person appears in the center of the field of view, we compute the facial velocity u = (u_x, u_y) at each frame from the optical flow constraint equation

I_x u_x + I_y u_y + I_t = 0,

where I(x, y, t) is the intensity at pixel (x, y) and time t, I_x, I_y, and I_t are its partial derivatives, and u_x, u_y are the horizontal and vertical velocities at pixel (x, y). We obtain u = (u_x, u_y) by minimizing the objective function

E = ∬ (I_x u_x + I_y u_y + I_t)^2 + λ(‖∇u_x‖^2 + ‖∇u_y‖^2) dx dy.

There are many methods to solve the optical flow equation; we use the iterative algorithm of [17]:

u_x^{k+1} = ū_x^k − I_x (I_x ū_x^k + I_y ū_y^k + I_t) / (λ + I_x^2 + I_y^2),
u_y^{k+1} = ū_y^k − I_y (I_x ū_x^k + I_y ū_y^k + I_t) / (λ + I_x^2 + I_y^2),

where k is the number of iterations, the initial velocities are u_x^0 = u_y^0 = 0, and ū_x^k, ū_y^k are the average velocities over the neighborhood of point (x, y). The optical flow vector field u is then split into two scalar fields u_x and u_y, corresponding to the x and y components of u [18]. u_x and u_y are further half-wave rectified into four nonnegative channels u_x^+, u_x^−, u_y^+, and u_y^−, so that u_x = u_x^+ − u_x^− and u_y = u_y^+ − u_y^−. These four nonnegative channels are then blurred with a Gaussian kernel and normalized to obtain the final four channels ub_x^+, ub_x^−, ub_y^+, and ub_y^−. Facial pain expression is represented by velocity features composed of the channels ub_x^+, ub_x^−, ub_y^+, and ub_y^− of all pixels in the facial image.
Because pain expression can be regarded as facial motion, velocity features describe pain effectively; they have also been shown to perform reliably on noisy image sequences [18] and have been applied in various tasks, such as action classification and motion synthesis. However, the dimension of these velocity features is too high (4 × N × N, where N × N is the image size) to be used directly for recognition, so we convert them into visual words using "bag of words" [19, 20].
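As a concrete illustration of the channel construction above, here is a minimal numpy sketch that half-wave rectifies a given flow field into four nonnegative channels, blurs them with a separable Gaussian kernel, and normalizes them. It assumes the flow field (u_x, u_y) has already been estimated; the function names and the joint normalization over all four channels are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian convolution with numpy only (truncated at 3*sigma).
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode="same"), 1, out)

def flow_channels(u_x, u_y, sigma=1.0):
    """Split a flow field into four half-wave rectified,
    Gaussian-blurred, normalized channels (after [18])."""
    up, um = np.maximum(u_x, 0.0), np.maximum(-u_x, 0.0)  # u_x = up - um
    vp, vm = np.maximum(u_y, 0.0), np.maximum(-u_y, 0.0)  # u_y = vp - vm
    channels = [gaussian_blur(c, sigma) for c in (up, um, vp, vm)]
    total = sum(c.sum() for c in channels) + 1e-12
    return [c / total for c in channels]  # final channels ub
```

Stacking the four returned channels for every pixel gives the 4 × N × N feature described above.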

2.2. Visual Words for Characterizing Pain

The "bag-of-words" model was originally proposed for analyzing text documents, where a document is represented as a histogram of word counts. In this paper, each facial image is divided into blocks of size L × L, and each image block is represented by the optical flow vectors of all pixels in the block. On this basis, pain is represented by visual words using the bag-of-words (BoW) method. To construct the codebook, we randomly select a subset of all image blocks and apply the k-means clustering algorithm to obtain V clusters. Codewords, namely, visual words, are then defined as the centers of the obtained clusters. Finally, each face image is converted to the "bag-of-words" representation by counting the occurrences of each codeword in the image, yielding the BoW histogram. The steps for characterizing pain are as follows.

Step 1

Optical flow channels ub_x^+, ub_x^−, ub_y^+, and ub_y^− are computed.

Step 2

Each facial image is divided into blocks of size L × L, each of which is represented by the optical flow vectors of all pixels in the block.

Step 3

Visual words are obtained using the k-means clustering algorithm.

Step 4

Pain expression is represented by the BoW histogram

d = [n(I, w_1), n(I, w_2), …, n(I, w_M)],

where n(I, w_j) is the number of occurrences of visual word w_j in image I and M is the number of visual words in the word set. Figure 1 shows an example of our "bag-of-words" representation.
Figure 1

The processing pipeline of the "bag-of-words" representation: (a) given an image, (b) divide it into L × L blocks, (c) represent each block by a "visual word," and (d) ignore the ordering of words and represent the facial image as a histogram over "visual words."

3. pLSA-Based Pain Expression Recognition

We use the pLSA models [21] to learn and recognize human pain. Our approach is directly inspired by a body of work on using generative topic models for visual recognition based on the “bag-of-words” paradigm. The pLSA models have been applied to various computer vision applications, such as scene recognition, object recognition, action recognition, and human detection [22-26].

3.1. Probabilistic Latent Semantic Analysis (pLSA)

pLSA is a statistical generative model that associates documents and words via latent topic variables, representing each document as a mixture of topics. We briefly outline the principle of pLSA in this subsection. The model is shown in Figure 2.
Figure 2

Graph model of pLSA. Nodes represent random variables. Shaded nodes are observed variables and unshaded ones are unseen variables. The plates stand for repetitions.

Suppose documents, words, and topics are represented by d_i, w_j, and z_k, respectively. The joint probability of document d_i, topic z_k, and word w_j can be expressed as

p(d_i, z_k, w_j) = p(d_i) p(z_k | d_i) p(w_j | z_k),

where p(w_j | z_k) is the probability of word w_j occurring in pain category z_k, p(z_k | d_i) is the probability of topic z_k occurring in image d_i, and p(d_i) can be considered the prior probability of d_i. The conditional probability p(w_j | d_i) is obtained by marginalizing over all topic variables z_k:

p(w_j | d_i) = Σ_k p(w_j | z_k) p(z_k | d_i).

Denoting by n(d_i, w_j) the number of occurrences of word w_j in image d_i, the prior probability p(d_i) can be modeled as

p(d_i) = Σ_j n(d_i, w_j) / Σ_i Σ_j n(d_i, w_j).

A maximum likelihood estimate of p(w_j | z_k) and p(z_k | d_i) is obtained with the expectation maximization (EM) algorithm by maximizing the likelihood

L = Π_i Π_j p(d_i, w_j)^{n(d_i, w_j)},

or, equivalently, its logarithm

ℓ = Σ_i Σ_j n(d_i, w_j) log p(d_i, w_j).

The EM algorithm consists of two steps: an expectation (E) step that computes the posterior probability of the latent variables, and a maximization (M) step that maximizes the complete-data likelihood based on the posterior probabilities obtained in the E-step. Both steps for pLSA parameter estimation are listed below.

E-Step. Given p(w_j | z_k) and p(z_k | d_i), estimate

p(z_k | d_i, w_j) = p(w_j | z_k) p(z_k | d_i) / Σ_l p(w_j | z_l) p(z_l | d_i).

M-Step. Given the p(z_k | d_i, w_j) estimated in the E-step and the counts n(d_i, w_j), estimate

p(w_j | z_k) = Σ_i n(d_i, w_j) p(z_k | d_i, w_j) / Σ_m Σ_i n(d_i, w_m) p(z_k | d_i, w_m),

p(z_k | d_i) = Σ_j n(d_i, w_j) p(z_k | d_i, w_j) / n(d_i),

where n(d_i) = Σ_j n(d_i, w_j) is the length of document d_i. Given a new document, the conditional probability distribution over topics p(z_k | d_new) can be inferred by maximizing the likelihood of d_new with the word-topic distribution p(w_j | z_k) fixed to the value learned from the observed data [21]. The iteration for inferring p(z_k | d_new) is the same as the learning process, except that the word-topic distribution p(w_j | z_k) is held fixed at its learned value.
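The E- and M-steps above can be written compactly when the corpus is given as a document-word count matrix. The following is a minimal numpy sketch under our own conventions (dense arrays, a fixed iteration count, random initialization, light smoothing to avoid empty topics), not the paper's implementation.

```python
import numpy as np

def plsa_em(n_dw, K, iters=50, seed=0):
    """Fit pLSA by EM on a document-word count matrix n_dw (D x W).
    Returns p(w|z) as a K x W array and p(z|d) as a D x K array."""
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    p_wz = rng.random((K, W)); p_wz /= p_wz.sum(1, keepdims=True)
    p_zd = rng.random((D, K)); p_zd /= p_zd.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: p(z|d,w) proportional to p(w|z) p(z|d).
        p_zdw = p_zd[:, :, None] * p_wz[None, :, :]      # D x K x W
        p_zdw /= p_zdw.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate p(w|z) and p(z|d) from expected counts.
        c = n_dw[:, None, :] * p_zdw                      # D x K x W
        p_wz = c.sum(0) + 1e-9                            # smooth empty topics
        p_wz /= p_wz.sum(1, keepdims=True)
        p_zd = c.sum(2) + 1e-9
        p_zd /= p_zd.sum(1, keepdims=True)
    return p_wz, p_zd
```

Note that p(w | d) recovered as the product p(z | d) p(w | z) is a proper distribution over words for each document, matching the marginalization formula above.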

3.2. pLSA-Based Pain Expression Recognition

In this paper, we treat each block in an image as a single word w_j, an image as a document d_i, and a pain category as a topic variable z_k. For the task of pain classification, the goal is to assign a new face image to a specific pain class. During inference, given a testing face image and the document-specific coefficients p(z_k | d_test), each aspect in the pLSA model is treated as one pain class, so the pain categorization is determined by the aspect with the highest p(z_k | d_test). The pain category k* of d_test is determined as

k* = argmax_k p(z_k | d_test).

With a large amount of training data, unsupervised pLSA training would result in long training times. In this paper, we therefore adopt a supervised algorithm to train pLSA, similar to [27]. Each training image carries class label information, which is important for the classification task. We make use of this label information when learning the pLSA model: since each training image directly corresponds to a certain pain class, the topic variable becomes observable on the training set. This model is called supervised pLSA (SpLSA). The graphical model of SpLSA is shown in Figure 3.
Figure 3

Graph model of SpLSA. Nodes represent random variables. Shaded nodes are observed variables and unshaded ones are unseen variables. The plates stand for repetitions.

The parameter p(w_j | z_k) in the training step defines the probability of drawing word w_j from topic z_k. Letting each topic in pLSA correspond to a pain category, the distribution p(w_j | z_k) can be estimated directly from the training data as

p(w_j | z_k) = n_jk / n_k,

where n_jk is the number of occurrences of the jth word (block) in the images of the kth pain class and n_k = Σ_j n_jk is the total number of words in those images. The p(w_j | z_k) calculated this way is used to initialize p(w | z) in the EM algorithm for model learning, which makes the EM algorithm converge more quickly. The supervised training of pLSA is summarized in Algorithm 1. Once the distribution p(w | z) has been computed by the EM algorithm, the posterior distribution p(z | d_test) for a testing face image d_test is calculated as in the original pLSA. The classification procedure is summarized in Algorithm 2.

Algorithm 1

Supervised training of the pLSA.

Step 1. For all k and j, calculate p(w_j | z_k) = n_jk / n_k as the initialization of p(w | z), and initialize p(z | d) randomly.

Step 2. E-step: for all (d_i, w_j) pairs, calculate p(z_k | d_i, w_j) = p(w_j | z_k) p(z_k | d_i) / Σ_l p(w_j | z_l) p(z_l | d_i).

Step 3. M-step: substituting the p(z_k | d_i, w_j) calculated in Step 2, for all k and j calculate p(w_j | z_k) = Σ_i n(d_i, w_j) p(z_k | d_i, w_j) / Σ_m Σ_i n(d_i, w_m) p(z_k | d_i, w_m).

Step 4. M-step: substituting the p(z_k | d_i, w_j) calculated in Step 2, for all i and k calculate p(z_k | d_i) = Σ_j n(d_i, w_j) p(z_k | d_i, w_j) / n(d_i).

Step 5. Repeat Steps 2-4 until the convergence condition is met.

The supervised training algorithm not only makes the training more efficient but also improves the overall recognition accuracy significantly.
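The class-conditional initialization p(w_j | z_k) = n_jk / n_k used in Step 1 can be sketched as follows, assuming the training data is given as a document-word count matrix with one class label per image; the array and function names are illustrative.

```python
import numpy as np

def supervised_init(n_dw, labels, K):
    """Initialize p(w|z_k) from class labels: pool the word counts over all
    training images of class k and normalize (the SpLSA initialization)."""
    W = n_dw.shape[1]
    p_wz = np.zeros((K, W))
    for k in range(K):
        n_jk = n_dw[labels == k].sum(0)      # word counts within class k
        p_wz[k] = n_jk / max(n_jk.sum(), 1)  # p(w_j|z_k) = n_jk / n_k
    return p_wz
```

Each row of the returned matrix is a proper word distribution for one pain class and can seed the EM iterations of Algorithm 1 directly.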

Algorithm 2

Classification with the trained pLSA.

Step 1. For all k and j, take p(w_j | z_k) as learned from the training data.

Step 2. E-step: for all (d_test, w_j) pairs, calculate p(z_k | d_test, w_j) = p(w_j | z_k) p(z_k | d_test) / Σ_l p(w_j | z_l) p(z_l | d_test).

Step 3. Partial M-step: with p(w_j | z_k) fixed as in Step 1, for all k calculate p(z_k | d_test) = Σ_j n(d_test, w_j) p(z_k | d_test, w_j) / n(d_test).

Step 4. Repeat Steps 2 and 3 until the convergence condition is met.

Step 5. Calculate the pain class k* = argmax_k p(z_k | d_test).
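The fold-in loop of Algorithm 2 (an E-step plus a partial M-step with p(w | z) frozen) might look like this in numpy; the test histogram and word-topic matrix used below are toy values, not quantities learned from real data.

```python
import numpy as np

def fold_in(n_test, p_wz, iters=30, seed=0):
    """Infer p(z|d_test) for a new BoW histogram with p(w|z) held fixed,
    then classify by the most probable topic (= pain class)."""
    rng = np.random.default_rng(seed)
    K = p_wz.shape[0]
    p_zd = rng.random(K); p_zd /= p_zd.sum()
    for _ in range(iters):
        # E-step: p(z|d_test, w) proportional to p(w|z) p(z|d_test).
        p_zw = p_zd[:, None] * p_wz                    # K x W
        p_zw /= p_zw.sum(0, keepdims=True) + 1e-12
        # Partial M-step: update only p(z|d_test) from expected counts.
        p_zd = (n_test[None, :] * p_zw).sum(1)
        p_zd /= p_zd.sum() + 1e-12
    return p_zd, int(np.argmax(p_zd))
```

The returned index implements the argmax classification rule of Step 5.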

4. Experimental Results and Analysis

The effectiveness of the proposed algorithm was verified using a C++ and MATLAB hybrid implementation on a PC with a 3.2 GHz Pentium processor and 4 GB of RAM. We built a database of painful and normal face images. The database contains four groups of images ("no pain," "slight pain," "moderate pain," and "severe pain"), and each group includes 20 males and 20 females. The face images were taken under various laboratory-controlled lighting conditions, and each face image was normalized to a size of 64 × 64. Some sample images are shown in Figure 4.
Figure 4

Examples of recognizing pain expression from facial videos. (a) No pain, (b) slight pain, (c) moderate pain, and (d) severe pain.

In the experiments, 30 face images per class were randomly chosen for training, while the remaining images were used for testing. We preprocessed the images by aligning and scaling them so that the distance between the eyes was the same for all images and the eyes occurred at the same image coordinates. We ran the system 5 times, obtaining 5 different training and testing sample sets, and the recognition rates were found by averaging the recognition rate of each run. Each facial image is divided into blocks of size L × L. First, we studied the effect of block size on recognition accuracy. Figure 5 shows the recognition accuracy curve for different block sizes L. The accuracy peaks when the block size L is 8, so L is set to 8.
Figure 5

Recognition accuracy curve with different block sizes.

To determine the value of M, the size of the visual word set, we observed the relation between M and recognition accuracy, displayed in Figure 6. As Figure 6 reveals, the recognition accuracy rises at first as M increases, and once M is 60 or larger it stabilizes at 0.922. As a result, M is set to 60.
Figure 6

Relation curve between M and accuracy.

To examine the accuracy of the proposed pain recognition approach, we compared our method with three state-of-the-art approaches for pain recognition on the same data. The first method is "AAM + SVM" [4], which uses active appearance models (AAM) to extract face features and an SVM to classify pain. The second is "Eigenimage" [11], which uses Eigenfaces for pain recognition. The third is "SLPP + MKSVM" [12], which uses SLPP to extract pain expression features and multiple kernel support vector machines (MKSVM) for recognition. 200 different expression images were used for this experiment; some images contain the same person in different moods. The recognition results are presented in the confusion matrices of Table 1, where each cell is an average result, panels (a)-(d) give the results of our method, "AAM + SVM," "Eigenimage," and "SLPP + MKSVM," respectively, and A, B, C, and D indicate no pain, slight pain, moderate pain, and severe pain, respectively. As Table 1 shows, our method improves the recognition accuracy in all categories. It achieves a 92.2% average recognition rate, whereas "AAM + SVM" obtains 81.2%, "Eigenimage" 82.5%, and "SLPP + MKSVM" 86.5%, as shown in Table 2. The reason is that we improve accuracy in both stages, pain feature extraction and pain expression recognition. In the feature extraction stage, we use motion features, which are reliable on noisy image sequences and describe pain effectively, while the other methods use static features that cannot effectively describe the expression of pain. In the recognition stage, we use the bag-of-words framework and the pLSA model to classify expression images; in addition, we make use of the class label information in the training images when learning the pLSA model, which improves the overall recognition accuracy significantly.
Table 2

Comparison of different reported results on pain dataset.

Method                  Accuracy (%)
Our method              92.20
"AAM + SVM" [4]         81.20
"Eigenimage" [11]       82.50
"SLPP + MKSVM" [12]     86.50

5. Conclusion

Pain recognition can provide significant advantages in patient care and cost reduction. In this paper, we present a novel method that recognizes pain expression and gives the pain level at the same time. The main contributions can be summarized as follows. Visual words are used to represent pain expression: an optical flow model extracts facial velocity features, which are then converted into visual words using the "bag-of-words" model. A pLSA topic model is used for pain expression recognition, in which the "latent topics" directly correspond to the pain expression categories. In addition, to improve recognition accuracy, class label information is used when learning the pLSA model. Experiments were performed on a pain expression dataset built by ourselves to evaluate the proposed method, and the results show that it performs better than previous ones.

Table 1

Confusion matrices (A = no pain, B = slight pain, C = moderate pain, D = severe pain): (a) our method, (b) "AAM + SVM," (c) "Eigenimage," (d) "SLPP + MKSVM."

(a)

     A     B     C     D
A  0.95  0.04  0.01  0.00
B  0.04  0.91  0.05  0.00
C  0.01  0.03  0.91  0.05
D  0.00  0.02  0.05  0.93

(b)

     A     B     C     D
A  0.84  0.10  0.03  0.02
B  0.11  0.78  0.08  0.03
C  0.03  0.09  0.79  0.08
D  0.01  0.05  0.12  0.83

(c)

     A     B     C     D
A  0.85  0.11  0.03  0.01
B  0.10  0.80  0.08  0.02
C  0.02  0.08  0.79  0.08
D  0.01  0.04  0.12  0.84

(d)

     A     B     C     D
A  0.88  0.09  0.02  0.01
B  0.09  0.82  0.07  0.02
C  0.02  0.08  0.83  0.07
D  0.00  0.02  0.08  0.90
References

1.  Automatically Detecting Pain Using Facial Actions.

Authors:  Patrick Lucey; Jeffrey Cohn; Simon Lucey; Iain Matthews; Sridha Sridharan; Kenneth M Prkachin
Journal:  Int Conf Affect Comput Intell Interact Workshops       Date:  2009-12-08

2.  Automatically detecting pain in video through facial action units.

Authors:  Patrick Lucey; Jeffrey F Cohn; Iain Matthews; Simon Lucey; Sridha Sridharan; Jessica Howlett; Kenneth M Prkachin
Journal:  IEEE Trans Syst Man Cybern B Cybern       Date:  2010-11-22

3.  Pain in children: comparison of assessment scales.

Authors:  D L Wong; C M Baker
Journal:  Pediatr Nurs       Date:  1988 Jan-Feb

4.  Encoding and decoding of pain expressions: a judgement study.

Authors:  Kenneth M Prkachin; Sandra Berzins; Susan R Mercer
Journal:  Pain       Date:  1994-08       Impact factor: 6.961

5.  Pain expression in patients with shoulder pathology: validity, properties and relationship to sickness impact.

Authors:  Kenneth M Prkachin; Susan R Mercer
Journal:  Pain       Date:  1989-12       Impact factor: 6.961

6.  The Painful Face - Pain Expression Recognition Using Active Appearance Models.

Authors:  Ahmed Bilal Ashraf; Simon Lucey; Jeffrey F Cohn; Tsuhan Chen; Zara Ambadar; Kenneth M Prkachin; Patricia E Solomon
Journal:  Image Vis Comput       Date:  2009-10       Impact factor: 2.818

7.  The structure, reliability and validity of pain expression: evidence from patients with shoulder pain.

Authors:  Kenneth M Prkachin; Patricia E Solomon
Journal:  Pain       Date:  2008-05-23       Impact factor: 6.961

8.  The consistency of facial expressions of pain: a comparison across modalities.

Authors:  Kenneth M Prkachin
Journal:  Pain       Date:  1992-12       Impact factor: 6.961

