
Edge Detection-Based Feature Extraction for the Systems of Activity Recognition.

Muhammad Hameed Siddiqi, Ibrahim Alrashdi.

Abstract

Human activity recognition (HAR) is a fascinating and significantly challenging task. Generally, the accuracy of a HAR system relies on extracting the best features from the input frames. Activity frames often suffer from hostile noise conditions that most existing edge operators cannot handle. In this paper, we design an adaptive, edge detection-based feature extraction method for HAR systems. The proposed method calculates the direction of the edges and applies non-maximum suppression. Its benefits are its simplicity, which rests on modest procedures, and its extensibility to other types of features. It is usually practical to extract additional low-level information in the form of features when determining shapes; to obtain the appropriate information, a more sophisticated shape detection procedure is either employed or discarded. Essentially, the method maximizes the product of the signal-to-noise ratio (SNR), the peak separation, and the localization. When some edges in the processed frames appear as step functions, the proposed approach can give better performance than other operators. The appropriate information is extracted to form a feature vector, which is then fed to the classifier for activity recognition. We assess the performance of the proposed edge-based feature extraction method on a depth dataset containing thirteen different kinds of actions in a comprehensive experimental setup.
Copyright © 2022 Muhammad Hameed Siddiqi and Ibrahim Alrashdi.


Year:  2022        PMID: 35140779      PMCID: PMC8820868          DOI: 10.1155/2022/8222388

Source DB:  PubMed          Journal:  Comput Intell Neurosci


1. Introduction

The extensive applications of human activity recognition (HAR) in real-time surveillance, athletics, human-to-human communication, and healthcare have had a significant effect in raising the quality of human life. In telemedicine and healthcare, remote analysis of patients' activities and proper treatment could drive innovation in these domains. Such development supports physicians in their decision-making procedures by remotely watching the activities of patients, specifically stroke patients [1]; that work describes a case study for monitoring critical situations of stroke patients in telemedicine. The condition of a stroke patient can be characterized through activity recognition methods, which can help experts provide daily recommendations to the patient. Activities like jogging and walking can be recognized for stroke patients, and recommendations can then be provided remotely for better treatment. Similarly, a psychoanalyst may use telecare technology for treating a patient with an anxiety illness by remotely watching the exercises [2]. Methods have also been developed to watch the heart condition of patients through telecare [3], which is a major application of action analysis. These innovations enable experts and doctors to professionally watch and manage patients' diseases. Generally, a HAR system recognizes the actions performed by a human from data collected from various sources. In HAR, 2D vision-based activities are affected by illumination, occlusion, and shadow in real-time domains [4]. Wearable sensor-based human activity recognition offers significant accuracy and real-time performance [5].
However, collecting human actions through accelerometers and gyroscopes reduces the comfort of the human body and removes the spontaneity of human-computer interaction [6]. Currently, telemedicine and healthcare employ such sensing and video technologies, which raises privacy issues: it can lead to situations where people do not realize that their private data has been propagated and consequently become exposed to risk [7]. A Kinect depth camera captures only depth data and does not expose the identity or other sensitive information of the human in healthcare domains. This characteristic makes the depth camera a better choice than an RGB camera, and thus we select a depth camera rather than an RGB camera for the proposed methodology. The recognition of human activities can be divided into two categories: frame-based recognition and sequence-based recognition. In frame-based recognition, only the current frame is used, with or without a reference frame, to identify human activities from the incoming video frames, while in sequence-based recognition, the regular motion of feature points is considered between the current frame and the initial frame. Since frame-based recognition may not be able to accurately recognize human activities, we focus on sequence-based recognition in this work. Generally, HAR systems have four stages: segmentation, feature extraction, feature selection, and recognition. A large amount of work has been done on segmentation and recognition; however, very few approaches have been proposed for feature extraction and selection. State-of-the-art methods for HAR systems are discussed in [8-10], where the authors proposed a new feature extraction method based on the Haar wavelet transform to create the feature vector. They then used k-NN for recognizing the activities and reported the best accuracy.
However, the Haar wavelet transform has a technical limitation: it is not continuous and hence not differentiable [11]. Moreover, if the data is large, the inference step of k-NN can be slow, which is one major limitation of k-NN. Similarly, activity recognition frameworks based on temporal matching were developed by [12-14] to build temporal features, which describe the difference of motion sequences across time periods. However, most of these systems are too complex and inefficient for a comprehensive dataset, which lowers their classification rates. Hence, in this paper, we design an adaptive feature extraction method for HAR systems. The proposed method calculates the direction of the edges and applies non-maximum suppression. Its benefits are its simplicity, which rests on modest procedures, and its extensibility to other types of features. It is usually practical to extract additional low-level information in the form of features when determining shapes; to obtain the appropriate information, a more sophisticated shape detection procedure is either employed or discarded. As illustrated, the activity frames are very noisy and need preprocessing before analysis; we therefore use histogram equalization for this step. Moreover, the frames contain fine structure that various edge operators may pick up when extracting edges without accounting for noise. The feature vectors for the proposed methodology are generated using hysteresis thresholding, where the thresholds were selected manually for the best performance. Basic operators respond strongly to noise, and it is hard to choose a single threshold that reveals the main portion of a boundary.
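As a rough illustration, hysteresis thresholding of an edge-magnitude map can be sketched as follows (a minimal numpy sketch, not the paper's implementation; the threshold values and the wrap-around border handling of np.roll are simplifying assumptions):

```python
import numpy as np

def hysteresis_threshold(mag, low, high):
    """Keep weak edge pixels (low <= mag < high) only if they are
    8-connected to a strong pixel (mag >= high)."""
    strong = mag >= high
    weak = (mag >= low) & ~strong
    keep = strong.copy()
    changed = True
    while changed:
        changed = False
        # Dilate the kept set into its 8-neighbourhood (np.roll wraps
        # around the borders; a real implementation would pad instead).
        grown = keep.copy()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                grown |= np.roll(np.roll(keep, di, axis=0), dj, axis=1)
        new_keep = keep | (grown & weak)
        if (new_keep != keep).any():
            keep = new_keep
            changed = True
    return keep

mag = np.array([[0, 0, 0],
                [5, 3, 0],
                [0, 0, 0]], float)
edges = hysteresis_threshold(mag, low=2, high=4)
# The 3 survives because it touches the strong 5; isolated zeros do not.
```

The iterative growth mirrors the usual "follow weak edges from strong seeds" formulation; the two thresholds correspond to the manually selected thresholds mentioned above.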
Essentially, this method maximizes the product of the signal-to-noise ratio (SNR), the peak separation, and the localization. When some edges in the processed frames appear as step functions, the proposed approach can give better performance than other operators. The appropriate information is extracted to form a feature vector, which is then fed to the classifier for activity recognition. We assess the performance of the proposed edge-based feature extraction method on a depth dataset containing thirteen different kinds of actions in a comprehensive experimental setup. The remainder of the paper is structured as follows. The literature is reviewed, along with its shortcomings, in Section 2. The proposed method is explained in Section 3, and the results and discussion are presented in Section 4. Finally, the conclusion is given in Section 5.

2. Related Work

Activity recognition involves understanding human movements from a succession of frame observations in their environmental context. Newly available commodity depth cameras open up innovative possibilities for dealing with this concern but pose some unique challenges as well. A state-of-the-art feature extraction method based on spatial and temporal information was designed by [15] for a condensed system. They then used k-NN for classification and reported high accuracy. However, this system is complex and inefficient for a comprehensive dataset, which lowered its classification rates, a common limitation of spatial- and temporal-based features. Moreover, if the data is large, the inference step of k-NN can be slow, which is one major limitation of k-NN [16]. Similarly, the authors of [17] developed a psychology-inspired two-stream gated recurrent unit approach for activity classification based on human body joints. Their approach achieved an 89.97% classification accuracy; however, it may not be applicable in healthcare domains because of this comparatively low accuracy. A depth map-based HAR system was designed by [18, 19] that used a convolutional autoencoder neural network to learn fixed features and summarize the content of single depth maps. However, a convolutional neural network is not appropriate for naturalistic domains; moreover, it is computationally expensive and needs special hardware for training [20]. Similarly, an ensemble method based on a graph network over the human skeleton was developed by [21, 22] for HAR systems. The system used information fusion, deep learning, and temporal information to extract the best features.
However, with data fusion it is hard to organize applications and data at a precise level of the system and to reuse them [23], and it is also unsuited to a human viewpoint [24]. Moreover, temporal-based information is inefficient for a comprehensive dataset, which lowered that system's classification rates. On the other hand, an integrated descriptor-based method was designed by [25] for a HAR system. For feature extraction, this system applied multilevel fusion to depth frames to devise a comprehensive learning system. However, multilevel fusion-based feature extraction has some common issues, such as feature mismatch, sensitivity to camera sensors, failure, and susceptibility to unpredictable noise or interference, because reduced discrimination sensitivity greatly decreases the significance of multilevel fusion-based methods [26]. Furthermore, finding optimal features and feature extraction methods requires extensive domain knowledge, which is time-consuming [27]. In the approach of [28], the authors computed improved gesture and motion history frames, which helped in the extraction of gradient features. However, one major limitation of gradient-based feature extraction is the existence of numerous local optima, yielding solutions where the global optimum cannot easily be guaranteed [29]. An efficient approach for HAR using three-dimensional skeleton information was proposed by [30], in which a straightforward deep learning model was employed for activity classification. However, one major problem with skeleton-based feature extraction is that sparse skeleton data alone may not be enough to fully classify human activities [31]. Similarly, an improved bag-of-visual-words activity classification method was designed by [32], where the authors employed a support vector machine for classification.
However, in this model, a huge number of key points are located during the feature detection process, which affects the model's performance and also makes it computationally expensive [33]. Also, the support vector machine is a vector-based classifier that cannot classify sequence-based activities. An adaptive state-of-the-art approach based on three-dimensional autocorrelation features was presented by [34]. They also described the sequence of depth motion maps in order to capture temporal movement data that can differentiate similar activities. However, this method is affected by affine transformations and needs further development to resolve more complex activity recognition [35]. Therefore, the proposed method calculates the direction of the edges and applies non-maximum suppression. Its benefits are its simplicity, which rests on modest procedures, and its extensibility to other types of features. It is usually practical to extract additional low-level information in the form of features when determining shapes; to obtain the appropriate information, a more sophisticated shape detection procedure is either employed or discarded. Essentially, this method maximizes the product of the signal-to-noise ratio (SNR), the peak separation, and the localization. When some edges in the processed frames appear as step functions, the proposed approach can give better performance than other operators. The appropriate information is extracted to form a feature vector, which is then fed to the classifier for activity recognition.

3. Proposed Edge Detection Algorithm

The analysis of the Taylor series shows that differencing head-to-head (adjacent) pixels delivers an estimate of the first-order derivative at a pixel. If the pixels are separated by ∇i, then by Taylor extension g(i + ∇i) is

g(i + ∇i) = g(i) + ∇i g′(i) + (∇i²/2!) g″(i) + Q(∇i³).  (1)

By reordering, the derivative g′(i) is given as

g′(i) = [g(i + ∇i) − g(i)]/∇i − Q(∇i).  (2)

This indicates that the variation among head-to-head pixels estimates the first-order derivative with an error Q(∇i), which depends on the size of ∇i and the intricacy of the edge; if ∇i is large, the error may be large. In practice, the rapid sampling of image points and the reduced high-frequency content make this assumption suitable. This is equivalent to calculating the first-order difference at two head-to-head pixels, that is, a horizontal difference εx:

εx(i, j) = g(i + 1, j) − g(i, j),  (3)

which is equal to convolving the image with the template [1 −1] to perceive the edges εx:

εx = [1 −1] * g.  (4)

Again analyzing the Taylor series, we expand g(i − ∇i) as

g(i − ∇i) = g(i) − ∇i g′(i) + (∇i²/2!) g″(i) − Q(∇i³).  (5)

Subtracting equation (5) from equation (1), we achieve the following:

g′(i) = [g(i + ∇i) − g(i − ∇i)]/(2∇i) + Q(∇i²).  (6)

Equation (6) shows that the first-order derivative can also be estimated as the variation among the pixels separated by a single point, with error Q(∇i²). For ∇i < 1, this is obviously smaller than the head-to-head error in equation (2), so the centred difference also provides noise or error reduction. Here M is the magnitude along the edge direction vector. During implementation, the template that provides the highest value is stored as the edge value at that pixel; the edge ε at a point ρ is then the larger of the two values obtained by convolving the two image templates at ρ. Another option, instead of taking the highest value, is simply to sum the results of the two templates to combine the edges along the horizontal and vertical.
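The accuracy gain of the centred difference over the adjacent-pixel difference can be checked numerically; the following is an illustration (not from the paper) using a smooth test signal:

```python
import numpy as np

# Compare the error of the forward (adjacent-pixel) difference, Q(di),
# with the centred difference over two points, Q(di^2), for g(i) = sin(i).
g, g_prime = np.sin, np.cos
i, di = 1.0, 0.01

forward = (g(i + di) - g(i)) / di              # adjacent-pixel difference
central = (g(i + di) - g(i - di)) / (2 * di)   # pixels separated by one point

err_fwd = abs(forward - g_prime(i))
err_ctr = abs(central - g_prime(i))
assert err_ctr < err_fwd  # centred difference is markedly more accurate
```

With di = 0.01, the forward error is on the order of 10⁻³ while the centred error is on the order of 10⁻⁵, matching the Q(∇i) versus Q(∇i²) behaviour derived above.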
There are various varieties of edge templates; most commonly, two templates are taken as a facilitating mechanism to build an edge vector along the horizontal and the vertical axis, respectively. Edge detection is a mechanism for differentiation: it perceives variations, so it must respond to step-like variations in image intensity while remaining robust to noise. Therefore, it is practical to integrate averaging into the process of edge detection. Hence, we may spread the horizontal (Mx) and vertical (My) templates along three rows and three columns, respectively, which provides two types of results, the brightness difference on each axis and the magnitude of the edge M, with θ the vector angle:

M = sqrt(Mx² + My²),  θ = tan⁻¹(My / Mx).

Mx and My may be used to find the suitable quadrant for the direction of the edge. In the proposed methodology, we also utilize the Sobel operator, which employs two windows to find the edges in the form of vectors and is one of the best-known edge detection operators. Moreover, the proposed method along with the Sobel operator showed better performance than other concurrent operators. The proposed operator also considers optimal averaging and differencing procedures. Gaussian averaging has previously been shown to provide the optimal averaging, and the binomial theorem provides a series of integer coefficients that approximate the normal distribution. As described before, we are utilizing two windows that provide two sets of coefficients in the form of triangles, as shown in Figure 1.
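The edge magnitude and direction described above can be sketched with the standard 3 × 3 Sobel templates; this is a minimal Python illustration (the paper's experiments were run in MATLAB, and border pixels are simply left at zero here):

```python
import numpy as np

def sobel(frame):
    """Sobel edge magnitude M and direction theta for a 2-D frame."""
    # Horizontal template Mx: smoothing [1,2,1] down rows,
    # differencing [-1,0,1] across columns; My is its transpose.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = frame.shape
    mx = np.zeros((h, w))
    my = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = frame[i - 1:i + 2, j - 1:j + 2]
            mx[i, j] = (patch * kx).sum()
            my[i, j] = (patch * ky).sum()
    mag = np.hypot(mx, my)        # M = sqrt(Mx^2 + My^2)
    theta = np.arctan2(my, mx)    # edge direction angle
    return mag, theta

# A vertical step edge responds only in Mx, so theta is 0 along it.
frame = np.zeros((5, 5))
frame[:, 3:] = 1.0
mag, theta = sobel(frame)
```

The explicit loop keeps the template application transparent; in practice a vectorized convolution would be used instead.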
Figure 1

(a) The additive Pascal triangle and (b) the subtractive Pascal triangle for the set of coefficients.

Figure 1(a) provides the integer coefficients of an optimal discrete smoothing operator, which is basically a Gaussian filter with integer coefficients. In Figure 1, each row gives the smoothing coefficients for a window of increasing size; the smoothing coefficients inside the Sobel operator correspond to the 3 × 3 size. Moreover, Figure 1(b) describes the Pascal triangle coefficients for differencing, which can be obtained by subtracting the templates derived from the corresponding head-to-head expansions of the smaller mask. Therefore, we need a filter that offers the Pascal triangle coefficient for window parameters of size p and location γ. This filter is Pascal(γ, p), given by the binomial coefficient

Pascal(γ, p) = p! / [γ! (p − γ)!],  0 ≤ γ ≤ p, and 0 otherwise.

There are four possibilities for the edge direction measurement delivered by the Sobel operator, as described in Figure 2.
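The Pascal-triangle construction of the two coefficient sets can be sketched as follows (an illustration; the function names are ours, and the additive triangle row is taken to be the binomial coefficients):

```python
from math import comb

def pascal_smooth(p):
    """Row of the additive Pascal triangle: binomial smoothing
    coefficients for a window of size p (approximate Gaussian)."""
    return [comb(p - 1, k) for k in range(p)]

def pascal_diff(p):
    """Row of the subtractive Pascal triangle: differencing coefficients
    of size p, formed by subtracting two shifted copies of the
    next-smaller smoothing row."""
    s = pascal_smooth(p - 1)
    return [a - b for a, b in zip(s + [0], [0] + s)]

# The 3x3 Sobel horizontal template is the outer product of the
# size-3 smoothing row [1, 2, 1] and differencing row [1, 0, -1]:
smooth3 = pascal_smooth(3)
diff3 = pascal_diff(3)
sobel_x = [[d * s for d in diff3] for s in smooth3]
```

Larger rows of the two triangles give the corresponding larger Sobel-like windows in the same way, which is the extension possibility mentioned above.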
Figure 2

Different arrangements of edge direction: (a) (Mx, My), (b) (−Mx, My), (c) (Mx, −My), and (d) (−Mx, −My), respectively.

In Figure 2, reversing the template of My does not introduce discontinuity at the corners; that is, the edge magnitude given by the Sobel operator applied to the square is not shown but is comparable to that obtained by applying other operators. If we change the templates of the Sobel filter, the edge direction measurement is rearranged and may match the normal of the edge itself. When the edges are to be determined, this rearrangement can help in constructing an algorithm to find a target: if an algorithm is locating shapes, it must use the direction of the edges in the proper arrangement. This procedure can improve the performance of the algorithm, but it must precisely map the corresponding image data. Once all the edges and their corresponding directions are detected, the entire information is stored in the form of a feature vector, which is then used in a hidden Markov model for activity recognition.
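Non-maximum suppression along the quantized edge directions can be sketched as follows (a standard formulation with numpy, assumed to correspond to the suppression step used in the proposed method; the four sectors mirror the four direction arrangements of Figure 2):

```python
import numpy as np

def non_max_suppress(mag, theta):
    """Thin edges: keep a pixel only if its magnitude is a local maximum
    along the gradient direction, quantized to four sectors."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    angle = (np.rad2deg(theta) + 180) % 180  # fold into [0, 180)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:        # horizontal gradient
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                    # diagonal
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:                   # vertical gradient
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                             # anti-diagonal
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                out[i, j] = mag[i, j]
    return out
```

Applied to a blurred vertical ridge, only the central column of maxima survives; the surviving magnitudes and directions are what get packed into the feature vector.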

4. Algorithm Evaluation

We evaluated the proposed methodology according to the following protocol.

4.1. Arrangement for Experiments

We performed the following experiments to judge the performance of the proposed methodology. The first experiment reports the accuracy of the proposed methodology on the defined dataset. An (n − 1)-fold cross-validation scheme was used for all experiments, meaning that the data from each subject is used at least once for training and testing in order to maintain robustness. In the second experiment, comprehensive experiments are presented without the proposed methodology: we perform multiple experiments with recent feature extraction methods but do not use the proposed feature extraction method. In the final experiment, state-of-the-art methods are compared with the performance of the proposed methodology.
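The per-subject cross-validation scheme can be sketched as follows (a minimal numpy illustration; `train_and_predict` is a placeholder we introduce standing in for any classifier, including the hidden Markov model used in this work):

```python
import numpy as np

def loso_accuracy(features, labels, subjects, train_and_predict):
    """Leave-one-subject-out cross-validation: each subject's data is
    held out once for testing while all other subjects train the model.

    train_and_predict(Xtr, ytr, Xte) must return predicted labels."""
    accs = []
    for s in np.unique(subjects):
        test = subjects == s
        pred = train_and_predict(features[~test], labels[~test],
                                 features[test])
        accs.append(np.mean(pred == labels[test]))
    # Report the mean of per-subject accuracies.
    return float(np.mean(accs))
```

Because every fold excludes one whole subject, the reported accuracy reflects generalization to unseen people rather than unseen frames of the same person.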

4.2. Depth Images Dataset

This dataset consists of 670 video sequences collected with a Kinect depth camera. A total of 70 subjects (male and female university students) performed thirteen activities: bending, jacking, place-jumping, running, side movement, skipping, walking, one-hand waving, two-hand waving, jumping, clapping, boxing, and sitting up and down. To make the dataset more realistic, we also recorded some videos of stroke patients in a real healthcare setting. The activity frames had various sizes; therefore, we resized all frames to 100 × 100 in order to maintain normalization. The dataset was recorded over a 6-month period (from February 2017 to July 2017).
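The 100 × 100 normalization step can be sketched as a simple nearest-neighbor resampling (an illustration only; the paper does not specify which resizing method was used):

```python
import numpy as np

def normalize_frame(frame, size=100):
    """Resize a 2-D depth frame to size x size by nearest-neighbor
    sampling of row and column indices."""
    h, w = frame.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return frame[np.ix_(rows, cols)]
```

Any interpolating resize (bilinear, area averaging) would serve equally for normalization; nearest-neighbor is just the shortest to write down.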

4.3. Experimental Results and Discussions

In the first experiment, we present the accuracy of the proposed methodology on the depth dataset; the experiments were performed offline in MATLAB in a lab setting. The results are presented in Table 1.
Table 1

Accuracy of classification for the proposed methodology against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          98   0   0   0   0   2   0   0   0   0   0   0   0
JC           0  97   1   0   0   2   0   0   0   0   0   0   0
PLJ          0   0  99   0   0   0   0   1   0   0   0   0   0
RNN          0   2   0  96   0   0   1   0   0   1   0   0   0
SIM          0   0   0   0 100   0   0   0   0   0   0   0   0
SKP          0   0   1   0   0  95   1   0   1   0   1   0   1
WLK          0   0   0   0   0   0 100   0   0   0   0   0   0
OW1          0   0   0   0   1   0   0  99   0   0   0   0   0
OW2          1   0   0   0   0   1   0   0  97   0   0   1   0
JP           0   0   1   0   0   0   0   1   0  98   0   0   0
CLP          0   0   0   0   0   1   0   0   0   0  99   0   0
BXG          0   2   0   1   0   0   1   0   0   0   0  96   0
SUD          0   0   0   0   1   0   0   0   0   0   0   0  99

Average 97.9%

BN for bending, JC for jacking, PLJ for place jumping, RNN for running, SIM for side movement, SKP for skipping, WLK for walking, OW1 for one-hand waving, OW2 for two-hand waving, JP for jumping, CLP for clapping, BXG for boxing, and SUD for sitting up and down
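The reported average can be reproduced directly from the diagonal of the confusion matrix; a minimal sketch (numpy assumed; the classifier itself is not reproduced here):

```python
import numpy as np

def average_accuracy(confusion):
    """Mean per-class accuracy: average of the diagonal of a confusion
    matrix whose rows are percentages summing to 100."""
    confusion = np.asarray(confusion, float)
    return float(np.mean(np.diag(confusion)))

# Diagonal of Table 1; the mean reproduces the reported 97.9%.
table1_diag = [98, 97, 99, 96, 100, 95, 100, 99, 97, 98, 99, 96, 99]
avg = average_accuracy(np.diag(table1_diag))
```

The same computation applies to Tables 2-11, whose averages range from 79.3% to 89.2%.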

The proposed methodology showed significant accuracy, as presented in Table 1. This significant performance is due to the calculation of the edge directions with non-maximum suppression. Also, during the processing of the frames, some edges appear as step functions, where the proposed approach can give better performance than other operators. In the second experiment, accuracies are reported for various recent machine learning feature extraction methods; in these experiments, we do not use the proposed methodology. For this group of experiments, we employed different well-known feature extraction techniques: autoencoder, histogram of oriented gradients, contrast features, ellipse features, Fourier features, Gabor features, Haralick texture features, geometric features, local binary pattern features, and basic intensity features. The results for this group of experiments on the depth dataset are given in Tables 2-11, respectively.
Table 2

Accuracy of classification for the proposed activity recognition system with autoencoder (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          80   0   2   3   1   2   0   5   1   0   2   4   0
JC           1  79   1   3   1   2   2   4   2   1   0   3   1
PLJ          0   2  84   1   0   3   0   2   0   4   0   2   2
RNN          4   2   1  77   3   1   1   6   2   1   0   1   1
SIM          2   1   2   4  79   1   2   1   3   0   2   1   2
SKP          2   2   1   2   1  81   1   3   1   0   3   2   1
WLK          0   2   3   0   1   2  85   2   1   1   2   0   1
OW1          3   1   2   2   4   2   2  75   2   2   1   3   1
OW2          1   0   2   1   2   2   1   3  80   1   2   1   4
JP           2   1   1   2   3   2   1   1   2  82   2   1   1
CLP          2   3   0   1   1   1   2   0   4   2  83   1   0
BXG          2   2   1   1   2   4   1   2   1   1   2  79   2
SUD          0   1   4   2   1   0   2   1   0   2   1   0  86

Average 81.5%


Table 3

Accuracy of classification for the proposed activity recognition system with histogram of oriented gradients (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          83   0   2   1   2   2   0   3   1   0   2   1   3
JC           0  87   1   2   0   2   1   2   0   2   0   2   1
PLJ          2   2  80   2   1   2   2   1   2   1   2   2   1
RNN          3   2   2  79   3   1   4   0   2   1   2   1   0
SIM          1   3   1   2  82   2   2   2   1   2   1   0   1
SKP          0   2   1   3   2  85   1   2   1   0   1   1   1
WLK          2   2   0   1   4   2  78   1   4   1   2   2   1
OW1          0   5   2   2   1   3   3  76   2   3   1   2   0
OW2          3   2   1   2   2   1   5   2  72   1   5   2   2
JP           2   2   4   1   2   2   1   1   3  76   1   1   4
CLP          1   0   1   2   1   1   2   4   1   2  80   2   3
BXG          0   1   2   1   1   3   2   1   2   2   1  82   2
SUD          1   2   1   0   1   0   3   1   2   1   3   2  83

Average 80.2%


Table 4

Accuracy of classification for the proposed activity recognition system with contrast features (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          88   0   2   2   1   2   1   1   0   2   0   1   0
JC           0  89   1   1   2   2   0   2   1   0   1   0   1
PLJ          2   0  90   2   1   0   1   1   0   2   0   1   0
RNN          1   2   1  85   0   2   1   2   2   1   1   0   2
SIM          0   1   2   2  87   1   2   0   2   1   0   2   0
SKP          2   1   1   1   2  83   1   2   1   2   1   2   1
WLK          1   2   2   2   1   2  80   1   2   1   3   1   2
OW1          4   1   2   1   1   2   2  78   1   2   1   2   3
OW2          1   2   5   2   3   1   2   2  77   1   2   1   1
JP           2   2   1   0   1   2   2   1   2  82   2   2   1
CLP          1   1   2   2   1   1   3   2   1   0  81   4   1
BXG          2   2   3   1   2   1   1   2   1   1   2  80   2
SUD          0   1   1   2   1   1   2   1   2   1   2   0  86

Average 83.5%


Table 5

Accuracy of classification for the proposed activity recognition system with ellipse features (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          90   0   2   1   0   2   1   0   1   0   1   0   2
JC           0  91   1   0   0   2   1   0   2   2   0   1   0
PLJ          2   0  88   1   2   0   0   1   2   0   1   2   1
RNN          0   2   2  86   0   1   1   2   0   1   3   0   2
SIM          2   1   1   2  80   2   1   1   2   2   1   2   3
SKP          1   2   1   0   2  83   1   4   1   0   2   2   1
WLK          0   1   0   2   0   2  91   0   1   1   2   0   0
OW1          2   0   2   1   1   2   1  84   2   2   0   1   2
OW2          1   2   1   2   1   1   0   2  86   0   2   1   1
JP           2   1   1   0   2   0   1   1   0  89   1   2   0
CLP          1   0   1   0   2   1   0   1   0   1  92   0   1
BXG          1   2   1   1   0   2   1   0   2   0   1  89   0
SUD          0   1   0   2   1   0   2   0   1   1   0   2  90

Average 87.6%


Table 6

Accuracy of classification for the proposed activity recognition system with Fourier features (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          93   0   2   1   0   2   0   1   0   1   0   0   0
JC           1  89   1   0   1   2   1   0   2   0   2   0   1
PLJ          0   2  91   1   0   1   0   1   1   0   1   2   0
RNN          0   0   0  94   1   0   1   0   0   1   0   1   2
SIM          2   1   1   0  87   2   1   1   2   0   2   0   1
SKP          4   0   1   2   2  83   1   0   1   2   1   2   1
WLK          0   2   0   1   0   2  90   1   0   2   0   2   0
OW1          1   0   2   1   2   0   2  87   1   0   2   0   2
OW2          1   2   0   2   1   1   0   2  88   1   1   1   0
JP           0   1   1   0   2   0   2   1   0  91   0   2   0
CLP          1   0   2   1   0   1   0   1   1   0  92   0   1
BXG          1   2   0   1   2   0   1   0   0   2   0  89   2
SUD          2   1   2   0   1   2   2   1   2   1   1   2  83

Average 89.0%


Table 7

Accuracy of classification for the proposed activity recognition system with Gabor features (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          91   2   0   1   0   2   0   0   1   0   1   2   0
JC           0  90   1   2   0   2   0   2   0   1   0   1   1
PLJ          2   0  86   1   2   0   2   1   1   0   2   1   2
RNN          1   2   1  84   1   2   1   0   2   1   1   2   2
SIM          0   1   0   1  92   0   2   1   0   2   1   0   0
SKP          1   0   1   1   0  93   1   0   1   0   1   0   1
WLK          2   1   0   2   1   0  89   2   0   1   0   2   0
OW1          2   0   1   0   1   0   2  88   0   2   0   2   2
OW2          1   2   0   1   0   1   0   0  94   0   0   1   0
JP           0   1   1   0   2   0   2   1   1  87   2   1   2
CLP          2   0   2   1   0   1   1   2   2   1  85   2   1
BXG          1   2   0   1   2   0   1   0   1   0   1  91   0
SUD          2   0   1   0   1   1   0   2   1   1   0   1  90

Average 89.2%


Table 8

Accuracy of classification for the proposed activity recognition system with Haralick texture features (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          88   0   2   1   1   2   0   1   2   1   0   2   0
JC           1  90   1   2   0   2   1   0   1   0   1   0   1
PLJ          0   2  85   0   2   0   2   1   2   1   1   2   2
RNN          2   2   1  79   1   2   1   2   1   1   4   2   2
SIM          1   3   2   1  77   4   2   1   3   2   1   2   1
SKP          2   1   1   2   2  81   1   2   1   2   1   3   1
WLK          1   2   2   1   0   1  83   1   2   1   2   1   2
OW1          2   1   2   2   1   2   1  76   2   2   1   2   6
OW2          1   2   1   1   2   1   0   2  86   1   2   0   1
JP           0   1   1   2   0   0   2   1   0  92   0   1   0
CLP          2   0   2   0   2   1   1   0   1   2  87   0   2
BXG          1   2   0   1   1   0   1   2   0   1   1  89   1
SUD          0   1   2   0   1   1   0   0   1   0   2   1  91

Average 84.9%


Table 9

Accuracy of classification for the proposed activity recognition system with geometric features (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          79   2   3   1   2   2   1   4   2   1   1   0   2
JC           2  77   1   2   5   2   2   1   3   2   2   1   0
PLJ          1   2  83   2   2   1   2   1   1   2   0   2   1
RNN          0   2   1  86   1   2   1   2   0   1   2   0   2
SIM          2   1   0   2  90   0   1   1   0   2   0   1   0
SKP          0   2   1   0   2  88   0   2   1   0   1   2   1
WLK          2   1   2   1   1   2  81   1   2   2   2   1   2
OW1          1   1   1   2   1   1   2  84   1   2   1   2   1
OW2          1   0   2   1   2   1   0   2  87   1   2   1   0
JP           2   1   1   2   1   2   2   1   2  78   2   2   4
CLP          2   5   2   1   2   1   3   2   1   2  75   3   1
BXG          1   1   0   1   2   2   1   0   1   1   1  89   0
SUD          0   2   1   0   1   0   0   1   0   2   0   2  91

Average 83.7%


Table 10

Accuracy of classification for the proposed activity recognition system with local binary pattern features (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          93   0   0   1   0   2   0   1   0   0   2   0   1
JC           0  91   1   0   1   2   0   0   2   1   0   2   0
PLJ          2   0  89   2   1   0   2   1   0   0   1   0   2
RNN          1   2   0  86   2   1   1   2   1   1   0   2   1
SIM          1   0   2   0  92   2   0   1   0   0   1   1   0
SKP          2   1   1   1   0  89   1   0   1   2   1   0   1
WLK          0   2   0   0   2   0  94   0   0   0   0   2   0
OW1          2   0   1   2   1   1   0  84   2   1   2   2   2
OW2          1   2   2   1   2   1   1   2  80   2   1   1   4
JP           1   1   1   2   2   2   1   1   0  82   2   3   2
CLP          0   2   1   1   0   1   2   2   2   0  88   1   0
BXG          1   2   0   1   1   0   1   0   0   2   0  91   1
SUD          2   0   2   1   1   2   0   2   2   0   2   0  86

Average 88.0%


Table 11

Accuracy of classification for the proposed activity recognition system with basic intensity features (without employing the proposed methodology) against depth dataset.

Activities  BN  JC PLJ RNN SIM SKP WLK OW1 OW2  JP CLP BXG SUD
BN          79   2   1   4   2   2   1   2   1   2   2   1   1
JC           1  81   1   2   1   2   2   1   2   1   2   2   2
PLJ          0   2  84   0   2   1   2   1   1   2   2   1   2
RNN          2   2   1  76   6   1   1   3   2   1   1   2   2
SIM          1   0   2   2  90   1   0   0   1   2   0   1   0
SKP          2   2   1   4   2  74   2   3   2   2   1   4   1
WLK          2   4   2   1   2   5  73   2   3   1   2   1   2
OW1          1   1   2   2   1   2   4  78   1   4   1   2   1
OW2          1   2   1   1   2   1   1   2  85   1   2   1   0
JP           2   1   2   2   1   2   0   1   2  82   1   2   2
CLP          2   1   2   2   2   1   1   1   2   2  80   2   2
BXG          1   2   4   1   2   2   1   2   1   2   4  77   1
SUD          2   6   1   2   1   1   2   4   2   1   2   4  72

Average 79.3%


As demonstrated in Tables 2-11, the proposed activity recognition system did not reach the best classification rates with these recent feature extraction techniques. On the other hand, the system achieved significant accuracy with the proposed feature extraction technique. This is because the proposed method calculates the direction of the edges with non-maximum suppression. Its benefits are its simplicity, which rests on modest procedures, and its extensibility to other types of features. It is usually practical to extract additional low-level information in the form of features when determining shapes; to obtain the appropriate information, a more sophisticated shape detection procedure is either employed or discarded. In the last experiment, the classification accuracy of the proposed methodology is compared with state-of-the-art HAR systems. Some of the existing systems were implemented for these experiments, for some we borrowed their simulations, and for some we took the results from the corresponding papers. All of the systems were run on the depth dataset with the exact settings presented in their respective works. The comparison results are given in Table 12.
Table 12

Classification accuracies of the proposed methodology compared with recent human activity recognition systems.

Recent systems          Accuracy (%)    Std. dev. (σ)
[36]                    89.2            ±3.8
[37]                    93.1            ±1.3
[38]                    85.9            ±4.5
[39]                    90.7            ±3.6
[40]                    81.6            ±2.9
[41]                    79.8            ±1.6
[42]                    88.5            ±4.8
Proposed methodology    97.9            ±2.1
As demonstrated in Table 12, the proposed methodology achieved the highest accuracy on the depth dataset compared with the other state-of-the-art systems. This is because, during frame processing, some edges appear as step functions, for which the proposed approach can outperform other operators. Moreover, the appropriate information is extracted to form a feature vector, which is then fed to the classifier to obtain the best classification accuracy.
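The edge-direction computation with non-maximum suppression described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Sobel gradients and a four-orientation quantization of the gradient direction, keeping only pixels that are local maxima of the gradient magnitude along that direction.

```python
import numpy as np

def edge_direction_nms(frame):
    """Compute gradient magnitude/direction and apply non-maximum
    suppression: a pixel survives only if its gradient magnitude is a
    local maximum along the (quantized) gradient direction."""
    # Sobel kernels (an assumption; the paper does not fix the operator).
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(frame.astype(float), 1, mode="edge")
    h, w = frame.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    mag = np.hypot(gx, gy)
    ang = (np.rad2deg(np.arctan2(gy, gx)) + 180.0) % 180.0  # direction in [0, 180)
    out = np.zeros_like(mag)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = ang[i, j]
            if a < 22.5 or a >= 157.5:   # horizontal gradient -> compare E/W
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:               # 45-degree gradient
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:              # vertical gradient -> compare N/S
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                        # 135-degree gradient
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                out[i, j] = mag[i, j]
    return out
```

For a vertical step edge, the routine keeps only the thin ridge of pixels straddling the intensity jump and zeroes the rest, which is the thinning behaviour the text relies on.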

5. Conclusion

Human activity recognition refers to the task of evaluating a person's physical activity through the use of sensing technology. In the telemedicine and healthcare domains, such technology is currently employed alongside video technologies, which raises privacy concerns: it can lead to situations where people do not realize that their personal data has been transmitted and consequently become exposed to risk. Therefore, in this paper, we have designed an adaptive feature extraction method for HAR systems that uses a Kinect depth camera in order to mitigate the privacy issue. The proposed method computes the direction of the edges under non-maximum suppression. Its advantages lie in its simplicity, which rests on modest procedures, and in the possibility of extending it to other types of features. It is normally practical to extract additional low-level information in the form of features when determining shapes; to obtain the appropriate information, a more sophisticated shape detection procedure is either employed or discarded. As illustrated, the activity frames are very noisy and require preprocessing before analysis; we therefore used histogram equalization for this step. Moreover, the frames contain a feature, referred to here as sludge, that various edge operators may detect when extracting edges, though without accounting for noise. The feature vectors for the proposed methodology were generated using hysteresis thresholding, with the thresholds selected manually for the best performance.
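The hysteresis thresholding mentioned above can be sketched as a generic two-threshold edge-linking routine. This is not the authors' code: `low` and `high` stand in for the manually selected thresholds, and 8-connectivity is an assumption.

```python
import numpy as np
from collections import deque

def hysteresis_threshold(mag, low, high):
    """Two-threshold edge linking: pixels with magnitude >= high are
    strong seeds; pixels >= low are kept only if connected
    (8-neighbourhood) to a strong seed, directly or transitively."""
    strong = mag >= high
    weak = mag >= low
    out = np.zeros(mag.shape, dtype=bool)
    out[strong] = True
    q = deque(zip(*np.nonzero(strong)))  # BFS frontier of strong pixels
    h, w = mag.shape
    while q:
        i, j = q.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and weak[ni, nj] and not out[ni, nj]:
                    out[ni, nj] = True
                    q.append((ni, nj))
    return out
```

Weak responses that are isolated from any strong seed are discarded, which is how the manually tuned threshold pair suppresses noise while preserving connected edge chains.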
In the proposed methodology, the elementary factors respond strongly to noise, and it is hard to choose a threshold that reveals a major portion of the sludge boundary. Essentially, the method maximizes the product of the signal-to-noise ratio (SNR) and localization, together with the highest isolation. During frame processing, some edges again appear as step functions, for which the proposed approach can outperform other operators. The appropriate information is extracted to form a feature vector, which is then fed to the classifier for activity recognition. We assessed the performance of the proposed edge-based feature extraction method on a depth dataset containing thirteen different kinds of actions in a comprehensive experimental setup. The proposed methodology achieved the highest classification accuracy compared with the latest existing HAR systems. In the future, we will try to improve the proposed methodology in the telemedicine and healthcare domains in order to sustain the same accuracy and help physicians make better recommendations. Our further goal is to deploy the proposed methodology on smartphones; for that purpose, we will try to design a lightweight classifier coupled with the proposed approach so that it can run on a smartphone in real healthcare settings.
References (7 in total)

1.  Human facial expression recognition using stepwise linear discriminant analysis and hidden conditional random fields.

Authors:  Muhammad Hameed Siddiqi; Rahman Ali; Adil Mehmood Khan
Journal:  IEEE Trans Image Process       Date:  2015-04       Impact factor: 10.856

2.  Physical Activity Recognition Using Posterior-Adapted Class-Based Fusion of Multiaccelerometer Data.

Authors:  Alok Kumar Chowdhury; Dian Tjondronegoro; Vinod Chandran; Stewart G Trost
Journal:  IEEE J Biomed Health Inform       Date:  2017-05-17       Impact factor: 5.772

3.  (Review) Heart failure patients monitored with telemedicine: patient satisfaction, a review of the literature.

Authors:  I H Kraai; M L A Luttik; R M de Jong; T Jaarsma; H L Hillege
Journal:  J Card Fail       Date:  2011-05-06       Impact factor: 5.712

4.  (Review) Stroke telemedicine.

Authors:  Bart M Demaerschalk; Madeline L Miley; Terri-Ellen J Kiernan; Bentley J Bobrow; Doren A Corday; Kay E Wellik; Maria I Aguilar; Timothy J Ingall; David W Dodick; Karina Brazdys; Tiffany C Koch; Michael P Ward; Phillip C Richemont
Journal:  Mayo Clin Proc       Date:  2009       Impact factor: 7.616

5.  Recognition of Human Activities Using Depth Maps and the Viewpoint Feature Histogram Descriptor.

Authors:  Kamil Sidor; Marian Wysocki
Journal:  Sensors (Basel)       Date:  2020-05-22       Impact factor: 3.576

6.  Exploring 3D Human Action Recognition Using STACOG on Multi-View Depth Motion Maps Sequences.

Authors:  Mohammad Farhad Bulbul; Sadiya Tabussum; Hazrat Ali; Wenli Zheng; Mi Young Lee; Amin Ullah
Journal:  Sensors (Basel)       Date:  2021-05-24       Impact factor: 3.576

