Arifa Sultana, Kaushik Deb, Pranab Kumar Dhar, Takeshi Koshiba.
Abstract
Human fall identification can play a significant role in sensor-based alarm systems, helping physical therapists not only to reduce post-fall effects but also to save lives. Elderly people often suffer from various diseases, and falls are a frequent occurrence for them. In this regard, this paper presents an architecture to classify fall events among other natural indoor human activities. A video frame generator is applied to extract frames from video clips. A two-dimensional convolutional neural network (2DCNN) model first extracts features from the video frames. A gated recurrent unit (GRU) network then captures the temporal dependency of human movement. A binary cross-entropy loss function is computed to update the attributes of the network, such as the weights and learning rate, in order to minimize the loss. Finally, a sigmoid classifier performs the binary classification that detects human fall events. Experimental results show that the proposed model obtains an accuracy of 99%, outperforming other state-of-the-art models.
Keywords: convolutional neural network (CNN); deep learning; gated recurrent unit (GRU); human fall classification; recurrent neural network (RNN)
Year: 2021 PMID: 33802164 PMCID: PMC8000947 DOI: 10.3390/e23030328
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
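As a concrete illustration of the pipeline the abstract describes (per-frame 2D CNN features, a GRU over the frame sequence, and a sigmoid output trained with binary cross-entropy), here is a minimal Keras sketch. The layer counts, filter sizes, 64×64 input, and 10-frame clip length are assumptions for illustration, not the authors' published configuration:

```python
# Minimal 2DCNN + GRU fall classifier sketch in Keras (sizes are assumptions).
from tensorflow.keras import layers, models

NUM_FRAMES, HEIGHT, WIDTH, CHANNELS = 10, 64, 64, 3  # assumed clip shape

# Per-frame 2D CNN feature extractor.
cnn = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(HEIGHT, WIDTH, CHANNELS)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
])

model = models.Sequential([
    # Apply the CNN to each of the NUM_FRAMES frames independently.
    layers.TimeDistributed(cnn, input_shape=(NUM_FRAMES, HEIGHT, WIDTH, CHANNELS)),
    # GRU models the temporal dependency across the frame features.
    layers.GRU(64),
    layers.Dropout(0.5),
    # Sigmoid output for binary fall / non-fall classification.
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```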
Summary of the literature discussed.
| Research Paper | Proposed Models | Limitations | Accuracy (%) |
|---|---|---|---|
| [ ] | Frame differencing for foreground extraction, ellipse fitting for human silhouette extraction, and an SVM classifier for fall classification. | SVM underperforms when the training data sample is insufficient. | 96.34 |
| [ ] | Background subtraction to extract the foreground and a CNN to classify fall events. | Generates false predictions in bending, crawling, and sitting positions. | 90.2 |
| [ ] | CNN for feature extraction and a DB-LSTM to recognize sequential patterns. | Produces false predictions against identical backgrounds and in occluded environments. | 92.66 |
| [ ] | CNN to extract the skeleton joint map. | Results could be improved by incorporating a recurrent neural network. | 94 |
| [ ] | Human silhouettes are converted to a voxel person, and linguistic summarizations of temporal fuzzy inference curves classify fall events. | Not applicable to short-term activity recognition. | 96.5 |
| [ ] | The DeeperCut method extracts a 2D skeleton and an LSTM identifies fall actions. | Low accuracy rate. | 90 |
| [ ] | Faster R-CNN for fall classification. | Cannot properly classify fall events when a person is sitting on a sofa or a chair. | 95.5 |
| [ ] | A PCANet model is trained, followed by an SVM classifier. | Low accuracy rate. | 88.87 (sensitivity) |
| [ ] | CSS features are extracted from human silhouettes and an extreme learning machine (ELM) classifies fall events. | Misclassifies the lying position. | 86.83 |
| [ ] | OpenPose extracts the human skeleton and a time-continuous recognition algorithm identifies fall events. | Misclassifies workout motions. | 97.34 |
| [ ] | OpenPose identifies fall events using information from the human skeleton. | Unable to classify partially occluded human actions. | 97 |
| [ ] | Mask R-CNN extracts objects from a noisy background and a Bi-LSTM classifies human actions. | Cannot identify the behavior of multiple people living in the same room. | 96.7 |
| [ ] | Deep belief network for training and testing human actions. | Carrying a cell phone at all times in an indoor environment is not always possible. | 97.56 (sensitivity) |
| [ ] | CNN integrated with an LSTM for posture detection. | Unnormalized data may lead to long training times. | 98 |
Figure 1. Workflow of the proposed human fall classification model.
Figure 2. Visualization of the output of the convolution layer for human fall classification.
Figure 3. Visualization of the output of the max-pooling layer for human fall classification.
Figure 4. Proposed 2DCNN-GRU architecture to classify human falls.
Figure 5. Gated recurrent unit (GRU) cell.
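For reference, the GRU cell in Figure 5 follows the standard formulation of Cho et al.; the equations below are textbook notation rather than anything specific to this paper (here x_t is the input, h_t the hidden state, and ⊙ the element-wise product):

```latex
% Standard GRU cell equations (update gate z_t, reset gate r_t):
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh\bigl(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\bigr) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```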
Figure 6. Performance of the scratch model for 35% validation and 25% test data.
Figure 7. Performance of the scratch model for 50% validation and 20% test data.
Figure 8. Performance of the scratch model for 60% validation and 20% test data.
Figure 9. Number of frames vs. validation accuracy curve.
Effect of the number of frames on execution time.
| Number of Frames | Execution Time of Proposed Model |
|---|---|
| 5 | 2.8 min |
| 8 | 3.9 min |
| 10 | 4.7 min |
| 12 | 5.4 min |
| 15 | 6.8 min |
| 18 | 7.8 min |
| 20 | 9.4 min |
| 22 | 11.7 min |
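The execution times above grow roughly linearly with the number of frames sampled per clip. As an illustration of the kind of "video frame generator" the abstract mentions, here is a minimal OpenCV sketch that samples N evenly spaced frames; the function name, 64×64 resize, and [0, 1] scaling are assumptions, not the authors' code:

```python
# Hypothetical sketch of a video frame generator: uniformly sample
# num_frames frames from one clip with OpenCV.
import cv2
import numpy as np

def sample_frames(video_path: str, num_frames: int = 10, size=(64, 64)) -> np.ndarray:
    """Return a (num_frames, H, W, 3) array of uniformly spaced frames."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, size))
    cap.release()
    return np.stack(frames) / 255.0  # scale pixel values to [0, 1]
```

With `num_frames=10`, this would correspond to the 10-frame setting in the table above (4.7 min).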
Figure 10. Sequential video frames for a daily activity in the multiple cameras fall dataset.
Figure 11. Sequential video frames for a fall event in the multiple cameras fall dataset.
Effect of batch normalization on model performance.
| | Training Accuracy | Validation Accuracy | Test Accuracy | Total Epochs |
|---|---|---|---|---|
| Normalized data | 100% | 99.7% | 99% | 40 |
| Unnormalized data | 94% | 84% | 81% | 55 |
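The table indicates that normalization both speeds convergence (40 vs. 55 epochs) and lifts accuracy. Below is a minimal Keras sketch of the common conv → batch-norm → activation pattern; the filter count and exact placement are illustrative assumptions, not the paper's stated layout:

```python
# Illustrative conv -> batch norm -> ReLU block in Keras; sizes are assumptions.
from tensorflow.keras import layers, models

block = models.Sequential([
    layers.Conv2D(32, 3, padding="same", input_shape=(64, 64, 3)),
    layers.BatchNormalization(),  # normalizes activations per mini-batch
    layers.Activation("relu"),
])
```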
Figure 12. Each row shows low-to-high-level feature maps from the proposed convolutional neural network (CNN) model on the multiple cameras fall dataset for (a) daily activities, (b) fall events.
Figure 13. Output of the proposed model on the multiple cameras fall dataset for (a) non-fall events, (b) fall events.
Figure 14. Attention map for fall classification.
Figure 15. Dropout vs. validation accuracy curve.
Comparison of the test accuracy of the proposed model with existing models on the UR fall detection dataset for human fall classification.
| Model | Accuracy (%) |
|---|---|
| VGG 16 | 98 |
| VGG 19 | 98 |
| Xception | 99 |
| 1DCNN with GRU | 94.30 |
| 2DCNN with Bi-LSTM | 95.50 |
| 3DCNN with LSTM | 99 |
| 2DCNN with LSTM | 89 |
| Proposed model | 99.80 |
Comparison of the test accuracy of the proposed model with existing models on the multiple cameras fall dataset for human fall classification.
| Model | Accuracy (%) |
|---|---|
| VGG 16 | 97.60 |
| VGG 19 | 98 |
| Xception | 98 |
| 1DCNN with GRU | 92.70 |
| 2DCNN with Bi-LSTM | 95 |
| 3DCNN with LSTM | 97.50 |
| 2DCNN with LSTM | 88 |
| Proposed model | 98 |
Comparison of the number of parameters, depth, and training time with existing models for human fall classification.
| Model | Number of Parameters | Depth | Training Time |
|---|---|---|---|
| VGG 16 | 138,357,544 | 23 | 39 m 21 s |
| VGG 19 | 143,667,240 | 26 | 46 m 31 s |
| Xception | 22,910,480 | 126 | 18 m 22 s |
| 3DCNN with LSTM | 12,317,230 | 20 | 11 m 44 s |
| 2DCNN with LSTM | 7,523,320 | 18 | 7 m 16 s |
| Proposed Model | 5,288,860 | 18 | 4 m 7 s |
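For context, the VGG 16 figure in the table matches the standard Keras build, which can be verified directly (a quick sketch; passing `weights=None` instantiates the architecture without downloading pretrained weights):

```python
# Verify a parameter count from the table using Keras' stock VGG16.
from tensorflow.keras.applications import VGG16

vgg16 = VGG16(weights=None)   # architecture only, no pretrained weights
print(vgg16.count_params())   # 138,357,544, matching the table
```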
Figure 16. Confusion matrix for the UR fall detection dataset.
Figure 17. Confusion matrix for the multiple cameras fall dataset.
Class-wise performance of the scratch model on the UR fall detection dataset.
| Classes | Mean Accuracy (%) | Accuracy (%) | Precision (%) | Sensitivity (%) | Specificity (%) | F1-Score (%) |
|---|---|---|---|---|---|---|
| Fall event | 100 | 100 | 100 | 100 | 100 | 100 |
| Non-fall event | 100 | 100 | 100 | 100 | 100 | 100 |
Class-wise performance of the scratch model on the multiple cameras fall dataset.
| Classes | Mean Accuracy (%) | Accuracy (%) | Precision (%) | Sensitivity (%) | Specificity (%) | F1-Score (%) |
|---|---|---|---|---|---|---|
| Fall event | 98 | 100 | 100 | 96 | 100 | 98 |
| Non-fall event | 98 | 96.15 | 100 | 96 | 100 | 98 |
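The metrics in these tables all derive from the binary confusion matrices in Figures 16 and 17. A small self-contained sketch of the standard definitions follows; the example counts at the end are illustrative, chosen only to reproduce the fall-event row above, and are not the actual test split:

```python
# Class-wise metrics (%) from one class's view of a binary confusion matrix.
def classwise_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)              # recall for this class
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {name: round(100 * value, 2) for name, value in [
        ("accuracy", accuracy), ("precision", precision),
        ("sensitivity", sensitivity), ("specificity", specificity), ("f1", f1)]}

# Illustrative counts only: 24 of 25 falls detected, all 25 non-falls correct.
print(classwise_metrics(tp=24, fn=1, fp=0, tn=25))
# {'accuracy': 98.0, 'precision': 100.0, 'sensitivity': 96.0,
#  'specificity': 100.0, 'f1': 97.96}
```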
Performance comparison with existing models using the UR fall detection dataset.
| Methods | Accuracy (%) |
|---|---|
| Kasturi et al. [ ] | 96.34 |
| Lu et al. [ ] | 99.27 |
| Proposed model | 99.8 |
Performance comparison with existing models using the multiple cameras fall dataset.
| Methods | Accuracy (%) |
|---|---|
| Wang et al. [ ] | 96 |
| Ma et al. [ ] | 97.2 |
| Proposed model | 98 |