Nor Surayahani Suriani, Aini Hussain, Mohd Asyraf Zulkifley.
Abstract
Event recognition is one of the most active research areas in the field of video surveillance. Advances in event recognition systems mainly aim to provide convenience, safety and an efficient lifestyle for humanity. A precise, accurate and robust approach is necessary to enable event recognition systems to respond to sudden changes in various uncontrolled environments, such as an emergency, a physical threat or a fire or bomb alert. The performance of sudden event recognition systems depends heavily on the accuracy of low-level processing, such as detection, recognition, tracking and machine learning algorithms. This survey focuses on the detection and characterization of sudden events, a subset of abnormal events, in several video surveillance applications. This paper discusses the following in detail: (1) the importance of a sudden event over a general anomalous event; (2) frameworks used in sudden event recognition; (3) the requirements and comparative studies of a sudden event recognition system; and (4) various decision-making approaches for sudden event recognition. The advantages and drawbacks of using 3D images from multiple cameras for real-time applications are also discussed. The paper concludes with suggestions for future research directions in sudden event recognition.
Year: 2013 PMID: 23921828 PMCID: PMC3812589 DOI: 10.3390/s130809966
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Images from the GERHOME Laboratory. Reproduced with permission from www-sop.inria.fr/stars/projects/Gerhome/Videos/ (accessed on 20 February 2013).
Previous related surveys.
| Year | Author(s) | Focus | Ref. |
|---|---|---|---|
| 1999 | Aggarwal | Human movement analysis | [ |
| 2002 | Wang | Human motion estimation and activity understanding | [ |
| 2004 | Aggarwal and Park | Recognition of actions and interactions | [ |
| 2004 | Weiming Hu | Detection of anomalous behavior in dynamic scenes | [ |
| 2006 | Thomas B. Moeslund | Human body motion and recognition | [ |
| 2006 | Gandhi, T. | Pedestrian detection based on various cases | [ |
| 2007 | Pantic | Behavior understanding | [ |
| 2008 | Morris, B.T. | Trajectory analysis for visual surveillance | [ |
| 2008 | Teddy Ko | Behavior analysis in automated video surveillance | [ |
| 2008 | Turaga | Human activity recognition | [ |
| 2009 | G. Lavee | Video event understanding (abstraction and event modeling) | [ |
| 2009 | Varun Chandola | Anomaly detection | [ |
| 2010 | Joshua Candamo | Event recognition in transit applications | [ |
| 2010 | Ji and Liu | Recognition of poses and actions in multiple views | [ |
| 2011 | Popoola and Wang | Contextual abnormal human behavior | [ |
Figure 2. Common structure of a video-based sudden event recognition system.
Figure 3. Semantic hierarchy level description.
Figure 4. (a) Sudden fall. Reproduced with permission from www.iro.umontreal.ca/labimage/Dataset/ (accessed on 20 February 2013) [31]; and (b) snatch theft. Adapted from [32].
Figure 5. Diagram of sudden event recognition.
Sudden event criteria.
| Event | | | |
|---|---|---|---|
| Elderly/Patient walk and fall | ✓ | ✓ | ✓ |
| Snatch theft | ✓ | ✓ | ✓ |
| Crossing a prohibited area | × | ✓ | ✓ |
| Children jump and run | ✓ | ✓ | × |
| Burglary | × | × | ✓ |
| Abandoned luggage | ✓ | × | × |
Figure 6. Example of a human-centered sudden event. (a) Patient falls out of bed. ©2011 IEEE. Reprinted, with permission, from [33]; (b) group interaction shows the sequence of a sudden assault. Reproduced with permission from [34]. With kind permission from Springer Science and Business Media.
Experimental results in relation to sudden fall (TP: true positives; TN: true negatives).
| Features | Results | Ref. |
|---|---|---|
| 2D features: bounding box aspect ratio | Parallel and top view cameras perform >80% within a distance of 4–7 m; the top view camera limits the precision within a distance of 1–4 m (<80%) | [ |
| Aspect ratio, horizontal and vertical gradient | Accurately detect single person fall indoors and outdoors 100%, but <80% accuracy for a person fall with multiple people in the scene | [ |
| Horizontal and vertical velocities profile | ANOVA analysis indicated that the peak horizontal and vertical velocities were higher ( | [ |
| 2D silhouette aspect ratio combined with personal information, such as weight, height and health history | The success rates of fall detection with and without personal information are 79.8% and 68%, respectively | [ |
| Bounding box aspect ratio and covariance matrix | Only preliminary results; no quantitative analysis | [ |
| Vertical projection histogram of silhouette | Correct detection rate is 84.44% | [ |
| Shape deformation using shape context | Reduce error rate from 9.1% to 3.8% using Procrustes distance compared to other 2D features | [ |
| Motion features and metric rectification | TP = 97.6%, TN = 86.7% | [ |
| Integrated spatio-temporal map and shape variations | Slip only and fall events detected with 90% and 95% accuracy, respectively; | [ |
| Spatio-temporal motion from dense optical flow | Performed on six classes of action and fall events detected with 99% accuracy | [ |
| Motion history image (MHI); shape variations and motion quantification | Sensitivity of 88% and an acceptable rate of false detection, with a specificity of 87.5% for 24 daily activities | [ |
| Global motion: period of fall, change of centroid location and vertical histogram projection | Achieves a correctness ratio of about 93% with a 13% missed ratio; the FP ratio is 0% for all 28 events | [ |
| Approximated ellipse, projection histogram and temporal changes of head positions | TP = 94.3% and TN = 96.9% for 10 classes of action | [ |
| Principal component analysis (PCA) | >85% fall detection rate | [ |
| 3D orientation of image | Only preliminary results; no quantitative analysis | [ |
| Length and width ratio (DFT coefficient) of Fourier transform histogram projection | Average recognition rate for four classes of action is 97.8% | [ |
| Posture-based with point tracking | Achieve accuracy of 74.29% | [ |
| Posture-based probabilistic projection maps | Average accuracy >95% in classifying human postures | [ |
| Head tracking using particle filter | Reasonable mean error of 5% at five meters | [ |
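Most of the 2D approaches in the table above reduce fall detection to monitoring the silhouette bounding box's height-to-width aspect ratio and flagging an abrupt drop over a few frames. A minimal sketch of that idea follows; the binary-mask format, the drop threshold and the window size are illustrative assumptions, not parameters taken from any cited system:

```python
def bounding_box(mask):
    """Bounding box (top, left, bottom, right) of a binary silhouette mask
    given as a list of rows of 0/1 values."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]

def aspect_ratio(mask):
    """Height-to-width ratio of the silhouette bounding box."""
    t, l, b, r = bounding_box(mask)
    return (b - t + 1) / (r - l + 1)

def detect_fall(ratios, drop_thresh=0.5, window=3):
    """Flag a fall when the aspect ratio drops below `drop_thresh` times its
    value `window` frames earlier; return the frame index, or None."""
    for i in range(window, len(ratios)):
        if ratios[i] < drop_thresh * ratios[i - window]:
            return i
    return None
```

An upright person yields a ratio well above 1, a lying person well below; the sudden transition, rather than the absolute value, is what distinguishes a fall from deliberately lying down.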
Experimental results in relation to sudden events with multiple person interaction. HMM, Hidden Markov Model; MLN, Markov logic network; SVM, support vector machine.
| Features | Results | Ref. |
|---|---|---|
| Motion trajectories | High recognition rate of almost 100% using an HMM classifier, while the proposed observation-decomposed HMM is less sensitive | [ |
| Velocity measured using optical flow | Results reported that 80% of snatching events are detected | [ |
| Optical flow | Achieved accuracy of 91% | [ |
| Blob detection, motion features and semantic behavior representations | 100% precision and 32% recall using the BEHAVE dataset | [ |
| Silhouette ellipse representation and convex hull with context free grammar representations | Accuracy reported for the deterministic approach is 91.6%, while the probabilistic approach achieves 93.8% | [ |
| Logic programming with context free grammar representations | Tested using four techniques; (1) Traditional MLN with Backward Induction (TBI); (2) Traditional MLN with Q-values (TQV); (3) Modified MLN with Backward Induction (MBI); and (4) Modified Q-values (MQV). Overall, MQV achieved a high precision value, almost 100%, while TQV is less sensitive | [ |
| Learning context of group activities | The results reported that SVM outperformed rule-based learning with ROC > 0.81 for six different actions | [ |
Experimental results in relation to a sudden event for a person with vehicle interaction (TP: true positives, TN: true negatives).
| Features | Results | Ref. |
|---|---|---|
| Spatio-temporal Markov random field (MRF) for motion vectors and object tracking | Overall success rate is 94.6% for horizontal and vertical traffic | [ |
| Discrete Cosine Transform (DCT) coefficient feature vector with traffic detections using Gaussian mixture HMM | HMM classifier correctly detected 91% to 94% under different illumination situations | [ |
| Spatio-temporal motion trajectories using MRF | Able to detect accidents at a very high recall rate, which is more than 90% | [ |
| Blob tracker using Kalman filter | Results reported that the percentage of falsely classified vehicle modes was about 3% | [ |
| Extracted features: shape, position, motion and track object using Kalman filter | Correctly classify five types of vehicle behavior and pedestrians with 86% to 96% accuracy | [ |
| Motion trajectories and hierarchical clustering | Half total error rate (HTER) = 11% and area under the curve (AUC) > 0.8 | [ |
| Motion trajectories; velocities of motion vector | The accuracy increases up to 95%, tested on two video clips (277 frame sequences) | [ |
| Target trajectories with context free grammar (CFG) representations | Results reported using (1) stochastic CFG, TP = 86%, TN = 14%; (2) HMM, TP = 55%, TN = 45% | [ |
| Optical flow and K-means clustering | Only visual sample results; no empirical analysis | [ |
| Local motion patterns to initialize the Gaussian mixture model (GMM) | Accuracy of detection measured = 83.23%, and the error rate is 16.77% | [ |
| Motion trajectories and C-means clustering | Results reported that the false rejection rate (FRR) = 6%, and the false acceptance rate (FAR) = 8.3% | [ |
| Track target trajectories and learn rules using grammar representations | The accuracy increased from 73% to 98.6% when tested on five traffic sub-events with different parsing parameters | [ |
| Lane tracking using particle swarm optimization (PSO) particle filters (PF) | Results indicated that particle filter output is smoother than the PSO-PF algorithm | [ |
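Many of the trajectory-based entries above flag sudden stops or braking as a sharp drop in frame-to-frame speed along a tracked centroid path. A rough sketch under that reading; the deceleration ratio and minimum-speed floor are illustrative parameters, not values from any cited system:

```python
import math

def speeds(traj):
    """Per-frame speed (pixels/frame) from a list of (x, y) centroids."""
    return [math.dist(a, b) for a, b in zip(traj, traj[1:])]

def sudden_stop(traj, decel_ratio=0.3, min_speed=5.0):
    """Return the frame index where speed falls below `decel_ratio` of the
    running peak speed (ignoring targets that never exceed `min_speed`),
    or None if no sudden stop occurs."""
    peak = 0.0
    for i, v in enumerate(speeds(traj)):
        peak = max(peak, v)
        if peak >= min_speed and v < decel_ratio * peak:
            return i + 1  # index of the frame where the drop is observed
    return None
```

The `min_speed` floor prevents stationary or slow-moving objects from triggering false alarms, a common precaution in the trajectory-analysis systems surveyed.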
The trends of interest in sudden event recognition algorithms applicable in crime prevention, traffic monitoring and home care/hospital assistant systems (RT: real-time implementation). MLP, multi-layer perceptron; EM, expectation-maximization; GMM, Gaussian mixture model.
| Application | Author | Year | Event | Method | RT | Ref. |
|---|---|---|---|---|---|---|
| Crime Prevention | Liu | 06 | Snatching | HMM | N | [ |
| | Julio Cezar | 07 | Attack, threatening | Voronoi diagram | Y | [ |
| | Ryoo | 09 | Approach, attack, punch, kick | Grammar learning | N | [ |
| | Goya | 09 | Purse snatching | Nearest neighbor classifier | N | [ |
| | Jeong | 10 | Attack, fight, follow, snatch | Markov logic network | N | [ |
| | Ibrahim | 12 | Snatching | SVM | N | [ |
| | Zhang | 12 | Approach, attack, fight | SVM | N | [ |
| Traffic Monitoring | Kamijo | 00 | Reckless driving, bumping accident, sudden stop and start, passing | HMM | N | [ |
| | Xiaokun Li | 04 | Traffic behavior on the highway | HMM | Y | [ |
| | Kamijo | 04 | Accident detection | Logical reasoning | N | [ |
| | Veeraraghavan | 05 | Turning, lane change, sudden stops | Trajectory analysis | Y | [ |
| | Kumar | 05 | Vehicle behavior for accident detection | Bayesian classifier | Y | [ |
| | Jiang | 07 | Vehicle U-turn, sudden brake and pulling over | HMM | N | [ |
| | Chen | 07 | Traffic behavior at intersection, car crash event | MLP and RBF | N | [ |
| | Veeraraghavan | 09 | Lane changes, sudden stop at intersection | Grammar learning | Y | [ |
| | Imran | 10 | Traffic behavior at intersection | EM algorithm | N | [ |
| | Hernandez | 10 | Traffic behavior at intersection | Hidden Markov network | N | [ |
| | Hsieh | 11 | Traffic behavior at intersection | Fuzzy SOM | N | [ |
| | Zhang | 11 | Traffic incidents at the crossroad | Minimum description length (MDL) | N | [ |
| | Cheng | 12 | Sudden lane changes | Particle swarm optimization (PSO) | Y | [ |
| Homecare/Hospital Assistant System | Ge Wu | 00 | Walk, sit down, down stairs, lying down, tripping | Velocity profile | N | [ |
| | Ji Tao | 05 | Simulated indoor standing, walking, falling down | Statistical (hypothesis testing) | N | [ |
| | Cucchiara | 05 | Simulated indoor walking, crouching, sitting and falling down | Bayesian classifier | N | [ |
| | Miaou | 06 | Fall detection using omni camera view | Bounding box aspect ratio | N | [ |
| | Anderson | 06 | Indoor walking, falling down, kneeling and getting up | HMM | N | [ |
| | Jansen | 06 | 3D image | Learned contextual model | N | [ |
| | Thome | 06 | Indoor video sequence of walking-falling-lengthened motion patterns | HMM | N | [ |
| | Vishwakarma | 07 | Indoor and outdoor walk and fall; fall in crowd scene | Deterministic (gradient value and fall angle) | N | [ |
| | Juang | 07 | Indoor walk, jogging, bend, lying down and falling | Fuzzy | N | [ |
| | Rougier | 07 | Indoor walk, bend, sit, lying down and fall | Ellipse ratio, orientation and motion quantity | N | [ |
| | Lin | 07 | Indoor standing, squatting, falling down | Centroid and vertical histogram projection | N | [ |
| | Foroughi | 08 | Indoor walk, run, stumble, limp, sit, bend and lie down | MLP neural network | N | [ |
| | Foroughi | 08 | Indoor walk, run, stumble, limp, bend, sit and lie down | SVM | N | [ |
| | Hazelhoff | 08 | Indoor walking, bending, sitting and fall | Gaussian classifier | Y | [ |
| | Anderson | 09 | Simulated on different types of falls | Fuzzy logic | N | [ |
| | Liu | 10 | Standing, sitting and fall postures | KNN classifier | N | [ |
| | Rougier | 11 | Indoor walk, bend, sit, lying down and fall | GMM classifier | N | [ |
| | Liao | 12 | Indoor and outdoor walk, run, sit, stand, lying down and crouching down | Bayesian belief network | N | [ |
| | Miao Yu | 12 | Indoor walk, sit, bend, lying on sofa and falling down | Directed acyclic graph SVM | N | [ |
| | Olivieri | 12 | Indoor walk, jogging, bend, lying down and falling | KNN classifier | N | [ |
| | Brulin | 12 | Simulated indoor sitting, standing, lying, squatting and fall from bed | NN; fuzzy logic | N | [ |
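HMMs recur throughout the table above as the classifier of choice: one model is trained per event class, and a new observation sequence is assigned to the class whose model gives it the highest likelihood, computed with the forward algorithm. A compact sketch for discrete observations follows; the toy models in the test are invented for illustration and do not correspond to any trained system from the survey:

```python
import math

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.
    pi[i]: initial state prob, A[i][j]: transition prob, B[i][k]: emission prob."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for o in obs[1:]:
        total = sum(alpha)
        loglik += math.log(total)          # accumulate the scaling factor
        alpha = [a / total for a in alpha]  # rescale to avoid underflow
        alpha = [sum(alpha[p] * A[p][s] for p in range(n)) * B[s][o]
                 for s in range(n)]
    return loglik + math.log(sum(alpha))    # final step's probability mass

def classify(obs, models):
    """Assign the sequence to the event model with the highest likelihood.
    `models` maps a class name to a (pi, A, B) triple."""
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))
```

The rescaling at each step is what makes likelihood comparison practical on long video sequences, where unscaled forward probabilities underflow quickly.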
Multi-view camera implementations in relation to sudden event recognition (RT: real-time implementations).
| Author | Year | Method | Results | RT | Ref. |
|---|---|---|---|---|---|
| Weinland | 06 | Motion history volumes for kick, punch, turn and get up actions | Classifier methods are PCA, Mahalanobis distance and LDA, with average rates of 73%, 93% and 92%, respectively | N | [ |
| Denman | 06 | Optical flow for vehicle tracking to predict velocities—four cameras | Overall Euclidean tracking error of 11.41 pixels | Y | [ |
| Cucchiara | 07 | Warping person silhouette from multi-camera view | Only visual; no empirical data analysis | Y | [ |
| Weinland | 07 | 3D exemplars based on HMM; silhouette projections for kick, punch, turn and get up actions | Average recognition rate across all four cameras is 81.4% | N | [ |
| Taj | 07 | Object detection using statistical change detection and tracking using graph matching; event classifier based on HMM | Overall accuracy for a warning event is 94.45% and for an alarm event is 92.86% | N | [ |
| Calderara | 07 | Used two cameras to detect and track using background suppression and an appearance-based probabilistic approach; applied a Bayesian network as motion trajectory classifier | Accuracies are 100% for a sudden event and 97.5% for a normal event | N | [ |
| Thome | 08 | 3D multiple view pose detector with motion modeling based on layered HMM (LHMM) | Good ability to detect sudden changes of human pose with a rate of 82% correct detection for fall events | Y | [ |
| Adam | 08 | Statistical low-level processing; eliminate tracking based algorithm | Results recorded the detection rate as > 95%. | Y | [ |
| Antonakaki | 09 | Videos from three cameras performed background subtraction, homography estimation and target localization | Overall precision is 98.6% using SVM and HMM for a sudden event, such as FightChase, FightRunAway, etc. | Y | [ |
| Uson | 09 | Voxel-based algorithm combined with probabilistic models using eight cameras | Average detection rate for a fall down event is 95.6% | N | [ |
| Anderson | 09 | Extract person profile (height, width, length) from silhouettes in four cameras | Three fuzzy variable positions (Upright, On the ground and In-between); the precision recorded was 83.1%, 97.6% and 67.7%, respectively | N | [ |
| Drews | 10 | Estimate crowd size and activity using optical flow from two cameras | HMM and Bayesian network (BN) classified between calm, low and high movement; BN outperformed HMM with fewer false alarms | N | [ |
| Peyman | 11 | Traffic flow monitoring for a sudden incident at an intersection using a hybrid scale-invariant feature transform (SIFT) with two cameras | The SVM learning machine achieved recall above 93% and 95% precision | N | [ |
| Auvinet | 11 | Analyzed volume distribution along vertical axis for 3D shape of people | Results achieved 99.7% sensitivity and specificity with four cameras. | Y | [ |
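The last row above (Auvinet) summarizes a purely geometric multi-camera cue: reconstruct the person's occupied volume from several calibrated views and watch how that volume is distributed along the vertical axis, since a fallen body concentrates its voxel mass near the floor. A toy sketch of that cue follows; the voxel tuple format, the bin count and the 0.4 m near-ground cutoff are illustrative assumptions:

```python
def vertical_distribution(voxels, n_bins=10, height=2.0):
    """Normalised histogram of occupied-voxel heights.
    `voxels` is a list of (x, y, z) tuples with z in metres above the floor."""
    bins = [0] * n_bins
    for _, _, z in voxels:
        idx = min(int(z / height * n_bins), n_bins - 1)
        bins[idx] += 1
    total = sum(bins) or 1
    return [b / total for b in bins]

def near_ground_fraction(voxels, ground_cutoff=0.4, height=2.0, n_bins=10):
    """Fraction of the body's voxel mass below `ground_cutoff` metres;
    a value close to 1 suggests a lying (possibly fallen) posture."""
    dist = vertical_distribution(voxels, n_bins, height)
    k = int(ground_cutoff / height * n_bins)
    return sum(dist[:k])
```

Because the cue is computed on a fused 3D reconstruction rather than any single image, it is largely insensitive to the viewpoint and occlusion problems that limit the 2D aspect-ratio methods discussed earlier.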