Huorong Ren1,2, Zhixing Ye1,2, Zhiwu Li3,1.
Abstract
Anomaly detection in sequence data is increasingly important in a wide variety of application domains, such as credit card fraud detection, health care in the medical field, and intrusion detection in cyber security. Among existing anomaly detection approaches, Markov chain techniques are widely accepted for their simple implementation and small number of parameters. However, the short memory of a classical Markov model ignores the interaction among data, while the long memory of a higher order Markov model obscures the relationship between the previous data and the current test data and reduces the reliability of the model. Moreover, neither model can successfully describe sequences that change with a trend. In this paper, we propose an anomaly detection approach based on a dynamic Markov model. The approach segments sequence data with a sliding window. Within the sliding window, we define the states of the data according to their values and then establish a higher order Markov model of a suitable order, so as to balance the memory length and keep up with the trend of the sequence. In addition, an anomaly substitution strategy is proposed to prevent detected anomalies from affecting model building and to keep anomaly detection running continuously. Experimental results on simulated and real-world datasets demonstrate that the proposed approach improves the adaptability and stability of anomaly detection in sequence data.
Keywords: Anomaly detection; Higher order Markov model; Markov model; Sequence data
Year: 2017 PMID: 32226110 PMCID: PMC7094635 DOI: 10.1016/j.ins.2017.05.021
Source DB: PubMed Journal: Inf Sci (N Y) ISSN: 0020-0255 Impact factor: 6.795
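The pipeline outlined in the abstract (sliding window, value-based states, a higher order Markov model rebuilt per window, and anomaly substitution) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the equal-width binning, the fixed probability threshold, the bin-midpoint substitution rule, and all names (`to_state`, `detect_anomalies`, `window_len`, `n_states`, `order`, `threshold`, `substitute`) are assumptions chosen only to make the idea concrete.

```python
from collections import Counter, defaultdict


def to_state(value, lo, hi, n_states):
    """Equal-width binning over the window's value range (assumed scheme),
    clamped so out-of-range values fall into the first or last state."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_states)
    return max(0, min(idx, n_states - 1))


def detect_anomalies(series, window_len=40, n_states=5, order=2,
                     threshold=0.05, substitute=True):
    """Return indices whose order-n transition probability, estimated from
    the preceding window, falls below `threshold`."""
    cleaned = list(series)  # working copy; flagged points may be overwritten
    flagged = []
    for t in range(window_len, len(series)):
        window = cleaned[t - window_len:t]
        lo, hi = min(window), max(window)
        states = [to_state(v, lo, hi, n_states) for v in window]

        # Count order-n transitions inside the current window only,
        # so the model follows the local behaviour of the sequence.
        counts = defaultdict(Counter)
        for i in range(order, len(states)):
            counts[tuple(states[i - order:i])][states[i]] += 1

        successors = counts[tuple(states[-order:])]
        total = sum(successors.values())
        state = to_state(cleaned[t], lo, hi, n_states)
        prob = successors[state] / total if total else 0.0

        if prob < threshold:
            flagged.append(t)
            if substitute:
                if total:
                    # Assumed rule: midpoint of the most likely next state.
                    best = successors.most_common(1)[0][0]
                    cleaned[t] = lo + (best + 0.5) * (hi - lo) / n_states
                else:
                    # Context never seen in the window: reuse previous value.
                    cleaned[t] = cleaned[t - 1]
    return flagged


# Toy data: a periodic triangle wave with one injected spike at index 64.
pattern = [0, 1, 2, 3, 4, 3, 2, 1]
series = [float(pattern[i % 8]) for i in range(96)]
series[64] = 10.0

print(detect_anomalies(series))                    # with substitution
print(detect_anomalies(series, substitute=False))  # without substitution
```

On this toy series, the run with substitution flags only the injected spike, while the run without substitution also flags points right after it, because the spike stays in the window and distorts both the state boundaries and the transition counts. That is the effect the anomaly substitution strategy is meant to prevent.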
Fig. 1. Experimental results obtained for synthetic data with and without the anomaly substitution strategy.
Fig. 2. Experimental results obtained for synthetic data with different numbers of states.
The TP and FA values of the experimental results for different orders n.
| | TP | FA |
|---|---|---|
| | 8.6 | 81.8 |
| | 65.2 | 46.1 |
| | 62.1 | 53.9 |
| Constantly update | 95.7 | 8.3 |
Fig. 3. Experimental results for different orders n.
Fig. 4. Experimental results for different lengths (l) of the sliding window.
Fig. 5. Comparison of the experimental results produced by the four approaches.
The TP and FA values of the experimental results on different data.
| Detection methods | Passenger traffic data (TP) | Passenger traffic data (FA) | Ann gun CentroidA (TP) | Ann gun CentroidA (FA) | Chfdb chf13 45590.3 (TP) | Chfdb chf13 45590.3 (FA) |
|---|---|---|---|---|---|---|
| The LOF approach | 66.7 | 66.7 | 62.8 | 16.3 | 58.3 | 10.5 |
| The classical Markov chain techniques | 100 | 90.5 | 54.3 | 3.7 | 62.5 | 43.3 |
| The higher order Markov chain | 50.0 | 94.7 | 89.0 | 32.0 | 64.7 | 0 |
| The proposed approach | 100 | 25.0 | 93.2 | 2.4 | 94.8 | 6.7 |
Fig. 6. Comparison of the experimental results of the four approaches.
Fig. 7. Comparison of the experimental results of the four approaches.
The results of anomaly detection on eight ECG datasets using the four approaches.
| ECG datasets | Dataset length | Classical Markov chain techniques | | | Higher order Markov chain techniques ( | | | Proposed approach | | | LOF method | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| stdb_308_0_1 | 2000 | 8 | 70.8 | 30.0 | 9 | 80.0 | 13.3 | 8 | 87.4 | 4.8 | 200 | 82.7 | 18.5 |
| stdb_308_0_2 | 2000 | 7 | 52.6 | 37.4 | 10 | 84.9 | 16.7 | 7 | 85.6 | 5.8 | 200 | 78.3 | 13.3 |
| ltstdb_20321_240_1 | 2500 | 10 | 56.9 | 33.7 | 6 | 67.6 | 21.7 | 10 | 88.5 | 8.3 | 250 | 80.5 | 14.7 |
| ltstdb_20321_240_2 | 2500 | 10 | 62.1 | 23.8 | 9 | 68.2 | 10.9 | 10 | 92.5 | 6.4 | 250 | 72.8 | 10.9 |
| chfdb_chf13_45590_1 | 3000 | 5 | 64.4 | 25.4 | 8 | 78.9 | 19.2 | 5 | 87.9 | 5.2 | 280 | 85.4 | 13.8 |
| chfdb_chf13_45590_2 | 3000 | 7 | 60.6 | 23.3 | 9 | 67.1 | 14.8 | 7 | 89.6 | 8.8 | 280 | 79.3 | 9.4 |
| chfdb_chf01_275_1 | 3500 | 9 | 69.2 | 18.7 | 10 | 73.5 | 12.6 | 9 | 91.3 | 5.4 | 300 | 81.5 | 20.8 |
| chfdb_chf01_275_2 | 3500 | 6 | 72.8 | 20.6 | 8 | 82.5 | 17.3 | 6 | 90.7 | 7.5 | 300 | 74.1 | 10.3 |
| Average | | | 63.7 | 26.6 | | 75.3 | 15.8 | | 89.1 | 6.5 | | 79.3 | 13.9 |
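
The Average row can be checked against the eight dataset rows with a short Python sketch. It assumes that the last two columns under each method are the result columns being averaged (presumably TP and FA, by analogy with the other tables) and that the first column under each method is a parameter setting that is not averaged. The variable names are illustrative only.

```python
# Result columns copied from the table, two per method, in table order:
# classical Markov chain, higher order Markov chain, proposed approach, LOF.
rows = [
    [70.8, 30.0, 80.0, 13.3, 87.4, 4.8, 82.7, 18.5],  # stdb_308_0_1
    [52.6, 37.4, 84.9, 16.7, 85.6, 5.8, 78.3, 13.3],  # stdb_308_0_2
    [56.9, 33.7, 67.6, 21.7, 88.5, 8.3, 80.5, 14.7],  # ltstdb_20321_240_1
    [62.1, 23.8, 68.2, 10.9, 92.5, 6.4, 72.8, 10.9],  # ltstdb_20321_240_2
    [64.4, 25.4, 78.9, 19.2, 87.9, 5.2, 85.4, 13.8],  # chfdb_chf13_45590_1
    [60.6, 23.3, 67.1, 14.8, 89.6, 8.8, 79.3, 9.4],   # chfdb_chf13_45590_2
    [69.2, 18.7, 73.5, 12.6, 91.3, 5.4, 81.5, 20.8],  # chfdb_chf01_275_1
    [72.8, 20.6, 82.5, 17.3, 90.7, 7.5, 74.1, 10.3],  # chfdb_chf01_275_2
]
# Values as printed in the Average row.
reported = [63.7, 26.6, 75.3, 15.8, 89.1, 6.5, 79.3, 13.9]

# Column-wise means over the eight datasets.
means = [sum(col) / len(col) for col in zip(*rows)]
diffs = [abs(m - r) for m, r in zip(means, reported)]

for m, r in zip(means, reported):
    print(f"computed {m:.4f}  reported {r}")
```

Under this reading, every computed mean lies within 0.1 of the reported value. The small gaps that remain could come from rounding of the per-dataset entries.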