Pei Shi, Guanghui Li, Yongming Yuan, Liang Kuang.
Abstract
Wireless sensor networks (WSNs) are susceptible to faults in sensor data, so outlier detection is crucial for ensuring the quality of data analysis in WSNs. This paper proposes an improved support vector data description method (ID-SVDD) to detect outliers in sensor data. ID-SVDD uses the density distribution of the data to compensate SVDD. The Parzen-window algorithm is applied to calculate the relative density of each data point in a data set, and the Mahalanobis distance (MD) is used to improve the Gaussian function in the Parzen-window density estimation. By combining the new relative density weight with SVDD, this approach can efficiently map the data points from sparse space to high-density space. To assess its outlier detection performance, the ID-SVDD algorithm was implemented on several datasets. The experimental results demonstrated that ID-SVDD achieved high performance and could be applied in real water quality monitoring.
Keywords: Parzen-window algorithm; outlier detection; support vector domain description; water quality monitoring; wireless sensor networks (WSNs)
Year: 2019 PMID: 31671540 PMCID: PMC6864849 DOI: 10.3390/s19214712
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Key notations.
| Symbol | Description |
|---|---|
| | Radius of sphere |
| o | Center of sphere |
| | The trade-off between sphere volume and the number of target data outside the sphere |
| | Slack variable |
| | Lagrange multiplier |
| | The distance between an observation datum in the feature space and center |
| | The mean of Parzen-window density Par( |
| | The feature dimension of input data |
| | Weighting factor |
| | The number of target data |
| | Relative density weight of |
| | Parzen-window density of |
| | Mahalanobis distance between vectors |
| | Covariance matrix |
| | Mean value of |
| | Relative density weight array |
| | The number of true positive results |
| | The number of true negative results |
| | The number of false positive results |
| | The number of false negative results |
| | The degree of polynomial |
| | Bandwidth of Gaussian kernel function |
| | A constant |
| | A constant |
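The density-related quantities in the notation (Parzen-window density, Mahalanobis distance, covariance matrix, mean density, weighting factor, relative density weight) can be illustrated with a minimal sketch. The exact formulas used by ID-SVDD are not reproduced in this record, so the forms below are assumptions: a Gaussian window evaluated on the squared Mahalanobis distance, with its normalising constant dropped, and a weight equal to the density divided by the mean density, raised to a weighting factor. The names `sigma`, `omega`, and all function names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(X, cov):
    """Pairwise squared Mahalanobis distances between the rows of X."""
    cov_inv = np.linalg.pinv(cov)            # pseudo-inverse tolerates a singular covariance
    diff = X[:, None, :] - X[None, :, :]     # shape (n, n, d)
    return np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)

def parzen_density(X, sigma=1.0):
    """Unnormalised Parzen-window density of each row of X.

    Assumed form: the mean over all points of a Gaussian window
    applied to the squared Mahalanobis distance.
    """
    X = np.asarray(X, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    d2 = mahalanobis_sq(X, cov)
    return np.exp(-d2 / (2.0 * sigma ** 2)).mean(axis=1)

def relative_density_weights(X, sigma=1.0, omega=1.0):
    """Assumed relative density weight: (density / mean density) ** omega."""
    par = parzen_density(X, sigma)
    return (par / par.mean()) ** omega

# Four clustered points and one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
weights = relative_density_weights(X)
print(weights)
```

Under these assumptions the isolated point receives the smallest weight, which is the behaviour a density-compensated SVDD relies on when it treats sparse-region points more leniently or more strictly than dense-region points.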
Figure 1. Illustration of support vector data description (SVDD) in feature space for outlier detection.
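The decision rule that Figure 1 depicts can be sketched as follows, using the standard SVDD formulation in which the center is a weighted sum of mapped training points and a point is flagged when its feature-space distance to the center exceeds the radius. This is plain SVDD, not the density-compensated variant, and it assumes the Lagrange multipliers and radius have already been obtained from training; the function names and the argument names `alphas` and `radius` are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def svdd_distance_sq(z, support_vectors, alphas, kernel):
    """Squared feature-space distance from each row of z to the center.

    With center o = sum_i alpha_i * phi(x_i):
    D^2 = K(z, z) - 2 * sum_i alpha_i K(x_i, z) + sum_ij alpha_i alpha_j K(x_i, x_j)
    """
    z = np.atleast_2d(np.asarray(z, dtype=float))
    alphas = np.asarray(alphas, dtype=float)
    k_zz = np.diag(kernel(z, z))
    k_zx = kernel(z, support_vectors)
    k_xx = kernel(support_vectors, support_vectors)
    return k_zz - 2.0 * (k_zx @ alphas) + alphas @ k_xx @ alphas

def is_outlier(z, support_vectors, alphas, radius, kernel):
    """Flag points whose distance to the center exceeds the sphere radius."""
    return svdd_distance_sq(z, support_vectors, alphas, kernel) > radius ** 2

# Illustrative values only: two support vectors with equal multipliers.
sv = np.array([[0.0, 0.0], [2.0, 0.0]])
alphas = np.array([0.5, 0.5])
d2_at_sv = svdd_distance_sq([0.0, 0.0], sv, alphas, gaussian_kernel)
print(d2_at_sv)
```

Passing the kernel as an argument keeps the distance computation independent of the kernel choice, so the same function can be reused when comparing kernels.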
Experimental datasets.
| Datasets | Attributes | Normal Data | Outliers |
|---|---|---|---|
| SensorScope node12 | 2 | 1411 | 44 |
| SensorScope node17 | 2 | 1309 | 137 |
| water quality data | 3 | 1706 | 50 |
Comparison among different kernel functions. TNR: true negative rate; TPR: true positive rate.
| Kernel function | TPR (%) | TNR (%) | Accuracy (%) |
|---|---|---|---|
| SensorScope node12 | | | |
| Linear | 89.3525 | 0 | 84.4898 |
| Polynomial | 100 | 0 | 94.5578 |
| Gaussian | 99.4245 | 87.5 | 98.7755 |
| Tanh | 98.1295 | 20 | 93.8776 |
| SensorScope node17 | | | |
| Linear | 38.4095 | 28.1482 | 36.5014 |
| Polynomial | 92.5550 | 45.9259 | 83.8843 |
| Gaussian | 100 | 97.037 | 99.449 |
| Tanh | 68.1895 | 0 | 55.5096 |
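For reference, the four kernel families compared in the table can be written in their common textbook forms. The parameterisation used in the experiments is not given in this record, so the parameter names (`degree`, `c`, `sigma`, `a`, `b`) and the bandwidth convention of the Gaussian kernel are assumptions.

```python
import numpy as np

def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return float(np.dot(x, y))

def polynomial_kernel(x, y, degree=2, c=1.0):
    """K(x, y) = (x . y + c) ** degree"""
    return float((np.dot(x, y) + c) ** degree)

def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); sigma is the bandwidth."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def tanh_kernel(x, y, a=1.0, b=0.0):
    """K(x, y) = tanh(a * x . y + b)"""
    return float(np.tanh(a * np.dot(x, y) + b))

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
print(linear_kernel(x, y), polynomial_kernel(x, y), gaussian_kernel(x, y), tanh_kernel(x, y))
```

One property worth noting is that the Gaussian kernel gives K(x, x) = 1 for every point, so the self-similarity term in the SVDD distance is constant and only the similarity to the support vectors varies.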
Figure 2. Distribution of the SensorScope dataset.
Detection results of SensorScope datasets.
| | ID-SVDD | D-SVDD | DW-SVDD | SVDD |
|---|---|---|---|---|
| Node 12 | | | | |
| TPR (%) | 99.4245 | 98.4173 | 70.5036 | 98.0496 |
| TNR (%) | 87.5 | 100 | 82.5 | 100 |
| Accuracy (%) | 98.7755 | 98.5034 | 71.1565 | 98.1788 |
| Time (s) | 0.489 | 0.5329 | 0.4211 | 0.5463 |
| Node 17 | | | | |
| TPR (%) | 100 | 98.8338 | 90.3553 | 99.3232 |
| TNR (%) | 97.037 | 100 | 63.7037 | 98.5185 |
| Accuracy (%) | 99.449 | 98.8981 | 85.3994 | 99.1736 |
| Time (s) | 0.3794 | 0.578 | 0.4172 | 0.3763 |
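The TPR, TNR, and accuracy columns follow from the four confusion counts listed in the notation (true positives, true negatives, false positives, false negatives). A minimal sketch using the standard definitions is given below; the function name and the counts in the usage line are illustrative and are not taken from the experiments.

```python
def detection_rates(tp, tn, fp, fn):
    """Return (TPR, TNR, accuracy) in percent from confusion counts.

    Assumes at least one actual positive and one actual negative,
    so that neither denominator is zero.
    """
    tpr = 100.0 * tp / (tp + fn)                   # share of actual positives detected
    tnr = 100.0 * tn / (tn + fp)                   # share of actual negatives detected
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # share of all decisions that are correct
    return tpr, tnr, acc

# Illustrative counts only.
tpr, tnr, acc = detection_rates(tp=90, tn=8, fp=2, fn=10)
print(f"TPR={tpr:.4f}%  TNR={tnr:.4f}%  Accuracy={acc:.4f}%")
```

Because accuracy pools both classes, it is dominated by the larger class when the data are imbalanced, which is why a method can show high accuracy alongside a TNR of 0 in the kernel comparison.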
Figure 3. Illustration of the water quality dataset distribution in the training process. DO: dissolved oxygen.
Figure 4. Detection results of the water quality dataset in the testing process.
Detection results for the water quality dataset. D-SVDD: density-compensated SVDD; DW-SVDD: density-weighted SVDD; ID-SVDD: improved density-compensated SVDD.
| pond13 | ID-SVDD | D-SVDD | DW-SVDD | SVDD |
|---|---|---|---|---|
| TPR (%) | 91.1374 | 89.0694 | 70.901 | 67.356 |
| TNR (%) | 96.2963 | 100 | 92.5926 | 96.2963 |
| Accuracy (%) | 91.3352 | 89.4886 | 71.733 | 68.4659 |
Figure 5. Outlier detection time of the water quality dataset with different algorithms.