Rahim Khan, Ihsan Ali, Saleh M Altowaijri, Muhammad Zakarya, Atiq Ur Rahman, Ismail Ahmedy, Anwar Khan, Abdullah Gani.
Abstract
Multivariate data sets are common in various application areas, such as wireless sensor networks (WSNs) and DNA analysis. A robust mechanism is required to compute their similarity indexes regardless of the environment and problem domain. This study describes the usefulness of a non-metric-based approach (i.e., longest common subsequence) in computing similarity indexes. Several non-metric-based algorithms are available in the literature; the most robust and reliable is the dynamic programming-based technique. However, dynamic programming-based techniques are considered inefficient, particularly in the context of multivariate data sets. Furthermore, the classical approaches are not powerful enough in scenarios involving multivariate data sets or sensor data, or when the similarity indexes are extremely high or low. To address this issue, we propose an efficient algorithm to measure the similarity indexes of multivariate data sets using a non-metric-based methodology. The proposed algorithm performs exceptionally well on numerous multivariate data sets compared with the classical dynamic programming-based algorithms. The performance of the algorithms is evaluated on the basis of several benchmark data sets and a dynamic multivariate data set, which is obtained from a WSN deployed in the Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences and Technology. Our evaluation suggests that the proposed algorithm can be approximately 39.9% more efficient than its counterparts for various data sets in terms of computational time.
Keywords: WSN data; dynamic programming; longest common subsequence; multivariate data set
Year: 2019; PMID: 30621241; PMCID: PMC6339076; DOI: 10.3390/s19010166
Source DB: PubMed; Journal: Sensors (Basel); ISSN: 1424-8220; Impact factor: 3.576
Figure 1. An overview of the proposed similarity computation mechanism.
A short description of the collected time series data “S” for a single sensor node, measured at 15 min intervals (soil moisture is the soil wetness reading in kilo-ohms).
| Date and Time | Temperature | Humidity | Soil Moisture |
|---|---|---|---|
| 15 April 2011 22:00 | 35 | 92 | 780 |
| 15 April 2011 22:15 | 39 | 82 | 778 |
| 15 April 2011 22:30 | 36 | 87 | 776 |
| 15 April 2011 22:45 | 35 | 91 | 774 |
| 15 April 2011 23:00 | 37 | 85 | 772 |
| 15 April 2011 23:15 | 36 | 87 | 772 |
| 15 April 2011 23:30 | 38 | 83 | 770 |
| 15 April 2011 23:45 | 35 | 90 | 767 |
| 15 April 2011 00:00 | 38 | 82 | 762 |
| 15 April 2011 00:15 | 36 | 86 | 756 |
A short description of the collected time series data “T” for a single sensor node, measured at 15 min intervals (soil moisture is the soil wetness reading in kilo-ohms).
| Date and Time | Temperature | Humidity | Soil Moisture |
|---|---|---|---|
| 16 April 2011 2:00 | 37 | 85 | 782 |
| 16 April 2011 2:15 | 35 | 92 | 780 |
| 16 April 2011 2:30 | 36 | 87 | 776 |
| 16 April 2011 2:45 | 38 | 84 | 775 |
| 16 April 2011 3:00 | 35 | 91 | 774 |
| 16 April 2011 3:15 | 37 | 85 | 772 |
| 16 April 2011 3:30 | 38 | 83 | 770 |
| 16 April 2011 3:45 | 35 | 90 | 767 |
| 17 April 2011 0:00 | 38 | 81 | 762 |
| 17 April 2011 0:15 | 36 | 86 | 756 |
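As a rough illustration of the classical dynamic programming baseline mentioned in the abstract (not the proposed algorithm), the sketch below computes a longest common subsequence over the rows of the two sample tables, treating each row as a (temperature, humidity, soil moisture) reading. The function names, the per-dimension tolerance `eps`, and the normalisation of the similarity index by the shorter series length are assumptions made for this example; the source does not define them.

```python
from typing import List, Tuple

Reading = Tuple[int, int, int]  # (temperature, humidity, soil moisture)

# Rows copied from the sample tables "S" and "T" (timestamps omitted).
S: List[Reading] = [
    (35, 92, 780), (39, 82, 778), (36, 87, 776), (35, 91, 774), (37, 85, 772),
    (36, 87, 772), (38, 83, 770), (35, 90, 767), (38, 82, 762), (36, 86, 756),
]
T: List[Reading] = [
    (37, 85, 782), (35, 92, 780), (36, 87, 776), (38, 84, 775), (35, 91, 774),
    (37, 85, 772), (38, 83, 770), (35, 90, 767), (38, 81, 762), (36, 86, 756),
]


def readings_match(a: Reading, b: Reading, eps: int = 0) -> bool:
    """Assumed matching rule: every dimension differs by at most eps."""
    return all(abs(x - y) <= eps for x, y in zip(a, b))


def lcs_length(s: List[Reading], t: List[Reading], eps: int = 0) -> int:
    """Classical O(n*m) dynamic programming LCS over multivariate readings."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if readings_match(s[i - 1], t[j - 1], eps):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def similarity_index(s: List[Reading], t: List[Reading], eps: int = 0) -> float:
    """Assumed normalisation: LCS length over the shorter series length."""
    return lcs_length(s, t, eps) / min(len(s), len(t))


print(lcs_length(S, T))         # 7 rows match exactly and in order
print(similarity_index(S, T))   # 0.7
print(lcs_length(S, T, eps=1))  # 8 once a one-unit humidity difference is tolerated
```

With exact matching, seven of the ten readings in "S" reappear in the same order in "T"; allowing a tolerance of one unit per dimension also pairs the two 00:00 readings, which differ only in humidity (82 versus 81).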
Figure 2. Sliding window-based control parameter to reduce the total number of comparisons needed to find a match in both data sets. The x-axis shows the time at which the temperature data were collected, and the y-axis shows the temperature value in °C.
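One way to picture how a window can cut the number of comparisons is a banded LCS, in which element i of one series is compared only with elements of the other series whose positions lie within k of i. The sketch below is a generic illustration under that assumption and is not taken from the paper: the function name `windowed_lcs`, the parameter `k`, and the comparison counter are hypothetical, and the input is the temperature column of the sample tables "S" and "T".

```python
from typing import List, Tuple

# Temperature columns of the sample tables "S" and "T".
S_TEMP = [35, 39, 36, 35, 37, 36, 38, 35, 38, 36]
T_TEMP = [37, 35, 36, 38, 35, 37, 38, 35, 38, 36]


def windowed_lcs(s: List[int], t: List[int], k: int) -> Tuple[int, int]:
    """Return (lcs_length, comparisons) using only cells with |i - j| <= k.

    Cells outside the band are never compared and keep the value 0, so the
    result is a lower bound on the unrestricted LCS length.
    """
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    comparisons = 0
    best = 0
    for i in range(1, n + 1):
        lo, hi = max(1, i - k), min(m, i + k)
        for j in range(lo, hi + 1):
            comparisons += 1
            # Diagonal step gains 1 on a match and carries the value otherwise.
            diag = dp[i - 1][j - 1] + (1 if s[i - 1] == t[j - 1] else 0)
            dp[i][j] = max(diag, dp[i - 1][j], dp[i][j - 1])
            best = max(best, dp[i][j])
    return best, comparisons


full_len, full_cmp = windowed_lcs(S_TEMP, T_TEMP, k=len(S_TEMP))
win_len, win_cmp = windowed_lcs(S_TEMP, T_TEMP, k=2)
print(full_len, full_cmp)  # 8 100
print(win_len, win_cmp)    # 8 44
```

On these two short series, a window of k = 2 finds the same subsequence length as the full table while evaluating 44 cells instead of 100. A window that is too narrow for the actual time shift between the series would miss matches, so the choice of k is a trade-off.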
Figure 3. Comparison of the computational times of the proposed and dynamic programming-based algorithms, using data sets of constant length and variable dimensionality. The lowest lines (i.e., the minimum values) reflect better performance.
Figure 4. Comparison of the computational times of the proposed and dynamic programming-based algorithms, using data sets of variable length and constant dimensionality. The lowest lines (i.e., the minimum values) reflect better performance.
Figure 5. Computational time (in seconds) of the proposed algorithm on larger data sets (12,000 × 12,000 values). The computational time increases linearly with the size of the data sets.
Figure 6. Comparison of the computational times of the proposed and dynamic programming-based algorithms, using data sets of constant length and dimensionality. The lowest lines (i.e., the minimum values) reflect better performance.
Comparison of the computational times (in seconds) of the proposed and dynamic programming-based algorithms on benchmark data sets (minimum values, shown in bold face, are better) [31,32,34].
| Data Sets | DP-Based Algorithm with k-2 (s) | DP-Based Algorithm (s) | Proposed Algorithm (s) |
|---|---|---|---|
| Hobolink-500 | 1.1091 | 1.1880 | **0.9220** |
| Amex-500 | 0.625 | 0.8060 | **0.5470** |
| Robotic-300 | 0.2810 | 0.3170 | **0.2650** |
| Haptitus-500 | 0.4220 | 0.4561 | **0.3910** |
| Twopattern-500 | 1.1720 | 1.2138 | **0.7810** |
| Bacteria-700 | 1.3910 | 1.7430 | **1.0470** |
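The relative savings implied by the benchmark table can be worked out directly from its values. The short sketch below computes the percentage reduction in computational time of the proposed algorithm against each of the two DP-based columns, taken in the order they appear in the table. The names `TIMES` and `reduction_pct` are illustrative, and the formula (baseline minus proposed, divided by baseline) is an assumption about how "more efficient" is measured.

```python
# (data set, first DP column, second DP column, proposed), all in seconds,
# copied from the benchmark table in the order the columns appear.
TIMES = [
    ("Hobolink-500",   1.1091, 1.1880, 0.9220),
    ("Amex-500",       0.625,  0.8060, 0.5470),
    ("Robotic-300",    0.2810, 0.3170, 0.2650),
    ("Haptitus-500",   0.4220, 0.4561, 0.3910),
    ("Twopattern-500", 1.1720, 1.2138, 0.7810),
    ("Bacteria-700",   1.3910, 1.7430, 1.0470),
]


def reduction_pct(baseline: float, proposed: float) -> float:
    """Assumed efficiency measure: relative time saved versus the baseline."""
    return 100.0 * (baseline - proposed) / baseline


rows = [
    (name, reduction_pct(dp1, prop), reduction_pct(dp2, prop))
    for name, dp1, dp2, prop in TIMES
]
for name, vs_dp1, vs_dp2 in rows:
    print(f"{name:15s} {vs_dp1:5.1f}% {vs_dp2:5.1f}%")

# Largest saving over either baseline across all data sets.
best = max(rows, key=lambda r: max(r[1], r[2]))
print(best[0], round(max(best[1], best[2]), 1))  # Bacteria-700 39.9
```

Under this measure, the largest saving in the table is about 39.9%, for Bacteria-700 against the second DP-based column, which appears consistent with the figure quoted in the abstract. The savings for the other data sets are smaller.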