Ricardo Perez-Castillo, Ana G. Carretero, Ismael Caballero, Moises Rodriguez, Mario Piattini, Alejandro Mate, Sunho Kim, Dongwoo Lee.
Abstract
The Internet-of-Things (IoT) introduces several technical and managerial challenges in the use of data generated and exchanged by and between the various Smart, Connected Products (SCPs) that make up an IoT system (i.e., physical, intelligent devices with sensors and actuators). Beyond the volume of data and the heterogeneity of how it is exchanged and consumed, it is paramount to ensure that data quality levels are maintained at every step of the data chain/lifecycle; otherwise, the system may fail to perform its expected function. While Data Quality (DQ) is a mature field, existing solutions are highly heterogeneous. Therefore, we propose that companies, developers and vendors align their data quality management mechanisms and artefacts with well-known best practices and standards, for example, those provided by ISO 8000-61. This standard enables a process approach to data quality management, overcoming the difficulties of isolated data quality activities. This paper introduces DAQUA-MASS, a methodology based on ISO 8000-61 for data quality management in sensor networks. The methodology consists of four steps that follow Deming's Plan-Do-Check-Act cycle.
Keywords: ISO 8000-61; Internet-of-Things; IoT; SCPs; Smart, Connected Products; data quality; data quality in sensors; data quality management processes
Year: 2018 PMID: 30223516 PMCID: PMC6165119 DOI: 10.3390/s18093105
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Layers in SCP environments.
SCP factors that can ultimately affect DQ levels, according to [6].
| SCP Factor | Side Effect in Data Quality | Acquisition | Processing | Utilization |
|---|---|---|---|---|
| Deployment Scale | SCPs are expected to be deployed on a global scale. This leads to a huge heterogeneity of data sources (not only computers but also everyday objects). The sheer number of devices also increases the likelihood of errors. | X | X | |
| Resource constraints | Limited computational and storage capabilities (due, in turn, to battery-power constraints, among others) do not allow complex operations. | X | X | |
| Network | Intermittent loss of connection is common in the IoT. Things can only transmit small messages because of their scarce resources. | X | X | |
| Sensors | Embedded sensors may lack precision or suffer from loss of calibration or even low accuracy. Faulty sensors may also result in inconsistencies in data sensing. | X | ||
| Environment | SCP devices will not be deployed only in benign, tolerant environments: to monitor some phenomena, sensors may have to be deployed under extreme conditions. Data errors emerge when the sensor is affected by its surrounding environment [ | X | X | |
| Vandalism | Things are generally defenceless from outside physical threats (both from humans and animals). | X | X | |
| Fail-dirty | A sensor node fails but keeps reporting erroneous readings. This is a common problem in SCP networks and an important source of outlier readings. | X | X | |
| Privacy | Privacy-preserving processing may intentionally reduce DQ. | X | ||
| Security vulnerability | Sensor devices are vulnerable to attack, e.g., it is possible for a malicious entity to alter data in an SCP device. | X | X | |
| Data stream processing | Data gathered by smart things are sent as streams to the back-end pervasive applications that use them. Some stream processing operators can affect the quality of the underlying data [ | X | X | |
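
Since fail-dirty nodes are singled out above as an important source of outlier readings, a robust screen over a window of readings is one way to spot them. The sketch below is not part of DAQUA-MASS: it applies the modified z-score based on the median absolute deviation (MAD), with the commonly used cut-off of 3.5. The function name `mad_outliers` and its parameters are hypothetical.

```python
from statistics import median


def mad_outliers(readings, threshold=3.5):
    """Return the indices of readings whose modified z-score exceeds `threshold`.

    Modified z-score: M_i = 0.6745 * (x_i - median) / MAD, where MAD is the
    median absolute deviation of the readings from their median.
    """
    med = median(readings)
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # At least half the readings equal the median: flag any that differ.
        return [i for i, x in enumerate(readings) if x != med]
    return [i for i, x in enumerate(readings)
            if abs(0.6745 * (x - med) / mad) > threshold]
```

For example, `mad_outliers([21.0, 21.2, 20.9, 21.1, 85.0, 21.0, 20.8])` returns `[4]`. Medians are used instead of the mean and standard deviation because a single large faulty reading would otherwise inflate the spread and mask itself.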
Sensor errors that give rise to DQ problems in SCP environments (adapted from [8]).
| Error | Description |
|---|---|
| Temporal delay error | The observations are continuously produced with a constant temporal deviation. |
| Constant or offset error | The observations continuously deviate from the expected value by a constant offset. |
| Continuous varying or drifting error | The deviation between the observations and the expected value changes continuously according to some time-dependent function (linear or non-linear). |
| Crash or jammed error | The sensor stops providing any readings on its interface, or gets jammed and stuck at some incorrect value. |
| Trimming error | Data are correct for values within some interval but are modified for values outside it. Beyond the interval, the data can be trimmed or may vary proportionally. |
| Outliers error | The observations occasionally deviate from the expected value at random points in the time domain. |
| Noise error | The observations deviate from the expected value stochastically in the value domain and permanently in the temporal domain. |
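
The error models in the table can be made concrete by injecting each one into a clean series, which is useful for exercising DQ checks before deployment. This is a minimal sketch under stated assumptions: time is a uniform step index, drift is linear, noise is Gaussian, and trimming clips values to the interval bounds (the proportional variant is not modelled). All function names are hypothetical.

```python
import random

# Hypothetical helpers: each takes a clean series (one value per uniform time
# step) and returns a corrupted copy following one row of the table above.


def temporal_delay(series, lag):
    """Observations lag the true signal by a constant `lag` steps."""
    if not 0 <= lag <= len(series):
        raise ValueError("lag must be between 0 and len(series)")
    pad = [series[0]] * lag if lag else []  # repeat the first value to keep the length
    return pad + series[: len(series) - lag]


def constant_offset(series, offset):
    """Every observation deviates from the expected value by `offset`."""
    return [x + offset for x in series]


def drift(series, rate):
    """Deviation grows linearly with time (rate * t); a non-linear drift
    would use a different function of t."""
    return [x + rate * t for t, x in enumerate(series)]


def jammed(series, start, stuck_value):
    """From step `start` onwards the sensor is stuck at `stuck_value`."""
    return series[:start] + [stuck_value] * (len(series) - start)


def trimming(series, low, high):
    """Values outside [low, high] are clipped to the interval bounds."""
    return [min(max(x, low), high) for x in series]


def outliers(series, prob, magnitude, rng):
    """Each observation is, with probability `prob`, shifted by +/- `magnitude`."""
    return [x + rng.choice((-1, 1)) * magnitude if rng.random() < prob else x
            for x in series]


def noise(series, sigma, rng):
    """Zero-mean Gaussian noise on every observation (an assumed noise model)."""
    return [x + rng.gauss(0.0, sigma) for x in series]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded for reproducible corruption
    clean = [20.0 + 0.1 * t for t in range(10)]
    print(drift(clean, 0.05))
    print(outliers(clean, 0.2, 5.0, rng))
```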
SCP Network Errors.
| Sensor Fault | DQ Problem | Root Cause | Solution |
|---|---|---|---|
| Omission faults | Absence of values | Missing sensor | Network reliability, retransmission |
| Crash faults (fading/intermittent) | Inaccuracy/absence of values | Environment interference | Redundancy/estimating with past values |
| Delay faults | Inaccuracy | Time domain | Timeline solutions |
| Message corruption | Integrity | Communication | Integrity validation |
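
For the omission and message-corruption rows, the suggested remedies (integrity validation, and estimating with past values) can be sketched as follows. CRC-32 is only an example checksum, the 12-byte message layout is an assumption, and the moving-mean estimator is one simple choice among many; none of these is prescribed by the table, and the function names are hypothetical.

```python
import struct
import zlib


def pack_reading(value):
    """Frame a reading as an 8-byte big-endian double followed by its CRC-32."""
    payload = struct.pack("!d", value)
    return payload + struct.pack("!I", zlib.crc32(payload))


def unpack_reading(message):
    """Return the reading, or None if the frame is malformed or corrupted."""
    if len(message) != 12:
        return None
    payload = message[:8]
    (crc,) = struct.unpack("!I", message[8:])
    if zlib.crc32(payload) != crc:
        return None  # integrity validation failed: treat as an omitted value
    return struct.unpack("!d", payload)[0]


def fill_missing(values, window=3):
    """Replace None entries with the mean of up to `window` previous valid
    readings. Estimates are not fed back into the history, and gaps before
    the first valid reading stay None."""
    filled, history = [], []
    for v in values:
        if v is None:
            recent = history[-window:]
            v = sum(recent) / len(recent) if recent else None
        else:
            history.append(v)
        filled.append(v)
    return filled
```

A corrupted frame decodes to `None` and can then be passed through `fill_missing` like any omitted reading. Note that a CRC detects accidental corruption in transit but not deliberate tampering; the security-vulnerability factor listed earlier would call for an authenticated check such as an HMAC.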
DQ Characteristics in ISO/IEC 25012 that can be affected by sensor data errors.
| DQ Characteristics | Inherent | System Dependent | Temporal Delay Error | Constant or Offset Error | Continuous Varying/Drifting Error | Crash or Jammed Error | Trimming Error | Outliers Error | Noise Error |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | x | P | S | S | P | P | |||
| Completeness | x | P | S | S | |||||
| Consistency | x | S | P | S | S | ||||
| Credibility | x | S | S | P | S | ||||
| Currentness | x | P | S | ||||||
| Accessibility | x | x | |||||||
| Compliance | x | x | |||||||
| Confidentiality | x | x | |||||||
| Efficiency | x | x | |||||||
| Precision | x | x | P | S | |||||
| Traceability | x | x | S | S | |||||
| Understandability | x | x | |||||||
| Availability | x | S | S | ||||||
| Portability | x | ||||||||
| Recoverability | x | S | S |
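
Several of the characteristics above (for example, Completeness and Currentness) must be measured before they can be controlled. The two functions below are simplified ratio measures written for illustration; they are assumptions, not the metrics defined in the paper or the measures specified in ISO/IEC 25024, and the `Reading` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    timestamp: float          # time the value was sensed, in seconds
    value: Optional[float]    # None marks an empty/missing value


def completeness(readings, expected_count):
    """Share of the expected readings that arrived with a non-missing value.
    Returns 1.0 when nothing was expected (a design choice)."""
    if expected_count <= 0:
        return 1.0
    present = sum(1 for r in readings if r.value is not None)
    return min(present / expected_count, 1.0)


def currentness(readings, now, max_age):
    """Share of received readings that are at most `max_age` seconds old
    at evaluation time `now`."""
    if not readings:
        return 0.0
    fresh = sum(1 for r in readings if now - r.timestamp <= max_age)
    return fresh / len(readings)
```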
Figure 2. DAQUA-MASS methodology phases and steps.
Inputs, outputs and RACIQ matrix for the Plan phase (R: Responsible (works on); A: Accountable; C: Consulted; I: Informed; Q: Quality Reviewer).
| Step | Activity | Input | Output | CIO | CDO | DGM | SCP DQ Steward | DQ Steward | SCP Arch |
|---|---|---|---|---|---|---|---|---|---|
| P1 | P1-1 | List of prioritized data quality requirements; Perform analysis definition | Specification of data lifecycle in SCP environments | I | A | C | R | | |
| | P1-2 | Specification of data lifecycle in SCP environments | Data Quality Policies for SCPs; Data Quality Procedures for SCPs | I | A | R | R | C | |
| | P1-3 | Specification of data lifecycle in SCP environments; Data Quality Policies for SCPs; Data Quality Procedures for SCPs | Meta-data specification | I | C | A | C | | |
| | P1-4 | Specification of data lifecycle in SCP environments; Stakeholders or any other experts opinion | Data Quality Strategy; List of prioritized data quality requirements | I | A | R | R | R | C |
| P2 | P2-1 | List of prioritized data quality requirements; Mechanisms for data monitoring | Metrics list; Measurement plan | I | A | C | R | | |
| | P2-2 | Metric list; Measurement plan; List of prioritized data quality requirements | Implementation of measurement methods | A | Q | Q | R | | |
| P3 | P3-1 | DQ Issues list; Data Quality Strategy; List of prioritized data quality requirements; Business process definition | Root causes for DQ issues related with SCP faults; Report with the effects of DQ issues on business processes | I | A | R | R | R | |
| | P3-2 | Data Quality Strategy; List of prioritized data quality requirements; Business process definition | DQ risk catalogue | I | I | A | R | R | C |
| | P3-3 | Root causes for DQ issues related with SCP faults; Report with the effects of DQ issues on business processes; Specification of data lifecycle in SCP environments; SCP node placement plan; SCP node replication plan | Solution definition for mitigating root causes | I | A | R | R | R | |
| | P3-4 | Data Quality Strategy; Business process definition; DQ risk catalogue; Specification of data lifecycle in SCP environments | Improvement target list | | | | | | |
| | P3-5 | Data Quality Strategy; List of prioritized data quality requirements; Report with the effects of DQ issues on business processes; DQ risk catalogue | Data quality enhancement plan | I | Q | A | R | R | C |
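
A RACIQ matrix like the one above can be kept as data and checked mechanically. The sketch below assumes the common RACI convention of exactly one Accountable and at least one Responsible per activity; the source does not state this rule, and the function name and the sample assignments are hypothetical (they are not copied from the table).

```python
VALID_CODES = set("RACIQ")


def check_raciq(matrix):
    """Return a list of problems found in a RACIQ matrix.

    `matrix` maps an activity id to {role: code}, with codes drawn from
    R, A, C, I and Q. Assumed rule (conventional for RACI charts, not stated
    in the source): exactly one Accountable and at least one Responsible.
    """
    problems = []
    for activity, assignment in matrix.items():
        codes = list(assignment.values())
        unknown = sorted(set(codes) - VALID_CODES)
        if unknown:
            problems.append(f"{activity}: unknown codes {unknown}")
        accountable = codes.count("A")
        if accountable != 1:
            problems.append(f"{activity}: expected exactly one A, found {accountable}")
        if "R" not in codes:
            problems.append(f"{activity}: no R assigned")
    return problems


if __name__ == "__main__":
    # Hypothetical assignments for illustration only.
    sample = {
        "X-1": {"CDO": "A", "SCP DQ Steward": "R", "CIO": "I"},
        "X-2": {"CDO": "A", "DGM": "A", "CIO": "I"},
    }
    for problem in check_raciq(sample):
        print(problem)
```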
Inputs, outputs and RACIQ matrix for the Do phase (R: Responsible (works on); A: Accountable; C: Consulted; I: Informed; Q: Quality Reviewer).
| Step | Activity | Input | Output | CIO | CDO | DGM | SCP DQ Steward | DQ Steward | SCP Arch |
|---|---|---|---|---|---|---|---|---|---|
| D1 | D1-1 | Meta-data specification; Specification of data lifecycle in SCP environments | Sensor flags specification | A | Q | R | | | |
| | D1-2 | Specification of data lifecycle in SCP environments | SCP node placement plan update; SCP node replication plan update | I | A | Q | R | | |
| | | Root causes for DQ issues related with SCP faults; Specification of data lifecycle in SCP environments | SCP hardware updates; SCP software updates | I | A | Q | R | | |
| | D1-3 | Root causes for DQ issues related with SCP faults; Report with the effects of DQ issues on business processes; SCP node placement plan; SCP node replication plan | Implementation of data cleansing mechanisms | I | A | R | C | | |
| | D1-4 | Data quality enhancement plan; DQ risk catalogue; Mechanisms for data monitoring | Human inspection plan | I | A | Q | R | C | |
| | D1-5 | | Automated alert system | I | C | A | | | |
| | D1-6 | Root causes for DQ issues related with SCP faults; Report with the effects of DQ issues on business processes; Specification of data lifecycle in SCP environments; SCP node placement plan; SCP node replication plan | Sensor maintenance plan | I | A | Q | R | | |
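
Activities D1-1 (sensor flags specification) and D1-5 (automated alert system) can be pictured as a per-reading flagging step that feeds an alert rule. In the sketch below the flag set, the valid range, the stuck-run heuristic for crash or jammed errors and the 20% alert ratio are all hypothetical parameters, not values taken from the methodology.

```python
from enum import Enum


class Flag(Enum):
    OK = "ok"
    MISSING = "missing"            # no value received
    OUT_OF_RANGE = "out_of_range"  # outside the physically plausible range
    STUCK = "stuck"                # possible crash or jammed error


def flag_readings(values, low, high, stuck_run=5):
    """Return one Flag per reading (None means the reading is missing)."""
    flags = []
    prev, run = None, 0
    for v in values:
        if v is None:
            flags.append(Flag.MISSING)
            prev, run = None, 0
            continue
        # Length of the current run of identical consecutive values.
        run = run + 1 if v == prev else 1
        prev = v
        if not low <= v <= high:
            flags.append(Flag.OUT_OF_RANGE)
        elif run >= stuck_run:
            flags.append(Flag.STUCK)
        else:
            flags.append(Flag.OK)
    return flags


def should_alert(flags, max_bad_ratio=0.2):
    """Alert when the share of non-OK flags exceeds `max_bad_ratio`."""
    if not flags:
        return False
    bad = sum(1 for f in flags if f is not Flag.OK)
    return bad / len(flags) > max_bad_ratio


if __name__ == "__main__":
    window = [20.0, None, 80.0, 21.0, 21.0, 21.0]
    flags = flag_readings(window, low=0.0, high=50.0, stuck_run=3)
    print([f.value for f in flags], should_alert(flags))
```

A sensor observing a genuinely constant quantity will also trip the stuck check, so `stuck_run` would need tuning per deployment.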
Inputs, outputs and RACIQ matrix for the Check phase (R: Responsible (works on); A: Accountable; C: Consulted; I: Informed; Q: Quality Reviewer).
| Step | Activity | Input | Output | CIO | CDO | DGM | SCP DQ Steward | DQ Steward | SCP Arch |
|---|---|---|---|---|---|---|---|---|---|
| C1 | C1-1 | Data quality enhancement plan; List of prioritized data quality requirements; DQ risk catalogue | Mechanisms for data monitoring update; DQ control mechanisms update | I | A | R | R | R | |
| | C1-2 | Specification of data lifecycle in SCP environments; SCP node placement and replication plan | Interstice comparison plan | I | A | Q | R | | |
| C2 | C2-1 | Data Quality Strategy; DQ risk catalogue; Implementation of measurement methods | DQ Issues list | I | I | A | R | Q | |
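
Activity C2-1 produces the DQ issues list from the measurement results. One way to picture that step, assuming each prioritized requirement can be expressed as a minimum acceptable value for a measured characteristic, is shown below; the `Requirement` type and the `dq_issues` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Requirement:
    characteristic: str  # e.g. an ISO/IEC 25012 characteristic such as "Completeness"
    min_value: float     # lowest acceptable measured value
    priority: int        # 1 = highest priority


def dq_issues(requirements: List[Requirement],
              measurements: Dict[str, float]) -> List[Tuple[str, Optional[float], float]]:
    """List (characteristic, measured, required) for every requirement that is
    unmet or was not measured, highest priority first."""
    issues = []
    for req in sorted(requirements, key=lambda r: r.priority):
        measured = measurements.get(req.characteristic)
        if measured is None or measured < req.min_value:
            issues.append((req.characteristic, measured, req.min_value))
    return issues
```

Treating an unmeasured requirement as an issue is a deliberate choice here: a characteristic nobody measures cannot be shown to meet its target.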
Inputs, outputs and RACIQ matrix for the Act phase (R: Responsible (works on); A: Accountable; C: Consulted; I: Informed; Q: Quality Reviewer).
| Step | Activity | Input | Output | CIO | CDO | DGM | SCP DQ Steward | DQ Steward | SCP Arch |
|---|---|---|---|---|---|---|---|---|---|
| A1 | A1-1 | Sensor maintenance plan; SCP node placement plan; SCP node replication plan; Root causes for DQ issues related with SCP faults; Report with the effects of DQ issues on business processes; Specification of data lifecycle in SCP environments | Sensor calibration plan; Replacement ready access strategy; Sensor maintenance plan | I | A | C | R | | |
| | A1-2 | | Device replacement plan | I | A | C | R | | |
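
The sensor calibration plan produced by A1-1 is not tied to a specific technique in the table. A common approach, assumed here, is to compare a sensor against a co-located reference instrument and fit a linear gain and offset by ordinary least squares, which corrects constant or offset errors (and gain errors); drifting errors would require repeating the fit periodically. The function names are hypothetical.

```python
def fit_linear_calibration(raw, reference):
    """Ordinary least-squares fit of reference = gain * raw + offset.

    `raw` and `reference` are paired readings taken at the same instants by the
    sensor under calibration and by a trusted reference instrument.
    """
    n = len(raw)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired readings")
    mean_x = sum(raw) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    if sxx == 0:
        raise ValueError("raw readings must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, reference))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset


def apply_calibration(values, gain, offset):
    """Correct new raw readings with a previously fitted gain and offset."""
    return [gain * v + offset for v in values]
```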