Lucas Pereira, Nuno Velosa, Manuel Pereira.
Abstract
There is a general consensus in the Non-Intrusive Load Monitoring research community on the importance of public datasets for advancing the field. Still, despite considerable efforts to release public data, what is currently available suffers from serious issues, among them the lack of widely accepted data models and common interfaces for accessing current and future datasets. This paper proposes the Energy Monitoring and Disaggregation Data Format (EMD-DF64). EMD-DF64 is a data model, file format, and application programming interface developed to provide a single interface to create, manage, and access high-frequency (≥ 1 Hz) electric energy consumption datasets. More precisely, the paper describes the data model and its implementation, which builds on the well-known Sony WAVE64 format for storing audio data and metadata annotations.
Year: 2022 PMID: 35717423 PMCID: PMC9206669 DOI: 10.1038/s41598-022-14517-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. EMD-DF64: Data model overview.
List of chunks that compose the RIFF-WAVE64 file format.
| Name | GUID | Description | Parent | EMD-DF64 |
|---|---|---|---|---|
| RIFFᵃ | ‘riff’ | This is the main chunk and is mandatory for every file that is based on the RIFF standard | – | |
| WAVEᵃ,ᵇ | ‘wave’ | Identifies the contents of the RIFF chunk as being of the type w64 | riff | |
| Formatᵃ | ‘fmt ’ | Defines the data format, e.g. sampling rate, sample size in bits, and number of channels | riff | |
| Dataᵃ | ‘data’ | Waveform data can be stored as a single contiguous array of interleaved samples or as a discrete sequence of blocks of samples and silence wrapped in a ‘wavl’ chunk | riff | |
| Wave list | ‘wavl’ | Wraps sequences of data and silence chunks | data | – |
| Silent | ‘slnt’ | Represents silence and is defined as a count of silence samples | wavl | – |
| Fact | ‘fact’ | Stores information about how the waveform data is organized. It is mandatory when the waveform data is stored in a ‘wavl’ chunk and for all compressed audio formats | riff | – |
| Cue | ‘cue ’ | Identifies a series of positions in the waveform data as having additional information associated with them. There is at most one cue chunk per wave file, and it is followed by a list of cue points | riff | |
| List | ‘list’ | This is a wrapper for RIFF chunks, which in the particular case of WAVE64 files is an associated data list (“adtl”) | riff | |
| Associated data listᵃ,ᵇ | ‘adtl’ | Identifies a list that contains individual information attached to the cue points defined in the cue chunk | list | |
| Label | ‘labl’ | Associates a text label to a specific cue point. Must be defined inside the associated data list chunk | adtl | |
| Note | ‘note’ | Same as label, but usually contains comment text for a specific cue point | adtl | |
| Labeled text | ‘ltxt’ | Associates a text comment with a specific region of waveform data. A region is a cue point whose duration, in samples, is defined in this chunk. Must be defined inside the associated data list chunk | adtl | |
| Embedded file info | ‘file’ | Contains information described in other file formats (e.g. ASCII text files) that is associated with a particular cue point | adtl | – |
| Playlist | ‘plst’ | Specifies a play order for a series of cue points | riff | – |
| Infoᵇ | | Identifies a list that contains the info chunks | riff | |
a These chunks are mandatory.
b ‘WAVE’, ‘adtl’, and ‘INFO’ are chunk identifiers.
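The chunk list above can be explored with a small reader. What follows is a minimal sketch, not the EMD-DF64 implementation: it assumes the general Sony Wave64 layout, in which each chunk starts with a 16-byte GUID whose first four bytes are the ASCII tag shown in the table, followed by a 64-bit little-endian size that includes the 24-byte header, with chunks aligned to 8 bytes. The helpers `make_chunk` and `iter_chunks` are hypothetical names, and the remaining 12 GUID bytes are zero placeholders rather than the real GUID values.

```python
import io
import struct

HEADER_SIZE = 24  # 16-byte GUID + 8-byte little-endian size (assumed Wave64 layout)


def pad8(n: int) -> int:
    """Round n up to the next multiple of 8 (assumed chunk alignment)."""
    return (n + 7) & ~7


def make_chunk(tag: bytes, payload: bytes) -> bytes:
    """Build one synthetic chunk; the 12 GUID bytes after the tag are placeholders."""
    guid = tag.ljust(4, b" ") + bytes(12)
    size = HEADER_SIZE + len(payload)  # the size field counts the header too
    body = guid + struct.pack("<Q", size) + payload
    return body + bytes(pad8(len(body)) - len(body))


def iter_chunks(stream, start: int = 0):
    """Yield (tag, payload_offset, payload_size) for each chunk from `start`."""
    stream.seek(start)
    while True:
        header = stream.read(HEADER_SIZE)
        if len(header) < HEADER_SIZE:
            return  # end of stream (or a truncated header)
        tag = header[:4].decode("ascii", errors="replace")
        (size,) = struct.unpack("<Q", header[16:24])
        if size < HEADER_SIZE:
            raise ValueError(f"corrupt chunk size {size} for {tag!r}")
        payload_offset = stream.tell()
        yield tag, payload_offset, size - HEADER_SIZE
        # Jump to the next chunk, which starts on an 8-byte boundary.
        stream.seek(payload_offset - HEADER_SIZE + pad8(size))


# Usage on an in-memory buffer holding three synthetic chunks.
demo = (
    make_chunk(b"fmt ", b"\x01" * 16)
    + make_chunk(b"data", b"\x02" * 5)
    + make_chunk(b"cue ", b"")
)
for tag, offset, size in iter_chunks(io.BytesIO(demo)):
    print(repr(tag), offset, size)
```

On a real file the walk would presumably start after the ‘riff’ header and the ‘wave’ identifier rather than at offset 0; that offset is left as the `start` parameter because it is not stated in the table.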
Figure 2. Chunk structure of the EMD-DF64 file format.
List of EMD-DF64 specific chunks.
| Name | GUID | Description | Parent |
|---|---|---|---|
| Configᵃ | “CNFG” | List with file-specific configuration chunks | riff |
| Timestampᵃ | “TMSP” | Unix timestamp of the first sample in the waveform data | CNFG |
| Timezoneᵃ | “TMZN” | Timezone of the place where the data was collected | CNFG |
| Sampling rateᵃ | “SPRT” | Sampling rate of the waveform data (overrides the original value in the format chunk if the actual sampling rate is lower than 1 Hz) | CNFG |
| Calibration constantsᵃ | “CHCC” | Calibration constants to recreate the original values of the waveform data. One constant for each channel | CNFG |
| Missing data list | “MDL” | List with missing data identifiers | riff |
| Missing data | “MDAT” | Chunk containing a missing data identifier | MDL |
| Annotation | “ANNO” | Identifies a list that contains metadata and comment chunks | riff |
| Metadata | “META” | This is a metadata-specific chunk. Must be contained in the Annotation chunk | ANNO |
| Comment | “COMT” | This is a comment-specific chunk. Must be contained in the Annotation chunk | ANNO |
a These chunks are mandatory.
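The mandatory configuration chunks suggest how a reader could recover the time and physical value of any sample. The sketch below illustrates one way this might work under stated assumptions: the `Config` class and both functions are hypothetical names, the timestamp is taken to be in seconds, sampling is taken to be uniform, and calibration is taken to be a per-channel multiplication (the table only says the constants recreate the original values, not how).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Sequence


@dataclass
class Config:
    """Hypothetical container mirroring the mandatory CNFG sub-chunks."""
    timestamp: float              # TMSP: Unix time of the first sample (seconds assumed)
    tz_name: str                  # TMZN: timezone where the data was collected
    sampling_rate: float          # SPRT: samples per second, may be below 1 Hz
    calibration: Sequence[float]  # CHCC: one constant per channel


def sample_time(cfg: Config, index: int) -> datetime:
    """UTC time of sample `index`, assuming uniform sampling from TMSP."""
    if cfg.sampling_rate <= 0:
        raise ValueError("sampling rate must be positive")
    start = datetime.fromtimestamp(cfg.timestamp, tz=timezone.utc)
    return start + timedelta(seconds=index / cfg.sampling_rate)


def calibrate(cfg: Config, frame: Sequence[float]) -> List[float]:
    """Scale one frame of raw channel values; multiplication is an assumption."""
    if len(frame) != len(cfg.calibration):
        raise ValueError("one calibration constant per channel is required")
    return [raw * k for raw, k in zip(frame, cfg.calibration)]


# Usage: two channels (e.g. current and voltage) sampled at 0.5 Hz.
cfg = Config(timestamp=0.0, tz_name="UTC", sampling_rate=0.5, calibration=[2.0, 0.5])
print(sample_time(cfg, 3).isoformat())
print(calibrate(cfg, [1.0, 4.0]))
```

Keeping the sampling rate as a float reflects the note in the table that SPRT exists for rates lower than 1 Hz, which an integer sampling-rate field cannot express.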
List of RIFF metadata chunks supported by EMD-DF64.
| Name | GUID | Description |
|---|---|---|
| File creator | “IART” | The name of the file creator |
| Commissioner | “ICMS” | The name of the dataset commissioner |
| Comments | “ICMT” | Free text comment |
| Copyright | “ICOP” | Dataset copyright notice |
| Creation date | “ICRD” | Date of dataset creation |
| Keywords | “IKEY” | A list of keywords to describe the dataset content |
| Name | “INAM” | The dataset name |
| Product | “IPRD” | Original purpose of the dataset |
| Subject | “ISBJ” | Contents of the file (e.g., current and voltage waveforms) |
| Software | “ISFT” | Name of the software that was used to create the file |
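As an illustration of how dataset-level metadata could be prepared before writing, the sketch below maps readable field names to a subset of the four-character identifiers in the table. It is only a sketch: the field names, `INFO_TAGS`, and `to_info_chunks` are hypothetical and are not part of the EMD-DF64 API, and no byte-level encoding is attempted.

```python
from typing import Dict

# Readable names (hypothetical) mapped to identifiers taken from the table.
INFO_TAGS: Dict[str, str] = {
    "creator": "IART",
    "comments": "ICMT",
    "copyright": "ICOP",
    "creation_date": "ICRD",
    "keywords": "IKEY",
    "name": "INAM",
    "product": "IPRD",
    "subject": "ISBJ",
    "software": "ISFT",
}


def to_info_chunks(fields: Dict[str, object]) -> Dict[str, str]:
    """Translate readable field names into tag -> text pairs, rejecting unknown names."""
    unknown = sorted(set(fields) - set(INFO_TAGS))
    if unknown:
        raise KeyError(f"unsupported metadata fields: {unknown}")
    return {INFO_TAGS[key]: str(value) for key, value in fields.items()}


# Usage with placeholder values.
print(to_info_chunks({"name": "Example dataset", "subject": "current and voltage waveforms"}))
```

Rejecting unknown field names up front keeps a typo from silently producing a file without the intended metadata.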
Figure 3. EMD-DF64 class diagram.
Description of the demos provided along with the EMD-DF64 software package.
| Demo | Objective |
|---|---|
| Explorer | Demonstrates different features of the EMD-DF64. The features include: (1) reading an existing file, (2) writing a new file, (3) updating an existing file, (4) converting between file formats, and (5) merging files. |
| BLUED | Demonstrates the conversion of the BLUED dataset. This demo includes adding appliance labels, appliance activities, and metadata (local notes, comments, and RIFF Info chunks) |
| SustDataED | Demonstrates the conversion of the SustDataED dataset. This demo includes adding labels to datasets with missing data, as well as merging dataset files |
| Python_demo | Python notebook that demonstrates the use of pyemddf to handle EMD-DF64 files in Python |
| MATLAB_demo | MATLAB script that demonstrates the handling of EMD-DF64 files using the EMDDF.m class |
Comparison of file formats using BLUED as a reference dataset.
| | TXT | FLAC | EMD-DF (WAVE) | EMD-DF64 (W64) |
|---|---|---|---|---|
| Current and voltage files | 6430 | 32 | 14 | 2 |
| Total size (GB) | 320ᵃ | 27 | 55 | 51 |
| WavPack compress (GB) | –ᵇ | –ᵇ | 30 | 28 |
a 54 GB when compressed using RAR (see https://www.rarlab.com/).
b Not applicable.
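The relative savings implied by the comparison can be worked out directly from the table. The short sketch below does this arithmetic; it assumes the total sizes share one unit (GB, as in the WavPack row and the RAR footnote), and the dictionary and function names are hypothetical.

```python
# Values copied from the comparison table; GB is assumed for the totals.
TOTAL_SIZE = {"TXT": 320, "FLAC": 27, "EMD-DF (WAVE)": 55, "EMD-DF64 (W64)": 51}
WAVPACK_SIZE = {"EMD-DF (WAVE)": 30, "EMD-DF64 (W64)": 28}


def ratio(size: float, baseline: float) -> float:
    """Return size as a fraction of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return size / baseline


# Each format relative to the plain-text files.
for name, size in TOTAL_SIZE.items():
    print(f"{name}: {ratio(size, TOTAL_SIZE['TXT']):.1%} of TXT")

# Effect of WavPack compression on the two EMD-DF variants.
for name, size in WAVPACK_SIZE.items():
    print(f"{name} + WavPack: {ratio(size, TOTAL_SIZE[name]):.1%} of uncompressed")
```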