Jan Grewe, Thomas Wachtler, Jan Benda.
Abstract
Metadata providing information about the stimulus, data acquisition, and experimental conditions are indispensable for the analysis and management of experimental data within a lab. However, only rarely are metadata available in a structured, comprehensive, and machine-readable form. This poses a severe problem for finding and retrieving data, both in the laboratory and on the various emerging public databases. Here, we propose a simple format, the "open metaData Markup Language" (odML), for collecting and exchanging metadata in an automated, computer-based fashion. In odML, arbitrary metadata information is stored as extended key-value pairs in a hierarchical structure. Central to odML is a clear separation of format and content, i.e., neither keys nor values are defined by the format. This makes odML flexible enough for storing all available metadata instantly without the necessity to submit new keys to an ontology or controlled terminology. Common standard keys can be defined in odML-terminologies for guaranteeing interoperability. We started to define such terminologies for neurophysiological data, but aim at a community-driven extension and refinement of the proposed definitions. By customized terminologies that map to these standard terminologies, metadata can be named and organized as required or preferred without softening the standard. Together with the respective libraries provided for common programming languages, the odML format can be integrated into the laboratory workflow, facilitating automated collection of metadata information where it becomes available. The flexibility of odML also encourages a community-driven collection and definition of terms used for annotating data in the neurosciences.
Keywords: datamodel; datasharing; metadata; neuroscience; ontology
Year: 2011 PMID: 21941477 PMCID: PMC3171061 DOI: 10.3389/fninf.2011.00016
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
Figure 1. The flow of data and metadata in the sciences. The basis of this “food chain,” shown at the top, is the laboratory in which the data are originally recorded, stored, managed, and analyzed. Here metadata are important in many respects: data management uses them to categorize and organize the data; data analysis requires stimulus information and adds further, derived data characteristics, which in turn may be useful for querying the data. Data may further be shared with collaborators for discussion and re-evaluation. Eventually, data may be made available via public databases like the G-Node (Herz et al., 2008). On all levels, data exchange between people as well as between computer programs requires a detailed annotation of the raw data with metadata.
Figure 2. Open metaData Markup Language Entity-Relation diagram. The odML model is a tree structure of Sections and Properties. Connecting lines and “crow's feet” indicate the relationship between the entities. For example, a Section can contain 0 to many (n) Properties, each of which must in turn have at least 1 Value. The recursive connection of the Section indicates that there can be 0 to many subsections building the tree. Everything is enclosed in a RootSection that contains some document-related elements. Each element listed in the different entities may occur at most once.
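The entity relations described in this caption can be sketched as plain Python dataclasses. This is only an illustrative model, not the API of the odML libraries: the class and field names are assumptions chosen to mirror the entities in the diagram, and the gain value is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Value:
    data: str                          # the value itself, kept as text here
    unit: Optional[str] = None
    dtype: Optional[str] = None        # e.g. "float" or "string"
    uncertainty: Optional[str] = None


@dataclass
class Property:
    name: str
    values: List[Value]

    def __post_init__(self):
        # A Property must have at least one Value.
        if not self.values:
            raise ValueError(f"Property '{self.name}' needs at least one Value")


@dataclass
class Section:
    name: str
    type: str
    properties: List[Property] = field(default_factory=list)    # 0..n
    subsections: List["Section"] = field(default_factory=list)  # 0..n, recursive

    def walk(self, depth=0):
        """Yield (depth, section) pairs for this section and all subsections."""
        yield depth, self
        for sub in self.subsections:
            yield from sub.walk(depth + 1)


@dataclass
class RootSection:
    sections: List[Section]            # first-level sections of the document
    author: Optional[str] = None
    date: Optional[str] = None
    version: Optional[str] = None
    repository: Optional[str] = None


# Hypothetical example tree: a setup section holding one amplifier section.
amplifier = Section("AmplifierNo1", "hardware/amplifier",
                    properties=[Property("Gain", [Value("100", dtype="float")])])
setup = Section("Setup", "setup", subsections=[amplifier])
doc = RootSection(sections=[setup], author="J. Doe")

layout = [(depth, s.name) for top in doc.sections for depth, s in top.walk()]
print(layout)
```

Keeping the "at least one Value" rule in `__post_init__` means an invalid Property cannot be constructed at all, which is one simple way to enforce the cardinality shown by the crow's feet.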
Data types.
| Type | Description | Example |
|---|---|---|
| Int | Integer value | −1024 |
| Float | Floating point value | −3.1416 |
| String | Any short string of characters | A short comment |
| Text | Longer text potentially spanning several lines | A much longer text that might require more than one line |
| n-tuple | Tuples with n elements, e.g., the resolution of a screen in pixels, or coordinate information | (1024;768) |
| Date | Date in yyyy-mm-dd format | 2009-05-26 |
| Time | The local time in hh:mm:ss format | 11:51:00 |
| Date time | Date and time joined (“yyyy-mm-dd hh:mm:ss”-format) | 2009-05-26 11:51:00 |
| Boolean | True or false | True |
| URL | A resource (file) location on the local filesystem or on the web | |
| Binary | Binary content of, e.g., an image file (base64 encoded) | |
| Person | The entered value describes a person. Data type used for name matching in the library | John Doe or Doe, John, or J. Doe, etc. |
Valid data types for values and uncertainties of an odML-Property that should be used when specifying metadata. These types are not restricted by the format or the implementation; thus, new types could be invented.
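As a rough illustration of how a reading program might turn stored text into typed values, the sketch below maps some of the type names from the table above to Python converters. The `CONVERTERS` table and the `convert` function are assumptions made for this example and are not part of the odML libraries. Because the set of types is open, unknown type names fall back to plain strings.

```python
from datetime import date, datetime, time


def _parse_bool(text):
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"not a boolean: {text!r}")


# Type names follow the table above; matching is case-insensitive.
CONVERTERS = {
    "int": int,
    "float": float,
    "string": str,
    "text": str,
    "date": date.fromisoformat,                                      # yyyy-mm-dd
    "time": time.fromisoformat,                                      # hh:mm:ss
    "datetime": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
    "boolean": _parse_bool,
}


def convert(text, dtype):
    """Convert the stored text of a value according to its declared type."""
    converter = CONVERTERS.get(dtype.lower(), str)  # unknown types stay strings
    return converter(text)


print(convert("-1024", "Int"))
print(convert("2009-05-26 11:51:00", "datetime"))
```

Note that Python's `int` and `float` expect an ASCII hyphen as the minus sign, so values copied from typeset text may need normalizing first.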
Section types.
| Section type | Description |
|---|---|
| Analysis | Descriptions of an analysis. |
| Analysis/psth | Properties to describe a peri stimulus time histogram. |
| Analysis/power spectrum | Properties to describe a power spectrum. |
| Analysis/coherence | Properties to describe a coherence spectrum. |
| Cell | Descriptions of a recorded cell. |
| Collection/event list | Section to combine lists of events. |
| Collection/hardware properties | Descriptions of the hardware characteristics. |
| Collection/hardware settings | Descriptions of the actual hardware settings like filter adjustments, etc. |
| Dataset | Description of a dataset. Recording time, files, etc. |
| Electrode | Description of an electrode. |
| Event | Generic descriptions of an event. |
| Experiment | General Experiment descriptions. |
| Experiment/behavior | For descriptions of a behavioral experiment. |
| Experiment/electrophysiology | Properties to describe an electrophysiological experiment. |
| Experiment/imaging | Properties to describe an imaging experiment. |
| Experiment/psychophysics | Properties to describe psychophysical experiments. |
| Hardware | Descriptions of a hardware item. |
| Hardware/amplifier | Descriptions of an electrophysiological amplifier (type, operations mode…). |
| Hardware/attenuator | Descriptions of an attenuator device (gain…). |
| Hardware/camera objective | Description of a camera objective (focal length, aperture…). |
| Hardware/daq | Properties and settings of a data acquisition device. |
| Hardware/eyetracker | Properties and settings of an eyetracker device. |
| Hardware/filter | Description of a filter device (lowpass, bandpass, highpass, etc.). |
| Hardware/filterSet | Description of a filter set or filter cube used in a microscope. |
| Hardware/iaq | Properties and settings of an image acquisition device (camera, frame grabber) |
| Hardware/light source | Description of a light source. |
| Hardware/microscope | Description of a microscope. |
| Hardware/microscope objective | Descriptions of a microscope objective. |
| Hardware/scanner | Descriptions of the scanner used to sample microscope images. |
| Hardware/stimulus isolator | Descriptions of a stimulus isolator device. |
| Person | Descriptions of a person. |
| Preparation | Properties to describe preparation procedures ( |
| Project | Properties to describe the scientific project to which recorded data belongs |
| Recording | Properties to describe a recording session. (date, experimenter, etc.) |
| Setup | Properties to describe a recording setup |
| Stimulus | Properties to describe a stimulus |
| Stimulus/dc | A constant stimulus (DC) or stimulus intensity offset |
| Stimulus/gabor | Definition of a gabor stimulus |
| Stimulus/grating | Definition of a grating stimulus (square wave or sine wave, etc.) |
| Stimulus/movie | Definitions of an image sequence |
| Stimulus/pulse | Description of a pulse stimulus (width, intensity, timing…) |
| Stimulus/ramp | Description of a ramp stimulus (slope start intensity…) |
| Stimulus/random dot | Description of random dot stimulus |
| Stimulus/sawtooth | Descriptions of a sawtooth stimulus |
| Stimulus/sine wave | Descriptions of a sine wave stimulus (frequency, amplitude…) |
| Stimulus/squareWave | Descriptions of a squarewave stimulus (frequency, amplitude…) |
| Stimulus/whiteNoise | Descriptions of a white noise stimulus (cutoff-frequency, SD…) |
| Subject | Description of an experimental subject (species, age, sex…) |
The type of a section defines what kind of information is contained.
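The slash-separated type names in the table above suggest a simple hierarchy, e.g., "hardware/amplifier" being a special kind of "hardware". The helper below is a minimal sketch of how a program could select sections by type under that reading. The function name `is_of_type` and the case-insensitive comparison are assumptions for illustration, not behavior defined by the format.

```python
def is_of_type(section_type, wanted):
    """Return True if section_type equals wanted or is a subtype of it."""
    actual = section_type.lower().split("/")
    target = wanted.lower().split("/")
    # Compare whole path components so "hardware/filterSet"
    # is not mistaken for a subtype of "hardware/filter".
    return actual[:len(target)] == target


section_types = ["Hardware/amplifier", "Hardware/filterSet", "Stimulus/pulse", "Hardware"]
hardware_like = [t for t in section_types if is_of_type(t, "hardware")]
print(hardware_like)
```

Comparing path components rather than using a plain string-prefix test avoids false matches between types that merely share leading characters.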
Dataset terminology.
| Name | Definition | Type |
|---|---|---|
| Experimenter | The person who recorded the data | Person |
| Start | The point in time the recording began | Datetime |
| End | The point in time the recording ended | Datetime |
| Comment | A comment about the dataset | Text |
| File | The location (URL) of files of this dataset | URL |
| File | The data file itself | Binary |
| Quality | An estimate of the dataset quality | String |
List of properties defined in the dataset terminology describing a set of recorded data. The respective section containing these properties is of the “dataset” type.
Amplifier terminology.
| Name | Description | Data type | Dependency | Dependency value |
|---|---|---|---|---|
| Model | The model name of this hardware item | String | – | – |
| Manufacturer | The manufacturer of this hardware item | String | – | – |
| Serial no | The device serial number | String | – | – |
| Inventory no | The inventory number of the described hardware item | String | – | – |
| Owner | An identifier of the owner of this hardware item | String | – | – |
| Amplifier type | The type of amplifier. E.g., extracellular amplifier, intracellular amplifier, etc. | String | – | – |
| Measurement type | The type of measurement performed. For example the membrane voltage was measured or the cell was current clamped | String | – | – |
| Operation mode | The operation mode the amplifier was in. The operation mode can be “continuous” or “discontinuous” for bridge and switched amplifiers, respectively | String | – | – |
| Switching frequency | The switching frequency of the amplifier | Float | Operation mode | Discontinuous |
| Duty cycle | The duty cycle of the current injections in discontinuous/switched mode. Note: The duty cycle refers to the setting of a switched amplifier and is not to be confused with the duty cycle used to describe a stimulus protocol for square wave current injections | Float | Operation mode | Discontinuous |
| Gain | The gain of the amplifier | Float | – | – |
| High pass cutoff | The high pass-filter cutoff-frequency | Float | – | – |
| Low pass cutoff | The low pass-filter cutoff-frequency | Float | – | – |
List of properties that can be used to define the settings and properties of an amplifier used in electrophysiological setups (Section type: “hardware/amplifier”).
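The Dependency and Dependency value columns can be checked mechanically. The following sketch assumes a section is represented as a plain dictionary of property names and values, and that the terminology supplies a (dependency, dependency value) pair per property. Both representations, the function name, and the example numbers are hypothetical and only meant to illustrate the rule.

```python
# Dependencies taken from the amplifier table above:
# property name -> (dependency, dependency value)
AMPLIFIER_DEPENDENCIES = {
    "Switching frequency": ("Operation mode", "Discontinuous"),
    "Duty cycle": ("Operation mode", "Discontinuous"),
}


def unmet_dependencies(section, dependencies):
    """Return names of properties whose dependency is not satisfied."""
    unmet = []
    for name in section:
        dependency, required = dependencies.get(name, (None, None))
        if dependency is None:
            continue  # property has no dependency
        if dependency not in section:
            unmet.append(name)  # the property it depends on is missing
        elif required is not None and str(section[dependency]).lower() != required.lower():
            unmet.append(name)  # dependency present but has another value
    return unmet


# Hypothetical settings of a bridge amplifier that still lists a switching frequency.
bridge_amplifier = {"Operation mode": "continuous", "Switching frequency": 20000.0, "Gain": 10.0}
print(unmet_dependencies(bridge_amplifier, AMPLIFIER_DEPENDENCIES))
```

A check like this only flags properties for review. Whether an unmet dependency is an error or merely a warning is a design decision the table does not settle.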
Figure 3. Hardware descriptions in odML. Hardware descriptions can be split up into the HardwareProperties and HardwareSettings. These container sections then group subsections for the individual hardware items used in the setup. Sections are shown in the form “name – [type].”
Figure 4. Describing a stimulus in odML. odML description of a visual stimulus which is an additive combination of three components. The trace on top shows what the actual stimulus might have looked like. Sections are shown in the form “name – [type].”
odML-value.
| Element | Mandatory | Description | Example |
|---|---|---|---|
| Value | Yes | The value of the property | 53.4 |
| Uncertainty | No | Specifies the uncertainty of the value. The number of uncertainty values given should match the number of values. Error estimates must have the same unit as the value (e.g., SD not the variance). What kind of uncertainty measure is used can be specified in the definition element | 6.2 |
| Unit | No | The unit of the value and the uncertainty. Can be given only once. | Hz |
| Type | No | This entry specifies the data type of the value (see the “Data types” table) | Float |
| Definition | No | This entry is meant for definitions regarding the value. For example it can be used to refer to an ontology, | |
| Reference | No | The reference entry can be used to, e.g., refer to an entry in a database | – |
| Filename | No | The filename that should be used if binary content is transported in this property. There may be one filename for each value entry | – |
| Encoder | No | Binary content must be encoded into ascii to be included in | Base64 |
| Checksum | No | If binary content is directly included or if the URL of an external file is given, the checksum entry can be used to validate the file's identity and integrity. Use this element to indicate the algorithm and the checksum in the format algorithm$checksum | crc32$b84892a2 for a checksum calculated with a 32-bit CRC algorithm |
The value elements and their meaning in detail.
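The Encoder and Checksum elements can be illustrated with Python's standard library. The sketch below base64-encodes binary content and builds a checksum string in the algorithm$checksum format described above. The function names are assumptions for this example, and only the crc32 algorithm is handled.

```python
import base64
import zlib


def encode_binary(content: bytes) -> str:
    """Encode binary content as ASCII text using base64."""
    return base64.b64encode(content).decode("ascii")


def crc32_checksum(content: bytes) -> str:
    """Build a checksum entry in the 'algorithm$checksum' format."""
    return "crc32$%08x" % (zlib.crc32(content) & 0xFFFFFFFF)


def verify(content: bytes, checksum: str) -> bool:
    """Check content against an 'algorithm$checksum' entry (crc32 only)."""
    algorithm, _, expected = checksum.partition("$")
    if algorithm.lower() != "crc32":
        raise ValueError(f"unsupported checksum algorithm: {algorithm!r}")
    return crc32_checksum(content) == "crc32$" + expected.lower()


payload = b"123456789"
entry = crc32_checksum(payload)
print(encode_binary(payload), entry, verify(payload, entry))
```

The checksum is computed on the raw bytes, not on the base64 text, so the same entry can validate either embedded content after decoding or an external file referenced by URL.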
Figure 5. Transporting dataset information in odML. (A) Parts of the description of a simple electrophysiological experiment in which a single cell was recorded and several datasets were saved to disk. (B) Experiments in which several datasets have been recorded in a number of cells from the same subject. (C) Description of simultaneous recordings of two cells. Note: For clarity, Properties are omitted in (B,C). Sections are shown in the form “name – [type].”
Figure 6. Using mappings. This figure shows how mappings can be applied to convert a metadata tree from one layout to another. The left panel shows metadata that are organized as suggested by the CARMEN “Mini” metadata standard. The metadata file is in the odML format and refers to the CarmenMini terminology, which defines mappings for properties and sections. These are URLs to the respective properties in the odML-terminologies. Applying this mapping information converts the tree to the layout suggested by the odML-terminologies (right panel).
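The conversion step described in this caption can be sketched as follows. This is a simplified illustration under several assumptions: sections are flat dictionaries keyed by section type, mappings are plain tuples rather than URLs into a terminology, and the custom names ("lab/session", "Who", "Amp gain") are hypothetical and not taken from the CarmenMini terminology. Properties follow the mapping of their section unless they define a mapping of their own.

```python
def apply_mappings(sections, section_map, property_map):
    """Convert {section type: {property: value}} to the target layout."""
    converted = {}
    for section_type, properties in sections.items():
        # Default target: the section's own mapping, else keep the type.
        default_target = section_map.get(section_type, section_type)
        for name, value in properties.items():
            # A property-level mapping takes precedence over the section mapping.
            target_type, target_name = property_map.get(
                (section_type, name), (default_target, name))
            converted.setdefault(target_type, {})[target_name] = value
    return converted


# Hypothetical custom layout and its mappings to standard section types.
custom = {"lab/session": {"Who": "J. Doe", "Comment": "test run", "Amp gain": 100.0}}
section_map = {"lab/session": "dataset"}
property_map = {
    ("lab/session", "Who"): ("dataset", "Experimenter"),
    ("lab/session", "Amp gain"): ("hardware/amplifier", "Gain"),
}

standard = apply_mappings(custom, section_map, property_map)
print(standard)
```

Because the original names stay in the custom file and only the mapping tables change, a lab can keep its preferred layout while still exporting to the shared one.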
Listing 2. Dummy Matlab function “powerSpectrum.m” illustrating how metadata can be retrieved and used during data analysis.
Open metaData Markup Language root section.
| Element | Mandatory | Description | Example |
|---|---|---|---|
| Author | No | The author of the document | – |
| Date | No | The date the document was created (yyyy-mm-dd format) | – |
| Version | No | The version of the document | – |
| Repository | No | Defines the default repository used in this document. This information is overwritten by repository elements in subsections | – |
| Section(s) | Yes (at least 1) | The first level subsections of the | – |
The root section elements and their meaning in detail.
Open metaData Markup Language section.
| Element | Mandatory | Description | Example |
|---|---|---|---|
| Type | Yes | The section type. The type allows categorization. For example, we suggest that all hardware-related sections be of the “hardware” type. A section must have a type, but the user is free to add new types. Type entries can occur only once in a section | Hardware/amplifier |
| Name | No | The section name. This entry should be given but may be overridden by a | AmplifierNo1 |
| Definition | No | Defines the information contained in the section | This section describes the properties and settings of an amplifier. |
| Reference | No | The identifier for the entity represented by this section as it may be used in a data management system, etc. | Ampl-z42 |
| Link | No | This element defines an internal link within the actual document or | (See “How to use” paragraph in the main text) |
| Include | No | This element defines a link to an external | (See “How to use” paragraph in the main text) |
| Repository | No | A section can be based on a pre-defined terminology (see below). The repository element specifies the file in which the definition can be found, e.g., | |
| Mapping | No | A section may also map to another section. When conversion is requested, all containing properties, as long as they themselves don't define a mapping, will be put into the target section | |
| Section | No | A section can have subsections, which allows building a tree-like structure | |
| Property | No | A section can have properties, which constitute the actual content of the section | |
The section elements and their meaning in detail.
odML-property.
| Element | Mandatory | Description | Example |
|---|---|---|---|
| Name | Yes | The name of the property | “Firing rate” |
| Value | Yes | The value (see Table | – |
| Definition | No | This entry defines the meaning of this property. Can be given only once | The number of action potentials fired by a neuron per second |
| Mapping | No | The mapping element maps a property to a different one, e.g., one defined in an | |
| Dependency | No | This element offers the opportunity to introduce dependencies between properties: i.e., this very property may only be meaningful if a certain other property is also specified in the same section (see Table 2 for an example). The | – |
| Dependency value | No | The dependencyValue further specifies the dependencies of this property. It can restrict the dependency to the case in which the property referred to by the dependency field assumes the very value given with this field (see Table 2 for an example). Can be given only once | – |
The property elements and their meaning in detail.