Elizabeth L Brainerd, Richard W Blob, Tyson L Hedrick, Andrew T Creamer, Ulrike K Müller.
Abstract
SYNOPSIS: Standards-based data management facilitates data preservation, discoverability, and access for effective data reuse within research groups and across communities of researchers. Data sharing requires community consensus on standards for data management, such as storage and formats for digital data preservation, metadata (i.e., contextual data about the data) that should be recorded and stored, and data access. Video imaging is a valuable tool for measuring time-varying phenotypes in organismal biology, with particular application for research in functional morphology, comparative biomechanics, and animal behavior. The raw data are the videos, but videos alone are not sufficient for scientific analysis. Nearly endless videos of animals can be found on YouTube and elsewhere on the web, but these videos have little value for scientific analysis because essential metadata such as true frame rate, spatial calibration, genus and species, weight, age, etc. of organisms, are generally unknown. We have embarked on a project to build community consensus on video data management and metadata standards for organismal biology research. We collected input from colleagues at early stages, organized an open workshop, "Establishing Standards for Video Data Management," at the Society for Integrative and Comparative Biology meeting in January 2017, and then collected two more rounds of input on revised versions of the standards. The result we present here is a rubric consisting of nine standards for video data management, with three levels within each standard: good, better, and best practices. The nine standards are: (1) data storage; (2) video file formats; (3) metadata linkage; (4) video data and metadata access; (5) contact information and acceptable use; (6) camera settings; (7) organism(s); (8) recording conditions; and (9) subject matter/topic. 
The first four standards address data preservation and interoperability for sharing, whereas standards 5–9 establish minimum metadata standards for organismal biology video and suggest additional metadata that may be useful for some studies. This rubric was developed with substantial input from researchers and students, but should still be viewed as a living document to be refined and updated as technology and research practices change. The audience for these standards includes researchers, journals, and granting agencies, and also the developers and curators of databases that may contribute to video data sharing efforts. We offer this project as an example of building community consensus for data management, preservation, and sharing standards, which may be useful for future efforts by the organismal biology research community.
Year: 2017 PMID: 28881939 PMCID: PMC5886321 DOI: 10.1093/icb/icx060
Source DB: PubMed Journal: Integr Comp Biol ISSN: 1540-7063 Impact factor: 3.326
Rubric for best practices in video data management for organismal biology research
| Standards | Level 0: unacceptable | Level 1: good | Level 2: better | Level 3: best |
|---|---|---|---|---|
| (1) Data storage | Single copy, local disk storage only (such as on a hard drive). | A local working copy plus an archival | One archival | Archival |
| (2) Video file formats | Video files compressed, resized, or at a different frame rate from the original video files (e.g., YouTube or Vimeo). | Original, archival | Level 1 plus version converted to a widely accessible format with maximum data preservation in the conversion. | Level 2 plus compressed/converted version(s) |
| (3) Metadata linkage | Metadata absent or separate from video files (such as in lab notebooks); substantial effort required to share. | Metadata contained in digital files in a widely used format. Metadata files linked to video files by similar file names OR by bundling each video file together with its metadata into an uncompressed archive, such as zip, tar or hdf5. | Same as Level 1 except metadata files linked to video files by similar file names AND by bundling each video file together with its metadata; OR metadata text embedded in the video file itself. | Metadata, including video file name, encoded in XML or other machine-readable format and contained within the video files themselves or by bundling each video file together with its metadata. |
| (4) Video data and metadata access | Not directly accessible online; substantial effort required to share. | Video data and metadata available in an Internet-accessible location, such as in commercial cloud | Video data and metadata online in a public repository with a stated mission of providing public access to data | Level 2 plus metadata stored in a manner to make the videos discoverable on the web; i.e., metadata searchable and viewable without downloading a large video bundle |
| (5) Contact information and acceptable use | No contact information and no statement of terms of reuse. | Contact name and e-mail address and a clear statement about rights and acceptable reuse of the video. | Name, e-mail and assignment of an internationally recognized content license | Level 2 plus ORCID ID for contact person and the assignment of a unique identifier, such as a digital object identifier, that can be used for the data’s discovery and citation. |
| (6) Camera settings | No metadata. | Frame rate (frames per second). | Frame rate and spatial calibration data and number of cameras and camera ID (camera used for this specific video) if part of multi-camera system. | Level 2 plus four or more of the following: video resolution (in pixels); shutter speed/exposure time; audio (Y/N); camera make and model; lens type; video type (e.g., monochrome, color, X-ray, PIV, infrared); file format; camera view (e.g., lateral); original video or post-processed; length (duration) of the video. |
| (7) Organism(s) | No metadata. | Genus and species (more than one binomial if more than one species in the video). | Genus, species, and at least one of the following: individual ID (multiple individual IDs if multiple individuals); some measure of size (e.g., length, weight). | Level 2 plus four or more of the following: sex; age; life stage; physical condition (e.g., prior invasive procedures, senescent, gravid, mutant); wild caught or captive bred; higher taxonomic groupings above genus; common name; links to publications using the same individual(s); accession number(s) if individual(s) were deposited in a museum. |
| (8) Recording conditions | No metadata. | Date OR location recorded (institution or field site or location name or GPS coordinates) | Date AND location recorded (institution or field site or location name or GPS coordinates) | Level 2 plus two or more of the following: temperature; light regime; time of day; humidity; season; auxiliary data recorded (e.g., EMG, pressure, force; none); synchronization method for auxiliary data (if any); environment (e.g., water, land, air, tree canopy; treadmill; trackway; flume); recorded indoors or outdoors. |
| (9) Subject matter/topic | No metadata. | Text description (abstract) of the contents of the video, including any related publication citations and information on the original purpose for which the video was collected. If the video belongs to a specific collection, experiment, field project, etc., include its name/title. Funding source and other acknowledgments should be included. | Level 1 plus 5–10 keywords. Suggestions for keywords include behavior (e.g., swimming, flying, displaying); experimental treatments applied (e.g., incline of treadmill, food type, denervated); entire organism visible or focus on part (e.g., head, knee, caudal fin). | Level 1 plus 5–10 keywords selected from an internationally recognized list (controlled vocabulary/taxonomy) of subjects/topics (e.g., Encyclopedia of Life TraitBank). |
Archival copy is unmodified from original and remains unmodified.
Commercial cloud services such as Google Drive, Dropbox, Amazon S3, Rackspace, EMC, MS OneDrive, or other storage managed by IT professionals, such as through an academic institution.
Scientific repositories (e.g., Dryad, OSF, XMA/ZMAPortal) and university and library-based data repositories would meet this standard.
Levels 1 and 2 are identical if original file formats are widely accessible and Levels 1–3 are identical if original files are also small enough for easy viewing and accessibility online.
A warning should be provided if frame rate or pixel resolution has been changed for files that can be downloaded, since these affect spatial and temporal calibrations.
Embargo periods are permitted; public access defined here as offered with a license that permits reuse of the data.
Best is metadata and videos in a database with an interface designed to make them discoverable from outside the site and the metadata searchable and videos viewable from within the interface.
Such as a Creative Commons CC0 or CC BY, or GNU General Public License or Open Data Commons ODC.
Use of international standards is encouraged: ISO 8601 for date format; ISO 6709 for GPS coordinates; and ISO 27729 for institutions.
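As a minimal sketch of standard 3 (metadata linkage) combined with the ISO date and coordinate formats encouraged above, the following Python writes a metadata file that shares the video's base name and bundles both files into an uncompressed tar archive. The function names, field names, and example values are hypothetical and not part of the rubric; JSON is used here only as one possible machine-readable format.

```python
import json
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def iso6709_point(lat: float, lon: float) -> str:
    """Format decimal-degree coordinates in the ISO 6709 style +DD.DDDD+DDD.DDDD/."""
    # Sign-aware zero padding: 2 integer digits for latitude, 3 for longitude.
    return f"{lat:+08.4f}{lon:+09.4f}/"


def example_record() -> dict:
    """Build a hypothetical metadata record; every value is a placeholder."""
    recorded = datetime(2017, 1, 5, 14, 30, tzinfo=timezone.utc)
    return {
        "video_file": "trial01.mov",
        "frame_rate_fps": 500,
        "genus": "Genus",
        "species": "species",
        "date_recorded": recorded.isoformat(),  # ISO 8601 timestamp
        "location": iso6709_point(35.9049, -79.0469),  # ISO 6709 point
    }


def bundle_video_with_metadata(video: Path, metadata: dict) -> Path:
    """Write <name>.json next to the video and bundle both into <name>.tar."""
    sidecar = video.with_suffix(".json")  # linked by similar file name
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    bundle = video.with_suffix(".tar")
    with tarfile.open(bundle, "w") as tar:  # mode "w" writes without compression
        tar.add(video, arcname=video.name)
        tar.add(sidecar, arcname=sidecar.name)
    return bundle
```

Calling `bundle_video_with_metadata(Path("trial01.mov"), example_record())` would leave `trial01.json` and `trial01.tar` beside the video, while the original video file itself is not modified.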
Examples of the video data management rubric applied to published works
| Standards | Example 1 | Example 2 | Example 3 |
|---|---|---|---|
| Example source | Dryad Data DOI: | Data archived at xmaportal.org: unique Study ID Brown40. | Dryad Data DOI: |
| (1) Data storage | Level 2: Original, archival files on IT-managed storage server (triple backup) at the co-PI’s institution, on DVDs and external hard drives at PI’s and co-PI’s institutions, and cropped images (to reduce file size) on Dryad data repository (not archival). | Level 3: XMAPortal (data repository), storage on Google Drive, two external hard drive copies at co-PI’s institutions. | Level 2: Maintained in IT-managed storage at UNC, a network-accessible lab working copy, and an off-network lab backup; incomplete video set also on Dryad data repository with the kinematics. |
| (2) Video file formats | Level 2: Uncompressed original format (TIFF images) on Dryad, description of image processing on Dryad (cropping); original uncompressed format (TIFFs) on storage server. | Level 3: Original cine files stored in the XMAPortal; XMAPortal interface allows video viewing online, as well as download of videos converted to user-selected formats. | Level 2: Original camera output, in this case the widely accessible .MOV container with H.264 compression. |
| (3) Metadata linkage | Level 1: Metadata of all videos (video format, description of video sequences) available on Dryad as text files (dat and txt format), metadata not bundled with video file, but in a separate metadata folder. | Level 3: Metadata contained within original cine files; metadata downloadable from XMAPortal with same file names as associated videos and bundled into zip archives. | Level 2: Metadata available as plain text on Dryad and in the .MOV container headers; including frame rate, resolution, UTC recording time and GPS location. |
| (4) Video data and metadata access | Level 2: Videos and metadata available on Dryad. | Level 3: Videos and metadata available through XMAPortal; metadata searchable and videos and metadata viewable within the Portal interface. | Level 1: Metadata and an incomplete video set on Dryad; complete video set in a shareable and internet-accessible location. |
| (5) Contact information and acceptable use | Level 2: Contact details and statement provided about how to cite the data on Dryad with CC0 licensing through Dryad and DOI through Dryad. | Level 3: Names, contact e-mails, ORCID IDs and CC BY 4.0 licensing provided in study metadata through XMAPortal; study assigned unique identifier in the XMAPortal. | Level 3: ORCID IDs associated with data in Dryad with Dryad CC0 licensing and DOI through Dryad. |
| (6) Camera settings | Level 1: Metadata documents on Dryad give frame rate and calibration factor (meters/pixel) in a separate file, one file per video recording. | Level 2: Frame rate, spatial calibration, and camera ID available, with some additional information (X-ray settings, shutter speed). | Level 1: Single-camera settings available in file headers; multi-camera calibration and lens distortion data in an internet-accessible location with complete video data. |
| (7) Organism(s) | Level 2: Species and age provided; additional data (size) provided in publication but not on Dryad. | Level 2: Genus, species, size, and individual ID provided. | Level 1: Genus, species provided on Dryad. |
| (8) Recording conditions | Level 1: Date of recording provided; location and context provided only in associated publication, not in the metadata document on Dryad. | Level 2: Date and location of recording provided, as well as environment; people who recorded video available through high level of database menu. | Level 2: Date and location encoded in camera metadata; these plus the environmental context are also provided in the publication. |
| (9) Subject matter/topic | Level 1: Abstract on Dryad. | Level 1: Association of video with project identified through XMAPortal; text description of behavior provided with menu listing of each video. | Level 2: Subject matter, behavior, and original recording purpose described in Dryad metadata. |
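The level assignments in the examples above follow the thresholds in the rubric. As an illustrative sketch only, the Python below scores standard 6 (camera settings) from a metadata dictionary. The key names and the function `camera_settings_level` are hypothetical, and the code assumes that a record with no stated camera count describes a single-camera recording, which is one possible reading of the rubric rather than a rule it states.

```python
# Optional fields for Level 3 of standard 6; key names are hypothetical.
OPTIONAL_CAMERA_FIELDS = (
    "resolution_px",
    "exposure_time",
    "audio",
    "camera_make_model",
    "lens_type",
    "video_type",
    "file_format",
    "camera_view",
    "original_or_processed",
    "duration_s",
)


def camera_settings_level(meta: dict) -> int:
    """Return the rubric level (0-3) for standard 6, camera settings."""
    if meta.get("frame_rate_fps") is None:
        return 0  # no frame rate: treated as Level 0
    if meta.get("spatial_calibration") is None:
        return 1  # frame rate only
    # Assumption: a missing camera count means a single-camera recording.
    n_cameras = meta.get("n_cameras", 1)
    if n_cameras > 1 and meta.get("camera_id") is None:
        return 1  # multi-camera system without a camera ID stays at Level 1
    extras = sum(1 for key in OPTIONAL_CAMERA_FIELDS if meta.get(key) is not None)
    return 3 if extras >= 4 else 2  # Level 3 needs four or more optional fields
```

The same pattern (required fields for Levels 1 and 2, a count of optional fields for Level 3) could be adapted to standards 7 and 8, which use thresholds of four and two optional fields respectively.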