Samuel S C Rund, Kyle Braak, Lauren Cator, Kyle Copas, Scott J Emrich, Gloria I Giraldo-Calderón, Michael A Johansson, Naveed Heydari, Donald Hobern, Sarah A Kelly, Daniel Lawson, Cynthia Lord, Robert M MacCallum, Dominique G Roche, Sadie J Ryan, Dmitry Schigel, Kurt Vandegrift, Matthew Watts, Jennifer M Zaspel, Samraat Pawar.
Abstract
Arthropods play a dominant role in natural and human-modified terrestrial ecosystem dynamics. Spatially explicit arthropod population time-series data are crucial for statistical or mathematical models of these dynamics and for assessing their veterinary, medical, agricultural, and ecological impacts. Such data have been collected worldwide for over a century, but remain scattered and largely inaccessible. In particular, with the ever-present and growing threat of arthropod pests and vectors of infectious diseases, there are numerous historical and ongoing surveillance efforts, but the data are not reported in consistent formats and typically lack sufficient metadata to make reuse and re-analysis possible. Here, we present the first-ever minimum information standard for arthropod abundance, Minimum Information for Reusable Arthropod Abundance Data (MIReAD). Developed with broad stakeholder collaboration, it balances sufficiency for reuse with the practicality of preparing the data for submission. It is designed to optimize data (re)usability in line with the "FAIR" (Findable, Accessible, Interoperable, and Reusable) principles of public data archiving (PDA). This standard will facilitate data unification across research initiatives and communities dedicated to surveillance for detection and control of vector-borne diseases and pests.
Year: 2019 PMID: 31024009 PMCID: PMC6484025 DOI: 10.1038/s41597-019-0042-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Population abundance time-series example. From New Jersey light trap mosquito surveillance performed by the Iowa State University Medical Entomology Laboratory from 1977 to 2017[61,62]. Data available for download on VectorBase[50].
The MIReAD Study Information (Resource metadata) fields. The information in this table should be included with every data submission, for example by including data in the file header as demonstrated in the examples[40]
| Field | Details | Recommendations | Examples |
|---|---|---|---|
| Contact details | A name, person, authority, | Include investigator ORCID(s), email address, website (if institutional) if possible. | Kurt Vandegrift |
| General description of the experiment/ collection set | A short description of the study objectives, sampling design, and hypotheses. | Useful things to indicate are: | “Monitoring of major pests on cucumber, sweet pepper and tomato under net-house conditions in Punjab, India” |
| Citations | Reference to related publications, digital if possible ( | | “A web-based relational database for monitoring and analyzing mosquito population dynamics |
| Species Identification Method | A description of method of species identification. Particularly important for cryptic species complexes. | Providing information on the veracity of the identification is encouraged, such as a reference to the exact identification key or method. | “Morphological” |
| Not present vs zero information | Indication of what gaps, zeros, NA, | It is imperative, especially for population surveys, to distinguish between a species that was not found when the collection method would be expected to find it (confirmed absence) and a species that was not looked for ( | “Zero indicates a species was looked for and not found. NA represents a trap failure” |
| GPS information | If raw GPS data is obfuscated in any way, a statement on the manner by which this occurred should be given. | The highest resolution data ( | “Raw GPS points have been provided. GPS unit accuracy ±8 m” |
| Data usage information | The data reuse policy for your data. | For data to be FAIR, they must be reusable. We therefore recommend that data be provided as “CC0” or “CC BY 4.0”. | “CC0” |
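
As a minimal sketch of the "file header" option mentioned in the table caption, the study information can be written as comment lines ahead of the data rows. The `# key: value` layout and the helper names `write_with_header` and `read_header` are assumptions made for illustration; the standard does not prescribe a particular header syntax. The values reuse the examples from the table.

```python
import csv
import io

# Study-level metadata keyed by the field names in the table above.
# The "# key: value" header layout is an assumption for illustration,
# not a format mandated by the standard.
STUDY_INFO = {
    "Contact details": "Kurt Vandegrift",
    "Species Identification Method": "Morphological",
    "Not present vs zero information": (
        "Zero indicates a species was looked for and not found. "
        "NA represents a trap failure"
    ),
    "Data usage information": "CC0",
}


def write_with_header(study_info, rows, fieldnames):
    """Return CSV text whose first lines carry the study metadata as comments."""
    buffer = io.StringIO()
    for key, value in study_info.items():
        buffer.write(f"# {key}: {value}\n")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_header(text):
    """Recover the metadata dictionary from the leading comment lines."""
    info = {}
    for line in text.splitlines():
        if not line.startswith("# "):
            break  # first non-comment line is the column header row
        key, _, value = line[2:].partition(": ")
        info[key] = value
    return info
```

Keeping the metadata in the same file as the data means the two cannot be separated by accident, at the cost of requiring readers to skip the comment lines before parsing the rows.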
The MIReAD data fields
| Field(s) | Details | Recommendations | Examples |
|---|---|---|---|
| **Start Time (for collection)** | Start time of the data sample collection. | Be as specific as | “2012-04-27” |
| **End Time (for collection)** | End time of the data sample collection. | See above. | See above. |
| **Location** | The geographical location of sample collection. | As detailed as possible. Latitude and longitude if possible with specified accuracy | “Kukar Maikiya, Jigawa State, Nigeria” and “40.697” and “−74.015” |
| **Collection method** | Sampling apparatus ( | | “CDC light trap” |
| **Collection attractants** | The attractants/lures used to attract insects to a trap or collection | Please be as specific as possible. For example, a cow used as bait should be referred to as a ‘cow’ and not ‘animal’, or specify the attractant used, such as ‘CO2’ | “None” |
| Collection area | The spatial extent (area or volume) of the sample. | If relevant ( | “100 m^2” |
| **Taxonomy** | Classification of sample collected. | Scientific genus and species preferred. | “ |
| **Unit(s) of measurement and observation** | Description of exactly what was observed, the unit for the field “Value,” below. | Do not abbreviate. | “Number of individuals per m^2” |
| **Value** | The numerical amount or result from the sample collection. | Units should be provided in a separate field or in the header. | “0” |
| Additional sample information field(s) | This could be more than one field and should be used when more information is required to understand the experiment, for example experimental variables, sub-locations, plant host cultivar/species, etc. | Do not abbreviate. | “Forest” |
| Sample Name | A human readable sample name. | Naming convention is not restricted, but any encoded metadata should be revealed in the other data fields. For example, you may name a sample named ‘Aphid1_StickyTrap_Jan4,’ but you will still have “Sticky Trap” listed in a Collection Method field, and “Jan 4, 2017” in the date field. | “Trap1_Night1” |
Figure 2b provides an annotated example. Field names in bold should be considered required. Remaining fields are optional or depend on the complexity of the experimental design.
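
A simple completeness check for one data row might look like the sketch below. The field names follow those used in the Darwin Core mapping table; which of them are treated as required is an assumption made here for illustration, as is the helper name `check_record`. Date checking is limited to ISO 8601 strings such as the “2012-04-27” example.

```python
from datetime import datetime

# Assumption for illustration: the fields treated as required in this sketch.
REQUIRED_FIELDS = [
    "Start Time (for collection)",
    "End Time (for collection)",
    "Location",
    "Collection method",
    "Collection attractants",
    "Taxonomy",
    "Unit(s) of measurement and observation",
    "Value",
]

TIME_FIELDS = ("Start Time (for collection)", "End Time (for collection)")


def check_record(record):
    """Return a list of human-readable problems found in one data row."""
    problems = []
    # A value of "0" is a legitimate observation, so only empty strings
    # (or absent keys) count as missing.
    for field in REQUIRED_FIELDS:
        if str(record.get(field, "")).strip() == "":
            problems.append(f"missing: {field}")
    # Times that are present must parse as ISO 8601 dates or date-times.
    for field in TIME_FIELDS:
        raw = str(record.get(field, "")).strip()
        if raw:
            try:
                datetime.fromisoformat(raw)
            except ValueError:
                problems.append(f"not an ISO date: {field}")
    return problems


# Example row; the taxon name is a hypothetical placeholder.
example = {
    "Start Time (for collection)": "2012-04-27",
    "End Time (for collection)": "2012-04-28",
    "Location": "Kukar Maikiya, Jigawa State, Nigeria",
    "Collection method": "CDC light trap",
    "Collection attractants": "None",
    "Taxonomy": "Aedes vexans",
    "Unit(s) of measurement and observation": "Number of individuals per m^2",
    "Value": "0",
}
```

Calling `check_record(example)` returns an empty list, while a row with a blank taxon or a date such as “27/04/2012” yields one message per problem.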
Fig. 2 MIReAD reduces data ambiguity. (a) Seemingly clean data can still lack key information or have ambiguous metadata, hindering data reuse. (b) MIReAD-compliant data include the metadata necessary for data reuse and remove ambiguity. (c) Data can be formatted differently, for example in a wide format, and still be MIReAD compliant.
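
The relationship between the wide layout in panel (c) and a one-row-per-observation layout can be sketched as follows. The function name `wide_to_long`, the column names, and the sample values are hypothetical; the sketch assumes that every column not listed in `id_fields` holds counts for one taxon.

```python
def wide_to_long(wide_rows, id_fields, taxon_field="Taxonomy", value_field="Value"):
    """Turn one-column-per-taxon rows into one row per (sample, taxon) pair."""
    long_rows = []
    for row in wide_rows:
        # Fields that describe the sample are copied onto every output row.
        shared = {name: row[name] for name in id_fields}
        for column, value in row.items():
            if column in id_fields:
                continue
            # Values are passed through untouched so that "0" (confirmed
            # absence) and "NA" (not sampled) stay distinguishable.
            long_rows.append({**shared, taxon_field: column, value_field: value})
    return long_rows


# Hypothetical wide-format row: two taxon columns for one trap night.
wide = [
    {
        "Sample Name": "Trap1_Night1",
        "Start Time (for collection)": "2012-04-27",
        "Aedes vexans": "0",
        "Culex pipiens": "NA",
    }
]
long_rows = wide_to_long(wide, ["Sample Name", "Start Time (for collection)"])
```

Both layouts carry the same information, so either can be compliant as long as the required fields can be recovered unambiguously.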
The MIReAD data fields and their mapping to the GBIF metadata profile.
| MIReAD Field | Corresponding GBIF metadata fields |
|---|---|
| Contact details | contact |
| General description of the experiment/collection set | designDescription; sampling |
| Citations | citation |
| Species Identification Method | designDescription |
| Not present vs zero information | samplingDescription |
| GPS obfuscation information | geographicDescription; geodeticDatum |
| Data usage information | intellectualRights |
See GBIF[63] for more information.
The MIReAD data fields and their mapping to the Darwin Core data standard.
| MIReAD Field | Corresponding DarwinCore fields |
|---|---|
| Start Time (for collection) | eventTime |
| End Time (for collection) | eventTime |
| Location | A number of fields under Location See: |
| Collection method | samplingProtocol |
| Collection attractants | samplingProtocol |
| Collection area | samplingEffort |
| Taxonomy | A number of fields under Taxon See |
| Unit(s) of measurement and observation | sampleSizeUnit |
| Value | sampleSizeValue |
| Additional sample information | fieldNotes; eventRemarks |
| SampleID | eventID |
| Sample Name | eventID |
See Wieczorek et al.[64] for more information.
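
The mapping in the table can be applied mechanically, as in the rough sketch below. Only fields that map to a single Darwin Core term are covered; Location and Taxonomy are omitted because they map to several terms. The helper name `to_darwin_core` and the joining rules are assumptions for illustration: start and end times are combined into one `eventTime` interval with "/", and other values sharing a term are joined with " | ".

```python
# MIReAD field -> Darwin Core term, following the table above.
# The Darwin Core spelling of the unit term is sampleSizeUnit.
MIREAD_TO_DWC = {
    "Start Time (for collection)": "eventTime",
    "End Time (for collection)": "eventTime",
    "Collection method": "samplingProtocol",
    "Collection attractants": "samplingProtocol",
    "Collection area": "samplingEffort",
    "Unit(s) of measurement and observation": "sampleSizeUnit",
    "Value": "sampleSizeValue",
    "Additional sample information": "fieldNotes",
    "SampleID": "eventID",
}


def to_darwin_core(record):
    """Rename mapped MIReAD fields, merging values that share a term."""
    grouped = {}
    for field, value in record.items():
        term = MIREAD_TO_DWC.get(field)
        if term is None or value == "":
            continue  # unmapped or empty fields are skipped in this sketch
        grouped.setdefault(term, []).append(str(value))
    # Assumed joining rules: "/" builds a start/end interval for eventTime,
    # " | " separates other values that land on the same term.
    return {
        term: ("/" if term == "eventTime" else " | ").join(values)
        for term, values in grouped.items()
    }
```

Values are merged in the order the fields appear in the input record, so the start time should precede the end time for the interval to read correctly.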