Jorge Assis, Eliza Fragkopoulou, Duarte Frade, João Neiva, André Oliveira, David Abecasis, Sylvain Faugeron, Ester A Serrão.
Abstract
Species distribution records are a prerequisite to follow climate-induced range shifts across space and time. However, synthesizing information from various sources such as peer-reviewed literature, herbaria, digital repositories and citizen science initiatives is not only costly and time consuming, but also challenging, as data may contain thematic and taxonomic errors and generally lack standardized formats. We address this gap for important marine ecosystem-structuring species of large brown algae and seagrasses. We gathered distribution records from various sources and provide a fine-tuned dataset with ~2.8 million dereplicated records, taxonomically standardized for 682 species, and considering important physiological and biogeographical traits. Specifically, a flagging system was implemented to signal potentially incorrect records reported on land, in regions with limiting light conditions for photosynthesis, and outside the known distribution of species, as inferred from the most recent published literature. We document the procedure and provide a dataset in tabular format based on the Darwin Core Standard (DwC), alongside a set of functions in the R language for data management and visualization.
Year: 2020 PMID: 32286314 PMCID: PMC7156423 DOI: 10.1038/s41597-020-0459-x
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1. Global dataset of marine forest species of brown macroalgae. Included orders: Fucales, Laminariales and Tilopteridales. Red and gray circles depict raw and corrected data, respectively.
Fig. 2. Global dataset of marine forest species of seagrasses. Included families: Cymodoceaceae, Hydrocharitaceae, Posidoniaceae and Zosteraceae. Red and gray circles depict raw and corrected data, respectively.
Summary of records included in the dataset per ecological group, original source type and quality flag (locations on land, regions with unsuitable light conditions and locations outside known distributional ranges).
| Group | Number of records (percentage) | Literature | Herbaria | Repositories | Total |
|---|---|---|---|---|---|
| Kelp and fucoid algae | Overall | 439,877 | 36,775 | 611,796 | 1,088,448 |
| | Flagged: On Land | 2,241 (0.51) | 5,350 (14.54) | 18,615 (3.04) | 26,206 (2.41) |
| | Flagged: Unsuitable light | 21,080 (4.79) | 7,420 (20.17) | 44,480 (7.27) | 72,980 (6.70) |
| | Flagged: Outside distribution | 1,013 (0.23) | 1,367 (3.71) | 4,537 (0.74) | 6,917 (0.63) |
| Seagrasses | Overall | 2,376 | 622 | 1,660,359 | 1,663,357 |
| | Flagged: On Land | 60 (2.52) | 233 (37.45) | 6,676 (0.40) | 6,969 (0.42) |
| | Flagged: Unsuitable light | 131 (5.51) | 254 (40.83) | 116,036 (6.99) | 116,421 (6.99) |
| | Flagged: Outside distribution | 39 (1.64) | 99 (15.91) | 68,314 (4.11) | 68,452 (4.12) |
| Total | Overall | 442,253 | 37,397 | 2,272,155 | 2,751,805 |
Values in parentheses are the percentage of records flagged within each group and source type.
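The percentages in the table can be reproduced by dividing each flagged count by the overall count for the same group and source type. The following is a minimal Python sketch using the seagrass repository figures from the table; the function name `flagged_percentage` is illustrative and is not part of the published R functions.

```python
def flagged_percentage(flagged: int, overall: int) -> float:
    """Return flagged records as a percentage of all records, rounded to 2 decimals."""
    if overall <= 0:
        raise ValueError("overall must be a positive record count")
    return round(100.0 * flagged / overall, 2)


# Seagrass records sourced from repositories (counts taken from the table).
SEAGRASS_REPOSITORIES_OVERALL = 1_660_359

on_land = flagged_percentage(6_676, SEAGRASS_REPOSITORIES_OVERALL)
unsuitable_light = flagged_percentage(116_036, SEAGRASS_REPOSITORIES_OVERALL)
outside_distribution = flagged_percentage(68_314, SEAGRASS_REPOSITORIES_OVERALL)

print(on_land, unsuitable_light, outside_distribution)
```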
Fig. 3. Records of marine forest species per year.
Fig. 4. New additions to major online data repositories (marine forests of brown macroalgae). Red circles depict new data and gray circles depict data aggregated from the repositories Global Biodiversity Information Facility[62] and the Ocean Biogeographic Information System[63].
Fig. 5. New additions to major online data repositories (marine forests of seagrasses). Red circles depict new data and gray circles depict data aggregated from the repositories Global Biodiversity Information Facility[62] and the Ocean Biogeographic Information System[63].
Summary of species included in the dataset per ecological group and original source type.
| Group | Number of species (percentage) | Literature | Herbaria | Repositories | Total |
|---|---|---|---|---|---|
| Kelp and fucoid algae | Overall | 80 | 396 | 601 | 623 |
| | Flagged: On Land | 50 (62.50) | 333 (84.09) | 317 (52.75) | 314 (50.40) |
| | Flagged: Unsuitable light | 71 (88.75) | 336 (84.84) | 513 (85.35) | 537 (86.19) |
| | Flagged: Outside distribution | 22 (27.50) | 235 (59.34) | 208 (34.61) | 423 (67.89) |
| Seagrasses | Overall | 9 | 21 | 59 | 59 |
| | Flagged: On Land | 8 (88.88) | 19 (90.48) | 50 (84.74) | 52 (88.13) |
| | Flagged: Unsuitable light | 9 (100.00) | 18 (85.71) | 51 (86.44) | 52 (88.13) |
| | Flagged: Outside distribution | 2 (22.22) | 17 (80.95) | 36 (61.02) | 39 (66.10) |
| Total | Overall | 89 | 417 | 660 | 682 |
Quality flags (locations on land, regions with unsuitable light conditions and locations outside known distributional ranges) refer to species with at least one record flagged. Values in parentheses are the percentage of species with at least one record flagged.
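The species-level counts follow an "at least one record flagged" rule, which can be sketched in Python as below. This is an illustration only: the sample records are hypothetical, the field names `acceptedName` and `flagMachineOnLand` are taken from the dataset field list, flagged records are assumed to hold the integer -1, and the value 0 used for unflagged records is an assumption.

```python
FLAGGED = -1  # assumption: potentially incorrect records are marked -1


def species_flag_summary(records, flag_field):
    """Count species having at least one record flagged in flag_field.

    Returns (flagged species, all species, percentage rounded to 2 decimals).
    """
    all_species = {r["acceptedName"] for r in records}
    flagged_species = {
        r["acceptedName"] for r in records if r.get(flag_field) == FLAGGED
    }
    if not all_species:
        return 0, 0, 0.0
    pct = round(100.0 * len(flagged_species) / len(all_species), 2)
    return len(flagged_species), len(all_species), pct


# Hypothetical records, for illustration only (0 = unflagged is an assumption).
records = [
    {"acceptedName": "Laminaria digitata", "flagMachineOnLand": -1},
    {"acceptedName": "Laminaria digitata", "flagMachineOnLand": 0},
    {"acceptedName": "Zostera marina", "flagMachineOnLand": 0},
    {"acceptedName": "Fucus vesiculosus", "flagMachineOnLand": -1},
]

summary = species_flag_summary(records, "flagMachineOnLand")
print(summary)
```

A species is counted once no matter how many of its records are flagged, which is why species-level percentages are much higher than record-level ones.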
List of functions available to facilitate extraction, listing and visualization of occurrence records (refer to the main GitHub repository for more information).
| Function | Description | Arguments |
|---|---|---|
| extractDataset() | Imports data to R environment | group (character), pruned (logical) |
| listTaxa() | Lists available taxa | — |
| listData() | Lists data available in a dynamic table | extractDataset object name (character), taxa (character), status (character) |
| listDataMap() | Lists data available in a map | extractDataset object name (character), taxa (character), status (character), radius (integer), color (character), zoom (integer) |
| subsetDataset() | Subsets available data to a specific taxon | extractDataset object name (character), taxa (character), status (character) |
| exportData() | Exports available data to a delimited text file or shapefile (geospatial vector data for geographic information systems) | extractDataset object name (character), taxa (character), status (character), file type (character), file name (character) |
Description of the main fields used in the dataset.
| Field | Description |
|---|---|
| id | An identifier given to the occurrence at the time it was recorded |
| modified | The most recent date-time on which the resource was changed |
| basisOfRecord | The specific nature of the data record |
| aphiaID | Unique identifier of a taxon |
| acceptedAphiaID | Unique identifier of an accepted taxon |
| name | Taxon’s name, as reported originally |
| acceptedName | Taxon’s accepted name |
| scientificNameAuthorship | Name of who described the taxon originally |
| taxonomicStatus | The status of the taxon (e.g., accepted/not accepted) |
| kingdom | Higher taxonomic classification |
| phylum | Higher taxonomic classification |
| class | Higher taxonomic classification |
| order | Higher taxonomic classification |
| family | Higher taxonomic classification |
| genus | Higher taxonomic classification |
| decimalLongitude | Geographical longitude in decimal degrees of the center of a location |
| decimalLatitude | Geographical latitude in decimal degrees of the center of a location |
| coordinateUncertaintyInMeters | Distance from decimalLatitude and decimalLongitude that describes the smallest circle containing the entire Location |
| depthAccuracy | Depth uncertainty, as reported originally |
| country | Country or major administrative unit in which the Location occurs |
| locality | The specific description of the place |
| verbatimDepth | Depth in meters |
| minimumDepthInMeters | Minimum depth in meters |
| maximumDepthInMeters | Maximum depth in meters |
| year | The four-digit year in which the Event occurred |
| month | The two-digit month in which the Event occurred |
| day | The two-digit day in which the Event occurred |
| sourceType | Type of original data source |
| bibliographicCitation | Reference for the resource indicating how this record should be cited |
| bibliographicCitationDOI | Permanent identifier for the original resource |
| flagHumanCuratedDistribution* | Flag for records outside the known distribution of species |
| flagMachineOnLand* | Flag for records occurring over landmasses |
| flagMachineSuitableLightBottom* | Flag for records outside regions with suitable light conditions |
| RecordNotes | Additional comments or notes |
For more information on additional available fields please refer to the Darwin Core Standard[32] permanent repository[34,64] at www.dwc.tdwg.org.
*Potentially incorrect records are flagged as ‘−1’ in the dataset.
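As an illustration of how the three flag fields might be used outside R, the sketch below drops every record that carries a flag. It is a minimal Python example under stated assumptions: the sample rows are hypothetical, the data are assumed to be comma-delimited text, the flag is assumed to be stored as the ASCII string `-1`, and `prune_flagged` is an illustrative helper, not one of the published functions.

```python
import csv
import io

# Flag field names as listed in the dataset field description.
FLAG_FIELDS = (
    "flagHumanCuratedDistribution",
    "flagMachineOnLand",
    "flagMachineSuitableLightBottom",
)
FLAGGED = "-1"  # assumption: flags are stored as the ASCII text -1


def prune_flagged(rows):
    """Keep only rows in which none of the three quality flags equals -1."""
    return [
        row
        for row in rows
        if all((row.get(field) or "").strip() != FLAGGED for field in FLAG_FIELDS)
    ]


# Hypothetical sample in delimited text form (values are not from the dataset).
SAMPLE_CSV = """id,acceptedName,decimalLongitude,decimalLatitude,flagHumanCuratedDistribution,flagMachineOnLand,flagMachineSuitableLightBottom
1,Laminaria digitata,-8.9,37.0,0,0,0
2,Laminaria digitata,-3.7,40.4,0,-1,0
3,Zostera marina,10.5,55.2,0,0,-1
4,Fucus vesiculosus,-9.1,38.7,0,0,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = prune_flagged(rows)
kept_ids = [row["id"] for row in kept]
print(kept_ids)
```

Filtering on a single field instead of all three would allow, for example, keeping records that fail the light criterion while still excluding those reported on land.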
| Measurement(s) | Species • Distribution |
| Technology Type(s) | digital curation |
| Factor Type(s) | geographic location |
| Sample Characteristic - Organism | Fucales • Laminariales • Tilopteridales • Cymodoceaceae • Hydrocharitaceae • Posidoniaceae • Zosteraceae |
| Sample Characteristic - Environment | marine biome |
| Sample Characteristic - Location | global |