Michael Siccha, Michal Kucera.
Abstract
Census counts of marine microfossils in surface sediments represent an invaluable resource for paleoceanography and for the investigation of macroecological processes. A prerequisite for such applications is the provision of data syntheses for individual microfossil groups. Specific to such syntheses is the necessity of taxonomic harmonisation across the constituent datasets, coupled with dereplication of previous compilations. Both of these aspects require expert knowledge, but with the increasing number of records involved in such syntheses, the application of expert knowledge via manual curation is not feasible. Here we present a synthesis of planktonic foraminifera census counts in surface sediment samples, which is taxonomically harmonised, dereplicated and treated for numerical and other inconsistencies. The data treatment is implemented as an objective and largely automated pipeline, allowing us to reduce the initial 6,984 records to 4,205 counts from unique sites and informative technical or true replicates. We provide the final product and document the procedure, which can be easily adopted for other microfossil data syntheses.
Year: 2017 PMID: 28829434 PMCID: PMC5566098 DOI: 10.1038/sdata.2017.109
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Taxonomic categories considered in the ForCenS database.
| Multi-species categories | ||
| 1–41 | ||
| not present in ForCenS | ||
| 42–47 | ||
| Multi-species categories | ||
| | ||
| | ||
| | ||
| Morphotype categories | ||
| 24 A | ||
| 24 B | ||
| 7 C | ||
| 7 D | ||
| 31 C | ||
| 31 D | ||
Synonymisation used in preparing the ForCenS database.
Details of the constituent datasets of the ForCenS database.
| Dataset | Year | Samples | | | Reference |
|---|---|---|---|---|---|
| CLIMAP | 1981 | 375 | 37 | 44 | 26 |
| BUFD | 1999 | 1,265 | 36 | 43 | 27 |
| ATL947 | 2003 | 947 | 31 | 39 | 28 |
| MARGO | 2005 | 3,773 | 39 | 49 | 19–32 |
| Huels | 1999 | 21 | 30 | 34 | 33 |
| Mohtadi | 2005 | 91 | 20 | 22 | 34 |
| Mohtadi | 2007 | 34 | 18 | 20 | 35 |
| Salgueiro | 2008 | 134 | 23 | 25 | 36 |
| Siccha | 2009 | 61 | 31 | 34 | 37 |
| Munz | 2015 | 283 | 31 | 35 | 38 |
ForCenS sample metadata description.
| The name of the sample | |||
| A unique descriptor for the sample | |||
| A binary coded flag for the sample treatment (see | |||
| Sampling device | |||
| Decimal latitude in the range of −90 (90° South) to +90 (90° North). | |||
| Decimal longitude in the range of −180 (180° West) to +180 (180° East). | |||
| Water depth at the sampling site | |||
| A binary coded flag denoting the ocean basin (see | |||
| Upper sediment depth boundary for the sample | |||
| Lower sediment depth boundary for the sample | |||
| Sample_depth_average | Average sediment depth for the sample | ||
| Author of the sample data (or compilation) | |||
| Journal of the publication associated with the sample data | |||
| Year of the publication associated with the sample data | |||
| Digital Object Identifier of the sample data publication | |||
| Digital Object Identifier of the resource from where the sample data was retrieved | |||
| Comment to sample and annotation of any modifications to the sample data | |||
| A binary coded flag denoting the source database of the sample (see | |||
| Variable denoting the original sample data type, 0 for relative abundances, 1 for raw count data | |||
| Minimum number of counted individuals per sample in the study | |||
| Number of counted individuals in the sample |
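The coordinate ranges and the averaged sediment depth described above lend themselves to a simple sanity check. The following is a minimal sketch, not part of the published pipeline: the function names are hypothetical, and computing the average depth as the midpoint of the upper and lower boundaries is an assumption.

```python
from typing import Optional


def valid_coordinates(lat: Optional[float], lon: Optional[float]) -> bool:
    """Return True if both coordinates exist and lie in the documented ranges.

    Latitude must be within -90..+90 and longitude within -180..+180.
    """
    if lat is None or lon is None:
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def average_depth(upper: float, lower: float) -> float:
    """Midpoint of the upper and lower sediment depth boundaries.

    Assumption: the average sediment depth is the arithmetic mean of the
    two boundaries; the source does not state the formula.
    """
    return (upper + lower) / 2.0


if __name__ == "__main__":
    print(valid_coordinates(-45.5, 170.0))
    print(valid_coordinates(None, 10.0))
    print(average_depth(0.0, 2.0))
```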
ForCenS database error flag description.
| Bit | Value | Description |
|---|---|---|
| 1 | 1 | modified |
| 2 | 2 | outlier (not yet implemented) |
| 3 | 4 | dissolution affected (not yet implemented) |
| 4 | 8 | duplicate (see comment) |
| 5 | 16 | taxonomically incorrect (see comment) |
| 6 | 32 | too many unidentified (>5%) |
| 7 | 64 | sum of count data deviates from 100% by more than 5% |
| 8 | 128 | too few counted individuals (<150) |
| 9 | 256 | non-standard sampling device |
| 10 | 512 | no geographical coordinates |
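Because each flag value is a distinct power of two, several conditions can be combined in one integer with a bitwise OR. The sketch below shows how the numerical criteria in the table could be turned into such a flag. It is an illustration only: the function and parameter names are hypothetical, `percentages` is assumed to hold the relative abundances of all categories (including unidentified specimens) in percent, and the 5% deviation is assumed to mean five percentage points.

```python
from typing import Optional, Sequence

# Flag values copied from the error flag table.
TOO_MANY_UNIDENTIFIED = 32   # more than 5% unidentified
SUM_DEVIATES = 64            # sum deviates from 100% by more than 5%
TOO_FEW_INDIVIDUALS = 128    # fewer than 150 counted individuals
NO_COORDINATES = 512         # no geographical coordinates


def numerical_error_flags(
    percentages: Sequence[float],
    unidentified_pct: float,
    n_counted: Optional[int],
    lat: Optional[float],
    lon: Optional[float],
) -> int:
    """Combine the applicable error conditions into one integer flag."""
    flag = 0
    if unidentified_pct > 5.0:
        flag |= TOO_MANY_UNIDENTIFIED
    if abs(sum(percentages) - 100.0) > 5.0:
        flag |= SUM_DEVIATES
    # A missing count (relative-abundance data only) is not flagged here.
    if n_counted is not None and n_counted < 150:
        flag |= TOO_FEW_INDIVIDUALS
    if lat is None or lon is None:
        flag |= NO_COORDINATES
    return flag


if __name__ == "__main__":
    # Sum is 90% and only 120 individuals were counted: 64 + 128
    print(numerical_error_flags([60.0, 30.0], 2.0, 120, 10.0, 20.0))
```

A sample with no issues yields 0, and any single condition can later be tested with a bitwise AND against its value.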
ForCenS database constituent database flag description.
| Bit | Value | Database | Reference |
|---|---|---|---|
| 1 | 1 | CLIMAP | 26 |
| 2 | 2 | Brown University Foraminiferal Database | 27 |
| 3 | 4 | ATL947 | 28 |
| 4 | 8 | MARGO North Atlantic | 29 |
| 5 | 16 | MARGO South Atlantic | 29 |
| 6 | 32 | MARGO Indo-Pacific | 31 |
| 7 | 64 | MARGO Pacific | 30 |
| 8 | 128 | MARGO Mediterranean | 32 |
ForCenS sample metadata description WOA09 basin mask (Data Citation 18)
| Bit | Value | Basin |
|---|---|---|
| 1 | 1 | All oceans |
| 2 | 2 | Atlantic |
| 3 | 4 | North Atlantic |
| 4 | 8 | South Atlantic |
| 5 | 16 | Pacific |
| 6 | 32 | North Pacific |
| 7 | 64 | South Pacific |
| 8 | 128 | Indian Ocean |
| 9 | 256 | Southern Ocean |
| 10 | 512 | Arctic Ocean |
| 11 | 1,024 | Mediterranean Sea |
| 12 | 2,048 | Red Sea |
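A binary coded flag such as the basin mask can be decoded back into its labels by testing each bit in turn. The following is a minimal sketch under the assumption that a sample's flag is the sum of the values of all entries that apply to it; the function name `decode_flag` and the example value 7 are illustrative and not taken from the database.

```python
from typing import Dict, List

# Values and labels copied from the basin mask table.
BASIN_BITS: Dict[int, str] = {
    1: "All oceans",
    2: "Atlantic",
    4: "North Atlantic",
    8: "South Atlantic",
    16: "Pacific",
    32: "North Pacific",
    64: "South Pacific",
    128: "Indian Ocean",
    256: "Southern Ocean",
    512: "Arctic Ocean",
    1024: "Mediterranean Sea",
    2048: "Red Sea",
}


def decode_flag(value: int, labels: Dict[int, str]) -> List[str]:
    """Return the labels of all bits set in value, lowest bit first."""
    known_mask = sum(labels)  # sum of the dict keys, i.e. all defined bits
    if value < 0 or value & ~known_mask:
        raise ValueError(f"flag {value} contains undefined bits")
    return [name for bit, name in sorted(labels.items()) if value & bit]


if __name__ == "__main__":
    # 7 = 1 + 2 + 4
    print(decode_flag(7, BASIN_BITS))
```

The same function works for the error flags and the source database flags when given the corresponding value-to-label mapping.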
Results of the sequential processing of the constituent datasets of ForCenS.
| Values give the number of samples retained in or excluded from the different databases, either cumulatively or for the individual step of database merging. | ||||||||
|---|---|---|---|---|---|---|---|---|
| CLIMAP | 375 | 351 | 24 | — | — | — | — | — |
| ↳ and BUFD | 1,640 | 1,568 | 72 | 37 | 32 | 2 | 3 | — |
| ↳ and ATL947 | 2,587 | 2,340 | 247 | 160 | 157 | 1 | 2 | — |
| ↳ and MARGO | 6,360 | 3,637 | 2,723 | 2,075 | 1,912 | 42 | 111 | 10 |
| ↳ and Additions | 6,984 | 4,205 | 2,779 | — | — | — | — | — |
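The first three numeric columns of this table can be checked for internal consistency. The sketch below assumes, based on the figures in the abstract (6,984 initial records, 4,205 retained), that these columns are the cumulative total, the retained samples and the excluded samples; the column headers are not preserved here, so this reading is an assumption, and the function names are hypothetical.

```python
from typing import List, Tuple

# (step, cumulative total, retained, excluded), copied from the table.
STEPS: List[Tuple[str, int, int, int]] = [
    ("CLIMAP", 375, 351, 24),
    ("and BUFD", 1640, 1568, 72),
    ("and ATL947", 2587, 2340, 247),
    ("and MARGO", 6360, 3637, 2723),
    ("and Additions", 6984, 4205, 2779),
]


def inconsistent_steps(steps: List[Tuple[str, int, int, int]]) -> List[str]:
    """Return the steps where retained + excluded does not equal the total."""
    return [
        name
        for name, total, retained, excluded in steps
        if retained + excluded != total
    ]


def retained_fraction(steps: List[Tuple[str, int, int, int]]) -> float:
    """Fraction of all records retained after the final merging step."""
    _, total, retained, _ = steps[-1]
    return retained / total


if __name__ == "__main__":
    print(inconsistent_steps(STEPS))          # empty list if all rows add up
    print(f"{retained_fraction(STEPS):.1%}")  # share of records retained
```

Under this reading every row adds up, and roughly 60% of the initial records remain in the final compilation.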
Results of the individual processing of the constituent datasets of ForCenS.
| Values give the number of samples retained in or excluded from the different databases. The numbers of samples excluded for the various reasons do not add up to the total number of excluded samples, because an individual sample may be flagged for exclusion for more than one reason (for example, taxonomically invalid and of insufficient numerical quality at the same time). | |||||||
|---|---|---|---|---|---|---|---|
| CLIMAP | 375 | 351 | 24 | 6 | 8 | 12 | — |
| BUFD | 1,265 | 1,254 | 11 | 1 | 10 | — | — |
| ATL947 | 947 | 932 | 15 | 3 | 9 | 5 | — |
| MARGO | 3,773 | 3,170 | 603 | 70 | 46 | 3 | 486 |
| Additions | 624 | 568 | 56 | 43 | 13 | — | — |
Figure 1. Location of all census counts retained in the ForCenS compilation.
Colours denote the sample source, the first occurrence of a sample in a compilation taking precedence over reuse in later compilations.
Figure 2. Location of all census counts excluded from the ForCenS compilation, with colours denoting the reason for exclusion.