Darcy Bird, Lux Miranda, Marc Vander Linden, Erick Robinson, R Kyle Bocinsky, Chris Nicholson, José M Capriles, Judson Byrd Finley, Eugenia M Gayo, Adolfo Gil, Jade d'Alpoim Guedes, Julie A Hoggarth, Andrea Kay, Emma Loftus, Umberto Lombardo, Madeline Mackie, Alessio Palmisano, Steinar Solheim, Robert L Kelly, Jacob Freeman.
Abstract
Archaeologists increasingly use large radiocarbon databases to model prehistoric human demography (also termed paleo-demography). Numerous independent projects, funded over the past decade, have assembled such databases from multiple regions of the world. These data provide unprecedented potential for comparative research on human population ecology and the evolution of social-ecological systems across the Earth. However, these databases have been developed using different sample selection criteria, which has resulted in interoperability issues for global-scale, comparative paleo-demographic research and integration with paleoclimate and paleoenvironmental data. We present a synthetic, global-scale archaeological radiocarbon database composed of 180,070 radiocarbon dates that have been cleaned according to standardized sample selection criteria. This database increases the reusability of archaeological radiocarbon data and streamlines quality control assessments for various types of paleo-demographic research. As part of an assessment of data quality, we conduct two analyses of sampling bias in the global database at multiple scales. This database is ideal for paleo-demographic research focused on dates-as-data, Bayesian modeling, or summed probability distribution methodologies.
Year: 2022 PMID: 35087092 PMCID: PMC8795199 DOI: 10.1038/s41597-022-01118-7
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Database/Dataset name, base LocAccuracy variable, and other relevant information collected.
| Database Name | Pub Year | Base LocAccuracy (see Table | Parent Dataset(s) (Note that this is not complete) | Citation |
|---|---|---|---|---|
| 14CARHU | 2015 | 0 or 1 | N/A | [ |
| 14SEA | 2017 | 0 | None listed | [ |
| aDRAC | 2016 | 1 | None listed | [ |
| Andes14C | 2021 | 2 | Ziolkowski_ | [ |
| AustArch | 2014 | 2 | N/A | [ |
| Bevan2017 | 2017 | 1 | ORAU[ RADON[ EUROEVOL[ CalPal[ Chapple 2019[ | [ |
| CALPAL | 2016 | 1 | N/A | [ |
| Capriles_&_Albarracin-Jordan_2013 | 2013 | 2 | N/A | [ |
| CARD | 2019 | 1 | None listed | [ |
| CONTEXT | 2006 | 1 | Website not maintained | [ |
| Cremaetal2016 | 2016 | 2 | N/A | [ |
| EUROEVOL | 2016 | 2 | RADON[ BANADORA[ | [ |
| Flohretal2016 | 2016 | 1 | CalPal[ CONTEXT[ Ex Orient[ | [ |
| Gayo_CentralChile | 2019 | 2 | SCAR Campbell & Quiroz 2015[ | [ |
| Goldberg_2016 | 2016 | 1 | SCAR Andes14C[ Mendez2013[ Bueno Prates Mendez Rademaker Steele & Politis 2009[ | [ |
| GuedesBocinsky2018# | 2018 | 2 | Wangetal2014[ | [ |
| Jorgensen_2020 | 2016 | 0 | N/A | [ |
| Kay_WestAfrica | 2019 | 1 or 2 | N/A | [ |
| KITEeastafrica | 2016 | 2 | N/A | [ |
| Lombardo_2020 | 2020 | 3 | Caprilesetal2019[ | [ |
| ManningTimpson2014 | 2014 | 2 | Vernet & Aumassip 1992[ | [ |
| MedAfriCarbon | 2020 | 1 through 3 | CalPal[ | [ |
| Mendez2013 | 2013 | 0 | N/A | [ |
| Mendezetal2015 | 2015 | 1 | N/A | [ |
| MesoRAD2020 | 2020 | 1 | N/A | [ |
| Palmisano2017_Italy | 2017 | 1 through 3 | CalPal[ RADON[ EUROEVOL[ ORAU[ IRPA/KIK[ | [ |
| Pratesetal2020# | 2020 | 1 | Pratesetal2013[ | [ |
| RADON | 2012 | 2 | N/A | [ |
| RADON-B | 2014 | 1 | None listed | [ |
| RapaNui2020 | 2020 | 2 | Mulrooney2013[ | [ |
| RirisArroyoKalin2019 | 2019 | 2 | SCAR[ Andes14C[ Goldberg2016[ Bueno Prates Mendez Steele & Politis 2009[ | [ |
| SARD | 2019 | 2 | N/A | [ |
| SCAR | 2015 | 2 | Andes 14C[ Rademaker | [ |
| Silva_VanderLinden_2017 | 2017 | 2 | Flohretal2016[ EUROEVOL[ | [ |
| Solheim_Norway | 2018 | 2 | N/A | [ |
| UWyo2021 | 2011 | 1 | N/A | |
| Vermeersch2019* | 2019 | 1 | CalPal[ | [ |
| Wangetal2014 | 2014 | 1 | N/A | [ |
| Ziolkowski_ | 1994 | 2 | N/A | [ |
*The Vermeersch dataset was added last. Only radiocarbon lab numbers NOT already present in the dataset were added to the raw, uncleaned dataset.
#These datasets did not have Country provided as a variable, but Country was acquired by plotting the geographic coordinates and conducting a spatial join with the Natural Earth countries shapefile.
These are the only databases listed in the “Source” column.
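The rule in the Vermeersch footnote (add only lab numbers not already present) amounts to an anti-join on `LabID`. The following is a minimal sketch of that idea, not the authors' actual procedure: the function name `add_new_lab_ids`, the normalization (trimming whitespace and ignoring case), and the sample lab numbers are all assumptions made for illustration.

```python
def add_new_lab_ids(existing, incoming, key="LabID"):
    """Append only the incoming records whose lab number is not already present."""

    def norm(value):
        # Assumed normalization: trim whitespace and compare case-insensitively.
        return value.strip().upper()

    seen = {norm(row[key]) for row in existing}
    merged = list(existing)
    for row in incoming:
        lab_id = norm(row[key])
        if lab_id not in seen:
            merged.append(row)
            seen.add(lab_id)  # also guards against duplicates within `incoming`
    return merged


# Hypothetical records, invented for this example.
existing = [
    {"LabID": "Beta-1234", "Source": "CalPal"},
    {"LabID": "OxA-5678", "Source": "RADON"},
]
incoming = [
    {"LabID": "beta-1234 ", "Source": "Vermeersch2019"},  # already present
    {"LabID": "GrN-9012", "Source": "Vermeersch2019"},    # new lab number
]

merged = add_new_lab_ids(existing, incoming)
print([row["LabID"] for row in merged])
```

Because the dataset added last only contributes unseen lab numbers, the order in which sources are merged determines which `Source` value a shared date carries.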
Database variable names, descriptions, and whether each variable was required for a date to be included.
| Variable Name | Required? | Description |
|---|---|---|
| LabID | Yes | Unique lab identification for every radiocarbon date |
| Age | Yes | Radiocarbon age |
| Error | Yes | One sigma standard error of the radiocarbon age |
| Material | No | Taken straight from the dataset, no consolidation of materials. |
| Taxa | No | Taken straight from the dataset if they had a separate column. Again, no cleaning or verification process. |
| d13C | No | δ13C value taken straight from the dataset with no cleaning or verification process. Note that there may be many inaccurate “0” values taken from the original dataset, since several datasets used “0” instead of NA or a blank cell. |
| Method | No | Refers to method of radiocarbon dating used, such as AMS or radiometric. Taken straight from the dataset with no cleaning or verification process. |
| Period | No | Archaeological time period. Did not clean or organize. Common in European datasets, generally hit-or-miss elsewhere. |
| SiteID | No | Site identification number. Very useful for US and Canadian sites, otherwise uncommon. |
| SiteName | No | Site name, usually unique to each site within each country. Common for sites outside North America. |
| Long | No | Longitude, preferably in decimal degrees, but degrees, minutes, and seconds also accepted. Any other format was excluded. |
| Lat | No | Latitude, preferably in decimal degrees, but degrees, minutes, and seconds also accepted. Any other format was excluded. |
| LocAccuracy | Yes | Variable created according to each dataset’s described accuracy and verified later. Necessary to prioritize radiocarbon ages that came from more reliable sources (e.g. directly from collector). 0: no specific locational information, only country provided; 1: province/state (non-US) or county (US) locational information (note that the accuracy varies according to how large the country, province, and counties are); 2: very close locational information (within 500 m), including locations digitized from forms and found during internet search; 3: exact location of site provided, with the location collected personally by the source. |
| Country | No | The country listed or provided by a dataset affiliated with the date and verified later. If no country was provided by the dataset, the data were retained but lat/long were not verified except to ensure they were on the appropriate continent. |
| Province | No | Administrative province or state within a country |
| Region | No | Variable generated according to country and province, if available. A broader region of the world. |
| Continent | Yes | Provided by dataset and verified later. Any dates without an affiliated continent (or one that could be determined according to a listed country) were deleted. |
| Source | Yes | The dataset that provided the date. |
| Reference | No | Full reference if available, but short (e.g. author and year) reference also accepted for the radiocarbon date information. Provided by dataset. No verification process, but extra whitespace removed. |
Note that many variables are not required but are very useful.
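The required/optional split in the table above can be checked mechanically. The sketch below assumes records are dictionaries keyed by the variable names in the table; the helper names (`is_missing`, `check_record`, `clean_d13c`), the set of strings treated as missing, and the decision to treat a `d13C` of exactly 0 as unknown are assumptions for illustration, prompted by the table's warning about spurious "0" values.

```python
REQUIRED = ("LabID", "Age", "Error", "LocAccuracy", "Continent", "Source")
MISSING = {"", "NA", "NAN", "NULL"}  # assumed placeholder strings


def is_missing(value):
    """True if a field is absent or holds a common missing-value placeholder."""
    return value is None or str(value).strip().upper() in MISSING


def check_record(row):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing {name}" for name in REQUIRED if is_missing(row.get(name))]
    loc = row.get("LocAccuracy")
    if not is_missing(loc) and str(loc).strip() not in {"0", "1", "2", "3"}:
        problems.append("LocAccuracy outside 0-3")
    return problems


def clean_d13c(value):
    """Parse d13C, treating a literal 0 as unknown (an assumption, see lead-in)."""
    if is_missing(value):
        return None
    number = float(value)
    return None if number == 0 else number


# Hypothetical records, invented for this example.
good = {"LabID": "X-1", "Age": "5000", "Error": "40", "LocAccuracy": "2",
        "Continent": "Europe", "Source": "RADON", "d13C": "-25.1"}
bad = {"LabID": "X-2", "Age": "5000", "Error": "", "LocAccuracy": "5",
       "Continent": "Europe", "Source": "RADON", "d13C": "0"}

print(check_record(good))
print(check_record(bad))
print(clean_d13c(bad["d13C"]))
```

Dropping every zero `d13C` would also discard any genuine zero measurements, so a stricter analysis might flag such values instead of blanking them.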
Fig. 1 Flowchart demonstrating the decision-making process for verifying the location and modifying the LocAccuracy variable accordingly.
Fig. 2 Flowchart used to locate archaeological sites without locational information. Note this method worked best for famous sites.
Data records and availability information.
| Filename | Description | Access | Citation |
|---|---|---|---|
| p3k14c_raw.csv | Manually cleaned raw radiocarbon dataset | Restricted | [ |
| p3k14c_scrubbed.csv | Scrubbed radiocarbon dataset without location obfuscation | Restricted | [ |
| p3k14c_scrubbed_fuzzed.csv | Final radiocarbon dataset, location information obscured | Public | [ |
| p3k14c_graveyard.csv | Data removed from the final dataset during the scrubbing process, including the rationale for removal | Restricted | [ |
Fig. 3 Global map showing locations of all radiocarbon records after the data cleaning process, color-coded by continent. Individual sites are translucent to illustrate site density.
Number of cleaned radiocarbon dates by continent.
| Continent | Raw Dates (including duplicates) | Scrubbed Dates | % Kept | Dated Sites | Mean Dates/Site |
|---|---|---|---|---|---|
| Africa | 14,860 | 11,129 | 74.9 | 3,463 | 3.21 |
| Asia | 21,828 | 14,071 | 64.5 | 2,693 | 5.23 |
| Australia | 3,661 | 3,657 | 99.9 | 1,530 | 2.39 |
| Europe | 119,106 | 77,393 | 65.0 | 21,331 | 3.63 |
| North America | 102,288 | 64,934 | 63.5 | 16,120 | 4.03 |
| Central America | 1,223 | 1,218 | 99.5 | 99 | 12.3 |
| South America | 9,568 | 7,668 | 80.1 | 2,077 | 3.69 |
| Total | 272,534 | 180,070 | 66.1 | 47,313 | 3.81 |
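The two derived columns in the table follow directly from the counts: % Kept is scrubbed dates divided by raw dates, and Mean Dates/Site is scrubbed dates divided by dated sites. A short sketch that recomputes them from the table's own figures is given below; the function name `summarize` and the choice of rounding are assumptions.

```python
# (raw dates, scrubbed dates, dated sites), copied from the table above.
counts = {
    "Africa": (14860, 11129, 3463),
    "Asia": (21828, 14071, 2693),
    "Australia": (3661, 3657, 1530),
    "Europe": (119106, 77393, 21331),
    "North America": (102288, 64934, 16120),
    "Central America": (1223, 1218, 99),
    "South America": (9568, 7668, 2077),
}


def summarize(raw, scrubbed, sites):
    """Return (% kept, mean dates per site), rounded as in the table."""
    return round(100 * scrubbed / raw, 1), round(scrubbed / sites, 2)


# Column sums reproduce the Total row.
total = tuple(sum(column) for column in zip(*counts.values()))

for name, row in counts.items():
    print(name, summarize(*row))
print("Total", total, summarize(*total))
```

Recomputed values can differ from the printed table in the last digit: 1,218 of 1,223 rounds to 99.6%, whereas the table lists 99.5% for Central America, which may reflect truncation rather than rounding.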
Fig. 4 Kernel density estimates highlighting clustering and gaps in different regional/continental records. (a,b) Kernel density estimates for (a) North-Western continental Europe and (b) the Contiguous United States of America. (c,d) Kernel density estimates weighted by the number of 14C dates per site for (c) North-Western continental Europe and (d) the Contiguous United States of America. (e,f) Risk surface analysis for (e) North-Western continental Europe and (f) the Contiguous United States of America. North-Western continental Europe shows significant oversampling of dates in Belgium, the Netherlands, and portions of eastern France, and undersampling across much of eastern Germany; the Contiguous United States shows oversampling across the Great Basin, central Rocky Mountain, central Plains, and New England regions, and undersampling across the Northwest and northern California, southern Southwest, Texas, the American Bottom, and southern Florida regions.
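The difference between the unweighted and date-weighted kernel density estimates in Fig. 4 can be illustrated with a small Gaussian-kernel sketch. This is not the authors' implementation: the function `weighted_kde`, the planar (projected) coordinates, the bandwidth, and the two toy sites are all assumptions, and the risk surface analysis of panels (e,f) is not reproduced here.

```python
import math


def weighted_kde(points, weights, query, bandwidth):
    """Gaussian kernel density at `query` from weighted 2-D points.

    Coordinates are assumed to be planar (already projected), so plain
    Euclidean distance is used.
    """
    total = sum(weights)
    norm = 2 * math.pi * bandwidth ** 2 * total
    density = 0.0
    for (x, y), w in zip(points, weights):
        d2 = (query[0] - x) ** 2 + (query[1] - y) ** 2
        density += w * math.exp(-d2 / (2 * bandwidth ** 2))
    return density / norm


# Two toy sites: one with a single date, one with nine dates.
sites = [(0.0, 0.0), (10.0, 0.0)]
dates_per_site = [1, 9]
equal_weights = [1, 1]

at_heavy_site = (10.0, 0.0)
unweighted = weighted_kde(sites, equal_weights, at_heavy_site, bandwidth=2.0)
weighted = weighted_kde(sites, dates_per_site, at_heavy_site, bandwidth=2.0)
print(unweighted, weighted)
```

With equal weights each site contributes the same mass, so the surface reflects where sites are; weighting by dates per site shifts mass toward intensively dated sites, which is the contrast between panels (a,b) and (c,d).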
Fig. 5 The density of sites versus dates for China (provinces) and Western Africa (countries). Each plot is log-log transformed. The relationship between the spatial density of recorded archaeological sites and dates across each region is sub-linear, indicating that enhanced recording of archaeological resources does not produce a higher density of dates. China shows a super-linear relationship between the density of dated sites and dates, suggesting over-sampling of dates in provinces with higher densities of dated sites; this relationship in Western Africa remains slightly sub-linear.
Fig. 6 The density of dated sites versus dates at continental scale. Each point represents an Administrative Level 1 region (state/province) within the continent. Each plot is log-log transformed. North America and Australia demonstrate an effectively linear relationship between dated site density and date density, while Central America, South America, Europe, Africa, and Asia have super-linear relationships such that regions with a higher density of dated sites have an enhanced number of dates for those sites.
Fig. 7 Focused case study of data in the United States, graphed linearly (left column) or logarithmically (right column). (a,b) Dated sites by county; (c,d) Dated sites by state; (e,f) Recorded Sites by state.
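The sub-linear, linear, and super-linear labels in the Fig. 5 and Fig. 6 captions refer to the slope of a straight line fitted in log-log space: a slope below 1 is sub-linear, a slope near 1 is linear, and a slope above 1 is super-linear. The sketch below fits that slope by ordinary least squares on synthetic data. The fitting method, the function names, and the tolerance of 0.05 are assumptions; the source does not state how the relationships were estimated, and a real analysis would compare the slope's confidence interval with 1 rather than use a fixed tolerance.

```python
import math


def loglog_slope(x, y):
    """Ordinary least-squares slope and intercept of log10(y) on log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def classify(slope, tol=0.05):
    """Label a scaling exponent; `tol` is an arbitrary illustrative threshold."""
    if slope > 1 + tol:
        return "super-linear"
    if slope < 1 - tol:
        return "sub-linear"
    return "approximately linear"


# Synthetic densities (per unit area) following dates = 3 * sites ** 1.2.
site_density = [0.001, 0.01, 0.1, 1.0]
date_density = [3 * s ** 1.2 for s in site_density]

slope, intercept = loglog_slope(site_density, date_density)
print(round(slope, 3), classify(slope))
```

Regions with zero sites or zero dates have no logarithm and would need to be excluded or handled separately before fitting.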
| Measurement(s) | radiocarbon age |
| Technology Type(s) | accelerator mass spectrometer |
| Sample Characteristic - Location | global |