
The Sedimentary Geochemistry and Paleoenvironments Project.

Úna C Farrell¹, Rifaat Samawi², Savitha Anjanappa³, Roman Klykov³, Oyeleye O Adeboye⁴, Heda Agic⁵, Anne-Sofie C Ahm⁶, Thomas H Boag⁷, Fred Bowyer⁸, Jochen J Brocks⁹, Tessa N Brunoir¹⁰, Donald E Canfield¹¹, Xiaoyan Chen¹², Meng Cheng¹³, Matthew O Clarkson¹⁴, Devon B Cole¹⁵, David R Cordie¹⁶, Peter W Crockford¹⁷, Huan Cui^18,19, Tais W Dahl²⁰, Lucas D Mouro²¹, Keith Dewing²², Stephen Q Dornbos²³, Nadja Drabon²⁴, Julie A Dumoulin²⁵, Joseph F Emmings²⁶, Cecilia R Endriga², Tiffani A Fraser²⁷, Robert R Gaines²⁸, Richard M Gaschnig²⁹, Timothy M Gibson⁷, Geoffrey J Gilleaudeau³⁰, Benjamin C Gill³¹, Karin Goldberg³², Romain Guilbaud³³, Galen P Halverson³⁴, Emma U Hammarlund³⁵, Kalev G Hantsoo³⁶, Miles A Henderson³⁷, Malcolm S W Hodgskiss³⁸, Tristan J Horner³⁹, Jon M Husson⁴⁰, Benjamin Johnson⁴¹, Pavel Kabanov²², C Brenhin Keller⁴², Julien Kimmig⁴³, Michael A Kipp⁴⁴, Andrew H Knoll⁴⁵, Timmu Kreitsmann⁴⁶, Marcus Kunzmann⁴⁷, Florian Kurzweil⁴⁸, Matthew A LeRoy³¹, Chao Li¹³, Alex G Lipp⁴⁹, David K Loydell⁵⁰, Xinze Lu⁵¹, Francis A Macdonald⁵, Joseph M Magnall⁵², Kaarel Mänd⁵³, Akshay Mehra⁴², Michael J Melchin⁵⁴, Austin J Miller⁵¹, N Tanner Mills⁵⁵, Chiza N Mwinde⁵⁶, Brennan O'Connell⁵⁷, Lawrence M Och⁵⁸, Frantz Ossa Ossa⁵⁹, Anais Pagès⁶⁰, Kärt Paiste⁶¹, Camille A Partin⁶², Shanan E Peters⁶³, Peter Petrov⁶⁴, Tiffany L Playter⁶⁵, Stephanie Plaza-Torres⁶⁶, Susannah M Porter⁵, Simon W Poulton⁸, Sara B Pruss⁶⁷, Sylvain Richoz⁶⁸, Samantha R Ritzer², Alan D Rooney⁷, Swapan K Sahoo⁶⁹, Shane D Schoepfer⁷⁰, Judith A Sclafani², Yanan Shen¹², Oliver Shorttle³⁸, Sarah P Slotznick⁴², Emily F Smith³⁶, Sam Spinks⁴⁷, Richard G Stockey², Justin V Strauss⁴², Eva E Stüeken⁷¹, Sabrina Tecklenburg², Danielle Thomson⁷², Nicholas J Tosca⁷³, Gabriel J Uhlein⁷⁴, Maoli N Vizcaíno², Huajian Wang⁷⁵, Tristan White⁷, Philip R Wilby²⁶, Christina R Woltz⁵, Rachel A Wood⁷⁶, Lei Xiang⁷⁷, Inessa A Yurchenko⁷⁸, Tianran Zhang⁴², Noah J Planavsky⁷, Kimberly V Lau⁷⁹, David T Johnston²⁴, Erik A Sperling².

Keywords: Earth history; consortium; database; geochemistry; website

Year: 2021 PMID: 34219351 PMCID: PMC9291056 DOI: 10.1111/gbi.12462

Source DB: PubMed Journal: Geobiology ISSN: 1472-4669 Impact factor: 4.216

INTRODUCTION

Geobiology explores how Earth's system has changed over the course of geologic history and how living organisms on this planet are impacted by or are indeed causing these changes. For decades, geologists, paleontologists, and geochemists have generated data to investigate these topics. Foundational efforts in sedimentary geochemistry utilized spreadsheets for data storage and analysis, suitable for several thousand samples, but not practical or scalable for larger, more complex datasets. As results have accumulated, researchers have increasingly gravitated toward larger compilations and statistical tools. New data frameworks have become necessary to handle larger sample sets and encourage more sophisticated or even standardized statistical analyses. In this paper, we describe the Sedimentary Geochemistry and Paleoenvironments Project (SGP; Figure 1), which is an open, community‐oriented, database‐driven research consortium. The goals of SGP are to (1) create a relational database tailored to the needs of the deep‐time (millions to billions of years) sedimentary geochemical research community, including assembling and curating published and associated unpublished data; (2) create a website where data can be retrieved in a flexible way; and (3) build a collaborative consortium where researchers are incentivized to contribute data by giving them priority access and the opportunity to work on exciting questions in group papers. Finally, and more idealistically, the goal was to establish a culture of modern data management and data analysis in sedimentary geochemistry. Relative to many other fields, the main emphasis in our field has been on instrument measurement of sedimentary geochemical data rather than data analysis (compared with fields like ecology, for instance, where the post‐experiment ANOVA (analysis of variance) is customary). 
Thus, the longer‐term goal was to build a collaborative environment where geobiologists and geologists can work and learn together to assess changes in geochemical signatures through Earth history.

FIGURE 1

The Sedimentary Geochemistry and Paleoenvironments Project (SGP) is an open, collaborative consortium focused on understanding how the Earth has changed through time through analyses of large sedimentary geochemical datasets.

With respect to the data product, SGP is focused on assembling a well‐vetted and comprehensive dataset that is tractable to multivariate statistical analyses accounting for multiple geological and methodological biases. Phase 1 of the project, which focused on the Neoproterozoic and Paleozoic, has been completed. Future phases will capture a broader range of geologic time, data types, and geography. The database contains tens of thousands of unpublished data points provided by consortium members, as well as detailed metadata that go beyond what is contained in papers. In many cases, these represent measurements that are tangential to a given published study but still of high utility to database studies; these allow the community to address questions that would be impossible to answer solely with the published data. For instance, in order to use a proxy such as Mo/TOC (total organic carbon) ratios in mudrocks deposited under a euxinic water column, the full suite of trace metal, iron speciation, and total organic carbon data is needed. Likewise, geospatial information is required to account for sampling biases, and many statistical learning approaches cannot accept, or have difficulty with, incomplete geological predictor variables. Ultimately, it is this complete data matrix that will allow for SGP’s most insightful analyses. This paper serves as an introduction to SGP, the process by which our data products are created, a description of the Phase 1 data product and a citable reference for that product, a description of the SGP website and API (Application Programming Interface) for open access, and a statement of our future goals.

WHY SGP?

In recent years, there has been a welcome trend in the broader geochemical community toward increased data accessibility, documentation of sample context, and sample curation, albeit with challenges still ahead (Brantley et al., 2020; Cutcher‐Gershenfeld et al., 2016; Planavsky et al., 2020). First, progress has been made through journals and organizations adopting stringent data archiving rules and promoting adherence to FAIR principles—findability, accessibility, interoperability, and reusability (“FAIR Play in Geoscience Data,” 2019; Wilkinson et al., 2016). Second, several databases now house geochemical data at different scales and with different focuses (Brantley et al., 2020; Gard et al., 2019; He et al., 2019; Lehnert et al., 2000). Among the largest and most active are projects such as EarthChem (earthchem.org), the Geobiodiversity Database (geobiodiversity.com), Pangaea (https://www.pangaea.de), and the StabisoDB (https://cnidaria.nat.uni‐erlangen.de/stabisodb/). The SGP database was built with the data structures and standards of these other projects in mind, in keeping with FAIR principles and with the hope that data can be easily shared in the future. Consistent with the stance taken by other organizations in the community (Hanson, 2016), we also strongly encourage all members to register their samples for an International Geo Sample Number (IGSN; i.e., globally unique alphanumeric sample identifiers), which can be obtained from the System for Earth Sample Registration (www.geosamples.org). However, SGP is a domain‐specific project that differs from other databases in the way the data are collected, the nature of the data collected, and the tailored way in which they are presented to our research community. Specifically, SGP is focused on addressing how geochemical proxy records change through deep time. 
Central to these goals are the following:

(1) Compilation of a large quantity (i.e., millions of records) of sedimentary geochemical data spanning deep time.
(2) Appropriate age models (with uncertainty), especially for Proterozoic/Archean samples.
(3) Information on interpreted depositional environment and specific rock type.
(4) Information necessary to gauge whether samples are likely to preserve primary, environmental geochemical signals.
(5) Detailed methodological information on how the data were generated.
(6) An ability to download the data of interest flexibly and easily.

Although some other databases contain sedimentary geochemical data, the vast majority of deep‐time data is not available from any single source, and samples are not readily associated with critical contextual data—such as age constraints and environmental data—necessary for the types of proxy‐through‐time and/or environmental studies typically conducted in historical geobiology. When the SGP was founded in 2015, we believed that a “team science” philosophy would be the most effective way to move beyond spreadsheets to the type and abundance of data required. The research consortium framework we have implemented is modeled after mature consortia in human statistical genetics, such as the Psychiatric Genomics Consortium (PGC). In the PGC, researchers have aggregated data to make statistically robust observations and landmark findings not possible with the data generated by any single research group alone (Duncan et al., 2017; Schizophrenia Working group of the Psychiatric Genomics Consortium, 2014; Wray et al., 2018). Similar to biomedical research consortia, we hope that the intellectual and collaborative environment fostered by SGP will ultimately be as important as our data products or specific insights in research papers. 
The first priority for Phase 1 of SGP was to assemble or generate multi‐proxy sedimentary geochemical data (carbon and sulfur abundances and isotopes, iron speciation, major and trace metal abundances, and trace metal isotopes, primarily from fine‐grained siliciclastic rocks) from multiple regions worldwide for every Paleozoic Epoch and equivalent ~25 Myr Neoproterozoic time slice. In addition to data compilation, this has involved an effort by SGP members to generate new geochemical data from “background” intervals in the Paleozoic (i.e., not associated with events such as mass extinctions or significant climatic shifts). The first phase of data collection came to an end in 2019. At that point, a copy of the database was vetted by SGP team members and then archived—the first data “freeze” (following the best‐practices approach used in medical consortia). Working groups were formed (with working group leadership established through an open call to SGP team members), and data were made available to Working group analysts via the website and through tailored queries. The first working group papers have recently been published (LeRoy et al., 2021; Lipp et al., 2021; Mehra et al., 2021), and more are in progress. Meanwhile, data collection continues, and the Phase 2 goal is to include more Mesozoic–Cenozoic and pre‐Neoproterozoic time intervals and to expand the geochemical record to more diverse lithologies and grain‐specific phases. The Phase 2 data freeze is currently anticipated for 2023, followed by data vetting and analyses toward group papers.

DATABASE

SGP utilizes a relational database implemented with the PostgreSQL database management system. A full database diagram and documentation are available at https://github.com/ufarrell/sgp_phase1, and a simplified diagram is shown in Figure 2. The design was inspired by several existing data models in the geological and natural history museum communities. Tables for analytical geochemistry are from the British Geological Survey (BGS) geochemistry data model (Watson et al., 2014), with minor modifications. Tables for geological, geographical, and sample details are based on established museum collection management databases (Specify 6 https://www.specifysoftware.org/ and Arctos https://arctosdb.org/) in addition to the Observations Data Model 2 (ODM2, Horsburgh et al., 2016; Hsu et al., 2017), an information model for Earth observations.

FIGURE 2

Simplified schema showing tables and table relationships in the SGP database (https://ufarrell.github.io/sgp_phase1/ for a detailed description). Tables are grouped according to the kind of information they store. Analytical tables (orange) are from the British Geological Survey model (Watson et al., 2014). Geographical, geological (green), and sample (red) tables are primarily based on natural history museum databases. “Housekeeping” tables (purple) record information such as how samples are grouped into projects, where they are stored, and who has contributed contextual information.

The SGP database is centered on the sample table (Figure 2). Samples are generally characterized by an individual rock sample and all resulting analyzed powders. The three key sections of the database linked to samples are (1) analytical results and associated methods, (2) geographical context, and (3) geological context. Dictionary tables (standardized lists of terms, also known as “controlled vocabularies”) are based on existing community vocabularies where possible (e.g., from EarthChem, ODM2, Macrostrat, U.S. Geological Survey (USGS), and BGS). However, in many cases, these vocabularies required additions, such as the inclusion of specific sedimentary geochemical experimental methods (e.g., sequential iron extraction techniques; Poulton & Canfield, 2005). The BGS data model for analytical methods and geochemical results has been adopted almost without modification. We store analytical data in their submitted or published format and do not standardize the results to any given unit. An analytical result may be empty (NULL) only if it is below or above detection limits, and those values are also stored if they are available. If the results are published, they are linked directly to a reference work on an individual basis so that a fine‐level distinction can be made between published and related unpublished data from the same samples. 
Any geostandards that are analyzed alongside samples in a study are also recorded. In the SGP, we make every effort not to include the same result twice. However, replicates may legitimately be added if the same sample has undergone analysis for the same analyte more than once (this could include anything from true replicate analyses using the same methods in the same laboratory to analyses of the same sample by different research groups using different methods). We do not currently assign new sample identifiers to sub‐samples. A parent–child relationship may be added in Phase 2 when the focus will expand to include carbonate data.
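
As a rough illustration of the sample‐centered design described above, the following sketch uses Python's built‐in sqlite3 module rather than PostgreSQL. All table and column names are hypothetical and heavily simplified; the actual schema is the one documented in the project repository. The sketch shows one sample with a published result, an unpublished replicate, and a result stored as NULL together with its detection limit.

```python
import sqlite3

# Hypothetical, heavily simplified tables; the names are illustrative only
# and do not reproduce the real SGP schema.
SCHEMA = """
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    original_name TEXT NOT NULL,
    interpreted_age_ma REAL
);
CREATE TABLE reference_work (
    reference_id INTEGER PRIMARY KEY,
    citation TEXT NOT NULL
);
CREATE TABLE analytical_result (
    result_id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    analyte TEXT NOT NULL,
    value REAL,              -- NULL only when outside detection limits
    unit TEXT NOT NULL,      -- stored as submitted, not standardized
    detection_limit REAL,    -- kept when available
    reference_id INTEGER REFERENCES reference_work(reference_id)  -- NULL if unpublished
);
"""


def build_demo_db():
    """Create an in-memory database holding one made-up sample."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sample VALUES (1, 'DEMO-01', 540.0)")
    conn.execute("INSERT INTO reference_work VALUES (1, 'placeholder citation')")
    rows = [
        (1, 1, "Mo", 12.0, "ppm", None, 1),     # published result
        (2, 1, "Mo", 14.0, "ppm", None, None),  # unpublished replicate
        (3, 1, "Ag", None, "ppm", 0.5, 1),      # below detection limit
    ]
    conn.executemany(
        "INSERT INTO analytical_result VALUES (?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def results_for_sample(conn, sample_id):
    """Return every stored result for a sample, flagging published ones."""
    query = """
        SELECT analyte, value, unit, reference_id IS NOT NULL AS published
        FROM analytical_result
        WHERE sample_id = ?
        ORDER BY result_id
    """
    return conn.execute(query, (sample_id,)).fetchall()
```

Linking each result, rather than each sample, to a reference is what allows published and unpublished values from the same sample to be told apart in this sketch.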

DATA COLLECTION

The SGP welcomes contributions from any interested researchers. Specifically, contributing data automatically makes a researcher part of the SGP Collaborative Team, rather than one needing to “join” SGP to contribute data. In the first consortium‐building stage, potential collaborators were targeted if their work was particularly relevant to the Phase 1 goals, and additional researchers were recruited via SGP representation at multiple conferences. SGP collaborators are involved in providing details about their samples and providing published data tables and unpublished data from their own archives. In addition, some data have been collected from relevant published studies where the authors are not directly involved. In such cases, contextual information was coded by SGP team members using information provided in the paper. SGP collaborators are asked to fill in a template with contextual information as completely as possible, but with an emphasis on key fields such as modern latitude and longitude, stratigraphic unit name, depositional environment, and lithology. A particularly important field is interpreted age, which is a numerical estimate for the age of each sample in millions of years (Ma). Whenever possible, the original authors, who are most familiar with the samples and stratigraphic sections, are asked to provide the interpreted age. They can use whatever method with which they feel most comfortable; for example, ages may be estimated based on assumed sedimentation rates and/or linear interpolation, or groups of samples can be assigned one age based on proximity to any available time markers. A brief justification is required for each age provided, which may be used in the future to refine ages further. Maximum and minimum age estimates can also be stored, and indeed, are critical for the type of re‐weighted bootstrap analyses employed by many SGP working groups (Mehra et al., 2021). 
A subset of samples from two USGS databases has been integrated into the SGP database. The first of the databases used is the National Geochemical Database: Rock (USGS NGDB, U.S. Geological Survey, 2008), comprising data from USGS projects from the 1960s to 1990s, largely from North America. The second is the Global Geochemical Database for Critical Metals in Black Shales project (USGS CMIBS, Granitto et al., 2017), which includes predominantly Phanerozoic shale data from all continents. Data from both USGS databases lack much of the contextual information available for samples directly coded by the SGP team members (most specifically basin type, metamorphic/maturity grade, depositional environment, and detailed age justification) and there is a higher proportion of analytes with less detailed geochemical methodology. Nevertheless, they represent large numbers of samples (74% of samples in Phase 1 are from USGS sources) with age, lithology, and geographic information that can be utilized for many types of analysis. In the case of USGS NGDB, only sedimentary samples were incorporated into SGP, and in the case of USGS CMIBS, we did not include samples with lithologies indicative of ore or studies where the authors were primarily concerned with mineral deposits or studying the effects of metamorphism on shales. An attempt was made to match USGS fields to SGP fields, with some data cleaning needed in order to extract important information such as up‐to‐date stratigraphic names. Samples can easily be traced back to the original USGS databases using their original identifiers. The USGS NGDB data were enhanced by adding interpreted ages. Samples were matched, using a combination of stratigraphy and location, to the continuous‐time age model in Macrostrat (Peters et al., 2018). Specifically, the minimum and maximum age estimates from the Macrostrat model were entered, and the interpreted age was entered as the average of these values. 
Only samples with matched interpreted ages were included from USGS NGDB. The USGS CMIBS samples were associated with Macrostrat continuous‐time age models where possible and given age information by SGP team members where not. However, a proportion (36%) remain without ages, and filling those in is a key goal for Phase 2. These three sources of data (direct entry by SGP team members (26% of samples), the CMIBS compilation (16% of samples), and the USGS NGDB (58% of samples)) provide a robust base platform for statistical analyses of aggregated sedimentary geochemical data through Earth history. Moving forward, we will continue direct entry from SGP team members, and work toward incorporating geochemical data compiled by additional geological surveys (for instance, incorporation of the OZCHEM whole‐rock database from Geoscience Australia is currently in progress).
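
The age assignment described above for USGS NGDB samples (interpreted age entered as the average of the minimum and maximum Macrostrat ages) can be sketched as follows. The function names are hypothetical. The uniform draw between the age bounds is only one simple way in which minimum and maximum ages might feed a resampling analysis; it is an assumption made here for illustration and not the weighting scheme of Mehra et al. (2021).

```python
import random


def interpreted_age(min_age_ma, max_age_ma):
    """Midpoint of the minimum and maximum age estimates, in Ma."""
    if min_age_ma > max_age_ma:
        raise ValueError("min_age_ma must not exceed max_age_ma")
    return (min_age_ma + max_age_ma) / 2.0


def resample_ages(bounds, n_draws, seed=None):
    """Draw n_draws age sets, one age per sample, within each sample's bounds.

    bounds: list of (min_age_ma, max_age_ma) tuples, one per sample.
    Assumption: age uncertainty is uniform between the two bounds.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(low, high) for low, high in bounds]
        for _ in range(n_draws)
    ]
```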

DATA DESCRIPTION PHASE 1

Phase 1 of data collection ended in August 2019. A static version of the database was archived and made available to collaborators through the website (sgp‐search.io) and via tailored queries. Time was allowed for vetting, and any errors discovered were corrected before the final freeze in February 2020. The Phase 1 data freeze includes 82,578 samples, with 2,701,236 analytical results, and was made public through our search website in December 2020. This paper should be cited in the future use of Phase 1 data downloads. More complete information on the Phase 1 data product can be found on the SGP wiki (https://github.com/ufarrell/sgp_phase1/wiki), including summaries by age, lithology, and geochemical methodology, as well as the specifics of how USGS databases were incorporated into the SGP structure.

SGP

The SGP‐contributed dataset includes 20,811 samples with 518,291 results. Approximately two thirds of the data (64%) come from 160 published sources (https://github.com/ufarrell/sgp_phase1/wiki/SGP‐data‐references). The remaining 36% are from unpublished sources, including new and legacy data. The samples come from 942 individual sites from 46 countries (Figure 3). Consistent with the Phase 1 goals, 84% of samples were from the Neoproterozoic–Paleozoic (Figure 4). Sixty‐four percent of samples are fine‐grained siliciclastic rocks (shale, mudstone, or siltstone), as are the majority of uncoded lithologies (Figure 5).

FIGURE 3

Geographic distribution of samples in the Phase 1 dataset, separated by our three main data sources (SGP direct entry, USGS CMIBS, and USGS NGDB)

FIGURE 4

Distribution by age and continent for SGP direct entry data (a). Distribution by age for SGP, USGS CMIBS, and USGS NGDB data (a small number of samples (489) with ages >2500 Ma are not included in the figure) (b)

FIGURE 5

Representation of lithologies in the Phase 1 dataset. Note that most unclassified samples from SGP direct entry and USGS CMIBS will be fine‐grained clastic rocks (e.g., shale), whereas USGS NGDB unclassified samples are more heterogeneous

USGS NGDB

The data from USGS NGDB that are incorporated into the SGP database include 48,234 samples with 1,769,696 results. Nearly all (99%) of the samples are from the United States. Nineteen percent are sandstone, 13% are shale, and 29% do not have a specific lithology (although lithological details may be available in verbatim fields; Figure 5). Contextual details, including depositional environment and low‐grade metamorphic bin, are mostly not available for these samples, and methodological information is sparse. In general, the USGS NGDB samples skew younger than the SGP samples: 39% are from the Paleozoic, 25% from the Mesozoic, and 33% from the Cenozoic (~3% of samples are from the Proterozoic/Archean). The USGS database provides excellent coverage of the United States, but given the remit of the organization, with strong focus on economic deposits (petroleum‐producing units, phosphatic units, and sedimentary mineral deposits), the sampling may not be representative of the entire country. This is distinct from the bias present in geochemical data produced by academic researchers, which are often focused on mass extinction intervals, Earth system perturbations, and other stratigraphic boundaries.

USGS CMIBS

The data incorporated from USGS CMIBS into the SGP database include 12,797 samples with 409,188 results. The samples are from 45 countries, with 40% from Canada, 27% from the United States, and 13% from Australia. The majority of samples are fine‐grained siliciclastic sediments (69% shale, mudstone, siltstone, or argillite; Figure 5). Sixty percent of samples with interpreted ages are Paleozoic, 24% are Mesozoic, 2% are Cenozoic, and 15% are Proterozoic/Archean. As was the case for USGS NGDB, contextual details, including depositional environment and low‐grade metamorphic bin, are often missing for these samples. However, more detailed geochemical methodological information is available. Each sample in CMIBS has a “best value” result per analyte, selected from multiple values that were originally available (Granitto et al., 2017). The choice of “best value” was made using a rubric which included consideration of the sample weight, the sample “decomposition” (e.g., full vs. partial acid digestion), the instruments used in the analysis, and the detection limits (Granitto et al., 2013).

DATA PRESENTATION AND ACCESS

The SGP search website (sgp‐search.io) utilizes an intuitive user interface to query the Phase 1 database via an API. The two main search types are “samples” and “analyses,” with “nhhxrf” simply being a “samples” search that excludes any handheld XRF (X‐ray fluorescence) data. This methodological distinction is made because while handheld XRF data can be accurate for some elements (e.g., Ca and Fe), it is highly inaccurate for many others (e.g., S, Ni) (Rowe et al., 2012). Handheld XRF data represent 1% of the total results and 4% of SGP‐contributed data; although this is a small percentage now, we anticipate continued growth given the popularity and utility of handheld XRFs. A “samples” search will list an individual sample on each row, with geological context information and geochemical analytes taking up the columns. Data are converted to one standard unit, and oxides are converted to elements (e.g., Al2O3 to Al), and values are averaged if more than one analysis was made per sample. Note, this search may average values produced using different analytical methods, although the number of samples in the database with multiple analytical values for a specific analyte is relatively small. Further, any analyses below or above detection limit are removed, as these cannot be averaged. This has implications for queries involving very low abundance elements (e.g., Ag in sedimentary rocks), as only results above detection limits, and thus higher values, will be included. We anticipate that this search will produce the optimal data output for most end‐users interested in Earth history: a file with age, geological context, and geochemical data for each sample. If users are looking to delve deeper into the data and understand the analyses and procedures that were executed to obtain each sample's geochemical data, then the “analyses” search is useful because it lists every analysis recorded in the database in a separate row. 
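
The processing behind a “samples” search described above (oxide‐to‐element conversion, removal of results outside detection limits, and averaging of multiple analyses) can be sketched as below. This is an illustration under stated assumptions, not the website's actual code: the function names and input layout are hypothetical, and the conversion factor is computed from rounded standard atomic masses.

```python
# Rounded standard atomic masses (g/mol).
ATOMIC_MASS = {"Al": 26.982, "O": 15.999}


def al2o3_to_al(al2o3_wt_pct):
    """Convert Al2O3 (wt%) to elemental Al (wt%) using the mass fraction of Al."""
    al_mass = 2 * ATOMIC_MASS["Al"]
    oxide_mass = al_mass + 3 * ATOMIC_MASS["O"]
    return al2o3_wt_pct * al_mass / oxide_mass


def average_per_sample(results):
    """Average one analyte per sample.

    results: iterable of (sample_id, value_or_None) pairs that already share
    one analyte and one unit. None marks a result outside detection limits;
    such results are dropped because they cannot be averaged.
    """
    grouped = {}
    for sample_id, value in results:
        if value is not None:
            grouped.setdefault(sample_id, []).append(value)
    return {sid: sum(vals) / len(vals) for sid, vals in grouped.items()}
```

In this sketch a sample whose only results fall outside detection limits disappears from the output entirely, which mirrors the caveat above about very low abundance elements.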
The “analyses” search also allows users to show data relating to the laboratory where the sample was analyzed, the person who made the measurement, geochemical methodology, etc. At the current time, aside from the ability to exclude handheld XRF data, the “samples” and “nhhxrf” search types will not report information about, or have the ability to filter by, geochemical methodology. Users who are interested in methodological details or who would like to export a data file beyond the size limit (10 Mb) should contact the SGP Leadership Team regarding a custom SQL query. Once the user has selected a search type, samples can be filtered based on both geological context and geochemical attributes. Note that for many samples some aspects of geological contextual information are incomplete. Thus, for example, a search filtering for samples deposited in a rift basin will only return samples positively described as such and not necessarily all samples in the database deposited in rift basins. Given that samples will have non‐overlapping missing data, too many filters may result in a smaller‐than‐expected dataset. Search results will appear in a “preview” window that can be used to check the output. Each sample also has an information icon associated with it; clicking this icon will bring up a lightbox with detailed sample information. Finally, the user may request to show reference information for their search. For “analyses” searches (where every analysis is shown as an individual row), this will return the specific literature citation for that individual analytical result. For other search types, this will return, for every sample, a concatenated list of all references whose geochemical data contributed to that specific search. When the user is satisfied with their search, they can then download a .csv file of the data and export a map showing the location and age of samples in their search. 
The SGP website uses an API to interact with the database, and users can make a copy of the API call using the API icon next to their search results. However, users can also bypass the user interface entirely and access data via a direct API call. This comprises three parts:

type: Selects the search type (samples, analyses or nhhxrf)
filters: Contains a list of search options that are logically ANDed in the results
show: Contains search options that determine which columns will appear in the results

Thus, an example API call would be:

{"type":"samples","filters":{"country":["Argentina","Brazil","Chile","Bolivia","Colombia","Venezuela"],"toc":[2,100]},"show":["toc","fe","height_meters","section_name","country","interpreted_age"]}

This API call is making a “samples” type search for samples that originate from Argentina, Brazil, Chile, Bolivia, Colombia, or Venezuela and have 2%–100% total organic carbon (TOC) content. In other words, searching for organic‐rich samples from South America. In addition, the API call is asking for a results output table with columns that show TOC (wt%), Fe (wt%), section or core name, collection height in meters, each sample's country, and the age in millions of years. Full documentation and a tutorial video are available on the website.
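
The three‐part call described above can be assembled programmatically. The sketch below only builds and serializes the query object; the endpoint URL and HTTP method are not given in this paper, so sending the request is left out, and the helper name build_query is hypothetical. The website documentation should be consulted for the actual request details.

```python
import json

# Search types named in the text.
VALID_TYPES = {"samples", "analyses", "nhhxrf"}


def build_query(search_type, filters, show):
    """Assemble the type / filters / show structure of an API call."""
    if search_type not in VALID_TYPES:
        raise ValueError(f"unknown search type: {search_type!r}")
    return {"type": search_type, "filters": dict(filters), "show": list(show)}


# Reproduces the example call from the text: organic-rich samples
# (2-100 wt% TOC) from six South American countries.
query = build_query(
    "samples",
    {
        "country": ["Argentina", "Brazil", "Chile", "Bolivia", "Colombia", "Venezuela"],
        "toc": [2, 100],
    },
    ["toc", "fe", "height_meters", "section_name", "country", "interpreted_age"],
)

# Compact JSON text, ready to be passed to whichever HTTP client is used.
payload = json.dumps(query, separators=(",", ":"))
```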

FUTURE GOALS AND DIRECTIONS

The overarching goal of SGP was to provide intellectual and geoinformatic resources for the Earth Science community to advance our understanding of environmental changes on Earth through time. A better understanding of Earth's history requires sufficient data density but, equally important, it requires training a new generation of researchers with the data science and statistical skills to draw meaningful conclusions from large sedimentary geochemical datasets. Much of the focus in SGP Phase 1 was on initiating the consortium and growing the data product to the point where it was useful for analyses by the community. We now aim to move toward developing a community‐initiated set of best practices for data management, a culture of publishing metadata, and a shared intellectual framework for analyzing such datasets. Over the course of Phase 2, we plan to continue holding annual meetings at Goldschmidt while also beginning regular video calls to share progress and ideas for data analysis. We will also develop accessible "Proxy Primer" videos to help the geobiological community understand the strengths and weaknesses of different proxies.

Beyond these broad community and educational goals, we have the following more concrete goals for SGP Phase 2:

Expand the geological and geographic scope of samples in our database. Most samples with complete context information (SGP direct entry), and indeed most samples overall, are Neoproterozoic–Paleozoic in age and from North America (Figure 4). Younger and older samples, and worldwide sampling, are necessary for accurate analyses through the full swath of Earth history.

Expand the carbonate geochemical record. Our database structure is appropriate for carbonate data (and indeed, >8000 carbonate samples are already in the database). However, this goal will require community discussion regarding how best to incorporate methodologies and phase‐specific analyses.

Continue correcting errors in previously entered data. Although we have been as careful as possible during data entry, mistakes are inevitable in a dataset of this size. Paleobiological analyses and basic statistical logic suggest that such mistakes will not affect results as long as they are random (random error) rather than biased in one direction (systematic error) (Sepkoski, 1993). Nonetheless, we would like to present the most accurate results, and we invite users to notify us of true errors (rather than geologic disagreements) that they find during their database searches.

Continue developing the SGP search website and API to best serve the sedimentary geochemistry and Earth history communities.

Expand the community and user group. Anyone who is interested in contributing to the project is welcome, and helping the community grow our data resource is the only requirement to join the SGP Collaborative Team. Details, including contact information and sample submission templates, are available at https://sgp.stanford.edu/. We want SGP to be a hub for deep‐time sedimentary geochemical research, and researchers from diverse backgrounds, early‐career researchers, and researchers working or studying outside Europe and North America (where the bulk of SGP members reside) are especially invited to become involved.

Echoing this final point, we reiterate that the SGP is a community‐oriented research consortium, and we welcome suggestions on how best to move toward our shared goals.
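
The distinction drawn above between random and systematic data-entry errors can be illustrated with a small simulation. The following is a minimal sketch on synthetic values only: the function name, the 5% error rate, and the chosen distributions are illustrative assumptions, and none of it is taken from the SGP database or from Sepkoski (1993). It corrupts the same fraction of entries in two ways, once with mistakes that are equally likely to be too high or too low, and once with mistakes that are always too high, and then compares the resulting means with the mean of the uncorrupted values.

```python
import random
import statistics


def simulate_entry_errors(n=20000, error_rate=0.05, seed=0):
    """Compare the effect of unbiased and biased entry errors on a mean.

    All values are synthetic; parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Hypothetical "true" measurements in arbitrary units.
    true_values = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]

    with_random_error = []
    with_systematic_error = []
    for value in true_values:
        if rng.random() < error_rate:
            mistake = rng.gauss(0.0, 1.0)
            # Random error: the mistake is as likely to raise as to lower the value.
            with_random_error.append(value + mistake)
            # Systematic error: the same mistake, but always pushing upward.
            with_systematic_error.append(value + abs(mistake))
        else:
            with_random_error.append(value)
            with_systematic_error.append(value)

    return {
        "true": statistics.fmean(true_values),
        "random": statistics.fmean(with_random_error),
        "systematic": statistics.fmean(with_systematic_error),
    }


if __name__ == "__main__":
    means = simulate_entry_errors()
    for label, mean in means.items():
        print(f"{label:>10}: {mean:.4f}")
```

Under these assumptions the mean of the randomly corrupted values stays very close to the true mean, while the systematically corrupted values shift it noticeably. Random mistakes still add scatter, so correcting them remains worthwhile even when summary statistics are robust to them.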

7 in total

1. Ten years in the library: new data confirm paleontological patterns.

Authors: J J Sepkoski
Journal: Paleobiology Date: 1993 Impact factor: 2.892

2. Store and share ancient rocks.

Authors: Noah Planavsky; Ashleigh Hood; Lidya Tarhan; Shuzhong Shen; Kirk Johnson
Journal: Nature Date: 2020-05 Impact factor: 49.962

3. Significant Locus and Metabolic Genetic Correlations Revealed in Genome-Wide Association Study of Anorexia Nervosa.

Authors: Laramie Duncan; Zeynep Yilmaz; Helena Gaspar; Raymond Walters; Jackie Goldstein; Verneri Anttila; Brendan Bulik-Sullivan; Stephan Ripke; Laura Thornton; Anke Hinney; Mark Daly; Patrick F Sullivan; Eleftheria Zeggini; Gerome Breen; Cynthia M Bulik
Journal: Am J Psychiatry Date: 2017-05-12 Impact factor: 18.112

4. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression.

Authors: Naomi R Wray; Stephan Ripke; Manuel Mattheisen; Maciej Trzaskowski; Enda M Byrne; Abdel Abdellaoui; Mark J Adams; Esben Agerbo; Tracy M Air; Till M F Andlauer; Silviu-Alin Bacanu; Marie Bækvad-Hansen; Aartjan F T Beekman; Tim B Bigdeli; Elisabeth B Binder; Douglas R H Blackwood; Julien Bryois; Henriette N Buttenschøn; Jonas Bybjerg-Grauholm; Na Cai; Enrique Castelao; Jane Hvarregaard Christensen; Toni-Kim Clarke; Jonathan I R Coleman; Lucía Colodro-Conde; Baptiste Couvy-Duchesne; Nick Craddock; Gregory E Crawford; Cheynna A Crowley; Hassan S Dashti; Gail Davies; Ian J Deary; Franziska Degenhardt; Eske M Derks; Nese Direk; Conor V Dolan; Erin C Dunn; Thalia C Eley; Nicholas Eriksson; Valentina Escott-Price; Farnush Hassan Farhadi Kiadeh; Hilary K Finucane; Andreas J Forstner; Josef Frank; Héléna A Gaspar; Michael Gill; Paola Giusti-Rodríguez; Fernando S Goes; Scott D Gordon; Jakob Grove; Lynsey S Hall; Eilis Hannon; Christine Søholm Hansen; Thomas F Hansen; Stefan Herms; Ian B Hickie; Per Hoffmann; Georg Homuth; Carsten Horn; Jouke-Jan Hottenga; David M Hougaard; Ming Hu; Craig L Hyde; Marcus Ising; Rick Jansen; Fulai Jin; Eric Jorgenson; James A Knowles; Isaac S Kohane; Julia Kraft; Warren W Kretzschmar; Jesper Krogh; Zoltán Kutalik; Jacqueline M Lane; Yihan Li; Yun Li; Penelope A Lind; Xiaoxiao Liu; Leina Lu; Donald J MacIntyre; Dean F MacKinnon; Robert M Maier; Wolfgang Maier; Jonathan Marchini; Hamdi Mbarek; Patrick McGrath; Peter McGuffin; Sarah E Medland; Divya Mehta; Christel M Middeldorp; Evelin Mihailov; Yuri Milaneschi; Lili Milani; Jonathan Mill; Francis M Mondimore; Grant W Montgomery; Sara Mostafavi; Niamh Mullins; Matthias Nauck; Bernard Ng; Michel G Nivard; Dale R Nyholt; Paul F O'Reilly; Hogni Oskarsson; Michael J Owen; Jodie N Painter; Carsten Bøcker Pedersen; Marianne Giørtz Pedersen; Roseann E Peterson; Erik Pettersson; Wouter J Peyrot; Giorgio Pistis; Danielle Posthuma; Shaun M Purcell; Jorge A Quiroz; Per Qvist; John P Rice; Brien P Riley; Margarita Rivera; Saira Saeed Mirza; Richa Saxena; Robert Schoevers; Eva C Schulte; Ling Shen; Jianxin Shi; Stanley I Shyn; Engilbert Sigurdsson; Grant B C Sinnamon; Johannes H Smit; Daniel J Smith; Hreinn Stefansson; Stacy Steinberg; Craig A Stockmeier; Fabian Streit; Jana Strohmaier; Katherine E Tansey; Henning Teismann; Alexander Teumer; Wesley Thompson; Pippa A Thomson; Thorgeir E Thorgeirsson; Chao Tian; Matthew Traylor; Jens Treutlein; Vassily Trubetskoy; André G Uitterlinden; Daniel Umbricht; Sandra Van der Auwera; Albert M van Hemert; Alexander Viktorin; Peter M Visscher; Yunpeng Wang; Bradley T Webb; Shantel Marie Weinsheimer; Jürgen Wellmann; Gonneke Willemsen; Stephanie H Witt; Yang Wu; Hualin S Xi; Jian Yang; Futao Zhang; Volker Arolt; Bernhard T Baune; Klaus Berger; Dorret I Boomsma; Sven Cichon; Udo Dannlowski; E C J de Geus; J Raymond DePaulo; Enrico Domenici; Katharina Domschke; Tõnu Esko; Hans J Grabe; Steven P Hamilton; Caroline Hayward; Andrew C Heath; David A Hinds; Kenneth S Kendler; Stefan Kloiber; Glyn Lewis; Qingqin S Li; Susanne Lucae; Pamela F A Madden; Patrik K Magnusson; Nicholas G Martin; Andrew M McIntosh; Andres Metspalu; Ole Mors; Preben Bo Mortensen; Bertram Müller-Myhsok; Merete Nordentoft; Markus M Nöthen; Michael C O'Donovan; Sara A Paciga; Nancy L Pedersen; Brenda W J H Penninx; Roy H Perlis; David J Porteous; James B Potash; Martin Preisig; Marcella Rietschel; Catherine Schaefer; Thomas G Schulze; Jordan W Smoller; Kari Stefansson; Henning Tiemeier; Rudolf Uher; Henry Völzke; Myrna M Weissman; Thomas Werge; Ashley R Winslow; Cathryn M Lewis; Douglas F Levinson; Gerome Breen; Anders D Børglum; Patrick F Sullivan
Journal: Nat Genet Date: 2018-04-26 Impact factor: 38.330

5. Biological insights from 108 schizophrenia-associated genetic loci.

Authors:
Journal: Nature Date: 2014-07-22 Impact factor: 49.962

6. The FAIR Guiding Principles for scientific data management and stewardship.

Authors: Mark D Wilkinson; Michel Dumontier; I Jsbrand Jan Aalbersberg; Gabrielle Appleton; Myles Axton; Arie Baak; Niklas Blomberg; Jan-Willem Boiten; Luiz Bonino da Silva Santos; Philip E Bourne; Jildau Bouwman; Anthony J Brookes; Tim Clark; Mercè Crosas; Ingrid Dillo; Olivier Dumon; Scott Edmunds; Chris T Evelo; Richard Finkers; Alejandra Gonzalez-Beltran; Alasdair J G Gray; Paul Groth; Carole Goble; Jeffrey S Grethe; Jaap Heringa; Peter A C 't Hoen; Rob Hooft; Tobias Kuhn; Ruben Kok; Joost Kok; Scott J Lusher; Maryann E Martone; Albert Mons; Abel L Packer; Bengt Persson; Philippe Rocca-Serra; Marco Roos; Rene van Schaik; Susanna-Assunta Sansone; Erik Schultes; Thierry Sengstag; Ted Slater; George Strawn; Morris A Swertz; Mark Thompson; Johan van der Lei; Erik van Mulligen; Jan Velterop; Andra Waagmeester; Peter Wittenburg; Katherine Wolstencroft; Jun Zhao; Barend Mons
Journal: Sci Data Date: 2016-03-15 Impact factor: 6.444

7. The Sedimentary Geochemistry and Paleoenvironments Project.

Authors: Úna C Farrell; Rifaat Samawi; Savitha Anjanappa; Roman Klykov; Oyeleye O Adeboye; Heda Agic; Anne-Sofie C Ahm; Thomas H Boag; Fred Bowyer; Jochen J Brocks; Tessa N Brunoir; Donald E Canfield; Xiaoyan Chen; Meng Cheng; Matthew O Clarkson; Devon B Cole; David R Cordie; Peter W Crockford; Huan Cui; Tais W Dahl; Lucas D Mouro; Keith Dewing; Stephen Q Dornbos; Nadja Drabon; Julie A Dumoulin; Joseph F Emmings; Cecilia R Endriga; Tiffani A Fraser; Robert R Gaines; Richard M Gaschnig; Timothy M Gibson; Geoffrey J Gilleaudeau; Benjamin C Gill; Karin Goldberg; Romain Guilbaud; Galen P Halverson; Emma U Hammarlund; Kalev G Hantsoo; Miles A Henderson; Malcolm S W Hodgskiss; Tristan J Horner; Jon M Husson; Benjamin Johnson; Pavel Kabanov; C Brenhin Keller; Julien Kimmig; Michael A Kipp; Andrew H Knoll; Timmu Kreitsmann; Marcus Kunzmann; Florian Kurzweil; Matthew A LeRoy; Chao Li; Alex G Lipp; David K Loydell; Xinze Lu; Francis A Macdonald; Joseph M Magnall; Kaarel Mänd; Akshay Mehra; Michael J Melchin; Austin J Miller; N Tanner Mills; Chiza N Mwinde; Brennan O'Connell; Lawrence M Och; Frantz Ossa Ossa; Anais Pagès; Kärt Paiste; Camille A Partin; Shanan E Peters; Peter Petrov; Tiffany L Playter; Stephanie Plaza-Torres; Susannah M Porter; Simon W Poulton; Sara B Pruss; Sylvain Richoz; Samantha R Ritzer; Alan D Rooney; Swapan K Sahoo; Shane D Schoepfer; Judith A Sclafani; Yanan Shen; Oliver Shorttle; Sarah P Slotznick; Emily F Smith; Sam Spinks; Richard G Stockey; Justin V Strauss; Eva E Stüeken; Sabrina Tecklenburg; Danielle Thomson; Nicholas J Tosca; Gabriel J Uhlein; Maoli N Vizcaíno; Huajian Wang; Tristan White; Philip R Wilby; Christina R Woltz; Rachel A Wood; Lei Xiang; Inessa A Yurchenko; Tianran Zhang; Noah J Planavsky; Kimberly V Lau; David T Johnston; Erik A Sperling
Journal: Geobiology Date: 2021-07-05 Impact factor: 4.216