Literature DB >> 35572794

High-resolution X-ray diffraction datasets: Carbonates.

Abduljamiu O Amao1, Bandar Al-Otaibi2, Khalid Al-Ramadan3.   

Abstract

X-ray diffraction (XRD) analysis is a versatile and reliable method used in the identification of minerals in solid samples. It is one of the primary techniques geoscientists, mineralogist, solid-state chemists depend on to characterize the composition of unknown samples. In recent years there has been a growing interest among researchers to have readily accessible and large dataset to use to calibrate their experiment or to simply build various statistical models. Sadly, this is difficult to come by. Most well-curated datasets are propriety in nature and often too expensive for the average researcher. Additionally, when these datasets are available, they might not be suitable for purpose due to lack of proper coverage for certain a mineral of interest. For these reasons, we have carefully selected and curated samples rich in calcium carbonate that will be useful for various applications. Our dataset includes 1680 X-ray diffraction scans of samples collected from carbonate rich rock formations outcrops in Spain, Italy, and Saudi Arabia. They represent materials with total carbonate concentration range between 30-99%. The spectra were acquired on a Malvern PANalytical EMPYREAN Diffractometer system at two theta range 2- 70 and 0.01 step size. This dataset will be valuable to geoscientists, mineralogist, solid-state chemists, data scientists alike looking to design experiments, build mineralogical reference databases or statistical models with sufficient data points. We currently use the dataset in our own projects to develop comprehensive carbonate library and felt compelled to share.
© 2022 The Author(s). Published by Elsevier Inc.

Entities:  

Keywords:  Calcium carbonate rocks; Carbonates; Modelling; Spectra; X-ray diffraction

Year:  2022        PMID: 35572794      PMCID: PMC9092893          DOI: 10.1016/j.dib.2022.108204

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

In recent years there has been a growing interest among researchers to have readily accessible and large dataset to use to calibrate their experiment or to simply build various statistical models. Sadly, this is difficult to come by most well-curated datasets are propriety in nature and often too expensive for the average researcher. Additionally, when datasets are available, they might not be suitable for purpose due to lack of proper coverage for a mineral of interest. For these reasons, we have carefully selected and curated sample rich in calcium carbonate that will be useful for various applications. Our dataset includes 1680 X-ray diffraction scans of samples collected from Spain, Italy, and Saudi Arabia. They represent materials with calcium carbonate concentration ranging between 30-99%. Geoscientists, mineralogist, solid-state chemists, and data scientists alike looking to design experiments or build statistical models with large data points. This dataset can be used to build databases, create data science model requiring large datapoints or be used as a reference dataset for comparing carbonates

Data Description

Mendeley Data Repository (supplementary data): Folder one contains 1680 “. xy” carbonate spectra files in easily accessible format that can be open with a notepad. Mendeley Data Repository (supplementary data): Folder two contains 1680 carbonate spectra files in “. xrdml” PANalytical format. Mendeley Data Repository (supplementary data): Folder three contains 1680 carbonate spectra files in “. raw” Bruker/Siemens format. Supplementary data Table 1 attached: contains the major groupings of the 1680 carbonate spectra files based on their calcium carbonate composition.
Table 1

Summary statistic for the quantified spectra files including the mean, standard deviation (sd) the various quartiles (p0-p100) and the concentration range of each mineral depicted using histogram.

MineralChemistryMean [%]sdp0p25p50p75p100Histogram
Calcite [%]CaCO369.4824.925.2455.3679.3988.3999.57
Calcite_Magnesian [%]CaCO320.4716.010.009.5716.7726.6388.80
Vaterite [%]CaCO30.251.660.000.000.000.0530.25
Smithsonite [%]ZnCO30.060.210.000.000.010.033.02
Siderite [%]FeCO30.070.080.000.010.050.090.79
Rhodochrosite [%]MnCO30.070.070.000.000.050.100.60
Dolomite [%]MgCO30.130.150.000.020.100.181.65
Monohydrocalcite [%]CaCO3 • H2O0.130.150.000.020.100.181.65
Otavite [%]CdCO30.130.150.000.020.100.181.65
Calcite Group Total [%]90.7718.4136.2795.8398.7899.78102.84
Other Minerals Total [%]9.3218.430.000.291.214.2764.12
Summary statistic for the quantified spectra files including the mean, standard deviation (sd) the various quartiles (p0-p100) and the concentration range of each mineral depicted using histogram. Supplementary data table 2 attached: contains the quantified concentration calcite group of minerals of all the spectra files. Fig. 1 below is a sample spectrum from the dataset depicting a typical spectra file.
Fig. 1

The scan example from a sample “Carbonates_1550” selected from the dataset

The scan example from a sample “Carbonates_1550” selected from the dataset Fig. 2 Result of Uniform Manifold Approximation and Projection (UMAP) and hierarchical clustering (HCPC) showing the variability among the carbonates spectra files.
Fig. 2

Visualising the differences among the shared files

Visualising the differences among the shared files Fig. 3 A generalized pairs plot, offering a range of displays of paired combinations of categorical (Cluster) and quantitative variables (mineral Concentration). On the top, boxplots display the variability in concentration of the minerals. On the left, bar charts and scatterplots are used to display the main groups to which each of the spectra file belongs to based on cluster analysis while the scatterplots pair the concentration minerals. Pearson correlation is displayed on the right and range of variable distribution is available on the diagonal.
Fig. 3

A generalized pairs plot, offering a range of displays of paired combinations of categorical (Cluster) and quantitative variables (mineral Concentration). On the top, boxplots display the variability in concentration of the minerals. On the left, bar charts and scatterplots are used to display the main groups to which each of the spectra file belongs to based on cluster analysis while the scatterplots pair the concentration minerals. Pearson correlation is displayed on the right and range of variable distribution is available on the diagonal.

A generalized pairs plot, offering a range of displays of paired combinations of categorical (Cluster) and quantitative variables (mineral Concentration). On the top, boxplots display the variability in concentration of the minerals. On the left, bar charts and scatterplots are used to display the main groups to which each of the spectra file belongs to based on cluster analysis while the scatterplots pair the concentration minerals. Pearson correlation is displayed on the right and range of variable distribution is available on the diagonal.

Experimental Design, Materials and Methods

Samples for this dataset were collected from several research projects we've worked on in the past. The samples were collected from Spain, Italy, and Saudi Arabia representing over 10 carbonate formations. Spectra included in this dataset were carefully selected examining the percentage composition of calcium carbonate each individual file. The Calcite group of mineral is a loosely defined group dominated by Calcium carbonates. The group may also include other minerals such as dolomite, siderite, smithsonite, etc. (Table 1). Calcium carbonate (CaCO3) and its polymorphs (calcite, aragonite, and vaterite), are the focus of our dataset and someone of the most abundant and therefore important among biominerals. It is ubiquitous in nature and can be found in all rock types, hot springs, in caves, in ore deposits, and geyser deposits, etc., It also constitutes both structural and nonstructural components of living organisms and therefore its biomineralization makes a huge contribution to our ecological and geochemical systems [1], [2], [3]. A general non-linear dimension reduction technique i.e., Uniform Manifold Approximation and Projection (UMAP) was first applied to thousands of spectra files to sort out scans rich in carbonates and hierarchical clustering (HCPC) of UMAP embeddings was used to characterize the geochemical signatures and compare them to known carbonate files before they were quantified. If a spectrum was quantified and the calcium carbonate composition was below 30%, it was excluded from the list. Analyses were conducted using the R Statistical language (version 4.1.2; R Core Team, 2021) on Windows 10 × 64 (build 19044), using the packages hrbrthemes [4], powdR [5], tidymodels [6], purrr [7], report [8], datawizard [9], embed [10], rxylib [11], ggforce [12] and tidyverse [13]. Interconversion procedure between various formats of Powder X-Ray files was accomplished in PowDLL software [14]. The dataset was acquired on a Malvern PANalytical EMPYREAN Diffractometer system at two theta range 2- 70 and 0.01 step size, equipped with a Pixcel1D detector, a Reflection-transmission spinner (minimum step size Phi:0.1) sample stage, a Cu generator with K-Alpha1 [Å] =1.54060, K-Alpha2 [Å] =1.54443 and operated at a current of 40 mA and 45 kV volts. PowdR” an open-source software can be used to explore all the file types and to convert between the various XRD data types included in the dataset.

Ethics Statements

CRediT Author Statement

Abduljamiu O. Amao: Conceptualization, Methodology, Software, Formal analysis, Data Curation Bandar Al-Otaibi: Data curation, Investigation, Khalid Al-Ramadan: Funding acquisition, Writing - Review & Editing, Resources, Validation

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
SubjectEarth and Planetary Sciences
Specific subject areaData Mining, Spectroscopy and Statistical Analysis
Type of data. xy files.raw files. xrdml file.csvfiguretable
How the data were acquiredThe dataset was acquired on a Malvern PANalytical EMPYREAN Diffractometer system at two theta range 2- 70 and 0.01 step size, equipped with a Pixcel1D detector, a Reflection-transmission spinner (minimum step size Phi:0.1) sample stage, a Cu generator with K-Alpha1 [Å] =1.54060, K-Alpha2 [Å] =1.54443 and operated at a current of 40 mA and 45 kV volts. Custom scripts were written in R to transform, visualised, filtered, and processed and quantify the data with the help of packages such as “tidyverse”, “tidymodels”, “PowdR”. Conversion between XRD data types was accomplished in “PowDLL” software
Data formatrawanalysedfiltered
Description of data collectionCarbonate rock samples were first pulverised into powder, then loaded onto a Malvern PANalytical 27mm Sample Insert (9430 018 11271) with the aid of powder sample preparation kit (9430 017 70101) for making very flat pellet surface through back-loading. Subsequently, the pressed sample was supported by the bottom plate before loading to the system for analysis. The spectra were acquired on a Malvern PANalytical EMPYREAN Diffractometer system at two theta ranges of 2- 70 and 0.01 step size.
Data source locationKing Fahd University of Petroleum and Minerals, Dhahran Saudi Arabia
Data accessibilityRepository name: Mendeley DataData identification number: 10.17632/kxd48sck74.1Direct URL to data: https://data.mendeley.com/datasets/kxd48sck74/1
  1 in total

1.  High-magnesian calcite mesocrystals: a coordination chemistry approach.

Authors:  Jos J M Lenders; Archan Dey; Paul H H Bomans; Jan Spielmann; Marco M R M Hendrix; Gijsbertus de With; Fiona C Meldrum; Sjoerd Harder; Nico A J M Sommerdijk
Journal:  J Am Chem Soc       Date:  2012-01-06       Impact factor: 15.419

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.