
Metabarcoding data of prokaryotes and eukaryotes inhabiting the phosphogypsum stockpiles on the salt marshes of Huelva (SW Spain).

Patricia Gómez-Villegas, José Luis Guerrero, Miguel Pérez-Rodríguez, Juan Pedro Bolivar, Javier Vigara, Rosa León.

Abstract

Around 100 Mt of phosphogypsum (PG) of extreme acidity and with high concentrations of heavy metals and radionuclides have been deposited on the salt marshes of the Tinto River estuary in Huelva (SW Spain) for more than forty years. The microbial community able to thrive in these adverse conditions remains unknown, even though it can strongly influence the biogeochemical cycle of the phosphogypsum components and may include new species of biotechnological interest. High throughput sequencing of the 16S/18S rRNA encoding genes is a powerful tool to uncover the microbial diversity of extreme environments. This data article describes for the first time the prokaryotic and eukaryotic diversity of two water samples collected in the Huelva phosphogypsum stacks. The raw amplicons of the 16S/18S rRNA marker genes for the two phosphogypsum samples and two reference samples (seawater and Tinto River water), obtained after sequencing on the MiSeq platform, are provided. The operational taxonomic units (OTUs) obtained after processing and clustering the reads with the QIIME2 pipeline, and their taxonomic assignment by comparison with the SILVA database, are also presented to complete the information of the article "Exploring the microbial community inhabiting the phosphogypsum stacks of Huelva (SW, Spain) by a high throughput 16S/18S rDNA Sequencing approach".
© 2022 The Author(s). Published by Elsevier Inc.

Keywords:  16S rRNA; 18S rRNA; Extreme environment; Metabarcoding; Metataxonomy; Phosphogypsum

Year:  2022        PMID: 35252502      PMCID: PMC8891958          DOI: 10.1016/j.dib.2022.107989

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Raw data were obtained on the Illumina MiSeq platform (Illumina, San Diego, California) with the Illumina MiSeq Reagent kit V2 (2 × 250 bp).

Processed data were obtained from the raw data with QIIME 2 (v2020.8) software.

Operational taxonomic units were taxonomically classified by comparison with the SILVA database.

Institution: University of Huelva

City/Town/Region: Huelva

Country: Spain

Value of the Data

This dataset provides information on the prokaryotic and eukaryotic microbial population present in the phosphogypsum stacks of Huelva, revealing an unexpected biodiversity.

Raw data obtained from the Illumina MiSeq sequencing platform can be processed with different bioinformatics pipelines to analyze the microbial population of the sampled locations.

This dataset demonstrates that high throughput sequencing of the 16S/18S rRNA genes of environmental samples is a potent tool for the metataxonomic analysis of microbial communities.

The data provide useful information that can serve to compare the microbial population of this highly polluted and acidic environment with other similar locations in the world.

This dataset can reveal the existence of new extremophilic species with interesting biotechnological applications.

Data Description

The data presented in this paper are the results of sequencing the hypervariable regions of the 16S rRNA and 18S rRNA encoding genes from the genomic DNA of four environmental samples. Two of these samples were obtained in the south of zone 2 of the phosphogypsum stacks located on the Tinto salt marshes in Huelva (Fig. 1). The other two were obtained from the seawater and the Tinto River and were included in this study as reference samples.
Fig. 1

Aspect of the phosphogypsum stacks of Huelva.

Genomic DNA from the two phosphogypsum locations and the two reference samples was isolated and its quality was assessed (Table 1).
Table 1

Purity of the genomic DNA isolated from the two phosphogypsum samples and the two reference samples.

| Sample description | Sample name | Concentration (ng/µl) | 260/280 ratio | Volume (µl) | % G + C |
|---|---|---|---|---|---|
| Tinto River (TR) | D3_16S | 33.74 | 1.81 | 25 | 55.27 |
| Seawater (SW) | D4_16S | 5.84 | 1.58 | 25 | 52.55 |
| Perimeter channel (PC) | D7_16S | 6.27 | 2.04 | 20 | 55.24 |
| Piezometer (PZ) | D8_16S | 4.09 | 1.59 | 20 | 55.35 |
| Tinto River (TR) | D3_18S | 33.74 | 1.81 | 25 | 54.08 |
| Seawater (SW) | D4_18S | 5.84 | 1.58 | 25 | 50.86 |
| Perimeter channel (PC) | D7_18S | 6.27 | 2.04 | 20 | 50.72 |
| Piezometer (PZ) | D8_18S | 4.09 | 1.59 | 20 | 53.80 |
The forward and reverse raw paired-end sequences (without barcode and primer sequences) obtained after sequencing the V3-V4 hypervariable regions of the 16S rRNA gene (supplementary material, S1) and the V9 hypervariable region of the 18S rRNA gene (supplementary material, S2) are available in the Supplementary Material of this publication as compressed FASTQ files. Table 2 summarizes the naming of the included files.
Table 2

List of FASTQ files included as supplementary material, which contain the raw data (forward and reverse) obtained from sequencing the 16S rRNA and 18S rRNA libraries: PC, perimeter channel; PZ, piezometer; TR, Tinto River; SW, seawater.

| 16S rRNA file | Sample, read | Location | 18S rRNA file | Sample, read | Location |
|---|---|---|---|---|---|
| D7_16S_1.fq | PC, forward | Suppl. Mat S1 | D7_18S_1.fq | PC, forward | Suppl. Mat S2 |
| D7_16S_2.fq | PC, reverse | Suppl. Mat S1 | D7_18S_2.fq | PC, reverse | Suppl. Mat S2 |
| D8_16S_1.fq | PZ, forward | Suppl. Mat S1 | D8_18S_1.fq | PZ, forward | Suppl. Mat S2 |
| D8_16S_2.fq | PZ, reverse | Suppl. Mat S1 | D8_18S_2.fq | PZ, reverse | Suppl. Mat S2 |
| D3_16S_1.fq | TR, forward | Suppl. Mat S1 | D3_18S_1.fq | TR, forward | Suppl. Mat S2 |
| D3_16S_2.fq | TR, reverse | Suppl. Mat S1 | D3_18S_2.fq | TR, reverse | Suppl. Mat S2 |
| D4_16S_1.fq | SW, forward | Suppl. Mat S1 | D4_18S_1.fq | SW, forward | Suppl. Mat S2 |
| D4_16S_2.fq | SW, reverse | Suppl. Mat S1 | D4_18S_2.fq | SW, reverse | Suppl. Mat S2 |
For prokaryotes, 19,129 raw reads were obtained from the perimeter channel and 17,084 from the piezometer. For eukaryotes, 10,067 raw reads were obtained from the perimeter channel and 30,362 from the piezometer (Table 3). These raw data were processed as indicated in the schematic workflow (Fig. 2) to yield 680 prokaryotic and 38 eukaryotic Operational Taxonomic Units in the perimeter channel, and 596 prokaryotic and 186 eukaryotic Operational Taxonomic Units in the piezometer.
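As a quick check on the files listed in Table 2, the number of reads per file can be recounted directly from the FASTQ data. The following is a minimal Python sketch and not part of the published workflow: the helper names `expected_fastq_names` and `count_fastq_reads` are hypothetical, and the code assumes standard four-line FASTQ records and the `_1`/`_2` forward/reverse suffixes shown in Table 2.

```python
import gzip
from pathlib import Path

# Sample codes and marker genes as listed in Table 2.
SAMPLE_CODES = {"D7": "PC", "D8": "PZ", "D3": "TR", "D4": "SW"}
MARKERS = ("16S", "18S")


def expected_fastq_names():
    """Return the 16 file names implied by the Table 2 naming scheme."""
    return [
        f"{code}_{marker}_{read}.fq"
        for code in SAMPLE_CODES
        for marker in MARKERS
        for read in (1, 2)  # 1 = forward, 2 = reverse
    ]


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming four lines per record.

    Files ending in .gz are opened with gzip; blank lines are ignored.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError(f"{path} does not look like four-line FASTQ")
    return n_lines // 4


if __name__ == "__main__":
    for name in expected_fastq_names():
        if Path(name).exists():
            print(name, count_fastq_reads(name))
```

Run from the folder holding the decompressed supplementary files, this prints one count per file found, which can then be compared with the raw-read column of Table 3.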
Fig. 2

Workflow of the data analysis, from the raw reads to the species annotation.

The numbers of reads and of Operational Taxonomic Units generated from these raw sequences after denoising, merging, chimera filtering and clustering are summarized in Table 3.
Table 3

Number of raw, filtered and merged reads, and number of clustered Operational Taxonomic Units, for each sample. Adapted from Gómez-Villegas et al. 2022 [1].

| Code | Sample | Raw reads | After denoising | Merged inputs | Non-chimeric reads | Mean quality Q30 (%) | Mean quality Q score | Observed OTUs |
|---|---|---|---|---|---|---|---|---|
| D7_16S | PC_16S | 19,129 | 14,995 | 11,338 | 10,555 | 94.60 | ≥ 36 | 680 |
| D7_18S | PC_18S | 10,067 | 8,864 | 8,716 | 6,076 | 93.36 | ≥ 36 | 38 |
| D8_16S | PZ_16S | 17,084 | 13,236 | 10,254 | 9,409 | 94.54 | ≥ 36 | 596 |
| D8_18S | PZ_18S | 30,362 | 21,890 | 19,607 | 15,080 | 89.74 | ≥ 36 | 186 |
| D3_16S | TR_16S | 189,164 | 175,352 | 145,395 | 145,395 | 95.12 | ≥ 36 | 348 |
| D3_18S | TR_18S | 160,136 | 126,280 | 123,047 | 103,365 | 90.99 | ≥ 36 | 133 |
| D4_16S | SW_16S | 206,161 | 190,181 | 179,924 | 171,677 | 95.17 | ≥ 36 | 838 |
| D4_18S | SW_18S | 203,677 | 172,627 | 162,688 | 111,906 | 92.24 | ≥ 36 | 399 |

Sample names: PC, perimeter channel; PZ, piezometer; TR, Tinto River; SW, seawater.

The Operational Taxonomic Units, obtained by grouping all effective tags at 97% DNA sequence similarity, were classified at different taxonomic levels by comparison with the SILVA database, as described in the Experimental Design, Materials and Methods section. The results for the prokaryotic (Suppl. Mat. File S3) and eukaryotic (Suppl. Mat. File S4) microorganisms are available in the supplementary material of this article and in the Dryad repository (10.5061/dryad.18931zczx). A detailed comparative analysis of the most abundant genera at each location is presented elsewhere [1].
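The read losses at each processing step follow directly from Table 3. The Python sketch below hard-codes the counts from Table 3 and reports the percentage of raw reads retained after denoising, merging and chimera filtering. The function name `retention_percent` is an illustrative choice and the calculation is a reader-side summary, not a step of the published workflow.

```python
# Counts copied from Table 3: (raw, after denoising, merged, non-chimeric).
TABLE3 = {
    "PC_16S": (19129, 14995, 11338, 10555),
    "PC_18S": (10067, 8864, 8716, 6076),
    "PZ_16S": (17084, 13236, 10254, 9409),
    "PZ_18S": (30362, 21890, 19607, 15080),
    "TR_16S": (189164, 175352, 145395, 145395),
    "TR_18S": (160136, 126280, 123047, 103365),
    "SW_16S": (206161, 190181, 179924, 171677),
    "SW_18S": (203677, 172627, 162688, 111906),
}
STEPS = ("raw", "denoised", "merged", "non-chimeric")


def retention_percent(counts):
    """Express each step's read count as a percentage of the raw reads."""
    raw = counts[0]
    return [round(100.0 * count / raw, 1) for count in counts]


if __name__ == "__main__":
    print("sample", *STEPS, sep="\t")
    for sample, counts in TABLE3.items():
        print(sample, *retention_percent(counts), sep="\t")
```

Because every value is normalised to the raw count of the same sample, the small phosphogypsum libraries and the much larger reference libraries can be compared on the same scale.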

Experimental Design, Materials and Methods

Collection of samples

Two of the samples were collected from two different water bodies of the phosphogypsum stacks deposited on the Tinto salt marshes in Huelva, Spain. The first was collected from the perimeter drainage channel that surrounds zone 2 and collects leachates from the stored phosphogypsum (UTM coordinates 29 S 684,521, 4,123,390). The second was taken from a piezometer located on the border of the same zone, which receives underground leachates at a depth of 3–4 m (UTM coordinates 29 S 684,536, 4,123,295). Samples from the seawater and the lower course of the Tinto River, collected near the cities of Huelva (UTM coordinates 29 S 687,322, 4,113,035) and Niebla (UTM coordinates 29 S 706,089, 4,138,023), respectively, were also included for comparison.
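Since all four sampling points are given in the same UTM zone (29 S), approximate distances between them can be estimated with plain planar geometry. The Python sketch below is illustrative only: it assumes that the first number of each coordinate pair is the easting and the second the northing, both in metres, and it ignores the UTM grid scale factor, which is acceptable for a rough estimate. The name `grid_distance_m` is hypothetical.

```python
import math

# UTM zone 29 S coordinates from the text, read as (easting, northing) in metres.
# The easting/northing order is an assumption based on the usual UTM convention.
SITES = {
    "PC": (684521, 4123390),  # perimeter channel
    "PZ": (684536, 4123295),  # piezometer
    "SW": (687322, 4113035),  # seawater, near Huelva
    "TR": (706089, 4138023),  # Tinto River, near Niebla
}


def grid_distance_m(site_a, site_b):
    """Planar distance in metres between two sites in the same UTM zone."""
    east_a, north_a = SITES[site_a]
    east_b, north_b = SITES[site_b]
    return math.hypot(east_b - east_a, north_b - north_a)


if __name__ == "__main__":
    for other in ("PZ", "SW", "TR"):
        print(f"PC to {other}: {grid_distance_m('PC', other):.0f} m")
```

Under these assumptions the two phosphogypsum sampling points lie roughly a hundred metres apart, while the reference points are several kilometres away.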

Genomic DNA extraction

Genomic DNA was isolated with the GeneJet Genomic Purification Kit (Thermo Fisher Scientific, Waltham, MA, USA) from the biomass obtained from 10 L of water from each of the described locations. The biomass was collected by filtration through 0.7 µm glass fiber filters (Whatman, GF/G) and centrifugation at 12,000 × g. The genomic DNA was quantified using a Nanodrop ND-1000 spectrophotometer (Thermo Fisher Scientific). The quality of the DNA was verified by the A260/A280 ratio and by electrophoresis in a 2% agarose gel, as previously described [2].
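The concentration, volume and A260/A280 values in Table 1 can be combined into a total DNA yield and a simple purity flag for each sample. The Python sketch below is a reader-side illustration: the 1.7 to 2.0 window around the commonly quoted value of about 1.8 for pure DNA is an assumed rule of thumb, not a criterion stated by the authors, and the function names are hypothetical.

```python
# Values copied from Table 1: (concentration in ng/µl, A260/A280 ratio, volume in µl).
TABLE1 = {
    "TR": (33.74, 1.81, 25),
    "SW": (5.84, 1.58, 25),
    "PC": (6.27, 2.04, 20),
    "PZ": (4.09, 1.59, 20),
}


def total_dna_ng(concentration_ng_per_ul, volume_ul):
    """Total DNA mass in ng: concentration multiplied by eluate volume."""
    return concentration_ng_per_ul * volume_ul


def purity_flag(ratio, low=1.7, high=2.0):
    """Classify an A260/A280 ratio against an assumed acceptance window."""
    if ratio < low:
        return "below range"
    if ratio > high:
        return "above range"
    return "within range"


if __name__ == "__main__":
    for sample, (conc, ratio, volume) in TABLE1.items():
        print(f"{sample}: {total_dna_ng(conc, volume):.1f} ng, "
              f"A260/A280 {ratio} ({purity_flag(ratio)})")
```

For the Tinto River sample, for example, 33.74 ng/µl in 25 µl corresponds to about 843.5 ng of DNA.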

Library construction and amplicon sequencing

The hypervariable V3-V4 region of the 16S rDNA was amplified with primers 341F/806R, and the eukaryotic hypervariable V9 region of the 18S rDNA was amplified with primers 1380F/1510R. In both cases, amplifications were done with the Phusion® High-Fidelity PCR Master Mix (New England Biolabs, MA, USA), using the genomic DNA isolated at the indicated points as a template. For each sample, the amplicons were purified with the Qiagen Gel Extraction Kit (Qiagen, Germany), pooled and used to generate two libraries, one for the 18S rRNA gene and one for the 16S rRNA gene, using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina. The eight resulting libraries were quantified and their quality was assessed using Qubit and agarose gel electrophoresis as QC procedures. The pooled libraries were sequenced with the Illumina MiSeq Reagent kit V2, using a 2 × 250 bp paired-end strategy, on the Illumina MiSeq platform (Illumina, San Diego, California) following the manufacturer's protocol.

Bioinformatics and data analysis

The raw data were analyzed using QIIME 2 (v2020.8) [3]. First, the raw data were demultiplexed using the q2-demux plugin. Clean data were then obtained with DADA2 [4] (via q2-dada2), which trims and truncates low-quality regions, dereplicates reads and filters chimeras. Next, the reads were grouped into operational taxonomic units with the de novo clustering method of VSEARCH [5] (via q2-vsearch), at 97% identity. The operational taxonomic units were classified at each taxonomic rank using the q2-feature-classifier plugin (classify-sklearn method) and the SILVA database [6], applied as two pre-trained classifiers specially curated for the sequenced 16S V3-V4 and 18S V9 regions. Annotation was performed with a threshold of 0.7.
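The denoising, clustering and classification steps described above map onto three QIIME 2 command-line actions. The Python sketch below only assembles the corresponding command lines and does not run them, and it should be read as an approximation of the workflow rather than the authors' exact commands. The truncation lengths, artifact file names and classifier file name are placeholders that are not given in this article, and reading the 0.7 threshold as the `--p-confidence` option of `classify-sklearn` is an assumption. The demultiplexing step is left out because the barcode layout is not described.

```python
import shlex


def build_qiime2_commands(trunc_len_f, trunc_len_r, classifier_qza, prefix="sample"):
    """Assemble QIIME 2 CLI calls for denoising, 97% clustering and classification.

    trunc_len_f / trunc_len_r are user-chosen truncation lengths (not stated
    in the article); all .qza file names are hypothetical placeholders.
    """
    denoise = [
        "qiime", "dada2", "denoise-paired",
        "--i-demultiplexed-seqs", f"{prefix}_demux.qza",
        "--p-trunc-len-f", str(trunc_len_f),
        "--p-trunc-len-r", str(trunc_len_r),
        "--o-table", f"{prefix}_table.qza",
        "--o-representative-sequences", f"{prefix}_rep_seqs.qza",
        "--o-denoising-stats", f"{prefix}_denoising_stats.qza",
    ]
    cluster = [
        "qiime", "vsearch", "cluster-features-de-novo",
        "--i-table", f"{prefix}_table.qza",
        "--i-sequences", f"{prefix}_rep_seqs.qza",
        "--p-perc-identity", "0.97",  # 97% identity, as stated in the text
        "--o-clustered-table", f"{prefix}_otu_table.qza",
        "--o-clustered-sequences", f"{prefix}_otu_seqs.qza",
    ]
    classify = [
        "qiime", "feature-classifier", "classify-sklearn",
        "--i-classifier", classifier_qza,
        "--i-reads", f"{prefix}_otu_seqs.qza",
        "--p-confidence", "0.7",  # assumed meaning of the 0.7 threshold
        "--o-classification", f"{prefix}_taxonomy.qza",
    ]
    return [denoise, cluster, classify]


if __name__ == "__main__":
    # 240 and 200 are arbitrary example values, not the authors' settings.
    for command in build_qiime2_commands(240, 200, "silva_classifier.qza", "D7_16S"):
        print(shlex.join(command))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` once real file names and truncation lengths have been chosen.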

Ethics Statements

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Patricia Gómez-Villegas: Investigation, Writing – review & editing. José Luis Guerrero: Investigation, Writing – review & editing. Miguel Pérez-Rodríguez: Software, Data curation, Writing – review & editing. Juan Pedro Bolivar: Writing – review & editing. Javier Vigara: Supervision, Writing – review & editing. Rosa León: Conceptualization, Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Subject: Environmental science

Specific subject area: Metataxonomy of environmental samples

Type of data: FASTQ and Excel tables

How the data were acquired:

Raw data were obtained on the Illumina MiSeq platform (Illumina, San Diego, California) with the Illumina MiSeq Reagent kit V2 (2 × 250 bp).

Processed data were obtained from the raw data with QIIME 2 (v2020.8) software.

Operational taxonomic units were taxonomically classified by comparison with the SILVA database.

Data format: Raw and analysed

Description of data collection:

Two of the samples were collected from two different water bodies of the phosphogypsum stacks deposited on the Tinto salt marshes of Huelva, Spain.

PC: Perimeter channel, superficial water. UTM coordinates 29 S 684,521, 4,123,390

PZ: Piezometer, 3–4 m depth. UTM coordinates 29 S 684,536, 4,123,295

The other two samples were collected from the seawater and the Tinto River.

SW: Seawater, near the city of Huelva. UTM coordinates 29 S 687,322, 4,113,035

TR: Tinto River, near the town of Niebla. UTM coordinates 29 S 706,089, 4,138,023

Genomic DNA was extracted from each sample and used as template for amplification and sequencing of the corresponding hypervariable regions of the 16S/18S rDNA marker genes.

Data source location:

Institution: University of Huelva

City/Town/Region: Huelva

Country: Spain

Data accessibility: As supplementary material with this article and in the Dryad repository (10.5061/dryad.18931zczx)

Related research article: Exploring the microbial community inhabiting the phosphogypsum stacks of Huelva (SW Spain) by a high throughput 16S/18S rDNA sequencing approach. Submitted to Aquatic Toxicology. In press. https://doi.org/10.1016/j.aquatox.2022.106103

References

[1] Patricia Gómez-Villegas, José Luis Guerrero, Miguel Pérez-Rodríguez, Juan Pedro Bolívar, Antonio Morillo, Javier Vigara, Rosa León. Exploring the microbial community inhabiting the phosphogypsum stacks of Huelva (SW Spain) by a high throughput 16S/18S rDNA sequencing approach. Aquat Toxicol. 2022.

[2] Patricia Gómez-Villegas, Javier Vigara, Rosa León. Characterization of the Microbial Population Inhabiting a Solar Saltern Pond of the Odiel Marshlands (SW Spain). Mar Drugs. 2018.

[3] Evan Bolyen, Jai Ram Rideout, Matthew R Dillon, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019.

[4] Benjamin J Callahan, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson, Susan P Holmes. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016.

[5] Torbjørn Rognes, Tomáš Flouri, Ben Nichols, Christopher Quince, Frédéric Mahé. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016.

[6] Christian Quast, Elmar Pruesse, Pelin Yilmaz, Jan Gerken, Timmy Schweer, Pablo Yarza, Jörg Peplies, Frank Oliver Glöckner. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2012.
