Metagenomic 16S rDNA amplicon data on bacterial diversity profiling and its predicted metabolic functions of varillales in Allpahuayo-Mishana National Reserve.

Juan C Castro^1, J Dylan Maddox^2,3,4, Hicler N Rodríguez^1,2, Richard B Orbe^1, Gad E Grandez^1,2, Kevin A Feldheim^3, Marianela Cobos^2, Jae D Paredes^2, Carlos G Castro^1,2, Jorge L Marapara^1, Pedro M Adrianzén^1, Janeth Braga^1.

Abstract

The white-sand forests, or varillales, of the Peruvian Amazon are characterized by their distinct physical characteristics, patchy distribution, and endemism [1, 2]. Much research has been conducted on the specialized plant and animal communities that inhabit these ecosystems, yet their soil microbiomes have not been studied. Here we provide metagenomic 16S rDNA amplicon data of soil microbiomes from three types of varillales in Allpahuayo-Mishana National Reserve near Iquitos, Peru. Composite soil samples were collected from very low varillal, high-dry varillal, and high-wet varillal. Purified metagenomic DNA was used to prepare 16S rDNA metagenomic libraries, which were sequenced on the Illumina MiSeq platform. Raw paired-end sequences were analyzed using the Metagenomics RAST server (MG-RAST) and Parallel-Meta3 software. The analysis revealed a high percentage of unknown sequences, potentially indicating specialized bacterial communities in these forests. Several metabolic functions were also predicted from this dataset. The raw sequence data in FASTQ format are available in the public repository Discover Mendeley Data (https://data.mendeley.com/datasets/syktzxcnp6/2). They are also available at NCBI's Sequence Read Archive (SRA) with accession numbers SRX7891206 (very low varillal), SRX7891207 (high-dry varillal), and SRX7891208 (high-wet varillal).

Keywords: 16S rRNA; Metagenomics; Peruvian Amazon; Soil microbiome; Tropical forest; Varillales; White-sand forests

Year: 2020 PMID: 32382622 PMCID: PMC7201190 DOI: 10.1016/j.dib.2020.105625

Source DB: PubMed Journal: Data Brief ISSN: 2352-3409

Specifications Table

Value of the data

These are the first metagenomic 16S rDNA amplicon data on bacterial diversity profiling and predicted metabolic functions of varillales in Allpahuayo-Mishana National Reserve of the Peruvian Amazon. The data provide valuable information on the bacterial diversity of these forests and on the metabolic functions of their bacterial communities. The data also revealed a high percentage of unknown sequences, which may indicate that varillales contain specialized bacterial communities.

Data Description

The dataset contains raw paired-end sequencing data obtained by amplifying the V3–V4 region of the 16S rDNA gene from metagenomic DNA isolated from three types of white-sand forests, or varillales. The raw sequencing data contain 297,864 sequences totalling 5,966,319 base pairs with an average length of 200 bp. The data files (reads in FASTQ format) were deposited at the public repository Discover Mendeley Data (https://data.mendeley.com/datasets/syktzxcnp6/2) and at the NCBI database (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA611870&o=acc_s%3Aa) under BioProject No. PRJNA611870; BioSample accession numbers SAMN14351537, SAMN14351538, and SAMN14351539; and SRA accession numbers SRX7891206 (very low varillal), SRX7891207 (high-dry varillal), and SRX7891208 (high-wet varillal). MG-RAST analysis showed that a considerable proportion of sequences were unknown (≈20%). Among the identified sequences, Bacteria (98.4%) and Archaea (0.26%) comprised the majority of the representative kingdoms. The dataset includes phylum-level composition, rarefaction curves, and α-diversity results for the very low varillal (Fig. 1), high-dry varillal (Fig. 2), and high-wet varillal (Fig. 3). Additionally, several metabolic functions were predicted from this dataset, including genetic information processing, carbohydrate metabolism, and energy metabolism (Fig. 4).
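Summary figures of this kind (number of sequences, total base pairs, average length) can be recomputed from the deposited FASTQ files. The following is a minimal sketch, assuming uncompressed FASTQ with four lines per record; the function name `fastq_summary` and the two example reads are hypothetical and are not taken from the dataset.

```python
import io
from typing import Iterable, Tuple


def fastq_summary(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (read count, total bases, mean read length) for 4-line FASTQ records."""
    n_reads = 0
    total_bases = 0
    for i, line in enumerate(lines):
        # In a four-line record the sequence is the second line (index 1 modulo 4).
        if i % 4 == 1:
            n_reads += 1
            total_bases += len(line.strip())
    mean_len = total_bases / n_reads if n_reads else 0.0
    return n_reads, total_bases, mean_len


# Hypothetical two-read input; a real run would pass an open file handle instead.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
)
print(fastq_summary(example))
```

For paired-end data the function would be applied to the forward and reverse files separately, or to their concatenation, depending on how the totals are meant to be reported.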

Fig. 1

Phylum levels, rarefaction curves and α-diversity of a very low varillal in Allpahuayo-Mishana National Reserve.

Fig. 2

Phylum levels, rarefaction curves and α-diversity of a high-dry varillal in Allpahuayo-Mishana National Reserve.

Fig. 3

Phylum levels, rarefaction curves and α-diversity of a high-wet varillal in Allpahuayo-Mishana National Reserve.

Fig. 4

Predicted metabolic functions from three types of varillales in Allpahuayo-Mishana National Reserve.

Experimental Design, Materials, and Methods

Sample collection

Soil samples were collected from varillales of Allpahuayo-Mishana National Reserve (Supplementary Fig. S1), which is located in a lowland tropical rain forest of the Peruvian Amazon between 130 and 153 m.a.s.l. Samples were obtained from three types of varillales as classified by [1]: 1) very low varillal (3°57′54.293″S, 73°26′10.110″W), which is characterized by a high density of small forest trees (height <5 m) and an organic soil horizon thickness >11 cm; 2) high-dry varillal (3°58′33.185″S, 73°25′37.165″W), which is characterized by larger forest trees (height >15 m) and an organic soil horizon thickness ≤11 cm; and 3) high-wet varillal (3°58′21.535″S, 73°25′54.369″W), which is also characterized by larger forest trees (height >15 m) but is differentiated by an organic soil horizon thickness >11 cm. Samples were obtained in October 2018 during the high water level season. To obtain a representative sample of soil bacterial diversity, thirteen soil cores (10 cm in diameter and 10 cm in depth) were collected in each varillal. The first soil core was designated the reference point for geographic coordinates. The remaining twelve cores were sampled at five-meter intervals in each cardinal direction, with three cores obtained in each direction. All thirteen cores from a given reference point were pooled and homogenized into one composite soil sample per varillal forest type, which was then passed through a 2 mm mesh sieve (Supplementary Fig. S2). The sieved soil samples were stored temporarily at −20°C for further studies.
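The coring layout described above (one reference core plus three cores at five-meter intervals in each cardinal direction) can be sketched as a set of offsets from the reference point. This is an illustration only: it assumes the cores lie at 5, 10, and 15 m from the reference core along each direction, and the function name `core_offsets` is hypothetical.

```python
from typing import List, Tuple


def core_offsets(step_m: float = 5.0, cores_per_direction: int = 3) -> List[Tuple[float, float]]:
    """Return (east, north) offsets in metres for the reference core and the
    cores taken along each cardinal direction."""
    offsets = [(0.0, 0.0)]  # reference core at the recorded coordinates
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # north, east, south, west
    for dx, dy in directions:
        for k in range(1, cores_per_direction + 1):
            offsets.append((dx * k * step_m, dy * k * step_m))
    return offsets


layout = core_offsets()
print(len(layout))  # 1 reference core + 4 directions x 3 cores
for east, north in layout:
    print(f"{east:+6.1f} m E, {north:+6.1f} m N")
```

Under this assumption each composite sample covers a cross-shaped plot extending 15 m from the reference core in every direction.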

Metagenomic DNA isolation

Metagenomic DNA was isolated from composite soil samples following the protocol of Devi et al. [3]. In addition, to remove humic and fulvic acid contamination and to exclude smaller fragments, partially purified metagenomic DNA was subjected to agarose gel (0.6%) electrophoresis for 30 min at 100 V. DNA fragments >20,000 bp were excised using a sterile scalpel, placed in 2 mL microtubes, and purified with the PureLink™ Quick Gel Extraction Kit (Invitrogen™, Catalog: K210012) following the manufacturer's instructions. The quality and quantity of the purified metagenomic DNA (size approximately 10,000 bp) were verified by electrophoresis and by spectrophotometric analysis using a NanoDrop 2000 (Thermo Scientific).

Library preparation and next-generation DNA sequencing

Amplicon libraries were prepared following the Illumina 16S Metagenomic Sequencing Library Preparation protocol (Part # 15044223 B). First, metagenomic DNA was amplified using primers designed to target the V3 and V4 regions of the 16S rDNA gene [4]: 16S rDNA Amplicon PCR Forward Primer = 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3′, 16S rDNA Amplicon PCR Reverse Primer = 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC-3′. These locus-specific primers were synthesized with overhanging Illumina adapter sequences. A second PCR was performed to incorporate multiplexing indices and Illumina sequencing adapters. Amplicon libraries were then purified using 0.8x AMPure XP beads (Beckman Coulter), and their size was verified on a Bioanalyzer 2100 (Agilent Technologies) using an Agilent High Sensitivity DNA Kit. Libraries were quantified using the Qubit™ dsDNA HS Assay Kit (Thermo Fisher Scientific), normalized, pooled, and paired-end sequenced on the Illumina MiSeq platform.
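Each primer above consists of an Illumina overhang adapter followed by the locus-specific 16S sequence, which contains IUPAC degenerate bases. The sketch below separates the two parts and counts how many distinct oligonucleotides each locus-specific sequence represents. The split point between adapter and locus-specific sequence is an assumption based on the standard Illumina overhang adapters, and the helper names are hypothetical.

```python
from math import prod

# Full primers as quoted in the text (5' to 3').
FWD_PRIMER = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG"
REV_PRIMER = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC"

# Assumed overhang adapter portion of each primer (not stated explicitly in the text).
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Number of nucleotides each IUPAC code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def locus_specific(primer: str, overhang: str) -> str:
    """Strip the overhang adapter and return the locus-specific part of a primer."""
    if not primer.startswith(overhang):
        raise ValueError("primer does not start with the given overhang adapter")
    return primer[len(overhang):]


def degeneracy(seq: str) -> int:
    """Number of distinct non-degenerate sequences encoded by an IUPAC string."""
    return prod(IUPAC_DEGENERACY[base] for base in seq)


fwd = locus_specific(FWD_PRIMER, FWD_OVERHANG)
rev = locus_specific(REV_PRIMER, REV_OVERHANG)
print(fwd, degeneracy(fwd))
print(rev, degeneracy(rev))
```

The degeneracy count matters when trimming primers from reads: a trimming tool has to match every variant of the locus-specific sequence, not a single literal string.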

Sequence analysis

Raw paired-end sequences were uploaded as FASTQ files and analysed using the MG-RAST server v 4.0.3 [5], [6], [7]. Reads that passed quality control were subjected to taxonomic analysis by comparison with different ribosomal RNA databases using the open- and closed-reference Operational Taxonomic Unit (OTU) picking strategy. The OTUs were classified using the Greengenes 13_8 16S reference database [8]. Taxonomy was assigned to each OTU using the RDP classifier [9] and SILVAngs [10]. Finally, sequence coverage by rarefaction analysis and the alpha diversity of species in each varillal were produced by the MG-RAST pipeline. The microbial metabolic pathways were predicted from the 16S rDNA gene data using Parallel-Meta3 software v 3.5.3 [11, 12].
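To make the two summary statistics named above concrete, the following sketch computes a Shannon index and a rarefied richness estimate from a vector of per-OTU read counts. It is not the MG-RAST implementation: the Shannon index and Hurlbert's analytical rarefaction formula are used here as common textbook choices, and the OTU counts are hypothetical.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def expected_richness(counts: Sequence[int], n: int) -> float:
    """Hurlbert rarefaction: expected number of OTUs in a random subsample of n reads."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total read count")
    denom = math.comb(total, n)
    # For each OTU, 1 - C(total - c, n) / C(total, n) is the chance it appears in the subsample.
    return sum(1 - math.comb(total - c, n) / denom for c in counts if c > 0)


# Hypothetical OTU table for one sample (reads per OTU).
otu_counts = [120, 60, 30, 10, 5, 1]

h = shannon_index(otu_counts)
print(f"Shannon H = {h:.3f}, effective number of OTUs = {math.exp(h):.2f}")

# Points of a rarefaction curve: expected richness at increasing sequencing depth.
for depth in (10, 50, 100, sum(otu_counts)):
    print(depth, round(expected_richness(otu_counts, depth), 2))
```

A rarefaction curve that flattens as depth increases suggests that sequencing effort has captured most of the OTUs present in the sample.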

Subject	Genetics, Genomics and Molecular Biology
Specific subject area	Soil Metagenomics
Type of data	Figures and 16S rDNA amplicon sequencing data
How data were acquired	Soil samples were collected from three varillal forest types of Allpahuayo-Mishana National Reserve. The metagenomic DNA was isolated using standardized protocols and sequenced on the Illumina MiSeq platform.
Data format	Raw data in FASTQ format were deposited in the public repository Discover Mendeley Data (https://data.mendeley.com/datasets/syktzxcnp6/2). The raw data are also available at NCBI (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA611870&o=acc_s%3Aa).
Parameters for data collection	Libraries were prepared from metagenomic DNA isolated from soil samples by amplifying the V3–V4 region of the 16S rDNA gene and were paired-end sequenced on an Illumina MiSeq platform.
Description of data collection	Filtered sequence reads were analysed using bioinformatics tools for NGS data (i.e., MG-RAST and Parallel-Meta3 software).
Data source location	Institution: Universidad Nacional de la Amazonia Peruana; City/Town/Region: Iquitos/Maynas/Loreto Region; Country: Peru; Latitude and longitude (GPS coordinates) for collected samples/data: 1. very low varillal (3°57′54.293″S, 73°26′10.110″W); 2. high-dry varillal (3°58′33.185″S, 73°25′37.165″W); 3. high-wet varillal (3°58′21.535″S, 73°25′54.369″W)
Data accessibility	Raw sequencing data are hosted in the public repository Discover Mendeley Data, with direct URL to data: https://data.mendeley.com/datasets/syktzxcnp6/2. The raw sequencing data are also available at NCBI under BioProject No. PRJNA611870 (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA611870&o=acc_s%3Aa). SRA accession numbers: SRX7891206 (very low varillal): https://www.ncbi.nlm.nih.gov/sra/SRX7891206; SRX7891207 (high-dry varillal): https://www.ncbi.nlm.nih.gov/sra/SRX7891207; SRX7891208 (high-wet varillal): https://www.ncbi.nlm.nih.gov/sra/SRX7891208
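The sampling coordinates in the table are given in degrees, minutes, and seconds. For use in mapping or GIS software they usually need to be converted to signed decimal degrees. A minimal sketch of that conversion follows; the function name `dms_to_decimal` is hypothetical, and the sign convention (south and west negative) is the usual one rather than something stated in the source.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# Reference points of the three varillales, as listed in the table.
sites = {
    "very low varillal": ((3, 57, 54.293, "S"), (73, 26, 10.110, "W")),
    "high-dry varillal": ((3, 58, 33.185, "S"), (73, 25, 37.165, "W")),
    "high-wet varillal": ((3, 58, 21.535, "S"), (73, 25, 54.369, "W")),
}

for name, (lat, lon) in sites.items():
    print(f"{name}: {dms_to_decimal(*lat):.6f}, {dms_to_decimal(*lon):.6f}")
```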

1. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB.

Authors: T Z DeSantis; P Hugenholtz; N Larsen; M Rojas; E L Brodie; K Keller; T Huber; D Dalevi; P Hu; G L Andersen
Journal: Appl Environ Microbiol Date: 2006-07

2. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy.

Authors: Qiong Wang; George M Garrity; James M Tiedje; James R Cole
Journal: Appl Environ Microbiol Date: 2007-06-22

3. MG-RAST version 4 - lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis.

Authors: Folker Meyer; Saurabh Bagchi; Somali Chaterji; Wolfgang Gerlach; Ananth Grama; Travis Harrison; Tobias Paczian; William L Trimble; Andreas Wilke
Journal: Brief Bioinform Date: 2019-07-19

4. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies.

Authors: Anna Klindworth; Elmar Pruesse; Timmy Schweer; Jörg Peplies; Christian Quast; Matthias Horn; Frank Oliver Glöckner
Journal: Nucleic Acids Res Date: 2012-08-28

5. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes.

Authors: F Meyer; D Paarmann; M D'Souza; R Olson; E M Glass; M Kubal; T Paczian; A Rodriguez; R Stevens; A Wilke; J Wilkening; R A Edwards
Journal: BMC Bioinformatics Date: 2008-09-19

6. A Rapid and Economical Method for Efficient DNA Extraction from Diverse Soils Suitable for Metagenomic Applications.

Authors: Selvaraju Gayathri Devi; Anwar Aliya Fathima; Sudhakar Radha; Rex Arunraj; Wayne R Curtis; Mohandass Ramya
Journal: PLoS One Date: 2015-07-13

7. Parallel-META 3: Comprehensive taxonomical and functional analysis platform for efficient comparison of microbial communities.

Authors: Gongchao Jing; Zheng Sun; Honglei Wang; Yanhai Gong; Shi Huang; Kang Ning; Jian Xu; Xiaoquan Su
Journal: Sci Rep Date: 2017-01-12

8. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB.

Authors: Elmar Pruesse; Christian Quast; Katrin Knittel; Bernhard M Fuchs; Wolfgang Ludwig; Jörg Peplies; Frank Oliver Glöckner
Journal: Nucleic Acids Res Date: 2007-10-18