
A metagenomic survey of soil microbial communities along a rehabilitation chronosequence after iron ore mining.

Markus Gastauer¹, Mabel Patricia Ortiz Vera¹,², Kleber Padovani de Souza¹,³, Eder Soares Pires¹, Ronnie Alves¹,³, Cecílio Frois Caldeira¹, Silvio Junio Ramos¹, Guilherme Oliveira¹.

Abstract

Microorganisms are useful environmental indicators that can deliver essential insights into mine land rehabilitation processes. To compare microbial communities along a chronosequence of mine land rehabilitation with pre-disturbance levels at reference sites covered by native vegetation, we sampled non-rehabilitated, rehabilitating and reference study sites in the Urucum Massif, southwestern Brazil. From each study site, three composite soil samples were collected for chemical, physical and metagenomic analysis. We used paired-end library sequencing (Illumina NextSeq 500), and the reads were assembled using MEGAHIT. Coding DNA sequences (CDS) were taxonomically classified using Kaiju against the non-redundant NCBI BLAST reference sequences of archaea, bacteria and viruses. Additionally, a functional classification was performed with EMG v2.3.2. Here, we provide the raw data and assemblies (reads and contigs), together with an initial functional and taxonomic analysis, as a baseline for further studies of this kind. Further investigation is needed to fully understand the mechanisms of environmental rehabilitation in tropical regions, and we hope this collection will inspire other researchers to explore it for hypothesis testing.

Year:  2019        PMID: 30747914      PMCID: PMC6371960          DOI: 10.1038/sdata.2019.8

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

In many countries, the environmental rehabilitation of mine lands to a state as close as possible to pre-disturbance levels is a legal requirement aimed at reducing net losses of biodiversity and ecosystem functions[1,2]. Rehabilitating sites must be monitored to meet the targets of environmental licensing agencies[2]. To date, there is no scientific consensus on the best indices for evaluating the monitoring process[3]. Therefore, multidisciplinary approaches aiming to provide such parameters have been proposed recently[4,5]. Besides vegetation or fauna surveys[6], the examination of microbial communities can detect environmental alterations over short time scales[7] and can thus deliver insights into the fulfillment of rehabilitation targets[8,9]. Metagenomic approaches provide insights into environmental variations[10-12] by detecting the diversity of microorganisms in rehabilitating habitats[13]. Comparing the composition of microbial communities from rehabilitating sites with that of preserved reference sites may thus contribute to the evaluation of rehabilitation success in mine lands[14,15]. In Brazil, one of the world's leading iron ore exporters[16], iron ore is extracted from open-cast mines in different regions. Ferriferous savanna ecosystems named 'cangas'[17,18] cover the deposits in the Iron Quadrangle (Minas Gerais), the Carajás mountains (Pará), the Caetité region (Bahia) and the Urucum Massif (Mato Grosso do Sul). Owing to particular environmental conditions, such as high concentrations of metal ions (especially iron), high radiation, elevated temperatures and large seasonal amplitudes in rainfall, these diverse ecosystems rich in endemic species[19-21] are considered biodiversity hotspots[17,22]. Besides storing unique genetic resources for therapeutic purposes[23] or for the remediation of contaminated areas[24,25], rupestrian canga ecosystems provide many ecosystem services[26].
Iron ore extraction[27] reshapes entire landscapes[28] through the removal of ore and mining wastes, so the environmental rehabilitation of the impacted ecosystems is needed to preserve biotic resources and ecosystem services for future generations. Insights into the composition, diversity and functional characterization of soil microbial communities along environmental rehabilitation gradients provide useful variables for measuring rehabilitation success and valuable feedback for improving rehabilitation practice. The goal of this study was to identify changes in microbial community composition, diversity and functional processes resulting from mine land rehabilitation and to compare them with pre-disturbance levels at reference sites covered by native canga vegetation. We sampled three study sites before rehabilitation efforts, seven study sites spanning different rehabilitation stages and three reference canga sites associated with two iron ore mines in Corumbá (Urucum Massif). Environmental rehabilitation comprises topographic reformulation after removal of the iron ore, liming, fertilization and the application of biomass before native canga species are seeded or planted. At each study site, we installed three plots of 10 × 10 m; in each plot, a composite soil sample was collected (depth 0–2 cm) for metagenomic analysis. An additional sample (depth 0–10 cm) was collected for physical and chemical analysis. After DNA extraction, purification and amplification, we constructed metagenomic libraries and sequenced them using paired-end technology (Illumina NextSeq 500). The Illumina reads were assembled using MEGAHIT. Subsequently, nucleotide sequences coding for proteins (coding DNA sequences, CDS) were extracted from the assemblies, and their functional and taxonomic classification was performed using EMG and Kaiju, respectively.
Here, we provide the complete metagenomic data set without a detailed analysis or discussion of results, to highlight its comprehensive view into soil microbial communities along the rehabilitation of a canga ecosystem in southwestern Brazil. We furthermore present the annotated metagenome assembly, containing taxonomic and functional classifications, as well as soil chemical properties (i.e., pH, cation exchange capacity, organic matter content, micro- and macronutrient availability and aluminum availability) and soil texture. The present collection is the first high-throughput sequencing-based survey of non-rehabilitated, rehabilitating and reference sites after iron ore mining in a tropical region, and thus represents baseline data for further studies of this kind. With its publication, researchers can explore this collection for hypothesis testing related to environmental rehabilitation in tropical regions, especially after mining activities. The consistency in experimental design, sequencing methodology and sample sources ensures the value of this collection for ongoing studies of environmental rehabilitation after anthropogenic impacts, in particular mine land rehabilitation.

Methods

Experimental design

Data were collected in October 2016 at 13 study sites in open-cast iron ore mines in the Urucum Massif, Mato Grosso do Sul, Brazil (Fig. 1). The altitude of the massif varies between 600 and 1,065 m a.s.l. With a mean annual temperature of 25 °C and a mean annual precipitation of 1,070 mm[29], the regional climate is a warm tropical savanna climate (Aw in the Köppen classification), characterized by dry winters and rainy summers. The natural vegetation is a mosaic of seasonal deciduous and semi-deciduous forests on slopes and near watercourses. Furthermore, different savanna formations, ranging from wooded physiognomies to treeless grasslands, occur on the upper parts of the massif[30].
Figure 1

Map of geographical position of the study sites in the Urucum Massif, Corumbá, Mato Grosso do Sul, Brazil.

1. Rampa Nova, 2. Mina 5, 3. PRAD 45 C, 4. Piscinão, 5. Mina Cateto, 6. PRAD 45 A, 7. Mina Escarpa, 8. Secção 10I, 9. Mina 5 N, 10. PRAD 45B, 11. Reference A, 12. Reference B and 13. Reference C.

Iron ore mining in the region is restricted to the outcrops of ferruginous jaspilites and fixed hematites of the Santa Cruz Formation[31] and begins with the suppression of vegetation and the removal of topsoil layers. Environmental rehabilitation after mining includes topographic reformulation, topsoil application, and liming and fertilization of mine soils. Organic matter originating from suppressed areas is added. The rehabilitation targets are native open savanna formations, i.e., the pre-mining formations on ironstone outcrops. Thus, plants rescued from suppressed areas and seedlings of native species produced in a tree nursery are planted to trigger the environmental rehabilitation of mine lands. Additionally, seed mixtures of native species collected in vegetation remnants are applied. As needed, further activities, such as replanting seedlings, additional seed applications and controlling invasive alien species, are carried out. Study sites comprise three bare soil areas sampled immediately before rehabilitation activities were carried out, seven sites at different rehabilitation stages (two-, three- and six-year-old stands) and three reference sites covered by native vegetation, i.e., open savanna formations (Table 1). At each study site, three plots (10 × 10 m) were installed in homogeneous vegetation without signs of external disturbance.
Table 1

Site information for all 13 sampling locations utilized in this study.

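Category | Study site | Sample alias | Latitude | Longitude | Age (years)
Non-revegetated study sites | Rampa Nova | NR_RN_1 – NR_RN_3 | −19.1950 | −57.6030 | 0
Non-revegetated study sites | Mina 5 | NR_M5_1 – NR_M5_3 | −19.1848 | −57.6111 | 0
Non-revegetated study sites | PRAD 45 C | NR_PR_1 – NR_PR_3 | −19.2171 | −57.5908 | 0
Sites in environmental rehabilitation | Piscinão | RH_PI_1 – RH_PI_3 | −19.1918 | −57.6024 | 6
Sites in environmental rehabilitation | Mina Cateto | RH_MC_1 – RH_MC_3 | −19.2168 | −57.5817 | 3
Sites in environmental rehabilitation | PRAD 45 A | RH_PA_1 – RH_PA_3 | −19.1855 | −57.6075 | 3
Sites in environmental rehabilitation | Mina Escarpa | RH_ME_1 – RH_ME_3 | −19.1927 | −57.6032 | 3
Sites in environmental rehabilitation | Secção 10I | RH_SC_1 – RH_SC_3 | −19.1909 | −57.6020 | 2
Sites in environmental rehabilitation | Mina 5 N | RH_M5_1 – RH_M5_3 | −19.2178 | −57.5864 | 2
Sites in environmental rehabilitation | PRAD 45B | RH_PB_1 – RH_PB_3 | −19.1840 | −57.6110 | 2
Reference sites, covered by natural canga vegetation | Reference A | REF_A_1 – REF_A_3 | −19.1921 | −57.6016 | n/a
Reference sites, covered by natural canga vegetation | Reference B | REF_B_1 – REF_B_3 | −19.1837 | −57.6126 | n/a
Reference sites, covered by natural canga vegetation | Reference C | REF_C_1 – REF_C_3 | −19.2095 | −57.5935 | n/a

Age is the time interval (in years) between the beginning of rehabilitation activities and sampling.
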
Two composite soil samples were collected from each plot. For each sample, the substrate from five homogeneously distributed sampling points within the plot was mixed. The first sample, collected at a depth of 0–10 cm, was air-dried and submitted to analysis of chemical properties and texture. The pH in water (pH(H2O)) and in potassium chloride (pH(KCl)), organic matter (OM), available phosphorus (P), potassium (K), sulfur (S), calcium (Ca), magnesium (Mg), aluminum (Al), boron (B), zinc (Zn), iron (Fe), manganese (Mn) and copper (Cu) concentrations, as well as the effective cation exchange capacity (ECEC) of the samples, were determined following standardized protocols[32]. Soil texture was determined by particle-size distribution analysis using the pipette method. The second, superficial composite sample (depth 0–2 cm) was collected from each plot for metagenomic analysis. Immediately after collection, these samples were refrigerated to avoid DNA degradation. In the laboratory, they were stored at −80 °C until analysis.

DNA extraction and shotgun sequencing

Total DNA was extracted from 250 mg of soil from each sample using the PowerSoil DNA Isolation Kit (MO BIO Laboratories, USA) following the manufacturer's instructions. DNA samples were quantified using a Qubit 3.0 fluorometer (Thermo Fisher Scientific Inc.). Shotgun metagenomic paired-end libraries were then constructed from 50 ng of pure DNA. For this, samples were subjected to random enzymatic fragmentation, in which the DNA was simultaneously fragmented and bound to adapters, using the SureSelect QXT kit (Agilent Technologies). The fragmented DNA was purified using AMPure XP beads (Beckman Coulter) and subjected to an amplification reaction using primers complementary to the Illumina flow cell adapters. Amplified libraries were again purified using AMPure XP beads (Beckman Coulter), quantified using the Qubit 3.0 fluorometer (Thermo Fisher Scientific Inc.) and checked for fragment size on the 2100 Bioanalyzer (Agilent Technologies®) using a High Sensitivity DNA kit (Agilent Technologies). The libraries were then adjusted to a concentration of 4 nM, pooled, denatured and diluted to a loading concentration of 1.8 pM. The sequencing run was performed on the Illumina NextSeq 500 platform using a NextSeq 500 High Output v2 kit (150 cycles).
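The normalization and dilution steps above rest on two standard calculations: converting a mass concentration (as reported by the Qubit fluorometer) into molarity, and solving C1·V1 = C2·V2 for a dilution. The sketch below assumes the common approximation of 660 g/mol per double-stranded base pair; the 10 ng/µL concentration, 400 bp fragment size and 20 µL volume are hypothetical values, and the intermediate denaturation and dilution steps of the actual Illumina protocol are deliberately left out.

```python
"""Library normalization arithmetic (illustrative sketch, not the authors' protocol)."""

# Average molar mass of one double-stranded DNA base pair (g/mol); a common approximation.
BP_MOLAR_MASS = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) into molarity (nM).

    ng/uL equals mg/L (1e-3 g/L); dividing by the molar mass gives mol/L,
    and multiplying by 1e9 gives nM, hence the combined factor of 1e6.
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (BP_MOLAR_MASS * mean_fragment_bp)


def stock_volume_ul(stock_conc: float, target_conc: float, final_volume_ul: float) -> float:
    """Solve C1 * V1 = C2 * V2 for V1 (both concentrations in the same unit)."""
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    return target_conc * final_volume_ul / stock_conc


if __name__ == "__main__":
    # Hypothetical library: 10 ng/uL with a mean fragment size of 400 bp.
    molarity = library_molarity_nm(10.0, 400.0)
    print(f"library molarity: {molarity:.2f} nM")  # 37.88 nM

    # Volume of that library needed for 20 uL of a 4 nM normalized library.
    print(f"to reach 4 nM: {stock_volume_ul(molarity, 4.0, 20.0):.2f} uL in 20 uL")

    # Overall dilution from a 4 nM (4000 pM) pool to a 1.8 pM loading concentration.
    print(f"overall dilution factor: {4000.0 / 1.8:.0f}x")
```

In practice, the mean fragment size would come from the Bioanalyzer trace of each library.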

Genome assembly, taxonomic and functional classification

The Illumina paired-end reads were assembled using MEGAHIT v1.1.2[33], using default parameters (Fig. 2). Contigs were output in the fasta format.
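For readers who want to repeat the assembly step, the sketch below builds a default-parameter MEGAHIT command for one paired-end sample. The file names are hypothetical (derived from a Sample ID in Table 2), running it assumes a `megahit` executable on the PATH, and only the standard `-1`, `-2` and `-o` options are used. MEGAHIT writes its final contigs to `final.contigs.fa` inside the output directory.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def megahit_command(forward: Path, reverse: Path, out_dir: Path) -> list[str]:
    """Build a MEGAHIT call with default assembly parameters for one sample."""
    return [
        "megahit",
        "-1", str(forward),  # forward (R1) reads
        "-2", str(reverse),  # reverse (R2) reads
        "-o", str(out_dir),  # output directory for the assembly
    ]


def run_megahit(cmd: list[str]) -> Path:
    """Run the command and return the path of the resulting contig file."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("megahit executable not found on PATH")
    subprocess.run(cmd, check=True)
    out_dir = Path(cmd[cmd.index("-o") + 1])
    return out_dir / "final.contigs.fa"


if __name__ == "__main__":
    # Hypothetical file names for sample MG171 (NR_M5_1).
    cmd = megahit_command(
        Path("MG171_R1.fastq.gz"), Path("MG171_R2.fastq.gz"), Path("MG171_megahit")
    )
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the command easy to log and to check before launching a long assembly.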
Figure 2

Workflow of genome assembly, functional and taxonomic classification and data validation applied in this study.

Rounded rectangles symbolize processes containing descriptions and tools, and rectangles represent input and/or output files enclosing a brief description, file name (∗xxx∗ is a placeholder for sample ID) and format, as well as their localization. CDS stands for coding DNA sequences. NCBI indicates that files are available from NCBI (Data Citation 1), whereas SF indicates the corresponding files were deposited in Open Science Framework (Data Citation 2).

Using a locally installed EMG v2.3.2 pipeline[34], coding DNA sequences (CDS) were extracted from the contigs and output as .fnn files. The pipeline also produces the functional classification, output as .ipr files. Subsequently, the taxonomic classification of the CDS was performed using Kaiju v1.4.4 (running mode: greedy, with up to 5 substitutions; minimum match length: 12; minimum match score: 70)[35]. As the reference database, we used the non-redundant NCBI BLAST protein sequences (accessed on 8 December 2016, containing 81 million protein sequences from Bacteria, Archaea and Viruses). We estimated the average coverage, i.e., the fraction of the sampled microbial community captured by the sequencing effort, which Nonpareil infers from read redundancy, using the package Nonpareil v3.3.3[36] on forward reads with quality scores greater than Q20, as recommended for the tool.
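The Kaiju settings listed above correspond to Kaiju's command-line options roughly as in the sketch below: `-a greedy` selects greedy mode, `-e` sets the number of allowed substitutions, `-m` the minimum match length and `-s` the minimum match score. The database and input file names are hypothetical placeholders, and the option names should be checked against the help text of the installed Kaiju version.

```python
import shlex
from pathlib import Path


def kaiju_command(
    cds_fasta: Path,
    out_file: Path,
    nodes_dmp: Path,
    db_fmi: Path,
    mismatches: int = 5,
    min_match_length: int = 12,
    min_score: int = 70,
) -> list[str]:
    """Build a greedy-mode Kaiju call mirroring the settings reported in the text."""
    return [
        "kaiju",
        "-t", str(nodes_dmp),         # NCBI taxonomy tree (nodes.dmp)
        "-f", str(db_fmi),            # Kaiju protein index (.fmi)
        "-i", str(cds_fasta),         # CDS nucleotide sequences to classify
        "-o", str(out_file),          # tab-delimited classification output
        "-a", "greedy",               # greedy mode allows substitutions
        "-e", str(mismatches),        # maximum substitutions in greedy mode
        "-m", str(min_match_length),  # minimum match length
        "-s", str(min_score),         # minimum match score in greedy mode
    ]


if __name__ == "__main__":
    cmd = kaiju_command(
        Path("MG171.fnn"),        # hypothetical CDS file produced by EMG
        Path("MG171_kaiju.out"),  # hypothetical output file name
        Path("nodes.dmp"),
        Path("kaiju_db_nr.fmi"),  # hypothetical database file name
    )
    print(shlex.join(cmd))
```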

Cluster analysis

For data validation, taxonomic and functional counting matrices were generated. Differences in total microbial richness (the number of genera in the taxonomic matrix containing all CDS identified to genus level) between non-rehabilitated, rehabilitating and reference study sites were tested using one-way ANOVA followed by post-hoc Tukey HSD tests, after checking for normality and homogeneity of variance. Diversity was estimated as Shannon's diversity index H' using the package vegan v2.5-2[37] in the R environment. We used the package pvclust v2.0[38] in the R environment v3.4.1[39] to compute clusters from the taxonomic counting matrix, considering genus-level predictions from Kaiju. Cluster consistency was tested using the approximately unbiased (au) and the bootstrap probability (bp) statistics[40]. Both statistics return p-values ranging from 0 to 1, where 0 represents weak and 1 represents strong consistency of a cluster. As au is a better approximation to an unbiased p-value than bp, we considered only clusters with au values larger than 0.95, which indicates strong similarity between the grouped samples. Finally, an integrated analysis of taxonomy was performed with MGCOMP[41] to examine the relationships among sample profiles and sites. To reduce the influence of rare organisms in this analysis, we considered only the 30 most abundant genera of each sample, which corresponds to the smallest number of genera covering 50% of the analyzed sequences. Based on these top 30 genera, we performed a two-level clustering of all identified genera. At the first level, samples with similar genus abundances were grouped; at the second level, each cluster was further subdivided considering only the samples belonging to it.
After the grouping, the genera present in all first-level groups (termed core taxa), the genera present exclusively in one of the first-level groups (termed exclusive taxa) and the remaining genera (termed neutral taxa) were identified.
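The selection of top genera and the core/exclusive/neutral partition described above can be expressed compactly with sets. This sketch is not MGCOMP's implementation: it assumes that a genus counts as "present" in a first-level group when it occurs in at least one of that group's samples, and the sample names, group labels and genus lists in the example are hypothetical.

```python
from collections import Counter


def smallest_top_k(counts: Counter, fraction: float = 0.5) -> int:
    """Smallest number of most abundant genera whose counts reach `fraction` of the total."""
    total = sum(counts.values())
    covered = 0
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= fraction * total:
            return k
    return len(counts)


def top_genera(counts: Counter, k: int = 30) -> set[str]:
    """The k most abundant genera of one sample."""
    return {genus for genus, _ in counts.most_common(k)}


def group_presence(samples: dict[str, set[str]], membership: dict[str, str]) -> dict[str, set[str]]:
    """Union of the genera found in the samples of each first-level group."""
    groups: dict[str, set[str]] = {}
    for sample, genera in samples.items():
        groups.setdefault(membership[sample], set()).update(genera)
    return groups


def partition_taxa(groups: dict[str, set[str]]):
    """Split genera into core, exclusive (per group) and neutral sets."""
    all_genera = set().union(*groups.values())
    core = set.intersection(*groups.values()) if groups else set()
    exclusive = {}
    for name, genera in groups.items():
        others = set().union(*(g for other, g in groups.items() if other != name))
        exclusive[name] = genera - others
    neutral = all_genera - core - set().union(*exclusive.values())
    return core, exclusive, neutral


if __name__ == "__main__":
    samples = {  # hypothetical top genera per sample
        "S1": {"Sphingomonas", "Frateuria", "Massilia"},
        "S2": {"Sphingomonas", "Duganella", "Massilia"},
        "S3": {"Sphingomonas", "Caulobacter"},
    }
    membership = {"S1": "A", "S2": "B", "S3": "D"}  # hypothetical first-level groups
    core, exclusive, neutral = partition_taxa(group_presence(samples, membership))
    print(sorted(core), {g: sorted(t) for g, t in exclusive.items()}, sorted(neutral))
```

In this toy example, Massilia occurs in two of the three groups, so it is neither core nor exclusive and ends up as neutral.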

Data Records

The raw nucleotide sequences (1,192,347,558 reads) and 2,608,990 contigs obtained from 34 soil samples were deposited as fastq and fasta files at NCBI (Data Citation 1 and Table 2 (available online only)). As required by the format, fastq files contain four lines for each read: an identifier of the read, the nucleotide sequence, the placeholder '+' for optional annotations (not used here) and the Phred quality score of each nucleotide. fasta files are composed of two elements for each contig: an identifier and the sequence of the contig.
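The record layouts described above can be read with a few lines of standard-library Python. This sketch assumes Phred+33 quality encoding (used by current Illumina instruments) and uncompressed input. The Q20 check mirrors the quality threshold mentioned for the coverage estimate, but interpreting it as a mean per-read quality above 20 is an assumption, not necessarily the filter the authors applied.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read fastq stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed fastq record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (identifier, sequence) pairs; sequences may span several lines."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            identifier, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode an ASCII quality string into Phred scores."""
    return [ord(char) - offset for char in quality]


def passes_q20(record: FastqRecord) -> bool:
    """Assumed filter: mean Phred score of the read above 20."""
    scores = phred_scores(record.quality)
    if not scores:
        return False
    return sum(scores) / len(scores) > 20


if __name__ == "__main__":
    fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n++++\n")
    print([r.identifier for r in read_fastq(fastq) if passes_q20(r)])  # ['read1']
    fasta = io.StringIO(">k141_1\nACGT\nGG\n>k141_2\nTT\n")
    print(list(read_fasta(fasta)))  # [('k141_1', 'ACGTGG'), ('k141_2', 'TT')]
```

In Phred+33, the character 'I' encodes a score of 40 and '+' a score of 10, which is why only the first example read passes.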
Table 2

Sequencing and assembly data from metagenomic libraries of 34 soil samples from non-rehabilitated, rehabilitating and reference sites from two iron-ore mines, Corumbá, Mato Grosso do Sul, Brazil.

Sample Alias | Sample ID | Date | Latitude | Longitude | Category | Age (years) | Forward reads: # bases | Forward reads: # reads | Forward reads: interval | Reverse reads: # bases | Reverse reads: # reads | Reverse reads: interval | Estimated coverage | Assembly: # contigs | Assembly: total length (Mbp) | Assembly: N50
NR_M5_1 | MG171 | 10/6/2016 | −19.1848 | −57.6111 | non-rehabilitated | 0 | 2.852E+09 | 3.780E+07 | 43–76 | 2.851E+09 | 3.780E+07 | 63–76 | 33.12% | 135,797 | 99.14 | 1387
NR_M5_2 | MG172 | 10/6/2016 | −19.1848 | −57.6111 | non-rehabilitated | 0 | 2.573E+09 | 3.417E+07 | 35–76 | 2.572E+09 | 3.417E+07 | 62–76 | 22.02% | 110,156 | 61.22 | 943
NR_M5_3 | MG173 | 10/6/2016 | −19.1848 | −57.6111 | non-rehabilitated | 0 | 3.657E+09 | 4.855E+07 | 35–76 | 3.658E+09 | 4.855E+07 | 58–76 | 23.52% | 127,830 | 63.76 | 841
NR_PR_1 | MG152 | 10/7/2016 | −19.2171 | −57.5908 | non-rehabilitated | 0 | 2.383E+09 | 3.168E+07 | 35–76 | 2.383E+09 | 3.168E+07 | 61–76 | 19.76% | 48,854 | 22.46 | 801
NR_PR_2 | MG153 | 10/7/2016 | −19.2171 | −57.5908 | non-rehabilitated | 0 | 2.118E+09 | 2.812E+07 | 35–76 | 2.117E+09 | 2.812E+07 | 60–76 | 14.69% | 71,968 | 36.74 | 837
NR_PR_3 | MG147 | 10/7/2016 | −19.2171 | −57.5908 | non-rehabilitated | 0 | 1.076E+09 | 1.428E+07 | 38–76 | 1.076E+09 | 1.428E+07 | 60–76 | 15.68% | 15,018 | 6.33 | 670
NR_RN_1 | MG141 | 10/5/2016 | −19.195 | −57.603 | non-rehabilitated | 0 | 3.122E+09 | 4.138E+07 | 35–76 | 3.122E+09 | 4.138E+07 | 45–76 | 29.57% | 192,916 | 134.66 | 1346
NR_RN_2 | MG142 | 10/5/2016 | −19.195 | −57.603 | non-rehabilitated | 0 | 2.250E+09 | 2.987E+07 | 35–76 | 2.251E+09 | 2.987E+07 | 60–76 | 20.58% | 70,101 | 36.29 | 826
NR_RN_3 | MG143 | 10/5/2016 | −19.195 | −57.603 | non-rehabilitated | 0 | 3.429E+09 | 4.546E+07 | 35–76 | 3.429E+09 | 4.546E+07 | 58–76 | 35.23% | 204,263 | 151.16 | 1485
REF_A_1 | MG163 | 10/4/2016 | −19.1921 | −57.6016 | Reference | n/a | 3.706E+09 | 4.911E+07 | 35–76 | 3.706E+09 | 4.911E+07 | 62–76 | 27.03% | 256,960 | 170.89 | 1180
REF_A_3 | MG165 | 10/4/2016 | −19.1921 | −57.6016 | Reference | n/a | 3.097E+09 | 4.106E+07 | 35–76 | 3.097E+09 | 4.106E+07 | 45–76 | 24.83% | 63,656 | 25.65 | 650
REF_B_1 | MG156 | 10/6/2016 | −19.1837 | −57.6126 | Reference | n/a | 2.619E+09 | 3.479E+07 | 35–76 | 2.620E+09 | 3.479E+07 | 61–76 | 17.68% | 48,507 | 21.04 | 740
REF_B_3 | MG158 | 10/6/2016 | −19.1837 | −57.6126 | Reference | n/a | 3.747E+09 | 4.970E+07 | 35–76 | 3.747E+09 | 4.970E+07 | 62–76 | 19.39% | 142,828 | 70.43 | 781
REF_C_1 | MG174 | 10/7/2016 | −19.2095 | −57.5935 | Reference | n/a | 1.753E+09 | 2.326E+07 | 35–76 | 1.753E+09 | 2.326E+07 | 61–76 | 21.23% | 65,769 | 28.36 | 674
REF_C_2 | MG175 | 10/7/2016 | −19.2095 | −57.5935 | Reference | n/a | 2.100E+09 | 2.783E+07 | 43–76 | 2.100E+09 | 2.783E+07 | 60–76 | 13.99% | 44,973 | 19.36 | 660
REF_C_3 | MG166 | 10/7/2016 | −19.2095 | −57.5935 | Reference | n/a | 3.248E+09 | 4.327E+07 | 35–76 | 3.251E+09 | 4.327E+07 | 60–76 | 21.77% | 87,360 | 36.41 | 686
RH_M5_1 | MG170 | 10/6/2016 | −19.2178 | −57.5864 | Rehabilitating | 2 | 2.902E+09 | 3.848E+07 | 35–76 | 2.902E+09 | 3.848E+07 | 61–76 | 21.57% | 60,777 | 27.05 | 688
RH_M5_2 | MG161 | 10/6/2016 | −19.2178 | −57.5864 | Rehabilitating | 2 | 2.143E+09 | 2.843E+07 | 35–76 | 2.142E+09 | 2.843E+07 | 59–76 | 20.06% | 43,609 | 21.8 | 825
RH_M5_3 | MG162 | 10/6/2016 | −19.2178 | −57.5864 | Rehabilitating | 2 | 2.403E+09 | 3.189E+07 | 36–76 | 2.403E+09 | 3.189E+07 | 61–76 | 18.97% | 50,393 | 24.65 | 883
RH_MC_1 | MG167 | 10/6/2016 | −19.2168 | −57.5817 | Rehabilitating | 3 | 2.604E+09 | 3.463E+07 | 35–76 | 2.604E+09 | 3.463E+07 | 61–76 | 12.39% | 15,205 | 5.61 | 612
RH_MC_3 | MG169 | 10/6/2016 | −19.2168 | −57.5817 | Rehabilitating | 3 | 2.517E+09 | 3.338E+07 | 35–76 | 2.517E+09 | 3.338E+07 | 61–76 | 21.86% | 38,569 | 19.38 | 1095
RH_ME_1 | MG137 | 10/6/2016 | −19.1927 | −57.6032 | Rehabilitating | 3 | 2.853E+09 | 3.794E+07 | 35–76 | 2.854E+09 | 3.794E+07 | 61–76 | 16.45% | 54,573 | 21.49 | 625
RH_ME_2 | MG138 | 10/6/2016 | −19.1927 | −57.6032 | Rehabilitating | 3 | 1.763E+09 | 2.339E+07 | 35–76 | 1.763E+09 | 2.339E+07 | 61–76 | 16.54% | 23,548 | 9.23 | 632
RH_ME_3 | MG139 | 10/6/2016 | −19.1927 | −57.6032 | Rehabilitating | 3 | 3.668E+09 | 4.874E+07 | 35–76 | 3.669E+09 | 4.874E+07 | 60–76 | 19.06% | 102,373 | 43.94 | 689
RH_PA_1 | MG159 | 10/7/2016 | −19.1855 | −57.6075 | Rehabilitating | 3 | 2.916E+09 | 3.881E+07 | 35–76 | 2.914E+09 | 3.881E+07 | 61–76 | 25.52% | 117,602 | 50.86 | 699
RH_PA_3 | MG151 | 10/7/2016 | −19.1855 | −57.6075 | Rehabilitating | 3 | 2.505E+09 | 3.322E+07 | 36–76 | 2.504E+09 | 3.322E+07 | 61–76 | 18.39% | 52,570 | 25.35 | 745
RH_PB_2 | MG155 | 10/7/2016 | −19.184 | −57.611 | Rehabilitating | 2 | 1.748E+09 | 2.321E+07 | 35–76 | 1.748E+09 | 2.321E+07 | 60–76 | 20.37% | 62,923 | 27.53 | 700
RH_PB_3 | MG146 | 10/7/2016 | −19.184 | −57.611 | Rehabilitating | 2 | 2.336E+09 | 3.101E+07 | 35–76 | 2.335E+09 | 3.101E+07 | 62–76 | 16.42% | 28,526 | 11.51 | 648
RH_PI_1 | MG144 | 10/4/2016 | −19.1918 | −57.6024 | Rehabilitating | 6 | 2.609E+09 | 3.461E+07 | 35–76 | 2.608E+09 | 3.461E+07 | 60–76 | 13.79% | 50,794 | 24.5 | 831
RH_PI_2 | MG145 | 10/4/2016 | −19.1918 | −57.6024 | Rehabilitating | 6 | 2.450E+09 | 3.251E+07 | 35–76 | 2.451E+09 | 3.251E+07 | 61–76 | 17.47% | 45,768 | 18.79 | 652
RH_PI_3 | MG136 | 10/4/2016 | −19.1918 | −57.6024 | Rehabilitating | 6 | 2.045E+09 | 2.728E+07 | 35–76 | 2.049E+09 | 2.728E+07 | 60–76 | 11.01% | 9,777 | 3.54 | 609
RH_SC_1 | MG148 | 10/5/2016 | −19.1909 | −57.602 | Rehabilitating | 2 | 3.455E+09 | 4.582E+07 | 35–76 | 3.455E+09 | 4.582E+07 | 61–76 | 16.49% | 49,559 | 24.64 | 778
RH_SC_2 | MG149 | 10/5/2016 | −19.1909 | −57.602 | Rehabilitating | 2 | 1.606E+09 | 2.129E+07 | 35–76 | 1.606E+09 | 2.129E+07 | 62–76 | 16.57% | 19,921 | 8.82 | 724
RH_SC_3 | MG150 | 10/5/2016 | −19.1909 | −57.602 | Rehabilitating | 2 | 3.560E+09 | 4.721E+07 | 35–76 | 3.560E+09 | 4.721E+07 | 60–76 | 20.57% | 95,547 | 48.65 | 761

N50 is an assembly statistic indicating the length of the smallest contig in the smallest set of contigs whose total number of bases corresponds to at least 50% of the total length of the assembly[42].

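The N50 definition in the note to Table 2 can be checked with a short function: sort contig lengths in decreasing order and return the length at which the running sum first reaches half of the total assembly length. The example lengths are made up; in practice they would be taken from the contig fasta files.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length of the smallest contig in the smallest set of longest contigs
    that together contain at least 50% of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return ordered[-1]  # not reached for non-empty input


if __name__ == "__main__":
    lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical contig lengths
    print(n50(lengths))  # 8: 10 + 9 + 8 = 27, exactly half of 54
```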
Further data were deposited in the Open Science Framework (Data Citation 2). There, the "supplementary" folder contains quality reports for the forward and reverse reads of each sample as well as chemical and physical soil properties. Soil properties are provided as a comma-delimited file named SoilSamples.csv. Read quality reports contain 12 sections entitled 1) Basic Statistics, 2) Per base sequence quality, 3) Per tile sequence quality, 4) Per sequence quality scores, 5) Per base sequence content, 6) Per sequence GC content, 7) Per base N content, 8) Sequence Length Distribution, 9) Sequence Duplication Levels, 10) Overrepresented sequences, 11) Adapter Content and 12) Kmer Content. The file README.txt, available in the same folder, briefly explains each section. Additionally, the "cluster_analysis" folder contains three subfolders. The "inputs" folder contains the CDS detected within the assembled contigs together with their taxonomic and functional classifications, which were used by the scripts deposited in the "scripts" folder to generate the counting matrices stored in the "output" folder. The "inputs" folder contains three zipped files. First, kaiju_input.tar.gz contains a file for each sample with all identified CDS; each file lists CDS identifiers and their sequences. Second, kaiju_output.tar.gz contains the taxonomic classification of each CDS, stored as individual tab-delimited files for each sample. For each CDS, an upper-case letter indicates whether taxonomic classification succeeded (U, unclassified; C, classified) and is followed by the CDS identifier, the NCBI taxonomy ID of the identified taxon and a string with the taxonomic identification (domain, phylum, class, order, family, genus and species, separated by semicolons).
The identifier is composed of the CDS ID, i.e., the contig ID and the initial and final nucleotide positions of the CDS within the contig, joined by underscores into a single string. The third file, interpro_output.tar.gz, contains the functional classification: individual comma-delimited files (.csv) list the enzymes detected within each sample, with three columns containing an identifier, the name of the protein and the number of occurrences within the analyzed sample. The "output" folder contains three comma-separated files within a zipped folder (output.tar.gz): the taxonomic (taxa.csv) and functional (functions.csv) matrices, and taxa_30.csv, which shows the taxonomic matrix for the top 30 genera only. Furthermore, five R scripts used to produce the taxonomic matrix (taxonomic_analysis.R), plot samples clustered by taxonomic composition (taxonomic_cluster_plot.R), plot the taxonomic composition of each sample (taxonomic_stacked_plot.R), produce the functional matrix (functional_analysis.R) and plot samples clustered by function (functional_cluster_plot.R) are available in the "scripts" folder.
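The Kaiju output layout described above can be turned into per-sample genus counts, the basis of a taxonomic counting matrix, with a small parser. This is a sketch, not the authors' taxonomic_analysis.R script. It assumes the four tab-separated fields described in the text, treats empty or "NA" ranks as missing, and splits the CDS identifier from the right because contig IDs may themselves contain underscores. The example lines and taxon IDs are made up.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional

RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


class KaijuHit(NamedTuple):
    classified: bool
    contig_id: str
    start: int
    end: int
    taxon_id: int
    lineage: dict[str, Optional[str]]


def parse_cds_id(cds_id: str) -> tuple[str, int, int]:
    """Split '<contig ID>_<start>_<end>' from the right."""
    contig_id, start, end = cds_id.rsplit("_", 2)
    return contig_id, int(start), int(end)


def parse_kaiju_line(line: str) -> KaijuHit:
    """Parse one tab-delimited line: status, CDS ID, taxon ID, optional lineage."""
    fields = line.rstrip("\r\n").split("\t")
    status, cds_id, taxon_id = fields[0], fields[1], int(fields[2])
    names: list[str] = []
    if len(fields) > 3:
        names = [name.strip() for name in fields[3].split(";")]
        while names and not names[-1]:
            names.pop()  # drop the empty item after a trailing ';'
    lineage = {
        rank: (name if name and name != "NA" else None)
        for rank, name in zip(RANKS, names)
    }
    contig_id, start, end = parse_cds_id(cds_id)
    return KaijuHit(status == "C", contig_id, start, end, taxon_id, lineage)


def genus_counts(lines: Iterable[str]) -> Counter:
    """Number of CDS assigned to each genus in one sample."""
    counts: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        hit = parse_kaiju_line(line)
        genus = hit.lineage.get("genus")
        if hit.classified and genus:
            counts[genus] += 1
    return counts


def counting_matrix(per_sample: dict[str, Counter]) -> dict[str, list[int]]:
    """Genus x sample matrix; columns follow the order of per_sample."""
    genera = sorted(set().union(*per_sample.values()))
    return {g: [per_sample[s][g] for s in per_sample] for g in genera}


if __name__ == "__main__":
    example = [  # made-up lines following the described layout
        "C\tk141_7_1_300\t101\tBacteria; Proteobacteria; Alphaproteobacteria; "
        "Sphingomonadales; Sphingomonadaceae; Sphingomonas; NA;",
        "C\tk141_8_10_400\t202\tBacteria; Actinobacteria; NA; NA; NA; NA; NA;",
        "U\tk141_9_5_200\t0",
    ]
    print(genus_counts(example))  # Counter({'Sphingomonas': 1})
    print(counting_matrix({"S1": genus_counts(example), "S2": Counter({"Massilia": 3})}))
```

Only fully resolved genera are counted, which matches the note to Table 3 that the number of genera refers to fully identified genera.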

Technical Validation

Altogether, 2,166,372 CDS were detected. A total of 2,064 genera were identified in 1,290,491 CDS, among them 127 archaeal, 1,853 bacterial and 84 viral genera. Richness varies from 739 to 1,894 genera within samples (Table 3). The remaining 875,881 CDS (40.43% of all CDS) could not be assigned to a genus: 273,799 CDS (12.64% of all CDS) remain completely unclassified, and only partial matches are available for the other 602,082 CDS. Functional classification of identified contigs distinguished 10,913 proteins.
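The classification shares reported above are nested rather than additive, which a quick calculation makes explicit: CDS without a genus assignment are the total minus the genus-level CDS, and the completely unclassified CDS form a subset of them. Only counts stated in the text are used.

```python
TOTAL_CDS = 2_166_372        # all detected CDS
GENUS_LEVEL_CDS = 1_290_491  # CDS assigned to one of the identified genera
UNCLASSIFIED_CDS = 273_799   # CDS without any taxonomic match
GENERA_BY_DOMAIN = {"Archaea": 127, "Bacteria": 1_853, "Viruses": 84}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


below_genus = TOTAL_CDS - GENUS_LEVEL_CDS      # CDS not resolved to genus level
partial_only = below_genus - UNCLASSIFIED_CDS  # CDS matched only above genus level

if __name__ == "__main__":
    print(f"genera: {sum(GENERA_BY_DOMAIN.values()):,}")
    print(f"assigned to a genus: {GENUS_LEVEL_CDS:,} ({percent(GENUS_LEVEL_CDS, TOTAL_CDS):.2f}%)")
    print(f"not resolved to genus: {below_genus:,} ({percent(below_genus, TOTAL_CDS):.2f}%)")
    print(f"  of which unclassified: {UNCLASSIFIED_CDS:,} ({percent(UNCLASSIFIED_CDS, TOTAL_CDS):.2f}%)")
    print(f"  of which partial matches: {partial_only:,} ({percent(partial_only, TOTAL_CDS):.2f}%)")
```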
Table 3

Taxonomic and functional classification of communities from metagenomic libraries of 34 soil samples from non-rehabilitated, rehabilitating and reference sites from two iron-ore mines, Corumbá, Mato Grosso do Sul, Brazil.

Sample Alias | Sample ID | Number of contigs | Number of CDS | Classified CDS | Unclassified CDS | Number of genera | Number of different functions
NR_M5_1 | MG171 | 135,797 | 123,230 | 113,675 | 9,555 | 1,664 | 6,769
NR_M5_2 | MG172 | 110,156 | 88,340 | 78,069 | 10,271 | 1,652 | 6,057
NR_M5_3 | MG173 | 127,830 | 84,564 | 76,370 | 8,194 | 1,583 | 6,027
NR_PR_1 | MG152 | 48,854 | 34,574 | 28,374 | 6,200 | 1,467 | 4,498
NR_PR_2 | MG153 | 71,968 | 53,781 | 44,922 | 8,859 | 1,586 | 5,159
NR_PR_3 | MG147 | 15,018 | 10,461 | 9,283 | 1,178 | 853 | 2,741
NR_RN_1 | MG141 | 192,916 | 163,614 | 145,797 | 17,817 | 1,808 | 7,540
NR_RN_2 | MG142 | 70,101 | 54,039 | 48,130 | 5,909 | 1,544 | 5,262
NR_RN_3 | MG143 | 204,263 | 175,879 | 157,329 | 18,550 | 1,824 | 7,324
REF_A_1 | MG163 | 256,960 | 197,062 | 159,536 | 37,526 | 1,894 | 8,643
REF_A_3 | MG165 | 63,656 | 39,039 | 30,236 | 8,803 | 1,428 | 5,195
REF_B_1 | MG156 | 48,507 | 33,925 | 28,716 | 5,209 | 1,403 | 4,425
REF_B_3 | MG158 | 142,828 | 191,768 | 84,886 | 16,882 | 1,737 | 5,794
REF_C_1 | MG174 | 65,769 | 43,759 | 36,997 | 6,762 | 1,372 | 4,497
REF_C_2 | MG175 | 44,973 | 31,717 | 27,736 | 3,981 | 1,261 | 4,149
REF_C_3 | MG166 | 87,360 | 54,934 | 45,323 | 9,611 | 1,523 | 4,750
RH_M5_1 | MG170 | 60,777 | 44,591 | 38,495 | 6,096 | 1,500 | 4,759
RH_M5_2 | MG161 | 43,609 | 32,799 | 28,893 | 3,906 | 1,346 | 4,639
RH_M5_3 | MG162 | 50,393 | 35,744 | 30,963 | 4,781 | 1,373 | 4,661
RH_MC_1 | MG167 | 15,205 | 9,708 | 8,308 | 1,400 | 942 | 2,886
RH_MC_3 | MG169 | 38,569 | 26,907 | 22,408 | 4,499 | 1,334 | 4,565
RH_ME_1 | MG137 | 54,573 | 36,071 | 31,285 | 4,786 | 1,239 | 4,212
RH_ME_2 | MG138 | 23,548 | 161,377 | 14,221 | 1,916 | 913 | 3,222
RH_ME_3 | MG139 | 102,373 | 60,377 | 49,696 | 10,681 | 1,590 | 5,003
RH_PA_1 | MG159 | 117,602 | 81,492 | 70,877 | 10,615 | 1,643 | 5,653
RH_PA_3 | MG151 | 52,570 | 38,612 | 33,140 | 5,472 | 1,460 | 4,656
RH_PB_2 | MG155 | 62,923 | 45,304 | 37,582 | 7,722 | 1,431 | 4,219
RH_PB_3 | MG146 | 28,526 | 19,291 | 16,603 | 2,688 | 1,188 | 3,873
RH_PI_1 | MG144 | 50,794 | 37,158 | 33,510 | 3,648 | 1,246 | 5,184
RH_PI_2 | MG145 | 45,768 | 30,553 | 25,689 | 4,364 | 1,222 | 4,189
RH_PI_3 | MG136 | 9,777 | 6,045 | 4,920 | 1,125 | 739 | 2,233
RH_SC_1 | MG148 | 49,559 | 31,403 | 28,178 | 3,225 | 1,276 | 4,489
RH_SC_2 | MG149 | 19,921 | 14,859 | 13,023 | 1,836 | 1,049 | 3,387
RH_SC_3 | MG150 | 95,547 | 73,395 | 62,663 | 19,732 | 1,638 | 5,503

CDS are protein-coding sequences. The number of genera corresponds to the number of distinct, fully identified genera of archaea, bacteria and viruses.

Microbial diversity within samples (Shannon index H' at genus level) varied from 4.5 to 5.5 (Fig. 3). Genus richness was significantly higher in non-rehabilitated than in rehabilitating study sites (ANOVA, F = 4.137, p = 0.0255; Fig. 3). Significant differences in community composition were also detected. First, the cluster analysis separated the samples into two clusters: the larger cluster grouped samples from rehabilitating and reference sites, whereas samples from non-rehabilitated sites were grouped outside it (Fig. 4).
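Shannon's index and the one-way ANOVA F statistic used above are straightforward to compute directly. The sketch below is a plain-Python re-expression for illustration only (the analyses themselves were run in R with vegan). It uses the natural logarithm, as vegan's `diversity()` does by default, and the example values are hypothetical. A p-value additionally requires the F distribution, for example via `scipy.stats.f_oneway`.

```python
import math
from statistics import fmean
from typing import Sequence


def shannon(counts: Sequence[float]) -> float:
    """Shannon's H' = -sum(p_i * ln p_i) over genera with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


if __name__ == "__main__":
    # Two equally abundant genera give H' = ln 2.
    print(round(shannon([10, 10]), 4))  # 0.6931
    # Hypothetical values for two groups of three samples each.
    print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```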
Figure 3

Shannon diversity of each of the 34 samples (left) and boxplot of genus richness, separated by non-rehabilitated (NR), rehabilitating (RH) and reference study sites (REF).

Different letters in the same boxplot indicate significant differences at the 0.05 level according to a post-hoc Tukey HSD test. Richness at REF sites did not differ significantly from that at NR or RH sites, whereas NR and RH sites differed significantly.

Figure 4

Clustering of samples from non-rehabilitated (NR), rehabilitating (RH) and reference study sites (REF) from Corumbá iron ore mines, Mato Grosso do Sul, Brazil, based on taxonomic counting matrix.

We considered only clusters with approximately unbiased clustering statistics (au) larger than 0.95, which represents a strong similarity between the grouped samples.

Additionally, the integrated taxonomy analysis separated the dataset into four groups by taxonomic profile, three of which were divided into subgroups (Fig. 5). As shown in Table 4, samples from all three treatments (non-rehabilitated, rehabilitating and reference sites) were clustered in groups A and B, while a single reference sample forms group D and group C is composed exclusively of non-rehabilitated samples. All analyses carried out here show that the taxonomic composition of microbial communities from rehabilitating and reference sites is highly similar, indicating that rehabilitation activities after iron ore mining in the Urucum Massif can successfully rehabilitate soil microbial communities.
Figure 5

Graphical representations of integrated taxonomy analysis performed by MGCOMP, containing a two-level grouping of all identified genera.

Clusters A, B, C and D and their subclusters, represented as dark blue circles, are composed of different numbers of samples and contain different numbers of core (i.e., present in all first-level groupings), exclusive (i.e., occurrence restricted to one first-level grouping) and neutral (all others) genera, as shown in Table 4.

Table 4

Exclusive and core taxa for each sample cluster built with MGCOMP.

Cluster Id | Samples | Exclusive taxa
A | RH_ME_1, RH_ME_2, RH_PB_2, RH_PA_1, NR_M5_3, REF_C_1, REF_C_2 | Subcluster 1: Frateuria, Leifsonia, Rhodanobacter, Dyella, Rubrobacter; Subcluster 2: Nonomuraea, Nocardiopsis, Microbispora, Thermomonospora, Actinopolymorpha
B | RH_PI_3, RH_ME_3, NR_RN_2, RH_PI_1, RH_PI_2, RH_PB_3, NR_PR_3, RH_SC_1, RH_SC_2, RH_SC_3, RH_PA_3, NR_PR_1, NR_PR_2, REF_B_1, REF_B_3, RH_M5_2, RH_M5_3, REF_A_3, REF_C_3, RH_MC_1, RH_MC_3, RH_M5_1 | Subcluster 1: Duganella, Lactococcus, Bryobacter, Chryseobacterium, Steroidobacter, Verrucomicrobium, Streptococcus, Lysobacter, Enterobacter, Geobacter, Belnapia, Dechloromonas; Subcluster 2: Microvirga, Pseudolabrys, Bosea, Rhodovulum
C | NR_RN_1, NR_RN_3, NR_M5_1, NR_M5_2 | none listed
D | REF_A_1 | Phenylobacterium, Caulobacter

Core taxa (shared by all clusters): Anaeromyxobacter, Arthrobacter, Blastococcus, Chitinophaga, Flavihumibacter, Flavisolibacter, Frankia, Gemmatimonas, Gemmatirosa, Geodermatophilus, Janthinobacterium, Marmoricola, Massilia, Mucilaginibacter, Mycobacterium, Myxococcus, Niabella, Niastella, Nocardioides, Novosphingobium, Pedobacter, Phycicoccus, Ramlibacter, Segetibacter, Sinomonas, Sphingobium, Sphingomonas, Variovorax

Usage Notes

Contigs and the taxonomic and functional classifications were generated by an automated process without manual assessment, i.e., they represent a draft assembly only. Therefore, all downstream research should independently assess the accuracy of reads, contigs, and taxonomic and functional assignments for the organisms of interest. Nevertheless, this study provides a baseline for further studies of this kind. The dataset contains a substantial number of previously identified taxa and functions, but the high proportion of unclassified or incompletely classified CDS indicates a sizable portion of unseen biodiversity in soils along the sampled rehabilitation chronosequence. Identifying this unseen biodiversity may require additional alignments, possibly using different genome assemblers in combination with further reference databases. Furthermore, the quality of functional and taxonomic classifications needs to be assessed manually in some cases. Analyzing both the known and the unseen biodiversity within this dataset is expected to yield helpful insights into microbial community ecology along rehabilitation chronosequences after iron ore mining.

Additional information

How to cite this article: Gastauer, M. et al. A metagenomic survey of soil microbial communities along a rehabilitation chronosequence after iron ore mining. Sci. Data. 6:190008 https://doi.org/10.1038/sdata.2019.8 (2019).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.