Markus Gastauer, Mabel Patricia Ortiz Vera, Kleber Padovani de Souza, Eder Soares Pires, Ronnie Alves, Cecílio Frois Caldeira, Silvio Junio Ramos, Guilherme Oliveira.
Abstract
Microorganisms are useful environmental indicators that can deliver essential insights into mine land rehabilitation processes. To compare microbial communities along a chronosequence of mine land rehabilitation with pre-disturbance levels at reference sites covered by native vegetation, we sampled non-rehabilitated, rehabilitating and reference study sites in the Urucum Massif, Southwestern Brazil. From each study site, three composite soil samples were collected for chemical, physical, and metagenomic analyses. We used paired-end library sequencing (Illumina NextSeq 500); the reads were assembled using MEGAHIT. Coding DNA sequences (CDS) were identified using Kaiju in combination with non-redundant NCBI BLAST reference sequences containing archaea, bacteria, and viruses. Additionally, a functional classification was performed with EMG v2.3.2. Here, we provide the raw data and assemblies (reads and contigs), followed by an initial functional and taxonomic analysis, as a baseline for further studies of this kind. Further investigation is needed to fully understand the mechanisms of environmental rehabilitation in tropical regions, and we encourage other researchers to explore this collection for hypothesis testing.
Year: 2019 PMID: 30747914 PMCID: PMC6371960 DOI: 10.1038/sdata.2019.8
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Map of the geographical position of the study sites in the Urucum Massif, Corumbá, Mato Grosso do Sul, Brazil.
1. Rampa Nova, 2. Mina 5, 3. PRAD 45 C, 4. Piscinão, 5. Mina Cateto, 6. PRAD 45 A, 7. Mina Escarpa, 8. Secção 10I, 9. Mina 5 N, 10. PRAD 45B, 11. Reference A, 12. Reference B, and 13. Reference C.
Table 1. Site information for all 13 sampling locations used in this study.
Age is the time interval (in years) between the beginning of rehabilitation activities and sampling.

| Category | Study site | Sample alias | Latitude | Longitude | Age (years) |
|---|---|---|---|---|---|
| Non-revegetated study sites | Rampa Nova | NR_RN_1 – NR_RN_3 | −19.1950 | −57.6030 | 0 |
| Non-revegetated study sites | Mina 5 | NR_M5_1 – NR_M5_3 | −19.1848 | −57.6111 | 0 |
| Non-revegetated study sites | PRAD 45 C | NR_PR_1 – NR_PR_3 | −19.2171 | −57.5908 | 0 |
| Sites in environmental rehabilitation | Piscinão | RH_PI_1 – RH_PI_3 | −19.1918 | −57.6024 | 6 |
| Sites in environmental rehabilitation | Mina Cateto | RH_MC_1 – RH_MC_3 | −19.2168 | −57.5817 | 3 |
| Sites in environmental rehabilitation | PRAD 45 A | RH_PA_1 – RH_PA_3 | −19.1855 | −57.6075 | 3 |
| Sites in environmental rehabilitation | Mina Escarpa | RH_ME_1 – RH_ME_3 | −19.1927 | −57.6032 | 3 |
| Sites in environmental rehabilitation | Secção 10I | RH_SC_1 – RH_SC_3 | −19.1909 | −57.6020 | 2 |
| Sites in environmental rehabilitation | Mina 5 N | RH_M5_1 – RH_M5_3 | −19.2178 | −57.5864 | 2 |
| Sites in environmental rehabilitation | PRAD 45B | RH_PB_1 – RH_PB_3 | −19.1840 | −57.6110 | 2 |
| Reference sites, covered by natural vegetation | Reference A | REF_A_1 – REF_A_3 | −19.1921 | −57.6016 | — |
| Reference sites, covered by natural vegetation | Reference B | REF_B_1 – REF_B_3 | −19.1837 | −57.6126 | — |
| Reference sites, covered by natural vegetation | Reference C | REF_C_1 – REF_C_3 | −19.2095 | −57.5935 | — |
Figure 2. Workflow of genome assembly, functional and taxonomic classification, and data validation applied in this study.
Rounded rectangles represent processes, with descriptions and tools; rectangles represent input and/or output files, with a brief description, file name (∗xxx∗ is a placeholder for the sample ID), format, and location. CDS stands for coding DNA sequences. NCBI indicates that files are available from NCBI (Data Citation 1), whereas SF indicates that the corresponding files were deposited in the Open Science Framework (Data Citation 2).
Table 2. Sequencing and assembly data from metagenomic libraries of 34 soil samples from non-rehabilitated, rehabilitating and reference sites from two iron-ore mines, Corumbá, Mato Grosso do Sul, Brazil.
N50 is an assembly statistic indicating the length of the shortest contig in the smallest set of contigs whose combined length is at least 50% of the total assembly length.

| Sample alias | Sample ID | Date | Latitude | Longitude | Category | Age (years) | Forward reads: # bases | Forward reads: # reads | Forward reads: interval | Reverse reads: # bases | Reverse reads: # reads | Reverse reads: interval | Estimated coverage | Assembly: # contigs | Assembly: total length (Mbp) | Assembly: N50 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NR_M5_1 | MG171 | 10/6/2016 | −19.1848 | −57.6111 | Non-rehabilitated | 0 | 2.852E+09 | 3.780E+07 | 43–76 | 2.851E+09 | 3.780E+07 | 63–76 | 33.12% | 135,797 | 99.14 | 1387 |
| NR_M5_2 | MG172 | 10/6/2016 | −19.1848 | −57.6111 | Non-rehabilitated | 0 | 2.573E+09 | 3.417E+07 | 35–76 | 2.572E+09 | 3.417E+07 | 62–76 | 22.02% | 110,156 | 61.22 | 943 |
| NR_M5_3 | MG173 | 10/6/2016 | −19.1848 | −57.6111 | Non-rehabilitated | 0 | 3.657E+09 | 4.855E+07 | 35–76 | 3.658E+09 | 4.855E+07 | 58–76 | 23.52% | 127,830 | 63.76 | 841 |
| NR_PR_1 | MG152 | 10/7/2016 | −19.2171 | −57.5908 | Non-rehabilitated | 0 | 2.383E+09 | 3.168E+07 | 35–76 | 2.383E+09 | 3.168E+07 | 61–76 | 19.76% | 48,854 | 22.46 | 801 |
| NR_PR_2 | MG153 | 10/7/2016 | −19.2171 | −57.5908 | Non-rehabilitated | 0 | 2.118E+09 | 2.812E+07 | 35–76 | 2.117E+09 | 2.812E+07 | 60–76 | 14.69% | 71,968 | 36.74 | 837 |
| NR_PR_3 | MG147 | 10/7/2016 | −19.2171 | −57.5908 | Non-rehabilitated | 0 | 1.076E+09 | 1.428E+07 | 38–76 | 1.076E+09 | 1.428E+07 | 60–76 | 15.68% | 15,018 | 6.33 | 670 |
| NR_RN_1 | MG141 | 10/5/2016 | −19.195 | −57.603 | Non-rehabilitated | 0 | 3.122E+09 | 4.138E+07 | 35–76 | 3.122E+09 | 4.138E+07 | 45–76 | 29.57% | 192,916 | 134.66 | 1346 |
| NR_RN_2 | MG142 | 10/5/2016 | −19.195 | −57.603 | Non-rehabilitated | 0 | 2.250E+09 | 2.987E+07 | 35–76 | 2.251E+09 | 2.987E+07 | 60–76 | 20.58% | 70,101 | 36.29 | 826 |
| NR_RN_3 | MG143 | 10/5/2016 | −19.195 | −57.603 | Non-rehabilitated | 0 | 3.429E+09 | 4.546E+07 | 35–76 | 3.429E+09 | 4.546E+07 | 58–76 | 35.23% | 204,263 | 151.16 | 1485 |
| REF_A_1 | MG163 | 10/4/2016 | −19.1921 | −57.6016 | Reference | — | 3.706E+09 | 4.911E+07 | 35–76 | 3.706E+09 | 4.911E+07 | 62–76 | 27.03% | 256,960 | 170.89 | 1180 |
| REF_A_3 | MG165 | 10/4/2016 | −19.1921 | −57.6016 | Reference | — | 3.097E+09 | 4.106E+07 | 35–76 | 3.097E+09 | 4.106E+07 | 45–76 | 24.83% | 63,656 | 25.65 | 650 |
| REF_B_1 | MG156 | 10/6/2016 | −19.1837 | −57.6126 | Reference | — | 2.619E+09 | 3.479E+07 | 35–76 | 2.620E+09 | 3.479E+07 | 61–76 | 17.68% | 48,507 | 21.04 | 740 |
| REF_B_3 | MG158 | 10/6/2016 | −19.1837 | −57.6126 | Reference | — | 3.747E+09 | 4.970E+07 | 35–76 | 3.747E+09 | 4.970E+07 | 62–76 | 19.39% | 142,828 | 70.43 | 781 |
| REF_C_1 | MG174 | 10/7/2016 | −19.2095 | −57.5935 | Reference | — | 1.753E+09 | 2.326E+07 | 35–76 | 1.753E+09 | 2.326E+07 | 61–76 | 21.23% | 65,769 | 28.36 | 674 |
| REF_C_2 | MG175 | 10/7/2016 | −19.2095 | −57.5935 | Reference | — | 2.100E+09 | 2.783E+07 | 43–76 | 2.100E+09 | 2.783E+07 | 60–76 | 13.99% | 44,973 | 19.36 | 660 |
| REF_C_3 | MG166 | 10/7/2016 | −19.2095 | −57.5935 | Reference | — | 3.248E+09 | 4.327E+07 | 35–76 | 3.251E+09 | 4.327E+07 | 60–76 | 21.77% | 87,360 | 36.41 | 686 |
| RH_M5_1 | MG170 | 10/6/2016 | −19.2178 | −57.5864 | Rehabilitating | 2 | 2.902E+09 | 3.848E+07 | 35–76 | 2.902E+09 | 3.848E+07 | 61–76 | 21.57% | 60,777 | 27.05 | 688 |
| RH_M5_2 | MG161 | 10/6/2016 | −19.2178 | −57.5864 | Rehabilitating | 2 | 2.143E+09 | 2.843E+07 | 35–76 | 2.142E+09 | 2.843E+07 | 59–76 | 20.06% | 43,609 | 21.8 | 825 |
| RH_M5_3 | MG162 | 10/6/2016 | −19.2178 | −57.5864 | Rehabilitating | 2 | 2.403E+09 | 3.189E+07 | 36–76 | 2.403E+09 | 3.189E+07 | 61–76 | 18.97% | 50,393 | 24.65 | 883 |
| RH_MC_1 | MG167 | 10/6/2016 | −19.2168 | −57.5817 | Rehabilitating | 3 | 2.604E+09 | 3.463E+07 | 35–76 | 2.604E+09 | 3.463E+07 | 61–76 | 12.39% | 15,205 | 5.61 | 612 |
| RH_MC_3 | MG169 | 10/6/2016 | −19.2168 | −57.5817 | Rehabilitating | 3 | 2.517E+09 | 3.338E+07 | 35–76 | 2.517E+09 | 3.338E+07 | 61–76 | 21.86% | 38,569 | 19.38 | 1095 |
| RH_ME_1 | MG137 | 10/6/2016 | −19.1927 | −57.6032 | Rehabilitating | 3 | 2.853E+09 | 3.794E+07 | 35–76 | 2.854E+09 | 3.794E+07 | 61–76 | 16.45% | 54,573 | 21.49 | 625 |
| RH_ME_2 | MG138 | 10/6/2016 | −19.1927 | −57.6032 | Rehabilitating | 3 | 1.763E+09 | 2.339E+07 | 35–76 | 1.763E+09 | 2.339E+07 | 61–76 | 16.54% | 23,548 | 9.23 | 632 |
| RH_ME_3 | MG139 | 10/6/2016 | −19.1927 | −57.6032 | Rehabilitating | 3 | 3.668E+09 | 4.874E+07 | 35–76 | 3.669E+09 | 4.874E+07 | 60–76 | 19.06% | 102,373 | 43.94 | 689 |
| RH_PA_1 | MG159 | 10/7/2016 | −19.1855 | −57.6075 | Rehabilitating | 3 | 2.916E+09 | 3.881E+07 | 35–76 | 2.914E+09 | 3.881E+07 | 61–76 | 25.52% | 117,602 | 50.86 | 699 |
| RH_PA_3 | MG151 | 10/7/2016 | −19.1855 | −57.6075 | Rehabilitating | 3 | 2.505E+09 | 3.322E+07 | 36–76 | 2.504E+09 | 3.322E+07 | 61–76 | 18.39% | 52,570 | 25.35 | 745 |
| RH_PB_2 | MG155 | 10/7/2016 | −19.184 | −57.611 | Rehabilitating | 2 | 1.748E+09 | 2.321E+07 | 35–76 | 1.748E+09 | 2.321E+07 | 60–76 | 20.37% | 62,923 | 27.53 | 700 |
| RH_PB_3 | MG146 | 10/7/2016 | −19.184 | −57.611 | Rehabilitating | 2 | 2.336E+09 | 3.101E+07 | 35–76 | 2.335E+09 | 3.101E+07 | 62–76 | 16.42% | 28,526 | 11.51 | 648 |
| RH_PI_1 | MG144 | 10/4/2016 | −19.1918 | −57.6024 | Rehabilitating | 6 | 2.609E+09 | 3.461E+07 | 35–76 | 2.608E+09 | 3.461E+07 | 60–76 | 13.79% | 50,794 | 24.5 | 831 |
| RH_PI_2 | MG145 | 10/4/2016 | −19.1918 | −57.6024 | Rehabilitating | 6 | 2.450E+09 | 3.251E+07 | 35–76 | 2.451E+09 | 3.251E+07 | 61–76 | 17.47% | 45,768 | 18.79 | 652 |
| RH_PI_3 | MG136 | 10/4/2016 | −19.1918 | −57.6024 | Rehabilitating | 6 | 2.045E+09 | 2.728E+07 | 35–76 | 2.049E+09 | 2.728E+07 | 60–76 | 11.01% | 9,777 | 3.54 | 609 |
| RH_SC_1 | MG148 | 10/5/2016 | −19.1909 | −57.602 | Rehabilitating | 2 | 3.455E+09 | 4.582E+07 | 35–76 | 3.455E+09 | 4.582E+07 | 61–76 | 16.49% | 49,559 | 24.64 | 778 |
| RH_SC_2 | MG149 | 10/5/2016 | −19.1909 | −57.602 | Rehabilitating | 2 | 1.606E+09 | 2.129E+07 | 35–76 | 1.606E+09 | 2.129E+07 | 62–76 | 16.57% | 19,921 | 8.82 | 724 |
| RH_SC_3 | MG150 | 10/5/2016 | −19.1909 | −57.602 | Rehabilitating | 2 | 3.560E+09 | 4.721E+07 | 35–76 | 3.560E+09 | 4.721E+07 | 60–76 | 20.57% | 95,547 | 48.65 | 761 |
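The N50 definition given with this table can be made concrete with a short Python sketch. It is only an illustration of the definition: the function name `n50` and the toy contig lengths are assumptions for this example, and it is not the tool the authors used to produce the reported values.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Contigs are sorted from longest to shortest and accumulated until the
    running total reaches at least half of the total assembly length; the
    length of the last contig added is the N50.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from this study.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 500 + 400 = 900 >= 750, so this prints 400
```

Sorting in descending order is what makes the accumulated set the smallest one that reaches the 50% threshold.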
Table 3. Taxonomic and functional classification of communities from metagenomic libraries of 34 soil samples from non-rehabilitated, rehabilitating and reference sites from two iron-ore mines, Corumbá, Mato Grosso do Sul, Brazil.
CDS are protein-coding sequences. The number of genera corresponds to the number of distinct, fully identified genera of archaea, bacteria, and viruses.

| Sample Alias | Sample ID | Number of contigs | Number of CDS | Classified CDS | Unclassified CDS | Number of Genera | Number of different functions |
|---|---|---|---|---|---|---|---|
| NR_M5_1 | MG171 | 135,797 | 123,230 | 113,675 | 9,555 | 1,664 | 6,769 |
| NR_M5_2 | MG172 | 110,156 | 88,340 | 78,069 | 10,271 | 1,652 | 6,057 |
| NR_M5_3 | MG173 | 127,830 | 84,564 | 76,370 | 8,194 | 1,583 | 6,027 |
| NR_PR_1 | MG152 | 48,854 | 34,574 | 28,374 | 6,200 | 1,467 | 4,498 |
| NR_PR_2 | MG153 | 71,968 | 53,781 | 44,922 | 8,859 | 1,586 | 5,159 |
| NR_PR_3 | MG147 | 15,018 | 10,461 | 9,283 | 1,178 | 853 | 2,741 |
| NR_RN_1 | MG141 | 192,916 | 163,614 | 145,797 | 17,817 | 1,808 | 7,540 |
| NR_RN_2 | MG142 | 70,101 | 54,039 | 48,130 | 5,909 | 1,544 | 5,262 |
| NR_RN_3 | MG143 | 204,263 | 175,879 | 157,329 | 18,550 | 1,824 | 7,324 |
| REF_A_1 | MG163 | 256,960 | 197,062 | 159,536 | 37,526 | 1,894 | 8,643 |
| REF_A_3 | MG165 | 63,656 | 39,039 | 30,236 | 8,803 | 1,428 | 5,195 |
| REF_B_1 | MG156 | 48,507 | 33,925 | 28,716 | 5,209 | 1,403 | 4,425 |
| REF_B_3 | MG158 | 142,828 | 101,768 | 84,886 | 16,882 | 1,737 | 5,794 |
| REF_C_1 | MG174 | 65,769 | 43,759 | 36,997 | 6,762 | 1,372 | 4,497 |
| REF_C_2 | MG175 | 44,973 | 31,717 | 27,736 | 3,981 | 1,261 | 4,149 |
| REF_C_3 | MG166 | 87,360 | 54,934 | 45,323 | 9,611 | 1,523 | 4,750 |
| RH_M5_1 | MG170 | 60,777 | 44,591 | 38,495 | 6,096 | 1,500 | 4,759 |
| RH_M5_2 | MG161 | 43,609 | 32,799 | 28,893 | 3,906 | 1,346 | 4,639 |
| RH_M5_3 | MG162 | 50,393 | 35,744 | 30,963 | 4,781 | 1,373 | 4,661 |
| RH_MC_1 | MG167 | 15,205 | 9,708 | 8,308 | 1,400 | 942 | 2,886 |
| RH_MC_3 | MG169 | 38,569 | 26,907 | 22,408 | 4,499 | 1,334 | 4,565 |
| RH_ME_1 | MG137 | 54,573 | 36,071 | 31,285 | 4,786 | 1,239 | 4,212 |
| RH_ME_2 | MG138 | 23,548 | 16,137 | 14,221 | 1,916 | 913 | 3,222 |
| RH_ME_3 | MG139 | 102,373 | 60,377 | 49,696 | 10,681 | 1,590 | 5,003 |
| RH_PA_1 | MG159 | 117,602 | 81,492 | 70,877 | 10,615 | 1,643 | 5,653 |
| RH_PA_3 | MG151 | 52,570 | 38,612 | 33,140 | 5,472 | 1,460 | 4,656 |
| RH_PB_2 | MG155 | 62,923 | 45,304 | 37,582 | 7,722 | 1,431 | 4,219 |
| RH_PB_3 | MG146 | 28,526 | 19,291 | 16,603 | 2,688 | 1,188 | 3,873 |
| RH_PI_1 | MG144 | 50,794 | 37,158 | 33,510 | 3,648 | 1,246 | 5,184 |
| RH_PI_2 | MG145 | 45,768 | 30,553 | 25,689 | 4,364 | 1,222 | 4,189 |
| RH_PI_3 | MG136 | 9,777 | 6,045 | 4,920 | 1,125 | 739 | 2,233 |
| RH_SC_1 | MG148 | 49,559 | 31,403 | 28,178 | 3,225 | 1,276 | 4,489 |
| RH_SC_2 | MG149 | 19,921 | 14,859 | 13,023 | 1,836 | 1,049 | 3,387 |
| RH_SC_3 | MG150 | 95,547 | 73,395 | 62,663 | 10,732 | 1,638 | 5,503 |
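The CDS columns of this table are related by simple arithmetic: for each sample, classified plus unclassified CDS should add up to the number of CDS. The following Python sketch checks that relation and derives the classified fraction. The function name `classified_fraction` is an assumption for illustration, and the three rows are copied from the table as examples.

```python
def classified_fraction(total_cds, classified, unclassified):
    """Return the share of CDS that received a classification.

    Raises ValueError if the classified and unclassified counts do not
    add up to the reported total, which would point to a transcription
    problem in the row.
    """
    if classified + unclassified != total_cds:
        raise ValueError(
            f"{classified} + {unclassified} != {total_cds}"
        )
    return classified / total_cds


# Rows copied from the table: (number of CDS, classified, unclassified).
rows = {
    "NR_M5_1": (123230, 113675, 9555),
    "NR_PR_3": (10461, 9283, 1178),
    "RH_PI_3": (6045, 4920, 1125),
}

for alias, (total, classified, unclassified) in rows.items():
    share = classified_fraction(total, classified, unclassified)
    print(f"{alias}: {share:.1%} of CDS classified")
```

Running the same check over every row is a cheap way to catch digit errors before the counts are reused in downstream analyses.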
Figure 3. Shannon diversity of each of the 34 samples (left) and boxplot of species richness (right), grouped into non-rehabilitated (NR), rehabilitating (RH) and reference (REF) study sites.
Different letters in the same boxplot indicate a significant difference at the 0.05 level according to a post-hoc Tukey HSD test. Richness did not differ significantly between REF and either NR or RH, but it did differ significantly between NR and RH.
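For readers unfamiliar with the two measures in this figure, a minimal Python sketch of Shannon diversity and richness follows. It assumes the natural logarithm and a plain vector of per-taxon counts for one sample; the source does not state the logarithm base, the exact input counts, or the software used, and the Tukey HSD test is not reproduced here.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def richness(counts):
    """Number of taxa observed at least once in the sample."""
    return sum(1 for c in counts if c > 0)


# Hypothetical per-taxon counts for a single sample, not data from this study.
toy_counts = [10, 10, 10, 10]
print(richness(toy_counts))       # 4 taxa observed
print(shannon_index(toy_counts))  # equals ln(4) for four equally abundant taxa
```

With perfectly even abundances the index reaches its maximum, the logarithm of the richness, so the two panels carry related but distinct information.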
Figure 4. Clustering of samples from non-rehabilitated (NR), rehabilitating (RH) and reference (REF) study sites from the Corumbá iron ore mines, Mato Grosso do Sul, Brazil, based on the taxonomic count matrix.
We considered only clusters with an approximately unbiased clustering statistic (au) larger than 0.95, which indicates strong similarity between the grouped samples.
Figure 5. Graphical representation of the integrated taxonomy analysis performed by MGCOMP, showing a two-level grouping of all identified genera.
Clusters A, B, C, and D and their subclusters, represented as dark blue circles, comprise different numbers of samples and contain different numbers of core (i.e., present in all first-level groupings), exclusive (i.e., occurrence restricted to a single first-level grouping) and neutral (all other) genera, as shown in Table 4.
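The core, exclusive and neutral categories defined in this caption can be expressed with set operations. The Python sketch below is an illustration of those definitions only and is not MGCOMP's implementation: the function name `classify_genera` is an assumption, the cluster memberships are invented toy data, and `genus_x` is a hypothetical placeholder genus.

```python
def classify_genera(groups):
    """Split genera into core, exclusive and neutral sets.

    groups maps a first-level group name to the set of genera found in it.
    core:      genera present in every group
    exclusive: per group, genera found in that group and in no other
    neutral:   all remaining genera
    """
    if not groups:
        raise ValueError("need at least one group")
    all_genera = set().union(*groups.values())
    core = set.intersection(*(set(g) for g in groups.values()))
    exclusive = {}
    for name, genera in groups.items():
        others = set().union(
            *(g for other, g in groups.items() if other != name)
        )
        exclusive[name] = set(genera) - others
    neutral = all_genera - core - set().union(*exclusive.values())
    return core, exclusive, neutral


# Toy memberships for illustration only, not the study's actual matrices.
toy_groups = {
    "A": {"Massilia", "Frateuria", "genus_x"},
    "B": {"Massilia", "Duganella", "genus_x"},
    "D": {"Massilia", "Caulobacter"},
}
core, exclusive, neutral = classify_genera(toy_groups)
print(sorted(core))            # ['Massilia']
print(sorted(exclusive["D"]))  # ['Caulobacter']
print(sorted(neutral))         # ['genus_x']
```

In this toy case `genus_x` is neutral because it occurs in more than one group but not in all of them.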
Table 4. Exclusive and core taxa for each sample cluster built with MGCOMP.

| Cluster ID | Samples | Exclusive taxa |
|---|---|---|
| A | RH_ME_1, RH_ME_2, RH_PB_2, RH_PA_1, NR_M5_3, REF_C_1, REF_C_2 | Subcluster 1: Frateuria, Leifsonia, Rhodanobacter, Dyella, Rubrobacter; Subcluster 2: Nonomuraea, Nocardiopsis, Microbispora, Thermomonospora, Actinopolymorpha |
| B | RH_PI_3, RH_ME_3, NR_RN_2, RH_PI_1, RH_PI_2, RH_PB_3, NR_PR_3, RH_SC_1, RH_SC_2, RH_SC_3, RH_PA_3, NR_PR_1, NR_PR_2, REF_B_1, REF_B_3, RH_M5_2, RH_M5_3, REF_A_3, REF_C_3, RH_MC_1, RH_MC_3, RH_M5_1 | Subcluster 1: Duganella, Lactococcus, Bryobacter, Chryseobacterium, Steroidobacter, Verrucomicrobium, Streptococcus, Lysobacter, Enterobacter, Geobacter, Belnapia, Dechloromonas; Subcluster 2: Microvirga, Pseudolabrys, Bosea, Rhodovulum |
| C | NR_RN_1, NR_RN_3, NR_M5_1, NR_M5_2 | |
| D | REF_A_1 | Phenylobacterium, Caulobacter |

Core taxa (all clusters): Anaeromyxobacter, Arthrobacter, Blastococcus, Chitinophaga, Flavihumibacter, Flavisolibacter, Frankia, Gemmatimonas, Gemmatirosa, Geodermatophilus, Janthinobacterium, Marmoricola, Massilia, Mucilaginibacter, Mycobacterium, Myxococcus, Niabella, Niastella, Nocardioides, Novosphingobium, Pedobacter, Phycicoccus, Ramlibacter, Segetibacter, Sinomonas, Sphingobium, Sphingomonas, Variovorax