Sebastian Metz1,2, Paula Huber3,4, Erick Mateus-Barros4, Pedro C Junger4, Michaela de Melo4,5, Inessa Lacativa Bagatini6, Irina Izaguirre7, Mariana Câmara Dos Reis4,8, Maria E Llames1, Victoria Accattatis3, María Victoria Quiroga1, Melina Devercelli3, María Romina Schiaffino9,10, Juan Pablo Niño-García11, Marcela Bastidas Navarro12, Beatriz Modenutti12, Helena Vieira6,13, Martin Saraceno7, Carmen Alejandra Sabio Y García7, Emiliano Pereira14, Alvaro González-Revello15,16, Claudia Piccini17, Fernando Unrein1, Cecilia Alonso14, Hugo Sarmento18.
Abstract
The biogeography of bacterial communities is a key topic in microbial ecology. For continental waters, most studies have been carried out in the northern hemisphere, leaving a gap in our knowledge of microorganisms' diversity patterns on a global scale. South America harbours approximately one third of the world's total freshwater resources and is one of these understudied regions. To fill this gap, we compiled 16S rRNA amplicon sequencing data of microbial communities across South American continental water ecosystems, presenting the first database, µSudAqua[db]. The database contains over 866 georeferenced samples from 9 different ecoregions with contextual environmental information. For its integration and validation, we constructed a curated database (µSudAqua[db.sp]) using samples sequenced on the Illumina MiSeq platform with commonly used prokaryote universal primers. This comprised ~60% of the total georeferenced samples of the µSudAqua[db]. This compilation was carried out within the scope of the µSudAqua collaborative network and represents one of the most complete databases of continental water microbial communities from South America.
Year: 2022 PMID: 36100598 PMCID: PMC9470542 DOI: 10.1038/s41597-022-01665-z
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 Global distribution of amplicon sequencing samples from continental water systems obtained by high-throughput sequencing (HTS). The information was acquired from the MGnify resource (https://www.ebi.ac.uk/metagenomics/) by searching for non-marine aquatic samples obtained by amplicon or metabarcoding experiment types. Geographical coordinates were retrieved for 4,691 of a total of 7,832 samples using the metadata available for each sample.
Data sources of the samples used to build the µSudAqua[db].
| Ecoregion | Sub-ecoregion | Sample number | References |
|---|---|---|---|
| 18. Central Andes | 18.1 Central High Andes/Puna | 48 | [ |
| 19. Southern Andes | 19.2 Valdivian Forested Hills and Mountains | 12 | [ |
| 20. Amazonian-Orinocan Lowland | 20.4 Amazon and Coastal Lowlands | 173 | [ |
| 21. Eastern Highlands | 21.2 Cerrados | 120 | [ |
| 21. Eastern Highlands | 21.4 Atlantic Forests | 207 | |
| 22 Grand Chaco | 22.2 Humid Chaco | 59 | [ |
| 23. Pampas | 23.1 Northern Rolling | 37 | [ |
| 23. Pampas | 23.2 Southern Flat Pampas | 127 | [ |
| 24. Monte-Patagonian | 24.2 Patagonian Tablelands | 83 | [ |
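As a quick consistency check, the per-sub-ecoregion counts in the table above can be tallied by ecoregion. The sketch below simply transcribes the table rows into a Python list; the variable and function names are illustrative and not part of the database.

```python
from collections import defaultdict

# (ecoregion, sub-ecoregion, sample number), transcribed from the table above.
ROWS = [
    ("18. Central Andes", "18.1 Central High Andes/Puna", 48),
    ("19. Southern Andes", "19.2 Valdivian Forested Hills and Mountains", 12),
    ("20. Amazonian-Orinocan Lowland", "20.4 Amazon and Coastal Lowlands", 173),
    ("21. Eastern Highlands", "21.2 Cerrados", 120),
    ("21. Eastern Highlands", "21.4 Atlantic Forests", 207),
    ("22 Grand Chaco", "22.2 Humid Chaco", 59),
    ("23. Pampas", "23.1 Northern Rolling", 37),
    ("23. Pampas", "23.2 Southern Flat Pampas", 127),
    ("24. Monte-Patagonian", "24.2 Patagonian Tablelands", 83),
]


def samples_per_ecoregion(rows):
    """Sum the sample numbers of all sub-ecoregions within each ecoregion."""
    totals = defaultdict(int)
    for ecoregion, _sub_ecoregion, n_samples in rows:
        totals[ecoregion] += n_samples
    return dict(totals)


totals = samples_per_ecoregion(ROWS)
grand_total = sum(totals.values())

for ecoregion, n_samples in totals.items():
    print(f"{ecoregion}: {n_samples}")
print("Total:", grand_total)
```

The rows sum to 866 samples, which agrees with the sample count quoted in the abstract.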
Metadata associated with the samples used to build the µSudAqua[db] database. Each sample is identified by a sample identifier and the corresponding Run accession number from GenBank. Moreover, each sample was assigned to a geographical location, ecoregion, sub-ecoregion and environment.
| Metadata | Description |
|---|---|
| Sample identifier | |
| Location identifier | |
| Country the observation belongs to | |
| Ecoregion Name | |
| Sub-Ecoregion Name | |
| Type of habitat the sample was taken from | |
| Type of system the sample was taken from | |
| Geographic latitude in decimal degrees | |
| Geographic longitude in decimal degrees | |
| Altitude of sampling location in meters above sea level [m.a.s.l] | |
| Sample depth in meters [m] | |
| Date of the sampling event | |
| Size fraction (µm) upper and lower threshold | |
| Solution in which the filter was preserved | |
| Temperature at which the sample was stored (°C) | |
| Nucleic acid target (RNA/DNA) | |
| Method used for the nucleic acid extraction | |
| Method used to disrupt the cells | |
| Duration for which sample was stored | |
| Next-generation sequencing platform on which the reads were generated | |
| Next-generation sequencing platform model on which the reads were generated | |
| Whether single-end or paired-end reads were used | |
| Sequencing technique implemented for the library | |
| Type of source material that is being sequenced. | |
| Method used to select and/or enrich the material being sequenced | |
| Name of the target gene | |
| Hypervariable region of the 18S/16S gene targeted | |
| Forward primer name | |
| Sequence associated with the forward primer (5′-3′) | |
| Reverse primer name | |
| Sequence associated with the reverse primer (5′-3′) | |
| Bibliographic citation associated with the primers used | |
| Bibliographic citation associated with the data | |
| Public database hosting the raw sample | |
| ENA Study Accession Number | |
| ENA Sample Accession Number | |
| ENA Experiment Accession Number | |
| ENA Run Accession Number | |
| Total number of sequenced reads | |
| Total number of sequenced nucleotides | |
| If the sample is included in the µSudAqua[db.sp] or not | |
| Number of reads in the sample fastq | |
| Total number of high-quality reads after read trimming, read length filtering, and chimera removal | |
| Total number of Amplicon Sequence Variants (ASVs) defined by DADA2 after chimera removal | |
| Date of revision and information update | |
| FTP link to download the forward raw fastq | |
| FTP link to download the reverse raw fastq | |
Technical information regarding the sampling procedure, nucleic acid extraction methodologies and sequencing strategies is also described. For samples from the µSudAqua[db.sp] database, the number of high-quality reads and Amplicon Sequence Variants (ASVs) defined is also indicated.
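A metadata table of this kind can be filtered programmatically, for example to keep only the samples flagged as belonging to the curated µSudAqua[db.sp] subset. The following is a minimal sketch, assuming a tab-separated export; the column names (`sample_id`, `in_db_sp`, `hq_reads`, and so on) and the three rows are hypothetical placeholders, since the actual field names are not reproduced here.

```python
import csv
import io

# Hypothetical column names and made-up rows, for illustration only.
# Adapt the names to the real metadata file before use.
EXAMPLE_TSV = (
    "sample_id\tecoregion\tlatitude\tlongitude\tin_db_sp\thq_reads\n"
    "S1\t23. Pampas\t-35.10\t-58.20\tyes\t12000\n"
    "S2\t18. Central Andes\t-22.50\t-66.90\tno\t\n"
    "S3\t23. Pampas\t-34.90\t-57.95\tyes\t8000\n"
)


def load_curated(tsv_text):
    """Return the rows flagged as curated, with numeric fields converted."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    curated = []
    for row in reader:
        # Skip samples that are not part of the curated subset.
        if row["in_db_sp"].strip().lower() != "yes":
            continue
        curated.append(
            {
                "sample_id": row["sample_id"],
                "ecoregion": row["ecoregion"],
                "latitude": float(row["latitude"]),    # decimal degrees
                "longitude": float(row["longitude"]),  # decimal degrees
                "hq_reads": int(row["hq_reads"]),
            }
        )
    return curated


curated = load_curated(EXAMPLE_TSV)
print([record["sample_id"] for record in curated])
```

Read-level statistics are converted only for curated rows, because the caption above indicates that high-quality read and ASV counts are reported for µSudAqua[db.sp] samples.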
µSudAqua[db.sp] database sample description by ecoregions.
| Eco-region | Sub eco-region | Country | Type of system | Latitude | Longitude | Size fraction | Extraction method (N° of samples) | N° of samples |
|---|---|---|---|---|---|---|---|---|
| 20. Amazonian-Orinocan Lowland | 20.4 Amazon and Coastal Lowlands | Brazil | Floodplain lake | 2°06′S–2°16′S | 55°13′W–55°48′W | 0.2–1.2 µm; 0.2–1.2 µm; >3 µm; >3 µm | Epicentre MGD08420 (3); Phenol-Chloroform (9); Epicentre MGD08420 (4); Phenol-Chloroform (8) | 24 |
| 20. Amazonian-Orinocan Lowland | 20.4 Amazon and Coastal Lowlands | Brazil | Rivers and floodplain lakes | 3°49′S | 60°18′W | 0.22–3 µm; >3 µm | Phenol-Chloroform (33); Phenol-Chloroform (36) | 69 |
| 21. Eastern Highlands | 21.4 Atlantic Forests | Brazil | Shallow lakes | 19°58′S–24°36′S | 44°21′W–52°19′W | 0.22–1.2 µm; 0.22–1.2 µm | Phenol-Chloroform (53); PowerSoil extraction kit (11) | 64 |
| 21. Eastern Highlands | 21.2 Cerrados | Brazil | Shallow lakes | 19°59′S–23°52′S | 47°10′W–51°05′W | 0.22–1.2 µm; 0.22–1.2 µm | Phenol-Chloroform (17); PowerSoil extraction kit (5) | 22 |
| 21. Eastern Highlands | 21.2 Cerrados | Brazil | Reservoirs | 22°10′S | 47°54′W | >0.22 µm | Qiagen DNeasy PowerWater | 15 |
| 21. Eastern Highlands | 21.2 Cerrados | Brazil | Reservoirs | 20°40′S–22°31′S | 48°31′W–51°16′W | 0.22–3 µm; >3 µm | PowerSoil extraction kit (24); PowerSoil extraction kit (24) | 48 |
| 22 Grand Chaco | 22.2 Humid Chaco | Argentina | Rivers and floodplain lakes | 31°37′S–31°50′S | 60°28′W–60°48′W | 0.22–50 µm | Phenol-Chloroform | 59 |
| 23. Pampas | 23.2 Southern Flat Pampas | Argentina | Shallow lakes | 34°28′S–38°55′S | 56°58′W–63°05′W | 0.22–45 µm | CTAB-Chloroform-Isoamyl alcohol | 50 |
| 23. Pampas | 23.2 Southern Flat Pampas | Argentina | Urban streams | 34°41′S–34°51′S | 58°18′W–58°21′W | 0.22–3 µm | CTAB-Chloroform-Isoamyl alcohol | 14 |
| 23. Pampas | 23.2 Southern Flat Pampas | Argentina | Shallow lakes | 34°34′S–35°51′S | 57°52′W–61°03′W | 0.22–54 µm; 0.22–45 µm | CTAB-Chloroform-Isoamyl alcohol (53); CTAB-Chloroform-Isoamyl alcohol (10) | 63 |
| 19. Southern Andes | 19.2 Valdivian Forested Hills and Mountains | Argentina | Deep and shallow lakes | 41°00′S–41°21′S | 71°18′W–71°49′W | >0.22 µm | PowerSoil extraction kit | 7 |
| 24. Monte-Patagonian | 24.2 Patagonian Tablelands | Argentina | Rivers and reservoirs | 39°15′S–40°45′S | 68°44′W–71°06′W | 0.22–18 µm | CTAB-Chloroform-Isoamyl alcohol | 15 |
| 24. Monte-Patagonian | 24.2 Patagonian Tablelands | Argentina | Shallow lakes | 46°71′S–47°09′S | 71°03′W–71°19′W | 0.22–3 µm | CTAB-Chloroform-Isoamyl alcohol | 11 |
| 24. Monte-Patagonian | 24.2 Patagonian Tablelands | Argentina | Shallow lakes | 46°44′S–48°40′S | 71°03′W–71°31′W | 0.22–3 µm | CTAB-Chloroform-Isoamyl alcohol | 41 |
In all cases, the V3-V4 region of the 16S rRNA gene was sequenced with Illumina MiSeq using the primers 341F (5′-CCTACGGGNGGCWGCAG-3′) and 805R (5′-GACTACHVGGGTATCTAATCC-3′).
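Both primers contain IUPAC degenerate bases (N, W, H, V), so locating them in a sequence requires pattern matching rather than exact string search. The sketch below shows one way to do this with regular expressions; the function names and the toy template are illustrative assumptions, and this is not the pipeline used to build the database.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}
# Complement of every IUPAC code (e.g. H = A/C/T pairs with D = A/G/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FWD_341F = "CCTACGGGNGGCWGCAG"
REV_805R = "GACTACHVGGGTATCTAATCC"


def primer_to_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq):
    """Reverse-complement a sequence, keeping degenerate codes consistent."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_amplicon(template):
    """Return the primer-to-primer span on the forward strand, or None."""
    fwd = primer_to_regex(FWD_341F).search(template)
    if fwd is None:
        return None
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    rev = primer_to_regex(reverse_complement(REV_805R)).search(template, fwd.end())
    if rev is None:
        return None
    return template[fwd.start():rev.end()]


# Toy template (made up): flank + 341F site + short insert + 805R site + flank.
toy = "TT" + "CCTACGGGAGGCAGCAG" + "ACGTACGT" + "GGATTAGATACCCTAGTAGTC" + "AA"
print(find_amplicon(toy))
```

Real V3-V4 amplicons are several hundred bases long; the eight-base insert here only keeps the example readable.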
Fig. 2 Sampling sites included in the µSudAqua[db] database by ecoregion. Colors highlight the different ecoregions. Point size indicates the number of samples at the same sampling site (e.g. time series). Triangles denote samples sequenced with primers other than those of Herlemann and collaborators[21]. Samples that constitute the µSudAqua[db.sp] database are indicated by circles.
Fig. 3 (A) Number of ASVs (richness) and (B) number of reads (relative abundance) of the major bacterial taxonomic groups that contribute more than 1% of the total reads, by ecoregion.
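The 1% cutoff used to select the major groups can be expressed as a simple relative-abundance filter. The sketch below is an assumption about how such a filter might be written, not the authors' code; the group names and read counts are placeholders, not values from the database.

```python
def major_groups(read_counts, threshold=0.01):
    """Return the read share of each group contributing more than `threshold`."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    shares = {group: count / total for group, count in read_counts.items()}
    return {group: share for group, share in shares.items() if share > threshold}


# Placeholder counts for a single ecoregion (made up for illustration).
toy_counts = {"group_A": 500, "group_B": 450, "group_C": 45, "group_D": 5}

major = major_groups(toy_counts)
for group, share in major.items():
    print(f"{group}: {share:.1%}")
```

Applying the same function to the counts of each ecoregion separately would give one set of major groups per ecoregion.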
| Measurement(s) | taxonomic diversity assessment by targeted gene survey |
| Technology Type(s) | next generation DNA sequencing |
| Sample Characteristic - Organism | Bacteria • Archaea |
| Sample Characteristic - Environment | aquatic environment |
| Sample Characteristic - Location | South America |