Alexandra Schmidt1,2,3,4, Clément Schneider3,5, Peter Decker3,6, Karin Hohberg3,5, Jörg Römbke7, Ricarda Lehmitz3,5, Miklós Bálint1,3,8.
Abstract
Metagenomics, the shotgun sequencing of all DNA fragments from a community DNA extract, is routinely used to describe the composition, structure, and function of microorganism communities. Advances in DNA sequencing and the availability of genome databases increasingly allow the use of shotgun metagenomics on eukaryotic communities. Metagenomics offers major advances in the recovery of biomass relationships in a sample, in comparison to taxonomic marker gene-based approaches (metabarcoding). However, little is known about the factors which influence metagenomics data from eukaryotic communities, such as differences among organism groups, the properties of reference genomes, and genome assemblies.

We evaluated how shotgun metagenomics records composition and biomass in artificial soil invertebrate communities at different sequencing efforts. We generated mock communities of controlled biomass ratios from 28 species from all major soil mesofauna groups: mites, springtails, nematodes, tardigrades, and potworms. We shotgun sequenced these communities and taxonomically assigned them with a database of over 270 soil invertebrate genomes.

We recovered over 95% of the species, and observed relatively high false-positive detection rates. We found strong differences in reads assigned to different taxa, with some groups (e.g., springtails) consistently attracting more hits than others (e.g., enchytraeids). Original biomass could be predicted from read counts after considering these taxon-specific differences. Species with larger genomes, and with more complete assemblies, consistently attracted more reads than species with smaller genomes. The GC content of the genome assemblies had no effect on the biomass-read relationships. Results were similar among different sequencing efforts.

The results show considerable differences in taxon recovery and taxon specificity of biomass recovery from metagenomic sequence data. The properties of reference genomes and genome assemblies also influence biomass recovery, and they should be considered in metagenomic studies of eukaryotes. We show that low- and high-sequencing efforts yield similar results, suggesting high cost-efficiency of metagenomics for eukaryotic communities. We provide a brief roadmap for investigating factors which influence metagenomics-based eukaryotic community reconstructions. Understanding these factors is timely, as the accessibility of DNA sequencing and the momentum of reference genome projects point to a future in which the taxonomic assignment of DNA from any community sample becomes a reality.
Keywords: biomonitoring; eukaryotes; genome completeness; genome size; invertebrates; shotgun metagenomics; species composition; taxonomic bias
Year: 2022 PMID: 35784064 PMCID: PMC9170594 DOI: 10.1002/ece3.8991
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 3.167
FIGURE 1 (a) Ratios of species biomass and sequencing reads assigned to these species in the four mock community types. (b) GLM‐predicted effects of biomass, genome completeness, genome size, and repeat content on taxonomically assigned metagenomic reads. (c) Relative importance of GLM predictor variables
Composition of mock communities. For species where different developmental stages were available, individuals of different sizes were used to achieve the necessary biomass [adults + juveniles, e.g., Paramacrobiotus richtersi in mock 1: 4 + 1]. Mock 1: all species have equal biomass; mock 2: small species have higher biomass; mock 3: some, but not all small species have higher biomass than large species; mock 4: some small and some large species both have higher biomass than other small and large species
| Taxon | Mean body length (µm) | Body volume (10⁻⁶ µm³) | Number of individuals, Mock 1 | Number of individuals, Mock 2 | Number of individuals, Mock 3 | Number of individuals, Mock 4 |
|---|---|---|---|---|---|---|
| Tardigrada | | | | | | |
| | 700 | 12.1 | 4+1 | 9 | 0+9 | 2+5 |
| Nematoda | | | | | | |
| | 340 | 0.15 | 355 | 1775 | 1420 | 710 |
| | 380 | 0.10 | 521 | 1562 | 1562 | 521 |
| | 620 | 0.28 | 190 | 570 | 380 | 190 |
| | 930 | 0.98 | 54 | 162 | 54 | 162 |
| Collembola | | | | | | |
| | 300 | 5.7 | 9 | 9 | 37 | 9 |
| | 880 | 11.0 | 5 | 4 | 5 | 5 |
| | 560 | 13.9 | 4 | 12 | 8 | 8 |
| | 1090 | 17.3 | 3 | 6 | 6 | 6 |
| | 1250 | 31.0 | 2 | 2 | 2 | 2 |
| | 730 | 36.1 | 1+1 | 1+1 | 1+1 | 1+1 |
| | 1090 | 44.1 | 1 | 1 | 4 | 4 |
| | 1400 | 53.2 | 1 | 1 | 2 | 3 |
| Oribatida | | | | | | |
| | 240 | 4.8 | 11 | 33 | 11 | 22 |
| | 280 | 5.6 | 9 | 28 | 19 | 10 |
| | 340 | 12.9 | 4 | 12 | 4 | 8 |
| | 360 | 13.7 | 4 | 12 | 8 | 12 |
| | 300 | 15.9 | 3 | 3 | 7 | 3 |
| | 440 | 27.1 | 2 | 2 | 2 | 2 |
| | 470 | 35.5 | 2 | 1 | 5 | 3 |
| | 410 | 46.5 | 1 | 1 | 1 | 3 |
| | 560 | 50.8 | 1 | 1 | 2 | 2 |
| Gamasida | | | | | | |
| | 700 | 22.0 | 2+1 | 5 | 2+1 | 5+6 |
| Enchytraeidae | | | | | | |
| | 4000 | Fragments | | | | |
| | 2500 | | | | | |
| | 10500 | | | | | |
| | 6500 | | | | | |
| | 7500 | | | | | |
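As a rough check on the equal-biomass design of Mock 1, the four nematode rows of the table can be multiplied out. This is a minimal sketch that assumes total body volume (tabulated body volume times number of individuals) is a usable proxy for the biomass a species contributes; that assumption and the variable names are illustrative and not taken from the source.

```python
# Nematode rows from the table: (body volume as tabulated, individuals in Mock 1).
# Assumption: volume x count approximates the biomass contributed by a species.
nematode_rows = [
    (0.15, 355),
    (0.10, 521),
    (0.28, 190),
    (0.98, 54),
]

# Total body volume per species in Mock 1.
totals = [volume * count for volume, count in nematode_rows]

# Relative spread between the largest and smallest total.
spread = (max(totals) - min(totals)) / min(totals)

for (volume, count), total in zip(nematode_rows, totals):
    print(f"volume={volume:<5} n={count:<4} total volume={total:.2f}")
print(f"relative spread: {spread:.1%}")
```

Under this proxy the four totals differ by only a few percent, which is consistent with the caption's description of Mock 1 as having equal biomass per species.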
FIGURE 2 Numbers over bars represent the actual numbers of correctly identified species, and false‐negative and false‐positive identifications. (a) Species identification success along different Kraken2 classification thresholds. (b) Species identification success along different subsample sizes
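The three counts described in this caption can be derived by comparing the set of species placed in a mock community with the set of species reported by the classifier. The sketch below shows one way to do that; the helper name `identification_counts` and the species labels are hypothetical placeholders, not names from the paper.

```python
def identification_counts(expected, detected):
    """Return counts of correct, false-negative and false-positive species."""
    expected, detected = set(expected), set(detected)
    return {
        "correct": len(expected & detected),         # present and reported
        "false_negative": len(expected - detected),  # present but not reported
        "false_positive": len(detected - expected),  # reported but not present
    }


# Hypothetical labels, for illustration only.
expected = {"sp_A", "sp_B", "sp_C", "sp_D"}
detected = {"sp_A", "sp_B", "sp_C", "sp_X"}

counts = identification_counts(expected, detected)
print(counts)
```

Repeating the comparison for each classification threshold or subsample size would give one set of counts per setting.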
Model‐predicted biomass, taxon group, genome completeness, genome size, and repeat content effects on assigned metagenomic read numbers. All predictors were scaled before model fitting. Genome size was log‐normalized before scaling. Collembola served as a model intercept
| | Estimate | Standard error | Test statistic | p-value |
|---|---|---|---|---|
| (Intercept) | 14.047 | 0.132 | 106.498 | .000 |
| Biomass | 0.192 | 0.054 | 3.582 | .000 |
| Enchytraeidae | −6.910 | 1.748 | −3.953 | .000 |
| Nematoda | 0.947 | 0.352 | 2.688 | .008 |
| Oribatida | −1.194 | 0.212 | −5.633 | .000 |
| Tardigrada | −0.002 | 0.369 | −0.005 | .996 |
| Genome completeness | 0.599 | 0.122 | 4.891 | .000 |
| Genome size | 1.238 | 0.160 | 7.761 | .000 |
| Repeat content | −0.244 | 0.082 | −2.966 | .003 |
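To read the estimates on the scale of read counts, they can be exponentiated. This sketch assumes the GLM uses a log link, which is common for count models but is not stated in the table. Under that assumption each value is a multiplicative change in expected reads: relative to Collembola for the taxon groups, and per one standard deviation for the scaled predictors.

```python
import math

# Estimates copied from the table of model-predicted effects.
estimates = {
    "Biomass": 0.192,
    "Enchytraeidae": -6.910,
    "Nematoda": 0.947,
    "Oribatida": -1.194,
    "Tardigrada": -0.002,
    "Genome completeness": 0.599,
    "Genome size": 1.238,
    "Repeat content": -0.244,
}

# Assumption: log link, so exp(estimate) is a multiplicative effect on expected reads.
ratios = {term: math.exp(beta) for term, beta in estimates.items()}

for term, ratio in ratios.items():
    print(f"{term:<20} x{ratio:.3f}")
```

Read this way, one standard deviation more of log genome size would correspond to roughly 3.4 times as many assigned reads, and enchytraeids would attract about a thousandth of the reads of springtails at equal values of the other predictors.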
FIGURE 3 Redundancy analysis ordination of mock community replicates along the taxonomically assigned metagenomic reads