Literature DB >> 32217735

DNA metabarcoding reveals metacommunity dynamics in a threatened boreal wetland wilderness.

Alex Bush^1,2, Wendy A Monk³, Zacchaeus G Compson^4,5, Daniel L Peters⁶, Teresita M Porter^7,8,9, Shadi Shokralla^8,9, Michael T G Wright^8,9, Mehrdad Hajibabaei^8,9, Donald J Baird⁴.

Abstract

The complexity and natural variability of ecosystems present a challenge for reliable detection of change due to anthropogenic influences. This issue is exacerbated by necessary trade-offs that reduce the quality and resolution of survey data for assessments at large scales. The Peace-Athabasca Delta (PAD) is a large inland wetland complex in northern Alberta, Canada. Despite its geographic isolation, the PAD is threatened by encroachment of oil sands mining in the Athabasca watershed and hydroelectric dams in the Peace watershed. Methods capable of reliably detecting changes in ecosystem health are needed to evaluate and manage risks. Between 2011 and 2016, aquatic macroinvertebrates were sampled across a gradient of wetland flood frequency, applying both microscope-based morphological identification and DNA metabarcoding. By using multispecies occupancy models, we demonstrate that DNA metabarcoding detected a much broader range of taxa and more taxa per sample compared to traditional morphological identification and was essential to identifying significant responses to flood and thermal regimes. We show that family-level occupancy masks high variation among genera and quantify the bias of barcoding primers on the probability of detection in a natural community. Interestingly, patterns of community assembly were nearly random, suggesting a strong role of stochasticity in the dynamics of the metacommunity. This variability seriously compromises effective monitoring at local scales but also reflects resilience to hydrological and thermal variability. Nevertheless, simulations showed the greater efficiency of metabarcoding, particularly at a finer taxonomic resolution, provided the statistical power needed to detect change at the landscape scale.

Entities: Chemical Disease Gene Mutation Species

Keywords: detectability; occupancy; power analysis; stochasticity; taxonomic resolution

Year: 2020 PMID： 32217735 PMCID： PMC7165428 DOI： 10.1073/pnas.1918741117

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN： 0027-8424 Impact factor: 11.205

Tackling the global loss of biodiversity (1) is hindered by a lack of basic biological information needed to guide sustainable management strategies (2). Despite legal protections, freshwater ecosystems are increasingly degraded by multiple stressors (3). In addition, the quality and volume of data collected by monitoring programs often fail to support evidence-based management decisions (4–6). Here, we demonstrate how DNA metabarcoding can resolve challenges faced by traditional monitoring, alter our perspectives on ecosystem dynamics, and improve our understanding of natural variation and sampling error, supporting evidence-based decision making. DNA barcoding uses short genetic sequences to identify individual taxa. By contrast, DNA metabarcoding supports simultaneous identification of entire assemblages via high-throughput sequencing (7, 8). Using metabarcoding for ecosystem monitoring provides an opportunity to identify organisms in bulk samples at a high taxonomic resolution consistently and accurately (Biomonitoring 2.0; ref. 9). The accuracy, consistency, and resolution of taxonomic identification remains a constraint for many biomonitoring programs that must trade off data quality to make assessment protocols rapid and cost-effective (10). Aquatic macroinvertebrates exemplify this challenge, as their diversity of forms and functions are sensitive to multiple drivers of ecosystem condition. Thus, ecosystem degradation can be identified based on changes in assemblage composition due to environmental filtering (5). Despite decades of development, the challenges associated with traditional methods of sample processing limit inference of biomonitoring programs to gross status classifications (e.g., ref. 11). Metabarcoding presents an opportunity to describe community composition more accurately and consistently, supporting more effective and informative biomonitoring (12, 13). The Peace–Athabasca Delta (PAD) in northern Alberta, Canada (Fig. 1 and ref. 14) is North America’s largest inland delta (∼6,000 km2) and is located at the confluence of the Peace and Athabasca Rivers, consisting of hundreds of lakes and wetlands that become connected during flood events, particularly when spring snowmelt leads to ice jams (15, 16). The PAD is a Ramsar wetland, protected within Wood Buffalo National Park, a United Nations Educational, Scientific and Cultural Organization World Heritage site. Nonetheless, there have been concerns that the PAD could be affected by upstream developments, including current and proposed hydroelectric dams on the Peace River, continued expansion of oil sands mining on the Athabasca River to within 30 km of the park boundary, and climate change (17). Assessing how such factors influence the integrity of a natural wilderness is made more challenging by the paucity of biological surveys that have been conducted and the logistics of working in such a remote region. To gain a better understanding of the PAD’s ecology, rapid assessments of aquatic macroinvertebrates have been conducted in since 2011 to establish a baseline understanding of the ecosystem’s diversity and structure (14, 18). Importantly, while surveys have followed established protocols from the Canadian Aquatic Biomonitoring Network (hereafter CABIN) (19), samples were processed using both traditional and DNA metabarcoding approaches, allowing us to test the power of each approach to support environmental management of the delta.

Fig. 1.

Location of sampling sites in the PAD. (Inset) The full extent of Wood Buffalo National Park in Alberta (AB), and boundaries of neighboring provinces: British Columbia (BC), Saskatoon (SK), and the Northwest Territories (NWT). Photo taken at Rocher River wetland (PAD 37). Sampling error is a ubiquitous feature of any ecological survey, irrespective of the methodology, and of particular concern is the frequency of false absences (20). Depending on the covariance of species’ detectability with other environmental characteristics, models can be structurally biased and their confidence overestimated (21). Although imperfect detection is very common, and freshwater biomonitoring protocols have a long history of standardization to maintain comparability (5), there are few examples of research explicitly quantifying the nature of sampling error (e.g., ref. 22). Instead, variability due to sampling error is usually combined with that from natural sources (i.e., as “noise”; ref. 23). An alternative is to specify the likelihood of detection (the observation process model) and simultaneously correct our estimates of species occurrence (the ecological state model) within a single hierarchical framework (24). In this study, we employed multispecies occupancy models (MSOMs; refs. 25 and 26) to account for the effects of imperfect detection on estimates of macroinvertebrate diversity, drawing upon data from 6 y of macroinvertebrate surveys in the PAD. We quantify the efficiency with which the macroinvertebrate community can be surveyed using both traditional morphological identification and DNA metabarcoding and demonstrate that these approaches make a qualitative difference to our view of how the metacommunity is structured, to the efficiency of monitoring, and consequently to our power to detect change (27).

Results

A key difference between our sampling approaches was that the standard CABIN wetland protocol (19) provided estimates of relative abundance based on counts from a subset of each sample, whereas sequences identified using DNA metabarcoding were converted to presence–absence data (13, 28). In addition, CABIN identified 74 families based on morphological features, but metabarcoding could identify 109 families, as well as 263 genera (). As a result, we trained four hierarchical MSOMs for each data type: 1) counts of macroinvertebrate families from CABIN (CABIN Fcount), 2) the presence–absence of macroinvertebrate families from CABIN (CABIN Fpa), 3) the presence–absence of macroinvertebrate families from DNA data (DNA Fpa), and 4) the presence–absence of macroinvertebrate genera from DNA data (DNA Gpa). Although metabarcoding can discriminate among taxa at even finer resolution (i.e., species), given the prevalence was lower than the prevalence of genera and the available sample size, we did not feel the detectability and occupancy of those taxa could be estimated reliably.

Occupancy and Detectability.

The CABIN Fcount model predicted total abundance was dominated by four taxa (two Chironomidae subfamilies, Oligochaeta and Planorbidae) but also suggested that almost all taxa were present everywhere within the PAD (i.e., site occupancy ∼1), with no environmental covariates retained in the final model. This scenario is plausible, but if we apply the predicted probabilities of detection and same survey effort (number of individuals counted), and assume taxa are sampled at random from the pool of individuals, the CABIN Fcount model suggested we should have observed 38 taxa on average instead of 18. Nonrandom aggregation of individuals is typical of ecological communities (29) and may be why the model appeared to be misspecified. In contrast to the count-based model, the presence–absence models all suggested taxon site occupancy was below 1 (Fig. 2 and ), although the “U-shaped” form of the hyperparameters in Fig. 2 was an artifact of the bounded distribution (29). The CABIN Fpa model that estimated the probability of detecting macroinvertebrate families was lower than the DNA Fpa model (Fig. 2 right-skewed relative to Fig. 2 ; see also Fig. 3). Models must balance the expected occupancy to fit with the detections, and probability of detection made in each survey, and the CABIN Fpa model therefore also predicted higher occupancy than the DNA Fpa model (Fig. 2 left-skewed relative to Fig. 2). The differences in occupancy and detectability of specific families were not associated with prevalence, although many taxa were not recorded by both approaches and therefore cannot be compared (red points in Fig. 3; see for detail). In addition, detectability using DNA metabarcoding is intrinsically linked to the genetic primer used (30), and the importance of primer bias is well known from mock laboratory samples (e.g., ref. 31). Here we show biases in detectability can be quantified as part of the observation model, either at the community level (Fig. 2 ) or for individual taxa (). Finally, neither the CABIN Fcount nor Fpa model retained environmental variables to explain changes in occurrence, whereas both DNA occupancy models did so consistently. The covariates retained were 1) the frequency of spring and summer floods (i.e., connections between the wetland and river), 2) time since the ice melt, and 3) maximum water temperature prior to each survey. Responses to environment at the community level were almost neutral (), and the posterior distribution of coefficients differed from zero for only a minority of taxa (), but their inclusion in the model suggests the high interannual turnover () may be explained in part by deterministic factors.

Fig. 2.

Fig. 3.

Comparison of (A) occupancy and (B) detectability estimates in models trained by CABIN data and DNA metabarcode data at the family level (n = 50). Red points indicate taxa not observed by the complementary method, that is, 18 and 59 families were unique to CABIN and metabarcoding, respectively. See for further information on the identities of unique taxa.

Predicted occupancy (A and C) and detectability (B, D, and E) of taxa based on the presence–absence data collected using the CABIN protocol (A and B) and DNA metabarcoding (C–E) at the family level. Detectability using metabarcoding is further split by primer pair (D and E). The shaded polygons describe the probability density of the community hyperparameters, and the gray bars indicate the underlying frequency of the values estimated for each taxon. See for the CABIN Fcount and DNA Gpa model distributions. Comparison of (A) occupancy and (B) detectability estimates in models trained by CABIN data and DNA metabarcode data at the family level (n = 50). Red points indicate taxa not observed by the complementary method, that is, 18 and 59 families were unique to CABIN and metabarcoding, respectively. See for further information on the identities of unique taxa.

Alpha, Beta, and Gamma Diversity.

Recognizing that imperfect detection is commonplace in ecological surveys, it follows that regional (gamma diversity; ) and local (alpha diversity; ) diversity is routinely underestimated. As the CABIN Fcount model effectively assumed alpha and gamma diversity were equal, it estimated that only two families were likely to have gone undetected in the metacommunity. Conversely, the CABIN Fpa model estimated ∼20 families were missed (i.e., γ = 95), a 28% increase on the observed total. Interestingly, this estimate was still short of the richness observed using metabarcoding (n = 109), and based on the distribution of detection probabilities, the DNA Fpa and Gpa models estimated the metacommunity could potentially contain 130 families and 360 genera, a 19% and 37% increase (). Although imperfect detection always underestimates richness, its effect on the observed compositional dissimilarity between sites (beta diversity) is less predictable. The observed pairwise dissimilarity of samples consistently exceeded 40%, both within and between years, with no consistent increase over time (). Our analysis showed that compositional turnover in the CABIN dataset was overestimated, whereas for the DNA models the corrected and observed dissimilarities were similar (), although temporal turnover (i.e., interannual, within-site dissimilarity) was marginally overestimated by the DNA dataset. This implies that although metabarcoding underestimated alpha diversity at each site, the proportions of the taxa missed that were shared or unique to site pairs were similar. Finally, one predictable aspect of turnover is that as the taxonomic resolution is increased subtaxa are on average less prevalent than their parental ranks (), typically harder to detect (presumably because they are also less abundant than parental ranks), and therefore dissimilarity among sites at the genus level was 7% higher compared to the family level.

Power Analysis.

The power to detect statistically significant changes depends on the strength of the ecological signal relative to other natural variability, as well as the efficiency with which we can accurately describe ecological state, factors directly related to taxonomic resolution, and detectability (27). We simulated the PAD metacommunity based on a fitted distribution of occupancy and estimated gamma diversity to represent its baseline condition and then took subsamples that reflected the observed biases in each sampling approach. Note that the true state and behavior of the system are unknown, and underlying processes were instead inferred by the MSOMs after quantifying observation biases. Human impacts that might affect the PAD system in the future were also unknown, and this analysis therefore aimed to identify our power to observe a generalized stressor effect. To keep the process model consistent, we based simulations on the most detailed DNA Gpa model and then aggregated taxa to higher ranks to compare power among sampling approaches. A complete description of the simulation and power analysis is provided in . A natural consequence of high, near-random, background variation in composition is that degradation of a wetland site would need to be severe to raise concerns. Instead, it is more effective to measure when there is a shift away from our expectation of the PAD metacommunity aggregated across sites (i.e., changes in occupancy of many taxa). Even so, based on the high natural variability of the PAD, the survey effort needed to confidently detect shifts in occupancy in any year would be prohibitive. As a result, we considered a monitoring system to be adequate if significant differences in composition were detected within 2 y (at least 50% of the time; ). Our results demonstrated that our power increased 1) as the number of sites sampled increased (but the rate of increase declined beyond 8 to 10 sites per year); 2) with DNA metabarcoding compared to CABIN sampling, and with genus- compared to family-level data; and 3) if we sampled sites multiple times (but gains depended on the number of sites and sampling approach) (Fig. 4). Statistical power also varied by stressor type because metacommunity shifts were readily apparent if the stressor impacted prevalent taxa, whereas changes were challenging to observe if prevalent taxa were also tolerant. The relationship between taxa occupancy and their sensitivity to a stressor was therefore most influential when sample sizes, and hence our power to detect rare taxa, were low ().

Fig. 4.

Minimum reduction to community occupancy that is detectable >50% of the time with 95% confidence in response to number of sites surveyed annually. Lines show the average of 100 simulations based on the CABIN Fpa (blue), DNA Fpa (red), and DNA Gpa (green) occupancy-detection models, with either single (open symbol) or triplicate (closed symbol) samples per site. Taxon tolerance was not correlated with occupancy. See for further information.

Discussion

The PAD represents one of Canada’s national biodiversity treasures. However, multiple external pressures, including the development of oil sands, hydroelectric power, wildfires, and climate change are potentially affecting biodiversity through modification of natural physical processes in the area and threaten its World Heritage listing (17). Our study demonstrates that the PAD is an immensely rich habitat, including over 25% and 20% of all aquatic macroinvertebrate families and genera recorded by the CABIN national biomonitoring program (32). This total may still underestimate the total diversity present, and we demonstrate the importance sampling errors can have for modeling this community. Communities exhibited near-random patterns of spatial and temporal turnover, a property rarely observed in freshwater systems (33). As a result, impacts on the wetland macroinvertebrate community are difficult to establish at local scales because occurrence is weakly related to environmental factors and site composition can fluctuate rapidly over time (). Properties of the metacommunity must therefore be aggregated across sites, and directional shifts can only be inferred when dissimilarities are unlikely to be explained by stochastic differences in our null baseline model (34). Our analysis shows that detecting a decline in metacommunity condition would depend on both sample size and stressor type, and that further changes to sampling design may be required to detect change earlier or at specific locations of concern. The most significant finding of this study was the value added to biomonitoring data generated by DNA metabarcoding of bulk community samples. Our analysis supports previous studies that have shown the breadth and resolution of taxonomic information achievable with metabarcoding (e.g., refs. 14 and 35). Clear differences in occupancy and detectability profiles with metabarcoding (Fig. 3) influence our description of baseline reference conditions (36, 37). Further differences in estimates of occupancy with increasing taxonomic resolution () may also indicate differential environmental responses (38, 39). We did not find evidence to suggest the count data (“relative abundance”) in CABIN samples were necessary to detect changes in ecological structure. In fact, only the presence–absence DNA metabarcoding models identified significant relationships with the major environmental covariates of this region (16). These effects could be estimated precisely because detectability, and thereby sampling efficiency, was higher for so many taxa using DNA metabarcoding (13). We also used the occupancy model framework to compare detectability of each taxon with different primers, a more robust measure of their complementarity than lists of taxa observed. Quantifying detectability is vital to making the results of this study comparable to others with varying protocols, and this approach could be used to refine and select complementary primers (28, 31). Crucially, DNA metabarcoding, particularly at the genus level, substantially improved our power to detect ecosystem-scale changes compared to traditional CABIN sampling (Fig. 4 and ). Extending this approach to the species level could improve overall power further still, as long as a sufficient number of species have a similar probability of detection as their parent genera. A second significant outcome was the importance of explicitly considering imperfect detection. Practitioners are well aware of sampling differences (e.g., ref. 40) but have typically focused on how those errors propagated to aggregated metrics, rather than explicitly quantifying the sources of uncertainty (23). Hierarchical occupancy models accommodate irregularly sampled data, estimate community properties (that extend inference to rare taxa), and allow straightforward biological interpretations of those parameter estimates (41). There have been few examples of hierarchical models accounting for detectability in freshwater ecology (e.g., refs. 42 and 43), despite studies showing it can bias our interpretation of taxonomic, functional, and phylogenetic diversity at the community level (e.g., ref. 44). Given the high prevalence of false absences it is not surprising occupancy models are becoming commonplace for analyzing eDNA data (45), although it appears multispecies models are still rare (46). Importantly, what these and other studies have shown is that the time and expense of adding replicate samples may be the most efficient way to improve the statistical power of a study (24, 47). Inferences about the number of taxa missed in the metacommunity naturally carry some uncertainty (48), but by acknowledging imperfect detection, risks can be quantified, and decision makers’ overall efficiency can be improved. This analysis minimizes the likelihood of management agencies responding to a false signal of degradation (type 1 error) and identifies how to optimize survey design to ensure we have the necessary power to detect a desired degree of change (type 2 error) (27, 49). Although our analysis provided evidence of environmental filtering, the distribution of beta diversity was equivalent to that expected from random assembly, suggesting that the metacommunity was operating in a quasi-neutral manner at the scale of our analysis (50). Quasi-neutral dynamics are expected to be commonly observed in taxon-rich communities, but given the high degree of landscape connectivity, we would expect mass effects, rather than dispersal limitation, to underlie the low habitat specificity of the community. Our dataset was insufficient to identify which mechanisms underlie metacommunity assembly, because the same patterns of turnover may be the result of different assembly processes (51). While further studies could reduce this uncertainty, currently models of coexistence that combine stochasticity with niche theory may be the most suitable option to explain the structure and dynamics of aquatic invertebrate communities in the PAD, without relying on the fragile premise of ecological equivalence in neutral theory (50, 52). Although many ecologists have acknowledged stochastic processes are likely to have a role in understanding community composition (53, 54), we are unaware of any biomonitoring programs that incorporate, or even acknowledge, community assembly mechanisms other than environmental filtering (e.g., ref. 37). Our results firmly challenge that traditional perspective, and if we wish to understand the resilience of the PAD, we must adopt a metacommunity perspective (55). More broadly, a metacommunity perspective of the PAD could indicate which assembly processes are absent from more managed landscapes, therefore providing critical insights into the mitigation of biodiversity loss at the landscape scale. The isolation of wilderness areas like the PAD implies a pristine nature, but that isolation has also hindered our appreciation of the sheer magnitude of diversity which occurs there and has until now precluded a basic description of how community structure changes over space and time. Near-random patterns of assembly and substantial sampling error pose a challenge to detecting ecosystem change. Without evaluating data quality and statistical power at the start, many monitoring programs are unable to confidently reject a false null hypothesis, undermining project goals and providing a misleading sense of achievement (27, 56). Despite the high turnover, we show the statistical power of data generated by DNA metabarcoding was superior to traditional biomonitoring approaches for the detection of large-scale ecosystem change. Although macroinvertebrate composition provides a wealth of information, the power to detect and draw inference from taxonomic changes will be improved by further refining the list of taxa that respond to particular threats (e.g., oil sands contaminants), particularly by linking metabarcoding to trait databases (57), and this remains a major focus of our ongoing research.

Materials and Methods

Field Surveys.

Field survey methods followed the CABIN wetland macroinvertebrate protocol (14, 19). Briefly, aquatic invertebrates were sampled by sweeping submerged and emergent aquatic vegetation at wetland edges for 2 min. A sterile 400-μm-mesh net was steadily moved in a zig-zag pattern, from the surface of the sediment to the water surface, to capture disturbed organisms and minimize the amount of sediment collected. Excess vegetation was carefully rinsed and removed, and samples with excess sediment were sieved. Material was placed in sterile 1-L polyethylene sample jars, filled no more than half full, and immediately preserved in 95% ethanol in the field. Samples were stored in a cooler with ice in the field and transferred to a freezer at the field station before shipment. Nets were disinfected between each new site, and field crews wore nitrile gloves to collect and handle samples, minimizing the risk of cross-site contamination.

Sample Processing.

In total, 126 and 138 samples were collected from 72 separate site visits for the CABIN and DNA metabarcoding datasets, respectively (). Samples identified using morphological characteristics were processed and identified in accordance with the CABIN laboratory manual (19). Briefly, material from each 2-min sweep was subsampled using a 100-cell Marchant box. Successive cells were processed until at least 300 individuals were identified and a minimum of five cells were processed. Most taxa were identified to the family level, although for some groups only class- or order-level identification was recovered, and given the importance and diversity of Chironomidae, we retained four subfamily divisions that could be reliably identified () (58). The laboratory protocol for processing samples for DNA metabarcoding followed the same procedure as outlined in Gibson et al. (14). This targeted the CO1 amplicon using two complementary primers, BE/BR5 and F230R (30, 59). All DNA samples were analyzed using BE or BR5 that target the same COI region, and F230R was introduced in 2012. While field and laboratory protocols have remained consistent since the study began, there have been a number of advances made in bioinformatic tools, as well as expansion of the reference sequence libraries supporting the identification of taxa (60). The bioinformatic pipeline used to process all samples in this study, as well as the CO1 classifier that allocates sequences to the most likely taxa, is described in and available on GitHub (61). The sequences generated have been deposited in the NCBI Sequence Read Archive, project PRJNA603969.

Hierarchical MSOM.

MSOMs employ a flexible hierarchical framework that allows for imperfect detection to predict species’ occurrence (25). The hierarchy consists of an underlying state model that describes the probability of species’ occurrence and a second observation model to describe the probability of detecting that species when it is present (informed by detection across replicates). The fitted state model is thereby updated to account for the probability of false negatives. MSOMs extend this single-species approach by assuming species’ coefficients are related and can be treated as random effects, drawn from a common distribution (hyperparameters). Data augmentation extends the community approach a step further by using the hyperparameters for occupancy and detectability to estimate the possibility additional taxa may have been present but by chance were never observed. Our analysis adapted the notation and code provided by ref. 41 as the basis for this study (see for model code): Data augmentation process: State process: Observation process: Models of taxon heterogeneity: Given: The observed data y describe the detection or nondetection of taxon k at site i in replicate sample j. Replicate observations, in our case simultaneous independent samples (21), allowed the model to discriminate between processes that determine the system’s state (occupancy) and the observation process (detectability). The occupancy of each taxon at each site z is described by a Bernoulli trial with probability ψ, and the likelihood of detecting the respective taxa in each replicate sample is described by another set of Bernoulli processes with probability p. Seven water temperature and flood regime variables were tested as covariates within a multiple logistic regression for occupancy, and measures of sample processing effort were tested for detectability (sequencing depth and the number of individuals identified). Individual intercepts and slopes represented species-specific random effects, governed by a common prior distribution whose mean and variance were estimated as a community-level hyperparameter. The statistical distributions of the parameters governing occupancy and detectability shared by the community were used to consider the possibility of other taxa in the metacommunity that were not recorded in any visit to any site, a process known as data augmentation (48, 62). Given a sufficiently large total pool of M potential taxa, a set of binary indicators, w, governed by the parameter Ω, represent the probability each taxon is part of the community. The total number of taxa in the metacommunity (γ diversity) is therefore simply the sum of w. The occupancy model above was suitable for presence/absence observations of taxa, but CABIN samples also included information on the relative abundances of taxa. To utilize all information available, we constructed a community-level N-mixture model that estimates the latent abundance N of each taxon, rather than their occurrence (z), and modeled counts as a function of a Poisson distribution: State process: Observation process: Models of taxon heterogeneity: Finally, model selection for covariates of taxon heterogeneity in both the occupancy and N-mixture models was determined by a set of binary indicator variables Vx1-xn, one for each of the n predictor variables used (63). Using Vx ∼ Bernoulli(0.5) as standard priors, variables had an equal likelihood of being included or excluded from likelihood estimates, and model selection was therefore based on which combination had the highest joint posterior probability p(Vx1-xn = 1). Note convergence of the Vx indicators was very slow, particularly in the most complex models, and a “slab and spike” approach did not improve mixing (see 7.6.2 in ref. 41). Analyses were conducted using the R package jagsUI (64). We assessed model convergence of all monitored parameters across chains by visual inspection of trace plots and by using the Gelman–Rubin statistic (65), with the diagnostic value <1.1. As overdispersion cannot be estimated from the binary responses in occupancy models (41), plots of Dunn–Smyth residuals for fitted estimates of occupancy and detectability were used to evaluate the fit of separate taxa (66). Although plots suggest the models were well fit in most cases, the pattern of residuals suggested there may have been other covariates, or nonlinear effects, missing from the models influencing the occupancy of some taxa.

Simulation and Power Analysis.

The code and process used to simulate communities are described in detail in . In summary, a hypothetical presence–absence matrix of the metacommunity was derived from estimates of gamma diversity and occupancy in the DNA Gpa occupancy model from which we could manipulate sampling designs. Environmental covariates were varied according to the mean and SD of values observed from the surveys available to us (), but the simulation was not spatially explicit. While occupancy covariates drove some temporal turnover (; ∼10 tp 27%), this was insufficient to replicate the turnover observed (), so permutation of the presence–absence matrix was used to simulate further stochastic changes in composition (i.e., local extinction/colonization; ref. 67). Replicating observed turnover required the complete redistribution of occurrences (i.e., random assembly patterns). Taxon occupancy (row sums) and site richness (column sums) were held constant during permutation. The metacommunity was modified by successively removing occurrences of taxa based on a hypothetical distribution of tolerances, which were themselves generated to covary with the distribution of occupancy. Sampling error was applied by a binomial function weighted by the taxon’s probability of detection, and the “detected” composition of reference and modified metacommunities were then compared using mvabund (68). Power of DNA Gpa was compared to DNA Fpa and CABIN Fpa approaches by aggregating genera to the family level and subsequently applying the family-level detection probabilities.

Data Availability.

All datasets needed to evaluate our conclusions are publicly available as referenced within the article and described in .

21 in total

1. Environmental DNA.

Authors: Pierre Taberlet; Eric Coissac; Mehrdad Hajibabaei; Loren H Rieseberg
Journal: Mol Ecol Date: 2012-04 Impact factor: 6.185

2. Why most conservation monitoring is, but need not be, a waste of time.

Authors: Colin J Legg; Laszlo Nagy
Journal: J Environ Manage Date: 2005-08-19 Impact factor: 6.789

3. Estimating species richness and accumulation by modeling species occurrence and detectability.

Authors: Robert M Dorazio; J Andrew Royle; Bo Söderström; Anders Glimskär
Journal: Ecology Date: 2006-04 Impact factor: 5.499

4. A truce with neutral theory: local deterministic factors, species traits and dispersal limitation together determine patterns of diversity in stream invertebrates.

Authors: Ross Thompson; Colin Townsend
Journal: J Anim Ecol Date: 2006-03 Impact factor: 5.091

5. Effects of sample standardization on mean species detectabilities and estimates of relative differences in species richness among assemblages.

Authors: Yong Cao; Charles P Hawkins; David P Larsen; John Van Sickle
Journal: Am Nat Date: 2007-07-19 Impact factor: 3.926

6. Replication levels, false presences and the estimation of the presence/absence from eDNA metabarcoding data.

Authors: Gentile F Ficetola; Johan Pansu; Aurélie Bonin; Eric Coissac; Charline Giguet-Covex; Marta De Barba; Ludovic Gielly; Carla M Lopes; Frédéric Boyer; François Pompanon; Gilles Rayé; Pierre Taberlet
Journal: Mol Ecol Resour Date: 2014-11-10 Impact factor: 7.090

7. Metacommunity-scale biodiversity regulation and the self-organised emergence of macroecological patterns.

Authors: Jacob D O'Sullivan; Robert J Knell; Axel G Rossberg
Journal: Ecol Lett Date: 2019-06-27 Impact factor: 9.492

8. Ecology. Essential biodiversity variables.

Authors: H M Pereira; S Ferrier; M Walters; G N Geller; R H G Jongman; R J Scholes; M W Bruford; N Brummitt; S H M Butchart; A C Cardoso; N C Coops; E Dulloo; D P Faith; J Freyhof; R D Gregory; C Heip; R Höft; G Hurtt; W Jetz; D S Karp; M A McGeoch; D Obura; Y Onoda; N Pettorelli; B Reyers; R Sayre; J P W Scharlemann; S N Stuart; E Turak; M Walpole; M Wegmann
Journal: Science Date: 2013-01-18 Impact factor: 47.728

9. Simultaneous assessment of the macrobiome and microbiome in a bulk sample of tropical arthropods through DNA metasystematics.

Authors: Joel Gibson; Shadi Shokralla; Teresita M Porter; Ian King; Steven van Konynenburg; Daniel H Janzen; Winnie Hallwachs; Mehrdad Hajibabaei
Journal: Proc Natl Acad Sci U S A Date: 2014-05-07 Impact factor: 11.205

Review 10. A new way to contemplate Darwin's tangled bank: how DNA barcodes are reconnecting biodiversity science and biomonitoring.

Authors: Mehrdad Hajibabaei; Donald J Baird; Nicole A Fahner; Robert Beiko; G Brian Golding
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2016-09-05 Impact factor: 6.237

8 in total

1. Comparison of traditional and DNA metabarcoding samples for monitoring tropical soil arthropods (Formicidae, Collembola and Isoptera).

Authors: Yves Basset; Mehrdad Hajibabaei; Michael T G Wright; Anakena M Castillo; David A Donoso; Simon T Segar; Daniel Souto-Vilarós; Dina Y Soliman; Tomas Roslin; M Alex Smith; Greg P A Lamarre; Luis F De León; Thibaud Decaëns; José G Palacios-Vargas; Gabriela Castaño-Meneses; Rudolf H Scheffrahn; Marleny Rivera; Filonila Perez; Ricardo Bobadilla; Yacksecari Lopez; José Alejandro Ramirez Silva; Maira Montejo Cruz; Angela Arango Galván; Héctor Barrios
Journal: Sci Rep Date: 2022-06-24 Impact factor: 4.996

8. Standardized high-throughput biomonitoring using DNA metabarcoding: Strategies for the adoption of automated liquid handlers.

Authors: Dominik Buchner; Till-Hendrik Macher; Arne J Beermann; Marie-Thérése Werner; Florian Leese
Journal: Environ Sci Ecotechnol Date: 2021-08-30