Literature DB >> 35333301

Metabarcoding versus mapping unassembled shotgun reads for identification of prey consumed by arthropod epigeal predators.

Débora Pires Paula1, Suellen Karina Albertoni Barros2, Rafael Major Pitta3, Marliton Rocha Barreto2, Roberto Coiti Togawa1, David A Andow4.   

Abstract

BACKGROUND: A central challenge of DNA gut content analysis is to identify prey in a highly degraded DNA community. In this study, we evaluated prey detection using metabarcoding and a method of mapping unassembled shotgun reads (Lazaro).
RESULTS: In a mock prey community, metabarcoding did not detect any prey, probably owing to primer choice and/or preferential predator DNA amplification, while Lazaro detected prey with accuracy 43-71%. Gut content analysis of field-collected arthropod epigeal predators (3 ants, 1 dermapteran, and 1 carabid) from agricultural habitats in Brazil (27 samples, 46-273 individuals per sample) revealed that 64% of the prey species detections by either method were not confirmed by melting curve analysis and 87% of the true prey were detected in common. We hypothesized that Lazaro would detect fewer true- and false-positive and more false-negative prey with greater taxonomic resolution than metabarcoding but found that the methods were similar in sensitivity, specificity, false discovery rate, false omission rate, and accuracy. There was a positive correlation between the relative prey DNA concentration in the samples and the number of prey reads detected by Lazaro, while this was inconsistent for metabarcoding.
CONCLUSIONS: Metabarcoding and Lazaro had similar, but partially complementary, detection of prey in arthropod predator guts. However, while Lazaro was almost 2× more expensive, the number of reads was related to the amount of prey DNA, suggesting that Lazaro may provide quantitative prey information while metabarcoding did not.
© The Author(s) 2022. Published by Oxford University Press GigaScience.

Entities:  

Keywords:  diet analysis; environmental DNA; generalist predators; gut content analysis

Mesh:

Substances:

Year:  2022        PMID: 35333301      PMCID: PMC8952265          DOI: 10.1093/gigascience/giac020

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


Data Description

Background

The use of high-throughput DNA sequencing for studying species composition or diversity in environmental samples has been widely adopted, and metabarcoding has become the most commonly used method to study environmental DNA (eDNA) [1-3]. In metabarcoding, target barcode regions are enriched through PCR and sequenced for taxonomic identification (specific taxa or operational taxonomic units) through a bioinformatic workflow [4-8] by similarity of query sequences with taxonomically identified barcode sequences in a reference database [9]. The main limitations of metabarcoding are related to bias in primer amplification efficiency during the target barcode enrichment process and PCR amplification errors (e.g., point mutations, chimeras and heteroduplexes formation) [10-13]. Optimal primer pair(s) would amplify the barcode region(s) of a broad taxonomic range with equivalent efficiency across taxa, avoid formation of chimeras among closely related or abundant sequences, and provide the desired taxonomic resolution without missing any taxon [9,10,12,14]. Such a primer pair has yet to be found, so most recent metabarcoding studies have focused on amplifying barcodes of specific taxonomic groups and/or using multiple primer pairs for the same or different barcodes. In addition, for predator gut content analysis, the more common predator barcode sequences could mask amplification of closely related species. Alternative methods for species identification that have no sample DNA enrichment have been developed, and include those that assemble or do not assemble the reads prior to mapping them to a reference database. Methods with no sample DNA enrichment in which reads are assembled include mitochondrial metagenomics [15-21], metagenome skimming [22], and enrichment of a barcode sequence by hybridization capture followed by high-throughput sequencing [23]. Mitochondrial metagenomics and metagenome skimming are promising methods for general biodiversity surveys but not for predator gut content analysis because they rely on assembling genomes (organelles or nuclear genetic material) to function as a “superbarcode” to identify species. Because the DNA community in predator guts is degraded by digestion, satisfactory assembly of prey mitochondria with sufficient coverage is difficult, and, therefore, the application of mitochondrial metagenomics and metagenome skimming is compromised. Hybridization capture replaces PCR to enrich the barcode sequences in a sample and is suitable for gut content analysis. However, because it still depends on an intermediate step of enrichment of a particular barcode, it might also be subject to bias related to probe design and fidelity/efficiency of the hybridization. Assembly-free methods have been more recently proposed [24, 25], but just a few were tested for gut content analysis [26-29]. These methods basically comprise direct eDNA sequencing and mapping the unassembled reads to a reference database for taxa identification using a threshold of high similarity (>95%) with a minimum predefined overlap length for the matches. No barcode primer pair or probe is required, hypothetically minimizing bias and favoring quantitative estimates of prey content. Without amplification, however, the detection of rare eDNA is likely reduced. Suitable sequences of candidate prey (e.g., species co-occurring with the predators) may be missing from the database, and they need to be elucidated and added to the reference database, otherwise the prey cannot be detected. A major limitation is that different samples cannot be multiplexed in a library because there is no sample DNA enrichment step where individual tags are assigned to each sample. Consequently, every sample has to have its own library, increasing the total cost in library construction. On the other hand, they can use reference databases that can contain a variety of sequences from any part of a genome, not just metabarcodes, such as organellar or nuclear genome fragments [29]. Despite great advances and several options, at present there is no consensus on a “best practice” method for eDNA study of gut contents [9]. To improve the applicability of large-scale DNA-based methods for prey detection, we aimed to test the sensitivity, specificity, and accuracy of metabarcoding compared to a method of detection by mapping unassembled shotgun reads called Lazaro (so-called because it “resuscitates” species identifications from highly degraded DNA) to identify prey in the guts of several epigeal agricultural predators. We used melt curve analysis (MCA) to verify detections by the 2 methods and examined the number of true and false prey species detections, the number of true and false non-detections, the taxonomic resolution of true detections, and the relation between the number of reads for a detection and the relative prey DNA concentration.

Materials and Methods

Mock prey community

Newly emerged (48 h) unfed harlequin ladybird Harmonia axyridis (Coleoptera: Coccinellidae) adults (n = 10, sex ratio of 1:1) were individually supplied simultaneously with 7 species of prey, which were visually confirmed to be consumed within 1 hour. These were 1 adult aptera of the aphids (Hemiptera: Aphididae) Aphis glycines, Aphis gossypii, Aphis craccivora, Acyrthosiphon pisum, and Myzus persicae, and 1 egg each of the diamondback moth Plutella xylostella (Lepidoptera: Plutellidae) and Cycloneda munda (Coleoptera: Coccinellidae). Immediately before feeding (negative control) and after feeding, 5 beetles per sex were placed in 95% ethanol and stored at −80°C.

Arthropod field sampling

Epigeal arthropod predators were sampled twice a month (Brazilian authorization SISBIO 33683–1) in 2014/2015 (July through September) in Sinop-MT/Brazil for a 24-h period [30] in pitfall traps buried level with the soil in 4 replicated agricultural experimental plots: soybean/maize (Glycines max/Zea mays), palisade grass (Brachiaria brizantha), eucalyptus plantation (hybrid of Eucalyptus grandis and Eucalyptus urophylla), and an additive mixture of all 3. Pitfall traps contained 750 mL of water and 2 drops of detergent to break surface tension and preserve the captured specimens [31]. We obtained 12 samples for each of the 2 more abundant ant species (Hymenoptera: Formicidae), Pheidole flavens (n = 200 specimens/sample) and Dorymyrmex brunneus (n = 100 specimens/sample), and 1 sample of Solenopsis substituta (n = 273), 1 sample of the earwig Euborellia annulipes (Dermaptera: Anisolabididae) (n = 46), and 1 sample of the tiger beetle Tetracha sp. (Coleoptera: Carabidae) (n = 49). These species were the most abundant predator species sampled, and all the specimens were used for DNA extraction.

DNA extractions

To clean external DNA from the specimens, all the specimens from the feeding bioassay controls and from the field, before DNA extraction, were soaked individually for 40 min in 2.5% commercial bleach in 1.5 microtubes, followed by orbital rotation at 2g at 4°C for 40 min, discarding the washing solution and rinsing the specimens for 5× in ultrapure water [32]. For the ants, the gaster was separated and collected, and for the other species, guts were dissected under a microscope (30× magnification) immediately before DNA extraction using sterilized entomological dissecting tools. Sterilization was performed by soaking the dissection tools in 0.5% sodium hypochlorite for 10 min and autoclaving (121°C at 1 atm for 20 min), followed by rinsing abundantly with ultrapure water (MilliQ, Burlington, MA, USA) to minimize cross-contamination. Dissected guts or gasters from the same sample were pooled in a lysis buffer from the kit DNeasy Blood & Tissue (Qiagen, Germantown, MD, USA), placed on ice, and macerated with sterilized glass pestles separately for each sample. Cross-contamination was minimized by sanitizing surfaces and sterilizing all equipment and materials between specimen dissections, and filter tips were used to handle all liquids containing DNA. Total DNA extraction was performed using DNeasy Blood & Tissue kit (Qiagen). DNA purity and concentration were assessed by the NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). DNA quantity was normalized to 1 mg/mL across samples and split into 3 parts, 1 for Lazaro, 1 for metabarcoding, and 1 for MCA in qPCR Roche LightCycler® 480 Real-Time PCR System II (Roche Life Science, Penzberg, Bay., Germany).

Preparation of the DNA samples for metabarcoding and Lazaro analyses

For Lazaro, the pertinent aliquots obtained from the previous step were normalized to 150 ng DNA/sample. For metabarcoding, a region of the 16S mitochondrial gene was amplified using the primer pair Ins16S_1short (forward 5′-TRRGACGAGAAGACCCTATA-3′ and reverse 5′-ACGCTGTTATCCCTAAGGTA-3′), which generates an amplicon of ∼190 bp [11], following the recommendation of using primer pairs that generate amplicons <200 bp for more degraded environmental samples, such as gut contents [33]. 16S barcode was chosen over COI for several reasons. Although the amount of arthropod 16S sequences in GenBank was considerably smaller (133,899 sequences, 2,630 families, 5,829 genera, and 48,711 species obtained from GenBank using the following search: arthropod[organism] AND 16S, release date 31 December 2018) than the COI sequences (2,570,787 sequences from 1,395 families, 10,394 genera, and 26,024 species obtained from GenBank using the following search: coi[Gene Name] AND arthropoda[Organism] AND “1900”[Publication Date] : “2018/12/31”[Publication Date]), it had higher taxonomic coverage. In addition, Clarke et al. [11] demonstrated better taxonomic coverage of 16S than COI and bias of COI to amplify more lepidopterans and dipterans, while failing to amplify other insect orders (e.g., hymenopterans). Last, according to Deagle et al. [10], Elbrecht et al. [34], and Sousa et al. [35], 16S has been preferentially used because 16S has some regions of more conserved sites across taxonomic groups, spanning sufficiently variable regions among taxa, resulting in more universal primers with equivalent taxonomic resolution than COI. Primers were not tagged to eliminate bias related to the tagging process [36], so an independent library was produced for each epigeal predator DNA gut sample. PCR reactions (0.2 μM primer pair) were performed in triplicate using Qiagen Multiplex PCR Master Mix and adding 1.28 μg/μL of bovine serum albumin to prevent PCR inhibition [37]. Cycling conditions were as follows: initial heat activation 15 min at 95°C, 40 cycles of 3-step cycling (denaturation 30 s at 94°C, annealing 90 s at 60°C, extension 90 s at 72°C), and final extension for 10 min at 72°C. Triplicates were pooled and purified using QIAquick PCR Purification Kit (Qiagen). Amplicons were quantified by NanoDrop (Thermo Fisher Scientific) and normalized in equimolar ratios across all the metabarcoding samples. We opted not to multiplex the 27 metabarcoding samples to keep the same coverage between metabarcoding and Lazaro methods.

DNA sequencing

All Lazaro and enriched barcode samples were dried in a speed vacuum centrifuge. For the feeding bioassay samples, 20 samples were dried, which comprised 5 treatments (without feeding and 4 times after feeding) × 2 sexes × 2 methods (metabarcoding and Lazaro). For the field samples, 54 samples were dried, which comprised 27 for metabarcoding and 27 for Lazaro. The dried feeding bioassay and field samples were shipped simultaneously to the Roy J. Carver Biotechnology Center (University of Illinois at Urbana-Champaign, IL, USA) to construct KAPA Hyper libraries (Kapa Biosystems, Wilmington, MA, USA) with insert size 200–600 bp using unique dual indexes. Quality-checked samples were sequenced by Illumina HiSeq4000 (Illumina HiSeq 3000/HiSeq 4000 System, RRID:SCR_016386) (150 bp paired-end, 151 cycles, HiSeq 4000 sequencing kit version 1) in a single lane. The Brazilian license to access the genetic heritage was provided by CGEN/SISGEN A8E3D94. Sequence SRA access codes are presented in Supporting Information 1.

Reference DNA databases and bioinformatic analysis

For metabarcoding, the reference database was constructed by extracting invertebrate 16S barcode regions from the European Nucleotide Sequence database (EMBL) (release 132; inv: invertebrate database/division; std: standard) using the ecoPCR program [38]. The EMBL is shared daily with GenBank (from USA) and DNA Data Bank of Japan databases [39]. In addition, 16S sequences for several species that were collected in the pitfall traps were determined and added to the other arthropod 16S sequences obtained from GenBank, resulting, after in silico PCR using ecoPCR and the Ins16S_1short primer, in a 16S amplicon database composed of 63,618 sequences for 39,397 species from 2,172 families. Prey detection analysis was performed using OBITools as in [26, 40,41]. The metabarcoding threshold for taxonomic assignment was 98% identity, and reads with count <100 were removed. Only “head” and “singleton” identifications were considered. For the Lazaro reference database [28,29], we constructed a comprehensive arthropod mitochondrial DNA database by obtaining all sequences (partial or complete, Fasta format) available at the time at GenBank (n = 3,381, distributed in 2,779 species from 1,850 genera in 598 families). In addition, following the mitochondrial elucidation method described in Paula et al. [28,29] and briefly presented in Supporting Information 1, we provided mitochondrial sequences of 29 taxa (Supplementary Table S1) corresponding to the main potential prey co-occurring with the sampled epigeal predators in the experimental plots, including the predators under analysis (taxa and taxonomic determinations in Supplementary Table S2). For taxonomic prey identification, we used the Lazaro method [42], which is designed to detect and quantify species from degraded eDNA samples. Briefly, this method takes raw BLASTn (BLASTn, RRID:SCR_001598) output of hit matches, identifies the mismatches (or single-nucleotide polymorphisms) between the query and reference sequence, removes false mismatches (e.g., degenerate IUPAC nucleotide codes, e.g., R = A or G; Y = T or C; S = C or G), reanalyzes overlap length and percent identity, filters the best-hit matches with an overlap-identity threshold, eliminates singleton reads, and filters the reads mapping to coding regions of their respective reference mitogenome. The scripts are available in the GitHub repository [43]. The best overlap-identity threshold was determined using previous experimental data [42], and determined to be 100% identity in an overlap length of ≥130 bp (Supporting Information 1). Fastq files were generated and demultiplexed with the bcl2fastq (bcl2fastq, RRID:SCR_015058) v2.17.1.14 Conversion Software (Illumina). The quality assessment for each dataset was done using FastQC (FastQC, RRID:SCR_014583) (v.0.11.3) [44]. Low-quality sequences (Phred <30) and library index adaptors were trimmed by Fastqc-mcf (v.1.04.807) [45] and Cutadapt (cutadapt, RRID:SCR_011841) (v.1.9.1) [46]. Retained high-quality Fastq reads were converted to Fasta format by SeqTK (Seqtk, RRID:SCR_018927) (v1.2) [47].

MCA confirmation of the field-detected prey

For the field samples, we performed MCA in Roche LightCycler® 480 Real-Time PCR System II to confirm the presence of the prey DNA detected by metabarcoding and Lazaro. The principle is based on the estimation of the melting temperature (Tm), which is the temperature at which 50% of the 2 strands of DNA dissociate, a property dependent on nucleotide composition and product length [48,49]. By monitoring denaturation of the PCR products with SYBR Green and fluorescence levels over a temperature gradient, it is possible to construct the melting curve [50,51]. The DNA source was the original DNA extracted from the gut contents of the predators. For 25 of the 30 species potentially detected as prey, we obtained specimens with confirmed taxonomy to determine a positive control reference Tm to distinguish TP and false-positive (FP) detections. Their DNA was extracted using the DNeasy Blood & Tissue kit (Qiagen). Species-specific primer pairs were designed as in Paula and Andow [52], nearly all in regions of the mitogenome (primer sequences at Supplementary Table S3), for all prey species detected by metabarcoding and Lazaro, using the program Primer 3 at Geneious v7.1.9 [53] and checked in NCBI/Primer-BLAST [54]. The cross-reactivity of these primers with related detected species is presented in Supplementary Figs S1–S4. The qPCR reactions (13 μL) were prepared using Maxima SYBR Green/ROX qPCR Master Mix (2×) (Thermo Fisher Scientific), 1.28 μg/μL of bovine serum albumin, and 10 ng of DNA per reaction and each specific primer pair at 0.3 μM. The amplifications were performed in 384-well plates with a Roche LightCycler® 480 Real-Time PCR System II using a 2-step cycling protocol (initial denaturation at 95°C for 10 min, ramp 4.4°C/s), and 40 cycles of denaturation at 95°C for 15 s (ramp 4.4°C/s) and annealing/extension at 60°C for 60 s (ramp 2.2°C/s), and a melt curve from 60°C to 95°C continuous (ramp 1°C/s) with 6 readings/°C. qPCR for each sample was performed in ≥3 technical replicates. No-template controls were included for every primer pair. Melt curves were constructed using the raw fluorescence data and diffQ in the library MBmca in R [55]. Positive prey detection and identification were considered if ≥2 technical replicates had -dF/dT more than 0.1 above background or if 1 technical replicate had -dF/dT more than 0.2 above background at the Tm expected for the prey. When there was no positive control, a sample was considered a TP if the 3 amplicon replicates had similar melting curves with the same sharp Tm. The presence of multiple peaks suggests that the PCR amplicons were heterogeneous and/or possibly mixed with chimeras or primer dimers.

Statistical analysis

For each metabarcoding and Lazaro library, we have the number of species detected, the number of reads for each detected species, and for the field samples, independent confirmation of each detection by MCA. We used MCA to classify true positive (TP) and false positive (FP or type I error) detections and true negative (TN) and false-negative (FN or type II error) non-detections. We are considering the following categories: TP is detected prey DNA confirmed by MCA; FP is detected prey DNA not confirmed by MCA; TN is prey DNA not detected by MCA when the species was not detected by metabarcoding or Lazaro; and FN is prey DNA detected by MCA when not detected by metabarcoding or Lazaro. Prey species that were detected by MCA but were not detected by Lazaro, because they did not have a sequence in the Lazaro DNA reference database, were not considered FNs. For the mock community, we did not need MCA to determine TP, FP, and FN detections because the predator feeding history was known. We calculated the theoretical limit of detection (LOD) of MCA for all of the species with positive controls by estimating the amount of whole organism template that could be detected at a C = 40, and calculating the upper 95% confidence interval of the geometric mean of the estimates. In addition, we estimated the amplification efficiency of the MCA primers to ensure that it was high enough to amplify rare template sufficiently to detect. The limit of detection of MCA is quite low, but if there is only a small amount of prey DNA left in the gut, it is also possible that the prey sequence targeted by the species-specific primer pairs of MCA is absent, while other sequences are detected by metabarcoding and or Lazaro. If this were occurring, then for a given prey there should be a lower read count in both metabarcoding and Lazaro when the MCA is negative than when it is positive. We tested this with the prey species for which there were >3 TP and FP detections in the sample libraries. Number of reads were ln-transformed and analyzed by the Welch t-test for unequal variance using the Welch-Satterthwaite equation to calculate degrees of freedom. To compare the performance of metabarcoding and Lazaro, for each library, we also estimated [56,57]: Sensitivity (or TP rate), which is the probability that a positive is detected: Sensitivity = TP/(TP + FN); Specificity (or TN rate), which is the probability that a negative is not detected: Specificity = TN/(TN + FP); False discovery rate (FDR), which is the probability that a detection is an FP: FDR = FP/(FP + TP); False omission rate (FOR), which is the probability that a non-detection is an FN: FOR = FN/(FN + TN); Accuracy, which is the probability that detections and non-detections are correct: Accuracy = (TP + TN)/(TP + TN + FP + FN). Higher sensitivity, specificity, and accuracy and lower FDR and FOR are indicative of a better method. Using the aforementioned definitions, we tested the following hypotheses: H1: Metabarcoding detects a higher number of TP prey species than Lazaro because the reference database is larger. We tested this by comparing the number of initial detections in a library and the proportion of TP detections after confirmation by MCA using a paired t-test with the 27 samples as independent observations, predicting that metabarcoding would have more TPs and a higher proportion of true detections; H2: Metabarcoding is more prone to FP prey detections because of amplification bias and the larger reference database. We tested this by comparing the FDR and specificity, predicting that metabarcoding would have a higher FDR and a lower specificity; H3: Lazaro is more prone to generate FNs because, lacking the amplification of rare prey DNA fragments, it would be less sensitive. We tested this by comparing the FOR and sensitivity, predicting that Lazaro would have a higher FOR and lower sensitivity; H4: Lazaro enables prey detection with finer taxonomic resolution because of the larger reference targets (e.g., mitogenomes) and coverage, which would reduce ambiguity in species identifications. We tested this by comparing the taxonomic resolution of the final prey identifications. H5: Number of reads for both metabarcoding and Lazaro are positively related to the probability of a TP across all prey species and to the relative qPCR template concentration for TP detections within prey species. We tested the first part of this hypothesis using logistic regression of the ln-transform of the number of reads on the binomial variate indicating TP detections by MCA versus FPs (logit link, binomial error) with Anova in the package car (type II Wald χ2) and glm in Base R. There were 109 observations for metabarcoding and 116 observations for Lazaro, and no significant overdispersion for either regression. We tested the second part of this hypothesis, i.e., the number of reads is related to the amount of prey DNA in a sample, using the estimated relative template concentration from the qPCR for prey species with ≥5 TP detections and variation in both variables of ≥0.5 order of magnitude. This was tested within prey species because amplification efficiency, baselines, and thresholds would be constant for the qPCR. There were 3 and 2 species tested for metabarcoding and Lazaro, respectively. We calculated the relative initial template concentration from the qPCR amplification curves using LinRegPCR (version 2017.1) with the estimated mean PCR efficiency for each primer pair [58]. Relative initial template concentrations were log10 transformed, number of reads was ln-transformed, and data were analyzed with Pearson correlation coefficients using the Fisher transformation to estimate P-values.

Results and Discussion

Prey detections from the mock community

After quality control, the 10 metabarcoding samples (predator guts or libraries) had a mean of 2,728,294 reads (8% coefficient of variation [CV]) and the 10 Lazaro samples had a mean of 5,355,167 reads (10% CV). None of the samples of the unfed control predators had any prey detected for either males or females for either metabarcoding or Lazaro. This indicates that extraneous DNA was unlikely to have contaminated the samples during and after the extraction process. Only Lazaro detected prey species in the mock community (Table 1; Accuracyfemale = 0.71, Accuracymale = 0.43). For some prey species, only 1 sex of the predator detected prey and with few reads (n = 2). Although ecoPCR [38] theoretically returned amplicons for the target prey species (maximum number of mismatches allowed per primer: −e = 2 and using the # feature to ensure perfect matches in the last 2 nucleotides at the 3′-end), and the metabarcoding reference database contained their 16S sequences, none of the prey were detected (Accuracyfemale = Accuracymale = 0), possibly owing to the mismatches in at least 1 primer of the pair (Supplementary Fig. S5) and preferential amplification of the more abundant predator DNA in the samples. Only the predator was detected by metabarcoding. This illustrates that the insufficient primer universality among taxa can preclude the detection (reduced sensitivity) of expected prey species and that the use of multiple barcodes may be preferable.
Table 1

: Number of reads detected for the mock community by metabarcoding (16S barcode) and Lazaro using the harlequin Harmonia axyridis (unfed for 48 h after adult emergence) as predator and 7 prey species, consumed within 1 hour. Gut contents of the predators were analyzed within 6 hours after feeding on all 7 prey species. The threshold used for Lazaro was 100% identity in a minimum overlap of 130 bp and for metabarcoding was 98% identity for an amplicon between 180-230 bp. Mb: metabarcoding; L: Lazaro. The number of reads detected for the control predators: female 114,033 by metabarcoding and 18,984 by Lazaro; male 130,134 by metabarcoding and 35,340 by Lazaro

Predator Prey
Predator sex Time (h) after feeding Harmonia axyridis Acrythosiphon pisum Aphis craccivora Aphis glycines Aphis gossypii Myzus persicae Cycloneda munda Plutella xylostella
Mb L Mb L Mb L Mb L Mb L Mb L Mb L Mb L
Female6128,773±11,71540,498±9,5570160260202000600
Male6451,571±371,43443,452±9,4440100400000100000
: Number of reads detected for the mock community by metabarcoding (16S barcode) and Lazaro using the harlequin Harmonia axyridis (unfed for 48 h after adult emergence) as predator and 7 prey species, consumed within 1 hour. Gut contents of the predators were analyzed within 6 hours after feeding on all 7 prey species. The threshold used for Lazaro was 100% identity in a minimum overlap of 130 bp and for metabarcoding was 98% identity for an amplicon between 180-230 bp. Mb: metabarcoding; L: Lazaro. The number of reads detected for the control predators: female 114,033 by metabarcoding and 18,984 by Lazaro; male 130,134 by metabarcoding and 35,340 by Lazaro Neither metabarcoding nor Lazaro made any FP detections, despite the use of comprehensive reference databases, which increases the likelihood of FP detections. In summary, both metabarcoding and Lazaro generated FNs, with more FN and fewer TP detections by metabarcoding than Lazaro. These results suggest that neither method on its own will detect all of the prey in a predator gut sample, but that Lazaro may sometimes be more accurate.

Prey detections from field sampled predators

After quality control, the 27 metabarcoding samples (predator guts or libraries) had a mean of 2,964,430 reads (12% CV) and the 27 Lazaro samples had a mean of 5,504,578 reads (14% CV). The list of the prey species detected for each method before confirmation by MCA is in Table 2. The presence of the DNA of each prey species in a predator gut sample was confirmed by MCA. Examples of the various positive and negative prey confirmations by MCA are illustrated in Fig. 1. The differentiation of a true and false prey detection was performed by observing the presence or absence of the sample Tm peak corresponding to the Tm of the positive control. For example, Solenopsis invicta TP control had a Tm of 76.5°C, and FP detections had Tm at 77.9, 79.2, 82.6, and 84.0°C. The theoretical LODs for detection by qPCR amplification were <1 pg of whole-organism DNA/technical replicate for 24 of the 30 detected prey species, and for 21 of these species it was <0.1 pg of whole-organism DNA/technical replicate (Supplementary Table S4). As the DNA templates for MCA are only a small proportion of the whole-organism DNA, the LODs indicate that MCA was very sensitive and unlikely to return an FN for the majority of prey species examined. The amplification efficiency varied from 1.893 to 1.997 for all of the species-specific primer pairs, which should result in sufficient amplicons for detection by MCA even when the template is rare. Nevertheless, 3 of the 30 detected prey species had higher LODs, which might have resulted in some FNs: Selenophorus alternans (LOD = 1.652 pg/technical replicate), Euschistus heros (LOD = 3.245 pg/technical replicate), and Cardiocondyla obscurior (LOD = 22.59 pg/technical replicate). Although unlikely, MCA could also give an FN when the prey DNA was present but so scarce that there was no MCA template in the sample. In this case, we reasoned that if MCA returned FN detections then the number of reads associated with MCA negatives should be smaller than the number of reads for MCA positives within a prey species. However, only 1 species (Pheidole tristis detected by Lazaro) had fewer reads associated with negative than with positive MCA detections (Supplementary Fig. S6), indicating that FN MCA detections were generally not a problem.
Table 2

: Species detected as prey of epigeal arthropod predators by metabarcoding, Lazaro or both in at least 1 of 27 libraries before confirmation by Melting Curve Analysis (MCA) in qPCR

OrderSpecies (Family)Detection method(s)No. readsNo. librariesPredator
Annelida Phascolosoma esculenta (Phascolosomatidae)Metabarcoding21,93813 Dorymyrmex brunneus, Tetracha sp.
Coleoptera Anthonomus grandis (Curculionidae)1Both946,28327 Pheidole flavens, Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp., Euborellia annulipes
Eriopis connexa (Coccinellidae)Both2,7065 Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp., Euborellia annulipes
Harmonia axyridis (Coccinellidae)1Both12,800,16027 Pheidole flavens, Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp., Euborellia annulipes
Selenophorus alternans (Carabidae)Both1,639,7469 Pheidole flavens, Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp.
Tetracha brasiliensis (Carabidae)Lazaro21 Solenopsis substituta
Dermaptera Doru luteipes (Forficulidae)1Both1,8087 Pheidole flavens, Dorymyrmex brunneus, Euborellia annulipes
Euborellia annulipes (Anisolabididae)1Both41,49212 Pheidole flavens, Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp.
Diptera Strongygaster triangulifera (Tachinidae)Both622 Tetracha sp., Euborellia annulipes
Hemiptera Chinavia impicticornes (Pentatomidae)1Both24,62110 Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp., Euborellia annulipes
Euschistus heros (Pentatomidae)1Both267,2095 Dorymyrmex brunneus, Tetracha sp.
Mahanarva spectabilis (Cercopidae)1Lazaro122 Solenopsis substituta, Tetracha sp.
Neomegalotomus parvus Metabarcoding3651 Tetracha sp.
Myzus persicae (Aphididae)Metabarcoding1,6962 Dorymyrmex brunneus
Planicephalus flavicosta (Cicadellidae)Metabarcoding3671 Dorymyrmex brunneus
Hymenoptera Atta sextans (Formicidae)Both2,2916 Dorymyrmex brunneus
Brachymyrmex patagonicus (Formicidae)Lazaro2411 Pheidole flavens, Dorymyrmex brunneus, Tetracha sp., Euborellia annulipes
Cardiocondyla obscurior (Formicidae)1Both5,5763 Pheidole flavens, Solenopsis substituta
Dorymyrmex brunneus (Formicidae)1Both3,82915 Pheidole flavens, Solenopsis substituta, Tetracha sp., Euborellia annulipes
Pheidole flavens (Formicidae)1Both244,82015 Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp., Euborellia annulipes
Pheidole obscurithorax (Formicidae)1Both364 Pheidole flavens, Dorymyrmex brunneus
Pheidole oxyops (Formicidae)1Both9,015,22027 Pheidole flavens, Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp., Euborellia annulipes
Pheidole tristis (Formicidae)1Both2,795,07827 Pheidole flavens, Dorymyrmex brunneus, Solenopsis substituta, Tetracha sp., Euborellia annulipes
Solenopsis richteri (Formicidae)Metabarcoding75,5561 Solenopsis substituta
Solenopsis substituta (Formicidae)Both6089 Pheidole flavens, Dorymyrmex brunneus
Isoptera Syntermes spinosus (Termitidae)1Metabarcoding8971 Dorymyrmex brunneus
Lepidoptera Chrysodeixis includens (Noctuidae)1Both451 Tetracha sp.
Glena unipennaria (Geometridae)Both2801 Tetracha sp.
Spodoptera frugiperda (Noctuidae)1Metabarcoding8,44811 Dorymyrmex brunneus, Tetracha sp.
Orthoptera Gryllus argentinus (Gryllidae)Metabarcoding321 Dorymyrmex brunneus

Species with detection confirmed by MCA in ≥1 library. All these species are likely to occur in the sampling area/period.

Figure 1

: Confirmation of prey detection by Melting Curve Analysis (MCA) in qPCR with positive controls and NTCs (no template controls). The graphs represent a melting curve for some prey detected by metabarcoding, Lazaro or both. Predator samples are informed in the side legend. Green, yellow, and gray curves indicate positive identification of the respective prey, and red, magenta, and blue curves indicate negative identifications. Sample numbers designate library and technical replicate.

: Confirmation of prey detection by Melting Curve Analysis (MCA) in qPCR with positive controls and NTCs (no template controls). The graphs represent a melting curve for some prey detected by metabarcoding, Lazaro or both. Predator samples are informed in the side legend. Green, yellow, and gray curves indicate positive identification of the respective prey, and red, magenta, and blue curves indicate negative identifications. Sample numbers designate library and technical replicate. : Species detected as prey of epigeal arthropod predators by metabarcoding, Lazaro or both in at least 1 of 27 libraries before confirmation by Melting Curve Analysis (MCA) in qPCR Species with detection confirmed by MCA in ≥1 library. All these species are likely to occur in the sampling area/period. Initially, 30 prey were identified, all to species level, by both methods prior to confirmation by MCA (Table 3 and Supplementary Table S5). They were 6 species of Heteroptera, 10 Hymenoptera (Formicidae), 5 Coleoptera, 3 Lepidoptera, 2 Dermaptera, and 1 species of Diptera, Orthoptera, Isoptera, and Annelida. Metabarcoding and Lazaro initially detected a similar number of prey species (26 [87%] and 21 [81%], respectively, with 16 species in common), but metabarcoding resulted in more species detections per sample than Lazaro (7.85 ± 0.63 vs 6.78 ± 0.42, respectively, P = 0.0479; Table 3). There were 212 prey detections (i.e., some prey species were detected in more than 1 sample) in the metabarcoding samples, with ln-number of reads averaging 7.66 (range 0–14.44), and 183 prey detections in the Lazaro samples, with ln-number of reads averaging 2.90 (range 0.69–8.39). Of 30 prey species initially detected, 16 (53%) were confirmed by MCA as prey of the 5 epigeal predators (14 species by metabarcoding and 13 by Lazaro, with 11 species in common, i.e., 69% of confirmed prey species; Supplementary Table S5). Ten of the 14 species not confirmed by MCA were FP detections and 3 species (Atta sextans, Neomegalotomus parvus, and Strongygaster triangulifera) were not tested (Supplementary Table S5). Of the 11 FP species, 6 were detected by Lazaro in 16 prey detection instances (8.7% of initial detections), and 7 were detected by metabarcoding in 21 prey detection instances (9.9% of initial detections). Most of these FPs were not amplified by MCA or the replicates did not have a consistent Tm. In a few cases the replicates had a consistent Tm but at the wrong temperature. For example, all of the MCA replicates for the false detections of M. persicae gave a consistent signal with a sharp Tm peak, but the peak was >2°C lower than the Tm of the TP control (Fig. 1). This kind of FP might have resulted from taxonomic overclassification (i.e., erroneous detection of related species when the true species is absent in the reference database) [59]. These would give FP species identifications because they were identified beyond the resolution limitation of a reference database. Indeed, the metabarcode amplicon for the false detections of M. persicae also had high similarity (≥98%) with several other species of Macrosiphonini, the aphidid tribe of M. persicae (BLASTn search, Supporting Information 1) . Hence, some FPs might be a TP prey with a false species determination.
Table 3

: Number of prey species identified by metabarcoding and Lazaro before confirmation, and proportion verified by Melting Curve Analysis (MCA) in qPCR

Predator Species OriginalProportion verified by MCA
MbLBothMbLBoth
Pheidole flavens 6650.400.330.40
Pheidole flavens 5540.500.500.67
Pheidole flavens 6650.200.250.25
Pheidole flavens 6550.400.400.40
Pheidole flavens 5640.330.330.33
Pheidole flavens 5650.600.500.60
Pheidole flavens 4640.330.250.33
Pheidole flavens 4540.000.000.00
Pheidole flavens 5740.670.500.67
Pheidole flavens 4740.500.330.50
Pheidole flavens 4640.330.200.33
Pheidole flavens 5630.400.400.67
Dolymyrmex brunneus 9660.290.250.33
Dolymyrmex brunneus 10750.200.200.33
Dolymyrmex brunneus 8440.200.330.33
Dolymyrmex brunneus 6950.250.330.33
Dolymyrmex brunneus 10550.250.250.25
Dolymyrmex brunneus 11650.190.330.33
Dolymyrmex brunneus 12860.330.500.67
Dolymyrmex brunneus 10850.000.000.00
Dolymyrmex brunneus 10650.430.600.75
Dolymyrmex brunneus 8850.000.000.00
Dolymyrmex brunneus 8650.400.500.67
Dolymyrmex brunneus 13650.130.330.33
Solenopsis substituta 121180.430.500.50
Tetracha sp.1615100.450.550.50
Euborellia annulipes 10760.330.330.33
Mean 7.856.785.040.320.330.40
Standard Error 0.630.420.260.030.030.04

L: Lazaro; Mb: metabarcoding.

: Number of prey species identified by metabarcoding and Lazaro before confirmation, and proportion verified by Melting Curve Analysis (MCA) in qPCR L: Lazaro; Mb: metabarcoding. Another possible reason for FP detections is contamination of the samples after DNA extraction. This could occur during any of the post-extraction procedures, including library preparation and sequencing. If such contamination had occurred, either or both metabarcoding and Lazaro might detect the contaminant with substantial number of reads, but MCA would not because the contaminant would not be in the original DNA extracted sample. For example, many of the FP detections of the coleopterans Anthonomus grandis (19 samples, 47,626 metabarcoding reads, 142 Lazaro reads; Supplementary Table S5) and H. axyridis (8,919,830 metabarcoding reads, 18 samples, 1,190 Lazaro reads; Supplementary Table S5) had these characteristics and may be post-extraction contaminants. The presence of A. grandis at the experimental site was unlikely because it is a specific cotton herbivore and none of the experimental plots had cotton; however, this species is mass-reared in a nearby laboratory, which may have been the source of contamination. FP detections can also occur when the prey species is closely related to the predator. An example was the false detection by metabarcoding of Solenopsis richteri in the gut content of the predator Solenopsis substituta (Supplementary Table S5). Regarding FN prey detection, there were 11 FNs for metabarcoding and 11 FNs for Lazaro (Table 4). For example, Mahanarva spectabilis was not detected as prey by metabarcoding in 3 samples, and Spodoptera frugiperda was not detected as prey by Lazaro in 2 samples (Supplementary Table S5), but they were detected by MCA. FNs could have been generated for rare prey in the samples with a large number of pooled individual predators (Table 5). For example, for the Lazaro samples, coverage ranged from 20,000 to 120,000 reads/predator, and the number of detected prey reads was only 1.3–29.2/predator. Thus, rare prey may be missed (FN) in the Lazaro samples. The metabarcoding samples had coverage ranging from 11,000 to 64,000 amplicons/predator, and the number of detected prey amplicons ranged from 114 to 40,000/predator. Thus, rare prey may have been missed because of the large number of individuals in a sample. However, because the 3 ant species are known to recruit large number of individuals to harvest prey, rare prey are likely to occur in multiple individuals and may be unlikely to be missed. Moreover, if FNs were primarily related to missing rare prey, then the FOR should be negatively correlated with coverage or prey detection. This was not observed (Table 5 and Supplementary Table S6); hence, while some rare prey may have been missed, they were equally likely to have been missed by both metabarcoding and Lazaro. Finally, it is also possible that FP and FN detections can occur because the taxonomy of the reference genetic material at GenBank was incorrect.
Table 4

: False negative (FN) and false positive (FP) species detected by metabarcoding, Lazaro, or both

Metabarcoding Lazaro Both
False negatives Pheidole obscurithorax Syntermes spinosus 1 Cardiocondyla obscurior
Chrysodeixis includens
Mahanarva spectabilis
Solenopsis richteri
Spodoptera frugiperda
False positives Euschistus heros Brachymyrmex patagonicus Anthonomus grandis
Gryllus argentinus 1 Pheidole obscurithorax Cardiocondyla obscurior
Myzus persicae Solenopsis substituta Doru luteipes
Phascolosoma esculenta 1 Tetracha sp. Dorymyrmex brunneus
Planicephalus flavicosta 1 Eriopis connexa
Solenopsis richteri Glena unipennaria
Spodoptera frugiperda Harmonia axyridis
Pheidole oxyops
Pheidole tristis
Selenophorus alternans

Species that did not have a mitogenome deposited at the GenBank.

Table 5

: Number of reads detected per predator in the total sample and for detected prey, and false omission rate for each predator species

Total sampleDetected preyFalse omission rate
Predator speciesNo. of samplesPredators/sampleMetabarcoding reads/predatorLazaro reads/predatorMetabarcoding reads/predatorLazaro reads/predatorMetabarcodingLazaro
Dorymyrmex brunneus 1210029,64455,04611,7974.20.0580.047
Euborellia annulipes 14664,444119,6651149.10.2500.250
Pheidole flavens 1220014,82227,5234,86710.20.0440.058
Solenopsis substituta 127310,85920,1633171.30.2860.167
Tetracha sp.14960,499112,33839,60029.20.2000.167
: False negative (FN) and false positive (FP) species detected by metabarcoding, Lazaro, or both Species that did not have a mitogenome deposited at the GenBank. : Number of reads detected per predator in the total sample and for detected prey, and false omission rate for each predator species Species identifications have been demonstrated to differ when using different DNA extraction protocols, DNA polymerases, amplification parameters, reference databases or barcodes, and even when using different primers from the same barcode [9,13, 60–63]. The results from our mock community also showed that different DNA-based detection methods differed in the species identified. So, the incongruent prey species detection between metabarcoding and Lazaro may not be unusual and additional possible reasons are discussed below. Nonetheless, the underlying consequence is that the ecological inferences are likely to be affected by the prey detection method used as the predator food webs would have different structures (Fig. 2). Specifically, the food webs of 3 of the predators (Ph. flavens, T. brasiliensisandD. brunneus) would be different using only metabarcoding or Lazaro. These results highlight the need for precaution when comparing the data between eDNA studies to enable robust ecological comparisons [64,65]. Indeed, because the 2 methods appear to be partially complementary, using both may provide more robust results.
Figure 2

: Qualitative food web of the 5 epigeal predators (top of figure) detected by metabarcoding only (blue links), Lazaro only (red links), or both (black links) and confirmed by melting curve analysis (MCA) in qPCR. Predation is indicated by the arrow direction. Height of a species indicates its relative trophic level . Predator species are E.ann = Euborellia annulipes; S.sub = Solenopsis substituta; P.fla = Pheidole flavens; T.bra = Tetracha brasilensis; D.bru = Dorymyrmex brunneus. Extra- or intraguild prey are Do.lut = Doru luteipes; C.obs = Cardiocondyla obscurior; P.oxy = Pheidole oxyops; P.tri = Pheidole tristis; P.obs = Pheidole obscurithorax; C.imp = Chinavia impicticornes; Eu.her = Euschistus heros; M.spe = Mahanarva spectabilis; Ch.inc = Chrysodeixis includens; Sp.fru = Spodoptera frugiperda; H.axy = Harmonia axyridis; A.gra = Anthonomus grandis; Sy.spi = Syntermes spinosus.

: Qualitative food web of the 5 epigeal predators (top of figure) detected by metabarcoding only (blue links), Lazaro only (red links), or both (black links) and confirmed by melting curve analysis (MCA) in qPCR. Predation is indicated by the arrow direction. Height of a species indicates its relative trophic level . Predator species are E.ann = Euborellia annulipes; S.sub = Solenopsis substituta; P.fla = Pheidole flavens; T.bra = Tetracha brasilensis; D.bru = Dorymyrmex brunneus. Extra- or intraguild prey are Do.lut = Doru luteipes; C.obs = Cardiocondyla obscurior; P.oxy = Pheidole oxyops; P.tri = Pheidole tristis; P.obs = Pheidole obscurithorax; C.imp = Chinavia impicticornes; Eu.her = Euschistus heros; M.spe = Mahanarva spectabilis; Ch.inc = Chrysodeixis includens; Sp.fru = Spodoptera frugiperda; H.axy = Harmonia axyridis; A.gra = Anthonomus grandis; Sy.spi = Syntermes spinosus. Operational taxonomic unit analysis could be conducted on the reads that did not match satisfactorily with any species in the DNA reference databases to complement the prey diversity analysis. The number of “unassigned” reads was fairly high in both methods but, not surprisingly, more prominent in Lazaro. The percent of reads of confirmed prey detections across all 27 predator samples was 45% for metabarcoding and <1% for Lazaro. The majority of the unassigned reads were related to the predator DNA (e.g., nuclear DNA), even though we reduced the amount of predator biomass by gut dissection or gaster removal (for ants). Another part of the unassigned reads could be related to predator symbionts or parasites or other exogenous species that were not present in the reference databases used in this work. Although high, the 99% of unassigned reads for Lazaro is not unexpected for 2 reasons: we only worked with arthropod mitochondrial reads and the predator reads were not counted as assigned. It is known that the proportion of mitochondrial reads obtained in next-generation sequencing of whole macerated organisms or tissues without in vitro mitochondrial enrichment is only ∼1% [19,29]. The predator mitochondrial reads should not be considered “unassigned” reads in the strict sense; nevertheless they were not included in the assigned reads because there is no means to differentiate the predator reads from reads of a cannibalized conspecific. In a similar way, the prevalence of 55% of unassigned reads for the metabarcoding data was not unexpected because no specific predator-blocking primers were used to preclude or minimize amplification of the predator template. Similar to Lazaro, most of the unassigned reads were from predator 16S amplicons. Our choice to not use specific predator-blocking primers was based on Piñol et al. [66], who demonstrated that predator-blocking primers may co-block the amplification of prey species closely related to the predator, which is critical when analyzing diet composition of arthropod generalist predators.

Metabarcoding versus Lazaro

To compare which method, metabarcoding or Lazaro, resulted in better prey determination, we evaluated the sensitivity, specificity, FDR, FOR, and accuracy of prey determination for the field-sampled predators and accuracy of prey determination for the mock community. In addition, for the field-sampled predators we determined the relation between the number of reads for a prey and the amount of prey DNA in the samples (feeding bioassay controls were not analyzed this way because metabarcoding did not detect prey reads). Specifically, we tested the 5 hypotheses discussed below. H. Contrary to this hypothesis, in the field-sampled predators, metabarcoding and Lazaro had a similar number of confirmed prey per sample (1.81 ± 0.24 and 1.85 ± 0.23, respectively, P = 0.6632) and of the initial prey tested by MCA, the same FDR (0.68 ± 0.033 vs 0.67 ± 0.031, respectively; P = 0.3929, Table 6). The rejection of H1 was corroborated by the results from the mock community for which no TPs were detected by metabarcoding.
Table 6

: Sensitivity, specificity, false discovery rate, false omission rate, and accuracy for prey determinations in field-collected predators by metabarcoding and Lazaro, with paired t-test and P-value

 ParameterSensitivitySpecificityFalse discovery rateFalse omission rateAccuracy
Metabarcoding (95% CI)0.806 (0.059)0.577 (0.026)0.683 (0.033)0.073 (0.019)0.622 (0.024)
Lazaro (95% CI)0.814 (0.059)0.618 (0.022)0.666 (0.031)0.068 (0.019)0.663 (0.020)
t 26 0.27261.1514−0.8689−0.45961.3881
P-value0.78740.26010.39290.64960.1769
: Sensitivity, specificity, false discovery rate, false omission rate, and accuracy for prey determinations in field-collected predators by metabarcoding and Lazaro, with paired t-test and P-value TP detections might be increased for metabarcoding by sequencing PCR replicates separately for each barcode per sample [67], using a multi-level assignment approach [68] or pre-testing the metabarcode primer to ensure amplification of expected prey and reduced amplification of the predator [69]. This last approach may hamper the detection of prey species closely related to the predator. While there are a number of publications showing that the use of multiple metabarcode primers increases the diversity of prey detection (e.g., [40]), they typically do not evaluate whether false prey detection also increases, as might be expected. In addition, there is no guarantee that the use of multiple primers will always improve metabarcoding performance. To illustrate this point, we show the putative mismatches of the 16S barcode primer pair that we used and 3 commonly used COI barcode primer pairs (LCO1490 and HCO2198 [70]; mICOIintF and jgHCO2198 [6]; UniMiniBar [60]) with the prey species used in the mock community and detected in the field samples (Supplementary Fig. S5). LCO1490/HCO2198 and UniMiniBar had a higher number of template mismatches than the 16S primers and probably would have detected fewer prey species than the 16S primer pair. The forward primer of the other COI primer pair had mismatches in the critical last 3 bases of the 3′-end with all 7 prey species in the mock community and in 8 of 29 species detected in the field samples (28%). TP detections might be increased for Lazaro by using multiple reference databases. In this work, we used only a mitogenome reference database, but it is also possible to use rDNA and symbionts [28, 29] and even unassembled reads [24] in a reference database. These databases can be constructed and used at any time after sequencing, unlike metabarcoding. H. Both the FDR and the specificity (0.58 ± 0.026 versus 0.62 ± 0.022, respectively; P = 0.2601, Table 6) were similar for metabarcoding and Lazaro, so H2 was rejected. While it was true that metabarcoding detected on average 350 times more true prey reads than Lazaro (Supplementary Table S5), this did not convert into a higher detection of TP or a significantly higher rate of FP prey detections compared to Lazaro. H. For the field-sampled predators, FOR for Lazaro was similar to that for metabarcoding (0.068 ± 0.019 versus 0.073 ± 0.019, respectively; P = 0.649; Table 6). Similarly, Lazaro did not have lower sensitivity than metabarcoding (0.814 ± 0.059 versus 0.806 ± 0.059, respectively; P = 0.7874; Table 6), so H3 was rejected. The rejection of H3 was corroborated by the results from the mock community. A factor that may have contributed to FN detections in some predators is that prey DNA is in an advanced state of degradation, precluding PCR amplification (in the case of metabarcoding) or being excluded by size selection during library construction, but still possible to be detected by MCA in the original sample DNA because of the smaller length of the target amplicon (100–200 bp, Supplementary Table S3). In the case of metabarcoding, it could also be related to insufficient complementarity between template and metabarcoding primers, precluding the representation of a prey species or taxonomic group in the sample, or preferential amplification of the more common predator DNA, resulting in poor amplification of prey DNA. Realistically, it is likely that the number of FNs might be even higher, because we could not check all species co-occurring in the sample area. With H1, H2, and H3 rejected, it follows that metabarcoding and Lazaro had similar accuracy in prey detection (0.62 ± 0.024 versus 0.66 ± 0.020, respectively; P = 0.1769; Table 6). This differs from the results from the mock community and in Srivathsan et al. [26]. The accuracy in the mock community was 0 for metabarcoding versus 0.43–0.71 for Lazaro (log-linear model g2 = 9.36, P = 0.0022). Srivathsan et al. [26] compared metabarcoding and metagenomics (using BLASTn) to identify diet composition by fecal analysis (host plant chloroplasts) of 2 red-shanked doucs langurs (Pygathrix nemaeus) fed with a known diet. While metabarcoding detected 34% of the diet composition, metagenomics detected 50% of the known diet plus an unexpected species that was later confirmed to be in the diet. H. For the field-sampled predators, all confirmed species identifications were at the species level for both methods. Thus, in our field samples, taxonomic resolution was the same, and H4 was rejected. H. Logistic regression showed that the probability of the TP was not related to the number of reads for metabarcoding (regression coefficient = 0.11±0.07, χ2 = 2.52, P = 0.1122) but was highly positively related to the number of reads for Lazaro (regression coefficient = 0.49±0.12, χ2 = 15.87, P = 6.798E−5). The tests determined whether the number of prey reads was correlated with the amount of prey in the gut of the predator, as measured by qPCR (Table 7). For Lazaro, H5 was accepted because there was a positive correlation between the number of reads and the relative qPCR template concentration in the samples for both species that could be analyzed. However, for metabarcoding, H5 was rejected for 3 of the 4 species tested because there was no correlation between the number of reads and the relative template concentration for these species. The quantitative interpretation of number of reads from the metabarcoding results has been controversial [71-76], and our results provide some support for the argument that the number of metabarcoding reads is an unreliable predictor of the DNA quantity in a sample but the number of Lazaro reads might be a good predictor.
Table 7

: Pearson correlations between relative initial qPCR template concentration and ln number of reads for TP detections

r z-score P
Metabarcoding
Harmonia axyridis −0.306−0.8930.3718
Pheidole tristis 0.8651.8870.0592
Pheidole flavens −0.270−0.9200.3577
Spodoptera frugiperda 0.3050.4190.6752
Lazaro
Harmonia axyridis 0.6051.9850.0472
Pheidole tristis 0.9622.0270.0427
: Pearson correlations between relative initial qPCR template concentration and ln number of reads for TP detections In terms of cost, metabarcoding has the potential to cost roughly half that of Lazaro. In this study, we chose not to multiplex the 27 samples for metabarcoding analysis to keep the sequencing depth per sample similar between methods. The costs that were the same for both methods were sample preparation (USD10), total DNA extractions (USD100), library construction (USD85.50/each), and HiSeq4000 sequencing lane (USD4,310). For metabarcoding, there were additional costs for primer synthesis, PCR reactions, purification and quantifications for each sample, which were estimated to be USD160 total. If we had multiplexed the 27 purified sample amplicons in 1 library, the total cost of metabarcoding would have been USD2,510.50. For the Lazaro method, samples cannot be multiplexed and the total cost was USD4,573.50.

Conclusions

Metabarcoding and Lazaro identified a range of prey species that were preyed upon by arthropod epigeal predators, but they were partially complementary methods sharing 87% of TP detections. Both methods crucially depended on the comprehensiveness of their respective DNA reference databases, which for metabarcoding was undeniably larger. Even so, Lazaro determined prey with similar specificity, sensitivity, FDR, FOR, accuracy, and taxonomic resolution as metabarcoding. The use of multiple barcode primers in the metabarcoding analysis could render higher sensitivity, although it could also increase FP detections (reduce specificity) as each primer carries its own associated bias. One may prefer Lazaro because it preserves qualitatively and quantitatively the original sample DNA community, enabling further search for other targets (e.g., host plants, symbionts, parasites) using any other DNA reference database(s), while for metabarcoding prey detection is constrained by the initial chosen barcodes. In addition, for Lazaro, the number of reads is associated with the quantity of prey DNA in the gut of the predators, enabling broader ecological inferences. However, one may prefer metabarcoding as it remains less expensive than Lazaro for processing a large set of samples because they can be multiplexed in a library.

Data Availability

Library sequencing datasets were deposited at GenBank, and their SRA access codes are in Supporting Information 1. Supporting data and materials are available in the GigaDB database [43].

Additional Files

Supporting Information 1. Contains mtDNA elucidation, SRA access codes, Lazaro bioinformatic workflow, MCA tests of cross-reactivity of primers, BLASTn alignments of species related to Myzus persicae, Figures S1-S6. Supporting Information 2.Contains Tables S1-S6. Supplementary Figure S1. Cross-reactivity of primers designed to detect two species of Carabidae against known positive control samples. Supplementary Figure S2. Cross-reactivity of primers designed to detect Chinavia spp. against known positive control samples. Supplementary Figure S3. Cross-reactivity of primers designed to detect lepidopteran species against known positive control samples. Supplementary Figure S4. Cross-reactivity of primers designed to detect ant species against known positive control samples. Supplementary Figure S5. Annealing of the some metabarcoding primers in the prey species used in the mock community and detected by metabarcoding, before MCA validation. Supplementary Figure S6. Ratio of true to false positive reads for prey species detected by metabarcoding and Lazaro. Supplementary Table S1. Mitochondrial DNA elucidated and annotated to add to the Lazaro reference database for prey identification. Supplementary Table S2. Source of true positive prey samples for the Melting Curve Analysis (MCA) in qPCR. Supplementary Table S3. Species specific primers (5’>3’) used for MCA-qPCR validation of the prey detected by metabarcoding or Lazaro. Supplementary Table S4. Theoretical limit of detection for MCA primers as picograms of whole organism DNA/technical replicate. Supplementary Table S5. Original read number and result of MCA-qPCR confirmation on the detection of prey in 27 libraries of epigeal predators. Supplementary Table S6. Pearson correlations, Fisher z-score transformations, and P-values for the relation between predator false ommision rates and reads (or amplicons per predator; or verified prey for the field collected predators). Click here for additional data file. Click here for additional data file. Click here for additional data file. Tuomas Kankaanpää, Ph.D. -- 10/22/2021 Reviewed Click here for additional data file. Jordan Patrick Cuff, BSc MRes PhD -- 10/25/2021 Reviewed Click here for additional data file. Jordan Patrick Cuff, BSc MRes PhD -- 12/9/2021 Reviewed Click here for additional data file. Click here for additional data file.

Abbreviations

BLAST: Basic Local Alignment Search Tool; bp: base pairs; CV: coefficient of variation; eDNA: environmental DNA; FDR: false discovery rate; FN: false negative; FOR: false omission rate; FP: false positive; IUPAC: International Union of Pure and Applied Chemistry; LOD: limit of detection; MCA: melt curve analysis; NCBI: National Center for Biotechnology Information; SRA: Sequence Read Archive; TN: true negative; TP: true positive.

Competing Interests

The authors declare that they have no competing interests.

Funding

This work was funded by the United States Department of Agriculture-National Institute of Food and Agriculture (USDA-NIFA) grant number 2016–67030-24950.

Authors' Contributions

Design of study: D.P.P., D.A.A., R.M.P., M.R.B. Collection and preparation of samples: S.K.A.B., R.M.P., D.P.P. Data analyses (bioinformatic, qPCR, statistics): R.C.T., D.P.P., D.A.A. Writing of the manuscript: D.P.P., D.A.A., R.C.T., R.M.P.
  58 in total

1.  Can multiple-copy sequences of prey DNA be detected amongst the gut contents of invertebrate predators?

Authors:  R H Zaidi; Z Jaal; N J Hawkes; J Hemingway; W O Symondson
Journal:  Mol Ecol       Date:  1999-12       Impact factor: 6.185

2.  Statistical assignment of DNA sequences using Bayesian phylogenetics.

Authors:  Kasper Munch; Wouter Boomsma; John P Huelsenbeck; Eske Willerslev; Rasmus Nielsen
Journal:  Syst Biol       Date:  2008-10       Impact factor: 15.683

3.  obitools: a unix-inspired software package for DNA metabarcoding.

Authors:  Frédéric Boyer; Céline Mercier; Aurélie Bonin; Yvan Le Bras; Pierre Taberlet; Eric Coissac
Journal:  Mol Ecol Resour       Date:  2015-05-26       Impact factor: 7.090

4.  Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods.

Authors:  J Piñol; G Mir; P Gomez-Polo; N Agustí
Journal:  Mol Ecol Resour       Date:  2014-12-23       Impact factor: 7.090

Review 5.  The role of replicates for error mitigation in next-generation sequencing.

Authors:  Kimberly Robasky; Nathan E Lewis; George M Church
Journal:  Nat Rev Genet       Date:  2013-12-10       Impact factor: 53.242

6.  DADA2: High-resolution sample inference from Illumina amplicon data.

Authors:  Benjamin J Callahan; Paul J McMurdie; Michael J Rosen; Andrew W Han; Amy Jo A Johnson; Susan P Holmes
Journal:  Nat Methods       Date:  2016-05-23       Impact factor: 28.547

7.  Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors:  Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal:  Bioinformatics       Date:  2012-04-27       Impact factor: 6.937

8.  Can DNA-Based Ecosystem Assessments Quantify Species Abundance? Testing Primer Bias and Biomass--Sequence Relationships with an Innovative Metabarcoding Protocol.

Authors:  Vasco Elbrecht; Florian Leese
Journal:  PLoS One       Date:  2015-07-08       Impact factor: 3.240

9.  Bulk de novo mitogenome assembly from pooled total DNA elucidates the phylogeny of weevils (Coleoptera: Curculionoidea).

Authors:  Conrad P D T Gillett; Alex Crampton-Platt; Martijn J T N Timmermans; Bjarte H Jordal; Brent C Emerson; Alfried P Vogler
Journal:  Mol Biol Evol       Date:  2014-05-06       Impact factor: 16.240

10.  Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects.

Authors:  Vasco Elbrecht; Pierre Taberlet; Tony Dejean; Alice Valentini; Philippe Usseglio-Polatera; Jean-Nicolas Beisel; Eric Coissac; Frederic Boyer; Florian Leese
Journal:  PeerJ       Date:  2016-04-19       Impact factor: 2.984

View more
  1 in total

1.  Metabarcoding versus mapping unassembled shotgun reads for identification of prey consumed by arthropod epigeal predators.

Authors:  Débora Pires Paula; Suellen Karina Albertoni Barros; Rafael Major Pitta; Marliton Rocha Barreto; Roberto Coiti Togawa; David A Andow
Journal:  Gigascience       Date:  2022-03-24       Impact factor: 6.524

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.