Alexander M Piper1,2, Jana Batovska1,2, Noel O I Cogan1,2, John Weiss1, John Paul Cunningham1, Brendan C Rodoni1,2, Mark J Blacket1.
Abstract
Trap-based surveillance strategies are widely used for monitoring of invasive insect species, aiming to detect newly arrived exotic taxa as well as track the population levels of established or endemic pests. Where these surveillance traps have low specificity and capture non-target endemic species in excess of the target pests, the need for extensive specimen sorting and identification creates a major diagnostic bottleneck. While the recent development of standardized molecular diagnostics has partly alleviated this requirement, the single specimen per reaction nature of these methods does not readily scale to the sheer number of insects trapped in surveillance programmes. Consequently, target lists are often restricted to a few high-priority pests, allowing unanticipated species to avoid detection and potentially establish populations. DNA metabarcoding has recently emerged as a method for conducting simultaneous, multi-species identification of complex mixed communities and may lend itself ideally to rapid diagnostics of bulk insect trap samples. Moreover, the high-throughput nature of recent sequencing platforms could enable the multiplexing of hundreds of diverse trap samples on a single flow cell, thereby providing the means to dramatically scale up insect surveillance in terms of both the quantity of traps that can be processed concurrently and number of pest species that can be targeted. In this review of the metabarcoding literature, we explore how DNA metabarcoding could be tailored to the detection of invasive insects in a surveillance context and highlight the unique technical and regulatory challenges that must be considered when implementing high-throughput sequencing technologies into sensitive diagnostic applications.
Keywords: alien species; bioinformatics; biosecurity; biosurveillance; controls; early detection; non-destructive; quality assurance; reference database; validation
Year: 2019 PMID: 31363753 PMCID: PMC6667344 DOI: 10.1093/gigascience/giz092
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Methods used for insect identification, with suitability assessed according to accuracy, expertise, general applicability, time, and throughput criteria
| Identification method | Taxonomic expertise | Identify specific taxa | Identify broad range of taxa | Throughput level | Time per identification |
|---|---|---|---|---|---|
| Microscopic examination | High | High | High | Low | Moderate |
| PCR–restriction fragment length polymorphism | Low | Moderate | Low | Moderate | Moderate |
| DNA barcoding | Low | High | High | Low | Moderate |
| Quantitative PCR/droplet digital PCR | Low | High | Low | High | Low |
| Loop-mediated isothermal amplification | Low | High | Low | Low | Low |
| Metabarcoding | Low | High | High | Very high | Low |
*This morphological identification score assumes a high level of taxonomic knowledge and a low human error rate.
Figure 1: Metabarcoding in the literature. (A) Published articles obtained from Scopus, Crossref, and PubMed searches on 6 June 2019 for all metabarcoding studies, and those containing keywords in title or abstract relevant to invasive insect surveillance. (B) Sequencing platforms used in the above metabarcoding studies displayed as a proportion for each year.
Figure 2: Overview of common metabarcoding workflows for identification of trapped insect species.
Comparison of sequence throughputs, error rate, and associated costs among high-throughput sequencing platforms
| Specification | Illumina MiSeq | Illumina NextSeq | Illumina HiSeq 3000/4000 | Illumina NovaSeq | MGISeq-200 | MGISeq-2000 | MGISeq-T7 | PacBio Sequel | PacBio Sequel II | ONT MinION | ONT PromethION |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Platform type | Short-read | Short-read | Short-read | Short-read | Short-read | Short-read | Short-read | Long-read | Long-read | Long-read | Long-read |
| Maximum throughput (Gb) | 15 | 120 | 750/1,500 (8/16 lanes) | 6,000 (8 lanes) | 60 | 1,080 | 6,000 | 20 | 160 | 20 | 150 per flow cell (up to 48) |
| Maximum read length | 2 × 300 bp | 2 × 150 bp | 2 × 150 bp | 2 × 150 bp | 2 × 100 bp | 2 × 150 bp | 2 × 150 bp | ∼100 kb | ∼100 kb | ∼2 Mb | ∼2 Mb |
| Error rate | Low | Low | Low | Low | Low | Low | Low | Low (consensus error) | Low (consensus error) | High | High |
| Instrument cost | Low | Medium | High | High | Low | Medium | High | High | High | Extremely low | Low |
| Set-up time (labour) | Medium | Medium | Medium | Medium | Medium | Medium | Medium | High | High | Low | Low |
| Run time (hours) | 56 | 30 | 84 | 40 | <48 | <48 | 24 | 15 | 15 | 1–72 | 1–72 |
| Sequencing cost per sample | <$50 | <$15 | <$10 | <$5 | <$50 | <$10 | <$5 | <$25 | <$15 | <$25 | <$5 |
*Costs are presented in Australian Dollars (AUD) and consider chemistry cost, depreciation, servicing, and computational cost over the lifespan of the instrument; however, total costs and read lengths will further depend on target enrichment and library preparation methods used.
†Assuming pooled sequencing of many traps with 250-Mb sequencing effort per sample.
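The per-sample costs above rest on a fixed sequencing effort of 250 Mb per sample, so the number of samples that fit on one run follows directly from the maximum throughput. The sketch below works through that arithmetic for a few platforms from the table. It is only an upper bound under stated assumptions: the function name `samples_per_run` is illustrative, and the calculation ignores losses to quality filtering, uneven pooling, and the number of index combinations actually available.

```python
# Maximum throughput per run in gigabases (Gb), copied from the table above.
MAX_THROUGHPUT_GB = {
    "Illumina MiSeq": 15,
    "Illumina NextSeq": 120,
    "Illumina NovaSeq": 6000,
    "MGISeq-200": 60,
    "PacBio Sequel": 20,
    "ONT MinION": 20,
}

# 250 Mb of sequencing effort per sample, as stated in the table footnote.
EFFORT_PER_SAMPLE_GB = 0.25


def samples_per_run(throughput_gb: float,
                    effort_gb: float = EFFORT_PER_SAMPLE_GB) -> int:
    """Upper bound on the samples that fit in one run at a fixed effort."""
    if effort_gb <= 0:
        raise ValueError("effort_gb must be positive")
    return int(throughput_gb // effort_gb)


for platform, gb in MAX_THROUGHPUT_GB.items():
    print(f"{platform}: up to {samples_per_run(gb)} samples per run")
```

At this effort a 15-Gb MiSeq run accommodates at most 60 samples, while a 120-Gb NextSeq run accommodates at most 480.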
Figure 3: DNA barcodes in public reference databases. (A) Global distribution of all sufficiently annotated DNA barcode records from BOLD and GenBank for all barcode loci; records for all Insecta are displayed as a density map, while those species present on international pest lists are overlaid in red. (B) Distribution of records and unique species within major public databases for the 10 barcode markers with the most reference information for entire Insecta and for (C) Insecta species present on international pest lists.
Figure 4: Unique dual indexing overcomes issues of cross-contamination due to index-switching. (A) An amplified barcode locus with sequencing adapters attached; read locations and orientations are indicated for the commonly used Illumina MiSeq platform. Reads 1 and 2 are designed to overlap to facilitate assembly into a consensus sequence. Both sequencing adapters incorporate a unique oligonucleotide index sequence to allow differentiation of multiplexed samples. Strategies for indexing include (B) combinatorial indexing, where indices on either end of the molecule are shared with other samples but the combination of the two is unique, and (C) unique dual indexing, where adapter indices at both ends of the molecule are completely unique to the sample.
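The difference between the two indexing strategies in Figure 4 can be illustrated with a small demultiplexing sketch. The index labels (`i7_A`, `i5_X`, and so on), sample names, and function names are hypothetical and chosen only for illustration. The point is that under combinatorial indexing a read whose index was switched still matches a valid sample, whereas under unique dual indexing the same read matches nothing and can be discarded.

```python
from itertools import product


def combinatorial_layout(i7_indices, i5_indices):
    """Every (i7, i5) combination is a sample; single indices are shared."""
    pairs = product(i7_indices, i5_indices)
    return {pair: f"sample_{n}" for n, pair in enumerate(pairs, start=1)}


def unique_dual_layout(i7_indices, i5_indices):
    """Each sample has its own i7 and its own i5; no index is reused."""
    if len(i7_indices) != len(i5_indices):
        raise ValueError("unique dual indexing needs one i5 per i7")
    pairs = zip(i7_indices, i5_indices)
    return {pair: f"sample_{n}" for n, pair in enumerate(pairs, start=1)}


def demultiplex(index_pair, layout):
    """Return the assigned sample, or None for an unexpected index pair."""
    return layout.get(index_pair)


# Hypothetical index names, for illustration only.
i7 = ["i7_A", "i7_B"]
i5 = ["i5_X", "i5_Y"]

combinatorial = combinatorial_layout(i7, i5)  # 4 samples from 2 + 2 indices
unique_dual = unique_dual_layout(i7, i5)      # 2 samples from 2 + 2 indices

# A read from the (i7_A, i5_X) library that acquired i5_Y by index-switching.
switched_read = ("i7_A", "i5_Y")

print(demultiplex(switched_read, combinatorial))  # assigned to a wrong sample
print(demultiplex(switched_read, unique_dual))    # None: detectable, discarded
```

The trade-off is visible in the layouts themselves: the same four indices label four samples combinatorially but only two with unique dual indexing.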
Recommended quality control checkpoints for metabarcoding-based diagnostics
| Category | Quality control checkpoint | Consequences |
|---|---|---|
| Laboratory preparedness | Are all reagents within expiry date and stored properly? | Poor reagent storage can lead to reduced efficiency and false-negative results |
| | Is equipment appropriately maintained and calibrated? | Poorly calibrated equipment will generate inconsistencies and inaccurate data |
| | Have laboratory surfaces been decontaminated and swipe testing of laboratory surfaces been conducted? | Dirty laboratories can be a source of DNA contamination, leading to lowered sensitivity or false-positive results |
| Sample acceptance | Have specimens arrived in a condition appropriate for extracting DNA? | Inappropriately stored specimens can lead to false-negative results and a reduction in sensitivity |
| | Are specimens traceable to origin location? | Misidentification of sample origin can complicate detection response |
| Nucleic acid extraction | Is DNA of sufficient quantity and quality? | Insufficient DNA quantity or presence of contaminants can inhibit reactions and result in false-negative results |
| Marker enrichment | Are the correct fragment sizes present for the target barcode marker? | Incorrect fragment sizes could indicate off-target amplification |
| | Have the positive control samples successfully amplified? | Absence of product in positive controls indicates amplification failure |
| | Are negative control samples free of DNA fragments? | Visible DNA fragments in negative controls indicate contamination |
| Library preparation and multiplexing | Are libraries of the appropriate size and concentration? | Libraries of significantly different sizes or concentrations will complicate multiplexing |
| | Have sets of unique dual indices been used? | Unique dual indexing is necessary to control for index-switching |
| | Have index sets been alternated since the previous sequencing run? | Cross-contamination of libraries between sequencing runs can cause false-positive results |
| High-throughput sequencing | Has the pooled library been appropriately sized and quantified? | Inaccurate sizing and quantification can cause overloading of flow cell and failed runs, or underloading and low data output |
| | Has the sequencer been appropriately cleaned between runs? | Insufficient cleaning of the sequencer can result in cross-contamination between runs |
| De-multiplexing and quality trimming | Has minimum sequencing depth been achieved for each sample? | Low sequencing depth can cause false-negative results |
| | Are an appropriate number of reads passing quality filtering? | Low numbers of reads passing quality filters can indicate issues with the sequencing run and result in false-negative results |
| OTU clustering and denoising | How much of the original data are explained by the final OTUs/ASVs? | A lower-than-expected proportion of retained sequences can indicate overly restrictive bioinformatics parameters |
| | Have chimeras and sequences with disrupted open reading frames been checked for? (for protein-coding genes) | Chimeras and pseudogenes can inflate taxonomic diversity, leading to false-positive results |
| Taxonomic assignment | Has the reference database been curated to remove mislabelled taxonomy and pseudogenic sequences? | Mislabelled reference sequences can lead to both false-positive and false-negative results |
| | Has the taxonomy been applied with appropriate confidence levels? | Low-confidence assignment indicates incomplete or erroneous reference database |
| Interpretation of results | Have the taxa received an appropriate number of reads to pass detection threshold? | Taxa under detection threshold could represent laboratory or reagent contamination, or erroneous sequences that have not been sufficiently controlled for |
| | Has a minimum detection threshold been applied to remove index-switching? | Index-switching can cause spreading of taxa to other samples and result in false-positive results |
| | Are there any taxa that need to be confirmed with alternative methods? | Any high-risk putative detections should be confirmed with an alternative method before reporting, if possible |
| Reporting and sign-off | Have any exceptions to laboratory standard operating procedure been made? | Non-compliances with standard operating procedure should be highlighted, and diagnostic confidence may be reduced |
| | Have data been stored appropriately? | Archiving of data allows future re-analysis in case of disputed results |
| | Have results been signed off by a competent individual? | Incorrect reporting or interpretation of significant taxa can lead to incorrect management response |
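Two of the checkpoints above, minimum sequencing depth per sample and a minimum detection threshold per taxon, lend themselves to simple automated checks on a read-count table. The following is a minimal sketch only: the table does not prescribe threshold values, so `MIN_DEPTH`, `MIN_READS`, and `MIN_FRACTION` are placeholders, and the sample names, taxon names, and read counts are invented for illustration.

```python
from typing import Dict, List

ReadTable = Dict[str, Dict[str, int]]  # sample -> taxon -> read count

# Placeholder thresholds; real values would need laboratory validation.
MIN_DEPTH = 10_000    # minimum total reads per sample
MIN_READS = 20        # absolute per-taxon read floor
MIN_FRACTION = 0.001  # per-taxon floor as a fraction of sample depth


def low_depth_samples(table: ReadTable, min_depth: int) -> List[str]:
    """Samples whose total read count falls below the minimum depth."""
    return sorted(s for s, taxa in table.items()
                  if sum(taxa.values()) < min_depth)


def apply_detection_threshold(table: ReadTable, min_reads: int,
                              min_fraction: float) -> ReadTable:
    """Drop taxa below the larger of the absolute and relative cut-offs."""
    filtered: ReadTable = {}
    for sample, taxa in table.items():
        depth = sum(taxa.values())
        cutoff = max(min_reads, min_fraction * depth)
        filtered[sample] = {t: n for t, n in taxa.items() if n >= cutoff}
    return filtered


# Invented example data.
table: ReadTable = {
    "trap_01": {"taxon_A": 48_000, "taxon_B": 1_990, "taxon_C": 10},
    "trap_02": {"taxon_B": 900},
}

flagged = low_depth_samples(table, MIN_DEPTH)
filtered = apply_detection_threshold(table, MIN_READS, MIN_FRACTION)

print("Samples below minimum depth:", flagged)
print("Taxa passing detection threshold:", filtered)
```

Here `trap_02` is flagged for insufficient depth, and `taxon_C` in `trap_01` is removed because its 10 reads fall below the cut-off of 50 reads (0.1% of 50,000). A sub-threshold taxon that is high-risk would still warrant confirmation with an alternative method, as the table recommends.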