Amrita Srivathsan1, Leshon Lee1, Kazutaka Katoh2,3, Emily Hartop4,5, Sujatha Narayanan Kutty1,6, Johnathan Wong1, Darren Yeo1, Rudolf Meier7,8.
Abstract
BACKGROUND: DNA barcodes are a useful tool for discovering, understanding, and monitoring biodiversity, all of which are critical tasks at a time of rapid biodiversity loss. However, widespread adoption of barcodes requires cost-effective and simple barcoding methods. Here we present a workflow that satisfies these conditions. It was developed via "innovation through subtraction" and thus requires minimal lab equipment, can be learned within days, reduces the barcode sequencing cost to < 10 cents, and allows fast turnaround from specimen to sequence by using the portable MinION sequencer.
Keywords: Biodiversity discovery; Bioinformatics; Citizen science; DNA barcoding; MinION; Oxford nanopore; Species delimitation
Year: 2021 PMID: 34587965 PMCID: PMC8479912 DOI: 10.1186/s12915-021-01141-x
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Datasets generated in this study and the results of barcoding using ONTbarcoder at 200X coverage (Consensus by Length) and 100X coverage (Consensus by Similarity)
| Dataset name | Flow cell details | Raw reads/reads passing length threshold/reads of suitable length/demultiplexed | Demultiplexing rate/# QC_compliant barcodes/# Filtered barcodes with 1 N/# Filtered barcodes with > 1 N/# Unreliable barcodes |
|---|---|---|---|
| Mixed Diptera (658 bp, | R10.3: reused flow cell: 71 pores according to QC, but 500+ active during run; Runtime: 27.5 h; Guppy: 4.2.3+f90bd04 | 3,864,000/3,425,357/3,560,389/1,544,758 | 43.39%/495/2/5/8; Total success rate = 502/511 (98.2%) |
| Afrotropical Phoridae (658 bp, | R10.3: new flow cell: QC: 1101 pores; Runtime: 49.5 h; Guppy: 4.0.11+f1071ce | 6,838,903/5,465,164/5,474,306/2,681,029 | 48.97%/3722/121/59/247; Total success rate = 3905/4275 (91.3%) |
| Palaearctic Phoridae (658 bp, | R10.3: new flow cell: QC: 1239 pores; Runtime: 47.5 h; Guppy: 4.2.3+f90bd04 | 16,595,984/15,658,174/16,100,505/5,012,489 | 31.13%/8026/108/231/780; Total success rate = 8365/9932 (84.2%) |
| Palaearctic Phoridae (313 bp, | R10.3: new flow cell: QC: 1297 pores; Runtime: 37 h; Guppy: 4.2.3+f90bd04 | 13,690,869/13,221,764/10,366,455/12,983,260/2,015,135 | 15.52%/8705/118/112/899; Total success rate = 8935/9929 (90%) |
| Mixed Diptera Subsample (658 bp, | Flongle: new; QC: 81 pores; Runtime: 24 h; Guppy: 4.0.11+f1071ce | 294,896/222,189/190,952/33,270 | 17.42%/185/35/20/9; Total success rate = 240/257 (93.4%) |
| Chironomidae (313 bp, | Flongle: new; QC: 74 pores; Runtime: 15 h; Guppy: 4.2.3+f90bd04 | 560,062/525,087/504,621/108,574 | 21.52%/178/1/2/6; Total success rate = 181/191 (94.8%) |
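
The two summary percentages in the table above can be reproduced from the raw counts. The following is a minimal sketch, not part of ONTbarcoder; the class and field names are hypothetical. It assumes that the demultiplexing rate is the number of demultiplexed reads divided by the reads of suitable length, and that the total success rate counts QC-compliant barcodes plus filtered barcodes with 1 N and with > 1 N. Both assumptions are inferred from the Mixed Diptera and Chironomidae rows, which they reproduce.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Counts copied from one row of the dataset table (field names are hypothetical)."""
    name: str
    specimens: int        # denominator of the "Total success rate" entry
    suitable_reads: int   # reads of suitable length
    demultiplexed: int    # demultiplexed reads
    qc_compliant: int     # barcodes passing all QC criteria
    filtered_1n: int      # filtered barcodes with one N
    filtered_gt1n: int    # filtered barcodes with more than one N

    def demultiplexing_rate(self) -> float:
        """Percentage of suitable-length reads that were demultiplexed (assumed definition)."""
        return 100.0 * self.demultiplexed / self.suitable_reads

    def success_rate(self) -> float:
        """Percentage of specimens with a barcode in any of the three accepted categories."""
        recovered = self.qc_compliant + self.filtered_1n + self.filtered_gt1n
        return 100.0 * recovered / self.specimens


mixed_diptera = RunSummary("Mixed Diptera", 511, 3_560_389, 1_544_758, 495, 2, 5)
chironomidae = RunSummary("Chironomidae", 191, 504_621, 108_574, 178, 1, 2)

for run in (mixed_diptera, chironomidae):
    print(f"{run.name}: demultiplexing {run.demultiplexing_rate():.2f}%, "
          f"success {run.success_rate():.1f}%")
```

Unreliable barcodes are deliberately left out of the numerator, which is why the success rate is below 100% even when every specimen yields some consensus.
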
Fig. 1 Rapid recovery of accurate MinION barcodes over time (in hours, x-axis) (filtered barcodes: dark green = barcodes passing all 4 QC criteria, light green = one ambiguous base; lighter green = more than 1 N, no barcode = white with pattern, 1 mismatch = orange, > 1 mismatch = red). The solid black line represents the number of barcodes available for comparison. White dotted line represents the amount of raw reads collected over time, blue represents the number of demultiplexed reads over time (plotted against Z-axis)
Quality assessment of barcodes generated by ONTbarcoder at 200X read coverage (Consensus by Length) and 100X coverage (Consensus by Similarity). The accuracy of MinION barcodes is compared with the barcodes obtained for the same specimens using Illumina/Sanger sequencing. Errors are defined as the sum of substitution or indel errors. Denominators are the total number of nucleotides assessed
| Dataset | No. of comparison barcodes | No. of barcodes with errors/No. of errors/% identity | # of Ns/%Ns |
|---|---|---|---|
| R10.3: Mixed Diptera: Sanger barcodes available | 476 | 2/10/99.997% | 19 (0.006%) |
| R10.3: Afrotropical Phoridae: Illumina barcodes available (a) | 3316 | 23/48/99.995% | 284 (0.011%) |
| Flongle-Mixed Diptera Subsample: Sanger barcodes available | 231 | 5/8/99.994% | 91 (0.058%) |
(a) Five barcodes with very high distances from the reference were excluded from the R10.3: Afrotropical Phoridae dataset as they likely represent lab contamination (see Srivathsan, Hartop et al. [35])
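
The "% identity" and "%Ns" columns follow from dividing error and N counts by the total number of nucleotides assessed. The sketch below illustrates that arithmetic for the R10.3 Mixed Diptera row. The table does not state the denominator, so the code assumes it is the number of comparison barcodes multiplied by the 658 bp fragment length; the true denominators depend on the aligned lengths and may differ, and the function names are hypothetical.

```python
def percent_identity(n_errors: int, n_nucleotides: int) -> float:
    """Identity in percent, with errors = substitutions + indels."""
    return 100.0 * (1.0 - n_errors / n_nucleotides)


def percent_ns(n_ambiguous: int, n_nucleotides: int) -> float:
    """Share of ambiguous bases (Ns) in percent."""
    return 100.0 * n_ambiguous / n_nucleotides


# Assumed denominator: 476 comparison barcodes x 658 bp each.
total_nt = 476 * 658

identity = percent_identity(10, total_nt)  # 10 errors in the Mixed Diptera row
ns_share = percent_ns(19, total_nt)        # 19 Ns in the Mixed Diptera row

print(f"identity = {identity:.3f}%, Ns = {ns_share:.3f}%")
```

Under this assumption the Mixed Diptera values round to the figures reported in the table; for the other rows the assumed denominator only approximates the reported percentages.
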
Fig. 2 Relationship between barcode quality and coverage. Subsetting the data to 5–200X coverage shows that there are very minor gains to barcode quality after 25–50X coverage. (filtered barcodes: dark green = barcodes passing all 4 QC criteria, light green = one ambiguous base; lighter green = more than 1 N, no barcode = white with pattern, 1 mismatch = orange, > 1 mismatch = red)
Fig. 3 Read bin size distribution for six amplicon pools (color-coding as in Figs 1 and 2). Due to the very generous coverage for the “Mixed Diptera” dataset, we also use grey to show the bin size distribution after dividing the bin read totals by 5
Fig. 4 Relationship between barcoding success and number of raw reads for six amplicon pools (191–9932 specimens; barcoding success rates 84–97%). Percentage of barcodes recovered is relative to the final estimate based on all data
Comparison of barcodes obtained by ONTbarcoder with NGSpeciesID (green = highest number of correct barcodes and lowest number of errors for each dataset). NGSpeciesID was applied once to the demultiplexed reads obtained with ONTbarcoder and once to those obtained with minibar. Barcode calling was done using all demultiplexed reads as well as a 200X subset. ONTbarcoder was run at 200X read coverage (Consensus by Length) and 100X coverage (Consensus by Similarity). The accuracy of MinION barcodes is only compared to reference barcodes obtained with Illumina/Sanger sequencing. Errors are defined as the sum of substitution and indel errors
| ONTbarcoder 200X | ONTbarcoder dem (1) + NGSpeciesID All reads | ONTbarcoder dem (1) + NGSpeciesID 200X | Minibar + NGSpeciesID All reads | Minibar + NGSpeciesID 200X | |
|---|---|---|---|---|---|
| | 72 (241) | 73 (241) | 123 (241) | 122 (241) | |
| | 70 | 71 | 121 | 120 | |
| | 169 (83/60/13/6/7) | 168 (88/55/13/7/5) | 118 (29/21/8/4/56) | 119 (31/20/8/4/56) | |
| | 33,270/32,978 | 19,511/15,935 | |||
| | 173 (478) | 353 (478) | 418 (478) | 453 (478) | |
| | 170 | 347 | 413 | 447 | |
| | 305 (126/97/48/17/17) | 125 (91/24/5/1/4) | 60 (45/9/3/0/3) | 25 (20/2/0/0/3) | |
| | 1,544,768/1,532,408 | 1,540,751/1,378,910 | |||
| | 2832 (3339) | 3111 (3341) | 3193 (3340) | 3245 (3340) | |
| | 2832 | 3111 | 3193 | 3245 | |
| | 507 (362/90/19/12/24) | 230 (172/31/5/3/19) | 147 (90/27/3/1/26) | 95 (50/13/12/1/27) | |
| | 2,681,029/2,668,933 | 3,117,597/2,782,367 | |||
(1) dem = demultiplexing
(2) Identical barcodes match the references perfectly and contain no ambiguities, while compatible barcodes match the reference at 100% identity but contain ambiguities. NGSpeciesID does not introduce Ns into barcodes
*Compatible barcodes are found in NGSpeciesID datasets due to the presence of Ns in reference sequences
Equipment required for MinION barcoding
| Required (< 500 specimens) | |
|---|---|
| 1 | MinION sequencer (preferably Mk1C for basecalling) with Flongle adapter |
| 2 | Thermocycler(s) |
| 3 | Gel Electrophoresis setup |
| 4 | Magnetic Separation Rack |
| 5 | Qubit for DNA quantification |
| 6 | Standard equipment: Vortex, Mini-centrifuge, pipettes, freezer, fridge |
| 7 | Standard laptop or PC |
| 1 | Multichannel pipette(s) |
| 1 | Hula Mixer |
Datasets used in the study and the corresponding experimental details
| Dataset name | Number of specimens | Fragment size, primer information | Extraction/PCR setup | PCR cleanup | ONT Library Preparation kit/Flow cell used |
|---|---|---|---|---|---|
| Mixed Diptera (see Srivathsan et al. [ - Sanger barcodes available | 511 (257 mixed Diptera, 254 Dolichopodidae) 17 negatives | 658 bp HCO2198, LCO1490 [ | Extraction Method: QuickExtract PCR Mix: Total volume: 20 μl 10× buffer: 2 μl dNTPs (2.5 mM): 1.5 μl Taq polymerase: 0.2 μl BSA (1 mg/ml): 2 μl Primer (5 μM): 2 μl each DNA: 2 μl | Ampure beads (Beckman Coulter) | SQK-LSK110/FLO-MIN111 |
| Afrotropical Phoridae (see Srivathsan et al. [ - Illumina mini-barcodes available | 4275 (Phoridae) 45 negatives | 658 bp HCO2198, LCO1490 [ | Extraction Method: QuickExtract PCR Mix: Total volume: 15.16 μl Mastermix (CWBio): 10 μl 25 mM MgCl2: 0.16 μl BSA (1 mg/ml): 2 μl Primer (10 μM): 1 μl each DNA: 1 μl | Sera-Mag beads (GE Healthcare Life Sciences) in PEG | SQK-LSK109/FLO-MIN111 |
| Palaearctic Phoridae (658) | 9929 (Phoridae) 105 negatives | 658 bp jgHCO2198, LCO1490 [ | Extraction Method: HotSHOT PCR Mix: Total volume: 16 μl Mastermix (CWBio): 7 μl BSA (1 mg/ml): 1 μl Primer (10 μM): 1 μl each DNA: 6 μl | Ampure beads (Beckman Coulter) | SQK-LSK110/FLO-MIN111 |
| Palaearctic Phoridae (313) | 9932 (Phoridae) 106 negatives | 313 bp m1COlintF, jgHCO2198 [ | Extraction Method: HotSHOT PCR Mix: Total volume: 14 μl Mastermix (CWBio): 7 μl BSA (1 mg/ml): 1 μl Primer (10 μM): 1 μl each DNA: 4 μl | Ampure beads (Beckman Coulter) | SQK-LSK110/FLO-MIN111 |
| Mixed Diptera subsample (see Srivathsan et al. [ - Sanger barcodes available | 257 7 negatives | See “Mixed Diptera” entry for R10.3 | See “Mixed Diptera” entry for R10.3 | Ampure beads (Beckman Coulter) | SQK-LSK109/Flongle |
| Chironomidae | 191 (Chironomidae) 1 negative | 313 bp m1COlintF, jgHCO2198 [ | Extraction Method: HotSHOT PCR Mix: Total volume: 14 μl Mastermix (CWBio): 7 μl BSA (1 mg/ml): 1 μl Primer (10 μM): 1 μl each DNA: 4 μl | Ampure beads (Beckman Coulter) | SQK-LSK109/Flongle |
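
The per-reaction PCR recipes above can be checked against their stated total volume and scaled to a plate. The sketch below uses the Palaearctic Phoridae (658) recipe from the table. The 96-reaction plate, the 10% pipetting overage, the choice to add DNA template per well rather than to the pooled mix, and all function and key names are illustrative assumptions, not part of the published protocol.

```python
# Per-reaction volumes in microlitres, copied from the Palaearctic Phoridae (658) row.
PALAEARCTIC_658 = {
    "Mastermix (CWBio)": 7.0,
    "BSA (1 mg/ml)": 1.0,
    "Forward primer (10 uM)": 1.0,
    "Reverse primer (10 uM)": 1.0,
    "DNA template": 6.0,
}


def volumes_sum_to_total(recipe: dict, total_ul: float, tol: float = 1e-6) -> bool:
    """Check that the component volumes add up to the stated reaction volume."""
    return abs(sum(recipe.values()) - total_ul) <= tol


def scale_master_mix(recipe: dict, n_reactions: int, overage: float = 0.10,
                     per_well: tuple = ("DNA template",)) -> dict:
    """Scale shared reagents to n_reactions plus an overage (assumed 10%).

    Components listed in per_well are excluded because they are
    pipetted individually into each well.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in recipe.items() if name not in per_well}


print(volumes_sum_to_total(PALAEARCTIC_658, 16.0))
pool = scale_master_mix(PALAEARCTIC_658, 96)
for name, vol in pool.items():
    print(f"{name}: {vol} ul")
```

The same check can be applied to the other rows. Note that the components listed for the Mixed Diptera recipe add up to less than its stated 20 μl total, so that check would fail as written.
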
Alternative DNA extraction methods
| Commonly used alternative DNA extraction methods | Advantages | Disadvantages |
|---|---|---|
| “directPCR”: “contaminating” a PCR reaction with the DNA of the target organism by adding the entire specimen or a tissue sample into the PCR reagent mix (Wong, Tay et al. 2014). | • No cost • No waiting time for obtaining template | • Time-consuming when subsampling is needed (antenna, leg) • Low success rate for heavily sclerotized specimens • No DNA template left after PCR |
| Commercial DNA extraction buffers: e.g., QuickExtract: 10 μl sufficient for obtaining DNA template from most insect specimens (Srivathsan, Hartop et al. [ | • Long shelf life of buffers • Template stays viable for weeks • Additional DNA can be obtained through re-extraction of specimen | • Moderate costs (< 0.20 USD) • DNA in leftover templates degrades within weeks/months |
| Commercial DNA extraction kits: e.g., DNeasy Blood & Tissue Kits | • Template is stable | • High cost (> 1 USD) • Time-consuming |