Adeline Seah¹, Marisa C W Lim¹, Denise McAloose¹, Stefan Prost²,³, Tracie A Seimon¹
Abstract
The ability to sequence a variety of wildlife samples with portable, field-friendly equipment will have significant impacts on wildlife conservation and health applications. However, the only currently available field-friendly DNA sequencer, the MinION by Oxford Nanopore Technologies, has a high error rate compared to standard laboratory-based sequencing platforms and has not been systematically validated for DNA barcoding accuracy in preserved and non-invasively collected tissue samples. We tested whether various wildlife sample types, field-friendly methods, and our clustering-based bioinformatics pipeline, SAIGA, can be used to generate consistent and accurate consensus sequences for species identification. Here, we systematically evaluate variation in cytochrome b sequences amplified from scat, hair, feather, fresh frozen liver, and formalin-fixed paraffin-embedded (FFPE) liver. Each sample was processed with three DNA extraction protocols (Biomeme, Chelex, and Qiagen). For all sample types tested, the MinION consensus sequences matched the Sanger references with 99.29%–100% sequence similarity, even for samples that were difficult to amplify, such as scat and FFPE tissue extracted with Chelex resin. Sequencing errors occurred primarily in homopolymer regions, consistent with previous MinION studies. We demonstrate that it is possible to generate accurate DNA barcode sequences from preserved and non-invasively collected wildlife samples using portable MinION sequencing, creating more opportunities to apply portable sequencing technology to species identification.
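The abstract describes collapsing many error-prone MinION reads into one consensus sequence. The sketch below illustrates the general idea with a simple majority-rule consensus over reads that are assumed to be already aligned to the same length, with '-' marking gaps. It is not SAIGA's implementation, which is clustering-based and not detailed here; the function name `majority_consensus` and the 50% cutoff are illustrative assumptions.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_reads: List[str], min_fraction: float = 0.5) -> str:
    """Majority-rule consensus of pre-aligned reads (hypothetical helper).

    Each column contributes its most common symbol only if that symbol is
    carried by more than min_fraction of the reads. Columns where the gap
    symbol wins are dropped; columns without a clear majority become 'N'.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    consensus = []
    for column in zip(*aligned_reads):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) <= min_fraction:
            consensus.append("N")  # no clear majority at this position
        elif symbol != "-":
            consensus.append(symbol)  # gap-majority columns are skipped
    return "".join(consensus)


reads = [
    "ACGTTTAC",
    "ACGTT-AC",  # a deletion in a homopolymer run, a typical nanopore error
    "ACGTTTAC",
    "ACCTTTAC",  # a single substitution
]
print(majority_consensus(reads))
```

Because each column is decided by vote, the lone homopolymer deletion and the lone substitution do not reach the consensus. With this kind of voting, an error survives only if most reads share it, which is one plausible way homopolymer-length errors could persist into a consensus.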
Keywords: Biomeme; Chelex; DNA barcoding; FFPE; MinION; field-friendly; non-invasive sampling
Year: 2020 PMID: 32325704 PMCID: PMC7230362 DOI: 10.3390/genes11040445
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Lab and SAIGA bioinformatics pipeline flowchart. Bioinformatics software and parameters are indicated at each step.
Figure 2. The number of reads assigned to each Oxford Nanopore Technologies (ONT) index (01–12) per flow cell by MiniBar and by qcat. For flow cell FAL19910, the 1st sequencing run used indexes 01–04 and the 2nd run used indexes 05–10. For flow cell FAL19272, the 1st sequencing run used indexes 01–06 and the 2nd run used indexes 07–12.
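Figure 2 compares how many reads MiniBar and qcat assigned to each ONT index. Each tool has its own matching logic, which is not described here; the sketch below only illustrates the general principle of barcode demultiplexing: compare the start of each read against every barcode by edit distance and assign the read to the unique best match within a tolerance. The barcode sequences, index names, and the `max_dist=2` cutoff are hypothetical placeholders, not real ONT barcodes or either tool's defaults.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def assign_index(read: str, barcodes: Dict[str, str], max_dist: int = 2) -> Optional[str]:
    """Return the index whose barcode best matches the read's start, or None."""
    scores = {name: edit_distance(read[:len(bc)], bc) for name, bc in barcodes.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1])
    best_name, best_score = ranked[0]
    if best_score > max_dist:
        return None  # nothing matches closely enough: leave unclassified
    if len(ranked) > 1 and ranked[1][1] == best_score:
        return None  # two indexes tie: ambiguous, so do not guess
    return best_name


# Hypothetical toy barcodes, not real ONT index sequences.
barcodes = {"index01": "AAGGTTCA", "index02": "CCTTGGAA"}
for read in ["AAGGTTCAGGCTACGT", "AAGCTTCAGGCTACGT", "GTGTGTGTAAACCC"]:
    print(read, assign_index(read, barcodes))
```

Rejecting ties and poor matches, rather than forcing every read into a bin, is one reason two demultiplexers can report different read counts for the same index.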
Figure 3. The percentage of demultiplexed reads used to generate the final consensus sequence for the 100R, 500R, and 5KR subsets for each species. Samples are labeled by tissue type and extraction method (b = Biomeme, c = Chelex, q = Qiagen). Points are linked by a grey line to show the difference in values between demultiplexers. Overlapping areas in orange indicate similar results for the MiniBar and qcat analyses. Asterisks (*) indicate samples with cinnamon teal contamination.
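Figure 3 and the table report results for subsets of 100, 500, and 5000 reads per sample. The source does not say how the subsets were drawn; the sketch below assumes simple random sampling without replacement with a fixed seed for reproducibility, and shows how the plotted percentage is derived (reads in the consensus cluster divided by reads in the subset). The function names and the synthetic read list are hypothetical.

```python
import random
from typing import Dict, List


def subsample(reads: List[str], n: int, seed: int = 1) -> List[str]:
    """Draw n reads without replacement; keep every read if fewer than n exist."""
    if len(reads) <= n:
        return list(reads)
    return random.Random(seed).sample(reads, n)


def percent_used(n_clustered: float, n_subset: int) -> float:
    """Percentage of a subset's demultiplexed reads that ended up in the consensus."""
    return 100.0 * n_clustered / n_subset


reads = [f"read_{i}" for i in range(12_000)]  # synthetic stand-in for one sample
sizes = {"100R": 100, "500R": 500, "5KR": 5_000}
subsets: Dict[str, List[str]] = {label: subsample(reads, n) for label, n in sizes.items()}
print({label: len(s) for label, s in subsets.items()})

# The 5KR MiniBar average from the table: 4411.14 clustered reads out of 5000.
print(round(percent_used(4411.14, 5_000), 2))
```

Fixing the seed means rerunning the pipeline on the same reads yields the same subsets, which keeps MiniBar and qcat comparisons on equal footing if both start from the same read pool.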
Averages and standard deviations (sd) of percent sequence similarity to the Sanger sequence, length of matching nucleotides, and the number and percentage of demultiplexed reads used for the final consensus sequence from the 100R, 500R, or 5KR read subsets demultiplexed with MiniBar or qcat. Statistics were calculated across all samples of every tissue type and extraction method.
| Subset | Demultiplexer | Average % ID (sd) | Average Alignment Length (bp) (sd) | Average Number of Clustered Reads (sd) | Average % Clustered Reads (sd, as proportion) |
|---|---|---|---|---|---|
| 100 reads per sample (100R) | MiniBar | 99.99 (0.05) | 421.05 (0.21) | 97.5 (5.8) | 97.50% (0.06) |
| 100R | qcat | 100.00 (0.00) | 420.5 (0.86) | 97.45 (6.01) | 97.45% (0.06) |
| 500 reads per sample (500R) | MiniBar | 99.97 (0.11) | 421.09 (0.43) | 484.5 (35.77) | 96.90% (0.07) |
| 500R | qcat | 100.00 (0.00) | 420.82 (0.59) | 483.68 (38.32) | 96.73% (0.08) |
| 5000 reads per sample (5KR) | MiniBar | 99.88 (0.24) | 421.18 (0.8) | 4411.14 (916.69) | 88.22% (0.18) |
| 5KR | qcat | 99.95 (0.18) | 420.41 (0.85) | 4456.14 (939.87) | 89.12% (0.19) |
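The two read-count columns of the table are linked: dividing the average number of clustered reads by the subset size gives the average percentage, and dividing the count sd by the subset size gives the sd printed in the last column. The short check below uses only numbers copied from the table and reproduces that column to within 0.01 percentage points (the 500R qcat mean computes to 96.74% versus the printed 96.73%, plausibly a rounding difference). This suggests, although the source does not state it, that the sd in the last column is expressed as a proportion rather than in percentage points. The helper name `clustered_summary` is hypothetical.

```python
from typing import Dict, Tuple

# (subset size, mean clustered reads, sd of clustered reads), copied from the table
ROWS: Dict[Tuple[str, str], Tuple[int, float, float]] = {
    ("100R", "MiniBar"): (100, 97.5, 5.8),
    ("100R", "qcat"): (100, 97.45, 6.01),
    ("500R", "MiniBar"): (500, 484.5, 35.77),
    ("500R", "qcat"): (500, 483.68, 38.32),
    ("5KR", "MiniBar"): (5000, 4411.14, 916.69),
    ("5KR", "qcat"): (5000, 4456.14, 939.87),
}


def clustered_summary(size: int, mean_n: float, sd_n: float) -> Tuple[float, float]:
    """Return (mean % of reads clustered, sd as a proportion of the subset size)."""
    return 100.0 * mean_n / size, sd_n / size


for (subset, tool), (size, mean_n, sd_n) in ROWS.items():
    pct, sd_prop = clustered_summary(size, mean_n, sd_n)
    print(f"{subset:<5} {tool:<8} {pct:6.2f}% ({sd_prop:.2f})")
```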
Figure 4. Percent sequence similarity (BLAST) of the MinION consensus to the Sanger sequence for the 100R, 500R, and 5KR subsets for each species. Samples are labeled by tissue type and extraction method (b = Biomeme, c = Chelex, q = Qiagen). Points are linked by a grey line to show the difference in values between demultiplexers. Overlapping areas in orange indicate similar results for the MiniBar and qcat analyses. The horizontal dashed line marks the 99% sequence similarity threshold. Asterisks (*) indicate samples with cinnamon teal read contamination.
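Figure 4 scores each consensus by its percent identity to the Sanger reference against a 99% threshold. As a rough sketch of that arithmetic, assuming a pairwise alignment is already in hand (BLAST builds its own local alignment and reports identity over it), percent identity is the number of identical columns divided by the alignment length. The 421-bp reference below is a made-up repeat, not a cytochrome b sequence; 421 bp is chosen only because it is close to the average alignment length in the table. Three substitutions give 418/421 ≈ 99.29%, numerically equal to the lower bound reported in the abstract, although the source does not say how many differences that sample actually had. Treating the threshold as inclusive is also an assumption.

```python
def percent_identity(aligned_ref: str, aligned_qry: str) -> float:
    """Identical columns / alignment length x 100; gap columns count as differences."""
    if len(aligned_ref) != len(aligned_qry) or not aligned_ref:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    identical = sum(r == q and r != "-" for r, q in zip(aligned_ref, aligned_qry))
    return 100.0 * identical / len(aligned_ref)


def passes_threshold(pid: float, threshold: float = 99.0) -> bool:
    """True if the identity reaches the (assumed inclusive) 99% cutoff."""
    return pid >= threshold


ref = "ACGT" * 105 + "A"  # 421-bp toy reference, not a real cytochrome b sequence
qry = list(ref)
for pos in (10, 200, 400):  # introduce three substitutions
    qry[pos] = "T" if qry[pos] != "T" else "G"
qry = "".join(qry)

pid = percent_identity(ref, qry)
print(f"{pid:.2f}% identity, passes 99% threshold: {passes_threshold(pid)}")
```

For a barcode of roughly 421 bp, the 99% line therefore tolerates at most four differing columns (417/421 ≈ 99.05%), while a fifth (416/421 ≈ 98.81%) would fall below it.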
Figure 5. Screenshots of selected sections of the MAFFT alignments for (A) snow leopard and (B) cinnamon teal, showing nucleotide sites in homopolymeric regions where the Sanger and MinION sequences differ. Sanger sequences are listed above the black line and MinION consensus sequences below.
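Figure 5 shows MinION/Sanger differences concentrated in homopolymer runs. One way to flag such sites programmatically, sketched here under assumptions, is to locate runs of a repeated base in the aligned Sanger sequence and check whether each differing alignment column falls inside a run or directly next to it. The minimum run length of 4, the one-column flank, and the function names are illustrative choices, not parameters from the study, and the sequences are toy examples.

```python
import re
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int = 4) -> List[Tuple[int, int]]:
    """Half-open (start, end) spans where a single base repeats >= min_len times."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


def diffs_near_homopolymers(ref_aln: str, qry_aln: str, min_len: int = 4):
    """List (column, ref_base, query_base, near_run) for every differing column.

    near_run is True when the column lies inside a run of the reference, or in
    the column immediately before or after it, so that an insertion placed just
    past a run (a gap in the reference row) is still attributed to that run.
    """
    runs = homopolymer_runs(ref_aln, min_len)
    diffs = []
    for col, (r, q) in enumerate(zip(ref_aln, qry_aln)):
        if r != q:
            near = any(start - 1 <= col <= end for start, end in runs)
            diffs.append((col, r, q, near))
    return diffs


sanger = "TGCAAAAAGTCTTTTAC"
minion = "TACAAAA-GTCTTTTAC"  # one substitution, one deletion in the A run
for col, ref_base, qry_base, near in diffs_near_homopolymers(sanger, minion):
    print(col, ref_base, qry_base, "homopolymer" if near else "other")
```

A gap inside a reference run (for example `AA-AA`) splits it into shorter pieces, so a real check would need to handle gaps more carefully than this sketch does.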