Michele Menegon, Chiara Cantaloni, Ana Rodriguez-Prieto, Cesare Centomo, Ahmed Abdelfattah, Marzia Rossato, Massimo Bernardi, Luciano Xumerle, Simon Loader, Massimo Delledonne.
Abstract
Biodiversity research is becoming increasingly dependent on genomics, which allows the unprecedented digitization and understanding of the planet's biological heritage. The use of genetic markers, i.e. DNA barcoding, has proved to be a powerful tool in species identification. However, full exploitation of this approach is hampered by high sequencing costs and the absence of equipped facilities in biodiversity-rich countries. In the present work, we developed a portable sequencing laboratory based on the MinION, the portable DNA sequencer from Oxford Nanopore Technologies. Complementary laboratory equipment and reagents were selected for use in remote and harsh environmental conditions. The performance of the MinION sequencer and the portable laboratory was tested for DNA barcoding in a simulated tropical environment, as well as in a remote rainforest of Tanzania lacking electricity. Despite the relatively high sequencing error rate of the MinION, the development of a suitable data analysis pipeline allowed the accurate identification of different species of vertebrates, including amphibians, reptiles and mammals. In situ sequencing of a wild frog allowed us to rapidly identify the captured species, thus confirming that effective DNA barcoding in the field is possible. These results open new perspectives for real-time, on-site DNA sequencing, thus potentially increasing opportunities for the understanding of biodiversity in areas lacking conventional laboratory facilities.
Year: 2017 PMID: 28977016 PMCID: PMC5627904 DOI: 10.1371/journal.pone.0184741
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of the species studied in the present work, their origin, tissue sampled, and the gene analyzed.
| Species | Origin | Tissue | Gene analyzed |
|---|---|---|---|
| | Tanzania-MUSE | phalanx | 16S |
| | Tanzania-MUSE | phalanx | 16S, CO1 |
| | Tanzania-MUSE | connective skin tissue | 16S |
| | Italy-MUSE | connective skin tissue | 16S |
| | Tanzania-MUSE | connective skin tissue | CO1 |
| | Tanzania | blood | 16S |
Primer pairs used for the amplification of the selected barcode genes.
| Species | Primer name | Forward 5'-3' | Gene | Amplicon length | Reference |
|---|---|---|---|---|---|
| vertebrates | 16Sar-5' | | 16S | ~600 bp | |
| | 16S | | | | |
| | Amp-P3 F | | CO1 | ~900 bp | |
| | Amp-P3 R | | | | |
| | LCO1490 | | CO1 | ~710 bp | |
| | HCO2198 | | | | |
Fig 1. The “ONtoBAR” pipeline.
(i) The 2D Pass reads produced with the MinION are assembled de novo using Loman’s method; (ii) the resulting Loman’s consensus sequence is BLASTed to retrieve the most similar sequence in the NCBI database; (iii) the best hit is selected as the new reference, to which the initial 2D Pass reads are aligned using LAST; (iv) the frequency of each nucleotide is calculated at every position along the reference sequence to generate the final ONtoBAR consensus, which is (v) BLASTed against the NCBI database.
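Step (iv) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy reads are hypothetical, and the reads are assumed to be already projected onto reference coordinates (one character per reference position, `-` where a read has no base, insertions dropped).

```python
from collections import Counter


def column_frequencies(aligned_reads):
    """Return one Counter of nucleotide counts per reference position.

    Assumes every read string has the same length as the reference
    and uses '-' where the read has no base at that position.
    """
    length = len(aligned_reads[0])
    columns = []
    for pos in range(length):
        counts = Counter(read[pos] for read in aligned_reads if read[pos] != "-")
        columns.append(counts)
    return columns


def consensus(aligned_reads, fallback="N"):
    """Call the most frequent nucleotide at each position.

    Positions covered by no read get the fallback symbol.
    """
    calls = []
    for counts in column_frequencies(aligned_reads):
        calls.append(counts.most_common(1)[0][0] if counts else fallback)
    return "".join(calls)


# Hypothetical toy reads, each carrying a different error.
reads = [
    "ACGTTAGC",
    "ACGATAGC",  # mismatch at position 3
    "AC-TTAGC",  # deletion at position 2
    "ACGTTAGG",  # mismatch at position 7
]
print(consensus(reads))
```

Because each toy read carries its error at a different position, the majority call at every column still recovers the underlying sequence, which is the intuition behind building a consensus from error-prone reads.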
Impact of storage temperature on sequencing performance.
| Channels QC | Channels with Reads | Total (raw) | 2D (raw) | Pass 2D (raw) | Total (normalized) | 2D (normalized) | Pass 2D (normalized) |
|---|---|---|---|---|---|---|---|
| 262 | 226 | 54380 | 7873 (14.4%) | 2163 (4.0%) | 241 | 35 | 10 |
| 494 | 425 | 141908 | 33200 (23.4%) | 8144 (5.7%) | 334 | 78 | 19 |
| 120 | 128 | 11594 | 2307 (19.9%) | 784 (6.8%) | 91 | 18 | 6 |
| 365 | 353 | 115673 | 20529 (17.7%) | 5652 (4.9%) | 328 | 58 | 16 |
Sequencing results obtained after storing the ONT DNA Genomic kit at -20°C or at 4°C. The table reports the results of two independent experiments performed for each storage condition. “Channels QC” and “Channels with Reads” indicate the number of active channels when the flow cell quality control (QC) was performed or during the sequencing, respectively. The raw read counts (Total, 2D and Pass 2D) are divided by the number of flow cell channels used during the experiment (Channels with Reads) to normalize for the specific efficiency of each flow cell and for the sequencing run time (Normalized Reads). Percentages are calculated by dividing the number of 2D and Pass 2D reads by the total number of reads.
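The normalization and percentage calculations described in this caption can be checked with a short sketch. The function name and return layout are assumptions made for illustration; the input numbers are taken from the second row of the table above.

```python
def normalize(total, two_d, pass_2d, channels_with_reads):
    """Apply the caption's two calculations to one table row.

    Normalized reads: raw counts divided by the channels with reads.
    Percentages: 2D and Pass 2D counts divided by the total read count.
    """
    normalized = tuple(
        round(count / channels_with_reads) for count in (total, two_d, pass_2d)
    )
    percent = tuple(round(100 * count / total, 1) for count in (two_d, pass_2d))
    return {"normalized": normalized, "percent": percent}


# Second row of the table: 425 channels with reads.
row = normalize(total=141908, two_d=33200, pass_2d=8144, channels_with_reads=425)
print(row["normalized"])  # normalized Total, 2D, Pass 2D
print(row["percent"])     # 2D % and Pass 2D % of total reads
```

For this row the sketch reproduces the tabulated values: 334, 78 and 19 normalized reads, and 23.4% and 5.7%.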
Impact of environmental conditions on sequencing performance.
| Channels QC | Channels with Reads | Total (raw) | 2D (raw) | Pass 2D (raw) | Total (normalized) | 2D (normalized) | Pass 2D (normalized) |
|---|---|---|---|---|---|---|---|
| 365 | 353 | 115673 | 20529 (17.7%) | 5652 (4.9%) | 328 | 58 | 16 |
| 120 | 128 | 11594 | 2307 (19.9%) | 784 (6.8%) | 91 | 18 | 6 |
The table reports the sequencing results obtained from experiments performed under different environmental conditions, i.e. in standard laboratory or tropical greenhouse conditions, the latter to simulate extreme environmental conditions in the field. “Channels QC” and “Channels with Reads” indicate the number of active channels when the flow cell quality control (QC) was performed or during the sequencing, respectively. The raw read counts (Total, 2D and Pass 2D) are divided by the number of sequencing flow cell channels used during the experiment (Channels with reads) in order to normalize for the specific efficiency of each flow-cell and for the sequencing run time (Normalized Reads). Percentages are calculated by dividing the number of 2D and Pass 2D reads by the total number of reads.
MinION sequencing data from experiments involving different sample preparation protocols.
| Protocol | Adjustments | Channels QC | Channels with reads | Total (raw) | 2D (raw) | Pass 2D (raw) | Total (normalized) | 2D (normalized) | Pass 2D (normalized) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | none | 480 | 257 | 170,193 | 10,250 (6.0%) | 3,730 (2.2%) | 662 | 40 | 15 |
| 2 | end-repair and dA-tailing removed | 344 | 320 | 1,839 | 5 (0.2%) | 2 (0.1%) | 4 | 0 | 0 |
| 3 | end-repair and dA-tailing removed, PCR with phosphorylated primers | 208 | 203 | 54,512 | 9,536 (17.5%) | 3,441 (6.3%) | 269 | 47 | 17 |
Protocol 1 includes the dA-tailing and end-repair steps, whereas Protocols 2 and 3 omit them; in Protocol 3, PCR uses phosphorylated primers. “Channels QC” and “Channels with reads” indicate the number of active channels when the flow cell quality control (QC) was performed or during the sequencing, respectively. Read counts (Total, 2D and Pass 2D) are divided by the number of flow cell channels used during the experiment (Channels with reads) to normalize for the specific efficiency of each flow cell and for the sequencing run time (Normalized Reads). Percentages are calculated by dividing the number of 2D and Pass 2D reads by the total number of reads.
MinION sequencing data and sequence identification results.
| Sample species | Gene | Total Reads | 2D Reads | 2D Pass Reads | Similarity % (Loman’s) | Similarity % (ONtoBAR) | Similarity % (Sanger) | Reference (Accession number) |
|---|---|---|---|---|---|---|---|---|
| | 16S | 51,273 | 8,555 | 2,660 | | | | |
| | 16S | 109,047 | 57,110 | 42,102 | | | | |
| | CO1 | 181,123 | 113,663 | 110,921 | | | | |
| | 16S | 97,080 | 16,760 | 8,026 | | | | |
| | 16S | 84,913 | 24,807 | 7,706 | 99% | | | |
| | CO1 | 167,466 | 104,419 | 97,725 | 88% | 99% | 97% | |
| | 16S | 5,039 | 187 | 2 | 97% | 100% | 100% | |
For each experiment, the table reports the sample species, the name of the sequenced gene, the total number of reads obtained by MinION sequencing, the number of 2D reads, and the number of 2D Pass reads. The similarity % columns show the identity scores of the Loman’s consensus, the ONtoBAR consensus and the Sanger sequence, each compared to the reference sequence reported in the last column.
Fig 2. Nucleotide frequencies and coverage of aligned reads.
The nucleotide frequencies (bars) of the aligned reads were calculated at every position using the Sanger sequence as reference. The minimum value of the ‘correct nucleotide’ frequency, i.e. corresponding to the reference, along the entire sequence was 0.66. The sequence coverage (continuous line) was obtained by counting the number of nucleotides aligned over each reference position. The frequency value of the four nucleotides and the coverage are shown in a region with average complexity and a homopolymer run, the latter showing a clear drop in coverage.
Fig 3. Nanopore sequencing coverage vs homopolymer runs.
Read coverage (upper line) and homopolymer runs (bars) along the 532-bp Amietophrynus brauni barcode region are shown. The coverage of MinION reads was calculated with respect to the reference sequence generated by Sanger sequencing. The homopolymer runs indicated with a bar correspond to sequences of at least two identical nucleotides.
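The homopolymer definition used in this figure (at least two identical consecutive nucleotides) can be sketched as follows. The function name and the example sequence are hypothetical and are not taken from the barcode region itself.

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=2):
    """Return (start, base, length) for each run of identical bases.

    Only runs of at least min_length bases are reported; start is the
    0-based position of the first base of the run.
    """
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


# Hypothetical sequence with a CC run and a TTTT run.
print(homopolymer_runs("ACCGTTTTAG"))
```

Raising `min_length` to 3 would match the stricter threshold of more than two identical consecutive nucleotides.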
Fig 4. Forest field laboratory.
The field laboratory set-up for conducting MinION sequencing of amphibians in a montane rainforest of Tanzania.
Fig 5. NCBI BLAST search of the assembled sequence of Arthroleptis xenodactyloides.
Alignment of the assembled consensus sequence of A. xenodactyloides (Query, 741 bp) with the highest-scoring hit retrieved from NCBI BLAST, i.e. the A. xenodactyloides sequence FJ151103.1 (Sbjct, 2313 bp). The two sequences matched with 97% identity. Homopolymer sites with more than two identical consecutive nucleotides are highlighted in red, while mismatches are marked by an arrow.
Comparison of sequencing data generated with the old and new MinION flowcell and chemistries.
| Flowcell | Total Reads | 2D Pass | Aligned Reads | Mean Error | Mismatch | Insertion | Deletion |
|---|---|---|---|---|---|---|---|
| R7.4 | 36,091 | 1,539 (4.2%) | 815 | 13% | 5% | 3% | 5% |
| R9.4 (I) | 160,321 | 98,252 (61%) | 93,476 | 6.5% | 1.6% | 0.8% | 4.1% |
| R9.4 (II) | 300,252 | 205,929 (69%) | 195,887 | 7.4% | 2% | 0.9% | 4.5% |
The table reports the total number of reads and 2D Pass reads generated with the R7.4 flowcell and the MAP005 sequencing kit, or with the R9.4 flowcell and the SQK-LSK208 library preparation kit on two different days (I-II). The number of reads that could be aligned to the reference Sanger sequence is shown, along with the percentage and types of errors detected in the MinION sequences.
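In the table, the three error types add up to the mean error (for example, 1.6% + 0.8% + 4.1% = 6.5% for R9.4 (I)). The sketch below shows one way such a breakdown could be derived from a gapped pairwise alignment. It is an illustration only: the function name, the toy alignment, and the choice of alignment columns as the denominator are assumptions, since the source does not state how the rates were computed.

```python
def error_profile(ref_aligned, read_aligned):
    """Count mismatches, insertions and deletions in a gapped alignment.

    Both strings must have the same length and use '-' for gaps.
    A gap in the reference is an insertion in the read; a gap in the
    read is a deletion. Rates are per alignment column (an assumption).
    """
    if len(ref_aligned) != len(read_aligned):
        raise ValueError("aligned strings must have equal length")
    counts = {"mismatch": 0, "insertion": 0, "deletion": 0}
    for ref_base, read_base in zip(ref_aligned, read_aligned):
        if ref_base == "-" and read_base == "-":
            continue  # empty column, carries no information
        if ref_base == "-":
            counts["insertion"] += 1
        elif read_base == "-":
            counts["deletion"] += 1
        elif ref_base != read_base:
            counts["mismatch"] += 1
    columns = len(ref_aligned)
    rates = {kind: n / columns for kind, n in counts.items()}
    rates["total"] = sum(counts.values()) / columns
    return counts, rates


# Hypothetical 10-column alignment with one error of each type.
counts, rates = error_profile("ACGT-ACGTA", "ACCTTAC-TA")
print(counts)
print(rates["total"])
```

With this definition the total error rate is the sum of the three per-type rates, mirroring the relationship between the Mean Error column and the Mismatch, Insertion and Deletion columns.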