Ryan R Wick, Louise M Judd, Kathryn E Holt.
Abstract
Multiplexing, the simultaneous sequencing of multiple barcoded DNA samples on a single flow cell, has made Oxford Nanopore sequencing cost-effective for small genomes. However, it depends on the ability to sort the resulting sequencing reads by barcode, and current demultiplexing tools fail to classify many reads. Here we present Deepbinner, a tool for Oxford Nanopore demultiplexing that uses a deep neural network to classify reads based on the raw electrical read signal. This 'signal-space' approach allows for greater accuracy than existing 'base-space' tools (Albacore and Porechop) for which signals must first be converted to DNA base calls, itself a complex problem that can introduce noise into the barcode sequence. To assess Deepbinner and existing tools, we performed multiplex sequencing on 12 amplicons chosen for their distinguishability. This allowed us to establish a ground truth classification for each read based on internal sequence alone. Deepbinner had the lowest rate of unclassified reads (7.8%) and the highest demultiplexing precision (98.5% of classified reads were correctly assigned). It can be used alone (to maximise the number of classified reads) or in conjunction with other demultiplexers (to maximise precision and minimise false positive classifications). We also found cross-sample chimeric reads (0.3%) and evidence of barcode switching (0.3%) in our dataset, which likely arise during library preparation and may be detrimental for quantitative studies that use multiplexing. Deepbinner is open source (GPLv3) and available at https://github.com/rrwick/Deepbinner.
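Using two demultiplexers "in conjunction" to maximise precision can be illustrated with a simple agreement rule: a read keeps its barcode only when both tools assign the same one. The sketch below is a minimal illustration under that assumption, not Deepbinner's own interface. The function name `consensus_bins`, the read IDs and the barcode labels are all hypothetical, and `None` stands for an unclassified read.

```python
from typing import Dict, Optional


def consensus_bins(calls_a: Dict[str, Optional[str]],
                   calls_b: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Keep a barcode only when both tools assign the same one to a read."""
    consensus: Dict[str, Optional[str]] = {}
    for read_id in calls_a.keys() | calls_b.keys():
        a = calls_a.get(read_id)
        b = calls_b.get(read_id)
        # None means "unclassified"; any disagreement is also left unclassified.
        consensus[read_id] = a if (a is not None and a == b) else None
    return consensus


# Hypothetical calls from a signal-space tool and a base-space tool.
signal_calls = {"read1": "barcode01", "read2": "barcode02",
                "read3": "barcode03", "read4": None}
base_calls = {"read1": "barcode01", "read2": "barcode05",
              "read3": None, "read4": None}

combined = consensus_bins(signal_calls, base_calls)
```

Under this rule only `read1` stays binned. Fewer reads are classified overall, which is the trade-off the abstract describes between maximising classified reads and maximising precision.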
Year: 2018 PMID: 30458005 PMCID: PMC6245502 DOI: 10.1371/journal.pcbi.1006583
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Neural network architecture.
Layers in the network are drawn as coloured blocks and data as groups of vertical lines. Data dimensions are shown for each step of the process as data length × filter count. Gaussian noise and dropout layers are only active during network training, not during classification.
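The distinction between training-only and always-active layers can be sketched with two small functions that take an explicit `training` flag. This is a generic illustration, not the network's actual code. The function names, the noise standard deviation and the dropout rate are assumptions, and the "inverted dropout" rescaling is a common convention assumed here rather than something stated in the caption.

```python
import random
from typing import List


def gaussian_noise(signal: List[float], stddev: float,
                   training: bool, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise during training; pass through otherwise."""
    if not training:
        return list(signal)
    return [x + rng.gauss(0.0, stddev) for x in signal]


def dropout(values: List[float], rate: float,
            training: bool, rng: random.Random) -> List[float]:
    """Zero a random fraction of values during training; pass through otherwise."""
    if not training:
        return list(values)
    keep = 1.0 - rate  # assumes 0 <= rate < 1
    # Inverted dropout: surviving values are rescaled by 1/keep.
    return [x / keep if rng.random() < keep else 0.0 for x in values]


rng = random.Random(0)
signal = [0.1, -0.4, 0.7, 0.2]

# Classification: both layers act as the identity.
at_inference = dropout(gaussian_noise(signal, 0.05, False, rng), 0.5, False, rng)

# Training: the same input is perturbed and partially zeroed.
at_training = dropout(gaussian_noise(signal, 0.05, True, rng), 0.5, True, rng)
```

With `training=False` the output equals the input, so a given read signal always yields the same classification.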
Classification performance of demultiplexing tools.
| Tool | Binned reads | Precision (known ground truth) | Recall (known ground truth) | Q score range | Binned unknown (other reads) | Binned chimeric (other reads) | Binned negative control (other reads) |
|---|---|---|---|---|---|---|---|
| Albacore | 73.8% | 97.3% | 78.9% | 6.8–11.3 | 23.5% | 84.3% | |
| Porechop | 79.0% | 97.5% | 83.6% | 6.1–11.3 | 33.5% | 72.4% | 1.8% |
| Deepbinner | 92.2% | 98.5% | | 5.0–11.3 | 68.8% | | 0.3% |
| Porechop (lenient) | 95.9% | 92.3% | 89.9% | 4.9–11.3 | 85.8% | 92.0% | 73.5% |
| Porechop (stringent) | 17.2% | 99.7% | 18.8% | 7.8–11.5 | 5.8% | 1.0% | 0.0% |
| Deepbinner (stringent) | 53.4% | 99.4% | 56.8% | 5.7–11.3 | 28.7% | 3.8% | 0.0% |
Classification metrics for the three tested demultiplexers using the amplicon read set. The first three rows show results using the tools’ default parameters. The last three rows show results where parameters were changed to increase or decrease stringency.
Binned reads = proportion of all reads assigned to a barcode. Precision (positive predictive value) = proportion of binned reads correctly assigned. Recall (accuracy) = proportion of all reads correctly assigned. Q score range = mean Phred quality scores of binned reads (2.5th–97.5th percentile). Binned unknown = proportion of unknown reads (those unable to be assigned to any amplicon reference) assigned to a barcode. Binned chimeric = proportion of chimeric reads (those assigned to more than one amplicon reference) assigned to a barcode. Binned negative control = proportion of negative control reads (those from a separate barcode-less library preparation) assigned to a barcode.
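The first three definitions can be written out directly. The sketch below assumes a set of reads with known ground truth and one tool's calls for them. The function name `demux_metrics`, the read IDs and the barcode labels are hypothetical, and `None` marks an unbinned read. Unlike the table, where binned reads are counted over all reads, this sketch computes all three values over the known-ground-truth reads only.

```python
from typing import Dict, Optional


def demux_metrics(truth: Dict[str, str],
                  calls: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Compute binned fraction, precision and recall as defined in the legend."""
    if not truth:
        raise ValueError("need at least one read with known ground truth")
    total = len(truth)
    binned = [r for r in truth if calls.get(r) is not None]
    correct = [r for r in binned if calls[r] == truth[r]]
    return {
        # Proportion of reads assigned to any barcode.
        "binned": len(binned) / total,
        # Proportion of binned reads assigned to the correct barcode.
        "precision": len(correct) / len(binned) if binned else 0.0,
        # Proportion of all reads assigned to the correct barcode.
        "recall": len(correct) / total,
    }


# Hypothetical example: four reads, three binned, two of those correct.
truth = {"r1": "bc01", "r2": "bc02", "r3": "bc03", "r4": "bc04"}
calls = {"r1": "bc01", "r2": "bc02", "r3": "bc01", "r4": None}
metrics = demux_metrics(truth, calls)
```

Here three of four reads are binned and two of those are correct, so the binned fraction is 0.75, precision is 2/3 and recall is 0.5. Making a tool more stringent typically lowers the binned fraction and recall while raising precision, which matches the pattern in the last three rows of the table.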