Harrison S Edwards<sup>1,2</sup>, Raga Krishnakumar<sup>3</sup>, Anupama Sinha<sup>3</sup>, Sara W Bird<sup>4,5</sup>, Kamlesh D Patel<sup>1,6</sup>, Michael S Bartsch<sup>7</sup>
Abstract
The Oxford MinION, the first commercial nanopore sequencer, is also the first to implement molecule-by-molecule real-time selective sequencing or "Read Until". As DNA transits a MinION nanopore, real-time pore current data can be accessed and analyzed to provide active feedback to that pore. Fragments of interest are sequenced by default, while DNA deemed non-informative is rejected by reversing the pore bias to eject the strand, providing a novel means of background depletion and/or target enrichment. In contrast to the previously published pattern-matching Read Until approach, our RUBRIC method is the first example of real-time selective sequencing where on-line basecalling enables alignment against conventional nucleic acid references to provide the basis for sequence/reject decisions. We evaluate RUBRIC performance across a range of optimizable parameters, apply it to mixed human/bacteria and CRISPR/Cas9-cut samples, and present a generalized model for estimating real-time selection performance as a function of sample composition and computing configuration.
Year: 2019 PMID: 31391493 PMCID: PMC6685950 DOI: 10.1038/s41598-019-47857-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Schematic of the RUBRIC workflow illustrating the division of computational effort between two garden-variety PCs: a laptop that runs the MinION sequencer and its MinKNOW software, interfaced through the Read Until API (via Ethernet) to a desktop system that performs the key RUBRIC operations of pre-screening reads for admission to the decision process, basecalling and aligning reads to nucleic acid target reference(s) in real time, and communicating any resulting skip/reject decisions back to MinKNOW.
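The per-read decision step on the desktop side can be illustrated with a minimal sketch. This is not the RUBRIC code or the Read Until API. `QueuedRead` and `decide` are hypothetical names, and the `basecall`/`align` callables are stand-ins for the real basecaller, the LAST aligner, and MinKNOW messaging. Two values come from the source: the 40–130 pA mean-current bounds match the early lambda runs in the experiment table, and the 2-second queue timeout matches the Figure 2 caption. Everything else is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

# Parameter values mirror those reported for the lambda runs; the structure is hypothetical.
MEAN_BOUNDS_PA = (40.0, 130.0)  # mean-current pre-screen window (runs A1-C)
QUEUE_TIMEOUT_S = 2.0           # reads queued longer than this go undecided


@dataclass
class QueuedRead:
    """Hypothetical record for a read waiting in the decision queue."""
    read_id: str
    mean_current_pa: float  # mean pore current of the evaluated signal chunk
    enqueued_at: float      # queue entry time, in seconds


def decide(read: QueuedRead,
           now: float,
           basecall: Callable[[QueuedRead], str],
           align: Callable[[str], Optional[str]],
           targets: Set[str]) -> str:
    """Return 'out_of_threshold', 'undecided', 'skip', or 'sequence'."""
    lo, hi = MEAN_BOUNDS_PA
    if not lo <= read.mean_current_pa <= hi:
        return "out_of_threshold"  # not admitted to the decision process
    if now - read.enqueued_at > QUEUE_TIMEOUT_S:
        return "undecided"         # timed out; the strand keeps sequencing by default
    sequence = basecall(read)
    hit = align(sequence)          # best-matching reference name, or None if unaligned
    if hit is None:
        return "undecided"         # no basis for a decision
    # 'skip' would be sent back to MinKNOW as a request to reverse the bias and eject.
    return "sequence" if hit in targets else "skip"
```

In use, a read whose basecalls align to a target reference returns `"sequence"` and stays in the pore. A read that aligns to any other reference returns `"skip"`.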
Summary of RUBRIC experiments and parametric variations for preliminary lambda DNA experiments A1–B1, mainline EagI-digested lambda DNA experiments B2–E2, and example use-case experiments F and G, in which Cas9-cut rDNA was selected from E. coli gDNA and E. coli gDNA was selected from human gDNA, respectively.
| Run | Input DNA (ng) | Active Odd/Even Pores | Run Time (min) | Event Sampler (min) | Fast5 Files | Queue Size (reads) | Threshold Filter Limits (pA)<sup>a</sup> | Digest<sup>b</sup> | Library Preparation<sup>b</sup> | Experiment Change Summary<sup>c</sup> |
|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 335 | 246/243 | 168 | 167 | 48,525 | 12 | 40–130 (M) | Fresh EagI | Fresh 1D | 2 Laptops, LAST args: -fTAB -C2 -q 1 -r 1 -a 1 -b 1 -D 100 -e 15 |
| A2 | 335 | 206/204 | 175 | 174 | 43,947 | 12 | 40–130 (M) | Same as A1 | Same sample as A1 | LAST args: -fTAB -C2 -q 1 -r 1 -a 1 -b 1 |
| B1 | 308 | 221/222 | 194 | 194 | 82,334 | 24 | 40–130 (M) | Same as A1 (7d) | Fresh 1D | Laptop + desktop, increased queue size |
| B1* | 308 | 221/221 | 11 | 11 | 5,401 | 24 | 40–130 (M) | Same as A1 (7d) | Fresh 1D | (*Time-filtered to remove periods of failed skipping) |
| B2 | 308 | 189/199 | 175 | 174 | 87,818 | 24 | 40–130 (M) | Same as A1 (7d) | Same sample as B1 | RUBRIC desktop operated in Safe Mode |
| C | 322 | 208/219 | 515 | 270 | 60,003 | 12 | 40–130 (M) | Same as A1 (8d) | Frozen (1d), same prep as B1 | Reduced queue size, frozen library |
| D | 386 | 227/228 | 215 | 215 | 100,513 | 16 | 70–110 (M) | Fresh EagI | Fresh 1D | Adjusted mean threshold, increased queue |
| E1 | 380 | 207/221 | 132 | 132 | 31,871 | 16 | 8.48–14.10 (S) | Same as D (2d) | Frozen (2d), same prep as D | Standard deviation (SD) threshold, frozen library |
| E2 | 380 | 199/205 | 431 | 209 | 54,048 | 16 | 5.46–14.56 (S) | Same as D (2d) | Same sample as E1 | Adjusted SD threshold |
| E2* | 380 | 199/205 | 424 | 202 | 52,509 | 16 | 5.46–14.56 (S) | Same as D (2d) | Same sample as E1 | (*Time-filtered to remove periods of failed skipping) |
| F | 125 | 241/246 | 882 | 833 | 18,704 | 16 | 5–15.2 (S) | Cas9 rDNA | Fresh 1D<sup>b</sup> | Adjusted SD threshold |
| F* | 125 | 241/246 | 830 | 781 | 17,911 | 16 | 5–15.2 (S) | Cas9 rDNA | Fresh 1D<sup>b</sup> | (*Time-filtered to remove periods of failed skipping) |
| G | 4000 | 127/126 | 77 | 74 | 36,651 | 16 | 10–16 (S) | None | Fresh rapid kit | MinKNOW 1.11.5, adjusted SD threshold, 600-event evaluation window, LAST args: -fTAB -C2 -q 1 -r 1 -a 1 -b 1 -e 30 |
| G* | 4000 | 127/126 | 65 | 62 | 29,880 | 16 | 10–16 (S) | None | Fresh rapid kit | (*Time-filtered to remove periods of failed skipping) |
<sup>a</sup>Lower and upper threshold filter bounds based on either the mean (M) or the standard deviation (S) of the pore current trace.
<sup>b</sup>Fresh digests and library preparations were performed on the day of the sequencing run. For previously prepared digests and frozen libraries (see Supplementary Section S5), the storage time in days is indicated.
<sup>c</sup>Unless otherwise noted, adjustments in the Experiment Change Summary column apply to all subsequent runs.
*Dataset time-filtered to eliminate reads from periods of failed skipping; see Supplementary Section S3.
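Footnote a describes the pre-screen as a bounds check on either the mean (M) or the standard deviation (S) of the pore current trace. The sketch below illustrates such a check, with `passes_threshold` as a hypothetical name. The source does not state three things, so each is an assumption here:

- whether the statistic is computed over raw samples or segmented events;
- whether the population or the sample SD is used;
- how the 600-event evaluation window of run G is applied.

The sketch simply uses the sample SD over whatever values are passed, optionally truncated to a leading window.

```python
import statistics
from typing import Optional, Sequence


def passes_threshold(current_pa: Sequence[float],
                     lower: float,
                     upper: float,
                     stat: str = "M",
                     window: Optional[int] = None) -> bool:
    """Return True if the mean ('M') or sample standard deviation ('S') of the
    current trace lies within [lower, upper] pA.

    `window` optionally restricts the check to the first `window` values; it is
    a hypothetical stand-in for an evaluation window."""
    values = list(current_pa[:window])  # [:None] keeps the whole trace
    if stat == "M":
        value = statistics.fmean(values)
    elif stat == "S":
        value = statistics.stdev(values)  # sample SD; needs at least 2 values
    else:
        raise ValueError(f"stat must be 'M' or 'S', got {stat!r}")
    return lower <= value <= upper
```

For example, `passes_threshold(trace, 5.46, 14.56, stat="S")` applies the SD bounds listed for run E2, and `passes_threshold(trace, 70, 110)` applies the mean bounds listed for run D.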
Figure 2. Sankey chart depicting read and fast5 sequence file data flow analysis for Experiment B2. Because the target lambda DNA fragment was a subset of the overall lambda (background) sequence, no reads mapped exclusively to the target. All correctly mapped target reads therefore appear in the “both” category at the three-pronged terminal ends of each chart branch. Undecided read counts shown here include both reads that timed out of the decision process (>2 seconds in the queue) and reads that did not otherwise receive a decision.
Performance metrics for RUBRIC selective sequencing experiments including preliminary lambda DNA runs A1 through B1, mainline lambda experiments B2 through E2, and application examples F and G.
| Metric | A1 | A2 | B1 | B1* | B2 | C | D | E1 | E2 | E2* | F | F* | G | G* |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odd Reads/min | 1644 | 1464 | 1873 | 2426 | 2185 | 1040 | 2700 | 1163 | 1465 | 1465 | 2095 | 2117 | 1560 | 1480 |
| In-Threshold Reads/min | 826.3 | 899.3 | 296.6 | 371.0 | 390.5 | 558.3 | 243.7 | 155.3 | 203.7 | 205.0 | 99.95 | 104.0 | 297.4 | 292.5 |
| Odd Reads/pore/min | 8.959 | 8.596 | 10.39 | 11.40 | 11.95 | 6.868 | 13.02 | 6.032 | 7.846 | 7.834 | 9.999 | 9.975 | 12.82 | 13.17 |
| In-Threshold Reads/pore/min | 4.485 | 5.366 | 1.668 | 1.768 | 2.062 | 4.045 | 1.198 | 0.779 | 1.061 | 1.065 | 0.474 | 0.490 | 2.469 | 2.619 |
| Average Pore Vacancy | 70.2% | 64.6% | 65.2% | 71.0% | 72.0% | 65.2% | 77.2% | 81.9% | 84.0% | 83.9% | 98.3% | 98.3% | 78.2% | 77.2% |
| Absolute Sequence Enrichment<sup>a,b</sup> | 0.578 | 0.759 | 0.135 | 0.972 | 0.987 | 0.949 | 0.888 | 0.756 | 0.940 | 0.939 | 0.413 | 0.422 | 1.128 | 1.149 |
| Absolute Read Enrichment<sup>a,c</sup> | 0.580 | 0.782 | 0.134 | 0.949 | 1.021 | 0.926 | 0.797 | 0.761 | 0.902 | 0.901 | 0.449 | 0.456 | 1.010 | 1.055 |
| Relative Read Enrichment<sup>a,d</sup> | 1.674 | 216.5 | 362.2 | 209.3 | 198.4 | 329.5 | 182.2 | 160.9 | 131.3 | 136.5 | 281.8 | 289.1 | 298.3 | 288.0 |
| Throughput Ratio<sup>a,e</sup> | 1.102 | 1.293 | 1.025 | 1.181 | 1.212 | 1.271 | 1.185 | 1.067 | 0.920 | 0.924 | 1.081 | 1.092 | 1.082 | 1.093 |
| Decision Efficiency<sup>f</sup> | 63.6% | 72.1% | 14.7% | 94.0% | 99.3% | 86.1% | 98.8% | 99.4% | 99.5% | 99.5% | 97.6% | 97.7% | 99.3% | 99.9% |
| Timeout Fraction<sup>g</sup> | 44.7% | 4.5% | 100.0% | 98.4% | 96.0% | 44.7% | 99.8% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% | 97.5% | 82.6% |
| Threshold Filter<sup>h</sup>: Sensitivity | 0.988 | 0.985 | 0.973 | 0.979 | 0.929 | 0.987 | 0.892 | 0.799 | 0.952 | 0.951 | 0.282 | 0.263 | 0.972 | 0.972 |
| Threshold Filter: Specificity | 0.581 | 0.540 | 0.919 | 0.931 | 0.918 | 0.566 | 0.965 | 0.920 | 0.902 | 0.902 | 0.956 | 0.955 | 0.917 | 0.917 |
| Threshold Filter: Precision | 0.135 | 0.089 | 0.525 | 0.508 | 0.462 | 0.114 | 0.566 | 0.372 | 0.373 | 0.372 | 0.010 | 0.009 | 0.583 | 0.593 |
| Threshold Filter: Accuracy (MCC) | 0.276 | 0.215 | 0.682 | 0.678 | 0.621 | 0.249 | 0.693 | 0.508 | 0.562 | 0.561 | 0.046 | 0.041 | 0.718 | 0.724 |
| Skip/Sequence Decision<sup>h</sup>: Sensitivity | 0.996 | 0.992 | 0.915 | 0.915 | 0.859 | 0.991 | 0.884 | 0.930 | 0.902 | 0.904 | 0.961 | 0.965 | 0.804 | 0.795 |
| Skip/Sequence Decision: Specificity | 0.584 | 0.982 | 0.995 | 0.995 | 0.993 | 0.985 | 0.991 | 0.951 | 0.963 | 0.963 | 0.998 | 0.998 | 0.996 | 0.996 |
| Skip/Sequence Decision: Precision | 0.122 | 0.765 | 0.976 | 0.976 | 0.962 | 0.853 | 0.962 | 0.793 | 0.801 | 0.802 | 0.715 | 0.714 | 0.457 | 0.475 |
| Skip/Sequence Decision: Accuracy (MCC) | 0.266 | 0.863 | 0.934 | 0.934 | 0.892 | 0.913 | 0.904 | 0.828 | 0.824 | 0.825 | 0.828 | 0.829 | 0.604 | 0.612 |
| RUBRIC Overall<sup>h</sup>: Sensitivity | 0.633 | 0.717 | 0.133 | 0.832 | 0.795 | 0.847 | 0.790 | 0.799 | 0.865 | 0.866 | 0.602 | 0.599 | 0.771 | 0.776 |
| RUBRIC Overall: Specificity | 0.884 | 0.994 | 1.000 | 1.000 | 0.999 | 0.994 | 0.999 | 0.995 | 0.995 | 0.995 | 1.000 | 1.000 | 0.999 | 0.999 |
| RUBRIC Overall: Precision | 0.122 | 0.765 | 0.976 | 0.976 | 0.962 | 0.853 | 0.962 | 0.793 | 0.801 | 0.802 | 0.715 | 0.714 | 0.457 | 0.475 |
| RUBRIC Overall: Accuracy (MCC) | 0.240 | 0.734 | 0.355 | 0.899 | 0.871 | 0.844 | 0.870 | 0.791 | 0.829 | 0.830 | 0.656 | 0.654 | 0.593 | 0.607 |
<sup>a</sup>Normalized with respect to the total even and odd active pore times given in Supplementary Table S1.
<sup>b</sup>Cumulative length of target reads receiving a sequence decision divided by the cumulative length of odd-pore target reads.
<sup>c</sup>Count of target reads receiving a sequence decision divided by the count of odd-pore target reads.
<sup>d</sup>Target/non-target read-count ratio for reads receiving a sequence decision, divided by the target/non-target read-count ratio for odd-pore reads.
<sup>e</sup>Even-pore sampled read count divided by odd-pore sampled read count.
<sup>f</sup>Percentage of in-threshold reads receiving a skip or sequence decision.
<sup>g</sup>Percentage of undecided reads that received no decision because of the 2-second RUBRIC queue timeout.
<sup>h</sup>Binary classifier-based performance metrics are detailed in Supplementary Section S1.
*Dataset time-filtered to eliminate reads from periods of failed skipping; see Supplementary Section S3.
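Footnote h defers the classifier definitions to Supplementary Section S1, which is not reproduced here. As an illustration, the sketch below uses the standard confusion-matrix definitions of sensitivity, specificity, precision, and the Matthews correlation coefficient (MCC). It also includes the two ratios described in footnotes f and g. The names `Confusion`, `classifier_metrics`, `decision_efficiency`, and `timeout_fraction` are hypothetical. How RUBRIC assigns each read to a confusion-matrix cell at each stage is an assumption left to the caller.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """Counts for one stage (threshold filter, skip/sequence decision, or overall).
    A 'positive' outcome means the read passed the stage (assumed convention)."""
    tp: int  # target reads with a positive outcome
    fp: int  # non-target reads with a positive outcome
    tn: int  # non-target reads with a negative outcome
    fn: int  # target reads with a negative outcome


def classifier_metrics(c: Confusion) -> Dict[str, float]:
    """Standard binary-classifier metrics; NaN when a denominator is zero."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "sensitivity": ratio(c.tp, c.tp + c.fn),
        "specificity": ratio(c.tn, c.tn + c.fp),
        "precision": ratio(c.tp, c.tp + c.fp),
        "mcc": (c.tp * c.tn - c.fp * c.fn) / denom if denom else float("nan"),
    }


def decision_efficiency(decided: int, in_threshold: int) -> float:
    """Footnote f: fraction of in-threshold reads given a skip or sequence decision."""
    return decided / in_threshold


def timeout_fraction(timed_out: int, undecided: int) -> float:
    """Footnote g: fraction of undecided reads that hit the queue timeout."""
    return timed_out / undecided
```

MCC is a common choice when classes are imbalanced. In the table, run G pairs an overall specificity of 0.999 with a precision of only 0.457, a gap that a specificity figure alone would hide.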
Figure 4. Read length histograms for RUBRIC selection experiments illustrating the distribution of different read types (target, non-target, unmapped) and their fate as a function of RUBRIC selection applied to even-numbered pores. Here, reads excluded by the selection process (i.e. not receiving an affirmative sequence decision) include skipped, out-of-threshold, and undecided reads, while reads not mapped to target include those mapped to background/non-target sequence as well as unmappable reads. (a) Lambda DNA experiment B2 showing selection for the middle (nominally ~17 kb) fragment. (b) Example use case dataset F* showing selection for Cas9-excised rDNA from E. coli gDNA. (c,d) Example use case dataset G* showing selection of 1% E. coli gDNA from a background of 99% human gDNA. Supplementary Fig. S10 provides more detailed distributions of all read types and categories.
Figure 3. Lambda DNA sequence coverage plot for Experiment B2 showing the effect of RUBRIC selection applied to even-pore reads, in contrast to unselected odd-pore reads. Even and odd coverage values are normalized by total even and odd active pore times, respectively.
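Figure 3 and footnote a of the performance table both normalize even- and odd-pore quantities by their total active pore times. This makes the selected (even) and unselected (odd) halves of the flow cell comparable even when their active durations differ. The sketch below illustrates that normalization and the ratios in footnotes b–e. The function names are hypothetical, and the active pore times would come from Supplementary Table S1.

```python
from typing import List, Sequence


def normalize(values: Sequence[float], active_pore_time: float) -> List[float]:
    """Divide read counts or per-position coverage by total active pore time."""
    return [v / active_pore_time for v in values]


def time_normalized_ratio(even_amount: float, even_time: float,
                          odd_amount: float, odd_time: float) -> float:
    """Even-pore amount per unit active pore time over the odd-pore equivalent.

    Footnote b: amounts are cumulative target read lengths (even = reads given a
    sequence decision). Footnote c: amounts are target read counts.
    Footnote e: amounts are sampled read counts (throughput ratio)."""
    return (even_amount / even_time) / (odd_amount / odd_time)


def relative_read_enrichment(even_target: int, even_nontarget: int,
                             odd_target: int, odd_nontarget: int) -> float:
    """Footnote d: target/non-target ratio for sequence-decision reads over the
    same ratio for odd-pore reads."""
    return (even_target / even_nontarget) / (odd_target / odd_nontarget)
```

Relative read enrichment needs no explicit time normalization. Within each ratio the pore time cancels, since (T/t)/(N/t) = T/N, so footnote a has no numerical effect on that row.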