
Nanopore sequencing detects structural variants in cancer.

Alexis L Norris1, Rachael E Workman2, Yunfan Fan2, James R Eshleman1, Winston Timp2.   

Abstract

Despite advances in sequencing, structural variants (SVs) remain difficult to reliably detect due to the short read length (<300 bp) of 2nd generation sequencing. Not only do the reads (or paired-end reads) need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment of short reads. We propose to use the long-reads (up to 20 kb) possible with 3rd generation sequencing, specifically nanopore sequencing on the MinION. Nanopore sequencing relies on a similar concept to a Coulter counter, reading the DNA sequence from the change in electrical current resulting from a DNA strand being forced through a nanometer-sized pore embedded in a membrane. Though nanopore sequencing currently has a relatively high mismatch rate that precludes base substitution and small frameshift mutation detection, its accuracy is sufficient for SV detection because of its long reads. In fact, long reads in some cases may improve SV detection efficiency. We have tested nanopore sequencing to detect a series of well-characterized SVs, including large deletions, inversions, and translocations that inactivate the CDKN2A/p16 and SMAD4/DPC4 tumor suppressor genes in pancreatic cancer. Using PCR amplicon mixes, we have demonstrated that nanopore sequencing can detect large deletions, translocations and inversions at dilutions as low as 1:100, with as few as 500 reads per sample. Given the speed, small footprint, and low capital cost, nanopore sequencing could become the ideal tool for the low-level detection of cancer-associated SVs needed for molecular relapse, early detection, or therapeutic monitoring.

Keywords:  3rd generation sequencing; DNA sequencing; Deletions; cancer diagnostics; inversions; nanopore sequencing; next generation sequencing; structural variation; translocations; tumor suppressor gene

Year:  2016        PMID: 26787508      PMCID: PMC4848001          DOI: 10.1080/15384047.2016.1139236

Source DB:  PubMed          Journal:  Cancer Biol Ther        ISSN: 1538-4047            Impact factor:   4.742


Abbreviations

SV, Structural Variation; TSG, Tumor Suppressor Gene; PDAC, Pancreatic Ductal Adenocarcinoma; PCR, Polymerase Chain Reaction.

Introduction

Structural variants (SVs) are a hallmark of the genomic instability that underlies cancer, and include translocations, large deletions, amplifications, and inversions. SVs are often driver alterations, with translocations and amplifications activating oncogenes, and deletions and inversions inactivating tumor suppressor genes (TSGs). CDKN2A/p16 and SMAD4/DPC4 are 2 of the most commonly deleted TSGs in human cancer, and complex SVs have been found to underlie approximately half of these deletions in pancreatic ductal adenocarcinoma (PDAC). The sensitive detection of tumor-specific mutations, including both small alterations such as single base substitutions and large alterations such as SVs, of circulating tumor DNA is critical for applications such as molecular relapse, early detection, and possibly therapeutic monitoring of cancer patients. The arrival of 2nd generation sequencing has provided ample opportunity to investigate small alterations, but the large SV alterations remain under-studied because of the difficulty detecting them with the short reads (<300 bp) of 2nd generation sequencing. Not only do the paired-end reads need to straddle a breakpoint, but repetitive elements often lead to ambiguities in the alignment. Given that repetitive regions (including centromeres, telomeres, and other repetitive elements) encompass over half (56%) of the human genome, this is a significant concern when mapping SVs. The long reads (up to 20 kb) generated by 3rd generation DNA sequencing strategies can easily straddle these repetitive regions, allowing for unique alignment. Until recently, 3rd generation sequencing was limited to PacBio, which requires a high capital investment, a large footprint, and technical expertise. These factors limit the utility of PacBio-based 3rd generation sequencing in clinical testing. The new 3rd generation sequencing platform, the MinION™ (Oxford Nanopore Technologies™), lacks the prohibitive factors of PacBio. 
The MinION instrument is the size of a large USB stick, with low (∼$1k) capital cost and easy operation. Thus, nanopore sequencing on a MinION instrument may prove to be a valuable tool for clinical testing. Nanopore sequencing, first proposed by Church et al., operates on a principle similar to that of a Coulter counter, using a measurement of the current through a hole in a membrane to characterize the sample passing through the hole. In the case of nanopore sequencing, the hole is nanometers in diameter, and the DNA molecule passing through the pore influences the current in a way that is characteristic of the local base sequence. The MinION device consists of 512 independently addressed measurement channels, each with 4 sensor wells. The software controlling the MinION selects the “best” sensor during a process called multiplexing, which is repeated several times throughout a sequencing run. Each sensor well has a semi-synthetic membrane containing a proprietary protein pore molecule. An electric field is applied across the membrane, both allowing current measurement and providing the motive force for driving the negatively charged DNA molecule through the pore. The DNA library is enriched at the membrane surface via the tether, and then diffuses along the membrane. When the DNA leader is within range of the pore, it is captured and driven through up to the motor protein. The DNA is driven through the pore primarily by the electric field acting on the charged phosphate backbone, with translocation velocity controlled by a proprietary motor protein coupled to the DNA molecule. The pore is large enough for only a single-stranded DNA molecule; 5 bases are within the central constriction of the pore at a given time and have a significant influence on the current. After the forward DNA strand has completely run through, the hairpin is run, and then the reverse (complement) strand is also sequenced. 
The consensus of the top and bottom reads is termed a “2D read” and increases the accuracy of base calling. Although nanopore sequencing still has a high error rate, it is rapidly improving; in our hands v7 flowcells had an average of 67.4% of the read correct, 24.2% mismatched, 7.5% insertions and 8.3% deletions, but the newer v7.3 flowcells had an average of 86% correct, with 9.7% mismatch, 4.2% insertion and 4.4% deletion, a dramatic improvement. Though the high error rate currently precludes application to detecting single base substitutions (e.g., KRAS codons 12 and 13) and small frameshift mutations, the long read length easily generated with nanopore sequencing (an average of 8 kb) enables easy detection of SVs even in repetitive regions. In this paper, we demonstrate the ability of nanopore sequencing to detect SVs that inactivate the p16 and SMAD4 TSGs in PDAC cancer cell lines. Our set of 10 SVs includes large deletions, translocations, inversions, and the complex combination of a translocation and inversion (“TransFlip”). These SVs were previously defined by SNP microarray and whole genome sequencing (WGS), and confirmed by PCR and Sanger sequencing across the junctions. We show proof-of-principle by diluting PCR products containing these SVs into the corresponding wildtype amplicons, and demonstrate the ability to detect these SVs at 1:100 dilutions.

Results

Ability to detect simple and complex SVs

To demonstrate the value of long read sequencing to detect SVs, we selected a panel of 10 well-characterized SVs in the genes CDKN2A/p16 and SMAD4/DPC4 identified in pancreatic cancer cell lines. These 10 SVs included 2 interstitial deletions, 4 translocations, 4 inversions, and 1 combination of an inversion and translocation (“TransFlip” mutations, Table 1). Wildtype (WT) sequence (intact genomic sequence; no SV) served as a control and one SV (SV01) had a technical replicate (SV07). Using Oxford Nanopore barcodes, libraries for all 12 PCR amplicons were generated and multiplexed in one flowcell run (Fig. 1). The run produced a total of 3,987 2D reads from 194 of 512 channels, resulting in a 2.5 Mb yield. The average read was 640 bp long, the full length of our PCR products, with an average PHRED score of 11.50 (Fig. 2 A-B, Table 2). Importantly, nanopore read quality shows no discernible dependence on read length, in contrast to the cycle dephasing commonly seen in 2nd generation sequencing.
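For orientation, an average PHRED score such as the one reported above can be converted into a nominal per-base error probability using the standard relation Q = −10·log10(p). The following is a minimal Python sketch of that conversion; the function names are illustrative and are not taken from the authors' code.

```python
import math


def phred_to_error_prob(q):
    """Convert a PHRED quality score Q into a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert a per-base error probability into a PHRED quality score."""
    return -10 * math.log10(p)


# Average PHRED scores reported for the two flowcell runs.
for q in (11.5, 10.9):
    print(f"Q{q}: nominal error probability {phred_to_error_prob(q):.3f}")
```

The match and mismatch rates in Table 2 are measured from alignments rather than derived from PHRED scores, so the nominal value computed here need not agree with them.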
Table 1.

Details of Amplicons included in this study.

| Amplicon ID | Amplicon Size Without Barcodes (bp) | TSG Deleted | SV Type | SV Left Breakpoint (hg19) | SV Right Breakpoint (hg19) | Expected alignment: Left (hg19) | Expected alignment: Center (hg19) | Expected alignment: Right (hg19) |
|---|---|---|---|---|---|---|---|---|
| SV01, SV07 | 573 | p16 | TRANS | chr9:24,353,014 | chr22:36,338,191 | chr9:24352894-24353014(+) | | chr22:36338191-36338601(+) |
| SV02 | 579 | p16 | (WT) | chr9:21,970,115 | chr9:21,970,649 | chr9:21970115-21970649(−) | | |
| SV03 | 562 | p16 | INV+TRANS | chr9:21,083,362 | chr9:21,083,521 | chr9:21083139-21083362(+) | chr9:21083440-21083521(−) (81bp) | chr3:79387683-79387899(−) |
| SV04 | 573 | p16 | TRANS | chr10:132,412,941 | chr9:27,096,867 | chr10:132412940-132413131(−) | | chr9:27096866-27097203(+) |
| SV05 | 576 | SMAD4 | ID | chr18:48,570,319 | chr18:49,191,882 | chr18:48569959-48570319(+) | | chr18:49191882-49192052(+) |
| SV06 | 561 | p16 | INV | chr9:24,320,470 | chr9:24,323,843 | chr9:24323843-24324156(−) | | chr9:24320470-24320672(+) |
| SV08 | 559 | p16 | INV | chr9:25,968,399 | chr9:25,969,868 | chr9:25968120-25968399(+) | | chr9:25969634-25969868(−) |
| SV09 | 581 | p16 | INV | chr9:25,969,504 | chr9:25,972,326 | chr9:25969502-25969820(−) | | chr9:25972324-25972543(+) |
| SV10 | 584 | p16 | INV+TRANS | chr9:21,326,884 | chr7:140,023,555 | chr9:21326735-21326867(+) | chr9:21326884-21326931(−) (47bp) | chr7:140023553-140023913(+) |
| SV11 | 578 | SMAD4 | TRANS | chr6:124,911,349 | chr18:53,465,051 | chr6:124911349-124911707(−) | | chr18:53465049-53465220(+) |
| SV12 | 573 | SMAD4 | ID | chr18:48,434,141 | chr18:49,851,882 | chr18:48433731-48434141(+) | | chr18:49851882-49852004(+) |

Tumor Suppressor Gene (TSG), Structural Variant (SV), Translocation (TRANS), Wild-type Control (WT), Inversion (INV), Interstitial Deletion (ID).

Figure 1.

Nanopore Library Prep Workflow. Oxford Nanopore barcodes were incorporated into amplicons by PCR, individually for each SV, and the resultant reactions were then pooled (A). After NEB End Repair and dA-tailing modules (B), hairpin and leader adapters were ligated on, each containing a motor protein. Only the hairpin protein contained a his-tag, which was used to enrich for molecules containing a leader adapter and his-tag (his-tag selection step not shown). Tether attachment (C) allowed for direct attachment of the molecules to the flow cell membrane. Within the MinION flowcell (D), DNA molecules are pulled through a protein pore (blue), with the motor protein (orange) controlling the speed of DNA translocation through the pore. One side of the DNA molecule is read, then the hairpin, then the second side. Both reads were aligned to produce a 2D consensus read.

Figure 2.

Nanopore sequencing QC data. QC of Flow cell 1 A) length and B) PHRED quality histograms of each of the barcodes as a stacked bar graph. Average length of 570 bp and PHRED score of 11.5. QC of flow cell 2 C) length and D) PHRED quality histograms. Average length of 573 bp and PHRED score of 10.9.

Table 2.

Yield and Quality of Exp1, Limited to 2D reads.

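| Amplicon | Avg. Length (bp) | Yield (bp) | Yield (reads) | Quality (PHRED) | % Match | % Mismatch | % Insertion | % Deletion |
|---|---|---|---|---|---|---|---|---|
| SV01 | 533.07 | 53,307 | 100 | 11.52 | 81.3% | 12.6% | 2.3% | 6.1% |
| SV02 | 582.07 | 81,490 | 140 | 10.98 | 79.7% | 13.2% | 2.4% | 7.1% |
| SV03 | 555.62 | 228,914 | 412 | 11.86 | 76.3% | 15.7% | 2.6% | 8.0% |
| SV04 | 562.60 | 200,285 | 356 | 11.50 | 75.2% | 16.0% | 2.5% | 8.8% |
| SV05 | 596.14 | 134,131 | 225 | 11.31 | 79.0% | 13.6% | 2.2% | 7.4% |
| SV06 | 548.78 | 311,156 | 567 | 11.47 | 81.2% | 12.5% | 2.2% | 6.3% |
| SV07 | 560.40 | 44,832 | 80 | 11.53 | 80.9% | 12.7% | 2.2% | 6.4% |
| SV08 | 547.34 | 266,554 | 487 | 11.33 | 78.1% | 14.0% | 1.8% | 8.0% |
| SV09 | 610.68 | 123,358 | 202 | 11.17 | 81.4% | 12.2% | 2.6% | 6.3% |
| SV10 | 595.92 | 182,353 | 306 | 11.58 | 78.2% | 15.4% | 3.0% | 6.4% |
| SV11 | 578.32 | 419,283 | 725 | 11.55 | 77.6% | 14.8% | 2.5% | 7.6% |
| SV12 | 583.76 | 225,914 | 387 | 12.26 | 76.4% | 14.7% | 2.5% | 8.9% |
| Average | 571.22 | 189,298 | 332 | 11.50 | 78.8% | 14.0% | 2.4% | 7.3% |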
All SV amplicons (12/12) mapped to their expected region(s) of hg19 (Table 3, Fig. 3), with an overall mapping percentage of 99.6% and 79% of aligned bases correctly matched (Table 2). Importantly, the representation of the SV amplicons seems independent of the complexity of their SV: intact genomic sequence (SV02) represented 3.5% of aligned reads, and a complex combination of a deletion, inversion, and translocation (SV10) represented 5.1% of aligned reads (Table 3). The technical replicates (SV01, SV07) had comparable results, as expected. 
However, some amplicons had a surprisingly low percentage of properly aligned SV structures; of particular note is SV03, with only 68 reads (16.5%) showing the full alignment of left, center, and right sections. A further 313 reads (76.0%) had the left and right alignment only, and 12 reads (2.9%) had the left and center alignment only.
Table 3.

All SVs are detected by Nanopore multiplex (1:12) experiment [Exp1].

| Amplicon ID | SV Type | 2D reads | % total reads per barcode | 2D reads aligned to hg19 (%) | Reads properly aligned* (%) | Off-target Reads | Lumpy break-points | Top Lumpy Breakpoint |
|---|---|---|---|---|---|---|---|---|
| SV01 | TRANS | 100 | 2.5% | 91 (91.0%) | 77 (77.0%) | 6 (6.0%) | 1 | chr9:24353014/chr22:36338191 (58) |
| SV02 | n/a (WT) | 140 | 3.5% | 139 (99.3%) | 115 (82.1%) | 24 (17.1%) | 0 | None |
| SV03 | INV+TRANS | 412 | 10.3% | 412 (100.0%) | 68 (16.5%) | 1 (0.2%) | 8 | chr3:79387939/chr9:21083384 (132) |
| SV04 | TRANS | 356 | 8.9% | 356 (100.0%) | 303 (85.1%) | 6 (1.7%) | 3 | chr9:27096843/chr10:132412942 (183) |
| SV05 | ID | 225 | 5.6% | 224 (99.6%) | 198 (88.0%) | 7 (3.1%) | 2 | chr18:48570319 (154) |
| SV06 | INV | 567 | 14.2% | 567 (100.0%) | 549 (96.8%) | 7 (1.2%) | 4 | chr9:24320456/chr9:24323864 (120) |
| SV07 | TRANS | 80 | 2.0% | 78 (97.5%) | 70 (87.5%) | 0 (0.0%) | 2 | chr9:24353014/chr22:36338191 (52) |
| SV08 | INV | 487 | 12.2% | 487 (100.0%) | 449 (92.2%) | 3 (0.6%) | 4 | chr9:25968397/chr9:25969868 (384) |
| SV09 | INV | 202 | 5.1% | 202 (100.0%) | 190 (94.1%) | 5 (2.5%) | 2 | chr9:25969501/chr9:25972324 (172) |
| SV10 | INV+TRANS | 306 | 7.7% | 301 (98.4%) | 254 (83.0%) | 1 (0.3%) | 5 | chr7:140023522/chr9:21326914 (167) |
| SV11 | TRANS | 725 | 18.2% | 725 (100.0%) | 471 (65.0%) | 3 (0.4%) | 3 | chr6:124911349/chr18:53465049 (362) |
| SV12 | ID | 387 | 9.7% | 387 (100.0%) | 274 (70.8%) | 3 (0.8%) | 2 | chr18:48434141 (115) |
| Average | | 332 | 8.3% | 331 (98.8%) | 252 (78.2%) | 6 (2.8%) | | |

*To be considered properly aligned, a read must align to all expected regions (e.g., Left sequence, Center sequence, and Right sequence, from Table 1).
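The "properly aligned" criterion in the Table 3 footnote can be sketched in Python as a simple interval check: a read qualifies only if at least one of its aligned segments overlaps each expected region from Table 1. This is an illustrative reconstruction, not the authors' script. The `Region` type, the `min_overlap` threshold, and the example read coordinates are assumptions introduced here; only the SV03 expected regions come from Table 1.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(a, b, min_overlap=1):
    """True if two regions on the same chromosome share at least min_overlap bases."""
    if a.chrom != b.chrom:
        return False
    return min(a.end, b.end) - max(a.start, b.start) >= min_overlap


def is_properly_aligned(read_segments, expected_regions, min_overlap=1):
    """A read is properly aligned if every expected region is hit by some segment."""
    return all(
        any(overlaps(seg, exp, min_overlap) for seg in read_segments)
        for exp in expected_regions
    )


# Expected Left, Center, and Right regions for SV03 (from Table 1).
sv03_expected = [
    Region("chr9", 21083139, 21083362),
    Region("chr9", 21083440, 21083521),
    Region("chr3", 79387683, 79387899),
]

# Hypothetical reads: one hits all three regions, one misses the center.
full_read = [
    Region("chr9", 21083140, 21083360),
    Region("chr9", 21083445, 21083520),
    Region("chr3", 79387690, 79387899),
]
left_right_read = [full_read[0], full_read[2]]

print(is_properly_aligned(full_read, sv03_expected))
print(is_properly_aligned(left_right_read, sv03_expected))
```

Under this definition, a read carrying only the left and right segments of SV03 is counted as not properly aligned, which is the situation described for most SV03 reads.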

Figure 3.

A) IGV screenshot alignment of WT (SV02). B-C) IGV screenshot of Translocation (SV01) alignment. B) shows the alignment to the area in chr9 and C) the alignment to the area in chr22. Note the erroneous extension of the read past the breakpoint in the bottom left. D-E) IGV screenshot of Interstitial Deletion (SV05) alignment. The plot shows the alignment to the area upstream D) and downstream E) of the deletion in chr18. Note the erroneous extension of the read past the breakpoint in the top right. F-G) IGV screenshot of Inversion (SV09) alignment. The plot shows the alignment to the inverted area F) and G) the area downstream of the inversion. We have flipped G) to show how the 2 parts align. Note the erroneous extension of the read past the breakpoint in the top left.


Ability to detect low frequency SVs

We next wanted to determine the sensitivity of nanopore based SV detection to low frequency or rare events, to simulate a clinical scenario. To this end, we performed a 1:100 dilution of 6 SV amplicons in a background of intact p16 genomic sequence (SV02, Table 4). These 6 Amplicons included 2 simple interstitial deletions, 2 translocations, 1 inversion, and 1 complex combination of an inversion and translocation. The run produced a total of 4,058 2D reads from 270 of 512 channels, for a total yield of 2.6 Mb, with an average read length of 650 bp and an average PHRED score of 10.9 (Fig. 2C-D, Table 5). All 6 SV barcodes were represented in the alignment (range 9–21%) and aligned to the expected regions of hg19 (Table 4). Remarkably, even with only 378 2D reads in the case of SV04, the SV was detected, with 10 of the 378 reads supporting a chromosome 9-chromosome 10 translocation.
Table 4.

Results of low frequency serial dilutions of SVs 1:100 into wildtype [Exp2].

| Amplicon ID | SV Type | Dilution into WT | # 2D reads | % per barcode | Reads aligned to hg19 | Aligned reads mapped to WT | Aligned reads mapped to SV* | # Off-target Reads | Lumpy break-points | Top Lumpy Breakpoint |
|---|---|---|---|---|---|---|---|---|---|---|
| SV01 | TRANS | 1:100 | 867 | 21.37% | 851 (98.2%) | 838 (96.7%) | 11 (1.3%) | 3 (0.3%) | 1 | chr9:24353014/chr22:36338197 (7) |
| SV03 | INV+TRANS | 1:100 | 760 | 18.73% | 741 (97.5%) | 685 (90.1%) | 7 (0.9%) | 50 (6.6%) | 3 | chr3:79387933/chr9:21083384 (13) |
| SV04 | TRANS | 1:100 | 378 | 9.31% | 377 (99.7%) | 367 (97.1%) | 10 (2.6%) | 1 (0.3%) | 1 | chr9:27096848/chr10:132412942 (8) |
| SV05 | ID | 1:100 | 577 | 14.22% | 571 (99.0%) | 538 (93.2%) | 31 (5.4%) | 3 (0.5%) | 1 | chr18:48570319 (25) |
| SV09 | INV | 1:100 | 621 | 15.30% | 617 (99.4%) | 601 (96.8%) | 16 (2.6%) | 1 (0.2%) | 1 | chr9:25969504/chr9:25972324 (14) |
| SV12 | ID | 1:100 | 855 | 21.07% | 849 (99.3%) | 810 (94.7%) | 26 (3.0%) | 14 (1.6%) | 1 | chr18:48434141 (11) |
| Mean | | | 676 | 16.67% | 668 (98.8%) | 640 (94.8%) | 17 (2.6%) | 12 (1.6%) | | |

*To be considered properly aligned, a read must align to all expected regions (e.g., Left sequence, Center sequence, and Right sequence, from Table 1).

Table 5.

Yield and Quality of Experiment 2, Limited to 2D reads.

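| Amplicon | Avg. Length (bp) | Yield (bp) | Yield (reads) | Quality (PHRED) | % Match | % Mismatch | % Insertion | % Deletion |
|---|---|---|---|---|---|---|---|---|
| SV01 | 570.85 | 494,925 | 867 | 10.79 | 80.2% | 13.0% | 2.6% | 6.8% |
| SV03 | 573.37 | 435,760 | 760 | 10.83 | 79.2% | 13.7% | 2.8% | 7.1% |
| SV04 | 575.37 | 217,491 | 378 | 10.90 | 80.5% | 12.7% | 2.6% | 6.7% |
| SV05 | 571.22 | 329,593 | 577 | 10.85 | 80.0% | 13.1% | 2.8% | 6.8% |
| SV09 | 572.65 | 355,617 | 621 | 10.91 | 80.6% | 12.8% | 2.7% | 6.6% |
| SV12 | 573.27 | 490,146 | 855 | 10.93 | 80.1% | 13.0% | 2.6% | 6.9% |
| Average | 572.79 | 387,255 | 676 | 10.87 | 80.1% | 13.1% | 2.7% | 6.8% |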

SV breakpoint location detection

To determine the accuracy of breakpoint location detection with this new sequencing methodology, we first employed LUMPY, an established tool for breakpoint detection from both discordant paired-end short reads and long, split-read alignments. Using the alignment files generated by BWA (see Materials and methods), we extracted the split reads and fed the resulting BAM file into LUMPY. The results are included in Tables 3 and 4. For some of our samples, LUMPY detected the correct breakpoint, and only that breakpoint (SV01), or detected no breakpoint in the WT sample (SV02), but in general the breakpoints it detected, though correct in type, lacked precision. In SV07, the duplicate sample of SV01, the same correct breakpoint was detected. Fewer pieces of evidence (as decided by LUMPY) support this breakpoint than are seen when simply examining coverage, because LUMPY has strict map quality filters that remove some of the reads from consideration. In many cases LUMPY detects multiple breakpoints at slightly shifted positions, up to a maximum of 8 breakpoints in SV03. The breakpoint with the plurality of reads accepted by LUMPY is reported in Tables 3 and 4. We examined the alignments more carefully to determine the cause of these artifacts in breakpoint location detection. Figs. 3 B-G give a hint as to the problem: the reads frequently align past the breakpoint, but most of the bases are mismatched in these locations. Because BWA is run with the permissive settings used for nanopore sequencing alignment, it continues the alignment past the breakpoint. We summarize these findings in Table 6. Though in many cases the alignment termini are set to the breakpoints correctly, for SV03 in particular this happens only a minority of the time. When the alignment slips past the downstream boundary of the left fragment, there is no longer sufficient sequence for the center fragment to align.
Table 6.

Alignment termini position error.

| Amplicon ID | Overlapping alignments | Correct Upstream Termini (%) | Mean Upstream Error ± SD | Correct Downstream Termini (%) | Mean Downstream Error ± SD |
|---|---|---|---|---|---|
| SV01, SV07 (L) | 77 | 3 (3.9%) | 1.8 ± 5.4 | 53 (68.8%) | 5.3 ± 14.5 |
| SV01, SV07 (R) | 85 | 4 (4.7%) | 1.1 ± 5.5 | 1 (1.2%) | −1.3 ± 17.3 |
| SV02 | 116 | 7 (6.0%) | 0.1 ± 5.2 | 13 (11.2%) | −4.3 ± 10.1 |
| SV03 (L) | 407 | 6 (1.5%) | 4.9 ± 8.3 | 16 (3.9%) | 20.8 ± 13.7 |
| SV03 (C) | 83 | 1 (1.2%) | 10.8 ± 18.4 | 58 (69.9%) | 4.7 ± 18.7 |
| SV03 (R) | 391 | 51 (13.0%) | 1.8 ± 17.2 | 3 (0.8%) | 39.1 ± 24.2 |
| SV04 (L) | 317 | 23 (7.3%) | 11.0 ± 19.3 | 1 (0.3%) | 7.3 ± 7.0 |
| SV04 (R) | 349 | 2 (0.6%) | 19.0 ± 23.3 | 196 (56.2%) | −5.7 ± 14.2 |
| SV05 (L) | 225 | 4 (1.8%) | 5.7 ± 8.5 | 107 (47.6%) | 2.9 ± 14.2 |
| SV05 (R) | 206 | 1 (0.5%) | 7.3 ± 9.8 | 62 (30.1%) | −3.8 ± 9.7 |
| SV06 (L) | 565 | 0 (0.0%) | −20.9 ± 5.2 | 229 (40.5%) | −1.0 ± 4.5 |
| SV06 (R) | 555 | 4 (0.7%) | 5.6 ± 13.6 | 297 (53.5%) | −1.6 ± 7.8 |
| SV08 (L) | 70 | 6 (8.6%) | 2.7 ± 5.4 | 45 (64.3%) | 2.9 ± 14.7 |
| SV08 (R) | 78 | 3 (3.8%) | 4.4 ± 14.3 | 1 (1.3%) | −1.1 ± 7.9 |
| SV09 (L) | 476 | 125 (26.3%) | 4.4 ± 9.9 | 116 (24.4%) | −2.1 ± 6.6 |
| SV09 (R) | 464 | 37 (8.0%) | 0.0 ± 11.0 | 276 (59.5%) | 7.0 ± 28.4 |
| SV10 (L) | 271 | 32 (11.8%) | −0.5 ± 4.7 | 0 (0.0%) | 49.9 ± 19.4 |
| SV10 (C) | 260 | 0 (0.0%) | 144.0 ± 26.5 | 0 (0.0%) | −9.9 ± 14.7 |
| SV10 (R) | 302 | 0 (0.0%) | 28.8 ± 24.4 | 8 (2.6%) | 15.7 ± 27.6 |
| SV11 (L) | 725 | 4 (0.6%) | 34.7 ± 51.5 | 91 (12.6%) | −5.7 ± 7.9 |
| SV11 (R) | 472 | 8 (1.7%) | −0.1 ± 8.0 | 75 (15.9%) | 2.4 ± 8.6 |
| SV12 (L) | 386 | 72 (18.7%) | 5.2 ± 6.9 | 151 (39.1%) | 4.1 ± 11.2 |
| SV12 (R) | 278 | 11 (4.0%) | −0.6 ± 7.9 | 6 (2.2%) | 9.9 ± 9.9 |
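The per-fragment statistics in Table 6 (count and percentage of correct termini, and mean error ± SD) can be computed from a list of observed alignment terminus positions and the expected breakpoint coordinate. The Python sketch below shows one way to do this. It rests on several assumptions not stated in the text: error is defined as observed minus expected position, a terminus is "correct" only at zero offset (`tolerance=0`), and the sample standard deviation is used. The input positions are made-up values for illustration.

```python
from statistics import mean, stdev


def terminus_error_summary(observed_positions, expected_position, tolerance=0):
    """Summarize signed position errors of alignment termini against a breakpoint.

    Expects a non-empty list of observed terminus coordinates.
    """
    errors = [pos - expected_position for pos in observed_positions]
    n_correct = sum(1 for e in errors if abs(e) <= tolerance)
    return {
        "n": len(errors),
        "n_correct": n_correct,
        "pct_correct": 100.0 * n_correct / len(errors),
        "mean_error": mean(errors),
        # Sample SD needs at least two values; report 0.0 for a single read.
        "sd_error": stdev(errors) if len(errors) > 1 else 0.0,
    }


# Hypothetical downstream termini for four reads around a breakpoint at 100.
summary = terminus_error_summary([100, 100, 103, 97], expected_position=100)
print(summary)
```

A positive mean error under this sign convention would indicate alignments that tend to run past the expected coordinate, which matches the over-extension described in the text.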

Discussion

The accurate and timely detection of tumor-associated alterations, including SVs, is important for patient management, from early detection to monitoring for molecular relapse, as well as determining or predicting chemoresponse. Detection of all tumor-associated alterations is complicated by the low tumor cellularity often present in tumor samples and biopsies due to contaminating normal cells. Tumor-associated SVs are additionally complicated when they occur in repetitive regions, which account for over half of the human genome. The ability of long-read 3rd generation sequencing methods, such as nanopore, to read through repetitive regions could make it an ideal tool for detecting tumor-associated SVs. This work serves as proof-of-principle, showing the ability of nanopore sequencing to correctly and reliably detect SVs with only hundreds, instead of millions of reads. Furthermore, we have demonstrated the feasibility of the MinION for the detection of well-characterized patient-specific SV rearrangements using in vitro mixtures of PCR amplicons at 1:100 dilutions in wildtype sequence. The 4 types of SVs assessed in this study include simple interstitial deletions, translocations, inversions, and the complex combination of a translocation and an inversion (“TransFlip”). This is accomplished despite the error rate (at the single-base level) of this emerging technology, because the read length is relatively long, the human genome is known, and this level of accuracy is sufficient to correctly map hundreds to thousands of bases, even if they contain multiple point mutation errors. Precision of breakpoint location is still limited, but this can be solved bioinformatically, via alignment parameter optimization or breakpoint detection tailored to the idiosyncrasies of nanopore sequencing data. 
The primary advantages of nanopore sequencing over 2nd generation sequencing methods for detection of SVs are its (1) ability to sequence through repetitive regions, (2) speed, and (3) low cost and availability. First, the long-read nature (up to 20 kb) of nanopore sequencing allows reading through repetitive regions. Even with long mate-pair sequencing at deep coverage, the short reads of 2nd generation methods prohibit accurate and efficient mapping of repetitive regions, which often house SVs. Previous work has demonstrated that long-read sequencing on its own can detect novel SVs; 10% of the ∼30,000 SVs detected in a single individual's somatic genome were detected only via long-read PacBio sequencing. Second, the speed of real-time nanopore sequencing offers results in minutes, allowing for rapid diagnosis and treatment. To have 99% confidence of detecting a variant present at 1:100 in the sample, we need ∼450X coverage over the region of interest. In nanopore sequencing, each of the 512 channels can generate a read separately, with each read completed and analyzable in minutes. In the 2 sequencing runs we performed in this paper, generating 450 reads required 15 minutes and 33 minutes, respectively. In contrast, 2nd generation sequencing generates millions of reads simultaneously, but the reads are only complete after hours or days, meaning that any analysis has to wait for completion. For example, the fastest Illumina 2nd generation sequencing run requires 4 hours to obtain 12 M reads of 1 × 36 bp (MiSeq v2), and such short reads would prove challenging for SV detection. In both cases we are omitting the library preparation time, but these times are largely equivalent. Third, the low cost (approximately $1k per device) and small size (USB stick) of the MinION nanopore sequencing instrument offer accessibility to testing in nearly any setting. 
In contrast, the instrumentation for 2nd generation methods requires a substantial upfront investment (>$100k) and sufficient lab space for its large footprint, which are prohibitive for many research and clinical labs. There are currently 2 limitations that restrict the utility of nanopore sequencing: (1) a relatively high mismatch and indel error rate and (2) limited yield (on the scale of megabases or gigabases), but both of these factors continue to improve. In our hands, error rate per read decreased from 32% to 14% over a 6 month period. Better tools for basecalling correction, alignment, and assembly have already been generated by the community. While still insufficient for whole-genome sequencing, the MinION yield has been increasing, and yields per flow cell reported by other groups have reached nearly 200 Mb, with substantially greater improvements (10-fold) in yield expected soon. Additionally, the throughput of nanopore sequencing should increase with the release of the PromethION, GridION, and subsequent systems from Oxford Nanopore. The capacity of the MinION system is currently sufficient to sequence tumor DNA for SVs provided that a small subset of the genome is first captured. For example, deletions within the p16/CDKN2A locus in pancreatic cancer can span up to 10 megabases. Given its capacity for long reads, nanopore sequencing may be the ideal tool for phasing 2 mutations within the same gene, provided frozen tissue is available. To test for circulating tumor DNA (ctDNA) in plasma, additional improvements in throughput will be needed to achieve the 1000–100,000X coverage required for minimal residual disease testing and early detection of solid tumors. This will require the PromethION, GridION, or a subsequent instrument. Here we have shown the ability and reliability of nanopore sequencing, a 3rd generation sequencing method, to detect well-characterized SVs at low levels that simulate those seen in clinical testing. 
Importantly, the SV sequences were represented equally well in the alignments of nanopore sequencing data - from simple (interstitial deletions) to complex (inversions, translocations, and TransFlips) SVs. Further development is needed on bioinformatics tools which can precisely align to and detect breakpoint locations. It will be critically important to demonstrate the ability to detect SVs from cancer:normal cell titrations of genomic DNA, as well as plasma from pre- and post-resection patients. Ongoing studies involve further dilution experiments and detection of novel (unknown) SVs directly from patient samples.
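The coverage figure quoted in the Discussion (roughly 450X for 99% confidence at a 1:100 variant fraction) can be approximated with a simple binomial argument. The Python sketch below assumes that each read independently derives from the variant with probability equal to the variant fraction, and that "detection" means observing at least one supporting read. The text does not state the exact model used, so this is an illustration rather than the authors' calculation; the function names are hypothetical.

```python
import math


def reads_needed(variant_fraction, confidence):
    """Smallest n such that P(at least one variant read among n) >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - variant_fraction))


def detection_probability(n_reads, variant_fraction, min_supporting=1):
    """P(at least min_supporting variant reads among n_reads), binomial model."""
    p_fewer = sum(
        math.comb(n_reads, k)
        * variant_fraction ** k
        * (1 - variant_fraction) ** (n_reads - k)
        for k in range(min_supporting)
    )
    return 1 - p_fewer


n = reads_needed(variant_fraction=0.01, confidence=0.99)
print(n, detection_probability(n, 0.01))
```

Requiring more than one supporting read (a larger `min_supporting`) raises the coverage needed, so a single-read threshold should be read as a lower bound on practical requirements.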

Materials and methods

Identification of SVs

Genomic DNA was extracted from previously described PDAC cancer cell lines using QIAamp DNA mini kit (Qiagen), per manufacturer's instruction. Structural variants associated with p16 and SMAD4 deletions were identified by high density SNP microarray and WGS, and confirmed by PCR amplifying across the novel DNA:DNA junction and bidirectional Sanger sequencing. Primers were designed upstream and downstream of 10 p16 and SMAD4 deletions associated with different SVs (Table 1), as well as p16 wildtype sequence, to produce amplicons of 550–600 basepairs. We also included a technical replicate in our design to control for technical variation (SV01 and SV07). Residual nucleotides and oligonucleotides were removed using QIAquick PCR purification kit (Qiagen), per manufacturer's instructions. PCR specificity was verified by gel electrophoresis and quantified by Qubit DNA double-stranded high sensitivity assay (dsDNA HS assay, Life Technologies).

Library preparation

Barcodes were added to the PCR amplicons with Oxford Nanopore primers complementary to the tail sequence with a sample specific barcode (Barcode Developer Kit I) using Long Range PCR kits (NEB) (Fig. 1). This allowed for multiplexing of up to 12 samples on a single flow cell. Barcoded PCR libraries were quantified with Qubit dsDNA HS Assay kit, normalized, and pooled to a final amount of 1 μg. For sequencing, the libraries were end-repaired and dA-tailed using NEB DNA Ultra modules, followed by the ligation of hairpin and Oxford Nanopore-specific leader adapters using Genomic DNA Sequencing Kit MAP-004 (Oxford Nanopore). A motor protein was bound to both the leader and hairpin adapters, and serves to ratchet each molecule through the nanopore one base at a time. Enrichment for molecules containing hairpin adapters and bound motor protein was performed using His-Tag Dynabeads® (Life Technologies).

Flowcell runs

For the first flowcell run, the 12 Amplicons were multiplexed together at equal concentrations. For the second flowcell run, in vitro dilutions were performed to assess the ability to detect low-level SVs, to simulate clinical samples. Specifically, the following p16- and SMAD4-associated SVs were diluted at 1:100 in wildtype p16 sequence (SV02): an inversion (SV09), an inversion with translocation (SV03), translocations (SV01 and SV04), and simple interstitial deletions (SV05 and SV12). These dilutions were barcoded and multiplexed together at equal concentrations.

Oxford Nanopore MinION™ sequencing and basecalling

The MinION Flow Cell (R7.3 chemistry) was run for 48 hours with MinKNOW software (v0.49.3.7), producing thousands of fast5 files, each corresponding to a single molecule read by the sequencer. Cloud-based basecalling software (Metrichor™, v2.29.1, Oxford Nanopore) was used to convert electrical event data from MinKNOW into basecalled files. Three types of basecalled read were produced: a “1D template” read, a “1D complement” read, and a “2D” read. The 2D read is the consensus sequence of the template and complement reads, and a basic quality filter keeps only 2D reads with a ratio of template bases to complement bases between 0.5 and 2. Metrichor performs basecalling using a hidden Markov model, similar to a process previously described for a simulated data set. Briefly, each pentamer generates a characteristic current; although individual pentamer currents are difficult to distinguish uniquely, combining them with the controlled translocation rate allows the most likely full sequence to be called.
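The 2D quality filter described above amounts to a simple ratio test. The sketch below is an illustration only, not Metrichor's implementation: it assumes the ratio is computed from the number of bases in the template and complement reads and that the bounds 0.5 and 2 are inclusive. The function name `passes_2d_filter` is hypothetical.

```python
def passes_2d_filter(template_len, complement_len, low=0.5, high=2.0):
    """Return True if the template:complement base ratio lies within [low, high].

    template_len, complement_len: number of bases in each 1D read.
    Assumption: the bounds are inclusive; the text does not specify this.
    """
    if template_len <= 0 or complement_len <= 0:
        # A 2D consensus needs both strands, so treat a missing strand as a fail.
        return False
    ratio = template_len / complement_len
    return low <= ratio <= high


# Hypothetical read lengths for amplicons of roughly 550-600 bp.
print(passes_2d_filter(600, 580))  # balanced strands
print(passes_2d_filter(600, 200))  # complement much shorter than template
```

A ratio far from 1 suggests that one strand was truncated or miscalled, in which case the consensus would be built from poorly matched evidence.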

Alignment and SV calling of reads

Using only 2D nanopore reads that passed the quality filter, we de-multiplexed the data and extracted fastq records with custom Python code (https://github.com/timp0/timp_nanoporesv). Data are available from the Sequence Read Archive (SRA) under accession number SRP069199. We then aligned the nanopore long reads to the hg19 reference genome using BWA-MEM with the -x ont2d option, which sets nanopore-specific alignment parameters. A custom Python script that extracts split-read alignments and calculates the error in alignment location is also included in the online git repository (https://github.com/timp0/timp_nanoporesv).
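As an illustration of split-read extraction, the sketch below parses the `SA` tag that BWA-MEM writes for chimeric alignments in SAM output (for example from a command such as `bwa mem -x ont2d hg19.fa reads.fastq > aln.sam`, where the file names are placeholders). This is not the authors' script, which is available in the repository above. The function names, the example record, its coordinates, and the crude classification heuristic are all hypothetical.

```python
def parse_split_alignments(sam_line):
    """Return [(chrom, pos, strand), ...] for a SAM record and its SA entries."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    chrom, pos = fields[2], int(fields[3])
    strand = "-" if flag & 0x10 else "+"  # 0x10 = read is reverse-complemented
    segments = [(chrom, pos, strand)]
    for tag in fields[11:]:
        if tag.startswith("SA:Z:"):
            # SA entries have the form rname,pos,strand,CIGAR,mapQ,NM;
            for entry in tag[5:].split(";"):
                if entry:
                    sa_chrom, sa_pos, sa_strand = entry.split(",")[:3]
                    segments.append((sa_chrom, int(sa_pos), sa_strand))
    return segments


def classify(segments):
    """Crude heuristic (an assumption): label a read by how its segments differ."""
    if len(segments) < 2:
        return "unsplit"
    if len({seg[0] for seg in segments}) > 1:
        return "translocation candidate"
    if len({seg[2] for seg in segments}) > 1:
        return "inversion candidate"
    return "deletion candidate"


# Hypothetical record: one read split across two same-strand loci on chr9.
line = "\t".join([
    "read1", "0", "chr9", "21970000", "60", "300M300S", "*", "0", "0", "*", "*",
    "NM:i:40", "SA:Z:chr9,21990000,+,300S300M,60,35;",
])
segments = parse_split_alignments(line)
print(segments, classify(segments))
```

A real pipeline would also need to handle unmapped reads, secondary alignments, and complex events such as an inversion combined with a translocation, which this heuristic would label only as a translocation candidate.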