
CaBagE: A Cas9-based Background Elimination strategy for targeted, long-read DNA sequencing.

Amelia D Wallace¹,², Thomas A Sasani³, Jordan Swanier¹, Brooke L Gates⁴, Jeff Greenland⁴, Brent S Pedersen¹,², Katherine E Varley⁴, Aaron R Quinlan¹,²,⁵.

Abstract

A substantial fraction of the human genome is difficult to interrogate with short-read DNA sequencing technologies due to paralogy, complex haplotype structures, or tandem repeats. Long-read sequencing technologies, such as Oxford Nanopore's MinION, enable direct measurement of complex loci without introducing many of the biases inherent to short-read methods, though they suffer from relatively lower throughput. This limitation has motivated recent efforts to develop amplification-free strategies to target and enrich loci of interest for subsequent sequencing with long reads. Here, we present CaBagE, a method for target enrichment that is efficient and useful for sequencing large, structurally complex targets. The CaBagE method leverages the stable binding of Cas9 to its DNA target to protect desired fragments from digestion with exonuclease. Enriched DNA fragments are then sequenced with Oxford Nanopore's MinION long-read sequencing technology. Enrichment with CaBagE resulted in a median of 116X coverage (range 39-416) of target loci when tested on five genomic targets ranging from 4-20kb in length using healthy donor DNA. Four cancer gene targets were enriched in a single reaction and multiplexed on a single MinION flow cell. We further demonstrate the utility of CaBagE in two ALS patients with C9orf72 short tandem repeat expansions to produce genotype estimates commensurate with genotypes derived from repeat-primed PCR for each individual. With CaBagE there is a physical enrichment of on-target DNA in a given sample prior to sequencing. This feature allows adaptability across sequencing platforms and potential use as an enrichment strategy for applications beyond sequencing. CaBagE is a rapid enrichment method that can illuminate regions of the 'hidden genome' underlying human disease.

Year: 2021 PMID: 33830997 PMCID: PMC8031414 DOI: 10.1371/journal.pone.0241253

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

While short-read DNA sequencing technologies have enabled the discovery of genetic variants underlying numerous rare genetic disorders [1, 2], a large fraction of the human genome remains very difficult to interrogate with short-reads. These so-called “hidden” regions are difficult to sequence with short-read technologies owing to a mixture of sequence paralogy, complex haplotype structures, and tandem repeats [3, 4]. Collectively these hidden regions impact over 700 genes [4]. Paralogous sequences consist of ancestrally duplicated genomic segments. These sequences can be entire genes or segmental duplications (a duplicated sequence >1kb) and can appear in tandem or interspersed throughout the genome. Due to high homology elsewhere in the genome, there is ambiguity when mapping short reads to these regions. Thus, approximately 70% of segmental duplications are not sequence-resolved in the human reference genome, and are simply annotated as gaps [5]. Polymorphic mobile element insertions are similarly difficult to map, as multiple copies exist throughout the genome and yet broad phenotypic effects of this variation have been suggested [6, 7]. Short tandem repeats (STRs) are another class of genomic sequence that is difficult to resolve, and have estimated mutation rates orders of magnitude higher than single nucleotide variation [8]. Yet the contribution of tandem repeats to phenotypic heterogeneity remains poorly understood due to limitations in our ability to accurately detect and genotype these features. STR expansions underlie over 40 developmental and neurological disorders [9], highlighting a clear need for better molecular and informatics techniques to genotype these features across individuals [10]. The (CCCCGG)n repeat expansion in C9orf72 segregates with up to 40% of familial amyotrophic lateral sclerosis (ALS) cases [11] and is one of few established causes of the disease [12]. 
However, sequencing through complete C9orf72 repeat expansions is difficult; therefore, diagnostics rely on laborious, semi-quantitative methods such as Southern blot or repeat-primed PCR (RP-PCR). In contrast, long-read sequencing (LRS) can, in principle, provide essential quantitative information such as repeat length and sequence content, which may reveal connections between allelic polymorphism and clinical phenotypes such as severity and age of onset. Oxford Nanopore Technologies (ONT) long-read sequencing (LRS) [13] enables direct measurement of loci containing complex structures without introducing biases due to amplification or polymerase slippage, and permits highly accurate mapping. At the same time, native modifications to DNA or RNA are preserved and can be detected concurrently with the nucleic acid sequence. While higher error rates limit the accuracy of single nucleotide variant discovery compared to Illumina DNA sequencing, long reads that completely span hidden genomic regions offer the potential for comprehensive and accurate discovery of the structural variation therein. A recent study sequenced fifteen human genomes with long reads and showed that over 80% of structural variants genotyped were missed when called from Illumina data for the same subjects [14]. In fact, the sensitivity of LRS can greatly exceed standard next generation sequencing (NGS), particularly for large insertions (>50bp) [15]. The ONT MinION is particularly advantageous for diagnostics, as it is affordable, portable, and capable of generating reads up to 1Mb. A pressing limitation of the MinION however, is the low throughput relative to other sequencing technologies (e.g., Illumina). This has motivated recent efforts to enrich loci of interest for subsequent LRS without amplification, which limits target-lengths and can introduce PCR bias. 
Many emerging methods leverage the highly specific targeting ability of the CRISPR/Cas9 system, but strategies vary widely and have unique strengths and limitations related to DNA input requirements, protocol execution time, target size restrictions, and efficiency [16-22]. CATCH was one of the first methods published and relies on pulsed-field gel electrophoresis to physically isolate a DNA target of known size that is first cut at the flanks with Cas9 [17, 23]. This method is amenable to very large targets (200kb) because DNA is protected from shearing in agarose plugs. However, if the target length is variable or unknown, as with pathogenic repeat expansions, the method suffers and amplification is often required to obtain high sequencing yields. Subsequent strategies improved yield and efficiency by enriching sequencing data for target sequences without physical enrichment of target DNA fragments in the sample. The nCATS method uses dephosphorylation to prevent adapter ligation in sample DNA [24]. Next, the 5-prime phosphates flanking a target are restored using the endonuclease activity of Cas9, so that those fragments alone are available for sequence adapter ligation. This method performs best for targets up to 20-30kb. Most recently, ReadFish, a computational method for real-time enrichment during sequencing, has been expanded to human genomic targets [25]. The method utilizes real-time sequence identification to allow off-target DNA fragments to be rejected from nanopores prior to completion of sequencing, thus performing targeted sequencing without specialized library preparation. ReadFish does not have cost associated with assay design, reagents, or equipment, however rejection of fragments from pores does decrease overall output from flow cells and thus reduces yield across individual targets [25]. Here we introduce a Cas9-based Background Elimination strategy, CaBagE. 
In contrast to nCATS and ReadFish, CaBagE physically enriches genomic DNA for specific target loci, producing enrichment with comparable efficiency in terms of library preparation time and sequence output. A similar strategy called Negative Enrichment has been independently proposed [26], but with enrichment 3 to 32-fold lower after LRS than with CaBagE. Cas9 is a single-turnover enzyme with endonuclease activity that can be easily directed to specific genomic sequences using guide RNAs. The complex formed between the enzyme, its RNA guide, and target DNA is very stable, and forcibly dissociates only under harsh environmental conditions [27]. In vitro studies have shown that the natural dissociation time of Cas9 from its DNA target is approximately 6 hours [28]. When challenged with competing proteins, Cas9 remains tightly bound in most cases [29]. We were therefore motivated to ask whether this property of Cas9 extends to multiple processive exonucleases. If so, one can leverage exonucleases as a means to deplete background DNA and enrich for targeted loci that are bound and therefore protected by Cas9 on either side. Exonucleases have previously been used to eliminate background DNA in NGS libraries [26, 30, 31]. For example, Nested Patch PCR protects target DNA from digestion by capping the target sequences with adapters containing phosphorothioate bonds [30] and ChIP-exo protocols rely on proteins bound to DNA to protect the “footprint” from exonuclease activity [31]. By directing Cas9 binding to either side of a specific target locus, we show that the DNA flanked by Cas9 is preserved amidst extensive digestion of genomic DNA by exonucleases, allowing for highly specific target enrichment without PCR. By coupling Cas9-based background elimination with long-read sequencing technology, we demonstrate target sequence enrichment in previously poorly characterized regions of the human genome. 
Further, we combine this output with a computational approach that allows clustering of long-read sequence alignments to yield genotypes across a pathogenic repeat expansion in C9orf72. This generalizable molecular framework is fast, accurate, and multiplex-ready, to characterize recalcitrant yet medically important genes.

Results

Cas9 Background Elimination (CaBagE) targeted sequencing strategy overview

To enrich for a genomic region of interest, we developed a method that uses Cas9 to selectively protect target DNA from background elimination by exonucleases. First, Cas9 is targeted to both sides of a region of interest using locus-specific guide RNAs. The distance between the enzymes, effectively the target fragment length, is highly flexible and limited only by the ability to design guide RNAs flanking the target and the average fragment length of source genomic DNA. Immediately following Cas9 binding, Exonucleases I, III, and Lambda are introduced to degrade single-stranded DNA, and double-stranded DNA from the 3-prime and 5-prime ends, respectively. These enzymes degrade most DNA present in the sample with the exception of the fragments flanked by the Cas9 enzymes, namely, the DNA target of interest. Heat incubation is then used to inactivate the exonucleases and force dissociation of the Cas9 enzyme from the target DNA. Then, the ends of the target DNA fragments are available for A-tailing and ligation of the sequencing adapters. Sequencing libraries are prepared beginning with the adapter ligation step of the ONT Cas-mediated PCR-free enrichment protocol (developed for use with nCATS) and sequenced on a single MinION flow cell for 48 hours. Target enrichment and library preparation can be completed in approximately 6 hours.

Schematic of Cas9 background elimination strategy.

A) Cas9 is bound to either side of the target sequence. B) Off-target DNA is digested with a combination of exonucleases. C) Heat is used to dissociate the Cas9 and inactivate the exonucleases. D) The on-target fragment is available for A-tailing and sequence adapter ligation. E) Target fragments are sequenced on the MinION for 48 hours.

Cas9 prevents processive exonuclease from degrading DNA target

To test whether bound Cas9 prevents DNA degradation by a combination of three processive exonucleases, a 997bp synthetic double-stranded DNA gBlock (IDT) was designed to contain multiple guide RNA target sites. Cas9 cleavage requires that the target DNA, which is complementary to the RNA guide, contains a 3bp protospacer adjacent motif (PAM) at its 3′ end. Cas9 binding affinity differs between the PAM-proximal and distal sides of the cleavage site [28]. Therefore, the gBlock was designed such that flanking pairs of target sites could be in either “PAM-in” or “PAM-out” orientation, where the PAM sequences contained in the paired target sites are oriented toward or away from each other, respectively. Upon exonuclease challenge, stretches of gBlock DNA contained between two bound Cas9 enzymes were protected from degradation during a 2-hour incubation, while gBlock stretches not bound on both sides by Cas9 were completely degraded. DNA was protected between two Cas9 enzymes regardless of PAM orientation. However, PAM-in orientation resulted in the highest estimated concentration of the protected segment of DNA following exonuclease challenge (mean PAM-in 225pg/μL, mean PAM-out 106.5pg/μL) and so was selected as the preferable orientation for target enrichment. As expected, in the absence of Cas9, nearly all gBlock DNA is degraded by the three exonucleases.

A. gBlock assay design for Cas9 challenge with exonuclease. gBlock contained two pairs of gRNA target sites, one with PAM-out orientation and one with PAM-in orientation. Upon Cas9 binding (depicted by scissors), each set of target sites generates 3 unique fragment lengths. The gRNAs are represented as dotted lines. B. Capillary electrophoresis results from exonuclease challenge experiment with Cas9. 15nM gBlock DNA was incubated with 40nM ribonucleoprotein complex, followed by digestion with a combination of exonucleases for 2 hours. When Cas9 is used without exonucleases, the gBlock is cut to produce expected fragment lengths. Upon challenge with exonuclease, only the fragments flanked on both sides by Cas9 remain in the sample. (I = in; O = out).
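
The PAM-in and PAM-out orientations described above can be expressed as a small check on guide coordinates. The following Python sketch is illustrative only: the `GuideSite` class, the function names, and the 0-based half-open coordinates are assumptions introduced here and are not part of the published protocol. It relies only on the statement above that the PAM lies at the 3′ end of the protospacer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuideSite:
    """Hypothetical record for one protospacer on the reference (0-based, half-open)."""
    start: int
    end: int
    strand: str  # '+' or '-': the strand whose sequence matches the protospacer


def pam_side(site: GuideSite) -> str:
    """Return the side of the protospacer, in reference coordinates, where the PAM lies."""
    if site.strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # The PAM is 3' of the protospacer: to the right on '+', to the left on '-'.
    return "right" if site.strand == "+" else "left"


def pair_orientation(upstream: GuideSite, downstream: GuideSite) -> str:
    """Classify a flanking guide pair as 'PAM-in', 'PAM-out', or 'mixed'."""
    if upstream.end > downstream.start:
        raise ValueError("upstream site must lie entirely before the downstream site")
    upstream_faces_target = pam_side(upstream) == "right"
    downstream_faces_target = pam_side(downstream) == "left"
    if upstream_faces_target and downstream_faces_target:
        return "PAM-in"
    if not upstream_faces_target and not downstream_faces_target:
        return "PAM-out"
    return "mixed"


# Made-up coordinates for illustration; these are not the gBlock target sites.
print(pair_orientation(GuideSite(100, 120, "+"), GuideSite(500, 520, "-")))
```

Under this convention, a pair is PAM-in when both PAMs face the region between the two sites, which is the orientation selected for target enrichment.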

Yield and coverage

We targeted 5 loci using the CaBagE method; guide RNAs were selected with “PAM-in” orientation and are listed in . As a proof of concept, we targeted loci in healthy donor DNA, including a highly variable hexanucleotide repeat in C9orf72, and four cancer-related genes with guide RNAs previously validated for PCR-free targeted sequencing (GSTP1, KRT19, GPX1, SLC12A4) [16]. We multiplexed up to four loci per reaction and sequenced on a single flow cell. Target enrichment and sequencing for each locus were run in duplicate, with each run targeting either one or four loci on a single flow cell. Multiplexing loci on a single flow cell did not significantly impact coverage across each individual locus, though coverage did vary from run to run.

ᵃMapQ = 60. ᵇReads that span ≥ 90% of the target locus.

Sequence reads were aligned using minimap2 [32] and on-target reads were visualized with IGV [33]. On-target reads were considered as any reads that overlap the target region by at least 1bp and were counted using samtools [34]. Reads that overlap more than 90% of the target were considered spanning reads and were counted using the bedtools “coverage” utility [35]. When sequencing across the repeat region of C9orf72 (~4Kb) in a healthy donor, over 90% of on-target reads spanned the locus, terminating at the Cas9 cleavage sites on either side. Further, both DNA strands were equally represented in the alignment data. For the largest target, SLC12A4 (~24Kb), >65% of on-target reads spanned the locus. The vast majority of off-target reads were <1,000bp in length. We found that selecting for larger fragments after adapter ligation using the ONT Long Fragment Buffer, which selects for fragments longer than 3kb, resulted in fewer reads overall and fewer on-target reads despite target fragments being larger than 3kb. 
For example, two independent runs using the same initial DNA sample with Short Fragment Buffer and Long Fragment Buffer generated 2,707,912 reads with 71 on-target and 99,191 reads with 14 on-target, respectively. As expected, the Long Fragment Buffer resulted in an enrichment of longer reads and also a higher proportion of reads with map quality ≥60. However, due to the difference in the number of on-target reads, all CaBagE runs utilize the Short Fragment Buffer. Off-target reads were typically short (median length = 559bp) and randomly distributed throughout the genome, suggesting that they arose primarily by incomplete exonuclease digestion rather than off-target guide RNA binding. To determine whether off-target reads were enriched for other genomic features that might be preferentially protected from exonuclease digestion, we tested for a statistical enrichment of overlaps with G-quadruplex annotations and found none (permutation test, p = 0.97) [36, 37]; further, the GC content distribution of off-target reads centered at 39.5%, reflecting the genome average. Ten genomic regions showed pile-ups with >50X coverage, and these sites were annotated as having long chains of simple tandem repeats; therefore, the pile-ups were likely the result of mapping errors. The total number of reads generated from CaBagE targeted sequencing ranged from ~800,000 to 2.7 million. When restricting to reads with map quality ≥60, ~40% of off-target reads are removed.
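
The trade-off between the two wash buffers is easier to see when the read counts quoted above are expressed as on-target fractions. The short Python sketch below does only that arithmetic; the function name `on_target_fraction` and the dictionary layout are illustrative choices made here, and the four read counts are the ones stated in the text.

```python
def on_target_fraction(on_target_reads: int, total_reads: int) -> float:
    """Fraction of all reads in a run that overlap the target."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return on_target_reads / total_reads


# (on-target reads, total reads) for the two runs quoted in the text.
runs = {
    "Short Fragment Buffer": (71, 2_707_912),
    "Long Fragment Buffer": (14, 99_191),
}

fractions = {}
for name, (on_target, total) in runs.items():
    fractions[name] = on_target_fraction(on_target, total)
    print(f"{name}: {on_target} on-target of {total:,} reads ({fractions[name]:.4%})")
```

The Long Fragment Buffer run has the higher on-target fraction, but the Short Fragment Buffer run yields about five times as many on-target reads in absolute terms, which is the quantity that determines coverage.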

On-target reads (416X coverage) produced using the CaBagE target sequence enrichment strategy to capture the C9orf72 repeat-expansion locus in a healthy individual.

IGV screenshot shows aligned reads sorted by strand (plus, red; minus, blue).

Characteristics of a random sample of 1% of primary alignments from off-target reads and all on-target reads from a CaBagE run enriching for a 4,044bp target in a healthy individual.

A) Kernel density plot of read lengths in off- and on-target reads. B) Kernel density plot of map quality scores in off- and on-target reads.

To determine how target enrichment with CaBagE compares to nCATS in our hands, side-by-side sequencing runs targeting four loci were conducted. Using identical DNA input samples, concentrations, and sequencing parameters on flow cells that performed similarly during Platform QC (i.e. similar number of active pores available), the on-target read depth at the target locus achieved with nCATS was 2.6 to 10.7-fold higher than that of CaBagE. While the CaBagE off-target sequencing rate resulting from incomplete exonuclease digestion likely contributed to its relatively lower on-target yield, coverage across the targets produced by CaBagE was sufficiently high (≥30X) for locus characterization.

CaBagE target enrichment produces reads that span a pathogenic repeat expansion in known carriers

To test the ability of our target enrichment strategy to sequence through disease-specific tandem repeat alleles in affected individuals, we applied CaBagE to two de-identified DNA samples with known C9orf72 repeat expansions from the National Institute of Neurological Disorders and Stroke (NINDS) repository at the Coriell Institute. Repeat copy numbers for these individuals were previously estimated using gene specific repeat-primed PCR (RP-PCR) and gel electrophoresis [38]. The upper limit of detection for repeat copy number estimation using RP-PCR is ~950 copies and genotypes above 950 copies are denoted as EXP, for expanded [38]. The PCR-based copy-number estimates for the two samples’ expanded alleles are 704 and EXP, respectively, where the EXP allele was beyond the upper limit of detection with PCR-based methods. Targeted sequencing of the C9orf72 repeat expansion using the CaBagE method in these individuals resulted in high (>60X) depth of coverage at the target locus. A bias for the minus strand was observed in both NINDS ALS samples. Strand bias has been previously observed when sequencing across repeats with ONT [39, 40] and can be correlated with repeat length; however, we observed no apparent relationship between strand and repeat size. The G-rich and C-rich repeats of sense and antisense ssDNA at this locus form different secondary structures, which may migrate through the sequencing pores at different rates [41].

Targeted sequencing across repeat expansion at C9orf72 in two ALS cases.

A) Histogram of repeat copy number distribution and copy number estimates derived from a Gaussian mixture model for ND11836 (copy number, [percent of on-target reads]). B) IGV screen shot showing expanded reads across the hexanucleotide repeat for subject ND11836. C) Histogram of repeat copy number distribution and copy number estimates derived from a Gaussian mixture model for ND13803 (copy number, [percent of on-target reads]). D) IGV screen shot showing expanded reads across the hexanucleotide repeat for subject ND13803.

ᵃMapQ = 60. *RP-PCR: repeat-primed PCR and agarose gel electrophoresis derived genotypes from Bram et al [38]. CN: copy number.

Spanning reads were defined as reads that aligned to both the 5 prime and 3 prime flanking sequence around the repeat, as well as the full repeat sequence itself. Per-read hexanucleotide repeat copy number was estimated by counting the number of bases between the position in the read that aligned immediately upstream of the repeat and immediately downstream, divided by six, the repeat motif length. Allele-specific repeat copy numbers were estimated from subgroup means derived from a Gaussian mixture model where the number of clusters was determined a priori by visually counting distinct peaks from a read-length histogram. In both samples, the read-length histograms showed 3 populations of spanning read lengths and triallelic repeat copy number estimates are listed in . In sample ND11836, the majority of the expanded reads supported a copy number estimate of 749 and for ND13803, the majority of expanded reads supported a copy number of 1,538, consistent with the estimates derived from RP-PCR. In both samples, the largest alleles detected were absent from the RP-PCR results, as they are larger than the detectable limit of the assay. Further, both samples showed a strong bias to sequencing the shortest allele, representing 79% and 91% of the spanning reads, respectively. 
This is likely an artifact of the technology sequencing shorter fragments more efficiently, as has been previously observed [19, 42, 43], and of the fact that longer (e.g. expanded) fragments are more likely to be damaged between the flanking Cas9 binding sites, which would result in failure of enrichment. The presence of the three alleles in each sample was confirmed by repeated library preparation and sequencing of the same samples. The appearance of the third alleles in these samples could be an artifact of transformation of the cell lines from which the DNA was derived. Multiple populations of allele lengths have been previously observed in cell lines and were observed in ND11836 via Southern blot during validation of a PCR-based assay [44].
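
The per-read copy number estimate and the mixture-model step described above can be sketched as follows. This is a minimal pure-Python illustration, not the analysis code used in the study: the text does not say which Gaussian mixture implementation was used, so the expectation-maximization routine `fit_gmm_1d`, its parameters, and the synthetic copy numbers are all assumptions made here. As in the text, the number of components is fixed in advance, here by passing one guessed peak per visually identified population.

```python
import math

MOTIF_LENGTH = 6  # bases per hexanucleotide repeat unit


def repeat_copy_number(upstream_flank_end: int, downstream_flank_start: int) -> float:
    """Motif copies between the two flank alignments in one read (read coordinates)."""
    if downstream_flank_start < upstream_flank_end:
        raise ValueError("downstream flank must not start before the upstream flank ends")
    return (downstream_flank_start - upstream_flank_end) / MOTIF_LENGTH


def fit_gmm_1d(values, peak_guesses, n_iter=100, min_var=1.0):
    """Fit a 1-D Gaussian mixture by EM; returns (mean, weight) pairs sorted by mean."""
    k = len(peak_guesses)
    # Initialise by assigning each read to its nearest guessed peak.
    resp = []
    for x in values:
        nearest = min(range(k), key=lambda j: abs(x - peak_guesses[j]))
        resp.append([1.0 if j == nearest else 0.0 for j in range(k)])
    means = [float(g) for g in peak_guesses]
    variances = [min_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # M-step: update each component from its responsibility-weighted reads.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue  # leave an empty component unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)  # floor avoids zero-variance components
            weights[j] = nj / len(values)
        # E-step: recompute responsibilities under the current Gaussians.
        new_resp = []
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            new_resp.append([d / total for d in dens] if total > 0 else [1.0 / k] * k)
        resp = new_resp
    return sorted(zip(means, weights))


# Synthetic per-read copy numbers for illustration only; not data from the study.
copies = [2.0] * 8 + [8.0] * 3 + [740.0, 750.0, 760.0]
alleles = fit_gmm_1d(copies, peak_guesses=[2, 8, 750])
for mean, weight in alleles:
    print(f"copy number ~{mean:.0f} ({weight:.0%} of spanning reads)")
```

The component weights correspond to the percentage of reads supporting each allele, so a length bias toward short alleles shows up directly as a dominant weight on the smallest component.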

Discussion

We developed a method to enrich long-read sequence data for specific target loci that is fast, efficient, and amenable to the multiplexing of multiple target loci. By relying on the binding kinetics of the Cas9 enzyme to its RNA-guided target, CaBagE can flexibly enrich for targets so long as most fragments in the input DNA are intact between Cas9 binding sites. Therefore, to pursue very large targets (>~30Kb) will likely require ultra-high molecular weight DNA, which must be obtained with specialized DNA extraction methods such as agarose plugs or ultra-high molecular weight DNA extraction kits. CaBagE performs similarly in terms of prep time and input requirements, but with a lower yield than a popular competing method, nCATS [16]. Specifically, CaBagE costs approximately $9.40 more per run than nCATs and requires two additional hours of hands-off incubation time. The reduction in yield that we observe is most likely driven by the inefficiency of exonuclease digestion relative to dephosphorylation, which could be improved with further optimization of the protocol. There is also an increased sensitivity of CaBagE to fragmentation between Cas9 binding sites, where any break in DNA or failure of binding by either of the guides will result in degradation of the target molecule. This sensitivity to breakage increases with increased target size, which is reflected in , where the overall yield and proportion of reads that span the target is lower in larger targets. However, unlike the nCATS and ReadFish methods for amplification-free targeted sequencing, the enrichment achieved from CaBagE occurs at the DNA-level, where the ratio of on- to off-target DNA physically increases in the sample prior to sequencing. 
The Negative Enrichment strategy shares this feature of CaBagE, however, CaBagE utilizes a larger DNA input, different exonucleases and shorter digestion time, as well as modifications to the library preparation, which lead to significantly higher on-target coverage after sequencing on the MinION (3-32-fold higher). Physically enriching DNA for a specific target without modifying native DNA using CaBagE may therefore prove useful for applications beyond long-read DNA sequencing where isolating specific DNA sequence is required. Furthermore, while a Southern blot is the current gold standard for diagnosis of several repeat expansion disorders, it requires high sensitivity and low background caused by non-specific binding of the probe. The physical removal of off-target DNA by CaBagE might prove useful in background reduction for the Southern Blot and increase specificity for other size selection applications. Physical enrichment of target DNA in a sample may also aid in PCR-free cloning. For example, transformation-associated recombination (TAR) cloning is a method where efficiency has already been shown to increase with the introduction of double-strand breaks around the target of interest (~2% vs. ~30% gene-positive colonies) [45]. This efficiency may be further increased with the simple addition of the CaBagE background elimination step. Despite high on-target coverage, CaBagE sequences off-target fragments at a high rate owing to both incomplete exonuclease digestion and the lack of a selection step for long fragments. However, since an average CaBagE run yields ~1 Gb of sequence, which is well under the >8 Gb typical throughput for the MinION R9.4.1 using the ligation kit, we expect this high off-target rate isn’t detracting from our on-target depth. We demonstrated CaBagE’s ability to capture pathogenic repeat-expansion alleles in two ALS patients. We discovered 3 distinct read-length populations in each sample, potentially representing significant mosaicism. 
This observation is not uncommon in studies of repeat expansions where genotyping assays are performed on cell line-derived DNA [44, 46]. Determining whether these 3 alleles were present in the blood of these patients or arose as an artifact of cell culture or sequencing would require both blood and LCL-derived DNA from the same individual, which is not available for the NINDS ALS Collection. We note that several challenges remain in utilizing targeted long-read sequencing in the identification of repeat expansions. First, longer repeat expansions have greater instability, and growing and shrinking of repeat length is common and variable cell-to-cell and tissue-to-tissue in patients with the C9orf72 repeat expansion and other repeat expansion diseases [47, 48]. The observation of mosaic lengths of short tandem repeats in ours and previous studies poses an interesting challenge for estimating repeat-length genotypes and further calls into question whether creating a consensus sequence for the repeat is biologically meaningful. However, estimating a distribution of repeat lengths within an individual may be of clinical relevance, where a greater spread may indicate instability, which in turn may be correlated with pathogenesis. Second, sequencing across the repeat expansion using CaBagE resulted in a strong bias in the sequencing data toward shorter alleles. Therefore, in addition to needing high depth of coverage to detect the expansion, this length bias also complicates the ability to accurately quantify relative clonal contributions in cases where somatic mosaicism is present. Carefully extracted, high molecular weight DNA may not have as pronounced a bias, as longer fragments won’t be depleted in those samples. Overcoming this bias would be required for future studies of mosaicism. Accurate base calling also remains a challenge using ONT technologies, particularly in repeats with high GC content. 
We note that some reads representing the expanded alleles failed base calling using Guppy and were retrieved from the “fastq_fail” folder generated by the MinKNOW software. As the performance of Guppy continues to improve, methods that have been developed to detect tandem repeat in long-read sequencing data will also improve. For example, STRique [19] and TRiCoLoR [49], which detect repeat expansions from aligned reads, have already outpaced Nanosatellite, a repeat detection algorithm designed to circumvent issues with base calling by detecting repeats from raw signal data [42]. Strand biases are also exacerbated across repeats sequenced with long-read technologies [39] and should be considered during repeat sequence characterization. CaBagE’s amplification-free targeted sequencing can be used to effectively sequence across multiple, large loci on a single MinION flow cell. The method is not limited to the MinION, but should be adaptable to any long-read sequencing technology. Future work to improve the method will include increasing the efficiency of the exonuclease digestion and possibly adapting the method to be used for tiling across much larger targets with catalytically inactive dCas9. CaBagE is a target enrichment strategy that does not simply enrich sequencing data for specific loci, but enriches the DNA sample itself without amplification, thus potentially providing utility beyond long-read sequencing. As methods for DNA preparation, sequencing, and downstream data processing continue to improve, targeted sequencing methods like CaBagE will become indispensable in large-scale, cost-effective studies of complex structural variation.

Methods

Samples

A 997bp gBlock was designed to contain four gRNA target sites. Deidentified healthy donor DNA was obtained from Promega (Human Genomic DNA: Female, G152A). DNA from ALS cases (ND11836 and ND13803) was extracted from EBV-transformed LCLs from the National Institute of Neurological Disorders and Stroke (NINDS) repository at the Coriell Institute. DNA was pre-treated with FFPE Repair Mix from NEB (M6630S) according to the manufacturer’s Protocol for use with Other User-supplied Library Construction Reagents to repair nicks that could result in undesired target degradation by exonucleases.

Guide RNA design

Guide RNAs (sgRNAs) were selected to flank the target locus upstream and downstream. A combination of online tools including CHOPCHOP, E-CRISP, and IDT [50-52] was used to design sgRNAs with high in silico predicted on-target efficiency and minimal off-target effects. For target loci, pairs of sgRNAs were designed such that they maintained a “PAM-in” orientation to the target sequence. Preassembled gRNAs comprising crRNA and tracrRNA sequences (IDT, Alt-R® CRISPR-Cas9 sgRNA, 2 nmol) were purchased from IDT and resuspended in IDTE at a 10μM concentration.

Cas9 digestion

The molar ratio of Cas9:gRNA:DNA target was ~10:10:1. The ribonucleoprotein complex was formed by combining 150nM Cas9 enzyme with 150nM of each guide in 1X CutSmart buffer (NEB) and the 23.5μL reaction was incubated at 25°C for 10 minutes. A 40μL reaction containing the RNP complex, ~15nM (3μg) human genomic DNA or 30ng of gBlock in 1X CutSmart buffer (NEB B7204) was incubated at 37°C for 15 minutes.

Exonuclease digestion

Immediately following Cas9 digestion, 260 total units of exonucleases (Exo I [40U] NEB M0293, Exo III [200U] NEB M0206, Lambda [20U] NEB M0262) diluted in 1X CutSmart buffer to 10μL were added to the reaction for a final reaction volume of 50μL and incubated at 37°C for two hours, followed by heat inactivation at 80°C for 20 minutes.

A-tailing

1μL of 10mM dATP (Zymo Research, D1005) and 1μL Taq DNA Polymerase (M0267S) were added to the reaction mix and incubated at 72°C for 5 minutes.

Adapter ligation

An adapter ligation mix was prepared from the LSK-109 Ligation Sequencing Kit by combining 25μL Ligation Buffer, 5μL Quick T4 Ligase (NEB E6057), 5μL Adapter Mix, and 13μL nuclease-free water. The mixture was added to the previous reaction for a total volume of 100μL and incubated for 10 minutes on a hula mixer at room temperature. A clean-up step was then performed using 0.3X AmpureXP magnetic beads (Beckman Coulter A63881), and the beads were washed twice with 200μL of Short Fragment Buffer (ONT SQK-LSK109). The final library was eluted in 16.6μL of Elution Buffer, of which 15.8μL was retained.
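The volumes quoted in the exonuclease, A-tailing, and adapter ligation steps can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid, not part of the published protocol; the variable and function names are illustrative, and the volumes are copied from the text above.

```python
# Volumes in microlitres, copied from the protocol steps described above.
EXONUCLEASE_REACTION_UL = 50.0  # 40 uL Cas9 digestion + 10 uL exonuclease mix
A_TAILING_ADDITIONS_UL = {"dATP": 1.0, "Taq DNA Polymerase": 1.0}
LIGATION_MIX_UL = {
    "Ligation Buffer": 25.0,
    "Quick T4 Ligase": 5.0,
    "Adapter Mix": 5.0,
    "nuclease-free water": 13.0,
}


def total_volume_ul(*components):
    """Sum plain volumes and {reagent: volume} dictionaries (illustrative helper)."""
    total = 0.0
    for item in components:
        total += sum(item.values()) if isinstance(item, dict) else float(item)
    return total


ligation_mix = total_volume_ul(LIGATION_MIX_UL)
final_volume = total_volume_ul(
    EXONUCLEASE_REACTION_UL, A_TAILING_ADDITIONS_UL, LIGATION_MIX_UL
)
print(f"ligation mix: {ligation_mix} uL, ligation reaction: {final_volume} uL")
```

The ligation mix sums to 48μL, which together with the 52μL A-tailed reaction gives the stated 100μL.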

Nanopore sequencing

Each sample was sequenced on a MinION flow cell (R9.4.1). Flow cells with >800 active pores following Platform QC were primed according to the adapted protocol from Gilpatrick et al. [24] with 800μL of Flush Buffer followed by a second priming with priming mix (70μL Sequencing Buffer + 70μL nuclease-free water + 70μL Flush Buffer). The final library was then immediately loaded onto the flow cell in a mixture with 26μL Sequencing Buffer, 9.5μL Loading Beads, and 0.5μL Sequencing Tether from the LSK-109 Ligation Kit. Sequencing was performed for 48 hours using default settings with the MinKNOW software (v.19.05.0), and live base calling was conducted using the high accuracy flip-flop algorithm.

Sequence data alignment and QC

All sequencing reads were aligned to the human reference GRCh38 using minimap2 software with parameters (-Yax map-ont) appropriate for ONT and to prevent hard clipping of supplementary alignments [53]. Reads were considered on-target if they overlapped the target locus by at least 1 bp. Spanning reads aligned to >90% of the target between the Cas9 cleavage sites. Off-target reads with mapQ = 60 were counted using samtools v.1.9. On-target depth of coverage was also measured with samtools and visualized in IGV. GC content of all off-target reads was calculated using samtools and awk and compared to a random sample of 1,000,000 intervals in the GRCh38 reference using Bedtools “nuc” (v2.28.0). All off-target reads were also tested for enrichment with secondary structure annotations, namely G-quadruplexes, using poverlap [37], which permutes a null distribution of overlapping genomic regions.
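The on-target and spanning definitions above amount to a simple interval-overlap rule. The following is a minimal sketch of that rule, not the authors' analysis code. It assumes half-open coordinates on the same chromosome, uses hypothetical function names and example coordinates, and treats the spanning threshold as inclusive (Table 1 states ≥ 90%, the text above >90%).

```python
def overlap_bp(read_start, read_end, target_start, target_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(read_end, target_end) - max(read_start, target_start))


def classify_read(read_start, read_end, target_start, target_end,
                  spanning_fraction=0.9):
    """Label an alignment as 'off-target', 'on-target', or 'spanning'.

    A read is on-target if it overlaps the target by at least 1 bp and
    spanning if it covers at least `spanning_fraction` of the target.
    """
    shared = overlap_bp(read_start, read_end, target_start, target_end)
    if shared < 1:
        return "off-target"
    target_len = target_end - target_start
    if shared >= spanning_fraction * target_len:
        return "spanning"
    return "on-target"


# Hypothetical 4,044 bp target at positions 1000-5044 (coordinates are made up).
target = (1000, 5044)
for read in [(900, 5100), (3000, 8000), (6000, 7000)]:
    print(read, classify_read(*read, *target))
```

Under this rule every spanning read is also on-target, which matches Table 1, where the spanning read count never exceeds the on-target depth.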

Repeat copy number estimation in ALS samples

On-target reads at the C9orf72 locus were identified using samtools by identifying reads that overlap the target locus by at least one base pair [34]. For large expansions, a single read would often be soft-clipped within the repeat with sequence up- and downstream represented as multiple alignments in the resulting BAM file. On-target reads were realigned to the upstream and downstream sequences flanking the repeat expansion using the Striped Smith-Waterman algorithm to determine whether the read completely spanned the repeat (scikit-bio v.0.2.3 [54], Python v.2.7). Repeat-spanning reads were defined as reads that aligned both 10bp upstream and 10bp downstream of the repeat after realignment. To determine repeat copy length, the base pair position representing the end of the alignment to the upstream flank was subtracted from the start position of the alignment to the downstream flank within each repeat-spanning read. The repeat length was divided by 6 (the repeat unit length) to estimate repeat copy number. Reads that failed base calling were also aligned with Striped Smith-Waterman to ensure that we weren’t missing on-target reads where the repeat interfered with base calling. Repeat length distributions were then visualized on a histogram to determine the number of expected clusters of allele-lengths, which were then fed into a Gaussian Mixture Model (scikit-learn 0.22.1 [55]) to determine allele-specific repeat copy number estimates.
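The copy number arithmetic described above can be illustrated on a toy read. This is a sketch under stated assumptions: exact string matching stands in for the Striped Smith-Waterman realignment, the flank sequences and the read are invented, positions are 0-based with an exclusive upstream end, and the Gaussian Mixture Model step is not reproduced.

```python
REPEAT_UNIT_BP = 6  # hexanucleotide repeat unit length, as stated in the text


def locate_flanks(read, upstream_flank, downstream_flank):
    """Return (upstream_end, downstream_start) within the read, or None.

    Exact matching is a simplification; the study realigned reads to the
    flanks with Striped Smith-Waterman.
    """
    up = read.find(upstream_flank)
    if up == -1:
        return None
    upstream_end = up + len(upstream_flank)  # exclusive end of upstream flank
    downstream_start = read.find(downstream_flank, upstream_end)
    if downstream_start == -1:
        return None  # read does not span the repeat
    return upstream_end, downstream_start


def repeat_copy_number(upstream_end, downstream_start, unit_bp=REPEAT_UNIT_BP):
    """Repeat length between the flanks divided by the repeat unit length."""
    repeat_bp = downstream_start - upstream_end
    if repeat_bp < 0:
        raise ValueError("flank alignments overlap or are out of order")
    return repeat_bp / unit_bp


# Invented 10 bp flanks and a toy read carrying 8 copies of GGGGCC.
UPSTREAM = "ACGTACGTAC"
DOWNSTREAM = "TTGACCTGAT"
toy_read = "CCATG" + UPSTREAM + "GGGGCC" * 8 + DOWNSTREAM + "AAGT"

flanks = locate_flanks(toy_read, UPSTREAM, DOWNSTREAM)
estimate = repeat_copy_number(*flanks)
print(flanks, estimate)
```

On real nanopore reads the flank boundaries are inexact, so per-read estimates scatter around the true allele length; this is why the per-read values are clustered rather than read off directly.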

Accession numbers

All sequencing data from healthy donors are available on the Sequence Read Archive under accession PRJNA687491. Data from the two ALS cases are available through dbGaP with accession phs002368.v1.p1. Data, analysis code and a detailed wet laboratory protocol used to generate the results for this manuscript are available at https://github.com/adw222/CaBagE-manuscript.

Read length and quality using short and long fragment buffer.

Characteristics of a random sample of 9000 reads produced from a CaBagE run enriching for a 4,044bp target. The experiment was conducted in tandem using the same sample DNA with the ONT Long Fragment Buffer (LFB) during adapter ligation or with the Short Fragment Buffer (SFB). A) Kernel density plot of read lengths in LFB and SFB reads. B) Kernel density plot of map quality scores in LFB and SFB reads. (TIF)

GC content of off-target reads.

GC content distribution of all off-target reads from a single CaBagE run (n = 890,627) compared to a random 1,000,000 intervals from GRCh38 with length equal to the mean off-target read length of the CaBagE run. (TIF)

Replicates of C9orf72 repeat copy number estimates in expansion carriers.

Histograms of repeat copy number distributions for replicated target enrichment and sequencing across C9orf72 repeat expansions in two individuals with ALS. Results confirm presence of >2 alleles in both individuals. (TIF)

Guide RNA sequences.

(XLSX)

Comparison of coverage across targets for CaBagE and nCATs.

(XLSX)

(PDF)

2 Nov 2020

PONE-D-20-31316

CaBagE: a Cas9-based Background Elimination strategy for targeted, long-read DNA sequencing

PLOS ONE

Dear Dr. Quinlan,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The expert reviewers were generally positive about your paper, but indicated that the approach was not entirely innovative. Please compare your methods with previously published similar methods of Cas9 sequence enrichment. Please present median read depth in the abstract. Please pay attention to the remaining useful recommendations of the reviewers that will strengthen your submission.

Please submit your revised manuscript by Dec 17 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Alfred S Lewin, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. PLOS ONE now requires that authors provide the original uncropped and unadjusted images underlying all blot or gel results reported in a submission’s figures or Supporting Information files. This policy and the journal’s other requirements for blot/gel reporting and figure preparation are described in detail at https://journals.plos.org/plosone/s/figures#loc-blot-and-gel-reporting-requirements and https://journals.plos.org/plosone/s/figures#loc-preparing-figures-from-image-files. When you submit your revised manuscript, please ensure that your figures adhere fully to these guidelines and provide the original underlying images for all blot or gel data reported in your submission. See the following link for instructions on providing the original image data: https://journals.plos.org/plosone/s/figures#loc-original-images-for-blots-and-gels.

3. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: No

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper presents the CaBagE strategy. The method exploits the stable binding of Cas9 complexes to a target DNA combined with exonucleases digestion to deplete off-target DNA and physically enrich for the region of interest prior to sequencing. The authors evaluate their enrichment strategy on several targets using sequencing runs performed on MinION. Overall the paper reads well and the evaluation of the method uses reasonable experiments.
I am not a wet-lab biologist, so I've decided not to go in-depth on the wet-lab part. I will rather focus on the broader use of CaBagE. Comments below are roughly given in order of decreasing importance. I suppose these can all be addressed without performing additional experiments.

Comment #1. (General comment) The enrichment strategy described in this paper (that is, Cas9 complexes to flank target specific sequences and exonuclease digestion to deplete off-target DNA), has already been described elsewhere (https://doi.org/10.1371/journal.pone.0215441) and termed Negative Enrichment. I'm aware that the lack of novelty is not a major issue with PLOS ONE. However, the author should at least cite the original method in the Introduction section and highlight similarities/differences with their CaBagE strategy, if any (otherwise, the novelty claim should be toned-down as well).

Comment #2. (Table 1 and Abstract, line 63). The authors show that a 416X on-target read depth can be obtained in a single sequencing run after CaBagE enrichment. However, looking at the on-target read depth values in Table 1, it is clear that this event is pretty rare (the median and the mean both equal to 179, less than half of the maximum read depth value). Median (or mean) read depth should be stated clearly in the Abstract in order not to create inflated expectations.

Comment #3. (Results, line 210, and Figure 3). The authors state that any reads overlapping the target region by at least 1 bp were considered for the estimation of the region-specific depth of coverage. I'm not convinced that counting all the reads intersecting the region of interest (even those having just 1 bp falling in the target region) make sense, especially if a genetic variant (that is, a complex structural variant or a tandem repeat expansion) lies in the middle of the target region. Can the authors elaborate on the number of reads overlapping a larger portion of the target region (for instance, 50% of the region of interest)? It would be nice to have a column specifying such a value for each sequencing run/region of interest in Table 1. I guess the authors have these data handy. In addition, as from Figure 3, it seems to me that most of the reads are actually spanning more than 50% of the target region and I guess that these additional data won't deteriorate the performances of CaBagE.

Comment #4. (Results, line 209). Which minimap2 parameters do the authors used to align sequenced reads? In principle, the standard presets for Nanopore Sequencing data (-x map-ont) are not expected to give the best alignment results. Indeed, long expansions relative to the reference may not map through the repeat region if this penalty increases with length (as from the standard --gap-extend and --lj-min-ratio values): in this case, the alignment can get clipped somewhere within the region, leaving only one side of the read mapped.

Comment #5. (Introduction, lines 127-132). The authors cite the ReadUntil method for real-time sequences identification in nanopore sequencing runs. The method has recently been renamed to ReadFish (https://github.com/LooseLab/readfish) and the name should be changed accordingly in the text. Furthermore, it should be stated clearly that only off-target reads are, in principle, rejected (or on-target reads retained).

Comment #6. (Discussion, line 324). The authors state that NanoSatellite may be an effective alternative for repeated sequences characterization as the performance of Guppy improves. This is not completely true. As the authors point out, NanoSatellite was originally built to solve issues with guppy/albacore basecalling by operating on raw signals. As basecalling performances improve, this is nowadays not that effective (see also README.md at https://github.com/arnederoeck/NanoSatellite) and NanoSatellite is not currently mantained (see also https://github.com/arnederoeck/NanoSatellite/issues/14). Other tools for tandem repeat profiling which will benefit from improvements of Guppy basecalling performances exist and, among these, TRiCoLOR (https://doi.org/10.1093/gigascience/giaa101) seem in high-quality.

Reviewer #2: Reviewer's comments attached in a doc file. Overall a nice manuscript that will help researchers interested in applying in vitro CRISPR-Cas9 enrichment to isolate long loci containing complex structural variants

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool.
If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Submitted filename: Review_manuscript_CaBaGe.docx

12 Jan 2021

We have attached a file with our response to reviewers.

Submitted filename: Cabage_PLOS_Response_to_Reviewers_v4.docx

20 Jan 2021

CaBagE: a Cas9-based Background Elimination strategy for targeted, long-read DNA sequencing

PONE-D-20-31316R1

Dear Dr. Quinlan,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Alfred S Lewin, Ph.D.

Section Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I thank the authors for addressing all of my previous comments. Thanks also to the other reviewer, that helped a lot improving the wet-lab part. I have no further comments.

Reviewer #2: Thank you for taking the time in considering all my comments. I see the authors didn't re-do the tandem repeat profiling with TRICOLOR software as suggested by the other reviewer, however I understand that there is always new software coming to update the ones used. It is up to the other reviewer if the response they gave regarding this, it is sufficient or not. I consider that authors made a great effort in addressing all our comments. I think that now the paper has strengthened and it is ready for publication.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Elena Lopez-Girona

30 Mar 2021

PONE-D-20-31316R1

CaBagE: a Cas9-based Background Elimination strategy for targeted, long-read DNA sequencing

Dear Dr. Quinlan:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Alfred S Lewin

Section Editor

PLOS ONE

Table 1

Results from individual CaBagE runs in DNA from healthy donors.

Run ID	Total Reads^a	Target(s) per flowcell	Target Length (bp)	On-Target Read Depth	Total Spanning Reads^b
L1R1	536,943	C9orf72	4,044	416	404
L1R2	485,412	C9orf72	4,044	179	168
L4R1	845,510	GSTP1	17,819	91	61
		KRT19	18,189	162	98
		GPX1	13,644	190	136
		SLC12A4	24,389	116	77
L4R2	681,142	GSTP1	17,819	39	25
		KRT19	18,189	61	36
		GPX1	13,644	54	39
		SLC12A4	24,389	63	41

^a MapQ = 60

^b Reads that span ≥ 90% of the target locus

Table 2

Results from CaBagE runs in known carriers of the C9orf72 repeat expansion.

Coriell ID	RP-PCR CN Estimate	Total Reads^a	On-Target Read Depth	Total Spanning Reads	Reads spanning expanded repeat	CaBagE CN Estimate
ND11386	8/704	1,490,712	115	98	21	9/749/1,893
ND13803	2/EXP	852,155	71	66	7	2/808/1,538

^a MapQ = 60

RP-PCR: repeat-primed PCR and agarose gel electrophoresis derived genotypes from Bram et al. [38]; CN: copy number
