Literature DB >> 35301450

Massively parallel enrichment of low-frequency alleles enables duplex sequencing at low depth.

Gregory Gydush¹, Erica Nguyen¹, Jin H Bae¹, Timothy Blewett¹, Justin Rhoades¹, Sarah C Reed², Douglas Shea¹, Kan Xiong¹, Ruolin Liu¹, Fangyan Yu³, Ka Wai Leong^3,4, Atish D Choudhury^1,4,5, Daniel G Stover⁶, Sara M Tolaney⁵, Ian E Krop⁵, J Christopher Love^1,7, Heather A Parsons⁵, G Mike Makrigiorgos⁸, Todd R Golub^9,10,11, Viktor A Adalsteinsson^12,13.

Abstract

Assaying for large numbers of low-frequency mutations requires sequencing at extremely high depth and accuracy. Increasing sequencing depth aids the detection of low-frequency mutations yet limits the number of loci that can be simultaneously probed. Here we report a method for the accurate tracking of thousands of distinct mutations that requires substantially fewer reads per locus than conventional hybrid-capture duplex sequencing. The method, which we named MAESTRO (for minor-allele-enriched sequencing through recognition oligonucleotides), combines massively parallel mutation enrichment with duplex sequencing to track up to 10,000 low-frequency mutations, with up to 100-fold fewer reads per locus. We show that MAESTRO can be used to test for chimaerism by tracking donor-exclusive single-nucleotide polymorphisms in sheared genomic DNA from human cell lines, to validate whole-exome sequencing and whole-genome sequencing for the detection of mutations in breast-tumour samples from 16 patients, and to monitor the patients for minimal residual disease via the analysis of cell-free DNA from liquid biopsies. MAESTRO improves the breadth, depth, accuracy and efficiency of mutation testing by sequencing.

Entities: Chemical

Mesh：

Year: 2022 PMID： 35301450 PMCID： PMC9089460 DOI： 10.1038/s41551-022-00855-9

Source DB: PubMed Journal: Nat Biomed Eng ISSN： 2157-846X Impact factor: 29.234

Mutations in DNA emerge from single cells[1], define cell populations[2], and establish genetic diversity[3]. Considering the vast genetic diversity of living organisms and the significance of mutations in disease biology[4], there is a growing need to assay many distinct, low-abundance mutations in multiple areas of biomedicine spanning oncology[5], obstetrics[6], transplantation[7,8], infectious disease[9], genetics[10], microbiomics[11], forensics[12], and beyond. Yet, the intrinsic tradeoff in breadth-versus-depth of DNA sequencing means that either few mutations can be assayed at high depth, or many mutations at low depth—not both. High depth (i.e. many reads per genomic locus) is required to accurately detect low-abundance mutations, but this severely limits breadth (i.e. number of distinct loci). This explains why, despite massive reductions in sequencing costs, it remains prohibitively expensive to test for large numbers of distinct, low-abundance mutations. Duplex sequencing is one of the most accurate methods for mutation detection, with 1000-fold fewer errors than standard sequencing, but adds significant cost[13]. By requiring mutations to be present in replicate reads from both strands of each DNA duplex, many of the errors in sample preparation and sequencing can be overcome to enable reliable detection of low-abundance mutations. Yet, up to 100-fold more reads per locus are required—a challenge that is exacerbated when tracking many low-abundance mutations. Less stringent methods exist that require fewer reads, but compromising specificity to save cost would be deeply problematic for applications that impact patient care. While methods to enrich rare mutations have been developed[14-21], none have employed high-accuracy sequencing, nor tracked many rare mutations. Here, we describe MAESTRO (minor allele enriched sequencing through recognition oligonucleotides), a technique which combines massively-parallel mutation enrichment with duplex sequencing to enable accurate mutation testing with minimal sequencing. In contrast to conventional hybrid-capture duplex sequencing[22-24] (herein referred to as ‘Conventional’), which uses long probes to capture mutant and wild type with similar efficiency, MAESTRO uses short probes to enrich for patient-specific mutant alleles and uncovers the same mutant duplexes using up to 100-fold fewer reads. We first establish its performance and then present three exemplary use cases.

Results

MAESTRO uncovers mutant duplexes with ~100-fold fewer reads

We have established an accurate and efficient technique to track large numbers of low-abundance mutations in clinical specimens (Fig. 1A). Our technique, called MAESTRO, utilizes allele-specific hybridization with short probes, leveraging thermodynamic differences in heteroduplex versus homoduplex DNA (Fig. 1C), to enrich barcoded library molecules bearing up to 10,000 prespecified mutations. Minimal sequencing is applied, and mutations are detected on both strands of each DNA duplex (Fig. 1B). MAESTRO also employs a tunable noise filter which excludes error-prone loci (Supplementary Fig. 3, Methods).

Figure 1:

MAESTRO enables accurate mutation tracking using minimal sequencing in clinical specimens.

(A) Up to 10,000 MAESTRO probes are designed with stringent length and ΔG for single-nucleotide discrimination of predefined mutations. DNA libraries containing uniquely barcoded top and bottom strands are subject to hybrid capture using allele-specific MAESTRO probes. Only molecules containing tracked mutations are captured and sequenced with duplex consensus for error suppression. (B) Using MAESTRO, the same mutations are discovered using up to 100x less sequencing because uninformative regions are depleted. (C) Probe design overview. See probe design section of Methods.

We first sought to maximize fold-enrichment while minimizing loss of mutations. We created a 1/1k dilution of sheared genomic DNA from two human cell lines, identified exclusive single nucleotide polymorphisms (SNPs) as proxies for clonal mutations, and generated duplex sequencing libraries that were split for hybrid capture. Using qPCR, we confirmed that adapter ligation efficiencies are consistent with prior reports (Supplementary Table 1), and that MAESTRO capture efficiency is only slightly lower than Conventional capture (35% vs. 57% respectively, Supplementary Table 2). After sequencing, we compared raw variant allele fraction (raw VAF) and recall of mutant duplexes (Supplementary Table 3, Supplementary Fig. 2B) using MAESTRO versus Conventional (120 bp probes, 65°C annealing). By adjusting probe length and hybridization parameters, we established conditions (ΔG −18 to −14 kcal/mol, T=50°C, Fig. 1C, Supplementary Fig. 1, Supplementary Fig. 2A) that yielded strong fold-enrichment of mutant vs. wild type alleles (median 948.3-fold, range 8.1 to 3.4E4) while uncovering the majority of mutant duplexes (Fig. 2A,B, Supplementary Table 3). Indeed, the median raw VAF with MAESTRO was 0.97 (range 5.03E-3 to 1), in contrast to 6.98E-4 (range 3.00E-5 to 3.87E-3) with Conventional. The fraction of recoverable mutations (or, enrichment ‘success rate’) was 72.5%. Interestingly, we did not observe equal and opposite magnitude raw VAF changes when swapping strands of C and G reference base probes (Supplementary Fig. 2C). We believe this may be due to differences in probe characteristics (i.e. delta G, length) for each base category but further investigation is needed. MAESTRO cannot uncover more mutations than physically present in a sample; yet, by detecting each with up to 100x fewer reads, it can recover more total unique mutations, particularly when it would not otherwise be possible (e.g. due to cost) to sequence a sample to saturation.

Figure 2:

MAESTRO uncovers most mutant duplexes using significantly fewer reads.

(A) Comparison of variant allele frequency with Conventional and MAESTRO with 438 probe panel at 1/1k dilution. (B) Downsampling of Conventional and MAESTRO. As an inset, mutant duplex overlap is shown; of the 57 mutant duplexes exclusive to Conventional, 42 were detected by MAESTRO but excluded by the noise filter. The initial sample was barcoded with UMIs (unique molecular indices) which allowed for tracking individual duplex molecules through different experimental conditions.

We next tuned the MAESTRO noise filter. This filter was designed to protect against the possibility that errors could arise independently on both strands of library molecules and, given enrichment bias, ‘collide’ to form a duplex (Supplementary Fig. 3A,B). It works based on the assumptions that (i) errors should be impartial to read family, and (ii) error-prone loci should therefore exhibit a disproportionate number of double- (DSC) to single- (SSC) strand consensus read families bearing mutations (Supplementary Fig. 3B). Indeed, we found that sites with DSC/SSC ratios below 0.15 had poor reproducibility in replicate captures of a non-mutant library (our negative control) (Supplementary Fig. 3C). We also found that our filter protected against errors introduced by excessive PCR (Supplementary Fig. 3D), and further confirmed that MAESTRO probes—which contain the mutant base—do not create false mutant duplexes (Supplementary Fig. 4). Filtering by DSC/SSC ratio was found to be robust to changes in sequencing depth with similar concordance observed at 10% of the original sequencing depth (Supplementary Fig. 5). Considering the profound enrichment, we then asked how many fewer reads would be required to detect the same mutant duplexes as Conventional. We found that MAESTRO could uncover the majority (n=150/207) using ~100-fold less sequencing (Fig. 2B), while providing comparable specificity (Supplementary Fig. 6C). Of the 57 mutant duplexes exclusive to Conventional, 42 were detected by MAESTRO but excluded by the noise filter. Though some of these excluded mutations may be real, the current filter threshold prioritizes specificity over sensitivity but could be tuned for varied application. Our results suggest that MAESTRO can uncover the majority of mutant duplexes using significantly less sequencing.

MAESTRO could enable mutation verification

Expansive methods such as whole-exome and whole-genome sequencing stand to unravel the genetic basis of human diseases. However, it remains challenging to resolve low-level mutations (e.g. < 10% VAF) given insufficient depth to read each DNA molecule enough times to suppress errors. Currently, mutations discovered in sequencing studies may be orthogonally validated via technologies such as digital droplet PCR or multiplex amplicon sequencing. However, these are not highly scalable approaches and are usually restricted to a handful of mutations suspected of having potential clinical significance. We reasoned that MAESTRO could enable accurate verification of large numbers of mutations discovered from whole-exome and -genome sequencing. The net result would be that lower abundance mutations could be reliably discovered and verified from comprehensive sequencing studies. To explore this, we performed whole-exome sequencing of tumor biopsies (of varied tumor purity; median 63%, range 26 – 100%) and matched normal DNA from 16 patients. We identified a median of 40 mutations per patient (median 40, range 13–130) and created both a MAESTRO and Conventional panel comprising all mutations for which we could design probes. Requiring the true mutations to be detected on both strands of each duplex, we found similar fractions of validated mutations between MAESTRO and Conventional, with slightly lower fractions for MAESTRO likely due to probe dropout (Fig. 3A). Yet, the fraction of validated mutations was much higher for those which had been identified at >0.10 VAF from tumor whole-exome sequencing (median 0.75, range 0.21–0.90 for MAESTRO; median 0.98, range 0.40–1.0 for Conventional), in comparison to those which had been identified at <0.10 VAF (median 0.29, range 0.07–0.82 for MAESTRO; median 0.35, range 0.04–1.0 for Conventional, Fig. 3A). Indeed, the mutations which were found to be “not validated” tended to have the lowest VAFs from tumor whole-exome sequencing (median 0.04, range 0.01–0.83, Fig. 3B). Expectedly, we observed higher fractions of MAESTRO-validated mutations in fresh-frozen (median 0.65, range 0.62–0.77) as compared to formalin-fixed (median 0.58, range 0.10–0.76) tumor biopsies. Our results suggest that MAESTRO could inform mutation validation and optimization of variant detection pipelines from WES and WGS data.

Figure 3:

MAESTRO fingerprint validation of whole exome tumor samples.

(A) Performance of n=16 tumor fingerprints using both Conventional and MAESTRO. Mutations were called from the n=16 tumor biopsies and both Conventional and MAESTRO probe sets (fingerprints) were created for all possible mutations from each tumor. The tumor biopsy libraries were captured with the Conventional and MAESTRO fingerprints and duplexes were sequenced. Fingerprints were split into two groups based on whether or not their original tumor VAF was < 10%. A mutation was considered validated if it was observed in the sequenced duplexes of the Conventional or MAESTRO sample. (B) Comparing variant allele fraction across all mutations from all Conventional (n=16) and MAESTRO (n=16) panels. Center line represents the median and bounds of the box indicate the first and third quartiles.

MAESTRO could enable chimerism testing

Chimerism testing involves detecting admixtures of cells that arise from different zygotes[25]. For patients who undergo stem cell transplantation, such as in the treatment of hematologic malignancies, assessment of donor chimerism has been shown to inform likelihood of recurrence and graft-versus-host disease[26,27]. Existing methods such as short tandem repeat analysis and qPCR testing can detect chimerism levels ranging from 1–5% to 0.05–0.1%, respectively[28]. We sought to determine whether higher sensitivity could be achieved by tracking more SNPs and from limited blood volume, e.g. a 1uL fingerprick contains ~10,000 white blood cells. We first compared MAESTRO to Conventional for tracking 438 SNPs in 18 × replicate 1/100k dilutions and 17 × replicate negative control samples. We used sheared genomic DNA from the same two cell lines described in the previous section[8,23,29-33] to reflect donor vs. recipient DNA and isolated 20 ng for each replicate. MAESTRO uncovered 81% (n=47/58) and 80% (n=4/5) of the donor-exclusive SNP duplexes detected with Conventional across all 1/100k and negative control samples, respectively, using much less sequencing (Supplementary Fig. 6A). Most that were exclusive to Conventional in the 1/100k samples (n=6/11) were detected by MAESTRO but excluded by the noise filter. MAESTRO also uncovered an additional 52 and 16 donor-exclusive SNP duplexes across all 1/100k and negative control samples, respectively, but most were near fragment ends, which proved less likely to be captured by Conventional in these experiments (Supplementary Fig. 6B). If we consider these differences, the concordance is nearly perfect (Supplementary Fig. 6C). However, for the rest of the study we do not remove the molecules that were less likely to be captured with Conventional. Importantly, MAESTRO detected significantly more donor-exclusive SNP duplexes in the 1/100k samples than the negative controls (Fig. 4A, p=1.16E-5, Welch’s t-test). We also confirmed that without duplex error suppression, we would have been unable to resolve chimerism at these limiting dilutions (Supplementary Fig. 6D).

Figure 4:

MAESTRO can detect signal above noise at 1/100k dilution.

(A) Donor-exclusive SNPs detected in MAESTRO using a 438 probe panel across 18 × biological replicates of a 1/100k dilution and 17 × biological replicates of a negative control (p=1.16E-5, two-sided Welch’s t-test). (B) Donor-exclusive SNPs detected in MAESTRO using a 10,000 probe panel across 16 × biological replicates of a 1/100k dilution, 17 × biological replicates of 1/1M, and 12 × negative controls. The Welch’s t-test (two-sided) was used to determine whether significantly more donor-exclusive SNPs were uncovered in each dilution compared to the negative controls (1/100k vs. NC p=7.23E-11; 1/1M vs NC p=7.47E-5).

While MAESTRO provided comparable sensitivity and specificity using significantly less sequencing, the number of donor-exclusive SNPs detected at 1/100k dilution, of 438 tracked, was not much greater than the negative controls. We thus hypothesized that tracking even more, e.g. 10,000, could improve our signal-to-noise ratio. Yet, this could only be done feasibly with MAESTRO, as Conventional would require >10 billion reads (~$20,000 on the Illumina HiSeqX) to saturate duplex recovery, in contrast to about ~100 million reads (~$200) with MAESTRO. Applying MAESTRO to track 10,000 donor-exclusive SNPs in 16 × replicate 1/100k dilutions, 17 × 1/1M dilutions and 12 × negative controls, we observed a large increase in number detected in the 1/100k samples (median mutations=169, range 91 to 187), which was significantly higher than the negative controls (median 13 mutations, range 5 to 24, p=7.23E-11, Fig. 4B). Higher counts were also observed in the 1/1M dilutions (median 23, range 16 to 36, p=7.47E-5), although further refinements are likely needed to enable reliable detection at 1/1M. Our results suggest that tracking thousands of genome-wide, donor-specific SNPs provides a profound boost in our signal-to-noise ratio, which is likely to be important for detecting very low levels of chimerism. As for the signal in the negative controls, we reason that these could either be (a) true mutations that arose spontaneously with each cell division, (b) cross-contamination when cell lines were cultured, or (c) technical artifacts that have yet to be overcome in duplex sequencing. While we cannot yet discern the source, the counts were consistent with what was expected for scanning tens of millions of bases for potential mutation (10,000 mutations × few thousand haploid genomes of DNA) given the reported error rate of ~1×10−6 in duplex sequencing[13,23,24]. By retesting specific loci, we also verified that the majority would have been detected with Conventional (Supplementary Fig. 7), suggesting that most are not artifacts of the MAESTRO protocol.

MAESTRO could enable high-sensitivity liquid biopsy testing

Liquid biopsy represents an application for which accurate tracking of many distinct mutations could empower clinical decisions[34]. For instance, applying liquid biopsies to detect minimal residual disease (MRD) after cancer treatment[5,35-38] has the potential to inform whether surgery is needed after neoadjuvant therapy[39-41], whether adjuvant therapy is needed after surgery[42], and ultimately, whether it is safe to stop treatment[43]. One promising way to improve sensitivity of liquid biopsies is to track many patient-specific tumor mutations in cell-free DNA (cfDNA), recognizing that not all mutations may be present in an individual blood tube when tumor DNA in the bloodstream is sparse (i.e. less than a genome equivalent of tumor DNA per tube)[23,39,40,44,45]. Yet, this has been challenging, because to rely upon any subset for MRD detection requires extremely accurate sequencing of many rare mutations. We reasoned that MAESTRO could enable sensitive monitoring of tumor mutation fingerprints from cell-free DNA (cfDNA). To explore whether MAESTRO could enhance MRD detection, we created dilutions targeting between 1/100 to 1/300k tumor fraction (i.e. the fraction of tumor-derived DNA versus total DNA) using previously characterized cfDNA from a cancer patient spiked into cfDNA from a healthy donor (Fig. 5A). Replicate 20 ng libraries at each dilution were amplified and split between MAESTRO and Conventional capture, tracking 978 genome-wide SNVs exclusive to the patient’s tumor WGS. Of the 978 mutations tracked, 94.7% were validated with MAESTRO when applied to the patient’s tumor DNA (Fig. 5A inset), meaning that the remaining 5.3% were either artifacts from tumor sequencing or were missed by MAESTRO. MAESTRO uncovered a comparable number of mutations as Conventional at tumor fractions down to 1/100k and required ~50-fold less sequencing (Supplementary Fig. 8). To consider a sample “MRD positive” we used our previously defined threshold of requiring >1 mutation to detect MRD for both MAESTRO and Conventional[23,37,44]. Applying this threshold, none of the eight healthy donor cfDNA (negative control) samples tested positive for MRD. Leveraging probes with imperfect enrichment, we show MAESTRO retains the ability to estimate tumor fraction (r2 = 0.92, Fig. 5B, Methods). Our results show that MAESTRO’s limit-of-detection is comparable to that of Conventional in cfDNA despite ~50-fold less sequencing. MAESTRO is well-positioned to affordably track thousands of genome-wide mutations in cfDNA, well beyond what is feasible with Conventional.

Figure 5:

MAESTRO improves detection of MRD in pre-operative setting.

(A) Number of mutations detected at various tumor fraction dilutions between 1/100 and 1/300k using the cfDNA from a cancer patient and healthy donor. A 978 SNV genome-wide fingerprint exclusive to the cancer patient’s tumor was used for all replicates. Eight unrelated healthy donors were used as negative controls (NC). Closed circles indicate MRD(+). Inset shows number of mutations validated using Conventional and MAESTRO when each panel was applied to patient’s tumor DNA. (B) Observed tumor fraction versus expected dilution for Conventional and MAESTRO probes. (C) Genome-wide tumor mutations detected with MAESTRO compared to exome-wide tumor mutations detected with a personalized MRD test built on our Conventional assay[23]. Fingerprint sizes for the two conditions are shown with triangles. Mutations from all patients were combined into a single panel for MAESTRO and the same panel was applied to all samples. (D) The heatmap shows mutation counts detected using MAESTRO with patient-specific mutations on the diagonal. Filled boxes indicate MRD(+); Unfilled boxes indicate MRD(−). When each panel was applied to the other three patients’ timepoints, there was no evidence of false positives (off-diagonal), highlighting MAESTRO’s specificity.

We next sought to determine whether tracking thousands of genome-wide tumor mutations could enhance MRD detection after cancer therapy. This is significant because (i) detecting MRD remains a significant unmet medical need, and (ii) while MRD detection correlates with the number of tumor mutations tracked in cfDNA[23,39,45], existing techniques have had limited breadth or depth. For instance, cancer gene panels typically cover just a few mutations per patient[24]; patient-specific assays track tens to hundreds[39,44]; and whole-genome sequencing remains far too costly to apply beyond minimal depth[46]. For patients with some common, aggressive forms of breast cancer, standard care involves preoperative systemic chemotherapy for its utility in guiding subsequent response-based treatment[47,48]. We analyzed patients with breast cancer enrolled in a clinical trial (16 patients) of preoperative therapy (Supplementary Fig. 9A), reasoning we could (i) determine the detectability of tumor-derived cfDNA at diagnosis, (ii) describe how cfDNA trends with clinical response over the course of treatment, and (iii) determine whether preoperative MRD testing could predict the presence of residual cancer in the surgical specimen. Reasoning that genome-wide mutation tracking would be most useful in samples with low tumor fraction, we first tracked all exome-wide tumor mutations using a personalized cfDNA test built on our Conventional assay[23]. We found that most patients had detectable circulating tumor DNA at diagnosis (median tumor fraction 0.00858, range 0 to 0.21, Supplementary Fig. 9B) and that a decrease in tumor fraction in cfDNA between the first two time points (T1, T2) trended with clinical response (Supplementary Fig. 9C) which is consistent with prior reports[39-41]. Yet, MRD was detected preoperatively (T4) in only one of eight patients with residual disease at the time of surgery and in only one of five who experienced future distant recurrence. We chose the remaining four patients to explore whether genome-wide mutation tracking could enhance MRD detection. For these four patients who had tested MRD-negative preoperatively but experienced future distant recurrence, we performed PCR-free whole-genome sequencing of their tumor biopsy specimens and blood normal DNA. We identified a median of 5575.5 (range 3385 to 8783) somatic mutations per patient and, using stringent criteria for probe design, we created one MAESTRO test comprising 55–58% of exonic mutations and 30–38% of intronic mutations from all patients (Supplementary Fig. 10). We applied the MAESTRO test to tumor and normal DNA and found 52% (range 41–56%) of probed mutations to be verified (Supplementary Fig. 11). We then applied the assay to all available cfDNA samples from all four patients, such that we assess all mutations in all patients, using the unmatched samples as controls for one another. By also applying MAESTRO tests to matched germline DNA from each patient, we further limited the potential impact of variants arising from clonal hematopoiesis. We found that tracking all tumor mutations with MAESTRO uncovered more mutations per patient in cfDNA compared to Conventional (Fig. 5C) and detected no false mutations in any unmatched samples (Fig. 5D). Previous studies have shown that using > 1 mutation for MRD detection helps to protect against error[23,37,44]. We uncovered multiple tumor mutations preoperatively for two of the four patients, while observing profound signal enhancement in the earlier time points from all patients. These results demonstrate that a threshold of >1 mutation could provide reasonable specificity for MRD detection but further testing is needed to determine clinical relevance. These proof-of-principle results suggest that MAESTRO could enhance MRD detection by enabling all genome-wide tumor mutations to be accurately tracked in cfDNA.

Discussion

We have demonstrated a practical approach to extend the breadth, depth, accuracy, and efficiency of mutation tracking in clinical specimens. Our technique breaks the breadth-vs-depth ‘glass ceiling’ of DNA sequencing, enabling thousands of low-abundance mutations to be accurately tracked using minimal sequencing. As a hybrid capture and sequencing methodology, MAESTRO is easy to adopt and highly adaptable across a range of sample types. This is likely to empower many types of biomedical research and diagnostic tests that demand accurate and efficient tracking of many rare mutations. MAESTRO is the first method, to our knowledge, to simultaneously enrich and detect thousands of mutations with high-accuracy sequencing. In a dilution series involving sheared genomic DNA, we demonstrated a median ~1000-fold enrichment from 0.1% VAF to nearly pure mutant DNA, which enabled us to detect most mutant duplexes using ~100-fold less sequencing. We showed that MAESTRO could track up to 10,000 distinct, low-abundance (< 0.1% VAF) mutations to enable both mutation validation and chimerism testing. Finally, we show signal enhancement in liquid biopsies beyond what is feasible with existing methods. MAESTRO addresses a fundamental challenge in the mutation enrichment field by using molecular barcodes to discern true mutations from low-level errors that may also be enriched. While tracking more mutations increases our detection sensitivity at limiting dilutions, it will never be possible to detect below sequencing error rates. Accordingly, we opted to employ the most accurate sequencing method, duplex sequencing. Further, our DSC/SSC ratio filter is a unique advance that measures intrinsic noise within each sample, but two current limitations are (i) that it needs to be tuned, and (ii) that error-prone loci are discarded, which impacts sensitivity when these regions contain real mutations. We imagine a tunable noise filter could be leveraged to achieve the right balance of sensitivity and specificity across various applications. While our approach requires whole-genome sequencing and individualized probe design, the cost of each continues to decline, and biotinylation of oligonucleotides in-house can further help to limit costs (see Methods). Bespoke genome-wide liquid biopsies reflect one potential application for MAESTRO. In cases like this, we expect upfront costs could be amortized over many serial tests, while being offset by large savings in sequencing required per test. We’ve shown that MAESTRO can be used to estimate the dilution (e.g. tumor fraction) of a sample, even though mutation enrichment may lose the ability to quantify the VAF of each mutation. It is possible that mutations could be missed by MAESTRO due to poor probe performance or exclusion by the noise filter. We are working to address this by incorporating internal controls to calibrate enrichment performance on a locus-by-locus basis, as well as probes against fixed sequences to more systematically estimate the total molecular diversity of the library and to confirm whether we have sequenced to saturation. We also focused on enrichment of point mutations, but expect that MAESTRO could also be useful for tracking other types of alterations such as insertions and deletions or structural variants. Notably, the reduction in reads required by MAESTRO can be up to 100-fold but varies with the fold-enrichment of the assayed mutations. In all, MAESTRO provides a powerful approach to (i) convert low-abundance mutations into high-abundance mutations, and (ii) enable their detection with high-accuracy sequencing using significantly fewer reads. This means that it is no longer necessary to trade breadth for depth, or accuracy for efficiency, when tracking many low-abundance mutations in clinical samples.

Methods

Patients and Samples

All patients provided written informed consent to allow the collection of blood and/or tumor tissue and analysis of clinical and genetic data for research purposes. Adult patients with HER2-negative breast cancer and a tumor size >1.5 cm were prospectively enrolled into Dana-Farber Cancer Institute IRB-approved treatment protocol 07130 (NCT00546156). Primary endpoint of the trial was pathologic complete response (pathCR) to preoperative therapy. Data were collected at the Dana-Farber Cancer Institute and Massachusetts General Hospital in Boston, Massachusetts. All patients completed the following course of neoadjuvant Phase II therapy: Bevacizumab × 1 dose; Doxorubicin/Cyclophosphamide × 4 cycles plus Bevacizumab; Paclitaxel × 4 cycles plus Bevacizumab. Patients had plasma isolated from 20 cc blood in EDTA tubes at baseline and prior to each treatment and tissue sampling performed at diagnosis. A Residual Cancer Burden (RCB) score was calculated after surgery. We selected for analysis all enrolled patients with ER-negative, PR-negative, HER2-negative (triple-negative) breast cancer (TNBC). For those patients with sufficient tumor tissue, exome-sequencing identified mutations that we captured using our Conventional assay[23]. From within this cohort we identified four TNBC patients who had tested MRD-negative using the exome-wide panel but who experienced metastatic recurrence. For these patients, we applied MAESTRO to analyze genome-wide tumor mutations. HapMap DNA from NA12878 and NA19238 was purchased from Corielle. This research was conducted in accordance with the provisions of the Declaration of Helsinki and the U.S. Common Rule.

Defining Mutations to Track

For the HapMap panels, VCF files were taken from the Genome in a Bottle Consortium[49] (NA12878) and 1000 Genomes project[50] (NA19238). Sites specific to NA12878 were subsampled to create MAF files and were subsequently run through probe design to create the 438 and 10,000 SNV (single nucleotide variant) fingerprints. Tumor DNA was extracted from fresh-frozen tumor samples. All patients’ tumor DNA underwent whole-exome sequencing to identify trackable mutations for conventional capture. Of the four patients selected for MAESTRO, tumor DNA underwent PCR-free whole-genome sequencing. Illumina output from whole-genome sequencing was processed by the Broad Picard pipeline and aligned to hg19 using BWA. We ran the GATK best practices workflow on the Terra platform to detect somatic SNVs and indels in our deep whole-genome sequencing data using tumor/normal calling (see Terra workflow). We subset the somatic mutation calls to only SNVs and passed the candidate SNVs for tracking to our probe design pipeline. By sequencing each patient’s tumor and normal to adequate depth we can avoid tracking variants arising from clonal hematopoiesis.

Probe Design

Mutations in MAF (mutation annotation format) were first checked for specificity in the reference genome to filter out potential mapping artifacts. The resulting filtered MAF was then used as input into probe design. Conventional probe design was performed on the filtered MAF as previously described[23]. For MAESTRO probe design, along with the mutation file, initial probe length (default = 30 bp), annealing temperature (default = 50°C), and ΔG range (default = −18 to −14 kcal/mol) were used as input. For ΔG and melting temperature calculations, we use the annealing temperature, [Na+] = 50 mM, [Mg2+] = 0 mM, and [DNA] = 250 nM. An initial sequence was designed for the given length with the mutation at its center. If the sequence was within the specified ΔG range, it proceeded through the subsequent design steps, otherwise the sequence length was adjusted until it fell within the range. A modified BLAST was performed where the melting temperature for each hit was calculated and if it was less than the annealing temperature, it was removed. If there were 10 or greater pass-filter BLAST hits, we attempted to redesign the sequence using a sliding window. This resulted in the mutation being offset from the center of the sequence, but still provided good enrichment. The sequence with the minimum BLAST hits was then chosen. All sequences were output in a tab-delimited file, and the results were filtered based on length, GC content, ΔG, and the number of BLAST hits before ending up with the final panel design (Supplementary Fig. 1). All probe sequences can be found in Probe Sequences File.

In-house Biotinylation of Probe Panel

Patient-specific oligo pools ordered from Twist Bioscience contained universal forward and reverse primer binding sites. Amplification of the oligo pool was performed using an internally biotin-modified forward primer containing a dU base directly 5’ to the biotinylated dT and an unmodified reverse primer containing a BciVI recognition sequence at its 3’ end. The PCR product was purified using Zymo’s DNA Clean & Concentrator-25 columns. Two micrograms of biotinylated, double-stranded product were sequentially subject to the following 100 μL one-tube enzymatic reaction: 40 units BciVI for 60 minutes at 37°C; 10 units Lambda Exonuclease for 30 minutes at 37°C followed by 20 minutes at 80°C; 7 units USER Enzyme for 30 minutes at 37°C (NEB). Zymo’s Oligo Clean & Concentrator columns were used to purify short, single-stranded, biotinylated probes for hybrid capture.

Creating Spike-in Dilution Series with Sheared gDNA

Healthy gDNA from two HapMap cell lines, NA12878 and NA19238, were sheared to 150 bp fragments using a Covaris E220/LE220 Ultrasonicator. Sheared DNA was quantified using the Quant-iT Picogreen dsDNA assay kit on a Hamilton STAR-line liquid handler. Dilutions were created by spiking sheared gDNA from NA12878 (“mutant”) into NA19238 (“normal”) at 0, 1:1k, 1:10k, 1:100k, and 1:1M. All libraries were constructed with 20 ng sheared gDNA using the Kapa Hyper Prep Kit with custom dual-index duplex UMI adapters (IDT). These UMI adapters allowed tracking of the top and bottom strand of each unique starting molecule despite rounds of amplification. In cases where there was insufficient library remaining for a subsequent capture, 200 ng of library was subject to additional rounds of PCR to generate workable mass (>1 μg) for hybrid capture using KAPA’s library amplification primer mix. In cases where technical replicates of the same library were needed, libraries were reindexed using a new set of P5/P7 indices (IDT).

Creating Spike-in Dilution Series with cell-free DNA

Processing of patient blood samples followed the same protocol as previously described[51]. Germline DNA (gDNA) was extracted from either buffy coat or whole blood using the QIAsymphony DSP DNA Mini kit and sheared. Cell-free DNA (cfDNA) was extracted from plasma using the QIAsymphony DSP Circulating DNA Kit. 20 ng libraries were made for patients’ cfDNA timepoints and sheared gDNA. Healthy donor cfDNA and gDNA were obtained from fresh plasma and whole blood, respectively, at Research Blood Components and processed in the same manner as patient samples. For the cfDNA dilution series, tumor fraction dilutions were created by spiking cancer patient’s cell-free DNA (starting tumor fraction ~1/100) into healthy donor cfDNA at 0, 1:8k, 1:30k, 1:80k, 1:100k, and 1:300k tumor fractions. Replicate 20ng libraries were made for the dilution series as described in the above section.

MAESTRO capture

Hybrid capture using biotinylated, short probe panels was performed using xGen Hybridization and Wash Kit with xGen Universal Blockers (IDT) using a protocol adapted from Schmitt, et al[22]. Each hybrid capture contained 1 μg of library and 0.75 pmol/μL of MAESTRO probes (IDT or Twist Bioscience), using wells in the middle of the 96-well plate to prevent temperature fluctuations. The hybridization program began at 95°C for 30 seconds. This was followed by a stepwise decrease in temperature from 65°C to 50°C, dropping 1°C every 48 minutes. Finally, the plate was held at 50°C for at least four hours, making the total time in hybridization 16 hours. Heated wash buffer was kept at 50°C (lid temp 55°C) and heated wash steps were performed at 50°C. After the first round of hybrid capture, 16 cycles of PCR were applied. The product was subject to a second round of hybrid capture using half volumes of Cot-1 DNA, xGen Universal Blockers, and probes. This was followed by another 16 cycles of PCR. Apart from these differences, MAESTRO double capture was performed using the same protocol as outlined in Parsons, et al[23]. Final captured product was quantified and pooled for sequencing on an Illumina HiSeq 2500 (101 bp paired-end reads) or a HiSeqX (151 bp paired-end reads) with a target raw depth of 10,000 × per site.

Conventional Capture

The following described protocol was outlined previously in Parsons, et al[23]. Hybrid capture using a panel consisting of patient-specific (i.e. germline informed), biotinylated 120 nt probes was performed using the xGen Hybridization and Wash Kit with xGen Universal Blockers (IDT). For each Conventional capture reaction, libraries were pooled up to 6-plex with 500 ng input each and 0.56 to 0.75 pmol/μL of probe panel was applied (IDT). The hybridization program began at 95°C for 30 seconds. This was followed by 65°C for 16 hours. Heated wash buffer was kept at 65°C (lid temp 70°C) and heated wash steps were performed at 65°C. After the first round of hybrid capture, 16 cycles of PCR were applied. The product was subject to a second round of hybrid capture using half volumes of Cot-1 DNA, xGen Universal Blockers, and probes. This was followed by another 8 cycles of PCR. Final captured product was quantified and pooled for sequencing on an Illumina HiSeq 2500 (101 bp paired-end reads) or a HiSeqX (151 bp paired-end reads) with a target raw depth of 1,000,000 × per site.

Quantification of Library Conversion Efficiency by ddPCR

To quantify library conversion efficiency, a ddPCR assay was designed to target the flanking adapter regions. Only fragments with successful double ligation were exponentially amplified within the QX200 ddPCR EvaGreen Supermix (Bio-Rad). Varying DNA inputs into LC (3ng, 10ng, 20ng, 50ng) were tested for their varying conversion efficiencies and adjusted to an unligated control. The results are shown in Supplementary Table 1. The sequences of the primers used are as follows: Primer 1: CACTCTTTCCCTACACGACG Primer 2: AGTTCAGACGTGTGCTCTTC

Quantification of Probe Capture Efficiency by ddPCR

To quantify probe capture efficiency, a ddPCR assay was designed to target a homozygous mutation site chosen from the 438 SNV HapMap fingerprint (see ddPCR assay design below). Conventional and MAESTRO hybrid capture was performed on pure tumor Hapmap gDNA libraries, with all waste streams collected from washes. The total number of mutant molecules into hybrid capture and lost during hybrid capture were quantified by using the designed ddPCR assay[23,44]. Probe capture efficiencies were determined using the equation below. The results are shown in Supplementary Table 2. To determine probe capture efficiency, we targeted the homozygous mutation site (g.chr2:29940529A>T) with the following primers and Taqman probe: Primer 1 (targeting a consensus sequence on our duplex UMI adapters): CACTCTTTCCCTACACGACG Primer 2: ATGTCCAGGTCATAGCTCC Taqman probe: /5HEX/-CAACAAACATGCCATCTCCTTCTCCTGA-/ZEN//3IaBkFQ/

Sequencing and Data Analysis

Sequencing and pre-processing of BAM files followed a similar protocol as previously described[23,44] with the following changes. Before grouping reads by UMI, read groups were added to samples from the same library and samples were merged into a single BAM. This ensured identical molecules found in different samples were given the same family ID from Fgbio’s GroupReadsByUmi (Fulcrum Genomics). The resulting BAM was then pushed through GroupReadsByUmi and split afterwards by the added read group tag. The split BAMs were then passed through the consensus calling workflow. Consensus BAM files were indel realigned using GATK 4 before calling mutations using custom scripts. Noise filtering based on DSC/SSC ratio (total mutant DSCs / total mutant SSCs) was performed on all MAESTRO samples. For mutation calling in clinical samples, we used both the matched tumor and normal. We required that each mutation be seen in the tumor and not in the normal in order for the mutation to be considered. Processing of BAM files was automated using a Snakemake[52] workflow (Supplementary Fig. 12).

Miredas Minimal Residual Disease Analysis Scripts

A suite of scripts (Miredas) was used for calling mutations and creating metrics files. In the Snakemake workflow, MiredasCollectErrorMetric uses the duplex BAM file to describe the number of errors and calculates errors per base sequenced. MiredasDetectFingerprint uses the duplex BAM file to call mutations and MiredasDetectFingerprintSsc uses the single-stranded BAM file to call mutations. This single-stranded output of MiredasDetectFingerprintSsc is used along with the duplex MiredasDetectFingerprint output to create DSC/SSC ratios.

VAF/Recall

Raw VAF was calculated using the single strand consensus BAMs as consensus bases are more reliable compared to raw sequenced bases and help correct for PCR bias. We opted to use the single strand consensus BAMs rather than the duplex BAMs as we wanted to retain the majority of sequenced reads - with duplex sequencing, more than 50% of reads can be lost due to support only being observed on one strand. For each site, a pileup was created from the single strand consensus BAM and read bases were compared to the called bases in the MAF file. Each base was categorized as reference (REF), alternate (ALT), or OTHER and the consensus family size (number of reads contributing to the consensus) was added to the site’s read counts. Raw VAF could then be calculated by comparing the number of ALT reads to the total reads (REF + ALT + OTHER) for each site. This raw VAF measurement is important for determining the efficiency of sequencing the ALT base, but may not be an accurate readout of true variant allele fraction due to PCR bias. To address this, we have included duplex (DSC) VAF in Supplementary Table 3, where duplex (DSC) VAF is calculated using the consensus duplex fragments rather than family size as used in raw VAF. To assess recall, the duplex consensus BAM files were used. Our consensus calling workflow gives source molecules the same family ID, so two samples from the same library have many overlapping molecules. Recall was calculated by looking at the overlap of duplex families between two samples (oftentimes a Conventional sample and a MAESTRO sample). See Supplementary Fig. 2B for an example.

Noise Filter

Four replicate negative controls were created from the same source library via reindexing as described in Creating Spike-in Dilution Series with Sheared gDNA. The replicates were captured using the 10,000 SNV MAESTRO panel. For each targeted site with ALT molecules present in any of the replicates, a DSC/SSC ratio was calculated by summing all ALT supporting duplexes and dividing by the total ALT supporting single strand consensus molecules. Targets with ALT duplexes present in more than one replicate were considered “shared” whereas targets with ALT duplexes present in a single replicate were marked as “exclusive”. A single DSC/SSC ratio was chosen that maximized the number of targets shared while minimizing the number of exclusive targets.

Probe Spike-in Experiment

Capture was performed using the 10,000 SNV MAESTRO panel on two replicate negative control samples (no spike-in) and compared to a replicate negative control with 10X the standard concentration of the 10,000 SNV panel (10X spike-in), and two additional negative controls with 1,000X the standard concentration of ten MAESTRO probes added prior to both post-capture PCRs (1,000X spike-in). These ten MAESTRO probes were selected randomly from the 10,000 SNV panel and synthesized by IDT. These spike-in conditions simulated the worst-case scenario that excess probe can create new mutant molecules by extending from real molecules, specifically during post-capture PCR (see Supplementary Fig. 4A for a schematic of this hypothesis).

Tumor Fraction Estimation

Methods for calculating tumor fraction were previously described[23,44] but some changes were made for use with MAESTRO. In a Conventional sample, the full wildtype and mutant diversity is available and can inform tumor fraction. This is important as our tumor fraction methods currently rely on first calculating allele fraction (ALT depth / total depth) for all sites. In MAESTRO samples, we often have full mutant diversity, but wildtype molecules have been depleted. Because enrichment is not perfect, for each panel we have some targets that retain the full diversity of wildtype. We currently leverage this imperfect enrichment to estimate what the total potential depth of the sample is (how many cells likely contributed to the cfDNA library). This estimated depth is applied to all targets which allows us to calculate allele fraction (without considering copy number alterations) and subsequently tumor fraction. Fig. 5B shows this strategy and how it compares to actual tumor fractions. We are aware that these methods are not perfect in their current state, but believe that advances in quality control (ie testing for a handful of germline SNPs to measure unique duplexes per loci) could further improve tumor fraction estimation from enriched samples.

43 in total

1. The propensity of individuals to deposit DNA and secondary transfer of low level DNA from individuals to inert surfaces.

Authors: Alex Lowe; Caroline Murray; Jonathan Whitaker; Gillian Tully; Peter Gill
Journal: Forensic Sci Int Date: 2002-09-10 Impact factor: 2.395

2. Universal noninvasive detection of solid organ transplant rejection.

Authors: Thomas M Snyder; Kiran K Khush; Hannah A Valantine; Stephen R Quake
Journal: Proc Natl Acad Sci U S A Date: 2011-03-28 Impact factor: 11.205

3. Detection of ultra-rare mutations by next-generation sequencing.

Authors: Michael W Schmitt; Scott R Kennedy; Jesse J Salk; Edward J Fox; Joseph B Hiatt; Lawrence A Loeb
Journal: Proc Natl Acad Sci U S A Date: 2012-08-01 Impact factor: 11.205

4. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease.

Authors: Timothy A Blauwkamp; Simone Thair; Judith C Wilber; Samuel Yang; Michael J Rosen; Lily Blair; Martin S Lindner; Igor D Vilfan; Trupti Kawli; Fred C Christians; Shivkumar Venkatasubrahmanyam; Gregory D Wall; Anita Cheung; Zoë N Rogers; Galit Meshulam-Simon; Liza Huijse; Sanjeev Balakrishnan; James V Quinn; Desiree Hollemon; David K Hong; Marla Lay Vaughn; Mickey Kertesz; Sivan Bercovici
Journal: Nat Microbiol Date: 2019-02-11 Impact factor: 17.745

5. Somatic mutations predict poor outcome in patients with myelodysplastic syndrome after hematopoietic stem-cell transplantation.

Authors: Rafael Bejar; Kristen E Stevenson; Bennett Caughey; R Coleman Lindsley; Brenton G Mar; Petar Stojanov; Gad Getz; David P Steensma; Jerome Ritz; Robert Soiffer; Joseph H Antin; Edwin Alyea; Philippe Armand; Vincent Ho; John Koreth; Donna Neuberg; Corey S Cutler; Benjamin L Ebert
Journal: J Clin Oncol Date: 2014-08-04 Impact factor: 44.544

6. The impact of maternal plasma DNA fetal fraction on next generation sequencing tests for common fetal aneuploidies.

Authors: Jacob A Canick; Glenn E Palomaki; Edward M Kloza; Geralyn M Lambert-Messerlian; James E Haddow
Journal: Prenat Diagn Date: 2013-05-31 Impact factor: 3.050

Review 7. Somatic mosaicism and neurodevelopmental disease.

Authors: Alissa M D'Gama; Christopher A Walsh
Journal: Nat Neurosci Date: 2018-10-22 Impact factor: 24.884

8. Measurement and clinical monitoring of human lymphocyte clonality by massively parallel VDJ pyrosequencing.

Authors: Scott D Boyd; Eleanor L Marshall; Jason D Merker; Jay M Maniar; Lyndon N Zhang; Bita Sahaf; Carol D Jones; Birgitte B Simen; Bozena Hanczaruk; Khoa D Nguyen; Kari C Nadeau; Michael Egholm; David B Miklos; James L Zehnder; Andrew Z Fire
Journal: Sci Transl Med Date: 2009-12-23 Impact factor: 17.956