Miguel Alcaide, Stephen Yu, Jordan Davidson, Marco Albuquerque, Kevin Bushell, Daniel Fornika, Sarah Arthur, Bruno M Grande, Suzan McNamara, Mathilde Couetoux du Tertre, Gerald Batist, David G Huntsman, Luca Cavallone, Adriana Aguilar, Mark Basik, Nathalie A Johnson, Rebecca J Deyell, S Rod Rassekh, Ryan D Morin.
Abstract
Ultrasensitive methods for rare allele detection are critical to leverage the full potential offered by liquid biopsies. Here, we describe a novel molecular barcoding method for the precise detection and quantification of circulating tumor DNA (ctDNA). The major benefits of our design include straightforward and cost-effective production of barcoded adapters to tag individual DNA molecules before PCR and sequencing, and better control over cross-contamination between experiments. We validated our approach in a cohort of 24 patients with a broad spectrum of cancer diagnoses by targeting and quantifying single-nucleotide variants (SNVs), indels and genomic rearrangements in plasma samples. By using personalized panels targeting a priori known mutations, we demonstrate comprehensive error-suppression capabilities for SNVs and detection thresholds for ctDNA below 0.1%. We also show that our semi-degenerate barcoded adapters hold promise for noninvasive genotyping in the absence of tumor biopsies and monitoring of minimal residual disease in longitudinal plasma samples. The benefits demonstrated here include broad applicability, flexibility, affordability and reproducibility in the research and clinical settings.
Year: 2017 PMID: 28874686 PMCID: PMC5585219 DOI: 10.1038/s41598-017-10269-2
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overview of the experimental workflow to track ctDNA in cancer patients using semi-degenerate barcoded adapters and personalized panels of biotinylated baits. Biotinylated baits targeting somatic mutations previously identified via the sequencing of tumor/liquid biopsies and matched normal DNA samples are generated “in-house” or ordered from commercial manufacturers (1). Next, libraries are built using the cfDNA isolated from liquid biopsy specimens (2). End-repaired and A-tailed cfDNA fragments are ligated with partially complementary double-stranded barcoded adapters and then PCR-amplified with 6-nucleotide dual-indexed primers that provide P5 and P7 Illumina adapter sequences. Our custom adapters are formed by annealing two oligonucleotides that harbor non-complementary tri-nucleotide tags for either the plus (5′–3′) or the minus (3′–5′) strand. Different nucleotides within the fixed tags are represented by colours (A: red; C: blue; T: green; G: orange). This adapter design also includes a semi-degenerate and potentially complementary 12-nucleotide barcode sequence (5′-WSMRWSYWKMWW-3′ in the plus strand; 5′-WWKMWRSWYKSW-3′ in the minus strand). During the annealing of the two oligonucleotides a perfect complementary match can occur (right adapter) but, more commonly, hybridizations include annealing mispairings (left adapter). Solid red squares represent either A-T or T-A base pairings (W vs W); solid yellow squares represent either G-C or C-G base pairings (S vs S); solid blue squares represent either C-G or A-T base pairings (M vs K); solid orange squares represent G-C or A-T base pairings (R vs Y); solid green squares represent C-G or T-A base pairings (Y vs R); and solid violet squares represent G-C or T-A base pairings (K vs M). Annealing mispairings (see left adapter) are denoted by the presence of the same base at equivalent positions in both strands.
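The relationship between the two barcode patterns can be checked with a short sketch. It assumes the standard IUPAC two-base codes (W = A/T, S = C/G, M = A/C, K = G/T, R = A/G, Y = C/T); under that assumption the minus-strand pattern is the reverse complement of the plus-strand pattern, and each 12-position pattern admits 2^12 = 4,096 concrete sequences. The function names are illustrative and are not taken from the published pipeline.

```python
from itertools import product

# Standard IUPAC two-base ambiguity codes used in the adapter design.
IUPAC = {"W": "AT", "S": "CG", "M": "AC", "K": "GT", "R": "AG", "Y": "CT"}
# Complement of each ambiguity code (W and S are self-complementary).
CODE_COMPLEMENT = {"W": "W", "S": "S", "M": "K", "K": "M", "R": "Y", "Y": "R"}
BASE_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

PLUS_PATTERN = "WSMRWSYWKMWW"   # plus-strand barcode, written 5'->3'
MINUS_PATTERN = "WWKMWRSWYKSW"  # minus-strand barcode, written 5'->3'


def revcomp_pattern(pattern):
    """Reverse complement of a degenerate (IUPAC) pattern."""
    return "".join(CODE_COMPLEMENT[code] for code in reversed(pattern))


def revcomp(seq):
    """Reverse complement of a concrete A/C/G/T sequence."""
    return "".join(BASE_COMPLEMENT[base] for base in reversed(seq))


def expand(pattern):
    """Yield every concrete sequence compatible with a degenerate pattern."""
    for bases in product(*(IUPAC[code] for code in pattern)):
        yield "".join(bases)


def mispaired_positions(plus_barcode, minus_barcode):
    """Plus-strand positions that are not Watson-Crick paired after annealing."""
    partner = revcomp(minus_barcode)
    return [i for i, (a, b) in enumerate(zip(plus_barcode, partner)) if a != b]


if __name__ == "__main__":
    print(revcomp_pattern(PLUS_PATTERN) == MINUS_PATTERN)
    print(sum(1 for _ in expand(PLUS_PATTERN)))
```

Because every position offers only two bases, a mispairing at a given position always places the same base on both strands, which is how the caption describes annealing mispairings.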
Libraries are then subjected to two rounds (ideally) of hybridization capture using personalized panels of biotinylated baits, and the final enriched libraries are sequenced on Illumina platforms (3). The bioinformatic analysis of the NGS reads involves the filtering of on-target reads, merging of paired reads with overlapping ends and generation of consensus sequences according to a de novo assembly approach that allows for a maximum of 1% mismatches and a maximum gap size of 1 bp (4). In essence, the two parental strands derived from every single cfDNA molecule generate independent PCR families. Consensus sequences are generated from each PCR family with at least three independent reads. Consensus sequences from independent strand orientations are considered to derive from the same cfDNA molecule if they share the same start/end positions in the reference sequence and if they do not show more than 2 mismatches in the last 6 semi-degenerate barcode positions flanking the ligation site. Duplex sequencing allows correcting any strand-specific errors or variants deriving from DNA damage. After sequencing, solid red squares represent W degenerate positions (i.e. either A or T); solid yellow squares = S; solid blue squares = M; solid orange squares = R; solid green squares = Y; solid violet squares = K. Annealing mismatches are denoted by white squares and indicated by asterisks. Black squares represent discrepancies with respect to the reference sequence. Consensus sequences are finally mapped against the reference sequence (5) and targeted genomic positions are screened for duplex support of ctDNA and its abundance (6). Only variants independently supported by the consensus sequences of both parental strands are considered high-confidence.
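The family and duplex rules in this caption can be illustrated with a minimal sketch. It is a simplification and not the authors' pipeline: reads are grouped by exact barcode and coordinates rather than by de novo assembly, the consensus is a plain majority vote over equal-length reads, and barcodes from both strands are assumed to have been oriented already so that they can be compared position by position. The record fields (`strand`, `start`, `end`, `barcode`, `seq`) are hypothetical names.

```python
from collections import Counter, defaultdict

MIN_FAMILY_SIZE = 3      # reads required per PCR family (from the caption)
BARCODE_TAIL = 6         # barcode positions nearest the ligation site
MAX_TAIL_MISMATCHES = 2  # tolerated mismatches within that tail


def consensus(reads):
    """Majority-vote consensus over equal-length reads (simplified)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def build_families(records):
    """Group reads into PCR families and keep those with enough reads."""
    families = defaultdict(list)
    for rec in records:
        key = (rec["strand"], rec["start"], rec["end"], rec["barcode"])
        families[key].append(rec["seq"])
    return {key: consensus(seqs) for key, seqs in families.items()
            if len(seqs) >= MIN_FAMILY_SIZE}


def same_molecule(key_a, key_b):
    """Decide whether two family keys come from opposite strands of one molecule."""
    strand_a, start_a, end_a, barcode_a = key_a
    strand_b, start_b, end_b, barcode_b = key_b
    if strand_a == strand_b or (start_a, end_a) != (start_b, end_b):
        return False
    tail_a, tail_b = barcode_a[-BARCODE_TAIL:], barcode_b[-BARCODE_TAIL:]
    mismatches = sum(x != y for x, y in zip(tail_a, tail_b))
    return mismatches <= MAX_TAIL_MISMATCHES
```

A variant would then count as duplex-supported only when the consensus sequences of two families for which `same_molecule` returns `True` both carry it.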
Disease diagnosis and clinical enrollment of the 24 patients investigated in this study.
| Patient ID | Disease Diagnosis | Clinical Trial | DNA Baits | OT Reads |
|---|---|---|---|---|
| NB-pt01 | NEUROBLASTOMA | POG | Pt-Specif. (2X) | 153 K, 201 K, 279 K (barcoded adapters); 250 K, 271 K, 412 K (standard adapters) |
| NB-pt02 | NEUROBLASTOMA | POG | Pt-Specif. (1X) | 1 K |
| NB-pt03 | NEUROBLASTOMA | POG | Pt-Specif. (1X) | 14 K |
| NB-pt04 | NEUROBLASTOMA | POG | Pt-Specif. (2X) | 1,400 K |
| OVC-pt01 | OVARIAN GRANULOSA | POG | Pt-Specif. (2X) | 2,400 K (V1); 1,400 K (V2) |
| OVC-pt02 | OVARIAN CARCINOMA | POG | Pt-Specif. (2X) | 1,800 K |
| OSS-pt01 | OSTEOSARCOMA | POG | Pt-Specif. (2X) | 450 K |
| OSS-pt02 | OSTEOSARCOMA | POG | Pt-Specif. (1X) | 20 K |
| NMC-pt01 | NUT MIDLINE CARCINOMA | POG | Pt-Specif. (1X) | 3 K |
| CPG-pt01 | CRANIOPHARYNGIOMA | POG | Pt-Specif. (2X) | 320 K |
| PIB-pt01 | PINEOBLASTOMA | POG | Pt-Specif. (2X) | 2,500 K |
| IFB-pt01 | INFANTILE FIBROSARCOMA | POG | Pt-Specif. (2X) | 1,280 K |
| ASL-pt01 | ANGIOSARCOMA OF LIVER | POG | Pt-Specif. (1X) | 77 K |
| SAR-pt01 | SARCOMA | POG | Pt-Specif. (1X) | 1 K |
| ESR-pt01 | EWING SARCOMA | POG | Pan-Cancer. (1X) | 400 K |
| MGC-pt01 | MALIGNANT GRANULAR CELL TUMOUR | POG | Pt-Specif. (2X) | 770 K |
| HGL-pt01 | HODGKIN LYMPHOMA | POG | Pan-Cancer + Dis-Specif. (1X) | 2,000 K |
| DLBCL-pt01 | DIFFUSE LARGE B-CELL LYMPHOMA | POG | Pt-Specif. (2X) | 120 K |
| DLBCL-pt015 | DIFFUSE LARGE B-CELL LYMPHOMA | Q-CROC-02 | Dis-Specif. (1X) | 1,600 K |
| ALL-pt01 | ACUTE LYMPHOBLASTIC LEUKEMIA | POG | Pan-Cancer | 650 K |
| CCR-pt029 | COLORECTAL CANCER | Q-CROC-01 | Pan-Cancer | 4,800 K |
| CCR-pt049 | COLORECTAL CANCER | Q-CROC-01 | Pan-Cancer | 3,600 K |
| Neo-02 | BREAST CANCER | Q-CROC-03 | | 4,500 K |
| Neo-027 | BREAST CANCER | Q-CROC-03 | | 2,400 K |
This table shows whether the enrichment of the cell-free DNA was accomplished using patient-specific sets of biotinylated DNA baits (Pt-Specif.), panels of DNA probes spanning the coding regions of a single gene (TP53), disease-specific panels (Dis-Specif.) or broad panels of probes targeting several cancer-related genes (Pan-Cancer; see methods). The “DNA Baits” column also indicates whether one (1X) or two (2X) rounds of hybridization capture were carried out during each corresponding targeted enrichment experiment. The approximate number of on-target (OT) reads obtained for each patient is shown in the far right column (K = 10³). The multiple values for patient NB-pt01 correspond to the different library replicates built from the plasma of this patient (first three values: libraries built with barcoded adapters; last three values: libraries built with standard sequencing adapters).
Number of unique cfDNA molecules mapping to five independent genomic positions (Loc1 to 5; based on unique mapping coordinates of library fragments) in the three library replicates (R1, R2, R3) that were built with either semi-degenerate barcoded adapters (BarAd) or standard sequencing adapters (StdAd) from the same amount of input cfDNA (2.3 ng) in patient NB-pt01.
| Library | Loc1 | Loc2 | Loc3 | Loc4 | Loc5 |
|---|---|---|---|---|---|
| BarAd - R1 | 280 (0.379) | 394 (0.076) | 457 (0.543) | 418 (0.029) | 396 (0.215) |
| BarAd - R2 | 228 (0.338) | 411 (0.078) | 469 (0.559) | 450 (0.062) | 460 (0.171) |
| BarAd - R3 | 294 (0.259) | 439 (0.032) | 492 (0.583) | 438 (0.046) | 437 (0.158) |
| StdAd - R1 | 266 (0.349) | 358 (0.047) | 470 (0.521) | 413 (0.041) | 396 (0.186) |
| StdAd - R2 | 293 (0.307) | 386 (0.054) | 520 (0.544) | 405 (0.039) | 408 (0.200) |
| StdAd - R3 | 305 (0.308) | 381 (0.063) | 447 (0.544) | 412 (0.034) | 411 (0.200) |
| Two-tailed p-value | 0.47 | 0.054 | 0.84 | 0.079 | 0.22 |
Estimated allele frequencies for mutant DNA circulating in the plasma are indicated in parentheses. For each triplicate series, one of the rows (in bold) shows average values. We did not find statistically significant differences concerning recovery efficiencies of cfDNA when comparing the number of unique molecules retrieved by either barcoded or standard sequencing adapters (bottom row). Genomic coordinates for the five single-nucleotide variants targeted in this patient are provided in Table S1.
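As a worked illustration of how the recovery of unique molecules could be compared between adapter types, the sketch below runs a two-tailed paired t-test on the replicate counts listed in the table. The source does not name the statistical test that was used, so the choice of a paired test (pairing replicate R1 with R1, and so on) is an assumption, and the values it produces need not match the reported bottom row. With three pairs there are two degrees of freedom, for which the t distribution has a closed-form CDF, so no external library is required.

```python
from math import sqrt
from statistics import mean, stdev

# Unique-molecule counts per locus for replicates R1-R3, copied from the table.
barcoded = {
    "Loc1": [280, 228, 294], "Loc2": [394, 411, 439], "Loc3": [457, 469, 492],
    "Loc4": [418, 450, 438], "Loc5": [396, 460, 437],
}
standard = {
    "Loc1": [266, 293, 305], "Loc2": [358, 386, 381], "Loc3": [470, 520, 447],
    "Loc4": [413, 405, 412], "Loc5": [396, 408, 411],
}


def paired_t_p_value_n3(x, y):
    """Two-tailed p-value of a paired t-test for exactly three pairs (df = 2)."""
    if len(x) != 3 or len(y) != 3:
        raise ValueError("this closed form only holds for three pairs")
    diffs = [a - b for a, b in zip(x, y)]
    # t statistic: mean difference over its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(3))
    # For 2 degrees of freedom the t CDF is F(t) = 1/2 + t / (2 * sqrt(2 + t^2)),
    # so the two-tailed p-value is 1 - |t| / sqrt(2 + t^2).
    return 1 - abs(t) / sqrt(2 + t * t)


if __name__ == "__main__":
    for locus in barcoded:
        p = paired_t_p_value_n3(barcoded[locus], standard[locus])
        print(f"{locus}: p = {p:.3f}")
```

For other replicate numbers the closed form no longer applies and a general t distribution (for example from a statistics library) would be needed.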
Tracking ctDNA in the plasma of several cancer patients using semi-degenerate barcoded adapters and personalized panels of biotinylated baits.
| Patient ID | Mutant ssDNA | ds Support? | ctDNA Loci | Av. ssDNA Cov | VAF Range (ssDNA) |
|---|---|---|---|---|---|
| NB-pt01-R1 | 180 | >40 molecules | 5/5 | 426.2 | 0.029–0.543 |
| NB-pt01-R2 | 211 | >50 molecules | 5/5 | 501.6 | 0.059–0.559 |
| NB-pt01-R3 | 216 | >50 molecules | 5/5 | 571.8 | 0.027–0.583 |
| NB-pt02 | 377 | 22 molecules | 1/1 | 998 | 0.378 |
| NB-pt03 | 138 (T) | 0 molecules | 3/5 | 67/1,097 | 0.107–0.202 |
| NB-pt04 | >1,000 | >100 molecules | 4/5 | 1300 | 0.20–0.309 |
| OVC-pt01 (V1) | 2 | 0 molecules | 1/6 | 2544 | 0.0006 |
| OVC-pt01 (V2) | 14928 | >2,000 molecules | 4/6 | 13192 | 0.255–0.298 |
| OVC-pt02 | 215 | >50 molecules | 5/5 | 2300 | 0.009–0.025 |
| OSS-pt01 | 11 | 4 molecules | 3/4 | 4839 | 0.0005–0.001 |
| OSS-pt02* | 0 | 0 | 0/8 | 1660 | 0 |
| NMC-pt01 | 1 (T) | 0 | 1/1 | 786 | 0.0013 |
| CPG-pt01* | 0 | 0 | 0/2 | 670 | 0 |
| PIB-pt01 | 5 | 2 molecules | 1/1 | 6843 | 0.0007 |
| IFB-pt01 | 40 | 9 molecules | 1/1 | 6842 | 0.0058 |
| ASL-pt01 | 17 | 1 molecule | 4/5 | 833 | 0.001–0.008 |
| SAR-pt01 | 2 (T) | 0 molecules | 1/1 | 635 | 0.0016 |
| MGC-pt01 | 6 | 2 molecules | 1/1 | 5384 | 0.001 |
| DLBCL-pt01 | 260 | >30 molecules | 4/5 | 334 | 0.223–0.290 |
This table shows the total number of consensus sequences derived from a single strand that support the presence of ctDNA at targeted loci, whether or not a ctDNA call has duplex sequencing support, and the fraction of targeted sites where ctDNA has been detected. The last two columns show the average number of ssDNA consensus sequences spanning each of the targeted sites and the range of variant allele frequencies (VAF) detected in plasma. We mostly targeted SNVs and small indels but also genomic translocations (indicated by a “T”). Patient NB-pt03 displays two values in the Av. ssDNA Cov column because two independent experiments were carried out to target SNVs and a gene fusion in the plasma.
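For rows with a single targeted SNV locus, the reported VAF appears consistent with the number of mutant ssDNA consensus sequences divided by the ssDNA coverage at that site. The sketch below makes that arithmetic explicit; the relationship is an assumption inferred from the table values rather than a formula stated in the source, and it does not apply directly to multi-locus rows, where per-locus counts differ from the averaged coverage.

```python
def vaf(mutant_consensus, coverage):
    """Variant allele frequency as mutant consensus sequences over total coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return mutant_consensus / coverage


# (mutant ssDNA consensus sequences, ssDNA coverage) for single-locus rows.
single_locus_rows = {
    "NB-pt02": (377, 998),
    "IFB-pt01": (40, 6842),
    "PIB-pt01": (5, 6843),
}

if __name__ == "__main__":
    for patient, (mutant, coverage) in single_locus_rows.items():
        print(f"{patient}: VAF = {vaf(mutant, coverage):.4f}")
```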
Figure 2. (A) Distribution of the number of nucleotide differences between two random unique molecule identifiers (UIDs) attached to one of the parental strands of dsDNA molecules. (B) Distribution of the number of base mispairings or annealing artifacts along the semi-degenerate barcoded region that arise during adapter annealing. (C) Annealing artifacts are less common within the last six nucleotides of each barcode (i.e. those preceding the ligation site). (D) Distribution of base composition (consensus A: red; consensus C: blue; consensus G: yellow; consensus T: green) and frequency of annealing artifacts (black bars) across every position of the 12-nucleotide semi-degenerate barcode of each adapter molecule. Some of the positions show skewed base ratios that can be attributed to the automated mixing method for randomization during manufacturing. Better ratios for certain semi-degenerate sites might be achieved by selecting the hand mixing method (see methods). The frequency of misannealing artifacts decreases towards the ligation site. (E) Distribution of the number of mismatches between two random UIDs when only the last six barcode positions preceding the ligation site are considered. These data were collected from the three library replicates built from patient NB-pt01 plasma. The X axis shows the number of mismatches or mispairings observed between two given barcode sequences (barcodes attached to single strands originating from independent DNA molecules in A and E, and barcodes attached to each of the two parental strands of double-stranded DNA molecules in B and C). The Y axis shows the percentage of comparisons with that number of mismatches or base mispairings.
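As a point of comparison for panels A and E, the sketch below computes the mismatch distribution expected under an idealized model in which every semi-degenerate position is an independent, perfectly balanced choice between two bases. Under that assumption the number of mismatches between two random barcodes follows a binomial distribution with success probability 0.5. The skewed base ratios described for panel D mean the observed distributions will deviate from this idealization.

```python
from math import comb


def mismatch_distribution(n_positions):
    """P(k mismatches), k = 0..n, for two random barcodes under a fair 2-base model."""
    total = 2 ** n_positions
    return [comb(n_positions, k) / total for k in range(n_positions + 1)]


full_barcode = mismatch_distribution(12)  # idealized counterpart of panel A
last_six = mismatch_distribution(6)       # idealized counterpart of panel E

# Probability that two unrelated barcodes differ at no more than 2 of the
# last 6 positions, i.e. would pass a "<= 2 mismatches" barcode criterion.
p_pass_by_chance = sum(last_six[:3])

if __name__ == "__main__":
    print([round(p, 4) for p in full_barcode])
    print(round(p_pass_by_chance, 5))
```

In this idealized model a barcode-only comparison over six positions is permissive, which is consistent with the duplex criterion also requiring identical start/end positions in the reference sequence.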
Figure 3. Histogram of PCR family size distribution for five cfDNA libraries built with 12-nucleotide semi-degenerate barcode adapters. The three library replicates for NB-pt01 plasma (R1, R2 and R3) are depicted by different tones of blue. The libraries built from OSS-pt01 and OVC-pt01 (V2) plasmas are indicated by red and green colors, respectively. The X axis reflects different PCR family size categories or bins and the Y axis shows the proportion of each of these categories in a given library. Singletons and consensus sequences generated from just two PCR duplicates are not included in this plot.
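The quantity plotted here can be sketched as follows, assuming a list with one PCR family size (read count) per family. Families of size one or two are dropped, as in the figure, and the proportions are computed over the remaining families. The bin boundaries used in the figure are not given in the caption, so this sketch treats every family size as its own category, which is an assumption.

```python
from collections import Counter


def family_size_proportions(family_sizes, min_size=3):
    """Proportion of families at each size, ignoring families below min_size."""
    kept = [size for size in family_sizes if size >= min_size]
    counts = Counter(kept)
    total = len(kept)
    # An empty input (or one with only small families) yields an empty dict.
    return {size: counts[size] / total for size in sorted(counts)}


if __name__ == "__main__":
    example_sizes = [1, 1, 2, 3, 3, 4, 5]  # made-up sizes for illustration only
    print(family_size_proportions(example_sizes))
```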
Figure 4. Landscape of background errors in two diverse cfDNA libraries (OVC-pt01 (V2), blue bars; OSS-pt01, red bars). Panels A and B list the type of errors that are corrected at the ssDNA or dsDNA consensus phases, respectively. The Y axis shows different types of presumably false variants and the X axis shows the observed frequency for each of these putative artifacts.
Figure 5. Monitoring of ctDNA abundance across longitudinal plasma series drawn from two colorectal cancer patients (CCR-pt029 and CCR-pt049). V1 relates to the plasma sample drawn at the beginning of the therapeutic treatment. Plasma samples collected at the V2 and V3 time points show a decrease in ctDNA levels, in agreement with clinical response. The two patients relapsed after several weeks, which is consistent with the rise in ctDNA levels observed in the plasma sample collected at the V4 time point. The Y axis indicates the variant allele frequency (VAF) of the somatic mutations quantified in plasma. These libraries were constructed with 12-nucleotide semi-degenerate barcoded adapters and enriched with a panel targeting the exons of 128 cancer-related genes.