Emil Christensen, Iver Nordentoft, Søren Vang, Karin Birkenkamp-Demtröder, Jørgen Bjerggaard Jensen, Mads Agerbæk, Jakob Skou Pedersen, Lars Dyrskjøt.
Abstract
Analysis of plasma cell-free DNA (cfDNA) may provide important information in cancer research, though the often small fraction of DNA originating from tumor cells makes the analysis technically challenging. Digital droplet PCR (ddPCR) has been utilized extensively because sufficient technical performance is easily achieved, but analysis is restricted to a few mutations. Next-generation sequencing (NGS) approaches have been optimized to provide comparable technical performance, especially with the introduction of unique identifiers (UIDs). However, the parameters influencing data quality when utilizing UIDs are not fully understood. In this study, we applied a targeted NGS approach to 65 plasma samples from bladder cancer patients. Laboratory and bioinformatics parameters were found to influence data quality when using UIDs. We successfully sequenced 249 unique DNA fragments on average per genomic position of interest using a 225 kb gene panel. Validation by NGS detected 24 of 38 mutations originally identified using ddPCR across several plasma samples. In addition, four mutations detected in associated tumor samples were detected using NGS, but not using ddPCR. Analysis of cfDNA from consecutively collected plasma samples from a bladder cancer patient indicated earlier detection of recurrence compared to radiographic imaging. The insights presented here may further the technical advancement of NGS-mediated cfDNA analysis.
Year: 2018 PMID: 29382943 PMCID: PMC5789978 DOI: 10.1038/s41598-018-20282-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Selection of frequently mutated genes in bladder cancer. (a) Upper panel: The fraction of patients with given numbers of mutations detected is displayed for an increasing number of genes frequently mutated in bladder cancer. Genes were ranked according to number of mutations per nucleotide. Lower panel: Size of the gene panel as a function of number of genes included. The red line indicates the number of genes selected for enrichment. (b) The fraction of patients with mutations as a function of mutation number per patient is shown for the training cohort used in (a) and for a local bladder cancer validation cohort. (c) Genes selected for enrichment. The TERT promoter region was added subsequently.
Figure 2. Targeted sequencing process and UID usage. (a) Graphical overview of the enrichment process applied in the targeted sequencing approach. Colored arrows represent target-specific enrichment probes and sequencing primers. (b) Illustration of sequencing adaptor content. The barcode is used for sample multiplexing. The UID is a stretch of six random nucleotides. (c) Reads with identical mapping positions are represented by individual lines and associated UIDs. Reads with identical UIDs are grouped and collapsed to high-confidence reads.
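The grouping and collapsing step described in Figure 2(c) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `collapse_uid_families`, the tuple layout of a read, the minimum family size, and the majority-vote agreement threshold are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def collapse_uid_families(reads, min_family_size=3, min_agreement=0.6):
    """Group reads by (chromosome, start, UID) and build one consensus per group.

    `reads` is an iterable of (chrom, start, uid, sequence) tuples.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    families = defaultdict(list)
    for chrom, start, uid, seq in reads:
        # Reads sharing mapping position and UID are treated as one family.
        families[(chrom, start, uid)].append(seq)

    consensus = {}
    for key, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to call a confident consensus
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            # Keep the majority base only if enough reads agree; otherwise mask it.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[key] = "".join(bases)
    return consensus
```

In this sketch a single discordant base within a family of three reads is outvoted, which is the sense in which a collapsed read is more trustworthy than any of its members. Real tools additionally account for sequencing errors inside the UID itself, which this sketch ignores.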
Figure 3. Data distribution for all sequenced plasma samples. (a) Total reads obtained and unique reads constructed are displayed for every sample. Lines are constructed from points for every million total reads and correspondingly constructed unique reads. (b) Correlation between unique reads and unique coverage. Unique reads were counted without considering UID errors using UMI Tools, thereby corresponding to the unique reads in panel (a).
Figure 4. PCR cycles in library amplification and associated UID family composition and efficiency. (a) Fraction of large UID families (>50 reads per family) for varying number of PCR cycles. Dots represent single samples; lines represent means per PCR cycle. (b) Fraction of small UID families (1–2 reads per family). (c) Fraction of optimal UID families (3–50 reads per family). (d) The mean size of UID families. (e) Total number of constructed and finalized UID families is presented as a fraction of total reads assigned per sample. (f) On-target fraction of UID families. (g) A number of samples subjected to fewer PCR cycles were split into two PCR amplifications. The upper panel shows the mean UID family size for standard and split PCR amplifications. The lower panel shows the same comparison for 21 PCR cycles and hence without bias introduced from number of PCR cycles. (h) Total number of constructed and finalized UID families is presented as a fraction of total reads assigned per sample relative to DNA input quantified using Qubit. Dashed lines represent linear models and shaded areas represent confidence intervals constructed per PCR cycle number. Dot size indicates PCR amplification split.
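The family-size categories used in Figure 4(a) to (d) can be computed from a list of reads-per-family counts. The sketch below uses the size boundaries given in the caption (1–2 small, 3–50 optimal, >50 large); the function name `summarize_family_sizes` and the returned dictionary layout are hypothetical.

```python
def summarize_family_sizes(sizes):
    """Return the fraction of small, optimal and large UID families and the mean size.

    `sizes` is a non-empty list with the number of reads in each UID family.
    Boundaries follow the Figure 4 caption: 1-2 small, 3-50 optimal, >50 large.
    """
    n = len(sizes)
    if n == 0:
        raise ValueError("at least one UID family is required")
    small = sum(1 for s in sizes if 1 <= s <= 2)
    optimal = sum(1 for s in sizes if 3 <= s <= 50)
    large = sum(1 for s in sizes if s > 50)
    return {
        "small": small / n,
        "optimal": optimal / n,
        "large": large / n,
        "mean_size": sum(sizes) / n,
    }
```

Tracking these fractions per sample is one way to see whether additional PCR cycles are spent on oversized families instead of yielding new unique fragments.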
Figure 5. Comparison of targeted NGS and ddPCR. A subset of plasma samples subjected to targeted NGS was analyzed previously and mutations were detected using ddPCR assays. (a) Allele frequencies obtained using NGS compared to allele frequencies estimated using ddPCR for identical genomic positions (mutations). The dashed line represents a linear model. The color code for NGS coverage is shown to the right. (b) Obtained unique coverage (NGS) and estimated allele frequencies (ddPCR) are presented for identical genomic positions (mutations). ddPCR is used as the gold standard and detection status refers to NGS data. The dashed line represents the unique coverage theoretically necessary to discover a mutation using NGS (based on the estimated allele frequency using ddPCR).
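The caption does not state how the theoretically necessary unique coverage in Figure 5(b) was calculated. One simple reading, assumed here purely for illustration, is that the expected number of mutant fragments (coverage times allele frequency) must reach a minimum count. The sketch below implements that rule of thumb together with a binomial detection probability; both function names and the `min_mutant_reads` parameter are hypothetical.

```python
import math


def required_unique_coverage(allele_frequency, min_mutant_reads=1):
    """Smallest coverage at which the expected mutant read count reaches the minimum.

    Assumed rule of thumb: coverage * allele_frequency >= min_mutant_reads.
    """
    if not 0 < allele_frequency <= 1:
        raise ValueError("allele_frequency must be in (0, 1]")
    return math.ceil(min_mutant_reads / allele_frequency)


def detection_probability(coverage, allele_frequency, min_mutant_reads=1):
    """P(at least min_mutant_reads mutant fragments) under a binomial sampling model."""
    p_fewer = sum(
        math.comb(coverage, i)
        * allele_frequency**i
        * (1 - allele_frequency) ** (coverage - i)
        for i in range(min_mutant_reads)
    )
    return 1 - p_fewer
```

Under this assumption, the average unique coverage of 249 reported in the abstract corresponds to an expected single mutant fragment at an allele frequency of roughly 1/249, or about 0.4%. The binomial function shows that reaching the expected-count threshold does not guarantee detection, since sampling is stochastic.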
Figure 6. Disease course monitoring using cfDNA analysis for a patient with muscle-invasive bladder cancer. (a) Eight plasma samples were subjected to NGS during the disease course of the patient. Dots and lines represent observed variant allele frequencies for all identified mutations. Narrow horizontal lines represent the estimated detection limit at the given genomic position in the given sample. Clinical events based on radiographic imaging are marked at the top. Shaded areas represent chemotherapy treatments. (b) Distribution of the estimated detection limit for all enriched genomic positions in all patients and the specific genomic positions shown in (a).