Quan Peng1, Chang Xu1, Daniel Kim1, Marcus Lewis1, John DiCarlo1, Yexun Wang2.
Abstract
For specific detection of somatic variants at very low levels, artifacts from the NGS workflow have to be eliminated. Various approaches that use unique molecular identifiers (UMIs) to analytically remove NGS artifacts have been described. Among them, Duplex-seq has been shown to be highly effective because it leverages the sequence complementarity of the two DNA strands. However, all published Duplex-seq implementations so far have required paired-end sequencing, and combining duplex sequencing with target enrichment has required lengthy hybridization-based enrichment. We developed a simple protocol that enables the retrieval of duplex UMIs in multiplex PCR-based enrichment and sequencing. Using this protocol and reference materials, we demonstrated accurate detection of known SNVs at 0.1-0.2% allele fractions, aided by duplex UMIs. We also observed that low-level base substitution artifacts can be introduced when preparing in vitro DNA reference materials, which could limit their utility as a benchmarking tool for variant detection at very low levels. Our new targeted sequencing method offers the benefit of using duplex UMIs to remove NGS artifacts in a much simpler workflow than existing targeted duplex sequencing methods.
Year: 2019 PMID: 30886209 PMCID: PMC6423013 DOI: 10.1038/s41598-019-41215-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Design of the single-end duplex-UMI adapter. (a) Schematic showing why previously described duplex UMI sequencing is not compatible with single-primer PCR enrichment. The newly formed amplicons (dashed lines) do not contain enough information to be grouped into their corresponding duplexes. (b) Design and synthesis of the single-end duplex-UMI adapter. Both the UMI and the strand barcode (together forming the duplex UMI) are contained within a single adapter molecule and can be sequenced within one read. (c) Depiction of how duplex amplicons are encoded during the first few PCR enrichment cycles.
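The grouping idea behind panels (b) and (c) can be illustrated with a minimal sketch. It is not the authors' pipeline: the read layout `(umi, strand, base)`, the strand labels `"top"` and `"bottom"`, and the simple majority vote are all assumptions made for illustration, and bases are assumed to be already reported relative to the reference strand.

```python
from collections import Counter, defaultdict


def group_duplex(reads):
    """Group reads by UMI, then by strand barcode within each UMI family.

    `reads` is an iterable of (umi, strand, base) tuples. This tuple layout
    and the strand labels "top"/"bottom" are hypothetical.
    """
    families = defaultdict(lambda: defaultdict(list))
    for umi, strand, base in reads:
        families[umi][strand].append(base)
    return families


def duplex_call(family):
    """Return the base supported by both strands, or None otherwise.

    A family counts as duplex-supported only if reads from both strands are
    present and the per-strand majority bases agree.
    """
    if set(family) != {"top", "bottom"}:
        return None  # single-plex family: only one strand was observed
    consensus = {
        strand: Counter(bases).most_common(1)[0][0]
        for strand, bases in family.items()
    }
    if consensus["top"] == consensus["bottom"]:
        return consensus["top"]
    return None  # strands disagree: likely a single-strand artifact


# Toy reads at one position (made-up values, not data from the paper).
reads = [
    ("AAC", "top", "T"), ("AAC", "top", "T"), ("AAC", "bottom", "T"),
    ("GGT", "top", "T"), ("GGT", "top", "T"),
    ("CCA", "top", "T"), ("CCA", "bottom", "C"),
]
families = group_duplex(reads)
calls = {umi: duplex_call(fam) for umi, fam in families.items()}
```

In this toy input only the `AAC` family has agreeing reads from both strands, so it is the only one that yields a duplex-supported base; the other two return `None`.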
Figure 2. Analysis of base artifacts from sequencing results. (a) Observed panel-wide mean error rates from single-UMI consensus reads for 12 different base substitution types from various DNA inputs. Here the base substitution type refers to the expected versus the alternative nucleotide incorporated by the DNA polymerase. For the same locus, the nucleotide expected to be incorporated differs depending on which DNA strand is used as the template. The error bars indicate 95% confidence intervals (CIs) of the error rates. The CIs were calculated using the Wilson method, assuming a binomial distribution. (b,c) One possible mechanism (b) and evidence (c) for how artifacts were introduced during the end-repair process. We performed a two-sample t-test to compare the means of the two groups (25 real mutation sites and 29 false positives with duplex UMI support). The one-sided p-value was 0.12, which is not significant at an alpha level of 0.05.
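For readers unfamiliar with the Wilson interval mentioned in panel (a), the following is a minimal sketch of the standard Wilson score formula for a binomial proportion. The function name and the example counts are illustrative assumptions, not values from the paper.

```python
import math


def wilson_interval(errors, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    `errors` is the number of erroneous bases, `total` the number of bases
    examined, and z=1.96 corresponds to an approximate 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = errors / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts: 5 substitution errors among 1,000,000 consensus bases.
low, high = wilson_interval(5, 1_000_000)
```

Unlike the simpler normal-approximation interval, the Wilson interval stays within [0, 1] and remains informative when the observed error count is zero or very small, which is the regime relevant to low-level artifacts.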
Summary of the sequencing runs for various DNA reference materials.
| Sample | 0.2% DNA reference | 0.1% DNA reference | Seraseq® ctDNA Wild Type | Seraseq® 0.25% ctDNA |
|---|---|---|---|---|
| Input amount | 160 ng | 160 ng | 40 ng | 40 ng |
| Number of enrichment primers | 192 | 192 | 72 | 72 |
| Size of target region | 17,859 bp | 17,859 bp | 5,831 bp | 5,831 bp |
| Total read pairs | 14,428,738 | 23,761,224 | 3,469,754 | 4,535,556 |
| Read pairs on-target | 12,243,309 | 20,251,846 | 2,484,623 | 3,456,269 |
| On-target rate | 85% | 85% | 72% | 76% |
| Mean read pair/primer | 63,827 | 105,665 | 34,048 | 47,363 |
| Mean UMI/primer | 14,221 | 16,928 | 7,778 | 8,727 |
| Mean Duplex UMI/primer | 3,322 | 4,075 | 2,777 | 3,473 |
| Mean read/UMI | 4.5 | 6.2 | 4.4 | 5.5 |
| % primer ≥0.2X mean read depth | 99.5% | 99.5% | 98.6% | 98.6% |
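As a quick consistency check, the derived rows of the table can be recomputed from the reported counts. The sketch below copies its inputs from the table; the dictionary layout and function name are illustrative. Note that a ratio of per-primer means need not equal the reported mean of per-UMI read counts exactly, so small differences in the last digit are possible.

```python
# Values copied from the table above; the key names are illustrative.
runs = {
    "0.2% DNA reference": dict(total=14_428_738, on_target=12_243_309,
                               reads_per_primer=63_827, umi_per_primer=14_221),
    "0.1% DNA reference": dict(total=23_761_224, on_target=20_251_846,
                               reads_per_primer=105_665, umi_per_primer=16_928),
    "Seraseq ctDNA Wild Type": dict(total=3_469_754, on_target=2_484_623,
                                    reads_per_primer=34_048, umi_per_primer=7_778),
    "Seraseq 0.25% ctDNA": dict(total=4_535_556, on_target=3_456_269,
                                reads_per_primer=47_363, umi_per_primer=8_727),
}


def summarize(run):
    """Recompute the on-target rate (%) and the mean reads per UMI."""
    return {
        "on_target_pct": round(100 * run["on_target"] / run["total"]),
        "reads_per_umi": round(run["reads_per_primer"] / run["umi_per_primer"], 1),
    }


summary = {name: summarize(run) for name, run in runs.items()}
for name, stats in summary.items():
    print(f"{name}: {stats['on_target_pct']}% on-target, "
          f"{stats['reads_per_umi']} reads/UMI")
```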
Figure 3. Sensitivity and specificity for detecting 0.1% and 0.2% SNVs in the DNA reference made in-house. ROC curves for 0.1% SNVs (a) and 0.2% SNVs (b). “Duplex” represents variant calling performance using both single-plex and duplex UMI information. “Single-plex” represents variant calling performance when all UMIs are treated as single-plex.
Figure 4. Detecting 0.25% variants in a commercial NGS reference. (a) Sensitivity and specificity for detecting 0.25% SNVs and indels in the Seraseq® ctDNA mix. Because the target region is small, one false positive corresponds to 171 false positives per Mbp. (b) Distribution of allele fractions for all variants seen in the Seraseq® 0.25% ctDNA mix. Variants were identified from UMI consensus reads, and no variant calling threshold was applied. Most of the false variants were below 0.1% allele fraction and were well separated from the real variants. (c) Around 80% of the low-level false variants observed in the Seraseq® 0.25% ctDNA mix were also seen at similar fractions in the corresponding wild-type control. Since the 0.25% mix was made using the wild-type control as the background, this strongly suggests that those false variants were intrinsic artifacts in the sample rather than random errors from enrichment and sequencing.
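The per-Mbp scaling quoted in panel (a) follows directly from the 5,831 bp target size listed in the table. A minimal sketch of the arithmetic, with an illustrative function name:

```python
def false_positives_per_mbp(n_false_positives, target_bp):
    """Scale a false-positive count on a small panel to a per-megabase rate."""
    return n_false_positives * 1_000_000 / target_bp


# One false positive on the 5,831 bp Seraseq panel.
rate = false_positives_per_mbp(1, 5_831)
print(round(rate))  # 171
```

The same function applied to the 17,859 bp panel gives roughly 56 false positives per Mbp for a single false call, which shows how strongly the rate depends on panel size.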