Austin P So1, Anna Vilborg1, Yosr Bouhlal1, Ryan T Koehler1, Susan M Grimes2, Yannick Pouliot1, Daniel Mendoza1, Janet Ziegle1, Jason Stein1, Federico Goodsaid1, Michael Y Lucero1, Francisco M De La Vega1,3, Hanlee P Ji2,4.
Abstract
Next-generation deep sequencing of gene panels is being adopted as a diagnostic test to identify actionable mutations in cancer patient samples. However, clinical samples, such as formalin-fixed, paraffin-embedded (FFPE) specimens, frequently provide low quantities of degraded, poor-quality DNA. To overcome these issues, many sequencing assays rely on extensive PCR amplification, which leads to an accumulation of bias and artifacts. Thus, there is a need for a targeted sequencing assay that performs well with DNA of low quality and quantity without relying on extensive PCR amplification. We evaluate the performance of a targeted sequencing assay based on Oligonucleotide Selective Sequencing (OS-Seq), which permits the enrichment of genes and regions of interest and the identification of sequence variants from low amounts of damaged DNA. This assay utilizes a repair process adapted to clinical FFPE samples, followed by adaptor ligation to single-stranded DNA and a primer-based capture technique. Our approach generates sequence libraries of high fidelity with reduced reliance on extensive PCR amplification, which facilitates the accurate assessment of copy number alterations in addition to delivering accurate single nucleotide variant and insertion/deletion detection. We apply this method to capture and sequence the exons of a panel of 130 cancer-related genes, from which we obtain high read coverage uniformity across the targeted regions at starting input DNA amounts as low as 10 ng per sample. We demonstrate the performance using a series of reference DNA samples, and by identifying sequence variants in DNA from matched clinical samples originating from different tissue types.
Year: 2018 PMID: 29354287 PMCID: PMC5768874 DOI: 10.1038/s41525-017-0041-4
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
Fig. 1 Overview of in-solution OS-Seq process. DNA fragmented to 550 bp is used as the starting material for the OS-Seq assay. Damaged bases are removed by excision only, without implementing a corrective repair step. The DNA is then denatured, followed by adapter ligation to single-stranded DNA. In-solution capture using primer-probes is performed for ~2 h, followed immediately by extension to complete the library. Finally, the sequence library is expanded by PCR using primers targeting the P7 and P5 regions to generate sufficient quantities of library for sequencing. 5′ and 3′ ends are indicated. P7 and P5 denote the regions of the adapters and probes, respectively, required for clustering on the Illumina flow cell; in the “expansion” section, they denote PCR primers complementary to the P7 and P5 parts of the adapters and probes, respectively. “Ix” stands for index sequence, and “SP” for sequencing primer-binding site
Sequencing metrics for control DNA samples
| Cell line | Input DNA (ng) | Number of replicates | Target coverage, mean (SD) | % on target bases, mean (SD) | Fold 80 base penalty, mean (SD) |
|---|---|---|---|---|---|
| NA12878 | 300 | 4 | 3097 (125) | 85% (0%) | 1.77 (0.01) |
| NA12878 | 100 | 4 | 3028 (149) | 79% (0%) | 1.96 (0.01) |
| NA12878 | 30 | 4 | 2342 (161) | 78% (1%) | 2.20 (0.04) |
| NA12878 | 10 | 4 | 2735 (289) | 67% (3%) | 3.57 (0.33) |
| HD753 | 100 | 3 | 6941 (739) | 56% (1%) | 1.85 (0.01) |
| HD753 | 30 | 3 | 3920 (301) | 56% (1%) | 1.87 (0.08) |
| HD753 | 10 | 3 | 4045 (727) | 51% (2%) | 2.22 (0.16) |
| HD200 | 300 | 4 | 4441 (1312) | 73% (1%) | 1.96 (0.05) |
| HD200 | 100 | 4 | 4509 (1073) | 66% (1%) | 2.05 (0.10) |
| HD200 | 30 | 4 | 2766 (507) | 62% (1%) | 2.29 (0.06) |
| HD200 | 10 | 4 | 1721 (214) | 61% (1%) | 2.64 (0.11) |
SD standard deviation
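The record does not say how the fold 80 base penalty was computed. As a minimal sketch, assuming the common definition used by tools such as Picard's CollectHsMetrics (mean target coverage divided by the coverage depth that 80% of covered target bases meet or exceed), the metric can be illustrated as follows. The function name and the toy depth values are hypothetical, and working per base rather than per target is a simplification.

```python
def fold_80_base_penalty(depths):
    """Approximate fold 80 base penalty from per-base target depths.

    Assumed definition: mean depth / 20th-percentile depth, computed over
    bases with non-zero coverage. A value of 1.0 means perfectly uniform
    coverage; larger values mean less uniform coverage.
    """
    covered = sorted(d for d in depths if d > 0)
    if not covered:
        raise ValueError("no covered bases")
    mean_depth = sum(covered) / len(covered)
    # Nearest-rank 20th percentile: ceil(n / 5) - 1 as a zero-based index.
    rank = -(-len(covered) // 5) - 1
    p20_depth = covered[rank]
    return mean_depth / p20_depth


# Toy example (made-up depths): 80% of bases at 100x, 20% at 50x.
toy_depths = [100] * 8 + [50] * 2
print(round(fold_80_base_penalty(toy_depths), 2))
```

Under this definition, the increase from 1.77 at 300 ng to 3.57 at 10 ng for NA12878 would correspond to coverage becoming less uniform as input decreases.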
Detection of SNV and indel variants from NA12878
| DNA input (ng) | Replicate number | TP | FN | Fraction of expected | Average per input amount | Standard deviation per input amount |
|---|---|---|---|---|---|---|
| 300 | 1 | 131 | 6 | 0.96 | 0.96 | 0.01 |
| 300 | 2 | 129 | 8 | 0.94 | | |
| 300 | 3 | 131 | 6 | 0.96 | | |
| 300 | 4 | 129 | 8 | 0.94 | | |
| 100 | 1 | 126 | 11 | 0.92 | 0.92 | 0.01 |
| 100 | 2 | 129 | 8 | 0.94 | | |
| 100 | 3 | 129 | 8 | 0.94 | | |
| 100 | 4 | 128 | 9 | 0.93 | | |
| 30 | 1 | 125 | 12 | 0.91 | 0.91 | 0.02 |
| 30 | 2 | 122 | 15 | 0.89 | | |
| 30 | 3 | 126 | 11 | 0.92 | | |
| 30 | 4 | 128 | 9 | 0.93 | | |
| 10 | 1 | 102 | 35 | 0.74 | 0.74 | 0.04 |
| 10 | 2 | 113 | 24 | 0.82 | | |
| 10 | 3 | 100 | 37 | 0.73 | | |
| 10 | 4 | 107 | 30 | 0.78 | | |
All variants (N = 137). TP true positives, the variants in the call set that match the variants in ground truth list for the reference material available for Genome in a Bottle (GIAB) sample HG00, v 3.3.2., see Methods; FN false negatives, the variants that are in the ground truth list, but not present in the call set; FP false positives, variants in the call set that are not present in the ground truth list
Detection of variants from the STMM-Mix-II control DNA mixtures
| DNA input (ng) | Expected variant allelic fraction (VAF) | Calls: FN | Calls: TP | Calls: FP | Accuracy: Sensitivity | Accuracy: Specificity | Accuracy: PPV |
|---|---|---|---|---|---|---|---|
| 100 | 40% | 0 | 36 | 1 | 100.0% | 100.0% | 97.3% |
| 100 | 25% | 1 | 35 | 2 | 97.2% | 100.0% | 94.6% |
| 100 | 15% | 0 | 36 | 6 | 100.0% | 100.0% | 85.7% |
| 100 | 10% | 1 | 35 | 2 | 97.2% | 100.0% | 94.6% |
| 100 | 5% | 0 | 36 | 5 | 100.0% | 100.0% | 87.8% |
Analysis includes 37 variants of the Seraseq STMM-II that overlap regions with sufficient read coverage. Analysis was performed with the Compass analysis software (TOMA Biosystems Inc., Foster City, CA), removing PCR duplicates. TP true positives, the variants in the call set that match the variants in ground truth list for the reference material provided by the manufacturer; FN false negatives, the variants that are in the ground truth list, but not present in the call set; FP false positives, variants in the call set that are not present in the ground truth list. The ground truth list only includes the spiked-in synthetic somatic mutations and not the germline variants present in the background genome
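The sensitivity and PPV columns follow from the call counts using the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The sketch below recomputes them from the counts in the table. The function and variable names are illustrative, not taken from the Compass software. Specificity is omitted because it requires a true-negative count, which the table does not report.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of ground-truth variants that were called."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Fraction of called variants that are in the ground-truth list."""
    return tp / (tp + fp)


# (FN, TP, FP) per expected VAF, copied from the table (100 ng input).
calls_by_vaf = {
    "40%": (0, 36, 1),
    "25%": (1, 35, 2),
    "15%": (0, 36, 6),
    "10%": (1, 35, 2),
    "5%": (0, 36, 5),
}

# Percentages rounded to one decimal place, as in the table.
metrics = {
    vaf: (round(100 * sensitivity(tp, fn), 1), round(100 * ppv(tp, fp), 1))
    for vaf, (fn, tp, fp) in calls_by_vaf.items()
}

for vaf, (sens, pos_pred) in metrics.items():
    print(f"VAF {vaf}: sensitivity {sens}%, PPV {pos_pred}%")
```

The recomputed values match the tabulated ones, which suggests the accuracy columns were derived directly from these counts.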
Fig. 2 Analysis of variant allelic fraction. a Detection rate of spiked-in somatic variants in HD200. Variants detected in 4 out of 4 replicates are shown in blue, 3 out of 4 in red, 2 out of 4 in green, 1 out of 4 in purple, and 0 out of 4 in yellow. b Detection rate of spiked-in somatic variants in HD753. Variants detected in 3 out of 3 replicates are shown in blue, 2 out of 3 in red, 1 out of 3 in green, and 0 out of 3 in purple
Fig. 3 Detection of copy number alterations. Normalized coverage for all genes in the 130-gene panel for each replicate of HD753 (target) plotted vs. normalized coverage for all genes in NA12878 (control; the same control is used for comparison with each target replicate) for 100 ng (a), 30 ng (b) and 10 ng (c) DNA input. Each replicate is shown in a different color. The three amplified genes are shown as diamonds (MYC), squares (MET) and triangles (ALK)
CNA calling from a control DNA sample HD753
| Amount of DNA (ng) | Gene | Read depth CNA calling: expected ratio* | Read depth CNA calling: observed ratio, mean (SD) | VarScan2 CNA calling: expected ratio (log scale)* | VarScan2 CNA calling: observed ratio (log scale), mean (SD) |
|---|---|---|---|---|---|
| 100 | | 4.90 | 4.09 (0.04) | 2.29 | 2.51 (0.15) |
| 100 | | 2.25 | 1.87 (0.04) | 1.17 | 1.24 (0.14) |
| 100 | | 1.32 | 1.41 (0.04) | 0.40 | 0.66 (0.14) |
| 30 | | 4.90 | 3.67 (0.05) | 2.29 | 2.93 (0.08) |
| 30 | | 2.25 | 1.51 (0.03) | 1.17 | 1.61 (0.08) |
| 30 | | 1.32 | 1.40 (0.03) | 0.40 | 1.11 (0.09) |
| 10 | | 4.90 | 4.52 (0.24) | 2.29 | 3.26 (0.11) |
| 10 | | 2.25 | 1.64 (0.07) | 1.17 | 1.49 (0.20) |
| 10 | | 1.32 | 1.49 (0.04) | 0.40 | 1.16 (0.20) |
SD standard deviation
*According to information from manufacturer
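The two expected-ratio columns appear to be related by a base-2 logarithm: taking log2 of each linear expected ratio reproduces the log-scale values in the table. The short sketch below checks this. The base-2 interpretation is inferred from the numbers rather than stated in the record, and the helper name is illustrative.

```python
import math

# Expected copy-number ratios on the linear scale, copied from the table.
expected_ratios = [4.90, 2.25, 1.32]


def to_log2(ratio: float) -> float:
    """Convert a linear copy-number ratio to log2 scale, rounded to 2 decimals."""
    return round(math.log2(ratio), 2)


log2_ratios = [to_log2(r) for r in expected_ratios]
print(log2_ratios)
```

The same conversion can be applied to the observed read-depth ratios to put them on the scale of the log-scale columns.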
Fig. 4 Variant overlap between DNA from PBMCs vs. FFPE tissue. Overlap of variants called by GATK in the FFPE (red circles) and PBMC (green circles) samples from the four patients included in the matched sample study. a–d show Patients 1–4, respectively