Eva C Berglund, Carl Mårten Lindqvist, Shahina Hayat, Elin Övernäs, Niklas Henriksson, Jessica Nordlund, Per Wahlberg, Erik Forestier, Gudmar Lönnerholm, Ann-Christine Syvänen.
Abstract
BACKGROUND: Target enrichment and resequencing is a widely used approach for identification of cancer genes and genetic variants associated with diseases. Although cost effective compared to whole genome sequencing, analysis of many samples constitutes a significant cost, which could be reduced by pooling samples before capture. Another limitation to the number of cancer samples that can be analyzed is often the amount of available tumor DNA. We evaluated the performance of whole genome amplified DNA and the power to detect subclonal somatic single nucleotide variants in non-indexed pools of cancer samples using the HaloPlex technology for target enrichment and next generation sequencing.
Year: 2013 PMID: 24314227 PMCID: PMC4046713 DOI: 10.1186/1471-2164-14-856
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Samples and sequence data statistics
| Sample or poolᵃ | Average sequence depth, designᵇ | Average sequence depth, variantsᶜ | Cumulative depth at variants ≥1x (%) | Cumulative depth at variants ≥30x (%) | Mapped reads (%)ᵈ | Mapped on target (%)ᵉ |
|---|---|---|---|---|---|---|
| ALL1_gDNA | 1385 | 1940 | 99.4 | 96.5 | 92.4 | 67.1 |
| Normal1_gDNA | 1280 | 1588 | 99.6 | 96.2 | 91.4 | 60.8 |
| ALL2_gDNA | 1466 | 1791 | 99.4 | 95.7 | 92.2 | 64.7 |
| Normal2_gDNA | 792 | 1008 | 99.2 | 95.2 | 91.8 | 64.0 |
| ALL1_wgaDNA | 1564 | 1831 | 99.2 | 94.2 | 92.6 | 73.5 |
| ALL2_wgaDNA | 1569 | 1706 | 99.1 | 94.0 | 92.5 | 73.0 |
| ALL1_pool2 | 1752 | 2254 | 99.7 | 97.4 | 91.8 | 67.7 |
| ALL2_pool5 | 1502 | 1710 | 99.4 | 96.9 | 93.1 | 68.8 |
| ALL1_pool10 | 1044 | 1445 | 99.6 | 96.8 | 89.4 | 63.0 |
| ALL1_pool2_rep | 1150 | 1614 | 99.4 | 95.9 | 84.5 | 42.4 |
| ALL2_pool5_rep | 1021 | 1503 | 99.3 | 91.6 | 85.5 | 49.9 |
| ALL1_pool10_rep | 1386 | 1837 | 99.5 | 96.3 | 85.9 | 58.5 |
a Pools are named after the whole genome sequenced ALL sample they include and the total number of samples in the pool.
b Average sequence depth in the complete region covered by the HaloPlex design.
c Average sequence depth at the 1528 candidate SNVs and SNPs covered by the HaloPlex design.
d Percentage of sequence reads that map to the human genome.
e Percentage of the sequence reads mapping to the genome that map to the regions covered by the HaloPlex design.
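Footnotes d and e describe nested percentages: the on-target value is a share of the mapped reads, not of all reads. The sketch below combines the two into the share of all sequenced reads that land in the design. The function name is hypothetical and this derived figure is not reported in the table; it only illustrates how the two columns relate.

```python
def on_target_share_of_all_reads(mapped_pct, on_target_pct):
    """Percentage of ALL reads that map inside the design.

    mapped_pct    : column d, % of reads mapping to the human genome
    on_target_pct : column e, % of the mapped reads that fall in the design
    """
    return mapped_pct * on_target_pct / 100.0


# Values for ALL1_gDNA taken from the table above (92.4 and 67.1).
share = on_target_share_of_all_reads(92.4, 67.1)
print(f"ALL1_gDNA: {share:.1f}% of all reads are on target")
```

Because column e is conditional on column d, a pool with a lower mapped percentage loses on-target reads twice over when measured against the total read count.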
Figure 1. Allele fractions for putative single nucleotide variants (SNVs) in ALL and normal samples. The allele fractions observed in HaloPlex sequence data for 1509 candidate SNVs in ALL and normal samples. Only data from libraries derived from genomic DNA are shown. SNVs classified as somatic are shown in red. Candidate SNVs that follow the x = y line represent either putative germline SNPs that escaped detection in the normal sample during WGS, or alignment artifacts. The cluster with allele fractions close to 0 in both ALL and normal samples represents likely false positive SNV calls. Most of the candidate SNVs with allele fractions close to 1 in the ALL sample and around 0.5 in the normal sample in patient 2 are located in a large region of somatic loss of heterozygosity.
Figure 2. Sequence depth variation in genomic DNA (gDNA) and whole genome amplified DNA (wgaDNA) samples. Density plot showing the variation in HaloPlex sequence depth of 1509 candidate single nucleotide variants (SNVs) and 19 germline SNPs in gDNA and wgaDNA samples. The sequence depth is more uneven in the wgaDNA samples, which display relatively low or high coverage at more sites. To increase clarity, sites with a depth > 5000 are shown at 5000 and the x-axis has been cut at 0 and 5000.
Detection of single nucleotide variants and error rates in genomic DNA and whole genome amplified DNA
| Sample | Somatic SNVsᵃ | Errorᵇ |
|---|---|---|
| ALL1_gDNA | 227 (13) | 0.036 |
| ALL1_wgaDNA | 223 (9) | 0.020 |
| ALL2_gDNA | 305 (5) | 0.038 |
| ALL2_wgaDNA | 302 (2) | 0.031 |
a In parentheses: the number of candidate SNVs that were not classified as somatic in the corresponding gDNA or wgaDNA experiment from the same original sample.
b Average absolute difference from 0.5 for 19 germline SNPs.
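The error metric in footnote b can be sketched in a few lines. This is a minimal illustration assuming the input is a list of observed allele fractions at heterozygous germline SNPs; the function name and the example values are hypothetical and are not data from the study.

```python
def mean_abs_deviation_from_half(allele_fractions):
    """Average absolute difference from 0.5 across a set of SNPs.

    For a heterozygous germline SNP the expected allele fraction is 0.5,
    so the deviation reflects measurement noise or allelic bias.
    """
    if not allele_fractions:
        raise ValueError("at least one allele fraction is required")
    return sum(abs(af - 0.5) for af in allele_fractions) / len(allele_fractions)


# Hypothetical allele fractions for four SNPs (the table uses 19 real ones).
example = [0.48, 0.53, 0.50, 0.46]
print(f"error = {mean_abs_deviation_from_half(example):.4f}")
```

A lower value means the observed fractions sit closer to the expected 0.5.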
Figure 3. Correlation between allele fractions in genomic DNA (gDNA) and whole genome amplified DNA (wgaDNA). Correlation of the allele fractions of candidate single nucleotide variants (SNVs) and germline SNPs determined in experiments using gDNA and wgaDNA from the same original DNA sample. Only variants with a HaloPlex sequence depth ≥30 in both gDNA and wgaDNA are shown (n = 1439). SNVs classified as somatic are shown in black, non-validated candidate SNVs in grey, and germline SNPs in red.
Figure 4. Correlation between expected and observed allele fractions in pools and comparison of replicated pools. The top panel shows the correlation between expected and observed allele fractions in the three pools, where the expected value is calculated by dividing the allele fraction observed in the ALL1 or ALL2 genomic DNA (gDNA) sample by the number of samples in the pool. Only single nucleotide variants (SNVs) classified as somatic in the ALL1 (n = 227) or ALL2 (n = 305) gDNA samples are shown. The bottom panel shows the correlation between allele fractions in the replicated pools. The x- and y-axes are cut at 0.5, 0.2 and 0.1 for the pools with two, five and ten samples, respectively. SNVs with an observed allele fraction greater than this are shown in black at the cutoff value. The MR value is the median ratio between all x- and y-values in each plot. It represents how close the two distributions are to each other, with a value of 1 indicating similar distributions.
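The two quantities described in the Figure 4 caption, the expected allele fraction in a pool and the median ratio (MR), can be sketched as follows. This is an illustration under stated assumptions: the caption does not say whether the ratio is taken as x/y or y/x, nor how zero values are handled, so the x/y direction and the skipping of zero denominators are assumptions made here. The function names and example values are hypothetical.

```python
from statistics import median


def expected_pool_fraction(single_sample_af, n_samples):
    """Expected allele fraction in a pool: single-sample fraction / pool size."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return single_sample_af / n_samples


def median_ratio(x_values, y_values):
    """Median of the per-variant ratios x / y.

    Assumption: pairs with y == 0 are skipped to avoid division by zero.
    """
    ratios = [x / y for x, y in zip(x_values, y_values) if y != 0]
    if not ratios:
        raise ValueError("no usable (x, y) pairs")
    return median(ratios)


# Hypothetical allele fractions from a single gDNA sample, placed in a pool of 5.
gdna_af = [0.5, 0.4, 0.2]
expected = [expected_pool_fraction(af, 5) for af in gdna_af]
observed = [0.10, 0.10, 0.04]  # hypothetical observed pool fractions
print(f"MR = {median_ratio(expected, observed):.2f}")
```

Using the median rather than the mean keeps a few extreme ratios from dominating the summary.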
Accuracy of single nucleotide variant detection in non-indexed pools
| Sample | TPᵃ | TNᵃ | FPᵃ | FNᵃ | FDR (%)ᵇ | Novelᶜ |
|---|---|---|---|---|---|---|
| ALL1_pool2 | 224 | 304 | 1 | 3 | 0.4 | 0 |
| ALL2_pool5 | 304 | 227 | 0 | 1 | 0 | 3 |
| ALL1_pool10 | 225 | 296 | 9 | 2 | 3.8 | 4 |
| ALL1_pool2_rep | 224 | 304 | 1 | 3 | 0.4 | 0 |
| ALL2_pool5_rep | 298 | 222 | 5 | 7 | 1.7 | 2 |
| ALL1_pool10_rep | 223 | 291 | 14 | 4 | 5.9 | 4 |
a TP: true positives; TN: true negatives; FP: false positives; FN: false negatives.
b FDR: false discovery rate, calculated as FP/(TP + FP).
c Putative novel SNVs called in the previously uncharacterized samples included in the pools. One of the variants called in ALL2_pool5 was not called in the corresponding replicated pool (ALL2_pool5_rep). All other novel variants were identical between replicates.
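The FDR column can be reproduced from the TP and FP counts using the formula in footnote b. The sketch below is a minimal check; the function name is hypothetical, and the counts are copied from the table above.

```python
def false_discovery_rate(tp, fp):
    """FDR in percent, calculated as FP / (TP + FP)."""
    called = tp + fp
    if called == 0:
        raise ValueError("no positive calls, FDR is undefined")
    return 100.0 * fp / called


# (TP, FP) per pool, taken from the table above.
counts = {
    "ALL1_pool2": (224, 1),
    "ALL2_pool5": (304, 0),
    "ALL1_pool10": (225, 9),
    "ALL1_pool2_rep": (224, 1),
    "ALL2_pool5_rep": (298, 5),
    "ALL1_pool10_rep": (223, 14),
}

for name, (tp, fp) in counts.items():
    print(f"{name}: FDR = {false_discovery_rate(tp, fp):.1f}%")
```

Rounded to one decimal, the results match the tabulated FDR values.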