Gahee Park, Joo Kyung Park, Seung-Ho Shin, Hyo-Jeong Jeon, Nayoung K D Kim, Yeon Jeong Kim, Hyun-Tae Shin, Eunjin Lee, Kwang Hyuck Lee, Dae-Soon Son, Woong-Yang Park, Donghyun Park.
Abstract
BACKGROUND: Targeted deep sequencing is increasingly used to detect low-allelic fraction variants; it is therefore essential that errors that constitute baseline noise and impose a practical limit on detection are characterized. In the present study, we systematically evaluate the extent to which errors are incurred during specific steps of the capture-based targeted sequencing process.
Keywords: Background error; DNA fragmentation; Next-generation sequencing; Plasma DNA; Substitution rate; Targeted deep sequencing
Year: 2017 PMID: 28732520 PMCID: PMC5521083 DOI: 10.1186/s13059-017-1275-2
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1. Base call quality in targeted deep sequencing data. a The density plot visualizes the Phred base quality score distribution of background and total bases. b, c After the removal of bases with a quality score <30, the average base quality scores (i.e., total and for each of the four nucleotides) were box-plotted for PBL (b) and plasma (c) DNA samples
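The quality filter mentioned in this caption can be illustrated with a minimal sketch. It relies on the standard Phred definition, in which a score Q corresponds to an error probability of 10^(-Q/10), so that Q = 30 means roughly one expected error per 1,000 base calls. The function names and the `(base, quality)` tuple layout are hypothetical and are not taken from the study's pipeline.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the implied base-call error probability."""
    return 10 ** (-q / 10)


def filter_low_quality(base_calls, min_q=30):
    """Keep only base calls whose Phred score is at least min_q.

    base_calls is a list of (base, quality) tuples; this layout is an
    assumption made for illustration only.
    """
    return [(base, q) for base, q in base_calls if q >= min_q]


calls = [("A", 38), ("C", 12), ("G", 30), ("T", 29)]
kept = filter_low_quality(calls)  # drops the calls with quality 12 and 29
```

Using `q >= min_q` mirrors the caption's wording, which removes bases with a score below 30 and therefore keeps those at exactly 30.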
Fig. 2. Characterization of background errors in targeted deep sequencing data. a The background allele frequencies from plasma and PBL DNA samples were box-plotted (n = 19 for each group). b The frequency of error-free positions in each sample was calculated and box-plotted for each group. c The distribution of background allele frequencies across all 12 possible substitution classes. The y-axis denotes the frequency of each class in the pre-treatment PBL and plasma samples. The relative base substitution frequency is shown in the stacked bar plot on the right side. d The ratio of background errors in PBL compared to plasma DNA samples was plotted for each substitution class indicated on the x-axis. e The ratio of background allele frequencies for reciprocal base substitutions. Error bars indicate standard deviation
Fig. 3. The effect of DNA fragmentation on the background allele frequency. a The background error rates were calculated from sequencing data sets generated using genomic input DNA fragmented under various conditions. b A detailed description of the conditions used in (a). The median DNA fragment sizes of input DNA and inserts of sequencing data were measured. Varied parameters for each condition are indicated in the table
Fig. 4. Characterization of the DNA break point. a Fold changes in error rates across all substitution classes were compared between PBL and plasma DNA samples. The fold change was calculated by dividing the substitution rate at each position by the average rate of 1–50 bp. The distribution of fold changes in 1–50 bp was box-plotted. Above the box plots, the observed fold change at the first and the second bases was marked for comparison. For better discrimination between the groups, plasma data are displayed on a grey background. b The distribution of quality scores of total bases, including low quality bases, is plotted according to the position of reads. c Mononucleotide frequencies around the DNA break point. Note: except in (b), all analyses of background alleles were performed after filtering of bases with a quality score <30
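The fold-change calculation described in panel (a) can be sketched as follows: the substitution rate at each read position is divided by the mean rate over the first 50 positions. This is a minimal illustration under stated assumptions; the function name, the list-based input, and the `window` parameter are hypothetical, and the example rates are made-up values rather than data from the study.

```python
def position_fold_changes(rates, window=50):
    """Return per-position fold changes relative to the mean rate of the window.

    rates[i] is the substitution rate at read position i + 1 (1-based),
    for a single substitution class. Only the first `window` positions
    are used, matching the 1-50 bp range named in the caption.
    """
    if len(rates) < window:
        raise ValueError("need at least `window` positions")
    baseline = sum(rates[:window]) / window
    if baseline == 0:
        raise ValueError("mean rate is zero; fold change is undefined")
    return [r / baseline for r in rates[:window]]


# Toy input with a 4-position window: an elevated rate at the first base.
toy_rates = [3.0, 1.0, 1.0, 1.0]
toy_fold = position_fold_changes(toy_rates, window=4)
```

In the toy input the mean rate is 1.5, so the first position has a fold change of 2.0, which is the kind of excess near the fragment end that the panel marks above the box plots.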
Fig. 5. The false positive rate. a The error rate of each background allele was calculated and the distribution of these rates was plotted as a function of the total read count. Data from 19 plasma DNA samples were down-sampled to a designated size of total reads in a range between 2.5 and 50 M. The x-axis denotes the frequency of background alleles, and the y-axis denotes the fraction of alleles with the designated background rate on the x-axis. b–d The fraction of background alleles present at a frequency greater than a given threshold is plotted against the depth of coverage after de-duplication (x-axis). b The effect of coverage depth on the false positive rate is shown in the down-sampled data set generated from 19 plasma DNA samples. c The effect of each DNA shearing condition on the false positive rate was estimated using the down-sampled data set from Fig. 3. Labels a–d indicate the fragmentation conditions, as described in Fig. 3. d A comparison of plasma and fragmented PBL DNA samples
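The quantity plotted in panels b–d, the fraction of background alleles whose frequency exceeds a given threshold, can be sketched in a few lines. The function name and the example frequencies below are hypothetical, chosen only to illustrate the calculation; they do not reproduce the study's data or its exact pipeline.

```python
def fraction_above_threshold(allele_freqs, threshold):
    """Fraction of background alleles with a frequency strictly above threshold.

    allele_freqs holds one background allele frequency per
    (position, alternative allele) pair in the target region.
    """
    if not allele_freqs:
        raise ValueError("no background alleles supplied")
    above = sum(1 for f in allele_freqs if f > threshold)
    return above / len(allele_freqs)


# Made-up frequencies for four background alleles.
example_freqs = [0.0, 0.001, 0.002, 0.01]
example_rate = fraction_above_threshold(example_freqs, threshold=0.001)
```

With these values, two of the four alleles exceed 0.001, giving a fraction of 0.5. Repeating the calculation on data down-sampled to different read counts would give one point per coverage depth, as in the curves shown in the figure.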