Frank Reinecke, Ravi Vijaya Satya, John DiCarlo.
Abstract
BACKGROUND: Next-generation sequencing (NGS) is rapidly becoming common practice in clinical diagnostics and cancer research. In addition to the detection of single nucleotide variants (SNVs), information on copy number variants (CNVs) is of great interest. Several algorithms exist to detect CNVs by analyzing whole genome sequencing data or data from samples enriched by hybridization-capture. PCR-enriched amplicon-sequencing data have special characteristics that have been taken into account by only one publicly available algorithm so far.
Year: 2015 PMID: 25626454 PMCID: PMC4384318 DOI: 10.1186/s12859-014-0428-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Dispersion correction. Illustration of the effect of the dispersion correction by ϕ. First row: before correction; second row: after correction. Calls for the sequencing data sets (M62 and M63) and both control samples (NA12898, NA19129) are plotted separately (in columns). A CNV is called if the determined Q score is higher than the threshold (normalized to 1 in this diagram). False classifications (FN: false negative, FP: false positive) are shown in red. Loci with known CNVs in the sample are shown as dots and loci with normal copy number are plotted as crosses.
Figure 2. ROC curve. Receiver operating characteristic curves for ONCOCNV (o) and our own development using the comparison dataset (Table 2): plain t-test statistics without corrections (naive, n) compared to the performance achieved by using weights (w) or removing outliers (r). The final algorithm quandico (q) includes both of these steps together with an additional dispersion balancing. Symbols are plotted for local maxima. The inset shows a magnification of the region above 90% specificity and sensitivity.
Figure 3. False positive/negative rates. False positive rates (FPR) and false negative rates (FNR) observed on the comparison dataset. The optimal threshold for every algorithm was determined by selecting the value that generated the minimal sum of FPR and FNR. The scores for every individual algorithm (x-axis) were then divided by the identified threshold (normalized to 1) for comparison. For algorithm details, see legend of Figure 2.
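The threshold selection described in the Figure 3 legend can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`error_rates`, `optimal_threshold`, `normalize`), the use of the observed scores as candidate thresholds, and the toy data are assumptions made purely for illustration. A locus counts as called when its score is strictly higher than the threshold, as stated in the Figure 1 legend.

```python
def error_rates(scores, truth, threshold):
    """Return (FPR, FNR) when a CNV is called for every score > threshold."""
    fp = sum(1 for s, t in zip(scores, truth) if s > threshold and not t)
    fn = sum(1 for s, t in zip(scores, truth) if s <= threshold and t)
    negatives = sum(1 for t in truth if not t)
    positives = sum(1 for t in truth if t)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def optimal_threshold(scores, truth):
    """Pick the candidate threshold with the minimal sum of FPR and FNR.

    Assumption: candidates are the distinct observed scores; on ties the
    lowest candidate wins.
    """
    candidates = sorted(set(scores))
    return min(candidates, key=lambda c: sum(error_rates(scores, truth, c)))


def normalize(scores, threshold):
    """Divide scores by the threshold so that the calling cutoff becomes 1."""
    if threshold == 0:
        raise ValueError("threshold must be non-zero to normalize scores")
    return [s / threshold for s in scores]


# Hypothetical toy data: three loci with normal copy number, three with a CNV.
scores = [0.5, 1.0, 2.0, 4.0, 8.0, 9.0]
truth = [False, False, False, True, True, True]

best = optimal_threshold(scores, truth)
print(best)                     # 2.0
print(normalize(scores, best))  # [0.25, 0.5, 1.0, 2.0, 4.0, 4.5]
```

After normalization, scores from different algorithms share the same cutoff of 1, which is what allows them to be compared on a common axis.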
Algorithm comparison using a subset of samples with four controls
| Algorithm | TP | TN | FP | FN | Total | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| quandico | 41 | 491 | 18 | 1 | 551 | 0.976 | 0.965 |
| ONCOCNV | 35 | 503 | 10 | 9 | 557 | 0.795 | 0.981 |
Performance metrics
| Dataset | TP | TN | FP | FN | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Training | 145 | 1426 | 26 | 3 | 0.980 | 0.982 |
| Validation | 72 | 878 | 10 | 0 | 1.000 | 0.989 |
| Combined | 217 | 2304 | 36 | 3 | 0.986 | 0.985 |
Summary of calls made using the training and validation datasets.
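As a cross-check, the two rate columns of the performance table can be recomputed from the four count columns. The sketch below assumes the counts are ordered as true positives, true negatives, false positives, and false negatives; this ordering is an assumption, chosen because it reproduces the reported rates. The helper names are illustrative only.

```python
def sensitivity(tp, fn):
    """Fraction of true CNV loci that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of normal loci that were not called: TN / (TN + FP)."""
    return tn / (tn + fp)


# Counts copied from the table, read as (TP, TN, FP, FN) under the
# assumption stated above.
counts = {
    "Training": (145, 1426, 26, 3),
    "Validation": (72, 878, 10, 0),
    "Combined": (217, 2304, 36, 3),
}

for name, (tp, tn, fp, fn) in counts.items():
    print(f"{name}: sensitivity={sensitivity(tp, fn):.3f} "
          f"specificity={specificity(tn, fp):.3f}")
# Training: sensitivity=0.980 specificity=0.982
# Validation: sensitivity=1.000 specificity=0.989
# Combined: sensitivity=0.986 specificity=0.985
```

The combined row is the element-wise sum of the training and validation counts, so its rates are pooled values rather than averages of the two per-dataset rates.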