Evangelos Bellos, Lachlan J M Coin.
Abstract
MOTIVATION: Exome sequencing technologies have transformed the field of Mendelian genetics and allowed for efficient detection of genomic variants in protein-coding regions. The target enrichment process that is intrinsic to exome sequencing is inherently imperfect, generating large amounts of unintended off-target sequence. Off-target data are characterized by very low and highly heterogeneous coverage and are usually discarded by exome analysis pipelines. We posit that off-target read depth is a rich, but overlooked, source of information that could be mined to detect intergenic copy number variation (CNV). We propose cnvOffSeq, a novel normalization framework for off-target read depth that is based on local adaptive singular value decomposition (SVD). This method is designed to address the heterogeneity of the underlying data and allows for accurate and precise CNV detection and genotyping in off-target regions.
Year: 2014 PMID: 25161258 PMCID: PMC4147927 DOI: 10.1093/bioinformatics/btu475
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Flowchart describing the local adaptive SVD normalization process. (a) Iterative segmentation process. (b) Differential normalization of each segment according to the observed RD pattern
Fig. 2. Comparison of normalization techniques for two samples on chr6:48 911 000–48 959 000. The red dashed lines denote the breakpoints of a gold standard deletion that is present only in sample NA06989. (a) Raw RD data. The noisy nature of unnormalized RD makes the deletion difficult to detect in NA06989. The two regions denoted by the yellow dashed lines show highly correlated RD profiles that most likely correspond to systematic bias. (b) Removing the first singular component mitigates the systematic bias, but also suppresses the signal of the true deletion. (c) Filtering low-order singular components de-noises the signal and enhances the true deletion but leaves systematic bias unaffected. (d) Our local adaptive SVD algorithm achieves the best results by combining the two approaches in one cohesive framework. The yellow shaded regions were normalized by filtering the first singular component (as in b) while the red shaded region was normalized by filtering low-order components (as in c)
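The two filtering modes contrasted in panels (b) and (c) can be illustrated with a minimal NumPy sketch. This is not the cnvOffSeq implementation: the function names, the samples-by-positions matrix layout and the parameter `k` are assumptions made for illustration, "low-order" is taken here to mean the components with the smallest singular values, and the segmentation step and any RD preprocessing are omitted.

```python
import numpy as np


def remove_leading_components(rd, k=1):
    """Zero the k largest singular components of an RD matrix.

    rd is assumed to be a (samples x positions) array. Dropping the
    leading components targets variation shared across samples,
    in the spirit of panel (b).
    """
    u, s, vt = np.linalg.svd(rd, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0
    # Scale the columns of u by the edited singular values, then rebuild.
    return (u * s) @ vt


def keep_leading_components(rd, k):
    """Keep only the k largest singular components of an RD matrix.

    Discarding the components with the smallest singular values acts
    as a de-noising step, in the spirit of panel (c).
    """
    u, s, vt = np.linalg.svd(rd, full_matrices=False)
    s = s.copy()
    s[k:] = 0.0
    return (u * s) @ vt
```

For the same `k`, the two outputs sum to the original matrix, so each mode discards exactly what the other retains. A local adaptive scheme would choose between them segment by segment; how that choice is made is not specified in the caption and is not attempted here.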
cnvOffSeq performance comparison
| Metric | cnvOffSeq (%) | cnvHiTSeq (%) | CoNIFER (%) | Local Static SVD (%) |
|---|---|---|---|---|
| Sensitivity | 57.5 | 55.0 | 7.6 | 49.4 |
| Specificity | 99.2 | 79.1 | 99.9 | 97.4 |
| PPV | 95.0 | 17.3 | 96.0 | 82.2 |
| NPV | 89.8 | 95.7 | 81.1 | 88.7 |
| FPR | 0.8 | 20.9 | 0.1 | 2.6 |
| FDR | 5.0 | 82.7 | 4.0 | 17.8 |
| Accuracy | 90.4 | 77.4 | 81.3 | 88.0 |
Note: PPV, positive predictive value; NPV, negative predictive value; FPR, false positive rate; FDR, false discovery rate.
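For readers who want to recompute these metrics, the standard confusion-matrix definitions can be sketched as follows. The function name and the counts in the usage example are hypothetical; the table reports percentages only, and how individual calls were counted as true or false positives is not stated here.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return standard binary-classification metrics as percentages.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives and false negatives.
    """
    def pct(num, den):
        return 100.0 * num / den

    return {
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, tn + fn),
        "fpr": pct(fp, fp + tn),
        "fdr": pct(fp, fp + tp),
        "accuracy": pct(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=8, fp=2, tn=85, fn=5)
```

Under these definitions FPR equals 100 minus specificity and FDR equals 100 minus PPV, which matches the table (for cnvOffSeq, 0.8 = 100 − 99.2 and 5.0 = 100 − 95.0).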
Fig. 3. Normalization results for seven gold standard regions that account for 30% of the total deletion calls. The top panels in each plot represent LOESS-smoothed RD. The bottom panels represent RD that is normalized using local adaptive SVD. Each colour corresponds to a different sample, while the red dashed lines denote the breakpoints of the deletions as determined by the 1000 Genomes Project
Genotyping accuracy across methods
| Method | Genotyping accuracy (%) | Missing rate (%) |
|---|---|---|
| cnvOffSeq | 96.3 | 10.4 |
| cnvHiTSeq | 73.0 | 19.0 |
| Local Static SVD | 90.8 | 8.3 |
cnvOffSeq performance across CNV lengths
| CNV length threshold (bp) | Sensitivity (%) | Specificity (%) | FDR (%) | Accuracy (%) | Number of CNV loci |
|---|---|---|---|---|---|
| >500 | 57.5 | 99.2 | 5.0 | 90.4 | 104 |
| >2000 | 70.2 | 99.1 | 5.0 | 93.2 | 89 |
| >3000 | 73.5 | 98.9 | 5.0 | 93.5 | 67 |
| >4000 | 83.4 | 98.6 | 5.4 | 95.3 | 58 |
| >5000 | 90.4 | 98.4 | 5.8 | 96.7 | 49 |
| >6000 | 89.9 | 98.3 | 6.8 | 96.6 | 39 |
| >7000 | 88.5 | 99.0 | 4.4 | 96.8 | 36 |
| >8000 | 87.4 | 98.9 | 4.9 | 96.6 | 31 |
| >9000 | 94.2 | 99.0 | 4.0 | 98.0 | 25 |
| >10 000 | 93.9 | 98.9 | 4.2 | 97.8 | 22 |
| >11 000 | 94.4 | 98.7 | 5.6 | 97.8 | 20 |
| >13 000 | 95.7 | 98.4 | 5.7 | 97.8 | 16 |
| >14 000 | 98.5 | 98.1 | 5.7 | 98.2 | 13 |
| >15 000 | 100.0 | 98.2 | 4.3 | 98.7 | 11 |
| >22 000 | 100.0 | 97.7 | 3.9 | 98.5 | 5 |
| >25 000 | 100.0 | 97.3 | 9.5 | 97.8 | 2 |