Jason Li, Richard Lupat, Kaushalya C Amarasinghe, Ella R Thompson, Maria A Doyle, Georgina L Ryland, Richard W Tothill, Saman K Halgamuge, Ian G Campbell, Kylie L Gorringe.
Abstract
MOTIVATION: In light of the increasing adoption of targeted resequencing (TR) as a cost-effective strategy to identify disease-causing variants, a robust method for copy number variation (CNV) analysis is needed to maximize the value of this promising technology.
Year: 2012 PMID: 22474122 PMCID: PMC3348560 DOI: 10.1093/bioinformatics/bts146
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. CONTRA workflow. Either a matched control sample (left arrow) or a pool of normal samples for creating a baseline control (right arrow) must be present.
A summary of the samples that have been assessed against our characterization of depth-of-coverage and log-ratios in TR data
| Species | Manufacturer | Target enrichment platform | Platform alias used | Technology | No. of samples | Sequencing | SE/PE | Type |
|---|---|---|---|---|---|---|---|---|
| Human | Roche NimbleGen | Sequence Capture 2.1 M Exome Array | SeqCap Array | Array based | 10 | GAIIx | SE and PE | Normal blood DNA |
| | | EZ Exome Library v2.0 | EZ Exome v2 | Solution based | 10 | HiSeq or GAIIx | PE | |
| | | EZ Exome Library v1.0 | EZ Exome v1 | | 10 | GAIIx | | |
| | Agilent | SureSelect All Exon 50 Mb | SureSelect v1 | | 10 | | | |
| | | SureSelect All Exon v.2 | SureSelect v2 | | 5 | | | |
| | | SureSelect Custom Exon Capture | Custom Capture | | 10 | | SE | Tumor versus normal |
| Mouse | | SureSelect Mouse All Exon | SureSelect Mouse | | 6 | | PE | |
Fig. 2. Characteristics of base-level log-ratios. (A) Log-ratio versus GC-content; (B) log-ratio versus log2-coverage derived from two normal samples; and (C) effect of imbalanced library size on log-ratios, for both simulated negative binomial data (left) and real data (right). All data points are copy-number neutral. Top: library size of the case sample is twice that of the control; middle: equal library sizes; bottom: case library size is half that of the control.
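The library-size effect in panel (C) can be illustrated with a minimal sketch. It assumes a simple total-coverage scaling and a pseudocount of 1, neither of which is taken from the source; the function name `base_log_ratios` and the toy coverage values are hypothetical and are not CONTRA's actual normalization.

```python
import math

def base_log_ratios(case_cov, control_cov, pseudocount=1.0):
    """Per-base log2 ratios after scaling each sample by its total coverage.

    Assumption: library size is approximated by the summed coverage over the
    targeted bases; the pseudocount avoids log(0) at uncovered bases.
    """
    case_total = sum(case_cov)
    control_total = sum(control_cov)
    ratios = []
    for c, n in zip(case_cov, control_cov):
        case_norm = (c + pseudocount) / case_total
        control_norm = (n + pseudocount) / control_total
        ratios.append(math.log2(case_norm / control_norm))
    return ratios

# Toy copy-number-neutral data: the case library is twice the control library.
control = [40, 80, 120, 60, 100]
case_double = [2 * x for x in control]

# Without normalization every base shows a spurious log2 ratio of +1.
raw = [math.log2(c / n) for c, n in zip(case_double, control)]

# After library-size scaling the ratios sit close to 0 (copy-number neutral).
normalized = base_log_ratios(case_double, control)
```

In this toy setting the small residual left after scaling comes from the pseudocount and is largest at the lowest-coverage base, which is one reason a coverage-dependent treatment of log-ratios can be preferable to a single global offset.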
Fig. 3. Comparison of log-ratio variation between a matched control and pooled controls with varying numbers of samples, plotting log-ratio SD against log2 coverage. The same case sample was used throughout. Each set of control samples is a subset or superset of the others.
Fig. 4. Variation of DOC in TR. (A) Histogram of exon-level coverages in a single sample; (B) coverage profile along a chromosome (showing first 20 k targeted bp of Chromosome 1); (C) coverage versus GC-content; and (D) coverage versus distance from the first targeted base.
Fig. 5. Coverage correlation between samples. (A) Log-ratio versus targeted base position along Chromosome 20, derived from pairs of random samples as indicated in the plot titles. For example, top-left: log-ratios between two EZ Exome v2 samples; bottom-right: an EZ Exome v1 sample matched against a SureSelect v2 sample. See also Supplementary Figure S4. (B) Base-level coverage variance against coverage mean, using six random samples for each platform.
CNV detection performance over a 50x coverage simulated dataset, using default algorithmic parameters
| Size of variants | No. of instances simulated | CONTRA sensitivity (%) | CONTRA specificity (%) | ExomeCNV sensitivity (%) | ExomeCNV specificity (%) |
|---|---|---|---|---|---|
| 20–50 bp | 100 | 57.0 | 99.7 | 8.0 | 100.0 |
| 50–200 bp | 100 | 68.0 | 100.0 | 25.0 | 100.0 |
| Full exons | 111 | 96.4 | 100.0 | 62.2 | 100.0 |
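As a reading aid for the percentages above, the following sketch shows how sensitivity and specificity are computed from confusion counts. The detected counts used here (107 and 69 of 111 full exons) are back-calculated from the reported percentages and are an assumption, not figures stated in the source; the counts passed to `specificity` are purely hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of simulated variants that were detected."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of non-variant regions that were correctly left uncalled."""
    return true_neg / (true_neg + false_pos)

# Assumed counts: 107 of 111 full-exon variants detected is consistent with
# the reported 96.4%, and 69 of 111 is consistent with the reported 62.2%.
contra_full_exon = round(100 * sensitivity(107, 111 - 107), 1)
exomecnv_full_exon = round(100 * sensitivity(69, 111 - 69), 1)

# Hypothetical counts, only to show the specificity calculation.
example_specificity = round(100 * specificity(997, 3), 1)
```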
Fig. 6. Receiver operating characteristic (ROC) curve for the HapMap samples, generated by varying CONTRA's P-value threshold. The middle table shows sensitivities and specificities for each individual sample at a P-value threshold of 0.01.
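A ROC curve of this kind can be traced by sweeping a significance threshold over per-region P-values, as in the minimal sketch below. The function `roc_points`, the toy P-values, and the truth labels are hypothetical, and calling a region when its P-value is at or below the threshold is an assumption rather than CONTRA's documented rule.

```python
def roc_points(p_values, is_true_cnv, thresholds):
    """Return (threshold, false positive rate, true positive rate) tuples.

    A region is called as a CNV when its P-value is <= the threshold.
    """
    positives = sum(is_true_cnv)
    negatives = len(is_true_cnv) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for p, y in zip(p_values, is_true_cnv) if y and p <= t)
        fp = sum(1 for p, y in zip(p_values, is_true_cnv) if not y and p <= t)
        points.append((t, fp / negatives, tp / positives))
    return points

# Hypothetical per-region P-values with known truth labels.
p_values = [0.0005, 0.004, 0.02, 0.3, 0.008, 0.04, 0.6, 0.9]
is_true_cnv = [True, True, True, True, False, False, False, False]

points = roc_points(p_values, is_true_cnv, [0.001, 0.01, 0.05, 1.0])
```

Each tuple gives one point on the curve; sensitivity is the true positive rate and specificity is one minus the false positive rate at that threshold.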