Vincent Plagnol, James Curtis, Michael Epstein, Kin Y Mok, Emma Stebbings, Sofia Grigoriadou, Nicholas W Wood, Sophie Hambleton, Siobhan O Burns, Adrian J Thrasher, Dinakantha Kumararatne, Rainer Doffinger, Sergey Nejentsev.
Abstract
MOTIVATION: Exome sequencing has proven to be an effective tool to discover the genetic basis of Mendelian disorders. It is well established that copy number variants (CNVs) contribute to the etiology of these disorders. However, calling CNVs from exome sequence data is challenging. A typical read depth strategy consists of using another sample (or a combination of samples) as a reference to control for the variability at the capture and sequencing steps. However, technical variability between samples complicates the analysis and can create spurious CNV calls.
Year: 2012 PMID: 22942019 PMCID: PMC3476336 DOI: 10.1093/bioinformatics/bts526
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) Comparison of fragments per kilobase and million base pairs (FPKM) between two exomes (FPKM squared correlation coefficient = 0.992). (B) Total read depth for two typical well-matched exomes (y-axis) as a function of the proportion of reads mapping to one of the two exomes (x-axis). The red lines show the 99% confidence interval assuming the best fitting binomial distribution for the read count data. The blue lines show the same 99% confidence interval assuming the best fitting beta-binomial robust model for the same dataset. (C) Same as (B) but for two typical exomes that are poorly matched to each other. (D) Rs statistic (x-axis) and correlation between FPKM values (y-axis), both of them computed for each exome with its associated reference set.
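Panels (B) and (C) contrast a binomial interval with a wider beta-binomial one. The following is a minimal Python sketch of why overdispersion widens the interval. It uses a normal approximation, not the fitted models shown in the figure, and the parameter name `rho` (the beta-binomial intra-class correlation) and the example values are assumptions made for illustration.

```python
import math

# Two-sided 99% quantile of the standard normal distribution.
Z_99 = 2.5758293035489004


def proportion_interval(total_reads, p, rho=0.0, z=Z_99):
    """Approximate interval for the fraction of reads coming from one exome.

    total_reads: combined read count of the two exomes at one exon.
    p:           expected fraction of reads from the first exome.
    rho:         hypothetical overdispersion (intra-class correlation);
                 rho = 0 gives the plain binomial variance.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Beta-binomial variance of the proportion:
    #   p (1 - p) / n * [1 + (n - 1) * rho]
    inflation = 1.0 + (total_reads - 1) * rho
    sd = math.sqrt(p * (1.0 - p) * inflation / total_reads)
    return max(0.0, p - z * sd), min(1.0, p + z * sd)


# Same exon, with and without overdispersion (illustrative values only).
binomial_lo, binomial_hi = proportion_interval(1000, 0.5)
overdispersed_lo, overdispersed_hi = proportion_interval(1000, 0.5, rho=0.002)
```

With `rho = 0` the interval shrinks as the read depth grows. Any positive `rho` keeps it from shrinking as fast, which is the qualitative difference between the two sets of lines in the figure.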
Fig. 2. Power study showing the expected posterior probability for a heterozygous deletion call. (A) Expected value of the posterior probability (averaged over all exons) for the 15 exomes in Dataset 1 as a function of the (reference:test) read count ratio (which is closely approximated by the number of exomes in the aggregate reference set). Each line shows a different test exome sample and the most correlated exome is added to the reference at each step. (B) The expected number of reads that would be mapping to a normal copy number exon varies (along the x-axis) but the (reference:test) sequencing depth ratio remains constant at 10 (i.e. the reference set approximately consists of an aggregate of 10 exomes). Other parameters, including the level of correlation between test and reference exomes, are kept constant. Power estimates assume a typical exome from each of the two Datasets 1 and 2 (the median value of the posterior probability is shown). (C), (D) The number of exomes in the aggregate reference set varies but the expected number of reads mapping to a normal copy number exon for the test sample is set to 100 (C) and 200 (D). For (B), (C) and (D), the black line refers to an optimum dataset in the absence of sample-to-sample technical variability (Rs = 1), the longer-dashed line to the typical dispersion parameter estimated from Dataset 1 (Rs = 1.6) and the shorter-dashed line to the typical dispersion parameter estimated from Dataset 2 (Rs = 2.5).
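The dependence of power on read depth can be illustrated with a simplified single-exon calculation. The sketch below assumes a plain binomial model with no overdispersion (the idealised case, not the beta-binomial model the figure is based on), a heterozygous deletion that halves the test read count, and a hypothetical `prior` probability of deletion. All function and parameter names are illustrative.

```python
import math


def expected_test_fraction(ref_to_test_ratio, copy_ratio=1.0):
    """Expected fraction of reads from the test sample at one exon.

    copy_ratio is 1.0 for normal copy number and 0.5 for a
    heterozygous deletion (assumed to halve the test read count).
    """
    return copy_ratio / (copy_ratio + ref_to_test_ratio)


def log_binomial_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def deletion_posterior(test_reads, ref_reads, ref_to_test_ratio, prior=0.5):
    """Posterior probability of a heterozygous deletion at a single exon."""
    n = test_reads + ref_reads
    ll_normal = log_binomial_pmf(
        test_reads, n, expected_test_fraction(ref_to_test_ratio, 1.0))
    ll_deletion = log_binomial_pmf(
        test_reads, n, expected_test_fraction(ref_to_test_ratio, 0.5))
    log_odds = math.log(prior / (1.0 - prior)) + ll_deletion - ll_normal
    # Numerically stable logistic transform of the log odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


# Reference ten times deeper than the test sample; the exon would carry
# 100 test reads at normal copy number, but only 50 are observed.
p_deleted = deletion_posterior(test_reads=50, ref_reads=1000, ref_to_test_ratio=10)
```

Doubling both read counts while keeping the same ratio increases the log likelihood ratio in favour of the deletion, which is the trend panels (B) to (D) describe for the idealised case.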
Comparison between our package (ExomeDepth) and two other tools: exomeCopy and ExomeCNV
| | ExomeDepth | exomeCopy | ExomeCNV |
|---|---|---|---|
| Median nb of CNVs | 213 | 495 | 2256 |
| Percentage in DGV | 86.5 | 67.8 | 16.3 |
| Median CNV size (kb) | 8.9 | 1.83 | 0.16 |
| Median CNV size (exons) | 5 | 3 | 1 |
| Median nb of CNVs | 177 | 1228 | 11 046 |
| Percentage in DGV | 80 | 36.9 | 26.6 |
| Median CNV size (kb) | 12.2 | 10.04 | 0.26 |
| Median CNV size (exons) | 5 | 5 | 1 |
| Median nb of CNVs | 246 | 641 | 5261 |
| Percentage in DGV | 66 | 37.2 | 34.2 |
| Median CNV size (kb) | 1.7 | 9.75 | 0.34 |
| Median CNV size (exons) | 3 | 4 | 1 |
| Percentage of known CNVs found | 75.2 | 52.8 | 41.2 |
We define a CNV called from exome data as ‘in DGV’ (or a ‘known CNV’ in the 1000 Genomes analysis) when the CNV in the database overlaps >50% of our CNV call.
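The overlap rule above can be written as a short check. This is a minimal Python sketch under stated assumptions: intervals are half-open `(chrom, start, end)` tuples, a single database entry must cover more than half of the call on its own, and the function names are hypothetical rather than taken from any of the compared tools.

```python
def overlap_fraction(call_start, call_end, db_start, db_end):
    """Fraction of the call [call_start, call_end) covered by a database CNV."""
    call_length = call_end - call_start
    if call_length <= 0:
        raise ValueError("call must have positive length")
    shared = min(call_end, db_end) - max(call_start, db_start)
    return max(0, shared) / call_length


def is_known_cnv(call, database, threshold=0.5):
    """True when some database CNV overlaps more than `threshold` of the call.

    `call` and every entry of `database` are (chrom, start, end) tuples.
    """
    chrom, start, end = call
    return any(
        db_chrom == chrom
        and overlap_fraction(start, end, db_start, db_end) > threshold
        for db_chrom, db_start, db_end in database
    )


# Illustrative coordinates only, not taken from the paper.
example_call = ("chr3", 100, 200)
example_database = [("chr3", 140, 300), ("chr7", 100, 200)]
example_is_known = is_known_cnv(example_call, example_database)
```

The fraction is measured relative to the length of the call, not the database entry, so a small call lying inside a large known CNV counts as known, while a large call that merely contains a small known CNV does not.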
Fig. 3. (A) Heterozygous deletion of exons 6 and 7 of the GATA2 gene identified by ExomeDepth in the exome sequence data. The red crosses show the ratio of observed/expected number of reads for the test sample. The grey shaded region shows the estimated 99% confidence interval for this observed ratio in the absence of a CNV. The presence of two contiguous exons with read count ratios located outside of the confidence interval is indicative of a heterozygous deletion in this sample. Independently, each exon would yield a posterior probability for the deletion call of 15% (exon 6) and 77% (exon 7). The combined CNV call has a posterior probability >99.9%. (B) Validation of a 6-kb deletion using a targeted array CGH (Agilent 15K format) containing 26 probes in the GATA2 gene region. Each cross indicates a probe and red crosses indicate probes located in the region of a heterozygous deletion. (C) Sequencing of the deletion breakpoints identified the exact boundaries of this 5797-bp deletion overlapping exons 6 and 7 of the GATA2 gene.
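Two modest per-exon posteriors can combine into a near-certain call because, under a small prior, each one already represents a large Bayes factor. The Python sketch below shows that arithmetic under loud assumptions: the two exons are treated as conditionally independent evidence for one deletion event, the same prior applies to each exon and to the combined call, and the prior value of 1e-4 is hypothetical. The caption does not state the prior or the combination rule, so this is not presented as the calculation behind the figure.

```python
def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


def combined_posterior(per_exon_posteriors, prior):
    """Combine single-exon posteriors that were all computed under `prior`.

    Each posterior is converted back into a Bayes factor, the factors are
    multiplied (conditional independence is assumed), and the product is
    applied once to the prior odds.
    """
    prior_odds = odds(prior)
    posterior_odds = prior_odds
    for p in per_exon_posteriors:
        posterior_odds *= odds(p) / prior_odds  # Bayes factor of this exon
    return posterior_odds / (1.0 + posterior_odds)


# Per-exon posteriors quoted in the caption; the prior is an assumption.
combined = combined_posterior([0.15, 0.77], prior=1e-4)
```

Under these assumptions the combined value is above 0.999, in line with the >99.9% reported for the call. A larger assumed prior gives a smaller combined value, so the result depends on that choice.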