Quan Wang, Peichao Peng, Minping Qian, Lin Wan, Minghua Deng.
Abstract
BACKGROUND: Copy number variation (CNV) is essential to understanding the pathology of many complex diseases at the DNA level. CNV studies based on Affymetrix SNP arrays, which are widely used, depend heavily on accurate copy number (CN) estimation. Nevertheless, CN estimation may be biased by several factors, including cross-hybridization, training sample batch, and genomic waves of intensities induced by sequence-dependent hybridization rate and amplification efficiency. Because many available algorithms address only one or two of these three factors, CNV identification often suffers from a high false discovery rate (FDR). We have therefore developed a new CNV detection pipeline based on hybridization and amplification rate correction (CNVhac).
Year: 2012 PMID: 22691279 PMCID: PMC3428662 DOI: 10.1186/1755-8794-5-24
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Figure 1. ROC curves of the sex classification for CNVhac, CRMA_v2 and cn.FARMS on 59 HapMap CEU founders. Left: Full ROC curves. Right: Top-left corner of ROC curves. CNVhac performs better than CRMA_v2 and cn.FARMS.
Figure 2. Genomic wave patterns on a segment of Chromosome X of one CEU female founder, NA06985, for (a) cn.FARMS, (b) CRMA_v2 and (c) CNVhac. CNVhac has the smallest amplitude of estimated raw CNs.
Figure 3. Density of raw CNs estimated by different methods for (a) male CEU founders and (b) female CEU founders on chromosome X. Raw CNs are scaled to the same median (for males 1 and females 2). CNVhac shows significantly smaller variance than CRMA_v2 and cn.FARMS (F test, all p-values are < 2e-16).
Figure 4. 1-precision versus recall curves for CNV detection on 269 HapMap samples. A curve that is located more toward the upper-left corner indicates better performance. Note: FDR is 1-precision. Compared to Birdsuite, CNVhac shows an appreciably lower FDR when calling CNVs.
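The quantities plotted in Figure 4 can be illustrated with a minimal sketch. It assumes that each predicted CNV has already been matched against a reference set, so that counts of true positives, false positives and false negatives are available; the function name and the example counts are hypothetical and are not taken from the paper.

```python
def precision_recall_fdr(true_positives, false_positives, false_negatives):
    """Return (precision, recall, FDR) from matched CNV call counts.

    Assumption: calls were matched to a reference set beforehand; the
    matching rule itself is not specified here.
    """
    called = true_positives + false_positives    # all predicted CNVs
    truth = true_positives + false_negatives     # all reference CNVs
    precision = true_positives / called if called else float("nan")
    recall = true_positives / truth if truth else float("nan")
    # FDR = FP / (TP + FP), which is algebraically equal to 1 - precision.
    fdr = false_positives / called if called else float("nan")
    return precision, recall, fdr


# Hypothetical counts, for illustration only.
precision, recall, fdr = precision_recall_fdr(8, 2, 8)
print(precision, recall, fdr)
```

Sweeping a calling threshold and recomputing these values at each step would trace one curve of the kind shown in the figure.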
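Results of CNV calling based on different training sample batches for CNVhac and Birdsuite
| Sample | G1§ | G2 | G3 | Intersection¶ | Union† | Ratio‡ | G1§ | G2 | G3 | Intersection¶ | Union† | Ratio‡ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA12156 | 17 | 19 | 21 | 14 | 22 | 0.64 | 15 | 17 | 18 | 15 | 17 | 0.88 |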
| NA12878 | 22 | 21 | 19 | 15 | 28 | 0.54 | 29 | 26 | 24 | 20 | 33 | 0.61 |
| NA18507 | 19 | 15 | 20 | 10 | 23 | 0.43 | 16 | 20 | 20 | 15 | 21 | 0.71 |
| NA18517 | 20 | 21 | 21 | 14 | 25 | 0.56 | 21 | 21 | 18 | 16 | 23 | 0.7 |
| NA18555 | 16 | 16 | 15 | 11 | 20 | 0.55 | 16 | 14 | 17 | 11 | 18 | 0.61 |
| NA18956 | 13 | 12 | 16 | 9 | 16 | 0.6 | 20 | 21 | 24 | 16 | 24 | 0.67 |
§ The number of predicted CNVs using group 1 for parameter training.
¶ The number of CNVs in the intersection set of “G1”, “G2” and “G3”.
† The number of CNVs in the union set of “G1”, “G2” and “G3”.
‡ The ratio of intersection to union.
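The intersection-to-union ratio defined in the footnotes can be sketched as follows. This is a simplified illustration that assumes each CNV call is represented by a hashable identifier, so that plain set operations apply; how calls from different training groups are matched to one another (for example, by genomic overlap) is not stated here, and the function name and identifiers are hypothetical.

```python
def batch_consistency(g1, g2, g3):
    """Return (intersection size, union size, ratio) for three CNV call sets.

    Assumption: calls are comparable by identity, e.g. pre-matched region IDs.
    """
    common = g1 & g2 & g3      # CNVs called with every training group
    combined = g1 | g2 | g3    # CNVs called with at least one training group
    ratio = len(common) / len(combined) if combined else float("nan")
    return len(common), len(combined), ratio


# Hypothetical call sets obtained with three training groups.
g1 = {"cnv_a", "cnv_b", "cnv_c"}
g2 = {"cnv_a", "cnv_b", "cnv_d"}
g3 = {"cnv_a", "cnv_b", "cnv_e"}
n_common, n_union, ratio = batch_consistency(g1, g2, g3)
print(n_common, n_union, round(ratio, 2))
```

A ratio close to 1 means the calls change little when the training batch changes. The first row of the table is consistent with this definition: an intersection of 14 and a union of 22 give 14/22 ≈ 0.64.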