Andrew E Dellinger, Seang-Mei Saw, Liang K Goh, Mark Seielstad, Terri L Young, Yi-Ju Li.
Abstract
Determination of copy number variants (CNVs) inferred from genome-wide single nucleotide polymorphism (SNP) arrays has proven increasingly useful for associating genetic variants with disease. Several CNV detection methods are available, but they differ in CNV call thresholds and characteristics. We evaluated the relative performance of seven methods: circular binary segmentation (CBS), CNVFinder, cnvPartition, gain and loss of DNA (GLAD), Nexus algorithms, PennCNV and QuantiSNP. Tested data included real and simulated Illumina HumHap 550 data from the Singapore cohort study of the risk factors for Myopia (SCORM), as well as simulated data from Affymetrix 6.0 and platform-independent distributions. The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization before running the full analysis. We used 10 SCORM samples to optimize parameter settings for each method and then evaluated method performance at the optimal parameters using 100 SCORM samples. Statistical power, false positive rates and receiver operating characteristic (ROC) curve residuals were evaluated in simulation studies. Optimal parameters, as determined by NSR and ROC curve residuals, were consistent across datasets. QuantiSNP outperformed the other methods on ROC curve residuals in most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects among the fewest CNVs.
Year: 2010 PMID: 20142258 PMCID: PMC2875020 DOI: 10.1093/nar/gkq040
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Positions of simulated copy number variants on chromosome 1, shown with University of California Santa Cruz (UCSC) genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway) tracks. The copy number (sim1 copy, sim2 copy), the size of each copy number variant (sim1 size, sim2 size) and the frequency of copy number variant detection method (sim2 mm) were designated at each location for simulations 1 and 2.
Results of NSR optimization search on the training datasets
| Method | Parameters | CNV SNPs per Sample | Unique CNV SNPs | NSR | ROC residual |
|---|---|---|---|---|---|
| CBS | 1 standard deviation | 275 | 0.8050 | 0.00293 | 0.00284 |
| | 3 standard deviations | 270 | 0.8135 | 0.00302 | 0.00325 |
| | 4 standard deviations | 265 | 0.8169 | 0.00309 | 0.00316 |
| | 5 standard deviations | 130 | 0.9158 | 0.00707 | 0.00079 |
| CNVFinder | SDe = 6, 4, 4, 2 | 630 | 0.8476 | 0.00135 | 0.0016 |
| | SDe = 7, 5, 4, 2 | 446 | 0.8907 | 0.00200 | 0.00114 |
| | SDe = 8, 7, 6, 5 | 258 | 0.9037 | 0.00350 | 0.00061 |
| cnvPartition | Confidence >0 | 195 | 0.8187 | 0.00420 | 0.00257 |
| | Confidence >5 | 193 | 0.8185 | 0.00425 | 0.00253 |
| | Confidence >10 | 186 | 0.8167 | 0.00440 | 0.00249 |
| GLAD | Default | 262 | 0.7245 | 0.00276 | 0.00208 |
| | Segment log | 243 | 0.7476 | 0.00307 | 0.00186 |
| | Segment log | 192 | 0.7724 | 0.00402 | 0.00111 |
| Nexus Rank | Threshold <0.01 | 10 126 | 0.8696 | 8.588 × 10⁻⁵ | 0.00481 |
| | Threshold <0.001 | 2041 | 0.9385 | 0.00046 | 0.00349 |
| | Threshold <0.0001 | 540 | 0.9137 | 0.00169 | 0.00312 |
| Nexus SNPRank | Threshold <0.01 | 235 | 0.7862 | 0.00335 | 0.00288 |
| | Threshold <0.001 | 206 | 0.7816 | 0.00380 | 0.00272 |
| | Threshold <0.0001 | 185 | 0.7889 | 0.00426 | 0.00246 |
| PennCNV | All CNV calls | 271 | 0.7909 | 0.00292 | 0.00284 |
| | SNPs >1 | 248 | 0.7677 | 0.00310 | 0.00282 |
| QuantiSNP | | 332 | 0.8793 | 0.00265 | 0.00313 |
| | | 314 | 0.8423 | 0.00268 | 0.00311 |
| | | 283 | 0.7978 | 0.00282 | 0.00308 |
| | Log Bayes ≥0 | 311 | 0.8312 | 0.00267 | 0.00308 |
| | Log Bayes ≥2.5 | 268 | 0.8111 | 0.00303 | 0.00299 |
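The NSR column appears to equal the unique CNV SNP proportion divided by the number of CNV SNPs per sample (for CBS at 1 standard deviation, 0.8050 / 275 ≈ 0.00293; for Nexus Rank at threshold <0.01, 0.8696 / 10 126 ≈ 8.588 × 10⁻⁵). The Python sketch below assumes that reading, and further assumes that "unique" means a SNP called in exactly one sample, counted over distinct CNV SNPs. The helper names and toy SNP IDs are hypothetical and are not taken from any of the tools compared.

```python
from collections import Counter
from typing import List, Set


def singleton_fraction(calls: List[Set[str]]) -> float:
    """Fraction of distinct CNV SNPs that are called in exactly one sample.

    Assumption: 'unique' is read as 'seen in a single sample', and the
    denominator is the number of distinct CNV SNPs across all samples.
    """
    counts = Counter(snp for sample in calls for snp in sample)
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


def mean_cnv_snps_per_sample(calls: List[Set[str]]) -> float:
    """Average number of SNPs lying inside called CNVs, per sample."""
    return sum(len(sample) for sample in calls) / len(calls)


def nsr_from_summary(unique_fraction: float, cnv_snps_per_sample: float) -> float:
    """NSR as inferred from the table: unique fraction / CNV SNPs per sample."""
    return unique_fraction / cnv_snps_per_sample


def nsr(calls: List[Set[str]]) -> float:
    """Hypothetical NSR computed directly from per-sample sets of CNV SNP IDs."""
    return nsr_from_summary(singleton_fraction(calls), mean_cnv_snps_per_sample(calls))


# Toy call sets (hypothetical SNP IDs) for three samples.
toy_calls = [{"rs1", "rs2", "rs3"}, {"rs3", "rs4"}, {"rs5"}]
print(nsr(toy_calls))  # 4 of 5 distinct SNPs are singletons; mean of 2 per sample

# Two rows from the table above: CBS at 1 SD and Nexus Rank at threshold <0.01.
print(round(nsr_from_summary(0.8050, 275), 5))
print(nsr_from_summary(0.8696, 10126))  # about 8.588e-05
```

Under this reading, a method scores a high NSR when it calls few CNV SNPs per sample and most of those calls are private to one sample, so NSR alone cannot separate rare true CNVs from sample-specific noise.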
Number of CNV SNPs and ROC curve residuals for each DNA type in the pilot dataset
| Method | Parameterᵃ | Measureᵇ | Buccal | Saliva | Blood | Amplified Buccal |
|---|---|---|---|---|---|---|
| CBS | | CNV SNPs | 391 | 209 | 688 | 51 404 |
| | | ROC residual | 0.0024 | 0.0009 | −8.3 | 0.0217 |
| | 4 SD | ROC residual | 0.0024 | 0.0009 | −5.9 | 0.0217 |
| CNVFinder | | CNV SNPs | 336 | 371 | 218 | 2032 |
| | | ROC residual | 0.0007 | 0.0007 | 0.0008 | 0.0063 |
| | 7,5,4,2 SDe | ROC residual | 0.0005 | 0.0005 | 0.0007 | 0.0038 |
| cnvPartition | | CNV SNPs | 329 | 212 | 410 | 4900 |
| | | ROC residual | 0.0018 | 0.0012 | 0.0013 | 0.0166 |
| | Confidence >5 | ROC residual | 0.0018 | 0.0012 | 0.0013 | 0.0167 |
| GLAD | | CNV SNPs | 135 | 157 | 646 | 7667 |
| | | ROC residual | 0.0011 | 0.0010 | −0.0009 | 0.0121 |
| | Smoothed µ > \|0.3\| | ROC residual | 0.0011 | 0.0010 | 0.0011 | 0.0122 |
| Nexus Rank | | CNV SNPs | 29 843 | 54 662 | 57 186 | 285 812 |
| | | ROC residual | 0.0057 | 0.0098 | −0.0573 | 0.0096 |
| | Threshold = 1 | ROC residual | 0.0026 | 0.0029 | 0.0014 | 0.0170 |
| Nexus SNPRank | | CNV SNPs | 592 | 1528 | 2717 | 85 646 |
| | | ROC residual | 0.0016 | 0.0008 | −0.0003 | 0.0217 |
| | Threshold = 1 | ROC residual | 0.0011 | 0.0008 | −0.0003 | 0.0216 |
| PennCNV | | CNV SNPs | 278 | 411 | 382 | 6663 |
| | | ROC residual | 0.0016 | 0.0016 | 0.002 | 0.0201 |
| | SNPs ≥2 | ROC residual | 0.0016 | 0.0016 | 0.0019 | 0.0142 |
| | Confidence >2 | ROC residual | 0.0016 | 0.0016 | 0.0019 | 0.0144 |
| QuantiSNP | | CNV SNPs | 468 | 739 | 568 | 36 200 |
| | | ROC residual | 0.0026 | 0.0022 | 0.0029 | 0.0267 |
| | | ROC residual | 0.0025 | 0.0021 | 0.0023 | 0.0235 |

ᵃ Bold text designates NSR optimal parameter results for each method.
ᵇ CNV SNPs are given as number per sample.
ROC curve residuals used to optimize parameters on the analysis dataset
| Method | Parameter | DGV ROC residual | HapMap Asian ROC residual | HapMap confirmed ROC residual |
|---|---|---|---|---|
| CBS | 5 SD | 4.41 | 6.91 | 0.0020 |
| | 3 SD | 4.99 | 8.22 | 0.0023 |
| CNVFinder | 8,7,6,5 SDe | 1.85 | 2.56 | 8.36 |
| | 7,5,4,2 SDe | 3.76 | 5.10 | 0.0015 |
| cnvPartition | Confidence >10 | 6.98 | 0.00146 | 0.0026 |
| | Confidence >5 | 7.21 | 0.00148 | 0.0027 |
| GLAD | Smoothed µ > \|0.4\| | 5.04 | 7.99 | 0.0022 |
| | Smoothed µ > \|0.3\| | 5.87 | 9.59 | 0.0024 |
| Nexus Rank | Threshold 1 | 0.00313 | 0.00307 | −0.00293 |
| | Threshold 1 | 0.00432 | 0.00386 | −0.00816 |
| Nexus SNPRank | Threshold 1 | 8.95 | 0.00128 | 1.23 |
| | Threshold 1 | 0.00106 | 0.00147 | 1.69 |
| PennCNV | Confidence >17.5 | 7.35 | 0.00131 | 0.0025 |
| | Confidence >10 | 9.01 | 0.00155 | 0.0030 |
| QuantiSNP | Log Bayes >10 | 9.23 | 0.00159 | 0.0030 |
| | Log Bayes >2.5 | 0.0012 | 0.00196 | 0.0038 |
Maximal ROC curve residuals are optimal and indicated in bold.
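Selecting a setting by this rule amounts to taking, for each method, the parameter with the largest residual. A minimal Python sketch, using the HapMap confirmed residuals of three methods copied from the table; `best_parameter` is a hypothetical helper name, not part of any tool evaluated here.

```python
from typing import Dict

# HapMap confirmed ROC residuals for three methods, copied from the table above.
hapmap_confirmed: Dict[str, Dict[str, float]] = {
    "CBS": {"5 SD": 0.0020, "3 SD": 0.0023},
    "PennCNV": {"Confidence >17.5": 0.0025, "Confidence >10": 0.0030},
    "QuantiSNP": {"Log Bayes >10": 0.0030, "Log Bayes >2.5": 0.0038},
}


def best_parameter(residuals: Dict[str, float]) -> str:
    """Return the parameter setting with the largest ROC curve residual."""
    return max(residuals, key=residuals.__getitem__)


for method, residuals in hapmap_confirmed.items():
    print(method, "->", best_parameter(residuals))
```

This prints `3 SD` for CBS, `Confidence >10` for PennCNV and `Log Bayes >2.5` for QuantiSNP. Ties are resolved in favour of the first-listed setting, because `max` returns the first maximal key.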
The relationship between CNV SNPs and CNVs
| | CBS | CNVFinder | cnvPartitionᵃ | GLAD | Nexus Rank | Nexus SNPRank | PennCNV | QuantiSNP |
|---|---|---|---|---|---|---|---|---|
| Pilot CNV SNPs | 1046 | 1855 | 1060 | 785 | 273 311 | 7640 | 2055 | 3694 |
| Pilot CNVs | 127 | 210 | 64 | 62 | 42 468 | 1405 | 405 | 369 |
| Pilot SNPs/CNV | 8.24 | 8.83 | 16.56 | 12.66 | 6.44 | 5.44 | 5.07 | 10.01 |
| Analysis CNV SNPs | 35 650 | 119 780 | 18 520 | 32 300 | 3 179 600 | 43 370 | 29 560 | 70 190 |
| Analysis CNVs | 5680 | 13 110 | 1380 | 1520 | 544 410 | 1960 | 6030 | 29 560 |
| Analysis SNPs/CNV | 6.28 | 9.14 | 13.42 | 21.25 | 5.84 | 22.13 | 4.90 | 8.22 |
| Sim 1 CNV SNPs | 24 249 | 47 549 | N/A | 24 370 | 28 178 | 24 059 | 24 080 | 24 825 |
| Sim 1 CNVs | 1319 | 5084 | N/A | 1368 | 1448 | 1202 | 1266 | 1256 |
| Sim 1 SNPs/CNV | 18.38 | 9.35 | N/A | 17.81 | 19.46 | 20.02 | 19.02 | 19.77 |
| Sim 2 CNV SNPs | 21 225 | 21 077 | N/A | 19 209 | 23 037 | 20 564 | 21 066 | 23 041 |
| Sim 2 CNVs | 3587 | 2921 | N/A | 3144 | 2320 | 3346 | 3337 | 3696 |
| Sim 2 SNPs/CNV | 5.92 | 7.22 | N/A | 6.11 | 9.93 | 6.15 | 6.31 | 6.23 |
ᵃ N/A is used because cnvPartition was not evaluated in the simulations.
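Each SNPs/CNV row appears to be the corresponding CNV SNPs row divided by the CNVs row, rounded to two decimals. A quick Python check against three pilot columns (the helper name is hypothetical):

```python
from typing import Dict, Tuple

# (CNV SNPs, CNVs) for the pilot dataset, copied from the table above.
pilot: Dict[str, Tuple[int, int]] = {
    "CBS": (1046, 127),
    "CNVFinder": (1855, 210),
    "PennCNV": (2055, 405),
}


def snps_per_cnv(cnv_snps: int, cnvs: int) -> float:
    """Average number of SNPs spanned by each called CNV, to two decimals."""
    return round(cnv_snps / cnvs, 2)


for method, (cnv_snps, cnvs) in pilot.items():
    print(method, snps_per_cnv(cnv_snps, cnvs))
```

This prints 8.24, 8.83 and 5.07, matching the Pilot SNPs/CNV row.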
Figure 2. Boxplots of ROC curve residuals from comparing CNVs in the analysis dataset with three CNV databases. (a) CNVs from SNP studies in the Database of Genomic Variants were used to compute sensitivity and 1 − specificity. (b) CNVs from Asian samples in HapMap from Redon et al. (11) were used. (c) Experimentally confirmed CNVs in all HapMap samples from Redon et al. (11) were used.
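The ROC curve residual is not defined in this excerpt. A common reading, assumed here, is the vertical distance of a method's (1 − specificity, sensitivity) point above the ROC diagonal, that is, sensitivity − (1 − specificity), scored per SNP against a reference CNV set such as the Database of Genomic Variants. Under that reading, a negative residual means a method agrees with the reference less often than chance. The function name and SNP IDs below are hypothetical.

```python
from typing import Set


def roc_residual(called: Set[str], reference: Set[str], array_snps: Set[str]) -> float:
    """Sensitivity minus (1 - specificity) for one sample, scored per SNP.

    Assumptions (not taken from the paper): an array SNP is a positive if it
    lies in a reference CNV and a negative otherwise, and the residual is the
    vertical distance of the (1 - specificity, sensitivity) point above the
    ROC diagonal.
    """
    positives = reference & array_snps
    negatives = array_snps - reference
    true_pos = len(called & positives)
    false_pos = len(called & negatives)
    sensitivity = true_pos / len(positives)
    one_minus_specificity = false_pos / len(negatives)
    return sensitivity - one_minus_specificity


array_snps = {f"rs{i}" for i in range(1, 11)}  # ten hypothetical array SNPs
reference = {"rs1", "rs2", "rs3", "rs4"}        # SNPs inside reference CNVs
called = {"rs1", "rs2", "rs5"}                  # SNPs inside one method's calls
print(roc_residual(called, reference, array_snps))  # 2/4 - 1/6
```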
Figure 3. Boxplots of ROC curve residuals from simulated data for each method. (a) ROC residuals from simulation 1. (b) ROC residuals from simulation 2.
Figure 4. Performance comparison of CNV detection methods using simulated data: (a) power, (b) false positive rate, (c) mean ROC curve residual over 100 simulated samples and (d) standard deviation of ROC curve residuals over 100 simulated samples. CNVFinder and GLAD were not tested on the Neutral and Affymetrix simulations and so do not appear in these graphs.
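Figure 4 summarizes per-sample results across simulated samples. A minimal sketch of that aggregation, assuming each simulated sample yields one power value and one false positive rate, that the ROC curve residual is power minus false positive rate, and that the spread is reported as a sample standard deviation. None of these definitions is stated in this excerpt, and `summarize` is a hypothetical name.

```python
import statistics
from typing import Dict, List, Tuple


def summarize(per_sample: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize (power, false positive rate) pairs from simulated samples.

    Assumptions: each pair comes from one simulated sample, the ROC curve
    residual is power minus false positive rate, and the spread is the
    sample standard deviation (n - 1 denominator).
    """
    powers = [p for p, _ in per_sample]
    fprs = [f for _, f in per_sample]
    residuals = [p - f for p, f in per_sample]
    return {
        "power": statistics.mean(powers),
        "false_positive_rate": statistics.mean(fprs),
        "residual_mean": statistics.mean(residuals),
        "residual_sd": statistics.stdev(residuals),
    }


# Three hypothetical simulated samples (the study used 100 per simulation).
print(summarize([(0.9, 0.01), (0.8, 0.02), (0.7, 0.0)]))
```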