Lingyang Xu, Yali Hou, Derek M Bickhart, Jiuzhou Song, George E Liu
Abstract
Copy number variations (CNVs) are gains and losses of genomic sequence between two individuals of a species when compared with a reference genome. Data from single nucleotide polymorphism (SNP) microarrays are now routinely used for genotyping, but they can also be used for copy number detection. Substantial progress has been made in array design and CNV calling algorithms, and at least 10 comparison studies in humans have been published to assess them. In this review, we first survey the literature on existing microarray platforms and CNV calling algorithms. We then examine a number of CNV calling tools to evaluate their effects on CNV detection using bovine high-density SNP data. Large incongruities in the results from different CNV calling tools highlight the need for standardizing array data collection, quality assessment, and experimental validation. Only after careful experimental design and rigorous data filtering can the impacts of CNVs on both normal phenotypic variability and disease susceptibility be fully revealed.
Keywords: algorithm; cattle genome; copy number variation (CNV); segmental duplication; single nucleotide polymorphism (SNP)
Year: 2013 PMID: 27605188 PMCID: PMC5003459 DOI: 10.3390/microarrays2030171
Source DB: PubMed Journal: Microarrays (Basel) ISSN: 2076-3905
Survey of recent comparison studies of copy number variation (CNV) detection.
| Authors | Year | Algorithm | Data | Platform | Vendor | Conclusion | Comment |
|---|---|---|---|---|---|---|---|
| Lai [ | 2005 | CGHseg, Quantreg, CLAC, GLAD, CBS, HMM, Wavelet, Lowess, ChARM, GA, and ACE | Simulated and empirical glioblastoma samples | array CGH | Custom cDNA array | Several general characteristics for future program development were suggested. | Early programs for array CGH. |
| Baross [ | 2007 | CNAG, dChip, CNAT, GLAD | Simulated and empirical mental retardation samples on the Affymetrix 100K SNP array | SNP array | Affymetrix | Multiple programs were needed to find all real aberrations. | The false-positive rate for deletions was substantial but could be greatly reduced by using SNP genotype information to confirm loss of heterozygosity. |
| Winchester [ | 2009 | Birdsuite, CNAT, GADA, PennCNV, QuantiSNP | NA12156, NA15510 | SNP array | Affymetrix, Illumina | Different software produced different predictions. | Use software designed for the platform. |
| Dellinger [ | 2010 | CBS, cnvFinder, cnvPartition, GLAD, Nexus, PennCNV, and QuantiSNP | Simulated and empirical samples from the Singapore cohort study of the risk factors for myopia | SNP array | Illumina | QuantiSNP outperformed other methods based on ROC curve residuals over most datasets. Nexus Rank and SNPRank have low specificity and high power. Nexus Rank calls oversized CNVs. PennCNV detects among the fewest CNVs. | The normalized singleton ratio (NSR) is proposed as a metric for parameter optimization. |
| Tsuang [ | 2010 | PennCNV, QuantiSNP, HMMSeg, and cnvPartition | 48 schizophrenia samples | SNP array | Illumina | Both guidelines for identifying CNVs inferred from high-density arrays and a gold standard for validating CNVs are needed. | Given the variety of methods used, many false positives and false negatives are expected. |
| Zhang [ | 2011 | Birdsuite, Partek Genomics Suite, HelixTree, and PennCNV-Affy | ~1,000 bipolar disorder and 270 HapMap samples | SNP array | Affymetrix | Birdsuite and Partek had higher positive predictive values. | Poor overlap between two gold standards (Kidd et al. and Conrad et al.). |
| Marenne [ | 2011 | cnvPartition, PennCNV, and QuantiSNP | 96 paired samples from the Spanish Bladder Cancer/EPICURO study | SNP array | Illumina | PennCNV was the most reliable algorithm when assessing the number of copies. | Current calling algorithms should be improved for high-performance CNV analysis in genome-wide scans. |
| Pinto [ | 2011 | Birdsuite, cnvFinder, cnvPartition, dChip, ADM-2 (DNA Analytics), Genotyping Console (GTC), iPattern, Nexus Copy Number, Partek Genomics Suite, PennCNV, QuantiSNP | 6 samples in triplicate on 11 array platforms | array CGH, SNP array, and BAC array | Agilent, NimbleGen, Affymetrix, and Illumina | Different analytic tools applied to the same raw data typically yield CNV calls with <50% concordance. Moreover, reproducibility in replicate experiments is <70% for most platforms. | The CNV resource from this study allows independent data evaluation and provides a means to benchmark new algorithms. CNV calls are disproportionately affected by genome complexity: they tend to overlap segmental duplications (SDs), and a single CNV is often detected as multiple smaller variants. |
| Koike [ | 2011 | Birdsuite, Birdseye, PennCNV, CGHseg, DNAcopy | HapMap samples | SNP array | Affymetrix | The hidden Markov model (HMM)-based programs, PennCNV and Birdseye (part of Birdsuite), as well as Birdsuite itself, showed better detection performance. | Segmental duplications and interspersed repeats (LINEs) are involved in CNVs. |
| Eckel-Passow [ | 2011 | Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV, and CRLMM | 1,418 GENOA (Genetic Epidemiology Network of Atherosclerosis)/FBPP (Family Blood Pressure Program) samples | SNP array | Affymetrix | Recommended trying multiple algorithms, evaluating concordance and discordance, and then considering the union of regions for downstream association tests. | Advocated that software developers provide guidance on evaluating and choosing settings that give optimal results for an individual dataset. |
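The Baross entry notes that false-positive deletions can be reduced by checking for loss of heterozygosity. The reasoning is that a single-copy (hemizygous) deletion leaves only one allele at each SNP, so genotypes inside a true deletion should appear homozygous. The Python sketch below illustrates this check. The `Snp` and `DeletionCall` records, the "AA"/"AB"/"BB"/"NC" genotype encoding, and the `max_het_fraction` tolerance are hypothetical choices for illustration, not the procedure used in that study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int
    genotype: str  # hypothetical encoding: "AA", "AB", "BB", or "NC" (no call)


@dataclass(frozen=True)
class DeletionCall:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def het_fraction(call: DeletionCall, snps: list[Snp]) -> float | None:
    """Fraction of genotyped SNPs inside the call that are heterozygous.

    Returns None if no genotyped SNP falls inside the interval.
    """
    inside = [
        s for s in snps
        if s.chrom == call.chrom
        and call.start <= s.pos <= call.end
        and s.genotype != "NC"
    ]
    if not inside:
        return None
    # A genotype is heterozygous when its two alleles differ ("AB").
    het = sum(1 for s in inside if s.genotype[0] != s.genotype[1])
    return het / len(inside)


def supported_by_loh(call: DeletionCall, snps: list[Snp],
                     max_het_fraction: float = 0.0) -> bool:
    """True if the call contains genotyped SNPs and (almost) none are heterozygous.

    max_het_fraction is a hypothetical tolerance for genotyping errors.
    """
    frac = het_fraction(call, snps)
    return frac is not None and frac <= max_het_fraction
```

A zero-copy (homozygous) deletion typically produces missing genotype calls rather than homozygous ones, so a check like this is most informative for single-copy losses; calls with no genotyped SNPs return `None` and are treated as unsupported.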
CNVs and CNV regions (CNVRs) identified using PennCNV, cnvPartition, SVS, and DNAcopy.
| Tool | Event | Count | Gain | Loss | Total Length in bp (Average Length) |
|---|---|---|---|---|---|
| PennCNV | CNV | 46,751 (74.2) | 17,796 (28.2) | 28,955 (46.0) | 2,334,244,479 (49,929) |
| PennCNV | CNVR | 3,364 a | 1,382 b | 2,376 c | 147,476,461 (43,840) |
| cnvPartition | CNV | 16,566 (26.3) | 5,021 (8.0) | 11,545 (18.3) | 2,191,528,246 (132,291) |
| cnvPartition | CNVR | 1,298 a | 541 b | 916 c | 172,378,730 (132,803) |
| SVS | CNV | 92,463 (146.8) | 205 (0.3) | 92,258 (146.4) | 2,234,601,290 (24,168) |
| SVS | CNVR | 7,099 a | 78 b | 7,056 c | 151,471,634 (21,337) |
| DNAcopy | CNV | 41,858 (66.4) | 4,469 (7.1) | 37,389 (59.3) | 1,863,930,368 (44,530) |
| DNAcopy | CNVR | 5,961 a | 1,457 b | 5,284 c | 194,287,154 (32,593) |
Numbers in parentheses are per-sample values (counts divided by the 630 samples), except in the last column, where the number in parentheses is the average length (total length divided by the number of CNVs or CNVRs). a Non-redundant CNVR counts after merging both gain and loss CNVs identified across all 630 samples. b Gain CNV events merged separately. c Loss CNV events merged separately. Because gains and losses are merged separately, b + c can exceed a.
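Footnotes a to c describe CNVRs as CNVs pooled across samples and merged by genomic overlap, with gains and losses also merged separately. The following Python sketch shows one way to do this, assuming inclusive 1-based coordinates and merging any intervals that share at least one base; the `Cnv` record and the `merge_cnvrs` and `length_stats` helpers are hypothetical, and the exact merging rule used for the table is not specified here. `length_stats` mirrors the footnote's normalization (average length as total length divided by the number of intervals).

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Cnv(NamedTuple):
    sample: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    state: str  # "gain" or "loss"


def merge_cnvrs(cnvs: list[Cnv], state: str | None = None) -> list[tuple[str, int, int]]:
    """Pool CNVs across samples and merge overlapping intervals into CNVRs.

    state=None merges gains and losses together; "gain" or "loss" keeps only
    that state before merging. Returns sorted (chrom, start, end) tuples.
    """
    by_chrom = defaultdict(list)
    for c in cnvs:
        if state is None or c.state == state:
            by_chrom[c.chrom].append((c.start, c.end))

    regions = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # shares at least one base: extend the region
                cur_end = max(cur_end, end)
            else:                 # gap: close the current region, open a new one
                regions.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        regions.append((chrom, cur_start, cur_end))
    return regions


def length_stats(intervals: list[tuple[int, int]]) -> tuple[int, float]:
    """Total length in bp and average length (total / number of intervals)."""
    total = sum(end - start + 1 for start, end in intervals)
    return total, (total / len(intervals) if intervals else 0.0)
```

For example, a loss spanning 100 to 300 and a gain spanning 250 to 400 form a single combined CNVR but one gain CNVR plus one loss CNVR, which is how the separately merged gain and loss counts can sum to more than the combined count (for PennCNV, 1,382 + 2,376 > 3,364).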
Figure 1. Comparison of CNVRs identified by PennCNV, cnvPartition, SVS, and DNAcopy based on genomic location in the UMD3.1 bovine genome assembly. Overlap lengths of CNVRs are indicated in Mb.
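Figure 1 compares the four tools' CNVRs by genomic location and reports overlap lengths in Mb. One way to compute such length-based overlaps is a sweep line over region boundaries, sketched below in Python: it reports, for every combination of tools, the number of base pairs covered by exactly that combination. The input format (a dict from tool name to (chrom, start, end) tuples with inclusive coordinates) and the names `venn_lengths` and `to_mb` are assumptions for illustration; the figure itself may have been produced with different software.

```python
from __future__ import annotations

from collections import defaultdict


def venn_lengths(tool_regions: dict[str, list[tuple[str, int, int]]]) -> dict[frozenset[str], int]:
    """Base pairs covered by exactly each combination of tools.

    Coordinates are 1-based and inclusive. Overlapping regions within one tool
    are tolerated: a per-tool counter tracks how many of its regions are open.
    """
    events = defaultdict(list)  # chrom -> [(position, +1/-1, tool)]
    for tool, regions in tool_regions.items():
        for chrom, start, end in regions:
            events[chrom].append((start, +1, tool))    # region opens at start
            events[chrom].append((end + 1, -1, tool))  # and closes after end

    covered = defaultdict(int)
    for chrom_events in events.values():
        chrom_events.sort(key=lambda e: e[0])
        open_count = defaultdict(int)
        prev_pos = None
        for pos, delta, tool in chrom_events:
            # Credit the segment [prev_pos, pos) to the tools open across it.
            if prev_pos is not None and pos > prev_pos:
                tools = frozenset(t for t, n in open_count.items() if n > 0)
                if tools:
                    covered[tools] += pos - prev_pos
            open_count[tool] += delta
            prev_pos = pos
    return dict(covered)


def to_mb(bp: int) -> float:
    """Convert base pairs to megabases."""
    return bp / 1_000_000
```

Summing the entries whose key contains two or more tools gives the length supported by at least two callers, a more conservative alternative to taking the union of all calls.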
Overlaps among CNVRs across the four CNV detection tools.
| Tool 1 | Tool 2 | Intersection (count) a | Union (count) a | Percentage (count) | Intersection (bp) b | Union (bp) b | Percentage (length) |
|---|---|---|---|---|---|---|---|
| PennCNV | cnvPartition | 1,420 | 3,242 | 43.80% | 107,775,740 | 212,079,451 | 50.82% |
| PennCNV | DNAcopy | 2,355 | 6,970 | 33.79% | 93,149,061 | 248,614,554 | 37.47% |
| PennCNV | SVS | 1,264 | 9,199 | 13.74% | 59,557,597 | 239,390,498 | 24.88% |
| cnvPartition | DNAcopy | 1,284 | 5,975 | 21.49% | 79,825,624 | 286,840,260 | 27.83% |
| cnvPartition | SVS | 981 | 7,416 | 13.23% | 56,569,347 | 267,281,017 | 21.16% |
| DNAcopy | SVS | 2,332 | 10,728 | 21.74% | 88,864,805 | 256,893,983 | 34.59% |
a Intersections and unions of two CNVR datasets by count. b Intersections and unions of two CNVR datasets by length in base pairs. Percentages are the intersection divided by the union.
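The percentages in this table are consistent with intersection divided by union (a Jaccard index), and each length-based union equals the two tools' total CNVR lengths from the preceding table minus the intersection; for PennCNV and cnvPartition, 147,476,461 + 172,378,730 - 107,775,740 = 212,079,451 bp. The Python sketch below computes the length-based version for two CNVR sets with a two-pointer scan. It assumes each set is already merged (no overlaps within a set) and uses inclusive 1-based coordinates; the function names are hypothetical, and how the count-based intersections in the table were defined is not stated here.

```python
from __future__ import annotations


def total_bp(regions: list[tuple[str, int, int]]) -> int:
    """Total length in bp of (chrom, start, end) regions, inclusive coordinates."""
    return sum(end - start + 1 for _, start, end in regions)


def intersection_bp(a: list[tuple[str, int, int]], b: list[tuple[str, int, int]]) -> int:
    """Base pairs shared by two region sets, each non-overlapping within itself."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    shared = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Both lists are sorted by chromosome first: skip the lagging one.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(start_a, start_b), min(end_a, end_b)
        if lo <= hi:
            shared += hi - lo + 1
        # Advance whichever region ends first; it cannot overlap anything later.
        if end_a < end_b:
            i += 1
        else:
            j += 1
    return shared


def overlap_percentage(a, b) -> tuple[int, int, float]:
    """(intersection bp, union bp, 100 * intersection / union)."""
    shared = intersection_bp(a, b)
    union = total_bp(a) + total_bp(b) - shared
    return shared, union, (100.0 * shared / union if union else 0.0)
```

The count-based columns follow the same identity: for PennCNV and cnvPartition, 3,364 + 1,298 - 1,420 = 3,242 and 1,420 / 3,242 = 43.80%.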