Logan C Walker, George A R Wiggins, John F Pearson.
Abstract
Constitutional copy number variants (CNVs) include inherited and de novo deviations from a diploid state at a defined genomic region. These variants contribute significantly to genetic variation and disease in humans, including breast cancer susceptibility. Identification of genetic risk factors for breast cancer in recent years has been dominated by the use of genome-wide technologies, such as single nucleotide polymorphism (SNP)-arrays, with a significant focus on single nucleotide variants. To date, these large datasets have been underutilised for generating genome-wide CNV profiles despite offering a massive resource for assessing the contribution of these structural variants to breast cancer risk. Technical challenges remain in determining the location and distribution of CNVs across the human genome due to the accuracy of computational prediction algorithms and resolution of the array data. Moreover, better methods are required for interpreting the functional effect of newly discovered CNVs. In this review, we explore current and future application of SNP array technology to assess rare and common CNVs in association with breast cancer risk in humans.
Keywords: SNP arrays; breast cancer; copy number variants (CNVs); genetic variation; risk
Year: 2015 PMID: 27600231 PMCID: PMC4996380 DOI: 10.3390/microarrays4030407
Source DB: PubMed Journal: Microarrays (Basel) ISSN: 2076-3905
Commonly (>10 citations) applied CNV detection methods for SNP-array data.
| Software | Algorithm | Code | Platform | Year a | Reference | Citations b | Software URL |
|---|---|---|---|---|---|---|---|
| PennCNV | HMM | Perl | Multiple | 2007 | [ | 300 | |
| Birdsuite (Birdseye, Canary) | Mixture models | Java/Python/R | Affymetrix | 2008 | [ | 300 | |
| Nexus Copy Number | Proprietary (Segmentation) | Windows executable | Multiple | - | - | 100 | |
| QuantiSNP | HMM | MATLAB | Multiple | 2007 | [ | 100 | |
| CNVPartition | Proprietary | Windows executable | Illumina | 2006 | - | 100 | |
| Partek Genomics Suite | Proprietary (Segmentation or HMM) | Windows executable | Multiple | - | - | 30 | |
| CNVFinder | Experimental variability | Perl | Array CGH | 2006 | [ | 30 | |
| CGHCall | Segmentation and mixture model | R | Array CGH | 2007 | [ | 30 | |
| GenoCNV | HMM | R | Multiple | 2009 | [ | 30 | |
| SW-ARRAY | Smith-Waterman | R | Array CGH | 2005 | [ | 30 | Not available |
| HMMSeg | HMM wavelet smoothing | Java | Multiple | 2007 | [ | 10 | |
| VanillaICE | HMM | R | Affymetrix | 2008 | [ | 10 | |
| CNVHap | HMM, Haplotype | Java | Multiple | 2010 | [ | 10 | |
| dChip | Multiple | R | Multiple | 2008 | [ | 10 | |
| GADA | Bayesian | R | Multiple | 2010 | [ | 10 | |
| CNV Workshop | Segmentation | complete VM | Multiple | 2010 | [ | 10 | |
a Year in which the reference was published. b At least this many citations in PubMed or on the company website as of July 2015. Abbreviation: HMM, hidden Markov model.
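Several of the tools listed above are described as HMM-based. As a rough illustration of that idea only, the sketch below runs Viterbi decoding over a sequence of per-probe log R ratio (LRR) values to label each probe as a loss, normal, or gain. The state names, mean LRR values, noise level, and transition probability are all hypothetical values chosen for the example; they are not taken from PennCNV, QuantiSNP, or any other listed package, which use richer models.

```python
import math

# Hypothetical copy-number states and assumed mean LRR per state.
STATES = ["loss", "normal", "gain"]
MEANS = {"loss": -0.6, "normal": 0.0, "gain": 0.4}
SIGMA = 0.2   # assumed standard deviation of LRR noise
STAY = 0.98   # assumed probability of keeping the same state at the next probe


def log_emission(state, lrr):
    """Log density of a Gaussian centred on the state's assumed mean LRR."""
    z = (lrr - MEANS[state]) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2 * math.pi))


def log_transition(prev, curr):
    """Log probability of moving from prev to curr between adjacent probes."""
    if prev == curr:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(lrr_values):
    """Return the most likely state path for a sequence of LRR values."""
    if not lrr_values:
        return []
    # Uniform prior over states at the first probe.
    scores = {s: math.log(1 / len(STATES)) + log_emission(s, lrr_values[0])
              for s in STATES}
    backpointers = []
    for lrr in lrr_values[1:]:
        new_scores, pointers = {}, {}
        for curr in STATES:
            best_prev = max(STATES,
                            key=lambda p: scores[p] + log_transition(p, curr))
            new_scores[curr] = (scores[best_prev]
                                + log_transition(best_prev, curr)
                                + log_emission(curr, lrr))
            pointers[curr] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best-scoring final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these assumed parameters, a run of several consecutive probes near the loss mean is labelled as a loss, whereas a single outlying probe is absorbed into the surrounding normal segment because two state changes cost more than one poorly fitting observation.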
Accuracy of CNV-calling algorithms.
| Algorithm(s) | Platform | Validation Method | Accuracy | Study Conclusion | Reference |
|---|---|---|---|---|---|
| Adapted method on SW-ARRAY and GIM | Affymetrix | qPCR or Mass Spec Validation | 2.5% false positives, ~90% singleton validation | Developed a multistep algorithm to better call CNVs. | [ |
| Birdsuite, CNAT, CNVPartition, GADA, Nexus, PennCNV and QuantiSNP | Affymetrix, Illumina | Comparison of HapMap samples to Kidd | Assay sensitivity ranged 20%−49% with some algorithms predicting more events ( | PennCNV had the greatest sensitivity (49%). Little agreement between studies and within studies. | [ |
| cnvHap, CNVPartition, PennCNV and QuantiSNP | Agilent, Illumina | Compared samples either with previously characterized (by aCGH) CNVs or HapMap samples from Kidd | cnvHap had very good sensitivity (68%) for larger CNVs (>10kb) in Kidd | cnvHap has increased sensitivity compared with other CNV algorithms. | [ |
| PennCNV, Aroma.Affymetrix, APT and CRLMM | Affymetrix | Compared concordance between calling algorithms. | Greater concordance in deletions (51.5%) than duplications (47.9%). The probable false positive rates for CRLMM and PennCNV were 26% and 24%. | PennCNV appeared to detect all the CNVs that CRLMM predicted, and more. | [ |
| CNVPartition, PennCNV and QuantiSNP | Illumina | Agreement between algorithms | Agreement varied from 59%−62% for deletions, to 43%−57% for duplications. | Use of multiple algorithms increased the positive predictive value, as did the number of probes and the minimum size (kb). | [ |
| CNVPartition, PennCNV and QuantiSNP | Illumina | MLPA validation, measures were taken to reduce false positive calls. | All algorithms show better specificity than sensitivity. QuantiSNP was the most sensitive, predicting 28% of CNVs. PennCNV was better at discriminating copy number state. | Applying methods to reduce false positives results in low sensitivity. | [ |
| ADM-2, Birdsuite, CNVfinder, CNVPartition, dCHIP, GTC, iPattern, Nexus, Partek, PennCNV, QuantiSNP | CGH arrays and SNP arrays (Affymetrix and Illumina) | Experiments were repeated in triplicate and CNV calls were compared. CNV calls were also compared to 5 references (‘gold standards’). | Algorithm replication has <70% reproducibility. Overlap of CNV calls between any two algorithms is typically low (25%–50%) within a platform. Overlap with DGV was high, whereas overlap with references [ | Newer high resolution arrays outperform older arrays in both CNV calling and reproducibility. Algorithms developed for specific array platforms outperformed adapted and independent algorithms. | [ |
| Birdsuite, Partek Genomics Suite, HelixTree and PennCNV | Affymetrix | Comparison with HapMap CNV in two studies [ | Overlap ranged between 42% and 70% when including 20 probes for Kidd | Birdsuite outperformed the other 3 algorithms over multiple permutation. | [ |
| | | qPCR validation of rare CNVs (a single CNV event in >1000 bipolar samples) | For each algorithm, 10 or 11 CNVs were tested. Partek and Birdsuite both validated all (5/5) deletion events tested. | Birdsuite and Partek had high positive predictive values, particularly with deletions. HelixTree performed poorly. | |
| CNVPartition, PennCNV and QuantiSNP | Illumina | Comparison to a previous CGH study [ | 50 CNVs were called by all 3 algorithms. QuantiSNP had the highest overlap with CNVs predicted from CGH arrays (25%). Validation rates were greater than 80% for the 3 loci. | CNVPartition predicted the fewest CNVs, suggesting a high false negative rate. | [ |
| GenoCN, PennCNV and QuantiSNP | Illumina | Comparison of HapMap sample to Conrad | All algorithms show much better specificity than sensitivity. PennCNV had the worst sensitivity, predicting <15% of Conrad | The three HMM algorithms all performed with varied results. They were all highly specific (>98%), but sensitivity remains an issue for all three algorithms. | [ |
| cnvHap, COKGEN, GenoCNV, HaplotypeCN, PennCNV and QuantiSNP | Affymetrix | Compared 270 HapMap samples which have been previously described. Compared simulated data to test haplotype phasing between cnvHap and HaplotypeCNV. | GenoCNV has the most sensitivity (28%) when using Kidd | Algorithm performance varied with reference study. GenoCNV was the most sensitive but had the lowest concordance rate. HaplotypeCNV, cnvHap and PennCNV (under a specific permutation) were compared separately, with HaplotypeCN outperforming the other two. | [ |
| Birdsuite, dCHIP, GTC and PennCNV | Affymetrix | Comparison to a previous CGH study [ | GTC had the highest proportion of CNVs matching CGH calls at 50% overlap (66%). Larger CNVs were called with greater accuracy. | Birdsuite called the most CNVs; however, PennCNV outperformed all algorithms with greater specificity and sensitivity. | [ |
Abbreviations: aCGH, array comparative genomic hybridisation; APT, Affymetrix Power Tools; CNV, copy number variant; CRLMM, corrected robust linear mixture model; DGV, Database of Genomic Variants (http://dgv.tcag.ca/dgv/app/home); HMM, hidden Markov model; GTC, Genotyping Console; kb, kilobases; MLPA, Multiplex ligation-dependent probe amplification; qPCR, quantitative polymerase chain reaction.
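The studies summarised above report sensitivity, positive predictive value, and percentage overlap between call sets. A minimal sketch of how such figures could be computed is given below. It assumes that two CNVs "match" when they share chromosome and type and have at least 50% reciprocal overlap; that matching rule, the `CNV` record, and all function names are illustrative assumptions, and the individual studies may have used different criteria.

```python
from typing import List, NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # inclusive start coordinate (bp)
    end: int    # exclusive end coordinate (bp)
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two fractions of each CNV that is covered by the other."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def sensitivity(calls: List[CNV], reference: List[CNV],
                threshold: float = 0.5) -> float:
    """Fraction of reference CNVs matched by at least one call."""
    if not reference:
        return 0.0
    found = [r for r in reference
             if any(reciprocal_overlap(c, r) >= threshold for c in calls)]
    return len(found) / len(reference)


def positive_predictive_value(calls: List[CNV], reference: List[CNV],
                              threshold: float = 0.5) -> float:
    """Fraction of calls matched by at least one reference CNV."""
    if not calls:
        return 0.0
    confirmed = [c for c in calls
                 if any(reciprocal_overlap(c, r) >= threshold for r in reference)]
    return len(confirmed) / len(calls)
```

Swapping the roles of the two arguments turns the same functions into a pairwise concordance measure between two algorithms, which is one way the between-algorithm agreement figures in the table could be approximated.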