Eric L Seiser, Federico Innocenti.
Abstract
Somatic alterations in DNA copy number have been well studied in numerous malignancies, yet the role of germline DNA copy number variation in cancer is still emerging. Genotyping microarrays generate allele-specific signal intensities to determine genotype, but these intensities may also be used to infer DNA copy number with additional computational approaches. Numerous tools have been developed to analyze Illumina genotype microarray data for copy number variant (CNV) discovery, and the commonly used, freely available algorithms are based on hidden Markov models (HMMs). QuantiSNP, PennCNV, and GenoCN utilize HMMs with six copy number states but vary in how transition and emission probabilities are calculated. Performance of these CNV detection algorithms has been shown to vary across both genotyping platforms and data sets, although HMM approaches generally outperform other current methods. Low sensitivity is prevalent with HMM-based algorithms, suggesting the need for continued improvement in CNV detection methodologies.
Keywords: copy number variation; genotyping microarray; hidden Markov model
Year: 2015 PMID: 25657572 PMCID: PMC4310714 DOI: 10.4137/CIN.S16345
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
CNVs detected in three HapMap samples using QuantiSNP, PennCNV, and GenoCN. The HMM-based CNV detection tools were applied to Illumina Human610-Quad BeadChip v1.0 data from three HapMap samples of European ancestry (NA06985, NA06991, and NA06993). For each sample, the total number of CNVs detected using each algorithm is listed along with the percentage of regions unique to each algorithm and the percentage of regions that overlap results from the other CNV detection algorithms.
| Algorithm | Metric | NA06985 | NA06991 | NA06993 |
|---|---|---|---|---|
| QuantiSNP | total regions | 103 | 120 | 113 |
| QuantiSNP | % unique regions | 33.0% | 41.7% | 38.1% |
| QuantiSNP | % PennCNV overlap | 10.7% | 14.2% | 11.5% |
| QuantiSNP | % GenoCN overlap | 19.4% | 11.7% | 15.9% |
| QuantiSNP | % PennCNV and GenoCN overlap | 36.9% | 32.5% | 34.5% |
| PennCNV | total regions | 63 | 61 | 60 |
| PennCNV | % unique regions | 11.1% | 8.2% | 5.0% |
| PennCNV | % QuantiSNP overlap | 14.3% | 21.3% | 21.7% |
| PennCNV | % GenoCN overlap | 15.9% | 6.6% | 10.0% |
| PennCNV | % QuantiSNP and GenoCN overlap | 58.7% | 63.9% | 63.3% |
| GenoCN | total regions | 169 | 108 | 132 |
| GenoCN | % unique regions | 48.5% | 42.6% | 50.8% |
| GenoCN | % QuantiSNP overlap | 11.2% | 13.0% | 13.6% |
| GenoCN | % PennCNV overlap | 6.5% | 4.6% | 4.5% |
| GenoCN | % QuantiSNP and PennCNV overlap | 33.7% | 39.8% | 31.1% |
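The unique and overlap percentages in the table above can be illustrated with a minimal Python sketch. It assumes that two regions "overlap" when they share at least one base on the same chromosome; the overlap criterion actually used for the table is not stated here, and the function names and coordinates below are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), end inclusive


def overlaps(a: Region, b: Region) -> bool:
    """True when two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def overlap_summary(calls: Dict[str, List[Region]], tool: str) -> Dict[str, float]:
    """Percentage of `tool`'s regions that are unique or shared with other tools."""
    total = len(calls[tool])
    if total == 0:
        return {}
    others = [name for name in calls if name != tool]
    counts: Dict[str, int] = {}
    for region in calls[tool]:
        # Names of the other tools with at least one region overlapping this one.
        hit = [o for o in others if any(overlaps(region, r) for r in calls[o])]
        label = "unique" if not hit else " and ".join(hit) + " overlap"
        counts[label] = counts.get(label, 0) + 1
    return {label: round(100.0 * n / total, 1) for label, n in counts.items()}


# Made-up coordinates for illustration only; not data from the study.
example_calls = {
    "QuantiSNP": [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 50), ("chr3", 1, 9)],
    "PennCNV": [("chr1", 150, 250), ("chr2", 40, 60)],
    "GenoCN": [("chr1", 180, 220), ("chr1", 550, 700)],
}
print(overlap_summary(example_calls, "QuantiSNP"))
```

Each region is assigned to exactly one category, so the percentages for a tool sum to 100% (up to rounding), as they do for each sample in the table.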
Figure 1. Size of CNVs detected in the HapMap samples using QuantiSNP, PennCNV, and GenoCN. The HMM-based CNV detection tools were applied to Illumina Human610-Quad BeadChip v1.0 data from three HapMap samples of European ancestry (NA06985, NA06991, and NA06993). For each sample, boxplots were generated for CNV sizes from each HMM-based detection method. Boxplots on the left are CNV sizes measured in genomic length (base pairs), and boxplots on the right are CNV sizes measured by the number of genotype microarray probes in the detected region.
Performance of the HMM-based CNV detection tools in three HapMap samples. The copy number status of 2419 CNV regions, each containing at least one probe on the Illumina Human610-Quad BeadChip v1.0 genotype array, was determined for three HapMap samples (NA06985, NA06991, and NA06993) using results from QuantiSNP, PennCNV, and GenoCN. Comparing the results of the HMM-based approaches with gold standard copy number data from the Conrad et al. study allowed sensitivity, specificity, and the false discovery rate to be calculated for each HMM-based method. These metrics were recalculated as the list of gold standard CNVs was filtered to remove CNV regions that encompassed fewer than a minimum number of probes on the genotype microarray (stepwise from a minimum of two probes to five probes).
| Minimum probes per region | Metric | NA06985 QuantiSNP | NA06985 PennCNV | NA06985 GenoCN | NA06991 QuantiSNP | NA06991 PennCNV | NA06991 GenoCN | NA06993 QuantiSNP | NA06993 PennCNV | NA06993 GenoCN |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | sensitivity | 7.85% | 5.39% | 7.50% | 12.15% | 8.68% | 14.24% | 10.70% | 8.72% | 11.74% |
| 1 | specificity | 98.99% | 99.04% | 98.81% | 99.29% | 99.62% | 99.29% | 99.40% | 99.72% | 99.45% |
| 1 | false discovery rate | 53.66% | 61.76% | 59.09% | 30.00% | 24.24% | 26.79% | 28.89% | 18.75% | 25.53% |
| 2 | sensitivity | 11.95% | 8.23% | 11.46% | 17.71% | 13.02% | 20.31% | 14.29% | 12.38% | 17.33% |
| 2 | specificity | 98.57% | 98.50% | 98.05% | 98.98% | 99.38% | 98.98% | 99.24% | 99.54% | 99.08% |
| 2 | false discovery rate | 50.00% | 60.61% | 59.09% | 27.66% | 24.24% | 25.00% | 25.64% | 19.35% | 25.53% |
| 3 | sensitivity | 11.38% | 10.66% | 14.05% | 21.38% | 17.24% | 24.83% | 17.95% | 16.13% | 20.65% |
| 3 | specificity | 98.03% | 97.92% | 97.49% | 98.53% | 99.10% | 98.53% | 98.89% | 99.33% | 98.78% |
| 3 | false discovery rate | 56.25% | 59.38% | 57.50% | 29.55% | 24.24% | 26.53% | 26.32% | 19.35% | 25.58% |
| 4 | sensitivity | 12.50% | 12.63% | 15.96% | 24.55% | 20.91% | 26.36% | 20.66% | 19.17% | 22.50% |
| 4 | specificity | 97.38% | 97.39% | 96.63% | 97.95% | 98.74% | 98.10% | 98.74% | 99.22% | 98.59% |
| 4 | false discovery rate | 58.62% | 58.62% | 59.46% | 32.50% | 25.81% | 29.27% | 24.24% | 17.86% | 25.00% |
| 5 | sensitivity | 14.29% | 14.46% | 17.07% | 23.53% | 18.82% | 25.88% | 20.88% | 18.89% | 21.11% |
| 5 | specificity | 96.76% | 96.77% | 96.02% | 97.82% | 98.61% | 98.02% | 98.63% | 99.22% | 98.83% |
| 5 | false discovery rate | 58.62% | 58.62% | 60.00% | 35.48% | 30.43% | 31.25% | 26.92% | 19.05% | 24.00% |
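The three metrics in the table above follow the standard definitions: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and false discovery rate = FP / (TP + FP). The Python sketch below shows how they could be recomputed as gold standard regions are filtered by a minimum probe count. How calls were matched to gold standard regions in the study is not described here, so the `RegionResult` record, the `evaluate` function, and the example values are all hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class RegionResult(NamedTuple):
    n_probes: int     # array probes falling inside the gold standard region
    truth_cnv: bool   # gold standard reports a CNV in this sample
    called_cnv: bool  # the detection tool called a CNV in this region


def evaluate(regions: Iterable[RegionResult], min_probes: int = 1) -> Dict[str, float]:
    """Sensitivity, specificity, and FDR over regions with at least `min_probes` probes."""
    kept = [r for r in regions if r.n_probes >= min_probes]
    tp = sum(r.truth_cnv and r.called_cnv for r in kept)
    fn = sum(r.truth_cnv and not r.called_cnv for r in kept)
    fp = sum(not r.truth_cnv and r.called_cnv for r in kept)
    tn = sum(not r.truth_cnv and not r.called_cnv for r in kept)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "false discovery rate": ratio(fp, tp + fp),
    }


# Made-up records for illustration only; not data from the study.
example_regions = [
    RegionResult(1, True, False),
    RegionResult(1, False, False),
    RegionResult(2, True, True),
    RegionResult(3, True, False),
    RegionResult(4, False, True),
    RegionResult(5, False, False),
]
for k in range(1, 6):
    print(k, evaluate(example_regions, min_probes=k))
```

Raising `min_probes` removes small regions from the numerators and denominators alike, so the metrics can move in either direction; in the table, sensitivity generally rises with the threshold while specificity falls slightly.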