Zhongyang Zhang, Kenneth Lange, Chiara Sabatti.
Abstract
BACKGROUND: Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high-throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual.
Year: 2012 PMID: 22897923 PMCID: PMC3534631 DOI: 10.1186/1471-2105-13-205
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Detection accuracy and computation time of four methods on simulated normal samples
| | CNV type | Method 1 TPR (%) | Method 1 FDR (%) | Method 2 TPR (%) | Method 2 FDR (%) | Method 3 TPR (%) | Method 3 FDR (%) | Method 4 TPR (%) | Method 4 FDR (%) |
|---|---|---|---|---|---|---|---|---|---|
| 5 | Deletion | 83.80 | 4.92 | 78.20 | 0.68 | 63.93 | 1.74 | 64.27 | 1.83 |
| | Duplication | 58.53 | 4.67 | 11.67 | 10.26 | 20.00 | 37.76 | 39.87 | 14.33 |
| 10 | Deletion | 95.03 | 1.45 | 88.37 | 0.56 | 88.50 | 0.60 | 88.87 | 0.56 |
| | Duplication | 93.43 | 0.78 | 56.50 | 4.40 | 83.90 | 12.60 | 91.60 | 3.85 |
| 20 | Deletion | 94.63 | 0.58 | 90.50 | 0.39 | 90.80 | 0.47 | 90.83 | 0.47 |
| | Duplication | 96.13 | 0.92 | 86.22 | 3.58 | 92.77 | 4.95 | 94.98 | 2.13 |
| 30 | Deletion | 94.57 | 0.28 | 93.30 | 0.29 | 89.38 | 0.52 | 89.77 | 0.53 |
| | Duplication | 96.09 | 0.05 | 90.77 | 1.61 | 94.32 | 1.78 | 94.98 | 1.29 |
| 40 | Deletion | 97.83 | 0.59 | 97.58 | 0.09 | 97.28 | 0.19 | 97.28 | 0.19 |
| | Duplication | 94.61 | 0.46 | 92.77 | 0.98 | 93.94 | 1.15 | 94.63 | 0.75 |
| 50 | Deletion | 94.33 | 0.07 | 92.76 | 0.04 | 90.47 | 0.11 | 90.48 | 0.11 |
| | Duplication | 94.50 | 0.09 | 93.81 | 0.74 | 93.11 | 0.79 | 93.64 | 0.49 |
| Overall | Deletion | 95.02 | 0.55 | 93.06 | 0.19 | 91.08 | 0.33 | 91.19 | 0.34 |
| Overall | Duplication | 93.82 | 0.44 | 86.92 | 1.55 | 90.56 | 2.85 | 92.46 | 1.38 |
| Overall | | 94.42 | 0.49 | 89.99 | 0.85 | 90.82 | 1.60 | 91.83 | 0.87 |
| Time (sec.) | | 0.48 (0.01) | | 0.78 (0.69) | | 0.22 (0.13) | | 0.28 (0.05) | |
TPR and FDR are measured as the percentage of related SNPs. Overall accuracy is calculated by pooling all sequences with a given type of CNV. Also reported are the average and standard deviation of the number of seconds required for the analysis of one sequence.
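The caption does not give formulas for these SNP-level rates. A minimal sketch, assuming the standard definitions TPR = TP/(TP + FN) and FDR = FP/(TP + FP) with SNPs as the counting unit, and assuming that "pooling" means summing the SNP counts over all sequences before taking the ratios. The function names `snp_counts` and `pooled_tpr_fdr` are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def snp_counts(truth: Sequence[bool], called: Sequence[bool]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) over the SNPs of one sequence.

    Assumption: truth[i] is True if SNP i lies in a simulated CNV of the type
    being scored; called[i] is True if the method places SNP i in such a CNV.
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must cover the same SNPs")
    tp = sum(t and c for t, c in zip(truth, called))
    fp = sum(c and not t for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    return tp, fp, fn


def pooled_tpr_fdr(
    sequences: Iterable[Tuple[Sequence[bool], Sequence[bool]]]
) -> Tuple[float, float]:
    """Sum SNP counts over all sequences, then return (TPR %, FDR %)."""
    tp = fp = fn = 0
    for truth, called in sequences:
        a, b, c = snp_counts(truth, called)
        tp, fp, fn = tp + a, fp + b, fn + c
    tpr = 100.0 * tp / (tp + fn) if tp + fn else float("nan")
    # With no calls at all there are no false discoveries; report FDR = 0.
    fdr = 100.0 * fp / (tp + fp) if tp + fp else 0.0
    return tpr, fdr


if __name__ == "__main__":
    toy = [
        ([False, True, True, True, False], [False, False, True, True, True]),
        ([True, True, False, False], [True, True, False, False]),
    ]
    print(pooled_tpr_fdr(toy))  # (80.0, 20.0)
```

In the toy input the first sequence contributes TP = 2, FP = 1, FN = 1 and the second TP = 2, FP = 0, FN = 0, so the pooled rates are 4/5 = 80% and 1/5 = 20%. Pooling weights each sequence by its number of SNPs, which is not the same as averaging per-sequence rates.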
Figure 1. Sensitivity as a function of the percentage of contamination by normal cells in the 10 different simulated CNV regions. Sensitivity is not defined at 100% contamination.
Figure 2. Specificity as a function of the percentage of contamination by normal cells. Note that [53] reports better performance of PSCN at contamination levels of 85%, 95% and 100%.
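One way to see why sensitivity drops as contamination rises: under an idealized mixture model (an assumption here, not something the figures state), the measured total copy number is the cell-fraction-weighted average of the normal copy number 2 and the tumor copy number, so the log R ratio shrinks toward 0 as the normal fraction grows. The sketch below ignores probe noise and array signal compression; `expected_lrr` is a hypothetical helper.

```python
import math


def expected_lrr(tumor_cn: int, normal_fraction: float, normal_cn: int = 2) -> float:
    """Idealized log2 ratio for a region with copy number tumor_cn in the cancer
    cells, when a fraction normal_fraction of the cells are normal.

    Assumption: signal is linear in copy number; no noise, no compression.
    """
    if not 0.0 <= normal_fraction <= 1.0:
        raise ValueError("normal_fraction must be in [0, 1]")
    mixed_cn = normal_fraction * normal_cn + (1.0 - normal_fraction) * tumor_cn
    if mixed_cn == 0:
        return float("-inf")  # homozygous deletion in a pure tumor sample
    return math.log2(mixed_cn / normal_cn)


if __name__ == "__main__":
    # Single-copy deletion at 0%, 50% and 90% normal cells: -1.0, -0.415, -0.074
    for p in (0.0, 0.5, 0.9):
        print(p, round(expected_lrr(1, p), 3))
```

At 100% contamination the expected shift is exactly 0 for every tumor copy number, which is consistent with sensitivity being undefined there.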
Comparison of four CNV analyses on four real normal samples
| Analysis | Sample 1 Det. | Sample 1 Ovlp. | Sample 2 Det. | Sample 2 Ovlp. | Sample 3 Det. | Sample 3 Ovlp. | Sample 4 Det. | Sample 4 Ovlp. | Time (min) |
|---|---|---|---|---|---|---|---|---|---|
| Analysis 1 | 170 | 38 | 144 | 34 | 160 | 25 | 145 | 22 | 1.2 |
| Analysis 2 | 102 | 36 | 109 | 33 | 93 | 25 | 91 | 20 | 3.7 |
| Analysis 3 | 80 | 38 | 82 | 32 | 69 | 25 | 56 | 15 | 8.5 |
| MPCBS | 98 | 34 | 88 | 28 | 59 | 18 | 68 | 21 | 313.9 |
The number of CNVs detected (Det.) and overlapping (Ovlp.) in each sample, and the average computation time per sample (in minutes), under each analysis.
Comparison of three CNV analyses in the bipolar disorder study
| Method | Detected | Overlap | Overlap (%) | Time (min/sample) |
|---|---|---|---|---|
| PennCNV | 189 | 63 | 33.33% | 3.44 |
| GFL-Individual (LRR+BAF) | 95 | 50 | 52.63% | 3.90 |
| GFL-Pedigree (LRR) | 106 | 62 | 58.49% | 1.57 |
The number and overlap of CNP regions with frequency ≥0.1 detected in our sample by different methods. These CNP regions were compiled from HapMap. Computation time is given in minutes per sample.
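The caption does not say how overlap is scored. A minimal sketch, assuming a detected region counts as overlapping if it shares at least one base with any reference region, and that the percentage is overlapping regions divided by detected regions; this ratio reproduces the reported figures (63/189 = 33.33%, 50/95 = 52.63%, 62/106 = 58.49%). The `Region` tuple layout and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical region encoding: (chromosome, start, end), inclusive coordinates.
Region = Tuple[str, int, int]


def overlaps(a: Region, b: Region) -> bool:
    """True if the two regions share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def overlap_summary(detected: List[Region], reference: List[Region]) -> Tuple[int, int, float]:
    """Return (number detected, number overlapping a reference region, percentage).

    Quadratic scan for clarity; an interval tree or sorted sweep scales better.
    """
    n_overlap = sum(any(overlaps(d, r) for r in reference) for d in detected)
    pct = 100.0 * n_overlap / len(detected) if detected else 0.0
    return len(detected), n_overlap, pct


if __name__ == "__main__":
    detected = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 60)]
    reference = [("chr1", 150, 250), ("chr2", 70, 80)]
    print(overlap_summary(detected, reference))
```

Counting at the level of detected regions means one long call spanning several reference CNPs still counts once; other conventions (for example, reciprocal overlap thresholds) would change the numbers.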
Detected CNVs in a common deletion on Chromosome 8
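| Method | CN = 0 | CN = 1 | CN = 3 | Families with Mendelian errors | Time (min/sample) |
|---|---|---|---|---|---|
| PennCNV | 125 | 39 | 102 | 35 | 0.19 |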
| GFL-Individual | 123 | 97 | 0 | 20 | 0.21 |
| GFL-Pedigree | 123 | 137 | 0 | 15 | 0.09 |
| MSSCAN-Pedigree | 123 | 154 | 0 | 15 | 0.11 |
Each algorithm assigns every subject one of four copy numbers. For each algorithm, we report the numbers of subjects called with CN≠2, the total number of nuclear families with Mendelian errors, and the average computation time per sample (in minutes) for the analysis of Chromosome 8.
Figure 3. CNV detection and Mendelian errors for a Central American pedigree. Displayed are four families derived from an extended pedigree. Circles and squares correspond to females and males, respectively. A dashed line indicates identical individuals. Beneath each individual are the CNV genotypes inferred by PennCNV (top) and by GFL (bottom). Subjects for whom PennCNV and GFL infer different CNV genotypes are highlighted in red or blue: red when the PennCNV genotypes result in a Mendelian error while the GFL genotypes do not, and blue when both genotypes are compatible with Mendelian transmission. Orange singles out a member for whom both the PennCNV and GFL genotypes result in a Mendelian error.
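Counting nuclear families with Mendelian errors requires a rule for when a family's copy-number genotypes could arise from Mendelian transmission. A minimal sketch of one such rule, not necessarily the one used in this study: each parent's copy number is split into two haplotypes carrying 0 to `max_hap` copies (an assumption; `max_hap=1` is a deletion-only model), each child inherits one haplotype from each parent, and the family is consistent if a single split per parent explains every child. The function names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Tuple


def haplotype_splits(cn: int, max_hap: int = 2) -> List[Tuple[int, int]]:
    """Unordered ways to split a diploid copy number into two haplotype
    copy counts, each between 0 and max_hap (assumed model)."""
    return [(h, cn - h) for h in range(cn + 1)
            if h <= max_hap and cn - h <= max_hap and h <= cn - h]


def family_is_consistent(mother: int, father: int, children: Iterable[int],
                         max_hap: int = 2) -> bool:
    """True if some haplotype split of each parent explains every child's
    copy number as (one maternal haplotype) + (one paternal haplotype).
    Assumes no recombination or de novo events inside the region."""
    kids = list(children)
    for m, f in product(haplotype_splits(mother, max_hap),
                        haplotype_splits(father, max_hap)):
        possible = {hm + hf for hm in m for hf in f}
        if all(k in possible for k in kids):
            return True
    return False  # no split works: the family shows a Mendelian error


if __name__ == "__main__":
    print(family_is_consistent(1, 1, [0, 1, 2]))         # True
    print(family_is_consistent(0, 0, [1]))               # False
    print(family_is_consistent(0, 2, [0], max_hap=1))    # False
```

The choice of `max_hap` matters: with `max_hap=2`, a CN = 2 parent may carry haplotypes with 0 and 2 copies and so can transmit a deleted haplotype, which makes `family_is_consistent(0, 2, [0])` return True.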