Agnes Baross, Allen D Delaney, H Irene Li, Tarun Nayar, Stephane Flibotte, Hong Qian, Susanna Y Chan, Jennifer Asano, Adrian Ally, Manqiu Cao, Patricia Birch, Mabel Brown-John, Nicole Fernandes, Anne Go, Giulia Kennedy, Sylvie Langlois, Patrice Eydoux, J M Friedman, Marco A Marra.
Abstract
BACKGROUND: Genomic deletions and duplications are important in the pathogenesis of diseases such as cancer and mental retardation, and have recently been shown to occur frequently as polymorphisms in unaffected individuals. Affymetrix GeneChip whole genome sampling analysis (WGSA) combined with 100 K single nucleotide polymorphism (SNP) genotyping arrays is one of several microarray-based approaches now being used to detect such structural genomic changes. The popularity of this technology and its associated open source data format have resulted in an increasing number of software packages for the analysis of copy number changes using these SNP arrays.
Year: 2007 PMID: 17910767 PMCID: PMC2148068 DOI: 10.1186/1471-2105-8-368
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
List of copy number analysis software packages evaluated
| CNAG 1.1 | Copy Number Analyser for GeneChip | yes | yes | yes | [22] |
| dChip (Nov 17, 2005) | DNA-Chip Analyzer | yes | yes | yes | [23] |
| CNAT 3.0 | Chromosome Copy Number Analysis Tool | yes | yes | no | [14] |
| GLAD (R) | Gain and Loss Analysis of DNA | no | yes | yes | [26] |
(a) Capability to perform normalization, scaling and feature extraction on Affymetrix GeneChip® Mapping 100 K array data.
Figure 1 Overview of the data analysis process. A) Methods appear in blue and data in yellow. B) The reference sets used for each analysis method are as follows. Ref2: within each MR (mental retardation) trio (child, mother and father), three comparisons were done: child with the father as reference, child with the mother as reference, and father with the mother as reference. Ref50: each sample was compared with a reference set of 50 unaffected mothers of children with MR; these were the 50 mothers with the fewest CNVs detected by dChip. Ref214: each sample was compared with a reference set of all 214 unaffected parents (107 mothers and 107 fathers) of the children with MR. Ref106: a default reference set of 106 individuals provided by Affymetrix for copy number analysis with CNAT [18].
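The reference designs described in Figure 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the software used in the study: the helper names, sample IDs and the `cnv_counts` input are hypothetical, and the per-SNP estimate (2 × sample signal / mean reference signal) is a common simplification of reference-based copy number analysis rather than the exact CNAG, dChip or CNAT algorithm.

```python
from statistics import mean


def trio_comparisons(child, mother, father):
    """Return the three (test, reference) pairs of the Ref2 trio design."""
    return [(child, father), (child, mother), (father, mother)]


def lowest_cnv_reference(cnv_counts, k=50):
    """Pick the k samples with the fewest detected CNVs (as for the Ref50 mothers).

    cnv_counts maps a (hypothetical) sample ID to its CNV count; ties are
    broken by sample ID so the selection is deterministic.
    """
    ranked = sorted(cnv_counts, key=lambda s: (cnv_counts[s], s))
    return ranked[:k]


def copy_number(sample_signal, reference_signals):
    """Simplified diploid-scaled copy number at one SNP (assumed model)."""
    return 2.0 * sample_signal / mean(reference_signals)


if __name__ == "__main__":
    print(trio_comparisons("child", "mother", "father"))
    counts = {"m1": 12, "m2": 3, "m3": 7, "m4": 3}
    print(lowest_cnv_reference(counts, k=2))  # ['m2', 'm4']
    print(copy_number(500.0, [1000.0, 1000.0]))  # 1.0, i.e. one copy lost
```

Choosing the least variable samples for a reference set, as done for Ref50, is meant to keep true CNVs in the reference from masking or mimicking changes in the test sample.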
Candidate copy number variants from synthetic data
| CNAG-GLAD | 334 | 20 | 314 | 331 | 58 | 3 |
| dChip | 381 | 166 | 215 | 213 | 32 | 168 |
| dChip-GLAD | 70 | 0 | 70 | 70 | 31 | 0 |
| CNAT-GLAD | 111 | 10 | 101 | 101 | 36 | 10 |
| CNAG-GLAD | 70 | 11 | 59 | 70 | 42 | 0 |
| dChip | 269 | 91 | 178 | 155 | 26 | 114 |
| dChip-GLAD | 101 | 5 | 96 | 94 | 33 | 7 |
| CNAT-GLAD | 49 | 0 | 49 | 48 | 23 | 1 |
(a) Where two software packages are listed, the first one was used for normalization and the second for CNV detection.
(b) CNVs with different chromosomal locations and breakpoints.
(c) The number of true (synthetic) CNVs per array is 100.
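Scoring candidate calls against a synthetic truth set of 100 CNVs per array can be sketched as follows. The overlap rule (any shared base on the same chromosome counts as a hit), the inclusive `(chrom, start, end)` tuple layout and the function names are assumptions for illustration; the study's Methods may define a match differently. Unique calls are counted by distinct location and breakpoints, following footnote b.

```python
def overlaps(a, b):
    """True if two inclusive (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def score_calls(calls, truth):
    """Summarize candidate CNV calls against known synthetic CNVs (assumed rule)."""
    unique_calls = set(calls)  # distinct chromosomal location and breakpoints
    matched_calls = {c for c in unique_calls if any(overlaps(c, t) for t in truth)}
    detected_truth = [t for t in truth if any(overlaps(t, c) for c in unique_calls)]
    return {
        "unique_calls": len(unique_calls),
        "calls_overlapping_truth": len(matched_calls),
        "false_positive_calls": len(unique_calls) - len(matched_calls),
        "true_cnvs_detected": len(detected_truth),
        "sensitivity": len(detected_truth) / len(truth),
    }


if __name__ == "__main__":
    truth = [("1", 100, 200), ("2", 500, 900)]
    # The duplicated call collapses to one unique CNV.
    calls = [("1", 150, 250), ("1", 150, 250), ("3", 10, 20)]
    print(score_calls(calls, truth))
```

Separating "calls overlapping truth" from "true CNVs detected" matters because one true CNV can be fragmented into several calls, which inflates the first count but not the second.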
Candidate copy number variants from empirical data
| Method | Reference set size | Candidate CNVs | Duplications | Deletions | False-positive deletions (b) |
|---|---|---|---|---|---|
| 1-CNAG | 2 | 3,210 | 1,755 | 1,455 | 970 |
| 2-CNAG | 50 | 924 | 820 | 104 | 35 |
| 3-CNAG-GLAD | 2 | 1,850 | 996 | 854 | 343 |
| 4-CNAG-GLAD | 50 | 340 | 69 | 271 | 62 |
| 5-dChip | 50 | 31,354 | 19,093 | 12,261 | 3,830 |
| 6-dChip | 214 | 5,443 | 4,076 | 1,367 | 452 |
| 7-dChip-GLAD | 50 | 1,292 | 66 | 1,226 | 456 |
| 8-dChip-GLAD | 214 | 1,207 | 30 | 1,177 | 402 |
| 9-CNAT-GLAD | 50 | 701 | 253 | 448 | 214 |
| 10-CNAT-GLAD | 106 | 454 | 232 | 222 | 98 |
| 11-CNAT-GLAD | 214 | 866 | 240 | 626 | 363 |
| 1-CNAG | 2 | 444 | 361 | 83 | 21 |
| 2-CNAG | 50 | 235 | 211 | 24 | 3 |
| 3-CNAG-GLAD | 2 | 416 | 332 | 84 | 27 |
| 4-CNAG-GLAD | 50 | 133 | 48 | 85 | 17 |
| 5-dChip | 50 | 17,034 | 4,846 | 12,188 | 3,804 |
| 6-dChip | 214 | 2,273 | 907 | 1,366 | 452 |
| 7-dChip-GLAD | 50 | 1,042 | 27 | 1,015 | 313 |
| 8-dChip-GLAD | 214 | 1,027 | 22 | 1,005 | 283 |
| 9-CNAT-GLAD | 50 | 426 | 87 | 339 | 115 |
| 10-CNAT-GLAD | 106 | 272 | 88 | 184 | 61 |
| 11-CNAT-GLAD | 214 | 540 | 117 | 423 | 172 |
| 1-CNAG | 2 | 2,127 | 1,161 | 966 | 638 |
| 2-CNAG | 50 | 324 | 202 | 122 | 41 |
| 3-CNAG-GLAD | 2 | 1,299 | 697 | 602 | 206 |
| 4-CNAG-GLAD | 50 | 366 | 20 | 346 | 87 |
| 5-dChip | 50 | 21,124 | 17,843 | 3,281 | 1,402 |
| 6-dChip | 214 | 5,792 | 4,603 | 1,189 | 469 |
| 7-dChip-GLAD | 50 | 790 | 42 | 748 | 253 |
| 8-dChip-GLAD | 214 | 806 | 41 | 765 | 274 |
| 9-CNAT-GLAD | 50 | 650 | 108 | 542 | 210 |
| 10-CNAT-GLAD | 106 | 360 | 90 | 270 | 108 |
| 11-CNAT-GLAD | 214 | 462 | 56 | 406 | 161 |
| 1-CNAG | 2 | 377 | 300 | 77 | 26 |
| 2-CNAG | 50 | 52 | 12 | 40 | 15 |
| 3-CNAG-GLAD | 2 | 383 | 287 | 96 | 29 |
| 4-CNAG-GLAD | 50 | 140 | 9 | 131 | 38 |
| 5-dChip | 50 | 6,488 | 3,230 | 3,258 | 1,392 |
| 6-dChip | 214 | 1,822 | 637 | 1,185 | 468 |
| 7-dChip-GLAD | 50 | 744 | 23 | 721 | 238 |
| 8-dChip-GLAD | 214 | 748 | 18 | 730 | 245 |
| 9-CNAT-GLAD | 50 | 547 | 52 | 495 | 182 |
| 10-CNAT-GLAD | 106 | 311 | 67 | 244 | 89 |
| 11-CNAT-GLAD | 214 | 426 | 48 | 378 | 150 |
(a) Percentage of all candidate CNVs.
(b) Numbers of false positive deletions were estimated using SNP genotype data (see Methods).
(c) Percentage of false positives among the deletions (number of false positive deletions / number of deletions × 100).
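Footnotes b and c can be illustrated with a short sketch. A true one-copy deletion leaves a single allele, so heterozygous genotype calls inside a candidate deletion suggest a false positive. The flagging rule below (at least `min_het` heterozygous "AB" calls, default 1) and the genotype strings are assumptions standing in for the procedure in the Methods. The worked numbers assume, as the row arithmetic suggests (each total equals the sum of the next two columns), that the last two numeric columns are deletions and false-positive deletions; combining the dChip Ref50 rows of the first and third blocks then gives about 34%, matching the Figure 2 legend.

```python
def is_false_positive_deletion(genotypes, min_het=1):
    """Flag a candidate deletion whose SNP calls contain heterozygotes.

    A hemizygous region should not yield 'AB' calls; min_het is a
    hypothetical tolerance for genotyping errors.
    """
    het_calls = sum(1 for g in genotypes if g == "AB")
    return het_calls >= min_het


def false_positive_percentage(false_positive_deletions, deletions):
    """Footnote c: false positive deletions / deletions * 100."""
    if deletions == 0:
        raise ValueError("no deletions called")
    return false_positive_deletions / deletions * 100


if __name__ == "__main__":
    print(is_false_positive_deletion(["AA", "BB", "AA", "NoCall"]))  # False
    print(is_false_positive_deletion(["AA", "AB", "BB"]))  # True
    # dChip Ref50, first and third blocks combined (column reading assumed).
    print(round(false_positive_percentage(3830 + 1402, 12261 + 3281)))  # 34
```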
Detection of validated CNVs
| Sample | Type | Chromosome | Size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | Methods detecting (of 11) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3476c | del (b) | 4 | 10,655 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 1895c | del | 13 | 4,887 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 4818c | del | 12 | 3,204 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 9143c | del | 11 | 3,175 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 8326c | del | 14 | 1,923 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 6235c | del | 10 | 1,737 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 9 |
| 6545c | del | 7 | 785 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 10 |
| 7807c | del | 22 | 731 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 4357c | del | 6 | 595 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 4357m | del | 6 | 595 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 4357c | del | 6 | 353 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 4357m | del | 6 | 353 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 5003c | del | 2 | 294 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 9 |
| 7551c | del | 2 | 220 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 7551m | del | 2 | 220 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 10 |
| 1280c | del | 4 | 192 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 9 |
| 1280m | del | 4 | 192 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 9 |
| 0674c | del | 2 | 147 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 9 |
| 0674f | del | 2 | 147 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 8 |
| 5566c | del | 14 | 130 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 4 |
| 6789c | del | 14 | 68 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 5 |
| 6789m | del | 14 | 68 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4 |
| 3476c | del | 1 | 66 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 8 |
| 3476m | del | 1 | 66 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 6 |
| 6607c | del | 20 | 57 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 6607m | del | 20 | 57 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8785c | del | 18 | 43 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 7 |
| 8785f | del | 18 | 43 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 3 |
| 9299f | del | 9 | 38 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 6 |
| 9299c | del | 9 | 38 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 5 |
| 8379c | dup | 10 | 23,842 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 11 |
| 4794c | dup | 16 | 3,356 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 5 |
| 8379c | dup | 15 | 1,481 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 10 |
| 3595c | dup | 15 | 781 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 7 |
| 3595m | dup | 15 | 781 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
| 3923c | dup | 11 | 494 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 7 |
| 3923m | dup | 11 | 494 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 2 |
| 6168c | dup | 17 | 324 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 4 |
| Total | | | | 29 | 24 | 33 | 28 | 27 | 21 | 26 | 26 | 29 | 17 | 32 | |
(a) Detection of the CNV from at least one array type (Xba, Hind or both): 1, detected; 0, not detected.
(b) The method of validation for each CNV is shown in Additional file 15.
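The row and column totals of the validated-CNV table follow directly from the 0/1 detection calls. In this sketch (function names hypothetical), a method's call is the logical OR of its Xba and Hind results, as footnote a describes; each row total counts the methods that detected a CNV, and each column total counts the validated CNVs found by one method. The example rows are copied from the table (samples 3476c, 6235c and 5566c); the Xba/Hind split passed to `combine_arrays` is invented for illustration because the table reports only the combined calls.

```python
def combine_arrays(xba, hind):
    """Footnote a: detected if either array type (Xba or Hind) detects the CNV."""
    return [int(x or h) for x, h in zip(xba, hind)]


def row_totals(matrix):
    """Number of methods (of 11) that detected each validated CNV."""
    return [sum(row) for row in matrix]


def column_totals(matrix):
    """Number of validated CNVs detected by each method."""
    return [sum(col) for col in zip(*matrix)]


if __name__ == "__main__":
    rows = [
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],  # 3476c, chromosome 4 deletion
        [1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1],  # 6235c
        [0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1],  # 5566c
    ]
    print(row_totals(rows))  # [11, 9, 4]
    print(column_totals(rows))  # [2, 2, 3, 3, 2, 2, 1, 2, 2, 2, 3]
    print(combine_arrays([1, 0, 0], [0, 0, 1]))  # [1, 0, 1]
```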
Figure 2 Size distribution of candidate CNVs detected. The five plots show the numbers of candidate copy number gains and losses identified using Xba and Hind arrays, grouped by the number of SNPs within each aberration: A) all CNVs (≥ 4 SNPs); B) CNVs with ≥ 11 SNPs; C) CNVs with ≥ 21 SNPs; D) CNVs with ≥ 41 SNPs; and E) CNVs with ≥ 101 SNPs. The y-axis value of each horizontal line represents the total number of CNVs detected by a given method: 1 – CNAG Ref2; 2 – CNAG Ref50; 3 – CNAG-GLAD Ref2; 4 – CNAG-GLAD Ref50; 5 – dChip Ref50; 6 – dChip Ref214; 7 – dChip-GLAD Ref50; 8 – dChip-GLAD Ref214; 9 – CNAT-GLAD Ref50; 10 – CNAT-GLAD Ref106; 11 – CNAT-GLAD Ref214 (the reference sets are described in Figure 1 and in the Methods). The left and right sides of each panel show the fractions of deletions and duplications, respectively. The orange bars within the black lines show the fraction of CNVs that passed the following confidence thresholds: p ≤ 0.05 (t-test) and copy number < 1.25 for deletions (left), or p ≤ 0.05 (t-test) and copy number > 2.75 for duplications (right). The red vertical bars on the left side of each panel indicate the fraction of deletion calls estimated to be false positives from SNP heterozygosity. For example, the y-axis value of the top line (5) in plot A gives the total number of candidate CNVs spanning at least 4 consecutive SNPs that dChip Ref50 identified from Xba and Hind data (52,478). Of these putative CNVs, 30% were deletions (left) and 70% were duplications (right). The orange fractions of the line show that 99% of the deletions (left) and 22% of the duplications (right) passed the p ≤ 0.05 and copy number (< 1.25 or > 2.75) thresholds described above, and the red bar (left) shows that 34% of the candidate deletions were considered false positives.
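The confidence filter and the SNP-count panels of Figure 2 can be expressed as a short sketch. The thresholds (p ≤ 0.05 with copy number < 1.25 for deletions or > 2.75 for duplications) and the panel cut-offs (≥ 4, 11, 21, 41 and 101 SNPs) come from the legend; the `Cnv` record, its fields and the function names are hypothetical, and the p-value is taken as an input because the exact t-test set-up is described in the Methods rather than here.

```python
from dataclasses import dataclass

# Minimum SNPs per CNV for panels A to E of Figure 2.
PANEL_MIN_SNPS = {"A": 4, "B": 11, "C": 21, "D": 41, "E": 101}


@dataclass
class Cnv:
    kind: str  # "del" or "dup"
    n_snps: int
    copy_number: float  # mean estimated copy number over the segment
    p_value: float  # t-test p-value supplied by the caller (assumed input)


def passes_confidence(cnv, alpha=0.05):
    """Apply the legend's p-value and copy-number thresholds."""
    if cnv.p_value > alpha:
        return False
    if cnv.kind == "del":
        return cnv.copy_number < 1.25
    if cnv.kind == "dup":
        return cnv.copy_number > 2.75
    raise ValueError(f"unknown CNV type: {cnv.kind}")


def panel_summary(cnvs, min_snps):
    """Totals and confident calls per CNV type for one size panel."""
    kept = [c for c in cnvs if c.n_snps >= min_snps]
    summary = {}
    for kind in ("del", "dup"):
        of_kind = [c for c in kept if c.kind == kind]
        summary[kind] = {
            "total": len(of_kind),
            "confident": sum(passes_confidence(c) for c in of_kind),
        }
    return summary


if __name__ == "__main__":
    example = [
        Cnv("del", 5, 0.9, 0.01),
        Cnv("del", 15, 1.4, 0.01),  # copy number too high to be confident
        Cnv("dup", 25, 3.1, 0.2),  # p-value too high to be confident
        Cnv("dup", 120, 3.0, 0.001),
    ]
    print(panel_summary(example, PANEL_MIN_SNPS["A"]))
    # {'del': {'total': 2, 'confident': 1}, 'dup': {'total': 2, 'confident': 1}}
```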
Figure 3 Theoretical resolving power of CNAG, dChip and CNAT with reference sets of 2, 50, 106 and 214 samples (see Methods and the Figure 1 legend). The resolving power was defined as the average size of the smallest one-copy deletion or duplication that could be detected with a given method at a given confidence level. The theoretical p-value (on a log10 scale) is shown as a function of the size of the deletion (A) or duplication (B) detected from Affymetrix GeneChip 100 K Xba and Hind data. For a given p-value, e.g. 10⁻⁵, the theoretical minimum size of a detectable deletion or duplication is shown for each method. For a deletion or duplication of a given size, e.g. 400,000 bp, the theoretical p-values are shown for each method.
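To illustrate how a resolving-power curve of this kind arises, the sketch below uses a deliberately simplified noise model that is an assumption, not the calculation in the Methods: per-SNP copy number estimates are treated as independent with standard deviation `sigma`, a one-copy change shifts their mean by `delta` = 1, and the t-test is replaced by its normal approximation. The SNP count needed to reach a target p-value is converted to base pairs with a hypothetical average inter-SNP spacing (`spacing_bp`). Under this model a noisier method (larger `sigma`) needs more SNPs, and hence a larger aberration, to reach the same p-value.

```python
import math


def two_sided_p(n_snps, sigma, delta=1.0):
    """Normal-approximation p-value for a mean shift of delta over n SNPs."""
    z = delta * math.sqrt(n_snps) / sigma
    return math.erfc(z / math.sqrt(2))  # P(|Z| > z) for a standard normal Z


def min_detectable_snps(sigma, alpha, delta=1.0, max_snps=100_000):
    """Smallest number of SNPs whose expected shift reaches p <= alpha."""
    for n in range(1, max_snps + 1):
        if two_sided_p(n, sigma, delta) <= alpha:
            return n
    raise ValueError("alpha not reachable within max_snps")


def min_detectable_size_bp(sigma, alpha, spacing_bp, delta=1.0):
    """Approximate minimum detectable size in base pairs (assumed uniform spacing)."""
    return min_detectable_snps(sigma, alpha, delta) * spacing_bp


if __name__ == "__main__":
    # sigma and spacing_bp are illustrative values, not measured ones.
    print(min_detectable_snps(1.0, 0.05))  # 4
    print(min_detectable_snps(2.0, 0.05))  # 16
    print(min_detectable_size_bp(1.0, 0.05, spacing_bp=10_000))  # 40000
```

Reading the functions in the two directions mirrors the two ways the legend describes the figure: `min_detectable_snps` fixes the p-value and returns a size, while `two_sided_p` fixes the size and returns a p-value.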