
Copy Number Studies in Noisy Samples.

Philip Ginsbach, Bowang Chen, Yanxiang Jiang, Stefan T Engelter, Caspar Grond-Ginsbach.

Abstract

System noise was analyzed in 77 Affymetrix 6.0 samples from a previous clinical study of copy number variation (CNV). Twenty-three samples were classified as eligible for CNV detection, 29 samples as ineligible and 25 were classified as being of intermediate quality. New software ("noise-free-cnv") was developed to visualize the data and reduce system noise. Fresh DNA preparations were more likely to yield eligible samples (p < 0.001). Eligible samples had higher rates of successfully genotyped SNPs (p < 0.001) and lower variance of signal intensities (p < 0.001), yielded fewer CNV findings after Birdview analysis (p < 0.001), and showed a tendency to yield fewer PennCNV calls (p = 0.053). The noise-free-cnv software visualized trend patterns of noise in the signal intensities across the ordered SNPs, including a wave pattern of noise, being co-linear with the banding pattern of metaphase chromosomes, as well as system deviations of individual probe sets (per-SNP noise). Wave noise and per-SNP noise occurred independently and could be separately removed from the samples. We recommend a two-step procedure of CNV validation, including noise reduction and visual inspection of all CNV calls, prior to molecular validation of a selected number of putative CNVs.

Keywords:  copy number variation (CNV); noise reduction; noise-free-cnv software; per-SNP noise; validation of CNV findings; variance; wave noise

Year:  2013        PMID: 27605193      PMCID: PMC5003442          DOI: 10.3390/microarrays2040284

Source DB:  PubMed          Journal:  Microarrays (Basel)        ISSN: 2076-3905


1. Introduction

Genomic copy number variation (CNV) has been associated with a variety of clinical phenotypes [1,2,3,4,5,6]. Hence, the study of CNV is of diagnostic importance. CNV identification from high-density SNP microarrays may be unreliable, particularly in noisy data [7,8,9]. Therefore, extensive validation of CNV findings is needed. Since CNV detection software may identify hundreds of putative CNVs in each sample, and since validation of CNV findings by qPCR or by other molecular methods is laborious, we searched for simple strategies to evaluate large numbers of CNV findings. Rigorous studies revealed that several components of system error occur in copy number data [10,11,12,13]. Here we focus on two major types of noise and present the noise-free-cnv software package for the visualization of copy number data and for the reduction of noise. This software enables large-scale inspection of CNV findings (produced by PennCNV [14], Birdview [15,16], or other specialized software packages).

For illustration, we used 77 microarrays from a previous study of patients with cervical artery dissection from Switzerland and Southern Germany (age: 42.5 ± 9.8 years; 31 (40.3%) women) [17]. DNA was isolated from peripheral blood samples (no DNA from lymphoblastoid cell lines was used). DNA extraction, array hybridization, and array scanning were performed according to the manufacturer’s instructions [17]. The Log R Ratio (LRR) and B-Allele Frequency (BAF) values were obtained from the CEL files with the Affymetrix Power Tools software (APT), which was also used for quantile normalization. The LRR and BAF values can then be imported into PennCNV, into other CNV detection software packages (QuantiSNP, MAD), or into noise-free-cnv. The Affymetrix 6.0 microarrays used for CNV detection contain a total of 906,600 single nucleotide polymorphisms (SNPs) and 946,000 non-polymorphic copy number probes (CNPs) covering all human chromosomes. In the present article, the term SNP is used for all analyzed probe sets (SNPs as well as CNPs).

2. Noise Components

Figure 1 shows two samples (visualized by noise-free-cnv), displaying signal intensity (LRR, upper panel) and B-allele frequency (BAF, lower panel) of all SNPs ordered along the chromosomes. The Log R Ratio (LRR) is a normalized measure of the total signal intensity for the two alleles of a SNP. The B-Allele Frequency (BAF) is a normalized measure of the allelic intensity ratio of the two alleles [18]. Signal intensities in sample ID 2355 show larger variance than in ID 1022. Moreover, a prominent pattern of waves is apparent in sample ID 2355. We observed similar wave patterns in many samples. The noise-free-cnv software identifies waves using a Gaussian filter with a large standard deviation, for instance one spanning 1,000 SNPs. This filter “blurs” the values as shown in Figure 2(G,H). We called the resulting smoothed data the wave component of the LRR values. The variance of the blurred LRR values, the wave variance, is a measure of the prominence of waves.
Figure 1

Signal strength (LRR) and B-allele frequency (BAF) of samples from two male patients (ID 2355 and ID 1022). SNPs were visualized in increasing position along the chromosomes. LRR values of patient ID 2355 have larger variance and show pronounced wave noise.

Figure 2

Wave noise. Ideograms of pro-metaphase (A) and metaphase (B) chromosome 7 were compared with signal intensities of SNPs of chromosome 7 of two patients (C,D) and with a human prometaphase (E) and metaphase (F) chromosome 7. Signal intensities shown in C and D were smoothed (noise-free-cnv software, function “blur” across 1,000 probe sets) to visualize genomic waves (G,H).

This wave pattern was compared with the banding pattern of metaphase chromosomes (Figure 2). Human metaphase chromosomes were stained with the Giemsa-trypsin procedure, which induces a banding pattern. AT-rich regions are more frequent in Giemsa-dark bands than in Giemsa-light bands [19,20]. In our study samples, Giemsa-dark bands corresponded to genomic regions with reduced probe set signals. This pattern of noise was described by others as “genomic waves” or “CG-waves” [10,11,12,13]. The co-linearity of genomic waves with Giemsa bands illustrates that genomic waves follow a similar pattern in all samples. After subtraction of the wave component, the resulting LRR values follow an approximately normal distribution around zero. We called the resulting values the per-SNP component and their variance the per-SNP variance. The decomposition of system noise into a wave component and a per-SNP component is shown for one sample in Figure 3. Wave variance and per-SNP variance are listed for all samples in Table A1.
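As a minimal sketch of the decomposition described above, the following Python code smooths an ordered series of LRR values with a Gaussian filter to obtain a wave component, subtracts it to obtain a per-SNP component, and reports both variances. It is not the noise-free-cnv implementation: the function names, the kernel cut-off at three standard deviations, the treatment of the ends of the series, and the use of the population variance are assumptions made for illustration only.

```python
import math
from statistics import pvariance


def gaussian_kernel(sigma, truncate=3.0):
    """Normalized Gaussian weights for offsets -r..r, with r = truncate * sigma (assumed cut-off)."""
    radius = max(1, int(truncate * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(lrr, sigma):
    """Gaussian-smooth LRR values that are ordered by genomic position.

    Near both ends only the part of the kernel that overlaps the data is used,
    and the weights are renormalized (an assumed edge treatment).
    """
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(lrr)
    smoothed = []
    for i in range(n):
        acc = 0.0
        weight_sum = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * lrr[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed


def decompose(lrr, sigma):
    """Split LRR values into a wave component and a per-SNP component."""
    wave = blur(lrr, sigma)
    per_snp = [x - w for x, w in zip(lrr, wave)]
    return wave, per_snp


if __name__ == "__main__":
    # Synthetic data (not from the study): a slow sine wave plus
    # alternating probe-level deviations.
    lrr = [0.2 * math.sin(2 * math.pi * i / 200) + (0.1 if i % 2 else -0.1)
           for i in range(1000)]
    wave, per_snp = decompose(lrr, sigma=10)
    print("wave variance:", round(pvariance(wave), 4))
    print("per-SNP variance:", round(pvariance(per_snp), 4))
```

This pure-Python loop is meant for readability; for a full array with a standard deviation of 1,000 SNPs a vectorized filter would be far faster.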
Figure 3

Noise components. LRR values of a noisy sample (A), split up into a wave component (B) and a per-SNP component (C). All SNPs of chromosomes 1–3 are shown (chromosomes indicated on top of panel A).

Table A1

Eligibility of samples.

ID | call rate | var. | wave var. | per-SNP var. | Chr1 | Chr2 | Chr3 | Chr4 | Chr5 | Chr6 | Chr7 | Chr8 | Chr9 | Chr10 | Chr11 | Chr12 | Chr13 | Chr14 | Chr15 | Chr16 | Chr17 | Chr18 | Chr19 | Chr20 | Chr21 | Chr22
398.330.0680.0010.0672002200000011010022000102
1596.020.1440.0010.142104388112050161181501307822112119040
3696.000.2430.0040.2385068456787511,376503977974722385379639593232541752356397364225262203
3895.760.1830.0010.181179801024814080912683384122442714420471144115504020
4896.160.1740.0070.16703960141234290695651059110183236129320941555
4994.810.1740.0070.16720331351114028332214622981541002312070
5097.140.1030.0020.1014611014222422080065001123045022
6297.920.0900.0030.0870140143191511220000006160002
7193.920.2020.0020.20066749032678149402692522151164523196163142266134148248389750
7693.710.2290.0020.22651125120065229593364673524224572338518517025228516146211216749
9789.520.2910.0080.2823,1235,46711,6133,7956,7294,8705,3744,8984,4554,1695,7216,4923,4863,0203,7094,0314,1203,1773,5812,4661,7751,281
10197.850.0770.0020.0750000000000000900000200
11196.510.1750.0030.17270761449287187297664266105476117152007410136621
11294.700.1470.0110.1342,3782,9882,7432,5142,8234,4552,7743,0142,8372,7074,2273,0542,3151,5951,9051,0872,0852,0441,562650580602
12996.610.1390.0030.1357315191935394307723231009602740220
13196.450.1990.0020.1963192291393141081252483041723632796896931182121827183281328
14196.440.1210.0170.10126612117417314764195121258244329145365920210596208908672
14494.720.1760.0050.17025174613046425032825341039013979955553631931785228619591545
16894.360.3150.0360.2755,2055,3825,2724,4794,3375,7376,6824,5043,9662,6272,9015,2942,4923,0652,4873,1262,5592,6582,1791,405961948
18295.020.3160.0020.3131,1891,9882,0511,0962,3012,9911,8142,3651,4086871,1441,3148396546898155691,953427517456377
18890.120.4740.0110.46114,53415,55428,32210,24510,90416,21212,47114,3006,6426,4998,6819,2415,5956,7847,5035,1506,0487,0283,9782,4173,5362,274
18997.320.0970.0120.08412034155123541746902947341661000300352
19397.340.0930.0040.0885330300057162047000000000
41297.510.0920.0060.086006220020021401022000220004
41598.230.1030.0010.1025110531035020020000090002
42196.730.1650.0550.10207000528000000200040000
42296.100.0740.0020.07337,99636,65432,34329,93529,24829,06223,19824,95121,45722,17222,24721,57415,56014,54214,52512,57410,44414,0958,61610,2947,2184,456
43096.390.1600.0190.13810,61410,0969,19616,27815,3837,9597,7589,4656,7576,2956,2916,5594,9553,4872,4913,4593,0745,4041,5252,3993,5691,187
43889.760.4630.0570.39945,98156,16242,40345,75055,97637,90144,21234,06930,38726,84226,46138,93023,96320,80018,25819,16016,62418,3679,63713,4858,7905,479
44297.820.0840.0080.076486382001409572033400000067043
45195.960.2050.0040.20010915449363753611072131241008515012420634601165059019
46180.860.7060.0070.69664,82274,30241,65564,99356,98758,82549,36955,43345,35747,37235,75349,37730,87027,15825,44829,59014,63722,23115,37219,28912,4318,971
61397.640.0900.0010.08826374049150022210200007020
64795.490.1570.0020.15415130166047906454158052841240144411
65398.220.0790.0040.074122141,61826712709000002000044
66597.320.1230.0370.0825,0718,1406,2895,5206,4176,8165,8906,8944,2385,3264,5035,2462,6412,6962,3011,7661,4454,2111,6991,3992,448932
67096.740.1520.0430.1053,1604,4863,8953,9133,8123,0383,3093,6762,3592,1123,3272,7911,5911,5951,1429691,0282,0435431,2131,034286
67597.670.0840.0090.07400304133700584000110048000
67698.150.0780.0010.0770000452180001440000000000
67795.580.2080.0030.204251491615477572671573821063135264344457137121165675
69395.720.1330.0050.1284731863112152770108731410053202012
71597.440.0950.0060.0893500404710200004040240400
71796.600.1340.0010.1320222212941017065710020150034000
72994.750.1890.0010.1871152311518449558491424510111116571038057611920220
73395.840.1980.0090.18823288421821803061804034591132333621086838460
73595.280.1730.0030.1699421111290814111614220790874731344316131810022
74296.830.1140.0130.09900302010402001582180137000
74497.260.1080.0170.0890135126332731482630135315215018034154001661140940
74695.720.2530.0040.2482838953503782873882277235947122295593232069847839868934173184173
75097.150.1140.0150.0972,3896421,7501,5011,3701,7927071,440997615674750559225544631878144581771840
75297.800.1030.0040.0998812111915519683118187932066614727355041327015362725
79694.230.2470.0020.2442,1654825681,1275392633605914406979088061066302564982625651272756748
102098.210.0750.0010.07420002120000300000080160
102297.770.0820.0010.08161623022201201100000010000
102698.340.0660.0010.06410005000002001190003000
102897.490.0890.0010.088900040490300300000000000
102998.540.0620.0010.0603640000000010250025020000
103397.580.0870.0010.085008300020704400080000000
103497.500.0920.0100.0810002001200000000000000
103798.260.0680.0010.06785250262120210200354132200
104096.560.1140.0010.1139017533140000017000000010
104197.160.0940.0010.09300042200001330000000000
104297.460.0870.0030.08415141920112810194226841842242020
105697.110.0980.0030.09520220200000150400350100018
106398.310.0750.0020.072021691000290000020130002000
106598.230.0680.0030.06504270002000080000584000
108897.640.0920.0040.088532361458902676324450713461554022134939
109196.690.1380.0290.1057,6316,0807,1466,5127,2994,0064,9526,6664,1692,5793,9904,0122,3882,1941,4152,6131,0302,8286463,6971,862451
114797.960.0790.0020.07757648004416023202207190000
115197.900.0870.0010.08624003008049000002000112
211093.160.3430.0100.3326713,3073,6141,4143,5483,0352,7752,0701,5812,7182,1532,0541,5731,4269951,1731,5841,358662285971581
213495.730.1340.0080.1251445281515110028887003577355716567412181
214497.120.0930.0040.08925814040201440003000200
224094.480.2990.0040.2944881,6561,0191,7342,1801,6057541,7501,1795319161,728916751643921501427713356339374
235594.500.2580.0260.2291,8701,7991,4227411,4912,8291,6631,3538291,5001,3601,7146546771,2627671,4201,061842826464452
240694.780.1950.0110.183671251361222018014210962615517068347608312253430
D_06294.170.3220.0030.3187728385475364,419436711496239308219582319215222816254010186094
The system deviations of individual SNP signal intensities are strongly correlated across samples (Figure 4). To quantify the correlation of the noise (variance) components between different samples, we computed two additional data series. For each SNP, the median over all 77 per-SNP components was computed and saved as the per-SNP profile. The same procedure was applied to the wave components to obtain the wave profile. We then computed, for each sample, the correlation between the wave profile and the (individual) wave component as well as the correlation between the per-SNP profile and the (individual) per-SNP component. Details of the algorithm are described in the Appendix. The high correlations found in our 77 samples confirmed that wave noise and per-SNP noise are system noise, i.e., follow highly non-random patterns. On average, the correlation was 0.843 for the wave component and 0.568 for the per-SNP component.
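The profile computation described above can be sketched as follows. The code takes one noise component per sample (all of equal length, one value per SNP), builds the per-SNP median profile, and correlates each sample with that profile. The function names are hypothetical, and the use of the Pearson coefficient is an assumption, since the type of correlation is not specified here.

```python
import math
from statistics import mean, median


def median_profile(components):
    """Median across samples for each SNP; `components` holds one equal-length list per sample."""
    return [median(values) for values in zip(*components)]


def pearson(x, y):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def profile_correlations(components):
    """Return the median profile and each sample's correlation with it."""
    profile = median_profile(components)
    return profile, [pearson(c, profile) for c in components]


if __name__ == "__main__":
    # Hypothetical per-SNP components of three samples at five SNPs.
    samples = [
        [0.10, -0.20, 0.05, 0.30, -0.10],
        [0.12, -0.18, 0.00, 0.25, -0.15],
        [0.08, -0.25, 0.10, 0.35, -0.05],
    ]
    profile, correlations = profile_correlations(samples)
    print("profile:", profile)
    print("correlations:", [round(r, 3) for r in correlations])
```

The same two functions apply unchanged to the wave components, which yields the wave profile and the per-sample wave correlations.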
Figure 4

Per-SNP system noise. Signal intensities in genomic region 2:189766706–189891527 shown for four patients (ID 1020; ID 1022; ID 1026; ID 1028). The lower panel shows the per-SNP median profile (median signal intensities) of all samples (n = 77). Arrows and arrowheads indicate SNPs with LRR values far above and below the mean.

3. Factors Associated with Quality of Copy Number Data

The resolution of a classical chromosome study depends on the quality of the chromosomes and is expressed as the total number of visible cytogenetic bands (400 bands: low to moderate quality; 850 bands: excellent quality). To our knowledge, no comparable quality metric for molecular karyotyping exists. Quality control in most copy number studies consists of rejecting samples with outlier numbers of CNV findings. A quality metric for the resolution of a CNV study (relating the size of a CNV to the likelihood of its detection) has not yet been defined.

In the current study we propose a preliminary quality metric based on the median number of SNPs per chromosome with copy number state (CN) ≠ 2 (numbers per chromosome for all cases are shown in Table A1). The copy number state of each SNP was determined by the Affymetrix Power Tools software package (APT). SNPs located in common CNVs were excluded from this analysis. To identify SNPs located in common CNVs, we analyzed 403 control samples without visible waves and with the highest genotype call rates, selected from a large German population (PopGen [21]), as described before [17]. The quality of a sample was thus related to the chromosomal background of SNPs with abnormal copy number (Figure 5). We defined deliberate quality categories: samples were classified as eligible if the median number of SNPs per chromosome with CN ≠ 2 was zero, and as ineligible if this median was above 100; the remaining samples were classified as being of intermediate quality.
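The quality categories defined above translate into a very small rule. The sketch below assumes that the per-chromosome counts of SNPs with CN ≠ 2 (after exclusion of common CNVs) are already available as a list of 22 numbers; the function name and the example counts are hypothetical.

```python
from statistics import median


def classify_sample(counts_per_chromosome):
    """Classify a sample from its per-chromosome counts of SNPs with CN != 2."""
    m = median(counts_per_chromosome)
    if m == 0:
        return "eligible"
    if m > 100:
        return "ineligible"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical counts for chromosomes 1-22.
    clean = [0] * 20 + [40, 24]    # two chromosomes with clustered aberrant SNPs
    noisy = [350] * 22             # high background on every chromosome
    middling = [12] * 22           # moderate background
    for name, counts in [("clean", clean), ("noisy", noisy), ("middling", middling)]:
        print(name, classify_sample(counts))
```

Because the rule uses the median, a sample with aberrant SNPs concentrated on only a few chromosomes, as expected for a rare CNV, can still be classified as eligible.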
Figure 5

Quality of copy number samples. The number of SNPs with CN ≠ 2 per chromosome was scored. Sample ID 715 is eligible for CNV studies (most chromosomes without SNPs with CN ≠ 2). Accumulation of aberrant SNPs in chromosomes 7 and 18 indicates the presence of rare CNVs. Sample ID 50 is of intermediate quality. Sample ID 062 was classified as ineligible for CNV studies (>100 SNPs with CN ≠ 2 in most chromosomes).

Samples were classified according to the defined quality categories (Table 1). The use of freshly prepared DNA (compared to DNA samples that had been in use for years and had been thawed and frozen repeatedly) was a significant determinant of eligibility (p < 0.001). Samples with a high call rate (rate of successfully genotyped SNPs) were more likely to be suitable for copy number studies than those with lower call rates (p < 0.001). Low levels of wave variance as well as per-SNP variance were associated with eligibility for CNV analysis (p < 0.001). Eligibility for CNV studies was not significantly associated with the median number of calls by PennCNV (p = 0.053). However, eligible samples had between 63 and 165 calls, while the range of calls was much broader in ineligible samples. Birdview yielded significantly more calls in ineligible samples (p < 0.001). The proportion of putative false positive Birdview calls increased with decreasing confidence: the number of CNV findings with confidence below 2.5 was most strongly elevated.
Table 1

Characteristics of 77 analyzed samples, classified according to eligibility for copy number variation (CNV) analysis. Numbers indicate mean values and range (lowest–highest value). Mean values were compared between groups with the Chi-2 test or the Kruskal-Wallis test.

Characteristic | Ineligible (n = 29) | Intermediate (n = 25) | Eligible (n = 23) | p (Chi-2/Kruskal-Wallis)
Fresh DNA preparation | 0 (0.0 %) | 6 (20.7 %) | 14 (60.9 %) | <0.001
Genotyping call rate | 94.7 [80.9–97.3] | 96.6 [94.8–98.3] | 97.7 [96.6–98.5] | <0.001
Autosomal variance | 0.2291 [0.115–0.706] | 0.1343 [0.068–0.208] | 0.0870 [0.062–0.114] | <0.001
Wave noise | 0.0109 [0.002–0.058] | 0.0034 [0.001–0.017] | 0.0015 [0.001–0.013] | <0.001
Per-SNP noise | 0.2259 [0.082–0.696] | 0.1281 [0.067–0.204] | 0.0811 [0.060–0.164] | <0.001
PennCNV, No. of calls | 238 [14–1821] | 103 [34–1024] | 98 [63–165] | 0.053
PennCNV, % of deletions | 18.6 [1.3–81.3] | 27.4 [0.7–65.9] | 40.0 [10.3–54.8] | 0.164
Birdview, No. of calls | 527 [163–8,203] | 225 [154–1,339] | 208 [163–348] | <0.001
Birdview (cf > 10) | 15 [2–717] | 12 [5–33] | 14 [4–20] | 0.048
Birdview (cf = 10) | 89 [76–145] | 92 [74–105] | 94 [77–102] | 0.209
Birdview (cf 2.5–10) | 93 [14–3344] | 19 [10–361] | 21 [11–45] | <0.001
Birdview (cf < 2.5) | 370 [52–5665] | 106 [35–857] | 85 [42–194] | <0.001
Figure 6 summarizes salient aspects of system noise in SNP microarrays. Figure 6(A) plots the variances of the wave component and the per-SNP component for each sample. Wave variance and per-SNP variance seem to occur independently of each other: the observed correlation between both noise components (r = 0.124) was not significant (p = 0.401). Figure 6(B) illustrates the relation between sample eligibility and noise components in the eligible (n = 23) and ineligible (n = 29) cases. Eligible samples (i.e., those that are supposed to be excellent for copy number studies) have low levels of per-SNP variance. Samples with high wave variance are inappropriate for copy number studies.
Figure 6

Wave variance and per-SNP variance. (A) Noise components in all 77 samples and (B) in samples of low (O) and high (●) quality (samples of intermediate quality were not included in (B)).


4. Noise Reduction in Copy Number Samples

The noise-free-cnv software package permits the visualization of samples, the isolation of noise components and the subtraction of isolated noise components. The next two examples (Figure 7 and Figure 8) illustrate noise reduction by comparing a test sample with a reference sample. We finally demonstrate the use of the noise-free-cnv-filter algorithm for the evaluation of CNVs.
Figure 7

Signal intensities (y-axis: LRR values) of all SNPs from chromosome 18q up to chromosome 22. (A) Patient ID 1091; (B) reference sample ID 2355. After subtraction of the samples, a deletion in chromosome 20 became apparent (arrow).

Figure 8

Sample with mosaic large deletion in chromosome 5q. (A,B) LRR- and BAF-values of SNPs of chromosomes 5 and 6 of patient. (C) LRR values of reference sample. (D) Signal intensities after subtraction of reference sample. Arrows indicate region with reduced LRR values. (E) LRR values after application of noise-free-cnv blur over 2,000 SNPs. (Bottom panel) Chromosome analysis of cultured peripheral blood lymphocytes from patient (courtesy of Johannes W.G. Janssen, Department of Human Genetics, University of Heidelberg). Arrow points to 5q-minus chromosome.

Figure 7 shows a deletion in chromosome 20 of patient ID 1091, which was detected by PennCNV and Birdview analysis. Due to strong waves, reduced signal intensities in the region of the putative deletion are not easily seen. Visual inspection of the LRR values of chromosome 20 after subtraction of a reference sample (A–B) suggested the presence of a true deletion in this patient.

Figure 8 illustrates the analysis of a mosaic deletion. Although sample ID D62 was classified as ineligible for CNV studies, analysis of SNPs with CN ≠ 2 per chromosome revealed significant clustering on chromosome 5 (Table A1; Figure 5). Neither PennCNV nor Birdsuite identified a large CNV on chromosome 5. After noise reduction, LRR and BAF values were suggestive of a mosaic deletion [22,23,24] (Figure 8(B,D)). To confirm the diagnosis of a mosaic deletion, a conventional chromosome analysis was performed: some rare 5q-minus chromosomes were observed amongst a majority of normal chromosome sets. Interestingly, it was recently demonstrated that the identification of mosaic abnormalities by microarray analysis is unreliable [25].

We developed the noise-free-cnv-filter algorithm for optimized noise reduction (Appendix). In the samples of our study population, noise-free-cnv-filter analysis resulted in an average reduction of the wave variance by 74.2%, of the per-SNP variance by 35.3% and of the overall variance by 38.1%. Noise reduction according to this algorithm supports the evaluation of CNV findings, in particular when the putative CNVs are small (Figure 9).
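The idea of removing shared noise by subtracting a reference series, and of expressing the effect as a percentage reduction in variance, can be sketched as follows. This is not the noise-free-cnv-filter algorithm, whose details are given in the Appendix: the function names are hypothetical, and choosing the subtraction factor by least squares is an assumption made here purely for illustration.

```python
from statistics import mean, pvariance


def least_squares_factor(sample, reference):
    """Slope of the regression of `sample` on `reference` (an assumed way to pick the factor)."""
    mr, ms = mean(reference), mean(sample)
    sxy = sum((r - mr) * (s - ms) for r, s in zip(reference, sample))
    sxx = sum((r - mr) ** 2 for r in reference)
    return sxy / sxx


def subtract_reference(sample, reference, factor=1.0):
    """Subtract a scaled reference series (a reference sample or a noise profile) SNP by SNP."""
    return [s - factor * r for s, r in zip(sample, reference)]


def variance_reduction_percent(before, after):
    """Percentage by which the variance dropped after noise reduction."""
    return 100.0 * (1.0 - pvariance(after) / pvariance(before))


if __name__ == "__main__":
    # Synthetic data (not from the study): the test sample carries the same
    # noise pattern as the reference at 1.5-fold amplitude, plus a short
    # segment with reduced signal that mimics a deletion.
    reference = [0.2, -0.1, 0.3, -0.3, 0.1, -0.2, 0.25, -0.15, 0.05, -0.05]
    deletion = [0.0, 0.0, 0.0, 0.0, -0.5, -0.5, -0.5, 0.0, 0.0, 0.0]
    test = [1.5 * r + d for r, d in zip(reference, deletion)]

    factor = least_squares_factor(test, reference)
    cleaned = subtract_reference(test, reference, factor)
    print("factor:", round(factor, 3))
    print("variance reduction (%):", round(variance_reduction_percent(test, cleaned), 1))
    print("cleaned:", [round(v, 2) for v in cleaned])
```

With `factor=1.0` the function reduces to the plain sample-minus-reference subtraction used for visual inspection; a fitted factor only changes how strongly the reference pattern is scaled before it is removed.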
Figure 9

Validation of CNV findings. Left panels show crude LRR values, right panels show LRR values after noise-free-cnv-filter analysis. Samples were renamed with suffix “nf” after noise-free-cnv-filter analysis. Bars indicate putative CNV findings.

In patient ID 715, both Birdview and PennCNV identified a deletion on chromosome 18 (green bar in Figure 9). Noise-free-cnv-filter analysis of the sample (ID 715 nf) suggested that the deletion was true. Subsequent molecular analysis confirmed the finding: the joining segment of the deletion was identified by a case-specific PCR and the breakpoints of the deletion were identified by DNA sequencing following standard procedures [17,26]. Two putative duplications in patient ID 412 were evaluated after noise-free-cnv-filter analysis. We considered the duplication in chromosome 1 (region 222 Mb) as spurious (red bar), but the duplication in chromosome 9 as probably true. As a consequence, the putative duplication in chromosome 9 is a candidate for further validation by molecular methods.

5. Conclusions—Proposal of a Two-Step Procedure for the Validation of CNV Findings

Our analysis had the following key findings: (1) copy number samples may be noisy, which interferes, above a certain level of noise, with reliable identification of CNVs; (2) eligible copy number samples were more likely when fresh DNA was used for microarray hybridization; (3) the wave component and the per-SNP component of noise are independent; (4) noise-free-cnv software enables noise reduction by subtracting wave and per-SNP noise components from samples; and (5) noise-free-cnv software supports the quality control of copy number data and the validation of copy number findings. The current noise-free-cnv version was developed for the analysis of SNP microarray samples and was not designed for noise reduction in array-based comparative genomic hybridization samples. The present study highlights the value of noise reduction for large-scale CNV validation (after software-assisted CNV detection). However, the value of noise reduction before software-assisted CNV detection remains to be analyzed in future studies. Based on our analysis of noise in real-life copy number samples, we suggest a two-step procedure of CNV validation. As a first step of preliminary CNV validation, we propose large-scale inspection of CNV findings after noise reduction, to select putative candidate CNVs and reject false positive findings. In a second step, this selection of putative CNV calls is analyzed further by independent molecular methods for final validation [17,26].
Table A2

Analysis of noise components in samples.

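ID | variance | wave variance | per-SNP variance | wave correlation | per-SNP correlation | wave subtraction factor | per-SNP subtraction factor
3 | 0.068 | 0.001 | 0.067 | 0.804 | 0.676 | 0.409 | 0.800
15 | 0.144 | 0.001 | 0.142 | 0.567 | 0.564 | 0.393 | 0.972
36 | 0.243 | 0.004 | 0.238 | 0.877 | 0.388 | 0.997 | 0.865
38 | 0.183 | 0.001 | 0.181 | 0.456 | 0.502 | 0.314 | 0.977
48 | 0.174 | 0.007 | 0.167 | 0.939 | 0.508 | 1.405 | 0.949
49 | 0.174 | 0.007 | 0.167 | 0.959 | 0.527 | 1.419 | 0.985
50 | 0.103 | 0.002 | 0.101 | 0.881 | 0.536 | 0.626 | 0.779
62 | 0.090 | 0.003 | 0.087 | 0.929 | 0.609 | 0.886 | 0.820
71 | 0.202 | 0.002 | 0.200 | 0.365 | 0.589 | 0.274 | 1.203
76 | 0.229 | 0.002 | 0.226 | 0.580 | 0.529 | 0.511 | 1.151
97 | 0.291 | 0.008 | 0.282 | 0.875 | 0.360 | 1.427 | 0.873
101 | 0.077 | 0.002 | 0.075 | 0.906 | 0.727 | 0.717 | 0.910
111 | 0.175 | 0.003 | 0.172 | 0.914 | 0.482 | 0.903 | 0.914
112 | 0.147 | 0.011 | 0.134 | 0.864 | 0.578 | 1.671 | 0.968
129 | 0.139 | 0.003 | 0.135 | 0.898 | 0.620 | 0.964 | 1.043
131 | 0.199 | 0.002 | 0.196 | 0.875 | 0.417 | 0.790 | 0.844
141 | 0.121 | 0.017 | 0.101 | 0.949 | 0.703 | 2.297 | 1.024
144 | 0.176 | 0.005 | 0.170 | 0.881 | 0.591 | 1.141 | 1.115
168 | 0.315 | 0.036 | 0.275 | 0.936 | 0.406 | 3.234 | 0.975
182 | 0.316 | 0.002 | 0.313 | 0.809 | 0.386 | 0.704 | 0.990
189 | 0.097 | 0.012 | 0.084 | 0.930 | 0.668 | 1.874 | 0.883
193 | 0.093 | 0.004 | 0.088 | 0.960 | 0.704 | 1.174 | 0.956
412 | 0.092 | 0.006 | 0.086 | 0.950 | 0.691 | 1.304 | 0.925
421 | 0.103 | 0.001 | 0.102 | 0.898 | 0.680 | 0.598 | 0.991
422 | 0.165 | 0.055 | 0.102 | 0.873 | 0.523 | 3.763 | 0.763
425 | 0.074 | 0.002 | 0.073 | 0.908 | 0.572 | 0.653 | 0.705
430 | 0.160 | 0.019 | 0.138 | 0.896 | 0.457 | 2.265 | 0.778
438 | 0.463 | 0.057 | 0.399 | 0.913 | 0.354 | 4.006 | 1.021
442 | 0.084 | 0.008 | 0.076 | 0.942 | 0.663 | 1.496 | 0.835
451 | 0.205 | 0.004 | 0.200 | 0.927 | 0.453 | 1.111 | 0.927
461 | 0.706 | 0.007 | 0.696 | −0.310 | 0.228 | −0.460 | 0.868
613 | 0.090 | 0.001 | 0.088 | 0.850 | 0.565 | 0.589 | 0.767
647 | 0.157 | 0.002 | 0.154 | 0.766 | 0.589 | 0.671 | 1.059
653 | 0.079 | 0.004 | 0.074 | 0.938 | 0.569 | 1.141 | 0.707
665 | 0.123 | 0.037 | 0.082 | 0.896 | 0.555 | 3.159 | 0.726
670 | 0.152 | 0.043 | 0.105 | 0.906 | 0.565 | 3.422 | 0.837
675 | 0.084 | 0.009 | 0.074 | 0.951 | 0.672 | 1.647 | 0.837
676 | 0.078 | 0.001 | 0.077 | 0.646 | 0.635 | 0.313 | 0.807
677 | 0.208 | 0.003 | 0.204 | 0.901 | 0.465 | 0.870 | 0.960
693 | 0.133 | 0.005 | 0.128 | 0.953 | 0.581 | 1.179 | 0.952
715 | 0.095 | 0.006 | 0.089 | 0.950 | 0.667 | 1.308 | 0.911
717 | 0.134 | 0.001 | 0.132 | 0.522 | 0.650 | 0.351 | 1.081
729 | 0.189 | 0.001 | 0.187 | 0.606 | 0.611 | 0.411 | 1.209
733 | 0.198 | 0.009 | 0.188 | 0.947 | 0.488 | 1.628 | 0.967
735 | 0.173 | 0.003 | 0.169 | 0.901 | 0.609 | 0.917 | 1.144
742 | 0.114 | 0.013 | 0.099 | 0.956 | 0.649 | 2.017 | 0.936
744 | 0.108 | 0.017 | 0.089 | 0.921 | 0.570 | 2.186 | 0.779
746 | 0.253 | 0.004 | 0.248 | 0.906 | 0.414 | 1.033 | 0.943
750 | 0.114 | 0.015 | 0.097 | 0.940 | 0.612 | 2.137 | 0.874
752 | 0.103 | 0.004 | 0.099 | 0.937 | 0.471 | 1.033 | 0.679
796 | 0.247 | 0.002 | 0.244 | 0.015 | 0.527 | 0.012 | 1.192
1020 | 0.075 | 0.001 | 0.074 | 0.742 | 0.614 | 0.348 | 0.767
1022 | 0.082 | 0.001 | 0.081 | 0.909 | 0.644 | 0.640 | 0.836
1026 | 0.066 | 0.001 | 0.064 | 0.861 | 0.664 | 0.557 | 0.770
1028 | 0.089 | 0.001 | 0.088 | 0.742 | 0.643 | 0.380 | 0.871
1029 | 0.062 | 0.001 | 0.060 | 0.912 | 0.661 | 0.572 | 0.742
1033 | 0.087 | 0.001 | 0.085 | 0.820 | 0.703 | 0.518 | 0.940
1034 | 0.092 | 0.010 | 0.081 | 0.963 | 0.709 | 1.732 | 0.924
1037 | 0.068 | 0.001 | 0.067 | 0.782 | 0.633 | 0.390 | 0.748
1040 | 0.114 | 0.001 | 0.113 | 0.639 | 0.701 | 0.355 | 1.077
1041 | 0.094 | 0.001 | 0.093 | 0.850 | 0.672 | 0.546 | 0.937
1042 | 0.087 | 0.003 | 0.084 | 0.947 | 0.695 | 0.884 | 0.921
1056 | 0.098 | 0.003 | 0.095 | 0.941 | 0.686 | 0.893 | 0.966
1063 | 0.075 | 0.002 | 0.072 | 0.924 | 0.571 | 0.832 | 0.701
1065 | 0.068 | 0.003 | 0.065 | 0.959 | 0.657 | 0.904 | 0.764
1088 | 0.092 | 0.004 | 0.088 | 0.918 | 0.497 | 1.046 | 0.675
1091 | 0.138 | 0.029 | 0.105 | 0.912 | 0.537 | 2.854 | 0.797
1147 | 0.079 | 0.002 | 0.077 | 0.944 | 0.717 | 0.795 | 0.909
1151 | 0.087 | 0.001 | 0.086 | 0.688 | 0.572 | 0.311 | 0.769
2110 | 0.343 | 0.010 | 0.332 | 0.948 | 0.408 | 1.715 | 1.075
2134 | 0.134 | 0.008 | 0.125 | 0.936 | 0.551 | 1.575 | 0.893
2144 | 0.093 | 0.004 | 0.089 | 0.953 | 0.609 | 1.046 | 0.832
2240 | 0.299 | 0.004 | 0.294 | 0.909 | 0.433 | 1.060 | 1.075
2355 | 0.258 | 0.026 | 0.229 | 0.960 | 0.530 | 2.826 | 1.161
2406 | 0.195 | 0.011 | 0.183 | 0.940 | 0.570 | 1.833 | 1.113
188c | 0.474 | 0.011 | 0.461 | 0.899 | 0.314 | 1.715 | 0.975
D62 | 0.322 | 0.003 | 0.318 | 0.819 | 0.340 | 0.761 | 0.878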
  26 in total

1.  PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data.

Authors:  Kai Wang; Mingyao Li; Dexter Hadley; Rui Liu; Joseph Glessner; Struan F A Grant; Hakon Hakonarson; Maja Bucan
Journal:  Genome Res       Date:  2007-10-05       Impact factor: 9.043

2.  Copy Number Variation Detection via High-Density SNP Genotyping.

Authors:  Kai Wang; Maja Bucan
Journal:  CSH Protoc       Date:  2008-06-01

Review 3.  Detection and interpretation of genomic structural variation in health and disease.

Authors:  Geert Vandeweyer; R Frank Kooy
Journal:  Expert Rev Mol Diagn       Date:  2013-01       Impact factor: 5.225

Review 4.  Role of copy number variants in structural birth defects.

Authors:  Abigail E Southard; Lisa J Edelmann; Bruce D Gelb
Journal:  Pediatrics       Date:  2012-03-19       Impact factor: 7.124

5.  Copy number variation in patients with cervical artery dissection.

Authors:  Caspar Grond-Ginsbach; Bowang Chen; Rastislav Pjontek; Tina Wiest; Yanxiang Jiang; Barbara Burwinkel; Sandrine Tchatchou; Michael Krawczak; Stefan Schreiber; Tobias Brandt; Manja Kloss; Marie-Luise Arnold; Kari Hemminki; Christoph Lichy; Philippe A Lyrer; Ingrid Hausser; Stefan T Engelter
Journal:  Eur J Hum Genet       Date:  2012-05-23       Impact factor: 4.246

6.  Human chromosomal bands: nested structure, high-definition map and molecular basis.

Authors:  Maria Costantini; Oliver Clay; Concetta Federico; Salvatore Saccone; Fabio Auletta; Giorgio Bernardi
Journal:  Chromosoma       Date:  2006-10-28       Impact factor: 4.316

7.  Comparative analyses of seven algorithms for copy number variant identification from single nucleotide polymorphism arrays.

Authors:  Andrew E Dellinger; Seang-Mei Saw; Liang K Goh; Mark Seielstad; Terri L Young; Yi-Ju Li
Journal:  Nucleic Acids Res       Date:  2010-02-08       Impact factor: 16.971

Review 8.  PopGen: population-based recruitment of patients and controls for the analysis of complex genotype-phenotype relationships.

Authors:  Michael Krawczak; Susanna Nikolaus; Huberta von Eberstein; Peter J P Croucher; Nour Eddine El Mokhtari; Stefan Schreiber
Journal:  Community Genet       Date:  2006

9.  Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization.

Authors:  John C Marioni; Natalie P Thorne; Armand Valsesia; Tomas Fitzgerald; Richard Redon; Heike Fiegler; T Daniel Andrews; Barbara E Stranger; Andrew G Lynch; Emmanouil T Dermitzakis; Nigel P Carter; Simon Tavaré; Matthew E Hurles
Journal:  Genome Biol       Date:  2007       Impact factor: 13.583

10.  Comparison of chromosome analysis and chromosomal microarray analysis: what is the value of chromosome analysis in today's genomic array era?

Authors:  Weimin Bi; Caroline Borgan; Amber N Pursley; Patricia Hixson; Chad A Shaw; Carlos A Bacino; Seema R Lalani; Ankita Patel; Pawel Stankiewicz; James R Lupski; Arthur L Beaudet; Sau Wai Cheung
Journal:  Genet Med       Date:  2012-12-13       Impact factor: 8.822

