Louise V Wain, Inti Pedroso, John E Landers, Gerome Breen, Christopher E Shaw, P Nigel Leigh, Robert H Brown, Martin D Tobin, Ammar Al-Chalabi.
Abstract
BACKGROUND: The genetic contribution to sporadic amyotrophic lateral sclerosis (ALS) has not been fully elucidated. There are increasing efforts to characterise the role of copy number variants (CNVs) in human diseases; two previous studies concluded that CNVs may influence risk of sporadic ALS, with multiple rare CNVs more important than common CNVs. A little-explored issue surrounding genome-wide CNV association studies is that of post-calling filtering and merging of raw CNV calls. We undertook simulations to define filter thresholds and considered optimal ways of merging overlapping CNV calls for association testing, taking into consideration possibly overlapping or nested, but distinct, CNVs and boundary estimation uncertainty. METHODOLOGY AND PRINCIPAL
Year: 2009 PMID: 19997636 PMCID: PMC2780722 DOI: 10.1371/journal.pone.0008175
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Sex and age of sample population following sample quality control.
| | Case | | | Control | | | Overall | | |
| | n | Mean age | Age range | n | Mean age | Age range | n | Mean age | Age range |
| | | 52.0 | 21–86.5 | | 62.4 | 25–95 | | 57.2 | 21–95 |
| | | 56.9 | 19–83 | | 54.9 | 21–89 | | 55.8 | 19–89 |
| | | 53.8 | 19–86.5 | | 59.2 | 21–95 | | 56.6 | 19–95 |
Overall summary of CNV calls across 1194 (575 cases and 619 controls) individuals.
| Type | | Cases | Controls | P value |
| Duplications | Total | 4015 | 3941 | |
| | Mean per individual (range) | 7.03 (0–21) | 6.44 (0–21) | |
| | Median size, kb (range) | 202.3 (0–3074) | 160.3 (0–10000) | |
| Heterozygous deletions | Total | 1416 | 1685 | |
| | Mean per individual (range) | 1.13 (0–6) | 1.12 (0–2) | 0.4115 |
| | Median size, kb (range) | 47.6 (0.57–1513) | 46.8 (0.35–9888) | 0.1127 |
| Homozygous deletions | Total | 128 | 136 | |
| | Mean per individual (range) | 2.77 (0–14) | 2.99 (0–16) | 0.2247 |
| | Median size, kb (range) | 5.38 (0–425) | 5.5 (0–383) | 0.8699 |
| Total | | 5559 | 5762 | |
Significant p values are given in bold.
ALS-specific and control-specific CNV counts.
| Type | ALS-specific (n) | Control-specific (n) | P value | ALS-specific: median size (kb) | Control-specific: median size (kb) | P value |
| All | 119 | 136 | 0.69 | 90.6 | 79.1 | 0.52 |
| Heterozygous deletions | 50 | 50 | 0.68 | 53.9 | 49.8 | 0.48 |
| Homozygous deletions | 5 | 4 | 0.65 | 2.4 | 23.4 | 0.62 |
| Duplications | 59 | 68 | 0.74 | 158.7 | 103.0 | 0.14 |
| Multiallelic | 5 | 9 | 0.36 | 49.5 | 395.1 | |
| Deletion (het or hom) | 0 | 5 | | na | 8.3 | na |
ALS-specific CNVs defined as those present in 2 or more ALS cases but in no controls. Control-specific defined as those present in 2 or more controls but no ALS cases.
Pearson's chi-squared test, 1 df.
Mann-Whitney U-test.
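The specificity rule stated in the footnote above can be written as a small classifier. This is a minimal sketch rather than the authors' code: the function name `classify_cnv` and its inputs (per-locus carrier counts) are assumptions made for illustration, and the example counts are hypothetical.

```python
def classify_cnv(case_carriers: int, control_carriers: int) -> str:
    """Label a CNV locus from the number of cases and controls carrying it."""
    # ALS-specific: seen in 2 or more cases and in no controls.
    if case_carriers >= 2 and control_carriers == 0:
        return "ALS-specific"
    # Control-specific: seen in 2 or more controls and in no cases.
    if control_carriers >= 2 and case_carriers == 0:
        return "control-specific"
    return "neither"


# Hypothetical carrier counts, for illustration only.
for cases, controls in [(3, 0), (0, 2), (1, 0), (4, 1)]:
    print(cases, controls, classify_cnv(cases, controls))
```

Note that a CNV seen in a single case and no controls is deliberately left unclassified, because the rule requires at least two carriers.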
Summary of association results with p<0.01 (using Fisher's exact test).
| Chr | Start (bp) | End (bp) | ALS | Controls | P value | Distance from centromere (kb) | Overlapped genes | CNP | Frequency across all samples |
| Gains |
| 5 | 45850032 | 46384240 | 264 | 174 | 1.07×10−9 | 324.3 | - | 2 | 0.367 |
| 8 | 47062007 | 47406312 | 30 | 8 | 8.21×10−5 | 276.1 | - | 0 | 0.032 |
| 12 | 36528296 | 36801139 | 157 | 109 | 2.65×10−4 | 521.8 | - | 1 | 0.223 |
| 19 | 32615675 | 32935836 | 165 | 122 | 5.87×10−4 | 2852.1 | RDH13 | 0 | 0.240 |
| 7 | 61663407 | 62155064 | 172 | 132 | 0.0018 | 851.0 | - | 0 | 0.255 |
| 3 | 33270957 | 33296620 | 3 | 18 | 0.0027 | 57303.8 | | 0 | 0.018 |
| 8 | 47062007 | 47711911 | 3 | 18 | 0.0032 | 428.9 | - | 1 | 0.018 |
| 8 | 43689385 | 43910848 | 74 | 52 | 0.0046 | 157.9 | - | 0 | 0.106 |
| 16 | 969913 | 1834962 | 8 | 15 | 0.0056 | 33740.9 | SOX8, SSTR5, C1QTNF8, | 1 | 0.019 |
| 5 | 28842013 | 28912873 | 7 | 0 | 0.0058 | 17563.9 | - | 0 | 0.006 |
| 4 | 761587 | 1014752 | 16 | 4 | 0.0068 | 48466.7 | | 0 | 0.017 |
| Losses |
| 22 | 21011312 | 21394287 | 0 | 11 | 0.002 | 6872.8 | | 0 | 0.009 |
A reciprocal overlap threshold of >70% was used.
Number of CNPs from the McCarroll CNP map[37] that also overlap this region.
Regions that were within the telomeric chromosome band. Seven additional regions that showed significant association with ALS could not be mapped to build 36 of the human genome; these regions are given in Table S1. Genes that were also identified with p<0.01 in the gene-based analysis are given in italics. Genes that may be reasonable ALS candidates are in bold.
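To make the per-region test concrete, the sketch below computes a two-sided Fisher's exact p-value and the carrier frequency from the counts in the table. It is an illustration under stated assumptions, not the authors' pipeline: it assumes each region was tested as carriers versus non-carriers using all 575 cases and 619 controls as denominators, and the function names `fisher_two_sided` and `carrier_frequency` are hypothetical.

```python
from math import comb

N_CASES, N_CONTROLS = 575, 619  # assumed denominators (samples after QC)


def fisher_two_sided(case_carriers, control_carriers,
                     n_cases=N_CASES, n_controls=N_CONTROLS):
    """Two-sided Fisher's exact test on a 2x2 carrier/non-carrier table.

    Sums the hypergeometric probabilities of every table that is no more
    likely than the observed one, with the margins held fixed.
    """
    carriers = case_carriers + control_carriers
    denom = comb(n_cases + n_controls, carriers)

    def prob(k):
        # Probability that exactly k of the carriers are cases.
        return comb(n_cases, k) * comb(n_controls, carriers - k) / denom

    lowest = max(0, carriers - n_controls)
    highest = min(carriers, n_cases)
    p_observed = prob(case_carriers)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    p = sum(q for q in map(prob, range(lowest, highest + 1))
            if q <= p_observed * tolerance)
    return min(1.0, p)


def carrier_frequency(case_carriers, control_carriers,
                      n_cases=N_CASES, n_controls=N_CONTROLS):
    """Fraction of all samples carrying the CNV."""
    return (case_carriers + control_carriers) / (n_cases + n_controls)


# Chromosome 5 region with 7 carriers among cases and none among controls.
print(fisher_two_sided(7, 0))
# Chromosome 5 region with 264 case and 174 control carriers.
print(round(carrier_frequency(264, 174), 3))
```

Under these assumptions the 7-versus-0 row gives a p-value of about 0.006, in line with the tabulated 0.0058, and the 264-versus-174 row reproduces the tabulated frequency of 0.367. Rows that do not reproduce exactly may have been tested with different denominators.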
Sensitivity analyses.
| chr | start | end | P value (70%) | >0% | >50% | 100% |
| Gains |
| 5 | 45850032 | 46384240 | 1.07E-09 | | | |
| 8 | 47062007 | 47406312 | 8.21E-05 | - | | |
| 12 | 36528296 | 36801139 | 0.000265 | | | |
| 19 | 32615675 | 32935836 | 0.000587 | | | |
| 7 | 61663407 | 62155064 | 0.0018 | | | |
| 3 | 33270957 | 33296620 | 0.0027 | | | |
| 8 | 47062007 | 47711911 | 0.0032 | - | | |
| 8 | 43689385 | 43910848 | 0.0046 | - | | |
| 16 | 969913 | 1834962 | 0.0056 | - | | |
| 5 | 28842013 | 28912873 | 0.0058 | | | - |
| 4 | 761587 | 1014752 | 0.0068 | | | |
| 5 | 510955 | 738748 | 0.012 | - | - | - |
| 20 | 60320976 | 60493125 | 0.016 | | | - |
| 19 | 32615675 | 32851754 | 0.022 | | | |
| 6 | 58675121 | 58878583 | 0.024 | | | |
| 19 | 60270514 | 60293927 | 0.025 | | | |
| 14 | 103232016 | 103721150 | 0.025 | - | | - |
| 20 | 60214968 | 60493125 | 0.035 | | | - |
| 16 | 1744358 | 1781034 | 0.038 | - | | |
| 14 | 104197399 | 104356204 | 0.043 | - | - | - |
| 8 | 142512205 | 142529990 | 0.046 | - | - | - |
| 3 | 101837214 | 101916282 | 0.047 | | | |
| 21 | 43646295 | 43663581 | 0.047 | | | |
| 14 | 19375271 | 19536664 | 0.049 | | | - |
| 11 | 382079 | 434628 | 0.049 | | | |
| Losses |
| 22 | 21011312 | 21394287 | 0.002 | | | - |
| 10 | 82869699 | 82882268 | 0.013 | | | |
| 11 | 539119 | 652407 | 0.025 | | | - |
| 17 | 43969101 | 44059535 | 0.032 | - | | - |
| Losses/gains |
| 8 | 47224322 | 47711911 | 0.003 | - | | |
| 8 | 144686338 | 144765210 | 0.012 | | | - |
For each of the loci defined using a 70% reciprocal overlap that reached p<0.05, the raw CNV calls within the regions were re-assigned to CNV loci using different reciprocal overlaps (>0%, >50% or 100%). Where re-assignment resulted in more than one CNV locus within the region defined using 70% overlap, the locus giving the highest significance was included.
p value<0.01,
p value<0.05, - p value>0.05.
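The selection rule described in the table note (keep the most significant locus when re-assignment splits a region, then report it by significance band) can be sketched as follows. This is an illustration only: the function names are hypothetical, the p-values are made up, and treating a p-value of exactly 0.05 as non-significant is an assumption, since the note does not say.

```python
def most_significant(locus_p_values):
    """Return the smallest p-value among the loci found within one region."""
    return min(locus_p_values)


def significance_band(p_value: float) -> str:
    """Map a p-value to the three bands used in the sensitivity table."""
    if p_value < 0.01:
        return "p<0.01"
    if p_value < 0.05:
        return "p<0.05"
    return "-"  # assumption: p >= 0.05 is reported as non-significant


# Hypothetical region that splits into three loci at a stricter threshold.
region_p_values = [0.20, 0.03, 0.008]
print(significance_band(most_significant(region_p_values)))
```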
Results of gene-based association approach.
| Gene | Pooled gains and losses | | | Losses | | | Gains | | |
| | Cases | Controls | P value | Cases | Controls | P value | Cases | Controls | P value |
| C11orf35 | 14 | 0 | 3.96E-05 | 8 | 0 | 0.002988 | 6 | 0 | 0.01271 |
| LRRC56 | 14 | 0 | 3.96E-05 | 8 | 0 | 0.002988 | 6 | 0 | 0.01271 |
| RASSF7 | 13 | 0 | 8.13E-05 | 7 | 0 | 0.006159 | 6 | 0 | 0.01271 |
| MUPCDH | 12 | 0 | 0.000167 | 6 | 0 | 0.01271 | 6 | 0 | 0.01271 |
| PHRF1 | 12 | 0 | 0.000167 | 6 | 0 | 0.01271 | 6 | 0 | 0.01271 |
| IRF7 | 12 | 0 | 0.000167 | 6 | 0 | 0.01271 | 6 | 0 | 0.01271 |
| SCT | 12 | 0 | 0.000167 | 6 | 0 | 0.01271 | 6 | 0 | 0.01271 |
| DRD4 | 11 | 0 | 0.000343 | 6 | 0 | 0.01271 | 5 | 0 | 0.026252 |
| | 10 | 0 | 0.000705 | 7 | 0 | 0.006159 | 3 | 0 | 0.112288 |
| NAPRT1 | 9 | 0 | 0.00145 | 6 | 0 | 0.01271 | 3 | 0 | 0.112288 |
| C8orf73 | 9 | 0 | 0.00145 | 6 | 0 | 0.01271 | 3 | 0 | 0.112288 |
| GSDMD | 9 | 0 | 0.00145 | 6 | 0 | 0.01271 | 3 | 0 | 0.112288 |
| FBXL2 | 4 | 21 | 0.001771 | 4 | 21 | 0.001771 | 0 | 0 | na |
| BCR | 8 | 26 | 0.004875 | 1 | 4 | 0.375881 | 7 | 22 | 0.013279 |
| C14orf2 | 7 | 0 | 0.006159 | 7 | 0 | 0.006159 | 0 | 0 | na |
| | 7 | 0 | 0.006159 | 7 | 0 | 0.006159 | 0 | 0 | na |
| DEAF1 | 7 | 0 | 0.006159 | 6 | 0 | 0.01271 | 1 | 0 | 0.482008 |
| LAMA5 | 27 | 56 | 0.006206 | 27 | 56 | 0.006206 | 0 | 0 | na |
| PRAME | 9 | 27 | 0.006383 | 2 | 5 | 0.454278 | 7 | 22 | 0.013279 |
| FGFRL1 | 19 | 6 | 0.007514 | 19 | 6 | 0.007514 | 0 | 0 | na |
| IDUA | 19 | 6 | 0.007514 | 19 | 6 | 0.007514 | 0 | 0 | na |
| SLC26A1 | 19 | 6 | 0.007514 | 19 | 6 | 0.007514 | 0 | 0 | na |
| GATA5 | 3 | 15 | 0.008199 | 3 | 15 | 0.008199 | 0 | 0 | na |
| ZNF280B | 9 | 26 | 0.009548 | 2 | 4 | 0.6881 | 7 | 22 | 0.013279 |
| ZNF280A | 9 | 26 | 0.009548 | 2 | 4 | 0.6881 | 7 | 22 | 0.013279 |
Results are ranked by p-value when gains and losses are pooled together. A two-sided Fisher's exact test was used to test for association. Genes which we considered reasonable ALS candidates are in bold.
Figure 1. All CNV calls and CNV regions overlapping CNV region chr11: 539119-652407.
Raw CNV calls are shown in dark grey. No control losses or gains were detected within the region shown but 8 gains in cases and 6 losses in cases were detected. Blue bars represent CNV regions defined by merging CNV calls that share a greater than 70% reciprocal overlap. The purple bar represents the CNV region found to show significant association in this study: the CNV calls that were merged into this CNV region are indicated with an asterisk. Genes are shown at the bottom. Genomic position is shown along the top.
Figure 2. Sample quality measures (log R ratio and B allele frequency outlier rates and standard deviations).
Maximum LRR standard deviation against maximum BAF standard deviation (top) and maximum LRR outlier rate against maximum BAF outlier rate (bottom). Thresholds for exclusion are shown in red.
Figure 3. Clustering CNV calls into CNV loci based on a reciprocal overlap threshold of 50%.
Each coloured bar represents one CNV call in a single individual (note: the method used here cannot distinguish overlapping calls in the same individual). CNV loci are defined by vertical dashed lines. CNV locus 1 shows three CNV calls that each share a greater than 50% reciprocal overlap with each of the other CNV calls at that locus. Overlapping CNV loci 2 and 3 result from three overlapping CNV calls, of which only two share a reciprocal overlap of greater than 50%. CNV locus 4 and CNV locus 5 are an example of how nested CNVs can occur.
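The reciprocal overlap criterion in this caption can be illustrated with a short sketch. It uses the common definition (the shared span must cover more than the threshold fraction of both calls) and is not taken from the authors' code; the function names and all coordinates are hypothetical.

```python
from itertools import combinations


def reciprocal_overlap(a, b):
    """Smaller of the two fractions of each call covered by the shared span.

    Calls are (start, end) tuples in base pairs.
    """
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0  # calls do not overlap
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def is_single_locus(calls, threshold=0.5):
    """True if every pair of calls exceeds the reciprocal overlap threshold."""
    return all(reciprocal_overlap(a, b) > threshold
               for a, b in combinations(calls, 2))


# Hypothetical calls: three mutually overlapping calls form one locus.
tight = [(100, 200), (110, 205), (95, 190)]
# A chain where only the first two calls overlap by more than 50%.
chain = [(100, 200), (140, 240), (190, 290)]
# A small call nested inside a much larger one.
nested = [(0, 1000), (400, 500)]

print(is_single_locus(tight))
print(is_single_locus(chain))
print(reciprocal_overlap(*nested))
```

The nested pair shows why reciprocal rather than one-way overlap matters: the small call lies entirely inside the large one, yet it covers only a tenth of it, so the two remain distinct loci at a 50% threshold.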