Mamoru Kato, Takahisa Kawaguchi, Shumpei Ishikawa, Takayoshi Umeda, Reiichiro Nakamichi, Michael H Shapero, Keith W Jones, Yusuke Nakamura, Hiroyuki Aburatani, Tatsuhiko Tsunoda.
Abstract
Copy number variations (CNVs) are universal genetic variations, and their association with disease has been increasingly recognized. We designed high-density microarrays for CNVs and detected 3000–4000 CNVs (4–6% of the genomic sequence) per population, including CNVs that had previously been missed because they were small or resided in segmental duplications. The patterns of CNVs across individuals were surprisingly simple at the kilobase scale, suggesting that a simple genetic analysis is applicable to these loci. We used probabilistic theory to determine integer copy numbers of CNVs and employed a recently developed phasing tool to estimate the population frequencies of integer copy number alleles and CNV–SNP haplotypes. The results showed a tendency toward lower frequencies of CNV alleles and that most of our CNVs were explained by zero-, one- and two-copy alleles alone. Using the estimated population frequencies, we found several CNV regions with exceptionally high population differentiation. Investigation of CNV–SNP linkage disequilibrium (LD) for 500–900 bi- and multi-allelic CNVs per population revealed that previous conflicting reports on bi-allelic LD were unexpectedly consistent and could be explained by an increase in LD correlated with deletion-allele frequencies. Typically, the bi-allelic LD was lower than SNP–SNP LD, whereas the multi-allelic LD was somewhat stronger than the bi-allelic LD. After further investigation of tag SNPs for CNVs, we conclude that the customary tagging strategy for disease association studies is applicable to common deletion CNVs, but direct interrogation is needed for other types of CNVs.
Year: 2009 PMID: 19966329 PMCID: PMC2816609 DOI: 10.1093/hmg/ddp541
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
Figure 1. Definitions of CNVs and the typical observed pattern. (A) Definitions of CNVs. CNV segments are the chromosomal segments with CNVs for each individual (blue). CNV regions are the union of overlapping CNV segments (red). CNV events are the union of CNV segments that have the same start and end positions (black). CNV fragments are the parts of CNV segments that are divided with the start and end positions of any CNV segments (red circles). CNV fragment-sites are the union of CNV fragments (green). A fragment-segment rate is the proportion of the number of individuals with CNV fragments to the number of individuals with any CNV segments at a CNV fragment-site. (B) The typical segment pattern in a CNV region (chr 18: 45 938 595 to 45 956 033 for CEU). The red and four blue lines indicate a region (17 kb) and segments, respectively. Most CNV regions (89–90%) had a simple segment pattern characterized by two features: no individual with multiple segments and only one ‘core’ fragment-site, which was a fragment-site with a 100% fragment-segment rate.
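The definitions in Figure 1A can be illustrated with a small sketch. It assumes half-open `(start, end)` coordinates, merges segments that touch as well as those that overlap, and takes the denominator of the fragment-segment rate to be the number of individuals with any segment in the region; these conventions and the function names are illustrative assumptions, not the authors' implementation.

```python
def merge_regions(segments):
    """Union overlapping (or touching) (start, end) segments into CNV regions."""
    regions = []
    for start, end in sorted(segments):
        if regions and start <= regions[-1][1]:
            # Segment overlaps the current region: extend it.
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


def fragment_segment_rates(segments_by_individual):
    """Split one CNV region at every segment boundary and return
    {(frag_start, frag_end): fragment-segment rate}."""
    all_segments = [s for segs in segments_by_individual.values() for s in segs]
    cuts = sorted({pos for seg in all_segments for pos in seg})
    # Assumed denominator: individuals carrying any segment in this region.
    carriers = sum(1 for segs in segments_by_individual.values() if segs)
    rates = {}
    for lo, hi in zip(cuts, cuts[1:]):
        n = sum(
            1
            for segs in segments_by_individual.values()
            if any(s <= lo and hi <= e for s, e in segs)
        )
        if n:  # skip gaps that no segment covers
            rates[(lo, hi)] = n / carriers
    return rates


# Hypothetical toy data: three carriers and one non-carrier.
toy = {"A": [(100, 200)], "B": [(120, 200)], "C": [(100, 180)], "D": []}
rates = fragment_segment_rates(toy)
core_sites = [frag for frag, rate in rates.items() if rate == 1.0]
```

In this toy region, `(120, 180)` is the only fragment-site carried by every individual with a segment, so it plays the role of the ‘core’ fragment-site.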
Statistics of CNV regions
| | CEU and YRI together, Nsp1.3M | CEU and YRI together, 500KEA | CEU and YRI together, WGTP | CEU, Nsp1.3M | CEU, 500KEA | CEU, WGTP | YRI, Nsp1.3M | YRI, 500KEA | YRI, WGTP |
|---|---|---|---|---|---|---|---|---|---|
| Count | 6184 | 699 | 669 | 2986 | 379 | 484 | 4083 | 417 | 469 |
| Genomic coverage (bp) | 224 M | 72 M | 240 M | 123 M | 47 M | 176 M | 156 M | 40 M | 168 M |
| Median length (bp) | 12 700 | 31 367 | 228 858 | 12 241 | 37 270 | 224 588 | 12 700 | 30 990 | 233 836 |
These statistics pertain to CNV regions on the autosomal chromosomes. The statistics of 500KEA and WGTP were calculated from Redon et al. (1).
Figure 2. Comparison of CNV regions in Nsp1.3M with those in 500KEA. The length of CNV regions detected with one platform versus the number of regions. The number of regions that did and did not overlap with those from the other platform is shown in red and olive, respectively. (A) CNV regions detected with Nsp1.3M. (B) CNV regions previously detected with 500KEA.
Number of common, relatively common and rare CNV regions
| | CEU | YRI |
|---|---|---|
| Common | 133 (5.7%) | 187 (6.0%) |
| Relatively common | 427 (18.4%) | 729 (23.5%) |
| Rare | 1760 (75.9%) | 2185 (70.5%) |
The common, relatively common and rare CNV regions were CNV regions for which one minus the frequency of A1 was ≥5%, 1–5% and <1%, respectively.
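The thresholds in this note can be written as a short classifier. The sketch below takes the frequency of A1 as given (the note does not define it further) and uses a hypothetical function name.

```python
def classify_cnv_region(freq_a1):
    """Classify a CNV region by one minus the frequency of A1,
    using the cutoffs from the table note: >=5%, 1-5%, <1%."""
    non_a1 = 1.0 - freq_a1
    if non_a1 >= 0.05:
        return "common"
    if non_a1 >= 0.01:
        return "relatively common"
    return "rare"


# Hypothetical A1 frequencies for three regions.
labels = [classify_cnv_region(f) for f in (0.90, 0.97, 0.995)]
```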
Figure 3. Frequency spectra. These counts are based on allelic copy numbers and the derived diploid copy numbers that are classified by their population frequency. The width of each bin is 2%. Alleles with a very small or large frequency of <0.1% or >99.9% are excluded from the counts. (A) The allele frequency spectrum. (B) The frequency spectrum of diploid copy numbers.
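The binning described in this caption (2% bins, with frequencies below 0.1% or above 99.9% excluded) could be reproduced roughly as follows. The function name and input format are assumptions made for illustration.

```python
def frequency_spectrum(freqs, bin_width=0.02, lo=0.001, hi=0.999):
    """Count frequencies per bin, skipping values outside [lo, hi]."""
    n_bins = round(1 / bin_width)
    counts = [0] * n_bins
    for f in freqs:
        if f < lo or f > hi:
            continue  # excluded as very rare or nearly fixed
        # Clamp so that a frequency at the upper edge falls in the last bin.
        idx = min(int(f / bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical allele frequencies; the first and last are excluded.
spectrum = frequency_spectrum([0.0005, 0.01, 0.015, 0.51, 0.9995])
```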
CNV regions with high population differentiation (Fst > 0.1)
| Chr | Start | End | Fst | Overlapping gene |
|---|---|---|---|---|
| 1 | 149365093 | 149419009 | 0.427 | |
| 2 | 34576366 | 34662239 | 0.469 | None |
| 4 | 10063092 | 10086289 | 0.106 | |
| 4 | 34595900 | 34663168 | 0.485 | None |
| 6 | 32060463 | 32136004 | 0.209 | |
Fst is a commonly used statistic to estimate population differentiation, ranging from 0 (undifferentiated) to 1 (population-specific). The start and end positions in the table indicate the boundaries of the union of the CEU and YRI CNV regions. Bold letters indicate that those CNV regions were not reported for either population in the previous study (1).
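The source does not state which Fst estimator was used. As one common formulation, the sketch below computes Fst = (H_T − H_S) / H_T from allele frequencies in two equally weighted populations, where H_S is the mean within-population expected heterozygosity and H_T is the expected heterozygosity of the pooled frequencies. The choice of estimator, the equal weighting and the function names are assumptions.

```python
def expected_heterozygosity(freqs, alleles):
    """1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(freqs.get(a, 0.0) ** 2 for a in alleles)


def fst_two_populations(freqs_pop1, freqs_pop2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations.
    Each argument maps an allele (e.g. integer copy number) to its frequency."""
    alleles = set(freqs_pop1) | set(freqs_pop2)
    h_s = (expected_heterozygosity(freqs_pop1, alleles)
           + expected_heterozygosity(freqs_pop2, alleles)) / 2
    pooled = {a: (freqs_pop1.get(a, 0.0) + freqs_pop2.get(a, 0.0)) / 2
              for a in alleles}
    h_t = expected_heterozygosity(pooled, alleles)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical copy-number allele frequencies (keys are copy numbers).
same = fst_two_populations({0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5})
fixed = fst_two_populations({0: 1.0}, {1: 1.0})
```

Identical frequencies give 0 (undifferentiated) and fixed differences give 1 (population-specific), matching the range described in the note.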
Association of CNV and flanking regions with sequence features
| CNV region or flanking region | Sequence feature | Odds ratio, CEU | Odds ratio, YRI |
|---|---|---|---|
| Common | Segmental duplication | 6.93 | 4.56 |
| Relatively common | Segmental duplication | 3.88 | 2.80 |
| Rare | Segmental duplication | 1.92 | 2.00 |
| Flanking around common | Segmental duplication | 5.23 | 3.51 |
| Flanking around relatively common | Segmental duplication | 2.83 | 2.03 |
| Flanking around rare | Segmental duplication | 1.41 | 1.47 |
| Common | Gene | 2.15a | 1.83a |
| Relatively common | Gene | 1.34a | 1.14a |
| Rare | Gene | 1.17a | 1.24a |
| Flanking around common | Gene | 2.00a | 1.79a |
| Flanking around relatively common | Gene | 1.38a | 1.23a |
| Flanking around rare | Gene | 1.25a | 1.25a |
| Common | Repetitive element | 1.45 | 1.57 |
| Relatively common | Repetitive element | 1.30 | 1.29 |
| Rare | Repetitive element | 1.27 | 1.24 |
| Flanking around common | Repetitive element | 1.39 | 1.42 |
| Flanking around relatively common | Repetitive element | 1.27 | 1.25 |
| Flanking around rare | Repetitive element | 1.22 | 1.21 |
Flanking regions are regions up to 10 000 bp from the boundaries of common, relatively common or rare CNV regions. To obtain an odds ratio, we first calculated four summed base counts: bases that overlapped with both CNV regions and regions of a sequence feature, bases that overlapped with CNV regions only, bases that overlapped with sequence feature regions only, and bases that overlapped with neither. We used these four counts to calculate the odds ratio. Values marked with the superscript ‘a’ are the reciprocals of calculated odds ratios that were below one, so that they can be compared directly with the other odds ratios. All the odds ratios were significant (P < 10⁻⁶) in Fisher's exact test.
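The four-count calculation described in this note can be sketched on a toy genome. The code assumes half-open intervals on a single sequence and enumerates bases with sets, which is only practical for small examples; all names are hypothetical, and the significance test is not reproduced here.

```python
def covered_bases(intervals):
    """Set of base positions covered by half-open (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def overlap_counts(cnv_intervals, feature_intervals, genome_length):
    """Return (both, cnv_only, feature_only, neither) base counts."""
    cnv = covered_bases(cnv_intervals)
    feature = covered_bases(feature_intervals)
    both = len(cnv & feature)
    cnv_only = len(cnv - feature)
    feature_only = len(feature - cnv)
    neither = genome_length - both - cnv_only - feature_only
    return both, cnv_only, feature_only, neither


def odds_ratio(both, cnv_only, feature_only, neither):
    """Odds ratio of the 2x2 table built from the four base counts."""
    return (both * neither) / (cnv_only * feature_only)


def as_reported(value):
    """Return (value, flagged); ratios below one are reported as reciprocals,
    mirroring the superscript 'a' convention in the table."""
    return (1 / value, True) if value < 1 else (value, False)


# Hypothetical 1000-bp genome with one CNV region and one feature region.
counts = overlap_counts([(0, 100)], [(50, 250)], genome_length=1000)
ratio = odds_ratio(*counts)
```

For this toy input the counts are 50, 50, 150 and 750 bases, so the odds ratio is (50 × 750) / (50 × 150) = 5.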
Figure 4. CNV–SNP LD. LD versus distance for (A) bi-allelic CNVs and (B) two-way tri-allelic CNVs. There were 503 (CEU) and 875 (YRI) bi-allelic CNVs and 32 (CEU) and 22 (YRI) tri-allelic CNVs. The distance between a CNV and a SNP was measured from either boundary of a CNV region to a SNP position. The distances were binned in 10-kb widths, and the median of the LD values in each bin (for bins with ≥10 values) was plotted against the middle distance of the bin. ‘SNP (permutated)’ indicates that the SNP genotype data were permuted across individuals, and the error bars indicate the standard deviation. ‘SNP (adjusted)’ indicates that the minor allele frequencies of one half of the SNP pairs in SNP–SNP LD were adjusted to those of CNVs. The larger and relatively larger frequencies indicate ≥10% and 1–10% frequencies of the deletion/duplication alleles, respectively.
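The binning step in this caption (10-kb bins, medians plotted at the bin midpoint, bins with fewer than 10 values dropped) might look like the sketch below. The input format, a list of `(distance_bp, ld_value)` pairs, and the function name are assumptions.

```python
from collections import defaultdict
from statistics import median


def binned_median_ld(pairs, bin_width=10_000, min_count=10):
    """Map each bin midpoint (bp) to the median LD of that bin,
    keeping only bins with at least min_count values."""
    bins = defaultdict(list)
    for distance, ld in pairs:
        bins[distance // bin_width].append(ld)
    return {
        (index + 0.5) * bin_width: median(values)
        for index, values in sorted(bins.items())
        if len(values) >= min_count
    }


# Hypothetical data: ten pairs in the first bin, three in the second.
pairs = [(500 * i, i / 10) for i in range(10)]
pairs += [(12_000, 0.2), (15_000, 0.3), (19_000, 0.4)]
curve = binned_median_ld(pairs)
```

Only the first bin survives the `min_count` filter here, so `curve` has a single point at the 5-kb midpoint.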
Figure 5. Number of CNVs tagged by SNPs. Number of tagged CNVs versus the cutoff association values (R² and conditional probability). We searched for tag SNPs up to 200 kb from the boundaries of each CNV region. ‘c.p.’ indicates conditional probability. (A) For bi-allelic CNVs. (B) For two-way tri-allelic CNVs.
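Counting tagged CNVs against a range of cutoffs, as plotted in this figure, can be sketched as follows. The code assumes that association values (for example R²) between each CNV and nearby SNPs have already been computed; it only applies the 200-kb window and the cutoffs. The data layout and names are hypothetical.

```python
def boundary_distance(snp_pos, start, end):
    """Distance from a SNP to the nearest CNV region boundary (0 if inside)."""
    if snp_pos < start:
        return start - snp_pos
    if snp_pos > end:
        return snp_pos - end
    return 0


def count_tagged_cnvs(cnvs, associations, cutoffs, window=200_000):
    """cnvs: {name: (start, end)}; associations: (name, snp_pos, value) tuples.
    A CNV counts as tagged at a cutoff if any SNP within the window
    reaches that association value."""
    best = {name: 0.0 for name in cnvs}
    for name, snp_pos, value in associations:
        start, end = cnvs[name]
        if boundary_distance(snp_pos, start, end) <= window:
            best[name] = max(best[name], value)
    return {c: sum(1 for v in best.values() if v >= c) for c in cutoffs}


# Hypothetical CNV regions and precomputed association values.
cnvs = {"cnv1": (1_000_000, 1_010_000), "cnv2": (5_000_000, 5_020_000)}
associations = [
    ("cnv1", 950_000, 0.9),
    ("cnv1", 700_000, 1.0),   # 300 kb away, outside the search window
    ("cnv2", 5_100_000, 0.6),
]
tagged = count_tagged_cnvs(cnvs, associations, cutoffs=[0.5, 0.8, 1.0])
```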