Seon-Hee Yim, Tae-Min Kim, Hae-Jin Hu, Ji-Hong Kim, Bong-Jo Kim, Jong-Young Lee, Bok-Ghee Han, Seung-Hun Shin, Seung-Hyun Jung, Yeun-Jun Chung.
Abstract
The recent discovery of copy number variation (CNV) in normal individuals has widened our understanding of genomic variation. However, most of the reported CNVs have been identified in Caucasians, and these findings may not be directly applicable to people of other ethnicities. To profile CNVs in an East Asian population, we screened 3578 healthy, unrelated Korean individuals using the Affymetrix Genome-Wide Human SNP Array 5.0. We identified 144,207 CNVs using a pooled data set of 100 randomly chosen Korean females as a reference. The average number of CNVs per genome was 40.3, which is higher than the numbers previously reported using lower-resolution platforms. The median size of CNVs was 18.9 kb (range 0.2-5406 kb). Copy number losses were 4.7 times more frequent than copy number gains. CNV regions (CNVRs) were defined by merging overlapping CNVs identified in two or more samples. In total, 4003 CNVRs were defined, encompassing 241.9 Mb and accounting for approximately 8% of the human genome. A total of 2077 CNVRs (51.9%) were potentially novel. Known CNVRs were larger and more frequent than novel CNVRs. Sixteen percent of the CNVRs were observed in ≥1% of study subjects, and 24% overlapped with OMIM genes. A total of 476 (11.9%) CNVRs were associated with segmental duplications. The CNVs/CNVRs identified in this study will be valuable resources for studying human genome diversity and its association with disease.
Year: 2009 PMID: 20026555 PMCID: PMC2830825 DOI: 10.1093/hmg/ddp564
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
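The abstract defines CNVRs as merged overlapping CNVs identified in two or more samples. The following is a minimal sketch of one way such a merge could work, not the authors' pipeline: the `CNV` record, the function names, the toy calls, and the "G" code for gain-only regions are all hypothetical (the source only defines "L" for loss-only and "C" for complex regions), and the overlap rule (any shared base pair) is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class CNV(NamedTuple):
    sample: str  # hypothetical sample identifier
    chrom: str
    start: int   # bp, inclusive
    end: int     # bp, inclusive
    state: str   # "gain" or "loss"


def _summarise(chrom, cluster, end):
    """Collapse one cluster of overlapping CNV calls into a region record."""
    states = {c.state for c in cluster}
    if states == {"loss"}:
        status = "L"   # losses only
    elif states == {"gain"}:
        status = "G"   # gains only (code assumed for this sketch)
    else:
        status = "C"   # complex: gains and losses at the same locus
    return {
        "chrom": chrom,
        "start": cluster[0].start,  # calls are sorted, so this is the minimum
        "end": end,
        "n_samples": len({c.sample for c in cluster}),
        "status": status,
    }


def merge_cnvs_into_cnvrs(cnvs, min_samples=2):
    """Merge overlapping CNVs per chromosome; keep regions seen in >= min_samples samples."""
    by_chrom = defaultdict(list)
    for cnv in cnvs:
        by_chrom[cnv.chrom].append(cnv)

    regions = []
    for chrom, calls in by_chrom.items():
        calls.sort(key=lambda c: (c.start, c.end))
        cluster, cluster_end = [calls[0]], calls[0].end
        for cnv in calls[1:]:
            if cnv.start <= cluster_end:  # overlaps the running region
                cluster.append(cnv)
                cluster_end = max(cluster_end, cnv.end)
            else:
                regions.append(_summarise(chrom, cluster, cluster_end))
                cluster, cluster_end = [cnv], cnv.end
        regions.append(_summarise(chrom, cluster, cluster_end))

    return [r for r in regions if r["n_samples"] >= min_samples]


# Toy calls, invented for illustration only.
calls = [
    CNV("s1", "1", 100, 200, "loss"),
    CNV("s2", "1", 150, 300, "loss"),
    CNV("s3", "1", 1000, 1100, "gain"),  # seen in one sample only, so dropped
    CNV("s1", "4", 500, 900, "gain"),
    CNV("s4", "4", 800, 1200, "loss"),
]
regions = merge_cnvs_into_cnvrs(calls)
```

Sorting by start coordinate lets a single pass build each region, and counting distinct samples (rather than calls) keeps one individual with fragmented calls from qualifying a region on their own.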
General characteristics of CNVs and CNVRs in this study
| | CNV | CNVR |
|---|---|---|
| Total count | 144 207 | 4003 |
| CN-gain count | 25 347 | 112 |
| CN-loss count | 118 860 | 3553 |
| Complex count | — | 338 |
| Average number per genome | 40.3 | 37.5 |
| Median size (range) (kb) | 18.9 (0.2–5406) | 30.3 (0.4–5521) |
| Median size of CN-gains (range) (kb) | 13.4 (0.2–2263) | 17.7 (1.2–345) |
| Median size of CN-losses (range) (kb) | 20.0 (0.2–5406) | 29.9 (0.4–5521) |
| Genome coverage | — | 241.9 Mb (∼8%) |
CN-gain/loss, copy number gain/loss CNVs.
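The summary figures quoted in the abstract can be re-derived from the counts in this table. The short check below is a sketch of that arithmetic; all counts are copied from the source, while the genome size of about 3,100 Mb is an assumption introduced here to reproduce the "approximately 8%" coverage figure.

```python
# Counts copied from the abstract and the table above.
n_subjects = 3578
cnv_total, cnv_gain, cnv_loss = 144_207, 25_347, 118_860
cnvr_total, cnvr_gain, cnvr_loss, cnvr_complex = 4003, 112, 3553, 338
cnvr_coverage_mb = 241.9
novel_cnvrs, segdup_cnvrs = 2077, 476

# Assumption (not stated in the source): human genome size of ~3,100 Mb.
GENOME_SIZE_MB = 3100.0

# Gain, loss and complex counts should add up to the reported totals.
assert cnv_gain + cnv_loss == cnv_total
assert cnvr_gain + cnvr_loss + cnvr_complex == cnvr_total

cnvs_per_genome = cnv_total / n_subjects                  # reported as 40.3
loss_to_gain = cnv_loss / cnv_gain                        # reported as 4.7
coverage_pct = 100 * cnvr_coverage_mb / GENOME_SIZE_MB    # reported as ~8%
novel_pct = 100 * novel_cnvrs / cnvr_total                # reported as 51.9%
segdup_pct = 100 * segdup_cnvrs / cnvr_total              # reported as 11.9%

print(f"CNVs per genome: {cnvs_per_genome:.1f}")
print(f"Loss/gain ratio: {loss_to_gain:.1f}")
print(f"Genome coverage: {coverage_pct:.1f}%")
print(f"Novel CNVRs: {novel_pct:.1f}%, segmental duplications: {segdup_pct:.1f}%")
```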
Figure 1. Size distribution of CNVs (A) and CNVRs (B) from this study with corresponding CNV/CNVRs from the DGV (August 2009 version). X-axis, the sizes in kilobases. Y-axis, the proportions of CNV/CNVRs within each size bin.
Figure 2. Examples of signal intensity ratio plots of CNVRs. (A) A CNVR on 2p12, gain only. (B) A CNVR on 10q21.1, loss only. (C) A CNVR on 4p11, gain/loss complex. X-axis, genomic coordinates (Mb). Y-axis, signal intensity ratios (test/reference) in log2 scale.
Figure 3. Degree of match between CNVRs from this study and DGV CNVRs with respect to the allele frequency. X-axis, the degree of match (%). Y-axis, the allele frequency of CNVRs.
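The Y-axis of Figure 2 is a log2 test/reference signal ratio. As a rough guide to reading such plots, the sketch below computes the idealized ratio for a given copy number against a two-copy reference. The function name is hypothetical, the two-copy reference is an assumption that holds for autosomes, and observed array ratios may deviate from these ideal values.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) signal ratio for a given copy number."""
    if test_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive for a finite log2 ratio")
    return math.log2(test_copies / reference_copies)


# One-copy loss, normal two-copy state, and one-copy gain.
for copies in (1, 2, 3):
    print(copies, round(expected_log2_ratio(copies), 3))
```

Under these assumptions a single-copy loss sits at -1 and a single-copy gain near 0.585, so losses are shifted further from zero than gains.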
Potential Korean-specific CNVRs with an allele frequency ≥5%
| Chr | Cytoband | Start (bp) | End (bp) | Size (kb) | Frequency (%) | Status | Genes |
|---|---|---|---|---|---|---|---|
| 1 | 1q23.3 | 162009464 | 162082267 | 72.804 | 6.74 | L | — |
| 1 | 1q31.2 | 189272340 | 189325007 | 52.668 | 5.76 | L | — |
| 2 | 2q22.1 | 140652118 | 140854714 | 202.597 | 6.99 | L | LRP1B |
| 3 | 3p24.2 | 24363248 | 24448859 | 85.612 | 5.09 | L | THRB |
| 3 | 3p13 | 72398646 | 72444156 | 45.511 | 5.28 | L | — |
| 4 | 4q13.1 | 61078573 | 61119445 | 40.873 | 6.99 | C | — |
| 5 | 5q11.2 | 50838014 | 50898889 | 60.876 | 6.09 | L | — |
| 5 | 5q14.3 | 89233437 | 89297026 | 63.590 | 5.48 | L | — |
| 7 | 7p21.3 | 11648289 | 11768235 | 119.947 | 9.03 | L | THSD7A |
| 7 | 7q21.3 | 92910385 | 93006702 | 96.318 | 5.25 | L | CALCR, MIR653, MIR489 |
| 8 | 8q21.12 | 78570181 | 78643662 | 73.482 | 6.65 | L | — |
| 11 | 11p15.1 | 21345470 | 21395190 | 49.721 | 5.25 | L | NELL1 |
| 12 | 12q21.1 | 70034230 | 70095777 | 61.548 | 7.04 | L | — |
| 13 | 13q21.1 | 55173808 | 55234854 | 61.047 | 7.57 | L | — |
| 17 | 17q21.33 | 47334364 | 47468516 | 134.153 | 5.81 | L | CA10 |
L, CNVRs containing CN-losses only; C, complex CNVRs containing both gains and losses in the same loci.
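The Size (kb) column in the table above appears to follow from the Start and End columns when both coordinates are treated as inclusive. That convention is inferred here from the numbers rather than stated in the source. A small sketch checking it on three rows copied from the table:

```python
import math

# (cytoband, start bp, end bp, reported size in kb), copied from the table above.
rows = [
    ("1q23.3", 162009464, 162082267, 72.804),
    ("2q22.1", 140652118, 140854714, 202.597),
    ("17q21.33", 47334364, 47468516, 134.153),
]


def size_kb(start: int, end: int) -> float:
    """Region length in kb, assuming both start and end are inclusive."""
    return (end - start + 1) / 1000


for cytoband, start, end, reported in rows:
    computed = size_kb(start, end)
    status = "matches" if math.isclose(computed, reported, abs_tol=1e-9) else "differs"
    print(f"{cytoband}: computed {computed:.3f} kb, reported {reported:.3f} kb ({status})")
```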
Figure 4. Association of retrotransposons with CNVRs. (A) The mean regional fractions of the three retrotransposons around CNVRs by distance from the CNVRs. X-axis, the separating distance from the CNVRs. Y-axis, the fractions (%) of the retroelements. 95% confidence intervals are shown with error bars. (B) Divergence rates of the three retrotransposons around the CNVRs. X-axis, the separating distance from the CNVRs. Y-axis, the mean milliDivergence of the retroelements. A milliDivergence of 250 corresponds to a divergence rate of 25.0%. The orientation of repeat sequences was determined with respect to the corresponding CNVRs; concordant elements are those whose tails are located closer to the CNVRs; divergent elements are those with the opposite orientation. MilliDivergence was calculated following the description on the RepeatMasker website (http://www.repeatmasker.org/).
Functional enrichment analysis of genes associated with CNVRs
| | Annotated functions | Genes | Gene number | Significance (P-value) |
|---|---|---|---|---|
| Known CNVR | Pentose and glucuronate interconversions | 14 | 25 | 9.20E−13 |
| | Starch and sucrose metabolism | 23 | 83 | 4.72E−12 |
| | Metabolism of xenobiotics by cytochrome P450 | 21 | 70 | 7.42E−12 |
| | Porphyrin and chlorophyll metabolism | 15 | 41 | 3.14E−10 |
| | Antigen processing and presentation | 18 | 75 | 1.22E−08 |
| | Androgen and estrogen metabolism | 14 | 54 | 1.80E−07 |
| | Nervous system development | 37 | 382 | 4.73E−05 |
| | Cell adhesion molecules | 18 | 132 | 6.54E−05 |
| | Xenobiotic metabolic process | 5 | 11 | 9.38E−05 |
| | Response to xenobiotic stimulus | 5 | 12 | 0.000154 |
| | Heparan sulfate biosynthesis | 6 | 19 | 0.000196 |
| | Glutathione transferase activity | 5 | 15 | 0.000519 |
| | Starch and sucrose metabolism | 7 | 31 | 0.000567 |
| | Extracellular structure organization and biogenesis | 7 | 32 | 0.000695 |
| | Type I diabetes mellitus | 8 | 43 | 0.00092 |
| | Synaptogenesis | 5 | 18 | 0.001311 |
| | | 4 | 12 | 0.001955 |
| | Classic pathway | 4 | 13 | 0.002716 |
| Novel CNVR | Glutamate signaling pathway | 6 | 17 | 8.25E−06 |
| | Neuroactive ligand receptor interaction | 21 | 239 | 2.06E−05 |
| | 3,5-Cyclic nucleotide phosphodiesterase activity | 5 | 13 | 3.02E−05 |
| | Cyclic nucleotide phosphodiesterase activity | 5 | 14 | 4.58E−05 |
| | Receptor activity | 36 | 571 | 5.45E−05 |
| | Ionotropic glutamate receptor activity | 4 | 10 | 0.000169 |
| | Axon guidance | 13 | 127 | 0.00017 |
| | Cell adhesion molecules | 13 | 132 | 0.00025 |
| | Glutamate receptor activity | 5 | 20 | 0.000304 |
| | Long-term depression | 9 | 74 | 0.000476 |
| | Neurotransmitter secretion | 4 | 13 | 0.000534 |
| | Regulated secretory pathway | 4 | 15 | 0.000969 |
| | Transmembrane receptor activity | 25 | 410 | 0.001168 |