Ruth B McCole, Chamith Y Fonseka, Amnon Koren, C-Ting Wu.
Abstract
Ultraconserved elements (UCEs) are strongly depleted from segmental duplications and copy number variations (CNVs) in the human genome, suggesting that deletion or duplication of a UCE can be deleterious to the mammalian cell. Here we address the process by which CNVs become depleted of UCEs. We begin by showing that depletion for UCEs characterizes the most recent large-scale human CNV datasets and then find that even newly formed de novo CNVs, which have passed through meiosis at most once, are significantly depleted for UCEs. In striking contrast, CNVs arising specifically in cancer cells are, as a rule, not depleted for UCEs and can even become significantly enriched. This observation raises the possibility that CNVs that arise somatically and are relatively newly formed are less likely to have established a CNV profile that is depleted for UCEs. Alternatively, lack of depletion for UCEs from cancer CNVs may reflect the diseased state. In support of this latter explanation, somatic CNVs that are not associated with disease are depleted for UCEs. Finally, we show that it is possible to observe the CNVs of induced pluripotent stem (iPS) cells become depleted of UCEs over time, suggesting that depletion may be established through selection against UCE-disrupting CNVs without the requirement for meiotic divisions.
Year: 2014 PMID: 25340765 PMCID: PMC4207606 DOI: 10.1371/journal.pgen.1004646
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Five types of CNVs.
(A) classicalCNVs are identified solely by variation among individuals in the copy number of genomic regions. (B) de novo CNVs are present in an individual but not in the soma of either of the parents. (C) cancerCNAs are copy number alterations that occur specifically in the cancer cells (orange) of an individual and, therefore, are absent from the healthy cells of the same individual (black). In this study we required cancerCNAs to be recurrent between individuals. (D) somaticCNVs are defined by regions that vary in copy number among the healthy somatic cells of an individual. (E) iPSCNVs are defined by regions that vary in copy number within a population of iPS cells and which are not detectable in the fibroblast cells from which the iPS cells were derived.
Table 1. Depletion of UCEs from pooled classicalCNVs is robust to the species used to define UCEs.
| UCE type | Observed overlap: number of UCEs | Observed overlap (bp) | Expected overlap mean (bp) | Expected overlap standard deviation (bp) | Proportion | P-value | obs/exp | Outcome |
| HMR-HDM-HC | 368 | 97394 | 126264 | 4149 | 0.000 | 1.7×10⁻¹² | 0.771 | Depleted |
| DMR | 239 | 61992 | 71462 | 3045 | 0.000 | 0.001 | 0.867 | Depleted |
| CoDHo | 713 | 196689 | 245830 | 5773 | 0.000 | <1.0×10⁻¹⁷ | 0.800 | Depleted |
| HMR | 211 | 55832 | 66356 | 2966 | 0.000 | 1.9×10⁻⁴ | 0.841 | Depleted |
HMR-HDM-HC: 896 UCEs representing the union of Human-Mouse-Rat, Human-Dog-Mouse, and Human-Chicken UCEs [2]. DMR: 527 Dog-Mouse-Rat UCEs. CoDHo: 1,696 Cow-Dog-Horse UCEs. HMR: 481 Human-Mouse-Rat UCEs. Proportion: of 1,000 expected overlap iterations, the number of times the expected overlap generated was equal to, or more extreme than, the observed UCE overlap (bp), divided by the total number of iterations, which was always 1,000. P-value: significance of whether the observed overlap (bp) differs from the expected overlaps, as determined by a Z-test. obs/exp: observed overlap (bp) divided by mean of expected overlaps (bp). Outcome: determined with a one-tailed test (α = 0.05).
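The quantities defined in this note can be recomputed from the summary columns of the table. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the Z-test is assumed to use the lower tail of a standard normal distribution, and "equal to, or more extreme than" is assumed to mean "less than or equal to" when testing for depletion.

```python
import math


def z_test_depletion(observed_bp, expected_mean_bp, expected_sd_bp):
    """Return (obs/exp ratio, z-score, one-tailed p-value for depletion)."""
    ratio = observed_bp / expected_mean_bp
    z = (observed_bp - expected_mean_bp) / expected_sd_bp
    # Lower-tail probability of a standard normal, P(Z <= z).
    # Assumption: the source only says "Z-test"; the tail is our choice.
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2))
    return ratio, z, p_lower


def empirical_proportion(observed_bp, expected_overlaps_bp):
    """Fraction of expected overlaps that are <= the observed overlap.

    Assumption: for depletion, "as or more extreme" means "as low or lower".
    """
    hits = sum(1 for e in expected_overlaps_bp if e <= observed_bp)
    return hits / len(expected_overlaps_bp)


# HMR-HDM-HC row: observed 97394 bp, expected mean 126264 bp, SD 4149 bp.
ratio, z, p = z_test_depletion(97394, 126264, 4149)
print(round(ratio, 3))  # obs/exp, 0.771 as in the table
print(z, p)
```

Under these assumptions the HMR-HDM-HC row yields a p-value on the order of 10⁻¹², in line with the tabulated value. `empirical_proportion` needs the individual expected overlaps from the 1,000 iterations, which are not given here.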
Table 2. Depletion of UCEs is observed in all classicalCNV datasets examined.
| Dataset | Observed overlap: number of UCEs | Observed overlap (bp) | Expected overlap mean (bp) | Expected overlap standard deviation (bp) | Proportion | P-value | obs/exp | Outcome |
| Pooled classicalCNVs | 368 | 97394 | 126264 | 4149 | 0.000 | 1.7×10⁻¹² | 0.771 | Depleted |
| Jakobsson 2008 | 17 | 3922 | 12625 | 1795 | 0.000 | 6.2×10⁻⁷ | 0.311 | Depleted |
| McCarroll 2008 | 0 | 0 | 2944 | 921 | 0.000 | 0.001 | 0.000 | Depleted |
| Matsuzaki 2009 | 17 | 4078 | 9440 | 1668 | 0.000 | 0.001 | 0.432 | Depleted |
| Shaikh 2009 | 7 | 2187 | 12981 | 1950 | 0.000 | 1.5×10⁻⁸ | 0.168 | Depleted |
| Conrad 2010 | 1 | 202 | 9453 | 1642 | 0.000 | 8.3×10⁻⁹ | 0.021 | Depleted |
| Drmanac 2010 | 0 | 0 | 4346 | 1142 | 0.000 | 7.1×10⁻⁵ | 0.000 | Depleted |
| Durbin 2010 | 343 | 90605 | 110499 | 4258 | 0.000 | 1.5×10⁻⁶ | 0.820 | Depleted |
| Campbell 2011 | 0 | 0 | 2160 | 790 | 0.000 | 0.003 | 0.000 | Depleted |
The 896 HMR-HDM-HC UCEs are depleted from all classicalCNV datasets. Proportion, P-value, obs/exp, and Outcome, as described for Table 1.
Table 3. UCEs are depleted from pooled de novo CNVs, enriched in pooled cancerCNAs, and depleted from pooled somaticCNVs and high passage iPSCNVs.
| Dataset | Observed overlap: number of UCEs | Observed overlap (bp) | Expected overlap mean (bp) | Expected overlap standard deviation (bp) | Proportion | P-value | obs/exp | Outcome |
| cancerCNA datasets | | | | | | | | |
| TCGARN 2008 | 60 | 15670 | 16549 | 2091 | 0.345 | 0.337 | 0.947 | Neither |
| Walter 2009 | 121 | 32312 | 29544 | 2652 | 0.143 | 0.148 | 1.094 | Neither |
| Beroukhim 2010 | 172 | 46060 | 47042 | 3270 | 0.383 | 0.382 | 0.979 | Neither |
| Taylor 2010 | 84 | 23669 | 19685 | 2384 | 0.058 | 0.047 | 1.202 | Neither |
| TCGARN 2011 | 259 | 67447 | 57988 | 3689 | 0.005 | 0.005 | 1.163 | Enriched |
| Curtis 2012 | 51 | 13893 | 11872 | 1869 | 0.141 | 0.140 | 1.170 | Neither |
| TCGARN 2012 breast | 156 | 42421 | 26852 | 2677 | 0.000 | 3.0×10⁻⁹ | 1.580 | Enriched |
| TCGARN 2012 colon | 26 | 6813 | 10016 | 1672 | 0.021 | 0.028 | 0.680 | Neither |
| TCGARN 2012 squamous | 218 | 58477 | 51127 | 3424 | 0.020 | 0.016 | 1.144 | Enriched |
| Robinson 2012 | 60 | 16125 | 11569 | 1844 | 0.010 | 0.007 | 1.394 | Enriched |
| Walker 2012 | 893 | 240548 | 233338 | 1433 | 0.000 | 2.4×10⁻⁷ | 1.031 | Enriched |
| Zhang 2012 | 37 | 10028 | 12325 | 1934 | 0.120 | 0.118 | 0.814 | Neither |
| TCGARN 2013 | 83 | 21061 | 28531 | 2709 | 0.001 | 0.003 | 0.738 | Depleted |
| somaticCNV datasets | | | | | | | | |
| Forsberg 2012 | 23 | 6576 | 4195 | 1061 | 0.018 | 0.012 | 1.568 | Enriched |
| Jacobs 2012 | 265 | 69836 | 70178 | 3831 | 0.459 | 0.464 | 0.995 | Neither |
| Laurie 2012 | 264 | 69935 | 67605 | 3559 | 0.259 | 0.256 | 1.034 | Neither |
| McConnell 2013 | 221 | 60353 | 70021 | 3747 | 0.005 | 0.005 | 0.862 | Depleted |
| iPSCNV datasets | | | | | | | | |
| Hussein 2011 low passage | 13 | 2926 | 3086 | 943 | 0.466 | 0.433 | 0.948 | Neither |
| Hussein 2011 medium passage | 12 | 3428 | 5196 | 1240 | 0.069 | 0.077 | 0.660 | Neither |
| Hussein 2011 high passage | 1 | 211 | 1980 | 761 | 0.001 | 0.010 | 0.107 | Depleted |
| Laurent 2011 high passage | 5 | 1068 | 1962 | 768 | 0.108 | NA | 0.544 | NA |
Here, we show the relationship between 896 HMR-HDM-HC UCEs and de novo CNVs, cancerCNAs, somaticCNVs, and iPSCNVs, reporting the results for pooled datasets as well as all individual datasets that met our requirement for 20 Mb of coverage (Table S3). Individual CNV and CNA datasets are named according to the first author and the year of the study.
The pooled de novo CNV dataset included datasets from Xu 2008 [69], Itsara 2010 [70], Malhotra 2011 [71], and Sanders 2011 [72], which were too small to be considered on their own.
The pooled cancerCNA dataset included all the cancerCNA datasets listed in this table, except for Walker 2012 [87], which was excluded to avoid bias from its extensive coverage of the genome, and also included the datasets Bullinger 2010 [78], Nik-Zainal 2012 [85], Holmfeldt 2013 [89], and Weischenfeldt 2013 [91], which were too small to be considered on their own.
The pooled somaticCNV dataset included the four somaticCNV datasets listed in this table as well as Piotrowski 2008 [63] and O'Huallachain 2012 [67], which were too small to be considered on their own.
The pooled iPSCNV datasets comprised CNVs from low, medium, and high passage iPS cells from the two datasets Hussein 2011 [100] and Laurent 2011 [98]. Proportion, P-value, and obs/exp, as described for Table 1. Outcome: determined with a one-tailed test (α = 0.05) for the pooled de novo CNV dataset because this dataset was analyzed prior to our discovery that CNVs can be enriched for UCEs; all other assessments of depletion or enrichment were carried out with a two-tailed test (P ≤ 0.025 in each tail for an overall α of 0.05). NA (not applicable): expected overlaps not normally distributed, precluding a Z-test.
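The two-tailed Outcome rule described above can be illustrated with a short sketch. This is an assumption-laden illustration rather than the authors' implementation: `classify_overlap` is a hypothetical name, and the tabulated P-value is assumed to be the normal tail probability on the side of the observed deviation, compared against 0.025.

```python
import math


def classify_overlap(observed_bp, expected_mean_bp, expected_sd_bp,
                     tail_alpha=0.025):
    """Label an overlap as Enriched, Depleted, or Neither.

    Assumption: the P-value is the standard normal tail probability on the
    side of the observed deviation, tested at tail_alpha in each tail.
    """
    z = (observed_bp - expected_mean_bp) / expected_sd_bp
    p_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    if p_tail > tail_alpha:
        return "Neither", p_tail
    return ("Enriched" if z > 0 else "Depleted"), p_tail


# Rows taken from the table above (observed bp, expected mean, expected SD).
print(classify_overlap(23669, 19685, 2384))  # Taylor 2010
print(classify_overlap(58477, 51127, 3424))  # TCGARN 2012 squamous
print(classify_overlap(21061, 28531, 2709))  # TCGARN 2013
```

Under this reading, a row such as Taylor 2010 (P = 0.047) is labeled Neither even though its P-value is below 0.05, because each tail is tested at 0.025.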
Figure 2. Partial correlation analyses.
The positive correlation between the positions of UCEs and cancerCNAs (first row) and the negative correlation between the positions of UCEs and classicalCNVs (second row) remain even after accounting for the correlation between the positions of UCEs and the genomic features listed across the top. P-values correspond to analyses in which the genome was divided into 50 kb windows and then assessed for the number of base pairs encompassed by the various genetic features within each window. Analyses using 10 kb and 100 kb bins also produced significant values across the board.
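A windowed partial correlation of this kind can be sketched as follows. The caption does not state which correlation coefficient was used, so the first-order Pearson partial correlation below is an assumption, as are the function names and the requirement that intervals be non-overlapping, half-open [start, end) pairs on a single chromosome.

```python
import math


def bp_per_window(intervals, chrom_length, window=50_000):
    """Base pairs covered by non-overlapping [start, end) intervals per window."""
    n = math.ceil(chrom_length / window)
    counts = [0] * n
    for start, end in intervals:
        first = start // window
        last = min(n - 1, (end - 1) // window)
        for w in range(first, last + 1):
            lo = max(start, w * window)
            hi = min(end, (w + 1) * window)
            counts[w] += max(0, hi - lo)
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y after accounting for a third variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# A 20 kb interval straddling the boundary between two 50 kb windows.
print(bp_per_window([(40_000, 60_000)], 100_000))  # [10000, 10000]
```

In this sketch, `x` would hold per-window UCE base pairs, `y` per-window CNV or CNA base pairs, and `z` the per-window base pairs of one genomic feature; changing `window` to 10,000 or 100,000 corresponds to the other bin sizes mentioned.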
Figure 3. Timescales through which different types of genomic variation have been present and their relationships to UCEs.