Pierre-Étienne Jacques, Justin Jeyakani, Guillaume Bourque.
Abstract
Although emerging evidence suggests that transposable elements (TEs) have contributed novel regulatory elements to the human genome, their global impact on transcriptional networks remains largely uncharacterized. Here we show that TEs have contributed nearly half of the active elements of the human genome. Using DNase I hypersensitivity data sets from ENCODE in normal, embryonic, and cancer cells, we found that 44% of open chromatin regions were in TEs and that this proportion reached 63% for primate-specific regions. We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily-transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. We also showed that thousands of ERV-derived sequences were activated in a cell type-specific manner, especially in embryonic and cancer cells, and we demonstrated that this activity was associated with cell type-specific expression of neighboring genes. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.
Year: 2013 PMID: 23675311 PMCID: PMC3649963 DOI: 10.1371/journal.pgen.1003504
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
The 75 DNase I data sets used in this study were grouped in 8 tissues.
| Tissue | Cell lines |
| --- | --- |
| Fibroblast | Normal, Normal_Park., ProgFib, Neonatal, Fetal_lung (AG04450)†, Toe (AG09309)†, Gum (AG09319)†, Gingival (HGF)†, Abdominal (AG10803)†, Lung (NHLF)†, Skin (BJ-T)†, Cardiac (HCF)† |
| Muscle | Myoblast (HSMM), Myotube (HSMMtube), Myocytes (HCM)†, Skeletal (SKMC)†, Aortic_smooth (AoSMC) |
| Epithelial | Small_air (SAEC), Esophageal (HEE)†, Choroid_plex (HCPE)†, Retinal (HRPE)†, Ciliary (HNPCE)†, Renal_cortical (HRCE)†, Renal (HRE)†, Prostate (LHSR), Amniotic (HAE)† |
| Lymphoblastoid | GM12891, GM19238, GM19239, GM19240, GM12865†, GM18507, GM12878 |
| Others | Myometrial, PanIslets, Melanocytes, Epidermal (NHEK), Endothelial (HUVEC) |
| hESC | H1esc, H7esc, H9esc |
| Solid_tumor | HepG2, HeLa-S3†, PANC-1, MCF-7††, Medulloblastoma, Neuroblastoma |
| Leukemia | CMK, HL-60, NB4†, K562†, Jurkat |
Cell lines marked with a “†” or “††” had two or three biological replicates respectively.
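As a quick consistency check on the table, the replicate markers can be tallied in a few lines of Python. This is only a sketch: it assumes that every biological replicate counts as one DNase I data set, which the caption does not state explicitly, and the names `TISSUES`, `replicate_count` and `datasets_per_tissue` are hypothetical helpers introduced here. Under that assumption the tally comes to 75 data sets across 52 cell lines.

```python
# Cell lines per tissue group, transcribed from the table.
# A trailing "†" marks two biological replicates, "††" marks three.
TISSUES = {
    "Fibroblast": [
        "Normal", "Normal_Park.", "ProgFib", "Neonatal",
        "Fetal_lung (AG04450)†", "Toe (AG09309)†", "Gum (AG09319)†",
        "Gingival (HGF)†", "Abdominal (AG10803)†", "Lung (NHLF)†",
        "Skin (BJ-T)†", "Cardiac (HCF)†",
    ],
    "Muscle": [
        "Myoblast (HSMM)", "Myotube (HSMMtube)", "Myocytes (HCM)†",
        "Skeletal (SKMC)†", "Aortic_smooth (AoSMC)",
    ],
    "Epithelial": [
        "Small_air (SAEC)", "Esophageal (HEE)†", "Choroid_plex (HCPE)†",
        "Retinal (HRPE)†", "Ciliary (HNPCE)†", "Renal_cortical (HRCE)†",
        "Renal (HRE)†", "Prostate (LHSR)", "Amniotic (HAE)†",
    ],
    "Lymphoblastoid": [
        "GM12891", "GM19238", "GM19239", "GM19240",
        "GM12865†", "GM18507", "GM12878",
    ],
    "Others": [
        "Myometrial", "PanIslets", "Melanocytes",
        "Epidermal (NHEK)", "Endothelial (HUVEC)",
    ],
    "hESC": ["H1esc", "H7esc", "H9esc"],
    "Solid_tumor": [
        "HepG2", "HeLa-S3†", "PANC-1", "MCF-7††",
        "Medulloblastoma", "Neuroblastoma",
    ],
    "Leukemia": ["CMK", "HL-60", "NB4†", "K562†", "Jurkat"],
}


def replicate_count(label: str) -> int:
    """Return 3 for a '††' suffix, 2 for a '†' suffix, otherwise 1."""
    if label.endswith("††"):
        return 3
    if label.endswith("†"):
        return 2
    return 1


def datasets_per_tissue(tissues: dict) -> dict:
    """Sum replicate counts over the cell lines of each tissue group."""
    return {
        tissue: sum(replicate_count(cell) for cell in cells)
        for tissue, cells in tissues.items()
    }


per_tissue = datasets_per_tissue(TISSUES)
total_datasets = sum(per_tissue.values())
total_cell_lines = sum(len(cells) for cells in TISSUES.values())
print(per_tissue)
print(total_datasets, total_cell_lines)
```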
Figure 1. TEs have contributed a large fraction of accessible regions in human cells.
(A) Proportion of human DHS regions overlapping different classes of repeats based on the age of the sequence in which they are embedded. (B) Specific repeat subfamilies, called DHS-associated repeats (DARs), are over-represented and their cumulative relative contribution (Observed-Expected) is shown as a percentage of all DHS data. (C–D) Proportion of all repeat instances in the genome (All repeats) and of DAR instances in three classes of cells (Normal, ESC and Cancer). (E–F) Fraction of repeat subfamily instances that contributes to open chromatin in at least one data set. The estimated age is in millions of years (Myrs).
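The two quantities in panels A and B can be illustrated with a small Python sketch. It is not the authors' pipeline: it assumes half-open `(start, end)` intervals on a single chromosome, uses a brute-force overlap test that a real analysis would replace with a dedicated interval tool, and takes the expected count as a given input because the caption does not say how it was derived. The function names and toy coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_in_repeats(dhs_regions, repeat_regions):
    """Fraction of DHS regions that overlap at least one repeat interval."""
    if not dhs_regions:
        return 0.0
    hits = sum(
        1 for d in dhs_regions
        if any(overlaps(d, r) for r in repeat_regions)
    )
    return hits / len(dhs_regions)


def relative_contribution(observed, expected, total_dhs):
    """(Observed - Expected) expressed as a percentage of all DHS regions."""
    return 100.0 * (observed - expected) / total_dhs


# Toy coordinates, invented for illustration only.
dhs = [(100, 250), (400, 550), (900, 1000), (1500, 1650)]
repeats = [(200, 300), (950, 1200)]

frac = fraction_in_repeats(dhs, repeats)
contribution = relative_contribution(observed=30, expected=10, total_dhs=400)
print(frac, contribution)
```

In this toy input two of the four DHS regions touch a repeat, so the overlap fraction is 0.5, and a subfamily with 30 observed versus 10 expected overlaps out of 400 DHS regions has a relative contribution of 5%.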
Figure 2. DARs are enriched for binding motifs, occupied by TFs, and show sequence conservation.
Aggregate profiles of DNase I tags (green) over the instances of different DARs: (A) LTR9B in ESC, (B) LTR13 in K562 and (C) LTR2B in GM12878. The profiles over another cell type (NHLF) are shown as a control (dashed brown lines). The plots underneath the profiles represent the localization of regulatory motifs or ChIP-Seq peaks in the same cell lines (yellow, blue, red points). The Venn diagrams represent the proportion of repeat instances (grey) containing DHS and regulatory motifs or ChIP-Seq peaks using the same color code. (D) Proportion of DARs with at least one enriched TF (blue bars) and the total number of binding sites reported (black line) for each cell line. (E) Diagram showing the number of DARs supported by at least one TF based on ChIP-Seq or motif enrichment. (F) Scatterplot showing, for each repeat subfamily, the fraction of conserved repeat instances among the open instances. The black line represents a polynomial trend line of order 2. The red dashed line is the expected distribution.
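The order-2 polynomial trend line in panel F is an ordinary least-squares fit of a quadratic. The Python sketch below shows one way to compute such a fit with the standard library only, by solving the 3x3 normal equations. It is an illustration under assumptions: the caption does not say which software produced the trend line, `fit_quadratic` is a hypothetical helper, and the toy points are invented rather than taken from the figure.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c).

    Requires at least three distinct x values, otherwise the normal
    equations are singular.
    """
    # Power sums sum(x**k) for k = 0..4 and moments sum(y * x**k) for k = 0..2.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]

    # Augmented matrix of the normal equations.
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]

    return tuple(m[i][3] / m[i][i] for i in range(3))


# Toy points lying exactly on y = 1 + 2x + 3x^2 (invented, not from the figure).
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)
```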
Figure 3. Thousands of LTR/ERV–derived sequences are activated in a cell type–specific manner, especially in ESCs.
(A) Proportion of DHS clusters that overlap repeats based on the number of distinct cell types in which they are observed. (B) Proportion of cell type-specific DARs and of all repeat subfamilies (Expected) by repeat class. (C) Average number of cell type-specific DARs per data set in normal, embryonic and cancer cell lines. (D) Heatmap showing the cell type-specific fold enrichment for the top 100 repeat subfamilies. (E) Number of instances contributing to open chromatin for the LTR2B, LTR7 and MER121 repeat subfamilies.
Figure 4. Cell type–specific expression of DAR–associated genes.
(A) Distribution of the expected number of up-regulated genes in proximity to the LTR2B DAR instances in GM18265. The actual number of up-regulated genes is shown using an arrowhead. (B) UCSC genome browser view of the NAPSB gene with selected RNA-Seq and DHS ENCODE tracks (y-axis maximum set to 20 and 100 respectively). The LTR2B repeat is highlighted in pink along with its cell type-specific open chromatin and expression profiles. (C) Boxplots showing the expression values across cell types for the DAR-associated genes that are up-regulated. Red lines connect the expression values observed in GM18265. (D) Cell type-specific DARs are associated with more cell type-specific expression. DARs were binned according to their cell type-specific fold enrichment, and the proportion of them with a cell type-specific expression Z-score above 3 is shown.
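Panels A and D rest on two standard calculations, sketched below in Python: an empirical null distribution for the number of up-regulated genes near a set of repeat instances, and a Z-score measuring how far expression in one cell type departs from the others. Both functions are hypothetical illustrations. The sketch assumes the null is built by drawing random gene sets of the same size and that the Z-score compares the target cell type against the mean and sample standard deviation of the remaining cell types; the captions do not specify either choice, and the toy values are invented.

```python
import random
import statistics


def specificity_z(expr_by_cell, target):
    """Z-score of expression in `target` relative to all other cell types.

    Assumption: the reference mean and sample standard deviation exclude
    the target; at least two other cell types with non-identical values
    are required.
    """
    others = [v for cell, v in expr_by_cell.items() if cell != target]
    mu = statistics.mean(others)
    sd = statistics.stdev(others)
    return (expr_by_cell[target] - mu) / sd


def empirical_p(observed, n_nearby, upregulated, all_genes, n_draws=1000, seed=0):
    """One-sided empirical p-value for seeing >= `observed` up-regulated genes.

    Each draw samples `n_nearby` genes at random and counts how many are
    in the up-regulated set, giving the expected distribution.
    """
    rng = random.Random(seed)
    up = set(upregulated)
    at_least = 0
    for _ in range(n_draws):
        sample = rng.sample(all_genes, n_nearby)
        if sum(gene in up for gene in sample) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_draws + 1)


# Toy expression values across four unnamed cell types (invented).
expr = {"cell_A": 10.0, "cell_B": 1.0, "cell_C": 2.0, "cell_D": 3.0}
z = specificity_z(expr, "cell_A")

# Toy gene universe: 100 genes, 5 of them up-regulated (invented).
genes = [f"gene{i}" for i in range(100)]
p_none = empirical_p(0, 10, genes[:5], genes)
p_all = empirical_p(5, 10, genes[:5], genes)
print(z, p_none, p_all)
```

With these toy values the target cell type sits 8 standard deviations above the others, which would pass the Z-score threshold of 3 used in panel D.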