Ahsan Huda, Pierre R Bushel.
Abstract
BACKGROUND: Transposable elements (TEs) have long been regarded as selfish or junk DNA with little or no role in the regulation or functioning of the human genome. Over the past several years, however, this view has been challenged as several studies provided anecdotal as well as global evidence for the contribution of TEs to the regulatory and coding needs of human genes. In this study, we explored the incorporation and epigenetic regulation of coding sequences donated by TEs using gene expression and other ancillary genomics data from two human hematopoietic cell lines: GM12878 (a lymphoblastoid cell line) and K562 (a chronic myelogenous leukemia cell line). In each cell line, we found several thousand instances of TEs donating coding sequences to human genes. We compared transcriptome assemblies of the RNA sequencing (RNA-Seq) reads built with and without the aid of a reference transcriptome and found that the percentage of genes that incorporate TEs in their coding sequences is significantly greater in the non-reference assembly than in the reference-guided assemblies using Refseq and Gencode gene models. We also used histone modification chromatin immunoprecipitation sequencing (ChIP-Seq) data, Cap Analysis of Gene Expression (CAGE) data and DNase I Hypersensitivity Site (DHS) data to demonstrate the epigenetic regulation of the TE-derived coding sequences. Our results suggest that TEs form a significantly higher percentage of coding sequences than is represented in gene annotation databases and that these TE-derived sequences are epigenetically regulated in accordance with their expression in the two cell types.
Keywords: Alu; DNAseI HS; Epigenetics; Exonization; Gene expression; LTR Retrotransposons; RNA-seq; Transcription; Transposable elements
Year: 2013 PMID: 24860841 PMCID: PMC4028971 DOI: 10.4172/2329-8936.1000101
Source DB: PubMed Journal: Transcr Open Access ISSN: 2329-8936
Figure 1. Comparison of Gencode-guided versus non-reference-guided assembly of transcripts and exons. The larger circles represent all transcripts and exons, whereas the smaller circles represent TE-derived transcripts and exons: (a) GM12878 exons, (b) GM12878 transcripts, (c) K562 exons, (d) K562 transcripts.
Figure 2. Comparison of Refseq-guided versus non-reference-guided assembly of transcripts and exons. The larger circles represent all transcripts and exons, whereas the smaller circles represent TE-derived transcripts and exons: (a) GM12878 exons, (b) GM12878 transcripts, (c) K562 exons, (d) K562 transcripts.
Figure 3. TE-derived (a) transcripts and (b) exons illustrated as a fraction of the total.
Percentage of TE-derived transcripts and exons as a fraction of total transcripts and exons in the GM12878 and K562 cell lines (see Figure 3).
| Assembly | Transcripts, GM12878 (%) | Transcripts, K562 (%) | Exons, GM12878 (%) | Exons, K562 (%) |
|---|---|---|---|---|
| Refseq | 19.6 | 13.4 | 4.2 | 3.7 |
| Gencode | 12.1 | 12.1 | 4.6 | 4.7 |
| Non-reference | 43.5 | 39.3 | 11.2 | 9.8 |
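To make the size of the gap between assemblies concrete, the percentages in the table above can be turned into fold differences. The following is a minimal sketch; the dictionary layout and the helper name `fold_over_reference` are illustrative and not part of the original study.

```python
# Percentages of TE-derived features, copied from the table above.
te_percent = {
    "Refseq": {
        "transcripts": {"GM12878": 19.6, "K562": 13.4},
        "exons": {"GM12878": 4.2, "K562": 3.7},
    },
    "Gencode": {
        "transcripts": {"GM12878": 12.1, "K562": 12.1},
        "exons": {"GM12878": 4.6, "K562": 4.7},
    },
    "Non-reference": {
        "transcripts": {"GM12878": 43.5, "K562": 39.3},
        "exons": {"GM12878": 11.2, "K562": 9.8},
    },
}


def fold_over_reference(feature, cell_line, reference):
    """Ratio of the non-reference percentage to a reference-guided percentage.

    Hypothetical helper for illustration only.
    """
    non_ref = te_percent["Non-reference"][feature][cell_line]
    ref = te_percent[reference][feature][cell_line]
    return non_ref / ref


if __name__ == "__main__":
    for reference in ("Refseq", "Gencode"):
        for feature in ("transcripts", "exons"):
            for cell_line in ("GM12878", "K562"):
                fold = fold_over_reference(feature, cell_line, reference)
                print(f"{reference:8s} {feature:11s} {cell_line:8s} {fold:.2f}x")
```

Under this reading, the non-reference assembly reports roughly two to three and a half times as many TE-derived transcripts and exons as either reference-guided assembly.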
Figure 4. Fraction of TE-derived exons in all exons versus first exons. TE-derived exons form a significantly higher proportion of first exons than of all exons. (a) The larger circle represents all exons, while the smaller circle represents first exons; (b) TE-derived exons expressed as a percentage of total exons.
Percentage of TE-derived exons among all exons and among first exons in the GM12878 and K562 cell lines (see Figure 4).
| Cell line | Exon class | All exons (%) | First exons (%) |
|---|---|---|---|
| GM12878 | TE derived | 11.2 | 43.5 |
| GM12878 | Non TE derived | 88.8 | 56.5 |
| K562 | TE derived | 9.9 | 39.3 |
| K562 | Non TE derived | 90.1 | 60.7 |
Genes that are significantly differentially expressed (q-value < 0.05) between the GM12878 and K562 cell lines and have a TE inserted into their first exon.
| Representative transcript | UniGene cluster | Gene name | Gene symbol |
|---|---|---|---|
| NM_003975 | Hs.103527 | SH2 domain containing 2A | SH2D2A |
| NM_001039477 | Hs.10649 | Chromosome 1 open reading frame 38 | C1orf38 |
| NM_001146310 | Hs.107101 | Chromosome 1 open reading frame 86 | C1orf86 |
| NM_019089 | Hs.118727 | Hairy and enhancer of split 2 (Drosophila) | HES2 |
| NM_018420 | Hs.125482 | Solute carrier family 22, member 15 | SLC22A15 |
| NM_033312 | Hs.127411 | CDC14 cell division cycle 14 homolog A (S. cerevisiae) | CDC14A |
| NM_001765 | Hs.132448 | CD1c molecule | CD1C |
| NM_016074 | Hs.13880 | BolA homolog 1 (E. coli) | BOLA1 |
| NM_001080471 | Hs.142003 | Platelet endothelial aggregation receptor 1 | PEAR1 |
| NM_001042747 | Hs.1422 | Gardner-Rasheed feline sarcoma viral (v-fgr) oncogene homolog | FGR |
| NM_001766 | Hs.1799 | CD1d molecule | CD1D |
| NM_004672 | Hs.194694 | Mitogen-activated protein kinase kinase kinase 6 | MAP3K6 |
| NM_148902 | Hs.212680 | Tumor necrosis factor receptor superfamily, member 18 | TNFRSF18 |
| NM_030812 | Hs.2149 | Actin-like 8 | ACTL8 |
| NM_003528 | Hs.2178 | Histone cluster 2, H2be | HIST2H2BE |
| NM_001166294 | Hs.22587 | Synovial sarcoma, X breakpoint 2 interacting protein | SSX2IP |
| NM_014215 | Hs.248138 | Insulin receptor-related receptor | INSRR |
| NM_173452 | Hs.333383 | Ficolin (collagen/fibrinogen domain containing) 3 (Hakata antigen) | FCN3 |
| NM_006824 | Hs.346868 | EBNA1 binding protein 2 | EBNA1BP2 |
| NM_001007794 | Hs.363572 | Choline/ethanolamine phosphotransferase 1 | CEPT1 |
| NM_147192 | Hs.375623 | Diencephalon/mesencephalon homeobox 1 | DMBX1 |
| NM_002529 | Hs.406293 | Neurotrophic tyrosine kinase, receptor, type 1 | NTRK1 |
| NM_003517 | Hs.408067 | Histone cluster 2, H2ac | HIST2H2AC |
| NM_001193300 | Hs.408846 | Sema domain, immunoglobulin domain (Ig), transmembrane domain (TM) and short | SEMA4A |
| NM_052941 | Hs.409925 | Guanylate binding protein 4 | GBP4 |
| NM_015696 | Hs.43728 | Glutathione peroxidase 7 | GPX7 |
| NM_001143778 | Hs.437379 | ArfGAP with SH3 domain, ankyrin repeat and PH domain 3 | ASAP3 |
| NM_030764 | Hs.437393 | Fc receptor-like 2 | FCRL2 |
| NM_001033082 | Hs.437922 | V-myc myelocytomatosis viral oncogene homolog 1, lung carcinoma derived (avian) | MYCL1 |
| NM_198715 | Hs.445000 | Prostaglandin E receptor 3 (subtype EP3) | PTGER3 |
| NM_003196 | Hs.446354 | Transcription elongation factor A (SII), 3 | TCEA3 |
| NM_005101 | Hs.458485 | ISG15 ubiquitin-like modifier | ISG15 |
| NM_001195740 | Hs.462033 | Chromosome 1 open reading frame 93 | C1orf93 |
| NM_001040195 | Hs.464438 | Angiotensin II receptor-associated protein | AGTRAP |
| NM_001048195 | Hs.469723 | Regulator of chromosome condensation 1 | RCC1 |
| NM_024772 | Hs.471243 | Zinc finger, MYM-type 1 | ZMYM1 |
| NM_002959 | Hs.485195 | Sortilin 1 | SORT1 |
| NM_001199772 | Hs.485246 | Proteasome (prosome, macropain) subunit, alpha type, 5 | PSMA5 |
| NM_178454 | Hs.485606 | DNA-damage regulated autophagy modulator 2 | DRAM2 |
| NM_002744 | Hs.496255 | Protein kinase C, zeta | PRKCZ |
| NM_001143989 | Hs.511849 | Neuroblastoma breakpoint family, member 4 | NBPF4 |
| NM_003820 | Hs.512898 | Tumor necrosis factor receptor superfamily, member 14 (herpesvirus entry mediator) | TNFRSF14 |
| NM_004000 | Hs.514840 | Chitinase 3-like 2 | CHI3L2 |
| NM_025008 | Hs.516243 | ADAMTS-like 4 | ADAMTSL4 |
| NM_001178062 | Hs.516316 | Sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) | SEMA6C |
| NM_021181 | Hs.517265 | SLAM family member 7 | SLAMF7 |
| NM_022873 | Hs.523847 | Interferon, alpha-inducible protein 6 | IFI6 |
| NM_152493 | Hs.524248 | Zinc finger protein 362 | ZNF362 |
| NM_000760 | Hs.524517 | Colony stimulating factor 3 receptor (granulocyte) | CSF3R |
| NM_001135585 | Hs.530003 | Solute carrier family 2 (facilitated glucose/fructose transporter), member 5 | SLC2A5 |
| NM_033504 | Hs.534521 | Transmembrane protein 54 | TMEM54 |
| NM_024901 | Hs.557850 | DENN/MADD domain containing 2D | DENND2D |
| NM_005529 | Hs.562227 | Heparan sulfate proteoglycan 2 | HSPG2 |
| NM_001408 | Hs.57652 | Cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila) | CELSR2 |
| NM_033467 | Hs.591453 | Membrane metallo-endopeptidase-like 1 | MMEL1 |
| NM_024980 | Hs.632367 | G protein-coupled receptor 157 | GPR157 |
| NM_002143 | Hs.632391 | Hippocalcin | HPCA |
| NM_014787 | Hs.647643 | DnaJ (Hsp40) homolog, subfamily C, member 6 | DNAJC6 |
| NM_052938 | Hs.656112 | Fc receptor-like 1 | FCRL1 |
| NM_152890 | Hs.659516 | Collagen, type XXIV, alpha 1 | COL24A1 |
| NM_175065 | Hs.664173 | Histone cluster 2, H2ab | HIST2H2AB |
| NM_207397 | Hs.664836 | CD164 sialomucin-like 2 | CD164L2 |
| NM_001114748 | Hs.668654 | Chromosome 1 open reading frame 70 | C1orf70 |
| NM_144701 | Hs.677426 | Interleukin 23 receptor | IL23R |
| NM_001013693 | Hs.710255 | Low density lipoprotein receptor class A domain containing 2 | LDLRAD2 |
| NM_001037675 | Hs.714127 | Neuroblastoma breakpoint family, member 1 | NBPF1 |
| NM_152498 | Hs.729552 | WD repeat domain 65 | WDR65 |
| NM_001127714 | Hs.729693 | Human immunodeficiency virus type I enhancer binding protein 3 | HIVEP3 |
| NM_001164722 | Hs.77542 | Platelet-activating factor receptor | PTAFR |
Figure 5. Relative contribution of TE families in donating coding sequence to human gene transcripts. Absolute counts are normalized by the genomic background abundance of each family.
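The Figure 5 caption does not spell out the normalization formula. One common reading, sketched below as an assumption, divides each family's share of TE-derived exons by its share of all genomic TE copies, so that a value above 1 means the family donates more coding sequence than its abundance alone would predict. The function name and the counts in the usage example are hypothetical and are not taken from the study.

```python
def normalized_contribution(exon_counts, genomic_counts):
    """Observed share of TE-derived exons divided by genomic share, per family.

    Assumed formula for illustration; the source only states that absolute
    counts were normalized by genomic background abundance.
    """
    total_exon = sum(exon_counts.values())
    total_genomic = sum(genomic_counts.values())
    result = {}
    for family, count in exon_counts.items():
        observed_share = count / total_exon
        expected_share = genomic_counts[family] / total_genomic
        result[family] = observed_share / expected_share
    return result


if __name__ == "__main__":
    # Made-up toy counts, chosen only to make the arithmetic easy to follow.
    toy_exon_counts = {"Alu": 50, "L1": 30, "LTR": 20}
    toy_genomic_counts = {"Alu": 100, "L1": 300, "LTR": 100}
    for family, value in normalized_contribution(
        toy_exon_counts, toy_genomic_counts
    ).items():
        print(f"{family}: {value:.2f}")
```

With these toy numbers, the family holding half of the exon counts but only a fifth of the genomic copies comes out enriched, while the most abundant family comes out depleted.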
Figure 6. FPKM values for all TE-derived exons were sorted and binned into 100 equal-sized bins and plotted against DHS clusters, CAGE clusters, and various histone modifications that mark promoters, TSS and exons in the GM12878 cell line.
Figure 7. FPKM values for all TE-derived exons were sorted and binned into 100 equal-sized bins and plotted against DHS clusters, CAGE clusters, and various histone modifications that mark promoters, TSS and exons in the K562 cell line.
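The sort-and-bin step described in the captions of Figures 6 and 7 can be sketched as follows. This is a minimal illustration, assuming each TE-derived exon carries one FPKM value and one epigenetic signal value (for example a DHS, CAGE, or histone mark score); the function name, the record layout, and the use of the per-bin mean are assumptions rather than the authors' documented procedure.

```python
from statistics import mean


def bin_by_expression(records, n_bins=100):
    """Sort (fpkm, signal) pairs by FPKM and summarize equal-sized bins.

    Returns a list of (mean_fpkm, mean_signal) tuples, one per non-empty
    bin, ordered from the lowest to the highest expression bin.
    """
    ordered = sorted(records, key=lambda record: record[0])
    n = len(ordered)
    bins = []
    for i in range(n_bins):
        # Integer arithmetic keeps bin sizes within one element of each other.
        start = i * n // n_bins
        end = (i + 1) * n // n_bins
        chunk = ordered[start:end]
        if not chunk:
            continue  # fewer records than bins
        mean_fpkm = mean(fpkm for fpkm, _ in chunk)
        mean_signal = mean(signal for _, signal in chunk)
        bins.append((mean_fpkm, mean_signal))
    return bins


if __name__ == "__main__":
    # Synthetic data in which the signal rises with expression.
    synthetic = [(float(i), 2.0 * i) for i in range(1000)]
    summary = bin_by_expression(synthetic)
    print(len(summary), summary[0], summary[-1])
```

Plotting the second element of each tuple against the bin index gives one curve per epigenetic mark, which is the kind of trend the two figures report.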