Mihaela Pavlicev, Kaori Hiratsuka, Kayleigh A Swaggart, Caitlin Dunn, Louis Muglia.
Abstract
Transposable elements (TEs) comprise approximately half of the human genome, and several independent lines of investigation have demonstrated their role in rewiring gene expression during development, evolution, and oncogenesis. The identification of their regulatory effects has largely been idiosyncratic, by linking activity with isolated genes. Their distribution throughout the genome raises critical questions: do these elements contribute to broad tissue- and lineage-specific regulation? If so, in what manner: as enhancers, promoters, or RNAs? Here, we devise a novel approach to systematically dissect the genome-wide consequences of TE insertion on gene expression, and test the hypothesis that classes of endogenous retrovirus long terminal repeats (LTRs) exert tissue-specific regulation of adjacent genes. Using correlation of expression patterns across 18 tissue types, we reveal the tissue-specific uncoupling of gene expression due to 62 different LTR classes. These patterns are specific to the retroviral insertion, as the same genes in species without the LTRs do not exhibit the same effect. Although the LTRs can be transcribed themselves, the most highly transcribed TEs do not have the largest effects on adjacent regulation of coding genes, suggesting they function predominantly as enhancers. Moreover, the tissue-specific patterns of gene expression that are detected by our method arise from a limited number of genes, rather than as a general consequence of LTR integration. These findings identify basic principles of co-opting LTRs for genome evolution, and support the utility of our method for the analysis of TE, or other specific gene sets, in relation to the rest of the genome.
Keywords: LTR; endogenous retrovirus; long terminal repeats; placenta; transcriptome
Year: 2015 PMID: 25767249 PMCID: PMC4419796 DOI: 10.1093/gbe/evv049
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
The List of LTR Elements Included in the Screen (nomenclature following RepBase [Jurka et al. 2005]; minimal phylogenetic distribution according to UCSC, March 2015)
| LTR Element | Total Number in Human Genome | Number within 10 kb Upstream of Genes | In Vicinity of Genes (%) | Taxon |
|---|---|---|---|---|
| LTR78 | 4,819 | 105 | 2.18 | Mammals |
| LTR79 | 4,054 | 109 | 2.69 | Mammals |
| MLT1M | 2,956 | 108 | 3.65 | Mammals |
| LTR10A | 313 | 18 | 5.75 | |
| LTR16A | 6,966 | 220 | 3.16 | |
| LTR33 | 9,260 | 301 | 3.25 | |
| LTR67B | 3,717 | 124 | 3.34 | |
| LTR16C | 6,631 | 218 | 3.29 | |
| LTR78B | 3,281 | 65 | 1.98 | |
| LTR9 | 2,011 | 106 | 5.27 | |
| MER21C | 5,501 | 192 | 3.49 | |
| MER54B | 434 | 22 | 5.07 | |
| MLT1A | 9,070 | 231 | 2.55 | |
| MLT1A0 | 20,643 | 590 | 2.86 | |
| MLT1A1 | 6,766 | 198 | 2.93 | |
| MLT1B | 18,004 | 553 | 3.07 | |
| MLT1C | 19,824 | 644 | 3.25 | |
| MLT1D | 20,741 | 656 | 3.16 | |
| MLT1E1A | 3,362 | 82 | 2.44 | |
| MLT1E2 | 3,996 | 102 | 2.55 | |
| MLT1F | 4,297 | 167 | 3.89 | |
| MLT1F1 | 3,279 | 115 | 3.51 | |
| MLT1F2 | 6,036 | 203 | 3.36 | |
| MLT1G | 2,854 | 100 | 3.5 | |
| MLT1G1 | 3,592 | 120 | 3.34 | |
| MLT1H | 10,094 | 273 | 2.7 | |
| MLT1H1 | 3,640 | 90 | 2.47 | |
| MLT1H2 | 4,714 | 145 | 3.08 | |
| MLT1I | 11,089 | 312 | 2.81 | |
| MLT1J | 15,270 | 560 | 3.67 | |
| MLT1J1 | 4,925 | 126 | 2.56 | |
| MLT1J2 | 6,925 | 203 | 2.93 | |
| MLT1K | 18,173 | 617 | 3.4 | |
| MLT1L | 12,074 | 377 | 3.12 | |
| MLT1N2 | 5,884 | 224 | 3.81 | |
| MLT2B1 | 4,480 | 111 | 2.48 | |
| MLT2B2 | 2,209 | 80 | 3.62 | |
| MLT2B3 | 3,313 | 87 | 2.63 | |
| MLT2B4 | 4,587 | 94 | 2.05 | |
| MLT2D | 4,525 | 112 | 2.48 | |
| MSTC | 3,169 | 128 | 4.04 | |
| LTR7B | 848 | 50 | 5.9 | |
| LTR8 | 3,543 | 170 | 4.8 | |
| LTR12C | 2,740 | 206 | 7.52 | |
| LTR12D | 489 | 27 | 5.52 | |
| MER21A | 1,921 | 117 | 6.09 | |
| MER39 | 3,337 | 73 | 2.19 | |
| MER39B | 1,179 | 93 | 7.89 | |
| MER41B | 2,852 | 126 | 4.42 | |
| MLT2A1 | 3,780 | 69 | 1.83 | |
| MLT2A2 | 3,898 | 99 | 2.54 | |
| MSTA | 19,782 | 490 | 2.48 | Primates |
| MSTB | 8,562 | 247 | 2.88 | |
| MSTD | 7,665 | 251 | 3.27 | |
| MSTB1 | 5,073 | 158 | 3.11 | Primates |
| LTR2 | 887 | 61 | 6.88 | |
| LTR22B | 233 | 13 | 5.58 | |
| LTR2B | 326 | 34 | 10.42 | |
| MER11A | 964 | 53 | 5.5 | |
| THE1A | 4,233 | 93 | 2.2 | |
| THE1C | 9,874 | 233 | 2.36 | |
| THE1D | 12,642 | 305 | 2.41 |
Note.—For each element, the table gives the total number of repeats in the human genome, and the number and percentage of those repeats located within 10 kb upstream of genes. LTR: Long Terminal Repeats; MER: MEdium Reiteration repeats.
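The percentage column follows directly from the two count columns. A minimal sketch, using three rows copied from the table above; the function name `vicinity_percent` is an illustrative choice and does not come from the paper.

```python
def vicinity_percent(upstream_count: int, total_count: int) -> float:
    """Percentage of an element's copies that lie within 10 kb upstream of genes."""
    return 100.0 * upstream_count / total_count

# (total copies in the genome, copies within 10 kb upstream), taken from the table
rows = {
    "LTR78": (4819, 105),
    "LTR79": (4054, 109),
    "MLT1M": (2956, 108),
}

percentages = {
    name: round(vicinity_percent(upstream, total), 2)
    for name, (total, upstream) in rows.items()
}
print(percentages)
```

Rounded to two decimals, these values agree with the percentages listed for the same three elements.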
Figure. Schematic presentation of the transcriptome comparisons. (A) Similarity of transcriptomes relates to the angle in expression space spanned by the gene-axes, here shown as the 3-dimensional space of three genes. (B) Tissue-specific TE effect on gene expression is reflected as the subspace (here 2-dimensional space) of TE-associated genes, in which specific tissue shows lower similarity to other transcriptomes, when compared with the similarity between tissues in the space of other genes in the genome.
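The geometric idea in this caption can be illustrated with a small sketch. It treats each tissue as a vector of expression values (one coordinate per gene) and measures the angle between two tissues, first in the subspace of TE-associated genes and then in the subspace of the remaining genes. The expression values, gene indices, and function names are invented for illustration, and the uncentered angle used here is an assumption; the exact similarity measure of the study is not specified in this excerpt.

```python
import math

def angle_between(u, v):
    """Angle in radians between two expression vectors (one value per gene)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos
    return math.acos(max(-1.0, min(1.0, cos)))

def subspace(profile, gene_indices):
    """Restrict an expression profile to the chosen genes."""
    return [profile[i] for i in gene_indices]

# Hypothetical expression values for five genes in two tissues
placenta = [9.0, 1.0, 8.0, 2.0, 5.0]
lung = [1.0, 9.0, 8.0, 2.0, 5.0]

te_genes = [0, 1]        # hypothetical TE-associated genes
other_genes = [2, 3, 4]  # the remaining genes

angle_te = angle_between(subspace(placenta, te_genes), subspace(lung, te_genes))
angle_other = angle_between(subspace(placenta, other_genes), subspace(lung, other_genes))
print(angle_te, angle_other)
```

In this toy data the two tissues point in nearly the same direction in the space of other genes but diverge sharply in the TE-associated subspace, which is the pattern panel B describes.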
Figure. Heat maps for the five LTR elements that show the placenta-specific regulation of transcription. The genes colocalized with particular LTR elements are less coregulated (more divergent in expression) between pairs of tissues than genes selected at random. The color shows the odds ratio to observe the particular effect, for every pair of tissues. The presented LTRs show high specificity in placenta, as the LTR-associated genes show significantly lower coregulation with all other transcriptomes than random gene subsets.
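One way to picture the comparison against randomly selected genes is a resampling sketch for a single pair of tissues. It draws random gene subsets of the same size as the LTR-associated set from the other genes and reports how often a random subset is at least as divergent as the LTR set. This is an illustration under stated assumptions, not the study's procedure: the odds ratio shown in the heat maps is not defined in this excerpt, and the data, function names, number of draws, and seed below are all hypothetical.

```python
import math
import random

def angle_between(u, v):
    """Angle in radians between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def fraction_random_as_divergent(tissue_a, tissue_b, gene_set, n_draws=1000, seed=0):
    """Share of random same-size gene subsets whose between-tissue angle is
    at least as large as the angle observed for gene_set."""
    rng = random.Random(seed)

    def angle_for(indices):
        return angle_between([tissue_a[i] for i in indices],
                             [tissue_b[i] for i in indices])

    observed = angle_for(gene_set)
    chosen = set(gene_set)
    # Assumption: random subsets are drawn from genes outside the tested set
    background = [i for i in range(len(tissue_a)) if i not in chosen]
    hits = sum(
        angle_for(rng.sample(background, len(gene_set))) >= observed
        for _ in range(n_draws)
    )
    return hits / n_draws

# Hypothetical data: genes 0-4 are "LTR-associated" and fully uncoupled
# between the two tissues; the other 35 genes are expressed in both.
ltr_genes = [0, 1, 2, 3, 4]
tissue_a = [5.0, 5.0, 5.0, 0.0, 0.0] + [float(1 + i % 7) for i in range(35)]
tissue_b = [0.0, 0.0, 0.0, 5.0, 5.0] + [float(1 + (i * 3) % 7) for i in range(35)]

fraction = fraction_random_as_divergent(tissue_a, tissue_b, ltr_genes)
print(fraction)
```

A fraction near zero means that almost no random gene set is as divergent between the two tissues as the tested set; repeating the calculation for every tissue pair would give one value per cell of a heat map.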
Detected Associations between the LTR and Tissues, as well as the Genes that Contribute Strongly to the Signature
| LTR | Tissue | LTR-Associated Genes |
|---|---|---|
| LTR67B | ESC | |
| | Adipose | |
| MLT1J2 | Lung | |
| | Heart | |
| | Adrenal | |
| | Thyroid | |
| MLT1A | Brain | |
| | Thyroid | |
| | Placenta^a | |
| MLT1A0 | Kidney | |
| | Ovary | |
| MLT1B | Placenta | |
| MLT1F2 | Placenta | |
| MLT1H2 | Breast | |
| MLT1C | ESC | |
| MLT1E1A | Testes | |
| | Colon | |
| | Lymph node | |
| | Prostate | |
| | Brain | |
| | ESC | |
| MLT1E2 | Adipose | |
| MLT1J1 | Adipose | |
| MLT1J | Heart | |
| | Skel. muscle | |
| MLT1J2 | Lung | |
| | Heart | |
| | Thyroid | |
| | Lymph node | |
| | Adrenal | |
| MLT1M | Kidney | |
| LTR78 | Adrenal | |
| | Ovary | |
| | Kidney | |
| | Adipose | |
| MLT1N2 | Skel. muscle | |
| MLT2A1 | Prostate | |
| LTR78B | Brain | |
| LTR16A | Brain | |
| LTR16C | WBC | |
| MER39 | ESC | |
| | WBC | |
| | Ovary | |
| | Breast | |
| | Ovary | |
| | Lymph node | |
| | WBC | |
| | ESC | |
| | Kidney | |
| THE1C | Liver | |
| | Prostate | |
| MSTA | WBC | |
| | Lung | |
Note.—ESC, embryonal stromal cells; WBC, white blood cells. Boldface indicates the previously documented tissue-specific effects of the particular LTR/ERV on gene expression.
^a Signifies weak effect; >4 of 17 tissue comparisons are significant.
^b Signifies strong pattern; >7 of 17 tissue comparisons are significant.
Figure. Distribution of sequence divergence from the consensus sequence for the five putative placental LTRs. The histogram in the background shows the distribution of the distances for all elements. The red histogram shows the distribution for the elements located within 10 kb upstream of coding genes. Note that the consensus is an estimated sequence and may or may not resemble the ancestral state (e.g., when there were unknown waves of recent viral activity).
Figure. Overall levels of paired reads mapped (once or multiple times) to a set of LTRs, using a placental transcriptome as well as those of two other cell types: skeletal myoblast and lung fibroblast. This mapping pools all members of a particular LTR type, yielding very low normalized FPKM values. However, the consistently higher placental values for all elements are notable. The putative placenta-specific elements are marked with an asterisk; depending on the proportion of active elements involved and the type of action, they may or may not be distinguishable by increased genome-wide expression. “Hu” in the name refers to human data.
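For readers unfamiliar with the unit, FPKM (fragments per kilobase per million mapped fragments) can be computed as in the sketch below. It also shows why pooling all copies of an LTR type tends to give low values: the pooled length in the denominator grows with every copy, whether or not that copy is transcribed. The function name and the numbers are hypothetical, and how the study normalized pooled LTR types is not described in this excerpt.

```python
def fpkm(fragments: int, feature_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of feature per million mapped fragments."""
    return fragments * 1e9 / (feature_length_bp * total_mapped_fragments)

# Hypothetical library with 20 million mapped fragments
library_size = 20_000_000

# A single 500 bp element with 50 mapped fragments
single_element = fpkm(50, 500, library_size)

# The same 50 fragments spread over 1,000 pooled copies of 500 bp each
pooled_type = fpkm(50, 500 * 1000, library_size)

print(single_element, pooled_type)
```

With these invented numbers the single element sits above a 1 FPKM threshold, while the pooled value for the whole type is a thousand times lower.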
Figure. Proportion of identified expressed elements (pooled across the LTR types) in different genomic compartments. Note that most of these are not found immediately upstream of genes, but rather in introns, lincRNAs, and other intergenic regions.
Figure. Comparison of transcription levels of individual elements selected for transcription >1 FPKM in any of human placental tissue, lung fibroblasts, or skeletal muscle myoblasts. Only the elements from figure 4 were considered.