Andrew Garazha, Alena Ivanova, Maria Suntsova, Galina Malakhova, Sergey Roumiantsev, Alex Zhavoronkov, Anton Buzdin.
Abstract
Endogenous retroviruses (ERVs) and LTR retrotransposons (LRs) occupy ∼8% of the human genome. Deep sequencing technologies provide clues to the functional relevance of individual ERVs/LRs by enabling direct identification of transcription factor binding sites (TFBS) and other landmarks of functional genomic elements. Here, we performed a genome-wide identification of human ERVs/LRs containing TFBS according to the ENCODE project. We created the first interactive ERV/LR database that groups the individual inserts according to their familial nomenclature, number of mapped TFBS, and divergence from their consensus sequence. Information on any particular element can be easily extracted by the user. We also created a genome browser tool that enables quick mapping of any ERV/LR insert according to genomic coordinates, known human genes, and TFBS. These tools can be used to explore functionally relevant individual ERVs/LRs and to study their impact on the regulation of human genes. Overall, we identified ∼110,000 ERV/LR genomic elements having TFBS. We propose a hypothesis of "domestication" of ERV/LR TFBS by the genome milieu that includes successive stages of initial epigenetic repression, partial functional release, and further mutation-driven reshaping of TFBS in tight coevolution with the enclosing genomic loci.
Keywords: Database; LTR retrotransposon; endogenous retrovirus; genome browser; human genome; transcription factor binding site
Year: 2015 PMID: 25853282 PMCID: PMC4612461 DOI: 10.1080/15384101.2015.1022696
Source DB: PubMed Journal: Cell Cycle ISSN: 1551-4005 Impact factor: 4.534
Figure 1. Correlation between NDT and RT+ for all ERV/LR families. Each data point represents a separate ERV/LR family. NDT is the normalized density of TFBS, and RT+ is the proportion of TFBS-containing elements in a family. “TFBS+/all” means RT+, and “TNT/all” means NDT.
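As a minimal illustration of the RT+ definition in the Figure 1 caption (the proportion of TFBS-containing elements in a family), the Python sketch below computes it for a toy data set. The `Element` record, its field names, the family name `FamilyX`, and all counts are hypothetical and are not taken from the study. NDT is left out because this excerpt does not say how the TFBS density is normalized.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One ERV/LR insert (hypothetical record layout, for illustration only)."""
    family: str
    n_tfbs: int  # number of TFBS mapped to this insert


def rt_plus(elements, family):
    """Proportion of elements in `family` that carry at least one TFBS."""
    members = [e for e in elements if e.family == family]
    if not members:
        raise ValueError(f"no elements found for family {family!r}")
    with_tfbs = sum(1 for e in members if e.n_tfbs > 0)
    return with_tfbs / len(members)


# Toy data: made-up counts, NOT values from the paper.
toy = [
    Element("LTR5Hs", 23),
    Element("LTR5Hs", 0),
    Element("LTR5Hs", 5),
    Element("FamilyX", 0),
]

print(f"RT+ for LTR5Hs (toy data): {rt_plus(toy, 'LTR5Hs'):.3f}")
```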
Figure 2. Distribution of individual TFBS- and DHS-containing ERV/LR elements as a function of the number of TFBS per element. “TFBS consisted HERV” means individual ERV/LR elements containing TFBS, shown in red; “TFBS & DNase I consisted HERV” means individual ERV/LR elements containing both TFBS and DNase I hypersensitivity site(s) (DHS), shown in blue.
List of the LTR5Hs elements selected for the experimental luciferase assay
| Name | Chromosome | Start | End | Number of DHS | Number of TFBS |
|---|---|---|---|---|---|
| Element 1 | 17 | 57367467 | 57368278 | 3 | 23 |
| Element 2 | 2 | 171119813 | 171120760 | 2 | 20 |
| Element 3 | 17 | 7959654 | 7959869 | 1 | 13 |
| Element 4 | 19 | 35411081 | 35412022 | 1 | 13 |
| Element 5 | 1 | 145501710 | 145502679 | 2 | 11 |
| Element 6 | 2 | 128300149 | 128301124 | 1 | 11 |
| Element 7 | 6 | 52626627 | 52627596 | 2 | 8 |
| Element 8 | 7 | 150724100 | 150725056 | 3 | 8 |
| Element 9 | 19 | 55455103 | 55456032 | 1 | 7 |
| Element 10 | 1 | 160621928 | 160622885 | 2 | 6 |
| Element 11 | 12 | 123235406 | 123236378 | 2 | 6 |
| Element 12 | 18 | 77720165 | 77721063 | 2 | 5 |
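
The coordinates in the table above can be turned into element lengths and a per-kilobase TFBS density, which makes the rows easier to compare. The Python sketch below does this for the twelve listed elements. It assumes the coordinates are half-open, so that length is `end - start`; the excerpt does not state the coordinate convention, and a 1-based inclusive convention would add 1 bp to every length. The function names are illustrative.

```python
# Rows copied from the LTR5Hs table: (name, chromosome, start, end, n_dhs, n_tfbs)
ELEMENTS = [
    ("Element 1", "17", 57367467, 57368278, 3, 23),
    ("Element 2", "2", 171119813, 171120760, 2, 20),
    ("Element 3", "17", 7959654, 7959869, 1, 13),
    ("Element 4", "19", 35411081, 35412022, 1, 13),
    ("Element 5", "1", 145501710, 145502679, 2, 11),
    ("Element 6", "2", 128300149, 128301124, 1, 11),
    ("Element 7", "6", 52626627, 52627596, 2, 8),
    ("Element 8", "7", 150724100, 150725056, 3, 8),
    ("Element 9", "19", 55455103, 55456032, 1, 7),
    ("Element 10", "1", 160621928, 160622885, 2, 6),
    ("Element 11", "12", 123235406, 123236378, 2, 6),
    ("Element 12", "18", 77720165, 77721063, 2, 5),
]


def span_length(start, end):
    """Length in bp, ASSUMING half-open coordinates (end - start)."""
    return end - start


def tfbs_per_kb(start, end, n_tfbs):
    """TFBS count per 1000 bp of element sequence."""
    return 1000 * n_tfbs / span_length(start, end)


# Sort elements from highest to lowest TFBS density.
rows = sorted(
    (
        (name, span_length(start, end), tfbs_per_kb(start, end, n_tfbs))
        for name, _chrom, start, end, _n_dhs, n_tfbs in ELEMENTS
    ),
    key=lambda row: row[2],
    reverse=True,
)

for name, length, density in rows:
    print(f"{name:<11} {length:>5} bp  {density:6.2f} TFBS/kb")
```

Under this assumption, Element 3 spans only 215 bp while the other eleven span between 811 and 975 bp, so it has the highest TFBS density even though it does not have the highest raw TFBS count.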
Figure 3. Profiling of LTR enhancer activity in luciferase reporter experiments. (A) Schematic representation of the luciferase reporter constructs. Filled arrow – individual LTR elements tested in this assay; empty arrow – SV40 promoter; black bar – luciferase gene. (B) Relative enhancer activities for LTR elements 1–12, established in a dual-luciferase assay. Data show means ± standard deviations of 4 independent experiments. Data are shown for the cell lines Tera-1, NT2/D1, A549, NGP127 and HepG2.
Figure 4. Proportion of TFBS-containing elements in correlation with the divergence of each ERV/LR family from their consensus sequences. Each data point represents a separate ERV/LR family. “TFBS+/all” means RT+. The divergence is shown as a millidiv score, with each unit equal to one substitution per 1000 nucleotides.
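The millidiv unit defined in the caption (one substitution per 1000 nucleotides) converts directly into a fractional divergence or an expected number of substitutions for an element of known length. The short Python sketch below shows the arithmetic. The function names and the example values (150 millidiv, 970 bp) are illustrative assumptions, not figures from the study.

```python
def millidiv_to_fraction(millidiv):
    """Convert a millidiv score to substitutions per nucleotide."""
    return millidiv / 1000


def expected_substitutions(millidiv, length_bp):
    """Expected number of substitutions over an element of `length_bp` nucleotides."""
    return millidiv * length_bp / 1000


# Illustrative values only: a hypothetical element of 970 bp scored at 150 millidiv.
score, length_bp = 150, 970
print(f"{score} millidiv = {millidiv_to_fraction(score):.1%} divergence")
print(f"about {expected_substitutions(score, length_bp):.1f} substitutions over {length_bp} bp")
```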
Figure 5. Distribution of TFBS for particular transcription factors among all ERV/LR elements, in correlation with the divergence of the respective ERV/LR elements from their consensus sequence. The distribution is shown for the NF-YA (red) and Rad21 (blue) transcription factor proteins. The divergence is shown as a millidiv score, with each unit equal to one substitution per 1000 nucleotides. The Y-axis is arbitrary and is customized for each transcription factor.
Figure 6. Screenshot of a representative HERV/LR browser output page. The user settings were “LTR5Hs” as the repeat family and “chr1” as the chromosome. The browser displays all LTR5Hs inserts on chromosome 1 and highlights the TFBS-positive elements. The option “show list of browser HERVs” lists all elements of the selected category on the browser screen as a table, with hyperlinks to the structure of each particular element. A zooming tool is enabled to facilitate navigation.
Sequences of oligonucleotides used in this study
| LTR name | Forward oligonucleotide | Reverse oligonucleotide |
|---|---|---|
| Element 1 | TAGGTACCAGTGAGCCAAGATTGAGCC | AACGCGGCCAAGACCTCTGAGTTCCC |
| Element 2 | TAGGTACCGAAATCCAACACCCTGAGACCA | AACGCGTCAAACAACCCTAACACTTAGCA |
| Element 3 | TAGGTACCTACAACAATAAGAGAATCAGGCGG | AACGCGTGGCTAATAGAACAGAACAGGAC |
| Element 4 | TAGGTACCAGGAAGTAAACAGGATTGGG | AACGCGTAGGAAAGGAAACAGGAGGAG |
| Element 5 | TAGGTACCATCACTCAGTCTCGGCTCAC | AACGCGTACACTCCTGTTTCTCCTTTCTC |
| Element 6 | TAGGTACCGCTCCTTTCTGTCCTGTCTG | AACGCGTCGCCACTTTCAGCTCTTCCT |
| Element 7 | TAGGTACCTATTCACTAATCAGCCCACC | AACGCGTCCCACTCTAGGATATTTCTAAGCA |
| Element 8 | TAGGTACCCTCCATACCAATAGTTCTC | AACGCGTATCTCTAGATGTCCCGTCGT |
| Element 9 | TAGGTACCGTGGACAGCTTTACCCTTGGA | AACGCGTGGCAACTATATGAAGCAGTGGA |
| Element 10 | TAGGTACCTCTATCCATTCACCATACCAC | AACGCGTGACCCATTGAAGAGTTTAAGAGG |
| Element 11 | TAGGTACCGAATCTCCCTATGCTGTCCA | AACGCGTAACTCCCATGTGTTTACCCA |
| Element 12 | TAGGTACCGGAGACCACTTTGAAGACCC | AACGCGTCTCACTGTAGCCTTGAACTG |
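
The primers in the table above share short 5' tails, and a quick consistency check can confirm this. The Python sketch below computes the common prefix of the forward and reverse primers and reports which primers lack a given motif. Reading the tails as restriction sites is an assumption: GGTACC is the KpnI recognition sequence and ACGCGT is the MluI recognition sequence, but this excerpt does not describe the cloning strategy. Sequences are copied from the table with internal whitespace removed.

```python
import os

# name -> (forward, reverse), copied from the oligonucleotide table (whitespace removed)
PRIMERS = {
    "Element 1": ("TAGGTACCAGTGAGCCAAGATTGAGCC", "AACGCGGCCAAGACCTCTGAGTTCCC"),
    "Element 2": ("TAGGTACCGAAATCCAACACCCTGAGACCA", "AACGCGTCAAACAACCCTAACACTTAGCA"),
    "Element 3": ("TAGGTACCTACAACAATAAGAGAATCAGGCGG", "AACGCGTGGCTAATAGAACAGAACAGGAC"),
    "Element 4": ("TAGGTACCAGGAAGTAAACAGGATTGGG", "AACGCGTAGGAAAGGAAACAGGAGGAG"),
    "Element 5": ("TAGGTACCATCACTCAGTCTCGGCTCAC", "AACGCGTACACTCCTGTTTCTCCTTTCTC"),
    "Element 6": ("TAGGTACCGCTCCTTTCTGTCCTGTCTG", "AACGCGTCGCCACTTTCAGCTCTTCCT"),
    "Element 7": ("TAGGTACCTATTCACTAATCAGCCCACC", "AACGCGTCCCACTCTAGGATATTTCTAAGCA"),
    "Element 8": ("TAGGTACCCTCCATACCAATAGTTCTC", "AACGCGTATCTCTAGATGTCCCGTCGT"),
    "Element 9": ("TAGGTACCGTGGACAGCTTTACCCTTGGA", "AACGCGTGGCAACTATATGAAGCAGTGGA"),
    "Element 10": ("TAGGTACCTCTATCCATTCACCATACCAC", "AACGCGTGACCCATTGAAGAGTTTAAGAGG"),
    "Element 11": ("TAGGTACCGAATCTCCCTATGCTGTCCA", "AACGCGTAACTCCCATGTGTTTACCCA"),
    "Element 12": ("TAGGTACCGGAGACCACTTTGAAGACCC", "AACGCGTCTCACTGTAGCCTTGAACTG"),
}

# ASSUMPTION: the shared tails carry these restriction sites (not stated in the excerpt).
KPNI_SITE = "GGTACC"
MLUI_SITE = "ACGCGT"

forward = [fwd for fwd, _rev in PRIMERS.values()]
reverse = [rev for _fwd, rev in PRIMERS.values()]

# os.path.commonprefix compares character by character, so it works on any strings.
forward_prefix = os.path.commonprefix(forward)
reverse_prefix = os.path.commonprefix(reverse)

# Primers that do not contain the assumed site anywhere in their sequence.
missing_kpni = [name for name, (fwd, _rev) in PRIMERS.items() if KPNI_SITE not in fwd]
missing_mlui = [name for name, (_fwd, rev) in PRIMERS.items() if MLUI_SITE not in rev]

print("shared forward tail:", forward_prefix)
print("shared reverse tail:", reverse_prefix)
print("forward primers without GGTACC:", missing_kpni)
print("reverse primers without ACGCGT:", missing_mlui)
```

The check flags the Element 1 reverse primer, which begins AACGCGG rather than AACGCGT like the other eleven. Whether that reflects the actual primer design or an error in the listed sequence cannot be determined from this excerpt.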