Z-DNA-Containing Long Terminal Repeats of Human Endogenous Retrovirus Families Provide Alternative Promoters for Human Functional Genes.

Du Hyeong Lee1,2,3, Woo Hyeon Bae1,2,3, Hongseok Ha4,3, Eun Gyung Park1,2, Yun Ju Lee1,2, Woo Ryung Kim1,2, Heui-Soo Kim5,2.   

Abstract

Transposable elements (TEs) account for approximately 45% of the human genome. TEs have proliferated randomly and integrated into functional genes during hominoid radiation. They appear as right-handed B-DNA double helices and slightly elongated left-handed Z-DNAs. Human endogenous retrovirus (HERV) families are widely distributed across human chromosomes and make up approximately 8% of the genome. They contain a 5'-long terminal repeat (LTR)-gag-pol-env-3'-LTR structure. LTRs contain the U3 enhancer and promoter region, transcribed R region, and U5 region. LTRs can influence host gene expression by acting as regulatory elements. In this review, we describe the alternative promoters derived from LTR elements that overlap Z-DNA by comparing Z-hunt and DeepZ data for human functional genes. We also present evidence showing the regulatory activity of LTR elements containing Z-DNA in GSDML. Taken together, the regulatory activity of LTR elements with Z-DNA allows us to understand gene function in relation to various human diseases.

Keywords:  Z-DNA; gene function; human diseases; human endogenous retrovirus; long terminal repeat elements

Year:  2022        PMID: 35950452      PMCID: PMC9385571          DOI: 10.14348/molcells.2022.0060

Source DB:  PubMed          Journal:  Mol Cells        ISSN: 1016-8478            Impact factor:   4.250


INTRODUCTION

The human genome contains several transposable elements (TEs) introduced by exogenous retroviral infection in the germline cells of our ancestors (Lower et al., 1996). Human endogenous retroviruses (HERVs) with autonomous retroelements are the most well-known retrotransposons (Havecker et al., 2004). HERV insertions comprise two long terminal repeats (LTRs) flanking an internal region containing the protein-coding genes (gag, pol, env) necessary for retroviral replication and propagation (Jern and Coffin, 2008). Using the reverse transcriptase (RTase) encoded by their pol gene, HERVs integrated randomly into the human genome (Supplementary Fig. S1A) (Kim et al., 2004; Yi et al., 2004). They then underwent multiple duplication events during hominoid radiation. HERVs comprise up to 8% of the human genome and are dispersed throughout it (Medstrand and Mager, 1998). They cause many mutation events, including deletion of subgenomes, insertions of other transposons (Alu, LINE, ERV, and DNA transposons), and homologous recombination between the 5′-LTR and 3′-LTR of HERV sequences. The two LTRs of a single HERV are more similar to each other than to the LTR of any other HERV; recombination between them can therefore produce a solitary LTR element (Huh et al., 2006; Medstrand and Mager, 1998; Thomas et al., 2018). The structure of LTR elements includes a hormone responsive element, enhancer, promoter TATA box (located within the U3 region), polyadenylation signal AATAAA (located within the R region), and the U5 region (Supplementary Fig. S1B) (Sverdlov, 2000). LTRs can influence host gene expression by acting as regulatory elements (promoters or enhancers) (Durnaoglu et al., 2021; Montesion et al., 2018; Ruda et al., 2004). A left-handed double-helical Z-DNA fragment was identified using X-ray diffraction analysis (Dickerson et al., 1982; Drew et al., 1980; Wang et al., 1981). 
Purine-pyrimidine alternating sequences, such as poly(dT-dG)-poly(dC-dA), have been shown to adopt the Z-DNA conformation in the presence of high CsCl concentrations and in ethanolic solutions (Zimmer et al., 1982). Stretches of the dC-dG alternating sequence [Z(C-G) element] were found to be moderately repetitive in human, mouse, and salmon genomes (Hamada et al., 1982). The abundant occurrence and evolutionary conservation of the Z(T-G) and Z(C-G) elements could have important biological implications as they could be involved in regulating gene expression and act as hotspots for gene recombination or rearrangement (Hamada et al., 1982). Computer programs (Z-hunt and Z-hunt-II) have been developed to search for genomic sequences in regions most likely to adopt the Z-conformation (Ho et al., 1986; Schroth et al., 1992). The recently developed deep-learning approach, DeepZ, aggregates information from genome-wide maps of epigenetic markers, transcription factors, RNA polymerase-binding sites, and chromosome accessibility maps (Beknazarov et al., 2020). Z-DNA has been found to form in actively transcribed regions of the genome and has been confirmed using ChIP-Seq, indicating that Z-DNA formation depends on chromatin structure as well as sequence composition and is associated with active transcription in human cells (Shin et al., 2016). Potential Z-DNA-forming sequences (ZFS) are abundant near the transcriptional start sites of genes (Li et al., 2009; Schroth et al., 1992). This suggests that Z-DNA plays a biological role in transcriptional regulation and that RNA polymerase II accumulates local negative supercoiling, creating a suitable environment for Z-DNA formation (Herbert and Rich, 2001; Liu and Wang, 1987). For instance, the highly conserved negative regulatory element (NRE) at the 5'-UTR of the human ADAM12 gene acts as a transcriptional repressor. The NRE contains a stretch of a dinucleotide-repeat sequence that can adopt a Z-DNA conformation. 
ZFS negatively regulate ADAM-12 expression in normal cells (Ray et al., 2011). Further, hypoxia-inducible factor 1 (HIF-1) regulates allelic variation in SLC11A1 expression by binding directly to Z-DNA-forming microsatellites during macrophage activation due to infection or inflammation (Bayele et al., 2007). In this review, we summarize the functional genes containing alternative promoters derived from LTR elements that overlap Z-DNA prediction sites. We also indicate the location of alternative promoters, LTR elements, and Z-DNA prediction sites analyzed using the Z-hunt and DeepZ programs. Finally, we discuss variant isoforms introduced by alternative splicing as biomarkers for the detection of human diseases associated with LTR elements and Z-DNA.
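
To make the sequence motif discussed above concrete, the following Python sketch scans a DNA string for runs of strictly alternating purines and pyrimidines, such as (TG)n or (CG)n. It only illustrates the alternating-sequence pattern. It is not Z-hunt, which scores Z-forming potential thermodynamically, and it is not DeepZ. The function name `alternating_runs`, the default `min_len` of 8 bases, and the example sequence are assumptions made for this illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def alternating_runs(seq, min_len=8):
    """Return (start, end, subsequence) for every maximal run of strictly
    alternating purine/pyrimidine bases that is at least min_len long.
    Coordinates are 0-based and end-exclusive. min_len is an arbitrary,
    illustrative threshold, not a published cutoff."""
    seq = seq.upper()
    runs = []
    start = None  # start index of the run currently being extended
    for i, base in enumerate(seq):
        if base not in PURINES and base not in PYRIMIDINES:
            # Ambiguous base (e.g. N): close any open run.
            if start is not None and i - start >= min_len:
                runs.append((start, i, seq[start:i]))
            start = None
        elif start is None:
            start = i
        elif (base in PURINES) == (seq[i - 1] in PURINES):
            # Same class as the previous base: the alternation is broken.
            if i - start >= min_len:
                runs.append((start, i, seq[start:i]))
            start = i
    if start is not None and len(seq) - start >= min_len:
        runs.append((start, len(seq), seq[start:]))
    return runs


# Hypothetical sequence: a (TG)6 tract flanked by homopolymer stretches.
example = "TTTT" + "TG" * 6 + "AAAA"
print(alternating_runs(example))  # [(4, 16, 'TGTGTGTGTGTG')]
```

A run found this way is only a candidate: as noted above, Z-DNA formation also depends on supercoiling and chromatin context, which a plain sequence scan cannot capture.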

IDENTIFICATION AND CHROMOSOMAL LOCATION OF HERV LTR ELEMENTS AND Z-DNA

Retrotransposon activity is linked with Z-DNA-forming sites that overlap with recombination hotspots (Blaho and Wells, 1989; Wahls et al., 1990). A large portion of ZFS lie in promoter regions and contain sequences with high potential to form Z-DNA. The Z-DNA-forming sites identified using ChIP-Seq are associated with actively transcribed regions (Shin et al., 2016). ZFS are also abundant in transposable elements (Alu) (Herbert, 2019). Alternative splicing and Z-formation appear in genes with Alu repeats and dsRNA editing of transcripts. Homologous recombination between the 5′-LTR and 3′-LTR of HERVs results in excision of structural genes (gag-pol-env), leaving a solitary LTR element (Kim, 2012; Ruda et al., 2004; Thomas et al., 2018). LTRs have the potential to regulate host protein-coding genes because they are highly enriched in transcription factor-binding sites (Ito et al., 2017; Yu et al., 2013). A global map of ZFS positions could help clarify Z-DNA structure-dependent transcriptional regulation. Elucidation of ZFS in HERV LTR elements has revealed variant isoforms of functional genes in relation to alternative promoters. Determining whether LTR elements contain ZFS requires comparing the genomic positions of the two. The positions of TEs, including the LTR class, were obtained for the human genome (hg19) from the RepeatMasker track in the “Repeats” group of the UCSC Genome Browser Table Browser (Kent et al., 2002). A dataset (https://github.com/Nazar1997/DeepZ/tree/master/annotation) annotated in a previous study was adopted to obtain the genomic positions of ZFS (Beknazarov et al., 2020). This dataset included experimental Z-DNA regions and putative regions generated using Z-hunt (Schroth et al., 1992) or DeepZ (Beknazarov et al., 2020). The IntersectBed module from Bedtools was used to identify TE coordinates that overlapped with ZFS positions. 
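
The overlap step can be illustrated with a small pure-Python sketch. It is a stand-in for the idea behind an interval intersection, not the Bedtools implementation and not the in-house code used for this review. The function name `overlapping_repeats`, the tuple layout, the `class_prefix` parameter, and all coordinates are hypothetical. Intervals are assumed to be BED-style (0-based start, end-exclusive), so two intervals that merely touch do not overlap.

```python
from collections import defaultdict


def overlapping_repeats(repeats, zfs, class_prefix="LTR"):
    """Return the repeats whose class starts with class_prefix and that
    overlap at least one ZFS interval on the same chromosome.

    repeats: iterable of (chrom, start, end, name, rep_class)
    zfs:     iterable of (chrom, start, end)
    """
    # Group ZFS intervals by chromosome so each repeat is only compared
    # against intervals on its own chromosome.
    zfs_by_chrom = defaultdict(list)
    for chrom, start, end in zfs:
        zfs_by_chrom[chrom].append((start, end))

    hits = []
    for chrom, start, end, name, rep_class in repeats:
        if not rep_class.startswith(class_prefix):
            continue  # keep only the requested repeat class
        candidates = zfs_by_chrom.get(chrom, [])
        # Half-open intervals overlap when each starts before the other ends.
        if any(start < z_end and z_start < end for z_start, z_end in candidates):
            hits.append((chrom, start, end, name, rep_class))
    return hits


# Hypothetical coordinates, for illustration only.
repeats = [
    ("chr17", 100, 500, "LTR7B", "LTR/ERV1"),
    ("chr17", 600, 900, "AluY", "SINE/Alu"),
    ("chr1", 100, 200, "MER52A", "LTR/ERV1"),
]
zfs = [("chr17", 450, 470), ("chr17", 650, 700), ("chr1", 200, 250)]

print(overlapping_repeats(repeats, zfs))
# [('chr17', 100, 500, 'LTR7B', 'LTR/ERV1')]
```

The pairwise comparison is quadratic per chromosome, which is fine for a toy input. For genome-scale tables, a sorted sweep or a dedicated interval tool is the more practical choice.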
The output files were modified using an in-house Python code. In this processing step, only the LTR elements belonging to the LTR repeat class were considered for downstream analyses. The modified file was used as an input for the web-based PhenoGram (http://visualization.ritchielab.org/phenograms/plot) to visualize the coordinates of LTR elements with ZFS on human chromosomes. Regarding the chromosomal region of the overlaps, an average of 1.11% of the LTR elements in each chromosome (7,823 out of 708,332 in entire chromosomes) had ZFSs (Supplementary Table S1). Among these, LTR/ERV1 accounted for the largest proportion, with 34.33% (2,686 of 7,823 in LTR containing ZFS), followed by LTR/ERVL-MaLR with 30.68% (2,400 out of 7,823). However, LTR/ERVK had the highest density with 5.89% (618 out of 10,490). The density indicates the ratio of each LTR class/family to ZFS (the number of LTR fragments containing ZFS/the number of LTR fragments in the human genome). In our previous study, we reported the chromosomal distribution and copy numbers of the HERV family in humans and great apes, indicating that HERV-K/solitary LTR elements were the most abundant (Kim, 2012). HERV-K family members also proliferated continuously during hominid evolution (Supplementary Fig. S2) (Anderssen et al., 1997; Di Cristofano et al., 1995; Hervé et al., 2004; Kjellman et al., 1999; Lavie et al., 2004; Lee and Kim, 2006; Yi et al., 2007a; 2007b; 2007c). Human-specific HERV-K activity has contributed to genomic divergence between humans and chimpanzees, as well as within the human population (Shin et al., 2016). Multiple copy numbers of solitary LTR elements belonging to the HERV-K family (GenBank accession No. AC002350, AC002400, AC002508, L47334, U47924, Z80898, and AL034407) have been identified as being unique to humans (Akopov et al., 1998; Buzdin et al., 2002; Medstrand and Mager, 1998). Solitary LTR elements were formed because of an equal homologous recombination excision event. 
Several evolutionary processes have occurred throughout the chromosomes during primate evolution. HERV-K LTR elements are the youngest retrovirus family in the human genome and are the only group of endogenous retroviruses that have polymorphic members in human populations (Macfarlane and Simmonds, 2004). As shown in Fig. 1, HERV-K/solitary LTR elements containing Z-DNA are present in all chromosomes except chromosomes 15, 20, 21, and 22, suggesting that they are still active in the human genome as regulatory members. Therefore, we investigated functional genes with alternative promoters derived from LTR elements containing Z-DNA.
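
The proportions and densities quoted in this section are simple ratios and can be checked with a few lines of Python. The counts below are the ones given in the text; the helper `percent` and the variable names are assumptions introduced for this check. The pooled genome-wide ratio is shown for comparison only: the 1.11% in the text is described as an average over chromosomes, so it need not equal the pooled value exactly.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Counts quoted in the text (Supplementary Table S1 of the review).
LTR_TOTAL = 708_332      # LTR fragments in the human genome
LTR_WITH_ZFS = 7_823     # LTR fragments that contain a ZFS
ERV1_WITH_ZFS = 2_686    # LTR/ERV1 fragments that contain a ZFS
MALR_WITH_ZFS = 2_400    # LTR/ERVL-MaLR fragments that contain a ZFS
ERVK_WITH_ZFS = 618      # LTR/ERVK fragments that contain a ZFS
ERVK_TOTAL = 10_490      # all LTR/ERVK fragments

# Proportion: share of a family among all ZFS-containing LTR fragments.
print(percent(ERV1_WITH_ZFS, LTR_WITH_ZFS))  # 34.33
print(percent(MALR_WITH_ZFS, LTR_WITH_ZFS))  # 30.68

# Density: ZFS-containing fragments of a family over all of its fragments.
print(percent(ERVK_WITH_ZFS, ERVK_TOTAL))    # 5.89

# Pooled genome-wide ratio, for comparison with the per-chromosome average.
print(percent(LTR_WITH_ZFS, LTR_TOTAL))      # 1.1
```

The two measures answer different questions. LTR/ERV1 contributes the most ZFS-containing fragments in absolute terms, whereas LTR/ERVK has the highest fraction of its own fragments carrying a ZFS.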
Fig. 1

The chromosome ideogram showing the LTR classes of HERV-K families.

Each diagram indicates the putative Z-DNA location detected using Z-hunt and DeepZ programs. Different LTR classes are distinguished by different colors.

ALTERNATIVE PROMOTERS DERIVED FROM LTR ELEMENTS OVERLAPPING Z-DNA

Promoters regulate the transcription of downstream exons. Over half of human genes contain more than one promoter; these are collectively described as alternative promoters. Alternative promoters provide transcript diversity and confer dimensional complexity to cells (Landry et al., 2003). They also have different tissue specificities, developmental activities, and expression levels (Medstrand et al., 2001; Schon et al., 2009). Alternative promoters contribute to expression diversity as they create mRNA isoforms by expanding the choice of transcription initiation sites in a gene (Jacox et al., 2010). HERV LTR elements have a potential evolutionary role in enhancing the coding capacity and regulatory versatility of the genome without compromising its integrity (Sorek, 2007). Moreover, they increase genome plasticity and benefit the species by providing alternative promoters (Akopov et al., 1998; Sverdlov, 2000). Most protein-coding genes in humans are regulated by multiple distinct promoters, suggesting that promoter choice is as important as the level of transcriptional activity. Transcriptome diversity is the key to cellular identity. Although most HERV elements appear inactive, some are still transcribed and translated in specific human tissues (Lower et al., 1996). In our previous study, we examined the LTR10A element located upstream of the original promoter sequence of NOS3. Expression analyses using RT-PCR and reporter gene assays in HCT116 and COS7 cells have demonstrated that placenta-specific expression of NOS3 is driven by the LTR10A-derived alternative promoter (Huh et al., 2008). Alternative transcripts (FPR3-1 and FPR3-2) generated by the LTR54 element have been reported to show tissue-specific patterns with strong expression in the human lung or uterus, whereas the FPR3-1 transcript in rhesus macaque is broadly expressed in various tissues (Ha et al., 2011). 
Bioinformatics analyses have revealed that the LTR12C element has multiple transcription factor-binding sites specific for the nuclear transcription factor Y (NF-Y) and that the promoter activity of LTR12C is significantly decreased after NF-Y knockdown (Jung et al., 2017). Twelve alternative transcripts of PCDH11X/Y in relation to TEs have also been identified by in silico analysis, indicating that dominant expression patterns are present in several tissues (Tx1-fetal liver, Tx3-adult brain, Tx4-adult brain and kidney, Tx5-bone marrow, Ty1-fetal brain, and Ty2-adrenal glands). Tx4 transcripts show specific expression patterns in olfactory tissues (Ahn et al., 2010). The expression of HERV LTRs varies significantly in various cell lines and shows strict cell-type specificity in some cases (Schon et al., 2001). We thus summarized functional genes containing alternative promoters derived from HERV and solitary LTR elements overlapping Z-DNA prediction sites (Table 1, Fig. 2). Among 72 known LTR-derived gene promoters, 19 (26.39%) show ZFS in the Z-hunt analysis. Placenta-specific expression of insulin-like 4 (INSL4) is mediated by the 3' LTR of the HERV element, and the latter may play a major role in INSL4 upregulation during human cytotrophoblast differentiation into syncytiotrophoblasts (Bieche et al., 2003). GSDML (gasdermin-like protein), located on human chromosome 17q21.1, is an oncogenomic recombination hotspot. We previously identified the LTR element of HERV-H with reverse orientation, which acts as an alternative promoter of GSDML, and analyzed its expression pattern in human tissues and cancer cells. The transcripts of this LTR7B-derived promoter were found to be widely distributed in various human tissues and cancer cells, whereas transcripts of the cellular promoter were found only in stomach tissues. 
A reporter gene assay for the promoter activity of LTR7B on GSDML in HCT116, HeLa, and COS7 cells revealed that the LTR7B promoter with reverse orientation had stronger promoter activity than the forward promoter (Sin et al., 2006). These findings suggest that a new transcript variant of GSDML was formed by integrating the antisense-oriented HERV-H LTR element (possibly forming Z-DNA) during hominoid evolution (Huh et al., 2008; Sin et al., 2006). HERV-H LTR sequences were found to positively regulate the transcriptional activity of GSDML. In a transient transfection assay, deletion of the U5 region resulted in a significant decrease in the transcriptional activity of GSDML (Huh et al., 2008). As shown in Fig. 3, a transcript variant (NM_018530) appeared upon integration of the LTR7B element. Within the LTR7B element, Z-hunt detected a high Z-score band of 1.5, which overlapped with Z-DNA. Taken together, genomic integration by antisense-oriented HERV and solitary LTR elements results in Z-DNA, which acts as a regulatory element, such as a promoter or enhancer. This kind of alternative promoter or enhancer can play an important biological role in human cells, including recruitment of transcription factors, regulation of gene expression, and control of genome instability, resulting in biodiversity.
Table 1

Functional genes containing alternative promoters derived from LTR elements overlapping Z-DNA prediction sites

Genes                 NCBI Gene ID   Loci        TE types
MAN1C1                57134          1p36.11     MER52A-ERV1
ZNF80                 7634           3q13.31     LTR12C-ERV1
FIS(C5orf27)          202299         5q15        LTR12C-ERV1
ZNF323(ZSCAN31)       64288          6p22.1      HERV18 int-ERVL
LOC223075(CCDC129)    223075         7p14.3      LTR12D-ERV1
FLJ45974              401337         7p12.1      MLT2B3-ERVL
HHLA1                 10086          8q24.22     HERV-H int-ERV1
LY6K                  54742          8q24.3      LTR43-ERV1
INSL4                 3641           9p24.1      LTR22B/HML-5
NOV1(C11orf40)        143501         11p15.4     LTR18B-ERV1
PRDM10                56980          11q24.3     LTR52-ERVL
LINC00615             439916         12q21.33    LTR60-ERV1
MSLN                  10232          16p13.3     MER54B-ERVL
OTOA                  146183         16p12.2     LTR45B-ERV1
CCL15                 6359           17q12       MER50-ERV1
FLJ10260(SLFN12)      55106          17q12       MER51A-ERV1
GSDML(GSDMB)          55876          17q21.1     LTR7B-ERV1
APOC1                 341            19q13.32    LTR2/HERV-E
HSPC072(LINC00652)    29075          20p11.23    MLTF-ERVL
Fig. 2

The genomic structure of functional genes containing alternative promoters derived from LTR elements overlapping with Z-DNA.

Fig. 3

Expression of the transcript variant (NM_018530) of GSDML regulated by integration of the LTR7B element as an alternative promoter.

Z-hunt detected a high Z-score band of 1.5 within the LTR7B element, which overlapped with Z-DNA.

HERV AND SOLITARY LTR ELEMENTS ARE IMPLICATED IN VARIOUS HUMAN DISEASES

HERV and solitary LTR elements can cause several human diseases such as azoospermia, multiple sclerosis, schizophrenia, diabetes, and cancer (Conrad et al., 1997; Kamp et al., 2000; Karlsson et al., 2001; Kim et al., 1999a; 1999b; Li et al., 2020; Patzke et al., 2002; Perron et al., 1997; Xiao and Xu, 2021). Apolipoprotein C1 (APOC1) appears to be an independent prognostic factor in patients with clear cell renal cell carcinoma (ccRCC). APOC1 could be a potential therapeutic target for ccRCC as it regulates cell growth and metastasis pathways (Xiao and Xu, 2021). APOC1 promotes metastasis of ccRCC cells by activating STAT3. Moreover, the metastatic potential of ccRCC cells driven by APOC1 is suppressed by DPP-4 inhibition (Li et al., 2020). Human transcripts containing HERV-E LTRs are fused to the APOC1 and endothelin B receptor (EBR) genes (Medstrand et al., 2001). Alternative transcripts of APOC1 and EBR are initiated and promoted by LTRs. In contrast to the LTR at the APOC1 locus, a significant proportion of EBR transcripts is derived from the LTR promoter in the placenta. This type of LTR element seems to have a dual role, acting both as a promoter and an enhancer for the expression of neighboring functional genes in specific tissues (Fig. 4).
Fig. 4

Schematic illustration of the mechanism by which LTRs containing ZFS act as regulatory elements.

Some HERVs and solitary LTRs originating from exogenous retrovirus infection could carry potential ZFS and be integrated into regions neighboring functional genes. This integration could result in a Z-DNA conformation, and LTRs containing ZFS might act as alternative promoters or enhancers of a functional gene.

LY6K-AS long noncoding RNA (lncRNA) is an anti-sense transcript, known as a prognostic biomarker, which shows elevated expression in patients with lung adenocarcinoma (LUAD); higher expression of LY6K-AS in LUAD predicts poor survival outcomes, indicating that LY6K-AS silencing is a promising therapeutic option, which inhibits oncogenic mitotic progression in LUAD (Ali et al., 2021). Osteoarthritis (OA) is the most prevalent articulating joint disease in humans and frequently results in joint pain, movement limitations, inflammation, and progressive degradation of the articular cartilage. LncRNAs are involved in multiple cellular and biological processes. Moreover, numerous lncRNAs are differentially expressed in human OA cartilage (Abbasifard et al., 2020). LncRNA-mRNA co-expression analysis has revealed a remarkable relationship, wherein OTOA may play a critical role in the differential mechanisms of OA progression between Tibetan and Han Chinese populations (Luo et al., 2021). LncRNA-derived LINC00652 can exert biological functions by co-expression with prognostic genes (INSL3, SNAP91, and REN) and lipid metabolism-related genes (MIA2, APOA1). Accordingly, this lncRNA-mRNA-based classifier might be clinically useful for predicting the recurrence and prognosis of childhood acute lymphoblastic leukemia (ALL) (Qi et al., 2021). DYX1C1 is a candidate gene for developmental dyslexia. One of its transcripts has been directly associated with an HERV-H LTR element, and alternatively spliced transcript variants of DYX1C1 have been demonstrated as potential biomarkers to detect colorectal cancer (Kim et al., 2009; 2011). Moreover, DYX1C1 expression in breast cancer is associated with several clinicopathological parameters, whereas loss of DYX1C1 is correlated with a more aggressive disease (Rosin et al., 2012). In our previous study, we detected transcript variants (a and b) of the choroideremia (CHM) gene in human cancer cells and tissues. 
High expression levels of CHM isoform b caused by an LTR12C element offering an alternative splicing site were detected in colon and lung cancer cell lines and in tissues of patients with colon cancer (Jung et al., 2011). Identification of alternatively spliced variants as biomarkers to distinguish between normal and cancer cells could thus enhance the existing understanding of tumorigenesis. Compared with the adjacent normal tissues, high expression levels of HERV-K were noted in testis tumor tissues, HERV-R in liver and lung tumor tissues, HERV-H in liver, lung, and testis tumor tissues, and HERV-P in colon and liver tumor tissues (Ahn and Kim, 2009). Human HHLA1 (HERV-H LTR-associating 1) and OC90 (otoconin-90) are normally expressed independently from different promoters but are expressed from the LTR promoter and are spliced together in teratocarcinoma cells, indicating that the strong activity of the LTR promoter in this cell type could induce transcriptional fusion of these two genes (Kowalski et al., 1999). The PLA2L (phospholipase A2-like domain) 5'-HERV-H sequence functions as an abnormally long and complex 5'-UTR, resulting in suppressed translation in both teratocarcinoma cell lines and full-length cDNA transfectants (Kowalski and Mager, 1998). Taken together, HERV and solitary LTR elements have been randomly integrated into neighboring functional genes during primate evolution and have evolved as regulatory elements, such as promoters or enhancers. Moreover, they provide alternative splicing sites and binding sites for transcription factors that control tissue-specific gene expression and transcript variants in relation to various human diseases.

Supplemental Materials

Note: Supplementary information is available on the Molecules and Cells website (
  77 in total

Review 1.  Retroviruses and primate evolution.

Authors:  E D Sverdlov
Journal:  Bioessays       Date:  2000-02       Impact factor: 4.345

2.  Expression and phylogenetic analyses of human endogenous retrovirus HC2 belonging to the HERV-T family in human tissues and cancer cells.

Authors:  Joo-Mi Yi; Heui-Soo Kim
Journal:  J Hum Genet       Date:  2007-02-03       Impact factor: 3.172

3.  Supercoiling of the DNA template during transcription.

Authors:  L F Liu; J C Wang
Journal:  Proc Natl Acad Sci U S A       Date:  1987-10       Impact factor: 11.205

Review 4.  The role and function of long non-coding RNAs in osteoarthritis.

Authors:  Mitra Abbasifard; Zahra Kamiab; Zahra Bagheri-Hosseinabadi; Iman Sadeghi
Journal:  Exp Mol Pathol       Date:  2020-02-21       Impact factor: 3.362

5.  Two long homologous retroviral sequence blocks in proximal Yq11 cause AZFa microdeletions as a result of intrachromosomal recombination events.

Authors:  C Kamp; P Hirschmann; H Voss; K Huellen; P H Vogt
Journal:  Hum Mol Genet       Date:  2000-10-12       Impact factor: 6.150

6.  Allelic variation of HERV-K(HML-2) endogenous retroviral elements in human populations.

Authors:  Catriona Macfarlane; Peter Simmonds
Journal:  J Mol Evol       Date:  2004-11       Impact factor: 2.395

7.  Molecular phylogenetic analysis of the human endogenous retrovirus E (HERV-E) family in human tissues and human cancers.

Authors:  Joo-Mi Yi; Heui-Soo Kim
Journal:  Genes Genet Syst       Date:  2007-02       Impact factor: 1.517

8.  Molecular characterization of the DYX1C1 gene and its application as a cancer biomarker.

Authors:  Yun-Ji Kim; Jae-Won Huh; Dae-Soo Kim; Min-In Bae; Ja-Rang Lee; Hong-Seok Ha; Kung Ahn; Tae-Oh Kim; Geun-Am Song; Heui-Soo Kim
Journal:  J Cancer Res Clin Oncol       Date:  2008-07-10       Impact factor: 4.553

9.  Variation in proviral content among human genomes mediated by LTR recombination.

Authors:  Jainy Thomas; Hervé Perron; Cédric Feschotte
Journal:  Mob DNA       Date:  2018-12-18

10.  Promoter expression of HERV-K (HML-2) provirus-derived sequences is related to LTR sequence variation and polymorphic transcription factor binding sites.

Authors:  Meagan Montesion; Zachary H Williams; Ravi P Subramanian; Charlotte Kuperwasser; John M Coffin
Journal:  Retrovirology       Date:  2018-08-20       Impact factor: 4.602
