Jonathan D Williams1,2, Dominika Houserova3, Bradley R Johnson2, Brad Dyniewski2, Alexandra Berroyer2, Hannah French2, Addison A Barchie4, Dakota D Bilbrey4, Jeffrey D Demeis4, Kanesha R Ghee4, Alexandra G Hughes4, Naden W Kreitz4, Cameron H McInnis4, Susanna C Pudner4, Monica N Reeves4, Ashlyn N Stahly4, Ana Turcu4, Brianna C Watters4, Grant T Daly3, Raymond J Langley3, Mark N Gillespie3, Aishwarya Prakash3,5, Erik D Larson2,6, Mohan V Kasukurthi7, Jingshan Huang7, Sue Jinks-Robertson1, Glen M Borchert3,4.
Abstract
Mammalian antibody switch regions (∼1500 bp) are composed of a series of closely neighboring G4-capable sequences. Whereas numerous structural and genome-wide analyses of roles for minimal G4s in transcriptional regulation have been reported, long G4-capable regions (LG4s), like those at antibody switch regions, remain virtually unexplored. Using a novel computational approach, we have identified 301 LG4s in the human genome and find LG4s prone to mutation and significantly associated with chromosomal rearrangements in malignancy. Strikingly, 217 LG4s overlap annotated enhancers, and we find the promoters regulated by these enhancers markedly enriched in G4-capable sequences, suggesting G4s facilitate promoter-enhancer interactions. Finally, and much to our surprise, we also find that single-stranded loops of minimal G4s within individual LG4 loci are frequently highly complementary to one another, with 178 LG4 loci averaging >35 internal loop:loop complements of >8 bp. As such, we hypothesized (then experimentally confirmed) that G4 loops within individual LG4 loci directly basepair with one another (similar to characterized stem-loop kissing interactions), forming a hitherto undescribed, higher-order, G4-based secondary structure we term a 'G4 Kiss or G4K'. In conclusion, LG4s adopt novel, higher-order, composite G4 structures directly contributing to the inherent instability, regulatory capacity, and maintenance of these conspicuous genomic regions.
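The loop:loop complementarity described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which the abstract does not detail. The sketch assumes that loops are the stretches between consecutive runs of three or more guanines on one strand, that `max_loop` (a hypothetical parameter) caps loop length, and that ">8 bp" means a shared reverse-complementary stretch of at least 9 nt (`min_len=9`). All function names are illustrative.

```python
import re
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def g4_loops(seq, min_run=3, max_loop=30):
    """Collect the sequences lying between consecutive G-runs.

    Assumption: a 'loop' is any stretch of 1..max_loop nt between two
    runs of at least `min_run` guanines on the given strand.
    """
    runs = list(re.finditer("G{%d,}" % min_run, seq))
    loops = []
    for left, right in zip(runs, runs[1:]):
        loop = seq[left.end():right.start()]
        if 1 <= len(loop) <= max_loop:
            loops.append(loop)
    return loops


def longest_common_substring(a, b):
    """Longest contiguous substring shared by a and b (simple DP)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def complementary_loop_pairs(seq, min_len=9):
    """List (i, j, stretch) for loop pairs that could base-pair.

    `stretch` is the part of loop i that is reverse-complementary to
    loop j; only stretches of at least `min_len` nt are reported.
    """
    loops = g4_loops(seq)
    hits = []
    for (i, a), (j, b) in combinations(enumerate(loops), 2):
        shared = longest_common_substring(a, reverse_complement(b))
        if len(shared) >= min_len:
            hits.append((i, j, shared))
    return hits
```

Counting the hits returned per locus gives a rough analogue of the "internal loop:loop complements" figure quoted above, although the thresholds and loop definition here are assumptions and would need to be matched to the published method before any comparison.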
Year: 2020 PMID: 32383760 PMCID: PMC7293029 DOI: 10.1093/nar/gkaa357
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. G4 DNA. (A) Illustration of guanine quartet with each guanine engaged in four hydrogen bonds and a central potassium cation coordinately bound. (B) Structural illustration depicting unimolecular antiparallel G4 DNA.
Figure 2. LG4s in the Human Genome. (A) Genome-wide distribution of LG4s (red bars) on the left and examples of hits on the right with >3 bp G-repeats highlighted in red. (B) Distribution of locations with respect to annotated genes for LG4 and control loci. (C) The length distribution of LG4 loci. (D) The number of LG4 and control loci on each chromosome, with asterisks indicating a significant difference. (E) Distribution of LG4 loci with respect to the distance from the ends of each chromosome.
Figure 3. LG4s are capable of G4 formation. (A) The average number of non-overlapping G4 motifs predicted by QGRS mapper per kb (G4 motif density) in LG4s, regions directly flanking LG4s (flanking) and control loci (Ctrl). (B) Circular dichroism ellipticities of oligonucleotides representing LG4s. (C) Klenow DNA polymerase primer-extension reactions of G- or C-rich single-stranded Sγ3 or DIP2C DNA templates in buffer containing K+ or Li+. (D) Primer-extension reactions of the C-rich LG4 strand of loci shown to stall polymerase in G4-supportive and non-supportive conditions. Reactions were in G4-supportive conditions (K+). (E) Klenow primer-extension reactions on LG4 G-rich templates in different G4-permissive conditions. LG4 sequences are denoted at the top of lanes; areas of stalled DNA synthesis are denoted by the brackets and full-length replication products are denoted by arrows. Sγ3 and Sμ are G4 model sequences previously shown to form G4 in vitro (24,52).
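The "G4 motif density" in panel A (non-overlapping G4 motifs per kb) can be approximated with a short Python sketch. It does not reproduce QGRS mapper or its scoring; it assumes the commonly used minimal motif of four runs of at least three guanines separated by loops of 1 to 7 nt, searched on one strand only. The function name and pattern are illustrative.

```python
import re

# Assumed minimal G4 motif: four G-runs (>=3 nt) joined by loops of 1-7 nt.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def g4_motif_density(seq):
    """Return non-overlapping G4 motif matches per kb of sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # re.finditer scans left to right and yields non-overlapping matches.
    count = sum(1 for _ in G4_PATTERN.finditer(seq))
    return count / (len(seq) / 1000.0)
```

Applying the same function to an LG4, its flanking sequence, and a size-matched control would give a comparison in the spirit of panel A, with the caveat that a regex count can differ from QGRS mapper output.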
Figure 4. LG4s are associated with increased small and large-scale genome variation. (A) Entries from the dbSNP database for LG4s and size-matched control regions. (B) Entries from the dbSNP database for LG4s and intronic regions directly surrounding LG4s (flanking). (C) CNV sizes were taken from dbVar, and the average number of breakpoints/kb (y-axis) was calculated for each transcribed LG4, in 1 kb increments away from the LG4, and for the rest of the transcript not directly associated with the LG4 (Unassociated TXN). (D) Schematic diagram of the location and size of copy-number variants with respect to the predominant TMPRSS2 transcript. Introns, exons and UTRs are denoted by lines, solid boxes, or open boxes, respectively. The yellow dashed box highlights the location of the TMPRSS2 LG4; individual duplications (blue boxes) and deletions (red boxes) are shown.
Figure 5. A DIP2C intronic LG4 is prone to deletions and duplications over 10 bp. (A) A 130 bp fragment of the DIP2C LG4 was cloned into yeast. The G-repeats are in red. A table of LG4 genetic traits and human genome variation compared to non-exon regions surrounding all LG4s is below. (B) The LG4 LYS2 reversion window was sequenced for revertants and rates adjusted by proportion of a given mutation type. Error bars are 95% confidence intervals and adjusted for the total rate. (C) Mutation spectrum for deletions (red bars) and duplications (green bars) with corresponding size and number detected for both Lys +1 sequences (top) and Lys −1 (bottom). (D) The bp of perfect homology at the end of duplications/deletions.
Figure 6. Recurrent translocations between LG4s. (A) Mechanism of isotype switching in activated B cells. Resulting mRNA indicated below genomic sequences. V; variable exon. D; diversity exon. J; joining exon. IgM; immunoglobulin M. IgG1; immunoglobulin G subclass 1. μ; immunoglobulin M heavy chain exon. δ; immunoglobulin D heavy chain exon. γ3; immunoglobulin G subclass 3 heavy chain exon. γ1; immunoglobulin G subclass 1 heavy chain exon. ϵ; immunoglobulin E heavy chain exon. α; immunoglobulin A heavy chain exon. 2β; immunoglobulin G subclass 2 beta heavy chain. 2α; immunoglobulin G subclass 2 alpha heavy chain (32,52,127–130). (B) Gene fusion (40,41) resulting from distinct LG4 breaks independently reported in breast (TCGA-BRCA) and cervical (TCGA-CESC) malignancies. SBNO2; Strawberry notch homolog 2. TPGS1; Tubulin Polyglutamylase Complex Subunit 1. (C) Model of potential LG4 super structure.
Figure 7. LG4 Kissing Loops. (A) Predicted model of neighboring G4 loop interaction in the human LG4 locus at hg38:chr17:1052333–1053488. Complementary loops are highlighted in yellow. Guanine triplets are highlighted in red. (B) Nondenaturing G4 gel electrophoresis of minimal G4-capable sequence (taken from the LG4 detailed in A). Lane 1 – tandem of two minimal G4-capable sequences with complementary loops. Lanes 2 and 3 – tandem of two minimal G4-capable sequences with complementary loops replaced by adenosines. Upper gel image: EtBr stain (orange) only. The affinity of EtBr for dsDNA is ∼25 times its affinity for ssDNA. Lower gel image: Subsequent Thioflavin (blue) staining of the identical gel shown in the upper image. (C) Nondenaturing G4 gel electrophoresis of G4-capable sequences with complementary loops from known HCV kissing hairpins. (Left) Control oligonucleotide lacking complementary loops. (Right) Oligonucleotide containing complementary loops known to participate in HCV hairpin kissing (111). Upper gel image: EtBr stain (orange) only. Lower gel image: Subsequent Thioflavin (blue) staining of the identical gel shown in the upper image. (D) Additional example of an LG4 locus with loops described as ‘Self complements’ (like the LG4 in A). (E) Example of an LG4 locus with loops described as ‘Neighboring complements’.
Figure 8. LG4 enhancer model. Potential mechanism by which LG4s could interact with multiple gene promoters to coordinate their expression.