Wilfried M Guiblet, Michael DeGiorgio, Xiaoheng Cheng, Francesca Chiaromonte, Kristin A Eckert, Yi-Fei Huang, Kateryna D Makova.
Abstract
Approximately 1% of the human genome has the ability to fold into G-quadruplexes (G4s), noncanonical strand-specific DNA structures that form at G-rich motifs. G4s regulate several key cellular processes (e.g., transcription) and have been hypothesized to participate in others (e.g., firing of replication origins). Moreover, G4s differ in their thermostability, and this may affect their function. Yet, G4s may also hinder replication, transcription, and translation and may increase genome instability and mutation rates. Therefore, depending on their genomic location, thermostability, and functionality, G4 loci might evolve under different selective pressures, which has never been investigated. Here we conducted the first genome-wide analysis of G4 distribution, thermostability, and selection. We found an overrepresentation, high thermostability, and purifying selection for G4s within genic components in which they are expected to be functional: promoters, CpG islands, and 5' and 3' UTRs. A similar pattern was observed for G4s within replication origins, enhancers, eQTLs, and TAD boundary regions, strongly suggesting their functionality. In contrast, G4s on the nontranscribed strand of exons were underrepresented, were unstable, and evolved neutrally. In general, G4s on the nontranscribed strand of genic components had lower density and were less stable than those on the transcribed strand, suggesting that the former are avoided at the RNA level. Across the genome, purifying selection was stronger at stable G4s. Our results suggest that purifying selection preserves the sequences of functional G4s, whereas nonfunctional G4s are too costly to be tolerated in the genome. Thus, G4s are emerging as fundamental, functional genomic elements.
Year: 2021 PMID: 34187812 PMCID: PMC8256861 DOI: 10.1101/gr.269589.120
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Schematic presentation of the G4 consensus motif and structure.
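The consensus motif itself is not spelled out in this text. As a rough illustration only, the sketch below assumes the commonly cited canonical form of four runs of at least three guanines separated by loops of 1 to 7 nucleotides; the loop bounds, the function name `find_g4_motifs`, and the example sequences are assumptions, not the definition used in the study. Because G4s are strand-specific, a C-rich match on the reference strand is reported as a G4 motif on the opposite strand.

```python
import re

# Assumed canonical motif: G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}.
# The loop bounds are illustrative; the study's predictor may use others.
G4_PLUS = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
# The same motif on the opposite strand appears as C-runs on the reference.
G4_MINUS = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping matches.

    Coordinates are 0-based, end-exclusive, on the given sequence.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    print(find_g4_motifs("TTGGGAGGGAGGGAGGGTT"))
    print(find_g4_motifs("AACCCTCCCTCCCTCCCAA"))
```

A regular expression only flags candidate motifs; it says nothing about whether a structure forms or how stable it is.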
Figure 2. Fold differences in mean density of G4 loci located at different genic components (A) and nongenic functional regions (B) compared with the genome-wide average and the noncoding nonrepetitive (NCNR) subgenome. The red horizontal line indicates no difference compared with the genome-wide average.
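The fold difference plotted in Figure 2 can be read as a ratio of densities. The sketch below assumes density means G4 loci per base pair of the region considered; the exact density definition used in the study is not given here, and the function names and counts are hypothetical.

```python
def g4_density(n_loci, region_bp):
    """G4 loci per base pair (assumed definition of density)."""
    if region_bp <= 0:
        raise ValueError("region_bp must be positive")
    return n_loci / region_bp


def fold_difference(region_loci, region_bp, baseline_loci, baseline_bp):
    """Density in a region divided by the density in a baseline.

    The baseline can be the whole genome or the NCNR subgenome.
    A value of 1 means no difference (the red line in Figure 2).
    """
    baseline = g4_density(baseline_loci, baseline_bp)
    if baseline == 0:
        raise ValueError("baseline density is zero")
    return g4_density(region_loci, region_bp) / baseline


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(fold_difference(30, 1_000, 10, 1_000))
```

Values above 1 correspond to overrepresentation relative to the baseline and values below 1 to underrepresentation.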
Figure 3. Distribution of stability scores (violin plots) at G4 loci located at different genic components (A) and nongenic functional regions (B) compared with the genome-wide distribution and with the distribution in the NCNR subgenome. Stability scores were obtained with the Quadron software (Sahakyan et al. 2017a). Median values are marked on the violin plots. The number of G4 loci contained completely within components or regions (Supplemental Table S1) is shown in parentheses. Because eQTLs are always smaller than G4 loci, we plotted the scores of G4 loci only partially intersecting with eQTLs. The red horizontal line indicates the stability score of 19 used to differentiate between stable (more than 19) and unstable (19 or fewer) G4 loci. Stars indicate a significant difference between the median stability score in a group of components or regions and that in the rest of the genome.
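The stable/unstable split described in the Figure 3 caption is a simple threshold rule. A minimal sketch follows; the cutoff of 19 comes from the caption, while the function names and the example scores are made up for illustration.

```python
from statistics import median

# Cutoff from the Figure 3 caption: stable if the score is more than 19,
# unstable if it is 19 or fewer.
STABILITY_THRESHOLD = 19


def is_stable(score):
    """True when a stability score is strictly above the cutoff."""
    return score > STABILITY_THRESHOLD


def summarize(scores):
    """Median score plus counts of stable and unstable loci."""
    n_stable = sum(1 for s in scores if is_stable(s))
    return {
        "median": median(scores),
        "n_stable": n_stable,
        "n_unstable": len(scores) - n_stable,
    }


if __name__ == "__main__":
    # Hypothetical scores, not data from the study.
    print(summarize([10, 19, 25]))
```

Note that a score of exactly 19 falls in the unstable class under the caption's definition.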
Figure 4. Odds ratios (and their 95% confidence intervals) from Fisher's exact test, used to assess the significance of the Hudson–Kreitman–Aguadé test for selection acting on G4 loci located at genic components and nongenic functional regions. Stable and unstable G4 loci are considered together (A) and separately (B). The red line represents the expectation under the null hypothesis of similar selective pressure acting on G4 loci and on the remaining sequences at the components/regions they are located within. If a confidence interval does not overlap the red line, the test is significant. Sample sizes are shown in Supplemental Table S1.
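The reading rule in the Figure 4 caption (an interval that excludes the null value is significant) can be sketched with a 2x2 table. The example below is not the authors' procedure: it uses the sample odds ratio with a Wald (Woolf) interval on the log scale, whereas Fisher's exact test implementations typically report a conditional estimate. The table layout (G4 loci versus remaining sequence, with two count types per row) and all names and counts are assumptions.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Sample odds ratio and approximate 95% CI for the table [[a, b], [c, d]].

    Assumed layout: row 1 = G4 loci, row 2 = remaining sequence in the
    same component; the two columns hold the two count types compared.
    Uses the Wald (Woolf) log-scale interval, not an exact one.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high


def excludes_null(low, high, null_value=1.0):
    """True when the interval does not overlap the null odds ratio of 1."""
    return not (low <= null_value <= high)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    est, low, high = odds_ratio_ci(10, 20, 20, 10)
    print(est, low, high, excludes_null(low, high))
```

An odds ratio of 1 corresponds to the red line in the figure; the direction of any departure depends on how the table rows and columns are ordered.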