
Potential G-Quadruplex Forming Sequences and N6-Methyladenosine Colocalize at Human Pre-mRNA Intron Splice Sites.

Manuel Jara-Espejo1,2, Aaron M Fleming1, Cynthia J Burrows1.   

Abstract

Maturation of mRNA in humans involves modifying the 5′ and 3′ ends, splicing introns, and installing epitranscriptomic modifications that are essential for mRNA biogenesis. Epitranscriptomic modifications are usually installed in specific consensus motifs, yet not all such sequences are modified, suggesting a secondary structural component to site selection. Using bioinformatic analysis of published data, we find that potential RNA G-quadruplex (rG4) sequences in human mature mRNA colocalize with the epitranscriptomic modifications N6-methyladenosine (m6A), pseudouridine (Ψ), and inosine (I). Using the only pre-mRNA data sets available in the literature, we demonstrate that colocalization of potential rG4s with m6A was the greatest overall and occurred in introns near 5′ and 3′ splice sites. The m6A-bearing potential rG4s most commonly had short loops composed of single A nucleotides. This observation is consistent with a literature report of intronic m6A found in SAG (S = C or G) consensus motifs that are also recognized by splicing factors. The localization of m6A and potential rG4s at intron splice junctions in pre-mRNA suggests that these features could function together in alternative splicing. A similar analysis for potential rG4s around sites of Ψ installation or A-to-I editing in mRNA also found colocalization; however, the frequency was lower than that observed with m6A. These bioinformatic analyses guide a discussion of future experiments to understand how noncanonical rG4 structures may collaborate with epitranscriptomic modifications in the human cellular context to impact cellular phenotype.

Year:  2020        PMID: 32396327      PMCID: PMC7309266          DOI: 10.1021/acschembio.0c00260

Source DB:  PubMed          Journal:  ACS Chem Biol        ISSN: 1554-8929            Impact factor:   5.100


Introduction

The process of pre-mRNA maturation is of considerable interest because each step can impact the stability, coding potential, or localization of the mature mRNA.[1−5] Maturation of mRNA involves installation of a 5′ cap, addition of a 3′-polyadenosine tail, writing of epitranscriptomic modifications, and intron excision.[6] Processing of the 5′ and 3′ ends of mRNA has been studied, although recent work suggests there is more to learn.[7,8] Understanding of how epitranscriptomic modifications are written and how mRNA is alternatively spliced is advancing rapidly as a result of next-generation sequencing (NGS) and the expansion of bioinformatic tools.[1−4] In human cells, NGS-based studies in tandem with knockdown or knockout of established epitranscriptomic readers, writers, or erasers have shown that these modifications are involved in all aspects of mRNA biogenesis, including splicing, nuclear export, translation efficiency, and cellular half-life. The collaboration between epitranscriptomic modifications and alternative mRNA splicing has recently been described,[9] but the structural features of mRNA driving the process remain unclear. A better understanding of the epitranscriptome and alternative mRNA splicing will enable future researchers to modulate these pathways judiciously in cells and could provide new therapeutic opportunities to target these pathways for disease treatment.

N6-Methyladenosine

The best studied epitranscriptomic modification in mRNA is N6-methyladenosine (m6A; Figure 1A). The methyl group is written on A nucleotides in specific sequence motifs of mRNA by the SAM-dependent METTL3/14 methyltransferase complex at a frequency of 0.1–1% of A nucleotides; however, not all such target sequences are modified.[1−5] The DRACH (D = A, G, or U; R = G or A; H = A, C, or U; A = m6A) consensus sequence appears to be a dominant site of m6A installation in mature mRNA; in contrast, at splice sites within intron sequences of pre-mRNA, m6A is installed in SAG (S = C or G) motifs.[9,10] To date, the accessory protein WTAP appears to be the most essential for guiding the methylation process, while VIRMA, ZC3H13, and RBM15, for example, have also been implicated in writing m6A on mRNA.[3,4,11] Once written on mRNA, m6A is read by the cytosolic YTHDF1–3 proteins, and the modification can be removed by the demethylases FTO or ALKBH5, imparting dynamics to this epitranscriptomic system.[1−5] Recent studies suggest that RNA secondary structure plays a key role in the selection of m6A-modified sites.[12−14] Future work is needed to clarify the structural component in the selection of epitranscriptomically modified mRNA sites.
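To make the two consensus motifs concrete, the following minimal Python sketch scans an RNA sequence for DRACH and SAG matches and reports the position of the central A, the candidate m6A. It is an illustration only, not the procedure used in the cited mapping studies; `MOTIFS` and `candidate_sites` are hypothetical names, and the IUPAC classes are taken from the definitions given above.

```python
import re

# IUPAC classes as defined in the text:
#   D = A/G/U, R = G/A, H = A/C/U (DRACH); S = C/G (SAG).
# Each entry maps a motif name to (regex, offset of the candidate A within the motif).
MOTIFS = {
    "DRACH": ("[AGU][GA]AC[ACU]", 2),
    "SAG": ("[CG]AG", 1),
}


def candidate_sites(rna: str, motif: str) -> list[int]:
    """Return 0-based positions of the central A for every motif match.

    The pattern is wrapped in a zero-width lookahead so that overlapping
    matches (e.g. adjacent SAG motifs) are all reported.
    """
    pattern, offset = MOTIFS[motif]
    return [m.start() + offset for m in re.finditer(f"(?={pattern})", rna.upper())]


if __name__ == "__main__":
    seq = "GGACUGAGCAG"
    print(candidate_sites(seq, "DRACH"))  # [2]
    print(candidate_sites(seq, "SAG"))    # [6, 9]
```

A motif match only marks a candidate site; as noted above, many such sequences are never methylated, which is why a structural component to site selection is being sought.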
Figure 1

(A) Modifications m6A, Ψ, and I are epitranscriptomic. (B) Maturation of mRNA involves intron splicing; potential G-quadruplex sequences are found in introns near splice sites.

Pseudouridine and A-to-I Editing

A second modification is the isomerization of uridine to pseudouridine (Ψ; Figure 1A) in mRNA, which occurs at a frequency of 0.2–0.6% of U nucleotides.[1] Of the many human pseudouridine synthases, PUS1 and PUS7 are established to isomerize U to Ψ in mRNA.[2,15] Similar questions regarding site selection for Ψ installation have been asked, and a recent report found that PUS1 targets hairpin-type structures.[2] Lastly, two human adenosine deaminase RNA specific (ADAR) proteins catalyze A-to-I editing, which yields inosine (I; Figure 1A) as another epitranscriptomic modification.[16] RNA editing has been studied extensively, and editing sites have been cataloged in databases for researchers to interrogate.[17] ADARs edit RNA in the central region of long dsRNA, with some minor sequence and local structural context requirements noted.[18] More analyses are needed to clarify the RNA structural and sequence requirements that dictate sites of m6A and Ψ epitranscriptomic modification, as well as to explore whether other sequence motifs can be sites of A-to-I RNA editing.[12,18,19]

Pre-mRNA and G-Quadruplexes (G4s)

Alternative splicing of nascent or pre-mRNA to yield mature mRNA is a highly regulated eukaryotic process that allows a single gene to code for multiple proteins.[20] This diversification occurs by inclusion or exclusion of particular exons in the final processed mRNA. In humans, ∼95% of multiexonic genes are alternatively spliced, enabling the ∼20 000 protein-coding genes to direct synthesis of a much greater diversity of proteins. Alternative splicing of mRNA is cell-type specific and changes with oxidative stress or disease.[20,21] The mRNA features and epitranscriptomic components that determine when and to what extent alternative splicing occurs in particular cell types have been topics of recent studies,[20,21] and future work is needed to better understand the RNA structural details guiding the process. Genomes code for all the information found in RNA (Figure 1B), and the human genome is enriched in potential G-quadruplex forming sequences (PQSs) in promoters, 5′-untranslated regions, and introns near splice sites.[22] G-Quadruplexes (G4s) are noncanonical nucleic acid folds formed by sequences with four or more closely spaced runs of G, which can fold around intracellular K+ ions (Figure 1B).[23] Stable DNA G4s have G-runs of three or more Gs, whereas in RNA G4s (rG4s), runs of only two Gs can adopt stable folds.[23−25] PQSs in the nontemplate (i.e., coding) strand of a gene will also be present in the RNA transcript (Figure 1B). We have noted that potential rG4s in the Zika and HIV viral RNAs colocalize with sites of m6A installation;[13] however, whether a similar colocalization exists in the human transcriptome had not yet been studied.
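The strand rule in the last few sentences can be checked with a few lines of Python. This is a sketch under simple assumptions (DNA given 5′→3′, no splicing or editing), and the helper names are hypothetical. It shows that a G-rich run on the coding strand reappears as a G-rich transcript, whereas the same run on the template strand yields a C-rich transcript that cannot form an rG4.

```python
# Complement table for DNA bases.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rna_from_coding_strand(coding_dna: str) -> str:
    """The transcript has the same sequence as the coding (nontemplate) strand, with U for T."""
    return coding_dna.upper().replace("T", "U")


def rna_from_template_strand(template_dna: str) -> str:
    """The transcript is the reverse complement of the template strand, with U for T."""
    return template_dna.upper().translate(DNA_COMPLEMENT)[::-1].replace("T", "U")


if __name__ == "__main__":
    pqs = "GGGTTAGGGTTAGGGTTAGGG"  # G-rich sequence written 5'->3'
    print(rna_from_coding_strand(pqs))    # GGGUUAGGGUUAGGGUUAGGG (G-rich)
    print(rna_from_template_strand(pqs))  # CCCUAACCCUAACCCUAACCC (C-rich)
```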

Bioinformatic Analysis of G4s and Epitranscriptomic Modifications

In the present set of bioinformatic studies, mapping data for m6A and Ψ in human mature mRNA, as well as for I in the human transcriptome, were inspected for colocalization with potential rG4s. We found enrichment of each epitranscriptomic modification around potential rG4s in the data inspected. A second analysis examined the only available pre-mRNA modification maps, which report m6A, and found a greater colocalization of m6A with potential rG4s than was observed in mature mRNA for any of the three modifications analyzed. This finding suggests a possible synergy in the deposition of m6A at pre-mRNA splice sites around potential rG4s. Biochemical studies suggest that m6A is written on nascent mRNA cotranscriptionally; therefore, chromatin features such as genomic sequences and structures or histone modifications that interact with the mRNA synthesis machinery can impact where and to what extent epitranscriptomic modifications are installed.[3,9,26,27] Here, we find that colocalized sites of m6A and rG4s in pre-mRNA track with the genomic G4s found in introns, suggesting a possible long-range interplay between chromatin structure and the epitranscriptome. In the final analysis of m6A sites within potential rG4s of intronic pre-mRNA, a preference for rG4 loop sequences containing a modifiable A nucleotide was identified. The SAG sequence previously reported as the consensus motif for m6A deposition in intronic sequences was found to be part of a larger rG4 structural context. The publicly available RNA modification maps of human mRNA used in the present study are summarized in Table S1.[9,12,17,27−29] The peak summit of each epitranscriptomic modification site was identified, and a window of ±30 nucleotides flanking each summit was then inspected computationally for PQSs.
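The windowing step can be sketched as follows. The text specifies only a ±30-nucleotide window around each peak summit; the 0-based summit coordinate, the inclusive 61-nt window, clipping at sequence ends, and reverse complementation for minus-strand peaks are assumptions of this hypothetical `summit_window` helper.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def summit_window(genome_seq: str, summit: int, strand: str = "+", flank: int = 30) -> str:
    """Return the RNA-sense sequence within +/- flank nt of a 0-based peak summit.

    The window includes both ends (2 * flank + 1 nt) and is clipped at the
    sequence boundaries. T is converted to U; for minus-strand peaks the
    window is reverse complemented so that it reads 5'->3' in the transcript.
    """
    start = max(0, summit - flank)
    end = min(len(genome_seq), summit + flank + 1)
    window = genome_seq[start:end].upper().replace("T", "U")
    if strand == "-":
        window = window.translate(RNA_COMPLEMENT)[::-1]
    return window
```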
The quadruplex-forming G-rich sequence (QGRS) Mapper algorithm was used to search for the sequence pattern $5'\text{-}G_x L_{\leq 7} G_x L_{\leq 7} G_x L_{\leq 7} G_x\text{-}3'$, where x ≥ 2 is the number of G nucleotides in each run and L represents the loops of up to 7 nucleotides.[30] The first data set inspected mapped m6A in mature mRNA from HeLa cells using the m6A-CLIP sequencing protocol.[27] In these mature HeLa mRNA data, 17% (7838 out of 46355) of the m6A-enriched regions also contained a PQS (Figure 2A). Inspection of m6A mapped via miCLIP in mature mRNA from HEK293 cells[29] found that 18% (3774 out of 20579) of the m6A-enriched regions colocalized with a PQS (Figure 2A). These data from two different cell lines indicate that nearly one-fifth of the m6A peaks colocalize with a PQS in human mature mRNA.
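A rough regular-expression stand-in for the PQS search is sketched below. It is not QGRS Mapper, which additionally scores candidates (G-score) and resolves overlapping hits. This sketch assumes loops of 1 to 7 nt that may contain G, caps the G-run length at x = 6, and uses hypothetical helper names. Each run length is tested separately because, for example, a three-tetrad PQS with 7-nt loops does not always match the x = 2 pattern under the 7-nt loop limit.

```python
import re


def pqs_regex(tetrads: int, max_loop: int = 7) -> re.Pattern:
    """Four blocks of `tetrads` consecutive G's separated by loops of 1..max_loop nt."""
    g = f"G{{{tetrads}}}"
    loop = f"[ACGU]{{1,{max_loop}}}"
    return re.compile(g + loop + g + loop + g + loop + g)


# x = 2..6 G's per run; the upper bound of 6 is an assumption of this sketch.
PQS_PATTERNS = [pqs_regex(x) for x in range(2, 7)]


def has_pqs(window: str) -> bool:
    """True if a PQS with any x in 2..6 occurs in the window."""
    w = window.upper()
    return any(p.search(w) for p in PQS_PATTERNS)


def colocalization(windows: list[str]) -> tuple[int, int, float]:
    """(windows with a PQS, total windows, fraction) for a set of modification windows."""
    hits = sum(has_pqs(w) for w in windows)
    return hits, len(windows), (hits / len(windows) if windows else 0.0)
```

The returned fraction corresponds to the percentages reported above, such as 17% (7838 out of 46355) for HeLa mature mRNA; counts from this regex would not be expected to match QGRS Mapper output exactly.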
Figure 2

Inspection of human mature mRNA sequenced for m6A, Ψ, and A-to-I editing sites to determine whether a PQS resides in the same location. (A) Bar plot showing the number of m6A enriched sites in HeLa and HEK293 mature mRNA that also have a PQS overlapping with the site of modification. (B) Classification of the PQSs found in the HEK293 mature mRNA on the basis of the number of G-tetrads that can form from the sequence. (C) Plot of PQS enrichment (observed/expected) found in the m6A or Ψ epitranscriptomic modification sites in mature mRNA from HEK293 or HeLa cells, and A-to-I editing sites in the Inosinome Atlas.[17,27−29]

In the next step of the analysis, the PQSs found in sites of m6A enrichment were classified on the basis of the number of G-tetrads that would occur in the rG4 fold. In RNA, stable rG4 folds have been found with only two G-tetrads, whereas DNA G4s generally require at least three G-tetrads to adopt a stable fold.[13,23] The greater stability of rG4s results from the 2′-OH providing an additional hydrogen bond that is not possible in DNA.[23] Prior studies have shown that two-tetrad rG4s provide a quasi-stable fold that can be harnessed as a switch to impact the fate of RNA in cells.[25,31] In mature mRNA from HEK293 cells, 81% of the colocalized PQSs could adopt a two G-tetrad rG4 and 19% could adopt an rG4 with three or more G-tetrads (Figure 2B); a similar excess of two G-tetrad over three G-tetrad potential rG4s was observed for mature mRNA from HeLa cells (Figure S1). The overabundance of two G-tetrad rG4s may have biological significance because fewer tetrads lower the stability of the fold, possibly allowing it to function as an on–off switch that depends on the modification status. Studies of m6A in rG4s have yet to be conducted, although a recent report of N6-methyl-2′-deoxyadenosine in the loop of a DNA G4 found that this modification destabilizes the structure.[32] A similar impact on stability may exist in rG4s.
In this situation, m6A would function to destabilize rG4 folds and possibly favor other secondary structures as the structural switch. The statistical significance of PQSs colocalizing with m6A-enriched sites in human mRNA was assessed by comparing the observed count with that obtained from randomized, shuffled sequences (Figure 2C). PQSs were significantly enriched in the m6A sites of HEK293 and HeLa cells, 2.6-fold (P < 2.2 e–16) and 2.1-fold (P < 2.2 e–16; Table S2), respectively, relative to the randomized samples on the basis of Fisher's exact test. To summarize the m6A analysis in mature mRNA, the modification sites were favorably enriched (>2-fold) around potential rG4s that predominantly have two G-tetrads (Figure 2). Next, Ψ sites in mature mRNA from HEK293 and HeLa cells, in data sets chosen to be consistent with the m6A data sets analyzed (Table S1),[12,17] were inspected for colocalization with PQSs. In the Ψ data set from HEK293 cells, 13% of the modified sites (308 out of 2058) also contained a PQS (Figure S1). This represents a significant 1.7-fold enrichment of these G-rich sequences (P < 8.8 e–8; Figure 2C and Table S2). In the HeLa cell data set, 20% of the modified sites (24 out of 115) also contained a PQS in the mature mRNA (Table S1), a significant 2.0-fold enrichment (P < 6.4 e–3; Figure 2C and Table S2). Among the Ψ sites, 95% of the HEK293 and 89% of the HeLa sites had a two G-tetrad rG4, with the remainder having three or more G-tetrads (Figure S1). These results identify a small but favorable enrichment of Ψ sites around potential rG4s that predominantly have two G-tetrads.
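The randomized comparison can be sketched as follows. The text states only that counts were compared with those from randomized, shuffled sequences; the mononucleotide shuffle used here (which preserves each window's length and base composition), the number of rounds, the fixed seed, and the simplified PQS patterns are all assumptions, and the helper names are hypothetical. Dinucleotide-preserving shuffles are a common, stricter alternative.

```python
import random
import re

# PQS patterns for G-run length x = 2..6 with loops of 1-7 nt (assumed stand-in for QGRS Mapper).
PQS_PATTERNS = [re.compile("G{%d}(?:[ACGU]{1,7}G{%d}){3}" % (n, n)) for n in range(2, 7)]


def count_pqs_windows(windows: list[str]) -> int:
    """Number of windows containing at least one PQS."""
    return sum(1 for w in windows if any(p.search(w) for p in PQS_PATTERNS))


def shuffled_expectation(windows: list[str], n_rounds: int = 10, seed: int = 0) -> float:
    """Mean PQS-positive window count after shuffling the bases within each window.

    A mononucleotide shuffle keeps each window's length and base composition
    but destroys the arrangement of G-runs.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rounds):
        shuffled = []
        for w in windows:
            bases = list(w)
            rng.shuffle(bases)
            shuffled.append("".join(bases))
        total += count_pqs_windows(shuffled)
    return total / n_rounds


def fold_enrichment(windows: list[str], n_rounds: int = 10, seed: int = 0) -> float:
    """Observed / expected PQS-positive windows (inf if none expected but some observed; nan if neither)."""
    observed = count_pqs_windows(windows)
    expected = shuffled_expectation(windows, n_rounds, seed)
    if expected == 0:
        return float("nan") if observed == 0 else float("inf")
    return observed / expected
```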
The A-to-I editing data were obtained from the Inosinome Atlas, which provides a comprehensive listing of all established editing sites.[17] A key difference in this analysis is that the data were obtained from the entire transcriptome and were not restricted to mature mRNA. In the A-to-I analysis, 22% of editing sites (1 004 026 out of 4 668 508) colocalized with a PQS (Figure 2C), a significant 2.3-fold enrichment (P < 2.2 e–16; Table S2). The PQSs colocalized with A-to-I editing sites were ∼80% two G-tetrad rG4s and ∼20% rG4s with three or more G-tetrads (Figure S1); these values are very similar to those found for the m6A and Ψ colocalization data described above. Guided by the knowledge that PQSs are enriched in human introns,[33] we next inspected chromatin-associated RNA (i.e., pre-mRNA). Maps of m6A in HeLa and HEK293 pre-mRNA are the only ones available;[9,27] thus, we were not able to conduct a similar analysis for Ψ or I in pre-mRNA. The m6A-enriched sites in pre-mRNA from HEK293 cells were sequenced using transient N6-methyladenosine transcriptome sequencing (TNT-seq),[9] while the HeLa pre-mRNA was sequenced using the m6A-CLIP protocol;[27] because the two maps were obtained differently, they may have different sequence and structural biases. In the HeLa pre-mRNA, 21% (7919 out of 37606) of the m6A sites colocalized with a potential rG4 (Figure 3A). In contrast, in HEK293 pre-mRNA, 40% (23372 out of 58311) of the enriched m6A sites colocalized with a potential rG4 (Figure 3A). Of the potential rG4s colocalized with m6A in HEK293 cells, ∼80% were two G-tetrad rG4s and the rest had three or more G-tetrads (Figure 3B). In the HeLa pre-mRNA, ∼90% of the potential rG4s were two G-tetrad rG4s and the remainder had three or more G-tetrads (Figure S1).
This analysis suggests a high incidence of m6A-enriched sites occurring in potential rG4s with two G-tetrads in pre-mRNA, particularly in pre-mRNA from HEK293 cells.
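The split into two G-tetrad and three-or-more G-tetrad PQSs used throughout (Figures 2B and 3B) can be illustrated with a hypothetical classifier that reports the largest run length x for which a four-run PQS exists in a window. The upper limit of six tetrads and the 1 to 7 nt loop range are assumptions of this sketch, not values from the source.

```python
import re
from collections import Counter


def max_tetrads(window: str, max_loop: int = 7, upper: int = 6) -> int:
    """Largest n (2..upper) with four runs of n G's separated by 1..max_loop-nt loops; 0 if none."""
    loop = f"[ACGU]{{1,{max_loop}}}"
    for n in range(upper, 1, -1):
        g = "G" * n
        if re.search(g + loop + g + loop + g + loop + g, window.upper()):
            return n
    return 0


def tetrad_classes(windows: list[str]) -> Counter:
    """Tally windows as 'two', 'three_or_more', or 'none'."""
    tally = Counter()
    for w in windows:
        n = max_tetrads(w)
        tally["none" if n == 0 else "two" if n == 2 else "three_or_more"] += 1
    return tally
```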
Figure 3

Analysis of enriched sites of m6A in human pre-mRNA for potential rG4s. (A) Counts of m6A sites with and without a colocalized potential rG4. (B) Breakdown of potential rG4s found in the HEK293 data set by number of G-tetrads. (C) Fold enrichment of potential rG4s in the experimental sample relative to the randomized sample. (D) Intron map illustrating the m6A-enriched sites found in HEK293 pre-mRNA sequenced by TNT-seq,[9] the positions of the PQSs found in the sequencing data, and comparison to the positions of G4s found in the human genome via G4-seq.[33]

The statistical significance of PQS enrichment in regions of m6A installation in human pre-mRNA was assessed by comparison to randomized, shuffled sequences (Figure 3C). Comparison of the 23372 PQSs identified in the HEK293 pre-mRNA with the 5461 PQSs expected from randomized shuffling found a 4.3-fold enrichment; this finding is significant on the basis of Fisher's exact test (P < 2.2 e–16; Figure 3C and Table S2). In the HeLa pre-mRNA analyzed for m6A and PQS colocalization, there was a significant 2.1-fold enrichment (P < 2.2 e–16; Figure 3C and Table S2). Taken together, these results support a favorable colocalization of m6A sites and PQSs in human mRNA. The colocalization of m6A and PQSs in human mRNA, particularly in introns of pre-mRNA, suggests that the rG4 secondary structure and epitranscriptomic m6A may be synergistic; this observation is consistent with our previous report on the colocalization of m6A and PQSs in viral genomic RNA.[13] In the HEK293 pre-mRNA m6A data reported by Louloupi et al.,[9] a high incidence of m6A residing in intronic regions was observed. We therefore conducted a focused inspection of PQSs in intronic regions. The m6A (Figure 3D, solid red) and PQS (Figure 3D, dashed red) sites tracked with each other and were favorably enriched on the intronic side of both the 5′ and 3′ splice sites.
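The fold-enrichment and significance calculation can be outlined with the HEK293 pre-mRNA counts given above. The source reports Fisher's exact test but not the exact layout of the 2×2 table; this sketch assumes rows of observed versus shuffled windows (58311 windows each) and columns of with versus without a PQS, and writes a one-sided test directly from the hypergeometric distribution so that only the standard library is needed. `fisher_greater` and `enrichment` are hypothetical names.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test P(X >= a) for the 2x2 table [[a, b], [c, d]].

    Rows: observed windows, shuffled windows. Columns: with PQS, without PQS.
    With all margins fixed, X follows a hypergeometric distribution.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_total = log_comb(row1 + row2, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += math.exp(log_comb(row1, x) + log_comb(row2, col1 - x) - log_total)
    return min(p, 1.0)


def enrichment(observed: int, expected: int, total: int) -> tuple[float, float]:
    """Fold enrichment (observed / expected) and one-sided Fisher P value."""
    fold = observed / expected
    p = fisher_greater(observed, total - observed, expected, total - expected)
    return fold, p


if __name__ == "__main__":
    # HEK293 pre-mRNA counts from the text: 23372 observed vs 5461 expected of 58311 windows.
    fold, p = enrichment(23372, 5461, 58311)
    print(f"{fold:.1f}-fold enrichment, one-sided P = {p:.3g}")
```

With counts this large the summed probability underflows to zero in double precision, so the result should be read only as far below any conventional threshold, in line with the reported "P < 2.2 e–16". In practice a library routine such as SciPy's `fisher_exact` would be the usual choice.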
With the knowledge that epitranscriptomic modifications are installed cotranscriptionally and that chromatin architecture (both the genome and its histones) impacts the methylation process, we asked whether DNA G4s in the regions coding for the RNA m6A sites are favorably colocalized. The Balasubramanian laboratory developed G4-seq to find all sequences that could adopt G4s in the human genome.[33] One noteworthy point regarding this comparison is that stable DNA G4s generally have at least three G-tetrads,[24] whereas the present RNA analysis found a preference for two G-tetrad rG4s (Figures 2B, 3B, and S1). The comparison found that the G4-seq data on the coding strand of human introns did indeed show enrichment at both the 5′ and 3′ splice sites (Figure 3D, blue line), tracking with the m6A and PQS profiles in pre-mRNA. How could a genomic G4 impact writing of m6A on pre-mRNA? Events that stall the RNA pol II complex during transcription show increased deposition of m6A on the transcript.[3,9,26,27] Template-strand G4s are known to stall polymerases;[34] however, G4s in the template strand do not code for a G-rich sequence in the mRNA. In contrast, genomic G4s in coding strands can stall the RNA pol II complex by increasing the persistence and length of R-loops[35,36] while also coding for an rG4 in the nascent mRNA. The genomic G-rich sequence may therefore have two effects: (1) slowing mRNA synthesis and (2) promoting greater writing of m6A on potential rG4s in pre-mRNA. Future experimental studies are needed to test this hypothesis, which derives from bioinformatic inspection of m6A mapping data in HEK293 pre-mRNA.

A Role for G4s and m6A in Splicing?

A closer examination of the data around intron splice sites provided a few additional observations. Within 200 nt of each splice junction, we found 8040 m6A-enriched sites on the intronic side of 5′ splice sites (representing 3187 genes) and 7236 m6A-enriched sites on the intronic side of 3′ splice sites (representing 2681 genes); together these represent 47% of all intronic m6A sites in the HEK293 pre-mRNA data set. A similar distribution of m6A and PQS colocalization sites was observed, indicating a possible close association of these two RNA features near mRNA splice sites. These observations suggest an opportunity for future studies to address whether rG4s and m6A function synergistically to guide mRNA splicing. When the A-to-I editing sites in introns of HEK293 mRNA were plotted alongside the m6A, PQS, and G4-seq data, RNA editing was not observed around splice sites and did not track with PQSs in this region (Figure 3D, black dashed line). The A-to-I editing sites appear to be depleted around intron splice sites. The Ψ data set from HEK293 cells was obtained from mature mRNA, which lacks introns; therefore, no further analysis of these data was conducted.
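Counting sites near splice junctions, as in the 200-nt analysis above, can be sketched as below. The `Intron` record, the coordinate conventions (0-based, end-exclusive), the reading of "within 200 nt" as a distance of less than 200 nt on the intronic side, and assignment of a site to the nearer splice site in short introns are all assumptions. The nested loop is written for clarity; a genome-scale version would use an interval index.

```python
from dataclasses import dataclass


@dataclass
class Intron:
    chrom: str
    gene: str
    start: int   # 0-based genomic coordinate of the first intronic nucleotide
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def classify_site(pos: int, intron: Intron, window: int = 200):
    """Label a site '5ss' or '3ss' if it lies in the intron within `window` nt of that splice site."""
    if not intron.start <= pos < intron.end:
        return None
    from_left = pos - intron.start
    from_right = intron.end - 1 - pos
    # On the minus strand the 5' splice site is at the higher genomic coordinate.
    d5, d3 = (from_left, from_right) if intron.strand == "+" else (from_right, from_left)
    if d5 < window and d5 <= d3:
        return "5ss"
    if d3 < window:
        return "3ss"
    return None


def tally_splice_sites(sites, introns, window=200):
    """Count (chrom, pos) sites near 5' and 3' splice sites and collect the genes involved."""
    counts = {"5ss": 0, "3ss": 0}
    genes = {"5ss": set(), "3ss": set()}
    for chrom, pos in sites:
        for intron in introns:
            if intron.chrom != chrom:
                continue
            label = classify_site(pos, intron, window)
            if label is not None:
                counts[label] += 1
                genes[label].add(intron.gene)
                break  # count each site once, even if several isoforms share the region
    return counts, genes
```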

G-Quadruplex Loop Analysis

In mature mRNA, m6A mapping studies have suggested a broad consensus motif for A methylation in the sequence context DRACH (D = A, G, or U; R = G or A; and H = A, C, or U). In the work by Louloupi et al., intronic m6A was favorably deposited in SAG (S = C or G) sequence motifs.[9] Because the HEK293 pre-mRNA data exhibited the highest colocalization of m6A and PQSs, this sequence population was further interrogated to identify favored rG4 loop profiles with respect to length and sequence. In the PQS search, each of the three loops was limited to 7 or fewer nucleotides. The loop analysis was conducted on 14076 two G-tetrad PQSs and 729 PQSs with three or more G-tetrads. In the loop length analysis of two G-tetrad PQSs, there was a slight preference for shorter loops, although many longer-loop PQSs were observed at a high relative frequency (Figure 4A). A breakdown of the first, second, and third loops found that they all had a similar length profile (Figure 4A). The weak dependence on loop length among two G-tetrad rG4s colocalizing with m6A suggests a broad structural substrate scope for writing this modification on these noncanonical folds. The data also suggest that symmetry in loop length may be a feature of rG4s at sites of m6A installation.
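The per-loop length analysis (Figure 4A and 4B) requires each PQS to be split into its three loops. A minimal sketch under the 7-nt loop limit described above follows; the helper names are hypothetical. Because loops may contain G, the split is not always unique, and the greedy choice made here (extra nucleotides go to earlier loops) is an assumption that QGRS Mapper may not share.

```python
import re
from collections import Counter


def split_loops(pqs: str, tetrads: int, max_loop: int = 7) -> tuple[str, str, str]:
    """Split a PQS (G-run, loop, G-run, loop, G-run, loop, G-run) into its three loop sequences."""
    g = "G" * tetrads
    loop = f"([ACGU]{{1,{max_loop}}})"
    m = re.fullmatch(g + loop + g + loop + g + loop + g, pqs.upper())
    if m is None:
        raise ValueError(f"not a {tetrads}-tetrad PQS: {pqs}")
    return m.groups()


def loop_length_profile(pqs_list: list[str], tetrads: int) -> list[Counter]:
    """Per-position (loop 1, 2, 3) histograms of loop length."""
    profile = [Counter(), Counter(), Counter()]
    for pqs in pqs_list:
        for i, loop_seq in enumerate(split_loops(pqs, tetrads)):
            profile[i][len(loop_seq)] += 1
    return profile
```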
Figure 4

Loop length analysis of the PQSs that colocalized with m6A enriched regions in the HEK293 pre-mRNA.[9] Analysis of individual loop lengths for (A) two G-tetrad PQSs and (B) three or more G-tetrad PQSs. Analysis of loop length combinations for (C) two G-tetrad PQSs and (D) three or more G-tetrad PQSs that identify the most prevalent loop length combinations.

In the PQS population with three or more G-tetrads, one-nucleotide loops were less common, while three- and four-nucleotide loops were most common (Figure 4B). Inspection of the combination of all three loop lengths in the two G-tetrad PQSs found the 1–1–1 combination to be the most common, followed by the 4–4–4 combination. In general, as the loop lengths increased or became asymmetric, the number of PQSs observed decreased (Figure 4C). For the three or more G-tetrad PQSs, the most common combination had 3–3–3 nucleotide loop lengths, followed by the longer 7–7–7 and the shorter 2–2–2 combinations (Figure 4D). In the three or more G-tetrad PQS data, the least common combinations were those with asymmetric loop lengths; the one symmetric combination also in this least common pool was 6–6–6 (Figure 4D). This information suggests that longer loops are preferred in the three or more G-tetrad rG4s with which m6A colocalizes. The reason for this difference relative to the two G-tetrad cohort is not known; however, longer loops usually destabilize G-quadruplexes,[37] which may be important for rG4s that act as structural switches responding to the presence of m6A in cells. Inspection of the loop sequences of the two G-tetrad PQSs in m6A-enriched regions of intronic pre-mRNA in HEK293 cells identified a nucleotide preference. Table 1 provides a rank ordering of the five most common loop sequences found in each of the three rG4 loops.
Single-nucleotide loops consisting of an A nucleotide were the most common in all three loops. Because A nucleotides are potential sites of methylation, the prevalence of single A nucleotides in the two G-tetrad PQSs is consistent with our hypothesis that rG4 folds provide a structural motif to guide sites of m6A introduction in human mRNA. Furthermore, the high incidence of single A nucleotides is consistent with the work by Louloupi et al.,[9] in which the consensus motif SAG occurred with greater frequency.
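The loop-length combination counts described for Figure 4C and 4D amount to tallying (loop 1, loop 2, loop 3) length triples. This sketch assumes the triples have already been extracted (for example, from regex capture groups); the helper names and the "1-1-1" style labels are illustrative.

```python
from collections import Counter


def combination_table(loop_lengths: list[tuple[int, int, int]]) -> list[tuple[str, int]]:
    """Rank (L1, L2, L3) loop-length combinations, most common first, labelled like '1-1-1'."""
    counts = Counter(tuple(t) for t in loop_lengths)
    # Counter.most_common keeps first-encountered order among equal counts.
    return [("-".join(map(str, combo)), n) for combo, n in counts.most_common()]


def symmetric_fraction(loop_lengths: list[tuple[int, int, int]]) -> float:
    """Fraction of PQSs whose three loops share one length (e.g. 1-1-1 or 4-4-4)."""
    if not loop_lengths:
        return 0.0
    return sum(1 for a, b, c in loop_lengths if a == b == c) / len(loop_lengths)
```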
Table 1

PQS Loop Nucleotide Composition for Those Loops in m6A-Enriched Regions(a)

Rank (most to least common)    1      2      3      4      5
Loop 1 sequence                A      U      G      AG     AA
Loop 1 count                   1044   518    502    474    237
Loop 2 sequence                A      U      G      GA     C
Loop 2 count                   936    657    384    250    239
Loop 3 sequence                A      U      G      CA     GA
Loop 3 count                   999    691    459    379    332

(a) Data corresponds to HEK293 pre-mRNA m6A profile reported by Louloupi et al.[9] See Table S3 for additional data.

The second most common loop sequence in the two G-tetrad PQSs was a single U nucleotide in each of the three loops, although many of the ten most prevalent loop sequences contained A nucleotides within dinucleotide motifs such as 5′-GA, 5′-AA, 5′-CA, and 5′-AG. Interestingly, the 5′-AC dinucleotide that would indicate the DRACH consensus motif within a PQS was not among the ten most prevalent loop sequences (Table S3). Further inspection of the 795 PQSs with single-nucleotide loops found that homogeneous loops of single A nucleotides dominated the distribution with 142 occurrences. Homogeneous single U or C nucleotide loops occurred 18 and 34 times, respectively, and no PQSs were observed with all-G loops. In summary, two G-tetrad PQSs in m6A-enriched regions are biased in their nucleotide composition (A > U > G > C) and in their propensity to have the same length for all three loops (Figure 4 and Tables 1 and S3). Homogeneous short loops thus favor A nucleotides, as the loop composition analysis shows; sequences with all-C or all-U loops, however, do not provide an RNA substrate for writing m6A. These sequences may be false positives, or the methylated A may reside just beyond the G4; the flanking sequences of this small set of PQSs lacking an A were not analyzed further. The sequence composition of the three or more G-tetrad PQSs colocalized with m6A was also analyzed and found to be rich in A nucleotides (Table S4). As with the two G-tetrad PQSs, the 5′-AC dinucleotide of the DRACH consensus motif was not among the ten most common loop sequences, in agreement with the findings of Louloupi et al.[9] Short rG4s such as those found in the sequencing data can adopt stable folds, as was recently reported.[38]
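The rankings in Table 1 and the homogeneous-loop counts in the paragraph above are simple frequency tallies over loop sequences. This sketch assumes each PQS is represented by a tuple of its three loop sequences; the helper names are hypothetical. It also flags PQSs with no A in any loop, the cases discussed above as possible false positives.

```python
from collections import Counter


def top_loop_sequences(loops, position, k=5):
    """Rank the k most common sequences at one loop position (0, 1, or 2)."""
    return Counter(t[position] for t in loops).most_common(k)


def homogeneous_single_nt(loops):
    """Count PQSs whose three loops are the same single nucleotide (e.g. A/A/A)."""
    counts = Counter()
    for a, b, c in loops:
        if len(a) == 1 and a == b == c:
            counts[a] += 1
    return counts


def without_adenosine(loops):
    """PQSs with no A in any loop, i.e. no loop A available for methylation."""
    return [t for t in loops if not any("A" in s for s in t)]
```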

Conclusions and Outlook

The observations from this bioinformatic analysis of m6A colocalization with rG4s, especially near intronic splice sites, can guide the design of future experiments. (1) This bioinformatic study suggests a synergy between rG4 folds and sites of m6A epitranscriptomic modification. Do rG4 folds function as a structural motif for methylation of RNA by the METTL3/14 methyltransferase complex? The loop lengths and sequence identities of the rG4s found here are essential knowledge for designing in vitro studies to test the hypothesis that METTL3/14 favors writing of m6A on rG4 scaffolds. One feature the present data cannot address is whether the preferred folds exist in sequence contexts that are dynamic between rG4 structures or with other RNA structures; this type of information will likely need to be addressed on a sequence-by-sequence basis or via inspection of high-resolution in vivo RNA structural maps obtained by chemical probing.[39,40] (2) The SAG consensus sequence is also recognized by some splicing proteins such as SRSF3.[9] Do these splicing proteins bind rG4s, and is their binding modulated by the presence of m6A in an rG4 loop? (3) Splicing factors that bind SAG sequence motifs were found to be involved in alternative mRNA splicing.[41] The role of PQSs in introns, and of their folding to rG4s, in the alternative splicing of specific mRNAs such as the p53 mRNA has been noted.[42−44] Is there synergy between rG4s and m6A in guiding alternative mRNA splicing? At present, the roles of rG4s[42,43] and m6A[9] in guiding alternative splicing have been studied individually. The analysis presented here suggests that rG4s and m6A may collaborate in alternative splicing.
(4) Whether the rG4 fold is the signal for writing or erasing m6A in mRNA, or whether the presence of m6A impacts the rG4 fold, is not known; we raised this question previously.[13] Further, rG4 folds are known to be dynamic and to adopt many different structures; because each sequence is unique, this would have to be studied on a case-by-case basis. (5) The preference for A-rich two G-tetrad potential rG4s at intronic m6A-enriched sites is not understood, and further studies are needed. A-rich G4 loops are known to destabilize the fold, which may be important for allowing methylation to alter the stability of the fold under physiological conditions.[45]

Herein, we explored the structural pattern of human RNA sites harboring m6A, Ψ, or A-to-I editing modifications, focusing on the presence of PQSs at these sites. The study revealed that all three modifications favorably colocalized with potential rG4s when compared with a randomized data set (Figures 2C and 3C). The greatest colocalization was observed between m6A and potential rG4s near the splice sites in introns of HEK293 cells (Figure 3A–D). This observation suggests there may be an interplay between m6A, rG4s, and mRNA splicing that could be a component of alternative splicing; future experimental work is needed to address this possibility. Our prior interest in the colocalization of m6A and PQSs focused on viral RNA genomes, which showed a preference for DRACH motif methylation.[13] The present work on human pre-mRNA is consistent with the viral RNA analysis; however, the colocalization in intronic pre-mRNA occurs largely within SAG motifs (Table 1).
This difference may reflect the fact that m6A is written on pre-mRNA in the nucleus, whereas m6A is installed on viral RNA in the cytosol.[9,13] The colocalization of potential rG4s and epitranscriptomic modifications in mRNA identified by the present bioinformatic study raises many additional experimental questions. Structurally, rG4s are expected to present a different surface to epitranscriptomic writing enzymes than duplex, hairpin, or single-stranded regions of RNA do, which may help explain why not all consensus motifs for a given modification are modified and why modification at a given site is generally not quantitative.
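The conclusions rest on comparing how often modified sites fall near potential rG4s with a randomized data set. One simple way to set up such a comparison is sketched below in Python. The two-G-tetrad PQS pattern, the ±50 nt window, and the use of randomly drawn adenosines from the same sequence as the background are all assumptions for illustration, not the procedure used in this study; the function names are hypothetical.

```python
from __future__ import annotations

import random
import re

# ASSUMED PQS definition (two G-tetrads, loops of 1-7 nt); illustrative only.
PQS2 = re.compile(r"GG[ACGU]{1,7}?GG[ACGU]{1,7}?GG[ACGU]{1,7}?GG")


def near_pqs(pos: int, intervals: list[tuple[int, int]], window: int) -> bool:
    """True if `pos` lies within `window` nt of any PQS interval [start, end)."""
    return any(start - window <= pos < end + window for start, end in intervals)


def colocalization(seq: str, sites: list[int], window: int = 50,
                   n_random: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Return (observed, background) fractions of sites lying near a PQS.

    `sites` are 0-based positions of modified adenosines in `seq`. The
    background is the mean fraction over `n_random` draws of the same number
    of adenosines sampled without replacement from the same sequence.
    """
    seq = seq.upper().replace("T", "U")
    intervals = [m.span() for m in PQS2.finditer(seq)]
    observed = sum(near_pqs(p, intervals, window) for p in sites) / len(sites)

    adenosines = [i for i, nt in enumerate(seq) if nt == "A"]
    rng = random.Random(seed)  # fixed seed for a reproducible background
    total = 0.0
    for _ in range(n_random):
        sample = rng.sample(adenosines, len(sites))
        total += sum(near_pqs(p, intervals, window) for p in sample) / len(sites)
    return observed, total / n_random
```

An observed fraction well above the background suggests enrichment of PQSs around modified sites. Drawing only adenosines keeps the background comparable for A-targeted modifications such as m6A and A-to-I editing; for Ψ the draw would need to be over uridines instead. A genome-scale analysis would also need a significance test rather than a single ratio.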