Manuel Jara-Espejo (1,2), Aaron M Fleming (1), Cynthia J Burrows (1). 1. Department of Chemistry, University of Utah, 315 South 1400 East, Salt Lake City, Utah 84112-0850, United States. 2. Department of Morphology, Piracicaba Dental School, University of Campinas-UNICAMP, Av. Limeira 901, Piracicaba, CEP 13414-018 Sao Paulo, Brazil.
Abstract
Maturation of mRNA in humans involves modifying the 5' and 3' ends, splicing introns, and installing epitranscriptomic modifications that are essential for mRNA biogenesis. Epitranscriptomic modifications are usually installed in specific consensus motifs, although not all such sequences are modified, suggesting a secondary structural component to site selection. Using bioinformatic analysis of published data, we identify that, in human mature mRNA, potential RNA G-quadruplex (rG4) sequences colocalize with the epitranscriptomic modifications N6-methyladenosine (m6A), pseudouridine (Ψ), and inosine (I). Using the only available pre-mRNA data sets from the literature, we demonstrate that colocalization of potential rG4s and m6A was greatest overall and occurred in introns near 5' and 3' splice sites. The m6A-bearing potential rG4s exhibited short loops most commonly comprised of single A nucleotides. This observation is consistent with a literature report of intronic m6A found in SAG (S = C or G) consensus motifs that are also recognized by splicing factors. The localization of m6A and potential rG4s in pre-mRNA at intron splice junctions suggests that these features could function together in alternative splicing. A similar analysis for potential rG4s around sites of Ψ installation or A-to-I editing in mRNA also found a colocalization; however, the frequency was less than that observed with m6A. These bioinformatic analyses guide a discussion of future experiments to understand how noncanonical rG4 structures may collaborate with epitranscriptomic modifications in the human cellular context to impact cellular phenotype.
The process of pre-mRNA
maturation is of considerable interest
because each step can impact the stability, coding potential, or localization
of the mature mRNA.[1−5] Maturation of mRNA involves installation of a 5′ cap, addition
of a 3′-polyadenosine tail, writing of epitranscriptomic modifications,
and intron excision.[6] Processing of the
5′ and 3′ ends of mRNA has been studied, and recent
work suggests there is more to learn.[7,8] Understanding
of how epitranscriptomic modifications are written and of alternative mRNA splicing
is rapidly advancing as a result of next-generation sequencing (NGS)
and the expansion of bioinformatic tools.[1−4] In human cells, NGS-based studies in tandem
with knockdown or knockout of established epitranscriptomic protein
readers, writers, or erasers have identified that these modifications
are involved in all aspects of mRNA biogenesis including splicing,
nuclear export, translation efficiency, and cellular half-life. The
collaboration of epitranscriptomic modifications and alternative mRNA
splicing has recently been described,[9] but
the structural features of mRNA driving the process remain mysterious.
A better understanding of the epitranscriptome and alternative mRNA
splicing will enable future researchers to modulate these pathways
judiciously in cells and provide new therapeutic opportunities to
target these pathways for disease treatment.
N6-Methyladenosine
The best-studied epitranscriptomic
modification in mRNA is N6-methyladenosine
(m6A; Figure 1A). Writing of the
methyl group on A nucleotides in mRNA occurs in specific sequence
motifs by the METTL3/14 SAM-dependent methyltransferase complex at
a frequency of 0.1–1% of A nucleotides; however, not all such
target sequences are modified.[1−5] The DRACH (D = A, G, or U; R = G or A; H = A, C, or
U; A = m6A) consensus sequence appears to
be a dominant site of m6A installation in mature mRNA;
in contrast, the splicing sites within intron sequences of pre-mRNA
have m6A installed in SAG (S = C or G) motifs.[9,10] To date, the accessory protein WTAP appears to be the most essential
for guiding the methylation process, while VIRMA, ZC3H13, and RBM15,
as examples, also have been implicated in writing of m6A on mRNA.[3,4,11] Once written
on mRNA, m6A is read by the cytosolic YTHDF1–3 proteins,
and the m6A modifications can be removed by the demethylases
FTO or ALKBH5 imparting dynamics to this epitranscriptomic system.[1−5] Current studies have led to the suggestion that RNA secondary structure
plays a key role in selection of m6A modified sites.[12−14] Future work is needed to clarify the structural component in the selection of
mRNA sites for epitranscriptomic modification.
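The two consensus motifs defined above can be written as regular expressions. The following is a minimal sketch, assuming RNA sequences given as plain A/C/G/U strings; the function name `candidate_sites` and the example sequence are hypothetical and are not taken from the cited studies.

```python
import re

# Degenerate letters expanded as defined in the text:
# D = A, G, or U; R = G or A; H = A, C, or U; S = C or G.
# A lookahead is used so that overlapping motifs are all reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")
SAG = re.compile(r"(?=([CG]AG))")

def candidate_sites(rna, motif, a_offset):
    """Return 0-based positions of the candidate A in every motif match.

    a_offset is the index of the methylatable A inside the motif
    (2 for DRACH, 1 for SAG).
    """
    return [m.start() + a_offset for m in motif.finditer(rna)]

rna = "UUGGACUAACAGUU"  # hypothetical sequence, for illustration only
print(candidate_sites(rna, DRACH, 2))  # [4, 8]
print(candidate_sites(rna, SAG, 1))    # [10]
```

A motif match only marks a candidate site; as noted above, not all such sequences are modified.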
Figure 1
(A) Modifications m6A, Ψ, and I are epitranscriptomic.
(B) Maturation of mRNA involves intron splicing; potential G-quadruplex
sequences are found in introns near splice sites.
Pseudouridine and A-to-I Editing
A second modification is the isomerization of uridine to pseudouridine
(Ψ; Figure 1A) in mRNA, which occurs at a frequency of 0.2–0.6% of U nucleotides.[1] Of the many human pseudouridine synthases,
PUS1 and PUS7 are established to isomerize U to Ψ in mRNA.[2,15] The same questions regarding site selection of Ψ installation
have been asked, with a recent report finding that PUS1 targets hairpin-type
structures.[2] Lastly, two human adenosine
deaminase RNA specific (ADAR) proteins catalyze A-to-I editing that
yields inosine (I; Figure 1A) as another epitranscriptomic modification.[16] Extensive studies on RNA editing have been conducted and
cataloged in databases for researchers to interrogate.[17] Editing of RNA by ADARs is catalyzed in a central
region of long dsRNA with some minor sequence and local structural
context requirements noted.[18] More analyses
are needed to seek out and clarify the RNA structural and sequence
requirements that dictate sites of m6A and Ψ epitranscriptomic
modification, as well as to explore whether other sequence motifs
can be sites of A-to-I RNA editing.[12,18,19]
Pre-mRNA and G-Quadruplexes (G4s)
Alternative
splicing of nascent or pre-mRNA to yield mature mRNA
is a highly regulated eukaryotic process resulting in a single gene
having the potential to code for multiple proteins.[20] This diversification occurs by inclusion or exclusion of
particular exons in the final processed mRNA. In humans, ∼95%
of multiexonic genes are alternatively spliced enabling the ∼20 000
protein-coding genes to direct synthesis of a much greater diversity
of proteins. Alternative splicing of mRNA is cell-type specific and
changes with oxidative stress or disease.[20,21] The mRNA features and epitranscriptomic components that drive when
and to what extent alternative splicing occurs in particular cell
types have been topics of recent studies,[20,21] and future work is needed to better understand the RNA structural
details guiding the process.
Genomes code for all the information
found in RNA (Figure 1B), and in the human genome
there exists enrichment of potential G-quadruplex forming sequences
(PQSs) in promoters, 5′-untranslated regions, and introns near
splice sites.[22] G-Quadruplexes (G4s) are
noncanonical folds in nucleic acids that have sequences comprised
of four or more runs of G in close proximity, providing the opportunity
for the sequence to fold around intracellular K+ ions (Figure 1B).[23] In DNA G4s, stable folds each have G-runs of three or more
Gs, while in RNA G4s (rG4s), two Gs per run can adopt stable folds.[23−25] Those PQSs in the nontemplate strand (i.e., coding strand) of a
gene will also be present in the RNA transcript (Figure 1B). We have noted that potential
rG4s in the Zika and HIV viral RNAs colocalize with sites of m6A installation;[13] however, whether
a similar colocalization exists in the human transcriptome had not
yet been studied.
Bioinformatic Analysis of G4s and Epitranscriptomic Modifications
In the present set of bioinformatic studies, mapping data for m6A and Ψ in human mature mRNA, as well as I in the human
transcriptome, were inspected for colocalization around potential
rG4s. We found enrichment of each epitranscriptomic modification around
potential rG4s in the data inspected. A second analysis examined
the only available modification maps in pre-mRNA, which are for m6A, and found a greater colocalization of this modification
with potential rG4s than observed in mature mRNA for all three modifications
analyzed. This finding suggests a possible synergy in the deposition
of m6A on pre-mRNA splice sites around potential rG4s.
Biochemical studies suggest that m6A is written on nascent
mRNA cotranscriptionally; therefore, chromatin features such as genomic
sequences and structures or histone modifications that interact with
the mRNA synthesis machinery can impact where and to what extent epitranscriptomic
modifications are installed.[3,9,26,27] Here, we identify that colocalized
sites of m6A in pre-mRNA and rG4s track with the genomic
G4s found in introns, suggesting a possible long-range interplay of
chromatin structure and the epitranscriptome. In the final analysis
of the m6A sites in the potential rG4s of intronic pre-mRNA,
a preference for rG4 loop sequences with a modifiable A nucleotide
was identified. The SAG sequence previously found for
the consensus motif of m6A deposition in intronic sequences
was found to be part of a larger rG4 structural context.
The publicly available RNA modification maps in human mRNA used
in the present study are summarized in Table S1.[9,12,17,27−29] The peak summits for each epitranscriptomic modification
were identified, and then we selected a window of sequence space ±30
nucleotides flanking the summit to inspect for PQSs computationally.
The quadruplex-forming G-rich sequence (QGRS) mapper algorithm was
used to look for the sequence pattern 5′-GxL≤7GxL≤7GxL≤7Gx-3′, where x ≥ 2 is the number of G nucleotides in each run
and L≤7 represents a loop of 7 or fewer nucleotides.[30] The first
data set inspected had sequenced m6A in mature mRNA collected
from HeLa cells using the m6A-CLIP sequencing protocol.[27] From these data in mature HeLa mRNA, 17% (7838
out of 46355) of the m6A enriched regions were found to
also contain a PQS (Figure 2A). Inspection of m6A mapped via miCLIP in mature
mRNA from HEK293 cells[29] found 18% (3774
out of 20579) of the m6A enriched regions colocated with
a PQS (Figure 2A).
These data in two different cell lines indicate that nearly one-fifth
of the m6A peaks colocate with a PQS in human mature mRNA.
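The window-and-search step described above can be sketched as follows. This is an illustrative approximation only, not the QGRS mapper implementation: it assumes loops of 1 to 7 nucleotides, does not force all four G-runs to have the same length, and uses hypothetical names (`summit_window`, `has_pqs`, `fraction_with_pqs`) and toy input.

```python
import re

# Four runs of two or more G separated by three loops of 1-7 nt.
# Assumption: loops may contain any nucleotide and run lengths need not match.
PQS = re.compile(
    r"G{2,}[ACGU]{1,7}?G{2,}[ACGU]{1,7}?G{2,}[ACGU]{1,7}?G{2,}"
)

def summit_window(transcript, summit, flank=30):
    """Return the sequence from summit - flank to summit + flank (inclusive)."""
    start = max(0, summit - flank)
    return transcript[start:summit + flank + 1]

def has_pqs(rna):
    """True if the sequence contains at least one potential G-quadruplex."""
    return PQS.search(rna.upper().replace("T", "U")) is not None

def fraction_with_pqs(transcript, summits, flank=30):
    """Fraction of modification summits whose window contains a PQS."""
    hits = sum(has_pqs(summit_window(transcript, s, flank)) for s in summits)
    return hits / len(summits)

# Toy transcript with one two-tetrad PQS at positions 40-50.
toy = "A" * 40 + "GGAGGAGGAGG" + "A" * 40
print(fraction_with_pqs(toy, [45, 5]))  # 0.5
```

The percentages reported in the text (for example, 7838 out of 46355 regions) come from applying this kind of test to every mapped peak summit in a data set.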
Figure 2
Inspection
of human mature mRNA sequenced for m6A, Ψ,
and A-to-I editing sites to determine whether a PQS resides in the
same location. (A) Bar plot showing the number of m6A enriched
sites in HeLa and HEK293 mature mRNA that also have a PQS overlapping
with the site of modification. (B) Classification of the PQSs found
in the HEK293 mature mRNA on the basis of the number of G-tetrads
that can form from the sequence. (C) Plot of PQS enrichment (observed/expected)
found in the m6A or Ψ epitranscriptomic modification
sites in mature mRNA from HEK293 or HeLa cells, and A-to-I editing
sites in the Inosinome Atlas.[17,27−29]
In the next step of the analysis,
the PQSs found in sites of m6A enrichment were classified
on the basis of the number of
G-tetrads that would occur in the rG4 fold. In RNA, stable rG4 folds
have been found with only two G-tetrads, in contrast to DNA G4s,
which generally require at least three G-tetrads to adopt a stable
fold.[13,23] The greater stability in rG4s results from
the 2′-OH providing an additional hydrogen bond that is not
possible in DNA.[23] Prior studies have shown
that two-tetrad rG4s provide a quasi-stable fold that can be harnessed
as a switch to impact the fate of RNA in cells.[25,31] In mature mRNA from HEK293 cells, 81% of the PQSs could adopt a two G-tetrad
rG4 and 19% could adopt a three or more G-tetrad rG4 (Figure 2B); a similar excess of two
G-tetrad compared to three G-tetrad potential rG4s was observed for
the mature mRNA from HeLa cells (Figure S1). The overabundance of two G-tetrad rG4s may have biological significance
because this lowers the stability of the fold and possibly allows
it to function as an on–off switch with dependency on the modification
status. Studies of m6A in rG4s have yet to be conducted,
although a recent report of N6-methyl-2′-deoxyadenosine
in the loop of a DNA G4 has found this modification to destabilize
the structure.[32] A similar impact on stability
may exist in rG4s. In this situation, m6A would function
to destabilize rG4 folds and possibly favor other secondary structures
as the structural switch.
The statistical significance of PQSs
colocalizing with m6A-enriched sites in human mRNA was
calculated by comparing the
identified count to one obtained from randomized and shuffled sequences
(Figure 2C). The presence
of PQSs in the m6A sites in the HEK293 and HeLa cells was
found to be significant with 2.6-fold (P < 2.2e–16) and 2.1-fold (P < 2.2e–16; Table S2) enrichment,
respectively, relative to the randomized samples on the basis of
Fisher’s exact test. To summarize the m6A analysis
in mature mRNA, the modification sites were found to be favorably
enriched (>2-fold) around potential rG4s that predominantly have
two G-tetrads (Figure 2).
Next, mature mRNA from HeLa or HEK293 cells was inspected
for colocalization of Ψ sites with PQSs; these Ψ data sets were chosen for study
to be consistent with the m6A data sets analyzed (Table S1).[12,17] In the Ψ data
set from HEK293 cells, 13% of the modified sites (308 out of 2058)
also contained a PQS (Figure S1). This
number of PQSs represents a significant 1.7-fold enrichment of these
G-rich sequences (P < 8.8e–8; Figure 2C and Table S2). In the HeLa cell data set, 20% of
the modified sites (24 out of 115) also contained a PQS in the mature
mRNA (Table S1). This represents a significant
2.0-fold enrichment of these G-rich sequences (P < 6.4e–3; Figure 2C and Table S2). In the
Ψ sites from the different cell lines, 95% of the HEK293 and
89% of the HeLa sites had two G-tetrad rG4s, with the remainder having
three or more G-tetrads (Figure S1). This analysis
identified a small but favorable enrichment of Ψ sites around
potential rG4s, and the rG4s predominantly adopt two G-tetrads.
The A-to-I editing data were obtained from the Inosinome Atlas
that provides a comprehensive listing of all established editing sites.[17] A key difference in this analysis is that the
data were obtained from the entire transcriptome and not restricted
to mature mRNA. In the A-to-I analysis, 22% (1 004 026
out of 4 668 508) of the editing sites colocalized with a PQS (Figure 2C), which was a significant
2.3-fold enrichment (P < 2.2e–16; Table S2). The G-tetrad count for PQSs
colocalized with A-to-I editing sites was ∼80% two G-tetrad
rG4s and ∼20% three or more G-tetrad rG4s (Figure S1); these values are very similar to those found in
the m6A and Ψ PQS colocalization data described above.
Guided by the knowledge that PQSs are enriched in human introns,[33] inspection of the chromatin-associated RNA (i.e.,
pre-mRNA) was then conducted. Maps for m6A in HeLa and
HEK293 cellular pre-mRNA are the only ones available;[9,27] thus, we were not able to conduct a similar analysis for Ψ
or I in pre-mRNA. The m6A enriched sites in pre-mRNA from
HEK293 cells were sequenced using transient N6-methyladenosine transcriptome sequencing (TNT-seq),[9] while the HeLa pre-mRNA was sequenced using
the m6A-CLIP protocol;[27] thus,
there exists a difference in how these two maps were obtained and
they may have different sequence and structural biases. In the HeLa
pre-mRNA, 21% (7919 out of 37606) of the m6A sites colocalized
with a potential rG4 (Figure 3A). On the other hand, in HEK293 pre-mRNA 40% (23372 out of
58311) of the enriched m6A sites colocalized with a potential
rG4 (Figure 3A). In
the population of potential rG4s colocalized with m6A in
HEK293 cells, ∼80% were two G-tetrad rG4s and the rest had
three or more G-tetrads (Figure 3B). In the HeLa pre-mRNA, potential rG4s were found
to be ∼90% two G-tetrad rG4s and the remainder were three or
more G-tetrad rG4s (Figure S1). This analysis
suggests a high incidence of m6A enriched sites occurring
in potential rG4s with two G-tetrads in pre-mRNA particularly in the
pre-mRNA from HEK293 cells.
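The split into two G-tetrad versus three or more G-tetrad PQSs can be illustrated with a small classifier. This is a sketch under stated assumptions: the tetrad count is taken as the largest x for which four runs of x G's separated by loops of 1 to 7 nucleotides can be found, and the names `max_tetrads` and `tetrad_classes`, the `max_run` cap of 6, and the toy sequences are hypothetical.

```python
import re
from collections import Counter

def max_tetrads(rna, max_loop=7, max_run=6):
    """Largest number of G-tetrads a sequence could form, or 0 if no PQS.

    Tries run lengths from max_run down to 2 and returns the first x for
    which four runs of x G's with 1..max_loop nt loops are present.
    """
    rna = rna.upper().replace("T", "U")
    loop = "[ACGU]{1,%d}?" % max_loop
    for x in range(max_run, 1, -1):
        run = "G{%d}" % x
        if re.search(loop.join([run] * 4), rna):
            return x
    return 0

def tetrad_classes(sequences):
    """Count sequences as 'two' or 'three or more' G-tetrad PQSs."""
    classes = Counter()
    for seq in sequences:
        n = max_tetrads(seq)
        if n == 2:
            classes["two"] += 1
        elif n >= 3:
            classes["three or more"] += 1
    return classes

# Toy input, for illustration only.
toy = ["GGAGGAGGAGG", "GGUGGUGGUGG", "GGGAGGGAGGGAGGG", "AAAAAA"]
print(tetrad_classes(toy))
```

Sequences without any PQS are simply left out of the tally, mirroring the fact that the percentages in the text are computed over PQS-containing sites only.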
Figure 3
Analysis of enriched sites of m6A
in human pre-mRNA
for potential rG4s. (A) Counts of m6A sites with and without
a colocalized potential rG4. (B) Break down of potential rG4s found
in the HEK293 data set for the number of G-tetrads. (C) Fold enrichment
of potential rG4s in the experimental sample relative to the randomized
sample. (D) Intron map illustrating the m6A enriched sites
found in HEK293 pre-mRNA sequenced by TNT-seq,[9] the position of the PQSs found in the sequencing data, and comparison
to the position of G4s found in the human genome via G4-seq.[33]
The statistical significance
for enrichment of PQSs in the regions
of m6A installation in human pre-mRNA was compared to randomized,
shuffled sequences (Figure 3C). Comparison of the 23372 PQSs identified in the HEK293
pre-mRNA to the 5461 PQSs expected by randomized shuffling found a
4.3-fold enrichment in PQSs; this finding is significant on the basis
of Fisher’s exact test (P < 2.2e–16; Figure 3C and Table S2). In the HeLa pre-mRNA analyzed for m6A and PQS colocalization, there was a significant 2.1-fold
enrichment (P < 2.2e–16; Figure 3C and Table S2). Taken together,
these results support a favorable colocalization of m6A
sites and PQSs in human mRNA. The finding of m6A and PQS
colocalization in human mRNA, particularly introns of pre-mRNA, suggests
that the rG4 secondary structure and epitranscriptomic m6A may be synergistic; this observation is consistent with our previous
report on the colocalization of m6A and PQSs in viral genomic
RNA.[13]
In the HEK293 pre-mRNA m6A data reported by Louloupi
et al.,[9] a high incidence of m6A residing in intronic regions was observed. Next, a focused inspection
of PQSs in the intron regions was conducted. The m6A
(Figure 3D, solid red) and PQS (Figure 3D,
dashed red) sites tracked with each other and were favorably enriched
on the intronic side of both the 5′ and 3′ splice sites.
With the knowledge that epitranscriptomic modifications are installed
cotranscriptionally and that the chromatin architecture on the genome
and histones impacts the methylation process, we asked whether DNA
G4s near the region coding for the RNA m6A sites are favorably
colocalized. The Balasubramanian laboratory developed G4-seq to find
all sequences that could adopt G4s in the human genome.[33] There is one noteworthy point regarding this
comparison, which is that stable DNA G4s generally have at least three
G-tetrads,[24] while the present RNA analysis
found a preference for two G-tetrad rG4s (Figures 2B, 3B, and S1). The comparison found that the G4-seq data on
the coding strand of human introns did indeed show enrichment at both
the 5′ and 3′ splice sites (Figure 3D, blue line) that also tracked with the
mRNA m6A and PQS profiles.
How could a genomic G4
impact writing of m6A on pre-mRNA?
Events that stall the RNA pol II complex during transcription show
increased deposition of m6A on the transcript.[3,9,26,27] It is known that template strand G4s stall polymerase bypass;[34] however, G4s in the template strand do not code
for the G-rich sequence in the mRNA. In contrast, genomic G4s in coding
strands can stall the RNA pol II complex by increasing the persistence
and length of R-loops[35,36] and at the same time code for
an rG4 in the nascent mRNA. It is possible that the genomic G-rich
sequence has two effects that are (1) to slow mRNA synthesis and (2)
to cause greater writing of m6A on potential rG4s in pre-mRNA.
Future experimental studies are needed to address this hypothesis
derived from bioinformatic inspection of m6A mapping data
in HEK293 pre-mRNA.
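The observed-versus-shuffled comparison used for the enrichment values in this section can be sketched in a few lines. The code below is an assumption-laden illustration, not the authors' pipeline: the shuffle preserves only mononucleotide composition, the Fisher test shown is a one-sided hypergeometric tail written with the standard library (practical only for small counts; a statistics package would be used for data sets of the size reported here), and all function names are hypothetical.

```python
import math
import random

def shuffle_sequence(seq, rng):
    """Return a random permutation of seq (nucleotide composition preserved)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

def fold_enrichment(observed, expected):
    """Ratio of the observed PQS count to the count in the shuffled control."""
    return observed / expected

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact P value, P(X >= a), for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        math.comb(row1, k) * math.comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / math.comb(total, col1)

# Counts quoted in the text for HEK293 pre-mRNA: 23372 observed vs 5461 expected.
print(round(fold_enrichment(23372, 5461), 1))  # 4.3

# Small hypothetical 2x2 table to show the test itself.
print(fisher_one_sided(3, 1, 1, 3))
```

Here rows of the 2x2 table would be the experimental and shuffled sequence sets, and columns would be windows with and without a PQS.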
A Role for G4s and m6A in Splicing?
A closer examination of the data around intron
splice sites provided
a few additional observations. Within 200 nt of each splice junction,
8040 m6A-enriched sites on the intronic side of the 5′ splice
site representing 3187 genes and 7236 m6A-enriched sites
on the intronic side of the 3′ splice site representing 2681 genes were
found; these numbers represent 47% of total m6A intronic
sites in the HEK293 pre-mRNA data set. A similar distribution of m6A and PQS colocalization sites was observed, indicating a possible
close association of these two RNA features near mRNA splicing sites.
These observations suggest an opportunity for future studies that
address whether rG4s and m6A function synergistically to
guide mRNA splicing.
The A-to-I editing sites in introns of
HEK293 mRNA were plotted
alongside the m6A, PQS, and G4-seq data to find that RNA
editing was not observed around splice sites and did not track with
PQSs in this region (Figure 3D, black dashed line). The A-to-I editing sites appear to
be depleted around intron splicing sites. The Ψ data set from
HEK293 cells was obtained from mature mRNA, in which introns are not
present, and therefore, no further analysis of the data was conducted.
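The 200-nt splice-junction window used earlier in this section can be expressed as a simple coordinate test. The sketch below assumes 0-based inclusive intron coordinates and a strand flag, both of which are illustrative conventions rather than details given in the source; the function name `splice_site_class` is hypothetical.

```python
def splice_site_class(site, intron_start, intron_end, strand="+", max_dist=200):
    """Classify an intronic position by its nearest splice site.

    Returns "5'ss" or "3'ss" when the site lies within max_dist nt of that
    junction, and None when it is outside the intron or too far from both.
    Coordinates are 0-based and inclusive (an assumption of this sketch).
    """
    if not intron_start <= site <= intron_end:
        return None
    d_left = site - intron_start
    d_right = intron_end - site
    # On the minus strand the 5' splice site is at the right-hand coordinate.
    d5, d3 = (d_left, d_right) if strand == "+" else (d_right, d_left)
    if min(d5, d3) > max_dist:
        return None
    return "5'ss" if d5 <= d3 else "3'ss"

print(splice_site_class(1050, 1000, 5000))        # 5'ss
print(splice_site_class(4900, 1000, 5000))        # 3'ss
print(splice_site_class(3000, 1000, 5000))        # None
print(splice_site_class(1050, 1000, 5000, "-"))   # 3'ss
```

Counting the sites in each class, and the distinct genes they fall in, would give tallies of the kind reported above for the 5′ and 3′ splice sites.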
G-Quadruplex Loop Analysis
In mature mRNA, m6A mapping studies
have suggested a
broad consensus motif for A methylation in the sequence
context DRACH (D = A, G, or U; R = G or A; and H = A,
C, or U). In the work by Louloupi et al., intronic m6A
was favorably deposited in SAG (S = C or G) sequence
motifs.[9] Because the HEK293 pre-mRNA data
exhibited the highest colocalization of m6A and PQSs, the
sequence population was further interrogated to identify favorable
rG4 loop profiles with respect to length and sequence. During the
PQS inspection, the three loops could have 7 or fewer nucleotides.
The loop analysis was conducted on 14076 two G-tetrad PQSs and 729
three or more G-tetrad PQSs. In the loop length analysis of two G-tetrad
PQSs, there was a slight preference for shorter loop lengths, but
there were many longer loop PQSs observed at a high relative frequency
(Figure 4A). Additionally,
a breakdown of the first, second, or third loops found they all had
a similar length profile (Figure 4A). The minimal dependency of two G-tetrad rG4 loop
lengths in the population colocalizing with m6A suggests
a broad structural substrate scope for writing this modification on
these noncanonical folds. Additionally, the data suggest symmetry
in loop length may be a feature of rG4s that are at sites of m6A installation.
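The loop length analysis described above amounts to extracting the three loops from each PQS and tallying their lengths. A minimal sketch follows, assuming a fixed number of G's per run and lazy matching so that the shortest loops consistent with the pattern are reported; `loops_of` and `loop_length_combinations` are hypothetical names and the input sequences are toy examples.

```python
import re
from collections import Counter

def loops_of(pqs, tetrads=2, max_loop=7):
    """Return the three loop sequences of the first PQS found, or None."""
    run = "G{%d}" % tetrads
    loop = "([ACGU]{1,%d}?)" % max_loop
    match = re.search(loop.join([run] * 4), pqs.upper().replace("T", "U"))
    return match.groups() if match else None

def loop_length_combinations(pqs_list, tetrads=2):
    """Count (loop 1, loop 2, loop 3) length combinations across PQSs."""
    combos = Counter()
    for pqs in pqs_list:
        loops = loops_of(pqs, tetrads)
        if loops:
            combos[tuple(len(loop) for loop in loops)] += 1
    return combos

print(loops_of("GGAGGUUGGCAUGG"))  # ('A', 'UU', 'CAU')
toy = ["GGAGGAGGAGG", "GGUGGUGGUGG", "GGAGGUUGGCAUGG"]
print(loop_length_combinations(toy).most_common())
```

A combination such as (1, 1, 1) corresponds to the symmetric loop pattern, so the same counter can be used to ask how often all three loops share one length.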
Figure 4
Loop length analysis of the PQSs that colocalized
with m6A enriched regions in the HEK293 pre-mRNA.[9] Analysis of individual loop lengths for (A) two
G-tetrad PQSs and
(B) three or more G-tetrad PQSs. Analysis of loop length combinations
for (C) two G-tetrad PQSs and (D) three or more G-tetrad PQSs that
identify the most prevalent loop length combinations.
In the PQS population with three or more G-tetrads, one nucleotide
loop lengths were less common, while three and four nucleotide loop
lengths were most common (Figure 4B). Inspection of the combination of all three loop
lengths together in the two G-tetrad PQSs found the 1–1–1
loop length combination to be most common, followed by the 4–4–4
loop length combination. In general, as the loop length combination
increased or became asymmetric, the number of PQSs observed decreased
(Figure 4C). For the
three-loop length combination analysis of the three or more G-tetrad
PQSs, the most common combination found had 3–3–3 nucleotide
loop lengths; this was followed by the longer 7–7–7
and shorter 2–2–2 nucleotide loop lengths (Figure 4D). In the three
or more G-tetrad PQS data, the least common were those with asymmetric
loop lengths; the one symmetric exception in this least common pool was the 6–6–6
nucleotide loop combination (Figure 4D). This information suggests that longer loops are preferred in the three or more
G-tetrad rG4s with which m6A colocalizes.
The reason for this difference relative to the two G-tetrad cohort
is not known; however, longer loops in G-quadruplexes usually destabilize
the structure,[37] which may be important
in rG4s used as structural switches responding to the presence of
m6A in cells.
Inspection of the loop sequences of
the two G-tetrad PQSs in m6A-enriched regions of intronic
pre-mRNA in HEK293 cells identified
a nucleotide preference. In Table 1, a rank ordering of the top five most common loop
sequences found in each of the three possible rG4 loops is provided.
Single-nucleotide loops comprised of an A nucleotide were the most
common in all three loops. Because A nucleotides are potential sites
of methylation, the observation of single A nucleotides in the two
G-tetrad PQSs nicely fits our hypothesis that rG4 folds provide a
structural motif to guide sites of m6A introduction in
human mRNA. Furthermore, the high incidence of single A nucleotides
is consistent with the work by Louloupi et al.,[9] in which the consensus motif SAG occurred
with greater frequency.
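The rank ordering of loop sequences described above can be reproduced from a list of extracted loop triples with a counter per loop position. This is a sketch with hypothetical function names and toy input; it is not derived from the deposited data.

```python
from collections import Counter

def rank_loop_sequences(loop_triples, top=5):
    """Return the most common loop sequences for loops 1, 2, and 3.

    loop_triples is a list of (loop1, loop2, loop3) strings, one per PQS.
    """
    ranking = {}
    for position in range(3):
        counts = Counter(triple[position] for triple in loop_triples)
        ranking[position + 1] = counts.most_common(top)
    return ranking

def count_homogeneous(loop_triples, nucleotide):
    """Number of PQSs whose three loops are all the given single nucleotide."""
    target = (nucleotide, nucleotide, nucleotide)
    return sum(1 for triple in loop_triples if tuple(triple) == target)

# Toy loop triples, for illustration only.
toy = [("A", "A", "A"), ("A", "U", "GA"), ("U", "A", "A"), ("A", "A", "A")]
print(rank_loop_sequences(toy)[1])   # [('A', 3), ('U', 1)]
print(count_homogeneous(toy, "A"))   # 2
```

The per-position ranking corresponds to the rows of a loop composition table, and the homogeneous count corresponds to tallies of PQSs with three identical single-nucleotide loops.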
Table 1
PQS Loop Nucleotide Composition for Those Loops in m6A-Enriched Regions (a)
order (most to least common)    1      2      3      4      5
loop 1                          A      U      GA     G      AA
count                           1044   518    502    474    237
loop 2                          A      U      G      GA     C
count                           936    657    384    250    239
loop 3                          A      U      G      CA     GA
count                           999    691    459    379    332
(a) Data correspond to the HEK293 pre-mRNA m6A profile reported by Louloupi et al.[9] See Table S3 for additional data.
The second most common
loop sequence observed in the two G-tetrad
PQSs was single U nucleotides in any of the three loops, although
many of the top ten most prevalent loop sequences contained A nucleotides
within dinucleotide motifs such as 5′-GA, 5′-AA, 5′-CA,
and 5′-AG. Interestingly, the 5′-AC dinucleotide that
would indicate the DRACH consensus motif within a PQS
was not among the top 10 most prevalent loop sequences (Table S3). Further inspection of the 795 PQSs
in which single-nucleotide loops were identified also found that homogeneous
loops of single A nucleotides dominated the distribution with 142
occurrences. Inspection for homogeneous single U or C nucleotide loops
found 18 and 34 occurrences, respectively, and no PQSs were observed
with all-G loops. In summary, two G-tetrad PQSs in m6A
enriched regions are biased in their nucleotide composition to have
A > U > G > C and in their propensity to have the same lengths
for all three loops (Figure 4 and Tables 1 and S3). This indicates that homogeneous short loops
favor A nucleotides as evidenced by the loop composition analysis,
although the sequences with all C or U nucleotide loops will not provide
an RNA substrate for writing m6A. These sequences may be
false positives or the methylated A may reside just beyond the G4;
additional analysis of the tail sequences for the small sample of
PQSs without an A was not conducted. The sequence composition for
the three or more G-tetrad PQSs found colocalized with m6A was also analyzed and found to be rich in A nucleotides (Table S4). Additionally, the 5′-AC dinucleotide
common to the DRACH consensus motif was not found in
the top ten most common loop sequences, consistent with the Louloupi
et al. work.[9] Short rG4 folds such as these
found in the sequencing data can adopt stable folds as was recently
reported.[38]
Conclusions and Outlook
The observations of the bioinformatic analysis of m6A colocalization with rG4s, especially those near intronic splice
sites, can guide the design of future experiments. (1) This bioinformatic
study suggests a synergy between rG4 folds with sites of m6A epitranscriptomic modification. Do rG4 folds function as a structural
motif for methylation of the RNA by the METTL3/14 methyltransferase
complex? The loop length and sequence identity of the rG4s found are
essential knowledge for the design of in vitro studies to test the
hypothesis that METTL3/14 favors writing of m6A on rG4
scaffolds. One feature of the rG4s the present data cannot address
is whether the preferred folds exist in sequence contexts that are
dynamic between rG4 structures or with other RNA structures; this
type of information will likely need to be addressed on a sequence-by-sequence
basis or via inspection of high-resolution RNA structural maps in
vivo obtained by chemical probing.[39,40] (2) The SAG consensus sequence is also recognized by some splicing
proteins such as SRSF3.[9] Do these splicing
proteins bind rG4s, and is their binding modulated by the presence
of m6A in an rG4 loop? (3) Splicing factors that bind SAG sequence motifs were found to be involved in alternative
mRNA splicing.[41] The role of PQSs in introns
and their folding to rG4s to impact alternative splicing of specific
mRNA, such as the p53 mRNA, has been noted.[42−44] Is there synergy
between rG4s and m6A to guide alternative mRNA splicing?
At present, studies have addressed rG4s[42,43] and m6A individually for guiding alternative splicing.[9] The analysis presented here suggests that rG4s
and m6A may collaborate in alternative splicing. (4) Whether
the rG4 fold is the signal for writing or erasing m6A in
the mRNA or the presence of m6A impacts the rG4 fold is
not known, and is a question we previously asked.[13] Further, rG4 folds are known to be dynamic and adopt many
different structures; because each sequence is unique, this would
have to be studied on a case-by-case basis. (5) The preference for
A-rich two G-tetrad potential rG4s identified in the intronic m6A enriched sites is not understood, and further studies are
needed. It is known that A-rich G4 loops destabilize the fold, which
may be important for allowing the stability of the fold to be altered by methylation
under physiological conditions.[45]
Herein, we explored the structural pattern of human RNA sites harboring
m6A, Ψ, or A-to-I editing modifications, focusing
on the presence of PQSs at these sites. The study revealed that all
three modifications favorably colocalized with potential rG4s when
a comparison to a randomized data set was conducted (Figures 2C and 3C). The greatest colocalization was observed between m6A and potential rG4s near the splice sites in introns of HEK293 cells
(Figure 3A–D).
This observation suggests there may be an interplay between m6A, rG4s, and mRNA splicing that could be a component of alternative
splicing; future experimental work is needed to address this possibility.
Our prior interest in the colocalization of m6A and PQSs
focused on viral RNA genomes that showed a preference for DRACH motif methylation.[13] The present
work on human pre-mRNA is consistent with the viral RNA analysis;
however, the sequence context found for the colocalization in intronic
pre-mRNA occurs largely within SAG motifs (Table 1). This difference
may reflect the fact that writing of m6A on pre-mRNA occurs
in the nucleus, while writing of m6A on viral RNA occurs in the cytosol.[9,13] The present bioinformatic study identified colocalization of potential
rG4s in mRNA and epitranscriptomic modifications, which raises many
additional experimental questions. Structurally, rG4s
will present to epitranscriptomic writing enzymes differently than
duplex, hairpin, or single-stranded regions of RNA, which may help
explain why not all consensus motifs for a given modification are
modified and why they generally are not quantitatively modified.