This Outlook calls attention to two seemingly disparate and emerging fields regarding viral genomics that may be correlated in a way previously overlooked. First, we describe identification of conserved potential G-quadruplex-forming sequences (PQSs) in viral genomes relevant to human health. Studies have demonstrated that PQSs are highly conserved and can fold to G-quadruplexes (G4s) to regulate viral processes. Key examples include G4s as a countermeasure to the host's immune system or G4-guided regulation of replication or transcription. Second, emerging data are discussed concerning the epitranscriptomic modification N 6-methyladenosine (m6A) in viral RNA installed by host proteins in a consensus sequence favoring 5'-GG(m6A)C-3'. The proposed pathways by which m6A is written, read, and erased in viral RNA genomes and the impact this has on viral replication are described. The structural reason why certain sites are selected for modification while others are not is still mysterious. Finally, we discuss our new observations regarding these previous sequencing data that identify m6A installation within the loops of two-tetrad PQSs in the RNA genomes of the Zika, HIV, hepatitis B, and SV40 viruses. We hypothesize that conserved viral PQSs can provide a framework (sequence and/or structural) for m6A installation. We also discuss literature sources suggesting that PQSs as sites of RNA modification could be a general phenomenon. We anticipate our observations will provide ample opportunities for exciting discoveries regarding the interplay between G4 structures and epitranscriptomic modifications of RNA.
Potential G-Quadruplex Forming Sequences in Viral Genomes
In specific guanine-rich (G-rich) sequences of DNA or RNA, G-quadruplex
(G4) folds can drive a variety of cellular processes. In human cells,
immunofluorescent visualization of G4s identified their formation
in the genome[1] and transcriptome.[2] Processes in the genome that destabilize duplex
DNA,[3,4] DNA in the single-stranded regions of telomeres,[5,6] or R-loops[7] with the correct sequence
allow G4s to fold. Functional roles for G4s have been documented in
transcription,[8] telomere homeostasis,[5,6] alteration of the epigenetic landscape,[9,10]
origins of replication,[11] and class-switch recombination, in which they function in both DNA and RNA;[12] these sequences may also be sites of double-strand
breaks in the genome.[13] In RNA, the literature
is incongruent with respect to the importance of these folds,[2,14] although experimental evidence has built a strong case supporting
G4 folding in cells[2,15] even though they may not persist
long-term. In RNA, G4s can alter mRNA expression,[16,17] regulate pre-mRNA processing (splicing and polyadenylation),[16] act as components of stress granules,[18] and function in microRNAs.[19,20] Thus, G4s in DNA and RNA are critical for regulating the complex
human cellular network, and similar functions have been reported in
other eukaryotes,[21] plants,[22] prokaryotes,[23,24] and viruses.[25] Recent developments regarding PQSs and G4s in
viruses are described in this Outlook.

Sequences of DNA or RNA with the pattern 5′-GnLxGnLxGnLxGn-3′, in which four runs of n guanines (G) are closely spaced and separated by loops (L) of x nucleotides, are potential G-quadruplex-forming sequences (PQSs; Figure 1A). Generally, n ≥ 3 and x = 1–12, although loops of up to 20 nucleotides are sometimes found.[26−29] This pattern allows computational
inspection of genomes for PQSs as a first step in G4 studies followed
by experimental validation of folding for those deemed interesting.
Folded G4s are composed of tetrads, each comprising one G from each of
the four runs held together by G·G Hoogsteen pairing (Figure 1B).[30,31] This G·G pairing directs one lone pair of electrons on O6 of each G toward the interior channel to coordinate
with K+, the major intracellular monovalent cation (∼140 mM in
human cells), which stabilizes the structure (Figure 1B).[30,31] Structural analysis
has identified a variety of G4 topologies such as parallel-stranded,
antiparallel-stranded, or hybrid structures (Figure 1C).[31] These folds
differ in the 5′ to 3′ orientation of the G runs and
the syn or anti conformations of
the G nucleotides, resulting in different loops and grooves in the
folds for protein recognition and targeting with small molecules.[32−34] In DNA, all of these folds have been observed. In contrast, the
2′-OH on ribose in RNA provides an additional hydrogen bond,
alters the hydration state of the structure, and leads to a preference
for the anti conformation of G as a consequence of
the C3′-endo sugar pucker, leading to RNA
G4s strongly favoring parallel-stranded folds.[35,36] Folded G4s in RNA are typically more stable than their DNA counterparts.[35,36] Stable G4s with two tetrads (n = 2; Figure 1A) have been reported in viral DNA and RNA.[37−40] Lastly, folded G4s with bulges, with hairpins in the loops, or formed between
two strands have been reported,[41−43] although sequences of these
types are challenging to predict.
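As an illustration of the computational inspection step, the PQS pattern above can be written as a regular expression. The following is a minimal Python sketch, not the search tool used in the cited studies; the function name `find_pqs`, its parameters, and the choice to report non-overlapping matches with the shortest possible loops are assumptions made for illustration. The test sequence is the HBV fragment quoted later in this Outlook.

```python
import re

def find_pqs(seq, min_run=3, max_loop=12):
    """Return (start, end, sequence) for each non-overlapping PQS match.

    min_run is n (Gs per run) and max_loop is the largest loop length x;
    both names are illustrative and not taken from any published tool.
    """
    run = "G{%d,}" % min_run
    loop = "[ACGTU]{1,%d}?" % max_loop   # lazy quantifier: prefer short loops
    pattern = re.compile("%s(?:%s%s){3}" % (run, loop, run))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq.upper())]

# HBV fragment quoted later in this Outlook (spaces removed); indices are 0-based.
hbv = "UUGGGUGGCUUUGGGGCAUGGAC"
three_tetrad_hits = find_pqs(hbv)            # requires n >= 3
two_tetrad_hits = find_pqs(hbv, min_run=2)   # allows two-tetrad PQSs (n >= 2)
```

Under these assumptions the fragment contains no three-tetrad PQS but does contain one two-tetrad PQS, which shows how strongly the choice of n affects what a genome scan reports. A pattern of this kind cannot capture bulged or bimolecular G4s.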
Figure 1
Characteristics of G4 folds. (A) Formula for a PQS, (B) G-tetrad structure, (C) example G4 folds.

In human viruses PQSs can exist in DNA, RNA, or both polymers
depending
on the viral replication cycle. Excellent reviews on viral G4s exist
that highlight their physiological importance and focus on targeting
G4s as a therapeutic approach to fight viral infections;[44−48] herein, examples of conserved PQSs and folded G4s impacting viral
replication are described.

The flaviviruses include dengue, hepatitis C,
West Nile, yellow fever, and Zika viruses that have significantly
impacted human health.[37,49,50] The flaviviruses have positive-sense (+), single-stranded RNA (ssRNA)
genomes devoid of a 3′-poly-A tail (Figure A).[51] Genome replication
occurs in the cytosol through a double-stranded RNA (dsRNA) intermediate
via a specialized RNA-dependent RNA polymerase (RdRp) encoded by the
virus (Figure 2).[52] The C-rich strand complementary to a PQS can adopt
an i-motif fold, a tetraplex structure composed of hemiprotonated (C:C)+
base pairs that forms under acidic conditions; however, these folds
are unstable in RNA[53] and are likely not
found in flaviviruses. Potential G-quadruplex-forming sequences can
occur in either the positive- or negative-sense strand and impact
viral processes when folded. Inspection for PQSs in flavivirus genomes
identified seven conserved sequences on the positive strand, while
no conserved PQSs were found on the negative strand throughout the
genus.[37] This observation was quite surprising
because of the high degree of sequence variability in the viral cohort.[51]
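The strand comparison described above can be sketched in a few lines of Python. This is a hedged illustration, not the analysis pipeline of ref 37: the names `reverse_complement` and `pqs_counts_by_strand` are hypothetical, the three-tetrad pattern with loops of 1–12 nucleotides is only one common choice, and the negative-sense strand is taken simply as the reverse complement of the positive-sense RNA. The input sequences are toy examples, not viral sequences.

```python
import re

# Three-tetrad PQS pattern with loops of 1-12 nt (an assumed parameter choice).
PQS = re.compile(r"G{3,}(?:[ACGU]{1,12}?G{3,}){3}")
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]

def pqs_counts_by_strand(plus_strand):
    """Count non-overlapping PQS matches on the given strand and its complement."""
    plus = plus_strand.upper()
    minus = reverse_complement(plus)
    return {"plus": len(PQS.findall(plus)), "minus": len(PQS.findall(minus))}

g_rich = pqs_counts_by_strand("AGGGAGGGAGGGAGGGA")   # toy G-rich positive strand
c_rich = pqs_counts_by_strand("UCCCUCCCUCCCUCCCU")   # toy C-rich positive strand
```

A G-rich positive strand yields a PQS on that strand only, whereas a C-rich positive strand places the PQS on the negative strand, which is why the two strands have to be inspected separately.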
Figure 2
Locations in which flaviviruses and HIV-1 replicate in human cells and provide the opportunity for PQSs to adopt G4 folds.

As examples, the Zika genome has ∼70 PQSs
on the positive
strand and is devoid of PQSs on the negative strand;[37] in addition, beyond the conserved positive-sense strand
PQSs, the hepatitis C viral (HCV) genome also has a PQS on the 3′
end of the negative-sense strand.[37] In
vitro analysis demonstrated that a subset of the Zika and HCV PQSs
could fold to stable G4s, and addition of a G4-specific ligand stalled
polymerase bypass of template G4s from both viruses.[37,49,50] Cellular studies identified that
HCV replication is attenuated by various G4-specific ligands.[50] These findings support G4 folding in flaviviral
genomes, even if ligand induced, and indicate that folding impedes replication.

Why do PQSs
persist throughout the flaviviruses, and is there an
evolutionary advantage to maintaining these sequences? Flaviviruses
replicate their RNA genomes in the cytosol and must avoid detection
by the RNA decay pathway comprising the 5′→3′ exoribonuclease
XRN1 that digests foreign RNAs.[54] Foreign
RNAs are identified by nonstandard 5′ modifications, absence
of appropriate ribonucleoprotein signatures, dsRNAs, or lack of a
3′-poly-A tail, in which the latter three signatures are common
to flaviviruses.[54] One approach that flaviviruses
utilize to avoid complete nuclease digestion of their genomes is through
XRN1-resistant secondary structures,[54] of
which G4s represent one such fold.[55] Subgenomic
flaviviral genomes are hypothesized to be sacrificial, providing an
opportunity for replication and packaging of full-length genomes to
occur.[54] Recently, a PQS in the RNA genome
of the Rift Valley fever virus was shown to block XRN1 degradation.[56] The flaviviral genomes all possess conserved
PQSs that can adopt G4s to block XRN1. Thus, we hypothesize that flaviviruses
retain PQSs in order to adopt G4s to counteract host-derived surveillance
nucleases that combat viral infections. Additional examples of RNA
viruses that replicate solely through RNA intermediates and harbor
PQSs, some of which can adopt G4s, include the Ebola virus,[57] phleboviruses (including Rift Valley fever virus),[56] and arenaviruses (e.g., Lassa virus).[56] Further studies regarding G4s as nuclease-resistant
structures as a viral countermeasure to host defenses are needed.

The human immunodeficiency virus (HIV) is a retrovirus responsible
for causing acquired immune deficiency syndrome (AIDS). The HIV capsid
has two copies of the sense-strand ssRNA genome essential for reverse
transcription (RT) to yield a proviral dsDNA (Figure 2).[58] The proviral
dsDNA upon integration in the host genome is the template for mRNA
synthesis to produce viral proteins and more viral RNA to incorporate
into newly formed viral capsids. The HIV genome harbors PQSs that
can fold in the DNA and RNA throughout the sequences;[38,39] i-motif folds in the proviral DNA have not been reported. Key PQSs
in HIV are found in the U3 region of the 3′-UTR of the viral RNA, which
codes for part of the 5′ and 3′ long-terminal repeats
(LTRs) of the proviral DNA. In the proviral DNA, the PQSs in the U3
region in the 5′-LTR are part of the promoter for regulation
of RNA synthesis in the nucleus. The conservation of these PQSs has
been extensively documented.[39,59,60] Currently, three NMR-based structures have been reported for two
different HIV G4s found in the U3 region of the 5′-LTR of the
proviral DNA genome.[39,42,61] The RNA counterparts in the U3 region were shown to fold on the
basis of circular dichroism (CD) analysis and a reverse transcriptase
stop assay.[38] In the cellular context, G4 folding in DNA and RNA
was demonstrated by adding various G4-specific
ligands that slowed viral replication and by mutational studies showing that
sequences incapable of G4 formation impacted viral fitness,
i.e., the ability of the virus to thrive and replicate.[38,43,62]

Another PQS site that can
fold in HIV lies between two RNA genomes
to yield a bimolecular G4 in the central polypurine tract (cPPT) near
the dimer initiation site[43,63] and the gag polyprotein
coding region.[64] The close association
of the two viral genomes is an essential part of the viral replication
strategy, and a G4 fold may aid in this process based on the observation
that mutation to abolish G4 folding impacts replication.[43] Additionally, folded G4s were identified in
the HIV proviral genome in the nef protein coding
region.[65] Thus, HIV DNA and RNA provide
many examples of PQSs with functional roles during replication.

A fascinating example of G4s functioning as a countermeasure to
surveillance by the host’s immune system was documented in
the Epstein–Barr virus.[66] Infection
with this virus is associated with Burkitt’s lymphoma, nasopharyngeal
carcinoma, and Hodgkin’s lymphoma.[66] The Epstein–Barr virus has a dsDNA genome with a series of
13 PQSs in the coding region of the Epstein–Barr virus-encoded
nuclear antigen 1 (EBNA1) mRNA that function as cis-acting regulatory elements.[66] Verification
of two-tetrad G4 folding was shown by CD and NMR spectroscopies. Cellular
studies showed that folded G4s in the EBNA1 mRNA downregulated protein
expression, which at first glance appears counterproductive; however,
the decrease in EBNA1 protein expression allows the virus to evade
detection by the host’s immune system.[66] This example illustrates that viruses may have favorably evolved
nucleic acid secondary structures, such as G4s, to increase their
survival and dissemination in host communities. PQSs in
other viruses may function in a fashion similar to that demonstrated
for the Epstein–Barr virus.[66]

Important PQSs have also been identified in other human viruses. The SV40 virus has a closed-circular dsDNA genome with a PQS
in the promoter proposed to adopt a G4 and function in early and late
replication of the genome.[67] The human
papillomaviruses and herpesviruses have dsDNA genomes, in which PQSs
were found in key regulatory regions.[68,69] Further support
for SV40 and papillomaviruses containing folded G4s is derived from
both genomes coding for a helicase that unwinds G4s.[70,71] These viruses may harness G4 folds to serve a vital function during
replication and then ensure they are unwound when not needed. Future
studies in this area will provide more answers and may expand the
roles of G4s.

Folded G4s in viruses are bound by protein “readers.”
Nucleolin was demonstrated to bind G4s from HCV,[72] HIV,[73] and Epstein–Barr
viruses,[74] and a key function ascribed
to nucleolin is induction of G4 folds. In HIV, nucleolin binding and
folding of PQSs were determined to favor the proviral DNA over the
viral RNA PQSs.[73] In contrast to favoring
G4 folding by nucleolin, the nuclear proteins HNRNPA2 and HNRNPB1
were found to bind HIV proviral DNA and to unwind folded G4s that
may function during regulation and timing of HIV replication.[75] Additionally, the HIV-1 nucleocapsid protein
can function as a chaperone for G4 folding.[76] These competing activities of proteins on viral G4s paint a complex
cellular picture, and future studies will provide more clues that
are desperately needed. Another possible role for viral PQSs suggested
by a reviewer is that they may function as scaffolds for binding host
proteins to label the viral strands as “self” in an
attempt to avoid immune surveillance. The diverse cellular properties
for G4s identified in humans and viruses suggest that they likely
are involved in other cellular processes.
The Viral Epitranscriptome Includes N6-Methyladenosine
Chemical modifications to the transcriptome are called the epitranscriptome,
and those in mRNA are of keen focus at present.[77,78] Modifications to mRNA include 5-methylcytosine (5mC), pseudouridine
(Ψ), inosine (I), N1-methyladenosine (m1A), N6-methyladenosine (m6A), and N6-2′-O-dimethyladenosine
(m6Am).[77,79] By far the best studied, and
likely the most abundant modification in mRNA, is m6A (Figure 3), although
decoupling the effects of m6A from those of m6Am has
presented challenges because of their similar structures.[78] Emerging data have identified that m6A can function in nearly all aspects of mRNA biogenesis including
splicing, nuclear export, translation efficiency, and decay of the
unwanted strands; these are focal points in recent reviews.[77−81] In this Outlook we focus on recent studies of m6A in
human viruses in which the base modification is installed by the host
cell.
Figure 3
Readers, writers, and erasers of m6A in host and viral RNA.

In mRNA, methyl groups are “written”
on A at the N6 position via a large hetero-multimeric
methyltransferase
complex found in the nucleus.[82] This complex
contains the methyltransferase METTL3 responsible for methyl group
transfer from S-adenosylmethionine to adenosine at
the N6 position.[78] Critical for substrate recognition in the complex is METTL14 (Figure 3), and additional
factors found to be important include WTAP, KIAA1429, ZC3H13, and
HAKAI (Figure 3).[78] Formation of m6A occurs within the
consensus sequence 5′-DRACH-3′ (D = A, G, or U; R
= A or G; H = A, C, or U; the central A is the site of methylation), in which
GGACH is a favored context for installation.[83] Recent studies have identified that m6Am is
written into the 5′ cap of mRNA by an RNA polymerase II-associated
methyltransferase.[84,85] In mammalian genomes, m6A has been noted throughout mRNA from highly regulated genes, resulting
in methylation of 0.1% of adenosines with an average of 2–3
m6As per target transcript, and the extent of methylation
at a site can reach ∼90% in certain cases.[77−81,86] One of the great mysteries
regarding m6A is why certain DRACH motifs are methylated
while others are not. A prominent example found a hairpin structure
in RNA that was selectively methylated demonstrating a structural
component to site selection;[87] however,
hairpin structures do not explain all m6A sites, and other
secondary structures and/or protein factors are likely involved.

In mRNA,
m6A is a dynamic modification because an “eraser”
(i.e., demethylase) can revert the sequence back to A. Two prominent
m6A demethylases identified are the nonheme Fe(II)- and
α-ketoglutarate-dependent dioxygenases ALKBH5 and FTO (Figure 3).[88] These erasers strongly select m6A in single-stranded
RNA (ssRNA) over double-stranded RNA (dsRNA).[89,90] When ALKBH5 or FTO is knocked out of cells, an
increase in global A methylation is observed, supporting their function;[91,92] however, many unanswered questions remain regarding selection of
demethylation sites, cell dependency of the erasers, and the substrate
scope for the demethylases as well as whether other erasers exist.

The best characterized “readers” of m6A in mRNA include YTHDF1, YTHDF2, and YTHDF3, which reside in the cytoplasm,
as well as YTHDC1, found in the nucleus, and YTHDC2, which is nucleocytoplasmic
(Figure 3).[77−81] The presence of m6A in specific contexts of RNA can unfold
secondary structures (i.e., structural switches or cis-regulatory elements) allowing HNRNPC and HNRNPG to bind, designating
these proteins as m6A readers.[87] The list of m6A readers in mRNA is continually growing.
The three cytoplasmic YTH factors have similar binding constants for
m6A, and therefore, binding is determined by other factors
such as proteins and RNA structure.[93] Specifically,
YTHDF1 binds m6A and recruits translation factors to increase
expression; YTHDF2 binding promotes 3′-poly-A tail deadenylation
resulting in mRNA degradation; and YTHDF3 can promote translation
or degradation by interacting with YTHDF1 or YTHDF2, although this
is not well understood.[77−81] Lastly, YTHDC1 facilitates mRNA biogenesis and export from the nucleus,
and YTHDC2 has two conflicting roles, promoting either translation or degradation.
Degradation promoted by YTHDC2 occurs through interaction with XRN1;[94] the activity of XRN1 can be blocked
by G4s.[55] Because of the considerable interest
in this field,[77−81] additional readers and their impact on mRNA biology will likely
soon be found. During a viral infection, m6A-associated
proteins are found in both the cytoplasm and nucleus allowing viral
RNA methylation to occur (Figure 3).

Next-generation sequencing has enabled advances in
understanding where and how m6A functions in mRNA. The
challenge in locating m6A in a sequence is that this
heterocycle codes like an A during the PCR workup for any sequencing
application. Thus, sequencing m6A follows one of three
specialized methods. The first approach is site-specific cleavage and radioactive-labeling followed by ligation-assisted extraction and thin-layer chromatography (SCARLET), which is a low-throughput
method capable of identifying the exact location and quantity of m6A at any site chosen for study.[95] This is the gold-standard approach, but it cannot be applied to
the entire transcriptome. In the other methods, an m6A-selective
antibody is used to take a fragmented pool of mRNA and immunoprecipitate
those containing m6A. The fragments can be directly sequenced
to locate m6A at the resolution of the shear length of
∼100 nucleotides (m6A-Seq[96] or meRIP-Seq[97]). Alternatively, the antibody-RNA
complex is photo-cross-linked to yield a covalent bond between the
protein and nucleic acid at the C nucleotide 3′ to m6A in the DRACH motif. Upon protease
removal of the antibody, the adducted C nucleotide causes sequencing
termination or yields a characteristic mutation signature (C →
T transition) to identify the location of m6A (m6A-CLIP).[98,99] Another version of the cross-linking approach
feeds cells with 4-thiouridine (4SU) to be metabolically incorporated
into the RNA, and during immunoprecipitation photo-cross-linking occurs
between the antibody and 4SU yielding a covalent bond (PAR-CLIP).[100] The 4SU site when adducted to the protein results
in a mutation signature (U → C) during sequencing near the
m6A for identification. Advancements to the photo-cross-linking
approach have been reported (i.e., m6A-LAIC-Seq).[86] Drawbacks to antibody-based methods include
biases to the data, missing clustered modifications, and the generation
of data that are not easily quantifiable.[79] Even though new methods and technologies are sorely needed to better
understand m6A, significant knowledge has been gained with
existing techniques.

The presence of m6A in viral
RNA was first noted in
the 1970s in the human viruses SV40,[101] influenza A,[102] and human adenoviruses.[103] Recently, new sequencing approaches allowed
three independent reports of multiple m6A sites in the
HIV-1 RNA genome; additionally, each study conducted a series of knockdown
experiments to determine the impact of m6A readers, writers,
and erasers on viral fitness.[104−106] Among the data, differing observations
and interpretations exist; these have been recently reviewed.[107−109] For the present discussion, key points include overlap of m6A sites in the 3′-UTR of the HIV-1 RNA genome in which
there are conserved PQSs shown to adopt G4 folds.[38] The sequencing approaches applied to identify m6A were low- and high-resolution m6A sequencing.[104−106] Also noted was the binding of m6A readers (i.e., YTHDF1-3)
to the 3′-UTR region that is critical for the impact of m6A on viral fitness; however, whether binding is favorable
or detrimental for HIV-1 replication is a core difference in the studies.[104−109]

Regions of m6A in flaviviral genomes were sequenced
by two independent laboratories that found similar sites and functions
for m6A in the viral RNA.[110,111] Sequencing
for m6A by the lower resolution methods in the dengue,
HCV, West Nile, yellow fever, and Zika viruses identified conserved
regions of m6A installation across the flaviviruses studied.[110,111] In both studies, knockdown of established m6A-interacting
proteins found that when m6A installation was suppressed,
viral fitness increased, and when m6A removal was suppressed,
viral fitness decreased.[110,111] These observations
suggest that the host introduces m6A in specific regions
of flaviviral RNA as a mechanism for combating the infection; however,
why these regions are selected out of the many possible DRACH motifs is not known.

Defining roles for m6A in viral RNA from other human
viruses have also been reported; a notable example was documented in the
hepatitis B virus (HBV) that has a DNA genome and replicates through
an RNA intermediate.[112] Installation of
m6A in RNA from the HBV increased translation but decreased
reverse transcription of the RNA to DNA providing a dual role for
m6A on replication.[112] A key
location for m6A installation in the HBV within the epsilon
loop on the 3′-end of the RNA was identified. The Kaposi’s
sarcoma-associated herpesvirus has m6A introduced by host
writers that aids in splicing of a pre-mRNA for an essential protein;
additionally, m6A insertion was favored in 5′-GGAC sequence contexts.[113,114] For readers interested
in viral epitranscriptomics, excellent reviews on this topic exist.[107,109,115−117]
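The DRACH consensus described in this section maps directly onto a regular expression. The Python sketch below lists candidate sites by sequence alone; it says nothing about whether a site is actually methylated, which, as noted above, depends on factors that are not yet understood. The function name `find_drach` and the toy input are assumptions made for illustration.

```python
import re

# D = A, G, or U; R = A or G; H = A, C, or U (definitions from the text).
# The lookahead allows overlapping candidate motifs to be reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")

def find_drach(rna):
    """Return (0-based index of the central A, motif) for each DRACH candidate."""
    rna = rna.upper().replace("T", "U")   # accept DNA-style input as well
    return [(m.start() + 2, m.group(1)) for m in DRACH.finditer(rna)]

toy = "UUGGACUAAGAACA"        # toy sequence, not taken from a viral genome
sites = find_drach(toy)       # candidate sites by sequence only
```

Counting candidates this way corresponds to the "possible m6A sites according to sequence" that are contrasted with experimentally enriched sites in the next section.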
Is There Synergy between PQSs and m6A?
Upon inspection of the viral PQS data and m6A epitranscriptomic
sequencing data, we observed instances in which sites of m6A occurred within the loops of PQSs. This observation led us to ask
the question whether a synergy exists between some sites selected
for m6A insertion and folded G4s. In the first example,
sequencing for m6A in flaviviral genomes was conducted
by low-resolution methods.[110,111] The discussion here
will focus on the Zika genome, for which a map of the PQSs exists
from our prior work (Figure 4A).[37] There are 12 regions within
the Zika genome that showed enrichment of m6A above the
background input control; these are illustrated with the blue bars
below the map in Figure 4A. When we overlaid the PQS and m6A enrichment maps,[37,111] 8 of the 12 enriched peaks had PQSs with the DRACH
motif in a loop region (Figure 4A). In five of the eight overlapping peaks there was no ambiguity:
all possible DRACH motifs were in PQS contexts. In the
remaining three overlap peaks, DRACH motifs existed either
in PQSs or adjacent to PQSs. Inspection of the Zika genome found >300
possible m6A sites according to sequence, while only 12
of those potential sites were methylated. Interestingly, among the
∼70 PQSs in the Zika genome,[37] 8
out of the 12 m6A sites were associated with PQSs. This
argument does not constitute a strong statistical proof of our observation;
nonetheless, it is fascinating to find many examples of PQSs in Zika
that have a strong potential to be methylated in predicted loops.
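The kind of overlap described above, a DRACH motif whose A falls in a PQS loop, can be checked programmatically once both patterns are defined. The following Python sketch is illustrative only and is not the procedure used to build the maps in Figure 4; the function name `drach_in_pqs_loops`, the two-tetrad pattern, and the loop length limit of 1–12 nucleotides are assumptions. The example input is the SV40 fragment quoted later in this Outlook.

```python
import re

# Two-tetrad PQS with its three loops captured (loops of 1-12 nt are assumed).
PQS = re.compile(r"G{2,}([ACGU]{1,12}?)G{2,}([ACGU]{1,12}?)G{2,}([ACGU]{1,12}?)G{2,}")
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")

def drach_in_pqs_loops(rna):
    """Return (index of A, loop number, motif) for DRACH sites whose A is in a PQS loop."""
    rna = rna.upper().replace("T", "U")
    candidates = [(m.start() + 2, m.group(1)) for m in DRACH.finditer(rna)]
    hits = []
    for pqs in PQS.finditer(rna):
        for loop_number in (1, 2, 3):
            lo, hi = pqs.span(loop_number)   # half-open interval of this loop
            hits += [(a, loop_number, motif) for a, motif in candidates if lo <= a < hi]
    return hits

# SV40 fragment quoted later in this Outlook (spaces removed).
# Indices are 0-based within the fragment, not genome coordinates.
sv40 = "CAGGAGGACACAGAGGGUGGAU"
hits = drach_in_pqs_loops(sv40)
```

For this fragment the single DRACH candidate (GGACA) has its A in the second loop of the two-tetrad PQS. Applying such a check to whole genomes, together with an appropriate null model, would be one route toward the statistical support that the present observation lacks.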
Figure 4
Locations for the overlap of PQSs and sites of m6A in the Zika and HIV viral RNAs. (A) Overlap of an m6A enrichment map (red line) compared to the input control (gray line) reported for the Zika viral genome[111] and the PQS map for the same genome[37] to illustrate m6A sites with DRACH motifs in PQS loops. The 12 sites that are enriched above background are shown with the blue bars in comparison to the Zika genome diagram at the bottom. (B) Diagram for the LTR and UTR regions of the HIV DNA and RNA genomes, respectively, to identify the U3 region of the 3′-UTR in the viral RNA, in which PQSs that adopt G4 folds (U3-II, U3-III, and U3-IV G4s)[38] have m6A sites in loops. Sequence conservation in this region is shown by the LOGO obtained from 1527 HIV-1 genomes downloaded from www.hiv.lanl.gov.

In the second example, sequencing m6A in the HIV-1 RNA
genome from two laboratories found modification in the 3′-UTR.[104,106,107] In one of the studies, PAR-CLIP
was applied to sequence m6A[104] to pinpoint three m6A sites adjacent to G runs that define
conserved PQSs in the U3 region of the HIV RNA genome (Figure 4B). Previous studies determined
that the region in which m6A was installed has the potential
to adopt G4 folds (U3-G4 II, U3-G4 III, and U3-G4 IV; Figure 4B) that impact viral replication.[38] Inspection of the sequence LOGO generated from
alignment of 1527 HIV-1 3′-UTR U3 sequences demonstrated strong
conservation of the PQSs along with two of the three A nucleotides
that are methylated, and the third A is the dominant nucleotide found
at this position (Figure 4B). To reiterate, the nucleotides showing strong conservation
in HIV-1 strains,[39] as well as HIV-2 and
SIV[60] in the U3 region of the 3′-UTR,
are the Gs needed for G4 formation as well as the A nucleotides that
are methylated.

Additional examples illustrating m6A and PQS overlap
were found in the RNA from HBV and SV40. At position 1907 in the HBV
pregenomic RNA, an A is methylated adjacent to a two-tetrad PQS (5′-UU GGG U GG CUUU GGGG CAU GGAC-3′; the runs of two or more Gs
define the PQS).[112] Changes to nucleotides
adjacent to G4 folds can significantly impact structure.[118] In the HBV case, the m6A site was
confirmed by mutational analysis to be critical for viral replication.
Other m6A sites were not further interrogated in that study, which prevents
a PQS analysis of those data. In the SV40 mRNA, PAR-CLIP sequencing
found many examples of m6A in PQSs; one example is
an A at position 2444 (5′-CA GG A GGACACAGA GGG U GG AU-3′).[119] In SV40, m6A functions to enhance viral replication and
gene expression. These observations provide additional examples of
established m6A sites in PQS contexts.

As a consequence
of the only recent emergence of sequencing m6A in viruses,
more data are not yet available to provide stronger
generalizations. With this limitation in mind, examples exist from
viruses of critical m6A sites existing in PQS contexts
that appear to impact these human viruses significantly. In this Outlook,
we bring attention to the possible synergy between these two disparate
fields of study and hope that researchers will add data to support
or refute the G4-m6A correlation in the future. As more
m6A sequencing data are collected on viral RNAs, inspection
for m6A within PQSs and how these G-rich sequences can
impact local secondary structure may provide missing clues as to why
these regions are selected by proteins for methylation while others
are not. Moreover, how m6A impacts G4 folds can aid in
drawing conclusions regarding the impact of methylation on viral fitness.
Currently, whether the PQS context favors methylation (writing m6A) or disfavors demethylation (erasing m6A) is
not known.

What possible biological outcomes could arise from
m6A installation in PQS contexts? Each sequence context
is unique and
would need to be studied on a case-by-case basis, although some general
speculations can be made. One possibility is that N6 methylation of A could drive the formation of G4 folds
at the expense of other secondary structures destabilized by m6A (Figure 5A). A structure-switching role for m6A in the context
of an RNA hairpin has been demonstrated to modulate protein factor
binding.[87,120] Methylation can affect secondary structure by destabilizing base pairing to U, and/or
the hydrophobicity of the new methyl group may alter RNA hydration[121] to favor parallel-stranded RNA G4 folding in
these sequences. This G4 switch would display sequences for protein
recognition differently, resulting in a downstream phenotypic change.
Stabilization of G4 folds would also result in stronger blocks to
polymerases during replication of the viral genome. In this scenario,
m6A in a G4 would slow viral replication, especially if
this increases binding of nucleolin,[72] and
this proposal is consistent with the decrease in replication shown
for adenosine methylation in flavivirus.[110,111]
Figure 5. Alternative pathways for installation of m6A in the
PQS context can create a structural switch in viral genomes. (A) An
unfolded PQS can be methylated to facilitate G4 folding. (B) A folded
G4 can be methylated at an A nucleotide in the loop, resulting in
loss of the G4 structure or retention/alteration of the fold. To maintain
brevity, only the writing of m6A in the PQS context is
illustrated; however, erasing m6A may also be impacted
by secondary structure.
On the other hand, m6A could destabilize G4 folds
and drive the RNA to another structure (Figure 5B). This is likely because G4s can be sensitive
to changes in hydrogen bonding between loop nucleotides.[31] Consequently, methylation of A would disfavor
G4 folds and diminish the challenges these structures can pose to
replication and protein recruitment. Disrupting the G4 could facilitate
polymerase bypass of these sites and favor increased replication.
This situation would be consistent with two of the studies regarding
m6A impact on HIV-1 in which methylation favored viral
replication.[104,105] Further, m6A is readily
bypassed by polymerases and would not pose an impasse to replication.[122] These potential pathways in which synergy exists
between m6A and G4 folds may provide missing links to understanding
the viral epitranscriptomic impacts observed.

Lastly, epitranscriptomic
modification of viral genomes in PQS
contexts may represent a small window to a larger phenomenon. In the
human transcriptome, we note an example of a primary miRNA with m6A in a PQS context that could be a structural switch,[123] and recent nanopore sequencing data of the
human transcriptome identified significant overlap of m6A sites and PQSs in mRNA 3′-UTRs.[124] Additionally, m6A is one of many epitranscriptomic modifications
that exist (e.g., m6Am, m1A, I, Ψ, and 5mC; Figure 6),[77,79] and this list may now include internal N7-methylguanosines (m7G) in human RNA.[125,126] Other epitranscriptomic
modifications may be installed in PQSs. Support for this final claim
comes from a recent protein pull-down experiment with the human NRAS 5′-UTR RNA G4 used as bait in which a methylase
for biosynthesis of 5mC (NSUN5) was significantly pulled down.[127] We anticipate our observations will provide
ample opportunities for exciting discoveries regarding the interplay
between G4 structures and epitranscriptomic modifications of RNA.
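
As one possible starting point for the sequence-level inspection proposed in this Outlook, the short Python sketch below scans an RNA fragment for a two-tetrad PQS and reports whether the A of each 5′-GGAC-3′ motif lies in a loop of that PQS or immediately adjacent to it. The regular expression (four runs of two or more Gs separated by loops of 1 to 7 nucleotides), the one-nucleotide flank, and the function name are illustrative assumptions, not definitions taken from the cited studies. The motif search only flags candidate positions within the quoted fragments and does not replace experimentally mapped m6A sites.

```python
import re

# Assumption: a two-tetrad PQS is four runs of >=2 G separated by loops of 1-7 nt.
PQS = re.compile(
    r"(G{2,})([ACGU]{1,7}?)(G{2,})([ACGU]{1,7}?)(G{2,})([ACGU]{1,7}?)(G{2,})"
)
MOTIF = "GGAC"   # consensus 5'-GG(m6A)C-3'
A_OFFSET = 2     # position of the A inside the motif


def classify_candidate_sites(rna, flank=1):
    """Classify the A of each GGAC motif relative to the first PQS found.

    Returns a list of (0-based index of A in the fragment, context), where
    context is "loop", "adjacent", "outside", or "no PQS".
    Hypothetical helper written for illustration only.
    """
    rna = rna.replace(" ", "").upper().replace("T", "U")
    pqs = PQS.search(rna)
    results = []
    start = rna.find(MOTIF)
    while start != -1:
        a = start + A_OFFSET
        if pqs is None:
            context = "no PQS"
        elif pqs.start() <= a < pqs.end():
            # An A cannot sit in a G run, so any A inside the match is in a loop.
            context = "loop"
        elif pqs.start() - flank <= a < pqs.end() + flank:
            context = "adjacent"
        else:
            context = "outside"
        results.append((a, context))
        start = rna.find(MOTIF, start + 1)
    return results


# Fragments as quoted in the text (spaces are stripped before scanning).
hbv = "UU GGG U GG CUUU GGGG CAU GGAC"
sv40 = "CA GG A GGACACAGA GGG U GG AU"
print(classify_candidate_sites(hbv))   # [(21, 'adjacent')]
print(classify_candidate_sites(sv40))  # [(7, 'loop')]
```

Under these assumptions, the HBV fragment places the GGAC adenosine directly 3′ of the PQS, and the SV40 fragment places it in the second loop. Changing the loop bounds or the flank width would change these calls, so any larger survey would need to state and justify its own PQS definition.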
Figure 6. Structures for chemically modified structures in mRNA.