The A-nucleotide preference of HIV-1 in the context of its structured RNA genome.

Formijn J van Hemert, Antoinette C van der Kuyl, Ben Berkhout.

Abstract

A bipartition of HIV-1 RNA genome sequences into single- and double-stranded nucleotides is possible based on the secondary structure model of a complete 9 kb genome. Subsequent analysis revealed that the well-known lentiviral property of A-accumulation is profoundly present in single-stranded domains, yet absent in double-stranded domains. Mutational rate analysis by means of an unrestricted model of nucleotide substitution suggests the presence of an evolutionary equilibrium to preserve this biased nucleotide distribution.

Keywords: HIV; RNA structure; evolution; lentiviruses; mutational pattern; nucleotide composition

Year: 2012 PMID: 23235488 PMCID: PMC3594280 DOI: 10.4161/rna.22896

Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652

Introduction

The tendency of lentiviral open reading frames to become A-rich has been documented previously. For instance, the single-stranded RNA genome of HIV-1 contains 36.2% A, 23.9% G, 22.2% U and 17.6% C. The increased A-content dictates the typical codon usage of this virus. The apparent selection of A-rich codons in HIV genomes even contributes to a biased amino acid composition of the encoded proteins. Also, HIV particles contain tRNAs that decode A-ending codons, suggesting a modulation of the cellular tRNA pool toward the typical codon preference of HIV genes. These basic RNA properties are well conserved over time and among the different members of the Lentiviridae. A dCTP pool imbalance during reverse transcription has been proposed as a cause of G→A hypermutation of the HIV-1 genome. dNTP pool imbalance appeared to contribute more to HIV evolution in vivo than sequence editing by the cellular restriction factors Apobec 3G/3F. A reduction of the A-richness of HIV-1 polymerase sequences impaired viral DNA synthesis, but a biological function for this typical lentiviral A-pressure has not yet been elucidated.

Recently, a secondary structure model of the complete 9 kb HIV-1 RNA genome at single nucleotide resolution has been constructed by combining a chemical assay of nucleotide accessibility (SHAPE, see ref. 13) with RNA folding prediction (RNAstructure, see refs. 14,15). The biased nucleotide composition of the HIV-1 RNA genome necessarily has implications for the distribution of the different nucleotides over the structured RNA genome. Even when assuming maximal base pairing across the genome, the character of the possible base pairs (G-C, A-U and G-U and the three reverse pairs) dictates that not every A can be paired, given the surplus of A (36.2%) over its unique pairing partner U (22.2%), a difference of 14 percentage points. This means that the single-stranded regions of HIV-1 RNA will statistically have a surplus of A and possibly G over U and particularly C. As these patterns could constitute a distinct molecular signature of the viral genome, we set out to further analyze the nucleotide distribution in the context of the HIV-1 RNA secondary structure model.
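
The pairing argument above is simple arithmetic and can be checked directly. The following is a minimal Python sketch, using the genome-wide percentages quoted in the text and assuming the extreme case in which every U pairs with an A; the variable names are hypothetical and chosen for illustration only.

```python
# Genome-wide nucleotide composition of HIV-1 RNA (percent), as quoted in the text.
composition = {"A": 36.2, "G": 23.9, "U": 22.2, "C": 17.6}

# A can pair only with U, so the share of the genome made up of paired A
# cannot exceed the share made up of U (extreme case: no U pairs with G).
max_paired_a = min(composition["A"], composition["U"])

# Lower bound on unpaired A, as a percentage of the whole genome.
min_unpaired_a = composition["A"] - max_paired_a

# The same bound expressed as a fraction of all A residues.
min_unpaired_fraction_of_a = min_unpaired_a / composition["A"]

print(f"unpaired A: at least {min_unpaired_a:.1f}% of the genome")
print(f"that is at least {min_unpaired_fraction_of_a:.0%} of all A residues")
```

Under these assumptions at least 14.0% of the genome, or roughly 39% of all A residues, must remain unpaired. This is only a lower bound: it says nothing about how many A residues are actually unpaired in the structure model.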

Results

Nucleotide composition of the structured HIV-1 RNA genome

The nucleotide composition differs significantly between single- and double-stranded regions of the HIV-1 RNA structure model of the NL4-3 isolate (Table 1). Of the total 9173 nucleotides, 59% and 41% are present in these ss and ds regions, respectively. As much as 79% of A nucleotides in this HIV-1 RNA genome participate in the ss parts. In other words, almost 4 of 5 A nucleotides are predicted to be unpaired in this highly structured RNA molecule. In contrast, 57% of U, 45% of G and only 38% of C are found in ss regions. These striking data indicate a differential nucleotide bias in the ss vs. ds domains of HIV-1 RNA. Apparently, the lentiviral property of A-pressure at the expense of C, as described previously, is intensified in the ss regions (A-rich and C-poor with 79% and 38%, respectively) but absent in the ds parts, which show a strikingly reversed pattern (C-rich and A-poor with 62% and 21%, respectively). Analysis of the HIV-1 sequence after partition into separate reading frames and codon positions (GAG, POL, ENV and NEF, excluding regions with gene overlap) confirmed these patterns. The combined 5′ and 3′ non-coding regions displayed twice as many paired nucleotides as the genes and a concomitant decrease in A-content.

Table 1. Biased nucleotide composition of single- and double-stranded regions in HIV-1 NL4-3 RNA

Nucl	ss or ds	Number	(proportion)	ss/ds ratio
All	ss	5377	(0.59)	1.416
All	ds	3796	(0.41)	1.416
A	ss	2596	(0.79)	3.779
A	ds	687	(0.21)	3.779
U	ss	1155	(0.57)	1.304
U	ds	886	(0.43)	1.304
C	ss	623	(0.38)	0.616
C	ds	1012	(0.62)	0.616
G	ss	1003	(0.45)	0.828
G	ds	1211	(0.55)	0.828
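
The proportion and ss/ds columns of Table 1 follow directly from the nucleotide counts. As a minimal sketch (the function and variable names are hypothetical, only the counts come from the table), they can be recomputed as follows:

```python
# Nucleotide counts from Table 1: (single-stranded, double-stranded).
counts = {
    "A": (2596, 687),
    "U": (1155, 886),
    "C": (623, 1012),
    "G": (1003, 1211),
}

def summarize(ss, ds):
    """Return (ss proportion, ds proportion, ss/ds ratio) for one nucleotide."""
    total = ss + ds
    return ss / total, ds / total, ss / ds

# Totals over all four nucleotides give the "All" rows of the table.
total_ss = sum(ss for ss, _ in counts.values())
total_ds = sum(ds for _, ds in counts.values())

table = {"All": summarize(total_ss, total_ds)}
table.update({nuc: summarize(ss, ds) for nuc, (ss, ds) in counts.items()})

for nuc, (p_ss, p_ds, ratio) in table.items():
    print(f"{nuc}\t{p_ss:.2f}\t{p_ds:.2f}\t{ratio:.3f}")
```

Note that the proportions are taken per nucleotide (for example, the fraction of all A residues that are single-stranded), not as a share of the ss or ds compartment.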

Analysis of the base pairs in the HIV-1 RNA secondary structure model indicates that the most stable GC and CG base pairs are used more frequently than AU and UA pairs (Fig. 1). The least stable GU and UG pairs are present at an even lower frequency. This unequal base pair composition correlates with the slightly preferred occurrence of G and C in the ds parts of the HIV-1 genome (Table 1: 55% and 62%, respectively).

Figure 1. Base pair composition of the double-stranded portion of the HIV-1 NL4-3 RNA structure.

The structure of the NL4-3 RNA genome has been deduced from a combination of experimental RNA structure probing data and computational RNA structure prediction. In short, the SHAPE reactivity assay monitored the accessibility of nucleotides in the RNA structure by chemical base modification. These experimental data were fed into the RNA folding software to obtain a pairing probability value for each individual nucleotide. We analyzed the distribution of the four nucleotides for SHAPE reactivity (Fig. 2). The A-nucleotides show a peak in SHAPE reactivity around 0.8, which contrasts with the much lower values calculated for the other three nucleotides. The SHAPE reactivity of the C-nucleotides is most restricted and largely confined to the 0.2–0.4 window. These results indicate that A nucleotides are in general more exposed to chemical modification than the other nucleotides because most As are single-stranded, whereas Cs are best protected against the modifying agent by base pairing. These structure-probing results are in agreement with the biased nucleotide composition based on the predicted HIV-1 RNA structure model: A is overrepresented in ss regions and C is found preferentially in ds domains (Table 1). This points to an intimate relationship between the nucleotide composition of the HIV-1 RNA genome and its structure.
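
The per-nucleotide SHAPE analysis can be pictured as a histogram of reactivities in windows of 0.2, built separately for A, U, C and G. The sketch below illustrates that binning under stated assumptions: the toy sequence and reactivity values are hypothetical (not taken from the NL4-3 data set), as are the function name, the number of bins, and the choice to skip negative or missing values.

```python
from collections import defaultdict

def shape_histogram(nucleotides, reactivities, width=0.2, n_bins=10):
    """Count nucleotides per reactivity window, separately for each base."""
    hist = defaultdict(lambda: [0] * n_bins)
    for nuc, r in zip(nucleotides, reactivities):
        if r is None or r < 0:
            # Assumption: positions without usable data are ignored.
            continue
        # The last bin collects any value beyond the plotted range.
        index = min(int(r / width), n_bins - 1)
        hist[nuc][index] += 1
    return dict(hist)

# Hypothetical toy data for illustration only.
seq = "AACGUAGC"
react = [0.85, 0.70, 0.25, 0.10, 0.45, 0.90, 0.05, 0.30]
hist = shape_histogram(seq, react)

for nuc in "AUCG":
    print(nuc, hist.get(nuc, [0] * 10))
```

In this toy input the A residues fall in the higher windows and the C residues in a low window, which mimics the qualitative pattern described above but carries no evidential weight.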

Figure 2. SHAPE reactivity of each nucleotide (A, U, C and G) in HIV-1 NL4-3 RNA. The histograms show increasing SHAPE reactivity in windows of 0.2 (X-axis, relative units). Frequency refers to the number of nucleotides per SHAPE window. Note the deviant SHAPE reactivity of the A-nucleotide.

The connection between base composition and secondary structure in the RNA genome of the HIV-1 strain NL4-3 may be exemplary for other virus isolates. To test this, the ss/ds designation of NL4-3 was projected onto the corresponding nucleotides of 448 aligned HIV-1 subtype B sequences taken from the Los Alamos database (year 2010, no recombinants). An ss/ds bipartition was created without affecting the individual base-to-base alignments. Indeed, nucleotide frequencies differ between these two data sets in much the same way as described above for NL4-3 RNA (Table 2). The small values for standard deviation (StD) indicate considerable conservation of the typical nucleotide composition in the ss and ds compartments. Apparently, the property of A-pressure at the expense of C is prominent in portions of HIV-1 subtype B RNAs that represent unpaired regions in NL4-3 RNA.

Table 2. Nucleotide frequencies in single- and double-stranded regions of HIV-1 subtype B RNA genomes

448 HIV-1 isolates		A	U	C	G
All	AVG	36.20	22.23	17.64	23.94
All	StD	0.55	0.17	0.31	0.33
ss	AVG	47.50	21.30	11.90	19.20
ss	StD	0.40	0.21	0.26	0.30
ds	AVG	19.90	23.60	25.80	30.70
ds	StD	0.44	0.23	0.29	0.30

The alignment of 448 sequences (All) was divided in two parts by the ss or ds designation of bases in the NL4-3 RNA structure. The individual nucleotide compositions were used for the calculation of average and standard deviation.
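
The projection of the NL4-3 ss/ds designation onto an alignment can be sketched as a column mask applied to every aligned sequence. The example below is an illustration under assumptions, not the procedure actually used (frequencies were estimated with MEGA5): the toy alignment, the mask encoding ('s'/'d'), the function name, the exclusion of gap characters and the use of the population standard deviation are all hypothetical choices.

```python
from collections import Counter
from statistics import mean, pstdev

NUCLEOTIDES = "AUCG"

def composition_by_partition(alignment, mask):
    """Average nucleotide percentages (and standard deviation) per partition.

    alignment: aligned sequences of equal length ('-' marks a gap).
    mask: one character per alignment column, 's' (unpaired) or 'd' (paired),
          taken from the reference structure.
    """
    summary = {}
    for label in ("s", "d"):
        columns = [i for i, state in enumerate(mask) if state == label]
        per_sequence = []
        for seq in alignment:
            # Gaps and ambiguous characters are left out of the counts.
            counts = Counter(seq[i] for i in columns if seq[i] in NUCLEOTIDES)
            total = sum(counts.values())
            if total == 0:
                continue
            per_sequence.append({n: 100.0 * counts[n] / total for n in NUCLEOTIDES})
        summary[label] = {
            n: (mean([p[n] for p in per_sequence]),
                pstdev([p[n] for p in per_sequence]))
            for n in NUCLEOTIDES
        }
    return summary

# Hypothetical toy alignment: columns 0, 1, 4, 5 are unpaired in the reference.
mask = "ssddssdd"
alignment = ["AAGCAUGC", "AAGCAAGC", "AGGCA-GC"]
result = composition_by_partition(alignment, mask)

for label in ("s", "d"):
    print(label, {n: round(avg, 1) for n, (avg, _) in result[label].items()})
```

Because the mask is applied column by column, the base-to-base alignment is left untouched, which is the property the bipartition relies on.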

Different nucleotide substitution pattern in ss and ds domains of HIV-1 RNA

The strikingly different nucleotide composition of ss and ds RNA regions may point to different evolutionary rates of the nucleotides in these two domains. Maximum likelihood estimates of relative evolutionary rates for A, U, C and G nucleotides in ss and ds alignments confirmed this expectation (Table 3). Each positive off-diagonal value gives the relative rate at which the row nucleotide is substituted by the column nucleotide. Each negative value on the diagonal equals minus the sum of the positive values in its row, so that every row sums to zero. From inspection of the Qss matrix, it is obvious that the A-nucleotide shows the lowest overall rate and the C-nucleotide the highest overall rate of being substituted (-0.699428 and -1.776556, respectively). The single-stranded As alter most frequently into G (0.461079), followed by C (0.156830) and U (0.081518). G-nucleotides, in turn, rapidly change into A (1.148242), while G→C and G→U are relatively rare mutational events (0.114724 and 0.084045, respectively). Likewise, the C→U substitution is more prominent than U→C (1.007012 vs. 0.568940), C→A outscores A→C (0.574634 vs. 0.156830) and U→A exceeds A→U (0.138854 vs. 0.081518). This nucleotide substitution pattern will lead to an accumulation of A at the expense of C, G and U until an equilibrium is reached, which closely matches the nucleotide distribution that has been observed in the ss regions of HIV-1 subtype B RNAs (Table 2).

Table 3. Different patterns of nucleotide substitution for ss and ds nucleotides in HIV-1 subtype B RNA genomes

Qss	A	U	C	G	Qds	A	U	C	G
A	-0.699428	0.081518	0.156830	0.461079	A	-1.499043	0.254533	0.339833	0.904678
U	0.138854	-0.899975	0.568940	0.192180	U	0.175181	-0.868417	0.532371	0.160865
C	0.574634	1.007012	-1.776556	0.194910	C	0.300107	0.471187	-0.837239	0.065945
G	1.148242	0.084045	0.114724	-1.347011	G	0.736551	0.068547	0.064782	-0.869880

Patterns of nucleotide substitution are presented as rate matrices (Qss and Qds). A positive value in a row represents the rate of substitution of the row nucleotide into the column nucleotide. A negative value on the matrix diagonal equals minus the sum of the positive values in its row, so that each row sums to zero. An unrestricted model of nucleotide substitution was used. The two alignments of ss and ds nucleotides were analyzed in five batches of 80 sequences. The resulting matrices were arithmetically averaged to obtain the two “consensus” matrices (Qss and Qds).

The Qds matrix contrasts strongly with the Qss matrix. In the ds domains, the A-nucleotide is most prone to substitution (-1.499043) and G→A is slightly less probable than A→G (0.736551 and 0.904678, respectively), which is in line with the enhanced proportion of G and the equivalent diminishment of A in ds domains of HIV-1 RNA genomes (Table 2). It should be noted that these matrices have been constructed by means of an unrestricted model of nucleotide substitution without any constraining condition like reversibility, (partial) rate equality or fixed transition/transversion ratios. In addition, the RNA genomes of different HIV-1 isolates generated nearly identical Q matrices. We report that ss and ds regions in HIV-1 RNA employ different mutational patterns/signatures to maintain their distinct nucleotide composition. This may relate to experimental findings that indicate that local RNA structure can influence pausing of the Reverse Transcriptase enzyme, which may increase the probability of misincorporation (ref. 20). Overall, the secondary RNA structure seems to pose serious constraints on the nucleotide composition and evolution of the HIV-1 RNA genome.
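
The claim that each substitution pattern maintains its own nucleotide composition can be checked numerically: a rate matrix Q implies an equilibrium distribution π satisfying πQ = 0. The sketch below is not part of the original analysis; it takes the values of Table 3 and finds π by repeatedly applying small time steps, π ← π + step·πQ, which is one simple way to reach the equilibrium. The function name, step size and iteration count are illustrative assumptions.

```python
NUCLEOTIDES = "AUCG"

# Rate matrices from Table 3 (rows and columns in the order A, U, C, G).
Q_SS = [
    [-0.699428, 0.081518, 0.156830, 0.461079],
    [0.138854, -0.899975, 0.568940, 0.192180],
    [0.574634, 1.007012, -1.776556, 0.194910],
    [1.148242, 0.084045, 0.114724, -1.347011],
]
Q_DS = [
    [-1.499043, 0.254533, 0.339833, 0.904678],
    [0.175181, -0.868417, 0.532371, 0.160865],
    [0.300107, 0.471187, -0.837239, 0.065945],
    [0.736551, 0.068547, 0.064782, -0.869880],
]

def stationary_distribution(q, step=0.1, iterations=5000):
    """Approximate pi with pi @ q = 0 by iterating pi <- pi + step * (pi @ q).

    The step must be small enough that step * |q[i][i]| < 1 for every i.
    """
    n = len(q)
    pi = [1.0 / n] * n                      # start from equal frequencies
    for _ in range(iterations):
        flow = [sum(pi[i] * q[i][j] for i in range(n)) for j in range(n)]
        pi = [p + step * f for p, f in zip(pi, flow)]
    total = sum(pi)                         # remove tiny rounding drift
    return [p / total for p in pi]

pi_ss = stationary_distribution(Q_SS)
pi_ds = stationary_distribution(Q_DS)

for name, pi in (("ss", pi_ss), ("ds", pi_ds)):
    shown = ", ".join(f"{n}={100 * p:.1f}%" for n, p in zip(NUCLEOTIDES, pi))
    print(f"{name}: {shown}")
```

With the Table 3 values, the implied equilibrium for Qss is dominated by A (close to 48%) with C lowest (close to 12%), while for Qds G is the most frequent nucleotide and A the least frequent. Both are near the observed averages in Table 2, consistent with the equilibrium interpretation given in the text.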

Discussion

It is known that the RNA genomes of retroviruses do not use an equal portion of the four possible nucleotides, the HIV-1 genome being particularly A-rich (36.2%) and C-poor (17.6%). We have now evaluated these biases with respect to the ss and ds nature of the nucleotides in the viral RNA genome. We document a strikingly different nucleotide signature for the ss and ds regions. The bias is put to the extreme for the ss regions (47.5% A, 21.3% U, 19.2% G and 11.9% C) and approaches a more neutral nucleotide composition for the ds regions (19.9% A, 23.6% U, 30.7% G and 25.8% C). We subsequently show that distinct mutational patterns can be observed in these two regions that will result in the maintenance of the typical nucleotide composition of the ss and ds regions. The paired/unpaired status of a nucleotide in a viral RNA structure can have several biological effects. For instance, chemical and Apobec 3G-mediated nucleotide modification affects ss RNA more than ds RNA. Error rates of the HIV-1 Reverse Transcriptase differ by template structure, being higher for ss than ds RNA. The biology of an RNA molecule is obviously determined by properties other than the ss/ds nature. The protein coding capacity dictates the selection of certain strings of nucleotides to form the required codons. In protein-coding sequences, which concern nearly the entire HIV genome, shifts in codon bias are restricted by the availability of cellular aminoacyl-tRNAs, overlapping reading frames (tat, rev and env) and overlapping regulatory sequences (e.g., nef overlaps with the 3′ Long-terminal Repeat). Indeed, the viral genome is riddled with specific sequence elements that control RNA splicing and many other processes such as RNA packaging into virion particles. 
Despite these multiple constraints, we disclosed a relatively simple pattern of biased nucleotide composition that is highly related to the base paired structure of the RNA molecule: excessive A-usage and C-restriction in the ss domains. The molecular mechanism responsible for the creation of this typical A-rich genome configuration remains unknown, but the new findings do specify our thoughts on the possible evolutionary events. A priori, two possible scenarios can be envisaged that relate to the two independent steps of evolution: mutation and selection. The A-bias might arise through a preferred mutational activity and/or evolutionary selection. According to the first scenario, the generation of an A-rich genome may be caused by an enzymatic property of the error-prone Reverse Transcriptase enzyme or cellular editing activities encoded by the Apobec functions, which may induce G-to-A hypermutation in HIV sequences (ref. 5). The new finding of clustering of A nucleotides in ss regions of the HIV-1 RNA genome does not support these mutational scenarios as a driving force for the acquisition of A-richness, as this would create a ubiquitously A-rich genome. Of course, we cannot exclude a mutational activity that is selective for ss regions. According to the second scenario, HIV-1 and other lentiviruses have become A-rich (and C-poor) over evolutionary times by selective pressure. It is currently unknown what purpose is served by the strikingly differential base content of the HIV-1 RNA genome, but the new finding that excessive A usage is restricted to the ss domains does support this scenario and further specifies the typical lentiviral genome requirements. Our favorite suggestion would be that an RNA genome with A-rich ss domains provides a molecular signature that is recognized during virus replication. 
This recognition could occur in the context of the virus replication cycle, e.g., in selective packaging of this RNA molecule into virion particles amidst an excess of other transcripts. Alternatively, this recognition could occur in the context of the virus-host interplay, e.g., in recognition of the invading RNA by cellular factors of the innate immune system. The virus may have adopted a particular genome architecture to adapt to cellular defense mechanisms. Interestingly, gag, pol and env transcripts lose the ability to induce type 1 interferon responses upon “translational optimization” of the codons. A more accurate description of this lentiviral RNA structure by biophysical means, 3D-modeling and functional studies, e.g., binding studies with candidate viral or cellular proteins, should help to unravel the underlying biological meaning of this particular RNA genome architecture.

Materials and Methods

The RNA sequence of HIV-1 isolate NL4-3, belonging to subtype B, its structure, SHAPE reactivity data and base pairing probabilities were taken from Watts et al. The Los Alamos HIV database (www.hiv.lanl.gov) provided aligned genomes of HIV-1 subtype B isolates (year 2010, 448 genomes, no recombinants). The NL4-3 RNA sequence, including its single-stranded (ss) or double-stranded (ds) designation for each nucleotide position, was manually added to this alignment. All nucleotides involved in base pairing (regular Watson-Crick and G-U/U-G pairs) were scored as ds, and unpaired nucleotides (interhelical segments, hairpin loops and internal loops, bulges) as ss. Subsequently, a bipartition was created guided by the ss or ds designation under stringent preservation of aligned nucleotide positions. Nucleotide frequency estimates and tree building were performed using MEGA5. The unrestricted model 8 of the BASEML module of PAML V4.4 was used to estimate the mutational rate parameters of U→C, U→A, U→G, C→U, C→A, C→G, A→U, A→C, A→G, G→U and G→C relative to the fixed value of 1 for G→A by means of maximum likelihood (ML) iteration. The pattern of nucleotide substitution is presented as a matrix (Q) specifying relative rates multiplied by a constant so that the average rate is made equal to 1 when the process is in equilibrium (see the PAML manual for details). In view of the parameter richness of this model 8, both ss and ds alignments were analyzed in 5 portions of 80 isolates without the final 48 isolates of the original alignment. Arithmetic averaging was applied to generate two “consensus” Q matrices showing different nucleotide substitution patterns between paired and unpaired nucleotides in 401 HIV-1 subtype B RNAs (the source EXCEL sheet is available as ).
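
The two post-processing steps described above, arithmetic averaging of per-batch matrices and scaling so that the average rate equals 1 at equilibrium, can be sketched as follows. This is an illustration under assumptions rather than the PAML implementation: the two batch matrices are hypothetical, the helper names are invented, and the equilibrium frequencies are obtained by a simple iterative scheme. The average rate is taken as −Σ πᵢ·qᵢᵢ, where π is the equilibrium distribution of Q.

```python
def fill_diagonal(q):
    """Return a copy whose diagonal makes every row sum to zero."""
    n = len(q)
    out = [row[:] for row in q]
    for i in range(n):
        out[i][i] = -sum(out[i][j] for j in range(n) if j != i)
    return out

def average_matrices(matrices):
    """Element-wise arithmetic mean of equally sized square matrices."""
    n, k = len(matrices[0]), len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)] for i in range(n)]

def stationary_distribution(q, step=0.1, iterations=5000):
    """Approximate pi with pi @ q = 0 by iterating pi <- pi + step * (pi @ q)."""
    n = len(q)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        flow = [sum(pi[i] * q[i][j] for i in range(n)) for j in range(n)]
        pi = [p + step * f for p, f in zip(pi, flow)]
    total = sum(pi)
    return [p / total for p in pi]

def mean_rate(q):
    """Average substitution rate at equilibrium: -sum(pi_i * q_ii)."""
    pi = stationary_distribution(q)
    return -sum(p * q[i][i] for i, p in enumerate(pi))

def normalize(q):
    """Scale q by a constant so that its average equilibrium rate equals 1."""
    rate = mean_rate(q)
    return [[x / rate for x in row] for row in q]

# Hypothetical per-batch estimates (order A, U, C, G); G->A is fixed at 1.
batch_1 = fill_diagonal([[0, 0.1, 0.2, 0.5], [0.1, 0, 0.6, 0.2],
                         [0.5, 1.0, 0, 0.2], [1.0, 0.1, 0.1, 0]])
batch_2 = fill_diagonal([[0, 0.2, 0.2, 0.4], [0.2, 0, 0.5, 0.2],
                         [0.6, 0.9, 0, 0.2], [1.0, 0.1, 0.2, 0]])

consensus = normalize(average_matrices([batch_1, batch_2]))
print(f"average rate after scaling: {mean_rate(consensus):.6f}")
```

Scaling by a constant changes the rates but not the equilibrium frequencies, so the consensus matrix keeps the compositional signature of the averaged estimates.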

28 in total

1. The leader of the HIV-1 RNA genome forms a compactly folded tertiary structure.

Authors: B Berkhout; J L van Wamel
Journal: RNA Date: 2000-02 Impact factor: 4.942

2. The extent of codon usage bias in human RNA viruses and its evolutionary origin.

Authors: Gareth M Jenkins; Edward C Holmes
Journal: Virus Res Date: 2003-03 Impact factor: 3.303

3. Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization.

Authors: David H Mathews
Journal: RNA Date: 2004-08 Impact factor: 4.942

4. Single-strand specificity of APOBEC3G accounts for minus-strand deamination of the HIV genome.

Authors: Qin Yu; Renate König; Satish Pillai; Kristopher Chiles; Mary Kearney; Sarah Palmer; Douglas Richman; John M Coffin; Nathaniel R Landau
Journal: Nat Struct Mol Biol Date: 2004-04-18 Impact factor: 15.369

5. G-->A hypermutation of the human immunodeficiency virus type 1 genome: evidence for dCTP pool imbalance during reverse transcription.

Authors: J P Vartanian; A Meyerhans; M Sala; S Wain-Hobson
Journal: Proc Natl Acad Sci U S A Date: 1994-04-12 Impact factor: 11.205

6. Unusual codon usage of HIV.

Authors: J Kypr; J Mrázek
Journal: Nature Date: 1987 May 7-13 Impact factor: 49.962

7. Purine and pyrimidine metabolism in human T lymphocytes. Regulation of deoxyribonucleotide metabolism.

Authors: A Cohen; J Barankiewicz; H M Lederman; E W Gelfand
Journal: J Biol Chem Date: 1983-10-25 Impact factor: 5.157

8. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure.

Authors: David H Mathews; Matthew D Disney; Jessica L Childs; Susan J Schroeder; Michael Zuker; Douglas H Turner
Journal: Proc Natl Acad Sci U S A Date: 2004-05-03 Impact factor: 11.205

9. Mutagenicity and pausing of HIV reverse transcriptase during HIV plus-strand DNA synthesis.

Authors: J Ji; J S Hoffmann; L Loeb
Journal: Nucleic Acids Res Date: 1994-01-11 Impact factor: 16.971

Review 10. The biased nucleotide composition of the HIV genome: a constant factor in a highly variable virus.

Authors: Antoinette C van der Kuyl; Ben Berkhout
Journal: Retrovirology Date: 2012-11-06 Impact factor: 4.602
