Stochastic simulations suggest that HIV-1 survives close to its error threshold.

Kushal Tripathi, Rajesh Balagam, Nisheeth K Vishnoi, Narendra M Dixit.

Abstract

The use of mutagenic drugs to drive HIV-1 past its error threshold presents a novel intervention strategy, as suggested by the quasispecies theory, that may be less susceptible to failure via viral mutation-induced emergence of drug resistance than current strategies. The error threshold of HIV-1, μc, however, is not known. Applying the quasispecies theory to determine μc poses significant challenges: whereas the quasispecies theory considers the asexual reproduction of an infinitely large population of haploid individuals, HIV-1 is diploid, undergoes recombination, and is estimated to have a small effective population size in vivo. We performed population genetics-based stochastic simulations of the within-host evolution of HIV-1 and estimated the structure of the HIV-1 quasispecies and μc. We found that with small mutation rates, the quasispecies was dominated by genomes with few mutations. Upon increasing the mutation rate, a sharp error catastrophe occurred where the quasispecies became delocalized in sequence space. Using parameter values that quantitatively captured data of viral diversification in HIV-1 patients, we estimated μc to be 7×10⁻⁵–1×10⁻⁴ substitutions/site/replication, ≈2–6-fold higher than the natural mutation rate of HIV-1, suggesting that HIV-1 survives close to its error threshold and may be readily susceptible to mutagenic drugs. This estimate depended only weakly on the within-host effective population size of HIV-1. With large population sizes and in the absence of recombination, our simulations converged to the quasispecies theory, bridging the gap between quasispecies theory and population genetics-based approaches to describing HIV-1 evolution. Further, μc increased with the recombination rate, rendering HIV-1 less susceptible to error catastrophe and thus revealing an added benefit of recombination to HIV-1. Our estimate of μc may serve as a quantitative guideline for the use of mutagenic drugs against HIV-1.

Year: 2012 PMID: 23028282 PMCID: PMC3441496 DOI: 10.1371/journal.pcbi.1002684

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

Introduction

The high mutation rate of HIV-1, coupled with its massive turnover rate in vivo, results in the continuous generation of mutant viral genomes that are resistant to administered drugs and can evade host immune responses [1], [2]. The design of drugs and vaccines that exhibit lasting activity against HIV-1 has remained a challenge [3]–[6]. A promising strategy to overcome this challenge has emerged from insights into viral evolution gained from the molecular quasispecies theory [7], [8]. The theory predicts that a collection of closely related but distinct genomes, called the quasispecies, exists in an infected individual when the viral mutation rate is small. When the mutation rate is increased beyond a critical value, called the error threshold, the quasispecies delocalizes in sequence space, inducing a severe loss of genetic information (a phenomenon termed error catastrophe) and compromising the viability of the viral population. It is therefore widely believed that viral mutation rates may have been evolutionarily optimized to lie close to but below their error thresholds, so that viral diversity, and hence adaptability, is maximized while genomic identity is maintained [9]–[11]. Consequently, a small increase in the viral mutation rate may trigger an error catastrophe. In accordance, a 4-fold increase in the mutation rate induced a dramatic 70% loss of poliovirus infectivity in vitro [9]. Chemical mutagens have been employed successfully to enhance the mutation rates of a host of other viruses [10]–[13], including HIV-1 [14]–[17]. An HIV-1 mutagen is currently in clinical trials [18]. Identification of the host restriction factor APOBEC3G (A3G) has suggested that mutagenesis might also be a natural antiviral defence mechanism (reviewed in [19], [20]). A3G (and, to a smaller extent, APOBEC3F) induces G-to-A hypermutations in HIV-1, which, when unchecked, can severely compromise the viability of HIV-1.
Interestingly, HIV-1 appears to have evolved a strategy to resist A3G: the HIV-1 protein Vif targets A3G for proteasomal degradation and suppresses its mutagenic activity. Vif thus presents a novel drug target. Inhibiting Vif may enable A3G to exert mutagenic activity adequate to compromise HIV-1. Indeed, significant efforts are underway to develop potent HIV-1 Vif inhibitors [21]. The use of mutagenesis as an antiviral strategy requires caution, however, because increasing the mutation rate to values that remain below the error threshold could prove counterproductive. The quasispecies theory predicts that such a suboptimal increase in the mutation rate would increase viral diversity without necessarily causing a substantial loss of genetic information, which in turn may facilitate the emergence of mutant genomes resistant to drugs and/or host immune responses [22]. The mutagenic activity of drugs and of host factors like A3G is dose-dependent [9], [17], [23]. It is therefore important to identify the minimum exposure to mutagenic drugs that would ensure that the error threshold of HIV-1 is crossed. The error threshold of HIV-1, however, is not known. Translating the predictions of the quasispecies theory to HIV-1 has remained a challenge: the theory considers the asexual reproduction of a haploid organism with an infinitely large population size, whereas HIV-1 is diploid, undergoes recombination [24]–[27], and is estimated to have a small effective population size in vivo, ∼10²–10⁵ cells [28]–[37]. Several studies have extended the quasispecies theory to account for the diploid nature of HIV-1 and recombination [38]–[50]. The small effective population size of HIV-1 in vivo, however, renders the deterministic formalism of the quasispecies theory inadequate. Population genetics-based stochastic simulations have been used as an alternative [37], [40], [41], [47], [49], [51]–[53].
Such simulations often make significant departures from the quasispecies theory that may render an error catastrophe untenable. For instance, a sharp error catastrophe may not occur with certain fitness landscapes [54]–[58]. Further, in the large population size limit, the simulations may not converge to the predictions of the quasispecies theory [59]. Indeed, whether population genetics- or quasispecies theory-based approaches are more appropriate for describing viral evolution has been the subject of an ongoing debate [56]. We have recently developed stochastic simulations of HIV-1 evolution in vivo that incorporate key aspects of the HIV-1 lifecycle and the underlying evolutionary forces, namely, mutation, multiple infections of cells, recombination, fitness selection using a landscape representative of HIV-1, and random genetic drift [37], [47], [49]. The simulations quantitatively described data of the evolution of viral diversity and divergence in HIV-1 infected individuals over several years following seroconversion, indicating that the simulations faithfully mimicked HIV-1 evolution in vivo [37]. Here, we applied the simulations to determine the structure of the HIV-1 quasispecies and estimate its error threshold. In the limit of large population sizes and in the absence of recombination, our simulations converged to the quasispecies theory, thus bridging the gap between population genetics- and quasispecies theory-based approaches to describing viral evolution and suggesting the existence of an error threshold for HIV-1. We estimated the error threshold of HIV-1 to be ∼2–6-fold higher than its natural mutation rate. HIV-1 thus appears to survive close to its error threshold and may be readily susceptible to mutagenic drugs.

Results

Simulations of the within-host evolution of HIV-1

We performed simulations as follows. Uninfected cells were synchronously infected by a pool of identical virions, each cell potentially infected by multiple virions. The viral genomic RNA in cells was then reverse transcribed to proviral DNA, a step that involved mutation and recombination. The proviral DNA was transcribed to viral genomic RNA, which was assorted into pairs and released as progeny virions. Virions from the pool of progeny virions were selected according to their relative fitness to infect a new generation of uninfected cells, and the cycle was repeated. After several thousand generations, and over several independent realizations, we determined the expected structure of the viral quasispecies at a given mutation rate. Simulations at different mutation rates allowed identification of the error threshold. Details of the simulation procedure and the parameter values employed are presented in Methods.
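The generation loop described above can be sketched compactly. The sketch below is not the authors' code and makes several simplifying assumptions:

- Genomes are haploid, with one virion per cell, so the recombination and RNA-pairing steps are omitted.
- Each site mutates to one of the three other nucleotides with equal probability.
- Fitness follows a hypothetical exponential landscape, exp(−s·k).
- Selection and drift are combined into one step that resamples C progeny in proportion to fitness.

The function name `simulate_hamming_classes` and all default values are illustrative.

```python
import numpy as np


def simulate_hamming_classes(L=100, C=1000, mu=1e-3, generations=500,
                             s=0.05, seed=0):
    """Hypothetical sketch: Hamming-class frequencies after `generations`.

    Simplifications (not the published model): one haploid genome per cell,
    no recombination, and an assumed fitness landscape exp(-s * k).
    """
    rng = np.random.default_rng(seed)
    # Nucleotides are coded 0..3; the master (founder) sequence is all zeros.
    pop = np.zeros((C, L), dtype=np.int8)
    for _ in range(generations):
        # Mutation during reverse transcription: each site mutates with
        # probability mu to one of the three other nucleotides, chosen uniformly.
        hits = rng.random((C, L)) < mu
        shift = rng.integers(1, 4, size=(C, L), dtype=np.int8)
        pop = np.where(hits, (pop + shift) % 4, pop).astype(np.int8)
        # Selection and random drift: C progeny infect the next generation of
        # cells, each drawn with probability proportional to relative fitness.
        k = np.count_nonzero(pop, axis=1)
        w = np.exp(-s * k)
        pop = pop[rng.choice(C, size=C, p=w / w.sum())]
    k = np.count_nonzero(pop, axis=1)
    return np.bincount(k, minlength=L + 1) / C


freqs = simulate_hamming_classes(L=50, C=500, mu=2e-3, generations=200)
print("peak Hamming class:", int(freqs.argmax()))
```

Resampling a fixed number C of progeny each generation is what introduces random genetic drift. As C grows, this sampling noise shrinks, which is the regime in which the deterministic quasispecies theory applies.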

Evolution of genome frequencies and the viral quasispecies

We first present the evolution of the frequencies of genomes in different Hamming classes in one realization of our simulations (Fig. 1). Hamming class k contains genomes carrying k mutations with respect to the fittest, or master, sequence; thus, 0 ≤ k ≤ L, where L is the genome length. Without loss of generality, we let the fittest sequence be the founder sequence (Fig. S1). Initially, therefore, the distribution of genome frequencies was localized at Hamming class zero. As time (the number of generations) progressed, mutant genomes arose and higher Hamming classes became populated (Fig. 1A). The average number of mutations in the proviral pool gradually increased, and the peak of the frequency distribution shifted to higher Hamming classes. After a certain number of generations, here ∼500, the distribution became steady; no net shift occurred from generation 500 to 10000. Correspondingly, the Shannon entropy, S, rose from zero at the start and attained its steady value, ⟨S⟩ = …, by generation 500 (Fig. 1B). We averaged the above frequencies over the last 1500 generations and over several realizations of our simulations to obtain the expected frequency distribution at steady state. This distribution yielded the structure of the viral quasispecies (Fig. 1A).
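The two summary statistics used here can be computed from a population of sequences as sketched below. The entropy definition is an assumption made for illustration: per-site Shannon entropy in base 4, averaged over the L sites. It was chosen so that S = 0 for a clonal population and S = 1 when all four nucleotides are equally frequent at every site. The function names are hypothetical.

```python
import numpy as np


def hamming_class_frequencies(pop):
    """Fraction of genomes in each Hamming class k = 0..L.

    `pop` is an (N, L) integer array of nucleotides coded 0..3, and the master
    sequence is assumed to be all zeros.
    """
    n, L = pop.shape
    k = np.count_nonzero(pop, axis=1)
    return np.bincount(k, minlength=L + 1) / n


def normalized_entropy(pop):
    """Per-site Shannon entropy in base 4, averaged over sites (assumed definition).

    Equals 0 for a clonal population and 1 when all four nucleotides are
    equally frequent at every site.
    """
    n, L = pop.shape
    total = 0.0
    for site in range(L):
        p = np.bincount(pop[:, site], minlength=4) / n
        p = p[p > 0]  # 0 * log(0) is taken as 0
        total -= np.sum(p * np.log(p)) / np.log(4)
    return total / L


def time_average(snapshots):
    """Average per-generation vectors, e.g. over the last 1500 generations."""
    return np.mean(np.asarray(snapshots), axis=0)
```

A per-site entropy of 1 is consistent with, though weaker than, all genomes occurring with equal frequency, because it constrains only the single-site nucleotide frequencies.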

Figure 1

Viral genomic diversification in one realization of our simulations.

(A) The frequencies of proviral genomes in different Hamming classes at various times (generations) indicated, in one realization of our simulations with L = … nucleotides, C = … cells, μ = … substitutions/site/replication and M = … infections/cell. Other parameters are given in Methods. The quasispecies (thick black line) is the average frequency distribution over the last 1500 generations. (B) The corresponding evolution of the Shannon entropy, S (purple), and its mean over the last 1500 generations, ⟨S⟩ (black).

Error catastrophe

Upon increasing the mutation rate, μ, the quasispecies shifted to higher Hamming classes, indicating the increasing accumulation of mutations (Fig. 2A). The peak Hamming class (i.e., the Hamming class with the maximum frequency) shifted gradually from k = … to k = … as μ increased from … to … substitutions/site/replication. (Note that L = … nucleotides in Fig. 2A.) At this point, a small further increase in μ, to … substitutions/site/replication, produced a remarkable jump in the peak Hamming class, to k = …. Subsequent increases in μ again caused only gradual shifts in the peak Hamming class. The jump was more dramatic with larger genome lengths: with L = … nucleotides, the peak Hamming class jumped from k = … to k = … when μ increased from … to … substitutions/site/replication (Fig. 2B). Correspondingly, ⟨S⟩ jumped from 0.24 to ≈1 as μ increased from … to … substitutions/site/replication (Fig. 2C). ⟨S⟩ ≈ 1 implies that all possible genomes occur with equal frequencies. Because each site can carry either the master nucleotide or one of the three others, the number of distinct genomes in Hamming class k is $\binom{L}{k}3^k$, out of $4^L$ genomes in total. Thus, if all genomes occurred with equal likelihood, the Hamming class frequencies would follow $f_k = \binom{L}{k}3^k/4^L$. Indeed, we found that the quasispecies structure obtained by our simulations was identical to this distribution of Hamming class frequencies (Fig. 2B inset), confirming that all genomes occurred with equal likelihood when ⟨S⟩ ≈ 1. The jump in ⟨S⟩ thus marked the transition to error catastrophe.
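The equal-likelihood distribution is a binomial distribution with success probability 3/4, so its peak lies near k = 3L/4. The sketch below assumes a four-letter alphabet and uses exact integer arithmetic for the genome counts; the function name `equal_likelihood_classes` is hypothetical.

```python
from math import comb


def equal_likelihood_classes(L, alphabet=4):
    """Hamming-class frequencies when all alphabet**L genomes are equally likely.

    Class k holds comb(L, k) * (alphabet - 1)**k distinct genomes.
    """
    total = alphabet ** L
    return [comb(L, k) * (alphabet - 1) ** k / total for k in range(L + 1)]


f = equal_likelihood_classes(100)
peak = max(range(len(f)), key=f.__getitem__)
print(peak)  # 75, i.e. three quarters of the sites differ from the master
```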

Figure 2

Structure of the quasispecies and the error threshold.

The structure of the quasispecies at different values of μ indicated (substitutions/site/replication), with (A) L = … and (B) L = … nucleotides. Inset in (B) compares the quasispecies structure predicted by our simulations for μ = … substitutions/site/replication (line) with that expected when all genomes occur with equal likelihood (i.e., $f_k = \binom{L}{k}3^k/4^L$; see text) (symbols). (C) The mean Shannon entropy, ⟨S⟩, corresponding to the quasispecies in (A) and (B). Other parameters are the same as in Fig. 1.

Error threshold

The transition from low to high ⟨S⟩ occurred over a narrow range of values of μ. For μ within this range, the quasispecies structure was bimodal because error catastrophe occurred in some realizations and not in others, depending on the stochastic variations encountered. For illustration, we present several independent realizations of our simulations at three values of μ, namely, …, … and … substitutions/site/replication (Fig. 3). The first is well below the transition from low to high ⟨S⟩ in Fig. 2B, the second is in the transition region, and the third is well above the transition. With μ = … substitutions/site/replication, S in each realization rose from zero and reached … in … generations (Fig. 3A), with little variation between realizations. With μ = … substitutions/site/replication, S rose from zero and reached … in … generations, again with little variation between realizations (Fig. 3C). With μ = … substitutions/site/replication, however, we found substantial variation between realizations (Fig. 3B). S rose from zero and reached a plateau value of … in … generations. In some realizations, S remained at this value until the end, i.e., 10000 generations. In others, at an intermediate time that differed from realization to realization, S rose sharply from its plateau value and reached 1, where it remained thereafter. Averaging the Hamming class frequencies over realizations thus yielded the bimodal quasispecies structure observed for μ = … substitutions/site/replication (Fig. 2B). Realizations in which S remained at the plateau yielded the peak at Hamming class k = …, and those in which S reached 1 yielded the peak at k = ….

Figure 3

Stochastic evolution near the error threshold.

Time evolution of the Shannon entropy, S, in several independent realizations of our simulations at three values of μ, namely, (A) …, (B) … and (C) … substitutions/site/replication. The other parameters are the same as in Fig. 2B. The different realizations in (A) and (C) nearly overlap and are indistinguishable.

Our aim was to identify the smallest value of μ at which error catastrophe was ensured. We found that when ⟨S⟩ exceeded …, stochastic variations became insignificant and error catastrophe occurred nearly invariably. We therefore identified the smallest μ for which ⟨S⟩ exceeded … as the error threshold, μc. Thus, μc = … and … substitutions/site/replication for L = … and … nucleotides, respectively, in Fig. 2C.
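The rule for identifying μc amounts to a scan over the simulated mutation rates. In the sketch below, `cutoff` is a hypothetical stand-in for the criterion on ⟨S⟩, whose numerical value is not reproduced in this text. The scan values are made up for illustration, and the function name is hypothetical.

```python
def error_threshold(mu_values, mean_entropies, cutoff):
    """Smallest scanned mutation rate whose mean steady-state entropy reaches `cutoff`.

    `cutoff` is a hypothetical parameter standing in for the criterion on <S>.
    Returns None if no scanned rate reaches it.
    """
    for mu, s in sorted(zip(mu_values, mean_entropies)):
        if s >= cutoff:
            return mu
    return None


# Illustrative (made-up) scan, not data from the study.
mus = [1e-4, 2e-4, 3e-4, 4e-4, 5e-4]
S = [0.20, 0.24, 0.60, 0.97, 0.99]
print(error_threshold(mus, S, cutoff=0.95))  # 0.0004
```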

Influence of model parameters on the error threshold

Genome length

Upon increasing L, the transition to error catastrophe became sharper and occurred at lower values of μ (Fig. 4). For instance, μc = … substitutions/site/replication when L = … nucleotides, and μc = … substitutions/site/replication when L = … nucleotides (Fig. 4A). Further, log μc decreased linearly with log L with a slope of −1.07 (Fig. 4B), indicating that μc is approximately proportional to 1/L. These predictions, that the transition sharpens with increasing L and that μc ∝ 1/L, are in agreement with the quasispecies theory [7], [8].
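The proportionality μc ∝ 1/L corresponds to a slope of −1 on logarithmic axes. Below is a minimal sketch of such a fit, using NumPy's `polyfit` on log-transformed values. The data are synthetic, not the study's estimates, and the function name `loglog_fit` is hypothetical.

```python
import numpy as np


def loglog_fit(L_values, mu_c_values):
    """Fit log(mu_c) = a + b * log(L); return the slope b and prefactor exp(a)."""
    b, a = np.polyfit(np.log(L_values), np.log(mu_c_values), 1)
    return b, np.exp(a)


# Synthetic check (not study data): mu_c = 0.02 / L gives a slope of -1.
L = np.array([100, 200, 500, 1000])
slope, prefactor = loglog_fit(L, 0.02 / L)
print(round(float(slope), 3), round(float(prefactor), 4))  # -1.0 0.02
```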

Figure 4

Dependence of the error threshold on the genome length.

(A) The mean steady-state Shannon entropy, ⟨S⟩, as a function of the mutation rate, μ, for different genome lengths, L (nucleotides), indicated. Other parameters are the same as in Fig. 1. The corresponding structures of the quasispecies are shown in Fig. S2. (B) The resulting dependence of the error threshold, μc, on L. Inset in (B) shows a linear fit (line) to the data (symbols) yielding a slope of −1.07.

Population size

Increasing the cell population, C, increased μc (Fig. 5). For instance, μc increased from … to … substitutions/site/replication as C rose from 100 to 10000 cells (Fig. 5A). The dependence of μc on C eventually weakened, and μc appeared to approach a plateau asymptotically as C increased (Fig. 5B). Further, μc decreased linearly as … increased (Fig. 5B inset), as suggested by previous extensions of the quasispecies theory to finite populations [59], [60]. A fit to the data in Fig. 5B yielded … (Fig. 5B inset). Extrapolation provided an estimate of μc for C → ∞, which for the parameters in Fig. 5B was … substitutions/site/replication.
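Extrapolating to an infinite population amounts to fitting μc linearly against a transformed population size, x(C), that vanishes as C → ∞, and then reading off the intercept. The transform used in the fit of Fig. 5B is not recoverable from this text. The sketch below therefore takes it as a parameter and illustrates it with 1/√C purely as an assumption, on synthetic data. The function name is hypothetical.

```python
import numpy as np


def extrapolate_to_infinite_population(C_values, mu_c_values, x_of_C):
    """Fit mu_c = mu_inf + b * x(C) and return (mu_inf, b); mu_inf is the x -> 0 limit.

    `x_of_C` is a hypothetical transform that vanishes as C -> infinity
    (for example 1 / sqrt(C)); the study's transform is not given here.
    """
    x = x_of_C(np.asarray(C_values, dtype=float))
    b, mu_inf = np.polyfit(x, np.asarray(mu_c_values, dtype=float), 1)
    return mu_inf, b


# Synthetic check (made-up numbers): mu_c = 1e-4 - 5e-4 / sqrt(C).
C = np.array([100, 1000, 5000, 10000])
mu_c = 1e-4 - 5e-4 / np.sqrt(C)
mu_inf, slope = extrapolate_to_infinite_population(C, mu_c, lambda c: 1 / np.sqrt(c))
```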

Figure 5

Dependence of the error threshold on the population size.

(A) The mean steady-state Shannon entropy, ⟨S⟩, as a function of the mutation rate, μ, for different population sizes, C (cells), indicated. Other parameters are the same as in Fig. 1. The corresponding structures of the quasispecies are shown in Fig. S3. (B) The resulting dependence of the error threshold, μc, on C. Inset in (B) shows a linear fit (line) to the data (symbols) yielding ….

Recombination and multiple infections of cells

Increasing the recombination rate, ρ, or the number of infections per cell, M, also increased μc (Fig. 6). μc rose from … to … substitutions/site/replication as ρ increased from zero to … crossovers/site/replication when M = … infections/cell (Fig. 6A and C). Similarly, μc rose from … to … substitutions/site/replication as M increased from … infection/cell (drawn from a distribution with few multiple infections) to … infections/cell when ρ = … crossovers/site/replication (Fig. 6B, C, and D). At fixed μ, the quasispecies shifted to smaller peak Hamming classes and widened upon increasing ρ (Fig. 6C inset). This is consistent with our previous observations that recombination increased the mean fitness and the diversity of the quasispecies when C was small [47] (also see Discussion). Consequently, higher mutation rates were necessary to induce an error catastrophe as ρ increased. Increasing M effectively increases recombination [51], [61] and hence also resulted in an increase in μc.

Figure 6

Dependence of the error threshold on the recombination rate.

The mean steady-state Shannon entropy, ⟨S⟩, as a function of the mutation rate, μ, for different recombination rates, ρ, indicated (crossovers/site/replication), with (A) M = … infections/cell and (B) M determined from a distribution with few multiple infections (Methods). Here, L = … nucleotides, C = … cells, and the other parameters are the same as in Fig. 1. The corresponding structures of the quasispecies are shown in Fig. S4 and Fig. S5, respectively. (C) and (D) The resulting dependence of the error threshold, μc, on ρ in (A) and (B), respectively. Inset in (C) shows the quasispecies for different ρ (crossovers/site/replication) indicated, with μ = … substitutions/site/replication and M = … infections/cell.

Mutation and genome sequence composition

The HIV-1 genome is known to be A-rich [62]. Moreover, not all mutations occur at the same rate; G-to-A transitions are the most frequent [63]. We therefore performed simulations with a founder sequence containing nucleotides at frequencies corresponding to those in HIV-1 and with nucleotide-specific transition rates mimicking HIV-1 (Methods). When all nucleotides were equally represented but mutations occurred in a nucleotide-specific manner, we found that μc = … substitutions/site/replication (Fig. S6). This is close to μc = … substitutions/site/replication, obtained when mutations occurred in a nucleotide-independent manner (Fig. 2). Further, μc = … substitutions/site/replication when the founder sequence mimicked the HIV-1 nucleotide frequencies and mutations occurred in a nucleotide-specific manner (Fig. S6). Thus, μc did not depend significantly on the nucleotide composition of the founder sequence (as also observed in Fig. S1). Nor did it depend significantly on whether mutations occurred in a nucleotide-independent manner or in the observed nucleotide-specific manner.
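Nucleotide-specific mutation can be represented by a 4 × 4 matrix of per-replication substitution probabilities. The sketch below is illustrative: `hypothetical_rate_matrix` uses uniform rates of μ/3 with the G→A entry scaled by an assumed factor. These are not the HIV-1 rates used in Methods, and both function names are hypothetical.

```python
import numpy as np

NUC = "ACGT"


def hypothetical_rate_matrix(mu, g_to_a_factor=4.0):
    """rates[i, j]: probability per replication that nucleotide i is copied as j.

    Illustrative only: uniform off-diagonal rates of mu / 3, with the G->A entry
    scaled by an assumed factor; these are not the study's rates.
    """
    rates = np.full((4, 4), mu / 3)
    np.fill_diagonal(rates, 0.0)
    rates[NUC.index("G"), NUC.index("A")] *= g_to_a_factor
    return rates


def mutate(seq, rates, rng):
    """Copy `seq` once, applying nucleotide-specific substitution probabilities."""
    out = []
    for base in seq:
        i = NUC.index(base)
        p = rates[i].copy()
        p[i] = 1.0 - (p.sum() - p[i])  # probability of copying the base correctly
        out.append(NUC[rng.choice(4, p=p)])
    return "".join(out)


rng = np.random.default_rng(1)
founder = "ACGT" * 25
child = mutate(founder, hypothetical_rate_matrix(1e-2), rng)
```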

Fitness landscape

The above simulations employed a fitness landscape derived from data [64] of in vitro replicative fitness assays (see [37] and Methods). To examine whether our predictions were specific to this landscape, we performed simulations with several alternative theoretical landscapes. First, we modified our present landscape to allow genomes to have zero fitness. We set to zero the fitness of all genomes whose fitness in the above landscape fell below a particular threshold, … (Methods). The resulting landscape is similar to the truncated landscape employed previously (e.g., see [65]). The minimum fitness in the above landscape was 0.24. We performed simulations with a threshold of … and L = … nucleotides and found little variation in μc from the above estimate (Fig. S7A) (also see Discussion). In a previous study, we found that an exponential fitness landscape, which assigns a fixed fitness penalty for every mutation (see below), does not agree with patient data and thus may not be representative of HIV-1 in vivo [37]. The complex fitness interactions of HIV-1 mutations unraveled recently [66] have been characterized using a fitness landscape that accounts sequentially for the effects of individual mutations, interactions between pairs of mutations, between triplets of mutations, and so on [67]. This is akin to the spin glass-based and other correlated landscapes employed earlier [60]. We found that, under limiting conditions, such a landscape reduces to a polynomial in the Hamming distance of genomes from the master sequence (Methods). We identified the coefficients of the polynomial by fitting mean fitness data (Fig. S7B inset) and performed simulations with the resulting best-fit polynomial landscape. We found μc = … substitutions/site/replication (Fig. S7B), close to the μc = … substitutions/site/replication obtained with the landscape above (Fig. 2). These modifications to the fitness landscape thus had only a minor influence on μc.
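The two landscape modifications described above can be sketched with NumPy. The first is truncation; the second is a polynomial in the Hamming distance, fitted to mean fitness data. The threshold, polynomial degree, and data below are hypothetical, as are the function names. Only the operations themselves (zeroing fitness below a threshold, and least-squares polynomial fitting) mirror the text.

```python
import numpy as np


def truncate_landscape(fitness, threshold):
    """Set to zero (replication-incompetent) every fitness value below `threshold`."""
    fitness = np.asarray(fitness, dtype=float)
    return np.where(fitness < threshold, 0.0, fitness)


def fit_polynomial_landscape(k, mean_fitness, degree):
    """Least-squares polynomial in the Hamming distance k; returns a callable f(k)."""
    return np.poly1d(np.polyfit(k, mean_fitness, degree))


# Synthetic illustration (not the study's data or coefficients).
k = np.arange(0, 11)
observed = 1.0 - 0.08 * k + 0.002 * k**2
f = fit_polynomial_landscape(k, observed, degree=2)
print(truncate_landscape(f(k), threshold=0.5))
```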

Comparison with quasispecies theory

To test whether our simulations converged to the quasispecies theory, we performed simulations with parameter values that mimic the assumptions of the quasispecies theory. We let M = 1 infection/cell and ρ = 0 crossovers/site/replication to represent the asexual reproduction of effectively haploid individuals. We chose a large population size, C = … cells, and a small genome length, L = … nucleotides, to approximate the infinite population size limit (…). We employed the single peak fitness landscape typically used in calculations of the quasispecies theory. We implemented it by letting viral production be … virions/cell for cells infected with the master sequence and 1 virion/cell for all other cells, and then selecting virions with equal probability from the viral pool. We also solved the equations of the quasispecies theory using this fitness landscape (Methods). Remarkably, our simulations were in excellent agreement with the quasispecies theory for a wide range of mutation rates (Fig. 7A).
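For a fitness landscape that depends only on the Hamming class, the quasispecies equations can be reduced to L + 1 classes and solved by power iteration, as sketched below. This is a standard reduction written for illustration. It assumes uniform mutation to the three other nucleotides and selection before mutation in each generation, and it is not necessarily the formulation given in Methods. The single-peak advantage σ = 10, L = 20, and the function names are hypothetical.

```python
import numpy as np
from math import comb


def binomial_pmf(n, p):
    return np.array([comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)])


def class_mutation_matrix(L, mu):
    """Q[k, j]: probability that copying a class-k genome yields a class-j genome.

    Assumes each site mutates with probability mu to one of the 3 other
    nucleotides uniformly, so a mutated site reverts to the master with mu / 3.
    """
    Q = np.zeros((L + 1, L + 1))
    for k in range(L + 1):
        back = binomial_pmf(k, mu / 3)   # reversions among the k mutated sites
        fwd = binomial_pmf(L - k, mu)    # new mutations among the L - k master sites
        for b, pb in enumerate(back):
            for f, pf in enumerate(fwd):
                Q[k, k - b + f] += pb * pf
    return Q


def quasispecies(fitness, mu, iterations=5000):
    """Steady-state class frequencies of the map x <- normalize((x * fitness) @ Q)."""
    fitness = np.asarray(fitness, dtype=float)
    Q = class_mutation_matrix(len(fitness) - 1, mu)
    x = np.zeros(len(fitness))
    x[0] = 1.0
    for _ in range(iterations):
        x = (x * fitness) @ Q  # selection, then mutation during replication
        x /= x.sum()
    return x


L, sigma = 20, 10.0  # hypothetical genome length and master-sequence advantage
single_peak = np.r_[sigma, np.ones(L)]
low = quasispecies(single_peak, 1e-3)
high = quasispecies(single_peak, 0.5)
```

Well below the error threshold, the master class dominates. Well above it, the steady state approaches the binomial equal-likelihood distribution, whose peak for L = 20 is at k = 15.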

Figure 7

Comparisons of our simulations with the quasispecies theory.

Structure of the quasispecies for different values of μ (substitutions/site/replication) indicated, determined by our simulations (circles connected by lines) and by the quasispecies theory (pluses), for (A) the isolated peak fitness landscape, (B) the exponential landscape with s = 0.01, and (C) the experimental landscape with d = 3.

To test the robustness of this agreement, we performed simulations with two other fitness landscapes. The first was an exponential landscape, in which the relative fitness declined nearly linearly (at rate s per mutation) with the number of mutations from the master sequence, k. The second was the experimental landscape above, rescaled to the smaller genome length. In both cases, we let viral production be … virions/cell in our simulations and selected virions in proportion to their relative fitness. Again, our simulations were in excellent agreement with solutions of the quasispecies theory using these fitness landscapes (Figs. 7B and C). Thus, with large population sizes, our simulations were in quantitative agreement with the quasispecies theory. With smaller population sizes, our simulations predicted trends that were consistent with previous finite population models of genomic evolution. Further, with parameter values representative of HIV-1 infection in vivo, we showed previously that our simulations quantitatively described patient data of the evolution of viral diversity and divergence over extended durations (∼10–12 years) [37], giving us confidence in our simulations. We therefore employed our simulations to estimate the error threshold of HIV-1.

Estimate of the error threshold of HIV-1

We performed simulations with parameter values that quantitatively mimic patient data of viral genomic diversification (Methods). We previously analyzed data of viral diversity and divergence from 9 patients [68]. With M = … infections/cell, following observations of Jung et al. [24], the best-fit values of C varied from 400–10000 cells across the patients, with a mean of … cells [37]. Accordingly, we performed simulations here with C = …, … and … cells. We found a sharp error catastrophe with μc = …, … and … substitutions/site/replication, respectively (Fig. 8A). A smaller frequency of multiple infections of cells, mimicking the observations of Josefsson et al. [69], was also able to capture the same patient data, with higher best-fit values of C [37]. In that case, except for one patient (Patient 11), for whom C was 10⁵ cells, the best-fit values of C were in the range of 1500–10000 cells. Recognizing that the dependence of μc on C was weak for large C, we performed simulations with C = …, … and … cells (5000 cells being the mean for the remaining 8 patients), using M drawn from a distribution mimicking the observations of Josefsson et al. We again found that a sharp error catastrophe occurred, with μc = …, … and … substitutions/site/replication for the three cases (Fig. 8B), close to the estimates above. The modest increase of μc with C again displayed the … dependence (…, Fig. 8B inset). It yielded μc = … substitutions/site/replication for C = … cells and μc = … substitutions/site/replication for C = …. Taken together, our simulations predict that HIV-1 undergoes a sharp error catastrophe and estimate the error threshold to be in the range 7×10⁻⁵–1×10⁻⁴ substitutions/site/replication.
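The distribution of multiple infections used for Fig. 8B (77% of cells singly, 19% doubly, and 4% triply infected, per its caption) implies a mean of 0.77·1 + 0.19·2 + 0.04·3 = 1.27 proviruses per infected cell. Below is a sketch of drawing M for each cell from this distribution. The function and constant names are hypothetical, and this is not necessarily how Methods implements the sampling.

```python
import numpy as np

# Fractions of singly, doubly and triply infected cells quoted in the Fig. 8 caption.
M_VALUES = np.array([1, 2, 3])
M_PROBS = np.array([0.77, 0.19, 0.04])


def sample_infections_per_cell(C, rng):
    """Draw the number of proviruses in each of C cells from the distribution above."""
    return rng.choice(M_VALUES, size=C, p=M_PROBS)


mean_M = float(M_VALUES @ M_PROBS)  # expected proviruses per infected cell
print(round(mean_M, 2))  # 1.27
```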

Figure 8

Estimates of the error threshold of HIV-1.

Mean steady-state Shannon entropy, ⟨S⟩, as a function of the mutation rate, μ, for different population sizes, C (cells), indicated, with (A) M = … infections/cell and (B) M determined from a distribution with few multiple infections of cells, where 77% of the cells were singly, 19% doubly, and 4% triply infected. Other parameters are as follows: L = … nucleotides; ρ = … crossovers/site/replication; … infectious progeny virions/cell; the fitness landscape …, where …, …, and … (Methods). The corresponding structures of the quasispecies are shown in Fig. S8 and Fig. S9, respectively. Inset in (B) shows a linear fit (line) to the data (symbols) yielding ….

Discussion

The success of mutagenic drugs against HIV-1 hinges on reliable estimates of the error threshold of HIV-1, which are currently lacking. The assumptions employed in the quasispecies theory render it inadequate for describing HIV-1 evolution. Here, we have employed population genetics-based simulations of HIV-1 evolution to examine the susceptibility of HIV-1 to mutation-driven error catastrophe. With these simulations, we found that HIV-1 experienced a sharp error catastrophe at a mutation rate of 7×10⁻⁵–1×10⁻⁴ substitutions/site/replication. Our simulations incorporated key evolutionary forces underlying the within-host genomic diversification of HIV-1 and were shown previously to agree with longitudinal patient data of viral diversity and divergence [37], giving us confidence in our estimate of the error threshold. The estimated error threshold is ∼2–6-fold higher than the natural mutation rate of HIV-1 in vivo, … substitutions/site/replication [63], [70], which suggests that HIV-1 exists close to its error threshold. The mutation rate of HIV-1 thus appears to be evolutionarily optimized to maximize diversity while retaining genomic identity. A relatively small (2–6-fold) increase in the mutation rate may thus drive HIV-1 past its error threshold, which provides a quantitative guideline for the use of mutagenic drugs. The quasispecies theory has provided remarkable insights into viral evolution and suggested new strategies of intervention [9], [71]–[74]. Yet, as Eigen himself recognized [75], its ability to describe viral evolution comprehensively is limited by its assumptions. These include an infinitely large population size, asexual reproduction of haploid organisms, and an isolated peak fitness landscape in which all mutants are equally less fit than the master sequence.
The last 40 years have seen significant efforts to relax these assumptions and tailor the quasispecies theory to specific organisms, especially HIV: several more complex and more realistic fitness landscapes have been employed [37], [49], [54], [55], [58], [60], [76]–[78]. Simultaneously, population genetics-based approaches, which naturally consider the stochastic effects associated with finite populations, have been developed [59], [60], [79]–[85]. These descriptions, however, while painting a more realistic picture of the organisms considered, often deviate markedly from the key predictions of the quasispecies theory. In particular, finite population models may not converge to the quasispecies theory in the infinite population limit [59], and with more complex fitness landscapes, a sharp error catastrophe may cease to occur [54]–[58]. Consequently, questions arise about the relative merits of the quasispecies theory and population genetics-based approaches for describing viral evolution (reviewed in [56]). Here, we showed that our simulations converge to the quasispecies theory in the large population size limit. This indicates that the quasispecies theory is not at odds with population genetics-based descriptions, at least for HIV-1. In a related study, convergence of similar population genetics-based descriptions to the quasispecies theory has been established formally [86]. Importantly, with a fitness landscape representative of HIV-1 [64] and other parameters that mimic patient data, our simulations predict that a sharp error threshold exists for HIV-1. In our simulations, the error threshold scaled linearly with …, where C is the population size of cells, in agreement with previous studies [59], [60]. We note that some studies using alternative simulation strategies found a linear scaling with … [80]. The origin of this discrepancy in the dependence of the error threshold on C remains to be established.
Nonetheless, the weak dependence of the error threshold on C implies that our estimate of the error threshold remains robust to any increase in the effective population size in vivo either due to inter-patient variations or due to uncertainties in the estimates of model parameters. We showed previously that estimates of the effective population size of HIV-1 in vivo were sensitive to the frequency of multiple infections of cells, M, and the recombination rate [37]. Few estimates of M in vivo are available. While one study of infected splenocytes in two patients found that most cells were multiply infected with a mean of 3–4 proviruses per cell [24], recent evidence from peripheral blood mononuclear cells of several acute and chronically infected individuals suggests that multiple infections of cells may be rare [69], and hence the influence of recombination weak [51], [61]. Using parameters corresponding to either observation, we found that our simulations captured patient data of viral diversification with appropriate values of C [37]. Using both combinations of M and C that matched patient data, we estimated the error threshold of HIV-1 here and found that the estimates were close, suggesting that uncertainties in the frequency of multiple infections did not significantly affect our estimate of the error threshold. The role of recombination in HIV-1 evolution has remained difficult to interpret [2], [39], [87]. Just as recombination can bring favorable mutations together, it can also drive favorable combinations of mutations apart, raising questions more generally about the evolutionary origins of the ubiquitously present recombination and sexual reproduction, often referred to as the paradox of sex [39], [88], [89]. The benefit of sex has recently been suggested to arise from the subtle interactions of random genetic drift, selection, and recombination in finite populations [90]. 
When the population size is small, negative linkage disequilibrium (D < 0) is generated by the Hill-Robertson effect [91]. Recombination lowers the magnitude of D, which, when D < 0, enhances diversity and favors selection [89], [92]–[94]. Indeed, our simulations showed that as the recombination rate increased, the quasispecies shifted to lower peak Hamming classes and spread wider, implying greater average fitness and diversity. In agreement, we showed previously that the mean fitness and diversity of the viral population increased with recombination when the population size was small [47]. An added advantage of recombination that we found here was that the error threshold also increased with recombination, rendering the quasispecies more resistant to mutation-driven loss of genetic information. In contrast, an earlier study found that recombination decreased the error threshold [38]. The latter study, however, considered an infinitely large population with a single peak landscape, which is expected to generate D > 0. Accordingly, the lowering of D by recombination decreases diversity and is therefore expected to lower the error threshold. Thus, the negative linkage disequilibrium (D < 0) generated by the Hill-Robertson effect underlies the enhancement of the error threshold due to recombination in our simulations. Given that host factors such as A3G combat HIV-1 by increasing the viral mutation rate [19], [20], recombination, in synergy with Vif-induced degradation of A3G, may serve to stall the onslaught of A3G and establish lasting infection. The population sizes we employed were obtained by fits of our simulations to patient data [37]. The census population size of HIV-1 is ∼10⁷–10⁸ infected cells [95]. Yet, the effective population sizes obtained by several independent studies are small and lie in the range of ∼10²–10⁵ cells (reviewed in [35]). 
The effective population size is defined as the size of the population in an idealized model of evolution that has the same population genetic properties as that of the natural population [96]. The reasons underlying the differences between the census and effective population sizes of HIV-1 remain to be established; bottlenecks introduced by the immune system and other selection pressures [36], asynchronous infections of cells [97], pseudohitchhiking [98], and metapopulation structure [99] may all contribute to the small effective population sizes estimated, but their roles in HIV-1 evolution are yet to be fully elucidated. We employed a fitness landscape that is a measure of the relative replicative ability of various HIV-1 mutants determined using in vitro assays [64]. The landscape suggests that the predominant fitness effects depend on the number and not on the specific combinations of mutations, allowing us to group genomes into Hamming classes [100]. Simpler fitness landscapes, such as multiplicative landscapes, were not compatible with patient data [37]. More comprehensive fitness interactions are beginning to be unraveled [66], [101]. The resulting fitness values [66] have been shown to be correlated with the viral load in vivo [102]. Under certain limiting conditions, we found that the latter interactions yielded a fitness landscape consistent with the landscape we employed above (Methods and Fig. S7). Further, our estimates of the error threshold were robust to minor variations in the fitness landscape. For instance, allowing lethal mutations using a truncated landscape, where genomes with fitness below a certain threshold were assumed replication incompetent, did not substantially alter the error threshold (Fig. S7). We recognize that lethal mutations can occur more frequently; for instance, 40% of random mutations in an RNA viral genome were found to be lethal [103]. 
Such a scenario is estimated to increase the error threshold for an infinitely large population size and a single peak fitness landscape by a factor of ∼5/3 [104], [105]. Understanding the influence of major variations in the fitness landscape is computationally prohibitive and awaits future studies. Finally, we recognize that we have assumed uniform recombination rates and either uniform or nucleotide-specific mutation rates across the HIV-1 genome, whereas mutation [63] and recombination hot-spots [106], [107] are known to exist within HIV-1. Estimation of the error threshold of HIV-1 from experimental studies of viral mutagenesis-induced loss of viral infectivity has not been possible because of several confounding effects. For instance, a 2–3-fold increase in the mutation rate obliterated HIV-1 infection in vitro [14], [15], in agreement with our present findings. The agreement, however, is not conclusive because establishing that the loss of infectivity in vitro is due to an error catastrophe is not straightforward. The loss of infectivity may be due to an error catastrophe, as demonstrated with poliovirus [9], but may also arise from other effects. At mutation rates above the natural mutation rate but below the error threshold, production of defective genomes may drain resources within cells, compromising the production of viable genomes and causing extinction of the viral population [13]. Thus, whether viral extinction necessarily implies crossing the error threshold remains unclear. Conversely, crossing the error threshold may not imply viral extinction; the latter may require crossing an alternative 'extinction' threshold, where each viral particle produces, on average, fewer than one progeny virion that infects a cell, akin to the epidemiological threshold for extinction of disease [108]. (Note that in our present simulations, infection was sustained by keeping the pool of infected cells constant.) 
Viral extinction may also be determined by the influence of mutations on protein stability and its impact on viability [109], [110]. Establishing which of these phenomena underlies the observed loss of viral infectivity in vitro remains a challenge. In addition, the dynamics of the transition to error catastrophe, which remain poorly characterized, are also important for mutagenic strategies targeting HIV-1. For instance, 9–24 serial passages were required for loss of viral infectivity in vitro [14]. In a recent clinical trial with an HIV-1 mutagen, no viral load decline was observed in patients following 124 days of treatment, although the mutational patterns were altered [18]. This absence of apparent antiviral activity was attributed to the lack of knowledge of both the level and the duration of drug exposure necessary to compromise the viability of HIV-1 [18], reiterating the importance of reliable estimates of the error threshold and of the timescales of the transition. Our estimate of the error threshold, together with the dose-response data of the drug, may help determine the level of drug necessary to induce an error catastrophe in HIV-1. Further, although we focused here on identifying the structure of the HIV-1 quasispecies and estimating its error threshold, our simulations present a framework for determining the time required to ensure completion of the transition to error catastrophe, thus elucidating guidelines for the duration of treatment with mutagenic drugs.

Methods

Simulation protocol

Creation of the viral pool

We represented an HIV-1 genome as a sequence of L nucleotides. We generated such a sequence with each nucleotide chosen randomly from A, G, C and U with equal probability or with probabilities representative of the nucleotide content of HIV-1 (see below). We let the resulting sequence be the master sequence and set its relative fitness to unity. We represented a virion by the pair of RNA genomes it contained. We let the initial pool of virions all carry the master sequence.
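The initialization described above can be sketched in Python as follows. This is a minimal illustration, not the authors' C++ program; the function names `make_master_sequence` and `initial_viral_pool`, and the pool size in the usage lines, are hypothetical. A virion is modelled as a tuple of two RNA sequences, matching the pair-of-genomes representation in the text.

```python
import random

NUCLEOTIDES = "AGCU"


def make_master_sequence(length, weights=None, rng=random):
    """Draw a random master sequence of `length` nucleotides.

    weights=None chooses A, G, C and U with equal probability; otherwise
    `weights` gives their relative probabilities in that order.
    """
    return "".join(rng.choices(NUCLEOTIDES, weights=weights, k=length))


def initial_viral_pool(master, pool_size):
    """Return `pool_size` virions, each a pair of RNA genomes that are
    both copies of the master sequence (relative fitness 1)."""
    return [(master, master) for _ in range(pool_size)]


if __name__ == "__main__":
    rng = random.Random(1)
    master = make_master_sequence(20, rng=rng)
    pool = initial_viral_pool(master, pool_size=5)
    print(master, len(pool))
```

Passing a seeded `random.Random` instance as `rng` makes each realization reproducible, which is convenient when averaging over many realizations.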

Infection of cells

We considered a pool of C uninfected cells. We randomly selected M virions from the viral pool, each with a probability equal to its relative fitness (see below), to infect one of the uninfected cells. M was either constant or drawn from a predetermined distribution (see below). The genomes of the chosen virions were transferred to the cell and the virions were removed from the viral pool. This process was repeated for each of the remaining cells.
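One way to realize fitness-weighted selection is rejection sampling: pick a virion uniformly at random and accept it with probability equal to its relative fitness. The sketch below assumes this scheme (the text does not state exactly how the selection was implemented), and `infect_cells`, `draw_moi` and `fitness` are hypothetical names.

```python
import random


def infect_cells(pool, n_cells, draw_moi, fitness, rng=random):
    """Distribute virions from `pool` among `n_cells` uninfected cells.

    draw_moi: callable returning M, the number of virions for the next cell.
    fitness:  callable mapping a virion to its relative fitness in [0, 1].
    Assumes enough virions with nonzero fitness remain; otherwise the
    acceptance loop below would never finish.
    """
    pool = list(pool)  # copy so the caller's pool is not modified
    cells = []
    for _ in range(n_cells):
        m = draw_moi()
        cell = []
        while len(cell) < m:
            if not pool:
                raise RuntimeError("viral pool exhausted")
            idx = rng.randrange(len(pool))
            # Accept the uniformly chosen virion with probability = fitness.
            if rng.random() < fitness(pool[idx]):
                # Swap with the last element so removal is O(1).
                pool[idx], pool[-1] = pool[-1], pool[idx]
                cell.append(pool.pop())  # the virion leaves the pool
        cells.append(cell)
    return cells, pool
```

Each cell is returned as a list of virions (RNA pairs), so the next step can process one pair at a time.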

Reverse transcription

Following infection, the viral RNA genomes were recombined and mutated during reverse transcription. We considered one of the pairs of viral RNA genomes within an infected cell. We selected one of the two genomes in the pair randomly and began copying its sequence, nucleotide by nucleotide, to the resulting recombinant DNA genome. At each position, we switched templates to the other RNA strand with probability ρ, the recombination rate, thus producing a recombinant genome that was a mosaic of the two parent viral RNA genomes. Next, we mutated the recombinant genome with probability μ at every position, where μ was the mutation rate. In some simulations, we let this probability be nucleotide-specific (see below). The resulting sequence was the proviral DNA produced by reverse transcription. We repeated this process for the remaining pairs of viral RNA within the cell and in all the other cells.
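The template-switching and mutation steps can be sketched as below. The sketch assumes, as in Simulation parameters, that only transitions occur (A↔G, C↔U), and that a template switch is tested before each nucleotide is copied; the function names are hypothetical, and a single uniform rate `mu` is used (nucleotide-specific rates would replace it with a per-base lookup).

```python
import random

# Transition partners; T and U are treated interchangeably, so U is kept.
TRANSITION = {"A": "G", "G": "A", "C": "U", "U": "C"}


def reverse_transcribe(rna_pair, rho, mu, rng=random):
    """Produce one recombinant, mutated proviral genome from an RNA pair.

    rho: per-site probability of switching to the other RNA template.
    mu:  per-site mutation probability (transitions only).
    """
    template = rng.randrange(2)  # start on a randomly chosen strand
    copied = []
    for pos in range(len(rna_pair[0])):
        if rng.random() < rho:  # template switch at this site
            template = 1 - template
        copied.append(rna_pair[template][pos])
    # Mutate the recombinant genome independently at every position.
    return "".join(TRANSITION[b] if rng.random() < mu else b for b in copied)


def reverse_transcribe_cell(cell, rho, mu, rng=random):
    """One provirus per virion (RNA pair) that infected the cell."""
    return [reverse_transcribe(pair, rho, mu, rng) for pair in cell]
```

With `rho = 0` the provirus is a copy of one parent, and when both parents are identical, recombination has no visible effect; both cases are convenient sanity checks.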

Viral production

Each infected cell produced P progeny virions. For each virion produced from a cell, we randomly chose two of the proviral DNA present in the cell and assigned their sequences as the viral RNA genomes of the virion. When M = 1, the same provirus was chosen twice. The transcription of proviral DNA to viral RNA is catalyzed by host proteins and introduces far fewer mutations than reverse transcription. We therefore assumed that no mutations occurred during proviral DNA transcription. The resulting progeny virions constituted the new viral pool for infecting the next generation of uninfected cells. We repeated the process for a large number of generations (see below) and averaged over many such realizations to obtain the expected evolution for a given set of parameter values. We performed the simulations using a computer program written in C++ (Text S1).
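A minimal sketch of virion production follows. It assumes the two proviruses for each progeny virion are drawn independently (with replacement), which automatically reproduces the M = 1 case where the same provirus is chosen twice; the text does not state whether distinct proviruses were required when M > 1. `produce_virions` is a hypothetical name.

```python
import random


def produce_virions(cells_proviruses, progeny_per_cell, rng=random):
    """Build the next viral pool.

    cells_proviruses: one list of proviral sequences per infected cell.
    progeny_per_cell: P, the number of progeny virions per cell.
    Transcription is assumed error-free, so sequences are copied as-is.
    """
    new_pool = []
    for proviruses in cells_proviruses:
        for _ in range(progeny_per_cell):
            # Two independent draws; with a single provirus both are identical.
            new_pool.append((rng.choice(proviruses), rng.choice(proviruses)))
    return new_pool
```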

Measures of viral evolution and quasispecies structure

Hamming class frequencies

In each generation we determined the number of proviral genomes, $n_j$, belonging to each Hamming class $j$, where $j = 0, 1, \ldots, L$. Note that Hamming class $j$ contains genomes carrying $j$ mutations with respect to the master sequence. The frequency of genomes in Hamming class $j$ was $y_j = n_j / \sum_{k=0}^{L} n_k$.
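Counting Hamming classes is a direct tally of mutations relative to the master sequence. In the sketch below, the frequency of class j (written y_j here) is its count divided by the total number of proviral genomes; the helper names are hypothetical.

```python
from collections import Counter


def hamming_distance(seq, master):
    """Number of positions at which `seq` differs from `master`."""
    return sum(a != b for a, b in zip(seq, master))


def hamming_class_frequencies(proviruses, master):
    """Return y[j], the fraction of proviral genomes carrying j mutations,
    for j = 0, 1, ..., L."""
    L = len(master)
    counts = Counter(hamming_distance(s, master) for s in proviruses)
    total = len(proviruses)
    return [counts.get(j, 0) / total for j in range(L + 1)]
```

For example, the genomes AAAA, AAAG, GGAA and AAAA relative to master AAAA give class frequencies [0.5, 0.25, 0.25, 0.0, 0.0].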

Shannon entropy

By definition, the per-bit Shannon entropy is $S = -\frac{1}{L}\sum_i p_i \log_2 p_i$, where $p_i$ is the frequency of genome $i$. We assumed that transitions alone occurred (see below), thus restricting the total number of distinct genomes to $2^L$ (also see [100]). The Hamming class frequency is $y_j = \sum_{i:\, d_i = j} p_i$, where $d_i$ is the Hamming distance of genome $i$ from the master sequence, so that the summation extends over all $i$ belonging to Hamming class $j$. Because all genomes in a given Hamming class were equally fit, we assumed that they were equally likely to occur, so that $p_i = y_j / \binom{L}{j}$, where $\binom{L}{j}$ is the number of possible distinct genomes in Hamming class $j$. (This assumption neglects the influence of recombination.) Substituting for $p_i$ and simplifying yielded $S = -\frac{1}{L}\sum_{j=0}^{L} y_j \log_2\left[ y_j \big/ \binom{L}{j} \right]$. Note that $S = 0$ when the master (or any other) sequence alone exists and $S = 1$ when all possible genomes occur with equal likelihood, the latter signifying an error catastrophe. We evaluated factorials using Stirling's approximation for large arguments (which sometimes yielded ).
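The closed-form entropy, $S = -\frac{1}{L}\sum_{j=0}^{L} y_j \log_2[y_j / \binom{L}{j}]$, can be evaluated directly from the Hamming class frequencies. This sketch differs from the text in one respect: it computes log-binomial coefficients with `math.lgamma` rather than Stirling's approximation, which avoids overflow for large L without the approximation error. Terms with zero frequency are skipped, using the convention 0 log 0 = 0; the function names are hypothetical.

```python
import math


def log2_binomial(L, j):
    """log2 of the binomial coefficient C(L, j), computed via lgamma."""
    return (math.lgamma(L + 1) - math.lgamma(j + 1)
            - math.lgamma(L - j + 1)) / math.log(2)


def per_bit_entropy(y):
    """Per-bit Shannon entropy from Hamming class frequencies y[0..L],
    assuming genomes within a class are equally likely."""
    L = len(y) - 1
    s = 0.0
    for j, yj in enumerate(y):
        if yj > 0:  # 0 * log 0 is taken as 0
            s -= yj * (math.log2(yj) - log2_binomial(L, j))
    return s / L
```

A population consisting of the master sequence alone gives S = 0, and the binomial class distribution expected when all 2^L genomes are equally likely gives S = 1, matching the two limits noted above.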

Simulation parameters

We employed parameter values representative of HIV-1 infection in vivo [37]. Variations are mentioned below and in the text and figures. We let ρ = 8.3×10⁻⁴ crossovers/site/replication [46]. We fixed M to 3 infections/cell following Jung et al. [24], or let M follow a distribution (similar to that observed by Josefsson et al. [69]) in which 77% of the cells were singly, 19% doubly and 4% triply infected [37]. With each choice of M, we chose an appropriate C that matched patient data [37]. Following recent estimates of the basic reproductive ratio of HIV-1 in vivo [111], we let P = 10 infectious progeny virions/cell. A majority of HIV-1 mutations are transitions [70]; as a simplification, we therefore ignored transversions, insertions and deletions. We spanned a wide range of mutation rates in order to identify the error threshold. We let selection follow the fitness landscape derived in [37] to capture corresponding experimental data from [64]. Accordingly, the relative fitness of a genome is a Hill-type function of its Hamming distance from the master sequence, characterized by the minimum fitness of sequences, a characteristic Hamming distance, and an exponent analogous to the Hill coefficient [37]. The fitness of a virion is determined by the average Hamming distance of its two genomes from the master sequence. We let simulations proceed for 10000 generations (∼30 years). We examined the influence of variations in some of these parameters as mentioned below.
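The two choices for M can be expressed as a sampler, sketched below under the assumption that the 77/19/4% distribution is drawn independently for each cell. The virion-fitness helper takes the landscape as an argument because the Hill-type expression from [37] is not reproduced in this section; `make_moi_sampler`, `virion_fitness` and `landscape` are hypothetical names.

```python
import random


def make_moi_sampler(fixed=None, rng=random):
    """Return a callable giving the number of infections per cell, M.

    fixed=3 reproduces the constant-M case; fixed=None draws M from the
    distribution quoted in the text: 77% single, 19% double, 4% triple.
    """
    if fixed is not None:
        return lambda: fixed
    return lambda: rng.choices((1, 2, 3), weights=(0.77, 0.19, 0.04))[0]


def virion_fitness(d1, d2, landscape):
    """Fitness of a virion from the mean Hamming distance of its genomes.

    `landscape` is a hypothetical callable f(d) standing in for the
    Hill-type landscape of ref. [37].
    """
    return landscape((d1 + d2) / 2)
```

The sampler returned by `make_moi_sampler` has the same zero-argument form in both cases, so the infection step does not need to know which choice of M is in use.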

Nucleotide frequencies in HIV-1

In a recent study, 1357 whole genome sequences of HIV-1 were analyzed for their nucleotide composition and found to contain on average ∼36% A's, 24% G's, 18% C's and 22% U's [62]. To mimic this composition, we generated founder sequences by choosing A, G, C and U at each position with probabilities equal to 0.36, 0.24, 0.18 and 0.22, respectively.
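Drawing a founder with the HIV-1 composition is a weighted choice at each site. The sketch below uses the four frequencies quoted above and includes a helper to check the composition of the result; the names are hypothetical.

```python
import random
from collections import Counter

# Mean HIV-1 nucleotide composition quoted in the text (ref. [62]).
HIV_COMPOSITION = {"A": 0.36, "G": 0.24, "C": 0.18, "U": 0.22}


def founder_with_composition(length, composition=HIV_COMPOSITION, rng=random):
    """Draw each position independently with the given base probabilities."""
    bases = list(composition)
    weights = [composition[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


def composition_of(seq):
    """Observed fraction of A, G, C and U in `seq`."""
    counts = Counter(seq)
    return {b: counts[b] / len(seq) for b in "AGCU"}
```

For long sequences the observed composition approaches the target; for short genomes it fluctuates around it, so a founder drawn this way matches HIV-1 only on average.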

Nucleotide-specific mutation rates

The frequency of occurrence of different types of mutations in a single round of HIV-1 replication has recently been characterized [63]. In a representative experiment, of the 274 transitions observed, 10.58% were A to G, 53.28% were G to A, 29.56% were C to T and 6.57% were T to C transitions. We implemented nucleotide-specific mutation rates mimicking these frequencies as follows. We let $\mu_A$, $\mu_G$, $\mu_C$ and $\mu_T$ be the mutation rates of A, G, C and T, respectively. (We used T and U interchangeably.) We defined the average mutation rate $\mu = \sum_K f_K \mu_K$, where $f_K$ is the frequency of nucleotide K in the unmutated sequence. If $f_K = 1/4$ for all K (equal representation of all nucleotides), then the $\mu_K$ are expected to be proportional to the above frequencies of transitions observed. In other words, $\mu_A/\mu_G = 10.58/53.28$, and so on. Using this in the definition of the average mutation rate, we obtained $\mu = \frac{\mu_A}{4} \cdot \frac{10.58 + 53.28 + 29.56 + 6.57}{10.58}$, or $\mu_A \approx 0.42\mu$. Similarly, we found $\mu_G \approx 2.13\mu$, $\mu_C \approx 1.18\mu$ and $\mu_T \approx 0.26\mu$. Thus, given μ, we determined whether a mutation occurred at any position containing a particular nucleotide using the corresponding values of $\mu_A$, $\mu_G$, $\mu_C$ and $\mu_T$. As an approximation, we employed the latter values when the $f_K$ were not all equal as well.

We performed simulations with two alternative fitness landscapes. First, we modified the fitness landscape above by setting the fitness of genomes below a particular threshold to zero, akin to truncated landscapes employed previously [57], [65], [104], [105]. Thus, a genome was assigned zero fitness if its fitness fell below the threshold and retained its fitness otherwise. We performed simulations with threshold values of 0.3 and 0.4 (Fig. S7). Second, we followed recent studies of Bonhoeffer and colleagues [66], [67], [102], who assessed the in vitro replicative capacity of about 70000 HIV-1 sequences and argued that the resulting fitness landscape may be described by an expansion consisting of a constant term, a sum over single positions, a sum over pairs of positions, a sum over triplets of positions, and so on. In this expansion, an indicator variable equals 1 if genome i carries a mutation at a given position and 0 otherwise; the coefficients of the single-position sum quantify the loss of fitness due to a mutation at each position; the coefficients of the pair sum quantify pair-wise epistatic effects; the triplet sum quantifies ternary effects; and so on.

When the single-position coefficients are equal at every position, the single-position sum becomes that common coefficient multiplied by d, because the summation then simply counts the number of mutations in genome i, which equals d, the Hamming distance from the master (and also the founder) sequence. Similarly, if pair-wise epistatic effects are also position independent, the pair sum becomes the common pair coefficient multiplied by $\binom{d}{2} = d(d-1)/2$, because the double summation now counts the number of ways in which two mutations can be chosen from the d mutations in genome i. Proceeding similarly, it follows that the above expression for fitness becomes a polynomial in d. Note that the constant term is set to zero to ensure that the master sequence (d = 0) has the maximum relative fitness. We found that this polynomial with terms up to degree 3 provided a good fit to the mean replicative fitness data obtained earlier [64] (Fig. S7). We performed simulations with the resulting best-fit polynomial with non-monotonicities suppressed.
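The arithmetic behind the nucleotide-specific mutation rates can be checked with a few lines of Python. The sketch assumes equal base frequencies (f_K = 1/4), as in the derivation, and that each nucleotide's rate is proportional to its share of observed transitions; it returns the ratio of each nucleotide's rate to the average rate μ. The function names are hypothetical.

```python
# Percent of observed transitions by source base
# (A->G, G->A, C->T, T->C), as quoted from ref. [63].
TRANSITION_SHARES = {"A": 10.58, "G": 53.28, "C": 29.56, "T": 6.57}


def nucleotide_specific_multipliers(shares=TRANSITION_SHARES):
    """Return mu_K / mu for K in A, G, C, T.

    Assumes f_K = 1/4 and mu_K = c * share_K, so mu = (1/4) * sum_K mu_K.
    """
    total = sum(shares.values())
    # mu = c * total / 4  =>  mu_K / mu = 4 * share_K / total
    return {base: 4.0 * share / total for base, share in shares.items()}


def site_mutation_rates(mu, shares=TRANSITION_SHARES):
    """Per-base mutation probabilities for an average rate mu."""
    return {base: mu * k
            for base, k in nucleotide_specific_multipliers(shares).items()}
```

With the shares quoted above, this gives multipliers of about 0.42, 2.13, 1.18 and 0.26 for A, G, C and T, which average to 1 by construction.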

Predictions of the quasispecies theory

According to the quasispecies theory [7], [8], the structure of the quasispecies is obtained as the dominant eigenvector of the value matrix, $W = QA$. We constructed the mutation matrix $Q$ by recognizing that its element, $Q_{ij} = \mu^{d_{ij}}(1-\mu)^{L-d_{ij}}$, is the probability that genome $j$ mutates to genome $i$, with $d_{ij}$ the Hamming distance between genomes $i$ and $j$. The selection matrix $A$ is a diagonal matrix with elements $A_{ii} = f_i$, the relative fitness of the respective genomes. We employed three different fitness landscapes: the experimental landscape above, the isolated peak landscape, and the exponential landscape (see above). We computed the dominant eigenvector $x$ of $W$ and normalized it so that $\sum_i x_i = 1$. The Hamming class frequencies were then $y_j = \sum_{i:\, d_i = j} x_i$. We performed computations using a program written in MATLAB.

Figure S1. Dependence of the structure of the quasispecies on the founder sequence. Structures of the quasispecies obtained when the founder sequence was the master sequence (circles connected by lines) or a sequence obtained by mutating the master sequence at 10% of the sites chosen randomly (pluses), for the range of values of μ (substitutions/site/replication) indicated. The other parameters are the same as in Fig. 1. The inset shows the corresponding dependence of the mean steady state Shannon entropy on μ obtained with the master sequence (circles connected by lines) or the mutated sequence (diamonds) as the founder sequence. The structure of the quasispecies and the error threshold are thus not influenced by the choice of the founder sequence. (PDF)

Figure S2. Quasispecies structure as a function of the genome length. Structures of the quasispecies obtained with different genome lengths, L, over the range of values of μ (substitutions/site/replication) indicated. (Some intermediate values of μ are omitted for clarity.) The corresponding steady state Shannon entropy and the resulting dependence of the error threshold, μc, on L are presented in Fig. 4. (PDF)
Figure S3. Quasispecies structure as a function of the population size. Structures of the quasispecies obtained with different population sizes, C, over the range of values of μ (substitutions/site/replication) indicated. (Some intermediate values of μ are omitted for clarity.) The corresponding steady state Shannon entropy and the resulting dependence of the error threshold, μc, on C are presented in Fig. 5. (PDF)

Figure S4. Quasispecies structure as a function of the recombination rate with M = 3 infections/cell. Structures of the quasispecies obtained with different recombination rates, ρ (crossovers/site/replication), over the range of values of μ (substitutions/site/replication) indicated. (Some intermediate values of μ are omitted for clarity.) The corresponding steady state Shannon entropy and the resulting dependence of the error threshold, μc, on ρ are presented in Figs. 6A and C. (PDF)

Figure S5. Quasispecies structure as a function of the recombination rate with M∼1 infection/cell. Structures of the quasispecies obtained with different recombination rates, ρ (crossovers/site/replication), over the range of values of μ (substitutions/site/replication) indicated, with M drawn from a distribution (Methods). (Some intermediate values of μ are omitted for clarity.) The corresponding steady state Shannon entropy and the resulting dependence of the error threshold, μc, on ρ are presented in Figs. 6B and D. (PDF)

Figure S6. Dependence of the error threshold on the nucleotide composition of the founder sequence and nucleotide-specific mutation rates. The mean steady state Shannon entropy as a function of the mutation rate, obtained with a founder sequence containing all nucleotides at equal frequencies and mutating at equal rates (blue), reproduced from Fig. 2C, and with a founder sequence containing nucleotides at frequencies representative of HIV-1 (∼36% A's, 24% G's, 18% C's and 22% U's) mutating at equal rates (green) or at nucleotide-specific rates ($\mu_A \approx 0.42\mu$, $\mu_G \approx 2.13\mu$, $\mu_C \approx 1.18\mu$ and $\mu_T \approx 0.26\mu$; Methods) (red). The other parameters are the same as in Fig. 2C. (PDF)

Figure S7. Dependence of the error threshold on the fitness landscape. (A) The mean steady state Shannon entropy as a function of the mutation rate obtained with a truncated fitness landscape, in which genomes with fitness below a threshold were assigned zero fitness, for threshold values of 0.3 (blue) and 0.4 (green), and for a baseline (red) that corresponds to the simulations in Fig. S1. With longer genomes, the fitness landscape had to be rescaled appropriately to avoid extinction of the viral population due to severe fitness penalties (not shown). (B) The mean steady state Shannon entropy as a function of μ obtained with the landscape employed in the main text (blue) and the polynomial fitness landscape (red). Note that the former data are the same as in Fig. 2C. The inset in (B) shows best fits of the two landscapes (blue and red lines, respectively) to data (symbols), excluding outliers (open symbols), from Bonhoeffer et al. (Science 306: 1547–1550 (2004)), modified to account for the observed frequencies of synonymous and non-synonymous mutations (see Balagam et al., PLoS ONE 6: e14531 (2011)). Because data were available only up to a Hamming distance of ∼90, extrapolating the fitted polynomial to higher Hamming distances yielded an unrealistic increase in fitness. To avoid this non-monotonic behavior, the fitness of genomes beyond the minimum (which occurred at Hamming distance 82) was set equal to the minimum. (PDF)

Figure S8. Quasispecies structures yielding estimates of the error threshold of HIV-1 with M = 3 infections/cell. Structures of the quasispecies obtained with different population sizes, C, over the range of values of μ (substitutions/site/replication) indicated. (Some intermediate values of μ are omitted for clarity.) The corresponding dependence of the steady state Shannon entropy on μ is presented in Fig. 8A. (PDF)

Figure S9. Quasispecies structures yielding estimates of the error threshold of HIV-1 with M∼1 infection/cell. Structures of the quasispecies obtained with different population sizes, C, over the range of values of μ (substitutions/site/replication) indicated, with M drawn from a distribution (Methods). (Some intermediate values of μ are omitted for clarity.) The corresponding dependence of the steady state Shannon entropy on μ is presented in Fig. 8B. (PDF)

Text S1. The computer program employed for our simulations. (PDF)
