The population genetics and evolutionary epidemiology of RNA viruses.

Andrés Moya¹, Edward C Holmes, Fernando González-Candelas.

Abstract

Year: 2004 PMID: 15031727 PMCID: PMC7096949 DOI: 10.1038/nrmicro863

Source DB: PubMed Journal: Nat Rev Microbiol ISSN: 1740-1526 Impact factor: 60.633

Main

RNA viruses have become an important area of study for epidemiologists and evolutionary biologists alike[1,2,3]. Much of this research is centred on two main themes: understanding the mechanisms of RNA virus evolution, often through experimental analyses, and reconstructing the epidemiological history of a given virus, namely its origin and spread through populations and the forces that promote its emergence. Here, we review the progress and current state of both these research topics.

Mechanisms of viral evolution

Central to population genetics is understanding how the five main forces of evolutionary change (mutation, recombination, natural selection, GENETIC DRIFT and migration) interact to shape the genetic structure of populations. These same forces are also central to understanding RNA virus evolution, although their relative strengths differ from those observed in DNA-based organisms. For RNA viruses, most attention has been directed towards mutation, selection and genetic drift. We can understand their importance and interaction by considering four basic properties of RNA virus populations. First, RNA viruses often have very large population sizes, such that the number of viral particles in a given organism might be as high as 10¹². Second, such immense population sizes, which are several orders of magnitude larger than those observed for cellular organisms, are a product of explosive replication. For example, a single infectious particle can produce an average of 100,000 viral copies in 10 hours. As natural selection is most efficient in large populations, it is no surprise that experiments using RNA viruses have shown that selection is of fundamental importance in controlling their evolutionary dynamics, such that new mutants with increased FITNESS (as measured by their selection coefficient, s) continually appear and out-compete older, inferior alleles[4].
Third, owing to the lack of proofreading activity in their polymerases, RNA viruses exhibit the highest mutation rates of any group of organisms, approximately one mutation per genome per replication[5,6]. Finally, the genomes of RNA viruses are typically small, ranging from only 3 kb to ∼30 kb, with a median size of ∼9 kb. These last two properties are intimately related, because high mutation rates are theoretically expected to limit genome size. In particular, a mutation rate that exceeds a notional ERROR THRESHOLD (set at approximately the reciprocal of the genome size) generates so many deleterious mutations in each replication cycle that even the fittest viral genomes are unable to reproduce, and the population declines to extinction[7,8]. However, RNA viruses that exist close to (but below) the error threshold are also able to produce many beneficial mutations in a short time, thereby enhancing adaptability, provided that their populations are sufficiently large. In the simple situation outlined above, RNA viruses should evolve in a highly deterministic manner, with natural selection working efficiently on a vast array of mutational variants. Although RNA virus populations are often highly diverse, this is not sufficient to explain the entirety of RNA virus evolution. In particular, deterministic approaches assume that population sizes are universally large, such that the fate of a given mutation can be predicted if its frequency and fitness are known. Although the population sizes of RNA viruses are often very large, factors such as variation in replication potential among variants, differences in generation time among infected cells and POPULATION BOTTLENECKS (most notably during transmission between hosts) might lead to an EFFECTIVE POPULATION SIZE (denoted Ne) that is much smaller than the actual number of infected cells.
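The link between mutation rate, genome length and the error threshold can be sketched with Eigen's classical single-peak model. This is an assumption for illustration, not a calculation taken from this review: a master sequence of length L has a replicative superiority σ over its mutant cloud, each site is miscopied with probability μ, and the master persists only while σ(1 − μ)^L > 1. Solving for μ gives a critical rate close to ln(σ)/L, so for modest σ the threshold lies near one mutation per genome per replication, roughly the reciprocal of genome size. The function names and the value σ = e are hypothetical choices.

```python
import math

def error_threshold(genome_length: int, superiority: float) -> float:
    """Critical per-site mutation rate in Eigen's single-peak model (assumed).

    The master sequence persists only while superiority * (1 - mu) ** L > 1;
    solving (1 - mu) ** L = 1 / superiority for mu gives this exact threshold.
    """
    return 1.0 - superiority ** (-1.0 / genome_length)

def approx_threshold(genome_length: int, superiority: float) -> float:
    """Small-mu approximation of the same threshold: ln(superiority) / L."""
    return math.log(superiority) / genome_length

# Genome sizes spanning the range quoted in the text (3 kb, median ~9 kb, 30 kb).
# A superiority of e (so that ln(superiority) = 1) is an illustrative assumption.
for length in (3_000, 9_000, 30_000):
    mu_c = error_threshold(length, math.e)
    print(f"L = {length:>6} nt: mu_c = {mu_c:.2e} per site, "
          f"genomic rate U_c = {mu_c * length:.3f} per replication")
```

With σ = e, the genomic threshold U = μL comes out just below 1 for all three genome sizes, so in this toy model a rate of about one mutation per genome per replication sits close to the threshold. Raising σ moves the threshold only logarithmically.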
Theory predicts that in populations where Ne is small (such that the compound parameter Ne·s < 1), genetic drift has an important role in determining the frequency and fate of mutations[9]. Recombination might also have an important role in RNA virus evolution. Although most studies indicate that recombination rates in RNA viruses are often lower than those in other organisms[10], there are notable exceptions. Perhaps the most dramatic is HIV, in which the genomic recombination rate exceeds the genomic mutation rate[11]. Frequent recombination seems advantageous because it can create high-fitness genotypes more rapidly than mutation alone. Moreover, recombination might also purge deleterious mutations from virus populations, thereby preventing a dramatic decrease in fitness (see below). However, simulation studies have indicated that frequent recombination is more likely to reduce fitness when mutation rates are close to the error threshold[12]. Finally, recombination rates in RNA viruses might not be set by natural selection at all. Rather, they could simply be a passive function of the replication machinery or ecological circumstances of the virus in question. For example, recombination rates seem to be particularly low in negative-sense RNA viruses[13], which might be a result of the RNA-packaging mechanism. Understanding the causes of variation in recombination rate among RNA viruses is a key area for future study. A final factor to consider in RNA virus evolution is migration. Migration (also referred to as gene flow) must be understood not only at a macroscopic level (that is, among hosts within a population, among populations or between host species), but also within a single infected individual. From the site of inoculation, viruses can be transported to several tissues, generating intra-host spatial variation[14].
However, the effect of a non-uniform population distribution on the spread, fitness and variability of virus populations has been much less studied than other evolutionary factors, although in some experiments a positive correlation between migration rate and the average fitness of the population has been observed[15]. The remarkable mutational power of RNA viruses has meant that their evolution has often been considered to be different from that of DNA-based organisms[1]. Key to this is the concept of the quasispecies, which was first developed by Eigen and Schuster[16] to understand the dynamics of primitive evolutionary systems. RNA viruses are of particular importance in this respect as they might represent biological entities that evolve according to the rules of quasispecies theory. The basis of quasispecies theory is the notion that the target of natural selection is not simply the fastest growing replicator, but rather a broad spectrum of mutants that are produced by erroneous copying of the fittest (or master) sequence[16,17]. Natural selection acts on the entire quasispecies because mutation rates close to the error threshold mean that individual viral genomes are linked by mutational coupling (all the possible mutational links between viral genomes are established), so that the whole population evolves as a single unit. One particularly important implication of this special form of group selection is that the fastest replicating RNA viral genomes could be out-competed by those with lower replication rates if the latter have a high probability of being generated by mutation from closely related variants. An important question is whether the quasispecies model is an accurate description of RNA virus evolution. Experimental evidence for the quasispecies was first reported for the bacteriophage Qβ[18].
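The claim that a slower replicator can out-compete a faster one can be illustrated with a toy version of the quasispecies equation. In this sketch every value is hypothetical: genomes have four binary sites; genotype 0000 is the fastest replicator (fitness 2.0) but all of its mutants are poor (fitness 0.1), whereas genotype 1111 replicates more slowly (fitness 1.5) and its one-mutant neighbours are equally fit. The equilibrium quasispecies is taken as the dominant eigenvector of W = Q·diag(f), where Q gives the probability of copying one genotype into another; here it is found by power iteration.

```python
L = 4                       # toy genome of four binary sites (assumption)
N_GENOTYPES = 2 ** L        # genotypes encoded as integers 0b0000 ... 0b1111

def fitness_landscape() -> list[float]:
    """Hypothetical landscape: a tall isolated peak and a lower, flat plateau."""
    f = [0.1] * N_GENOTYPES            # background genotypes replicate poorly
    f[0b0000] = 2.0                    # fastest replicator; every mutant of it is poor
    f[0b1111] = 1.5                    # slower replicator ...
    for site in range(L):
        f[0b1111 ^ (1 << site)] = 1.5  # ... whose one-mutant neighbours are equally fit
    return f

def copy_probability(i: int, j: int, mu: float) -> float:
    """Probability that copying genotype j yields genotype i (independent sites)."""
    d = bin(i ^ j).count("1")          # Hamming distance between the two genotypes
    return mu ** d * (1 - mu) ** (L - d)

def quasispecies(mu: float, iterations: int = 500) -> list[float]:
    """Equilibrium genotype frequencies: the dominant eigenvector of
    W[i][j] = Q[i][j] * f[j], obtained by power iteration."""
    f = fitness_landscape()
    w = [[copy_probability(i, j, mu) * f[j] for j in range(N_GENOTYPES)]
         for i in range(N_GENOTYPES)]
    x = [1.0 / N_GENOTYPES] * N_GENOTYPES
    for _ in range(iterations):
        x = [sum(w[i][j] * x[j] for j in range(N_GENOTYPES)) for i in range(N_GENOTYPES)]
        total = sum(x)
        x = [v / total for v in x]     # renormalize to frequencies
    return x

for mu in (0.01, 0.3):
    x = quasispecies(mu)
    print(f"mu = {mu}: freq(0000) = {x[0b0000]:.3f}, freq(1111) = {x[0b1111]:.3f}")
```

At a low per-site mutation rate the population concentrates on 0000, but at the (deliberately extreme) rate of 0.3 the equilibrium shifts towards 1111, because most mutated offspring of 1111 remain fit. This outcome, sometimes called "survival of the flattest", depends entirely on the assumed landscape.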
Subsequent experiments with mammalian vesicular stomatitis virus (VSV) provided one of the most important supporting observations for the quasispecies: a high-fitness viral variant was suppressed by one of lower fitness[19]. However, this can also be explained by genetic drift; the probability that any variant achieves fixation in a population is partially dependent on its initial frequency, so most rare, albeit advantageous, variants are lost by drift in small populations. Indeed, a generic problem of quasispecies theory is that genetic drift is expected to be extremely restricted[17], which might not be the case for viruses in nature[20,21]. More recently, in vitro studies of the evolutionary dynamics of bacteriophage φ6 provided evidence for one aspect of quasispecies theory, namely that viral genomes differ in their mutational spectra and that this affects fitness[22]. However, because these experiments used small populations and RUGGED FITNESS LANDSCAPES, and because little is known about fitness landscapes in nature, the generality of these results is uncertain. Although the quasispecies has a firm theoretical foundation, and there is some evidence for it in laboratory populations, whether it applies to RNA viruses in nature is less clear. For example, simple observations of high levels of genetic variation in RNA viruses are not sufficient to prove the existence of quasispecies, although this is often the only evidence presented; nor is the existence of an error threshold, which can easily be explained using evolutionary models[8]. Rather, to demonstrate that RNA viruses form quasispecies it is necessary to show that natural selection acts on viral populations as a unit. Testing this prediction in nature will be one of the most important future areas of study for those investigating the mechanics of viral evolution.
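The drift argument, that fixation depends on initial frequency and that rare advantageous variants are often lost, can be quantified with Kimura's diffusion approximation. For a variant at initial frequency p with selection coefficient s in a population of effective size Ne, the fixation probability is u(p) = (1 − e^(−2·Ne·s·p)) / (1 − e^(−2·Ne·s)). Treating viral genomes as haploid Wright–Fisher individuals is an assumption made for this sketch, and the parameter values are hypothetical.

```python
import math

def fixation_probability(p: float, s: float, ne: float) -> float:
    """Kimura's diffusion approximation for a haploid population (assumed model).

    p  -- initial frequency of the variant
    s  -- selection coefficient (s > 0 advantageous, s < 0 deleterious)
    ne -- effective population size
    """
    if s == 0:
        return p  # neutral variant: fixation probability equals its frequency
    # expm1 keeps precision when 2*ne*s*p is small; the ratio equals
    # (1 - exp(-2*ne*s*p)) / (1 - exp(-2*ne*s)).
    return math.expm1(-2 * ne * s * p) / math.expm1(-2 * ne * s)

# A single new advantageous mutant (p = 1/Ne) with a hypothetical s = 0.01.
s = 0.01
for ne in (10, 100, 10_000):
    u = fixation_probability(1 / ne, s, ne)
    print(f"Ne = {ne:>6}: Ne*s = {ne * s:g}, P(fixation) = {u:.4f}, neutral = {1 / ne:.4f}")
```

When Ne = 10 (so Ne·s = 0.1), the fixation probability of a single new mutant is about 0.109, barely above the neutral value of 0.1, so drift rather than selection largely decides its fate. When Ne = 10,000 it approaches 2s ≈ 0.02, roughly 200 times the neutral expectation, yet about 98% of such beneficial mutants are still lost.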

Experimental evolution of RNA viruses

Experimental evolution constitutes a powerful tool for simulating natural evolution[23] and is frequently used to test the basic principles of evolutionary theory[24]. Box 1 provides a schematic overview of experiments of this type. The strength of the experimental approach is that the phenotypic and molecular changes of RNA viral populations can be monitored in real time. More importantly, under an appropriate experimental setting, it is also possible to directly estimate changes in fitness, one of the main goals of modern population genetics. A panoply of experiments has highlighted both the mechanisms of RNA virus evolution and what RNA viruses can tell us about evolution in general. Experimental studies have made a substantial contribution towards understanding the processes that govern RNA virus evolution[25]. The main findings of these experiments can be grouped into three general types. First, the molecular basis of adaptive evolution in viruses, including the occurrence and frequency of CONVERGENT EVOLUTION[26,27], viral attenuation[28] and compensatory mutations[29]. Second, the role of population bottlenecks and the accumulation of deleterious mutations, and how they affect fitness[30,31,32,33]. Finally, the importance of CLONAL INTERFERENCE[34] and COMPLEMENTATION[35] in determining rates of viral adaptation. In brief, these studies support eight general conclusions[36]. First, there is extensive convergent and parallel evolution (both in genotype and in phenotype) across lineages replicating in the same host, perhaps reflecting the fact that relatively few sites are free to vary when genome sizes are small. Second, advantageous mutations that are fixed early on when viruses are challenged with new environments confer the largest fitness benefit. Third, phenotypic evolution tends to mirror the evolution of fitness increments, with large changes occurring early in new environments.
Fourth, rates of nucleotide substitution remain approximately constant through time. Fifth, overall genetic diversity remains low during the phase of maximum fitness increase and rises once fitness becomes asymptotic. Sixth, evolutionary changes that increase fitness in one host often, although not always, reduce fitness in an alternative host. Seventh, population bottlenecks and spatial heterogeneity lead to an increase in unique nucleotide substitutions. Finally, severe reductions in population size can lead to the accumulation of deleterious mutations and consequent fitness losses. Many fundamental questions in evolutionary biology have been addressed using RNA viruses as model experimental systems. One of the first questions addressed concerned a severe consequence of deleterious mutation accumulation known as MULLER'S RATCHET[37]. Studies have explored how clonally evolving RNA viruses prevent the excessive build-up of deleterious mutations in populations that are experiencing strong bottlenecks or small effective sizes[38]. One popular suggestion is that sexual reproduction has evolved in RNA viruses because it allows them to escape Muller's ratchet when effective population sizes are small[39,40,41]. In this model, the accumulation of deleterious mutations is expected to be less of a problem for RNA viruses that either recombine or, in the case of viruses with segmented genomes, undergo reassortment. However, recombination rates are often so low in RNA viruses that it is difficult to argue that recombination provides a direct fitness benefit. In the face of low recombination rates, RNA viruses could escape Muller's ratchet because their long-term effective population sizes might not be small enough to allow deleterious mutations to accumulate, or perhaps because compensatory mutations are sufficiently frequent to counteract fitness losses[42]. Experimental evolution with RNA viruses has also been crucial for studying the dynamics of natural selection.
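Muller's ratchet, discussed above, can be sketched with a classical approximation (due to Haigh) and a small Wright–Fisher simulation. Both assume an asexual population, multiplicative fitness (1 − s)^k for a genome carrying k deleterious mutations, a Poisson-distributed number of new deleterious mutations per replication with mean U, and no back or compensatory mutation. At mutation–selection balance the mutation-free class holds about N·e^(−U/s) genomes; when that number is small, drift can lose it, and because it cannot be regenerated the minimum mutation load ratchets upwards. All parameter values and function names are hypothetical.

```python
import math
import random

def least_loaded_class(n: int, u: float, s: float) -> float:
    """Haigh's approximation for the expected number of mutation-free genomes,
    N * exp(-U/s), at mutation-selection balance (assumed model)."""
    return n * math.exp(-u / s)

def poisson(rng: random.Random, lam: float) -> int:
    """Poisson random variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_ratchet(n: int, u: float, s: float, generations: int, seed: int = 1) -> list[int]:
    """Wright-Fisher simulation of an asexual population with no back mutation.

    Each genome is represented only by its number of deleterious mutations k.
    Returns the smallest k in the population after each generation.
    """
    rng = random.Random(seed)
    loads = [poisson(rng, u / s) for _ in range(n)]   # start near mutation-selection balance
    history = []
    for _ in range(generations):
        weights = [(1 - s) ** k for k in loads]        # multiplicative fitness
        parents = rng.choices(loads, weights=weights, k=n)
        loads = [k + poisson(rng, u) for k in parents]  # mutations are added, never removed
        history.append(min(loads))
    return history

# Hypothetical parameters: U = 0.5 deleterious mutations per replication, s = 0.1.
for n in (100, 10_000):
    print(f"N = {n:>6}: expected mutation-free class = {least_loaded_class(n, 0.5, 0.1):.2f}")
min_loads = simulate_ratchet(n=100, u=0.5, s=0.1, generations=500)
print("increase in minimum load over 500 generations (N = 100):", min_loads[-1] - min_loads[0])
```

With these values the mutation-free class is expected to contain fewer than one genome when N = 100 but about 67 genomes when N = 10,000, which is why bottlenecks and small effective sizes make the ratchet a concern.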
RNA virus experiments on natural selection include studies of COMPETITIVE EXCLUSION[43], the RED QUEEN HYPOTHESIS[43], convergent evolution[26,27,44] and the rhythm of adaptive change[45]. We expect RNA viruses to continue to have an important role in this area for many years to come. Finally, one promising area involves using RNA viruses to test theories for the evolution of cooperation. For example, Turner and Chao conducted a series of experiments with bacteriophage φ6 in which they demonstrated that RNA viruses could evolve under PRISONER'S DILEMMA conditions[46], and also escape from it[47]. Despite the power of experiments, there are still difficulties in estimating some of the important parameters in evolutionary biology, such as the rate of deleterious mutation and fitness values. In recent years, a number of in silico approaches have been used to answer these questions, most notably with computer-generated genomes ('digital organisms') that are designed to behave as living systems. Digital organisms have the ability to create a copy of their own genome, but are subject to copying errors, so that populations of programs evolve in, and adapt to, their environments[48]. Although digital organisms are not as sophisticated as viruses, they are useful study tools because experiments can be easily controlled and repeated. Digital organisms have been useful in studying predicted adaptive evolution over short and long timescales[49,50], the role of epistasis in evolution[51] and in testing key aspects of quasispecies theory, particularly whether fast replicators can be less fit than slower replicators at high mutation rates[52].

Molecular epidemiology of RNA viruses

Molecular epidemiology was first introduced to the study of infectious disease in the early 1970s[53]. Since this time, the analysis of gene sequence variation has become a standard practice for virologists with an interest in epidemiology, especially with the advent of new, high-throughput sequencing technologies.
From the typing of viral populations to the study of the origins of a new virus, viral gene sequence variation has been used to answer a wide variety of questions, increasing both the quantity and quality of the epidemiological data available. In this section, we discuss how particular aspects of RNA virus evolution affect the reconstruction of their epidemiological history. In this context, it is important to note that the observed epidemiological patterns of viruses result from their evolution at two different levels: within individual hosts[54] (and vectors[55]) and among hosts at the population level. RNA viruses differ greatly in their patterns and processes of intra- and inter-host evolution, as well as in the duration of the infection caused and the type of immune response that is induced. Such factors must be considered when discussing their epidemiological dynamics in a comparative setting[3]. Many aspects of the epidemiological history of viruses can be graphically summarized in a phylogenetic tree. The timescale of these trees can vary from a few weeks to many centuries, and depends on the rate of accumulation of variation in the sequences under study and the timescale of sampling. Although graphically very similar, there are important differences between phylogenetic trees and gene genealogies[56]. The former are used to analyse the evolutionary history of distinct viral species or genes, usually by sampling one representative of each unit under study. By contrast, genealogies depict the history of genetic polymorphisms segregating in contemporaneous populations. Gene genealogies have been used extensively in the study of RNA viruses because their rapid evolution means that sequence variation increases over very short periods of time. Indeed, RNA viruses constitute the most important class of measurably evolving populations[57], with evolution even occurring during the infection of a single individual[58]. 
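Because measurably evolving populations accumulate detectable change between sampling dates, one simple way to estimate their rate of evolution is root-to-tip regression: the genetic distance from the root of a tree to each tip is regressed on the tip's sampling date, the slope estimates the substitution rate, and the x-intercept estimates the date of the root. This sketch is not one of the methods cited in this review, and the distances are hypothetical and noise-free.

```python
def root_to_tip_regression(dates: list[float], distances: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope (substitutions/site/year) and the
    x-intercept, i.e. the date at which the fitted distance falls to zero.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate

# Hypothetical sampling years and root-to-tip distances (substitutions/site).
years = [1990, 1995, 2000, 2005]
dists = [0.020, 0.030, 0.040, 0.050]
rate, root = root_to_tip_regression(years, dists)
print(f"rate = {rate:.4f} subs/site/year, root date = {root:.1f}")
```

Real data are noisy and tips share ancestry, so the points are not independent; root-to-tip regression is therefore best treated as an exploratory check for clock-like signal rather than as a formal estimator.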
More accurate methods of evolutionary inference have already been designed for these rapidly evolving populations[59,60,61]. In particular, whereas phylogenetic trees generally depict strictly bifurcating patterns of relationships, gene genealogies can take into account recombination, which will lead to inter-connected networks of lineages. Moreover, using a statistical approach called coalescent theory[62,63,64], it is possible to infer demographic processes from genetic polymorphism data, most notably rates of viral population growth and decline. Coalescent methods can operate under several population genetic models and use gene genealogies as key analytical tools. Coalescent theory therefore provides a crucial conceptual link between phylogenetics and population genetics. Both phylogenies and gene genealogies are relevant for the epidemiological analysis of RNA viruses and are useful for investigating the origin of new viruses[65,66,67] or identifying the source of an outbreak[68,69,70,71]. Used in conjunction with population genetic theory, they are also essential for identifying positive[72,73] or purifying[74] natural selection at nucleotide sites, for detecting the presence and extent of recombination[13,75,76] and for dating important points in the history of epidemics[67,77,78]. Genealogical analysis is especially relevant for reconstructing the recent epidemiological history of viral populations[66,79,80,81], such as in forensic studies (Box 2, Fig. 1), and therefore has important implications for determining public health policies.
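Two basic coalescent results show how genealogies connect to population genetic parameters. Under the standard Kingman coalescent for a haploid population of constant effective size Ne (an assumption; viral populations often grow or shrink), k lineages coalesce after an average of 2Ne/[k(k − 1)] generations, so the expected time to the most recent common ancestor of n sampled sequences is 2Ne(1 − 1/n). Watterson's estimator converts the number of segregating sites S into an estimate of θ, which is proportional to Ne multiplied by the mutation rate. The effective size used below is hypothetical.

```python
from fractions import Fraction

def expected_tmrca(n: int, ne: int) -> Fraction:
    """Expected generations back to the most recent common ancestor of n sampled
    lineages under the constant-size, haploid Kingman coalescent (assumed model)."""
    # While k lineages remain, the mean waiting time to the next coalescence
    # is 2*Ne / (k*(k-1)) generations; the total telescopes to 2*Ne*(1 - 1/n).
    return sum(Fraction(2 * ne, k * (k - 1)) for k in range(2, n + 1))

def watterson_theta(segregating_sites: int, n: int) -> float:
    """Watterson's estimator of theta from S segregating sites among n sequences."""
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites / a_n

# Hypothetical effective size of 1,000 genomes.
for n in (2, 10, 100):
    print(f"n = {n:>3}: E[T_MRCA] = {float(expected_tmrca(n, 1000)):.0f} generations")
print(f"theta_W for S = 20 segregating sites in 10 sequences: {watterson_theta(20, 10):.2f}")
```

Adding sequences deepens the expected genealogy only slightly: two sequences share an ancestor about Ne generations back on average, and even a very large sample only about 2Ne. Methods that infer growth or decline replace the constant-size assumption with a function describing how population size changes through time.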

Figure 1

Two alternative epidemiological scenarios translate into different phylogenetic tree topologies, the statistical support for which can be compared directly.

The tree in panel a depicts a common and close origin for samples 1–3 (node A), which is separate from the control samples 4–7 (node B). Node A might correspond to a single outbreak or a suspected transmission among these patients, whereas node B includes samples that were suspected of being, but are not, related to the outbreak (4–7) and unrelated population controls (8,9). Panel b represents the alternative proposal for sample 1, which is now separated from the former cluster and instead groups with the control samples. Similar proposals can be separately formulated for each of the samples 1–3.

Although it is fair to assume that frequent mutation means that long-term rates of nucleotide substitution are usually high in RNA viruses, in reality these rates might vary widely, both within and among genes of the same species and among viral species[82]. Indeed, present data indicate that viral substitution rates are much more variable than their underlying mutation rates[5,6], which most likely reflects important differences in replication dynamics. For example, the nucleotide substitution rate in human T-lymphotropic virus type II (HTLV-II) varies from ∼1 × 10⁻⁴ substitutions per site per year in epidemics with high rates of transmission and rapid replication, such as those in injecting drug users, to ∼1 × 10⁻⁷ substitutions per site per year in endemic situations, where viruses are maintained within hosts through the clonal expansion of infected cells rather than by active replication[83]. However, in many RNA viruses, substitution rates of 1 × 10⁻³ to 1 × 10⁻⁴ substitutions per site per year are observed[82]. This variation in substitution rates across viral genomes is useful, because it allows epidemiological questions relating to different timescales to be addressed. Consequently, rapidly evolving (hypervariable) gene regions are informative for studying viral evolution within individuals, or for identifying the source of a particular disease outbreak.
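The practical consequence of this rate variation can be shown with simple arithmetic. Assuming a constant rate r along both lineages and ignoring multiple substitutions at the same site, two sequences that diverged t years ago are expected to differ at about 2·r·t·L of L sites. The 1,000-nucleotide region and the five-year window are hypothetical; the rates are those quoted above.

```python
def expected_differences(rate: float, years: float, sites: int) -> float:
    """Expected differences between two sequences that diverged `years` ago.

    Assumes a constant substitution rate (per site per year) on both lineages
    and ignores multiple hits, so it is only valid for small divergences.
    """
    return 2 * rate * years * sites

# Rates quoted in the text; the region length and time window are hypothetical.
for rate in (1e-3, 1e-4, 1e-7):
    diffs = expected_differences(rate, years=5, sites=1000)
    print(f"rate {rate:.0e} subs/site/year -> {diffs:.3f} expected differences")
```

At 10⁻³ substitutions per site per year, two lineages separated for five years should differ at about ten sites, enough to resolve transmission events; at 10⁻⁷ essentially no differences are expected over the same window, so the same short-term question could not be answered.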
More conserved regions are better suited to in-depth phylogenetic inference, such as the analysis of viral genotypes at levels ranging from species to family. In a number of cases it has been proposed that molecular evolution within specific RNA virus genes proceeds at a constant rate. Such MOLECULAR CLOCKS have been proposed for human influenza A virus[84], although the constancy of the evolutionary rate does not hold in many other cases[82,85,86,87]. Non-clocklike evolution can result from a number of factors, such as changes in host species, changes in structural and functional constraints[88,89], and the occurrence of positive selection. Although most modern methods of phylogenetic analysis incorporate such rate variation, so that it is unlikely to cause significant error in the reconstruction of tree topologies, it can have an important impact on the analysis of divergence times.

The emergence of RNA viruses

The past 25 years have seen the emergence of several RNA viruses, which are either new to medical science or have increased in prevalence to the extent that they are now a major concern for public health. Agents that fall into this category include HIV, hepatitis C virus and, most recently, severe acute respiratory syndrome coronavirus (SARS-CoV). Given the continuing threat that is posed by viral diseases, it is essential that we determine the factors underlying viral emergence. Hosts acquire RNA viruses by two different mechanisms. First, owing to host–virus co-speciation, host populations might have carried a specific RNA virus for their entire evolutionary history. Although co-speciation has been proposed in some RNA viruses[90], the process seems to be rare. This is most likely a result of the short infectious periods of many RNA viruses, which give them limited opportunity for the sustained transmission that is probably needed for co-speciation.
By contrast, many DNA viruses establish persistent infections and are therefore expected to be able to follow long-term patterns of host speciation[91]. A more common route by which RNA viruses enter new host species is lateral transfer from different reservoir species. Both ecology and genetics seem to be important in determining whether a virus is able to successfully cross species boundaries. In many cases, ecological factors are the most important. Although such factors are diverse, and have been reviewed in detail elsewhere[92,93], they usually reflect changes in either the proximity or density of the host and/or reservoir species, which increase the likelihood that humans are exposed to new pathogens and that sustained transmission networks will be established. Far less is known about the possible genetic factors that might affect the ability of viruses to cross species boundaries. Although RNA viruses are the group of pathogens that seem most able to cross species boundaries[94], perhaps because high mutation rates provide them with an increased capacity to adapt to new hosts[95], not all RNA viruses are equally equipped in this respect. For example, many RNA viruses (such as rabies virus in humans) establish 'dead-end' infections in specific hosts, without subsequent transmission, which reflects imperfect adaptation. This indicates that there are constraints that inhibit viral adaptation to new hosts, perhaps owing to the fitness trade-offs that seem commonplace in viruses that need to infect different hosts or cell types[25,96]. Therefore, infecting different hosts is likely to represent a major adaptive challenge for RNA viruses, despite their high mutation rates.
Vector-borne animal viruses are one example: they are less subject to adaptive evolution than their non-vector-borne counterparts, presumably owing to the difficulties associated with replicating in hosts as divergent as invertebrates and vertebrates[97]. If such constraints operate over longer periods of time, they will lead to a phylogenetic rule of cross-species transmission, such that the greater the evolutionary distance between hosts, the lower the probability of viral transfer among them[98]. A fundamental aspect of the mechanistic basis of viral emergence is the relationship between virus and host cell receptors[99]. Unless a virus is able to recognize the cellular receptors of a potential host species, successful cross-species transmission will not be possible. Therefore, jumping species boundaries might only be a problem for an RNA virus if it has to adapt to different cellular receptors, although this still does not guarantee that sustained transmission will be established in the new host. An informative example is provided by influenza A virus. Birds are the main species reservoir, and avian influenza A viruses are usually unable to jump directly into humans because they lack the necessary mutations in the haemagglutinin (HA) gene to infect human cells[100]. Even when avian influenza A viruses do infect humans, human-to-human transmission might not be established. More generally, the relationship between the virus and the host cell receptor predicts an association between the number of cell types a virus infects and its host range, thereby explaining whether a virus is a host 'specialist' or 'generalist'. Determining whether such a relationship exists should be a key goal in understanding the genetic basis of viral emergence. The complex interplay between ecology and genetics in viral emergence can be seen in HIV. An important ecological factor in HIV emergence involves the bushmeat trade in west Africa.
Not only has a wide range of related simian immunodeficiency viruses (SIVs) been isolated from animal carcasses[101], but the bushmeat trade has increased owing to encroachment by humans on the ranges occupied by non-human primates. The SIV that is found in chimpanzees (SIVcpz) is the closest relative, and therefore the most likely ancestor, of human HIV-1 (Ref. 102), whereas SIVsm from sooty mangabeys seems to be the reservoir population for HIV-2 (Ref. 103). For both HIV-1 and HIV-2, there have been multiple transfers of virus from their reservoirs into humans, with these viruses most likely establishing themselves in humans during the last century[67,77]. Also important was the movement of individuals infected early in the epidemic from small, isolated rural populations to cities in Africa[103], which enabled incipient epidemic strains to reach a large number of susceptible hosts. Yet genetics is also likely to have been important in the emergence of HIV. In particular, phylogenetic studies indicate that SIVs are most easily transmissible among related primate hosts[104], implying that not all possible cross-species transmissions actually occur, and that adaptive constraints might exist. A more recent and highly publicized example of viral emergence is provided by SARS-CoV, the agent of a severe form of pneumonia that has killed more than 700 people worldwide since its appearance in China in November 2002. It is unclear whether the epidemic of 2002–2003 was the first appearance of SARS, or whether the virus had sporadically entered human populations previously, but without detrimental consequences. The animal reservoir for SARS-CoV is also a subject for debate. Phylogenetic analysis reveals that SARS-CoV is equidistant between coronavirus groups 1 and 2, which are usually isolated from mammalian species, and coronavirus group 3, which is currently confined to birds[65,105,106] (Fig. 2).
Moreover, the sequence divergence between these three groups and SARS-CoV is so large that SARS-CoV has clearly experienced a long period of independent evolution. Studies of animals sold at Chinese markets have detected antibodies against SARS-CoV in a number of mammalian species[107]. Most notably, viruses obtained from the Himalayan palm civet (a member of the Viverridae) are closely related to human strains of SARS-CoV. Whether this species represents the main reservoir for SARS-CoV is still unclear. Finally, there have also been suggestions that SARS-CoV is a recombinant of mammalian and avian coronaviruses and that this genetic event might have triggered viral emergence[108]. However, because the sequences involved are so divergent, the phylogenetic incongruence in trees of SARS-CoV seems more likely to be due to variation in the molecular clock than to inter-coronavirus recombination.

Figure 2

The phylogenetic relationships of SARS coronavirus (SARS-CoV) inferred using sequences of the spike glycoprotein.

a | Phylogenetic relationship of SARS-CoV to the known coronaviruses. Owing to the highly divergent nature of these viruses, the analysis was conducted using an alignment of 12 amino acid sequences that are 1,270 residues in length. The tree was inferred using the maximum likelihood (ML) method available in TREE-PUZZLE[135]. Numbers next to some branches represent quartet puzzling support values, which give an indication of the reliability of that branch. SARS-CoV appears as a distinct lineage. b | Magnified phylogeny of representative SARS-CoV strains isolated from humans and the Himalayan palm civet (Paguma larvata), a putative reservoir species. The tree was constructed using the same region as in part a but using nucleotide sequences (16 sequences, 3,765 bp). The tree was inferred using the ML method available in PAUP*[136]. Maximum-likelihood bootstrap values are shown for the main branches. Both trees are mid-point rooted and all horizontal branches are drawn to a scale of the number of substitutions per site (note the difference in scale between the two trees). All parameter settings used in the phylogenetic analysis are available from the authors on request. 
The following sequences were analysed (abbreviated viral names and GenBank accession numbers are given in parentheses): Group 1 coronaviruses: feline infectious peritonitis virus (FIPV; CAA29535); Group 2 coronaviruses: bovine coronavirus (BCoV; AF220295), human coronavirus OC43 (HCoV-OC43; S44241), murine hepatitis virus (MHV; AF029248, AF201929, AF208066, CAA28484), rat sialodacryoadenitis coronavirus (SDAV; AAF97738), porcine haemagglutinating encephalomyelitis virus (PHEV; AF481863); Group 3 coronaviruses: infectious bronchitis virus (IBV; AJ311317); SARS coronaviruses: Himalayan palm civet SARS-CoV, strains SZ1 (AY304489), SZ3 (AY304486), SZ13 (AY304487) and SZ16 (AY304488), and human SARS-CoV, strains Sin2677 (AY283795), BJ01 (AY278488), CUHK-AG01 (AY345986), GD01 (AY278489), GZ02 (AY390556), GZ50 (AY304495), HSZ-Bc (AY394994), PUMC02 (AY357075), Taiwan TC1 (AY338174), TW7 (AY502930), Urbani (AY278741) and ZS-C (AY395003).

The 'holy grail' for studies of emerging diseases is to predict which infectious agents are likely to infect human populations in the future. Although we are a long way from making accurate predictions, evolutionary genetics does allow some basic rules to be established, and phylogenetic methods have been used successfully to predict the future population survival of strains of influenza A virus[109]. First, the larger the population size of the reservoir species, the more viruses it can harbour, including those with shorter durations of infection and increased virulence[110]. Consequently, animal species that live at high densities, such as some bats, rodents and birds, are most likely to be reservoirs, particularly those animal populations that already live in close proximity to humans. Less intuitively, if there is a relationship between the breadth of cell tropism and the number of species infected, most attention should be given to those viruses that infect several cell types.
More importantly, a comprehensive survey of RNA virus diversity should be undertaken in appropriate animal species. This could be done using degenerate PCR primers designed for several RNA virus families, followed by studies to determine whether the viruses detected will grow in human cells. Similar approaches have already uncovered a plethora of new virus families in marine environments[111].

RNA virus evolution in the long term

One aim of studies of RNA virus evolution is to use our understanding of evolutionary processes in the short term, much of which has been acquired from experiments, to predict what evolution will do in the long term. Although evolutionary biologists are rightly nervous about predicting future change, the rapid pace of RNA virus evolution means that such predictions can be tested quickly. Of most immediate interest are patterns of drug resistance and viral virulence. Understanding drug resistance is one area in which population biology has a direct impact on public health[112,113]. For RNA viruses, most interest has focused on the potential of drugs to control HIV infection. Despite the optimism that initially surrounded the deployment of highly active antiretroviral therapy (HAART), which involves combinations of drugs[114], antiviral therapy is unlikely to provide a cure for HIV. There are several reasons for this, not least that viral replication continues, although at greatly reduced levels, even in some patients receiving HAART in whom no virus can be detected[115]. Early studies predicting how long it would take for resistance to arise under multiple drug therapy also underestimated the importance of recombination in HIV, which we now know is extensive[116]. Frequent recombination could allow drug resistance to be acquired more rapidly than through mutation alone.
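The argument about combination therapy and recombination can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes, for illustration only, that each replicating genome independently acquires each of k specified resistance mutations with probability mu per site, so the number of genomes carrying all k is approximately Poisson with mean N·mu^k. The population size and mutation rate used are hypothetical round numbers, not estimates from the text, and the model ignores selection, fitness costs and accumulation over successive generations.

```python
import math


def prob_resistant_genome(pop_size: float, mu: float, k: int) -> float:
    """P(at least one genome in one replication round carries all k specified mutations).

    Simplifying assumption: mutations arise independently, so the count of such
    genomes is Poisson with mean pop_size * mu**k (no selection, no recombination).
    """
    expected = pop_size * mu ** k
    return -math.expm1(-expected)  # 1 - exp(-expected), accurate for small means


# Hypothetical values chosen only to show how the probability scales with k.
N_GENOMES = 1e8
MU_PER_SITE = 3e-5
for k in (1, 2, 3):
    print(k, prob_resistant_genome(N_GENOMES, MU_PER_SITE, k))
```

Each additional required mutation multiplies the expected number of fully resistant genomes by mu, which is why multi-drug regimens were expected to delay resistance. Recombination breaks this scaling: if single mutants already coexist in the population, recombination can bring their mutations together in one genome without both having to arise in the same replication event.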
Many factors influence the evolution of drug resistance, and important results have been obtained, for example regarding the probability of resistance mutations arising before and during treatment, and the optimal time to start drug treatment[117]. One important question, which also relates to the mechanics of viral evolution in general, is whether drug-resistance mutations carry a fitness cost relative to wild-type alleles in the absence of the drug. If they do, we would expect resistance mutations not to reach high frequencies in populations, despite their benefit to the virus in treated hosts. There is evidence that, in the absence of the drug, HIV strains harbouring drug-resistance mutations are less fit than wild-type HIV strains[118]. Unfortunately, in other cases, drug-resistant HIV mutants seem to have greater infectivity, and even replication capacity, than wild-type viruses[119]. Not surprisingly, these mutations are increasingly sampled from drug-naive patients[120]. Even if drug-resistance mutations were universally advantageous, their long-term success would depend on more than their individual fitness. An important mediating factor is the strength of genetic drift at the population level. If drift is strong, which will be the case if effective population sizes are small, the frequencies that mutations eventually attain in populations have a large stochastic component. For RNA viruses, effective population size reflects the mode of transmission, as a comparison of HIV with influenza A virus shows[3]. Influenza A virus is transmitted by the respiratory route, and its host population is large and extensively mixed. Consequently, advantageous mutations, most notably those that confer antigenic escape, become fixed in the virus population regularly[121].
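The role of drift can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation. The Python sketch below assumes a textbook idealization, not a model of any particular virus: a haploid Wright-Fisher population of effective size N_e, a single new mutant at initial frequency 1/N_e, and a constant selection coefficient s, giving u = (1 - e^(-2s)) / (1 - e^(-2 N_e s)).

```python
import math


def fixation_probability(n_e: int, s: float) -> float:
    """Kimura's diffusion approximation for a single new mutant.

    Assumes a haploid Wright-Fisher population of effective size n_e, an initial
    mutant frequency of 1/n_e and a constant selection coefficient s:
        u = (1 - exp(-2s)) / (1 - exp(-2 * n_e * s))
    """
    if s == 0:
        return 1.0 / n_e  # neutral case: fixation probability equals initial frequency
    a = -2.0 * s
    b = -2.0 * n_e * s
    if b < 0:
        # Advantageous mutation: both exponents are negative, so expm1 cannot overflow.
        return math.expm1(a) / math.expm1(b)
    # Deleterious mutation: divide through by exp(b) to avoid overflow for large n_e.
    return math.expm1(a) * math.exp(-b) / -math.expm1(-b)


for n_e in (10, 10_000):
    neutral = 1.0 / n_e
    advantageous = fixation_probability(n_e, 0.01)
    print(f"N_e={n_e}: u={advantageous:.4g}, {advantageous / neutral:.3g} x neutral")
```

With s = 0.01, the advantageous mutant is only about 1.1 times as likely as a neutral one to become fixed when N_e = 10, but roughly 200 times as likely when N_e = 10,000, which is the sense in which small effective population sizes make the fate of beneficial mutations largely a matter of chance.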
By contrast, population-level selection seems to be considerably weaker in HIV, which is sexually transmitted, although some evidence for long-term cytotoxic T lymphocyte (CTL)-mediated selection has been found[122]. This contrasts with intra-host HIV evolution, in which immune-driven natural selection is the dominant evolutionary process[72,123,124]. The reduced impact of selection at the population level is most likely caused by extensive variation in rates of partner exchange, which reduces the effective population size[125], and by a strong bottleneck at transmission[126]. Therefore, for HIV, intra-host and inter-host evolution seem to be largely decoupled. Although fewer studies have compared intra- and inter-host evolution in acute RNA virus infections, a recent analysis of dengue virus indicated that most amino acid changes that arise within hosts are deleterious in the long term[74]. The evolution of virulence has long been of interest to population biologists. A common view is that if virulence is a selected trait at all, it is often involved in a trade-off with transmissibility, and natural selection favours the balance of these factors that maximizes the BASIC REPRODUCTIVE RATE (R0) of the pathogen[127], although this view has recently been questioned[128]. For RNA viruses it is therefore important to determine whether virulence is optimized and, if so, how it is linked to transmissibility. Complications arise because virulence is also likely to vary according to the mode of transmission[129] and according to whether there is a long period of intra-host evolution, including superinfection by other strains, which increases competition among strains within the host and therefore virulence[130]. In short, predictions about the long-term evolution of virulence in RNA viruses need to be made case by case. However, some aspects of the evolution of virulence parallel those of drug resistance.
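The trade-off hypothesis can be illustrated with a standard compartmental expression for R0. In the Python sketch below, virulence alpha is the disease-induced death rate, infected hosts are lost at rate alpha + mu + gamma (virulence, background mortality and recovery), and transmission is assumed to rise with virulence with diminishing returns, beta(alpha) = c * sqrt(alpha). That functional form and all parameter values are assumptions chosen for illustration; for this particular form, calculus gives the optimum alpha* = mu + gamma, which the grid search should recover.

```python
import math


def r0(alpha: float, c: float = 1.0, mu: float = 0.02, gamma: float = 0.1) -> float:
    """Basic reproductive rate under an assumed trade-off beta(alpha) = c * sqrt(alpha).

    alpha: virulence (disease-induced death rate); mu: background death rate;
    gamma: recovery rate. Infected hosts are lost at rate alpha + mu + gamma.
    """
    beta = c * math.sqrt(alpha)
    return beta / (alpha + mu + gamma)


def optimal_virulence(c: float = 1.0, mu: float = 0.02, gamma: float = 0.1,
                      alpha_max: float = 1.0, steps: int = 20_000) -> tuple:
    """Grid search for the virulence in (0, alpha_max] that maximizes R0."""
    best_alpha, best_r0 = 0.0, 0.0
    for i in range(1, steps + 1):
        alpha = alpha_max * i / steps
        value = r0(alpha, c, mu, gamma)
        if value > best_r0:
            best_alpha, best_r0 = alpha, value
    return best_alpha, best_r0


alpha_star, r0_star = optimal_virulence()
# For this assumed trade-off the analytical optimum is alpha* = mu + gamma (0.12 here).
print(alpha_star, r0_star)
```

The intermediate optimum depends entirely on the assumed transmission function: with a linear beta(alpha) = c * alpha, R0 increases monotonically with virulence and there is no interior optimum, which is one reason predictions about virulence need to be made case by case.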
For example, if particular mutations confer virulence, whether they become fixed in populations also depends on the strength of genetic drift, even if they are advantageous. Consequently, in small populations, where chance plays a larger role, the optimal level of virulence might never be reached. Similarly, if the evolutionary process differs greatly within and among hosts, a level of virulence that is selectively favoured within hosts might be disadvantageous among hosts. The intra-host evolution of HIV tends to produce high-virulence viral strains that preferentially use the CXCR4 chemokine receptor, infect cells faster and cause AIDS to develop more rapidly[131]. However, these strains seem to be transmitted less often, indicating that they are selectively disadvantageous in new hosts[132]. Understanding the interplay between virulence and transmissibility is clearly central to understanding the evolution of virulence in RNA viruses.

Conclusions

Establishing the rules of RNA virus evolution is important: not only will this provide information that is essential for understanding the basic mechanisms of evolutionary change, but it will also assist in the design of strategies for the control, treatment and eradication of RNA viruses, and perhaps in predicting their emergence. Although RNA viruses are clearly unique in the rapidity with which they mutate, their evolution cannot be described fully without considering all the processes of evolutionary change. A particular challenge for the future is to determine whether viral evolution in nature is similar to that established in vitro. The beauty of RNA viruses is that the link between experimental and natural systems can be made simply; few other organisms are as well suited to studying evolutionary processes.

At the start of an experiment (see figure panel a), n flasks containing a cell monolayer are infected with the original mutant population, which is resistant to the wild-type (WT) inhibitor.
A sample is then transferred serially from flask to flask until the end of the experiment, when the evolved mutants are collected. The final samples can be analysed in two different ways.

Fitness assays

This technique (shown in panel b of the figure) uses a mixture of the evolved mutant and the ancestral WT to infect cell monolayers in a flask and in two additional dishes (with and without the WT inhibitor, respectively). On the following day, a sample from the first flask is used to infect cell monolayers in a new flask and two new dishes (under the same conditions as before). The sample from the second flask is then used to infect two new dishes, with and without the WT inhibitor. Plaques are counted as a measure of virus particle numbers. The relative fitness of the evolved mutant is obtained from the slope of the logarithm of the mutant/WT ratio plotted against time.

Growth curves

This procedure (shown in panel c of the figure) involves infecting cell monolayers in separate flasks with the evolved mutants, taking several samples from each flask at different times, and using the ancestral WT clone as a control. Plaques are counted in the plate assay to measure viral growth. The growth rate of each evolved mutant population is given by the slope of the logarithm of the viral titre plotted against time (t).

Establishing the source of a viral outbreak is one of the many applications of molecular epidemiology, and this work occasionally has implications beyond epidemiology. For instance, Ou et al.[68] were able to identify an HIV-infected dentist as the inadvertent source of the virus in some of his patients, while ruling him out as the source of the viruses infecting others. More recently, a similar analysis identified a medical doctor as the source of HIV infection in a former lover[133]. This was the first case in which a molecular epidemiological analysis was accepted as evidence in a criminal court in the United States.
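Returning to the experimental-evolution box, both of its calculations reduce to fitting a straight line to log-transformed plaque counts. The Python sketch below makes two assumptions that the box implies but does not spell out: that plaques on the dish with the WT inhibitor are produced only by the resistant mutant, so WT plaques equal the count without inhibitor minus the count with it; and that relative fitness is reported as the exponential of the regression slope per transfer, which is one common convention (the slope itself is also often reported). The function names are hypothetical.

```python
import math
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def relative_fitness(transfers: Sequence[float],
                     plaques_with_inhibitor: Sequence[float],
                     plaques_without_inhibitor: Sequence[float]) -> float:
    """Fitness of the evolved mutant relative to the ancestral WT.

    Assumes plaques with the inhibitor are mutant-only and plaques without it
    are mutant plus WT, so WT = without - with.
    """
    log_ratio = []
    for mutant, total in zip(plaques_with_inhibitor, plaques_without_inhibitor):
        wt = total - mutant
        if mutant <= 0 or wt <= 0:
            raise ValueError("both genotypes must be detectable at every time point")
        log_ratio.append(math.log(mutant / wt))
    return math.exp(ols_slope(transfers, log_ratio))


def growth_rate(times: Sequence[float], titres: Sequence[float]) -> float:
    """Exponential growth rate: slope of ln(titre) against time."""
    return ols_slope(times, [math.log(titre) for titre in titres])


# Hypothetical counts in which the mutant/WT ratio doubles at each transfer.
print(relative_fitness([0, 1, 2, 3], [50, 100, 200, 400], [150, 200, 300, 500]))
```

With these made-up counts the estimate is close to 2, meaning the mutant doubles its representation relative to the WT at each transfer; a value of 1 would indicate no fitness difference.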
Because RNA viruses evolve so rapidly, they can be useful in forensic medicine, provided that a rigorous statistical framework is used. This requires translating the proposals of both the defendant and the prosecutor into testable phylogenetic hypotheses. González-Candelas et al.[134] used this approach to evaluate, for each of a number of hepatitis C virus-infected patients, the likelihood that they had been infected in a hospital from a common source. In this case, the prosecutor's hypothesis was that each patient had been infected at the hospital, whereas the defendant's hypothesis was that each patient had been infected separately from the general population. These two proposals translate into different phylogenetic tree topologies, the likelihoods of which can be compared directly. The prosecutor's proposal corresponds to a monophyletic group comprising the sequences from the infected patients that is significantly distinct from the background population, whereas the defendant's proposal corresponds to the sequences from each infected patient grouping with those of the background population. This molecular analysis showed that all the individuals were from a single outbreak, and was the first report of the use of molecular phylogenies to determine the likelihood of a patient sharing the source of infection with other infected patients.
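The text states only that the likelihoods of the prosecution and defence topologies were compared; it does not name the statistical test. One standard option when two topologies are specified in advance, as here, is the Kishino-Hasegawa (KH) test, sketched below in Python. It assumes that per-site log-likelihoods under each constrained tree have already been computed by a phylogenetics program; the function name is hypothetical, the p-value uses a normal approximation, and this is not necessarily the procedure used by González-Candelas et al.

```python
import math
from typing import Sequence, Tuple


def kh_test(site_lnl_a: Sequence[float],
            site_lnl_b: Sequence[float]) -> Tuple[float, float, float]:
    """Kishino-Hasegawa comparison of two topologies specified a priori.

    Assumes per-site log-likelihoods for the same alignment columns under each tree.
    Returns (delta, z, p): delta = lnL(A) - lnL(B) summed over sites, z its
    standardized value, and a two-sided p-value from the normal approximation.
    """
    if len(site_lnl_a) != len(site_lnl_b) or len(site_lnl_a) < 2:
        raise ValueError("need per-site log-likelihoods for the same >= 2 sites")
    diffs = [a - b for a, b in zip(site_lnl_a, site_lnl_b)]
    n = len(diffs)
    delta = sum(diffs)
    mean = delta / n
    # Variance of the summed difference, estimated from the site-wise differences.
    var_delta = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var_delta == 0:
        raise ValueError("site-wise differences have zero variance")
    z = delta / math.sqrt(var_delta)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return delta, z, p


# Hypothetical per-site values for a four-column alignment.
print(kh_test([-1.0, -2.0, -1.5, -1.2], [-1.1, -2.3, -1.4, -1.6]))
```

The KH test is known to be biased when one of the compared trees is the maximum-likelihood tree chosen from the same data; in that situation tests such as the Shimodaira-Hasegawa test are usually preferred. Hypotheses fixed in advance, like the prosecution and defence scenarios, are the setting the KH test was designed for.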

References (123 in total)

1. Distribution of spontaneous mutants and inferences about the replication mode of the RNA bacteriophage phi6.

Authors: Lin Chao; Camilla U Rang; Linda E Wong
Journal: J Virol Date: 2002-04

2. Recombination in evolutionary genomics. [Review]

Authors: David Posada; Keith A Crandall; Edward C Holmes
Journal: Annu Rev Genet Date: 2002-06-11

3. Selection forces and constraints on retroviral sequence variation.

Authors: J Overbaugh; C R Bangham
Journal: Science Date: 2001-05-11

4. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene.

Authors: R Nielsen; Z Yang
Journal: Genetics Date: 1998-03

5. Frequency-dependent selection in a mammalian RNA virus.

Authors: Santiago F Elena; Rosario Miralles; Andrés Moya
Journal: Evolution Date: 1997-06

6. Molecular epidemiology of HIV transmission in a dental practice.

Authors: C Y Ou; C A Ciesielski; G Myers; C I Bandea; C C Luo; B T Korber; J I Mullins; G Schochetman; R L Berkelman; A N Economou
Journal: Science Date: 1992-05-22

7. Evolution of human influenza A viruses over 50 years: rapid, uniform rate of change in NS gene.

Authors: D A Buonagurio; S Nakada; J D Parvin; M Krystal; P Palese; W M Fitch
Journal: Science Date: 1986-05-23

8. Evolutionary relationship of DNA sequences in finite populations.

Authors: F Tajima
Journal: Genetics Date: 1983-10

9. Fitness of RNA virus decreased by Muller's ratchet.

Authors: L Chao
Journal: Nature Date: 1990-11-29

10. Analysis of influenza A virus nucleoproteins for the assessment of molecular genetic mechanisms leading to new phylogenetic virus lineages. [Review]

Authors: C Scholtissek; S Ludwig; W M Fitch
Journal: Arch Virol Date: 1993

Two alternative epidemiological scenarios translate into different phylogenetic tree topologies, the statistical support for which can be compared directly.

The phylogenetic relationships of SARS coronavirus (SARS-CoV) inferred using sequences of the spike glycoprotein.
