Evolutionary analysis of the dynamics of viral infectious disease.

Abstract

Many organisms that cause infectious diseases, particularly RNA viruses, mutate so rapidly that their evolutionary and ecological behaviours are inextricably linked. Consequently, aspects of the transmission and epidemiology of these pathogens are imprinted on the genetic diversity of their genomes. Large-scale empirical analyses of the evolutionary dynamics of important pathogens are now feasible owing to the increasing availability of pathogen sequence data and the development of new computational and statistical methods of analysis. In this Review, we outline the questions that can be answered using viral evolutionary analysis across a wide range of biological scales.

Year: 2009 PMID: 19564871 PMCID: PMC7097015 DOI: 10.1038/nrg2583

Source DB: PubMed Journal: Nat Rev Genet ISSN: 1471-0056 Impact factor: 53.242

Main

Rapidly evolving pathogens are unique in that their ecological and evolutionary dynamics occur on the same timescale and can therefore potentially interact. For example, the exceptionally high nucleotide mutation rate of a typical RNA virus[1] — a million times greater than that of vertebrates — allows these viruses to generate mutations and adaptations de novo during environmental change, whereas other organisms must rely on pre-existing variation maintained by population structure or balancing selection. In addition, many viruses frequently recombine, further increasing the opportunity for genetic novelty. Consequently, populations of fast-evolving pathogens can accumulate detectable genetic differences in just a few days and can adapt brutally swiftly, even when the adapted genotype would have been strongly deleterious in a previous environment. The interaction between evolution and epidemiology is reciprocal: the maintenance of onward transmission may be crucially dependent on continuous viral adaptation, just as the fate of a viral mutant may be decided by its hosts' position in a transmission network. The term phylodynamics has been coined[2] to describe infectious disease behaviour that arises from a combination of evolutionary and ecological processes, and we adopt the term in this Review as a convenient shorthand for the existence and investigation of such behaviour. We focus on studies that infer viral transmission dynamics from genetic data; these are typically based on concepts from phylogenetics and population genetics, but they also link pathogen evolution to the dynamics of infection and transmission. In the last decade, such studies have matured from theoretical and qualitative investigations (for example, Refs 3,4) to global genomic investigations of key human pathogens (for example, Refs 5, 6, 7). 
Understandably, most studies have focused on important human RNA viruses such as influenza virus, HIV, dengue virus and hepatitis C virus (HCV); therefore, this Review concentrates on these infections. However, the range of pathogens and hosts to which phylodynamic methods are applied is expanding, and we also discuss infectious diseases of wildlife, crops and livestock. The field of viral evolutionary analysis has greatly benefited from three developments: the increasing availability and quality of viral genome sequences; the growth in computer processing power; and the development of sophisticated statistical methods. Although the explosion in viral genomic data is outpacing our ability to develop methods that fully exploit the potential of these data, we provide an overview of the key biological questions that can be tackled using current evolutionary analysis methods (Box 1). For example, when did a newly emergent epidemic begin, and from which population or reservoir species did it originate? Can genetic data resolve the order and timing of transmission events during an outbreak? How swiftly do pathogen strains move between continents, regions and epidemiological risk groups, or even between different tissues in a single infected host? Perhaps the most recognizable achievements of viral evolutionary analysis to date are the reconstruction of the origin and worldwide dissemination of HIV-1 (Refs 8, 9, 10, 11, 12, 13, 14, 15, 16), and the explanation of influenza A epidemics through the combined effects of natural selection and global migration[5,6,17,18,19,20,21,22,23,24]. We describe the range of empirical questions that phylodynamic studies can address by outlining the findings of important studies, most of which have been published in the last few years. 
Our Review also highlights the variety of practical contexts in which such questions arise, including epidemic management and control, understanding variation in clinical disease, the design of effective vaccines, and criminal trials in which negligent transmission has been alleged. To emphasize the general applicability of the phylodynamic approach, we consider the various organizational scales at which analyses are undertaken, from the global evolutionary behaviour of pathogens to evolution in a single infected host. It is clear that, even for the same pathogen, evolutionary and ecological processes combine in different ways at different scales[2] (Box 2). For example, influenza A virus displays strong genetic evidence of antigenic selection when studied over many years, but seems to be dominated by stochastic processes when only a single epidemic in one location is considered[22]. We also discuss aspects of data collection, pathogen biology and analysis methodology that may promote or hinder the generation of reliable conclusions.

Methods to analyse viral evolutionary dynamics

Investigating the joint evolutionary and ecological dynamics of infectious disease requires a common frame of reference within which models and data from different fields can be integrated. As we illustrate, this is often achieved by reconstructing evolutionary change on a natural timescale of months or years, enabling researchers to date epidemiologically important events such as zoonotic transmissions. A real timescale also allows pathogen evolution to be directly compared with known surveillance or time series data, perhaps revealing the time period during which a pathogen existed in a population before its discovery, or indicating the impact of public health interventions on viral genetic diversity. Phylodynamic analyses commonly use molecular clock models to represent the relationship between genetic distance and time (Box 1).
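
To make the relationship between genetic distance and time concrete, the following is a minimal sketch of the simplest (strict) molecular clock, fitted by least-squares regression of root-to-tip distance against sampling date. It is an illustration under stated assumptions, not the method used in any study cited here: the function name, the input format and the example numbers are all hypothetical, and the approach assumes a single constant rate and a correctly rooted tree.

```python
def root_to_tip_clock(sampling_years, root_to_tip_distances):
    """Fit distance = rate * (year - t_root) by ordinary least squares.

    sampling_years: sampling date of each sequence (decimal years).
    root_to_tip_distances: genetic distance (substitutions per site)
        from the tree root to each sequence, in the same order.
    Returns (rate, t_root): substitutions per site per year, and the
    date at which the fitted line crosses zero distance, which serves
    as a rough estimate of the age of the root.
    """
    n = len(sampling_years)
    if n < 2 or n != len(root_to_tip_distances):
        raise ValueError("need two or more matched dates and distances")

    mean_t = sum(sampling_years) / n
    mean_d = sum(root_to_tip_distances) / n
    sxx = sum((t - mean_t) ** 2 for t in sampling_years)
    sxy = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(sampling_years, root_to_tip_distances))
    if sxx == 0:
        raise ValueError("all sequences share one sampling date")

    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in these data")
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


# Hypothetical, perfectly clock-like data for illustration only.
years = [1990.0, 1995.0, 2000.0, 2005.0]
distances = [0.020, 0.030, 0.040, 0.050]
rate, t_root = root_to_tip_clock(years, distances)
print(rate, t_root)
```

Because tips on a tree share ancestry, the points in such a regression are not independent, so this kind of fit is better suited to checking for temporal signal than to formal inference. Models that allow the rate to vary between lineages or through time need a full statistical framework.
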
Early simplistic models that assume a constant rate of virus evolution have been superseded by those that explicitly incorporate rate variation, either between strains or through time (for example, Ref. 25). A second, and increasingly popular, common frame of reference is provided by the geographic or spatial distribution of disease isolates (Box 1). Combined spatial and genetic analyses not only reveal the location of origin of emerging infections, but can also discern the route of transmission and the rate of geographic spread. In addition, statistical models based on coalescent theory are used to directly link patterns of genetic diversity to ecological processes, such as changing population size and population structure (Box 1). Using these models, it becomes possible to infer the characteristics of pathogen populations, such as their rate of growth, from a small sample of genomes. The resolution and scope of phylodynamic methods depend on the rate of pathogen evolution relative to that of ecological or spatial change: epidemics that fluctuate faster than mutations accumulate among pathogens will not leave an imprint in genetic diversity, although longer-term dynamic trends will.

Dynamics on a global scale

The broadest perspective on the evolutionary dynamics of a pathogen is obtained by sampling its worldwide genetic diversity over a suitable period of time. Not all viruses are geographically widespread (some might be limited by the range and dispersal of their hosts), but for those that are, it is essential to understand the geographic structure of viral genetic diversity. For example, HCV shows genotype-specific responses to antiviral drugs, and the clinical severity of dengue virus infection may depend on previous exposure to genetically distinct strains.
Genetic data also reveal the rate and route of global spread, which have been most effectively studied for highly infectious airborne viruses such as severe acute respiratory syndrome (SARS) coronavirus and influenza viruses. Humans are an atypical host species as urban population densities and international transport provide opportunities for pathogen transmission that would be otherwise absent. The role of contemporary human migration in determining global viral dynamics has been most comprehensively studied for the influenza A virus by the systematic collection, sequencing and analysis of thousands of viral isolates. Historically, influenza has caused intense bursts of human mortality, most notably associated with the reassortment of human and non-human influenza viruses, which creates strains for which humans have no acquired immunity. Evolutionary analysis of the antigenic haemagglutinin gene (HA) of the dominant H3N2 strain has shown that the influenza A virus evolves rapidly through time, yet viruses sampled concurrently from different continents exhibit limited diversity and are typically descended from a common ancestor only a few years earlier[5,22]. Recent evolutionary studies have revealed that the virus re-emerges each year from a persistent Southeast Asian 'source' and follows global aviation networks to temperate 'sink' regions, seeding new winter epidemics there that die out over summer[5,6] (Fig. 1). The global restriction on the diversity of influenza A virus is caused by selective sweeps driven by the host's acquired immunity, which generates rapid antigenic evolution[24] and corresponding high rates of amino acid change at HA antigenic sites[19]. Evolution of influenza A virus is even more dynamically complex when the whole genome is considered — reassortment between genome segments modulates the action of selection, so that some selective sweeps are genome-wide, whereas others only restrict the diversity of HA[5].

Figure 1

The global dynamics of influenza A virus.

Human influenza A virus exhibits a complex pattern of global seasonal dynamics, with epidemics in temperate areas occurring during the winter and year-round sporadic outbreaks in the tropics. Recent analyses indicate that these dynamics are best described by a source–sink model of viral population structure, with a persistent reservoir in South-East Asia driving viral diversity worldwide[5,6]. a | Complete genome sequences sampled from New York State, USA, and from Australia and New Zealand have provided a high-resolution snapshot of diversity in these locales over successive seasons[5,22]. Continuous transmission of influenza in the reservoir populations allows natural selection for antigenic diversity, whereas the sink populations with seasonal dynamics will tend to be a representative sample of this diversity. b–d | Different patterns of global gene flow will be reflected in the phylogenies of influenza isolates sampled from sequential epidemics in one location. b | The entire diversity of the second season is descended from a single lineage originating from the global reservoir (lineages representing this global reservoir are in green). c | As part b, but with multiple lineages from the global reservoir seeding each season. d | As part b, but with a few lineages persisting locally (red) from one season to the next. e | The entire second season is descended from local lineages, implying that transmission persists from season to season in this location. Part a is modified, with permission, from Ref. 5 © (2008) Macmillan Publishers Ltd, all rights reserved.

Influenza A dynamics are clearly the result of intricate and ongoing interactions between evolutionary and ecological processes. However, not all pathogens with a worldwide distribution show such complex behaviour at this scale. Although the HIV-1 pandemic is truly international, it is the result of simpler ecological processes that are less strongly coupled to viral evolution.
Evolutionary analysis has proven successful in reconstructing the global epidemic history of HIV-1. Viral sequences sampled at various times since the discovery of HIV in 1983 have been used to date the origin of the pandemic to the first half of the twentieth century[10,15] and to pinpoint west-central Africa as its geographic source[14]. These results have been validated and refined by the recovery of genomic fragments from older isolates, notably two 50-year old preserved tissue samples from Kinshasa, Democratic Republic of Congo[15,16], which indicate that considerable HIV diversity had accrued there by 1960. The worldwide dissemination of HIV-1 from its central African source over several decades was propelled by multiple 'founder events', whereby individual HIV-1 lineages moved to new regions and established epidemics, sometimes recombining in the process, thus generating an array of circulating recombinant forms. The nature and timing of both founder and recombination events have been estimated by evolutionary analysis[8,13,26]. In contrast to influenza A, the absence of protective immunity against HIV means that viral adaptation probably played little part in shaping the current geographical distribution of HIV-1 subtypes, although there is evidence that the virus acquired specific mutations after zoonosis to enable efficient transmission among humans[27] and that HIV-1 is now adapting to the diversity of human leukocyte antigen class I molecules[28,29]. Simple epidemic dynamics also explain the global dissemination of HCV, which has infected humans for at least several centuries[30]. A handful of endemic HCV strains, originally from Asia and Africa, exploded in prevalence worldwide during the twentieth century owing to their chance association with new routes of transmission, such as transfused blood[31]. 
Although few pathogens have been sampled as comprehensively as influenza A virus or HIV-1, new insights are being gained as large data sets are compiled for other viruses. For example, recent studies of echovirus 30, a transmissible human enterovirus that causes periodic outbreaks of meningitis, have revealed a fascinating picture of evolutionary forces that vary among viral genes[32,33]. Echoviral capsid genes diverge continuously and rapidly, show rapid global transmission, but exhibit limited concurrent variation. This is analogous to the immune-driven turnover of influenza A HA lineages, but there is substantially less genetic evidence of positive selection for immunologically novel echoviral variants[32,33]. By contrast, echovirus 30 polymerase gene lineages are geographically structured, diverse, and coexist on a global scale. Frequent recombination between the capsid and polymerase genes generates transient recombinant forms that are estimated to persist for approximately 5 years[33]. This modular nature of echovirus 30 evolution is all the more remarkable given that it takes place in an unsegmented linear genome that is less than 8 kb long. Human metapneumovirus, a recently discovered and common cause of childhood respiratory illness, exhibits complex behaviour that is less fully understood. The virus forms several lineages, each of which contains little genetic diversity — suggesting that genetic bottlenecks are common, but only partial or local in effect[34]. Evolutionary analysis has helped track the global spread of the H5N1 highly pathogenic avian influenza (HPAI). Because the virus has been continuously sampled since its emergence in China in 1996, phylogenies can provide accurate reconstructions of its movements, both internationally[35] and locally[36]. Molecular clock results indicate that HPAI lineages typically reside at a location for several months before their official detection[37]. 
HPAI strains in Asia also undergo frequent reassortment, which may be facilitated by the dense and interconnected duck and poultry populations in the region[36,37]. As more pathogens are studied on a global scale, we should remember that conclusions drawn from small and local samples will underestimate dynamic complexity. Indeed, our understanding of both HIV-1 and influenza virus population dynamics changed appreciably after comprehensive surveys of viral diversity were published[6,14]. If we extrapolate from the examples of echovirus 30 and influenza A virus, then it seems that the most complex global behaviour occurs in highly transmissible viruses that cause acute infections and short-lived epidemics, possibly because their dynamics arise from a three-way interplay between transmission, host herd immunity and viral adaptation. Human viruses that might show such behaviour (when sampled on a sufficiently large scale) include enteroviruses, rhinoviruses, caliciviruses and paramyxoviruses.

Regionally or genetically defined epidemics

A large proportion of evolutionary analyses of pathogens consider individual lineages, strains or subtypes circulating in a specific location, which may be a whole continent or just one town or district. Such outbreaks frequently correspond to a single epidemic, as defined by surveillance organizations, and may involve a single lineage or cluster of infections, as defined by phylogenetic analysis. Evolutionary analysis at this scale can determine the source and time of origin of an epidemic, reveal its genetic composition, and is often used to estimate the rate of viral transmission and spatial spread in the affected region. Studies on a regionally or genetically defined scale often begin by seeking the source of the new strain, which could be either a zoonotic reservoir or an epidemiologically distinct or distant human population.
The origin of an epidemic is typically inferred by finding the most genetically similar non-epidemic strain. This is a simple procedure but is greatly dependent on previous sampling. For example, the SARS coronavirus was highly distinct with no close relatives when initially characterized in April 2003 (Ref. 38). The discovery in October 2003 of related viruses in civet cats from animal markets[39] suggested that SARS originated from a zoonotic source, but further sampling has shown that bats are the primary reservoir of these viruses[40]. Molecular clock analysis of bat coronaviruses indicates that the cross-species transfer to civet cats occurred only 4 years before the onset of the human epidemic[41]. Epidemic origins are also hard to locate if the source is geographically or temporally remote; West Nile virus strains sampled from the Mediterranean in 1998 were quickly identified as the source of the 1999 North American epidemic[42], whereas the discovery of the probable zoonotic source of pandemic HIV-1 — Pan troglodytes troglodytes chimpanzees in south-eastern Cameroon — was the culmination of many years of research[9]. In some instances, genetic analysis can reveal hidden multiple origins for epidemics that initially seemed homogenous. The 1980s HIV epidemic in the UK among men who have sex with men and the 1990s outbreak of HCV in a subset of the same population are both comprised of at least five distinct strains, each with similar epidemiological behaviours[43,44]. Similarly, phylodynamic analysis of whole viral genomes indicates that the 2005 Singapore dengue virus epidemic comprised multiple viral lineages of different geographical origins[45]. The existence of hidden genetic heterogeneity within an epidemic implies that rapid movement of lineages at a higher geographic scale is likely. Viral isolates sampled from regional epidemics can contain valuable information about the spatial dynamics of infection. 
For example, Biek et al.[46] estimated the spread of raccoon rabies across the north-eastern United States from sequences sampled over three decades. Viral movement was initially rapid but slowed considerably after a few years as individual lineages became established in different locales, and ecological data on outbreak size closely matched the estimates obtained using coalescent methods (see next section). A similar process of invasion and establishment was also reported for dengue virus in the Americas[47]. Interestingly, dengue virus diversity was maintained across epidemic cycles by the metapopulation structure built up during the invasion phase (Fig. 2). If both the location and the sampling date of viral sequences are specified, it is possible to estimate the distance pathogens move per year solely from genetic data, as demonstrated by reconstructions of Ebola virus spread in central Africa[48] and feline immunodeficiency virus infection of Rocky Mountain cougars[49].
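
As a rough illustration of how sampling locations can be mapped onto a phylogeny (the dengue example in Fig. 2 uses a simple parsimony approach), the sketch below applies the bottom-up pass of Fitch parsimony to a rooted binary tree. The choice of the Fitch algorithm, the nested-tuple tree encoding, the function name and the toy tree are all assumptions made for illustration; they are not taken from the cited studies.

```python
def fitch_locations(tree):
    """Bottom-up pass of Fitch parsimony for a discrete location trait.

    tree: a rooted binary tree written as nested 2-tuples, where each
        leaf is a string naming the sampling location of that isolate.
    Returns (root_states, changes): the set of equally parsimonious
    locations at the root, and the minimum number of location changes
    needed anywhere on the tree.
    """
    if isinstance(tree, str):
        # A sampled isolate: its location is observed, no change needed.
        return {tree}, 0

    left, right = tree
    left_states, left_changes = fitch_locations(left)
    right_states, right_changes = fitch_locations(right)

    shared = left_states & right_states
    if shared:
        # The children agree on one or more locations: no extra change.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical toy tree with two separate mainland clades.
toy_tree = (
    ("Caribbean", ("Caribbean", "Mainland")),
    ("Caribbean", ("Mainland", "Mainland")),
)
root_states, changes = fitch_locations(toy_tree)
print(root_states, changes)
```

Only the root state set and the minimum change count are returned here; assigning a location to every internal branch would need a second, top-down pass. Parsimony also ignores branch lengths and sampling bias, which is one reason why probabilistic models are often preferred for densely sampled data.
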

Figure 2

A spatially and temporally defined epidemic.

a | A molecular clock phylogeny that illustrates the history of dengue virus genotype 2 infection in the Caribbean and in Central and South America[47]. A simple parsimony approach has been used to reconstruct the likely location of each phylogenetic branch (blue, Caribbean islands; red, mainland Central America and mainland South America). By combining phylogenetic and geographic information, the phylogeny indicates that the outbreak began in the Caribbean before repeatedly and independently invading mainland locations some years later. b | An estimate of the relative genetic diversity of the same dengue virus epidemic, which shows an initial increase before stabilizing (95% confidence limits shown in blue). This stabilization does not match the varying number of reported dengue outbreaks (shown in part c), probably because spatial population structure maintains viral diversity across epidemic peaks and troughs. More generally, when the sampled population exhibits strong positive selection or population structure then the y-axis cannot be reliably interpreted as proportional to effective population size. The estimated common ancestor of the sampled sequences (arrow) is dated slightly earlier than the first reported outbreak in the region (see part c). c | Shows the number of countries affected by dengue virus genotype 2 infection per year. Figure is modified, with permission, from Ref. 47 © (2005) American Society for Microbiology.

Regionally or genetically defined outbreaks most closely represent the typical 'epidemic' that is described by models of mathematical epidemiology. In some cases this representation can be formalized using population genetic models based on coalescent theory, which directly link phylogenetic structure with ecological processes (Box 1).
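
The link between phylogenetic structure and population size can be sketched with a classic skyline-style estimator. Under the standard coalescent, the expected waiting time while k lineages remain is N / (k(k-1)/2), so each interval between coalescent events gives a crude estimate of N. The code below is a minimal sketch under strong assumptions that real viral data sets often violate: all sequences are sampled at the same time, the tree is known without error, and there is no selection, recombination or population structure. The function name and example times are hypothetical.

```python
def classic_skyline(coalescent_times):
    """Piecewise estimate of effective population size from a tree.

    coalescent_times: times before the present (ascending) of the n - 1
        coalescent events in a tree of n contemporaneous sequences.
    Returns a list of (interval_start, interval_end, estimate) tuples,
    one per interval. Each estimate is in the same time units as the
    input, so it corresponds to effective population size multiplied
    by generation time.
    """
    if any(later <= earlier for earlier, later
           in zip(coalescent_times, coalescent_times[1:])):
        raise ValueError("coalescent times must be strictly ascending")

    lineages = len(coalescent_times) + 1  # lineages present at time 0
    estimates = []
    interval_start = 0.0
    for event_time in coalescent_times:
        duration = event_time - interval_start
        # Number of lineage pairs that could coalesce in this interval.
        pairs = lineages * (lineages - 1) / 2
        estimates.append((interval_start, event_time, duration * pairs))
        interval_start = event_time
        lineages -= 1  # each coalescent event merges two lineages
    return estimates


# Hypothetical event times chosen to match a constant-size population.
for start, end, size in classic_skyline([1.0, 3.0, 9.0]):
    print(start, end, size)
```

Each interval contributes a single exponential waiting time, so the individual estimates are very noisy; methods used in practice smooth across intervals and account for uncertainty in the tree. As the caption to Fig. 2 notes, such estimates cannot be reliably read as effective population size when selection or population structure is strong.
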
This approach is typically used to infer past rates of epidemic growth from sampled viral sequences[3] but can, in some circumstances, be used to directly estimate the fundamental epidemiological parameter R0 (the basic reproductive number) from such data[30,46,50]. Coalescent-based methods have been successfully applied to HCV and HIV-1. This success is partly because of the chronic nature of infection and the absence of cross-immunity for these viruses, which result in comparatively slow changes in prevalence that leave clear footprints in the patterns of viral diversity. Analysis of HCV genomes indicates that, during the twentieth century, strains varied significantly in their rates of growth according to the transmission route by which each strain was spread[30,51]. The reliability of coalescent-based methods, which make a number of limiting assumptions, was tested in an analysis of HCV in Egypt: here, the methods correctly reconstruct a mid-twentieth century explosion in transmission that was caused by widespread unsafe injection during campaigns against schistosomiasis[52]. Comparable phylodynamic studies of HIV-1 subtypes also show agreement between genetic and epidemiological reconstructions[53,54], even though commonly used coalescent methods ignore the presence of HIV recombination. As well as describing the origin and spread of many individual outbreaks, analyses of regional epidemics have helped reveal conceptual connections between the different fields of epidemiology, population genetics and phylogenetics, and have validated methods of statistical inference. Despite the choice of examples above, analysis at this scale is not limited to human and animal pathogens.
For example, Fargette et al.[55] linked the timescale of the emergence of rice yellow mottle virus to the nineteenth century expansion of rice culture in Africa, and Almeida et al.[56] used similar methods to conclude that the human transport of contaminated plants disseminated banana bunchy top virus among Hawaiian islands after it was introduced to the islands in 1989.

Infection clusters and transmission chains

If an outbreak or infection cluster occurs on a small enough scale then we can realistically expect to sample viruses from all or most of the individuals involved. Studies of such outbreaks tend to fall into two categories: those for which the transmission history (that is, who infected whom, and when) is mostly or wholly known, and those for which it is unknown. Examples in which the transmission history is known are highly informative, as the specified infection history allows evolutionary processes to be investigated with a greater degree of certainty. When the transmission chain is unknown, the primary goal may be the reconstruction of the chain or the identification of its source, timescale or transmission route. Naturally occurring outbreaks for which the transmission event details are known are understandably rare; the majority of those with known details are HIV outbreaks. Known chains of transmission have been used to measure the rate of HIV evolution[57] (Box 2) and the magnitude of the bottleneck in virus diversity generated at transmission[58]. The Irish anti-D cohort (a well-studied group of HCV-infected women who were accidentally infected with almost identical strains at the same time) has also provided valuable information about variation in viral evolution, host immune selection and disease outcome between patients[59,60]. Using a different HCV transmission cluster, Wrobel et al.[61] demonstrated that molecular clock methods can reliably estimate the date that a patient was infected.
Transmission chains can also resolve whether the same viral adaptation arises in different hosts (convergent evolution)[62]. Known transmission chains have been used to test whether sequence-based phylogenies match the true history of transmission among epidemiologically connected infections. Although several studies of HIV clusters have reported close agreement[62,63,64], it is often not appreciated that there are good reasons to expect occasional mismatches between the phylogeny and the true transmission history of a cluster. When one 'donor' infection transmits the virus to multiple recipients, the common ancestors of viral lineages sampled from the recipients will exist in the donor. If the amount of viral diversity in the donor is comparatively high, then the relative order of phylogenetic splitting events (one for each common ancestor) may differ from the order of infection events (Fig. 3). The branching order of transmission for genetically diverse infections is therefore best analysed using metapopulation models that integrate the process of transmission with that of lineage coalescence[65]. This issue is not only restricted to specialized phylogenetic studies — evolutionary analyses of transmission chains are presented in criminal proceedings in which individuals are accused of intentional or negligent transmission[66].

Figure 3

Reconstruction of a known HIV-1 transmission chain.

A phylogeny of 13 HIV-1 viral particles (blue circles) sampled at different times (horizontal axis) from 9 different patients for whom the times and direction of viral transmission are known. The virus phylogeny (blue lines) can be mapped within the transmission tree (yellow boxes and arrows), analogous to the mapping of a gene genealogy within a species tree. We can trace all the viruses sampled from one patient back to the time of transmission. Whether more than one lineage is transmitted at this time from the donor will depend on the size of the genetic bottleneck at transmission. Even in the presence of a tight bottleneck, a diverse population in the donor can result in lineage sorting, with the result that the topology of the virus phylogenetic tree does not exactly match the transmission tree.

A new and interesting approach to the analysis of transmission chains is presented in recent studies of UK outbreaks of foot and mouth disease virus (FMDV). These studies describe the infection process at the level of individual farms, with transmission between farms mainly caused by the transport of infected livestock. Cottam et al.[67] developed dynamic models that provide a probability distribution for the date of infection of a particular infected farm and its likely period of 'infectiousness' before FMDV diagnosis and culling of the animals. This temporal information was then combined with the genome sequences of viruses that were sampled from the infected herds to identify the most likely chains of transmission linking the farms in time and space. A joint analysis was particularly suitable because FMDV spread is so rapid that comparatively few genetic changes accrue between inter-farm transmissions. Not all studies of infection clusters focus on the pathways of transmission; sometimes the initiation date of an outbreak is of most interest[68] and at other times the precise epidemic source is sought[69].
However, coalescent-based estimates of population processes are not suitable for infection clusters because this approach requires that the sequences analysed represent a small fraction of the sampled population. Despite this restriction, transmission chain phylogenies can still provide important information about populations, such as the minimum time between transmission events[70]. Furthermore, modern sequencing technology is fast enough for genetic analysis to assist contact tracing and control as an epidemic unfolds. For example, phylogenies confirmed epidemiological suspicions that the 2007 Italian chikungunya outbreak originated from an Indian index case[71]. Considered together, the studies discussed in this section highlight the relevance of transmission chain analyses to applied problems in clinical medicine, forensics and public health. The microevolutionary dynamics of infection events will become a major focus of infectious disease research as high-resolution longitudinal studies will be made possible by the application of next-generation sequencing.

Within-host dynamics

The exceptionally rapid rate of evolution of RNA viruses means that viral evolution in a single host can be studied for the duration of an infection. Dynamics at this scale are fundamental as within-host evolution is the ultimate source of all viral genetic diversity, and therefore it must be understood before models that link different evolutionary scales can be properly developed (Box 2). Additionally, within-host analyses can reveal the evolutionary processes that underlie some aspects of clinical disease. In practice, such analyses have so far been limited to viruses that establish chronic infections lasting months or years, and for which measurable amounts of genetic change occur between viral samples; this is particularly the case for HIV infection and, to a lesser extent, for HCV and hepatitis B virus[72] infection.
Strong natural selection is clearly the dominant force determining HIV evolutionary dynamics in hosts: HIV phylogenies display a high turnover of short-lived lineages that is driven by host immune selection, analogous to the pattern observed for influenza A virus at the global scale[2] (Box 2). Correspondingly, HIV genetic diversity at any particular time is low but slowly increases over the course of chronic infection[73]. Numerous analyses have quantified HIV adaptation and evolution using gene sequences, particularly for the viral envelope gene. These studies have found that these processes correlate with the rate of progression to clinical AIDS[74,75,76] and the rate at which HIV evades neutralizing antibody responses[77]. Equivalent studies of HCV infection have found that viral adaptation predicts the outcome of acute infection[78,79] and that HCV diversity correlates with levels of liver damage[80]. Perhaps the most important outcome of HIV within-host evolution is the generation of T cell escape mutants that can elude host cytotoxic T lymphocyte responses[81] — this is a major barrier to the development of effective HIV vaccines. Although much of the work on T cell escape is not explicitly phylogenetic, there has been a trend away from cross-sectional surveys of viral variation (for example, Ref. 82) towards longitudinal and evolutionary studies at all organizational scales, from the level of the pandemic[83] to that of small transmission chains[81] and in individual hosts[84]. The rate at which HIV evolves during an infection depends not only on viral adaptation but also on the replication rate of the virus and its population size: these factors combine to generate measurable variation in viral evolutionary rate both within and between hosts. As a result, evolutionary rates estimated from sequence data may be crucially dependent on the scale of analysis (Box 2). 
Phylodynamic methods have detected and measured the compartmentalization of viral lineages into specific tissues during chronic infection. This compartmentalization creates within-host subpopulations (so-called virodemes) that are analogous to the location-specific clusters of infection seen at higher scales. Highly distinct strains of HIV are found in the brains of patients with neurological illness[85,86], suggesting that virus movement across the blood–brain barrier is not common and might be unidirectional. Finer genetic structure is apparent even among viruses from different brain regions, which seem to evolve at different rates[87]. HIV subpopulations in other tissues have been proposed, including in the cervix[88] and seminal fluid[89], as has compartmentalization in livers with chronic HCV infection[90].

Integrating levels of phylodynamic processes

The evolutionary and ecological dynamics of viral pathogens take place in a hierarchy of organizational scales, from within-host processes to the global dynamics of pandemics, but it is not obvious how dynamics at lower scales combine to generate higher-order behaviour. Such hierarchical processes can be studied from the perspective of both population genetics[65] and mathematical epidemiology[91]. Multiscale interactions are of great public health importance as well as being of theoretical interest; for example, the success of antiviral drug treatment campaigns will depend on the degree to which drug resistance mutations that arise in treated hosts can accumulate at the epidemic level[92]. There are intriguing parallels between processes in hosts and those at the epidemic or global level[2]. First, within-host studies reconstruct the dynamics of large viral populations from small samples, hence techniques commonly applied to large-scale epidemics (particularly coalescent models) can be re-employed with an appropriate change in perspective: each sequence represents an infected cell or virion, rather than an infected host.
Secondly, within-host evolution is closely intertwined with ecological processes, such as the turnover of virions, host cells and components of the host immune response. These dynamics are studied using virus kinetics models[93], which were directly inspired by related models developed by mathematical epidemiologists. As at higher scales, within-host studies have attempted to integrate evolutionary and ecological processes[94,95,96]; for example, in vivo HIV cell-to-cell generation times can be accurately estimated by coalescent analysis of sampled virus sequences[97,98]. There is great potential for further development of models that combine the abundant longitudinal data on infection kinetics with those on viral evolution.

Conclusions

The field of infectious disease evolutionary dynamics is currently seeing a revolution in all three of the technologies on which it relies: genomic sequencing, statistical methodology and high-performance computing. This confluence has produced a burgeoning interest in the evolutionary and epidemiological processes that leave their imprint on pathogen genomes, as reflected in the empirical studies and analysis techniques reviewed here. However, it is our opinion that many investigations still fail to fully appreciate or utilize the rich source of epidemiological information contained in viral genome sequences. Genetic data can independently corroborate surveillance data during an epidemic and can shed light on events before the initial report of the outbreak. Furthermore, evolutionary and surveillance data provide alternative perspectives on the same underlying phylodynamic process and can therefore be validated against one another. The practicality of this approach was demonstrated during the H1N1 'swine flu' epidemic, first detected in April 2009.
Tens of viral sequences were made publicly available within days of discovery of the virus, and evolutionary analysis was incorporated into initial assessments of the pandemic potential of the new strain[50]. Large-scale sampling and sequencing could also revolutionize our understanding of medically important RNA viruses, such as caliciviruses, rotaviruses and enteroviruses, the genetics of which are currently comparatively neglected. DNA viruses with small genomes that evolve at similar rates to RNA viruses[1] will be equally suitable for phylodynamic analysis. When applied to slower-evolving DNA viruses, bacteria and protozoa, evolutionary analyses similar to those introduced here can help elucidate longer-term processes, such as host–pathogen co-divergence and pathogen speciation[99,100,101]. In the near future, the greatest impact on viral evolutionary analysis will come from the increasing accessibility of new high-throughput sequencing technologies[102]. For RNA viruses, which have genomes that are on average only 15,000 nucleotides long, it is likely that hundreds or thousands of complete genomes sampled from both viral epidemics and infected hosts can be routinely subjected to molecular epidemiological analysis. Ensuring that computational and statistical developments keep pace with this revolution in data acquisition will be a great challenge. One promising solution is to harness the power of 'multi-core' or massively parallel computing technologies in evolutionary analysis[103]. The coming genomic era will also allow us to determine how much information can be inferred from gene sequences alone: only those ecological processes that occur on the same timescale as genetic change will leave their mark on genetic data, and robust evolutionary inferences carry a statistical uncertainty that should be accurately estimated and reported.
Therefore, a clear goal for the future is to further develop analytic methods that combine genetic and epidemiological data to reconstruct epidemic history and to predict future trends, a task to which Bayesian methods of statistical inference are well suited. Further development of analysis methods is required in three key areas: the quantification of viral adaptation by natural selection; the explicit integration of evolutionary and spatial information; and the measurement of rates of viral reassortment or recombination. Advances in these areas could raise new questions for phylodynamic analysis. For example, do lineages differ in their rates of spatial diffusion? And are bursts of viral adaptation associated with recombination events? However, such analytical finesse is of little use if basic epidemiological information, such as the date and location of sampling, is unavailable, and we implore researchers generating viral sequences to attach as much sample information to each sequence as ethical constraints permit.

Rooted molecular phylogenies can be estimated from viral gene sequences (see the figure, part a). Depending on the scale of the analysis undertaken, the sampled sequences (red circles) may represent infected individuals, infected cells, virions or higher-level units such as villages. The phylogeny branching order shows the shared ancestry of the sequences, which usually, but not always, reflects the history of pathogen transmission between these units (discussed in main text). This phylogeny has no timescale, so the branch lengths represent the genetic divergence from the ancestor (black circle). If the sequences of interest undergo recombination, then a single phylogenetic tree may not adequately describe evolutionary history and alternative methods can be applied (for example, Ref. 104).
The same phylogeny can also be reconstructed using a molecular clock model (see the figure, part b), which defines a relationship between genetic distance and time. The pathogen sequences have been sampled at known time points and the phylogeny branches have lengths in units of years. This approach estimates the ages of branching events, including that of the common ancestor. The simplest, 'strict' clock model assumes that all lineages evolve at the same rate. More complex, 'relaxed' models allow evolutionary rates to vary through time or among lineages, resulting in variation around an average rate[25]. In this phylogeny, unusually fast or slow evolving lineages are shown as thick or thin lines, respectively. The relationships among genetic distance, evolutionary rate and time can be understood by comparing the branch lengths in part a and part b.

Phylodynamic data can also highlight the evolution through time of mutations that may reflect viral adaptations (see the figure, part c). Observed amino acid changes (crosses) are shown mapped onto specific phylogeny branches. Amino acid sites under positive selection can be identified using dN/dS methods, which compare the rate of replacement substitutions (that change the amino acid) with the rate of silent substitutions (that do not change the amino acid)[18,105]. Such methods are most powerful when detecting diversifying selection, making them appropriate for the analysis of infectious disease, but the results obtained using these methods require careful interpretation[106]. Of particular interest are the replacement mutations that are found on the persisting phylogenetic 'backbone' that represents the ancestor of future virus populations (blue branches), as opposed to those occurring on branches that die out (black branches).

The data can also be analysed using temporal phylogeography (see the figure, part d).
The nine sequences were sampled from France (green, A), the United Kingdom (blue, B) and two locations in Spain (red, C1 and C2). Statistical methods can be used to reconstruct the history of pathogen spread, so that each branch is labelled with its estimated geographic position. Current reconstruction methods mostly use simple parsimony approaches[107] that reconstruct a minimum set of migration events consistent with the observed phylogeny. Lineage movement events are marked on the phylogeny with crosses. Combining the spatial and temporal information provides further insights: this hypothetical pathogen spread to location C1 years before independently arriving at location C2. Such analyses are not limited to hypotheses concerning physical geography, as the labels A, B and C can stand for any trait of interest, for example, host species, cell tropism during infection, host risk factors or clinical outcome.

The principles of coalescent analyses, which incorporate an explicit model of the sampled pathogen population, are illustrated in the figure, part e. Each circle represents an infection, and circles on the same row occur during the same period of time. The increasing width of each row therefore reflects the growth of the epidemic through time. Starting from the sampled infections (red), the sampled lineages (black lines) can be traced back through unsampled infections (grey) to the common ancestor (black circle). The rate at which the sampled lineages merge or coalesce depends on population processes such as population dynamics, population structure, selection and recombination (only change in population size is represented here). Coalescent methods are used to infer these processes from randomly sampled pathogen sequences.

To illustrate the challenges involved in understanding dynamics at multiple levels jointly, we consider here the well-characterized rate of HIV-1 genome evolution at the within-host and between-host scales.
The divergence rates of a series of infections can be plotted against time (see the figure, parts a–d). Each infection is represented by a differently coloured cone of divergence: the gradient of each cone equals the mean rate of within-host virus evolution and the width of each cone represents the variance of this rate. The long-term accumulation of virus divergence at the epidemic level (dashed lines) depends on three factors: the variation in evolutionary rate among strains within a host; whether the average viral rate varies over the course of infection; and whether the strain transmitted to the next host is selected randomly with respect to its evolutionary rate. Empirical analyses indicate a high variance in evolutionary rate among lineages within a host[75], which is caused, at least in part, by latent non-replicative infection of cells[108]. Provided that the lineages are transmitted to subsequent hosts randomly (see the figure, part a), the long-term virus evolutionary rate will, on average, equal the average within-host evolutionary rate, even when these average rates differ between patients (P) (see the figure, part e).

Discrepancy between within- and between-host rates

In contrast to the above, it seems that HIV-1 evolutionary rates are slower when measured at the epidemic level (see the figure, part e; DRC, Democratic Republic of Congo) than when measured at the within-host level[109] (see the figure, part e; P1–P9 and P11). One explanation for this difference is that transmission is nonrandom, such that slower-evolving lineages are more likely to successfully generate the next infection than faster ones, with the result that the long-term rate is less than the average within-host rate (see the figure, part b). Indeed, the short-sighted action of natural selection will tend to favour those strains with higher within-host fitness, even at the cost of lowered transmissibility.
Thus, transmitted viruses could be preferentially drawn from lineages that have accumulated fewer mutations, such as those that have spent a greater proportion of time in a latent state. This effect may be enhanced by the existence of a genetically distinct HIV subpopulation in genital mucosa[88,89]. The discrepancy between within- and between-host rates can also be explained if viral evolutionary rates decrease over the course of infection (see the figure, parts c,d). Several processes could cause such a decrease: the rate of viral replication declines as the disease progresses[75,110]; selection for viral immune escape variants weakens later in infection[76,105]; and adaptation of the viral population is fastest early in infection, soon after its transmission to a new host environment. As yet, the possible effect of recombination on HIV evolutionary rates at different scales is unknown. Whatever the underlying cause, if average evolutionary rates vary during infection then the long-term rate of evolution becomes dependent on when transmission occurs. If within-host rates decline during infection then more rapid transmission will result in a faster long-term rate of evolution (see the figure, part c) than slower transmission (see the figure, part d). This has been shown for the human T cell lymphotropic virus type II, a leukaemia-causing relative of HIV, which seems to evolve many times faster in rapidly transmitting drug users than in populations that are vertically infected during breastfeeding[4]. Conversely, it has been argued that within-host rates increase over the first weeks of infection, owing to the activation of the immune response that drives viral adaptation, hence fast early transmission could alternatively lead to slower long-term rates[111].
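
The dependence of the long-term rate on the timing of transmission can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that the within-host rate declines exponentially from an initial value r0 with decay constant k, and that every host transmits a fixed time T after becoming infected. Neither this functional form nor the parameter values and function names below come from the studies cited above; they are hypothetical. Under these assumptions, the long-term rate along a transmission chain is simply the within-host rate averaged over the interval between infection and onward transmission.

```python
import math


def within_host_rate(t, r0, k):
    """Assumed instantaneous rate (substitutions/site/year) t years after infection.

    Hypothetical model: exponential decline from r0 with decay constant k.
    """
    return r0 * math.exp(-k * t)


def long_term_rate(transmission_interval, r0, k):
    """Average rate along a chain in which every host transmits after the same interval.

    The divergence accumulated within one host is the integral of the
    within-host rate from 0 to T; dividing by T gives the rate that would
    be observed at the epidemic level under this toy model.
    """
    T = transmission_interval
    if T <= 0:
        raise ValueError("transmission interval must be positive")
    if k == 0:
        # No decline: the long-term rate equals the constant within-host rate.
        return r0
    # Integral of r0 * exp(-k t) from 0 to T, using expm1 for numerical accuracy.
    divergence = -r0 * math.expm1(-k * T) / k
    return divergence / T


if __name__ == "__main__":
    r0, k = 0.01, 0.5  # illustrative values only, not estimates from data
    for T in (0.5, 2.0, 8.0):
        rate = long_term_rate(T, r0, k)
        print(f"transmission after {T:4.1f} years -> long-term rate {rate:.5f}")
```

In this toy model the long-term rate falls as the transmission interval lengthens, which mirrors the contrast between rapid and slow transmission described above; setting k to zero recovers the case in which the within-host and long-term rates coincide.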

109 in total

1. Long term trends in the evolution of H(3) HA1 human influenza type A.

Authors: W M Fitch; R M Bush; C A Bender; N J Cox
Journal: Proc Natl Acad Sci U S A Date: 1997-07-22 Impact factor: 11.205

2. Evolutionary indicators of human immunodeficiency virus type 1 reservoirs and compartments.

Authors: David C Nickle; Mark A Jensen; Daniel Shriner; Scott J Brodie; Lisa M Frenkel; John E Mittler; James I Mullins
Journal: J Virol Date: 2003-05 Impact factor: 5.103

3. The molecular population genetics of HIV-1 group O.

Authors: Philippe Lemey; Oliver G Pybus; Andrew Rambaut; Alexei J Drummond; David L Robertson; Pierre Roques; Michael Worobey; Anne-Mieke Vandamme
Journal: Genetics Date: 2004-07 Impact factor: 4.562

4. HIV phylogenetics.

Authors: Deenan Pillay; Andrew Rambaut; Anna Maria Geretti; Andrew J Leigh Brown
Journal: BMJ Date: 2007-09-08

5. A high-resolution genetic signature of demographic and spatial expansion in epizootic rabies virus.

Authors: Roman Biek; J Caroline Henderson; Lance A Waller; Charles E Rupprecht; Leslie A Real
Journal: Proc Natl Acad Sci U S A Date: 2007-04-30 Impact factor: 11.205

6. Novel mammalian herpesviruses and lineages within the Gammaherpesvirinae: cospeciation and interspecies transfer.

Authors: Bernhard Ehlers; Güzin Dural; Nezlisah Yasmum; Tiziana Lembo; Benoit de Thoisy; Marie-Pierre Ryser-Degiorgis; Rainer G Ulrich; Duncan J McGeoch
Journal: J Virol Date: 2008-01-23 Impact factor: 5.103

7. Prevalence of drug-resistant HIV-1 variants in untreated individuals in Europe: implications for clinical management.

Authors: Annemarie M J Wensing; David A van de Vijver; Gioacchino Angarano; Birgitta Asjö; Claudia Balotta; Enzo Boeri; Ricardo Camacho; Maire-Laure Chaix; Dominique Costagliola; Andrea De Luca; Inge Derdelinckx; Zehava Grossman; Osamah Hamouda; Angelos Hatzakis; Robert Hemmer; Andy Hoepelman; Andrzej Horban; Klaus Korn; Claudia Kücherer; Thomas Leitner; Clive Loveday; Eilidh MacRae; Irina Maljkovic; Carmen de Mendoza; Laurence Meyer; Claus Nielsen; Eline L Op de Coul; Vidar Ormaasen; Dimitris Paraskevis; Luc Perrin; Elisabeth Puchhammer-Stöckl; Lidia Ruiz; Mika Salminen; Jean-Claude Schmit; Francois Schneider; Rob Schuurman; Vincent Soriano; Grzegorz Stanczak; Maja Stanojevic; Anne-Mieke Vandamme; Kristel Van Laethem; Michela Violin; Karin Wilbe; Sabine Yerly; Maurizio Zazzi; Charles A Boucher
Journal: J Infect Dis Date: 2005-08-15 Impact factor: 5.226

8. The emergence of HIV/AIDS in the Americas and beyond.

Authors: M Thomas P Gilbert; Andrew Rambaut; Gabriela Wlasiuk; Thomas J Spira; Arthur E Pitchenik; Michael Worobey
Journal: Proc Natl Acad Sci U S A Date: 2007-10-31 Impact factor: 11.205

9. Adaptation of HIV-1 to human leukocyte antigen class I.

Authors: Yuka Kawashima; Katja Pfafferott; John Frater; Philippa Matthews; Rebecca Payne; Marylyn Addo; Hiroyuki Gatanaga; Mamoru Fujiwara; Atsuko Hachiya; Hirokazu Koizumi; Nozomi Kuse; Shinichi Oka; Anna Duda; Andrew Prendergast; Hayley Crawford; Alasdair Leslie; Zabrina Brumme; Chanson Brumme; Todd Allen; Christian Brander; Richard Kaslow; James Tang; Eric Hunter; Susan Allen; Joseph Mulenga; Songee Branch; Tim Roach; Mina John; Simon Mallal; Anthony Ogwu; Roger Shapiro; Julia G Prado; Sarah Fidler; Jonathan Weber; Oliver G Pybus; Paul Klenerman; Thumbi Ndung'u; Rodney Phillips; David Heckerman; P Richard Harrigan; Bruce D Walker; Masafumi Takiguchi; Philip Goulder
Journal: Nature Date: 2009-02-25 Impact factor: 49.962

10. Evolutionary dynamics and emergence of panzootic H5N1 influenza viruses.

Authors: Dhanasekaran Vijaykrishna; Justin Bahl; Steven Riley; Lian Duan; Jin Xia Zhang; Honglin Chen; J S Malik Peiris; Gavin J D Smith; Yi Guan
Journal: PLoS Pathog Date: 2008-09-26 Impact factor: 6.823
