Integrating phylodynamics and epidemiology to estimate transmission diversity in viral epidemics.

Gkikas Magiorkinis, Vana Sypsa, Emmanouil Magiorkinis, Dimitrios Paraskevis, Antigoni Katsoulidou, Robert Belshaw, Christophe Fraser, Oliver George Pybus, Angelos Hatzakis.

Abstract

The epidemiology of chronic viral infections, such as those caused by Hepatitis C Virus (HCV) and Human Immunodeficiency Virus (HIV), is affected by the risk group structure of the infected population. Risk groups are defined by each of their members having acquired infection through a specific behavior. However, risk group definitions say little about the transmission potential of each infected individual. Variation in the number of secondary infections is extremely difficult to estimate for HCV and HIV but crucial in the design of efficient control interventions. Here we describe a novel method that combines epidemiological and population genetic approaches to estimate the variation in transmissibility of rapidly-evolving viral epidemics. We evaluate this method using a nationwide HCV epidemic and for the first time co-estimate viral generation times and superspreading events from a combination of molecular and epidemiological data. We anticipate that this integrated approach will form the basis of powerful tools for describing the transmission dynamics of chronic viral diseases, and for evaluating control strategies directed against them.

Year: 2013 PMID: 23382662 PMCID: PMC3561042 DOI: 10.1371/journal.pcbi.1002876

Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475

Introduction

Mathematical epidemiology describes the spread of infectious diseases and aims to aid in the design of effective public health interventions [1]–[3]. Central to this endeavour is the basic reproductive number (R 0) of an infectious disease, the mean number of secondary infections per primary infection in a completely susceptible population [4] (for notations see Table 1). Under simple epidemiological scenarios, in which all infected individuals behave identically, R 0 depends on the transmission probability per contact with a susceptible individual, the duration of infectiousness and the rate at which new contacts are made [2], [4], [5]. However, studies on sexually transmitted and vector-borne infections indicate that infected individuals behave far from identically and that variation in the number of secondary infections per infected individual can play a major role in epidemic dynamics. For example, some researchers have invoked the so-called 20–80 rule to describe the finding that approximately 20% of infected individuals are responsible for 80% of onward transmission [3], [6], [7]. The term ‘superspreaders’ has been coined to describe hosts that contribute disproportionately to onward infection.

Table 1

Abbreviations and terms used throughout the manuscript.

Symbol	Name	Statistical definition	Units
R ₀	Basic reproductive number or ratio	Mean number of secondary infections	Number of infections
R _0,a	Basic reproductive number or ratio of the transmitter group assuming a transmitter, non-transmitter secondary infections model	Mean number of secondary infections	Number of infections
Z	Number of secondary infections per infected individual	Random variable	Number of infections
Z _a	Number of secondary infections of the transmitter group assuming a transmitter, non-transmitter secondary infections model	Random variable	Number of infections
N	Number of prevalent cases	-	Number of infected people
N_e	Effective number of infections	-	Number of infected people
PTP	Phylodynamic transmission parameter	-	Number of infections per year
T	Generation time	Average length of time between primary and secondary infections	Years
γ	Recovery rate from the disease	-	Number of persons per year
μ	Death rate of the population	-	Number of persons per year
SSE	Superspreading Events	Minimum expected number of secondary infections from a superspreader	Number of secondary infections
k	Dispersion parameter of the negative binomial distribution	-	-
superspreader	Top 1% of infected individuals when we rank them by their attributed secondary infections	-	-

In previous work, variation in the number of secondary infections per infected individual, Z, has been represented by a negative binomial distribution that is described by two parameters, (i) mean R 0 among infections and (ii) the dispersion parameter k [8], [9]. A small k (<0.1) indicates that a small proportion of infected individuals actively transmit the pathogen, whilst a large k (>4) means that all infected individuals contribute approximately equally to onwards transmission [8], [10]. Lloyd-Smith et al. introduced a definition of superspreaders as the top 1% of hosts when ranked by the number of secondary infections they create [8]. Although superspreading events (SSE) (i.e. the minimum number of secondary infections generated by a superspreader) have been estimated for directly-transmitted acute infections [8], they have never been described for chronic viral infections. The indolent and subclinical nature of chronic infections makes it difficult to track primary and secondary infections of the multiple strains that concurrently transmit in a given population. The problem is further compounded for HIV and the hepatitis C virus (HCV) that circulate in socially-marginalised groups such as injecting drug users (IDUs) and commercial sex workers. In addition to R 0 and the variation in onward transmission, another epidemiologically-important parameter is the average time between the primary and secondary infections, typically termed the infection generation time (T; several other definitions are used in the literature). A short T indicates rapid transmission, whilst a longer T suggests slower spread but also longer carriage. The duration of carriage of pathogens, which is usually known, represents an upper-limit on T and thus it is reasonable to conclude that directly transmitted acute infections have T<1 month whilst chronic infections have T values on the order of months or years. 
Here we show how transmission variability and infection generation time can be estimated by combining viral genomic data with surveillance data and mathematical epidemiology.

Results/Discussion

Conceptual modelling framework

The concept of effective population size (Ne) has been used in population genetics for at least 50 years (for a brief review see Text S1) [11], [12]. Ne(t) is generally defined as the size of an idealised population (one without selection or population structure) that experiences the same level of genetic drift as the studied population at time t. Ne(t) is typically lower than N(t), the population's actual size at time t. The ratio N(t)/Ne(t) thus indicates how closely the real population's reproduction matches the assumptions of the idealised model [13], [14]. Under a wide range of scenarios this ratio represents the variation in offspring numbers among individuals [15], [16]. If the population in question is a viral epidemic, then N(t) is the number of infections at time t (or number of prevalent cases) and Ne(t) represents the effective number of infections (i.e. the number of infections of an idealised epidemic that experiences the same level of genetic drift as the studied population). Crucially, if genetic variation among strains has little or no effect on their ability to infect hosts, as appears to be the case for HIV and HCV [11], then the ratio N(t)/Ne(t) is formally equal to var(Z), the variance in the number of secondary infections [17], [18]:

$$\frac{N(t)}{N_e(t)} = \mathrm{var}(Z) \qquad (1)$$

N(t) can be directly observed or estimated from surveillance data using classical epidemiological methods [19]. Ne(t) can be estimated by analysing the pattern of genetic diversity in a sample of the viral population. Specifically, methods based on coalescent theory, such as the skyline plot [11], [20], estimate the product Ne(t)T of the coalescent effective number of infections and the generation time T. With all these estimates in hand it is therefore possible to estimate var(Z)/T by dividing both sides of equation 1 by T:

$$\mathrm{PTP} = \frac{\mathrm{var}(Z)}{T} = \frac{N(t)}{N_e(t)\,T} \qquad (2)$$

The value var(Z)/T is inferable from empirical data and we here call it the phylodynamic transmission parameter, PTP. PTP reflects two important features of the intensity of transmission within a population: (i) the variance of secondary infections among infections, and (ii) the time between infections. Equation 2 suggests that an epidemic with a specific PTP is equally well described either by slow and highly variable onward transmission or by fast and more homogeneous onward transmission. This means that by comparing prevalent cases and genetic diversity (as measured by the skyline plot) alone, we cannot directly infer var(Z) and T; more information is required to separate these parameters. In the next two sections we consider practical aspects of inferring these two variables.
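The trade-off between var(Z) and T can be illustrated with a short sketch. It assumes the reading PTP = var(Z)/T implied by the definitions above; the function name `var_z_from_ptp` and the candidate generation times are illustrative only, and the PTP value is the subtype 1a estimate reported in Table 2.

```python
def var_z_from_ptp(ptp: float, generation_time: float) -> float:
    """Variance of secondary infections implied by a given PTP and T.

    Assumes PTP = var(Z) / T, so var(Z) = PTP * T.
    """
    if ptp <= 0 or generation_time <= 0:
        raise ValueError("ptp and generation_time must be positive")
    return ptp * generation_time


# One observed PTP is compatible with many (T, var(Z)) pairs.
ptp = 25.8  # infections per year (subtype 1a, Table 2)
for t_years in (1.0, 2.0, 10.0, 25.0):  # hypothetical generation times
    print(f"T = {t_years:5.1f} y  ->  var(Z) = {var_z_from_ptp(ptp, t_years):7.1f}")
```

Every row printed is equally consistent with the same PTP, which is why an extra constraint on either var(Z) or T is needed to separate the two.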

Infection generation time

Volz and Frost [21], [22] incorporated mathematical epidemiology into coalescent models by assuming that pathogens spread through the population according to compartmental models of epidemic spread. As theory predicts, they showed that there is in general no constant transformation from NeT to N because T lengthens as susceptible hosts decline in the population; a constant transformation from NeT to N is observed when the epidemic is in its exponential phase (i.e. while T remains constant). Koelle and Rasmussen [23] similarly showed that a constant linear transformation from NeT to N is also observed when the epidemic is in a steady endemic state. Thus, if we compare NeT with N during the exponential phase or the endemic state we can assume that T remains constant.

Distributions of numbers of secondary infections for epidemics with active and inactive transmitters

To describe the variability in onward transmission we require a probability distribution for the random variable Z, the number of secondary infections per infected individual. Previous work has modeled variation in this number with a negative binomial distribution described by two parameters, mean R 0 and a dispersion parameter k [8], [9]. Chronic viral infections, such as those caused by HIV and HCV, are unlikely to be well described by a single distribution. For these epidemics a significant proportion of transmissions result in inactive infections that transmit the virus no further, and thus a mixed distribution is a more realistic representation. In our study we define a sub-population of “inactive” infections whose expected number of secondary infections is equal to 0. The rest of the population is defined as “active”. Active infections comprise a proportion u of all infections and their numbers of secondary infections are assumed to be Poisson distributed with mean R 0,a. The distribution of the number of secondary infections Z in the whole population (active and inactive combined) is therefore a zero-inflated Poisson distribution, such that:

$$E(Z) = R_0 = u\,R_{0,a} \qquad (3)$$

$$\mathrm{var}(Z) = u\,R_{0,a}\left[1 + (1-u)\,R_{0,a}\right] \qquad (4)$$

Equations 3 and 4 can be used to estimate the mean number of secondary infections of active infections (R 0,a) provided that estimates of E(Z), u and var(Z) are available.
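As a sanity check on the zero-inflated Poisson description, the sketch below computes the mean and variance of Z in two ways: from the standard closed-form moments of a zero-inflated Poisson, E(Z) = u·R0,a and var(Z) = u·R0,a·[1 + (1 − u)·R0,a], and by summing the probability mass function directly. The function names and the example values of u and R0,a are illustrative assumptions, not part of the original analysis.

```python
import math
from typing import Tuple


def zip_moments(u: float, r0_a: float) -> Tuple[float, float]:
    """Closed-form mean and variance of the zero-inflated Poisson model.

    u    : proportion of active (transmitting) infections
    r0_a : Poisson mean of secondary infections among active infections
    """
    mean = u * r0_a
    variance = u * r0_a * (1.0 + (1.0 - u) * r0_a)
    return mean, variance


def zip_pmf(z: int, u: float, r0_a: float) -> float:
    """P(Z = z): inactive infections always give 0, active ones are Poisson."""
    poisson = math.exp(-r0_a + z * math.log(r0_a) - math.lgamma(z + 1))
    return (1.0 - u) * (z == 0) + u * poisson


def zip_moments_numeric(u: float, r0_a: float, z_max: int = 500) -> Tuple[float, float]:
    """Same moments obtained by summing the pmf directly (sanity check)."""
    probs = [zip_pmf(z, u, r0_a) for z in range(z_max + 1)]
    mean = sum(z * p for z, p in enumerate(probs))
    second = sum(z * z * p for z, p in enumerate(probs))
    return mean, second - mean ** 2


# Hypothetical example: one third of infections active, 10 infections each on average.
print(zip_moments(0.34, 10.0))
print(zip_moments_numeric(0.34, 10.0))
```

The two routes should agree to numerical precision, which confirms that the closed-form variance is the one belonging to this mixture.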

Proof of concept: Concurrent nationwide epidemics of HCV

Well-characterised cohorts of HCV infections (of subtypes 1a, 1b, 3a and 4a) have been described in Greek populations [24], [25]. Crucially, for these epidemics we have both surveillance information and concurrent samples of viral genome sequences from the same population. First, we used the HCV incidence and prevalence by subtype inferred in previous studies [25]. Next, we used the skyline plot method to estimate the value Ne(t)T for each subtype from the viral genome sequences sampled concurrently from the same populations (see Table S1) [26]–[28]. For both methods we assume that the population corresponds to the set of individuals chronically infected with HCV. The majority of patients with HCV infection develop persistent or chronic infection (60–92%) whilst a minority clears HCV-RNA (8–40%); viral clearance is much faster within the first 2 years of infection and slower thereafter (≪1% per year), while increased rates of viral clearance are associated with younger age, female gender, lack of HIV co-infection, chronic HBV infection and genetic variation in IL28B [29]–[42].

HCV phylodynamic analysis

In total, 24, 27, 24 and 22 samples from Greek patients were amplified and sequenced for subtypes 1a, 1b, 3a and 4a, respectively (Table S1). The majority of subtype 1a and 3a infections were associated with injecting drug use, while for subtype 1b and 4a infections the source of infection was usually unknown. These distributions are consistent with previous epidemiological findings [24]. Phylogenetic trees (Figure S1) were estimated using a part of the NS5B region (nt 8297–8597) for which more reference sequences from other locations are available. These revealed the epidemics of different subtypes in Greece are not monophyletic and thus they arose through multiple introductions. Since the outbreaks were not monophyletic we can only provide upper limits of the date of introduction of each subtype (i.e. the date of the oldest possible introduction). Analysis using molecular clock coalescent methods (Figure 1, Figure S2) indicates that the 1a, 1b, 3a and 4a epidemics first entered the Greek population around 1965, 1958, 1975 and 1967, respectively (Table S2). It is important to note that the methods developed here depend on the exponential growth phase of each subtype, and not on the date of its most recent common ancestor, as the latter is more sensitive to sampling biases. The most striking difference in epidemic history among the subtypes is the rapid exponential growth of subtype 3a during 1978–1990, whereas the other subtypes appeared to expand more slowly during 1960–1990 (Figure 1).

Figure 1

Plots through time of NeT (estimated from genetic data using the Bayesian skyline plot) versus N (estimated from surveillance data using back calculation).

The plot of N is drawn by means of locally weighted smoothing (lowess) of the scatter plot of the estimated N. We have truncated the plots after 1990 as we wish to characterise HCV transmission prior to the virus's discovery in 1989. The vertical axes of the plots through time of NeT and N for each HCV subtype (B) have been scaled between maximum and minimum values.

Epidemic and phylodynamic estimates are correlated

For each HCV subtype, the estimated plots of Ne(t)T and N(t) correspond with each other in relative size (Figure 1a), indicating that a larger NeT corresponds to a larger N. The plots of Ne(t)T and N(t) for each subtype are also remarkably similar in shape (Figure 1b), indicating that PTP = N(t)/(Ne(t)T) is relatively constant through time. Subsequently, to estimate the ratio N/(NeT) for each subtype, we assessed the correlation of N and NeT during the period of exponential growth using linear regression (suppressing the constant term, since theory proposes that N is directly proportional to NeT). The relationship between N(t) and Ne(t)T is thus given by N(t) = a Ne(t)T, such that a is an estimate of the phylodynamic transmission parameter PTP = N/(NeT). Since all these metrics are time-series data we corrected the cross-correlations between NeT and N for auto-correlation by means of the Newey-West method [43]. Specifically, we assessed the auto-correlation structure for each parameter and each subtype and then used the maximum lag between the cross-correlated data to correct statistical significance. Linear regressions of N(t) against Ne(t)T for each HCV subtype are strong and significant (p<0.01; R² = 0.70–0.95). The regression gradients (a) provide estimates of PTP = N/(NeT), which vary from 15.6 to 43.4 for the different HCV subtypes (Table 2, S3).
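The point estimate of PTP from a regression without a constant term can be sketched as follows. This is a minimal illustration using the ordinary least-squares slope through the origin; it does not implement the Newey-West correction, which affects the standard errors and confidence intervals rather than the slope itself. The function name and the two short series are hypothetical and are not the study data.

```python
from typing import Sequence


def slope_through_origin(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope a for the model y = a * x (no intercept)."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


# Hypothetical yearly series, NOT the study data.
ne_t = [50.0, 80.0, 130.0, 210.0, 340.0]       # skyline estimates of Ne(t)*T
n = [1300.0, 2050.0, 3400.0, 5400.0, 8800.0]   # back-calculated prevalence N(t)
ptp_hat = slope_through_origin(ne_t, n)
print(f"PTP estimate: {ptp_hat:.1f} infections per year")
```

Suppressing the intercept encodes the assumption that N is directly proportional to NeT, so the slope can be read directly as N/(NeT).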

Table 2

Estimates of transmission parameters for each HCV subtype.

	All				Transmitters	99th percentile SSE
Subtype	PTP = N/(NeT) (95% C.I.) (note 1)	E(Z) = R₀ (95% C.I.)	T, years (note 2)	u (note 3)	E(Z_a) = Var(Z_a) = R₀,a	Top 1% (overall) (note 4)
1a	25.8 (21.2–30.2)	3.4 (3.3–3.5)	1.4	0.26	13.1	20
1b	15.6 (14.6–16.4)	4.5 (4.2–4.8)	20.6	0.06	75	83
3a	43.4 (38.6–48.2)	11.5 (10.7–12.4)	3.7	0.47	24.5	35
4a	27.8 (23.2–31.4)	2.4 (2.3–2.5)	0.9	0.2	12	18

Note 1: The phylodynamic transmission parameter PTP = N/(NeT) was estimated as the coefficient of the linear regression of N on NeT without a constant term. For the confidence intervals, the autocorrelation structure of each variable was taken into account using the Newey-West correction.

Note 2: Generation time estimated as Var(Z)/PTP (maximum estimate, assuming that the minimum proportion of transmitters equals the proportion of IDUs in each subtype).

Note 3: Proportion of transmitters, practically equal to the proportion of IDUs within each subtype.

Note 4: Upper 1% of the distribution of secondary infections including transmitters and non-transmitters.

Subtype-specific R 0 estimates

The subtype-specific estimates of mean R 0 during the exponential growth phase of Ne or N were 2.4–11.5 (Table 2, Table S3), assuming that the infectivity period is 40 years and life expectancy is 70 years. These estimates are similar to those reported previously for subtypes 1a and 1b (both global samples) and 4a (sampled from Egypt) [44]. The expansion of subtype 3a is characterised by faster epidemic growth over a shorter timeframe compared to the other subtypes (Figure 1) and this is reflected in the large R 0 value for that subtype, which suggests an average of >10 secondary infections per primary infection.

Model of secondary infections in the Greek HCV epidemics

Historically, HCV epidemics have taken two distinct forms: older transfusion and iatrogenic-related transmission, and more recent intravenous drug use-related (IDU-related) outbreaks. The earlier transmission was characterised by slower spread; individuals infected by transfusion or nosocomial transmission are less likely to practice high-risk behaviors and thus often represent transmission chain dead-ends. The more recent IDU-related epidemics are characterised by rapid spread. HCV is hyperendemic in IDUs worldwide with anti-HCV prevalence of 15–90% [45]; IDUs may share syringes, needles and other contaminated equipment and are likely to cause long transmission chains [46], [47]. As explained above, the Z-values of HCV epidemics are thus unlikely to be described well by a single distribution; instead we suggest a bimodal distribution model for the number of secondary infections (see Eq.3–5) that can represent both types of transmission behavior. We can use Equation 4 to test whether our model is congruent with epidemiological data. Equation 4 predicts that PTP increases with the proportion of “transmitters” in the population of infected individuals (provided that the proportion of transmitters is <50%, which is the case for all the HCV epidemics in this study). Regression of PTP against the percentage of IDU infections for each HCV subtype is strongly significant (Figure 2) whereas the regressions for other risk groups are not (Table S4). This suggests that the estimates of PTP are compatible with the known epidemiology of HCV. However, we note that this regression contains only 4 points and therefore data from more sub-epidemics are required to strengthen this finding.

Figure 2

Scatter plot of the proportion of IDUs against the phylodynamic transmission parameter (PTP = N/(NeT)) for each subtype.

Estimation of the generation time (T)

There is no previously available estimate for the generation time (T) of HCV, since tracking secondary infections is very difficult and the date of infection is in most cases unknown. Some workers have suggested approximating T using the duration of infectiousness (1/(γ+μ)) [48], which for HCV is around 25 years (i.e. 1/γ = 40 years and 1/μ = 70 years) (Table S3). If we assume that secondary infections follow a Poisson process within the duration of infectiousness (1/(γ+μ)) (i.e. if we perform a simulation of random secondary infections within 25 years of infectiousness), then the mean time between a primary infection and its subtending secondary infections is similarly high (∼12.5 years) regardless of the average number of secondary infections. Such values are epidemiologically and empirically unrealistic for many HCV epidemics: we know that IDUs usually get infected within 2 years after initiating injection [49]. By combining Equations 2, 3 and 4 we can investigate how T depends on the proportion of transmitters (u) and vice versa (Table 3, Figure 3):

$$T = \frac{\mathrm{var}(Z)}{\mathrm{PTP}} = \frac{R_0}{\mathrm{PTP}}\left[1 + \frac{(1-u)\,R_0}{u}\right] \qquad (5)$$

Table 3

Sensitivity analysis of the transmission parameters (var(Z), u, R 0,a) accounting for different generation times (T) using the two-group (transmitter, non-transmitter) model of secondary infections (Eq.1).

	R₀	T	var(Z)	u	R _0,a
1a	3.4	1	25.8	0.34	9.99
		2	51.6	0.19	17.58
		10	258	0.04	78.28
		25	645	0.02	192.11
1b	4.5	1	15.6	0.65	6.97
		2	31.2	0.43	10.43
		10	156	0.12	38.17
		25	390	0.05	90.17
3a	11.5	1	43.4	0.81	14.27
		2	86.8	0.64	18.05
		10	434	0.24	48.24
		25	1085	0.11	104.85
4a	2.4	1	27.8	0.18	12.98
		2	55.6	0.1	24.57
		10	278	0.02	117.23
		25	695	0.01	290.98

Contrasting the proportion of transmitters (u) with the proportion of IDUs provides information about epidemiologically probable generation times (T); i.e. we do not expect the proportion of transmitters to be smaller than the proportion of IDUs in the same population.
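The rows of Table 3 can be approximately reproduced with the sketch below. It assumes var(Z) = PTP·T together with the zero-inflated Poisson moments E(Z) = R0 = u·R0,a and var(Z) = R0·[1 + (1 − u)·R0,a], which rearrange to R0,a = var(Z)/R0 − 1 + R0 and u = R0/R0,a. The function name is hypothetical; the R0 and PTP inputs are the point estimates from Table 2.

```python
from typing import Tuple


def transmitter_parameters(r0: float, ptp: float, generation_time: float) -> Tuple[float, float, float]:
    """Return (var_z, u, r0_a) for the two-group model.

    Assumes var(Z) = PTP * T, E(Z) = R0 = u * R0_a and
    var(Z) = R0 * (1 + (1 - u) * R0_a), which rearranges to
    R0_a = var(Z) / R0 - 1 + R0.
    """
    var_z = ptp * generation_time
    r0_a = var_z / r0 - 1.0 + r0
    if r0_a < r0:
        raise ValueError("var(Z) is too small for this model (would need u > 1)")
    return var_z, r0 / r0_a, r0_a


estimates = {"1a": (3.4, 25.8), "1b": (4.5, 15.6), "3a": (11.5, 43.4), "4a": (2.4, 27.8)}
for subtype, (r0, ptp) in estimates.items():
    for t_years in (1, 2, 10, 25):
        var_z, u, r0_a = transmitter_parameters(r0, ptp, t_years)
        print(f"{subtype}  T={t_years:>2}  var(Z)={var_z:7.1f}  u={u:.2f}  R0,a={r0_a:7.2f}")
```

Longer assumed generation times force a smaller proportion of transmitters, which is the pattern the comparison with the proportion of IDUs exploits.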

Figure 3

Contour plots showing how generation time (T), basic reproductive number (R 0) and the proportion of transmitters in the population (u) co-vary.

Gray bands highlight different values of u. The area between the white dashed lines represents R 0 values estimated by sensitivity analysis of mortality and recovery rate (Table S3). The area between the yellow dashed lines represents the 95% confidence limits of R 0 values estimated assuming 40 years of infectivity and 70 years of life expectancy. The black dots show the maximum T value for each subtype, which is defined by empirical values for u and the median values of R 0 (see text).

We assume that T is constant, which is reasonable for the exponential phase of the epidemic that we focus on [50]–[53]. Equation (5) shows that T is maximized at the smallest plausible value of u. The known epidemiology of HCV in IDUs suggests that the proportion of transmitters (u) will not be smaller than the proportion of IDUs (i.e. every IDU is likely to have transmitted), at least in our subtype 1a, 3a and 4a outbreaks, which are driven by intravenous drug use. Thus an epidemiologically meaningful maximum T value can be obtained by setting u equal to the proportion of IDUs in the population (Figure 3). Using Greek surveillance data on the proportion of HCV infections of each subtype associated with IDU [24] we estimate that the maximum T (Figure 3, Tables 2 and 3) for subtype 1a (IDU: 26%) is 1.4 years, for subtype 3a (IDU: 47%) is 3.7 years and for subtype 4a (IDU: 20%) is 0.9 years. For the iatrogenic (non IDU-driven) epidemic of 1b (IDU: <10%) we estimate the maximum T to be close to the approximate duration of infectiousness (∼20 years) [note that we treat IDUs as transmitters even though this epidemic is not IDU-driven; this is due to their engagement in repeated paid blood donation up to the end of the 1970s] [54]. These estimates of T for subtypes 1a, 3a and 4a are more compatible with the natural history of the disease than those based on the duration of infectiousness (∼12.5 years). The probability of secondary infection per contact is expected to be higher during the first year of infection, when viral load is 10 times greater than later in infection [55], [56]. Also, in the first year patients are less likely to have ceased or reduced the high-risk behavior (e.g. IDU) that led them to be infected. Taken together, this suggests that secondary infections are more likely during the first year of infection. For subtype 1b the estimated T is artificially inflated due to its transmission route (see below).
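The maximum generation times quoted above can be recomputed with a few lines of code. The sketch assumes T = var(Z)/PTP with var(Z) = R0·[1 + (1 − u)·R0/u], evaluated at u equal to the proportion of IDUs; the function name is hypothetical and the inputs are the point estimates listed in Table 2.

```python
def max_generation_time(r0: float, ptp: float, u_min: float) -> float:
    """Upper bound on T obtained by setting u to its smallest plausible value.

    Uses T = var(Z) / PTP with var(Z) = R0 * (1 + (1 - u) * R0 / u).
    """
    if not 0.0 < u_min <= 1.0:
        raise ValueError("u_min must lie in (0, 1]")
    var_z = r0 * (1.0 + (1.0 - u_min) * r0 / u_min)
    return var_z / ptp


# (R0, PTP, proportion of IDUs) per subtype, as reported in Table 2.
table2 = {"1a": (3.4, 25.8, 0.26), "1b": (4.5, 15.6, 0.06),
          "3a": (11.5, 43.4, 0.47), "4a": (2.4, 27.8, 0.20)}
for subtype, (r0, ptp, idu_share) in table2.items():
    print(f"{subtype}: maximum T = {max_generation_time(r0, ptp, idu_share):.1f} years")
```

Because the expression decreases as u grows, any proportion of transmitters larger than the proportion of IDUs would give a shorter T, so these values act as upper bounds under the stated assumptions.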

Analysing the transmission diversity of HCV epidemics

We used equations (3) and (4) to estimate the basic reproductive number of the transmitters (R 0,a) and the variability in onward transmission, given the values for u, PTP, R 0 and T obtained above (Table 2). We estimate that for HCV subtypes 1a, 1b, 3a and 4a the R 0,a values ranged from 12 to 75 and the 99th percentile SSE from 18 to 83 secondary infections (Table 2, Figure 4, Figure S4). Compared to directly-transmitted pathogens, HCV epidemics generally have large 99th percentile SSE values, at least at the level of those for SARS and smallpox. For outbreaks of subtypes 1a, 1b, 3a and 4a investigated here, we estimate that 80% of the infections are caused by approximately 20%, 5%, 35% and 15% of the most infectious individuals, respectively (Figure 5).
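The 99th percentile SSE can be sketched as a quantile of the zero-inflated Poisson distribution: the smallest number of secondary infections z such that at least 99% of all infected individuals (transmitters and non-transmitters combined) cause z or fewer. The function names are hypothetical, and the (u, R0,a) pairs are taken from Table 2.

```python
import math


def zip_cdf(z: int, u: float, r0_a: float) -> float:
    """P(Z <= z) for the zero-inflated Poisson model."""
    poisson_cdf = sum(
        math.exp(-r0_a + i * math.log(r0_a) - math.lgamma(i + 1)) for i in range(z + 1)
    )
    return (1.0 - u) + u * poisson_cdf


def sse_threshold(u: float, r0_a: float, top_fraction: float = 0.01) -> int:
    """Smallest z with P(Z <= z) >= 1 - top_fraction (the 99th percentile by default)."""
    if not 0.0 < top_fraction < 1.0:
        raise ValueError("top_fraction must lie strictly between 0 and 1")
    z = 0
    while zip_cdf(z, u, r0_a) < 1.0 - top_fraction:
        z += 1
    return z


# (u, R0,a) per subtype from Table 2.
for subtype, (u, r0_a) in {"1a": (0.26, 13.1), "1b": (0.06, 75.0),
                           "3a": (0.47, 24.5), "4a": (0.20, 12.0)}.items():
    print(subtype, sse_threshold(u, r0_a))
```

For subtype 4a (u = 0.2, R0,a = 12) this quantile is 18, in agreement with the value listed in Table 2.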

Figure 4

Estimated distributions of the number of secondary infections per primary infection for each HCV subtype.

Figure 5

Cumulative proportion of onward infection versus the infected population ranked by the number of secondary infections they create.

80% of onward infections is indicated with a grey horizontal line. The proportion of the population that generates 80% of onward infections is shown by a vertical dashed line. HCV subtype 1a is close to the 80-20 rule (i.e. 80% of the infections are caused by the most infectious 18%).

The subtype 1b epidemic is the oldest and most prevalent in Greece; it is characterised by a small proportion of IDUs (6%) and was spread through the use of contaminated blood and blood products. The very large number of secondary infections for each member of the transmitter population (R 0,a = 75), the high degree of superspreading (SSE 99th percentile = 83) and the long generation time (T∼20 years) are compatible with the expected transmission dynamics of blood transfusions in the 1960s and 1970s. Historically, subtype 1b infections in Greece are attributed to the use of imported pooled plasma products, a practice that increased the probability of infecting dozens of individuals from a single contaminated batch; the plasma products could be stored and distributed over many years, leading to an artificially large “generation time”. Moreover, within Greece, infected IDUs during the 1960s and 1970s practiced repeated paid blood donations as a source of income. The reported dynamics of HCV-1b are typical of older (pre-1990s) HCV epidemics and do not apply to contemporary transmission (except in rare instances when transfusion safety breaks down). Similar trends in blood transfusion as a risk factor for HCV have been documented in many developed countries [46], [57]–[60]. On the other hand, the epidemics of subtypes 1a, 3a and 4a have higher proportions of IDUs (26%, 47% and 20% respectively) [24] and are typical of modern HCV epidemics in Western societies. For these epidemics the higher proportion of IDUs resulted in almost proportionally higher mean and variance in the number of secondary infections. The dynamics of these epidemics are still operating in the developed world and the estimated transmission parameters can be used to design mitigating strategies.
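The share of individuals responsible for 80% of onward infections (the quantity read off Figure 5) can be approximated under the zero-inflated Poisson model with the sketch below. Individuals are ranked from the highest to the lowest number of secondary infections, and the class that crosses the 80% mark is split proportionally. The function name and the interpolation rule are assumptions made for illustration, not the procedure used to draw the figure.

```python
import math


def fraction_causing_share(u: float, r0_a: float, share: float = 0.8, z_max: int = 1000) -> float:
    """Fraction of infected individuals (ranked by Z, highest first) that
    accounts for `share` of all secondary infections under the
    zero-inflated Poisson model."""
    if not 0.0 < share < 1.0:
        raise ValueError("share must lie strictly between 0 and 1")
    mean = u * r0_a
    pop, inf = 0.0, 0.0
    for z in range(z_max, 0, -1):
        # Probability that an infected individual causes exactly z infections.
        p = u * math.exp(-r0_a + z * math.log(r0_a) - math.lgamma(z + 1))
        gain = z * p / mean  # share of all infections caused by this class
        if inf + gain >= share:
            # Take only the part of this class needed to reach the target share.
            return pop + p * (share - inf) / gain
        pop += p
        inf += gain
    return pop


# Subtype 1a: u = 0.26 and R0,a = R0 / u with R0 = 3.4 (Table 2).
print(f"{fraction_causing_share(0.26, 3.4 / 0.26):.2f}")
```

Since non-transmitters contribute no infections, the returned fraction can never exceed u, so a small proportion of transmitters directly concentrates onward transmission.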

Limitations of the study

Phylogenetic analysis suggests that the sub-epidemics of HCV in Greece are the result of multiple introductions (i.e. they are non-monophyletic; Figure S1), which implies that estimates of Ne(t)T near the root of each subtype phylogeny may be biased upwards (because lineages fail to coalesce due to population structure). Two arguments suggest this is not a significant issue in our analysis. First, the trajectories of N(t) and Ne(t)T, which were estimated from separate data sources, closely correspond in four independent epidemics (in scale and shape) and N was obtained from epidemiological surveillance data of wholly Greek origin. Second, it is reasonable to assume that coalescent events within the exponential phase (the period during which we compared N(t) and Ne(t)T) did occur within Greece. That is, coalescences close to the root of each phylogeny (which may represent transmission outside Greece) were not used in our analysis. In the worst case scenario, in which Ne(t)T has been overestimated, our estimate of PTP can be considered a lower bound and variation in onward transmission might be even greater than reported here. A second limitation of our study is that our estimate of PTP does not incorporate statistical uncertainty in the estimation of N(t) and Ne(t)T. In the future, we aim to develop a Bayesian approach to incorporate both sources of uncertainty and provide a proper posterior distribution for PTP. Our approach provides information about superspreading from analytical relationships between the rate of coalescence (Ne), viral generation time (T), and prevalence (N) and thus is independent of phylogenetic topology. It is therefore complementary to alternative approaches that investigate how non-random contact structures affect the topology of a transmission tree [61]. At this point we should emphasize that further exploration and extension of the approach is required. For example, a zero-inflated Poisson distribution of secondary infections does not fit most HIV-1 epidemics. A power-law distribution resulting from sexual-contact analysis would provide a more realistic approximation, for which a detailed analysis of the effect of network structure on PTP needs to be performed. Finally, simulation studies could explore the robustness of the approach under a wider range of epidemiologic scenarios, whilst larger datasets could empirically replicate our findings to support wider applicability of this approach, e.g. to inform public health policies.

Conclusion

We have shown that phylodynamic methods can be combined with epidemiological surveillance data to estimate the variability in ongoing transmission of a chronic viral epidemic, and to investigate its generation time. Both parameters are critical to the design of effective control measures but are very difficult to estimate from surveillance data alone. We tested the framework on a well-characterised set of HCV epidemics in Greece, showing that the results are epidemiologically coherent and suggesting that this approach could be a new tool for public health. We expect our approach to be most readily adapted to other chronic viral diseases such as HIV, but it could also be applied to directly transmitted (e.g. Influenza) or vector-borne (e.g. Dengue) viral epidemics, for which superspreading events and generation times are largely unknown.

Methods

Ethics statement

Study approval was granted by the IRB of Athens University Medical School.

Estimation of chronic HCV incidence and prevalence through time

The overall and genotype-specific incidence of chronic HCV infection has been estimated in previous studies using back-calculation [24], [25]. Briefly, the distribution of transmission risk groups among HCV infected individuals was obtained from 943 Greek patients enrolled in treatment studies [24], [25]. Enrolment took place between 1995 and 2000; patients were adults (18–70 years old) with a histological diagnosis of chronic hepatitis. Injecting drug use, transfusion, other and sporadic transmissions were reported by 24%, 32%, 6% and 38% of the patients, respectively. The distribution of the dates of infection within each transmission group was determined using data from 456 Greek patients enrolled in treatment studies with known dates of infection. We extended the back-calculation approach to estimate subtype-specific incidence of chronic HCV [25] in Greece as follows: a) we estimated the number of individuals infected with HCV in Greece, b) we obtained the distribution of HCV subtypes by year of onset for each transmission group within the infected population and c) we calculated subtype-specific incidence according to transmission group using the number of new infections in the past for each transmission group and the corresponding distribution of HCV subtypes by year of infection. The estimates for each transmission group were then combined to obtain an estimate of the overall genotype-specific incidence and prevalence during 1940–1990.

HCV sequence data

Correct sampling is crucial to the inference of epidemic history from genetic data [62]. All available 1a, 1b, 3a and 4a subtype samples from distinct HCV-infected patients, tested within a 12-year period (1994–2006), were sorted according to their sampling dates, and at least one sample was randomly selected and sequenced for every 6-month interval. For cases in which no sample was available in a specific 6-month interval, the closest sample to that period was selected. Besides the sampling date, additional information was recorded for each sample: patient's age, sex, transmission group and treatment history (Table S1). Samples were excluded where the patient had a prior history of antiviral therapy and/or HIV co-infection, since these factors are believed to affect the intrahost evolution of the virus, thus (theoretically) introducing a bias into the estimation of substitution rate [63]. Sequencing of the HCV E2P7NS2 and NS5B regions was performed as previously described [26].
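The interval-based selection described above can be sketched as follows. This is one possible reading, not the authors' procedure: the function names are hypothetical, exactly one sample is drawn per 6-month interval, and "closest sample to that period" is interpreted as the sample whose date is nearest to the midpoint of the empty interval.

```python
import random
from datetime import date

def half_year_bin(d):
    """Map a date to its (year, half) 6-month interval."""
    return (d.year, 1 if d.month <= 6 else 2)

def bin_midpoint(year, half):
    """Approximate midpoint of a 6-month interval (an assumption of this sketch)."""
    return date(year, 4, 1) if half == 1 else date(year, 10, 1)

def select_samples(samples, start_year, end_year, seed=0):
    """samples: list of (sample_id, sampling_date) tuples.

    Returns a dict mapping each (year, half) interval to one chosen sample.
    """
    rng = random.Random(seed)
    by_bin = {}
    for sample_id, sampled_on in samples:
        by_bin.setdefault(half_year_bin(sampled_on), []).append((sample_id, sampled_on))

    chosen = {}
    for year in range(start_year, end_year + 1):
        for half in (1, 2):
            candidates = by_bin.get((year, half))
            if candidates:
                # Random choice among samples falling inside the interval.
                chosen[(year, half)] = rng.choice(candidates)
            else:
                # Empty interval: fall back to the sample nearest the midpoint.
                mid = bin_midpoint(year, half)
                chosen[(year, half)] = min(samples, key=lambda s: abs((s[1] - mid).days))
    return chosen

# Made-up sample list for illustration.
samples = [("a", date(1994, 2, 1)), ("b", date(1994, 3, 1)), ("c", date(1995, 8, 1))]
picks = select_samples(samples, 1994, 1995)
for interval, (sample_id, sampled_on) in sorted(picks.items()):
    print(interval, sample_id, sampled_on)
```

Note that with this fallback the same sample can be assigned to more than one interval; deduplicating the result would be a reasonable extra step.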

Estimation of basic reproductive number (R 0)

We estimated R 0 assuming that the population is large enough to follow a deterministic Susceptible-Infected-Removed (SIR) model [3]:

$$N(t) = N(0)\,e^{(R_0 - 1)(\gamma + \mu)t}$$

where N(t) is the number of infected people at time t (prevalent cases), N(0) is the number of infected people at the baseline of the exponential growth phase, γ is the recovery rate of the disease and μ is the death rate in the general population. This equation is valid for the exponential phase of epidemic growth. To estimate subtype-specific R 0 we used the nl routine in STATA to fit the above equation to the estimated N(t) curve during the exponential growth phase, assuming an average life expectancy (1/μ) of 70 years and an average infectivity period (1/γ) of 40 years (i.e. excluding host mortality), which are plausible estimates for the study population (Table S3). Note that if N(t) and Ne(t) are highly correlated (such that N(t)/N(0) is equal to Ne(t)/Ne(0)), then equation 6 shows that we can obtain equivalent estimates of R 0 from the skyline plot.
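As an illustration of the fitting step, the sketch below assumes the standard SIR exponential-phase relation N(t) = N(0) exp((R0 − 1)(γ + μ)t) and recovers R0 from the slope of log N(t) against time. It deliberately swaps in an ordinary log-linear least-squares fit (via `numpy.polyfit`) for the nonlinear fit performed with STATA's nl routine, so the two need not give identical estimates on noisy data. The function name and the synthetic prevalence curve are hypothetical.

```python
import numpy as np

def estimate_r0(years, prevalence, infectious_period=40.0, life_expectancy=70.0):
    """Estimate R0 and N(0) from prevalent cases in the exponential phase.

    Assumes N(t) = N(0) * exp((R0 - 1) * (gamma + mu) * t).
    """
    gamma = 1.0 / infectious_period   # recovery rate per year
    mu = 1.0 / life_expectancy        # background death rate per year
    t = np.asarray(years, dtype=float) - float(years[0])
    # The slope of log N(t) against t is the exponential growth rate r.
    r, log_n0 = np.polyfit(t, np.log(prevalence), 1)
    return 1.0 + r / (gamma + mu), float(np.exp(log_n0))

# Synthetic (made-up) prevalence curve generated with R0 = 3 and N(0) = 500.
years = np.arange(1960, 1991)
gamma, mu = 1 / 40, 1 / 70
true_r0 = 3.0
n = 500.0 * np.exp((true_r0 - 1) * (gamma + mu) * (years - years[0]))

r0_hat, n0_hat = estimate_r0(years, n)
print(r0_hat, n0_hat)
```

Because R0 = 1 + r/(γ + μ), the estimate is sensitive to the assumed infectivity period and life expectancy, which is presumably why a sensitivity analysis is reported in Table S3.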

Identification of the exponential growth phase

To identify the exponential growth phase of each Greek HCV epidemic, we first defined the end of the exponential phase as 1990, to reflect the introduction of anti-HCV screening after the virus' discovery in 1989. The start of the exponential phase was detected using two methods. First, we visually inspected the epidemic time series and selected the first time point after 6 years of consecutive increases of N or NeT. Second, we employed a previously published algorithm used in quantitative PCR experiments, where the identification of the exponential phase of a growth curve is crucial [64]. Both methods provided closely similar results (±3 years).

Supporting Information

Phylogenetic trees (midpoint rooted) of the Greek isolates (blue circles) along with a global sample (all published sequences available as of April 1st, 2010) on NS5B (nt 8297–8597). (TIF)

Upper and lower limits of the 95% Highest Posterior Density (HPD) of the skyline plots (NeT) and of the 95% Confidence Intervals (C.I.) of the back-calculated number of prevalent cases (N). (TIF)

Scatter plots of N against NeT for the exponential growth phase along with the fitted regression line that passes through the origin (i.e. suppressing the constant term). Note that the regression has been performed correcting for auto-correlation according to the Newey-West method. We note an apparent deviation from linearity due to stochastic noise independently present in the auto-correlated series. This deviation disappears when only independent data points are included in the plot. (TIF)

Cumulative distribution of the secondary infections for the Greek HCV epidemics (solid lines) and directly transmitted pathogens (dashed lines) based on estimates provided by Lloyd-Smith et al. [30]. (SSE = Superspreading events) (TIF)

A. Demographic features and experimental efficiency in the sample used for the phylodynamic analysis. B. Demographic features of the patients used for the epidemiological analysis. (PDF)

Estimated parameters of the phylodynamic analysis. (PDF)

Sensitivity analysis for the estimated medians of the Basic Reproductive Numbers (R 0). (PDF)

Regression analysis of the percentage of the risk group per genotype with the spread metrics PPT and R 0 per genotype in the study population: coefficients of determination (Pearson's R2) are shown with associated level of significance (P value). (PDF)

Supplementary information. (DOC)
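The first detection rule described under "Identification of the exponential growth phase" (the first time point after 6 years of consecutive increases) was applied by visual inspection, but it can also be written down programmatically. The sketch below is one interpretation of that rule, with a hypothetical function name and made-up values; it does not implement the second, qPCR-derived algorithm [64].

```python
def exponential_phase_start(years, values, run_length=6):
    """Return the first year reached after `run_length` consecutive
    year-on-year increases in `values`, or None if no such run exists."""
    streak = 0
    for i in range(1, len(values)):
        # Extend the run on a strict increase, otherwise reset it.
        streak = streak + 1 if values[i] > values[i - 1] else 0
        if streak == run_length:
            return years[i]
    return None

# Made-up yearly series of N (or NeT) for illustration.
years = list(range(1950, 1965))
values = [10, 9, 11, 10, 10, 12, 13, 15, 18, 22, 27, 33, 40, 48, 57]

start = exponential_phase_start(years, values)
print(start)
```

Whether the start should be taken as the last year of the run (as here) or the first is a judgment call; given the reported agreement of ±3 years between methods, either convention seems compatible with the text.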
