
Assessing the danger of self-sustained HIV epidemics in heterosexuals by population based phylogenetic cluster analysis.

Huldrych F Günthard^1,2, Roger D Kouyos^1,2, Teja Turk^1,2, Nadine Bachmann^1,2, Claus Kadelka^1,2, Jürg Böni², Sabine Yerly³, Vincent Aubert⁴, Thomas Klimkait⁵, Manuel Battegay⁶, Enos Bernasconi⁷, Alexandra Calmy⁸, Matthias Cavassini⁹, Hansjakob Furrer¹⁰, Matthias Hoffmann¹¹, V Aubert, M Battegay, E Bernasconi, J Böni, D L Braun, H C Bucher, A Calmy, M Cavassini, A Ciuffi, G Dollenmaier, M Egger, L Elzi, J Fehr, J Fellay, H Furrer, C A Fux, H F Günthard, D Haerry, B Hasse, H H Hirsch, M Hoffmann, I Hösli, C Kahlert, L Kaiser, O Keiser, T Klimkait, R D Kouyos, H Kovari, B Ledergerber, G Martinetti, B Martinez de Tejada, C Marzolini, K J Metzner, N Müller, D Nicca, G Pantaleo, P Paioni, A Rauch, C Rudin, A U Scherrer, P Schmid, R Speck, M Stöckle, P Tarr, A Trkola, P Vernazza, G Wandeler, R Weber, S Yerly.

Abstract

Assessing the danger of a transition of HIV transmission from a concentrated to a generalized epidemic is of major importance for public health. In this study, we develop a phylogeny-based statistical approach to address this question. As a case study, we use it to investigate the trends and determinants of HIV transmission among Swiss heterosexuals. We extract the corresponding transmission clusters from a phylogenetic tree. To capture the incomplete sampling, the delayed introduction of imported infections to Switzerland, and potential factors associated with the basic reproductive number R0, we extend the branching process model to infer transmission parameters. Overall, R0 is estimated to be 0.44 (95% confidence interval 0.42-0.46), and it is decreasing by 11% per 10 years (4%-17%). Our findings indicate diminishing HIV transmission among Swiss heterosexuals, far below the epidemic threshold. More generally, our approach allows the danger of self-sustained epidemics to be assessed from any viral sequence data.


Keywords: HIV; basic reproductive number; concentrated vs. generalised epidemic; epidemiology; global health; heterosexual; infectious disease; microbiology; molecular epidemiology; transmission; virus


Year: 2017 PMID: 28895527 PMCID: PMC5650480 DOI: 10.7554/eLife.28721

Source DB: PubMed Journal: Elife ISSN: 2050-084X Impact factor: 8.140

Introduction

Epidemics of HIV and other blood-borne and sexually transmitted diseases (for instance syphilis, HBV and HCV) can be subdivided into concentrated and generalized epidemics. In the former, rapid transmission of the infectious agent is restricted to core transmission groups involved in high-risk behaviors (such as men who have sex with men and injecting drug users), whereas a generalized epidemic refers to fast pathogen spread in the heterosexual (general) population, resulting in higher overall disease prevalence. Mechanistically, the key factor determining whether HIV transmission is concentrated or generalized is the ability of HIV to spread among heterosexuals. If the epidemic in this population is not self-sustained, the HIV epidemic remains concentrated; otherwise the virus spreads rapidly in the broad population, leading to a generalized HIV epidemic. In most resource-rich settings HIV transmission is concentrated, that is, driven mostly by transmission among men who have sex with men (MSM) and injecting drug users (IDU), whereas the limited transmission among heterosexuals is maintained by either imported infections or spillovers from other transmission groups (Kouyos et al., 2010; von Wyl et al., 2011; Ragonnet-Cronin et al., 2016; Xiridou et al., 2010; Esbjörnsson et al., 2016; Sallam et al., 2017). This suggests that in most Western European countries and similar epidemiological settings the basic reproductive number R0 among heterosexuals is below 1. However, it is not clear how far the epidemic in heterosexuals is from being self-sustained. Moreover, the change in HIV transmission among heterosexuals over time is another important, yet unknown, factor, especially given the evidence of increasing risky sexual behavior (Kouyos et al., 2015). It is therefore crucial to assess both the transmission and its time trend in order to obtain meaningful insights into the epidemic.

Assessing the subcritical transmission of HIV in the general population shares some methodological similarities with the analysis of stage III zoonoses, for instance monkeypox (Wolfe et al., 2007), which also exhibit stuttering transmission chains. Both cases follow source-sink dynamics, i.e., a flux of infections from a subpopulation in which the disease is self-sustained to a population in which it is not. For stage III zoonoses and tuberculosis, it has been shown that the distribution of outbreak sizes can be used to quantify the pathogen spread (Blumberg and Lloyd-Smith, 2013b; Blumberg and Lloyd-Smith, 2013a; Borgdorff et al., 1998). The fundamental approach of our study is to apply this concept to transmission of HIV in the general population. However, there are two key differences between emerging zoonotic pathogens and human-to-human infectious agents. Firstly, while contact tracing data are not available for many sexually transmitted infections (STI), the viral sequences carry valuable information about the transmission chain size distribution. Thus, the approach of quantifying transmissibility from chain size distributions needs to be combined with a tool to derive clusters from viral sequences. Secondly, compared to animal-human transmission, the delayed introduction of the index case of an STI or blood-borne virus to the subpopulation of interest plays an important role, especially for viruses like HIV with long infectious periods in the absence of treatment and higher transmissibility during the acute phase (Marzel et al., 2016; Powers et al., 2011; Rieder et al., 2010; Rodger et al., 2016; Hollingsworth et al., 2008; Cohen et al., 2011b; Cohen et al., 2011a; Cohen et al., 2016). This is especially important because a considerable fraction of HIV cases in heterosexuals is found in migrants (Del Amo et al., 2004; von Wyl et al., 2011; European Centre for Disease Prevention and Control/WHO Regional Office for Europe, 2016). If, for example, a migrant infected with HIV abroad moves to Switzerland in the chronic stage of the infection, he/she has (from the perspective of the Swiss population) lost some transmission potential upon entering the Swiss heterosexual transmission network.

In order to quantify the subcritical transmission, we combine phylogenetic cluster analysis with an adapted version of a branching process model based estimator that derives the basic reproductive number R0 from the size distribution of transmission chains. We further extend this approach to determine the impact of calendar time and other potential determinants on R0, especially in order to assess whether R0 exhibits an increasing time trend or is high in particular subgroups. Applying this method to the phylogenetic transmission clusters among heterosexuals in the Swiss HIV Cohort Study (SHCS), we can assess transmission of HIV in this population and in particular the risk of a generalized HIV epidemic together with the main determinants of transmission.

Results

We developed a method to assess how far HIV transmission in populations with a basic reproductive number R0 < 1 is from the epidemic threshold, that is, how far it is from being self-sustained in these populations (see Materials and methods). A classical application of this method is HIV-1 transmission in heterosexuals in settings with a concentrated epidemic. Heterosexual HIV-1 transmission in Switzerland is a case in point for such a non-self-sustained HIV epidemic. We identified transmission clusters among heterosexuals in the SHCS. These clusters were small in size (Table 1) and comprised individuals of broad demographic background (see Table 2). Based on the most likely geographic origin of the transmission clusters, we classified transmission chains as being either of Swiss origin, that is, representing introductions from other transmission groups in Switzerland into the heterosexual population, or of non-Swiss origin. For the latter transmission chains, we assumed that the R0 of the index case was reduced by a factor ρindex = 0.35 (Table 1; see Materials and methods). To take into account the imperfect sampling density, we fixed the subtype-dependent sampling probabilities based on the results of Shilaih et al. (2016), corrected by the proportion of HIV-infected individuals linked to care (based on Kohler et al., 2015) and the fraction of heterosexuals from the SHCS with an HIV sequence in the phylogenetic tree. The model parameters used in this study are summarized in Table 1 (see Sensitivity analyses, Appendix 1—figure 1 and Appendix 1—figure 2 for the corresponding sensitivity analyses).

Table 1.

Transmission chain size distribution and model parameters.

	Subtype						Overall
	B	C	01_AE	02_AG	A	Other	Overall
Total number of chains, n (%)	1643 (53%)	322 (10%)	239 (7.7%)	331 (11%)	327 (11%)	238 (7.7%)	3100 (100%)
Chain size, n (%)
1	1437 (87%)	280 (87%)	206 (86%)	272 (82%)	269 (82%)	195 (82%)	2659 (86%)
2	158 (9.6%)	34 (11%)	31 (13%)	40 (12%)	44 (13%)	36 (15%)	343 (11%)
3	30 (1.8%)	7 (2.2%)	1 (0.42%)	10 (3.0%)	10 (3.1%)	6 (2.5%)	64 (2.1%)
4	12 (0.73%)	-	1 (0.42%)	6 (1.8%)	3 (0.92%)	1 (0.42%)	23 (0.74%)
5	1 (0.06%)	1 (0.31%)	-	2 (0.6%)	1 (0.31%)	-	5 (0.16%)
6	1 (0.06%)	-	-	1 (0.3%)	-	-	2 (0.06%)
7	1 (0.06%)	-	-	-	-	-	1 (0.03%)
8	2 (0.12%)	-	-	-	-	-	2 (0.06%)
9	1 (0.06%)	-	-	-	-	-	1 (0.03%)
Sampling probability, p (SD)	0.39	0.29	0.34	0.26	0.33	0.29	0.35 (0.05)
Chain origin, n (%)
Swiss (ρindex=1)	948 (58%)	36 (11%)	36 (15%)	36 (11%)	47 (14%)	30 (13%)	1133 (37%)
non-Swiss (ρindex=0.35)	695 (42%)	286 (89%)	203 (85%)	295 (89%)	280 (86%)	208 (87%)	1967 (63%)
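
As a rough illustration of how a chain size distribution carries information about R0, the sketch below applies the textbook relation for a subcritical branching process with Poisson offspring, complete sampling and no index case adjustment, for which the expected chain size is 1/(1 − R0). This is not the extended estimator used in this study: it ignores the sampling probabilities and ρindex listed in Table 1, so its output is not comparable to the reported estimates. The variable names are illustrative.

```python
# Overall chain size counts from Table 1 (chain size -> number of chains).
chain_counts = {1: 2659, 2: 343, 3: 64, 4: 23, 5: 5, 6: 2, 7: 1, 8: 2, 9: 1}

n_chains = sum(chain_counts.values())
n_cases = sum(size * count for size, count in chain_counts.items())
mean_size = n_cases / n_chains

# Simplifying assumptions (NOT the model of the study): Poisson offspring,
# every case observed, index case transmits like any other case.
# Under these assumptions E[chain size] = 1 / (1 - R0).
naive_r0 = 1.0 - 1.0 / mean_size

print("chains:", n_chains, "cases:", n_cases)
print("mean chain size:", round(mean_size, 3))
print("naive R0 (uncorrected):", round(naive_r0, 3))
```

The chain and case totals computed here reproduce the totals of Table 1 and Table 2 (3100 chains, 3698 patients). The uncorrected value is expected to be too low, because unsampled cases shrink the observed chains and a reduced index case transmissibility produces shorter chains for the same R0.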

Appendix 1—figure 1.

Sensitivity analysis regarding the index case relative transmission potential.

Panel (i) shows the sensitivity of the R0 estimates from the baseline model and panel (ii) the sensitivity of the time trend factor. The colored lines represent the subtype-stratified analyses, while the results from the overall models are shown in gray. In the first sensitivity analysis, the ρindex of transmission chains of Swiss origin was held at 1 and the ρindex of chains of non-Swiss origin was varied (solid lines). In the second analysis, the ρindex of chains of Swiss and non-Swiss origin was the same (dashed lines). The dotted lines show the results from the sensitivity subanalysis including only the transmission chains of non-Swiss origin. The vertical and horizontal lines depict the parameters and estimates from the main analysis, respectively.

Appendix 1—figure 2.

Sensitivity analysis regarding the sampling density.

The index case relative transmission potential parameter ρindex was the same as in the main analyses, while the sampling densities were varied (x-axis). In the pooled analysis (larger plots) the sampling density was the same for all transmission chains. Panel (i) shows the corresponding estimates of the basic reproductive number R0, and the time trend factor estimates are displayed in panel (ii). The dotted vertical lines depict the sampling densities used for each subtype in our study (subtype-stratified plots) and the mean sampling density over all transmission chains (overall plots). The horizontal dotted lines represent the estimates from the main analysis.

R0 of the HIV transmission in Swiss heterosexuals

To obtain an overall estimate for the R0 of HIV transmission in Swiss heterosexuals, the baseline model was fitted to all of the previously described transmission chain data. In this baseline model, the R0 was estimated to be 0.44 (95% confidence interval (CI) 0.42-0.46). The fact that R0 was clearly below 1 (one-sided Wald test) indicated that HIV transmission is far from a self-sustained epidemic. Although the overall estimate was clearly below 1, individual subtypes represent different epidemiological settings and hence may have an R0 closer to the epidemic threshold. The subtype-stratified analyses indeed yielded a lower R0 for subtype B than for the non-B subtypes (Figure 1). The recombinant form CRF02_AG had the highest estimated R0. Despite these differences, the estimates for all subtypes were significantly below 1 in the one-sided test. Therefore, we concluded that there is no danger of a self-sustained HIV epidemic in Swiss heterosexuals for any HIV subtype.
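
To make the one-sided test concrete, the following sketch recomputes a Wald statistic from the overall estimate and 95% confidence interval reported in the abstract (0.44, 0.42-0.46). Two things are assumptions of this sketch rather than statements from the study: the standard error is back-calculated from a symmetric normal-approximation interval, and the hypotheses are taken to be H0: R0 ≥ 1 against H1: R0 < 1.

```python
import math

r0_hat = 0.44                 # overall estimate reported in the abstract
ci_low, ci_high = 0.42, 0.46  # reported 95% confidence interval
z_975 = 1.959964              # 97.5% quantile of the standard normal

# Assumption: the interval is estimate +/- z_975 * standard error.
se = (ci_high - ci_low) / (2 * z_975)

# Assumed hypotheses: H0: R0 >= 1 versus H1: R0 < 1.
z = (r0_hat - 1.0) / se

# Lower-tail probability P(Z <= z) of the standard normal distribution.
p_value = 0.5 * math.erfc(-z / math.sqrt(2))

print("standard error:", round(se, 4))
print("Wald z statistic:", round(z, 1))
print("one-sided p-value below 1e-10:", p_value < 1e-10)
```

With an interval this narrow, the estimate lies dozens of standard errors below the threshold of 1, which is why the test outcome is insensitive to the exact way the standard error is obtained.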

Figure 1.

Overall basic reproductive number and per subtype from stratified analysis.

The dark gray point indicates the overall basic reproductive number estimate (neglecting the transmission chain subtypes); the corresponding confidence interval is shown by the dark gray line and the gray-shaded band. The analogous results from the subtype-stratified analysis are represented by colored points and lines, each color corresponding to one of the subtypes (B, C, CRF01_AE, CRF02_AG or A) or the group of remaining subtypes (other).


Time trend of the R0

Despite consistently low R0 estimates, an increasing time trend for R0 would be a potential concern, especially if the trend predicted a crossing of the epidemic threshold in the near future. To investigate this, we fitted a univariate model with R0 as a linear function of the establishment date of the transmission chain. We found that overall the R0 is decreasing by 11% per 10 years (95% CI 4%-17%). The subtype-stratified analyses showed a consistently decreasing time trend across the subtypes, with the trend factors for subtype A and subtype B marking the two ends of the range. To better capture the changes of R0 over time, we included higher-order polynomials of the establishment date in our model (Figure 2). With the reference date set to 1 January 1996 (which corresponds to the median estimated date of infection; see Table 2), a cubic spline (without the linear term) was identified as the optimal model according to the Bayesian information criterion (BIC). This model exhibits a mild increase of the R0 from the mid-1980s to the mid-1990s, with a peak R0 reached in 1996 and followed by a steep and monotonic decrease. It is noteworthy that the time of peak R0 coincided with the introduction of highly active antiretroviral therapy. Shortly afterwards, the R0 started to decrease rapidly and has never rebounded. This extrapolation should, however, be taken with a grain of salt and seen as a trend rather than a prognosis, since only a few transmission chains have been observed for recent years (which is reflected by wide confidence intervals).
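
The relation between a per-decade trend factor and the R0 at a given establishment date can be illustrated with a small calculation. The sketch assumes a constant multiplicative trend (the 11% decrease per 10 years stated in the abstract corresponds to a factor of 0.89), which is a simplification and not necessarily the parameterization of the fitted model. The function name `r0_at` and the reference value 0.5 are hypothetical and chosen only for illustration.

```python
def r0_at(year, r0_ref, ref_year=1996.0, factor_per_decade=0.89):
    """R0 at a given establishment year under a constant multiplicative
    trend per 10 years (illustrative assumption, not the fitted model)."""
    return r0_ref * factor_per_decade ** ((year - ref_year) / 10.0)


# Equivalent per-year factor of an 11% decrease per decade.
annual_factor = 0.89 ** (1 / 10)
print("per-year factor:", round(annual_factor, 4))

# Hypothetical reference R0 of 0.5 at the reference date.
for year in (1986, 1996, 2006, 2016):
    print(year, round(r0_at(year, r0_ref=0.5), 3))
```

Under this assumption a decreasing factor can never push R0 above its reference value at later dates, so only an increasing trend would raise the question of when the epidemic threshold might be crossed.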

Figure 2.

Time trends for R0.

The upper smaller panels show the time trends for R0 from the subtype-stratified analyses, in which the R0 values were modeled as linear functions of the establishment date (i.e., for each subtype the time trend rate was assumed to be constant). The colored shaded bands correspond to the prediction bands. The (best-fitting) nonlinear time trend for R0 from the overall analysis is displayed in the lower panel (dark gray curve) together with its prediction band (gray-shaded area). The black points represent the R0 estimates from the analyses stratified by establishment year and the gray vertical lines the corresponding confidence intervals.

Table 2.

Patients’ demographic characteristics.

	Patients	Transmission chains (index case)
Total number, n	3698	3100
Age at estimated date of infection [in years], median (IQR)	29.2 (23.1—37.8)	28.8 (22.8—37.4)
Estimated date of infection, median (IQR)	Jun 1996 (Sep 1990—Nov 2001)	Nov 1995 (Sep 1989—May 2001)
Time to diagnosis [in years], median (IQR)	3.40 (1.66—5.24)	3.54 (1.78—5.43)
Reported sex with occasional partner [as fraction of FUPs*], median (IQR)	0.53 (0.09—0.89)	0.50 (0.07—0.88)
No available FUP^†, n (%)	250 (6.8%)	226 (7.3%)
Earliest CD4 count [per μL]^‡, median (IQR)	310 (143—510)	300 (134—507)

*Follow-up visit (FUP).

†Patients without FUP questionnaire regarding the sexual risk behavior. See Sensitivity analyses.

‡One patient did not have any available CD4 cell count. The missing value was imputed with the mean CD4 cell count.


Determinants of the HIV-transmission

Finally, we identified the characteristics associated with a higher R0 and therefore potential focal subpopulations in which the basic reproductive number could be above 1. The simplest model, containing only the linear terms of the risk factors, showed that the R0 is decreasing with the establishment date of the transmission chain and that all non-B subtypes have a higher R0 than subtype B, which was consistent with the findings from the univariate model and the subtype-stratified analyses. Moreover, we found that reporting sex with occasional partners and a longer time to HIV diagnosis of the index case are associated with a higher R0, whereas the earliest CD4 cell count and the age do not have significant effects (Figure 3).

Figure 3.

Effect of different factors on the basic reproductive number from the multivariate model with only linear factor terms.


The black square and the black line show the reference basic reproductive number and its confidence interval (for a transmission chain of subtype B which started on 1.1.1996, and in which the index case was diagnosed 3 years after the infection, was 32 years old upon infection, never reported having sex with an occasional partner and had an earliest CD4 cell count of 350 cells per μL). The vertical gray line separates the factors associated with lower R0 (left; effect factor < 1) from the factors contributing to higher R0 (right; effect factor > 1). The black points on this line refer to the reference transmission chain. The colored and dark gray lines represent the effect sizes from the multivariate model (black circles depicting the estimates) for different factors and their confidence intervals. The corresponding p-values are shown in the rightmost column. FUP, follow-up visit.

These trends remained robust (Figure 4) when allowing the covariables to enter the model non-linearly (for instance as polynomials, as in the case of the time trend above). The final multivariate model identified subtype, establishment date of the transmission chain, frequency of reporting sex with occasional partners and time to diagnosis of the index case as the significant risk factors associated with R0 (see Selection of the predictive models). Allowing nonlinear terms for the time to diagnosis provided a better goodness-of-fit than the linear model. The steep increase of R0 in the early/acute phase of the infection (see Figure 4) indicates the importance of early diagnosis (which is nowadays closely related to early treatment initiation), while the time to diagnosis becomes less relevant in cases diagnosed late in the chronic phase.

Figure 4.

Final multivariate model’s profile plots of factors associated with the basic reproductive number R0.

The vertical dotted lines depict the reference transmission chain (of subtype B, started on 1.1.1996, in which the observed index case did not report having sex with an occasional partner and was diagnosed 3 years after the infection). The left y-axis represents the basic reproductive number R0, whereas the right y-axis corresponds to the values of R0 relative to the baseline R0. The R0 as a function of a specific factor (with the other factors held fixed at the reference value) is displayed by the colored (for HIV-1 subtype) and the dark gray (establishment date, sexual risk behavior and time to diagnosis) lines. The vertical bars and the shaded bands correspond to the confidence intervals.


Discussion

Our approach demonstrates that viral sequences combined with basic demographic information can be successfully used not only to estimate the basic reproductive number of HIV in a subcritical setting, and thereby assess the danger of a generalized HIV epidemic, but also to shed light on the trends and other determinants of viral transmission. As a proof of concept, this approach was applied to HIV transmission in Swiss heterosexuals, for which we found an R0 far below the epidemic threshold with a decreasing time trend, indicating a low and decreasing danger of a generalized epidemic. Even though the Swiss HIV epidemic is captured in outstanding detail and representativeness by the SHCS, our approach can easily be used in other non-self-sustained epidemics, since viral sequences from genotypic resistance testing are nowadays routinely produced in most resource-rich settings. Moreover, the generalizability of our approach might be broadened to other settings and viruses due to the increased availability of viral sequences, boosted by decreasing sequencing costs, and the ability of the method to adjust for imperfect sampling.

To our knowledge, our study represents the first systematic assessment of the basic reproductive number for subcritical HIV transmission among heterosexuals, which makes it difficult to compare our results to other estimates. In addition, it was conducted in one of the most densely sampled settings. Most previous studies investigated the transmission route composition of larger transmission clusters across different B and non-B subtypes (Esbjörnsson et al., 2016; Chaillon et al., 2017; Ragonnet-Cronin et al., 2016; Sallam et al., 2017; Kouyos et al., 2010; von Wyl et al., 2011), or focused on homosexual men or injecting drug users as the main drivers of HIV transmission (Amundsen et al., 2004). Stadler et al. (2012) previously presented a birth-death process based analysis of HIV transmission in Switzerland. However, since this approach is restricted to sufficiently large clusters, it is not suitable for subcritical settings and might overestimate R0 due to selection bias. Hence, our approach, which is tailored to subcritical viral transmission, is complementary to theirs. Among other studies specific to heterosexual populations, Hughes et al. (2009) focused on clusters above a minimum size across non-B subtypes, and Xiridou et al. (2010) studied the impact of the sexual behavior of migrants on HIV prevalence, but neither directly assessed the danger of self-sustained epidemics.

Epidemiological differences between the HIV-1 subtypes, especially between B and non-B subtypes, have been pointed out previously (Kouyos et al., 2010; von Wyl et al., 2011). Yet the exact factors contributing to the differences are difficult to identify. On the one hand, the non-B subtypes are often seen in relation to infections imported from abroad, which could be introduced either by immigrants or by residents who were infected while temporarily abroad. A proportion of these introductions could be attributed to sex tourism (Rogstad, 2004). However, even the differences between the various non-B subtypes could be substantial, as they represent different epidemiological settings. For instance, CRF01_AE is often found in Asians and most likely originates from Southeast Asia (Angelis et al., 2015), while subtypes originating from Africa, such as CRF02_AG (Mir et al., 2016), are frequently found in people of black ethnicity. Additionally, poverty and the different policies regulating prostitution worldwide also have an impact on transmission patterns, for example on the rate of condom use and on access to HIV testing and treatment (Shannon et al., 2015). On the other hand, disentangling the effects of different epidemiological characteristics and even of the strains remains challenging, as R0 was significantly affected by the HIV subtype even in the multivariate model (Figure 3).

One of the key components of our model is the index case relative transmission potential ρindex, which is also associated with some degree of uncertainty. To illustrate its role and influence on the transmission parameters, we performed a range of sensitivity analyses (Appendix 1—figure 1). On the one hand, omitting the reduced transmissibility of the index case, that is, assuming ρindex = 1, leads to a largely underestimated R0, affirming the importance of this extension. On the other hand, the concrete value chosen may be debatable, especially because the infectivity in the chronic phase is itself disputed (studied by Bellan et al., 2015); thus a small ρindex can be caused both by immigration late during chronic infection and by elevated infectivity in the acute phase. To address this issue, we lowered the ρindex for the transmission chains of non-Swiss origin to obtain a more conservative estimate of R0, which was nevertheless still safely below 1. Furthermore, even though the transmission potential of some index cases could theoretically also be enhanced (i.e., ρindex > 1), for instance for sex workers, we do not expect this to be the case for many transmission chains, and it would therefore have only a marginal effect on our estimates. Besides, since ρindex > 1 would lead to an even lower R0, our main conclusions would not change (in fact, the assumption of ρindex ≤ 1 is conservative with respect to our conclusion of R0 < 1).

The presented model is based on source-sink dynamics, which is reflected in the importance of the index case and its immigration background, while the role of emigration is neglected. However, in many resource-rich settings similar source-sink patterns can be observed, both in the migration-related influxes and in the new virus introductions into the heterosexual population from other risk groups. Namely, immigration from a setting with a generalized epidemic to a setting with a concentrated epidemic is by far more likely than emigration in the opposite direction. Similarly, occasional spillovers from other risk groups, such as MSM and IDU, to the general population are more probable than the reverse. Therefore, the assumption of no such outflow from the epidemiological setting under consideration is not problematic for a country like Switzerland, but might present a limitation if the unit of interest is smaller, like a region or a city.

Our approach has several theoretical limitations, which we, however, expect to have only a moderate impact. First, we assumed stuttering transmission chains, or in other words, that the basic reproductive number is below 1. If R0 were larger than 1, the observed transmission chains would have been much longer (see Sensitivity analyses and Appendix 1—figure 5), which is inconsistent with the rather small clusters observed in HIV transmission among Swiss heterosexuals (Kouyos et al., 2010; von Wyl et al., 2011; Shilaih et al., 2016). Second, some transmission chains might still be active, meaning that some patients from the chain could still be infectious and therefore able to spread the virus further. The consequence of this would be an underestimation of R0 for recent years. However, given the much higher transmissibility of HIV in acute and recent infection (Marzel et al., 2016) and the estimated mean time to becoming non-infectious in recent years (Stadler et al., 2012; Hughes et al., 2009), the majority of the observed transmission chains had most likely stopped by the time of sampling, and hence we do not expect this issue to lead to a major bias in our estimates (see Sensitivity analyses and Appendix 1—figure 4).

Third, since our method is based on transmission clusters, their misidentification and the neglect of their structure could be another constraint. Possibly overlapping transmission chains (as also noted in Blumberg and Lloyd-Smith, 2013b), that is, two transmission chains resulting from two separate introductions of closely related viruses being misidentified as one single chain, represent the biggest concern in this regard. Failing to identify separate clusters would lead to a higher R0 estimate. However, this means that our method will tend to overestimate R0 and is hence conservative with respect to its main aim of assessing the danger of self-sustained epidemics; thus, if the method predicts an R0 strongly below 1, the corresponding epidemic will indeed be far from being self-sustained. Moreover, our method neglects the transmission chain structure and consequently uses only the aggregated number of infections, and it assumes the same R0 for the entire chain except for the index case. Yet this issue is likely to have a weak impact, since we focus on subcritical transmission; the transmission chains are hence short (see Table 1), and their structure conveys only limited information. Indeed, although a huge variation in sexual behavior has been shown previously (Liljeros et al., 2001), our sensitivity analyses exhibited no major impact of varying sexual risk behavior on the risk determinants (Sensitivity analyses and Appendix 1—figure 6).

Finally, even though the negative binomial model was proposed as a more favorable choice for the offspring distribution than the Poisson distribution (Blumberg and Lloyd-Smith, 2013b), we did not observe any significant differences in the estimates (see Sensitivity analyses and Appendix 1—figure 7). Moreover, owing to the simplicity of the Poisson distribution, we were able to integrate the index case transmission potential reduction and the heterogeneity between the transmission chains into our Poisson-based model in a more systematic manner through the observed variability of the demographic characteristics.

Appendix 1—figure 5.

Sensitivity analysis regarding the stuttering transmission chains assumption.

The Q-Q plots compare the hypothetical transmission chain size distributions with the transmission chain size distribution inferred from the phylogeny, with the empirical permilles of each shown on one of the two axes. The upper left plot compares the distribution of the simulated transmission chain sizes based on the estimated R0 with the transmission chain sizes observed in the phylogeny and thus verifies the estimate. The remaining plots compare the simulated transmission chain size distributions against the extracted transmission chain sizes for R0 closer to 1 to justify the subcritical transmission assumption. Each point represents a permille; darker points hence indicate more overlapping permilles.

Appendix 1—figure 4.

Relative bias due to ongoing transmission.

The upper panel shows the relative bias of the basic reproductive number from the baseline model and the lower panel the relative bias of the linear time trend factor from the corresponding generalized linear model. The proportion of active transmission chains over time is represented by the black line. The relative bias associated with overestimation and underestimation is displayed with green and red bars-points, respectively. Absence of bias is depicted by the horizontal gray lines.

Appendix 1—figure 6.

Comparison of effect sizes in the multivariate model with linear terms only for different sexual risk behavior definitions of a transmission chain.

The thick lines with black circles show the original effect sizes (where the index case determined the sexual risk behavior of the transmission chain) and their confidence intervals. The empirical distribution of the effect sizes when a random individual in a transmission chain determines its sexual risk behavior is displayed by the shaded areas. The thinner horizontal double-sided arrows with the filled circles correspond to the effect sizes and their confidence intervals for the transmission chain level fraction of follow-up visits (FUPs) with reported sex with an occasional partner by any of the infected individuals from the transmission chain. The vertical dotted gray line depicts the reference from the original model, i.e., using the index case to define the sexual risk behavior.

Appendix 1—figure 7.

Comparison between the Poisson and the negative binomial offspring distribution baseline model estimates.

The dark gray and colored lines show the estimates from the model with Poisson offspring distribution, while the black lines correspond to the negative binomial distribution. The index case relative transmission potential parameter ρindex was held fixed while the sampling density (x-axis) was varied. In the overall analysis the sampling density was the same for all transmission chains regardless of their subtype. The vertical gray lines depict the sampling densities used for each subtype in our study (upper panels) and the mean sampling density in the overall analysis (bottom panel).

Conclusion

Generally, our approach allows assessing the danger that a concentrated epidemic becomes generalized, based on viral sequence data. We demonstrated this approach for the case of heterosexual HIV transmission in Switzerland. In particular, even though the study highlighted some heterogeneity between the HIV subtypes, our findings indicate that there is no imminent danger of a self-sustained epidemic among Swiss heterosexuals, but rather diminishing HIV transmission far below the epidemic threshold. Hence, the HIV epidemic in Switzerland is, and most likely will remain, restricted to high-risk core groups, especially MSM. Moreover, the results suggest that the integrated prevention measures taken in Switzerland over time were successful within the heterosexual population.

Materials and methods

We combined a phylogenetic cluster detection approach to identify transmission chains in the population under consideration with an adapted version of the model developed in Blumberg and Lloyd-Smith (2013a) to infer the basic reproductive number (Figure 5). In particular, we accounted for both imperfect detection (included in Blumberg and Lloyd-Smith, 2013a) and modified transmissibility of the index case (not included in Blumberg and Lloyd-Smith, 2013a) from the perspective of the setting under consideration because it enters the population only (late) in chronic infection – e.g., via immigration. Moreover, we included the baseline transmission chain characteristics (such as HIV-1 subtype, date of infection, time to diagnosis, risky sexual behavior, etc.) to explain the heterogeneity among transmission chains. Note that our approach in principle estimates the effective reproductive number defined as the number of secondary infections for the current state of population; however, in case of a non-self-sustained epidemic with low prevalence, the vast majority of the population is susceptible and hence the effective reproductive number is a very good approximation for the basic reproductive number.
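The approximation in the last sentence can be written as a one-line relation. The sketch below assumes homogeneous mixing, a textbook simplification that the text does not state; $R_e$ (effective reproductive number), $S$ (number of susceptible individuals) and $N$ (population size) are symbols introduced here for illustration only.

```latex
R_e = R_0 \cdot \frac{S}{N}, \qquad \frac{S}{N} \approx 1 \;\Longrightarrow\; R_e \approx R_0 .
```

With low prevalence nearly everyone is susceptible, so the ratio $S/N$ is close to 1 and the two quantities nearly coincide.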

Figure 5.

Graphical representation of our phylogeny-based statistical approach.

(i): HIV transmission among heterosexuals in Switzerland (white arrow) has never led to a self-sustained epidemic. However, the unknown potential of imported infections (black arrows) either from abroad or from other transmission groups in Switzerland remains a large concern. (ii): The HIV transmission chains corresponding to Swiss heterosexuals (depicted in red) were identified from the phylogenetic tree containing the SHCS and background viral sequences. (iii): Our mathematical model is based on the discrete-time branching process with nodes of three different types: sampled Swiss infection (red), unsampled Swiss infection (light red) and foreign infection infected by a Swiss index case before moving to Switzerland (green). (iv): Our method for inferring R0 accounts for both imperfect sampling and modified transmission potential of the index case. (v): Moreover, it includes the baseline transmission chain characteristics to assess the determinants of R0.

SHCS and viral sequences

The SHCS is a multicenter, nationwide, prospective observational study of HIV infected individuals in Switzerland, established in 1988 (Swiss HIV Cohort Study et al., 2010). The SHCS was approved by the ethics committees of the participating institutions (Kantonale Ethikkommission Bern, Ethikkommission des Kantons St. Gallen, Comite Departemental d’Ethique des Specialites Medicales et de Medicine Communataire et de Premier Recours, Kantonale Ethikkommission Zürich, Repubblica e Cantone Ticino–Comitato Ethico Cantonale, Commission Cantonale d’Étique de la Recherche sur l’Être Humain, Ethikkommission beiderBasel; all approvals are available on http://www.shcs.ch/206-ethic-committee-approval-and-informed-consent), and written informed consent was obtained from all participants. Up to December 2016 over patients have been enrolled. The SHCS is highly-representative as it covers more than HIV-positive individuals on antiretroviral therapy (ART) in Switzerland (Swiss HIV Cohort Study et al., 2010). In addition to the extensive demographic and clinical data collected at biannual/quarterly follow-up (FUP) visits, for approximately of the patients at least one partial pol sequence from the genotypic resistance testing is available (in total sequences from the SHCS resistance database until August 2015). The patients with heterosexual contact as the most likely transmission route comprise about one third of all SHCS participants.

Phylogenetic tree

The phylogenetic tree was constructed from the Swiss HIV sequences of the SHCS patients and non-Swiss background sequences exported from the Los Alamos National Laboratory, 2016 database ( HIV-1 viral sequences of any subtype and including the circulating recombinant forms 01–74 retrieved on February 23rd, 2016 spanning over the protease and RT regions with fragments of at least nucleotides; the HXB2 sequence and sequences from Switzerland were removed afterwards). The sequences of HIV-1 subtypes and circulating recombinant forms (B, C, CRF01_AE, CRF02_AG, A(1-2), G, D and F(1-2)) were pairwise aligned to the reference genome HXB2 (accession number K03455) using Muscle v3.8.31 (Edgar, 2004). Sequences with insufficient sequencing quality of the protease region (coverage of less than nucleotides between the positions and of HXB2) or reverse transcriptase region (less than nucleotides between positions and ) were excluded. Using the earliest available of the remaining sequences for each patient, the phylogenetic tree was built with the FastTree algorithm under the generalized time-reversible model of nucleotide evolution (Price et al., 2009) including SHCS and background sequences.

Transmission chains

The Swiss heterosexual transmission chains were defined as clusters in the phylogenetic tree containing exclusively Swiss HIV sequences from individuals with heterosexual contact as the most likely route of the transmission, regardless of the respective genetic distances and local support values (see Sensitivity analyses and Appendix 1—figure 8 for alternative definition). The transmission chains and the patients enrolled in the SHCS forming them were identified with custom written functions in R (version 3.3.2).
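The cluster definition above can be illustrated with a small sketch. It is not the authors' R code: the nested-tuple tree encoding, the function names and the toy labels are all assumptions made for this example. The idea shown is only the stated rule, namely to keep every maximal clade whose tips are exclusively Swiss heterosexual sequences, ignoring branch lengths and support values.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical tree encoding: a leaf is a sequence name, an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(node: Tree) -> List[str]:
    """Return the names of all tips below a node."""
    if isinstance(node, str):
        return [node]
    names: List[str] = []
    for child in node:
        names.extend(leaves(child))
    return names


def extract_chains(node: Tree, is_swiss_het: Dict[str, bool]) -> List[List[str]]:
    """Return the maximal clades that contain only Swiss heterosexual sequences."""
    tips = leaves(node)
    if all(is_swiss_het[name] for name in tips):
        return [sorted(tips)]            # the whole clade qualifies as one chain
    if isinstance(node, str):
        return []                        # a single background or non-heterosexual tip
    chains: List[List[str]] = []
    for child in node:                   # otherwise search the subclades
        chains.extend(extract_chains(child, is_swiss_het))
    return chains


# Toy tree with made-up names: CH* are Swiss heterosexual, BG1 is background, MSM1 is Swiss MSM.
toy_tree: Tree = ((("CH1", "CH2"), ("CH3", "BG1")), ("MSM1", "CH4"))
toy_labels = {"CH1": True, "CH2": True, "CH3": True, "CH4": True,
              "BG1": False, "MSM1": False}
chains = extract_chains(toy_tree, toy_labels)
print(chains)   # [['CH1', 'CH2'], ['CH3'], ['CH4']]
```

In the toy tree, CH3 and CH4 each form a chain of size one because their sister tips do not belong to Swiss heterosexuals.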

Appendix 1—figure 8.

Sensitivity analysis regarding the transmission cluster definition.

The upper panel (i) compares the R0 estimated with the original cluster definition (brighter lines) with the R0 estimated based on the relaxed cluster definition (darker lines) from the overall analysis (in gray) and subtype-stratified analyses (in colors). Similarly, the bottom panel (ii) shows the comparison between the estimated time trend factors obtained from the transmission chain sizes based on different cluster definition thresholds.

For each transmission chain we determined if it was introduced to the Swiss HIV heterosexuals either as an imported infection from abroad or from other HIV transmission groups within Switzerland. The geographic origin for a given chain was obtained as the country of the closest sequence, which did not belong to Swiss heterosexuals. Specifically, we considered the smallest clade that contained both the transmission chain and either a non-Swiss or non-heterosexual sequence, and chose the sequence with the smallest pairwise genetic distance to the transmission chain (with respect to the Jukes and Cantor (JC69) model). Additionally, in each extracted transmission chain the observed index case was identified as the patient with the earliest estimated date of infection in the chain. The date of HIV infection for each single individual was imputed with the model described by Taffé et al. (2008) if the patient had enough CD4 cell count measurements before the ART initiation and the estimated date of infection fell within the seroconversion window; otherwise the midpoint of the seroconversion window was used. The demographic characteristics (Table 2) of the index case were extracted from the SHCS, including age at infection, time to diagnosis, first available CD4 cell count and sexual risk behavior. The latter was quantified as the fraction of semiannual follow-up visits at which the patient reported sex with occasional partners. The patients with no available questionnaire regarding the sexual risk behavior were assumed to have never reported on having sex with occasional partner (see Sensitivity analyses and Appendix 1—figure 9 for the corresponding sensitivity analysis). The characteristics of the index case were then used to define the features of each corresponding transmission chain.
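Two of the rules in this paragraph are simple enough to sketch in code: the index case is the chain member with the earliest estimated infection date, and sexual risk behavior is the fraction of follow-up visits with reported sex with occasional partners, set to zero when no questionnaire is available. The `Patient` class, its field names and the example dates are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Patient:
    patient_id: str
    infection_date: date                      # estimated date of infection
    # One entry per follow-up visit: True if sex with an occasional partner was reported.
    occasional_partner_reports: List[bool] = field(default_factory=list)


def index_case(chain: List[Patient]) -> Patient:
    """Observed index case: the patient with the earliest estimated infection date."""
    return min(chain, key=lambda patient: patient.infection_date)


def risk_fraction(patient: Patient) -> float:
    """Fraction of follow-up visits with reported sex with occasional partners."""
    reports = patient.occasional_partner_reports
    if not reports:
        return 0.0        # no questionnaire: treated as never reporting, as in the main analysis
    return sum(reports) / len(reports)


# Made-up two-person chain.
toy_chain = [
    Patient("A", date(2001, 5, 1), [True, False, False, True]),
    Patient("B", date(1999, 3, 15)),          # no questionnaire available
]
root = index_case(toy_chain)
print(root.patient_id, risk_fraction(root), risk_fraction(toy_chain[0]))
```

The chain inherits the characteristics of `root`, so in this toy case the chain-level risk fraction is 0.0 even though patient A reported occasional partners at half of the visits.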

Appendix 1—figure 9.

Subanalysis for the transmission chains with available follow-up information about sex with occasional partner of the index case compared to the main analysis with imputed data.

The effect sizes from the subanalysis are shown in brighter colors and those from the main analysis in dark. In the main analysis, the missing data were replaced by never reporting sex with an occasional partner.

Estimating the basic reproductive number from a model

Our model is based on the basic discrete-time branching process. The basic reproductive number R0 was inferred from the model as the expected number of offspring; the offspring distribution therefore represents the crucial component of the chain size distribution model. In the following sections we describe the main extensions of the basic branching process theory that were implemented in our model. The detailed derivations can be found in Appendix 3.

Offspring distribution

We modeled the offspring distribution in a transmission chain using a Poisson distribution, which is a limiting case of the negative binomial distribution. The latter has been suggested in the literature (Blumberg and Lloyd-Smith, 2013b) in order to infer $R_0$; however, since we did not observe any large differences between the two distributions (see Sensitivity analyses and Appendix 1—figure 7), we decided to use the simpler Poisson model.

Suppose that $Z_{n,i}$ denotes the number of secondary infections of transmission degree $n$ caused by the $i$th individual from the preceding generation (i.e., infected individuals with transmission degree $n-1$), where the transmission degree refers to the number of transmissions needed to transfer the pathogen from the index case (see Appendix 3 for detailed model description). Under the Poisson offspring distribution the number of secondary infections is modeled by

$$Z_{n,i} \sim \mathrm{Pois}(R_0),$$

which coincides with the definition of the basic reproductive number $R_0$. Some index cases may have lower transmission potential, e.g., immigrants that arrive during their chronic infection phase, while other index cases may exhibit enhanced transmissibility, for example, sex workers or foreigners living in Switzerland without a partner. To capture a potentially modified transmissibility of the index case we assumed a different offspring distribution of the root, namely

$$Z_{1,1} \sim \mathrm{Pois}(\rho_{index} R_0),$$

where $\rho_{index}$ denotes the index case relative transmission potential.

To assess the trends and determinants of $R_0$, we further extended the offspring distribution based on the baseline characteristics of the transmission chain. More precisely, we assumed that the logarithm of $R_0$ can be linearly described by the chain characteristics $x$, which resulted in the offspring distributions

$$Z_{n,i} \sim \mathrm{Pois}\left(e^{\beta^\top x}\right) \quad \text{and} \quad Z_{1,1} \sim \mathrm{Pois}\left(\rho_{index}\, e^{\beta^\top x}\right)$$

for the secondary and the index cases, respectively. Hence, $R_0$ can be predicted from the effect sizes $\beta$ of the factors $x$ as

$$\hat{R}_0 = e^{\hat{\beta}^\top x}.$$

Note that since each transmission chain has its specific baseline characteristics $x$ (perhaps even sampling density $p$ and index case relative transmission potential $\rho_{index}$) the notation above represents a simplification. More precisely, the $R_0$ of the $j$th transmission chain equals $e^{\beta^\top x_j}$.
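The offspring model described in this section can be explored with a short simulation. This is a minimal sketch, not the PoisTransCh implementation: the function names, the simple Poisson sampler and the value 0.2 for the index case relative transmission potential are assumptions chosen for illustration. The two coefficients in the last line are parameters 8 and 9 of the parameter table in Appendix 1 (reference log R0 and the per-decade time trend).

```python
import math
import random
from typing import Sequence


def r0_from_covariates(beta: Sequence[float], x: Sequence[float]) -> float:
    """Log-linear predictor: R0 = exp(beta . x)."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))


def poisson(rng: random.Random, mean: float) -> int:
    """Poisson draw via Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_chain_size(rng: random.Random, r0: float, rho_index: float,
                        max_size: int = 10_000) -> int:
    """Simulate one chain; the index case has offspring mean rho_index * r0."""
    size = 1                                   # the index case itself
    current = poisson(rng, rho_index * r0)     # infections of transmission degree 1
    while current > 0 and size < max_size:
        size += current
        current = sum(poisson(rng, r0) for _ in range(current))
    return size


# Illustrative values only: rho_index = 0.2 is an arbitrary choice for this sketch.
R0_DEMO, RHO_DEMO = 0.44, 0.2
rng = random.Random(1)
sizes = [simulate_chain_size(rng, R0_DEMO, RHO_DEMO) for _ in range(20_000)]
mean_size = sum(sizes) / len(sizes)
expected_mean = 1 + RHO_DEMO * R0_DEMO / (1 - R0_DEMO)   # valid only for r0 < 1
print(round(mean_size, 3), round(expected_mean, 3))

# Predicted R0 one decade after the reference date 1.1.1996 (covariate value 1.0).
print(r0_from_covariates([-0.839, -0.112], [1.0, 1.0]))
```

The simulated mean chain size should be close to the theoretical mean, which is a quick sanity check that the index case and the secondary cases use different offspring means.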

Likelihood function

The likelihood function was expressed in terms of the probability generating function (PGF) of the transmission chain size distribution assuming independent and stuttering (i.e., $R_0 < 1$, which assures that each transmission chain goes extinct almost surely) transmission chains. The following assumptions were made when incorporating the incomplete sampling of the sequences:

1) For each transmission chain at most one observed transmission chain can be extracted from the phylogeny. In other words, all observed cases belonging to the same transmission chain can be identified as the cases forming the corresponding observed transmission chain, although some intermediate transmitters might not have been sampled. For a phylogeny, this represents by definition a weak assumption; in contrast, for contact tracing approaches missing one ancestor can lead to misidentifying one transmission chain as two or more.

2) The sampling density is independent of the transmission chain size or the transmission degree of the individual, namely each case of the transmission chain can be observed independently from the rest of the chain with probability $p$.

Let $T$ denote the true size of a transmission chain and $\tilde{T}$ the size of the corresponding observed transmission chain. The above two assumptions can be summarized as

$$\tilde{T} \mid T \sim \mathrm{Bin}(T, p),$$

and the PGF of the observed transmission chain size hence equals

$$G_{\tilde{T}}(s) = G_T(1 - p + ps)$$

in terms of the PGF $G_T$ of $T$. The probability that a transmission chain has observed size of $k$ (where $k = 0$ means that none of the cases of the transmission chain is detected) is given by

$$P(\tilde{T} = k) = \frac{1}{k!} \left. \frac{d^k}{ds^k} G_{\tilde{T}}(s) \right|_{s=0}.$$

In particular, the probability that a transmission chain is observed (i.e., the observed size is strictly positive) can be calculated as

$$P(\tilde{T} > 0) = 1 - G_{\tilde{T}}(0) = 1 - G_T(1 - p).$$

However, since only the transmission chains with at least one detected case can be extracted from the phylogeny (and therefore to account for the unobserved transmission chains) we are interested in the probability that an observed transmission chain has a specific size. The probability of observing a transmission chain of size $k \geq 1$ is

$$P(\tilde{T} = k \mid \tilde{T} > 0) = \frac{P(\tilde{T} = k)}{1 - G_T(1 - p)}.$$

Finally, for a set of $N$ independent observed transmission chain sizes $k_1, \dots, k_N$ the likelihood function equals

$$L(R_0, \rho_{index}, p) = \prod_{j=1}^{N} P(\tilde{T} = k_j \mid \tilde{T} > 0)$$

if the same $R_0$, $\rho_{index}$ and $p$ are assumed for all transmission chains. For transmission chains with different baseline characteristics $x_j$ and different parameters $\rho_{index,j}$ and $p_j$, the generalized likelihood function is

$$L(\beta) = \prod_{j=1}^{N} P(\tilde{T}_j = k_j \mid \tilde{T}_j > 0;\ \beta, x_j, \rho_{index,j}, p_j).$$
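The likelihood described in this section can also be evaluated numerically without differentiating the PGF. The sketch below is an illustration under stated assumptions, not the PoisTransCh code: it uses a closed form for the true chain size derived for this example from the Borel-Tanner total-progeny distribution (not quoted from Appendix 3), applies binomial sampling by direct summation truncated at a hypothetical `t_max`, and treats the index case relative transmission potential and the sampling density as known inputs. The toy chain sizes and the grid are made up.

```python
import math
from typing import Sequence


def true_size_pmf(t: int, r0: float, rho: float) -> float:
    """P(true size = t): index case offspring mean rho*r0, all other cases r0."""
    a = rho * r0
    n = t - 1                                   # number of non-index cases
    if n == 0:
        return math.exp(-a)
    if a == 0.0:
        return 0.0
    log_p = (math.log(a) + (n - 1) * math.log(a + r0 * n)
             - (a + r0 * n) - math.lgamma(n + 1))
    return math.exp(log_p)


def observed_size_pmf(k: int, r0: float, rho: float, p: float,
                      t_max: int = 400) -> float:
    """P(observed size = k): each case is sampled independently with probability p."""
    total = 0.0
    for t in range(max(k, 1), t_max + 1):       # truncation is an approximation
        thinning = math.comb(t, k) * p ** k * (1.0 - p) ** (t - k)
        total += true_size_pmf(t, r0, rho) * thinning
    return total


def conditional_pmf(k: int, r0: float, rho: float, p: float,
                    t_max: int = 400) -> float:
    """P(observed size = k | at least one case observed), for k >= 1."""
    p_observed = 1.0 - observed_size_pmf(0, r0, rho, p, t_max)
    return observed_size_pmf(k, r0, rho, p, t_max) / p_observed


def log_likelihood(sizes: Sequence[int], r0: float, rho: float, p: float) -> float:
    """Log-likelihood of independent observed chain sizes with shared parameters."""
    return sum(math.log(conditional_pmf(k, r0, rho, p)) for k in sizes)


# Made-up observed chain sizes and a coarse grid search over R0 (rho and p held fixed).
toy_sizes = [1, 1, 1, 2, 1, 3, 1, 1]
grid = [i / 100 for i in range(5, 96, 5)]
best_r0 = max(grid, key=lambda r: log_likelihood(toy_sizes, r, 0.2, 0.5))
print(best_r0)
```

Dividing by the probability of being observed is what corrects for chains that leave no sampled case; a real fit would replace the grid search with a numerical optimizer and allow chain-specific parameters.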

Model fit

The maximum likelihood (ML) estimator for the model parameters, the predictor for R0 and the corresponding statistics (confidence intervals, p-values, etc.) were implemented in the R package PoisTransCh (Turk, 2017, https://github.com/tejaturk/PoisTransCh; copy archived at https://github.com/elifesciences-publications/PoisTransCh). The provided confidence intervals are the Wald-type 95%-confidence intervals (see Sensitivity analyses for the comparison against different types) and the p-values are based on the Wald statistic. Initially, we assessed the impact of covariables potentially associated with HIV transmission. Specifically, we considered HIV-1 subtype, establishment date of the transmission chain (i.e., the earliest estimated date of infection in the transmission chain), reported sex with occasional partner, age at infection, first measured CD4 cell count and time to diagnosis of the index case. Final model selection was carried out by the forward selection and backward elimination algorithms based on the Akaike and Bayesian information criterion (AIC and BIC, respectively). The detailed steps are provided in Selection of the predictive models.
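The statistics named in this section follow standard formulas, sketched below. This is generic textbook code rather than the PoisTransCh interface; the function names are invented, and the standard error of 0.0383 is back-calculated here from the Wald interval of the overall time trend (parameter 9 in the parameter table of Appendix 1), so it is an assumption rather than a reported value. Taking the number of transmission chains as the sample size in the BIC is likewise an assumption.

```python
import math
from typing import Tuple

Z_95 = 1.959964   # 97.5% quantile of the standard normal distribution


def wald_ci(estimate: float, std_error: float, z: float = Z_95) -> Tuple[float, float]:
    """Wald-type confidence interval: estimate +/- z * standard error."""
    return estimate - z * std_error, estimate + z * std_error


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value of the Wald statistic under a standard normal reference."""
    z = abs(estimate / std_error)
    return math.erfc(z / math.sqrt(2.0))


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion; n_obs is taken as the number of chains."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Overall time trend on the log scale; the standard error is back-calculated (assumption).
lower, upper = wald_ci(-0.112, 0.0383)
p_value = wald_p_value(-0.112, 0.0383)
print(round(lower, 3), round(upper, 3))   # -0.187 -0.037
print(p_value)
```

In forward selection or backward elimination, candidate models would be compared by these criteria, with the lower value preferred.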

Datasets

Previously published datasets from Kouyos et al. (2010) and von Wyl et al. (2011) were used in this study. As previously discussed in these publications, due to the large sampling density these data would, in principle, allow for the reconstruction of entire transmission networks and could thereby endanger the privacy of the patients. This is especially problematic because HIV-1 sequences have frequently been used in court cases. Therefore, a random subset of 10% of the sequences is accessible via GenBank. The accession numbers are as follows: GU344102-GU344671, EF449787, EF449788, EF449796, EF449798, EF449828, EF449829, EF449838, EF449844, EF449852, EF449853, EF449854, EF449860, EF449880, EF449883, EF449889, EF449895, EF449901, EF449904, EF449905, EF449917, EF449921, EF449928, EF449930, EF449943, EF449950, EF449960, EF449971, EF449980, EF449987, EF450004, EF450005, EF450011, EF450024, EF450026, GQ848113, GQ848120, GQ848140, GQ848145, GQ848149, JF769777-JF769851.

Decision letter

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Assessing the danger of self-sustained HIV epidemics in heterosexuals by population based phylogenetic cluster analysis" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom, Ryosuke Omori (Reviewer #1) served as Guest Editor, and the evaluation has been overseen by Prabhat Jha as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Dimitrios Paraskevis (Reviewer #2); Nico Nagelkerke (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The authors developed a new method for estimating epidemiological parameters, e.g., R0, using viral sequences sampled from patients, by estimating the infection tree for each cluster in the phylogenetic tree. Using the model they assessed the risk of a generalized heterosexual HIV epidemic in Switzerland. They find that the basic reproduction number is well below 1 and conclude that this risk is negligible.

Essential revisions:

The robustness of the results should be assessed with relaxing the several biological assumptions assumed by the authors.

1) If the first CD4 count (must have been taken before ART initiation) is 310 (median, some even <200 and thus having AIDS) it is questionable whether the assumption of patient becoming non-infectious after 2-2.5 years is realistic.

2) The high acute phase infectivity of HIV has been challenged (e.g. Bellan et al., 2015), this also should be mentioned and considered in the analysis.

Since index cases may differ substantially from other cases (e.g. foreigners in Switzerland without their partner, moving frequently between places), ρindex may well exceed 1.

It is not entirely clear whether – when considering missing patients – the case of a missing index case without "offspring" (i.e., no patient is observed at all) is considered.

The authors assume that the number of secondary infections follows the same distribution regardless of the transmission degree (subsection “Probability generating function of a completely observed uniform transmission chain”). This assumption means that the sexual behaviors are similar among hetero-sexual population whereas the field data of sexual behavior shows huge variation (e.g., Liljeros et al., 2001).

Regarding the estimate for confidence intervals, it is not clear whether the distribution of ML estimator can be approximated by normal distribution or not due to small sample size. If the authors will not apply Wald approximation, the width of confidence interval may change.

The point estimates of R0 shown in the lower panel of Figure 2 (black dot) are likely to be increased since 2009 although the authors' model predicts monotonic decrease. Why does this discrepancy happen? The authors' model predicts HIV will go extinct around 2020, however, this prediction may be optimistic if the predictive model lacks some critical factors.

Author response

Essential revisions:

The robustness of the results should be assessed with relaxing the several biological assumptions assumed by the authors.

1) If the first CD4 count (must have been taken before ART initiation) is 310 (median, some even <200 and thus having AIDS) it is questionable whether the assumption of patient becoming non-infectious after 2-2.5 years is realistic.

2) The high acute phase infectivity of HIV has been challenged (e.g. Bellan et al., 2015), this also should be mentioned and considered in the analysis.

We believe that this might be a misunderstanding, since neither of the two assumptions was made directly in our analyses; rather, they were used as an intuitive explanation of 1) why the ongoing transmissions are not problematic and 2) to justify the introduction of ρindex (which was already assessed in a sensitivity analysis by setting ρindex=1). When verifying the stuttering transmission chains assumption we originally used even an infectious period of 5 years because it is more conservative for this test (longer infectious periods imply more ongoing transmission chains and hence potentially stronger underestimation of R0). Despite the minor impact of these assumptions on our method, we further addressed these issues as follows:

Regarding 1) we extended and adjusted the section “Stuttering transmission chains assumption” (now named “Ongoing transmission and stuttering transmission chains assumption”) to assess the role of the duration of infectiousness in our study. Rather than assuming a constant duration, we assumed decreasing periods over time and used the information from the SHCS to estimate the decrease over time. Next, we re-evaluated the conservative maximum number of transmission degrees that had been completed by a certain date (Appendix 1—figure 3), which were then used in the simulation study to verify the subcritical transmission assumption (now Appendix 1—figure 5, previously Appendix 1—figure 3). Additionally, we explicitly assessed the relative bias stemming from ongoing transmission (Appendix 1—figure 4).

We had addressed point 2) in the original version of the manuscript by performing the sensitivity analysis over a range of ρindex (Appendix 1—figure 1), where the lower ρindex values implicitly correspond to later immigration to Switzerland and/or the importance of acute phase on transmission, and vice versa – higher ρindex indicates either immigration early during infection or high infectivity long into the chronic phase. We added a more explicit relation between the ρindex values and infectivity in the Discussion. The revised sentence in the Discussion reads:

"Then again, the concrete value chosen may be debatable, especially due to arguable infectivity in chronic phase (studied by Bellan et al., 2015); thus a small ρindex can be caused both by immigration later during chronic infection and by elevated infectivity in the acute phase."

Since index cases may differ substantially from other cases (e.g. foreigners in Switzerland without their partner, moving frequently between places), ρindex may well exceed 1.

This is an excellent point. In principle the index case could exhibit enhanced transmission potential. However, we do not expect that this is often the case and even if the ρindex exceeds 1, this would imply even lower R0. We added this comment to the Discussion:

"Furthermore, even though theoretically the transmission potential of some index cases could also be enhanced (i.e., ρindex>1), for instance for sex workers, we do not expect that this is the case for many transmission chains and would therefore have only marginal effect on our estimates. Besides, since a ρindex>1 would lead to even lower R0, our main conclusions would not change (in fact, the assumption of ρindex<1 is conservative with respect to our conclusion of R0<1)."

Furthermore, we replaced the "reduced" with "modified" transmission potential/transmissibility throughout the manuscript and extended the range of the sensitivity analyses regarding ρindex beyond 1 (Appendix 1—figure 1).

It is not entirely clear whether – when considering missing patients – the case of a missing index case without "offspring" (i.e., no patient is observed at all) is considered.

We are very grateful for this remark. In our model we indeed considered the transmission chains without any observed cases (i.e., all patients including the index case are missing) by normalizing the state probabilities for the observed transmission chain size distribution by the probability that a transmission chain is observed (namely $P(\tilde{T} > 0)$). In this sense our model takes into account that the sample of observed transmission chains from the phylogeny is biased, simply because the transmission chains with no observed cases cannot be detected. The same solution was suggested by Blumberg and Lloyd-Smith, 2013b. In the Materials and methods we added some lines to explicitly explain how the unobserved transmission chains were handled:

"The probability that a transmission chain has observed size of $k$ (where $k = 0$ means that none of the cases of the transmission chain is detected) is given by

$$P(\tilde{T} = k) = \frac{1}{k!} \left. \frac{d^k}{ds^k} G_{\tilde{T}}(s) \right|_{s=0}.$$

In particular, the probability that a transmission chain is observed (i.e., the observed size is strictly positive) can be calculated as

$$P(\tilde{T} > 0) = 1 - G_{\tilde{T}}(0) = 1 - G_T(1 - p).$$

However, since only the transmission chains with at least one detected case can be extracted from the phylogeny (and therefore to account for the unobserved transmission chains) we are interested in the probability that an observed transmission chain has a specific size. The probability of observing a transmission chain of size $k \geq 1$ is

$$P(\tilde{T} = k \mid \tilde{T} > 0) = \frac{P(\tilde{T} = k)}{1 - G_T(1 - p)}.$$"

The second source of confusion could be the transmission chains in which the index case (and/or some other intermediaries) are missing. These cases are described with the two assumptions in the Likelihood function section. For instance, if an index case with a single offspring was not sampled while its offspring was detected on the phylogeny, our model would treat this transmission pair as an observed transmission chain of observed size 1 and the sampling density p<1 would account for the 'missing' index case.

The authors assume that the number of secondary infections follows the same distribution regardless of the transmission degree (subsection “Probability generating function of a completely observed uniform transmission chain”). This assumption means that the sexual behaviors are similar among hetero-sexual population whereas the field data of sexual behavior shows huge variation (e.g., Liljeros et al., 2001).

We indeed made the above assumption as a trade-off between accuracy and simplicity, as modelling the human sexual behavior would most likely require changing patterns even within an individual over time. Yet, since our transmission chains are very short (mean 1.19, median 1), we did not expect that the heterogeneity between the individuals within a chain would have an important impact, as we had explained in the Discussion. The following three arguments affirmed this explanation:

• The significance at 5%-level and the direction of different determinants did not change by selecting a random infected individual from each chain to determine the risky sexual behavior of the chain as compared to the main analysis, in which the index case determined the sexual risk behavior of a transmission chain (newly added Appendix 1—figure 6).

• Similarly, the determinants were robust when considering all the follow-up questionnaires (about sex with an occasional partner) from all the patients from a transmission chain to determine the chain’s risky behavior (new Appendix 1—figure 6).

• The comparison between the negative binomial and Poisson based transmission chain size distribution model did not exhibit a strong preference of the former over the latter. Notably, the dispersion parameter of the negative binomial distribution takes into account the heterogeneity between the individuals, hence no significant difference between the models implies that the heterogeneity between the individuals is sufficiently reflected by the variability between the transmission chains in terms of their characteristics (i.e., the model covariables), including the risky sexual behavior (see Comparison between Poisson and negative binomial offspring distribution based models in Appendix 1).

We added a section in the Sensitivity analyses appendix to explain how the effect of the variability in sexual behavior was assessed ("Variation in sexual behavior along transmission chains").

Regarding the estimate for confidence intervals, it is not clear whether the distribution of ML estimator can be approximated by normal distribution or not due to small sample size. If the authors will not apply Wald approximation, the width of confidence interval may change.

We agree with the reviewers that the Wald approximation based on the normal distribution might be debatable. To address this question we did the following:

• We performed parametric bootstrap to assess the assumption of the normal approximation and the performance of the Wald-type confidence intervals in terms of coverage rates. The obtained empirical distribution of the ML estimator could be well approximated by the normal distribution and the coverage rates were very close to or above the target 95% (newly added Appendix 1—figure 10).

• For all the transmission parameters from any of the models presented in our study (newly added Appendix 1 Table 1) we constructed the profile likelihood based confidence intervals, as well as the basic bootstrap confidence intervals (from the parametric and nonparametric bootstrapping) to compare their widths. For our sample size, the Wald confidence intervals turned out to be almost the same as the profile likelihood based confidence intervals, while we did not observe that the Wald-type confidence intervals are systematically wider/narrower than the bootstrap based confidence intervals (Appendix 1—figure 11).

We therefore concluded that the normal approximation Wald-type confidence intervals for our sample size are a reliable and computationally less expensive alternative. We added a section "Confidence intervals" to Appendix 1.

The point estimates of R0 shown in the lower panel of Figure 2 (black dot) are likely to be increased since 2009 although the authors' model predicts monotonic decrease. Why does this discrepancy happen? The authors' model predicts HIV will go extinct around 2020, however, this prediction may be optimistic if the predictive model lacks some critical factors.

This is a very valid remark. In our model, the R0 point estimates exhibit a slight increase after 2009; however, the number of transmission chains sampled so far from these recent years is still small, which is also reflected by very wide confidence intervals. We admit that the extrapolation up to 2020 might be too optimistic; therefore we modified the plot (Figure 2 and the profiled multivariate plot in Figure 4) such that the R0 is now only shown up to year 2015 to avoid misinterpretations. The plot shows the general trend and should be understood as such and not as a definitive prognosis of R0 for the future years, as all the models are subject to some kind of uncertainty. We also added a note on this issue at the end of the paragraph describing Figure 2 in the Results section:

"This extrapolation should be, however, taken with a grain of salt and seen more as a trend rather than a prognosis, since only a few transmission chains have been observed for the recent years (which is reflected by wide confidence intervals)."

Appendix 1—table 1.

Overview of all the parameters, their estimates and the 95%-confidence intervals fitted in all the models presented in this study.

Subtypes	Parameter number	Parameter name	Parameter estimate	Wald-type 95%-CI	Profile likelihood 95%-CI
Overall	1	$\log(R_0)$	-0.823	(-0.876,-0.770)	(-0.878,-0.772)
B	2	$\log(R_0)$	-1.037	(-1.121,-0.952)	(-1.124,-0.955)
C	3	$\log(R_0)$	-0.719	(-0.879,-0.559)	(-0.892,-0.571)
01_AE	4	$\log(R_0)$	-0.826	(-1.036,-0.615)	(-1.057,-0.632)
02_AG	5	$\log(R_0)$	-0.483	(-0.587,-0.378)	(-0.594,-0.384)
A	6	$\log(R_0)$	-0.618	(-0.751,-0.485)	(-0.760,-0.492)
other	7	$\log(R_0)$	-0.605	(-0.758,-0.451)	(-0.771,-0.461)
Overall	8	$\log(R_{0,\mathrm{ref}})$	-0.839	(-0.894,-0.784)	(-0.895,-0.785)
Overall	9	$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$	-0.112	(-0.187,-0.037)	(-0.188,-0.037)
B	10	$\log(R_{0,\mathrm{ref}})$	-1.070	(-1.165,-0.975)	(-1.169,-0.979)
B	11	$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$	-0.112	(-0.234,0.010)	(-0.236,0.008)
C	12	$\log(R_{0,\mathrm{ref}})$	-0.692	(-0.851,-0.533)	(-0.864,-0.544)
C	13	$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$	-0.209	(-0.466,0.049)	(-0.473,0.046)
01_AE	14	$\log(R_{0,\mathrm{ref}})$	-0.781	(-0.991,-0.570)	(-1.013,-0.588)
01_AE	15	$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$	-0.255	(-0.616,0.106)	(-0.629,0.101)
02_AG	16	$\log(R_{0,\mathrm{ref}})$	-0.434	(-0.539,-0.329)	(-0.545,-0.333)
02_AG	17	$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$	-0.415	(-0.609,-0.222)	(-0.615,-0.226)
A	18	$\log(R_{0,\mathrm{ref}})$	-0.725	(-0.892,-0.558)	(-0.907,-0.571)
A	19	$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$	-0.430	(-0.660,-0.199)	(-0.672,-0.209)
other	20	$\log(R_{0,\mathrm{ref}})$	-0.600	(-0.754,-0.446)	(-0.767,-0.456)
other	21	$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$	-0.162	(-0.397,0.073)	(-0.403,0.072)
Overall	22	$\log(R_{0,\mathrm{ref}})$	-0.710	(-0.780,-0.640)	(-0.782,-0.641)
	23	$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^2$	-0.313	(-0.451,-0.176)	(-0.457,-0.182)
	24	$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^3$	-0.184	(-0.283,-0.086)	(-0.288,-0.091)
Overall	25	$\log(R_{0,\mathrm{ref}})$	-1.252	(-1.366,-1.137)	(-1.369,-1.140)
	26	$\mathrm{Subtype}_{\mathrm{C}}$	0.352	(0.167,0.538)	(0.158,0.531)
	27	$\mathrm{Subtype}_{\mathrm{01\_AE}}$	0.274	(0.046,0.502)	(0.029,0.490)
	28	$\mathrm{Subtype}_{\mathrm{02\_AG}}$	0.575	(0.428,0.721)	(0.426,0.720)
	29	$\mathrm{Subtype}_{\mathrm{A}}$	0.430	(0.271,0.588)	(0.266,0.584)
	30	$\mathrm{Subtype}_{\mathrm{other}}$	0.426	(0.247,0.606)	(0.238,0.600)
	31	$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$	-0.214	(-0.301,-0.127)	(-0.301,-0.128)
	32	$\frac{\mathrm{Age} - 32}{10}$	0.007	(-0.045,0.058)	(-0.046,0.057)
	33	$\frac{\mathrm{CD4} - 350}{100}$	0.000	(-0.018,0.019)	(-0.019,0.018)
	34	$\mathrm{Rate}_{\mathrm{risk}}$	0.230	(0.095,0.364)	(0.096,0.365)
	35	$\frac{\mathrm{Years}_{\mathrm{diagnosis}} - 3}{10}$	0.351	(0.210,0.492)	(0.207,0.490)
Overall	36	$\log(R_{0,\mathrm{ref}})$	-1.173	(-1.301,-1.045)	(-1.304,-1.048)
	37	$\frac{1}{10}\log\left(\frac{\mathrm{Years}_{\mathrm{diagnosis}}}{3}\right)$	1.727	(1.049,2.405)	(1.064,2.420)
	38	$\mathrm{Subtype}_{\mathrm{C}}$	0.322	(0.140,0.505)	(0.131,0.498)
	39	$\mathrm{Subtype}_{\mathrm{01\_AE}}$	0.246	(0.020,0.472)	(0.004,0.460)
	40	$\mathrm{Subtype}_{\mathrm{02\_AG}}$	0.516	(0.374,0.659)	(0.372,0.658)
	41	$\mathrm{Subtype}_{\mathrm{A}}$	0.404	(0.246,0.562)	(0.241,0.558)
	42	$\mathrm{Subtype}_{\mathrm{other}}$	0.401	(0.223,0.580)	(0.214,0.574)
	43	$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^3$	-0.231	(-0.337,-0.124)	(-0.345,-0.131)
	44	$\mathrm{Rate}_{\mathrm{risk}}$	0.230	(0.094,0.366)	(0.096,0.368)
	45	$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^4$	-0.129	(-0.227,-0.031)	(-0.235,-0.038)
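
The estimates in the table above are on the log scale. The following short sketch back-transforms parameters 1 and 9 (the two "Overall" rows without and with the linear date term) to the natural scale. It assumes a log-linear model in which the date coefficient acts multiplicatively on R0 per 10 years, which is consistent with the figures quoted in the abstract; the helper names are illustrative.

```python
import math

# Log-scale values copied from the table above: (estimate, Wald lower, Wald upper).
LOG_R0_OVERALL = (-0.823, -0.876, -0.770)      # parameter 1
DATE_COEF_OVERALL = (-0.112, -0.187, -0.037)   # parameter 9, per 10 years


def back_transform(log_values):
    """Exponentiate a log-scale estimate and its interval bounds."""
    return tuple(math.exp(v) for v in log_values)


def percent_decrease_per_decade(coef_values):
    """Convert a log-scale date coefficient into a percent decrease per 10 years."""
    est, low, high = coef_values
    # A more negative coefficient means a larger decrease, so the bounds swap.
    return tuple(100 * (1 - math.exp(v)) for v in (est, high, low))


r0, r0_low, r0_high = back_transform(LOG_R0_OVERALL)
print(f"R0 = {r0:.2f} ({r0_low:.2f}-{r0_high:.2f})")
# prints: R0 = 0.44 (0.42-0.46)

dec, dec_low, dec_high = percent_decrease_per_decade(DATE_COEF_OVERALL)
print(f"decrease per 10 years = {dec:.0f}% ({dec_low:.0f}%-{dec_high:.0f}%)")
# prints: decrease per 10 years = 11% (4%-17%)
```

Both results agree with the overall R0 of 0.44 (0.42-0.46) and the 11% (4%-17%) decrease per 10 years reported in the abstract.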

Appendix 2—table 1.

Establishment date models obtained with the AIC/BIC forward selection and backward elimination, their respective AIC and BIC values, and the p-values from the likelihood ratio test against the null model without any covariates.

Terms that were part of the respective final model are marked by ×.

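	AIC		BIC	
	Forward	Backward	Forward	Backward
$\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}$				
$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^2$		×		×
$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^3$	×	×	×	×
$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^4$	×		×	
AIC	3364.3	3364.2	3364.3	3364.2
BIC	3382.4	3382.3	3382.4	3382.3
p-value from LR test	<0.0001	<0.0001	<0.0001	<0.0001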
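
As a rough illustration of the selection procedure behind these tables, the sketch below implements a generic forward selection that adds, at each step, the term giving the largest improvement in an information criterion, using the standard definitions AIC = 2k − 2 log L and BIC = k log(n) − 2 log L. It is a sketch under stated assumptions, not the authors' implementation: `fit_loglik` stands in for refitting the branching process model, the log-likelihood gains and the sample size of 3000 are made-up toy numbers, and every term is assumed to cost one parameter (a factor such as subtype would cost several).

```python
import math


def aic(loglik, k):
    """Akaike information criterion for k fitted parameters."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik


def forward_select(candidates, fit_loglik, criterion):
    """Greedy forward selection: add the best-improving term until none helps."""
    selected = []
    best_score = criterion(fit_loglik(selected), 1)  # intercept-only model
    while True:
        best_term = None
        for term in candidates:
            if term in selected:
                continue
            trial = selected + [term]
            # One parameter per term plus the intercept (a simplification).
            score = criterion(fit_loglik(trial), len(trial) + 1)
            if score < best_score:
                best_score, best_term = score, term
        if best_term is None:
            return selected, best_score
        selected.append(best_term)


# Hypothetical log-likelihood gains per term; not taken from the study.
TOY_GAINS = {"date3": 12.0, "date4": 3.0, "age": 0.4}


def toy_loglik(terms):
    """Stand-in for a model fit: baseline log-likelihood plus additive gains."""
    return -1700.0 + sum(TOY_GAINS[t] for t in terms)


aic_terms, aic_score = forward_select(list(TOY_GAINS), toy_loglik, aic)
bic_terms, bic_score = forward_select(
    list(TOY_GAINS), toy_loglik, lambda ll, k: bic(ll, k, n=3000)
)
print(aic_terms, aic_score)  # ['date3', 'date4'] 3376.0
print(bic_terms)             # ['date3']
```

In this toy setting BIC keeps fewer terms than AIC because its penalty per parameter, log(n), exceeds 2. Backward elimination works analogously, starting from the full model and dropping the term whose removal lowers the criterion most.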

Appendix 2—table 2.

Multivariate models obtained with the AIC/BIC forward selection and backward elimination algorithms.

The terms listed in the table are the terms identified from the single determinant model selections and the crosses indicate the terms entering the multivariate models. The null model from the likelihood ratio test refers to the baseline model without any covariates (not even the subtype).

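	AIC		BIC	
	Forward	Backward	Forward	Backward
Subtype	×	×	×	×
$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^2$				
$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^3$	×	×	×	×
$\left(\frac{\mathrm{Date}_{\mathrm{infection}} - 1.1.1996}{365 \cdot 10}\right)^4$	×	×	×	
$\mathrm{Rate}_{\mathrm{risk}}$	×	×	×	×
$\mathrm{Rate}_{\mathrm{risk}}$	×	×	×	
$\frac{1}{10}\log\left(\frac{\mathrm{Years}_{\mathrm{diagnosis}}}{3}\right)$		×		×
$\frac{\mathrm{Years}_{\mathrm{diagnosis}} - 3}{10}$	×		×	
$\frac{\mathrm{Years}_{\mathrm{diagnosis}} - 3}{10}$	×	×	×	
$\left(\frac{\mathrm{Years}_{\mathrm{diagnosis}} - 3}{10}\right)^3$		×		×
$\frac{\mathrm{CD4} - 350}{10}$				
$\left(\frac{\mathrm{Age} - 32}{10}\right)^2$				
AIC	3254	3252	3254	3262
BIC	3314	3331	3314	3316
p-value from LR test	<0.0001	<0.0001	<0.0001	<0.0001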
