Oliver Ratmann, Emma B Hodcroft, Michael Pickles, Anne Cori, Matthew Hall, Samantha Lycett, Caroline Colijn, Bethany Dearlove, Xavier Didelot, Simon Frost, A S Md Mukarram Hossain, Jeffrey B Joy, Michelle Kendall, Denise Kühnert, Gabriel E Leventhal, Richard Liang, Giacomo Plazzotta, Art F Y Poon, David A Rasmussen, Tanja Stadler, Erik Volz, Caroline Weis, Andrew J Leigh Brown, Christophe Fraser.
Abstract
Viral phylogenetic methods contribute to understanding how HIV spreads in populations, and thereby help guide the design of prevention interventions. So far, most analyses have been applied to well-sampled concentrated HIV-1 epidemics in wealthy countries. To direct the use of phylogenetic tools to where the impact of HIV-1 is greatest, the Phylogenetics And Networks for Generalized HIV Epidemics in Africa (PANGEA-HIV) consortium generates full-genome viral sequences from across sub-Saharan Africa. Analyzing these data presents new challenges, since epidemics are principally driven by heterosexual transmission and a smaller fraction of cases is sampled. Here, we show that viral phylogenetic tools can be adapted and used to estimate epidemiological quantities of central importance to HIV-1 prevention in sub-Saharan Africa. We used a community-wide methods comparison exercise on simulated data, where participants were blinded to the true dynamics they were inferring. Two distinct simulations captured generalized HIV-1 epidemics, before and after a large community-level intervention that reduced infection levels. Five research groups participated. Structured coalescent modeling approaches were most successful: phylogenetic estimates of HIV-1 incidence, incidence reductions, and the proportion of transmissions from individuals in their first 3 months of infection correlated with the true values (Pearson correlation > 90%), with small bias. However, on some simulations, true values were markedly outside reported confidence or credibility intervals. The blinded comparison revealed current limits and strengths in using HIV phylogenetics in challenging settings, provided benchmarks for future methods' development, and supports using the latest generation of phylogenetic tools to advance HIV surveillance and prevention.
Keywords: HIV transmission and prevention; molecular epidemiology of infectious diseases; viral phylogenetic methods validation
Year: 2016 PMID: 28053012 PMCID: PMC5854118 DOI: 10.1093/molbev/msw217
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
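The abstract summarizes accuracy with three measures: Pearson correlation between estimates and true values, bias, and whether true values fall inside the reported intervals. The following is a minimal sketch of how such measures could be computed, not the authors' evaluation code. The function names and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_bias(estimates, truths):
    """Average signed error (estimate minus truth)."""
    return sum(e - t for e, t in zip(estimates, truths)) / len(truths)


def interval_coverage(lowers, uppers, truths):
    """Fraction of true values that lie inside the reported intervals."""
    hits = sum(1 for lo, hi, t in zip(lowers, uppers, truths) if lo <= t <= hi)
    return hits / len(truths)


# Hypothetical values (NOT taken from the study): four datasets,
# each with a true value, a point estimate and a 95% interval.
truth = [1.0, 2.0, 3.0, 4.0]
estimate = [1.1, 1.9, 3.2, 4.4]
lower = [0.8, 1.5, 2.5, 4.1]
upper = [1.4, 2.3, 3.9, 4.9]

r = pearson(estimate, truth)
bias = mean_bias(estimate, truth)
coverage = interval_coverage(lower, upper, truth)
print(f"correlation={r:.3f} bias={bias:.3f} coverage={coverage:.2f}")
```

In this toy example the correlation is high and the bias small, yet one true value lies outside its interval, which mirrors the pattern the abstract describes.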
Aims of the PANGEA Phylodynamic Methods Comparison Exercise.
| | Objectives | Reporting Variable |
|---|---|---|
| 1 | Identify incident trends during the intervention | Consider the year |
| 2 | Estimate HIV-1 incidence after the intervention | Participants were asked to report %Incidence defined as |
| 3 | Quantify the reduction in HIV-1 incidence at the end of the intervention | Participants were asked to report the incidence ratio |
| 4 | Estimate the proportion of transmissions from early and acute HIV, just before the intervention | Participants were asked to report the proportion of new cases in year |
| 5 | Estimate the proportion of transmissions from early and acute HIV, after the intervention | Participants were asked to report the proportion of new cases in year |
| To estimate the impact of the following controlled covariates on the reporting variables: | ||
| 6 | Availability of full genome sequences (HIV-1 | |
| 7 | Sequence sampling frame: Sequence coverage at the end of the simulation; Rapid increases in sequence coverage; Sampling duration after intervention start | |
| 8 | Frequency of viral introductions into the modeled study population | |
| 9 | Inference of dated viral phylogenies from sequence data | |
Phylogenetic Methods Used in the PANGEA Phylodynamic Methods Comparison Exercise.
| Team | Team Members | Method | Model-based analysis | Model Overview | Simulated Data Used To Inform Inference | Fitting Process | Availability |
|---|---|---|---|---|---|---|---|
| Basel-Zürich | C. Weis, G.E. Leventhal, D. Kühnert, D.A. Rasmussen, T. Stadler | Birth–death skyline method with sampled ancestors | Yes | Stochastic birth–death model with sampled ancestors to estimate incidence and incidence reductions, and multi-type birth death model corresponding to two stages of infection to estimate the proportion of early transmissions. Time trends in parameters were modeled with serial time intervals during which parameters were assumed constant. Viral introductions were not modeled | All sequences and full trees to estimate birth–death parameters; cross-sectional survey data | Markov Chain Monte Carlo | |
| Cambridge | B. Dearlove, M. Hossain, S. Frost | Meta-population coalescent approach | Yes | Standard SI, SIS and SIR models were averaged. Model parameters did not change over time. Viral introductions were not modeled. | All sequences and full trees. | Markov Chain Monte Carlo | |
| Cambridge-London | E. Volz, M. Hossain, S. Frost | Structured coalescent | Yes | Deterministic compartment model stratified by gender, disease progression, diagnosis and treatment status, risk behavior. Time trends in baseline transmission rates were modeled with 4-parameter generalized logistic function. Diagnosis and treatment uptake rates changed at intervention start. Viral introductions were modeled with a source deme. | All sequences and sub-trees including all internal nodes 30 before the last sample; cross-sectional survey data; and gender and CD4 count at time of diagnosis for Regional datasets. | Parallel Markov Chain Monte Carlo | http://colgem.r-forge.r-project.org/ (last accessed October 14, 2016) |
| London | C. Colijn, M. Kendall, G. Plazzotta, X. Didelot | Bayesian transmission chain analyzer | Yes | Stochastic generalized branching model with generation time modeled to represent three infection stages. Model parameters did not change over time. Viral introductions were not modeled. | All sequences and full trees on Village datasets; sequences in trees with at least 80 tips on Regional datasets. | Reversible-jump Markov Chain Monte Carlo | |
| Vancouver | A. Poon, J. Joy, R. Liang | ABC kernel method | Yes | Deterministic compartment model stratified by infection status, three stages of infection, and risk behavior. Model parameters did not change over time. Viral introductions were modeled with a source deme. | All sequences and full trees. | Approximate Bayesian Computation | |
Simulated Datasets of the Phylodynamic Methods Comparison Exercise.
| Model | Dataset | Purpose | %Acute | Intervention Scale Up | Viral Introductions (%) | Sequences (#) | Sequence Coverage in the Last Year of the Simulation (%) | Sequences After Intervention Start (%) | Sampling Duration After Intervention Start (years) |
|---|---|---|---|---|---|---|---|---|---|
| Regional | D | Identify 60% reduction in incidence during intervention and 10% early transmissions. | L | F | 5 | 1,600 | 8 | 50 | 5 |
| Regional | C | Identify 30% reduction in incidence during intervention and 10% early transmissions. | L | S | 5 | 1,600 | 8 | 50 | 5 |
| Regional | A | Identify 60% reduction in incidence during intervention and 40% early transmissions. | H | F | 5 | 1,600 | 8 | 50 | 5 |
| Regional | B | Identify 30% reduction in incidence during intervention and 40% early transmissions. | H | S | 5 | 1,600 | 8 | 50 | 5 |
| Regional | O | As D, and evaluate impact of sampling frame: shorter duration of intensive sequencing. | L | F | 5 | 1,280 | 8 | 50 | 3 |
| Regional | T | As D, and evaluate impact of tree reconstruction. | L | F | 5 | 1,600 | 8 | 50 | 5 |
| Regional | S | As D, and evaluate impact of sampling frame: most sequences from after intervention start. | L | F | 5 | 1,600 | 8 | 85 | 5 |
| Regional | I | As D, and evaluate impact of sampling frame: higher sequence coverage. | L | F | 5 | 3,200 | 16 | 50 | 5 |
| Regional | R | As C, and evaluate impact of tree reconstruction. | L | S | 5 | 1,600 | 8 | 50 | 5 |
| Regional | Q | As C, and evaluate impact of sampling frame: most sequences from after intervention start. | L | S | 5 | 1,600 | 8 | 85 | 5 |
| Regional | G | As C, and evaluate impact of sampling frame: higher sequence coverage. | L | S | 5 | 3,200 | 16 | 50 | 5 |
| Regional | N | Control simulation, no intervention. | L | None | 5 | 1,600 | 8 | 50 | 5 |
| Regional | F | As A, and evaluate impact of sampling frame: shorter duration of intensive sequencing. | H | F | 5 | 1,280 | 8 | 50 | 3 |
| Regional | L | As A, and evaluate impact of tree reconstruction. | H | F | 5 | 1,600 | 8 | 50 | 5 |
| Regional | J | As A, and evaluate impact of sampling frame: higher sequence coverage. | H | F | 5 | 3,200 | 16 | 50 | 5 |
| Regional | P | As A, and evaluate impact of higher proportion of viral introductions. | H | F | 20 | 1,600 | 8 | 50 | 5 |
| Regional | H | As B, and evaluate impact of tree reconstruction. | H | S | 5 | 1,600 | 8 | 50 | 5 |
| Regional | K | As B, and evaluate impact of sampling frame: higher sequence coverage. | H | S | 5 | 3,200 | 16 | 50 | 5 |
| Regional | E | As B, and evaluate impact of higher proportion of viral introductions. | H | S | 20 | 1,600 | 8 | 50 | 5 |
| Regional | M | Control simulation, no intervention. | H | None | 5 | 1,600 | 8 | 50 | 5 |
| Village | 3 | Identify 40% reduction in incidence during intervention and 4% early transmissions. | L | F | <2 | 777 | 25 | >95 | 5 |
| Village | 2 | Identify 15% reduction in incidence during intervention and 4% early transmissions. | L | S | <2 | 857 | 25 | >95 | 5 |
| Village | 1 | Identify 40% reduction in incidence during intervention and 20% early transmissions. | H | F | <2 | 957 | 25 | >95 | 5 |
| Village | 4 | Identify 15% reduction in incidence during intervention and 20% early transmissions. | H | S | <2 | 1,040 | 25 | >95 | 5 |
| Village | 5 | As 3, and evaluate impact of sampling frame: higher sequence coverage. | L | F | <2 | 1,469 | 50 | >95 | 5 |
| Village | 11 | Similar to 3, without imported sequences. | L | F | 0 | 638 | 25 | >95 | 5 |
| Village | 8 | As 2, and evaluate impact of sampling frame: higher sequence coverage. | L | S | <2 | 1,630 | 50 | >95 | 5 |
| Village | 9 | Similar to 2, without imported sequences. | L | S | 0 | 686 | 25 | >95 | 5 |
| Village | 0 | Control simulation, no intervention. | L | None | <2 | 872 | 25 | >95 | 5 |
| Village | 6 | As 1, and evaluate impact of sampling frame: higher sequence coverage. | H | F | <2 | 1,831 | 50 | >95 | 5 |
| Village | 12 | Similar to 1, without imported sequences. | H | F | 0 | 956 | 25 | >95 | 5 |
| Village | 7 | As 4, and evaluate impact of sampling frame: higher sequence coverage. | H | S | <2 | 1,996 | 50 | >95 | 5 |
| Village | 10 | Similar to 4, without imported sequences. | H | S | 0 | 1,012 | 25 | >95 | 5 |
Variables in shaded columns were unknown to participants at time of analysis.
Values range from 5% to 40%, reflecting recent estimates for endemic-phase epidemics in sub-Saharan Africa (Cohen et al. 2012).
Range reflects optimistic and pessimistic scenarios in prevention trials in sub-Saharan Africa (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. 2014).
Range includes frequent viral introductions as reported in settings with highly mobile populations (Grabowski et al. 2014).
In comparison to the large sequence datasets that are available for concentrated epidemics in Europe or North America, the lower values here reflect challenges in achieving high sequence coverage where large populations are infected. Higher values reflect geographically focused sequencing efforts such as in Mochudi, Botswana (Carnegie et al. 2014).
Values reflect the duration of typical prevention trial settings, and that most sequences are obtained after intervention start (Iwuji et al. 2013; Moore et al. 2013; Hayes et al. 2014).
Out of all individuals who were alive and infected in the last calendar year of the simulation, the proportion who ever had a sequence taken.
For datasets in bold, only viral sequences were disclosed. For all other datasets, only viral phylogenies were provided.
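The sequence coverage definition in the table notes can be made concrete with a short calculation. The sketch below is illustrative only: the record fields (`infected_year`, `death_year`, `ever_sequenced`) and the example individuals are hypothetical, and "alive in the last calendar year" is assumed to mean no recorded death before that year.

```python
def sequence_coverage(individuals, last_year):
    """Proportion of individuals alive and infected in `last_year`
    who ever had a sequence taken (hypothetical record layout)."""
    eligible = [
        p for p in individuals
        if p["infected_year"] <= last_year
        and (p["death_year"] is None or p["death_year"] >= last_year)
    ]
    if not eligible:
        return 0.0
    sequenced = sum(1 for p in eligible if p["ever_sequenced"])
    return sequenced / len(eligible)


# Hypothetical individuals, for illustration only.
people = [
    {"infected_year": 2005, "death_year": None, "ever_sequenced": True},
    {"infected_year": 2010, "death_year": None, "ever_sequenced": False},
    {"infected_year": 2012, "death_year": 2020, "ever_sequenced": False},
    {"infected_year": 2018, "death_year": None, "ever_sequenced": False},
    # Died before the last year, so excluded from the denominator.
    {"infected_year": 2001, "death_year": 2015, "ever_sequenced": True},
]

coverage_2020 = sequence_coverage(people, last_year=2020)
print(f"coverage in 2020: {coverage_2020:.0%}")
```

Individuals who were sequenced but died before the last year do not count toward coverage under this reading, which is why the fifth record is dropped.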
Simulation pipeline to generate HIV-1 sequence data, viral phylogenies, and accompanying individual-level data. Two simulation models (Regional and Village) were implemented for the methods comparison. The two individual-level epidemic and intervention models generated HIV-1 transmission chains in the model population, and their components are shown in blue to green. Next, individuals were sampled for sequencing, and a viral tree was generated for these individuals. Tree generation accounted for within-host viral evolution under a neutral coalescent model. Finally, viral sequences comprising the gag, pol and env genes were simulated along the viral tree. Sequence generation accounted for known variation in evolutionary rates across genes, codon positions, and along within-host lineages. Further details are provided in supplementary tables S1 and S2, Supplementary Material online.
Simulated epidemic scenarios under the Regional and Village models. (A) Six generalized HIV-1 epidemic scenarios were simulated in a region of ∼80,000 adult individuals using the Regional model, and (B) nine scenarios were simulated in a rural village population with an initial population of ∼6,000 individuals using the Village model. The scenarios differ in terms of incidence, the proportion of early transmissions, and scale-up of the combination prevention package during the intervention period (gray-shaded time period). From these, 33 datasets were generated, which included either viral sequences or viral trees. These datasets further varied in the sequence sampling frame and the frequency of viral introductions; see also figure 1 and table 3. Datasets E, G, I, J, K, P had more frequent viral introductions or higher sequence coverage, and are not shown. The proportion of early transmissions under the Village model was smoothed with a 3-year sliding window to better visualize trends in this smaller model population.
Estimates of HIV-1 incidence from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team (panel) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black.
Estimates of HIV-1 incidence reductions from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team (panel) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black.
Estimates of the proportion of transmissions from individuals in their first 3 months of infection (early and acute HIV), before the intervention, from phylogenetic methods on simulated PANGEA datasets. Submitted estimates are shown for each PANGEA dataset by research team and model simulation (panels) and type of data provided (either sequences or the viral phylogenetic tree, color). Error bars correspond to 95% credibility or confidence intervals. True values are shown in black.
Predictors of large error in phylogenetic estimates. (A) For each response, the error in the phylogenetic estimate was calculated, and statistical outliers were identified. The plot shows error in phylogenetic estimates by team and outcome measure. For large errors, the corresponding PANGEA dataset code in table 1 is indicated. (B) The contribution of the systematically varied covariates in table 1 to the presence of outliers was quantified through partial least squares regression (PLS, see “Materials and Methods” section). The plot shows the contribution of each predictor to the variance in outlier presence in colors, and the corresponding signs of the regression coefficients are added. Estimates from team Cambridge could not be characterized due to small sample size. The impact of the error predictors varied across the primary objectives of phylogenetic inference, as well as the phylogenetic methods used. With regard to estimates of incidence and incidence reduction, a subset of phylogenetic methods was particularly sensitive to high sequence coverage, a very large proportion of sequences obtained after intervention start, and a large frequency of viral introductions. With regard to estimates of the proportion of early transmissions, outliers were in several cases best explained by true differences in the proportion of early transmissions.
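The caption above says that errors were calculated and statistical outliers identified, but this record does not state which outlier rule was used. As an illustration only, the sketch below flags outliers with Tukey's interquartile-range rule, which is an assumed stand-in and not necessarily the authors' procedure. The error values are hypothetical.

```python
from statistics import quantiles


def flag_outliers(errors, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule).

    This rule is an assumption made for illustration; the source does
    not specify how outliers were identified.
    """
    q1, _median, q3 = quantiles(errors, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= e <= high) for e in errors]


# Hypothetical errors (estimate minus truth) across seven datasets.
errors = [-0.1, 0.0, 0.05, 0.1, -0.05, 0.02, 1.5]
flags = flag_outliers(errors)
print([e for e, is_out in zip(errors, flags) if is_out])
```

The resulting true/false flags are the kind of binary response that a regression on the design covariates, such as the PLS analysis described in panel (B), could then try to explain.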