Ramon Lorenzo-Redondo1, Helen R Fryer2, Trevor Bedford3, Eun-Young Kim1, John Archer4, Sergei L Kosakovsky Pond5, Yoon-Seok Chung6, Sudhir Penugonda1, Jeffrey Chipman7, Courtney V Fletcher8, Timothy W Schacker9, Michael H Malim10, Andrew Rambaut11, Ashley T Haase12, Angela R McLean2, Steven M Wolinsky1. 1. Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, IL 60011, USA. 2. Institute for Emerging Infections, Department of Zoology, University of Oxford, Oxford, OX1 3PS, UK. 3. Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA. 4. Centro de Investigação em Biodiversidade e Recursos Genéticos Universidade do Porto, Vairão, Portugal. 5. Department of Medicine, University of California, San Diego, CA 92093, USA. 6. Center for Infectious Disease Research, Korean National Institutes of Health, Osong, Korea. 7. Department of Surgery, University of Minnesota, Minneapolis, MN, 55455 USA. 8. Antiviral Pharmacology Laboratory, University of Nebraska Medical Center, College of Pharmacy, Omaha, NE 68198, USA. 9. Division of Infectious Diseases, University of Minnesota, Minneapolis, MN 55455, USA. 10. Department of Infectious Diseases, King's College London, Guy's Hospital, London, UK. 11. Centre for Immunology, Infection and Evolution, University of Edinburgh, Edinburgh, UK. 12. Department of Microbiology, University of Minnesota, Minneapolis, MN 55455, USA.
Abstract
Lymphoid tissue is a key reservoir established by HIV-1 during acute infection. It is a site associated with viral production, storage of viral particles in immune complexes, and viral persistence. Although combinations of antiretroviral drugs usually suppress viral replication and reduce viral RNA to undetectable levels in blood, it is unclear whether treatment fully suppresses viral replication in lymphoid tissue reservoirs. Here we show that virus evolution and trafficking between tissue compartments continues in patients with undetectable levels of virus in their bloodstream. We present a spatial and dynamic model of persistent viral replication and spread that indicates why the development of drug resistance is not a foregone conclusion under conditions in which drug concentrations are insufficient to completely block virus replication. These data provide new insights into the evolutionary and infection dynamics of the virus population within the host, revealing that HIV-1 can continue to replicate and replenish the viral reservoir despite potent antiretroviral therapy.
Lymphoid tissue is a key reservoir established by HIV-1 during acute infection. It is a site associated with viral production, storage of viral particles in immune complexes, and viral persistence. Although combinations of antiretroviral drugs usually suppress viral replication and reduce viral RNA to undetectable levels in blood, it is unclear whether treatment fully suppresses viral replication in lymphoid tissue reservoirs. Here we show that virus evolution and trafficking between tissue compartments continues in patients with undetectable levels of virus in their bloodstream. We present a spatial and dynamic model of persistent viral replication and spread that indicates why the development of drug resistance is not a foregone conclusion under conditions in which drug concentrations are insufficient to completely block virus replication. These data provide new insights into the evolutionary and infection dynamics of the virus population within the host, revealing that HIV-1 can continue to replicate and replenish the viral reservoir despite potent antiretroviral therapy.
Combinations of antiretroviral drugs routinely cripple HIV-1 production and replication to levels undetectable in the blood within weeks of starting treatment[1]. None of the current treatments, however, are capable of eradicating the virus from a long-lived reservoir in resting memory CD4+ T cells and other potential cell types that insulate the virus from antiretroviral drugs or immune surveillance[2-5]. Intermittent virus production from reactivation of a small proportion of latently infected CD4+ T cells (rather than low levels of ongoing replication) is thought to drive viral rebound detected in blood of well-suppressed patients on treatment[6-8]. Ongoing replication is considered unlikely because neither viral genetic divergence over time, nor the emergence of drug resistance mutations have been convincingly documented[9,10]. As earlier studies only examined viral sequences derived from the blood of patients who continued to suppress viral replication in that anatomic compartment[11], the conclusions are not necessarily generalizable to other compartments in the body, particularly to lymphoid tissue where the frequency of infection per cell is mostly higher[12] and the intracellular drug concentrations are much lower than in blood[13]. Under low drug concentrations, the virus may continue to replicate and evolve in sanctuary sites within the reservoir of cells in lymphoid tissue, and remain undetectable in the bloodstream for a time depending on viral population migration dynamics between the two compartments. Here we use a multi-pronged strategy of deep-sequencing, time-calibrated phylogenetic analysis, and mathematical modeling to characterize the distinct temporal structure and divergence of compartmentally sampled viral sequences. We discover ongoing replication in lymphoid tissue sanctuaries of patients despite undetectable blood levels of virus. Our sampling approach differs fundamentally from those of previous studies[14-16], which do not address evolutionary dynamics within lymphoid tissue, and better suits investigation of the dynamic nature of the viral reservoir during treatment with potent antiretroviral drugs.
HIV-1 sequence determination
To investigate the evolution and spatial dispersion of virus with high accuracy, we deep sequenced (using the Roche 454 Sciences’ GS-FLX sequencing platform) HIV-1 DNA in cells from blood and inguinal lymph nodes collected from three subjects at three separate times (at day 0, and after 3 and 6 months of treatment) described elsewhere[13]. Previous work established that viral sequences contemporaneously sampled from lymphoid tissue in different locations are genetically homogeneous[17], consistent with CD4+ T cell homing and trafficking[18]. Consequently, detailed assessment of a portion of a solitary lymph node is no more susceptible to bias than wider anatomical sampling. We also sequenced viral RNA in the plasma (day 0) from these three study subjects. Two subjects (1727 and 1679) had well-suppressed infections (< 48 copies/mL of plasma); and the third subject (1774) continued to have measureable amounts of viral RNA in plasma after 3, but not 6, months of treatment (Extended Data Fig. 1). Subjects 1727 and 1679 were each infected with HIV-1 for approximately 3 to 4 months and were antiretroviral drug naïve before the study. Subject 1774 was infected with HIV-1 for approximately 17 years and was antiretroviral-therapy experienced, but had not received any treatment for at least 1 year prior to the study.
Extended Data Figure 1
The amounts of virus and the concentrations of drugs measured during antiretroviral therapy
Panels a-c show how the number of copies HIV-1 RNA per ml of blood, the number of the HIV-1 RNA particles bound to the follicular dendritic cell network per gram of lymphoid tissue, and the number HIV-1 RNA positive cells per gram of lymphoid tissue change over the first 6 months of treatment in subjects 1774, 1727 and 1679 (a, b and c, respectively). Filled circles represent detectable measures. Unfilled circles represent undetectable measures and are plotted at the limit of detection. Panels d-f show antiretroviral concentrations in cells from lymph node (dashed line) or blood (solid line) in subjects 1774, 1727 and 1679 (d, e and f, respectively) (see Methods). Intracellular TFV-diphosphate concentrations (fmol/106 cells) are shown in orange, FTC-triphosphate (fmol/106 cells) in green, ATV (ng/mL) in purple, and EFV (ng/mL) in blue. Samples with concentrations that were below the limits of quantification (2.5 fmol/106 cells, 2.5 fmol/106 cells, 0.014 ng/mL and 0.063 ng/mL, respectively) were assigned a value of 1 for graphical illustration purposes.
We aligned individual reads with an average length of 548bp to a consensus sequence using reference-guided assembly, and corrected sequencing errors for potentially inflated estimates of genetic diversity[19]. We then used a previously described approach[20] to reconstruct the minimum number of viral haplotypes needed to adequately explain the observed reads. We calculated the sequencing error rate and set the cut-off for the subsequent analyses using a known internal control sequence. We found no significant evidence for recombinant sequences that could bias the analysis. High coverage enabled us to correct PCR and sequencing errors (leaving an average 25,000 final long-reads per sample) to detect variants present in at least 0.04% of the virus population (see Methods). The sequences from the Pol region of HIV-1 that spanned the genomic region encoding the viral enzymes protease or reverse transcriptase showed no evidence of new mutations that confer resistance to the particular antiretroviral drug used (data not shown).To avoid uncertainties in the haplotype detection due to sparse sampling and prevent systematically biased evolutionary analyses, we established a higher number of template molecules than the depth of the sequence data by a median 9.6 fold (range 3.2 to 362 fold). The high coverage of ultra deep sequencing ensured reliable detection of low-frequency viral variants (see Methods). In support of this conclusion, we found limited variability across inferred haplotypes in two completely independent technical replicates made from the same RNA or DNA sample at each time point from each subject using the same procedures (average Spearman rank correlations coefficient between single nucleotide polymorphism frequencies across replicates of ρ = 0.832, interquartile range across samples, ρ = 0.820 to 0.851; 93.7% of haplotypes above 1% frequency appeared above 1% frequency in the replicate; 97.7% of haplotypes below 1% frequency appeared below 1% frequency in the replicate), indicating that the haplotype representation is not notably biased from random amplification of some sequences and not others. The high degree of concordance between technical replicates validated our approach for the computational characterization of the viral populations, which is robust with respect to experimental error and stochastic effect[20].
Phylogenies show temporal structure
The inferred haplotypes corresponding to the Gag or Pol regions of HIV-1 were subjected to maximum-likelihood methods of phylogeny estimation (Extended Data Figs. 2 and 3). We masked out the guanosines in the APOBEC3 trinucleotide contexts of the edited sites from the alignments to avoid distortion and retain the phylogenetic information[20] (Extended Data Figs. 2 and 3). Phylogenetic relationships between the distinct haplotypes showed a temporal structure consistent with the molecular clock (continuing nucleotide substitutions occurring at a constant rate), as evidenced by strong correlation between root-to-tip distance and sampling date in the regression analyses (despite the short branches). Branch support computed using an approximate likelihood ratio test and the proportion of sites that are different (p-distance) verify the divergence of haplotypes between day 0 and after 6 months of treatment in most of the Gag and Pol regions of HIV-1 analyzed (Extended Data Table 1). Consistent with ongoing replication rather than sampling of different virus populations in lymph nodes, viral sequences contemporaneously sampled from lymphoid tissue and blood showed a similar degree of divergence. With removal of the haplotypes found to harbor repetitive inactivating base substitutions of guanosine-to-adenosine (G-to-A) the evolutionary lineage emerged that lead up to the APOBEC3-mediated hypermutation event, and the evolutionary rate estimates (range, 6.24 × 10−4 to 1.02 × 10−3 substitutions per site per month; Extended Data Table 2) are consistent with those of intra-host virus estimations (range, 5.22 × 10−4 to 8.42 × 10−4 substitutions per site per month)[21]. We therefore conclude that continued virus replication contributes to the viral reservoir.
Extended Data Figure 2
Phylogenies and Highlighter plots for the Gag region of HIV-1
Maximum-likelihood trees were constructed using gene sequences from the Gag region of HIV-1 from lymph node and blood before and after the guanosines within all possible APOBEC3 trinucleotide sequence context of edited sites were masked in the alignments, regardless of their presence in hypermutant or non-hypermutant sequences, to avoid their distortion in the phylogenetic reconstructions. Branch tips are colored according to compartment sampled: red for plasma; gold for lymph node; and blue for blood. The progressive shading of the colors of the branch tips indicate the points in time sampled. Phylogenetic trees reconstructed from the haplotypes in which the guanosines in the APOBEC3 trinucleotide context of the edited sites are masked in the alignments correct the skewing effect caused by clustering of shared haplotypes that harbor repetitive G-to-A substitutions and longer branch lengths caused by a larger number of these mutations in the hypermutated sequences whilst retaining the phylogenetic information. The horizontal scale indicates the expected number of substitutions per nucleotide site per unit time with haplotypes from later time points having diverged more. The Highlighter plots show the haplotypes from the lymphoid tissue and blood time point clusters aligned to the plasma virus sequence from day 0. The particular nucleotide changes are color coded in the alignment (thymidine, red; adenosine, green; cytosine, blue; and guanosine, orange). Magenta circles represent APOBEC3-induced G-to-A change in a trinucleotide context of the edited sites, which are distinguishable from the more random error-prone viral reverse transcriptase and RNA polymerase II replicating enzyme induced mutations[6]. Gene sequences from the Gag region of HIV-1 from Subject 1774, who continued to have measureable amounts of HIV-1 RNA in plasma on treatment, and Subjects 1727 and 1679 who were well-suppressed on treatment (a, b and c, respectively) before and after the guanosines within the particular APOBEC3 trinucleotide sequence context of edited sites were masked in the entire sequence alignment (left and right panels, respectively).
Figure 3
Drug-dependent fitness landscape
Effective reproductive numbers for drug-sensitive (R orange line) and partially drug-resistant (R, blue line) strains are driven by the effective drug concentration of a single drug in the relevant compartment. Grey lines mark thresholds separating three possible outcomes. When effective drug concentrations are low, the benefit of drug resistance does not overcome the fitness cost of mutations and drug-sensitive strains dominate. This ceases to be true at intermediate effective drug concentrations and drug-resistant strains dominate. At high concentrations, both R and R fall below one (red line), neither strain can grow and virus replication is halted.
Extended Data Figure 3
Phylogenies and Highlighter plots for the Pol region of HIV-1
Maximum-likelihood trees were constructed using gene sequences from the Pol region (retrotranscriptase [pol2]) of HIV-1 from lymph node and blood before and after the guanosines within all possible APOBEC3 trinucleotide sequence context of edited sites were masked in the alignments, regardless of their presence in hypermutant or non-hypermutant sequences, to avoid their distortion in the phylogenetic reconstructions. Branch tips are colored according to compartment sampled: red for plasma; gold for lymph node; and blue for blood. The progressive shading of the colors of the branch tips indicate the points in time sampled. The horizontal scale indicates the expected number of substitutions per nucleotide site per unit time with haplotypes from later time points having diverged more. The Highlighter plots show the haplotypes from the lymphoid tissue and blood time point clusters aligned to the plasma virus sequence from day 0. The particular nucleotide changes are color coded in the alignment (thymidine, red; adenosine, green; cytosine, blue; and guanosine, orange). Magenta circles represent APOBEC3-induced G-to-A change in a trinucleotide context of the edited sites. Gene sequences from the Pol region of HIV-1 that spanned the genomic region encoding the viral enzyme reverse transcriptase from Subjects 1774, 1727, and 1679 (a, b and c, respectively) before and after the guanosines within the particular APOBEC3 trinucleotide sequence context of edited sites were masked in the entire sequence alignment (left and right panels, respectively).
Extended Data Table 1
The genetic distance measured between the haplotypes in the Gag or Pol regions of HIV-1
We used sequence data from the Gag or Pol regions (protease [pol1] and retrotranscriptase [pol2]) regions of HIV-1 (range, 315bp to 487bp) to perform a genetic distance analysis. We masked the guanosines in APOBEC3 trinucleotide contexts of the edited sites from the alignments. We compared the proportion of substitutions per site (p-distance) for haplotypes within lymph node or blood from each subject at 6 months to the most common haplotype present at day 0. Most comparisons were statistically significant (Mann-Whitney U test, P-values < 0.1), indicating ongoing diversification of the viral populations.
Patient
Compartment
Region
time=0
time=6
p-value
1774
LN
gag
4.96E-03
1.14E-02
<0.0001
poll
6.38E-03
5.58E-03
0.3783
pol2
5.85E-03
1.26E-02
0.0208
PB
gag
4.20E-03
1.20E-02
<0.0001
poll
1.83E-03
1.20E-02
<0.0001
pol2
9.72E-03
1.07E-02
0.0070
1727
LN
gag
5.73E-05
8.59E-03
<0.0001
pol1
1.35E-03
1.59E-02
<0.0001
pol2
1.08E-03
5.26E-03
<0.0001
PB
gag
NA
NA
NA
pol1
1.87E-03
5.26E-03
0.3280
pol2
1.16E-03
1.01E-02
<0.0001
1679
LN
gag
6.68E-04
3.85E-04
0.5050
pol1
1.73E-04
2.16E-03
<0.0001
pol2
1.54E-03
3.28E-03
0.0023
PB
gag
4.35E-03
5.97E-03
0.8799
pol1
2.23E-03
4.36E-03
0.0989
pol2
0.00E+00
1.57E-03
0.2383
Extended Data Table 2
HIV-1 evolutionary rate calculated after removing hypermutated haplotypes
We used a linear regression model to estimate the evolutionary rate for the Gag or Pol regions (protease [pol1] and retrotranscriptase [pol2]) of HIV-1. We calculated the slope (μ) of the linear regression between time and the direct pairwise genetic distances (number of substitutions per site per month) from the most common haplotype at day 0 for each subject. Haplotypes found to harbor G-to-A hypermutant sequences were removed from the analysis to limit the effect of inactivating mutations on the estimates. There is very strong evidence, now presented in numerous studies[17], that acute HIV-1 infections are by and large founded by a single (or at best a few) viral strains; this initial bottleneck is followed by rapid population diversification in the absence of treatment. The structures of our intra-host phylogenies at day 0 support this pattern. Even if multiple populations are transmitted and are able to establish infection and co-circulate, Bayesian phylodynamics models can properly account for it, and estimate evolutionary rates accurately.
Patient
Region
Compartment
μ
Std Err
1774
gag
LN
1.54E-03
1.42E-05
PB
1.55E-03
9.33E-06
poll
LN
−4.87E-04
2.17E-05
PB
1.58E-03
1.04E-05
pol2
LN
1.79E-03
2.44E-05
PB
1.60E-04
3.90E-05
1727
gag
LN
1.87E-03
1.49E-05
PB
NA
NA
poll
LN
1.81E-03
1.25E-05
PB
6.19E-04
1.13E-05
pol2
LN
1.44E-03
1.36E-05
PB
1.70E-03
6.56E-06
1679
gag
LN
4.74E-04
2.25E-05
PB
8.34E-04
5.09E-05
poll
LN
1.05E-03
8.89E-06
PB
3.90E-04
2.77E-05
pol2
LN
4.89E-04
2.20E-05
PB
5.12E-04
7.56E-05
Genetic differentiation due to migration
The pairwise fixation index (FST), a standard measure of genetic differentiation between populations, confirmed significant genetic variation between lymph node and blood at each time point (Extended Data Table 3)[22,23]. Because FST measures of population structure among sampled haplotypes in spatially distinct compartments can be affected by selection and migration, we used an unrestricted branch-site random effects model to test whether the proportion of sites along the branches of the phylogeny significantly differ among lineages, indicating episodic diversifying selection[24] (see Methods and Extended Data Table 4). We restricted the test to internal branches, which capture at least one and likely multiple rounds of virus replication, to mitigate the biasing effects of neutral or deleterious mutations on the ratio of nonsynonymous-to-synonymous substitution rates (ω) estimates where selection has not yet fully filtered such population level variation[25,26]. Except for one study subject (1679), where a small proportion of sites (0.3%) were under strong diversifying positive selection along internal tree branches (likelihood ratio test, P~10−6), we found scarce evidence to suggest the virus is evolving in response to strong selective forces. We concluded that the FST values are more likely due to migration of haplotypes from one compartment to another.
Extended Data Table 3
Patterns of genetic divergence measured between viral haplotypes in lymphoid tissue and blood
S and F pairwise genetic differentiation (a and b, respectively) between peripheral blood and lymph samples for the Gag and Pol regions of HIV-1 from subjects 1774, 1727, and 1679 at day 0 and after 6 months of antiretroviral therapy.
a
Day 0
Month 6
Snn
Probability (Random Snn > Observed Snn)
Snn
Probability (Random Snn > Observed Snn)
1774
Gag
0.71
0
0.53
0.086
Pol1
0.71
0
0.91
0
Pol2
0.86
0.001
0.90
0.003
1727
Gag
NA
NA
1.00
0
Pol1
0.91
0
0.98
0
Pol2
0.98
0
1.00
0
1679
Gag
0.72
0
0.97
0
Pol1
0.75
0.04
0.77
0.011
Pol2
0.86
0.003
0.86
0
b
Day 0
Month 6
Fst
Probability (Random Fst > Observed Fst)
Fst
Probability (Random Fst > Observed Fst)
1774
Gag
0.13
0
−0.01
0.51
Pol1
0.13
0
0.41
0
Pol2
0.05
0.236
0.36
0
1727
Gag
NA
NA
0.76
0
Pol1
0.51
0
0.58
0
Pol2
0.73
0
0.34
0
1679
Gag
−0.06
0.75
0.67
0
Pol1
0.37
0.018
0.02
0.394
Pol2
0.23
0.154
0.35
0
Extended Data Table 4
Episodic selection estimated across sites along internal branches of the phylogeny
A branch-site evolutionary model provided a statistical framework to search for evidence of episodic selection on internal branches in the tree. The ω distribution is given for both internal and external branches.
Subject ID
Gene
Branch-site ω distribution (internal branches)
Branch-site ω distribution (terminal branches)
P-value for episodic diversifying selection on internal branches
JS1774
gag
ω1 = 0.09 (81%)ω2 = 0.35 (5%)ω3 = 1.00 (14%)
ω1 = 0.09 (69%)ω2 = 0.15 (14%)ω3 = 0.32 (17%)
1.0
JS1774
pol (PR)
ω1 = 0.03 (95%)ω2 = 0.07 (3%)ω3 = 1.00 (2%)
ω1 = 0.09 (99.9%)ω2 = ∞ (0.1%)
1.0
JS1774
pol (RT)
ω1 = 0.11 (89%)ω2 = 0.35 (11%)
ω1 = 0.16 (71%)ω2 = 0.22 (29%)
1.0
SM1727
gag
ω1 = 0.16 (91%)ω2 = 0.36 (9%)
ω1 = 0.00 (87%)ω2 = 5.00 (13%)
1.0
SM1727
pol (PR)
ω1 = 0.05 (94%)ω2 = 0.24 (2%)ω3 = 1.00 (4%)
ω1 = 0.09 (60%)ω2 = 0.21 (13%)ω3 = 0.25 (27%)
1.0
SM1727
pol (RT)
ω1 = 0.09 (87%)ω2 = 0.51 (2%)ω3 = 1.00 (11%)
ω1 = 0.13 (99.6%)ω2 = ∞ (0.4%)
1.0
TM1679
gag
ω1 = 0.15 (63%)ω2 = 1.19 (37%)
ω1 = 0.68 (100%)
1.0
TM1679
pol (PR)
ω1 = 0.14 (82%)ω2 = 0.54 (0.5%)ω3 = 1.00 (17%)
ω1 = 0.19 (97%)ω2 = 0.15 (2.8%)ω3 = ∞ (0.4%)
1.0
TM1679
pol (RT)
ω1 = 0.19 (94.8%)ω2 = 0.93 (4.9%)ω3 = ∞ (0.3%)
ω1 = 0.31 (100%)
10−6
The phyloanatomic history of HIV-1
To infer evolutionary patterns and population dynamic processes from the time-structured sequence data, we used a Bayesian statistical framework that estimates the substitution rate, divergence time, and demographic history of the sampled viral lineages to structure-rooted, time-resolved phylogenies[27,28] (see Methods). This approach resolves evolutionary patterns to infer the timing and direction of the key migrations of the virus within hosts. Figure 1 shows the phyloanatomic history of HIV-1 within the study subjects inferred from the ancestral and descendent haplotypes on a Bayesian maximum clade credibility (MCC) phylogenetic tree. The branching patterns in the trees, which reconstruct the origins and trace the flow of HIV-1 within hosts, and the tissue of origin of the internal nodes, show strong statistical support (highlighted by their posterior probabilities). Branch lengths in these time-structured trees (left panels with all haplotypes; right panels with the putative G-to-A hypermutant sequences removed to avoid distortion) represent posterior median estimates of calendar time. The temporal structure in the trees shows a strong clock-like signal and rates consistent with HIV-1 within-host evolution. The best-fit model included a strict molecular clock and assumed a constant population size, though a model with a relaxed molecular clock gave qualitatively similar results (data not shown).
Figure 1
Time-structured phyloanatomic history of haplotypes in lymph nodes and blood
MCC phylogenetic trees constructed from the complete alignments of the haplotypes from the Gag region of HIV-1 for subjects 1774, 1727 and 1679 with all haplotypes (a, b and c, respectively) and with the haplotypes containing G-to-A hypermutations removed (d, e and f, respectively). Branch colors represent the most probable (modal) anatomic location of their descendent node inferred through Bayesian reconstruction of the ancestral state, along with the posterior probabilities for the location of their parental nodes.
Based on inferred MCC tree topologies and the compartment assignments to unobserved internal nodes, an underlying conformity and strong correlation exist between genetic and anatomic locations and the direction of the virus's spread in the body. A particular pattern recurs: the haplotypes in lymph nodes are the source of viral lineages that migrate from lymph node to blood (Fig. 1). We deduce that viral lineages in blood are derived from replicating virus in lymph nodes with little or no evidence of an additional source in blood. The time-scaled trees show a strong and significant result and are robust to different substitution models (see Methods). These data, only revealed through temporarily and spatially resolved sampling, further support the conclusion that the pattern does not result from distinct populations of haplotypes being sampled from different compartments, but rather migration and colonization of haplotypes between lymphoid tissue and blood. A structured coalescent model[29], less prone to potential bias in spatial inference estimates, shows higher migration rates from lymph node to blood (Extended Data Table 5), confirming that the direction of flow is not due to oversampling of a particular anatomic location that would have increased estimates of traffic into that location[30].
Extended Data Table 5
Estimated migration rates between lymph nodes and blood using Bayesian inference under the structured coalescence
Migration rates (in fraction of emigrants per month) were estimated in the Gag region of HIV-1 for the three study subjects using Bayesian inference under the structured coalescence model, assuming a constant population size and a strict molecular clock. Migration rates from lymph node to blood and vice versa are shown with their standard error of the mean (SEM) after running at least 50 million MCMC steps and reaching high values of estimated sample sizes (ESS) for all parameters.
Patient
Migration rate
Mean
SEM
1774
From PB to LN
0.48
1.08E-02
From LN to PB
1.70
1.57E-01
1727
From PB to LN
0.31
6.99E-03
From LN to PB
0.57
1.21E-02
1679
From PB to LN
0.18
2.20E-03
From LN to PB
0.71
1.06E-02
Our results, which reconstruct the dynamics of HIV-1 spread within the body, imply that in patients with no detectable viral RNA in plasma, the virus reservoir is constantly replenished by low-level virus replication in lymphoid tissue. Distinguishing between low amounts of viral replication and pools of latently infected cells that may persist and rekindle HIV-1 infection is methodologically difficult. A small number of HIV-1 sequences isolated at consecutive time points that persisted without evidence of genetic change might be due to long-lived central memory cells, a fraction of which may have reverted to a resting state, or latently infected transitional memory cells that persist by clonal expansions (driven by homeostatic proliferation) or survivorship for long-lived infected CD4+ T cells that contain replication-competent virus[8,15,22,31-34]. Two of the subjects (1774 and 1679) showed that some haplotypes persist as a single tree branch through time (Fig. 1d and f), consistent with proliferation of HIV-1 infected cells or long-term cell survival[33,34]. Regardless of the different mechanisms for self-renewal and/or persistence by which some of these quite similar latent or defective viral lineages may have persisted, these quiescent viruses differ from others that have evolved and trafficked between compartments. The temporally and compartmentally sampled data show that viral lineages continue to diverge in well-suppressed patients and help explain the persistence of infectious viral reservoirs with scant decay of the virus pool[35]. The dynamic nature of the viral population in lymphoid tissue sanctuaries, where infected cells can still produce new viruses, infect new target cells and replenish the pool, undermines previous estimates of the time necessary to purge the reservoir of latently infected cells and achieve virus eradication[3].
A spatial dynamic model
Even though viral genetic diversity accumulated over time, the infected cells did not produce new virus with drug-resistance mutations, conferring a putative fitness advantage that might have led to a systemic viral rebound. To provide a mechanistic explanation of this scenario, we developed a drug concentration-dependent mathematical model of virus replication and spread between spatially distinct compartments to explore the deterministic components of viral dynamics[36]. In this model (see Fig. 2 and Supplementary Information), the virus can occupy two spatially distinct compartments that have limited traffic of virus particles or cells. The main compartment has a larger volume and high effective drug concentration. The other, smaller volume compartment (<0.01% of the size of the main compartment) has a lower drug concentration and represents a sanctuary site within the lymphoid tissue reservoir [37]. The model includes competition between two viral strains, one that is sensitive to a potent multidrug regimen and one that is partially resistant to the full complement of antiretroviral drugs, but less fit than the drug-sensitive strain in the absence of treatment. Within each compartment the balance between the strains is determined by the difference in their replicative fitness without treatment, the effectiveness of drug therapy to curb new rounds of infection in that region (determined by drug concentration) and the susceptibility of each strain to antiretroviral therapy.
Figure 2
Cartoon illustration of the drug concentration-dependent spatial model
In the main compartment (i=0; the majority of lymphoid tissue and the blood) drug concentration is high (grey and red). In the sanctuary site (i=1; a small fraction of the lymphoid tissue and localized extracellular fluid) drug concentration is low (pink). There are uninfected cells (X), long-lived infected cells (Y), and short-lived infected cells (Q), as well as virus particles (V) that can be bound by few (F) or many (G) receptors on the follicular dendritic cell network. The dashed lines represent the effect of treatment in blocking infection and production of infectious virus particles. For graphical simplicity, we do not show the emergence of drug resistance, the production of noninfectious virus particles, virus clearance, nor cell death.
Figure 3 illustrates a hypothetical fitness landscape, which portrays the relationship between the fitness of each strain and the evolutionary adaptation across a range of drug concentrations. The fitness landscape illuminates the persistence of the pool of virus that is unaffected by antiretroviral drugs. The effective reproductive number R (the average number of secondary de novo infections of cells produced by one infected cell) is a function of the basic reproductive number R and the effectiveness of treatment (R = R in the absence of treatment). The fitness cost for drug resistance determines the difference between the effective reproductive numbers for drug-sensitive and drug-resistant strains (R and R, respectively) at zero effective drug concentration. In a competitive system where both strains are present, the maximum of the two effective reproductive numbers R and R equals the effective reproductive number R for the system as a whole at that effective drug concentration.With two spatial compartments, heterogeneity in the distribution of the drug can lead to heterogeneity in R. In the sanctuary site, where drug-penetrance is low, the drug-selective pressure on the replicating virus population is too low to compensate for the fitness costs associated with resistance; thus, the effective reproductive number for a drug-sensitive strain is greater than that of the partially drug-resistant strain (R > R), enabling de novo infection dominated by the drug-sensitive strain[38]. As drug effectiveness increases, the partially drug-resistant strain gradually becomes more fit relative to the drug-sensitive strain such that at some intermediate drug concentrations, characterized by threshold levels, the effective reproductive number for the resistant strain can be greater than that for the drug-sensitive strain (R > R). This would allow for de novo infection dominated by the partially drug-resistant strain. In our model, however, we assume that drug concentration in the main compartment exceeds this threshold. At this high drug concentration, infection is no longer sustainable because the effective reproductive numbers are less than unity for both strains (R and R).By assuming a relatively simple two-compartment model with drug concentration differences between them, the model predicts that virus, dominated by the drug-sensitive strain, can proliferate in a small sanctuary site where the drug concentration is low (Fig. 4a). Although all single point mutations will be generated sufficiently often to prompt partially drug-resistant strains within individuals, their numbers will remain low through competition. The likelihood is exceedingly low (see Supplementary Information and Supplementary Table 2) of stepwise accumulation of mutations from a partially drug-resistant strain to one that confers resistance to all drugs, each of which would have to come to fixation in the absence of drug selection (Extended Data Fig. 4), or the presence or absence of recombination. The model calculations show that increasing drug effectiveness or penetration across spatial regions can affect evolutionary dynamics, and lead either to the emergence of drug-resistant strains (Fig. 4b) or the elimination of ongoing replication (Fig. 4c). Our model predictions fit the data well (Extended Data Fig. 5) and confirm that both competition between strains and regional spatial heterogeneity in antiretroviral concentrations help capture the observed dynamics of the viral reservoir in these well-suppressed patients.
Figure 4
Modelling replication dynamics and treatment effectiveness in the viral reservoir
In the main compartment, antiretroviral therapy is wholly effective against drug-sensitive virus (z0 = z̃0 = 1). In the sanctuary site, a, low treatment effectiveness (z1 = z̃1 = 0.3) favors drug-sensitive strains. Partially drug-resistant strains will exist, but at levels below those favoring stepwise evolution towards a fully drug-resistant strain. b, Intermediate treatment effectiveness (z1 = z̃1 = 0.6), favors partially drug-resistant virus at levels that may suffice for evolution towards a fully drug-resistant strain. c, high treatment effectiveness (z1 = z̃1 = 1) favors the decline of all strains and the cessation of virus replication.
Extended Data Figure 4
Alternative drug-dependent fitness landscape plots
a, Fitness landscape plot for a partially drug-resistant strain that confers relatively low-level resistance to drugs as compared with the fitness costs imposed by the drug-resistant mutations. The drug-resistant strain (blue line) does not outcompete the drug-sensitive strain (orange line) at any effective treatment concentration where it can grow. There are two phases to the dynamics. At lower effective drug concentrations (left of grey line) the drug-sensitive strain thrives. Beyond this threshold, neither strain can continuously replicate. b, Fitness landscape plot for a highly drug-resistant strain. This strain confers a high-level of drug resistance relative to the replicative fitness cost imposed by the resistance mutations. At low effective drug concentrations (left of grey line), the drug-sensitive strain outcompetes the drug-resistant strain. At high effective drug concentrations, the drug-resistant strain outcompetes the drug-sensitive strain and can continuously replicate. We argue that, typically, fully drug resistant mutants of this sort neither exist in the viral population of patients before treatment, nor arise through random mutation during the course of antiretroviral therapy (see Supplementary Information and Supplementary Table 2). Drug-resistant strains, which are capable of ongoing replication at high effective drug concentrations are not typically generated in individuals because: they are generated in a single step very rarely; and stepwise generation from partially resistant strains is also rare because partially resistant strains are outcompeted in the sanctuary site which constantly refills the pool. The strain specific effective reproductive numbers for the drug-sensitive R (orange) and drug-resistant R (blue) strains are shown. For simplicity, only the impact of changes to the effectiveness of a single drug in a single compartment is shown.
Extended Data Figure 5
Model of replication dynamics and treatment effectiveness in the viral reservoir fitted to the data
The model is fitted to the total inferred average body counts of free virus particles (green line), infected CD4+ T cells (orange line) and virus bound to the follicular dendritic cell network of B cell follicles (grey line). a, Demonstrates the dynamics over the first 200 days of treatment. Note that early on during antiretroviral therapy, HIV-1 RNA in plasma declines more rapidly than virus bound to the follicular dendritic cell network of B cell follicles. Circles demonstrate average data from the 3 patients discussed in detail in this study and an additional 9 patients presented elsewhere[13]. Where the average value was indeterminate because of test sensitivity, the data are fitted below the upper limit of the average log10 infectious units. b, Demonstrates the dynamics over a longer period. The model predicts the persistent low-level viral RNA in plasma. The diamond symbol represents data relating to the long-term persistent virus, as measured using quantitative reverse transcription PCR (see Methods). The optimal model fit parameters are presented in Supplementary Table 1.
Though probabilistic models suggest (and this model allows) production of partially drug-resistant strains, any which arise cannot populate the sanctuary site because of low drug penetration and competition from drug-sensitive strains. Equally, they cannot repopulate the larger, main compartment where drug concentrations preclude any ongoing replication. In agreement with our phylogenetic inferences, these results suggest that the low-level viral replication in lymphoid tissues where antiretroviral drugs concentration are low could allow drug-sensitive strains to grow and spillover to the blood[39].
Conclusion
This study reveals how dynamic and spatial processes work together to permit HIV-1 to persist within the infected host and avoid development of resistance despite antiretroviral therapy. From these temporally and compartmentally structured sequence data, we conclude that continued virus production from infected cells in lymphoid tissue sanctuary sites, where drug concentrations are not fully suppressive, can continue to refill the viral reservoir and traffic to blood or lymphoid tissue[18]. We further show that the virus does not inexorably develop resistance to antiretroviral drugs because the lower concentrations of drug in the sanctuaries are not sufficient to confer a competitive advantage upon drug-resistant strains. Our findings explain the failure of treatment intensification to fully suppress de novo infection and highlight issues surrounding the barriers to delivering antiretroviral drugs at clinically effective concentrations in the infectious viral reservoir. The state-of-the-art sequencing approach, innovative time-calibrated phyloanatomic tree construction, and a novel model of compartmentalized intra-host population dynamics provide a new perspective on HIV-1's seemingly untouchable strongholds in the body. Achieving optimal cellular pharmacokinetics and spatial distribution of antiretroviral drugs in lymphoid tissue to fully suppress viral replication and preserve immune function is, therefore, a prerequisite to the elimination of the viral reservoir and ultimately a cure for HIV-1 infection.
Methods
Study subjects
The three study subjects were enrolled into a clinical protocol where treatment was started after we obtained peripheral blood by venipucture, inguinal lymph nodes by excisional biopsy, and ileum and rectal biopsies through colonoscopy[13]. Two subjects were antiretroviral drug naïve (1727 and 1679) and one subject (1774) had not received any drugs for at least 1 year before enrollment. Subjects 1679 and 1774 received emtricitabine (FTC), tenofovir (TFV; as the disoproxil fumarate [TDF]), and atazanavir with ritonavir (ATV/R). Subject 1727 received FTC, TDF and efavirenz (EFV). In all three subjects, conventional typing methods confirmed that the plasma virus was sensitive to the potent antiretroviral regimen that they received. Subjects 1727 and 1679 were well-suppressed patients. Subject 1774 continued to have measureable amounts of HIV-1 RNA in plasma after 3, but not 6 months of treatment. We obtained peripheral blood, inguinal lymph node, and terminal ileum and rectum biopsies once again at 3 and 6 months after starting treatment. Laboratory procedures for tissue management, in situ hybridization, and analytical pharmacology are described elsewhere[13].
Extraction and quantification of viral nucleic acids
Nucleic acids were extracted from frozen cells obtained from blood or lymphoid tissue using the MasterPure Complete DNA and RNA Purification Kit (Epicentre, Madison, WI). Viral RNA was isolated from plasma using the PureLink Viral RNA/DNA Mini Kit (Life Technologies, Carlsbad, CA). HIV-1 was quantified using a quantitative reverse transcription PCR assay. The relative amount of HIV-1 target DNA was normalized to the quantification cycle for a concentration calibrator by using an external standard curve of serial 10-fold dilutions of reference DNA for the Gag region of HIV-1 derived from the plasmid pNL-43. All reactions were performed in triplicate on the ABI 7900HT sequence detector (Applied Biosystems, Foster City, CA).
Library preparation for deep sequencing
As the number of viral templates in the sampled material is low, we used an amplicon-based deep sequencing strategy. To minimize biased amplification of the target sequence, primer locations in the Gag and Pol regions of HIV-1 were selected on the basis of the alignment positional entropy in the multiple sequences aligned from the Los Alamos National Laboratory HIV-1 sequence database (http://www.hiv.lanl.gov/). The primers were computationally screened for cross-dimer interactions and the concentration of each primer was optimized for amplification. For each sample, blanks were included to screen for contamination. We used designated, physically separated areas within the laboratory to set-up PCR, which avoids contact with potentially contaminating amplicons.To generate the long read-length PCR amplicons sequenced in this study (range, 509bp to 587bp read-lengths per run depending on the gene region being analyzed), we amplified the Gag and Pol regions of HIV-1. For the Gag region of HIV-1, we used forward primer gag_632F_EK (5′-GCAGTGGCGCCCGAAC-3′ (corresponding to HXB2 nucleic acid sequence numbering positions 632 → 647) and reverse primer gag_1788R_EK (5’-AATAGTCTTACAATCTGGGTTCGC -3′ (1788 → 1765). For the Pol region of HIV-1 that spanned the genomic region encoding the viral enzyme protease and reverse transcriptase, we used forward primer HIV-Pro1_2137F(5’-CAGAGCAGACCAGAGCCAAC-3′, corresponding to positions 2137 → 2156) and reverse primer HIV-RT1_3531R (5’-CTGCTATTAAGTCTTTTGATGGGTC-3′ (3531 → 3507). PCR amplification was performed using the High Fidelity Platinum Taq DNA Polymerase (Invitrogen, Carlsbad, CA) with thermal cycling conditions of 94°C for 2 mins, followed by 35 cycles of 94°C for 15s, 54°C for 15s, 68°C for 1 mins, with a final extension step at 68°C for 5 mins.We used an integrated sequencing pipeline for library construction, template amplification, and DNA sequencing as in[20]. Multiplex Identifiers were included during library preparation for sample barcoding. For the Gag region of HIV-1, we used the forward primer A-Gag_977F_degEK 5′-primer A-GCTACAACCAKCCCTYCAGACAG-3′ (977 → 1000) and the reverse primer B-Gag_1564R_degEK 5′-primer B-CTACTGGGATAGGTGGATTAYKTG-3′ (1564 → 1541) to generate a 587bp amplicon (977 → 1564). For the Pol region of HIV-1 that spanned the genomic region encoding the viral enzyme protease, we used forward primer A-Pol1_2235F 5′-primer A-ACTGTATCCTTTAGCTTCCCTCA-3′ (2235 → 2262) and the reverse primer B-Pol1_2744R 5′-primer B-TTTCTTTATGGCAAATACTGGAG-3′ (2744 → 2721) to generate a 509bp amplicon (2235 → 2744). For the Pol region of HIV-1 that spanned the genomic region encoding the viral enzyme reverse transcriptase, we used forward primer A-Pol2_2700F 5′-primer A- GGGCCTGAAAATCCATACAAT-3′ (2700 → 2721) and the reverse primer B-Pol2_3265R 5′-primer B-CATTTATCAGGATGGAGTTCATA-3′ (3265 → 3242) to generate a 565bp amplicon (2700 → 3265). PCR was performed using the High Fidelity Platinum Taq DNA Polymerase (Invitrogen, Carlsbad, CA) with thermal cycling conditions of 94°C for 2 mins, followed by 35 cycles of 94°C for 15s, 54°C for 15s, 68°C for 1 mins, with a final extension step at 68°C for 5 mins. DNA amplicon libraries were resolved on a pre-cast 2% agarose gel and purified with QIAquick Gel Extraction kit (Qiagen, Hilden, Germany) and AMPure XP SPRI beads (Beckman Coulter, Indianapolis, IN).Libraries were quantified with KAPA Library Quant Kit (Kapa Biosystems, Wilmington, MA) on Agilent 2100 Bioanalyzer High Sensitivity DNA chip (Agilent, Santa Clara, CA) for concentration and size distribution. The concentration of the product DNA was normalized before pooling to achieve sequence uniformity across amplicons. Controls were spiked into the reaction to monitor the library construction process and potential index cross-contamination. The known internal control sequence (clonal sequence of 456 bp) introduced into the reaction was used to calculate the single nucleotide error rate and set the cut-off for the sequence analysis where no control errors could be detected. PCR errors are more common in later cycles of amplification, and being limited to small copy numbers they have little impact on the haplotype distribution. Bidirectional sequencing using the 454 Life Sciences’ GS-FLX sequencing platform (Roche, Basel, Switzerland) provided independent confirmation of sequence information. The long read-length sequences were sorted based on index sequences, trimmed to remove residual adapter bases from the ends of the reads, and filtered for length and duplicates prior to alignment. Read depth and coverage estimation that met predetermined coverage thresholds were performed as in[20]. Sequencing quality metrics were calculated for all samples using FastQC and only high-quality sequencing libraries were used in the ensuing analyses.
Sequence clean up and assembly
Experimental precision along with deep coverage allows for accurate estimation of the underpinning diversity of the virus population. We binned the sequence reads by multiplex identifier barcodes, and identified and excluded sequencing errors and misaligned regions from the analysis by computational methods. A small number of reads had a disproportionate number of errors that accounted for most of the inaccuracy in the full dataset. After quality filtering and trimming to a uniform length, we proceeded to build the different haplotypes present in each of the samples. We began by collapsing the identical sequences into haplotypes using reference-guided assembly to avoid the use of uninformative sequence repeats. Haplotypes at prevalence above the error threshold (defined by the internal control spiked in the sequencing runs), which corresponded to variants present above 0.04% of the total existing variants in the collapsed alignments, were used in the analysis.For the processing of the temporally and spatially linked deep-sequencing data for studying viral diversity, the viral sequences were first aligned against the HXB2 reference sequence (GenBank accession number K03455) using Segminator II (version 0.1.1). We then generated a consensus viral sequence for each patient as a reference for assembly to improve the alignment quality. A statistical model that utilizes platform error rates in conjunction with patterns within nucleotide frequencies derived from data obtained from related samples, that is, different compartments within the host or temporally linked samples, was used to separate low frequency platform error from true variation. This model is termed “probabilistic read error detection across temporally obtained reads” (PREDATOR) and is implemented within Segminator II. This statistical framework maintains the reading frame and corrects for deep sequencing errors[19]. We corroborated the number of haplotypes and the frequency of haplotypes that explain the data using a second reconstruction algorithm based on combinations of multinomial distributions to analyze the k-mer frequency spectrum of the sequencing data implemented with QuRe[40]. We screened the sequence alignments for recombinant sequences using the GARD algorithm implemented in HyPhy[41,42].
Sampling variance
The high coverage of massively parallel sequencing is necessary to ensure reliable detection of low-frequency viral variants. A simple calculation based on the geometric distribution shows that in order to guarantee that a viral template occurring at frequency f is detected with probability p or better, it is necessary to sequence at least log (1-p) / log (1-f) – 1 templates. A variant of frequency 0.01 (that is, 1%), for example, would require 450 sequences to ensure its detection at probability 0.99 (that is, 99%) or better. Conversely, a study using 100 single-genome sequences would detect a variant of frequency 0.01 with probability 0.64. A low number of input DNA templates derived from one compartment that catches the spillover from another does not account for the complexities of partial observation and spatial heterogeneity that could lead to measurement error elsewhere[16]. This complication emphasizes the challenge in trying to extrapolate from single template sequencing the magnitude and character of the virus population that comprises the viral reservoir.
Maximum-likelihood tree construction
Maximum-likelihood phylogenies were created with PhyML using the general time-reversible model with the proportion of invariant sites and gamma distribution of among-site rate variation (GTR+I+Γ4) nucleotide substitution model[43] applying an approximate likelihood ratio test for branch support[44]. We estimated trees on viral sequence sets from which gaps in the alignment were removed and considered as missing data for the reason that maximum-likelihood tree error may increase with inclusion of unreliable sites. We assessed the temporal structure of the trees by performing linear regression on the root-to-tip distances of samples versus the time of sampling and tested the validity of the time-dependency of the evolution rate estimates with the assumption of a strict molecular clock using the program Path-O-Gen v1.4 (http://tree.bio.ed.ac.uk/software/pathogen/). We used the Highlighter sequence visualization tool (www.HIV.lanl.gov) to trace commonality between sequences in an alignment based on individual nucleotide changes.
Compartmentalization
We tested for subdivision of viral sequences into sub-populations in the different compartments at each time point. We calculated genetic distances using the Wright's FST and S test statistics[45,46]. We used a bootstrap test to determine the confidence of the estimates and performed a permutation test (1000 iterations) to assess the significance levels of the obtained scores.
Identifying selection
We used a modification of a random effects branch-site model to detect positive selection and test whether the phylogeny diverged over time[47]. A likelihood ratio hypothesis test compared the fit of the model using a 3-bin ω distribution (ω and ω in [0,1], ω unrestricted) to describe the evolution of all branches in the tree, to the fit of the model where all ω is restricted to be in [0,1]. We tested whether or not a proportion of sites along internal branches of the intra-host viral phylogeny have been subject to episodic selection (ω > 1)[24], restricting the test to internal branches to lessen the biasing effects of neutral or deleterious mutations on ω estimates[48,49] and serve as a proxy for population level selection[25,26].
Time-calibrated phylogenetic tree construction
To resolve the phyloanatomy, we reconstructed the temporal and spatial dynamics of the viral haplotype lineages with a Bayesian statistical framework using Markov chain Monte Carlo (MCMC) sampling for evolutionary hypothesis testing, as implemented in BEAST version 2.1.2[27]. This approach was used to sample phylogenies from their joint posterior distribution, in which the viral haplotypes are restricted by their known date of sampling, using a simple substitution model described by Hasegawa–Kishino–Yano (HKY) to avoid over-parameterization[50]. Models differing in assumptions on mutation rate and effective population size were run for 100 million generations each and compared using the Bayes factor as implemented in Tracer version 1.6. We determined that the best-fit model included a strict molecular clock and assumed a constant population size. We used a symmetric transition model with constant rates over time that considered a discretized diffusion process among the different compartments. This was formalized as a continuous time Markov chain model to reconstruct the spatial dynamics between compartments. All chains were run for sufficient length and convergence of the relevant parameters was assessed using Tracer version 1.6, ignoring 10% of the chain as burn-in.We summarized the connections between virus evolution and anatomical compartment history using an annotated MCC phylogenetic tree estimated with BEAST. The model and its parameters were chosen after computing the posterior probability of several models to obtain the discriminatory Bayes factors. Because population structure, whether due to spatial segregation or limitations to gene flow, may affect evolutionary dynamics, we confirmed that the direction of flow was not due to oversampling of a particular environment, by running a two-deme Bayesian inference under a structured coalescent model with a HKY substitution model assuming a strict molecular clock[29], which is less susceptible to sampling issues than our trait-based analysis. For completeness, we conducted a search for topologies and divergence times assuming a relaxed molecular clock as well. In the analyses performed, HIV-1 showed a high degree of clock-like evolution and a mean nucleotide substitution rate expected to be within the bounds necessary to obtain meaningful phyloanatomic information from sequence data. Using the location of each of the haplotypes, a discrete trait was included in the inference. We used BEAST to estimate the probabilities of each of the possible states.
Statistical analysis
Standard descriptive statistics were performed with the use of the STATA, GraphPad or R packages.
The amounts of virus and the concentrations of drugs measured during antiretroviral therapy
Panels a-c show how the number of copies HIV-1 RNA per ml of blood, the number of the HIV-1 RNA particles bound to the follicular dendritic cell network per gram of lymphoid tissue, and the number HIV-1 RNA positive cells per gram of lymphoid tissue change over the first 6 months of treatment in subjects 1774, 1727 and 1679 (a, b and c, respectively). Filled circles represent detectable measures. Unfilled circles represent undetectable measures and are plotted at the limit of detection. Panels d-f show antiretroviral concentrations in cells from lymph node (dashed line) or blood (solid line) in subjects 1774, 1727 and 1679 (d, e and f, respectively) (see Methods). Intracellular TFV-diphosphate concentrations (fmol/106 cells) are shown in orange, FTC-triphosphate (fmol/106 cells) in green, ATV (ng/mL) in purple, and EFV (ng/mL) in blue. Samples with concentrations that were below the limits of quantification (2.5 fmol/106 cells, 2.5 fmol/106 cells, 0.014 ng/mL and 0.063 ng/mL, respectively) were assigned a value of 1 for graphical illustration purposes.
Phylogenies and Highlighter plots for the Gag region of HIV-1
Maximum-likelihood trees were constructed using gene sequences from the Gag region of HIV-1 from lymph node and blood before and after the guanosines within all possible APOBEC3 trinucleotide sequence context of edited sites were masked in the alignments, regardless of their presence in hypermutant or non-hypermutant sequences, to avoid their distortion in the phylogenetic reconstructions. Branch tips are colored according to compartment sampled: red for plasma; gold for lymph node; and blue for blood. The progressive shading of the colors of the branch tips indicate the points in time sampled. Phylogenetic trees reconstructed from the haplotypes in which the guanosines in the APOBEC3 trinucleotide context of the edited sites are masked in the alignments correct the skewing effect caused by clustering of shared haplotypes that harbor repetitive G-to-A substitutions and longer branch lengths caused by a larger number of these mutations in the hypermutated sequences whilst retaining the phylogenetic information. The horizontal scale indicates the expected number of substitutions per nucleotide site per unit time with haplotypes from later time points having diverged more. The Highlighter plots show the haplotypes from the lymphoid tissue and blood time point clusters aligned to the plasma virus sequence from day 0. The particular nucleotide changes are color coded in the alignment (thymidine, red; adenosine, green; cytosine, blue; and guanosine, orange). Magenta circles represent APOBEC3-induced G-to-A change in a trinucleotide context of the edited sites, which are distinguishable from the more random error-prone viral reverse transcriptase and RNA polymerase II replicating enzyme induced mutations[6]. Gene sequences from the Gag region of HIV-1 from Subject 1774, who continued to have measureable amounts of HIV-1 RNA in plasma on treatment, and Subjects 1727 and 1679 who were well-suppressed on treatment (a, b and c, respectively) before and after the guanosines within the particular APOBEC3 trinucleotide sequence context of edited sites were masked in the entire sequence alignment (left and right panels, respectively).
Phylogenies and Highlighter plots for the Pol region of HIV-1
Maximum-likelihood trees were constructed using gene sequences from the Pol region (retrotranscriptase [pol2]) of HIV-1 from lymph node and blood before and after the guanosines within all possible APOBEC3 trinucleotide sequence context of edited sites were masked in the alignments, regardless of their presence in hypermutant or non-hypermutant sequences, to avoid their distortion in the phylogenetic reconstructions. Branch tips are colored according to compartment sampled: red for plasma; gold for lymph node; and blue for blood. The progressive shading of the colors of the branch tips indicate the points in time sampled. The horizontal scale indicates the expected number of substitutions per nucleotide site per unit time with haplotypes from later time points having diverged more. The Highlighter plots show the haplotypes from the lymphoid tissue and blood time point clusters aligned to the plasma virus sequence from day 0. The particular nucleotide changes are color coded in the alignment (thymidine, red; adenosine, green; cytosine, blue; and guanosine, orange). Magenta circles represent APOBEC3-induced G-to-A change in a trinucleotide context of the edited sites. Gene sequences from the Pol region of HIV-1 that spanned the genomic region encoding the viral enzyme reverse transcriptase from Subjects 1774, 1727, and 1679 (a, b and c, respectively) before and after the guanosines within the particular APOBEC3 trinucleotide sequence context of edited sites were masked in the entire sequence alignment (left and right panels, respectively).
Alternative drug-dependent fitness landscape plots
a, Fitness landscape plot for a partially drug-resistant strain that confers relatively low-level resistance to drugs as compared with the fitness costs imposed by the drug-resistant mutations. The drug-resistant strain (blue line) does not outcompete the drug-sensitive strain (orange line) at any effective treatment concentration where it can grow. There are two phases to the dynamics. At lower effective drug concentrations (left of grey line) the drug-sensitive strain thrives. Beyond this threshold, neither strain can continuously replicate. b, Fitness landscape plot for a highly drug-resistant strain. This strain confers a high-level of drug resistance relative to the replicative fitness cost imposed by the resistance mutations. At low effective drug concentrations (left of grey line), the drug-sensitive strain outcompetes the drug-resistant strain. At high effective drug concentrations, the drug-resistant strain outcompetes the drug-sensitive strain and can continuously replicate. We argue that, typically, fully drug resistant mutants of this sort neither exist in the viral population of patients before treatment, nor arise through random mutation during the course of antiretroviral therapy (see Supplementary Information and Supplementary Table 2). Drug-resistant strains, which are capable of ongoing replication at high effective drug concentrations are not typically generated in individuals because: they are generated in a single step very rarely; and stepwise generation from partially resistant strains is also rare because partially resistant strains are outcompeted in the sanctuary site which constantly refills the pool. The strain specific effective reproductive numbers for the drug-sensitive R (orange) and drug-resistant R (blue) strains are shown. For simplicity, only the impact of changes to the effectiveness of a single drug in a single compartment is shown.
Model of replication dynamics and treatment effectiveness in the viral reservoir fitted to the data
The model is fitted to the total inferred average body counts of free virus particles (green line), infected CD4+ T cells (orange line) and virus bound to the follicular dendritic cell network of B cell follicles (grey line). a, Demonstrates the dynamics over the first 200 days of treatment. Note that early on during antiretroviral therapy, HIV-1 RNA in plasma declines more rapidly than virus bound to the follicular dendritic cell network of B cell follicles. Circles demonstrate average data from the 3 patients discussed in detail in this study and an additional 9 patients presented elsewhere[13]. Where the average value was indeterminate because of test sensitivity, the data are fitted below the upper limit of the average log10 infectious units. b, Demonstrates the dynamics over a longer period. The model predicts the persistent low-level viral RNA in plasma. The diamond symbol represents data relating to the long-term persistent virus, as measured using quantitative reverse transcription PCR (see Methods). The optimal model fit parameters are presented in Supplementary Table 1.
The genetic distance measured between the haplotypes in the Gag or Pol regions of HIV-1
We used sequence data from the Gag or Pol regions (protease [pol1] and retrotranscriptase [pol2]) regions of HIV-1 (range, 315bp to 487bp) to perform a genetic distance analysis. We masked the guanosines in APOBEC3 trinucleotide contexts of the edited sites from the alignments. We compared the proportion of substitutions per site (p-distance) for haplotypes within lymph node or blood from each subject at 6 months to the most common haplotype present at day 0. Most comparisons were statistically significant (Mann-Whitney U test, P-values < 0.1), indicating ongoing diversification of the viral populations.
HIV-1 evolutionary rate calculated after removing hypermutated haplotypes
We used a linear regression model to estimate the evolutionary rate for the Gag or Pol regions (protease [pol1] and retrotranscriptase [pol2]) of HIV-1. We calculated the slope (μ) of the linear regression between time and the direct pairwise genetic distances (number of substitutions per site per month) from the most common haplotype at day 0 for each subject. Haplotypes found to harbor G-to-A hypermutant sequences were removed from the analysis to limit the effect of inactivating mutations on the estimates. There is very strong evidence, now presented in numerous studies[17], that acute HIV-1 infections are by and large founded by a single (or at best a few) viral strains; this initial bottleneck is followed by rapid population diversification in the absence of treatment. The structures of our intra-host phylogenies at day 0 support this pattern. Even if multiple populations are transmitted and are able to establish infection and co-circulate, Bayesian phylodynamics models can properly account for it, and estimate evolutionary rates accurately.
Patterns of genetic divergence measured between viral haplotypes in lymphoid tissue and blood
S and F pairwise genetic differentiation (a and b, respectively) between peripheral blood and lymph samples for the Gag and Pol regions of HIV-1 from subjects 1774, 1727, and 1679 at day 0 and after 6 months of antiretroviral therapy.
Episodic selection estimated across sites along internal branches of the phylogeny
A branch-site evolutionary model provided a statistical framework to search for evidence of episodic selection on internal branches in the tree. The ω distribution is given for both internal and external branches.
Estimated migration rates between lymph nodes and blood using Bayesian inference under the structured coalescence
Migration rates (in fraction of emigrants per month) were estimated in the Gag region of HIV-1 for the three study subjects using Bayesian inference under the structured coalescence model, assuming a constant population size and a strict molecular clock. Migration rates from lymph node to blood and vice versa are shown with their standard error of the mean (SEM) after running at least 50 million MCMC steps and reaching high values of estimated sample sizes (ESS) for all parameters.
Authors: Courtney V Fletcher; Kathryn Staskus; Stephen W Wietgrefe; Meghan Rothenberger; Cavan Reilly; Jeffrey G Chipman; Greg J Beilman; Alexander Khoruts; Ann Thorkelson; Thomas E Schmidt; Jodi Anderson; Katherine Perkey; Mario Stevenson; Alan S Perelson; Daniel C Douek; Ashley T Haase; Timothy W Schacker Journal: Proc Natl Acad Sci U S A Date: 2014-01-27 Impact factor: 11.205
Authors: Steven A Yukl; Amandeep K Shergill; Terence Ho; Maudi Killian; Valerie Girling; Lorrie Epling; Peilin Li; Lisa K Wong; Pierre Crouch; Steven G Deeks; Diane V Havlir; Kenneth McQuaid; Elizabeth Sinclair; Joseph K Wong Journal: J Infect Dis Date: 2013-07-12 Impact factor: 5.226
Authors: F Maldarelli; X Wu; L Su; F R Simonetti; W Shao; S Hill; J Spindler; A L Ferris; J W Mellors; M F Kearney; J M Coffin; S H Hughes Journal: Science Date: 2014-06-26 Impact factor: 47.728
Authors: Thor A Wagner; Sherry McLaughlin; Kavita Garg; Charles Y K Cheung; Brendan B Larsen; Sheila Styrchak; Hannah C Huang; Paul T Edlefsen; James I Mullins; Lisa M Frenkel Journal: Science Date: 2014-07-10 Impact factor: 47.728
Authors: John Archer; Greg Baillie; Simon J Watson; Paul Kellam; Andrew Rambaut; David L Robertson Journal: BMC Bioinformatics Date: 2012-03-23 Impact factor: 3.169
Authors: Mary F Kearney; Jonathan Spindler; Wei Shao; Sloane Yu; Elizabeth M Anderson; Angeline O'Shea; Catherine Rehm; Carry Poethke; Nicholas Kovacs; John W Mellors; John M Coffin; Frank Maldarelli Journal: PLoS Pathog Date: 2014-03-20 Impact factor: 6.823
Authors: Eun-Young Kim; Ramon Lorenzo-Redondo; Susan J Little; Yoon-Seok Chung; Prabhjeet K Phalora; Irina Maljkovic Berry; John Archer; Sudhir Penugonda; Will Fischer; Douglas D Richman; Tanmoy Bhattacharya; Michael H Malim; Steven M Wolinsky Journal: PLoS Pathog Date: 2014-07-31 Impact factor: 6.823
Authors: Erin Burgunder; John K Fallon; Nicole White; Amanda P Schauer; Craig Sykes; Leila Remling-Mulder; Martina Kovarova; Lourdes Adamson; Paul Luciw; J Victor Garcia; Ramesh Akkina; Philip C Smith; Angela D M Kashuba Journal: J Pharmacol Exp Ther Date: 2019-06-24 Impact factor: 4.030
Authors: David Palesch; Steven E Bosinger; Maud Mavigner; James M Billingsley; Cameron Mattingly; Diane G Carnathan; Mirko Paiardini; Ann Chahroudi; Thomas H Vanderford; Guido Silvestri Journal: J Virol Date: 2018-06-29 Impact factor: 5.103
Authors: Sara Gianella; Jeff Taylor; Timothy R Brown; Andy Kaytes; Cristian L Achim; David J Moore; Susan J Little; Ronald J Ellis; Davey M Smith Journal: AIDS Date: 2017-01-02 Impact factor: 4.177
Authors: Salum J Lidenge; For Yue Tso; Owen Ngalamika; John R Ngowi; Yasaman Mortazavi; Eun Hee Kwon; Danielle M Shea; Veenu Minhas; Julius Mwaiselage; Charles Wood; John T West Journal: J Infect Dis Date: 2019-04-08 Impact factor: 5.226