Orlando DeLeon, Hagit Hodis, Yunxia O'Malley, Jacklyn Johnson, Hamid Salimi, Yinjie Zhai, Elizabeth Winter, Claire Remec, Noah Eichelberger, Brandon Van Cleave, Ramya Puliadi, Robert D Harrington, Jack T Stapleton, Hillel Haim.
Abstract
The envelope glycoproteins (Envs) of HIV-1 continuously evolve in the host by random mutations and recombination events. The resulting diversity of Env variants circulating in the population and their continuing diversification process limit the efficacy of AIDS vaccines. We examined the historic changes in Env sequence and structural features (measured by integrity of epitopes on the Env trimer) in a geographically defined population in the United States. As expected, many Env features were relatively conserved during the 1980s. From this state, some features diversified whereas others remained conserved across the years. We sought to identify "clues" to predict the observed historic diversification patterns. Comparison of viruses that cocirculate in patients at any given time revealed that each feature of Env (sequence or structural) exists at a defined level of variance. The in-host variance of each feature is highly conserved among individuals but can vary between different HIV-1 clades. We designate this property "volatility" and apply it to model evolution of features as a linear diffusion process that progresses with increasing genetic distance. Volatilities of different features are highly correlated with their divergence in longitudinally monitored patients. Volatilities of features also correlate highly with their population-level diversification. Using volatility indices measured from a small number of patient samples, we accurately predict the population diversity that developed for each feature over the course of 30 years. Amino acid variants that evolved at key antigenic sites are also predicted well. Therefore, small "fluctuations" in feature values measured in isolated patient samples accurately describe their potential for population-level diversification.
These tools will likely contribute to the design of population-targeted AIDS vaccines by effectively capturing the diversity of currently circulating strains and addressing properties of variants expected to appear in the future.
Year: 2017 PMID: 28384158 PMCID: PMC5383018 DOI: 10.1371/journal.pbio.2001549
Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029
Fig 1. Structural and sequence features of HIV-1 Env present different patterns of historic change in the population.
Historic changes in antigenic features (A) and length of the gp120 variable loops (B) in Envs isolated from samples collected in Iowa City (113 patients) and Seattle (14 patients). Each patient is represented by two isolates that reflect the range of feature values detected in the tested plasma sample. To examine changes in epitope integrity, we sectioned the three-decade time frame into 5–6 y periods. For each period, we quantified the percent of Envs that bind the probe inefficiently (marked by green circles), which is defined as less than 5% of probe binding to the control AD8 Env. To compare feature variance between Period1 (1985–1991) and Period3 (2005–2012), we applied Levene’s test. The p-value for the null hypothesis of equal variance is labeled Pvar and is highlighted in a color that describes its statistical significance (green, high; red, low). Changes in antigenicity features were tested using data from 27 patients from Period1 and 30 patients from Period3. Changes in variable loop lengths were tested using 32 and 31 patients from Period1 and Period3, respectively. All antigenicity and segmental features are shown in S3 and S4 Figs. Data underlying this figure can be found in S2 Data.
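The variance comparison between Period1 and Period3 can be illustrated with a minimal sketch of Levene's W statistic. The caption does not say which variant of the test was used, so mean centering (the classic form) is an assumption here, and the function name and sample values are hypothetical. Only the statistic is computed; a p-value such as Pvar would additionally require the F distribution (for example via `scipy.stats.levene` with `center='mean'`).

```python
from statistics import mean


def levene_w(*groups):
    """Levene's W statistic with mean centering (assumed variant).

    Each group is a sequence of feature values, e.g. one per period.
    Raises ZeroDivisionError if every group has zero spread.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every value from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Spread of the group-level mean deviations around the grand mean.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    # Spread of individual deviations around their group mean.
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical V1 loop lengths for two periods (not data from the study).
period1 = [27.0, 28.0, 27.0, 29.0, 28.0]
period3 = [22.0, 35.0, 27.0, 41.0, 19.0]
w_statistic = levene_w(period1, period3)
```

A larger W indicates a larger difference in spread between the groups; it is compared against an F distribution with (k − 1, N − k) degrees of freedom to obtain the p-value.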
Fig 2. Antigenic and segmental features of Env show conserved levels of in-host variance.
(A) Binding of the indicated probes was measured to Envs isolated from plasma samples of 60 HIV-infected individuals (2–8 Envs per sample). Values represent the variance in binding efficiency among Envs isolated from the same plasma sample, as calculated by the coefficient of variation (CoV). The CoVs are color-coded according to their values (darker shades of blue represent greater variance). (B) Mean CoVs of each feature for the 60 patients examined in panel A. Error bars represent the standard error of the mean (SEM). (C) The protein sequence of each Env was used to calculate the indicated features of the five variable loops of gp120, including amino acid length, mean hydropathy score, and the density of charge and potential N-linked glycosylation sites (PNGSs) (calculated as a fraction of the loop length). The CoV of features among Envs from the same plasma sample was calculated and averaged for all 60 patients. Data underlying this figure can be found in S6 Fig and S2 Data.
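The calculations in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the use of the sample (n − 1) standard deviation, the N-X-S/T sequon definition with X other than proline, and all function names are assumptions.

```python
import re
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CoV of a feature among Envs from one plasma sample (needs >= 2 Envs)."""
    return stdev(values) / mean(values)


def mean_and_sem(values):
    """Mean and standard error of the mean, e.g. of per-patient CoVs."""
    return mean(values), stdev(values) / sqrt(len(values))


def pngs_density(loop_seq):
    """Potential N-linked glycosylation sites as a fraction of loop length.

    A sequon is assumed to be N-X-S/T with X != P; the lookahead lets
    overlapping sequons be counted.
    """
    sequons = re.findall(r"(?=N[^P][ST])", loop_seq)
    return len(sequons) / len(loop_seq)


# Hypothetical binding values for Envs from three plasma samples.
samples = [[0.8, 1.0, 1.2], [0.5, 0.7], [1.1, 0.9, 1.0, 1.0]]
per_patient_cov = [coefficient_of_variation(s) for s in samples]
feature_mean, feature_sem = mean_and_sem(per_patient_cov)
```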
Fig 3. The volatility index is a conserved property of each feature.
(A) Schematic of the approach used to measure the volatility index of a feature in a given plasma sample. The squared pairwise phenotypic distance between each Env pair in the plasma sample is calculated and divided by the genetic distance (based on amino acid sequence) that separates them. The ratio is averaged for all Env pairs in the sample to generate the feature volatility index for that plasma sample. (B) Volatility indices of antigenicity features measured in 60 patients. Calculated values were first log10-transformed. For averaging of the indices, all values smaller than –6 were assigned a value of –6. (C) Correlation between the median volatility index of antigenic features measured in 60 patients from Iowa City and 43 samples (from 15 patients) collected in Seattle. The ideal correlation (y = x) is shown by a blue line. (D) Mean volatility indices measured for segmental features using 60 samples from Iowa City. Amino acid positions of segments are numbered according to the HXBc2 convention [88]. Volatilities of all gp120 and gp41 segments are shown in S8A Fig. (E, F) Correlations between the mean volatility indices of segmental features measured using the above Env panels and a panel of Envs isolated from plasma samples of 20 patients collected for the MOTIVATE trial. Two-tailed p-values for the Spearman correlation test of each feature type are indicated (*, p ≤ 0.01; **, p ≤ 0.001). (G) Effect of Env sample size on differences between hosts in measured volatilities. We calculated the hydropathy volatility of the 23 segments of Env in Iowa City samples containing two, three, or more than three Envs and in MOTIVATE trial samples (average of nine Envs tested per sample). Each dot represents the standard deviation among patient volatilities for a given feature. Groups are compared using Wilcoxon signed-rank test. (H) The cumulative mean volatility of V1 loop hydropathy is shown for the above groups. 
For each group, ten random paths of calculation are shown, which represent different orders of cumulative averaging of volatility values. Error bars represent the SEM. Spearman rank correlation coefficient, rS; p-value, two-tailed test; ns, not statistically significant. Data underlying this figure can be found in S2 Data.
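The volatility index described in panel A can be sketched as below. The caption specifies the squared phenotypic distance divided by the amino acid-based genetic distance, averaged over all Env pairs, followed by a log10 transform floored at –6. The specific genetic distance used here (fraction of differing positions in aligned sequences), the skipping of identical-sequence pairs, and all names are assumptions made for illustration.

```python
from itertools import combinations
from math import log10


def p_distance(seq_a, seq_b):
    """Fraction of differing positions between two aligned sequences
    (an assumed stand-in for the genetic distance used in the study)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def volatility_index(envs):
    """envs: list of (feature_value, aligned_sequence) from one plasma sample."""
    ratios = []
    for (val_a, seq_a), (val_b, seq_b) in combinations(envs, 2):
        distance = p_distance(seq_a, seq_b)
        if distance == 0:
            continue  # assumption: identical sequences carry no information
        ratios.append((val_a - val_b) ** 2 / distance)
    if not ratios:
        raise ValueError("no Env pair with non-zero genetic distance")
    return sum(ratios) / len(ratios)


def log10_floored(volatility, floor=-6.0):
    """log10 transform with values below the floor set to the floor."""
    return floor if volatility <= 0 else max(log10(volatility), floor)


# Hypothetical sample: three Envs with a feature value and a toy alignment.
sample = [(1.0, "AAAA"), (2.0, "AAAT"), (3.0, "AATT")]
index = log10_floored(volatility_index(sample))
```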
Fig 4. Relationship between the volatility index and longitudinal divergence of Env features.
(A) Approach used to measure diversification of phenotypes between mixed states. Phenotypic and genetic distances were measured between each reference Env from the first plasma sample and all other Envs. (B) Longitudinal divergence of segmental features measured for loops V1, V3, and V5 in 18 patients monitored for up to 11 y. Data represent the phenotypic pairwise distances between each reference isolate and all other Envs from that patient and are divided by the value of the reference isolate. All pairwise distances measured for all patients are shown. To monitor the progression of variance and allow equal representation for all patients, we divided the x-axis into sections of 0.01 genetic distance units (see vertical lines). For each section, all phenotypic pairwise distances from the same patient were averaged. Variance among patient averages for the same section was then calculated (labeled by red squares). Because of the small number of isolates in sections that describe larger genetic distances, calculations were performed only for sections of 0.01 to 0.09 distance units. Data describing divergence of all loops are shown in S9A Fig. (C) Correlation between the measured variance of features in each section and the predicted variance (calculated as the product of the volatility index and the genetic distance units of the section). (D) Longitudinal divergence of hydropathy score of the indicated V3 loop residues, as quantified by the sectional variance value. The hydropathy volatility at each position for panels of Envs from clades B and C is shown. (E) Correlation between predicted and measured hydropathy score for V3 loop crown residues at genetic distances of 0.01 to 0.09 units. Data underlying this figure can be found in S2 Data.
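The sectional variance of panel B and the predicted variance of panel C can be sketched as follows. The caption does not state which genetic distance represents a section or which variance estimator was used, so taking the upper edge of each 0.01-unit section and the sample variance are assumptions, as are all names.

```python
from collections import defaultdict
from math import ceil
from statistics import mean, variance


def sectional_variance(records, width=0.01):
    """records: iterable of (patient_id, genetic_distance, phenotypic_distance).

    Section k covers genetic distances in ((k - 1) * width, k * width].
    Returns {section: variance among per-patient averages}.
    """
    by_section = defaultdict(lambda: defaultdict(list))
    for patient, genetic, phenotypic in records:
        # Small epsilon guards against float noise at section edges.
        section = ceil(genetic / width - 1e-9)
        by_section[section][patient].append(phenotypic)
    result = {}
    for section, per_patient in by_section.items():
        # Average within each patient first so every patient counts equally.
        averages = [mean(values) for values in per_patient.values()]
        if len(averages) >= 2:  # variance needs at least two patients
            result[section] = variance(averages)
    return result


def predicted_variance(volatility, section, width=0.01):
    """Diffusion-model prediction: volatility times genetic distance
    (assumed here to be the upper edge of the section)."""
    return volatility * section * width


# Hypothetical records for two patients.
records = [("p1", 0.015, 1.0), ("p1", 0.018, 3.0), ("p2", 0.012, 4.0),
           ("p1", 0.035, 1.0)]
measured = sectional_variance(records)
```

Restricting the comparison to sections 1 through 9 reproduces the 0.01 to 0.09 range used in the caption.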
Fig 5. Longitudinal divergence of Env features and associated asymmetry of increments.
(A) Evolution of variance in 18 longitudinally monitored patients. Divergence of all 11 features is shown in S10 Fig. (B) Correlation between predicted and measured antigenic variance that developed at each genetic distance section (in the range of 0.01 to 0.09 units). (C) Changes in length of the V5 loop in longitudinally monitored patients with increasing genetic distance from the reference isolate. Evolution was examined separately for patients in which the V5 loop of the reference Env(s) was short (9 amino acids, red), intermediate (11 amino acids, yellow), or long (15 amino acids, blue). A least-squares regression line was fit to each dataset, which describes the mean change in feature value per genetic distance unit; the slope of the line (μ) is indicated. (D) Changes in binding efficiency of mAb PG16 in longitudinally monitored patients from different reference states. Data are colored according to the value of their reference state. The vertical colored bars by the y-axis represent the range of values of the reference isolates. Data underlying this figure can be found in S2 Data.
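The slope μ in panel C is an ordinary least-squares slope of feature value against genetic distance, fit separately for each reference state. A minimal sketch, with hypothetical names and data:

```python
from statistics import mean


def least_squares_slope(x, y):
    """Slope of the least-squares line of y on x
    (mean change in feature value per genetic distance unit)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical (genetic distance, V5 length) points grouped by the
# length class of the reference Env.
groups = {
    "short": ([0.00, 0.01, 0.02], [9.0, 10.0, 11.0]),
    "long": ([0.00, 0.01, 0.02], [15.0, 14.0, 13.0]),
}
mu = {name: least_squares_slope(x, y) for name, (x, y) in groups.items()}
```

Slopes of opposite sign for short and long reference states, as in this toy data, would correspond to the asymmetry of increments named in the figure title.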
Fig 6. Relationship between in-host volatility and population-level diversity of Env features.
(A) Volatilities measured in samples collected in Iowa City during Period1 or Period3 are compared with the diversity of each feature (calculated by the standard deviation of the feature value) in Iowa City during Period3 or Period1, respectively. (B) Comparison between in-host volatility and diversification of gp120 features between Period1 and Period3 in Iowa City. Volatility was calculated using the 20 samples of the MOTIVATE trial. SP, signal peptide; V, variable loop; C, constant region. Comparison between volatility and diversity of features in Iowa City during Period3 is shown in S12 Fig. Data underlying this figure can be found in S2 Data.
Fig 7. Prediction of population-level changes in sequence of the V3 loop crown and membrane-proximal ectodomain region (MPER).
(A) Hydropathy volatility of residues 305–317 was calculated using the 20 patient samples of the MOTIVATE trial. Values are compared with diversity of the hydropathy score of each position in Iowa City during Period3. (B) To predict the amino acid variants that appeared in the population, we measured for each position the volatility of charge, molecular weight, and hydropathy. The likelihood of changes from the consensus sequence in Iowa City during Period1 (also the clade B ancestor) to each possible amino acid variant was then calculated using a joint probability density function, which combines the likelihoods of the transition for the three feature types (Eqs 5 and 6). A schematic describing the approach is provided in S13 Fig. Sequence logos represent the ancestral strain, the predicted variants (calculated using the joint probability density function), and the amino acid variants at each position that circulated in the Iowa City population during Period3. Calculations were solely based on the volatility of each feature; substitution likelihoods of amino acids were not taken into consideration. (C) Relationship between volatility and diversity of the hydropathy score of MPER residues in Iowa City during Period3. (D) Prediction of the MPER amino acid variants that evolved in the Iowa City population using the joint probability density function. Data underlying this figure can be found in S2 Data.
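Eqs 5 and 6 are not reproduced in this excerpt, so the following is only a sketch of how a joint likelihood over three feature types might be combined. It assumes that the change in each feature follows a zero-mean Gaussian whose variance is the volatility times the genetic distance (consistent with the linear diffusion model in the abstract), and that the three features are independent so their densities multiply. The property values (Kyte-Doolittle hydropathy, approximate free amino acid molecular weights, side-chain charge), the volatilities, and all names are illustrative assumptions.

```python
from math import exp, pi, sqrt


def normal_pdf(x, var):
    """Density of a zero-mean Gaussian with variance var at x."""
    return exp(-x * x / (2 * var)) / sqrt(2 * pi * var)


def variant_likelihoods(ancestor, candidates, properties, volatilities, distance):
    """Relative likelihood of each candidate amino acid at one position.

    properties: {feature: {amino_acid: value}}
    volatilities: {feature: volatility at this position}
    """
    scores = {}
    for aa in candidates:
        joint = 1.0
        for feature, volatility in volatilities.items():
            delta = properties[feature][aa] - properties[feature][ancestor]
            # Assumed diffusion model: variance grows linearly with distance.
            joint *= normal_pdf(delta, volatility * distance)
        scores[aa] = joint
    total = sum(scores.values())
    return {aa: score / total for aa, score in scores.items()}


# Assumed property tables for four residues (illustration only).
PROPERTIES = {
    "hydropathy": {"R": -4.5, "K": -3.9, "Q": -3.5, "S": -0.8},
    "weight": {"R": 174.20, "K": 146.19, "Q": 146.15, "S": 105.09},
    "charge": {"R": 1.0, "K": 1.0, "Q": 0.0, "S": 0.0},
}
# Hypothetical volatilities for one position.
VOLATILITIES = {"hydropathy": 10.0, "weight": 10000.0, "charge": 1.0}

likelihoods = variant_likelihoods("R", ["K", "Q", "S"], PROPERTIES,
                                  VOLATILITIES, distance=0.05)
```

With these toy inputs, the charge-conserving substitution to K receives the highest relative likelihood, which mirrors the caption's point that predictions rest on feature volatility alone rather than on amino acid substitution likelihoods.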
Fig 8. Spread of HIV-1 Env features from patient to population.
Evolution of viruses circulating in the host is controlled by immune and fitness pressures. The collective effects of such pressures determine the permissiveness for variance of each feature at any given time point (i.e., volatility). The propensity for longitudinal divergence of features is closely related to volatility; small degrees of variance can be amplified over time to increase the range of values. During transmission of the virus between hosts, some features are subjected to selective pressures specific to the transmission process (bottlenecks), which limit potential diversity in the recipients. Occurrence of these processes across multiple patients and transmission events defines the range of feature values in the population. Thus, measured across three decades, the in-host volatility and transmission bottlenecks dictate distribution of each feature in the population.