Damien C Tully1, Colin B Ogilvie1, Rebecca E Batorsky1, David J Bean1, Karen A Power1, Musie Ghebremichael1, Hunter E Bedard1, Adrianne D Gladden1, Aaron M Seese1, Molly A Amero1, Kimberly Lane1, Graham McGrath2, Suzane B Bazner2, Jake Tinsley3, Niall J Lennon4, Matthew R Henn4, Zabrina L Brumme5, Philip J Norris6, Eric S Rosenberg2, Kenneth H Mayer3, Heiko Jessen7, Sergei L Kosakovsky Pond8, Bruce D Walker1,9, Marcus Altfeld1,10, Jonathan M Carlson11, Todd M Allen1.
Abstract
Due to the stringent population bottleneck that occurs during sexual HIV-1 transmission, systemic infection is typically established by a limited number of founder viruses. Elucidation of the precise forces influencing the selection of founder viruses may reveal key vulnerabilities that could aid in the development of a vaccine or other clinical interventions. Here, we utilize deep sequencing data and apply a genetic distance-based method to investigate whether the mode of sexual transmission shapes the nascent founder viral genome. Analysis of 74 acute and early HIV-1 infected subjects revealed that 83% of men who have sex with men (MSM) exhibit a single founder virus, levels similar to those previously observed in heterosexual (HSX) transmission. In a metadata analysis of a total of 354 subjects, including HSX, MSM and injecting drug users (IDU), we also observed no significant differences in the frequency of single founder virus infections between HSX and MSM transmissions. However, comparison of HIV-1 envelope sequences revealed that HSX founder viruses exhibited a greater number of codon sites under positive selection, as well as stronger transmission indices possibly reflective of higher fitness variants. Moreover, specific genetic "signatures" within MSM and HSX founder viruses were identified, with single polymorphisms within gp41 enriched among HSX viruses while more complex patterns, including clustered polymorphisms surrounding the CD4 binding site, were enriched in MSM viruses. While our findings do not support an influence of the mode of sexual transmission on the number of founder viruses, they do demonstrate that there are marked differences in the selection bottleneck that can significantly shape their genetic composition. This study illustrates the complex dynamics of the transmission bottleneck and reveals that distinct genetic bottleneck processes exist dependent upon the mode of HIV-1 transmission.
Year: 2016 PMID: 27163788 PMCID: PMC4862634 DOI: 10.1371/journal.ppat.1005619
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Fig 1. Mean average pairwise Hamming distance (APHD) of HIV-1 Env SGA/S sequences distinguishes between single and multiple founder viruses.
(A) A training set of SGA/S Env sequences derived from 127 previously published acute HIV-1 infected subjects illustrating a wide range of env diversity. The APHD is calculated using a sliding window of 120 bp with a step size of 21 bp. The mean APHD is plotted according to Fiebig stages as defined by HIV-1 clinical laboratory test results. (B) A classifier based on logistic regression segregated the 127 subjects into single or multiple infections and correctly assigned 97% of subjects to their respective groups. Each point corresponds to an individual subject, with the number of subjects denoted on the x-axis in parentheses under each Fiebig stage.
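The sliding-window calculation described in panel (A) can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length, counts raw mismatches per window (the caption does not state how the distance is normalized or how gaps are handled), and averages the per-window values to obtain a mean APHD. The function names `hamming`, `window_aphd` and `mean_aphd` are hypothetical and are not taken from the study's pipeline.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def window_aphd(seqs, start, width):
    """Average pairwise Hamming distance within one alignment window."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    total = sum(hamming(a[start:start + width], b[start:start + width])
                for a, b in pairs)
    return total / len(pairs)


def mean_aphd(seqs, width=120, step=21):
    """Mean of the per-window APHD values (120 bp window, 21 bp step).

    Assumption: raw mismatch counts are used and windows that would run
    past the end of the alignment are not evaluated.
    """
    length = len(seqs[0])
    starts = range(0, max(length - width, 0) + 1, step)
    values = [window_aphd(seqs, s, width) for s in starts]
    return sum(values) / len(values)


# Toy alignment: two 141 bp sequences differing only at the first base.
toy = ["C" + "A" * 140, "A" * 141]
toy_mean = mean_aphd(toy)
```

In this toy case the alignment yields two windows (starting at positions 0 and 21), and only the first contains the mismatch.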
Fig 2. Subject 571373 exhibits low env diversity by both 454 and SGA/S reflective of a single founder virus.
454 and SGA/S analysis of 3′ half sequences from subject 571373 (Fiebig stage II/III). (A) Heat map illustrating a small number of sites exhibiting low-level amino acid sequence diversity across the 3′ half of the HIV-1 genome as detected by 454 deep sequencing. Plotted is the percentage of amino acid diversity at each position with the first amino acid of Vif located in the top left corner of the grid and last amino acid of Nef located in the bottom right corner. Completely conserved residues are black and low-level variant residues (<10%) are dark blue. (B) The average pairwise Hamming distance calculated from 454 sequencing reads for the 3′ half of the genome is plotted with the APHD of 0.063 (red line) and standard deviation (dotted black line) shown. The plot shows a relatively uniform population with random sites throughout the genome exhibiting low-level diversity. (C) SGA/Ss from subject 571373 covering the 3′ half of the HIV-1 genome display limited structure on a neighbor-joining (NJ) phylogenetic tree (left) and few nucleotide changes from the intrasubject consensus. The highlighter plots (right) compare sequences for each subject’s sequence set to an intrasubject consensus (uppermost sequence) and illustrate the pattern of nucleotide base mutations within sequences using short color-coded bars. (D) Hamming distance analysis of SGA/Ss showing infection by a single virus with Hamming distance frequencies conforming precisely to model predictions of a single virus infection (red line). As further support for a single founder virus, the estimated time to a single most recent common ancestor (MRCA) of 36 days (23–49 days) overlapped with the estimated clinical duration of infection based on Fiebig stages (18–37 days).
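The comparison in panel (D) between observed Hamming distance frequencies and a single-founder model can be sketched as follows. The caption does not specify the model; this sketch assumes a Poisson distribution of pairwise distances with the same mean as the observed data, which is one common choice in the acute-infection literature and may differ from what the authors used. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pairwise_distances(seqs):
    """Hamming distances for every pair of aligned, equal-length sequences."""
    return [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]


def poisson_pmf(k, lam):
    """Probability of observing k events under a Poisson distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def observed_vs_poisson(seqs):
    """Map each distance k to (observed frequency, Poisson expectation).

    Assumption: the Poisson rate is set to the mean observed distance.
    """
    dists = pairwise_distances(seqs)
    lam = sum(dists) / len(dists)
    counts = Counter(dists)
    return {k: (counts[k] / len(dists), poisson_pmf(k, lam))
            for k in range(max(dists) + 1)}


# Toy data: three short sequences, one carrying a single substitution.
comparison = observed_vs_poisson(["AAAA", "AAAT", "AAAA"])
```

A large gap between the two values at many distances would argue against a single founder under this assumed model; a formal goodness-of-fit test is left out of the sketch.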
Fig 3. Subject 654207 exhibits high env diversity by both 454 and SGA/S reflective of multiple founder viruses.
(A) Heat maps illustrating a number of sites exhibiting amino acid sequence diversity across the 3′ half of the genome as detected by 454 deep sequencing. Plotted is the percentage of amino acid diversity at each position with the first amino acid of Vif located in the top left corner of the grid and last amino acid of Nef located in the bottom right corner. Completely conserved residues are black, low-level variant residues (<10%) are dark blue, moderately variable residues (20%) are sky blue and highly variant residues (>40%) are green. (B) The average pairwise Hamming distance calculated from 454 sequencing reads for the 3′ half of the genome is plotted with the APHD of 0.752 (red line) and standard deviation (dotted black line) shown. The plot shows a variable population with a high number of sites throughout the genome exhibiting high-level diversity. (C) SGA/Ss from subject 654207 covering the 3′ half of the HIV-1 genome display a phylogeny (left) revealing productive infection by at least four viruses with inter-lineage recombination. Founder virus lineages are color-coded and labeled variant 1–4. Recombinant sequences are shown by green symbols. The highlighter plots (right) compare sequences for each subject’s sequence set to an intrasubject consensus (uppermost sequence) and illustrate the pattern of nucleotide base mutations within sequences using short color-coded bars. (D) Hamming distance analysis of SGA/Ss showing infection by multiple viruses with Hamming distance frequencies (mean Hamming distance of 35.38) not conforming to model predictions of a single virus infection. Splitting the variants into their respective sub-lineages, such as variants 1 and 4, yields Hamming distance frequencies that do conform to model predictions of a single virus infection (red line). Subject 654207 is viral RNA positive but Western blot negative (stage II/III of infection).
Fig 4. Complexity of acute HIV-1 infection revealed using deep sequencing data and the APHD approach.
(A) Mean APHD of 74 newly deep sequenced acute HIV-1 infected subjects, illustrating a wide range of env diversity plotted according to Fiebig stages. Black circles depict the 6 samples in which SGA/S was also performed. (B) Classification of the 74 subjects into single vs. multiple founder viruses resulted in 63 subjects exhibiting a more homogeneous infection suggestive of productive clinical infection originating from a single virus, and 11 subjects exhibiting distinctly higher diversity indicative of heterogeneous infection by multiple founder viruses. Each point corresponds to an individual subject, with the number of subjects denoted on the x-axis in parentheses under each Fiebig stage.
Multiplicity of HIV-1 infection in HSX, MSM and IDU subjects.
| Route of transmission | Study | Virus subtype | Total no. of subjects | Single variant transmission (n) | Single variant transmission (%) | Multiple variant transmission (n) | Multiple variant transmission (%) |
|---|---|---|---|---|---|---|---|
| HSX | Tully | B | 8 | 7 | 87.5% | 1 | 12.5% |
| HSX | Abrahams [ | C | 69 | 54 | 78.3% | 15 | 21.7% |
| HSX | Haaland [ | A or C | 27 | 22 | 81.5% | 5 | 18.5% |
| HSX | Keele [ | B | 79 | 65 | 82.3% | 14 | 17.7% |
| HSX | Gnanakaran [ | B | 7 | 4 | 57.1% | 3 | 42.9% |
| MSM | Tully | B | 64 | 53 | 82.8% | 11 | 17.2% |
| MSM | Keele [ | B | 22 | 13 | 59.1% | 9 | 40.9% |
| MSM | Li [ | B | 28 | 18 | 64.3% | 10 | 35.7% |
| MSM | Herbeck [ | B | 9 | 8 | 88.9% | 1 | 11.1% |
| MSM | Gnanakaran [ | B | 9 | 7 | 77.8% | 2 | 22.2% |
| IDU | Tully | B | 1 | 1 | 100.0% | 0 | 0.0% |
| IDU | Keele [ | B | 1 | 0 | 0.0% | 1 | 100.0% |
| IDU | Bar [ | B | 10 | 4 | 40.0% | 6 | 60.0% |
| IDU | Masharsky [ | A | 13 | 9 | 69.2% | 4 | 30.8% |
| IDU | Dukhovlinova [ | A | 7 | 7 | 100.0% | 0 | 0.0% |
a Chi-Square test was used to examine the association between mode of transmission and infection outcome (p value = 0.167)
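The chi-square test in the footnote can be illustrated with a short, dependency-free Python sketch. It assumes the test was run on a 3 × 2 contingency table obtained by pooling the per-study counts within each route of transmission; the footnote does not state this, so the pooling is an assumption. The pooled counts sum to the 354 subjects mentioned in the abstract, and under this assumption the computed p value is close to the reported 0.167. The helper names are hypothetical.

```python
import math

# Per-study (single, multiple) founder counts from the table, by route.
counts = {
    "HSX": [(7, 1), (54, 15), (22, 5), (65, 14), (4, 3)],
    "MSM": [(53, 11), (13, 9), (18, 10), (8, 1), (7, 2)],
    "IDU": [(1, 0), (0, 1), (4, 6), (9, 4), (7, 0)],
}


def pool(studies):
    """Sum single and multiple founder counts across studies (assumed pooling)."""
    return [sum(s for s, _ in studies), sum(m for _, m in studies)]


def chi_square_statistic(table):
    """Pearson chi-square statistic for a rows x columns table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


table = [pool(counts[route]) for route in ("HSX", "MSM", "IDU")]
total_subjects = sum(sum(row) for row in table)
statistic = chi_square_statistic(table)

# A 3 x 2 table has (3 - 1) * (2 - 1) = 2 degrees of freedom, for which the
# chi-square survival function is exactly exp(-x / 2).
p_value = math.exp(-statistic / 2)
```

The closed-form p value only holds for 2 degrees of freedom; a table with different dimensions would need a general chi-square survival function from a statistics library.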
Fig 5. Selection bias intensified for HSX founder viruses compared to MSM founder viruses.
The transmission index of a sequence was calculated using logistic regression with model weights taken from [12]. Black lines represent the median transmission index for the two risk groups. The overall transmission index of HSX founder viruses (red circles) is significantly higher than that of MSM founder viruses (blue circles) (P = 0.00003, Mann-Whitney two-tailed test). The number of subjects in each category is denoted under each group.
Previously described signature sites enriched in HSX Founder viruses.
| Predictor Variable | Target Position (HXB2 #) | Target Amino Acid | Target Consensus | Direction | p value | q value | Mutation Direction | Domain |
|---|---|---|---|---|---|---|---|---|
| HSX | R192 | R | R | Adapted | 0.0039 | 0.142 | !R -> R | V2 Base |
| HSX | N362 | N | N | Adapted | 0.0017 | 0.1199 | !N -> N | C3; PNGS |
| HSX | N362 | K | N | Non-Adapted | 0.00106 | 0.2312 | K ->! K | C3; PNGS |
| HSX | R633 | K | R | Non-Adapted | 0.0128 | 0.2312 | K ->! K | gp41 |
Using a phylogenetically corrected logistic regression model, we tested whether any previously published signature sites [34] were associated with the HSX mode of transmission. All associations were corrected for multiple comparisons, and only sites with q-values <0.3 are displayed.
a Predictor Variable: The risk behavior tested. In this case HSX mode of transmission.
b Target Position: The position and identity of the target variable in HXB2 numbering with cohort consensus amino acid listed.
c Target Amino Acid: The position and identity of the target amino acid.
d Direction: Adapted: amino acid is positively correlated with respect to HSX mode of transmission, Non-Adapted: amino acid is present only when risk group is absent
Signature sites identified between MSM and HSX Founder viruses in Env using a phylogenetically corrected method.
| Predictor Variable | Target Position (HXB2 #) | Target Amino Acid | Target Consensus | Direction | p value | q value | Association Conditions | Mutation Direction | Domain |
|---|---|---|---|---|---|---|---|---|---|
| MSM | I165 | I | I | Non-Adapted | 0.00135 | 0.36973 | | I ->! I | V2 Loop |
| MSM | T283 | I | T | Non-Adapted | 0.00082 | 0.13190 | E275@K | T ->! I | C2 region; CD4 contact |
| MSM | K343 | E | K | Adapted | 0.00085 | 0.13190 | N135@E, L515@L | !E -> E | C3 region |
| MSM | N362 | N | N | Non-Adapted | 0.00099 | 0.36973 | | N ->! N | C3 region |
| MSM | Q389 | P | Q | Non-Adapted | 0.00003 | 0.04125 | P417@P | Q ->! P | V4 loop (Beside Lectin DC-SIGN BS) |
| HSX | | | | Adapted | 0.00006 | 0.04150 | | !P -> P | |
| MSM | E429 | Q | E | Non-Adapted | 0.00026 | 0.07752 | T189@N, K304@R, E347@G | E ->! Q | C4 Region; CD4 contact |
| MSM | T465 | N | T | Adapted | 0.00365 | 0.33233 | E293@Q, N404@S, T467@T, H721@H, A792@I | !N -> N | V5 Loop |
| MSM | G471 | A | G | Adapted | 0.00048 | 0.36973 | | !A -> A | C5 region; CD4 |
| MSM | M518 | M | M | Non-Adapted | 0.00156 | 0.36973 | | M ->! M | gp41; fusion peptide |
| HSX | K617 | K | K | Non-Adapted | 0.00215 | 0.31619 | F317@L, M426@R, R444@R | K ->! K | gp41 fusion domain |
| MSM | P724 | P | P | Non-Adapted | 0.00013 | 0.07103 | N136@N, T139@T, N463@E | P ->! P | cytoplasmic tail; Kennedy epitope |
| HSX | | | | Adapted | 0.00007 | 0.04150 | | !P -> P | |
| MSM | E735 | E | E | Adapted | 0.00199 | 0.19669 | | !E -> E | cytoplasmic tail; Kennedy epitope |
| MSM | F752 | F | F | Adapted | 0.00100 | 0.13190 | S209@S | !F -> F | cytoplasmic tail |
| MSM | R770 | R | R | Adapted | 0.00018 | 0.07103 | T139@T, L856@Q | !R -> R | cytoplasmic tail; LLP-2 amphipathic helix |
| MSM | A823 | G | A | Adapted | 0.00128 | 0.13803 | N396@E, Q834@Q | !G -> G | cytoplasmic tail |
| HSX | | | | Non-Adapted | 0.00111 | 0.18648 | | G ->! G | |
| MSM | A823 | A | A | Non-Adapted | 0.00128 | 0.13803 | N396@E, Q834@Q | A ->! A | cytoplasmic tail |
| HSX | | | | Adapted | 0.00111 | 0.18648 | | !A -> A | |
| HSX | V832 | I | V | Adapted | 0.00082 | 0.18648 | D412@N, T676@S | !I -> I | cytoplasmic tail; LLP-1 amphipathic helix |
| MSM | H842 | N | H | Adapted | 0.00040 | 0.09374 | P724@Q, E275@K, N461@D | !N -> N | cytoplasmic tail; LLP-1 amphipathic helix |
| MSM | R845 | T | R | Non-Adapted | 0.00098 | 0.13190 | I6@I, E87@A, N141@A, T818@T | R ->! T | cytoplasmic tail; LLP-1 amphipathic helix |
| HSX | | | | Adapted | 0.001017 | 0.186479 | I6@I, N141@A | !T -> R | |
| HSX | A854 | A | A | Adapted | 0.0005 | 0.186479 | T413@N, D621@D | !A -> A | cytoplasmic tail; LLP-1 amphipathic helix |
a Predictor Variable: The risk behavior MSM or HSX.
b Target Position: The position and identity of the target variable in HXB2 numbering with cohort consensus amino acid listed.
c Target Amino Acid: The position and identity of the target variable.
d Direction: Adapted: amino acid is positively correlated with the risk group, Non-Adapted: amino acid is present only when risk group is absent.
e Association Conditions: The conditions under which the p-value was calculated. Models are built using forward selection, so these predictor variables were added to the model prior to the addition of the site. Sites are labeled with HXB2 numbering and residues correspond to the cohort consensus.
Fig 6. Mapping of signature sites on the three-dimensional structure of gp120 shows clustering around the CD4-binding site.
A ribbon representation of the crystal structure from the JRFL gp120 molecule (grey) bound to CD4 molecule (green) (PDBID: 2B4C). The CD4 binding site is highlighted in transparent green while signature sites 283, 343, 362, 389, 429, 465 and 471 are all depicted as red space-filling residues.