Natasha Wood, Tanmoy Bhattacharya, Brandon F Keele, Elena Giorgi, Michael Liu, Brian Gaschen, Marcus Daniels, Guido Ferrari, Barton F Haynes, Andrew McMichael, George M Shaw, Beatrice H Hahn, Bette Korber, Cathal Seoighe.
Abstract
The pattern of viral diversification in newly infected individuals provides information about the host environment and immune responses typically experienced by the newly transmitted virus. For example, sites that tend to evolve rapidly across multiple early-infection patients could be involved in enabling escape from common early immune responses, could represent adaptation for rapid growth in a newly infected host, or could represent reversion from less fit forms of the virus that were selected for immune escape in previous hosts. Here we investigated the diversification of HIV-1 env coding sequences in 81 very early B subtype infections previously shown to have resulted from transmission or expansion of single viruses (n = 78) or two closely related viruses (n = 3). In these cases, the sequence of the infecting virus can be estimated accurately, enabling inference of both the direction of substitutions as well as distinction between insertion and deletion events. By integrating information across multiple acutely infected hosts, we find evidence of adaptive evolution of HIV-1 env and identify a subset of codon sites that diversified more rapidly than can be explained by a model of neutral evolution. Of 24 such rapidly diversifying sites, 14 were either i) clustered and embedded in CTL epitopes that were verified experimentally or predicted based on the individual's HLA or ii) in a nucleotide context indicative of APOBEC-mediated G-to-A substitutions, despite having excluded heavily hypermutated sequences prior to the analysis. In several cases, a rapidly evolving site was embedded both in an APOBEC motif and in a CTL epitope, suggesting that APOBEC may facilitate early immune escape. Ten rapidly diversifying sites could not be explained by CTL escape or APOBEC hypermutation, including the most frequently mutated site, in the fusion peptide of gp41. 
We also examined the distribution, extent, and sequence context of insertions and deletions, and we provide evidence that the length variation seen in hypervariable loop regions of the envelope glycoprotein is a consequence of selection and not of mutational hotspots. Our results provide a detailed view of the process of diversification of HIV-1 following transmission, highlighting the role of CTL escape and hypermutation in shaping viral evolution during the establishment of new infections.
Year: 2009 PMID: 19424423 PMCID: PMC2671846 DOI: 10.1371/journal.ppat.1000414
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Parameter estimates for the neutral and selection model applied to env sequences from different Fiebig stage datasets.
| Fiebig Stage | dN/dS: Neutral Model (M1a) | dN/dS: Selection Model (M2a) | 2 * Delta Log Likelihood | P-value (M1a vs M2a) |
| I–II | 0.6912 | 0.6995 | 1.2833 | 0.5264 |
| I–III | 0.6687 | 0.7016 | 10.0741 | 0.0065 |
| I–V | 0.6307 | 0.7130 | 18.3186 | 0.0001 |
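The P-values in this table can be reproduced from the test statistics with a likelihood ratio test. The sketch below assumes the statistic is compared against a chi-square distribution with 2 degrees of freedom (M2a adds two parameters to M1a); the degrees of freedom are not stated in this section, so treat that as an assumption. For 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), so no statistics library is needed. The function name and dictionary are illustrative only.

```python
import math


def lrt_p_value_df2(two_delta_lnl: float) -> float:
    """P(X > x) for a chi-square variable with 2 degrees of freedom.

    Assumption: the M1a vs M2a comparison uses df = 2. For df = 2 the
    survival function is exactly exp(-x / 2).
    """
    if two_delta_lnl < 0:
        raise ValueError("2 * delta log likelihood must be non-negative")
    return math.exp(-two_delta_lnl / 2.0)


# Test statistics copied from the table above (Fiebig stage -> 2 * delta lnL).
statistics = {"I-II": 1.2833, "I-III": 10.0741, "I-V": 18.3186}

# P-values rounded to four decimals, as in the table.
p_values = {stage: round(lrt_p_value_df2(x), 4) for stage, x in statistics.items()}

for stage, p in p_values.items():
    print(f"Fiebig {stage}: p = {p:.4f}")
```

Under this assumption the computed values agree with the tabulated P-values to four decimal places, which suggests the selection model fits significantly better only once later Fiebig stages are included.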
Positive selection results obtained using HyPhy from a dataset excluding individuals with sequences enriched for APOBEC hypermutation, as well as from the complete dataset including the hypermutated sequences (the latter sites are indicated with +).
| HXB2 | Data | Location | Posterior Probability | p-value/q-value | Number of subjects with variation | APOBEC3 in subject's mutations | Timing (Fiebig Stage) | Number of subjects with a mutational pattern out of all 81 subjects | CTL testing |
| 62 | 79 | gp120 C1 | 0.765 | - | 5 | 4 of 5 | I–V | 2 D to N, 1 D to Y, 2 E to K | Yes |
| 64 | 81 | gp120 C1 | 0.505 | - | 2 | No | I–V | | Yes |
| 66 | 83 | gp120 C1 | 0.979 | - | 4 | No | I–V | | Yes |
| 66 ∼ | 83 | gp120 C1 | - | | 4 | No | IV–VI | | Yes |
| 66 ∼ | 83 | gp120 C1 | - | | 4 | No | IV–VI | | Yes |
| 175 | 226 | gp120 V2 | 0.983 | - | 7 | No | I–V | 3 L to P, 3 L to F, 1 N to S | No |
| 176 | 227 | gp120 V2 | 0.655 | - | 2 | No | I–V | 1 F to S, 1 F to V | No |
| 232 | 306 | gp120 C2 | 0.853 | - | 4 | No | I–V | 1 K to E, 1 K to R, 1 T to A, 1 T to M | No |
| 242 | 316 | gp120 C2 | 0.652 | - | 3 | No | I–V | 2 V to I, 1 V to L | No |
| 274 | 349 | gp120 C2 | 0.722 | - | 3 | No | I–V | 1 S to T, 1 S to F, 1 S to P | No |
| 322 + | 396 | gp120 V3 | 0.51 | - | 4 | 2 of 4 | I–V | 2 E to K, 1 E to G, 1 D to N | No |
| 337 | 412* | gp120 C3 | 0.932 | - | 5 | 2 of 5 | I–V | 1 K to E, 1 K to R, 1 E to K, 1 D to N, | No |
| 344 + | 419 | gp120 C3 | 0.588 | - | 3 | No | I–V | | No |
| 347 + | 422 | gp120 C3 | 0.91 | - | 5 | 4 of 5 | I–V | 2 R to K, 1 K to R, 1 G to R, | No |
| 354 | 431 | gp120 C3 | 0.895 | - | 3 | 2 of 3 | I–V | 1 E to K, 1 G to E, | Yes |
| 360 | 439 | gp120 C3 | 0.897 | - | 3 | No | I–V | 1 V to G, 1 V to A, | Yes |
| 372 ∼ | 454* | gp120 C3 | - | | 1 | No | I–II* | | No |
| 381 ∼ | 463 | gp120 C3 | - | | 4 | 4 of 4 | III–VI | 4 To K | No |
| 381 ∼ | 463 | gp120 C3 | - | | 4 | 4 of 4 | III–VI | 4 Away from E | No |
| 460 | 566 | gp120 C4 | 0.87 | - | 2 | No | I–V | 1 N to K, | Yes |
| 460 ∼ | 566 | gp120 C4 | - | | 2 | No | IV–VI | | Yes |
| 482 + | 601 | gp120 C5 | 0.669 | - | 5 | 5 of 5 | I–V | | No |
| 509 | 634 | gp120 C5 | 0.904 | - | 7 | 7 of 7** | I–V | 7 E to K | No |
| 513 | 638 | gp41 | 0.728 | - | 2 | No | I–V | 1 V to S, 1 V to G | No |
| 518 | 646 | gp41 | 0.999 | - | 9 | No | I–V | 2 M to I, 7 M to V | No |
| 587 | 719 | gp41 | 0.77 | - | 2 | No | I–V | 2 L to I | No |
| 588 | 720 | gp41 | 0.546 | - | 3 | 2 of 3 | I–V | 2 R to K, 1 K to R | No |
| 612 | 744 | gp41 | 0.841 | - | 2 | No | I–V | | Yes |
| 632 ∼ | 768 | gp41 | - | | 4 | 4 of 4 | III–VI | 4 To K | No |
| 632 ∼ | 768 | gp41 | - | | 4 | 4 of 4 | III–VI | 4 Away from E | No |
| 648 ∼ | 786 | gp41 | - | | 3 | 3 of 3 | III–VI | 3 To K | No |
| 651 | 789 | gp41 | 0.598 | - | 3 | No | I–V | | No |
| 696 | 834 | gp41 | 0.635 | - | 7 | 6 of 7 | I–V | | No |
| 700 | 838* | gp41 | 0.535 | - | 3 | No | I–V | | No |
| 702 + | 840 | gp41 | 0.533 | - | 3 | No | I–V | 2 L to F, 1 L to P | No |
| 703 + | 841 | gp41 | 0.541 | - | 3 | No | I–V | 1 S to A, 1 S to F, 1 S to P | No |
| 817 ∼ | 965 | gp41 | - | | 2 | No | V–VI | | Yes |
| 817 ∼ | 965 | gp41 | - | | 2 | No | V–VI | | Yes |
| 831 | 979 | gp41 | 0.664 | - | 2 | 2 of 2 | I–V | | Yes |
| 831 ∼ | 979 | gp41 | - | | 2 | 2 of 2 | V–VI | | Yes |
| 833 | 981 | gp41 | 0.993 | - | 2 | No | I–V | 1 V to A, | Yes |
| 833 ∼ | 981 | gp41 | - | | 2 | No | V–VI | | Yes |
| 841 | 989 | gp41 | 0.772 | - | 4 | No | I–V | 1 L to H or gap, 1 L to P, 1 L to F, 1 I to T | Yes |
Sites identified using the maximum likelihood phylogeny-based method are indicated with ∼ . The location, timing, and mutational patterns observed are provided as well as an indication of whether CTL testing was carried out or not.
Key for Table 2:
HXB2: Coordinates listed according to HXB2 numbering (http://www.hiv.lanl.gov/content/sequence/LOCATE/locate.html)
Data: Coordinate listed according to the protein alignment of all 81 acutely infected subjects.
Location: Region in Envelope.
Posterior Probability: Sites identified in HyPhy with a posterior probability > 0.5 are included in this table.
P-value/q-value: The p and q values from the phylogenetic analysis for a specific mutational pattern being associated with an earlier or later Fiebig stage.
Number of subjects with variation: Out of the 81 subjects, the number that had any variation in this site.
APOBEC3 in subject's mutations: The number of subjects among those that vary in a given position that have a G to A change in the context of an APOBEC3 motif.
Timing: The selection results were obtained using all samples in Fiebig stages I–V.
For the phylogenetic method a change at the site was enriched in the range of Fiebig stages shown.
Number of subjects with a mutational pattern: This summarizes all individuals with changes found in this position in the data. Italics means a change was found more than once in at least one person with the pattern.
Posterior probabilities: The number of individuals that have a change from the most common amino acid to another. If there was more than 1 change in any of the individuals, it is noted in italics.
Example 1: Site 651: 2 N to S, 1 S to G means: Two people had N as the most common amino acid, but S was also present. PRB931 had 17 N and 2 S; PRB956 had 25 N and 1 S. Because there was more than one S in one of them, it is in italics. One person, 700010077, is noted as S to G, and had 51 S and 1 G.
Example 2: Site 518: 2 M to I, 7 M to V means: Nine people had M as the most common amino acid, and each of the nine had a single variant among their sequences: I in two of them and V in seven of them. For example, subject 1012 had 42 sequences, with 41 M and 1 V.
Phylogenetic method: The within-subject change that was observed to be enriched in either early or late Fiebig stages.
Example: Site 460: 2 Away from N means: In the subjects that had changes in this position, the ancestral state of the founding virus (the transmitted virus) was most likely to be N, and in both cases at least one change away from N was observed. Because it is in italics, at least one of the two carried more than one mutation from N in this position.
CTL testing: If a variant was found multiple times in a patient or embedded in the context of additional proximal changes, peptides spanning the region were generated and T-cell assays were performed (see Table 3).
*: In Keele et al., we noted patients bearing these changes might have been infected with more than one closely related form. Alternatively, early selection or maintenance of a very early mutational event might be giving rise to the pattern.
**: This site was difficult to align in a few patients due to a frameshifting insertion of an A in a string of As in the primary sequences; thus some of the 7 noted E to K changes were actually due to a frameshifting indel.
***: In an additional subject there was an ambiguous base call in this position.
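The mutational-pattern notation described in the key (for example `2 M to I, 7 M to V`) can be tallied mechanically. The following is a minimal sketch, not part of the original analysis: the function name and regular expression are illustrative assumptions, and entries that lack a source residue (such as `4 To K` or `4 Away from E` from the phylogenetic method) are deliberately skipped.

```python
import re

# Matches entries such as "2 M to I" or "1D to N": a subject count,
# the most common (source) amino acid, and the variant amino acid.
ENTRY = re.compile(r"(\d+)\s*([A-Z])\s+to\s+([A-Z])")


def parse_mutational_pattern(cell: str) -> list[tuple[int, str, str]]:
    """Parse a pattern cell into (subject_count, from_aa, to_aa) tuples.

    Entries without an explicit source residue ("4 To K") are ignored.
    """
    return [(int(n), src, dst) for n, src, dst in ENTRY.findall(cell)]


# Site 518 from the table: "2 M to I, 7 M to V".
site_518 = parse_mutational_pattern("2 M to I, 7 M to V")
total_subjects = sum(count for count, _, _ in site_518)

print(site_518)        # [(2, 'M', 'I'), (7, 'M', 'V')]
print(total_subjects)  # 9
```

The total of nine matches the "Number of subjects with variation" reported for site 518, which is a quick consistency check between the two columns.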
CTL results indicating the six regions tested as well as the specific epitope sequences.
| Alignment coordinate | Patient (Fiebig stage) | Sequences Tested (Database Analog) | Epitope Annotation | ELISpot SFU/10^6 PBMCs | Cultured ELISpot SFU/10^6 PBMCs | ICS % Total CD8+ memory T-cells | Summary |
| | MEMI (V) | | Putative epitope | 487 | 15144 | N/A | Positive |
| | | | Common variant - escape? | 222 | 10227 | N/A | Diminished |
| | | | 18mer bound selected area | 133 | 1655 | N/A | Positive |
| | | | 18mer common transmitted variant | Negative | 2086 | N/A | Diminished |
| | 700010077 (V) | | Putative epitope | 477 | N/A | N/A | Positive |
| | | | 18mer bound selected area | 107 | N/A | 0.71 (32 d) | Positive |
| | | | 18mer common transmitted variant | Negative | N/A | N/A | Negative |
| | | | 18mer common transmitted variant | Negative | N/A | N/A | Negative |
| | MEMI (V) | | Putative epitope | Negative | 9930 | N/A | Positive |
| | | | Common variant - escape? | Negative | 2468 | N/A | Diminished |
| | | | 18mer bound selected area | Negative | Negative | N/A | Negative |
| | | | 18mer common transmitted variant | Negative | Negative | N/A | Negative |
| | | | Putative epitope | Negative | Negative | N/A | Negative |
| | | | Common variant - escape? | Negative | Negative | N/A | Negative |
| | | | Putative epitope | Negative | Negative | N/A | Negative |
| | | | Common variant - escape? | Negative | Negative | N/A | Negative |
| 744 | Z13 (V) | | Putative epitope | N/A | N/A | Negative | Negative |
| | | | 18mer bound selected area | 572 | N/A | N/A | Positive |
| | | | 18mer common transmitted variant | N/A | N/A | Negative | Negative |
| 965 | Z36 (V) | | Putative epitope | N/A | N/A | Negative | Negative |
| | | | 18mer bound selected area | N/A | N/A | Negative | Negative |
| | | | 18mer common transmitted variant | N/A | N/A | Negative | Negative |
| 566 | MEMI (V) | | Putative epitope | Negative | Negative | N/A | Negative |
| | | | Common variant - escape? | Negative | Negative | N/A | Negative |
| | | | 18mer bound selected area | Negative | Negative | N/A | Negative |
| | | | 18mer common transmitted variant | Negative | Negative | N/A | Negative |
Sites that occur in the APOBEC3 context are shown in italics.
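To make the notion of a G-to-A change "in the APOBEC3 context" concrete, here is a minimal sketch. It assumes the commonly used definition in which the mutated G is followed by R (A or G) and then D (A, G, or T), with the downstream context read from the variant sequence; this section does not spell out the exact motif used, so both the motif and the function name are assumptions for illustration.

```python
def apobec_context_g_to_a(founder: str, variant: str) -> list[int]:
    """Return 0-based positions of G-to-A changes in an assumed APOBEC3 context.

    Assumption: a change counts when the founder has G, the variant has A,
    and the next two variant bases are R (A/G) followed by D (A/G/T).
    Sequences must be aligned and of equal length.
    """
    if len(founder) != len(variant):
        raise ValueError("sequences must be aligned to the same length")
    founder, variant = founder.upper(), variant.upper()
    hits = []
    for i in range(len(founder) - 2):
        is_g_to_a = founder[i] == "G" and variant[i] == "A"
        in_context = variant[i + 1] in "AG" and variant[i + 2] in "AGT"
        if is_g_to_a and in_context:
            hits.append(i)
    return hits


# Toy sequences (hypothetical, not taken from the study data).
founder_seq = "ATGGAATGCCA"
variant_seq = "ATAGAATACCA"
context_hits = apobec_context_g_to_a(founder_seq, variant_seq)
print(context_hits)  # [2]
```

In the toy example both position 2 and position 7 carry a G-to-A change, but only position 2 is followed by the assumed R-D context, so only it is reported.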
Figure 1. Posterior probabilities of belonging to the selection class (ω>1) for all sites in gp160.
The dashed line indicates the 0.5 posterior probability, which we used as a threshold to assign sites to the selection site class. Flat parts of the graph correspond to sequence regions that were masked either because they were poorly aligned or coding in more than one frame.
Figure 2. Three-dimensional structure context of the selected sites identified in gp120.
(A) 13 sites from the dataset excluding sequences with evidence of APOBEC-mediated hypermutation, and (B) 4 additional sites identified analyzing the complete dataset. The sites depicted in blue represent those sites that are embedded in a known or potential CTL epitope and circled site numbers are potentially affected by APOBEC hypermutation. Sites marked with an asterisk occur in a region for which there is no available structure, therefore the positions are shown in proximity to their actual locations.
Figure 3. Selected sites that are embedded in potential CTL epitopes.
The patient consensus sequence is at the top of each alignment, with Fiebig stage, patient ID, and CON for consensus indicated. The proposed epitope is shown beneath the patient consensus, followed by the HLA. Previously reported epitopes are provided in full, predicted epitopes are written with uppercase letters representing the anchor motif embedded in a string of x's. (A–E) are based on i) evidence for selection from the model implemented in HyPhy, ii) evidence of selection from locally concentrated regional evidence for selection relative to extremely homogeneous early infection alignments, and iii) HLA appropriate known epitopes or anchor motifs found using the Los Alamos HIV immunology database epitope location finder, ELF. (F) This potential escape mutation was found by the phylogenetic test for changes enriched in later Fiebig stages, and was also supported by (ii) and (iii) above (Table 2). Mutations in (A–C) also were favored in later Fiebig stages. (G) This set of mutations includes the only other sites in the entire data set of 81 subjects where mutations tended to cluster over a small region in one person. No site in this region had statistical support for positive selection, but this cluster of mutations was embedded in a potential epitope.
Figure 4. Maximum likelihood tree including all sequences from all 81 homogeneous patients illustrating the pattern of change in one site.
The tree is based on all nucleotide sequences and all positions in Env, and illustrates the mutational patterns in the context of the full tree at amino acid position 83. Position 83 is highly conserved, and is His in most sequences in most subjects (indicated by the red Xs). In the 4 subjects that are featured, it changes from His to Tyr in at least some of the sequences in each subject, indicated by the blue Os. Although the trees were generated for each patient separately for the HyPhy analysis, this kind of repeated mutation pattern in a single site illustrates the kind of change captured in the dN/dS ratio of the HyPhy analysis (Table 2). This site, 83, was also identified as interesting in the full tree analysis, as it tended to change from H to Y in patients at later Fiebig stages (Table 2).
Figure 5. Cumulative distribution of insertions and deletions in acutely infected patients.
(A) The frameshifting single base indels observed in early infection were distributed throughout the Env gene, with a few concentrated regions localized in particular strings of like bases (red and black lines in the figure). As these substitutions are lethal, they must have arisen in the newly infected individual. In contrast, most of the single base indels in the 21 heterogeneous infection cases were compensatory, associated with nearby mutations that resolved the reading frame, and most of the compensated indels are localized in the hypervariable domains. (B) Similarly, multiple base frameshifting indels are distributed throughout Env, while in-frame indels are concentrated in the hypervariable regions.