S Gnanakaran, Tanmoy Bhattacharya, Marcus Daniels, Brandon F Keele, Peter T Hraber, Alan S Lapedes, Tongye Shen, Brian Gaschen, Mohan Krishnamoorthy, Hui Li, Julie M Decker, Jesus F Salazar-Gonzalez, Shuyi Wang, Chunlai Jiang, Feng Gao, Ronald Swanstrom, Jeffrey A Anderson, Li-Hua Ping, Myron S Cohen, Martin Markowitz, Paul A Goepfert, Michael S Saag, Joseph J Eron, Charles B Hicks, William A Blattner, Georgia D Tomaras, Mohammed Asmal, Norman L Letvin, Peter B Gilbert, Allan C Decamp, Craig A Magaret, William R Schief, Yih-En Andrew Ban, Ming Zhang, Kelly A Soderberg, Joseph G Sodroski, Barton F Haynes, George M Shaw, Beatrice H Hahn, Bette Korber.
Abstract
Here we have identified HIV-1 B clade Envelope (Env) amino acid signatures from early in infection that may be favored at transmission, as well as patterns of recurrent mutation in chronic infection that may reflect common pathways of immune evasion. To accomplish this, we compared thousands of sequences derived by single genome amplification from several hundred individuals who were either sampled early in infection or chronically infected. Samples were divided at the outset into hypothesis-forming and validation sets, and we used phylogenetically corrected statistical strategies to identify signatures, systematically scanning all of Env. Signatures included single amino acids, glycosylation motifs, and multi-site patterns based on functional or structural groupings of amino acids. We identified signatures near the CCR5 co-receptor-binding region, near the CD4 binding site, and in the signal peptide and cytoplasmic domain, which may influence Env expression and processing. Two signature patterns associated with transmission were particularly interesting. The first was the most statistically robust signature, located in position 12 in the signal peptide. The second was the loss of an N-linked glycosylation site at positions 413-415; the presence of this site has been recently found to be associated with escape from potent and broad neutralizing antibodies, consistent with enabling a common pathway for immune escape during chronic infection. Its recurrent loss in early infection suggests it may impact fitness at the time of transmission or during early viral expansion. The signature patterns we identified implicate Env expression levels in selection at viral transmission or in early expansion, and suggest that immune evasion patterns that recur in many individuals during chronic infection when antibodies are present can be selected against when the infection is being established prior to the adaptive immune response.
Year: 2011 PMID: 21980282 PMCID: PMC3182927 DOI: 10.1371/journal.ppat.1002209
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Number of subjects and SGA sequences used in this study.
| Dataset | Stage | Subjects (total number) | Sequences (total number) |
| Original | Early | 48 | 1340 |
| Original | Chronic | 43 | 892 |
| Holdout | Early | 43 | 1375 |
| Holdout | Chronic | 43 | 1230 |
| Plasma Donors | Early | 44 | 1466 |
| LANL Database | Chronic | 54 | 760 |
Summary statistics for the only single-site signature found in Env based on within-subject consensus sequence analysis, His at position 12.
| Data Analysis | HXB2 Pos | Align Pos | Original p-value | Original q-value | Holdout p-value | Holdout q-value | Fiebig stage | Direction | Change Early | Stasis Early | Change Chronic | Stasis Chronic |
| Consensus Tree | 12 H | 12 | 0.001 | 0.07 | 0.12 | 0.30 | F1–F5; chronic | H -> !H | 2 | 35 | 13 | 21 |
| Full Tree, strong chronic signatures | 12 H | 12 | 4×10⁻⁹ | 9×10⁻⁸ | 9×10⁻⁵ | 0.0005 | F1–F5; chronic | H -> !H | 8 | 67 | 57 | 54 |
| Full Tree | 12 H | 12 | 8×10⁻⁵ | 0.0007 | 0.08 | 0.19 | F1–F6; chronic | !R -> R | 2 | 14 | 20 | 6 |
| Full Tree | 12 H | 12 | 1×10⁻⁵ | 0.0002 | ns | ns | F1–F6; chronic | !P -> P | 0 | 91 | 20 | 127 |
The full tree analysis and a summary of common changes at position 12 support this signature and are also provided. The direction indicates the signature amino acids; H -> !H is read as H changes to “not His” (i.e., any other amino acid). The Fiebig stage indicates the group included in the comparison that gave the p-value shown. For example, F1–F5 means that Fiebig stages F1–F5 were included in the early group; the p-values for this set are given because they are the lowest p- and q-values. Five increasingly inclusive groupings of Fiebig stages were compared, and all five showed a trend supporting this signature, although not always meeting the q-value threshold. The contingency table on the right of each row gives the number of times the ML tree indicated a change from the ancestral state immediately preceding the consensus sequence, versus the number of times the amino acid state did not change. Thus H is enriched among transmitted variants. In the consensus tree, position 12 mutates away from H in only 2/37 acute/early cases, versus 13/34 chronic cases (5% in acutes versus 38% in chronics). In the full tree including all of the sequences, the distinction was similarly pronounced, with changes in 8/75 acute cases and 57/111 chronic cases (10% versus 51%). H most frequently mutates to R or P during the course of an infection; changes to P were not statistically supported (ns) in the holdout set.
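To make the change/stasis contingency tables concrete, the sketch below recomputes the change fractions quoted in the legend and runs a two-sided Fisher exact test on the same counts. Fisher's exact test is an assumption made here for illustration; the record does not state which test produced the tabulated p-values, which come from the study's phylogenetically corrected pipeline, so the numbers need not match. The function names are hypothetical.

```python
from math import comb


def change_fraction(change, stasis):
    """Fraction of subjects in which the site changed from its ancestral state."""
    return change / (change + stasis)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (prob(x) for x in range(lo, hi + 1))
    # Small tolerance so tables tied with the observed one are included.
    return sum(p for p in probs if p <= p_observed * (1 + 1e-9))


# Counts from the table: (change early, stasis early, change chronic, stasis chronic)
consensus_tree = (2, 35, 13, 21)
full_tree = (8, 67, 57, 54)

for name, (ce, se, cc, sc) in [("consensus", consensus_tree), ("full", full_tree)]:
    p = fisher_exact_two_sided(ce, se, cc, sc)
    print(name, f"early={change_fraction(ce, se):.3f}",
          f"chronic={change_fraction(cc, sc):.3f}", f"p={p:.2e}")
```

The test treats subjects as independent once the ancestral state is taken from the tree, which is a simplification of the study's approach.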
Summary statistics for additional signatures identified in further searches, using the combined original and PD/DB sets to identify potential signatures and comparing them to the holdout set. For the legend, see Table 2.
| Data Analysis | HXB2 Pos | Align Pos | Original+PD/DB p-value | Original+PD/DB q-value | Holdout p-value | Holdout q-value | Fiebig stage | Direction | Change Acute | Stasis Acute | Change Chronic | Stasis Chronic |
| Homogeneous Early, consensus | 415 | 525 | 0.003 | 0.40 | 0.05 | 0.11 | F1–F2 | T -> !T | 14 | 30 | 9 | 78 |
| Full tree, strong chronic signatures | 397 | 487 | 3×10⁻¹¹ | 5×10⁻⁹ | 1×10⁻⁹ | 6×10⁻⁸ | F1–F4 | N -> !N | 6 | 146 | 66 | 154 |
| Full tree, strong chronic signatures | 399 | 489 | 5×10⁻¹¹ | 5×10⁻⁹ | 3×10⁻⁶ | 3×10⁻⁶ | F1–F6 | T -> !T | 17 | 184 | 77 | 148 |
| Full tree, strong chronic signatures | 362 | 445 | 6×10⁻¹¹ | 1×10⁻⁸ | 1×10⁻⁸ | 1×10⁻⁶ | F1–F2 | N -> !N | 11 | 82 | 130 | 138 |
| Consensus Tree, CCR5 model set | Ref 1 | Ref 2 | 0.007 | 0.23 | 0.01 | 0.28 | F1–F5 | L[IV]---N -> !L[IV]---N | 0 | 36 | 8 | 35 |
One new acute signature site, !T415, was identified by restricting the search to the homogeneous early infection samples. This association was significant only for a grouping of the earliest samples, from Fiebig stages 1 and 2. Three sites in addition to site 12 (already included in Table 2) were strongly supported signatures of recurrent change in the chronic subjects using full tree analyses. One combination of sites was found through more intensive examination of the functional domain sets. It was found in the CCR5 CoRbs model, which uses a heavy-atom distance criterion to identify the amino acids proximal to the CCR5 CoRbs.
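A heavy-atom distance criterion of the kind mentioned for the CCR5 CoRbs model can be sketched as follows: a residue counts as proximal if any of its non-hydrogen atoms lies within a cutoff of any non-hydrogen atom of the binding-site reference. The cutoff value, the input layout, and the toy coordinates are all hypothetical; the record does not give the distance used or the structure file.

```python
from math import dist


def proximal_residues(candidate_atoms, reference_atoms, cutoff):
    """Return sorted residue labels with a heavy atom within `cutoff` of the reference.

    candidate_atoms: iterable of (residue_label, element, (x, y, z))
    reference_atoms: iterable of (element, (x, y, z))
    Hydrogen atoms (element "H") are ignored on both sides.
    """
    reference_heavy = [xyz for element, xyz in reference_atoms if element != "H"]
    selected = set()
    for label, element, xyz in candidate_atoms:
        if element == "H" or label in selected:
            continue
        if any(dist(xyz, ref) <= cutoff for ref in reference_heavy):
            selected.add(label)
    return sorted(selected)


# Toy coordinates in angstroms, invented purely for illustration.
candidates = [
    ("res_A", "C", (0.0, 0.0, 0.0)),
    ("res_A", "H", (0.0, 0.0, 1.0)),
    ("res_B", "N", (10.0, 0.0, 0.0)),
    ("res_C", "H", (1.0, 0.0, 0.0)),  # only a hydrogen is close, so res_C is skipped
]
reference = [("O", (3.0, 0.0, 0.0))]

print(proximal_residues(candidates, reference, cutoff=4.0))
```

In practice the atom records would come from a parsed structure of gp120 in complex with a CCR5 surrogate such as the 17b Fab, and the choice of cutoff would determine how many residues enter the functional set.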
Region explored: Ref 1 gives the HXB2 amino acid and position (complex signature positions in bold), and Ref 2 gives the alignment position in parentheses. Q114 (133), L122 (141), I201 (271), Q203 (273), A204 (274), S209 (279), N377 (463), Y384 (470), A436 (546) and P437 (547)
Signature hypotheses raised based on analysis of all within-subject consensus sequences.
| HXB2 Pos | Align Pos | p value mafft | p value hmmer | q value | OR | Cross Validate train | Cross Validate holdout | Fiebig stage | Direction | Change Early | Stasis Early | Change Chronic | Stasis Chronic | Region |
| 12 H | 12 | 0.0067 | 0.0039 | 0.46 | 0.39 | 8 | 6 | F1–F5; chronic | H to !H | 19 | 108 | 38 | 85 | Signal peptide |
| 192 K | 262 | 0.0005 | 0.0029 | 0.28 | 0 | 10 | 9 | F1–F3; chronic | R to !R | 0 | 86 | 11 | 107 | V2 base |
| 309 I | 381 | 0.0006 | 0.0010 | 0.29 | 0.27 | 6 | 2 | F1–F4; chronic | I to !I | 9 | 83 | 35 | 88 | V3 near tip |
| 415 T | 525 | 0.0100 | 0.0031 | 0.48 | 3.35 | 6 | 6 | F1–F2; early | T to !T | 18 | 43 | 14 | 113 | V4 PNLG |
| 446 V | 556 | 0.0010 | 0.0010 | 0.40 | 0 | 4 | 3 | F1–F6; chronic | !V to V | 0 | 145 | 9 | 121 | PNLG |
| 455 T | 565 | 0.0019 | 0.0014 | 0.23 | 0 | 6 | 6 | F1–F4; chronic | T to !T | 0 | 103 | 12 | 117 | V5 CD4bs VRC01 |
| 543 Q | 681 | na | 0.0047 | 0.42 | 0.14 | 3 | 3 | F1–F6; chronic | L to !L | 2 | 37 | 13 | 32 | gp41 |
| 700 A | 851 | na | 0.0064 | 0.43 | 0.21 | 0 | 0 | F1–F4; chronic | A to !A | 4 | 50 | 16 | 42 | Trans-membrane |
| 703 S | 854 | 0.0200 | 0.0033 | 0.37 | 7.51 | 2 | 0 | F1–F4; chronic | S to !S | 11 | 93 | 2 | 128 | Cytoplasmic tail |
| 721 L | 873 | 0.0002 | 0.0005 | 0.14 | 8.39 | 1 | 0 | F1–F2; early | !F to F | 11 | 54 | 3 | 125 | Cytoplasmic tail |
Consensus sequences from each subject from all three sets (Table 1, main text) were combined in a hypothesis-raising context (the Test set “All con”). Two acute signatures were observed (in bold): selection for loss of T at position 415 in acutes (discussed in the text), and selection for F at 721. Key: HXB2 Pos: the HXB2 Env position and amino acid. Aln Pos: the corresponding position in the Env protein alignment. Sig AA: the signature amino acid. Test set: “All con” was based on comparing acute and chronic data using a consensus from each patient and combining all three datasets described in Table 1 in the main text. We raised the q-value threshold to 0.5 for this exploratory summary so that we could identify a few potentially interesting sites; only half would be expected to be of interest. “Original” are the six sites for which a signature hypothesis was raised based on the original data; only position 12 H was later supported in the holdout data, so it is discussed further in the main text and was subsequently experimentally validated to regulate expression levels. For these we used our standard q threshold of 0.2. Pattern: “A to !A” means the signature amino acid is predicted in the maximum likelihood tree to be A in the most recent ancestral node of the subject, but to have changed to not being the signature amino acid (“!A” means “not A”) in the subject. In the contingency table, this change is contrasted with the signature amino acid remaining the same (the signature amino acid A is found in both the recent ancestor and the leaf node). “!A to A” is the inverse situation, where the ancestral state is not the signature amino acid. FS: Fiebig Stage.
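As a reading aid for the OR column, the sketch below computes the plain sample odds ratio (cross-product ratio) from the four change/stasis counts of a row. This is an assumption about how the column relates to the counts: it reproduces several tabulated values to two decimals (for example 0.39, 0.27 and 0.21) but is only close for others (such as 3.35 and 7.51), so the authors may have used a different estimator. The function name is hypothetical.

```python
def sample_odds_ratio(change_early, stasis_early, change_chronic, stasis_chronic):
    """Cross-product odds ratio of change in early versus chronic subjects.

    Returns 0.0 when no early subject changed, float('inf') when the
    denominator is zero, and float('nan') when both products are zero.
    """
    numerator = change_early * stasis_chronic
    denominator = stasis_early * change_chronic
    if denominator == 0:
        return float("nan") if numerator == 0 else float("inf")
    return numerator / denominator


# Rows from the table: position -> (change early, stasis early, change chronic, stasis chronic)
rows = {
    "12 H": (19, 108, 38, 85),
    "192 K": (0, 86, 11, 107),
    "309 I": (9, 83, 35, 88),
    "700 A": (4, 50, 16, 42),
}

for position, counts in rows.items():
    print(position, round(sample_odds_ratio(*counts), 2))
```

An odds ratio below 1 means the change was seen proportionally less often in early than in chronic subjects, and a value above 1 means the reverse.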
Summary statistics for the gain or loss of PNLGs, using the combined original and PD/DB sets and the holdout set. A PNLG is defined as the motif NX[ST], where N is Asn, X is any amino acid other than Pro, and [ST] is Ser or Thr.
| HXB2 Pos | Align Pos | Original+PD/DB p-value | Original+PD/DB q-value | Holdout p-value | Holdout q-value | Fiebig stage | Direction |
| 397 | 487 | 2×10⁻¹¹ | 3×10⁻¹⁰ | 9×10⁻⁵ | 1×10⁻⁴ | F1–F4 | Recurrent loss of potential N-linked glycosylation sites during chronic infection |
| 362 | 445 | 6×10⁻⁷ | 6×10⁻⁶ | 0.02 | 0.02 | F1–F6 | |
| 356 | 438 | 1×10⁻⁷ | 6×10⁻⁷ | 0.002 | 0.002 | F1–F6 | |
| 392 | 478 | 1×10⁻⁵ | 6×10⁻⁵ | 2×10⁻⁵ | 3×10⁻⁵ | F1–F3 | |
| 462 | 576 | 1×10⁻⁵ | 6×10⁻⁵ | 3×10⁻¹¹ | 2×10⁻¹⁰ | F1–F4 | |
| 188 | 249 | 1×10⁻⁵ | 8×10⁻⁵ | 3×10− | 7×10⁻⁵ | F1–F4 | |
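The PNLG motif in the table legend translates directly into a regular expression. The sketch below finds motif start positions in an ungapped protein sequence and lists sites present in an ancestral sequence but absent in a descendant. The function names and toy sequences are hypothetical, and the code assumes both sequences are ungapped and the same length; gap characters in an alignment would need handling first.

```python
import re

# N, then any residue except P, then S or T. The lookahead keeps matches
# zero-width so that overlapping motifs (e.g. "NNTT") are all reported.
PNLG_PATTERN = re.compile(r"(?=N[^P][ST])")


def find_pnlg_sites(sequence):
    """Return 0-based start positions of N-X-[ST] motifs (X is not Pro)."""
    return [match.start() for match in PNLG_PATTERN.finditer(sequence.upper())]


def lost_pnlg_sites(ancestor, descendant):
    """Return positions carrying a PNLG in the ancestor but not in the descendant."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be the same length")
    return sorted(set(find_pnlg_sites(ancestor)) - set(find_pnlg_sites(descendant)))


# Toy sequences, invented for illustration only.
print(find_pnlg_sites("ANGSANPTNNTT"))
print(lost_pnlg_sites("ANGTA", "ADGTA"))
```

Counting such losses per subject, split by early versus chronic status, would give contingency counts of the kind summarized in the table.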
Figure 1. Mapping of signature sites (red) on the three-dimensional structure of gp120 (silver).
A ribbon structure of the HIV-1 gp120 core+V3 in the CD4-bound conformation is shown in white. (A) Key residues involved in co-receptor and antibody (2G12, b12, b13 and F105) binding that are proximal to position 415 are shown. Residues 295 and 332, which contribute to the 2G12 epitope, and residue 444, which is important for co-receptor binding, are shown as blue balls. A motif spanning residues 417 to 421 (cyan) is proximal to position 415 and contains residues that take part in binding to the co-receptor (419), b12 (417–419), b13 (419–421) and F105 (421). CD4 (orange) is shown for better visualization of the receptor binding site region. (B) Locations of signature patterns involving glycan motifs (N-notP-[ST]). (C) Spatial locations of signature sites within a set of functional sites (blue) associated with CCR5 binding. The 17b antibody Fab is included to mark the region in gp120 that takes part in CCR5 binding. Signature sites are labeled with HXB2 reference numbers.
Figure 2. p- and q-values found in shuffling experiments in which the entire sequence signature strategy was repeated 10 times after randomizing the early and chronic designation of each subject.
The black x's represent the distribution of p- and q-values in the real data, while the colored circles represent the findings for incremental inclusion of Fiebig stages 2–6 in shuffled data. The lower quadrant of the graph is almost exclusively occupied by the real data, indicating a signature dependent on early versus chronic status; p-values of less than 10⁻⁶ were rare in the randomized data, and values less than 10⁻⁸ were found exclusively among the real data classifications.
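The idea behind the shuffling experiment can be illustrated on a single site: randomize which subjects are called early or chronic, recompute a statistic, and compare it with the value from the true labels. The sketch below uses the consensus-tree counts for position 12 (2/37 early and 13/34 chronic changes) and a simple difference in change fractions. That statistic, the seed, and the function name are assumptions for illustration; the study re-ran its entire signature pipeline on each shuffle rather than a single-site statistic.

```python
import random


def change_fraction_gap(labels, changed):
    """Chronic change fraction minus early change fraction."""
    early = [c for label, c in zip(labels, changed) if label == "early"]
    chronic = [c for label, c in zip(labels, changed) if label == "chronic"]
    return sum(chronic) / len(chronic) - sum(early) / len(early)


# One entry per subject, rebuilt from the consensus-tree counts for position 12.
labels = ["early"] * 37 + ["chronic"] * 34
changed = [True] * 2 + [False] * 35 + [True] * 13 + [False] * 21

observed_gap = change_fraction_gap(labels, changed)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
shuffled_gaps = []
for _ in range(10):  # ten shuffles, matching the number in the figure caption
    permuted = labels[:]
    rng.shuffle(permuted)  # reassign early/chronic status at random
    shuffled_gaps.append(change_fraction_gap(permuted, changed))

print(f"observed gap: {observed_gap:.3f}")
print("shuffled gaps:", [round(g, 3) for g in shuffled_gaps])
```

Shuffling keeps the number of early and chronic subjects and the number of changes fixed, so any gap in the shuffled runs reflects chance alone.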
Summary statistics regarding changes in regional hydrophobicity associated with chronic infection.
| Data Analysis | Set number | Original+PD/DB p-value | Original+PD/DB q-value | Holdout p-value | Holdout q-value | Correlation Coefficient Original | Correlation Coefficient Test | Direction |
| Change in Polarity | 270 | 1×10⁻¹² | 1×10⁻¹¹ | 0.04 | 0.01 | 0.64 | 0.18 | Chronic sets are more polar |
| | 368 | 1×10⁻⁶ | 1×10⁻⁵ | 0.05 | 0.01 | 0.47 | 0.18 | |
| | 362 | 1×10⁻⁴ | 1×10⁻³ | 1×10⁻⁴ | 1×10⁻³ | 0.38 | 0.34 | |
Sets of amino acids included in the three statistically interesting regions. These tests compared sequences from all Fiebig stages, F1–F6, to chronic samples.
Spatial Region 270: I359,T358,I360,E466,N397,K357,F396,S465,A346,I467,F361.
Spatial Region 368: S465,E466,E464,T358,K357,N463,N462,I359,I467,I360,G459.
Spatial Region 362: G459,G458,N460,D457,S461,N462,E466,I467,R456,N463,S465.
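A regional polarity or hydrophobicity score over one of these residue sets can be sketched by averaging a per-residue hydropathy value across the set. The Kyte-Doolittle scale used below is an assumption; the record does not say which scale or scoring the authors applied. The residue labels are those listed for Spatial Region 362, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def parse_residue(label):
    """Split a label such as 'G459' into ('G', 459)."""
    return label[0], int(label[1:])


def mean_hydropathy(labels):
    """Average Kyte-Doolittle hydropathy over a set of residue labels."""
    residues = [parse_residue(label)[0] for label in labels]
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


region_362 = "G459,G458,N460,D457,S461,N462,E466,I467,R456,N463,S465".split(",")

print(f"Region 362 mean hydropathy (HXB2 residues): {mean_hydropathy(region_362):.2f}")
```

To compare early and chronic sequences, the same average would be taken over each subject's residues at these positions rather than over the HXB2 reference residues.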
Figure 3. Three statistically significant structure-based regional clusters in gp120 (white) associated with changes in polarity.
These regional clusters occur near the CD4-binding site (orange) shown in (A). The CD4-bound conformation of the HIV-1 gp120 core+V3 is shown, from the perspective seen by CD4. The three clusters (B–D) are shown in red. The residues that form these sets are shown in panel (E). All maps are based on HXB2 numbering.