Vlad Novitsky, Rui Wang, Stephen Lagakos, Max Essex.
Abstract
The diversity of HIV-1 and its propensity to generate escape mutants present fundamental challenges to control efforts, including HIV vaccine design. Intra-host diversification of HIV is determined by immune responses elicited by an HIV-infected individual over the course of the infection. Complex and dynamic patterns of HIV transmission lead to even more complex viral diversity at the population level over time, thus presenting enormous challenges to vaccine development. To address inter-patient viral evolution over time, a set of 653 unique HIV-1 subtype C gag sequences was retrieved from the LANL HIV Database, grouped by sampling year as <2000, 2000, 2001-2002, 2003, and 2004-2006, and analyzed for the site-specific frequency of translated amino acid residues. Phylogenetic analysis revealed that 289 of the 653 analyzed sequences (44.3%) fell within 16 clusters defined by aLRT support above 0.90. Median (IQR) inter-sample diversity of analyzed gag sequences was 8.7% (7.7%; 9.8%). Despite the heterogeneous origins of analyzed sequences, the gamut and frequency of amino acid residues in wild-type Gag were remarkably stable over the last decade of the HIV-1 subtype C epidemic. The vast majority of amino acid residues demonstrated minor frequency fluctuation over time, consistent with the conserved nature of the HIV-1 Gag protein. Only 4.0% (20 out of 500; HXB2 numbering) of amino acid residues across Gag displayed both statistically significant changes in amino acid frequency over time (p<0.05 by both a trend test and a heterogeneity test) and a range of at least 10% in the frequency of the major amino acid. A total of 59.2% of amino acid residues whose frequency changed by 10% or more were found within previously identified CTL epitopes. The time of the most recent common ancestor of HIV-1 subtype C was dated to around 1950 (95% HPD from 1928 to 1962).
This study provides evidence for the overall stability of HIV-1 subtype C Gag among viruses circulating in the epidemic over the last decade. However, selected sites across HIV-1C Gag with changing amino acid frequency are likely to be under selection pressure at the population level.
Keywords: CTL epitopes; Gag; HIV-1 subtype C; amino acid frequency; consensus sequence; gag phylogeny; time of MRCA
Year: 2010 PMID: 21994599 PMCID: PMC3185553 DOI: 10.3390/v2010033
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.818
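The site-specific frequency analysis described in the abstract can be illustrated with a minimal sketch. It assumes aligned amino acid sequences tagged with a sampling-year group; the function names (`site_counts`, `major_residue_range`) and the toy records are hypothetical and are not taken from the study. Sites here are alignment columns, not HXB2 positions.

```python
from collections import Counter, defaultdict


def site_counts(records):
    """Count residues per alignment column and sampling group.

    records: iterable of (group_label, aligned_amino_acid_sequence).
    Returns a nested mapping: site (1-based) -> group -> Counter of residues.
    Gap characters ("-") are skipped.
    """
    counts = defaultdict(lambda: defaultdict(Counter))
    for group, seq in records:
        for site, residue in enumerate(seq, start=1):
            if residue != "-":
                counts[site][group][residue] += 1
    return counts


def major_residue_range(by_group):
    """Return the overall major residue at a site and the range
    (max minus min) of its frequency across sampling groups."""
    total = Counter()
    for group_counter in by_group.values():
        total.update(group_counter)
    major = total.most_common(1)[0][0]
    freqs = [c[major] / sum(c.values()) for c in by_group.values()]
    return major, max(freqs) - min(freqs)


# Toy data (hypothetical, for illustration only).
toy_records = [
    ("<2000", "MGAR"), ("<2000", "MGAR"), ("<2000", "MGAR"),
    ("2003", "MGAR"), ("2003", "MGAK"),
    ("2004-2006", "MGAK"), ("2004-2006", "MGAK"),
]
counts = site_counts(toy_records)
major, spread = major_residue_range(counts[4])
print(major, spread)
```

A site would meet the abstract's second criterion when this range is at least 0.10; the significance tests are a separate step.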
Figure 1. Phylogenetic relationships between HIV-1 subtype C gag sequences. The phylogenetic tree was constructed with PhyML. Branch colors represent the year of sampling by group. Nodes with significant aLRT support are highlighted.
Figure 2. Phylogenetic network of HIV-1 subtype C gag sequences (n=653). The presented split network was generated by SplitsTree v4 [32,33] using the NeighborNet approach. To highlight multiple splits and parallel branching, the central part of the split network is enlarged.
Figure 3. Pairwise distances of HIV-1 subtype C gag sequences collected at different time points. In the box plots, the boundary of the box closest to zero indicates the 25th percentile, a black line within the box marks the median value, a red line within the box shows the mean, and the boundary of the box farthest from zero indicates the 75th percentile. Whiskers above and below the box indicate the 10th and 90th percentiles. Points above and below the whiskers indicate the 5th and 95th percentiles. The five groups in each graph correspond to the time of sampling. A: Pairwise distances within group by sampling time. B: Pairwise distances to the HIV-1 subtype C consensus. Comparisons between groups are based on the Mann-Whitney rank sum test.
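As a rough illustration of the within-group summary in panel A, the sketch below computes pairwise distances and their median and quartiles. It uses the uncorrected p-distance (proportion of differing sites) as a stand-in, which is an assumption: the distance measure actually used in the study is not stated here. The function names and toy sequences are hypothetical.

```python
from itertools import combinations
from statistics import quantiles


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ("-")."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def summarize_pairwise(seqs):
    """Median and quartiles of all pairwise p-distances in one group.

    Requires at least three sequences so that there are at least two
    pairwise distances to summarize.
    """
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    q1, q2, q3 = quantiles(dists, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3}


# Toy aligned sequences (hypothetical, for illustration only).
toy_group = ["ACGT", "ACGA", "ACCA", "TCCA"]
summary = summarize_pairwise(toy_group)
print(summary)
```

Applying `summarize_pairwise` to each sampling-year group separately would give the per-group values that a box plot such as Figure 3A displays.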
Figure 4. Changing amino acid residues in HIV-1 subtype C Gag. A: Overview of changing amino acid residues across Gag cleavage products. Location of changing amino acids is shown in relation to HXB2 numbering. Changing amino acid sites with frequency change of more than 20% are shown in red, and sites with frequency change between 10% and 20% are shown in blue. Location of changing amino acids detected by chi-square test and Cochran-Armitage trend test is depicted in black under the Gag bar. B: Dynamics of amino acid frequency at top 20 sites (significant changes by all three methods). Number in the upper left corner in each graph depicts location in relation to HXB2 numbering.
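The Cochran-Armitage trend test named in this caption can be sketched for a single site as follows. This is a textbook form of the statistic with a normal approximation, not the study's own implementation; the equally spaced group scores, the function name, and the toy counts are assumptions made for illustration. The chi-square heterogeneity test is not covered here.

```python
import math


def cochran_armitage(successes, totals, scores=None):
    """Two-sided Cochran-Armitage test for a linear trend in proportions.

    successes[i]: sequences carrying the major residue in group i
    totals[i]:    sequences analyzed in group i
    scores[i]:    ordinal score of group i (default 0, 1, 2, ...; an assumption)
    Returns (z, p_value) using the normal approximation.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n_all = sum(totals)
    p_bar = sum(successes) / n_all
    # Score-weighted deviation of observed counts from counts expected without a trend.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, successes, totals))
    var = p_bar * (1 - p_bar) * (
        sum(n * s * s for s, n in zip(scores, totals))
        - sum(n * s for s, n in zip(scores, totals)) ** 2 / n_all
    )
    if var <= 0:
        raise ValueError("variance is zero; the trend test is undefined")
    z = t_stat / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Toy counts across five sampling groups (hypothetical, for illustration only).
z_down, p_down = cochran_armitage([90, 80, 70, 60, 50], [100] * 5)
z_flat, p_flat = cochran_armitage([70, 70, 70, 70, 70], [100] * 5)
print(z_down, p_down, z_flat, p_flat)
```

A negative `z` indicates a declining frequency of the major residue across groups, and a flat series yields `z` near zero.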
Figure 5. Normalized dN-dS values across gag codons. P-values were derived from a two-tailed extended binomial distribution. A total of 31 positively and 242 negatively selected sites were found.
Amino acid positions in HIV-1 subtype C Gag with frequency change ≥20% in the subtype C consensus sequence: potential association with known CTL epitopes. (Note: Epitopes with multiple changing amino acids are highlighted; changing amino acids are shown in bold and underscored).
| Gag cleavage product | Amino acid position | Number of associated CTL epitopes | Epitope | HXB2 start | HXB2 end | HIV-1 subtype | HLA restriction |
|---|---|---|---|---|---|---|---|
| p17 | | | ASILRG | 5 | 15 | C | |
| | | | RLRPGGKK | 20 | 29 | C | A*3002 |
| | | | RLRPGGKK | 20 | 30 | C | |
| | | | RPGGKK | 22 | 29 | A, C, D | B42, B7 |
| | | | RPGGKK | 22 | 30 | C | B35, Cw*0602 |
| | | | RPGGKK | 22 | 31 | A, C, D | B*0702, B*5801, B*8101 |
| | | | K | 27 | 35 | C | Cw*0602 |
| | | | | 28 | 36 | A, C | A*2301 |
| | | | | 28 | 36 | A, B, C | A*2301, A*2402, A24 |
| | | | | 28 | 36 | A, C, D | B*0702, B*5801, B*8101 |
| | | | | 28 | 38 | C | |
| | | | H | 33 | 41 | C | Cw*0602, Cw*0804 |
| | | | | 34 | 44 | C | A*3002, A30, B*5703, B57 |
| | | | EEL | 73 | 82 | C | B*4006 |
| | | | | 76 | 86 | B, C | A*30, A*3002, A30, B57, B58, B63 |
| p24 | | | RALGPGA | 335 | 343 | A, B, C, D | B7 |
| | | | ALGPGA | 336 | 346 | C | |
| | | | GPGA | 338 | 346 | A, C, D | B*0702, B*5801, B*8101 |
| | | | A | 341 | 349 | B, C, CRF01_AE | A*0201, A*0206, A*0220, A*0234, A*0236, A2 |
| p2 | | | | | | | |
Figure 6. Bayesian skyline plot of HIV-1 subtype C. The upper 95% HPD, median, and lower 95% HPD of HIV-1 subtype C are projected on the time line. The bold black line traces the inferred median effective population size over time with the 95% HPD shaded in blue.
HIV-1 subtype C gag sequences (>1,000 bp) included in the analyses by country of origin and by year of sampling. Countries with fewer than 10 sequences are presented as ‘Others’, and include Argentina (2), Brazil (6), China (2), Cyprus (5), Denmark (2), Djibouti (1), Georgia (1), Kenya (3), Senegal (1), Somalia (1), Spain (3), Uganda (2), USA (2), Uruguay (1), Yemen (1), and Zimbabwe (6). Groups of analyzed gag sequences by sampling year are outlined at the bottom.
| 49 | 10 | 7 | 4 | 25 | 3 | |||||||||||||||||||
| 12 | 1 | 4 | 3 | 1 | 1 | 2 | ||||||||||||||||||
| 41 | 4 | 5 | 2 | 23 | 4 | 2 | 1 | |||||||||||||||||
| 22 | 1 | 4 | 17 | |||||||||||||||||||||
| 11 | 1 | 3 | 7 | |||||||||||||||||||||
| 431 | 3 | 17 | 27 | 72 | 73 | 22 | 111 | 79 | 24 | 3 | ||||||||||||||
| 18 | 2 | 11 | 5 | |||||||||||||||||||||
| 30 | 1 | 1 | 6 | 11 | 8 | 3 | ||||||||||||||||||
| 39 | 1 | 2 | 2 | 1 | 5 | 0 | 4 | 6 | 2 | 2 | 5 | 2 | 5 | 1 | 1 | |||||||||
| 653 | 1 | 0 | 5 | 1 | 2 | 2 | 1 | 5 | 5 | 0 | 14 | 4 | 34 | 58 | 131 | 111 | 39 | 117 | 84 | 26 | 8 | 4 | 1 | |
Amino acid positions in HIV-1 subtype C Gag with frequency change 10% to 20% in the subtype C consensus sequence: potential association with known CTL epitopes (Note: Epitopes with multiple changing amino acids are highlighted; changing amino acids are shown in bold and underscored).
| Gag cleavage product | Amino acid position | Number of associated CTL epitopes | Epitope | HXB2 start | HXB2 end | HIV-1 subtype | HLA restriction |
|---|---|---|---|---|---|---|---|
| p17 | | | AS | 5 | 15 | C | |
| | | | ELD | 12 | 21 | B, C | B63 |
| | | | | 18 | 27 | B, C, multiple | A*0301, A11, A3, B27 |
| | | | | 20 | 29 | C | A*3002 |
| | | | RLRPGGKKHY | 20 | 30 | C | |
| | | | RPGGKKRY | 22 | 30 | C | B35, Cw*0602 |
| | | | RPGGKKKY | 22 | 31 | A, C, D | B*0702, B*5801, B*8101 |
| | | | KRY | 27 | 35 | C | Cw*0602 |
| | | | HY | 28 | 36 | A, C | A*2301 |
| | | | HY | 28 | 36 | A, B, C | A*2301, A*2402, A24 |
| | | | HY | 28 | 36 | A, C, D | B*0702, B*5801, B*8101 |
| | | | HY | 28 | 38 | C | |
| | | | EELRSL | 73 | 82 | C | B*4006 |
| | | | RSL | 76 | 86 | B, C | A*30, A*3002, A30, B57, B58, B63 |
| | | | SL | 77 | 85 | A, B, C, CRF02_AG, D, F, G, K | A*02.01, A*0201, A*0202, A*0205, A*0214, A*0220, A*0234, A*0236, A*68, A02, A2, B*1503 |
| | | | SL | 77 | 86 | C | |
| | | | L | 78 | 86 | C | A*2902, A29, B*4403 |
| | | | L | 78 | 86 | C | A*2902 |
| p17 | | | YCVH | 86 | 96 | C | |
| p24 | | | VKVIEEK | 156 | 164 | C | B*1503 |
| | | | VKVVEEK | 156 | 164 | B, C | B*1503 |
| | | | IEEK | 159 | 168 | C | B*4006 |
| | | | IEEK | 159 | 169 | C | B*4501 |
| | | | EEK | 160 | 168 | A, C | B*4415, B*4501 |
| | | | EK | 161 | 168 | C | Cw*0602 |
| | | | K | 162 | 172 | A, B, C, CRF02_AG, G | A*310102, A*6603, B*440302, B*5701, B*5703, B*5801, B57, B58, B63, B8, Cw*040101, Cw*07 |
| | | | | 163 | 173 | C | |
| | | | AAEWDR | 209 | 219 | C | |
| | | | AEWDR | 210 | 218 | B, C | A2, B*04, B*4006, Cw*0602 |
| | | | RLHPVHAGP | 214 | 224 | C | |
| | | | HPVHAGP | 216 | 224 | B, C | B*3910, B07, B35, B7 |
| | | | HPVHAGP | 216 | 224 | A, B, C, D | B7 |
| | | | GP | 221 | 228 | C, D | B35 |
| | | | TSTLQEQI | 240 | 249 | B, C, HIV-2 | A*310102, A*6603, B*440302, B*5701, B*5703, B*58, B*5801, B27, B35, B57, B58, B63, B7, Cw*040101, Cw*07 |
| | | | TSTLQEQI | 240 | 249 | B, C | B*5701, B*5703, B*5801, B57 |
| | | | TLQEQI | 242 | 250 | B, C | A*0201, A*0220, A*0234, A*0236, A2 |
| | | | PPIPVG | 254 | 262 | B, C | B*3501, B*3502, B35 |
| | | | PPVPVG | 254 | 262 | C | B35 |
| | | | PPIPVG | 254 | 262 | A, B, C, D | B35, B53, B7 supertype |
| | | | PVG | 257 | 267 | C | |
| | | | G | 259 | 267 | A, B, C, CRF02_AG, D | A*01, A*6801, B*0801, B*51, B8, Cw*07, Cw15, DQ2, DQ3, DR3, DR4 |
| | | | | 260 | 267 | B, C | B*0801, B8 |
| | | | TLRAEQATQ | 303 | 312 | C | Cw*0304 |
| | | | RAEQATQ | 305 | 315 | C | |
| | | | QATQ | 308 | 316 | C | B*5301, B*5801, B57 |
| | | | NPDCKTIL | 327 | 337 | C | B*3910 |
| | | | | 335 | 343 | A, B, C, D | B7 |
| | | | | 336 | 346 | C | |
| | | | G | 338 | 346 | A, C, D | B*0702, B*5801, B*8101 |
| p2 | | | CLAEAMSQ | 362 | 370 | B, C | A*0201, A*0220, A*0234, A*0236 |
| p7 | | | | | | | |
| | | | | 401 | 411 | C | |
| p1 | | | FLGKIWPS | 433 | 442 | A, B, C, CRF01_AE | A*0201, A*0205, A2 |
| p6 | | | | | | | |
| | | | PLTSL | 485 | 495 | C | |