Paloma Troyano-Hernáez, Roberto Reinosa, Africa Holguín.
Abstract
The emergence and spread of new HIV-1 variants pose a challenge to the effectiveness of antiretrovirals (ARV) targeting Pol proteins. During viral evolution, non-synonymous mutations have become fixed along the viral genome, leading to amino acid (aa) changes that can be variant-specific (V-markers). V-markers fixed at positions associated with drug resistance mutations (DRM), termed R-markers, can affect drug susceptibility and resistance pathways. All available HIV-1 Pol sequences from ARV-naïve subjects were downloaded from the Los Alamos HIV Sequence Database (United States), selecting 59,733 protease (PR), 6,437 reverse transcriptase (RT), and 6,059 integrase (IN) complete sequences ascribed to the four HIV-1 groups and to group M subtypes and circulating recombinant forms (CRFs). Using a bioinformatics tool developed in our laboratory (EpiMolBio), we inferred the consensus sequence for each Pol protein and HIV-1 variant to analyze aa conservation in Pol. We analyzed the Wu-Kabat protein variability coefficient (WK) in group M PR, RT, and IN to study the susceptibility of each site to evolutionary replacements. We defined as V-markers the variant-specific aa changes present in >75% of the sequences of variants with >5 available sequences, and as R-markers those V-markers that corresponded to DRM according to the IAS-USA 2019 list and the Stanford HIV Drug Resistance Database v9.0. The mean aa conservation relative to the HIV-1 and group M consensus sequences was 82.60%/93.11% in PR, 88.81%/94.07% in RT, and 90.98%/96.02% in IN. The median group M WK was 10 in PR, 4 in RT, and 5 in IN. Residues involved in binding or catalytic sites showed a variability of <0.5%. We identified 106 V-markers (31 in PR, 28 in RT, and 47 in IN), present in 11, 12, and 13 variants, respectively. Among them, eight (7.5%) were R-markers, present in five variants; all were minor DRM with little potential effect on ARV susceptibility. We present a thorough analysis of Pol variability among all HIV-1 variants circulating to date.
The relatively high aa conservation observed in Pol proteins across HIV-1 variants highlights their critical role in the viral cycle. However, further studies are needed to understand the impact of V-markers on Pol protein structure, the viral cycle, or treatment strategies, and periodic variability surveillance studies are also required to understand PR, RT, and IN evolution.
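The abstract does not spell out how EpiMolBio infers consensus sequences or per-position conservation. A minimal sketch, assuming a simple majority-rule consensus over sequences already aligned to a common length, and conservation defined as the percentage of sequences carrying the consensus residue at each position. Both definitions are assumptions for illustration, not the tool's documented method, and the toy sequences are hypothetical.

```python
from collections import Counter

def consensus(seqs: list[str]) -> str:
    """Majority-rule consensus of pre-aligned, equal-length sequences.

    Assumption: ties keep the residue seen first (Counter.most_common order)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))

def conservation(seqs: list[str], reference: str) -> list[float]:
    """Per-position % of sequences carrying the reference residue (assumed definition)."""
    n = len(seqs)
    return [100.0 * sum(s[i] == aa for s in seqs) / n for i, aa in enumerate(reference)]

def mean_conservation(seqs: list[str], reference: str) -> float:
    """Average of the per-position conservation values."""
    values = conservation(seqs, reference)
    return sum(values) / len(values)

# Hypothetical toy alignment of four 5-residue fragments.
toy = ["PQITL", "PQITL", "PQVTL", "PHITL"]
ref = consensus(toy)
print(ref, conservation(toy, ref), mean_conservation(toy, ref))
```

A real pipeline also has to decide how gaps, insertions, and ambiguous residues are counted, which this sketch leaves out.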
Keywords: HIV-1; Pol; conservation; integrase; protease; resistance; reverse transcriptase; variants
Year: 2022 PMID: 35910645 PMCID: PMC9330395 DOI: 10.3389/fmicb.2022.866705
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 6.064
FIGURE 1. Number of HIV-1 Pol sequences per country included in this study, as available in the Los Alamos HIV Sequence Database (LANL) in January 2022. PR, protease; RT, reverse transcriptase; IN, integrase. (A) Protease sequences per country. Total LANL sequences: 59,733. (B) Reverse transcriptase sequences per country. Total LANL sequences: 6,437. (C) Integrase sequences per country. Total LANL sequences: 6,059. Eleven integrase sequences had no record of the country of origin.
FIGURE 2. Geographic distribution by region of HIV-1 Pol variants available in the Los Alamos HIV Sequence Database (LANL) in January 2022. HIV-1 variant distribution within regions in PR (A), RT (B), and IN (C). PR, protease; RT, reverse transcriptase; IN, integrase. Countries are colored by region according to the United Nations geoscheme (https://unstats.un.org); the region color code is shown in the box in (A). Pie charts show the percentage of each HIV-1 variant per region, as available in LANL in January 2022, and the most frequent variant per region. The total number of available LANL sequences per region is in brackets beside the region name. NA, Northern Africa; SA, Southern Africa; EA, Eastern Africa; WA, Western Africa; CA, Central Africa; SAM, South America; CAC, Central America and The Caribbean; NAM, North America; OC, Oceania; NEU, Northern Europe; SEU, Southern Europe; EEU, Eastern Europe; WEU, Western Europe; CAS, Central Asia; SEAS, Southern and Southeastern Asia; EAS, Eastern Asia; WAS, Western Asia.
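Each pie chart summarizes, for one region, the share of each variant and the most frequent one. A minimal sketch of that tally, assuming one (region, variant) label per sequence; the records below are hypothetical and do not reflect LANL counts.

```python
from collections import Counter, defaultdict

def variant_share_by_region(records):
    """Summarize (region, variant) labels, one per sequence.

    Returns {region: (total, {variant: percent}, most_frequent_variant)}."""
    by_region = defaultdict(Counter)
    for region, variant in records:
        by_region[region][variant] += 1
    summary = {}
    for region, counts in by_region.items():
        total = sum(counts.values())
        shares = {variant: 100.0 * c / total for variant, c in counts.items()}
        summary[region] = (total, shares, counts.most_common(1)[0][0])
    return summary

# Hypothetical records using two of the region codes from the legend.
records = [("WEU", "B"), ("WEU", "B"), ("WEU", "CRF02_AG"), ("WEU", "C"),
           ("SA", "C"), ("SA", "C")]
for region, (total, shares, top) in variant_share_by_region(records).items():
    print(f"{region} ({total}): most frequent {top}, shares {shares}")
```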
FIGURE 3. Amino acid conservation rate along PR in the HIV-1 and group M consensus sequences. aa, amino acid; M, group M consensus; PR, protease (99 aa). Dots in group M indicate the same aa as in the HIV-1 consensus at that position. The HXB2 reference sequence is shown below the groups for orientation. Colors represent the conservation rate. Residues of the PR active site (the Asp25-Thr26-Gly27 triad, conserved among aspartyl proteases) are highlighted in red font. Orange triangles indicate positions of major DRM to PI according to Stanford v9.0 (Release Notes - HIV Drug Resistance Database, 2020), summarized in https://cms.hivdb.org/prod/downloads/resistance-mutation-handout/resistance-mutation-handout.pdf. Aa code: A, alanine; C, cysteine; D, aspartic acid; E, glutamic acid; F, phenylalanine; G, glycine; H, histidine; I, isoleucine; K, lysine; L, leucine; M, methionine; N, asparagine; P, proline; Q, glutamine; R, arginine; S, serine; T, threonine; V, valine; W, tryptophan; Y, tyrosine.
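The dot convention used for the group M row can be reproduced with a position-by-position comparison. A minimal sketch, assuming both sequences are aligned to the same length; the toy strings are hypothetical, not actual consensus sequences.

```python
def dot_notation(reference: str, query: str) -> str:
    """Show `query` against `reference`, printing '.' where the residues match."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("." if q == r else q for r, q in zip(reference, query))

# Hypothetical toy strings, not real consensus sequences.
hiv1_row = "PQITLWQRPL"
group_m_row = "PQVTLWQRPI"
print(hiv1_row)
print(dot_notation(hiv1_row, group_m_row))
```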
FIGURE 5. Amino acid conservation rate along IN in the HIV-1 and group M consensus sequences. Aa, amino acid; M, group M consensus; IN, integrase (288 aa). Dots in group M indicate the same aa as in the HIV-1 consensus at that position. The HXB2 reference sequence is shown below the groups for orientation. Colors represent the conservation rate. Residues of the zinc-binding site (His12, His16, Cys40, and Cys43) and the D–D–E motif of the catalytic domain (Asp64, Asp116, and Glu152) are highlighted in red font. Green circles indicate positions of major INSTI DRM according to Stanford v9.0 (Release Notes - HIV Drug Resistance Database, 2020). Amino acid codes as in Figure 3.
Variability in protease (PR), reverse transcriptase (RT), and integrase (IN) active sites.
| Protein | Site | Residue | Variability |
|---|---|---|---|
| PR | Complete PR | | 17.40%* |
| PR | Active site triad | Asp25 | 0.11% |
| PR | Active site triad | Thr26 | 0.04% |
| PR | Active site triad | Gly27 | 0.01% |
| RT | Complete RT | | 11.19%* |
| RT | Catalytic triad | Asp110 | 0.03% |
| RT | Catalytic triad | Asp185 | 0.04% |
| RT | Catalytic triad | Asp186 | 0.06% |
| IN | Complete IN | | 9.02%* |
| IN | Zinc-binding site | His12 | 0.08% |
| IN | Zinc-binding site | His16 | 0.15% |
| IN | Zinc-binding site | Cys40 | 0.13% |
| IN | Zinc-binding site | Cys43 | 0.05% |
| IN | D–D–E motif | Asp64 | 0.07% |
| IN | D–D–E motif | Asp116 | 0.08% |
| IN | D–D–E motif | Glu152 | 0.23% |
PR, protease; RT, reverse transcriptase; IN, integrase. *Mean residue variability for each protein, calculated relative to its HIV-1 consensus.
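The asterisked means are computed against the HIV-1 consensus, so they are the complements of the mean HIV-1 conservation rates given in the abstract (82.60%, 88.81%, and 90.98%): the mean of (100 − c_i) over positions equals 100 − mean(c_i). The sketch below checks that arithmetic and shows one plausible way to compute a single-residue variability, assuming it is the percentage of sequences whose residue differs from the consensus residue at that position. That definition is an assumption (the source does not state it), and the fragments are hypothetical.

```python
# Mean conservation vs. the HIV-1 consensus, as reported in the abstract (%).
MEAN_CONSERVATION_HIV1 = {"PR": 82.60, "RT": 88.81, "IN": 90.98}

def mean_variability(mean_conservation: float) -> float:
    """Mean variability is the complement of mean conservation, since the mean of
    (100 - c_i) over positions equals 100 - mean(c_i)."""
    return round(100.0 - mean_conservation, 2)

def site_variability(seqs, position, consensus_aa):
    """Assumed definition: % of sequences whose residue at the 1-based
    `position` differs from `consensus_aa` (gaps count as differing)."""
    idx = position - 1
    differing = sum(1 for s in seqs if s[idx] != consensus_aa)
    return 100.0 * differing / len(seqs)

for protein, cons in MEAN_CONSERVATION_HIV1.items():
    print(protein, mean_variability(cons))

# Hypothetical 3-residue fragments standing in for a D-T-G triad.
fragments = ["DTG", "DTG", "DTG", "NTG"]
print(site_variability(fragments, 1, "D"))
```

The loop prints 17.4, 11.19, and 9.02, matching the 17.40%, 11.19%, and 9.02% in the table.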
FIGURE 6. Percentage of aa conservation of PR, RT, and IN across the HIV-1 group M variants with >5 sequences at LANL. X-axis: HIV-1 group M variants with >5 available sequences at LANL (46 in PR, 16 in RT, and 36 in IN). Y-axis: conservation rate for each variant included in this analysis. The horizontal line represents 90% conservation.
Single V-markers and R-markers in protease found across HIV-1 variants with >5 LANL sequences.
| HIV-1 variant | Countries | V-markers and R-markers (bold red) |
|---|---|---|
| Group O | Belgium (2), Cameroon (12), Senegal (3), Spain (3), United Kingdom (2), United States (2) | |
| Subtype J | Republic of Angola (2), The Democratic Republic of the Congo (2), Central African Republic (3), Congo (1), Cameroon (7), Gabon (1), Spain (1), Senegal (1), Belgium (2) | Q61E (83%) |
| CRF08_BC | China (430), India (1) | T12S (87%) |
| CRF13_cpx | Belgium (2), Burkina Faso (1), Cameroon (22), Central African Republic (5), Germany (5), Greenland (1), Poland (1), | |
| CRF19_cpx | Cuba (172), Spain (4), Tunisia (1), United Kingdom (1) | H69Q (75%) |
| CRF35_A1D | Afghanistan (9), China (1), Iran (205), Romania (1) | L19Q (76%) |
| CRF49_cpx | Botswana (1), Gambia (3), Germany (4), Nigeria (1), Senegal (1) | D60N (90%), Q61D (90%), I66V (80%) |
| CRF51_01B | Singapore (8) | L63S (100%) |
| CRF60_BC | Brazil (1), Germany (2), Italy (22) | |
| CRF63_02A6 | Kyrgyzstan (2), | K14R (79%) |
| CRF89_BF1 | Argentina (1), Spain (20), | T12E (91%) |
In parentheses: number of sequences per variant or country, and conservation percentage for each marker. In bold red font: V-markers that are R-markers. None of the R-markers corresponded to major DRM to PI according to Stanford v9.0.
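The V-marker and R-marker definitions in the abstract translate into a short filter. A minimal sketch, assuming markers are called against a single reference consensus, that "variant-specific" means the change is found in only one variant (our reading of "single V-markers"), and that R-markers are matched on the exact mutation. The variant names, sequences, and DRM set are hypothetical placeholders, not the IAS-USA 2019 or Stanford v9.0 lists.

```python
from collections import Counter

MIN_SEQS = 5     # variants must have >5 sequences (from the text)
MIN_FREQ = 75.0  # change must be present in >75% of a variant's sequences (from the text)

def candidate_markers(seqs, reference):
    """Changes vs. `reference` whose dominant residue exceeds MIN_FREQ %.

    Returns e.g. {"Q61E": 83.3}; positions are 1-based."""
    if len(seqs) <= MIN_SEQS:
        return {}
    if any(len(s) != len(reference) for s in seqs):
        raise ValueError("sequences must be aligned to the reference")
    markers = {}
    for pos, (ref_aa, column) in enumerate(zip(reference, zip(*seqs)), start=1):
        aa, count = Counter(column).most_common(1)[0]
        freq = 100.0 * count / len(seqs)
        if aa != ref_aa and freq > MIN_FREQ:
            markers[f"{ref_aa}{pos}{aa}"] = freq
    return markers

def single_v_markers(variants, reference):
    """Keep changes found in exactly one variant (assumed 'variant-specific' rule)."""
    found = {name: candidate_markers(seqs, reference) for name, seqs in variants.items()}
    occurrences = Counter(m for markers in found.values() for m in markers)
    return {name: {m: f for m, f in markers.items() if occurrences[m] == 1}
            for name, markers in found.items()}

def r_markers(v_markers, drm_set):
    """V-markers whose exact mutation appears in a supplied DRM set."""
    return {m for m in v_markers if m in drm_set}

# Hypothetical variants and DRM set, for illustration only.
reference = "PQITL"
variants = {
    "X": ["PQVTL"] * 5 + ["PQITL"],  # I3V in 5/6 sequences, but also seen in W
    "W": ["PQVTM"] * 6,              # I3V and L5M in 6/6 sequences
    "Z": ["PEITL"] * 5,              # only 5 sequences: below the size cutoff
}
v = single_v_markers(variants, reference)
print(v)
print(r_markers(v["W"], {"L5M"}))
```

Variant X loses I3V because W shares it, and Z is skipped for having only five sequences.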
Single V-markers and R-markers in reverse transcriptase found across HIV-1 variants with >5 LANL sequences.
| HIV-1 variant | Countries | V-markers and R-markers (bold red) |
|---|---|---|
| Sub-subtype A6 | United Kingdom (4), Georgia (1), Italy (1), Russian Federation (79) | K11T (85%), E36D (76%) |
| Subtype C | Belgium (4), Brazil (12), Botswana (30), The Democratic Republic of the Congo (4), China (11), Spain (5), United Kingdom (14), Georgia (1), Equatorial Guinea (1), India (65), Kenya (4), Nigeria (1), Nepal (2), Pakistan (1), Rwanda (3), Sweden (9), Senegal (2), Thailand (1), United Republic of Tanzania (37), Uganda (7), United States (5), South Africa (933), Zambia (455) | T39E (78%) |
| Subtype D | The Democratic Republic of the Congo (2), United Kingdom (2), Kenya (1), South Korea (1), Nigeria (1), United Republic of Tanzania (1), Uganda (229), United States (1) | L282C (91%), P345Q (89%), T377Q (92%), S379C (82%) |
| Sub-subtype F1 | Germany (1), Spain (22), France (1), United Kingdom (14), Italy (2), United States (1) | D123E (88%), I178L (85%) |
| Subtype G | The Democratic Republic of the Congo (2), Cameroon (1), Spain (5), United Kingdom (2), Kenya (1), Nigeria (27), Russian Federation (1), South Africa (1) | M357R (93%), Q394R (90%), T400V (83%), F440Y (93%) |
| CRF01_AE | Afghanistan (1), China (547), Sweden (3), Thailand (221), United Kingdom (13), United States (3), Viet Nam (437) | V245E (91%) |
| CRF02_AG | Belgium (1), Benin (2), Cameroon (5), China (2), Spain (5), Gabon (1), United Kingdom (29), Equatorial Guinea (5), Italy (1), South Korea (1), Mexico (1), Nigeria (33), Pakistan (3), Russian Federation (1), Sweden (2), Senegal (9), Togo (7), Thailand (1), United States (1) | S162A (96%) |
| CRF06_cpx | Burkina Faso (3), China (1), United Kingdom (10), Nigeria (4), Senegal (1) | F346H (84%), R358K (89%) |
| CRF08_BC | China (129) | E53D (98%), D324E (86%) |
| CRF35_A1D | Afghanistan (9) | L283I (78%) |
| CRF55_01B | China (11) | |
| CRF89_BF | Spain (7) | Q394L (86%), E399D (100%) |
In parentheses: number of sequences per variant or country, and conservation percentage for each marker. In bold red font: V-markers that are R-markers. None of the R-markers corresponded to major DRM to NRTI or NNRTI according to Stanford v9.0.
Single V-markers and R-markers in integrase found across HIV-1 variants with >5 LANL sequences.
| HIV-1 variant | Countries | V-markers and R-markers (bold red) |
|---|---|---|
| Group N | Cameroon (11), France (1) | D55N (100%), V165I (92%), K215T (100%), T218L (100%), I220V (83%), D279G (100%) |
| Group O | Cameroon (30), Belgium (1), France (13), United States (3), Senegal (3) | D3E (90%), K7Q (100%), M22L (100%), N27G (100%), D41P (100%), Q44H (98%), L45I (88%), G59E (96%), Y83F (98%), G106A (100%), V126M (100%), Q137H (94%), S153A (96%), K160S (98%), G163Q (90%), I182V (90%), I204L (100%), D207Q (80%), K211T (100%), K240Q (100%), N254K (98%), C280S (86%), V281M (84%), D286T (92%), D288S (94%) |
| Subtype C | Cameroon (1), Ethiopia (2), Kenya (6), Malawi (1), Mozambique (1), Rwanda (3), Somalia (1), United Republic of Tanzania (57), Uganda (7), Zambia (471), Poland (1), Belgium (7), Denmark (1), Spain (5), Sweden (20), United States (2), Botswana (52), South Africa (623), Argentina (1), Brazil (7), Uruguay (1), Senegal (2), China (4), Cyprus (4), Georgia (1), India (43), Myanmar (1), Israel (5), Nepal (3), Pakistan (1), Saudi Arabia (14), Tajikistan (2), Thailand (1), Unknown (1), Yemen (1) | D25E (79%) |
| Subtype H | Belgium (3), Cameroon (1), Central African Republic (2), The Democratic Republic of the Congo (4), United Kingdom (1) | N222K (91%) |
| Subtype J | Republic of Angola (1), Belgium (3), Cameroon (1), The Democratic Republic of the Congo (3), Sweden (2) | Y99F (80%) |
| CRF03_A6B | Belarus (1), Russian Federation (3), Tajikistan (4) | |
| CRF06_cpx | Australia (1), Cameroon (2), Estonia (95), Ghana (1), Mali (2), Russian Federation (1), Senegal (1) | L63I (91%) |
| CRF07_BC | China (321), South Korea (1), Taiwan (1), Viet Nam (1) | I84M (91%) |
| CRF08_BC | China (36) | K211R (92%) |
| CRF22_01A1 | Cameroon (14) | A23V (100%) |
| CRF33_01B | Indonesia (2), Malaysia (4) | L63V (83%) |
| CRF35_A1D | Afghanistan (13) | I60M (100%), V126F (100%), G134S (77%) |
| CRF42_BF1 | Luxembourg (17) | L28I (100%), S39C (100%), G163E (100%) |
In parentheses: number of sequences per variant or country, and conservation percentage for each marker. In bold red font: V-markers that are R-markers. None of the R-markers corresponded to major DRM to INSTI according to Stanford v9.0.
FIGURE 7. Wu–Kabat variability coefficient plot of PR, RT, and IN group M sequences. (A) Wu–Kabat variability coefficient plot of PR (99 aa). (B) Wu–Kabat variability coefficient plot of RT (440 aa). (C) Wu–Kabat variability coefficient plot of IN (288 aa). X-axis, amino acid position; Y-axis, WK variability coefficient.
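The Wu–Kabat coefficient is commonly defined as the number of distinct residues k at a position divided by the relative frequency f_max of the most common residue, so a fully conserved site scores 1 and the value grows with variability. A minimal sketch of that definition, assuming gaps and ambiguous characters have already been removed; the toy alignment is hypothetical. The median over positions is the summary statistic the abstract reports for group M.

```python
from collections import Counter
from statistics import median

def wu_kabat(column):
    """WK = k / f_max: k distinct residues at the position, f_max the
    relative frequency of the most common residue. Conserved site -> 1."""
    counts = Counter(column)
    f_max = counts.most_common(1)[0][1] / len(column)
    return len(counts) / f_max

def wu_kabat_profile(seqs):
    """WK for every column of a pre-aligned set of sequences."""
    return [wu_kabat(column) for column in zip(*seqs)]

# Hypothetical toy alignment: column 1 fully conserved, column 2 four residues,
# column 3 one substitution in four sequences.
toy = ["AAA", "ABA", "ACA", "ADB"]
profile = wu_kabat_profile(toy)
print(profile, median(profile))
```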
FIGURE 8. Proportion of Wu–Kabat variability coefficient values in PR, RT, and IN residues. Each box represents the proportion of residues in each protein with a Wu–Kabat coefficient within the range indicated beneath the figure, colored accordingly. Protease (99 aa), reverse transcriptase (440 aa), integrase (288 aa). WK, Wu–Kabat variability coefficient.
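This figure reduces each per-residue WK profile to the share of residues falling in each value range. A minimal sketch of that binning; the bin edges and WK values are hypothetical placeholders, since the ranges used in the figure are not given in the text.

```python
import bisect

def wk_bin_proportions(wk_values, edges):
    """% of residues with WK in [edges[i], edges[i+1]); the last bin is open-ended.

    `edges` must be sorted ascending and start at or below the smallest value."""
    counts = [0] * len(edges)
    for value in wk_values:
        i = bisect.bisect_right(edges, value) - 1
        if i < 0:
            raise ValueError(f"WK value {value} is below the first edge")
        counts[i] += 1
    return [100.0 * c / len(wk_values) for c in counts]

# Hypothetical edges (bins 1 to <5, 5 to <10, >=10) and hypothetical WK values.
edges = [1, 5, 10]
print(wk_bin_proportions([1.0, 1.0, 4.0, 5.0, 16.0], edges))
```

Starting the first edge at 1 works because the Wu–Kabat coefficient cannot fall below 1.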