John M Murray¹, Stephen Maher¹,², Talia Mota³, Kazuo Suzuki⁴, Anthony D Kelleher⁴, Rob J Center³, Damian Purcell³
Abstract
Significant progress has been made in characterizing broadly neutralizing antibodies against the HIV envelope glycoprotein Env, but an effective vaccine has proven elusive. Vaccine development would be facilitated if common features of early founder virus required for transmission could be identified. Here we employ a combination of bioinformatic and operations research methods to determine the most prevalent features that distinguish 78 subtype B and 55 subtype C founder Env sequences from an equal number of chronic sequences. There were a number of equivalent optimal networks (based on the fewest covarying amino acid (AA) pairs or a measure of maximal covariance) that separated founders from chronics: 13 for subtype B and 75 for subtype C. Every subtype B optimal solution contained the founder pairs 178-346 Asn-Val, 232-236 Thr-Ser, 240-340 Lys-Lys, 279-315 Asp-Lys, 291-792 Ala-Ile, 322-347 Asp-Thr, 535-620 Leu-Asp, 742-837 Arg-Phe, and 750-836 Asp-Ile; the most common optimal pairs for subtype C were 644-781 Lys-Ala (74 of 75 networks), 133-287 Ala-Gln (73/75) and 307-337 Ile-Gln (73/75). No pair was present in all optimal subtype C solutions, highlighting the difficulty of targeting transmission with a single vaccine strain. Relative to the size of its domain (0.35% of Env), the α4β7 binding site occurred most frequently among optimal pairs, especially for subtype C: 4.2% of optimal pairs (1.2% for subtype B). Early sequences from 5 subtype B pre-seroconverters each exhibited at least one clone containing an optimal feature: 553-624 (Ser-Asn), 724-747 (Arg-Arg), or 46-293 (Arg-Glu).
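The core optimization can be pictured as a set-cover problem. The sketch below assumes one plausible formulation, which the abstract does not spell out: a candidate feature is an AA combination at a covarying pair that occurs in at least one founder and in no chronic sequence, and an optimal network is a smallest set of features such that every founder carries at least one of them. It enumerates candidates by brute force on hypothetical toy sequences (`founders`, `chronics`, `covarying_pairs` and all function names are illustrative), so it also returns every equivalent optimum. It is not the authors' operations research model; at real scale an integer-programming solver would replace the enumeration.

```python
from itertools import combinations

# Hypothetical toy alignment: each string is one Env sequence, and positions are
# 0-based indices into it (real Env analyses use a reference numbering instead).
founders = {"F1": "NVKT", "F2": "NVQS", "F3": "DAKS"}
chronics = {"C1": "DAKT", "C2": "NAQT"}
covarying_pairs = [(0, 1), (1, 3), (2, 3)]  # assumed output of a covariance step


def features_of(seq, pairs):
    """Return the (pos1, pos2, aa1, aa2) combinations that a sequence carries."""
    return {(i, j, seq[i], seq[j]) for i, j in pairs}


def candidate_features(founders, chronics, pairs):
    """Map each founder-only AA combination to the set of founders carrying it."""
    chronic_feats = set()
    for seq in chronics.values():
        chronic_feats |= features_of(seq, pairs)
    cands = {}
    for name, seq in founders.items():
        for feat in features_of(seq, pairs):
            if feat not in chronic_feats:  # assumed rule: absent from all chronics
                cands.setdefault(feat, set()).add(name)
    return cands


def optimal_networks(founders, chronics, pairs):
    """Enumerate all smallest feature sets that cover every founder."""
    cands = candidate_features(founders, chronics, pairs)
    feats = sorted(cands)
    everyone = set(founders)
    for k in range(1, len(feats) + 1):
        sols = [combo for combo in combinations(feats, k)
                if set().union(*(cands[f] for f in combo)) == everyone]
        if sols:  # first size with any cover is the minimum
            return sols
    return []


solutions = optimal_networks(founders, chronics, covarying_pairs)
for sol in solutions:
    print(sol)
```

On this toy input two equally small networks exist and both contain `(0, 1, 'N', 'V')`, which mirrors the way certain founder pairs appeared in every subtype B optimum.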
Year: 2017 PMID: 28187204 PMCID: PMC5302377 DOI: 10.1371/journal.pone.0171572
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Unrooted phylogenetic trees for the subtype B and C HIV Env sequences.
Founder sequences are shown with red dashed branches, while chronic sequences are denoted by blue branches.
Fig 2. Conserved and covarying regions for subtypes B and C Env.
Regions are coloured as conserved across both subtypes (black), conserved within each subtype (dark blue), conserved except in at most 2 individuals of that subtype (light blue), or covarying (magenta). Covarying pairs whose covariance is at least 20% of the maximum value for subtype C, or at least 12.5% of the maximum for subtype B, are connected by magenta lines; these thresholds were chosen so that approximately the same number of covarying pairs is shown for each subtype (78 for subtype B and 79 for subtype C). The signal peptide, α4β7 binding site, constant (C1-C6) and variable (V1-V5) regions within gp120, and the gp41 domain are mapped onto the Env sequence.
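A sketch of the thresholding described in this caption, assuming each covarying pair already has a non-negative covariance score (the scoring itself is not reproduced here; `cov_scores` and the helper names are hypothetical). A pair is kept if its score is at least a chosen fraction of the maximum, and the fraction can be tuned per subtype so that each subtype retains a similar number of pairs.

```python
# Hypothetical covariance scores keyed by residue pair; values are illustrative.
cov_scores = {(1, 2): 10.0, (3, 4): 5.0, (5, 6): 2.5, (7, 8): 1.0}


def pairs_above(cov, frac):
    """Pairs whose score is at least `frac` times the maximum score."""
    cmax = max(cov.values())
    return {pair for pair, value in cov.items() if value >= frac * cmax}


def fraction_for_target(cov, target, fracs):
    """Candidate fraction whose retained-pair count is closest to `target`."""
    return min(fracs, key=lambda f: abs(len(pairs_above(cov, f)) - target))


print(len(pairs_above(cov_scores, 0.2)))
print(fraction_for_target(cov_scores, 2, [0.125, 0.2, 0.5]))
```

Choosing the fraction by target count, rather than fixing one fraction for both subtypes, is what allows different cut-offs (20% versus 12.5%) to yield comparable numbers of displayed pairs.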
Pairs observed multiple times (frequency f) in optimal networks for each subtype.
The number of individuals exhibiting each AA combination is denoted by n.
| B pos. 1 | B pos. 2 | B f | B n | B AA | C pos. 1 | C pos. 2 | C f | C n | C AA |
|---|---|---|---|---|---|---|---|---|---|
| 278 | 620 | 4 | 15 | SD | 170 | 192 | 3 | 5 | HR |
| 750 | 836 | 4 | 14 | DI | 588 | 662 | 3 | 6 | RA |
| 230 | 232 | 3 | 3 | DQ | 7 | 10 | 2 | 7 | QY |
| 232 | 236 | 3 | 10 | TS | 161 | 192 | 2 | 4 | AI |
| 535 | 620 | 3 | 9 | LD | 179 | 674 | 2 | 4 | PN |
| 151 | 178 | 2 | 3 | GN | 192 | 343 | 2 | 5 | IQ |
| 240 | 340 | 2 | 8 | KK | 295 | 334 | 2 | 7 | EN |
| 283 | 621 | 2 | 3 | IE | 344 | 346 | 2 | 7 | KG |
| 291 | 792 | 2 | 4 | AI | 352 | 379 | 2 | 8 | YG |
| 293 | 337 | 2 | 5, 3 | VD, QK | 393 | 727 | 2 | 4 | DP |
| 319 | 836 | 2 | 6 | TT | 417 | 770 | 2 | 8 | |
| 336 | 845 | 2 | 9 | ET | 448 | 727 | 2 | 6 | SL |
| 347 | 543 | 2 | 10 | TL | 721 | 727 | 2 | 7 | IL |
| 440 | 620 | 2 | 5 | KD | | | | | |
| 624 | 747 | 2 | 8 | ER | | | | | |
| 724 | 758 | 2 | 10 | RD | | | | | |
| 724 | 837 | 2 | 11 | RF | | | | | |
| 747 | 758 | 2 | 3 | QD | | | | | |
| 818 | 840 | 2 | 11 | IF | | | | | |
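The two count columns can be tallied as follows. This is a minimal sketch under stated assumptions: each optimal network is a set of position pairs, f counts how many networks contain a pair, and n counts founder individuals whose residues at the pair match the listed AA combination. The networks, sequences, position numbers and helper name `n_individuals` are hypothetical placeholders, not values from the table.

```python
from collections import Counter

# Hypothetical optimal networks: each is a set of covarying position pairs.
networks = [
    {(10, 20), (30, 40)},
    {(10, 20), (50, 60)},
    {(10, 20), (30, 40), (70, 80)},
]
# Hypothetical founder residues keyed by individual, then by position.
founder_seqs = {
    "P1": {10: "S", 20: "D", 30: "D", 40: "I"},
    "P2": {10: "S", 20: "D", 30: "E", 40: "I"},
    "P3": {10: "T", 20: "D", 30: "D", 40: "I"},
}

# f: number of optimal networks that contain each pair.
f = Counter(pair for net in networks for pair in net)
repeated = {pair: count for pair, count in f.items() if count >= 2}


def n_individuals(seqs, pair, aa_combo):
    """n: individuals whose residues at `pair` equal the AA combination."""
    i, j = pair
    return sum(1 for s in seqs.values() if (s.get(i), s.get(j)) == aa_combo)


print(sorted(repeated.items(), key=lambda kv: -kv[1]))
print(n_individuals(founder_seqs, (10, 20), ("S", "D")))
```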
Single AA positions appearing at least twice (frequency f) in optimal networks determined on founders.
The region of Env is denoted ahead of a decimal point, and any recognised motif after the decimal.
| B position | B domain | B f | C position | C domain | C f |
|---|---|---|---|---|---|
| 836 | 41CT.LLP-1 α helix | 10 | 192 | V2 | 8 |
| 620 | 41ED.HR2 | 9 | 346 | C3 | 7 |
| 232 | C2.NGS | 6 | 727 | 41CT.KenEpi | 6 |
| 347 | C3 | 6 | 624 | 41ED | 5 |
| 336 | C3 | 5 | 7 | Sig | 4 |
| 535 | 41ED.aHR1 | 5 | 161 | V2 | 4 |
| 724 | 41CT | 5 | 179 | V2.α4β7 | 4 |
| 747 | 41CT | 5 | 295 | V3.NGS(2G12) | 4 |
| 750 | 41CT.NGS | 5 | 588 | 41ED.HR1 | 4 |
| 178 | V2.α4β7 | 4 | 10 | Sig | 3 |
| 230 | C2 | 4 | 170 | V2 | 3 |
| 278 | C2 | 4 | 344 | C3 | 3 |
| 291 | C2.NGS | 4 | 350 | C3 | 3 |
| 621 | 41ED | 4 | 393 | V4 | 3 |
| 624 | 41ED | 4 | 662 | 41ED | 3 |
| 758 | 41CT | 4 | 674 | 41ED | 3 |
| 92 | C1.120•41 | 3 | 721 | 41CT | 3 |
| 151 | V1 | 3 | 832 | 41CT.LLP-1 α helix | 3 |
| 236 | C2.NGS | 3 | 27 | Sig | 2 |
| 240 | C2.NGS | 3 | 29 | Sig | 2 |
| 283 | C2 | 3 | 172 | V2 | 2 |
| 293 | C2 | 3 | 181 | V2.α4β7 | 2 |
| 319 | V3.R5/X4bs | 3 | 334 | C3 | 2 |
| 354 | C3 | 3 | 337 | C3 | 2 |
| 543 | 41ED | 3 | 343 | C3 | 2 |
| 837 | 41CT.LLP-1 α helix | 3 | 352 | C3 | 2 |
| 24 | Sig | 2 | 379 | C3 | 2 |
| 181 | V2.α4β7 | 2 | 417 | C4 | 2 |
| 335 | C3 | 2 | 440 | C4 | 2 |
| 337 | C3 | 2 | 448 | C4 | 2 |
| 340 | C2.NGS | 2 | 496 | C5 | 2 |
| 440 | C4 | 2 | 619 | 41ED | 2 |
| 444 | C4 | 2 | 621 | 41ED | 2 |
| 553 | 41ED | 2 | 770 | 41CT.LLP-2 α helix | 2 |
| 640 | 41ED | 2 | 833 | 41CT.LLP-1 α helix | 2 |
| 792 | 41CT.LLP-3 α helix | 2 | | | |
| 818 | 41CT | 2 | | | |
| 833 | 41CT.LLP-1 α helix | 2 | | | |
| 840 | 41CT.LLP-1 α helix | 2 | | | |
| 845 | 41CT.LLP-1 α helix | 2 | | | |
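The label convention (region before the decimal point, motif after it) can be parsed mechanically. The sketch below splits on the first `.` only, so motifs such as `LLP-1 α helix` stay intact; `parse_label` is a hypothetical helper, while the example labels are copied from the subtype B column above.

```python
from collections import Counter


def parse_label(label):
    """Split 'REGION.MOTIF' into (region, motif); motif is None when absent."""
    region, sep, motif = label.partition(".")  # split on the first dot only
    return region, (motif if sep else None)


# Position/domain labels taken from the subtype B column of the table.
labels = {836: "41CT.LLP-1 α helix", 620: "41ED.HR2", 232: "C2.NGS", 347: "C3"}
parsed = {pos: parse_label(lab) for pos, lab in labels.items()}
by_region = Counter(region for region, _ in parsed.values())

print(parsed[836])
print(parsed[347])
print(by_region)
```

Grouping by the region part gives a quick per-domain tally, for example to compare how many frequent positions fall in the gp41 cytoplasmic tail (41CT) versus the gp120 constant regions.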
Pairs observed in a given number of optimal solutions for subtypes B and C.
For each optimal separating pair, the table lists the covarying positions, the amino acids found in the founder sequences at those positions, and the Env motif at each position. The optimization problem was solved using objective i) (minimizing the number of pairs) on a covariance network constructed from all sequences of each subtype.
| Covarying pair | Founder AAs | Motif at first position | Motif at second position |
|---|---|---|---|
| 178–346 | Asn Val | V2.α4β7 | C3 |
| 232–236 | Thr Ser | C2.NGS | C2.NGS |
| 240–340 | Lys Lys | C2.aNGS | C3 |
| 279–315 | Asp Lys | C2.CD4bs | V3.R5/X4bs |
| 291–792 | Ala Ile | C2.NGS | 41CT.LLP-3 α helix |
| 322–347 | Asp Thr | V3.R5/X4bs | C3 |
| 535–620 | Leu Asp | 41ED.aHR1 | 41ED.HR2 |
| 742–837 | Arg Phe | 41CT.KenEpi | 41CT.LLP-1 α helix |
| 750–836 | Asp Ile | 41CT.NGS | 41CT.LLP-1 α helix |
| 92–346 | Lys Val | C1.120•41 | C3 |
| 588–836 | Lys Thr | 41ED.HR1 | 41CT.LLP-1 α helix |
| 644–781 | Lys Ala | 41ED.HR2 | 41CT.LLP-2 α helix |
| 133–287 | Ala Gln | V1 hvr | C2.nCD4bs |
| 307–337 | Ile Gln | V3.R5/X4bs | C3 |
| 10–346 | Tyr Gly | Sig | C3 |
| 132–841 | Ser Leu | V1 hvr | 41CT.LLP-1 α helix |
| 295–322 | Glu Asp | V3.NGS(2G12) | V3.R5/X4bs |
| 721–727 | Ile Leu | 41CT | 41CT.KenEpi |
| 778–779 | Val Val | 41CT.LLP-2 α helix | 41CT.LLP-2 α helix |
| 779–833 | Val Val | 41CT.LLP-2 α helix | 41CT.LLP-1 α helix |
The region of Env is denoted ahead of the decimal point, and any recognised motif after it. The two positions of each covarying pair are separated by a dash. Sig = Env signal peptide; C1 = constant domain 1; C2 = constant domain 2; C3 = constant domain 3; V1 = variable domain 1; V2 = variable domain 2; V3 = variable domain 3; 41ED = gp41 ectodomain external to the membrane; 41CT = gp41 cytoplasmic tail internal to the membrane; α4β7 = alpha-4-beta-7 integrin binding site; NGS = N-linked glycosylation site; aNGS = amino acid adjacent to an NGS; nNGS = near to an NGS; CD4bs = residues mapped to contact the CD4 binding site; R5/X4bs = residues mapped to contact the R5 or X4 coreceptor; V1 hvr = variable region 1 hypervariable region; HR1 = heptad repeat 1; aHR1 = adjacent to HR1; HR2 = heptad repeat 2, which contains the T20 drug site; 120•41 = contact residues between gp120 and gp41; KenEpi = Kennedy epitope, a highly immunogenic epitope [40]; LLP-1 α helix = lentiviral lytic peptide 1 alpha helix; LLP-2 α helix = lentiviral lytic peptide 2 alpha helix; LLP-3 α helix = lentiviral lytic peptide 3 alpha helix.
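Whether a pair appears in every optimal solution, or in k of N of them (as with 644–781 in 74 of 75 subtype C networks), reduces to a set intersection and a count. The networks below are hypothetical toy sets chosen so that one subtype shares a pair across all optima and the other shares none; the function names are illustrative.

```python
from collections import Counter

# Hypothetical optimal networks for two subtypes (sets of position pairs).
subtype_x = [{(1, 5), (2, 7)}, {(1, 5), (3, 9)}, {(1, 5), (2, 7), (4, 8)}]
subtype_y = [{(1, 5), (2, 7)}, {(2, 7), (3, 9)}, {(1, 5), (3, 9)}]


def support(networks):
    """Count, for each pair, how many optimal networks contain it."""
    return Counter(pair for net in networks for pair in net)


def in_every_network(networks):
    """Pairs shared by all optimal networks (empty set if none are shared)."""
    return set.intersection(*networks) if networks else set()


for name, nets in (("X", subtype_x), ("Y", subtype_y)):
    counts = support(nets)
    shared = in_every_network(nets)
    print(name, "shared by all:", shared, "| per-pair counts:", dict(counts))
```

Subtype X behaves like the subtype B result (one pair in every optimum), while subtype Y behaves like subtype C, where every pair is missing from at least one optimal network.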
Comparison of optimal founder pairs with the AA combinations observed in each sequenced clone from 5 subtype B pre-seroconverters.
| Founder pairs | Env motifs connected | Patient: PSC35 | PSC89 | PSC24 | PSC73 | PSC182 | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clone: 5 | 10 | 51 | 948 | 955 | 911 | 912 | 913 | 914 | 915 | 928 | 949 | |||
| 343–621 | QD | aNGS–aHR2 | DD | HE | QE | DD | ED | QE | EY | IE | QE | HQ | KQ | |
| 354–636 | PD | aNGS–HR2 | PS | PS | PS | PN | PS | PN | PS | PS | PN | NS | GN | |
| 553–624 | SN | HR1–NGS | SD | S- | SE | SD | NE | SD | SG | SD | SG | NN | ||
| 624–747 | ER | NGS–aNGS | NR | DR | -R | DR | DR | GR | NR | DR | GR | NR | ||
| 724–747 | RR | KenEpi–aNGS | PR | PR | PR | PR | PR | PR | PR | PR | PR | QR | PR | |
| 46–293 | RE | RR | RQ | KE | KE | RK | RK | KE | KE | KA | KE | |||
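Checking pre-seroconverter clones against a founder feature amounts to asking whether any clone of a patient carries the founder AA combination at that pair. The patient labels and clone data below are hypothetical and do not reproduce the table values; `-` marks a gap as in the table, and both helper names are illustrative.

```python
# Hypothetical per-patient clone data: the AA pair observed at one founder
# pair's two positions in each sequenced clone ("-" marks a gap).
clones = {
    "PT1": ["SD", "SN", "SE"],
    "PT2": ["NE", "SD"],
    "PT3": ["S-", "SN"],
}


def patients_with_feature(clones, feature):
    """Patients with at least one clone carrying the founder AA combination."""
    return {pt for pt, combos in clones.items() if feature in combos}


def fraction_of_clones(combos, feature):
    """Share of a patient's clones that carry the founder AA combination."""
    return sum(c == feature for c in combos) / len(combos)


print(sorted(patients_with_feature(clones, "SN")))
print(fraction_of_clones(clones["PT1"], "SN"))
```

The "at least one clone" criterion matches the wording of the abstract; the per-patient fraction is an extra summary that shows how dominant the feature is within each patient's clones.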
Fig 3. The collection of optimal pairs displayed relative to the domains of Env.
(A) On a linear representation of Env and (B) as networks.