Marcel Tongo, Gordon W Harkins, Jeffrey R Dorfman, Erik Billings, Sodsai Tovanabutra, Tulio de Oliveira, Darren P Martin.
Abstract
Subtype A is one of the rare HIV-1 group M (HIV-1M) lineages that is both widely distributed throughout the world and persists at high frequencies in the Congo Basin (CB), the site where HIV-1M likely originated. This, together with its high degree of diversity, suggests that subtype A is amongst the fittest HIV-1M lineages. Here we use a comprehensive set of published near full-length subtype A sequences and A-derived genome fragments from both circulating and unique recombinant forms (CRFs/URFs) to obtain some insights into how frequently these lineages have independently seeded HIV-1M sub-epidemics in different parts of the world. We do this by inferring when and where the major subtype A lineages and subtype A-derived CRFs originated. We track the diversification and recombination history of subtype A sequences, following the origin of the subtype in the CB during the 1940s, before and during its dissemination throughout much of the world between the 1950s and 1970s. Collectively, the timings and numbers of detectable subtype A recombination and dissemination events, the present broad global distribution of the sub-epidemics that were seeded by these events, and the high prevalence of subtype A sequences within the regions where these sub-epidemics occurred, suggest that ancestral subtype A viruses (and particularly sub-subtype A1 ancestral viruses) may have been genetically predisposed to become major components of the present epidemic.
Keywords: HIV-1 subtype A; adaptation; diversity; evolution; phylogenetic analysis; recombination
Year: 2018 PMID: 29484203 PMCID: PMC5819727 DOI: 10.1093/ve/vey003
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
Figure 1. ML tree indicating the phylogenetic relationships between 357 HIV-1M genome sequences. These represent all published near full-length subtype A sequences that were available in the LANL database in June 2015 (LANL 2014), contiguous subtype A-derived genome segments from twenty-one CRFs with a total length of at least 1,000 nt, three highly divergent subtype A-like sequence fragments from two URFs and ninety-nine near full-length sequences representing the full spectrum of known diversity within subtypes B, C, F, G, H, J, and K. The tree is mid-point rooted and was constructed with 1,000 full ML bootstrap replicates using RAxML (Stamatakis 2014). Nodes with bootstrap ≥70% are indicated with black dots. Some clades have been condensed for the sake of clarity.
Figure 2. A time-scaled MCC tree indicating the most probable geographical locations of sequences that are ancestral to 306 subtype A sequences. These subtype A sequences include full-length subtype A genomes and contiguous A-attributed fragments within CRF and URF genomes. The colours of the branches correspond to the probable geographic locations of the ancestral viruses indicated by the branches, as indicated in the legend. The tree is temporally scaled such that distances between the concentric grey circles represent 10 years of evolution. Nodes with posterior probabilities >0.95 and >0.80 are indicated with black and open dots, respectively.
Calculated support for the time and origin of the lineages within subtype A.
| Group | tMRCA | tMRCA 95% HPD | Posterior prob. | Location | Location prob. | Time of the recombination event | Recombination time 95% HPD | Location of the recombination event | Recombination location prob. |
|---|---|---|---|---|---|---|---|---|---|
| A | 1946 | 1939–53 | 1 | CM/GA/CF | 0.53 | ||||
| A1 | 1951 | 1946–57 | 1 | CM/GA/CF | 1 | ||||
| Cluster I (A1Afr) | 1959 | 1953–64 | 1 | East Africa | 0.59 | ||||
| 35_AD | 1989 | 1986–92 | 1 | Central Asia | 1 | 1968–89 | 1965–92 | East Africa or Central Asia | 0.5 each |
| 50_A1D | 1988 | 1985–92 | 1 | West Europe | 0.99 | 1976–88 | 1973–92 | West Europe | 0.99 |
| Cluster II (A1Eur) | 1977 | 1972–81 | 0.97 | East Europe | 0.93 | ||||
| 03_AB | 1993 | 1992–5 | 0.99 | East Europe | 1 | 1992–3 | 1991–5 | East Europe | 1 |
| 32_06A1 | 1997 | 1995–9 | 1 | East Europe | 1 | 1984–97 | 1980–99 | East Europe | 0.99 |
| Cluster III (divergent lineages) | |||||||||
| 04_cpx | 1973 | 1968–78 | 1 | GE/CY | 0.99 | 1962–73 | 1956–78 | CM/GA/CF-GE/CY | 0.5 each |
| 06_cpx | 1974 | 1970–9 | 1 | West Africa | 0.93 | 1967–74 | 1961–79 | West Africa | 0.82 |
| 09_cpx | 1974 | 1969–78 | 1 | West Africa | 0.95 | 1963–74 | 1959–78 | West Africa | 0.61 |
| 02_AG_2 and 01_AE | 1969 | 1964–73 | 0.99 | CM/GA/CF | 1 | 1962–69 | 1956–73 | CM/GA/CF | 1 |
| 02_AG_1 and 37_cpx_2&3 | 1970 | 1966–74 | 0.81 | CM/GA/CF | 1 | 1963–70 | 1959–74 | CM/GA/CF | 1 |
| 13_cpx | 1972 | 1967–77 | 0.87 | CM/GA/CF | 1 | 1961–72 | 1956–77 | CM/GA/CF | 0.99 |
| 18_cpx | 1972 | 1966–79 | 1 | CM/GA/CF | 0.92 | 1957–72 | 1951–79 | CM/GA/CF | 0.94 |
| 19_cpx | 1974 | 1968–79 | 1 | Americas | 0.98 | 1965–74 | 1957–79 | CM/GA/CF-CUBA | 0.51 and 0.49 |
| 22_01A1 and 45_cpx_1 and 36_cpx | 1971 | 1966–76 | 0.95 | CM/GA/CF | 1 | 1963–71 | 1959–76 | CM/GA/CF | 0.87 |
| 37_cpx_1 | 1977 | 1971–84 | 0.83 | CM/GA/CF | 1 | 1957–77 | 1952–84 | CM/GA/CF | 1 |
| 45_cpx_2&3 and 11_cpx | 1966 | 1961–71 | 1 | CM/GA/CF | 1 | 1956–66 | 1951–71 | CM/GA/CF | 1 |
| 49_cpx | 1980 | 1974–86 | 1 | West Africa | 0.99 | 1967–80 | 1961–86 | West Africa | 0.95 |
| A3 | 1975 | 1969–81 | 1 | West Africa | 0.97 | ||||
| A2 | 1968 | 1964–73 | 1 | CM/GA/CF | 0.98 | ||||
| 16_A2D | 1973 | 1968–77 | 1 | CM/GA/CF | 0.96 | 1968–73 | 1964–77 | CM/GA/CF | 0.97 |
| 21_A2D | 1977 | 1973–81 | 1 | East Africa | 0.99 | 1970–7 | 1966–81 | CM/GA/CF-East Africa | 0.5 each |
| A4 | 1966 | 1958–74 | 1 | DRC | 0.99 | ||||
| A5 | 1973 | 1968–78 | 1 | DRC | 1 | 1956–73 | 1948–78 | DRC | 0.97 |
CM, Cameroon; GA, Gabon; CF, Central African Republic; GE, Greece; CY, Cyprus; DRC, Democratic Republic of Congo.
02_AG_1 = fragment 1 (HXB2 position 790–2155); 02_AG_2 = fragment 2 (HXB2 position 6225–8311); 37_cpx_1 = fragment 1 (HXB2 position 1126–2142); 37_cpx_2 = fragment 2 (HXB2 position 2913–4663); 37_cpx_3 = fragment 3 (HXB2 position 6431–7743); 45_cpx_1 = fragment 1 (HXB2 position 790–2397); 45_cpx_2 = fragment 2 (HXB2 position 4299–6041); 45_cpx_3 = fragment 3 (HXB2 position 6343–8236).
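The HPD and recombination-time intervals in the table above abbreviate the end year (for example, 1939–53 or 1992–5). For readers who want to work with these intervals numerically, the following is a minimal sketch of how such ranges could be expanded into full years. The function name `expand_year_range` is hypothetical and is not part of the original study.

```python
from typing import Tuple

EN_DASH = "\u2013"  # the table separates years with an en dash


def expand_year_range(text: str) -> Tuple[int, int]:
    """Expand an abbreviated range such as '1939–53' into (1939, 1953).

    Hypothetical helper: assumes the end year omits only the leading
    digits that it shares with the start year.
    """
    # Accept a plain hyphen as well, in case the dash was normalised.
    start_text, end_text = text.strip().replace("-", EN_DASH).split(EN_DASH)
    start = int(start_text)
    # Number of leading digits the end year borrows from the start year.
    missing = max(len(start_text) - len(end_text), 0)
    end = int(start_text[:missing] + end_text)
    if end < start:
        raise ValueError(f"end year precedes start year in {text!r}")
    return start, end


if __name__ == "__main__":
    for interval in ["1939–53", "1992–5", "1961–86"]:
        print(interval, "->", expand_year_range(interval))
```

Borrowing the missing leading digits from the start year keeps the helper independent of the century, at the cost of assuming that no interval crosses a boundary the abbreviation cannot express.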
Figure 3. Patterns of natural selection acting at sub-subtype A1-derived gag codon sites in CRFs 01_AE (A) and 02_AG (B) compared to those acting on homologous codon sites in sub-subtype A1 genomes that were sampled in Africa (A1Afr). Codon site locations that are indicated are specific for the alignment used to analyse all these sequences. At each site, absolute values of the inferred non-synonymous substitution rate minus the synonymous substitution rate (dN − dS) are plotted (as determined by the FUBAR method). Significantly positive dN − dS values, which are indicative of positive selection, are indicated in red, whereas significantly negative dN − dS values, which are indicative of negative selection, are plotted in blue if negative selection for the same amino acid state is found in the two compared lineages. They are plotted in green if negative selection is detected in both compared lineages, but different amino acids are being selected, and in grey if negative selection is only detectable in one of the compared lineages. Overall dN/dS ratios <1 indicate that, as expected, the gag gene in all three clades is evolving under predominantly negative selection. Black arrows indicate clusters of codon sites under negative selection for different amino acids in the two compared lineages.
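The colour scheme described in the Figure 3 caption amounts to a small decision rule per codon site. The sketch below restates that rule in code as an illustration only. The function `site_colour`, its arguments, and the string labels for the selection states are all assumptions made for this example, and it does not reproduce the FUBAR analysis itself, only the colouring of its per-site results.

```python
from typing import Optional


def site_colour(focal: str, other: str, focal_aa: str, other_aa: str) -> Optional[str]:
    """Return the plotting colour for one codon site in the focal lineage.

    Hypothetical inputs:
      focal, other       -- 'positive', 'negative' or 'neutral', i.e. whether
                            dN - dS is significantly above or below zero, or neither
      focal_aa, other_aa -- amino acid state favoured in each lineage
    """
    if focal == "positive":
        return "red"  # significantly positive dN - dS
    if focal == "negative":
        if other != "negative":
            return "grey"  # negative selection detectable in one lineage only
        # Negative selection in both lineages: compare the favoured residues.
        return "blue" if focal_aa == other_aa else "green"
    return None  # no significant selection, so no colour is assigned


if __name__ == "__main__":
    print(site_colour("negative", "negative", "K", "K"))
    print(site_colour("negative", "negative", "K", "R"))
    print(site_colour("negative", "neutral", "K", "K"))
```

Under this reading, the green sites are the ones the caption highlights with black arrows: both lineages are constrained at the site, but towards different amino acids.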
Amino acid site variation among A1Afr, A1Eur, CRFs 01_AE, 02_AG, and 22_01A1.
| Site | A1Afr | A1Eur | 01_AE | 02_AG | 22_01A1 |
|---|---|---|---|---|---|
| Gag-165 | |||||
| Gag-267 | |||||
| Gag-384 | |||||
| Gag-406 | |||||
| Pol-484 | |||||
| Pol-559 | |||||
| Pol-828 | |||||
| Env-514 | |||||
a Sites are numbered from the beginning of each gene and according to the reference strain HIV-1/HXB2.
b The amino acid found in the majority (>80%) of analysed sequences is in bold, while minor amino acids are in brackets.
c Amino acids found in approximately the same proportion of analysed sequences are in bold. Grey shaded boxes indicate sites under negative selection.