Soo-Yon Rhee, Robert W Shafer.
Abstract
Accurate classification of HIV-1 group M lineages, henceforth referred to as subtyping, is essential for understanding global HIV-1 molecular epidemiology. Because most HIV-1 sequencing is performed on the pol gene for genotypic resistance testing, we sought to develop a set of geographically-stratified pol sequences that represent HIV-1 group M sequence diversity. Representative pol sequences differ from representative complete genome sequences because not all CRFs have pol recombination points and because complete genome sequences may not faithfully reflect HIV-1 pol diversity. We developed a software pipeline that compiled 6,034 one-per-person complete HIV-1 pol sequences, annotated by country and year, belonging to 11 pure subtypes and 70 CRFs. For each subtype/CRF and country, the pipeline selected a set of sequences whose average distance to the remaining sequences is minimized, generating a Geographically-Stratified set of 716 Pol Subtype/CRF (GSPS) reference sequences. We provide extensive data on pol diversity within each subtype/CRF and country combination. The GSPS reference set will also be useful for HIV-1 pol subtyping.
Year: 2018 PMID: 30063225 PMCID: PMC6067049 DOI: 10.1038/sdata.2018.148
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Schematic overview of the process of creating a set of Geographically-Stratified Pol Subtype/CRF (GSPS) reference sequences.
Proportions of each HIV-1 group M subtype/CRF in the set of 6,763 one-per-person complete pol sequences in GenBank compared with the reported global distribution of each subtype/CRF.
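| Subtype/CRF | Global distribution, % (a) | Complete pol sequences in GenBank, % (b) |
|---|---|---|
| C | 48 | 18.0 |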
| A | 12 | 4.6 |
| B | 11 | 32.2 |
| CRF02_AG | 8 | 1.9 |
| CRF01_AE | 5 | 17.9 |
| G | 5 | 1.1 |
| D | 2 | 1.3 |
| F+H+J+K | 1 | 1.4 |
| Other CRFs | 4 | 11.2 |
| URFs | 4 | 10.5 |
(a) Global distribution of HIV-1 obtained from the WHO-UNAIDS Network for HIV Isolation and Characterisation[22]. The data were collected between 2000 and 2007 from researchers and literature review.
(b) Proportion of complete pol sequences in GenBank (one per person) as of June 15, 2017.
List of the Subtype/CRFs, Countries, and Intra-Subtype/CRF Divergence of the Complete Set of 6,034 Individuated Pol Sequences with Subtype Assignments and the Subset of 716 Representative Geographically-Stratified Pol Subtype (GSPS) Sequences.
Note: Pairwise distances (PWD) were calculated using the TN93 substitution model.
Abbreviations: PWD, pairwise distance; ADCL, average distance to the closest leaf; NA, not applicable.

| Subtype/CRF | Complete set (n=6,034): # Seqs | Complete set: Mean PWD | Complete set: # Countries | Complete set: Countries (# sequences per country) | Ancestor CRFs identified | GSPS (n=716): # Seqs | GSPS: Countries (# sequences per country) | Overall ADCL |
|---|---|---|---|---|---|---|---|---|
| A1 | 306 | 0.071 | 24 | KE(84), CY(32), TZ(30), UG(24), UA(21), CM(19), PK(19), RU(19), RW(11), UZ(8), IN(6), KZ(6), ZA(5), CD(4), SE(4), ES(3), SN(3), NG(2), AU(1), BY(1), GE(1), IT(1), SO(1), US(1) | CRF11_cpx, CRF22_01A1, CRF35_AD, CRF50_A1D | 78 | KE(20), TZ(8), UG(8), CM(6), CD(4), RW(4), SE(4), CY(3), ZA(3), NG(2), PK(2), SN(2), AU(1), BY(1), ES(1), GE(1), IN(1), IT(1), KZ(1), RU(1), SO(1), UA(1), US(1), UZ(1) | 0.035 |
| A2 | 4 | 0.050 | 3 | CD(2), CM(1), CY(1) | CRF16_A2D, CRF21_A2D | 4 | CD(2), CM(1), CY(1) | NA |
| B | 2175 | 0.062 | 42 | US(751), JP(389), CN(261), BR(229), KR(112), CY(83), ES(58), AU(57), DE(36), TH(32), DK(17), AR(14), FR(13), PE(13), GB(10), JM(8), SE(8), RU(7), CA(6), CO(6), ZA(6), CU(5), HT(5), TT(5), UY(5), DO(4), GE(3), HK(3), MM(3), NL(3), PH(3), PY(3), UA(3), YE(3), CH(2), EC(2), TW(2), BO(1), GA(1), IN(1), MX(1), PL(1) | CRF03_AB, CRF07_BC, CRF20_BG, CRF23_BG, CRF24_BG, CRF28_BF, CRF38_BF1, CRF39_BF, CRF42_BF, CRF51_01B, CRF54_01B, CRF67_01B, CRF68_01B, CRF69_01B, CRF87_cpx | 176 | BR(22), ES(14), AR(13), PE(13), DE(12), JM(8), CO(6), DK(6), RU(6), CY(5), HT(5), SE(5), TH(5), CU(4), DO(4), ZA(4), CN(3), GE(3), JP(3), PY(3), TT(3), CH(2), EC(2), GB(2), PH(2), TW(2), US(2), UY(2), YE(2), AU(1), BO(1), CA(1), FR(1), GA(1), HK(1), IN(1), KR(1), MM(1), MX(1), NL(1), PL(1), UA(1) | 0.031 |
| C | 1215 | 0.060 | 30 | ZM(515), ZA(362), TZ(76), BW(54), CN(34), IN(33), MW(29), ET(26), BR(19), CY(12), SE(10), KE(9), NP(8), ES(5), US(5), CD(2), MM(2), NG(2), AO(1), AR(1), DE(1), DK(1), GE(1), IL(1), JP(1), PK(1), SN(1), SO(1), UY(1), YE(1) | CRF07_BC, CRF08_BC, CRF31_BD, CRF60_BC, CRF61_BC, CRF62_BC, CRF64_BC, CRF65_cpx, CRF77_cpx, CRF82_cpx, CRF83_cpx, CRF85_BC, CRF86_BC | 126 | MW(15), ET(14), SE(10), TZ(10), ZA(9), KE(8), NP(8), CN(7), BW(5), US(5), CY(4), ZM(4), BR(3), ES(3), IN(3), CD(2), MM(2), NG(2), AO(1), AR(1), DE(1), DK(1), GE(1), IL(1), JP(1), PK(1), SN(1), SO(1), UY(1), YE(1) | 0.039 |
| D | 86 | 0.063 | 13 | UG(45), KE(8), TZ(8), CD(7), CM(5), BR(3), TD(2), YE(2), ZA(2), CY(1), DK(1), KR(1), US(1) | CRF10_CD, CRF19_cpx, CRF21_A2D | 33 | KE(8), UG(7), CD(4), TZ(3), CM(2), YE(2), BR(1), CY(1), DK(1), KR(1), TD(1), US(1), ZA(1) | 0.037 |
| F1 | 66 | 0.055 | 10 | BR(30), ES(22), AO(3), CD(2), FR(2), JP(2), RO(2), AR(1), FI(1), RU(1) | CRF12_BF, CRF17_BF, CRF29_BF, CRF38_BF1, CRF44_BF | 16 | BR(3), AO(2), CD(2), FR(2), JP(2), AR(1), ES(1), FI(1), RO(1), RU(1) | 0.025 |
| F2 | 9 | 0.054 | 1 | CM(9) | | 2 | CM(2) | 0.039 |
| G | 77 | 0.063 | 13 | NG(26), ES(19), CM(15), CD(3), CU(3), PT(3), KE(2), CN(1), GH(1), GW(1), IT(1), SE(1), ZA(1) | CRF20_BG, CRF23_BG, CRF24_BG, CRF43_02G, CRF73_BG | 32 | CM(10), NG(6), CD(3), ES(3), KE(2), CN(1), CU(1), GH(1), GW(1), IT(1), PT(1), SE(1), ZA(1) | 0.033 |
| H | 10 | 0.061 | 5 | CD(4), BE(2), CF(2), GB(1), US(1) | | 10 | CD(4), BE(2), CF(2), GB(1), US(1) | NA |
| J | 6 | 0.058 | 3 | CD(3), SE(2), AO(1) | | 5 | CD(3), AO(1), SE(1) | 0.023 |
| K | 2 | 0.046 | 2 | CD(1), CM(1) | | 2 | CD(1), CM(1) | NA |
| 01_AE | 1210 | 0.040 | 15 | CN(583), VN(384), TH(216), US(6), JP(4), CF(3), MM(3), SE(3), TW(2), AF(1), CM(1), HK(1), ID(1), IR(1), PH(1) | CRF33_01B, CRF34_01B, CRF52_01B, CRF53_01B | 24 | TH(4), CF(3), SE(3), MM(2), TW(2), AF(1), CM(1), CN(1), HK(1), ID(1), IR(1), JP(1), PH(1), US(1), VN(1) | 0.022 |
| 02_AG | 130 | 0.055 | 20 | CM(50), NG(23), GH(10), CY(8), PK(6), GW(5), FR(4), SE(4), SN(3), US(3), EC(2), ES(2), KR(2), RU(2), AO(1), DE(1), EE(1), MX(1), NE(1), TH(1) | CRF43_02G | 58 | NG(18), GH(5), GW(5), SE(4), CM(3), CY(3), SN(3), US(3), ES(2), KR(2), AO(1), DE(1), EC(1), EE(1), FR(1), MX(1), NE(1), PK(1), RU(1), TH(1) | 0.037 |
| 03_AB | 3 | 0.011 | 2 | RU(2), BY(1) | | 2 | BY(1), RU(1) | 0.010 |
| 04_cpx | 8 | 0.047 | 2 | GR(6), CY(2) | | 3 | GR(2), CY(1) | 0.036 |
| 05_DF | 3 | 0.047 | 3 | BE(1), CD(1), ES(1) | | 3 | BE(1), CD(1), ES(1) | NA |
| 06_cpx | 15 | 0.049 | 10 | EE(4), KR(2), ML(2), BF(1), BJ(1), CD(1), GH(1), NG(1), RU(1), SN(1) | | 10 | BF(1), BJ(1), CD(1), EE(1), GH(1), KR(1), ML(1), NG(1), RU(1), SN(1) | 0.025 |
| 07_BC | 300 | 0.028 | 3 | CN(295), TW(4), MM(1) | | 3 | CN(1), MM(1), TW(1) | 0.019 |
| 08_BC | 91 | 0.030 | 2 | CN(89), MM(2) | | 3 | MM(2), CN(1) | 0.017 |
| 09_cpx | 6 | 0.045 | 4 | CI(2), SN(2), GH(1), US(1) | | 6 | CI(2), SN(2), GH(1), US(1) | NA |
| 10_CD | 3 | 0.050 | 1 | TZ(3) | | 3 | TZ(3) | NA |
| 11_cpx | 23 | 0.063 | 6 | CM(17), FR(2), CY(1), GR(1), NG(1), US(1) | | 15 | CM(9), FR(2), CY(1), GR(1), NG(1), US(1) | 0.038 |
| 12_BF | 6 | 0.037 | 2 | AR(4), UY(2) | | 2 | AR(1), UY(1) | 0.033 |
| 13_cpx | 8 | 0.042 | 1 | CM(8) | | 1 | CM(1) | 0.037 |
| 16_A2D | 2 | 0.054 | 2 | KE(1), KR(1) | | 2 | KE(1), KR(1) | NA |
| 17_BF | 5 | 0.036 | 4 | AR(2), BO(1), PE(1), PY(1) | | 4 | AR(1), BO(1), PE(1), PY(1) | 0.030 |
| 18_cpx | 7 | 0.055 | 2 | CM(4), CU(3) | | 5 | CM(4), CU(1) | 0.030 |
| 19_cpx | 6 | 0.040 | 2 | CU(4), ES(2) | | 2 | CU(1), ES(1) | 0.027 |
| 20_BG | 4 | 0.023 | 2 | CU(3), ES(1) | | 2 | CU(1), ES(1) | 0.012 |
| 21_A2D | 3 | 0.042 | 1 | KE(3) | | 1 | KE(1) | 0.040 |
| 22_01A1 | 1 | NA | 1 | CM(1) | | 1 | CM(1) | NA |
| 23_BG | 2 | 0.018 | 1 | CU(2) | | 1 | CU(1) | 0.018 |
| 24_BG | 8 | 0.021 | 2 | CU(7), ES(1) | | 2 | CU(1), ES(1) | 0.016 |
| 25_cpx | 6 | 0.046 | 3 | CM(3), SA(2), CD(1) | | 5 | CM(3), CD(1), SA(1) | 0.030 |
| 26_AU | 1 | NA | 1 | CD(1) | | 1 | CD(1) | NA |
| 27_cpx | 3 | 0.061 | 1 | CD(3) | | 3 | CD(3) | NA |
| 28_BF | 3 | 0.043 | 1 | BR(3) | | 2 | BR(2) | 0.037 |
| 29_BF | 7 | 0.056 | 1 | BR(7) | | 7 | BR(7) | NA |
| 31_BC | 2 | 0.028 | 1 | BR(2) | | 1 | BR(1) | 0.028 |
| 33_01B | 4 | 0.035 | 2 | MY(3), ID(1) | | 2 | ID(1), MY(1) | 0.033 |
| 34_01B | 3 | 0.008 | 1 | TH(3) | | 1 | TH(1) | 0.006 |
| 35_AD | 23 | 0.031 | 2 | AF(14), IR(9) | | 2 | AF(1), IR(1) | 0.026 |
| 36_cpx | 1 | NA | 1 | CM(1) | | 1 | CM(1) | NA |
| 37_cpx | 4 | 0.055 | 2 | CM(3), CY(1) | | 4 | CM(3), CY(1) | NA |
| 38_BF1 | 3 | 0.046 | 1 | UY(3) | | 3 | UY(3) | NA |
| 39_BF | 3 | 0.054 | 1 | BR(3) | | 3 | BR(3) | NA |
| 40_BF | 4 | 0.049 | 1 | BR(4) | | 3 | BR(3) | 0.038 |
| 42_BF | 21 | 0.008 | 1 | LU(21) | | 1 | LU(1) | 0.004 |
| 43_02G | 4 | 0.031 | 1 | SA(4) | | 1 | SA(1) | 0.027 |
| 44_BF | 1 | NA | 1 | CL(1) | | 1 | CL(1) | NA |
| 45_cpx | 4 | 0.055 | 4 | BR(1), CD(1), CM(1), GA(1) | | 4 | BR(1), CD(1), CM(1), GA(1) | NA |
| 47_BF | 3 | 0.028 | 2 | ES(2), BR(1) | | 2 | BR(1), ES(1) | 0.017 |
| 48_01B | 3 | 0.024 | 1 | MY(3) | | 1 | MY(1) | 0.023 |
| 49_cpx | 3 | 0.041 | 1 | GM(3) | | 1 | GM(1) | 0.040 |
| 50_A1D | 4 | 0.030 | 1 | GB(4) | | 1 | GB(1) | 0.025 |
| 51_01B | 4 | 0.020 | 2 | MY(2), SG(2) | | 2 | MY(1), SG(1) | 0.019 |
| 52_01B | 3 | 0.032 | 2 | TH(2), MY(1) | | 2 | MY(1), TH(1) | 0.025 |
| 53_01B | 4 | 0.035 | 1 | MY(4) | | 1 | MY(1) | 0.030 |
| 54_01B | 2 | 0.027 | 1 | MY(2) | | 1 | MY(1) | 0.027 |
| 55_01B | 8 | 0.020 | 1 | CN(8) | | 1 | CN(1) | 0.016 |
| 56_cpx | 4 | 0.003 | 1 | FR(4) | | 1 | FR(1) | 0.002 |
| 58_01B | 6 | 0.026 | 1 | MY(6) | | 1 | MY(1) | 0.023 |
| 59_01B | 8 | 0.020 | 1 | CN(8) | | 1 | CN(1) | 0.017 |
| 60_BC | 3 | 0.021 | 1 | IT(3) | | 1 | IT(1) | 0.020 |
| 61_BC | 3 | 0.012 | 1 | CN(3) | | 1 | CN(1) | 0.012 |
| 62_BC | 3 | 0.018 | 1 | CN(3) | | 1 | CN(1) | 0.018 |
| 63_02A1 | 10 | 0.011 | 1 | RU(10) | | 1 | RU(1) | 0.009 |
| 64_BC | 3 | 0.029 | 1 | CN(3) | | 1 | CN(1) | 0.027 |
| 65_cpx | 5 | 0.023 | 1 | CN(5) | | 1 | CN(1) | 0.020 |
| 67_01B | 2 | 0.016 | 1 | CN(2) | | 1 | CN(1) | 0.016 |
| 68_01B | 3 | 0.009 | 1 | CN(3) | | 1 | CN(1) | 0.008 |
| 69_01B | 7 | 0.021 | 1 | JP(7) | | 1 | JP(1) | 0.013 |
| 73_BG | 3 | 0.028 | 2 | ES(2), DE(1) | | 2 | DE(1), ES(1) | 0.023 |
| 74_01B | 3 | 0.032 | 1 | MY(3) | | 1 | MY(1) | 0.029 |
| 77_cpx | 4 | 0.018 | 1 | MY(4) | | 1 | MY(1) | 0.017 |
| 78_cpx | 3 | 0.031 | 1 | CN(3) | | 1 | CN(1) | 0.029 |
| 82_cpx | 6 | 0.017 | 1 | MM(6) | | 1 | MM(1) | 0.011 |
| 83_cpx | 11 | 0.012 | 1 | MM(11) | | 1 | MM(1) | 0.009 |
| 85_BC | 10 | 0.023 | 1 | CN(10) | | 1 | CN(1) | 0.018 |
| 86_BC | 3 | 0.033 | 1 | CN(3) | | 1 | CN(1) | 0.031 |
| 87_cpx | 3 | 0.028 | 1 | CN(3) | | 1 | CN(1) | 0.026 |
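
The ADCL column can be read as the mean distance from each sequence left out of the GSPS set to its nearest retained sequence. This reading is consistent with the two-sequence rows, where ADCL equals the single pairwise distance, and with the NA entries where every sequence is retained. The following Python sketch illustrates that definition on a plain distance matrix. The toy distances, the function names, and the exhaustive search are assumptions made for illustration, and the original pipeline may compute distances and search differently.

```python
from itertools import combinations


def adcl(dist, selected):
    """Mean distance from each non-selected sequence to its closest selected one.

    dist is a symmetric matrix (list of lists); selected holds row indices.
    Returns None when nothing is left out, mirroring the NA entries in the table.
    """
    chosen = set(selected)
    left_out = [i for i in range(len(dist)) if i not in chosen]
    if not chosen or not left_out:
        return None
    return sum(min(dist[i][j] for j in chosen) for i in left_out) / len(left_out)


def min_adcl_subset(dist, k):
    """Exhaustive search for the k representatives with the lowest ADCL.

    Assumption: brute force is only practical for small groups; it is not
    claimed to be the search strategy of the original pipeline.
    """
    n = len(dist)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of sequences")
    return min(combinations(range(n), k), key=lambda subset: adcl(dist, subset))


# Hypothetical distances: sequences 0-1 and 2-3 form two tight pairs.
toy = [
    [0.000, 0.010, 0.040, 0.050],
    [0.010, 0.000, 0.045, 0.052],
    [0.040, 0.045, 0.000, 0.012],
    [0.050, 0.052, 0.012, 0.000],
]
reps = min_adcl_subset(toy, 2)
print(reps, adcl(toy, reps))
```

With two representatives allowed, the search keeps one sequence from each tight pair, which is the behaviour one would expect from a selection that minimizes the distance to the remaining sequences.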
Intra-Subtype/CRF Divergence of the Complete Set of 6,034 Individuated Pol Sequences and the Subset of 716 Representative Geographically-Stratified Pol Subtype (GSPS) Sequences According to Subtype/CRF.
Note: Pairwise distances (PWD) were calculated using the TN93 substitution model.
Abbreviations: PWD, pairwise distance; ADCL, average distance to the closest leaf; NA, not applicable.

| Subtype/CRF | Complete set (n=6,034): # Sequences | Complete set: Mean PWD | Complete set: Min PWD | Complete set: Max PWD | GSPS (n=716): # Sequences | GSPS: Mean PWD | GSPS: Min PWD | GSPS: Max PWD | Overall ADCL |
|---|---|---|---|---|---|---|---|---|---|
| A1 | 306 | 0.071 | 0.000 | 0.109 | 78 | 0.071 | 0.007 | 0.103 | 0.035 |
| A2 | 4 | 0.050 | 0.046 | 0.057 | 4 | 0.050 | 0.046 | 0.057 | NA |
| B | 2175 | 0.062 | 0.000 | 0.102 | 176 | 0.065 | 0.009 | 0.100 | 0.031 |
| C | 1215 | 0.060 | 0.000 | 0.129 | 126 | 0.068 | 0.017 | 0.129 | 0.039 |
| D | 86 | 0.063 | 0.001 | 0.099 | 33 | 0.068 | 0.029 | 0.099 | 0.037 |
| F1 | 66 | 0.055 | 0.002 | 0.086 | 16 | 0.063 | 0.032 | 0.083 | 0.025 |
| F2 | 9 | 0.054 | 0.015 | 0.076 | 2 | 0.055 | 0.055 | 0.055 | 0.039 |
| G | 77 | 0.063 | 0.004 | 0.094 | 32 | 0.063 | 0.018 | 0.088 | 0.033 |
| H | 10 | 0.061 | 0.043 | 0.083 | 10 | 0.061 | 0.043 | 0.083 | NA |
| J | 6 | 0.058 | 0.023 | 0.071 | 5 | 0.061 | 0.050 | 0.071 | 0.023 |
| K | 2 | 0.046 | 0.046 | 0.046 | 2 | 0.046 | 0.046 | 0.046 | NA |
| 01_AE | 1210 | 0.040 | 0.000 | 0.072 | 24 | 0.041 | 0.002 | 0.067 | 0.022 |
| 02_AG | 130 | 0.055 | 0.002 | 0.086 | 58 | 0.056 | 0.005 | 0.080 | 0.037 |
| 03_AB | 3 | 0.011 | 0.010 | 0.012 | 2 | 0.010 | 0.010 | 0.010 | 0.010 |
| 04_cpx | 8 | 0.047 | 0.025 | 0.060 | 3 | 0.039 | 0.037 | 0.040 | 0.036 |
| 05_DF | 3 | 0.047 | 0.035 | 0.055 | 3 | 0.047 | 0.035 | 0.055 | NA |
| 06_cpx | 15 | 0.049 | 0.017 | 0.070 | 10 | 0.050 | 0.017 | 0.069 | 0.025 |
| 07_BC | 300 | 0.028 | 0.000 | 0.053 | 3 | 0.025 | 0.016 | 0.034 | 0.019 |
| 08_BC | 91 | 0.030 | 0.001 | 0.053 | 3 | 0.037 | 0.032 | 0.046 | 0.017 |
| 09_cpx | 6 | 0.045 | 0.037 | 0.058 | 6 | 0.045 | 0.037 | 0.058 | NA |
| 10_CD | 3 | 0.050 | 0.048 | 0.052 | 3 | 0.050 | 0.048 | 0.052 | NA |
| 11_cpx | 23 | 0.063 | 0.006 | 0.085 | 15 | 0.065 | 0.050 | 0.085 | 0.038 |
| 12_BF | 6 | 0.037 | 0.029 | 0.044 | 2 | 0.029 | 0.029 | 0.029 | 0.033 |
| 13_cpx | 8 | 0.042 | 0.030 | 0.055 | 1 | NA | NA | NA | 0.037 |
| 16_A2D | 2 | 0.054 | 0.054 | 0.054 | 2 | 0.054 | 0.054 | 0.054 | NA |
| 17_BF | 5 | 0.036 | 0.029 | 0.041 | 4 | 0.036 | 0.029 | 0.041 | 0.030 |
| 18_cpx | 7 | 0.055 | 0.028 | 0.074 | 5 | 0.058 | 0.046 | 0.074 | 0.030 |
| 19_cpx | 6 | 0.040 | 0.021 | 0.056 | 2 | 0.051 | 0.051 | 0.051 | 0.027 |
| 20_BG | 4 | 0.023 | 0.011 | 0.035 | 2 | 0.035 | 0.035 | 0.035 | 0.012 |
| 21_A2D | 3 | 0.042 | 0.040 | 0.045 | 1 | NA | NA | NA | 0.040 |
| 22_01A1 | 1 | NA | NA | NA | 1 | NA | NA | NA | NA |
| 23_BG | 2 | 0.018 | 0.018 | 0.018 | 1 | NA | NA | NA | 0.018 |
| 24_BG | 8 | 0.021 | 0.009 | 0.034 | 2 | 0.022 | 0.022 | 0.022 | 0.016 |
| 25_cpx | 6 | 0.046 | 0.030 | 0.054 | 5 | 0.049 | 0.040 | 0.054 | 0.030 |
| 26_AU | 1 | NA | NA | NA | 1 | NA | NA | NA | NA |
| 27_cpx | 3 | 0.061 | 0.056 | 0.068 | 3 | 0.061 | 0.056 | 0.068 | NA |
| 28_BF | 3 | 0.043 | 0.037 | 0.050 | 2 | 0.043 | 0.043 | 0.043 | 0.037 |
| 29_BF | 7 | 0.056 | 0.046 | 0.066 | 7 | 0.056 | 0.046 | 0.066 | NA |
| 31_BC | 2 | 0.028 | 0.028 | 0.028 | 1 | NA | NA | NA | 0.028 |
| 33_01B | 4 | 0.035 | 0.032 | 0.040 | 2 | 0.038 | 0.038 | 0.038 | 0.033 |
| 34_01B | 3 | 0.008 | 0.002 | 0.012 | 1 | NA | NA | NA | 0.006 |
| 35_AD | 23 | 0.031 | 0.002 | 0.044 | 2 | 0.022 | 0.022 | 0.022 | 0.026 |
| 36_cpx | 1 | NA | NA | NA | 1 | NA | NA | NA | NA |
| 37_cpx | 4 | 0.055 | 0.035 | 0.063 | 4 | 0.055 | 0.035 | 0.063 | NA |
| 38_BF1 | 3 | 0.046 | 0.042 | 0.049 | 3 | 0.046 | 0.042 | 0.049 | NA |
| 39_BF | 3 | 0.054 | 0.050 | 0.058 | 3 | 0.054 | 0.050 | 0.058 | NA |
| 40_BF | 4 | 0.049 | 0.038 | 0.058 | 3 | 0.052 | 0.047 | 0.058 | 0.038 |
| 42_BF | 21 | 0.008 | 0.001 | 0.019 | 1 | NA | NA | NA | 0.004 |
| 43_02G | 4 | 0.031 | 0.025 | 0.041 | 1 | NA | NA | NA | 0.027 |
| 44_BF | 1 | NA | NA | NA | 1 | NA | NA | NA | NA |
| 45_cpx | 4 | 0.055 | 0.046 | 0.063 | 4 | 0.055 | 0.046 | 0.063 | NA |
| 47_BF | 3 | 0.028 | 0.017 | 0.034 | 2 | 0.032 | 0.032 | 0.032 | 0.017 |
| 48_01B | 3 | 0.024 | 0.021 | 0.025 | 1 | NA | NA | NA | 0.023 |
| 49_cpx | 3 | 0.041 | 0.037 | 0.045 | 1 | NA | NA | NA | 0.040 |
| 50_A1D | 4 | 0.030 | 0.022 | 0.040 | 1 | NA | NA | NA | 0.025 |
| 51_01B | 4 | 0.020 | 0.015 | 0.024 | 2 | 0.020 | 0.020 | 0.020 | 0.019 |
| 52_01B | 3 | 0.032 | 0.025 | 0.038 | 2 | 0.033 | 0.033 | 0.033 | 0.025 |
| 53_01B | 4 | 0.035 | 0.022 | 0.049 | 1 | NA | NA | NA | 0.030 |
| 54_01B | 2 | 0.027 | 0.027 | 0.027 | 1 | NA | NA | NA | 0.027 |
| 55_01B | 8 | 0.020 | 0.002 | 0.030 | 1 | NA | NA | NA | 0.016 |
| 56_cpx | 4 | 0.003 | 0.002 | 0.005 | 1 | NA | NA | NA | 0.002 |
| 58_01B | 6 | 0.026 | 0.014 | 0.036 | 1 | NA | NA | NA | 0.023 |
| 59_01B | 8 | 0.020 | 0.002 | 0.030 | 1 | NA | NA | NA | 0.017 |
| 60_BC | 3 | 0.021 | 0.018 | 0.023 | 1 | NA | NA | NA | 0.020 |
| 61_BC | 3 | 0.012 | 0.011 | 0.013 | 1 | NA | NA | NA | 0.012 |
| 62_BC | 3 | 0.018 | 0.018 | 0.018 | 1 | NA | NA | NA | 0.018 |
| 63_02A1 | 10 | 0.011 | 0.007 | 0.014 | 1 | NA | NA | NA | 0.009 |
| 64_BC | 3 | 0.029 | 0.021 | 0.034 | 1 | NA | NA | NA | 0.027 |
| 65_cpx | 5 | 0.023 | 0.014 | 0.030 | 1 | NA | NA | NA | 0.020 |
| 67_01B | 2 | 0.016 | 0.016 | 0.016 | 1 | NA | NA | NA | 0.016 |
| 68_01B | 3 | 0.009 | 0.007 | 0.010 | 1 | NA | NA | NA | 0.008 |
| 69_01B | 7 | 0.021 | 0.002 | 0.038 | 1 | NA | NA | NA | 0.013 |
| 73_BG | 3 | 0.028 | 0.023 | 0.033 | 2 | 0.029 | 0.029 | 0.029 | 0.023 |
| 74_01B | 3 | 0.032 | 0.027 | 0.037 | 1 | NA | NA | NA | 0.029 |
| 77_cpx | 4 | 0.018 | 0.007 | 0.025 | 1 | NA | NA | NA | 0.017 |
| 78_cpx | 3 | 0.031 | 0.026 | 0.035 | 1 | NA | NA | NA | 0.029 |
| 82_cpx | 6 | 0.017 | 0.004 | 0.035 | 1 | NA | NA | NA | 0.011 |
| 83_cpx | 11 | 0.012 | 0.000 | 0.024 | 1 | NA | NA | NA | 0.009 |
| 85_BC | 10 | 0.023 | 0.015 | 0.030 | 1 | NA | NA | NA | 0.018 |
| 86_BC | 3 | 0.033 | 0.027 | 0.037 | 1 | NA | NA | NA | 0.031 |
| 87_cpx | 3 | 0.028 | 0.025 | 0.031 | 1 | NA | NA | NA | 0.026 |
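
The mean, minimum, and maximum PWD columns summarise TN93 distances over all sequence pairs within a subtype/CRF. The Python sketch below shows one way to compute the standard TN93 distance for a pair of aligned sequences and then summarise a group. It assumes pre-aligned sequences of equal length, skips sites with gaps or ambiguous bases, and estimates base frequencies from the pair being compared. These choices are illustrative assumptions, and established implementations may handle frequencies and ambiguities differently.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tn93(seq1, seq2):
    """TN93 distance between two aligned sequences of equal length."""
    # Assumption: only sites where both sequences carry A, C, G or T are used.
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(sites)
    if n == 0:
        raise ValueError("no comparable sites")
    # Assumption: base frequencies are estimated from this pair only.
    freq = dict.fromkeys("ACGT", 0.0)
    for a, b in sites:
        freq[a] += 0.5 / n
        freq[b] += 0.5 / n
    if min(freq.values()) == 0.0:
        raise ValueError("all four bases must occur to estimate frequencies")
    g_r = freq["A"] + freq["G"]
    g_y = freq["C"] + freq["T"]
    # Proportions of A<->G transitions, C<->T transitions, and transversions.
    p1 = sum(a != b and {a, b} <= PURINES for a, b in sites) / n
    p2 = sum(a != b and {a, b} <= PYRIMIDINES for a, b in sites) / n
    q = sum((a in PURINES) != (b in PURINES) for a, b in sites) / n
    k1 = 2 * freq["A"] * freq["G"] / g_r
    k2 = 2 * freq["C"] * freq["T"] / g_y
    k3 = 2 * (g_r * g_y
              - freq["A"] * freq["G"] * g_y / g_r
              - freq["C"] * freq["T"] * g_r / g_y)
    w1 = 1 - p1 / k1 - q / (2 * g_r)
    w2 = 1 - p2 / k2 - q / (2 * g_y)
    w3 = 1 - q / (2 * g_r * g_y)
    if min(w1, w2, w3) <= 0:
        raise ValueError("sequences too divergent for the TN93 correction")
    return -k1 * math.log(w1) - k2 * math.log(w2) - k3 * math.log(w3)


def pwd_summary(seqs):
    """Return (mean, min, max) TN93 distance over all pairs, or None if < 2 sequences."""
    dists = [tn93(a, b) for a, b in combinations(seqs, 2)]
    if not dists:
        return None
    return sum(dists) / len(dists), min(dists), max(dists)


# Hypothetical 20-base sequences that differ by a single A->G transition.
s1 = "ACGTACGTACGTACGTACGT"
s2 = "GCGTACGTACGTACGTACGT"
print(tn93(s1, s2), pwd_summary([s1, s1, s2]))
```

Returning None for a single sequence matches the NA entries shown for groups with only one member.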
Figure 2. Phylogenetic tree of 716 Geographically-Stratified Pol Subtype/CRF (GSPS) reference sequences representing 11 pure subtypes and 70 CRFs.
Branches of GSPS sequences belonging to the pure subtypes A1, A2, B, C, D, F1, F2, G, H, J, and K and to the highly prevalent CRFs, CRF01_AE and CRF02_AG, are color-coded, and their clades are indicated in the outer ring. GSPS sequences belonging to the remaining 68 CRFs are indicated by black branches with color-coded circles on the branch tips. The tree was constructed by neighbour joining, with branch lengths optimized by maximum likelihood under the GTR evolution model using the R package phangorn, and was then rooted at the midpoint. The tree was drawn using the R package ggtree.
Figure 3. Number of sequences by country in the complete set of 6,034 one-per-person complete HIV-1 group M pol sequences.
The large number of sequences from China is consistent with the frequent sequencing of complete genome and complete pol sequences in research and public health laboratories in this country.
Figure 4. Correlation between the estimated year of the most recent common ancestor (MRCA) and the intra-subtype/CRF diversity in the complete set of 6,034 one-per-person complete HIV-1 group M pol sequences.
Median PWD: median of the intra-subtype/CRF pairwise distances calculated using the TN93 substitution model. Each point indicates a subtype or CRF, with CRFs labelled by their number alone. Points have been manually jittered to minimize overlap. Year of the MRCA was obtained from the references listed in Table 4.
Time to the most recent common ancestors (tMRCA) of HIV-1 group M subtypes and CRFs.
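| Subtype/CRF | Reference (first author) | Estimated year of MRCA | Reported interval |
|---|---|---|---|
| A1 | Wertheim, J. O. | 1946 | 1936–1956 |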
| A2 | Wertheim, J. O. | 1952 | 1941–1936 |
| B | Wertheim, J. O. | 1955 | 1946–1964 |
| C | Wertheim, J. O. | 1939 | 1926–1951 |
| D | Abecasis, A.B. | 1945 | 1935–1955 |
| F1 | Wertheim, J. O. | 1950 | 1940–1959 |
| F2 | Wertheim, J. O. | 1961 | 1954–1968 |
| G | Tongo, M. | 1953 | 1939–1963 |
| 01_AE | Liao, H. | 1967 | 1963–1973 |
| 02_AG | Faria, N.R. | 1973 | 1972–1975 |
| 07_BC | Tee, K.K. | 1993 | 1991–1995 |
| 08_BC | Tee, K.K. | 1990 | 1988–1991 |
| 09_cpx | Delatorre, E. | 1966 | |
| 11_cpx | Delatorre, E. | 1957 | |
| 12_BF | Dilernia, D. A. | 1969 | 1946–1981 |
| 13_cpx | Delatorre, E. | 1965 | |
| 20_BG | Delatorre, E. | 1996 | 1994–1998 |
| 23_BG | Delatorre, E. | 1998 | 1996–2000 |
| 24_BG | Delatorre, E. | 1997 | 1996–2000 |
| 28_BF | Ristic, N. | 1988 | 1984–1993 |
| 29_BF | Ristic, N. | 1988 | 1984–1993 |
| 31_BC | Passaes, C.P. | 1988 | 1982–1992 |
| 33_01B | Tee, K.K. | 1992 | 1987–1997 |
| 35_AD | Eybpoosh, S. | 1991 | |
| 38_BF1 | Bello, G. | 1986 | 1981–1990 |
| 42_BF | Struck, D. | 2002 | 2001–2003 |
| 45_cpx | Delatorre, E. | 1965 | |
| 48_01B | Li, Y. | 2001 | 1998–2004 |
| 50_A1D | Foster, G. M. | 1992 | 1966–2007 |
| 51_01B | Ng, K. T. | 2000 | 1992–2006 |
| 59_01B | Zhang, W. | 2000 | 1994–2005 |
| 63_02A1 | Shcherbakova, N. S. | 2006 | 2005–2007 |
| 65_cpx | Liu, Y. | 2000 | 1997–2003 |
| 69_01B | Hosaka, M. | 1993 | 1978–1999 |
| 74_01B | Cheong, H. T. | 1995 | |
Test Sequences for which the Subtype/CRF of the Closest NCBI Reference Sequence Differed from the Subtype/CRF of the Closest Geographically-Stratified Pol Subtype/CRF (GSPS) Reference Sequence (n=61; 1.2% of 5,185 Test Sequences).
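| Closest NCBI reference subtype/CRF (a) | Closest GSPS reference subtype/CRF | # Sequences | Countries (# sequences) | Reported geographic distribution (b) |
|---|---|---|---|---|
| CRF01_AE | A1 | 5 | Eastern Africa (3), South Africa (1), Cyprus (1) | CRF01_AE has been reported primarily in Southeast Asia and Central Africa. A1 is common in East Africa. |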
| CRF03_AB | B | 1 | US (1) | CRF03_AB has been reported primarily in Eastern Europe and Central Asia. |
| CRF08_BC | C | 14 | | CRF08_BC has been reported primarily in China. |
| CRF22_01A1 | A1 | 9 | Cameroon (9) | CRF22_01A1 has been reported primarily in Cameroon. Subtype A1 is also common in Cameroon. |
| CRF28_BF | B | 4 | Brazil (4) | CRF28_BF has been reported primarily in Brazil. Subtype B is also common in Brazil. |
| CRF31_BC | C | 15 | | CRF31_BC has been reported primarily in Brazil. |
| CRF51_01B | B | 2 | Japan (1), Philippines (1) | CRF51_01B has been reported primarily in Southeast Asia and Japan. Subtype B is common in Japan. |
| CRF64_BC | C | 6 | China (6) | CRF64_BC has been reported primarily in China. Subtype C is also common in China. |
| CRF69_01B | B | 5 | Japan (5) | CRF69_01B has been reported primarily in Japan. Subtype B is common in Japan. |
(a) Sequences for which the closest NCBI sequence is unlikely to represent the correct subtype/CRF are shown in bold.
(b) The information in this column was obtained from the Los Alamos National Laboratory (LANL) HIV Sequence Database[1].
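
The comparison summarised above rests on a nearest-reference rule: each test sequence takes the subtype/CRF of its closest reference, and a sequence is listed when two reference sets disagree. The Python sketch below illustrates that rule. The reference names, the short toy sequences, and the use of a simple p-distance as a stand-in for the distance measure are all hypothetical choices made for illustration.

```python
def p_distance(a, b):
    """Proportion of differing sites among positions where both carry A/C/G/T."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def closest_reference(query, references):
    """Return (name, subtype, distance) of the nearest reference.

    references is a list of (name, subtype, aligned_sequence) tuples.
    Ties are broken by reference name so the result is deterministic.
    """
    scored = [(p_distance(query, seq), name, subtype)
              for name, subtype, seq in references]
    dist, name, subtype = min(scored)
    return name, subtype, dist


def is_discordant(query, set_a, set_b):
    """True when the two reference sets assign different subtypes/CRFs."""
    return closest_reference(query, set_a)[1] != closest_reference(query, set_b)[1]


# Hypothetical reference sets: set_b adds one CRF reference to set_a.
set_a = [
    ("refB_1", "B", "ACGAACGTAC"),
    ("refC_1", "C", "ACGTTCGTAA"),
]
set_b = set_a + [("ref08_1", "CRF08_BC", "ACGTTCGTAC")]
query = "ACGTTCGTAC"

print(closest_reference(query, set_a))
print(closest_reference(query, set_b))
print(is_discordant(query, set_a, set_b))
```

Adding a single closer reference changes the assignment for the toy query, which is the kind of disagreement the table enumerates between reference sets of different composition.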
1300 bp pol Test Sequences for which the Consensus Subtype/CRF assigned by COMET and Rega Differed from the Subtype/CRF of the Closest Geographically-Stratified Pol Subtype/CRF (GSPS) Reference Sequence (n=27; 0.4% of the 6,115 Test Sequences with a Consensus Subtype).
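| Consensus subtype/CRF by COMET and Rega (a) | Subtype/CRF of closest GSPS sequence, with # sequences (b) | # Sequences | Countries (# sequences) | Reported geographic distribution (c) |
|---|---|---|---|---|
| CRF02_AG | CRF36_cpx (1) | 1 | Cameroon (1) | CRF02_AG is common in Cameroon. CRF36_cpx has been reported primarily in Cameroon. |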
| CRF12_BF | CRF17_BF (1), B (3) | 4 | Argentina (4) | CRF12_BF and CRF17_BF have been reported primarily in South America. Subtype B is also common in South America. |
| A1 | CRF22_01A1 (3) | 3 | Cameroon (3) | Subtype A1 is common in Cameroon. CRF22_01A1 has been reported primarily in Cameroon. |
| A1 | A2 (1) | 1 | Republic of the Congo (1) | Subtypes A1 and A2 are common in Republic of the Congo. |
| B | CRF17_BF(1), CRF38_BF1 (2) | 3 | Argentina (3) | Subtype B and CRF17_BF are common in South America. CRF38_BF1 has been reported primarily in Uruguay. |
| B | CRF28_BF (1) | 1 | Brazil (1) | Subtype B and CRF28_BF are common in Brazil. |
| C | CRF07_BC (1), CRF85_BC (3) | 4 | China (4) | Subtype C is common in China. CRF07_BC and CRF85_BC have been reported primarily in China. |
| D | B (3) | 3 | Republic of the Congo (1), South Africa (1), Spain (1) | Subtype D is generally only seen in Eastern and Central Africa. |
| G | CRF43_02G (1) | 1 | Nigeria (1) | CRF43_02G is primarily reported in Saudi Arabia. Subtype G has been reported commonly in Africa. |
| G | CRF73_BG (6) | 6 | Portugal (4), Spain (2) | Subtype G is common in Central Europe. CRF73_BG has been reported in Portugal and Spain. |
(a) Subtype/CRF classification of the test sequences on which the COMET and Rega subtyping programs agreed.
(b) Subtype/CRF of the closest GSPS sequence when it differed from the consensus subtype/CRF of the test sequence.
(c) The information in this column was obtained from the Los Alamos National Laboratory (LANL) HIV Sequence Database[1].