Zigui Chen, Wendy C S Ho, Siaw Shi Boon, Priscilla T Y Law, Martin C W Chan, Rob DeSalle, Robert D Burk, Paul K S Chan.
Abstract
Human papillomavirus 58 (HPV58) is found in 10 to 18% of cervical cancers in East Asia but is rather uncommon elsewhere. The distribution and oncogenic potential of HPV58 variants appear to be heterogeneous, since the E7 T20I/G63S variant is more prevalent in East Asia and confers a 7- to 9-fold-higher risk of cervical precancer and cancer. However, the underlying genomic mechanisms that explain the geographic and carcinogenic diversity of HPV58 variants are still poorly understood. In this study, we used a combination of phylogenetic analyses and bioinformatics to investigate the deep evolutionary history of HPV58 complete genome variants. The initial splitting of HPV58 variants was estimated to occur 478,600 years ago (95% highest posterior density [HPD], 391,000 to 569,600 years ago). This divergence time is well within the era of speciation between Homo sapiens and Neanderthals/Denisovans and around three times longer than the modern Homo sapiens divergence times. The expansion of present-day variants in Eurasia could be the consequence of viral transmission from Neanderthals/Denisovans to non-African modern human populations through gene flow. A whole-genome sequence signature analysis identified 3 amino acid changes, 16 synonymous nucleotide changes, and a 12-bp insertion strongly associated with the E7 T20I/G63S variant that represents the A3 sublineage and carries higher carcinogenic potential. Compared with the capsid proteins, the oncogenes E7 and E6 had increased substitution rates indicative of higher selection pressure. These data provide a comprehensive evolutionary history and genomic basis of HPV58 variants to assist further investigation of carcinogenic association and the development of diagnostic and therapeutic strategies.
IMPORTANCE Papillomaviruses (PVs) are an ancient and heterogeneous group of double-stranded DNA viruses that preferentially infect the cutaneous and mucocutaneous epithelia of vertebrates. Persistent infection by specific oncogenic human papillomaviruses (HPVs), including HPV58, has been established as the primary cause of cervical cancer. In this work, we reveal the complex evolutionary history of HPV58 variants that explains the heterogeneity of oncogenic potential and geographic distribution. Our data suggest that HPV58 variants may have coevolved with archaic hominins and dispersed across the planet through host interbreeding and gene flow. Certain genes and codons of HPV58 variants representing higher carcinogenic potential and/or that are under positive selection may have important implications for viral host specificity, pathogenesis, and disease prevention.
Keywords: HPV58; cervical cancer; evolution; oncogenicity; papillomavirus; virus-host codivergence
Year: 2017 PMID: 28794033 PMCID: PMC5640864 DOI: 10.1128/JVI.01285-17
Source DB: PubMed Journal: J Virol ISSN: 0022-538X Impact factor: 5.103
FIG 1 Phylogeny of HPV58 complete genomes. The topology was obtained from the maximum likelihood tree by using RAxML, inferred from a global alignment of 90 complete genomes. Support scores alongside the branches of each sublineage indicate bootstrap percentages obtained by RAxML and PhyML and the Bayesian credibility values obtained by MrBayes. The stars indicate absolute agreement among the results of the three algorithms. The pairwise nucleotide sequence differences were calculated for each isolate and are shown on the right, with the scale displayed on the top. Values for each comparison for a given isolate are connected by lines, and the comparison to self is indicated as 0.0%.
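The pairwise nucleotide differences described in the FIG 1 caption can be illustrated with a minimal sketch. The function name, the toy sequences, and the choice to skip gapped columns are assumptions made for illustration; the rule actually used for the figure is not stated here.

```python
from itertools import combinations


def pairwise_difference(seq_a: str, seq_b: str) -> float:
    """Percent of compared alignment columns at which two sequences differ.

    Columns where either sequence has a gap ('-') are skipped. This gap
    handling is an assumption of the sketch, not the documented method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    return 100.0 * mismatches / compared if compared else 0.0


# Toy alignment (hypothetical sequences, not HPV58 data).
alignment = {
    "iso1": "ATGCATGCAT",
    "iso2": "ATGCATGCAA",
    "iso3": "ATG-ATGGAA",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
    print(f"{name_a} vs {name_b}: {pairwise_difference(seq_a, seq_b):.1f}%")
```

Applied to every pair in a 90-genome alignment, a function of this kind would yield the per-isolate distributions plotted beside the tree, with the self-comparison at 0.0%.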
TABLE 1 Variation of HPV58 genome regions and ORFs
| ORF or region | Maximum nucleotide pairwise difference (%) | Length of nucleotide sequence alignment (bp) | No. of variable nucleotide positions | % of variable nucleotide positions | No. of variable nucleotide changes at codon position 1 | No. of variable nucleotide changes at codon position 2 | No. of variable nucleotide changes at codon position 3 | Maximum amino acid pairwise difference (%) | Length of amino acid alignment | No. of variable amino acid positions | % of variable amino acid positions | Ratio of nonsynonymous to synonymous changes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E6 | 1.8 | 447 | 16 | 3.6 | 8 | 1 | 7 | 4.0 | 148 | 8 | 5.4 | 1.33 |
| E7 | 3.4 | 294 | 20 | 6.8 | 7 | 6 | 7 | 7.1 | 97 | 12 | 12.4 | 1.71 |
| E1 | 1.3 | 1,932 | 89 | 4.6 | 35 | 17 | 37 | 2.0 | 645 | 46 | 7.1 | 1.12 |
| E2 | 1.3 | 1,074 | 53 | 4.9 | 11 | 9 | 33 | 2.2 | 357 | 27 | 7.6 | 1.13 |
| E4 | 2.2 | 273 | 19 | 7.0 | 5 | 10 | 4 | 5.6 | 90 | 17 | 18.9 | 17.00 |
| NCR1 | 3.2 | 62 | 4 | 6.5 | | | | | | | | |
| E5 | 3.5 | 228 | 19 | 8.3 | 8 | 0 | 11 | 4.0 | 75 | 5 | 6.7 | 0.36 |
| NCR2 | 5.8 | 122 | 18 | 14.8 | | | | | | | | |
| L2 | 2.5 | 1,419 | 103 | 7.3 | 24 | 25 | 53 | 3.4 | 472 | 52 | 11.0 | 1.24 |
| L1 | 2.2 | 1,572 | 94 | 6.0 | 17 | 15 | 62 | 3.6 | 523 | 35 | 6.7 | 0.67 |
| LCR | 3.6 | 849 | 92 | 10.8 | | | | | | | | |
| CG/8 ORFs | 1.7 | 7,834 | 500 | 6.4 | 115 | 83 | 214 | 2.2 | 2,407 | 202 | 8.4 | 1.08 |
Each insert or deletion event was counted as one variation.
The first, second, and third nucleotide positions in a codon.
Amino acid changes (nonsynonymous changes). Each insert or deletion event was counted as one variation.
Each nucleotide position is counted once based on 90 complete genome alignments.
CG, complete genome; NCR1, noncoding region 1 between the E2 and E5 ORFs; NCR2, noncoding region 2 between the E5 and L2 ORFs; LCR, long control region.
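The percentage columns in the variation table follow directly from the counts. The sketch below recomputes "% of variable nucleotide positions" for each ORF and also reports the share of changes at the third codon position. The variable and function names are hypothetical; the numbers are transcribed from the table.

```python
# (ORF, alignment length in bp, variable nucleotide positions,
#  changes at codon positions 1, 2 and 3), transcribed from the table above.
table1 = [
    ("E6", 447, 16, (8, 1, 7)),
    ("E7", 294, 20, (7, 6, 7)),
    ("E1", 1932, 89, (35, 17, 37)),
    ("E2", 1074, 53, (11, 9, 33)),
    ("E4", 273, 19, (5, 10, 4)),
    ("E5", 228, 19, (8, 0, 11)),
    ("L2", 1419, 103, (24, 25, 53)),
    ("L1", 1572, 94, (17, 15, 62)),
]


def percent_variable(length_bp: int, n_variable: int) -> float:
    """Variable positions as a percentage of alignment length, to 1 decimal."""
    return round(100.0 * n_variable / length_bp, 1)


def third_position_share(codon_counts: tuple) -> float:
    """Fraction of codon-position changes that fall on the third position."""
    return codon_counts[2] / sum(codon_counts)


for orf, length_bp, n_variable, codon_counts in table1:
    print(
        f"{orf}: {percent_variable(length_bp, n_variable)}% variable, "
        f"{100 * third_position_share(codon_counts):.0f}% of codon changes at position 3"
    )
```

In this reading, the capsid genes L1 and L2 concentrate their changes at the third codon position, whereas E6, E7, and E4 spread them more evenly across positions, which fits their higher ratios of nonsynonymous to synonymous changes.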
FIG 2 HPV58 lineage- and sublineage-specific nucleotide and amino acid changes across the complete genome. The x axis shows HPV58 gene/region positions, aligned according to the sublineage in the phylogenetic tree on the y axis. Lineage- and sublineage-specific SNPs were determined based on a global alignment of 90 complete genomes and color-coded as shown at the top. Amino acid changes within the E2/E4 region are changes observed in E2. SNPs were cumulative for the underlined lineages traversed from deepest node out to finer subline branches (dotted lines).
TABLE 2 HPV58 single nucleotide polymorphisms showing genomic signatures with the HPV58 A3 variant represented by E7 T20I/G63S
Shading indicates changes identical to those in the A3 variants.
Deletion (Del) or insertion (Ins), TCCTTGTCAGTT (12 bp).
TABLE 3 Likelihood ratio tests for positive selection of amino acid sites for HPV58 genes
| ORF | Best model (CODEML) | Log likelihood | dN/dS | LRT statistic | Codon | CODEML site statistic | CODEML posterior probability | FUBAR site statistic | FUBAR posterior probability |
|---|---|---|---|---|---|---|---|---|---|
| E6 | M3 | −779.5126 | 0.5853 | | | | | | |
| | | | | | 86D | 12.32 | 1.000 | 5.57 | 0.876 |
| E7 | M2 | −606.8498 | 0.8527 | | | | | | |
| | | | | | 64T | 8.47 | 0.833 | 9.45 | 0.919 |
| | | | | | 77V | 9.98 | 0.988 | 5.20 | 0.877 |
| E1 | M3 | −3,452.3510 | 0.2046 | 1.8435 | | | | | |
| E2 | M8 | −1,986.1880 | 0.3243 | | | | | | |
| E4 | M3 | −629.6729 | 1.9275 | | | | | | |
| E5 | M2 | −442.6889 | 0.0744 | 0.0003 | | | | | |
| L2 | M3 | −2,918.3903 | 0.3219 | 0.0705 | | | | | |
| L1 | M3 | −3,182.2828 | 0.1830 | | | | | | |
| | | | | | 325I | 5.97 | 0.984 | 2.30 | 0.851 |
The “best” model was interpreted from the maximum log-likelihood value. M2, selection; M3, discrete; M8, beta and ω.
Overall dN/dS ratio for each gene.
Likelihood ratio test (LRT) statistics follow a χ² distribution with 2 degrees of freedom; values of ≥5.99 correspond to P values of ≤0.05 (in boldface type).
Amino acid sites under positive selection are shown in boldface type.
Positively selected sites with posterior probabilities of ≥0.950 by CODEML.
Positively selected sites with posterior probabilities of ≥0.900 by FUBAR.
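The 5.99 cutoff quoted in the footnotes can be checked by hand, because the χ² survival function with 2 degrees of freedom has the closed form exp(−x/2). The sketch below is only a worked illustration of that relationship; the function name is hypothetical, and reading 1.8435 as the E1 LRT statistic is an assumption about how the flattened table should be interpreted.

```python
import math


def lrt_p_value_df2(lrt_statistic: float) -> float:
    """P value of an LRT statistic under a chi-square distribution with 2 df.

    For 2 degrees of freedom the survival function is exactly exp(-x / 2),
    so no statistics library is needed.
    """
    if lrt_statistic < 0:
        raise ValueError("LRT statistic must be non-negative")
    return math.exp(-lrt_statistic / 2.0)


# The significance threshold from the table footnote.
print(f"LRT = 5.99   -> P = {lrt_p_value_df2(5.99):.4f}")

# Assumed to be the E1 LRT statistic from the table (well below the cutoff).
print(f"LRT = 1.8435 -> P = {lrt_p_value_df2(1.8435):.4f}")
```

A statistic of 5.99 gives P of about 0.05, which is why values at or above it are flagged as significant.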
TABLE 4 Geographic origin of HPV58 variants
| Continent | Country or city | Sequenced region(s) | Total no. of HPV58 variants | A1 | A2 | A3 | B1 | B2 | C | D1 | D2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa | South Africa | Partial LCR | 11 | 0 | 6 | 0 | 0 | 5 | 0 | 0 | 0 |
| Africa | Zimbabwe | L1, LCR | 73 | 2 | 35 | 0 | 0 | 2 | 28 | 0 | 6 |
| America | Argentina | L1, LCR | 7 | 0 | 5 | 0 | 0 | 0 | 0 | 2 | 0 |
| America | Brazil | Partial LCR | 61 | 1 | 45 | 6 | 0 | 3 | 6 | 0 | 0 |
| America | Canada | L1, LCR | 12 | 0 | 11 | 0 | 1 | 0 | 0 | 0 | 0 |
| America | Mexico | Partial LCR | 4 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| America | USA | L1, LCR | 39 | 0 | 29 | 4 | 0 | 1 | 5 | 0 | 0 |
| America | USA | Partial LCR | 9 | 0 | 6 | 2 | 0 | 1 | 0 | 0 | 0 |
| Asia | China | L1, LCR | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Asia | China | Nearly complete genome | 37 | 35 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Asia | China | Partial E6, E7 | 22 | 12 | 4 | 6 | 0 | 0 | 0 | 0 | 0 |
| Asia | China | E6, E7 | 135 | 89 | 20 | 24 | 2 | 0 | 0 | 0 | 0 |
| Asia | Hong Kong | L1, LCR | 90 | 25 | 36 | 24 | 2 | 1 | 2 | 0 | 0 |
| Asia | Japan | L1, LCR | 14 | 1 | 4 | 7 | 0 | 2 | 0 | 0 | 0 |
| Asia | South Korea | L1, LCR | 139 | 9 | 77 | 50 | 1 | 0 | 2 | 0 | 0 |
| Asia | Taiwan | L1, LCR | 6 | 2 | 1 | 3 | 0 | 0 | 0 | 0 | 0 |
| Asia | Taiwan | Partial LCR | 5 | 0 | 2 | 3 | 0 | 0 | 0 | 0 | 0 |
| Asia | Thailand | L1, LCR | 7 | 0 | 3 | 4 | 0 | 0 | 0 | 0 | 0 |
| Europe | Italy | L1, LCR | 23 | 1 | 16 | 1 | 0 | 0 | 4 | 0 | 1 |
| Europe | Italy | Partial E6, E7, L1, LCR | 24 | 0 | 17 | 1 | 1 | 1 | 4 | 0 | 0 |
| Europe | Scotland | Partial LCR | 7 | 0 | 5 | 1 | 1 | 0 | 0 | 0 | 0 |
| Europe | UK | L1, LCR | 19 | 0 | 19 | 0 | 0 | 0 | 0 | 0 | 0 |
FIG 3 Geographic distribution of HPV58 variants. (a) A total of 747 HPV58 variants with known geographic origins from 16 countries/regions (see details in Table 4) were assigned to a lineage/sublineage and are summarized by continent in the pie charts. (b) Principal-component analysis using a weighted UniFrac algorithm clustered different study cohorts into three distinct groups, mainly matching the geographic locations where the viruses were isolated. (c) Relative frequencies of HPV58 lineage/sublineage distributions in four continents. A higher frequency indicates a predominance of certain lineages/sublineage in the associated geographic area.
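The continent-level summaries in FIG 3 can be approximated by pooling the per-study counts from the geographic-origin table. The sketch below does only that simple pooling; the names are hypothetical, and the authors' exact procedure for panel (c) or the UniFrac analysis in panel (b) is not reproduced.

```python
from collections import defaultdict

LINEAGES = ("A1", "A2", "A3", "B1", "B2", "C", "D1", "D2")

# (continent, country or city, counts in LINEAGES order), one tuple per
# row of the geographic-origin table.
ROWS = [
    ("Africa", "South Africa", (0, 6, 0, 0, 5, 0, 0, 0)),
    ("Africa", "Zimbabwe", (2, 35, 0, 0, 2, 28, 0, 6)),
    ("America", "Argentina", (0, 5, 0, 0, 0, 0, 2, 0)),
    ("America", "Brazil", (1, 45, 6, 0, 3, 6, 0, 0)),
    ("America", "Canada", (0, 11, 0, 1, 0, 0, 0, 0)),
    ("America", "Mexico", (0, 4, 0, 0, 0, 0, 0, 0)),
    ("America", "USA", (0, 29, 4, 0, 1, 5, 0, 0)),
    ("America", "USA", (0, 6, 2, 0, 1, 0, 0, 0)),
    ("Asia", "China", (1, 1, 1, 0, 0, 0, 0, 0)),
    ("Asia", "China", (35, 1, 1, 0, 0, 0, 0, 0)),
    ("Asia", "China", (12, 4, 6, 0, 0, 0, 0, 0)),
    ("Asia", "China", (89, 20, 24, 2, 0, 0, 0, 0)),
    ("Asia", "Hong Kong", (25, 36, 24, 2, 1, 2, 0, 0)),
    ("Asia", "Japan", (1, 4, 7, 0, 2, 0, 0, 0)),
    ("Asia", "South Korea", (9, 77, 50, 1, 0, 2, 0, 0)),
    ("Asia", "Taiwan", (2, 1, 3, 0, 0, 0, 0, 0)),
    ("Asia", "Taiwan", (0, 2, 3, 0, 0, 0, 0, 0)),
    ("Asia", "Thailand", (0, 3, 4, 0, 0, 0, 0, 0)),
    ("Europe", "Italy", (1, 16, 1, 0, 0, 4, 0, 1)),
    ("Europe", "Italy", (0, 17, 1, 1, 1, 4, 0, 0)),
    ("Europe", "Scotland", (0, 5, 1, 1, 0, 0, 0, 0)),
    ("Europe", "UK", (0, 19, 0, 0, 0, 0, 0, 0)),
]


def counts_by_continent(rows):
    """Sum lineage counts over all studies within each continent."""
    totals = defaultdict(lambda: dict.fromkeys(LINEAGES, 0))
    for continent, _place, counts in rows:
        for lineage, n in zip(LINEAGES, counts):
            totals[continent][lineage] += n
    return {continent: dict(counts) for continent, counts in totals.items()}


def relative_frequencies(lineage_counts):
    """Convert raw lineage counts into fractions of the continent total."""
    total = sum(lineage_counts.values())
    return {lineage: n / total for lineage, n in lineage_counts.items()}


totals = counts_by_continent(ROWS)
for continent, lineage_counts in totals.items():
    freqs = relative_frequencies(lineage_counts)
    summary = ", ".join(f"{k} {100 * v:.1f}%" for k, v in freqs.items() if v > 0)
    print(f"{continent} (n={sum(lineage_counts.values())}): {summary}")
```

Pooled this way, the rows sum to the 747 variants cited in the caption, and A3 accounts for 123 of 458 isolates from Asia against 12 of 132 from America, 3 of 73 from Europe, and none of 84 from Africa.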
TABLE 5 Divergence time estimations for HPV58 variant lineages
| Calibration | Rate (10⁻⁸) | Clock model | Tree prior | AICM | Log marginal likelihood | Estimated rate (10⁻⁸), mean | Estimated rate (10⁻⁸), 95% HPD interval | Node 0 MRCA (kya), mean | Node 0 MRCA (kya), 95% HPD interval | Node 1 MRCA (kya), mean | Node 1 MRCA (kya), 95% HPD interval | Node 2 MRCA (kya), mean | Node 2 MRCA (kya), 95% HPD interval | Node 3 MRCA (kya), mean | Node 3 MRCA (kya), 95% HPD interval | Node 4 MRCA (kya), mean | Node 4 MRCA (kya), 95% HPD interval | Node 5 MRCA (kya), mean | Node 5 MRCA (kya), 95% HPD interval | Node 6 MRCA (kya), mean | Node 6 MRCA (kya), 95% HPD interval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No | Feline PV, 1.95 (1.32–2.47) | Relaxed | Bayesian | 32,649 | −16,239.82 | 1.84 | 1.61, 2.08 | 451.6 | 306.0, 619.6 | 191.9 | 125.0, 266.5 | 149.8 | 98.1, 206.8 | 356.6 | 246.5, 477.2 | 281.6 | 193.7, 376.6 | 220.1 | 147.1, 296.8 | 128.3 | 82.7, 178.7 |
| No | Feline PV, 1.95 (1.32–2.47) | Relaxed | Yule | +7 | −16,241.02 | 1.95 | 1.73, 2.17 | 318.8 | 241.4, 407.8 | 179.1 | 125.9, 235.8 | 146.9 | 105.8, 192.3 | 270.8 | 202.3, 343.1 | 223.2 | 164.1, 283.3 | 182.6 | 134.7, 236.2 | 120.8 | 86.3, 160.8 |
| No | Feline PV, 1.95 (1.32–2.47) | Relaxed | Constant | +10 | −16,239.91 | 1.67 | 1.40, 1.94 | 545.9 | 352.7, 778.2 | 232.7 | 143.6, 341.5 | 182.0 | 112.9, 258.5 | 433.7 | 293.1, 608.5 | 343.4 | 230.9, 474.1 | 269.3 | 180.3, 377.8 | 158.7 | 97.5, 223.1 |
| No | Feline PV, 1.95 (1.32–2.47) | Strict | Bayesian | +33 | −16,287.82 | 1.89 | 1.33, 2.41 | 397.4 | 272.0, 545.9 | 187.4 | 123.8, 265.7 | 156.3 | 102.6, 221.0 | 327.6 | 223.6, 455.3 | 274.2 | 188.4, 379.3 | 230.3 | 157.6, 322.2 | 133.7 | 86.1, 189.4 |
| No | Feline PV, 1.95 (1.32–2.47) | Strict | Yule | +37 | −16,288.68 | 1.96 | 1.40, 2.47 | 357.6 | 252.5, 496.5 | 178.5 | 120.8, 251.8 | 151.3 | 103.3, 215.8 | 298.1 | 209.3, 415.9 | 252.3 | 176.0, 352.4 | 213.9 | 149.9, 300.7 | 129.1 | 86.6, 184.2 |
| No | Feline PV, 1.95 (1.32–2.47) | Strict | Constant | +41 | −16,289.52 | 1.90 | 1.33, 2.42 | 409.5 | 286.8, 570.3 | 197.4 | 128.4, 277.6 | 166.1 | 110.4, 237.8 | 339.8 | 228.3, 468.4 | 285.8 | 195.1, 397.5 | 240.9 | 163.8, 337.1 | 142.6 | 92.4, 201.2 |
| 2 Cali. | 1.84 (1.43–2.21) | Relaxed | Yule | +7 | −16,241.14 | 1.72 | 1.41, 2.08 | 425.4 | 339.6, 516.6 | 213.4 | 142.7, 294.7 | 171.3 | 115.1, 237.1 | 320.2 | 228.0, 417.0 | 252.6 | 177.5, 331.1 | 198.1 | 141.5, 265.8 | 108.7 | 85.7, 132.0 |
| 2 Cali. | 1.84 (1.43–2.21) | Relaxed | Constant | +9 | −16,240.87 | 1.81 | 1.52, 2.08 | 490.8 | 402.1, 580.2 | 224.6 | 143.5, 329.3 | 176.5 | 111.8, 255.6 | 385.2 | 270.2, 499.6 | 298.0 | 205.1, 402.9 | 226.4 | 151.0, 314.8 | 106.8 | 84.1, 130.8 |
| 2 Cali. | 1.84 (1.43–2.21) | Strict | Yule | +40 | −16,288.79 | 1.83 | 1.50, 2.16 | 348.9 | 301.8, 398.8 | 174.5 | 142.8, 208.9 | 147.9 | 119.3, 176.8 | 290.3 | 246.6, 334.2 | 245.8 | 209.6, 283.6 | 208.3 | 175.0, 241.1 | 125.7 | 102.0, 151.8 |
| 2 Cali. | 1.84 (1.43–2.21) | Strict | Bayesian | +45 | −16,289.89 | 1.85 | 1.54, 2.18 | 418.7 | 341.5, 494.1 | 192.4 | 143.4, 241.6 | 159.2 | 116.8, 200.8 | 327.9 | 257.8, 397.8 | 269.9 | 214.8, 329.9 | 222.6 | 174.2, 272.1 | 114.5 | 93.8, 135.3 |
| 2 Cali. | 1.84 (1.43–2.21) | Strict | Constant | +46 | −16,291.17 | 1.91 | 1.62, 2.21 | 417.2 | 343.8, 491.1 | 197.3 | 150.9, 248.6 | 165.0 | 125.8, 208.9 | 328.6 | 265.3, 400.5 | 271.5 | 217.7, 328.8 | 224.5 | 177.6, 273.1 | 117.0 | 96.0, 138.2 |
Node numbers match the notation in Fig. 4.
Two time points were introduced in the HPV58 variant tree to calibrate the time estimate. Boldface indicates the time estimate with the “best” models.
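As a rough plausibility check on the node ages, a strict-clock back-of-envelope estimate divides a pairwise divergence by twice the substitution rate, t = d / (2r). This is not the Bayesian MCMC analysis behind the table; it ignores multiple hits and rate variation among branches. It also assumes that the tabulated rates are in units of 10⁻⁸ substitutions per site per year, and that the 1.7% maximum complete-genome pairwise difference from the variation table is a fair proxy for divergence across the root. The function name is hypothetical.

```python
def divergence_time_years(pairwise_difference: float, rate_per_site_per_year: float) -> float:
    """Strict-clock estimate of time since two lineages split.

    Both lineages accumulate substitutions independently after the split,
    so the pairwise distance grows at twice the per-lineage rate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return pairwise_difference / (2.0 * rate_per_site_per_year)


# 1.7% maximum complete-genome difference; rates assumed to be 10^-8
# substitutions per site per year, as listed in the rate column.
max_difference = 0.017
for label, rate in (("rate 1.84e-8", 1.84e-8), ("rate 1.95e-8", 1.95e-8)):
    kya = divergence_time_years(max_difference, rate) / 1000.0
    print(f"{label}: about {kya:.0f} thousand years")
```

Both rates give values in the low-to-mid 400s of thousands of years, the same order of magnitude as the node 0 estimates in the table and the 478,600-year figure in the abstract.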
FIG 4 Divergence time estimation for HPV58 variants. A Bayesian MCMC method with a tree prior of a coalescent Bayesian skyline model and a UCLD molecular clock model of rate variation among branches under an HHS scenario, as the best model as determined by AICM (Table 5), was used to calculate the divergence times. An HPV16 variant substitution rate and two human evolutionary time points of calibration (arrowed at nodes 0 and 6) were set. Branch lengths are proportional to the times scaled in thousands of years. Gray bars indicate the 95% HPD for the corresponding divergence age. The branches are coded (Br0 to Br14), and ancestral codon mutations are listed in Table S3 in the supplemental material.
FIG 5 Schematic illustration of HPV58 codivergence with archaic hominins. The model is based on HPV58 variant divergence time estimations, phylogenetic topology, and geographic distributions that superimpose ancestral viral transmission between Neanderthals/Denisovans and modern human populations. The time points labeled t mark, in turn, the split between Neanderthals/Denisovans and modern humans, the speciation of modern humans, the era of population expansion of modern humans out of Africa, the time of gene flow (f) that may have occurred between modern humans and Neanderthals/Denisovans, and the estimated extinction of Neanderthals/Denisovans. The arrows indicate the out-of-Africa migration events of archaic and modern human populations. The broken lines indicate the potential extinction of viral variants. Branch lengths and widths are not drawn to scale.