Anouk Willemsen, Ignacio G Bravo.
Abstract
Papillomaviruses (PVs) are ancient viruses infecting vertebrates, from fishes to mammals. Although the genomes of PVs are small and show conserved synteny, PVs display large genotypic diversity and ample variation in the phenotypic presentation of the infection. Most PV genomes contain two small early genes, E6 and E7. In a group of closely related human papillomaviruses (HPVs), the E6 and E7 proteins provide the viruses with oncogenic potential. The recent discoveries of PVs without E6 and E7 in different fish species place a new root on the PV tree, and suggest that ancestral PVs consisted of the minimal PV backbone E1-E2-L2-L1. Bayesian phylogenetic analyses date the most recent common ancestor of the PV backbone to 424 million years ago (Ma). Common ancestry tests on extant E6 and E7 genes indicate that they share a common ancestor dating back to at least 184 Ma. In AlphaPVs infecting Old World monkeys and apes, the appearance of the E5 oncogene 53-58 Ma coincided with (i) a significant increase in substitution rate, (ii) a basal radiation and (iii) key gains of function in E6 and E7. This series of events was instrumental in constructing the extant phenotype of oncogenic HPVs. Our results assemble the current knowledge on PV diversity and present an ancient evolutionary timeline punctuated by evolutionary innovations in the history of this successful viral family. This article is part of the theme issue 'Silent cancer agents: multi-disciplinary modelling of human DNA oncoviruses'.
Keywords: genome evolution; oncogenes; papillomaviruses; phylogenetic dating; virus evolution
Year: 2019 PMID: 30955499 PMCID: PMC6501903 DOI: 10.1098/rstb.2018.0303
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. Dated Bayesian phylogenetic tree for a dataset containing 343 PVs. The tree was constructed at the nucleotide level based on the concatenated E1-E2-L2-L1 genes. The scale bar is given in millions of years ago (Ma). Values at the nodes correspond to posterior probabilities, where asterisks indicate full support. Error bars encompass the 95% highest posterior density for the age of the nodes. Clock symbols indicate the nodes used for calibration. Clades are coloured according to the PV crown group classification, as indicated in the legend on the left. To the right of the tree, the taxonomic group (superorder, class, order, parvorder, no rank) is the one that best summarizes the host clades. Below the tree, a geological time scale is drawn. The matrix next to the taxonomic host groups indicates the presence/absence of the E6, E7 and E5 genes for each PV (see legend), and the classification of E5 (α, β, γ, δ, ε, ζ) is indicated within the matrix. Next to the matrix, the size of the oncogenes is plotted. (Online version in colour.)
Inferred node age in millions of years ago (Ma) for the most recent common ancestors (MRCA) of the different PV clades and for the root of the tree. The rows of the PV crown groups are named accordingly (figure 1), otherwise the taxonomic host group is given. An asterisk indicates the presence of one exception within the clade of PVs infecting mammals, which is a python PV (discussed in the text). The differences between the ancestral node ages of the crown groups as well as the root are significant after performing a Kruskal–Wallis rank sum test (chi-sq. = 51993, d.f. = 5, p < 2.2 × 10⁻¹⁶) and a multiple comparison test after Kruskal–Wallis (electronic supplementary material, table S3). Although the inferred times and the posterior distributions for the ancestral Alpha–Omikron and Delta–Zeta as well as the Beta–Xi and Lambda–Mu clades are similar (electronic supplementary material, figure S3), the significant difference between these groups was confirmed by a Wilcoxon rank sum test (W = 57 451 000, p < 2.2 × 10⁻¹⁶ and W = 50 158 000, p < 2.2 × 10⁻¹⁶, respectively).
| PV clade | MRCA age (Ma) | 95% HPD |
|---|---|---|
| root | 424 | 402–446 |
| amniotes | 184 | 161–208 |
| Aves/Testudines (grey) | 171 | 148–195 |
| mammals* | 121 | 112–132 |
| Lambda–Mu (yellow) | 95 | 90–100 |
| Beta–Xi (green) | 94 | 88–100 |
| Delta–Zeta (blue) | 84 | 79–90 |
| Alpha–Omikron (red) | 83 | 76–90 |
| Sirenia (black) | 77 | 68–87 |
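The Kruskal–Wallis comparison mentioned in the table caption ranks the pooled node-age samples and asks whether the rank sums differ between clades. The following is a minimal sketch of the H statistic in plain Python, not the authors' analysis: the sample values are hypothetical placeholders (the posterior samples are not reproduced in this record), and the tie correction is omitted for brevity. H would then be compared against a chi-squared distribution with (number of groups − 1) degrees of freedom.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical node-age samples in Ma, for illustration only.
alpha_omikron = [83.1, 80.4, 85.9, 82.2]
delta_zeta = [84.5, 86.0, 81.7, 88.3]
lambda_mu = [95.2, 93.8, 97.1, 94.4]
print(kruskal_wallis_h([alpha_omikron, delta_zeta, lambda_mu]))
```

With the very large posterior samples implied by the reported statistics, even clades with similar median ages can differ significantly, which is consistent with the caption's remark about Alpha–Omikron versus Delta–Zeta.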
Testing for common ancestry of E6 and E7 on reduced dataset 1. The test was performed using the software Bali-Phy on the concatenated E6 and E7 amino acid sequences as well as on E6 and E7 separately. The log marginal likelihoods (P(data|M)) are indicated for the Common Ancestry (CA) model (H0) and the alternative Independent Origin (IO) models (H1–H7). The Bayes factor for CA is calculated as ΔBF = log [Prob(CA)] − log [Prob(IO)], such that positive values favour CA and negative values indicate IO.
| model | E6+E7 concatenated: log P(data\|M) | E6+E7 concatenated: ΔBF | E6: log P(data\|M) | E6: ΔBF | E7: log P(data\|M) | E7: ΔBF |
|---|---|---|---|---|---|---|
| H0: (grey-blue-yellow-red-black-green) | −19083.055 | 0 | −10960.151 | 0 | −8239.297 | 0 |
| H1: grey+(blue-yellow-red-black-green) | −19425.259 | 342.204 | −11008.514 | 48.363 | −8300.605 | 61.308 |
| H2: grey+blue+(yellow-red-black-green) | −19595.075 | 512.020 | −11102.586 | 142.435 | −8391.373 | 152.076 |
| H3: grey+blue+yellow+(red-black-green) | −19883.098 | 800.043 | −11266.682 | 306.531 | −8515.244 | 275.947 |
| H4: grey+blue+yellow+red+(black-green) | −20108.285 | 1025.230 | −11390.535 | 430.384 | −8644.525 | 405.228 |
| H5: grey+blue+(red-black)+(green-yellow) | −19850.013 | 766.958 | −11239.646 | 279.495 | −8541.050 | 301.753 |
| H6: grey+blue+red+black+(green-yellow) | −22355.489 | 3272.434 | −11352.869 | 392.718 | −8638.945 | 399.648 |
| H7: grey+blue+yellow+red+black+green | −20317.714 | 1234.659 | −11503.409 | 543.258 | −8748.908 | 509.611 |
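The ΔBF column can be reproduced from the log marginal likelihoods using the definition in the caption. The sketch below does this for the first numeric column of the table above; the dictionary and function names are illustrative, not part of Bali-Phy.

```python
# Log marginal likelihoods copied from the first numeric column of the table.
log_marginal = {
    "H0": -19083.055,
    "H1": -19425.259,
    "H2": -19595.075,
    "H3": -19883.098,
    "H4": -20108.285,
    "H5": -19850.013,
    "H6": -22355.489,
    "H7": -20317.714,
}


def delta_bf(log_ml_ca, log_ml_io):
    """Log Bayes factor for common ancestry: log P(CA) - log P(IO)."""
    return log_ml_ca - log_ml_io


# Positive values favour the common ancestry model H0 over each alternative.
deltas = {
    model: round(delta_bf(log_marginal["H0"], value), 3)
    for model, value in log_marginal.items()
}

# The independent-origin model that comes closest to H0 has the smallest delta.
closest_io = min((m for m in deltas if m != "H0"), key=deltas.get)

for model, value in deltas.items():
    print(model, value)
print("closest independent-origin model:", closest_io)
```

Every alternative yields a positive ΔBF, so under this definition the common ancestry model is favoured over each independent-origin scenario listed.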
Testing for common ancestry of E6 and E7 on reduced dataset 2. The test was performed using the software Bali-Phy on the concatenated E6 and E7 amino acid sequences as well as on E6 and E7 separately. The log marginal likelihoods (P(data|M)) are indicated for the Common Ancestry (CA) model (H0) and the alternative Independent Origin (IO) models (H1–H7). The Bayes factor for CA is calculated as ΔBF = log [Prob(CA)] − log [Prob(IO)], such that positive values favour CA and negative values indicate IO.
| model | E6+E7 concatenated: log P(data\|M) | E6+E7 concatenated: ΔBF | E6: log P(data\|M) | E6: ΔBF | E7: log P(data\|M) | E7: ΔBF |
|---|---|---|---|---|---|---|
| H0: (grey-blue-yellow-red-black-green) | −19083.055 | 0 | −10847.336 | 0 | −8106.947 | 0 |
| H1: grey+(blue-yellow-red-black-green) | −19165.438 | 82.383 | −10915.029 | 67.693 | −8153.371 | 46.424 |
| H2: grey+blue+(yellow-red-black-green) | −19406.063 | 323.008 | −11020.832 | 173.496 | −8281.371 | 174.424 |
| H3: grey+blue+yellow+(red-black-green) | −19676.573 | 593.518 | −11174.707 | 327.371 | −8424.371 | 317.424 |
| H4: grey+blue+yellow+red+(black-green) | −19921.623 | 838.568 | −11306.744 | 459.408 | −8539.261 | 432.314 |
| H5: grey+blue+(red-black)+(green-yellow) | −19682.881 | 599.826 | −13601.982 | 2754.646 | −8411.536 | 304.589 |
| H6: grey+blue+red+black+(green-yellow) | −19886.211 | 803.156 | −13704.295 | 2856.959 | −8511.299 | 404.352 |
| H7: grey+blue+yellow+red+black+green | −20121.000 | 1037.945 | −11412.956 | 565.620 | −8642.184 | 535.237 |
Figure 2. Zoom-in on the Alpha–Omikron PV crown group shown in figure 1. Values at the nodes correspond to posterior probabilities, where asterisks indicate full support. Error bars encompass the 95% highest posterior density for the age of the nodes; next to the error bars, the median node age is given in millions of years ago (Ma). Clock symbols indicate the nodes used for calibration. A black arrow indicates the timing of the emergence of the E5 gene in the ancestral PV genome, between 53 and 58 Ma. Boxes display the average evolutionary rate for the complete PV tree (in grey) or for the AlphaPV subtree after the emergence of E5 (in black). On the right side of the tree, the different PV species, the clinical presentation and host taxonomy are given. Dots label HPVs that have been classified by the IARC as carcinogenic to humans (black dots, group I) or probably/possibly carcinogenic to humans (grey dots, groups IIa and IIb). The three barplots on the right represent: (a) the worldwide prevalence of each HPV in women with normal cervical cytology, with error bars indicating the 95% confidence interval; (b) the oncogenic potential of each HPV, proxied as the ratio of its prevalence in cervical cancers to its prevalence in normal cervical cytology, with error bars indicating the 95% confidence interval; (c) the E6-mediated p53 degradation activity, expressed as the inverse of the EC50 in ng of E6 protein needed to degrade cellular p53, with higher values indicating an enhanced potential of E6 to degrade p53; error bars indicate an approximation of the standard error of the mean. The first two barplots contain data obtained from the ICO/IARC HPV Information Centre (http://www.hpvcentre.net/), while the third contains data obtained from Mesplede et al. [7]. The correlation analysis for the second and third barplots is shown in the inset at the bottom. For the raw data of the barplots, see electronic supplementary material, table S4. (Online version in colour.)
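The two derived quantities in barplots (b) and (c) are simple transformations of the underlying measurements. As a minimal sketch, assuming prevalence is given in percent and EC50 in ng, they could be computed as follows; the function names and the input values are hypothetical and are not taken from table S4.

```python
def oncogenic_potential(prevalence_cancer, prevalence_normal):
    """Ratio of prevalence in cervical cancers to prevalence in normal cytology."""
    if prevalence_normal <= 0:
        raise ValueError("prevalence in normal cytology must be positive")
    return prevalence_cancer / prevalence_normal


def p53_degradation_activity(ec50_ng):
    """Inverse EC50: higher values mean less E6 is needed to degrade p53."""
    if ec50_ng <= 0:
        raise ValueError("EC50 must be positive")
    return 1.0 / ec50_ng


# Hypothetical inputs for a single HPV type, for illustration only.
print(oncogenic_potential(prevalence_cancer=10.0, prevalence_normal=2.0))
print(p53_degradation_activity(ec50_ng=4.0))
```

Taking the inverse of the EC50 puts both quantities on a scale where larger means more oncogenic or more active, which makes a correlation between them easier to read.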