Literature DB >> 28025273

Transmission between Archaic and Modern Human Ancestors during the Evolution of the Oncogenic Human Papillomavirus 16.

Ville N Pimenoff1,2, Cristina Mendes de Oliveira3,4, Ignacio G Bravo3,5.   

Abstract

Every human suffers through life a number of papillomaviruses (PVs) infections, most of them asymptomatic. A notable exception are persistent infections by Human papillomavirus 16 (HPV16), the most oncogenic infectious agent for humans and responsible for most infection-driven anogenital cancers. Oncogenic potential is not homogeneous among HPV16 lineages, and genetic variation within HPV16 exhibits some geographic structure. However, an in-depth analysis of the HPV16 evolutionary history was still wanting. We have analyzed extant HPV16 diversity and compared the evolutionary and phylogeographical patterns of humans and of HPV16. We show that codivergence with modern humans explains at most 30% of the present viral geographical distribution. The most explanatory scenario suggests that ancestral HPV16 already infected ancestral human populations and that viral lineages co-diverged with the hosts in parallel with the split between archaic Neanderthal-Denisovans and ancestral modern human populations, generating the ancestral HPV16A and HPV16BCD viral lineages, respectively. We propose that after out-of-Africa migration of modern human ancestors, sexual transmission between human populations introduced HPV16A into modern human ancestor populations. We hypothesize that differential coevolution of HPV16 lineages with different but closely related ancestral human populations and subsequent host-switch events in parallel with introgression of archaic alleles into the genomes of modern human ancestors may be largely responsible for the present-day differential prevalence and association with cancers for HPV16 variants.
© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  Hominin evolution; divergence; evolutionary medicine; host-switch; human papillomavirus; infection and cancer; sexually transmitted infection; variant; virus-host coevolution

Mesh:

Year:  2016        PMID: 28025273      PMCID: PMC5854117          DOI: 10.1093/molbev/msw214

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Virus lifestyle shapes viral population structure through essential virus life traits, such as infectivity, immunogenicity, latency, transmission rate, mutation rate, virion productivity, and duration of the infection (Sharp 2002). The interaction between the virus and the host’s immune system strongly influences viral evolution: immune memory reduces the number of susceptible hosts, and differential immune recognition and response against different pathogen phenotypes may favour selection of certain viral lineages (Holmes 2008). Finally, the evolutionary history of the host constrains further the population structure of the virus. Thus, for viruses and hosts with a long shared evolutionary history, the virus population is shaped by population-level processes of genetic drift and natural selection, in both the virus and the host populations (Holmes 2008). Papillomaviruses (PVs) have a long common history with amniotes (Bravo and Félez-Sánchez 2015). PVs are double-stranded DNA viruses, with a circular genome of about 8 kb. More than 300 PVs have been described and above 200 of them have been retrieved from humans (Bzhalava et al. 2015). Human PVs (HPVs) infect dividing epithelia, and virtually all humans are hosts to a variable number of HPVs infections at cutaneous (Antonsson et al. 2003) but often also at mucosal sites (Ma et al. 2014). The viral infection does not kill the target cell and in most cases PVs can complete their life cycle and be maintained as a chronic, asymptomatic infection with viral genomes remaining episomally as plasmids in the infected cell (Doorbar et al. 2012). In some cases, they can cause productive, wart-like lesions or even be involved in certain cancers (Doorbar et al. 2012). The fine balance between viral replication and host immune tolerance suggests a long coexistence between PVs and their hosts, and indeed ancestral PVs probably infected the ancestral amniotes (García-Vallvé et al. 2005) and possibly all bony vertebrates (López-Bueno et al. 2016). The evolution of PVs in mammals likely started with an initial adaptive radiation event that generated a handful of viral crown groups, linked to the evolution of mammalian fur and skin glands (Gottschling et al. 2011; Bravo and Félez-Sánchez 2015). This diversification was followed by limited host-linked evolution and encompassed also incomplete lineage sorting and host-switch events (Gottschling et al. 2007; Gottschling et al. 2011). At shallower evolutionary timescale, it has been commonly assumed that HPV16 has co-diverged with modern human populations, based on certain geographical structure of HPV16 variant distribution (Bernard 1994; Cornet et al. 2012), but this hypothesis has never been rigorously tested. Infections by anogenital HPVs are very common: >80% of all sexually active adults are infected by one of these HPVs at least once in their lifetime (Einstein et al. 2009; Chesson et al. 2014), and the time point prevalence of cervical HPVs infections in sexually active healthy women is estimated to be around 12% (Bruni et al. 2010) or even higher in younger women (Molden et al. 2016). At least 12 evolutionarily related HPVs are carcinogenic to humans (Bouvard et al. 2009), causing different fractions of anogenital and oropharyngeal cancers (Moscicki et al. 2012). HPV16 is not only the most prevalent HPV in infection-associated cancers of the cervix, vulva, vagina, anus, penis, and oropharynx (Moscicki et al. 2012) but also the most prevalent HPV infection of the genital tract occurring asymptomatically in healthy individuals (Bruni et al. 2010). Within the HPV16 lineage, distribution of genetic diversity presents certain geographic structure, with different HPV16 variants showing differential prevalence in different geographical regions (Cornet et al. 2012). Furthermore, the oncogenic potential is not homogeneous for all HPV16 variants (Schiffman et al. 2010) and varies in association with different human populations (Villa et al. 2000; Berumen et al. 2001; Burk et al. 2003; Xi et al. 2006; Zuna et al. 2011; Cornet et al. 2013). Such differential connection between viral phylogeography and oncogenic potential might indicate that the evolution of HPV16 lineage has been at least partly shaped by the differential host immune response. In this study, we have analyzed human and virus genomic data to infer the evolutionary history of HPV16. The common understanding about the evolution of HPVs is that the modern repertoire of diverse HPVs has coevolved with modern humans as a host population, so that the current geographic distribution of HPVs reflects modern human migration dispersal patterns (Ho et al. 1993; Bernard 1994). Here, we show for the first time that differential coevolution of HPV16 lineages with different but closely related ancestral human populations together with recent host-switch events may be largely responsible for the present-day differential prevalence and association with cancers for HPV16 variants. The most parsimonious interpretation of our results requires that the ancestor of HPV16 already infected the ancestral human populations >500 thousands years ago (kya). The split between Neanderthals/Denisovans and modern human ancestor populations was mirrored by a split in the viral populations, namely HPV16A, carried by ancestral human populations, and HPV16BCD, carried by the populations of modern human ancestors in Africa. After the last out-of-Africa migration event 60–120 kya, the modern human ancestor populations that left Africa carried a reduced sample of the viral diversity in the continent: the HPV16B lineage remained in African populations but was lost by lineage sorting in those leaving the continent. The viral lineage HPV16CD gave rise in allopatry to the HPV16C variant in Africa and to the HPV16D variant outside Africa. Later, the interbreeding events between Neanderthal and Denisovan populations with modern human ancestor populations lead to a host-switch through sexual transmission of the HPV16A virus lineage from archaic populations into the modern human ancestors. The HPV16A lineage thus transmitted expanded rapidly in the new host populations and became dominant in Eurasia and in the Americas.

Results

HPV16 Phylogeography

We inferred the phylogenetic relationship among all available, non-recombinant HPV16 genomes (n = 118) under maximum likelihood (ML) framework for both the full-genome alignment (supplementary material S1, Supplementary Material online) and for the coding region alignment (supplementary material S2, see also supplementary table S1, Supplementary Material online). Phylogenetic inference recovered all previously reported HPV16 variant lineages, and we retained for subsequent analysis HPV16A1-3, A4, B, C, and D lineages, all supported by >98% bootstrap values for the full-genome alignment (fig. 1A and supplementary fig. S1, Supplementary Material online). Pairwise distances under ML were significantly larger (2.28 times, 95% confidence interval [CI], 1.88–2.69; Wilcoxon–Mann–Whitney test P = 2.2e–16) for the full-genome than for the selection-filtered alignment (supplementary fig. S2, Supplementary Material online).
F

Phylogenetic tree of the 118 HPV16 complete genomes. (A) Unrooted ML phylogenetic tree inferred using the complete coding region alignment after excluding a total of 26 positively selected and 115 negatively selected codons. The final alignment encompassed 118 sequences, 6,093 bp and 316 distinct alignment patterns. Bootstrap support after 500 replicates is given for branches leading to the A1–3, A4, B, C, and D variant lineages. Scale bar in substitutions per nucleotide site. (B) Maximum clade credibility tree inferred for selection-filtered HPV16 genome coding region alignment under the HHS scenario, 9.6 × 10−9 [5.5 × 10−9–13.6 × 10−9] subs/site/yr as substitution rate prior and enforcing two time point calibrations: archaic Hominin divergence at 500 kya (95% CI, 400–600 kya) and recent modern human migration out-of-Africa at 90 kya (95% CI, 60–120 kya). Scale bar in millions of years ago. Node bars indicate the 95% probability intervals for the corresponding node age.

Phylogenetic tree of the 118 HPV16 complete genomes. (A) Unrooted ML phylogenetic tree inferred using the complete coding region alignment after excluding a total of 26 positively selected and 115 negatively selected codons. The final alignment encompassed 118 sequences, 6,093 bp and 316 distinct alignment patterns. Bootstrap support after 500 replicates is given for branches leading to the A1–3, A4, B, C, and D variant lineages. Scale bar in substitutions per nucleotide site. (B) Maximum clade credibility tree inferred for selection-filtered HPV16 genome coding region alignment under the HHS scenario, 9.6 × 10−9 [5.5 × 10−9–13.6 × 10−9] subs/site/yr as substitution rate prior and enforcing two time point calibrations: archaic Hominin divergence at 500 kya (95% CI, 400–600 kya) and recent modern human migration out-of-Africa at 90 kya (95% CI, 60–120 kya). Scale bar in millions of years ago. Node bars indicate the 95% probability intervals for the corresponding node age. To assess HPV16 phylogeography, we analyzed a total of 1,719 HPV16 isolate cases (alignments available as supplementary materials S3–S4, Supplementary Material online) by complementing the 118 HPV16 full-genome sequences with 1,601 partial HPV16 isolate sequences spanning the LCR, the E6 oncogene, and the L2 capsid gene. A total of 1,680 global HPV16 isolate sequences with known geographical origin could be unambiguously assigned to a precise HPV16 lineage using an evolutionary placement algorithm (supplementary table S2, Supplementary Material online). Phylogeography of the 1,680 isolate cases showed that the HPV16A1-3 lineage predominated in Europe, South Asia, and Central/South America, and was also present in all other continental subgroups, albeit with very low prevalence in sub-Saharan Africa (fig. 2). HPV16A4 was the most prevalent lineage in East Asia and was also present in North America, but was virtually absent elsewhere. Variants B and C were largely restricted to Africa and were especially prevalent in Sub-Saharan Africa, although they were also observed in North America. Variant D was present in all continents, displaying low prevalence in Sub-Saharan Africa, and the highest frequency in Central/South America.
F

Phylogeographic distribution of the 1,680 HPV16 sequences encompassing the LCR, E6 and L2 genome loci. Each sequence was assigned to a specific HPV16 variant lineage (see the color coding for A1–3, A4, B, C, and D variants). The size of the pie charts is proportional to the number of sequences from the corresponding geographic region (supplementary table S3, supplementary material online).

Phylogeographic distribution of the 1,680 HPV16 sequences encompassing the LCR, E6 and L2 genome loci. Each sequence was assigned to a specific HPV16 variant lineage (see the color coding for A1–3, A4, B, C, and D variants). The size of the pie charts is proportional to the number of sequences from the corresponding geographic region (supplementary table S3, supplementary material online). To explain the evolution and diversity of HPV16 and to infer its origins, we explored the differential fit to the data of two alternative scenarios: 1) the recent-out-of-Africa (ROOA) model of exclusive codivergence for HPV16 with modern human populations and 2) the Hominin-host-switch (HHS) model, including a viral transmission between archaic and ancestral modern human populations. On the one hand, the ROOA scenario required exclusive codivergence of HPV16 lineages with modern human dispersals. On the other hand, the HHS scenario implied an ancestral virus-host codivergence episode during the split between Homo sapiens and H. neanderthalensis lineages, followed by more recent transmission events between Neanderthals/Denisovans and modern human ancestors (fig. 3). Both scenarios are plausible given the topology of the phylogenetic relationships between HPV16 lineages (fig. 1) and the impossibility to root the HPV16 tree using extant HPVs sequences.
F

Timeline of divergence for archaic and modern human ancestors, and for HPV16. (A) Dated and classification-confirmed Hominin fossil taxa (light green represents uncertain classification, as reviewed by Scally and Durbin 2012). (B) Phylogenetic relationship of modern human, Neanderthal and Denisovan populations. Vertical arrows indicate the proposed interbreeding and subsequent gene flow from the Neanderthal and Denisovan populations to the ancestors of present-day modern humans. The evolutionary relationships between Neanderthals and Denisovans are still unresolved (Sawyer et al. 2015; Stringer and Barnes 2015). (C) Phylogenetic relationships for 118 selection-filtered HPV16 coding genome sequences using maximum clade credibility tree under the Hominin host-switch scenario. Blue bars indicate the 95% probability intervals for the corresponding node age. Arrows indicate the nodes used for calibration.

Timeline of divergence for archaic and modern human ancestors, and for HPV16. (A) Dated and classification-confirmed Hominin fossil taxa (light green represents uncertain classification, as reviewed by Scally and Durbin 2012). (B) Phylogenetic relationship of modern human, Neanderthal and Denisovan populations. Vertical arrows indicate the proposed interbreeding and subsequent gene flow from the Neanderthal and Denisovan populations to the ancestors of present-day modern humans. The evolutionary relationships between Neanderthals and Denisovans are still unresolved (Sawyer et al. 2015; Stringer and Barnes 2015). (C) Phylogenetic relationships for 118 selection-filtered HPV16 coding genome sequences using maximum clade credibility tree under the Hominin host-switch scenario. Blue bars indicate the 95% probability intervals for the corresponding node age. Arrows indicate the nodes used for calibration. To test the explanatory power of the ROOA model, we analyzed the correlation between geography-based population structure of modern humans and of HPV16 by comparing genetic distances between geographically defined human metapopulations with the genetic distances between HPV16 isolates. We used nine previously described metapopulation groups (supplementary table S3, Supplementary Material online), excluding North America due to lack of human genome data, and tested the correlation between the FST distance matrices for humans and for HPV16 (supplementary fig. S3, Supplementary Material online). We observed indeed correlation between both matrices, but geographical structure across different human genome loci explained <30% of the worldwide HPV16 geographical structure (table 1, see also supplementary table S4, Supplementary Material online). Notably, most of the European and sub-Saharan African ascertained autosomal human genome loci variability showed significant but limited (<30%) correlation with the HPV16 LCR-E6 variability. Very interestingly, after excluding the sites under selection in the HPV16 E6 gene only the correlation with the mitochondrial genome variability and most of the sub-Saharan African but not of the European ascertained autosomal genome data remained significant, and accounted for barely 15% of the global distribution of HPV16 diversity. Nevertheless, it is important to note that after controlling for false discovery rate none of the human genome and of HPV16 correlations showed to be significant.
Table 1

Mantel’s Permutation Test for Similarity of FST Distance Matrices between HPV16 and Human Genome Sequence Data Sets.

Selection-filtered
Including positions under selection
TaxaLociRegion/SNPsnCorrelation (%)aP-valuebFDRcCorrelation (%)aP-valuebFDRc
mtDNA116 kb genome87512.40.0370.01715.80.0260.018
NRY15,000 kb5518.50.0620.02811.80.0440.036
Europechr16,3537738.80.0620.02912.70.0330.026
Europechr25,9187736.30.0910.0409.60.0540.040
Europechr35,3257738.80.0630.03212.70.0370.030
Europechr44,2117738.90.0630.03012.70.0360.029
Europechr54,7437735.60.1040.0478.70.0610.047
Europechr64,7367736.30.0950.0419.50.0550.042
Europechr74,1627735.20.1120.0488.10.0620.048
Europechr83,7507738.00.0700.03611.60.0450.037
Europechr93,2227738.80.0640.03312.70.0380.033
Europechr104,2237739.20.0590.02613.10.0350.028
Europechr113,8507735.90.0990.0469.20.0580.046
Europechr123,3397735.10.1160.0498.00.0660.049
Europechr132,2347735.80.0980.0459.20.0570.045
Europechr142,3077735.90.0970.0439.30.0560.043
Europechr152,3467736.20.0960.0429.30.0540.041
Europechr162,6367737.50.0780.03811.10.0450.038
Europechr172,1097736.80.0840.03910.10.0450.039
Europechr182,2437739.80.0560.02513.90.0330.025
Europechr191,1757733.90.1390.0506.50.0850.050
Europechr202,0677739.90.0550.02313.80.0310.021
Europechr2197377310.00.0550.02414.10.0310.022
Europechr221,1417737.70.0730.03711.60.0420.034

Note.—HPV16 sequence data (1,192 bp) encompassing LCR (735 bp) and E6 (457 bp) regions for 1,101 worldwide isolates were used in the correlations. mitochondrial DNA (mtDNA); nonrecombining part of the Y chosomosome (NRY); autosomal SNP data stratified by chromosome and ascertained using a European (Europe) ancestry data (autosomal SNP data ascertained using a sub-Saharan ancestry data is provided in supplementary table S8, supplementary material online). For both datasets nine geographic metapopulations were defined: North/East/West/Southern Africa, South Asia, East Asia, Europe, Central and South America (supplementary table S2 and S5. supplementary material online).

R2 of the correlation coefficient.

Mantel-test after 10,000 permutations.

False discovery rate controlling procedure (q = 0.05).

Mantel’s Permutation Test for Similarity of FST Distance Matrices between HPV16 and Human Genome Sequence Data Sets. Note.—HPV16 sequence data (1,192 bp) encompassing LCR (735 bp) and E6 (457 bp) regions for 1,101 worldwide isolates were used in the correlations. mitochondrial DNA (mtDNA); nonrecombining part of the Y chosomosome (NRY); autosomal SNP data stratified by chromosome and ascertained using a European (Europe) ancestry data (autosomal SNP data ascertained using a sub-Saharan ancestry data is provided in supplementary table S8, supplementary material online). For both datasets nine geographic metapopulations were defined: North/East/West/Southern Africa, South Asia, East Asia, Europe, Central and South America (supplementary table S2 and S5. supplementary material online). R2 of the correlation coefficient. Mantel-test after 10,000 permutations. False discovery rate controlling procedure (q = 0.05).

Time Inference for HPV16 Diversification

To estimate divergence times for HPV16 diversification under the two alternative evolutionary scenarios, we applied a Bayesian Markov chain Monte Carlo (MCMC) inference. First, we tested whether we could accurately identify the root of the HPV16 genome diversity by performing phylogenetic inference incorporating the most closely related HPVs, members of Alphapapillomaviruses species 9. The long evolutionary distances between these viruses and HPV16 prevented unambiguous identification of the root of HPV16 lineages, with either ML (RAxML) or Bayesian (Phylobayes) approaches (supplementary fig. S4, Supplementary Material online). In all reconstructions, the preferred topology was (A,(B,(C,D))), which corresponded to the HHS scenario. Nevertheless, the likelihood values for the HPV16 trees obtained after forcing our two alternative scenarios, namely, the HHS and ROOA models, differed by <0.01%, and neither the Shimodaira–Hasegawa test nor the Robinson–Foulds split method showed significant differences between both topologies. When introducing time into the Bayesian (BEAST) phylogenetic analyses without imposing any prior on tree topology, the same (A,(B,(C,D))) topology (i.e., the HHS model) was systematically the preferred one for both HPV16 complete and HPV16 selection-filtered coding region alignments (supplementary fig. S4, Supplementary Material online). The combined difference in topology and branch length between trees obtained with the full-length and with the selection-filtered alignments was very small (as assessed by a K-score < 0.004) (supplementary table S5, Supplementary Material online). We resorted then to dated Bayesian MCMC inference for identifying the more plausible between the two alternative evolutionary scenarios. For the HHS scenario, we used the archaic Hominin divergence at 500 kya (95% CI, 400–600 kya, see references in Materials and Methods) to calibrate the time for the most recent common ancestor (tmrca) of extant HPV16 lineages, as well as the recent modern human migration out-of-Africa at 90 kya (95% CI, 60–120 kya, see references in Materials and Methods) to calibrate the tmrca of lineages HPV16C and D. For the ROOA scenario, we used the recent migration out-of-Africa with the same 90-kya settings as above but in this case to calibrate the tmrca of all HPV16 lineages. Both strict and relaxed clock were tested for all MCMC calibrations and the best log-likelihood value using Bayes Factor (2lnBF) was obtained for relaxed clock (2lnBF > 38). The demographic models of exponential growth or of Bayesian skyline for the population function of the coalescent tree prior yielded in both cases growth rates consistently above zero, excluding thus constant population size as a model. In addition, Bayesian skyline plot estimating changes in effective population size through time showed a significant increase of the HPV16 population in the last 12 kya (supplementary fig. S5, Supplementary Material online). For final analyses, we used the evolutionary models and priors showing the best performance, namely, relaxed log-normal molecular clock and a Bayesian skyline coalescent model (2lnBF > 83). Comparison of divergence time estimates between the HHS and ROOA scenarios for HPV16 evolution is presented in table 2. For the analyses based on the positions evolving neutrally, the log-marginal likelihoods showed the highest value and strongest support (2lnBF = 8.9) for the HHS scenario with a substitution rate 18.4 × 10 − 9 (95% CI, 14.3 ×10−9–22.1 × 10−9) subs/site/yr. For the analyses that included the positions under selection, the HHS model was again the preferred model albeit with a substitution rate two times higher. The log-marginal likelihood value for the HHS scenario including the positions under selection did not significantly differ from that for the ROOA model (2lnBF = 1.2) although the inferred evolutionary rate in the case of ROOA model needed to be around five times higher. Maximum clade credibility tree inferred for selection-filtered HPV16 genome coding region alignment under the HHS scenario and enforcing the two HHS model time point calibrations is presented in fig 1B. Furthermore, to estimate whether our time point calibrations had an impact on the divergence time estimates, we also performed MCMC analysis either by imposing the HHS model topology or without imposing any prior on tree topology and without using any time point calibrations (table 3). Depending on substitution rate priors, divergence times of extant HPV16 lineages showed to be between 260 kya and 4.8 Ma. In all cases, the preferred model was the HHS scenario. Although HPV16A was thus always the basal clade, divergence within HPV16A showed to be lower than within HPV16BCD (supplementary fig. S6, Supplementary Material online).
Table 2

Substitution Rate and Divergence Time Estimates with Highest Probability Density (HPD) in Thousands of Years Ago (kya) Inferred for HPV16 Complete (Full) or Selection-Filtered Genome Alignment (Neutral), under the HHS or the ROOA Scenarios, and Considering Different Priors.

ModelCalibrationRate uniform priorData×10−9 (subs/site/yr)
Log marginal
2lnBFtmrca ABCD (kya)
tmrca A1–4 (kya)
tmrca BCD (kya)
tmrca CD (kya)
×10−9 (subs/site/yr)µ(95% HPD)likelihoodmean(95% HPD)mean(95% HPD)mean(95% HPD)mean(95% HPD)
HHSHHS 2 calibPVs estímate [13.2–24.7] aNeutral18.4(14.5–22.1)12476.94Ref461(365–561)87(48–133)197(121–293)105(82–129)
HHSHHS 2 calibHuman mtDNA [17.6–32.3]bNeutral20.3(16–25.1)−12480.346.8456(358–553)78(42–123)184(109–273)101(76–124)
HHSHHS 2 calibPVs estimate [5.5–13.6]cNeutral12.8(11.1–14.3)−12484.9115.9484(396–580)130(80–190)250(158–360)122(100–144)
HHSHHS 2 calibMammal genome [2.0–2.4]dNeutral3.6(3.1–4.1)−12567.19180.5571(488–655)423(284–578)408(267–575)180(158–201)
HHSHHS 2 calib4.5-4.5E–11Neutral20.2(14.0–26.8)−12482.1810.5457(357–5559)79(40–128)185(105–280)101(75–128)
HHSHHS 1 calib 500 kya4.5-4.5E–11Neutral13.6(8.1–20.3)−12482.4110.9490(389–587)125(55–206)311(157–471)197(95–312)
HHSHHS 1 calib 90 kya4.5-4.5E–11Neutral32.1(17.0–49.6)−12483.2112.5210(92–346)54(23–91)134(72–203)85(53–115)
ROOA90 kya4.5-4.5E–11Neutral80.5(40.3–128)−12481.418.985(54–115)22(9–38)54(24–87)34(14–56)
HHSHHS 2 calibPVs estímate [13.2–24.7]aFull22.9(20.3–25.3)−19313.694.7476(387–568)147(101–200)246(164–343)127(106–148)
HHSHHS 2 calibHuman mtDNA [17.6–32.3]bFull27.5(23.5–31.0)19311.33Ref459(364–551)120(79–166)216(140–300)116(95–138)
HHSHHS 2 calibPVs estimate [5.5–13.6]cFull14.4(13.1–15.8)−19333.8144.9519(431–607)240(167–324)330(212–455)152(8131–173)
HHSHHS 2 calibMammal genome [2.0–2.4]dFull4.6(4.0–5.2)−19502.91383.1650(571–727)616(518–705)411(320–498)214(8189–232)
HHSHHS 2 calib4.5-4.5E–11Full33.9(24.4–43.4)−19314.235.8446(346–544)95(53–142)186(110–271)105(80–132)
HHSHHS 1 calib 500 kya4.5-4.5E–11Full22.0(13.2–31.2)−19311.871.1490(390–586)156(78–240)320(176–473)209(111–319)
HHSHHS 1 calib 90 kya4.5-4.5E–11Full55.0(31.8–82.9)−19311.821199(99–322)63(31–100)130(73–193)84(54–115)
ROOA90 kya4.5-4.5E–11Full130.6(71.8–206.0)−19311.941.284(54–115)27(12–44)55(27–88)36(17–58)

Note.—Likelihood values in bold are the best-supported models, taken as reference (Ref) for the path sampling Bayes factor (2lnBF) comparison.

Rector et al. (2007).

Fu et al. (2014).

Shah et al. (2010).

Kumar and Subramanian (2002).

Table 3

Substitution Rate and Divergence Time Estimates with Highest Probability Density (HPD) in Thousands of Years Ago (kya) Inferred for HPV16 Complete (Full) or Selection-Filtered Genome Alignment (Neutral) Either under the HHS Scenario or without Imposing Any Prior on Tree Topology, and without Using Any Time Point Calibrations, and Using Instead Only Substitution Rate Priors.

DataModelCalibrationUniform prior×10−9 (subs/site/yr)
tmrca ABCD (kya)
tmrca A1-4 (kya)
tmrca BCD (kya)
tmrca CD (kya)
×10−9 (subs/site/yr)µ(95% HPD)Mean(95% HPD)Mean(95% HPD)Mean(95% HPD)Mean(95% HPD)
FullNo topologyNo calibPVs estimate [13.2–24.7]a18.0(12.5–23.8)581(298–911)188(109–278)391(221–589)261(146–392)
NeutralNo topologyNo calib18.2(12.4–24.4)340(168–573)90(46–141)231(121–355)156(83–247)
NeutralHHSNo calib18.2(12.4–24.2)361(178–592)92(50–147)228(123–353)145(81–222)
FullNo topologyNo calibHuman mtDNA [17.6–32.3]b23.6(16.4–31.1)440(223–693)142(85–214)296(167–442)198(112–296)
NeutralNo topologyNo calib24.0(16.4–31.7)258(124–420)68(37–107)175(94–269)118(61–186)
NeutralHHSNo calib23.9(16.4–31.6)273(136–444)70(37–108)173(95–271)110(60–167)
FullNo topologyNo calibPVs estimate [5.5–13.6]c9.0(5.2–12.8)1181(532–2037)384(201–627)803(400–1330)536(267–870)
NeutralNo topologyNo calib9.1(5.2–13.2)698(296–1228)185(85–313)185(86–312)317(145–539)
NeutralHHSNo calib9.1(5.2–13.1)742(324–1297)189(90–320)469(220–791)298(141–493)
FullNo topologyNo calibMammal genome [2.0–2.4]d2.1(1.8–2.4)4819(2914–7086)1564(1077–2133)3249(2108–4458)2177(1466–3015)
NeutralNo topologyNo calib2.1(1.8–2.5)2863(1576–4415)755(457–1086)1919(1206–2741)1301(784–1916)
NeutralHHSNo calib2.1(1.8–2.4)3010(1686–4582)767(475–1123)1900(1187–2738)1208(761–1691)

Rector et al. (2007).

Fu et al. (2014).

Shah et al. (2010).

Kumar and Subramanian (2002).

Substitution Rate and Divergence Time Estimates with Highest Probability Density (HPD) in Thousands of Years Ago (kya) Inferred for HPV16 Complete (Full) or Selection-Filtered Genome Alignment (Neutral), under the HHS or the ROOA Scenarios, and Considering Different Priors. Note.—Likelihood values in bold are the best-supported models, taken as reference (Ref) for the path sampling Bayes factor (2lnBF) comparison. Rector et al. (2007). Fu et al. (2014). Shah et al. (2010). Kumar and Subramanian (2002). Substitution Rate and Divergence Time Estimates with Highest Probability Density (HPD) in Thousands of Years Ago (kya) Inferred for HPV16 Complete (Full) or Selection-Filtered Genome Alignment (Neutral) Either under the HHS Scenario or without Imposing Any Prior on Tree Topology, and without Using Any Time Point Calibrations, and Using Instead Only Substitution Rate Priors. Rector et al. (2007). Fu et al. (2014). Shah et al. (2010). Kumar and Subramanian (2002).

Variation within the HPV16 E6 Oncogene

To investigate geographical differences in HPV16 genetic diversity, we estimated summary statistics within each geographic HPV16 population using all 1,123 available HPV16 nearly complete E6 sequences (table 4). Overall, HPV16 E6 genetic diversity was highest outside sub-Saharan Africa. Both Tajima’s D and Fu’s-Li’s D statistics as well as the more robust R2 index showed statistically large negative values for European isolates, suggesting past selective sweeps and/or population expansion, for HPV16 population in Europe only. HPV16 E6 haplotype diversity was similar in sub-Saharan Africa, Europe, and South America and significantly higher in North Africa and Central America than in sub-Saharan Africa (table 4). East Asian and Central American HPV16 isolates showed higher average number of pairwise differences compared with sub-Saharan African isolates, even after accounting for intralineage diversity. Furthermore, to estimate the geographical origin of the HPV16 E6 gene haplotype diversity, we constructed a median-joining haplotype network for the E6 gene sequences (table 5, see also supplementary fig. S7, Supplementary Material online). Two of the most common HPV16A1-3 lineage E6 haplotypes were observed in Eurasian isolate background, and only 1–3% of these common haplotypes originated from sub-Saharan African isolates. In contrast, the most frequent HPV16B and C variant E6 haplotypes were observed in >92% of African isolate background. The most common HPV16D E6 haplotype displayed the highest frequency (58%) in the American isolate background and the lowest (4–5%) in North and sub-Saharan African isolates. The second most common HPV16D E6 haplotype showed the highest frequency (58%) in North African isolates, intermediate frequencies in Eurasian and in American isolates (18% and 21%, respectively) and very small (3%) of the sub-Saharan African isolate background.
Table 4

Summary Statistics of the HPV16 E6 Gene Sequences (n = 1,123) with Geographic Origin of the Sample.

MetapopulationnbpShHd(95% CI)a(95% CI)bMPD(95% CI)a(95% CI)bWIMPTajima’s DP-valuec(95% CI)bP-valuebFu’s-Li’s D*(95% CI)bP-valuebR2(95% CI)bP-valueb
sub-Saharan Africa21345634320.754(0.712 –0.796)(0.439 –0.892)3.049(2.739 –3.359)(0.815 –7.389)1.459–1.427>0.1(–1.642 to 2.053)0.047–3.459(–2.178 to 1.523)0.0030.044(0.033 –0.148)0.100
North Africa22145630260.861(0.839 –0.883)(0.653 –0.918)4.985(4.52 –5.45)(1.543 –12.592)0.969–0.196>0.1(–1.598 to 2.097)0.519–3.288(–1.959 to 1.311)0.4660.081(0.035 –0.14)0.533
Europe16145631250.734(0.688 –0.78)(0.16 –0.838)1.62(1.401 –1.839)(0.194 –4.219)1.187–2.043<0.05(–1.62 to 2.05)0.001–3.825(–1.958 to 1.447)0.0020.026(0.028 –0.164)0.017
South Asia139456860.523(0.455 –0.591)(0.056 –0.797)0.998(0.831 –1.165)(0.043 –2.744)0.464–0.724>0.1(–1.509 to 2.026)0.253–0.454(–2.302 to 1.297)0.2810.062(0.026 –0.193)0.302
East Asia20745631370.643(0.567 –0.719)(0.495 –0.894)3.425(3.079 –3.771)(0.853 –8.014)1.7121.211>0.1(–1.593 to 2.11)0.091–2.848(–2.415 to 1.606)0.0140.054(0.032 –0.146)0.180
Central America5045621140.889(0.851 –0.927)(0.589 –0.925)4.551(3.641 –5.461)(1.197 –10.979)1.803–0.237>0.1(–1.683 to 2.033)0.475–1.027(–2.568 to 1.511)0.1670.103(0.052 –0.177)0.503
South America13240218170.78(0.744 –0.816)(0.486 –0.893)3.184(2.774 –3.594)(0.772 –7.859)0.947–0.096>0.1(–1.613 to 1.896)0.526–0.850(–2.371 to 1.543)0.1850.087(0.035 –0.156)0.542

Note.—Number of samples (n), base pair (bp), Segregating sites (S), haplotypes (h), haplotype diversity (Hd), mean pairwise difference (MPD), weighted intralineage mean pairwise difference (WIMP) (Hurles et al. 2002), Tajima’s D, Fu’s and Li’s D* and R2 statistics were calculated using DnaSP (Librado and Rozas 2009).

Calculcated from empirical distribution.

Calculated from a coalescent simulation with 1,000 replications and no recombination.

Calculated assuming beta distribution.

Table 5

HPV16 E6 Gene Major Haplotype Frequencies within Each Lineage and Geographical Background.

OriginA1–3a
A1–3b
A4
B
C
D
n%n%n%n%n%n%n%
America5019.54019.021.511.242.64557.7821.1
Eurasia17768.911856.212798.511.275.02633.3718.4
North Africa2710.54621.944.75337.645.02257.9
Sub-Saharan Africa31.262.97992.97754.633.812.6

Major haplotype including HPV16 E6 350G allele.

Major haplotype including HPV16 E6 350T allele.

Summary Statistics of the HPV16 E6 Gene Sequences (n = 1,123) with Geographic Origin of the Sample. Note.—Number of samples (n), base pair (bp), Segregating sites (S), haplotypes (h), haplotype diversity (Hd), mean pairwise difference (MPD), weighted intralineage mean pairwise difference (WIMP) (Hurles et al. 2002), Tajima’s D, Fu’s and Li’s D* and R2 statistics were calculated using DnaSP (Librado and Rozas 2009). Calculcated from empirical distribution. Calculated from a coalescent simulation with 1,000 replications and no recombination. Calculated assuming beta distribution. HPV16 E6 Gene Major Haplotype Frequencies within Each Lineage and Geographical Background. Major haplotype including HPV16 E6 350G allele. Major haplotype including HPV16 E6 350T allele.

Discussion

HPV16 is the most oncogenic virus for humans, and epidemiological evidence suggests that its oncogenic potential is associated with high viral loads and with infection persistence (Doorbar et al. 2012; Moscicki et al. 2012). Viral genotypic variation may underlie strong phenotypic variation so that different HPV16 variants may display differential persistence and viral load and therefore differential carcinogenicity. Several reports have communicated differential oncogenic potential for different HPV16 variant lineages (Villa et al. 2000; Schiffman et al. 2010; Zuna et al. 2011; Freitas et al. 2014). However, the differential risks for different variants seem to depend on the host population (Cornet et al. 2013; Qmichou et al. 2013), and to vary with the specific individual genetic background (Xi et al. 2006; Lopera et al. 2014). In the present study, we have addressed the origin, diversification, and dispersal of HPV16. We have used the largest available collection of HPV16 full-length genomes and partial sequences to generate estimates of the worldwide phylogeography and divergence time of extant HPV16 lineages. Our results support a scenario of codivergence of HPV16 with separate but closely related ancestral Hominin populations with subsequent host-switch events, likely through sexual transmission between archaic and modern human ancestral populations in the recent human evolutionary history. We have further compared the genetic diversity patterns of HPV16 with those estimated for modern human populations using genome data. Altogether, our results suggest that virus-host coevolution and host switch events have been fundamental factors that have shaped HPV16 evolution. Purifying selection can mask the ancient origin of recently sampled pathogens (Wertheim and Kosakovsky Pond 2011), and our results point in the same direction. Results based on only HPV16 genome positions evolving neutrally showed significantly better fit to the data for our HHS model while analyses including positions under selection still preferred the HHS model but could not differentiate the two alternative scenarios, namely, the ROOA and the HHS model, tested in this study (table 2). More importantly, our MCMC inference showed that irrespective of the evolutionary rate prior used, divergence times for extant HPV16 lineages largely predate the estimated recent out-of-Africa migration of modern human ancestors 60–120 kya. Furthermore, although HPV16A is the oldest lineage, the tmrca for the HPV16BCD variants (197 kya, 95% HPD 121–291 kya) was older than the tmrca for the HPV16A lineage (88 kya, 95% HPD 50–134 kya), and the HPV16A lineage encompasses less genetic diversity than the sister HPV16BCD lineages. Taken together, the lower divergence within HPV16A, the good match for the tmrca of this lineage with the out-of-Africa migration of modern humans and the delayed expansion of HPV16A reinforce the HHS scenario: the exceptional interbreeding events between archaic and modern human populations resulted in a strong bottleneck for the transmission of HPV16A, so that only a small fraction of the HPV16A diversity available in the Neandertal/Denisovan populations underwent effectively a horizontal transfer toward populations of ancestral modern humans. Mounting evidence suggests that virus-host codivergence is not the only essential driving force for PV evolution. At higher taxonomic levels virus-host co-phylogeny events explain only around 30% of the PV evolutionary history, while additional events of lineage sorting, lineage duplication and host switch have played major roles during the evolution of PVs (Gottschling et al. 2011). In the present study, we have quantified for the first time that codivergence between modern humans and HPV16 explains <30% of the current geographical distribution of the viral extant diversity, largely based on the virtual absence of HPV16B and HPV16C lineages outside Africa and on the enrichment of HPV16A and HPV16D outside Africa (fig. 2). The contribution of virus-host codivergence to HPV16 evolution, regardless of excluding the viral coding region sites under selection, showed to be consistent only for mitochondrial genome variability and less consistent using European ascertained autosomal single nucleotide polymorphisms (SNPs) compared with sub-Saharan African ascertained autosomal SNPs. It is tempting to speculate that this difference in consistency might be partly due to the SNPs introgressed from archaic populations into the genomes of non-African modern humans. However, the underlying biological complexities and limitations of the available data to compare the human genome and of HPV16 genetic diversity restrained us from further estimations of local adaptation between virus and host genomes. Nevertheless, under the ROOA scenario, one would have expected a consistently high correlation between human and viral phylogeography. Instead, several lines of evidence argue further against this ROOA scenario for HPV16. First, the HHS scenario is preferred over ROOA in terms of HPV16 lineage phylogeny (supplementary fig. S6, Supplementary Material online). Second, the hypothesis of exclusive codivergence of all HPV16 lineages with the ancestral dispersals of modern humans does not explain the largely predominant role of HVP16A (fig. 2), the most basal HPV16 lineage, in all continents and in all indigenous populations, except in sub-Saharan Africa (Picconi et al., 2003; Cornet et al. 2012; Mendoza et al. 2013; Qmichou et al. 2013; Tan et al. 2013; Lopera et al. 2014). Moreover, for indigenous populations in South America, for instance, the increased presence of HPV16A lineage variants has been proposed to reflect the influence of recent European occupation (Picconi et al. 2003). However, such a rapid selective sweep of the putative pre-Columbian HPV16A genetic diversity would require strong selection forces for the viral dynamics in very short time scale, which are not compatible with our current understanding of PV evolution (Bravo and Félez-Sánchez 2015). Third, the geographical dichotomy for the HPV16 E6 gene haplotype backgrounds consistently supported our HHS model over ROOA: most common HPV16A lineage haplotypes were observed mostly in Eurasian populations while most common HPV16B, C, and D lineage haplotypes were observed mainly in African populations (table 5). And fourth, while human genetic diversity is highest in sub-Saharan Africa (1000 Genomes Project Consortium et al. 2015), our results of HPV16 E6 variability showed consistently higher genetic diversity estimates outside sub-Saharan Africa (table 4). After rejecting the ROOA model as explanatory framework for HPV16 evolution, we have explored the HHS model as alternative scenario, implying a host switch of an ancestral HPV16 lineage between archaic and modern human ancestral populations (fig. 4). Indeed, recent genomic studies have confirmed the admixture of modern human ancestors with Neanderthals and Denisovans through repeated interbreeding after migration to Europe and Asia, respectively (Reich et al. 2010; Lazaridis et al. 2014; Qin and Stoneking 2015; Vernot and Akey 2015). Such interbreeding events lead to adaptive introgression of genetic material from our closest evolutionary relatives into the genome of a subset of modern human ancestors. We propose here that in parallel to the introgression of genetic material, sexual intercourse between Neanderthals/Denisovans and modern human ancestors also prompted sexual transmission of PVs, chiefly of the HPV16A lineage (fig. 4). Our results on HPV16 phylogeography, genetic diversity, and tmrca of the different HPV16 lineages sustain this HHS scenario, in which the divergence of extant HPV16 lineages (ca. 460 kya, table 2) predates the recent out-of-Africa migration of modern human ancestors (fig. 3), and thus, ancestral HPV16 lineages probably already infected the ancestral, closely related Hominin groups. The host split between the ancestral archaic (Neanderthal/Denisovan) and the modern human ancestor populations was probably mirrored by a viral split resulting in the ancestral HPV16A lineage and of the ancestral HPV16BCD lineage, respectively (fig. 3). Among the recurrent out-of-Africa expansions of ancestral Hominin groups, which may have occurred some 460 kya, Neanderthals/Denisovans may have carried essentially the ancestral HPV16A. Evolution of HPV16 genomes in ancestral Hominin populations remaining in Africa, instead, would have lead to HPV16B and CD lineages (fig 4). The virtual absence of HPV16B outside Sub-Saharan Africa is parsimoniously explained if in the last out-of-Africa expansion, the modern human ancestors that left Africa probably lost the ancestral HPV16B lineage by a lineage sorting event (Johnson et al. 2003). Similar pattern of virus extinction after host population bottleneck has been observed in non-human primates (Kapusinszky et al. 2015). After the modern human dispersal, the HPV16CD ancestor generated in allopatry the HPV16C lineage in the populations remaining in Africa, and the HPV16D lineage in the populations outside Africa (fig. 4). During their expansion in Europe and in Asia, modern human ancestors experienced limited admixture with Neanderthal and Denisovan populations and were exposed to the HPV16A lineage, most likely through sexual contact. After the adaptive introgression of Neanderthal-Denisovan alleles, anatomically modern humans of non-African ancestry carry nowadays in average between 2% and 4% of their genomes of archaic human origin (Meyer et al. 2012; Prüfer et al. 2014). Remarkably, the horizontal transfer of genetic loci upon interbreeding of Neanderthal/Denisovan ancestry into the genome of modern human ancestors did not occur at random (Vernot and Akey 2014; Kuhlwilm et al. 2016; Vernot et al. 2016). Instead, genes involved in keratinocyte differentiation and innate immunity, which are loci directly involved in host–PV interaction, are particularly enriched in introgressed genomic loci, displaying >60% of Neanderthal/Denisovan ancestry in present day Eurasians (table 6) (Abi-Rached et al. 2011; Sankararaman et al. 2014; Vernot and Akey 2014; Vernot et al. 2016). Such genetic changes possibly altered the HPV16-human interplay, as this virus exclusively infects keratinocytes in particular epithelia and requires keratinocyte differentiation to accomplish the virus life cycle and virion production (Doorbar et al. 2012). Hence, we speculate that the adaptive introgression in modern humans of archaic alleles involved in keratinocyte differentiation and innate immunity may have elicited changes in the ecological niche of HPV16 that significantly enhanced the adaptive value of HPV16A and lead to increased prevalence of this viral lineage in modern human populations with Eurasian origin (figs. 2, 3, and 4).
F

Cartoon timeline depicting interbreeding and subsequent gene flow of archaic alleles from Neanderthals and Denisovans into modern humans and the proposed sexual transmission of HPV16A lineage to the ancestors of modern human populations in Eurasia. Viral lineages HPV16A, B, C, and D are labeled at the bottom.

Table 6

Global Prevalence of HPV16A Lineages (Upper Panel) and Proportion of Nonrandomly Introgressed Archaic Alleles into Modern Human Ancestors (Lower Panel).

Human papillomavirus 16A lineagesGeography
Ref.
Sub-Saharan AfricaWest EurasiaEast Eurasia
HPV16A1-3<5%93%25-91%This study
HPV16A4Absent∼1%60%This study
Host’s genome loci directly involved in PVs life-cycle
 Neanderthal ancestry lociAbsent64%62%Sankararaman et al. (2014)
 Loci involved in keratinocyte differentiationAbsent40–70%40–66%Vernot and Akey (2014)
 HLA I loci (innate immunity)<7%52–59%72–82%Abi-Rached et al. (2011)
 Toll-like receptor loci (innate immunity)Absent15–39%17–51%Dannemann et al. (2015)
 APOBEC3A deletion (innate immunity)<1%7%14-93%Kidd et al. (2007)

Note.—A selection of host’s genome loci is listed, directly involved in the Papillomavirus life-cycle through keratinocyte differentiation and innate immunity.

Cartoon timeline depicting interbreeding and subsequent gene flow of archaic alleles from Neanderthals and Denisovans into modern humans and the proposed sexual transmission of HPV16A lineage to the ancestors of modern human populations in Eurasia. Viral lineages HPV16A, B, C, and D are labeled at the bottom. Global Prevalence of HPV16A Lineages (Upper Panel) and Proportion of Nonrandomly Introgressed Archaic Alleles into Modern Human Ancestors (Lower Panel). Note.—A selection of host’s genome loci is listed, directly involved in the Papillomavirus life-cycle through keratinocyte differentiation and innate immunity. Taken together, our HHS evolutionary model provides explanatory potential for the three central flaws of the alternative ROOA scenario for the evolution of HPV16, namely, 1) the distant phylogenetic relationship between HPV16A and HPV16BCD clades (fig. 1), which predates the most recent out-of-Africa migration of modern human ancestors (table 3); 2) the global increased prevalence of HPV16A outside Africa (fig. 2), and 3) the increased viral genetic diversity outside sub-Saharan Africa (table 4). In agreement with our HHS model, recent studies of parasites evolution have concluded that the divergence of at least five present day human parasites (lice, tapeworm, follicle mites, a protozoan, and bedbug) predates modern human origin. The most likely explanation for the presence of these closely related ancient pairs of parasite taxa involves host switch from an archaic to modern human ancestors (Ashford 2000; Reed et al. 2004). Furthermore, evolutionary studies of sexually transmitted herpes simplex-virus suggest ancient codivergence and later cross-species transmission of this pathogen between ancestors of chimpanzees and extinct Hominin groups (Wertheim et al. 2014). Despite the explanatory power of our results, our study suffers from a number of limitations. Ideally, the global host and pathogen sequence data should be analyzed at the level of full genomes from the same individuals, but such data set is not available. Furthermore, there is hitherto no evidence of the presence of any PV sequences from ancient human samples. Indeed, we analyzed the currently available Neanderthal and Denisovan preassembly sequence data, and we could not find any significant traces of any known HPVs in these data sets. Unfortunately, the epithelial tropism of these viruses likely prevents retrieving viral sequences from fossil bones, the common reservoir for ancient DNA studies. We have also tried to identify sequence traces of PV DNA from the metagenomic sequence data extracted from tissue remains of the European Copper Age glacier mummy (Maixner et al. 2014) but could not find any traces of PV DNA in any of these sample data sets. However, future studies encompassing larger global sample of individuals with full human and HPV genome sequence data and/or well-preserved ancestral human samples containing PVs DNA will allow to test and refute or validate our hypothesis. We have presented here the most comprehensive study of the evolution and diversity of HPV16, the most oncogenic infectious agent for humans. Our phylogeographic analyses confirm the limited contribution of virus-host codivergence to the evolution of the HPV16 lineage: most of the geography-based diversity within HPV16 cannot be explained alone by co-speciation with modern humans. Instead, our results suggest that ancestral HPV16 already infected the ancestor of H. sapiens and H. neanderthalensis half a million years ago, and that two main HPV16 lineages codiverged with either human lineage. When a population of modern humans migrated out of Africa some 60–120 kya, they carried with them a subset of the HPV16 diversity evolved in that continent. When modern humans encountered and interbred with Neanderthal/Denisovan populations in Europe and in Asia, a transfer of sexually transmitted pathogens occurred, in parallel with the genomic introgression. Gene alleles involved in keratinocyte differentiation and in innate immunity evolved among Neanderthals/Denisovans and transferred to modern humans, may have facilitated niche colonization and expansion of HPV16A in modern human populations with archaic admixture. This evolutionary advantage led to the enhanced prevalence of HPV16A lineage among modern humans with introgressed archaic genetic background. Indeed, our results show that sexually transmitted pathogens may have been effectively transferred between different Hominin populations in recent events. We propose that the repertoire of the HVP16 variants coevolved with archaic humans, the diversity of keratinocyte differentiation and innate immune genes that have undergone adaptive introgression, and eventually the interaction between viral and host genotypes may be largely responsible for the differential association of certain HPV16 variants with cancer risk in certain modern human populations.

Materials and Methods

HPV16 Complete Genomes and Partial Sequences

The full-genome sequences for 118 HPV16 isolates used for the analysis were retrieved from GenBank. Sequences were aligned with MUSCLE (Edgar 2004), at amino acid level for the coding region and at nucleotide level for the UTR, and tested for putative recombination break points using GARD algorithm (Kosakovsky Pond et al. 2006). The final full-genome alignment encompassed 118 full-length HPV16 genomes, 7,926 bp and 638 distinct alignment patterns (alignment available as supplementary material S1, Supplementary Material online). Best substitution model for the HPV16 full-genome data, manually curated and filtered using Gblocks (Castresana 2000), was inferred using jModelTest 2.0 (Darriba et al. 2012). Phylogenetic inference was performed under ML framework with RAxML v8.0.16 using the general time reversible (GTR) model of nucleotide substitution with Γ-rate heterogeneity parameter (GTR + Γ4) and 500 bootstrap replicates (Stamatakis 2014). Besides the unpartitioned analysis, four partition schemes were also explored: 1) partitions of noncoding and coding regions; 2) the same as before, but further partitioning the coding region into three codon positions; 3) eight partitions corresponding to the regulatory region and to the E6, E7, E1, E2, E5, L2, L1 genes; and 4) the same as before, but further partitioning each coding regions into three codon positions. Among all partition schemes explored, the unpartitioned scheme rendered the best AIC scores. In order to focus on the HPV16 genome positions evolving under neutrality and to filter out the possible bias arising from the presence of positions undergoing natural selection, we identified and removed a total of 26 positively selected and 115 negatively selected codons from the full-genome alignment (supplementary table S1, Supplementary Material online). Each coding region of the HPV16 genome was separately assessed for codon-based selection using random effects likelihood (REL) model with ML fit (Pond and Muse 2005) and Bayes Factor of 20 (i.e., P-value = 0.05) as a level of significance, except the L2 gene region, which was assessed for selection using ML reconstruction of ancestral codons with a P-value = 0.05 as a level of significance, due to computational limitations of the REL model with the L2 alignment (Kosakovsky Pond and Frost 2005). The final coding region alignment manually curated and filtered for sites under selection, encompassed 118 HPV16 sequences, 6,093 bp and 316 distinct alignment patterns (alignment available as supplementary material S2, Supplementary Material online). To define the root of the HPV16 complete genome ML tree, the five main lineages (i.e., A1–3, A4, B, C, and D) were identified and the two most divergent members of each lineage were retained. These 10 HPV16 variant genomes were aligned, separately for each E1, E2, L2, and L1 coding region, with the genomes of all other members of Alphapapillomaviruses species 9: HPV31, HPV33, HPV35, HPV52, HPV58, and HPV67. The concatenated coding regions genome alignment was submitted to ML phylogenetic inference following the same protocol described above. The procedure was replicated using Bayesian inference with Phylobayes (Lartillot and Philippe 2004), both at the amino acid and at nucleotide level. The differential likelihood of our two alternative hypothesis for the location of the root in the HPV16 lineage namely, ROOA and HHS models, was further assessed using Shimodaira and Hasegawa (1989) and Robinson and Fouls (1981) comparison tests. Phylogenetic relationships of partial HPV16 sequences spanning the LCR, E6 and L2 loci from 1,601 HPV16 isolates retrieved worldwide (supplementary table S6, Supplementary Material online) were placed in the genetic landscape of HPV16 complete genome variability using an Evolutionary Placement Algorithm on RAxML v8.0.16 with the –f v command and the GTR + Γ4 model (Berger and Stamatakis 2011). The algorithm provides likelihood weights for placing partial sequences into the different nodes in the reference tree (supplementary table S2, Supplementary Material online), which in our case was the best ML tree obtained for the 118 complete HPV16 genomes (supplementary fig. 1, Supplementary Material online). Partial HPV16 sequences were introduced into the initial genome alignment using MAFFT (Katoh and Standley 2013). The final alignment consisted of 1,719 isolates (118 complete genomes, 1,082 isolates of 1,338 bp partial sequences spanning the LCR and E6 loci and 519 isolates of 665 bp partial sequences spanning the LCR, E6 and L2 loci, see available alignments in supplementary materials S3–S4, Supplementary Material online). For each partial sequence, we integrated the likelihood weights for all nodes using 0.7 as a likelihood cutoff value to confidently assign each sample into one of the five main HPV16 variant lineages (supplementary table S2, Supplementary Material online). Finally, we stratified the HPV16 variant isolate data into 10 geographical population groups: North/East/West/Southern Africa, South Asia, East Asia, Europe, Central and South America using the United Nations standard geographical criteria (supplementary table S3, Supplementary Material online). To compare the genetic variability patterns observed for HPV16 isolates within each geographic population group we calculated the summary statistics of the available 1,123 HPV16 E6 sequences (table 4). Further to visualize the genetic variability of the 1,123 HPV16 E6 456bp nearly complete gene sequences within each geographic background a haplotype-specific median joining network was constructed using NETWORK 4.6.1.3 (www.fluxus-technology.com) with equal weight for variable positions.

Bayesian MCMC Inference

Evolutionary analysis for HPV16 lineages was estimated using both the complete 7,926 bp genome alignment and the selection-filtered 6,093 bp coding region alignment of 118 HPV16 variants (supplementary materials S1–S2,Supplementary Material online), and using Bayesian MCMC analysis as implemented in BEAST v.1.8.1 (Drummond et al. 2012). Bayesian MCMC inferences were performed with the GTR + Γ4 model. Genealogies were estimated using both strict and relaxed molecular clocks with an uncorrelated log-normal distribution of rates. Two credible PVs substitution rate estimates from the literature were used as uniform priors: 9.6 × 10−9 [5.5 × 10−9–13.6 × 10−9] subs/site/yr (Shah et al. 2010) and 19.5 × 10−9 [13.2 × 10−9–24.7 × 10−9] subs/site/yr (Rector et al. 2007). We also assessed the suitability of two additional plausible substitution rates, namely, the values inferred for mammalian genomes 2.2 × 10−9 [2.0 × 10−9 − 2.4 × 10−9] subs/site/yr (Kumar and Subramanian 2002) and for human mitochondrial genomes 25.3 × 10−9 [17.6 × 10−9–32.3 × 10−9] subs/site/yr (Fu et al. 2014). In addition, a noninformative uniform prior between 4.5 × 10−11 and 4.5 subs/site/yr covering all plausible PVs rate for both molecular clock models was used (table 2). Two recently revised evolutionary time point estimates for human populations were used as prior normal distribution for calibration of the HPV16 variant phylogenetic tree (table 2 and fig. 3C): first, the archaic divergence of modern humans and Neanderthals/Denisovans was set at 500 kya (95% CI, 400–600 kya) (Reich et al. 2010; Scally and Durbin 2012; Buck and Stringer 2014; Mendez et al. 2016); second, modern humans migration out-of-Africa was set at 90 kya (95% CI, 60–120 kya) (Scally and Durbin 2012; Liu et al. 2015). All evolutionary related parameters were estimated with the following demographic models: 1) constant population size, 2) exponential growth, and 3) Bayesian skyline coalescent models (Pybus et al. 2000). Two independent runs were performed for each parameter combination with 100 million generations and subsampling every 10,000 generations. The resulting tree and log-files were combined for further analysis after removing a 10% burn-in from each run. Statistical confidence in parameter estimates was assessed by reporting marginal posterior means and their associated 95% HPD intervals (95% HPD). Efficient mixing of the chains in the Bayesian MCMC analysis was assessed with effective sample size values >200 for all parameters. Convergence of each of the simulations was visually confirmed before data were merged for further analysis. The combined Bayesian phylogenetic inferences were analyzed using Tracer v.1.6 and TreeAnnotator v.1.8.1 and visualized using Figtree v.1.4.2. Best model estimates to explain the HPV16 variant diversity were selected using two independent path sampling runs with at least 1 million iterations per path step and until no significant chances were observed in log marginal likelihoods for Bayes Factor (2lnBF) estimates (Baele et al. 2012, 2013) as implemented in BEAST v.1.8.1 (Drummond et al. 2012). The support of a particular model was assessed with 2lnBF < 2 indicating no support, 2lnBF = 2–6 indicating positive model support, 2lnBF = 6–10 indicating strong positive support and 2lnBF > 10 indicating definite distinction between competing models. To determine the best topology for the HPV16 viral lineages, the maximum clade credibility trees were also inferred for both complete and selection-filtered coding region HPV16 genome alignments, without imposing any prior on tree topology, using BEAST v.1.8.1 (Drummond et al. 2012), and further assessing the distance between the obtained trees using K tree score (Soria-Carrasco et al. 2007). Wilcoxon–Mann–Whitney U test was used to determine the significance of intertaxa pairwise differences between the complete and selection filtered coding region HPV16 ML trees. Summary statistics, including Tajima’s D, Fu’s and Li’s D* and R2 were calculated using DnaSP v.5.10.1. (Librado and Rozas 2009).

Human Genome Data

To compare the phylogeography of HPV16 with the global variability of the human genome, we compiled human mitochondrial, Y chromosome, and autosomal genotype data of the same 938 nonrelated individuals of the CEPH Human Genome Diversity Panel representing 51 human populations worldwide (Cann et al. 2002). In order to systematically compare the host and pathogen genome diversity distribution, we stratified the human population genetic data in nine subcontinental metapopulation groups (supplementary table S3, Supplementary Material online) using the same geographical classification presented for HPV16 isolate data in supplementary table S6 (Supplementary Material online). This geographical grouping of human population genetic data reflected, as far as possible, the metapopulation groups from which the viral isolates were sampled. All mitochondrial complete genome sequences were manually aligned using MUSCLE (Edgar 2004) and 875 complete mitochondrial DNA genomes available from (Lippold et al. 2014) were used in this study. Y chromosome nonrecombining region sequences from 551 males were retrieved (Lippold et al. 2014) and all variable loci data were included in this study. Autosomal single nucleotide polymorphism (SNP) loci data ascertained using either a European or sub-Saharan African individual as previously reported (Reich et al. 2010) were used in this study. All autosomal SNP markers were previously genotyped in 934 CEPH diversity panel individuals (Green et al. 2010; Reich et al. 2010). The African and European ascertained data sets consisted of 12,162 and 111,970 SNPs, respectively, covering all autosomal chromosomes of the human genome. All autosomal SNP genotype quality control and data mining were performed using PLINK software (Purcell et al. 2007). Only SNPs with MAF > 0.01 were used in further analysis. Phasing for each genotype data sets per chromosome were performed using BEAGLE software (Browning and Browning 2007). Genetic FST distances (WRIGHT 1951) between the nine geographical population groups defined above, and excluding North America, were estimated for 1,101 available HPV16 isolate sequences encompassing LCR (735bp) and E6 (457 bp) genome regions (supplementary fig. S3, Supplementary Material online) as well as for each of the human genome marker data sets (mitochondrial, Y chromosome, and autosomal SNPs). Mantel’s permutation test for matrix similarity was implemented for pairwise combinations of FST distance matrices with 10,000 permutations. To search for possible traces of HPVs, including all known HPV16 variants, in archaic humans we retrieved all available Neanderthal and Denisovan pre-assembly sequence data from http://cdna.eva.mpg.de/neandertal/altai/AltaiNeandertal and http://www.ebi.ac.uk/ena/data/view/PRJEB3092, respectively. Reads were mapped against all known HPVs genomes using bowtie2 (version: 2.2.5) with –very-sensitive command for the alignment. Further details of data processing are available from the authors upon request.

Supplementary Material

Supplementary figures S1–S7, tables S1–S6 and materials S1–S4 are available at Molecular Biology and Evolution online.

Author Contributions

Designed and conceived the project: VNP, IGB. Retrieved data and performed the phylogenetic inference: VNP, CMO. Performed the modern and archaic human genome data analyses and Bayesian MCMC inference: VNP. Wrote the manuscript: VNP, IGB. All authors read and approved the final article. Click here for additional data file.
  86 in total

1.  Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors:  J Castresana
Journal:  Mol Biol Evol       Date:  2000-04       Impact factor: 16.240

2.  Parasites as indicators of human biology and evolution.

Authors: 
Journal:  J Med Microbiol       Date:  2000-09       Impact factor: 2.472

3.  A population-based prospective study of carcinogenic human papillomavirus variant lineages, viral persistence, and cervical neoplasia.

Authors:  Mark Schiffman; Ana Cecilia Rodriguez; Zigui Chen; Sholom Wacholder; Rolando Herrero; Allan Hildesheim; Rob Desalle; Brian Befano; Kai Yu; Mahboobeh Safaeian; Mark E Sherman; Jorge Morales; Diego Guillen; Mario Alfaro; Martha Hutchinson; Diane Solomon; Philip E Castle; Robert D Burk
Journal:  Cancer Res       Date:  2010-03-30       Impact factor: 12.701

Review 4.  Coevolution of papillomaviruses with human populations.

Authors:  H U Bernard
Journal:  Trends Microbiol       Date:  1994-04       Impact factor: 17.079

5.  General acquisition of human papillomavirus infections of skin occurs in early infancy.

Authors:  Annika Antonsson; Silvana Karanfilovska; Pelle G Lindqvist; Bengt Göran Hansson
Journal:  J Clin Microbiol       Date:  2003-06       Impact factor: 5.948

6.  Purifying selection can obscure the ancient age of viral lineages.

Authors:  Joel O Wertheim; Sergei L Kosakovsky Pond
Journal:  Mol Biol Evol       Date:  2011-06-24       Impact factor: 16.240

7.  The complete genome sequence of a Neanderthal from the Altai Mountains.

Authors:  Kay Prüfer; Fernando Racimo; Nick Patterson; Flora Jay; Sriram Sankararaman; Susanna Sawyer; Anja Heinze; Gabriel Renaud; Peter H Sudmant; Cesare de Filippo; Heng Li; Swapan Mallick; Michael Dannemann; Qiaomei Fu; Martin Kircher; Martin Kuhlwilm; Michael Lachmann; Matthias Meyer; Matthias Ongyerth; Michael Siebauer; Christoph Theunert; Arti Tandon; Priya Moorjani; Joseph Pickrell; James C Mullikin; Samuel H Vohr; Richard E Green; Ines Hellmann; Philip L F Johnson; Hélène Blanche; Howard Cann; Jacob O Kitzman; Jay Shendure; Evan E Eichler; Ed S Lein; Trygve E Bakken; Liubov V Golovanova; Vladimir B Doronichev; Michael V Shunkov; Anatoli P Derevianko; Bence Viola; Montgomery Slatkin; David Reich; Janet Kelso; Svante Pääbo
Journal:  Nature       Date:  2013-12-18       Impact factor: 49.962

8.  Ancient human genomes suggest three ancestral populations for present-day Europeans.

Authors:  Iosif Lazaridis; Nick Patterson; Alissa Mittnik; Gabriel Renaud; Swapan Mallick; Karola Kirsanow; Peter H Sudmant; Joshua G Schraiber; Sergi Castellano; Mark Lipson; Bonnie Berger; Christos Economou; Ruth Bollongino; Qiaomei Fu; Kirsten I Bos; Susanne Nordenfelt; Heng Li; Cesare de Filippo; Kay Prüfer; Susanna Sawyer; Cosimo Posth; Wolfgang Haak; Fredrik Hallgren; Elin Fornander; Nadin Rohland; Dominique Delsate; Michael Francken; Jean-Michel Guinet; Joachim Wahl; George Ayodo; Hamza A Babiker; Graciela Bailliet; Elena Balanovska; Oleg Balanovsky; Ramiro Barrantes; Gabriel Bedoya; Haim Ben-Ami; Judit Bene; Fouad Berrada; Claudio M Bravi; Francesca Brisighelli; George B J Busby; Francesco Cali; Mikhail Churnosov; David E C Cole; Daniel Corach; Larissa Damba; George van Driem; Stanislav Dryomov; Jean-Michel Dugoujon; Sardana A Fedorova; Irene Gallego Romero; Marina Gubina; Michael Hammer; Brenna M Henn; Tor Hervig; Ugur Hodoglugil; Aashish R Jha; Sena Karachanak-Yankova; Rita Khusainova; Elza Khusnutdinova; Rick Kittles; Toomas Kivisild; William Klitz; Vaidutis Kučinskas; Alena Kushniarevich; Leila Laredj; Sergey Litvinov; Theologos Loukidis; Robert W Mahley; Béla Melegh; Ene Metspalu; Julio Molina; Joanna Mountain; Klemetti Näkkäläjärvi; Desislava Nesheva; Thomas Nyambo; Ludmila Osipova; Jüri Parik; Fedor Platonov; Olga Posukh; Valentino Romano; Francisco Rothhammer; Igor Rudan; Ruslan Ruizbakiev; Hovhannes Sahakyan; Antti Sajantila; Antonio Salas; Elena B Starikovskaya; Ayele Tarekegn; Draga Toncheva; Shahlo Turdikulova; Ingrida Uktveryte; Olga Utevska; René Vasquez; Mercedes Villena; Mikhail Voevoda; Cheryl A Winkler; Levon Yepiskoposyan; Pierre Zalloua; Tatijana Zemunik; Alan Cooper; Cristian Capelli; Mark G Thomas; Andres Ruiz-Linares; Sarah A Tishkoff; Lalji Singh; Kumarasamy Thangaraj; Richard Villems; David Comas; Rem Sukernik; Mait Metspalu; Matthias Meyer; Evan E Eichler; Joachim Burger; Montgomery Slatkin; Svante Pääbo; Janet Kelso; David Reich; Johannes Krause
Journal:  Nature       Date:  2014-09-18       Impact factor: 49.962

9.  A global reference for human genetic variation.

Authors:  Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal:  Nature       Date:  2015-10-01       Impact factor: 49.962

10.  Metagenomic analysis reveals presence of Treponema denticola in a tissue biopsy of the Iceman.

Authors:  Frank Maixner; Anton Thomma; Giovanna Cipollini; Stefanie Widder; Thomas Rattei; Albert Zink
Journal:  PLoS One       Date:  2014-06-18       Impact factor: 3.240

View more
  21 in total

1.  Modelling the evolution of viral oncogenesis.

Authors:  Carmen Lía Murall; Samuel Alizon
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2019-05-27       Impact factor: 6.237

2.  The HPV16 Genome Is Stable in Women Who Progress to In Situ or Invasive Cervical Cancer: A Prospective Population-Based Study.

Authors:  Laila-Sara Arroyo-Mühr; Camilla Lagheden; Emilie Hultin; Carina Eklund; Hans-Olov Adami; Joakim Dillner; Karin Sundström
Journal:  Cancer Res       Date:  2019-07-09       Impact factor: 12.701

Review 3.  Human papillomaviruses: diversity, infection and host interactions.

Authors:  Alison A McBride
Journal:  Nat Rev Microbiol       Date:  2021-09-14       Impact factor: 60.633

4.  Ancient Evolution and Dispersion of Human Papillomavirus 58 Variants.

Authors:  Zigui Chen; Wendy C S Ho; Siaw Shi Boon; Priscilla T Y Law; Martin C W Chan; Rob DeSalle; Robert D Burk; Paul K S Chan
Journal:  J Virol       Date:  2017-10-13       Impact factor: 5.103

5.  The Intersection of HPV Epidemiology, Genomics and Mechanistic Studies of HPV-Mediated Carcinogenesis.

Authors:  Lisa Mirabello; Megan A Clarke; Chase W Nelson; Michael Dean; Nicolas Wentzensen; Meredith Yeager; Michael Cullen; Joseph F Boland; Mark Schiffman; Robert D Burk
Journal:  Viruses       Date:  2018-02-13       Impact factor: 5.048

Review 6.  The Role of aDNA in Understanding the Coevolutionary Patterns of Human Sexually Transmitted Infections.

Authors:  Ville N Pimenoff; Charlotte J Houldcroft; Riaan F Rifkin; Simon Underdown
Journal:  Genes (Basel)       Date:  2018-06-25       Impact factor: 4.096

7.  Classification and evolution of human papillomavirus genome variants: Alpha-5 (HPV26, 51, 69, 82), Alpha-6 (HPV30, 53, 56, 66), Alpha-11 (HPV34, 73), Alpha-13 (HPV54) and Alpha-3 (HPV61).

Authors:  Zigui Chen; Mark Schiffman; Rolando Herrero; Rob DeSalle; Kathryn Anastos; Michel Segondy; Vikrant V Sahasrabuddhe; Patti E Gravitt; Ann W Hsing; Paul K S Chan; Robert D Burk
Journal:  Virology       Date:  2018-01-11       Impact factor: 3.616

8.  Association of cervical carcinogenesis risk with HPV16 E6 and E7 variants in the Taizhou area, China.

Authors:  Mei-Zhen Dai; Yi Qiu; Xing-Hong Di; Wei-Wu Shi; Hui-Hui Xu
Journal:  BMC Cancer       Date:  2021-07-03       Impact factor: 4.430

9.  Ancient Evolutionary History of Human Papillomavirus Type 16, 18 and 58 Variants Prevalent Exclusively in Japan.

Authors:  Kohsei Tanaka; Gota Kogure; Mamiko Onuki; Koji Matsumoto; Takashi Iwata; Daisuke Aoki; Iwao Kukimoto
Journal:  Viruses       Date:  2022-02-24       Impact factor: 5.048

10.  Lineage analysis of human papillomavirus type 39 in cervical samples of Iranian women.

Authors:  Neda Hosseini; Zabihollah Shoja; Arash Arashkia; Amir-Hossein Khodadadi; Somayeh Jalilvand
Journal:  Virol J       Date:  2021-07-22       Impact factor: 4.099

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.