Santiago García-Vallvé, José R Iglesias-Rozas, Angel Alonso, Ignacio G Bravo.
Abstract
BACKGROUND: Papillomaviruses (PVs) infect stratified squamous epithelia in warm-blooded vertebrates and have undergone a complex evolutionary process. The control of the expression of the early ORFs in PVs depends on the binding of cellular and viral transcription factors to the upstream regulatory region (URR) of the virus. It is believed that there is a core of transcription factor binding sites (TFBS) common to all PVs, with additional individual differences, although most of the available information focuses only on a handful of viruses.
Year: 2006 PMID: 16526953 PMCID: PMC1421437 DOI: 10.1186/1471-2148-6-20
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Phylogenetic coherence of different papillomavirus taxa according to the L1 and E7 genes, and to the URR.
| L1 | Clustalw | Dialign | Tcoffee | consensus | |||||||||
| dnapars | Fitch | NJ | UPGMA | Protpars | Fitch | NJ | UPGMA | Protpars | Fitch | NJ | UPGMA | ||
| α | 77 | 100 | 77 | 94 | 80 | 100 | 100 | 87 | 52 | 100 | 100 | 93 | 71 |
| β | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| γ | 94 | 92 | 100 | 58 | 94 | 77 | 100 | 98 | 97 | 91 | 99 | 98 | 95 |
| δ | 60 | 43 | 100 | 79 | 100 | 100 | 100 | 100 | 54 | 100 | 100 | 100 | 37 |
| β+γ | 52 | 82 | 50 | 99 | 70 | 77 | 82 | 84 | 48 | 95 | 77 | 76 | 64 |
| δ+ξ | 100 | 100 | 68 | 15 | 77 | 100 | 38 | - | 100 | 49 | 69 | - | 72 |
| κ | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 99 | 100 |
| λ | 68 | 100 | 100 | 100 | 88 | 100 | 100 | 100 | 66 | 100 | 100 | 100 | 89 |
| μ | 95 | 100 | 100 | 100 | 92 | 99 | 100 | 100 | 97 | 95 | 99 | 99 | 97 |
| μ+κ+λ | 42 | 92 | 97 | 88 | 49 | 99 | 94 | 95 | 33 | 100 | 98 | 89 | 35 |
| E7 | Clustalw | Dialign | Tcoffee | consensus | |||||||||
| Protpars | Fitch | NJ | UPGMA | Protpars | Fitch | NJ | UPGMA | Protpars | Fitch | NJ | UPGMA | ||
| α | 54 | 90 | 86 | 99 | - | 92 | 84 | 89 | 63 | 85 | 84 | 97 | 30 |
| β | 26 | 96 | 89 | 94 | 99 | 92 | 89 | 84 | 64 | 85 | 96 | 88 | 57 |
| γ | 58 | 53 | 99 | 96 | 94 | 100 | 83 | 72 | 96 | 94 | 98 | 100 | 81 |
| δ | 55 | 15 | - | - | - | 47 | - | - | - | 55 | 25 | - | 35 |
| β+γ | - | 100 | - | - | - | - | - | - | - | - | - | - | - |
| δ+ξ | - | ||||||||||||
| κ | - | - | - | - | 38 | - | - | - | 88 | - | 29 | - | 33 |
| λ | - | 95 | 96 | 99 | 99 | 95 | 94 | 96 | 84 | 86 | 91 | 91 | 94 |
| μ | 100 | 93 | 91 | 99 | 100 | 92 | 94 | 97 | 92 | 96 | 99 | 100 | 92 |
| μ+κ+λ | - | - | - | - | - | - | - | - | - | - | - | - | - |
| URR | Clustalw | Dialign | Tcoffee | consensus | |||||||||
| dnapars | Fitch | NJ | UPGMA | dnapars | Fitch | NJ | UPGMA | dnapars | Fitch | NJ | UPGMA | ||
| K2/ML | K2/ML | K2/ML | K2/ML | K2/ML | K2/ML | K2/ML | K2/ML | K2/ML | |||||
| α | - | -/62 | -/- | -/91(1) | 99 | -/- | np/- | np/- | 20 | -/15(1) | -/- | 89(1)/88(1) | 19(1) |
| β | 98 | 100/100 | 100/100 | 100 | - | -/- | np/- | np/- | 97 | -/30 | -/- | 97/96 | 51 |
| γ | - | 100(2)/100(2) | 100(2)/100(2) | 100(2)/100(2) | - | -/- | np/- | np/- | - | -/- | -/- | -/- | 66(2) |
| δ | 96* | 69/62 | 100 | 100/100 | 99 | -/- | np/30 | np/44 | 94 | 85/49 | 100/99 | 100/100 | 80 |
| β+γ | - | -/- | -/- | -/- | - | -/- | np/- | np/- | - | -/- | -/- | -/- | - |
| δ+ξ | - | -/- | -/- | -/- | - | -/- | np/- | np/- | - | -/- | -/- | -/- | - |
| κ | - | -/- | -/- | -/- | - | -/- | np/- | np/- | 81 | -/- | -/- | -/- | - |
| λ | 100 | 73/- | 68 | 99 | - | -/- | np/- | np/- | 93 | 61/49 | -/- | 98/92 | - |
| μ | - | 75/74 | 58 | 100/99 | - | -/- | np/- | np/- | 86 | 97/98 | 97/99 | 100/99 | 64 |
| μ+κ+λ | - | -/- | -/- | -/- | - | -/- | np/- | np/- | - | -/- | -/- | -/- | - |
(1) HPV2 did not cluster together with the rest of the alpha genus
(2) HPV4 did not cluster together with the rest of the gamma genus
np: the high divergence values did not allow the algorithm to render a solution
Phylogenies were reconstructed from three different alignments (CLUSTALW, DIALIGN and TCOFFEE), each subsequently analysed with four phylogenetic algorithms: a parsimony-based algorithm (PROTPARS for protein sequences, DNAPARS for DNA sequences) and three distance matrix-based algorithms (FITCH, Neighbor-Joining (NJ) and UPGMA). Distance matrices were generated with PROTDIST or DNADIST. For DNA, two nucleotide substitution models were used: the Kimura two-parameter model (K2) or a maximum-likelihood model (ML). Numbers give the percentage of times a given group is recovered in the consensus tree for each reconstruction method, after 1000 bootstrap cycles. The column "consensus" gathers the output of the CONSENSE programme with the trees from all independent algorithms as input. Some algorithms could not work with the DIALIGN alignment as input because of the extreme divergence between the sequences; this is marked as "np" in the corresponding columns. The support values decrease in the order L1>E7>URR, which reflects the diversity of evolutionary pressures along the papillomavirus genome. Some genera that are stably recovered in the L1 protein phylogeny appear with lower support in the E7 protein phylogeny and do not appear as definite groups in the URR phylogeny. This is the case for genera alpha, kappa and lambda. Other genera are recovered confidently regardless of the element analysed. This is the case for genera beta, gamma and delta. There are therefore differences in evolutionary patterns between the members of different clades.
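The support values in the table are percentages of bootstrap replicates in which a group is recovered. A minimal sketch of that count follows, assuming each replicate tree has already been reduced to the set of clades (sets of taxon names) it contains. The function name, the data layout and the taxon labels are hypothetical and are not taken from the PHYLIP programmes named above.

```python
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def clade_support(replicates: List[Set[Clade]], clade: Iterable[str]) -> float:
    """Percentage of replicate trees that contain `clade` as a group."""
    if not replicates:
        raise ValueError("at least one replicate tree is required")
    target = frozenset(clade)
    hits = sum(1 for tree in replicates if target in tree)
    return 100.0 * hits / len(replicates)


# Four toy replicate trees over the illustrative taxa A, B, C and D.
replicates = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
]

# The group {A, B} is present in three of the four replicates.
support_ab = clade_support(replicates, ["A", "B"])
```

With 1000 replicates instead of four, the same ratio yields the kind of percentage reported in each cell; a dash in the table corresponds to a group that is not recovered in the consensus tree.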
Figure 1. Relative branch length for the consensus trees of the L1, E1, E7 genes and the URR for the beta and delta papillomaviruses. Evolutionary distances, in substitutions per site, from present sequences to the last common ancestor of each genus were calculated on the consensus tree estimated after CLUSTALW alignment, distance matrix construction under the neighbor-joining conditions and a Kimura two-parameter nucleotide substitution model, and bootstrapped 10000 times. The two clades were recovered in all the consensus trees, independently of the analysed element. Distances were normalised for each virus individually, with respect to the L1 distance. P values show the results of a two-tailed paired Student's t test for each virus, for the values of the elements connected by the arrow brackets. There is a statistically significant gradient in the divergence distances in the order L1
Figure 2. Density of predicted TFBS and binding TF in different DNA sequences. Transcription factor binding sites (TFBS) were predicted with MATCH, using the nucleotide matrix description of each site as compiled in TRANSFAC. Predictions were run on three DNA sequences: the URR of HPV16, the full HPV16 genome except the URR, and a random DNA sequence with the same base composition as HPV16. Results were normalised with respect to the predictions obtained for the randomised sequence. Values are shown for the total number of predicted TFBS (a) and for the total number of predicted binding TF (b), since some TF were predicted to have more than one binding site.
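The control and the normalisation described in this caption can be sketched as follows. Shuffling a sequence is one simple way to obtain a random sequence with exactly the same base composition. Expressing densities per kilobase before taking the ratio is an assumption made here for illustration, as are all function names, the toy sequence and the site counts.

```python
import random
from collections import Counter


def shuffled_control(seq: str, seed: int = 0) -> str:
    """Random sequence with the same base composition as `seq`."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def sites_per_kb(n_sites: int, seq_len: int) -> float:
    """Number of predicted sites per 1000 nucleotides."""
    return 1000.0 * n_sites / seq_len


def normalised_density(n_sites: int, seq_len: int,
                       n_sites_random: int, random_len: int) -> float:
    """Site density relative to the density in the randomised control."""
    return sites_per_kb(n_sites, seq_len) / sites_per_kb(n_sites_random, random_len)


toy_sequence = "ACGTTTGACAGGATCC"          # illustrative, not a viral sequence
control = shuffled_control(toy_sequence)

# Hypothetical counts: 30 sites in 1000 nt versus 10 sites in 2000 nt.
ratio = normalised_density(30, 1000, 10, 2000)
```

A ratio above 1 would indicate more predicted sites per nucleotide than expected from base composition alone.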
Example of predicting the presence of transcription factor binding sites in the upstream regulatory region of the papillomaviruses.
| α | β | γ | α | β | γ | ||||||
| HPV16 | HPV18 | HPV6 | HPV8 | HPV4 | HPV16 | HPV18 | HPV6 | HPV8 | HPV4 | ||
| AbaA | + | + | + | + | HFH-8 | + | + | + | + | + | |
| AhR/Arnt | + | + | HLF | + | |||||||
| AP-1 | + | + | + | + | + | HNF-3beta | + | + | + | + | |
| AREB6 | + | + | + | + | + | lk-1 | + | ||||
| Arnt | + | + | + | Lmo2 complex | + | + | + | + | |||
| Athb-1 | + | + | Mat1-Mc | + | + | + | + | ||||
| Bcd | + | + | + | + | MATa1 | + | |||||
| BR-c Z1 | + | Max | + | |||||||
| BR-c Z4 | + | + | + | + | MCM1 | + | |||||
| Brn-2 | + | + | + | + | MYB.Ph3 | + | + | ||||
| CCAAT box | + | NF-E2 | + | ||||||||
| C/EBP | + | + | NF-Y | + | + | + | + | ||||
| CDP CR3+HD | + | + | Nkx2-5 | + | + | ||||||
| c-Ets-1(p54) | + | + | + | + | N-Myc | + | |||||
| CF2-II | + | + | + | + | + | NRF-2 | + | ||||
| CHOP-C | + | + | + | oct-1 | + | + | + | + | + | ||
| c-Myb | + | + | + | + | + | OCT-x | + | ||||
| c-Myc/Max | + | PacC | + | + | |||||||
| Croc | + | + | PHO4 | + | + | + | |||||
| dl | + | + | RAP1 | + | |||||||
| E2 | + | + | + | + | + | RFX1 | + | ||||
| E2F | + | RORalpha1 | + | + | |||||||
| E47 | + | + | + | + | S8 | + | + | + | + | ||
| E74A | + | SBF-1 | + | + | + | + | |||||
| Elf-1 | + | + | + | + | + | Skn-1 | + | + | + | + | + |
| Elk-1 | + | + | Sn | + | + | + | |||||
| ER | + | + | Sox-5 | + | + | + | + | + | |||
| FOXD3 | + | + | + | + | SOX-9 | + | + | + | + | + | |
| FOXJ2 | + | + | + | + | STATx | + | + | ||||
| Freac-2 | + | + | + | StuAp | + | + | |||||
| Freac-7 | + | + | + | + | + | Su(H) | + | ||||
| GATA-1 | + | + | + | + | TATA | + | + | + | + | ||
| GATA-2 | + | + | + | TCF11 | + | ||||||
| GATA-3 | + | TGIF | + | + | |||||||
| GATA-x | + | + | + | + | USF | + | + | ||||
| GBP | + | + | + | VBP | + | ||||||
| GCN4 | + | + | + | + | v-ErbA | + | |||||
| Gfi-1 | + | v-Maf | + | ||||||||
| Hand1/E47 | + | + | v-Myb | + | + | + | + | + | |||
| Hairy | + | XFD-1 | + | + | |||||||
| HAP2/3/4 | + | XFD-2 | + | + | + | ||||||
| HFH-1 | + | + | + | YY1 | + | ||||||
| HFH-3 | + | + | + | + | + | Zeste | + | + | |||
Transcription factor binding sites were predicted with MATCH, using the nucleotide matrix description of each site as compiled in TRANSFAC. The similarity cut-offs between the binding site matrix and the sequence in the URR were set to minimise false positives and false negatives simultaneously. Different papillomaviruses contain different transcription factor binding sites in their URR. Some sites are common to all papillomaviruses, such as AP-1, E2, HFH-3 or Oct-1. Other TFBS are type-exclusive, genus-exclusive, or shared by papillomaviruses that infect the same host. The high dimensionality of these results makes it necessary to analyse them with information reduction techniques, such as principal component analysis or genetic algorithms.
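Matrix-based prediction of this kind slides a position-specific matrix along the sequence and keeps the windows whose score exceeds a cut-off. The sketch below is a simplified illustration of that idea and is not the MATCH algorithm itself: it omits the per-position information weighting and the separate core score, and the matrix, sequence and cut-off are invented for the example.

```python
from typing import Dict, List, Tuple

Matrix = List[Dict[str, float]]  # one base -> frequency mapping per position


def relative_score(matrix: Matrix, window: str) -> float:
    """Score rescaled to [0, 1] between the worst and best possible window."""
    current = sum(col[base] for col, base in zip(matrix, window))
    lowest = sum(min(col.values()) for col in matrix)
    highest = sum(max(col.values()) for col in matrix)
    return (current - lowest) / (highest - lowest)


def scan(matrix: Matrix, seq: str, cutoff: float) -> List[Tuple[int, float]]:
    """Return (start, score) for every window scoring at or above `cutoff`."""
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases
        score = relative_score(matrix, window)
        if score >= cutoff:
            hits.append((start, score))
    return hits


# Toy matrix whose preferred word is TGAC (hypothetical, not from TRANSFAC).
toy_matrix = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

hits = scan(toy_matrix, "AATGACGG", cutoff=0.9)
```

Raising the cut-off removes weak matches and so lowers the false-positive rate, at the cost of missing degenerate sites; lowering it has the opposite effect.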
Figure 3. Presence of predicted transcription factor binding sites in alpha, beta and gamma papillomaviruses. Transcription factor binding sites were predicted with MATCH, using the nucleotide matrix description of each site as compiled in TRANSFAC. The similarity cut-offs between the binding site matrix and the sequence in the URR were set to minimise the number of false positives. The repertoire of transcription factor binding sites differs between papillomaviruses. Some binding sites, such as E2, are present in most of the analysed viruses. Others are preferentially present in the beta and gamma genera, such as v-Myb or FOXD3; in genus beta, such as HNF-3beta or HFH-3; or in genus gamma, such as COMP1 or GATA-3.
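Given a presence/absence table of predicted sites, the shared and genus-exclusive categories mentioned here reduce to set operations. The following sketch assumes a dictionary mapping each virus to its set of predicted sites; the virus labels, site labels and genus assignments are placeholders and do not reproduce the published predictions.

```python
from typing import Dict, Set


def shared_sites(presence: Dict[str, Set[str]]) -> Set[str]:
    """Sites predicted in every virus."""
    return set.intersection(*presence.values())


def genus_exclusive(presence: Dict[str, Set[str]],
                    genus_of: Dict[str, str], genus: str) -> Set[str]:
    """Sites found in all members of `genus` and in no virus outside it."""
    inside = [s for virus, s in presence.items() if genus_of[virus] == genus]
    outside = [s for virus, s in presence.items() if genus_of[virus] != genus]
    in_all_members = set.intersection(*inside)
    in_any_outsider = set().union(*outside)
    return in_all_members - in_any_outsider


# Placeholder data: two viruses per genus, five abstract site labels.
presence = {
    "a1": {"S1", "S2"},
    "a2": {"S1", "S3"},
    "b1": {"S1", "S4", "S5"},
    "b2": {"S1", "S4"},
}
genus_of = {"a1": "alpha", "a2": "alpha", "b1": "beta", "b2": "beta"}

common = shared_sites(presence)
beta_only = genus_exclusive(presence, genus_of, "beta")
```

The caption speaks of sites that are "preferentially" present in a genus, which is a weaker criterion than the strict exclusivity coded here; a frequency threshold per genus would be the natural relaxation.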
Figure 4. Principal Component Analysis of the prediction of TFBS in the URR of the papillomaviruses. The predictions of the presence/absence of TFBS in the URR of the papillomaviruses were analysed by principal component analysis. The figure shows the distribution of alpha and beta papillomaviruses according to the two principal components, PC1 and PC2. Beta papillomaviruses, in grey, cluster together in the upper-right quadrant, whereas alpha papillomaviruses are distributed throughout the other three quadrants. According to the principal component analysis of TFBS in the URR, beta papillomaviruses are therefore more homogeneous than alpha papillomaviruses. The higher diversity of the alpha papillomaviruses is also reflected in the larger number of species in this genus and in the diversity of their clinical manifestations.
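As a rough illustration of how a presence/absence matrix is projected onto a principal component, the sketch below centres the columns, builds the covariance matrix and extracts the first component by power iteration. Power iteration is used here only to keep the example dependency-free; it is not stated in the source which software or method was used, only PC1 is computed, and the 0/1 matrix is invented.

```python
import math
from typing import List


def centre(rows: List[List[float]]) -> List[List[float]]:
    """Subtract the column mean from every entry."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centred: List[List[float]]) -> List[List[float]]:
    """Sample covariance matrix of the (already centred) columns."""
    n, p = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov: List[List[float]], iterations: int = 200) -> List[float]:
    """Leading eigenvector of `cov`, found by power iteration."""
    p = len(cov)
    v = [1.0 + 0.1 * i for i in range(p)]  # fixed, deterministic start
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data")
        v = [x / norm for x in w]
    return v


def project(centred: List[List[float]], axis: List[float]) -> List[float]:
    """Coordinate of each row along `axis`."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centred]


# Rows are viruses, columns are TFBS (1 = predicted present). Toy values.
matrix = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
centred = centre(matrix)
scores = project(centred, first_component(covariance(centred)))
```

Viruses with the same site repertoire receive the same score, so groups with homogeneous repertoires appear as tight clusters along the component.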
Predictions by genetic algorithms for the categorisation of different papillomaviruses according to the presence/absence of TFBS in their URR.
| input | grouping | correctly predicted | erroneously predicted | unable to predict |
| all viruses | 6 categories, random | 44% | 66% | 0% |
| | 6 categories, taxa: alpha, beta, gamma+xi, kappa+lambda+mu, delta+zeta, theta+iota | 64% (α 85%) (β 84%) | 18% (α 7%) (β 8%) | 22% (α 7%) (β 8%) |
| | 2 categories, random | 61% | 39% | -% |
| | 2 categories, primate/non-primate | 90% | 2% | 8% |
| human PVs | 4 categories, random | 30% | 56% | 14% |
| | 4 categories, taxa: alpha, beta, gamma, mu | 76% (α 79%) (β 92%) | 12% (α 85) (β 0%) | 12% (α 12%) (β 8%) |
| alpha + beta PVs | 2 categories, random | 39% | 39% | 22% |
| | 2 categories, taxa: alpha, beta | 97% | 3% | 0% |
Genetic algorithms were trained with the matrix describing the presence/absence of TFBS in the URR of the papillomaviruses. An additional column was provided, containing real information either about the genus to which each virus belongs or about the nature of the infected host. As a control, additional training was performed on the same matrix with a column containing a random categorisation. Numbers in the table give the percentage of viruses for which the genetic algorithms rendered correct predictions, rendered false predictions, or could not formulate any prediction, respectively. Additional values in brackets show the specific accuracies of the predictions for the alpha and beta papillomaviruses. Genetic algorithms were able to assign a given virus to its genus from the repertoire of TFBS in the URR alone. Specifically, alpha and beta papillomaviruses were recognised as separate, definite clades with high confidence in most cases. The main targets of both genera are basal cells in the stratified squamous epithelia. These results suggest that alpha and beta papillomaviruses take advantage of different control mechanisms in the same target cell, and/or that they infect different subsets of cells that are histologically indistinguishable. Finally, genetic algorithms were able to recognise whether or not a given papillomavirus infects primates, based only on the TFBS patterns in the URR. Papillomaviruses infecting primates are phylogenetically distant, and separated long before the appearance of the primates themselves. Their clustering together according to the repertoire of TFBS therefore suggests a uniformity of regulatory mechanisms, achieved by convergent evolution.
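The three outcome columns of the table (correct, erroneous, unable to predict) and the random-label control can be illustrated with a toy genetic algorithm. The source does not describe the encoding, operators or software used, so everything below is an assumption made for illustration: each individual is a vector of weights in {-1, 0, +1} over the TFBS columns, a positive weighted sum predicts category 1, a negative sum predicts category 0, and a sum of zero counts as "unable to predict".

```python
import random
from typing import List, Optional, Tuple

Weights = List[int]


def predict(weights: Weights, row: List[int]) -> Optional[int]:
    """Category 1, category 0, or None when the evidence is balanced."""
    score = sum(w * x for w, x in zip(weights, row))
    if score > 0:
        return 1
    if score < 0:
        return 0
    return None


def tally(weights: Weights, rows, labels) -> Tuple[float, float, float]:
    """Percentages of (correct, erroneous, unable to predict)."""
    guesses = [predict(weights, row) for row in rows]
    n = len(rows)
    unable = sum(1 for g in guesses if g is None)
    correct = sum(1 for g, lab in zip(guesses, labels) if g == lab)
    wrong = n - unable - correct
    return (100.0 * correct / n, 100.0 * wrong / n, 100.0 * unable / n)


def fitness(weights: Weights, rows, labels) -> int:
    """Number of correctly categorised rows."""
    return sum(1 for row, lab in zip(rows, labels) if predict(weights, row) == lab)


def evolve(rows, labels, generations: int = 30, pop_size: int = 20, seed: int = 0):
    """Elitist GA with tournament selection, uniform crossover and mutation."""
    rng = random.Random(seed)
    n = len(rows[0])
    score = lambda w: fitness(w, rows, labels)
    population = [[rng.choice((-1, 0, 1)) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        history.append(score(population[0]))
        next_population = [population[0][:]]  # elitism: keep the best unchanged
        while len(next_population) < pop_size:
            mother, father = (max(rng.sample(population, 3), key=score) for _ in range(2))
            child = [rng.choice(pair) for pair in zip(mother, father)]
            if rng.random() < 0.3:
                child[rng.randrange(n)] = rng.choice((-1, 0, 1))
            next_population.append(child)
        population = next_population
    return max(population, key=score), history


# Toy presence/absence matrix with real and scrambled category columns.
rows = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = [1, 1, 0, 0]
control_labels = [1, 0, 1, 0]

best, history = evolve(rows, labels)
best_control, _ = evolve(rows, control_labels)
```

Comparing `tally(best, rows, labels)` with `tally(best_control, rows, control_labels)` mirrors the comparison of taxon-based and random rows in the table. A faithful evaluation would score held-out viruses rather than the training set, which this sketch does not do.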
Figure 5. Predictions by genetic algorithms for the categorisation of alpha and beta papillomaviruses according to the predicted repertoire of TFBS in the URR. The predictions of TFBS in the URR were analysed by genetic algorithms, including additional information about the genus to which each virus belongs, alpha or beta (a). Controls were performed with the same predictions, but with random assignment to one of two categories (b). Expected values are shown as open circles, and predicted values are shown as crosses. The elements for which the genetic algorithms could not formulate predictions are labelled "NP", non-predictable. (a) Genetic algorithms were able to correctly distinguish alpha and beta papillomaviruses based exclusively on their TFBS patterns. (b) The negative control shows that this result does not depend on the information contained in the prediction matrix itself, but on the categorisation additionally provided. Thus, the repertoire of TFBS in the URR differs between alpha and beta PVs, although both genera infect histologically similar target cells in the same hosts.