Ignacio E Sánchez, Mariano Dellarole, Kevin Gaston, Gonzalo de Prat Gay.
Abstract
Mucosal human papillomaviruses (HPVs) are etiological agents of oral, anal and genital cancer. Properties of high- and low-risk HPV types cannot be reduced to discrete molecular traits. The E2 protein regulates viral replication and transcription through a finely tuned interaction with four sites at the upstream regulatory region of the genome. A computational study of the E2-DNA interaction in all 73 types within the alpha papillomavirus genus, including all known mucosal types, indicates that E2 proteins have similar DNA discrimination properties. Differences in E2-DNA interaction among HPV types lie mostly in the target DNA sequence, as opposed to the amino acid sequence of the conserved DNA-binding alpha helix of E2. Sequence logos of natural and in vitro selected sites show an asymmetric pattern of conservation arising from indirect readout, and reveal evolutionary pressure for a putative methylation site. Based on DNA sequences only, we could predict differences in binding energies with a standard deviation of 0.64 kcal/mol. These energies cluster into six discrete affinity hierarchies and uncover a fifth E2-binding site in the genome of six HPV types. Finally, certain distances between sites, affinity hierarchies and their eventual changes upon methylation are statistically associated with high-risk types.
Year: 2007 PMID: 18084026 PMCID: PMC2241901 DOI: 10.1093/nar/gkm1104
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Conserved features of the E2–DNA interaction. (A) Complex of the C-terminal domain of the HPV18 E2 protein with the idealized target DNA sequence CAACCGAATTCGGTTG. The two four-base half-sites in direct contact with the protein are shown in red, the four-base linker in silver and the two flanking bases in gold. The protein helices that contact the DNA directly are shown in green. (B) Sequence logo (63,68) of the recognition helix for alpha papillomaviruses. Protein residues contributing more than 0.8 kcal/mol to the binding energy of HPV16 E2 (23) are indicated with asterisks. (C) Correlation between the free energies of binding of E2 proteins from HPV types 11 and 16 to four E2-BSs (open triangle) (16,19,20) and of E2 proteins from HPV types 18 and 16 to another set of four E2-BSs (filled square) (16,19,20). The correlation R-values are 0.87 (16/11 pair, dashed line) and 0.91 (16/18 pair, continuous line). The sequences of the DNA-binding helix of the three proteins are also shown, with the side chains contributing more than 0.8 kcal/mol to the binding energy of HPV16 E2 (23) in bold.
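The layout of the idealized site in panel A can be checked with a short Python sketch. The helper names `reverse_complement` and `split_site` are illustrative and do not come from the paper, and the 2 + 4 + 4 + 4 + 2 split assumes that the "two flanking bases" are counted on each side, which is what the 16-base length suggests.

```python
# Illustrative helpers; the names are assumptions, not the authors' code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def split_site(seq: str) -> dict:
    """Split a 16-base site into flank, half-site, linker, half-site, flank.

    Assumes two flanking bases on each side (2 + 4 + 4 + 4 + 2 = 16).
    """
    if len(seq) != 16:
        raise ValueError("expected a 16-base site")
    return {
        "flank_5": seq[0:2],
        "half_site_1": seq[2:6],
        "linker": seq[6:10],
        "half_site_2": seq[10:14],
        "flank_3": seq[14:16],
    }

site = "CAACCGAATTCGGTTG"  # idealized target sequence from Figure 1A
parts = split_site(site)
print(parts)
print("palindromic:", reverse_complement(site) == site)
```

Under these assumptions the half-sites come out as ACCG and CGGT, and the whole 16-base sequence equals its own reverse complement.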
Figure 2. E2-binding sites in alpha papillomaviruses. (A) Schematic view of the upstream regulatory region of a prototypical alpha HPV genome. Shown are the flanking ORFs, L1 and E6, the start of the early promoter and its TATA box, the four binding sites for the E2 protein, the binding sites for the viral protein E1 and the host protein Sp1 and the silencer, enhancer and nuclear matrix attachment regions. (B) Sequence logos of the four E2-binding sites. Sites are shown in the 5′–3′ direction. (C) Histograms of the number of bases between E2-binding sites.
Figure 3. Influence of CG methylation in the evolution of E2-binding sites. (A) Top: Sequence logo for all four biological E2-binding sites. Middle: Sequence logo from in vitro binding selection experiments with HPV51 E2 (27). Bottom: Two-sample logo, taking the biological logo as sample and the in vitro logo as background. Displayed bases are enriched in the biological sites compared with the sequences selected in vitro. (B) Presence of putative methylation sites (CG dinucleotides) in positions 4,5 and 10,11 for each of the four in vivo E2-binding sites.
Figure 4. Correlation between observed and predicted free energies of binding for E2–DNA complexes of E2 proteins from different HPV types. (A) HPV type 16, data from Ref. (18). (B) HPV type 16, data from Ref. (16). (C) HPV type 11, data from Ref. (20). (D) HPV type 18, data from Ref. (19). (E) HPV type 51, data from Ref. (27). Units are kcal/mol in all cases. The binding energy of the consensus target sequence was arbitrarily set to zero. All other sequences have positive predicted values of ΔΔGbinding, indicative of a reduced predicted binding affinity. The total number of points is 38; the standard deviation between observed and predicted values is 0.64 kcal/mol, or 2.9-fold in KD.
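The correspondence between 0.64 kcal/mol and a 2.9-fold change in KD follows from the standard relation KD ratio = exp(ΔΔG / RT). The sketch below assumes a temperature of 298.15 K, which the caption does not state; the function name `kd_fold_change` is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def kd_fold_change(ddg_kcal_per_mol: float, temperature_k: float = 298.15) -> float:
    """Fold change in KD for a given difference in binding free energy.

    Uses KD_ratio = exp(ddG / (R * T)); the default temperature is an assumption.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))

fold = kd_fold_change(0.64)
print(f"{fold:.1f}-fold")
```

At the assumed temperature this evaluates to roughly 2.9, in line with the figure quoted in the caption.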
Figure 5. Six classes of relative binding affinity hierarchies for the E2–DNA interaction. For each group of types, we represent the average relative predicted affinity and standard deviation for each site (thick black line, points) and the values for each type (high-risk types in red, low-risk types in green, cutaneous types in blue and other types in grey). The types were grouped using the k-means algorithm (see Methods section). The binding energy of the consensus target sequence was arbitrarily set to zero. All other sequences have positive predicted values of ΔΔGbinding, indicative of a reduced predicted binding affinity. (A) High-risk types 16, 35, 52, 53, 56, 66 and 73; low-risk types 6b, 40, 42, 43 and 44; cutaneous types 2, 2a, 2isoC2, 27 and 27b; and types 13, 13b, 30, 34, 35H, 55, 57, 57b, 67, 71, 74subtype, 90cand, 91, 106, PCPV1 and RHPV1. (B) High-risk types 18, 26, 33, 39, 45, 51, 58, 59, 68a, 82 and 82subtype; low-risk types 6, 6a, 11 and 54; cutaneous type 7; and types 32, 85cand, 97 and 97iso624. (C) Cutaneous types 3, 10, 28, 29 and 94; and types 84, 86cand, 87cand and 89cand. (D) Low-risk types 61, 72 and 81; and type 62cand. (E) High-risk type 31; low-risk type 70; and type 69. (F) Cutaneous type 77; and types 83 and 102.
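The grouping step named in the caption can be illustrated with a plain Lloyd-style k-means in Python. This is a sketch, not the authors' implementation: it assumes each type is described by a vector of four predicted ΔΔG values (one per site), and the six profiles below are made-up toy numbers with hypothetical labels, not data from the paper.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: squared_distance(point, centroids[c]))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy affinity profiles (ddG per site, kcal/mol); values are invented.
profiles = {
    "toy_type_1": [0.2, 1.5, 0.4, 0.1],
    "toy_type_2": [0.3, 1.4, 0.5, 0.2],
    "toy_type_3": [0.1, 1.6, 0.3, 0.0],
    "toy_type_4": [2.0, 0.3, 1.8, 1.1],
    "toy_type_5": [2.1, 0.2, 1.9, 1.0],
    "toy_type_6": [1.9, 0.4, 1.7, 1.2],
}
labels, centroids = kmeans(list(profiles.values()), k=2)
for name, label in zip(profiles, labels):
    print(name, "-> cluster", label)
```

With the toy profiles, the first three and the last three vectors end up in separate clusters. A real analysis would use k = 6 and the predicted energies for all types.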
Newly identified E2-binding sites
| HPV type | Sequence (a) | Distance to site 4 (b) | ΔΔGbinding (c) |
|---|---|---|---|
| 30 | AACCAAAAAGGGTG | 93 | 3.11 |
| 44 | AACCGAAAACGGTT | −15 | 0.29 |
| 54 | AACCGAAACCGTTT | Overlapping site 4 | 2.48 |
| 61 | GACCGAAACCGGTC | −19 | 1.52 |
| 90 | GACCGAAACCGGGA | −2 | 3.40 |
| 102 | GACCGAAACCGGTC | −25 | 1.52 |
(a) In the orientation with the best predicted energy. The sites from types 30, 44, 61, 90 and 102 are in the 3′–5′ direction; the site from type 54 is in the 5′–3′ direction.
(b) Distance is negative if site 5 is closer to the early promoter than site 4 and positive otherwise.
(c) Relative to the consensus sequence.
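As a small illustration, the sequences in this table can be scanned for the putative methylation sites discussed in Figure 3, that is, CG dinucleotides at positions 4,5 and 10,11. The sketch assumes that the 14-base sequences listed here use the same position numbering as Figure 3; the helper `has_cg` is a hypothetical name.

```python
# Sequences copied from the table of newly identified sites.
SITES = {
    "30": "AACCAAAAAGGGTG",
    "44": "AACCGAAAACGGTT",
    "54": "AACCGAAACCGTTT",
    "61": "GACCGAAACCGGTC",
    "90": "GACCGAAACCGGGA",
    "102": "GACCGAAACCGGTC",
}

def has_cg(seq: str, first_pos: int) -> bool:
    """True if a CG dinucleotide starts at the 1-based position first_pos."""
    return seq[first_pos - 1:first_pos + 1] == "CG"

# Map each HPV type to (CG at positions 4,5; CG at positions 10,11).
cg_status = {t: (has_cg(s, 4), has_cg(s, 10)) for t, s in SITES.items()}
for hpv_type, (cg_4, cg_10) in cg_status.items():
    print(hpv_type, "CG at 4,5:", cg_4, "| CG at 10,11:", cg_10)
```

Because CG is its own reverse complement, reading a 14-base site on the opposite strand swaps the two positions (4,5 becomes 10,11 and vice versa) but does not create or remove a CG.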
Association of molecular and epidemiological properties in alpha papillomaviruses
| Molecular property | | | High risk ( | P | Low risk ( | P | Cutaneous ( | P |
|---|---|---|---|---|---|---|---|---|
| Affinity hierarchy | A | | 7 | >0.05 | 5 | >0.05 | 5 | >0.05 |
| | B | | 11 | 1.2 × 10⁻³ (+) | 4 | >0.05 | 1 | >0.05 |
| | C | | 0 | >0.05 | 0 | >0.05 | 5 | 4.9 × 10⁻³ (+) |
| | D | | 0 | >0.05 | 3 | 1.7 × 10⁻² (+) | 0 | >0.05 |
| | E | | 1 | >0.05 | 1 | >0.05 | 0 | >0.05 |
| | F | | 0 | >0.05 | 0 | >0.05 | 1 | >0.05 |
| Methylation defect | Site 1 | Position 4 ( | 1 | >0.05 | 0 | >0.05 | 0 | >0.05 |
| | | Position 10 ( | 0 | 3.6 × 10⁻³ (−) | 3 | >0.05 | 7 | 2.9 × 10⁻² (+) |
| | Site 2 | Position 4 (0) | 0 | >0.05 | 0 | >0.05 | 0 | >0.05 |
| | | Position 10 ( | 7 | 2.3 × 10⁻⁴ (+) | 0 | >0.05 | 0 | >0.05 |
| | Site 3 | Position 4 ( | 14 | 1.4 × 10⁻⁵ (+) | 2 | >0.05 | 0 | 6.0 × 10⁻³ (−) |
| | | Position 10 ( | 1 | 6.4 × 10⁻⁴ (−) | 2 | >0.05 | 12 | 6.3 × 10⁻⁷ (+) |
| | Site 4 | Position 4 (0) | 0 | >0.05 | 0 | >0.05 | 0 | >0.05 |
| | | Position 10 ( | 0 | >0.05 | 0 | >0.05 | 1 | >0.05 |
| Distance between sites | | | 1 | >0.05 | 1 | >0.05 | 0 | >0.05 |
| | | | 5 | 7.2 × 10⁻⁴ (−) | 10 | >0.05 | 12 | 1.0 × 10⁻³ (+) |
| | | | 10 | 1.1 × 10⁻² (+) | 2 | >0.05 | 0 | 1.0 × 10⁻² (−) |
| | | | 3 | >0.05 | 0 | >0.05 | 0 | >0.05 |
| | | | 0 | >0.05 | 0 | >0.05 | 6 | 5.9 × 10⁻⁶ (+) |
Epidemiological properties are shown as columns and molecular properties as rows, with the number of types in brackets. For a given combination of properties, we indicate the observed number of types and the probability that the observation occurs by chance. Plus and minus signs indicate which combinations of molecular and epidemiological properties occur together more or less often than at random, respectively.
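One common way to obtain a "probability that the observation occurs by chance" for such counts is a one-sided hypergeometric test. The text here does not say which test the authors used, so the Python sketch below is only an assumption about the kind of calculation involved, and its result need not match the table. The example counts are tallied from the Figure 5 caption: 72 types listed in total, 19 of them high-risk, 20 types in class B, 11 of which are high-risk.

```python
from math import comb

def hypergeom_tail(total: int, marked: int, drawn: int, observed: int) -> float:
    """P(X >= observed) for a hypergeometric draw.

    `drawn` items are taken without replacement from `total` items,
    of which `marked` carry the property of interest.
    """
    upper = min(marked, drawn)
    favourable = sum(
        comb(marked, x) * comb(total - marked, drawn - x)
        for x in range(observed, upper + 1)
    )
    return favourable / comb(total, drawn)

# Enrichment of high-risk types in affinity class B (counts from Figure 5).
p_value = hypergeom_tail(total=72, marked=19, drawn=20, observed=11)
print(f"one-sided hypergeometric P = {p_value:.2e}")
```

A depletion (minus sign in the table) would use the opposite tail, P(X <= observed), computed in the same way.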