Ping Chen, Yun Gan, Na Han, Wei Fang, Jiafu Li, Fei Zhao, Kanghong Hu, Simon Rayner.
Abstract
INTRODUCTION: The Hepatitis B Virus (HBV) genome contains four ORFs: S (surface), P (polymerase), C (core) and X. S is completely overlapped by P, and as a consequence the overlapping region is subject to distinctive evolutionary constraints compared to the remainder of the genome. Specifically, a non-synonymous substitution in one coding frame may produce a synonymous substitution in the alternative frame, suggesting a possible conflict between the requirements for diversifying and purifying forces. To examine how HBV balances these contrasting evolutionary pressures within this region, we investigated the relationship amongst positive selection sites, conserved regions, epitopes and elements of protein structure.
Year: 2013 PMID: 23577084 PMCID: PMC3618453 DOI: 10.1371/journal.pone.0060098
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Map of the Overlapping Region of the S and P Genes.
The line at the top shows a schematic of the major components of the S and P genes. The arrows above mark the location of the overlapping regions of the two genes. The spacer domain in P approximately corresponds to the PreS (PreS1 + PreS2) domain in S, whereas the RT domain in P corresponds to the S domain in S. The plots below show the variation within the overlapping region and the location of specific features for both genes. Row 1 shows entropy plots for S (upper plot) and P (lower plot). The X-axis denotes the codon position (1–389) within the overlapping region. The Y-axis denotes the entropy of each site, with a higher value representing a more variable codon. The location of important regions within each gene is marked above each plot. For S these are PreS, the “a” determinant and four transmembrane regions, TM1 to TM4. For P these are the Spacer and the YMDD motif. In S, the variable residues are mainly located in PreS, the “a” determinant and at the C-terminus, while the transmembrane regions are relatively well conserved. In P, the Spacer domain and the region corresponding to the “a” determinant are highly variable, while the most conserved codons are located within and near the YMDD motif. Row 2 shows the location of highly conserved codons for S (upper plot) and P (lower plot), based on the entropy plots. Row 3 shows the location of predicted secondary structure features (α-helices and β-sheets) based on predicted protein structures for S (upper plot) and P (lower plot). Row 4 shows the location of epitopes within the S protein. Row 5 shows the sites predicted to be under positive selection for S (upper plot) and P (lower plot).
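The per-site entropy plotted in Row 1 can be illustrated with a minimal sketch. It assumes Shannon entropy with a natural logarithm, computed column by column over an alignment. The record does not state the log base or the exact estimator used, and the toy alignment and function names below are hypothetical.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon entropy (natural log) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    # Sum p * ln(1/p) over the observed symbols; a fully conserved column gives 0.0.
    return sum((n / total) * math.log(total / n) for n in counts.values())

def entropy_profile(aligned_seqs):
    """Entropy at every position of a list of equal-length aligned sequences."""
    return [site_entropy(col) for col in zip(*aligned_seqs)]

# Hypothetical toy alignment (four sequences, four sites); not data from the study.
toy_alignment = ["MGQN", "MGQN", "MGHN", "MAHS"]
profile = entropy_profile(toy_alignment)

for position, value in enumerate(profile, start=1):
    print(f"site {position}: entropy = {value:.3f}")
```

Higher values mark more variable sites, so a threshold on this profile is one plausible way to derive the "highly conserved" and "highly variable" classes shown in Row 2.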
Association between conservation, secondary structure, positive selection sites and epitopes, assessed by Fisher’s exact test.
| | OR | p-value |
|---|---|---|
| (A) P protein (conservation vs. secondary structure) | | |
| α-helix | 2.83 | 3.01e-05 |
| β-sheet | 1.94 | 0.09 |
| (B) S protein (positive selection vs. epitopes) | | |
| | Infinite | 0.01 |
| (C) S protein (variation vs. epitopes) | | |
| | 0.47 | 0.02 |
| (D) S protein (conservation vs. secondary structure) | | |
| α-helix | 1.96 | 0.01 |
| β-sheet | 1.06 | NS |
OR: odds ratio; NS: not significant.
Association between (A) conserved sites and secondary structure for the P protein, (B) sites under positive selection and epitopes for the S protein, (C) variation (entropy) and epitopes for the S protein, and (D) conserved sites and secondary structure for the S protein. The odds ratio provides a measure of the association between two specified variables. For example, in (A) and (D) conserved sites have a strong association with α-helices in both the S protein (OR = 1.96, P = 0.01) and the P protein (OR = 2.83, P = 3.01e-05 < 0.01), a weak association with β-sheets in the P protein (OR = 1.94, P = 0.09), and no significant association with β-sheets in the S protein, indicating that the α-helices are highly conserved and the β-sheets can accommodate more variable residues.
Figure 2. Predicted 3D Structures for the S and P Proteins.
A) The predicted 3D model of the HBV RT based on the HIV RT structure, which folds in the classic “right hand” shape with fingers (blue), palm (red) and thumb (green) subdomains. The finger and thumb subdomains are primarily composed of α-helices, whereas the palm region mainly comprises α-helices and β-sheets. B) The predicted 3D model for S. S contains four long α-helices which constitute the transmembrane (TM) regions. These are colored blue (TM1), green (TM2), yellow (TM3) and brown (TM4) respectively. The α-helices are separated by loops, and the “a” determinant is located in the loop between TM2 and TM3. C) The spatial distribution of conserved and variable residues in HBV RT. The highly conserved residues are colored red, the highly variable residues are colored blue, and the remaining residues are colored white. The majority of residues are conserved. Furthermore, the most conserved residues are clustered within and near the YMDD motif (marked as red spheres). D) The spatial distribution of conserved and variable residues in S. Red and blue indicate the most highly conserved and most variable residues respectively; the remaining residues are colored white. The “a” determinant (marked with spheres using the same color scheme to show variability) harbors many B cell epitopes and contains many highly variable sites (blue spheres). Compared to P, the distribution of variable sites in S appears to be more diffuse. E) Schematic of the secondary structure of S. S has four membrane-spanning regions (TM1–TM4). The N-terminus, C-terminus and “a” determinant are located on the outer face of the membrane. Coordinates of the membrane-spanning regions are shown for the inner and outer face. The top coordinate corresponds to the position within the pre-S1; coordinates in parentheses correspond to the position within the small S.
Bivariate logistic regression analysis for association with (A) conservation, or (B) positive selection sites in S protein.
| | Coef. | Std. Err. | OR | p-value | 95% CI |
|---|---|---|---|---|---|
| (A) Conservation (P(> χ2) = 0.008) | | | | | |
| X1 (α-helix+β-sheet) | 0.64 | 0.59 | 1.89 | 0.041 | 1.03–3.47 |
| X2 (epitopes) | 0.49 | 0.43 | 1.64 | 0.058 | 0.98–2.72 |
| (B) Positive selection (P(> χ2) > 0.05) | | | | | |
| X1 (α-helix+β-sheet) | 0.14 | 0.38 | na | 0.71 (NS) | na |
| X2 (epitopes) | 16.24 | 887.44 | na | 0.98 (NS) | na |
Coef.: coefficient; Std. Err.: standard error; OR: odds ratio; CI: confidence interval; NS: not significant; na: not applicable. Logistic regression analysis was carried out with (A) conservation or (B) positive selection as the predicted variable. (A) The estimated coefficients suggest that conservation is significantly associated with structural regions (α-helix+β-sheet) (Coef. = 0.64 with p = 0.041 < 0.05), but has no significant association with epitopes (Coef. = 0.49 with p = 0.058 > 0.05). This result is consistent with the results of Fisher’s exact test. (B) The logistic model with X1 and X2 as predictor variables is not significant because P(> χ2) > 0.05.
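The link between the coefficient, odds ratio and confidence interval columns can be sketched as follows. In logistic regression the odds ratio for a predictor is the exponential of its coefficient, and a 95% confidence interval that excludes 1 corresponds to significance at the 0.05 level. The function names are hypothetical; the numeric inputs are the rounded values reported in panel (A), so the recomputed odds ratios match the reported ones only up to rounding.

```python
import math

def odds_ratio_from_coef(coef):
    """Odds ratio implied by a logistic regression coefficient: exp(coef)."""
    return math.exp(coef)

def ci_excludes_one(lower, upper):
    """True when an odds-ratio confidence interval does not contain 1."""
    return lower > 1.0 or upper < 1.0

# Rounded coefficients as reported in panel (A).
or_structure = odds_ratio_from_coef(0.64)  # reported OR: 1.89
or_epitopes = odds_ratio_from_coef(0.49)   # reported OR: 1.64

print(f"X1 (structure): OR = {or_structure:.2f}")
print(f"X2 (epitopes):  OR = {or_epitopes:.2f}")

# Reported 95% CIs: the interval for X1 excludes 1, the interval for X2 does not.
print(ci_excludes_one(1.03, 3.47))
print(ci_excludes_one(0.98, 2.72))
```

This agrees with the reported p-values: X1 is significant (p = 0.041) and X2 is not (p = 0.058).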