Wei Liu1,2,3, Junhua Li1,2,3, Hongli Du1, Zhihua Ou2,3.
Abstract
Human papillomavirus type 16 (HPV16) is the most prevalent HPV type causing cervical cancers. Herein, using 1597 full genomes, we systematically investigated the mutation profiles, surface protein glycosylation sites and codon usage bias (CUB) of HPV16 across lineages and sublineages. Multiple lineage- or sublineage-conserved mutation sites were identified. Glycosylation analysis showed that, relative to lineage A, HPV16 lineage D had the highest number of differing glycosylation sites in both the L1 and L2 capsid proteins, which might contribute to the antigenic distance between the two lineages. CUB analysis showed that the HPV16 open reading frames (ORFs) preferred codons ending with A/T. The CUB of HPV16 ORFs was mainly affected by natural selection, except for E1, E5 and L2. HPV16 shared only some of its preferred codons with humans, which might help reduce competition for translational resources. These findings increase our understanding of the heterogeneity between HPV16 lineages and sublineages, and of the adaptation mechanism of HPV in human cells. In summary, this study might facilitate HPV classification and improve vaccine development and application.
Keywords: HPV16; codon usage bias; glycosylation; lineage and sublineage; mutation
Year: 2021 PMID: 34209097 PMCID: PMC8310365 DOI: 10.3390/v13071281
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Mutation profiles of HPV16 sublineages.
| ORF | Nucleotide Mutation | Amino Acid Mutation | Proportion of Sequences with the Corresponding Mutations in Each Sublineage (%) (A1, A2, A3, A4, B1, C1, D1, D2, D3, D4) | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E1 | T1220C | V119A | 100 | 1.1 | ||||||||
| | C1415T | T184I | 100 (89) | |||||||||
| | C1598T | P245L | 92.9 (76) | 1.1 | ||||||||
| | A1667G | H268R | 100 | 100 | 100 | 100 | ||||||
| | T2252C | F463S | 0.1 | 91.7 | ||||||||
| | T2253C | F463S | 91.7 | |||||||||
| | T2342C | F493S | 0.1 | 1.2 | 100 | 95.8 | ||||||
| | C2343T | F493S | 100 | 94.7 | ||||||||
| | T2354C | L497P | 0.5 | 100 (100) | ||||||||
| | T2375C | L504P | 0.3 | 2.5 | 3.6 (2.9) | 100 | ||||||
| | C2456T | T531I | 0.1 | 99.0 | ||||||||
| E2 | C3158G | T135R | 100 | |||||||||
| | A3180C | E142D | 14.3 (14.7) | 6 (7.1) | 97.9 | |||||||
| | T3223A | L157I/M a | 100 | 100 | 100 | 100 | ||||||
| | T3383C | I210T | 1.7 | 100 | 100 | |||||||
| | T3386C | I211T | 95.8 | |||||||||
| | G3412A | A220T | 100 | |||||||||
| | G3415A | A221T | 100 | |||||||||
| | G3430A | A226T | 100 (98.2) | 2.9 | ||||||||
| E5 | G3881A | A7T | 100 (100) | |||||||||
| | A4054T | L64S/F b | 100 (100) | 2 (1.8) | ||||||||
| | A4089T | H76L | 97.6 | |||||||||
| E6 | G132T | R10I | 98 (87.5) | |||||||||
| | C143G | Q14D | 92.9 (98.2) | 98 (99.5) | ||||||||
| | T350G | L83V | 47.8 | 21.6 | 3.6 (14.7) | 100 | 100 | 100 | 100 | |||
| E7 | A647G | N29S | 98.8 | 100 (89.3) | ||||||||
| L1 | A6178C | N207T | 41.7 | 14.3 (11.8) | 78 (75) | 8.3 | 5.7 | 100 | ||||
| | T6480C | S308P | 3.6 (2.9) | 100 (100) | ||||||||
| | A6801T | T415S | 97.9 | |||||||||
| L2 | A4967G | T245A | 0.1 | 100 | 97.1 | 98.9 | 100 | |||||
| | A5032T | L266F | 100 | 100 | 100 | 100 | ||||||
| | A5288C | T353P | 100 (89.3) | |||||||||
| | A5288G | T353A | 100 (97.1) | |||||||||
| | T5366G | S379A/V c | 100 | 97.1 | 96.8 | 100 | ||||||
| | T5384G | S385A | 100 | 97.1 | 100 | 100 | ||||||
Note: mutation sites were determined for sublineages with more than 10 sequences, and only mutations occurring in >90% of the sequences in a given sublineage are shown. A blank cell indicates that the sublineage had few or no corresponding mutations, or that it contained fewer than 10 sequences. As multiple sublineages of the B and C lineages contained fewer than 10 strains, the overall mutation frequencies were also calculated for the B and C lineages. The numbers in parentheses indicate the proportion of the mutation in the B or C lineage. a L157I/M: T3223A -> L157I; T3223A and A3224G -> L157M. b L64S/F: A4054T -> L64F; A4054T and T4053C -> L64S. c S379A/V: T5366G -> S379A; T5366G and C5367T -> S379V.
Figure 1. Mutation distribution across the HPV16 genome. The x axis shows HPV16 gene positions, and the y axis shows the 12 nucleotide mutation patterns. The bubble size indicates the occurrence of nucleotide mutations.
Figure 2. The lineage distribution of potential glycosylation sites in L1 and L2 proteins.
Figure 3. ENC plot of the eight ORFs of HPV16. The continuous curve plots the relationship between GC3 and ENC in the absence of selection. The horizontal dotted line represents the ENC value of 35. Almost all points lie below the curve.
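The continuous curve in an ENC plot is conventionally Wright's expected ENC as a function of GC content at synonymous third codon positions (GC3s). The record above does not state the exact formula used, so the following is a minimal sketch under that assumption; the function names `expected_enc` and `lies_below_curve` and the example point are hypothetical.

```python
def expected_enc(gc3s: float) -> float:
    """Expected ENC when codon usage is shaped only by GC3s (Wright's curve).

    gc3s is a fraction in [0, 1], not a percentage.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def lies_below_curve(gc3s: float, observed_enc: float) -> bool:
    """True when an observed ENC falls below the no-selection expectation."""
    return observed_enc < expected_enc(gc3s)


if __name__ == "__main__":
    # Hypothetical point, not taken from the study.
    print(lies_below_curve(0.30, 45.0))
```

A point below the curve has a stronger codon bias than its GC3s alone would predict, which is the usual basis for inferring that factors other than mutational composition are at work.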
Figure 4. Neutrality plot analysis of GC12 and GC3 for HPV16 ORFs.
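A neutrality plot regresses GC12 (mean GC content of the first and second codon positions) on GC3 (GC content of the third position) across sequences. The sketch below shows one way to compute both quantities and the regression slope; the helper names `gc_by_position` and `ols_slope` are hypothetical, and the conventional reading of the slope (near 0 suggests selection dominates, near 1 suggests mutation pressure dominates) is an assumption rather than something stated in this record.

```python
from typing import List, Tuple


def gc_by_position(cds: str) -> Tuple[float, float]:
    """Return (GC12, GC3) as fractions for an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    gc = [0, 0, 0]
    total = [0, 0, 0]
    for i in range(usable):
        base = cds[i]
        if base in "ACGT":  # skip ambiguous bases such as N
            total[i % 3] += 1
            if base in "GC":
                gc[i % 3] += 1
    frac = [g / t if t else 0.0 for g, t in zip(gc, total)]
    return (frac[0] + frac[1]) / 2.0, frac[2]


def ols_slope(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares (slope, intercept) of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In use, `gc_by_position` would be applied to every sequence of one ORF, and `ols_slope` called with the GC3 values as `xs` and the GC12 values as `ys`.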
Figure 5. Relative synonymous codon usage (RSCU) analysis revealed over-representation of codons ending in A/T in HPV16 ORFs. Columns correspond to the 59 codons (the three stop codons and the codons for Trp and Met were excluded). Rows correspond to the eight ORFs. Blue cells indicate under-represented codons (RSCU < 0.6) and red cells indicate over-represented codons (RSCU > 1.6). “X3s” indicates the nucleotide at the 3rd codon position.
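RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that standard definition over the same 59 codons, using the standard genetic code; the names `rscu` and `FAMILIES` are hypothetical, and this is not necessarily the software the authors used.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families, excluding stop codons, Met and Trp (59 codons remain).
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES[aa].append(codon)


def rscu(cds: str) -> Dict[str, float]:
    """RSCU per codon for an in-frame coding sequence.

    Families whose amino acid never occurs are omitted from the result.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codons in FAMILIES.values():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / family_total
    return values
```

With the thresholds given in the caption, a codon would be flagged as over-represented when its value exceeds 1.6 and under-represented when it falls below 0.6.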
Figure 6. Pairwise correlation analysis of RSCU for 59 codons in eight HPV16 ORFs versus those of humans. The R-squared values of the linear regression analysis are shown. The embedded table gives the number of codons commonly preferred (RSCU ≥ 0.6) and commonly unpreferred (RSCU < 0.6) in HPV16 and human genes, the number of codons preferred in humans but unpreferred in HPV16, and the number preferred in HPV16 but unpreferred in humans.