Behzad Dehghani1, Zahra Hasanshahi1, Tayebeh Hashempour1, Mohamad Motamedifar1,2.
Abstract
The Human Papillomavirus (HPV) genome encodes several proteins, including L1, the major capsid protein, and L2, the minor capsid protein. Among all HPV types, HPV-16 and HPV-18 are the most common high-risk HPV (HR-HPV) types globally, and the majority of cases are infected with these types. HPV entry and the initial interaction with the host cell depend mainly on the L1 protein, which is the main component of HPV vaccines. The aim of this research was a comparative analysis of all Iranian L1 protein sequences submitted to NCBI GenBank, to identify the major substitutions as well as the structural and immunological properties of this protein. All HPV L1 protein sequences from Iranian isolates from 2014 to 2016 were selected and obtained from the NCBI database. "CLC Genomics Workbench" was used to translate and align the sequences. Several programs were employed to predict B cell epitopes. Modification sites such as phosphorylation, glycosylation, and disulfide bonds were determined. Secondary and tertiary structures of all sequences were analyzed. Several mutations were found, and the major mutations were at amino acid residues 102, 202, 207, 292, 379, and 502. These mutations had only a minor effect on the B cell epitope and physicochemical properties of the L1 protein. Six disulfide bonds were predicted in the L1 protein, along with several N-linked glycosylation and phosphorylation sites. Five L1 loops were identified, which had great potential to be B cell epitopes with high antigenic properties. Overall, this research, the first such report from Iran, describes the high potential of two L1 loops (BC and FG) to induce an immune response; these loops could serve as decent candidates for designing a new vaccine against HPV in the Iranian population. In addition, some differences between the reference sequence and the Iranian patients' sequences were determined. It is essential to consider these differences when monitoring the effectiveness and efficacy of the vaccine in the Iranian population.
Our results provide a broad understanding of the L1 protein that can be useful for further studies on HPV infections and new vaccine generations. © Institute of Molecular Biology, Slovak Academy of Sciences 2019.
Keywords: Bioinformatics; HPV; L1
Year: 2019 PMID: 32435064 PMCID: PMC7223900 DOI: 10.2478/s11756-019-00386-w
Source DB: PubMed Journal: Biologia (Bratisl) ISSN: 0006-3088 Impact factor: 1.350
The software used in this study and related URLs
| No. | Software | Function | URL |
|---|---|---|---|
| 1 | Signal-BLAST | Signal peptide prediction | |
| 2 | predisi | Signal peptide prediction | |
| 3 | SignalP | Signal peptide prediction | |
| 4 | ProtParam | Physicochemical properties | |
| 5 | DiANNA | Disulfide bonds prediction | |
| 6 | SCRATCH | Disulfide bonds prediction | |
| 7 | NetPhosK | Phosphorylation sites prediction | |
| 8 | DISPHOS | Phosphorylation sites prediction | |
| 9 | NetPhos | Phosphorylation sites prediction | |
| 10 | NetNGlyc | N-glycosylation sites prediction | |
| 11 | GlycoEP | N-glycosylation sites prediction | |
| 12 | SOPMA | Secondary structure prediction | |
| 13 | Phyre2 | Secondary structure prediction | |
| 14 | I-TASSER | Tertiary structure prediction | |
| 15 | Phyre2 | Tertiary structure prediction | |
| 16 | (PS)2-v2 | Tertiary structure prediction | |
| 17 | Qmean | Tertiary structure qualification | |
| 18 | RAMPAGE | Ramachandran plot analysis | |
| 19 | immuneepitope | Immuno-informatic analysis | |
| 20 | BcePred | Immuno-informatic analysis | |
| 21 | ABCpred | Immuno-informatic analysis | |
| 22 | Bepipred | Immuno-informatic analysis | |
| 23 | AlgPred | Immuno-informatic analysis | |
| 24 | VaxiJen | Immuno-informatic analysis | |
| 25 | IEDB | Immuno-informatic analysis | |
Summary of all mutations (all sequences harbored a substitution in amino acid 60) and the sequence groups selected on the basis of the most prevalent mutations
| Mutation | Frequency |
|---|---|
| Q 2 E | 1 (1.6%) |
| L 39 S | 1 (1.6%) |
| H 102 Y | 40 (66.6%) |
| T 202 N | 40 (66.6%) |
| N 207 T | 1 (1.6%) |
| V 220 I | 1 (1.6%) |
| H 228 D | 60 (100%) |
| T 292 A | 53 (88.3%) |
| A 310 V | 1 (1.6%) |
| T 379 P | 40 (66.6%) |
| T 424 S | 1 (1.6%) |
| L 502 F | 40 (66.6%) |
| K 514 N | 1 (1.6%) |

| Group | Mutations for each group |
|---|---|
| 1 | 102, 202, 207, 228, 292, 379, and 502 |
| 2 | 228 and 292 |
| 3 | 228 |
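The mutation labels and frequencies in this table can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' pipeline: the function names are hypothetical, the total of 60 sequences is inferred from the H 228 D row (60 sequences, 100%), and the toy peptide in the usage note is not an L1 sequence.

```python
import re

# Assumption: 60 sequences in total, inferred from the H 228 D row (60 = 100%).
TOTAL_SEQUENCES = 60


def parse_mutation(label):
    """Split a label such as 'H 102 Y' into (reference residue, position, variant residue)."""
    match = re.fullmatch(r"\s*([A-Z])\s*(\d+)\s*([A-Z])\s*", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def frequency_percent(count, total=TOTAL_SEQUENCES):
    """Share of sequences carrying a mutation, as a percentage."""
    return 100.0 * count / total


def apply_mutations(sequence, labels):
    """Return a copy of `sequence` with each labeled substitution applied (1-based positions)."""
    residues = list(sequence)
    for label in labels:
        ref, pos, alt = parse_mutation(label)
        if residues[pos - 1] != ref:
            raise ValueError(f"{label}: expected {ref} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = alt
    return "".join(residues)
```

For example, `apply_mutations("MHT", ["H 2 Y"])` substitutes the second residue of a toy peptide. Note that `frequency_percent(40)` rounds to 66.7, whereas the table reports 66.6%, which suggests the published percentages were truncated rather than rounded.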
Physicochemical properties of L1 protein in selected sequences and reference sequence
| ProtParam parameter | Reference (K02718.1) | 1 | 2 | 3 |
|---|---|---|---|---|
| Molecular weight (Da) | 59,554.02 | 59,542.99 | 59,473.93 | 59,503.95 |
| pI | 8.27 | 8.26 | 8.27 | 8.27 |
| Estimated half-life (mammalian reticulocytes, in vitro) | 30 h | 30 h | 30 h | 30 h |
| Estimated half-life (yeast, in vivo) | >20 h | >20 h | >20 h | >20 h |
| Estimated half-life (Escherichia coli, in vivo) | >10 h | >10 h | >10 h | >10 h |
| Instability index | 36.11 (stable) | 35.87 (stable) | 36.19 (stable) | 36.19 (stable) |
| GRAVY | −0.298 | −0.294 | −0.289 | −0.294 |
| Aliphatic index | 76.53 | 75.99 | 76.72 | 76.53 |
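Two of the indices in this table follow simple closed-form definitions: GRAVY is the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The sketch below illustrates those two definitions only; the function names are hypothetical, it does not reproduce ProtParam's other outputs, and it is not the code used in the study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), X in mole percent."""
    if not sequence:
        raise ValueError("sequence must not be empty")

    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / len(sequence)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))
```

Because both indices are averages over the whole sequence, a handful of point substitutions in a protein of several hundred residues can shift them only slightly, which is consistent with the near-identical values across the reference and groups 1 to 3.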
Disulfide bonds were computed by DiANNA and SCRATCH with several similar positions among the sequences
| disulfide bonds | Reference | 1 | 2 | 3 |
|---|---|---|---|---|
| 13–13 | ||||
| 13–211 | ||||
| 13–371 | ||||
| 128–211 | ||||
| 172–405 | ||||
| 201–453 | + | – | – | – |
| 211–453 | + | – | – | – |
| 211–454 | – | + | + | + |
| 255–371 | ||||
| 405–453 | + | – | – | – |
| 350–454 | – | + | + | + |
| 187–454 | – | + | + | + |
| 350–371 | – | + | – | – |
+: the bond was predicted in the sequence. –: the bond was not found in the sequence
Phosphorylation sites found by "NetPhosK", "DISPHOS", and "NetPhos"; several serine, threonine, and tyrosine residues were predicted
| Sequence | Serine | Threonine | Tyrosine |
|---|---|---|---|
| Reference | 49, 115, 244, 308, 324, 369, 375, 422, 518, 519, 521 | 36, 65, 121, 292, 320, 362, 380, 439, 443, 544, 516, 523, 507, 517, 520, 522 | 53, 161, 260, 268, 302, 381, 444 |
| 1 | + | + and 376, 514 | + |
| 2 | + | + and 507, 514 | + |
| 3 | + | + and 229, 507, 514 | + |
+: all positions listed for the reference were also predicted in the sequence
Secondary structure prediction results for the L1 protein; random coil makes up the largest share of the L1 protein structure
| Secondary structure | Reference | 1 | 2 | 3 |
|---|---|---|---|---|
| Alpha helix | 19.77% | 19.59% | 19.96% | 19.40% |
| Extended strand | 28.25% | 28.81% | 28.63% | 28.63% |
| Beta turn | 9.42% | 8.85% | 9.04% | 9.42% |
| Random coil | 42.56% | 42.75% | 42.37% | 42.56% |
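Percentages of this kind are derived from a per-residue state string. The sketch below shows one way to compute them, assuming a four-state one-letter encoding (`h` helix, `e` extended strand, `t` turn, `c` coil); the encoding, the function name, and the toy input are assumptions for illustration, not output from the study.

```python
from collections import Counter

# Assumed four-state, one-letter encoding of per-residue predictions.
STATE_NAMES = {
    "h": "Alpha helix",
    "e": "Extended strand",
    "t": "Beta turn",
    "c": "Random coil",
}


def composition(states):
    """Return the percentage of residues assigned to each secondary structure state."""
    if not states:
        raise ValueError("state string must not be empty")
    counts = Counter(states.lower())
    unknown = set(counts) - set(STATE_NAMES)
    if unknown:
        raise ValueError(f"unexpected state codes: {sorted(unknown)}")
    total = len(states)
    # Counter returns 0 for states that never occur, so every state gets an entry.
    return {name: 100.0 * counts[code] / total for code, name in STATE_NAMES.items()}
```

Calling `composition("hhhheecc")` on this toy string assigns half of the residues to alpha helix and a quarter each to extended strand and random coil. The four percentages always sum to 100, which is a quick sanity check on each column of the table.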
Fig. 1 HPV L1 3D model structure; yellow: the 5 identified loops; red: α-helices
Ramachandran plot and Qmean results for the selected and reference sequences
| Sequence | Phyre2: Ramachandran (a) | Phyre2: Qmean | (PS)2-v2: Ramachandran (a) | (PS)2-v2: Qmean | I-TASSER: Ramachandran (a) | I-TASSER: Qmean |
|---|---|---|---|---|---|---|
| Ref | 86%, 10.2% | −7.34 | 95.4%, 4.2% | −3.73 | 79.2%, 14.4% | −8.79 |
| 1 | 85.5%, 10.3% | −7.6 | 95.1%, 4.4% | −3.51 | 80.3%, 14.0% | −7.71 |
| 2 | 85.7%, 10.1% | −7.57 | 95.8%, 4.0% | −3.98 | 81.1%, 14.0% | −6.77 |
| 3 | 85.7%, 10.1% | −7.57 | 95.6%, 4.0% | −3.72 | 81.9%, 14.0% | −6.12 |

(a) Percentage of residues in favored and allowed regions
Fig. 2 Coverage of the predicted tertiary structures by three software tools. The coverage was 100% for "I-TASSER", 90% for "Phyre2", and around 85% for "(PS)2-v2"
Five identified loops on the L1 protein and the regions related to each loop. Codons which were mutated in selected sequences were bolded
| Region | Loop |
|---|---|
| 76-FPIKKPNNNKILVPA-89 | BC or A |
| 157-NASAYAANAGVDNR-170 | DE or B |
| 198-GSPC | EF or C |
| 292- | FG or D |
| 374-ISTSE | H1 or E |
Fig. 3 Propensity scale plots of L1: flexibility, hydrophilicity, and surface accessibility. The horizontal red line is the threshold. Yellow regions above the threshold indicate favorable regions consisting of higher-scoring residues
Comparison of all identified loops with respect to hydrophilicity, flexibility, surface accessibility, and B cell epitope prediction
| Loops | Surface accessibility | Flexibility | Hydrophilicity | B-cell epitope | Final selection |
|---|---|---|---|---|---|
| BC | * | * | * | * | |
| DE | * | * | * | ||
| EF | * | * | |||
| FG | * | * | * | * | |
| H1 | * | * |
Finally, two loops, BC and FG, were selected as the most capable regions. *: indicates the more capable loop for each parameter