Abstract
In this study, we used the probabilistic models that we have developed over the last several years to analyze 158 coronavirus proteins and determine which protein is most vulnerable to mutations. The results provide three lines of evidence that the spike glycoprotein differs from the other coronavirus proteins: (1) it is more sensitive to mutations (its current state); (2) it has undergone more mutations in the past (its history); and (3) it has a greater potential for future mutations (its future). Furthermore, this study gives a clue to species susceptibility with respect to the different proteins.
Year: 2004 PMID: 15592899 PMCID: PMC7088192 DOI: 10.1007/s00894-004-0210-0
Source DB: PubMed Journal: J Mol Model ISSN: 0948-5023 Impact factor: 1.810
Fig. 1 Predictable and unpredictable portions in coronavirus proteins. The data are presented as median with interquartile range. * The predictable and unpredictable portions in the spike glycoprotein group are statistically different from those in every other protein group at the p<0.05 level, except for the hemagglutinin-esterase precursor group. # The predictable and unpredictable portions in the spike glycoprotein group are statistically different from those in the hemagglutinin-esterase precursor, membrane protein and nucleocapsid protein groups at the p<0.05 level. † The predictable and unpredictable portions in the spike glycoprotein group are statistically different from those in the hemagglutinin-esterase precursor and membrane protein groups at the p<0.05 level.
Fig. 2 Percentage of unpredictable types and frequencies, according to whether the actual value is larger or smaller than the predicted value, in coronavirus proteins. The data are presented as mean ± SD. * The percentages of unpredictable types/frequencies in the spike glycoprotein group are statistically different from those in the other protein groups at the p<0.05 level. # The percentages of unpredictable types in the spike glycoprotein group are statistically different from those in every other protein group at the p<0.05 level, except for the hemagglutinin-esterase precursor and nucleocapsid protein groups.
Unpredictable absent amino-acid pairs that disappear from a group of proteins
| Hemagglutinin-esterase precursor | Spike glycoprotein |
|---|---|
| RA, RD, NQ, DR, CA, CS, QF, IK, LK, FA, FC, FQ, FP, VK | WI |
Fig. 3 Magnitude of the difference between actual and predicted values in coronavirus proteins. The data are presented as mean ± SD. * The difference between actual and predicted values in the spike glycoprotein group is statistically different from that in every other protein group at the p<0.05 level. # The difference between actual and predicted values in the spike glycoprotein group is statistically different from that in the other protein groups at the p<0.05 level, except for the envelope protein group.
Fig. 4 Number of amino-acid pairs in envelope proteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.
Fig. 5 Number of amino-acid pairs in hemagglutinin-esterase precursor proteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.
Fig. 6 Number of amino-acid pairs in membrane glycoproteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.
Fig. 7 Number of amino-acid pairs in nonstructural proteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.
Fig. 8 Number of amino-acid pairs in nucleocapsid proteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.
Fig. 9 Number of amino-acid pairs in spike glycoproteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.
Fig. 10 Number of amino-acid pairs in other proteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.
Characteristics of the proteins that we have studied in the past
| Protein | I | II | III | IV | V | VI | VII | VIII | IX | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| BTK | 62.46 | 71.88 | 36.25 | 12.77 | −1.26 | −1.30 | 1.46 | 1.72 | 112 | [ |
| CA54 | 73.75 | 93.47 | 36.50 | 20.31 | −3.86 | −10.96 | 4.68 | 41.52 | 151 | [ |
| FA9 | 62.35 | 72.83 | 32.00 | 9.35 | −1.13 | −1.09 | 1.37 | 1.60 | 99 | [ |
| GLCM | 59.77 | 71.59 | 37.25 | 14.39 | −1.12 | −1.12 | 1.51 | 1.93 | 109 | [ |
| HBA | 61.62 | 68.57 | 10.75 | 4.29 | −1.02 | −1.00 | 1.19 | 1.49 | 133 | [ |
| LDLR | 69.61 | 80.21 | 40.50 | 18.74 | −1.43 | −1.32 | 1.86 | 2.43 | 127 | [ |
| Human p53 | 57.14 | 68.37 | 30.75 | 5.87 | −1.15 | −1.13 | 1.45 | 1.84 | 190 | [ |
| PH4H | 59.83 | 71.84 | 28.50 | 7.10 | −1.16 | −1.06 | 1.39 | 1.62 | 187 | [ |
| VHL | 72.46 | 78.30 | 18.00 | 9.91 | −1.07 | −1.05 | 1.24 | 1.634 | 109 | [ |
| RUN1 | 64.22 | 75.22 | | | | | | | 6 | [ |
| ADHA | 55.98 | 64.61 | | | | | | | | [ |
| CTGF | 58.46 | 70.40 | | | | | | | | [ |
| GSHR | 57.70 | 68.71 | | | | | | | 1 | [ |
| AOFB | 62.40 | 73.94 | | | | | | | | [ |
| LIS1 | 56.76 | 71.32 | | | | | | | 5 | [ |
| TNFA | 59.24 | 69.40 | | | | | | | | [ |
| TYRO | 45.45 | 58.14 | | | | | | | 64 | [ |
| ATTY | 53.36 | 67.55 | | | | | | | 1 | [ |
| Bovin p53 | 62.44 | 71.95 | | | | | | | | [ |
| Mouse p53 | 60.85 | 74.29 | | | | | | | 3 | [ |
| Sheep p53 | 60.19 | 70.34 | | | | | | | | [ |
| AMPC | 54.63 | 66.32 | | | | | | | 9 | [ |
| DOPO | 61.13 | 73.75 | | | | | | | 8 | [ |
Proteins: BTK, human Bruton’s tyrosine kinase; CA54, human collagen α5(IV) chain precursor; FA9, human coagulation factor IX precursor; GLCM, human β-glucocerebrosidase; HBA, haemoglobin α chain; LDLR, human low-density lipoprotein receptor; PH4H, human phenylalanine hydroxylase protein; VHL, Von Hippel-Lindau disease tumor suppressor; RUN1, human acute myeloid leukemia 1 protein; ADHA, human alcohol dehydrogenase α-chain; CTGF, human connective tissue growth factor; GSHR, human glutathione reductase; AOFB, human monoamine oxidase B; LIS1, human platelet-activating factor acetylhydrolase α-subunit; TNFA, human tumor necrosis factor; TYRO, human tyrosinase; ATTY, human tyrosine aminotransferase; AMPC_CITFR, Citrobacter freundii β-lactamase; DOPO, human dopamine β-hydroxylase.
Columns: I, percent of unpredictable portion of present types; II, percent of unpredictable portion of present frequencies; III, percent of unpredictable present types whose actual values are smaller than predicted values; IV, percent of unpredictable present frequencies whose actual values are smaller than predicted values; V, difference between actual and predicted values in unpredictably present types whose actual values are smaller than predicted values; VI, difference between actual and predicted values in unpredictably present frequencies whose actual values are smaller than predicted values; VII, difference between actual and predicted values in unpredictably present types whose actual values are larger than predicted values; VIII, difference between actual and predicted values in unpredictably present frequencies whose actual values are larger than predicted values; IX, number of mutations.
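The column definitions above contrast "actual" and "predicted" values for amino-acid pairs, but this record does not spell out the underlying model. As a rough illustration only, the Python sketch below assumes a random-permutation null model: the predicted count of an adjacent pair is its expected count if the residues of the sequence were shuffled at random, and a pair is called "predictable" when its actual count equals that expectation rounded half up. The function names, the rounding rule, and the null model itself are assumptions made for this example and may differ from the authors' probabilistic models.

```python
import math
from collections import Counter


def actual_pair_counts(seq):
    """Count overlapping adjacent amino-acid pairs in a one-letter sequence."""
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def predicted_pair_count(seq, pair):
    """Expected count of `pair` if the residues of `seq` were shuffled at random.

    ASSUMPTION: random-permutation null model, not taken from the source.
    """
    n = len(seq)
    comp = Counter(seq)
    first, second = pair
    if first == second:
        # Same residue twice: the second draw has one fewer copy available.
        return comp[first] * (comp[first] - 1) / n
    return comp[first] * comp[second] / n


def summarize(seq):
    """Return (% unpredictable pair types, % unpredictable pair occurrences).

    ASSUMPTION: a pair is "predictable" when its actual count equals the
    predicted count rounded half up; only pairs present in `seq` are counted.
    """
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    actual = actual_pair_counts(seq)
    unpredictable = {
        pair: count
        for pair, count in actual.items()
        if count != math.floor(predicted_pair_count(seq, pair) + 0.5)
    }
    pct_types = 100 * len(unpredictable) / len(actual)
    pct_freq = 100 * sum(unpredictable.values()) / sum(actual.values())
    return pct_types, pct_freq
```

For the toy sequence `"ABABAB"`, `summarize` returns `(50.0, 60.0)`: the pair AB occurs three times against a rounded expectation of two, while BA matches its expectation. Under these assumptions the two numbers play roughly the role of columns I and II, restricted to pairs that are present in the sequence.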