Abstract
Comparison of 303,250 human SARS-CoV-2 spike protein sequences with the Wuhan-Hu-1 reference sequence showed that ∼96.5% of the positions in the spike protein sequence have mutated at least once since the outbreak of the COVID-19 pandemic, which was first reported in December 2019. A total of 1,269,629 mutations were detected, corresponding to 1,229 distinct mutation sites in the spike protein, which comprises 1,273 amino acid residues. Thus, ∼3.5% of the human SARS-CoV-2 spike protein sequence has remained invariant over the past two years. Because different mutations can occur at the same mutation site, a total of 4,729 distinct mutations were observed and are catalogued in the present work. The WHO and CDC (U.S.A.) classifications and definitions for the current variants being monitored (VBM) and variants of concern (VOC) are assigned to the SARS-CoV-2 spike protein mutations identified in the present work, along with a list of other amino acid substitutions observed for these variants. All 195 amino acid residues in the receptor binding domain (Thr333-Pro527) were associated with mutations, including Lys417, Tyr449, Tyr453, Ala475, Asn487, Thr500, Asn501 and Gly502, which interact with the ACE-2 receptor at distances ≤3.2 Å in the crystal structure of the complex available in the Protein Data Bank (PDB code: 6LZG). However, not all of these residues were mutated in the same spike protein. In particular, Gly502, which was mutated in only two spike protein sequences, and Tyr449, which was mutated in only seven of the sequences analysed, constitute potential sites for the design of suitable inhibitors/drugs. Further, forty-four invariant residues were observed, corresponding to ten domains/regions of the SARS-CoV-2 spike protein; some of these residues are exposed on the protein surface and may serve as epitope targets for developing monoclonal antibodies.
Keywords: Drug design sites; Epitope sites; Human SARS-CoV-2 mutations; Invariant sites; Mutation propensity
Year: 2022 PMID: 35156058 PMCID: PMC8824715 DOI: 10.1016/j.crstbi.2022.01.002
Source DB: PubMed Journal: Curr Res Struct Biol ISSN: 2665-928X
Distribution of total numbers of distinct mutation sites and mutations in domains/regions of the human SARS-CoV-2 spike proteins relative to the reference protein sequence.
| Domains/Regions | Number of amino acid residues in domains/regions | Total number of distinct mutation sites in domains/regions | Total number of observed mutations |
|---|---|---|---|
| S1A (NTD) (1-302) | 302 | 302 | 422247 |
| S1A-S1B linker (303-332) | 30 | 30 | 2194 |
| S1B (RBD) (333-527) | 195 | 195 | 246622 |
| S1B – S1C linker (528-533) | 6 | 6 | 190 |
| S1C domain (534-589) | 56 | 56 | 5732 |
| S1C – S1D linker (590-593) | 4 | 4 | 85 |
| S1D domain (594-674) | 81 | 81 | 322034 |
| Protease cleavage site (675-692) | 18 | 18 | 87140 |
| S1–S2 subunits linker (693-710) | 18 | 18 | 21631 |
| Central β-strand (711-737) | 27 | 26 | 19294 |
| Downward helix (738-782) | 45 | 41 | 5676 |
| S2′ cleavage site (783-815) | 33 | 32 | 3125 |
| Fusion peptide (816-828) | 13 | 13 | 357 |
| Connecting region (829-911) | 83 | 79 | 6745 |
| Heptad repeat region (912-983) | 72 | 62 | 53655 |
| Central helix (984-1034) | 51 | 40 | 21907 |
| β-hairpin (1035–1068) | 34 | 30 | 1225 |
| β-sheet domain (1069-1133) | 65 | 63 | 11775 |
| Heptad repeat region (1134-1213) | 80 | 75 | 26561 |
| Transmembrane region (1214-1236) | 23 | 23 | 3971 |
| Cytoplasmic region (1237-1273) | 37 | 35 | 7463 |
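The per-domain counts in the table above can be cross-checked against the totals quoted in the abstract. The following is a minimal sketch, not the authors' pipeline: the tuple layout and the names `DOMAINS` and `summarize` are assumptions introduced for illustration, and the numbers are transcribed from the table.

```python
# (domain, residues, distinct mutation sites, observed mutations),
# transcribed from the table above. Names here are illustrative only.
DOMAINS = [
    ("S1A (NTD)", 302, 302, 422247),
    ("S1A-S1B linker", 30, 30, 2194),
    ("S1B (RBD)", 195, 195, 246622),
    ("S1B-S1C linker", 6, 6, 190),
    ("S1C domain", 56, 56, 5732),
    ("S1C-S1D linker", 4, 4, 85),
    ("S1D domain", 81, 81, 322034),
    ("Protease cleavage site", 18, 18, 87140),
    ("S1-S2 subunits linker", 18, 18, 21631),
    ("Central beta-strand", 27, 26, 19294),
    ("Downward helix", 45, 41, 5676),
    ("S2' cleavage site", 33, 32, 3125),
    ("Fusion peptide", 13, 13, 357),
    ("Connecting region", 83, 79, 6745),
    ("Heptad repeat region (912-983)", 72, 62, 53655),
    ("Central helix", 51, 40, 21907),
    ("beta-hairpin", 34, 30, 1225),
    ("beta-sheet domain", 65, 63, 11775),
    ("Heptad repeat region (1134-1213)", 80, 75, 26561),
    ("Transmembrane region", 23, 23, 3971),
    ("Cytoplasmic region", 37, 35, 7463),
]


def summarize(rows):
    """Sum the residue, mutation-site and mutation columns over all domains."""
    residues = sum(row[1] for row in rows)
    sites = sum(row[2] for row in rows)
    mutations = sum(row[3] for row in rows)
    return residues, sites, mutations


residues, sites, mutations = summarize(DOMAINS)
invariant = residues - sites               # residues never observed to mutate
mutated_pct = 100 * sites / residues       # share of positions with >= 1 mutation
invariant_pct = 100 * invariant / residues

print(f"{sites} of {residues} positions mutated ({mutated_pct:.1f}%)")
print(f"{invariant} invariant positions ({invariant_pct:.1f}%)")
print(f"{mutations} observed mutations in total")
```

Under this reading, the column sums reproduce the 1,273 residues, 1,229 distinct mutation sites and 1,269,629 mutations reported in the abstract, and the difference of 44 positions corresponds to the invariant residues.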
Fig. 1. A. Distribution of the total number of mutations at the mutation sites in human SARS-CoV-2 spike protein sequence. B. Top 26 mutation sites in human SARS-CoV-2 proteins arranged in decreasing order of the total number of observed mutations.
The WHO classification of variants along with the other amino acid substitutions observed [mentioned within square brackets] represented among the 303,250 human SARS-CoV-2 spike protein sequences.
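| Variant-defining substitution | (WHO variant label) [other substitutions observed at the same site] |
|---|---|
| L5F | (Iota) [L5I, L5V, L5J, L5G] |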
| S13I | (Epsilon) [S13Q, S13T, S13G, S13N, S13R, S13C] |
| L18F | (Gamma) [L18R, L18I, L18K, L18V, L18N, L18T, L18P] |
| T19R | (Delta) [T19I, T19L, T19K, T19S, T19A] |
| T20N | (Gamma) [T20I, T20R, T20S, T20A, T20P, T20F] |
| P26S | (Gamma) [P26L, P26H, P26A, P26F, P26R, P26Y, P26T] |
| A67V | (Eta) [A67S, A67T, A67I, A67G, A67P, A67H, A67D] |
| D80G | (Iota) [D80Y, D80A, D80F, D80H, D80N, D80C, D80W, D80P, D80B, D80R, D80E] |
| D80A | (Beta) [D80Y, D80G, D80F, D80H, D80N, D80C, D80W, D80P, D80B, D80R, D80E] |
| T95I | (Iota, Kappa, Delta) [T95E, T95A, T95N, T95S, T95K, T95P] |
| D138Y | (Gamma) [D138H, D138C, D138B, D138G, D138F, D138P, D138N, D138A, D138V] |
| G142D | (Kappa, Delta) [G142Y, G142F, G142S, G142V, G142A, G142C, G142L] |
| W152C | (Epsilon) [W152L, W152S, W152R, W152K, W152F] |
| E154K | (Kappa) [E154A, E154W, E154F, E154Q, E154D, E154G, E154V, E154S] |
| F157S | (Iota) [F157C, F157Y, F157L, F157V, F157I] |
| R158G | (Delta) [R158S, R158Y, R158K, R158L, R158I] |
| R190S | (Gamma) [R190V, R190M, R190F, R190K, R190W, R190N, R190L, R190G, R190I] |
| D215G | (Beta) [D215Y, D215A, D215H, D215P, D215N, D215R, D215E, D215V] |
| D253G | (Iota) [D253Y, D253N, D253S, D253A, D253V, D253H] |
| K417T | (Gamma) [K417N, K417R, K417A, K417E, K417M] |
| K417N | (Beta, Delta) [K417T, K417R, K417A, K417E, K417M] |
| L452R | (Epsilon, Iota, Kappa, Delta) [L452M, L452Q, L452W, L452P] |
| S477N | (Iota) [S477I, S477G, S477R, S477K, S477P, S477T, S477B] |
| T478K | (Delta) [T478R, T478C, T478I, T478A] |
| E484K | (Alpha, Beta, Gamma, Eta, Iota, Zeta, Mu) [E484Q, E484Z, E484G, E484A, E484D, E484F, E484R, E484V, E484S] |
| E484Q | (Kappa) [E484K, E484Z, E484G, E484A, E484D, E484F, E484R, E484V, E484S] |
| S494P | (Alpha) [S494G, S494L, S494R, S494T, S494A, S494Q] |
| N501Y | (Alpha, Beta, Gamma, Mu) [N501T, N501S, N501V, N501I, N501H, N501R, N501K] |
| A570D | (Alpha) [A570V, A570S, A570T, A570G] |
| D614G | (Alpha, Beta, Gamma, Epsilon, Eta, Iota, Kappa, Zeta, Mu, Delta) [D614N, D614S, D614A] |
| H655Y | (Gamma) [H655N, H655P, H655R, H655L] |
| Q677H | (Eta) [Q677P, Q677R, Q677E, Q677Y, Q677S, Q677L, Q677K] |
| P681H | (Alpha, Mu) [P681R, P681L, P681Y, P681S] |
| P681R | (Kappa, Delta) [P681H, P681L, P681Y, P681S] |
| A701V | (Beta, Iota) [A701T, A701S, A701E, A701I] |
| T716I | (Alpha) [T716P, T716S] |
| T859N | (Iota) [T859I, T859S] |
| F888L | (Eta) [F888S, F888V] |
| D950H | (Iota) [D950N, D950B, D950Y, D950A, D950E, D950S] |
| D950N | (Delta) [D950B, D950H, D950Y, D950A, D950E, D950S] |
| S982A | (Alpha) [S982L] |
| T1027I | (Gamma) [T1027A, T1027S, T1027N] |
| Q1071H | (Kappa) [Q1071L, Q1071R, Q1071Y] |
| D1118H | (Alpha) [D1118G, D1118A, D1118Y] |
| V1176F | (Zeta) [V1176I] |
| K1191N | (Alpha) [K1191R, K1191T, K1191E, K1191M, K1191Q] |
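The distinction between mutation sites and distinct mutations (1,229 sites versus 4,729 distinct mutations in the abstract) comes from several substitutions sharing one position. A minimal sketch of that bookkeeping follows, using only the K417 and N501 entries from the table above; the function names and the regular expression are assumptions for illustration, not the authors' code.

```python
import re
from collections import defaultdict

# One-letter reference residue, 1-based position, one-letter replacement.
# [A-Z] also admits the ambiguity codes B, Z and J that appear in the table.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_substitution(label):
    """Split a label such as 'K417N' into ('K', 417, 'N')."""
    match = SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def distinct_by_site(labels):
    """Map each position to the set of replacement residues seen there."""
    sites = defaultdict(set)
    for label in labels:
        _, pos, alt = parse_substitution(label)
        sites[pos].add(alt)
    return sites


# K417 and N501 rows of the table; the repeated K417N stands in for the
# same substitution being seen in more than one sequence.
labels = [
    "K417T", "K417N", "K417R", "K417A", "K417E", "K417M", "K417N",
    "N501Y", "N501T", "N501S", "N501V", "N501I", "N501H", "N501R", "N501K",
]

sites = distinct_by_site(labels)
n_sites = len(sites)
n_distinct = sum(len(alts) for alts in sites.values())
print(n_sites, "sites,", n_distinct, "distinct substitutions")
```

Counting sets rather than raw labels means a substitution observed in many sequences contributes once to the distinct-mutation total but still only one position to the site total.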
Fig. 2. Mutation percentages in different domains/regions of the human SARS-CoV-2 spike proteins.
Fig. 3. Amino acid mutation propensity corresponding to different domains/regions in human SARS-CoV-2 spike proteins.
Domain association of the non-mutated amino acid residues among 303,250 human SARS-CoV-2 spike proteins with reference to the Wuhan-Hu-1 spike protein sequence. (x) indicates residues missing from the three-dimensional structure.
| Domains/Regions | Non-mutated amino acid residues |
|---|---|
| Central β-strand (711-737) | F718 |
| Downward helix (738-782) | C743, S746, C749, F782 |
| S2′ cleavage site (783-815) | N801 |
| Connecting region (829-911) | C840 (x), L878, G880, Q901 |
| Heptad repeat region (912-983) | Q920, Q935, N953, N955, L959, L962, L966, F970, S974, L977 |
| Central helix (984-1034) | E988, Q992, L996, R1000, L1004, Y1007, Q1010, I1013, K1028, M1029, C1032 |
| β-hairpin (1035–1068) | Q1036, S1051, Q1054, H1064 |
| β-sheet domain (1069-1133) | T1105, C1126 |
| Heptad repeat region (1134-1213) | L1145, F1148 (x), L1152 (x), L1193 (x), Y1209 (x) |
| Cytoplasmic region (1237-1273) | F1256 (x), K1269 (x) |
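The invariant-residue table can be checked for internal consistency: the entries should total the forty-four residues and ten domains/regions stated in the abstract, and each residue number should fall inside its domain's range. The sketch below is illustrative only; the dictionary layout, the "1"/"2" suffixes on the two heptad repeat regions and the helper names are assumptions, and the (x) structure markers are omitted.

```python
# Invariant residues per domain, transcribed from the table above.
# Keys are (label, first residue, last residue); labels are illustrative.
INVARIANT = {
    ("Central beta-strand", 711, 737): ["F718"],
    ("Downward helix", 738, 782): ["C743", "S746", "C749", "F782"],
    ("S2' cleavage site", 783, 815): ["N801"],
    ("Connecting region", 829, 911): ["C840", "L878", "G880", "Q901"],
    ("Heptad repeat region 1", 912, 983): [
        "Q920", "Q935", "N953", "N955", "L959",
        "L962", "L966", "F970", "S974", "L977",
    ],
    ("Central helix", 984, 1034): [
        "E988", "Q992", "L996", "R1000", "L1004", "Y1007",
        "Q1010", "I1013", "K1028", "M1029", "C1032",
    ],
    ("beta-hairpin", 1035, 1068): ["Q1036", "S1051", "Q1054", "H1064"],
    ("beta-sheet domain", 1069, 1133): ["T1105", "C1126"],
    ("Heptad repeat region 2", 1134, 1213): [
        "L1145", "F1148", "L1152", "L1193", "Y1209",
    ],
    ("Cytoplasmic region", 1237, 1273): ["F1256", "K1269"],
}


def residue_number(label):
    """Return the position from a label such as 'F718'."""
    return int(label[1:])


def out_of_range(table):
    """List (domain, residue) pairs whose position lies outside the domain."""
    return [
        (name, residue)
        for (name, first, last), residues in table.items()
        for residue in residues
        if not first <= residue_number(residue) <= last
    ]


total = sum(len(residues) for residues in INVARIANT.values())
misplaced = out_of_range(INVARIANT)
print(total, "invariant residues in", len(INVARIANT), "domains/regions")
print("misplaced entries:", misplaced)
```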
Fig. 4. Invariant amino acid residue positions and ACE-2 interacting sites in human SARS-CoV-2 spike protein.
Fig. 5. Non-mutated site propensities corresponding to the different domains/regions in human SARS-CoV-2 spike protein.
Fig. 6. A. Surface representation of the human SARS-CoV-2 spike protein three-dimensional structure (PDB code: 6VXX) with A-chain (red), B-chain (green), C-chain (blue) showing invariant residues exposed on the protein surface in heptad repeat region (912-983), S2′ cleavage site (783-815) and central helix (984-1034). B. View showing proximity of invariant hydrophobic residues from different protein chains in heptad repeat region (1134-1213) exposed on the protein surface. C. View showing β-sheet domain (1069-1133) invariant residue close to glycosylation site exposed on the protein surface. D. View showing central helix (984-1034) invariant residues exposed on the protein surface. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Other mutations in human SARS-CoV-2 spike proteins corresponding to the NCBI Accession codes containing the RBD mutations at Y449 and G502.
| NCBI Accession codes | Mutations in human SARS-CoV-2 spike proteins comprising the mutations at Y449 and G502 |
|---|---|
| QWN56156.1 | V70L, Y449S, A570D, D614G, P681H, T716I, S982A, D1118H |
| QZJ78555.1 | I68T, T95I, Y449S, D614G |
| QNH88954.1 | Y449D, D614G |
| QJD23270.1 | Y449N |
| UDB67143.1 | T95I, Y144S, Y145N, R346K, Y449N, E484K, N501Y, E583D, D614G, P681H, D950N |
| UCQ96089.1 | T19R, Y449H, L452R, T478K, D614G, P681R, D950N |
| UCK95525.1 | P209S, S359T, Y449F, G799D |
| UAT83124.1 | T19R, L452R, T478K, N501T, G502E, V503C, G504K, D614G, P681R, D950N |
| QTY96446.1 | T478K, G502V, D614G, P681H, T732A |
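The rows above can be queried for the substitutions at Y449 and G502 and for the other RBD changes that accompany them. This is a small sketch under stated assumptions: the `RECORDS` dictionary is transcribed from the table, the RBD range 333-527 is taken from the abstract, and the helper names are hypothetical.

```python
# Substitutions per NCBI accession, transcribed from the table above.
RECORDS = {
    "QWN56156.1": ["V70L", "Y449S", "A570D", "D614G", "P681H", "T716I",
                   "S982A", "D1118H"],
    "QZJ78555.1": ["I68T", "T95I", "Y449S", "D614G"],
    "QNH88954.1": ["Y449D", "D614G"],
    "QJD23270.1": ["Y449N"],
    "UDB67143.1": ["T95I", "Y144S", "Y145N", "R346K", "Y449N", "E484K",
                   "N501Y", "E583D", "D614G", "P681H", "D950N"],
    "UCQ96089.1": ["T19R", "Y449H", "L452R", "T478K", "D614G", "P681R",
                   "D950N"],
    "UCK95525.1": ["P209S", "S359T", "Y449F", "G799D"],
    "UAT83124.1": ["T19R", "L452R", "T478K", "N501T", "G502E", "V503C",
                   "G504K", "D614G", "P681R", "D950N"],
    "QTY96446.1": ["T478K", "G502V", "D614G", "P681H", "T732A"],
}

RBD = range(333, 528)  # Thr333-Pro527 inclusive, as given in the abstract


def position(label):
    """Return the residue number from a label such as 'Y449S'."""
    return int(label[1:-1])


def at_positions(records, positions):
    """Keep, per accession, only the substitutions at the given positions."""
    hits = {}
    for accession, substitutions in records.items():
        found = [s for s in substitutions if position(s) in positions]
        if found:
            hits[accession] = found
    return hits


y449 = at_positions(RECORDS, {449})
g502 = at_positions(RECORDS, {502})
rbd_changes = at_positions(RECORDS, RBD)

print(len(y449), "sequences with a Y449 substitution")
print(len(g502), "sequences with a G502 substitution")
print("RBD substitutions in UAT83124.1:", rbd_changes["UAT83124.1"])
```

The counts of sequences recovered this way agree with the abstract: seven sequences carry a Tyr449 substitution and two carry a Gly502 substitution.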
Fig. 7. A. Tyr449 side-chain interactions ≤3.2 Å in spike protein RBD with ACE2 receptor (PDB code: 6LZG). B. Gly502 main-chain interactions ≤3.2 Å in spike protein RBD with ACE2 receptor (PDB code: 6LZG). C. Asn487 side-chain interactions ≤3.2 Å in spike protein RBD with ACE2 receptor (PDB code: 6LZG).