Christian Mann, Justin H Griffin, Kevin M Downard.
Abstract
Mass mapping using high-resolution mass spectrometry has been applied to identify and rapidly distinguish SARS-CoV-2 coronavirus strains across five major variants of concern. Deletions or mutations within the surface spike protein across these variants, which originated in the UK, South Africa, Brazil and India (known as the alpha, beta, gamma and delta variants, respectively), lead to associated mass differences in the mass maps. Peptides of unique mass have thus been determined that can be used to identify and distinguish the variants. The same mass map profiles are also used to construct phylogenetic trees, without the need for protein (or gene) sequences or their alignment, in order to chart and study viral evolution. The combined strategy offers advantages over conventional PCR-based, gene-based approaches by exploiting the ease with which protein mass maps can be generated and the speed and sensitivity of mass spectrometric analysis.
Keywords: Coronavirus; Evolution; Mass spectrometry; SARS-CoV-2; Variants; Virus
Year: 2021 PMID: 34532764 PMCID: PMC8445501 DOI: 10.1007/s00216-021-03649-1
Source DB: PubMed Journal: Anal Bioanal Chem ISSN: 1618-2642 Impact factor: 4.142
Table 1 Original coronavirus reference strain and the tryptic and GluC proteolytic segments that contain sites of mutation in the surface spike protein within five major variants of concern
| Strain | Origin | Mutation site(s) | Residues | Peptide sequence | Monoisotopic [M+H]+ |
|---|---|---|---|---|---|
| Reference | Wuhan, China | L18 or T19 or T20 | 1–21 | MFVFLVLLPLVSSQCVNLTTR | 2380.3132 |
| | | P26 | 22–34 | TQLPPAYTNSFTR | 1495.7540 |
| | | HV del 69–70 | 54–77 | LFLPFFSNVTWFHAIHVSGTNGTK | 2720.3984 |
| | | D80 | 79–80 | FD | 281.1132 |
| | | T95 | 89–96 | GVYFASTE | 873.3990 |
| | | D138 | 133–138 | FQFCND | 773.2923 |
| | | G142 or Y del 144 | 139–147 | PFLGVYYHK | 1123.5935 |
| | | E154 | 151–154 | SWME | 552.2123 |
| | | R158 or del 156–157 | 155–158 | SEFR | 538.2620 |
| | | R190 | 188–190 | NLR | 402.2460 |
| | | D215 | 215 | D | 134.0448 |
| | | 242–244 del. or R246I | 238–246 | FQTLLALHR | 1098.6419 |
| | | K417N | 409–417 | QIAPGQTGK | 899.4946 |
| | | L452 | 429–453 | FTGCVIAWNSNNLD | 1553.7053 |
| | | T478 + E484 | 472–484 | IYQAGSTPCNGVE | 1338.5995 |
| | | N501 | 499–504 | STNLVK | 661.3880 |
| | | A570 | 569–571 | IAD | 318.1660 |
| | | D614 | 587–614 | ITPCSFGGVSVITPGTNTSNQVAVLYQD | 2868.4084 |
| | | H655 | 655–661 | HVNNSYE | 862.3690 |
| | | P681 | 664–682 | IPIGAGICASYQTQTNSPR | 1976.9859 |
| | | A701 | 686–702 | SVASQSIIAYTMSLGAE | 1727.8521 |
| | | T716 | 703–725 | NSVAYSNNSIAIPTNFTISVTTE | 2443.1988 |
| | | D950 | 948–950 | LQD | 375.1875 |
| | | S982 | 979–982 | ILSR | 488.3192 |
| | | T1027 | 1019–1028 | ASANLAATK | 846.4680 |
| | | D1118 | 1112–1118 | PQIITTD | 787.4197 |
| | | V1176 | 1169–1181 | ISGINASVVNIQK | 1342.7689 |
a Based on NCBI protein database sequence QHD43416.1
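The tabulated masses are consistent with monoisotopic singly protonated ions, [M+H]+. The following is a minimal sketch of how such a mass map could be generated in silico, not the authors' software. It assumes that trypsin cleaves after K and R, that GluC cleaves after both E and D (as the D-terminated segments in the table suggest), and it ignores missed cleavages and proline effects. The function names `digest` and `mh_plus` are hypothetical, and the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added to the summed residue masses
PROTON = 1.007276   # charge carrier for the [M+H]+ ion


def digest(sequence, cleave_after="KRED"):
    """Split after every K/R (trypsin) and E/D (GluC, assumed here).

    Simplification: no missed cleavages and no proline rule.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


# Residues 133-138 and 139-147 of the reference strain, taken from the table.
segment = "FQFCND" + "PFLGVYYHK"
for pep in digest(segment):
    print(pep, round(mh_plus(pep), 4))
```

Under these assumptions the two computed values should agree with the tabulated 773.2923 and 1123.5935 to within rounding of the residue masses.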
Table 2 Major coronavirus variants of concern, mutation sites in the surface spike protein and unique peptide masses that distinguish such strains
| Variant | Origin | Mutation | Residues | Peptide sequence | Monoisotopic [M+H]+ | Distinguishing mass |
|---|---|---|---|---|---|---|
| B.1.1.7 (Alpha) | UK | HV69–70 del. | 54–77 minus 69–70 | LFLPFFSNVTWFHAISGTNGTK | 2484.2711 | 2484.2711 |
| | | Y144 del. | 139–147 minus 144 | PFLGVYHK | 960.5302 | 960.5302 |
| | | N501Y | 499–504 | ST | 710.4084 | |
| | | A570D | 569–570 | I | 247.1289 | (247.1289) |
| | | D614G | 587–619 | ITPCSFGGVSVITPGTNTSNQVAVLYQ | 3356.6138 | |
| | | P681H | 664–682 | IPIGAGICASYQTQTNS | 2016.9920 | 2016.9920 |
| | | T716I | 703–725 | NSVAYSNNSIAIP | 2455.2352 | 2455.2352 |
| | | S982A | 979–982 | IL | 472.3242 | (472.3242) |
| | | D1118H | 1112–1127 | PQIITT | 1746.8116 | 1746.8116 |
| B.1.351 (Beta) | South Africa | L18F | 1–21 | MFVFLVLLPLVSSQCVN | 2414.2975 | |
| | | D80A | 79–88 | F | 1133.5626 | 1133.5626 |
| | | D215G | 215–224 | GLPQ | 1018.5204 | 1018.5204 |
| | | LAL 242–244 del. | 238–246 minus 242–244 | FQTLHR | 801.4366 | 801.4366 |
| | | R246I | 238–253 | FQTLLALH | 1788.9531 | 1788.9531 |
| | | K417N | 409–419 | QIAPGQTGN | 1184.5906 | 1184.5906 |
| | | E484K | 472–484 | IYQAGSTPCNGV | 1337.6519 | 1337.6519 |
| | | N501Y | 499–504 | ST | 710.4084 | |
| | | D614G | 587–619 | ITPCSFGGVSVITPGTNTSNQVAVLYQ | 3356.6138 | |
| | | A701V | 686–702 | SVASQSIIAYTMSLG | 1755.8834 | 1755.8834 |
| B.1.617 (Delta) | India | T95I | 89–96 | GVYFAS | 885.4353 | 885.4353 |
| | | G142D | 139–142 | PFL | 491.2501 | |
| | | E154K | 151–154 | SWM | 551.2647 | 551.2647 |
| | | L452R | 429–452 | FTGCVIAWNSNN | 1481.6955 | |
| | | E484Q | 472–509 | IYQAGSTPCNGVQGFNCYFPLQSYGFQPTNGVGYQPYR | 4221.9222 | 4221.9222 |
| | | D614G | 587–619 | ITPCSFGGVSVITPGTNTSNQVAVLYQ | 3356.6138 | |
| | | P681R | 664–681 | IPIGAGICASYQTQTNS | 1879.9331 | |
| B.1.617.2 (Delta plus) | India | T19R | 1–19 | MFVFLVLLPLVSSQCVNL | 2178.2178 | 2178.2178 |
| | | G142D | 139–142 | PFL | 491.2501 | |
| | | EF156–157 del. | 155–158 minus 156–157 | SR | 262.1510 | (262.1510) |
| | | R158G | 154–169 | SEF | 1654.6690 | 1654.6690 |
| | | L452R | 429–452 | FTGCVIAWNSNN | 1481.6955 | |
| | | T478K | 472–478 | IYQAGS | 766.4094 | 766.4094 |
| | | D614G | 587–619 | ITPCSFGGVSVITPGTNTSNQVAVLYQ | 3356.6138 | |
| | | P681R | 664–681 | IPIGAGICASYQTQTNS | 1879.9331 | |
| | | D950N | 948–964 | LQ | 1867.0396 | 1867.0396 |
| P.1 (Gamma) | Brazil | L18F | 1–21 | MFVFLVLLPLVSSQCVN | 2414.2975 | |
| | | T20N | 1–21 | MFVFLVLLPLVSSQCVNLT | 2393.3084 | 2393.3084 |
| | | P26S | 22–34 | TQLP | 1485.7333 | 1485.7333 |
| | | D138Y | 133–147 | FQFCN | 1925.9043 | 1925.9043 |
| | | R190S | 188–191 | NL | 462.2195 | (462.2195) |
| | | K417T | 409–420 | QIAPGQTG | 1171.5954 | 1171.5954 |
| | | E484K | 472–484 | IYQAGSTPCNGV | 1337.6519 | 1337.6519 |
| | | N501Y | 499–504 | ST | 710.4084 | 710.4084 |
| | | D614G | 587–619 | ITPCSFGGVSVITPGTNTSNQVAVLYQ | 3356.6138 | |
| | | H655Y | 655–661 | | 888.3734 | 888.3734 |
| | | T1027I | 1019–1028 | ASANLAA | 858.5044 | 858.5044 |
| | | V1176F | 1169–1181 | ISGINAS | 1390.7689 | 1390.7689 |
a Residue numbering is based on the originating strain and may differ in some variants due to the presence of deletion sites
b None of the strain-distinguishing peptides contains the proline substitutions (F817P, A892P, A899P, A942P, K986P, V987P) or alanine substitutions (R683A and R685A) introduced into the recombinant forms of the variants to stabilize the S-protein trimer
c Masses lower than 500 are bracketed since such peptides typically appear among matrix background ions in MALDI mass spectra. All other peptides differ in mass by at least 83 ppm, as is the case for mass 1133.5626 and that of 1133.6565 for the missed-cleavage peptide 821–830 (sequence LLFNKVTLAD) of the spike protein of the original reference strain
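The use of distinguishing masses can be illustrated with a small matching routine. This is a hedged sketch rather than the authors' procedure: the marker lists are a subset of the unbracketed distinguishing masses in the table above (masses listed for more than one variant are left out), while the 10 ppm tolerance, the peak list and the function names are assumptions made for illustration.

```python
# Subset of distinguishing [M+H]+ masses from the table above.
# Bracketed masses below 500 and masses shared between variants are omitted.
MARKERS = {
    "Alpha": [2484.2711, 960.5302, 2016.9920, 2455.2352, 1746.8116],
    "Beta": [1133.5626, 1018.5204, 801.4366, 1788.9531, 1184.5906, 1755.8834],
    "Gamma": [2393.3084, 1485.7333, 1925.9043, 1171.5954, 888.3734],
    "Delta": [885.4353, 551.2647, 4221.9222],
    "Delta plus": [2178.2178, 1654.6690, 766.4094, 1867.0396],
}


def ppm_difference(observed, reference):
    """Signed relative mass difference in parts per million."""
    return (observed - reference) / reference * 1e6


def count_marker_hits(peaks, tol_ppm=10.0):
    """Count, per variant, the markers matched by any peak within tol_ppm."""
    hits = {}
    for variant, masses in MARKERS.items():
        hits[variant] = sum(
            any(abs(ppm_difference(p, m)) <= tol_ppm for p in peaks)
            for m in masses
        )
    return hits


# Hypothetical peak list (not measured data) lying close to three Beta markers.
peaks = [801.4370, 1133.5630, 1755.8840, 3356.6140]
hits = count_marker_hits(peaks)
print(max(hits, key=hits.get), hits)

# The reference-strain peptide at 1133.6565 lies about 83 ppm from the
# Beta marker at 1133.5626, so it is not matched at a 10 ppm tolerance.
print(round(ppm_difference(1133.6565, 1133.5626)))
```

A tolerance well below the 83 ppm minimum spacing noted in footnote c keeps the nearest non-marker peptide from being counted as a hit.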
Fig. 1 High-resolution MALDI mass spectra for the doubly digested (trypsin + GluC) S-protein extricated from laboratory-grown virus. Peaks labelled in bold represent regions containing mutations in major variants of concern
Table 3 Tryptic + GluC peptide ions detected for the spike protein from a laboratory-grown specimen, their sequences and locations
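| Observed m/z | Theoretical m/z | Error (ppm) | Residues | Sequence | Location |
|---|---|---|---|---|---|
| 846.4690 | 846.4680 | + 1.2 | 1020–1028 | ASANLAATK | S2 undefined |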
| 1045.4650 | 1045.4659 | − 0.9 | 390–398 | LCFTNVYAD | S1 subunit receptor-binding domain (RBD) |
| 1139.6001 | 1139.5996 | + 0.4 | 559–567 | FLPFQQFGR | S1 undefined |
| 1206.6671 | 1206.6663 | + 0.7 | 517–528 | LLHAPATVCGPK | S1 subunit receptor-binding domain (RBD)—partial |
| 1290.6985 | 1290.6974 | + 0.7 | 726–737 (1) | ILPVSMTKTSVD | S2 undefined |
| 1576.7071 | 1576.7060 | + 0.7 | 647–661 | AGCLIGAEHVNNSYE | S1 subunit C-terminal domain (CTD) |
| 1727.8529 | 1727.8520 | + 0.5 | 686–702 | SVASQSIIAYTMSLGAE | S2 subunit N-terminus at furin cleavage site |
| 1743.8478 | 1743.8469 | + 0.5 | 686–702 (+O) | SVASQSIIAYTMSLGAE | S2 subunit N-terminus at furin cleavage site |
| 1801.9139 | 1801.9133 | + 0.3 | 341–355 (1) | VFNATRFASVYAWNR | S1 subunit receptor-binding domain (RBD) |
| 3044.6021 | 3044.6011 | + 0.3 | 951–979 (1) | VVNQNAQALNTLVKQLSSNFGAISSVLND | HR1 domain—partial |
| 3209.6026 | 3209.6035 | − 0.3 | 584–614 (1) | ILDITPCSFGGVSVITPGTNTSNQVAVLYQD | S1 subunit receptor-binding domain (RBD)—partial |
a Based on NCBI protein sequence QHD43416.1, where residues denoted (+O) contain an oxidized methionine residue and those denoted (1) contain one missed cleavage site; all others contain no missed cleavage sites. Bolded entries represent regions that allow variants to be distinguished as identified in Table 2
b As defined in the UniProt Knowledgebase (UniProtKB) at https://covid-19.uniprot.org/uniprotkb/ and ref. Acta Pharmacologica Sinica
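The third column of the table appears to be the signed mass error in ppm between the first two columns. The short check below recomputes it for four rows, assuming the first column is the observed m/z and the second the theoretical m/z; the function name is hypothetical.

```python
# (first column, second column, tabulated error in ppm) for four rows above.
ROWS = [
    (846.4690, 846.4680, 1.2),
    (1045.4650, 1045.4659, -0.9),
    (1139.6001, 1139.5996, 0.4),
    (3209.6026, 3209.6035, -0.3),
]


def mass_error_ppm(observed, theoretical):
    """Signed mass error of an observed ion relative to its theoretical m/z."""
    return (observed - theoretical) / theoretical * 1e6


for observed, theoretical, reported in ROWS:
    error = mass_error_ppm(observed, theoretical)
    print(f"{observed:.4f}  computed {error:+.1f} ppm  tabulated {reported:+.1f} ppm")
```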
Fig. 2 High-resolution MALDI mass spectra for the doubly digested (trypsin + GluC) recombinant S-protein for five major variants of concern. Peaks labelled horizontally contain mutations that distinguish the variants. Residue segments for all peaks are provided in Supplementary Table 1
Fig. 3 Mass tree for the S-protein of an originating strain and five major variants of concern, constructed using the mass map data of Figs. 1 and 2
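The idea of comparing strains from mass maps alone, without sequences or alignment, can be illustrated with a simple pairwise distance. This sketch is not the authors' mass tree algorithm: it uses only the unbracketed peptide masses listed in Table 2, a Jaccard-style distance, and an assumed 10 ppm matching tolerance, and all names are hypothetical. A tree-building method of choice could then be applied to the resulting distances.

```python
from itertools import combinations

# Unbracketed peptide masses per variant, copied from Table 2.
MASS_MAPS = {
    "Alpha": [2484.2711, 960.5302, 710.4084, 3356.6138, 2016.9920,
              2455.2352, 1746.8116],
    "Beta": [2414.2975, 1133.5626, 1018.5204, 801.4366, 1788.9531,
             1184.5906, 1337.6519, 710.4084, 3356.6138, 1755.8834],
    "Gamma": [2414.2975, 2393.3084, 1485.7333, 1925.9043, 1171.5954,
              1337.6519, 710.4084, 3356.6138, 888.3734, 858.5044, 1390.7689],
    "Delta": [885.4353, 551.2647, 1481.6955, 4221.9222, 3356.6138, 1879.9331],
    "Delta plus": [2178.2178, 1654.6690, 1481.6955, 766.4094, 3356.6138,
                   1879.9331, 1867.0396],
}


def shared_peaks(a, b, tol_ppm=10.0):
    """Number of masses in a that match some mass in b within tol_ppm."""
    return sum(any(abs(x - y) / y * 1e6 <= tol_ppm for y in b) for x in a)


def distance(a, b, tol_ppm=10.0):
    """Jaccard-style distance: 1 - shared / (total distinct masses)."""
    shared = shared_peaks(a, b, tol_ppm)
    return 1.0 - shared / (len(a) + len(b) - shared)


pairs = {
    (p, q): distance(MASS_MAPS[p], MASS_MAPS[q])
    for p, q in combinations(MASS_MAPS, 2)
}
closest = min(pairs, key=pairs.get)
for (p, q), d in sorted(pairs.items(), key=lambda item: item[1]):
    print(f"{p:>10} vs {q:<10} {d:.3f}")
```

With this subset of masses, the Delta and Delta plus maps share the most peaks relative to their size and therefore come out as the closest pair.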