
Identification of Endogenous Peptides in Nasal Swab Transport Media used in MALDI-TOF-MS Based COVID-19 Screening.

Helen Tsai¹, Brett S Phinney¹, Gabriela Grigorean¹, Michelle R Salemi¹, Hooman H Rashidi², John Pepper³,⁴, Nam K Tran².

Abstract

Mass spectrometry (MS) based diagnostic detection of 2019 novel coronavirus infectious disease (COVID-19) has been postulated to be a useful alternative to classical PCR based diagnostics. These MS based approaches have the potential to be both rapid and sensitive and can be done on-site without requiring a dedicated laboratory or depending on constrained supply chains (i.e., reagents and consumables). Matrix-assisted laser desorption ionization (MALDI)-time-of-flight (TOF) MS has a long and established history of microorganism detection and systemic disease assessment. Previously, we have shown that automated machine learning (ML) enhanced MALDI-TOF-MS screening of nasal swabs can be both sensitive and specific for COVID-19 detection. The underlying molecules responsible for this detection are generally unknown, and the automated ML platform does not need them to be identified in order to detect COVID-19. However, the identification of these molecules is important for understanding both the mechanism of detection and potentially the biology of the underlying infection. Here, we used nanoscale liquid chromatography tandem MS to identify endogenous peptides found in nasal swab saline transport media at the same mass over charge (m/z) values observed by the MALDI-TOF-MS method. With our peptidomics workflow, we demonstrate that we can identify endogenous peptides and endogenous protease cut sites. Further, we show that SARS-CoV-2 viral peptides were not readily detected and are highly unlikely to be responsible for the accuracy of MALDI based SARS-CoV-2 diagnostics. Further analysis with more samples will be needed to validate our findings, but the methodology appears promising.

Entities: Chemical

Year: 2022 PMID: 35600141 PMCID: PMC9113002 DOI: 10.1021/acsomega.2c01864

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

The first known case of novel coronavirus infectious disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), was identified in Wuhan, China, in December 2019. The disease has since quickly escalated into a global pandemic. In response, scientists from around the world have generated considerable research that has led to a better understanding of SARS-CoV-2 and the management of COVID-19. Unfortunately, we still face problems with limiting the spread of COVID-19 and highly infectious variants. Rapid, on-site, and robust screening for SARS-CoV-2 infection could enhance both containment and reduction in infectivity. A rapid on-site test and constant screening can help determine social restriction policies, track new variants and their spread, and assess treatments. For a rapid on-site test to be viable, it will need adequate sensitivity and specificity, as well as a low false-positive rate (FPR) and false-negative rate (FNR). There are many methodologies for detecting SARS-CoV-2, including molecular and antigen approaches. Molecular methods, such as reverse transcription polymerase chain reaction (RT-PCR), are the accepted gold standard. These molecular methods are highly sensitive and specific and can be automated to provide high throughput testing capacity. However, they often require a specialized laboratory, reagents that can be in short supply, and significant infrastructure for the transportation of samples. Results are typically produced in 24 to 48 h. Point-of-care (POC) molecular methods exist and can report results in as little as 20 min, but these platforms are not widely available and are often impacted by supply chains. Antigen methods are rapid and low-cost alternatives to molecular methods. Both POC and laboratory-based antigen tests are now available but appear to be less sensitive and specific compared to their molecular counterparts. 
Recently, mass spectrometry (MS) has been used as an alternative to RT-PCR detection, using both liquid chromatography tandem MS (LC-MS/MS)[1−5] and matrix assisted laser desorption ionization (MALDI)–time-of-flight (TOF) MS.[6−9] MS approaches generally do not rely on reagents that can be in short supply or are biologically produced, with the exception of trypsin for bottom up proteomics methods, and have a long track record of successful microorganism identification[10−13] and effective assessment of systemic disease.[14,15] While LC-MS based approaches can be fast (1–5 min) and have high sensitivity and accuracy,[1] they typically require complex instrumentation that requires both dedicated laboratory facilities and highly trained personnel. In contrast, MALDI-TOF-MS approaches can be performed on-site and generally do not require infrastructure such as specialized laboratories and highly trained personnel. MALDI-TOF-MS based techniques have a long-proven track record in clinical microbiology for pathogen identification. These MALDI-TOF-MS based approaches rely on “spectral patterns” of generally unknown components to diagnose disease and detect microorganisms. Recently, Tran et al. demonstrated that machine learning (ML)-enhanced MALDI-TOF-MS screening of SARS-CoV-2 nasal swabs can be both accurate and sensitive.[6] Due to the limitation of MALDI-TOF-MS technology, the underlying molecules responsible for the spectra are unknown. Identification of these components will be useful for the understanding of both the mechanism of detection and the underlying biology. We followed up with an exploratory study using nanoscale LC-MS/MS to identify the underlying peptides that could be responsible for the m/z values seen in the MALDI-TOF spectra. 
At the onset of the study, our exploratory investigation had the following limitations: (1) nanoscale LC-MS/MS is far more sensitive and can detect more peptides than MALDI-TOF-MS, so direct identification cannot be made but only inferred; (2) our method will only identify potential peptide components of the spectra but will miss other molecules such as lipids and carbohydrates; (3) our sample size is limited, so the study serves as a template for future studies. We hypothesize that the peptides attached to the exterior of the nasal swabs used in the MALDI-TOF-MS study are digested by endogenous proteases. This is supported by the mass range in the MALDI-TOF spectrum. Thus, we chose to perform our investigation using a peptidomics workflow, instead of a trypsin-based proteomics workflow, because this method also relies on endogenous proteases for digestion. We believe that our peptidomics workflow can be applied to the nasal swab transport media to identify host proteome profiles. The identification of the nasal endogenous peptides during infection can help us further understand SARS-CoV-2 pathogenesis, determine suitable detection methods, and discover drug targets. Here, we show that the peptidomics workflow is suitable for the identification of peptides in nasal swab saline transport media. We identified endogenous protease cut sites and 14270 endogenous peptides, where the top proteins mapped comprise polymeric immunoglobulin receptor, actin, statherin, glyceraldehyde-3-phosphate dehydrogenase, thymosin β-4, and histones. We show that SARS-CoV-2 viral peptides were not readily detected and are highly unlikely to be responsible for the accuracy of MALDI based SARS-CoV-2 diagnostics. Further investigation with more samples will be needed, but the methodology appears promising.

Materials and Methods

Collection of Nasal Swab Specimens

Nasal swabs from the anterior nares were collected at the UC Davis Health Emergency Department (ED). The study was approved by the UC Davis Institutional Review Board. A subset of eight samples was selected for peptidomics, of which half were positive and half were negative for COVID-19 (Table 1). COVID-19 diagnosis was cross-confirmed by United States Food and Drug Administration emergency use authorized molecular tests (digital droplet RT-PCR [Bio-Rad, Hercules, CA], and cobas Liat [Roche Diagnostics, Pleasanton, CA]). The age of participants ranged from 30 to 77 years. None of the patients were vaccinated. Three of the COVID-negative patients (n5, n6, and n9) had reported pre-existing pulmonary disease: two displayed chronic obstructive pulmonary disease (COPD) and one displayed moderate asthma with acute exacerbations.

Table 1

Metadata of the Samples Used

sample	days since symptom onset	age	diagnosis	LIAT result	public health/UCD result	ddPCR result	ddPCR N1 (copies/μL)	ddPCR N2 (copies/μL)
n13		70	intertrochanteric fracture of left femur	negative	control	negative	0	0
n5	1	64	COPD; pulmonary emphysema	negative	control	negative	0	0
n6	4	63	COPD exacerbation	negative	control	negative	0	0
n9	2	31	moderate asthma w/acute exacerbation	negative	control	negative	0	0
p22	1	30		positive	COVID	positive	6.91	7.1
p25	10	77		positive	COVID	positive	8.78	6.25
p30	0	68		positive	COVID	positive	1.14	0.663
p32	2	37		positive	COVID	positive	0.683	0.683

Sample Preparation of Saline Media from Nasal Swabs

Endogenous peptides were processed by taking an aliquot of nasal swab transport media and separating the peptides from the remaining endogenous proteins and other large molecules by molecular weight cutoff using a 30 kDa centrifugal membrane filter (Amicon Ultra 0.5 mL, UFC503024, Sigma-Aldrich). Separated peptides were then assayed using a fluorescent peptide (PN 23290, Thermo Scientific) assay to determine total amount and analyzed by LC-MS/MS.

Liquid Chromatography Tandem Mass Spectrometry

LC peptide separation was done on a Dionex Ultimate RSLC (Thermo Scientific). The digested peptides were reconstituted in 0.1% trifluoroacetic acid, and 10 μL of each sample was loaded onto a PepMap C18 guard column (100 μm × 2 cm, 5 μm particle size; PN 164564-CMD, Thermo Fisher), where they were desalted online before being separated on a PepMap RSLC C18 analytical column (75 μm × 25 cm, 2 μm particle size; PN ES902, Thermo Fisher). Peptides were eluted using a gradient of 0.1% formic acid (A) and 100% acetonitrile (B) with a flow rate of 300 nL/min. A 120 min gradient was run with 5% to 35% B over 50 min, 35% to 80% B over 3 min, 80% B for 1 min, 80% to 5% B over 1 min, and finally held at 5% B for 5 min. Mass spectra were collected on an Orbitrap Fusion Lumos tribrid mass spectrometer (Thermo Fisher Scientific) in a data-dependent mode (Orbi/Orbi) with one MS precursor scan followed by 15 MS/MS scans. A dynamic exclusion of 35 s was used. MS spectra were acquired with a resolution of 70000 and a target of 1 × 10⁶ ions or a maximum injection time of 20 ms. MS/MS spectra were acquired with a resolution of 17500 and a target of 5 × 10⁴ ions or a maximum injection time of 250 ms. Peptide fragmentation was performed using higher-energy collision dissociation (HCD) with a normalized collision energy (NCE) value of 27. Ions with unassigned charge states, a charge of +1, or a charge greater than +5 were excluded from MS/MS fragmentation.

Data Analysis

Tandem mass spectra were searched using FragPipe, version 16.0 (MSFragger, version 3.3),[16] with the built-in peptidomic workflow against a combined database of the UniProt human reference proteome (UP000005640_9606; 20,588 entries), the UniProt SARS-CoV-2 proteome (UP000464024; 17 entries), common laboratory contaminants, and an equal number of reverse decoy sequences. The search was performed twice. In the first search, for peptide identification, the peptide decoy false discovery rate (FDR) was set at 0.01 and the protein decoy FDR was left open at 1. The second search, for protein identification, used a traditional peptide and protein decoy FDR cutoff of 0.01. Output from FragPipe was analyzed using R. The primary FragPipe outputs used for our analysis are the combined_protein.tsv and psm.tsv files from all the samples. The combined_protein.tsv file reports the total FDR-filtered proteins from all experimental groups, with each row representing a protein group. The number of peptides found comes from the psm.tsv files. A separate psm.tsv was generated for each experiment and contains the FDR-filtered search results, with each row containing a peptide-spectrum match (PSM). For all files, the nonhuman entries were filtered out. To evaluate if a protein or peptide is present, we used the total number of PSMs with sequences mapping to the selected protein, including shared PSMs (Total Spectral Counts). For the comparative analysis with the DIA study by Mun et al.,[17] we downloaded their supplementary file, pr1c00506_si_003.txt, and pulled the Protein_Accession column for comparisons. From the Human Protein Atlas, we pulled nasopharynx genes (https://www.proteinatlas.org/search/nasopharynx) on October 16, 2021 and used the Protein column for comparisons. The top protein groups were ranked by spectral counts normalized by protein length, with the highest value ranked first. 
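The two FDR settings described above can be illustrated with a simplified target-decoy estimate. This is only a minimal sketch of the general idea, not FragPipe's internal procedure: the function name `filter_by_decoy_fdr`, the score scale, and the toy data are hypothetical, and the FDR at a score threshold is approximated as the number of decoy matches divided by the number of target matches at or above that threshold.

```python
from typing import List, Tuple

PSM = Tuple[float, bool]  # (search score, is_decoy); higher score = better match


def filter_by_decoy_fdr(psms: List[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Keep target PSMs down to the lowest-ranked position where decoys/targets <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    keep = 0  # number of top-ranked PSMs that pass the cutoff
    for i, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = i
    # Decoys are only used for the estimate; report targets alone.
    return [p for p in ranked[:keep] if not p[1]]


# Toy data (hypothetical): 50 confident targets, one decoy, then 10 weaker targets.
toy_psms = (
    [(100.0 - i, False) for i in range(50)]
    + [(40.0, True)]
    + [(30.0 - i, False) for i in range(10)]
)
strict = filter_by_decoy_fdr(toy_psms, max_fdr=0.01)  # analogous to a 0.01 cutoff
open_fdr = filter_by_decoy_fdr(toy_psms, max_fdr=1.0)  # analogous to leaving FDR open at 1
print(len(strict), len(open_fdr))
```

In this toy case the strict cutoff stops just above the first decoy, while the open setting retains every target match, which mirrors why the two searches can report different numbers of identifications.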
We selected the combined total spectral count (Combined_Total_Spectral_Count column) for normalization. The normalization reasoning is similar to iBAQ as longer proteins are expected to generate more peptides with proteolysis. This is approximated by dividing the spectral counts by the length of the protein. To generate the cumulative frequency graph, we then sorted the normalized spectral counts in descending order so that the protein with the highest total normalized spectral counts is the top rank. Last, we calculated the cumulative sum and divided each sum by the total. The top peptides were also identified using the same calculation and with spectral counts. For peptides, the spectral counts used are also the combined total because all psm.tsv files were concatenated, and the occurrence of each peptide was counted as a spectral count. To identify potential proteases in the nasopharynx responsible for the endogenous peptides, we looked at peptides with at least one spectral count and generated a sequence motif for the preterminal, N-terminal, C-terminal, and post-terminal amino acids. The sequence motif was generated using the ggseqlogo R package.[18] To look for enriched pathways, we used Reactome and pulled 68 genes (this includes the indistinguishable mapped proteins) corresponding to the top 67 protein groups from the cumulative frequency analysis. The analysis was done on October 20, 2021 (https://reactome.org/userguide/analysis). The analysis included interactors.
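The length normalization and cumulative frequency ranking described above can be sketched as follows. The authors performed this analysis in R; the Python below is an illustrative substitute, and the function names are hypothetical. The example uses the combined total spectral counts and protein lengths of five proteins from Table 2, so the cumulative frequencies it produces are relative to this five-protein subset only and will not match the table's values.

```python
from typing import Dict, List, Tuple


def rank_by_normalized_counts(
    proteins: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, float, float]]:
    """proteins maps ID -> (total spectral count, protein length).

    Returns (ID, normalized count, cumulative frequency), highest normalized count first.
    """
    normalized = {pid: count / length for pid, (count, length) in proteins.items()}
    total = sum(normalized.values())
    ranked = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
    result = []
    running = 0.0
    for pid, value in ranked:
        running += value
        result.append((pid, value, running / total))
    return result


def entries_needed(ranked: List[Tuple[str, float, float]], fraction: float) -> int:
    """Number of top-ranked entries needed to reach the given cumulative fraction."""
    for n, (_, _, cumulative) in enumerate(ranked, start=1):
        if cumulative >= fraction:
            return n
    return len(ranked)


# (combined total spectral count, protein length) taken from Table 2
subset = {
    "P01833": (3087, 764),  # PIGR
    "P60709": (2571, 375),  # ACTB
    "P02808": (1121, 62),   # STATH
    "P04406": (1065, 335),  # GAPDH
    "P62328": (220, 44),    # TMSB4X
}
ranking = rank_by_normalized_counts(subset)
for pid, value, cumulative in ranking:
    print(f"{pid}\t{value:.2f}\t{cumulative:.2f}")
print(entries_needed(ranking, 0.75))
```

Although PIGR has the most spectral counts in this subset, dividing by its length of 764 residues places it below statherin, actin, and thymosin β-4, the same relative order these proteins have in Table 2.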

Data Availability

All raw data and search results are available at the following repositories: Massive, https://massive.ucsd.edu/ (MSV000088411), and Proteome Exchange, http://proteomecentral.proteomexchange.org/ (PXD029800).

Results and Discussion

Due to the high variability of both peptides and proteins identified between the nasal swabs and the low power of this study (n = 4 per group), we did not test for differences in proteins and peptides between the positive and negative cohorts (Figure 1).

Figure 1

(A) Number of unique peptides, peptide isoforms, and spectral counts identified per sample. (B) Number of proteins identified per sample. (C) Histogram displaying the range and counts of masses generated per sample.

Nevertheless, we identified 14270 endogenous peptides across 1198 protein groups that we hypothesize could be partly responsible for the previously reported MALDI-TOF-MS based screen.[6] Peptides can exist in different isoforms due to post-translational modifications such as N-terminal acetylation and deamidation. These modifications can have real biological significance and can also be introduced during the preparation of samples. With our analysis, we identified 15086 unique peptide isoforms. We identified 96 proteins and 44 peptides that had at least one spectral count in every sample. Lowering the threshold to seven or more samples with at least one spectral count, we identified 196 proteins. Within confirmed COVID-19 positive and negative samples, we identified 269 protein groups that all positive samples have in common and 105 protein groups that all negative samples have in common (Figure 2A,B). For peptides, we identified 296 common peptides and 65 common peptides within the positive and negative categories, respectively (Figure 2C,D). We identified three proteins that are uniquely found in the positive samples (ANXA5, CANX, SCFD1) and no proteins unique to negative samples. We identified six peptides unique to the positive samples and one peptide unique to the negative samples.
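The presence thresholds used in this paragraph (at least one spectral count in all samples, in seven or more samples, or in all samples of one cohort) amount to a simple count over a protein-by-sample table. The sketch below is an assumption about how such a tally could be made; the function name `detected_in_at_least` and the toy spectral counts are hypothetical, and only the sample names come from Table 1.

```python
from typing import Dict, Iterable, Set


def detected_in_at_least(
    counts: Dict[str, Dict[str, int]], samples: Iterable[str], min_samples: int
) -> Set[str]:
    """Entries with >= 1 spectral count in at least min_samples of the given samples."""
    sample_list = list(samples)
    hits = set()
    for entry, per_sample in counts.items():
        n_detected = sum(1 for s in sample_list if per_sample.get(s, 0) >= 1)
        if n_detected >= min_samples:
            hits.add(entry)
    return hits


NEGATIVE = ["n13", "n5", "n6", "n9"]
POSITIVE = ["p22", "p25", "p30", "p32"]
ALL_SAMPLES = NEGATIVE + POSITIVE

# Hypothetical spectral counts: entry -> {sample: count}
toy_counts = {
    "protein_A": {s: 5 for s in ALL_SAMPLES},                # detected everywhere
    "protein_B": {s: 2 for s in ALL_SAMPLES if s != "n5"},   # missing in one sample
    "protein_C": {s: 1 for s in POSITIVE},                   # positives only
}

in_all = detected_in_at_least(toy_counts, ALL_SAMPLES, 8)
in_seven_plus = detected_in_at_least(toy_counts, ALL_SAMPLES, 7)
in_all_positive = detected_in_at_least(toy_counts, POSITIVE, len(POSITIVE))
in_all_negative = detected_in_at_least(toy_counts, NEGATIVE, len(NEGATIVE))
print(sorted(in_all), sorted(in_seven_plus))
```

The same function applies to peptides if the keys are peptide sequences instead of protein identifiers.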

Figure 2

(A) Proteins in common between samples within the positive category. (B) Proteins in common between samples within the negative category. (C) Peptides in common between samples within the positive category. (D) Peptides in common between samples within the negative category.

To identify the most abundant peptides and proteins in these samples, we identified 67 protein groups (out of 1198) that had the highest number of peptides (normalized by protein length) and cumulatively account for 75% of the total peptides found in this experiment (Figure 3A and Supplementary Table 1). Of these protein groups, the top 20 are listed in Table 2. We also identified 6093 peptides that had the highest number of spectral counts and cumulatively account for 75% of the total peptides (Figure 3B and Supplementary Table 2). These peptides correspond to 1015 proteins, and the summary of counts can be found in Supplementary Table 3. Of these peptides, the top 20 are listed in Table 3.

Figure 3

(A) Cumulative saturation graph for protein groups based on total spectral counts. (B) Cumulative saturation graph for peptides identified in our experiment.

Table 2

Top 20 Protein Groups Based on Total Combined Spectral Counts

rank	cumulative frequency	protein ID	gene	protein length	coverage	description	protein probability	top peptide probability	combined total peptides	combined spectral count	combined unique spectral count	combined total spectral count	count mapped proteins
1	0.07	P02808	STATH	62	75.6	statherin	1.00	1.00	181	1121	1121	1121	1
2	0.10	P62807	H2BC4	126	99.2	histone H2B type 1-C/E/F/G/I	0.26	1.00	212	1017	0	1018	1
3	0.13	Q5QNW6	H2BC18	126	99.2	histone H2B type 2-F	1.00	1.00	210	7	7	1017	1
4	0.16	P58876	H2BC5	126	99.2	histone H2B type 1-D	1.00	1.00	209	6	6	1017	1
5	0.20	Q99877	H2BC15	126	99.2	histone H2B type 1-N	1.00	1.00	209	3	3	1014	1
6	0.23	Q99879	H2BC14	126	88.9	histone H2B type 1-M	1.00	1.00	208	3	3	1013	1
7	0.26	Q16778	H2BC21	126	99.2	histone H2B type 2-E	1.00	1.00	207	50	0	993	1
8	0.29	O60814	H2BC12	126	99.2	histone H2B type 1-K	1.00	1.00	200	14	0	987	1
9	0.32	Q99880	H2BC13	126	99.2	histone H2B type 1-L	1.00	1.00	174	2	2	905	1
10	0.34	P60709	ACTB	375	92.3	actin cytoplasmic 1	1.00	1.00	460	2571	3	2571	1
11	0.36	Q6FI13	H2AC18	130	99.2	histone H2A type 2-A	1.00	1.00	174	742	0	742	1
12	0.39	P0C0S8	H2AC11	130	99.2	histone H2A type 1	1.00	1.00	167	29	0	721	1
13	0.40	P62328	TMSB4X	44	97.7	thymosin β-4	1.00	1.00	33	220	220	220	1
14	0.42	P62805	H4C1	103	95.1	histone H4	1.00	1.00	142	512	512	512	1
15	0.44	Q93077	H2AC6	130	99.2	histone H2A type 1-C	1.00	1.00	149	159	0	560	1
16	0.46	P01833	PIGR	764	48.4	polymeric immunoglobulin receptor	1.00	1.00	364	3087	3087	3087	1
17	0.47	Q71DI3	H3C15	136	83.8	histone H3.2	1.00	1.00	161	473	13	473	1
18	0.48	P04406	GAPDH	335	97.6	glyceraldehyde-3-phosphate dehydrogenase	1.00	1.00	311	1065	1062	1065	1
19	0.49	Q16695	H3-4	136	86.8	histone H3.1t	1.00	1.00	141	120	120	432	1
20	0.51	Q8IUE6	H2AC21	130	87.7	histone H2A type 2-B	1.00	1.00	122	42	17	393	1

Table 3

Top 20 Peptides Based on Total Spectral Counts

rank	cumulative frequency	peptide	pre-amino acid	post-amino acid	peptide length	calculated peptide mass	protein ID	gene	description	count mapped proteins	combined total spectral count
1	0.002435181	SLAKADAAP	V	A	24	2602.324	P01833	PIGR	polymeric immunoglobulin receptor	1	102
2	0.004106384	AVEERKAAG	V	A	37	3971.035	P01833	PIGR	polymeric immunoglobulin receptor	1	70
3	0.005753712	ADKPDMGEI	M	-	43	4949.518	P63313	TMSB10	thymosin β-10	1	69
4	0.007257795	SLAKADAAP	V	L	30	3282.684	P01833	PIGR	polymeric immunoglobulin receptor	1	63
5	0.008666380	AVADTRDQA	K	L	39	3831.847	P01833	PIGR	polymeric immunoglobulin receptor	1	59
6	0.009955594	PDEKVLDSG	A	A	16	1874.937	P01833	PIGR	polymeric immunoglobulin receptor	1	54
7	0.011125435	DVSLAKADA	R	A	26	2816.419	P01833	PIGR	polymeric immunoglobulin receptor	1	49
8	0.012223655	AVEERKAAG	V	K	36	3842.940	P01833	PIGR	polymeric immunoglobulin receptor	1	46
9	0.013274125	EIENKAIQDP	R	A	30	3371.670	P01833	PIGR	polymeric immunoglobulin receptor	1	44
10	0.014300721	AKADAAPDE	L	A	22	2402.207	P01833	PIGR	polymeric immunoglobulin receptor	1	43
11	0.015303443	LFAEEKAVAI	R	S	38	3881.826	P01833	PIGR	polymeric immunoglobulin receptor	1	42
12	0.016282290	SLAKADAAP	V	K	23	2474.229	P01833	PIGR	polymeric immunoglobulin receptor	1	41
13	0.017261137	VESTGVFTTF	V	S	26	2715.438	P04406	GAPDH	glyceraldehyde-3-phosphate dehydrogenase	1	41
14	0.018216110	PPAGQPQGI	A		23	2355.231	P04280	PRB1	basic salivary proline-rich protein 1	2	40
15	0.019147209	AIQDPRLFAE	K	A	41	4278.998	P01833	PIGR	polymeric immunoglobulin receptor	1	39
16	0.020054433	ELRVAPEEHI	N	M	30	3438.829	P60709	ACTB	actin, cytoplasmic 1	2	38
17	0.020937784	AIQDPRLFAE	K	L	42	4350.035	P01833	PIGR	polymeric immunoglobulin receptor	1	37
18	0.021797259	AVVVKKIETR	R		24	2596.480	P05787	KRT8	keratin, type II cytoskeletal 8	1	36
19	0.022656735	EERKAAGSR	V	A	35	3800.929	P01833	PIGR	polymeric immunoglobulin receptor	1	36
20	0.023516211	EIENKAIQDP	R	D	23	2642.366	P01833	PIGR	polymeric immunoglobulin receptor	1	36

Although it is tempting to match the m/z values of the peptides identified in this study with values reported previously in the MALDI-TOF-MS based assay, matching such data would be an educated guess at best. First, there are inherent differences between LC-MS/MS and MALDI-TOF-MS, including ionization, peptide suppression, matrix effects, and the lack of isotopic resolution in the MALDI-TOF-MS due to data smoothing. Second, our LC-MS/MS analysis in this study should be far more sensitive than the MALDI-TOF-MS based assay. However, it is a reasonable hypothesis that peptides identified in this study by LC-MS/MS are responsible for some of the m/z values seen in our previous MALDI-TOF-MS based assay. Comparing the masses (Da) of the peptides identified by LC-MS/MS and the m/z values of the MALDI-TOF-MS assay, we found that the ranges overlap but do not align perfectly, and the number of masses identified by LC-MS/MS is far greater than the MILO curated number from MALDI-TOF-MS (Figure 4). The mass range of peptides identified is between 768.4 and 6941.4 Da, with a mean of 2144.3 Da and a median of 1965.1 Da. The range of masses for each sample can be found in Table 4. The MALDI-TOF-MS m/z range is between 1992.7 and 16019.0, with a mean of 5601 and a median of 4307.
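Because the LC-MS/MS results are reported as peptide masses in Da while the MALDI-TOF-MS values are m/z, a rough comparison of the two ranges requires an assumption about charge state. The sketch below assumes singly protonated ions, [M+H]+, which is typical for MALDI but is not stated in the text; the function names are hypothetical, and the ranges are the ones quoted above.

```python
from typing import Optional, Tuple

PROTON_MASS = 1.007276466621  # mass of a proton in Da


def mass_to_mz(neutral_mass: float, charge: int = 1) -> float:
    """m/z of a peptide of the given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def overlap(
    a: Tuple[float, float], b: Tuple[float, float]
) -> Optional[Tuple[float, float]]:
    """Intersection of two closed intervals, or None if they do not overlap."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


lcms_mass_range = (768.4, 6941.4)    # Da, peptides identified by LC-MS/MS
maldi_mz_range = (1992.7, 16019.0)   # m/z, MALDI-TOF-MS assay

# Assumption: each peptide would appear in MALDI as a singly charged [M+H]+ ion.
lcms_mz_range = (mass_to_mz(lcms_mass_range[0]), mass_to_mz(lcms_mass_range[1]))
shared = overlap(lcms_mz_range, maldi_mz_range)
print(shared)
```

Under this assumption the shared window runs from the lower MALDI bound to roughly 6942 m/z, so identified peptides lighter than about 1992 Da would fall below the MALDI range, and MALDI signals above about 6942 m/z have no counterpart among the identified peptides.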

Figure 4

Table 4

Distribution of Mass Ranges by Sample

sample	count	min	1st qu.	median	mean	3rd qu.	max
n13	2282	821.44	1611.85	2029.05	2172.07	2563.24	6693.10
n5	359	920.51	1591.81	2032.05	2196.30	2551.77	6394.22
n6	1771	768.39	1554.40	2021.01	2280.75	2627.80	6452.18
n9	4240	792.41	1571.81	1987.62	2170.46	2552.79	6707.25
p22	2401	821.44	1525.75	1906.96	2057.89	2427.32	6824.06
p25	2788	821.44	1499.79	1886.96	2024.19	2381.58	6669.05
p30	5160	818.49	1600.78	2031.03	2173.90	2579.70	6587.68
p32	6860	799.44	1498.83	1924.01	2137.86	2492.29	6941.44

(A) Mass distribution of the 816 MILO predicted masses determined to be significant, ranging from 1992.7 to 16019.0 m/z. (B) Mass distribution of the peptides identified by LC-MS/MS, 14318 unique masses, ranging from 768.4 to 6941.4 Da.

Although it is likely that the molecules detected in the MALDI-TOF-MS based assay are composed mostly of human host response proteins and peptides, this does not rule out the possibility that other molecules not detected in this study, such as lipids and carbohydrates, may be responsible in part for the MALDI-TOF-MS assay’s performance. Of the peptides identified in our study, none corresponded with SARS-CoV-2 viral proteins. In subsequent experiments, viral proteins were detected on nasal swabs using traditional bottom-up proteomics and were relatively low in abundance compared to human host proteins (data not shown). Using a diaPASEF analysis like Mun et al.,[17] viral proteins were 100–1000 times less abundant than the most abundant human host proteins detected (complete data reported in subsequent publication). Bottom-up proteomics assays, where the proteins are digested using a protease and then detected, are far more sensitive than the native peptidomic workflow presented here. This decrease in sensitivity is due mainly to the massively expanded search space of nonenzymatic peptidomic searches when combined with decoy false discovery filtering. The human protein groups identified in this study generally matched the proteins expected to be in the nasopharynx. The Human Protein Atlas lists 365 genes reported to be in the nasopharynx (https://www.proteinatlas.org/search/nasopharynx). Of those, we found 35 proteins (Supplementary Table 4). Our results are also generally consistent with a previous bottom-up proteomics analysis of nasal swabs. In a recent DIA-based bottom-up proteome profiling of nasopharyngeal swabs, Mun et al.[17] reported 7674 proteins identified. 
We analyzed the protein groups from their list of detected proteins using the Spectronaut results from their published repository (PXD025277). From that, we extracted 7805 protein identifications in 7711 protein groups. In this study, 90% of the proteins we identified (1116 of 1245) matched the data in their bottom-up DIA study (Supplementary Table 5).

Analyzing the protease cut sites of the peptides, we identified neutrophil elastase (P08246) as a possible protease in the nasopharynx responsible for the endogenous peptides. There is a high number of valines in the preterminal amino acid position, which is a known specificity for this enzyme (Figure 5A). The peptide coverage of the protease was high, 22.8%, and we found spectral counts for this protein in seven of eight samples. The sample in which neutrophil elastase was not detected was n5, which is the sample with the lowest number of spectral counts. For this protein, there are 35 combined spectral counts (razor), 34 combined unique spectral counts, and 35 combined total spectral counts. The sequence motifs of the positive and negative samples do not appear to be significantly different, with the top amino acids changing only slightly (Figure 5B,C).
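The valine enrichment at the preterminal position can be illustrated with a simple residue tally. The authors generated sequence logos with ggseqlogo over all identified peptides; the sketch below is a much smaller, hypothetical illustration that counts only the pre-amino acid column of the 20 peptides in Table 3, unweighted by spectral count, so its frequencies are not the ones underlying Figure 5.

```python
from collections import Counter
from typing import Dict, Iterable


def residue_frequencies(residues: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each residue, most common first.

    Empty entries and '-' (protein terminus) are ignored.
    """
    counts = Counter(r for r in residues if r and r != "-")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.most_common()}


# Pre-amino acid column of Table 3, ranks 1 to 20
preterminal = list("VVMVKARVRLRVVAKNKRVR")

frequencies = residue_frequencies(preterminal)
for aa, freq in frequencies.items():
    print(f"{aa}\t{freq:.2f}")
```

Even in this small subset, valine is the most frequent residue immediately before the peptide N-terminus (7 of 20), followed by arginine (5 of 20). Applying the same tally separately to the N-terminal, C-terminal, and post-terminal residues would give the per-position frequencies that a sequence logo visualizes.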

Figure 5

Sequence motif for the preterminal (1), N-terminal (2), C-terminal (4), and post-terminal amino acids (5). (A) Sequence motif for all samples. (B) Sequence motif for positive samples. (C) Sequence motif for negative samples.

Selecting the genes from the top 67 protein groups in our top protein cumulative frequency analysis (68 genes including the indistinguishable mapped proteins), we looked for enriched pathways using Reactome (Figure 6). Of the 68 genes, four were not found. The top five pathways found are involved in DNA methylation, packaging of telomere ends, methylation of histones and DNA by Polycomb Repressive Complex 2 (PRC2), deacetylation of histones by histone deacetylases (HDACs), and nucleosome assembly (complete list available in Supplementary Table 6).

Figure 6

Reacfoam output from Reactome, a holistic view of all the human pathways. The scale to the top right indicates the p-value obtained from over-representation analysis for molecules selected for each pathway result. The top of the scale in yellow is near zero, and the bottom with grayish yellow is the 0.05 threshold.
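The p-values in an over-representation analysis are commonly computed with a hypergeometric test, which is how the Reactome user guide describes its analysis, to our understanding. The sketch below shows only that basic calculation; it does not reproduce Reactome's handling of interactors or its multiple-testing correction. The function name is hypothetical, and apart from the 64 submitted genes that were found (68 minus 4), every number in the example is made up for illustration.

```python
from math import comb


def overrepresentation_p(
    population: int, pathway_size: int, submitted: int, hits: int
) -> float:
    """Probability of seeing at least `hits` pathway members among `submitted` genes
    drawn without replacement from `population` genes, of which `pathway_size`
    belong to the pathway (upper tail of the hypergeometric distribution).
    """
    upper = min(pathway_size, submitted)
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, submitted - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, submitted)


# Hypothetical example: 64 submitted genes found in a background of 10000 genes,
# 10 of them falling in a pathway that has 100 annotated genes.
p_value = overrepresentation_p(population=10000, pathway_size=100, submitted=64, hits=10)
print(f"{p_value:.3e}")
```

With a histone-dominated gene list such as the one submitted here, many submitted genes fall into the same few chromatin-related pathways, which is the situation in which this tail probability becomes very small.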

Conclusions

Using our peptidomic workflow, we identified 14270 endogenous peptides across 1245 protein groups from nasal swab transport media. The proteins mapped to these peptides are primarily polymeric immunoglobulin receptor, actin, statherin, glyceraldehyde-3-phosphate dehydrogenase, thymosin β-4, and histones. Our method identified protease cut sites but was not sensitive enough to detect SARS-CoV-2 viral peptides. Due to the large biological diversity typically seen in studies like this, a larger number of samples will be needed to validate these results. We believe that the result from our methodology is promising and that some of the peptides seen in this limited sample set should be representative of the m/z signals seen in our previous MALDI-TOF assay.

16 in total

1. Applications of MALDI-TOF mass spectrometry in clinical diagnostic microbiology. (Review)

Authors: Antony Croxatto; Guy Prod'hom; Gilbert Greub
Journal: FEMS Microbiol Rev Date: 2011-08-22 Impact factor: 16.408

2. Detection of SARS-CoV-2 in nasal swabs using MALDI-MS.

Authors: Fabiane M Nachtigall; Alfredo Pereira; Oleksandra S Trofymchuk; Leonardo S Santos
Journal: Nat Biotechnol Date: 2020-07-30 Impact factor: 54.908

3. ggseqlogo: a versatile R package for drawing sequence logos.

Authors: Omar Wagih
Journal: Bioinformatics Date: 2017-11-15 Impact factor: 6.937

4. Comparison of tear protein levels in breast cancer patients and healthy controls using a de novo proteomic approach.

Authors: Daniel Böhm; Ksenia Keller; Julia Pieter; Nils Boehm; Dominik Wolters; Wulf Siggelkow; Antje Lebrecht; Marcus Schmidt; Heinz Kölbl; Norbert Pfeiffer; Franz-Hermann Grus
Journal: Oncol Rep Date: 2012-06-01 Impact factor: 3.906

5. MALDI-TOF-MS analysis in discovery and identification of serum proteomic patterns of ovarian cancer.

Authors: Agata Swiatly; Agnieszka Horala; Joanna Hajduk; Jan Matysiak; Ewa Nowak-Markwitz; Zenon J Kokot
Journal: BMC Cancer Date: 2017-07-06 Impact factor: 4.430

6. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics.

Authors: Andy T Kong; Felipe V Leprevost; Dmitry M Avtonomov; Dattatreya Mellacheruvu; Alexey I Nesvizhskii
Journal: Nat Methods Date: 2017-04-10 Impact factor: 28.547

7. A rapid and reliable liquid chromatography/mass spectrometry method for SARS-CoV-2 analysis from gargle solutions and saliva.

Authors: Marc Kipping; Dirk Tänzler; Andrea Sinz
Journal: Anal Bioanal Chem Date: 2021-08-24 Impact factor: 4.142

8. Prognostic accuracy of MALDI-TOF mass spectrometric analysis of plasma in COVID-19.

Authors: Lucas Cardoso Lazari; Fabio De Rose Ghilardi; Livia Rosa-Fernandes; Diego M Assis; José Carlos Nicolau; Veronica Feijoli Santiago; Talia Falcão Dalçóquio; Claudia B Angeli; Adriadne Justi Bertolin; Claudio Rf Marinho; Carsten Wrenger; Edison Luiz Durigon; Rinaldo Focaccia Siciliano; Giuseppe Palmisano
Journal: Life Sci Alliance Date: 2021-06-24

9. DIA-Based Proteome Profiling of Nasopharyngeal Swabs from COVID-19 Patients.

Authors: Dong-Gi Mun; Patrick M Vanderboom; Anil K Madugundu; Kishore Garapati; Sandip Chavan; Jane A Peterson; Mayank Saraswat; Akhilesh Pandey
Journal: J Proteome Res Date: 2021-07-22 Impact factor: 4.466