
Fast Screening and Primary Diagnosis of COVID-19 by ATR-FT-IR.

Liyang Zhang¹, Meng Xiao², Yao Wang², Siqi Peng¹, Yu Chen², Dong Zhang², Dongheyu Zhang¹, Yuntao Guo¹, Xinxin Wang¹, Haiyun Luo¹, Qun Zhou³, Yingchun Xu².

Abstract

The outbreak of coronavirus disease 2019 (COVID-19) has led to substantial infections and mortality around the world. Fast screening and diagnosis are thus crucial for quick isolation and clinical intervention. In this work, we showed that attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FT-IR) can be a primary diagnostic tool for COVID-19 as a supplement to in-use techniques. It requires only a small volume (∼3 μL) of serum and a short detection time (several minutes). The distinct spectral differences and the separability between normal controls and COVID-19 were investigated using multivariate and statistical analysis. Results showed that ATR-FT-IR coupled with partial least squares discriminant analysis was effective in differentiating COVID-19 from normal controls and from some common respiratory viral infections or inflammation, with an area under the receiver operating characteristic curve (AUROC) of 0.9561 (95% CI: 0.9071–0.9774). Several serum constituents including, but not limited to, antibodies and serum phospholipids were reflected in the infrared spectra, serving as "chemical fingerprints" and accounting for the good model performance.


Year: 2021 PMID: 33427452 PMCID: PMC7805601 DOI: 10.1021/acs.analchem.0c04049

Source DB: PubMed Journal: Anal Chem ISSN: 0003-2700 Impact factor: 6.986

Introduction

Coronavirus disease 2019 (COVID-19) is a pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), a newly emerged coronavirus that has spread over the world and led to substantial infections and mortality.[1] Reverse transcription polymerase chain reaction (RT-PCR) is a conventional and standard assay for viral diagnosis and has been widely used for SARS-CoV-2 RNA detection. SARS-CoV-2 RNA can be detected in both upper and lower respiratory specimens including nasal swab, oropharyngeal swab, sputum, and bronchoalveolar lavage fluid (BALF).[2,3] Apart from BALF, which is not required for COVID-19 diagnosis because it is harder to sample, sputum was reported to have the highest positive rate (74.4–88.9%), followed by nasal swabs (53.6–73.3%), during the first 14 days after onset (d.a.o.).[3] The positive rate for throat swabs was reported to be around 60%.[3,4] Viral RNA has also been detected in serum samples, at reported rates of 0% (0/31),[5] 8% (1/12),[6] or 15% (6/41).[7] Notably, several factors may influence the performance of RT-PCR, such as improper sample preparation or varied qualities of detection kits, and thus lead to high false-negative rates. In addition, viral replication is inhibited in the late stage of infection, accounting for the high false-negative rates in this stage. Also, the whole test procedure is time-consuming.
Serological assays based on immunoglobulin-G (IgG) and IgM levels can serve as a complement to nucleic acid detection.[8,9] The median time of IgM and IgG seroconversion was reported to be 5 (n = 41) and 14 (n = 208) days after onset, respectively.[10] The combination of IgM and IgG tests yielded a higher detection sensitivity of 88.66% and specificity of 90.63% (397 PCR-confirmed patients and 128 negative patients in total) than a single IgG or IgM test.[11] Additionally, a higher positive detection rate of 99.4% (n = 173, 95% CI 96.8–100%) was achieved when applying both antibody and nucleic acid tests, compared to 67.1% (95% CI 59.4–74.1%) for a single RNA test.[12] Nevertheless, some issues remain unclear, such as the antibody responses of COVID-19 patients, potential false positives caused by immunological cross-reactivity, and the varied performance of commercially available detection kits. Rapid and reliable diagnosis of COVID-19 is of great significance for screening COVID-19 patients and delivering more appropriate treatment. In the last decade, transmission or attenuated total reflection (ATR) Fourier transform infrared spectroscopy (FT-IR) and Raman spectroscopy have been utilized to identify viral infections or predict viral load in blood,[13] sera,[13−15] plasma,[16] or infected cells,[17,18] to differentiate between viral infections,[19] and to verify the infectious agent type (bacterial or viral) based on white blood cell data.[20] Subtle molecular and chemical changes in blood components in response to bacterial or viral infections can be recorded in the infrared spectra. For example, the strong band at 1631 cm–1 attributed to the β-pleated sheet protein marker of Ig is unique to the positive serum spectra induced by hepatitis B and C virus.[14] In comparison with other assays, infrared spectroscopy allows almost all biological components to be inspected at once, which may be beneficial to COVID-19 diagnosis. Additionally, it is easier to perform and takes less operation time (typically several minutes). In this work, we showed the feasibility of ATR–FT-IR in COVID-19 screening and primary diagnosis. The spectral differences between COVID-19 and healthy controls and the potential spectral markers were identified by multivariate and statistical analysis. To test the performance of the proposed model, especially its specificity, healthy controls and patients with some common respiratory viral infections or inflammation were included.

Materials and Methods

Participants

We collected a total of 115 blood samples from 20 healthy donors and 76 patients, of whom 41 were confirmed with COVID-19, 15 had respiratory viral infections caused by influenza A/B or respiratory syncytial virus (RSV), and 20 had inflammation-related diseases (Table 1). Influenza A/B and RSV were chosen because they are common in respiratory infections and share similar flu-like symptoms[21,22] with COVID-19. Based on a report regarding 1099 patients, the most common symptoms of COVID-19 are fever (43.8% on admission and 88.7% during hospitalization) and cough (67.8%).[23] Inflammation-related diseases can also contribute to alterations in serum components (see Discussion). Here, patients with respiratory bacterial infections, pulmonary infection, intra-abdominal infection, bacteremia, and some other diseases were enrolled in the study (Table S1). We aimed to investigate whether and how the serum infrared spectra are specific for COVID-19.

Table 1

Number of Participants, Specimens, and the Measured Spectra in This Study

	participants (n = 96)	specimens (n = 115)	spectra (n = 289)
COVID-19 (cohort 1)	35	35	59
COVID-19 (cohort 2)	6	6	18
Influenza A	4	8	23
Influenza B	2	2	5
RSV	9	24	73
inflammation^a	20	20	60
control	20	20	51

^a Details are presented in Table S1.

The COVID-19 patients were from two cohorts, all of whom were diagnosed by reverse transcriptase PCR (RT-PCR) following the China national guidelines for diagnosis and treatment of Corona Virus Disease 2019 (COVID-19) (trial version 5, revised).[24] The first cohort included 35 critically ill COVID-19 patients who were admitted from Feb 15 to March 30 at the Sino-French New City Branch of Tongji Hospital in Wuhan. Of the 35 patients, one was sampled <7 days after symptom onset and one within 7–14 days, while 33 were sampled >14 days after onset. In the second cohort, collected at the Peking Union Medical College Hospital, five confirmed cases were sampled >14 days after onset and one at 2 days. All of them had mild symptoms. Clinical residual blood samples (stored at −80 °C) from influenza A, influenza B, or respiratory syncytial virus (RSV)-infected patients admitted from Feb 16 to March 1, 2020, were retrieved. The viral nucleic acids were determined by X’Pert Xpress Flu/RSV (Cepheid AB, Sweden). Of note, for each patient with influenza A or RSV infection, two to three blood specimens from different infection stages before recovery were collected to enrich and generalize the data set. Patients with other diseases were diagnosed using standard clinical methods and were confirmed to be free of SARS-CoV-2 infection (Table S1). This study was approved by the Ethics Committees of Tsinghua University and Peking Union Medical College Hospital.

Sample Preparation and ATR–FTIR Spectroscopy

The serum samples were obtained by blood centrifugation at 5000 rpm for 5 min. Prior to measurements, the serum specimens were incubated at 56 °C for 30 min to inactivate potential pathogens. Each specimen was measured one to three times on a PerkinElmer infrared spectrometer coupled with a diamond ATR accessory at a resolution of 4 cm–1. Sixteen scans were accumulated per spectrum. For each measurement, an aliquot of 3 or 4 μL of serum was transferred onto the ATR crystal and allowed to dry under mild airflow at room temperature. Water absorption was monitored by OH stretching at around 3300 cm–1 and bending at around 1635 cm–1. It took about 3–5 min for samples to dry sufficiently. Afterward, the spectra between 4000 and 600 cm–1 were collected. The spectral background was recorded separately for each sample to achieve a higher signal-to-noise ratio. Prior to data analysis, the raw spectral data were preprocessed with baseline correction using the built-in rubber-band algorithm in the PerkinElmer Spectrum software (version 10.5.4). The constant baseline offset at 4000 cm–1 was subtracted from each sample spectrum, and then all the spectra were normalized to the amide I peak absorbance. The second-derivative infrared (SD-IR) spectra were calculated from the normalized spectra.
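The preprocessing chain described above (offset subtraction, amide I normalization, second derivative) can be illustrated with a minimal Python sketch. This is not the authors' MATLAB or PerkinElmer code. The function names, the 1600–1700 cm–1 search window for the amide I peak, and the quadratic/cubic polynomial order are assumptions made for illustration; only the 13-point window is stated in the text (under Multivariate and Statistical Analysis). The rubber-band baseline step is not reproduced.

```python
def subtract_offset(wavenumbers, absorbance, ref=4000.0):
    """Subtract the constant offset read at the point closest to `ref` cm^-1."""
    i_ref = min(range(len(wavenumbers)), key=lambda i: abs(wavenumbers[i] - ref))
    offset = absorbance[i_ref]
    return [a - offset for a in absorbance]


def normalize_to_band(wavenumbers, absorbance, lo=1600.0, hi=1700.0):
    """Scale a spectrum so the peak absorbance inside [lo, hi] equals 1.

    The window used to locate amide I is an assumption, not taken from the paper.
    """
    peak = max(a for w, a in zip(wavenumbers, absorbance) if lo <= w <= hi)
    return [a / peak for a in absorbance]


def savgol_second_derivative(y, half_window=6, step=1.0):
    """Savitzky-Golay second derivative for a quadratic/cubic local fit.

    half_window=6 gives a 13-point window. Points closer than half_window
    to either end are returned as None (no edge handling in this sketch).
    """
    m = half_window
    denom = m * (m + 1) * (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    coeffs = [30.0 * (3 * i * i - m * (m + 1)) / denom for i in range(-m, m + 1)]
    out = [None] * len(y)
    for k in range(m, len(y) - m):
        window = y[k - m:k + m + 1]
        out[k] = sum(c * v for c, v in zip(coeffs, window)) / (step * step)
    return out


# Toy four-point "spectrum" (made-up values) to exercise the first two steps.
wn = [4000.0, 3000.0, 1650.0, 1000.0]
ab = [0.10, 0.30, 0.90, 0.20]
normalized = normalize_to_band(wn, subtract_offset(wn, ab))

# Sanity check for the derivative filter: y = x^2 has second derivative 2.
parabola = [float(i * i) for i in range(20)]
d2 = savgol_second_derivative(parabola)
```

For real spectra, `step` would be the wavenumber spacing between adjacent data points, which the text does not state.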

Multivariate and Statistical Analysis

All data analyses were performed in MATLAB (version R2020b, MathWorks, Natick, U.S.A). The second-derivative infrared (SD-IR) spectra were calculated by the 13-point Savitzky–Golay algorithm using in-house protocols. Analysis of variance (ANOVA) is a well-established method to evaluate the difference of the observed statistic(s) among groups by calculating the F-statistic (i.e., the ratio of between-group variance to within-group variance).[25] A larger F-statistic yields a lower p value, indicating that the groups are more likely to differ in this statistic. In this work, the p value was used to evaluate the statistical differences in the absorbances or locations of major bands between healthy controls and COVID-19 patients, so that potential spectral markers could be preliminarily identified.
Hierarchical cluster analysis (HCA) and principal component analysis (PCA) are widely used unsupervised approaches in microorganism differentiation.[26−28] In HCA, the spectral distance (usually Euclidean distance) is calculated to construct the hierarchical structure, in which the spectra with the shortest distances are merged first, near the leaf nodes, while the most distant clusters are joined last, near the root node. PCA aims to reduce the dimensions of the original data matrix while retaining maximum variance.[26,29] In the new vector space formed by the principal components (PCs), the data distribution is more apparent. Here, the natural separability of normal controls and nonsevere and severe COVID-19 patients was tested by the two methods.
Partial least squares discriminant analysis (PLS-DA) is a supervised chemometric technique whereby the so-called latent variables are successively extracted to find the maximum correlation between the X-matrix and Y-matrix.[14,30] In the present study, the data set was divided into three groups: normal controls, COVID-19, and other diseases. PLS1 and PLS2 models[31] were applied for two-group and three-group classification, respectively. Specifically, in the case of PLS1, the two classes were coded as [0] and [1], whereas the three classes were coded as [1 0 0], [0 1 0], and [0 0 1] in the PLS2 model. In comparison with PCA, PLS-DA is more suitable for identifying the significant wavenumbers for discrimination by inspecting the regression vector[14] or the variable importance in projection (VIP).[32]
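As an illustration of the F-statistic used in the ANOVA comparisons, the following Python sketch computes the ratio of between-group to within-group variance for a list of groups. The function name and the toy numbers are hypothetical, not data from this study. Converting F to a p value requires the F-distribution, which is left out of this sketch.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy example with made-up values for two groups.
toy_control = [1.0, 2.0, 3.0]
toy_patient = [4.0, 5.0, 6.0]
f_stat, df_b, df_w = one_way_anova_f([toy_control, toy_patient])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

In the setting of this work, each group would hold one band location or one relative absorbance per spectrum.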

Results

COVID-19 Vs Controls: Band Assignments, Spectral Differences, and Spectral Interpretation

Serum is composed of proteins, cholesterol, glucose, urea, triglycerides, and other more dilute compounds. All of these contribute to the spectra, but only the more abundant components can be identified and provide insightful information. Reports show that a wide range of abundant biomolecules in plasma can be quantified using FTIR.[33] To better illustrate the spectral features, we summarized the normal levels of the major serum components (Table S2). Averaged spectra minimize the influence of individual differences and thus are more representative. In the original spectra, all bands had significant frequency shifts in the COVID-19 group (p < 0.001, Figure 1a and Table 2), indicating notable changes in protein and lipid conformation. Bands related to proteins include amide A (N–H stretching vibration), amide I (C=O stretching), amide II (C–N stretching and N–H bending), and amide III. As seen from Table 2, the relative absorbance values of both amide A and amide II showed significant differences between COVID-19 and the control group (p < 0.001), denoting alterations in protein concentrations.

Figure 1

Table 2

Normalized Spectra: Band Assignments and Statistical Comparisons between COVID-19 and Control Spectra in Band Locations and Relative Absorbance^a,d,e,f

	band locations/(cm^–1)			relative absorbance (a.u.)
assignments[19,27]	control	COVID-19	p	control	COVID-19	p	changes (%)^b
amide A	3285.65 (1.04)	3282.75 (1.19)	***	0.430 (0.028)	0.500 (0.049)	***	+16.3
ν_as(CH₃)	2958.31 (0.55)	2958.99 (0.83)	***	0.215 (0.007)	0.217 (0.008)
ν_as(CH₂)	2930.76 (1.05)	2929.71 (1.55)	***	0.220 (0.010)	0.225 (0.015)	*	+2.3
ν_s(CH₃)	2872.61 (0.49)	2873.14 (0.76)	***	0.150 (0.006)	0.151 (0.008)
amide I	1642.27 (2.28)	1636.84 (2.24)	***	1 (0)^c	1 (0)^c
amide II	1537.18 (1.05)	1538.19 (0.95)	***	0.878 (0.03)	0.857 (0.025)	***	–2.4
δ(CH₂)	1453.33 (0.62)	1452.9 (0.7)	***	0.342 (0.021)	0.339 (0.014)
ν_s(COO^–)	1397.53 (1.05)	1398.22 (0.68)	***	0.386 (0.024)	0.380 (0.016)
amide III	1308.06 (1.7)	1311.19 (1.14)	***	0.280 (0.021)	0.279 (0.022)
ν_as(PO₂^–)	1242.29 (0.81)	1241.62 (1.01)	***	0.264 (0.023)	0.271 (0.016)
ν_s(C–O–C)	1170.12 (0.33)	1167.6 (2.85)	***	0.154 (0.016)	0.151 (0.010)
ν_s(PO₂^–)	1079.63 (0.77)	1077.36 (1.02)	***	0.162 (0.017)	0.193 (0.025)	***	+19.1

^a ν_s: symmetric stretching vibrations; ν_as: asymmetric stretching vibrations; δ: bending vibrations. Data are in mean (standard deviation) or %.

^b Increases in average band absorbance compared to controls.

^c All the spectra were normalized to amide I.

^d Statistical differences were compared between COVID-19 and the control group using one-way ANOVA.

^e *p < 0.05. **p < 0.01. ***p < 0.001.

^f See more in Figure S2.
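The "changes (%)" column of Table 2 can be reproduced from the mean relative absorbances with a short Python sketch. The values are copied from the table; the dictionary layout and function name are illustrative only.

```python
# Mean relative absorbance (control, COVID-19) for the bands with a reported change.
table2_means = {
    "amide A": (0.430, 0.500),
    "nu_as(CH2)": (0.220, 0.225),
    "amide II": (0.878, 0.857),
    "nu_s(PO2-)": (0.162, 0.193),
}


def percent_change(control, covid):
    """Relative change of the COVID-19 mean with respect to the control mean, in %."""
    return (covid - control) / control * 100.0


changes = {band: round(percent_change(c, p), 1) for band, (c, p) in table2_means.items()}
for band, value in changes.items():
    print(f"{band}: {value:+.1f}%")
```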

Spectral profiles of COVID-19 and control serum samples and some abundant constituents in human serum. (a) Averaged original spectra of COVID-19 and control samples. Spectra were normalized to amide I. The principal absorbance bands were annotated. (b) Averaged SD-IR spectra of COVID-19 and control samples and the most abundant constituents in human serum. Bands with notable variations were annotated (see also Table S3). Lysolecithin, sphingomyelin, lecithin, and human IgG (Bioss, Beijing, China, purified by Protein-A affinity chromatography) were in dried powder form, whereas human serum albumin was buffered with water. The band located at 981 cm–1 in the human IgG spectrum may arise from impurities.

We found that the amide I band of the COVID-19 group had a significant red shift of roughly 5 cm–1 compared to the control (1642.27 vs 1636.84 cm–1, p < 0.001, Figure 1a and Table 2), suggesting transitions in protein secondary structures. This is further illustrated by the SD-IR spectra (Figures 1b, S3, and Table S3). The band centered at 1651 cm–1 from the α-helix structure was significantly lower (p < 0.001), while the band centered at 1632 cm–1 attributed to the β-pleated sheet structure[26] was significantly higher (p < 0.001) in the COVID-19 group, which is responsible for the notable frequency shift of amide I in the original spectra.
Human serum albumin (HSA) (about 70% of serum by weight), immunoglobulin (IgG: 14%), transferrin (5.7%), and α-antitrypsin (0.7%) are among the most abundant serum proteins (Table S2).[35] HSA is predominantly α-helical and Ig predominantly β-sheet (see also Figure 1b).[36−38] Hence, bands centered at 1651 and 1632 cm–1 principally provide hints toward HSA and Ig in human sera, respectively (see more in the Discussion section). The peak at 1397 cm–1 is associated with COO– symmetric stretching mainly from aspartic acids and glutamic acids in protein side chains.[34] No significant change was observed in the absorbance of this band in the COVID-19 group, despite the slight band shift. Bands centered at 1242 and 1078 cm–1 correspond to asymmetric and symmetric stretches from PO2– groups, respectively.[34] Compared to controls, there was no significant difference in the relative absorbance of the former band in COVID-19 sample spectra (p > 0.05), whereas a significant increase (by 19.1%) was found in the latter band (p < 0.001, Figure 1a and Table 2). The PO2– functional group may arise from nucleic acids or phospholipids in serum. However, nucleic acids such as cell-free circulating DNA (cirDNA)[21] or SARS-CoV-2 RNA[5] exhibit limited abundance in human serum. Hence, increases in the band centered at 1078 cm–1 can be attributed to phospholipids. It is known that lecithin, sphingomyelin, and lysolecithin are the three major phospholipids in normal human serum, accounting for 95% on a weight percent basis (see also Table S1).[39,40] In mild, severe, and fatal COVID-19 patients, sphingolipids and lysolecithin (12:0/0:0) were observed with a 3.5–5.5-fold increase (log2 fold change: 1.8–2.5), whereas lecithin decreased by 50%–80% (−2.5 < log2 fold change < −1.0).[41] Considering sphingolipids to be the most abundant phospholipids in serum, the stronger absorption in the band centered at 1078 cm–1 may reflect the higher total phospholipid content in COVID-19 patients.
This finding was further validated by the SD-IR spectra shown in Figure 1b, where the νs(PO2–) absorption band was found at 1086 and 1083 cm–1 in the SD-IR spectra of sphingomyelin and lysolecithin, respectively. This slight band shift from 1079 cm–1 may result from the sample state, as the purchased samples were in dried powder form. The bands centered at 1170 cm–1 in the control group and 1166 cm–1 in COVID-19 sample spectra originate from ester C–O–C asymmetric stretching of phospholipids, triglycerides, and cholesterol esters.[42] No significant absorbance alterations were observed. Nevertheless, we found that this band, together with the bands centered at 2925 and 2853 cm–1 in the SD-IR spectra, had a significant blue shift in the absorption frequency (Figure 1b and Table S3), suggesting potential conformational changes in lipid components.

Differentiate COVID-19 Patients from Normal Controls

PCA was first performed on the SD-IR spectra to reduce the dimension of the infrared spectral space. The first five and ten components were required to account for 90% of the total variance in the data ranging from 1600 to 1700 cm–1 and from 1000 to 1700 cm–1, respectively. Nevertheless, the score plot with respect to the first two PCs provides meaningful information (Figure 2). Obvious transitions from normal controls to nonsevere COVID-19 patients and then to the severe ones were observed (Figure 2a). Overlapping was mitigated when applying the spectral range of 1000–1700 cm–1 (Figure 2b). This implies that other serum components in addition to proteins may play a role in differentiating between healthy controls and COVID-19 patients and in evaluating illness severity. Two patients (NS348 and NS457) in the nonsevere group and one patient (S55) in the severe group were found to be closer to normal controls. It is noteworthy that patient NS348 was in the early stage of illness, 2 days after onset, while NS457 was an asymptomatic patient sampled at least 14 d.a.o. S55 was a critically ill patient but was sampled <7 days after onset. Note that two components explain about 60% and 55% of the total variance when applying a spectral window of 1600–1700 or 1000–1700 cm–1, respectively. For comparison with the 2-PC results, the score plot regarding three PCs is shown in Figure S4.
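The notion of "variance explained" by a principal component can be sketched in a few lines of Python. The example below extracts only the first PC by power iteration on the covariance matrix. It is a didactic sketch on made-up two-variable data, not the MATLAB routine used in this work, and the function name is hypothetical.

```python
def first_principal_component(X, iterations=500):
    """Return (loading vector, scores, explained-variance ratio) for PC1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Sample covariance matrix of the mean-centered data.
    cov = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the start vector is not orthogonal to it.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(Xc[i][j] * v[j] for j in range(p)) for i in range(n)]
    return v, scores, eigenvalue / total_variance


# Made-up, nearly collinear data: PC1 should capture almost all the variance.
toy = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
loading, pc1_scores, explained = first_principal_component(toy)
print(f"PC1 explains {explained:.1%} of the variance")
```

For spectra, each row would be one SD-IR spectrum and each column one wavenumber, so the loading vector shows which wavenumbers drive the separation seen in the score plot.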

Figure 2

Discrimination among normal controls and nonsevere and severe COVID-19 patients with unsupervised methods. (a,b) PCA score plot using spectral ranges of 1600–1700 cm–1 and 1000–1700 cm–1, respectively. Spectra from three patients are marked with arrows. NS = nonsevere. S = severe. (c) Hierarchical cluster analysis (1000–1700 cm–1). The windows corresponding to the three groups are filled with different colors. 1 = normal controls; 2 = nonsevere COVID-19 patients; 3 = severe COVID-19 patients.

We further applied HCA to classify the groups using the shortest distance method and Euclidean distance based on the spectral region of 1000–1700 cm–1. Most of the spectra from severe patients were clustered together, except six spectra from patients S57, S14, S24, and S55 (Figure 2c). Among the spectra from nonsevere patients, six from patients NS348 and NS457 were mixed with normal controls, whereas the remainder were well separated from the others (Figure 2c). Results of the two unsupervised methods show the potential feasibility of FT-IR to discern COVID-19 patients of different severities, yet supervised learning methods with the labeled data set may provide more insightful information.
PLS-DA is commonly used in chemometrics for its ability to handle multivariate data and its high model interpretability.[31] We first performed a PLS-DA analysis using SD-IR spectra in the region between 1000 and 1700 cm–1. The first latent variable (LV, i.e., score) was extracted under leave-one-out cross validation with an R2Y value of 0.793 and a Q2 value of 0.790. Generally, Q2 > 0.5 is accepted as indicating good model predictability.[43] The regression vector of variables (loadings for LV1) and the corresponding variable importance in the projection (VIP) values are presented in Figure 3a. VIP is a parameter to evaluate the influence of individual X-variables on the model, and a variable is considered important if its VIP value is larger than one.[32,44] Clearly, the most discriminatory bands are located at 1655, 1625, 1557, 1506, 1074, and 1035 cm–1. This finding is consistent with the SD-IR spectra shown in Figure 1b.
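A two-class PLS-DA model with a single latent variable, as described above, can be sketched with the NIPALS steps for PLS1. The Python code below is a minimal illustration on made-up two-variable data with classes coded 0 and 1. It is not the authors' MATLAB implementation, the function names are hypothetical, and cross-validation and VIP are omitted.

```python
def pls1_single_lv(X, y):
    """Fit a one-latent-variable PLS1 model on mean-centered data."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the X-direction with maximum covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores of the latent variable and the inner regression coefficient.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    # With a single LV the regression vector reduces to w * q.
    b = [wj * q for wj in w]
    return x_mean, y_mean, b


def pls1_predict(model, X):
    """Continuous model output; a decision threshold turns it into a class."""
    x_mean, y_mean, b = model
    return [y_mean + sum((row[j] - x_mean[j]) * b[j] for j in range(len(b)))
            for row in X]


# Made-up data: class 0 is high in variable 2, class 1 is high in variable 1.
X_toy = [[0.0, 1.0], [0.2, 0.9], [1.0, 0.1], [0.9, 0.0]]
y_toy = [0, 0, 1, 1]
model = pls1_single_lv(X_toy, y_toy)
outputs = pls1_predict(model, X_toy)
regression_vector = model[2]
```

The sign and magnitude of each entry of `regression_vector` indicate how strongly the corresponding variable pushes the output toward class 1, which is how a regression vector over wavenumbers is read.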

Figure 3

Results of the PLS-DA model for classification between COVID-19 and control groups. (a) Regression vector with respect to the spectral region of 900–1700 cm–1. VIP values in different ranges are shown in different colors. VIP = variable importance in projection. (b) ROC plot of COVID-19 samples (outer) and the model output (inner). Two threshold values corresponding to two points in the ROC curve were selected. AUC = area under curve. CI = confidence interval.

In the PLS-DA model, normal controls and COVID-19 samples were labeled “0” and “1”, respectively. The receiver operating characteristic (ROC) curve was generated by varying the decision threshold. The area under the curve (AUC) value of the model was 0.9947 (95% CI 0.9769–0.9985). From the model output shown in Figure 3b, we observed a distinct separation between the two classes, except for six sample spectra from patients NS348 and NS457 that were mixed with controls. That is, when screening COVID-19 patients, especially the ones with moderate symptoms, such a simple model based on a single latent variable may not be effective enough. Nevertheless, SD-IR spectra coupled with PLS-DA analysis achieve high sensitivity and specificity when proper decision thresholds are chosen, at least for the training data in this work.
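How a ROC curve and its AUC follow from sweeping the decision threshold over a continuous model output can be sketched as follows. The scores and labels are made up for illustration, the function names are hypothetical, and the confidence interval reported in the text is not reproduced because the method used to compute it is not stated.

```python
def roc_points(scores, labels):
    """Sweep the threshold over every observed score (label 1 = positive).

    Returns (false positive rate, true positive rate) pairs from (0, 0) to (1, 1).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under consecutive (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up model outputs: one positive sample scores below one negative sample.
toy_scores = [0.10, 0.20, 0.35, 0.40, 0.80, 0.90]
toy_labels = [0, 0, 1, 0, 1, 1]
curve = roc_points(toy_scores, toy_labels)
auc_value = area_under_curve(curve)
print(f"AUC = {auc_value:.4f}")
```

Each point on the curve corresponds to one threshold, so choosing a threshold amounts to choosing a sensitivity (TPR) and specificity (1 minus FPR) pair.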

Differentiate COVID-19 from Normal Controls, Respiratory Viral Infections, and Inflammation-Related Diseases

Patients with influenza A, influenza B, or RSV infections or common inflammatory diseases were included as interfering conditions to further assess the performance of the proposed method. We first applied a database searching algorithm based on Pearson’s correlation coefficient.[45] Despite the high sensitivity for the controls, the overall performance was limited (Table S4). PCA was not satisfactory either (not shown), with three PCs explaining about 50–55% of the data variance when using different spectral windows. These results highlight the risks of misclassification using unsupervised methods. Then, a triple-class PLS-DA model was established to differentiate the following three groups: normal controls, COVID-19, and other diseases. The spectral range between 900 and 1700 cm–1 achieved the best results. The three most frequently used cross-validation (CV) methods, namely, leave-one-out (LOO), 7-fold, and 10-fold, were used to determine the optimum number of latent variables by means of the prediction error rate.[30,46] When applying 7-fold or 10-fold CV, the data were randomly rearranged, and this was repeated 100 times to obtain statistical results. Clearly, the three methods achieved consistent prediction error rates, indicating good model predictability (Figure 4a). In view of the model complexity, five LVs were finally selected. Fewer variables are not adequate, whereas more variables contribute little to the prediction capability.
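The cross-validation scheme above (LOO, and 7-fold or 10-fold repeated 100 times with random rearrangement) can be sketched in Python. To keep the example short, a nearest-centroid classifier stands in for PLS-DA. That substitution, the function names, and the toy one-variable data are all assumptions for illustration and are not part of the study.

```python
import random


def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(X, y):
    """Stand-in classifier (not PLS-DA): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_label(centroids, row):
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))


def cv_error_rate(X, y, k, rng):
    """Fraction of held-out samples misclassified over one k-fold pass."""
    errors = 0
    for fold in kfold_indices(len(X), k, rng):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        errors += sum(1 for i in fold if predict_label(model, X[i]) != y[i])
    return errors / len(X)


def repeated_cv(X, y, k, repeats=100, seed=0):
    """Mean and standard deviation of the error rate over random re-splits."""
    rng = random.Random(seed)
    rates = [cv_error_rate(X, y, k, rng) for _ in range(repeats)]
    mean = sum(rates) / repeats
    sd = (sum((r - mean) ** 2 for r in rates) / (repeats - 1)) ** 0.5 if repeats > 1 else 0.0
    return mean, sd


# Made-up data: three well-separated classes with seven samples each.
X_toy = [[5.0 * c + 0.1 * j] for c in range(3) for j in range(7)]
y_toy = [c for c in range(3) for _ in range(7)]
cv_results = {
    "LOO": repeated_cv(X_toy, y_toy, k=len(X_toy), repeats=1),
    "7-fold": repeated_cv(X_toy, y_toy, k=7),
    "10-fold": repeated_cv(X_toy, y_toy, k=10),
}
```

In the setting of this work, the same loop would be run once per candidate number of latent variables, and the error rates compared across the three CV schemes.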

Figure 4

PLS-DA model performances for the triple-class classification. (a) Prediction error rates as a function of the number of latent variables. For 7-fold and 10-fold cross-validation, the error bars are presented. (b) ROC graphs for each group. The inner graph shows the model predicted output of the COVID-19 class. Decision thresholds 1 and 2 are 0.288 and 0.383, respectively. (c) VIP scores for each class. Significant peaks were labeled.

The PLS-DA model with five LVs using the whole dataset was then applied to produce the ROC curves (Figure 4b). Here, the model output is an n × 3 matrix, each column corresponding to one class. By varying the decision threshold, we obtained one ROC curve for each class. This helps illustrate how one class is differentiated from the others. As shown in Figure 4b, the control group achieves the highest AUC value of 0.9994 (95% CI: 0.9970–1.0000), which suggests that normal controls are very unlikely to be identified as patients by the model and vice versa. The AUC values for COVID-19 and other diseases are 0.9561 (95% CI: 0.9071–0.9774) and 0.9588 (95% CI: 0.9327–0.9752), respectively. The sensitivity and specificity of COVID-19 identification can be adjusted by modifying the decision threshold. For instance, when it was set to 0.288 (threshold 1, Figure 4b), a high sensitivity of 87% was achieved, whereas a high specificity of 98% was achieved when it was set to 0.383 (threshold 2, Figure 4b, sensitivity: 83.1%). Overall, threshold 1 is more acceptable. We found that nonsevere patients NS348 and NS457 were still mixed with others (Figure 4b, inner graph). Finally, VIP scores were calculated to inspect the most significant peaks for classification. As indicated in Figure 4c, bands in the spectral ranges of 1450–1650 and 1050–1100 cm–1 deserve attention. Of note, inflammatory markers are unlikely to be distinguished by infrared spectroscopy because of their relatively low levels (<1 mg/mL, or even lower than ng/mL, Table S2). We may conclude that the discriminatory bands arise from subtle changes in proteins, lipids, or other constituents in serum.

Discussion

The selection of biospecimens deserves discussion. In this study, we chose serum because of its relatively stable components, thus minimizing the individual differences caused by disease-irrelevant factors. However, other types of specimens are also worth trying, for example, white blood cells. COVID-19 patients, especially severe cases, are reported to have significantly lower total lymphocytes, CD4+ T cells, CD8+ T cells, B cells, and NK cells (P < 0.001),[47] as well as reduced percentages of monocytes, eosinophils, and basophils.[48,49] Virus-containing samples such as nasal and throat swabs may be feasible too. As for ATR–FT-IR measurements, several factors may influence the quality of the spectra. For instance, hemolysis caused by improper sampling might lead to spectral distortion and impede correct identification (data not shown). It is known that unfolding, conformational changes, and denaturation may happen in proteins when exposed to elevated temperatures.[50,51] For the samples collected, preheating at 56 °C for 30 min before measurement led to minor spectral differences in most cases, whereas it may lead to relatively large variations in very few cases. The influence of the thermal factor on serum components and the corresponding infrared spectra is worth further investigation. Nevertheless, the IgG secondary structure seems to be stable at temperatures between 20 and 55 °C.[51] On a practical note, protein coagulation induced by heat may make uniform sampling harder.
Immune responses to infection can contribute to alterations in serum constituents, and several biomarkers have been suggested to identify certain infections. For example, procalcitonin (PCT), circulating cytokines (interleukin [IL]-1β, IL-6, IL-18, etc.), and acute-phase proteins (C-reactive protein [CRP], ferritin, etc.) have been used to differentiate between bacterial and viral infection.[52,53] However, it is not always an easy task to find specific markers for a given viral infection. Over the past several months, proteomic and metabolomic profiles of peripheral blood samples have been investigated to correlate disease severity of COVID-19 with certain proteins and lipids.[54,55] Details can be found elsewhere. FT-IR also provides hints toward proteins, lipids, and other constituents in sera, but from a more comprehensive and macroscopic view. Given that human serum is dominated by proteins and that proteins are dominated by albumin and Ig (Table S2, see also Figure 1b), we can postulate that alterations in amide I are closely related to these two types of proteins. The increases in Ig levels and decreases in albumin levels in the sera of patients with COVID-19, revealed by the SD-IR spectra in Figure 1b, are supported by several reports. The IgG level in patient sera rises severalfold after the onset of COVID-19.[56] A report by Long et al. showed that 19.5% (8/41) of the patients with COVID-19 had a fourfold increase in IgG titers.[9] The decreased albumin level has also been described in several reports.[54,57] The reduction in the albumin level is related to the acute phase response (APR) of patients to viral infection. The hepatic synthesis of proteins is drastically regulated during the acute phase of illnesses such as infection, tissue injury, neoplastic growth, or immunological disorders, usually with increased C-reactive protein (CRP) and serum amyloid A and decreased transferrin and albumin.[58] The dysregulation of phospholipids in patients with COVID-19 has also been indicated by the SD-IR spectra shown in Figure 1b, as discussed earlier (Results, part one). A more detailed interpretation of the infrared spectra and the corresponding spectral differences still requires further knowledge.
The multivariate and statistical analysis made it possible to evaluate COVID-19 diagnosis by infrared spectroscopy. In summary, the second-derivative spectrum increases the separation of overlapping bands and is thus more informative than the original spectrum. We show that both unsupervised and supervised methods are promising for differentiating COVID-19 patients from healthy controls. Nonsevere and severe patients can be separated by unsupervised methods such as PCA and HCA. However, as indicated by this work, asymptomatic patients or newly diagnosed patients with low IgG levels may not be correctly identified. In this case, the decision threshold of PLS-DA should be chosen with caution: lower threshold values yield higher sensitivity but lower specificity. For the purpose of diagnosis, which is undoubtedly harder than merely discriminating between COVID-19 and controls, common respiratory viral infections or inflammation were considered as interferences. Unsupervised methods were not very effective here, whereas the PLS-DA model performed well. For a given threshold, the model achieved a sensitivity of 83.1% and a specificity of 98%. Bands in the spectral ranges of 1450–1650 and 1050–1100 cm⁻¹ contributed most to the discrimination capability of the model. Further illustration is required in future work.
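The trade-off between sensitivity and specificity as the decision threshold moves can be illustrated with a minimal sketch. The scores and labels below are hypothetical values chosen for illustration, not data from this study, and the helper name `sensitivity_specificity` is likewise an assumption; in practice the scores would be the predicted values returned by a PLS-DA model.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) when samples whose score is
    greater than or equal to `threshold` are classified as positive."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    # Guard against an empty class to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical predicted scores (NOT data from this study).
# Label 1 = COVID-19, label 0 = control.
scores = [0.05, 0.10, 0.20, 0.35, 0.45, 0.40, 0.55, 0.70, 0.85, 0.95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    sens, spec = sensitivity_specificity(scores, labels, threshold)
    print(f"threshold={threshold:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With these toy values, lowering the threshold from 0.5 to 0.3 raises the sensitivity from 0.80 to 1.00 while the specificity falls from 1.00 to 0.60, which mirrors the behavior described above. A threshold suited to screening would therefore likely be lower than one suited to confirmatory diagnosis.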

Conclusions

Taking the time cost, operational complexity, and detection performance into consideration, FT-IR coupled with multivariate analysis is a feasible tool for screening and primary diagnosis of COVID-19. Several serum constituents including, but not limited to, antibodies and serum lipids are reflected in the infrared spectra, serving as “chemical fingerprints” and accounting for the good model performance. It was also shown that this assay exhibited high identification specificity for COVID-19 in the presence of interference from common respiratory viruses and inflammation-related diseases. In terms of practical use, more clinical data are required to generalize the model, and special attention should be paid to asymptomatic patients. Nevertheless, infrared spectroscopy can serve as an auxiliary diagnostic tool that supplements in-use techniques. The feasibility of other common specimens such as oropharyngeal swabs, the identification and interpretation of the spectral markers, and the potential of FT-IR spectroscopy for antibody response monitoring and COVID-19 severity prediction require further investigation.
