Xin Gao1,2, Yuan Liu1,2, Shaohui Zou1,2, Pengqin Liu3,4, Jing Zhao1,2, Changshun Yang1,2, Mingxing Liang1,2, Jinlian Yang1,2.
Abstract
Coronavirus disease 2019 (COVID-19) is a global epidemic disease caused by a novel virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and has serious adverse effects on human health. In this study, we obtained a blood leukocyte sequencing data set of COVID-19 patients from the GEO database and identified differentially expressed genes (DEGs). We further analyzed these DEGs by protein-protein interaction analysis and Gene Ontology enrichment analysis and identified the DEGs closely related to SARS-CoV-2 infection. We then constructed a six-gene model (comprising IFIT3, OASL, USP18, XAF1, IFI27, and EPSTI1) by logistic regression analysis and calculated the area under the ROC curve (AUC) for the diagnosis of COVID-19. The AUC values of the training group, testing group, and entire group were 0.930, 0.914, and 0.921, respectively. The six genes were highly expressed in patients with COVID-19 and positively correlated with the expression of SARS-CoV-2 invasion-related genes (ACE2, TMPRSS2, CTSB, and CTSL). The risk score calculated by this model was also positively correlated with the expression of TMPRSS2, CTSB, and CTSL, indicating that the six genes were closely related to SARS-CoV-2 infection. In conclusion, we comprehensively analyzed the functions of DEGs in the blood leukocytes of patients with COVID-19 and constructed a six-gene model that may contribute to the development of new diagnostic and therapeutic ideas for COVID-19. Moreover, these six genes may be therapeutic targets for COVID-19.
Keywords: COVID-19; SARS-CoV-2; bioinformatics; diagnosis; leukocyte
Year: 2021 PMID: 34009691 PMCID: PMC8242610 DOI: 10.1002/jmv.27093
Source DB: PubMed Journal: J Med Virol ISSN: 0146-6615 Impact factor: 20.693
Overall information about the data sets used in this study
| Data sets | Platform | Sample | COVID‐19 | Non‐COVID‐19 |
|---|---|---|---|---|
| GSE157103 | GPL24676 | Leukocytes from whole blood | 100 | 26 |
| GSE156063 | GPL24676 | Clinical naso‐/pharyngeal swab specimens | 93 | 141 |
| GSE154104 | GPL24247 | Lung tissues from mice 2, 4, and 7 days postinfection | 20 | 0 |
Figure 1. Screening of differentially expressed genes (DEGs). (A) Volcano plot of DEGs. Green indicates downregulated DEGs, red indicates upregulated DEGs, and gray indicates genes without differential expression. (B) Heat map of DEGs
Results of the module analysis
| Module | Score | Nodes | Edges |
|---|---|---|---|
| 1 | 75.929 | 85 | 3189 |
| 2 | 14.714 | 15 | 103 |
| 3 | 9 | 9 | 36 |
| 4 | 3 | 3 | 3 |
| 5 | 3 | 3 | 3 |
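The scores in the module table are numerically consistent with a "density times size" cluster score, that is, the fraction of possible edges present in the module multiplied by its node count. This is the form of score used by MCODE-style clustering, although the tool is not named here, so the formula below is an observation about the reported numbers rather than a statement of the authors' method. The function name `cluster_score` is hypothetical.

```python
def cluster_score(nodes: int, edges: int) -> float:
    """Density-times-size score for an undirected module without self-loops."""
    possible_edges = nodes * (nodes - 1) / 2  # edges in a complete graph
    density = edges / possible_edges
    return density * nodes


# (module, nodes, edges, reported score) copied from the table above.
MODULES = [
    (1, 85, 3189, 75.929),
    (2, 15, 103, 14.714),
    (3, 9, 36, 9.0),
    (4, 3, 3, 3.0),
    (5, 3, 3, 3.0),
]

for module, nodes, edges, reported in MODULES:
    score = cluster_score(nodes, edges)
    print(f"module {module}: computed {score:.3f}, reported {reported}")
```

Modules 3 to 5 are complete graphs (every possible edge is present), which is why their score equals their node count.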
Figure 2. Module analysis and GO function analysis of differentially expressed genes. (A) PPI network of module 1 and module 2. (B) Top 10 results of GO functional annotation of module 1 and module 2 genes. (C) Relationship between module 2 genes and GO functional annotation. GO, Gene Ontology; PPI, protein–protein interaction
Comparison of clinical features between the training group and testing group
| Covariates | Type | Entire | Training | Testing | p value |
|---|---|---|---|---|---|
| Age | >60 | 73 (57.94%) | 49 (55.06%) | 24 (64.86%) | 0.5149 |
| | ≤60 | 52 (41.27%) | 39 (43.82%) | 13 (35.14%) | |
| | Unknown | 1 (0.79%) | 1 (1.12%) | 0 (0%) | |
| Gender | Female | 51 (40.48%) | 39 (43.82%) | 12 (32.43%) | 0.3773 |
| | Male | 74 (58.73%) | 49 (55.06%) | 25 (67.57%) | |
| | Unknown | 1 (0.79%) | 1 (1.12%) | 0 (0%) | |
| ICU | No | 60 (47.62%) | 44 (49.44%) | 16 (43.24%) | 0.6612 |
| | Yes | 66 (52.38%) | 45 (50.56%) | 21 (56.76%) | |
| Hospital‐free days | >30 | 61 (48.41%) | 46 (51.69%) | 15 (40.54%) | 0.3450 |
| | ≤30 | 65 (51.59%) | 43 (48.31%) | 22 (59.46%) | |
Figure 3. Lasso regression analysis and visualization of module 2 gene expression. (A) The tuning parameter (λ) of the lasso model was selected by 10‐fold cross‐validation using the minimum criterion. The y‐axis shows the binomial deviance and the x‐axis shows log(λ). (B) Visualization of the expression of the selected genes. The number in the outermost circle and the size of the yellow circle indicate log2FC, and the blue and red data in the third circle indicate the average expression values of COVID‐19 and non‐COVID‐19 samples
Figure 4. Predictive ability evaluation of the six‐gene model in the training group, testing group, and entire group. (A–C) Training group; (D–F) testing group; (G–I) entire group
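The evaluation summarized in Figure 4 rests on two ingredients: a risk score from a logistic model over the six genes, and the area under the ROC curve of that score. The following is a minimal sketch under stated assumptions, not the authors' implementation: the intercept and coefficients are hypothetical placeholders (the fitted values are not given in this record), the risk score is assumed to be the predicted probability, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive sample outscores a randomly chosen negative one).

```python
import math

GENES = ["IFIT3", "OASL", "USP18", "XAF1", "IFI27", "EPSTI1"]

# Hypothetical values for illustration only; they are NOT the fitted model.
INTERCEPT = -3.0
COEF = {gene: 0.5 for gene in GENES}


def risk_score(expr: dict) -> float:
    """Logistic-model probability from per-gene expression values."""
    z = INTERCEPT + sum(COEF[g] * expr[g] for g in GENES)
    return 1.0 / (1.0 + math.exp(-z))


def auc(scores_pos: list, scores_neg: list) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy expression profiles (made up), one constant value per gene.
high = {g: 2.0 for g in GENES}
low = {g: 1.0 for g in GENES}
print(risk_score(high), risk_score(low))
print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a cohort of this size; a sort-based rank computation would be the usual choice for larger data.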
Evaluation of the prediction accuracy of the six‐gene model in each group
| Group | SE | SP | PPV | NPV | Accuracy | AUC |
|---|---|---|---|---|---|---|
| Training | 0.9275 | 0.7500 | 0.9275 | 0.7500 | 0.8876 | 0.9304 |
| Testing | 0.8387 | 0.8333 | 0.9630 | 0.5000 | 0.8378 | 0.9140 |
| Entire | 0.9000 | 0.7692 | 0.9375 | 0.6667 | 0.8730 | 0.9212 |
Abbreviations: AUC, the area under the curve; NPV, negative predictive value; PPV, positive predictive value; SE, sensitivity; SP, specificity.
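The accuracy measures in the table follow directly from confusion-matrix counts. The sketch below shows the arithmetic. The counts are back-calculated from the reported values and the group sizes (89 training, 37 testing, 126 in total), so they are an inference that happens to reproduce the table to four decimals, not data taken from the paper. The names `diagnostic_metrics` and `INFERRED_COUNTS` are hypothetical.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return SE, SP, PPV, NPV and accuracy from confusion-matrix counts."""
    return {
        "SE": tp / (tp + fn),        # sensitivity: true positives among COVID-19
        "SP": tn / (tn + fp),        # specificity: true negatives among non-COVID-19
        "PPV": tp / (tp + fp),       # positive predictive value
        "NPV": tn / (tn + fn),       # negative predictive value
        "Accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Inferred counts (assumption): chosen so the metrics match the reported table.
INFERRED_COUNTS = {
    "Training": dict(tp=64, fn=5, tn=15, fp=5),
    "Testing": dict(tp=26, fn=5, tn=5, fp=1),
    "Entire": dict(tp=90, fn=10, tn=20, fp=6),
}

for group, counts in INFERRED_COUNTS.items():
    metrics = diagnostic_metrics(**counts)
    print(group, {name: round(value, 4) for name, value in metrics.items()})
```

With only 6 non-COVID-19 samples inferred for the testing group, its SP and NPV move in large steps with each reclassified sample, which is worth keeping in mind when comparing groups.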
Figure 5. Analysis of the prediction ability of each independent index for SARS‐CoV‐2 infection. (A) Analysis of the predictive ability of the six genes individually for SARS‐CoV‐2 infection. (B) Analysis of the predictive ability of six clinical indexes for SARS‐CoV‐2 infection. (C) Predictive ability analysis after fitting the six‐gene model with ferritin and fibrinogen
Figure 6. Expression analysis of the six genes. (A–D) Expression analysis of the six genes in the GSE157103 data set; (E, F) Expression analysis of the six genes in the GSE156063 data set; (G) Analysis of the coexpression of the six genes and SARS‐CoV‐2 infection‐related genes in the GSE157103 data set; (H) Analysis of the coexpression of the six genes and SARS‐CoV‐2 infection‐related genes in the GSE156063 data set; (I) Analysis of the expression of IFIT3, USP18, XAF1, IFI27, and EPSTI1 in the lung tissue of SARS‐CoV‐2‐infected mice in the GSE154104 data set
Figure 7. Analysis of the relationship between risk score and expression of genes related to SARS‐CoV‐2 infection in the GSE157103 data set. (A) Analysis of the correlation between risk score and ACE2 expression. (B) Analysis of the correlation between risk score and TMPRSS2 expression. (C) Analysis of the correlation between risk score and CTSB expression. (D) Analysis of the correlation between risk score and CTSL expression