Samaneh Maleknia1, Mohammad Javad Tavassolifar1, Faezeh Mottaghitalab1, Mohammad Reza Zali2, Anna Meyfour3.
Abstract
BACKGROUND: Despite improvements in controlling the COVID-19 pandemic, the lack of comprehensive insight into SARS-COV-2 pathogenesis remains a major challenge. To address it, we used advanced bioinformatics and machine learning algorithms to reveal more characteristics of SARS-COV-2 pathogenesis and to introduce novel host response-based diagnostic biomarker panels.
Keywords: Biomarker; COVID-19; Data integration; Nasopharyngeal swab; Pathogenesis; Random forest; SARS-COV-2; Systems biology; Whole blood
Year: 2022 PMID: 35922752 PMCID: PMC9347150 DOI: 10.1186/s10020-022-00513-5
Source DB: PubMed Journal: Mol Med ISSN: 1076-1551 Impact factor: 6.376
Fig. 1 The workflow of the study: RNA-Seq datasets of whole blood (WB) and nasopharyngeal (NP) samples from patients with COVID-19 and other similar disease conditions, including viral and non-viral acute respiratory illnesses (ARI), as well as healthy controls, were acquired from the GEO database. Data were integrated and batch effects were eliminated. Subsequently, the datasets were subjected to pathway enrichment and GO analyses. Candidate diagnostic biomarker panels were then identified using machine learning methods on train datasets and validated on independent cohorts to introduce the best biomarker combinations. In addition, RF-based generic prediction models were generated using all combinations of 3 to 9 markers drawn from the 23 common WB/NP DEGs. Finally, the results of the two prediction models, the LASSO feature-based prediction model and the RF-based generic prediction model, were compared. WB whole blood, NP nasopharyngeal, ARI acute respiratory illnesses, RF random forest
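To give a sense of the search space behind the RF-based generic models, the sketch below counts and lazily enumerates every panel of 3 to 9 markers that can be drawn from 23 genes. It is only an illustration: the gene labels are placeholders (the 23 common DEGs are not listed in this section), and the function names `count_panels` and `iter_panels` are hypothetical, not taken from the study.

```python
from itertools import combinations, islice
from math import comb


def count_panels(n_genes, min_size=3, max_size=9):
    """Number of distinct gene panels with min_size..max_size members."""
    return sum(comb(n_genes, k) for k in range(min_size, max_size + 1))


def iter_panels(genes, min_size=3, max_size=9):
    """Lazily yield every panel (as a tuple) for each allowed panel size."""
    for k in range(min_size, max_size + 1):
        yield from combinations(genes, k)


# Placeholder labels (assumption): stand-ins for the 23 common WB/NP DEGs.
genes = [f"gene_{i}" for i in range(1, 24)]

total = count_panels(len(genes))
first_three = list(islice(iter_panels(genes), 3))

print(total)
print(first_three)
```

Each enumerated panel would then be passed to a classifier; generating the panels lazily avoids holding all of them in memory at once.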
The number of WB and NP samples applied in this study
| Tissue | GSE (a) | COVID-19 | HC | Not-COVID | Other respiratory diseases | Refs |
|---|---|---|---|---|---|---|
| WB | 163151 | 7 | 20 | | | Ng et al. ( |
| WB | 151161 | 39 | | | | – |
| WB | 152641 | 62 | 24 | | | Thair et al. ( |
| WB | 169687 | 14 | | | | – |
| WB | 172450 | 13 | | | | – |
| NP | 152075 | 430 | 54 | | | Lieberman et al. ( |
| NP | 156063 | 93 | | | Other viruses = 41, non-viral = 100 | Mick et al. ( |
| NP | 163151 | 138 | 11 | | Influenza = 76 (b), other Corona viruses = 12 (b), other viruses = 32, non-viral = 82 | Ng et al. ( |
| NP | 188678 | 90 | | | Other viruses = 59, non-viral = 169 | – |

(a) Genomic Spatial Event (GSE) database (Danford et al. 2008)
(b) The samples related to “Influenza” and “other corona” were labeled as other viruses in the analysis
Fig. 2 Transcriptome analysis of whole blood samples of COVID-19 patients versus healthy controls: Volcano plot of differentially expressed genes with adjusted P-value < 0.05 and |Log2FC| > 1. Red and green show upregulated and downregulated genes, respectively (A). Dot plot of BPs (GO) enriched among significantly upregulated and downregulated genes. The size of each dot is proportional to the gene ratio in the corresponding process, and the color corresponds to the –log10 of the adjusted P-value. Selected top, non-redundant terms are visualized (B). Bar plot of hallmark gene set enrichment analysis. The size of each bar is proportional to the gene ratio in the corresponding pathway, and the color corresponds to the –log10 of the adjusted P-value (C). BP biological process, GO gene ontology
Fig. 3 Cell-type proportions in whole blood of COVID-19 patients in comparison to healthy controls: Box plots of the estimated immune cell-type proportions of the COVID-19 patients and the HC individuals, obtained by CIBERSORTx. HC healthy control
Fig. 4 Transcriptome analysis of nasopharyngeal samples of patients with COVID-19 versus non-viral and other viral acute respiratory illnesses (ARIs) as well as healthy controls: Volcano plot of differentially expressed genes with adjusted P-value < 0.05 and |Log2FC| > 1. Red and green show upregulated and downregulated genes, respectively (A). Dot plots of BPs enriched among significantly up/downregulated genes (B) and of hallmark gene set enrichment analysis (C). The size of each dot is proportional to the gene ratio in the corresponding process or pathway, and the color corresponds to the –log10 of the adjusted P-value. Selected top, non-redundant terms are visualized. BP biological process
Fig. 5 Analysis of common dysregulated genes in SARS-COV-2-infected whole blood and nasopharyngeal samples in comparison with healthy controls: Venn diagram of the distribution of genes across the four groups of interest (UB upregulated genes in blood, DB downregulated genes in blood, UN upregulated genes in nasal, and DN downregulated genes in nasal) (A). Dot plot of BPs enriched among the common genes of each paired group. The size of each dot is proportional to the gene ratio in the corresponding process, and the color corresponds to the –log10 of the adjusted P-value. Selected top, non-redundant terms are visualized (B). Bar plot of hallmark gene set enrichment analysis. The size of each bar is proportional to the gene ratio in the corresponding pathway, and the color corresponds to the paired groups whose common genes were studied. The “KRAS Signaling Dn” pathway was enriched in two groups (C). BP biological process
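The volcano-plot cutoffs quoted in the caption (adjusted P-value < 0.05, |Log2FC| > 1) amount to a simple per-gene filter. The sketch below shows that rule in isolation; the function name `classify_gene` and the example values are hypothetical and are not results from the study.

```python
def classify_gene(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant) using the
    volcano-plot cutoffs: adjusted P-value < p_cutoff and |log2FC| > fc_cutoff."""
    if adj_p < p_cutoff and abs(log2_fc) > fc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Hypothetical (log2FC, adjusted P-value) pairs, for illustration only.
examples = {
    "gene_a": (2.4, 0.001),   # large positive change, significant
    "gene_b": (-1.7, 0.020),  # large negative change, significant
    "gene_c": (0.6, 0.001),   # significant but below the fold-change cutoff
    "gene_d": (3.0, 0.200),   # large change but not significant
}

labels = {gene: classify_gene(fc, p) for gene, (fc, p) in examples.items()}
print(labels)
```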
The selected features based on the criteria* in the train sets of WB and NP samples
| NP gene | NP L.Coef | NP Abs.L.Coef | NP logFC | NP Abs.logFC | WB gene | WB L.Coef | WB Abs.L.Coef | WB logFC | WB Abs.logFC |
|---|---|---|---|---|---|---|---|---|---|
| IFI6 | −0.28491 | 0.284913 | 1.613021 | 1.613021 | SLC24A5 | −1.00469 | 1.004693 | −2.03768 | 2.037676 |
| IFI44L | −0.27956 | 0.279561 | 1.970857 | 1.970857 | SLC45A2 | 0.706125 | 0.706125 | 1.208076 | 1.208076 |
| SIGLEC1 | −0.27801 | 0.278012 | 1.332828 | 1.332828 | C1QC | 0.416476 | 0.416476 | 1.196718 | 1.196718 |
| NUCB1 | 0.226323 | 0.226323 | −1.00591 | 1.005905 | NMNAT2 | 0.403256 | 0.403256 | 1.065142 | 1.065142 |
| XAF1 | −0.21556 | 0.215562 | 1.081608 | 1.081608 | LGSN | 0.313768 | 0.313768 | 1.034986 | 1.034986 |
| TMED9 | 0.201803 | 0.201803 | −1.50563 | 1.505627 | HIST2H4A | 0.306524 | 0.306524 | 2.531967 | 2.531967 |
| SAMHD1 | 0.168243 | 0.168243 | −1.00134 | 1.001343 | INSC | 0.300362 | 0.300362 | 1.489929 | 1.489929 |
| SDC1 | −0.15729 | 0.157293 | −1.00398 | 1.003981 | GOLGA8M | 0.288115 | 0.288115 | 1.033698 | 1.033698 |
| TIMM13 | 0.154103 | 0.154103 | −1.1873 | 1.187303 | CDCA5 | 0.209252 | 0.209252 | 1.05616 | 1.05616 |
| IL1R2 | 0.14483 | 0.14483 | −1.37001 | 1.370005 | NOS1AP | 0.194112 | 0.194112 | 1.137223 | 1.137223 |
| CXCL11 | 0.122537 | 0.122537 | 1.340965 | 1.340965 | BEGAIN | 0.190819 | 0.190819 | 1.167257 | 1.167257 |
| LAMB3 | 0.092874 | 0.092874 | −1.46286 | 1.462858 | OTOF | 0.171533 | 0.171533 | 1.280815 | 1.280815 |
| TMA7 | 0.081283 | 0.081283 | −1.315 | 1.314995 | UGT2B11 | −0.15846 | 0.158463 | −1.48498 | 1.484976 |
| ADIRF | 0.069208 | 0.069208 | −1.30839 | 1.308395 | GSTM1 | −0.15549 | 0.155486 | −1.43369 | 1.433691 |
| BBS10 | 0.05455 | 0.05455 | −1.40761 | 1.407608 | OR10G2 | −0.13769 | 0.137689 | −1.95528 | 1.95528 |
| OR1I1 | 0.050515 | 0.050515 | −1.4272 | 1.427201 | TRIP13 | 0.115733 | 0.115733 | 1.028476 | 1.028476 |
| MIF | 0.031292 | 0.031292 | −1.80266 | 1.802656 | CCDC27 | −0.10699 | 0.106987 | −1.99933 | 1.999328 |
| CXCL10 | 0.029183 | 0.029183 | 1.5632 | 1.5632 | ABCC11 | 0.08954 | 0.08954 | 1.31761 | 1.31761 |
| C19orf33 | 0.015395 | 0.015395 | −1.33282 | 1.33282 | EIF1AY | −0.08357 | 0.083573 | −1.76969 | 1.769694 |
| COPA | 0.014771 | 0.014771 | −1.31686 | 1.316856 | OR2A42 | −0.0225 | 0.022504 | −2.24545 | 2.245455 |
| ADAM17 | 0.012107 | 0.012107 | −1.36326 | 1.363259 | GTF2H2C | 0.01273 | 0.01273 | 1.429336 | 1.429336 |
| TCTEX1D4 | 0.006927 | 0.006927 | −1.52527 | 1.52527 | SCN5A | 0.007577 | 0.007577 | 1.405947 | 1.405947 |
| IFIT2 | 0.005005 | 0.005005 | 1.308344 | 1.308344 | | | | | |
WB whole blood, NP nasopharyngeal
*Absolute LASSO coefficient more than 0.1 OR the non-zero LASSO coefficient and the absolute value of logFC more than 1.3
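The selection rule in the footnote can be written as a single boolean expression. The sketch below applies it to three NP rows copied from the table plus one made-up gene that fails both conditions; the function name `passes_criteria` is hypothetical, and this is not the authors' code.

```python
def passes_criteria(lasso_coef, log_fc, coef_cutoff=0.1, fc_cutoff=1.3):
    """Footnote rule: |LASSO coefficient| > 0.1, OR a non-zero LASSO
    coefficient together with |logFC| > 1.3."""
    strong_coef = abs(lasso_coef) > coef_cutoff
    nonzero_and_large_fc = lasso_coef != 0 and abs(log_fc) > fc_cutoff
    return strong_coef or nonzero_and_large_fc


# (L.Coef, logFC) pairs; the first three are NP rows from the table above.
rows = {
    "IFI6": (-0.28491, 1.613021),     # passes on the coefficient alone
    "COPA": (0.014771, -1.31686),     # small coefficient, |logFC| > 1.3
    "IFIT2": (0.005005, 1.308344),    # small coefficient, |logFC| > 1.3
    "hypothetical_gene": (0.05, 1.1), # made up: fails both conditions
}

selected = [gene for gene, (coef, fc) in rows.items() if passes_criteria(coef, fc)]
print(selected)
```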
The criteria obtained for WB and NP samples in the first and second phases by the RF classifier based on LASSO features
Columns ending in _cv: the first phase, based on the k-fold CV on the train set. Sensitivity, Specificity, and Accuracy: the second phase, based on train and test sets.

| Tissue | Sensitivity_cv | Specificity_cv | Accuracy_cv | Sensitivity | Specificity | Accuracy | Number of features | Genes |
|---|---|---|---|---|---|---|---|---|
| WB | 0.895238095 | 0.855072464 | 0.879310345 | 0.90625 | 0.923076923 | 0.911111111 | 3 | CCDC27, CDCA5, EIF1AY |
| | 0.971428571 | 0.898550725 | 0.942528736 | 0.90625 | 0.923076923 | 0.911111111 | 4 | CCDC27, HIST2H4A, NOS1AP, TRIP13 |
| | 0.971428571 | 0.927536232 | 0.954022989 | 0.90625 | 0.923076923 | 0.911111111 | 5 | CCDC27, HIST2H4A, LGSN, NOS1AP, TRIP13 |
| | 0.980952381 | 0.956521739 | 0.967816092 | 1 | 0.923076923 | 0.977777778 | 6* | C1QC, CCDC27, HIST2H4A, INSC, OTOF, SLC24A5 |
| | 0.980952381 | 0.956521739 | 0.967816092 | 1 | 0.923076923 | 0.977777778 | 6 | CCDC27, CDCA5, INSC, NMNAT2, OR2A42, SLC24A5 |
| | 0.980952381 | 0.913043478 | 0.948275862 | 1 | 0.923076923 | 0.977777778 | 7 | ABCC11, CCDC27, INSC, NMNAT2, OTOF, SLC24A5, TRIP13 |
| | 0.971428571 | 0.942028986 | 0.971264368 | 1 | 0.923076923 | 0.977777778 | 8 | ABCC11, CCDC27, CDCA5, GTF2H2C, INSC, OTOF, SLC24A5, SLC45A2 |
| | 0.971428571 | 0.971014493 | 0.977011494 | 1 | 0.923076923 | 0.977777778 | 9* | CCDC27, CDCA5, EIF1AY, GOLGA8M, GTF2H2C, INSC, NMNAT2, OTOF, SLC24A5 |
| | 0.971428571 | 0.971014493 | 0.977011494 | 1 | 0.923076923 | 0.977777778 | 9 | CCDC27, GSTM1, GTF2H2C, INSC, NMNAT2, NOS1AP, OTOF, SLC24A5, TRIP13 |
| NP | 0.861148198 | 0.777070064 | 0.822803195 | 0.815508021 | 0.785714286 | 0.808 | 3 | IFI6, MIF, NUCB1 |
| | 0.891855808 | 0.805732484 | 0.852578068 | 0.855614973 | 0.825396825 | 0.848 | 4 | IFI6, LAMB3, MIF, NUCB1 |
| | 0.887850467 | 0.823248408 | 0.8583878 | 0.863636364 | 0.873015873 | 0.866 | 5 | IFI44L, IFI6, IL1R2, MIF, NUCB1 |
| | 0.903871829 | 0.824840764 | 0.867828613 | 0.852941176 | 0.857142857 | 0.854 | 6 | ADIRF, IFI6, IL1R2, MIF, NUCB1, TMA7 |
| | 0.910547397 | 0.837579618 | 0.877269426 | 0.868983957 | 0.873015873 | 0.87 | 7 | CXCL10, IFI6, MIF, NUCB1, SAMHD1, SIGLEC1, TMED9 |
| | 0.90787717 | 0.839171975 | 0.87654321 | 0.871657754 | 0.912698413 | 0.882 | 8 | COPA, CXCL11, IFI6, MIF, NUCB1, SAMHD1, SIGLEC1, TMED9 |
| | 0.90787717 | 0.855095541 | 0.883805374 | 0.871657754 | 0.904761905 | 0.88 | 9 | CXCL11, IFI6, IFIT2, LAMB3, MIF, NUCB1, SAMHD1, TMA7, XAF1 |
WB whole blood, NP nasopharyngeal
*Selected to depict ROC curves
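For readers who want to relate the three reported criteria to each other, the sketch below computes sensitivity, specificity, and accuracy from confusion-matrix counts. The counts used here are an assumption: they are not reported in this section, but 29 of 32 positives and 12 of 13 negatives is one set of counts that reproduces the second-phase WB values for the 3-feature panel (0.90625, 0.923076923, 0.911111111).

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    return sensitivity, specificity, accuracy


# Assumed counts (not from the source), chosen to match the reported WB values.
sens, spec, acc = classification_metrics(tp=29, fn=3, tn=12, fp=1)
print(round(sens, 9), round(spec, 9), round(acc, 9))
```

Because accuracy is a weighted average of sensitivity and specificity, with weights given by the class proportions, it always lies between the two when all three are computed on the same samples.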
Fig. 6 The criteria of classifiers: Line plots of the sensitivity, specificity, and accuracy of the classifiers for whole blood (WB) and nasopharyngeal (NP) samples in the first and second phases, as a function of the number of features
Fig. 7 The ROC curves: These ROC curves illustrate the sensitivity, 1-specificity, and AUC associated with phase I (A and C) and phase II (B and D) for whole blood and nasopharyngeal samples among the top 3 to 9 features, respectively
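As a reminder of what the AUC in these curves summarizes: it equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are invented for illustration, and the function name `roc_auc` is hypothetical; this is not how the study's curves were necessarily produced.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 1 = COVID-19, 0 = control; scores are classifier outputs.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]

auc = roc_auc(labels, scores)
print(auc)
```

The pairwise form is quadratic in the number of samples, which is fine for small cohorts; rank-based formulas give the same value more efficiently on large ones.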
The comparison of criteria related to the best panels of the LASSO feature-based prediction model and RF-based generic prediction model
LASSO-based: the best LASSO-based panels based on train and test sets. Common-based: the best common-based panels based on train and test sets.

| Tissue | Number of features | LASSO-based sensitivity | LASSO-based specificity | LASSO-based accuracy | Common-based sensitivity | Common-based specificity | Common-based accuracy | Difference of accuracies |
|---|---|---|---|---|---|---|---|---|
| WB | 3 | 0.906 | 0.923 | 0.911 | 0.938 | 0.769 | 0.889 | 0.022 |
| | 4 | 0.906 | 0.923 | 0.911 | 0.969 | 0.692 | 0.889 | 0.022 |
| | 5 | 0.906 | 0.923 | 0.911 | 0.969 | 0.769 | 0.911 | 0.000 |
| | 6 | 1 | 0.923 | 0.978 | 0.938 | 0.846 | 0.911 | 0.067 |
| | 7 | 1 | 0.923 | 0.978 | 0.938 | 0.846 | 0.911 | 0.067 |
| | 8 | 1 | 0.923 | 0.978 | 0.938 | 0.846 | 0.911 | 0.067 |
| | 9 | 1 | 0.923 | 0.978 | 0.906 | 0.846 | 0.889 | 0.089 |
| NP | 3 | 0.816 | 0.786 | 0.808 | 0.743 | 0.802 | 0.758 | 0.050 |
| | 4 | 0.856 | 0.825 | 0.848 | 0.810 | 0.810 | 0.810 | 0.038 |
| | 5 | 0.864 | 0.873 | 0.866 | 0.826 | 0.841 | 0.830 | 0.036 |
| | 6 | 0.853 | 0.857 | 0.854 | 0.829 | 0.865 | 0.838 | 0.016 |
| | 7 | 0.869 | 0.873 | 0.870 | 0.832 | 0.881 | 0.844 | 0.026 |
| | 8 | 0.872 | 0.913 | 0.882 | 0.850 | 0.841 | 0.848 | 0.034 |
| | 9 | 0.872 | 0.905 | 0.880 | 0.850 | 0.857 | 0.852 | 0.028 |
WB whole blood, NP nasopharyngeal