Lei Chen, Zi Mei, Wei Guo, ShiJian Ding, Tao Huang, Yu-Dong Cai.
Abstract
COVID-19 is hypothesized to be linked to the host's excessive inflammatory immune response to SARS-CoV-2 infection, which is regarded as a major factor in disease severity and mortality. Numerous immune cells play key roles in regulating the immune response, and gene expression analysis in these cells could be a useful method for studying disease states, assessing immunological responses, and detecting biomarkers. Here, we developed a machine learning procedure to find biomarkers that discriminate disease severity in individual immune cell types (B cell, CD4+ T cell, CD8+ T cell, monocyte, and NK cell) using single-cell gene expression profiles of COVID-19. The gene features of each profile were first filtered and ranked using the Boruta feature selection method and mRMR, and the resulting ranked feature lists were then fed into the incremental feature selection method to determine the optimal number of features with decision tree and random forest algorithms. Meanwhile, we extracted the classification rules for each cell type from the optimal decision tree classifiers. The best gene sets discovered in this study were analyzed by GO and KEGG pathway enrichment, and some important biomarkers, such as TLR2, ITK, CX3CR1, IL1B, and PRDM1, were validated by recent literature. The findings reveal that the optimal gene sets for each cell type can accurately classify COVID-19 disease severity and provide insight into the molecular mechanisms involved in disease progression.
Year: 2022 PMID: 35528178 PMCID: PMC9073549 DOI: 10.1155/2022/6089242
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.246
Figure 1. Computational workflow for this study. First, we applied the Boruta and mRMR methods to filter and rank the features of the expression profiles for different immune cells (B cell, CD4+ T cell, CD8+ T cell, monocyte, and NK cell). Then, using the incremental feature selection method, a series of feature subsets was generated, and training samples represented by these feature subsets were used to train decision tree and random forest classifiers with 10-fold cross-validation. Based on the evaluation metrics of the models, the optimal number of features for each cell type was determined, and the optimal classifiers and classification rules were established as well. GO and KEGG functional analyses were performed on the selected gene sets.
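The incremental feature selection (IFS) step in this workflow can be illustrated with a minimal sketch. It assumes a ranked feature list is already available (as would come out of Boruta and mRMR), scores each top-k prefix of that list, and keeps the best-scoring prefix. To stay dependency-free, the sketch swaps in a leave-one-out 1-nearest-neighbour accuracy for the study's 10-fold cross-validated decision tree and random forest classifiers. The function names, the toy data, the class labels, and the gene names `g1` to `g3` are hypothetical, and the tie-breaking rule (the smallest subset wins) is an assumption rather than something the source states.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[List[str], float, List[Tuple[int, float]]]:
    """Score the top-k prefixes of a ranked feature list and keep the best one."""
    if step < 1:
        raise ValueError("step must be >= 1")
    curve: List[Tuple[int, float]] = []  # (k, score) points of the IFS curve
    best_k, best_score = 0, float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        score = evaluate(list(ranked_features[:k]))
        curve.append((k, score))
        if score > best_score:  # strict '>' keeps the smallest subset on ties (assumption)
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score, curve


def loo_1nn_accuracy(
    rows: List[Dict[str, float]], labels: List[str], features: List[str]
) -> float:
    """Leave-one-out 1-nearest-neighbour accuracy (a stand-in for the study's classifiers)."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((row[f] - rows[j][f]) ** 2 for f in features),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


# Hypothetical toy expression values: g1 separates the classes, g2 and g3 are noise.
rows = [
    {"g1": 0.0, "g2": 5.0, "g3": 1.0},
    {"g1": 0.2, "g2": 0.0, "g3": 9.0},
    {"g1": 0.1, "g2": 9.0, "g3": 4.0},
    {"g1": 1.0, "g2": 5.5, "g3": 8.5},
    {"g1": 1.2, "g2": 0.5, "g3": 1.5},
    {"g1": 0.9, "g2": 9.5, "g3": 4.5},
]
labels = ["mild", "mild", "mild", "severe", "severe", "severe"]
ranked = ["g1", "g2", "g3"]  # assumed output of a prior ranking step

selected, best_score, curve = incremental_feature_selection(
    ranked, lambda feats: loo_1nn_accuracy(rows, labels, feats)
)
print(selected, best_score, curve)
```

Any scoring function with the same signature, such as a cross-validated decision tree or random forest score, could be passed as `evaluate` instead.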
Figure 2. Details of the sample sizes and analysis results. (a) Sample sizes for each disease severity class in each immune cell type. (b) IFS curves generated by decision tree and random forest in different immune cell types; the highest point of each curve is marked. (c) Number of classification rules extracted from the optimal decision tree classifiers.
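Panel (c) counts the rules per classifier. One common way to read rules off a decision tree, sketched here, is to treat every root-to-leaf path as one rule, so the number of rules equals the number of leaves. The nested-dict tree format, the gene names `geneA` and `geneB`, the thresholds, and the class labels are all hypothetical; the source does not describe how its rules were encoded.

```python
from typing import Dict, List, Tuple, Union

# A leaf is a class label (str); an internal node tests "gene <= threshold".
Tree = Union[str, Dict[str, object]]


def extract_rules(
    tree: Tree, conditions: Tuple[str, ...] = ()
) -> List[Tuple[List[str], str]]:
    """Return one (conditions, predicted_label) rule per root-to-leaf path."""
    if isinstance(tree, str):  # reached a leaf: the accumulated path is one rule
        return [(list(conditions), tree)]
    gene, thr = tree["gene"], tree["threshold"]
    rules = extract_rules(tree["low"], conditions + (f"{gene} <= {thr}",))
    rules += extract_rules(tree["high"], conditions + (f"{gene} > {thr}",))
    return rules


# Hypothetical two-split tree; names, thresholds, and labels are placeholders.
toy_tree: Tree = {
    "gene": "geneA",
    "threshold": 1.5,
    "low": "mild",
    "high": {
        "gene": "geneB",
        "threshold": 0.8,
        "low": "mild",
        "high": "severe",
    },
}

rules = extract_rules(toy_tree)
for conds, label in rules:
    print("IF " + " AND ".join(conds) + " THEN " + label)
```

With a fitted tree from a real library, the same traversal would walk that library's node arrays instead of a nested dict.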
Figure 3. Results of GO and KEGG enrichment analyses in different immune cell types.
Essential genes covered in the Discussion.
| Cell type | Gene symbol | Description |
|---|---|---|
| B cell | TLR2 | Toll-like receptor 2 |
| B cell | BCL2A1 | BCL2-related protein A1 |
| B cell | CD79A | CD79a molecule |
| B cell | CD79B | CD79b molecule |
| B cell | NR4A1 | Nuclear receptor subfamily 4 group A member 1 |
| CD4+ T cell | ITK | IL2 inducible T cell kinase |
| CD4+ T cell | HPGD | 15-hydroxyprostaglandin dehydrogenase |
| CD8+ T cell | CX3CR1 | C-X3-C motif chemokine receptor 1 |
| CD8+ T cell | TNFAIP3 | TNF alpha-induced protein 3 |
| Monocyte | IL1B | Interleukin 1 beta |
| Monocyte | IFITM3 | Interferon-induced transmembrane protein 3 |
| NK cell | PRDM1 | PR/SET domain 1 |