Zhiyang Liu, Mei Meng, ShiJian Ding, XiaoChao Zhou, KaiYan Feng, Tao Huang, Yu-Dong Cai.
Abstract
Patients infected with SARS-CoV-2 show different clinical manifestations and need different treatments depending on severity. Mild or moderate patients usually recover with conventional medical treatment, but severe patients require prompt professional treatment. Stratifying infected patients for targeted treatment is therefore meaningful. In this study, a computational workflow was designed to identify key blood methylation features and rules that can distinguish the severity of SARS-CoV-2 infection. First, the methylation features in the profile were analyzed by the Monte Carlo feature selection method, which produced a ranked feature list. Next, this ranked list was fed into the incremental feature selection method to determine the optimal features for different classification algorithms, from which optimal classifiers were built. The selected key features were analyzed by functional enrichment to characterize their biological functions. Furthermore, a set of rules was built with a white-box algorithm, the decision tree, to uncover the methylation patterns associated with different severities of SARS-CoV-2 infection. Some genes corresponding to essential methylation sites (PARP9, MX1, IRF7), as well as some rules, were validated against the published literature. Overall, this study helps reveal potential methylation features and provides a reference for patient stratification. Physicians can prioritize and allocate health and medical resources for COVID-19 patients based on their predicted clinical severity.
Keywords: SARS-CoV-2; classification rule; machine learning; methylation; severity
Year: 2022; PMID: 36212830; PMCID: PMC9537378; DOI: 10.3389/fmicb.2022.1007295
Source DB: PubMed; Journal: Front Microbiol; ISSN: 1664-302X; Impact factor: 6.064
Sample size of each class for the methylation profile.
| Class name | Sample size |
|---|---|
| Negative infection | 296 |
| Other infection | 65 |
| Discharged from emergency department | 34 |
| Admitted to inpatient care | 84 |
| Progressed to ICU | 35 |
| Death | 11 |
Figure 1. Workflow of this study. First, the Monte Carlo feature selection (MCFS) method was used to rank methylation features by importance, producing a ranked feature list. This list was then fed into the incremental feature selection (IFS) method with different classification algorithms to determine the optimal features for each algorithm, and the optimal classifiers were built on those features. Classification rules generated by the optimal decision tree (DT) classifier were used to analyze the methylation patterns. The genes corresponding to essential methylation sites were subjected to functional enrichment analysis.
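The IFS step in this workflow can be sketched as a loop over growing prefixes of the ranked feature list. This is a minimal illustration, not the authors' implementation: the `evaluate` callback, the `step` parameter, the toy scoring function, and the feature names are all assumptions introduced here. In practice `evaluate` would train and cross-validate one classification algorithm on the given features and return a score such as weighted F1.

```python
from __future__ import annotations

from typing import Callable, Sequence


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 1,
) -> tuple[int, float, list[tuple[int, float]]]:
    """Score the top-k prefixes of a ranked feature list and keep the best k.

    `evaluate` is a hypothetical callback: it receives a feature subset and
    returns a performance score (higher is better).
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    curve: list[tuple[int, float]] = []  # (k, score) points of the IFS curve
    best_k, best_score = 0, float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        score = evaluate(ranked_features[:k])
        curve.append((k, score))
        if score > best_score:  # strict '>' keeps the smallest k on ties
            best_k, best_score = k, score
    return best_k, best_score, curve


def toy_score(features: Sequence[str]) -> float:
    # Stand-in for a cross-validated score; it peaks at three features.
    return 1.0 - abs(len(features) - 3) * 0.1


# Hypothetical feature names, ordered from most to least important.
ranked = ["site_1", "site_2", "site_3", "site_4", "site_5"]
best_k, best_score, curve = incremental_feature_selection(ranked, toy_score)
print(best_k, best_score)
```

Running the loop once per classification algorithm gives one curve and one optimal feature count per algorithm, which is how a single ranked list can lead to different optimal subsets for different classifiers.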
Figure 2. Bar chart showing the top 10 key methylation features and their relative importance scores.
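For orientation, the relative importance (RI) score in MCFS is usually defined in the MCFS literature as follows. The formula is not stated in this record, so it is given here as background under that assumption, and the parameter values used in the study are not known from this text.

$$
\mathrm{RI}_g = \sum_{\tau=1}^{s\,t} \left(\mathrm{wAcc}_\tau\right)^{u} \sum_{n_g(\tau)} \mathrm{IG}\bigl(n_g(\tau)\bigr) \left( \frac{\text{no. in } n_g(\tau)}{\text{no. in } \tau} \right)^{v}
$$

Here $s$ is the number of randomly drawn feature subsets, $t$ is the number of decision trees built per subset, $\mathrm{wAcc}_\tau$ is the weighted accuracy of tree $\tau$, $n_g(\tau)$ is a node of tree $\tau$ that splits on feature $g$, $\mathrm{IG}$ is the information gain of that node, the fraction is the share of the tree's training samples that reach the node, and $u$ and $v$ are weighting exponents.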
Figure 3. IFS curves showing the performance of different classification algorithms under different feature subsets. The highest weighted F1 for each classification algorithm is marked on the corresponding IFS curve. The SVM yielded the highest weighted F1 of 0.921 when the top 1,025 features were used.
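Weighted F1 averages the per-class F1 scores using each class's sample count as its weight, whereas macro F1 weights every class equally. With the class sizes listed above (525 samples in total, ranging from 296 down to 11), the two can differ noticeably. The sketch below computes both from scratch; the labels and predictions are toy values made up for illustration, not data from the study.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for each label in y_true."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in set(y_true):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy example: a large class and a small class, one small-class sample missed.
y_true = ["negative"] * 4 + ["icu"] * 2
y_pred = ["negative"] * 5 + ["icu"]
print(round(macro_f1(y_true, y_pred), 3))     # 0.778
print(round(weighted_f1(y_true, y_pred), 3))  # 0.815
```

In the toy example the large class scores 8/9 and the small class 2/3, so the weighted average sits closer to the large class's score than the macro average does.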
Overall performance of the optimal classifiers.
| Classification algorithm | Number of features | | | | Weighted F1 |
|---|---|---|---|---|---|
| k-nearest neighbor | 10 | 0.784 | 0.730 | 0.793 | 0.790 |
| Random forest | 35 | 0.893 | 0.842 | 0.873 | 0.895 |
| Support vector machine | 1,025 | 0.920 | 0.881 | 0.926 | 0.921 |
| Decision tree | 590 | 0.771 | 0.686 | 0.749 | 0.780 |
Figure 4. Performance of the four optimal classifiers on the six classes. The optimal SVM classifier produced the best performance on all classes.
Figure 5. Distribution of classification rules across the six classes.
Figure 6. Top five GO terms enriched among the genes mapped from the top 1,025 methylation features.
Figure 7. Top five KEGG pathways enriched among the genes mapped from the top 1,025 methylation features.
Essential methylation sites and their corresponding genes for distinguishing severity of SARS-CoV-2 infection.
| Methylation site | Gene symbol | Description |
|---|---|---|
| cg22930808 | PARP9 | Poly (ADP-Ribose) Polymerase Family Member 9 |
| cg25888371 | MX1 | MX Dynamin Like GTPase 1 |
| cg17114584 | IRF7 | Interferon Regulatory Factor 7 |