Zhandong Li, Zi Mei, Shijian Ding, Lei Chen, Hao Li, Kaiyan Feng, Tao Huang, Yu-Dong Cai.
Abstract
Coronavirus disease 2019 (COVID-19) has become a serious challenge to global public health. Definitive and effective treatments for COVID-19 are still lacking, and targeted antiviral drugs are not available. In addition, viruses can regulate host innate immunity and antiviral processes through the epigenome to promote viral self-replication and disease progression. In this study, we first analyzed a COVID-19 methylation dataset using the Monte Carlo feature selection method to obtain a feature list. This feature list was then subjected to the incremental feature selection method combined with a decision tree algorithm to extract key biomarkers and to build effective classification models and classification rules that distinguish patients with COVID-19 from those without. The essential features EPSTI1, NACAP1, SHROOM3, C19ORF35, and MX1 play important roles in the infection and the immune response to the novel coronavirus. The six significant rules extracted from the optimal classifier quantitatively explain the expression pattern of COVID-19. These findings therefore validate that our method can distinguish COVID-19 at the methylation level and provide guidance for the diagnosis and treatment of COVID-19.
Keywords: COVID-19; decision tree; feature selection; methylation; rule
Year: 2022 PMID: 35620480 PMCID: PMC9127386 DOI: 10.3389/fmolb.2022.908080
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. Flowchart of the computational method in this study. A systematic analysis process that integrates feature selection, decision tree (DT) algorithms, and rule learning was applied to identify COVID-19 methylation site features. The optimal classifier, methylation sites, and rules were determined based on the performance of the DT model and the importance of the features in each model.
FIGURE 2. Incremental feature selection (IFS) curves obtained by DT classification models on the top 1000 features of the COVID-19 dataset. The model produced the highest F1-measure of 0.990 when the top 50 features were used.
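The incremental feature selection loop behind a curve like this can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `evaluate` callback stands in for whatever scoring the study used (for example, the F1-measure of a decision tree trained on the given feature subset), and the `step` parameter, the `toy_f1` function, and the feature names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 1,
) -> Tuple[int, float, List[Tuple[int, float]]]:
    """Score growing prefixes of a ranked feature list.

    `evaluate` is an assumed callback that returns a score (e.g. F1-measure)
    for a classifier built on the given feature subset. `step` is an assumed
    increment; the source does not state one.
    Returns the best prefix size, its score, and the full (size, score) curve.
    """
    curve: List[Tuple[int, float]] = []
    best_k, best_score = 0, float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        score = evaluate(ranked_features[:k])
        curve.append((k, score))
        # Keep the smallest prefix that reaches the highest score.
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score, curve


def toy_f1(subset: Sequence[str]) -> float:
    """Hypothetical score that peaks at four features (illustration only)."""
    return 1.0 - 0.1 * abs(len(subset) - 4)


ranked = [f"site_{i}" for i in range(1, 11)]  # placeholder feature names
best_k, best_score, curve = incremental_feature_selection(ranked, toy_f1)
```

In a real analysis, `evaluate` would train and cross-validate a classifier on the selected columns; the peak of the returned curve corresponds to the optimal feature count reported in the caption.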
FIGURE 3. Performance of the best DT model and the DT model with informative features. The best DT model is superior to the DT model with informative features.
Rules yielded by the decision tree on the top 50 features.
| Index | Condition | Result |
|---|---|---|
| Rule0 | cg03753191 ≤ 0.1398 and cg15959262 > 0.5931 and cg17439158 > 0.5681 | Patient with COVID-19 |
| Rule1 | cg03753191 > 0.1398 and cg17439158 ≤ 0.6170 | Patient without COVID-19 |
| Rule2 | cg03753191 ≤ 0.1398 and cg15959262 ≤ 0.5931 and cg08399733 ≤ 0.9130 | Patient without COVID-19 |
| Rule3 | cg03753191 > 0.1398 and cg17439158 > 0.6170 | Patient with COVID-19 |
| Rule4 | cg03753191 ≤ 0.1398 and cg15959262 > 0.5931 and cg17439158 ≤ 0.5681 | Patient without COVID-19 |
| Rule5 | cg03753191 ≤ 0.1398 and cg15959262 ≤ 0.5931 and cg08399733 > 0.9130 | Patient with COVID-19 |
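
Because the six rules are the root-to-leaf paths of a single decision tree, they can be written as one nested conditional. The sketch below transcribes the thresholds from the table directly; the function name `classify`, the dictionary input format, and the assumption that each value is a per-site methylation level on the same scale as the thresholds are illustrative choices, not part of the source.

```python
from typing import Mapping, Tuple

COVID = "Patient with COVID-19"
NO_COVID = "Patient without COVID-19"


def classify(sample: Mapping[str, float]) -> Tuple[str, str]:
    """Apply the six tabulated rules to one sample.

    `sample` maps methylation site IDs to values (assumed to be on the same
    scale as the thresholds in the table). Returns (rule index, result).
    Only the sites along the matching path are looked up.
    """
    if sample["cg03753191"] <= 0.1398:
        if sample["cg15959262"] > 0.5931:
            if sample["cg17439158"] > 0.5681:
                return "Rule0", COVID
            return "Rule4", NO_COVID
        if sample["cg08399733"] <= 0.9130:
            return "Rule2", NO_COVID
        return "Rule5", COVID
    if sample["cg17439158"] <= 0.6170:
        return "Rule1", NO_COVID
    return "Rule3", COVID
```

For instance, a hypothetical sample with cg03753191 = 0.20 and cg17439158 = 0.70 follows the right branch at the root and matches Rule3. Every sample matches exactly one rule, since the conditions partition the value space.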