Abstract
Some people are highly vulnerable to the coronavirus even though they have no chronic disease and are not in an at-risk age group for Covid-19. To explain this, some experts focus on the person's immune system, while others think that the patient's genetic history may play a role. Detecting corona from DNA signals as early as possible is critical for determining the relationship between Covid-19 and genes, which in turn would reveal how variations in the genes associated with the disease affect its severe course. In this study, a novel intelligent computer approach is proposed to identify coronavirus from nucleotide signals for the first time. The proposed method uses a multilayered feature extraction structure that combines an entropy-based mapping technique, Discrete Wavelet Transform (DWT), a statistical feature extractor, and Singular Value Decomposition (SVD) to extract the most effective features. The ReliefF technique then selects 94 distinctive features. Support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers are used. The method achieved its highest classification accuracy, 98.84%, with the SVM classifier in detecting Covid-19 from DNA signals. The proposed method is ready to be tested on a different database for the diagnosis of Covid-19 using RNA or other signals.
Keywords: Big data analysis; Biomedical signal processing; Covid-19; Linear algebra; Machine learning
Year: 2022 PMID: 36213553 PMCID: PMC9528020 DOI: 10.1016/j.chemolab.2022.104680
Source DB: PubMed Journal: Chemometr Intell Lab Syst ISSN: 0169-7439 Impact factor: 4.175
Fig. 1. The flow diagram of the proposed approach.
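The feature extraction stages named in the abstract (entropy-based mapping, DWT, statistical features, SVD) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the mapping assigns each nucleotide the value -p·log2(p) computed from its frequency in the sequence, the wavelet is a hand-written Haar transform, the SVD is applied to a sliding-window matrix of each sub-band, and the function names (`entropy_map`, `haar_dwt`, `stat_features`, `svd_features`, `extract_features`) are hypothetical. ReliefF selection and the SVM/k-NN classifiers are not shown.

```python
import math
from collections import Counter

import numpy as np


def entropy_map(seq):
    """Map each nucleotide to -p*log2(p), p being its frequency in seq (assumed scheme)."""
    counts = Counter(seq)
    n = len(seq)
    weight = {b: -(c / n) * math.log2(c / n) for b, c in counts.items()}
    return np.array([weight[b] for b in seq], dtype=float)


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if len(x) % 2:  # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)
    return approx, detail


def stat_features(x):
    """Simple summary statistics of one sub-band (an illustrative choice)."""
    return [x.mean(), x.std(), x.min(), x.max(), float(np.median(x))]


def svd_features(x, window=4, k=3):
    """Leading k singular values of a sliding-window matrix built from x."""
    rows = np.array([x[i:i + window] for i in range(len(x) - window + 1)])
    s = np.linalg.svd(rows, compute_uv=False)
    return list(s[:k])


def extract_features(seq, levels=2):
    """Concatenate statistical and SVD features from every DWT sub-band."""
    band = entropy_map(seq)
    feats = []
    for _ in range(levels):
        band, detail = haar_dwt(band)
        feats += stat_features(detail) + svd_features(detail)
    feats += stat_features(band) + svd_features(band)
    return np.array(feats, dtype=float)


if __name__ == "__main__":
    # Toy sequence, not a record from the dataset.
    print(extract_features("ATGCGGATTACAGATTACAGGCCTTAACGTAC").shape)
```

The sketch assumes sequences long enough that every sub-band holds at least `window` samples; real genome records are far longer than the toy input.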
Samples of the dataset.
| Accession | Race | Accession | Race |
|---|---|---|---|
| | China | | China |
| LC594644.1 | Japan | | Japan |
| MW364964.1 | Chile | NR_109888.1 | Chile |
| MW482885.1 | USA | | USA |
| MT994989 | Egypt | | Egypt |
| MT994632.1 | Iran | | Iran |
| MT820485.1 | Saudi Arabia | | Saudi Arabia |
| MT233521.1 | Spain: Valencia | | Spain: Valencia |
| MT253696.1 | China: Zhejiang | | China: Zhejiang |
| MT240479.1 | Pakistan: Gilgit | NR_144759.2 | Pakistan: Gilgit |
Fig. 2. Illustration of the working principle of the entropy-based numerical technique.
Fig. 3. Spectrogram images of corona and healthy signals.
Fig. 4. Graphical explanation of the detection of features in the spectrogram images.
Fig. 5. Various LBP operators.
Fig. 6. An example of labeling pixels with the LBP operator.
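As a rough illustration of what labeling a pixel with an LBP operator involves, the sketch below implements the basic 8-neighbor, 3x3 local binary pattern in plain Python. The neighbor ordering (clockwise from the top-left), the `>=` comparison against the center, and the function names `lbp_code` and `lbp_image` are assumptions chosen for the example; the operators shown in the figures may differ in radius, neighbor count, or bit order.

```python
def lbp_code(patch):
    """Label the centre pixel of a 3x3 patch with the basic 8-neighbour LBP code."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner (a convention choice).
    coords = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(coords):
        if patch[r][c] >= center:  # neighbour at least as bright as the centre -> bit set
            code |= 1 << bit
    return code


def lbp_image(img):
    """Apply lbp_code to every interior pixel of a 2-D list of grey levels."""
    h, w = len(img), len(img[0])
    return [
        [lbp_code([row[c - 1:c + 2] for row in img[r - 1:r + 2]]) for c in range(1, w - 1)]
        for r in range(1, h - 1)
    ]


if __name__ == "__main__":
    # Toy grey-level patch, not taken from the paper's spectrograms.
    patch = [[6, 5, 2],
             [7, 6, 1],
             [9, 8, 7]]
    print(lbp_code(patch))
```

A histogram of the resulting codes over an image is the usual way such labels are turned into a feature vector.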
Test results of the proposed method according to classifiers.
| Classifiers | Folds | Sensitivity (%) | Specificity (%) | Precision(%) | Accuracy(%) | F1 Score |
|---|---|---|---|---|---|---|
| SVM | Fold1 | 97.63 | 98.4 | 97.38 | 99.17 | 99.02 |
| | Fold2 | 96.89 | 98.11 | 98.19 | 98.39 | 98.47 |
| | Fold3 | 97.24 | 98.58 | 97.15 | 97.86 | 97.36 |
| | Fold4 | 99.72 | 99.12 | 98.26 | 98.64 | 99.81 |
| | Fold5 | 98.12 | 99.04 | 94.97 | 96.54 | 99.54 |
| k-NN | Fold1 | 97.81 | 96.48 | 96.23 | 96.89 | 98.12 |
| | Fold2 | 95.81 | 98.17 | 95.38 | 98.18 | 97.84 |
| | Fold3 | 95.26 | 98.48 | 96.19 | 96.75 | 98.47 |
| | Fold4 | 94.97 | 99.95 | 95.52 | 96.94 | 97.81 |
| | Fold5 | 95.85 | 98.17 | 96.83 | 97.19 | 98.21 |
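For reference, the column headings in this table correspond to standard binary-classification metrics. The sketch below shows their conventional definitions from confusion-matrix counts; the function name `classification_metrics` and the example counts are hypothetical and are not values from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (returned as fractions)."""
    sensitivity = tp / (tp + fn)            # recall / true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts for one hypothetical fold.
    for name, value in classification_metrics(tp=90, fp=5, tn=95, fn=10).items():
        print(f"{name}: {100 * value:.2f}%")
```

In a k-fold setting these would be computed per fold on the held-out samples and then averaged.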
Fig. 7. The ROC curve of the proposed method in SVM and k-NN.
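An ROC curve of this kind is obtained by sweeping a decision threshold over classifier scores and recording the false positive and true positive rates at each step. The following is a minimal plain-Python sketch under that general definition; `roc_points` and `auc` are hypothetical helper names, and the labels and scores are invented for the example.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    labels = [1, 1, 0, 1, 0, 0]            # 1 = positive class (invented data)
    scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(auc(roc_points(labels, scores)))
```

The sketch assumes both classes are present in `labels`; otherwise one of the rates is undefined.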
A comparison of the proposed method with other methods.
| Authors | Methods | Dataset | Results (Accuracy) |
|---|---|---|---|
| Uhlenhaut | PCR-based method | Lung cellular DNA | – |
| Guo et al. | Microarray-based method | 19 cDNAs | 100% |
| Mani et al. | In silico bioinformatics analysis approach | Indian genome sequences | >98.3% |
| Aslan et al. | A new method based on KNN | CpG island | 98.4% |
| Zhang et al. | Random Forest (RF), KNN, SVM, Decision Tree (DT) | PSMB8, COLCA2, | RF: 89.3% |
| Chen et al. | Random Forest (RF), KNN, SVM, Decision Tree (DT) | FAM83A, LGALS3BP, | SVM: 88.5% |
| Li et al. | Boruta feature filtering, Decision Tree, Random Forest | IRF9 | KNN: 83.8% |
| | | Genes with accession number | DT: 80.8% |
| | | Single-cell with accession number E-MTAB-10026 | DT: 86.7% |
| | | | KNN: 88.2% |
| | | | SVM: 93.8% |
| | | | RF: 92.3% |
| | | | Max RF: 90.9% (macro F1) on B cell |