Naiyar Iqbal, Pradeep Kumar.
Abstract
BACKGROUND: The world has been battling the ongoing COVID-19 pandemic, caused by the SARS-CoV-2 virus, for the last two years. Predicting viral disease has long been a matter of interest in virology and in the study of disease transmission.
Keywords: COVID-19; Classification; Machine learning; Prediction; Predictor; RNA-Seq; SARS-CoV-2
Year: 2022 PMID: 35687925 PMCID: PMC9162937 DOI: 10.1016/j.compbiomed.2022.105684
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 6.698
Fig. 1. Year-wise publications associated with COVID-19, RNA-Seq, and Machine Learning in the PubMed repository.
List of interactive machine learning based COVID-19 models.
| MODEL & YEAR | METHODS | DISEASE | ACCURACY (%) | Gap/Future Work |
|---|---|---|---|---|
| | Convolutional neural network (CNN) and a bi-directional long short-term memory (BiLSTM) network | Coronaviridae | 99.90 and 85.80 | It is necessary to classify a broader spectrum of viruses. |
| | Dimension reduction (DR) and Sparse Representation (SR) | SARS-CoV-2 | 67.70 Spalt1 | Improve the function of jSRC. |
| | NLP Techniques: k-mer, Bag-of-Descriptors (BoDs), and Bag-of-Unique-Descriptors (BoUDs) | SARS-CoV-2 | 100 | To forecast the virus class, the results must also be examined by machine intelligence technology. |
Fig. 2. Integrated workflow for RNA-Seq data processing and Machine Learning COVID-19 prediction.
Fig. 3. Confusion matrix and classification performance metrics.
Fig. 4. MA Plot of DEGs on log2FC ≥ 2 using DESeq2.
Fig. 6. MA Plot of DEGs on log2FC ≥ 2 using Limma Trend.
Fig. 8. MA Plot of DEGs on log2FC ≥ 2 using Limma Voom.
Fig. 5. Volcano Plot of DEGs on log2FC ≥ 2 using DESeq2.
Fig. 7. Volcano Plot of DEGs on log2FC ≥ 2 using Limma Trend.
Fig. 9. Volcano Plot of DEGs on log2FC ≥ 2 using Limma Voom.
No. of Genes and corresponding accuracy rate based on DEGs levels and Log2 fold change.
| ML Algorithm | DEGs Level | No. of Genes (\|log2FC\| > 0) | Accuracy (\|log2FC\| > 0) | No. of Genes (\|log2FC\| ≥ 1) | Accuracy (\|log2FC\| ≥ 1) | No. of Genes (\|log2FC\| ≥ 2) | Accuracy (\|log2FC\| ≥ 2) | No. of Genes (\|log2FC\| ≥ 3) | Accuracy (\|log2FC\| ≥ 3) | Mean Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM | | 7628 | 98.02 | 657 | 98.49 | 36 | 95.70 | 3 | 89.88 | |
| KNN | | 7628 | 94.07 | 657 | 96.28 | 36 | 95.23 | 3 | 88.95 | 93.63 |
| NB | | 7628 | 94.77 | 657 | 95.12 | 36 | 93.37 | 3 | 87.09 | 92.59 |
| RF | | 7628 | 93.60 | 657 | 95.00 | 36 | 91.51 | 3 | 88.72 | 92.21 |
| DT | | 7628 | 94.77 | 657 | 91.16 | 36 | 83.02 | 3 | 81.28 | 87.56 |
| SVM | DEGsModerate | 8901 | 97.56 | 1586 | 98.37 | 67 | 99.07 | 6 | 93.60 | |
| KNN | DEGsModerate | 8901 | 92.91 | 1586 | 94.42 | 67 | 97.33 | 6 | 94.65 | 94.83 |
| NB | DEGsModerate | 8901 | 93.60 | 1586 | 94.88 | 67 | 95.23 | 6 | 92.21 | 93.98 |
| RF | DEGsModerate | 8901 | 93.49 | 1586 | 93.84 | 67 | 94.19 | 6 | 95.70 | 94.31 |
| DT | DEGsModerate | 8901 | 94.88 | 1586 | 90.81 | 67 | 82.91 | 6 | 81.16 | 87.44 |
| SVM | | 10005 | 95.81 | 2246 | 97.09 | 90 | 98.14 | 8 | 91.86 | |
| KNN | | 10005 | 92.44 | 2246 | 94.88 | 90 | 96.05 | 8 | 94.53 | 94.48 |
| NB | | 10005 | 93.84 | 2246 | 93.14 | 90 | 94.88 | 8 | 93.02 | 93.72 |
| RF | | 10005 | 93.72 | 2246 | 93.84 | 90 | 94.07 | 8 | 93.60 | 93.81 |
| DT | | 10005 | 94.88 | 2246 | 91.40 | 90 | 80.70 | 8 | 81.86 | 87.21 |
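The Mean Accuracy column appears to be the plain arithmetic mean of the four per-threshold accuracies. The sketch below checks that reading against the first block of rows (7628, 657, 36 and 3 genes); the dictionary layout and the helper name `mean_accuracy` are illustrative choices, not part of the original work.

```python
# Per-threshold accuracies (%) for |log2FC| > 0, >= 1, >= 2, >= 3,
# copied from the first block of rows in the table above.
accuracies = {
    "KNN": [94.07, 96.28, 95.23, 88.95],
    "NB": [94.77, 95.12, 93.37, 87.09],
    "RF": [93.60, 95.00, 91.51, 88.72],
    "DT": [94.77, 91.16, 83.02, 81.28],
}


def mean_accuracy(values):
    """Arithmetic mean of the per-threshold accuracies, rounded to 2 decimals."""
    return round(sum(values) / len(values), 2)


for name, values in accuracies.items():
    print(f"{name}: {mean_accuracy(values):.2f}")
```

Under this assumption the computed means agree with the reported 93.63, 92.59, 92.21 and 87.56.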
List of top five Up-Regulated and five Down-Regulated genes out of 67 DEGsModerate with |log2FC| ≥ 2.
| Regulation | Ensembl ID | Gene Symbol | Log2FoldChange | p-adj |
|---|---|---|---|---|
| Up-Regulated | ENSG00000275214 | IFI27 | +4.17787 | 1.88E-12 |
| Up-Regulated | ENSG00000170439 | METTL7B | +3.61618 | 2.12E-10 |
| Up-Regulated | ENSG00000115155 | OTOF | +3.35171 | 3.91E-09 |
| Up-Regulated | ENSG00000204936 | CD177 | +3.22541 | 2.92E-06 |
| Up-Regulated | ENSG00000283802 | ADAMTS2 | +3.08239 | 2.47E-07 |
| Down-Regulated | ENSG00000154165 | GPR15 | −2.06062 | 1.34E-07 |
| Down-Regulated | ENSG00000082497 | SERTAD4 | −2.05997 | 4.95E-08 |
| Down-Regulated | ENSG00000092978 | GPATCH2 | −2.05432 | 4.06E-05 |
| Down-Regulated | ENSG00000180537 | RNF182 | −2.03799 | 5.79E-04 |
| Down-Regulated | ENSG00000079308 | TNS1 | −2.00083 | 1.86E-07 |
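As a minimal sketch of how such a list could be filtered and labelled, the snippet below applies an absolute log2 fold-change cutoff of 2 and assigns the regulation direction from the sign of the fold change. The adjusted p-value cutoff of 0.05 is an assumption (this section does not state the significance level), and the names `is_deg` and `regulation` are hypothetical.

```python
# (gene symbol, log2 fold change, adjusted p-value), copied from the table above.
DEGS = [
    ("IFI27", 4.17787, 1.88e-12),
    ("METTL7B", 3.61618, 2.12e-10),
    ("OTOF", 3.35171, 3.91e-09),
    ("CD177", 3.22541, 2.92e-06),
    ("ADAMTS2", 3.08239, 2.47e-07),
    ("GPR15", -2.06062, 1.34e-07),
    ("SERTAD4", -2.05997, 4.95e-08),
    ("GPATCH2", -2.05432, 4.06e-05),
    ("RNF182", -2.03799, 5.79e-04),
    ("TNS1", -2.00083, 1.86e-07),
]

LFC_CUTOFF = 2.0    # |log2FC| >= 2, as in the table caption
PADJ_CUTOFF = 0.05  # assumed significance level, not stated in this section


def is_deg(log2fc, padj):
    """True if the gene passes both the fold-change and the p-adj cutoff."""
    return abs(log2fc) >= LFC_CUTOFF and padj < PADJ_CUTOFF


def regulation(log2fc):
    """Label the direction of regulation from the sign of log2FC."""
    return "Up-Regulated" if log2fc > 0 else "Down-Regulated"


selected = [(gene, regulation(lfc)) for gene, lfc, padj in DEGS if is_deg(lfc, padj)]
up = [gene for gene, label in selected if label == "Up-Regulated"]
down = [gene for gene, label in selected if label == "Down-Regulated"]

print("Up:", up)
print("Down:", down)
```

The absolute value matters here: the down-regulated genes have log2FC near −2 and would be dropped by a one-sided `log2FC >= 2` test.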
Fig. 10. Distribution of DEGsModerate biomarkers based on Log2 Fold Change.
Fig. 11. Classification outcomes of machine learning techniques of DEGsModerate with |log2FC| ≥ 2.
Feature dimension comparison of ML models performance based on fold change parameter of DEGsModerate.
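| ML Algorithm | Feature Dimension: Log2 Fold Change | Feature Dimension: Selected Genes | Accuracy (%) | Accuracy Effect (%) | Effect |
|---|---|---|---|---|---|
| Support Vector Machine | \|log2FC\| > 0 | 8901 | 97.56 | | |
| Support Vector Machine | \|log2FC\| ≥ 1 | 1586 | 98.37 | +0.81 | Increase |
| Support Vector Machine | \|log2FC\| ≥ 2 | 67 | 99.07 | | |
| Support Vector Machine | \|log2FC\| ≥ 3 | 6 | 93.60 | −5.47 | Decrease |
| K-Nearest Neighbor | \|log2FC\| > 0 | 8901 | 92.91 | | |
| K-Nearest Neighbor | \|log2FC\| ≥ 1 | 1586 | 94.42 | +1.51 | Increase |
| K-Nearest Neighbor | \|log2FC\| ≥ 2 | 67 | 97.33 | | |
| K-Nearest Neighbor | \|log2FC\| ≥ 3 | 6 | 94.65 | −2.68 | Decrease |
| Naïve Bayes | \|log2FC\| > 0 | 8901 | 93.60 | | |
| Naïve Bayes | \|log2FC\| ≥ 1 | 1586 | 94.88 | +1.28 | Increase |
| Naïve Bayes | \|log2FC\| ≥ 2 | 67 | 95.23 | | |
| Naïve Bayes | \|log2FC\| ≥ 3 | 6 | 92.21 | −3.02 | Decrease |
| Random Forest | \|log2FC\| > 0 | 8901 | 93.49 | | |
| Random Forest | \|log2FC\| ≥ 1 | 1586 | 93.84 | +0.35 | Increase |
| Random Forest | \|log2FC\| ≥ 2 | 67 | 94.19 | | |
| Random Forest | \|log2FC\| ≥ 3 | 6 | 95.70 | +1.51 | Increase |
| Decision Tree | \|log2FC\| > 0 | 8901 | 94.88 | | |
| Decision Tree | \|log2FC\| ≥ 1 | 1586 | 90.81 | −4.07 | Decrease |
| Decision Tree | \|log2FC\| ≥ 2 | 67 | 82.91 | | |
| Decision Tree | \|log2FC\| ≥ 3 | 6 | 81.16 | −1.75 | Decrease |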
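The Accuracy Effect values appear to be the change in accuracy relative to the next-lower fold-change threshold. The sketch below reproduces them under that assumption, taking the |log2FC| ≥ 2 accuracies from the classification performance metrics reported for DEGsModerate (for example 99.07 for SVM). The helper name `accuracy_effect` is hypothetical.

```python
# Accuracy (%) per threshold, ordered |log2FC| > 0, >= 1, >= 2, >= 3.
# The >= 2 values are the classification accuracies reported for
# DEGsModerate with |log2FC| >= 2 in the performance-metrics table.
accuracy = {
    "SVM": [97.56, 98.37, 99.07, 93.60],
    "KNN": [92.91, 94.42, 97.33, 94.65],
    "NB": [93.60, 94.88, 95.23, 92.21],
    "RF": [93.49, 93.84, 94.19, 95.70],
    "DT": [94.88, 90.81, 82.91, 81.16],
}


def accuracy_effect(values):
    """Difference between each accuracy and the one at the previous threshold."""
    return [round(curr - prev, 2) for prev, curr in zip(values, values[1:])]


for name, values in accuracy.items():
    effects = accuracy_effect(values)
    labels = ["Increase" if e > 0 else "Decrease" for e in effects]
    print(name, list(zip(effects, labels)))
```

Read this way, the first and last differences for each algorithm match the tabulated effects (for SVM, +0.81 and −5.47), which suggests that the |log2FC| ≥ 3 effect is measured against the |log2FC| ≥ 2 accuracy.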
Classification performance metrics of trained ML models of DEGsModerate with |log2FC| ≥ 2 (in %).
| ML Algorithm | SVM | kNN | Naïve Bayes | Random Forest | Decision Tree |
|---|---|---|---|---|---|
| Classification Accuracy | 99.07 | 97.33 | 95.23 | 94.19 | 82.91 |
| Sensitivity | 98.71 | 97.90 | 93.87 | 96.45 | 97.58 |
| Specificity | 100.00 | 95.83 | 98.75 | 88.33 | 70.83 |
| Precision | 100.00 | 98.38 | 99.49 | 95.54 | 88.64 |
| FPR | 0.00 | 4.17 | 1.25 | 11.67 | 29.17 |
| NPV | 96.80 | 94.68 | 86.21 | 90.61 | 68.98 |
| RMC | 0.93 | 2.67 | 4.77 | 5.81 | 17.09 |
| F1 | 99.35 | 98.14 | 96.60 | 95.99 | 88.07 |
| Area under ROC (Control) | 99.19 | 97.38 | 98.86 | 98.66 | 81.28 |
| Area under ROC (Treated) | 99.19 | 97.38 | 99.46 | 98.66 | 81.28 |
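For reference, the metrics in the table follow the standard confusion-matrix definitions; FPR is the complement of specificity and RMC (rate of misclassification) is the complement of accuracy, which the tabulated values are consistent with. The sketch below computes them from a binary confusion matrix. The counts used are illustrative only and are not taken from the paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics, returned as percentages."""
    total = tp + fn + fp + tn
    metrics = {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),       # true positive rate (recall)
        "specificity": tn / (tn + fp),       # true negative rate
        "precision": tp / (tp + fp),         # positive predictive value
        "fpr": fp / (fp + tn),               # equals 1 - specificity
        "npv": tn / (tn + fn),               # negative predictive value
        "rmc": (fp + fn) / total,            # equals 1 - accuracy
        "f1": 2 * tp / (2 * tp + fp + fn),   # harmonic mean of precision and recall
    }
    return {name: round(100 * value, 2) for name, value in metrics.items()}


# Hypothetical counts chosen for illustration; not the study's confusion matrix.
example = classification_metrics(tp=612, fn=8, fp=0, tn=240)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

This sketch does not guard against empty classes (for instance, no predicted positives), which would need explicit handling in real use.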