Farshad Saberi-Movahed, Mahyar Mohammadifard, Adel Mehrpooya, Mohammad Rezaei-Ravari, Kamal Berahmand, Mehrdad Rostami, Saeed Karami, Mohammad Najafzadeh, Davood Hajinezhad, Mina Jamshidi, Farshid Abedi, Mahtab Mohammadifard, Elnaz Farbod, Farinaz Safavi, Mohammadreza Dorvash, Negar Mottaghi-Dastjerdi, Shahrzad Vahedi, Mahdi Eftekhari, Farid Saberi-Movahed, Hamid Alinejad-Rokny, Shahab S Band, Iman Tavassoly.
Abstract
One of the most critical challenges in managing complex diseases like COVID-19 is establishing an intelligent triage system that can optimize clinical decision-making during a global pandemic. Clinical presentation and patient characteristics are usually used to identify the patients who need more critical care. However, clinical evidence shows an unmet need for more accurate and optimal clinical biomarkers to triage patients under conditions like the COVID-19 crisis. Here we present a machine learning approach to find a group of clinical indicators from the blood tests of a set of COVID-19 patients that are predictive of poor prognosis and morbidity. Our approach consists of two interconnected schemes: Feature Selection and Prognosis Classification. The former is based on different Matrix Factorization (MF)-based methods, and the latter is performed using the Random Forest algorithm. Our model reveals that Arterial Blood Gas (ABG) O2 Saturation and C-Reactive Protein (CRP) are the most important clinical biomarkers determining poor prognosis in these patients. Our approach paves the way for building quantitative and optimized clinical management systems for COVID-19 and similar diseases.
Keywords: COVID-19; Clinical biomarker; Dimensionality reduction; Feature selection; Matrix factorization
Year: 2022 PMID: 35569336 PMCID: PMC8979841 DOI: 10.1016/j.compbiomed.2022.105426
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 6.698
Fig. 1. Chronological and detailed illustration of the basic framework for the methods MFFS, MPMR, SGFS, RMFFS and SLSDR.
Details of ten gene expression datasets used in the experiments.
| Dataset | # Samples | # Features | # Classes |
|---|---|---|---|
| Embryonal Tumors of CNS (CNS) | 60 | 7129 | 2 |
| Colon Cancer | 62 | 2000 | 2 |
| Diffuse Large B-Cell Lymphoma (DLBCL) | 47 | 4026 | 2 |
| GLIOMA | 50 | 4434 | 4 |
| Leukemia | 72 | 7070 | 2 |
| Lung Cancer | 203 | 3312 | 5 |
| Lymphoma | 96 | 4026 | 9 |
| Prostate Tumor | 102 | 10509 | 2 |
| Small Round Blue Cell Tumors (SRBCT) | 83 | 2328 | 4 |
| TOX-171 | 171 | 5748 | 4 |
The per-iteration computational complexity comparison among different feature selection methods. Note that n is the number of samples, d is the number of features, and k is the number of selected features.
| Method | Computational complexity |
|---|---|
| MFFS | |
| MPMR | |
| SGFS | |
| RMFFS | |
| SLSDR | |
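
To make the roles of n, d, and k concrete, here is a minimal sketch of one matrix-factorization feature selector in the spirit of MFFS. The record above gives no update formulas, so everything here is an assumption: the objective ‖X − XWH‖²_F + (ρ/2)‖WᵀW − I‖²_F with W, H ≥ 0, the multiplicative update rules, the penalty weight `rho`, and the function name `mffs_select` are illustrative choices, not the authors' implementation.

```python
import numpy as np

def mffs_select(X, k, rho=10.0, n_iter=200, seed=0, eps=1e-12):
    """Pick k feature indices by factorizing X ~ X @ W @ H with W, H >= 0.

    X is an (n samples) x (d features) nonnegative matrix.
    Assumed update rules (not taken from the source record).
    """
    X = np.asarray(X, dtype=float)
    if X.min() < 0:
        raise ValueError("X must be nonnegative, e.g. min-max scaled")
    n, d = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((d, k))   # feature weight matrix, one row per feature
    H = rng.random((k, d))   # coefficients reconstructing all d features
    A = X.T @ X              # d x d Gram matrix, computed once
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        W *= (A @ H.T + rho * W) / (A @ W @ (H @ H.T) + rho * W @ (W.T @ W) + eps)
        H *= (W.T @ A) / (W.T @ A @ W @ H + eps)
    # Rank features by the Euclidean norm of their row in W; keep the top k.
    scores = np.linalg.norm(W, axis=1)
    selected = np.sort(np.argsort(scores)[-k:])
    return selected, W, H

if __name__ == "__main__":
    toy = np.random.default_rng(1).random((30, 6))  # made-up data
    idx, _, _ = mffs_select(toy, k=2)
    print("selected feature indices:", idx.tolist())
```

In this sketch the Gram matrix is formed once and each iteration works only with d × d and d × k products, which is why per-iteration cost is usually reported separately from setup cost.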
Fig. 2. Performance metrics of the Random Forest classifier: (a) Classification ACC, (b) TPR, (c) TNR, (d) PPV, (e) NPV, and (f) AUC.
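
For reference, the six metrics named in this caption can be computed from binary labels as sketched below. This uses the standard confusion-matrix definitions and the rank-based (Mann-Whitney) form of AUC; the function names and the toy labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return ACC, TPR, TNR, PPV and NPV for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "ACC": ratio(tp + tn, tp + tn + fp + fn),  # accuracy
        "TPR": ratio(tp, tp + fn),                 # sensitivity / recall
        "TNR": ratio(tn, tn + fp),                 # specificity
        "PPV": ratio(tp, tp + fp),                 # precision
        "NPV": ratio(tn, tn + fn),
    }

def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Made-up labels and scores, for illustration only.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
    print(binary_metrics(y_true, y_pred))
    print("AUC:", auc(y_true, scores))
```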
Fig. 3. Frequency of the pairs of features (biomarkers) selected by the various feature selection methods at each iteration of the 10-fold CV of the Random Forest classifier. The feature selection methods are (a) MFFS, (b) MPMR, (c) RMFFS, (d) SGFS, and (e) SLSDR.
Fig. 4. Aggregate frequency of features (biomarkers) that are selected by all feature selection methods together at all iterations of the 10-fold CV of the Random Forest classifier, where at each iteration only two features (k = 2) are selected.
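
The tallies described in the two frequency captions can be sketched as follows: given the pair of features chosen at each cross-validation iteration, count how often each unordered pair appears and how often each individual feature appears overall. The function name `tally_selections` and the feature labels in the example are hypothetical placeholders, and the fold data are made up.

```python
from collections import Counter

def tally_selections(selections_per_fold):
    """Count selected feature pairs and individual features across CV folds.

    selections_per_fold: iterable of 2-element collections of feature names,
    one entry per fold (and per method, if several methods are pooled).
    """
    pair_counts = Counter()
    feature_counts = Counter()
    for selected in selections_per_fold:
        pair = tuple(sorted(selected))  # treat (a, b) and (b, a) as the same pair
        pair_counts[pair] += 1
        feature_counts.update(pair)     # each feature in the pair counts once
    return pair_counts, feature_counts

if __name__ == "__main__":
    # Made-up selections for three folds, for illustration only.
    folds = [("CRP", "O2_sat"), ("O2_sat", "CRP"), ("CRP", "WBC")]
    pairs, features = tally_selections(folds)
    print(pairs.most_common())
    print(features.most_common())
```

Pooling the per-fold selections from several methods into one list before calling the function gives the aggregate view; calling it once per method gives the per-method view.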