Roberto Bárcenas, Ruth Fuentes-García.
Abstract
Understanding how SARS-CoV-2 infection, which causes COVID-19, spread among the population was fundamental to determining the risk factors associated with severe cases or death. During the pandemic, Artificial Intelligence (AI) and Machine Learning (ML) have been successfully applied in many areas, such as biomedicine. Using a dataset from the Mexican Ministry of Health, we built a multiclass classification scheme for detecting risk levels in COVID-19 patients and implemented three Machine Learning algorithms, which achieved the following accuracies: Random Forest (89.86%), GBM (89.37%), and XGBoost (89.97%). The key finding is the identification of relevant factors associated with different severities of COVID-19. These factors include sex, age, days elapsed since the onset of symptoms, symptoms such as dyspnea and polypnea, and comorbidities such as diabetes and hypertension. This setting allows us to establish predictive algorithms that model the risk an individual or a specific group of people faces after contracting COVID-19, as well as the factors associated with developing complications or receiving appropriate treatment.
Keywords: COVID-19; Feature importance; Machine Learning; Multiclass classification; Risk assessment
Year: 2022; PMID: 35873009; PMCID: PMC9295315; DOI: 10.1016/j.imu.2022.101023
Source DB: PubMed; Journal: Inform Med Unlocked; ISSN: 2352-9148
Fig. 1. Progression grouped by Sex, Age and Days elapsed.
Fig. 2. Risk level by comorbidities and major symptoms. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 3. Risk level by Health System features.
Confusion matrices.
| | Random Forest: Low Risk | Random Forest: Moderate Risk | Random Forest: High Risk | GBM: Low Risk | GBM: Moderate Risk | GBM: High Risk | XGBoost: Low Risk | XGBoost: Moderate Risk | XGBoost: High Risk |
|---|---|---|---|---|---|---|---|---|---|
| Low Risk | 29927 | 0 | | 29937 | 3 | | 29935 | 0 | |
| Moderate Risk | 2 | 2112 | | 1 | 1686 | | 3 | 1275 | |
| High Risk | 15 | 2901 | 7620 | 6 | 3324 | 7819 | 6 | 3738 | 8111 |
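Some of the reported per-class measures can be checked directly from the cells that survive in this record. The sketch below does so for the Random Forest block. It assumes that rows are predicted classes and columns are reference classes, which the record does not state. The names `RF`, `column_sensitivity`, and `row_precision` are illustrative, and `None` marks cells that are absent here.

```python
# Random Forest confusion matrix as it appears in this record.
# Assumption: rows = predicted class, columns = reference class.
# None marks cells that are missing from the record (not zeros).
LABELS = ["Low", "Moderate", "High"]
RF = [
    [29927, 0, None],
    [2, 2112, None],
    [15, 2901, 7620],
]


def column_sensitivity(matrix, k):
    """Sensitivity of class k: diagonal cell divided by its column sum.

    Returns None when any cell of the column is missing.
    """
    column = [row[k] for row in matrix]
    if any(value is None for value in column):
        return None
    return matrix[k][k] / sum(column)


def row_precision(matrix, k):
    """Precision of class k: diagonal cell divided by its row sum.

    Returns None when any cell of the row is missing.
    """
    row = matrix[k]
    if any(value is None for value in row):
        return None
    return matrix[k][k] / sum(row)


for k, label in enumerate(LABELS):
    sens = column_sensitivity(RF, k)
    prec = row_precision(RF, k)
    print(label, "sensitivity:", sens, "precision:", prec)
```

Under that orientation, the Low Risk sensitivity and the High Risk precision computed this way agree with the Random Forest values listed under the performance measures, which suggests the assumption is reasonable.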
Performance measures (%). Overall accuracy: Random Forest 89.86%, GBM 89.37%, XGBoost 89.97%.
| Measure | Random Forest: Low Risk | Random Forest: Moderate Risk | Random Forest: High Risk | GBM: Low Risk | GBM: Moderate Risk | GBM: High Risk | XGBoost: Low Risk | XGBoost: Moderate Risk | XGBoost: High Risk |
|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 99.94 | | | 99.98 | | | 99.97 | | |
| Specificity | 96.22 | 97.38 | 91.66 | 96.13 | 97.92 | 90.47 | 96.18 | 98.65 | 89.29 |
| Precision | 98.24 | 67.37 | 72.32 | 98.20 | 67.47 | 70.13 | 98.22 | 70.75 | 68.42 |
| F1 score | 99.08 | 51.84 | 77.31 | 99.08 | 44.89 | 76.94 | 99.09 | 37.42 | 77.13 |
| Balanced accuracy (BA) | 98.08 | 69.75 | 87.35 | 98.05 | 65.78 | 87.84 | 98.08 | 62.04 | 88.84 |
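The measures in this table are linked by standard one-vs-rest definitions: F1 is the harmonic mean of precision and sensitivity, and balanced accuracy (BA) is the mean of sensitivity and specificity. The sketch below is a consistency check under those definitions and is not taken from the authors' code; the function names are illustrative. Applied to the Random Forest Low Risk column, it reproduces the row of values beginning 99.08 and the BA row. It also shows how a sensitivity value absent from this record could be backed out of BA and specificity.

```python
def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (same units as the inputs)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def balanced_accuracy(sensitivity, specificity):
    """Arithmetic mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2


def sensitivity_from_ba(ba, specificity):
    """Invert the BA definition to recover a sensitivity that is not listed."""
    return 2 * ba - specificity


# Random Forest, Low Risk column: all three inputs are present in the table.
print("F1 (Low):", f1_score(98.24, 99.94))
print("BA (Low):", balanced_accuracy(99.94, 96.22))

# Random Forest, Moderate Risk column: sensitivity is not listed,
# so it is estimated from BA and specificity (rounded inputs, so approximate).
moderate_sensitivity = sensitivity_from_ba(69.75, 97.38)
print("Estimated sensitivity (Moderate):", moderate_sensitivity)
print("F1 (Moderate) from that estimate:", f1_score(67.37, moderate_sensitivity))
```

Because the inputs are rounded to two decimals, values recovered this way can differ from the published ones in the last digit.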
Fig. 4. Training algorithms: Feature importance.