Alejandro Reina Reina<sup>1,2</sup>, José M Barrera<sup>3,4</sup>, Bernardo Valdivieso<sup>5</sup>, María-Eugenia Gas<sup>5</sup>, Alejandro Maté<sup>3,4</sup>, Juan C Trujillo<sup>3,4</sup>.
Abstract
The surge of patients infected with SARS-CoV-2 has overwhelmed healthcare systems around the world. This raises several challenges: predicting hospital needs, optimizing resources, building diagnostic triage tools, forecasting patient evolution, and identifying the factors that determine disease severity. It is now widely accepted that one of the main problems since the pandemic began has been detecting (i) which patients were about to require admission to the Intensive Care Unit (ICU) and (ii) which patients were unlikely to survive the disease. These critical patients overwhelmed hospitals to the point that many surgeries around the world had to be cancelled. The aim of this paper is therefore to provide a Machine Learning (ML) model that helps anticipate when a patient is about to become critical. Although we live in the era of data, few tools and solutions currently help medical professionals predict the evolution of SARS-CoV-2 patients in order to improve their treatment and plan the demand for critical hospital resources. Moreover, most of these tools were built from small and/or Chinese populations, which carries a high risk of bias. In this paper, we present an ML-based model built from the data of 5378 Spanish patients, from which a quality cohort of 1201 patients was extracted to train the model. Our model predicts the probability of death of patients with SARS-CoV-2 from the patient's age, sex and comorbidities. It also supports what-if analysis, in which comorbidities that the patient may develop during the SARS-CoV-2 infection are added to the input. To train the model, we followed an agnostic approach: we explored all comorbidities active during each patient's SARS-CoV-2 infection so that the model weights the effect of each comorbidity on the patient's evolution according to the available data.
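The paragraph above describes a model whose inputs are age, sex and comorbidities, and a what-if analysis that adds a comorbidity the patient may develop. The sketch below illustrates that idea under loudly stated assumptions: the three-code ICD-9-CM vocabulary, the logistic scorer and every weight and bias value are hypothetical placeholders chosen for illustration, not the published model or its parameters. It only shows how a multi-hot comorbidity encoding makes a what-if comparison straightforward.

```python
import math

# Hypothetical ICD-9-CM vocabulary (illustration only; the paper explored all active comorbidities).
ICD9_VOCAB = ["250.00", "401.9", "496"]

# Hypothetical logistic-regression weights: NOT the published model's parameters.
WEIGHTS = {"age": 0.06, "male": 0.3, "250.00": 0.5, "401.9": 0.4, "496": 0.7}
BIAS = -7.0


def encode(age, sex, comorbidities):
    """Build a feature dict: age, a male flag and a multi-hot ICD-9-CM vector."""
    features = {"age": float(age), "male": 1.0 if sex == "M" else 0.0}
    for code in ICD9_VOCAB:
        features[code] = 1.0 if code in comorbidities else 0.0
    return features


def death_probability(features):
    """Logistic score: sigmoid of the weighted sum of the features plus the bias."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def what_if(age, sex, comorbidities, new_code):
    """Return the predicted risk before and after adding one comorbidity."""
    before = death_probability(encode(age, sex, comorbidities))
    after = death_probability(encode(age, sex, comorbidities | {new_code}))
    return before, after


# A 78-year-old woman with hypertension (401.9): what if she develops chronic airway obstruction (496)?
before, after = what_if(78, "F", {"401.9"}, "496")
print(f"risk before: {before:.3f}, risk after: {after:.3f}")
```

Logistic regression is used here only because it is one of the algorithms compared later in the record and its scores are easy to read; any classifier exposing a probability would support the same before/after comparison.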
The model was validated using stratified cross-validation with k = 5, which preserves the proportion of survivors and non-survivors in every fold and thus accounts for class imbalance. We obtained robust results, with 84.16% accuracy, 83.33% sensitivity, and an Area Under the ROC Curve (AUC) of 0.871. Beyond its high success rate, the main advantage of our model is that it can be applied to existing medical records to predict each patient's prognosis, allowing the critical population to be identified in advance. Furthermore, it uses the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) standard. Hospitals that use other encodings can therefore add an intermediate business-to-business (B2B) layer that translates their codes into this international format.
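The B2B layer mentioned above can be pictured as a code-translation step placed in front of the model. The sketch below assumes the source hospital uses ICD-10-CM; the three-entry crosswalk is a small illustrative subset (a real deployment would load a complete mapping such as the CMS General Equivalence Mappings), and the function name `to_icd9` is hypothetical. Codes with no ICD-9-CM equivalent are reported rather than silently dropped.

```python
# Illustrative subset of an ICD-10-CM -> ICD-9-CM crosswalk (assumption, not a complete mapping).
ICD10_TO_ICD9 = {
    "E11.9": "250.00",  # type 2 diabetes mellitus without complications
    "I10": "401.9",     # essential (primary) hypertension
    "J44.9": "496",     # COPD, unspecified -> chronic airway obstruction, NEC
}


def to_icd9(codes, mapping=ICD10_TO_ICD9):
    """Translate a patient's codes; return (translated set, list of unmapped codes)."""
    translated, unmapped = set(), []
    for code in codes:
        if code in mapping:
            translated.add(mapping[code])
        else:
            # Keep track of gaps so they can be reviewed instead of lost.
            unmapped.append(code)
    return translated, unmapped


# U07.1 (COVID-19) has no ICD-9-CM counterpart, so it is reported as unmapped.
translated, unmapped = to_icd9(["I10", "E11.9", "U07.1"])
print(translated, unmapped)
```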
Year: 2022 PMID: 35388055 PMCID: PMC8986770 DOI: 10.1038/s41598-022-09613-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Selection window of the comorbidities. The red window marks the period between the onset of the infection and seroconversion.
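Figure 1 describes keeping only the comorbidities that are active inside a time window. A minimal sketch of that selection is an interval-overlap test, assuming each diagnosis is stored as a hypothetical `(code, onset_date, resolution_date)` record with `None` for conditions that are still active; the record layout and the function name are assumptions, not the paper's data model.

```python
from datetime import date


def comorbidities_in_window(diagnoses, window_start, window_end):
    """Return the ICD-9-CM codes active at any time inside [window_start, window_end].

    diagnoses: iterable of (code, onset_date, resolution_date), where
    resolution_date is None for conditions that are still active (assumed format).
    """
    selected = set()
    for code, onset, resolved in diagnoses:
        # Two intervals overlap when each one starts before the other ends.
        starts_before_end = onset <= window_end
        ends_after_start = resolved is None or resolved >= window_start
        if starts_before_end and ends_after_start:
            selected.add(code)
    return selected


history = [
    ("401.9", date(2015, 3, 1), None),             # chronic hypertension, still active
    ("486", date(2019, 1, 10), date(2019, 2, 1)),  # past pneumonia, resolved before the window
    ("250.00", date(2020, 4, 5), None),            # diagnosed inside the window
]
active = comorbidities_in_window(history, date(2020, 3, 20), date(2020, 4, 7))
print(sorted(active))
```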
Figure 2. SARS-CoV-2 test interpretation table, showing whether each test can be positive depending on the phase of infection the patient is in (source: Instituto de Salud Carlos III).
Figure 3. Estimation of the date of the patient's exposure to the virus from the traceability of their tests.
Figure 4. Detection periods of SARS-CoV-2 RNA by PCR and of antibodies by serological techniques. The X axis shows the number of days elapsed, with day 0 being the onset of symptoms. The white stripe labelled “E” marks the date of exposure to the virus and the beginning of the infection (source: Instituto de Salud Carlos III).
Correction factor (days) by test type: the approximate number of days between infection and the point at which the test can, in general terms, detect that the patient is positive.
| Test | Correction factor (days) |
|---|---|
| PCR | 2 |
| IgA | 2 |
| Ac (total antibodies) | 11 |
| IgM | 14 |
| IgG | 18 |
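Combined with Figure 3, the correction factors suggest a simple way to estimate when a patient's infection began: shift each positive result back by the factor of its test type. The sketch below assumes that a patient may have several positive tests and that the earliest resulting estimate is kept; that aggregation rule and the function name are assumptions for illustration, not necessarily the paper's exact procedure.

```python
from datetime import date, timedelta

# Correction factors (days) copied from the table above.
CORRECTION_DAYS = {"PCR": 2, "IgA": 2, "Ac": 11, "IgM": 14, "IgG": 18}


def estimate_infection_start(positive_tests):
    """Estimate when the infection began from a patient's positive test results.

    positive_tests: iterable of (test_type, result_date). Each date is shifted back by
    the correction factor of its test; keeping the earliest estimate is an assumption.
    """
    estimates = [
        result_date - timedelta(days=CORRECTION_DAYS[test_type])
        for test_type, result_date in positive_tests
    ]
    if not estimates:
        raise ValueError("at least one positive test is required")
    return min(estimates)


# IgG on 20 April points to 2 April; PCR on 5 April points to 3 April; the earlier date wins.
start = estimate_infection_start([("IgG", date(2020, 4, 20)), ("PCR", date(2020, 4, 5))])
print(start)
```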
Baseline characteristics and comorbidity of patients with coronavirus disease (SARS-CoV-2).
| Variable | Group | N | Mean | Std | Min–max |
|---|---|---|---|---|---|
| Age of SARS-CoV-2 survivors | General | 1201 | 49.53 | 24.90 | 0–101 |
| | Women | 662 | 49.98 | 25.53 | 0–101 |
| | Men | 539 | 48.97 | 24.12 | 0–101 |
| Age of SARS-CoV-2 non-survivors | General | 154 | 80.89 | 13.11 | 11–101 |
| | Women | 75 | 83.43 | 14.46 | 11–101 |
| | Men | 79 | 78.49 | 11.27 | 49–101 |
| Number of comorbidities in the general population | General | 10,677 | 8.89 | 9.83 | 0–77 |
| | Women | 5112 | 8.40 | 9.05 | 0–77 |
| | Men | 5565 | 13.42 | 13.57 | 0–66 |
| Number of comorbidities in the deceased population | General | 3007 | 19.52 | 11.58 | 0–63 |
| | Women | 1241 | 16.77 | 9.03 | 1–54 |
| | Men | 1766 | 22.35 | 4 | 4–63 |
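Accuracy matrix for each combination of feature scaler and classification algorithm (SVM, support vector machine; LR, logistic regression; MLP, multilayer perceptron; GP, Gaussian process).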
| Scaler | SVM | LR | K-neighbors | Decision Tree | Naive Bayes | Random Forest | MLP | GP | AdaBoost | Bagging |
|---|---|---|---|---|---|---|---|---|---|---|
| MinMax | 0.8826 | 0.8876 | 0.8759 | 0.8776 | 0.5378 | 0.8901 | 0.8901 | 0.8718 | 0.886 | 0.8793 |
| Standard | 0.8868 | 0.8968 | 0.8818 | 0.8801 | 0.5378 | 0.8976 | 0.8926 | 0.8718 | 0.8843 | 0.8859 |
| MaxAbs | 0.8826 | 0.8876 | 0.8759 | 0.8751 | 0.5378 | 0.8943 | 0.8909 | 0.8718 | 0.8851 | 0.8793 |
| Robust | 0.8826 | 0.8968 | 0.8784 | 0.8743 | 0.5378 | 0.8868 | 0.8934 | 0.8726 | 0.8835 | 0.8818 |
| Quant-Normal | 0.8859 | 0.8968 | 0.8843 | 0.8826 | 0.5378 | 0.8951 | 0.8993 | 0.8734 | 0.8843 | 0.8918 |
| Quant-Uniform | 0.8809 | 0.8876 | 0.8759 | 0.8693 | 0.5378 | 0.8984 | 0.8893 | 0.8718 | 0.886 | 0.8793 |
| PowerTransformer (Yeo-Johnson) | 0.8818 | 0.8968 | 0.8776 | 0.8693 | 0.5378 | 0.8935 | 0.8951 | 0.8718 | 0.8843 | 0.8859 |
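Reading the accuracy matrix as a screening grid, two questions are natural: which single scaler/algorithm cell scores highest, and which algorithm is most robust to the choice of scaler. The sketch below simply copies the values from the table and answers both; it is arithmetic on the reported numbers and says nothing about which configuration the authors finally selected after parameter tuning.

```python
from statistics import mean

ALGORITHMS = ["SVM", "LR", "K-neighbors", "Decision Tree", "Naive Bayes",
              "Random Forest", "MLP", "GP", "AdaBoost", "Bagging"]

# Accuracy values copied from the table above (one row per scaler).
ACCURACY = {
    "MinMax":        [0.8826, 0.8876, 0.8759, 0.8776, 0.5378, 0.8901, 0.8901, 0.8718, 0.886, 0.8793],
    "Standard":      [0.8868, 0.8968, 0.8818, 0.8801, 0.5378, 0.8976, 0.8926, 0.8718, 0.8843, 0.8859],
    "MaxAbs":        [0.8826, 0.8876, 0.8759, 0.8751, 0.5378, 0.8943, 0.8909, 0.8718, 0.8851, 0.8793],
    "Robust":        [0.8826, 0.8968, 0.8784, 0.8743, 0.5378, 0.8868, 0.8934, 0.8726, 0.8835, 0.8818],
    "Quant-Normal":  [0.8859, 0.8968, 0.8843, 0.8826, 0.5378, 0.8951, 0.8993, 0.8734, 0.8843, 0.8918],
    "Quant-Uniform": [0.8809, 0.8876, 0.8759, 0.8693, 0.5378, 0.8984, 0.8893, 0.8718, 0.886, 0.8793],
    "Yeo-Johnson":   [0.8818, 0.8968, 0.8776, 0.8693, 0.5378, 0.8935, 0.8951, 0.8718, 0.8843, 0.8859],
}

# Best single (scaler, algorithm, accuracy) cell in the grid.
best = max(
    ((scaler, algo, acc) for scaler, row in ACCURACY.items() for algo, acc in zip(ALGORITHMS, row)),
    key=lambda cell: cell[2],
)

# Mean accuracy of each algorithm across all scalers (robustness to the scaler choice).
per_algorithm = {
    algo: mean(row[i] for row in ACCURACY.values()) for i, algo in enumerate(ALGORITHMS)
}

print(best)
print(max(per_algorithm, key=per_algorithm.get))
```

On these values the best single cell is Quant-Normal with MLP (0.8993), while Random Forest has the highest mean accuracy across scalers; Naive Bayes is unaffected by scaling and stays at 0.5378.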
Figure 5. Accuracy obtained in each iteration of stratified k-fold cross-validation (k = 5) for each algorithm, used to identify the algorithm that performs best a priori for this use case.
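Stratified k-fold cross-validation splits the data so that every test fold keeps the class proportions of the whole dataset, which matters here because deaths are a minority class. The sketch below is a minimal stdlib version of that technique: it shuffles the indices of each class and deals them round-robin into k folds. It mirrors the idea behind library implementations such as scikit-learn's `StratifiedKFold` but is not that implementation, and the function name is an assumption.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class proportions.

    The indices of each class are shuffled and dealt round-robin into k folds.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# 90 survivors (0) and 10 deaths (1): every test fold gets 18 survivors and 2 deaths.
labels = [0] * 90 + [1] * 10
for train, test in stratified_k_fold(labels, k=5):
    print(len(train), len(test), sum(labels[i] for i in test))
```

Per-fold accuracies obtained this way are what a plot like Figure 5 summarizes for each algorithm.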
Figure 6. Confusion matrix obtained after parameter optimization, computed on data not previously seen by the algorithm in order to validate the results. Label 0 means that the patient survives the infection; label 1 means that the patient dies.
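The accuracy and sensitivity reported in the abstract are read off a confusion matrix like the one in Figure 6. The sketch below shows that arithmetic with label 1 = death and label 0 = survival, as in the figure; the labels and predictions in the example are made up for illustration and are not the paper's data.

```python
def confusion_counts(y_true, y_pred):
    """Count (tn, fp, fn, tp); label 1 = patient dies, label 0 = patient survives."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def summary_metrics(tn, fp, fn, tp):
    """Accuracy, sensitivity (recall on deaths) and specificity (recall on survivors)."""
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Illustrative labels only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, summary_metrics(*counts))
```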
Figure 7. Receiver operating characteristic (ROC) curve of the best algorithm obtained. The Y axis shows the True Positive (TP) rate and the X axis the False Positive (FP) rate; the Area Under the Curve (AUC) is 0.871.
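An AUC such as 0.871 has a direct interpretation: it is the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor, with ties counted as one half (the Mann-Whitney formulation). The quadratic-time sketch below computes AUC that way from labels and scores; it is meant to make the number interpretable, not to reproduce the authors' evaluation code, and the example scores are invented.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative scores: 3 of the 4 (death, survivor) pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```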