Daniela Oliveira¹, Diana Ferreira¹, Nuno Abreu², Pedro Leuschner², António Abelha¹, José Machado³.
Abstract
The world is currently facing the pandemic caused by COVID-19. The complexity and pace of monitoring patients infected with this virus call for agile and scalable data structure methodologies. OpenEHR is a healthcare standard that has attracted considerable attention in recent years due to its comprehensive and robust architecture. The value of an open, standardized and adaptable approach to clinical data lies in extracting knowledge that can help healthcare professionals make sound decisions, and this matters even more in a pandemic. In this study, a system for tracking the symptoms and health conditions of suspected or confirmed SARS-CoV-2 patients from a Portuguese hospital was developed using openEHR. All data on the evolving status of patients in home care, together with the results of their COVID-19 tests, were used to train different machine learning (ML) algorithms, with the aim of developing a predictive model capable of identifying COVID-19 infections according to the severity of the symptoms reported by patients. The research followed the CRISP-DM methodology. The results were promising: the best model, which used the Decision Tree algorithm and the Split Validation method, achieved an accuracy of 96.25%, a precision of 99.91%, a sensitivity of 92.58%, a specificity of 99.92%, and an AUC of 0.963. After further testing, the predictive model could be implemented in clinical decision support systems.
Year: 2022 PMID: 35869091 PMCID: PMC9306245 DOI: 10.1038/s41598-022-15968-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Flow of the referenced or suspected COVID-19 patient.
Figure 2. Stages of the CRISP-DM methodology. Adapted from [34].
Description of the attributes of the dataset under study
| Attribute | Description | Type |
|---|---|---|
| Patient_id | Patient’s Identifier | Integer |
| Age | Patient’s age | Integer |
| Gender | Patient’s gender¹ | Integer |
| Temperature | Patient’s body temperature² | Integer |
| Headache | Patient’s headache evaluation² | Polynomial |
| Muscle_pain | Patient’s muscle pain evaluation² | Polynomial |
| Cough | Patient’s cough evaluation² | Polynomial |
| Diarrhea | Patient’s diarrhea evaluation² | Polynomial |
| Thoracalgia | Patient’s thoracalgia evaluation² | Polynomial |
| Shortness_of_breath | Patient’s shortness of breath evaluation² | Polynomial |
| Shortness_of_smell_taste | Patient’s shortness of smell and taste evaluation² | Polynomial |
| Medication_last_24h | Medications taken in the previous 24 hours | Polynomial |
| Global_evaluation | Patient’s health status³ | Polynomial |
| Result | COVID-19 test result⁴ | Binomial |

¹ {Female, Male}; ² {No, I have now, Keeps, Improved, Worsened}; ³ {I feel better, I feel worse, I feel the same}; ⁴ {Negative, Positive}.
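The attribute table can be read as a record schema. The following is a minimal sketch of that reading in Python, not the authors' data model: the class name `PatientRecord`, the `problems` helper, and the example values are hypothetical. The table lists Gender and Temperature as Integer without giving their encoding, so the sketch stores gender as its label and leaves temperature unvalidated.

```python
from dataclasses import dataclass
from typing import List

# Allowed values copied from the table footnotes.
GENDER = {"Female", "Male"}
SYMPTOM_SCALE = {"No", "I have now", "Keeps", "Improved", "Worsened"}
GLOBAL_EVALUATION = {"I feel better", "I feel worse", "I feel the same"}
RESULT = {"Negative", "Positive"}

# Polynomial attributes that the table marks with the symptom scale.
SYMPTOM_FIELDS = (
    "headache", "muscle_pain", "cough", "diarrhea", "thoracalgia",
    "shortness_of_breath", "shortness_of_smell_taste",
)


@dataclass
class PatientRecord:
    """Hypothetical container for one row of the dataset."""
    patient_id: int
    age: int
    gender: str                # label form; the integer encoding is not given
    temperature: int           # unit and encoding are not given in the table
    headache: str
    muscle_pain: str
    cough: str
    diarrhea: str
    thoracalgia: str
    shortness_of_breath: str
    shortness_of_smell_taste: str
    medication_last_24h: str   # free-form here; no value set is listed
    global_evaluation: str
    result: str

    def problems(self) -> List[str]:
        """Return the fields whose values fall outside the footnote sets."""
        found = []
        if self.gender not in GENDER:
            found.append(f"gender: {self.gender!r}")
        for name in SYMPTOM_FIELDS:
            value = getattr(self, name)
            if value not in SYMPTOM_SCALE:
                found.append(f"{name}: {value!r}")
        if self.global_evaluation not in GLOBAL_EVALUATION:
            found.append(f"global_evaluation: {self.global_evaluation!r}")
        if self.result not in RESULT:
            found.append(f"result: {self.result!r}")
        return found


# Invented example row, for illustration only.
example = PatientRecord(
    patient_id=1, age=40, gender="Female", temperature=37,
    headache="No", muscle_pain="Keeps", cough="I have now", diarrhea="No",
    thoracalgia="No", shortness_of_breath="Improved",
    shortness_of_smell_taste="Worsened", medication_last_24h="None",
    global_evaluation="I feel the same", result="Negative",
)
```

Calling `example.problems()` returns an empty list for a row that respects the footnote value sets, and one entry per offending field otherwise.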
Figure 3. Distribution of patients per age.
Figure 4. Distribution of patients per gender.
Figure 5. Distribution of patients per body temperature.
Figure 6. Distribution of patients per test result (target).
DMMs with the highest accuracy for each DMT.
| DMM | DMT | S | SM | MVA | DA | Accuracy (%) |
|---|---|---|---|---|---|---|
| 3 | DT | S1 | Split Validation (80%) | N/A | SMOTE | 96.25 |
| 7 | RF | S1 | Split Validation (70%) | N/A | SMOTE | 91.36 |
| 105 | DL | S3 | Split Validation (80%) | N/A | SMOTE | 89.07 |
| 27 | NV-K | S1 | Split Validation (80%) | N/A | SMOTE | 87.45 |
| 19 | NV | S1 | Split Validation (70%) | N/A | SMOTE | 86.32 |
| 13 | RT | S1 | Split Validation (70%) | N/A | SMOTE | 68.26 |
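The SM column distinguishes Split Validation (70% or 80%) from Cross Validation. As a rough illustration of the former, the sketch below performs a shuffled hold-out split. It assumes the percentage is the share of rows used for training, which the tables do not state; the function name `split_validation` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_validation(
    rows: Sequence[T], train_ratio: float = 0.8, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle the rows and split them into a training and a test part.

    train_ratio is ASSUMED to correspond to the percentage shown in the
    SM column (0.8 for "Split Validation (80%)").
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


train_rows, test_rows = split_validation(range(10), train_ratio=0.8)
```

Every row lands in exactly one of the two parts, so the model is evaluated only on rows it did not see during training.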
DMMs with the highest precision for each DMT.
| DMM | DMT | S | SM | MVA | DA | Precision (%) |
|---|---|---|---|---|---|---|
| 3 | DT | S1 | Split Validation (80%) | N/A | SMOTE | 99.91 |
| 13 | RT | S1 | Split Validation (70%) | N/A | SMOTE | 98.99 |
| 21 | NV | S1 | Split Validation (80%) | N/A | SMOTE | 98.80 |
| 63 | NV-K | S2 | Split Validation (80%) | N/A | SMOTE | 98.36 |
| 103 | DL | S3 | Split Validation (70%) | N/A | SMOTE | 97.57 |
| 83 | RF | S3 | Cross Validation | N/A | SMOTE | 91.97 |
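Every model in these tables lists SMOTE in the DA column. The sketch below shows the core idea of SMOTE in plain Python: each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours. It is a simplified illustration, not the implementation used in the study. It assumes numeric feature vectors, whereas most attributes in this dataset are categorical and the encoding used is not given here; the function name `smote` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def smote(
    minority: List[Vector], n_new: int, k: int = 5, seed: int = 0
) -> List[List[float]]:
    """Create n_new synthetic minority samples by interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k minority samples closest to the base sample.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        # A random point on the segment between base and its neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class lying on the line y = x.
new_points = smote([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_new=4, k=1)
```

Because each new point interpolates between two existing minority samples, the synthetic data stays inside the region the minority class already occupies.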
DMMs with the highest sensitivity for each DMT.
| DMM | DMT | S | SM | MVA | DA | Sensitivity (%) |
|---|---|---|---|---|---|---|
| 9 | RF | S1 | Split Validation (80%) | N/A | SMOTE | 93.42 |
| 3 | DT | S1 | Split Validation (80%) | N/A | SMOTE | 92.58 |
| 105 | DL | S3 | Split Validation (80%) | N/A | SMOTE | 89.37 |
| 28 | NV-K | S1 | Split Validation (80%) | Replace | SMOTE | 82.57 |
| 22 | NV | S1 | Split Validation (80%) | Replace | SMOTE | 79.09 |
| 18 | RT | S1 | Cross Validation | Replace | SMOTE | 50.80 |
DMMs with the highest specificity for each DMT.
| DMM | DMT | S | SM | MVA | DA | Specificity (%) |
|---|---|---|---|---|---|---|
| 3 | DT | S1 | Split Validation (80%) | N/A | SMOTE | 99.92 |
| 13 | RT | S1 | Split Validation (70%) | N/A | SMOTE | 99.63 |
| 21 | NV | S1 | Split Validation (80%) | N/A | SMOTE | 99.12 |
| 63 | NV-K | S2 | Split Validation (80%) | N/A | SMOTE | 98.76 |
| 103 | DL | S3 | Split Validation (70%) | N/A | SMOTE | 98.18 |
| 83 | RF | S3 | Cross Validation | N/A | SMOTE | 92.35 |
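The four metrics tabulated above all derive from the same confusion matrix. The sketch below spells out the standard definitions, treating "Positive" as the positive class. The counts passed in are hypothetical and chosen only for illustration; the confusion matrices of the study are not given here.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the four reported metrics from confusion-matrix counts.

    tp/fn count the truly positive cases, tn/fp the truly negative ones.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all correct predictions
        "precision": tp / (tp + fp),        # predicted positives that are correct
        "sensitivity": tp / (tp + fn),      # actual positives that are found
        "specificity": tn / (tn + fp),      # actual negatives that are found
    }


# Hypothetical counts for a balanced test set of 2000 cases.
metrics = classification_metrics(tp=926, fp=1, tn=999, fn=74)
```

When both classes are equally represented, accuracy equals the mean of sensitivity and specificity. The values reported for DMM 3 satisfy this relation, since (92.58 + 99.92) / 2 = 96.25, which is consistent with a class-balanced evaluation set, although the class counts are not stated here.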
DMMs with the highest AUC for each DMT.
| DMM | DMT | S | SM | MVA | DA | AUC |
|---|---|---|---|---|---|---|
| 7 | RF | S1 | Split Validation (70%) | N/A | SMOTE | 0.972 |
| 3 | DT | S1 | Split Validation (80%) | N/A | SMOTE | 0.963 |
| 105 | DL | S3 | Split Validation (80%) | N/A | SMOTE | 0.958 |
| 19 | NV | S1 | Split Validation (70%) | N/A | SMOTE | 0.951 |
| 25 | NV-K | S1 | Split Validation (70%) | N/A | SMOTE | 0.950 |
| 51 | RT | S2 | Split Validation (80%) | N/A | SMOTE | 0.687 |
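AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition. It illustrates the metric only and is not the tool used in the study; the function name `auc` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels use 1 for the positive class and 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.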
Figure 7. Average accuracy and precision of 216 tested models per DMT and SM.
Figure 8. Average sensitivity and specificity of 216 tested models per DMT and SM.