Shaik Asif Hussain, Nizar Al Bassam, Amer Zayegh, Sana Al Ghawi.
Abstract
The COVID-19 pandemic has caused suffering worldwide due to ineffective medication and vaccines. The prediction analysis in this article is carried out on a dataset downloaded from an application programming interface (API) designed explicitly for COVID-19 quarantined patients. The data are measured by a wearable device used for quarantined healthy and unhealthy patients. The wearable device reports temperature, heart rate, SpO2 (blood oxygen saturation), and blood pressure in a timely manner, so that medical authorities can be alerted and can provide better diagnosis and treatment. The dataset contains 1085 patients with eight features, representing 490 COVID-19 infected cases and 595 normal cases. The work considers the following parameters: heart rate, temperature, SpO2, bpm, and health status. The real-time data collected can then be used to predict each patient's health status as infected or non-infected from the measured parameters. A random forest classifier, together with linear and polynomial regression, is used to train and validate on the COVID-19 patient data. The work was carried out in Google Colab, a cloud-hosted development environment that provides Python and Jupyter notebooks, using scikit-learn version 0.22.1. The dataset is split into training and test sets in an 80%/20% ratio to evaluate accuracy and avoid overfitting the model. This analysis could help medical authorities and governmental agencies of every country respond in a timely manner and reduce the spread of the disease.

• The measured data provide a comprehensive mapping of disease symptoms to predict the health status. Authorities can use this to restrict virus transmission and take the necessary steps to control, mitigate, and manage the disease.
• Scientific research benefits from Artificial Intelligence (AI) in tackling the hurdles of analyzing disease diagnosis.
• The diagnosis results of disease symptoms can identify the severity of a patient's condition, helping to monitor and manage the difficulties caused by the outbreak.
Keywords: AI model; Dataset; Healthcare; Pandemic; Quarantine; Wearable electronic device
Year: 2022 PMID: 35036334 PMCID: PMC8743393 DOI: 10.1016/j.mex.2022.101618
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Fig. 3. Dataset modeling, classification, and prediction.
Fig. 1. The RF model classification.
The parameters in the dataset.
| Data parameters | Description | Attributes |
|---|---|---|
| Gender | Patient gender, a primary attribute in health care | Male or female |
| Age | Patient age, a major factor in determining health care | Less than 80 |
| Heart Rate | Pulse, in heart beats per minute; flags a rate that is too fast or too slow | < 100 |
| Temperature | Human body temperature, used to evaluate a person's health | ≤ 37 |
| SpO2 Saturation | Percentage of blood oxygen content (arterial saturation) | 96–100% |
| Blood pressure | Blood pressure in the circulatory system | > 95 |
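The ranges in the Attributes column can be read as simple threshold checks. The following is a minimal sketch, not the authors' classifier: the function name `out_of_range` and the dictionary keys are hypothetical, temperature is assumed to be in degrees Celsius, and the blood pressure row is left out because its unit is not stated.

```python
# Normal ranges taken from the parameter table above.
# Key names are illustrative, not from the source dataset.
NORMAL_RANGES = {
    "heart_rate": lambda v: v < 100,       # beats per minute
    "temperature": lambda v: v <= 37,      # assumed degrees Celsius
    "spo2": lambda v: 96 <= v <= 100,      # percent
}


def out_of_range(reading):
    """Return the names of parameters that fall outside the normal range."""
    return [name for name, is_normal in NORMAL_RANGES.items()
            if name in reading and not is_normal(reading[name])]


# First and third sample rows of the dataset listed in this document.
row_0 = {"heart_rate": 70, "temperature": 38.6, "spo2": 88.0}
row_2 = {"heart_rate": 82, "temperature": 37.2, "spo2": 98.0}

print(out_of_range(row_0))  # ['temperature', 'spo2']
print(out_of_range(row_2))  # ['temperature']
```

A check like this only flags individual readings; it does not by itself assign a health status.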
Data Columns and types with count (total 8 columns).
| # | Column | Non-Null count | Dtype |
|---|---|---|---|
| 0 | Id | 1085 non-null | int64 |
| 1 | gender | 902 non-null | object |
| 2 | Age | 843 non-null | float64 |
| 3 | Heart_rate | 1085 non-null | int64 |
| 4 | Temperature | 1085 non-null | float64 |
| 5 | SPO2_saturation | 1085 non-null | float64 |
| 6 | Bpm | 1085 non-null | int64 |
| 7 | Health_status | 1085 non-null | object |
dtypes: float64(3), int64(3), object(2); memory usage: 67.9+ kB.
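A column summary of this kind is what pandas prints from `DataFrame.info()`. The sketch below is an illustration under assumptions: it rebuilds a small frame from the ten sample rows listed in this document rather than loading the real file, and the lowercase column names are chosen for the example.

```python
import numpy as np
import pandas as pd

# Ten sample rows copied from the dataset table in this document;
# the full 1085-row file is not reproduced here.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 1081, 1082, 1083, 1084, 1085],
    "gender": ["Male", "Female", "Male", "Female", "Male",
               np.nan, np.nan, "Male", "Male", "Male"],
    "age": [66.0, 56.0, 46.0, 60.0, 58.0, 24.0, 35.0, np.nan, np.nan, 70.0],
    "heart_rate": [70, 74, 82, 90, 72, 110, 110, 110, 110, 110],
    "temperature": [38.6, 39.6, 37.2, 38.6, 39.6, 38.0, 38.0, 38.0, 38.0, 38.0],
    "spo2_saturation": [88.0, 88.0, 98.0, 98.0, 93.0,
                        30.0, 30.0, 30.0, 30.0, 30.0],
    "bpm": [75, 70, 83, 75, 78, 72, 74, 68, 67, 70],
    "health_status": ["Infected", "Infected", "Non Infected", "Non Infected",
                      "Infected", "Infected", "Infected", "Infected",
                      "Infected", "Infected"],
})

df.info()                    # prints column names, non-null counts and dtypes
non_null = df.count()        # non-null count per column, as a Series
missing = df.isna().sum()    # missing values per column
print(missing[missing > 0])  # only gender and age have gaps in this sample
```

In the full dataset the same two columns, gender and age, are the ones with fewer than 1085 non-null entries.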
The dataset file with all the data included.
| S. No. | id | Gender | Age | Heart_rate | Temperature | SpO2 Saturation | bpm | Health_status |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 66.0 | 70 | 38.6 | 88.0 | 75 | Infected |
| 1 | 2 | Female | 56.0 | 74 | 39.6 | 88.0 | 70 | Infected |
| 2 | 3 | Male | 46.0 | 82 | 37.2 | 98.0 | 83 | Non Infected |
| 3 | 4 | Female | 60.0 | 90 | 38.6 | 98.0 | 75 | Non Infected |
| 4 | 5 | Male | 58.0 | 72 | 39.6 | 93.0 | 78 | Infected |
| … | … | … | … | … | … | … | … | … |
| 1080 | 1081 | NaN | 24.0 | 110 | 38.0 | 30.0 | 72 | Infected |
| 1081 | 1082 | NaN | 35.0 | 110 | 38.0 | 30.0 | 74 | Infected |
| 1082 | 1083 | Male | NaN | 110 | 38.0 | 30.0 | 68 | Infected |
| 1083 | 1084 | Male | NaN | 110 | 38.0 | 30.0 | 67 | Infected |
| 1084 | 1085 | Male | 70.0 | 110 | 38.0 | 30.0 | 70 | Infected |
Fig. 4. The design flow model of machine learning for the COVID-19 dataset.
The first five rows of the loaded dataset.
| S.No. | Patient ID | Gender | Age | Heart_rate | Temperature | SpO2 Saturation | BPM | Health_Status |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 66.0 | 70 | 38.6 | 88.0 | 75 | Infected |
| 1 | 2 | Female | 56.0 | 74 | 39.6 | 88.0 | 70 | Infected |
| 2 | 3 | Male | 46.0 | 82 | 37.2 | 98.0 | 83 | Non-Infected |
| 3 | 4 | Female | 60.0 | 90 | 38.6 | 98.0 | 75 | Non-Infected |
| 4 | 5 | Male | 58.0 | 72 | 39.6 | 93.0 | 78 | Infected |
Standard statistics calculated for the dataset.
| | id | age | Heart_rate | Temperature | SpO2 Saturation | bpm |
|---|---|---|---|---|---|---|
| count | 1085.000000 | 843.000000 | 1085.000000 | 1085.000000 | 1085.000000 | 1085.000000 |
| mean | 543.000000 | 49.483689 | 89.812903 | 38.562488 | 66.707465 | 71.221198 |
| std | 313.356825 | 18.255334 | 19.685747 | 4.592419 | 30.251069 | 13.148559 |
| min | 1.000000 | 0.250000 | 47.000000 | 36.000000 | 20.000000 | 44.000000 |
| 25% | 272.000000 | 35.000000 | 72.000000 | 38.000000 | 30.000000 | 59.000000 |
| 50% | 543.000000 | 51.000000 | 91.000000 | 38.100000 | 82.000000 | 72.000000 |
| 75% | 814.000000 | 64.000000 | 110.000000 | 38.500000 | 87.300000 | 81.000000 |
| max | 1085.000000 | 96.000000 | 120.000000 | 95.000000 | 340.000000 | 109.000000 |
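Summary statistics in this layout match the output of pandas `DataFrame.describe()`. A minimal sketch, assuming a small frame built from the first five sample rows of this document (so the numbers differ from the full-dataset table above):

```python
import pandas as pd

# First five sample rows from this document (three numeric columns only).
sample = pd.DataFrame({
    "heart_rate": [70, 74, 82, 90, 72],
    "temperature": [38.6, 39.6, 37.2, 38.6, 39.6],
    "spo2_saturation": [88.0, 88.0, 98.0, 98.0, 93.0],
})

# count, mean, std, min, quartiles and max for every numeric column.
stats = sample.describe()
print(stats)

# Individual entries are addressed by (statistic, column).
print(stats.loc["mean", "heart_rate"])       # 77.6
print(stats.loc["max", "spo2_saturation"])   # 98.0
```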
Correlation coefficients for the dataset.
| | id | age | Heart_rate | Temperature | SpO2 Saturation | bpm |
|---|---|---|---|---|---|---|
| id | 1.000000 | −0.033531 | 0.721335 | −0.082765 | −0.558897 | 0.001511 |
| age | −0.033531 | 1.000000 | 0.083925 | 0.091438 | 0.033087 | 0.061741 |
| Heart_rate | 0.721335 | 0.083925 | 1.000000 | −0.028797 | −0.235919 | 0.284245 |
| Temperature | −0.082765 | 0.091438 | −0.028797 | 1.000000 | 0.054208 | 0.003302 |
| SpO2 Saturation | −0.558897 | 0.033087 | −0.235919 | 0.054208 | 1.000000 | 0.079131 |
| bpm | 0.001511 | 0.061741 | 0.284245 | 0.003302 | 0.079131 | 1.000000 |
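Each entry in a matrix like this is a pairwise correlation coefficient. Assuming the Pearson coefficient (the default of pandas `DataFrame.corr()`, although the source does not name the method), a minimal pure-Python sketch of the calculation is:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# First five sample rows from this document.
heart_rate = [70, 74, 82, 90, 72]
spo2 = [88.0, 88.0, 98.0, 98.0, 93.0]

r = pearson(heart_rate, spo2)
print(round(r, 3))
```

The coefficient is symmetric in its two arguments and equals 1 for a column paired with itself, which is why the matrix above is symmetric with ones on the diagonal.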
Parameter criteria for the training and test points.
| S. No. | Parameters | Infected (Non-Healthy) | Non-Infected (Healthy) |
|---|---|---|---|
| 1. | Temperature | ||
| 2. | Heartbeat variation | > 100 | < 100 |
| 3. | BPM | <= 94 | > 95 |
| 4. | SpO2 | < 94% | 95–100% |
Accuracy measured on the dataset.
| Description | Parameters (X, Y) | Score |
|---|---|---|
| Accuracy score | Y_test and Y_predict | 0.9926470588235294 |
| Training score | X_train and Y_train | 0.968019680196802 |
| Testing score | X_train and X_test | 0.9705882352941176 |
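These three scores are the kind of values scikit-learn reports for a fitted classifier. The sketch below is not the authors' code: it uses synthetic stand-in data, a hypothetical labeling rule based only on the heart-rate criterion (> 100) listed earlier in this document, and an assumed 80%/20% split (the ratio stated in the abstract). It is only meant to show where an accuracy score, a training score, and a testing score come from.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for the four feature columns (not the real data).
X = np.column_stack([
    rng.integers(47, 121, n),      # heart_rate
    rng.uniform(36.0, 40.0, n),    # temperature
    rng.uniform(20.0, 100.0, n),   # SpO2 saturation
    rng.integers(44, 110, n),      # bpm
])
# Hypothetical labeling rule, for illustration only.
y = np.where(X[:, 0] > 100, "Infected", "Non-Infected")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy score:", accuracy_score(y_test, y_pred))
print("Training score:", model.score(X_train, y_train))
print("Testing score:", model.score(X_test, y_test))
```

In this sketch the accuracy score and the testing score are computed from the same model on the same test set, so they coincide.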
Randomized training data (813 rows x 4 columns).
| Id | Heart_rate | Temperature | SpO2_saturation | bpm |
|---|---|---|---|---|
| 862 | 113 | 38.5 | 30.0 | 67 |
| 658 | 97 | 38.5 | 85.0 | 66 |
| 252 | 78 | 36.9 | 98.0 | 67 |
| 706 | 102 | 38.5 | 85.0 | 53 |
| 215 | 64 | 37.8 | 85.0 | 81 |
| … | … | … | … | … |
| 1033 | 110 | 38.0 | 30.0 | 75 |
| 763 | 109 | 38.5 | 87.3 | 82 |
| 835 | 112 | 38.5 | 30.0 | 77 |
| 559 | 70 | 37.6 | 30.0 | 57 |
| 684 | 95 | 38.5 | 85.0 | 94 |
Randomized test data (272 rows x 4 columns).
| Id | Heart_rate | Temperature | SpO2_saturation | bpm |
|---|---|---|---|---|
| 204 | 61 | 38.0 | 85.0 | 89 |
| 183 | 65 | 37.8 | 89.0 | 94 |
| 356 | 82 | 37.1 | 96.0 | 58 |
| 1069 | 118 | 38.0 | 30.0 | 86 |
| 272 | 85 | 38.0 | 90.0 | 70 |
| … | … | … | … | … |
| 255 | 87 | 38.0 | 98.0 | 76 |
| 495 | 57 | 38.1 | 30.0 | 57 |
| 319 | 71 | 38.1 | 85.0 | 74 |
| 493 | 62 | 38.1 | 55.0 | 56 |
| 144 | 77 | 39.6 | 82.0 | 84 |
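The two randomized tables hold 813 and 272 rows, which together make up the 1085 records. A minimal sketch of how such a shuffled partition can arise, assuming scikit-learn's `train_test_split` applied to the row indices with `test_size=0.25` (an assumption chosen because it reproduces these row counts; the source does not show the call):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Row indices 0..1084 stand in for the 1085 records.
indices = np.arange(1085)

# test_size=0.25 is an assumption chosen to match the 813/272 row counts.
train_idx, test_idx = train_test_split(
    indices, test_size=0.25, shuffle=True, random_state=0
)

print(len(train_idx), len(test_idx))  # 813 272
# Shuffling is why the Id column in the tables is not in ascending order.
print(train_idx[:5])
```

Every record lands in exactly one of the two partitions, so the model is never scored on a row it was trained on.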
Fig. 2. The process of classification with X and Y as actual and predicted values (https://dsc-spidal.github.io/harp/docs/examples/rf/).
Fig. 5. The performance estimation and predictive model flow.
Fig. 6. The performance estimation and predictive model flow.
| Subject Area | Engineering |
| More specific subject area | Data Mining- Artificial Intelligence |
| Method name | Random Forest Classifier Algorithm used to train and test the data to predict the disease progression |
| Name and reference of original method | NA |
| Resource availability |