Abstract
Despite the increasing awareness about its severity and the importance of adopting preventive habits, cardiovascular disease remains the leading cause of death worldwide. Most people already recognize that a healthy lifestyle, which includes a balanced diet and the practice of physical activity, is essential to prevent this disease. However, since few simple mechanisms allow a self-assessment and a continuous monitoring of the level of cardiac well-being, people are not conscious enough about their own cardiovascular health status. In this context, this paper presents and describes a tool related to the creation of cardiac well-being indexes that allow a quick and intuitive monitoring and visualization of the users' cardiovascular health level over time. For its implementation, data mining techniques were used to calculate the indexes, and a data warehouse was built to archive the data and to support the construction of dashboards for presenting the results.
Keywords: analytical systems; data mining; decision support systems; heart disease prevention; well-being indexes
Year: 2021 PMID: 33770831 PMCID: PMC8238473 DOI: 10.1515/jib-2020-0040
Source DB: PubMed Journal: J Integr Bioinform ISSN: 1613-4516
Optimized parameters for each data mining technique (J48, RF, NB, KNN, MLP). The table body was not recoverable; the only surviving entries are "Scenario I: 6" and "Scenario II: 3".
Sensitivity, accuracy and score values obtained for training data.
| Algorithm | Sensitivity | Accuracy | Score |
|---|---|---|---|
| MLP | 0.700 | 0.753 | 0.7265 |
| RF | 0.703 | 0.748 | 0.7255 |
| J48 | 0.694 | 0.751 | 0.7225 |
| ASJ48 | 0.688 | 0.750 | 0.7190 |
| ASKNN | 0.701 | 0.735 | 0.7180 |
| ASMLP | 0.684 | 0.746 | 0.7150 |
| ASRF | 0.692 | 0.731 | 0.7115 |
| KNN | 0.685 | 0.728 | 0.7065 |
| NB | 0.647 | 0.739 | 0.6930 |
| ASNB | 0.638 | 0.735 | 0.6865 |
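The Score column in the training table is numerically consistent with the arithmetic mean of sensitivity and accuracy. This is an observation from the tabulated values, not a definition stated in this record, so the sketch below treats it as an assumption and simply checks it. The function name `mean_score` is hypothetical.

```python
# Values copied from the training-data table: (sensitivity, accuracy, score).
training = {
    "MLP":   (0.700, 0.753, 0.7265),
    "RF":    (0.703, 0.748, 0.7255),
    "J48":   (0.694, 0.751, 0.7225),
    "ASJ48": (0.688, 0.750, 0.7190),
    "ASKNN": (0.701, 0.735, 0.7180),
    "ASMLP": (0.684, 0.746, 0.7150),
    "ASRF":  (0.692, 0.731, 0.7115),
    "KNN":   (0.685, 0.728, 0.7065),
    "NB":    (0.647, 0.739, 0.6930),
    "ASNB":  (0.638, 0.735, 0.6865),
}


def mean_score(sensitivity: float, accuracy: float) -> float:
    """Assumed definition of Score: the mean of sensitivity and accuracy."""
    return (sensitivity + accuracy) / 2


# Algorithms whose tabulated score differs from the assumed formula.
mismatches = [
    name
    for name, (sens, acc, score) in training.items()
    if abs(mean_score(sens, acc) - score) > 1e-9
]

# Ranking by tabulated score, best first (the order used in the table).
ranking = sorted(training, key=lambda name: training[name][2], reverse=True)
```

Under this assumption, `mismatches` is empty for all ten rows, and `ranking` reproduces the row order of the table.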
Sensitivity and accuracy values obtained for testing data.
| Algorithm | Sensitivity | Accuracy |
|---|---|---|
| MLP | 0.7117 | 0.7232 |
| RF | 0.7093 | 0.7288 |
| J48 | 0.6895 | 0.7192 |
| ASJ48 | 0.7130 | 0.7294 |
| ASKNN | 0.7121 | 0.7074 |
| ASMLP | 0.6980 | 0.7372 |
| ASRF | 0.7024 | 0.7206 |
| KNN | 0.6931 | 0.7032 |
| NB | 0.6595 | 0.7418 |
| ASNB | 0.6445 | 0.7380 |
Figure 1: The multidimensional schema of the data warehouse.
Figure 2: Implemented structures in the data warehouse.
Source to target mapping. Data from the various sources (left) and their transformations before being loaded into the target (right), in the Data Warehouse [a].
| Source | Source attribute | Source type | Transformation | Target type | Target attribute | Target |
|---|---|---|---|---|---|---|
| Source 1 | Id | Int | | Int | Id_user | Dim user |
| | District | Varchar | | Varchar | District | |
| | Education level | Varchar | | Varchar | Education | |
| | Salary | Int | | Varchar | Salary | |
| | Gender | Varchar | | Varchar | Gender | |
| | | Varchar | | Varchar | Gender | |
| | Weight | Int | | Float | bmi | Fact table |
| | Height | Int | | | | |
| | Family history | Varchar | | Varchar | Fam_history | |
| | Total cholesterol | Int | | Int | Total_cholesterol | |
| | Fast glucose | Int | | Int | Fast glucose | |
| | Diabetes | Varchar | | Varchar | Diabetes | |
| | Hypertension | Varchar | | Varchar | Hypertension | |
| | Pain after effort | Varchar | | Varchar | Pain_effort | |
| | Hypothyroidism | Varchar | | Varchar | Hypothyroidism | |
| | Clin. Analysis date | Date | | Int | Age | |
| | Date of birth | Date | | | | |
| | Numb. Cigarettes | Int | | Float | Smoke_amount | |
| | Smoke years | Int | | | | |
| Source 2 | Id | Int | | Int | Id_user | Fact table |
| | Index date | Date | | Date | Index_date | |
| | Calories | Int | | Varchar | Physical_exercise | |
| | Systolic BP | Int | | Int | Systolic BP | |
| | Diastolic BP | Int | | Int | Diastolic BP | |
| Source 3 | Code | Varchar | | Varchar | Code | Dim district |
| | Name | Varchar | | Varchar | Name | |
| | Province | Varchar | | Varchar | Province | |
| | Region | Varchar | | Varchar | Region | |
| | Coastal location | Varchar | | Varchar | Coastal location | |

[a] 1 – , 2 – , 3 –
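Two of the mappings above derive a target attribute from a pair of source attributes: `bmi` from weight and height, and `Age` from the clinical analysis date and the date of birth. A minimal sketch of these two derivations follows. It assumes weight in kilograms and height in centimetres (the units are not given in this record) and uses the standard BMI formula. The function names are hypothetical. The rules for `Smoke_amount`, `Salary` and `Physical_exercise` are not stated here, so they are not sketched.

```python
from datetime import date


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2; assumes weight in kg and height in cm."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def age_in_years(date_of_birth: date, analysis_date: date) -> int:
    """Completed years between the date of birth and the clinical analysis date."""
    years = analysis_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the analysis year.
    if (analysis_date.month, analysis_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Illustrative values only, not taken from the paper's data set.
example_bmi = bmi(70, 175)
example_age = age_in_years(date(1970, 6, 15), date(2020, 6, 14))
```

Both functions return the target types listed in the mapping (Float for `bmi`, Int for `Age`).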
Figure 3: Methodology used to populate the tables of the data warehouse (BPMN schema).
Figure 4: Dashboards to evaluate the cardiac well-being indexes of a user over time.
Figure 5: Dashboards regarding global cardiac well-being indexes for statistical analysis.
Figure 6: History table of the user with ID 4 containing the modified records regarding education level, residence and salary.
General statistics of numerical attributes used for the data mining process (before the process of data cleansing).
| Numeric attribute | Min | Max | Mean | Standard deviation |
|---|---|---|---|---|
| Age | 29 | 64 | 52.8 | 6.8 |
| Systolic blood pressure | −150 | 16020 | 128.9 | 159.6 |
| Diastolic blood pressure | −70 | 11000 | 96.9 | 188.1 |
| Cholesterol | 100 | 320 | 170.1 | 52.5 |
| Fast glucose | 80 | 400 | 119.6 | 56.4 |
| Smoking years | 0 | 50 | 5.4 | 12.0 |
| Number of cigarettes | 0 | 50 | 7.1 | 14.6 |
| BMI | 3.5 | 298.7 | 27.6 | 6.1 |
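The minima and maxima above (for example a systolic blood pressure of −150 or 16020, or a BMI of 298.7) show why a cleansing step was needed before mining. The cleansing rules themselves are not given in this record. The sketch below shows one common approach, a simple range filter, under loudly stated assumptions: the bounds in `PLAUSIBLE_RANGES`, the field names and the sample records are all hypothetical and chosen only for illustration.

```python
# Hypothetical plausibility bounds (inclusive); NOT the thresholds used in the paper.
PLAUSIBLE_RANGES = {
    "systolic_bp": (50, 300),
    "diastolic_bp": (30, 200),
    "bmi": (10, 100),
}


def is_plausible(record: dict) -> bool:
    """True if every bounded attribute present in the record lies within its range."""
    return all(
        low <= record[name] <= high
        for name, (low, high) in PLAUSIBLE_RANGES.items()
        if name in record
    )


# Made-up records; the extreme values echo the min/max columns of the table.
records = [
    {"systolic_bp": 120, "diastolic_bp": 80, "bmi": 27.6},
    {"systolic_bp": -150, "diastolic_bp": 80, "bmi": 27.6},
    {"systolic_bp": 16020, "diastolic_bp": 11000, "bmi": 27.6},
    {"systolic_bp": 130, "diastolic_bp": 85, "bmi": 298.7},
]

clean = [r for r in records if is_plausible(r)]
```

With these assumed bounds only the first record survives. Whether such records should be dropped or corrected (a value like 16020 may be a data-entry slip) is a separate design choice that this sketch does not address.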
General statistics of nominal attributes used for the data mining process (before the process of data cleansing).
| Nominal attribute | Missing values | Value | Occurrences |
|---|---|---|---|
| Family history | 0 | Yes | 8,459 |
| | | No | 56,541 |
| Pain after effort | 0 | Yes | 8,620 |
| | | No | 56,380 |
| Physical exercise | 0 | None | 6,352 |
| | | Low | 6,413 |
| | | Moderate | 26,054 |
| | | High | 26,181 |
| Smoking | 34,956 | Yes | 6,651 |
| | | No | 23,393 |
| Hypertension | 0 | Yes | 25,225 |
| | | No | 39,775 |
| Hypothyroid | 0 | Yes | 13,501 |
| | | No | 51,499 |
| Gender | 0 | Male | 28,326 |
| | | Female | 36,674 |
| Record date | 0 | - | - |
| Diabetes | 65,000 | - | - |
| Class | 0 | Disease | 32,509 |
| | | No disease | 32,491 |
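As a quick consistency check on the nominal-attribute table, the missing values and occurrences of each attribute can be summed and compared. The sketch below uses only the numbers listed in the table. The total of 65,000 records is inferred from those sums rather than stated explicitly in this record, and the variable names are hypothetical.

```python
# attribute -> (missing values, list of occurrence counts), copied from the table.
nominal = {
    "Family history":    (0,      [8_459, 56_541]),
    "Pain after effort": (0,      [8_620, 56_380]),
    "Physical exercise": (0,      [6_352, 6_413, 26_054, 26_181]),
    "Smoking":           (34_956, [6_651, 23_393]),
    "Hypertension":      (0,      [25_225, 39_775]),
    "Hypothyroid":       (0,      [13_501, 51_499]),
    "Gender":            (0,      [28_326, 36_674]),
    "Diabetes":          (65_000, []),
    "Class":             (0,      [32_509, 32_491]),
}

# Records accounted for per attribute: missing plus observed.
totals = {name: missing + sum(counts) for name, (missing, counts) in nominal.items()}

# Inferred data set size (assumption: every attribute covers the same records).
TOTAL_RECORDS = 65_000

# Share of records with a missing value, per attribute.
missing_share = {name: missing / TOTAL_RECORDS for name, (missing, _) in nominal.items()}
```

Every attribute sums to the same total. `missing_share` shows that Diabetes is missing for all records and Smoking for a little over half of them.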