Sadegh Ilbeigipour, Amir Albadvi, Elham Akhondzadeh Noughabi.
Abstract
Cardiac arrhythmias are one of the major causes of death worldwide. In healthcare, physicians detect arrhythmias from the patient's electrocardiogram (ECG) records, which reflect the electrical activity of the heart. The problem is that symptoms do not always appear, and the physician may misdiagnose the condition. Patients therefore need continuous monitoring through real-time ECG analysis, so that arrhythmias are detected in a timely manner and life-threatening events can be prevented. In this research, we used the Structured Streaming module, built on top of the open-source Apache Spark platform, for the first time to implement a machine learning pipeline for real-time cardiac arrhythmia detection, and we evaluated the impact of this new module on classification performance metrics and on the delay in arrhythmia detection. The ECG data were collected from the MIT/BIH database for three class labels: normal beats, right bundle branch block (RBBB), and atrial fibrillation. We also developed three multiclass classifiers (decision tree, random forest, and logistic regression), of which the random forest classifier showed the best classification performance. The results show classification performance metrics comparable to previous results and a significant decrease in pipeline runtime, while using more class labels than previous studies.
Year: 2021 PMID: 33968352 PMCID: PMC8084659 DOI: 10.1155/2021/6624829
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Records and total number of samples used for the training and test datasets.
| Dataset | MIT/BIH records | Total samples |
|---|---|---|
| Training set | 118, 205, 232 | 1111800 |
| Test set | 115, 124, 232 | 358200 |
Figure 1. Block diagram of data preprocessing and classification.
Figure 2. Filtered ECG signal with detected R-peaks, sampled at 360 Hz.
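The caption does not specify the filter or the R-peak detector. As a rough illustration only, the sketch below assumes a moving-average baseline subtraction followed by a fixed-threshold local-maximum picker with a refractory period. The names `remove_baseline` and `detect_r_peaks` and the defaults `threshold_ratio=0.6` and `refractory_s=0.25` are hypothetical; practical systems usually rely on a more robust detector such as Pan-Tompkins.

```python
import math

FS = 360  # MIT/BIH sampling rate (Hz)


def remove_baseline(signal, window=FS // 2):
    """Crude high-pass filter: subtract a centred moving average to suppress baseline wander."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(signal[i] - sum(signal[lo:hi]) / (hi - lo))
    return out


def detect_r_peaks(signal, fs=FS, threshold_ratio=0.6, refractory_s=0.25):
    """Return indices of local maxima above threshold_ratio * max(filtered),
    keeping only the tallest peak inside each refractory window (hypothetical defaults)."""
    filtered = remove_baseline(signal)
    threshold = threshold_ratio * max(filtered)
    refractory = int(refractory_s * fs)
    peaks = []
    for i in range(1, len(filtered) - 1):
        is_local_max = filtered[i - 1] <= filtered[i] > filtered[i + 1]
        if not is_local_max or filtered[i] <= threshold:
            continue
        if peaks and i - peaks[-1] < refractory:
            if filtered[i] > filtered[peaks[-1]]:
                peaks[-1] = i  # replace a smaller peak that fired too recently
        else:
            peaks.append(i)
    return peaks


# Synthetic 5 s signal: slow baseline wander plus one sharp "R-peak" per second.
sig = [0.3 * math.sin(2 * math.pi * 0.2 * n / FS) for n in range(5 * FS)]
for k in range(5):
    sig[k * FS + 100] += 1.0
print(detect_r_peaks(sig))  # [100, 460, 820, 1180, 1540]
```

The refractory window (90 samples at 360 Hz) prevents a single QRS complex from being counted twice.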
Figure 3. A segmented normal beat, sampled at 360 Hz.
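Beat segmentation usually means cutting a fixed window around each detected R-peak. A minimal sketch, assuming a window of 0.25 s before and 0.45 s after the peak; these window lengths and the `segment_beats` helper are hypothetical, since the actual values are not given here.

```python
FS = 360  # MIT/BIH sampling rate (Hz)


def segment_beats(signal, r_peaks, pre_s=0.25, post_s=0.45, fs=FS):
    """Cut a fixed window [r - pre, r + post) around each R-peak.

    pre_s/post_s are hypothetical defaults; beats whose window would run past
    either end of the record are skipped so that every segment has equal length.
    """
    pre, post = round(pre_s * fs), round(post_s * fs)
    return [signal[r - pre:r + post] for r in r_peaks
            if r - pre >= 0 and r + post <= len(signal)]


record = list(range(1000))  # stand-in for a filtered ECG record
beats = segment_beats(record, [50, 400, 950])
print(len(beats), len(beats[0]))  # 1 252
```

Dropping edge beats keeps every segment at the same length (252 samples here), which a fixed-size feature extractor or classifier input requires.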
Figure 4. Feature extraction from the ECG signal using the discrete wavelet transform (DWT).
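The DWT splits each segment into detail and approximation sub-bands whose statistics can serve as classifier features. This sketch assumes a Haar wavelet, four decomposition levels, and mean, standard deviation and energy per sub-band; all three choices are illustrative and not taken from the paper. With a library, `pywt.wavedec` from PyWavelets performs the same multilevel decomposition for other wavelet families.

```python
import math
import statistics


def haar_step(x):
    """One level of the orthonormal Haar DWT -> (approximation, detail) coefficients."""
    if len(x) % 2:
        x = x + [x[-1]]  # pad odd-length input by repeating the last sample
    s = 1 / math.sqrt(2)
    approx = [s * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [s * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def dwt_features(segment, levels=4):
    """Decompose `levels` times; describe each detail band and the final approximation
    by its mean, population standard deviation and energy (illustrative feature set)."""
    bands, approx = [], list(segment)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    features = []
    for band in bands:
        features += [statistics.fmean(band), statistics.pstdev(band), sum(c * c for c in band)]
    return features


segment = [math.sin(2 * math.pi * n / 16) for n in range(16)]
feats = dwt_features(segment)
print(len(feats))  # 15 = (4 detail bands + 1 approximation) x 3 statistics
```

Because the orthonormal Haar transform preserves energy, the per-band energies sum to the energy of the original segment whenever no padding is needed (lengths divisible by 2 to the power `levels`), which is a handy sanity check.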
Number of training samples per class label (sampled at 360 Hz) and the corresponding record in the MIT/BIH database.
| Class label | Record | Number of samples |
|---|---|---|
| Normal beat | 205 | 400000 |
| RBBB arrhythmia | 118 | 400200 |
| Atrial arrhythmia | 232 | 311600 |
Figure 5. Data stream as an unbounded table in Apache Spark Structured Streaming.
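Structured Streaming treats each arriving micro-batch as new rows appended to an input table and updates the query result incrementally. The toy pure-Python model below illustrates that idea only and is not the Spark API (in PySpark the corresponding entry points are `spark.readStream` and `DataFrame.writeStream`, for example with `outputMode("complete")`). The class names and the label codes `N`, `RBBB` and `AF` are hypothetical.

```python
from collections import Counter


class UnboundedTable:
    """Toy model of the input table: every micro-batch appends new rows."""

    def __init__(self):
        self.rows = []

    def append_batch(self, batch):
        self.rows.extend(batch)


class RunningLabelCount:
    """Incremental query: keeps state across triggers instead of rescanning the whole table."""

    def __init__(self):
        self.state = Counter()

    def update(self, new_rows):
        self.state.update(label for _, label in new_rows)
        return dict(self.state)  # like "complete" mode: the full updated result


table, query = UnboundedTable(), RunningLabelCount()
batches = [
    [(0, "N"), (1, "N"), (2, "RBBB")],  # (beat index, predicted label) pairs
    [(3, "AF"), (4, "N")],
]
for batch in batches:
    table.append_batch(batch)
    result = query.update(batch)
print(len(table.rows), result)  # 5 {'N': 3, 'RBBB': 1, 'AF': 1}
```

Only the new rows are touched on each trigger; the stored state carries everything seen so far.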
Figure 6. Framework of the proposed online cardiac arrhythmia detection pipeline on Apache Spark.
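At a high level, the framework groups incoming samples into fixed-length epochs, extracts features, classifies each epoch, and records how long that takes. The sketch below uses the 5 s epochs of 1800 samples at 360 Hz reported in the processing-time comparison. `featurize` and `classify` are hypothetical placeholders for the real feature extractor and trained classifier; in the actual pipeline Spark, not a Python generator, handles the batching.

```python
import time

FS = 360                   # MIT/BIH sampling rate (Hz)
EPOCH_S = 5                # epoch length in the timing comparison (s)
EPOCH_LEN = FS * EPOCH_S   # 1800 samples per epoch


def epochs(samples, epoch_len=EPOCH_LEN):
    """Group a (possibly unbounded) sample iterator into fixed-length epochs.
    A trailing partial epoch is not emitted."""
    buf = []
    for x in samples:
        buf.append(x)
        if len(buf) == epoch_len:
            yield buf
            buf = []


def run_pipeline(samples, featurize, classify):
    """Featurize and classify each epoch; return (label, seconds spent) per epoch."""
    results = []
    for epoch in epochs(samples):
        start = time.perf_counter()
        label = classify(featurize(epoch))
        results.append((label, time.perf_counter() - start))
    return results


# Hypothetical stand-ins for the real feature extractor and trained classifier.
def featurize(epoch):
    return [max(epoch) - min(epoch)]


def classify(features):
    return "abnormal" if features[0] > 1.5 else "normal"


stream = iter([0.0] * EPOCH_LEN + [0.0, 2.0] * (EPOCH_LEN // 2) + [0.0] * 100)
out = run_pipeline(stream, featurize, classify)
print([label for label, _ in out])  # ['normal', 'abnormal']
```

Timing each epoch separately mirrors the per-batch execution times plotted for the Structured Streaming query.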
Number of test samples per class label (sampled at 360 Hz) and the corresponding record in the MIT/BIH database.
| Class label | Record | Number of samples |
|---|---|---|
| Normal beat | 115 | 104400 |
| RBBB arrhythmia | 124 | 129600 |
| Atrial arrhythmia | 232 | 124200 |
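As a quick consistency check, the per-class counts in the two tables above add up to the totals in the training/test table, and at 360 Hz they correspond to roughly 51 and 17 minutes of ECG:

```python
FS = 360  # MIT/BIH sampling rate (Hz)

# Per-class sample counts copied from the training and test tables.
train = {"Normal beat": 400000, "RBBB arrhythmia": 400200, "Atrial arrhythmia": 311600}
test = {"Normal beat": 104400, "RBBB arrhythmia": 129600, "Atrial arrhythmia": 124200}


def summarize(split, fs=FS):
    """Return (total samples, recording length in minutes) for one dataset split."""
    total = sum(split.values())
    return total, total / fs / 60


for name, split in (("Training", train), ("Test", test)):
    total, minutes = summarize(split)
    print(f"{name}: {total} samples = {minutes:.1f} min")
# Training: 1111800 samples = 51.5 min
# Test: 358200 samples = 16.6 min
```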
Multiclass classification metrics obtained with the random forest classifier on the test dataset.
| Metrics | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1-score (%) | Precision (%) | AUC score (%) | False positive rate |
|---|---|---|---|---|---|---|---|
| Value | 88.7 | 83.8 | 97.5 | 86.08 | 92.5 | 86.2 | 0.024 |
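The table does not state how per-class results were combined into single multiclass values. One common convention, assumed here, is to compute one-vs-rest counts per class and macro-average them. The confusion matrix in the example is hypothetical and is not the paper's data; under this convention, specificity and false positive rate sum to one for every class.

```python
def one_vs_rest_metrics(cm):
    """Macro-averaged one-vs-rest metrics from a square confusion matrix cm[true][pred].
    Assumes every class occurs and is predicted at least once (no zero denominators)."""
    k = len(cm)
    total = sum(map(sum, cm))
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, true class otherwise
        tn = total - tp - fn - fp
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        prec = tp / (tp + fp)
        f1 = 2 * prec * sens / (prec + sens)
        per_class.append((sens, spec, prec, f1, fp / (fp + tn)))
    names = ["sensitivity", "specificity", "precision", "f1", "fpr"]
    macro = [sum(m[i] for m in per_class) / k for i in range(len(names))]
    accuracy = sum(cm[c][c] for c in range(k)) / total
    return {"accuracy": accuracy, **dict(zip(names, macro))}


cm = [[90, 5, 5],    # hypothetical counts: rows = true class, columns = predicted
      [10, 80, 10],
      [0, 10, 90]]
m = one_vs_rest_metrics(cm)
print({name: round(value, 3) for name, value in m.items()})
```

Note that a macro-averaged F1 is generally not the harmonic mean of the macro-averaged precision and sensitivity.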
Classification performance of the proposed method and comparison with some online methods from the literature.
| Method | Acc (%) | Se (%) | Sp (%) |
|---|---|---|---|
| Lee et al. [ | 99 | — | — |
| Park and Kang [ | 96.7 | 99.5 | 89.9 |
| Sutton et al. [ | 82.1 | 100 | 73.6 |
| Lahdenoja [ | 97 | 93 | 100 |
| Tateno and Glass [ | — | 94.4 | 97.2 |
| Dash et al. [ | — | 90.2 | 91.2 |
| Jang et al. [ | 91.9 | 84.6 | 94.3 |
| Gradl et al. [ | — | 89.5 | 80.6 |
| Leutheuser [ | 91.6 | 90.9 | 92.3 |
| Yen et al. [ | 98.3 | — | — |
| Oresko et al. [ | 93.3 | — | — |
| Proposed method | 88.7 | 83.8 | 97.5 |
Classification performance of the proposed method and comparison with some recently proposed methods.
| Approach | Method | Acc (%) | Se/Rec (%) | Pre (%) | F1 score (%) | Sp (%) |
|---|---|---|---|---|---|---|
| Offline | Wang et al. [ | — | 82.2 | 83.8 | 82.8 | — |
| | Ghosh et al. [ | 99.4 | 98.7 | — | — | 100 |
| | Mahmud et al. [ | 99.2 | 99.1 | 99.0 | 99.1 | — |
| | He et al. [ | 95.1 | 87.2 | 82.4 | 84.0 | — |
| Online | Kanadala et al. [ | — | 74.2 | — | — | 88.4 |
| | Proposed method | 88.7 | 83.8 | 92.5 | 86.0 | 97.5 |
Figure 7. Execution time (ms) consumed by a query for different data batches in Apache Spark Structured Streaming.
Processing time of the proposed method compared with recent methods from the literature.
| Method | Implementation | Number of classes | Average processing time over all epochs (s) | Samples per epoch | Epoch duration (s) |
|---|---|---|---|---|---|
| Sutton et al., 2018 [ | Apache Spark Streaming | 2 | ≈2 | 480 (8 Hz sample rate) | 60 |
| Sutton et al., 2018 [ | MATLAB | 2 | >2 | 480 (8 Hz sample rate) | 60 |
| Proposed method | Apache Spark Structured Streaming | 3 | ≈1 | 1800 (360 Hz sample rate) | 5 |
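The samples-per-epoch column follows directly from the sampling rate $f_s$ and the epoch duration $T_{\text{epoch}}$:

```latex
N_{\text{epoch}} = f_s \, T_{\text{epoch}}, \qquad
8\ \text{Hz} \times 60\ \text{s} = 480, \qquad
360\ \text{Hz} \times 5\ \text{s} = 1800
```

The proposed method therefore processes 3.75 times as many samples per epoch while using epochs 12 times shorter.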