Nagarajan Ganapathy, Diana Baumgärtel, Thomas M Deserno.
Abstract
Early detection of atrial fibrillation from electrocardiography (ECG) plays a vital role in the timely prevention and diagnosis of cardiovascular diseases. Various algorithms have been proposed; however, they do not adequately consider varied-length signals, morphological transitions, and abnormalities over long-term recordings. We propose dynamic symbolic assignment (DSA) to differentiate normal sinus rhythm (SR) from paroxysmal atrial fibrillation (PAF). We use ECG signals and their interbeat (RR) intervals from two public databases, namely the AF Prediction Challenge Database (AFPDB) and the AF Termination Challenge Database (AFTDB). We transform RR intervals into a symbolic representation and compute co-occurrence matrices. The DSA feature is extracted using varied symbol length V and word size W, and is applied to five machine learning algorithms for classification. We test five hypotheses: (i) DSA captures the dynamics of the series, (ii) DSA is a reliable technique for various databases, (iii) optimal parameters improve DSA's performance, (iv) DSA is consistent for variable signal lengths, and (v) DSA supports cross-data analysis. Our method captures the transition patterns of the RR intervals. The DSA feature exhibits a statistically significant difference between SR and PAF conditions (p < 0.005). The DSA feature with W = 3 and V = 3 yields maximum performance. In terms of F-measure (F), the rotation forest and ensemble learning classifiers are the most accurate for AFPDB (F = 94.6%) and AFTDB (F = 99.8%). Our method is effective for short-length signals and supports cross-data analysis. The DSA is capable of capturing the dynamics of varied-length ECG signals. In particular, the DSA feature with optimal parameters, combined with ensemble learning, could help to detect PAF in long-term ECG signals. Our method maps time series into a symbolic representation and identifies abnormalities in noisy, varied-length, and pathological ECG signals.
Keywords: RR intervals; classification; electrocardiography; machine learning; paroxysmal atrial fibrillation; symbolic pattern
Year: 2021 PMID: 34069717 PMCID: PMC8161329 DOI: 10.3390/s21103542
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. The overall pipeline of the proposed approach. The interbeat (RR) intervals are passed to dynamic symbol assignment (DSA), which maps electrocardiography (ECG) signals to a symbolic sequence. Thresholds are used to map the RR intervals to symbols. The pattern transition probability is computed from the co-occurrence pattern transition matrix of the symbols. Finally, the matrix is transformed to a 1-dimensional array using row-based concatenation, and the DSA features are extracted. The DSA features are fed to the k-nearest neighbor (kNN), support vector machine (SVM), random forest (RF), rotation forest (RoF), and ensemble learning (EL) classifiers to differentiate normal and paroxysmal atrial fibrillation (PAF) segments. The dark-blue arrows indicate the flow from one process to the next. The grey arrows indicate the intermediate outcome of a process.
Figure 2. The pipeline of the dynamic symbol assignment (DSA) approach: representative RR intervals as input to the DSA method (a), distance evaluation of the input data (b), distance approximation using dynamic threshold lists (c), and representation of the symbolic sequence after symbolization (d).
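
The symbolization step described in the captions above can be illustrated with a minimal sketch. It is not the authors' implementation: the threshold rule (empirical quantiles of the segment), the function name `symbolize`, and the RR values are assumptions made only for illustration, since the record does not state how the dynamic threshold lists are built.

```python
from bisect import bisect_right
from statistics import quantiles


def symbolize(rr_intervals, n_symbols=3):
    """Map each RR interval to an integer symbol in 0..n_symbols-1.

    ASSUMPTION: the thresholds are the empirical quantiles of the segment,
    so they adapt to each recording. The dynamic threshold lists of the
    source may be constructed differently.
    """
    # n_symbols - 1 cut points that split the segment into n_symbols bins
    thresholds = quantiles(rr_intervals, n=n_symbols)
    return [bisect_right(thresholds, rr) for rr in rr_intervals]


# Hypothetical RR intervals in seconds (not taken from AFPDB or AFTDB).
rr = [0.80, 0.82, 0.79, 1.10, 0.60, 0.81, 0.95, 0.58, 0.83, 0.80]
symbols = symbolize(rr, n_symbols=3)
print(symbols)
```

Here `n_symbols` plays the role of the symbol length V. Because the cut points are recomputed per segment, the same code applies to segments of any length, which is the property the varied-length experiments rely on.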
Figure 3. The electrocardiography (ECG) is composed of R-waves and RR intervals.
An example of pattern transition probabilities computed from the co-occurrence pattern transition matrix of a symbolic sequence.
| Symbol Pattern | Pattern Transition Probabilities | | |
|---|---|---|---|
| | | | |
| | 0.04 | 0.07 | 0 |
| | 0.04 | 0.04 | 0.07 |
| | 0 | 0 | 0 |
| | 0.04 | 0.07 | 0 |
| | 0.04 | 0.14 | 0.07 |
| | 0 | 0.14 | 0.04 |
| | 0 | 0 | 0 |
| | 0.04 | 0.07 | 0.04 |
| | 0 | 0 | 0.07 |
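
The table has nine rows and three columns, and its entries sum to roughly 1, which suggests a matrix of joint probabilities over (pattern, next symbol) pairs. The following sketch shows one way such a matrix could be computed and flattened by row-based concatenation. The split of a word into a pattern and a following symbol, the normalization by the total number of words, the function names, and the example sequence are all assumptions, not details confirmed by the source.

```python
from collections import Counter
from itertools import product


def transition_probabilities(symbols, n_symbols=3, word_size=3):
    """Joint probability of each (pattern, next symbol) pair.

    ASSUMPTION: a word of `word_size` symbols is split into a pattern of
    word_size - 1 symbols (one row) and the symbol that follows it (one
    column); counts are normalized by the total number of words.
    """
    words = [tuple(symbols[i:i + word_size])
             for i in range(len(symbols) - word_size + 1)]
    if not words:
        raise ValueError("sequence is shorter than the word size")
    counts = Counter(words)
    total = len(words)
    patterns = list(product(range(n_symbols), repeat=word_size - 1))
    matrix = [[counts[p + (s,)] / total for s in range(n_symbols)]
              for p in patterns]
    return patterns, matrix


def flatten(matrix):
    """Row-based concatenation into a 1-dimensional feature vector."""
    return [value for row in matrix for value in row]


# Hypothetical symbolic sequence, not the one used in the source.
sequence = [0, 1, 1, 2, 1, 0, 1, 1, 2, 2, 1, 0]
patterns, matrix = transition_probabilities(sequence)
features = flatten(matrix)
```

With three symbols and a word size of three, this yields nine patterns and a 27-element feature vector, matching the shape of the table.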
Optimal parameters for each classifier.
| Classifier | Hyper-Parameters Range | Optimal |
|---|---|---|
| SVM | cost = (0.001, 0.01, 0.1, 1) | 0.01 |
| | gamma = (0.01, 0.1, 1) | 0.1 |
| | kernel = (Linear, Polynomial, Radial basis function) | Linear |
| kNN | c = (1, 2, 3, …, 10) | 5 |
| | weights = (Uniform, Distance) | Uniform |
| | metric = (Euclidean, Manhattan, Minkowski) | Euclidean |
| RF | max_depth = (10, 20, 30, …, 50, None) | None |
| | max_features = (‘auto’, ‘sqrt’) | sqrt |
| | min_samples_leaf = (1, 2, 3) | 1 |
| | min_samples_split = (2, 4, 6, 8, 10, 12) | 10 |
| | n_estimators = (100, 200, 300, …, 500) | 100 |
| RoF | max_features = (‘auto’, ‘sqrt’) | sqrt |
| | classifers = (‘RF’, ‘J48’, ‘Decision tree’) | RF |
| | maxGroup = (1, 2, 3, …, 10) | 3 |
| | minGroup = (1, 2, 3, …, 10) | 3 |
| | projectionFilter = (‘PCA’, ‘random’) | PCA |
| EL | number_of_classifers = (1, 2, 3, …, 10) | 3 |
| | classifiers_used = (SVM, kNN, RF, RoF) | SVM, RF, RoF |
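
To show how such ranges translate into a search space, the sketch below enumerates every combination of the SVM ranges listed in the table. The record does not say which search strategy was used, so the exhaustive grid, the helper names `candidates` and `grid_search`, and the `score` callable (for example, a mean cross-validated F-measure) are assumptions.

```python
from itertools import product

# SVM search ranges as listed in the table.
svm_grid = {
    "cost": [0.001, 0.01, 0.1, 1],
    "gamma": [0.01, 0.1, 1],
    "kernel": ["Linear", "Polynomial", "Radial basis function"],
}


def candidates(grid):
    """Yield one dict per combination of hyper-parameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, score):
    """Return the best-scoring candidate.

    `score` is a hypothetical callable supplied by the caller that maps a
    candidate dict to a number, where higher is better.
    """
    return max(candidates(grid), key=score)


svm_candidates = list(candidates(svm_grid))
print(len(svm_candidates))  # 4 * 3 * 3 combinations
```

The SVM grid alone contains 36 candidates, so the cost of an exhaustive search grows quickly for the larger RF and RoF ranges.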
Description of the databases in use.
| Database | Leads | Subjects | Records | Sampling Rate (Hz) | Length (min) | Quantization Bit | AF Details | Total Length (h) |
|---|---|---|---|---|---|---|---|---|
| PAF Prediction Challenge—2001 | 2 | 48 | 50 | 128 | 25, 5 | 16 | 5 | 24 |
| AF Termination Challenge—2004 | 2 | 30 | 30 | 128 | 1 | 16 | 1 | 20–24 |
Figure 4. Representative ECG measurements of SR (a,c) and PAF (b,d) segments obtained from AFPDB and AFTDB, respectively. The R-peaks in PAF segments (within the red rectangle) are not easily identifiable.
Figure 5. Representative SR (a) and PAF (b) segments with their corresponding RR intervals and discretized symbolic sequences.
Figure 6. Representative heatmaps of transition patterns for SR (a) and PAF (b) segments using co-occurrence matrices.
Overall average performance (%) of the DSA method on the two databases using a fixed symbol length V with varied word size W. Bold values indicate the best performance in a group.
| W | Classifier | AFPDB | | | | AFTDB | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | ACC | P | R | F | ACC | P | R | F |
| 3 | SVM | 80.0 | 82.8 | 80.0 | 81.4 | 86.7 | 89.4 | 86.7 | 88.0 |
| | kNN | 81.8 | 84.8 | 81.8 | 83.3 | 81.7 | 86.7 | 81.7 | 84.1 |
| | RF | 91.8 | 91.9 | 91.8 | 91.8 | 93.5 | 96.9 | 96.7 | 96.8 |
| | RoF | | | | | 98.3 | 98.4 | 98.3 | 98.3 |
| | EL | 91.8 | 91.9 | 91.8 | 91.8 | | | | |
| 4 | SVM | 77.3 | 81.0 | 77.3 | 79.1 | 85.0 | 88.4 | 85.0 | 86.7 |
| | kNN | 52.7 | 61.4 | 52.7 | 56.7 | 50.0 | 50.0 | 50.0 | 50.0 |
| | RF | 81.8 | 84.0 | 81.8 | 82.9 | 93.3 | 94.1 | 93.3 | 93.7 |
| | RoF | | | | | | | | |
| | EL | 89.1 | 89.1 | 89.1 | 89.1 | 93.3 | 93.3 | 93.3 | 93.3 |
| 5 | SVM | 77.3 | 81.9 | 77.3 | 79.5 | 85.0 | 88.4 | 85.0 | 86.7 |
| | kNN | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 | 50.0 |
| | RF | 72.7 | 82.4 | 72.7 | 77.2 | 48.3 | 83.5 | 75.0 | 79.0 |
| | RoF | 88.2 | 89.2 | 88.2 | 88.7 | 85.0 | 87.1 | 85.0 | 86.0 |
| | EL | | | | | | | | |
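
The F column in this table is consistent with the standard F-measure, the harmonic mean of precision (P) and recall (R). The short check below assumes that definition, which the record itself does not spell out, and recomputes two tabulated values from their P and R entries.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# P and R of the first SVM and kNN rows for AFPDB in the table.
svm_f = round(f_measure(82.8, 80.0), 1)
knn_f = round(f_measure(84.8, 81.8), 1)
print(svm_f, knn_f)
```

Both results agree with the tabulated F values for those rows (81.4 and 83.3).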
Figure 7. Boxplots representing the distribution of maximum co-occurrence values when one of the two DSA parameters (symbol length V, word size W) is varied and the other is held fixed, in AFPDB (a,c) and AFTDB (b,d).
The average F (%) for varied symbol length V (columns) and word size W (rows) in the two databases. Bold values indicate the best performance in a group.
| W | Classifier | AFPDB | | | | AFTDB | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | 3 | 5 | 7 | 9 | 3 | 5 | 7 | 9 |
| 3 | SVM | 86.9 | 82.1 | 81.4 | 80.6 | 84.1 | 89.1 | 88.0 | 88.0 |
| | kNN | 89.2 | 89.0 | 83.3 | 72.5 | 98.3 | 84.1 | 84.1 | 62.8 |
| | RF | | | 91.8 | 92.0 | 93.7 | 96.7 | 96.8 | 88.9 |
| | RoF | 90.0 | 91.8 | | | | | | |
| | EL | 88.2 | 91.0 | 91.8 | 93.6 | 95.2 | 98.3 | 98.3 | 95.0 |
| 4 | SVM | 85.2 | 79.9 | 79.1 | 79.1 | 81.5 | 80.2 | 86.7 | 86.7 |
| | kNN | 90.1 | 62.8 | 56.7 | 50.0 | 92.3 | 50.0 | 50.0 | 62.8 |
| | RF | | 91.1 | 82.9 | 84.8 | 95.0 | 88.1 | 93.7 | 85.4 |
| | RoF | 90.9 | | | | | | | |
| | EL | 88.2 | 92.8 | 89.1 | 91.8 | 91.7 | 99.8 | 93.3 | 93.4 |
| 5 | SVM | 82.9 | 79.1 | 79.5 | 60.7 | 78.9 | 77.7 | 86.7 | 47.1 |
| | kNN | 73.7 | 60.7 | 50.0 | 50.0 | 63.9 | 50.0 | 50.0 | 62.8 |
| | RF | 87.5 | 77.9 | 77.2 | 67.8 | 95.0 | 73.9 | 79.0 | 70.4 |
| | RoF | 90.0 | 87.9 | 88.7 | 77.9 | | 84.1 | 86.0 | 80.3 |
| | EL | | | | | 98.3 | | | |
Figure 8. The average receiver operating characteristic (ROC) plots showing the performance of the classifiers for the DSA feature on AFPDB (a–c) and AFTDB (d–f).
The average F (%) and AUCs (%) obtained by the DSA method for varied lengths of the time segments in the AFPDB database. Bold values indicate the best performance in a column.
| Classifier | Time Segments (min) | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | | | 2 | | | 3 | | | 4 | | | 5 | | |
| | ACC | F | AUCs | ACC | F | AUCs | ACC | F | AUCs | ACC | F | AUCs | ACC | F | AUCs |
| SVM | 85.7 | 86.7 | 86.0 | 88.0 | 88.4 | 88.0 | 88.0 | 88.4 | 88.0 | 90.0 | 90.0 | 90.0 | 86.4 | 86.9 | 86.4 |
| kNN | 92.0 | 92.1 | 96.2 | 94.0 | | | 96.0 | | | | | | 89.0 | 89.1 | 93.9 |
| RF | | | | 94.0 | 94.0 | 96.1 | 96.0 | 96.1 | 95.8 | 92.0 | 92.1 | 96.1 | | | 95.8 |
| RoF | 92.0 | 92.0 | 97.1 | | 96.1 | 95.5 | | 96.0 | 96.0 | 94.0 | 94.1 | 95.7 | 90.0 | 90.0 | |
| EL | 92.0 | 92.1 | 96.0 | 90.0 | 90.0 | 95.4 | 94.0 | 94.1 | 92.0 | 94.0 | 94.1 | 94.1 | 88.2 | 88.3 | 92.2 |
Figure 9. Comparison of P (a) and R (b) obtained for varied lengths of the time series by the DSA method and its classification using different classifiers.
The average F (%) and AUC (%) obtained by the DSA method for cross-dataset analysis with varied symbol lengths. The PAF and SR segments are obtained from the AFTDB and AFPDB databases, respectively. Bold values indicate the best performance in a column.
| Classifier | Symbol Lengths | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 3 | | | 5 | | | 7 | | | 9 | | |
| | ACC | F | AUCs | ACC | F | AUCs | ACC | F | AUCs | ACC | F | AUCs |
| SVM | 85.0 | 86.7 | 85.0 | 85.0 | 86.7 | 85.0 | 85.0 | 86.7 | 85.0 | 85.0 | 86.7 | 85.0 |
| kNN | 95.0 | 95.1 | 99.3 | 88.3 | 89.4 | 94.0 | 82.7 | 84.1 | 95.4 | 55.0 | 63.9 | 65.7 |
| RF | 93.3 | 93.3 | 99.3 | 96.7 | 96.7 | 99.4 | 95.0 | 95.2 | 99.8 | 98.3 | 98.3 | 99.8 |
| RoF | 96.7 | 96.7 | | | | | | | | | | |
| EL | | | 96.9 | 99.8 | 99.8 | 99.8 | 99.8 | 99.8 | 99.8 | 96.7 | 96.8 | 99.8 |
Figure 10. The wearable T-shirt (Pro-Kit, Hexoskin, Quebec, Canada) (a), the cECG chair (Smart Seat, Capical, Braunschweig, Germany) (d), the acquired one-minute ECG segments from these sensors (b,e), and the corresponding transition patterns evaluated using our method (c,f), respectively.
Comparison of the state-of-the-art methods with our approach. Bold values indicate the best performance in a column.
| Existing Methods | Database | Classifiers | Validation | Length (min) | P | R | F | ACC |
|---|---|---|---|---|---|---|---|---|
| Mohebbi and Ghassemian [ | AFPDB | SVM | Split | - | 96.3 | 93.1 | - | - |
| Zong et al. [ | AFPDB | ARMA, FL | Split | 30 | - | - | - | 80.88 |
| Sutton et al. [ | AFPDB | LD, LR, DT, RF | 5-fold | 1 | | 73.6 | - | 82.0 |
| Pourbabaee et al. [ | AFPDB | CNN, kNN, SVM, MLP | Split | 5 | - | - | - | 91.0 |
| Park et al. [ | Combined AFPDB and AFTDB | SVM | 4-fold | 1 | 91.4 | 92.9 | - | - |
| Our Method | AFPDB | SVM, kNN, RF, RoF, EL | 5-fold | 1, 5 | 94.6 | 94.5 | 94.6 | 94.0 |
| | AFTDB | | | 1 | 99.8 | | | |