
Can the application of certain music information retrieval methods contribute to the machine learning classification of electrocardiographic signals?

Ennio Idrobo-Ávila¹, Humberto Loaiza-Correa¹, Rubiel Vargas-Cañas², Flavio Muñoz-Bolaños³, Leon van Noorden⁴.

Abstract

The electrocardiogram is traditionally used to diagnose a large number of heart pathologies. Research to improve the readability and classification of cardiac signals includes studies geared toward sonification of the electrocardiographic signal and others involving features related to music processing, such as Mel-frequency cepstral coefficients. In terms of music processing features, this study seeks to use music information retrieval (MIR) features as electrocardiographic signal descriptors. The study compares the discriminatory capability of the introduced features in relation to standard groups such as heart rate variability, wavelet transform, descriptive statistics, Mel coefficients and fractal analysis, evaluated using classification algorithms; the signals analyzed were extracted from public databases. The group of features extracted from wavelet transform and the MIR group showed a high level of discrimination; the best representation of the ECG signals in the study was achieved in most cases by the MIR features. Moreover, a correlation coefficient higher than 0.8 was found between a number of MIR and other feature groups, indicating a likely relationship between the electrocardiographic signals and MIR features. These results suggest the feasibility of representing the analyzed signals by music information retrieval descriptors, giving the potential to consider these electrocardiographic signals as analogues to musical signals.


Keywords: ECG signal classification; Heart rate variability; Music; Neural networks; PhysioNet physiological signals database

Year: 2021 PMID: 33665429 PMCID: PMC7905363 DOI: 10.1016/j.heliyon.2021.e06257

Source DB: PubMed Journal: Heliyon ISSN: 2405-8440

Introduction

The electrocardiogram (ECG) is a medical examination that allows recording the electrical activity of the heart using contact electrodes placed in specific positions of the chest and extremities, together with a monitoring system [1]. It is one of the main physiological measures for medical diagnosis, analysis and monitoring of cardiac function [2]. Since it is related to the mechanical pumping of the heart, it provides information on its physiological state [2]. ECG represents one of the most widely used diagnostic tools in the health field, and is used in different situations from control in healthy subjects [3] to monitoring of patients in intensive care [3]. The high frequency of ECG use is due to the increase in the prevalence and incidence of cardiovascular diseases (CVD), where physicians and cardiologists have to read and analyze large numbers of ECG records, an increasingly challenging task. It is important to note that CVD represents the leading cause of death in the world [4], with more than 17 million deaths per year in 2016, and expected to exceed 23 million in 2030 [5]. Therefore, in recent years, research has been conducted concerning ECG applications, including studies related to signal processing [6, 7], machine learning [8], ordinal statistics of patterns and symbolic dynamics [9], or using morphological and dynamic features of ECG signals [10]. In relation to ECG classification, applications and analysis, other studies looked to relate music to ECG signals. The aim of these studies is the sonification of the ECG signal [11, 12]. In this regard, some features related to music processing, such as Mel-frequency cepstral coefficients (MFCC) have been used with promising results [13]. 
At this point, it is worth mentioning that MFCCs are used as timbre descriptors within music information retrieval (MIR) [14, 15], a field usually referred to as analysis of audio content, whose objective is the extraction of information from audio signals - digital recordings of music and audio in general [16]. MIR provides information about recognition of musical instruments, classification of musical phrases or melodies, and rhythm and high level-based music retrieval [17]. MIR analysis is also focused on recognition of emotions in music [18] and research continues concerning ECG and emotion in relation to music. Applications have been developed in which heart rate emotion data was used for generating music [19]. More recent research continues to propose methods for recognizing emotions elicited by music through ECG [20]. Even though prior research has explored a limited number of digital music processing features as ECG signal descriptors, until now the methods implemented only allowed ECG signal characterization via MFCC features using public databases in order to classify ECG as normal or abnormal. Therefore, studies in which more elements of musical analysis are incorporated must still be considered to see whether MIR features can allow better discrimination between ECG signals [13]. This study hypothesizes that ECG signals can indeed be represented through MIR features, which could serve to discriminate between different cardiopathies. In this paper, an approach that uses MIR as ECG features is presented to classify between inferior and anterior myocardial infarction, T-wave alternation, and normal ECG signals. MIR features of different natures are considered and compared with those features commonly used in cardiac signal analysis as features related to descriptive statistics [21], wavelet coefficients [22], heart rate variability [23], and fractal analysis [24]. 
In order to compare the descriptive capacity of each set of features, different machine learning algorithms were used, and the performance obtained with each set was compared. A correlation analysis was performed to associate the MIR features with those already known in the context of the ECG signals. The remainder of this document is organized as follows: Section 2 presents a description of the databases used, along with the stages of the experimental procedure; Section 3 reports the results, in which the best outcomes can be observed; Section 4 covers discussion of the results; and finally, Section 5 shows the conclusions of this study and some ideas for future research.

Materials and methods

Database description

This study makes use of ECG signals from two publicly available databases, the MIT-BIH and the C-database, both belonging to the PhysioNet databases [25]. The MIT-BIH database [26] was included since it was considered in previous research to study MFCC coefficients as descriptors of ECG signals [13] and thus enabled comparisons to be made. The C-database was included as a challenge to the MIR features, since this database contains signals with more subtle differences between its classes, which poses difficulties for the classification stage. In both databases, Lead II of the ECG signals was selected since this lead registers the inferior electrical activity of the heart [27]. The MIT-BIH database is composed of signals belonging to three classes: arrhythmia (ARR), congestive heart failure (CHF), and normal sinus rhythm (NSR). Each class comprised 30 instances, each with a duration of 512 seconds and a sampling frequency of 128 Hz. The C-database drew on the PTB Diagnostic ECG Database (PTB) [25, 28, 29], from which recordings showing inferior and anterior myocardial infarction and normal signals were used. Additionally, the T-Wave Alternans Challenge Database (TWA) [25, 30, 31] was included, from which T-wave alternating signals were used. Thus, the C-database was constructed with four classes: normal signals (N), signals with alternating T-wave (T-wave), signals with inferior myocardial infarction (IMI) and signals with anterior myocardial infarction (AMI). The classes N, T-wave, IMI and AMI consisted of 48, 41, 38, and 39 instances respectively, 166 instances in total. All signals have a duration of 60 s and a sampling frequency of 1000 Hz, except for the T-wave class signals, which were acquired with a sampling frequency of 500 Hz.

Experimental procedure

Once the ECG leads and anomalies were chosen and classified, the general experimental procedure was defined. This procedure consisted of four stages: preprocessing, dataset augmentation, feature extraction and classification (Figure 1). The feature extraction stage was divided into two substages: extraction and selection, the latter to determine how much each feature contributes to the description of the database. To this end, a new classification following the feature ranking was performed to observe how the classification metrics change with the number of best features; a correlation analysis between MIR features and other ECG features was also performed to find any relationships between them.

Figure 1

Block diagram of the experimental procedure.

Preprocessing

The preprocessing stage both conditions the signals and eliminates unwanted elements such as certain frequency components. In this case, the preprocessing stage consisted of two sub-stages: resampling and baseline wander correction.

Resampling

The objective of the resampling stage is to bring all signals of the dataset to conditions that are as similar as possible, i.e. similar frequency content. All signals in the C-database were resampled to the minimum sampling frequency of the original dataset, namely 500 Hz. The resampling process was not applied to the MIT-BIH database since all of its signals had the same sampling rate.

Baseline wander correction

After resampling, a baseline wander correction was applied to remove the noise produced by the breathing or movement of subjects [32], which is problematic because it overlaps with the ST segment [32]. Baseline wander correction was applied to both databases using the discrete wavelet transform (DWT) with the same filter bank: Daubechies 6 (Figure 2) [33]. This mother wavelet was selected because it is one of the most similar to a normal ECG signal. Wavelet decomposition of the input signal was performed on ten levels, and a frequency band between 0 and 0.49 Hz, corresponding to baseline drift, was removed [34]. Following this decomposition, the signal was reconstructed from the approximation coefficients of the last level, with the detail coefficients of all decomposition levels set to zero. The result of this process was a low-frequency signal representing the baseline of the input signal. Finally, this baseline signal was subtracted from the input signal in order to correct the baseline.

Figure 2

Mother wavelet Daubechies 6. For more information on mother wavelets see [33].

Database augmentation

Wavelet-based shrinkage filtering

The data augmentation stage is essential to ensure that the machine learning algorithms have enough samples from which to learn. Additional data have to replicate the conditions that were encountered during the acquisition process. The data augmentation process was applied only to the C-database, as the MIT-BIH database contained enough data to train the chosen algorithms. Data augmentation was implemented using the wavelet-based shrinkage filtering method [35], in which three mother wavelets were applied to filter the signals: Daubechies 4 (db4), Daubechies 6 (db6), and Symlets 8 (sym8). Thus, for each raw signal, three more signals were generated (Figure 3).

Figure 3

Database augmentation stage.
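The shrinkage principle behind this augmentation step can be illustrated with a minimal sketch. It is not the procedure of [35] as applied in the study: it assumes a single-level Haar transform in place of the db4, db6 and sym8 wavelets, and the threshold value is left as a free parameter because the source does not state the thresholding rule. All function names are hypothetical.

```python
import math

def haar_dwt(x):
    """One-level Haar transform of an even-length list -> (approx, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the original sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def soft_threshold(c, t):
    """Shrink a coefficient towards zero by t (soft thresholding)."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def shrink(x, t):
    """Filter x by shrinking its detail coefficients (illustrative only)."""
    approx, detail = haar_dwt(x)
    detail = [soft_threshold(d, t) for d in detail]
    return haar_idwt(approx, detail)
```

Applying such a filter with three different mother wavelets to each raw signal yields three additional versions of it, which is how the source arrives at four signals per original recording.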

Segmentation

Before the feature extraction stage, a segmentation phase was performed in order to analyse the signals in more detail, i.e. over short periods of time. The signals were segmented using a Hamming window. In the MIT-BIH database, the signals were divided into segments of ten seconds without overlap, while in the C-database each signal was divided into nine segments of ten seconds each, with a window overlap of 40% [36]. This interval was chosen because a healthy heart rate lies between 50 and 90 bpm (beats per minute), so that each window contains between 8 and 15 heartbeats [37]. With this number of heartbeats, it is possible to establish a behaviour pattern in both the ECG signal and some heart rate variability (HRV) analyses [38, 39].
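The segment counts implied by these settings can be checked with a short sketch. The helper names are hypothetical, and the Hamming window uses the standard textbook coefficients (0.54 and 0.46), which the source does not spell out.

```python
import math

def count_segments(n_samples, win, overlap):
    """Number of full windows of length `win` that fit with the given overlap."""
    hop = round(win * (1.0 - overlap))  # step between window starts, in samples
    return (n_samples - win) // hop + 1

def hamming(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def apply_window(segment):
    """Multiply a segment sample-by-sample with a Hamming window."""
    return [s * w for s, w in zip(segment, hamming(len(segment)))]

# C-database: 60 s at 500 Hz, 10 s windows, 40% overlap
c_db = count_segments(60 * 500, 10 * 500, 0.4)
# MIT-BIH: 512 s at 128 Hz, 10 s windows, no overlap
mit_bih = count_segments(512 * 128, 10 * 128, 0.0)
```

Under these assumptions the C-database yields nine segments per signal, as stated, and each MIT-BIH signal yields 51 segments. Multiplying by the 664 augmented and 90 original signals gives the 5,976 and 4,590 instances reported for the two databases.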

Feature extraction

Once the segmentation stage was carried out, the feature extraction step was performed. This process allows compact representation of data (ECG signals) through features that depict particular behaviours or patterns in the data [40]. From the MIT-BIH database (Table 1), a set of MIR features was extracted, while six groups of features were extracted from the C-database (Table 2) - MIR, Mel-frequency cepstral coefficients (MFCC), descriptive statistics, fractal analysis, HRV, and descriptive statistics from wavelet coefficients. Although MFCC could be considered part of MIR, it was omitted in this MIT-BIH analysis and was considered elsewhere mainly in order to compare the outcomes obtained from MIR features with the MFCC results obtained in [13], through study of the same MIT-BIH database.

Table 1

List of MIR features extracted from ECG signals (26 features), MIT-BIH database.

Features: mean pitch, standard deviation pitch, zero-crossing, low energy rate, tempo, minimum tempo, maximum tempo, mean tempo, standard deviation of tempo, pulse clarity, event density, minimum novelty, maximum novelty, mean novelty, standard deviation of novelty, key, mode, spectral spread, spectral distribution centroid, spectral roll-off, spectral skewness, spectral kurtosis, spectral flatness, spectral regularity, spectral entropy, root mean square

Description: Although many features were included, a greater influence of tempo-related features was expected due to the rhythmic nature of the heart. The main MIR characteristics considered here include root mean square, tempo, zero-crossing, spectral flatness, and spectral spread. Root mean square is related to sound intensity [41]. Tempo is related to the tempo of a musical signal; it is estimated from the detection of periodicities in the signals [42]. Zero-crossing represents the number of changes of sign in consecutive blocks of signals [16]. Zero-crossing has high values for segments with noise and low values for tonal parts or parts with determined frequencies. Its long constant values during constant pitches indicate the presence of a fundamental frequency. Spectral flatness measures the amount of correlation structure that exists in a signal [43]; it also indicates whether the spectral distribution is smooth, associated with noise, or has peaks, associated with tonal behavior [44]. A spectral flatness close to one represents a spectrum with components in all frequency bands, similar to the spectrum of white noise. In contrast, a spectral flatness close to zero represents a spectrum with components in a limited number of frequency bands, such as those present in pure tones or in a sum of sinusoidal components. Spectral spread is associated with the standard deviation of the signal spectrum around the spectral centroid; this measure is associated with perception of the timbre of an audio signal [16].
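Several of the descriptors in Table 1 have simple textbook definitions, sketched below for illustration. These are generic formulations and are not claimed to match the MIRtoolbox implementations used in the study, which may differ in windowing, normalization and units. The function names and the naive DFT are choices made for this sketch.

```python
import cmath
import math

def rms(x):
    """Root mean square of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def zero_crossings(x):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))

def power_spectrum(x):
    """One-sided power spectrum via a naive DFT (fine for short segments)."""
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(s) ** 2)
    return spec

def spectral_flatness(p, eps=1e-12):
    """Geometric mean over arithmetic mean of the spectrum (0 tonal, 1 noisy)."""
    p = [v + eps for v in p]
    geo = math.exp(sum(math.log(v) for v in p) / len(p))
    return geo / (sum(p) / len(p))

def spectral_centroid_spread(p, fs, n):
    """Spectral centroid and the standard deviation around it (spread), in Hz."""
    freqs = [k * fs / n for k in range(len(p))]
    total = sum(p)
    c = sum(f * v for f, v in zip(freqs, p)) / total
    spread = math.sqrt(sum((f - c) ** 2 * v for f, v in zip(freqs, p)) / total)
    return c, spread
```

With these definitions, a pure sinusoid gives a flatness close to zero and a spread close to zero around its own frequency, whereas a single impulse, whose spectrum is flat, gives a flatness close to one.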

Table 2

List of features extracted from ECG signals, C-database.

Group	Features	Total
MFCC	mfcc1, mfcc2, mfcc3, mfcc4, mfcc5, mfcc6, mfcc7, mfcc8, mfcc9, mfcc10, mfcc11, mfcc12, mfcc13, maximum mfcc, minimum mfcc, mean mfcc, mfcc variance, mfcc skewness, mfcc kurtosis	19
Descriptive statistics	maximum, minimum, mean, variance, skewness, kurtosis, median, mode, energy, entropy	10
Fractal analysis	Higuchi fractal dimension, Katz fractal dimension, Hurst exponent, HRV detrended fluctuation analysis alpha1, HRV detrended fluctuation analysis alpha2	5
HRV	mean R-R interval, root mean square of the successive differences, median of Euclidean distance, interquartile range of Euclidean distance, mean of the heart rate, probability of intervals greater than 50ms, triangular index from the interval histogram, performing triangular interpolation, correlation dimension, approximate entropy, standard deviation1 of the Poincaré plot, standard deviation2 of the Poincaré plot, ratio of standard deviation1 and standard deviation2 of the Poincaré plot, very low-frequency components, low-frequency components, high-frequency components, ratio of low and high-frequency components, power of low-frequency components, power of high-frequency components, total power, HRV detrended fluctuation analysis alpha1, HRV detrended fluctuation analysis alpha2	22
Wavelet coefficients	maximum(cfs0-7), minimum(cfs0-7), mean(cfs0-7), variance(cfs0-7), median(cfs0-7), mode(cfs0-7), energy(cfs0-7), entropy(cfs0-7), Higuchi fractal dimension (cfs0-7)	72
MIR	As described in Table 1	26
Total		154

In calculating HRV, both temporal and frequency domain features were considered. HRV was computed through detection of R-peaks; R-peaks were identified using the Pan-Tompkins algorithm [45]. Undetected peaks were marked manually. Regarding feature extraction of wavelet coefficients, three levels of decomposition, descriptive statistics for these components, and the Higuchi fractal dimension were considered. Implementation of the algorithms required in this study was carried out using Matlab [46], while MIR feature extraction was performed using MIRtoolbox [44, 47].
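To make the time-domain HRV features of Table 2 concrete, the following sketch computes four of them from a list of R-peak times. It assumes that R-peak detection (Pan-Tompkins in the study) has already been carried out. The function name and the input format (peak times in seconds) are hypothetical.

```python
import math

def hrv_time_features(r_peaks_s):
    """Basic time-domain HRV features from R-peak times given in seconds."""
    rr = [b - a for a, b in zip(r_peaks_s, r_peaks_s[1:])]  # R-R intervals
    diffs = [b - a for a, b in zip(rr, rr[1:])]             # successive differences
    return {
        "mean_rr": sum(rr) / len(rr),
        # root mean square of the successive differences
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        # proportion of successive differences greater than 50 ms
        "pnn50": sum(1 for d in diffs if abs(d) > 0.050) / len(diffs),
        # mean of the instantaneous heart rate, in beats per minute
        "mean_hr": sum(60.0 / v for v in rr) / len(rr),
    }
```

A perfectly regular rhythm with one beat per second gives a mean R-R interval of 1 s, a mean heart rate of 60 bpm, and zero RMSSD.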

Feature ranking

A best features ranking was carried out using the information gain ratio [48] with the C-database. This process was performed to observe whether the classification process could be improved by considering only the best features. This process was also performed on the MIR features to determine how much each MIR feature contributes to data representation. From the ranking process, two groups were created, “Best features” (the five best-ranked features from each of the MIR, Statistics, HRV, and Wavelet groups) and “14-MIR” (the 14 best-ranked MIR features). The number 14 was selected according to a classification analysis with the best MIR features (See below: Section 3.5. Classification with best MIR features).
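The information gain ratio used for this ranking can be sketched as follows. The sketch assumes that each continuous feature has already been discretized into bins; the source does not describe the discretization, so that step is left out. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature_bins, labels):
    """Information gain of a binned feature divided by its split information."""
    n = len(labels)
    groups = {}
    for b, y in zip(feature_bins, labels):
        groups.setdefault(b, []).append(y)
    # class entropy remaining after splitting the data by feature bin
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_bins)
    return info_gain / split_info if split_info > 0 else 0.0
```

A feature whose bins separate the classes perfectly obtains a ratio of one, while a feature that is unrelated to the classes obtains zero; features are then ranked by decreasing ratio.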

Classification

Once the features were extracted and selected, a classification process was carried out to determine whether or not the extracted features can represent particular ECG signals. Classification was implemented using six classical AI algorithms: AdaBoost, CN2 rule inducer, neural network, random forest, decision trees (Tree), and k-nearest neighbors (kNN) (Table 3). These algorithms were applied to both datasets: MIT-BIH and C-database.

Table 3

Configuration parameters of classical artificial intelligence algorithms implemented.

Classification algorithms	Configuration MIT-BIH database	Configuration C-database
AdaBoost	Base estimator: Tree, Number of estimators: 50, Learning rate: 1, Classification algorithm: SAMME.R	Base estimator: Tree, Number of estimators: 50, Learning rate: 1, Classification algorithm: SAMME.R
CN2 rule inducer	Unordered rule, Evaluation measure: entropy, Beam width: 5, Regression loss function: linear	Unordered rule, Evaluation measure: entropy, Beam width: 5, Regression loss function: linear
Neural network	Multi-layer perceptron with backpropagation. Neurons in hidden layers: 150, Activation: ReLu, Regularization: alpha = 0.002, Solver: Adam	Multi-layer perceptron with backpropagation. Neurons in hidden layers: 5, Activation: ReLu, Regularization: alpha = 3, Solver: L-BFGS-B
Random forest	Number of trees: 20	Number of trees: 10
Decision trees	Induce binary tree, Minimum number of instances in leaves: 7, Limit the maximal tree depth to: 100	Induce binary tree, Minimum number of instances in leaves: 2, Limit the maximal tree depth to: 100
K-nearest neighbors	Number of neighbors: 5, Metric: Manhattan, Weight: Distance	Number of neighbors: 3, Metric: Manhattan, Weight: Distance
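As an illustration of one configuration in Table 3, the sketch below implements a k-nearest neighbors classifier with Manhattan distance and distance weighting, using the C-database setting of three neighbors as the default. It is a generic formulation and is not the software used in the study; the function name and data layout are hypothetical.

```python
from collections import defaultdict

def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by distance-weighted vote of the k nearest neighbors."""
    # Manhattan (L1) distance from the query to every training instance
    dists = sorted(
        (sum(abs(a - b) for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = defaultdict(float)
    for d, y in dists[:k]:
        if d == 0:
            return y  # an exact match decides the class outright
        votes[y] += 1.0 / d  # closer neighbors carry more weight
    return max(votes, key=votes.get)
```

With distance weighting, a single very close neighbor can outweigh two distant ones, which makes the vote less sensitive to the exact choice of k.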


Training, test and evaluation of models

The datasets were split into training and testing sets corresponding to 80 and 20% of the data, respectively. The datasets had a total of 4,590 and 5,976 instances related to MIT-BIH and the C-database, respectively, within which the training and test sets had respectively 3,672 and 918 instances for the MIT-BIH database, and 4,860 and 1,116 instances for the C-database. These two sets were constructed considering that data of subjects in the test set should not be in the training set. Although this approach generally produces low-performance metrics in evaluation because of inter-participant variability [49], it was selected since it is closest to real applications. Cross-validation with five folds was used for training the models. The performance of each classifier was evaluated using the area under the ROC curve (AUC), and accuracy. In the classification process, the performance of the classification algorithms was compared with each group of features, revealing the group of features which best described the ECG signals and the classifiers that achieved the best performance with the selected database.
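The requirement that subjects in the test set do not appear in the training set can be met by splitting on subject identifiers rather than on individual segments. The following is a minimal sketch under that assumption; the function name, the seed and the identifier format are hypothetical, and the source does not describe how the split was actually generated.

```python
import random

def split_by_subject(subject_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) such that no subject appears in both sets."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx
```

Because whole subjects are assigned to one side, the resulting proportions only approximate 80/20 when subjects contribute different numbers of segments.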

Correlation analysis

Finally, using the Pearson correlation coefficient, an analysis was carried out of correlation between the MIR features and the other features extracted from the C-database, to establish a link between the ECG signal and the MIR features.
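For reference, the Pearson coefficient can be computed as in the sketch below. The heart rate values in the example are hypothetical and serve only to show why a quantity and its reciprocal, such as heart rate and R-R interval, are expected to be strongly and negatively correlated.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical mean heart rates (bpm) and the corresponding R-R intervals (s)
heart_rate = [55.0, 60.0, 72.0, 80.0, 90.0]
rr_interval = [60.0 / hr for hr in heart_rate]
r = pearson(heart_rate, rr_interval)  # strongly negative
```

Feature pairs are then retained when the absolute value of the coefficient exceeds the chosen cut-off.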

Results

Preprocessing

Baseline correction

Applying the methodology described above, the ECG baseline was subtracted (Figure 4), resulting in enhanced and cleaner signals with a near-constant DC level.

Figure 4

Example of ECG baseline correction.

Database augmentation

Shrinkage filtering with three mother wavelets - Daubechies 4 (db4), Daubechies 6 (db6), and Symlets 8 (sym8) - augmented the database by a factor of 4 (from 166 to 664 instances), each filtering subtly modifying the original signal (Figure 5); e.g. filtering with the sym8 and db6 wavelets was noticeably smoother than with db4.

Figure 5

Example of wavelet-based shrinkage filtering.

MIR feature ranking

Ranking of all MIR features was carried out using the information gain ratio (Table 4). As expected, the tempo features best described these ECG signals.

Table 4

Ranking of MIR features using information gain ratio (IGR).

Ranking	MIR feature	IGR	Ranking	MIR feature	IGR
1	mean tempo	0.1181	14	standard deviation of tempo	0.0432
2	tempo	0.1137	15	spectral spread	0.0367
3	minimum tempo	0.0995	16	low energy rate	0.0324
4	pulse clarity	0.0773	17	spectral regularity	0.0296
5	root mean square	0.0731	18	spectral flatness	0.0283
6	spectral entropy	0.0708	19	standard deviation pitch	0.0240
7	spectral skewness	0.0641	20	zero-crossing	0.0233
8	spectral roll-off	0.0597	21	key	0.0165
9	spectral centroid	0.0579	22	mode	0.0060
10	spectral kurtosis	0.0569	23	standard deviation of novelty	0.0040
11	event density	0.0526	24	maximum novelty	0.0028
12	maximum tempo	0.0483	25	mean of novelty	0.0015
13	mean pitch	0.0453	26	minimum novelty	0.0009


Classification: evaluation of models

Classification: MIT-BIH database

After training the algorithms with the MIT-BIH database, performance was evaluated using the AUC (Figure 6) and accuracy (Figure 7) metrics in both binary and multiclass classifications. The two classes used were normal sinus rhythm (NSR) and congestive heart failure (CHF), while arrhythmia (ARR) was added to make three classes. With the exception of CN2 rule inducer for 3 classes, performance across all algorithms surpassed 0.90 AUC and 0.85 accuracy. Neural network, with 0.98 AUC and 0.99 accuracy (2 classes) and 0.96 AUC and 0.99 accuracy (3 classes), outperformed all the others.

Figure 6

Area under the ROC curve (AUC), MIT-BIH database: classification with MIR features.

Figure 7

Accuracy, MIT-BIH database: classification with MIR features.


Classification: C-database

Having trained the algorithms with the C-database, performance was evaluated using AUC and accuracy. AUC again revealed a strong overall performance by neural network (Figure 8). The MIR features, in combination with neural network, outperformed all other groups. The “Best features” group was also seen to perform well. An accuracy value that exceeded 0.7 showed that MIR features in combination with neural network again performed best (Figure 9).

Figure 8

Area under the ROC curve (AUC), C-database: classification analysis of features and classifier algorithms.

Figure 9

Accuracy, C-database: classification analysis of features and classifier algorithms.


Classification with best MIR features: C-database

Having ranked the MIR features, the classification performance was analysed as a function of the MIR feature ranking. For this analysis, neural network was selected as the best overall performing algorithm in the previous results. The analysis began with the two best-ranked features (Figure 10). In this configuration, the values of AUC and accuracy were observed to be around 0.85 and 0.65 respectively.

Figure 10

Neural network classification with ranked MIR features (C-database).

Correlation: MIR and ECG features

In addition to the classification, a correlation analysis was performed using the Pearson correlation coefficient. The MIR features and their relationship to features from the other groups were considered using correlation analysis (Table 5). This approach used Landis and Koch levels of reliability [50] and relationships between features with an absolute value of correlation coefficient greater than 0.8 were considered.

Table 5

Correlation coefficients (Corr) between selected MIR features and other groups of features (ECG features).

MIR Features		ECG Features	Corr
tempo	-	mean heart rate	+0.979
tempo	-	mean R-R interval	-0.958
root mean square	-	energy (cfs0)	+0.951
root mean square	-	variance	+0.951
root mean square	-	variance (cfs0)	+0.950
root mean square	-	energy	+0.950
root mean square	-	maximum (cfs0)	+0.946
root mean square	-	maximum	+0.935
zero-crossing	-	Higuchi fractal dimension	+0.842
root mean square	-	mean	+0.834
spectral spread	-	Higuchi fractal dimension	+0.829
root mean square	-	mean (cfs0)	+0.829
spectral flatness	-	Higuchi fractal dimension	+0.819
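Several of the root mean square entries in Table 5 are consistent with an elementary identity: the squared RMS equals the variance plus the squared mean, and the energy equals the number of samples times the squared RMS. The sketch below checks this numerically, using the population variance and the sum of squares as energy; these are the usual definitions, but the exact definitions used in the study are an assumption here.

```python
import math

def mean(x):
    return sum(x) / len(x)

def variance(x):
    """Population variance (divides by N)."""
    m = mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def energy(x):
    """Signal energy as the sum of squared samples."""
    return sum(v * v for v in x)

def rms(x):
    return math.sqrt(energy(x) / len(x))

x = [1.0, 2.0, 3.0, 4.0]
lhs = rms(x) ** 2                 # squared RMS
rhs = variance(x) + mean(x) ** 2  # variance plus squared mean
```

For a signal whose mean is close to zero, as may be expected after baseline correction, the squared RMS is therefore close to the variance, which offers one possible reading of the high coefficients between these features.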


Classification without the AMI class: C-database

The confusion matrix of the classification with neural network on the complete C-database is shown in Table 6, since this algorithm performed best overall. Compared to other metrics, precision allows easier visualization of system performance, as it gives the percentage of instances predicted as a class that actually belong to that class, i.e. positives that are correct [51]. Whereas the N class reached 90% precision, the AMI classification, with only 60% precision, was notably poor.

Table 6

Confusion matrix: classification with MIR features and neural network.

		Predicted heart condition
		N	T	IMI	AMI	Σ
Actual heart condition	N	189 (90%)	28 (10%)	42 (15%)	65 (20%)	324
	T	0 (0%)	248 (84%)	40 (14%)	0 (0%)	288
	IMI	0 (0%)	9 (3%)	180 (62%)	63 (20%)	252
	AMI	22 (10%)	9 (3%)	27 (9%)	194 (60%)	252
	Σ	211	294	289	322	1116
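The percentages on the diagonal of Table 6 can be reproduced from the counts, as the following sketch shows. Each percentage is the cell count divided by its column total, i.e. the precision of the predicted class. The variable and function names are hypothetical; the counts are those of Table 6.

```python
LABELS = ["N", "T", "IMI", "AMI"]

# Rows: actual class; columns: predicted class (counts from Table 6)
CONFUSION = [
    [189, 28, 42, 65],
    [0, 248, 40, 0],
    [0, 9, 180, 63],
    [22, 9, 27, 194],
]

def precision_per_class(m):
    """Correct predictions of each class divided by all predictions of it."""
    out = {}
    for j, label in enumerate(LABELS):
        column_total = sum(row[j] for row in m)
        out[label] = m[j][j] / column_total
    return out

def accuracy(m):
    """Sum of the diagonal divided by the total number of instances."""
    return sum(m[i][i] for i in range(len(m))) / sum(sum(row) for row in m)

precision = precision_per_class(CONFUSION)
overall = accuracy(CONFUSION)
```

The resulting precision values round to 0.90, 0.84, 0.62 and 0.60 for N, T, IMI and AMI, and the overall accuracy is about 0.73, in line with the accuracy above 0.7 reported for MIR features with neural network.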

Taking into consideration that anterior myocardial infarction is not usually recognized in lead II of the ECG, a new classification process was performed without the AMI class. This classification was made using MIR features and neural network (Figure 11). Performance in this configuration was improved significantly with respect to that of Figure 10, with values of AUC and accuracy observed to be around 0.90 and 0.80 respectively.

Figure 11

Neural network classification with ranked MIR features (C-database without AMI class).

Discussion

Comparing the performance level of the present study with previous work, the results with MFCC features were not able to match the correct classification of 99% reported by [13] in the use of neural network classifiers and MFCC. In that study, the high performance was achieved in a binary classification considering normal and abnormal classes, in which the authors took segments of ECG signals with a one second duration, whereas in our study, in addition to normal signals, three more classes were examined, thereby significantly increasing the complexity in the classification task; likewise, the analysis was carried out with different subjects in the training and test sets. Using the same MIT-BIH database and considering MIR features, an accuracy of 0.98 and 0.96 was obtained for 2 and 3 classes, respectively (Figure 7). As with the C-database, neural network was the best algorithm overall (Figures 6 and 7); CN2 rule inducer was the only algorithm with an accuracy below 0.85 (Figure 7) and all algorithms performed well, with AUC higher than 0.90 and accuracy above 0.85. It is not clear whether periods of one second are representative or not in the context of physiology and signal processing. Clearly, it is difficult to determine if a condition of arrhythmia is present in signals of one second duration. In the present study, therefore, ten-second segments were used. Anterior myocardial infarction consistently produced the weakest predictions due to the lack of information present in the original lead II signal (Table 6). Lead II is most able to recognize elements in the inferior wall of the heart. For the detection of infarction in the anterior wall, precordial leads V3 and V4 are recommended, as well as leads V1 and V2 [27], highlighting the need to establish which diseases or conditions it is possible to recognize in each electrocardiogram lead. 
Classification excluding the AMI class gave better performance than with all four classes (Figure 11), with AUC and accuracy around 0.9 and 0.8 respectively, stabilizing from the 12th best-ranked feature onward. These metrics appear to support the capacity of the selected MIR features to represent the ECG signals studied. Moreover, ranking of the MIR features in the C-database revealed tempo, pulse clarity, root mean square and spectral elements (skewness, roll-off, centroid, and kurtosis) as the strongest contributors to the classification task (Table 4). From the 13th best-ranked feature onward, classification with the neural network stabilized around 0.85 in AUC and 0.65 in accuracy. The neural network classifier, almost without exception, performed best in both AUC and accuracy for all feature sets of the C-database. Among the feature groups, the best performance involved the MIR, statistics, and wavelet groups, while the MFCC and fractal features generally produced the lowest performance. However, combining the five best-ranked features from each of the MIR, statistics, HRV and wavelet groups also gave good results, notably with the neural network and random forest classifiers (Figures 8 and 9).

From the correlation analysis, a direct relationship was observed between a number of features extracted from the ECG signal and certain MIR features. In particular, tempo was strongly linked to mean heart rate and R-R interval, while root mean square was closely associated with the energy, variance, maximum, and mean of wavelet coefficient zero (cfs0). In the musical context, tempo, slower rhythms, and low-pitched notes are generally linked with low frequencies, and the energy of the selected ECG signals was found to be concentrated in the lower frequencies (cfs0). In terms of the rhythmic function of the human heart, it would seem more coherent to associate the low frequencies with tempo or with more ponderous rhythms (i.e. the normal heart rate of between 60 and 90 beats per minute). Given that tempo describes the speed of music [42], a close link would be anticipated with heart rate over a given length of time, as described by the R-R interval and heart rate average.

From the same analysis, but in the time domain, zero-crossing allowed conclusions to be drawn concerning the higher frequencies. Zero-crossing is a measurement of sign changes in the signal and serves as a noisiness indicator, which could explain its association with the Higuchi fractal dimension, a measure of irregularity in a signal or time series [52]: the more zero crossings there are, the higher the fractal dimension will be. The fractal nature of the selected ECG signals was clearly associated with the presence of high frequencies, as revealed by features in both the time and frequency domains (e.g. spectral spread, spectral flatness). The direct relationships between the Higuchi fractal dimension and these spectral features imply that the wider the frequency range of the ECG signal, the higher the self-similarity may be, as observed particularly in the relationship with spectral spread. The correlation with spectral flatness confirms a fractal behaviour related to the frequency content in all frequency bands, but especially in the higher frequencies. Thus, given the influence of frequency content on the fractal dimension of the electrocardiographic signals, a link can be contemplated between the ECG signals and musical signals through the sound and music property of timbre.

MIR features were found to be descriptors for ECG signals of inferior and anterior myocardial infarction, T-wave alternans, congestive heart failure, arrhythmia and the normal condition. Finally, because these particular ECG signals can be represented through MIR descriptors, they might be considered as analogues to musical signals, since MIR features are often used in musical signal processing.
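The relationships discussed above (tempo with heart rate, and zero-crossing with the Higuchi fractal dimension) can be made concrete with a minimal sketch. It is not the authors' implementation or any MIR toolbox API: the function names, the `kmax=8` default, and the synthetic test signals are assumptions chosen for illustration only.

```python
import numpy as np

def mean_heart_rate_bpm(rr_intervals_s):
    """Mean heart rate in beats per minute from R-R intervals given in seconds."""
    return 60.0 / float(np.mean(rr_intervals_s))

def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose signs differ."""
    negative = np.signbit(np.asarray(x, dtype=float))
    return float(np.count_nonzero(negative[1:] != negative[:-1]) / (negative.size - 1))

def higuchi_fd(x, kmax=8):
    """Higuchi fractal dimension: slope of log L(k) against log(1/k).

    L(k) is the mean normalized curve length at delay k. The input must be
    non-constant, otherwise the logarithm of a zero length is taken.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    log_inv_k, log_length = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):                      # starting offsets 0 .. k-1
            n_int = (n - 1 - m) // k            # number of steps of size k
            if n_int < 1:
                continue
            idx = m + np.arange(n_int + 1) * k
            dist = np.abs(np.diff(x[idx])).sum()
            # Higuchi normalization, then division by k
            lengths.append(dist * (n - 1) / (n_int * k) / k)
        log_inv_k.append(np.log(1.0 / k))
        log_length.append(np.log(np.mean(lengths)))
    slope, _intercept = np.polyfit(log_inv_k, log_length, 1)
    return float(slope)

# Synthetic signals (assumed, for illustration): a slow sine versus white noise.
rng = np.random.default_rng(0)
slow_sine = np.sin(2 * np.pi * np.arange(2500) / 250.0)
white_noise = rng.standard_normal(2500)
```

On these synthetic inputs, the noise signal has both a higher zero-crossing rate and a higher Higuchi dimension than the slow sine, which mirrors the direct relationship between the two features described in the text. Likewise, a constant R-R interval of 0.8 s corresponds to 75 beats per minute, the same quantity a tempo estimator would report for a pulse at that rate.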

Conclusions and future work

The use of music information retrieval (MIR) features as electrocardiographic signal descriptors was explored. By means of AI classification techniques and correlation analysis, a relationship was established between MIR features and ECG signals. The best representation of the ECG signals in the study was achieved in most cases by the MIR features. The features extracted from the statistics and wavelet groups also showed a significant descriptive capability. In the correlation phase, a solid association was established between the musical features and a number of features of interest extracted from ECG signals, most evident in the strong relationships found between tempo, heart rate, and Higuchi fractal dimension. Given that tempo relates to the speed of music, a close link could therefore be expected with heart rate over a given length of time, as described by the R-R interval and heart rate average. The fractal behaviour of the heart was further clearly associated with the frequency content in the ECG signals across all bands. When the energy was predominantly in the lower frequencies, this fractal nature was less pronounced. Conversely, the greatest fractal quality was observed with frequency content predominantly at the higher frequencies. Given the influence of frequency content on the fractal dimension of the electrocardiographic signals, a link might thus be considered between the ECG signals and musical signals through the sound and music property of timbre. This study constitutes an initial approach to relating MIR features with ECG signals. The MIR features selected were shown to be capable of discriminating the ECG signals in the study, making them potential candidates for use as electrocardiographic signal descriptors and contributing to the development of feature extraction for these signals. This approach could be expanded to discover other possible applications. 
This would necessitate the study of MIR features in more depth; the inclusion of new MIR features; the incorporation of other ECG leads and different pathologies within a larger database; and eventually combining MIR features with other classic features of ECG analysis to improve classification performance. Moreover, a new classification metric might be developed, adjusted to these types of features. Finally, a logical development of the present study would seek, through MIR features, to relate sound stimuli to the ECG captured from subjects as they listen to these stimuli, adding to the science of music perception. It would also give a physiological meaning to MIR features associated with ECG signals, and thereby contribute greatly to the science of music therapy.

Declarations

Author contribution statement

E. Idrobo-Ávila, R. Vargas-Cañas: Conceived and designed the experiments; Performed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper. H. Loaiza-Correa, F. Muñoz-Bolaños, L. van Noorden: Analyzed and interpreted the data; Wrote the paper.

Funding statement

This work was supported by Universidad del Valle, Universidad del Cauca, and Colciencias (Funding call No. 727 of 2015), Colombia.

Data availability statement

Data associated with this study has been deposited at the PhysioNet physiological signals database.

Declaration of interests statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.

