Hong Tang, Ziyin Dai, Yuanlin Jiang, Ting Li, Chengyu Liu.
Abstract
This paper proposes a method that uses multidomain features and a support vector machine (SVM) to classify normal and abnormal heart sound recordings. The database was provided by the PhysioNet/CinC Challenge 2016. A total of 515 features are extracted from nine feature domains: time interval, frequency spectrum of states, state amplitude, energy, frequency spectrum of records, cepstrum, cyclostationarity, high-order statistics, and entropy. Correlation analysis is conducted to quantify the discriminative ability of each feature, and the results show that "frequency spectrum of state", "energy", and "entropy" are the domains that contribute the most effective features. An SVM with a radial basis kernel function was trained for signal quality estimation and classification. The SVM classifier is independently trained and tested on many groups of top features. When the top 400 features are used, the average sensitivity, specificity, and overall score reach 0.88, 0.87, and 0.88, respectively. This score is competitive with the best previously reported scores. The classifier performs very well even when trained on a small number of top features, and its output is stable regardless of which features are randomly selected for training. These simulations demonstrate that the proposed features and the SVM classifier are jointly powerful for classifying heart sound recordings.
Year: 2018 PMID: 30112388 PMCID: PMC6077676 DOI: 10.1155/2018/4205027
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Flow diagram of the proposed classification.
Figure 2. Illustration of the HSMM segmentation.
Summary of time-domain features.
| Feature index | Feature name | Physical meaning |
|---|---|---|
| 1 | m_RR | mean value of RR intervals |
| 2 | sd_RR | standard deviation (SD) of RR intervals |
| 3 | m_IntS1 | mean value of S1 intervals |
| 4 | sd_IntS1 | SD of S1 intervals |
| 5 | m_IntS2 | mean value of S2 intervals |
| 6 | sd_IntS2 | SD of S2 intervals |
| 7 | m_IntSys | mean value of systolic intervals |
| 8 | sd_IntSys | SD of systolic intervals |
| 9 | m_IntDia | mean value of diastolic intervals |
| 10 | sd_IntDia | SD of diastolic intervals |
| 11 | m_Ratio_SysRR | mean value of the ratio of systolic interval to RR interval of each heart beat |
| 12 | sd_Ratio_SysRR | SD of the ratio of systolic interval to RR interval of each heart beat |
| 13 | m_Ratio_DiaRR | mean value of the ratio of diastolic interval to RR interval of each heart beat |
| 14 | sd_Ratio_DiaRR | SD of the ratio of diastolic interval to RR interval of each heart beat |
| 15 | m_Ratio_SysDia | mean value of the ratio of systolic to diastolic interval of each heart beat |
| 16 | sd_Ratio_SysDia | SD of the ratio of systolic to diastolic interval of each heart beat |
| 17 | m_Ratio_S1RR | mean value of the ratio of S1 interval to RR interval of each heart beat |
| 18 | sd_Ratio_S1RR | SD of the ratio of S1 interval to RR interval of each heart beat |
| 19 | m_Ratio_S2RR | mean value of the ratio of S2 interval to RR interval of each heart beat |
| 20 | sd_Ratio_S2RR | SD of the ratio of S2 interval to RR interval of each heart beat |
Note: ∗ marks the features newly added in this study.
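The interval features in the table above can be illustrated with a minimal sketch. It assumes every heart beat has already been segmented into S1, systole, S2, and diastole durations in seconds (for example by the HSMM segmentation of Figure 2). The `Beat` type and the `time_interval_features` function are hypothetical names, the RR interval is taken as the sum of the four state durations, and the use of the sample standard deviation is an assumption. Only a subset of the 20 features is computed.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Beat:
    """Durations (in seconds) of the four states of one heart beat."""
    s1: float
    systole: float
    s2: float
    diastole: float

    @property
    def rr(self) -> float:
        # Assumption: one cardiac cycle is the sum of its four states.
        return self.s1 + self.systole + self.s2 + self.diastole


def time_interval_features(beats):
    """Return a dict with a subset of the interval features (needs >= 2 beats)."""
    def m_sd(name, values):
        return {f"m_{name}": mean(values), f"sd_{name}": stdev(values)}

    feats = {}
    feats.update(m_sd("RR", [b.rr for b in beats]))
    feats.update(m_sd("IntSys", [b.systole for b in beats]))
    feats.update(m_sd("IntDia", [b.diastole for b in beats]))
    feats.update(m_sd("Ratio_SysRR", [b.systole / b.rr for b in beats]))
    feats.update(m_sd("Ratio_SysDia", [b.systole / b.diastole for b in beats]))
    return feats
```

The remaining features in the table follow the same pattern with S1 and S2 durations in place of the systolic and diastolic ones.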
Summary of normalized amplitude features.
| Feature index | Feature name | Physical meaning |
|---|---|---|
| 1 | m_Amp_SysS1 | mean value of the ratio of the mean absolute amplitude during systole to that during S1 in each heart beat |
| 2 | sd_Amp_SysS1 | SD of m_Amp_SysS1 |
| 3 | m_Amp_DiaS2 | mean value of the ratio of the mean absolute amplitude during diastole to that during S2 in each heart beat |
| 4 | sd_Amp_DiaS2 | SD of m_Amp_DiaS2 |
| 5 | m_Amp_S1S2 | mean value of the ratio of the mean absolute amplitude during S1 to that during S2 in each heart beat |
| 6 | sd_Amp_S1S2 | SD of m_Amp_S1S2 |
| 7 | m_Amp_S1Dia | mean value of the ratio of the mean absolute amplitude during S1 to that during diastole in each heart beat |
| 8 | sd_Amp_S1Dia | SD of m_Amp_S1Dia |
| 9 | m_Amp_SysDia | mean value of the ratio of the mean absolute amplitude during systole to that during diastole in each heart beat |
| 10 | sd_Amp_SysDia | SD of m_Amp_SysDia |
| 11 | m_Amp_S2Sys | mean value of the ratio of the mean absolute amplitude during S2 to that during systole in each heart beat |
| 12 | sd_Amp_S2Sys | SD of m_Amp_S2Sys |
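As a rough illustration of the normalized amplitude features, the sketch below computes the per-beat ratio of mean absolute amplitudes between two states and then its mean and SD across beats. The data layout (one dict of sample lists per beat) and the function names are hypothetical, and reading each "SD of m_Amp_…" entry as the SD of the per-beat ratio is an assumption.

```python
from statistics import mean, stdev


def mean_abs(samples):
    """Mean absolute amplitude of one state segment."""
    return mean(abs(v) for v in samples)


def amplitude_ratio_feature(beats, numerator, denominator):
    """Mean and SD, across beats, of the per-beat amplitude ratio.

    `beats` is a list of dicts mapping a state name to its samples
    (hypothetical layout, e.g. {"S1": [...], "Sys": [...]}).
    """
    ratios = [mean_abs(b[numerator]) / mean_abs(b[denominator]) for b in beats]
    return mean(ratios), stdev(ratios)
```

For example, `amplitude_ratio_feature(beats, "Sys", "S1")` would correspond to the pair m_Amp_SysS1 and sd_Amp_SysS1 under these assumptions.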
Summary of energy-domain features.
| Feature index | Feature name | Physical meaning |
|---|---|---|
| 1-27 | | Ratio of a given band energy to the total energy |
| 28 | | Mean of |
| 29 | | Standard deviation of |
| 30 | | Mean of |
| 31 | | Standard deviation of |
| 32 | | Mean of |
| 33 | | Standard deviation of |
| 34 | | Mean of |
| 35 | | Standard deviation of |
| 36 | | Mean of |
| 37 | | Standard deviation of |
| 38 | | Mean of |
| 39 | | Standard deviation of |
| 40 | | Mean of |
| 41 | | Standard deviation of |
| 42 | | Mean of |
| 43 | | Standard deviation of |
| 44 | | Mean of |
| 45 | | Standard deviation of |
| 46 | | Mean of |
| 47 | | Standard deviation of |
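The first group of energy features, the ratio of a band energy to the total energy, can be sketched as follows. This is only an illustration: the band edges, the one-sided DFT power estimate, and the function names are assumptions and are not taken from the paper, which may compute band energy differently. A naive DFT is used to keep the example dependency-free; it is slow for long recordings.

```python
import cmath
import math


def power_spectrum(x):
    """One-sided power spectrum via a naive O(n^2) DFT (illustration only)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_energy_ratio(x, fs, f_lo, f_hi):
    """Energy in [f_lo, f_hi) Hz divided by the total energy of x."""
    n = len(x)
    p = power_spectrum(x)
    total = sum(p)
    # Bin k corresponds to the frequency k * fs / n.
    band = sum(pk for k, pk in enumerate(p) if f_lo <= k * fs / n < f_hi)
    return band / total
```

Calling `band_energy_ratio` once per band would yield one feature per band under these assumptions.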
Summary of cepstrum-domain features.
| Feature index | Feature name | Physical meaning |
|---|---|---|
| 1-13 | Cepstrum coefficients | Cepstrum coefficients of a PCG recording |
| 14-26 | Cepstrum coefficients | Cepstrum coefficients of jointed S1 state |
| 27-39 | Cepstrum coefficients | Cepstrum coefficients of jointed systolic state |
| 40-52 | Cepstrum coefficients | Cepstrum coefficients of jointed S2 state |
| 53-65 | Cepstrum coefficients | Cepstrum coefficients of jointed diastole state |
Figure 3. An example of cycle frequency spectral density. (a) A subsequence of a PCG recording and (b) cycle frequency spectral density of the subsequence.
Summary of cyclostationary features.
| Feature index | Feature name | Physical Meaning |
|---|---|---|
| 1 | m_cyclostationarity_1 | mean value of the degree of cyclostationarity |
| 2 | sd_cyclostationarity_1 | SD of the degree of cyclostationarity |
| 3 | m_cyclostationarity_2 | mean value of the sharpness measure |
| 4 | sd_cyclostationarity_2 | SD of the sharpness measure |
Summary of high-order statistics features.
| Feature index | Feature name | Physical Meaning |
|---|---|---|
| 1 | m_S1_skewness | mean value of the skewness of S1 |
| 2 | sd_S1_skewness | SD of the skewness of S1 |
| 3 | m_S1_kurtosis | mean value of the kurtosis of S1 |
| 4 | sd_S1_kurtosis | SD of the kurtosis of S1 |
| 5 | m_S2_skewness | mean value of the skewness of S2 |
| 6 | sd_S2_skewness | SD of the skewness of S2 |
| 7 | m_S2_kurtosis | mean value of the kurtosis of S2 |
| 8 | sd_S2_kurtosis | SD of the kurtosis of S2 |
| 9 | m_sys_skewness | mean value of the skewness of systole |
| 10 | sd_sys_skewness | SD of the skewness of systole |
| 11 | m_sys_kurtosis | mean value of the kurtosis of systole |
| 12 | sd_sys_kurtosis | SD of the kurtosis of systole |
| 13 | m_dia_skewness | mean value of the skewness of diastole |
| 14 | sd_dia_skewness | SD of the skewness of diastole |
| 15 | m_dia_kurtosis | mean value of the kurtosis of diastole |
| 16 | sd_dia_kurtosis | SD of the kurtosis of diastole |
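The high-order statistics above can be sketched as the skewness and kurtosis of each state segment, summarized by their mean and SD across beats. The sketch uses the population (biased) moment definitions and non-excess kurtosis (3 for a Gaussian); these conventions and the function names are assumptions. Segments with zero variance are not handled.

```python
from statistics import mean, stdev


def _central_moment(x, order):
    mu = sum(x) / len(x)
    return sum((v - mu) ** order for v in x) / len(x)


def skewness(x):
    """Third standardized moment of the samples in x."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5


def kurtosis(x):
    """Fourth standardized moment of the samples in x (non-excess)."""
    return _central_moment(x, 4) / _central_moment(x, 2) ** 2


def state_moment_features(segments):
    """Mean and SD of skewness and kurtosis over the segments of one state."""
    sk = [skewness(s) for s in segments]
    ku = [kurtosis(s) for s in segments]
    return {
        "m_skewness": mean(sk), "sd_skewness": stdev(sk),
        "m_kurtosis": mean(ku), "sd_kurtosis": stdev(ku),
    }
```

Passing all S1 segments of a recording would give the four S1 entries of the table, and likewise for S2, systole, and diastole.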
Summary of entropy features.
| Feature index | Feature name | Physical meaning |
|---|---|---|
| 1 | | Mean value of SampEn of S1 state |
| 2 | | SD value of SampEn of S1 state |
| 3 | | Mean value of SampEn of S2 state |
| 4 | | SD value of SampEn of S2 state |
| 5 | | Mean value of SampEn of systolic state |
| 6 | | SD value of SampEn of systolic state |
| 7 | | Mean value of SampEn of diastolic state |
| 8 | | SD value of SampEn of diastolic state |
| 9 | | Mean value of FuzzyMEn of S1 state |
| 10 | | SD value of FuzzyMEn of S1 state |
| 11 | | Mean value of FuzzyMEn of S2 state |
| 12 | | SD value of FuzzyMEn of S2 state |
| 13 | | Mean value of FuzzyMEn of systolic state |
| 14 | | SD value of FuzzyMEn of systolic state |
| 15 | | Mean value of FuzzyMEn of diastolic state |
| 16 | | SD value of FuzzyMEn of diastolic state |
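For readers unfamiliar with sample entropy (SampEn), the following is a minimal sketch of its common textbook definition: the negative logarithm of the ratio between the number of matching template pairs of length m+1 and of length m, using the Chebyshev distance and excluding self-matches. The parameter values (m = 2, an absolute tolerance r) and the use of "<=" for a match are assumptions; the paper's exact settings are not stated here, and the fuzzy measure entropy (FuzzyMEn) is not covered.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """SampEn of sequence x with embedding dimension m and tolerance r."""
    n = len(x)

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined when no matches are found
    return -math.log(a / b)
```

A perfectly regular signal gives a value of 0, and less predictable signals give larger values. In practice r is often chosen relative to the standard deviation of the segment.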
Summary of the proposed features.
| Index | Domain | Num. of features | Motivation |
|---|---|---|---|
| 1 | Time interval | 20 | The time interval of each state has physiological meaning based on heart physiology. |
| 2 | Frequency spectrum of state | 308 | To reflect the frequency spectrum within state. |
| 3 | State amplitude | 12 | The amplitude is related to the heart hemodynamics. |
| 4 | Energy | 47 | To reflect energy distribution with respect to frequency band |
| 5 | Frequency spectrum of records | 27 | To reflect frequency spectrum within records |
| 6 | Cepstrum | 65 | To reflect the acoustic properties. |
| 7 | Cyclostationary | 4 | To reflect the degree of signal repetition. |
| 8 | High-order statistics | 16 | To reflect the skewness and kurtosis of each signal state. |
| 9 | Entropy | 16 | To reflect the PCG signal inherent complexity. |
| Total | - | 515 | - |
Variables to evaluate the classification.
| Reference label | Classified as Normal (-1) | Classified as Uncertain (0) | Classified as Abnormal (1) |
|---|---|---|---|
| Normal, clean | | | |
| Normal, noisy | | | |
| Abnormal, clean | | | |
| Abnormal, noisy | | | |
Figure 4. Correlation coefficient (CC) between features and the target label. (a) Time interval, (b) state amplitude, (c) energy, (d) high-order statistics, (e) cepstrum, (f) frequency spectrum of state, (g) cyclostationarity, (h) entropy, and (i) frequency spectrum of records.
Summary of the correlation coefficients.
| No. | Feature domain | CC with max. absolute value | Corresponding feature |
|---|---|---|---|
| 1 | Time interval | 0.286 | sd_IntSys |
| 2 | State amplitude | -0.159 | sd_Amp_S2Sys |
| 3 | Energy | 0.345 | Standard deviation of |
| 4 | High-order statistics | 0.185 | sd_S1_kurtosis |
| 5 | Cepstrum | 0.216 | The seventh cepstrum coefficient of S2 state |
| 6 | Frequency spectrum of state | 0.417 | Spectrum value of 30 Hz of S2 state |
| 7 | Cyclostationarity | -0.240 | Sharpness of the peak of cycle frequency spectral density |
| 8 | Entropy | -0.374 | Average value of sample entropy of diastolic state |
| 9 | Frequency spectrum of records | -0.272 | Ratio of spectrum magnitude sum in [90 120] Hz |
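The correlation analysis summarized above can be sketched as computing a correlation coefficient between each feature and the target label, then ranking features by its absolute value. The sketch assumes the Pearson coefficient and labels coded as -1 (normal) and 1 (abnormal); the function names and the dict-of-columns layout are hypothetical.

```python
import math


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no linear information
    return cov / math.sqrt(vx * vy)


def rank_features(feature_columns, labels):
    """Return feature names sorted by decreasing absolute CC with the labels."""
    scores = {name: pearson_cc(col, labels) for name, col in feature_columns.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

Taking the first k names of the ranking gives a "top k" feature set of the kind used in the classification experiments.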
Rank order of the nine domains based on contribution.
| Rank order | Feature domain (Total num. of features) | Num. of top 10 features | Num. of top 100 features | Num. of top 200 features | Num. of top 300 features |
|---|---|---|---|---|---|
| 1 | Frequency spectrum of state (308) | 4 | 39 | 115 | 183 |
| 2 | Energy (47) | 3 | 12 | 16 | 24 |
| 3 | Entropy (16) | 3 | 8 | 10 | 11 |
| 4 | Cepstrum (65) | 0 | 14 | 28 | 40 |
| 5 | Time interval (20) | 0 | 14 | 17 | 17 |
| 6 | Frequency spectrum of records (27) | 0 | 5 | 5 | 10 |
| 7 | High-order statistics (16) | 0 | 4 | 4 | 7 |
| 8 | Cyclostationarity (4) | 0 | 2 | 3 | 4 |
| 9 | State amplitude (12) | 0 | 2 | 2 | 4 |
Performance of signal quality classification.
| Percent data to train | Percent data to test | Estimation results for | | Estimation results for | | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| | | Clean | Noisy | Clean | Noisy | | |
| 10% | 90% | 2421 | 166 | 104 | 146 | 0.96 | 0.47 |
| 20% | 80% | 2143 | 154 | 75 | 149 | 0.97 | 0.49 |
| 30% | 70% | 1869 | 143 | 59 | 136 | 0.97 | 0.49 |
| 40% | 60% | 1601 | 125 | 44 | 122 | 0.97 | 0.49 |
| 50% | 50% | 1332 | 104 | 36 | 104 | 0.97 | 0.50 |
| 60% | 40% | 1064 | 84 | 28 | 84 | 0.97 | 0.50 |
| 70% | 30% | 799 | 63 | 19 | 64 | 0.97 | 0.50 |
| 80% | 20% | 530 | 44 | 12 | 45 | 0.98 | 0.50 |
| 90% | 10% | 265 | 21 | 6 | 22 | 0.98 | 0.50 |
Performance of the classification.
| Case | Percent of data to train | Percent of data to test | Repeat times | Training and test data division | Sensitivity | Specificity | Overall score |
|---|---|---|---|---|---|---|---|
| | 100% | 100% | 1 | No | 0.99 | 0.91 | 0.95 |
| | 10% | 90% | 200 | Yes | 0.68±0.06 | 0.87±0.03 | 0.77±0.02 |
| | 20% | 80% | 200 | Yes | 0.76±0.05 | 0.86±0.02 | 0.81±0.02 |
| | 30% | 70% | 200 | Yes | 0.80±0.04 | 0.87±0.02 | 0.83±0.02 |
| | 40% | 60% | 200 | Yes | 0.82±0.04 | 0.87±0.01 | 0.85±0.02 |
| | 50% | 50% | 200 | Yes | 0.84±0.03 | 0.87±0.01 | 0.85±0.01 |
| | 60% | 40% | 200 | Yes | 0.85±0.04 | 0.87±0.01 | 0.86±0.01 |
| | 70% | 30% | 200 | Yes | 0.86±0.04 | 0.87±0.01 | 0.87±0.02 |
| | 80% | 20% | 200 | Yes | 0.87±0.04 | 0.87±0.02 | 0.87±0.02 |
| | 90% | 10% | 200 | Yes | 0.88±0.04 | 0.87±0.02 | 0.88±0.02 |
Note: values are presented as mean±SD.
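The evaluation protocol in the table above (repeated random division into training and test data, with scores reported as mean±SD) can be sketched as below. The sketch uses the plain two-class definitions of sensitivity and specificity and takes the overall score as their average, which is consistent with the rows of the table (for example, 0.99 and 0.91 give 0.95). It ignores the "Uncertain" output and any weighting for noisy recordings, and all function names are hypothetical.

```python
import random
from statistics import mean, stdev


def sensitivity(tp, fn):
    """Fraction of abnormal recordings classified as abnormal."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of normal recordings classified as normal."""
    return tn / (tn + fp)


def overall_score(se, sp):
    """Assumed overall score: the average of sensitivity and specificity."""
    return (se + sp) / 2


def random_split(n_records, train_fraction, rng):
    """Randomly divide record indices into disjoint training and test sets."""
    indices = list(range(n_records))
    rng.shuffle(indices)
    n_train = round(n_records * train_fraction)
    return indices[:n_train], indices[n_train:]


def mean_sd(scores):
    """Format repeated-run scores as mean±SD with two decimals."""
    return f"{mean(scores):.2f}±{stdev(scores):.2f}"
```

Repeating `random_split` 200 times with a fresh classifier each time, and passing the resulting scores to `mean_sd`, would reproduce the reporting format of the table.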
Figure 5. Classification performance with respect to the number of top features and percent of data for training. (a) Overall scores obtained by top 1, top 2, top 3, top 4, and top 5 features. (b) Overall scores obtained by top 10, top 20, top 30, top 40, and top 50 features. (c) Overall scores obtained by top 100, top 200, top 300, top 400, and top 515 features.
Classification performance based on features in specified domain.
| Rank | Domain (# features) | Mean of overall score | Standard deviation |
|---|---|---|---|
| 1 | Frequency spectrum of state (308) | 0.85 | 0.021 |
| 2 | Entropy (16) | 0.82 | 0.028 |
| 3 | Energy (47) | 0.78 | 0.020 |
| 4 | Cepstrum (65) | 0.75 | 0.027 |
| 5 | High-order statistics (16) | 0.73 | 0.029 |
| 6 | Frequency spectrum of records (27) | 0.71 | 0.025 |
| 7 | Time interval (20) | 0.70 | 0.025 |
| 8 | Cyclostationarity (4) | 0.65 | 0.042 |
| 9 | State amplitude (12) | 0.61 | 0.025 |
Figure 6. The mean overall score with respect to rate of data for training and value of sigma. The diamond shows the peak position.
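To make the role of sigma concrete, the sketch below shows one common parametrization of the radial basis (Gaussian) kernel, k(x, y) = exp(-||x - y||² / (2σ²)). Whether the paper uses exactly this parametrization is an assumption, and the function name is hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """Gaussian kernel between two feature vectors of equal length."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))
```

A small sigma makes the kernel decay quickly with distance, so the decision boundary follows the training data closely, while a large sigma gives a smoother boundary. This trade-off is the usual reason for searching over sigma.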