Evan Campbell, Angkoon Phinyomark, Erik Scheme.
Abstract
In pattern recognition, the selection of appropriate features is paramount to both the performance and the robustness of the system. Over-reliance on machine learning-based feature selection methods can, therefore, be problematic, especially when conducted using small snapshots of data. The results of these methods, if adopted without proper interpretation, can lead to sub-optimal system design or, worse, the abandonment of otherwise viable and important features. In this work, a deep exploration of pain-based emotion classification was conducted to better understand differences in the results of the related literature. In total, 155 different time domain and frequency domain features were explored, derived from electromyogram (EMG), skin conductance level (SCL), and electrocardiogram (ECG) readings taken from 85 subjects in response to heat-induced pain. To address the inconsistency in the optimal feature sets found in related works, an exhaustive and interpretable feature selection protocol was followed to obtain a generalizable feature set. Associations between features were then visualized using a topologically informed chart, called Mapper, of this physiological feature space, including synthesis and comparison of results from previous literature. This topological feature chart was able to identify key sources of information that led to the formation of five main functional feature groups: signal amplitude and power, frequency information, nonlinear complexity, unique, and connecting. These functional groupings were used to extract further insight into observable autonomic responses to pain through a complementary statistical interaction analysis. From this chart, it was observed that EMG and SCL derived features could functionally replace those obtained from ECG. These insights motivate future work on novel sensing modalities, feature design, deep learning approaches, and dimensionality reduction techniques.
Keywords: EMG; affective computing; emotion recognition; feature extraction; feature selection; heat pain; multimodal analysis; physiological signals
Year: 2019 PMID: 31133782 PMCID: PMC6513974 DOI: 10.3389/fnins.2019.00437
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
List of all features included in the exploration, in alphabetical order and theoretical groups.
| No. | Abbreviation | Feature type | Feature name | Reference |
|---|---|---|---|---|
| 1 | HOMAV1 | Amplitude | First Higher-Order Mean Absolute Value | Phinyomark et al., |
| 2 | HOMAV1n | Amplitude | Normalized 1st Higher-Order Mean Absolute Value | Phinyomark et al., |
| 3 | HOMAV2 | Amplitude | Second Higher-Order Mean Absolute Value | Phinyomark et al., |
| 4 | HOMAV2n | Amplitude | Normalized 2nd Higher-Order Mean Absolute Value | Phinyomark et al., |
| 5 | MAV | Amplitude | Mean Absolute Value | Phinyomark et al., |
| 6 | P2P | Amplitude | Peak to Peak Amplitude | Walter et al., |
| 7 | PK | Amplitude | Peak Amplitude | Walter et al., |
| 8 | RMS | Amplitude | Root Mean Square | Phinyomark et al., |
| 9 | TMNP | Amplitude | Mean Relative Time of the Peaks | Phinyomark and Scheme, |
| 10 | TMNV | Amplitude | Mean Relative Time of the Valleys | Phinyomark and Scheme, |
| 11 | IQR | Variability | Interquartile Range | Walter et al., |
| 12 | R | Variability | Range | Walter et al., |
| 13 | SD | Variability | Standard Deviation | Walter et al., |
| 14 | VAR | Variability | Variance | Phinyomark et al., |
| 15 | IDS | Stationarity | Integral Degree of Stationarity | Cao and Slobounov, |
| 16 | MD | Stationarity | Median | Walter et al., |
| 17 | MIDS | Stationarity | Modified Integral Degree of Stationarity | Cao and Slobounov, |
| 18 | MMNDS | Stationarity | Modified Mean Degree of Stationarity | Cao and Slobounov, |
| 19 | SDMN | Stationarity | Standard Deviation of Mean Vector | Walter et al., |
| 20 | SDSD | Stationarity | Standard Deviation of Standard Deviation Vector | Walter et al., |
| 21 | ApEn | Entropy | Approximate Entropy | Ferenets et al., |
| 22 | FuzzyEn | Entropy | Fuzzy Entropy | Al-sharhan et al., |
| 23 | SampEn | Entropy | Sample Entropy | Richman and Moorman, |
| 24 | ShannonEn | Entropy | Shannon Entropy | Ferenets et al., |
| 25 | SpectralEn | Entropy | Spectral Entropy | Ferenets et al., |
| 26 | LDF | Linearity | Lag Dependence Function | Walter et al., |
| 27 | PLDF | Linearity | Population Lag Dependence Function | Walter et al., |
| 28 | CC | Similarity | Correlation Coefficient | Kennedy, |
| 29 | MDCOH | Similarity | Median Coherence | Dukic et al., |
| 30 | MI | Similarity | Mutual Information | Chen et al., |
| 31 | MICOH | Similarity | Modified Integral of Coherence | Dukic et al., |
| 32 | MNCOH | Similarity | Mean Coherence | Dukic et al., |
| 33 | MMNCOH | Similarity | Modified Mean Coherence | Dukic et al., |
| 34 | BW | Frequency | Bandwidth | Walter et al., |
| 35 | CF | Frequency | Center Frequency | Walter et al., |
| 36 | MDF | Frequency | Median Frequency | Phinyomark et al., |
| 37 | MNF | Frequency | Mean Frequency | Phinyomark et al., |
| 38 | MOF | Frequency | Mode Frequency | Walter et al., |
| 39 | ZC | Frequency | Zero Crossings | Phinyomark et al., |
| 40 | MNRR | Variability | Mean Resting Rate | Shaffer and Ginsberg, |
| 41 | RMSSD | Variability | Root Mean Square Successive Interval Differences | Shaffer and Ginsberg, |
| 42 | slopeRR | Variability | Slope Resting Rate | Shaffer and Ginsberg, |
Feature abbreviations are included along with theoretical feature types and an accompanying article with mathematical definition.
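Several of the amplitude, variability, and frequency features listed above have widely used textbook definitions. The following is a minimal sketch of a few of them (MAV, RMS, P2P, VAR, SD, ZC) on a single signal window. The exact formulations used in the cited articles may differ; the `threshold` parameter of `zero_crossings` and the sample window are illustrative assumptions, not values from the study.

```python
import math
import statistics


def mean_absolute_value(x):
    """MAV: mean of the absolute sample values."""
    return sum(abs(v) for v in x) / len(x)


def root_mean_square(x):
    """RMS: square root of the mean squared sample value."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def peak_to_peak(x):
    """P2P: difference between the largest and smallest sample."""
    return max(x) - min(x)


def zero_crossings(x, threshold=0.0):
    """ZC: count strict sign changes between consecutive samples.

    `threshold` is a hypothetical dead-band that ignores small,
    noise-induced crossings. Samples that are exactly zero are not
    counted by this simple rule.
    """
    count = 0
    for a, b in zip(x, x[1:]):
        if a * b < 0 and abs(a - b) > threshold:
            count += 1
    return count


def time_domain_features(x):
    """Collect the sketched features for one window of samples."""
    return {
        "MAV": mean_absolute_value(x),
        "RMS": root_mean_square(x),
        "P2P": peak_to_peak(x),
        "VAR": statistics.variance(x),  # sample variance (n - 1 denominator)
        "SD": statistics.stdev(x),      # sample standard deviation
        "ZC": zero_crossings(x),
    }


# Made-up window used only to exercise the functions.
window = [0.5, -1.0, 1.5, -0.5, 0.0, 1.0, -2.0, 0.5]
features = time_domain_features(window)
```

In practice each feature would be computed per modality and per window, which is how one feature definition yields several entries in the feature space.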
Figure 1. Illustration of one-hundred-epoch hold-out-and-k-fold cross-validation scheme used for machine learning-based feature selection approaches.
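One plausible reading of the scheme named in the Figure 1 caption is sketched below: in each of 100 epochs, subjects are shuffled, a hold-out test set is set aside, and the remaining subjects are divided into k folds for feature selection. The hold-out fraction, the value of k, the random seed, and the subject-wise splitting are all assumptions made for illustration; the caption does not specify them.

```python
import random


def holdout_kfold_splits(n_subjects, n_epochs=100, holdout_fraction=0.2, k=5, seed=0):
    """Yield (epoch, fold, train_ids, validation_ids, test_ids) tuples.

    holdout_fraction, k, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_subjects * holdout_fraction))
    for epoch in range(n_epochs):
        ids = list(range(n_subjects))
        rng.shuffle(ids)
        # Hold-out: the test subjects are never seen during selection.
        test_ids, dev_ids = ids[:n_test], ids[n_test:]
        for fold in range(k):
            # Every k-th development subject forms the validation fold.
            validation_ids = dev_ids[fold::k]
            held = set(validation_ids)
            train_ids = [s for s in dev_ids if s not in held]
            yield epoch, fold, train_ids, validation_ids, test_ids


# 85 subjects, as reported in the abstract.
splits = list(holdout_kfold_splits(n_subjects=85))
```

Splitting by subject rather than by sample keeps all recordings of one person on the same side of each split, which is the usual way to estimate performance on unseen subjects.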
Figure 4. Topological feature chart. Node expansion highlighting key features selected within the SFS protocols for all classification tasks (FSa-FSd) and feature sets identified through other works of literature (FS1-FS4). Features belonging to defined feature sets were identified through superscripts (FSa: , FSb: , FSc: , FSd: , FS1: 1, FS2: 2, FS3: 3, FS4: 4). Node composition by modality is shown by pie charts. Node functional group is denoted by bold acronym (SAP, Signal Amplitude and Power; NLC, Nonlinear Complexity; FI, Frequency Information; UNI, Unique; CON, Connecting).
Figure 3. Topological network rendered by the Mapper algorithm using k-NN distance as the filter function.
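The Mapper construction mentioned in the Figure 3 caption starts by assigning each point a filter value and then covering the filter range with overlapping intervals. The sketch below shows only these first two steps, assuming the filter is the Euclidean distance to the k-th nearest neighbor. The value of `k`, the number of intervals, the overlap, and the toy points are illustrative assumptions; the clustering step that turns each interval into nodes, and the linking of nodes that share points, are omitted.

```python
import math


def knn_distance_filter(points, k=3):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbor."""
    values = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        values.append(distances[k - 1])
    return values


def interval_cover(values, n_intervals=3, overlap=0.25):
    """Split the filter range into overlapping intervals.

    Returns, for each interval, the indices of the points whose filter
    value falls inside it. `overlap` is the fraction of one interval
    width added on each side.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap
    cover = []
    for b in range(n_intervals):
        start = lo + b * width - pad
        stop = lo + (b + 1) * width + pad
        cover.append([i for i, v in enumerate(values) if start <= v <= stop])
    return cover


# Toy data: four tightly packed points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
filter_values = knn_distance_filter(points, k=2)
cover = interval_cover(filter_values)
```

With a k-NN distance filter, points in dense regions receive small values and isolated points receive large ones, so the cover separates crowded parts of the space from sparse ones.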
Classification performance of features using UFS.
| 1 | cCC | 547.52 | 72.31 ± 1.27 | cCC | 536.47 | 49.64 ± 0.87 | cCC | 444.48 | 28.81 ± 0.65 | |||
| 2 | tCC | 484.56 | 78.37 ± 1.15 | cP2P | 702.41 | 72.89 ± 1.12 | tCC | 383.7 | 32.19 ± 0.68 | |||
| 3 | cPK | 650.11 | 72.72 ± 1.14 | cP2P | 517.96 | 63.22 ± 1.13 | ||||||
| 4 | tMICOH | 155.25 | 79.27 ± 1.19 | cCC | 640.95 | 84.43 ± 0.96 | cPK | 479.09 | 63.08 ± 1.16 | cP2P | 294.22 | 40.62 ± 0.69 |
| 5 | tMNCOH | 155.25 | 79.28 ± 1.19 | cSD | 617.44 | 84.44 ± 0.89 | tCC | 456.92 | 67.04 ± 1.02 | cPK | 278.13 | 40.68 ± 0.76 |
| 6 | cMNCOH | 113.4 | 79.21 ± 1.18 | cRMS | 617.15 | 84.45 ± 0.89 | cSD | 445.75 | 66.91 ± 0.96 | cSD | 266.09 | 40.52 ± 0.79 |
| 7 | cMICOH | 112.78 | 79.22 ± 1.18 | cSDSD | 609.82 | 84.47 ± 0.9 | cRMS | 445.32 | 66.91 ± 0.96 | cRMS | 265.88 | 40.51 ± 0.8 |
| 8 | zMICOH | 49.2 | 79.18 ± 1.23 | cMAV | 555.3 | 84.73 ± 0.95 | cSDSD | 445.18 | 66.9 ± 0.98 | cSDSD | 261.74 | 40.3 ± 0.82 |
| 9 | zMNCOH | 49 | 79.21 ± 1.2 | cTMNV | 554.45 | 84.71 ± 0.94 | cMAV | 401.88 | 66.83 ± 0.98 | cMAV | 245.97 | 40.47 ± 0.8 |
| 10 | sMICOH | 13.29 | 79.11 ± 1.17 | cTMNP | 545.99 | 84.6 ± 0.94 | cTMNV | 400.05 | 66.73 ± 0.96 | cTMNV | 239.48 | 40.49 ± 0.78 |
| 36 | sR | 236.66 | 90.6 ± 0.85 | zZC | 162.89 | 70.22 ± 0.96 | sR | 92.06 | 43.2 ± 0.8 | |||
| 44 | cBW | 3.15 | 80.2 ± 1.2 | cMNCOH | 113.11 | 70.64 ± 1.01 | sSD | 68.58 | 43.1 ± 0.84 | |||
| 47 | zR | 2.89 | 80.25 ± 1.2 | cMDF | 151.55 | 90.82 ± 0.78 | hRMSSD | 95.12 | 70.52 ± 1 | |||
| 49 | cFuzzyEn | 2.81 | 80.3 ± 1.18 | cMNF | 149.31 | 90.91 ± 0.73 | sMICOH | 47.89 | 43.27 ± 0.78 | |||
i denotes the rank of the feature's F-value and the SFS iteration at which the feature was included in the model. Feature abbreviation and classification accuracy (mean ± SD) are displayed for each of the four classification problems. Local maxima are indicated by bold entries in the upper part of the table. Iterations are abridged in the lower part of the table, where the global maximum for each classification task is indicated by a bold entry.
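Ranking features by F-value, as in the table above, can be illustrated with a one-way ANOVA F-statistic computed per feature. This is a minimal sketch that assumes the F-value is the standard one-way ANOVA statistic across class labels; the feature names and values are made up for illustration and are not data from the study.

```python
def anova_f_value(values, labels):
    """One-way ANOVA F-statistic of one feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-class sum of squares, weighted by class size.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
    # Within-class sum of squares around each class mean.
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    return (between / (k - 1)) / (within / (n - k))


def rank_features(feature_table, labels):
    """Return (name, F-value) pairs sorted from highest to lowest F-value."""
    scores = {name: anova_f_value(col, labels) for name, col in feature_table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical two-class toy data.
labels = [0, 0, 0, 1, 1, 1]
feature_table = {
    "feature_a": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],  # classes well separated
    "feature_b": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],  # classes overlap heavily
}
ranked = rank_features(feature_table, labels)
```

A univariate ranking of this kind scores each feature in isolation, so highly correlated features can occupy adjacent ranks while adding little accuracy when combined.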
Classification performance of the three most frequently selected features for each SFS iteration.
| 1 | 1 | cCC | 100 | 72.3 | cCC | 100 | 73.7 | cCC | 92 | 48.6 | cP2P | 61 | 29.9 |
| 2 | – | – | – | – | – | – | cP2P | 5 | 48.4 | cR | 13 | 29.9 | |
| 3 | – | – | – | – | – | – | cR | 3 | 48.3 | cCC | 13 | 29.7 | |
| E | – | – | – | – | – | – | – | – | – | Other | 13 | 29.5-29.8 | |
| 2 | 1 | tCC | 100 | 78.4 | cRMS | 47 | 83.8 | cPK | 61 | 62.7 | cCC | 87 | 38.4 |
| 2 | – | – | – | cP2P | 19 | 84.0 | cR | 19 | 62.9 | cPK | 7 | 35.5 | |
| 3 | – | – | – | cSD | 16 | 84.0 | cP2P | 12 | 63.0 | cP2P | 4 | 33.3 | |
| E | – | – | – | E | 18 | 83.4 | cCC | 8 | 61.6 | cR | 2 | 35.4 | |
| 3 | 1 | zCC | 84 | 79.3 | tCC | 100 | 86.2 | tCC | 100 | 66.6 | tCC | 100 | 40.6 |
| 2 | cHOMAV1 | 4 | 79.0 | – | – | – | – | – | – | – | – | – | |
| 3 | sSDSD | 2 | 78.8 | – | – | – | – | – | – | – | – | – | |
| E | Other | 10 | 77.4-78.7 | – | – | – | – | – | – | – | – | – | |
| 4 | 1 | cHOMAV2n | 17 | 79.8 | tMAV | 28 | 87.8 | sSDSD | 91 | 68.1 | |||
| 2 | cShannonEn | 10 | 79.8 | sSDSD | 17 | 87.5 | hslopeRR | 5 | 67.7 | sApEn | 12 | 41.2 | |
| 3 | cHOMAV1n | 10 | 79.9 | tRMS | 16 | 87.3 | sP2P | 2 | 67.5 | cShannonEn | 7 | 41.1 | |
| E | Other | 63 | 78.5-79.2 | Other | 39 | 87.3-88.1 | – | – | – | Other | 33 | 40.6-41.1 | |
| 5 | 1 | sSDSD | 32 | 89.0 | hslopeRR | 72 | 69.0 | hslopeRR | 28 | 41.5 | |||
| 2 | zPK | 6 | 80.1 | zCC | 16 | 88.7 | sSDSD | 6 | 68.9 | sSDSD | 17 | 41.8 | |
| 3 | sMICOH | 5 | 80.0 | cApEn | 7 | 88.4 | tRMS | 4 | 68.9 | sApEn | 8 | 41.9 | |
| E | Other | 82 | 78.7-79.8 | Other | 45 | 87.6-89.2 | Other | 8 | 68.0-68.1 | Other | 47 | 41.2-42.1 | |
| 6 | 1 | sMICOH | 6 | 80.4 | zCC | 55 | 89.4 | zCC | 61 | 69.5 | tCC | 20 | 42.0 |
| 2 | zShannonEn | 5 | 80.4 | cApEn | 4 | 89.3 | cShannonEn | 15 | 69.5 | zIQR | 11 | 41.9 | |
| 3 | cPK | 4 | 80.3 | sSDSD | 3 | 89.2 | tShannonEn | 7 | 69.4 | hslopeRR | 7 | 42.0 | |
| E | Other | 85 | 78.8-80.2 | Other | 38 | 87.7-89.9 | Other | 17 | 68.2-68.8 | Other | 62 | 41.4-43.0 | |
| 7 | 1 | sMICOH | 4 | 80.6 | zIQR | 21 | 89.9 | zCC | 17 | 69.9 | zCC | 20 | 42.5 |
| 2 | cIQR | 4 | 80.4 | zCC | 12 | 89.8 | cShannonEn | 10 | 70.0 | cShannonEn | 10 | 42.4 | |
| 3 | zMAV | 4 | 80.3 | zVAR | 10 | 89.8 | cSampEnz | 7 | 70.0 | zIQR | 9 | 42.4 | |
| E | Other | 88 | 79.0-80.2 | Other | 57 | 87.7-90.2 | Other | 66 | 68.3-69.5 | Other | 61 | 41.8-43.2 | |
| 8 | 1 | zPLDF | 5 | 80.0 | zCC | 18 | 43.0 | ||||||
| 2 | sIDS | 4 | 80.0 | tMIDS | 4 | 90.2 | zIQR | 7 | 70.3 | zIQR | 7 | 42.9 | |
| 3 | cShannonEn | 3 | 80.1 | zMIDS | 4 | 90.1 | cFuzzyEn | 6 | 70.3 | sVAR | 6 | 42.9 | |
| E | Other | 88 | 79.2-80.6 | Other | 86 | 88.5-90.7 | Other | 79 | 68.6-69.6 | Other | 69 | 41.7-43.4 | |
| Classification task | i | Accuracy (%) |
|---|---|---|
| Pain threshold | 38 | 80.8 ± 1.1 |
| Pain tolerance | 78 | 90.9 ± 0.9 |
| Three-class | 65 | 71.1 ± 1.1 |
| Five-class | 73 | 43.3 ± 0.8 |
i denotes the iteration of the feature selection protocol. R denotes the rank of the feature determined through majority vote. Feature abbreviation, the number of CVs in which the feature was selected (Votes), and classification accuracy (Accuracy) are displayed for each of the four classification problems. – denotes that the top 3 features or 100 votes were already allocated. Where the top 3 features did not include all CV votes, remaining results are summarized with accuracy ranges as “Other”. Bold entries indicate that a local maximum was reached for the classification problem. Global maxima determined for all classification tasks using the SFS protocol are abridged in the bottom table. The number of features at each global maximum is given as i, with the corresponding mean and standard deviation of accuracy across CVs given in percent.
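The majority-vote summary described in this note can be sketched as a simple tally: for one SFS iteration, count how often each feature was chosen across the 100 CVs, keep the top three, and pool the rest as "Other". The vote counts for cRMS, cP2P, and cSD below are taken from iteration 2 of the pain tolerance column; how the remaining 18 votes divide among other features is not given in the table, so the `other_a` and `other_b` names and their counts are hypothetical placeholders.

```python
from collections import Counter


def summarize_votes(selected_per_cv, top_n=3):
    """Return the top_n (feature, votes) pairs and the pooled remaining votes.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    counts = Counter(selected_per_cv)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    top = ranked[:top_n]
    other_votes = sum(votes for _, votes in ranked[top_n:])
    return top, other_votes


# One selected feature per CV, 100 CVs in total.
selected = (
    ["cRMS"] * 47
    + ["cP2P"] * 19
    + ["cSD"] * 16
    + ["other_a"] * 10  # hypothetical split of the 18 "Other" votes
    + ["other_b"] * 8
)
top_features, other_votes = summarize_votes(selected)
```

A widely spread vote at a given iteration suggests that several features are close to interchangeable at that step, which is the kind of instability a single selection run would hide.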
Figure 2. Accuracy of feature sets across all 100 CVs determined by SFS (FSa-FSd) as compared to those previously identified in the literature (FS1-FS4) (y-axis on the left side). Error bars are representative of standard deviation across all CVs. Circles indicate the number of features within each feature set (y-axis on the right side).
Relationships between heat-pain intensity and features derived from zEMG, cEMG, tEMG, SCL, and ECG.
| Feature | zEMG | cEMG | tEMG | SCL | ECG |
|---|---|---|---|---|---|
| PK | ↑↑ | ↑↑ | ↑ | ↑ | na |
| P2P | ↑↑ | ↑↑ | ↑ | ↑↑ | na |
| RMS | ↑↑ | ↑ | ↑ | ↑ | na |
| TMNP | ↑↑ | − | ↑ | ↑ | na |
| TMNV | ↓↓ | − | ↓ | ↑ | na |
| MAV | ↑↑ | ↑ | ↑ | ↑ | na |
| HOMAV1 | ↑↑ | ↑ | ↑ | ↑ | na |
| HOMAV1n | ↑ | ↑↑ | ↑ | ↓ | na |
| HOMAV2 | ↑↑ | ↑ | ↑ | − | na |
| HOMAV2n | ↑ | ↑↑ | ↑ | ↓ | na |
| VAR | ↑ | − | ↑ | ↑ | na |
| SD | ↑↑ | ↑ | ↑ | ↑↑ | na |
| R | ↑↑ | ↑↑ | ↑ | ↑↑ | na |
| IQR | ↑ | − | ↑ | ↑↑ | na |
| MD | − | ↑ | − | − | na |
| MMNDS | ↑ | ↑ | − | − | na |
| IDS | ↓ | − | − | − | na |
| MIDS | − | − | − | − | na |
| SDMN | ↑ | ↑ | − | ↑↑ | na |
| SDSD | ↑↑ | ↑ | ↑ | ↑↑ | na |
| ApEn | ↑ | ↑ | ↑ | ↓ | na |
| FuzzyEn | − | − | − | ↓ | na |
| SampEn | − | − | − | ↓ | na |
| ShannonEn | ↑↑ | ↑↑ | ↑ | ↑ | na |
| SpectralEn | ↑ | − | − | − | na |
| PLDF | − | − | − | − | na |
| LDF | ↓ | ↓ | − | − | na |
| MDCOH | − | − | − | − | na |
| MNCOH | − | ↓ | ↓ | − | na |
| MMNCOH | − | − | − | − | na |
| MICOH | − | ↓ | ↓ | − | na |
| CC | ↓↓ | ↓ | ↓ | ↓ | na |
| MI | ↑↑ | ↑↑ | ↑ | ↑ | na |
| MNF | ↑ | ↑ | − | − | na |
| MDF | ↑ | ↑ | − | − | na |
| ZC | ↑↑ | ↑ | ↑ | na | na |
| MOF | ↑ | ↑ | − | na | na |
| BW | − | − | − | na | na |
| CF | − | − | − | na | na |
| MNRR | na | na | na | na | ↓ |
| RMSSD | na | na | na | na | ↓ |
| slopeRR | na | na | na | na | ↓ |
| ↓ | μ | ||||
| ↓↓ | 0.5 < | μ | |||
| ↓ | 0.2 < | μ | |||
| - | na | ||||
| ↑ | 0.2 < | μ | |||
| ↑↑ | 0.5 < | μ | |||
| ↑ | μ | ||||
Relationships were quantized into seven levels dependent on their statistical and substantive significance (as shown in the right sub-table).