| Literature DB >> 36033642 |
Shruthi Suresh1, David T Newton2, Thomas H Everett3, Guang Lin4,5, Bradley S Duerstock1,6.
Abstract
Feature selection plays a crucial role in the development of machine learning algorithms. Understanding the impact of features on a model, along with their physiological relevance, can improve performance. This is particularly helpful in the healthcare domain, where disease states need to be identified from relatively small quantities of data. Autonomic Dysreflexia (AD) is one such example: mismanagement of this neurological condition can lead to severe consequences for individuals with spinal cord injuries. We explore different methods of feature selection needed to improve the performance of a machine learning model in detecting the onset of AD. We present the techniques used, as well as the ideal metrics, using a dataset of thirty-six features extracted from electrocardiograms, skin nerve activity, blood pressure, and temperature. The best performing algorithm was a 5-layer neural network with five relevant features, which achieved 93.4% accuracy in the detection of AD. The techniques in this paper can be applied to a myriad of healthcare datasets, allowing deeper exploration and improved machine learning model development. Through critical feature selection, it is possible to design better machine learning algorithms for the detection of niche disease states using smaller datasets.
Keywords: electrocardiography; feature selection; healthcare; machine learning; spinal cord injuries
Year: 2022 PMID: 36033642 PMCID: PMC9416695 DOI: 10.3389/fninf.2022.901428
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 3.739
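The workflow summarized in the abstract, selecting a handful of informative features and training a small neural network on them, can be sketched as below. This is a minimal, hypothetical illustration in Python/scikit-learn, not the authors' code: the mutual-information selector, the (32, 16, 8) hidden layers, and the synthetic data are all assumptions.

```python
# Minimal sketch (not the authors' code) of the workflow the abstract describes:
# select a small number of features, train a small neural network, and score it.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 36))    # placeholder for the 36 extracted features
y = rng.integers(0, 2, size=200)  # placeholder AD / non-AD labels

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(mutual_info_classif, k=5),  # keep the 5 most informative features (assumed selector)
    MLPClassifier(hidden_layer_sizes=(32, 16, 8), max_iter=2000, random_state=0),  # assumed layer sizes
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```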
FIGURE 1 (A) Schematic of the sensors: noninvasive electrodes placed on the ventral skin surface of a rat in Lead I configuration, the Coda® Blood Pressure system with occlusion and VPR cuff, and a temperature probe connected to an Arduino®. Rats were restrained in a (B) Lomir® “cuddle” jacket and (C) Plexiglass tube during data collection.
FIGURE 2 (A) Raw ECG data collected from rats and (B) processed ECG with high-frequency components removed and prominent R and S segments, allowing clear determination of individual beats of the ECG signal.
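A hedged sketch of the denoising step in Figure 2, assuming a zero-phase Butterworth low-pass filter; the sampling rate and 40 Hz cutoff are illustrative assumptions, not values reported in this record.

```python
# Hypothetical sketch of the ECG cleanup in Figure 2: suppress high-frequency
# noise so the R and S deflections of each beat stand out.
import numpy as np
from scipy.signal import butter, filtfilt

def denoise_ecg(ecg, fs=1000.0, high_cut=40.0, order=4):
    """Zero-phase low-pass filter removing components above `high_cut` Hz (assumed cutoff)."""
    b, a = butter(order, high_cut / (fs / 2.0), btype="low")
    return filtfilt(b, a, np.asarray(ecg, dtype=float))

# usage: cleaned = denoise_ecg(raw_ecg, fs=1000.0)
```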
FIGURE 3 (A) Raw skNA signal with QRS interference; (B) median-filtered skNA signal without QRS interference; (C) rectified and integrated skNA (iskNA); (D) mean baseline value of non-bursting events (pink dotted horizontal line) and burst activity during a sympathetic activation event (vertical dashed line), indicated by red dots.
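One plausible reading of the Figure 3 processing chain, written as a sketch: median filtering to suppress QRS interference, rectification, short-window integration to obtain iskNA, and a simple threshold above the non-bursting baseline to flag bursts. The kernel size, integration window, and threshold are assumptions, not the authors' parameters.

```python
# Hypothetical sketch of the skNA processing chain shown in Figure 3.
import numpy as np
from scipy.signal import medfilt

def integrate_skna(skna, fs=1000.0, kernel=101, window_s=0.1, k=3.0):
    """Return iskNA and a boolean burst mask; all parameters are assumed values."""
    cleaned = medfilt(np.asarray(skna, dtype=float), kernel_size=kernel)  # suppress QRS spikes
    rectified = np.abs(cleaned)
    win = max(1, int(window_s * fs))
    iskna = np.convolve(rectified, np.ones(win) / win, mode="same")       # short-window integration
    baseline = np.median(iskna)                                           # non-bursting level
    bursts = iskna > baseline + k * np.std(iskna)                         # candidate sympathetic bursts
    return iskna, bursts
```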
TABLE 1. Thirty-six features extracted from the different sensors.
| Signal | Features |
| ECG | |
| skNA | |
| Skin Temperature | 30. Δtemperature |
| Blood Pressure | 33. ΔSBP |
The top 5 selected features are indicated in bold.
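Table 1 includes change-type features such as Δtemperature and ΔSBP. Below is a small sketch of how such delta features might be computed, assuming Δ denotes the change relative to a pre-event baseline window; the baseline definition and window length are assumptions, not stated in this record.

```python
import numpy as np

def delta_feature(trace, fs, baseline_s=30.0):
    """Change of a slowly varying signal (e.g., skin temperature or SBP) relative to
    the mean of an initial baseline window; the 30 s window is an assumed value."""
    trace = np.asarray(trace, dtype=float)
    n_base = max(1, int(baseline_s * fs))
    return trace[-1] - trace[:n_base].mean()

# e.g., delta_temperature = delta_feature(temperature_trace, fs=1.0)
#       delta_sbp = delta_feature(sbp_trace, fs=1.0)
```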
FIGURE 4 QRS segments identified from each individual beat of the filtered ECG signal.
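A hypothetical sketch of the beat segmentation in Figure 4, using simple peak finding on the filtered ECG; the height and spacing parameters are assumptions tuned loosely to rat heart rates.

```python
import numpy as np
from scipy.signal import find_peaks

def qrs_segments(ecg, fs=1000.0):
    """Locate R peaks and the following S points in a filtered ECG; thresholds are assumptions."""
    ecg = np.asarray(ecg, dtype=float)
    # enforce a minimum R-R spacing of ~100 ms (rat heart rates are roughly 5-8 beats/s)
    r_peaks, _ = find_peaks(ecg, height=np.percentile(ecg, 95), distance=int(0.1 * fs))
    # take the local minimum shortly after each R peak as the S point
    s_points = np.array([r + np.argmin(ecg[r:r + int(0.05 * fs)]) for r in r_peaks[:-1]])
    return r_peaks, s_points
```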
FIGURE 5 Heatmap of the correlations among the thirty-six features (the x and y axes are the features listed in Table 1 above). Highly correlated features are removed and not considered in the development of the models.
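The screening step described in Figure 5 (dropping one member of each highly correlated feature pair) can be sketched as follows; the 0.9 correlation threshold is an assumption, as the record does not state the cutoff used.

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute correlation exceeds `threshold`."""
    corr = df.corr().abs()
    # keep only the upper triangle so each pair is examined once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```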
Representation of the confusion matrix for AD detection and metrics determination.
| | Predicted AD | Predicted non-AD |
| Actual AD | True positive (TP) | False negative (FN) |
| Actual non-AD | False positive (FP) | True negative (TN) |
Sensitivity: true positive rate; specificity: true negative rate.
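For reference, the metrics implied by the confusion matrix above, written out as plain Python:

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    accuracy = (tp + tn) / (tp + fn + fp + tn)  # fraction of all events classified correctly
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity}
```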
FIGURE 6 Bivariate plot showing the differences observed in the five features during AD and non-AD events. There is some overlap between the two classes, but also differences between the features that make them discernible. The y-axis shows the normalized units of each feature. The green boxes are features that represent sympathetic activity, while the red boxes are features that represent vagal activity.
Performance metrics for the different classifiers with the AD dataset.
| Name | Accuracy (%) | Sensitivity (%) | Specificity (%) | AUC-ROC |
| Neural network | 72.2 | 70.1 | 76.7 | 0.74 |
| Adaboost | 79.3 | 79.3 | 79.2 | 0.78 |
| Decision tree | 86.1 | 83.3 | 89.5 | 0.86 |
| Gaussian process | 91.7 | 88.9 | 94.4 | 0.92 |
| K Nearest neighbor | 86.5 | 83.3 | 89.5 | 0.86 |
| Linear SVM | 62.2 | 30.1 | 86.7 | 0.61 |
| Logistic regression | 87.4 | 84.3 | 82.5 | 0.87 |
| RBF SVM | 63.9 | 72.2 | 84.2 | 0.64 |
| Naïve Bayes | 88.9 | 94.4 | 83.3 | 0.89 |
| Random forest | 63.9 | 72.2 | 84.2 | 0.64 |
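A hedged sketch of a benchmarking loop over the classifier families listed above, using scikit-learn defaults; the hyperparameters and synthetic data are placeholders, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "Neural network": MLPClassifier(max_iter=2000),
    "Adaboost": AdaBoostClassifier(),
    "Decision tree": DecisionTreeClassifier(),
    "Gaussian process": GaussianProcessClassifier(),
    "K Nearest neighbor": KNeighborsClassifier(),
    "Linear SVM": SVC(kernel="linear", probability=True),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "RBF SVM": SVC(kernel="rbf", probability=True),
    "Naïve Bayes": GaussianNB(),
    "Random forest": RandomForestClassifier(),
}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))     # placeholder for the 5 selected features
y = rng.integers(0, 2, size=200)  # placeholder AD / non-AD labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    proba = clf.predict_proba(X_te)[:, 1]
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
          f"sens={recall_score(y_te, pred):.3f} "
          f"spec={recall_score(y_te, pred, pos_label=0):.3f} "
          f"auc={roc_auc_score(y_te, proba):.3f}")
```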