Benjamin Filtjens, Pieter Ginis, Alice Nieuwboer, Muhammad Raheel Afzal, Joke Spildooren, Bart Vanrumste, Peter Slaets.
Abstract
BACKGROUND: Although deep neural networks (DNNs) are showing state-of-the-art performance in clinical gait analysis, they are considered to be black-box algorithms. In other words, there is no direct way to verify that a DNN identifies clinically relevant features, which hinders clinical acceptance. Interpretability methods have been developed to ameliorate this concern by providing a way to explain DNN predictions.
Keywords: Convolutional neural networks; Explainable artificial intelligence; Freezing of gait; Gait analysis; Parkinson’s disease
Year: 2021 PMID: 34876110 PMCID: PMC8650332 DOI: 10.1186/s12911-021-01699-0
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Subject characteristics of the fourteen healthy controls (controls), fourteen PD patients without FOG (non-freezers), and fourteen PD patients with FOG (freezers) in terms of mean ± SD as measured during the ON-phase of the medication cycle
| | Controls | Non-freezers | Freezers |
|---|---|---|---|
| Age (years) | 65.2 ± 6.8 | 66.7 ± 7.4 | 68.6 ± 7.4 |
| Disease duration (years) | | 7.8 ± 4.8 | 9.0 ± 4.8 |
| UPDRS III | | 34.4 ± 9.9 | 37.9 ± 14.0 |
| H&Y | | 2.4 ± 0.3 | 2.5 ± 0.5 |
Fig. 1 Visualization of the proposed methodology. The proposed methodology consists of two stages: (1) a convolutional neural network (CNN) to model the dramatic reduction of movement present before a freezing of gait (FOG) episode (Phase 2), and (2) layer-wise relevance propagation (LRP) to interpret the underlying features that the CNN perceives as important to model the pathology (Phase 3). The CNN was trained with the sagittal plane kinematics as recorded by a motion capture system (Phase 1). The figure illustrates the benefit of interpretation in a deep learning framework
Visual overview of the nested leave-one-subject-out cross-validation
For simplicity, the visualization is given for five subjects (S1–S5). The dashed lines denote that the visualization is limited to a single iteration of the outer loop, showing the tuning procedure for left-out test subject S1. For this single iteration of the outer loop, subject 1 (S1) is left out as a true holdout set. The remaining subjects (S2–S5) are utilized to optimize the network parameters in the inner loop. For each hyperparameter set, the inner loop computes the prediction accuracy by iteratively using each inner loop subject as a holdout validation set. The hyperparameter set that results in the highest average accuracy on the inner loop subjects is utilized to train a model on all subjects of the inner loop (S2–S5). This trained model is utilized to compute the metrics and explanations of the left-out test subject (S1). This process is repeated for all subjects
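The nested procedure described in this caption can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function `nested_loso`, the callback `fit_and_score`, and the toy hyperparameter `lr` are hypothetical names introduced here, and the callback stands in for training a model on the given subjects and returning its accuracy on the held-out subject.

```python
from statistics import mean

def nested_loso(subjects, hyperparameter_sets, fit_and_score):
    """Nested leave-one-subject-out cross-validation (illustrative sketch).

    fit_and_score(train_subjects, held_out_subject, params) is a hypothetical
    callback that trains on train_subjects and returns the accuracy on
    held_out_subject.
    """
    results = {}
    for test_subject in subjects:  # outer loop: one true holdout subject
        inner = [s for s in subjects if s != test_subject]

        def inner_accuracy(params):
            # Inner loop: each remaining subject serves once as validation set.
            return mean(
                fit_and_score([s for s in inner if s != val], val, params)
                for val in inner
            )

        # Keep the hyperparameter set with the highest mean inner accuracy.
        best = max(hyperparameter_sets, key=inner_accuracy)
        # Retrain on all inner subjects, then evaluate on the left-out subject.
        results[test_subject] = (best, fit_and_score(inner, test_subject, best))
    return results

# Toy stand-in for model training: the score depends only on the
# hyperparameters and the callback checks that no held-out subject leaks
# into the training set.
def toy_fit_and_score(train_subjects, held_out_subject, params):
    assert held_out_subject not in train_subjects
    return 1.0 - abs(params["lr"] - 0.01)

results = nested_loso(
    ["S1", "S2", "S3", "S4", "S5"],
    [{"lr": 0.1}, {"lr": 0.01}],
    toy_fit_and_score,
)
```

Because hyperparameters are selected using the inner subjects only, the score stored for each outer subject is never used for tuning, which is what makes that subject a true holdout.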
Results of the convolutional neural network (CNN) and support vector machine with linear kernel (LSVC)
| Subject Number | CNN | LSVC |
|---|---|---|
| 1* (FOG: 18, FGC:15) | 90.9 | 90.9 |
| 2* (FOG: 13, FGC:9) | 72.7 | 63.6 |
| 3* (FOG: 7, FGC:6) | 100 | 100 |
| 4* (FOG: 3, FGC:3) | 83.3 | 83.3 |
| 5* (FOG: 5, FGC:5) | 50.0 | 70.0 |
| 6* (FOG: 9, FGC:9) | 100 | 94.4 |
| 7* (FOG: 1, FGC:1) | 100 | 100 |
| 8 (FGC: 10) | 100 | 100 |
| 9 (FGC: 6) | 100 | 100 |
| 10 (FGC: 7) | 100 | 100 |
| 11 (FGC: 9) | 100 | 100 |
| 12 | 100 | 81.8 |
| 13 | 62.5 | 62.5 |
| 14 | 55.6 | 55.6 |
| Mean accuracy ± SD | 86.8 ± 18.7 | 85.9 ± 16.5 |
| Sensitivity | 82.1 | 85.7 |
| Specificity | 88.9 | 84.3 |
| PPV | 79.3 | 73.8 |
| NPV | 90.6 | 91.9 |
| Non-freezers (FGC: 2421) | 97.6 | 95.8 |
| Controls (FGC: 2258) | 99.9 | 99.9 |
| Mean accuracy ± SD | 98.7 ± 1.66 | 97.9 ± 2.89 |
All scores are given in terms of accuracy (%), assessing the performance of the DL models (and LSVC) on the fourteen freezers individually (Subject 1–14), with a summarized score for the 2421 and 2258 strides extracted from the fourteen non-freezers and fourteen healthy controls, respectively. For the fourteen freezers, the performance is additionally assessed in terms of the sensitivity (%), specificity (%), positive predictive value (PPV) (%), and negative predictive value (NPV) (%). The asterisk (*) is used to denote the seven freezers that froze during the protocol. The dagger (†) is used to denote the three freezers that froze off camera. The rounded brackets denote the number of extracted strides. For the fourteen freezers, the number of extracted FGCs was controlled for protocol and class imbalance, as explained in the procedure
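As a rough illustration of how the summary rows relate to the underlying numbers, the sketch below computes the four reported metrics from confusion-matrix counts and recomputes the mean ± SD of the per-subject CNN accuracies listed in the table. The helper `binary_metrics` and the counts passed to it are hypothetical (the record does not give the confusion matrix), FOG is assumed to be the positive class, and the use of the sample standard deviation is an assumption that happens to match the reported 86.8 ± 18.7.

```python
from statistics import mean, stdev

def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV in percent (FOG assumed positive)."""
    return {
        "sensitivity": 100 * tp / (tp + fn),  # detected FOG / all true FOG
        "specificity": 100 * tn / (tn + fp),  # detected FGC / all true FGC
        "ppv": 100 * tp / (tp + fp),          # true FOG among predicted FOG
        "npv": 100 * tn / (tn + fn),          # true FGC among predicted FGC
    }

# Hypothetical counts, for illustration only (not taken from the study).
example = binary_metrics(tp=8, fp=2, tn=18, fn=2)

# Per-subject CNN accuracies (%) copied from the table (subjects 1-14).
cnn_accuracy = [90.9, 72.7, 100, 83.3, 50.0, 100, 100,
                100, 100, 100, 100, 100, 62.5, 55.6]
cnn_mean = mean(cnn_accuracy)
cnn_sd = stdev(cnn_accuracy)  # sample SD (n - 1 in the denominator)
print(f"{cnn_mean:.1f} ± {cnn_sd:.1f}")  # prints "86.8 ± 18.7"
```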
Fig. 2 Mean and standard deviation of the hip, knee, and ankle joint trajectories in the sagittal plane for six of the seven freezers who experienced FOG during the protocol (a), with the excluded subject discussed separately (b), and the fourteen non-freezers and fourteen healthy control subjects (c). The joint trajectories are colorized with the relevance map (heatmap) using -LRP. To ensure an equal contribution, six strides (three pre-FOG and three FGC) are used from each freezer, with the exception of subject seven, who only froze once. For the non-freezers (NF) and healthy control (HC) subjects, all 2421 and 2258 strides were used. For the attribution plots of the freezers (a and b), the error clouds depict the standard deviations of the pre-FOG trajectories (gray) and FGC trajectories (green). For the attribution plots of the NF and HC (c), the error clouds depict the standard deviations of NF trajectories (green) and HC trajectories (gray). Positive relevance (red) indicates contribution to FOG, while negative relevance (blue) indicates contribution to FGC
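To give a feel for how a relevance map of this kind is produced, the sketch below propagates relevance backwards through a single bias-free dense layer. It is an illustration under stated assumptions, not the paper's pipeline: the name of the LRP variant is garbled in this record, so the widely used epsilon rule is chosen here purely as an example, and the function `lrp_epsilon_dense` and the toy numbers are hypothetical.

```python
def lrp_epsilon_dense(activations, weights, relevance_out, eps=1e-6):
    """Redistribute output relevance onto the inputs of one dense layer.

    weights[j][k] connects input j to output k. The epsilon rule is an
    assumption made for this sketch; no bias term is modelled.
    """
    n_in, n_out = len(activations), len(relevance_out)
    # Pre-activations z_k = sum_j a_j * w_jk of the forward pass.
    z = [
        sum(activations[j] * weights[j][k] for j in range(n_in))
        for k in range(n_out)
    ]
    relevance_in = [0.0] * n_in
    for k in range(n_out):
        # The stabiliser keeps the denominator away from zero.
        denom = z[k] + (eps if z[k] >= 0 else -eps)
        for j in range(n_in):
            # Input j receives a share proportional to its contribution a_j * w_jk.
            relevance_in[j] += activations[j] * weights[j][k] / denom * relevance_out[k]
    return relevance_in

# Toy example: all relevance starts on output 0 (standing in for the FOG class).
relevance = lrp_epsilon_dense(
    activations=[1.0, 2.0],
    weights=[[0.5, -1.0], [0.25, 1.0]],
    relevance_out=[1.0, 0.0],
)
```

In this toy case both inputs contribute equally to output 0, so each receives roughly half of the relevance, and the total is approximately conserved. In a trained network the same redistribution is applied layer by layer down to the input, which yields per-sample values like those used to colorize the joint trajectories.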