Tal Shany, Kejia Wang, Ying Liu, Nigel H Lovell, Stephen J Redmond.
Abstract
The field of fall risk testing using wearable sensors is bustling with activity. In this Letter, the authors review publications that incorporated features extracted from sensor signals into statistical models intended to estimate fall risk or predict falls in older people. A review of these studies raises concerns that this body of literature presents over-optimistic results in light of small sample sizes, questionable modelling decisions and problematic validation methodologies (e.g. inherent problems with the overly popular cross-validation technique, and a lack of external validation). There seem to be substantial issues in the feature selection process, whereby researchers select features based on their relation to the target before modelling begins, and then either perform no validation or test the models on the same data used for their training. This, together with potential issues related to the large number of features and their correlations, inevitably leads to models with inflated accuracy that are unlikely to maintain their reported performance during everyday use in relevant populations. Indeed, the availability of rich sensor data and many analytical options provides intellectual and creative freedom for researchers, but it should be treated with caution, and such pitfalls must be avoided if we desire to create generalisable prognostic tools of any clinical value.
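The selection bias described above can be demonstrated with a small simulation. The following Python sketch is illustrative only and is not the procedure of any study reviewed here: it assumes a hypothetical cohort in which every feature is pure noise (so the true accuracy is 50%), ranks features by the absolute difference between class means, and classifies subjects with a simple nearest-centroid rule. All names (`make_noise_data`, `select_features`, `loo_accuracy`, `run_demo` and so on), as well as the cohort size, feature count and `k`, are hypothetical choices. It compares three accuracy estimates: resubstitution, leave-one-out CV with features selected once using all data, and leave-one-out CV with features re-selected from the training subjects of each fold.

```python
import random
import statistics

# Hypothetical illustration of selection bias; not the method of any reviewed study.


def make_noise_data(n_subjects, n_features, seed):
    """Hypothetical cohort: every feature is pure noise, so nothing can truly beat chance."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_subjects)]
    y = [i % 2 for i in range(n_subjects)]  # balanced faller (1) / non-faller (0) labels
    rng.shuffle(y)  # labels are assigned independently of X
    return X, y


def select_features(X, y, rows, k):
    """Keep the k features whose class means differ most, using only the given rows."""
    scores = []
    for j in range(len(X[0])):
        m1 = statistics.fmean([X[i][j] for i in rows if y[i] == 1])
        m0 = statistics.fmean([X[i][j] for i in rows if y[i] == 0])
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def fit_centroids(X, y, rows, feats):
    """Nearest-centroid 'model': the per-class mean of each selected feature."""
    return {
        c: [statistics.fmean([X[i][j] for i in rows if y[i] == c]) for j in feats]
        for c in (0, 1)
    }


def predict(x, centroids, feats):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((x[j] - m) ** 2 for j, m in zip(feats, centroids[c]))
    return min((0, 1), key=dist)


def resubstitution_accuracy(X, y, k):
    """Select features, fit and test on the same subjects (no validation at all)."""
    rows = list(range(len(y)))
    feats = select_features(X, y, rows, k)
    cents = fit_centroids(X, y, rows, feats)
    return sum(predict(X[i], cents, feats) == y[i] for i in rows) / len(rows)


def loo_accuracy(X, y, k, select_inside_loop):
    """Leave-one-out CV; features chosen inside each fold, or once using all data."""
    everyone = list(range(len(y)))
    leaky_feats = None if select_inside_loop else select_features(X, y, everyone, k)
    hits = 0
    for test in everyone:
        train = [i for i in everyone if i != test]
        feats = select_features(X, y, train, k) if select_inside_loop else leaky_feats
        cents = fit_centroids(X, y, train, feats)  # the model itself never sees `test`
        hits += predict(X[test], cents, feats) == y[test]
    return hits / len(everyone)


def run_demo(seed=1, n_subjects=60, n_features=1000, k=20):
    X, y = make_noise_data(n_subjects, n_features, seed)
    return (
        resubstitution_accuracy(X, y, k),
        loo_accuracy(X, y, k, select_inside_loop=False),
        loo_accuracy(X, y, k, select_inside_loop=True),
    )


if __name__ == "__main__":
    resub, leaky, proper = run_demo()
    print(f"resubstitution:                  {resub:.2f}")
    print(f"LOO, selection using all data:   {leaky:.2f}")
    print(f"LOO, selection inside each fold: {proper:.2f}")
```

Because the labels carry no signal, only the last estimate is trustworthy; on data like these the first two would be expected to sit well above chance while the third stays near (often slightly below) 50%. Exact values depend on the random seed, so none are quoted here. The same reasoning applies to any other data-driven preprocessing step, such as scaling or correlation filtering: it should be fitted on the training subjects of each fold only.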
Keywords: biomechanics; body sensor networks; cross-validation technique; fall prediction; fall risk testing; feature extraction; feature selection; geriatrics; medical signal processing; older adults; patient monitoring; prognostic tools; review; reviews; sensor-based models; statistical models; telemedicine; wearable sensors
Year: 2015 PMID: 26609411 PMCID: PMC4611882 DOI: 10.1049/htl.2015.0019
Source DB: PubMed Journal: Healthc Technol Lett ISSN: 2053-3713
SFRT modelling studies, sorted by year of publication
| A: Author, year | B: Final sample size | C: Model standard | D: Specific model targets | E: Features (sensor features) | F: Potential violation of best modelling practice | G: Modelling | H: Maximum accuracy for entire cohort | I: Validation |
|---|---|---|---|---|---|---|---|---|
| Maki, 1997 [ | 75 | PF (1y) | no falls: 32; 1+ falls: 43 (74 falls) | 16 (10) | | binary logistic regression, forward stepwise | 73% | leave-one-out CV |
| Hausdorff, 2001 [ | 52 | PF (1y) | no falls: 32; 1+ falls: 20 | ? (4) | used only features that differed significantly, based on all data, between fallers/non-fallers | binary logistic regression per feature | | resubstitution |
| Laessoe, 2007 [ | 94 | PF (1y) | no falls: 80; 1+ falls: 14 | 9 (2) | | scores rounded to 0–10; binary logistic regression per feature and for the test battery, backward stepwise | 44% | resubstitution |
| Giansanti, 2008 [ | 60 for training; 200 for testing | POMA [ | training only: low risk: 30; high risk: 30 | (2) | multiple comparisons of classifiers across a series of papers using the same dataset; very high accuracies imply test data possibly used in model selection | neural network based on the Mahalanobis distance; other model types tested had reduced success | 97% | stratified holdout |
| Kojima, 2008 [ | 153 with FH and fast-walk data | FH (1y) | 0/1 fall: 131; 2+ falls: 22 | 7 (2) | selected only two features (one sensor-based) that differed significantly, using all data, between fallers/non-fallers and had a low correlation with each other | canonical discriminant analysis | 62% | resubstitution |
| Gietzelt, 2009 [ | 110 | STRATIFY [ | no risk: 20; fall risk: 90 | (4) | authors noted when modelling STRATIFY: ‘elimination of all redundant gait parameters’ (no further explanation was provided) | decision tree; curve matching; least median squared linear regression | 90.5% | ten-fold or leave-one-out CV |
| Marschollek, 2009 [ | 110 | PF (while in hospital) | no falls: 84; 1+ falls: 26 | 27 (6) | | decision tree | 90% | resubstitution |
| Greene, 2010 [ | 349 | FH (5y) | no falls: 142; 1+ falls: 207 | (44) | performed multiple comparison testing using all data before model selection and retained only significant features | binary logistic regression (three models: males, females <75 years old, females ≥75 years old) | 77% (mean for the three models) | stratified holdout (80:20 split, ×10) |
| Narayanan, 2010 [ | 68 | | | 54 (51) | feature selection performed outside of CV loop | linear least squares regression, sequential forward floating feature search | | leave-one-out CV |
| Liu, 2011 [ | | PPA [ | N/A | 126 (123) | | | | |
| Liu, 2011 [ | 68 | FH (1y) | 0/1 fall: 59; 2+ falls: 9 (36 falls) | 126 (123) | feature selection performed outside of CV loop | linear discriminant classifier, sequential forward floating feature search | 97% (71% for fall counts) | leave-one-out CV |
| Paterson, 2011 [ | 97 women | PF (1y) | no falls: 43; 1 fall: 25; 2+ falls: 29 | 16? (2?) | removed two non-sensor features that violated multicollinearity | binary logistic regression | 67% for fallers/non-fallers | resubstitution |
| Bautmans, 2011 [ | 81 | FH (6m), TUG [ | old controls (no risk): 41; fall risk: 40 | (6) | | binary logistic regression, stepwise (forward likelihood ratio) | 78% | resubstitution |
| Weiss, 2011 [ | 41 | FH (1y) | 0/1 fall: 18; 2+ falls: 23 | 30? (27) | | binary logistic regression, forward stepwise | 88% | resubstitution |
| Marschollek, 2011 [ | 46 | PF (1y, one phone call at end of 1y) | no falls: 27; 1+ falls: 19 | 15 (14) | wrapper feature selection, using all data, was used to select significant features prior to modelling; receiver operating characteristic analysis likely performed post hoc using all data | decision tree with regression; binary logistic regression | 80% | ten-fold CV (×10) |
| Caby, 2011 [ | 20 | FH (1y) and additional clinical input | no risk: 5; fall risk: 15 | (67) | various feature pools and algorithms investigated outside CV loop; pre-selection of features performed using Holm-corrected multiple comparisons; unclear if feature selection done using training data only during CV | neural network; support vector machine | 100% | leave-one-out CV |
| Senden, 2012 [ | 100 | POMA | low risk: 50; high risk: 50 | 13? (9) | removed two sensor features with low correlation with the target | linear regression | | resubstitution |
| Fuke, 2012 [ | 17 | BBS [ | no risk: 8; some risk: 4; high risk: 5 | (4) | data were scaled using all data prior to modelling | support vector machine | | leave-one-out CV |
| Greene, 2012 [ | 120 | FH (5y) | no falls: 55; 1+ (serious) falls: 65 | (44) | single final list of features for CV (×10) implies feature selection was performed outside of CV loop; multiple feature subsets explored outside of main CV loop; confidence intervals reported when CV results not i.i.d. | support vector machine using default values for the hyperparameters, sequential forward feature selection | 72% | ten-fold CV (×10) |
| Greene, 2012 [ | 226 | PF (∼2y, one phone call at end of 2y) | no falls: 143; 1+ falls: 83 (144 falls) | (44) | single final list of features for CV (×10) implies feature selection was performed outside of CV loop; confidence intervals reported when CV results not i.i.d. | regularised discriminant classifier models (for males, females <75 years old or ≥75 years old), sequential forward feature selection | 80% (mean for the three models) | ten-fold CV (×10) |
| Schwesig, 2012 [ | 141 | PF (1y, nursing home records) | 0–2 falls: 124; 3+ falls: 17 (171 falls) | 12? (6) | confidence intervals appear to be reported using the same test data as used for model fitting | binary logistic regression, backward stepwise | | resubstitution |
| Doheny, 2013 [ | 39 | FH (5y) | no falls: 20; 1+ (serious) falls: 19 | (70) | removed features with low reliability (≤0.7) after test–retest using all data; feature selection performed outside of CV loop; confidence intervals reported when CV results not i.i.d. | binary logistic regression, sequential forward feature selection | 74% | leave-one-out CV |
| Riva, 2013 [ | 131 | FH (1y) | no falls: 89; 1+ falls: 42 | 27+ (24) | factor analysis performed using all data to identify salient sensor features prior to multivariate model selection | binary logistic regression (univariate and multivariate), forward stepwise | 72.5% | resubstitution |
| Doi, 2013 [ | 73 | PF (1y) | no falls: 57; 1+ falls: 16 | 10+ (6) | using all data, pre-selected significant features using multiple comparisons against the faller/non-faller target | binary logistic regression, forward stepwise | | resubstitution |
| Weiss, 2013 [ | 71 | FH (1y) | 0/1 fall: 39; 2+ falls: 32 | 26? (19?) | using all data, removed features that did not meet corrected significance based on FH, and features with ≥0.8 correlation | binary logistic regression, backward stepwise | 72% (FH) | resubstitution |
| | | PF (6m) | 0/1 fall: 59; 2+ falls: 12 | | | | 95% (PF) | |
| Greene, 2014 [ | 124 | FH (1y) | no falls: 76; 1+ falls: 48 | (150?) | unclear if feature selection done using only training data inside CV loop; confidence intervals reported when CV results not i.i.d. | support vector machine, sequential forward feature selection | 83% | stratified ten-fold CV (×10) |
| Rispens, 2014 [ | 110 | FH (1y) | 0/1 fall: ?; 2+ falls: (? falls) | (57) | pre-selection of features using multiple significance and reliability tests on all data prior to modelling | negative binomial regression for number of falls; binary logistic regression for multiple fallers against 0–1 fall | | resubstitution |
| Liu, 2014 [ | 98 | PF (1y) | no falls: 50; 1 fall: 28; 2+ falls: 20 | 125 (123) | several different classification tasks (fallers, or multiple fallers) and various different starting feature subsets evaluated | binary logistic regression, forward stepwise | 83% for one of the multiple-faller models | two-fold CV |
| Schwenk, 2014 [ | 77 | PF (3m, diaries, but dementia patients) | no falls: 49; 1+ falls: 28 | 20 (9) | pre-selection of features using multiple univariate significance tests, using all data, prior to modelling; four different feature subsets/models investigated | binary logistic regression (univariate and multivariate), stepwise | | resubstitution |
| Gietzelt, 2014 [ | 38 (2m); 33 (4m); 28 (8m) | PF (max 8m, nursing home records) | no falls: 15?; 1+ falls: 13 (26 falls) | ? (12?) | correlation-based pre-selection of features using all data; specific trained models are shown, even though ten-fold CV should result in ten models per task | decision tree | 88.5% (8m) | ten-fold CV |
| Van Schooten, 2015 [ | 169 | FH (6m) | no falls: 109; 1+ falls: 60 | 67? (33?) (+26 sensor-feature interactions) | features were transformed to | binary logistic regression per feature and also in various combinations, forward stepwise | | resubstitution |
| | | PF (6m) | no falls: 110; 1+ falls: 59 | | | | | |
Column B specifies the final number of subjects, excluding sub-cohorts of young controls. Rows highlighted in grey are studies that used prospective falls (PF) as the basis for the dependent variable; other studies used fall history (FH), clinical risk assessments, or various combinations, as noted under column C (POMA, Performance-Oriented Mobility Assessment; BBS, Berg Balance Scale; TUG, Timed Up and Go test; PPA, Physiological Profile Assessment; STRATIFY, St Thomas's Risk Assessment Tool in Falling Elderly Inpatients). Column C also gives the fall reporting period in parentheses (e.g. 1y = one year, 6m = six months); fall data were recorded using self-reported diaries, sometimes accompanied by phone calls, unless otherwise stated in this column. Column D lists the model targets and the number of individuals in each category (i.e. fall risk level or sub-group based on number of falls); where available, the actual number of falls reported within the entire cohort is included as well, even if it was not used for analysis. Column E contains the number of features available for analysis, with the specific number of sensor-extracted features in parentheses. In certain cases, only sensor features were used, or it was not possible to report the overall number of model features accurately. Under column I, CV stands for cross-validation. Empty fields in the table indicate that the specific aspect was not mentioned in the publication or could not be confidently deduced from the information provided. A ‘?’ symbol marks uncertainty regarding the specific number(s) of model targets and/or features.