Petra J Jones (1), Mike Catt (2), Melanie J Davies (3), Charlotte L Edwardson (4), Evgeny M Mirkes (5), Kamlesh Khunti (6), Tom Yates (4), Alex V Rowlands (7).
1. Leicester Diabetes Centre, University Hospitals of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; Diabetes Research Centre, University of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK. Electronic address: pj100@leicester.ac.uk.
2. Population Health Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
3. Leicester Diabetes Centre, University Hospitals of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; Diabetes Research Centre, University of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; NIHR Leicester Biomedical Research Centre, Leicester General Hospital, Gwendolen Road, Leicester, LE5 4PW, UK.
4. Diabetes Research Centre, University of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; NIHR Leicester Biomedical Research Centre, Leicester General Hospital, Gwendolen Road, Leicester, LE5 4PW, UK.
5. School of Mathematics and Actuarial Science, University of Leicester, University Road, Leicester LE1 7RD, UK.
6. Leicester Diabetes Centre, University Hospitals of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; NIHR Leicester Biomedical Research Centre, Leicester General Hospital, Gwendolen Road, Leicester, LE5 4PW, UK.
7. Diabetes Research Centre, University of Leicester, Leicester General Hospital, Gwendolen Road, Leicester LE5 4PW, UK; NIHR Leicester Biomedical Research Centre, Leicester General Hospital, Gwendolen Road, Leicester, LE5 4PW, UK; Alliance for Research in Exercise, Nutrition and Activity (ARENA), Sansom Institute for Health Research, Division of Health Sciences, University of South Australia, Adelaide, Australia.
Abstract
BACKGROUND: Identifying clusters of physical activity (PA) from accelerometer data is important for quantifying both the levels of sedentary behaviour and PA associated with the risk of serious health conditions and the time spent engaging in healthy PA. Unsupervised machine learning models can capture PA during everyday free-living without requiring labelled data. However, there is scant research addressing the selection of features from accelerometer data. The aim of this systematic review is to summarise the feature selection techniques applied in studies of unsupervised machine learning on PA data obtained from accelerometer-based devices, and to identify the features most commonly selected by these techniques. Feature selection methods can reduce the complexity and computational burden of these models by removing less important features, and can help clarify the relative importance of feature sets and individual features in clustering. METHOD: We conducted a systematic search of the PubMed, Medline, Google Scholar, Scopus, arXiv and Web of Science databases to identify studies published before January 2021 that used feature selection methods to derive PA clusters with unsupervised machine learning models. RESULTS: A total of 13 studies were eligible for inclusion in the review. The most popular feature selection techniques were Principal Component Analysis (PCA) and correlation-based methods, and k-means was frequently used to cluster the accelerometer data. Cluster quality evaluation methods were diverse, spanning both external measures (e.g. cluster purity) and internal measures (most frequently the silhouette score). Only four of the 13 studies had more than 25 participants, and only four included two or more datasets. CONCLUSION: There is a need to assess multiple feature selection methods on large cohort data comprising multiple (three or more) PA datasets. Cut-off criteria (e.g. the number of components, the pairwise correlation threshold, or the explained variance ratio for PCA) should be stated explicitly, along with any hyperparameters used in clustering.
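To illustrate the kind of reporting the conclusion calls for, the following is a minimal sketch, not taken from any of the reviewed studies, of a pipeline that combines the two most common feature selection techniques identified (a correlation-based filter and PCA) with k-means clustering, and evaluates the result with one internal measure (silhouette score) and one external measure (cluster purity). It assumes NumPy only. The data are synthetic and hypothetical, the function names (`drop_correlated`, `pca_by_variance`, `kmeans`, `silhouette`, `purity`) are illustrative, and the cut-offs (|r| > 0.9, 95% explained variance, k = 3) are arbitrary values chosen for the example. The k-means seeding uses a farthest-first rule as a simple deterministic substitute for k-means++.

```python
import numpy as np


def drop_correlated(X, threshold=0.9):
    """Correlation-based filter: keep feature j only if its absolute Pearson
    correlation with every previously kept feature is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept  # column indices of retained features


def pca_by_variance(X, min_explained=0.95):
    """PCA via SVD, keeping the fewest components whose cumulative explained
    variance ratio reaches min_explained."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    n_comp = min(int(np.searchsorted(np.cumsum(ratio), min_explained)) + 1, len(ratio))
    return Xc @ vt[:n_comp].T, ratio[:n_comp]


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with farthest-first (maximin) seeding, used here as a
    simple deterministic substitute for k-means++ initialisation."""
    centres = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[dist.argmax()])
    centres = np.array(centres)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)  # assign each epoch to its nearest centre
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


def silhouette(X, labels):
    """Mean silhouette score (internal measure); singleton clusters score 0."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() < 2:
            scores.append(0.0)
            continue
        a = d[i, same].sum() / (same.sum() - 1)  # mean distance within own cluster
        b = min(d[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def purity(labels, truth):
    """Cluster purity (external measure): share of points belonging to the
    majority true class of their cluster."""
    total = 0
    for c in np.unique(labels):
        _, counts = np.unique(truth[labels == c], return_counts=True)
        total += counts.max()
    return total / len(truth)


# Hypothetical synthetic data: 3 activity classes x 50 epochs, 4 features.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1, 2], 50)
latent = np.array([[0, 0], [20, 0], [0, 20]])[truth] + rng.normal(size=(150, 2))
X = np.column_stack([
    latent[:, 0],
    latent[:, 1],
    2 * latent[:, 0] + rng.normal(scale=0.05, size=150),  # near-copy of feature 0
    latent.sum(axis=1) + rng.normal(scale=0.05, size=150),
])

kept = drop_correlated(X, threshold=0.9)                 # cut-off: |r| > 0.9 dropped
Z = (X[:, kept] - X[:, kept].mean(axis=0)) / X[:, kept].std(axis=0)
scores, ratio = pca_by_variance(Z, min_explained=0.95)   # cut-off: 95% variance
labels, _ = kmeans(scores, k=3)                          # hyperparameter: k = 3
print("kept features:", kept)
print("components kept:", scores.shape[1], "explained variance:", ratio.round(3))
print("silhouette:", round(silhouette(scores, labels), 3))
print("purity:", round(purity(labels, truth), 3))
```

Each threshold appears as a named argument, so it can be reported alongside the results; a real analysis would replace the synthetic matrix with epoch-level accelerometer features.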
Authors: Víctor Micó; Rodrigo San-Cristobal; Roberto Martín; Miguel Ángel Martínez-González; Jordi Salas-Salvadó; Dolores Corella; Montserrat Fitó; Ángel M Alonso-Gómez; Julia Wärnberg; Jesús Vioque; Dora Romaguera; José López-Miranda; Ramon Estruch; Francisco J Tinahones; José Lapetra; J Luís Serra-Majem; Aurora Bueno-Cavanillas; Josep A Tur; Vicente Martín Sánchez; Xavier Pintó; Miguel Delgado-Rodríguez; Pilar Matía-Martín; Josep Vidal; Clotilde Vázquez; Ana García-Arellano; Salvador Pertusa-Martinez; Alice Chaplin; Antonio Garcia-Rios; Carlos Muñoz Bravo; Helmut Schröder; Nancy Babio; Jose V Sorli; Jose I Gonzalez; Diego Martinez-Urbistondo; Estefania Toledo; Vanessa Bullón; Miguel Ruiz-Canela; María Puy- Portillo; Manuel Macías-González; Nuria Perez-Diaz-Del-Campo; Jesús García-Gavilán; Lidia Daimiel; J Alfredo Martínez Journal: Front Endocrinol (Lausanne) Date: 2022-09-06 Impact factor: 6.055