Johann de Jong (1), Mohammad Asif Emon (2,3), Ping Wu (4), Reagon Karki (2,3), Meemansa Sood (2,3), Patrice Godard (5), Ashar Ahmad (3), Henri Vrooman (6,7), Martin Hofmann-Apitius (2,3), Holger Fröhlich (1,3).
1. UCB Biosciences GmbH, Alfred-Nobel-Strasse 10, 40789 Monheim, Germany.
2. Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Konrad-Adenauer-Strasse, 53754 Sankt Augustin, Germany.
3. Bonn-Aachen International Center for IT, University of Bonn, Konrad-Adenauer-Strasse, 53115 Bonn, Germany.
4. UCB Pharma, Bath Road 216, Slough SL1 3WE, UK.
5. UCB Pharma, Chemin du Foriest 1, 1420 Braine-l'Alleud, Belgium.
6. Erasmus MC, University Medical Center Rotterdam, Department of Radiology, Doctor Molewaterplein 40, PO Box 2040, 3000 CA Rotterdam, Netherlands.
7. Erasmus MC, University Medical Center Rotterdam, Doctor Molewaterplein 40, Department of Medical Informatics, PO Box 2040, 3000 CA Rotterdam, Netherlands.
Abstract
BACKGROUND: Precision medicine requires a stratification of patients by disease presentation that is sufficiently informative to allow for selecting treatments on a per-patient basis. For many diseases, such as neurological disorders, this stratification problem translates into a complex problem of clustering multivariate and relatively short time series because (i) these diseases are multifactorial and not well described by single clinical outcome variables and (ii) disease progression needs to be monitored over time. Additionally, clinical data are often hindered by the presence of many missing values, further complicating any clustering attempt. FINDINGS: The problem of clustering multivariate short time series with many missing values is generally not well addressed in the literature. In this work, we propose a deep learning-based method to address this issue: variational deep embedding with recurrence (VaDER). VaDER relies on a Gaussian mixture variational autoencoder framework, which is further extended to (i) model multivariate time series and (ii) directly deal with missing values. We validated VaDER by accurately recovering clusters from simulated and benchmark data with known ground truth clustering, while varying the degree of missingness. We then used VaDER to successfully stratify patients with Alzheimer disease and patients with Parkinson disease into subgroups characterized by clinically divergent disease progression profiles. Additional analyses demonstrated that these clinical differences reflected known underlying aspects of Alzheimer disease and Parkinson disease. CONCLUSIONS: We believe our results show that VaDER can be of great value for future efforts in patient stratification, and multivariate time-series clustering in general.
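The abstract does not spell out how VaDER deals with missing values, so the following is only an illustrative sketch of one common way a reconstruction error can be made to ignore missing measurements, and should not be read as the authors' implementation. The function name `masked_mse`, the nested-list layout [patient][time point][variable], and the use of NaN as the missing-value marker are all assumptions made for this example.

```python
import math


def masked_mse(x, x_hat):
    """Mean squared error computed over observed entries only.

    x, x_hat: nested lists indexed [patient][time point][variable].
    Missing measurements in x are marked with NaN (an assumption of
    this sketch) and are skipped entirely.
    """
    total, n_observed = 0.0, 0
    for series, series_hat in zip(x, x_hat):
        for visit, visit_hat in zip(series, series_hat):
            for value, value_hat in zip(visit, visit_hat):
                if math.isnan(value):
                    continue  # missing measurement: contributes nothing
                total += (value - value_hat) ** 2
                n_observed += 1
    if n_observed == 0:
        raise ValueError("no observed entries")
    return total / n_observed


# One patient, two visits, two clinical variables; one value is missing.
x = [[[1.0, float("nan")], [2.0, 4.0]]]
x_hat = [[[1.5, 0.0], [2.0, 3.0]]]
loss = masked_mse(x, x_hat)
```

Normalizing by the number of observed entries, rather than by the total number of entries, keeps the error comparable between patients with different amounts of missing data.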