Daniel Stamate1,2,3, Min Kim4, Petroula Proitsi5, Sarah Westwood6, Alison Baird6, Alejo Nevado-Holgado6, Abdul Hye5, Isabelle Bos7,8, Stephanie J B Vos7, Rik Vandenberghe8, Charlotte E Teunissen9, Mara Ten Kate8,9, Philip Scheltens8, Silvy Gabel10,11,12, Karen Meersmans11,12, Olivier Blin13, Jill Richardson14, Ellen De Roeck15,16,17, Sebastiaan Engelborghs16,17,18, Kristel Sleegers17,19, Régis Bordet20, Lorena Ramit21, Petronella Kettunen22, Magda Tsolaki23, Frans Verhey7, Daniel Alcolea24, Alberto Lléo24, Gwendoline Peyratout25, Mikel Tainta26, Peter Johannsen27, Yvonne Freund-Levi5,28, Lutz Frölich29, Valerija Dobricic30, Giovanni B Frisoni31,32, José L Molinuevo20,33, Anders Wallin34, Julius Popp25,35, Pablo Martinez-Lage26, Lars Bertram30,36, Kaj Blennow37,38, Henrik Zetterberg37,38,39,40, Johannes Streffer41, Pieter J Visser7,8, Simon Lovestone6,42, Cristina Legido-Quigley4,43. 1. Division of Population Health, Health Services Research and Primary Care, University of Manchester, Manchester, UK. 2. Data Science & Soft Computing Lab, London, UK. 3. Computing Department, Goldsmiths College, University of London, London, UK. 4. Steno Diabetes Center Copenhagen, Gentofte, Denmark. 5. Institute of Psychiatry, Psychology and Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK. 6. Department of Psychiatry, University of Oxford, Oxford, UK. 7. Department of Psychiatry and Neuropsychology, School for Mental Health and Neuroscience, Alzheimer Centrum Limburg, Maastricht University, Maastricht, the Netherlands. 8. Department of Neurology, Alzheimer Center Amsterdam, Amsterdam Neuroscience, Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, the Netherlands. 9. Department of Radiology and Nuclear Medicine, VU University Medical Center, Amsterdam, the Netherlands. 10. Department of Clinical Chemistry, Neurochemistry Laboratory, Amsterdam Neuroscience, Amsterdam University Medical Centers, Vrije Universiteit, the Netherlands. 11. 
University Hospital Leuven, Leuven, Belgium. 12. Department of Neurosciences, Laboratory for Cognitive Neurology, KU Leuven, Belgium. 13. AIX Marseille University, INS, Ap-hm, Marseille, France. 14. Neurosciences Therapeutic Area, GlaxoSmithKline R&D, Stevenage, UK. 15. Faculty of Psychology & Educational Sciences Vrije Universiteit Brussel (VUB), Brussels, Belgium. 16. Reference Center for Biological Markers of Dementia (BIODEM), University of Antwerp, Antwerp, Belgium. 17. Institute Born-Bunge, University of Antwerp, Antwerp, Belgium. 18. Department of Neurology, UZ Brussel and Center for Neurosciences, Vrije Universiteit Brussel (VUB), Brussels, Belgium. 19. Neurodegenerative Brain Diseases Group, Center for Molecular Neurology, VIB, Belgium. 20. University of Lille, Inserm, CHU Lille, Lille, France. 21. Alzheimer's Disease & Other Cognitive Disorders Unit, Hospital Clínic-IDIBAPS, Barcelona, Spain. 22. Institute of Neuroscience and Physiology, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden. 23. 1st Department of Neurology, AHEPA University Hospital, Makedonia, Thessaloniki, Greece. 24. Memory Unit, Neurology Department, Hospital de la Santa Creu i Sant Pau, Barcelona, Spain. 25. University Hospital of Lausanne, Lausanne, Switzerland. 26. Center for Research and Advanced Therapies, Fundacion CITA-alzheimer Fundazioa, Donostia/San Sebastian, Spain. 27. Danish Dementia Research Centre, Rigshospitalet, Copenhagen University Hospital, Copenhagen, Denmark. 28. Department of Neurobiology, Caring Sciences and Society (NVS), Division of Clinical Geriatrics, Karolinska Institute, and Department of Geriatric Medicine, Karolinska University Hospital Huddinge, Stockholm, Sweden. 29. Department of Geriatric Psychiatry, Zentralinstitut für Seelische Gesundheit, University of Heidelberg, Mannheim, Germany. 30. Lübeck Interdisciplinary Platform for Genome Analytics, Institutes of Neurogenetics and Cardiogenetics, University of Lübeck, Lübeck, Germany. 31. 
University of Geneva, Geneva, Switzerland. 32. IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy. 33. Barcelona Beta Brain Research Center, Universitat Pompeu Fabra, Barcelona, Spain. 34. Institute of Neuroscience and Physiology, Sahlgrenska Academy at the University of Gothenburg, Gothenburg, Sweden. 35. Department of Mental Health and Psychiatry, Geriatric Psychiatry, Geneva University Hospitals, Geneva, Switzerland. 36. Department of Psychology, University of Oslo, Oslo, Norway. 37. Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, University of Gothenburg, Mölndal, Sweden. 38. Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden. 39. UK Dementia Research Institute at UCL, London, UK. 40. Department of Neurodegenerative Disease, UCL Institute of Neurology, London, UK. 41. Reference Center for Biological Markers of Dementia (BIODEM), Institute Born-Bunge, University of Antwerp, Antwerp, Belgium. 42. Janssen-Cilag UK Ltd, Oxford, UK. 43. Institute of Pharmaceutical Science, King's College London, London, UK.
Abstract
INTRODUCTION: Machine learning (ML) has the potential to capture the metabolic complexity in Alzheimer Disease (AD). Here we set out to test how well blood metabolites categorize AD when compared with CSF biomarkers. METHODS: This study analyzed samples from 242 cognitively normal (CN) people and 115 people with AD-type dementia, using plasma metabolites (n = 883). Deep Learning (DL), Extreme Gradient Boosting (XGBoost) and Random Forest (RF) were used to differentiate AD from CN. These models were internally validated using Nested Cross Validation (NCV). RESULTS: On the test data, DL produced an AUC of 0.85 (0.80-0.89), XGBoost produced 0.88 (0.86-0.89) and RF produced 0.85 (0.83-0.87). By comparison, CSF measures of amyloid, p-tau and t-tau (each together with age and gender) yielded AUC values of 0.78, 0.83 and 0.87, respectively, with XGBoost. DISCUSSION: This study showed that plasma metabolites have the potential to match the AUC of well-established AD CSF biomarkers in a relatively small cohort. Further studies in independent cohorts are needed to validate whether this specific panel of blood metabolites can separate AD from controls, and how specific it is for AD as compared with other neurodegenerative disorders.
At present, the diagnosis of Alzheimer disease–type dementia (AD) is based on protein biomarkers in cerebrospinal fluid (CSF) and brain imaging, together with a battery of cognitive tests. Diagnostic tools based on CSF collection are invasive, and brain-imaging tools are still costly; there is therefore a need to identify noninvasive tools for early detection and for measuring disease progression.
In recent years, an increasing number of studies have examined blood metabolites as potential AD biomarkers [1-4]. Blood metabolites are easily accessible, and they also represent an essential aspect of the phenotype of an organism and hence might act as a molecular fingerprint of disease progression [5,6]. Blood AD markers could therefore aid early diagnosis and recruitment for trials.
Here we used data generated as part of the European Medical Information Framework for AD Multimodal Biomarker Discovery (EMIF-AD), previously reported in full in Kim et al. [7]. As discussed in that paper, metabolite levels were measured using liquid chromatography–mass spectrometry (LC-MS) to cover approximately 800 metabolites, and these metabolites were related to the CSF biomarkers of AD that are commonly used in clinical research, including trials, and increasingly in clinical practice as part of the diagnostic work-up. Here we explore the potential of different Machine Learning (ML) algorithms to identify individuals with AD in the dataset, and we compare the effectiveness of blood-based metabolites as an indicator of clinical diagnosis with that of CSF markers. We employed two state-of-the-art ML algorithms, Deep Learning (DL) and Extreme Gradient Boosting (XGBoost), and compared them with the more commonly used Random Forest (RF) algorithm.
Methods
This study accessed previously generated data from 242 samples from cognitively normal (CN) individuals and 115 samples from people with AD-type dementia (AD), with group assignment based on clinical diagnosis. Details on the subjects, clinical and cognitive data, and measurements of AD pathological markers have been described elsewhere [7,8]. The metabolomics data employed here were accessed through the EMIF-AD portal, and the acquisition and processing details are available via open access in [7]. In short, the EMIF-AD cohort is a collated cohort that makes use of existing data and samples collected in 11 different studies across Europe, with the aim of discovering novel diagnostic and prognostic markers for predementia AD.
In the current study, the main objective was to use state-of-the-art ML classification algorithms to build CN versus AD predictive models using blood metabolites. For this purpose, we employed DL and XGBoost. We also employed the more widely used RF algorithm. The models were compared as binary classifiers using the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve.
Metabolites with more than 45% missing values were discarded. The remaining missing values were handled either by k-nearest neighbor imputation (RF and DL) or internally by the classification algorithm (XGBoost). Models were built and evaluated using Nested Cross Validation (NCV), which used 9/10 of the data folds for model training and optimization in an inner cross-validation, and the remaining 1/10 for model testing in an outer cross-validation. The process was repeated 10 times, once for each test fold.
The analysis was further extended by assessing the stability of the AUC performance with Monte Carlo (MC) simulations consisting of 50 repeated, similar NCV experiments.
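The nested cross-validation layout described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' R implementation: the function names, the random seed and the choice of 5 inner folds are assumptions (the text states only that 9/10 of the data are used for training and optimization and 1/10 for testing).

```python
import random


def k_fold_indices(indices, k, rng):
    """Shuffle the given indices and split them into k nearly equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=5, seed=0):
    """Yield (train, test, inner_splits) for each outer fold.

    inner_k=5 is an assumption; the paper does not state the inner fold count.
    """
    rng = random.Random(seed)
    outer_folds = k_fold_indices(range(n_samples), outer_k, rng)
    for i, test_idx in enumerate(outer_folds):
        # Outer training set: the other 9/10 of the data.
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        # Inner cross-validation runs on the outer training set only,
        # so the outer test fold never influences model tuning.
        inner_folds = k_fold_indices(train_idx, inner_k, rng)
        inner_splits = []
        for v, val_idx in enumerate(inner_folds):
            fit_idx = [j for f, fold in enumerate(inner_folds) if f != v for j in fold]
            inner_splits.append((fit_idx, val_idx))
        yield train_idx, test_idx, inner_splits


# 357 participants, as in this study (242 CN + 115 AD).
splits = list(nested_cv_splits(357))
```

Repeating this whole procedure with different seeds corresponds, in spirit, to the Monte Carlo repetitions of the NCV experiment.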
As such, multiple models were built on multiple samples in the NCV and MC, using metabolite predictors selected for their ability to discriminate CN from AD, as measured by the Relief algorithm [9] applied to the training data in combination with 500 permutations of the outcome variable's values. This method computes each predictor's importance as the standardized Relief score, as described in the "Measuring Predictor Importance" chapter of [10]. Part of the prediction modeling methodology in this study was adapted from [11], with different algorithms, and followed recommendations from [10,12]. The analysis was carried out using R software [13]. Pathway analysis was performed on the top 20 ranked metabolites using MetaboAnalyst 4.0 [14]. The algorithms were run on four servers with 6-core Xeon CPUs and 336 GB RAM.
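To make the permutation-standardized Relief score concrete, the following is a minimal sketch under stated assumptions. It uses the basic two-class Relief update with one nearest hit and one nearest miss per sample, range-normalized Manhattan distances, 50 permutations instead of the 500 used in the study, and synthetic data; the function names are hypothetical, and the Relief variant and settings used in the authors' R analysis may differ.

```python
import random
import statistics


def relief_scores(X, y):
    """Basic Relief weights for numeric features and a binary outcome."""
    n, p = len(X), len(X[0])
    lo = [min(row[f] for row in X) for f in range(p)]
    hi = [max(row[f] for row in X) for f in range(p)]
    span = [(hi[f] - lo[f]) or 1.0 for f in range(p)]  # avoid division by zero

    def diff(a, b, f):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(p))

    weights = [0.0] * p
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        near_hit = min(hits, key=lambda j: dist(X[i], X[j]))
        near_miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(p):
            # Reward features that differ on the nearest miss,
            # penalize features that differ on the nearest hit.
            weights[f] += (diff(X[i], X[near_miss], f) - diff(X[i], X[near_hit], f)) / n
    return weights


def standardized_relief(X, y, n_perm=50, seed=0):
    """(observed score - mean permuted score) / sd of permuted scores, per feature."""
    rng = random.Random(seed)
    observed = relief_scores(X, y)
    permuted = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the link between predictors and outcome
        permuted.append(relief_scores(X, y_perm))
    result = []
    for f, obs in enumerate(observed):
        null = [w[f] for w in permuted]
        sd = statistics.stdev(null) or 1.0
        result.append((obs - statistics.mean(null)) / sd)
    return result


# Synthetic data: feature 0 tracks the class label, feature 1 is pure noise.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[label + rng.gauss(0, 0.1), rng.random()] for label in y]
scores = standardized_relief(X, y)
```

In this toy setting the informative feature should receive a much larger standardized score than the noise feature, which is the property used to rank metabolites.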
Results
In this study, we analyzed metabolite data derived from blood samples from 357 participants (CN n = 242, AD n = 115) previously reported in Kim et al. [7]. Demographic and clinical data can be found in [7]; in short, there was no difference in gender, while AD participants were older than CN participants.
On the test data, the DL model produced a Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) value of 0.85 (95% confidence interval [CI], 0.8038-0.8895). The XGBoost model produced an AUC of 0.88 (95% CI [0.8619, 0.8903]). With the RF classifier, the resulting AUC was 0.85 (95% CI [0.8323, 0.8659]). Fig. 1 illustrates the ROC curves obtained from the three ML models.
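For readers less familiar with the metric, the AUC reported above can be read as the probability that a randomly chosen AD sample receives a higher predicted score than a randomly chosen CN sample. The sketch below computes it by direct pairwise comparison; the labels and scores are made-up illustrative values, not study data, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = AD, 0 = CN; scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly
```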
Fig. 1
Shows the AUC values for the XGBoost, RF and DL models. XGBoost performed best with metabolite predictors in the EMIF cohort.
The MC simulation conducted with XGBoost, which was the best-performing predictive model in our analysis, yielded AUC values that were approximately normally distributed, in line with [11] and consistent with the Shapiro-Wilk test (P value = .6819). The 50 AUC values obtained in MC had a minimum of 0.8614, a maximum of 0.8923, a mean of 0.8761, a median of 0.8766 and a standard deviation of 0.0072. The t-test showed that the true mean of AUC for XGBoost applied on plasma metabolites was not lower than 0.87 (P value = 1.265 × 10⁻⁷).
For comparison, we also investigated how well the levels of amyloid, p-tau and t-tau, each combined with age and gender, predicted clinical AD versus CN. XGBoost models were built in the same manner as for metabolite predictors. Together with age and gender, amyloid led to AUC 0.78 (95% CI [0.7626, 0.8013]); p-tau led to AUC 0.83 (95% CI [0.8188, 0.8470]); and t-tau led to AUC 0.87 (95% CI [0.8583, 0.8854]). t-tests comparing the mean AUC for metabolites with the mean AUC for amyloid, p-tau and t-tau individually showed higher values for metabolites (P value < 2.2 × 10⁻¹⁶, P value < 2.2 × 10⁻¹⁶ and P value = .005921, respectively).
The top 20 ranked predictors out of the 347 selected by the method presented in the previous section are shown in Fig. 2.
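As a rough plausibility check of the one-sample t-test on the Monte Carlo AUC values, the test statistic can be recomputed from the reported summary statistics (n = 50, mean 0.8761, standard deviation 0.0072, reference value 0.87). The sketch below assumes a one-sided test with 49 degrees of freedom, which the text does not state explicitly, and integrates the Student t density numerically with Simpson's rule; the helper names and integration settings are illustrative choices. Because the inputs are rounded, the result can only be expected to agree with the reported P value in order of magnitude.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail(t, df, upper=100.0, steps=20000):
    """Approximate P(T > t) by Simpson's rule on [t, upper]; steps must be even."""
    h = (upper - t) / steps
    total = t_pdf(t, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(t + i * h, df)
    return total * h / 3


# Summary statistics as reported for the 50 Monte Carlo AUC values.
n, mean_auc, sd_auc, mu0 = 50, 0.8761, 0.0072, 0.87

# t = (sample mean - reference value) / standard error of the mean
t_stat = (mean_auc - mu0) / (sd_auc / math.sqrt(n))

# One-sided P value (assumption: alternative "mean greater than 0.87").
p_one_sided = upper_tail(t_stat, n - 1)
```

With these rounded inputs the statistic comes out close to 6, and the one-sided tail probability is on the order of 10⁻⁷, which is broadly in line with the value reported above.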
Fig. 2
The x-axis shows the top 20 ranked predictors, and the y-axis shows the predictors' importance computed as the standardized relief score according to Measuring Predictor Importance chapter of [10].
Pathway analyses revealed that the nitrogen pathway was overrepresented (qFDR = 0.004) within the panel. Molecules that were captured as the 20 top ranking predictors are discussed in the next section.
Discussion
Machine Learning applied to healthcare is increasingly enabled by the advent of high-performance computing and the development of complex algorithms. In this study, we employed two state-of-the-art algorithms, DL and XGBoost, and a more conventional algorithm, RF, to obtain high-accuracy models that predict AD versus CN with metabolites as predictors. Our study showed that the best model was based on XGBoost [15], which is an enhanced form of Gradient Boosting Machines methods based on decision trees [12]. In our study, RF and DL achieved comparable AUC values. DL algorithms are known to often take advantage of large and/or unstructured data (such as images) to produce more accurate category discrimination/prediction. In a study using the Alzheimer's Disease Neuroimaging Initiative (ADNI) data for AD prediction, XGBoost demonstrated superior results (AUC = 0.97 [0.01]) when imaging parameters (MRI and PET) were included as predictors and when compared with RF, Support Vector Machines, Gaussian Processes and Stochastic Gradient Boosting [16]. In another study, in which cognition and MRI were used as predictors, Kernel Ridge Regression achieved R² = 0.87 (0.025) [17].
Pathway analyses using the top 20 AD-predicting metabolites derived from the Relief method showed that the nitrogen pathway was overrepresented. Some of the molecules selected have been reported in metabolomics studies and have been implicated in neurodegeneration: dodecanoate, a C12 fatty acid, was found to correlate with longitudinal measures of cognition in the ADNI cohort [3], as was the bile acid glycolithocholate, which was associated with both AD and cognition measures (ADAS-Cog13) in one of the biggest cross-sectional studies on cognition, AD and the microbiome [18]. Plasmalogens were also found at decreased levels in our cohort, in agreement with an earlier report [19].
The amide form of vitamin B3, nicotinamide, has been implicated in both neuroprotection and neuronal death [20].
New metabolites that could be of interest and have not previously been reported as related to AD were phytanate and furoylglycine. The former is a known neurotoxin that impairs mitochondrial function and transcription [21]. Furoylglycine is a metabolite which, like lithocholic acid, is mainly synthesized by the microbiome and has been reported as a biomarker of coffee consumption [22].
A limitation of our study is that it does not include an external validation, owing to the size of the cohort. However, we implemented an NCV procedure repeated 50 times in an MC simulation, which provided an extended internal validation of the models' prediction accuracy. Further studies will assess the performance of ratios/combinations of CSF markers and metabolites, life-style factors and disorders commonly found in the elderly, together with testing the specificity of this panel in other neurodegenerative (e.g., PD, FTD), neurological (e.g., stroke) and psychiatric (e.g., depression) disorders associated with aging.
The intent of this paper was to compare the performance of different ML algorithms in distinguishing people with AD from cognitively unimpaired individuals. Here we show, first, that all three approaches used demonstrate good discriminatory power; second, that XGBoost is somewhat more effective in this particular dataset than RF and DL; and third, that this accuracy for clinical diagnosis is broadly similar to that achieved by CSF markers of AD pathology.
The lack of a replication and validation dataset limits the interpretation of this finding, but nonetheless, the strong prediction of diagnostic category from a blood-based metabolite biomarker set is further evidence of the potential of such approaches to complement other biomarkers in the identification of people with likely AD.
Systematic review: The authors reviewed the literature using PubMed and reported key publications. Most AD biomarker studies employing state-of-the-art machine learning (ML) techniques utilized neuroimaging data. Those that looked at blood metabolomics data were small and used clinical diagnosis as an endpoint. We therefore explored the potential of state-of-the-art ML algorithms, including Deep Learning and Extreme Gradient Boosting, to test how well blood metabolite levels predict clinical diagnosis, and compared this with CSF biomarkers.
Interpretation: The results presented here show that, with state-of-the-art ML algorithms, blood metabolites have the potential to match the CSF markers of AD pathology in distinguishing people with AD from cognitively unimpaired individuals. All the ML algorithms employed showed good discriminatory power.
Future directions: Results of this study should be replicated and validated using an independent dataset. Further studies will also aim to assess the performance of ratios/combinations of CSF markers and metabolites, life-style factors and disorders commonly found in the elderly, together with testing the specificity of this panel in other neurodegenerative (e.g., PD, FTD), neurological (e.g., stroke) and psychiatric (e.g., depression) disorders associated with aging.