A metabolite-based machine learning approach to diagnose Alzheimer-type dementia in blood: Results from the European Medical Information Framework for Alzheimer disease biomarker discovery cohort.

Daniel Stamate, Min Kim, Petroula Proitsi, Sarah Westwood, Alison Baird, Alejo Nevado-Holgado, Abdul Hye, Isabelle Bos, Stephanie J B Vos, Rik Vandenberghe, Charlotte E Teunissen, Mara Ten Kate, Philip Scheltens, Silvy Gabel, Karen Meersmans, Olivier Blin, Jill Richardson, Ellen De Roeck, Sebastiaan Engelborghs, Kristel Sleegers, Régis Bordet, Lorena Ramit, Petronella Kettunen, Magda Tsolaki, Frans Verhey, Daniel Alcolea, Alberto Lléo, Gwendoline Peyratout, Mikel Tainta, Peter Johannsen, Yvonne Freund-Levi, Lutz Frölich, Valerija Dobricic, Giovanni B Frisoni, José L Molinuevo, Anders Wallin, Julius Popp, Pablo Martinez-Lage, Lars Bertram, Kaj Blennow, Henrik Zetterberg, Johannes Streffer, Pieter J Visser, Simon Lovestone, Cristina Legido-Quigley.

Abstract

INTRODUCTION: Machine learning (ML) may harbor the potential to capture the metabolic complexity in Alzheimer Disease (AD). Here we set out to test the performance of metabolites in blood to categorize AD when compared to CSF biomarkers.
METHODS: This study analyzed samples from 242 cognitively normal (CN) people and 115 with AD-type dementia utilizing plasma metabolites (n = 883). Deep Learning (DL), Extreme Gradient Boosting (XGBoost) and Random Forest (RF) were used to differentiate AD from CN. These models were internally validated using Nested Cross Validation (NCV).
RESULTS: On the test data, DL produced the AUC of 0.85 (0.80-0.89), XGBoost produced 0.88 (0.86-0.89) and RF produced 0.85 (0.83-0.87). By comparison, CSF measures of amyloid, p-tau and t-tau (together with age and gender) produced with XGBoost the AUC values of 0.78, 0.83 and 0.87, respectively.
DISCUSSION: This study showed that plasma metabolites have the potential to match the AUC of well-established AD CSF biomarkers in a relatively small cohort. Further studies in independent cohorts are needed to validate whether this specific panel of blood metabolites can separate AD from controls, and how specific it is for AD as compared with other neurodegenerative disorders.

Keywords: Alzheimer's disease; Biomarkers; EMIF-AD; Machine-Learning; Metabolomics

Year: 2019 PMID: 31890857 PMCID: PMC6928349 DOI: 10.1016/j.trci.2019.11.001

Source DB: PubMed Journal: Alzheimers Dement (N Y) ISSN: 2352-8737

Introduction

At present, the diagnosis of Alzheimer disease–type dementia (AD) is based on protein biomarkers in cerebrospinal fluid (CSF) and brain imaging, together with a battery of cognition tests. Diagnostic tools based on CSF collection are invasive, while brain-imaging tools are still costly; there is therefore a need to identify noninvasive tools for early detection as well as for measuring disease progression. In recent years, an increasing number of studies have examined blood metabolites as potential AD biomarkers [1-4]. The advantages of blood metabolites are that they are easily accessible and that they represent an essential aspect of the phenotype of an organism, and hence might act as a molecular fingerprint of disease progression [5,6]. Blood AD markers could therefore aid early diagnosis and recruitment for trials. Here we utilized data generated as part of the European Medical Information Framework for AD Multimodal Biomarker Discovery (EMIF-AD), previously reported in full in Kim et al. [7]. As discussed in that paper, metabolite levels were measured using liquid chromatography–mass spectrometry (LC-MS) to cover ca. 800 metabolites, and these metabolites were related to CSF biomarkers of AD that are commonly used in clinical research, including trials, and increasingly in clinical practice as part of the diagnostic workup. Here we explore the potential of different Machine Learning (ML) algorithms to identify individuals with AD in this dataset and compare the effectiveness of blood-based metabolites as an indicator of clinical diagnosis with that of CSF markers. In this study we employed two state-of-the-art ML algorithms, Deep Learning (DL) and Extreme Gradient Boosting (XGBoost), and compared these with the more commonly utilized Random Forest (RF) algorithm.

Methods

This study accessed previously generated data on 242 samples from cognitively normal (CN) individuals and 115 samples from people with AD-type dementia (AD), with group assignment based on clinical diagnosis. Details on the subjects, clinical and cognitive data, as well as measurements of AD pathological markers have been described elsewhere [7,8]. The metabolomics data employed here were accessed in the EMIF-AD portal, and the acquisition and processing details are available via open access in [7]. In short, the EMIF-AD cohort is a collated cohort making use of existing data and samples collected in 11 different studies across Europe, with the aim of discovering novel diagnostic and prognostic markers for predementia AD. In the current study, the main objective was to use state-of-the-art ML classification algorithms to build CN versus AD predictive models using blood metabolites. For this purpose, we employed DL and XGBoost. Additionally, we employed the more popularly used RF algorithm. These binary classifiers were compared in terms of the Area Under the Curve (AUC) of their Receiver Operating Characteristic (ROC) curves. Metabolites with more than 45% missing values were discarded. The remaining missing values were handled with imputation based on k-nearest neighbors (RF and DL), or internally by the classification algorithm (XGBoost). Models were built and evaluated using Nested Cross Validation (NCV): in each outer iteration, nine of the ten data folds were used for model training and optimization in an inner cross-validation, and the remaining fold was held out for model testing. The process was repeated 10 times, once for each test fold. The analysis was further extended by assessing the stability of the AUC performance with Monte Carlo (MC) simulations consisting of 50 repeated, similar NCV experiments.
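The fold structure of the NCV described above can be sketched as follows. This is a minimal illustration, not the authors' R implementation: the function names `k_folds` and `nested_cv_splits` are hypothetical, the number of inner folds (`inner_k=10`) is an assumption because the text does not state it, and the splits are not stratified by diagnosis.

```python
import random


def k_folds(indices, k, rng):
    """Shuffle the indices and split them into k near-equal folds."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) for each outer fold.

    inner_splits holds (inner_train, inner_val) pairs drawn only from
    outer_train, so the outer test fold never influences model tuning.
    inner_k is an assumption; the source does not give the inner fold count.
    """
    rng = random.Random(seed)
    outer_folds = k_folds(range(n_samples), outer_k, rng)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [idx for j, fold in enumerate(outer_folds)
                       if j != i for idx in fold]
        inner_folds = k_folds(outer_train, inner_k, rng)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [idx for j, fold in enumerate(inner_folds)
                           if j != m for idx in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_train, outer_test, inner_splits


# 357 participants, as in this cohort (242 CN + 115 AD).
splits = list(nested_cv_splits(357))
```

Hyperparameters would be chosen using only `inner_splits`, and the selected model would then be scored once on `outer_test`. Repeating the whole procedure with different seeds corresponds to the MC repetitions.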
As such, multiple models were built on multiple samples in the NCV and MC, using metabolite predictors selected on the basis of their capability to discriminate CN versus AD as measured by the Relief algorithm [9], applied on training data in combination with 500 permutations of the outcome variable's values. This method computes each predictor's importance, defined as the standardized Relief score, according to the Measuring Predictor Importance chapter of [10]. Part of the prediction modeling methodology in this study was adapted from [11], with different algorithms, and followed recommendations from [10,12]. The analysis was carried out using R software [13]. Pathway analysis was performed on the top 20 ranked metabolites using MetaboAnalyst 4.0 [14]. The algorithms were run on four servers with 6-core Xeon CPUs and 336 GB RAM.
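A permutation-standardized Relief score can be illustrated with the sketch below. It rests on several assumptions: it uses a simplified Relief variant (one nearest hit and one nearest miss per sample, every sample used as a probe, binary outcome, no missing values) and it standardizes each observed score as (observed minus permutation mean) divided by the permutation standard deviation. The function names and the toy data are hypothetical, and this is not the R code used in the study.

```python
import random
import statistics


def relief_scores(X, y):
    """Simplified Relief: reward features that differ on the nearest miss
    (other class) and agree on the nearest hit (same class)."""
    n, p = len(X), len(X[0])
    # Scale each feature difference by its range so features are comparable.
    spans = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0
             for f in range(p)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(p))

    weights = [0.0] * p
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(p):
            weights[f] += (diff(X[i], X[miss], f) - diff(X[i], X[hit], f)) / n
    return weights


def standardized_relief(X, y, n_perm=500, seed=0):
    """Standardize observed Relief scores against scores obtained after
    shuffling the outcome labels n_perm times."""
    rng = random.Random(seed)
    observed = relief_scores(X, y)
    permuted = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        permuted.append(relief_scores(X, y_perm))
    result = []
    for f, obs in enumerate(observed):
        null = [scores[f] for scores in permuted]
        result.append((obs - statistics.mean(null)) / statistics.stdev(null))
    return result


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[3.0 * label + rng.gauss(0, 0.5), rng.gauss(0, 1)] for label in y]
z = standardized_relief(X, y, n_perm=200)
```

In this toy setting the informative feature receives a much larger standardized score than the noise feature, which is the property that makes such a score usable for ranking predictors.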

Results

In this study, we analyzed metabolite data derived from blood samples from 357 participants (CN n = 242, AD n = 115) previously reported in Kim et al. [7]. Demographic and clinical data can be found in [7]; in short, there was no difference in gender, while AD participants were older than CN participants. On the test data, the DL model produced a Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) value of 0.85, with a 95% confidence interval (CI) of 0.8038 to 0.8895. The XGBoost model produced an AUC value of 0.88 (95% CI [0.8619, 0.8903]). With the RF classifier, the resulting AUC was 0.85 (95% CI [0.8323, 0.8659]). Fig. 1 illustrates the ROC curves obtained from the three ML models.
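As a reminder of what the AUC figures measure, the sketch below computes the ROC AUC as the fraction of (AD, CN) pairs in which the AD sample receives the higher predicted score, with ties counted as one half. The labels and scores are invented for illustration and the function name `roc_auc` is hypothetical; the study's own values were obtained in R.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive (label 1)
    is scored above a randomly chosen negative (label 0); ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 3 positives, 4 negatives.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 11 of the 12 pairs are ordered correctly
```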

Fig. 1

Shows the AUC values for the XGBoost, RF and DL models. XGBoost performed best with metabolite predictors in the EMIF cohort.

The MC simulation conducted with XGBoost, which was the superior predictive model in our analysis, yielded AUC values that followed a Gaussian distribution, in line with [11] and as confirmed by a Shapiro-Wilk test (P value = .6819). The 50 AUC values obtained in MC had a minimum of 0.8614, a maximum of 0.8923, a mean of 0.8761, a median of 0.8766 and a standard deviation of 0.0072. The t-test showed that the true mean of AUC for XGBoost applied on plasma metabolites was not lower than 0.87 (P value = 1.265 × 10^-7). For comparison, we also investigated the levels of amyloid, p-tau and t-tau, to which we also added age and gender, and their prediction of clinical AD versus CN. XGBoost models were built in the same manner as for metabolite predictors. Together with age and gender, amyloid led to AUC 0.78 (95% CI [0.7626, 0.8013]); p-tau led to AUC 0.83 (95% CI [0.8188, 0.8470]); and t-tau led to AUC 0.87 (95% CI [0.8583, 0.8854]). Comparing the mean AUC for metabolites with the mean AUC for amyloid, p-tau and t-tau individually, t-tests showed superior values for metabolites (P value < 2.2 × 10^-16, P value < 2.2 × 10^-16 and P value = .005921, respectively). The top 20 ranked predictors out of the 347 selected by the method presented in the previous section are shown in Fig. 2.
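The one-sample t-test on the MC results can be approximately reproduced from the reported summary statistics alone. The sketch below assumes a test of the 50 AUC values against a reference mean of 0.87 and uses the rounded mean and standard deviation given in the text, so the statistic is approximate. The function name `one_sample_t` is hypothetical, and the P value is not recomputed because the Student t distribution is not available in the Python standard library.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """Return the t statistic and degrees of freedom for testing a sample
    mean against the reference value mu0, given summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Reported MC summary: 50 AUC values, mean 0.8761, standard deviation 0.0072.
t_stat, dof = one_sample_t(mean=0.8761, sd=0.0072, n=50, mu0=0.87)
```

A statistic of roughly 6 on 49 degrees of freedom is consistent in magnitude with the very small P value reported for this test.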

Fig. 2

The x-axis shows the top 20 ranked predictors, and the y-axis shows the predictors' importance computed as the standardized Relief score according to the Measuring Predictor Importance chapter of [10].

Pathway analyses revealed that the Nitrogen pathway was overrepresented (qFDR = 0.004) within the panel. Molecules that were captured as the 20 top ranking predictors are discussed in the next section.

Discussion

Machine Learning applied to healthcare is increasingly enabled by the advent of high-performance computing and the development of complex algorithms. In this study, we employed two state-of-the-art algorithms, DL and XGBoost, and a more conventional algorithm, RF, to obtain high accuracy models to predict AD versus CN with metabolites as predictors. Our study showed that the best model was based on XGBoost [15], which is an enhanced form of Gradient Boosting Machines methods based on decision trees [12]. In our study RF and DL achieved comparable AUC. DL algorithms are known to often take advantage of large and/or unstructured data (such as images) to produce more accurate category discrimination/prediction. In a study using the Alzheimer's Disease Neuroimaging Initiative (ADNI) data for AD prediction, XGBoost demonstrated superior results (AUC = 0.97 (0.01)) when including imaging parameters (MRI and PET) as predictors and when compared to RF, Support Vector Machines, Gaussian Processes and Stochastic Gradient Boosting [16]. In another study, where cognition and MRI were used as predictors, Kernel Ridge Regression achieved R2 = 0.87 (0.025) [17]. Pathway analyses using the top 20 AD-predicting metabolites derived from the Relief method showed that the nitrogen pathway was overrepresented. Some of the molecules selected have been reported in metabolomics studies and have been implicated in neurodegeneration: dodecanoate, which is a C12 fatty acid, was found to correlate with longitudinal measures of cognition in the ADNI cohort [3], and so was the bile acid glycolithocholate, which was associated with both AD and cognition measures (ADAS-Cog13) in one of the biggest cross-sectional studies on cognition, AD and the microbiome [18]. Plasmalogens were also found at decreased levels in our cohort, in agreement with an earlier report [19].
The amide form of vitamin B3, nicotinamide, has been implicated in both neuroprotection and neuronal death [20]. New metabolites that could be of interest and have not been previously reported as related to AD were phytanate and furoylglycine. The former is a known neurotoxin which impairs mitochondrial function and transcription [21]. Furoylglycine is a metabolite which, like lithocholic acid, is mainly synthesized by the microbiome and has been reported as a biomarker of coffee consumption [22]. A limitation of our study is that it does not include an external validation, owing to the size of the cohort. However, we implemented an NCV procedure repeated 50 times in an MC simulation, which provided an extended internal validation of prediction accuracy. Further studies will assess the performance of ratios/combinations of CSF markers and metabolites, life-style factors and disorders commonly found in the elderly, together with testing the specificity of this panel in other neurodegenerative (e.g., PD, FTD), neurological (e.g., stroke) and psychiatric (e.g., depression) disorders associated with aging. The intent of this paper was to compare the performance of different ML algorithms in distinguishing people with AD from cognitively unimpaired individuals. Here we show first that all three approaches used demonstrate good discriminatory power, second that XGBoost is somewhat more effective in this particular dataset than RF and DL, and third that this accuracy for clinical diagnosis is broadly similar to that achieved by CSF markers of AD pathology. The lack of a replication and validation dataset limits the interpretation of this finding, but nonetheless, the strong prediction of diagnostic category from a blood-based metabolite biomarker set is further evidence of the potential of such approaches to complement other biomarkers in identification of people with likely AD.
Systematic review: The authors reviewed the literature using PubMed and reported key publications. Most AD biomarker studies employing state-of-the-art machine learning (ML) techniques utilized neuroimaging data. Those which looked at blood metabolomics data were small and used clinical diagnosis as an endpoint. We therefore explored the potential of state-of-the-art ML algorithms, including Deep Learning and Extreme Gradient Boosting, to test how well blood metabolite levels predict clinical diagnosis, and compared this with CSF biomarkers.
Interpretation: The results presented here show that, with state-of-the-art ML algorithms, blood metabolites have the potential to match the CSF markers of AD pathology in distinguishing people with AD from cognitively unimpaired individuals. All the ML algorithms employed showed good discriminatory power.
Future directions: Results of this study should be replicated and validated using an independent dataset. Further studies will also aim to assess the performance of ratios/combinations of CSF markers and metabolites, life-style factors and disorders commonly found in the elderly, together with testing the specificity of this panel in other neurodegenerative (e.g., PD, FTD), neurological (e.g., stroke) and psychiatric (e.g., depression) disorders associated with aging.

16 in total

1. Plasmalogen deficiency in early Alzheimer's disease subjects and in animal models: molecular characterization using electrospray ionization mass spectrometry.

Authors: X Han; D M Holtzman; D W McKeel
Journal: J Neurochem Date: 2001-05 Impact factor: 5.372

Review 2. Current strategies in the discovery of small-molecule biomarkers for Alzheimer's disease.

Authors: Luke Whiley; Cristina Legido-Quigley
Journal: Bioanalysis Date: 2011-05 Impact factor: 2.681

3. A practical computerized decision support system for predicting the severity of Alzheimer's disease of an individual.

Authors: Magda Bucholc; Xuemei Ding; Haiying Wang; David H Glass; Hui Wang; Girijesh Prasad; Liam P Maguire; Anthony J Bjourson; Paula L McClean; Stephen Todd; David P Finn; KongFatt Wong-Lin
Journal: Expert Syst Appl Date: 2019-04-10 Impact factor: 6.954

4. Identifying psychosis spectrum disorder from experience sampling data using machine learning approaches.

Authors: Daniel Stamate; Andrea Katrinecz; Daniel Stahl; Simone J W Verhagen; Philippe A E G Delespaul; Jim van Os; Sinan Guloksuz
Journal: Schizophr Res Date: 2019-05-16 Impact factor: 4.939

5. Metabolic network failures in Alzheimer's disease: A biochemical road map.

Authors: Jon B Toledo; Matthias Arnold; Gabi Kastenmüller; Rui Chang; Rebecca A Baillie; Xianlin Han; Madhav Thambisetty; Jessica D Tenenbaum; Karsten Suhre; J Will Thompson; Lisa St John-Williams; Siamak MahmoudianDehkordi; Daniel M Rotroff; John R Jack; Alison Motsinger-Reif; Shannon L Risacher; Colette Blach; Joseph E Lucas; Tyler Massaro; Gregory Louie; Hongjie Zhu; Guido Dallmann; Kristaps Klavins; Therese Koal; Sungeun Kim; Kwangsik Nho; Li Shen; Ramon Casanova; Sudhir Varma; Cristina Legido-Quigley; M Arthur Moseley; Kuixi Zhu; Marc Y R Henrion; Sven J van der Lee; Amy C Harms; Ayse Demirkan; Thomas Hankemeier; Cornelia M van Duijn; John Q Trojanowski; Leslie M Shaw; Andrew J Saykin; Michael W Weiner; P Murali Doraiswamy; Rima Kaddurah-Daouk
Journal: Alzheimers Dement Date: 2017-03-22 Impact factor: 16.655

6. Altered bile acid profile associates with cognitive impairment in Alzheimer's disease-An emerging role for gut microbiome.

Authors: Siamak MahmoudianDehkordi; Matthias Arnold; Kwangsik Nho; Shahzad Ahmad; Wei Jia; Guoxiang Xie; Gregory Louie; Alexandra Kueider-Paisley; M Arthur Moseley; J Will Thompson; Lisa St John Williams; Jessica D Tenenbaum; Colette Blach; Rebecca Baillie; Xianlin Han; Sudeepa Bhattacharyya; Jon B Toledo; Simon Schafferer; Sebastian Klein; Therese Koal; Shannon L Risacher; Mitchel Allan Kling; Alison Motsinger-Reif; Daniel M Rotroff; John Jack; Thomas Hankemeier; David A Bennett; Philip L De Jager; John Q Trojanowski; Leslie M Shaw; Michael W Weiner; P Murali Doraiswamy; Cornelia M van Duijn; Andrew J Saykin; Gabi Kastenmüller; Rima Kaddurah-Daouk
Journal: Alzheimers Dement Date: 2018-10-15 Impact factor: 16.655

Review 7. The Influence of Nicotinamide on Health and Disease in the Central Nervous System.

Authors: Rosemary A Fricker; Emma L Green; Stuart I Jenkins; Síle M Griffin
Journal: Int J Tryptophan Res Date: 2018-05-21

8. The EMIF-AD Multimodal Biomarker Discovery study: design, methods and cohort characteristics.

Authors: Isabelle Bos; Stephanie Vos; Rik Vandenberghe; Philip Scheltens; Sebastiaan Engelborghs; Giovanni Frisoni; José Luis Molinuevo; Anders Wallin; Alberto Lleó; Julius Popp; Pablo Martinez-Lage; Alison Baird; Richard Dobson; Cristina Legido-Quigley; Kristel Sleegers; Christine Van Broeckhoven; Lars Bertram; Mara Ten Kate; Frederik Barkhof; Henrik Zetterberg; Simon Lovestone; Johannes Streffer; Pieter Jelle Visser
Journal: Alzheimers Res Ther Date: 2018-07-06 Impact factor: 6.982

9. Brain and blood metabolite signatures of pathology and progression in Alzheimer disease: A targeted metabolomics study.

Authors: Vijay R Varma; Anup M Oommen; Sudhir Varma; Ramon Casanova; Yang An; Ryan M Andrews; Richard O'Brien; Olga Pletnikova; Juan C Troncoso; Jon Toledo; Rebecca Baillie; Matthias Arnold; Gabi Kastenmueller; Kwangsik Nho; P Murali Doraiswamy; Andrew J Saykin; Rima Kaddurah-Daouk; Cristina Legido-Quigley; Madhav Thambisetty
Journal: PLoS Med Date: 2018-01-25 Impact factor: 11.069

10. MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis.

Authors: Jasmine Chong; Othman Soufan; Carin Li; Iurie Caraus; Shuzhao Li; Guillaume Bourque; David S Wishart; Jianguo Xia
Journal: Nucleic Acids Res Date: 2018-07-02 Impact factor: 16.971
