Shashank Khanna1,2, Daniel Domingo-Fernández1,2, Anandhi Iyappan1,2, Mohammad Asif Emon1,2, Martin Hofmann-Apitius1,2, Holger Fröhlich3,4.
Abstract
Alzheimer's Disease (AD) is among the most frequent neurodegenerative diseases. Early diagnosis is essential for successful disease management and for the chance to attenuate symptoms with disease-modifying drugs. In the past, a number of cerebrospinal fluid (CSF), plasma and neuroimaging-based biomarkers have been proposed. Still, in current clinical practice, AD cannot be diagnosed until the patient shows clear signs of cognitive decline, which can partially be attributed to the multi-factorial nature of AD. In this work, we integrated genotype information, neuroimaging and clinical data (including neuropsychological measures) from ~900 cognitively normal and mildly cognitively impaired (MCI) individuals, and developed a highly accurate machine learning model to predict the time until AD is diagnosed. We performed an in-depth investigation of the baseline characteristics that contributed to the AD risk prediction. More specifically, we used Bayesian Networks to uncover the interplay across biological scales between neuropsychological assessment scores, single genetic variants, pathways and neuroimaging-related features. Together with information extracted from the literature, this allowed us to partially reconstruct biological mechanisms that could play a role in the conversion from normal cognition or MCI to AD pathology. This in turn may open the door to novel therapeutic options in the future.
Year: 2018 PMID: 30042519 PMCID: PMC6057884 DOI: 10.1038/s41598-018-29433-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overall approach to analyzing ADNI data.
Figure 2. (A) Boxplot of the cross-validated concordance index. (B) Prediction error (Brier score) as a function of time for the gradient boosting machine (GBM) vs. the Kaplan-Meier estimator. The prediction error curve is calculated on held-out test data during the 10-times repeated 10-fold cross-validation procedure. The solid curve corresponds to the mean and the shaded area to the standard deviation. (C) The 25 most relevant features according to the GBM model trained on the whole tuning dataset. (D) Selection frequency of these features during the 10-times repeated 10-fold cross-validation procedure. (E) Cumulative relative influence of feature groups in the final model.
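Panel (A) reports a cross-validated concordance index for the time-to-AD model. As a minimal sketch, assuming right-censored follow-up data where `events[i]` is 1 when AD was diagnosed and 0 when the individual was censored, Harrell's C-index can be computed as below. The function name and the choice to skip pairs with tied follow-up times are our own assumptions; the exact variant used for the figure (tie handling, software package) is not stated in this record.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed follow-up time (time to AD diagnosis or censoring)
    events:      1 if AD diagnosis was observed, 0 if censored
    risk_scores: higher value = higher predicted AD risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that i has the shorter observed time.
        if times[j] < times[i]:
            i, j = j, i
        # Comparable only if the earlier time is an observed event;
        # pairs with tied times are skipped in this sketch.
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0  # earlier converter was ranked as higher risk
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to a random ranking of who converts first, and 1.0 to a perfect ranking.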
Figure 3. Cumulative hazard as a function of time for the 10% of patients with the highest AD risk scores (red) and the 10% of patients with the lowest AD risk scores (green). Depicted are the average risk curves, with standard errors shown as confidence bands.
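The caption does not state how the individual curves are obtained. The sketch below assumes a hypothetical input in which the model provides one cumulative hazard curve per patient on a shared time grid; it ranks patients by risk score, takes the top and bottom 10%, and returns each group's mean curve with a band of plus or minus one standard error of the mean. The function name and input layout are illustrative assumptions, not the authors' code.

```python
import numpy as np


def decile_hazard_bands(risk_scores, cum_hazard, frac=0.10):
    """Mean cumulative hazard +/- standard error for the lowest- and highest-risk groups.

    risk_scores: shape (n_patients,), higher = higher predicted AD risk
    cum_hazard:  shape (n_patients, n_times); each row is one patient's cumulative
                 hazard curve on a shared time grid (hypothetical input)
    Returns {"low": (mean, lower, upper), "high": (mean, lower, upper)}.
    """
    risk_scores = np.asarray(risk_scores, dtype=float)
    cum_hazard = np.asarray(cum_hazard, dtype=float)
    k = max(1, int(round(frac * len(risk_scores))))
    order = np.argsort(risk_scores)  # patients sorted by ascending risk
    groups = {"low": order[:k], "high": order[-k:]}
    bands = {}
    for name, idx in groups.items():
        curves = cum_hazard[idx]
        mean = curves.mean(axis=0)
        # Standard error of the mean across patients (zero if the group has one patient).
        se = curves.std(axis=0, ddof=1) / np.sqrt(k) if k > 1 else np.zeros_like(mean)
        bands[name] = (mean, mean - se, mean + se)
    return bands
```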
Figure 4. Edges appearing in more than 50% of 1000 Bayesian Network reconstructions based on random sub-samples of the data. Line thickness is proportional to the relative frequency of observing an edge in the 1000 network reconstructions, and the corresponding number is shown as the edge label. Node size is proportional to the relative influence of the variable in the final GBM model, and node color reflects the selection frequency in the repeated cross-validation procedure (darker = higher stability). Sub-figures (A) and (B) show two example zooms into the overall network.
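The edge-stability filter described in this caption can be sketched as follows. `learn_structure` is a hypothetical callable standing in for whatever Bayesian Network structure learning algorithm was used, and the sub-sample fraction of 0.8 is an assumed value; the caption only fixes the number of reconstructions (1000) and the 50% cut-off.

```python
import random
from collections import Counter


def stable_edges(data, learn_structure, n_reps=1000, subsample_frac=0.8,
                 threshold=0.5, seed=0):
    """Relative frequency of each directed edge across structure reconstructions on
    random sub-samples; keep only edges seen in more than `threshold` of them.

    data:            list of records (one per individual)
    learn_structure: hypothetical callable mapping a list of records to an
                     iterable of (parent, child) edges
    subsample_frac:  fraction of records per sub-sample (assumed value)
    """
    rng = random.Random(seed)
    m = max(1, int(subsample_frac * len(data)))
    counts = Counter()
    for _ in range(n_reps):
        sample = rng.sample(data, m)  # sub-sample drawn without replacement
        counts.update(set(learn_structure(sample)))  # count each edge once per run
    freq = {edge: c / n_reps for edge, c in counts.items()}
    return {edge: f for edge, f in freq.items() if f > threshold}
```

The returned frequencies correspond to the edge labels in the figure; swapping in a different learner only changes the callable passed in.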
Figure 5. Two examples of mapping stable BN edges to biological mechanisms via the OpenBEL AD graph by Kodamullil et al.[48]: (A) adherens junction and autophagy; (B) insulin signaling and natural killer cell mediated cytotoxicity. Biological entities mapping to the source of the edge marked by an arrow on the left-hand side of the figure are drawn in yellow in the OpenBEL graph on the right-hand side. Biological entities mapping to the sink of the edge are shown in red. Red edges highlight the shortest among all possible paths connecting yellow and red nodes.
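Finding the highlighted path amounts to a shortest-path search from the set of yellow (source-mapped) nodes to the set of red (sink-mapped) nodes. A minimal sketch, assuming an unweighted graph stored as a hypothetical adjacency dictionary and following edges in their stored direction (symmetrize the dictionary for an undirected search), is a multi-source breadth-first search. This is an illustration of the idea, not the tooling used for the figure.

```python
from collections import deque


def shortest_connecting_path(adjacency, sources, sinks):
    """Shortest path from any source node to any sink node in an unweighted graph.

    adjacency: dict mapping node -> iterable of neighbour nodes (hypothetical layout)
    sources:   nodes mapped to the source of a BN edge ("yellow")
    sinks:     nodes mapped to the sink of a BN edge ("red")
    Returns the node list of one shortest path, or None if no path exists.
    """
    sources = list(sources)
    sinks = set(sinks)
    parent = {s: None for s in sources}  # also serves as the visited set
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in sinks:
            # Walk back through the parent links to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None
```

Because breadth-first search dequeues nodes in order of increasing distance from the nearest source, the first sink reached ends a shortest path over all pairs of source and sink nodes.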
Figure 6. Approach to genomic feature extraction.