Literature DB >> 32073441

Data science in neurodegenerative disease: its capabilities, limitations, and perspectives.

Sepehr Golriz Khatami1,2, Sarah Mubeen1, Martin Hofmann-Apitius1,2.   

Abstract

PURPOSE OF REVIEW: With the advancement of computational approaches and the abundance of biomedical data, a broad range of neurodegenerative disease models have been developed. In this review, we argue that computational models can be both relevant and useful in neurodegenerative disease research and that, although currently established models have limitations in clinical practice, artificial intelligence has the potential to overcome the deficiencies encountered by these models, which in turn can improve our understanding of disease.
RECENT FINDINGS: In recent years, diverse computational approaches have been used to shed light on different aspects of neurodegenerative disease models. For example, linear and nonlinear mixed models, self-modeling regression, differential equation models, and event-based models have been applied to provide a better understanding of disease progression patterns and biomarker trajectories. Additionally, the Cox-regression technique, Bayesian network models, and deep-learning-based approaches have been used to predict the probability of future incidence of disease, whereas nonnegative matrix factorization, nonhierarchical cluster analysis, hierarchical agglomerative clustering, and deep-learning-based approaches have been employed to stratify patients based on their disease subtypes. Furthermore, the interpretation of neurodegenerative disease data is possible through knowledge-based models, which use prior knowledge to complement data-driven analyses. These knowledge-based models can include pathway-centric approaches to establish pathways perturbed in a given condition, as well as disease-specific knowledge maps, which elucidate the mechanisms involved in a given disease. Collectively, these established models have revealed highly granular details and insights into neurodegenerative diseases.
SUMMARY: In conjunction with increasingly advanced computational approaches, a wide spectrum of neurodegenerative disease models, which can be broadly categorized as data-driven or knowledge-driven, have been developed. We review state-of-the-art data-driven and knowledge-driven models and discuss the steps vital to bringing them into clinical application.


Year:  2020        PMID: 32073441      PMCID: PMC7077964          DOI: 10.1097/WCO.0000000000000795

Source DB:  PubMed          Journal:  Curr Opin Neurol        ISSN: 1350-7540            Impact factor:   6.283


INTRODUCTION

With the silver tsunami (i.e., an aging population) sweeping across the world, neurodegenerative diseases (NDDs) are becoming increasingly prevalent, placing a disproportionate level of burden on older adults (those aged 65 years or over) [1]. NDDs affect nearly 50 million people worldwide and roughly 10 million new cases are reported every year (https://www.who.int/news-room/fact-sheets/detail/dementia). To prevent the occurrence of these conditions, slow their progression, and reduce their global socioeconomic impact, a deeper understanding of the pathophysiology underlying these diseases is necessary. Understanding the cause of NDDs is challenging because of the complex nature of these diseases and the existence of dysregulations at different biological scales, ranging from mutations at the genetic level to structural and functional alterations of the brain at the clinical level. For this reason, a broad variety of biomarkers across all modalities, including imaging and nonimaging, have been studied. However, effectively translating these extensive biomarker modalities into clinical application remains a challenging task. In recent years, computational approaches that analyze these biomarker modalities have led to a wide range of models that help to understand NDDs. Existing models can be placed in two primary categories, namely data-driven models and knowledge-driven models. Whereas data-driven models are informed directly by patient-level data, knowledge-driven models rely on reasoning over the findings of previously published studies. Here, we highlight recent advancements of diverse data-driven models in the context of their applications and describe knowledge-driven models and their applications in NDD research. Finally, we propose the use of artificial intelligence in this field to overcome the limitations associated with the clinical data upon which such models are built, in order to generate new avenues for better disease comprehension.

Data-driven models

The multifaceted nature of NDDs demands quantification of a wide variety of biomarkers of all modality types, including imaging and nonimaging, such as cerebrospinal fluid samples and omics data. To translate these biomarker modalities into clinical application, they can be subjected to computational approaches independently (unimodal) or in combination (multimodal). Unimodal-based models overlook the complexity of a disease, as they do not consider the interdependence between measurements of different biological modalities. Nonetheless, in NDD research, certain modalities, such as genetics [2,3] and neuroimaging [4-6], are well suited for unimodal biomarker analysis. Specifically, genetic biomarkers are used in these analyses as NDDs are partially driven by genetics [7], and the imaging modality is used as it can estimate pathological changes occurring in patients [8]. In contrast, multimodal-based models provide a more comprehensive overview of a disease by integrating a variety of biomedical and clinical biomarkers. In the following, we review current developments in the field of multimodal-based models. We start with models that provide an overview of biomarker dynamics over the time course of disease, which can facilitate disease diagnosis and patient staging, and then highlight models which assist in patient prognosis. Finally, we conclude with models that are used for patient stratification.

Disease progression monitoring, diagnosing, and patient staging

Disease progression, or biomarker trajectory, and the current disease stage can be estimated by two primary types of model. The traditional type models the trajectory of biomarkers based on discrete disease stages; however, a finite number of stages fails to capture continuous changes related to disease progression over time [9-11]. Alternatively, in the contemporary type, disease progression is modeled based on measured biomarkers (e.g., the mini mental state examination) [12] and thus, the disease time course is regarded as a continuous process. Although the traditional models were developed by reasoning over previously published studies, the more contemporary ones were developed using diverse computational approaches. These include linear and nonlinear mixed models (N/LMMs) [13-15], differential equation models (DEMs) [15], self-modeling regression models (SMORs) [16], and event-based models (EBMs) [17-19]. Although a diverse set of contemporary models exists, there are trade-offs between the techniques that have been used to develop them. For example, an N/LMM [14] may assume a particular shape for the biomarker trajectory (e.g., exponential curves), whereas a DEM [15] or SMOR [16] can loosen this assumption. In DEMs, each biomarker is treated independently, yet SMORs pool data from all available biomarkers to estimate the dynamics of biomarkers over the course of disease [20]. In contrast to SMORs and DEMs, which provide continuous biomarker trajectories, EBMs provide a discrete description of biomarker dynamics [20,21]. This type of model does not include timing information between biomarker change events, which limits its capability in disease monitoring [21]. However, in contrast to other models, EBMs can also address individual deviations from a generic disease progression model [22].
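The trajectory-shape trade-off above can be made concrete with a minimal sketch: fitting a sigmoid, one shape commonly assumed for biomarker trajectories, to a synthetic, normalized biomarker. All data, parameter names, and values here are hypothetical and for illustration only; this is not the pipeline of any cited model.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(t, lower, upper, midpoint, rate):
    """Sigmoidal biomarker trajectory over a common disease timescale."""
    return lower + (upper - lower) / (1.0 + np.exp(-rate * (t - midpoint)))

rng = np.random.default_rng(42)
t = np.linspace(0, 20, 80)                       # years on a hypothetical disease timescale
true = sigmoid(t, 0.0, 1.0, 10.0, 0.8)           # invented "ground truth" trajectory
observed = true + rng.normal(0.0, 0.05, t.size)  # add measurement noise

# Fitting imposes the assumed sigmoid shape on the trajectory -- the very
# assumption that DEM- and SMOR-type models are designed to relax.
params, _ = curve_fit(sigmoid, t, observed, p0=[0.0, 1.0, 8.0, 1.0])
print(dict(zip(["lower", "upper", "midpoint", "rate"], params.round(2))))
```

With low noise and a reasonable starting guess, the fit recovers the invented parameters; with real, sparse clinical data the choice of assumed shape dominates the result, which is the trade-off discussed above.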

Patient prognosis

Risk models can provide prognostic information by predicting the probability and the time of future incidence of disease. Diverse computational approaches, such as the Cox-regression technique [23-25], Bayesian network models [26], and deep-learning-based approaches [27,28], have been used to establish such models. There are also trade-offs between current implementations of these approaches. Although in Cox-regression-based models relationships among features are restricted by a number of assumptions and the causal structure cannot be modeled, Bayesian networks can model the underlying causal relationships between predictive risk variables [29▪▪]. This enables Bayesian networks to ask ‘what-if’ questions and predict risk at the individual level: as effects in Bayesian networks are represented by directed arrows, with any manipulation of an independent variable, the model can predict its influence on the dependent variable [30]. Furthermore, the prediction accuracies in current implementations of Bayesian networks and deep-learning-based approaches are notably higher than those of Cox-regression models, as the former approaches are well suited for dealing with high-dimensional data. In Cox-regression and Bayesian network-based models, feature selection is done manually or semiautomatically and thus relies upon prior knowledge from researchers. However, because deep learning algorithms can automatically infer features that help to predict future incidence of disease, they perform better compared with Cox-regression and Bayesian network-based models [31]. Moreover, current implementations of deep-learning-based models are capable of accepting data of any irregular length as input without preprocessing, in contrast to Cox-regression and Bayesian network-based models, where a preprocessing step is required for handling unequal time series and missing values.
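To illustrate the Cox-regression side of this trade-off, the sketch below fits a minimal Cox proportional-hazards model by maximizing the Breslow partial likelihood on synthetic survival data. The two risk features, their effect sizes, and the censoring scheme are all invented for the example; the cited studies use richer implementations.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=(n, 2))                  # two hypothetical risk features
true_beta = np.array([1.0, -0.5])            # invented effect sizes
# Exponential event times whose hazard scales with exp(x @ beta)
time = rng.exponential(1.0 / np.exp(x @ true_beta))
event = rng.random(n) < 0.8                  # ~20% of subjects right-censored

def neg_log_partial_likelihood(beta):
    """Breslow partial likelihood; assumes no tied event times."""
    risk = x @ beta
    order = np.argsort(time)                 # process subjects by observed time
    risk, ev = risk[order], event[order]
    # Risk set of subject i = all subjects still at risk at time_i,
    # i.e. everyone with an observed time >= time_i.
    log_risk_set = np.log(np.cumsum(np.exp(risk)[::-1])[::-1])
    return -np.sum((risk - log_risk_set)[ev])

fit = minimize(neg_log_partial_likelihood, x0=np.zeros(2), method="BFGS")
print("estimated hazard ratios:", np.exp(fit.x).round(2))
```

Note what the model does and does not deliver: it recovers the relative effect of each feature on the hazard, but the feature set is fixed by hand and no causal structure is represented, which is exactly where Bayesian network and deep-learning-based approaches are argued to have an advantage.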

Patient stratification

NDDs are highly heterogeneous diseases in terms of clinical and biological appearance and progression patterns. As such, stratification of patients based on disease subtypes may lead to improved disease management and the design of better treatments, which in turn brings us closer to the goal of precision medicine. To this end, diverse clustering approaches have been used, such as nonnegative matrix factorization [32], nonhierarchical cluster analysis (e.g., k-means clustering) [33,34], and hierarchical agglomerative clustering [35,36]. Although these methods can differentiate subtypes of patients, they are generally not suitable for longitudinal clinical data, which often suffer from missing data or unequal time-series measurements because of patient dropout. This is because state-of-the-art distance measures (e.g., Euclidean) are unable to compute dis/similarity between samples with different longitudinal measurement lengths [37]. Moreover, Euclidean distance measures often ignore existing temporal correlations between the measurements. Recently, de Jong et al. [38▪▪] proposed an autoencoder-based method to cluster multivariate time series with many missing values. Although the autoencoder-based model currently only works with equal-length time series, it outperformed clustering approaches that used state-of-the-art distance measures as well as models which used distance measures specifically designed for unequal time series, such as Dynamic Time Warping.
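The distance-measure point can be made concrete with a minimal Dynamic Time Warping sketch: it compares two hypothetical patients with different numbers of visits, a case where a plain Euclidean distance is undefined. The MMSE-like values are invented for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping: aligns two series of possibly unequal
    length before summing pointwise distances, unlike Euclidean."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch series b
                                 cost[i, j - 1],      # stretch series a
                                 cost[i - 1, j - 1])  # match both points
    return cost[n, m]

# Two hypothetical patients with a similar cognitive-decline shape but
# different visit counts: DTW can still compare their trajectories.
patient_a = [30, 29, 27, 24, 20]          # five visits
patient_b = [30, 29, 28, 27, 25, 24, 20]  # seven visits
print(dtw_distance(patient_a, patient_b))
```

Because the warping path may match one visit in the shorter series against several in the longer one, similar trajectories with unequal sampling still receive a small distance, which is what makes DTW a natural baseline for the unequal-length clustering problem discussed above.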

Knowledge-based models

Knowledge-based models have been developed in parallel to data-driven approaches to facilitate the interpretation of empirical data with background knowledge. Such models have garnered novel insights into several disease areas and have also led to new disease taxonomies. By classifying diseases through data and knowledge-based models, it is possible to establish an alternative approach to the current paradigm of disease classification by clinical appearance. This can facilitate the identification of disease subtypes and associated molecular processes and thereby help to establish potential disease biomarkers and novel therapeutic targets [39]. NDDs are a particularly complex set of diseases, where short and direct causal links can be challenging to discern. However, NDDs have considerable genetic components, and with an abundance of biomedical omics data generated from high-throughput technologies, several data-driven approaches can be used to gain insights on these multifaceted diseases. Nonetheless, these approaches tend to lack contextual information; for instance, the cumulative effect of several dysregulated genes with slight alterations can be greater than the effect of a single, highly altered gene. Conversely, knowledge-based models, such as pathway-centric approaches, can incorporate prior knowledge to point to relevant pathways or biological processes and possess greater explanatory power [40]. Thus, by taking into account pathway effects, pathway-centric approaches consider a condition in its broader biological context rather than elucidating specific, individual genes or molecular processes involved in that condition [41]. In the field of NDD research, various pathway analyses have reported significantly enriched pathways in specific NDDs, such as Alzheimer's disease [42-44] and Huntington's disease [45,46], as well as across two or more NDDs [47-49].
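A common building block of such pathway analyses is the hypergeometric over-representation test, which asks whether a list of dysregulated genes overlaps a predefined pathway more than chance would predict. The sketch below shows the calculation with invented gene counts; real analyses additionally correct for testing many pathways.

```python
from scipy.stats import hypergeom

def enrichment_p(n_genome, n_pathway, n_hits, n_overlap):
    """One-sided over-representation p-value: probability of observing
    at least n_overlap pathway genes in the hit list by chance."""
    return hypergeom.sf(n_overlap - 1, n_genome, n_pathway, n_hits)

# Hypothetical numbers: 20,000 background genes, a 150-gene pathway,
# 400 dysregulated genes, 20 of which fall inside the pathway
# (only ~3 would be expected by chance).
p = enrichment_p(20000, 150, 400, 20)
print(f"over-representation p-value: {p:.2e}")
```

An overlap far above the chance expectation yields a very small p-value, flagging the pathway as perturbed in the given condition, which is the signal that the cited pathway-centric analyses aggregate across many pathways.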
Though various pathway-centric approaches markedly improve the interpretive power of omics data, these approaches rely upon canonical pathways, for which representing disease context can be challenging [50]. Moreover, as NDDs tend to be complex and multifaceted, they may only be partially attributable to the involvement of a given number of pathways. As such, elucidating disease-specific mechanisms may be more appropriate for NDDs as compared with applying pathway-centric approaches [51▪▪]. Accordingly, disease-specific knowledge maps, resources that contain mechanisms specific to a particular disease, have been developed and can be used to build computational models of disease. Notably, the Disease Maps Project has collected disease maps for several diseases, including AlzPathway for Alzheimer's disease [52] and the Parkinson's disease map for Parkinson's disease [53], whereas the NeuroMMSig knowledge graph [51▪▪] has collated candidate mechanisms for Alzheimer's disease, Parkinson's disease, and epilepsy.

Outlook and perspectives of using artificial intelligence in neurodegeneration research: virtual cohorts for data sharing and trial simulation

Although a wide spectrum of computational models has been established, current applications of these models in clinical practice are limited because of deficiencies that come with clinical studies, such as biases toward specific ethnicities, small sample sizes, data missingness, data heterogeneity, and data privacy. In the following, we first elaborate on the inherent challenges in using clinical data, then outline a promising solution to address these challenges, and conclude with its potential applications in neurodegeneration research. Ideally, clinical data should be collected at regular intervals for all patients. However, only a limited number of clinical studies have collected longitudinal measurements. Additionally, because of unavoidable inclusion–exclusion criteria or the disproportionate representation of particular ethnicities due to geographic constraints, these studies tend to have biases [54]. Furthermore, most clinical datasets have a relatively small number of samples (fewer than a thousand patients) and a large number of missing observations [55-57]. As such, dealing with these deficiencies generally demands extensive preprocessing, such as imputation or the discarding of variables. Finally, different clinical studies in equivalent disease contexts usually do not measure the same clinical outcomes and/or molecular data. Nonetheless, even if measurements of identical outcomes and/or data are collected across studies, as their study protocols vary, the data coming from one study are often not interoperable with (mappable to) data coming from another. Therefore, clinical data are highly heterogeneous [58]. Additionally, sharing patient data beyond an organization's firewalls is restricted because of legal and ethical constraints. Consequently, there exist data ‘silos’ which impede the analyses and comparisons of multiple studies that are so vital to gaining comprehensive overviews of a specific disease.
Although a broad range of solutions has been established to address these deficiencies, each has its own challenges. For example, imputing missing values can lead to errors, and discarding variables that contain a high proportion of missing values can result in information loss and biased conclusions [59]. Similarly, although individual agreements between data users (e.g., research institutes) and data owners (e.g., the Alzheimer's disease neuroimaging initiative) can provide access to data, its usage is restricted to certain activities. For instance, use of the data may not be permitted for teaching and training purposes. Such shortcomings with clinical data can be overcome with artificial intelligence, and machine learning approaches in particular. These approaches facilitate the simulation of a synthetic cohort (i.e., a virtual cohort) which is informed by actual cohort data and can thus represent the fundamental characteristics of the real cohort [60]. Although this solution has a long history in physiological studies [61,62] and clinical trial simulation [63,64], its application in clinical studies, where the focus is on simulating virtual patients across biological scales and modalities (e.g., nonimaging, imaging), is a more recent development. Recently, Gootjes-Dreesbach et al. [65▪▪] have developed a variational autoencoder modular Bayesian network which simulates heterogeneous clinical study data as a virtual patient cohort. Not only do virtual cohorts provide new avenues toward sharing patient-level data without endangering the data privacy of real patients, they can also enable the generation of ‘meta-cohorts’ by combining the virtual patients obtained from different available clinical data in disease-specific contexts [63]. Such a ‘meta-cohort’ not only addresses the deficiency of small sample sizes in clinical studies, but can also mitigate their inherent biases (e.g., the underrepresentation of certain ethnicities).
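As a deliberately simplified stand-in for such generative models, the sketch below fits a multivariate Gaussian to a synthetic "real" cohort and samples a virtual one from it. The variables, effect sizes, and cohort sizes are all invented; a variational-autoencoder-based model plays the same role for heterogeneous, non-Gaussian clinical data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical "real" cohort: age, a CSF biomarker, and a cognitive
# score, with built-in correlations (older age tracks with pathology).
n_real = 500
age = rng.normal(72, 6, n_real)
csf = 0.05 * age + rng.normal(0, 0.4, n_real)
cog = 40 - 0.3 * age - 2.0 * csf + rng.normal(0, 1.5, n_real)
real = np.column_stack([age, csf, cog])

# Toy generator: estimate the joint distribution of the real cohort
# and sample virtual patients from it.
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)
virtual = rng.multivariate_normal(mean, cov, size=2000)

# The virtual cohort reproduces the real cohort's summary statistics
# without exposing any individual real patient record.
print(np.corrcoef(real, rowvar=False).round(2))
print(np.corrcoef(virtual, rowvar=False).round(2))
```

The virtual cohort can also be made larger than the original, hinting at how pooling virtual patients from several studies could form the ‘meta-cohorts’ described above; privacy guarantees of real generative approaches, however, require far more care than this toy suggests.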
Moreover, virtual cohorts provide opportunities to conduct counterfactual or ‘what-if’ scenarios. For example, researchers can add a feature which has not been observed in a specific study (e.g., a comorbidity) and investigate how it influences the disease of interest, or examine what the distribution of a particular biomarker would be if a patient's age were shifted by a number of years [65▪▪]. Ultimately, virtual cohorts can improve the design of clinical trials and have the potential to bring us closer to the goal of precision medicine.

CONCLUSION

We have reviewed a wide range of established NDD models, from unimodal-based models to multimodal-based ones. We have shown that, in contrast to data-driven approaches, knowledge-driven approaches can provide meaningful contextualization and insights into the pathophysiology of disease. We described the deficiencies and limitations of currently available clinical studies in the scope of NDDs and argued that artificial intelligence has the potential to overcome these shortcomings, generating new avenues toward a better comprehension of neurodegenerative disease.

Acknowledgements

The authors thank Daniel Domingo-Fernández for critical reading of the manuscript.

Financial support and sponsorship

The authors receive funding from the IMI project AETIONOMY within the Seventh Framework Programme of the European Union under grant agreement No. 115568.

Conflicts of interest

There are no conflicts of interest.

REFERENCES AND RECOMMENDED READING

Papers of particular interest, published within the annual period of review, have been highlighted as: ▪ of special interest ▪▪ of outstanding interest

1.  Neurodegenerative diseases.

Authors:  Marie-Thérèse Heemels
Journal:  Nature       Date:  2016-11-10       Impact factor: 49.962

2.  Tracking pathophysiological processes in Alzheimer's disease: an updated hypothetical model of dynamic biomarkers.

Authors:  Clifford R Jack; David S Knopman; William J Jagust; Ronald C Petersen; Michael W Weiner; Paul S Aisen; Leslie M Shaw; Prashanthi Vemuri; Heather J Wiste; Stephen D Weigand; Timothy G Lesnick; Vernon S Pankratz; Michael C Donohue; John Q Trojanowski
Journal:  Lancet Neurol       Date:  2013-02       Impact factor: 44.182

3.  A point-based tool to predict conversion from mild cognitive impairment to probable Alzheimer's disease.

Authors:  Deborah E Barnes; Irena S Cenzer; Kristine Yaffe; Christine S Ritchie; Sei J Lee
Journal:  Alzheimers Dement       Date:  2014-02-01       Impact factor: 21.566

4.  The Alzheimer's disease neuroimaging initiative.

Authors:  Susanne G Mueller; Michael W Weiner; Leon J Thal; Ronald C Petersen; Clifford Jack; William Jagust; John Q Trojanowski; Arthur W Toga; Laurel Beckett
Journal:  Neuroimaging Clin N Am       Date:  2005-11       Impact factor: 2.264

5.  Variation in Variables that Predict Progression from MCI to AD Dementia over Duration of Follow-up.

Authors:  Shanshan Li; Ozioma Okonkwo; Marilyn Albert; Mei-Cheng Wang
Journal:  Am J Alzheimers Dis (Columbia)       Date:  2013

6.  Pathway enrichment analysis and visualization of omics data using g:Profiler, GSEA, Cytoscape and EnrichmentMap.

Authors:  Jüri Reimand; Ruth Isserlin; Veronique Voisin; Mike Kucera; Christian Tannus-Lopes; Asha Rostamianfar; Lina Wadi; Mona Meyer; Jeff Wong; Changjiang Xu; Daniele Merico; Gary D Bader
Journal:  Nat Protoc       Date:  2019-02       Impact factor: 13.491

7.  Radiological biomarkers for diagnosis in PSP: Where are we and where do we need to be?

Authors:  Jennifer L Whitwell; Günter U Höglinger; Angelo Antonini; Yvette Bordelon; Adam L Boxer; Carlo Colosimo; Thilo van Eimeren; Lawrence I Golbe; Jan Kassubek; Carolin Kurz; Irene Litvan; Alexander Pantelyat; Gil Rabinovici; Gesine Respondek; Axel Rominger; James B Rowe; Maria Stamelou; Keith A Josephs
Journal:  Mov Disord       Date:  2017-05-13       Impact factor: 10.338

8.  Revealing Alzheimer's disease genes spectrum in the whole-genome by machine learning.

Authors:  Xiaoyan Huang; Hankui Liu; Xinming Li; Liping Guan; Jiankang Li; Laurent Christian Asker M Tellier; Huanming Yang; Jian Wang; Jianguo Zhang
Journal:  BMC Neurol       Date:  2018-01-10       Impact factor: 2.474

9.  A data-driven model of biomarker changes in sporadic Alzheimer's disease.

Authors:  Alexandra L Young; Neil P Oxtoby; Pankaj Daga; David M Cash; Nick C Fox; Sebastien Ourselin; Jonathan M Schott; Daniel C Alexander
Journal:  Brain       Date:  2014-07-09       Impact factor: 13.501

10.  Machine learning for comprehensive forecasting of Alzheimer's Disease progression.

Authors:  Charles K Fisher; Aaron M Smith; Jonathan R Walsh
Journal:  Sci Rep       Date:  2019-09-20       Impact factor: 4.379

