Peter Bede (1, 2), Kai Ming Chang (1, 3), Ee Ling Tan (1)
1. Computational Neuroimaging Group, Trinity College Dublin, Dublin, Ireland
2. Department of Neurology, St James's Hospital, Dublin, Ireland
3. Department of Electronics and Computer Science, University of Southampton, Southampton, UK
Although machine‐learning (ML) approaches have been extensively utilized in neurodegenerative conditions, they can be challenging to implement in motor neuron diseases (MNDs) due to disease‐specific characteristics. The potential of ML algorithms has been explored by academic amyotrophic lateral sclerosis (ALS) studies, but they have not been developed into viable clinical applications to date. ALS studies traditionally conduct "group‐level" analyses to describe phenotype‐ or genotype‐associated clinical traits, survival characteristics, progression rates, biomarker profiles, and imaging signatures [1, 2, 3, 4]. These, although academically interesting, have limited utility for the interpretation of data from single individuals. The appeal of ML frameworks in a condition with considerable clinical heterogeneity, such as ALS, is the opportunity to categorize individual patients into clinically relevant subgroups. The long‐awaited transition from "group‐level" descriptive analyses to precision, "individual‐subject" data interpretation has been fueled by the emergence of large training datasets, in the form of purpose‐designed data repositories, national registries, or leftover data from clinical trials. Harnessing the availability of such data sources, a multitude of promising ML studies have been published demonstrating the prospect of accurately classifying a single individual into relevant diagnostic or prognostic subgroups [5].
There are important lessons to consider from early ML initiatives in ALS. Irrespective of the specific ML model implemented, cohort size for model training is crucial, which is one of the biggest challenges in ALS in contrast to more common neurodegenerative conditions. A considerable shortcoming of single‐centre ML studies is the lack of external validation, which, coupled with small training datasets, increases the risk of model overfitting to local data.
Binary classification studies merely categorizing individuals into "ALS" versus "healthy control" groups have limited practical appeal, as the diagnostic dilemma in the clinical setting is not whether an individual is healthy, but rather whether the constellation of findings represents incipient ALS or an alternative neurodegenerative or neuromuscular condition. Multiclass classification studies mirror real‐life clinical scenarios better, especially if multiple MND phenotypes are represented. The accurate categorization of an early stage upper motor neuron‐predominant case into "ALS" versus "probable PLS," for example, is hugely important due to the survival ramifications of the correct diagnosis [6, 7]. Another stereotyped caveat of ML studies in ALS is model validation on cohorts with long symptom duration. The categorization of patients with long symptom duration with considerable disability and marked biomarker changes is not ideal to test model accuracy. A more compelling validation of a model is whether early stage patients or asymptomatic gene‐carriers are accurately categorized into prospective diagnostic and prognostic groups based on peridiagnostic or presymptomatic biomarker profiles [8].
The critical appraisal of published ML studies in MND helps to outline desirable future study designs. Models should ideally be validated on external datasets; the choice of ML model should be determined by inherent data characteristics (missing data, number of features, etc.); multiclass classification models should be implemented preferably with disease‐mimics, disease‐controls, and several MND phenotypes; categorization beyond diagnostic groups into prognostic categories has additional clinical utility; and the implementation of several ML models on the same dataset may help to juxtapose the comparative efficiency of proposed models.
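The distinction between internal and external validation of a multiclass model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two features are synthetic, the class labels ("ALS", "PLS", "mimic") and the site offset are hypothetical, and the nearest‐centroid classifier merely stands in for whichever ML model a study might choose. It is not drawn from any of the cited studies.

```python
import random
from statistics import mean

# Hypothetical diagnostic labels for a three-class problem.
CLASSES = ["ALS", "PLS", "mimic"]


def make_site(n_per_class, shift, rng):
    """Simulate one centre: two synthetic features per subject.

    `shift` is an assumed site effect (e.g. scanner or assay differences)
    added to every feature measured at that centre.
    """
    centres = {"ALS": (0.0, 0.0), "PLS": (2.0, 0.0), "mimic": (1.0, 2.0)}
    data = []
    for label in CLASSES:
        cx, cy = centres[label]
        for _ in range(n_per_class):
            features = (rng.gauss(cx + shift, 1.0), rng.gauss(cy + shift, 1.0))
            data.append((features, label))
    return data


def fit_centroids(train):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    centroids = {}
    for label in CLASSES:
        points = [x for x, y in train if y == label]
        centroids[label] = (mean(p[0] for p in points),
                            mean(p[1] for p in points))
    return centroids


def predict(centroids, x):
    """Assign the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: (x[0] - centroids[c][0]) ** 2
        + (x[1] - centroids[c][1]) ** 2,
    )


def accuracy(centroids, data):
    """Fraction of subjects assigned to their true class."""
    return mean(1.0 if predict(centroids, x) == y else 0.0 for x, y in data)


def run(seed=0):
    rng = random.Random(seed)
    local = make_site(40, 0.0, rng)       # training centre, 120 subjects
    rng.shuffle(local)
    train, internal_test = local[:90], local[90:]
    external = make_site(40, 0.8, rng)    # second centre with a site offset
    model = fit_centroids(train)
    return {
        "internal": accuracy(model, internal_test),
        "external": accuracy(model, external),
    }


if __name__ == "__main__":
    print(run())
```

Reporting the internal and external accuracy side by side makes any loss of performance at the second centre visible, which is the information a single‐centre hold‐out set cannot provide.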
The interrogation of quantitative biomarker panels (serum, cerebrospinal fluid [CSF], imaging) may support clinical decision‐making independently [9, 10]. An alternative strategy is the interpretation of demographic and clinical variables in ML models [11], which has a number of practical advantages compared to relying on instrumental metrics (magnetic resonance, positron emission tomography, CSF); data collection is easily harmonized across multiple sites, data acquisition is relatively cheap, data transfer is logistically simple, et cetera. Core clinical variables are typically already recorded, so with the appropriate approvals in place, extant data may potentially be used retrospectively for model training.
In this issue of European Journal of Neurology, Gromicho et al. from the University of Lisbon, Portugal present a particularly innovative ML study [12]. The authors implement dynamic Bayesian networks (DBNs) to evaluate the influence of the most commonly recorded clinical variables on disease progression in ALS. DBNs model variable dependencies that evolve over time and are trained upon multi‐time point observations. The five key determinants of disease progression according to the authors are symptom duration at first consultation, body mass index at diagnosis, subscores 1 (speech) and 9 (stairs) of the revised Amyotrophic Lateral Sclerosis Functional Rating Scale, and maximum expiratory pressure.
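As a rough illustration of what "trained upon multi‐time point observations" can mean, the sketch below estimates the kind of conditional probability table that a DBN transition model encodes: the probability of a functional state at the next visit given the state at the current visit and one static baseline variable. This is not the method of Gromicho et al.; the state names, the baseline groups, the toy sequences, and the helper `fit_transition_cpt` are all hypothetical, and a real DBN would also learn which variables depend on which.

```python
from collections import Counter, defaultdict


def fit_transition_cpt(sequences, states):
    """Estimate P(state at t+1 | baseline group, state at t) by counting.

    `sequences` is a list of (baseline_group, [state at visit 1, 2, ...]).
    Add-one smoothing keeps a small probability for unseen transitions.
    """
    counts = defaultdict(Counter)
    for group, visits in sequences:
        # Pair each visit with the one that follows it.
        for current, following in zip(visits, visits[1:]):
            counts[(group, current)][following] += 1

    cpt = {}
    for parents, following_counts in counts.items():
        total = sum(following_counts.values()) + len(states)
        cpt[parents] = {s: (following_counts[s] + 1) / total for s in states}
    return cpt


# Hypothetical discretised functional states and toy follow-up data.
STATES = ["mild", "moderate", "severe"]
TOY_SEQUENCES = [
    ("low_bmi", ["mild", "moderate", "severe", "severe"]),
    ("low_bmi", ["mild", "mild", "moderate", "severe"]),
    ("normal_bmi", ["mild", "mild", "mild", "moderate"]),
    ("normal_bmi", ["mild", "mild", "moderate", "moderate"]),
]

cpt = fit_transition_cpt(TOY_SEQUENCES, STATES)

if __name__ == "__main__":
    for parents in sorted(cpt):
        print(parents, cpt[parents])
```

Comparing the rows for the two baseline groups shows how a static variable recorded at diagnosis can change the estimated probability of moving to a worse state between visits, which is the type of dependency the editorial describes.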
The pragmatic relevance of identifying key determinants of progression rate is that patients entering clinical trials may be stratified in an informed manner, so that ensuing "slow progression" is not intuitively attributed to a putative drug effect, and "fast progression" is not automatically regarded as failure to respond to therapy.
Despite its practical pitfalls, ML is one of the most exciting frontiers of ALS research, and it is gaining considerable momentum thanks to the increased availability of large datasets, multicentre data harmonization efforts, and dedicated international consortia. These developments offer unparalleled opportunities for model optimization and validation, paving the way for viable clinical applications.
AUTHOR CONTRIBUTIONS
Peter Bede: Conceptualization (equal); writing – original draft (equal). Kai Ming Chang: Conceptualization (equal); writing – original draft (equal). Ee Ling Tan: Conceptualization (equal); writing – original draft (equal).
CONFLICT OF INTEREST
None of the authors has any conflict of interest to disclose.