
The role of machine learning in developing non-magnetic resonance imaging based biomarkers for multiple sclerosis: a systematic review.

Md Zakir Hossain, Elena Daskalaki, Anne Brüstle, Jane Desborough, Christian J Lueck, Hanna Suominen.

Abstract

BACKGROUND: Multiple sclerosis (MS) is a neurological condition whose symptoms, severity, and progression over time vary enormously among individuals. Ideally, each person living with MS should be provided with an accurate prognosis at the time of diagnosis, precision in initial and subsequent treatment decisions, and improved timeliness in detecting the need to reassess treatment regimens. To manage these three components, discovering an accurate, objective measure of overall disease severity is essential. Machine learning (ML) algorithms can contribute to finding such a clinically useful biomarker of MS through their ability to search and analyze datasets about potential biomarkers at scale. Our aim was to conduct a systematic review to determine how, and in what way, ML has been applied to the study of MS biomarkers on data from sources other than magnetic resonance imaging.
METHODS: Systematic searches through eight databases were conducted for literature published in 2014-2020 on MS and specified ML algorithms.
RESULTS: Of the 1,052 returned papers, 66 met the inclusion criteria. All included papers addressed developing classifiers for MS identification or measuring its progression, typically using hold-out evaluation on subsets of fewer than 200 participants with MS. These classifiers focused on biomarkers of MS, including those derived from omics and phenotypical data (34.5% clinical, 33.3% biological, 23.0% physiological, and 9.2% drug response). Algorithmic choices were dependent on both the amount of data available for supervised ML (91.5%; 49.2% classification and 42.3% regression) and the requirement to be able to justify the resulting decision-making principles in healthcare settings. Therefore, algorithms based on decision trees and support vector machines were commonly used, and random forests achieved the highest average performance (89.9% AUC) among the ML algorithms compared.
CONCLUSIONS: ML is applicable to determining how candidate biomarkers perform in the assessment of disease severity. However, applying ML research to develop decision aids to help clinicians optimize treatment strategies and analyze treatment responses in individual patients calls for creating appropriate data resources and shared experimental protocols. They should target proceeding from segregated classification of signals or natural language to both holistic analyses across data modalities and clinically-meaningful differentiation of disease.
© 2022. The Author(s).

Keywords:  Deep learning; Disease progression; Medical informatics; Multiple sclerosis; Prognosis; Supervised machine learning; Systematic review

Year:  2022        PMID: 36109726      PMCID: PMC9476596          DOI: 10.1186/s12911-022-01985-5

Source DB:  PubMed          Journal:  BMC Med Inform Decis Mak        ISSN: 1472-6947            Impact factor:   3.298


Background

Multiple sclerosis (MS) is a condition affecting the central nervous system (CNS) characterised by a mixture of inflammation and neurodegeneration. Several disease patterns (a.k.a. phenotypes) are recognized, including, but not limited to, relapsing remitting MS (RRMS) and secondary progressive MS (SPMS), but the clinical course varies considerably among individuals [1]. In recent years, the number of treatments available to reduce inflammatory processes has increased dramatically: these agents can be very effective in suppressing clinical disease activity, but they are not effective in all patients and many of them are associated with an appreciable risk of significant side effects. This has resulted in a drive towards personalised treatment for people living with MS (PwMS); ideally, individuals should be provided with (i) an accurate prognosis at the time of diagnosis, (ii) optimization of initial treatment decisions, and (iii) greater precision in following up the response to treatment and, therefore, early detection of the need to modify a particular treatment regimen [2]. To manage these three components, it is essential to discover an accurate, objective way of measuring overall disease severity, or status. However, in common with many neurological conditions, MS still lacks such a measure. Diagnosis is based on a combination of clinical features and information obtained from diagnostic tests, most notably magnetic resonance imaging (MRI) [3]. Clinical disease severity is generally quantified using the Expanded Disability Status Scale (EDSS), MS Severity Score (MSSS), or MS Functional Composite (MSFC) [4, 5], but these tools have drawbacks: each of them suffers from intra-subject and intra-observer variability and the EDSS and MSSS are biased towards the motor domain [6]. Accordingly, there has been a search for a biomarker of MS that would facilitate more accurate and objective definition of disease severity/status. 
A biomarker has been defined as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” [7]. MRI is currently the most widely-used biomarker in MS. However, it is not ideal: abnormalities on MRI are not well correlated with clinical manifestations of disease; it is expensive, invasive, and time-consuming; and it requires patients to travel to MRI scanners. Hence, several alternative biomarkers — spanning from blood or breath analysis to cognitive measures — are undergoing assessment in different centres [8-10]. Although this research into a suitable clinical biomarker other than MRI has been extensive, no clear candidate that might complement, or replace, MRI has yet been found. An effective biomarker of MS would also contribute to better overall health and healthcare experience of PwMS. Research examining the experiences of PwMS describes a lack of information and support, particularly at the time of diagnosis [11, 12], requiring extensive personal effort to meet patients’ information needs during an already stressful time [13]. Experiences of uncertainty dominate this literature, when considering treatment options and possible side effects, and in dealing with the impact of MS on work, family, and social life [14, 15]. Identification of a reliable biomarker would help. The focus of this systematic review is to study machine learning (ML) as a way to support the discovery of biomarkers that can be measured regularly and inexpensively using non-invasive and readily-accessible techniques, thus reducing the test burden on PwMS and optimizing early detection and treatment management. 
ML refers to computational algorithms for gathering and making sense of evidence derived from large volumes of data, thereby permitting, or facilitating, human judgement and decision-making [16, 17] (see Supplementary Material A for further background on ML problems; supervised and unsupervised ML algorithms; and their timeline). ML has the potential to help in the search for a clinically useful biomarker because it can assess how well candidate biomarkers perform in the assessment of disease severity and prognosis, either individually or in combination. ML may also assist in developing decision-support techniques to aid clinicians and PwMS in making optimal individual treatment choices and in assessing the response to a chosen treatment. To determine how best to apply ML, it is important to begin by ascertaining what is already known. Comprehensive reviews of ML-assisted MRI analysis in MS have already been performed [18, 19]. However, to date, ML has been applied less frequently to other types of biomarkers [20]. This systematic review was therefore designed to investigate how ML has been applied to the study of potential non-MRI biomarkers in the management of MS, looking specifically at prognosis, disease severity, choice of treatment, and assessment of response to treatment.

Methods

The present systematic literature review, registered under the international prospective register of systematic reviews (PROSPERO) number CRD42020163161, followed the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [21]. Eight resources (PubMed, Cochrane, Google Scholar, ScienceDirect, Scopus, Web of Science, Lens, and dblp) were used as the primary tools for indexing and retrieving publications, given their index size and retrieval reliability [22]. The search query was formed by combining the term “Multiple Sclerosis” with a number of ML-related terms, as described in Table 1. Depending on the resource, both general queries and their more specific variants were used to maximize the number of relevant publications returned. Papers published over the 5 years following the introduction of generative adversarial networks (GANs; Supplementary Material A) [23] (i.e., from 1 January 2014 to 31 January 2020) were considered.
Table 1

“Multiple Sclerosis” and specific machine learning algorithms returned 1,052 studies from eight search resources

Query A: “Multiple Sclerosis” AND (“Machine Learning” OR “Machine Intelligence” OR “Deep Learning” OR “Decision Tree*” OR “Random Forest*” OR “Pattern Recognition” OR “Genetic Algorithm*” OR “Supervised Algorithm*” OR “Decision Support System*” OR “Evolutionary Computation*” OR “Neural Network*” OR “Support Vector Machine*” OR “Autoencoder*” OR “Deep Belief Network*” OR “Adversarial Network*” OR “Self Organizing Map*” OR “Self Organising Map*”)
Query B: “Multiple Sclerosis” AND (“machine learning” OR “machine intelligence”)
Query C: “Multiple sclerosis” AND “machine learning”

Search terms | Search resource | Number of returned studies
Query A | PubMed | 75
Query A | Cochrane | 25
Query A | Google Scholar | 100 #
Query B | ScienceDirect | 340 #
Query B | Scopus | 169
Query B | Web of Science | 179 #
Query B | Lens | 160
Query C | dblp | 4 #
Total count | | 1,052

# Sort by relevance
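
The query construction summarized in Table 1 can be sketched in a few lines of Python. This is only an illustration: the function name `build_query`, the variable names, and the use of straight quotes are assumptions rather than the authors' tooling, and each search resource has its own query syntax that is not modelled here. The term list and the per-resource counts are copied from Table 1.

```python
# Illustrative sketch only; names and structure are assumptions, not the authors' code.
ML_TERMS = [
    "Machine Learning", "Machine Intelligence", "Deep Learning",
    "Decision Tree*", "Random Forest*", "Pattern Recognition",
    "Genetic Algorithm*", "Supervised Algorithm*",
    "Decision Support System*", "Evolutionary Computation*",
    "Neural Network*", "Support Vector Machine*", "Autoencoder*",
    "Deep Belief Network*", "Adversarial Network*",
    "Self Organizing Map*", "Self Organising Map*",
]


def build_query(disease, terms):
    """Combine one quoted disease term with OR-joined, quoted ML terms."""
    alternatives = " OR ".join(f'"{term}"' for term in terms)
    return f'"{disease}" AND ({alternatives})'


# Number of returned studies per search resource, as listed in Table 1.
RETURNED = {
    "PubMed": 75, "Cochrane": 25, "Google Scholar": 100,
    "ScienceDirect": 340, "Scopus": 169, "Web of Science": 179,
    "Lens": 160, "dblp": 4,
}

specific_query = build_query("Multiple Sclerosis", ML_TERMS)
general_query = build_query(
    "Multiple Sclerosis", ["machine learning", "machine intelligence"]
)
total_returned = sum(RETURNED.values())

print(general_query)
print(total_returned)
```

Summing the per-resource counts reproduces the total reported in the table caption.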

To ensure a low risk of bias, initial searches were conducted by three medical ML researchers. They performed independent searches (Table 1) using the protocol described below, and each collected a list of relevant publications. The decision to include or exclude any article not judged relevant by all three reviewers was made through discussion until a consensus was reached. The following exclusion criteria (EC) were defined:
EC.1 Duplicates were removed.
EC.2 Publications that were not original full peer-reviewed papers (e.g., reviews, book chapters, surveys, and abstracts) were removed.
EC.3 Papers that were not about PwMS were removed.
EC.4 Papers that were not about ML were removed.
EC.5 Papers working solely on data from MRI, optical coherence tomography, visual perimetry, and/or lumbar puncture were removed because these examinations are either not routinely conducted as standard clinical tests for MS or are not aligned with our focus on biomarkers that can be measured regularly and inexpensively using minimally invasive and readily-accessible techniques.
The selection of the studies considered in this review was performed in four phases (Fig. 1). In the identification phase, the previously discussed search keywords, constrained to the search time frame, were applied in the databases and resulted in 1,052 publications. In the screening phase, 368 publications were excluded as duplicates (EC.1) or non-original papers (EC.2), leaving 682 documents. In the eligibility phase, 355 papers were excluded as they did not consider MS and ML (EC.3 and EC.4). A further 261 papers were excluded on the basis of looking at MRI or another pre-specified tool (EC.5).
Fig. 1

Flow chart of the systematic review process

Ultimately, 66 papers remained for study; the majority of them were published in 2019, followed by 15 and 13 papers in 2018 and 2017, respectively (Fig. 2).
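
As a quick check of the eligibility phase, the arithmetic leading to the 66 included papers can be written out as follows. The counts are those reported in the Methods text; the variable names are illustrative assumptions.

```python
# Counts as reported in the Methods text; variable names are illustrative.
after_screening = 682          # documents left after applying EC.1 and EC.2
excluded_not_ms_or_ml = 355    # excluded under EC.3 and EC.4
excluded_mri_or_similar = 261  # excluded under EC.5

included = after_screening - excluded_not_ms_or_ml - excluded_mri_or_similar
print(included)
```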
Fig. 2

Distribution of manuscripts with publication years. The total number of publications adds up to 68 because out of the 66 included publications, one discussed both diagnosis and MS sub-types and another discussed both diagnosis and prognosis

As a validity assurance method, these papers were assessed with respect to the guidelines for developing and reporting ML analyses and predictive models in biomedical and clinical research [24, 25] (see Additional file 2 for the outcomes). Because almost all criteria included in the guidelines were followed, no further exclusions were made.

Results

Table 3

Summary of the included papers that reported on applications towards evaluating response to treatment, symptoms, or underlying pathophysiology together with those for improving measurement tools or support groups. Abbreviations as above in Table 2

Author | Data sources | ML methods | Outcomes
Response to treatment
Baranzini et al. [75] | IFN-β response | RF | Accuracy in [75.0%, 82.0%]; CASP2 / IL10 / IL12Rb1.
Ebrahimkhani et al. [76] | microRNA | LR; RF | AUC in [65.2%, 91.1%].
Fagone et al. [77] | Genomics | UCSC | Accuracy = 89.2%.
Karim et al. [78] | IFN-β response | CART; LASSO; SVM; LR | Hazard Ratio[4] in [1.359, 1.372].
Kasatkin et al. [79] | Flu-like symptoms | NN; Static Model | Sensitivity in [73.4%, 81.2%]; Specificity in [71.6%, 80.6%].
Li et al. [80] | Cardiac data | DT | Baseline heart rate (HR).
Üçer et al. [81] | IFN-β response | SNAc; SVM; KNN; RF; NB; LR; DT | Accuracy in [63.1%, 64.5%]; F1 score in [77.4%, 78.3%].
Walter et al. [82] | Costing data | DT | NAb is cheaper than other tests.
Patrick et al. [83] | RNAs | GB; LR; RF; LASSO; DA; Nearest SC; WE | AUC in [72.1%, 89.9%].
Exacerbation of symptoms
Bhattacharya et al. [84] | Daily activities | NN | Fatigue.
Papakostas et al. [85] | EMG | SVM; RF; ET; Gradient-Boosting | F1 Score in [75.1%, 77.8%].
Underlying pathophysiology
Chi et al. [86] | Genetic ancestry | LR; RF | HLA-DRB1*15:01 and HLA-DRB1*03:01 alleles.
Forbes et al. [87] | Gut microbiota | RF | Accuracy in [82.0%, 84.0%]; AUC in [91.0%, 94.0%].
Improve measurement tools
Sébastien et al. [88] | Gait analysis | ET | Accuracy in [70.9%, 91.7%].
Michel et al. [89] | Quality of life | DT; IRT | Accuracy in [96.0%, 98.0%].
Improve support groups
Rezaallah et al. [90] | Social media text | NLP; NB | 6 topics related to MS medication.
Deetjen et al. [91] | Text data | LR; NB | Accuracy in [91.6%, 96.0%]; 56% informational and 44% emotional for MS.

The 66 included studies explored the application of ML to MS for purposes ranging from diagnosis and prognosis to measuring disease status and severity levels (Tables 2 and 3; Additional file 3). They all followed the recommended reporting guidelines [24, 25], which range from what to include when reporting predictive models in biomedical research to how to succinctly present standardized results of ML methods. In these studies, algorithmic choices were dependent on both the amount of data available for supervised ML and the requirement to be able to justify the resulting decision-making principles in healthcare settings. Typically, datasets with fewer than 200 PwMS were available for supervised ML and, therefore, support vector machines (SVMs) and decision tree-based algorithms were common (Figs. 3 and 4; Additional file 1).
These ML applications focused on biomarkers of MS, ranging from those derived from omics and phenotypical data (e.g., cognitive, balance, gait, or other clinical tests) to patients’ self-reported assessments (Figs. 5 and 6).
Table 2

Summary of 49 included papers that reported on applications towards supporting diagnosis, disease status assessment, MS sub-typing, and prognosis. See Table 3 for a summary of 17 included papers that reported on other applications. Abbreviations as below in the Table

Author | Data sources | ML methods | Outcomes
Diagnosis vs normal
Ahmadi et al. [26] | EEG | OS-ELM | Accuracy in [90.0%, 91.0%].
Andersen et al. [27] | Metabolomics | LR; RF | AUC in [81.0%, 86.0%].
Bertolazzi et al. [28] | Genes | KNN; SVM; DT | Accuracy in [92.0%, 95.0%].
Broza et al. [29] | Breath markers | NN | Accuracy in [72.0%, 90.0%]; AUC in [79.0%, 87.0%].
Chase et al. [30] | Medical records | NB; NLP | AUC in [90.0%, 94.0%].
deAndrés-G. et al. [31] | Genetic pathways | Distance-based classifier; Minimum spanning tree | Accuracy in [93.8%, 98.2%]; Neurogenesis and Hemoglobin related genes.
Galli et al. [32] | Lymphocytes | NN | TNF, GM-CSF, IFN-γ, IL2, and CXCR4.
Goldstein et al. [33] | SNP | RF; LASSO; GLM; KNN; LR | CRHR1.
Goyal et al. [34] | Cytokines | SVM; NN; DT; RF | Accuracy = 90.9%; AUC = 95.7%.
Lötsch et al. [35] | Lipid markers | SOM; AdaBoost; KNN; RF | Accuracy in [92.5%, 100%]; AUC in [92.5%, 100%].
Lötsch et al. [36] | Lipid markers | SOM | Accuracy in [77.0%, 94.6%]; Ceramides.
Perera et al. [37] | Tremor | Linear Regression; SVR; RF | Accuracy in [84.2%, 90.8%]; Velocity of index finger.
Prabahar et al. [38] | MicroRNA | SVM | Accuracy in [87.8%, 90.1%].
Severini et al. [39] | Balance board | SVM | Accuracy in [83.3%, 85.5%].
Telalovic et al. [40] | lncRNAs | RF | Accuracy in [61.5%, 84.6%].
Torabi et al. [41] | EEG | SVM; KNN | Accuracy in [79.8%, 93.1%].
Zhang et al. [42] | Genetic pathways | SVM | Accuracy in [61.2%, 70.3%].
Kiiski et al. [43] | ERPs | Linear Regression | Visual task is better than auditory task.
Saroukolaei et al. [44] | Enzymes | Linear Regression; NN | Higher CA.
Sun et al. [45] | Postural sway | RF | Accuracy in [92.3%, 95.6%].
Diagnosis vs other diseases
Bang et al. [46] | Gut microbial | SVM; KNN; LogitBoost; Logistic Tree | Accuracy in [96.4%, 98.3%].
Guo et al. [47] | Transcriptomics | KNN; SVM; NB; NN; LR; RF | Accuracy in [77.2%, 86.4%]; TNFSF10 is allied to the PwMS.
Ohanian et al. [48] | Key symptoms | DT | Accuracy in [79.2%, 81.2%]; Immune domain is useful in this case.
Ostmeyer et al. [49] | B-cell receptor | Optimize Log Likelihood | Accuracy in [72.0%, 87.0%].
Disease status
Azrour et al. [50] | Gait analysis | DT | EDSS score in [< 0.97 (No MS), > 4.15 (MS)].
Fritz et al. [51] | Falls risk | LR | Fallers and near-fallers are at similar risks.
Gudesblatt et al. [52] | Falls risk | RF | Accuracy in [82.9%, 91.2%]; F1 score in [78.9%, 91.3%].
Haider et al. [53] | Body movements | SVM; KNN; RF | Accuracy in [95.5%, 100%].
Jackson et al. [54] | Genetic markers | RF | 19 genetic variants.
Kosa et al. [55] | Clinical data, MEP | GA | CombiWISE is better than MRI measures.
McGinnis et al. [56] | Gait speeds | SVR | RMSE speed in [0.12 m/s, 0.14 m/s].
Morrison et al. [57] | Motor assessment | DT; SVM | Visualisation reduces the gap between human and ML.
Shahid et al. [58] | Clinical data | KNN; SVM; RF; Rough Set | Accuracy in [79.7%, 84.0%].
Supratak et al. [59] | Walking speed | SVR | Walking speed in [0.57 m/s, 1.22 m/s].
MS sub-types
Acquarelli et al. [60] | Pathology | NLP; Clustering | Pathological profiles and disease duration.
Fiorini et al. [61] | Clinical data | LS; LR; SVM; KNN | Accuracy in [75.0%, 78.3%]; F1 score in [62.3%, 70.2%].
Gronsbell et al. [62] | EMR | SSL | Accuracy in [92.9%, 93.9%].
Gupta et al. [63] | Microbiomics | RF | Specificity = 86.4%; Sensitivity = 45.4%.
Lim et al. [64] | Kynurenine | DT; DA; CART; SVM | Accuracy in [83.0%, 91.0%].
Lopez et al. [65] | Genetic signatures | Clustering | CD69, CCR5, IL13, and STAT3.
Prognosis
Bejarano et al. [66] | Clinical, MEP | NB; NN; LR; DT; Linear Regression | Accuracy in [67.0%, 80.0%]; AUC in [65%, 76.0%].
Brichetto et al. [67] | Clinical data | Supervised Algorithms | Accuracy in [82.6%, 86.0%].
Briggs et al. [68] | Clinical data | LASSO | Obesity and smoking.
Flauzino et al. [69] | Clinical data | LR; NN | AUC = 84.2%; Lower IL4.
Pruenza et al. [70] | Clinical data | RF | AUC in [80.0%, 82.0%].
Tacchella et al. [71] | Clinical data | RF | AUC in [69.6%, 72.5%].
Yperman et al. [72] | MEP | RF; LR | AUC in [72.0%, 75.0%].
Zhao et al. [73] | Clinical data | SVM; LR | Accuracy in [68.0%, 73.0%].
Zhao et al. [74] | Clinical data | SVM; KNN; AdaBoost | Accuracy in [76.0%, 90.0%].

Accuracy = (TP + TN) / (TP + TN + FP + FN); FPR = FP / (FP + TN); Precision = TP / (TP + FP); F1 Score = 2 * (Recall * Precision) / (Recall + Precision); Sensitivity / Recall / TPR = TP / (TP + FN); Specificity = TN / (TN + FP); AUC = Area Under the ROC curve, calculated from the plot of TPR vs. FPR;

CART = Classification and Regression Tree; DA = Discriminant Analysis; DT = Decision Tree; ET = Extra-Trees; FN = False Negatives; FP = False Positives; FPR = False Positive Rate; GA = Genetic Algorithm; GAIMS = Gait Analysis Imaging System; GB = Gradient Boosting; GLM = Generalized Linear Model; IP-GRASP = A Greedy Randomized Adaptive Search Procedure with memory; IRT = Item Response Theory; KNN = k-nearest Neighbour; LASSO = Least absolute shrinkage and selection operator; LR = Logistic Regression; LS = Least Squares; ML = Machine Learning; MRI = Magnetic Resonance Imaging; NB = Naïve Bayes; NLP = Natural Language Processing; NN = Neural Network; OS-ELM = Online Sequential Extreme Learning Machine; QoL = Quality of Life; RF = Random Forest; RMSE = Root Mean Square Error; ROC = Receiver Operating Characteristic; RR = Relapsing-Remitting Multiple Sclerosis; SC = Shrunken Centroid; SOM = Self-Organising Map; SNAc = Social Network Analysis-based Classifier; SSL = Semi-supervised Learning; SVM = Support Vector Machines; TN = True Negatives; TP = True Positives; TPR = True Positive Rate;

CA = Candida Albicans; CAO = Clinician Assessed Outcomes; CFS = Chronic Fatigue Syndrome; CIS = Clinically Isolated Syndrome; EDSS = Expanded Disability Status Scale; EEG = Electroencephalogram; EMG = Electromyogram; EMR = Electronic Medical Record; ERPs = Event Related Potentials; HC = Healthy Controls; IM &NO = Immune-inflammatory, Metabolic, and Nitro-Oxidative; KP = Kynurenine Pathway; lncRNAs = long non-coding RNAs; ME = Myalgic Encephalomyelitis; MEP = Motor Evoked Potentials; MS = Multiple Sclerosis; NAb = Neutralising Antibodies; PP = Primary-Progressive Multiple Sclerosis; PRO = Patient Reported Outcomes; PwMS = people living with MS; rRNA = Ribosomal Ribonucleic Acid; SP = Secondary-Progressive Multiple Sclerosis; without MS = people living without Multiple Sclerosis; WE = Word Embedding;

C6ORF10 = Chromosome 6 Open Reading Frame 10; CASP2 = Caspase 2, Apoptosis-Related Cysteine Peptidase; CCR5 = C-C Chemokine Receptor Type 5; CD69 = CD69 Antigen (P60, Early T-Cell Activation Antigen); CRHR1 = Corticotropin Releasing Hormone Receptor 1; CXCR4 = C-X-C Motif Chemokine Receptor 4; GM-CSF = Granulocyte-Macrophage Colony-Stimulating Factor; HLA-DRB1 = Human Leukocyte Antigen haplotype, DR beta 1; IFN-β = Interferon beta; IFN-γ = Interferon gamma; IL2 = Interleukin 2, T Cell Growth Factor; IL4 = Interleukin 4; IL10 = Interleukin 10; IL12Rb1 = Interleukin 12 Receptor Subunit Beta 1; IL13 = Interleukin 13; TAP2 = Transporter 2, ATP Binding Cassette Subfamily B Member; TNF = Tumor Necrosis Factor; TNFSF10 = Tumor Necrosis Factor (ligand) superfamily, member 10; STAT3 = Signal Transducer and Activator Of Transcription 3;
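
The performance measures defined in the table footnote can be computed directly from the four confusion-matrix counts. The following is a minimal sketch: the function name `confusion_metrics` and the example counts are hypothetical and are not taken from any of the included studies, and the code assumes that no denominator is zero.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute the footnote's measures from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive, one
    negative, and one predicted positive).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity or TPR
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "fpr": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * (recall * precision) / (recall + precision),
    }


# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that specificity and FPR always sum to 1, which is why the ROC curve can be drawn against either of them.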

Fig. 3

Sunburst chart of machine learning algorithms applicable to multiple sclerosis studies

Fig. 4

Histogram of machine learning algorithms in multiple sclerosis studies. The y-axis refers to the number of studies

Fig. 5

Sunburst chart of machine learning applications and data in multiple sclerosis studies

Fig. 6

Histogram of data for ML applications. The y-axis refers to the number of studies

Aims and outcomes of applications

ML applications to differentiate PwMS from controls emphasized the benefits of a diversity of data sources in the search for a clinically useful biomarker of MS (Table 2 and Additional file 3). This differentiation problem was studied in as many as 20 of the 66 included studies (30.3%) [26-45]. These experiments reported an accuracy of over 90.0% when applying ML to medical records [30], electroencephalogram (EEG) signals [26, 41], tremor or postural-sway measurements [37, 45], and omics data [28, 34–36, 38]. Decision trees [28, 34], random (decision) forests [34, 35, 37, 45], SVMs [28, 34, 37, 38, 41], neural networks (NNs) [26, 34], self-organizing maps (SOMs) [35, 36], and the naïve Bayes algorithm [30] resulted in the best learning performance. Analyzing the contribution of data sources, modalities, and featurizations to the ML performance, studies [32, 33, 36, 37, 44] supported the possibility of measuring and evaluating stress, anxiety, depression, obesity, and/or inflammatory markers as diagnostic biomarkers of MS.
Studies on diagnostic applications of ML to distinguish MS from other diseases were less common, but they supplemented our list of promising diagnostic biomarkers of MS in the form of genomics and gut microbial data (Table 2 and Additional file 3). Four studies (6.1%) applied ML to distinguish MS from other neurological [47, 49] or medical diseases [46, 48]. These ML applications analyzed biological [46, 47, 49] or clinical data [48]. However, an ML accuracy of over 90.0% was reached only by analyzing gut microbial data through the LogitBoost classification algorithm [46].
Applications of ML to measuring MS status continued to encourage our search for disease biomarkers that can be measured more regularly and inexpensively than MRI (Table 2 and Additional file 3). ML was applied to measuring MS status through disability scoring or severity-level computing in eleven studies (16.7%). Data analyzed by these applications were drawn from clinical [55, 58], physical [45, 50–52, 56, 59], physiological [53, 55, 57], and genetic [54] sources. However, the only applications to exceed an accuracy of 90.0% were those based on assessing body movements [53] or falls risk [52] using random forests and SVMs. In contrast, one included study concluded that falls risk should be incorporated into assessment of MS disease status [51]. Interestingly, when considering longitudinal changes in progressive MS, the sensitivity of the Combinatorial WeIght-adjuStEd (CombiWISE) disability scoring, which integrates four clinical scales, was consistently better than that of MRI [55].
ML applications to recognize MS sub-types or clinical courses (such as RRMS, Primary-Progressive MS (PPMS), and SPMS, each of which might be mild, moderate, or severe) emphasized the role of medical records and omics data in the biomarker search (Table 2 and Additional file 3). MS sub-typing was addressed in seven studies (10.6%) by analyzing clinical [60-62] and biological [44, 63–65] data. However, an accuracy of over 90.0% was reported only when using data from medical records [62] or omics [64]. Again, decision trees and SVMs achieved the best ML performance.
In the same vein, ML applications were used to assess MS prognosis. SVMs classifying clinical data outperformed other algorithms and data sources, with conclusions suggesting the incorporation of obesity and smoking history and status (Table 2 and Additional file 3). MS prognosis was studied in ten studies (15.2%) by analyzing clinical [66–71, 73, 74] and physiological [43, 66, 72] data.
In this application category, only one study reported an accuracy of 90.0% [74]: it used an SVM classifier on clinical data. Nevertheless, weaker evidence implicating obesity and smoking data as biomarkers of MS was provided in the context of applying the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm to disability prediction [68].
Omics and physiological data, together with data from medical records, were promising when applying ML to the treatment of MS. Nine studies (13.6%) examined responses of PwMS to treatment (Table 3 and Additional file 3). These studies analyzed responses to drugs, including interferon beta (IFNb) [75, 78, 79, 81, 82], fingolimod [76, 80], natalizumab [77], and glatiramer acetate [83]. The Area Under the receiver operating characteristic Curve (AUC) reached over 90.0% only once [76]: this study classified micro RiboNucleic Acid (microRNA) data using random forests. Finally, after IFNb treatment, measuring heart rate [80] and triplet testing of Caspase 2, Apoptosis-Related Cysteine Peptidase (CASP2), Interleukin 10 (IL10), and Interleukin 12 Receptor Subunit Beta 1 (IL12Rb1) [75] were the strongest predictors of response to MS treatments.
The remaining studies contributed to our biomarker search by looking at fatigue measurement and stressing the strengths of omics and gut microbiome data (Table 3 and Additional file 3). Four included studies (6.1%) targeted exacerbation of symptoms [84, 85] or underlying pathophysiology [86, 87]. Fatigue was a main source of impaired quality of life [84, 85], and certain genetic patterns were highly associated with PwMS [86]. In addition, particular patterns of gut microbial pathogens were found in MS [87].
Another four studies (6.1%) aimed to improve support for PwMS, either by using natural language processing (NLP) to explore online forum posts or patients’ experiences with MS medication [90, 91] or by using decision-tree and extra-tree algorithms to enhance measurement tools looking at walking patterns or quality-of-life assessments [88, 89].

Figure caption: Sunburst chart of machine learning algorithms applicable to multiple sclerosis studies.
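
The percentages attached to the study counts in this section can be reproduced with simple arithmetic over the 66 included studies. The sketch below is only an illustration: the category labels are shorthand invented for this example, and the count of 20 diagnosis studies is taken from the Discussion rather than from this subsection.

```python
# Study counts per application as stated in the review text.
# The category labels are shorthand chosen for this sketch (assumption).
TOTAL_INCLUDED = 66

STUDIES_PER_APPLICATION = {
    "diagnosis": 20,                    # count stated in the Discussion
    "disease status": 11,
    "prognosis": 10,
    "treatment response": 9,
    "sub-types": 7,
    "symptoms or pathophysiology": 4,
    "support and measurement tools": 4,
}


def share(count: int, total: int = TOTAL_INCLUDED) -> float:
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


# Percentage share for every application category.
SHARES = {name: share(n) for name, n in STUDIES_PER_APPLICATION.items()}

if __name__ == "__main__":
    for name, pct in SHARES.items():
        print(f"{name}: {pct}%")
```

Rounding to one decimal place matches the precision used in the text, for example 11 of 66 studies for measuring disease status.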

ML methods and ML datasets

Figure 3 gives an overview of the percentage of articles by the ML methods studied (details in Tables 2 and 3 and Additional file 3). Most included studies employed supervised ML algorithms (91.5%) and only a few proposed unsupervised solutions (4.6%). In the case of supervised ML, both classification algorithms [49.2%; including, but not limited to, random forests and other decision trees (30.8%) as well as K nearest neighbor (KNN) and other KNN-type algorithms based on measuring the distance of, e.g., nearest neighbors (8.5%)] and regression algorithms [42.3%; including, but not limited to, SVMs (15.4%) and logistic regression (10.8%)] were considered. Applications of later advancements in NNs (6.9%) were rare due to the limited amount of labelled paired input-output training data available for ML, the requirement to be able to justify its decision-making principles in healthcare, or slow adoption of these algorithms by researchers in medical informatics and decision-making.

Our further breakdown (Fig. 4) implied that researchers considered decision trees, SVMs, regression models, NNs, and KNN-type ML algorithms for diagnosing PwMS. Usually, they used decision trees and SVMs for measuring disease status. Decision trees and regression algorithms were mostly considered for measuring responses to treatment and MS progression. Typically, ML evaluation was conducted using hold-out methods in order to make optimal use of the annotated data available for ML.

Figure caption: Histogram of machine learning algorithms in multiple sclerosis studies. The y-axis refers to the number of studies.

For our quantitative analysis of ML algorithms, we report the average AUC, accuracy, and F1 score from their performance evaluations; our findings shortlist random forests and NNs among the best-performing ML methods on the basis of their AUC above 80%. Random forests were the most commonly considered algorithm, with an average AUC of 89.9%, accuracy of 81.5%, and F1 score of 78.1%. In addition, NNs had an AUC of 81.3% and accuracy of 84.8%; SVMs had an accuracy of 79.7% and F1 score of 77.5%; KNNs had an accuracy of 76.8%; and decision trees had an accuracy of 76.7%. Furthermore, 68% of studies reported validation strategies including k-fold, leave-one-out, and nested cross-validation. Overall, most studies deployed supervised ML to predict future trends of MS, and ML models based on decision trees (i.e., random forests) performed the best and were most commonly used.

Clinical data were particularly useful sources for ML-based predictive models, but we identified room for exploring physiological and biological data as well for measuring MS prognosis and distinguishing between MS sub-types (Fig. 5). Clinical datasets, such as demographic data, patient-reported outcomes (PROs, i.e., direct responses from patients and controls), clinician-assisted outcomes (CAOs, i.e., responses provided via a clinician acting as intermediary), and electronic medical records (EMRs), were used to separate PwMS from controls. PROs and CAOs could describe or reflect how a patient feels, functions, or survives, while EMRs might be interrogated to extract demographic and clinical data including prescriptions, pathological diagnosis, medication usage, and so on. Researchers mostly used biological data to support MS diagnosis and to measure response to treatment (Fig. 6). Physiological (and physical) data were used in computer-assisted MS diagnosis and measurement of MS disease status. Predominantly clinical data were used for measuring MS prognosis, disease status, and distinguishing among MS sub-types.

Figure caption: Sunburst chart of machine learning applications and data in multiple sclerosis studies.

Included studies considered both cross-sectional and time-series data from, for example, clinical, physiological, and biological sources, for purposes ranging from diagnosis and prognosis to measuring disease status and severity (Fig. 5). For the analyses, clinical data (34.5%) were most commonly used, followed by biological data (33.3%), and physical and physiological data (23.0%). These applications were typically siloed for each data type (e.g., natural language or biological signals), and multi-modal analyses had not been studied.

Figure caption: Histogram of data for ML applications. The y-axis refers to the number of studies.
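
For readers less familiar with the performance measures and validation strategies named in this section, the following sketch shows how accuracy, F1 score, and AUC are computed for a binary classifier, and how k-fold splits are formed (leave-one-out being the special case where k equals the number of samples). It is a minimal illustration in plain Python: the function names and the toy labels and scores are assumptions made for this example and are not taken from any included study.

```python
from typing import Iterator, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, unshuffled folds.
    Setting k == n_samples gives leave-one-out cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Toy example (hypothetical data): 1 = MS, 0 = control.
Y_TRUE = [1, 1, 1, 0, 0, 0, 1, 0]
SCORES = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
Y_PRED = [1 if s >= 0.5 else 0 for s in SCORES]  # threshold at 0.5

if __name__ == "__main__":
    print("accuracy:", accuracy(Y_TRUE, Y_PRED))
    print("F1 score:", f1_score(Y_TRUE, Y_PRED))
    print("AUC:", roc_auc(Y_TRUE, SCORES))
    for train, test in k_fold_indices(len(Y_TRUE), 4):
        print("train", train, "test", test)
```

Accuracy and F1 depend on the chosen decision threshold, whereas AUC is computed from the ranking of the scores alone, which is one reason the three measures can rank the same algorithms differently. In a hold-out evaluation a single train/test partition is used, whereas k-fold cross-validation lets every sample serve as test data exactly once.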

Discussion

Overall, the included studies had many different purposes: most of them were developed to support the diagnosis of MS (30.3%; 20/66), followed by measuring disease status (16.7%; 11/66), prognosis (15.2%; 10/66), response to treatments (13.6%; 9/66), and distinguishing MS sub-types (10.6%; 7/66), among others. Promising data sources in the search for MS biomarkers included medical records and other clinical data (e.g., medications, pathology, as well as clinical history and status); EEG, tremor, postural-sway, heart rate, and/or other physiological data; the EDSS, Scripps neurological rating scale, 25-foot walk, 9-hole peg test, and/or other disability-scoring data; genetics and/or other omics data; and gut microbiome and other biological data. The most promising biomarkers themselves consisted of measurements and evaluations of fatigue, stress, anxiety, depression, body movements, falls risk, inflammatory markers, disability, smoking variables, obesity, and/or inducing apoptosis. However, most studies focused on only one of these sources and biomarker types, which leads to potential drawbacks. For example, looking at studies investigating immunological markers [92–94], it is not surprising that mediators of inflammation such as cytokines [34] or genes associated with inflammation such as TNFSF10 [47] were predictive of MS versus non-MS, given the inflammatory nature of MS. The problem in general is to distinguish MS-related inflammation from other inflammatory aetiologies.

The majority of included studies focused on either diagnosis or prognosis without addressing treatment. These studies suggest that it might be possible to discover biomarkers for measuring MS status that are less invasive and expensive than MRI.
However, bridging the gap between health science and data science calls for providing appropriate data resources and more holistic multimodal solutions to allow progress beyond classification that differentiates people living with and without MS and/or measures MS progression. That is, finding biomarkers to monitor treatment seems to be an understudied topic.

Our systematic review suggests that the application of ML to MS is yet to adopt the latest ML algorithms and to make full use of these computational modelling methods, which might support clinicians’ judgement and decision-making. Overall, we found that NNs, SVMs, and decision-tree based algorithms performed best at differentiating PwMS from controls and recognizing MS sub-types or clinical courses. We believe this is explained by their tolerance for relatively small amounts of data to learn from and/or by ML researchers’ devotion to careful feature engineering [95, 96]. In general, applications of ML to MS are constrained by the limited amount of annotated data available and, as a result, the latest advancements in deep NNs are yet to gain popularity. Another technical gap that we identified was the lack of time-series and longitudinal datasets to allow studying hidden Markov models, recurrent NNs, and other sequential ML methods.

One effective approach to facilitate progress should be to organize and facilitate the design, creation, release, and use of experimental protocols (e.g., guidelines for developing and reporting ML analyses in clinical research by [24] and [25]), shared datasets (e.g., MSBase [97] and MS Floodlight Open [98]), and other community resources (e.g., as part of shared tasks, computational challenges, evaluation campaigns, or hackathons such as the Intelligent Disease Progression Prediction at the 2022 Conference and Labs of the Evaluation Forum by Brainteaser [99] that targets amyotrophic lateral sclerosis and MS).
Although the 66 included studies followed the cited guidelines carefully in their reporting, comparing their aims, outcomes, and ML methods would benefit from shared experimental protocols, supported by more standardized evaluation. More widely in biomedical natural language processing (NLP), community initiatives of this kind with published problem specifications; training and test data; data processing, visualization, and evaluation code and software; and benchmark evaluations and lab overviews have been successful in establishing strong ecosystems across professions and disciplines to conceptualize clinically-meaningful problems and introduce ML methods that have become their new state-of-the-art solutions [100–104]. Their use has also enhanced replicability and reproducibility of biomedical research [105–108]. In addition, their use has facilitated transfer of technology to clinical practice [109] by viewing data as a holistic trustworthy source of information for clinical purposes [110].

We recognize two main limitations of this review. First, ML has been extensively applied to MRI, but this was deliberately excluded from the current study in order to assess the possibility of finding an alternative to expensive, invasive, and time-consuming MRI. For recently-published reviews of ML application to MRI and its potential in clinical settings, see [18, 19]. Second, the review excluded classical statistics algorithms. We refer the reader to [111] for more information about the theoretical and experimental similarities and differences between classical statistics and ML algorithms in the context of neuroscience.

Improving the capacity to differentiate RRMS from other subtypes of MS, and to rate disease severity and prognosis, would significantly reduce the levels of uncertainty described by PwMS. This includes uncertainty related to future disease progression [13, 90, 91], whether to have children [92, 93], and fears of becoming a burden [94, 112].
However, alleviating uncertainty for some might mean removing a source of hope that one’s condition might not be as severe as other people’s [95]. The capacity of ML to inform treatment decisions could therefore provide enormous benefit to PwMS, whose current choices often constitute a trade-off between potential side-effects and limited information about efficacy, making decisions difficult [96, 113].

The collection of adequate quantities of high-quality data requires engagement of PwMS and a willingness on their part to participate, preferably over long periods of time to collect ongoing data. While the use of technology to monitor MS is becoming more common (e.g., smartwatch- and smart phone-based SmartMS Floodlight App [98]) [114], its use brings both benefits and costs to the wearer [15]. In particular, technology often requires frequent calibration [115–117], intrudes on daily activities [115, 116], and acts as a constant reminder of chronic health conditions [118]. While for scientists the benefits of having access to large quantities of data may be obvious, it is essential that we understand the implications for vulnerable users, such as PwMS [119, 120].

We believe ML has the potential to be very useful in the search for a non-MRI biomarker of MS if applied appropriately. To maximize the potential of ML in this way, we suggest expanding the size of the data sets studied. For example, this can be facilitated by sharing of data between different centres and by soliciting direct involvement of PwMS through, e.g., open community resources and computational challenges.
As part of these efforts, extending the study of ML algorithms to the currently understudied deep learning and NNs in MS is advisable: of the three best-performing ML algorithms, namely NNs, decision trees, and SVMs (average accuracy of 84.8%, 81.5%, and 79.7%, respectively), NNs were deployed only in 6.9% of the 22 included studies, while for the other two algorithms this deployment rate was 30.8% and 15.4%, respectively.

Conclusions

ML is applicable to determining how candidate biomarkers perform in the assessment of MS and its severity. For instance, the random forest algorithm is both a common and well-performing choice, whilst deep learning advances are yet to become prevalent. However, applying ML research to clinically meaningful problems, including developing decision-support tools that help clinicians optimize diagnosis and treatment strategies and analyze treatment responses in individual patients, calls for creating appropriate data resources and shared experimental protocols. To illustrate, the progress of these health informatics applications seems to be hindered by insufficient quantity and quality of data. This calls for developing appropriate data resources to proceed from classification to clinically-meaningful differentiation of disease and enabling more holistic analyses across data modalities as opposed to segregated solutions for signal processing, natural language processing, and every other data type.

Additional file 1: Background on Machine Learning (PDF)
Additional file 2: Validity Evaluation Tables (Document)
Additional file 3: Detailed summary of the included papers (Excel)
Additional file 4: PRISMA 2020 Checklist (PDF)
Additional file 5: Search Results (Document)
Additional file 6: Generating Sunburst Plot - ML Applications
Additional file 7: Generating Sunburst Plot - ML Methods
  90 in total

1.  Distinguishing among multiple sclerosis fallers, near-fallers and non-fallers.

Authors:  Nora E Fritz; Ani Eloyan; Moira Baynes; Scott D Newsome; Peter A Calabresi; Kathleen M Zackowski
Journal:  Mult Scler Relat Disord       Date:  2017-11-22       Impact factor: 4.339

2.  Expectations and Attitudes of Individuals With Type 1 Diabetes After Using a Hybrid Closed Loop System.

Authors:  Esti Iturralde; Molly L Tanenbaum; Sarah J Hanes; Sakinah C Suttiratana; Jodie M Ambrosino; Trang T Ly; David M Maahs; Diana Naranjo; Natalie Walders-Abramson; Stuart A Weinzimer; Bruce A Buckingham; Korey K Hood
Journal:  Diabetes Educ       Date:  2017-04       Impact factor: 2.140

3.  Recommendations for Reporting Machine Learning Analyses in Clinical Research.

Authors:  Laura M Stevens; Bobak J Mortazavi; Rahul C Deo; Lesley Curtis; David P Kao
Journal:  Circ Cardiovasc Qual Outcomes       Date:  2020-10-14

4.  Multiple sclerosis risk factors contribute to onset heterogeneity.

Authors:  Farren B S Briggs; Justin C Yu; Mary F Davis; Jinghong Jiangyang; Shannon Fu; Erica Parrotta; Douglas D Gunzler; Daniel Ontaneda
Journal:  Mult Scler Relat Disord       Date:  2018-12-04       Impact factor: 4.339

5.  Metabolome-based signature of disease pathology in MS.

Authors:  S L Andersen; F B S Briggs; J H Winnike; Y Natanzon; S Maichle; K J Knagge; L K Newby; S G Gregory
Journal:  Mult Scler Relat Disord       Date:  2019-03-09       Impact factor: 4.339

6.  The role of Candida albicans in the severity of multiple sclerosis.

Authors:  Shahla Amri Saroukolaei; Mojdeh Ghabaee; Hojjatollah Shokri; Alireza Badiei; Shadi Ghourchian
Journal:  Mycoses       Date:  2016-11       Impact factor: 4.377

Review 7.  The CRF1 receptor, a novel target for the treatment of depression, anxiety, and stress-related disorders.

Authors:  John H Kehne
Journal:  CNS Neurol Disord Drug Targets       Date:  2007-06       Impact factor: 4.388

8.  Identification of Genes Discriminating Multiple Sclerosis Patients from Controls by Adapting a Pathway Analysis Method.

Authors:  Lei Zhang; Linlin Wang; Pu Tian; Suyan Tian
Journal:  PLoS One       Date:  2016-11-15       Impact factor: 3.240

9.  Early recognition of multiple sclerosis using natural language processing of the electronic health record.

Authors:  Herbert S Chase; Lindsey R Mitrani; Gabriel G Lu; Dominick J Fulgieri
Journal:  BMC Med Inform Decis Mak       Date:  2017-02-28       Impact factor: 2.796

10.  Machine learning analysis of motor evoked potential time series to predict disability progression in multiple sclerosis.

Authors:  Jan Yperman; Thijs Becker; Dirk Valkenborg; Veronica Popescu; Niels Hellings; Bart Van Wijmeersch; Liesbet M Peeters
Journal:  BMC Neurol       Date:  2020-03-21       Impact factor: 2.474
