
The role of machine learning in developing non-magnetic resonance imaging based biomarkers for multiple sclerosis: a systematic review.

Md Zakir Hossain¹, Elena Daskalaki², Anne Brüstle³, Jane Desborough⁴, Christian J Lueck⁵,⁶, Hanna Suominen²,⁷.

Abstract

BACKGROUND: Multiple sclerosis (MS) is a neurological condition whose symptoms, severity, and progression over time vary enormously among individuals. Ideally, each person living with MS should be provided with an accurate prognosis at the time of diagnosis, precision in initial and subsequent treatment decisions, and improved timeliness in detecting the need to reassess treatment regimens. To manage these three components, discovering an accurate, objective measure of overall disease severity is essential. Machine learning (ML) algorithms can contribute to finding such a clinically useful biomarker of MS through their ability to search and analyze datasets about potential biomarkers at scale. Our aim was to conduct a systematic review to determine how ML has been applied to the study of MS biomarkers using data from sources other than magnetic resonance imaging.
METHODS: Systematic searches through eight databases were conducted for literature published in 2014-2020 on MS and specified ML algorithms.
RESULTS: Of the 1,052 returned papers, 66 met the inclusion criteria. All included papers addressed developing classifiers for MS identification or for measuring its progression, typically using hold-out evaluation on subsets of fewer than 200 participants with MS. These classifiers focused on biomarkers of MS derived from omics and phenotypical data (34.5% clinical, 33.3% biological, 23.0% physiological, and 9.2% drug response). Algorithmic choices depended both on the amount of data available for supervised ML (91.5%; 49.2% classification and 42.3% regression) and on the need to justify the resulting decision-making principles in healthcare settings. Accordingly, algorithms based on decision trees and support vector machines were commonly used, and random forests achieved the highest average performance (89.9% AUC) compared with other ML algorithms.
CONCLUSIONS: ML is applicable to determining how candidate biomarkers perform in the assessment of disease severity. However, applying ML research to develop decision aids that help clinicians optimize treatment strategies and analyze treatment responses in individual patients requires appropriate data resources and shared experimental protocols. These should aim to move beyond the isolated classification of signals or natural language towards both holistic analyses across data modalities and clinically meaningful differentiation of disease.


Keywords: Deep learning; Disease progression; Medical informatics; Multiple sclerosis; Prognosis; Supervised machine learning; Systematic review


Substances: Biomarkers

Year: 2022; PMID: 36109726; PMCID: PMC9476596; DOI: 10.1186/s12911-022-01985-5

Source DB: PubMed; Journal: BMC Med Inform Decis Mak; ISSN: 1472-6947; Impact factor: 3.298

Background

Multiple sclerosis (MS) is a condition affecting the central nervous system (CNS) characterised by a mixture of inflammation and neurodegeneration. Several disease patterns (also known as phenotypes) are recognized, including, but not limited to, relapsing-remitting MS (RRMS) and secondary progressive MS (SPMS), but the clinical course varies considerably among individuals [1]. In recent years, the number of treatments available to reduce inflammatory processes has increased dramatically: these agents can be very effective in suppressing clinical disease activity, but they are not effective in all patients and many of them are associated with an appreciable risk of significant side effects. This has resulted in a drive towards personalised treatment for people living with MS (PwMS); ideally, individuals should be provided with (i) an accurate prognosis at the time of diagnosis, (ii) optimization of initial treatment decisions, and (iii) greater precision in following up the response to treatment and, therefore, early detection of the need to modify a particular treatment regimen [2]. To manage these three components, it is essential to discover an accurate, objective way of measuring overall disease severity, or status. However, in common with many neurological conditions, MS still lacks such a measure. Diagnosis is based on a combination of clinical features and information obtained from diagnostic tests, most notably magnetic resonance imaging (MRI) [3]. Clinical disease severity is generally quantified using the Expanded Disability Status Scale (EDSS), MS Severity Score (MSSS), or MS Functional Composite (MSFC) [4, 5], but these tools have drawbacks: each of them suffers from intra-subject and intra-observer variability, and the EDSS and MSSS are biased towards the motor domain [6]. Accordingly, there has been a search for a biomarker of MS that would facilitate more accurate and objective definition of disease severity/status.
A biomarker has been defined as “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” [7]. MRI is currently the most widely used biomarker in MS. However, it is not ideal: abnormalities on MRI are not well correlated with clinical manifestations of disease; it is expensive, invasive, and time-consuming; and it requires patients to travel to MRI scanners. Hence, several alternative biomarkers, ranging from blood or breath analysis to cognitive measures, are undergoing assessment in different centres [8–10]. Although research into a suitable clinical biomarker other than MRI has been extensive, no clear candidate that might complement, or replace, MRI has yet been found. An effective biomarker of MS would also contribute to a better overall health and healthcare experience for PwMS. Research examining the experiences of PwMS describes a lack of information and support, particularly at the time of diagnosis [11, 12], so that patients must make extensive personal efforts to meet their information needs during an already stressful time [13]. Experiences of uncertainty dominate this literature, both when considering treatment options and possible side effects and when dealing with the impact of MS on work, family, and social life [14, 15]. Identification of a reliable biomarker would help to address this uncertainty. The focus of this systematic review is to study machine learning (ML) as a way to support the discovery of biomarkers that can be measured regularly and inexpensively using non-invasive and readily accessible techniques, thus reducing the test burden on PwMS and optimizing early detection and treatment management.
ML refers to computational algorithms for gathering and making sense of evidence derived from large volumes of data, thereby permitting, or facilitating, human judgement and decision-making [16, 17] (see Supplementary Material A for further background on ML problems, supervised and unsupervised ML algorithms, and their timeline). ML has the potential to help in the search for a clinically useful biomarker because it can assess how well candidate biomarkers perform in the assessment of disease severity and prognosis, either individually or in combination. ML may also assist in developing decision-support techniques to aid clinicians and PwMS in making optimal individual treatment choices and in assessing the response to a chosen treatment. To determine how best to apply ML, it is important to begin by ascertaining what is already known. Comprehensive reviews of ML-assisted MRI analysis in MS have already been performed [18, 19]. However, to date, ML has been applied less frequently to other types of biomarkers [20]. This systematic review was therefore designed to investigate how ML has been applied to the study of potential non-MRI biomarkers in the management of MS, looking specifically at prognosis, disease severity, choice of treatment, and assessment of response to treatment.

Methods

The present systematic literature review, registered in the international prospective register of systematic reviews (PROSPERO) under number CRD42020163161, followed the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [21]. Eight resources (PubMed, Cochrane, Google Scholar¹, ScienceDirect, Scopus, Web of Science, Lens, and dblp) were used as the primary tools for indexing and retrieving publications, given their index size and retrieval reliability [22]. The search query was formed by combining the term “Multiple Sclerosis” with a number of ML-related terms, as described in Table 1. Depending on the resource, both general queries and more specific variants were used to maximize the number of relevant publications returned. Papers published from the introduction of generative adversarial networks (GANs; Supplementary Material A) [23] onwards (i.e., from 1 January 2014 to 31 January 2020) were considered.
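
The query pattern in Table 1 (a disease term combined by AND with an OR-group of method terms) can be assembled programmatically. The following is a minimal Python sketch; the helpers `quote` and `build_query` are hypothetical and not part of the review's protocol, and the exact query syntax accepted by each search resource may differ.

```python
def quote(term: str) -> str:
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return f'"{term}"'


def build_query(disease: str, terms: list) -> str:
    """Return '"disease" AND ("t1" OR "t2" ...)' (hypothetical helper)."""
    if not terms:
        raise ValueError("at least one method term is required")
    if len(terms) == 1:
        # A single method term needs no parentheses (cf. the dblp query).
        return f"{quote(disease)} AND {quote(terms[0])}"
    or_group = " OR ".join(quote(t) for t in terms)
    return f"{quote(disease)} AND ({or_group})"


if __name__ == "__main__":
    print(build_query("Multiple Sclerosis", ["machine learning", "machine intelligence"]))
    # "Multiple Sclerosis" AND ("machine learning" OR "machine intelligence")
```

Keeping the term list as data makes it easy to produce both the general query and the shorter resource-specific variants from one source.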

Table 1

“Multiple Sclerosis” and specific machine learning algorithms returned 1,052 studies from eight search resources

Search terms	Search resource	Number of returned studies
“Multiple Sclerosis” AND (“Machine Learning” OR “Machine Intelligence” OR “Deep Learning” OR “Decision Tree” OR “Random Forest” OR “Pattern Recognition” OR “Genetic Algorithm” OR “Supervised Algorithm” OR “Decision Support System” OR “Evolutionary Computation” OR “Neural Network” OR “Support Vector Machine” OR “Autoencoder*” OR “Deep Belief Network” OR “Adversarial Network” OR “Self Organizing Map” OR “Self Organising Map”)	PubMed	75
	Cochrane	25
	Google Scholar	100#
“Multiple Sclerosis” AND (“machine learning” OR “machine intelligence”)	ScienceDirect	340#
	Scopus	169
	Web of Science	179#
	Lens	160
“Multiple sclerosis” AND “machine learning”	dblp	4#
Total count		1,052

# Sorted by relevance

To ensure a low risk of bias, initial searches were conducted by three medical ML researchers. They performed independent searches (Table 1) using the protocol described below, and each collected a list of relevant publications. The decision to include or exclude any article not found relevant by all three reviewers was made through discussion until a consensus was reached. The following exclusion criteria (EC) were defined:

EC.1: Duplicates were removed.
EC.2: Publications that were not original, full, peer-reviewed papers (e.g., reviews, book chapters, surveys, and abstracts) were removed.
EC.3: Papers that were not about PwMS were removed.
EC.4: Papers that were not about ML were removed.
EC.5: Papers working solely on data from MRI, optical coherence tomography, visual perimetry, and/or lumbar puncture were removed, because these examinations either are not routinely conducted as standard clinical tests for MS or are not aligned with our focus on biomarkers that can be measured regularly and inexpensively using minimally invasive and readily accessible techniques.

The selection of the studies considered in this review was performed in four phases (Fig. 1). In the identification phase, the search keywords described above, constrained to the search time frame, were applied to the databases and returned 1,052 publications. In the screening phase, 368 publications were excluded as duplicates (EC.1) or non-original papers (EC.2), leaving 682 documents. In the eligibility phase, 355 papers were excluded because they did not consider MS and ML (EC.3 and EC.4). A further 261 papers were excluded because they looked at MRI or another pre-specified tool (EC.5).
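
The de-duplication (EC.1) and three-reviewer agreement steps can be illustrated with a short Python sketch. It assumes, hypothetically, that each retrieved record is a dict carrying a DOI and that each reviewer's list is a collection of paper identifiers; the helpers `deduplicate` and `split_by_agreement` are illustrative only and are not the review's actual tooling.

```python
def deduplicate(records):
    """EC.1 sketch: keep the first record for each DOI (case-insensitive).

    Assumes every record is a dict with a "doi" key (hypothetical format).
    """
    seen = set()
    unique = []
    for rec in records:
        key = rec["doi"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def split_by_agreement(*reviewer_lists):
    """Split papers flagged by the reviewers into (unanimous, disputed) sets.

    Unanimous papers were found relevant by every reviewer; disputed papers
    were flagged by at least one reviewer but not all, and would go to a
    consensus discussion.
    """
    if not reviewer_lists:
        raise ValueError("need at least one reviewer list")
    sets = [set(ids) for ids in reviewer_lists]
    flagged_by_any = set().union(*sets)
    unanimous = set.intersection(*sets)
    return unanimous, flagged_by_any - unanimous


if __name__ == "__main__":
    a = ["p1", "p2", "p3"]
    b = ["p1", "p2", "p4"]
    c = ["p1", "p2", "p3"]
    agreed, to_discuss = split_by_agreement(a, b, c)
    print(sorted(agreed), sorted(to_discuss))  # ['p1', 'p2'] ['p3', 'p4']
```

Only the disputed set needs human discussion, which keeps the consensus step small even when the three lists are long.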

Fig. 1

Flow chart of the systematic review process

Ultimately, 66 papers remained for analysis; the largest number of them were published in 2019, followed by 15 and 13 papers in 2018 and 2017, respectively (Fig. 2).

Fig. 2

Distribution of manuscripts by publication year. The total number of publications adds up to 68 because, of the 66 included publications, one discussed both diagnosis and MS sub-types and another discussed both diagnosis and prognosis

As a validity assurance method, these papers were assessed with respect to the guidelines for developing and reporting ML analyses and predictive models in biomedical and clinical research [24, 25] (see Additional file 2 for the outcomes). Because almost all criteria included in the guidelines were followed, no further exclusions were made.

Results

Table 2 (below) summarizes the 49 included papers that reported on applications towards supporting diagnosis, disease status assessment, MS sub-typing, and prognosis; Table 3 summarizes the 17 included papers that reported on other applications. Abbreviations are listed below each table.

Table 3

Summary of the included papers that reported on applications towards evaluating response to treatment, symptoms, or underlying pathophysiology together with those for improving measurement tools or support groups. Abbreviations as above in Table 2

Author	Data sources	ML methods	Outcomes
Response to treatment
Baranzini et al. [75]	IFN-β response	RF;	Accuracy in [75.0%, 82.0%];
			CASP2 / IL10 / IL12Rb1.
Ebrahimkhani et al. [76]	microRNA	LR; RF;	AUC in [65.2%, 91.1%].
Fagone et al. [77]	Genomics	UCSC;	Accuracy = 89.2%.
Karim et al. [78]	IFN-β response	CART; LASSO; SVM; LR;	Hazard Ratio[4] in [1.359, 1.372].
Kasatkin et al. [79]	Flu-like symptoms	NN; Static Model;	Sensitivity in [73.4%, 81.2%];
			Specificity in [71.6%, 80.6%].
Li et al. [80]	Cardiac data	DT;	Baseline heart rate (HR).
Üçer et al. [81]	IFN-β response	SNAc; SVM; KNN; RF; NB; LR; DT;	Accuracy in [63.1%, 64.5%];
			F1 score in [77.4%, 78.3%].
Walter et al. [82]	Costing data	DT;	NAb is cheaper than other tests.
Patrick et al. [83]	RNAs	GB; LR; RF; LASSO; DA; Nearest SC; WE;	AUC in [72.1%, 89.9%].
Exacerbation of symptoms
Bhattacharya et al. [84]	Daily activities	NN;	Fatigue.
Papakostas et al. [85]	EMG	SVM; RF; ET; Gradient-Boosting;	F1 Score in [75.1%, 77.8%].
Underlying pathophysiology
Chi et al. [86]	Genetic ancestry	LR; RF;	HLA-DRB1*15:01 and HLA-DRB1*03:01 alleles.
Forbes et al. [87]	Gut microbiota	RF;	Accuracy in [82.0%, 84.0%];
			AUC in [91.0%, 94.0%].
Improve measurement tools
Sébastien et al. [88]	Gait analysis	ET;	Accuracy in [70.9%, 91.7%].
Michel et al. [89]	Quality of life	DT; IRT;	Accuracy in [96.0%, 98.0%].
Improve support groups
Rezaallah et al. [90]	Social media text	NLP; NB;	6 topics related to MS medication.
Deetjen et al. [91]	Text data	LR; NB;	Accuracy in [91.6%, 96.0%];
			56% informational and 44% emotional for MS.

Accuracy = (TP + TN) / (TP + TN + FP + FN); FPR = FP / (FP + TN); Precision = TP / (TP + FP); F1 Score = 2 × (Recall × Precision) / (Recall + Precision); Sensitivity / Recall / TPR = TP / (TP + FN); Specificity = TN / (TN + FP); AUC = Area Under the ROC curve, calculated from the plot of TPR vs. FPR; CART = Classification and Regression Tree; DA = Discriminant Analysis; DT = Decision Tree; ET = Extra-Trees; FN = False Negatives; FP = False Positives; FPR = False Positive Rate; GA = Genetic Algorithm; GAIMS = Gait Analysis Imaging System; GB = Gradient Boosting; GLM = Generalized Linear Model; IP-GRASP = A Greedy Randomized Adaptive Search Procedure with memory; IRT = Item Response Theory; KNN = k-nearest Neighbour; LASSO = Least absolute shrinkage and selection operator; LR = Logistic Regression; LS = Least Squares; ML = Machine Learning; MRI = Magnetic Resonance Imaging; NB = Naïve Bayes; NLP = Natural Language Processing; NN = Neural Network; OS-ELM = Online Sequential Extreme Learning Machine; QoL = Quality of Life; RF = Random Forest; RMSE = Root Mean Square Error; ROC = Receiver Operating Characteristic; RR = Relapsing-Remitting Multiple Sclerosis; SC = Shrunken Centroid; SOM = Self-Organising Map; SNAc = Social Network Analysis-based Classifier; SSL = Semi-supervised Learning; SVM = Support Vector Machines; TN = True Negatives; TP = True Positives; TPR = True Positive Rate; CA = Candida Albicans; CAO = Clinician Assessed Outcomes; CFS = Chronic Fatigue Syndrome; CIS = Clinically Isolated Syndrome; EDSS = Expanded Disability Status Scale; EEG = Electroencephalogram; EMG = Electromyogram; EMR = Electronic Medical Record; ERPs = Event Related Potentials; HC = Healthy Controls; IM & NO = Immune-inflammatory, Metabolic, and Nitro-Oxidative; KP = Kynurenine Pathway; lncRNAs = long non-coding RNAs; ME = Myalgic Encephalomyelitis; MEP = Motor Evoked Potentials; MS = Multiple Sclerosis; NAb = Neutralising Antibodies; PP = Primary-Progressive Multiple Sclerosis; PRO = Patient Reported Outcomes; PwMS = people living with MS; rRNA = Ribosomal Ribonucleic Acid; SP = Secondary-Progressive Multiple Sclerosis; without MS = people living without Multiple Sclerosis; WE = Word Embedding; C6ORF10 = Chromosome 6 Open Reading Frame 10; CASP2 = Caspase 2, Apoptosis-Related Cysteine Peptidase; CCR5 = C-C Chemokine Receptor Type 5; CD69 = CD69 Antigen (P60, Early T-Cell Activation Antigen); CRHR1 = Corticotropin Releasing Hormone Receptor 1; CXCR4 = C-X-C Motif Chemokine Receptor 4; GM-CSF = Granulocyte-Macrophage Colony-Stimulating Factor; HLA-DRB1 = Human Leukocyte Antigen haplotype, DR beta 1; IFN-β = Interferon beta; IFN-γ = Interferon gamma; IL2 = Interleukin 2, T Cell Growth Factor; IL4 = Interleukin 4; IL10 = Interleukin 10; IL12Rb1 = Interleukin 12 Receptor Subunit Beta 1; IL13 = Interleukin 13; TAP2 = Transporter 2, ATP Binding Cassette Subfamily B Member; TNF = Tumor Necrosis Factor; TNFSF10 = Tumor Necrosis Factor (ligand) superfamily, member 10; STAT3 = Signal Transducer and Activator Of Transcription 3;

The 66 included studies explored the application of ML to MS for purposes ranging from diagnosis and prognosis to measuring disease status and severity levels (Tables 2 and 3; Additional file 3). They largely followed the recommended reporting guidelines [24, 25], which cover everything from what to include when reporting predictive models in biomedical research to how to present standardized results of ML methods succinctly. In these studies, algorithmic choices depended both on the amount of data available for supervised ML and on the need to justify the resulting decision-making principles in healthcare settings. Typically, datasets with fewer than 200 PwMS were available for supervised ML and, therefore, support vector machines (SVMs) and decision tree-based algorithms were common (Figs. 3 and 4; Additional file 1).
These ML applications focused on biomarkers of MS, ranging from those derived from omics and phenotypical data (e.g., cognitive, balance, gait, or other clinical tests) to patients’ self-reported assessments (Figs. 5 and 6).
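
Counts such as those in Fig. 4 (number of studies per ML algorithm) can in principle be derived from the semicolon-separated "ML methods" column of Tables 2 and 3. A minimal Python sketch follows, using the four "Diagnosis vs other diseases" rows of Table 2 as input; the helpers `parse_methods` and `studies_per_method` are hypothetical, and the review's own grouping (e.g., random forests together with other decision trees in Fig. 3) may differ from this raw per-label tally.

```python
from collections import Counter

# "ML methods" cells copied from the "Diagnosis vs other diseases" rows of Table 2.
ROWS = {
    "Bang et al. [46]": "SVM; KNN; LogitBoost; Logistic Tree;",
    "Guo et al. [47]": "KNN; SVM; NB; NN; LR; RF;",
    "Ohanian et al. [48]": "DT;",
    "Ostmeyer et al. [49]": "Optimize Log Likelihood;",
}


def parse_methods(cell):
    """Split a semicolon-separated cell into a set of method labels."""
    return {m.strip() for m in cell.split(";") if m.strip()}


def studies_per_method(rows):
    """Count studies per method label; each study counts once per label."""
    counts = Counter()
    for cell in rows.values():
        counts.update(parse_methods(cell))
    return counts


if __name__ == "__main__":
    for method, n in studies_per_method(ROWS).most_common():
        print(f"{method}: {n}")
```

Parsing into a set per study ensures that a method listed twice in one row is still counted only once for that study.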

Table 2

Author	Data sources	ML methods	Outcomes
Diagnosis vs normal
Ahmadi et al. [26]	EEG	OS-ELM;	Accuracy in [90.0%, 91.0%].
Andersen et al. [27]	Metabolomics	LR; RF;	AUC in [81.0%, 86.0%].
Bertolazzi et al. [28]	Genes	KNN; SVM; DT;	Accuracy in [92.0%, 95.0%].
Broza et al. [29]	Breath markers	NN;	Accuracy in [72.0%, 90.0%];
			AUC in [79.0%, 87.0%].
Chase et al. [30]	Medical records	NB; NLP;	AUC in [90.0%, 94.0%].
deAndrés-G. et al. [31]	Genetic pathways	Distance-based classifier;	Accuracy in [93.8%, 98.2%].
		Minimum spanning tree;	Neurogenesis and Hemoglobin related genes.
Galli et al. [32]	Lymphocytes	NN;	TNF, GM-CSF, IFN-γ, IL2, and CXCR4.
Goldstein et al. [33]	SNP	RF; LASSO; GLM; KNN; LR;	CRHR1.
Goyal et al. [34]	Cytokines	SVM; NN; DT; RF;	Accuracy = 90.9%; AUC = 95.7%.
Lötsch et al. [35]	Lipid markers	SOM; AdaBoost; KNN; RF;	Accuracy in [92.5%, 100%]; AUC in [92.5%, 100%].
Lötsch et al. [36]	Lipid markers	SOM;	Accuracy in [77.0%, 94.6%]; Ceramides.
Perera et al. [37]	Tremor	Linear Regression; SVR; RF;	Accuracy in [84.2%, 90.8%]; Velocity of index finger.
Prabahar et al. [38]	MicroRNA	SVM;	Accuracy in [87.8%, 90.1%].
Severini et al. [39]	Balance board	SVM;	Accuracy in [83.3%, 85.5%].
Telalovic et al. [40]	lncRNAs	RF;	Accuracy in [61.5%, 84.6%].
Torabi et al. [41]	EEG	SVM; KNN;	Accuracy in [79.8%, 93.1%].
Zhang et al. [42]	Genetic pathways	SVM;	Accuracy in [61.2%, 70.3%].
Kiiski et al. [43]	ERPs	Linear Regression;	Visual task is better than auditory task.
Saroukolaei et al. [44]	Enzymes	Linear Regression; NN;	Higher CA.
Sun et al. [45]	Postural sway	RF;	Accuracy in [92.3%, 95.6%].
Diagnosis vs other diseases
Bang et al. [46]	Gut microbial	SVM; KNN; LogitBoost; Logistic Tree;	Accuracy in [96.4%, 98.3%].
Guo et al. [47]	Transcriptomics	KNN; SVM; NB; NN; LR; RF;	Accuracy in [77.2%, 86.4%];
			TNFSF10 is allied to the PwMS.
Ohanian et al. [48]	Key symptoms	DT;	Accuracy in [79.2%, 81.2%];
			Immune domain is useful in this case.
Ostmeyer et al. [49]	B-cell receptor	Optimize Log Likelihood;	Accuracy in [72.0%, 87.0%].
Disease status
Azrour et al. [50]	Gait analysis	DT;	EDSS score in [< 0.97 (No MS), >4.15 (MS)].
Fritz et al. [51]	Falls risk	LR;	Fallers and near-fallers are at similar risks.
Gudesblatt et al. [52]	Falls risk	RF;	Accuracy in [82.9%, 91.2%];
			F1 score in [78.9%, 91.3%].
Haider et al. [53]	Body movements	SVM; KNN; RF;	Accuracy in [95.5%, 100%].
Jackson et al. [54]	Genetic markers	RF;	19 genetic variants.
Kosa et al. [55]	Clinical data, MEP	GA;	CombiWISE is better than MRI measures.
McGinnis et al. [56]	Gait speeds	SVR;	RMSE speed in [0.12 m/s, 0.14 m/s].
Morrison et al. [57]	Motor assessment	DT; SVM;	Visualisation reduces the gap between human and ML.
Shahid et al. [58]	Clinical data	KNN; SVM; RF; Rough Set;	Accuracy in [79.7%, 84.0%].
Supratak et al. [59]	Walking speed	SVR;	Walking speed in [0.57 m/s, 1.22 m/s].
MS sub-types
Acquarelli et al. [60]	Pathology	NLP; Clustering;	Pathological profiles and disease duration.
Fiorini et al. [61]	Clinical data	LS; LR; SVM; KNN;	Accuracy in [75.0%, 78.3%];
			F1 score in [62.3%, 70.2%].
Gronsbell et al. [62]	EMR	SSL;	Accuracy in [92.9%, 93.9%].
Gupta et al. [63]	Microbiomics	RF;	Specificity = 86.4%; Sensitivity = 45.4%.
Lim et al. [64]	Kynurenine	DT; DA; CART; SVM;	Accuracy in [83.0%, 91.0%].
Lopez et al. [65]	Genetic signatures	Clustering;	CD69, CCR5, IL13, and STAT3.
Prognosis
Bejarano et al. [66]	Clinical, MEP	NB; NN; LR; DT; Linear Regression;	Accuracy in [67.0%, 80.0%]; AUC in [65.0%, 76.0%].
Brichetto et al. [67]	Clinical data	Supervised Algorithms;	Accuracy in [82.6%, 86.0%].
Briggs et al. [68]	Clinical data	LASSO;	Obesity and smoking.
Flauzino et al. [69]	Clinical data	LR; NN;	AUC = 84.2%; Lower IL4.
Pruenza et al. [70]	Clinical data	RF;	AUC in [80.0%, 82.0%].
Tacchella et al. [71]	Clinical data	RF;	AUC in [69.6%, 72.5%].
Yperman et al. [72]	MEP	RF; LR;	AUC in [72.0%, 75.0%].
Zhao et al. [73]	Clinical data	SVM; LR;	Accuracy in [68.0%, 73.0%].
Zhao et al. [74]	Clinical data	SVM; KNN; AdaBoost;	Accuracy in [76.0%, 90.0%].

CART = Classification and Regression Tree; DA = Discriminant Analysis; DT = Decision Tree; ET = Extra-Trees; FN = False Negatives; FP = False Positives; FPR = False Positive Rate; GA = Genetic Algorithm; GAIMS = Gait Analysis Imaging System; GB = Gradient Boosting; GLM = Generalized Linear Model; IP-GRASP = A Greedy Randomized Adaptive Search Procedure with memory; IRT = Item Response Theory; KNN = k-nearest Neighbour; LASSO = Least absolute shrinkage and selection operator; LR = Logistic Regression; LS = Least Squares; ML = Machine Learning; MRI = Magnetic Resonance Imaging; NB = Naïve Bayes; NLP = Natural Language Processing; NN = Neural Network; OS-ELM = Online Sequential Extreme Learning Machine; QoL = Quality of Life; RF = Random Forest; RMSE = Root Mean Square Error; ROC = Receiver Operating Characteristic; RR = Relapsing-Remitting Multiple Sclerosis; SC = Shrunken Centroid; SOM = Self-Organising Map; SNAc = Social Network Analysis-based Classifier; SSL = Semi-supervised Learning; SVM = Support Vector Machines; TN = True Negatives; TP = True Positives; TPR = True Positive Rate;

CA = Candida Albicans; CAO = Clinician Assessed Outcomes; CFS = Chronic Fatigue Syndrome; CIS = Clinically Isolated Syndrome; EDSS = Expanded Disability Status Scale; EEG = Electroencephalogram; EMG = Electromyogram; EMR = Electronic Medical Record; ERPs = Event Related Potentials; HC = Healthy Controls; IM & NO = Immune-inflammatory, Metabolic, and Nitro-Oxidative; KP = Kynurenine Pathway; lncRNAs = long non-coding RNAs; ME = Myalgic Encephalomyelitis; MEP = Motor Evoked Potentials; MS = Multiple Sclerosis; NAb = Neutralising Antibodies; PP = Primary-Progressive Multiple Sclerosis; PRO = Patient Reported Outcomes; PwMS = people living with MS; rRNA = Ribosomal Ribonucleic Acid; SP = Secondary-Progressive Multiple Sclerosis; without MS = people living without Multiple Sclerosis; WE = Word Embedding;

C6ORF10 = Chromosome 6 Open Reading Frame 10; CASP2 = Caspase 2, Apoptosis-Related Cysteine Peptidase; CCR5 = C-C Chemokine Receptor Type 5; CD69 = CD69 Antigen (P60, Early T-Cell Activation Antigen); CRHR1 = Corticotropin Releasing Hormone Receptor 1; CXCR4 = C-X-C Motif Chemokine Receptor 4; GM-CSF = Granulocyte-Macrophage Colony-Stimulating Factor; HLA-DRB1 = Human Leukocyte Antigen haplotype, DR beta 1; IFN-β = Interferon beta; IFN-γ = Interferon gamma; IL2 = Interleukin 2, T Cell Growth Factor; IL4 = Interleukin 4; IL10 = Interleukin 10; IL12Rb1 = Interleukin 12 Receptor Subunit Beta 1; IL13 = Interleukin 13; TAP2 = Transporter 2, ATP Binding Cassette Subfamily B Member; TNF = Tumor Necrosis Factor; TNFSF10 = Tumor Necrosis Factor (ligand) superfamily, member 10; STAT3 = Signal Transducer and Activator Of Transcription 3;
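
The performance metrics reported in Tables 2 and 3 (accuracy, sensitivity, specificity, FPR, precision, F1 score, and AUC) can be checked with a small Python sketch. The confusion-matrix counts and scores in the usage example are hypothetical, and the AUC is computed here with the pairwise (Mann–Whitney) formulation, which equals the area under the empirical ROC curve of TPR vs. FPR; the included studies may have used other implementations.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Metrics from the table footnotes, computed from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)   # recall / TPR
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
        "precision": precision,
        "f1": 2 * (sensitivity * precision) / (sensitivity + precision),
    }


def auc(pos_scores, neg_scores):
    """AUC as the probability that a randomly chosen positive case scores
    higher than a randomly chosen negative one (ties count as 1/2).
    This equals the area under the empirical ROC curve (TPR vs. FPR)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts: 50 PwMS and 50 controls in a test set.
    m = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
    print({k: round(v, 3) for k, v in m.items()})
    print(round(auc([0.9, 0.8, 0.4], [0.3, 0.5]), 3))  # 0.833
```

The pairwise AUC is quadratic in the number of cases, which is negligible for cohorts of fewer than 200 participants; larger datasets would normally use a rank-based or library implementation.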

Fig. 3

Sunburst chart of machine learning algorithms applicable to multiple sclerosis studies

Fig. 4

Histogram of machine learning algorithms in multiple sclerosis studies. The y-axis refers to the number of studies

Fig. 5

Sunburst chart of machine learning applications and data in multiple sclerosis studies

Fig. 6

Histogram of data for ML applications. The y-axis refers to the number of studies

Aims and outcomes of applications

ML applications to differentiate PwMS from controls emphasized the benefits of a diversity of data sources in the search for a clinically useful biomarker of MS (Table 2 and Additional file 3). This differentiation problem was studied in as many as 20 of the 66 included studies (30.3%) [26–45]. These studies claimed an accuracy of over 90.0% for ML applied to medical records [30], electroencephalogram (EEG) signals [26, 41], tremor or postural-sway measurements [37, 45], and omics data [28, 34–36, 38]. Decision trees [28, 34], random (decision) forests [34, 35, 37, 45], SVMs [28, 34, 37, 38, 41], neural networks (NNs) [26, 34], self-organizing maps (SOMs) [35, 36], and the naïve Bayes algorithm [30] achieved the best learning performance. Analyzing the contribution of data sources, modalities, and featurizations to ML performance, studies [32, 33, 36, 37, 44] supported the possibility of measuring and evaluating stress, anxiety, depression, obesity, and/or inflammatory markers² as diagnostic biomarkers of MS.
Studies on diagnostic applications of ML to distinguish MS from other neurological diseases were less common, but they added genomics and gut microbial data to our list of promising diagnostic biomarkers of MS (Table 2 and Additional file 3). Four studies (6.1%) addressed diagnostic applications of ML to distinguish MS from other neurological [47, 49] or medical diseases³ [46, 48]. These ML applications analyzed biological [46, 47, 49] or clinical data [48]. However, an ML accuracy of over 90.0% was reached only by analyzing gut microbial data with the LogitBoost classification algorithm [46].
Applications of ML to measuring MS status further encouraged our search for disease biomarkers that can be measured more regularly and inexpensively than with MRI (Table 2 and Additional file 3). ML was applied to measuring MS status through disability scoring or severity-level computation in eleven studies (16.7%). Data analyzed by these applications were drawn from clinical [55, 58], physical [45, 50–52, 56, 59], physiological [53, 55, 57], and genetic [54] sources. However, the only applications to exceed an accuracy of 90.0% were those based on assessing body movements [53] or falls risk [52] using random forests and SVMs. Moreover, one included study concluded that falls risk should be incorporated into the assessment of MS disease status [51].⁴ Interestingly, when considering longitudinal changes in progressive MS, the sensitivity⁵ of the Combinatorial WeIght-adjuStEd (CombiWISE) disability score, which integrates four clinical scales⁶, was consistently better than that of MRI [55].
ML applications to recognize MS sub-types or clinical courses emphasized the role of medical records and omics data in the biomarker search (Table 2 and Additional file 3); these courses include RRMS, Primary-Progressive MS (PPMS), and SPMS, each of which might be mild, moderate, or severe. MS sub-typing was addressed in seven studies (10.6%) by analyzing clinical [60–62] and biological [44, 63–65] data. However, an accuracy of over 90.0% was reported only when using data from medical records [62] or omics⁷ [64]. Again, decision trees and SVMs achieved the best ML performance.
In the same vein, ML was applied to assess MS prognosis. SVMs classifying clinical data outperformed other algorithms and data sources, and the conclusions suggested incorporating obesity and smoking history and status (Table 2 and Additional file 3). MS prognosis was addressed in ten studies (15.2%) by analyzing clinical [66–71, 73, 74] and physiological [43, 66, 72] data.
In this application category, only one study reported an accuracy of 90.0% [74]: it used an SVM classifier on clinical data. Nevertheless, weaker evidence implicating obesity and smoking data as biomarkers of MS was provided in the context of applying the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm to disability prediction [68].
Omics and physiological data, together with data from medical records, were promising when applying ML to the treatment of MS. Nine studies (13.6%) examined responses of PwMS to treatment (Table 3 and Additional file 3). These studies analyzed responses to drugs including interferon beta (IFN-β) [75, 78, 79, 81, 82], fingolimod [76, 80], natalizumab [77], and glatiramer acetate [83]. The Area Under the receiver operating characteristic Curve (AUC) exceeded 90.0% only once [76]: this study classified micro ribonucleic acid (microRNA) data using random forests. Finally, baseline heart rate⁸ [80] and, for IFN-β treatment, triplet testing of Caspase 2, Apoptosis-Related Cysteine Peptidase (CASP2), Interleukin 10 (IL10), and Interleukin 12 Receptor Subunit Beta 1 (IL12Rb1) [75] were the strongest predictors of response to MS treatments.
The remaining studies contributed to our biomarker search by looking at fatigue measurement and by stressing the strengths of omics and gut microbiome data (Table 3 and Additional file 3). Four included studies (6.1%) targeted exacerbation of symptoms [84, 85] or underlying pathophysiology [86, 87]. Fatigue was a main source of impaired quality of life [84, 85], and certain genetic patterns⁹ were highly associated with PwMS [86]. In addition, particular patterns of gut microbial pathogens¹⁰ were found in MS [87].
Another four studies (6.1%) aimed to improve support groups for PwMS by using natural language processing (NLP) to explore online forum posts^11 or patients’ experiences with MS medication [90, 91] or, alternatively, by using decision-tree and extra-tree algorithms to enhance measurement tools for walking patterns or quality-of-life assessments [88, 89].
Figure: Sunburst chart of machine learning algorithms applicable to multiple sclerosis studies.
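Random forests reached the highest accuracies for assessing body movements and falls risk, and the only AUC above 90.0% for predicting treatment response from microRNA data. The following sketch illustrates how the method works, under simplifying assumptions. It bags single-split decision "stumps" instead of fully grown trees. Each stump is fitted to a bootstrap sample using a random subset of features, and the forest predicts by majority vote. The function names, the tree depth, and the toy data are hypothetical. Practical analyses typically rely on a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Pick the single (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between neighbouring observed values
            left = [t for row, t in zip(X, y) if row[f] <= threshold]
            right = [t for row, t in zip(X, y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold, majority(left), majority(right))
    if best is None:  # no usable split: predict the majority class everywhere
        label = majority(y)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, int(n_features ** 0.5))  # common sqrt(p) heuristic for classification
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        features = rng.sample(range(n_features), k)
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict(forest, row):
    """Majority vote over the stumps' predictions."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)
```

With depth-one trees, choosing features once per tree is equivalent to choosing them at each split. Deeper trees would draw a fresh feature subset at every node.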

ML methods and ML datasets

An overview of the included articles by ML method is presented in Fig. 3 (details in Tables 2 and 3 and Additional file 3). Most included studies employed supervised ML algorithms (91.5%), and only a few proposed unsupervised solutions (4.6%). Supervised ML comprised both classification algorithms [49.2%; including, but not limited to, random forests and other decision trees (30.8%) as well as K nearest neighbor (KNN) and other KNN-type algorithms based on distances to nearest neighbors (8.5%)] and regression algorithms [42.3%; including, but not limited to, SVMs (15.4%) and logistic regression (10.8%)]. Applications of more recent advances in NNs (6.9%) were rare, possibly because of the limited amount of labelled input-output training data available for ML, the requirement to be able to justify decision-making principles in healthcare, or slow adoption of these algorithms by researchers in medical informatics and decision-making. Our further breakdown (Fig. 4) showed that researchers considered decision trees, SVMs, regression models, NNs, and KNN-type ML algorithms for diagnosing MS. They usually used decision trees and SVMs for measuring disease status, and mostly considered decision trees and regression algorithms for measuring responses to treatment and MS progression. Typically, ML evaluation was conducted using hold-out methods, with the aim of making optimal use of all annotated data available for ML.
Figure: Histogram of machine learning algorithms in multiple sclerosis studies. The y-axis refers to the number of studies.
For our quantitative analysis of ML algorithms, we report the average AUC, accuracy, and F1 score from the studies' performance evaluations; on this basis, random forests and NNs were shortlisted among the best-performing ML methods, each with an average AUC above 80%.^12 Random forests, the algorithm most commonly considered by the included studies, achieved an average AUC of 89.9%, accuracy of 81.5%, and F1 score of 78.1%. NNs had an average AUC of 81.3% and accuracy of 84.8%; SVMs had an accuracy of 79.7% and F1 score of 77.5%; KNNs had an accuracy of 76.8%; and decision trees had an accuracy of 76.7%. Furthermore, 68% of studies reported validation strategies including k-fold, leave-one-out, and nested cross-validation. Overall, most studies deployed supervised ML to predict future trends of MS, and ML models based on decision trees (i.e., random forests) performed best and were most commonly used. Clinical data were particularly useful sources for ML-based predictive models, but we identified room for also exploring physiological and biological data for measuring MS prognosis and distinguishing between MS sub-types (Fig. 5). Clinical datasets, such as demographic data, patient-reported outcomes (PROs, i.e., direct responses from patients and controls), clinician-assisted outcomes (CAOs, i.e., responses provided via a clinician acting as intermediary), and electronic medical records (EMRs), were used to separate PwMS from controls. PROs and CAOs describe or reflect how a patient feels, functions, or survives, whereas EMRs can be interrogated to extract demographic and clinical data, including prescriptions, pathological diagnoses, and medication usage. Researchers mostly used biological data to support MS diagnosis and to measure response to treatment (Fig. 6). Physiological (and physical) data were used in computer-assisted MS diagnosis and in measuring MS disease status.
Clinical data were predominantly used for measuring MS prognosis and disease status and for distinguishing among MS sub-types.
Figure: Sunburst chart of machine learning applications and data in multiple sclerosis studies.
Included studies considered both cross-sectional and time-series data from, for example, clinical, physiological, and biological sources, for purposes ranging from diagnosis and prognosis to measuring disease status and severity (Fig. 5). For the analyses, clinical data (34.5%) were most commonly used, followed by biological data (33.3%) and physical and physiological data (23.0%). These applications were typically siloed by data type (e.g., natural language or biological signals), and multi-modal analyses had not been studied.
Figure: Histogram of data for ML applications. The y-axis refers to the number of studies.
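This section uses several pieces of evaluation vocabulary: hold-out evaluation, KNN-type classification by distance to nearest neighbors, and the accuracy, F1 score, and AUC averaged across studies. The sketch below makes them concrete by evaluating a K nearest neighbor classifier on a hold-out split. It assumes binary labels (1 = MS, 0 = control), k = 3, a 0.5 decision threshold, and a split stratified by class so that both classes appear in the test set. These choices, the toy data, and the function names are illustrative assumptions, not taken from any included study.

```python
import math
import random


def knn_score(train_X, train_y, query, k=3):
    """Fraction of the k nearest training points (Euclidean distance) labelled 1."""
    nearest = sorted(zip((math.dist(x, query) for x in train_X), train_y))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def stratified_holdout_split(X, y, test_fraction=0.3, seed=0):
    """Reserve a fixed fraction of each class as the hold-out test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate_knn_holdout(X, y, k=3, test_fraction=0.3, seed=0):
    """Train on the training part, then score the untouched hold-out part."""
    train_X, train_y, test_X, test_y = stratified_holdout_split(X, y, test_fraction, seed)
    scores = [knn_score(train_X, train_y, q, k) for q in test_X]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    return {
        "accuracy": accuracy(test_y, preds),
        "f1": f1_score(test_y, preds),
        "auc": roc_auc(test_y, scores),
    }
```

Accuracy and F1 are computed from thresholded predictions. The AUC instead uses the continuous neighbor-vote scores, which is why a study can report a high AUC alongside a more modest accuracy. For example, with ten points near (0, 0) labelled 0 and ten near (10, 10) labelled 1, `evaluate_knn_holdout` returns 1.0 for all three metrics.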

Discussion

Overall, the included studies had many different purposes: most of them were developed to support the diagnosis of MS (30.3%; 20 out of 66), followed by measuring disease status (16.7%; 11/66), prognosis (15.2%; 10/66), response to treatment (13.6%; 9/66), and distinguishing MS sub-types (10.6%; 7/66), among others. Promising data sources in the search for MS biomarkers included medical records and other clinical data (e.g., medications, pathology, and clinical history and status); EEG, tremor, postural sway, heart rate, and/or other physiological data; the EDSS, Scripps neurological rating scale, 25-foot walk, 9-hole peg test, and/or other disability-scoring data; genetics and/or other omics data; and gut microbiome and other biological data. The most promising biomarkers themselves consisted of measurements and evaluations of fatigue, stress, anxiety, depression, body movements, falls risk, inflammatory markers, disability, smoking variables, obesity, and/or inducing apoptosis. However, most studies focused on only one of these sources and biomarker types, which has potential drawbacks. For example, in studies investigating immunological markers [92–94], it is not surprising that mediators of inflammation such as cytokines [34] or genes associated with inflammation such as TNFSF10 [47] were predictive of MS versus non-MS, given the inflammatory nature of MS. The general problem is to distinguish MS-related inflammation from other inflammatory aetiologies. The majority of included studies focused on either diagnosis or prognosis without addressing treatment. These studies suggest that it might be possible to discover biomarkers for measuring MS status that are less invasive and less expensive than MRI.
However, bridging the gap between health science and data science calls for appropriate data resources and more holistic multimodal solutions, so that research can progress beyond classification that differentiates people living with and without MS and/or measures MS progression. In particular, finding biomarkers to monitor treatment appears to be an understudied topic. Our systematic review suggests that applications of ML to MS have yet to adopt the latest ML algorithms and to make full use of these computational modelling methods, which might support clinicians’ judgement and decision-making. Overall, we found that NNs, SVMs, and decision-tree-based algorithms performed best at differentiating PwMS from controls and recognizing MS sub-types or clinical courses. We believe this is explained by their tolerance of relatively small amounts of training data and/or by ML researchers’ devotion to careful feature engineering [95, 96]. In general, applications of ML to MS are constrained by the limited amount of annotated data available, and as a result, the latest advances in deep NNs have yet to gain popularity. Another technical gap that we identified was the lack of time-series and longitudinal datasets that would allow the study of hidden Markov models, recurrent NNs, and other sequential ML methods. One effective approach to facilitating progress would be to organize and support the design, creation, release, and use of experimental protocols (e.g., guidelines for developing and reporting ML analyses in clinical research by [24] and [25]), shared datasets (e.g., MSBase [97] and MS Floodlight Open [98]), and other community resources (e.g., shared tasks, computational challenges, evaluation campaigns, or hackathons such as the Intelligent Disease Progression Prediction at the 2022 Conference and Labs of the Evaluation Forum by Brainteaser [99], which targets amyotrophic lateral sclerosis and MS).
Although the 66 included studies followed the cited guidelines carefully in their reporting, comparing their aims, outcomes, and ML methods would benefit from shared experimental protocols supported by more standardized evaluation. More widely in biomedical NLP, community initiatives of this kind, with published problem specifications; training and test data; data processing, visualization, and evaluation code and software; and benchmark evaluations and lab overviews, have been successful in establishing strong ecosystems across professions and disciplines to conceptualize clinically meaningful problems and introduce ML methods that have become the new state-of-the-art solutions [100–104]. Their use has also enhanced the replicability and reproducibility of biomedical research [105–108]. In addition, their use has facilitated the transfer of technology to clinical practice [109] by viewing data as a holistic, trustworthy source of information for clinical purposes [110]. We recognize two main limitations of this review. First, ML has been extensively applied to MRI, but MRI was deliberately excluded from the current study in order to assess the possibility of finding an alternative to expensive, invasive, and time-consuming MRI. For recently published reviews of ML applications to MRI and their potential in clinical settings, see [18, 19]. Second, the review excluded classical statistics algorithms. We refer the reader to [111] for more information about the theoretical and experimental similarities and differences between these and ML algorithms in the context of neuroscience. Improving the capacity to differentiate RRMS from other subtypes of MS, and to rate disease severity and prognosis, would significantly reduce the levels of uncertainty described by PwMS. This includes uncertainty related to future disease progression [13, 90, 91], whether to have children [92, 93], and fears of becoming a burden [94, 112].
However, alleviating uncertainty for some might mean removing a source of hope that one’s condition might not be as severe as other people’s [95]. The capacity of ML to inform treatment decisions could therefore provide enormous benefit to PwMS, whose current choices often constitute a trade-off between potential side effects and limited information about efficacy, making decisions difficult [96, 113]. The collection of adequate quantities of high-quality data requires the engagement of PwMS and a willingness on their part to participate, preferably over long periods of time, so that ongoing data can be collected. While the use of technology to monitor MS is becoming more common (e.g., the smartwatch- and smartphone-based SmartMS Floodlight App [98]) [114], its use brings both benefits and costs to the wearer [15]. In particular, technology often requires frequent calibration [115–117], intrudes on daily activities [115, 116], and acts as a constant reminder of chronic health conditions [118]. While the benefits to scientists of having access to large quantities of data may be obvious, it is essential that we understand the implications for vulnerable users, such as PwMS [119, 120]. We believe ML has the potential to be very useful in the search for a non-MRI biomarker of MS if applied appropriately. To maximize the potential of ML in this way, we suggest expanding the size of the datasets studied. This can be facilitated, for example, by sharing data between different centres and by soliciting the direct involvement of PwMS through, e.g., open community resources and computational challenges.
As part of these efforts, extending the study of ML algorithms to the currently understudied deep learning and NNs in MS is advisable. Of the three top-performing ML algorithms, NNs, decision trees, and SVMs (average accuracy of 84.8%, 81.5%, and 79.7%, respectively), NNs accounted for only 6.9% of the ML algorithms deployed in the included studies, compared with 30.8% for decision trees and 15.4% for SVMs.
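The Discussion notes that the shortage of time-series and longitudinal datasets hampers sequential ML methods such as hidden Markov models. The following sketch illustrates why these models need repeated measurements per person. The forward algorithm scores a whole sequence of visits by summing over every possible path of unobserved disease states. All of the following are hypothetical and chosen only for illustration: the two states ("stable", "active"), the symptom-score observations ("low", "high"), and every probability. The sketch only evaluates a given model. Fitting such a model (for example, with the Baum-Welch algorithm) would require many such sequences.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observation sequence), summed over all hidden-state paths."""
    # alpha[s] = P(observations seen so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        # Propagate through the transition matrix, then weight by the new emission.
        alpha = {
            s: emit_p[s][obs] * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical two-state model of disease activity observed through a symptom score.
STATES = ("stable", "active")
START = {"stable": 0.8, "active": 0.2}
TRANS = {"stable": {"stable": 0.9, "active": 0.1},
         "active": {"stable": 0.3, "active": 0.7}}
EMIT = {"stable": {"low": 0.9, "high": 0.1},
        "active": {"low": 0.2, "high": 0.8}}
```

For instance, `forward_likelihood(["low", "high"], STATES, START, TRANS, EMIT)` returns approximately 0.146 for a person observed at two visits. The likelihoods of all four possible two-visit sequences sum to 1.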

Conclusions

ML is applicable to determining how candidate biomarkers perform in the assessment of MS and its severity. For instance, the random forest algorithm is both a common and well-performing choice, whilst deep learning advances have yet to become prevalent. However, applying ML research to clinically meaningful problems calls for creating appropriate data resources and shared experimental protocols. Such problems include developing decision-support tools that help clinicians optimize diagnosis and treatment strategies and analyze treatment responses in individual patients. In particular, progress in these health informatics applications seems to be hindered by the insufficient quantity and quality of data. This calls for developing appropriate data resources to proceed from classification to clinically meaningful differentiation of disease. It also calls for enabling more holistic analyses across data modalities, as opposed to segregated solutions for signal processing, natural language processing, and every other data type.

Additional file 1: Background on Machine Learning (PDF)
Additional file 2: Validity Evaluation Tables (Document)
Additional file 3: Detailed summary of the included papers (Excel)
Additional file 4: PRISMA 2020 Checklist (PDF)
Additional file 5: Search Results (Document)
Additional file 6: Generating Sunburst Plot - ML Applications
Additional file 7: Generating Sunburst Plot - ML Methods
