Mauro F Pinto, Hugo Oliveira, Sónia Batista, Luís Cruz, Mafalda Pinto, Inês Correia, Pedro Martins, César Teixeira.
Abstract
Multiple Sclerosis is a chronic inflammatory disease that affects the Central Nervous System and leads to irreversible neurological damage, such as long-term functional impairment and disability. It has no cure, and its symptoms vary widely depending on the affected regions, the amount of damage, and the ability to activate compensatory mechanisms, which makes its course challenging to evaluate and predict. Additionally, relapsing-remitting patients can convert to a secondary progressive course, characterized by a slow progression of disability independent of relapses. Using clinical information from Multiple Sclerosis patients, we developed a machine learning exploration framework for this disease's evolution, specifically to obtain three predictions: one on conversion to a secondary progressive course and two on disease severity with rapid accumulation of disability, concerning the 6th and 10th years of progression. For the first case, the best results were obtained within two years: AUC=[Formula: see text], sensitivity=[Formula: see text], and specificity=[Formula: see text]; for the second, the best results were obtained for the 6th year of progression, also within two years: AUC=[Formula: see text], sensitivity=[Formula: see text], and specificity=[Formula: see text]. The Expanded Disability Status Scale value, the majority of functional systems, affected functions during relapses, and age at onset were the most predictive features. These results demonstrate that Multiple Sclerosis progression can be predicted using machine learning, which may help to understand this disease's dynamics and thus advise physicians on medication intake.
Year: 2020 PMID: 33273676 PMCID: PMC7713436 DOI: 10.1038/s41598-020-78212-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The methodology followed. Two independent problems were investigated: SP development and disease severity. For each of the three predictions (one on SP development and two on disease severity), five N-year models (N = 1 to 5) were developed from the clinical information accumulated over the first five years of progression. The results were then analyzed in terms of prediction performance and feature predictive power.
Patient sets used for MS disease progression prediction: selection criteria, patient characteristics, number of visits per patient in each of the first 5 years, and ratio of patients with no visits in a given year during the first 5 years.
| Prediction | Number of patients | Less frequent case | Selection criteria | Patient characteristics | Visits per patient in first 5 years | Ratio of patients with no year visits in first 5 years (0–1) |
|---|---|---|---|---|---|---|
| SP development | 187 | SP developed: 21 patients (11%) | i) Tracked since onset diagnosis or with SP diagnosis only after the 5 | Gender: 51 men (27%); | 1st year: 1.57±0.93; | 1st year: 0.00; |
| | | | ii) Minimum of 5 years of tracking; | Onset age: 31.10±10.54; | 2nd year: 1.25±1.27; | 2nd year: 0.36; |
| | | | iii) Minimum of 5 annotated visits; | Tracked years: 11.01±8.18; | 3rd year: 1.14±1.06; | 3rd year: 0.35; |
| | | | | Annotated years 13.22±4.87; | 4 | 4 |
| | | | | | 5 | 5 |
| Disease severity in the 6th year | 145 | Severe cases: 38 patients (26%) | i) Tracked since onset diagnosis | Gender: 44 men (28%); | 1st year: 1.61±0.96; | 1st year: 0.00; |
| | | | ii) Minimum of 6 years of tracking; | Onset age: 30.28±10.89; | 2nd year: 1.28±1.20; | 2nd year: 0.33; |
| | | | iii) Minimum of 5 annotated visits; | Tracked years: 10.67±7.89; | 3rd year: 1.14±1.06; | 3rd year: 0.32; |
| | | | | Annotated years 13.82±4.76; | 4 | 4 |
| | | | | | 5 | 5 |
| Disease severity in the 10th year | 67 | Severe cases: 30 patients (45%) | i) Tracked since onset diagnosis | Gender: 15 men (27%); | 1st year: 1.19±0.47; | 1st year: 0.00; |
| | | | ii) Minimum of 10 years of tracking; | Onset age: 32.30±11.84; | 2nd year: 0.52±0.88; | 2nd year: 0.64; |
| | | | iii) Minimum of 5 annotated visits; | Tracked years: 15.24±10.35; | 3rd year: 0.61±0.74; | 3rd year: 0.54; |
| | | | | Annotated years 15.90±4.83; | 4 | 4 |
| | | | | | 5 | 5 |
Figure 2. The machine learning pipeline used: after feature extraction, a k-fold cross-validation (with k=10) is performed, comprising missing data imputation, data standardization, feature selection, model training, classification, and performance evaluation. The cross-validation is repeated 10 times, and the final performance is the average over all runs.
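
The key structural point of this pipeline is that imputation and standardization happen inside each fold, so their statistics come from the training portion only. A minimal sketch of that idea follows, using only the Python standard library. The function names are hypothetical, and mean imputation is an assumption, since the caption does not name the imputation method; feature selection and the classifier are omitted.

```python
import random
from statistics import mean, pstdev


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` shuffled k-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield train, test


def fit_preprocessor(train_rows):
    """Learn per-feature mean and standard deviation from training rows only.

    Missing values are encoded as None (an assumption of this sketch).
    """
    stats = []
    for column in zip(*train_rows):
        seen = [v for v in column if v is not None]
        mu = mean(seen) if seen else 0.0
        sd = pstdev(seen) if len(seen) > 1 else 0.0
        stats.append((mu, sd or 1.0))  # avoid dividing by zero
    return stats


def transform(rows, stats):
    """Mean-impute missing values, then standardize with the training statistics."""
    return [
        [((mu if v is None else v) - mu) / sd for v, (mu, sd) in zip(row, stats)]
        for row in rows
    ]


# Toy data: 4 patients, 2 features, None marks a missing value.
X = [[1.0, None], [2.0, 3.0], [None, 5.0], [4.0, 7.0]]
for train, test in repeated_kfold(len(X), k=2, repeats=3):
    stats = fit_preprocessor([X[i] for i in train])
    X_train = transform([X[i] for i in train], stats)
    X_test = transform([X[i] for i in test], stats)
    # Feature selection, model training, and evaluation would go here.
```

With k=10 and 10 repetitions, this generator yields 100 train/test splits, whose per-split scores would then be averaged.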
Figure 3. Missing data heatmap for the features extracted from dynamic information over the first five tracked years, for the patients used in SP development prediction. Missing information per patient and per feature is presented on the right and below, respectively. Accumulative missing data is presented in black, while normal segmentation data is presented in gray. In the matrix, patients are organized in rows and features in columns.
Results obtained for the two problems: SP development/non-development and disease severity. For each prediction, the performance of every classifier is presented for all N-year models, with the best in bold.
| SP Development/Not Development | Disease Severity in the 6th Year Progression | Disease Severity in 10th Year Progression | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KNN-3 | Decision Tree | Linear Regression | SVM | KNN-3 | Decision Tree | Linear Regression | SVM | KNN-3 | Decision Tree | Linear Regression | SVM | ||
| 1-Year Model | AUC | 0.76±0.05 | 0.65±0.10 | 0.79±0.06 | 0.74±0.06 | 0.74±0.07 | 0.80±0.06 | 0.67±0.09 | 0.57±0.09 | 0.67±0.09 | |||
| Geometric Mean | 0.67±0.05 | 0.61±0.08 | 0.71±0.06 | 0.70±0.06 | 0.69±0.08 | 0.71±0.05 | 0.61±0.08 | 0.55±0.09 | 0.62±0.08 | ||||
| Specificity | 0.60±0.06 | 0.55±0.11 | 0.65±0.07 | 0.66±0.09 | 0.64±0.12 | 0.73±0.06 | 0.61±0.12 | 0.61±0.13 | 0.67±0.09 | ||||
| Sensitivity | 0.75±0.11 | 0.70±0.17 | 0.80±0.14 | 0.74±0.10 | 0.76±0.12 | 0.70±0.10 | 0.63±0.14 | 0.51±0.14 | 0.58±0.14 | ||||
| F1-Score | 0.13±0.02 | 0.11±0.03 | 0.15±0.03 | 0.36±0.06 | 0.35±0.07 | 0.39±0.06 | 0.54±0.09 | 0.46±0.10 | 0.54±0.10 | ||||
| 2-Year Model | AUC | 0.81±0.08 | 0.70±0.09 | 0.82±0.08 | 0.80±0.05 | 0.83±0.06 | 0.87±0.04 | 0.66±0.09 | 0.61±0.07 | 0.71±0.07 | |||
| Geometric Mean | 0.73±0.08 | 0.66±0.08 | 0.76±0.08 | 0.75±0.05 | 0.77±0.08 | 0.82±0.06 | 0.58±0.10 | 0.55±0.09 | 0.66±0.07 | ||||
| Specificity | 0.74±0.06 | 0.65±0.09 | 0.78±0.05 | 0.68±0.06 | 0.72±0.09 | 0.82±0.04 | 0.48±0.13 | 0.52±0.15 | 0.63±0.11 | ||||
| Sensitivity | 0.73±0.14 | 0.69±0.16 | 0.74±0.14 | 0.83±0.11 | 0.83±0.14 | 0.83±0.11 | 0.73±0.15 | 0.62±0.17 | 0.70±0.12 | ||||
| F1-Score | 0.18±0.05 | 0.13±0.04 | 0.21±0.05 | 0.40±0.05 | 0.44±0.09 | 0.53±0.06 | 0.55±0.09 | 0.50±0.10 | 0.59±0.07 | ||||
| 3-Year Model | AUC | 0.73±0.09 | 0.65±0.12 | 0.78±0.09 | 0.81±0.05 | 0.82±0.07 | 0.89±0.03 | 0.68±0.08 | 0.65±0.08 | 0.73±0.07 | |||
| Geometric Mean | 0.70±0.10 | 0.65±0.11 | 0.73±0.09 | 0.76±0.06 | 0.77±0.09 | 0.84±0.05 | 0.61±0.10 | 0.58±0.09 | 0.67±0.07 | ||||
| Specificity | 0.70±0.06 | 0.62±0.11 | 0.75±0.06 | 0.73±0.04 | 0.75±0.09 | 0.84±0.02 | 0.59±0.15 | 0.59±0.16 | 0.72±0.07 | ||||
| Sensitivity | 0.73±0.18 | 0.69±0.19 | 0.72±0.18 | 0.80±0.12 | 0.80±0.16 | 0.85±0.11 | 0.67±0.16 | 0.61±0.17 | 0.63±0.12 | ||||
| F1-Score | 0.16±0.04 | 0.13±0.05 | 0.19±0.05 | 0.42±0.05 | 0.45±0.09 | 0.57±0.06 | 0.56±0.10 | 0.52±0.10 | 0.59±0.09 | ||||
| 4-Year Model | AUC | 0.79±0.08 | 0.65±0.10 | 0.79±0.07 | 0.85±0.05 | 0.85±0.07 | 0.92±0.03 | 0.72±0.08 | 0.70±0.10 | 0.77±0.08 | |||
| Geometric Mean | 0.73±0.08 | 0.65±0.10 | 0.75±0.08 | 0.78±0.07 | 0.79±0.07 | 0.86±0.05 | 0.66±0.10 | 0.64±0.10 | 0.71±0.10 | ||||
| Specificity | 0.74±0.06 | 0.66±0.12 | 0.78±0.06 | 0.82±0.04 | 0.77±0.08 | 0.89±0.04 | 0.70±0.11 | 0.71±0.15 | 0.81±0.08 | ||||
| Sensitivity | 0.74±0.16 | 0.66±0.18 | 0.72±0.13 | 0.74±0.13 | 0.82±0.14 | 0.83±0.09 | 0.65±0.16 | 0.61±0.17 | 0.64±0.15 | ||||
| F1-Score | 0.18±0.04 | 0.14±0.05 | 0.21±0.05 | 0.49±0.08 | 0.48±0.09 | 0.63±0.07 | 0.59±0.11 | 0.57±0.11 | 0.64±0.12 | ||||
| 5-Year Model | AUC | 0.84±0.08 | 0.77±0.13 | 0.81±0.06 | 0.87±0.05 | 0.86±0.09 | 0.94±0.03 | 0.79±0.08 | 0.77±0.10 | 0.81±0.07 | |||
| Geometric Mean | 0.79±0.09 | 0.74±0.12 | 0.75±0.11 | 0.77±0.08 | 0.84±0.09 | 0.87±0.06 | 0.72±0.10 | 0.70±0.11 | 0.74±0.09 | ||||
| Specificity | 0.77±0.06 | 0.77±0.13 | 0.82±0.06 | 0.86±0.04 | 0.87±0.09 | 0.93±0.03 | 0.75±0.11 | 0.80±0.16 | 0.85±0.09 | ||||
| Sensitivity | 0.82±0.17 | 0.74±0.20 | 0.71±0.20 | 0.71±0.14 | 0.83±0.16 | 0.81±0.12 | 0.72±0.14 | 0.63±0.15 | 0.66±0.13 | ||||
| F1-Score | 0.22±0.05 | 0.22±0.08 | 0.24±0.07 | 0.51±0.08 | 0.61±0.11 | 0.71±0.08 | 0.66±0.12 | 0.64±0.12 | 0.68±0.11 | ||||
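
The metrics reported in the table can all be derived from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not taken from the study. It also illustrates why the F1-score can be low for a rare positive class even when sensitivity and specificity are reasonable: F1 depends on precision, which drops as false positives from the large negative class accumulate.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Compute the table's metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "geometric_mean": sqrt(sensitivity * specificity),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts: same sensitivity and specificity (0.75 each),
# but different class balance.
rare_positive = classification_metrics(tp=15, fn=5, tn=120, fp=40)  # 20 vs 160
balanced = classification_metrics(tp=15, fn=5, tn=15, fp=5)         # 20 vs 20

print(rare_positive["f1"])  # 0.4
print(balanced["f1"])       # 0.75
```

The geometric mean is identical in both cases (0.75), which is one reason it is often reported alongside F1 for imbalanced problems.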
Figure 4. SVM classifier performance in each N-year model for each prediction. The best performance for SP development was achieved with the 2-year model. For disease severity, the best performances were achieved with the 2-year and 5-year models for the 6th and 10th years, respectively.
Figure 5. The selected features for SP development/non-development and disease severity, for all N-year models. Color represents predictive power, calculated from the presence of each clinical source among all iterations. The signs indicate the prognostic influence of each clinical feature: (+) represents a good prognosis and (−) a bad prognosis. The diamond marker represents predictive power above 0.90.
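
One plausible reading of "presence among all iterations" is the fraction of cross-validation iterations in which a feature was selected. The sketch below computes that fraction under this assumption; the function name and the toy feature lists are hypothetical, and the exact calculation used for the figure may differ.

```python
from collections import Counter


def selection_frequency(selected_per_iteration):
    """Return, for each feature, the fraction of iterations that selected it."""
    n_iterations = len(selected_per_iteration)
    counts = Counter(
        feature
        for selected in selected_per_iteration
        for feature in set(selected)  # count each feature once per iteration
    )
    return {feature: count / n_iterations for feature, count in counts.items()}


# Toy example: features chosen in four hypothetical iterations.
runs = [
    ["EDSS", "age_at_onset"],
    ["EDSS"],
    ["EDSS", "pyramidal"],
    ["EDSS", "age_at_onset"],
]
power = selection_frequency(runs)

# Features that would receive the diamond marker (predictive power above 0.90).
highlighted = sorted(f for f, p in power.items() if p > 0.90)
print(highlighted)  # ['EDSS']
```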