Sreetama Basu1, Alain Munafo1, Ali-Frederic Ben-Amor2, Sanjeev Roy2, Pascal Girard1, Nadia Terranova1.
Abstract
Multiple sclerosis (MS) is among the most common autoimmune disabling neurological conditions of young adults and affects more than 2.3 million people worldwide. Predicting future disease activity in patients with MS based on their pathophysiology and current treatment is pivotal to orientate future treatment. In this respect, we used machine learning to predict disease activity status in patients with MS and identify the most predictive covariates of this activity. The analysis is conducted on a pooled population of 1935 patients enrolled in three cladribine tablets clinical trials with different outcomes: relapsing-remitting MS (from CLARITY and CLARITY-Extension trials) and patients experiencing a first demyelinating event (from the ORACLE-MS trial). We applied gradient boosting (from the XGBoost library) and Shapley Additive Explanations (SHAP) methods to identify patients' covariates that predict disease activity 3 and 6 months before their clinical observation, including patient baseline characteristics, longitudinal magnetic resonance imaging readouts, and neurological and laboratory measures. The most predictive covariates for early identification of disease activity in patients were found to be treatment duration, higher number of new combined unique active lesion count, higher number of new T1 hypointense black holes, and higher age-related MS severity score. The outcome of this analysis improves our understanding of the mechanism of onset of disease activity in patients with MS by allowing their early identification in clinical settings and prompting preventive measures, therapeutic interventions, or more frequent patient monitoring.
Year: 2022 PMID: 35521742 PMCID: PMC9286719 DOI: 10.1002/psp4.12796
Source DB: PubMed Journal: CPT Pharmacometrics Syst Pharmacol ISSN: 2163-8306
Input covariates for the P3‐T‐24 and P3‐T‐12 models
| Category | Covariates |
|---|---|
| Patient characteristics + baselines | Age, sex, race, dose (number of weeks of treatment), weight, age of onset of disease, time since first attack, lymphocytes_baseline, EDSS_baseline |
| Neurological assessment | Global Age‐Related Multiple Sclerosis Severity Score, KFSS1–Bowel and Bladder Functions, KFSS1–Brain Stem Functions, KFSS1–Cerebellar Functions, KFSS1–Cerebral or Mental Functions, KFSS1–Pyramidal Functions, KFSS1–Sensory Functions, KFSS1–Visual or Optic Functions |
| MRI assessment | Total number of T1 Gd+ lesions, total T1 hypointense (black holes), total number of T2/FLAIR lesions, T1 Gd+ (volume in mm³), T1 hypointense lesions (volume in mm³), T2 lesions (volume in mm³), combined unique lesion count, new T1 hypointense (black holes) |
| Laboratory | Biochemistry: alanine aminotransferase, albumin, alkaline phosphatase, aspartate aminotransferase, bilirubin, blood urea nitrogen, calcium, creatine kinase, creatinine, sodium, potassium, urate, serum protein. Hematology: basophils, basophils/leukocytes, eosinophils, eosinophils/leukocytes, erythrocytes, hematocrit, hemoglobin, leukocytes, lymphocytes, lymphocytes/leukocytes, monocytes, monocytes/leukocytes, neutrophils, neutrophils/leukocytes, platelets. Urinalysis: urine pH, glucose |
Note: The laboratory covariates are not collected in routine clinical practice. Hence, the P4‐T‐24 and P4‐T‐12 models use the same input covariates as the P3 models, except for the laboratory covariates.
Abbreviations: EDSS, Expanded Disability Status Scale; Gd+, gadolinium enhancing; KFSS, Kurtzke Functional Systems Scores; MRI, magnetic resonance imaging; P3‐T‐12, phase III 12 weeks; P3‐T‐24, phase III 24 weeks; P4‐T‐12, phase IV 12 weeks; P4‐T‐24, phase IV 24 weeks.
FIGURE 1. Overview of our analysis framework. The available data are split 80–20 into training and testing sets using stratified random sampling. The training data are used to select optimal XGBoost model parameters using repeated cross‐validation, and the final model performance is estimated on the completely unseen test data. In the final step, SHAP, an explainable machine‐learning method, is used to study the covariate contribution to the model predictions and assess covariate importance. gARMSSS, Global Age‐Related Multiple Sclerosis Severity Score; MRI, magnetic resonance imaging; SHAP, Shapley Additive Explanations
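The 80–20 stratified split described in this caption can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' code: the function name `stratified_split`, the seed, and the toy labels are assumptions made for illustration. The idea is to shuffle and split each outcome class separately so that the proportion of patients with disease activity is the same in the training and testing sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved.

    Hypothetical helper: each class is shuffled and split on its own,
    so both sets keep the original ratio of outcomes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (not trial data): 1 = disease activity, 0 = no disease activity.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
```

In this toy example both sets contain 30% positive labels, which is the property that stratification is meant to guarantee.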
FIGURE 3. Covariates predictive of disease activity in patients 6 months in advance. The top predictive covariates are sorted in decreasing order of their mean absolute SHAP values from the (a) P3‐T‐24 and (b) P4‐T‐24 models. The top predictive covariates of the two models overlap strongly, including the number of weeks of cladribine treatment received, the magnetic resonance imaging measures of new combined unique lesion count and new T1 hypointense lesion count, and other clinically well understood disability measures such as the age‐related multiple sclerosis severity score. In the absence of laboratory covariates in the P4 model, other well‐known predictive and prognostic covariates become more important, such as T1 hypointense lesion volume, age of onset of disease, and time since first symptom. EDSS, Expanded Disability Status Scale; Gd+, gadolinium enhancing; KFSS, Kurtzke Functional Systems Scores; P3‐T‐24, phase III 24 weeks; P4‐T‐24, phase IV 24 weeks; SHAP, Shapley Additive Explanations
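The ranking used in this figure, covariates ordered by the mean of their absolute SHAP values, can be sketched as follows. This is a standard-library illustration, not the authors' code: the function name `rank_by_mean_abs_shap`, the covariate names, and the SHAP values are invented, and in practice the per-patient SHAP matrix would come from an explainer applied to the fitted model.

```python
def rank_by_mean_abs_shap(shap_rows, names):
    """Rank covariates by mean absolute SHAP value, largest first.

    shap_rows: one list of SHAP values per patient, in the order of `names`.
    """
    n_patients = len(shap_rows)
    means = [
        sum(abs(row[j]) for row in shap_rows) / n_patients
        for j in range(len(names))
    ]
    return sorted(zip(names, means), key=lambda pair: pair[1], reverse=True)


# Invented SHAP values for three hypothetical patients and covariates.
names = ["treatment_weeks", "new_cua_lesions", "age"]
shap_rows = [
    [-0.40, 0.30, 0.05],
    [0.50, -0.10, 0.02],
    [-0.30, 0.20, -0.03],
]
ranking = rank_by_mean_abs_shap(shap_rows, names)
```

Taking the absolute value before averaging matters: a covariate that pushes predictions strongly in both directions would otherwise average out to a misleadingly small importance.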
FIGURE 4. Dependency plots show the global relationship between top predictive covariates and the output variable for the P3‐T‐24 model. More positive SHAP values push model output toward a more confident prediction of disease activity in patients. For example, as the number of weeks of cladribine treatment received increases from 0 (placebo) to 4 weeks, the model output decreases toward a prediction of no disease activity. Missing values are imputed to population means only for visualization in these dependency plots and are highlighted with gray circles, most noticeably for the new CUA lesion count and the new hypointense lesion count. Patients with a missing value for the CUA lesion count are the most at risk for future disease activity events, which indicates that missingness for CUA is not at random but is informative and related to the event of interest. CUA, combined unique active; MS, multiple sclerosis; SHAP, Shapley Additive Explanations
FIGURE 2. Kaplan–Meier survival curves for disease activity in patients in the combined trial population from ORACLE‐MS, CLARITY, and CLARITY‐Extension. The survival curves are stratified by the treatment arm assignment at the start of the observation period for these three‐armed trials. The disease‐activity‐free survival probability drops lower in the placebo arm (red) than in the two treated arms (CT3.5 in blue and CT5.25 in green), showing a higher prevalence of disease activity in the placebo population. Vertical bars represent the time of censoring. CT3.5, cumulative cladribine dose of 3.5 mg/kg over 96 weeks; CT5.25, cumulative cladribine dose of 5.25 mg/kg over 96 weeks
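For readers unfamiliar with how such curves are built, the following is a minimal standard-library sketch of the Kaplan–Meier product-limit estimator. It is illustrative only: the function name `kaplan_meier` and the follow-up times are invented, and a real analysis would compute one curve per treatment arm. At each time with at least one event, the survival probability is multiplied by one minus the fraction of at-risk patients who had the event; censored patients leave the risk set without lowering the curve.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of event-free survival.

    times:  follow-up time per patient (for example, weeks)
    events: 1 if disease activity was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        n_at_risk -= n_leaving
    return curve


# Invented follow-up data for six hypothetical patients.
times = [12, 24, 24, 36, 48, 48]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
```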
Percentage of patients with disease activity not detected because of dropping criterion X
| Dropped criterion | Patients with disease activity not detected |
|---|---|
| No C1: one qualified relapse and one new T1 Gd+ in 48 weeks | 0% |
| No C2: one qualified relapse and two NE T2 in 48 weeks | 3.6% |
| No C3: two qualified relapses in 48 weeks | 6.6% |
| No C4: 3‐month sustained EDSS progression | 42.3% |
| No C5: switching DMT | 16.9% |
Abbreviations: DMT, disease‐modifying treatment; EDSS, Expanded Disability Status Scale; Gd+, gadolinium enhancing; NE, new and enlarging.
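One plausible reading of this table is that a patient counts as having disease activity when at least one of the criteria C1 to C5 is met, and that dropping a criterion misses exactly those patients for whom it was the only criterion met. The sketch below encodes that reading, which is an assumption and not a definition confirmed by the source; the function name `pct_missed_without` and the patient data are invented.

```python
def pct_missed_without(criteria_by_patient, dropped):
    """Percentage of patients with disease activity who would go
    undetected if the criterion `dropped` were removed.

    criteria_by_patient: one set of met criteria per patient
    (an empty set means no disease activity).
    """
    active = [met for met in criteria_by_patient if met]
    if not active:
        return 0.0
    # A patient is missed only if `dropped` is the sole criterion met.
    missed = [met for met in active if met == {dropped}]
    return 100.0 * len(missed) / len(active)


# Invented data for six hypothetical patients.
patients = [{"C4"}, {"C4", "C5"}, {"C5"}, {"C3"}, set(), {"C4"}]
missed_c4 = pct_missed_without(patients, "C4")
missed_c1 = pct_missed_without(patients, "C1")
```

Under this reading, a value of 0% for a criterion means that every patient meeting it also met at least one other criterion.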
Performance estimation of models P3‐T‐24 and P4‐T‐24
| Metric | Definition | P3‐T‐24 Train | P3‐T‐24 Test | P4‐T‐24 Train | P4‐T‐24 Test |
|---|---|---|---|---|---|
| Specificity | TN/(TN + FP) | 0.76 | 0.76 | 0.77 | 0.78 |
| Sensitivity | TP/(TP + FN) | 0.81 | 0.84 | 0.78 | 0.81 |
| Balanced accuracy | (Sensitivity + Specificity)/2 | 0.79 | 0.80 | 0.78 | 0.8 |
| AUC‐ROC | Area under curve of ROC | 0.79 | 0.80 | 0.78 | 0.8 |
Note: The table lists the model performance on training and test data with several metrics.
Abbreviations: FP, false positive; FN, false negative; P3‐T‐24, phase III 24 weeks; P4‐T‐24, phase IV 24 weeks; ROC, receiver operating characteristic curve; TP, true positive; TN, true negative.
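The metric definitions in the table can be written out directly. The sketch below is a minimal standard-library illustration; the function name `classification_metrics` and the confusion-matrix counts are invented and do not correspond to the trial data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the table's metrics from confusion-matrix counts."""
    specificity = tn / (tn + fp)   # TN / (TN + FP)
    sensitivity = tp / (tp + fn)   # TP / (TP + FN)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "balanced_accuracy": balanced_accuracy,
    }


# Invented counts for a hypothetical test set of 100 patients.
metrics = classification_metrics(tp=40, fp=20, tn=30, fn=10)
```

Balanced accuracy weights the two classes equally, which makes it more informative than plain accuracy when patients with and without disease activity are present in unequal numbers.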