Jean-Pierre R Falet, Joshua Durso-Finley, Brennan Nichyporuk, Julien Schroeter, Francesca Bovis, Maria-Pia Sormani, Doina Precup, Tal Arbel, Douglas Lorne Arnold.
Abstract
Disability progression in multiple sclerosis remains resistant to treatment. The absence of a suitable biomarker to allow for phase 2 clinical trials presents a high barrier for drug development. We propose to enable short proof-of-concept trials by increasing statistical power using a deep-learning predictive enrichment strategy. Specifically, a multi-headed multilayer perceptron is used to estimate the conditional average treatment effect (CATE) using baseline clinical and imaging features, and patients predicted to be most responsive are preferentially randomized into a trial. Leveraging data from six randomized clinical trials (n = 3,830), we first pre-trained the model on the subset of relapsing-remitting MS patients (n = 2,520), then fine-tuned it on a subset of primary progressive MS (PPMS) patients (n = 695). In a separate held-out test set of PPMS patients randomized to anti-CD20 antibodies or placebo (n = 297), the average treatment effect was larger for the 50% (HR, 0.492; 95% CI, 0.266–0.912; p = 0.0218) and 30% (HR, 0.361; 95% CI, 0.165–0.79; p = 0.008) predicted to be most responsive, compared with an HR of 0.743 (95% CI, 0.482–1.15; p = 0.179) for the entire group. The same model could also identify responders to laquinimod in another held-out test set of PPMS patients (n = 318). Finally, we show that using this model for predictive enrichment results in important increases in power.
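The enrichment step described above reduces to ranking patients by their CATE estimate and randomizing only those at or above a percentile cut-off. A minimal sketch, assuming a higher CATE means a greater predicted benefit; the function name `select_responders` and the tie handling (whatever order the stable sort produces) are illustrative assumptions, not the authors' code:

```python
from typing import List, Sequence


def select_responders(cate: Sequence[float], percentile: float) -> List[int]:
    """Indices of patients kept for randomization at a percentile threshold.

    A percentile of 70 keeps roughly the 30% with the largest CATE estimates;
    a percentile of 0 keeps everyone (an unenriched trial).
    """
    if not 0 <= percentile < 100:
        raise ValueError("percentile must be in [0, 100)")
    n_keep = round(len(cate) * (100 - percentile) / 100)
    # Rank patients from largest to smallest predicted benefit.
    ranked = sorted(range(len(cate)), key=lambda i: cate[i], reverse=True)
    return sorted(ranked[:n_keep])


cate = [0.10, 0.50, -0.20, 0.30, 0.05, 0.40, 0.00, 0.25, 0.15, -0.10]
print(select_responders(cate, 70))  # [1, 3, 5]
```

Only the ranking matters here, so any CATE estimator (the MLP or a simpler baseline) can be plugged in unchanged.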
Year: 2022 PMID: 36163349 PMCID: PMC9512913 DOI: 10.1038/s41467-022-33269-x
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Baseline features and outcomes per treatment arm
| Feature | Ocrelizumab (ORATORIO) | Rituximab (OLYMPUS) | Laquinimod (ARPEGGIO) | Placebo (ORATORIO) | Placebo (OLYMPUS) | Placebo (ARPEGGIO) |
|---|---|---|---|---|---|---|
| Age (years) | 44.50 (7.90) | 49.54 (9.01) | 46.35 (6.62) | 44.41 (8.40) | 49.89 (8.68) | 46.70 (7.16) |
| Sex (% male) | 51.61 | 48.11 | 56.45 | 47.56 | 43.70 | 50.76 |
| Height (cm) | 170.20 (9.61) | 170.77 (9.30) | 172.11 (9.41) | 170.20 (9.57) | 169.87 (8.90) | 171.23 (9.73) |
| Weight (kg) | 72.35 (17.26) | 78.13 (16.37) | 75.25 (15.40) | 72.51 (15.24) | 77.60 (17.13) | 73.20 (16.21) |
| Disease duration (years) | 6.56 (3.77) | 9.03 (6.25) | 8.12 (6.07) | 6.01 (3.38) | 8.59 (6.81) | 7.41 (5.23) |
| EDSS | 4.69 (1.18) | 4.79 (1.36) | 4.49 (0.98) | 4.65 (1.16) | 4.58 (1.41) | 4.46 (0.91) |
| FSS-Bowel and Bladder | 1.14 (0.85) | 1.42 (0.95) | 1.27 (0.95) | 1.14 (0.91) | 1.21 (0.94) | 1.16 (0.88) |
| FSS-Brainstem | 0.88 (0.91) | 0.75 (0.90) | 1.01 (0.92) | 0.89 (0.93) | 0.61 (0.81) | 0.98 (0.95) |
| FSS-Cerebellar | 2.11 (0.98) | 2.03 (1.12) | 2.11 (0.83) | 2.14 (0.89) | 1.99 (1.10) | 2.10 (0.89) |
| FSS-Cerebral | 0.91 (0.88) | 1.30 (0.84) | 0.93 (0.91) | 0.91 (0.82) | 1.24 (0.89) | 0.86 (0.88) |
| FSS-Pyramidal | 2.87 (0.62) | 2.69 (0.82) | 2.92 (0.55) | 2.83 (0.65) | 2.82 (0.78) | 2.85 (0.66) |
| FSS-Sensory | 1.58 (1.04) | 1.48 (0.99) | 1.73 (1.04) | 1.53 (1.07) | 1.52 (1.11) | 1.74 (1.01) |
| FSS-Visual | 0.79 (0.87) | 0.86 (1.04) | 0.92 (1.30) | 0.71 (0.82) | 0.91 (1.05) | 0.79 (1.10) |
| Mean T25FW (s) | 13.93 (18.44) | 11.74 (14.56) | 9.61 (8.85) | 11.71 (12.35) | 11.01 (13.65) | 9.68 (7.54) |
| Mean 9HPT dominant (s) | 34.09 (33.99) | 28.80 (17.60) | 28.57 (12.37) | 31.67 (21.50) | 27.22 (10.22) | 28.22 (12.15) |
| Mean 9HPT non-dominant (s) | 36.05 (38.50) | 31.88 (24.99) | 31.44 (18.04) | 37.51 (40.29) | 30.95 (17.50) | 29.04 (12.16) |
| MRI metrics | ||||||
| Gad count | 1.23 (5.36) | 0.63 (2.47) | 0.27 (0.81) | 0.56 (1.47) | 0.47 (1.14) | 0.45 (1.84) |
| T2 lesion volume (mL) | 12.45 (14.92) | 8.44 (10.50) | 5.86 (9.11) | 11.33 (13.27) | 8.57 (11.66) | 5.96 (8.65) |
| Normalized brain volume (L) | 1.46 (0.08) | 1.20 (0.12) | 1.46 (0.10) | 1.47 (0.09) | 1.21 (0.12) | 1.46 (0.11) |
| Slope (EDSS change/yr)ᵃ | 0.22 (0.53) | 0.27 (0.65) | 0.32 (0.77) | 0.27 (0.71) | 0.39 (0.63) | 0.28 (0.64) |
| RMST (at 2 years, in years)ᵇ | 1.92 | 1.89 | 1.69 | 1.91 | 1.87 | 1.72 |
Values in brackets are standard deviations, unless otherwise specified.
a Slope is based on the coefficient of regression from a linear regression model that is fit on an individual’s EDSS values over time, as described in the section “Outcome definition”.
b RMST calculated at 2 years using time to 24-week confirmed disability progression on the EDSS.
RMST restricted mean survival time, EDSS Expanded Disability Status Scale, FSS Functional Systems Score, T25FW timed 25-foot walk, 9HPT 9-hole peg test, Gad gadolinium-enhancing lesion.
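Footnote a defines the outcome as a per-patient regression slope of EDSS over time. A minimal sketch, assuming an ordinary least-squares fit with an intercept and visit times in years; the "Outcome definition" section is not part of this excerpt, so any extra rules it applies (minimum number of visits, windowing) are not reflected, and `edss_slope` is a hypothetical name:

```python
from typing import Sequence


def edss_slope(times_years: Sequence[float], edss: Sequence[float]) -> float:
    """Least-squares slope of EDSS on time, in EDSS points per year."""
    n = len(times_years)
    if n < 2 or n != len(edss):
        raise ValueError("need at least two paired (time, EDSS) visits")
    t_mean = sum(times_years) / n
    y_mean = sum(edss) / n
    # Slope = covariance(t, y) / variance(t); the intercept is fitted implicitly.
    sxx = sum((t - t_mean) ** 2 for t in times_years)
    if sxx == 0:
        raise ValueError("visit times must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_years, edss))
    return sxy / sxx


print(edss_slope([0.0, 0.5, 1.0, 1.5, 2.0], [4.0, 4.0, 4.5, 4.5, 5.0]))  # approximately 0.5
```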
Fig. 1 Average treatment difference curve for the anti-CD20-Abs held-out test set.
Represents the difference in the ground-truth restricted mean survival time (RMST), calculated at 2 years using time-to-CDP24, between anti-CD20-Abs and placebo, among predicted responders defined using various thresholds. The conditional average treatment effect (CATE) percentile threshold is the minimum CATE (expressed as a percentile among all CATE estimates in the test set) that is used to define an individual as a responder (i.e. a threshold of 0.7 means the 30% predicted to be most responsive are considered responders).
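The curve described in this caption can be sketched from per-patient follow-up data: estimate a Kaplan–Meier curve per arm, integrate it up to 2 years to get the RMST, and repeat among patients above each CATE percentile. A minimal plain-Python sketch, assuming times in years, an event indicator for CDP24 (1 = progression, 0 = censored), a 0/1 treatment flag and thresholds given as fractions as in the caption; `km_steps`, `rmst` and `rmst_difference_curve` are hypothetical helper names:

```python
from typing import List, Sequence, Tuple


def km_steps(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (event time, survival just after it) steps."""
    data = sorted(zip(times, events))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t, failed, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            failed += data[i][1]
            leaving += 1
            i += 1
        if failed:
            surv *= 1 - failed / at_risk
            steps.append((t, surv))
        at_risk -= leaving
    return steps


def rmst(times: Sequence[float], events: Sequence[int], horizon: float = 2.0) -> float:
    """Restricted mean survival time: area under the KM curve on [0, horizon]."""
    area, last_t, last_s = 0.0, 0.0, 1.0
    for t, s in km_steps(times, events):
        if t >= horizon:
            break
        area += last_s * (t - last_t)
        last_t, last_s = t, s
    return area + last_s * (horizon - last_t)


def rmst_difference_curve(cate, times, events, treated, thresholds, horizon=2.0):
    """RMST(treated) - RMST(control) among patients at or above each CATE percentile."""
    ranked = sorted(cate)
    curve = []
    for q in thresholds:
        cutoff = ranked[min(int(q * len(ranked)), len(ranked) - 1)]
        keep = [i for i, c in enumerate(cate) if c >= cutoff]
        arms = [[i for i in keep if treated[i] == flag] for flag in (1, 0)]
        if not all(arms):
            raise ValueError(f"an arm is empty at threshold {q}")
        rm = [rmst([times[i] for i in a], [events[i] for i in a], horizon) for a in arms]
        curve.append((q, rm[0] - rm[1]))
    return curve
```

A positive value at a given threshold means that, among predicted responders, the treated arm spent longer free of CDP24 within the 2-year window.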
Fig. 2 Kaplan–Meier curves ±95% confidence intervals (CIs) for predicted responders and non-responders to anti-CD20-Abs in the held-out test set, defined at two thresholds of predicted effect size.
These are compared with the whole group (top). The placebo group is displayed in blue, and the treatment (anti-CD20-Abs) group is displayed in orange. Survival probability is measured in terms of time-to-CDP24 using the EDSS. p values are calculated using log-rank tests. 95% CIs are estimated using Greenwood's exponential formula.
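The two procedures named in this caption can be sketched in plain Python. This is a from-scratch illustration rather than the authors' code (tested survival libraries such as lifelines provide Kaplan–Meier and log-rank implementations); the function names and the 0/1 event coding are assumptions. The "exponential Greenwood" interval applies a Greenwood-type variance to log(−log S) and back-transforms it, which keeps the limits inside [0, 1]:

```python
import math
from statistics import NormalDist


def km_exp_greenwood(times, events, alpha=0.05):
    """Kaplan-Meier steps (t, S, lower, upper) with exponential-Greenwood limits.

    se of log(-log S) = sqrt(sum d / (n (n - d))) / |log S|;
    lower = S ** exp(z * se), upper = S ** exp(-z * se).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    data = sorted(zip(times, events))
    at_risk, surv, gw, out, i = len(data), 1.0, 0.0, [], 0
    while i < len(data):
        t, failed, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            failed += data[i][1]
            leaving += 1
            i += 1
        if failed:
            surv *= 1 - failed / at_risk
            if failed < at_risk:  # Greenwood term is undefined once everyone fails
                gw += failed / (at_risk * (at_risk - failed))
            if 0 < surv < 1:
                se = math.sqrt(gw) / abs(math.log(surv))
                lower, upper = surv ** math.exp(z * se), surv ** math.exp(-z * se)
            else:
                lower = upper = surv
            out.append((t, surv, lower, upper))
        at_risk -= leaving
    return out


def log_rank(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p value) on 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected events in group a
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero (no informative event times)")
    chi2 = o_minus_e ** 2 / var
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```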
Group statistics for predicted responders and non-responders to anti-CD20-Abs at the 50th and 70th percentile thresholds, in the held-out test set
| | 50th percentile thresholdᵃ | | | | 70th percentile thresholdᵃ | | | |
|---|---|---|---|---|---|---|---|---|
| | Responders | Non-responders | Effect size (95% CI)ᵇ | p valueᶜ | Responders | Non-responders | Effect size (95% CI)ᵇ | p valueᶜ |
| OLYMPUS (n) | 55 | 54 | | | 35 | 74 | | |
| ORATORIO (n) | 96 | 92 | | | 57 | 131 | | |
| Age (years) | 45.20 (8.58) | 47.84 (7.89) | −2.64 (−4.53, −0.76) | 0.006 | 44.59 (9.05) | 47.36 (7.87) | −2.77 (−4.93, −0.61) | 0.013 |
| Sex (% male) | 47.02 | 50.68 | 0.86 (0.53, 1.40) | 0.562 | 45.65 | 50.24 | 0.83 (0.49, 1.40) | 0.530 |
| Height (cm) | 170.05 (10.56) | 170.55 (8.80) | −0.50 (−2.72, 1.71) | 0.657 | 169.78 (10.29) | 170.52 (9.47) | −0.74 (−3.23, 1.75) | 0.560 |
| Weight (kg) | 76.17 (18.93) | 72.96 (13.77) | 3.21 (−0.56, 6.98) | 0.096 | 75.68 (20.07) | 74.10 (14.87) | 1.58 (−3.04, 6.20) | 0.502 |
| Disease duration (years) | 6.07 (4.14) | 8.72 (5.45) | −2.65 (−3.76, −1.54) | <0.001 | 5.79 (4.15) | 8.09 (5.19) | −2.30 (−3.41, −1.19) | <0.001 |
| EDSS | 4.87 (1.18) | 4.52 (1.23) | 0.34 (0.07, 0.62) | 0.015 | 5.07 (1.14) | 4.53 (1.21) | 0.54 (0.25, 0.83) | <0.001 |
| FSS-Bowel and Bladder | 1.25 (0.93) | 1.11 (0.80) | 0.14 (−0.05, 0.34) | 0.157 | 1.27 (0.98) | 1.15 (0.82) | 0.12 (−0.11, 0.35) | 0.315 |
| FSS-Brainstem | 0.82 (0.93) | 0.79 (0.87) | 0.04 (−0.17, 0.24) | 0.726 | 0.90 (0.95) | 0.77 (0.88) | 0.13 (−0.10, 0.36) | 0.265 |
| FSS-Cerebellar | 2.38 (0.97) | 1.78 (1.05) | 0.60 (0.37, 0.83) | <0.001 | 2.57 (0.81) | 1.86 (1.08) | 0.71 (0.48, 0.93) | <0.001 |
| FSS-Cerebral | 1.07 (0.83) | 1.05 (0.89) | 0.02 (−0.18, 0.22) | 0.848 | 1.13 (0.84) | 1.04 (0.87) | 0.09 (−0.12, 0.30) | 0.404 |
| FSS-Pyramidal | 2.75 (0.69) | 2.90 (0.58) | −0.14 (−0.29, 0.00) | 0.052 | 2.77 (0.76) | 2.85 (0.58) | −0.08 (−0.26, 0.10) | 0.382 |
| FSS-Sensory | 1.55 (1.06) | 1.64 (1.02) | −0.08 (−0.32, 0.15) | 0.488 | 1.56 (1.00) | 1.61 (1.06) | −0.05 (−0.30, 0.20) | 0.703 |
| FSS-Visual | 1.04 (1.04) | 0.43 (0.62) | 0.62 (0.42, 0.81) | <0.001 | 1.28 (1.07) | 0.50 (0.71) | 0.78 (0.54, 1.02) | <0.001 |
| Mean T25FW (s) | 13.55 (17.61) | 10.75 (11.08) | 2.80 (−0.55, 6.15) | 0.103 | 15.95 (21.79) | 10.48 (9.82) | 5.47 (0.77, 10.17) | 0.024 |
| Mean 9HPT dominant (s) | 32.62 (26.89) | 26.70 (10.24) | 5.92 (1.29, 10.55) | 0.013 | 36.01 (33.25) | 26.88 (9.89) | 9.13 (2.12, 16.15) | 0.012 |
| Mean 9HPT non-dominant (s) | 37.33 (31.11) | 26.97 (9.32) | 10.36 (5.14, 15.58) | <0.001 | 42.39 (38.33) | 27.68 (9.33) | 14.71 (6.68, 22.75) | <0.001 |
| Gad count | 1.62 (3.14) | 0.16 (0.48) | 1.46 (0.95, 1.97) | <0.001 | 1.90 (3.64) | 0.46 (1.27) | 1.44 (0.67, 2.22) | <0.001 |
| T2 lesion volume (mL) | 13.09 (12.85) | 7.72 (10.17) | 5.37 (2.73, 8.01) | <0.001 | 14.31 (14.22) | 8.72 (10.27) | 5.59 (2.33, 8.85) | <0.001 |
| Normalized brain volume (L) | 1.37 (0.16) | 1.38 (0.16) | −0.02 (−0.05, 0.02) | 0.367 | 1.35 (0.16) | 1.38 (0.16) | −0.03 (−0.07, 0.01) | 0.107 |
Values in brackets are standard deviations, unless otherwise specified.
EDSS Expanded Disability Status Scale, FSS Functional Systems Score, T25FW timed 25-foot walk, 9HPT 9-hole peg test, Gad Gadolinium-enhancing lesion.
a Percentile threshold for defining responders. The 50th percentile defines responders as the top 50% who are predicted to be most responsive, while the 70th percentile defines them as the top 30%. The non-responders are those who fall below the percentile threshold.
b Effect size is the average difference between responders and non-responders for all covariates except "sex", for which it is an odds ratio (OR).
c p values for continuous and ordinal variables are calculated using a two-sided Welch's t test because of unequal variances and sample sizes. The p value for the categorical variable "sex" is calculated using a two-sided Fisher's exact test because of unequal and relatively small sample sizes. Exact p values for the 50th percentile threshold: disease duration, p = 4.39 × 10⁻⁶; FSS-Cerebellar, p = 6.42 × 10⁻⁷; FSS-Visual, p = 2.18 × 10⁻⁹; Mean 9HPT non-dominant, p = 1.36 × 10⁻⁴; Gad count, p = 8.72 × 10⁻⁸; T2 lesion volume, p = 8.57 × 10⁻⁵. Exact p values for the 70th percentile threshold: disease duration, p = 7.04 × 10⁻⁵; EDSS, p = 3.03 × 10⁻⁴; FSS-Cerebellar, p = 2.61 × 10⁻⁹; FSS-Visual, p = 3.38 × 10⁻⁹; Mean 9HPT non-dominant, p = 4.82 × 10⁻⁴; Gad count, p = 3.69 × 10⁻⁴; T2 lesion volume, p = 9.59 × 10⁻⁴.
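Footnote c names the two tests behind the group comparisons. The stdlib-only sketch below works from the summary statistics in the table rather than patient-level data, so it reproduces the reported values only up to rounding. The group sizes (151 responders and 146 non-responders at the 50th percentile) are the sums of the OLYMPUS and ORATORIO rows, and the male counts (71 and 74) are back-calculated from the percentages. The t distribution is integrated numerically to avoid dependencies; with raw data, `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.fisher_exact` are the usual library routes. All function names here are illustrative:

```python
import math


def _t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def _t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus a Simpson's-rule integral of the density on [0, t]."""
    h = t / steps
    total = _t_pdf(0.0, df) + _t_pdf(t, df)
    total += sum((4 if k % 2 else 2) * _t_pdf(k * h, df) for k in range(1, steps))
    return 0.5 - total * h / 3


def _t_critical(upper_tail, df):
    """Value q with P(T > q) = upper_tail, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if _t_upper_tail(mid, df) > upper_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_from_summary(m1, s1, n1, m2, s2, n2, alpha=0.05):
    """Welch's t test from means, SDs and sizes: (difference, CI, two-sided p)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    diff = m1 - m2
    q = _t_critical(alpha / 2, df)
    p = 2 * _t_upper_tail(abs(diff) / se, df)
    return diff, (diff - q * se, diff + q * se), p


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test on [[a, b], [c, d]]: (odds ratio, p)."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Sum every table no more likely than the observed one (small float tolerance).
    p = sum(px for px in map(prob, support) if px <= p_obs * (1 + 1e-7))
    odds_ratio = a * d / (b * c) if b * c else math.inf
    return odds_ratio, min(p, 1.0)


# Age, 50th percentile: responders vs non-responders
print(welch_from_summary(45.20, 8.58, 151, 47.84, 7.89, 146))
# Sex, 50th percentile: [[male, female] responders, [male, female] non-responders]
print(fisher_exact_2x2(71, 80, 74, 72))
```

With the rounded inputs, the Welch call gives a difference of −2.64, a CI of about (−4.52, −0.76) and p of about 0.006, close to the reported row; the Fisher call gives an odds ratio of about 0.86, matching the "Sex" row.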
Comparison of model performance (measured by ADwabc) on the held-out test set of patients from ORATORIO and OLYMPUS (anti-CD20-Abs), and on the held-out dataset from ARPEGGIO (laquinimod)
| Model | Anti-CD20-Abs | Laquinimod |
|---|---|---|
| Negative disease duration | 0.0225 | 0.0114 |
| Negative age | 0.0067 | −0.0287 |
| Negative EDSS | 0.0264 | 0.0074 |
| Negative 9HPT dominant hand | −0.0109 | 0.0023 |
| Negative 9HPT non-dominant hand | −0.0012 | −0.0006 |
| Negative T25FW | 0.0033 | 0.0020 |
| T2 lesion volume | 0.0167 | −0.0051 |
| Gad count | 0.0021 | NaNᶜ |
| Age/disease duration | 0.0268 | 0.0138 |
| EDSS/disease duration | 0.0021 | 0.0020 |
| 9HPT dominant hand/disease duration | 0.0238 | 0.0146 |
| 9HPT non-dominant hand/disease duration | 0.0179 | 0.0098 |
| T25FW/disease duration | 0.0257 | 0.0049 |
| T2 lesion volume/disease duration | 0.0432 | 0.0164 |
| Gad count/disease duration | 0.0030 | NaNᶜ |
| MLP (our model) | 0.0565 | 0.0211 |
| MLP (no pre-trainingᵈ) | 0.0486 | 0.019 |
| MLP (prognostic modelᵉ) | 0.0408 | 0.0170 |
| Ridge Regression | 0.0227 | 0.0194 |
| CPH | 0.0305 | 0.0031 |
EDSS Expanded Disability Status Scale, FSS Functional Systems Score, T25FW timed 25-foot walk, 9HPT 9-hole peg test, Gad Gadolinium-enhancing lesion, MLP Multi-layer perceptron.
a The value of the feature is taken to be the CATE estimate for an individual. For example, the "T2 lesion volume" model uses the value of an individual's T2 lesion volume as the CATE estimate for that individual, such that a larger baseline volume predicts a larger treatment effect. A "negative" feature implies that the CATE estimate is the negative of the value of the feature. For example, the "negative disease duration" model predicts a larger treatment effect with shorter disease duration.
b The value of the feature divided by the disease duration is taken to be the CATE estimate for an individual. For example, the "EDSS/disease duration" model predicts a larger treatment effect with a more rapid historical rate of change in the EDSS over time.
c Value for ADwabc could not be computed due to low variance in values for Gad lesions in the laquinimod dataset.
d This MLP was trained without pre-training on the RRMS dataset.
e The predicted slope of disability progression on the placebo arm is used as the CATE estimate. In other words, a patient predicted to progress more rapidly on placebo (worse prognosis) is predicted to have a larger treatment effect.
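Footnotes a and b describe baselines in which a single feature, or that feature divided by disease duration, stands in for the CATE estimate, so that a larger value means a larger predicted treatment effect. A minimal sketch, assuming patients are stored as dictionaries with the hypothetical keys shown (`age`, `edss`, `disease_duration`); the ADwabc metric used to score these baselines is not defined in this excerpt and is not reproduced:

```python
from typing import Dict, List, Sequence


def feature_proxy(patients: Sequence[Dict[str, float]], feature: str,
                  negate: bool = False) -> List[float]:
    """CATE proxy = the feature value itself, or its negative ("negative" models)."""
    sign = -1.0 if negate else 1.0
    return [sign * p[feature] for p in patients]


def rate_proxy(patients: Sequence[Dict[str, float]], feature: str) -> List[float]:
    """CATE proxy = feature / disease duration (a historical rate of change)."""
    out = []
    for p in patients:
        if p["disease_duration"] <= 0:
            raise ValueError("disease duration must be positive")
        out.append(p[feature] / p["disease_duration"])
    return out


patients = [
    {"age": 40.0, "edss": 4.0, "disease_duration": 4.0},
    {"age": 50.0, "edss": 5.0, "disease_duration": 10.0},
]
print(feature_proxy(patients, "disease_duration", negate=True))  # [-4.0, -10.0]
print(rate_proxy(patients, "edss"))  # [1.0, 0.5]
```

The prognostic baseline in footnote e fits the same pattern: its proxy is each patient's predicted placebo-arm slope, so faster predicted progression ranks higher.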
Estimated sample size for a one- or two-year placebo-controlled randomized clinical trial of anti-CD20-Abs, using different degrees of predictive enrichment
| Percentile thresholdᵃ | CDP, controlᵇ | CDP, treatmentᵇ | HR (95% CI)ᶜ | Sample size estimateᵈ | Number screenedᵉ |
|---|---|---|---|---|---|
| Two-year trial | | | | | |
| 0 | 0.30 | 0.24 | 0.74 (0.48–1.15) | 1374 | 1374 |
| 10 | 0.31 | 0.24 | 0.72 (0.46–1.13) | 1133 | 1259 |
| 20 | 0.30 | 0.22 | 0.70 (0.43–1.13) | 1019 | 1274 |
| 30 | 0.29 | 0.22 | 0.67 (0.40–1.12) | 812 | 1160 |
| 40 | 0.30 | 0.21 | 0.59 (0.33–1.03) | 464 | 773 |
| 50 | 0.33 | 0.20 | 0.49 (0.27–0.91) | 245 | 490 |
| 60 | 0.36 | 0.22 | 0.51 (0.26–0.98) | 251 | 628 |
| 70 | 0.39 | 0.19 | 0.36 (0.17–0.79) | 111 | 370 |
| One-year trial | | | | | |
| 0 | 0.20 | 0.12 | 0.74 (0.48–1.15) | 2435 | 2435 |
| 10 | 0.21 | 0.12 | 0.72 (0.46–1.13) | 1988 | 2209 |
| 20 | 0.20 | 0.11 | 0.70 (0.43–1.13) | 1796 | 2245 |
| 30 | 0.22 | 0.11 | 0.67 (0.40–1.12) | 1346 | 1923 |
| 40 | 0.25 | 0.11 | 0.59 (0.33–1.03) | 710 | 1183 |
| 50 | 0.26 | 0.11 | 0.49 (0.27–0.91) | 371 | 742 |
| 60 | 0.31 | 0.12 | 0.51 (0.26–0.98) | 365 | 913 |
| 70 | 0.30 | 0.10 | 0.36 (0.17–0.79) | 171 | 570 |
a Percentile threshold for randomization. The 0th percentile represents an unenriched population, while the 70th percentile leads to inclusion of only the top 30% who are predicted to be most responsive.
b Proportion of CDP24 events for the responder groups corresponding to each percentile threshold.
c HR for time-to-CDP24 for the responder groups corresponding to each percentile threshold.
d Sample size estimates are calculated using a desired power of 80% and α = 0.05, assuming a 2:1 treatment-to-control randomization ratio. Calculations are based on the one- or two-year CDP24 rate and the one- or two-year HR of the responder groups in the anti-CD20-Abs dataset.
e Number of participants that need to be screened to reach the corresponding sample size estimate for randomization. This is dictated by the amount of predictive enrichment applied at randomization (see Percentile column).
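Footnotes d and e give the inputs (80% power, two-sided α = 0.05, 2:1 allocation, and each responder group's event rate and HR) but not the formula. As an assumption, Freedman's approximation for the required number of events under k:1 allocation, divided by the allocation-weighted event proportion, reproduces several rows exactly from the rounded table values (for example 1,374, 2,435, 251, 111 and 171) and the remaining rows to within a few patients; read it as a plausible reconstruction, not the authors' calculation. The screened count is the sample size divided by the fraction retained at the percentile threshold:

```python
import math
from statistics import NormalDist


def freedman_sample_size(p_control, p_treat, hr, k=2.0, alpha=0.05, power=0.80):
    """Total randomized sample size (Freedman's log-rank approximation).

    k is the treatment:control allocation ratio; hr is the treatment/control hazard ratio.
    """
    z = NormalDist().inv_cdf
    # Required number of CDP24 events
    events = (z(1 - alpha / 2) + z(power)) ** 2 * (1 + k * hr) ** 2 / (k * (1 - hr) ** 2)
    # Expected event proportion across both arms, weighted by the allocation ratio
    p_event = (p_control + k * p_treat) / (1 + k)
    return math.ceil(events / p_event)


def number_screened(n_randomized, percentile):
    """Patients screened when only those above an integer percentile are randomized.

    Equals n_randomized / (1 - percentile / 100), rounded half up with integer
    arithmetic so exact .5 cases are not disturbed by floating point.
    """
    kept = 100 - percentile
    return (200 * n_randomized + kept) // (2 * kept)


print(freedman_sample_size(0.30, 0.24, 0.74))  # 1374
print(number_screened(111, 70))  # 370
```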
Fig. 3 Multi-headed multilayer perceptron (MLP) architecture for CATE estimation.
The MLP was first pre-trained on a relapsing-remitting multiple sclerosis dataset (top), followed by fine-tuning on a primary progressive multiple sclerosis dataset (bottom). Subtraction symbols indicate which treatment and control outputs are subtracted to obtain the CATE estimate. Gray layers are the common layers transferred from the pre-trained MLP to the fine-tuning MLP, at which point their parameters are frozen and only the parameters of the blue layers are updated. The orange layers are discarded after the pre-training step. x: feature vector. The network outputs the predicted potential outcome on each treatment t and, by subtraction, the CATE estimate for treatment t given feature vector x. IFNb-1a = interferon beta-1a.
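The architecture in Fig. 3 can be sketched as a NumPy forward pass: a shared trunk feeds one head per arm, and the CATE is the difference of two heads. This is a minimal, untrained sketch under stated assumptions: two shared ReLU layers, one hidden layer per head, 32 hidden units, and CATE defined as the placebo prediction minus the treatment prediction (consistent with footnote e of the performance table, where faster predicted progression on placebo means a larger effect). Layer counts, sizes, arm names and the helper `dense` are hypothetical, and no training loop is shown:

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """Randomly initialised (weights, bias) pair; illustrative and untrained."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


class MultiHeadMLP:
    """Shared trunk (the transferred layers) plus one output head per treatment arm."""

    def __init__(self, n_features, arms, hidden=32):
        self.trunk = [dense(n_features, hidden), dense(hidden, hidden)]
        self.heads = {arm: [dense(hidden, hidden), dense(hidden, 1)] for arm in arms}

    def potential_outcomes(self, x):
        """Predicted outcome (e.g. future EDSS slope) under every arm."""
        h = x
        for w, b in self.trunk:
            h = relu(h @ w + b)
        out = {}
        for arm, ((w1, b1), (w2, b2)) in self.heads.items():
            out[arm] = (relu(h @ w1 + b1) @ w2 + b2).ravel()
        return out

    def cate(self, x, treatment, control="placebo"):
        """Control minus treatment prediction: positive = slower progression on treatment."""
        y = self.potential_outcomes(x)
        return y[control] - y[treatment]


# Pre-train on RRMS arms (other RRMS arms omitted), then reuse the trunk with new PPMS heads.
pretrained = MultiHeadMLP(n_features=20, arms=["placebo", "ifnb_1a"])
finetuned = MultiHeadMLP(n_features=20, arms=["placebo", "anti_cd20", "laquinimod"])
finetuned.trunk = pretrained.trunk  # transferred; fine-tuning would update only the heads
x = rng.normal(size=(5, 20))
print(finetuned.cate(x, "anti_cd20").shape)  # (5,)
```

In a real training setup, freezing the transferred layers amounts to excluding the trunk parameters from those passed to the optimizer, while the pre-training heads are simply dropped.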