Tom Menzies1,2, Gaelle Saint-Hilary3,4, Pavel Mozgunov5. 1. Clinical Trials Research Unit, Leeds Institute of Clinical Trials Research, University of Leeds, UK. 2. Department of Mathematics and Statistics, Lancaster University, UK. 3. Department of Biostatistics, Institut de Recherches Internationales Servier (IRIS), France. 4. Dipartimento di Scienze Matematiche (DISMA) Giuseppe Luigi Lagrange, Politecnico di Torino, Italy. 5. MRC Biostatistics Unit, University of Cambridge, UK.
Abstract
Multi-criteria decision analysis is a quantitative approach to drug benefit-risk assessment which allows for consistent comparisons by summarising all benefits and risks in a single score. Multi-criteria decision analysis consists of several components, one of which is the utility (or loss) score function that defines how benefits and risks are aggregated into a single quantity. While a linear utility score is one of the most widely used approaches in benefit-risk assessment, it is recognised that it can result in counter-intuitive decisions, for example, recommending a treatment with extremely low benefits or high risks. To overcome this problem, alternative approaches to the construction of the scores, namely the product, multi-linear and Scale Loss Score models, have been suggested. However, to date, the majority of arguments concerning the differences implied by these models are heuristic. In this work, we consider four models to calculate the aggregated utility/loss scores and compare their performance in an extensive simulation study over many different scenarios, and in a case study. It is found that the product and Scale Loss Score models provide more intuitive treatment recommendation decisions in the majority of scenarios compared to the linear and multi-linear models, and are more robust to correlation between the criteria.
Keywords:
Aggregation function; benefit–risk; decision-making; loss score; multi-criteria decision analysis
The benefit–risk analysis of a treatment consists of balancing its favourable
therapeutic effects versus adverse reactions it may induce.
This is a process which drug regulatory authorities, such as the EMA
and the FDA, use when deciding whether a treatment should be recommended. Benefit–risk
assessment (BRA) is mostly performed in a qualitative way.
However, this approach has been criticised for a lack of transparency behind
the final outcome, in part due to the large amounts of data considered for this
assessment and the differing opinions on what these data mean. To counter this,
quantitative approaches were proposed that ensure continuity and consistency across drug BRA and
make the decisions easier to justify and to communicate.[5,6] While there are a number of
methods to conduct the quantitative BRA, the multi-criteria decision analysis (MCDA)
has been particularly recommended by many expert groups in the field.[7-10] MCDA provides a single score
(a utility or loss score) for a treatment, which summarises all the benefits and
risks induced by the treatment in question. These scores are then used to compare
the treatments and to guide the recommendation of therapies over others.

Mussen et al.
proposed to use a linear aggregation model in the MCDA, which takes into
account all main benefits and risks associated with a treatment (as well as their
relative importance) to generate a treatment utility score by taking a linear
combination of all criteria. This utility score is then compared against the utility
score of a competing treatment, and that with the highest score is recommended. This
model appealed for numerous reasons, one of which was its simplicity. The proposed
method, however, was deterministic: point estimates of the benefit and risk criteria
were used, and no uncertainty around these estimates was considered. Yet
uncertainty and variance are expected in treatments’ performances, and must
therefore be accounted for in the decision making.

To resolve this shortcoming, probabilistic MCDA (pMCDA)
that accounts for the variability of the criteria through a Bayesian approach
was proposed. Generalisations of pMCDA for the case of uncertainty in the relative
importance of the criteria were developed, named stochastic multi-criteria
acceptability analysis (SMAA)
or Dirichlet SMAA.
However, it was acknowledged that by accounting for several sources of
uncertainty, these models become more complex and should be used primarily for
sensitivity analysis.

All the works discussed above concern a linear model for aggregation of the criteria,
a choice thought to be driven primarily by its wide application in practice rather
than by its properties. One argument against the linear model is that a treatment which
has either no benefit or extreme risk could be recommended over other alternatives
without such extreme characteristics.[14-16] In addition, the linearity
implies that the relative tolerance in the toxicity increase is constant for all
levels of benefit, which might not be the case in a number of clinical settings. To
address these points, a Scale Loss Score (SLoS) model was developed. This model makes
it impossible for treatments with no benefit or extremely high risk to be recommended.
It also incorporates a decreasing level of risk tolerance relative to the benefits:
an increase in risk is more tolerated when benefit improves from ‘very low’ to
‘moderate’ than when it improves from ‘moderate’ to ‘very high’. The SLoS model
resulted in similar recommendations to the linear MCDA model when one treatment
is strictly preferred to another (i.e. has both lower risk and higher benefit), but
resulted in more intuitive recommendations if one of the treatments has either
extremely low benefit or extremely high risk.

Whilst other methods are discussed in the literature, the only application of a
non-linear BRA model to the medical field is made by Saint-Hilary et al.,
and this only compares the linear and SLoS models. This paper builds on
this comparison by considering further aggregation models (AM) and analysing
how each performs relative to the others in the medical field. An extensive
simulation study over a number of clinical scenarios allows an informed decision
to be made as to which model should be used. We also use a case study to demonstrate
the implication of the choice of AM on the actual decision making using the
MCDA.

The rest of the paper proceeds as follows. The general MCDA methodology, the four
different aggregation models considered, linear, product, multi-linear and SLoS, and
the choice of the weights for them are given in Section 2. In Section 3, we revisit
a case study conducted by Nemeroff
looking at the effects of Venlafaxine, Fluoxetine and a placebo on
depression, applying the various aggregation models to a given dataset. In Section
4, a comprehensive simulation study comparing the four aggregation models in many
different scenarios is presented, as well as the effects any correlation between
criteria may have. We conclude with a discussion in Section 5.
Methodology
All of the aggregation models (referred to as ‘models’ below) considered in this
work are classified within the MCDA family: they aggregate the information
about benefits and risks in a single (utility or loss) score. Therefore, we
refer to each of the approaches by the model it uses for the computation of the score.
Below, we outline the general MCDA framework for the construction of a score using
an arbitrary model. We consider the MCDA taking into account the variability of
estimates, pMCDA.
Setting
Consider $n$ treatments (indexed by $i$) which are assessed on $m$ criteria (indexed by $j$). To ensure continuity, we use the same notations as those of Saint-Hilary et al.:

$\xi_{i,j}$ is the performance of treatment $i$ on criterion $j$, so that treatment $i$ is characterised by a vector showing how it performed on each criterion: $\xi_i = (\xi_{i,1}, \ldots, \xi_{i,m})$.

The monotonically increasing partial value functions $0 \le u_j(\cdot) \le 1$ are used to normalise the criterion performances. Let $\xi'_j$ and $\xi''_j$ be the most and the least preferable values, then $u_j(\xi'_j) = 1$ and $u_j(\xi''_j) = 0$. The inequality $u_j(\xi_{i,j}) > u_j(\xi_{h,j})$ indicates that the performance of treatment $i$ is preferred to the performance of treatment $h$ on criterion $j$. In this work, we focus on linear partial value functions, one of the most common choices in treatment BRA[5,7,12,18,11] that can be written as

$$u_j(\xi_{i,j}) = \frac{\xi_{i,j} - \xi''_j}{\xi'_j - \xi''_j}. \qquad (1)$$

The weights indicating the relative importance of the criteria are known constants denoted by $w_j$. The vector of weights used for the analysis is denoted by $w = (w_1, \ldots, w_m)$.

The MCDA utility or loss scores of treatment $i$ are obtained as

$$u(\xi_i, w) := u\big(w_1, \ldots, w_m, u_1(\xi_{i,1}), \ldots, u_m(\xi_{i,m})\big)$$

and

$$l(\xi_i, w) := l\big(w_1, \ldots, w_m, u_1(\xi_{i,1}), \ldots, u_m(\xi_{i,m})\big),$$

respectively, where $u(\cdot)$ and $l(\cdot)$ are the functions specifying how the criteria should be summarised in a single score, and are referred to as ‘aggregation models’. The impact of the choice of this model on the performance of treatment recommendation is the focus of this work. The higher the utility score, or the lower the loss score, the more preferable the benefit–risk ratio. Then, the comparison of treatments $i$ and $h$ is based on

$$\Delta u(\xi_i, \xi_h) = u(\xi_i, w) - u(\xi_h, w) \qquad (2)$$

or

$$\Delta l(\xi_i, \xi_h) = l(\xi_i, w) - l(\xi_h, w). \qquad (3)$$

Within a Bayesian approach, the utility score $u(\xi_i, w)$ and the loss score $l(\xi_i, w)$ are random variables having a prior distribution. Given observed outcomes $x_i$ and $x_h$ (corresponding to treatment performances $\xi_i$ and $\xi_h$, respectively) for treatments $i$ and $h$, one can obtain the posterior distribution of $\Delta u(\xi_i, \xi_h)$ or $\Delta l(\xi_i, \xi_h)$, respectively. The inference is based on the complete posterior distribution and the conclusion on the benefit–risk balance is supported by the probability of treatment $i$ to have a greater utility score (or smaller loss score) than treatment $h$:

$$P\big(\Delta u(\xi_i, \xi_h) > 0 \mid x_i, x_h\big) \qquad (4)$$

or

$$P\big(\Delta l(\xi_i, \xi_h) < 0 \mid x_i, x_h\big). \qquad (5)$$

The probabilities (4) or (5) are used to guide a decision on taking/dropping a treatment. A possible way to formalise the decision based on this probability is to compare it to a threshold confidence level $\psi$. Then, a probability (4) (or (5)) greater than $\psi$ would mean that one has enough evidence to say that treatment $i$ has a better benefit–risk balance than $h$ with a level of confidence $\psi$. Note that a probability (4) (and (5)) equal to 0.5 corresponds to the case where the benefit–risk profiles of $i$ and $h$ are equal according to the corresponding MCDA model.
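As an illustration of the two building blocks described in this section, the following is a minimal Python sketch (not the authors' implementation) of a linear partial value function and of a Monte Carlo estimate of the probability that one treatment has a better score than another, computed from paired posterior draws. The function names `linear_pvf` and `prob_better`, and the Beta draws used as input, are hypothetical and chosen for illustration only.

```python
import numpy as np

def linear_pvf(xi, most_pref, least_pref):
    """Linear partial value function: 1 at the most preferable value, 0 at the least."""
    return (xi - least_pref) / (most_pref - least_pref)

def prob_better(score_i, score_h, higher_is_better=True):
    """Share of paired posterior draws in which treatment i beats treatment h.

    Use higher_is_better=True for utility scores and False for loss scores.
    """
    diff = np.asarray(score_i) - np.asarray(score_h)
    wins = diff > 0 if higher_is_better else diff < 0
    return float(np.mean(wins))

# Hypothetical posterior draws of a single benefit probability for two treatments.
rng = np.random.default_rng(0)
benefit_i = rng.beta(60, 40, size=10_000)
benefit_h = rng.beta(40, 60, size=10_000)

# With a single criterion the utility score is just the normalised benefit.
u_i = linear_pvf(benefit_i, most_pref=1.0, least_pref=0.0)
u_h = linear_pvf(benefit_h, most_pref=1.0, least_pref=0.0)
p = prob_better(u_i, u_h)  # compared against a chosen threshold confidence level
```

For a risk criterion the most preferable value is the smaller one, so `linear_pvf(0.25, most_pref=0.0, least_pref=1.0)` returns 0.75.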
Aggregation models
Below, we consider four specific forms of aggregation models, namely, linear,
product, multi-linear and SLoS, that have been advocated by various authors for use
in the MCDA to support decision making.
Linear model
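A linear aggregation of treatments’ effects on benefits and risks remains the most common choice in treatment development.[7,12,19,18,13] Under the linear model, the utility score is computed as

$$u^{L}(\xi_i, w^{L}) := \sum_{j=1}^{m} w^{L}_j \, u_j(\xi_{i,j}), \qquad (6)$$

where $w^{L}_j > 0$ for all $j$ and $\sum_{j=1}^{m} w^{L}_j = 1$, the superscript $L$ referring to the linear model. The expression (6) is used in equations (2) and (4) to compare the associated linear scores for a pair of treatments.

As an illustration of all considered aggregation models, we will use the following example with two criteria: one benefit indexed by $j = 1$, one risk indexed by $j = 2$. The linear utility score for treatment $i$ at fixed parameter values $\xi_{i,1} = \theta_{i,1}$, $\xi_{i,2} = \theta_{i,2}$ takes the form

$$u^{L}(\theta_{i,1}, \theta_{i,2}) = w^{L}_1 \theta_{i,1} + w^{L}_2 (1 - \theta_{i,2}). \qquad (7)$$

As the values $\theta_{i,1}, \theta_{i,2} \in (0, 1)$, one can interpret $\theta_{i,1}$ as a probability of benefit and $\theta_{i,2}$ as a probability of risk. This utility score can be transformed into a loss score by subtracting it from one:

$$l^{L}(\theta_{i,1}, \theta_{i,2}) = 1 - u^{L}(\theta_{i,1}, \theta_{i,2}) = w^{L}_1 (1 - \theta_{i,1}) + w^{L}_2 \theta_{i,2}. \qquad (8)$$

We do this as, historically, the concept of a loss function is preferred both in statistical decision theory and Bayesian analysis for parameter estimation.

The contours of equal linear loss score for all values of $\theta_{i,1}$ and $\theta_{i,2}$ are given in panel A of Figure 1, using equally weighted criteria (top row) and a risk criterion weighted twice as heavily as the benefit criterion (bottom row).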
Figure 1.
Contour plots for linear (A), product (B), multi-linear (C) and SLoS
(D) models with (i) two equally important criteria (top row), and
(ii) the risk criterion being twice as important (on average for
non-linear model) as the benefit criterion (bottom row). Red lines
on panels B to D represent the tangents at the middle point (0.5,
0.5).
The contours represent the loss score for each benefit–risk pair. Lower values of the loss score $l^{L}(\theta_{i,1}, \theta_{i,2})$ correspond to better treatment benefit–risk profiles. It is minimised (bottom right corner) when the maximum possible benefit is reached (benefit probability $\theta_{i,1} = 1$) with no risk (risk probability $\theta_{i,2} = 0$). The contours are linear, with a constant slope $w^{L}_1/(1 - w^{L}_1)$, where $w^{L}_1$ is the weight of the benefit criterion. This implies that if one treatment has an increased probability of risk of $\delta$ compared to another, its benefit probability should be increased by $\delta (1 - w^{L}_1)/w^{L}_1$ to have the same utility score, and this holds for all values of benefit and risk. This figure allows for an illustration of the penalisation of various benefit–risk criteria and for an illustrative comparison between treatments with different criteria. For example, two treatments that lie on the same contour line are seen as equal.

The major advantage of the linear model is its intuitive interpretation: a poor efficacy can be compensated by a good safety, and vice-versa. However, the linear utility score can result in the recommendation of a highly unsafe or poorly effective treatment[21,6] and, consequently, in a counter-intuitive conclusion. Moreover, the linearity implies that the relative tolerance in the toxicity increase is constant for all levels of benefit. These pitfalls could be avoided (or at least reduced) by using non-linear models.[6,22] Specifically, Saint-Hilary et al. advocated two principles that a desirable benefit-risk aggregation model should satisfy:

(i) One is not interested in treatments with extremely low levels of benefit or extremely high levels of risk (regardless of how the treatment performs on other criteria).

(ii) Decreasing level of risk tolerance relative to benefits: an increase in risk could be more tolerated when benefit improves from ‘very low’ to ‘moderate’, compared to from ‘moderate’ to ‘very high’.

Below, we consider three models having one or both of these properties.
Product model
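A multiplicative aggregation (known as a product model) is an alternative method of comparing treatments’ effects on benefits and risks. Under the product model, the utility score is computed as

$$u^{P}(\xi_i, w^{P}) := \prod_{j=1}^{m} u_j(\xi_{i,j})^{w^{P}_j}, \qquad (9)$$

where the superscript $P$ refers to the product model. The expression (9) is used in equations (2) and (4) to compare the associated product scores for a pair of treatments.

The product utility score for treatment $i$ with two criteria at fixed parameter values $\xi_{i,1} = \theta_{i,1}$, $\xi_{i,2} = \theta_{i,2}$ takes the form

$$u^{P}(\theta_{i,1}, \theta_{i,2}) = \theta_{i,1}^{\,w^{P}_1} (1 - \theta_{i,2})^{w^{P}_2}. \qquad (10)$$

As for the linear model, this utility score can be transformed into a loss score by subtracting it from one:

$$l^{P}(\theta_{i,1}, \theta_{i,2}) = 1 - u^{P}(\theta_{i,1}, \theta_{i,2}). \qquad (11)$$

The contours of equal product loss score for all values of $\theta_{i,1}$ and $\theta_{i,2}$ are given in panel B of Figure 1, using equally weighted criteria (top row) and a risk criterion weighted twice as heavily as the benefit criterion (bottom row).

One advantage the product model has over the linear model is that it cannot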
recommend treatments with either zero benefit or extreme risk. This is
because either of these two options would result in a score of zero for the
utility function, and as such would make it impossible for such a treatment
to be recommended. The contour lines in panel B of Figure 1 demonstrate how the product
model penalises undesirable values compared to the linear model. These
contours are curved, and are bunched together most tightly where
benefit values are low and where risk values are high. This shows how the
penalisation distinguishes this model from the linear model: under the linear
model, an increase/decrease in benefit or risk is treated equally regardless of
the marginal values of these criteria, whereas under the product model the
values of these criteria themselves affect the trade-off and hence the decision making.
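The contrast between the two models can be sketched with the two-criteria example (one benefit probability, one risk probability). The Python snippet below is an illustrative sketch only: the function names and the two hypothetical treatments are assumptions, the risk criterion is normalised as one minus the risk probability, and the two weights of each model are assumed to sum to one.

```python
def linear_utility(theta_benefit, theta_risk, w_benefit):
    """Two-criteria linear utility; the risk weight is assumed to be 1 - w_benefit."""
    w_risk = 1.0 - w_benefit
    return w_benefit * theta_benefit + w_risk * (1.0 - theta_risk)

def product_utility(theta_benefit, theta_risk, w_benefit):
    """Two-criteria product utility; the risk weight is assumed to be 1 - w_benefit."""
    w_risk = 1.0 - w_benefit
    return theta_benefit ** w_benefit * (1.0 - theta_risk) ** w_risk

# Hypothetical treatments as (benefit probability, risk probability).
treatment_a = (0.0, 0.0)  # no benefit at all, but also no risk
treatment_b = (0.3, 0.6)  # modest benefit with substantial risk

# Equal weights: the linear model prefers A although A has no benefit,
# whereas the product model gives A a utility of zero and prefers B.
linear_prefers_a = linear_utility(*treatment_a, 0.5) > linear_utility(*treatment_b, 0.5)
product_prefers_b = product_utility(*treatment_b, 0.5) > product_utility(*treatment_a, 0.5)
```

The corresponding loss scores are obtained by subtracting each utility from one, which leaves the ordering of the two treatments unchanged.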
Multi-linear model
A multi-linear model for the aggregation of treatments’ benefits and risks provides one more alternative for the comparison of two treatments. This model can be seen as an attempt to combine the linear and product models. Under the multi-linear model, the utility score is computed as

$$u^{ML}(\xi_i, w^{ML}) := \sum_{j=1}^{m} w^{ML}_j u_j(\xi_{i,j}) + \sum_{j<k} w^{ML}_{j,k}\, u_j(\xi_{i,j})\, u_k(\xi_{i,k}) + \cdots + w^{ML}_{1,\ldots,m} \prod_{j=1}^{m} u_j(\xi_{i,j}), \qquad (12)$$

where the superscript $ML$ refers to the multi-linear model, and the weights $w^{ML}_{j,k}, \ldots, w^{ML}_{1,\ldots,m}$ refer to the weights given to the interaction terms between the criteria. We require all the weights in the ML model to sum up to 1. The expression (12) is used in equations (2) and (4) to compare the associated multi-linear scores for a pair of treatments.

Considering the example with two criteria, the multi-linear utility score for treatment $i$ at fixed parameter values $\xi_{i,1} = \theta_{i,1}$, $\xi_{i,2} = \theta_{i,2}$ takes the form

$$u^{ML}(\theta_{i,1}, \theta_{i,2}) = w^{ML}_1 \theta_{i,1} + w^{ML}_2 (1 - \theta_{i,2}) + w^{ML}_{1,2}\, \theta_{i,1} (1 - \theta_{i,2}). \qquad (13)$$

Note that even under the constraint that the weights sum to one, there is one more weight parameter than for the linear and product models. This can immediately make the weight elicitation procedure more involved for all stakeholders. To link the weights of the ML model with the rest of the competing approaches (see more details in Section 2.3), we set up one more constraint, so that the number of weight parameters is the same in all considered models (for the purpose of the comparison in this manuscript). Specifically, we fix $w^{ML}_{1,2} = c$ where $0 \le c \le 1$, implying that we fix the effect of the interaction term. As for the linear and product models, this utility score can be transformed into a loss score by subtracting it from one:

$$l^{ML}(\theta_{i,1}, \theta_{i,2}) = 1 - u^{ML}(\theta_{i,1}, \theta_{i,2}). \qquad (14)$$

The contours of equal multi-linear loss score for all values of $\theta_{i,1}$ and $\theta_{i,2}$ are given in panel C of Figure 1, using equally weighted criteria (top row) and a risk criterion weighted twice as heavily, on average, as the benefit criterion (bottom row).

The contour lines demonstrate the almost linear trade-off between benefit and
risk, but that there is a slight curvature (which becomes more prominent as
it moves further away from more desirable values), indicating a moderate
penalisation of extreme values. This shows that while this model attempts to
penalise the undesirable criteria values, this effect does not seem to be as
strong as in the product model, admittedly due to the chosen value of the
weight given to the interaction term. With this moderate level of
penalisation, treatments with no benefit or extreme risk can still be
recommended, as is the case in the linear model. The larger the
weight of the interaction term, the less likely this is to happen.
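A short sketch may help to see how the interaction weight moves the two-criteria multi-linear model between the linear and product forms. The Python function below is illustrative only: its name and arguments are hypothetical, the risk criterion is normalised as one minus the risk probability, and the three weights are assumed to sum to one as required in the text.

```python
def multilinear_utility(theta_benefit, theta_risk, w_benefit, c):
    """Two-criteria multi-linear utility.

    c is the weight of the interaction term; the risk weight is whatever
    remains so that the three weights sum to one (assumption from the text).
    """
    w_risk = 1.0 - c - w_benefit
    v_benefit = theta_benefit       # normalised benefit
    v_risk = 1.0 - theta_risk       # normalised risk (higher is better)
    return w_benefit * v_benefit + w_risk * v_risk + c * v_benefit * v_risk

# c = 0 recovers the linear model; c = 1 (so both main weights are 0) leaves
# only the product of the two normalised criterion values.
as_linear = multilinear_utility(0.3, 0.6, w_benefit=0.5, c=0.0)
as_product = multilinear_utility(0.3, 0.6, w_benefit=0.0, c=1.0)

# With a moderate interaction weight, a treatment with no benefit and no risk
# still receives a positive utility, so it can in principle be recommended.
no_benefit = multilinear_utility(0.0, 0.0, w_benefit=0.4, c=0.2)
```

The values `c=0.2` and `w_benefit=0.4` in the last call are arbitrary illustrative choices.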
SLoS model
An alternative to the models proposed above is the Scale Loss Score (SLoS) model, which was proposed by Saint-Hilary et al. to satisfy the two desirable properties for an aggregation method. First of all, in contrast to the three models above, SLoS considers a loss score, rather than a utility score, as the output. Therefore, lower values are more desirable. Under the SLoS model, the loss score is computed as

$$l^{S}(\xi_i, w^{S}) := \sum_{j=1}^{m} \left( \frac{1}{u_j(\xi_{i,j})} \right)^{w^{S}_j}, \qquad (15)$$

where the superscript $S$ refers to the SLoS model. The expression (15) is used in equations (3) and (5) to compare the associated SLoSs for a pair of treatments. The loss score could theoretically be transformed into a utility score, for example as $1 - l^{S}(\xi_i, w^{S})$. However, this form is usually not used because it provides negative utility values, which is not intuitive for a utility concept.

Coming back to the example with two criteria, the loss score for treatment $i$ at fixed parameter values $\xi_{i,1} = \theta_{i,1}$, $\xi_{i,2} = \theta_{i,2}$ takes the form

$$l^{S}(\theta_{i,1}, \theta_{i,2}) = \left( \frac{1}{\theta_{i,1}} \right)^{w^{S}_1} + \left( \frac{1}{1 - \theta_{i,2}} \right)^{w^{S}_2}. \qquad (16)$$

The contours of equal scale loss score for all values of $\theta_{i,1}$ and $\theta_{i,2}$ are given in panel D of Figure 1, using equally weighted criteria (top row) and a risk criterion weighted twice as heavily, on average, as the benefit criterion (bottom row).

As is the case with the product model, this penalisation makes it impossible
for treatments with either no benefit or extreme risk to be recommended over
other potential treatments, compared to the linear and multi-linear models
(which can recommend such treatments). This is because a treatment that had
either of these would return a loss score of infinity (regardless of the
values of any other criteria) and would therefore be non-recommendable. On
the figure, the white colour at extreme undesirable values (either very low
benefit or very high risk) corresponds to very high to infinite loss scores
and demonstrates the penalisation effect.

Of note, Figure 1
displays the contours of equal loss score for all the
models, so all the plots in this figure can be interpreted in the same
way, with lower scores (in blue) corresponding to more desirable
benefit–risk profiles. Even when these contour plots use the same
numerical values of the weights, the weights themselves, $w^{L}$, $w^{P}$, $w^{ML}$ and $w^{S}$,
are different in each model (as indicated by the different superscripts). Therefore,
to provide a fair comparison of these models, it is important to ensure
that the models carry (approximately) the same relative importance of the
criteria, defined through the slope of the contour lines. We propose an
approach to match the relative importance across the models below.
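The penalisation of extreme values under SLoS can be sketched for the two-criteria example as follows. This is an illustrative Python sketch, not the authors' code: the function name is hypothetical, the risk criterion is normalised as one minus the risk probability, and the risk weight is assumed to be one minus the benefit weight.

```python
import math

def slos_loss(theta_benefit, theta_risk, w_benefit):
    """Two-criteria Scale Loss Score; the risk weight is assumed to be 1 - w_benefit."""
    w_risk = 1.0 - w_benefit
    v_benefit = theta_benefit       # normalised benefit
    v_risk = 1.0 - theta_risk       # normalised risk (higher is better)
    if v_benefit <= 0.0 or v_risk <= 0.0:
        # No benefit or certain risk: the loss is infinite whatever the other criterion.
        return math.inf
    return (1.0 / v_benefit) ** w_benefit + (1.0 / v_risk) ** w_risk

best_case = slos_loss(1.0, 0.0, 0.5)    # maximum benefit, no risk: smallest possible loss
mid_point = slos_loss(0.5, 0.5, 0.5)
no_benefit = slos_loss(0.0, 0.0, 0.5)   # never recommendable over a finite-loss treatment
```

Because each term is at least one, the two-criteria loss is never below 2, which is why turning it into a utility by subtraction from one would give negative values.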
Weight elicitation and mapping
Methods for quantifying subjective preferences, for example, Discrete Choice
Experiment and Swing-Weighting, have been widely studied in the
literature.[6,7,24,25] Applied to drug BRA, the majority of the weight
elicitation methods concern the linear model. In the linear model framework, the
weight assigned to one criterion is interpreted as a scaling factor which
relates one increment on this criterion to increments on all other criteria.

Note that each of the aggregation models uses its own individual weights, $w^{L}$, $w^{P}$, $w^{ML}$
and $w^{S}$. However, in the actual analysis, regardless of the aggregation
model used, one can expect only one underlying level of the relative importance
of the considered benefit and risk criteria, as the stakeholders’ preferences
between the criteria should not depend on the methodology used for the decision
making. Therefore, it is crucial to make sure when applying different models to
the same problem that they reflect the same stakeholders’ preferences. We adapt
the approach proposed by Saint-Hilary et al.
to achieve that. Since comprehensive work has been published and is
currently being continued on the weight elicitation for the linear model, we
map the weights $w^{L}$
(hypothetically) elicited for the linear model to the weights $w^{P}$, $w^{ML}$
and $w^{S}$ such that they reflect the same trade-off preferences between
the criteria.
Mapping for two criteria
As described in Saint-Hilary et al., the trade-off between the criteria can formally be represented by the slope of the tangent to the contour line passing through the point (0.5, 0.5) (see the red lines in panels B to D of Figure 1). Therefore, the expressions for the mapping of the linear weight to the competing models are found through the equality of the slopes of the tangents to the corresponding contour lines.

We start from the setting with two criteria. As stated above, even for the two criteria setting, the multi-linear model requires one more weight to be specified. Therefore, we impose a constraint on the weight corresponding to the interaction term to obtain a unique solution for the mapped weight $w^{ML}_1$, specifically $w^{ML}_{1,2} = c$, where $0 \le c \le 1$. Note that for $c = 0$, the multi-linear model reduces to the linear one, and for $c = 1$ it becomes the product of the two criteria values.

Using the utility/loss scores obtained at point $(\theta_{i,1}, \theta_{i,2})$, with the two weights of the linear, product and SLoS models summing to one and $w^{ML}_1 + w^{ML}_2 + c = 1$, the expressions of the equality of the tangents with two criteria take the form

$$\frac{w^{L}_1}{1 - w^{L}_1} = \frac{w^{P}_1 (1 - \theta_{i,2})}{(1 - w^{P}_1)\, \theta_{i,1}}, \qquad \frac{w^{L}_1}{1 - w^{L}_1} = \frac{w^{ML}_1 + c\,(1 - \theta_{i,2})}{1 - c - w^{ML}_1 + c\, \theta_{i,1}}, \qquad \frac{w^{L}_1}{1 - w^{L}_1} = \frac{w^{S}_1\, \theta_{i,1}^{-(w^{S}_1 + 1)}}{(1 - w^{S}_1)\, (1 - \theta_{i,2})^{-(2 - w^{S}_1)}}, \qquad (17)$$

where the slope for the linear model is given on the left hand side, and the slopes for the product, multi-linear and SLoS models are given on the right hand side, respectively.

Note, however, that the slope of the tangent to the contours for the linear model is constant for all values of the parameters and defined by the weights $w^{L}$ only, while the slopes for the competing models change with the values of the criteria. For the purpose of the weight mapping, we interpret $w^{L}_j$ as an average relative importance of each criterion over the others, and match the slopes of the tangents to the corresponding contours at the middle point, $\theta_{i,1} = \theta_{i,2} = 0.5$. Then, the equalities above reduce to

$$w^{P}_1 = w^{L}_1, \qquad w^{ML}_1 = w^{L}_1 - \frac{c}{2}, \qquad \frac{w^{S}_1}{1 - w^{S}_1}\, 2^{\,2 w^{S}_1 - 1} = \frac{w^{L}_1}{1 - w^{L}_1}. \qquad (18)$$

Therefore, the product weight coincides with the linear weight at the given middle mapping point. For the SLoS model, the weight mapping does not have an analytical solution, but the approximate value of $w^{S}_1$ can be obtained by line search. Figure 2 shows the mapping from the linear model to the multi-linear and SLoS models. It demonstrates how the value of $w^{L}_1$ for the linear model (x-axis) can be used to find the respective weights for the multi-linear and SLoS models on the y-axes.
Figure 2.
Weight mapping from the linear model to the multi-linear model (left)
and to the SLoS model (right).
One can note that for the multi-linear model, the proposed mapping process may result in negative mapped weights. This is because of how the weight mapping function is derived in the two criteria case: if the value of a weight under the linear model is less than half the value of $c$, the weight of the interaction term, then it will map to a negative value (which, in theory, gives the criterion a negative importance, which is impossible) to reflect the same relative importance as induced by the linear model. Intuitively, if the interaction term already contributes more than required to the importance of one of the criteria in the interaction, the model needs to subtract the ‘excessive’ importance from the weight corresponding to this criterion standing alone. Whilst this effect can be negated by setting an upper limit on the values $c$ can take, this in turn limits the effect the interaction terms have, and can make the model more similar to the linear model. This is demonstrated in Figure 2 for $c = 0.2$, where any weights for the linear model that are given a value of 0.1 or less would be mapped to 0 in the multi-linear model, rather than a negative value.

The mapped weights for the multi-linear and SLoS models do not have a direct interpretation, and should be back-transformed to linear weights to be interpreted. For instance, with two criteria, a SLoS weight mapped from linear weights under which the risk criterion is twice as important as the benefit criterion still means that the risk criterion is twice as important, on average, as the benefit criterion. Since the values are different but the underlying interpretation remains the same, mapping the weights provides the fairest comparison between the models.

Proof for the above workings is given in the Supplemental Material.
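The two-criteria mapping at the middle point (0.5, 0.5) can be sketched numerically. The Python code below is an illustrative sketch under stated assumptions: the function names are hypothetical, the SLoS risk weight is assumed to be one minus the benefit weight, and a simple bisection stands in for the line search mentioned in the text. Under these assumptions the slope of the SLoS contour tangent at the middle point is w / (1 - w) * 2^(2w - 1), which is increasing in w, so bisection applies.

```python
def map_to_multilinear(w_linear, c):
    """Multi-linear main weight matching the linear slope at (0.5, 0.5)."""
    return w_linear - c / 2.0

def slos_slope(w):
    """Slope of the SLoS contour tangent at (0.5, 0.5) for benefit weight w."""
    return w / (1.0 - w) * 2.0 ** (2.0 * w - 1.0)

def map_to_slos(w_linear, tol=1e-10):
    """Bisection search for the SLoS weight whose slope equals the linear slope."""
    target = w_linear / (1.0 - w_linear)
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slos_slope(mid) < target:
            lo = mid   # slope too small: the weight must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

w_slos_equal = map_to_slos(0.5)                 # equal importance maps to itself
w_slos_quarter = map_to_slos(0.25)              # approximately 0.30
w_ml_negative = map_to_multilinear(0.05, 0.2)   # linear weight below c/2 maps below zero
```

The product model needs no function of its own because its mapped weight equals the linear weight at the middle point.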
Mapping for setting with more than two criteria
The derivation above concerns the setting with two criteria only but could be
directly extended for the product and SLoS models. Specifically, one can
apply the proposed mapping function to each of the weights in the setting
with more than two criteria marginally. This would imply that the weights
are mapped with respect to the importance of all other criteria rather than
a single benefit (or risk).
The extension for the multi-linear model, however, is less straightforward. Generally, it would be a much more involved procedure to elicit weights for all the interaction terms, as their number increases noticeably if more than two criteria are considered. Specifically, in the case study considered in Section 3, there are four criteria resulting in 11 interaction terms. Following the two criteria setting, we suggest fixing the total weight attributed to all the interactions to be equal to $c$. Then, the ML model for the setting with four criteria takes the form

$$u^{ML}(\xi_i, w^{ML}) = \sum_{j=1}^{4} w^{ML}_j u_j(\xi_{i,j}) + \frac{c}{11} \left[ \sum_{j<k} u_j(\xi_{i,j})\, u_k(\xi_{i,k}) + \sum_{j<k<l} u_j(\xi_{i,j})\, u_k(\xi_{i,k})\, u_l(\xi_{i,l}) + \prod_{j=1}^{4} u_j(\xi_{i,j}) \right], \qquad (19)$$

where the fraction $c/11$ ensures that the sum of the weights of all the interaction terms equals $c$ and that this is split equally between all interaction terms. To calculate the individual weights $w^{ML}_j$, again, a mapping to the linear weights can be used. In order for the weights to sum up to 1, the transformation $w^{ML}_j = w^{L}_j - c/m$ could be applied. For $m = 2$, this translates into the corresponding mapping in equation (18). While this procedure does not guarantee the equality of the slopes of the tangents, it emphasises the potential challenge associated with the use of the multi-linear model that should be taken into account when considering it.
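The equal split of the interaction weight can be sketched for any number of criteria. The Python code below is illustrative only: the function names are hypothetical, and the value 0.2 used for the total interaction weight is an assumption, chosen because it is consistent with the difference between the linear and multi-linear weights reported in Table 3.

```python
from itertools import combinations
from math import prod

def interaction_subsets(m):
    """All subsets of at least two of the m criteria (one per interaction term)."""
    return [s for size in range(2, m + 1) for s in combinations(range(m), size)]

def multilinear_utility(values, weights, c):
    """Multi-linear utility with the total interaction weight c split equally."""
    subsets = interaction_subsets(len(values))
    main = sum(w * v for w, v in zip(weights, values))
    interaction = sum(prod(values[j] for j in s) for s in subsets)
    return main + c / len(subsets) * interaction

def map_weights(linear_weights, c):
    """Shift each linear weight by c / m so that all weights still sum to one."""
    m = len(linear_weights)
    return [w - c / m for w in linear_weights]

n_terms = len(interaction_subsets(4))       # 6 pairs + 4 triples + 1 quadruple
ml_weights = map_weights([0.25] * 4, 0.2)   # equal linear weights, assumed c = 0.2
u_best = multilinear_utility([1.0] * 4, ml_weights, 0.2)  # best value on every criterion
```

With all normalised criterion values equal to one, the main terms contribute 0.8 and the interaction terms 0.2, so the utility of the best possible treatment is one.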
Case study
In this section, the performance of the four aggregation models is illustrated in the
setting of an actual case study. This will provide an insight on how the various
models perform, and what difference in the decision making they induce when applied
to real-life data. The case study in question analyses the effects of two treatments
(Venlafaxine and Fluoxetine), compared to a placebo, in treating
depression. This study uses data from Nemeroff,
and expands on the studies conducted by Tervonen et al.
and Saint-Hilary et al.

Fluoxetine and Venlafaxine are both treatments used to treat depression. Here, the
benefit criterion is the treatment response (a reduction from baseline score on the
Hamilton Depression Rating Scale of at least 50%), and the three risk criteria are
nausea, insomnia and anxiety.

Table 1 shows the
outcomes of the trial for the two treatments and the placebo.
Table 1.
Number of events and number of patients for each criterion for Venlafaxine,
Fluoxetine and Placebo.

| Criterion          | Venlafaxine | Fluoxetine | Placebo |
|--------------------|-------------|------------|---------|
| Treatment response | 51/96       | 45/100     | 37/101  |
| Nausea             | 40/100      | 22/102     | 8/102   |
| Insomnia           | 22/100      | 15/102     | 14/102  |
| Anxiety            | 10/100      | 7/102      | 1/102   |

For all criteria, we approximate the distributions of the event probabilities by Beta
distributions $\mathrm{Beta}(a, b)$, with $a$ = number of occurrences and $b$ = (number of patients − number of occurrences) of the considered event (response or
adverse event), assuming Beta(0,0) priors. We generated 100,000 samples from each
distribution. These samples are then used to approximate the distributions of the
linear partial value functions (PVFs) as defined in equation (1) for all
criteria and all treatment arms, with the following most and least preferred
probabilities of occurrence $\xi'_j$ and $\xi''_j$ (Table 2).

This case study considers three different weighting combinations, which were
used under the linear model by Saint-Hilary et al.
These sets of weights correspond to three different scenarios of the relative
importance of the criteria for the stakeholders. The first scenario reflects the
case when all four criteria are equally important. The second scenario corresponds
to the benefit criterion having more relative importance than all risk criteria
together. The third scenario can be considered as a ‘safety first’ scenario, in
which each risk criterion has a higher weight than the benefit criterion. As
discussed in Section 2.3, the weights of the criteria for the product, multi-linear
and SLoS models are obtained by mapping. Note, again, that while the multi-linear
model might not induce exactly the same average relative importance of the criteria,
the proposed procedure controls the contribution of the interaction terms
to the decision at the given level $c$ of the total interaction weight, and is therefore used for the sake of simplicity. The mapped
weights for each of the three scenarios are presented in Table 3.
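Table 3.
Table of mapped weights for each of the three scenarios (S1 to S3).
| Model | S1: w1 | S1: w2 | S1: w3 | S1: w4 | S2: w1 | S2: w2 | S2: w3 | S2: w4 | S3: w1 | S3: w2 | S3: w3 | S3: w4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Linear | 0.25 | 0.25 | 0.25 | 0.25 | 0.58 | 0.11 | 0.15 | 0.15 | 0.18 | 0.28 | 0.25 | 0.29 |
| Product | 0.25 | 0.25 | 0.25 | 0.25 | 0.58 | 0.11 | 0.15 | 0.15 | 0.18 | 0.28 | 0.25 | 0.29 |
| Multi-linear | 0.20 | 0.20 | 0.20 | 0.20 | 0.53 | 0.06 | 0.10 | 0.10 | 0.13 | 0.23 | 0.20 | 0.24 |
| SLoS | 0.30 | 0.30 | 0.30 | 0.30 | 0.56 | 0.16 | 0.21 | 0.21 | 0.24 | 0.33 | 0.30 | 0.34 |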
Three pairwise comparisons are made: Venlafaxine against Fluoxetine, Venlafaxine
against Placebo and Fluoxetine against Placebo. We consider that one treatment is
recommended over another if the probabilities defined in (4) or (5) exceed a
prespecified threshold. The probabilities of recommendations under all three
scenarios and for each aggregation model are given in Table 4.
Table 4.
Probability of a treatment being recommended as the best treatment against
another for the three pairwise comparisons, using each of the four
aggregation models, for each of the three weighting scenarios.
| Scenario | Model | Venlafaxine over Fluoxetine | Venlafaxine over Placebo | Fluoxetine over Placebo |
| --- | --- | --- | --- | --- |
| Scenario 1 | Linear | 1.7% | <0.1% | 7.2% |
| Scenario 1 | Product | 1.7% | 1.6% | 37.0% |
| Scenario 1 | Multi-linear | 1.7% | <0.1% | 9.1% |
| Scenario 1 | SLoS | 1.8% | 3.7% | 47.3% |
| Scenario 2 | Linear | 48.0% | 64.7% | 66.9% |
| Scenario 2 | Product | 42.6% | 74.9% | 80.4% |
| Scenario 2 | Multi-linear | 46.3% | 63.0% | 66.3% |
| Scenario 2 | SLoS | 36.6% | 72.5% | 81.4% |
| Scenario 3 | Linear | 0.6% | 0% | 2.1% |
| Scenario 3 | Product | 0.5% | 0.1% | 18.5% |
| Scenario 3 | Multi-linear | 0.6% | 0% | 3.0% |
| Scenario 3 | SLoS | 0.6% | 0.6% | 30.1% |
Under the first scenario, with equal weights for all criteria, the treatment with
preferable risk criteria values was more likely to be recommended, as the three risk
criteria together have a greater weight than the one benefit criterion. For the
comparison between Venlafaxine and Fluoxetine, the probability that Venlafaxine has
better benefit–risk characteristics is around 1.7%–1.8% under all four models. For
the comparison between Venlafaxine and the placebo, there is only a minor difference
in the probability that Venlafaxine has better benefit–risk characteristics
(<0.1% in the linear and multi-linear models, 1.6% in the product model and 3.7%
in the SLoS model), not enough of a difference to change the recommendation.
However, when comparing Fluoxetine to the placebo, a notable difference is observed.
Under the linear and multi-linear models, the probability of Fluoxetine having the
better benefit–risk characteristics is 7.2% and 9.1%, respectively (suggesting the
placebo is much more preferable), whilst this rises to 37.0% under the product model
and 47.3% under the SLoS model (suggesting near-parity of treatments). This occurs
due to the penalisation of low benefit criterion values for the placebo, where the
95% credible interval includes values close to zero (in bold in Table 2). These low
values are harshly penalised under the product and SLoS models, as they suggest that
the placebo induces no treatment benefit with a non-negligible probability. The
linear model does not account for this and strongly favours the placebo, while the
multi-linear model penalises these values only weakly.
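The penalisation effect described above can be illustrated with a small two-criteria sketch. The aggregation formulas used here are assumptions, since equations 2 and 3 are not reproduced in this section: a weighted sum for the linear model, a weighted product for the product model, and a sum of inverse PVFs raised to the weights for the SLoS loss (lower is better). The two PVF profiles and the equal weights are hypothetical, and for simplicity the same weights are used in every model rather than the mapped weights of Table 3.

```python
import numpy as np


def linear_score(u, w):
    # ASSUMED form: weighted sum of partial values (higher is better).
    return float(np.sum(w * u))


def product_score(u, w):
    # ASSUMED form: weighted product of partial values (higher is better).
    return float(np.prod(u ** w))


def slos_loss(u, w):
    # ASSUMED form: sum of (1 / u_j) ** w_j (lower is better).
    return float(np.sum((1.0 / u) ** w))


w = np.array([0.5, 0.5])             # hypothetical equal weights
low_benefit = np.array([0.05, 0.95])  # PVFs: almost no benefit, very safe
balanced = np.array([0.50, 0.50])     # PVFs: moderate benefit and risk

for name, u in [("low benefit", low_benefit), ("balanced", balanced)]:
    print(name, linear_score(u, w), product_score(u, w), slos_loss(u, w))
```

With these inputs the linear score cannot separate the two profiles, whereas the product score and the SLoS loss both rank the profile with almost no benefit as worse, which mirrors the behaviour seen for the placebo in the case study.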
Table 2.
Mean (95% credible interval) of the Beta posterior distributions of
benefit and risk parameters and of corresponding PVFs for Venlafaxine,
Fluoxetine and Placebo (with values in bold corresponding to
those leading to significant differences between models).
| Criterion | Quantity | Venlafaxine | Fluoxetine | Placebo |
| --- | --- | --- | --- | --- |
| Treatment response | ξi,1 | 0.52 (0.42, 0.62) | 0.45 (0.35, 0.55) | 0.37 (0.28, 0.46) |
| Treatment response | u1(ξi,1) | 0.53 (0.37, 0.70) | 0.42 (0.26, 0.58) | **0.28 (0.13, 0.44)** |
| Nausea | ξi,2 | 0.40 (0.31, 0.50) | 0.22 (0.14, 0.30) | 0.08 (0.04, 0.14) |
| Nausea | u2(ξi,2) | **0.20 (0.00, 0.39)** | 0.57 (0.40, 0.72) | 0.84 (0.72, 0.93) |
| Insomnia | ξi,3 | 0.22 (0.15, 0.31) | 0.15 (0.09, 0.22) | 0.14 (0.08, 0.21) |
| Insomnia | u3(ξi,3) | 0.56 (0.39, 0.71) | 0.71 (0.56, 0.83) | 0.73 (0.58, 0.84) |
| Anxiety | ξi,4 | 0.10 (0.05, 0.17) | 0.07 (0.03, 0.13) | 0.01 (0.00, 0.04) |
| Anxiety | u4(ξi,4) | 0.80 (0.67, 0.90) | 0.86 (0.75, 0.94) | 0.98 (0.93, 1.00) |
Under the second scenario, the treatment response is considered the most important
factor, and is given a weight greater than that of the three risk criteria
combined. For the comparison between Venlafaxine and Fluoxetine, both the product
and SLoS models indicate that Venlafaxine has inferior benefit–risk characteristics
(42.6% and 36.6% probability of being better, respectively). Results closer to
parity are observed with the linear model, which gives a probability of 48.0%, and
the multi-linear model, which gives a probability of 46.3%. Again, the difference
between the probability under the linear model and those under the product and SLoS
models is due to the penalising effects of the latter. This occurs because the 95%
credible interval of the nausea PVF for Venlafaxine contains zero (in bold in
Table 2), which causes the product and SLoS models to recommend Fluoxetine more
often than Venlafaxine, despite the weights giving preference to the treatment
response (which is greater with Venlafaxine). With the multi-linear model, the
penalisation of the undesirable nausea criterion is not as strong as in the product
or SLoS models, as the weight mapping induces a drop from 0.11 to 0.06 in the weight
given to the corresponding individual term, and the effect of the interaction terms
is not enough to overcome this.

For the comparison between Venlafaxine and the placebo, the probability that
Venlafaxine has better benefit–risk characteristics is between 63% and 75%
across the four models. The product and SLoS models both penalise
the low benefit value of the placebo, which is why they are both more likely to
recommend Venlafaxine than the other two models. Additionally, the product and SLoS
models both penalise the nausea criterion value of Venlafaxine; because the SLoS
weight mapping gives this criterion an increased weight, the product model is more
likely to recommend Venlafaxine than the SLoS model.

For the comparison between Fluoxetine and the placebo, the probability that
Fluoxetine has better benefit–risk characteristics is around 66%–81% across the four
models, with the probability of Fluoxetine being preferable increasing as the
methods increase the penalisation applied to the placebo’s lack of benefit.
The stronger penalisation occurs under the product and SLoS models, which is why
they are both more likely to recommend Fluoxetine.

Across all three comparisons, the multi-linear model is always slightly less likely
than the linear model to recommend the treatment with the greater benefit value. As
this is the scenario where the benefit criterion is considered to be the most
important, this shows that the weight splitting with the multi-linear model induces
a loss of the preferences that were given when the weights were originally set out
for the linear model, illustrating some of the problems theorised in the methods
section.

Under the third scenario, a ‘safety first’ approach is adopted, giving the risk
factors a higher weighting. The probability that Venlafaxine has better benefit–risk
characteristics is around 0.5%–0.6% when it is compared to Fluoxetine and around
0%–0.6% when it is compared to placebo, under all four models. For the comparison
between Fluoxetine and the placebo, the probability that Fluoxetine has better
benefit–risk characteristics is around 2.1%–3.0% for the linear and multi-linear
models, whilst this increases to 18.5% under the product model and 30.1% under the
SLoS model. This increase occurs for the same reason as in the same
comparison in scenario 1: the penalisation of the benefit criterion for the placebo,
whose 95% credible interval includes low values (in bold in Table 2). The linear
model does not account for this and strongly favours the placebo, while the
multi-linear model does not penalise these values sufficiently and still favours
the placebo.

Overall, this case study provides a number of important observations that shed
light on the differences in the performance of the aggregation models. Firstly, the
effects of extremely undesirable outcomes (those highlighted in bold in Table 2) are
penalised more strongly and consistently in the product and SLoS models (the
penalisation is stronger in the SLoS model than in the product model, although they
give the same recommendation for every comparison). Secondly, the models provide
similar recommendations when one treatment is clearly preferable to its
competitor. Lastly, the weight splitting in the multi-linear model induces a change
in the relative importance between criteria that may not always reflect the chosen
weights as well as the other models do, as highlighted in scenario 2. This makes it
less appealing than the other models.

To draw further conclusions regarding the differences between models, we conduct a
comprehensive simulation study under various scenarios and under their many
different realisations.
Simulation study
To evaluate the performances of the four aggregation models, a comprehensive
simulation study covering a wide range of possible clinical cases is conducted.
This allows us to investigate many scenarios and their various realisations
rather than a single dataset as in the case study. The simulation study is
performed in a setting with two treatments, named
and
, that are compared in randomised clinical trials with
patients allocated to each treatment. Each treatment is
evaluated based on two criteria: one benefit (
) and one risk (
). We assume that benefit events are desirable (e.g. treatment
response), while risk events should be avoided (e.g. adverse event), with
being their true probability of occurrence for each treatment
. The PVFs are defined as
and
. The two criteria are deemed equally important and therefore
are given equal weighting criteria. We start with the case of uncorrelated
criteria and explore the effect of the presence of correlations in Section 4.4.
The range of true values of the benefit and risk criteria and the corresponding
simulation scenarios are given in Figure 3.
Figure 3.
Simulation scenarios for the trial with two criteria.
Figure 3 shows all the
different values that the benefit and risk criteria can take for both
and
, where black squares correspond to the pairs of criterion
values for
and white circles correspond to the pairs of criterion values
for
. For each of the nine fixed characteristics of
, all 81 possible values of
, with
are considered, resulting in 729 scenarios. The fixed
characteristics for
are referred to as follows:
Scenario 1: (0.5, 0.5)
Scenario 2: (0.3, 0.7)
Scenario 3: (0.7, 0.3)
Scenario 4: (0.1, 0.1)
Scenario 5: (0.9, 0.9)
Scenario 6: (0.3, 0.3)
Scenario 7: (0.7, 0.7)
Scenario 8: (0.9, 0.1)
Scenario 9: (0.1, 0.9)
where each entry is the true value of one criterion for the treatment with fixed
characteristics.
Data generation and comparison procedure
The following Bayesian procedure is used for the simulation study: The aggregation models will be compared using
, which is the probability that the model
recommends
over
, and
, which is the difference between the probability that the
model
recommends
and the probability that the model
recommends
. The value of
represents a difference between two probabilities, and can
therefore take the range of values
. If
, then the model
recommends
more often than model
. If
, then the model
recommends
more often than model
. If
, then the two models make the recommendations with the same
probability. Note that, for the ML model, we adopt
as in the case study above.

The procedure consists of the following steps.
Step 1: Simulate a randomised clinical trial with the two treatments, each with two
uncorrelated criteria and the same prespecified sample size in each treatment arm.
Step 2: Derive the posterior distributions using the simulated data
assuming a degenerate prior, Beta(0,0), to reduce the influence of
the prior distribution. Draw 2000 samples from each posterior
distribution of the criteria and obtain the corresponding empirical
distribution for the PVF.
Step 3: Use the posterior distributions of the PVF in each of the
aggregation models as given in equations 2 and 3 to compute the
probability in equations 4 and 5 that a treatment has the better benefit–risk
profile under a given model, and compare it with the threshold values. If the
probability reaches the upper threshold, that treatment is recommended. If it falls
to the lower threshold, 0.2, the other treatment is recommended. Otherwise, neither
treatment is recommended.
Step 4: Repeat steps 1–3 for 2500 simulated trials.
Step 5: Estimate the probability that each treatment is recommended
by its proportion over the 2500 simulated trials.
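Steps 1 to 5 can be sketched as below for the linear and product models only. This is an illustration under stated assumptions, not the authors' implementation: the sample size (`n=100`), the upper threshold (`0.8`, taken by symmetry with the lower value 0.2), the PVF definitions (benefit PVF equal to the benefit probability, risk PVF equal to one minus the risk probability) and the aggregation formulas are all assumptions, and the number of simulated trials is reduced to keep the example fast. Event counts are nudged away from 0 and `n` (a hypothetical guard) so that the Beta(0,0)-based posterior stays proper.

```python
import numpy as np


def recommend_probabilities(p1, p2, n=100, n_trials=200, n_post=2000,
                            threshold=0.8, seed=1):
    """Proportion of simulated trials in which treatment 1 is recommended.

    p1, p2: (benefit probability, risk probability) of treatments 1 and 2.
    """
    rng = np.random.default_rng(seed)
    w = np.array([0.5, 0.5])  # equal weights for the two criteria
    counts = {"linear": 0, "product": 0}
    for _ in range(n_trials):
        pvfs = []
        for p in (p1, p2):
            events = rng.binomial(n, p)          # Step 1: one count per criterion
            a = np.clip(events, 0.5, n - 0.5)    # hypothetical guard (see lead-in)
            xi = rng.beta(a, n - a, size=(n_post, 2))  # Step 2: posterior draws
            # ASSUMED PVFs: benefit -> xi, risk -> 1 - xi.
            pvfs.append(np.column_stack([xi[:, 0], 1.0 - xi[:, 1]]))
        u1, u2 = pvfs
        scores = {
            "linear": (u1 @ w, u2 @ w),
            "product": (np.prod(u1 ** w, axis=1), np.prod(u2 ** w, axis=1)),
        }
        for model, (s1, s2) in scores.items():
            # Step 3: posterior probability that treatment 1 scores higher.
            if np.mean(s1 > s2) >= threshold:
                counts[model] += 1
    # Steps 4-5: proportion of trials recommending treatment 1.
    return {model: c / n_trials for model, c in counts.items()}


dominant = recommend_probabilities((0.9, 0.1), (0.5, 0.5))
dominated = recommend_probabilities((0.5, 0.5), (0.9, 0.1))
print(dominant, dominated)
```

The two calls cover the clear-cut cases: a treatment with more benefit and less risk should be recommended in almost every simulated trial under both models, and almost never when the roles are reversed.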
Results
The results are presented in Figures 4 and 5. The first seven scenarios referred to above for treatment
are presented in the rows labelled 1–7. Each graph corresponds
to fixed expected probabilities of event for treatment
, and each cell corresponds to a combination of expected
probabilities of benefit and risk for
. When reference is made to the ‘diagonal’, this refers to the
diagonal line that runs from the bottom left corner of the graph to the top
right. In all scenarios, all models agree to recommend
when it is undoubtedly better than
, i.e., when
is more effective and less harmful than
(or to recommend
when
is indisputably worse, i.e., less effective and more toxic).
For this reason, the results for scenarios 8 and 9 are not presented here, but
are included in the Supplemental Material for completeness.
Figure 4.
Probability that the model recommends
over
,
), for scenarios 1 to 7 for the linear (red), product
(purple), multi-linear (orange) and SLoS (blue) models.
Figure 5.
Results of the six pairwise comparisons of the four AM, where a cell
being a colour indicates that AM recommended
more than the comparative AM (the deeper the colour,
the greater the difference in recommendation).
The probabilities
) (red),
) (purple),
) (orange) and
) (blue) are shown in Figure 4, and all six pairwise
comparisons in these probabilities are given in Figure 5. From left to right, Figure 5 shows
and
.In Figure 5, a colour of
a cell corresponds to the aggregation model of this colour to recommend
treatment
with higher probability than another method. For instance, red
cells in the first column of Figure 5 showing (
) indicate that, when
characteristics take the corresponding value, the linear model
recommends
more often than the product one.In scenario 1, the four models are in agreement to recommend
when
corresponds to less benefit and more risk. On the diagonal,
the product and SLoS models both favour
over
when
has either extremely high benefit and risk (top right corner),
or extremely low benefit and risk (left bottom corner), compared to either the
linear or multi-linear models. This occurs due to the penalisation of extremely
low benefit and extremely high risk by the product and SLoS models. Comparing
product and SLoS models for these values of benefit–risk, SLoS favours
over
more often for low but not boundary values of the criteria.
This occurs due to the SLoS model penalising the undesirable qualities more than
the product one (this is similar to trends observed in the case study). Compared
to the linear model, the multi-linear model recommends
over
with higher probability when
has either higher benefit and higher risk, or lower benefit
and lower risk due to the interaction term providing mild penalisation of
extremely high risk or extremely low benefit. However, there is (in most cases),
a greater magnitude of difference between the SLoS and product models than
between the linear and multi-linear models.

For example, when
has criteria values
(benefit),
(risk) (lower benefit, lower risk),
is recommended in 2% of the trials under the linear model, in
70% under the product, in 8% under the multi-linear and in 90% under SLoS. This
tells us that the product and SLoS models do not permit that the decrease in
risk is worth the decrease in benefit that comes with it (the SLoS model more
than the product model), whilst the linear and multi-linear models both consider
it acceptable. Considering the case when
,
(higher benefit and higher risk compared to
),
is recommended in 20% of the trials under the linear model
compared to 49% for product model, 25% for the multi-linear model and 61% for
SLoS model. This tells us that the product and SLoS models do not permit that
the increase in benefit is worth the increase in risk that comes with it (again,
this effect is stronger in the SLoS model than the product model), whilst the
linear and multi-linear models both consider it acceptable (the linear model more so
than the multi-linear model). Similar observations can be made in scenarios 2
and 3.

However, a notable difference between the models under scenario 1 can be
found when
has the criteria
,
. In this comparison,
is recommended in 0% of the trials under the linear model
compared to 11% for product model, 0% for the multi-linear model and 30% for
SLoS model. Meanwhile,
is recommended in 92% of the trials under the linear model
compared to 32% for product model, 84% for the multi-linear model and 13%
for SLoS model. This shows that the linear, product and
multi-linear models are all more likely to recommend
, whilst only the SLoS model is more likely to recommend
. This occurs due to the different strengths of penalisation
between the models, and only the SLoS model does not consider this an acceptable
trade-off. This shows that the product model and the SLoS model do not always
make the same recommendations, and that these differences can sometimes be quite
large.

In scenario 4, where
has extremely low benefit and risk, it is very rarely
recommended by either the product or SLoS models, whereas it is recommended by both
the linear and multi-linear models, in cases where
has some increase in benefit, but a higher increase in risk.
This occurs because the SLoS and product models penalise extremely low benefit
so severely that the level of risk has almost no impact on the recommendation.
The multi-linear model also penalises the extreme low benefit, but on a much
smaller scale. For example, for
with criteria values
,
,
is recommended with probability 68% under the linear model,
never recommended under the product model, 41% under the multi-linear model and
never recommended under the SLoS model. This shows that the product and SLoS
models reflect the desirable properties outlined above: that we are not
interested in the risk criterion value of a treatment if the benefit criterion
value is small/zero, whilst both the linear and multi-linear models do not
reflect this (although the multi-linear model does somewhat penalise this).
Similar results are observed in scenario 5, where
has extreme risk and extreme benefit. The SLoS and product
models will recommend
if it has lower risk than
as long as it has some benefit, whereas the linear model and
the multi-linear model will recommend
over
if the benefit of
decreases by a greater amount than the risk.

It should be noted that poor recommendations can be made under the product and
SLoS models if both
and
have a risk criterion value of 0.9, as the strength of the
penalisation of the undesirable criteria overpowers the effect of the benefit.
For example, in scenario 5 where
has criteria values
,
(same risk criterion value as
but a lower benefit criterion value),
is recommended with probability 75% under the linear model,
27% under the product model, 68% under the multi-linear model and 23% under the
SLoS model (this effect is stronger in the SLoS model than in the product model
due to its harsher penalisation of the undesirable criteria). They both did
recommend
with probabilities 13% and 17%, respectively, showing that
they still recommend the better treatment
more often than
, but that these two models hardly discriminate between very unsafe
drugs (for comparison, both the linear and multi-linear models only recommended
with probability 1% each).

In scenarios 6 and 7, all AM recommend
over
when
is unarguably worse (similarly they all recommend
over
when
is unarguably worse). Along the diagonal, the SLoS model
recommends
over
more often than the other AM when
has either extreme low benefit and extreme low risk, or
extreme high benefit and extreme high risk, compared to
(although the product model recommends
only a slightly smaller proportion of times than the SLoS
model). Again, this is the result of the penalisation of extremely low benefit
or extremely high risk criteria. Similarly, the multi-linear model recommends
over
more often than the linear model in the same circumstances.
For example, in Scenario 6, when
has criteria values
,
(lower benefit and lower risk),
is recommended with probability 21% under the linear model,
59% under the product model, 28% under the multi-linear model and 68% under the
SLoS model. This shows how the different levels of penalisation affect the
recommendations, where the stronger the penalisation of the undesirable low
benefit criterion value, the more likely an AM is to recommend
, and is the reason why there is such a large difference
between the recommendations of the linear and SLoS models.

Overall, the simulation study has shown that, for the two criteria having an
equal relative importance, SLoS penalises extremely low benefit and extremely
high risk criteria the most, whilst the product model penalises these
moderately, acting as a sort of middle ground between the linear and SLoS
models. The multi-linear model offers a small amount of penalisation (less than
the product model), but due to the added complexity of this model when more
criteria are added, it should not be recommended over either the SLoS model or
the product model. The linear and multi-linear models both recommend treatments
with no benefit/high risk over other viable alternatives, which contradicts
conditions set out by Saint-Hilary et al.
Therefore we can provisionally conclude that the two models that appeal
most at this point are the product and SLoS models.
Sensitivity analysis: Correlated criteria
The results above concerned the case with the two criteria being uncorrelated.
However, it might be reasonable to assume that the criteria for one treatment
might be correlated. In this section, we study how robust the recommendations of
each of the four models are to the correlation between the benefit and risk
criteria. We consider two cases of the correlation: a strong positive
correlation (
) and a strong negative correlation (
) between the criteria. The correlated outcomes were generated
using a procedure laid out in Mozgunov et al.

We study how likely the correlated outcomes are to change the final
recommendation of one of the treatments. Specifically, we study the proportion
of cases under each of the scenarios in which the difference in the probability
of recommending treatment
,
, changes by more than 2.5% and by 5%. Table 5 shows the number of cases (out
of 81) under each of nine scenarios, in which the differences in the
probabilities to recommend
over
changes by at least 2.5% and 5% comparing the positively
correlated and uncorrelated criteria. The case investigating the effects of
negative correlation shows similar results to those presented here, and is
included in the Supplemental Material. For example, the first entry in Table 5 shows that in
37% of cases under scenario 1, the probability to recommend
changes by at least 2.5% if the linear model is used.
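The procedure of Mozgunov et al. used to generate the correlated outcomes is not reproduced in this section. As a stand-in, the sketch below uses a different and commonly used technique, a Gaussian copula: correlated standard normal variables are thresholded so that each binary outcome keeps its marginal event probability. The function name, the latent correlation value `0.8` and the sample size are hypothetical; note that the latent correlation is not equal to the correlation between the resulting binary outcomes.

```python
import numpy as np
from statistics import NormalDist


def correlated_binary_outcomes(p_benefit, p_risk, latent_rho, n, rng):
    """Draw n pairs of binary (benefit, risk) outcomes via a Gaussian copula."""
    cov = [[1.0, latent_rho], [latent_rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    # An outcome occurs when its latent normal falls below the p-quantile,
    # which preserves the marginal probabilities p_benefit and p_risk.
    cut = np.array([NormalDist().inv_cdf(p_benefit), NormalDist().inv_cdf(p_risk)])
    return (z < cut).astype(int)


rng = np.random.default_rng(7)
positive = correlated_binary_outcomes(0.5, 0.5, 0.8, 20_000, rng)
negative = correlated_binary_outcomes(0.5, 0.5, -0.8, 20_000, rng)

corr_positive = float(np.corrcoef(positive[:, 0], positive[:, 1])[0, 1])
corr_negative = float(np.corrcoef(negative[:, 0], negative[:, 1])[0, 1])
print(positive.mean(axis=0), corr_positive, corr_negative)
```

The simulated counts of benefit and risk events per arm could then replace the independent binomial draws in the simulation procedure while leaving the marginal event probabilities unchanged.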
Table 5.
Number of cases (out of 81 per scenario, with percentages) in which the difference
in the recommendation probabilities changes by at least 2.5% or 5% between the
positively correlated criteria and the non-correlated criteria.
| Scenario | Threshold | Linear model | Product model | Multi-linear model | SLoS model |
| --- | --- | --- | --- | --- | --- |
| Scenario 1 | ≥2.5% | 30/81 (37.0%) | 24/81 (29.6%) | 29/81 (35.8%) | 22/81 (27.2%) |
| Scenario 1 | ≥5% | 22/81 (27.2%) | 15/81 (18.5%) | 21/81 (25.9%) | 12/81 (14.8%) |
| Scenario 2 | ≥2.5% | 10/81 (12.3%) | 15/81 (18.5%) | 15/81 (18.5%) | 12/81 (14.8%) |
| Scenario 2 | ≥5% | 5/81 (6.2%) | 3/81 (3.7%) | 5/81 (6.2%) | 3/81 (3.7%) |
| Scenario 3 | ≥2.5% | 16/81 (19.8%) | 13/81 (16.0%) | 17/81 (21.0%) | 14/81 (17.3%) |
| Scenario 3 | ≥5% | 6/81 (7.4%) | 5/81 (6.2%) | 4/81 (4.9%) | 4/81 (4.9%) |
| Scenario 4 | ≥2.5% | 22/81 (27.2%) | 3/81 (3.7%) | 22/81 (27.2%) | 0/81 (0%) |
| Scenario 4 | ≥5% | 17/81 (21.0%) | 0/81 (0%) | 15/81 (18.5%) | 0/81 (0%) |
| Scenario 5 | ≥2.5% | 23/81 (28.4%) | 4/81 (4.9%) | 22/81 (27.2%) | 0/81 (0%) |
| Scenario 5 | ≥5% | 17/81 (21.0%) | 0/81 (0%) | 15/81 (18.5%) | 0/81 (0%) |
| Scenario 6 | ≥2.5% | 29/81 (35.8%) | 21/81 (25.9%) | 30/81 (37.0%) | 17/81 (21.0%) |
| Scenario 6 | ≥5% | 20/81 (24.7%) | 12/81 (14.8%) | 16/81 (19.8%) | 4/81 (4.9%) |
| Scenario 7 | ≥2.5% | 25/81 (30.9%) | 19/81 (23.5%) | 24/81 (29.6%) | 13/81 (16.0%) |
| Scenario 7 | ≥5% | 14/81 (17.3%) | 9/81 (11.1%) | 15/81 (18.5%) | 2/81 (2.5%) |
| Scenario 8 | ≥2.5% | 0/81 (0%) | 0/81 (0%) | 0/81 (0%) | 0/81 (0%) |
| Scenario 8 | ≥5% | 0/81 (0%) | 0/81 (0%) | 0/81 (0%) | 0/81 (0%) |
| Scenario 9 | ≥2.5% | 1/81 (1.2%) | 1/81 (1.2%) | 1/81 (1.2%) | 1/81 (1.2%) |
| Scenario 9 | ≥5% | 0/81 (0%) | 0/81 (0%) | 0/81 (0%) | 0/81 (0%) |
| Total | ≥2.5% | 156/729 (21.4%) | 100/729 (13.7%) | 160/729 (21.9%) | 79/729 (10.8%) |
| Total | ≥5% | 101/729 (13.9%) | 44/729 (6.0%) | 91/729 (12.5%) | 25/729 (3.4%) |
Table 5 shows that
all four models are the most affected by correlation under scenario 1 with the
characteristics of
being in the middle of the unit interval. This effect is,
however, less prominent for the product and SLoS models. At the same time, under
scenarios 2 to 7, the correlation has a larger effect on the linear and
multi-linear models than on the other two models. Scenarios 8 and 9 are hardly
affected by any correlation, and the effect is similar across all four
models.

Overall, the SLoS model is the least affected by correlation between the
criteria, the product model is the second least affected whereas the
multi-linear (for the threshold 2.5%) and the linear model (for the threshold
5%) are the most affected ones.
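As a quick arithmetic check on this ranking, the per-scenario counts of Table 5 (scenarios 1 to 9, in order) can be summed per model. The snippet only re-adds the published counts; the variable names are illustrative.

```python
# Per-scenario counts from Table 5, scenarios 1 to 9 in order.
counts_2_5 = {
    "Linear": [30, 10, 16, 22, 23, 29, 25, 0, 1],
    "Product": [24, 15, 13, 3, 4, 21, 19, 0, 1],
    "Multi-linear": [29, 15, 17, 22, 22, 30, 24, 0, 1],
    "SLoS": [22, 12, 14, 0, 0, 17, 13, 0, 1],
}
counts_5 = {
    "Linear": [22, 5, 6, 17, 17, 20, 14, 0, 0],
    "Product": [15, 3, 5, 0, 0, 12, 9, 0, 0],
    "Multi-linear": [21, 5, 4, 15, 15, 16, 15, 0, 0],
    "SLoS": [12, 3, 4, 0, 0, 4, 2, 0, 0],
}

N_CASES = 9 * 81  # 729 scenario realisations in total

totals_2_5 = {model: sum(c) for model, c in counts_2_5.items()}
totals_5 = {model: sum(c) for model, c in counts_5.items()}

for label, totals in [(">=2.5%", totals_2_5), (">=5%", totals_5)]:
    for model, total in totals.items():
        print(f"{label} {model}: {total}/{N_CASES} ({100 * total / N_CASES:.1f}%)")
```

Summed this way, the counts at the 5% threshold are 101 (linear), 44 (product), 91 (multi-linear) and 25 (SLoS) out of 729, which agrees with the ordering stated in the text: SLoS least affected, product second least, and linear most affected at this threshold.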
Discussion
In this article, four potential AM are investigated for use in benefit–risk analyses:
the linear model, the product model, the multi-linear model and the SLoS model. The
differences between these models were highlighted in a case study and a simulation
study.

In most clear cases (i.e. when one treatment has more benefit and less risk than its
competitor), all AM gave similar recommendations. However, in cases where one
treatment had either no benefit or extreme risk, the models which penalised
undesirable values more (the product and SLoS models) gave more desirable
recommendations: non-effective or extremely unsafe treatments are never recommended.
Furthermore, with these models, more risk is accepted in order to increase benefit
when the amount of benefit is small than when it is high (or a larger loss of benefit
is accepted in order to reduce risk when the amount of risk is high than when it is
small), which is consistent with the well-established assumption of non-linearity of
human preferences. It should be noted that these models hardly discriminate between
two treatments that differ slightly but both have extremely undesirable properties.
However, in this case, neither treatment should be recommended anyway.

The effects of correlations between criteria were also investigated in this study.
The overall effect of correlations was small to negligible in the product and SLoS
models, showing that these AM are not much affected by correlations between the
criteria. However, the linear and multi-linear models were more likely to see a 2.5%
or 5% change in the probability of recommending one treatment over another, showing
that they are more affected by correlations between the criteria.

A simple mapping was applied to obtain multi-linear and SLoS weights from linear
weights, so that the models could be fairly compared while preserving the weight
interpretation. However, since the mapping is not far from an identity
transformation, omitting it would not have a major impact on the results, as
demonstrated in Saint-Hilary et al. 2018 for SLoS.

Overall, the two models to recommend from this investigation are the product model
and the SLoS model, depending on how severely the decision maker wishes to penalise
treatments with either no benefit or extreme risk (moderate penalisation: product
model, strong penalisation: SLoS model). The multi-linear model, whilst acting as a
middle ground between the linear model and the product and SLoS models in the
simulation study, is more complex: adding criteria adds further terms to the model,
and the weight mapping becomes more difficult. This model also struggled to truly
reflect the weightings given in the case study, especially in scenario 2. Because of
this, we do not recommend this AM over the product or SLoS models. Additionally, the
linear and multi-linear models should not be recommended, as both lack the two
desirable properties outlined in Saint-Hilary et al.: that treatments with no
benefit or extreme risk should not be recommended, and that a larger increase in
risk is accepted in order to increase the benefit if the benefit is small than if
the benefit is high. Both properties are present in the product and SLoS models.
Praveen Thokala, Nancy Devlin, Kevin Marsh, Rob Baltussen, Meindert Boysen, Zoltan Kalo, Thomas Longrenn, Filip Mussen, Stuart Peacock, John Watkins, Maarten Ijzerman. Value Health. 2016-01-08.
Henk Broekhuizen, Maarten J IJzerman, A Brett Hauber, Catharina G M Groothuis-Oudshoorn. Pharmacoeconomics. 2017-03.
Gaelle Saint-Hilary, Veronique Robert, Mauro Gasparini, Thomas Jaki, Pavel Mozgunov. Stat Methods Med Res. 2018-07-20.