Rowan Iskandar1,2. 1. Center of Excellence in Decision-Analytic Modeling and Health Economics Research, Swiss Institute for Translational and Entrepreneurial Medicine (sitem-insel), Bern, Switzerland. 2. Department of Health Services, Policy, & Practice, Brown University, Providence, Rhode Island, USA.
Abstract
Decisions about health interventions are often made using limited evidence. Mathematical models used to inform such decisions often include uncertainty analysis to account for the effect of uncertainty in the current evidence base on decision-relevant quantities. However, current uncertainty quantification methodologies, including probabilistic sensitivity analysis (PSA), require modelers to specify a precise probability distribution to represent the uncertainty of a model parameter. This study introduces a novel approach for representing and propagating parameter uncertainty, probability bounds analysis (PBA), where the uncertainty about the unknown probability distribution of a model parameter is expressed in terms of an interval bounded by lower and upper bounds on the unknown cumulative distribution function (p-box) and without assuming a particular form of the distribution function. We give the formulas of the p-boxes for common situations (given combinations of data on minimum, maximum, median, mean, or standard deviation), describe an approach to propagate p-boxes into a black-box mathematical model, and introduce an approach for decision-making based on the results of PBA. We demonstrate the characteristics and utility of PBA vs PSA using two case studies. In sum, this study provides modelers with practical tools to conduct parameter uncertainty quantification given the constraints of available data and with the fewest assumptions.
Decision‐analytic models (DAMs) have been used in numerous applications, from clinical decision‐making to cost‐effectiveness analysis (CEA). A DAM integrates evidence within a coherent and explicit mathematical structure that links the evidence to decision‐relevant outcomes.
There are situations where the evidence required for informing the values of the model parameters that govern the behavior of DAMs is incomplete or non‐existent, for example, in health economic modeling at the early stages of a product's life cycle, or when resources to obtain the required data are lacking.
The importance of explicitly accounting for incomplete knowledge about model parameters (parameter uncertainty) and propagating its effect through a decisional process is underscored in numerous guidance documents in health, including, but not limited to, the guidelines by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR)‐Society for Medical Decision Making (SMDM),
the Agency for Healthcare Research and Quality (AHRQ),
the Second Panel on Cost‐Effectiveness in Health and Medicine,
and beyond.
At its most basic, parameter uncertainty means that we do not know the exact value of a parameter, as several (potentially uncountably many) values may be possible, for reasons such as the amount (the size of the available samples of observations) and the quality (measurement error or accuracy of the observations) of the available information.
In many situations, the only information we have about a parameter is that it belongs to an interval bounded by a lower bound and an upper bound. In addition to knowing the interval, we may have some information about the relative plausibility of different values within the interval. In situations where we have data or previous knowledge about a parameter, we can leverage standard statistical techniques to represent uncertainty in the form of a probability distribution. However, when data and knowledge are limited, we may have only partial or no information about the probability distribution; that is, we cannot assign the relative plausibilities of different parameter values. In some cases, we only know the measures of central tendency (mean or median) from published articles, while, in more extreme cases, only the minimum and maximum values are known to the researchers. To handle such data‐sparsity situations, it is necessary to have an approach for quantifying parameter uncertainty that uses the fewest assumptions and does not require assuming precise probability distributions.

Despite the emphasis on its importance, the ISPOR‐SMDM best‐practice
recommends only two analytical tools for evaluating the effect of incomplete knowledge of model parameters on decision‐relevant outcomes, despite the wealth of available methods in the engineering literature.
First, the best practice prescribes a set of default probability distributions that are chosen mainly by consideration of the parameter's support. For example, a beta distribution is used for characterizing the uncertainty of a parameter with support [0, 1]. As a result, modelers tend to rely on “off‐the‐shelf” probability distributions to portray uncertainty “realistically over the theoretical range of the parameter.”
The use of “default distributions” is, in fact, a matter of convenience because there is no sure way to verify that our choice of the distribution and its parameters is indeed valid. Furthermore, forcing modelers to commit to a particular distribution implicitly assumes both that they have more information (eg, knowledge of the distribution's shape) than they actually possess and that the uncertainty is known and quantifiable by a probability distribution. Second, the best practice proposes the use of expert knowledge elicitation
if no prior data are available. However, the proposed approach also hinges on a rather unverifiable assumption: the precise form of the probability distribution. This gap in methodological guidance reflects the absence of an approach for representing and propagating parameter uncertainty in situations where it is impossible to assume a precise probability distribution.

An ideal approach to parameter uncertainty characterization is one that requires minimal assumptions. Specifically, in the absence of individual patient data, such an approach should require only information on statistics that are typically accessible to practitioners, such as the mean, median, quantiles, minimum, and maximum (hereinafter collectively termed minimal data). Additionally, the ideal method does not require information on or assumptions about the precise form of a probability distribution. Probability bounds analysis (PBA),
a combination of interval analysis and probability theory, is one such method; it has been applied in risk engineering and management studies and in many other fields.
which are a generalization of parametric probability boxes. Third, we describe an approach for propagating probability boxes through a mathematical model. Fourth, we introduce an approach for decision‐making using PBA results. Fifth, we demonstrate two applications of PBA: a Markov cohort model and a cost‐effectiveness analysis of a novel health technology. Lastly, we conclude with a discussion of limitations and directions for future research. Throughout this exposition, we try to strike a balance between mathematical rigor and accessibility to practitioners.
PRELIMINARIES
We begin by briefly introducing the concept of parameter uncertainty quantification and the status‐quo approach, probabilistic sensitivity analysis.
Parameter uncertainty quantification
Let f denote a mathematical model (eg, a cost‐effectiveness model
or a decision‐analytic model
) that maps a set of k
inputs (parameters) to a set of quantities of interest. We treat the model as a black box; that is, only the inputs and the corresponding outputs, obtained by “running” the model at particular parameter values, are accessible. We assume that the values of the parameters cannot be determined precisely due to a lack of knowledge or data (epistemic uncertainty). Our uncertainty about each parameter may vary according to the extent of what is known. To quantify the effect of not knowing the parameter values precisely on decision‐relevant outcomes (parameter uncertainty quantification), we proceed with the following tasks. First, we specify a mathematical framework to encode the degree of uncertainty in the model parameters (parameter uncertainty representation). Then, we prescribe an approach to propagate parameter uncertainty, given the representation from the previous step, through our health economic model (parameter uncertainty propagation). Lastly, we set an approach to interpret the resulting uncertainty in the model outcomes for use in further analyses.
Probabilistic sensitivity analysis
If we adopt the standard approach for parameter uncertainty quantification, that is, probabilistic sensitivity analysis (PSA),
we proceed with the following steps. For parameter uncertainty representation, we treat each parameter as a random variable endowed with a CDF: a monotonically non‐decreasing function from a sample space (eg, the real number line) to [0, 1] that approaches zero at negative infinity and one at positive infinity. In situations where the data available to inform the estimation of the parameters are limited or non‐existent, practitioners tend to select a type of CDF whose support matches the model parameter's support (eg, a gamma distribution for non‐negative parameters). Hence, this common practice implicitly assumes that the form of the CDF can be precisely specified. The location and ancillary parameters of the chosen distributions are typically estimated using a moment‐matching approach. After a CDF has been assigned to each uncertain parameter, the uncertainty propagation follows an iterative Monte Carlo sampling approach. In each iteration, parameter values are sampled independently from the precisely specified CDFs, and the model is evaluated at these values to generate model outcomes. After a prespecified number of samples, an empirical CDF of the model outcome is obtained. Given the empirical distribution of an outcome, we can calculate its expected value and use it as an input to other analytical tasks (eg, a decision rule or value of information analysis).
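The PSA workflow above can be sketched with a toy two‐parameter cost model; the model, its parameters, and the distribution choices here are illustrative assumptions, not taken from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(p_sick, cost_sick):
    # Toy black-box model: expected discounted cost of illness over 10 cycles.
    return sum(p_sick * cost_sick * 0.97**t for t in range(10))

# 1) Representation: precise CDFs chosen to match the parameter supports,
#    with moments matched to assumed summary data (means 0.20 and 1000).
n = 5000
p_sick = rng.beta(a=20, b=80, size=n)              # probability in [0, 1]
cost_sick = rng.gamma(shape=25, scale=40, size=n)  # non-negative cost

# 2) Propagation: evaluate the model at each independently sampled draw.
outcomes = model(p_sick, cost_sick)  # vectorized over the samples

# 3) Interpretation: empirical distribution / expected value of the outcome.
expected = outcomes.mean()
```

The empirical distribution of `outcomes` plays the role of the outcome CDF described above; its mean feeds into downstream decision rules.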
PARAMETER UNCERTAINTY REPRESENTATION
This section introduces the parameter uncertainty representation step of PBA. First, we describe the concept of a probability box. Then, we introduce the formulas for a probability box given varying levels of available minimal data.
Probability box
As above, we suppose that imperfect or lacking knowledge about a parameter can be characterized by a random variable endowed with a CDF. Instead of being restrictive in the context of limited data, PBA assumes that this CDF is unknown or cannot be precisely specified and introduces the concept of a probability box, or p‐box. A probability box of a continuous random variable with an unknown CDF is an interval of distribution functions consisting of all CDFs, including the unknown one, that are: 1) bounded by a pair of bounding functions, that is, a lower‐bounding function (LBF) and an upper‐bounding function (UBF), and 2) consistent with the minimal data (the set of available data or information on the statistics of the unknown CDF).
The UBF and LBF have the following properties: 1) the UBF and LBF are themselves CDFs; 2) $\underline{F}(x) \leq F(x) \leq \overline{F}(x)$ for every x in the support of the parameter, where $\underline{F}$ denotes the LBF, $\overline{F}$ the UBF, and F the unknown CDF; 3) the LBF and UBF form the “tightest” bounds; and 4) the LBF and UBF are consistent with the minimal data. We say that a CDF is consistent with the minimal data if each element of the data can be equated to a statistic that is derivable from the CDF. Under the PBA framework, the epistemic uncertainty is expressed as follows: for every possible realization x of the parameter, we can only assign an interval of CDF values, $[\underline{F}(x), \overline{F}(x)]$, in contrast to a single CDF value. As we accumulate more data on the parameter, the epistemic uncertainty is reduced, and the interval will eventually shrink to a single CDF.
P‐box formulas for different minimal data
We consider common situations of data availability in which a modeler can identify and specify a combination of summary statistics and/or other information about a parameter that constitutes a particular minimal data set. We show the derivation of one formula (Equation 7) as an exemplar in Appendix A.0.1. We also show one proof that a p‐box provides the tightest bounds on the unknown CDF, among all other pairs of bounding functions, given the minimal data (Appendix A.0.2).

The first situation involves knowing the smallest (minimum) and largest (maximum) values of a parameter. For some parameters, one can infer the range from theoretical limits, such as zero to one for probability or utility parameters. In other cases, a modeler may ask domain experts to specify a range based on their knowledge of the quantity in question. In both cases, the task is to set a minimum a and a maximum b such that the value of the parameter lies in the interval [a, b]. The p‐box is given by:
$$\underline{F}(x) = \begin{cases} 0, & x < b \\ 1, & x \geq b \end{cases}$$

for the LBF, and,
$$\overline{F}(x) = \begin{cases} 0, & x < a \\ 1, & x \geq a \end{cases}$$

for the UBF.

If, in addition to knowing [a, b], the median m of the parameter is also known, then the p‐box will be tighter than the p‐box based on [a, b] alone. The p‐box is given by:
$$\underline{F}(x) = \begin{cases} 0, & x < m \\ 1/2, & m \leq x < b \\ 1, & x \geq b \end{cases}$$

for the LBF, and,
$$\overline{F}(x) = \begin{cases} 0, & x < a \\ 1/2, & a \leq x < m \\ 1, & x \geq m \end{cases}$$

for the UBF.

If, in addition to knowing [a, b], the mean $\mu$ is also known, then the p‐box is given by:
$$\underline{F}(x) = \begin{cases} 0, & x < \mu \\ \dfrac{x - \mu}{x - a}, & \mu \leq x < b \\ 1, & x \geq b \end{cases}$$

for the LBF, and,
$$\overline{F}(x) = \begin{cases} 0, & x < a \\ \dfrac{b - \mu}{b - x}, & a \leq x < \mu \\ 1, & x \geq \mu \end{cases}$$

for the UBF.

If, in addition to [a, b] and $\mu$, we have data on the SD $\sigma$, then the p‐box is given by:
$$\underline{F}(x) = \begin{cases} 0, & x < \mu \\ \max\!\left(\dfrac{x - \mu}{x - a},\; \dfrac{(x - \mu)^2}{(x - \mu)^2 + \sigma^2}\right), & \mu \leq x < b \\ 1, & x \geq b \end{cases}$$

for the LBF, and,
$$\overline{F}(x) = \begin{cases} 0, & x < a \\ \min\!\left(\dfrac{b - \mu}{b - x},\; \dfrac{\sigma^2}{\sigma^2 + (\mu - x)^2}\right), & a \leq x < \mu \\ 1, & x \geq \mu \end{cases}$$

for the UBF, where the second term in each case follows from the one‐sided Chebyshev (Cantelli) inequality. In principle, as we obtain additional summary statistics or more information about the unknown CDF, the p‐box becomes tighter (Figure 1). Explicit formulas for other minimal data sets have a complex form (see Appendix A.0.3 for an example) and are generally difficult to derive.
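As a sketch, the three simpler p‐boxes above can be implemented directly; the function names and the notation (a, b, m, mu) are ours, and the formulas are the standard PBA constructions for these minimal data combinations:

```python
import numpy as np

def pbox_minmax(x, a, b):
    """P-box given only the minimum a and maximum b."""
    ubf = np.where(x >= a, 1.0, 0.0)  # all mass may sit at a
    lbf = np.where(x >= b, 1.0, 0.0)  # all mass may sit at b
    return lbf, ubf

def pbox_minmax_median(x, a, b, m):
    """P-box given minimum a, maximum b, and median m."""
    ubf = np.select([x < a, x < m], [0.0, 0.5], default=1.0)
    lbf = np.select([x < m, x < b], [0.0, 0.5], default=1.0)
    return lbf, ubf

def pbox_minmax_mean(x, a, b, mu):
    """P-box given minimum a, maximum b, and mean mu (Markov-type bounds)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        ubf = np.clip((b - mu) / (b - x), 0.0, 1.0)
        lbf = np.clip((x - mu) / (x - a), 0.0, 1.0)
    ubf = np.where(x < a, 0.0, np.where(x >= b, 1.0, ubf))
    lbf = np.where(x >= b, 1.0, np.where(x < a, 0.0, lbf))
    return lbf, ubf
```

Each function returns the LBF and UBF evaluated on a grid of parameter values, which is the form used for plotting or intersecting p‐boxes.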
In general, we can derive further cases by intersecting the p‐boxes of the different minimal data sets described above (termed primitive p‐boxes), “tracing” the minimum (or maximum) of the intersection of the corresponding UBFs (or LBFs). More formally, for each combination of available data indexed by d, the LBF and UBF are given by:

$$\underline{F}_d(x) = \max_i \underline{F}_i(x)$$

and

$$\overline{F}_d(x) = \min_i \overline{F}_i(x),$$

respectively, where i indexes the primitive p‐boxes consistent with the available data.
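The intersection rule above is a one‐liner given primitive bounds evaluated on a common grid; the helper name is ours:

```python
import numpy as np

def intersect_pboxes(pboxes):
    """Intersect primitive p-boxes given as (lbf, ubf) pairs evaluated on a
    common grid of parameter values: the combined LBF is the pointwise
    maximum of the LBFs, and the combined UBF is the pointwise minimum of
    the UBFs."""
    lbfs, ubfs = zip(*pboxes)
    return np.maximum.reduce(lbfs), np.minimum.reduce(ubfs)
```

For example, intersecting the {min, max, median} and {min, max, mean} p‐boxes yields bounds at least as tight as either primitive p‐box alone.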
FIGURE 1
P‐boxes (solid) for different minimal data sets and a normal CDF (dashed). CDF: cumulative distribution function
PARAMETER UNCERTAINTY PROPAGATION
In a modeling study, the amount of available data, and its indirectness or imprecision, typically varies across the parameters. In principle, each parameter, based on data availability and the chosen representation of its uncertainty, falls into one of the following subsets: (1) parameters with fixed values (no uncertainty), (2) parameters known up to their precise CDFs, and (3) parameters known only up to their p‐boxes. In some cases, practitioners may have access to information that is sufficient for specifying precise probability distributions for a subset of the parameters. This section presents the parameter uncertainty propagation step of PBA for the setting with p‐boxes only and for a mix of p‐boxes and precise CDFs. First, we describe the intuition behind the propagation and then proceed with an algorithm.
Propagating p‐boxes
We recall that uncertainty propagation in PSA works as follows: each parameter value is sampled from its precise CDF, typically using the inverse transform sampling method if the inverse of the CDF is explicitly known. For PBA, we use the same idea with one modification: we sample an interval of values instead of a single value. The sampling scheme loosely mimics inverse transform sampling. For each value in [0, 1] (the image of the CDF), we “sample” an interval of parameter values by using the inverses of the LBF and UBF. To mitigate the computational burden, instead of sampling the intervals for all values in [0, 1], we partition the image of the CDF into finitely many sub‐intervals. For each sub‐interval and its endpoints, we calculate the corresponding interval of parameter values using the inverse p‐box. The choice of how to evaluate the endpoints of the sub‐interval, for example, using the LBF (UBF) for the upper (lower) endpoint, determines the accuracy of the approximation due to discretization. We assign a probability to the interval based on the length of the sub‐interval. We then repeat the process for each parameter. Since there are multiple possible realizations (equal to the number of sub‐intervals) for each parameter, we need to consider all possible combinations of sub‐intervals across all parameters, for example, using a Cartesian product. The probability of each possible combination (henceforth termed a hyperrectangle) is computed by multiplying the probabilities assigned to the sub‐intervals comprising the combination, because of our assumption of independence among parameters. After specifying an approach for sampling hyperrectangles from p‐boxes, we need to prescribe a method to evaluate our model using the sampled intervals. One approach is based on optimization, where the goal is to find a pair of optima, that is, the minimum and maximum values of the model outcome, for each possible hyperrectangle.
The probability of observing a pair of optimum values is equal to the sampling probability of the corresponding hyperrectangle. To obtain the p‐box of the model outcome, we cumulate the probabilities of the minima (maxima) to derive the UBF (LBF).

Formally, to propagate the uncertainty of the p‐box‐characterized parameters into the model, we proceed with the following steps. First, for each such parameter, we derive its p‐box given its minimal data. Next, we specify the approach for generating interval‐valued samples. To build intuition about how the sampling works, we use a full‐factorial‐design approach, that is, the slicing algorithm with outer discretization.
For each parameter, the interval [0, 1] is partitioned into sub‐intervals, each with a corresponding probability mass, such that the masses sum to one. For the jth sub‐interval, we identify its lower and upper boundary points. Given each pair of boundary points, we calculate the corresponding boundary points in the parameter domain by using the inverse or quasi‐inverse of the p‐box. These inverse or quasi‐inverse functions are derived from their corresponding LBF and UBF (Appendix A.0.4); the quasi‐inverses are needed because some LBFs and UBFs are not strictly injective functions. Equation 11 corresponds to a particular choice of discretization, that is, an outer discretization approach (Figure 2). The intervals and their associated probability masses collectively form the discretized p‐box of the parameter.
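A minimal sketch of the outer‐discretization step, assuming the quasi‐inverses are supplied as functions. Here they are written for the {min = 0, max = 1, mean = 0.25} p‐box; the closed‐form quasi‐inverses are our own algebraic rearrangements of those bounds:

```python
import numpy as np

def discretize_pbox(lbf_inv, ubf_inv, n_slices):
    """Outer discretization: partition [0, 1] into n_slices equal slices.

    For the j-th slice [p_lo, p_hi], the enclosing parameter interval is
    [UBF^-1(p_lo), LBF^-1(p_hi)], which covers every CDF inside the p-box;
    each interval carries probability mass p_hi - p_lo.
    """
    edges = np.linspace(0.0, 1.0, n_slices + 1)
    intervals = [(ubf_inv(lo), lbf_inv(hi))
                 for lo, hi in zip(edges[:-1], edges[1:])]
    masses = np.diff(edges)
    return intervals, masses

# Quasi-inverses for the {min = 0, max = 1, mean = 0.25} p-box.
a, b, mu = 0.0, 1.0, 0.25
ubf_inv = lambda p: b - (b - mu) / p if p > (b - mu) / (b - a) else a
lbf_inv = lambda p: min(b, mu / (1.0 - p)) if p < 1.0 else b

intervals, masses = discretize_pbox(lbf_inv, ubf_inv, 4)
```

Each `(lower, upper)` pair in `intervals` is one interval‐valued "sample" of the parameter, carrying mass 1/4.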
FIGURE 2
Outer discretization approach for approximating the p‐box of a model parameter. Sub‐figure A shows the sampling of an interval using the quasi‐inverse of the p‐box, given a particular sub‐interval in [0, 1]. Sub‐figures B and C show the accuracy of the approximation for two different numbers of sub‐intervals.
We denote by K the set of multi‐indices in which each multi‐index corresponds to a particular combination of sub‐intervals across all p‐box parameters:
where n is the number of p‐box parameters and $j_i$ indexes the sub‐intervals for parameter i. For each multi‐index $k = (j_1, \ldots, j_n)$, we denote by $R_k$ a hyperrectangle (ie, a Cartesian product of intervals), which is given by:

$$R_k = [\underline{\theta}_{1, j_1}, \overline{\theta}_{1, j_1}] \times \cdots \times [\underline{\theta}_{n, j_n}, \overline{\theta}_{n, j_n}].$$

For each $R_k$, we calculate its probability mass as follows:
$$p_k = \prod_{i=1}^{n} p_{i, j_i},$$

where $p_{i, j_i}$ corresponds to the probability mass associated with the $j_i$th sub‐interval of parameter i. Equation 14 encodes our assumption of independence among model parameters, since the dependence structure is determined by how the probabilities in the Cartesian product are computed. In our case, we multiply the marginal probabilities to obtain the probability of each $R_k$, which is akin to assuming random set independence.

For each $R_k$, we associate two optimization problems whose solutions provide the bounds on a quantity of interest (model outcome) Y:
$$\underline{y}_k = \min_{\theta \in R_k} f(\theta) \quad \text{and} \quad \overline{y}_k = \max_{\theta \in R_k} f(\theta)$$

for each hyperrectangle $R_k$, holding any parameters outside the p‐box set at fixed values. The existence of a maximum and a minimum is guaranteed by the Weierstrass extreme value theorem
since, in decision‐analytic modeling, the model is typically continuous and each hyperrectangle is closed and bounded (compact). The p‐box of Y is therefore characterized by the collection of optima and their corresponding probability masses. The empirical p‐box of Y can be calculated as:
$$\overline{F}_Y(y) = \sum_{k=1}^{|K|} p_k \, \mathbf{1}(\underline{y}_k \leq y), \qquad \underline{F}_Y(y) = \sum_{k=1}^{|K|} p_k \, \mathbf{1}(\overline{y}_k \leq y),$$

where k indexes all elements in K, |K| denotes the number of elements in K, and $\mathbf{1}(\cdot)$ is an indicator function.
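Putting the pieces together, the propagation loop can be sketched as below, using a toy additive model so that the extrema of each hyperrectangle lie at its corners. For a general black‐box model, a numerical optimizer (the case studies use an nlopt‐based deterministic search) would replace the corner evaluation:

```python
import itertools
import numpy as np

def propagate_pboxes(discretized, model):
    """Propagate discretized p-boxes through a black-box model.

    discretized: list of (intervals, masses) pairs, one per parameter.
    Evaluates the model at every corner of each hyperrectangle; the corner
    extrema are exact for models monotone in each parameter. Returns the
    outcome's empirical LBF and UBF, cumulated from the maxima and minima.
    """
    mins, maxs, probs = [], [], []
    index_sets = [range(len(iv)) for iv, _ in discretized]
    for combo in itertools.product(*index_sets):
        rect = [discretized[i][0][j] for i, j in enumerate(combo)]
        mass = np.prod([discretized[i][1][j] for i, j in enumerate(combo)])
        corners = [model(*c) for c in itertools.product(*rect)]
        mins.append(min(corners))
        maxs.append(max(corners))
        probs.append(mass)
    probs = np.asarray(probs)
    ubf = (np.sort(mins), np.cumsum(probs[np.argsort(mins)]))
    lbf = (np.sort(maxs), np.cumsum(probs[np.argsort(maxs)]))
    return lbf, ubf

# Two parameters, each with two interval slices of mass 0.5.
pb = ([(0.0, 0.5), (0.5, 1.0)], [0.5, 0.5])
lbf, ubf = propagate_pboxes([pb, pb], lambda x, y: x + y)
```

Each returned pair is (sorted outcome values, cumulated probabilities), ie, an empirical bounding CDF of Y.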
Propagating p‐boxes and precise CDFs
To propagate uncertainty from both the precise‐CDF and p‐box parameter sets into the model, we proceed in two steps. First, since the uncertainty of each parameter in the first set can be characterized by a precise CDF, the uncertainty propagation reduces to a Monte Carlo approach,
that is, repeated sampling from the joint distribution of the precisely specified parameters (if their dependencies are known). Repeated sampling from this joint distribution generates a sequence of samples, where N is the total number of Monte Carlo samples. Second, for each sample and each hyperrectangle (Equation 13), we solve the following optimization problems:
and derive the corresponding minima and maxima using Equation 16. The p‐box of Y is then calculated by averaging over the N samples. Alternatively, if the model is relatively linear, we can fix the values of the precisely specified parameters at their mean values. This approach avoids repeated sampling and reduces the computational time.
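A simplified sketch of this two‐level scheme, assuming a single hyperrectangle for the p‐box parameter, a normal CDF for the precisely specified parameter, and a model monotone in the interval‐valued argument (all illustrative simplifications):

```python
import numpy as np

rng = np.random.default_rng(1)

def mixed_propagation(model, interval, n_mc=2000):
    """Outer loop: Monte Carlo draws from the precise CDF of theta_2.
    Inner step: optimize over the interval for theta_1 (here, endpoint
    evaluation, exact for a model monotone in theta_1). The outcome bounds
    are then averaged over the Monte Carlo samples."""
    lo, hi = interval
    theta2 = rng.normal(loc=1.0, scale=0.1, size=n_mc)
    y_lo = np.mean([min(model(lo, t), model(hi, t)) for t in theta2])
    y_hi = np.mean([max(model(lo, t), model(hi, t)) for t in theta2])
    return y_lo, y_hi

y_lo, y_hi = mixed_propagation(lambda t1, t2: t1 * t2, (0.5, 1.5))
```

With more hyperrectangles, the averaging is carried out per hyperrectangle before cumulating, as in Equation 18.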
APPLICATION OF PBA
This section describes how practitioners can utilize the results of uncertainty propagation using PBA (Equations 16 and 18). First, we introduce notation to fix ideas. Then, we describe an application in decision analysis.
Formalism of a decisional problem
A typical decision‐making problem in health domains consists of: 1) m competing interventions (eg, new drug vs usual care); 2) n decision‐relevant outcomes (eg, life expectancy or lifetime cost); 3) a mathematical model, as in Section 2.1, to evaluate the effect of an intervention on the outcomes; 4) k model parameters; 5) measures of knowledge or uncertainty about each parameter (eg, precise CDFs or p‐boxes) and their dependencies; 6) a value (or utility) function that integrates the evaluation of each intervention across all outcomes; and 7) a choice function capturing a decision rule for choosing the optimal intervention (or set of interventions). For ease of exposition and without loss of generality, we assume that the model is deterministic. Hence, the states of the world are completely determined by our knowledge about the parameters.
Decision analysis with PBA
We recall that the most commonly used decision rule, that is, expected value maximization, requires the specification of CDFs in the context of parameter uncertainty.
Under this choice function, if we can specify all the CDFs, then an intervention $A_1$ is chosen over $A_2$ if

$$E[v(A_1)] \geq E[v(A_2)].$$

Since the propagation of parameter uncertainty results in uncertainty in the outcomes, the value function is itself uncertain. We note that the calculation of the expected value over a p‐box results in an interval of expected values. The interval includes all expected values that correspond to CDFs enclosed by the p‐box. This is true because the p‐box of the outcome is guaranteed to enclose all of its CDFs (assuming that the p‐boxes of the parameters are properly specified). Therefore, the expected value for each CDF in the p‐box must lie in an interval $[\underline{E}, \overline{E}]$, where $\underline{E}$ and $\overline{E}$ are the expected values computed from the UBF and LBF, respectively. Given what we know and assume about the uncertainty of the model parameters, the expected utilities cannot be larger (smaller) than $\overline{E}$ ($\underline{E}$). Furthermore, the interval is not endowed with an uncertainty measure; that is, we cannot state the relative plausibilities of the values within the interval. Therefore, we cannot use expected value maximization for PBA. Instead, the decision rule is based on finding the optimal intervention by comparing the intervals of all interventions.

Suppose that we have two competing interventions, $A_1$ and $A_2$, with corresponding intervals $[\underline{E}_1, \overline{E}_1]$ and $[\underline{E}_2, \overline{E}_2]$, respectively. We conclude that $A_1$ is preferred to $A_2$ if

$$\underline{E}_1 \geq \underline{E}_2$$

and

$$\overline{E}_1 \geq \overline{E}_2.$$

When the intervals overlap without such dominance, a Hurwicz‐type criterion can be used, in which a weight $\alpha$ captures a decision‐maker's relative attitude towards being overly pessimistic. The choice of the decision rule is problem‐dependent and is typically driven by the type of outcomes and the decision‐maker's risk preference.
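The interval‐based decision rule can be sketched as follows; the convention of weighting the pessimistic bound by α is one common form of the Hurwicz criterion:

```python
def preferred(interval_1, interval_2):
    """Interval order: A1 is preferred to A2 if both its lower and upper
    expected-value bounds are at least as large as those of A2."""
    lo1, hi1 = interval_1
    lo2, hi2 = interval_2
    return lo1 >= lo2 and hi1 >= hi2

def hurwicz_score(interval, alpha):
    """Hurwicz criterion: alpha in [0, 1] weights the pessimistic (lower)
    bound; alpha = 1 is maximin, alpha = 0 is maximax."""
    lo, hi = interval
    return alpha * lo + (1.0 - alpha) * hi

# Non-overlapping bounds: A1 dominates A2 outright.
assert preferred((5.0, 9.0), (2.0, 4.0))

# Overlapping, non-dominating intervals: fall back to the Hurwicz score.
a1, a2 = (3.0, 8.0), (4.0, 6.0)
best = max([a1, a2], key=lambda iv: hurwicz_score(iv, alpha=0.7))
```

With α = 0.7 (fairly pessimistic), the narrower interval a2 wins because its worst case is better.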
CASE STUDIES
We conduct two case studies. The first case study uses a hypothetical Markov cohort model to examine the characteristics of PBA and demonstrate the difference between PBA and PSA. The second case study is based on a published early assessment of the cost‐effectiveness of a computer‐assisted total knee replacement in the absence of clinical trial data.
The models are coded in R
and available under a GNU GPL license and can be found at https://github.com/rowaniskandar/PBA.
Case study 1
We consider a generic four‐state stochastic (Markov) cohort model, commonly used in DAM and CEA studies, with four health states (Figure 3), one of which is an absorbing state. We assume that the probability distributions for three of the transition rates are known or that our knowledge is sufficient for their precise specification as CDFs, and we fix these parameters at their mean values. For the two remaining transition rates, we conduct two sets of comparisons. First, we compare the following scenarios: 1) the PBA scenario, where the uncertainties in the two rates are modeled using p‐boxes given their minimal data, and 2) a scenario where the uncertainties in both rates are precisely specified using gamma distributions with the same summary statistics as in the PBA scenario. For the latter scenario, the uncertainty propagation follows the PSA approach. This comparison demonstrates the effect of different degrees of conservatism, that is, precise vs imprecise CDFs, on the resulting uncertainty in the model outcome. For the second comparison, we assume that only the minimum and maximum values of the two rates are available, to illustrate how PBA treats extreme data sparsity with fewer assumptions than the common practice of using uniform distributions. For the model outcome of interest, we calculate the expected residence time in the non‐absorbing states (Figure 4). For uncertainty propagation with precise CDFs, we use the support point method for sampling from the gamma and uniform distributions. For uncertainty propagation using p‐boxes (Equation 16), we apply a deterministic search algorithm based on systematic division of the domain (Equation 13) into smaller hyperrectangles and use the implementation in the nlopt library in R.
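The expected residence time outcome for a generic four‐state cohort model can be sketched as below. The transition probability matrix values are hypothetical placeholders, not the case study's parameter values; the fourth state is absorbing:

```python
import numpy as np

def expected_residence_time(P, n_cycles):
    """Expected number of cycles spent in the non-absorbing states for a
    cohort starting in state 1, under transition probability matrix P whose
    last state is absorbing."""
    dist = np.zeros(P.shape[0])
    dist[0] = 1.0
    total = 0.0
    for _ in range(n_cycles):
        total += dist[:-1].sum()  # occupancy of non-absorbing states
        dist = dist @ P           # advance the cohort one cycle
    return total

# Hypothetical transition probabilities for states 1-4 (state 4 absorbing).
P = np.array([
    [0.90, 0.06, 0.00, 0.04],
    [0.05, 0.80, 0.10, 0.05],
    [0.00, 0.00, 0.85, 0.15],
    [0.00, 0.00, 0.00, 1.00],
])
rt = expected_residence_time(P, 600)
```

In the PBA setting, interval‐valued transition parameters would enter this function through the optimization step of Equation 16, yielding lower and upper bounds on `rt` per hyperrectangle.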
FIGURE 3
A state‐transition model diagram used in case study 1
FIGURE 4
Uncertainty around the model outcome of the 4‐state model using p‐boxes vs precise CDFs. Sub‐figure A portrays the comparison of the uncertainties in the model outcome resulting from a p‐box and a gamma distribution; each panel corresponds to a different combination of available data on the parameters. As more information becomes available, the p‐box enclosing the unknown precise CDF becomes tighter. Sub‐figure B illustrates the comparison between a p‐box and a uniform distribution and demonstrates how the p‐box is more honest in representing the uncertainty given information only on the minimum and maximum values of the model parameters. CDF: cumulative distribution function
The first comparison shows the difference between the results of propagating parameter uncertainty into a model outcome using precise CDFs (gamma distributions) vs p‐boxes. A PBA results in a p‐box enclosing the unknown CDF of the model outcome instead of a precise CDF (Figure 4A). The p‐box gives additional information: (1) the amount of uncertainty in the model outcome due to our imperfect or complete lack of knowledge about some model parameters, indicated by the area enclosed by the p‐box, and (2) the plausible values of the model outcome, indicated by the model outcome values with non‐zero probabilities. The latter suggests the minimum and maximum achievable values of the model outcome. We also note that the accuracy of the empirical LBF and UBF increases with the number of sub‐intervals for each parameter (Figure 5). The second comparison showcases the implications of how uncertainty due to a severe lack of data about parameter values is modeled (Figure 4B). Uncertainty propagation with a uniform distribution results in a model outcome CDF that gravitates towards a central tendency and, essentially, “eliminates” our ignorance.
In contrast, the result of PBA preserves our ignorance. Furthermore, we observe that the plausible values of the model outcome under uniform distributions are concentrated in the leftmost region of the support, thereby discounting the possibility of having high values. Conversely, PBA produces bounds on the model outcome while maintaining the plausibility of a wide range of values. This observation highlights the potential peril of assuming a precise form of a CDF, particularly when the model outcome represents an undesirable outcome.
FIGURE 5
The accuracies of the approximations of the p‐box of the model outcome as a function of the increasing number of sub‐intervals (indicated by the numbers in parentheses) for each parameter, given data on a, b, and . CDF: cumulative distribution function
Case study 2
We replicate a published cost‐effectiveness analysis of a computer‐assisted total knee replacement (CA‐TKR) vs a conventional TKR.
We develop a Markov model with the following states: TKR operation for a knee problem, normal health after primary TKR, TKR with minor complications, TKR with serious complications, simple revision operation for treating complications, complex revision operation for treating complications, other non‐revision treatments for complications, normal health after TKR revision, and death. The analytical period is 10 years with a monthly time‐step. For the transition probabilities that could not be estimated from available data, that is, transitions to serious complication from minor complication or other treatment, transitions to minor complication from other treatment or serious complication, and transitions between simple revision and other treatment, the authors assumed values identical to the estimated means for the same transitions from other states. We relax these assumptions and, instead, subject the six transition probabilities and the efficacy of CA‐TKR to an uncertainty analysis using PBA and PSA with data only on the mean, minimum, and maximum values. For the probabilities and the efficacy parameter, we use beta and gamma distributions, respectively. Because the study does not report the variances, we assume that the SD is a fixed fraction of the mean value. We conduct two uncertainty analyses in which we vary the minimum and maximum values (ranges) of the seven parameters of interest: the ranges reported in the study and wider ranges of values (ten times the original ranges). We fix the other parameters at their mean values (see table 2 in the original study). For the cost‐effectiveness measure, we calculate the incremental net monetary benefit (INMB) and estimate its empirical CDF (PSA) and p‐box (PBA), given a willingness‐to‐pay threshold of £30 000 per quality‐adjusted life year. The cost‐effectiveness analysis is conducted from the National Health Service's perspective with discounting applied. We use all the data and assumptions reported in the study and make reasonable assumptions whenever data are not available in the published article. For more details on the model structure, the estimation, and the underlying assumptions, we refer readers to the original study.
Using the PSA approach, in which we assume precise specifications of the CDFs and use the published ranges, the CDF of the INMB lies entirely to the right of zero in Figure 6; that is, CA‐TKR is always cost‐effective at the given willingness‐to‐pay threshold. In contrast, the PBA approach using the same ranges results in a p‐box of the INMB, a marginal part of which lies to the left of zero; that is, CA‐TKR may not be cost‐effective at the given threshold. We also observe a wider range of plausible INMB values for PBA than for PSA, which indicates more uncertainty in the cost‐effectiveness of CA‐TKR. If we consider a wider range of possible values for each of the seven parameters, the p‐box stretches to minimum and maximum plausible values of ‐£10 509 248 and £383 278, respectively. This observation indicates that the uncertainty in the INMB is sensitive to the assumed ranges of values. Therefore, the cost‐effectiveness of CA‐TKR is overestimated when we assume rather narrow ranges for model parameters for which we lack reliable data.
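The cost‐effectiveness measure used here, the INMB, is straightforward to compute: INMB = λ·ΔQALY − ΔCost at willingness‐to‐pay λ. A minimal sketch; the increment values below are made up for illustration and are not taken from the study:

```python
def inmb(delta_qaly, delta_cost, wtp=30_000):
    """Incremental net monetary benefit at willingness-to-pay `wtp` (GBP/QALY).
    Positive INMB means the new intervention is cost-effective at `wtp`."""
    return wtp * delta_qaly - delta_cost

# Illustrative (made-up) increments for a new vs standard intervention:
print(inmb(delta_qaly=0.05, delta_cost=200))  # 30000*0.05 - 200 = 1300.0
```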
FIGURE 6
Uncertainties around the incremental net monetary benefit of computer‐assisted vs conventional total knee replacement surgeries using (1) precise CDFs, (2) p‐boxes with published minimum and maximum values, and (3) p‐boxes with extreme minimum and maximum values. The dashed vertical line lies at zero INMB. The ranges of plausible values are shown for the precise CDF, the p‐box (published ranges), and the p‐box (extreme ranges), respectively. CDF: cumulative distribution function
DISCUSSION
This study introduces probability bounds analysis, a distribution‐free method for quantifying the effect of parameter uncertainty on decision‐relevant outcomes. This article is the first to examine the utility of PBA in DAM and CEA studies. Although our contribution focuses mainly on the fields of medical decision‐making and economic evaluation, the methodology applies to many studies that use mathematical models to inform policy decisions.
To assist practitioners, we provide p‐box formulas for the most common situations of data availability. We show an approach for propagating p‐boxes into a black‐box model where the uncertainty of the model parameters is characterized by a combination of p‐boxes and precise distribution functions. We conduct two case studies to demonstrate the methodological characteristics and practical application of PBA.
Advantages of PBA
The novel approach allows practitioners to conduct probabilistic assessments even when extremely little reliable empirical information is available about the distributions of model parameters. In PBA, parameter uncertainties are characterized by p‐boxes that provide the maximum area of uncertainty (tightest bounds) containing the unknown distribution function, given knowledge about the summary statistics of the parameters. For basic binary operations (addition, subtraction, multiplication, and division), the derived p‐box of a model outcome is optimal in the following sense: one cannot find tighter bounds without excluding some of the plausible CDFs. However, p‐box computations using basic operations cannot be easily extended to black‐box models.
Nevertheless, the uncertainty propagation of p‐boxes into a black‐box model using optimization (Equation 15) generates bounds that are guaranteed to enclose all possible CDFs of the model outcome, provided that the parameter p‐boxes enclose their respective distributions, though without assurance that the bounds are optimal.
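The basic binary operations mentioned above can be illustrated with plain interval arithmetic, the degenerate case of p‐box arithmetic in which each quantity is known only to lie in a range. A minimal sketch (function names are ours; full p‐box arithmetic additionally tracks probability mass):

```python
def interval_add(x, y):
    """[a, b] + [c, d] = [a + c, b + d]."""
    return (x[0] + y[0], x[1] + y[1])

def interval_mul(x, y):
    """[a, b] * [c, d]: take min/max over all endpoint products,
    which correctly handles intervals containing negative values."""
    prods = [x[i] * y[j] for i in (0, 1) for j in (0, 1)]
    return (min(prods), max(prods))

print(interval_add((1, 2), (3, 5)))   # (4, 7)
print(interval_mul((-1, 2), (3, 5)))  # (-5, 10)
```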
PBA is based on two existing approaches. First, a parameter value can be bounded within an interval without specifying the relative plausibility of values over that interval (interval analysis). Second, parameter uncertainty can be represented by a probability distribution (probability theory). Taken together, PBA models the uncertainty using a CDF, but the CDF is not precisely specified; it is only assumed to lie within an interval containing all possible CDFs. A PBA gives an identical answer to an interval analysis whenever the range is the only accessible information. If the lower and upper bounds of a CDF coincide for every element in the support, the p‐box degenerates to a precise CDF, the situation in which Monte Carlo simulation is the standard approach. Therefore, a PBA generalizes the two standard approaches for representing parameter uncertainty and improves on both in situations where neither approach suffices by itself.
One piece of decision‐relevant information from the results of a PBA, as demonstrated in our case studies, is the bounds on the plausible values of a model outcome. This information is particularly useful when the model outcome represents a negative outcome (or a catastrophic event).
The p‐box of a model outcome suggests that the outcome will not be smaller (or larger) than a minimum (or maximum), which can be identified as the infimum (supremum) of the support of the UBF (LBF). The standard approach in DAM and CEA studies, that is, the (over‐)reliance on "off‐the‐shelf" probability distributions for characterizing uncertainties about model parameters, may lead to an underestimation of the probability of observing extreme values of the model outcome. Using precise probability distributions may also assume more information about the uncertainty than is supported by the current evidence base. Such errors in estimating probabilities under insufficient data or a complete lack of knowledge may contribute to overconfidence and to a failure to insure ourselves against highly consequential risks.
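When only a range is known, the corresponding p‐box is vacuous: its bounds admit every CDF supported on that interval, which is exactly the interval-analysis answer described above. A minimal sketch (the function name is ours, not from the article):

```python
def pbox_minmax(a, b):
    """Vacuous p-box for a quantity known only to lie in [a, b].
    Returns (lower_cdf, upper_cdf) such that lower(x) <= F(x) <= upper(x)
    for every CDF F supported on [a, b]."""
    def lower(x):
        # LBF: the least probability mass that must lie at or below x
        return 0.0 if x < b else 1.0
    def upper(x):
        # UBF: the greatest probability mass that could lie at or below x
        return 0.0 if x < a else 1.0
    return lower, upper

lo, up = pbox_minmax(0.1, 0.4)
print(lo(0.2), up(0.2))  # 0.0 1.0 -> inside the range, any probability is plausible
```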
Our first case study also highlights the consequences of using a uniform distribution, the most common approach for modeling ignorance about a parameter. Although a uniform distribution may be justifiable as an embodiment of the principle of indifference, this "all are equally likely" assumption significantly discounts the possibility of the extremes. PBA, by contrast, can, loosely speaking, transfer our ignorance about parameter values into ignorance about the model outcome. The second case study represents a real‐world setting in which we lack the data to inform some of the key parameters, including the efficacy of the novel technology and the probabilities of adverse events. The authors of the original study did not adequately represent the uncertainties in these parameters, prescribing narrow ranges instead. In our re‐analysis, the PBA approach yields a wider p‐box of the INMB (more uncertainty) when wider ranges of values are assumed. Moreover, the PBA does not require any assumptions about the SDs (cf. Equations 5 and 6). Although we are not able to exactly replicate the published results owing to missing information on the variances, our qualitative result remains valid: regardless of the variances, the conclusion on whether CA‐TKR is cost‐effective is sensitive to the assumed ranges of values.
Computational costs
PBA is computationally intensive for several reasons. First, an implementation of PBA requires an optimization step over the p‐box. In this study, we use a full factorial design that transforms the problem of propagating a p‐box into the propagation of a large number of intervals; the higher the required level of accuracy, the more sub‐intervals are needed. Furthermore, an increase in the number of p‐box parameters leads to a higher‐dimensional optimization problem. Second, the computational burden is exacerbated if the black‐box model is expensive to evaluate. Third, if, in addition to p‐box parameters, some parameters are characterized by their CDFs, the optimization step is embedded in a Monte Carlo sampling loop (Section 4.2), thereby increasing the number of optimizations by a factor of N (the total number of Monte Carlo samples). To mitigate the computational burden, users of PBA may opt for less conservative p‐box propagation approaches, more efficient optimization methods, and fast‐to‐evaluate approximations of the original model (meta‐models). Nevertheless, we should expect a higher computational burden: a PBA imposes fewer restrictions (ie, no assumed functional form), leading to a larger region of uncertainty over which the model must be evaluated.
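The full factorial scheme described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions (independent parameters; each p‐box summarized by lower/upper quantile bounds; crude grid search standing in for the inner min/max optimization), not the article's exact implementation:

```python
import itertools
import numpy as np

def propagate_pboxes(model, quantile_bounds, n_sub=5, n_grid=5):
    """Propagate p-boxes through a black-box `model` via a full factorial design.
    Each parameter's p-box is given as (q_lo, q_hi), where q_lo(p) <= q_hi(p)
    bound the p-quantile. Each parameter axis is cut into n_sub slices of
    probability mass 1/n_sub; every cross-product of slices is a hyper-rectangle
    over which the model's min and max are found (here by crude grid search)."""
    p = np.linspace(0.0, 1.0, n_sub + 1)
    cells = [[(q_lo(p[i]), q_hi(p[i + 1])) for i in range(n_sub)]
             for q_lo, q_hi in quantile_bounds]
    lows, highs = [], []
    for combo in itertools.product(*cells):           # one hyper-rectangle
        grids = [np.linspace(a, b, n_grid) for a, b in combo]
        vals = [model(pt) for pt in itertools.product(*grids)]
        lows.append(min(vals))                        # lower focal bound
        highs.append(max(vals))                       # upper focal bound
    # Each focal pair carries mass (1/n_sub)**n_params; the empirical CDFs of
    # the sorted lows/highs bound the outcome's unknown CDF.
    return np.sort(lows), np.sort(highs)

# Example: y = x1 + x2 with each parameter known only to lie in [0, 1]
q = (lambda p: 0.0, lambda p: 1.0)  # vacuous quantile bounds for [0, 1]
lo, hi = propagate_pboxes(lambda x: x[0] + x[1], [q, q])
print(lo.min(), hi.max())           # the attainable range of the outcome
```

Replacing the grid search with a proper optimizer over each hyper-rectangle, and sampling any precisely specified parameters in an outer Monte Carlo loop, recovers the mixed scheme discussed in Section 4.2.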
Relation to other methods
PBA is generally regarded as an uncertainty quantification approach related to the theory of imprecise probability. In particular, a p‐box is closely connected to Dempster‐Shafer theory of evidence. The LBF and UBF can be interpreted as belief and plausibility measures for the event that the quantity takes values no greater than a particular value.
In Dempster‐Shafer theory, the belief function describes the minimum amount of probability that must be associated with an event, whereas the plausibility function describes the maximum amount of probability that might be associated with the same event. The PBA framework is also related to Bayesian sensitivity analysis (or robust Bayesian analysis). In this approach, an analyst's uncertainty about which prior distribution and likelihood function should be used is characterized by an entire class of prior distributions and likelihood functions, and the analysis proceeds by studying the outcomes for each possible combination of prior and likelihood. Another distribution‐free approach is the Chebyshev inequality, which can be used to compute bounds on the CDF of a random variable given its mean and SD; the inequality, however, cannot produce tighter bounds even when more data (eg, the median) are available. Kolmogorov‐Smirnov (KS) confidence limits also provide distribution‐free bounds on an empirical CDF; their calculation, however, requires access to sample data.
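The Chebyshev-type bounds mentioned above can be written down directly. The sketch below uses the one-sided (Cantelli) form of the inequality and is our illustration of the idea, not a reproduction of the article's formulas:

```python
def chebyshev_pbox(mu, sd):
    """Distribution-free CDF bounds from a mean and SD via the one-sided
    Chebyshev (Cantelli) inequality: P(X - mu >= t) <= sd^2 / (sd^2 + t^2).
    Returns (lower, upper) with lower(x) <= F(x) <= upper(x) for every
    distribution with that mean and SD."""
    def lower(x):
        if x <= mu:
            return 0.0                     # no mass is forced below the mean
        t = x - mu
        return 1.0 - sd**2 / (sd**2 + t**2)
    def upper(x):
        if x >= mu:
            return 1.0                     # all mass could lie below x
        t = mu - x
        return sd**2 / (sd**2 + t**2)
    return lower, upper

lo, up = chebyshev_pbox(mu=10.0, sd=2.0)
print(lo(14.0))  # 1 - 4/(4+16) = 0.8: at least 80% of mass lies below mu + 2*sd
print(up(6.0))   # 4/(4+16) = 0.2: at most 20% of mass lies below mu - 2*sd
```

As the text notes, these bounds cannot be tightened by further data such as the median; the p-box formulas referenced earlier in the article exploit such extra information instead.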
Limitations
Our study has several limitations. First, we assume independence among the model parameters. To the best of our knowledge, how to model dependencies among parameters in the context of uncertainty propagation using PBA and black‐box models is an open problem. One potential approach is to use a copula to represent the joint uncertainty of all parameters.
A copula approach factors the joint CDF into a product of independent marginal CDFs and a copula that captures the dependencies. In this formulation, the overall bound is a function of the bounds on the CDFs of the parameters represented by p‐boxes and the bounds on the copula. This potentially promising approach warrants further study but is beyond the scope of this article. Second, our study does not address the question of when one should use a p‐box rather than assume a particular CDF to characterize uncertainty. Instead of being prescriptive, we defer such decisions to analysts because the level of uncertainty at which a p‐box becomes the preferred approach is problem‐dependent. For example, a parameter may be highly uncertain owing to the lack of empirical data and/or prior knowledge and, at the same time, non‐influential, that is, the model outcome may not be sensitive to variations in the parameter's values. Third, we provide only a rudimentary treatment of how to make decisions using the results of a PBA. In situations where best‐case/worst‐case results are the basis for decision making, the analytical interval approach is preferable to assuming a distribution (eg, uniform) and performing a simulation, particularly when that distribution may not correctly describe the parameters. A more comprehensive treatment of decision making based on interval values or bounds on probability distributions is needed and should be a focus of future studies on uncertainty quantification in decision‐analytic modeling and cost‐effectiveness analysis.
CONCLUDING REMARKS
This study addresses limitations in current methodologies for characterizing the uncertainty in the data and knowledge used to inform mathematical models. The novel methodology maximizes the use of existing, limited information with the fewest assumptions and provides a way to honestly characterize the uncertainty in the distributions of model parameters used in decision‐analytic modeling and cost‐effectiveness analysis studies.
Rothery C, Claxton K, Palmer S, Epstein D, Tarricone R, Sculpher M. Health Econ. 2017.
Sanders GD, Neumann PJ, Basu A, et al. JAMA. 2016.
Ellis AG, Iskandar R, Schmid CH, Wong JB, Trikalinos TA. Stat Med. 2020.
den Boon S, Jit M, Brisson M, et al. BMC Med. 2019.