Amirsaman H Bajgiran (1), Mahsa Mardikoraem (2), Ehsan S Soofi (2)
(1) Department of Industrial and Manufacturing Engineering, University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA.
(2) Sheldon B. Lubar School of Business, University of Wisconsin-Milwaukee, Milwaukee, WI 53201, USA.
Abstract
Quantiles are available in various problems for developing probability distributions. In some problems quantiles are elicited from experts and used for fitting parametric models, which induce non-elicited information. In some other problems comparisons are made with a quantile of an assumed model which is noncommittal to the quantile information. The maximum entropy (ME) principle provides models that avoid these issues. However, the information theory literature has been mainly concerned about models based on moment information. This paper explores the ME models that are the minimum elaborations of the uniform and moment-based ME models by quantiles. This property provides diagnostics for the utility of elaboration in terms of the information value of each type of information over the other. The ME model with quantiles and moments is represented as the mixture of truncated distributions on consecutive intervals whose shapes and existence are determined by the moments. Elaborations of several ME distributions by quantiles are presented. The ME model based only on quantiles elicited by the fixed interval method possesses a useful property for pooling information elicited from multiple experts. The elaboration of Laplace distribution is an extension of the information theory connection with minimum risk under symmetric loss functions to the asymmetric linear loss. This extension produces a new Asymmetric Laplace distribution. Application examples compare ME priors with a parametric model fitted to elicited quantiles, illustrate measuring uncertainty and disagreement of economic forecasters based on elicited probabilities, and adjust ME models for a fundamental quantile in an inventory management problem.
Quantile information is available in various decision and statistical problems. Experiments on the accuracy of elicitation of probability distributions have shown that subjects can assess the median reasonably more accurately than the mean (Garthwaite, Kadane, & O'Hagan, 2005). The variable interval method and the fixed interval method are used for eliciting quantiles and probabilities. Central banks such as the Federal Reserve Bank of Philadelphia, the European Central Bank, and the Bank of England, as well as other institutions, survey economic forecasters to elicit subjective probabilities for a set of fixed intervals. These probabilities are used to produce economic outlook reports for policy makers. The research area of measuring the uncertainty and disagreement of economic forecasters is primarily based on using these probabilities for developing other aspects of the forecast distribution of variables such as the growth of Gross Domestic Product (GDP); Lahiri and Wang (2019) and Shoja and Soofi (2017) provide reviews and the latest developments on this topic. In operations and business decision problems, a quantile appears as the solution that minimizes the risk (expected loss) under an asymmetric linear loss function, as in the newsvendor problem, where the optimal order quantity that maximizes the profit is a quantile of the demand distribution (Snyder & Shen, 2011).
Quantiles provide partial information about a probability distribution. Traditionally, this type of partial information is used to fit parametric probability distributions. The information theory literature has primarily considered developing distributions based on various types of moments (Jaynes, 1957, 1968). The use of probability allocation has been sporadic (Brockett, Charnes, & Paick, 1995; Ebrahimi, Soofi, & Soyer, 2008); the quantile case is only noted (Asadi, Ebrahimi, Soofi, & Zarezadeh, 2014). This paper aims to fill this void.
Two information measures are used for deriving models. 
The Shannon entropy of a quantity X with continuous distribution F and probability density function (PDF) f is defined by
$$H(f) = -\int_{\mathcal{X}} f(x) \log f(x)\, dx, \qquad (1)$$
where $\mathcal{X}$ is the range of X, provided that the integral is finite. The Kullback-Leibler (KL) information divergence between f and a reference PDF g is defined by
$$K(f:g) = \int_{\mathcal{X}} f(x) \log \frac{f(x)}{g(x)}\, dx \ge 0, \qquad (2)$$
provided that f is absolutely continuous with respect to g ($f(x) = 0$ whenever $g(x) = 0$); the inequality becomes equality if and only if $f(x) = g(x)$ almost everywhere. Unlike the entropy, $K(f:g)$ is invariant under one-to-one transformations of X for the continuous case. This measure has been used in operations and decision problems (see, for example, Alwan, Ebrahimi, and Soofi (1998); Asadi, Ebrahimi, and Soofi (2018); Plischke, Borgonovo, and Smith (2013); Saghafian and Tomlin (2016)). The Minimum Discrimination Information (MDI) principle minimizes $K(f:g)$ where f is in a class of distributions Ω and g ∉ Ω. When g is uniform (proper PDF or improper), the MDI coincides with the Maximum Entropy (ME) principle (Jaynes, 1957). For the continuous case, the ME is defined as maximization of $-K(f:g)$, where g is interpreted as the "invariance measure" function (Jaynes, 1968). In the game theory problem of selecting a distribution where the utility is defined in terms of the score function, the MDI and ME models are minimax decisions (Grünwald & Dawid, 2004; Smith, 1974). Jaynes (1968) advanced assessing the information value of additional constraints by comparing the entropies of two discrete ME models obtained from the information sets of two experts. Soofi, Ebrahimi, and Habibullah (1995) extended this idea through the notion of information distinguishability, defined by the equality between $K(f:f^*)$ and the entropy difference $H(f^*) - H(f)$, where f ∈ Ω and $f^*$ is the ME model in Ω.
We consider the following three classes of partial information about f on continuous support:
$$\Omega_\theta = \{f : E_f[T_j(X)] = \theta_j,\ j = 1, \dots, J\}, \qquad (3)$$
$$\Omega_\alpha = \{f : E_f[1(X \le q_k)] = \alpha_k,\ k = 1, \dots, K\}, \qquad (4)$$
$$\Omega_{\alpha,\theta} = \Omega_\theta \cap \Omega_\alpha, \qquad (5)$$
where $T_j(X)$ is integrable with respect to f, $1(A)$ is the indicator function of the event A, and $q_1 < \dots < q_K$ are quantiles; (3) and (4) will be referred to as moment information and quantile information (QI), and $\Omega_{\alpha,\theta}$ is the set of PDFs that include both types of information.
Fig. 1 gives a schematic representation of the modeling approach of this paper. Distributions in (3) and (4) are embedded in the larger family (5). Such larger families are called elaborations of the smaller models (Box & Tiao, 1973), indexed by the elaboration parameter (Carota, Parmigiani, & Nicholas, 1966). In Fig. 1, $f_{\alpha,\theta} \in \Omega_{\alpha,\theta}$ is an elaboration of $f_\theta$ and $f_\alpha$, indexed by α and θ, respectively. Carota et al. (1966) proposed a Bayesian KL diagnostic for measuring the utility of elaboration. Thus far, elaborations have been chosen arbitrarily for model evaluation. Including additional moments in ME and MDI models is well known, but has not been characterized as being the minimum elaboration. This research develops minimum elaborations of ME models by α and θ according to the MDI principle.
Fig. 1
Maximum entropy minimum elaboration modeling; f* and f are ME and MDI models.
This research develops ME minimal elaborations of ME models and diagnostics (highlighted parts of Fig. 1). The ME models $f^*_\alpha$ and $f^*_{\alpha,\theta}$ are elaborations of the uniform distribution and $f^*_\theta$ by the QI, respectively. We show that the MDI updates of $f^*_\theta$ and $f^*_\alpha$ relative to each other are equivalent to $f^*_{\alpha,\theta}$, which characterizes $f^*_{\alpha,\theta}$ as the minimum elaboration of the smaller ME models. These MDI adjustments of $f^*_\theta$ and $f^*_\alpha$ for satisfying the constraints in $\Omega_{\alpha,\theta}$ provide diagnostics for assessing their information values and the utility of the minimum elaboration. A result represents $f^*_{\alpha,\theta}$ as the mixture of truncated PDFs on supports formed by consecutive quantiles, whose shapes and existence are determined by the type of moments. This representation facilitates computation of the moments and entropy of $f^*_{\alpha,\theta}$ and simulations from this model. Elaborations of the classical ME models (uniform, exponential, and normal) by QI are given. A result explores a useful property of $f^*_\alpha$ for aggregating quantiles elicited from multiple experts by the fixed interval method. The elaboration of the Laplace distribution extends the notion of minimum risk ME models under symmetric loss (Ardakani et al., 2017) to an asymmetric loss and produces a new Asymmetric Laplace (AL) distribution. The elaboration of a well-known AL is also given.
Applications are illustrated for a wide range of problems with a common theme: ME models with QI purely carry the given information into the results because of being "maximally noncommittal with regard to missing information" (Jaynes, 1957). Three application areas illustrate cases where non-elicited information is included or existing information is excluded. The ME prior, $f^*_\alpha$, is piecewise uniform and proper on a bounded range, which is most directly in accord with Laplace's principle of "insufficient reason". The range can be set wide, as is done when fitting parametric models such as gamma priors to quantiles elicited for a finite range. However, unlike the parametric families, $f^*_\alpha$ does not induce non-elicited information. Model fitting is also used for measuring the uncertainty and disagreement of forecasters where probabilities are elicited by the fixed interval method. The result for the ME model based on QI with fixed intervals is particularly suitable for this problem. This is illustrated for the Federal Reserve Bank of Philadelphia's Survey of Professional Forecasters. The case of excluding existing information is seen in the newsvendor (NV) problem of inventory management. The ME principle has been invoked for justifying the classical ME distributions as models for the demand distribution (Andersson, Jörnsten, Nonås, Sandal, & Ubøe, 2013; Perakis & Roels, 2008), where a profit-maximizing QI that appears in the analysis is not accounted for. We illustrate updating of these classical ME models in light of this information. The new minimum risk ME model and the elaboration of the known AL model are particularly useful for the NV problem.
The paper is organized as follows. Section 2 gives preliminaries and ME models with moment information. Section 3 presents the minimum elaboration of ME models by the QI. Section 4 illustrates potential applications of QI for developing Bayesian priors, measuring uncertainty and disagreement of economic forecasters, and reliability and inventory problems. Section 5 gives some concluding remarks. Computational details are available in a Supplementary Document.
Preliminaries
The ME model in $\Omega_\theta$, if it exists, is unique and has a PDF of the following form:
$$f^*_\theta(x) = C(\eta) \exp\Big\{-\sum_{j=1}^{J} \eta_j T_j(x)\Big\}, \qquad (6)$$
where $\eta = (\eta_1, \dots, \eta_J)$ are Lagrange multipliers and
$$C^{-1}(\eta) = \int_{\mathcal{X}} \exp\Big\{-\sum_{j=1}^{J} \eta_j T_j(x)\Big\}\, dx < \infty \qquad (7)$$
is the normalizing factor for $f^*_\theta$; see, for example, Soofi et al. (1995). The existence of the ME model is determined by the finiteness condition (7).
Well-known families of probability distributions are ME models with various types of moment information. On the finite range an ME distribution always exists. The best known examples are the uniform distribution on [a, b] with the range information and the beta distribution on [0,1] with two geometric means, $E(\log X)$ and $E[\log(1 - X)]$; hereafter the range information will be omitted. On the nonnegative range, well-known examples of ME models are the exponential distribution with $E(X)$, the gamma distribution with $E(X)$ and $E(\log X)$, and the log-normal distribution with $E(\log X)$ and $E[(\log X)^2]$. On the unrestricted range, the ME model with $T_j(x) = x^j$, $j = 1, \dots, J$, does not exist when J is an odd number. Examples of the ME models with even J are the normal with $J = 2$ and the quartic exponential with $J = 4$ (Zellner & Highfield, 1988).
If $f^*$ is the ME model in Ω, then for any f ∈ Ω with finite entropy,
$$K(f:f^*) = H(f^*) - H(f), \qquad (8)$$
provided that f is absolutely continuous with respect to $f^*$ (Soofi et al., 1995). Due to the uniqueness of $f^*$, f is distinguishable from $f^*$ ∈ Ω if and only if it reduces the maximum entropy.
The information distinguishability (ID) index of f is defined by the normalized KL measure
$$ID(f:f^*) = 1 - \exp\{-K(f:f^*)\}.$$
When f is also an ME model in a subset of Ω with more constraints, (8) provides the information values of the additional constraints. Asadi et al. (2014) have used this diagnostic for assessing the information value of additional moments in an ME problem. McCulloch (1989) proposed a calibration of the KL divergence based on the KL divergence between the Bernoulli distributions for the outcome of a biased coin with π > 0.5 and a fair coin with π = 0.5. This measure is represented by
$$\pi(K) = \frac{1}{2}\Big[1 + \big(1 - e^{-2K}\big)^{1/2}\Big].$$
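The two diagnostics above can be tried numerically. The following is a minimal sketch, assuming the ID index has the form 1 − exp(−K) and that the coin calibration is obtained by solving K = K(Bernoulli(.5) : Bernoulli(π)) for π ≥ .5; the function names are hypothetical and are not taken from the paper.

```python
import math

def id_index(k):
    """Normalized KL measure, assumed here to be 1 - exp(-K); lies in [0, 1)."""
    return 1.0 - math.exp(-k)

def kl_fair_vs_biased(pi):
    """K(Bernoulli(0.5) : Bernoulli(pi)) = -0.5 * log(4 * pi * (1 - pi))."""
    return -0.5 * math.log(4.0 * pi * (1.0 - pi))

def coin_calibration(k):
    """Bias pi >= 0.5 of the coin whose divergence from a fair coin equals k."""
    # max(...) guards against a tiny negative argument caused by rounding
    return 0.5 * (1.0 + math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * k))))

k = kl_fair_vs_biased(0.8)   # divergence between a fair coin and an 80/20 coin
print(round(k, 4), round(id_index(k), 4), round(coin_calibration(k), 4))
```

The calibration inverts the divergence, so a KL value can be read as "as informative as observing a coin with bias π", which is how the π columns of the application tables are interpreted.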
Symmetric minimum risk ME models
The Laplace and normal distributions are ME models consistent with $E|X - \mu|^r$ for $r = 1, 2$, respectively. Ardakani et al. (2017) discussed the ME problem consistent with the minimum decision-theoretic risk of predicting X under a general symmetric loss function on the unrestricted range. In statistical decision theory, the consequence of a decision d about an unknown quantity X is a loss function L(X, d) ≥ 0, and the risk function of a decision rule is defined by $R(d) = E_f[L(X, d)]$ (Berger, 1985). The optimal decision in a set of possible decisions $\mathcal{D}$ is defined by
$$d^* = \arg\min_{d \in \mathcal{D}} E_f[L(X, d)].$$
The ME problem consistent with the minimum risk of the decision d is as follows. Let
$$\Omega_L = \{f : E_f[L(X, d^*)] = \theta_L\},$$
where $\theta_L$ is the minimum risk of the optimal decision. The ME model in $\Omega_L$, if it exists, is given by
$$f^*_L(x) = C(\eta) \exp\{-\eta L(x, d^*)\}.$$
The optimal decision under the quadratic loss is the mean, and the risk is the variance, so the minimum risk ME problem includes the first two moments. On the unrestricted range, the minimum risk ME model is normal, but the solutions on the finite and nonnegative ranges are not so straightforward; these cases will be discussed in Section 2.2. The optimal decision under the absolute loss (symmetric linear loss function $L(X, d) = |X - d|$) is the median, so the minimum risk ME problem includes a moment and a quantile; this case will be extended to the asymmetric linear loss function in Section 3.3. (Fleishhacker & Folk (2015) studied the ME model with an expected loss constraint for a probability vector, that is, a discrete distribution on N points, where the loss is specified by a nonnegative N × N matrix.)
First two moments on restricted range
ME modeling with the first two moments, $E(X) = \theta_1$ and $E(X^2) = \theta_2$, on a restricted range (finite or nonnegative) is intricate. By (6), the PDF of the ME model, if it exists, is of the following form:
$$f^*_\theta(x) = C(\eta_1, \eta_2) \exp\{-\eta_1 x - \eta_2 x^2\}, \qquad (12)$$
where the Lagrange multipliers $\eta_1$ and $\eta_2$ are determined by the moment constraints $E(X) = \theta_1$ and $E(X^2) = \theta_2$. However, unlike the unrestricted range, the existence and shape of the ME model depend on the range and the relationship between the moments. Dowson and Wragg (1973) give a rigorous treatment of the problem and provide results that are summarized in our notation as follows.
On the finite range the ME model always exists. The left panel of Fig. 2 shows various shapes of the ME PDFs with the first two moments on [0,1], where $A_1$, $A_2$, and $A_3$ are the regions in the $(\theta_1, \theta_2)$-plane shown in the right panel of Fig. 2. For the PDF on the range [0,1], $(\theta_1, \theta_2)$ are bounded in the region defined by $\theta_1^2 \le \theta_2 \le \theta_1$. On the lower boundary, $\theta_2 = \theta_1^2$, the ME model is degenerate, and on the upper boundary, $\theta_2 = \theta_1$, the ME model is a two-point distribution with $P(X = 1) = \theta_1$. On the curve $A_2$, the Lagrange multiplier for the second moment is zero, implying that the ME model is the truncated exponential (TE) distribution with decreasing PDF when $\theta_1 < .5$, increasing PDF when $\theta_1 > .5$, or uniform for $\theta_1 = .5$. In the interior of the subregion $A_1$ the Lagrange multiplier for the second moment is positive ($\eta_2 > 0$), implying that the ME model is the truncated normal (TN) distribution. In the interior of the subregion $A_3$ the Lagrange multiplier for the second moment is negative ($\eta_2 < 0$), implying that the ME PDF is U-shaped.
Fig. 2
Shapes of the maximum entropy PDFs with first two moments on [0,1] and regions of parameters for the shapes of the distributions: A1, η2 > 0 (truncated normal); A2, η2 = 0 (η1 ≠ 0, truncated exponential; η1 = 0, uniform); A3, η2 < 0 (U-shaped).
On the nonnegative range, (12) is integrable only when $\eta_2 \ge 0$, implying that with the first two moments an ME distribution exists if and only if $\theta_2 \le 2\theta_1^2$. If $\theta_2 > 2\theta_1^2$ in the finite interval case, then at the limit as b → ∞, the ME PDF → TE (Dowson & Wragg, 1973). Thus, the shape and existence of the ME model in $\Omega_\theta$ with the first two moments on x ≥ 0 are determined by the coefficient of variation $\sigma/\theta_1$, where $\sigma^2 = \theta_2 - \theta_1^2$. (The relationships between the parameters μ and σ and the moments of TN(μ, σ) are given in the Supplementary Document by (S.1) and (S.2).)
Mixture information
Mixture models will be utilized for representing the ME model with QI and for aggregating ME models when partial information is elicited from multiple individuals. Let
$$f_c(x) = \sum_{i=1}^{n} w_i f_i(x), \quad w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1. \qquad (14)$$
The raw moments of $f_c$ can be calculated using the following decomposition:
$$E_c(X^r) = \sum_{i=1}^{n} w_i E_i(X^r), \qquad (15)$$
where $E_i(X^r)$ is the moment of $f_i$. However, the variance of $f_c$ decomposes as follows:
$$\mathrm{Var}_c(X) = \sum_{i=1}^{n} w_i \mathrm{Var}_i(X) + \sum_{i=1}^{n} w_i [E_i(X) - E_c(X)]^2. \qquad (16)$$
The information measure of a mixture PDF $f_c$ is the Jensen-Shannon (JS) divergence,
$$JS(f_c; W) = \sum_{i=1}^{n} w_i K(f_i : f_c) \qquad (17)$$
$$\phantom{JS(f_c; W)} = H(f_c) - \sum_{i=1}^{n} w_i H(f_i), \qquad (18)$$
where W stands for $(w_1, \dots, w_n)$, and K and H are the KL divergence and Shannon entropy as defined in (2) and (1). The KL divergences $K(f_i : f_c)$ are well defined because, for all i, the supports of $f_i$ are subsets of the support of $f_c$, but in general, $K(f_i : f_c)$ and $H(f_c)$ cannot be computed in closed form. A relationship between the Shannon entropy and the KL divergence implies the second equality. It is clear from (17) that the JS divergence is equal to zero if and only if all individual models are identical. The entropy difference (18) gives the uncertainty increase (information decrease) when the pooled distribution is used for representing the set of information of the individual distributions.
In the sequel, we will also encounter mixtures where the set of supports of $f_i$ is a partition of the support of $f_c$. In these cases, application of the entropy decomposition formula (Di Crescenzo & Longobardi, 2002) gives the entropy of $f_c$ as follows:
$$H(f_c) = H(W) + \sum_{i=1}^{n} w_i H(f_i), \qquad (19)$$
where $H(W)$ denotes the entropy of the discrete distribution that gives probabilities $w_i$ to the partition sets, provided that $H(f_i)$ is finite for all i.
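The equality between the weighted KL form and the entropy-difference form of the JS divergence can be checked numerically. The sketch below uses a discrete analogue (bin probabilities instead of continuous PDFs), which is an assumption made for simplicity; the expert probabilities and function names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q):
    """KL divergence K(p : q); requires q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def pool(dists, weights):
    """Arithmetic (mixture) pool of the individual distributions."""
    return [sum(w * d[k] for w, d in zip(weights, dists))
            for k in range(len(dists[0]))]

def js_two_ways(dists, weights):
    """JS divergence as a weighted KL sum and as an entropy difference."""
    pooled = pool(dists, weights)
    by_kl = sum(w * kl(d, pooled) for w, d in zip(weights, dists))
    by_entropy = entropy(pooled) - sum(w * entropy(d)
                                       for w, d in zip(weights, dists))
    return by_kl, by_entropy

experts = [[0.1, 0.6, 0.3], [0.3, 0.4, 0.3]]   # hypothetical bin probabilities
weights = [0.5, 0.5]
js_kl, js_h = js_two_ways(experts, weights)
print(js_kl, js_h)   # the two forms agree up to rounding error
```

When all experts report the same probabilities, both forms return zero, which matches the statement that the divergence vanishes only for identical individual models.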
ME minimum elaboration
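The elaboration of $f^*_\theta$ by QI is the PDF of the ME distribution in $\Omega_{\alpha,\theta}$ given by
$$f^*_{\alpha,\theta}(x) = C(\eta, \lambda) \exp\Big\{-\sum_{j=1}^{J} \eta_j T_j(x) - \sum_{k=1}^{K} \lambda_k 1(x \le q_k)\Big\}, \qquad (20)$$
where the normalizing factor $C(\eta, \lambda)$ is determined by the Lagrange multipliers $\eta_j$ and $\lambda_k$ for the moment and QI constraints in $\Omega_{\alpha,\theta}$. Because the $\eta_j$ are determined by all constraints in $\Omega_{\alpha,\theta}$, in general, they are different from the parameters in (6).
The ME model (20) generalizes (6). In the absence of the moment information, (20) gives $f^*_\alpha$ as the elaboration of the uniform distribution by QI. With the moment information, $f^*_{\alpha,\theta}$ provides the elaboration of $f^*_\theta$ by α and the elaboration of $f^*_\alpha$ by θ.
The KL divergence provides diagnostics for measuring the utility of elaboration (Carota et al., 1966). Because $\Omega_{\alpha,\theta} \subseteq \Omega_\theta$ and $\Omega_{\alpha,\theta} \subseteq \Omega_\alpha$, the ME model $f^*_\theta$ does not satisfy the QI constraints and $f^*_\alpha$ does not satisfy the moment constraints. The MDI principle provides the minimum adjustments for these models to satisfy both types of constraints. The MDI distribution in $\Omega_{\alpha,\theta}$ with reference PDF g, if it exists, is unique and has a PDF of the following form:
$$\hat f(x) = C\, g(x) \exp\Big\{-\sum_{j=1}^{J} \hat\eta_j T_j(x) - \sum_{k=1}^{K} \hat\lambda_k 1(x \le q_k)\Big\},$$
where the Lagrange multipliers $\hat\eta_j$ and $\hat\lambda_k$ are for the constraints in $\Omega_{\alpha,\theta}$, and C is the normalizing factor for $\hat f$. Letting $g = f^*_\theta$ and $g = f^*_\alpha$ gives the MDI updates of these ME models for satisfying both types of constraints. The following lemma gives the equivalence between $\hat f$ and $f^*_{\alpha,\theta}$, which characterizes the minimum elaboration of $f^*_\theta$ and $f^*_\alpha$ by α and θ, respectively.
Lemma 1. Let $f^*_\theta$ and $f^*_\alpha$ be the ME models and $\hat f$ denote the MDI model with reference to $f^*_\theta$ or $f^*_\alpha$. Then, the MDI updates of $f^*_\theta$ and $f^*_\alpha$ for satisfying $\Omega_\alpha$ and $\Omega_\theta$, respectively, are identical to $f^*_{\alpha,\theta}$ almost everywhere, where the Lagrange multipliers of $f^*_{\alpha,\theta}$ are the sums of the multipliers of the reference ME model and the corresponding MDI multipliers for all j and k.
Application of (8) to the result of Lemma 1 establishes the following two ID relationships for updating each type of information by the other type:
$$K(f^*_{\alpha,\theta} : f^*_\theta) = H(f^*_\theta) - H(f^*_{\alpha,\theta}) \ge 0, \qquad (21)$$
$$K(f^*_{\alpha,\theta} : f^*_\alpha) = H(f^*_\alpha) - H(f^*_{\alpha,\theta}) \ge 0, \qquad (22)$$
where the inequalities become equalities if and only if $f^*_{\alpha,\theta} = f^*_\theta$ and $f^*_{\alpha,\theta} = f^*_\alpha$, respectively, almost everywhere. By the inequalities in (21) and (22), the PDF $f^*_{\alpha,\theta}$ is more concentrated than $f^*_\theta$ and $f^*_\alpha$. Hence, the addition of one type of partial information to the other type leads to an information gain. The KL divergences in (21) and (22) give the minimum utility of the elaboration for $f^*_\theta$ and $f^*_\alpha$. The entropy reductions in (21) and (22) quantify the information values of each type of partial information (Jaynes, 1968).
Consider the partition of the support of f formed by consecutive quantiles in (4):
$$B_k = (q_{k-1}, q_k], \quad k = 1, \dots, K+1, \qquad (23)$$
where $q_0 = \inf \mathcal{X}$ and $q_{K+1} = \sup \mathcal{X}$. The constraints in (4) can be represented in terms of the following expectations:
$$E_f[1(X \in B_k)] = p_k, \quad k = 1, \dots, K+1, \qquad (24)$$
where $p_k = \alpha_k - \alpha_{k-1}$, with $\alpha_0 = 0$ and $\alpha_{K+1} = 1$. The following proposition gives a new representation of the ME model (20) in terms of the mixture of truncated distributions on the partition of the support (23).
Proposition 1. Let $\Omega_{\alpha,\theta}$ be the class of distributions defined in (5). Then the ME model, if it exists, is continuous and its PDF has the following $(K+1)$-piece truncated representation:
$$f^*_{\alpha,\theta}(x) = \sum_{k=1}^{K+1} p_k f_k(x), \qquad (25)$$
where
$$f_k(x) = C_k \exp\Big\{-\sum_{j=1}^{J} \eta_j T_j(x)\Big\}, \quad x \in B_k, \qquad (26)$$
$C_k$ is the normalizing factor on $B_k$, and $\eta_j$ are the Lagrange multipliers for the moment constraints in $\Omega_{\alpha,\theta}$.
The existence and shape of the ME model are determined by the existence and shapes of $f_k$. The kernels of the truncated PDFs $f_k$ are in the form of the ME model $f^*_\theta$, but the Lagrange multipliers of $f_k$ and $f^*_\theta$ are different due to the additional QI constraints. Proposition 1 is insightful and facilitates computations of the moments of $f^*_{\alpha,\theta}$ via (15). Application of the entropy decomposition (19) facilitates computation of the diagnostics (21) and (22) via $H(f_k)$ given by (S.10) of the Supplementary Document. Representation (25) facilitates simulations via mixing outcomes generated from $f_k$.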
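Simulation by mixing can be sketched as follows for the case where every piece has an exponential kernel with a common rate, as arises when the only moment constraint is the mean. This is an illustration under stated assumptions, not the paper's procedure: the bin edges, bin probabilities, and rate are hypothetical, and the rate is taken as given rather than solved from a mean constraint.

```python
import math
import random

def sample_truncated_exponential(rate, lo, hi, rng):
    """Inverse-CDF draw from the density proportional to exp(-rate * x) on [lo, hi).

    The rate must be positive when hi is infinite.
    """
    u = rng.random()
    if rate == 0.0:                      # a zero rate reduces to a uniform piece
        return lo + u * (hi - lo)
    a = math.exp(-rate * lo)
    b = 0.0 if math.isinf(hi) else math.exp(-rate * hi)
    return -math.log(a - u * (a - b)) / rate

def sample_piecewise(edges, probs, rate, n, seed=0):
    """Pick a bin with its probability, then draw from the truncated piece."""
    rng = random.Random(seed)
    bins = list(range(len(probs)))
    draws = []
    for _ in range(n):
        k = rng.choices(bins, weights=probs)[0]
        draws.append(sample_truncated_exponential(rate, edges[k], edges[k + 1], rng))
    return draws

edges = [0.0, 1.0, 2.0, math.inf]        # hypothetical quantiles q_1 = 1, q_2 = 2
probs = [0.25, 0.50, 0.25]               # hypothetical bin probabilities
draws = sample_piecewise(edges, probs, rate=1.0, n=20000, seed=1)
share_middle = sum(1 for x in draws if 1.0 <= x < 2.0) / len(draws)
print(share_middle)                      # close to the middle bin probability
```

Because the bin is chosen first, the simulated sample reproduces the bin probabilities regardless of the rate; the rate only shapes the density within each bin.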
Elaboration of uniform with QI
The ME model with only QI constraints is the following mixture of uniform PDFs:
$$f^*_\alpha(x) = \sum_{k=1}^{K+1} \frac{p_k}{|B_k|}\, 1(x \in B_k), \qquad (27)$$
where $|B_k| = q_k - q_{k-1}$ and $p_k = \alpha_k - \alpha_{k-1}$. This piecewise uniform PDF is a density histogram with unequal bins $B_k$. On an unbounded range, additional information such as moments to supplement (24) is needed for the existence of an ME model. In the absence of moment information, lower and upper limits for the unbounded bins, $X \le q_1$ and $X > q_K$, should be set (as in constructing histograms).
The entropy of (27) is given by (19) with $w_k = p_k$ and $H(f_k) = \log |B_k|$, where P is the distribution with probabilities $p_k$ of the partition sets $B_k$. This measure is sensitive to the choice of end points for unbounded bins. Its sensitivity can be easily assessed using $p_1 \log |B_1|$ and $p_{K+1} \log |B_{K+1}|$. The diagnostics (21) and (22) are invariant to the choice of endpoints.
When partial information is elicited from multiple experts, pooling the data is in order. The arithmetic pool is commonplace. For developing ME models two options may be followed: (a) develop the ME model that is consistent with the arithmetic pool of partial information; or (b) pool the set of ME models that are consistent with the partial information provided by individual respondents. In general, these options provide two different forecast models. For example, the ME model consistent with the arithmetic pools of moments is in the same family as the individual ME models, but is not the mixture of the individual ME models (Shoja & Soofi, 2017). However, the ME model (27) produces the same model for the two options when probabilities are elicited from multiple experts using the fixed interval method. In this approach, the range of X is partitioned into intervals and the expert provides the probability that X will fall in each interval.
The following proposition highlights this interesting mixture property of the ME model (27).
Proposition 2. Let $p_{ik}$, $k = 1, \dots, K+1$, denote probabilities assigned by individual i to given bins $B_1, \dots, B_{K+1}$ and $\{f^*_i, i = 1, \dots, n\}$ be the set of individual ME models (27). Then,
$$\sum_{i=1}^{n} w_i f^*_i(x) = f^*_c(x),$$
where $f^*_c$ is the ME model consistent with the average bin probabilities $\bar p_k = \sum_{i=1}^{n} w_i p_{ik}$.
In pooling applications, the subscript of $f^*_c$ stands for its designation as the consensus distribution and the weights are uniform. The divergence and information measures (17) and (18) are invariant to the choice of endpoints for the unbounded bins (Supplementary Document, Section S.4). In the context of pooling, (18) gives the information increment of the set of partial information provided by individuals over their pool.
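The piecewise uniform model and its pooling property can be illustrated with a short sketch. The bins and the two experts' probabilities below are hypothetical, and the helper names are not from the paper; the check is that the equally weighted mixture of the individual piecewise uniform densities coincides with the piecewise uniform density built from the averaged bin probabilities.

```python
import math

def piecewise_uniform_pdf(edges, probs):
    """Density equal to probs[k] / (edges[k+1] - edges[k]) on bin k, zero elsewhere."""
    heights = [p / (hi - lo) for p, lo, hi in zip(probs, edges[:-1], edges[1:])]
    def pdf(x):
        for k, h in enumerate(heights):
            if edges[k] < x <= edges[k + 1]:
                return h
        return 0.0
    return pdf

def entropy_piecewise_uniform(edges, probs):
    """H = H(P) + sum_k p_k log(width_k), written as sum_k p_k log(width_k / p_k)."""
    return sum(p * math.log((hi - lo) / p)
               for p, lo, hi in zip(probs, edges[:-1], edges[1:]) if p > 0)

edges = [0.0, 1.0, 2.0, 4.0]                      # hypothetical fixed bins
experts = [[0.1, 0.6, 0.3], [0.3, 0.4, 0.3]]      # hypothetical elicited probabilities
average = [sum(col) / len(experts) for col in zip(*experts)]

individual = [piecewise_uniform_pdf(edges, p) for p in experts]
consensus = piecewise_uniform_pdf(edges, average)

def mixture(x):
    """Equally weighted pool of the individual ME densities."""
    return sum(f(x) for f in individual) / len(individual)

for x in (0.5, 1.5, 3.0):
    print(x, mixture(x), consensus(x))            # the two columns agree
```

The property holds because all experts share the same bins, so averaging the densities bin by bin is the same as averaging the bin probabilities before dividing by the common bin width.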
Elaboration of exponential and normal with QI
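The main moment information sets used in ME modeling are the mean and the first two moments. The following corollary gives the ME model with the QI and mean constraints.
Corollary 1. Let $\Omega_{\alpha,\theta}$ be the class of distributions on support 0 ≤ x < b ≤ ∞, with $E(X) = \theta_1$. Then the ME model is the $(K+1)$-piece truncated exponential with the following PDF:
$$f^*_{\alpha,\theta}(x) = \sum_{k=1}^{K+1} p_k \frac{\lambda e^{-\lambda x}}{e^{-\lambda q_{k-1}} - e^{-\lambda q_k}}\, 1(x \in B_k),$$
where $p_k = \alpha_k - \alpha_{k-1}$ and λ is given by the solution of the mean constraint $\sum_{k=1}^{K+1} p_k E_k(X) = \theta_1$, with $E_k(X)$ denoting the mean of the k-th truncated piece.
When the QI constraint is included in addition to the first two moments, the ME model is given by Proposition 1, where the shapes and existence of the truncated distributions in (26) are given according to the results of Dowson and Wragg (1973), summarized in Section 2.2. Let $\Lambda(A_i)$, $i = 1, 2, 3$, denote the images of the regions $A_i$, where Λ: [0, 1] × [0, 1] → [a, b] × [a, b] and the regions $A_i$ are shown in Fig. 2. This is an affine transformation which affects the location and scale of a PDF, but does not affect its shape. The following corollary presents the shape and existence of the ME model with QI in addition to the first two moments for the finite, nonnegative, and unrestricted ranges.
Corollary 2. Let $f_k(x)$ be as defined in (26) and denote their first two moments by $(\theta_{1k}, \theta_{2k})$. Then the shape and existence of the ME model $f^*_{\alpha,\theta}$ are determined by $(\theta_{1k}, \theta_{2k})$ as follows.
(a) On the finite range a ≤ x ≤ b, $f^*_{\alpha,\theta}$ always exists and the shape of $f_k$ on $B_k$ follows the regions of Fig. 2: TN(μ, σ) for $(\theta_{1k}, \theta_{2k}) \in \Lambda(A_1)$, TE($\theta_1$) or uniform for $(\theta_{1k}, \theta_{2k}) \in \Lambda(A_2)$, and U-shaped for $(\theta_{1k}, \theta_{2k}) \in \Lambda(A_3)$, where TN(μ, σ) and TE($\theta_1$) are truncated normal and exponential with parent distributions $N(\mu, \sigma^2)$ and Exp($\theta_1$) on the unrestricted and nonnegative range, respectively.
(b) On the nonnegative range, the shapes of $f_k$ on $B_k$, $k \le K$, are as in (a), and its shape and existence on $x > q_K$ follow the results for the nonnegative range summarized in Section 2.2.
(c) On the unrestricted range, the shapes of $f_k$ on $B_k$, $2 \le k \le K$, are as in (a), and the shapes and existence on $x \le q_1$ and $x > q_K$ are as in (b).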
Elaboration of Laplace models with QI
The ME characterization of the Laplace distribution extends to the family of ME models with expected minimum risk under the following asymmetric linear loss function:
$$L(\xi, x) = \begin{cases} c_1 (\xi - x), & x \le \xi, \\ c_2 (x - \xi), & x > \xi, \end{cases} \qquad (33)$$
where $c_1$ and $c_2$ are factors for losses of overestimation and underestimation, respectively. The loss function (33) is used in various problems such as quantile regression estimation and inventory problems. The optimal decision that minimizes E[L(ξ, X)] is given by
$$F(\xi^*) = \frac{c_2}{c_1 + c_2}. \qquad (34)$$
The solution of (34) is the quantile q of the distribution of X corresponding to $\alpha = c_2/(c_1 + c_2)$, and the risk, $\theta = E[L(q, X)]$, is the mean cost-weighted absolute deviation from the quantile q.
The class of distributions that satisfies the minimum risk under loss function (33) is
$$\Omega_{\alpha,\theta} = \{f : E_f[L(q, X)] = \theta,\ F(q) = \alpha\}. \qquad (35)$$
Proposition 3. The ME model in $\Omega_{\alpha,\theta}$, defined in (35), on the unrestricted range $-\infty < x < \infty$ is an Asymmetric Laplace (AL) distribution with the following absolutely continuous PDF:
$$f^*(x) = \frac{c_1 c_2}{\theta (c_1 + c_2)} \exp\{-L(q, x)/\theta\}, \qquad (36)$$
where the Lagrange multipliers for the risk and QI constraints are $\eta = 1/\theta$ and $\lambda = 0$, respectively.
Because $\lambda = 0$, the second constraint in (35) is redundant. That is, (36) is the ME model in the following class of distributions:
$$\Omega_\theta = \{f : E_f[L(q, X)] = \theta\}. \qquad (37)$$
The entropy of (36) is given by
$$H(f^*) = 1 + \log \frac{\theta (c_1 + c_2)}{c_1 c_2}.$$
The entropy expression maps the intuition that the uncertainty increases with the risk θ and confirms that (36) is also the ME model in (37). The mean and variance of (36) are
$$E(X) = q + \frac{\theta (c_1 - c_2)}{c_1 c_2}, \qquad \mathrm{Var}(X) = \frac{\theta^2 (c_1^2 + c_2^2)}{c_1^2 c_2^2}.$$
The mean exceeds q whenever $c_2 < c_1$. The variance provides the same conclusion as the entropy (computations of the entropy and moments are shown in the Supplementary Document).
Proposition 3 establishes the information theoretic link between an AL distribution and the asymmetric linear loss function, paralleling the links for the normal and Laplace distributions with the symmetric quadratic and linear loss functions, respectively. The ME model (36) is the PDF of a new AL distribution. It generalizes the symmetric Laplace distribution with $c_1 = c_2$.
Fig. 3
depicts the PDFs and CDFs of three examples of the ME model given by Proposition 3 with and chosen for the purpose of illustration, but different under and over estimation costs: (solid blue), (dashed blue), and (dotted red); the expression for the CDF is given in the Supplementary Document.
Fig. 3
Plots of the PDFs and CDFs of three ME models given by Proposition 3 (). (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
The following corollary gives the minimum risk ME model for the nonnegative range.
The ME model in
(35)
on nonnegative range is the truncated continuous AL distribution with the following piecewise PDF:
where the Lagrange multipliers for the risk and QI constraints are given, respectively, byDue to the truncation of the PDF in Proposition 3 for the nonnegative range the normalizing factor for the first branch has been adjusted which makes the PDF (38) discontinuous at . Also the second term in the expression for risk and the negative expression for ζ have been adjusted. The model for the finite range can be adjusted similarly.For
(36) gives an AL distribution used by Poiraud-Casanova and Thomas-Agnan (2000) to prove a property of a quantile regression estimator. For and the PDF (36) gives the known AL distribution (41). Kotz, Kozubowski, and Podgórski (2001) have extensively studied this model, including an ME characterization for a version with in terms of the following constraints: . We give the elaboration of this AL distribution by QI as follows.We first give a new ME characterization of this well-known AL model. Consider the class of PDFs with the following equivalent representations:
where
and .The ME model in
(40)
is an AL distribution with the following absolutely continuous PDF
where the Lagrange multipliers for the constraints in
(40)
are given by
.The constraints in can be written as . Asadi et al. (2014) referred to these as “local moments” and showed that the ME model that satisfies these constraints on the nonnegative range is piecewise exponential. The truncated exponential on 0 ≤ x ≤ τ can be increasing, decreasing, or uniform (see Fig. 2), but on the unrestricted range must be increasing on 0 ≤ x ≤ τ which gives an AL distribution.Letting and in (41) gives a well-known AL distribution, where σ is the scale parameter and is the asymmetry parameter (Kotz et al., 2001). Bera, Galvao, Mobte-Rojas, & Park (2016) related this formulation to the quantile regression estimation. In general, (41) does not satisfy a given QI, since
Lemma 2 provides an ME formulation suitable for the elaboration of (41) by the QI constraint in the following class of distributions:The first two constraints in (43) can be represented by the risk constraint in (35) where and one of the local moments, for example, .The following corollary gives the ME model in this class.The ME model in
is a continuous AL distribution with the following piecewise PDF:
where the Lagrange multipliers for the risk and QI constraints in
(43)
are given byCorollary 4 gives the ME model for unrestricted range. The ME model in (43) for nonnegative demand can be obtained similarly.
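The link between the asymmetric linear loss and a quantile, which underlies this section, can be checked on data. The sketch below is an illustration only: the argument names `c_over` and `c_under` and the demand values are hypothetical, and the empirical risk is minimized by brute force over the observed values rather than by any method from the paper.

```python
import math

def asymmetric_linear_loss(d, x, c_over, c_under):
    """Loss of decision d when the outcome is x, with separate unit costs."""
    return c_over * (d - x) if d >= x else c_under * (x - d)

def empirical_risk(d, sample, c_over, c_under):
    """Average loss of decision d over the sample."""
    return sum(asymmetric_linear_loss(d, x, c_over, c_under)
               for x in sample) / len(sample)

def empirical_quantile(sample, level):
    """Smallest sample value whose empirical CDF is at least `level`."""
    ordered = sorted(sample)
    return ordered[max(math.ceil(level * len(ordered)) - 1, 0)]

demand = list(range(1, 100))          # hypothetical demand observations 1, ..., 99
c_over, c_under = 1.0, 3.0            # hypothetical unit costs
level = c_under / (c_over + c_under)  # critical ratio, here 0.75

best = min(demand, key=lambda d: empirical_risk(d, demand, c_over, c_under))
print(best, empirical_quantile(demand, level))
```

With underestimation three times as costly as overestimation, the risk-minimizing decision sits at the upper quartile of the data, which is the newsvendor order quantity for these hypothetical costs.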
Application examples
ME priors
Eliciting quantiles is known to be an effective approach for developing Bayesian priors. The QI is usually used to fit a convenient parametric distribution. We illustrate ME priors with QI. A commonly used variable interval method is bisection elicitation. Faucett (Lecture Notes, http://www.mas.ncl.ac.uk/~nlf8/teaching/mas2317/notes/chapter3.pdf) uses a scenario to illustrate the bisection elicitation method for eliciting a prior distribution for the rate, β, of retreat (per foot) of the Zachariae Isstrøm glacier in Greenland. The range [0, 70] is bisected into two intervals of equal probability for eliciting the expert’s median. Then the expert is asked to further bisect each of these intervals for eliciting the lower and upper quartiles. The process resulted in a median of 24 and lower and upper quartiles of 19 and 30 (see Table 1). Faucett used these quartiles for computing the shape and scale parameters of the gamma conjugate prior for a Pareto likelihood model. Fig. 4 depicts the PDF and CDF of this gamma prior and the PDFs of the ME piecewise uniform priors (27) for the median and for all elicited information. The uniform ME reference prior is also included.
Fig. 4
ME prior consistent with the range, the median, three quartiles (QI), and the gamma prior with parameters chosen to satisfy the quartiles.
Table 1 presents the elicited information provided by the median and the three quartiles, and the additional non-elicited information induced by the gamma prior. The information value, K, is computed by (8), with the reference being the ME prior with fewer constraints. The information value of the median relative to the uniform reference corresponds to the information provided by experimental results of a biased coin with probability .66 against the fair coin, which is rather informative. The information value of the three quartiles, given by K20, is comparable with a biased coin with probability .86, which is substantial. The non-elicited information, K32, is also computed by (8); the fitted gamma prior is in Ω only in an approximate sense (its support is x > 0, but its probability over the range [0, 70] is greater than .9999 and it approximately satisfies the quartile constraints). This is comparable with the information divergence of a biased coin with probability .87 from a fair coin. The entropy reduction of the gamma prior relative to the ME QI prior (K32) is about the same as the information provided by the three quartiles (K20). That is, this fitted prior induces about as much non-elicited information as the elicited quartiles provide. This is particularly important, as it underscores the amount of extraneous information induced by the choice of this prior.
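The ID and π columns reported in Table 1 appear to be consistent with a standard calibration of a KL divergence K: ID(K) = 1 − exp(−2K), and π(K) is the probability of the biased coin whose divergence from a fair coin equals K. These formulas are not stated in this section, so the sketch below treats them as an assumption and checks them only against two tabulated values; the function names are illustrative.

```python
import math

def information_distinguishability(k):
    """ID(K) = 1 - exp(-2K): maps a KL divergence K >= 0 into [0, 1)."""
    return 1.0 - math.exp(-2.0 * k)

def biased_coin_probability(k):
    """Probability pi >= 1/2 of the biased coin such that the KL divergence
    of a fair coin relative to it equals k (assumed calibration)."""
    return 0.5 * (1.0 + math.sqrt(information_distinguishability(k)))

def kl_fair_to_biased(pi):
    """KL divergence of Bernoulli(1/2) relative to Bernoulli(pi)."""
    return -0.5 * math.log(4.0 * pi * (1.0 - pi))

# K values as reported (rounded) in Table 1.
for label, k in [("K32", 0.39), ("K30", 0.75)]:
    print(label,
          round(information_distinguishability(k), 2),
          round(biased_coin_probability(k), 2))
```

For the other rows the same functions agree with the table up to the rounding of K, since the tabulated K values are themselves rounded to two decimals.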
Table 1
Information analysis of prior distributions shown in Fig. 4.
| i | Prior | QI | H | Relative to uniform: Ki0 | ID | π | Relative to other ME: Kij | ID | π |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Uniform | | 4.25 | 0 | 0 | .50 | | | |
| 1 | Median | 24 | 4.20 | .05 | .10 | .66 | | | |
| 2 | Three quartiles (QI) | 19, 24, 30 | 3.89 | .35 | .51 | .86 | K21 = .31 | .46 | .84 |
| 3 | gamma | 19, 24, 30 | 3.50 | .75 | .78 | .94 | K32 = .39 | .54 | .87 |
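The entropies in the H column of Table 1 for the uniform, median, and three-quartile priors can be checked with a few lines of code. The sketch assumes, as the text indicates, that the ME prior (27) is piecewise uniform, spreading each interval's probability evenly over the interval; the function name and argument layout are illustrative.

```python
import math

def piecewise_uniform_entropy(cuts, probs):
    """Differential entropy of the density equal to probs[j] / width_j
    on the interval [cuts[j], cuts[j + 1]]."""
    widths = [b - a for a, b in zip(cuts[:-1], cuts[1:])]
    return sum(p * math.log(w / p) for p, w in zip(probs, widths) if p > 0)

# Range [0, 70]; median 24; quartiles 19, 24, 30 (QI column of Table 1).
h_uniform = piecewise_uniform_entropy([0, 70], [1.0])
h_median = piecewise_uniform_entropy([0, 24, 70], [0.5, 0.5])
h_quartiles = piecewise_uniform_entropy([0, 19, 24, 30, 70], [0.25] * 4)

print(round(h_uniform, 2), round(h_median, 2), round(h_quartiles, 2))
# Information values as entropy reductions relative to the uniform prior.
print(round(h_uniform - h_median, 3), round(h_uniform - h_quartiles, 3))
```

The entropy differences are the unrounded counterparts of K10 and K20; small discrepancies with the table arise because the tabulated K values are rounded.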
Suppose that one also elicits the mean in the above problem, E(X) = 25 (this is the mean of the fitted gamma prior; see Table 2). The upper row of Fig. 5 shows the PDFs and CDFs of the ME priors with the mean, the median, and both. The CDFs of these priors are continuous. The ME prior with the mean is the truncated exponential and the prior with the mean and median is the two-piece truncated exponential. Table 2 presents the information analysis of these priors (rows i = 0, ..., 3). The mean relative to the uniform reference is more informative than the median. The information value of the mean relative to the ME with the median corresponds to a biased coin with probability .71 against the fair coin, which is not negligible, and the information value of the median relative to the ME with the mean corresponds to a biased coin with probability .67.
Fig. 5
ME models consistent with the range, only QI (median, upper panels, three quartiles (QI), lower panels), only the mean, both types of constraints, and the gamma model with parameters chosen to satisfy the quartiles (lower panels).
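Table 2
Information analysis of distributions shown in Fig. 5.
| i | Prior (Fig. 5 row) | Constraints: QI | Ei(X) | H | Relative to uniform: Ki0 | ID | π | Relative to other ME: Kij | ID | π |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Uniform (both) | | | 4.25 | 0 | 0 | .50 | | | |
| 1 | Median (upper) | 24 | | 4.20 | .05 | .10 | .66 | | | |
| 2 | Mean (both) | | 25 | 4.16 | .09 | .17 | .71 | | | |
| 3 | Mean & median (upper) | 24 | 25 | 4.10 | .15 | .26 | .75 | K31 = .10 | .18 | .71 |
| | | | | | | | | K32 = .06 | .11 | .67 |
| 4 | Three quartiles (lower) | 19, 24, 30 | | 3.89 | .35 | .51 | .86 | | | |
| 5 | Mean & three quartiles (lower) | 19, 24, 30 | 25 | 3.85 | .40 | .55 | .87 | K53 = .25 | .40 | .81 |
| | | | | | | | | K54 = .05 | .10 | .66 |
| 6 | gamma (lower) | 19, 24, 30 | 25 | 3.50 | .75 | .78 | .94 | K65 = .35 | .50 | .85 |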
The lower row of Fig. 5 shows the PDFs and CDFs of the ME priors with the mean, the three quartiles, and both. Plots for the gamma prior fitted to the quartiles are also shown. Table 2 presents the information analysis of these priors. Comparison of K10 and K20 indicates that the mean is more informative than the median. The indices K31 and K32 indicate that the mean provides more information over the median than the median over the mean. Comparison of K30 and K40 indicates that the three quartiles are substantially more informative than the mean and median jointly. The indices K53 and K54 indicate that the first and third quartiles provide more information over the mean and median than the mean provides over the three quartiles. Information analysis of the fitted gamma indicates that it induces substantial non-elicited information over the elicited information, which corresponds to a biased coin with probability .85; K65 is calculated using (8).
Survey of Professional Forecasters
In the middle of each quarter, the Federal Reserve Bank of Philadelphia surveys experts to solicit subjective forecast probabilities for the GDP growth during the current year and in the following year. The Survey of Professional Forecasters (SPF) offers a set of m fixed intervals (bins) for the experts to assign probabilities. Since 1992, the SPF has offered ten bins, where the lower and upper bins are unbounded. The SPF also includes a question asking for the point forecast of the respondent. The respondents’ point forecasts cannot be simply interpreted as the means of their forecast distributions (Engelberg, Manski, & Williams, 2009).
The subjective probabilities are used to estimate the individual forecast distributions and the arithmetic pool of the individual probabilities (consensus distribution with PDF f) as defined in (14). A considerable amount of attention is given to the variance decomposition (16). Both the variance of the pooled distribution and the average of the individual variances are used as measures of uncertainty, and the variance of the means (last term in (16)) is used as a measure of disagreement. Various variance estimates are used, including the variance of the midpoints of the bins, this variance with the Sheppard correction 1/12, the variance of the normal fit to the histogram, and the variance of the beta distribution fitted to the histogram.
Information measures also have been used for measuring uncertainty and disagreement. Rich and Tracy (2010) used the discrete entropy of the probabilities, H(P) ≥ 0, but did not use any information divergence for disagreement. Shoja and Soofi (2017) proposed an information framework for measuring uncertainty and disagreement. This framework requires a set of individual forecast distributions, weights for their arithmetic pooling, and measures of uncertainty and divergence. These authors used the discrete entropy, H(P), and the ME models based on the variance of the midpoints to illustrate their proposed framework. Lahiri and Wang (2019) avoided using discrete entropy and applied the information framework of Shoja and Soofi (2017) using the entropies of beta and triangular distributions fitted to the histograms of forecasters’ subjective probabilities and their average; they estimated bounds for the unbounded intervals according to the fit. The use of fitted parametric distributions, such as the normal and beta, induces non-elicited information, and so does the use of their variance. In addition, a parametric model fitted to the pooled quantiles is not a mixture of the models fitted to the elicited quantiles, which is a pool of the models fitted for the individuals.
We use the ME model (27) to measure uncertainty, disagreement, and information of economic forecasters within the framework of Shoja and Soofi (2017). The subjective probabilities of each forecaster provide an ME model. By Proposition 2, the arithmetic pool of these ME models is the ME forecast model based on the pool of subjective probabilities of the forecasters. Then the entropy of the pooled model and (18) provide two uncertainty measures for the set of forecasters, and (17) gives the disagreement measure for the set of forecasters. Following the literature, we use equal weights.
We first illustrate the implementation of the information framework with the ME model (27) using data given by four respondents of the SPF in the first and second quarters of 2020 for the U.S. GDP growth in 2020, and then apply it to all SPF respondents (2019Q1-2020Q2) who provided probabilities for the 2020 U.S. GDP growth. Table 3 gives the probabilities assigned by the four forecasters, selected to represent typical forecasters of the first two quarters of 2020 (before and during the coronavirus pandemic). The last row gives the pooled elicited information.
Table 3
The subjective probabilities (in percentage) assigned by four SPF respondents in the first and second quarters (Q1 and Q2) of 2020 to intervals of the U.S. GDP in 2020.
Bins: ≤ 0%, 0-0.9%, 1-1.9%, 2-2.9%, 3-3.9%, 4-4.9%, 5-5.9%, 6-6.9%, 7-7.9%, ≥ 8%.
| Forecaster | Q | Nonblank entries, in bin order |
|---|---|---|
| 421 | Q1 | 2, 5, 45, 45, 3 |
| 421 | Q2 | 30, 40, 30 |
| 504 | Q1 | 1, 3, 3, 5, 27, 50, 10, 1 |
| 504 | Q2 | 10, 50, 35, 5 |
| 563 | Q1 | .28, 3.72, 24.40, 46.60, 20.92, 3.44, .52, .12 |
| 563 | Q2 | 20, 60, 20 |
| 588 | Q1 | 1, 2, 5, 18, 55, 17, 2 |
| 588 | Q2 | 18, 55, 18, 5, 3, 1 |
| Pooled | Q1 | 0.25, 1.00, 1.82, 4.68, 28.60, 49.15, 12.73, 1.61, 0.13, 0.03 |
| Pooled | Q2 | 9.50, 38.75, 32.00, 17.50, 2.00, 0.25 |
Fig. 6 shows the plots of the ME forecast distributions for these forecasters, superimposed by the pooled ME forecast model, which is derived using the pooled QI constraints according to Proposition 2. The unbounded bins are assigned . The plots in the first row are for the first quarter of 2020 and the second row shows the corresponding plots for the second quarter. Shifts of all distributions toward the lower side are apparent.
Fig. 6
The ME forecast models with quantile information based on subjective probabilities of four SPF forecasters in the first (first row) and second (second row) quarters of 2020 (dashed-red) and the PDFs of the respective pooled ME forecast models (solid blue). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
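The uncertainty and disagreement measures used with the pooled ME model can be sketched numerically. The example assumes piecewise uniform ME densities on common bins, equal pooling weights, and finite widths for all bins; the three forecasters' probabilities and the bin widths are hypothetical and are not taken from the SPF. It illustrates that the entropy of the pool minus the average individual entropy equals the average KL divergence of the individuals from the pool.

```python
import math

def me_entropy(probs, widths):
    """Entropy of the piecewise uniform density probs[j] / widths[j] on bin j."""
    return sum(p * math.log(w / p) for p, w in zip(probs, widths) if p > 0)

def kl_divergence(p, q):
    """KL divergence between two piecewise uniform densities on the same bins.
    The bin widths cancel, so only the bin probabilities are needed."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

def pool(forecasts, weights):
    """Arithmetic (linear) pool of the forecasters' bin probabilities."""
    n_bins = len(forecasts[0])
    return [sum(w * f[j] for w, f in zip(weights, forecasts))
            for j in range(n_bins)]

# Hypothetical inputs: four bins, three forecasters, equal weights.
widths = [2.0, 1.0, 1.0, 2.0]
forecasts = [
    [0.05, 0.45, 0.45, 0.05],
    [0.00, 0.30, 0.60, 0.10],
    [0.20, 0.60, 0.20, 0.00],
]
weights = [1.0 / len(forecasts)] * len(forecasts)

pooled = pool(forecasts, weights)
h_pooled = me_entropy(pooled, widths)                      # uncertainty of the pool
h_average = sum(w * me_entropy(f, widths)
                for w, f in zip(weights, forecasts))       # average uncertainty
js = h_pooled - h_average                                  # disagreement
mean_kl = sum(w * kl_divergence(f, pooled)
              for w, f in zip(weights, forecasts))

print(round(h_pooled, 4), round(h_average, 4), round(js, 4), round(mean_kl, 4))
```

Because the bin widths cancel in the divergence, the disagreement measure does not depend on the widths chosen for the unbounded bins, whereas the two entropies do.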
The uncertainty of individual forecasters and disagreement among them are tabulated below the plots of the ME models. These measures are computed using (19) for the individual i and pooled distributions within the information framework of Shoja and Soofi (2017). For each forecaster, the entropy maps the extent of its concentration and quantifies the uncertainty of the ME model (solid blue). The KL divergence maps the extent to which the individual ME model disagrees with the pooled ME model for the quarter (dashed red); the π(K) index gives the probability according to the biased coin calibration for the KL divergences K. These measures indicate that in most cases the uncertainty is decreased in the second quarter and in all cases the disagreement is sharply increased. The pooled uncertainty H and disagreement JS are averages of the individual entropies and KL divergences. For the quarter pool, the entropy of the pooled ME model and the average of the four individual entropies, H, are measures of overall uncertainty. The disagreement information, JS, is the average of the four individual divergences, which is also the difference between the two measures of uncertainty for the pool. These measures confirm that, overall, the uncertainty of these four forecasters is decreased in the second quarter and their disagreement is sharply increased. The coronavirus pandemic might have played a role in these changes.
Next we present the application to all 2019 and 2020 quarterly forecasters of the 2020 U.S. GDP in the SPF database. Table 4 summarizes the information analysis of quarterly data for all 2019 and 2020 forecasters. For the unbounded bins we set . The sensitivity of entropies to the choice of the lower and upper bins, B1 and B10, can be easily assessed using . The average probabilities of the unbounded bins are minute, which makes the entropy practically insensitive to the choice of the bins. The following points are evident from Table 4.
Table 4
Uncertainty and disagreement of all SPF respondents during 2019Q1-2020Q2 about the 2020 U.S. GDP growth.
| Quarter | Forecast horizon | No. | H(fc*) | Hc | JSc | π(JSc) |
|---|---|---|---|---|---|---|
| 2019Q1 | 7.5 | 31 | 1.615 | 1.355 | .260 | .82 |
| 2019Q2 | 6.5 | 34 | 1.578 | 1.321 | .257 | .82 |
| 2019Q3 | 5.5 | 30 | 1.583 | 1.369 | .214 | .80 |
| 2019Q4 | 4.5 | 30 | 1.571 | 1.343 | .228 | .80 |
| 2020Q1 | 3.5 | 30 | 1.313 | 1.143 | .170 | .77 |
| 2020Q2 | 2.5 | 34 | 1.619 | 1.266 | .353 | .86 |
During the 2019 quarters the uncertainty of the pooled model, H(f), remains stable while the pooled uncertainty varies a little more, resulting in variation of the information/disagreement JS through the quarters. The disagreement is stronger in the first half of 2019, where the forecast horizon is longer, than in the second half (shorter forecast horizon).
In the first quarter of 2020, the uncertainty and disagreement of the forecasters decrease; a shorter horizon might have played a role. But in the second quarter of 2020, the uncertainty increases sharply and the disagreement modestly. In spite of a shorter horizon, these changes can be attributed to the coronavirus pandemic.
The SPF provides information about the forecaster’s industry. Table 5 reports the uncertainty and disagreement according to the forecaster’s firm. The uncertainty of the forecasters from the financial service providers is higher and their disagreement is lower than the respective measures for forecasters from other industries. The time patterns of the measures are similar to those in Table 4. (R codes for computing Tables 4 and 5 are available upon request from the authors.)
Table 5
Uncertainty and disagreement of all SPF respondents during 2019Q1-2020Q2 about the 2020 U.S. GDP growth according to the forecaster’s industry.
| Quarter | Forecast horizon | Financial services: No. | H(fc*) | Hc | JSc | π(JSc) | Other industries: No. | H(fc*) | Hc | JSc | π(JSc) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019Q1 | 7.5 | 11 | 1.602 | 1.437 | .165 | .77 | 20 | 1.615 | 1.310 | .305 | .84 |
| 2019Q2 | 6.5 | 9 | 1.561 | 1.379 | .182 | .78 | 25 | 1.577 | 1.297 | .280 | .83 |
| 2019Q3 | 5.5 | 10 | 1.527 | 1.368 | .159 | .76 | 20 | 1.605 | 1.370 | .235 | .81 |
| 2019Q4 | 4.5 | 10 | 1.518 | 1.338 | .180 | .77 | 20 | 1.586 | 1.345 | .241 | .81 |
| 2020Q1 | 3.5 | 10 | 1.323 | 1.155 | .168 | .77 | 20 | 1.301 | 1.137 | .164 | .76 |
| 2020Q2 | 2.5 | 11 | 1.638 | 1.292 | .346 | .85 | 23 | 1.607 | 1.252 | .355 | .86 |
Reliability and inventory management
Two truncated distributions below and above a threshold, τ, on the support of a distribution are prominent in reliability modeling, where τ represents the current age of an item, and in inventory management, where τ represents the inventory. We consider the case when τ = q is a quantile of the ME PDF on the entire support. Then the truncated distributions in (26) are:
where is the survival function at the optimal order. In reliability modeling, (45) and (46) correspond to the past-life (down-time) and the residual distributions at a given time q. The “mean residual” is also used in inventory problems. In inventory analysis, (45) is the distribution of sales (the conditional distribution of the demand, given that demand does not exceed the inventory), and (46) is the distribution of stock-out (the conditional distribution of the demand, given that demand exceeds the inventory). We present ME models useful for inventory problems.
A widely studied topic in inventory management is the so-called newsvendor (NV) problem, which is defined by an optimal order quantity for a product with an uncertain demand X and fixed prices. Let c be the cost per unit, r the selling price per unit, and s the salvage value per unit; then c − s and r − c are the overage and underage costs, respectively. Assuming s < c < r, the expected profit maximizing (loss minimizing) solution for the optimal order quantity is the αth quantile of the demand distribution, where α = (r − c)/(r − s). This solution is traditionally represented in terms of the βth upper quantile. This optimization, however, does not offer a model for the unknown demand distribution, F, and without a distribution q remains unknown.
The optimal decision-theoretic NV decision under the loss function (33) coincides with the NV decision based on the expected profit maximization; see, for example, Snyder & Shen (2011, p. 78). Thus, the new ME Asymmetric Laplace model (36) provides a demand distribution consistent with the optimal order for the basic NV problem. Corollary 4 also provides a model for the NV problem. In (39), μ1 is the average sales, which can be assessed based on the sales data. This and the usual assumption of a given mean demand, μ, provide the partial information about the demand distribution. In (40), the constraints represent the mean inventory, the mean stock-out, and the profit maximizing quantile.
The NV literature offers moment-based rules for computing the optimal order, such as Scarf’s (1958) rule and the minimax regret rule (Perakis & Roels, 2008). These rules include the cost ratio, α, but do not offer models for the demand distribution. Instead, well-known probability models are then chosen for the demand distribution as a separate task. The classical ME models (uniform, exponential, and normal) are used for the distribution of the demand and justified based on the ME principle (Andersson, Jörnsten, Nonås, Sandal, & Ubøe, 2013; Perakis & Roels, 2008). The moment-based ME models are said to be non-robust to various degrees because their αth quantiles are different from the optimal solutions of the moment-based rules. Such discrepancies between the optimal order quantities and the αth quantiles of moment-based ME models are due to the lack of inclusion of the QI in the ME calculations. The ME models with moments and quantiles resolve this discrepancy.
We illustrate applications of Corollaries 1 and 2 for two moment-based optimal order rules: one is based on the mean and one is based on the mean and variance. The minimax regret rule depends on the partial information. With a given mean on the nonnegative range . For the case of the first two moments on the nonnegative range, an approximate minimax regret rule (Roels, 2006) is
Scarf’s exact rule for the optimal order quantity based on the mean μ and the variance σ² is
Fig. 7 illustrates the lack of robustness of the exponential and truncated normal models against these rules. The PDFs and CDFs of these two moment-based models are shown by dashed red plots, and those of the corresponding ME models with moments and QI are shown by solid blue plots. The left two panels of Fig. 7 show the plots of the PDF and CDF of the ME models with and without the QI, where the QI is given by the minimax regret rule. Corollary 1 gives the continuous distribution with the two-piece PDF: a truncated exponential on [0, q] and a shifted exponential on x ≥ q. The CDF plot illustrates the lack of robustness of the exponential model for this rule.
Fig. 7
The PDFs and CDFs of the ME exponential and truncated normal models (dashed red) and the corresponding ME models with an additional QI constraint (solid blue). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
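The two-piece model in the left panels of Fig. 7 can be sketched numerically. The sketch assumes that the ME density under a mean constraint and a single QI constraint F(q) = α on x ≥ 0 has the form of an exponential with a common rate λ on both pieces, rescaled so that the mass below q is α. The values α = .40 and q = 2.00 are those reported in Table 6 for the minimax regret rule; the mean μ = 5 is an inference from the tabulated exponential entropy 2.609 = 1 + ln 5, not a value stated in this section. Function names are illustrative.

```python
import math

def mixture_mean(lam, q, alpha):
    """Mean of alpha * (exponential truncated to [0, q]) +
    (1 - alpha) * (exponential shifted to start at q), common rate lam."""
    return 1.0 / lam - alpha * q / math.expm1(lam * q) + (1.0 - alpha) * q

def solve_rate(mu, q, alpha, lo=1e-9, hi=50.0, iterations=200):
    """Bisection for the rate that matches the mean; the mean decreases in lam."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mixture_mean(mid, q, alpha) > mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def two_piece_entropy(lam, q, alpha):
    """Differential entropy of the two-piece density."""
    mass_below = -math.expm1(-lam * q)                 # 1 - exp(-lam * q)
    mean_below = 1.0 / lam - q / math.expm1(lam * q)   # mean of truncated piece
    h_below = -math.log(alpha * lam / mass_below) + lam * mean_below
    h_above = -math.log((1.0 - alpha) * lam) + 1.0
    return alpha * h_below + (1.0 - alpha) * h_above

mu, q, alpha = 5.0, 2.0, 0.40         # mu is an assumed (inferred) value
lam = solve_rate(mu, q, alpha)
h_exponential = 1.0 + math.log(mu)    # entropy of the exponential with mean mu
h_two_piece = two_piece_entropy(lam, q, alpha)
info_value = h_exponential - h_two_piece

print(round(lam, 4), round(h_two_piece, 3), round(info_value, 3))
```

Under these assumptions the entropy agrees with the minimax regret entry for the exponential model in Table 6 to within about .001; when q equals the αth quantile of the exponential itself, the solved rate reduces to 1/μ and the information value is zero.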
The right two panels of Fig. 7 show the plots of the PDF and CDF of the ME models with and without the QI. For the case of the ME model with QI so that the coefficient of variation is and the ME model is truncated normal. We let so that the coefficient of variation conditions for the approximate minimax regret rule and Scarf’s rule hold. These rules give q = 3.49 and q = 3.42, respectively (Table 6). With the ME model is TN(3.77, 1.11), where μ and σ are the mean and standard deviation of its parent normal distribution. The first two moments of are and which give ; (the moments are computed using in (S.1) and (S.2) in the Supplementary Document). Part (b) of Corollary 2 with gives a continuous distribution with a two-piece PDF: a truncated normal on [0, q] and a truncated normal on x ≥ q. The CDF plot illustrates the lack of robustness of the truncated normal model for this rule.
Table 6 gives the information values of QI when the uniform, exponential, and truncated normal distributions are adjusted to satisfy the profit maximizing quantiles given by the minimax regret or Scarf’s rules. Scarf’s rule for the uniform ME model on a finite range, [a, b], gives for α ≤ .25 and for α < .5 (for α > .5, ). The minimax regret rule when the partial information is a finite range [a, b] gives q = a + α(b − a), which coincides with the αth quantile of the uniform distribution (Perakis & Roels, 2008).
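The order quantities entering Table 6 can be reproduced with a short sketch. It assumes the commonly stated form of Scarf’s rule, q = μ + (σ/2)(√(α/(1 − α)) − √((1 − α)/α)), and ignores the coefficient-of-variation condition under which the rule applies; the prices in the first example are hypothetical, and the function names are illustrative.

```python
import math

def critical_ratio(c, r, s):
    """Newsvendor critical ratio alpha = (r - c) / (r - s), assuming s < c < r."""
    return (r - c) / (r - s)

def scarf_order_quantity(mu, sigma, alpha):
    """Scarf's rule in its commonly stated form (applicability condition omitted)."""
    odds = alpha / (1.0 - alpha)
    return mu + 0.5 * sigma * (math.sqrt(odds) - math.sqrt(1.0 / odds))

def uniform_quantile(a, b, alpha):
    """alpha-th quantile of the uniform distribution on [a, b]."""
    return a + alpha * (b - a)

# Hypothetical prices giving alpha = .40.
alpha_example = critical_ratio(c=7.0, r=10.0, s=2.5)

# Uniform on [0, 70] with alpha = .75: mean 35, standard deviation 70 / sqrt(12).
q_scarf_uniform = scarf_order_quantity(35.0, 70.0 / math.sqrt(12.0), 0.75)
q_regret_uniform = uniform_quantile(0.0, 70.0, 0.75)

# Exponential with mean 5 (an inferred value), so sigma = 5, and alpha = .40.
q_scarf_exponential = scarf_order_quantity(5.0, 5.0, 0.40)

print(round(alpha_example, 2), round(q_scarf_uniform, 2),
      round(q_regret_uniform, 2), round(q_scarf_exponential, 2))
```

The three order quantities agree with the qα entries 46.67, 52.5, and 3.98 reported in Table 6 for the corresponding rules.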
Table 6
Entropies and information indices for the adjusted classical maximum entropy models.
| Model | Rule | H(fθ*) | α | qα | H(fθ,α*) | Information value of QI: K(fθ,α*:fθ*) | ID(K) | π(K) |
|---|---|---|---|---|---|---|---|---|
| Uniform on [0, 70] (Table 1) | Minimax regret | 4.248 | .75 | 52.5 | 4.510 | 0 | 0 | .50 |
| | Scarf | 4.248 | .75 | 46.67 | 4.232 | .278 | .426 | .83 |
| Exponential (Fig. 7) | Minimax regret | 2.609 | .40 | 2.00 | 2.594 | .015 | .030 | .586 |
| | Scarf | 2.609 | .40 | 3.98 | 2.534 | .075 | .140 | .687 |
| Truncated normal (Fig. 7) | Minimax regret | 1.522 | .35 | 3.49 | 1.440 | .082 | .151 | .694 |
| | Scarf | 1.522 | .35 | 3.42 | 1.520 | .002 | .004 | .532 |
In Table 6 the uniform distribution is on [0, 70] and α = .75. The minimax regret rule gives qα = 52.5 and there is no information value for the QI of this rule. Scarf’s rule gives qα = 46.67 and the information gain of QI over the uniform model is comparable to the prediction of outcomes of a coin with probability .83, indicating that the QI of this rule is very informative. Notice that here we have a single QI constraint, and it is nearly as informative as the three quartiles in Table 1. The information values of the QI constraint with the minimax regret rule and Scarf’s rule in addition to the mean (Fig. 7) are comparable to the prediction of outcomes of a coin with probability .59 and .69, respectively. Accordingly, QI is rather informative, indicating that these rules are not robust for the exponential model; however, the minimax regret rule improves over Scarf’s rule. The information values of the QI constraint with the minimax regret rule and Scarf’s rule in addition to the first two moments (Fig. 7) are comparable to the prediction of outcomes of a coin with probability .69 and .53, respectively. Accordingly, QI with the minimax regret rule is rather informative and Scarf’s rule is rather robust under the truncated normal for the demand.
Summary and discussion
This paper presented several results and an assortment of ME models with given quantiles, with or without moments, and illustrated their potential applications in various problems. The results explored properties, existence, and shapes of models for the minimum information elaborations of the uniform and moment-based ME models by quantiles, and provided diagnostics for assessment of the utility of the minimal elaborations. The application examples illustrated the merits of the results by explicating problems where non-elicited information is induced in the assumed models and information that appears in the analysis is not included in the assumed ME models. The results for the ME model with quantile and moment information enabled us to determine the shape and existence of the ME model, compute its entropy rather easily, and assess the information utility of the elaboration. Specific models provided elaborations of the uniform, exponential, normal, Laplace, and a known asymmetric Laplace distribution. The elaboration of the Laplace distribution is along the lines of Ardakani, Ebrahimi, and Soofi (2018), who explored the link between information theory and Laplace’s first and second laws of error in terms of the minimum risk ME models under the absolute error and quadratic loss functions (Laplace and normal). This extension provides a gateway to further research on developing ME models with minimum risks of other asymmetric loss functions, such as the Linex loss, which is used in various problems.
The ME distributions with quantile information are continuous, but their PDFs are not, so they do not provide the convenience of parametric priors. However, ME models enable assessing the non-elicited information that is induced by assuming parametric models that fit the elicited quantiles. From an example we learned that a gamma distribution fitted to quartiles elicited for a Bayesian prior induces as much non-elicited information as the elicited quartiles provide. The example also compared information values of eliciting the median, quartiles, and the mean.
Application of the ME model with QI elicited by the fixed interval method to the Federal Reserve Bank of Philadelphia’s SPF revealed the effect of the coronavirus outbreak on the uncertainty and disagreement of the economic forecasters. The results showed that disagreement among forecasters about the U.S. 2020 GDP growth decreased as time evolved until the coronavirus outbreak. After the outbreak, the uncertainty and disagreement of forecasters increased sharply above the level of the first quarter of 2020. An industry-level analysis indicated that disagreement among forecasters in financial services is lower than among those in other industries.
Our results shed light on an issue in the inventory management literature. A demand distribution is said to be (perfectly) robust if its profit maximizing quantile is consistent with a moment-based rule for an optimal order quantity. Achieving this property requires inclusion of this important quantile within the partial information set used for developing the rule. This is illustrated using elaborations of the uniform, exponential, and normal distributions by the profit maximizing quantile.
This paper considered only the case of continuous random variables. The discrete case can be illustrated similarly. The biased coin calibration of the KL divergence has been supplied so that the tabulated disagreement measures can be judged sizeable or not. Bayesian inference about the entropy and KL divergence can be used for more formal inferences.