Literature DB >> 33286090

Conditional Rényi Divergences and Horse Betting.

Cédric Bleuler1, Amos Lapidoth1, Christoph Pfister1.   

Abstract

Motivated by a horse betting problem, a new conditional Rényi divergence is introduced. It is compared with the conditional Rényi divergences that appear in the definitions of the dependence measures by Csiszár and Sibson, and the properties of all three are studied with emphasis on their behavior under data processing. In the same way that Csiszár's and Sibson's conditional divergences lead to the respective dependence measures, so does the new conditional divergence lead to the Lapidoth-Pfister mutual information. Moreover, the new conditional divergence is also related to the Arimoto-Rényi conditional entropy and to Arimoto's measure of dependence. In the second part of the paper, the horse betting problem is analyzed where, instead of Kelly's expected log-wealth criterion, a more general family of power-mean utility functions is considered. The key role in the analysis is played by the Rényi divergence, and in the setting where the gambler has access to side information, the new conditional Rényi divergence is key. The setting with side information also provides another operational meaning to the Lapidoth-Pfister mutual information. Finally, a universal strategy for independent and identically distributed races is presented that, without knowing the winning probabilities or the parameter of the utility function, asymptotically maximizes the gambler's utility function.

Keywords:  Kelly gambling; Rényi divergence; Rényi mutual information; conditional Rényi divergence; horse betting

Year:  2020        PMID: 33286090      PMCID: PMC7516775          DOI: 10.3390/e22030316

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

As shown by Kelly [1,2], many of Shannon’s information measures appear naturally in the context of horse gambling when the gambler’s utility function is expected log-wealth. Here, we show that under a more general family of utility functions, gambling also provides a context for some of Rényi’s information measures. Moreover, the setting where the gambler has side information motivates a new Rényi-like conditional divergence, which we study and compare to other conditional divergences. The proposed family of utility functions in the context of gambling with side information also provides another operational meaning to the Rényi-like mutual information that was recently proposed by Lapidoth and Pfister [3]: it measures the gambler’s gain from the side information as measured by the increase in the minimax value of the two-player zero-sum game in which the bookmaker picks the odds and the gambler then places the bets based on these odds and her side information. Deferring the gambling-based motivation to the second part of the paper, we first describe the different conditional divergences and study some of their properties with emphasis on their behavior under data processing. We also show that the new conditional Rényi divergence relates to the Lapidoth–Pfister mutual information in much the same way that Csiszár’s and Sibson’s conditional divergences relate to their corresponding mutual informations. Before discussing the conditional divergences, we first recall other information measures. The Kullback–Leibler divergence (or relative entropy) is an important concept in information theory and statistics [2,4,5,6]. It is defined between two probability mass functions (PMFs) $P$ and $Q$ over a finite set $\mathcal{X}$ as
$$D(P \| Q) \triangleq \sum_{x \in \mathrm{supp}(P)} P(x) \log \frac{P(x)}{Q(x)},$$
where $\log$ denotes the base-2 logarithm.
Defining a conditional Kullback–Leibler divergence is straightforward because, as simple algebra shows, the two natural approaches lead to the same result:
$$D(P_{Y|X} \| Q_{Y|X} \,|\, P_X) \triangleq \sum_{x \in \mathrm{supp}(P_X)} P_X(x)\, D(P_{Y|X=x} \| Q_{Y|X=x}) = D(P_X \times P_{Y|X} \,\|\, P_X \times Q_{Y|X}),$$
where $\mathrm{supp}(P)$ denotes the support of $P$, and in (3) and throughout, $P_X \times P_{Y|X}$ denotes the PMF on $\mathcal{X} \times \mathcal{Y}$ that assigns the probability $P_X(x)\,P_{Y|X}(y|x)$ to $(x,y)$. The Rényi divergence of order $\alpha$ [7,8] between two PMFs $P$ and $Q$ is defined for all positive $\alpha$’s other than one as
$$D_\alpha(P \| Q) \triangleq \frac{1}{\alpha - 1} \log \sum_x P(x)^\alpha\, Q(x)^{1-\alpha}.$$
A conditional Rényi divergence can be defined in more than one way. In this paper, we consider the following three definitions, two classic and one new:
$$D_\alpha^{\mathrm{c}}(P_{Y|X} \| Q_{Y|X} \,|\, P_X) \triangleq \sum_x P_X(x)\, D_\alpha(P_{Y|X=x} \| Q_{Y|X=x}),$$
$$D_\alpha^{\mathrm{s}}(P_{Y|X} \| Q_{Y|X} \,|\, P_X) \triangleq D_\alpha(P_X \times P_{Y|X} \,\|\, P_X \times Q_{Y|X}),$$
$$D_\alpha^{\mathrm{n}}(P_{Y|X} \| Q_{Y|X} \,|\, P_X) \triangleq \frac{\alpha}{\alpha - 1} \log \sum_x P_X(x) \Bigl[ \sum_y P_{Y|X}(y|x)^\alpha\, Q_{Y|X}(y|x)^{1-\alpha} \Bigr]^{1/\alpha},$$
where (5) is inspired by Csiszár [9]; (6) is inspired by Sibson [10]; and (7) is motivated by the horse betting problem discussed in Section 9. The first two conditional Rényi divergences were used to define the Rényi measures of dependence of Csiszár [9] and of Sibson [10]:
$$I_\alpha^{\mathrm{c}}(X;Y) \triangleq \min_{Q_Y} D_\alpha^{\mathrm{c}}(P_{Y|X} \| Q_Y \,|\, P_X), \qquad I_\alpha^{\mathrm{s}}(X;Y) \triangleq \min_{Q_Y} D_\alpha^{\mathrm{s}}(P_{Y|X} \| Q_Y \,|\, P_X),$$
where the minimization is over all PMFs $Q_Y$ on the set $\mathcal{Y}$. (Gallager’s $E_0$ function [11] and $I_\alpha^{\mathrm{s}}$ are in one-to-one correspondence; see (65) below.) The analogous minimization of $D_\alpha^{\mathrm{n}}$ leads to the Lapidoth–Pfister mutual information [3]:
$$J_\alpha(X;Y) \triangleq \min_{Q_X, Q_Y} D_\alpha(P_{XY} \| Q_X \times Q_Y) = \min_{Q_Y} D_\alpha^{\mathrm{n}}(P_{Y|X} \| Q_Y \,|\, P_X),$$
where (11) is proved in Proposition 5. The first part of the paper is structured as follows: In Section 2, we discuss some preliminaries. In Section 3, Section 4 and Section 5, we study the properties of the three conditional Rényi divergences and their associated measures of dependence. In Section 6, we express the Arimoto–Rényi conditional entropy and the Arimoto measure of dependence [12] in terms of $D_\alpha^{\mathrm{n}}$. In Section 7, we relate the conditional Rényi divergences to each other and discuss the relations between the Rényi dependence measures. The second part of the paper deals with horse gambling under our proposed family of power-mean utility functions. It is in this context that the Rényi divergence (Theorem 9) and the conditional Rényi divergence (Theorem 10) appear naturally. More specifically, consider a horse race with a finite nonempty set of horses $\mathcal{X}$, where a bookmaker offers odds $o(x)$-for-1 on each horse $x \in \mathcal{X}$, where $o(x) > 0$ [2] (Section 6.1).
A gambler spends all her wealth placing bets on the horses. The fraction of her wealth that she bets on Horse $x$ is denoted $b(x)$, which sums up to 1 over $\mathcal{X}$, and the PMF $b$ is her “betting strategy.” The winning horse, which we denote $X$, is drawn according to the PMF $p$, where we assume $p(x) > 0$ for all $x \in \mathcal{X}$. The wealth relative (or end-to-beginning wealth ratio) is the random variable
$$S \triangleq b(X)\, o(X).$$
Hence, given an initial wealth $W_0$, the gambler’s wealth after the race is $W_0 S$. We seek betting strategies that maximize the utility function
$$U_\beta(b) \triangleq \frac{1}{\beta} \log \mathbb{E}\bigl[S^\beta\bigr],$$
where $\beta \in \mathbb{R}$ is a parameter that accounts for the risk sensitivity. This optimization generalizes the following cases: In the limit as $\beta$ tends to $-\infty$, we optimize the worst-case return. The optimal strategy is risk-free in the sense that $S$ does not depend on the winning horse (see Proposition 8). If $\beta = 0$, then we optimize $\mathbb{E}[\log S]$, which is known as the doubling rate [2] (Section 6.1). The optimal strategy is proportional betting, i.e., to choose $b = p$ (see Remark 4). If $\beta = 1$, then we optimize $\mathbb{E}[S]$, the expected return. The optimal strategy is to put all the money on a horse that maximizes $p(x)\,o(x)$ (see Proposition 9). In general, if $\beta \geq 1$, then it is optimal to put all the money on one horse (see Proposition 9). This is risky: if that horse loses, the gambler will go broke. In the limit as $\beta$ tends to $\infty$, we optimize the best-case return. The optimal strategy is to put all the money on a horse that maximizes $o(x)$ (see Proposition 10). Note that, for $\beta < 1$ and $\beta \neq 0$, maximizing $U_\beta(b)$ is equivalent to maximizing $\frac{1}{\beta}\mathbb{E}[S^\beta]$, which is known in the finance literature as Constant Relative Risk Aversion (CRRA) [13,14]. We refer to our utility function as “power mean” because it can be written as the logarithm of a weighted power mean [15,16]:
$$U_\beta(b) = \log \Bigl[ \sum_x p(x)\, \bigl( b(x)\, o(x) \bigr)^\beta \Bigr]^{1/\beta}.$$
Because the power mean tends to the geometric mean as $\beta$ tends to zero [15] (Problem 8.1), $U_\beta(b)$ is continuous at $\beta = 0$:
$$U_0(b) = \mathbb{E}[\log S] = \lim_{\beta \to 0} U_\beta(b).$$
Campbell [17,18] used an exponential cost function with a similar structure to (15) to provide an operational meaning to the Rényi entropy in source coding.
Other information-theoretic applications of exponential moments were studied in [19]. The second part of the paper is structured as follows: In Section 8, we relate the utility function to the Rényi divergence (Theorem 9) and derive the optimal gambling strategy. In Section 9, we consider the situation where the gambler observes side information prior to betting, a situation that leads to the conditional Rényi divergence (Theorem 10) and to a new operational meaning for the measure of dependence (Theorem 11). In Section 10, we consider the situation where the gambler invests only part of her money. In Section 11, we present a universal strategy for independent and identically distributed (IID) races that requires neither knowledge of the winning probabilities nor of the parameter of the utility function and yet asymptotically maximizes the utility function for all PMFs $p$ and all parameters $\beta$.
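Before turning to the preliminaries, the three conditional divergences can be sketched numerically. The code below is a minimal sketch assuming the standard forms consistent with (5)-(7): Csiszár's is the $P_X$-average of per-$x$ Rényi divergences, Sibson's applies the unconditional divergence to the joint PMFs, and the new one applies an exponential average with parameter $(\alpha-1)/\alpha$. All PMFs are hypothetical, and logarithms are base 2 as in the paper.

```python
import math

def renyi_div(P, Q, a):
    # D_alpha(P||Q) = 1/(a-1) * log2( sum_x P(x)^a Q(x)^(1-a) ), a > 0, a != 1
    s = sum(p**a * q**(1 - a) for p, q in zip(P, Q) if p > 0)
    return math.log2(s) / (a - 1)

def cond_div_csiszar(PX, PYX, QYX, a):
    # P_X-weighted average of the per-x Renyi divergences
    return sum(PX[x] * renyi_div(PYX[x], QYX[x], a) for x in range(len(PX)))

def cond_div_sibson(PX, PYX, QYX, a):
    # D_alpha( P_X x P_{Y|X} || P_X x Q_{Y|X} )
    s = sum(PX[x] * sum(p**a * q**(1 - a) for p, q in zip(PYX[x], QYX[x]) if p > 0)
            for x in range(len(PX)))
    return math.log2(s) / (a - 1)

def cond_div_new(PX, PYX, QYX, a):
    # (a/(a-1)) * log2( sum_x P_X(x) [ sum_y P^a Q^(1-a) ]^(1/a) )
    s = sum(PX[x] * sum(p**a * q**(1 - a) for p, q in zip(PYX[x], QYX[x]) if p > 0)**(1 / a)
            for x in range(len(PX)))
    return a / (a - 1) * math.log2(s)

PX = [0.5, 0.5]
PYX = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical P_{Y|X}
QYX = [[0.5, 0.5], [0.5, 0.5]]   # hypothetical Q_{Y|X}

# As alpha -> 1, all three tend to the conditional Kullback-Leibler divergence:
a = 1.000001
kl = sum(PX[x] * sum(p * math.log2(p / q) for p, q in zip(PYX[x], QYX[x]) if p > 0)
         for x in range(len(PX)))
for f in (cond_div_csiszar, cond_div_sibson, cond_div_new):
    assert abs(f(PX, PYX, QYX, a) - kl) < 1e-4
```

All three definitions coincide at $\alpha = 1$, which is why the conditional Kullback-Leibler divergence needs no such disambiguation.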

2. Preliminaries

Throughout the paper, denotes the base-2 logarithm, and are finite sets, denotes a joint PMF over , denotes a PMF over , and denotes a PMF over . An expression of the form denotes the PMF on that assigns the probability . We use P and Q as generic PMFs over a finite set . We denote by the support of P, and by the set of all PMFs over . When clear from the context, we often omit sets and subscripts: for example, we write for , for , for , and for . When is 0, we define the conditional probability as . The conditional distribution of Y given is denoted by , thus We denote by the indicator function that is one if the condition is satisfied and zero otherwise. In the definition of the Kullback–Leibler divergence in (1), we use the conventions In the definition of the Rényi divergence in (4), we read as for and use the conventions For being zero, one, or infinity, we define by continuous extension of (4) The Rényi divergence for negative is defined as (We use negative in the proof of Proposition 1 (e) below and in Remark 6. More about negative orders can be found in [8] (Section V). For other applications of negative orders, see [20] (Proof of Theorem 1 and Example 1).) The Rényi divergence satisfies the following basic properties: Let P and Q be PMFs. Then, the Rényi divergence satisfies the following: For all , . If , then if and only if . For all , is finite if and only if . For all , is finite if and only if . The mapping is continuous on . The mapping is nondecreasing on . The mapping is nonincreasing on . The mapping is concave on . The mapping is concave on . (Data-processing inequality.) Let be a conditional PMF, and define the PMFs Then, for all , See Appendix A. □ All three conditional Rényi divergences reduce to the unconditional Rényi divergence when both and are independent of X: Let , , and be PMFs. Then, for all , This follows from the definitions of , , and in (5)–(7). □
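The data-processing inequality in Proposition 1 (h) can be illustrated numerically: pushing both PMFs through the same channel cannot increase their Rényi divergence. The channel and PMFs below are hypothetical.

```python
import math

def renyi_div(P, Q, a):
    # D_alpha(P||Q) in bits, for a > 0, a != 1
    s = sum(p**a * q**(1 - a) for p, q in zip(P, Q) if p > 0)
    return math.log2(s) / (a - 1)

# A conditional PMF A(y|x) acting on both arguments (hypothetical values):
A = [[0.7, 0.3], [0.4, 0.6]]     # A[x][y], rows sum to 1
P, Q = [0.8, 0.2], [0.3, 0.7]

def push(R):
    # output PMF after passing R through the channel A
    return [sum(R[x] * A[x][y] for x in range(2)) for y in range(2)]

# Data-processing inequality holds for all positive orders:
for a in (0.3, 0.5, 2.0, 5.0):
    assert renyi_div(push(P), push(Q), a) <= renyi_div(P, Q, a) + 1e-12
```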

3. Csiszár’s Conditional Rényi Divergence

For a PMF and conditional PMFs and , Csiszár’s conditional Rényi divergence is defined for every as For , which follows from the definition of the Rényi divergence in (4). For being zero, one, or infinity, we obtain from (21)–(23) and (2) Augustin [21] and later Csiszár [9] defined the measure of dependence Augustin used this measure to study the error exponents for channel coding with input constraints, while Csiszár used it to study generalized cutoff rates for channel coding with composition constraints. Nakiboğlu [22] studied more properties of . Inter alia, he analyzed the minimax properties of the Augustin capacity where is a constraint set. The Augustin capacity is used in [23] to establish the sphere packing bound for memoryless channels with cost constraints. The rest of the section presents some properties of . Being an average of Rényi divergences (see (29)), inherits many properties from the Rényi divergence: Let be a PMF, and let and be conditional PMFs. Then, For all , . If , then if and only if for all . For all , is finite if and only if for all . For all , is finite if and only if for all . The mapping is continuous on . The mapping is nondecreasing on . The mapping is nonincreasing on . The mapping is concave on . The mapping is concave on . These follow from (29) and the properties of the Rényi divergence (Proposition 1). For Parts (f) and (g), recall that a nonnegative weighted sum of concave functions is concave. □ We next consider data-processing inequalities for . We distinguish between processing Y and processing X. The data-processing inequality for processing Y follows from the data-processing inequality for the (unconditional) Rényi divergence: Let be a PMF, and let and be conditional PMFs. For a conditional PMF , define Then, for all , See Appendix B. □ The following data-processing inequality for processing X holds for (as shown in Example 1 below, it does not extend to ): Let be a PMF, and let and be conditional PMFs. 
For a conditional PMF , define the PMFs Then, for all , Note that , , and in Theorem 2 can be obtained from the following marginalizations: See Appendix C. □ As a special case of Theorem 2, we obtain the following relation between the conditional and the unconditional Rényi divergence: For a PMF and conditional PMFs and , define the marginal PMFs Then, for all , See Appendix D. □ Consider next . It turns out that Corollary 1, and hence Theorem 2, cannot be extended to these values of (not even if is restricted to be independent of X, i.e., if ): Let . For , define the PMFs , , and as Then, for every , there exists an such that , where the PMF is defined by (46) and , irrespective of . See Appendix E. □
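The special case relating the conditional to the unconditional divergence can be checked numerically for $\alpha \in (0,1)$: marginalizing out $X$ cannot increase Csiszár's conditional divergence (this is joint convexity of the Rényi divergence in that range). A sketch with hypothetical PMFs:

```python
import math

def renyi_div(P, Q, a):
    s = sum(p**a * q**(1 - a) for p, q in zip(P, Q) if p > 0)
    return math.log2(s) / (a - 1)

def csiszar_cond_div(PX, PYX, QYX, a):
    # P_X-weighted average of the per-x Renyi divergences
    return sum(PX[x] * renyi_div(PYX[x], QYX[x], a) for x in range(len(PX)))

PX = [0.6, 0.4]
PYX = [[0.9, 0.1], [0.2, 0.8]]
QYX = [[0.5, 0.5], [0.7, 0.3]]

def marginal(CYX):
    # P_Y(y) = sum_x P_X(x) C(y|x)
    return [sum(PX[x] * CYX[x][y] for x in range(2)) for y in range(2)]

# For alpha in (0,1), the divergence between the Y-marginals is at most
# Csiszar's conditional divergence:
for a in (0.3, 0.5, 0.9):
    assert (renyi_div(marginal(PYX), marginal(QYX), a)
            <= csiszar_cond_div(PX, PYX, QYX, a) + 1e-12)
```

For $\alpha > 1$ this comparison can fail, in line with Example 1.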

4. Sibson’s Conditional Rényi Divergence

For a PMF and conditional PMFs and , Sibson’s conditional Rényi divergence is defined for every as For , where (55) and (56) follow from the definition of the Rényi divergence in (4). For being zero, one, or infinity, we obtain from (21)–(23) and (3) Sibson [10] defined the measure of dependence This minimum can be computed explicitly [10] (Corollary 2.3): For , and for being one or infinity, where denotes Shannon’s mutual information. The concavity and convexity properties of and were studied by Ho–Verdú [24]. More properties of were collected by Verdú [25]. The maximization of with respect to and the minimax properties of were studied by Nakiboğlu [26] and Cai–Verdú [27]. The conditional Rényi divergence was used by Fong and Tan [28] to establish strong converse theorems for multicast networks. Yu and Tan [29] analyzed channel resolvability, among other measures, in terms of . From (61) we see that Gallager’s function [11], which is defined as is in one-to-one correspondence to Sibson’s measure of dependence: Gallager’s function is important in channel coding: it appears in the random coding exponent [30] and in the sphere packing exponent [31,32] (see also Gallager [11]). The exponential strong converse theorem proved by Arimoto [33] also uses the function. Polyanskiy and Verdú [34] extended the exponential strong converse theorem to channels with feedback. Augustin [21] and Nakiboğlu [35,36] extended the sphere packing bound to channels with feedback. The rest of the section presents some properties of . Because can be written as an (unconditional) Rényi divergence (see (54)), it inherits many properties from the Rényi divergence: Let be a PMF, and let and be conditional PMFs. Then, For all , . If , then if and only if for all . For all , is finite if and only if (there exists an such that . For all , is finite if and only if for all . The mapping is continuous on . The mapping is nondecreasing on . The mapping is nonincreasing on . The mapping is concave on . 
The mapping is concave on . These follow from (54) and the properties of the Rényi divergence (Proposition 1). □ We next consider data-processing inequalities for . We distinguish between processing Y and processing X. The data-processing inequality for processing Y follows from the data-processing inequality for the (unconditional) Rényi divergence: Let be a PMF, and let and be conditional PMFs. For a conditional PMF , define Then, for all , See Appendix F. □ The data-processing inequality for processing X similarly follows from the data-processing inequality for the (unconditional) Rényi divergence: Let be a PMF, and let and be conditional PMFs. For a conditional PMF , define the PMFs Then, for all , See Appendix G. □ As a special case of Theorem 4, we obtain the following relation between the conditional and the unconditional Rényi divergence: Let be a PMF, and let and be conditional PMFs. Define the marginal PMFs Then, for all , This follows from Theorem 4 in the same way that Corollary 1 followed from Theorem 2. □
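The closed-form minimum for Sibson's measure of dependence and its correspondence with Gallager's $E_0$ function can be sketched numerically. The code assumes the standard forms $I_\alpha^{\mathrm{s}} = \frac{\alpha}{\alpha-1}\log\sum_y [\sum_x P(x) W(y|x)^\alpha]^{1/\alpha}$ and $E_0(\rho, P) = -\log\sum_y [\sum_x P(x) W(y|x)^{1/(1+\rho)}]^{1+\rho}$, related by $\alpha = 1/(1+\rho)$; the input PMF and channel are hypothetical.

```python
import math

def sibson_mi(P, W, a):
    # I_a^s = a/(a-1) * log2( sum_y [ sum_x P(x) W(y|x)^a ]^(1/a) )
    ny = len(W[0])
    s = sum(sum(P[x] * W[x][y]**a for x in range(len(P)))**(1 / a) for y in range(ny))
    return a / (a - 1) * math.log2(s)

def gallager_e0(rho, P, W):
    # E_0(rho, P) = -log2 sum_y [ sum_x P(x) W(y|x)^(1/(1+rho)) ]^(1+rho)
    ny = len(W[0])
    s = sum(sum(P[x] * W[x][y]**(1 / (1 + rho)) for x in range(len(P)))**(1 + rho)
            for y in range(ny))
    return -math.log2(s)

P = [0.4, 0.6]
W = [[0.9, 0.1], [0.25, 0.75]]   # W[x][y], a hypothetical channel

# One-to-one correspondence: E_0(rho, P) = rho * I_s(P, W) with alpha = 1/(1+rho)
for rho in (0.5, 1.0, 2.0):
    a = 1 / (1 + rho)
    assert abs(gallager_e0(rho, P, W) - rho * sibson_mi(P, W, a)) < 1e-12
```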

5. New Conditional Rényi Divergence

Let $P_X$ be a PMF, and let $P_{Y|X}$ and $Q_{Y|X}$ be conditional PMFs. For $\alpha \in (0,1) \cup (1,\infty)$, define
$$D_\alpha^{\mathrm{n}}(P_{Y|X} \| Q_{Y|X} \,|\, P_X) \triangleq \frac{\alpha}{\alpha - 1} \log \sum_x P_X(x)\, 2^{\frac{\alpha-1}{\alpha} D_\alpha(P_{Y|X=x} \| Q_{Y|X=x})} = \frac{\alpha}{\alpha - 1} \log \sum_x P_X(x) \Bigl[ \sum_y P_{Y|X}(y|x)^\alpha\, Q_{Y|X}(y|x)^{1-\alpha} \Bigr]^{1/\alpha},$$
where (78) follows from the definition of the Rényi divergence in (4). (Except for the sign, the exponential averaging in (77) is very similar to the one of the Arimoto–Rényi conditional entropy; compare with (147) below.) For $\alpha$ being zero, one, or infinity, we define $D_\alpha^{\mathrm{n}}$ by continuous extension of (77). This conditional Rényi divergence has an operational meaning in horse betting with side information (see Theorem 10 below). Before discussing the measure of dependence associated with $D_\alpha^{\mathrm{n}}$, we establish the following alternative characterization of $D_\alpha^{\mathrm{n}}$: Let $P_X$ be a PMF, and let $P_{Y|X}$ and $Q_{Y|X}$ be conditional PMFs. Then, for all $\alpha$,
$$D_\alpha^{\mathrm{n}}(P_{Y|X} \| Q_{Y|X} \,|\, P_X) = \min_{R_X} D_\alpha(P_X \times P_{Y|X} \,\|\, R_X \times Q_{Y|X}),$$
where the minimization is over all PMFs $R_X$ on $\mathcal{X}$. We first treat the case . Some algebra reveals that, for every PMF , where the PMF is defined as The right-hand side (RHS) of (82) is thus equal to the minimum over of the RHS of (83). Since with equality if (Proposition 1 (a)), this minimum is equal to the second term on the RHS of (83), which, by (78), equals . For and , (82) follows from the same argument using that, for every PMF , where the PMF is defined as For , (82) holds because where (88) follows from the definition of in (21), and (91) follows from (79). □ Tomamichel and Hayashi [37] and Lapidoth and Pfister [3] independently introduced and studied the dependence measure
$$J_\alpha(X;Y) \triangleq \min_{Q_X, Q_Y} D_\alpha(P_{XY} \| Q_X \times Q_Y).$$
(For some measure-theoretic properties of $J_\alpha$, see Aishwarya–Madiman [38].) The measure $J_\alpha$ can be related to the error exponents in a hypothesis testing problem where the samples are either from a known joint distribution or an unknown product distribution (see [37] (Equation (57)) and [39]). It also appears in horse betting with side information (see Theorem 11 below). Similar to the Csiszár measure in (34) and the Sibson measure in (60), the measure $J_\alpha$ can be expressed as a minimization involving the new conditional Rényi divergence: Let $P_{XY}$ be a joint PMF. Denote its marginal PMFs by $P_X$ and $P_Y$ and its conditional PMFs by $P_{Y|X}$ and $P_{X|Y}$, so $P_{XY} = P_X \times P_{Y|X} = P_Y \times P_{X|Y}$.
Then, for all , Equation (93) holds because where (95) follows from Proposition 4, and (96) follows from (92). Swapping the roles of X and Y establishes (94): where (97) follows from Proposition 4, and (98) follows from (92). □ The rest of the section presents some properties of . Let be a PMF, and let and be conditional PMFs. Then, For all , . If , then if and only if for all . For all , is finite if and only if (there exists an such that . For all , is finite if and only if for all . The mapping is continuous on . The mapping is nondecreasing on . The mapping is nonincreasing on . The mapping The mapping is concave on . We prove these properties as follows: For all , Proposition 4 implies The nonnegativity of now follows from the nonnegativity of the Rényi divergence (Proposition 1 (a)). If for all , then . Hence, using on the RHS of (99), equals zero. Conversely, if and , then for some by Proposition 1 (a), which implies for all . This follows from the definitions in (77) and (79)–(81) and the conventions in (20). For , is continuous because it is, by its definition in (77), a composition of continuous functions. The continuity at follows from a careful application of L’Hôpital’s rule. We next consider the continuity at . Define . Then, for all , where (100) follows from the definition in (77). On the other hand, for all , Because , it follows from (103) and (106) and the sandwich theorem that where (108) follows from the continuity of the Rényi divergence (Proposition 1 (c)) and the definition of in (21). We conclude with the continuity at . Observe that where (109) follows from the definition in (77), and (111) follows from the continuity of the Rényi divergence (Proposition 1 (c)) and the definition of in (23). For all , Proposition 4 implies Because is nonincreasing on (Proposition 1 (d)) and because the pointwise minimum preserves the monotonicity, the mapping is nonincreasing on . 
By Proposition 4, By the nonnegativity of the Rényi divergence (Proposition 1 (a)), the RHS of (113) is nonnegative for and nonpositive for . Hence, it suffices to show separately that the mapping is nonincreasing on and on . This is indeed the case: the mapping on the RHS of (113) is nonincreasing on (Proposition 1 (e)), and the monotonicity is preserved by the pointwise minimum and maximum, respectively. For , Proposition 4 implies that Because is concave on (Proposition 1 (f)) and because the pointwise minimum preserves the concavity, the mapping is concave on . This follows from Proposition 1 (g) in the same way that Part (f) followed from Proposition 1 (f). □ We next consider data-processing inequalities for . We distinguish between processing Y and processing X. The data-processing inequality for processing Y follows from the data-processing inequality for the (unconditional) Rényi divergence: Let Then, for all , We prove (117) for ; the claim will then extend to by the continuity of in (Proposition 6 (c)). For every , we can apply Proposition 1 (h) with the substitution of for to obtain For , (117) now follows from (77) and (118). □ Processing X is different. Consider first that does not depend on X. Then, writing , we have the following result (which, as shown in Example 2 below, does not extend to general ): Let and be PMFs, and let be a conditional PMF. For a conditional PMF , define the PMFs Then, for all , Once we provide the operational meaning of in horse betting with side information (Theorem 10 below), Theorem 6 will become very intuitive: it expresses the fact that preprocessing the side information cannot increase the gambler’s utility; see Remark 8. Note that and in Theorem 6 can be obtained from the following marginalization: We show (122) for ; the claim will then extend to by the continuity of in (Proposition 6 (c)). Consider first . 
Then, (122) holds because where (124) follows from (78); (125) follows from (121); (126) follows from (120); (127) follows from the Minkowski inequality [16] (III 2.4 Theorem 9); (129) holds because and imply , hence the first expression in square brackets on the left-hand side (LHS) of (129) equals one; and (130) follows from (78). The proof for is very similar: (124)–(126) and (128)–(130) continue to hold, and (127) is reversed [16] (III 2.4 Theorem 9). Because now , (122) continues to hold for . □ As a special case of Theorem 6, we obtain the following relation between the conditional and the unconditional Rényi divergence: Let and be PMFs, and let be a conditional PMF. Define the marginal PMF Then, for all , This follows from Theorem 6 in the same way that Corollary 1 followed from Theorem 2. □ Consider next that does depend on X. It turns out that Corollary 3, and hence Theorem 6, cannot be extended to this setting: Let and . Define the PMFs , , and as Then, for and for , where the PMFs and are given by Numerically, bits, which is larger than bits. Similarly, bits, which is larger than bits. □
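The alternative characterization of Proposition 4 can be verified numerically: the closed form of the new conditional divergence should match a brute-force minimization of $D_\alpha(P_X \times P_{Y|X} \| R_X \times Q_{Y|X})$ over PMFs $R_X$. The sketch below uses a fine grid search and hypothetical PMFs.

```python
import math

def renyi_div(P, Q, a):
    s = sum(p**a * q**(1 - a) for p, q in zip(P, Q) if p > 0)
    return math.log2(s) / (a - 1)

def new_cond_div(PX, PYX, QYX, a):
    # (a/(a-1)) log2 sum_x P_X(x) [ sum_y P(y|x)^a Q(y|x)^(1-a) ]^(1/a)
    s = sum(PX[x] * sum(p**a * q**(1 - a) for p, q in zip(PYX[x], QYX[x]))**(1 / a)
            for x in range(len(PX)))
    return a / (a - 1) * math.log2(s)

PX = [0.5, 0.5]
PYX = [[0.9, 0.1], [0.2, 0.8]]
QYX = [[0.6, 0.4], [0.3, 0.7]]
a = 0.5

def joint(RX, CYX):
    # PMF R_X(x) C(y|x) on the product alphabet, flattened
    return [RX[x] * CYX[x][y] for x in range(2) for y in range(2)]

# Minimum over binary PMFs R_X = (r, 1-r), found by grid search:
best = min(renyi_div(joint(PX, PYX), joint([r, 1 - r], QYX), a)
           for r in (i / 10000 for i in range(1, 10000)))
assert abs(best - new_cond_div(PX, PYX, QYX, a)) < 1e-4
```

The grid minimum can only overshoot the true minimum, so the two values agree up to the grid resolution.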

6. Relation to Arimoto’s Measures

Before discussing Arimoto’s measures, we first recall the definition of the Rényi entropy. The Rényi entropy of order [7] is defined for all positive ’s other than one as For being zero, one, or infinity, we define by continuous extension of (141) where denotes Shannon’s entropy. The Rényi entropy can be related to the Rényi divergence as follows: where denotes the uniform distribution over . There are different ways to define a conditional Rényi entropy [40]; we use Arimoto’s proposal. The Arimoto–Rényi conditional entropy of order [12,38,40,41] is defined for positive other than one as where (147) follows from the definition of the Rényi entropy in (141). The Arimoto–Rényi conditional entropy plays a key role in guessing with side information [20,42,43,44] and in task encoding with side information [45]; and it can be related to hypothesis testing [41]. For being zero, one, or infinity, we define by continuous extension of (146) where denotes Shannon’s conditional entropy. The analog of (145) for is: For all , Equation (151) follows, using some algebra, from the definition of in (78)–(81); and (152) follows from Proposition 4. (The characterization in (152) previously appeared as [40] (Theorem 4).) □ Arimoto [12] also defined the following measure of dependence: where (154) follows from (141) and (146). Using Remark 2, we can express in terms of : For all , This follows from (145), (151), and (153). □
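The Arimoto-Rényi conditional entropy can be sketched numerically. The code assumes the standard form $H_\alpha(X|Y) = \frac{\alpha}{1-\alpha}\log\sum_y [\sum_x P_{XY}(x,y)^\alpha]^{1/\alpha}$ and checks that $\alpha \to 1$ recovers Shannon's conditional entropy; the joint PMF is hypothetical.

```python
import math

def arimoto_cond_entropy(PXY, a):
    # H_a(X|Y) = a/(1-a) * log2 sum_y [ sum_x P_{XY}(x,y)^a ]^(1/a)
    ny = len(PXY[0])
    s = sum(sum(PXY[x][y]**a for x in range(len(PXY)))**(1 / a) for y in range(ny))
    return a / (1 - a) * math.log2(s)

PXY = [[0.3, 0.1], [0.2, 0.4]]   # P_{XY}[x][y], hypothetical

# alpha -> 1 recovers Shannon's conditional entropy H(X|Y):
PY = [sum(PXY[x][y] for x in range(2)) for y in range(2)]
shannon = -sum(PXY[x][y] * math.log2(PXY[x][y] / PY[y])
               for x in range(2) for y in range(2))
assert abs(arimoto_cond_entropy(PXY, 1.000001) - shannon) < 1e-4
```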

7. Relations Between the Conditional Rényi Divergences and the Rényi Dependence Measures

In this section, we first establish the greater-or-equal-than order between the conditional Rényi divergences, where the order depends on whether or . We then show that this implies the same order between the dependence measures derived from the conditional Rényi divergences. Finally, we remark that many of the dependence measures coincide when they are maximized over all PMFs . For all , This holds because where (157) follows from Proposition 4, and (159) follows from the definition of in (54). □ For all , For all , For both and , the relation follows from Proposition 7. We next show that for . We show this for ; the claim will then extend to by the continuity in of and (Proposition 3 (c) and Proposition 2 (c)). For , where (162) follows from (55); (163) follows from Jensen’s inequality because is a concave function; and (164) follows from (30). The proof of the claim for is finished by dividing (162)–(164) by , which reverses the inequality because . We conclude by showing that for . We show this for ; the claim will then extend to by the continuity of and in (Proposition 2 (c) and Proposition 6 (c)). For , where (165) follows from (30); (167) follows from Jensen’s inequality because is a concave function; and (168) follows from (78). □ For all , For all , By (34) and (60) and Proposition 5, respectively, The corollary now follows from (171)–(173) and Theorem 7. □ Despite , , , and being different measures, they often coincide when maximized over all PMFs : For every conditional PMF and every , In addition, for every conditional PMF and every , For , the situation is different: there exists a conditional PMF such that, for every , Equation (174) follows from [9] (Proposition 1); (175) follows from [12] (Lemma 1); and (176) follows from [38] (Theorem V.1) for . We next establish (176) for . 
Observe that, for , (176) is equivalent to For , (178) holds because where (179) follows from Proposition 5; (180) follows from (78); (181) and (185) follow from a minimax theorem and are justified below; (187) follows from (55); and (188) follows from (60). To justify (181), we apply the minimax theorem [46] (Corollary 37.3.2) to the function , The sets of all PMFs over and over are convex and compact; the function f is jointly continuous in the pair because it is a composition of continuous functions; for every , the function f is linear and hence convex in ; and it only remains to show that the function f is concave in for every . Indeed, for every with , every , and every , where (193) follows from the reverse Minkowski inequality [16] (III 2.4 Theorem 9) because ; and (195) holds because the function is concave for . The justification of (185) is very similar to that of (181); here, we apply the minimax theorem to the function , Compared to the justification of (181), the only essential difference lies in showing that the function g is concave in for every : here, this follows easily from the concavity of the function for . We conclude the proof by establishing (177). Let , and let the conditional PMF be given by . (This corresponds to a binary noiseless channel.) Then, denoting by the uniform distribution over , where (199) follows from (61). On the other hand, for every and every PMF , where (200) follows from [3] (Lemma 11); (201) follows from (144); and (202) holds because . Inequality (177) now follows from (199) and (202). □
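The ordering between the three conditional divergences can be illustrated numerically. All three are exponential averages of the per-$x$ Rényi divergences with parameters $\alpha - 1$ (Sibson), $(\alpha-1)/\alpha$ (new), and $0$ (Csiszár), and such averages are nondecreasing in the parameter; the PMFs below are hypothetical.

```python
import math

def renyi_div(P, Q, a):
    s = sum(p**a * q**(1 - a) for p, q in zip(P, Q) if p > 0)
    return math.log2(s) / (a - 1)

def d_csiszar(PX, PYX, QYX, a):
    return sum(PX[x] * renyi_div(PYX[x], QYX[x], a) for x in range(len(PX)))

def d_sibson(PX, PYX, QYX, a):
    s = sum(PX[x] * 2**((a - 1) * renyi_div(PYX[x], QYX[x], a)) for x in range(len(PX)))
    return math.log2(s) / (a - 1)

def d_new(PX, PYX, QYX, a):
    s = sum(PX[x] * 2**((a - 1) / a * renyi_div(PYX[x], QYX[x], a)) for x in range(len(PX)))
    return a / (a - 1) * math.log2(s)

PX = [0.6, 0.4]
PYX = [[0.9, 0.1], [0.2, 0.8]]
QYX = [[0.5, 0.5], [0.7, 0.3]]

# alpha < 1: new <= Sibson <= Csiszar
for a in (0.3, 0.7):
    assert (d_new(PX, PYX, QYX, a) <= d_sibson(PX, PYX, QYX, a) + 1e-12
            <= d_csiszar(PX, PYX, QYX, a) + 2e-12)
# alpha > 1: the order is reversed: Csiszar <= new <= Sibson
for a in (1.5, 3.0):
    assert (d_csiszar(PX, PYX, QYX, a) <= d_new(PX, PYX, QYX, a) + 1e-12
            <= d_sibson(PX, PYX, QYX, a) + 2e-12)
```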

8. Horse Betting

In this section, we analyze horse betting with a gambler investing all her money. Recall from the introduction that the winning horse $X$ is distributed according to the PMF $p$, where we assume $p(x) > 0$ for all $x \in \mathcal{X}$; that the odds offered by the bookmaker are denoted by $o(x)$; that the fraction of her wealth that the gambler bets on Horse $x$ is denoted $b(x)$; that the wealth relative is the random variable $S = b(X)\,o(X)$; and that we seek betting strategies that maximize the utility function $U_\beta(b)$. Because the gambler invests all her money, $b$ is a PMF. As in [47] (Section 10.3), define the constant
$$c \triangleq \sum_x \frac{1}{o(x)}$$
and the PMF
$$r(x) \triangleq \frac{1}{c\, o(x)}.$$
Using these definitions, the utility function can be decomposed as follows: Let $\beta < 1$, and let $b$ be a PMF. Then,
$$U_\beta(b) = \log \frac{1}{c} + D_{\frac{1}{1-\beta}}(p \| r) - D_{1-\beta}(b_\beta \| b),$$
where the PMF $b_\beta$ is given by
$$b_\beta(x) \triangleq \frac{p(x)^{\frac{1}{1-\beta}}\, o(x)^{\frac{\beta}{1-\beta}}}{\sum_{x'} p(x')^{\frac{1}{1-\beta}}\, o(x')^{\frac{\beta}{1-\beta}}}.$$
Thus, choosing $b = b_\beta$ uniquely maximizes $U_\beta(b)$ among all PMFs $b$. The three terms in (206) can be interpreted as follows: The first term, $\log\frac{1}{c}$, depends only on the odds and is related to the fairness of the odds. The odds are called subfair if $c > 1$, fair if $c = 1$, and superfair if $c < 1$. The second term, $D_{\frac{1}{1-\beta}}(p \| r)$, is related to the bookmaker’s estimate of the winning probabilities. It is zero if and only if the odds are inversely proportional to the winning probabilities. The third term, $D_{1-\beta}(b_\beta \| b)$, is related to the gambler’s estimate of the winning probabilities. It is zero if and only if $b$ is equal to $b_\beta$. For $\beta = 0$, (206) reduces to the following decomposition of the doubling rate:
$$U_0(b) = \log \frac{1}{c} + D(p \| r) - D(p \| b).$$
(This decomposition appeared previously in [47] (Section 10.3).) Equation (208) implies that the doubling rate is maximized by proportional gambling, i.e., that $U_0(b)$ is maximized if and only if $b$ is equal to $p$. Considering the limits $\beta \to -\infty$ and $\beta \to 1$, the PMF $b_\beta$ satisfies, for every $x$,
$$\lim_{\beta \to -\infty} b_\beta(x) = r(x) \qquad\text{and}\qquad \lim_{\beta \uparrow 1} b_\beta(x) = \frac{\mathbb{1}\{x \in \mathcal{X}^\star\}}{|\mathcal{X}^\star|},$$
where the set $\mathcal{X}^\star$ is defined as $\mathcal{X}^\star \triangleq \{x : p(x)\,o(x) = \max_{x'} p(x')\,o(x')\}$. It follows from Proposition 8 below that the RHS of (209) is the unique maximizer of $U_{-\infty}(b)$; and it follows from the proof of Proposition 9 below that the RHS of (210) is a maximizer (not necessarily unique) of $U_1(b)$. Recall that we assume $p(x) > 0$ for every $x$. Then, (209) follows from (207) and the definition of $c$ in (204).
To establish (210), define and observe that, for every , where (211) follows from (207) and some algebra; and (212) is justified as follows: if , then equals one; and if , then tends to zero as because and because . □ Using the definition in (24) for the Rényi divergence of negative orders, it is not difficult to see from the proof of Theorem 9 below that (206) also holds for . However, because the Rényi divergence of negative orders is nonpositive instead of nonnegative, the above interpretation is not valid anymore; in particular, for , choosing is in general not optimal. We first show the maximization claim. The only term on the RHS of (206) that depends on b is . Because , this term is maximized if and only if (Proposition 1 (a)). We now establish (206) for ; we omit the proof for , which can be found in [47] (Section 10.3). For , For every , which follows from (207). Now, (206) holds because where (215) follows from (213) and (214); (216) follows from identifying the Rényi divergence (recall that and b are PMFs); (217) follows from (205); and (218) follows from identifying the Rényi divergence (recall that r is a PMF). □ The rest of the section presents the cases , , and . Let b be a PMF. Then, Inequality (220) holds with equality if and only if for all . Observe that if for all , then with probability one, i.e., S does not depend on the winning horse. Equation (219) holds because where (222) holds because, in the limit as tends to , the power mean tends to the minimum (since p is a PMF with for all [15] (Chapter 8)). We show (220) by contradiction. Assume that there exists a PMF b that does not satisfy (220), thus for all . Then, where (224) holds because b is a PMF; (225) follows from (223); and (226) follows from the definition of c in (204). Because is impossible, such a b cannot exist, which establishes (220). It is not difficult to see that (220) holds with equality if for all . 
We therefore focus on establishing that if (220) holds with equality, then for all . Observe first that, if (220) holds with equality, then, for all , We now claim that (227) holds with equality for all . Indeed, if this were not the case, then there would exist an for which , thus (224)–(226) would hold, which would lead to a contradiction. Hence, if (220) holds with equality, then for all . □ Let , and let b be a PMF. Then, Equality in (228) can be achieved by choosing for some satisfying Proposition 9 implies that if , then it is optimal to bet on a single horse. Unless , this is not the case when : When , an optimal betting strategy requires placing a bet on every horse. This follows from Theorem 9 and our assumption that and are all positive. Inequality (228) holds because where (231) holds because and , and (233) holds because b is a PMF. It is not difficult to see that (228) holds with equality if for some satisfying (229). □ Let b be a PMF. Then, Equality in (236) can be achieved by choosing for some satisfying Equation (235) holds because where (239) holds because in the limit as tends to , the power mean tends to the maximum (since p is a PMF with for all [15] (Chapter 8)). Inequality (236) holds because for all . It is not difficult to see that (236) holds with equality if for some satisfying (237). □
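The decomposition of the utility function can be checked numerically. The sketch below assumes the constant $c = \sum_x 1/o(x)$, the PMF $r(x) = 1/(c\,o(x))$, and the tilted maximizer $b_\beta(x) \propto p(x)^{1/(1-\beta)} o(x)^{\beta/(1-\beta)}$, and verifies $U_\beta(b) = \log\frac{1}{c} + D_{1/(1-\beta)}(p\|r) - D_{1-\beta}(b_\beta\|b)$ for several strategies; the race parameters are hypothetical.

```python
import math

def renyi_div(P, Q, a):
    s = sum(p**a * q**(1 - a) for p, q in zip(P, Q) if p > 0)
    return math.log2(s) / (a - 1)

def utility(p, b, o, beta):
    # U_beta(b) = (1/beta) log2 E[(b(X) o(X))^beta]
    return math.log2(sum(pi * (bi * oi)**beta for pi, bi, oi in zip(p, b, o))) / beta

def tilted(p, o, beta):
    # b_beta(x) proportional to p(x)^(1/(1-beta)) * o(x)^(beta/(1-beta))
    w = [pi**(1 / (1 - beta)) * oi**(beta / (1 - beta)) for pi, oi in zip(p, o)]
    z = sum(w)
    return [wi / z for wi in w]

p = [0.5, 0.3, 0.2]       # hypothetical winning probabilities
o = [2.2, 3.5, 4.0]       # hypothetical odds
beta = -1.0
c = sum(1 / oi for oi in o)
r = [1 / (c * oi) for oi in o]
b_beta = tilted(p, o, beta)

# U_beta(b) = log2(1/c) + D_{1/(1-beta)}(p||r) - D_{1-beta}(b_beta||b):
for b in (b_beta, [1/3, 1/3, 1/3], p):
    lhs = utility(p, b, o, beta)
    rhs = (math.log2(1 / c) + renyi_div(p, r, 1 / (1 - beta))
           - renyi_div(b_beta, b, 1 - beta))
    assert abs(lhs - rhs) < 1e-9

# b_beta is at least as good as the alternatives tried above:
assert all(utility(p, b_beta, o, beta) >= utility(p, b, o, beta)
           for b in ([1/3, 1/3, 1/3], p))
```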

9. Horse Betting with Side Information

In this section, we study the horse betting problem where the gambler observes some side information Y before placing her bets. This setting leads to the conditional Rényi divergence discussed in Section 5 (see Theorem 10). In addition, it provides a new operational meaning to the dependence measure (see Theorem 11). We adapt our notation as follows: The joint PMF of X and Y is denoted . (Recall that X denotes the winning horse.) We drop the assumption that the winning probabilities are positive, but we assume that for all . We continue to assume that the gambler invests all her wealth, so a betting strategy is now a conditional PMF , and the wealth relative S is As in Section 8, define the constant and the PMF The following decomposition of the utility function parallels that of Theorem 9: Let . Then, where the conditional PMF and the PMF are given by Thus, choosing uniquely maximizes among all conditional PMFs . We first show that is uniquely maximized by . The only term on the RHS of (243) that depends on is . Because , this term is maximized if and only if (Proposition 1 (a)). By our assumptions that for all and for all , we have for all . Consequently, if and only if . Consider now (243) for . For , (243) reduces to and some algebra reveals that (246) holds. We conclude with establishing (243) for . For , For every and every , which follows from (244) and (245). Now, (243) holds because where (249) follows from (247) and (248) and the fact that ; (250) follows by identifying the Rényi divergence; (251) follows from (242); and (252) follows by identifying the conditional Rényi divergence using (78). □ It follows from Theorem 10 that, if the gambler gambles optimally, then, for , Operationally, it is clear that preprocessing the side information cannot increase the gambler’s utility, i.e., that, for every conditional PMF , where and are derived from the joint PMF given by This provides the intuition for Theorem 6, where (254) is shown directly. 
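To make the role of the side information concrete, here is a small numerical sketch. It again assumes the power-mean utility and the first-order-condition bet from Section 8 (our reconstruction, not the paper's exact formulas); with side information, the gambler simply plays the single-race optimum against the conditional PMF of the winner given Y = y. Whatever the exact conventions, the comparison below illustrates (254): preprocessing the side information away cannot increase the utility, i.e., side information cannot hurt.

```python
import math

def best_bet(beta, p, o):
    # Single-race optimum under the assumed power-mean utility (beta < 1):
    # b(x) proportional to (p(x) * o(x)**beta) ** (1 / (1 - beta)).
    w = [(pi * oi ** beta) ** (1.0 / (1.0 - beta)) for pi, oi in zip(p, o)]
    z = sum(w)
    return [wi / z for wi in w]

beta = 0.5
o = [2.0, 3.0, 6.0]
# Example joint PMF P(x, y): rows indexed by y, columns by the winner x.
p_xy = [[0.40, 0.05, 0.05],   # y = 0
        [0.10, 0.25, 0.15]]   # y = 1
p_y = [sum(row) for row in p_xy]
p_x = [sum(row[x] for row in p_xy) for x in range(3)]

# With side information: play the single-race optimum against P(.|y).
s = 0.0
for y, row in enumerate(p_xy):
    p_cond = [v / p_y[y] for v in row]
    b_y = best_bet(beta, p_cond, o)
    s += p_y[y] * sum(pc * (bc * oc) ** beta
                      for pc, bc, oc in zip(p_cond, b_y, o))
u_side = math.log2(s) / beta

# Without side information: optimum against the marginal P(x).
b_marg = best_bet(beta, p_x, o)
u_marg = math.log2(sum(pi * (bi * oi) ** beta
                       for pi, bi, oi in zip(p_x, b_marg, o))) / beta

assert u_side >= u_marg   # side information cannot hurt the gambler
```

For β > 0 the overall utility decomposes over the values of y, so optimizing the bet separately for each observed y is optimal for the assumed utility; since the conditional PMFs here differ from the marginal, the improvement is strict.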
The extreme case is when the preprocessing maps the side information to a constant and hence leads to the case where the side information is absent. In this case, is deterministic and equals . Theorems 9 and 10 then lead to the following relation between the conditional and unconditional Rényi divergence: where the marginal PMF is given by This motivates Corollary 3.

The last result of this section provides a new operational meaning to the Lapidoth–Pfister mutual information : assuming that and that the gambler knows the winning probabilities, measures how much the side information that is available to the gambler but not the bookmaker increases the gambler’s smallest guaranteed utility for a fixed level of fairness c. To see this, consider first the setting without side information. By Theorem 9, the gambler chooses to maximize her utility, where is defined in (207). Then, using the nonnegativity of the Rényi divergence (Proposition 1 (a)), the following lower bound on the gambler’s utility follows from (206): We call the RHS of (258) the smallest guaranteed utility for a fixed level of fairness c because (258) holds with equality if the bookmaker chooses the odds inversely proportional to the winning probabilities. Comparing (258) with (259) below, we see that the difference due to the side information is . Note that is typically not the difference between the utility with and without side information; this is because the odds for which (258) and (259) hold with equality are typically not the same. Let . If is equal to from Theorem 10, then Moreover, for every , there exist odds such that (259) holds with equality. For this choice of , (259) holds because where (260) follows from Theorem 10, and (262) follows from Proposition 5. Fix now , let achieve the minimum on the RHS of (261), and choose the odds Then, (261) holds with equality because by (241) and (242). □

10. Horse Betting with Part of the Money

In this section, we treat the possibility that the gambler does not invest all her wealth. We restrict ourselves to the setting without side information and to . (For the case , see [47] (Section 10.5).) We assume that and for all . Denote by the fraction of her wealth that the gambler does not use for betting. (We assume .) Then, is a PMF, and the wealth relative S is the random variable As in Section 8, define the constant We treat the cases and separately, starting with the latter. If , then it is optimal to invest all the money: Assume , let , and let b be a PMF on with utility . Then, there exists a PMF on with and utility . Choose the PMF as follows: Then, for every , where (268) holds because by assumption. For , holds because (268) implies . For and , follows similarly from (268). □

On the other hand, if and the odds are subfair, i.e., if , then Claim (c) of the following theorem shows that investing all the money is not optimal: Assume , let , and let be a PMF on that maximizes among all PMFs b. Defining , the following claims hold: Both the numerator and denominator on the RHS of (270) are positive, so Γ is well-defined and positive. For every , The quantity satisfies In particular, . Claim (b) implies that for every , if and only if . Ordering the elements of such that , the set thus has a special structure: it is either empty or equal to for some integer k. To maximize , the following procedure can be used: for every with the above structure, compute the corresponding b according to (270)–(273), and from these b’s, take one that maximizes . This procedure leads to an optimal solution: an optimal solution exists because we are optimizing a continuous function over a compact set, and corresponds to a set that will be considered by the procedure. The proof is based on the Karush–Kuhn–Tucker conditions.
By separately considering the cases and , we first show that, for , a strategy is optimal if and only if the following conditions are satisfied for some : and, for every , Consider first , and define the function , Since and since the logarithm is an increasing function, maximizing over b is equivalent to maximizing . Observe that is concave, thus, by the Karush–Kuhn–Tucker conditions [11] (Theorem 4.4.1), it is maximized by a PMF b if and only if there exists a such that (i) for all with , and (ii) for all with , Henceforth, we use the following notation: to designate that (i) and (ii) both hold, we write Dividing both sides of (279) by and defining , we obtain that (279) is equivalent to Now, (280) translates to (274) for and to (275) for . Consider now , and define as in (276). Then, because , maximizing is equivalent to minimizing . The function is convex, thus Inequality (278) is reversed. Dividing by again reverses the inequalities, thus (280), (274), and (275) continue to hold for . Having established that, for all , a strategy b is optimal if and only if (274) and (275) hold, we next continue with the proof. Let , and let be a PMF on that maximizes . By the above discussion, (274) and (275) are satisfied by for some . The LHS of (274) is positive, so . We now show that for all , To this end, fix . If , then (275) implies and the RHS of (282) is equal to the RHS of (281) because, being equal to , it is positive. If , then (275) implies so the RHS of (281) is zero and (281) hence holds. Having established (281), we next show that for some . For a contradiction, assume that for all . Then, where (284) follows from (275), and (285) holds because by assumption. However, this is impossible: (285) contradicts (274). Let now be such that . Then, by (281), Because and are positive, this implies . Thus, by (274), Splitting the sum on the LHS of (287) depending on whether or , we obtain where (289) follows from (275). Rearranging (290), we obtain Recall that and . 
In addition, because and hence . Thus, , so both the numerator and denominator in the definition of in (270) are positive, which establishes Claim (a), namely that is well-defined and positive. To establish Claim (b), note that (291) and (270) imply that is given by which, when substituted into (281), yields (272). We conclude by proving Claim (c). Because is a PMF on , where (294) follows from (272). Rearranging (294) yields (273). □
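As a purely numerical illustration of Claim (c), the following sketch considers the Kelly case (expected log-wealth, i.e., the β → 0 limit) with subfair odds; the example odds and the grid-search approach are ours and do not reproduce the closed form (270)–(273). The search confirms that the maximizing strategy withholds a positive fraction b₀ of the wealth.

```python
import math
from itertools import product

p = [0.5, 0.3, 0.2]        # winning probabilities
o = [3.0, 2.5, 1.5]        # subfair odds: 1/3 + 2/5 + 2/3 > 1, so c < 1

def log_wealth(b0, b):
    # Kelly-style utility E[log2(b0 + b(X) o(X))] when a fraction b0
    # of the wealth is withheld from betting.
    total = 0.0
    for pi, bi, oi in zip(p, b, o):
        w = b0 + bi * oi
        if w <= 0.0:
            return float("-inf")
        total += pi * math.log2(w)
    return total

# Exhaustive grid search over (b0, b1, b2, b3) on the simplex.
n = 50                      # grid resolution: step 1/n = 0.02
best_u, best_b0 = float("-inf"), None
for i0, i1, i2 in product(range(n + 1), repeat=3):
    i3 = n - i0 - i1 - i2
    if i3 < 0:
        continue
    u = log_wealth(i0 / n, [i1 / n, i2 / n, i3 / n])
    if u > best_u:
        best_u, best_b0 = u, i0 / n

assert best_b0 > 0.0        # withholding money is strictly better ...
assert best_u > 0.0         # ... it beats not betting at all (utility 0) ...
assert best_u > log_wealth(0.0, p)   # ... and beats full proportional betting
```

In this example only the first horse has an edge (p(1)o(1) > 1), and the grid optimum keeps roughly three quarters of the wealth uninvested and bets the rest on that horse, matching the threshold structure described above: the support of the optimal bet is a prefix of the horses ordered by p(x)o(x).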

11. Universal Betting for IID Races

In this section, we present a universal gambling strategy for IID races that requires knowledge neither of the winning probabilities nor of the parameter of the utility function and yet asymptotically maximizes the utility function for all PMFs p and all . Consider n consecutive horse races, where the winning horse in the ith race is denoted for . We assume that are IID according to the PMF p, where for all . In every race, the bookmaker offers the same odds , and the gambler spends all her wealth placing bets on the horses. The gambler plays race after race, i.e., before placing bets on a race, the winning horse of the previous race is revealed to her, and she receives the money from the bookmaker. Her betting strategy is hence a sequence of conditional PMFs . The wealth relative is the random variable We seek betting strategies that maximize the utility function We first establish that to maximize for a fixed , it suffices to use the same betting strategy in every race; see Theorem 13. We then show that the individual-sequence-universal strategy by Cover–Ordentlich [48] makes it possible to asymptotically achieve the same normalized utility without knowing p or (see Theorem 14). For a fixed , let the PMF be a betting strategy that maximizes the single-race utility discussed in Section 8, and denote by the utility associated with . Using the same betting strategy over n races leads to the utility , and it follows from (295) and (296) that As we show next, is the maximum utility that can be achieved among all betting strategies: Let , and let be a sequence of conditional PMFs. Then, We show (298) for ; analogous arguments establish (298) for and . We prove (298) by induction on n. For , (298) holds because is the maximum single-race utility. Assume now and that (298) is valid for . For , (298) holds because where (303) holds because maximizes the single-race utility , and (305) holds because (298) is valid for .
□ In portfolio theory, Cover–Ordentlich [48] (Definition 1) proposed a universal strategy. Adapted to our setting, it leads to the following sequence of conditional PMFs: where ; is the distribution on ; ; and This strategy depends neither on the winning probabilities p nor on the parameter . Denoting the utility (296) associated with the strategy by , we have the following result: For every , Hence, Inequality (310) follows from Theorem 13; and (311) follows from (309) and (310) and the sandwich theorem. It thus remains to establish (309): We do so for ; analogous arguments establish (309) for and . For a fixed sequence , let be a PMF on that maximizes , and denote the wealth relative in (295) associated with using in every race by , thus Let denote the wealth relative in (295) associated with the strategy and the sequence . Using [48] (Theorem 2), it follows that, for every , This implies that (309) holds for because where (315) follows from (313), and (316) follows from (312). □

As discussed above, the wealth relative of the Cover–Ordentlich strategy is not much worse than that of using the same strategy in every race, irrespective of (see (313)). Hence, whatever the optimal single-race betting strategy may be, the Cover–Ordentlich strategy asymptotically achieves the same normalized utility.
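For horse races, the Cover–Ordentlich mixture over constant strategies with a Dirichlet(1/2) prior has a simple sequential form: because the wealth of a constant strategy b factors as (∏ₜ o(Xₜ)) ∏ₓ b(x)^{nₓ}, the mixture bet is the Krichevsky–Trofimov add-1/2 predictive rule. This reduction, and the regret bound checked below, is our gloss on the construction and on (313), not a quotation of the paper. The sketch compares the universal log-wealth with that of the best constant strategy in hindsight and checks that the per-race gap is on the order of ((m−1)/2)·(log n)/n.

```python
import math
import random

random.seed(1)
m = 3                              # number of horses
p = [0.5, 0.3, 0.2]                # winning probabilities (unknown to gambler)
o = [2.0, 3.0, 6.0]                # odds offered in every race
n = 2000                           # number of IID races

winners = random.choices(range(m), weights=p, k=n)

# Universal strategy: Dirichlet(1/2) mixture over constant strategies,
# which for horse races is the Krichevsky-Trofimov add-1/2 rule.
counts = [0] * m
log_w_univ = 0.0
for t, x in enumerate(winners):
    b = [(counts[j] + 0.5) / (t + 0.5 * m) for j in range(m)]
    log_w_univ += math.log2(b[x] * o[x])
    counts[x] += 1

# The best constant strategy in hindsight bets the empirical frequencies.
log_w_best = sum(counts[j] * math.log2((counts[j] / n) * o[j])
                 for j in range(m) if counts[j] > 0)

gap_per_race = (log_w_best - log_w_univ) / n
assert -1e-9 <= gap_per_race < 0.05   # regret ~ (m - 1) log2(n) / (2 n)
```

The odds cancel in the gap, so the comparison is really between the Krichevsky–Trofimov mixture probability and the maximum-likelihood probability of the observed sequence; since the gap per race vanishes as n grows, the universal gambler asymptotically matches the normalized utility of the best constant strategy, for every realized sequence of winners.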