
The Listsize Capacity of the Gaussian Channel with Decoder Assistance.

Amos Lapidoth, Yiming Yan

Abstract

The listsize capacity is computed for the Gaussian channel with a helper that, cognizant of the channel-noise sequence but not of the transmitted message, provides the decoder with a rate-limited description of said sequence. This capacity is shown to equal the sum of the cutoff rate of the Gaussian channel without help and the rate of help. In particular, zero-rate help raises the listsize capacity from zero to the cutoff rate. This is achieved by having the helper provide the decoder with a sufficiently fine quantization of the normalized squared Euclidean norm of the noise sequence.


Keywords:  Gaussian channel; bit pipe; cutoff rate; decoder assistance; helper; listsize capacity

Year:  2021        PMID: 35052055      PMCID: PMC8774540          DOI: 10.3390/e24010029

Source DB:  PubMed          Journal:  Entropy (Basel)        ISSN: 1099-4300            Impact factor:   2.524


1. Introduction

The order-ρ listsize capacity of a channel is the supremum of the coding rates for which there exist codes guaranteeing the large-blocklength convergence to one of the ρ-th moment of the cardinality of the list of messages that, given the received output sequence, have positive a posteriori probability. It is zero for the Gaussian channel because, on this channel, no codeword is ruled out by any received sequence, so said list contains all the messages. Here we derive this capacity for the Gaussian channel with a helper that observes the noise sequence and describes it to the decoder using a rate-limited noise-free bit pipe; see Figure 1.
Figure 1

Gaussian channel with decoder assistance.

We show that the listsize capacity is then the sum of the bit pipe's rate and the order-ρ cutoff rate of the Gaussian channel without a helper. The latter's definition is similar to that of the listsize capacity, but with the list now comprising only those messages that are a posteriori at least as likely as the transmitted one. As we shall see, for the Gaussian channel with average power P, noise variance σ², and corresponding signal-to-noise ratio (SNR) P/σ², the cutoff rate is given by (2), where the function appearing there (in nats) plays a prominent role in the analysis of the reliability function of said channel (Section 7.4 in [1]), [2]. That analysis does not, however, carry over directly to our setting because it deals with error exponents and not lists.

It is interesting to note that (1) also holds when the help rate is zero: the number of help bits required to increase the listsize capacity from zero to the cutoff rate is sublinear in the blocklength. In fact, as we shall see, all it takes is a sufficiently fine quantization of the normalized squared Euclidean norm of the noise sequence.

The relation (1) is reminiscent of the analogous result on the erasures-only capacity of the Gaussian channel with a rate-limited helper (Remark 10 in [3]), namely, that the latter equals the sum of the help rate and the Shannon capacity C of the Gaussian channel (without help) (Theorem 9.1.1 in [4]). Here the erasures-only capacity is defined like the listsize capacity, but with the requirement on the ρ-th moment of the list replaced by the requirement that the list be of size 1 with probability tending to one. (The Gaussian erasures-only capacity with a helper is given by the RHS of (4) irrespective of whether the assistance is provided to the encoder or the decoder.) The latter result is in turn reminiscent of the analogous result on the Shannon capacity with a helper [5,6,7,8]. In proving (1), we shall focus on the "direct part," i.e., that the right-hand side (RHS) of (1) is achievable.
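In display form, the relations discussed above, reconstructed here from the abstract and the surrounding prose (the labels C_listsize, R_cutoff, C_e-o, and R_h for the listsize capacity, the order-ρ cutoff rate, the erasures-only capacity, and the help rate are ours), read:

```latex
% (1): listsize capacity with rate-$R_{\mathrm h}$ decoder assistance
C_{\mathrm{listsize}}(\rho, R_{\mathrm h})
  \;=\; R_{\mathrm{cutoff}}(\rho) + R_{\mathrm h},
  \qquad \rho > 0.
% (4): erasures-only analog (Remark 10 in [3])
C_{\text{e-o}}(R_{\mathrm h}) \;=\; C + R_{\mathrm h}.
```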
The "converse," that no rate exceeding the RHS of (1) is achievable, is omitted because it follows directly from (Remark 4 in [3]): there it is shown that this is true even if, given the received sequence and the provided help, the list contains only a subset of the messages that are of positive a posteriori probability, namely, those that are a posteriori at least as likely as the transmitted message.

The listsize capacity is relevant, for example, when the message set corresponds to tasks [9] and the transmitted message corresponds to one that must be performed by the decoder with absolute certainty. To ensure this, the decoder must perform all the tasks in the list of tasks that are not ruled out by the received sequence. (In addition to the transmitted task, other tasks need not, but may, be performed.) The ρ-th moment of the list's size then measures the receiver's average effort.

Results on the listsize capacity and the erasures-only capacity of general discrete memoryless channels (DMCs) in the absence of help are scarce. Noteworthy exceptions are the results of Pinsker and Sheverdjaev [10], Csiszár and Narayan [11], and Telatar [12], which provide sufficient conditions for the erasures-only capacity to equal the Shannon capacity and for the listsize capacity to equal the cutoff rate. Asymptotic results on the erasures-only capacity in the low-noise regime can be found in [13,14]. Once noiseless feedback is introduced, the problems become more tractable [15,16,17].

The rest of the paper is organized as follows. Section 2 describes our set-up and presents the main result. Section 3 contains some classical and some new observations regarding Gallager's function and its modification. Section 4 derives the cutoff rate of the Gaussian channel without help and proves (2). Section 5 describes and analyzes a coding scheme that proves the direct part of (1).

2. The Main Result

A power-P blocklength-n encoder for a message set 𝓜 is a mapping that maps each message m to an n-tuple x(m) whose squared Euclidean norm satisfies the power constraint ‖x(m)‖² ≤ nP. We sometimes use x(m) to denote the encoder's output, and x_k(m) to denote its k-th component. The encoder is said to be of rate R (in nats) if the cardinality of 𝓜 is ⌈e^{nR}⌉, in which case we often assume that 𝓜 = {1, …, ⌈e^{nR}⌉}. (We ignore the fact that e^{nR} need not be an integer; this issue washes out in the large-n asymptotics we study.)

When a message m is sent over the discrete-time additive Gaussian noise channel using the encoder, the channel produces the random vector Y whose k-th component is Y_k = x_k(m) + Z_k, where Z_1, …, Z_n are independent and identically distributed (IID) zero-mean Gaussians of variance σ². We assume that σ² is positive and use w(·|x) to denote the density of the channel's output when its input is x, i.e., the mean-x variance-σ² Gaussian density, which we extend to n-tuples in a memoryless fashion.

Given an output sequence y and a message m, we define the "at-least-as-likely list" 𝓛(y, m). Assuming, as we do, that the messages are a priori equally likely, this list comprises the messages that, given the output sequence y, are a posteriori at least as likely as m. If a message M, drawn equiprobably from 𝓜, is transmitted over the channel with a resulting received sequence Y, then the cardinality of the at-least-as-likely list 𝓛(Y, M) is a random positive integer, and we can consider its ρ-th moment. For a given ρ > 0, we define the order-ρ cutoff rate as the supremum of the rates R for which there exists a sequence of rate-R power-P blocklength-n encoders under which this ρ-th moment tends to one.

Theorem 1. The order-ρ cutoff rate of the Gaussian channel is given by (3).

Proof. See Section 4. □

A finite-alphabet-valued description of the noise sequence is a mapping from ℝⁿ to the description alphabet, with the understanding that the image of Z under it, which we denote T, is the description of Z.
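To make the definition concrete, here is a small Monte-Carlo sketch (the function name and parameters are ours, and the codebook is drawn IID Gaussian rather than from the shell-conditioned distribution used later in Section 4.3) that estimates the ρ-th moment of the at-least-as-likely list for a random codebook:

```python
import numpy as np

rng = np.random.default_rng(0)

def list_moment(n=50, R=0.1, P=1.0, sigma2=1.0, rho=1.0, trials=2000):
    """Monte-Carlo estimate of E[|L(Y, M)|^rho] for a random IID
    Gaussian codebook of rate R nats per channel use."""
    M = int(np.ceil(np.exp(n * R)))            # |message set| = ceil(e^{nR})
    X = rng.normal(0.0, np.sqrt(P), (M, n))    # codebook (power met on average)
    acc = 0.0
    for _ in range(trials):
        m = int(rng.integers(M))
        y = X[m] + rng.normal(0.0, np.sqrt(sigma2), n)
        d2 = ((y - X) ** 2).sum(axis=1)        # likelihood decreases in ||y - x||^2
        acc += np.count_nonzero(d2 <= d2[m]) ** rho
    return acc / trials
```

At rates well below the cutoff rate the estimate approaches 1 as n grows; at higher rates it blows up.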
We say that a sequence of descriptions is of rate R_h (nats) if the normalized logarithm of the description alphabet's cardinality is asymptotically at most R_h. Suppose now that, in addition to the received sequence Y, the receiver is also presented with the description T of the noise, and that, based on the two, it forms the "remotely-plausible list" comprising the messages that have positive a posteriori probability given the two. Given ρ > 0, the listsize capacity with rate-R_h decoder assistance is the supremum of the rates R for which there exists a sequence of rate-R power-P blocklength-n encoders and a sequence of descriptions of rate R_h such that the ρ-th moment of the remotely-plausible list's cardinality tends to one (19).

Theorem 2. On the Gaussian channel, the listsize capacity with rate-R_h decoder assistance is the sum of R_h and the order-ρ cutoff rate (20); cf. (2) and (3).

Proof. The "converse," that (19) cannot be achieved when the rate exceeds the RHS of (20), follows from (Remark 4 in [3]). The "direct part," describing a coding scheme that achieves (19) with rates approaching the RHS of (20), is proved in Section 5. □

3. Preliminaries

Given ρ > 0 and any probability measure Q on ℝ, Gallager's function E₀(ρ, Q) for our channel is defined as in [1], with the reference measure now the Lebesgue measure on ℝ. The result of maximizing E₀(ρ, Q) over all Q under which the power constraint holds is denoted E₀(ρ). The multi-letter extension E₀⁽ⁿ⁾(ρ, Q⁽ⁿ⁾) is defined analogously, where Q⁽ⁿ⁾ is a probability measure on ℝⁿ; the integrals are over ℝⁿ; and the channel is defined in (11). Similarly, E₀⁽ⁿ⁾(ρ) denotes the result of maximizing over all Q⁽ⁿ⁾ satisfying the power constraint.

Given probability measures on ℝ^{n₁} and on ℝ^{n₂} that satisfy the power constraints, the product measure on ℝ^{n₁+n₂} satisfies the power constraint, and the corresponding maximal multi-letter functions add up. The sequence {E₀⁽ⁿ⁾(ρ)} is thus superadditive, and Fekete's subadditive lemma implies that E₀⁽ⁿ⁾(ρ)/n converges to its supremum. We shall later see (cf. (55) ahead) that this limit is determined by the cutoff rate defined in (3).

We shall also need Gallager's modified function, which augments the unmodified one with an exponential cost-tilting parameter r. Given some ρ > 0, some probability distribution Q on ℝ of finite second moment, and some r ≥ 0, the modified Gallager function is defined accordingly. We shall also be interested in the maximum of the modified function over both Q and r. We distinguish between two cases depending on whether the power constraint holds strictly or not: in the former case we only allow r to be zero, whereas in the latter case it can be any non-negative number.

The next proposition provides a lower bound on E₀⁽ⁿ⁾(ρ).

Proposition 1. Any probability distribution Q satisfying the power constraint provides the lower bound (33).

Proof. Let Q be any input distribution of finite second moment satisfying the power constraint. For each n, consider the conditional distribution of the n-fold product of Q given the event that the input's power nearly exhausts the constraint, to within some positive constant; cf. (35) and (36). Thus, for every Borel measurable subset of ℝⁿ, this conditional distribution assigns the probability of that subset given said event. For any input n-tuple, we can upper-bound the Radon–Nikodym derivative of the conditional distribution with respect to the product distribution by the reciprocal of the probability of the conditioning event times the indicator of that event, where the indicator of a statement equals 1 if the statement is true and 0 otherwise.
Using this bound on the Radon–Nikodym derivative, we obtain the chain of inequalities culminating in (43). By the Central Limit Theorem, the probability of the conditioning event tends to 1/2 as n tends to infinity, so (43) implies the asymptotic lower bound. Taking the supremum of the RHS over all admissible choices establishes the claim and hence, by (24), proves (33). □

We next turn to upper-bounding E₀⁽ⁿ⁾(ρ).

Proposition 2. If the probability distribution satisfies the power constraint, then (46) holds and, consequently, so does (47).

Proof. The proof is based on Proposition 2 in [18], which implies that, for every density on ℝ and any probability measure on ℝ, a corresponding variational inequality holds. Applying this inequality to the product density formed from a density on ℝ, and using the product form of the channel (11), we obtain a bound for any density on ℝⁿ, a bound that features the i-th marginals and the probability measure on ℝ defined as their average. Observe that if the power constraint holds under each marginal, then it also holds under their average. This observation and (51) establish (46). Since (46) holds for all n, (47) must also hold. □
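For reference, the unmodified single-letter function used throughout this section can be written in the standard form (a reconstruction consistent with [1]; w is the Gaussian transition density of Section 2 and P the allowed power):

```latex
E_0(\rho, Q) \;=\; -\ln \int_{\mathbb R}
  \left( \int_{\mathbb R} w(y \mid x)^{\frac{1}{1+\rho}} \,\mathrm dQ(x) \right)^{\!1+\rho}
  \mathrm dy ,
\qquad
E_0(\rho) \;=\; \sup_{Q:\ \mathsf E_Q[X^2] \le P} E_0(\rho, Q).
```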

4. The Cutoff Rate of the Gaussian Channel

In this section, we prove Theorem 1. Since scaling the output does not change the cutoff rate, we will assume WLOG that the noise variance is 1 and that the transmit power P equals the SNR; see (12). Thus σ² = 1, and each codeword satisfies the power constraint ‖x(m)‖² ≤ nP.

4.1. Computing the Limit

Here we shall establish that, on the Gaussian channel, (53) holds, where the function on the RHS is defined in (3) and Q* denotes the zero-mean variance-P Gaussian distribution. To this end, we shall derive matching upper and lower bounds on the limit. We begin with the former.

4.1.1. Upper-Bounding the Limit

We show that the claimed upper bound holds on the channel (10). The proof is based on Proposition 2, with the density corresponding to a centered Gaussian of a suitably chosen variance. Evaluating the RHS of (47) for this density, we obtain a bound in which, in (61), we defined the relevant auxiliary quantities. To conclude the proof, it remains to show that the RHS of (64) coincides with the desired expression. To this end, observe that some basic algebra reveals two useful identities. Therefore, the first term in (64) can be rewritten accordingly, and so can the remaining terms. The sum of the resulting terms equals the desired expression.

4.1.2. Lower-Bounding the Limit

To lower-bound the limit, we shall use Proposition 1 with Q chosen as a centered variance-P Gaussian distribution. For this probability distribution, Gallager calculated the function (Section 7.4 in [1]). He showed that, for any ρ > 0, the result coincides with the expression defined in (3). Using this result and Proposition 1, we obtain the matching lower bound.
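As a concrete illustration of the objects in play, the unmodified function E₀(ρ, Q) for Q = N(0, P) on the unit-noise-variance channel admits a classical closed form (without the tilting parameter r and the shell conditioning, so it lower-bounds the shell-optimized quantity the text works with). The following sketch, with function names of our choosing, verifies it numerically:

```python
import numpy as np

def E0_gaussian(rho, P, grid=1501, span=30.0):
    """Numerically evaluate Gallager's (unmodified) E0(rho, Q) in nats for
    the unit-variance Gaussian channel with Q = N(0, P):
    E0 = -ln  Int_y ( Int_x q(x) w(y|x)^{1/(1+rho)} dx )^{1+rho} dy."""
    s = 1.0 / (1.0 + rho)
    x = np.linspace(-span, span, grid)
    y = np.linspace(-span, span, grid)
    dx = x[1] - x[0]
    q = np.exp(-x**2 / (2 * P)) / np.sqrt(2 * np.pi * P)   # input density
    w_s = (2 * np.pi) ** (-s / 2) * np.exp(-s * (y[:, None] - x[None, :]) ** 2 / 2)
    inner = (w_s * q[None, :]).sum(axis=1) * dx            # inner integral, per y
    return -np.log((inner ** (1.0 + rho)).sum() * dx)

def E0_closed(rho, P):
    # Classical closed form for IID Gaussian inputs; it lower-bounds the
    # shell-optimized quantity used in the text.
    return (rho / 2) * np.log(1 + P / (1 + rho))
```

For instance, E0_closed(1, P) recovers the familiar Gaussian cutoff-rate lower bound (1/2) ln(1 + P/2).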

4.2. The Mapping ρ ↦ R_cutoff(ρ) Is Monotonically Decreasing

For the purpose of proving the achievability of R_cutoff(ρ), we will need the fact that it is monotonically decreasing in ρ. In view of (55), it suffices to show that, for every n, the mapping ρ ↦ E₀⁽ⁿ⁾(ρ)/ρ is monotonically decreasing. In view of (24), the latter will follow once we establish this monotonicity with the input distribution held fixed. Since E₀⁽ⁿ⁾ evaluates to zero at ρ = 0, this monotonicity can be established by showing that the mapping ρ ↦ E₀⁽ⁿ⁾(ρ, Q⁽ⁿ⁾) is concave. This is established in (Appendix 5.B in [1]). (That appendix deals with finite alphabets, but the proof goes through also in our case.)
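The concavity argument can be spelled out in one line: for a concave f with f(0) = 0 and 0 < ρ₁ < ρ₂,

```latex
f(\rho_1) \;=\; f\Big( \tfrac{\rho_1}{\rho_2}\,\rho_2
          + \big(1 - \tfrac{\rho_1}{\rho_2}\big)\cdot 0 \Big)
\;\ge\; \tfrac{\rho_1}{\rho_2}\, f(\rho_2)
\quad \Longrightarrow \quad
\frac{f(\rho_1)}{\rho_1} \;\ge\; \frac{f(\rho_2)}{\rho_2},
```

so concavity of Gallager's function in ρ, together with its vanishing at ρ = 0, yields the claimed monotonicity.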

4.3. Achievability of the Cutoff Rate

The achievability of R_cutoff(ρ) will be proved using a random-coding argument. Let Q be the zero-mean variance-P Gaussian distribution, let δ be a positive constant, and let the codeword distribution on ℝⁿ be the one defined in (35) and (36). Draw the codewords of a blocklength-n random codebook independently, each according to this distribution, so the power constraint holds with probability 1 for every m. By symmetry, the ρ-th moment of the list's cardinality (where the expectation is over the random choice of codebook and the channel behavior) does not depend on m. Consequently, if we establish that it tends to 1, it will follow by the random-coding argument that there exists a codebook for which the LHS of (77), with the expectation now over the channel behavior only, tends to 1.

Defining the pairwise indicator variables, we can express the RHS of (77) as a sum, and we seek to show the convergence (80). To this end, we shall need the following lemma, which provides two equivalent conditions, (i) and (ii), for this convergence. The implication (ii) ⟹ (i) follows by bounding the quantity in question for any realization, so that the convergence carries over; as for the implication (i) ⟹ (ii), note that an analogous bound holds in the reverse direction. The implication is then established by noting that (i) implies the required convergence because, by Markov's inequality (and the strict positivity of the quantities involved), the probability of a large deviation is suitably controlled.

In light of the above lemma, to establish (80) it suffices to show (88), where the outer expectation is over the transmitted codeword and the output. A related expectation, but one where it is the conditional expectation that is raised to the ρ-th power, is studied in Lemma 2, whose proof appears in Appendix A: if the rate is below the cutoff rate, that expectation tends to zero.

To establish (88) using Lemma 2, we distinguish between two cases depending on whether ρ ≤ 1 or ρ > 1. In the former case the map x ↦ x^ρ is concave, so Jensen's inequality lets us bring the exponent inside the conditional expectation, which, together with Lemma 2, implies (88) whenever R < R_cutoff(ρ). Suppose now that ρ > 1. Conditional on the transmitted codeword and the output, the relevant random variables are IID Bernoulli, with a bias determined by the pair. We can thus use Rosenthal's technique (Lemma 5.10 in [19]), [20] to obtain a moment bound. Taking the expectation over the transmitted codeword and the output yields a bound whose first term can be treated using Lemma 2; the second, but for the constant, is the one encountered when ρ is 1. Since R_cutoff(ρ) ≤ R_cutoff(1) by Section 4.2 (because ρ > 1 for the case at hand), it too tends to zero when R < R_cutoff(ρ).

4.4. No Rate Exceeding the Cutoff Rate Is Achievable

To show the converse, we need Arıkan's lower bound on guessing [21]. Fix any sequence of rate-R blocklength-n codebooks satisfying the cost constraint. For any n, let P_n be the probability distribution on ℝⁿ induced by the codebook. Since the codebook satisfies the cost constraint, so does P_n. Given the output sequence, list the messages in decreasing order of likelihood (resolving ties arbitrarily, e.g., ranking low numerical values of m higher), and let G(m) denote the ranking of the message m in this list. Note that the ranking never exceeds the cardinality of the at-least-as-likely list, where the inequality can be strict because there may be messages that are in the list because they have the same likelihood as m and that are yet ranked lower than m because of the way ties are resolved. It follows from this inequality that the ρ-th moment of the list's cardinality cannot tend to one unless the ρ-th moment of the ranking does. By Arıkan's guessing inequality [21], the ρ-th moment of the ranking can tend to one only if the rate is asymptotically bounded by the normalized multi-letter Gallager function divided by ρ. From this, the converse now follows using (24) and (55).
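A hedged reconstruction of how the guessing bound of [21] enters: with G(M | Y) the likelihood ranking of the transmitted message, ⌈e^{nR}⌉ equiprobable messages, and P_n the distribution on ℝⁿ induced by the codebook (a uniform mixture of its codewords), Arıkan's inequality specializes to

```latex
\mathsf E\big[ G(M \mid \mathbf Y)^{\rho} \big]
\;\ge\; (1 + nR)^{-\rho}
\exp\!\big( \rho n R - E_0^{(n)}(\rho, P_n) \big),
```

so the ρ-th moment of the ranking can tend to one only if ρnR − E₀⁽ⁿ⁾(ρ, Pₙ) grows sublinearly, i.e., only if R is asymptotically at most E₀⁽ⁿ⁾(ρ)/(nρ); (24) and (55) then complete the converse.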

5. The Direct Part of Theorem 2

In this section we prove the direct part of Theorem 2: when the decoder can be provided with a rate-R_h description of the noise, the convergence (19) can be achieved at all transmission rates below R_cutoff(ρ) + R_h. As noted earlier, the converse follows directly from (Remark 4 in [3]). Our proof treats the cases R_h = 0 and R_h > 0 separately. As in Section 4, we assume that the channel is normalized to have noise variance 1 and transmit power P.

5.1. Case 1: Zero-Rate Help

The analogous result for the modulo-additive channel was proved in [3] by having the helper provide the decoder with a lossless description of the type of the noise sequence. Since this type fully specifies the a posteriori probability of the transmitted message, the decoder's remotely-plausible-with-this-help list contains only messages whose a posteriori probability is equal to that of the correct message. It is therefore a subset of the at-least-as-likely list (without help) and hence of smaller-or-equal ρ-th moment. Consequently, any rate that allows the latter to tend to one also allows the former to tend to one.

On the Gaussian channel the likelihood is specified by the normalized squared Euclidean norm of the noise sequence, ‖Z‖²/n. The latter, however, cannot be described at zero rate with infinite precision. This motivates us to quantize it and have the quantized version be the zero-rate help. The result will then follow by considering the high-resolution limit of the achievable rates. For this purpose, a uniform quantizer will do. Given some large M (which determines the overload region) and some large K (corresponding to the number of quantization cells), we partition the interval [0, M) into K subintervals, each of length M/K. The helper, upon observing the noise sequence Z, produces the index of the subinterval containing ‖Z‖²/n, or an overload symbol if ‖Z‖²/n ≥ M. The constant M, which does not depend on the blocklength n, is chosen large enough to guarantee that the large-deviation probability of overload decays sufficiently fast in n that the contribution of the overload to the ρ-th moment of the list be negligible, even if an overload results in the list containing all codewords: (104). (Upper bounds on the tail of the distribution of ‖Z‖²/n show, for example, that a sufficiently large choice of M will do.) Since the help takes values in the finite set {0, 1, …, K}, where K does not depend on the blocklength, it is of zero rate.
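The zero-rate help just described can be sketched as follows (the function name and the illustrative values of M and K are ours; the overload symbol is K, as in the analysis below):

```python
import numpy as np

rng = np.random.default_rng(1)

def helper(z, M=4.0, K=64):
    """Zero-rate help: index of the length-(M/K) cell of [0, M) containing
    ||z||^2 / n, with the extra symbol K flagging overload."""
    v = float(np.dot(z, z)) / len(z)    # normalized squared norm of the noise
    if v >= M:
        return K                        # overload symbol
    return int(v // (M / K))            # cell index in {0, ..., K-1}

n = 10_000
t = helper(rng.normal(0.0, 1.0, n))     # for unit-variance noise, ||Z||^2/n ~ 1
```

Since K is a constant, the description costs (log(K + 1))/n nats per channel use, which vanishes as n grows.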
As in Section 4.3, we consider a random codebook whose codewords are drawn independently from the conditional Gaussian distribution, i.e., from the distribution defined in (35) and (36) with Q being the centered variance-P Gaussian distribution. Using the same symmetry arguments, we also assume that the transmitted message is fixed and study the ρ-th moment of the list under this assumption. Defining the relevant indicator variables, we can express the ρ-th moment of the remotely-plausible list as a sum and, in view of Lemma 1, we need to prove the convergence (107), where the expectation is over both the random choice of the codebook and the channel behavior.

To analyze the LHS of (107), we define, for every output sequence and every message, the binary random variable indicating whether that message is in the remotely-plausible list. Our analysis depends on whether the helper's description differs from K (no overload) or equals K (corresponding to quantizer overload). In the former case, the random variable can be upper-bounded as in (109), because (110) holds: for the case at hand, the equality of the helper's descriptions implies that the two normalized squared norms lie in the same interval of length M/K. In the latter case, which is exponentially rare when M exceeds the noise variance, we simply upper-bound the random variable by 1.

The ρ-th moment of the list can now be expressed using the law of total expectation as in (116). The second term on the RHS of (116) tends to zero by (104). The first term is studied in Lemma 3, whose proof appears in Appendix B. For a given ρ, achievability is thus established using this lemma and (116) by picking M sufficiently large for (104) to hold, and then picking K large enough for the condition of Lemma 3 to hold, so that the first term on the RHS of (116) will also tend to zero.

5.2. Case 2: Positive-Rate Help

The key to proving the achievability of R_cutoff(ρ) + R_h is in showing that rate-R_h help can be utilized to increase the data rate by R_h, and that this can be done losslessly, with arbitrarily small (positive) power, and in one channel use. To show how this can be done, we show that, by using the channel once to send a single input bounded in magnitude by ε (with ε any prespecified positive number) and using help taking values in the set {0, 1, …, ν − 1}, we can send error-free a message taking values in said set.

To transmit m ∈ {0, 1, …, ν − 1}, the encoder sends x = εm/ν, which is upper-bounded by ε. Upon observing the noise Z, the helper produces the description T by quantizing the normalized noise and taking the result modulo ν, i.e., T = ⌊νZ/ε⌋ mod ν, which is an element of {0, 1, …, ν − 1}. Based on Y and T, the decoder can calculate (⌊νY/ε⌋ − T) mod ν, which equals m, because ⌊νY/ε⌋ = m + ⌊νZ/ε⌋, where (123) holds because m and T are both integers.

Using this building block, we can now prove the achievability of R_cutoff(ρ) + R_h by employing two-phase time sharing. Specifically, we propose the following blocklength-(n + 1) scheme. In the first n channel uses, the helper operates at rate zero as in Section 5.1. By the achievability result proved in Section 5.1, for any R < R_cutoff(ρ), there exists a sequence of blocklength-n rate-R codebooks, with every codeword satisfying the power constraint, and zero-rate helpers, such that the remotely-plausible list satisfies (125). In the (n + 1)-th channel use we use the aforementioned coding scheme with ν chosen to grow like e^{(n+1)R_h}. Since that scheme is error-free, the overall remotely-plausible list for the two phases has the same cardinality as that of the first phase, and hence its ρ-th moment tends to 1 by (125). The achievability now follows by verifying that the power of the transmitted input sequence satisfies the constraint, and that the rate of the helper and the rate achieved by the scheme tend to R_h and R + R_h, respectively, as n tends to infinity.
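The single-use building block can be sketched numerically (function names and the values of ν and ε are ours; we quantize with a floor, under which the decoding identity holds exactly):

```python
import numpy as np

rng = np.random.default_rng(2)

def encode(m, nu, eps):
    """Single channel input for message m in {0, ..., nu-1}; |x| <= eps."""
    return eps * m / nu

def help_bit(z, nu, eps):
    """Helper: quantize the normalized noise and reduce it modulo nu."""
    return int(np.floor(nu * z / eps)) % nu

def decode(y, t, nu, eps):
    """Recover m from the output y and the help t, error-free."""
    # floor(nu*y/eps) = m + floor(nu*z/eps) because m is an integer
    return (int(np.floor(nu * y / eps)) - t) % nu

nu, eps = 8, 0.1
for _ in range(200):
    m = int(rng.integers(nu))
    z = float(rng.normal())
    assert decode(encode(m, nu, eps) + z, help_bit(z, nu, eps), nu, eps) == m
```

Note that decoding succeeds regardless of the noise magnitude: the help supplies exactly the log ν nats the quantized noise can corrupt.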