Literature DB >> 26379538

Time resolution dependence of information measures for spiking neurons: scaling and universality.

Sarah E Marzen1, Michael R DeWeese2, James P Crutchfield3.   

Abstract

The mutual information between stimulus and spike-train response is commonly used to monitor neural coding efficiency, but neuronal computation broadly conceived requires more refined and targeted information measures of input-output joint processes. A first step toward that larger goal is to develop information measures for individual output processes, including information generation (entropy rate), stored information (statistical complexity), predictable information (excess entropy), and active information accumulation (bound information rate). We calculate these for spike trains generated by a variety of noise-driven integrate-and-fire neurons as a function of time resolution and for alternating renewal processes. We show that their time-resolution dependence reveals coarse-grained structural properties of interspike interval statistics; e.g., τ-entropy rates that diverge less quickly than the firing rate indicate interspike interval correlations. We also find evidence that the excess entropy and regularized statistical complexity of different types of integrate-and-fire neurons are universal in the continuous-time limit in the sense that they do not depend on mechanism details. This suggests a surprising simplicity in the spike trains generated by these model neurons. Interestingly, neurons with gamma-distributed ISIs and neurons whose spike trains are alternating renewal processes do not fall into the same universality class. These results lead to two conclusions. First, the dependence of information measures on time resolution reveals mechanistic details about spike train generation. Second, information measures can be used as model selection tools for analyzing spike train processes.

Entities:  

Keywords:  05.45.Tp 02.50.Ey 87.10.Vg 87.19.ll 87.19.lo 87.19.ls; alternating renewal process; entropy rate; excess entropy; integrate and fire neuron; leaky integrate and fire neuron; quadratic integrate and fire neuron; renewal process; statistical complexity

Year:  2015        PMID: 26379538      PMCID: PMC4551861          DOI: 10.3389/fncom.2015.00105

Source DB:  PubMed          Journal:  Front Comput Neurosci        ISSN: 1662-5188            Impact factor:   2.380


1. Introduction

Despite a half century of concerted effort (Mackay and McCulloch, 1952), neuroscientists continue to debate the relevant timescales of neuronal communication as well as the basic coding schemes at work in the cortex, even in early sensory processing regions of the brain thought to be dominated by feedforward pathways (Softky and Koch, 1993; Bell et al., 1995; Shadlen and Newsome, 1995; Stevens and Zador, 1998; Destexhe et al., 2003; DeWeese and Zador, 2006; Jacobs et al., 2009; Koepsell et al., 2010; London et al., 2010). For example, the apparent variability of neural responses to repeated presentations of sensory stimuli has led many to conclude that the brain must average across tens or hundreds of milliseconds or across large populations of neurons to extract a meaningful signal (Shadlen and Newsome, 1998), whereas reports of reliable responses suggest shorter relevant timescales and more nuanced coding schemes (Berry et al., 1997; Reinagel and Reid, 2000; DeWeese et al., 2003). In fact, there is evidence for different characteristic timescales for neural coding in different primary sensory regions of the cortex (Yang and Zador, 2012). In addition to questions about the relevant timescales of neural communication, there has been an ongoing debate regarding the magnitude and importance of correlations among the spiking responses of neural populations (Meister et al., 1995; Nirenberg et al., 2001; Averbeck et al., 2006; Schneidman et al., 2003, 2006). Most studies of neural coding focus on the relationship between a sensory stimulus and the neural response. Others consider the relationship between the neural response and the animal's behavioral response (Britten et al., 1996), the relationship between pairs or groups of neurons at different stages of processing (Linsker, 1989; Dan et al., 1996), or the variability of neural responses themselves without regard to other variables (Schneidman et al., 2006).
Complementing the latter studies, we are interested in quantifying the randomness and predictability of neural responses without reference to stimulus. We consider the variability of a given neuron's activity at one time and how this is related to the same neuron's activity at other times in the future and the past. Along these lines, information theory (Shannon, 1948; Cover and Thomas, 2006) provides an insightful and rich toolset for interpreting neural data and for formulating theories of communication and computation in the nervous system (Rieke et al., 1999). In particular, Shannon's mutual information has developed into a powerful probe that quantifies the amount of information about a sensory stimulus encoded by neural activity (Mackay and McCulloch, 1952; Barlow, 1961; Stein, 1967; Laughlin, 1981; Sakitt and Barlow, 1982; Srinivasan et al., 1982; Linsker, 1989; Bialek et al., 1991; Theunissen and Miller, 1991; Atick, 1992; Rieke et al., 1999). Similarly, the Shannon entropy has been used to quantify the variability of the resulting spike-train response. In contrast to these standard stimulus- and response-averaged quantities, a host of other information-theoretic measures have been applied in neuroscience, such as the Fisher information (Cover and Thomas, 2006) and various measures of the information gained per observation (DeWeese and Meister, 1999; Butts and Goldman, 2006). We take an approach that complements more familiar informational analyses. First, we consider “output-only” processes, since their analysis is a theoretical prerequisite to understanding information in the stimulus-response paradigm. Second, we analyze rates of informational divergence, not only nondivergent components. Indeed, we show that divergences, rather than being a kind of mathematical failure, are important and revealing features of information processing in spike trains. We are particularly interested in the information content of neural spiking on fine timescales. 
How is information encoded in spike timing and, more specifically, in interspike intervals? In this regime, the critical questions turn on determining the kind of information encoded and the required “accuracy” of individual spike timing to support it. At present, unfortunately, characterizing communication at submillisecond time scales and below remains computationally and theoretically challenging. Practically, a spike train is converted into a binary sequence for analysis by choosing a time bin size and counting the number of spikes in successive time bins. Notwithstanding Strong et al. (1998) and Nemenman et al. (2008), there are few studies of how estimates of communication properties change as a function of time bin size, though there are examples of both short (Panzeri et al., 1999) and long (DeWeese, 1996; Strong et al., 1998) time expansions. Said most plainly, it is difficult to directly calculate the most basic quantities—e.g., communication rates between stimulus and spike-train response—in the submillisecond regime, despite progress on undersampling (Treves and Panzeri, 1995; Nemenman et al., 2004; Archer et al., 2012). Beyond the practical, the challenges are also conceptual. For example, given that a stochastic process' entropy rate diverges in a process-characteristic fashion for small time discretizations (Gaspard and Wang, 1993), measures of communication efficacy require careful interpretation in this limit. Compounding the need for better theoretical tools, measurement techniques will soon amass enough data to allow serious study of neuronal communication at fine time resolutions and across large populations (Alivisatos et al., 2012). In this happy circumstance, we will need guideposts for how information measures of neuronal communication vary with time resolution so that we can properly interpret the empirical findings and refine the design of nanoscale probes. 
Many single-neuron models generate neural spike trains that are renewal processes (Gerstner and Kistler, 2002). Starting from this observation, we use recent results (Marzen and Crutchfield, 2015) to determine how information measures scale in the small time-resolution limit. This is exactly the regime where numerical methods are most likely to fail due to undersampling and, thus, where analytic formulae are most useful. We also extend the previous analyses to structurally more complex, alternating renewal processes and analyze the time-resolution scaling of their information measures. This yields important clues as to which scaling results apply more generally. We then show that, across several standard neuronal models, the information measures are universal in the sense that their scaling does not depend on the details of spike-generation mechanisms. Several information measures we consider are already common fixtures in theoretical neuroscience, such as Shannon's source entropy rate (Strong et al., 1998; Nemenman et al., 2008). Others have appeared at least once, such as the finite-time excess entropy (or predictable information) (Bialek et al., 2001; Crutchfield and Feldman, 2003) and statistical complexity (Haslinger et al., 2010). And others have not yet been applied, such as the bound information (Abdallah and Plumbley, 2009, 2012; James et al., 2011, 2014). The development proceeds as follows. Section 2 reviews notation and definitions. To investigate the dependence of causal information measures on time resolution, Section 3 studies a class of renewal processes motivated by their wide use in describing neuronal behavior. Section 4 then explores the time-resolution scaling of information measures of alternating renewal processes, identifying those scalings likely to hold generally. Section 5 evaluates continuous-time limits of these information measures for common single-neuron models. 
This reveals a new kind of universality in which the information measures' scaling is independent of detailed spiking mechanisms. Taken altogether, the analyses provide intuition and motivation for several of the rarely-used, but key informational quantities. For example, the informational signatures of integrate-and-fire model neurons differ from both simpler, gamma-distributed processes and more complex, compound renewal processes. Finally, Section 6 summarizes the results, giving a view to future directions and mathematical and empirical challenges.

2. Background

We can only briefly review the relevant physics of information. Much of the phrasing is taken directly from background presented in Marzen and Crutchfield (2014, 2015). Let us first recall the causal state definitions (Shalizi and Crutchfield, 2001) and information measures of discrete-time, discrete-state processes introduced in Crutchfield et al. (2009), James et al. (2011). The main object of study is a process: the list of all of a system's behaviors or realizations {…x−2, x−1, x0, x1, …} and their probabilities, specified by the joint distribution Pr(…X−2, X−1, X0, X1, …). We denote a contiguous chain of random variables as X0:ℓ = X0X1⋯Xℓ−1. We assume the process is ergodic and stationary—Pr(X0:ℓ) = Pr(Xt:t+ℓ) for all t ∈ ℤ—and the measurement symbols range over a finite alphabet: x ∈ 𝒜. In this setting, the present X0 is the random variable measured at t = 0, the past is the chain X:0 = …X−2X−1 leading up to the present, and the future is the chain following the present X1: = X1X2⋯ (we suppress the infinite index in these). As the Introduction noted, many information-theoretic studies of neural spike trains concern input-output information measures that characterize stimulus-response properties; e.g., the mutual information between stimulus and resulting spike train. In the absence of stimulus or even with a non-trivial stimulus, we can still study neural activity from an information-theoretic point of view using “output-only” information measures that quantify intrinsic properties of neural activity alone. How random is it? The entropy rate hμ = H[X0|X:0], which is the entropy in the present observation conditioned on all past observations (Cover and Thomas, 2006). What must be remembered about the past to optimally predict the future? The causal states 𝒮+, which are groupings of pasts that lead to the same probability distribution over future trajectories (Crutchfield and Young, 1989; Shalizi and Crutchfield, 2001).
How much memory is required to store the causal states? The statistical complexity Cμ, or the entropy of the causal states (Crutchfield and Young, 1989). How much of the future is predictable from the past? The excess entropy E = I[X:0; X0:], which is the mutual information between the past and the future (Crutchfield and Feldman, 2003). How much of the generated information (hμ) is relevant to predicting the future? The bound information bμ = I[X0; X1:|X:0], which is the mutual information between the present and future observations conditioned on all past observations (Abdallah and Plumbley, 2009; James et al., 2011). How much of the generated information is useless—neither affects future behavior nor contains information about the past? The ephemeral information rμ = H[X0|X:0, X1:], which is the entropy in the present observation conditioned on all past and future observations (Verdú and Weissman, 2006; James et al., 2011). The information diagram of Figure 1 illustrates the relationship between hμ, rμ, bμ, and E. When we change the time discretization Δt, our interpretation and definitions change somewhat, as we describe in Section 3.
Figure 1

Information diagram illustrating the anatomy of the single-measurement information H[X0]. Although the past entropy H[X:0] and the future entropy H[X1:] typically are infinite, space precludes depicting them as such. They do scale in a controlled way, however: H[X−ℓ:0] ∝ hμℓ and H[X1:ℓ] ∝ hμℓ. The two atoms labeled bμ are the same, since we consider only stationary processes. (After James et al., 2011, with permission.)

Shannon's various information quantities—entropy, conditional entropy, mutual information, and the like—when applied to time series are functions of the joint distributions Pr(X0:ℓ). Importantly, for a given set of random variables they define an algebra of atoms out of which information measures are composed (Yeung, 2008). James et al. (2011) used this to show that the past and future partition the single-measurement entropy H(X0) into the measure-theoretic atoms of Figure 1. These include those—rμ and bμ—already mentioned and the enigmatic information qμ = I[X:0; X0; X1:], which is the co-information between past, present, and future. One can also consider the amount of predictable information not captured by the present, σμ = I[X:0; X1:|X0], which is the elusive information (Ara et al., 2015). It measures the amount of past-future correlation not contained in the present. It is nonzero if the process has “hidden states” and is therefore quite sensitive to how the state space is “observed” or coarse-grained. The total information in the future predictable from the past (or vice versa)—the excess entropy—decomposes into particular atoms: E = bμ + σμ + qμ. The process's Shannon entropy rate hμ is also a sum of atoms: hμ = rμ + bμ. This tells us that a portion of the information (hμ) a process spontaneously generates is thrown away (rμ) and a portion is actively stored (bμ). Putting these observations together gives the information anatomy of a single measurement X0: H[X0] = qμ + 2bμ + rμ. Although these measures were originally defined for stationary processes, they easily carry over to a nonstationary process of finite Markov order.
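These anatomy relations are concrete enough to verify numerically. The following sketch assumes a golden-mean Markov chain (binary, no two 1s in a row); for a first-order chain, a single neighboring symbol is a sufficient proxy for the semi-infinite past or future, so every atom reduces to three-symbol statistics:

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a probability array."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Golden-mean process (no two 1s in a row) as a first-order Markov chain.
p1 = 0.5
T = np.array([[1 - p1, p1],
              [1.0,    0.0]])                  # T[a, b] = Pr(X_{t+1}=b | X_t=a)
pi = np.array([1 / (1 + p1), p1 / (1 + p1)])   # stationary distribution

# Joint distribution over the triple (X_{-1}, X_0, X_1).
joint = pi[:, None, None] * T[:, :, None] * T[None, :, :]

h_mu = H(joint.sum(axis=2)) - H(pi)     # H[X_0 | X_{-1}]: entropy rate
r_mu = H(joint) - H(joint.sum(axis=1))  # H[X_0 | X_{-1}, X_1]: ephemeral info
# Bound information, computed independently as I[X_1; X_0 | X_{-1}]:
b_mu = (H(joint.sum(axis=1)) - H(pi)) - (H(joint) - H(joint.sum(axis=2)))
H0 = H(joint.sum(axis=(0, 2)))          # single-symbol entropy H[X_0]
q_mu = H0 - r_mu - 2 * b_mu             # co-information atom, by the anatomy
```

For this chain hμ = 2/3 bit per symbol, and the identity hμ = rμ + bμ holds to machine precision even though hμ and bμ are computed from different marginals.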
Calculating these information measures in closed-form given a model requires finding the ϵ-machine, which is constructed from causal states. Forward-time causal states 𝒮+ are minimal sufficient statistics for predicting a process's future (Crutchfield and Young, 1989; Shalizi and Crutchfield, 2001). This follows from their definition—a causal state σ+ ∈ 𝒮+ is a set of pasts grouped by the equivalence relation ~+: x:0 ~+ x′:0 if and only if Pr(X0:|X:0 = x:0) = Pr(X0:|X:0 = x′:0). So, 𝒮+ is a set of equivalence classes—a coarse-graining of the uncountably infinite set of all pasts. At time t, we have the random variable 𝒮t+ that takes values σ+ ∈ 𝒮+ and describes the causal-state process …𝒮−1+𝒮0+𝒮1+…. 𝒮t+ is a partition of pasts X:t that, according to the indexing convention, does not include the present observation Xt. In addition to the set of pasts leading to it, a causal state has an associated future morph—the conditional measure of futures that can be generated from it. Moreover, each state inherits a probability from the process's measure over pasts Pr(X:0). The forward-time statistical complexity is then the Shannon entropy of the state distribution (Crutchfield and Young, 1989): Cμ+ = H[𝒮+]. A generative model is constructed out of the causal states by endowing the causal-state process with transitions T(x)σσ′ = Pr(𝒮t+1+ = σ′, Xt = x | 𝒮t+ = σ) that give the probability of generating the next symbol x and ending in the next state σ′, if starting in state σ (Residing in a state and generating a symbol do not occur simultaneously. Since symbols are generated during transitions there is, in effect, a half time-step difference in the indexes of the random variables Xt and 𝒮t+. We suppress notating this.) To summarize, a process's forward-time ϵ-machine is the tuple {𝒜, 𝒮+, {T(x) : x ∈ 𝒜}}. For a discrete-time, discrete-alphabet process, the ϵ-machine is its minimal unifilar hidden Markov model (HMM) (Crutchfield and Young, 1989; Shalizi and Crutchfield, 2001) (For general background on HMMs see Paz, 1971; Rabiner and Juang, 1986; Rabiner, 1989).
Note that the causal state set can be finite, countable, or uncountable; the latter two cases can occur even for processes generated by finite-state HMMs. Minimality can be defined by either the smallest number of states or the smallest entropy over states (Shalizi and Crutchfield, 2001). Unifilarity is a constraint on the transition matrices T(x) such that the next state σ′ is determined by knowing the current state σ and the next symbol x. That is, if the transition exists, then Pr(𝒮t+1+ | 𝒮t+ = σ, Xt = x) has support on a single causal state.
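For a finite machine, both unifilarity and Cμ are easy to check mechanically. A sketch using the two-state ϵ-machine of the golden-mean process as a test case (the labeled-matrix representation is our own convention):

```python
import numpy as np

# Labeled transition matrices: T[x][i, j] is the probability of emitting
# symbol x while moving from state i to state j.  This is the golden-mean
# machine: state 0 emits 0 or 1 equiprobably; state 1 must emit 0.
T = {
    0: np.array([[0.5, 0.0], [1.0, 0.0]]),
    1: np.array([[0.0, 0.5], [0.0, 0.0]]),
}

def is_unifilar(T):
    # Unifilarity: for each state and symbol, at most one successor state.
    return all((mat > 0).sum(axis=1).max() <= 1 for mat in T.values())

# Internal-state Markov chain and its stationary distribution.
M = sum(T.values())
evals, evecs = np.linalg.eig(M.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()

# Statistical complexity: Shannon entropy over causal states.
C_mu = -np.sum(pi * np.log2(pi))
```

Here the stationary state distribution is (2/3, 1/3), so Cμ = H(2/3, 1/3) ≈ 0.918 bits.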

3. Infinitesimal time resolution

One often treats a continuous-time renewal process, such as a spike train from a noisy integrate-and-fire neuron, in a discrete-time setting (Rieke et al., 1999). With the results of Marzen and Crutchfield (2015) in hand, we can investigate how artificial time binning affects estimates of a model neuron's spike train's randomness, predictability, and information storage in the limit of infinitesimal time resolution. This is exactly the limit in which analytic formulae for information measures are most useful, since increasing the time resolution artificially increases the apparent range of temporal correlations, as shown in Figure 3. Time-binned neural spike trains of noisy integrate-and-fire neurons have been studied for quite some time (Mackay and McCulloch, 1952) and, despite that history, this is still an active endeavor (Rieke et al., 1999; Cessac and Cofre, 2013). Our emphasis and approach differ, though. We do not estimate statistics or reconstruct models from simulated spike train data using nonparametric inference algorithms—e.g., as done in Haslinger et al. (2010). Rather, we ask how the ϵ-machine extracted from a spike-train process and the information measures calculated from it vary as a function of time coarse-graining. Our analytic approach highlights an important lesson about such studies in general: A process' ϵ-machine and information anatomy are sensitive to time resolution. A secondary and compensating lesson is that the manner in which the ϵ-machine and information anatomy scale with time resolution conveys much about the process' structure. Suppose we are given a neural spike train with interspike intervals independently drawn from the same interspike interval (ISI) distribution ϕ(t) with mean ISI 1/μ. To convert the continuous-time point process into a sequence of binary spike-quiescence symbols, we track the number of spikes emitted in successive time bins of size Δt.
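This binning step is a one-liner in practice. A minimal sketch (the function name and the clip-to-binary convention are ours):

```python
import numpy as np

def bin_spike_train(spike_times, dt, t_end):
    """Convert spike times into a binary spike/quiescence sequence by
    counting spikes in successive bins of width dt.  At fine resolution
    each bin holds at most one spike, so counts are clipped to {0, 1}."""
    edges = np.arange(0.0, t_end + dt / 2, dt)
    counts, _ = np.histogram(spike_times, bins=edges)
    return np.minimum(counts, 1)
```

For example, spikes at 0.5, 1.2, and 3.9 s binned at Δt = 1 s over 4 s yield the sequence 1, 1, 0, 1.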
Our goal, however, is to understand how the choice of Δt affects reported estimates for Cμ, hμ, E, bμ, and σμ. The way in which each of these varies with Δt reveals information about the intrinsic time scales on which a process behaves; cf. the descriptions of entropy rates in Costa et al. (2002, 2005) and Gaspard and Wang (1993). We concern ourselves with the infinitesimal Δt limit, even though the behavior of these information atoms is potentially most interesting when Δt is on the order of the process' intrinsic time scales. In the infinitesimal time-resolution limit, when Δt is smaller than any intrinsic timescale, the neural spike train is a renewal process with interevent count distribution F(n) ≈ Φ(nΔt) − Φ((n + 1)Δt) and survival function w(n) ≈ Φ(nΔt), where Φ(t), the ISI survival function, is the probability that an ISI is longer than t. The interevent count distribution F(n) is the probability that the silence separating successive events (bins with spikes) is n counts long, while the survival function w(n) is the probability that the silence separating successive events is at least n counts long. The ϵ-machine transition probabilities therefore change with Δt. The mean interevent count 〈T〉 + 1 is not the mean interspike interval 1/μ, since one must convert between counts and spikes: 〈T〉 + 1 = 1/(μΔt). In this limit, the ϵ-machine of spike-train renewal processes can take one of the topologies described in Marzen and Crutchfield (2015). Here, we focus only on two of these ϵ-machine topologies. The first topology corresponds to that of an eventually Poisson process, in which the ISI distribution takes the form ϕ(t) = ϕ(T)e−λ(t−T) for t > T, for some finite T and λ > 0. A Poisson neuron with firing rate λ and refractory period of time T, for instance, eventually (t > T) generates a Poisson process. Hence, we refer to such processes as eventually Poisson processes; see Figure 2B. A Poisson process is a special type of eventually Poisson process with T = 0; see Figure 2A. However, the generic renewal process has the topology shown in Figure 2C.
Technically, only processes that are not eventually Δ-Poisson have this ϵ-machine topology but, for our purposes, this is the ϵ-machine topology for any renewal process not generated by a Poisson neuron; see Marzen and Crutchfield (2015).
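Given any ISI density ϕ(t), the small-Δt approximations for F(n) and w(n) above can be computed by quadrature. A sketch, checked against a Poisson neuron, whose exponential ISIs give both quantities in closed form:

```python
import numpy as np

def interevent_counts(phi, dt, n_max, t_max):
    """Small-dt approximation of the interevent count distribution F(n)
    and survival function w(n) from a continuous ISI density phi(t):
    F(n) ~ Pr(n*dt <= ISI < (n+1)*dt) and w(n) ~ Pr(ISI >= n*dt)."""
    t = np.linspace(0.0, t_max, 200_001)
    pdf = phi(t)
    # Cumulative trapezoid rule for the ISI CDF, renormalized so that
    # quadrature error does not leak into the tail.
    cdf = np.concatenate(([0.0], np.cumsum(np.diff(t) * (pdf[1:] + pdf[:-1]) / 2)))
    cdf /= cdf[-1]
    n = np.arange(n_max)
    F = np.interp((n + 1) * dt, t, cdf) - np.interp(n * dt, t, cdf)
    w = 1.0 - np.interp(n * dt, t, cdf)
    return F, w

# Exponential ISIs (Poisson neuron, rate mu): w(n) = exp(-mu * n * dt).
mu = 2.0
F, w = interevent_counts(lambda t: mu * np.exp(-mu * t), dt=0.01, n_max=10, t_max=40.0)
```

The same routine accepts any normalizable ISI density, e.g., the shifted densities of refractory neurons, at the cost of a finer grid when ϕ(t) is sharply peaked.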
Figure 2

ϵ-Machines of processes generated by Poisson neurons and by integrate-and-fire neurons (left to right): (A) The ϵ-machine for a Poisson process. (B) The ϵ-machine for an eventually Poisson process; i.e., a Poisson neuron with a refractory period of length ñΔt. (C) The ϵ-machine for a generic renewal process—the not eventually Δ-Poisson process of Marzen and Crutchfield (2015); i.e., the process generated by noise-driven integrate-and-fire neurons. Edge labels p|x denote emitting symbol x (“1” is “spike”) with probability p. (Reprinted with permission from Marzen and Crutchfield, 2015.)

At present, inference algorithms can only infer finite ϵ-machines. So, such algorithms applied to renewal processes will yield an eventually Poisson topology. (Compare Figure 2C to the inferred approximate ϵ-machine of an integrate-and-fire neuron in Figure 2 of Haslinger et al., 2010.) The generic renewal process has an infinite ϵ-machine, though, for which the inferred ϵ-machines are only approximations. We calculated E and Cμ using the expressions given in Marzen and Crutchfield (2015). Substituting in Equations (3), (4), and (5), we find the limit to which the excess entropy E tends, expressed in terms of Φ(t), the probability that an ISI is longer than t. It is easy to see that E(Δt) limits to a positive and (usually) finite value as the time resolution vanishes, with some exceptions described below. Similarly, using the expression in Marzen and Crutchfield (2015)'s Appendix II, one can show the form taken by the finite-time excess entropy E(T). As T → ∞, E(T) → E. Note that these formulae apply only when the mean firing rate μ is nonzero. Even if E limits to a finite value, the statistical complexity typically diverges due to its dependence on time discretization Δt. Suppose that we observe an eventually Poisson process, such that ϕ(t) = ϕ(T)e−λ(t−T) for t > T. Then, from formulae in Marzen and Crutchfield (2015), the statistical complexity in the infinitesimal time-resolution limit decomposes, ignoring terms of O(Δt) and higher, into a divergent term and a nondivergent term.
The first term diverges, and its rate of divergence is the probability of observing a time since last spike less than T. This measures the spike train's deviation from being Δ-Poisson and so reveals the effective dimension of the underlying causal state space. Cμ's remaining nondivergent component is equally interesting. In fact, it is the differential entropy of the time-since-last-spike distribution. An immediate consequence of the analysis is that this generic infinitesimal renewal process is highly cryptic (Crutchfield et al., 2009). It hides an arbitrarily large amount of its internal state information: Cμ diverges as Δt → 0 but E (usually) asymptotes to a finite value. We have very structured processes that have disproportionately little in the future to predict. Periodic processes constitute an important exception to this general rule of thumb for continuous-time processes. A neuron that fires every T seconds without jitter has E = Cμ, and both E and Cμ diverge logarithmically with 1/Δt. It is straightforward to show that the information measures contained within the present—H[X0], hμ, bμ, rμ, and qμ (recall Figure 1)—all vanish as Δt tends to 0. Therefore, E limits to the elusive information σμ. With Δt → 0, hμ nominally tends to 0: As we shorten the observation time scale, spike events become increasingly rare. There are at least two known ways to address hμ apparently not being very revealing when so defined. On the one hand, rather than focusing on the uncertainty per symbol, as hμ does, we opt to look at the uncertainty per unit time: hμ/Δt. This is the so-called Δt-entropy rate (Gaspard and Wang, 1993) and it diverges as −μ log Δt. Such divergences are to be expected: The large literature on dimension theory characterizes a continuous set's randomness by its divergence scaling rates (Farmer et al., 1983; Mayer-Kress, 1986). Here, we are characterizing sets of similar cardinality—infinite sequences.
On the other hand, paralleling the sequence block-entropy definition of the entropy rate (hμ = limℓ→∞ H[X0:ℓ]/ℓ) (Crutchfield and Feldman, 2003), continuous-time entropy rates are often approached within a continuous-time framework using limT→∞ H(T)/T, where H(T) is the path entropy, the continuous-time analog of the block entropy H(ℓ) (Girardin, 2005). In these analyses, any log Δt terms are regularized away using Shannon's differential entropy (Cover and Thomas, 2006), leaving the nondivergent component. Using the Δt-entropy rate but keeping both the divergent and nondivergent components, as in Equations (8) and (9), is an approach that respects both viewpoints and gives a detailed picture of time-resolution scaling. A major challenge in analyzing spike trains concerns locating the timescales on which information relevant to the stimulus is carried. Or, more precisely, we are often interested in estimating what percentage of the raw entropy of a neural spike train is used to communicate information about a stimulus; cf. the framing in Strong et al. (1998). For such analyses, the entropy rate is often taken to be H(Δt, T)/T, where T is the total path time and H(Δt, T) is the entropy of neural spike trains over time T resolved at time bin size Δt. In terms of previously derived quantities and paralleling the well-known block-entropy linear asymptote H(ℓ) = E + hμℓ (Crutchfield and Feldman, 2003), this is: H(Δt, T)/T ≈ E(Δt)/T + hμ(Δt)/Δt. From the scaling analyses above, the extensive component of H(Δt, T)/T diverges logarithmically in the small Δt limit due to the logarithmic divergence (Equation 9) in hμ(Δt)/Δt. If we are interested in accurately estimating the entropy rate, then the above is one finite-time T estimate of it. However, there are other estimators, including the difference estimator [H(Δt, T) − H(Δt, T − Δt)]/Δt. This estimator converges more quickly to the true entropy rate hμ(Δt)/Δt than does H(Δt, T)/T. No such log Δt divergences occur with bμ.
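The two estimators' convergence behavior is easy to see in discrete time, where block entropies can be computed exactly. A sketch for the golden-mean Markov chain (true entropy rate 2/3 bit per symbol): the ratio estimator H(ℓ)/ℓ carries the E/ℓ bias implied by the linear asymptote, while the difference estimator sheds it (for this first-order Markov example it is exact):

```python
import numpy as np
from itertools import product

# Golden-mean Markov chain: Pr(1|0) = 1/2, Pr(0|1) = 1.
T = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0, 1: 0.0}}
pi = {0: 2 / 3, 1: 1 / 3}

def block_entropy(L):
    """Exact block entropy H(L) in bits, by enumerating all length-L words."""
    total = 0.0
    for word in product((0, 1), repeat=L):
        p = pi[word[0]]
        for a, b in zip(word, word[1:]):
            p *= T[a][b]
        if p > 0:
            total -= p * np.log2(p)
    return total

L = 10
ratio_est = block_entropy(L) / L                     # analog of H(dt, T)/T
slope_est = block_entropy(L) - block_entropy(L - 1)  # difference estimator
```

Here `ratio_est` overshoots 2/3 by roughly E/L ≈ 0.025 bit, while `slope_est` hits the entropy rate to machine precision.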
Straightforward calculation, not shown here, reveals that bμ(Δt)/Δt limits to a finite value. Since hμ = rμ + bμ and hμ(Δt)/Δt diverges, the ephemeral information rate rμ(Δt)/Δt also diverges as Δt → 0. The bulk of the information generated by such renewal processes is dissipated and, having no impact on future behavior, is not useful for prediction. Were we allowed to observe relatively microscopic membrane voltage fluctuations rather than being restricted to the relatively macroscopic spike sequence, the Δt-scaling analysis would be entirely different. Following Marzen and Crutchfield (2014) or natural extensions thereof, the statistical complexity diverges as −log ϵ, where ϵ is the resolution level for the membrane voltage; the excess entropy diverges as log(1/Δt); the time-normalized entropy rate diverges as well; and the time-normalized bound information diverges as 1/(2Δt). In other words, observing membrane voltage rather than spikes makes the process far more predictable. The relatively more macroscopic modeling at the level of spikes throws away much detail of the underlying biochemical dynamics. To illustrate the previous points, we turn to numerics and a particular neural model. Consider an (unleaky) integrate-and-fire neuron driven by white noise whose membrane voltage (after a suitable change of parameters) evolves according to dV/dt = b + √D η(t), where η(t) is white noise such that 〈η(t)〉 = 0 and 〈η(t)η(t′)〉 = δ(t − t′). When V = 1, the neuron spikes and the voltage is reset to V = 0; it stays at V = 0 for a time τ, which enforces a hard refractory period. Since the membrane voltage resets to a predetermined value, the interspike intervals produced by this model are independently drawn from the same interspike interval distribution, an inverse-Gaussian density shifted by the refractory period τ. Here, 1/μ = 1/b is the mean interspike interval and λ = 1/D is a shape parameter that controls ISI variance.
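This model is also easy to simulate directly. A minimal Euler–Maruyama sketch (step size, seed, and parameter values are illustrative choices); since the mean first-passage time to threshold under drift b is 1/b, the empirical mean ISI should come out near τ + 1/b:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_isis(b, D, tau, n_spikes, dt=1e-3):
    """Euler-Maruyama sketch of the unleaky IF neuron: dV = b*dt + sqrt(D)*dW
    between spikes; on reaching V = 1 the voltage resets to 0 and holds
    there for the refractory time tau."""
    isis = []
    for _ in range(n_spikes):
        v, t = 0.0, tau
        while v < 1.0:
            v += b * dt + np.sqrt(D * dt) * rng.standard_normal()
            t += dt
        isis.append(t)
    return np.array(isis)

isis = simulate_isis(b=1.0, D=0.05, tau=0.002, n_spikes=200)
mean_isi = isis.mean()
```

The discretized threshold crossing slightly overestimates passage times; a finer `dt` or an exact inverse-Gaussian sampler would tighten the match.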
This neural model is not as realistic as a leaky integrate-and-fire model (Gerstner and Kistler, 2002), but it is complex enough to illustrate the points made earlier about the scaling of information measures with time resolution. For illustration purposes, we assume that the time-binned neural spike train is well approximated by a renewal process, even when Δt is as large as one millisecond. This assumption will generally not hold, as past interevent counts could provide more detailed historical information that more precisely places the last spike within its time bin. Even so, the reported information measure estimates are still useful. The estimated hμ is an upper bound on the true entropy rate; the reported E is a lower bound on the true excess entropy by the Data Processing Inequality (Cover and Thomas, 2006); and the reported Cμ will usually be a lower bound on the true process' statistical complexity. Employing the renewal process assumption, numerical analysis corroborates the infinitesimal analysis above. Figure 3 plots F(n)—the proxy for the full, continuous-time ISI distribution—for a given set of neuronal parameter values as a function of time resolution. Figure 4 then shows that hμ and Cμ exhibit logarithmic scaling at millisecond time discretizations, but that E does not converge to its continuous-time value until we reach time discretizations on the order of hundreds of microseconds. Even when Δt = 100 μs, bμ(Δt)/Δt still has not converged to its continuous-time value.
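The logarithmic divergence of hμ(Δt)/Δt can be seen in closed form in the simplest case: a Poisson neuron's binned train is i.i.d. Bernoulli with spike probability 1 − e^(−μΔt), so the Δt-entropy rate is just a binary entropy divided by Δt. A short check of the −μ log Δt scaling (rate and bin sizes are illustrative):

```python
import numpy as np

def binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable, elementwise."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

mu = 100.0                                  # firing rate, spikes per second
dts = np.array([1e-3, 1e-4, 1e-5, 1e-6])    # bin sizes, seconds
p_spike = 1.0 - np.exp(-mu * dts)           # Pr(at least one spike per bin)
rate = binary_entropy(p_spike) / dts        # dt-entropy rate, bits per second
gains = np.diff(rate)                       # growth per tenfold refinement
```

Each tenfold refinement of Δt adds approximately μ log2 10 ≈ 332 bits/s for μ = 100 Hz, exactly the −μ log2 Δt divergence discussed above.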
Figure 3

An unleaky integrate-and-fire neuron driven by white noise has varying interevent count distributions F(n), based on the ISI distribution ϕ(t) given in Equation (12) with τ = 2 ms, 1/μ = 1 ms, and λ = 1 ms. Data points represent exact values of F(n) calculated for integer values of n. Dashed lines are interpolations based on straight line segments connecting nearest-neighbor points.

Figure 4

How spike-train information measures (or rates) depend on time discretization Δt. Top left: Statistical complexity Cμ as a function of both the ISI distribution shape parameters and the time bin size Δt. The horizontal axis is Δt in milliseconds on a log scale and the vertical axis is Cμ in bits on a linear scale for three different ISI distributions following Equation (12) with τ = 2 ms. Top right: Entropy rate hμ, also as a function of both shape parameters and Δt. Axes are labeled as in the previous panel and the same three ISI distributions are used. Bottom left: Excess entropy E as a function of both the shape parameters and Δt, computed from Equation (6); the asymptotic values (in bits) of the blue, purple, and yellow lines depend on the ISI distribution. Bottom right: Bound information rate bμ(Δt)/Δt, parameterized as in the previous panels and computed from Equation (10); asymptotic values are in bits per second.

The statistical complexity Cμ increases without bound as Δt → 0; see the top left panel of Figure 4. As suggested by the infinitesimal renewal analysis, hμ(Δt) vanishes, whereas hμ(Δt)/Δt diverges at a rate of μ log2(1/Δt), as shown in the top right panel of Figure 4. As anticipated, E tends to a finite, ISI distribution-dependent value as Δt tends to 0, as shown in the bottom left panel of Figure 4. Finally, the bottom right panel plots bμ(Δt)/Δt.
One conclusion from this simple numerical analysis is that one should consider going to submillisecond time resolutions to obtain accurate estimates of E and bμ(Δt)/Δt, even though the calculated informational values are only a few bits or even less than one bit per second in magnitude.
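The μ log2(1/Δt) divergence can also be seen in closed form in the simplest case. For a Poisson process at rate μ, the Δt-binned spike train is (neglecting double spikes per bin) i.i.d. Bernoulli with p = μΔt, so hμ(Δt) ≈ H(μΔt). A sketch (function names ours) showing the normalized entropy rate creeping toward the predicted divergence as Δt shrinks:

```python
import math

def binary_entropy(p):
    """H(p) in bits."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

mu = 100.0  # firing rate in spikes per second

def ratio(dt):
    """Normalized entropy rate of the dt-binned Poisson train, divided by
    the predicted divergence mu * log2(1/dt)."""
    return (binary_entropy(mu * dt) / dt) / (mu * math.log2(1.0 / dt))

ratios = [ratio(dt) for dt in (1e-3, 1e-6, 1e-9)]
```

The ratio approaches 1 only logarithmically slowly in 1/Δt, which is one reason a single fixed-Δt estimate of "the" entropy rate is uninformative.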

4. Alternating renewal processes

The Δt-scalings discussed in Section 3 occur much more generally than indicated there. Often, our aim is to calculate the nondivergent component of these information measures as Δt → 0, but the rates of these scalings are process-dependent. Therefore, these divergences can be viewed as a feature rather than a bug; they contain additional information about the process' structure (Gaspard and Wang, 1993). To illustrate this point, we now investigate Δt-scalings for information measures of alternating renewal processes (ARPs), which are structurally more complex than the standard renewal processes considered above. For instance, these calculations suggest that rates of divergence of the τ-entropy rate smaller than the firing rate, such as those seen in Nemenman et al. (2008), are indicative of strong ISI correlations. Calculational details are sequestered in Appendix A. In an ARP, an ISI is drawn from one distribution ϕ(1)(t), then another distribution ϕ(2)(t), then the first ϕ(1)(t) again, and so on. We refer to the new piece of additional information—the ISI distribution currently being drawn from—as the modality. Under weak technical conditions, the causal states are the modality and the time since last spike. The corresponding, generic ϵ-machine is shown in Figure 5. We define the modality-dependent survival functions as Φ(m)(t) = ∫_t^∞ ϕ(m)(t′) dt′, the modality-dependent mean firing rates as 1/μ(m) = ∫_0^∞ t ϕ(m)(t) dt,
Figure 5

ϵ-Machine for an alternating renewal process in which neither interevent count distribution is Δ-Poisson and they are not equal almost everywhere. State label n denotes n counts since the last event and present modality m.

the modality-dependent differential entropy rates, the modality-dependent continuous-time statistical complexities, and the modality-dependent excess entropies; explicit expressions are given in Appendix A. It is straightforward to show, as done in Appendix A, that the time-normalized entropy rate still scales with log2(1/Δt), hμ(Δt)/Δt ~ μ̄ log2(1/Δt), where μ̄ is the “reduced mass” combination of the modality-dependent firing rates derived in Appendix A. As expected, the statistical complexity still diverges logarithmically in 1/Δt, now with an additional contribution of the form H(p), where H(p) = −p log2 p − (1 − p) log2 (1 − p) is the entropy in bits of a Bernoulli random variable with bias p. Finally, the excess entropy still limits to a positive constant. The additional terms H(·) come from the information stored in the time course of modalities. As a point of comparison, we ask what these information measures would be for the original (noncomposite) renewal process with the same ISI distribution as the ARP. As described in Appendix B, the former entropy rate is always greater than the true hμ; its statistical complexity is always less than the true Cμ; and its excess entropy is always smaller than the true E. In particular, the ARP's hμ divergence rate is always less than or equal to the mean firing rate μ. Interestingly, this coincides with what was found empirically in the time series of a single neuron; see Figure 5C in Nemenman et al. (2008). The ARPs here are a first example of how one can calculate information measures of the much broader and more structurally complex class of processes generated by unifilar hidden semi-Markov models, a subclass of hidden semi-Markov models (Tokdar et al., 2010).
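Generating an ARP is no harder than generating a renewal process: one simply alternates the distribution an ISI is drawn from. A sketch (function name, exponential modalities, and seed are our choices; exponential modalities are used only to keep the rate arithmetic transparent):

```python
import random

def arp_spike_times(mean1, mean2, n_spikes, seed=0):
    """Alternating renewal process: ISIs drawn alternately from two
    exponential distributions with means mean1 and mean2 (the two modalities)."""
    rng = random.Random(seed)
    means = (mean1, mean2)
    t, times, modality = 0.0, [], 0
    for _ in range(n_spikes):
        t += rng.expovariate(1.0 / means[modality])
        times.append(t)
        modality = 1 - modality  # switch ISI distribution after every spike
    return times

# The mean ISI is (mean1 + mean2)/2, so the overall firing rate is the
# harmonic mean 2/(1/mu1 + 1/mu2) of the modality rates mu1, mu2.
times = arp_spike_times(mean1=0.01, mean2=0.04, n_spikes=20000, seed=2)
rate = len(times) / times[-1]
```

With modality rates of 100 and 25 spikes per unit time, the overall rate sits at their harmonic mean of 40, below the arithmetic mean of 62.5.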

5. Information universality

Another aim of ours is to interpret the information measures. In particular, we wished to relate infinitesimal time-resolution excess entropies, statistical complexities, entropy rates, and bound information rates to more familiar characterizations of neural spike trains—firing rates μ and ISI coefficients of variation C. To address this, we now analyze a suite of familiar single-neuron models. We introduce the models first, describe the parameters behind our numerical estimates, and then compare the information measures. Many single-neuron models, when driven by temporally uncorrelated and stationary input, produce neural spike trains that are renewal processes. We just analyzed one model class, the noisy integrate-and-fire (NIF) neurons in Section 3, focusing on time-resolution dependence. Other common neural models include the linear leaky integrate-and-fire (LIF) neuron, whose dimensionless membrane voltage, after a suitable change of parameters, fluctuates with a linear leak of strength a and constant drive b in the presence of white noise (Equation 18); when V = 1, a spike is emitted and V is instantaneously reset to 0. We computed ISI survival functions from empirical histograms of 10⁵ ISIs; we varied b ∈ [1.5, 5.75] in steps of 0.25 and a ∈ [0.1, 3.0] in steps of 0.1 up to a = 1.0 and in steps of 0.25 thereafter. The quadratic integrate-and-fire (QIF) neuron has membrane voltage fluctuations that, after a suitable change of variables, are quadratic in V with parameters a and b plus white noise (Equation 19); when V = 100, a spike is emitted and V is instantaneously reset to −100. We computed ISI survival functions from empirical histograms of trajectories with 10⁵ ISIs; we varied b ∈ [0.25, 4.75] in steps of 0.25 and a ∈ [0.25, 2.75] in steps of 0.25. The QIF neuron has very different dynamical behavior from the LIF neuron, exhibiting a saddle-node bifurcation at b = 0. Simulation details are given in Appendix B. Finally, ISI distributions are often fit to gamma distributions, and so we also calculated the information measures of spike trains with gamma-distributed ISIs (GISI).
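An LIF simulation in the same Euler-Maruyama style can reproduce such ISI ensembles. The sketch below assumes the standard dimensionless form dV/dt = b − aV + ση(t); the paper's own parameterization (its Equation 18) may differ, and the noise amplitude σ, step size, and function name are ours:

```python
import math
import random

def simulate_lif_isis(a, b, sigma=0.1, dt=1e-4, n_spikes=300, seed=4):
    """Euler-Maruyama sketch of a noisy leaky integrate-and-fire neuron,
    assuming dV/dt = b - a*V + sigma*eta(t); spike and reset at V = 1."""
    rng = random.Random(seed)
    isis, v, t = [], 0.0, 0.0
    while len(isis) < n_spikes:
        v += (b - a * v) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if v >= 1.0:  # threshold crossing: record ISI, reset voltage and clock
            isis.append(t)
            v, t = 0.0, 0.0
    return isis

# With weak noise the mean ISI approaches the deterministic crossing time
# (1/a) * ln(b/(b - a)); for a = 1, b = 3 that is ln(1.5), about 0.405.
isis = simulate_lif_isis(a=1.0, b=3.0)
```

In the suprathreshold regime used here (b/a > 1) the neuron fires nearly periodically, so the ISI coefficient of variation is small; pushing b/a toward 1 makes firing noise-dominated and raises C.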
Each neural model—NIF, LIF, QIF, and GISI—has its own set of parameters that governs its ISI distribution shape. Taken at face value, this would make it difficult to compare information measures across models. Fortunately, for each of these neural models, the firing rate μ and coefficient of variation C uniquely determine the underlying model parameters (Vilela and Lindner, 2009). As Appendix B shows, the suitably rescaled statistical complexity, excess entropy, differential entropy rate, and bound information rate depend only on the ISI coefficient of variation C and not on the mean firing rate μ. We estimated information measures from the simulated spike train data using plug-in estimators based on the formulae in Section 3. Enough data were generated that even naive plug-in estimators were adequate, except for estimating bμ when C was larger than 1. See Appendix B for estimation details. That said, binned estimators are likely inferior to binless entropy estimators (Victor, 2002), and naive estimators tend to have large biases. This will be an interesting direction for future research, since a detailed analysis goes beyond the present scope. Figure 6 compares the statistical complexity, excess entropy, entropy rate, and bound information rate for all four neuron types as a function of their C. Surprisingly, the NIF, LIF, and QIF neurons' information measures have essentially identical dependence on C. That is, the differences in mechanism do not strongly affect these informational properties of the spike trains they generate. Naturally, this leads one to ask if the informational indifference to mechanism generalizes to other spike train model classes and stimulus-response settings.
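The two coordinates used throughout this comparison, μ and C, are themselves simple plug-in estimates from an ISI sample. A sketch (function name ours), illustrated on the GISI model, where a gamma distribution with shape k has C = 1/√k regardless of scale:

```python
import math
import random

def isi_stats(isis):
    """Plug-in estimates of the firing rate mu and ISI coefficient of variation C."""
    n = len(isis)
    mean = sum(isis) / n
    var = sum((x - mean) ** 2 for x in isis) / (n - 1)
    return 1.0 / mean, math.sqrt(var) / mean  # (mu, C)

# Gamma shape k = 4, scale 0.25: mean ISI 1, so mu near 1 and C near 1/sqrt(4) = 0.5.
rng = random.Random(3)
isis = [rng.gammavariate(4.0, 0.25) for _ in range(50000)]
mu, C = isi_stats(isis)
```

Shape k = 1 recovers the exponential-ISI (Poisson) case with C = 1; k < 1 gives the C > 1 regime where the naive bμ estimators above struggle.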
Figure 6

Information universality across distinct neuron dynamics. We find that several information measures depend only on the ISI coefficient of variation C and not the ISI mean firing rate μ for the following neural spike train models: (i) neurons with gamma-distributed ISIs (GISI, blue), (ii) noisy integrate-and-fire neurons governed by Equation (11) (NIF, green), (iii) noisy linear leaky integrate-and-fire neurons governed by Equation (18) (LIF, dotted red), and (iv) noisy quadratic integrate-and-fire neurons governed by Equation (19) (QIF, dotted blue). Top left: regularized statistical complexity. Top right: rescaled differential entropy rate. Bottom left: excess entropy E. Bottom right: rescaled bound information rate. In the latter, ISI distributions with smaller C were excluded due to the difficulty of accurately estimating bμ from simulated spike trains. See text for discussion.

Figure 6's top left panel shows that the continuous-time statistical complexity grows monotonically with increasing C. In particular, the statistical complexity increases logarithmically with the ISI mean and approximately linearly with the ISI coefficient of variation C. That is, the number of bits that must be stored to predict these processes increases in response to additional process stochasticity and longer temporal correlations. In fact, it is straightforward to show that the statistical complexity is minimized and the excess entropy maximized at fixed μ when the neural spike train is periodic. This is unsurprising since, in the space of processes, periodic processes are least cryptic (Cμ − E = 0) and so knowledge of oscillation phase is enough to completely predict the future. (See Appendix B.) The bottom left panel in Figure 6 shows that increasing C tends to decrease the excess entropy E—the number of bits that one can predict about the future. E diverges for small C, dips at the C where the ISI distribution is closest to exponential, and limits to a small number of bits at large C. At small C, the neural spike train is close to noise-free periodic behavior.
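The periodic-minimizes-Cμ claim is easy to check numerically for discrete-time renewal processes. Assuming each "counts since last spike" value labels a distinct causal state (true away from the eventually-Poisson special case), the stationary state weights are proportional to the ISI survival function; a sketch (function name ours):

```python
import math

def renewal_cmu(isi_pmf):
    """Plug-in statistical complexity (bits) of a discrete-time renewal process
    whose ISI equals n bins with probability isi_pmf[n]. Assumes every
    counts-since-last-spike value is a distinct causal state, with stationary
    weights proportional to the ISI survival function."""
    n_max = max(isi_pmf)
    survival = [sum(p for n, p in isi_pmf.items() if n > k) for k in range(n_max)]
    total = sum(survival)  # equals the mean ISI in bins
    weights = [s / total for s in survival if s > 0]
    return -sum(w * math.log2(w) for w in weights)

# A noise-free periodic train with period 8 bins stores exactly log2(8) = 3 bits;
# spreading the ISI over several values (same mean of 8) stores strictly more.
c_periodic = renewal_cmu({8: 1.0})
c_spread = renewal_cmu({6: 0.25, 8: 0.5, 10: 0.25})
```

At fixed mean ISI, any spread in the ISI distribution raises the entropy of the age distribution, and hence the stored information, above the periodic baseline.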
When analyzed at small but nonzero Δt, E encounters an “ultraviolet divergence” (Tchernookov and Nemenman, 2013). Thus, E diverges as C → 0, and a simple argument in Appendix B suggests that the rate of divergence is log2(1/C). At an intermediate C ~ 1, the ISI distribution is as close as possible to that of a memoryless Poisson process and so E is close to vanishing. At larger C, the neural spike train is noise-driven. Surprisingly, completely noise-driven processes still have a fraction of a bit of predictability: knowing the time since the last spike allows for some power in predicting the time to the next spike. The top right panel shows that an appropriately rescaled differential entropy rate varies differently for neural spike trains from noisy integrate-and-fire neurons and neural spike trains with gamma-distributed ISIs. As expected, the entropy rate is maximized at C near 1, consistent with the Poisson process being the maximum entropy distribution for fixed mean ISI. Gamma-distributed ISIs are far less random than ISIs from noisy integrate-and-fire neurons, holding μ and C constant. Finally, the continuous-time bound information (bμ) rate varies with C in a way similar to E. (Note that since the plotted quantity is the bμ rate rescaled by 1/μ, one could interpret the normalization by 1/μ as a statement about how the mean firing rate μ sets the natural timescale.) At low C, the bμ rate diverges, as described in Appendix B. Interestingly, this limit is singular, similar to the results in Marzen and Crutchfield (2014): at C = 0, the spike train is noise-free periodic and so the bμ rate is 0. For C ≈ 1, it dips for the same reason that E decreases. For larger C, bμ's behavior depends rather strongly on the ISI distribution shape. The longer-ranged gamma distribution results in an ever-increasing bμ rate for larger C, while the bμ rate of neural spike trains produced by NIF neurons tends to a small positive constant at large C.
The variation of bμ deviates qualitatively from that of E at larger C: the GISI spike trains yield smaller total predictability E than the NIF neurons, but an arbitrarily higher predictability rate. These calculations suggest a new kind of universality for neuronal information measures within a particular generative model class. All of these distinct integrate-and-fire neuron models generate ISI distributions from different families, yet their informational properties exhibit the same dependencies on Δt, μ, and C in the limit of small Δt. Neural spike trains with gamma-distributed ISIs did not show similar informational properties. And we would not expect neural spike trains that are alternating renewal processes to show similar informational properties either. (See Section 4.) These coarse information quantities might therefore be effective model selection tools for real neural spike train data, though more groundwork must be laid to ascertain their utility.

6. Conclusions

We explored the scaling properties of a variety of information-theoretic quantities associated with two classes of spiking neural models: renewal processes and alternating renewal processes. We found that information generation (entropy rate) and stored information (statistical complexity) both diverge logarithmically with decreasing time resolution for both types of spiking models, whereas the predictable information (excess entropy) and active information accumulation (bound information rate) limit to a constant. Our results suggest that the excess entropy and regularized statistical complexity of different types of integrate-and-fire neurons are universal in the sense that they do not depend on mechanism details, indicating a surprising simplicity in complex neural spike trains. Our findings highlight the importance of analyzing the scaling behavior of information quantities, rather than assessing these only at a fixed temporal resolution. By restricting ourselves to relatively simple spiking models we have been able to establish several key properties of their behavior. There are, of course, other important spiking models that cannot be expressed as renewal processes or alternating renewal processes, but we are encouraged by the robust scaling behavior of the entropy rate, statistical complexity, excess entropy, and bound information rate over the range of models we considered. There was a certain emphasis here on the entropy rate and hidden Markov models of neural spike trains, both familiar tools in computational neuroscience. On this score, our contributions are straightforward. We determined how the entropy rate varies with the time discretization and identified the possibly infinite-state, unifilar HMMs required for optimal prediction of spike-train renewal processes. 
Entropy rate diverges logarithmically for stochastic processes (Gaspard and Wang, 1993), and this has been observed empirically for neural spike trains for time discretizations in the submillisecond regime (Nemenman et al., 2008). We argued that the hμ divergence rate is an important characteristic. For renewal processes, it is the mean firing rate; for alternating renewal processes, the “reduced mass” of the mean firing rates. Our analysis of the latter, more structured processes showed that a divergence rate less than the mean firing rate—also seen experimentally (Nemenman et al., 2008)—indicates that there are strong correlations between ISIs. Generally, the nondivergent component of the time discretization-normalized entropy rate is the differential entropy rate; e.g., as given in Stevens and Zador (1996). Empirically studying information measures as a function of time resolution can lead to a refined understanding of the time scales over which neuronal communication occurs. Regardless of the information measure chosen, the results and analysis here suggest that much can be learned by studying scaling behavior rather than focusing only on neural information as a single quantity estimated at a fixed temporal resolution. While we focused on the regime in which the time discretization was smaller than any intrinsic timescale of the process, future and more revealing analyses would study scaling behavior at even smaller time resolutions to directly determine intrinsic time scales (Crutchfield, 1994). Going beyond information generation (entropy rate), we analyzed information measures—namely, statistical complexity and excess entropy—that have only recently been used to understand neural coding and communication. Their introduction is motivated by the hypothesis that neurons benefit from learning to predict their inputs (Palmer et al., 2013), which can consist of the neural spike trains of upstream neurons. 
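The divergence rate itself can be extracted numerically as the slope of hμ(Δt)/Δt against log2(1/Δt). A sketch using the exact binned-Poisson entropy rate H(μΔt)/Δt as stand-in data (variable names ours); for a renewal process the fitted slope recovers the mean firing rate:

```python
import math

def binary_entropy(p):
    """H(p) in bits."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

mu = 100.0  # spikes per second
dts = (1e-6, 1e-7, 1e-8, 1e-9)
xs = [math.log2(1.0 / dt) for dt in dts]
ys = [binary_entropy(mu * dt) / dt for dt in dts]  # stand-in for h_mu(dt)/dt

# Least-squares slope of h_mu(dt)/dt versus log2(1/dt) is the divergence rate.
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
```

Applied to data from an ARP, the same regression would return the smaller "reduced mass" rate rather than μ, which is what makes the slope diagnostic of ISI correlations.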
The statistical complexity is the minimal amount of historical information required for exact prediction. To our knowledge, the statistical complexity has appeared only once previously in computational neuroscience (Haslinger et al., 2010). The excess entropy, a closely related companion, is the maximum amount of information that can be predicted about the future. When it diverges, then its divergence rate is quite revealing of the underlying process (Crutchfield, 1994; Bialek et al., 2001), but none of the model neural spike trains studied here had divergent excess entropy. Finally, the bound information rate has yet to be deployed in the context of neural coding, though related quantities have drawn attention elsewhere, such as in nonlinear dynamics (James et al., 2014), music (Abdallah and Plumbley, 2009), spin systems (Abdallah and Plumbley, 2012), and information-based reinforcement learning (Martius et al., 2013). Though its potential uses have yet to be exploited, it is an interesting quantity in that it captures the rate at which spontaneously generated information is actively stored by neurons. That is, it quantifies how neurons harness randomness. Our contributions to this endeavor are more substantial than the preceding points. We provided exact formulae for the above quantities for renewal processes and alternating renewal processes. The new expressions can be developed further as lower bounds and empirical estimators for a process' statistical complexity, excess entropy, and bound information rate. This parallels how the renewal-process entropy-rate formula is a surprisingly accurate entropy-rate estimator (Gao et al., 2008). By deriving explicit expressions, we were able to analyze time-resolution scaling, showing that the statistical complexity diverges logarithmically for all but Poisson processes. So, just like the entropy rate, any calculations of the statistical complexity—e.g., as in Haslinger et al. 
(2010)—should be accompanied by the time discretization dependence. Notably, the excess entropy and the bound information rate have no such divergences. To appreciate more directly what neural information processing behavior these information measures capture in the continuous-time limit, we studied them as functions of the ISI coefficient of variation. With an appropriate renormalization, simulations revealed a surprising simplicity: a universal dependence on the coefficient of variation across several familiar neural models. The simplicity is worth investigating further, since the dynamics and biophysical mechanisms implicit in the alternative noisy integrate-and-fire neural models are quite different. If other generative models of neural spike trains also show similar information universality, then these information measures might prove useful as model selection tools. Finally, we close with a discussion of a practical issue related to the scaling analyses—one that is especially important given the increasingly sophisticated neuronal measurement technologies coming online at a rapid pace (Alivisatos et al., 2012). How small should Δt be to obtain correct estimates of neuronal communication? First, as we emphasized, there is no single “correct” estimate for an information quantity; rather, its resolution scaling is key. Second, results presented here and in a previous study by others (Nemenman et al., 2008) suggest that extracting information scaling rates and nondivergent components can require submillisecond time resolution. Third, and to highlight, the regime of infinitesimal time resolution is exactly the limit in which computational efforts without analytic foundation will fail or, at a minimum, be rather inefficient. As such, we hope that the results and methods developed here will be useful to these future endeavors and guide how new technologies facilitate scaling analysis.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References (48 in total)

1.  On decoding the responses of a population of neurons from short time windows.

Authors:  S Panzeri; A Treves; S Schultz; E T Rolls
Journal:  Neural Comput       Date:  1999-10-01       Impact factor: 2.026

2.  How to measure the information gained from one symbol.

Authors:  M R DeWeese; M Meister
Journal:  Network       Date:  1999-11       Impact factor: 1.273

3.  Input synchrony and the irregular firing of cortical neurons.

Authors:  C F Stevens; A M Zador
Journal:  Nat Neurosci       Date:  1998-07       Impact factor: 24.884

4.  Temporal coding of visual information in the thalamus.

Authors:  P Reinagel; R C Reid
Journal:  J Neurosci       Date:  2000-07-15       Impact factor: 6.167

5.  Predictability, complexity, and learning.

Authors:  W Bialek; I Nemenman; N Tishby
Journal:  Neural Comput       Date:  2001-11       Impact factor: 2.026

6.  Retinal ganglion cells act largely as independent encoders.

Authors:  S Nirenberg; S M Carcieri; A L Jacobs; P E Latham
Journal:  Nature       Date:  2001-06-07       Impact factor: 49.962

7.  Regularities unseen, randomness observed: levels of entropy convergence.

Authors:  James P Crutchfield; David P Feldman
Journal:  Chaos       Date:  2003-03       Impact factor: 3.642

8.  Multiscale entropy analysis of complex physiologic time series.

Authors:  Madalena Costa; Ary L Goldberger; C-K Peng
Journal:  Phys Rev Lett       Date:  2002-07-19       Impact factor: 9.161

9.  Binless strategies for estimation of information from neural data.

Authors:  Jonathan D Victor
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2002-11-11

10.  Inferring statistical complexity.

Authors:  J P Crutchfield; K Young
Journal:  Phys Rev Lett       Date:  1989-07-10       Impact factor: 9.161

